From prentice at ias.edu  Fri Apr  1 10:47:28 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 01 Apr 2011 10:47:28 -0400
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D93B1E4.3080407@cora.nwra.com>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<4D93B1E4.3080407@cora.nwra.com>
Message-ID: <4D95E580.5090902@ias.edu>

On 03/30/2011 06:42 PM, Orion Poplawski wrote:
> On 03/21/2011 06:51 AM, Douglas Eadline wrote:
>> I got to thinking about how others are fairing (or not)
>> with GP-GPU technology. I put up a simple poll on
>> ClusterMonkey to help get a general idea.
>> (you can find it on the front page right top)
>> If you have a moment, please provide
>> your experience (results are available as well).
> 
> We've seen some reasonable speedup (12x) with some matlab code using Jacket. 
> It required up-to-the-minute bugfixes/enhancements from Accelereyes to get it 
> working though.  Ran into lots of limitations with some other code (sparse 
> matrices) that prevented it from being usable.  Have some reports of success 
> with gpulib and IDL.
> 
> 

I've installed 4 GPU-equipped servers in my environment; 2 are a part of
my cluster, and 2 are independent from the cluster so that users can
login interactively and program/debug/tinker/whatever. (My cluster
doesn't allow interactive logins by design).

A handful of users were interested in getting access to the GPUs, but so
far, not a single one has even logged into these systems to kick the
tires yet, and the systems have been online for approx. 9 months. It
just be that they're busy with other work. Most of my users are
post-docs who guide their own research, so they can create/modify their
own project schedules as they see fit.


-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From Shainer at Mellanox.com  Fri Apr  1 18:41:07 2011
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Fri, 1 Apr 2011 15:41:07 -0700
Subject: [Beowulf] AMD 8 cores vs 12 cores CPUs and Infiniband
References: <1301387847.1995.144.camel@mundo><9FA59C95FFCBB34EA5E42C1A8573784F037FD082@mtiexch01.mti.com>
	<Pine.LNX.4.64.1103292349500.16437@coffee.psychology.mcmaster.ca>
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F037FD5B7@mtiexch01.mti.com>

 
> > I have been using single card on Magny-Cours with no issues at all.
> You can
> 
> interesting.  what adjustments have you made to the MPI stack to
permit
> this?
> we've had a variety of apps that fail intermittently on high-core
> nodes.
> I have to say I was surprised such a thing came up - not sure whether
> it's
> inherent to IB or a result of the openmpi stack.  our usual way to
test
> this is to gradually reduce the ranks-per-node for the job until it
> starts
> to work.  an interesting cosmology code works at 1 pppn but not 3 ppn
> on our recent 12c MC, mellanox QDR cluster.


I will be more than happy to give it a try - have access to the
Magny-Cours system at
http://www.hpcadvisorycouncil.com/cluster_center.php


> 
> regards, mark hahn.
> _______________________________________________
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From akshar.bhosale at gmail.com  Sat Apr  2 08:41:07 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Sat, 2 Apr 2011 18:11:07 +0530
Subject: [Beowulf] error in job; jobs failing
Message-ID: <AANLkTikCOnF1a1hCZ2wmz5ETanwDg1dL5jvNbSuhCAZZ@mail.gmail.com>

Hi,
we are getting dapl 4003 event error. We have rhel 5.2 x64 and intel mpi
library 4.3;dapl-1.2.7-1.ofed1.3.1;
What can be the reason? we have torque and pbs setup for job runs.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110402/e2fdef3c/attachment.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From herbert.fruchtl at st-andrews.ac.uk  Mon Apr  4 11:15:35 2011
From: herbert.fruchtl at st-andrews.ac.uk (Herbert Fruchtl)
Date: Mon, 04 Apr 2011 16:15:35 +0100
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <mailman.1.1301684401.12568.beowulf@beowulf.org>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
Message-ID: <4D99E097.7060807@st-andrews.ac.uk>

They hear great success stories (which in reality are often prototype 
implementations that do one carefully chosen benchmark well), then look at the 
API, look at their existing code, and postpone the start of their project until 
they have six months spare time for it. And we know when that is.

The current approach with more or less vendor specific libraries (be they "open" 
or not) limits the uptake of GPU computing to a few hardcore developers of 
experimental codes who don't mind rewriting their code every two years. It won't 
become mainstream until we have a compiler that turns standard Fortran (or C++, 
if it has to be) into GPU code. Anything that requires more change than let's 
say OpenMP directives is doomed, and rightly so.

   Herbert

>
> I've installed 4 GPU-equipped servers in my environment; 2 are a part of
> my cluster, and 2 are independent from the cluster so that users can
> login interactively and program/debug/tinker/whatever. (My cluster
> doesn't allow interactive logins by design).
>
> A handful of users were interested in getting access to the GPUs, but so
> far, not a single one has even logged into these systems to kick the
> tires yet, and the systems have been online for approx. 9 months. It
> just be that they're busy with other work. Most of my users are
> post-docs who guide their own research, so they can create/modify their
> own project schedules as they see fit.
>
>

-- 
Herbert Fruchtl
Senior Scientific Computing Officer
School of Chemistry, School of Mathematics and Statistics
University of St Andrews
--
The University of St Andrews is a charity registered in Scotland:
No SC013532
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Mon Apr  4 12:01:44 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Mon, 04 Apr 2011 23:01:44 +0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
Message-ID: <4D99EB68.4020800@pathscale.com>

Herbert Fruchtl wrote:
> They hear great success stories (which in reality are often prototype 
> implementations that do one carefully chosen benchmark well), then look at the 
> API, look at their existing code, and postpone the start of their project until 
> they have six months spare time for it. And we know when that is.
>
> The current approach with more or less vendor specific libraries (be they "open" 
> or not) limits the uptake of GPU computing to a few hardcore developers of 
> experimental codes who don't mind rewriting their code every two years. It won't 
> become mainstream until we have a compiler that turns standard Fortran (or C++, 
> if it has to be) into GPU code. Anything that requires more change than let's 
> say OpenMP directives is doomed, and rightly so.
>   
Hi Herbert,

I think your perspective pretty much nails it

(shameless self promotion)
http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)

This is really only the tip of the problem and there must also be 
solutions for scaling *efficiently* across the cluster.  (No MPI + CUDA 
or even HMPP is *not* the answer imho.)

./C
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From james.p.lux at jpl.nasa.gov  Mon Apr  4 12:53:22 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 4 Apr 2011 09:53:22 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47FF0C60D4455@ALTPHYEMBEVSP20.RES.AD.JPL>

You've described it pretty well..

Look how long it took for "standard libraries" to take advantage of things like MPI to become "of course we use that"..

If the original code used standard library calls for things like matrix math, and it's a "drop in" so you could do a "test case" in less than a day or so, you get pretty rapid acceptance.  If it requires weeks to just figure out how to make it work, it's going to be in the "when someone specifically funds me to do it".  

I've seen lots of really interesting things that I'd like to try, but not being independently wealthy or having a patron who is, I have to work on things that other people want done (and, presumably which I also find interesting).  I can write proposals to say "it would be really nice to do X because of speculative benefit Y"  and every once in a while, someone will say, "Yeah, that sounds good, go check it out".  And then we do.  


But it's a long and time consuming process.  For instance, I was just in a presentation last week discussing a recent call for proposals from NASA.. the *shortest* time from proposal to response (yes/no) was around 120 days, the median was around 200 days, and the max was around 400 days plus, depending on the year.
http://science.nasa.gov/researchers/sara/grant-stats/

A lot depends on what happens to the budgets as they wend their leisurely way through the program offices at the agencies, then get rolled up in the President's submission, then thrashed in Congress, then allocated, then back through the agency, and finally back down to the program.  To provide some perspective on the front end of the process, the program managers at the agencies are winding up their PPBE13 submissions (that's for FY13, starting October 2012, although it also affects FY12 funding) 

A "new technology" that hasn't been "on the radar" probably has a 2-3 year lag before significant money can be applied to it (at least from government funding sources).  Often, one can get smaller sums more quickly out of some general "investigate new technologies" kind of bucket (smaller sums = a few $10k), but right now, even those have essentially dried up  (Continuing resolutions, etc.)

To tie this back to the first question.. a few $10k would pay for the "Lets try recompiling with the new library and see if it works" sort of level of effort, but not for a "Let's rewrite our codes for the new hardware, and engage in a validation and verification effort to show that it still works"

James Lux, P.E.
Co-Principal Investigator, CoNNeCT Project
Task Manager, SOMD Software Defined Radios
Flight Communications Systems Section
Jet Propulsion Laboratory
4800 Oak Grove Drive, Mail Stop 161-213
Pasadena, CA, 91109
+1(818)354-2075 phone
+1(818)393-6875 fax 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Herbert Fruchtl
> Sent: Monday, April 04, 2011 8:16 AM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] GP-GPU experience
> 
> They hear great success stories (which in reality are often prototype
> implementations that do one carefully chosen benchmark well), then look at the
> API, look at their existing code, and postpone the start of their project until
> they have six months spare time for it. And we know when that is.
> 
> The current approach with more or less vendor specific libraries (be they "open"
> or not) limits the uptake of GPU computing to a few hardcore developers of
> experimental codes who don't mind rewriting their code every two years. It won't
> become mainstream until we have a compiler that turns standard Fortran (or C++,
> if it has to be) into GPU code. Anything that requires more change than let's
> say OpenMP directives is doomed, and rightly so.
> 
>    Herbert
> 
> >
> > I've installed 4 GPU-equipped servers in my environment; 2 are a part of
> > my cluster, and 2 are independent from the cluster so that users can
> > login interactively and program/debug/tinker/whatever. (My cluster
> > doesn't allow interactive logins by design).
> >
> > A handful of users were interested in getting access to the GPUs, but so
> > far, not a single one has even logged into these systems to kick the
> > tires yet, and the systems have been online for approx. 9 months. It
> > just be that they're busy with other work. Most of my users are
> > post-docs who guide their own research, so they can create/modify their
> > own project schedules as they see fit.
> >
> >
> 
> --
> Herbert Fruchtl
> Senior Scientific Computing Officer
> School of Chemistry, School of Mathematics and Statistics
> University of St Andrews
> --
> The University of St Andrews is a charity registered in Scotland:
> No SC013532
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From mfatica at gmail.com  Mon Apr  4 12:54:37 2011
From: mfatica at gmail.com (Massimiliano Fatica)
Date: Mon, 4 Apr 2011 09:54:37 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99EB68.4020800@pathscale.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
Message-ID: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>

If you are old enough to remember the time when the first distribute
computers appeared on the scene,
this is a deja-vu. Developers used to program on shared memory (
mostly with directives) were complaining
about the new programming models ( PVM, MPL, MPI).
Even today, if you have a serial code there is no tool that will make
your code runs on a cluster.
Even on a single system, if you try an auto-parallel/auto-vectorizing
compiler on a real code, your results will probably be disappointing.

When you can get a 10x boost on a production code rewriting some
portions of your code to use the GPU, if time to solution is important
or you could perform simulations that were impossible  before ( for
example using algorithms that were just too slow on CPUs,
Discontinuous Galerkin method is a perfect example), there are a lot
of developers that will write the code.
The effort it is clearly dependent of the code, the programmer and the
tool used ( you can go from fully custom GPU code with CUDA or OpenCL,
to automatically generated CUF kernels from PGI, to directives using
HMPP or PGI Accelerator).
In situation where time  to solution relates to money,  for example
oil and gas, GPUs are the answer today ( you will be surprised
by the number of GPUs in Houston).
Look at   the performance and scaling of AMBER ( MPI+ CUDA),
http://ambermd.org/gpus/benchmarks.htm, and tell me that the results
were not worth the effort.

Is GPU programming for everyone: probably not, in the same measure
that parallel programming in not for everyone.
Better tools will lower the threshold, but a threshold will be always present.


Massimiliano
PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran,
applications porting with CUDA, MPI+CUDA).


2011/4/4 "C. Bergstr?m" <cbergstrom at pathscale.com>:
> Herbert Fruchtl wrote:
>> They hear great success stories (which in reality are often prototype
>> implementations that do one carefully chosen benchmark well), then look at the
>> API, look at their existing code, and postpone the start of their project until
>> they have six months spare time for it. And we know when that is.
>>
>> The current approach with more or less vendor specific libraries (be they "open"
>> or not) limits the uptake of GPU computing to a few hardcore developers of
>> experimental codes who don't mind rewriting their code every two years. It won't
>> become mainstream until we have a compiler that turns standard Fortran (or C++,
>> if it has to be) into GPU code. Anything that requires more change than let's
>> say OpenMP directives is doomed, and rightly so.
>>
> Hi Herbert,
>
> I think your perspective pretty much nails it
>
> (shameless self promotion)
> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)
>
> This is really only the tip of the problem and there must also be
> solutions for scaling *efficiently* across the cluster. ?(No MPI + CUDA
> or even HMPP is *not* the answer imho.)
>
> ./C
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:16:31 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:16:31 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <4D2C8B7C.30300@bull.co.uk>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>
	<4D2C8B7C.30300@bull.co.uk>
Message-ID: <C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>

hi,

sometimes i go through a lot of mails at the mailing list here and  
had missed this one.
please keep me up to date and/or add me to mailing lists there.

latency is superior of quadrics compared to all the infini* stuff.
drivers that integrate into kernels - well some modifications  
shouldn't be too hard.

Of course even the realtime linux kernel is rather crappy there, as  
it locks every action
from and to a socket (even RAW/UDP communication in fact),
so you need a 'hack' of that kernel anyway to get faster latencies.

secondhand the quadrics stuff is cheap it seems.

Vincent

On Jan 11, 2011, at 5:55 PM, Daniel Kidger wrote:

> Mark,
>
> I will let others step forward individually.
>
> I was one of the last employees to leave Quadrics , so I do know  
> who had
> support contracts at that time, plus the even larger set of sites that
> had expired support contracts but still were actively running their
> QsNet clusters.
>
> You know that a company called Vega took on the ongoing support? :
> here is the website I set up at the time: https:// 
> support.hpc.vega.co.uk/
>
> I agree too though that there should be a community of QsNet-owning
> enthusiasts, who could provide mutual support in this legacy era.
>
>
> Also off the record, I know that there is a lot of Elan4 stock sitting
> in a warehouse. As long as you are not looking for long term vendor
> support, I expect  you could acquire cards, cables and switches for a
> bargain price.
>
> Daniel
>
>
>> Are you still using Quadrics Elan4-based clusters?
>>
>> We would like to continue using Quadrics on one of our clusters,  
>> since it
>> is still quite good in latency.  Maintaining the Quadrics drivers,  
>> though,
>> is a bit of a pain going forward - would be nice to avoid  
>> duplicating effort,
>> if there are other groups also doing so.
>>
>> please follow up or email me if you are using Elan4, or know anything
>> relevant.
>>
>> thanks,
>> Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http:// 
>> www.sharcnet.ca
>>             | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140  
>> x24687
>>             | Compute/Calcul Canada                | http:// 
>> www.computecanada.org
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>>
>
>
> -- 
> Bull, Architect of an Open World TM
>
> Dr. Daniel Kidger, HPC Technical Consultant
> daniel.kidger at bull.co.uk
> +44 (0) 7966822177
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:20:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:20:15 +0200
Subject: [Beowulf]
 =?iso-8859-1?q?Chinese_supercomputers_to_use_=91homemad?=
 =?iso-8859-1?q?e=92_chips?=
In-Reply-To: <Pine.LNX.4.64.1103110108590.19223@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.1103110108590.19223@coffee.psychology.mcmaster.ca>
Message-ID: <D25DC520-6373-4BF7-A35C-C2F597F3BE0A@xs4all.nl>


On Mar 11, 2011, at 7:20 AM, Mark Hahn wrote:

>> Interesting:
>> Chinese supercomputers to use ?homemade? chips
>> http://timesofindia.indiatimes.com/tech/personal-tech/computing/ 
>> Chinese-supercomputers-to-use-homemade-chips/articleshow/7655183.cms
>
> it's important to remind ourselves that China is still a centrally- 
> planned,
> totalitarian dictatorship.  I mention this only because this  
> announcement
> is a bit like Putin et al announcing that they'll develop their own  
> linux distro because Russia is big and important and mustn't allow  
> itself to be vulnerable to foreign hegemony.
>
> so far, the very shallow reporting I've seen has said that future  
> generations will add wide FP vector units.  nothing wrong with that,
> though it's a bit unclear to me why other companies haven't done it
> if there is, in fact, lots of important vector codes that will run  
> efficiently on such a configuration.  adding/widening vector FP is  
> not breakthrough engineering afaikt.
>
> has anyone heard anything juicy about the Tianhe interconnect? 
> _______________________________________________

Not really but busy with an AMD-GPU now the 6970 (note the 6990 also  
is available having 2 gpu's) is so fast
that the real problem is bandwidth from and to the gpu; so for a big  
cluster calculation i can understand very well
the need for having your own interconnect, especially as they get  
produced in china anyway.

the cpu's you also need bigtime, but as i'm going to react onto a  
special GPU posting anyway let's move it to that subject.

> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:26:43 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:26:43 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
Message-ID: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>

you can forget about getting much info other than marketing data.

the companies and orgainsations that already calculate for years at  
gpu's
they are really good in keeping their mouth shut.

But if you realize that even with 16 fast AMD cores (which for this  
specific
prime number code are a LOT FASTER in ipc than any other x64 chip),
a box built cheap second hand by the way as it's 4 x 8356 are needed
to feed just 1 gpu, you start to realize the real problem.

GPU's completely annihilate cpu's everywhere.

The limitation is the bandwidth to the gpu, though i didn't fully  
test that
bandwidth yet.

The 6000 series from AMD has much improved multiplication logics,  
like 2.5x faster
than the previous generation and it'll take some time to optimize  
this code for it.

streamcores for a while got renamed to PE's nowadays, processing  
elements,
and it has 1536 per gpu.

The 6990 has 2 of 'em.

It took a while for a good driver for these gpu's. Last days of  
januari it was there.
AMD-CAL works great here now.

There is not much diff with CUDA, other than proprietary ways of how  
to access things
and limbs and a few function calls.

Programming is similar.

818 execution units that can do multiplication 32 x 32 bits == 64 bits.

That kicks butt. bye bye cpu's.


On Mar 21, 2011, at 1:51 PM, Douglas Eadline wrote:

>
> I was recently given a copy of "GPU Computing Gems"
> to review. It is basically research quality NVidia success
> stories, some of which are quite impressive.
>
> I got to thinking about how others are fairing (or not)
> with GP-GPU technology. I put up a simple poll on
> ClusterMonkey to help get a general idea.
> (you can find it on the front page right top)
> If you have a moment, please provide
> your experience (results are available as well).
>
>   http://www.clustermonkey.net/
>
> BTW: You can see all the previous polls
> and links to other market data  here:
>
> http://goo.gl/lDcUJ
>
>
> --
> Doug
>
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 16:07:31 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 22:07:31 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
Message-ID: <B917A553-E23E-4F10-8B09-C055E2969E82@xs4all.nl>


On Apr 4, 2011, at 6:54 PM, Massimiliano Fatica wrote:

> If you are old enough to remember the time when the first distribute
> computers appeared on the scene,
> this is a deja-vu. Developers used to program on shared memory (
> mostly with directives) were complaining
> about the new programming models ( PVM, MPL, MPI).
> Even today, if you have a serial code there is no tool that will make
> your code runs on a cluster.
> Even on a single system, if you try an auto-parallel/auto-vectorizing
> compiler on a real code, your results will probably be disappointing.
>
> When you can get a 10x boost on a production code rewriting some
> portions of your code to use the GPU, if time to solution is important

Oh comeon factor 10 is not realistic.

You're doing the usual compare here of a hobby coder who coded
a tad in C or slowish C++ (except for a SINGLE, so not several,
NCSA coder i'll have to find the first C++ guy who can write codes
equally fast to C for complex algorithms - granted for big companies
C++ makes more sense, just not when it's about performance)
and then compare that with a full blown sponsored project in CUDA
that uses the topend gpu and compare it versus a single core
instead of 4 sockets (as that's powerwise the same).

Moneywise of course is another issue, that's where the gpu's win it
bigtime.

Yet there is a hidden cost in gpu's, that's you can build something way
faster for less money with gpu's, but you also need to pay for a good
coder to write your code in either CUDA or AMD-CAL (or as the chinese
seem to support both at the same time, which is not so complicated
if you have setup things in the correct manner).

This last is a big problem for the western world; governments pay big
bucks for hardware, but paying good coders what they are worth they
seem to forget.

Secondly there is another problem, that's that NVIDIA hasn't even  
released
the instructoin set of their GPU. Try to figure that out without  
fulltime work for it.

It seems however pretty similar to AMD, despite other huge architectural
differences between the 2; the programming similarity is striking and  
selfexplains
the real purpose where they got designed for (GRAPHICS).

> or you could perform simulations that were impossible  before ( for
> example using algorithms that were just too slow on CPUs,

All true yet it takes a LOT OF TIME to write something that's fast on  
a gpu.

First of all you have to not write double precision code, as the  
gamers card
from nvidia seem to not have much double precision logic, they only have
32 bits logics.

So at double precision, AMD is like 10 times faster in money per  
gflop than Nvidia.

Yet try to figure that out without being fulltime busy with those gpu's.
Only the TESLA versions have those transistors it seems.

Secondly Nvidia seems to keep being busy maximizing the frequency of
the gpu.

Now that might be GREAT for games as high clocked cores work (see  
intel),
yet for throughput of course that's a dead end. In raw throughput  
AMD's (ATI's)
approach will always win it of course from nvidia, as clocking a  
processor
higher has a O ( n ^ 3 ) impact on power consumption.

Now a big problem with nvidia is also that they basically go over spec.
I didn't really figure it out, yet it seems pci-e got designed with  
300 watt in mind max.

Yet at this code i'm busy with, the CUDA version of it (mfaktc)  
consumes a whopping 400+ watt
and please realize that majority of the system time is only keeping  
the streamcores busy
and not caches at all nor much of a RAM.

It's only doing multiplications of course at full speed in 32 bits  
code, using the new Fermi's
instructions that allows multiplying 32 bits x 32 bits == 64 bits.

CUDA version of your code gets developed btw by a guy working for a  
HPC vendor
which, i guess, also sells those Tesla's.

So any performance bragging sure must keep in mind it's far over 33%  
over the specs in
terms of power consumption.

Note AMD seems to follow nvidia in its path there.

> Discontinuous Galerkin method is a perfect example), there are a lot
> of developers that will write the code.

Oh comeon, writing for gpu's is really complicated.

> The effort it is clearly dependent of the code, the programmer and the
> tool used ( you can go from fully custom GPU code with CUDA or OpenCL,

Forget OpenCL, not good enough.

Better to code in CUDA and AMD-CAL at the same time something.

> to automatically generated CUF kernels from PGI, to directives using
> HMPP or PGI Accelerator).
> In situation where time  to solution relates to money,  for example
> oil and gas, GPUs are the answer today ( you will be surprised
> by the number of GPUs in Houston).

Pardon me, those industries already were using vectorized solutoins  
long before CUDA was
there and are using massively GPU's to calculate of course as soon as  
nvidia released
a version that was programmable.

This is not new. All those industries will of course never say  
anything on the performance
nor how many they use.

> Look at   the performance and scaling of AMBER ( MPI+ CUDA),
> http://ambermd.org/gpus/benchmarks.htm, and tell me that the results
> were not worth the effort.
>
> Is GPU programming for everyone: probably not, in the same measure
> that parallel programming in not for everyone.
> Better tools will lower the threshold, but a threshold will be  
> always present.
>

I would argue that both AMD as well as Nvidia has really tried to  
give the 3d world nations an advantage
by stopping progress in the rich nations.

I will explain. The real big advantage of rich nations is that  
average persons have more cash.
Students are a good example there. They can afford gpu's easily.

Yet there is so little technical information available on latencies  
and in case of nvidia on instructoin set that
the gpu's support, that this gives a huge programming hurdle for  
students.

Also there is no good tips in nvidia documents how to program for  
those things.

The most fundamental lessons how to program a gpu i miss in all  
documents i scanned so far.

It's just a bunch of 'lectures' that's not going to create any  
topcoders.

A piece of information here and a tad there.
Very bad.

AMD also is a nightmare there, they can't even run more than 1  
program at the same time, despite claims
that the 4000 series gpu's already had hardware support to do it. The  
indian helpdesk in fact is so lazy that
they didn't even rename the word 'ati' in the documentation to AMD,  
and the library each few months gets a
new name. Stream SDK now it's another new fancy name. "we worked hard  
in India sahib, yes sahib, yes sahib".

Yet 5 years later still not much works. For example in opencl also  
the 2nd gpu doesn't work in case of AMD.
Result "undefined". Nice.

Default driver install at inux here doesn't get openCL to work in  
fact at the 6970.

Both nvidia as well as AMD are a total joke there and by means of  
incompetence,
the generic incompetence being complete and clear documentation just  
like we have documention on how
cpu's work. Be it intel or AMD or IBM.

Students who program now for those gpu's in CUDA or AMD-CAL, they  
will have to go to hell and back to get something
to work well on it, except some trivial stuff that works well at it.

We see that just a few manage.

That's not a problem of the students, but a problem for society,  
because doing calculations faster and especially
CHEAP, is a huge advantage to progress science.

NSA type organisations in 3d world nations are a lot bigger than  
here, simply because more people live there.
So right now more people over there code for gpu's than here, here  
where everyone can afford one.

Some big companies excepted of course, but this is not a small note  
on companies. This is a note on 1st world versus 3d
world. The real difference is students with budget over here.

They have budget for gpu's, yet there is no good documentation simply  
giving which instructions a gpu has let alone which
latencies.

If you google hard, you will find 1 guy who actually by means of  
measuring had to measure the latencies of simple
instructions that write to the same register. Why did an university  
guy need to measure this, why isn't this simply
in Nvidia documentation?

A few of those things will of course have majority, vaste vaste  
majority of students trying something on a gpu, completely fail.

Because they fail, they don't continue there and don't get back from  
those gpu's a faster running code that gives them
something very important: faster calculation speed for whatever they  
wanted to run.

This is where AMD and Nvidia, and i politely call it by means of  
incompetence, gives the rich nations no advantage
over the 3d world nations, as the students need to be compeltely  
fulltime busy to obtain knowledge on the internal workings
of the gpu's in order to get something going fast at them. Majority  
will fail therefore of course, which has simply avoided
gpu's from getting massively adapted.

I've seen so many students try and fail at gpu programming,  
especially CUDA.

It's bizarre. The fail % is so huge. Even a big succes doesn't get  
recognized as a big succes,
simply because the guy didn't know about a few bottlenecks in gpu  
programming, as no manual told him
the combination of problems he ran into, as there was no technical  
data available.

It is true gpu's can be fast, but i feel there is a big need for  
better technical documentation of them.
We can no longer ignore this now that 3d world nations are  
overrunning 1st world nations. Mainly
because the sneaky organisations that do know everything are of  
course bigger over there than here, by means of
population size. This where the huge advantage of the rich nations,  
namely that every student has such gpu
at home, is not getting taken advantage from as the hurdle to gpu  
programming is too high by means of lack of
accurate documentation. Of course in 3d world nations they have at  
most a mobile phone, and very very seldom a laptop (except for the  
rich elite), let alone a computer with a capable programmable gpu,  
which makes it impossible for majority
of 3d world nations students to do any gpu computation because of a  
shortage in cash.

>
> Massimiliano
> PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran,
> applications porting with CUDA, MPI+CUDA).
>
>
> 2011/4/4 "C. Bergstr?m" <cbergstrom at pathscale.com>:
>> Herbert Fruchtl wrote:
>>> They hear great success stories (which in reality are often  
>>> prototype
>>> implementations that do one carefully chosen benchmark well),  
>>> then look at the
>>> API, look at their existing code, and postpone the start of their  
>>> project until
>>> they have six months spare time for it. And we know when that is.
>>>
>>> The current approach with more or less vendor specific libraries  
>>> (be they "open"
>>> or not) limits the uptake of GPU computing to a few hardcore  
>>> developers of
>>> experimental codes who don't mind rewriting their code every two  
>>> years. It won't
>>> become mainstream until we have a compiler that turns standard  
>>> Fortran (or C++,
>>> if it has to be) into GPU code. Anything that requires more  
>>> change than let's
>>> say OpenMP directives is doomed, and rightly so.
>>>
>> Hi Herbert,
>>
>> I think your perspective pretty much nails it
>>
>> (shameless self promotion)
>> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
>> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
>> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to  
>> source)
>>
>> This is really only the tip of the problem and there must also be
>> solutions for scaling *efficiently* across the cluster.  (No MPI +  
>> CUDA
>> or even HMPP is *not* the answer imho.)
>>
>> ./C
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 16:20:02 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 16:20:02 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>

> GPU's completely annihilate cpu's everywhere.

this is complete nonsense.  GPUs do very nicely on a quite narrow 
set of problems.  for a somewhat larger set of problems, they do OK,
but pretty "meh", really, considering.  for many problems, GPUs 
are irrelevant, whether that's because the problem uses too much 
memory, or already scales well on non-GPU, or doesn't have a GPU-friendly
structure.

> 818 execution units that can do multiplication 32 x 32 bits == 64 bits.
> That kicks butt. bye bye cpu's.

well, for your application, which is quite narrow.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 16:34:19 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 22:34:19 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
Message-ID: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>


On Apr 4, 2011, at 10:20 PM, Mark Hahn wrote:

>> GPU's completely annihilate cpu's everywhere.
>
> this is complete nonsense.  GPUs do very nicely on a quite narrow  
> set of problems.  for a somewhat larger set of problems, they do OK,
> but pretty "meh", really, considering.  for many problems, GPUs are  
> irrelevant, whether that's because the problem uses too much  
> memory, or already scales well on non-GPU, or doesn't have a GPU- 
> friendly
> structure.
>
>> 818 execution units that can do multiplication 32 x 32 bits == 64  
>> bits.
>> That kicks butt. bye bye cpu's.
>
> well, for your application, which is quite narrow.

Which is about any relevant domain where massive computation takes  
place.
The number of algorithms that really profit bigtime from a lot of  
RAM, in some cases you can also
replace by massive computation and a tad of memory, the cases where  
that cannot be the case
are very rare. For those few cases you order a few nodes with massive  
RAM rather than big cpu power.

yet majority of HPC calculations, especially if we add company codes  
there, the simulators and the oil,
gas, car and aviation industry.

So that makes 95% of all codes just need massive cpu power and can  
get away with relative small RAM sizes
per compute unit. Not to confuse btw with a compute unit of AMD as  
that is just a small part of a gpu, speaking
of redefinitions :)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 17:54:00 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 17:54:00 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>

>> well, for your application, which is quite narrow.
>
> Which is about any relevant domain where massive computation takes place.

you are given to hyperbole.  the massive domains I'm thinking of
are cosmology and explicit quantum condensed-matter calculations.
the experts in those fields I talk to both do use massive computation
and do not expect much benefit from GPUs.

> The number of algorithms that really profit bigtime from a lot of RAM, in 
> some cases you can also
> replace by massive computation and a tad of memory, the cases where that 
> cannot be the case
> are very rare.

no.  you are equating "uses lots of ram" with "uses memoization".

> yet majority of HPC calculations, especially if we add company codes there, 
> the simulators and the oil,
> gas, car and aviation industry.

jeez.
nevermind I said anything.  I'd forgotten about your style.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 18:10:44 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 00:10:44 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
	<Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
Message-ID: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>


On Apr 4, 2011, at 11:54 PM, Mark Hahn wrote:

>>> well, for your application, which is quite narrow.
>>
>> Which is about any relevant domain where massive computation takes  
>> place.
>
> you are given to hyperbole.  the massive domains I'm thinking of
> are cosmology and explicit quantum condensed-matter calculations.
> the experts in those fields I talk to both do use massive computation
> and do not expect much benefit from GPUs.

Even the field you give as an example: quantum mechanica:
Vaste majority of quantum mechanica calculations are massive matrix  
calculations.

Furthermore i didn't take a look to the field you're speaking about.
I did however take a look to 1 other quantum mechanica calculation,
where someone used 1 core of his quadcore box and massive RAM.

It took me 1 afternoon to explain the guy how to trivially use all 4  
cores doing that calculation
using the same RAM buffer.

You realize that you also can do combined calculations?

Just have a new chipset with big bandwidth to gpu, at cpu's, based  
upon a big RAM buffer, prepare
batches, ship batch to gpu, do tough calculation work on the gpu,  
ship results back.

That's how many use those gpu's.

My attempt to write a sieve directly into the gpu in order to do  
everything inside the gpu,
is of a different league sir than where you are talking.

Your kind of talking is: "there are no tanks in the city, we will  
drive all tanks out of the city, so that only
our cpu's are left again".

Those days are over. Just get creative and find a way to do it at a gpu.

I parallellized 1 quantum mechanica calculation there; i wasn't paid  
for that.
Just pay someone to useful use a GPU. If it ain't easy it doesn't  
mean it's impossible.

Most quantum mechanica guys might be brilliant in their field, in  
manners how to parallellize things
without losing their branching factor that a huge RAM buffer gives,  
they didn't figure out simply yet.

Now it won't be easy to solve for every field; but being a speedfreak  
and in advance saying some faster type of
hardware cannot be used is just monkeytalk. Go get clever and solve  
the problem. Find solutions, don't see just
problems.

>
>> The number of algorithms that really profit bigtime from a lot of  
>> RAM, in some cases you can also
>> replace by massive computation and a tad of memory, the cases  
>> where that cannot be the case
>> are very rare.
>
> no.  you are equating "uses lots of ram" with "uses memoization".
>
>> yet majority of HPC calculations, especially if we add company  
>> codes there, the simulators and the oil,
>> gas, car and aviation industry.
>
> jeez.
> nevermind I said anything.  I'd forgotten about your style.

Read the statistics on the reports what eats system time sir.
You have access to those papers as well if you know how to google.

Regards,
Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 18:20:08 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 18:20:08 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
	<Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
	<7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041815010.31929@coffee.psychology.mcmaster.ca>

>>>> well, for your application, which is quite narrow.
>>> 
>>> Which is about any relevant domain where massive computation takes place.
>> 
>> you are given to hyperbole.  the massive domains I'm thinking of
>> are cosmology and explicit quantum condensed-matter calculations.
>> the experts in those fields I talk to both do use massive computation
>> and do not expect much benefit from GPUs.
>
> Even the field you give as an example: quantum mechanica:
> Vaste majority of quantum mechanica calculations are massive matrix 
> calculations.

yes, specifically very large sparse eigensystems.  do you have an example
of effectively using GPUs for this?

> Furthermore i didn't take a look to the field you're speaking about.
> I did however take a look to 1 other quantum mechanica calculation,
> where someone used 1 core of his quadcore box and massive RAM.

sorry, I'm talking thousands of cores, ideally with > 4GB/core.

> It took me 1 afternoon to explain the guy how to trivially use all 4 cores 
> doing that calculation
> using the same RAM buffer.

the point is that lots of serious science uses MPI already,
and doesn't care much about GPUs.  if they were free, sure,
they might be interesting.

> My attempt to write a sieve directly into the gpu in order to do everything 
> inside the gpu,
> is of a different league sir than where you are talking.

bully for you.  your application is a niche.

> Your kind of talking is: "there are no tanks in the city, we will drive all 
> tanks out of the city, so that only
> our cpu's are left again".

nonsense.  I'm saying that GPUs are a nice, specialized accelerator.
you can't have them without hosts, so you need to compare host vs host+GPU.

> Those days are over. Just get creative and find a way to do it at a gpu.

don't be silly.  GPUs have weaknesses as well as strengths.  packaging 
and system design is one of the minor sticking points with GPUs.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From lindahl at pbm.com  Tue Apr  5 01:22:39 2011
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 4 Apr 2011 22:22:39 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
Message-ID: <20110405052239.GA6130@bx9.net>

On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:

> If you are old enough to remember the time when the first distribute
> computers appeared on the scene,
> this is a deja-vu.

Not to mention the prior appearance of array processors. Oil+Gas
bought a lot of those, too. Some important radio astronomy data
reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
was 10X faster than the VAX by itself. Then microprocessor-based
workstations arrived, and the game was over, ease of use FTW.

> Even on a single system, if you try an auto-parallel/auto-vectorizing
> compiler on a real code, your results will probably be disappointing.

The wins from such compilers have been steadily decreasing, as main
memory gets farther and farther away from the CPU and caches.

-- greg


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From beat at 0x1b.ch  Tue Apr  5 01:52:41 2011
From: beat at 0x1b.ch (Beat Rubischon)
Date: Tue, 05 Apr 2011 07:52:41 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>	<4D2C8B7C.30300@bull.co.uk>
	<C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
Message-ID: <4D9AAE29.5090207@0x1b.ch>

Hi Vincent!

On 04.04.11 21:16, Vincent Diepeveen wrote:
> latency is superior of quadrics compared to all the infini* stuff.

Quadrics was great stuff - but it was outperformed once Mellanox invited
their ConnectX chips. Additional the Quadrics team never got their PCIe
chips (QSnet III) to fly. Finally the company closed their doors in may 09.

I really liked their hard- and software. But the time is over...

> Of course even the realtime linux kernel is rather crappy there, as
> it locks every action from and to a socket (even RAW/UDP
> communication in fact), so you need a 'hack' of that kernel anyway to
> get faster latencies.

When talking about Interconnects the kernel is not involved in
communication. Any context switch is avoided to keep the overhead small.
This basically means a real time kernel isn't needed as it would not
give you any additional benefit.

Beat

-- 
     \|/                           Beat Rubischon <beat at 0x1b.ch>
   ( 0-0 )                             http://www.0x1b.ch/~beat/
oOO--(_)--OOo---------------------------------------------------
Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 03:51:00 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 09:51:00 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <4D9AAE29.5090207@0x1b.ch>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>	<4D2C8B7C.30300@bull.co.uk>
	<C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
	<4D9AAE29.5090207@0x1b.ch>
Message-ID: <75CD2C36-0B25-4CD2-B3F8-2645BE1A72DC@xs4all.nl>


On Apr 5, 2011, at 7:52 AM, Beat Rubischon wrote:

> Hi Vincent!
>
> On 04.04.11 21:16, Vincent Diepeveen wrote:
>> latency is superior of quadrics compared to all the infini* stuff.
>
> Quadrics was great stuff - but it was outperformed once Mellanox  
> invited
> their ConnectX chips. Additional the Quadrics team never got their  
> PCIe
> chips (QSnet III) to fly. Finally the company closed their doors in  
> may 09.
>
> I really liked their hard- and software. But the time is over...
>

of course there is new great pci-e solutions, yet the price per port  
there is
bigger than entire machine with latest gpu, that's a big problem to  
make cheap
clusters.

If you buy a cheap 6 core box of 350 euro then a new generation gpu is
318 euro or so  (that's a HD6970).

What's node price of the network?

>> Of course even the realtime linux kernel is rather crappy there, as
>> it locks every action from and to a socket (even RAW/UDP
>> communication in fact), so you need a 'hack' of that kernel anyway to
>> get faster latencies.
>
> When talking about Interconnects the kernel is not involved in
> communication. Any context switch is avoided to keep the overhead  
> small.
> This basically means a real time kernel isn't needed as it would not
> give you any additional benefit.

realtime kernel keeps other worst cases down bigtime, especially with  
respect to
scheduling.

>
> Beat
>
> -- 
>      \|/                           Beat Rubischon <beat at 0x1b.ch>
>    ( 0-0 )                             http://www.0x1b.ch/~beat/
> oOO--(_)--OOo---------------------------------------------------
> Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 03:58:47 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 09:58:47 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <20110405052239.GA6130@bx9.net>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
	<20110405052239.GA6130@bx9.net>
Message-ID: <F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>


On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote:

> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:
>
>> If you are old enough to remember the time when the first distribute
>> computers appeared on the scene,
>> this is a deja-vu.
>
> Not to mention the prior appearance of array processors. Oil+Gas
> bought a lot of those, too. Some important radio astronomy data
> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
> was 10X faster than the VAX by itself. Then microprocessor-based
> workstations arrived, and the game was over, ease of use FTW.
>
>> Even on a single system, if you try an auto-parallel/auto-vectorizing
>> compiler on a real code, your results will probably be disappointing.
>
> The wins from such compilers have been steadily decreasing, as main
> memory gets farther and farther away from the CPU and caches.
>
> -- greg

It's different this time indeed; classic cpu's will never again  
deliver big performance.

cache - coherency is simply too complicated with many cores.
cpu's also will need a manycore co-processor therefore.

furthermore manycores simply are cheaper to produce and they can eat  
a bigger powerbudget.

3 very powerful arguments which regrettably limits cpu's, but that's  
the price we pay for progress.

It won't mean cpu's will go away of course any soon, they're so  
generic and easy to program that
they will survive. Just offload the calculations to the manycores.

please don't estimate the argument of cheaper to produce.


>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 04:04:35 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 10:04:35 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
	<20110405052239.GA6130@bx9.net>
	<F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>
Message-ID: <B602B9D0-76A9-4F78-8CE7-F596CE76D946@xs4all.nl>


On Apr 5, 2011, at 9:58 AM, Vincent Diepeveen wrote:

>
> On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote:
>
>> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:
>>
>>> If you are old enough to remember the time when the first distribute
>>> computers appeared on the scene,
>>> this is a deja-vu.
>>
>> Not to mention the prior appearance of array processors. Oil+Gas
>> bought a lot of those, too. Some important radio astronomy data
>> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
>> was 10X faster than the VAX by itself. Then microprocessor-based
>> workstations arrived, and the game was over, ease of use FTW.
>>
>>> Even on a single system, if you try an auto-parallel/auto- 
>>> vectorizing
>>> compiler on a real code, your results will probably be  
>>> disappointing.
>>
>> The wins from such compilers have been steadily decreasing, as main
>> memory gets farther and farther away from the CPU and caches.
>>
>> -- greg
>

Early Morning oh oh oh oh, apologies the context might be clear yet
the sentences were written down wrong.

> It's different this time indeed; classic cpu's will never again
> deliver big performance.
>

ack

> cache - coherency is simply too complicated with many cores.


1) Cache-coherency is too complicated for CPU's

> cpu's also will need a manycore co-processor therefore.
>

ack

> furthermore manycores simply are cheaper to produce and they can eat
> a bigger powerbudget.
>

ack

> 3 very powerful arguments which regrettably limits cpu's, but that's
> the price we pay for progress.
>

ack

> It won't mean cpu's will go away of course any soon, they're so
> generic and easy to program that
> they will survive. Just offload the calculations to the manycores.
>

ack

> please don't estimate the argument of cheaper to produce.
>
>

please don't 	UNDERESTIMATE the argument of cheaper to produce

only 6 out of 8 score = 75% sharp in the morning

>
>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Tue Apr  5 05:10:28 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 05 Apr 2011 19:10:28 +1000
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
Message-ID: <4D9ADC84.7030804@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/04/11 05:26, Vincent Diepeveen wrote:

> GPU's completely annihilate cpu's everywhere.

Great!  Where can I get one with 1TB of on-card RAM to
keep our denovo reassembly people happy ?

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0
POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t
=OMhq
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 09:05:19 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 15:05:19 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D9ADC84.7030804@unimelb.edu.au>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<4D9ADC84.7030804@unimelb.edu.au>
Message-ID: <2538ED2A-7F07-4524-B74E-6F0AE623916E@xs4all.nl>


On Apr 5, 2011, at 11:10 AM, Christopher Samuel wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/04/11 05:26, Vincent Diepeveen wrote:
>
>> GPU's completely annihilate cpu's everywhere.
>
> Great!  Where can I get one with 1TB of on-card RAM to
> keep our denovo reassembly people happy ?

There is already several projects in that area that tried incorporate  
GPU's and with succes.

Just google a bit, i got bunches of hits from all sorts of research  
institutes in that area,
most already over 2 years old, nothing new there.

Your reaction just shows your ignorance there.

Regards,
Vincent


>
> - --
>     Christopher Samuel - Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>          http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0
> POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t
> =OMhq
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Wed Apr  6 06:58:12 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Wed, 6 Apr 2011 11:58:12 +0100
Subject: [Beowulf] Westmere EX
Message-ID: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/

10 core Westmere EX on an eight socket box = 80 cores 
These would be a very nice machine.
Anyone know if machines like this will be built?
Do the sockets have enough Quickpath links to create an 8-way topology?


John Hearns | CFD Hardware Specialist | McLaren Racing Limited
McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

T:  +44 (0) 1483 261000
D:  +44 (0) 1483 262352
F:  +44 (0) 1483 261010
E:  john.hearns at mclaren.com
W:  www.mclaren.com


The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From brice.goglin at gmail.com  Wed Apr  6 07:05:55 2011
From: brice.goglin at gmail.com (Brice Goglin)
Date: Wed, 06 Apr 2011 13:05:55 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9C4913.10802@gmail.com>

Le 06/04/2011 12:58, Hearns, John a ?crit :
> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores 
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way topology?
>   

You only have 4 QPI links per sockets, no way to connect the entire graph.

Supermicro already announced such 8-way machines. See their QPI topology
on page 30 of the motherboard manual available at
http://www.supermicro.com/products/motherboard/Xeon7000/7500/X8OBN-F.cfm

Brice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cap at nsc.liu.se  Wed Apr  6 11:41:18 2011
From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=)
Date: Wed, 6 Apr 2011 17:41:18 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <201104061741.18972.cap@nsc.liu.se>

On Wednesday, April 06, 2011 12:58:12 pm Hearns, John wrote:
> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
> 
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way topology?
> 
> 
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

The HP DL980 is an 8 socket EX box but it's not glue-less (it uses HPs own 
numa interconnect). If you stuff 'em full of dimms then they're probably 
competitive with the 4 socket 580 (assuming the 980 uses 8G dimms instead of 
16G for the 580...).

/Peter
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Wed Apr  6 14:00:17 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 6 Apr 2011 20:00:17 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>


On Apr 6, 2011, at 12:58 PM, Hearns, John wrote:

> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way  
> topology?

What do you intend to use the machines for?
For a chessprogram they would be great, but none of those guys has  
the cash to pay for these
machines.

For financial world it would be a waste of money as well as the  
latency probably will be very very bad.

They seem to get equipped with a max of 512GB ram, not really much  
for those who badly need a lot of RAM,
if we consider the price of such a configured machine.

Same price like a power7.

>
>
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
>
> T:  +44 (0) 1483 261000
> D:  +44 (0) 1483 262352
> F:  +44 (0) 1483 261010
> E:  john.hearns at mclaren.com
> W:  www.mclaren.com
>
>
>
>
> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Wed Apr  6 14:12:56 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Wed, 6 Apr 2011 19:12:56 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
Message-ID: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> What do you intend to use the machines for?
> For a chessprogram they would be great, but none of those guys has
> the cash to pay for these
> machines.


The Supermicro board which Bruce Goglin refers to is said to support
16gbytes DIMMS.

Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
1024 Gbyte machine,
plus you can cook your dinner on it.

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Wed Apr  6 14:18:35 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Thu, 07 Apr 2011 01:18:35 +0700
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
	<207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9CAE7B.8000900@pathscale.com>

Hearns, John wrote:
>> What do you intend to use the machines for?
>> For a chessprogram they would be great, but none of those guys has
>> the cash to pay for these
>> machines.
>>     
>
>
>
> The Supermicro board which Bruce Goglin refers to is said to support
> 16gbytes DIMMS.
>
> Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
> 1024 Gbyte machine,
> plus you can cook your dinner on it.
>   
LOL.. (I have to admit that's kinda funny, but only because it's true)

I didn't look at the specs, but I wonder how many IOPS you could get off 
a ram disk on that thing.. $60k is I believe (I could be wrong) in the 
same ballpark as 1T 1U 1million IOPS appliances (albeit they offer 
persistence and probably consume less power as well)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hearnsj at googlemail.com  Wed Apr  6 18:02:47 2011
From: hearnsj at googlemail.com (John Hearns)
Date: Wed, 6 Apr 2011 23:02:47 +0100
Subject: [Beowulf] Westmere EX
In-Reply-To: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
Message-ID: <BANLkTimxRTWgYzmPb-2EpXhv-F7BJt=ryA@mail.gmail.com>

On 6 April 2011 19:00, Vincent Diepeveen <diep at xs4all.nl> wrote:
>
> On Apr 6, 2011, at 12:58 PM, Hearns, John wrote:
>
>> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/

> What do you intend to use the machines for?

Maybe something like:
http://www.youtube.com/watch?v=x2Z3h_Hx310&NR=1
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Wed Apr  6 20:39:19 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 6 Apr 2011 20:39:19 -0400 (EDT)
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>

> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.

shrug.  does anyone have serious experience with real apps on manycore
machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
but they're substantially more exotic/rare/expensive.)

I bet there will be 100x more 4s servers build with these chips than 8s. 
and 1000x more 2s than 4s...

a friend noticed something weird on intel's spec sheets:
http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E

notice it says 32GB max memory size.  even if that means 32GB/socket,
it's not all that much.

I don't know about everyone else, but I'm already bored with core counts ;)
these also seem fairly warm (130W), considering that they're the fancy
new 32nm process and run at modest clock rates...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From joshua_mora at usa.net  Wed Apr  6 20:57:43 2011
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Wed, 06 Apr 2011 19:57:43 -0500
Subject: [Beowulf] Westmere EX
Message-ID: <093PDga5r8464S02.1302137863@web02.cms.usa.net>

_3D_ FFT scaling will allow you to see how well balanced is the system.

Joshua
------ Original Message ------
Received: 07:40 PM CDT, 04/06/2011
From: Mark Hahn <hahn at mcmaster.ca>
To: Beowulf Mailing List <beowulf at beowulf.org>
Subject: Re: [Beowulf] Westmere EX

> > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
> >
> > 10 core Westmere EX on an eight socket box = 80 cores
> > These would be a very nice machine.
> 
> shrug.  does anyone have serious experience with real apps on manycore
> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> but they're substantially more exotic/rare/expensive.)
> 
> I bet there will be 100x more 4s servers build with these chips than 8s. 
> and 1000x more 2s than 4s...
> 
> a friend noticed something weird on intel's spec sheets:
>
http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E
> 
> notice it says 32GB max memory size.  even if that means 32GB/socket,
> it's not all that much.
> 
> I don't know about everyone else, but I'm already bored with core counts ;)
> these also seem fairly warm (130W), considering that they're the fancy
> new 32nm process and run at modest clock rates...
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From jlforrest at berkeley.edu  Wed Apr  6 22:15:17 2011
From: jlforrest at berkeley.edu (Jon Forrest)
Date: Wed, 06 Apr 2011 19:15:17 -0700
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <4D9D1E35.9040802@berkeley.edu>

On 4/6/2011 5:39 PM, Mark Hahn wrote:

> shrug.  does anyone have serious experience with real apps on manycore
> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> but they're substantially more exotic/rare/expensive.)

I have a couple 48-core 1U boxes. They can build
gcc and other large packages very quickly.

The scientists who run single process simulations
also like them but they're not real picky about
how long it takes for something to run. They also
generally spend close to no time at all optimizing
anything.

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 04:43:06 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 09:43:06 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
Message-ID: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> On 4/6/2011 5:39 PM, Mark Hahn wrote:
> 
> > shrug.  does anyone have serious experience with real apps on
> manycore
> > machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> > but they're substantially more exotic/rare/expensive.)
> 
> I have a couple 48-core 1U boxes. They can build
> gcc and other large packages very quickly.
> 
> The scientists who run single process simulations
> also like them but they're not real picky about
> how long it takes for something to run. They also
> generally spend close to no time at all optimizing
> anything.

"Premature optimization is the root of all evil"  - Donald Knuth


I'm also interested in the response to Mark Hahn's question - I guess
that's why I started this thread really!

Also as I've said before, with the advent of affordable manycore systems
like this, we're going
to have to dust off those old skills practised in the age of SMP monster
machines - which were probably
something like the same specs as these affordable systems!

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From eugen at leitl.org  Thu Apr  7 04:56:33 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 7 Apr 2011 10:56:33 +0200
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
Message-ID: <20110407085633.GE23560@leitl.org>


http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 

10,000-core Linux supercomputer built in Amazon cloud

Cycle Computing builds cloud-based supercomputing cluster to boost scientific
research.

By Jon Brodkin, Network World

April 06, 2011 03:15 PM ET

High-performance computing expert Jason Stowe recently asked two of his
engineers a simple question: Can you build a 10,000-core cluster in the
cloud?

"It's a really nice round number," says Stowe, the CEO and founder of Cycle
Computing, a vendor that helps customers gain fast and efficient access to
the kind of supercomputing power usually reserved for universities and large
research organizations.

SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to
Linux

To continue reading, register here to become an Insider. You'll get free
access to premium content from CIO, Computerworld, CSO, InfoWorld, and
Network World. See more Insider content or sign in.

High-performance computing expert Jason Stowe recently asked two of his
engineers a simple question: Can you build a 10,000-core cluster in the
cloud?

"It's a really nice round number," says Stowe, the CEO and founder of Cycle
Computing, a vendor that helps customers gain fast and efficient access to
the kind of supercomputing power usually reserved for universities and large
research organizations.

SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to
Linux

Cycle Computing had already built a few clusters on Amazon's Elastic Compute
Cloud that scaled up to several thousand cores. But Stowe wanted to take it
to the next level. Provisioning 10,000 cores on Amazon has probably been done
numerous times, but Stowe says he's not aware of anyone else achieving that
number in an HPC cluster, meaning one that uses a batch scheduling technology
and runs an HPC-optimized application.

"We haven't found references to anything larger," Stowe says. Had it been
tested for speed, the Linux-based cluster Stowe ran on Amazon might have been
big enough to make the Top 500 list of the world's fastest supercomputers.

One of the first steps was finding a customer that would benefit from such a
large cluster. There's no sense in spinning up such a large environment
unless it's devoted to some real work.

The customer that opted for the 10,000-core cloud cluster was biotech company
Genentech in San Francisco, where scientist Jacob Corn needed computing power
to examine how proteins bind to each other, in research that might eventually
lead to medical treatments. Compared to the 10,000-core cluster, "we're a
tenth the size internally," Corn says.

Cycle Computing and Genentech spun up the cluster on March 1 a little after
midnight, based on Amazon's advice regarding the optimal time to request
10,000 cores. While Amazon offers virtual machine instances optimized for
high-performance computing, Cycle and Genentech instead opted for a "standard
vanilla CentOS" Linux cluster to save money, according to Stowe. CentOS is a
version of Linux based on Red Hat's Linux.

The 10,000 cores were composed of 1,250 instances with eight cores each, as
well as 8.75TB of RAM and 2PB disk space. Scaling up a couple of thousand
cores at a time, it took 45 minutes to provision the whole cluster. There
were no problems. "When we requested the 10,000th core, we got it," Stowe
said.

The cluster ran for eight hours at a cost of $8,500, including all the fees
to Amazon and Cycle Computing. (See also: Start-up transforms unused desktop
cycles into fast server clusters)

For Genentech, this was cheap and easy compared to the alternative of buying
10,000 cores for its own data center and having them idle away with no work
for most of their lives, Corn says. Using Genentech's existing resources to
perform the simulations would take weeks or months instead of the eight hours
it took on Amazon, he says. Genentech benefited from the high number of cores
because its calculations were "embarrassingly parallel," with no
communication between nodes, so performance stats "scaled linearly with the
number of cores," Corn said.

To provision the cluster, Cycle used its own CycleCloud software, the Condor
scheduling system and Chef, an open source configuration management
framework.

Cycle also used some of its own software to detect errors and restart nodes
when necessary, a shared file system, and a few extra nodes on top of the
10,000 to handle some of the legwork. To ensure security, the cluster was
engineered with secure-HTTP and 128/256-bit Advanced Encryption Standard
encryption, according to Cycle.

Cycle Computing boasted that the cluster was roughly equivalent to the 114th
fastest supercomputer in the world on the Top 500 list, which hit about 66
teraflops. In reality, they didn't run the speed benchmark required to submit
a cluster to the Top 500 list, but nearly all of the systems listed below No.
114 in the ranking contain fewer than 10,000 cores.

Genentech is still waiting to see whether the simulations lead to anything
useful in the real world, but Corn says the data "looks fantastic." He says
Genentech is "very open" to building out more Amazon clusters, and Cycle
Computing is looking ahead as well.

"We're already working on scaling up larger," Stowe says. All Cycle needs is
a customer with "a use case to take advantage of it."

Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin

Read more about data center in Network World's Data Center section. 
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr  7 08:47:54 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 14:47:54 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
	<207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <33396FD5-FBAA-4735-8694-B0D7FE7EAA84@xs4all.nl>


On Apr 6, 2011, at 8:12 PM, Hearns, John wrote:

>> What do you intend to use the machines for?
>> For a chessprogram they would be great, but none of those guys has
>> the cash to pay for these
>> machines.
>
>
>
> The Supermicro board which Bruce Goglin refers to is said to support
> 16gbytes DIMMS.
>
> Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
> 1024 Gbyte machine,
> plus you can cook your dinner on it.
>

Except that you can't buy the machine equipped with that for $60k in  
a shop.

512GB equipped 8 socket nehalem-ex (8 core version 2.26Ghz) was  
introduced at $205k,
that's without further equipment such as huge storage, so basic  
configuration when ordered at Oracle.

So this box will probably be $250k or $300k or so?

Regards,
Vincent

> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr  7 08:52:43 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 14:52:43 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>


On Apr 7, 2011, at 10:43 AM, Hearns, John wrote:

>>
>> On 4/6/2011 5:39 PM, Mark Hahn wrote:
>>
>>> shrug.  does anyone have serious experience with real apps on
>> manycore
>>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>>> but they're substantially more exotic/rare/expensive.)
>>
>> I have a couple 48-core 1U boxes. They can build
>> gcc and other large packages very quickly.
>>
>> The scientists who run single process simulations
>> also like them but they're not real picky about
>> how long it takes for something to run. They also
>> generally spend close to no time at all optimizing
>> anything.
>
> "Premature optimization is the root of all evil"  - Donald Knuth
>
>
> I'm also interested in the response to Mark Hahn's question - I guess
> that's why I started this thread really!
>
> Also as I've said before, with the advent of affordable manycore  
> systems
> like this, we're going


> to have to dust off those old skills practised in the age of SMP  
> monster
> machines - which were probably
> something like the same specs as these affordable systems!
>

it's not clear what 'these' refers to.

48 core AMD multicore machine: $8000 on ebay i saw one for. Of course  
not much of a RAM and not fastest chip.
Let's say fully configured about double that price.

GPU monster box, which is basically a few videocards inside such a  
box stacked up a tad, wil only add a couple of
thousands.

But a 8 socket @ 10 core nehalem-ex, in basic configuration will be  
already far above $205k. Probably a $300k or
so when configured.

Huge price difference.

So i assume you didn't refer to the Nehalem-ex box.

> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 09:49:09 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 09:49:09 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <4D9D1E35.9040802@berkeley.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
Message-ID: <4D9DC0D5.8060802@ias.edu>

On 04/06/2011 10:15 PM, Jon Forrest wrote:
> On 4/6/2011 5:39 PM, Mark Hahn wrote:
> 
>> shrug.  does anyone have serious experience with real apps on manycore
>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>> but they're substantially more exotic/rare/expensive.)
> 
> I have a couple 48-core 1U boxes. They can build
> gcc and other large packages very quickly.

But are the makes definitely running in parallel to take advantage of
the multiple cores? I haven't built gcc, so don't know if it uses make's
-j option to do parallel builds.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:03:47 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:03:47 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <20110407085633.GE23560@leitl.org>
References: <20110407085633.GE23560@leitl.org>
Message-ID: <4D9DC443.9080502@ias.edu>

On 04/07/2011 04:56 AM, Eugen Leitl wrote:
> 
> "It's a really nice round number," says Stowe, the CEO and founder of Cycle
> Computing,

Clearly he's a marketing man. Everyone know real computer guys think in
powers of 2.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:16:53 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:16:53 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <20110407085633.GE23560@leitl.org>
References: <20110407085633.GE23560@leitl.org>
Message-ID: <4D9DC755.5070004@ias.edu>

A great publicity stunt, but I still don't think it qualifies as a
"real" HPC cluster achievement.  See comments/objections in-line below.

On 04/07/2011 04:56 AM, Eugen Leitl wrote:
> 
> http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 
> 
> The cluster ran for eight hours

That's not very long for HPC jobs. How much would the performance have
degraded if it started to run into the daytime hours, when demand for
CPU cycles  in EC2 would be at their peak?

> Genentech benefited from the high number of cores
> because its calculations were "embarrassingly parallel," with no
> communication between nodes, so performance stats "scaled linearly with the
> number of cores," Corn said.
> 

So it wasn't really a cluster at all, but a giant batch scheduling system.

I probably have a stricter sense of what makes a cluster than some
others, so let's not argue on the the definition of cluster and split
hairs. In my book, a cluster involves parallel communication between the
processes using MPI, PVM or some other parallel communications paradigm.

And BTW, my comments are not directed Eugene for posting this. Just
starting a general discussion on this article...

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:21:19 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:21:19 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
Message-ID: <4D9DC85F.9080503@ias.edu>

Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

In all these commercials, the protagonists say "to the cloud" for their
solution, but then when they show them using Microsoft Windows to access
"the cloud", they're not using the cloud at all.

In fact, in one commercial, the one where the wife/mother is fixing the
family portrait, she's using a photoshop-like program on her own
desktop, not even the Internet is needed.

Not only do they use the term "cloud" incorrectly, they don't even show
how using Microsoft products give you and advantage for using "the cloud"

AAAAAAARRRRRRRGGGH!

Okay. Venting over. Whew! I feel better already.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 10:27:27 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 15:27:27 +0100
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu>
Message-ID: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I
am?
> 
> 
In London there is a saturation of Microsoft Cloud advert posters in the
mainline stations
and Tube lines serving the City (the financial district) and Canary
Wharf.

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:40:28 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:40:28 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <4D9DC85F.9080503@ias.edu>
	<207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9DCCDC.1080607@ias.edu>


On 04/07/2011 10:27 AM, Hearns, John wrote:
>>
>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I
> am?
>>
>>
> In London there is a saturation of Microsoft Cloud advert posters in the
> mainline stations
> and Tube lines serving the City (the financial district) and Canary
> Wharf.
> 

But do they annoy you? ;)

For those of you outside the US, here's the commercials I'm referring to:

1. http://youtu.be/-HRrbLA7rss

2. http://youtu.be/mjtqoQE_ezA

3. http://youtu.be/_lu6v6hE_bA

4. http://youtu.be/Lel3swo4RMc

Out of these only (1) could possibly be using the cloud, if they're
using Google docs or something similar to create and share their documents.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From scrusan at UR.Rochester.edu  Thu Apr  7 10:49:09 2011
From: scrusan at UR.Rochester.edu (Crusan, Steve)
Date: Thu, 7 Apr 2011 10:49:09 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu>
Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>

Windows HPC Server 2008 also has a builtin feature for an end user to submit excel docs to a windows cluster to do intense timesheet and office supplies calculations...

----------------------
Steve Crusan
System Administrator
Center for Research Computing


-----Original Message-----
From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
Sent: Thu 4/7/2011 10:21 AM
To: Beowulf Mailing List
Subject: [Beowulf] Microsoft "cloud" commercials.
 
Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

In all these commercials, the protagonists say "to the cloud" for their
solution, but then when they show them using Microsoft Windows to access
"the cloud", they're not using the cloud at all.

In fact, in one commercial, the one where the wife/mother is fixing the
family portrait, she's using a photoshop-like program on her own
desktop, not even the Internet is needed.

Not only do they use the term "cloud" incorrectly, they don't even show
how using Microsoft products give you and advantage for using "the cloud"

AAAAAAARRRRRRRGGGH!

Okay. Venting over. Whew! I feel better already.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/b92c4881/attachment.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From dag at sonsorol.org  Thu Apr  7 11:03:25 2011
From: dag at sonsorol.org (Chris Dagdigian)
Date: Thu, 07 Apr 2011 11:03:25 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DC755.5070004@ias.edu>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
Message-ID: <4D9DD23D.8090908@sonsorol.org>


The CycleComputing folks are good people in my book and I bet more than 
a few are subscribed to this list. The founders are old-school Condor 
gurus with a long track record in this field.

One of the nice things about their work is how "usable" it is to real 
people with real production computing requirements - in the IAAS cloud 
space there are way too many marketing robots talking vague BS about 
"cloud bursting", "hybrid clusters" and storage aggregation/access 
across LAN/WAN distances. Cycle has built, deployed & delivered all of 
this with (what I'd consider) a bare minimum of marketing and chest 
thumping.

It's not a PR gimmick and limiting the definition of "cluster" to only 
systems that run parallel applications would alienate quite a few of us 
on this list :) In the life sciences a typical cluster might run a 
mixture of 80-90% serial jobs with a small scattering of real MPI apps 
running alongside.

I get cynical about this stuff because in the cloud space you see way 
too many commercial people promising the world without actually 
delivering anything (other than carefully hand-managed reference account 
projects) while the academic & supercomputing folks are all busy 
presenting and bragging about things that will never see the light of 
day after their thesis defense.

There are people like Cycle/Rightscale etc. etc. who actually rise above 
the hype and deliver clever & usable stuff with a minimum of marketing BS.

My $.02 of course

-Chris


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:05:53 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:05:53 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
References: <4D9DC85F.9080503@ias.edu>
	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
Message-ID: <4D9DD2D1.9070309@ias.edu>

"Cluster" != "Cloud"

The Cloud, by definition requires the Internet. Clusters do not. In
fact, I bet the NSA can show you many clusters that are not connect to
the Internet at all.

While I'm at it, "Grid" != ("Cluster" || "Cloud") either!


On 04/07/2011 10:49 AM, Crusan, Steve wrote:
> Windows HPC Server 2008 also has a builtin feature for an end user to
> submit excel docs to a windows cluster to do intense timesheet and
> office supplies calculations...
> 
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 10:21 AM
> To: Beowulf Mailing List
> Subject: [Beowulf] Microsoft "cloud" commercials.
> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
> 
> In all these commercials, the protagonists say "to the cloud" for their
> solution, but then when they show them using Microsoft Windows to access
> "the cloud", they're not using the cloud at all.
> 
> In fact, in one commercial, the one where the wife/mother is fixing the
> family portrait, she's using a photoshop-like program on her own
> desktop, not even the Internet is needed.
> 
> Not only do they use the term "cloud" incorrectly, they don't even show
> how using Microsoft products give you and advantage for using "the cloud"
> 
> AAAAAAARRRRRRRGGGH!
> 
> Okay. Venting over. Whew! I feel better already.
> 
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:13:43 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:13:43 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DD23D.8090908@sonsorol.org>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <4D9DD4A7.7060601@ias.edu>

On 04/07/2011 11:03 AM, Chris Dagdigian wrote:
> 
> The CycleComputing folks are good people in my book and I bet more than 
> a few are subscribed to this list. The founders are old-school Condor 
> gurus with a long track record in this field.
> 
> One of the nice things about their work is how "usable" it is to real 
> people with real production computing requirements - in the IAAS cloud 
> space there are way too many marketing robots talking vague BS about 
> "cloud bursting", "hybrid clusters" and storage aggregation/access 
> across LAN/WAN distances. Cycle has built, deployed & delivered all of 
> this with (what I'd consider) a bare minimum of marketing and chest 
> thumping.
> 
> It's not a PR gimmick and limiting the definition of "cluster" to only 
> systems that run parallel applications would alienate quite a few of us 
> on this list :) In the life sciences a typical cluster might run a 
> mixture of 80-90% serial jobs with a small scattering of real MPI apps 
> running alongside.

Do not confuse "scientific computing" or "high performance computing"
with "cluster". All terms are definitely related, but you can do
scientific/high-perfomance computing without a "cluster."

As someone who also works in life sciences, I know that there are a lot
of life science tasks that are embarrassingly parallel. Running these
tasks on a bunch of different machines simultaneously is definitely
scientific and high performance computing, but it doesn't necessarily
require a cluster. Folding at home, for example.

> 
> I get cynical about this stuff because in the cloud space you see way 
> too many commercial people promising the world without actually 
> delivering anything (other than carefully hand-managed reference account 
> projects) while the academic & supercomputing folks are all busy 
> presenting and bragging about things that will never see the light of 
> day after their thesis defense.

Me, too, which is why I started ranting about Microsoft's cloud
commercials in a separate thread. ;) It's also why I'm starting to get
picky about how the term "cluster" is used. More and more, I see people
confusing "cloud" with "cluster". I guess that cynicism is what caused
me to reply to the original post.

> 
> There are people like Cycle/Rightscale etc. etc. who actually rise above 
> the hype and deliver clever & usable stuff with a minimum of marketing BS.
> 
> My $.02 of course
> 
> -Chris
> 
> 
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From scrusan at UR.Rochester.edu  Thu Apr  7 11:13:32 2011
From: scrusan at UR.Rochester.edu (Crusan, Steve)
Date: Thu, 7 Apr 2011 11:13:32 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
	<4D9DD2D1.9070309@ias.edu>
Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>

Oh I understand the difference, but I thought I'd take this opportunity to bash MS.

But, since MS's cloud runs off of MS Azure and MS Server 2008, I would bet the excel functionality would be possible.

----------------------
Steve Crusan
System Administrator
Center for Research Computing


-----Original Message-----
From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
Sent: Thu 4/7/2011 11:05 AM
Cc: Beowulf Mailing List
Subject: Re: [Beowulf] Microsoft "cloud" commercials.
 
"Cluster" != "Cloud"

The Cloud, by definition requires the Internet. Clusters do not. In
fact, I bet the NSA can show you many clusters that are not connect to
the Internet at all.

While I'm at it, "Grid" != ("Cluster" || "Cloud") either!


On 04/07/2011 10:49 AM, Crusan, Steve wrote:
> Windows HPC Server 2008 also has a builtin feature for an end user to
> submit excel docs to a windows cluster to do intense timesheet and
> office supplies calculations...
> 
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 10:21 AM
> To: Beowulf Mailing List
> Subject: [Beowulf] Microsoft "cloud" commercials.
> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
> 
> In all these commercials, the protagonists say "to the cloud" for their
> solution, but then when they show them using Microsoft Windows to access
> "the cloud", they're not using the cloud at all.
> 
> In fact, in one commercial, the one where the wife/mother is fixing the
> family portrait, she's using a photoshop-like program on her own
> desktop, not even the Internet is needed.
> 
> Not only do they use the term "cloud" incorrectly, they don't even show
> how using Microsoft products give you and advantage for using "the cloud"
> 
> AAAAAAARRRRRRRGGGH!
> 
> Okay. Venting over. Whew! I feel better already.
> 
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/bed316f8/attachment.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at mclaren.com  Thu Apr  7 11:13:24 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 16:13:24 +0100
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> 
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <207BB2F60743C34496BE41039233A809042454EA@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> There are people like Cycle/Rightscale etc. etc. who actually rise
> above
> the hype and deliver clever & usable stuff with a minimum of marketing
> BS.
> 
> My $.02 of course

Surely your $.02 per cpu per minute?

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:15:58 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:15:58 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
	<4D9DD2D1.9070309@ias.edu>
	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
Message-ID: <4D9DD52E.8040103@ias.edu>

Oh, sorry. I missed the sarcasm. I thought you were defending MS.

The "office supplies calculations" should have tripped my sarcasm
detector immediately!

Sorry. I'm in a rare (and ranting!) mood today. Must be time for a vacation.

Prentice


On 04/07/2011 11:13 AM, Crusan, Steve wrote:
> Oh I understand the difference, but I thought I'd take this opportunity
> to bash MS.
> 
> But, since MS's cloud runs off of MS Azure and MS Server 2008, I would
> bet the excel functionality would be possible.
> 
> ----------------------
> Steve Crusan
> System Administrator"Crusan, Steve" <scrusan at UR.Rochester.edu>
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 11:05 AM
> Cc: Beowulf Mailing List
> Subject: Re: [Beowulf] Microsoft "cloud" commercials.
> 
> "Cluster" != "Cloud"
> 
> The Cloud, by definition requires the Internet. Clusters do not. In
> fact, I bet the NSA can show you many clusters that are not connect to
> the Internet at all.
> 
> While I'm at it, "Grid" != ("Cluster" || "Cloud") either!
> 
> 
> On 04/07/2011 10:49 AM, Crusan, Steve wrote:
>> Windows HPC Server 2008 also has a builtin feature for an end user to
>> submit excel docs to a windows cluster to do intense timesheet and
>> office supplies calculations...
>>
>> ----------------------
>> Steve Crusan
>> System Administrator
>> Center for Research Computing
>>
>>
>>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
>> Sent: Thu 4/7/2011 10:21 AM
>> To: Beowulf Mailing List
>> Subject: [Beowulf] Microsoft "cloud" commercials.
>>
>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
>>
>> In all these commercials, the protagonists say "to the cloud" for their
>> solution, but then when they show them using Microsoft Windows to access
>> "the cloud", they're not using the cloud at all.
>>
>> In fact, in one commercial, the one where the wife/mother is fixing the
>> family portrait, she's using a photoshop-like program on her own
>> desktop, not even the Internet is needed.
>>
>> Not only do they use the term "cloud" incorrectly, they don't even show
>> how using Microsoft products give you and advantage for using "the cloud"
>>
>> AAAAAAARRRRRRRGGGH!
>>
>> Okay. Venting over. Whew! I feel better already.
>>
>> --
>> Prentice
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 11:35:24 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 11:35:24 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <4D9DC0D5.8060802@ias.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>
	<4D9DC0D5.8060802@ias.edu>
Message-ID: <4D9DD9BC.3090102@runnersroll.com>

On 04/07/11 09:49, Prentice Bisbal wrote:
> On 04/06/2011 10:15 PM, Jon Forrest wrote:
>> On 4/6/2011 5:39 PM, Mark Hahn wrote:
>>
>>> shrug.  does anyone have serious experience with real apps on manycore
>>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>>> but they're substantially more exotic/rare/expensive.)
>>
>> I have a couple 48-core 1U boxes. They can build
>> gcc and other large packages very quickly.
> 
> But are the makes definitely running in parallel to take advantage of
> the multiple cores? I haven't built gcc, so don't know if it uses make's
> -j option to do parallel builds.
> 

Yes, see:
http://gcc.gnu.org/install/build.html

In general I see quite nice speedups on my four-core machine at home
running Gentoo, but I find running -j > cores up to 2xcores tends to
produce better results as many packages (especially with recursive
makes) tend to mix configuration (low cpu usage) with makes (high cpu
usage).  The gentoo handbook itself suggests cores+1 for the -j
parameter.  Higher than core -j counts is purely a heuristic, and a few
packages will degrade a bit because in fact 8 (2x4cores) processes are
spawned, each contending heavily for the 4 cores and context switching
starts to slow things down and hurt locality.  Once again, I suppose
this is a YMMV situation.  It would be cool to hack make to dynamically
throttle parallelization based on cpu usage within some given bounds...

I have access to a 48 core box, so if I get a chance I'll generate a
graph for the list on gcc build times by -j count.  Note however that I
don't have root access so I can't clear caches, which should be taken
into account when examining results.

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 11:42:18 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 11:42:18 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <4D9DD52E.8040103@ias.edu>
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>	<4D9DD2D1.9070309@ias.edu>	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
	<4D9DD52E.8040103@ias.edu>
Message-ID: <4D9DDB5A.3030700@runnersroll.com>

>>> -----Original Message-----
>>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
>>> Sent: Thu 4/7/2011 10:21 AM
>>> To: Beowulf Mailing List
>>> Subject: [Beowulf] Microsoft "cloud" commercials.
>>>
>>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

I completely agree.  It's a darn shame all those Truth campaigns
concentrate on drugs - clarifying popular media is a desperately needed
service for so many domains (at least in US media).

Although I have to admit I'm not sure if the cloud misnomer or the
disgusting family dynamics of the Photoshop commercial are more
bothering to me.  Always the dopey dad with these commercials...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 11:53:31 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 16:53:31 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu> 
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
Message-ID: <207BB2F60743C34496BE41039233A8090424562F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>


> -----Original Message-----
> From: Vincent Diepeveen [mailto:diep at xs4all.nl]
> Sent: 07 April 2011 13:53


> 
> But a 8 socket @ 10 core nehalem-ex, in basic configuration will be
> already far above $205k. Probably a $300k or
> so when configured.
> 
> Huge price difference.
> 
> So i assume you didn't refer to the Nehalem-ex box.

I was referring to the Nehalem.


http://www.lasystems.be/Supermicro/SYS-5086B-TRF/Superserver5086B-TRF8-W
ay/product/248987.html


Add 8 CPUs at $4000 per cpu,
and 64 DIMMs at $944 per DIMM


The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 12:11:13 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 12:11:13 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DD23D.8090908@sonsorol.org>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <4D9DE221.2040806@runnersroll.com>

On 04/07/11 11:03, Chris Dagdigian wrote:
> One of the nice things about their work is how "usable" it is to real 
> people with real production computing requirements - in the IAAS cloud 

I wonder what "real" people with "real" production computing
requirements means here.  See below for further thoughts on my thoughts
on "real" codes and where I suspect they arise.

> It's not a PR gimmick and limiting the definition of "cluster" to only 
> systems that run parallel applications would alienate quite a few of us 
> on this list :) In the life sciences a typical cluster might run a 
> mixture of 80-90% serial jobs with a small scattering of real MPI apps 
> running alongside.

I'm certainly a pragmatist here - use the machines as your organization
feels is best.  However I still have a strong suspicion that most jobs
are serial because of:

1. Lack of experience properly parallelizing codes
2. Lack of proper environment on one's own desktop (i.e. Linux or group
licenses)
3. In rare cases such rapid development and short lifetime of a code
that parallelizing it will take longer than poorly serially coding it
and tolerating the run-times.

I can only hope that within the decade the programming paradigm shifts
along with the hardware and the average bloke becomes at least exposed
to basic parallel programming concepts.

The machine is still a "cluster" - the way it's used shouldn't guide
what it is referred to.  That doesn't mean running serial jobs on a
machine tailored for parallel ones is the best way to use your
time/money.  Probably better for one to simply buy Linux desktops for
all the employees, put them on a typical GigE network and have the
employees submit jobs to some tiny server in the Bosses office which
routes jobs evenly to everyone's machine distributed throughout the
building.

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 12:25:35 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 12:25:35 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
Message-ID: <4D9DE57F.4040303@ldeo.columbia.edu>

Vincent Diepeveen wrote:

> GPU monster box, which is basically a few videocards inside such a  
> box stacked up a tad, wil only add a couple of
> thousands.
> 

This price may be OK for the videocard-class GPUs,
but sounds underestimated, at least for Fermi Tesla.
Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
with 448 cores and 3GB RAM per GPU, cost around $10k.
For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
If you care about ECC, that's the price you pay, right?

Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cap at nsc.liu.se  Thu Apr  7 13:26:51 2011
From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=)
Date: Thu, 7 Apr 2011 19:26:51 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <201104071926.56911.cap@nsc.liu.se>

On Thursday, April 07, 2011 02:39:19 am Mark Hahn wrote:
...
> I bet there will be 100x more 4s servers build with these chips than 8s.
> and 1000x more 2s than 4s...

Sounds about right :-) Not your average compute node by a long shot.

> a friend noticed something weird on intel's spec sheets:
> http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC
> 3E
> 
> notice it says 32GB max memory size.  even if that means 32GB/socket,
> it's not all that much.

Certainly looks odd on that page but does likely refer to max DIMM size. With 
64 DIMMs (4 socket example) that would then give you 2T.
 
> I don't know about everyone else, but I'm already bored with core counts ;)
> these also seem fairly warm (130W), considering that they're the fancy
> new 32nm process and run at modest clock rates...

It's the size of the beast... (caused by the number of cores and size of last 
level cache).

/Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/b4bb76fd/attachment.sig>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Thu Apr  7 15:26:57 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 21:26:57 +0200
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <4D9DE57F.4040303@ldeo.columbia.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
Message-ID: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>


On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:

> Vincent Diepeveen wrote:
>
>> GPU monster box, which is basically a few videocards inside such a
>> box stacked up a tad, wil only add a couple of
>> thousands.
>>
>
> This price may be OK for the videocard-class GPUs,
> but sounds underestimated, at least for Fermi Tesla.

Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
note there is a 6 GB version, not aware of price will be $$$$ i bet.
or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro

VERSUS

8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.

Factor 100 difference to those cards.

A couple of thousands versus a couple of hundreds of thousands.
Hope i made my point clear.


> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
> with 448 cores and 3GB RAM per GPU, cost around $10k.
> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
> If you care about ECC, that's the price you pay, right?

When fermi released it was a great gpu.

Regrettably they lobotomized the gamers card's double precision as i  
understand,
So it hardly has double precision capabilities; if you go for nvidia  
you sure need a Tesla,
no question about it.

As a company i would buy in 6990's though, they're a lot cheaper and  
roughly 3x faster
than the Nvidia's (for some more than 3x for other occassions less  
than 3x, note the card
has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).

3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for  
AMD
versus 448 cores nvidia with 448 execution units of 32 bits  
multiplication.

Especially because multiplication has improved a lot.

Already having written CUDA code some while ago, i wanted the cheap  
gamers card with big
horse power now at home so  i'm toying on a 6970 now so will be able  
to report to you what is possible to
achieve at that card with respect to prime numbers and such.

I'm a bit amazed so little public initiatives write code for the AMD  
gpu's.

Note that DDR5 ram doesn't have ECC by default, but has in case of  
AMD a CRC calculation
(if i understand it correctly). It's a bit more primitive than ECC,  
but works pretty ok and shows you
also when problems occured there, so figuring out remove what goes on  
is possible.

Make no mistake that this isn't ECC.
We know some HPC centers have as a hard requirement ECC, only nvidia  
is an alternative then.

In earlier posts from some time ago and some years ago i already  
wrote on that governments should
adapt more to how hardware develops rather than demand that hardware  
has to follow them.

HPC has too little cash to demand that from industry.

OpenCL i cannot advice at this moment (for a number of reasons).

AMD-CAL and CUDA are somewhat similar. Sure there is differences, but  
majority of codes are possible
to port quite well (there is exceptions), or easy work arounds.

Any company doing gpgpu i would advice developing both branches of  
code at the same time,
as that gives the company a lot of extra choices for really very  
little extra work. Maybe 1 coder,
and it always allows you to have the fastest setup run your  
production code.

That said we can safely expect that from raw performance coming years  
AMD will keep the leading edge
from crunching viewpoint. Elsewhere i pointed out why.

Even then i'd never bet at just 1 manufacturer. Go for both  
considering the cheap price of it.

For a lot of HPC centers the choice of nvidia will be an easy one, as  
the price of the Fermi cards
is peanuts compared to the price rest of the system and considering  
other demands that's what they'll go for.

That might change once you stick in bunches of videocards in nodes.

Please note that the gpu 'streamcores' or PE's whatever name you want  
to give them, are so bloody fast,
that your code has to work within the PE's themselves and hardly use  
the RAM.

Both for Nvidia as well as AMD, the streamcores are so fast, that you  
simply don't want to lose time on the RAM
when your software runs, let alone that you want to use huge RAM.

Add to that, that nvidia (have to still figure out for AMD) can in  
background stream from and to the gpu's RAM
from the CPU, so if you do really large calculations involving many  
nodes,
all that shouldn't be an issue in the first place.

So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that  
would really amaze me, though i'm sure
there is cases where that happens. If we see however what was ordered  
it mostly is the 3GB Tesla's,
at least on what has been reported, i have no global statistics on  
that...

Now all choices are valid there, but even then we speak about peanuts  
money compared to the price of
a single 8 socket Nehalem-ex box, which fully configured will be  
maybe $300k-$400k or something?

Whereas a set of 4x nvidia will be probably under $15k and 4x AMD  
6990 is 2000 euro.

There won't be 2 gpu nvidia's any soon because of the choice they  
have historically made for the memory controllers.
See explanation of intel fanboy David Kanter for that at  
realworldtech in a special article he wrote there.

Please note i'm not judging AMD nor Nvidia, they have made their  
choices based upon totally different
businessmodels i suspect and we must be happy we have this rich  
choice right now between cpu's from different
manufacturers and gpu's from different manufacturers.

Nvidia really seems to aim at supercomputers, giving their tesla line  
without lobotomization and lobotomizing their
gamers cards, where AMD aims at gamers and their gamercards have full  
functionality
without lobotomization.

Total different businessmodels. Both have their advantages and  
disadvantages.

 From pure performance viewpoint it's easy to see what's faster though.

Yet right now i realize all too well that just too many still  
hesitate between also offering gpu services additional to
cpu services, in which case having a gpu, regardless nvidia or amd,  
kicks butt of course from throughput viewpoint.

To be really honest with you guys, i had expected that by 2011 we  
would have a gpu reaching far over 1 Teraflop double precision  
handsdown. If we see that Nvidia delivers somewhere around 515 Gflop  
and AMD has 2 gpu's on a single card to get over that Teraflop double  
precision (claim is 1.27 Teraflop double precision),
that really is underneath my expectations from a few years ago.

Now of course i hope you realize i'm not coding double precision code  
at all; i'm writing everything in integers of 32 bits for the AMD  
card and the Nvidia equivalent also is using 32 bits integers. The  
ideal way to do calculations on those cards, so also very big  
transforms, is using the 32 x 32 == 64 bits instructions (that's 2  
instructions in case of AMD).

Regards,
Vincent


>
> Gus Correa
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 15:44:25 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 15:44:25 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E1419.9000408@ias.edu>

On 04/07/2011 03:26 PM, Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 
> 

You can't do a direct comparison between a CPU and a GPU. There are many
things that GPUs can't do (or can't do well) that are still better done
on a CPU. Even NVidia acknowledges in most of their promotional and
educational literature.

One example would be a code with a lot of branching.

--
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 16:37:46 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 16:37:46 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E209A.1040408@ldeo.columbia.edu>

Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 

Not so much.

In your original message you said:

"GPU monster box, which is basically a few videocards inside such a
box stacked up a tad, wil only add a couple of thousands."

So, first it was a few GPUs on a box (whatever else the box
might have inside) for a couple of thousand (if dollars or euros
you did not specify).

Now you checked out the real prices, and said
that a *single* Fermi Tesla C2070 cost ~$2,200
(just the GPU alone, price in US dollars I suppose),
which is more like the real thing.

However, instead of admitting that your previous numbers were mistaken,
you insist that:

"Hope i made my point clear.".

Is this how you play chess?  :)
Even if your opponent is a computer, he/she/it might get
a bit discouraged.
You always win, even before the game starts.

Anyway, I don't play chess, I am no GPU expert,
I don't know about the lobotomizing of Fermi (I hope you're not talking 
about Enrico, he's dead),
and I don't think we're going anywhere with this discussion.
However, the GPU prices you sent in your original
email to the list were underestimated,
although I am afraid I may not be able to make this point go
across to you.
The prices you sent were too low,
at least when it comes to GPUs with ECC,
which is what is reliable for HPC.

Thank you,
Gus Correa

> 
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> 
> When fermi released it was a great gpu.
> 
> Regrettably they lobotomized the gamers card's double precision as i 
> understand,
> So it hardly has double precision capabilities; if you go for nvidia you 
> sure need a Tesla,
> no question about it.
> 
> As a company i would buy in 6990's though, they're a lot cheaper and 
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less than 
> 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
> 
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD
> versus 448 cores nvidia with 448 execution units of 32 bits multiplication.
> 
> Especially because multiplication has improved a lot.
> 
> Already having written CUDA code some while ago, i wanted the cheap 
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able to 
> report to you what is possible to
> achieve at that card with respect to prime numbers and such.
> 
> I'm a bit amazed so little public initiatives write code for the AMD gpu's.
> 
> Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a 
> CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC, but 
> works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on is 
> possible.
> 
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia is 
> an alternative then.
> 
> In earlier posts from some time ago and some years ago i already wrote 
> on that governments should
> adapt more to how hardware develops rather than demand that hardware has 
> to follow them.
> 
> HPC has too little cash to demand that from industry.
> 
> OpenCL i cannot advice at this moment (for a number of reasons).
> 
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but 
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
> 
> Any company doing gpgpu i would advice developing both branches of code 
> at the same time,
> as that gives the company a lot of extra choices for really very little 
> extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your production 
> code.
> 
> That said we can safely expect that from raw performance coming years 
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
> 
> Even then i'd never bet at just 1 manufacturer. Go for both considering 
> the cheap price of it.
> 
> For a lot of HPC centers the choice of nvidia will be an easy one, as 
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering 
> other demands that's what they'll go for.
> 
> That might change once you stick in bunches of videocards in nodes.
> 
> Please note that the gpu 'streamcores' or PE's whatever name you want to 
> give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use the 
> RAM.
> 
> Both for Nvidia as well as AMD, the streamcores are so fast, that you 
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
> 
> Add to that, that nvidia (have to still figure out for AMD) can in 
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
> 
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would 
> really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered it 
> mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on that...
> 
> Now all choices are valid there, but even then we speak about peanuts 
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be maybe 
> $300k-$400k or something?
> 
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 
> is 2000 euro.
> 
> There won't be 2 gpu nvidia's any soon because of the choice they have 
> historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at realworldtech 
> in a special article he wrote there.
> 
> Please note i'm not judging AMD nor Nvidia, they have made their choices 
> based upon totally different
> businessmodels i suspect and we must be happy we have this rich choice 
> right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
> 
> Nvidia really seems to aim at supercomputers, giving their tesla line 
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full 
> functionality
> without lobotomization.
> 
> Total different businessmodels. Both have their advantages and 
> disadvantages.
> 
>  From pure performance viewpoint it's easy to see what's faster though.
> 
> Yet right now i realize all too well that just too many still hesitate 
> between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd, 
> kicks butt of course from throughput viewpoint.
> 
> To be really honest with you guys, i had expected that by 2011 we would 
> have a gpu reaching far over 1 Teraflop double precision handsdown. If 
> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 
> gpu's on a single card to get over that Teraflop double precision (claim 
> is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
> 
> Now of course i hope you realize i'm not coding double precision code at 
> all; i'm writing everything in integers of 32 bits for the AMD card and 
> the Nvidia equivalent also is using 32 bits integers. The ideal way to 
> do calculations on those cards, so also very big transforms, is using 
> the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD).
> 
> Regards,
> Vincent
> 
> 
>>
>> Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Thu Apr  7 18:57:38 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Fri, 08 Apr 2011 05:57:38 +0700
Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to
 tackle huge scientific challenges
Message-ID: <4D9E4162.3030004@pathscale.com>

I just saw this on another ML and thought it may be of interest
------------
http://googleblog.blogspot.com/2011/04/1-billion-computing-core-hours-for.html
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 21:03:07 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 21:03:07 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E5ECB.60608@ldeo.columbia.edu>

Thank you for the information about AMD-CAL and the AMD GPUs.
Does AMD plan any GPU product with 64-bit and ECC,
similar to Tesla/Fermi?

The lack of a language standard may still be a hurdle here.
I guess there were old postings here about CUDA and OpenGL.
What fraction of the (non-gaming) GPU code is being written these days
in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using
compiler directives like those in the PGI compilers?

Thank you,
Gus Correa

Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 
> 
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> 
> When fermi released it was a great gpu.
> 
> Regrettably they lobotomized the gamers card's double precision as i 
> understand,
> So it hardly has double precision capabilities; if you go for nvidia you 
> sure need a Tesla,
> no question about it.
> 
> As a company i would buy in 6990's though, they're a lot cheaper and 
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less than 
> 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
> 
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD
> versus 448 cores nvidia with 448 execution units of 32 bits multiplication.
> 
> Especially because multiplication has improved a lot.
> 
> Already having written CUDA code some while ago, i wanted the cheap 
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able to 
> report to you what is possible to
> achieve at that card with respect to prime numbers and such.
> 
> I'm a bit amazed so little public initiatives write code for the AMD gpu's.
> 
> Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a 
> CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC, but 
> works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on is 
> possible.
> 
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia is 
> an alternative then.
> 
> In earlier posts from some time ago and some years ago i already wrote 
> on that governments should
> adapt more to how hardware develops rather than demand that hardware has 
> to follow them.
> 
> HPC has too little cash to demand that from industry.
> 
> OpenCL i cannot advice at this moment (for a number of reasons).
> 
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but 
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
> 
> Any company doing gpgpu i would advice developing both branches of code 
> at the same time,
> as that gives the company a lot of extra choices for really very little 
> extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your production 
> code.
> 
> That said we can safely expect that from raw performance coming years 
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
> 
> Even then i'd never bet at just 1 manufacturer. Go for both considering 
> the cheap price of it.
> 
> For a lot of HPC centers the choice of nvidia will be an easy one, as 
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering 
> other demands that's what they'll go for.
> 
> That might change once you stick in bunches of videocards in nodes.
> 
> Please note that the gpu 'streamcores' or PE's whatever name you want to 
> give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use the 
> RAM.
> 
> Both for Nvidia as well as AMD, the streamcores are so fast, that you 
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
> 
> Add to that, that nvidia (have to still figure out for AMD) can in 
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
> 
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would 
> really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered it 
> mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on that...
> 
> Now all choices are valid there, but even then we speak about peanuts 
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be maybe 
> $300k-$400k or something?
> 
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 
> is 2000 euro.
> 
> There won't be 2 gpu nvidia's any soon because of the choice they have 
> historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at realworldtech 
> in a special article he wrote there.
> 
> Please note i'm not judging AMD nor Nvidia, they have made their choices 
> based upon totally different
> businessmodels i suspect and we must be happy we have this rich choice 
> right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
> 
> Nvidia really seems to aim at supercomputers, giving their tesla line 
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full 
> functionality
> without lobotomization.
> 
> Total different businessmodels. Both have their advantages and 
> disadvantages.
> 
>  From pure performance viewpoint it's easy to see what's faster though.
> 
> Yet right now i realize all too well that just too many still hesitate 
> between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd, 
> kicks butt of course from throughput viewpoint.
> 
> To be really honest with you guys, i had expected that by 2011 we would 
> have a gpu reaching far over 1 Teraflop double precision handsdown. If 
> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 
> gpu's on a single card to get over that Teraflop double precision (claim 
> is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
> 
> Now of course i hope you realize i'm not coding double precision code at 
> all; i'm writing everything in integers of 32 bits for the AMD card and 
> the Nvidia equivalent also is using 32 bits integers. The ideal way to 
> do calculations on those cards, so also very big transforms, is using 
> the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD).
> 
> Regards,
> Vincent
> 
> 
>>
>> Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From kilian.cavalotti.work at gmail.com  Fri Apr  8 07:08:01 2011
From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI)
Date: Fri, 8 Apr 2011 13:08:01 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>

Hi Mark,

On Thu, Apr 7, 2011 at 2:39 AM, Mark Hahn <hahn at mcmaster.ca> wrote:
> notice it says 32GB max memory size. ?even if that means 32GB/socket,
> it's not all that much.

It's actually 32GB per DIMM, so up to 512GB per socket.

Cheers,
-- 
Kilian
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From deadline at eadline.org  Fri Apr  8 08:45:09 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 8 Apr 2011 08:45:09 -0400 (EDT)
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <47386.192.168.93.213.1302266709.squirrel@mail.eadline.org>

All:

This video may help clear things up:

  http://www.youtube.com/watch?v=usGkq7tAhfc

have a nice weekend

--
Doug


>
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
>
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
>
> VERSUS
>
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
>
> Factor 100 difference to those cards.
>
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
>
>
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
>
> When fermi released it was a great gpu.
>
> Regrettably they lobotomized the gamers card's double precision as i
> understand,
> So it hardly has double precision capabilities; if you go for nvidia
> you sure need a Tesla,
> no question about it.
>
> As a company i would buy in 6990's though, they're a lot cheaper and
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less
> than 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
>
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for
> AMD
> versus 448 cores nvidia with 448 execution units of 32 bits
> multiplication.
>
> Especially because multiplication has improved a lot.
>
> Already having written CUDA code some while ago, i wanted the cheap
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able
> to report to you what is possible to
> achieve at that card with respect to prime numbers and such.
>
> I'm a bit amazed so little public initiatives write code for the AMD
> gpu's.
>
> Note that DDR5 ram doesn't have ECC by default, but has in case of
> AMD a CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC,
> but works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on
> is possible.
>
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia
> is an alternative then.
>
> In earlier posts from some time ago and some years ago i already
> wrote on that governments should
> adapt more to how hardware develops rather than demand that hardware
> has to follow them.
>
> HPC has too little cash to demand that from industry.
>
> OpenCL i cannot advice at this moment (for a number of reasons).
>
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
>
> Any company doing gpgpu i would advice developing both branches of
> code at the same time,
> as that gives the company a lot of extra choices for really very
> little extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your
> production code.
>
> That said we can safely expect that from raw performance coming years
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
>
> Even then i'd never bet at just 1 manufacturer. Go for both
> considering the cheap price of it.
>
> For a lot of HPC centers the choice of nvidia will be an easy one, as
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering
> other demands that's what they'll go for.
>
> That might change once you stick in bunches of videocards in nodes.
>
> Please note that the gpu 'streamcores' or PE's whatever name you want
> to give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use
> the RAM.
>
> Both for Nvidia as well as AMD, the streamcores are so fast, that you
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
>
> Add to that, that nvidia (have to still figure out for AMD) can in
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many
> nodes,
> all that shouldn't be an issue in the first place.
>
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that
> would really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered
> it mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on
> that...
>
> Now all choices are valid there, but even then we speak about peanuts
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be
> maybe $300k-$400k or something?
>
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD
> 6990 is 2000 euro.
>
> There won't be 2 gpu nvidia's any soon because of the choice they
> have historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at
> realworldtech in a special article he wrote there.
>
> Please note i'm not judging AMD nor Nvidia, they have made their
> choices based upon totally different
> businessmodels i suspect and we must be happy we have this rich
> choice right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
>
> Nvidia really seems to aim at supercomputers, giving their tesla line
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full
> functionality
> without lobotomization.
>
> Total different businessmodels. Both have their advantages and
> disadvantages.
>
>  From pure performance viewpoint it's easy to see what's faster though.
>
> Yet right now i realize all too well that just too many still
> hesitate between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd,
> kicks butt of course from throughput viewpoint.
>
> To be really honest with you guys, i had expected that by 2011 we
> would have a gpu reaching far over 1 Teraflop double precision
> handsdown. If we see that Nvidia delivers somewhere around 515 Gflop
> and AMD has 2 gpu's on a single card to get over that Teraflop double
> precision (claim is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
>
> Now of course i hope you realize i'm not coding double precision code
> at all; i'm writing everything in integers of 32 bits for the AMD
> card and the Nvidia equivalent also is using 32 bits integers. The
> ideal way to do calculations on those cards, so also very big
> transforms, is using the 32 x 32 == 64 bits instructions (that's 2
> instructions in case of AMD).
>
> Regards,
> Vincent
>
>
>>
>> Gus Correa
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>


-- 
Doug

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From hahn at mcmaster.ca  Fri Apr  8 08:45:08 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 8 Apr 2011 08:45:08 -0400 (EDT)
Subject: [Beowulf] Westmere EX
In-Reply-To: <BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1104080838380.12696@coffee.psychology.mcmaster.ca>

>> notice it says 32GB max memory size. ??even if that means 32GB/socket,
>> it's not all that much.
>
> It's actually 32GB per DIMM, so up to 512GB per socket.

right - I eventually found the non-marketing docs.  each socket 
has two memory controllers, each of which supports 2 "intel scalable memory"
channels, which support an intel scalable memory buffer, which supports
4 dimms.   (the ISMB actually referred to as "advanced memory buffer"
in one place, like from fbdimm days...)

it also has double-bit correction, triple bit detection on the last-level
cache.  definitely not designed for cheap or even compact systems...

-mark
-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org  Fri Apr  8 15:42:37 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Fri, 8 Apr 2011 21:42:37 +0200
Subject: [Beowulf] [FoRK] Cray help?? Re: FaceBook tries to cream Google
Message-ID: <20110408194237.GH23560@leitl.org>

----- Forwarded message from "J. Andrew Rogers" <andrew at ceruleansystems.com> -----

From: "J. Andrew Rogers" <andrew at ceruleansystems.com>
Date: Fri, 8 Apr 2011 12:27:35 -0700
To: Friends of Rohit Khare <fork at xent.com>
Subject: Re: [FoRK] Cray help??  Re:  FaceBook tries to cream Google
X-Mailer: Apple Mail (2.1084)
Reply-To: Friends of Rohit Khare <fork at xent.com>


On Apr 8, 2011, at 11:15 AM, Stephen Williams wrote:
> I used RabbitMQ not long ago.  Impressed with some of it, not with a lot of the rest.  Digging through Erlang to determine its real details and limitations was interesting.  The group that had chosen it assumed magic that was not there.  Bottlenecks were going to kill scalability using the naive design.


ZeroMQ is not an MQ despite its name. It is a high-performance implementation of messaging design patterns, including some that are MQ-like. I believe it had aspirations to be an MQ many years ago but turned into an MPI-like high-performance messaging library that abstracts network, IPC, and in-process communication.

The basic network performance and scalability of ZeroMQ is similar to MPI. Underneath the hood it is just a collection of lockless, async structures grafted to the usual operating system hooks. 

Thinking of it as a competitor to MPI in terms of basic functionality is probably the correct framing.

J. Andrew Rogers


_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From eugen at leitl.org  Fri Apr  8 15:42:51 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Fri, 8 Apr 2011 21:42:51 +0200
Subject: [Beowulf] [FoRK] FaceBook tries to cream Google
Message-ID: <20110408194251.GI23560@leitl.org>

----- Forwarded message from "J. Andrew Rogers" <andrew at ceruleansystems.com> -----

From: "J. Andrew Rogers" <andrew at ceruleansystems.com>
Date: Fri, 8 Apr 2011 10:36:31 -0700
To: Friends of Rohit Khare <fork at xent.com>
Subject: Re: [FoRK] FaceBook tries to cream Google
X-Mailer: Apple Mail (2.1084)
Reply-To: Friends of Rohit Khare <fork at xent.com>


On Apr 8, 2011, at 8:05 AM, Stephen Williams wrote:
> 
> Agreed.  Strange that MPI isn't more widely used (outside supercomputing projects).  Although, I'm not aware of it expecting and handling faults / rework as a good Mapreduce imitation, and similar systems, must.


It is not that strange, MPI is a bit brittle as a communication library standard. Implementations tend to make simplifying assumptions that are not valid for some parallel applications. You can patch it up to do anything but the level of effort required seems to relegate it to just being used in scientific computing for which it was designed. I've seen ZeroMQ being increasingly used for roughly the same purpose as MPI in "normal" distributed systems, and I personally do not see much reason to prefer the latter over the former for most things. 

The difference is history. MPI's weakness is that it started from a mediocre design that immediately became part of a standards process, with all the politics and buy-in that entails. It is also badly documented as a practical matter. ZMQ also started with a somewhat dodgy early design but as a library rather than a standard; it was iterated by hackers over several versions into a more sensible and capable design. ZMQ has been willing to break backward compatibility to fix behaviors that irritated the programmers that use it or add badly needed features, which is possible because the "standard" is the implementation.


J. Andrew Rogers


_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Tue Apr 12 16:31:41 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Tue, 12 Apr 2011 16:31:41 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -
	pre-alpha release
Message-ID: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>

If you are using the "Job to Core Binding" feature in SGE and running
SGE on newer hardware, then please give the new hwloc enabled
loadcheck a try.

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

The current hardware topology discovery library (Portable Linux
Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
hardware topology may not be detected correctly by PLPA.

If you are running SGE on AMD Magny-Cours servers, please post your
loadcheck output, as it is known to be wrong when handled by PLPA.

The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
support in later releases of Grid Engine / Grid Scheduler.

http://gridscheduler.sourceforge.net/

Thanks!!
Rayson
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Wed Apr 13 12:21:21 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Wed, 13 Apr 2011 12:21:21 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>
	<26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
Message-ID: <BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com>

Carlos,

I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
arch string, so I believe you are running the loadcheck from standard
Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
the one from the Open Grid Scheduler page.

The existing Grid Engine (including the latest Open Grid Scheduler
releases: SGE 6.2u5p1 & SGE 6.2u5p2, or Univa's fork) uses PLPA, and
it is known to be wrong on magny-cours.

(i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from:
http://sourceforge.net/projects/gridscheduler/files/ )


Chansup on the Grid Engine mailing list (it's the general purpose Grid
Engine mailing list for now) tested the version I uploaded last night,
and seems to work on a dual-socket magny-cours AMD machine. It prints:

m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC

However, I am still fixing the processor, core id mapping code:

http://gridengine.org/pipermail/users/2011-April/000629.html
http://gridengine.org/pipermail/users/2011-April/000628.html

I compiled the hwloc enabled loadcheck on kernel 2.6.34 & glibc 2.12,
so it may not work on machines running lower kernel or glibc versions,
you can download it from:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

Rayson


On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
<carlosf at cesga.es> wrote:
> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD system
> (and seems to be wrong!):
>
> arch ? ? ? ? ? ?lx24-amd64
> num_proc ? ? ? ?24
> m_socket ? ? ? ?2
> m_core ? ? ? ? ?12
> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
> load_short ? ? ?0.29
> load_medium ? ? 0.13
> load_long ? ? ? 0.04
> mem_free ? ? ? ?26257.382812M
> swap_free ? ? ? 8191.992188M
> virtual_free ? ?34449.375000M
> mem_total ? ? ? 32238.328125M
> swap_total ? ? ?8191.992188M
> virtual_total ? 40430.320312M
> mem_used ? ? ? ?5980.945312M
> swap_used ? ? ? 0.000000M
> virtual_used ? ?5980.945312M
> cpu ? ? ? ? ? ? 0.0%
>
>
> Carlos Fernandez Sanchez
> Systems Manager
> CESGA
> Avda. de Vigo s/n. Campus Vida
> Tel.: (+34) 981569810, ext. 232
> 15705 - Santiago de Compostela
> SPAIN
>
> --------------------------------------------------
> From: "Rayson Ho" <raysonlogin at gmail.com>
> Sent: Tuesday, April 12, 2011 10:31 PM
> To: "Beowulf List" <Beowulf at beowulf.org>
> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
> -pre-alpha release
>
>> If you are using the "Job to Core Binding" feature in SGE and running
>> SGE on newer hardware, then please give the new hwloc enabled
>> loadcheck a try.
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> The current hardware topology discovery library (Portable Linux
>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>> hardware topology may not be detected correctly by PLPA.
>>
>> If you are running SGE on AMD Magny-Cours servers, please post your
>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>
>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>> support in later releases of Grid Engine / Grid Scheduler.
>>
>> http://gridscheduler.sourceforge.net/
>>
>> Thanks!!
>> Rayson
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Fri Apr 15 10:12:00 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Fri, 15 Apr 2011 10:12:00 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding
	enhancement)
Message-ID: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>

Hi all,

Distributing Linux application binaries is proven to be a major issue
as a lot of people wanted to test the hwloc loadcheck but their Linux
versions are older than mine. And compiling SGE from source is not
simple neither -- I wrote a quick & dirty guide for those who don't
want the add-ons but it's usually the extra stuff & dependencies that
fail the build. So I would like to offer pre-compiled binaries and
upload them onto sourceforge.

I know it's a complicated question - what version of Linux should I
use to build Grid Engine / Open Grid Scheduler when the binaries are
for others to consume??

(In case you are interested, the quick compile guide is at:
http://gridscheduler.sourceforge.net/CompileGridEngineSource.html )

Prakashan: I tried to link it statically, and I even tried to compile
an older version of glibc on my machine, but I could not get either of
them to work!!

Rayson


On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
> Hi Rayson,
>
> Do you have a statically linked version? Thanks.
>
> ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by
> ./loadcheck)
>
> Prakashan
>
>
>
> On 04/13/2011 09:21 AM, Rayson Ho wrote:
>>
>> Carlos,
>>
>> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
>> arch string, so I believe you are running the loadcheck from standard
>> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
>> the one from the Open Grid Scheduler page.
>>
>> The existing Grid Engine (including the latest Open Grid Scheduler
>> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and
>> it is known to be wrong on magny-cours.
>>
>> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from:
>> http://sourceforge.net/projects/gridscheduler/files/ )
>>
>>
>> Chansup on the Grid Engine mailing list (it's the general purpose Grid
>> Engine mailing list for now) tested the version I uploaded last night,
>> and seems to work on a dual-socket magny-cours AMD machine. It prints:
>>
>> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC
>>
>> However, I am still fixing the processor, core id mapping code:
>>
>> http://gridengine.org/pipermail/users/2011-April/000629.html
>> http://gridengine.org/pipermail/users/2011-April/000628.html
>>
>> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12,
>> so it may not work on machines running lower kernel or glibc versions,
>> you can download it from:
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Rayson
>>
>>
>>
>> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
>> <carlosf at cesga.es> ?wrote:
>>>
>>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD
>>> system
>>> (and seems to be wrong!):
>>>
>>> arch ? ? ? ? ? ?lx24-amd64
>>> num_proc ? ? ? ?24
>>> m_socket ? ? ? ?2
>>> m_core ? ? ? ? ?12
>>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
>>> load_short ? ? ?0.29
>>> load_medium ? ? 0.13
>>> load_long ? ? ? 0.04
>>> mem_free ? ? ? ?26257.382812M
>>> swap_free ? ? ? 8191.992188M
>>> virtual_free ? ?34449.375000M
>>> mem_total ? ? ? 32238.328125M
>>> swap_total ? ? ?8191.992188M
>>> virtual_total ? 40430.320312M
>>> mem_used ? ? ? ?5980.945312M
>>> swap_used ? ? ? 0.000000M
>>> virtual_used ? ?5980.945312M
>>> cpu ? ? ? ? ? ? 0.0%
>>>
>>>
>>> Carlos Fernandez Sanchez
>>> Systems Manager
>>> CESGA
>>> Avda. de Vigo s/n. Campus Vida
>>> Tel.: (+34) 981569810, ext. 232
>>> 15705 - Santiago de Compostela
>>> SPAIN
>>>
>>> --------------------------------------------------
>>> From: "Rayson Ho"<raysonlogin at gmail.com>
>>> Sent: Tuesday, April 12, 2011 10:31 PM
>>> To: "Beowulf List"<Beowulf at beowulf.org>
>>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
>>> -pre-alpha release
>>>
>>>> If you are using the "Job to Core Binding" feature in SGE and running
>>>> SGE on newer hardware, then please give the new hwloc enabled
>>>> loadcheck a try.
>>>>
>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>
>>>> The current hardware topology discovery library (Portable Linux
>>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>>>> hardware topology may not be detected correctly by PLPA.
>>>>
>>>> If you are running SGE on AMD Magny-Cours servers, please post your
>>>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>>>
>>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>>>> support in later releases of Grid Engine / Grid Scheduler.
>>>>
>>>> http://gridscheduler.sourceforge.net/
>>>>
>>>> Thanks!!
>>>> Rayson
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From landman at scalableinformatics.com  Fri Apr 15 10:19:04 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Apr 2011 10:19:04 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
Message-ID: <4DA853D8.8000308@scalableinformatics.com>

On 04/15/2011 10:12 AM, Rayson Ho wrote:

> I know it's a complicated question - what version of Linux should I
> use to build Grid Engine / Open Grid Scheduler when the binaries are
> for others to consume??

I'd recommend a Centos 5.x variant, and possibly a SuSE variant.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Fri Apr 15 12:15:38 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Apr 2011 12:15:38 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <4DA853D8.8000308@scalableinformatics.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com>
Message-ID: <4DA86F2A.5010708@ias.edu>

On 04/15/2011 10:19 AM, Joe Landman wrote:
> On 04/15/2011 10:12 AM, Rayson Ho wrote:
> 
>> I know it's a complicated question - what version of Linux should I
>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>> for others to consume??
> 
> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
> 

I agree, but I think that if you can get your hands on an actual RHEL
image, that's what you should use, as long as you already have access to
it.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Fri Apr 15 12:25:10 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Fri, 15 Apr 2011 12:25:10 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <4DA86F2A.5010708@ias.edu>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com>
	<4DA86F2A.5010708@ias.edu>
Message-ID: <BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>

Thanks all!!

If I build on Centos 5.6, will the binaries run on SuSE & Ubuntu??

(Don't want what versions SuSE & Ubuntu most people are using -- I
have Ubuntu 10 & 11 on my machines, and F13.)

Rayson


On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal <prentice at ias.edu> wrote:
> On 04/15/2011 10:19 AM, Joe Landman wrote:
>> On 04/15/2011 10:12 AM, Rayson Ho wrote:
>>
>>> I know it's a complicated question - what version of Linux should I
>>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>>> for others to consume??
>>
>> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
>>
>
> I agree, but I think that if you can get your hands on an actual RHEL
> image, that's what you should use, as long as you already have access to
> it.
>
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Fri Apr 15 13:49:55 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Apr 2011 13:49:55 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTikKbnGQVkXhwSgfryxVN6zjWoMVbg@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTikKbnGQVkXhwSgfryxVN6zjWoMVbg@mail.gmail.com>
Message-ID: <4DA88543.6000603@ias.edu>


On 04/15/2011 01:40 PM, Chi Chan wrote:
> On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal <prentice at ias.edu> wrote:
>> I agree, but I think that if you can get your hands on an actual RHEL
>> image, that's what you should use, as long as you already have access to
>> it.
> 
> Or just use Oracle Linux, it is free to download and distribute, and
> can be used in production:
> 
> http://www.oracle.com/us/technologies/linux/competitive-335546.html
> http://www.oracle.com/us/technologies/027617.pdf
> 
> From my experience, Oracle Linux and RHEL are idential, you can
> compile applications on Oracle Linux and ship it to run on RHEL boxes.

I had recommended RHEL just because its the "gold standard" for all
RHEL-derived distros. CentOS and a few others *should* be identical.
However, I don't think Oracle is. Doesn't Oracle make some changes to
optimize it for running Oracle? I'm not sure of that, which is why I'm
asking and not stating.


-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Sun Apr 17 20:36:40 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Mon, 18 Apr 2011 10:36:40 +1000
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
Message-ID: <4DAB8798.8070102@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/04/11 02:25, Rayson Ho wrote:

> If I build on Centos 5.6, will the binaries run on
> SuSE & Ubuntu??

I'd suggest that if you want them to work (and especially
if you want to package them appropriately) then you're
far better off getting a build machine for the OS's you
want to support.  CentOS, SLES, Debian & Ubuntu.

We build all our x86 stuff on our CentOS5 cluster and
rsync it over to our RHEL5 cluster (sadly we can't just
share /usr/local/ between them due to circumstances
beyond our control) without issues.

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz
430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy
=stQg
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Mon Apr 18 09:03:50 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 18 Apr 2011 15:03:50 +0200
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
	binding enhancement)
In-Reply-To: <4DAB8798.8070102@unimelb.edu.au>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
	<4DAB8798.8070102@unimelb.edu.au>
Message-ID: <DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 18.04.2011 um 02:36 schrieb Christopher Samuel:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 16/04/11 02:25, Rayson Ho wrote:
> 
>> If I build on Centos 5.6, will the binaries run on
>> SuSE & Ubuntu??
> 
> I'd suggest that if you want them to work (and especially
> if you want to package them appropriately) then you're
> far better off getting a build machine for the OS's you
> want to support.  CentOS, SLES, Debian & Ubuntu.

Before there was only a common and a platform specific tarball. Does it imply to supply *.rpm in the future? It was always nice to just untar SGE and run it as a normal user w/o any root privilege (yes, rpm2cpio could do). And it was one tarball for all Linux variants. I would vote for staying with this.

- -- Reuti


> We build all our x86 stuff on our CentOS5 cluster and
> rsync it over to our RHEL5 cluster (sadly we can't just
> share /usr/local/ between them due to circumstances
> beyond our control) without issues.
> 
> cheers,
> Chris
> - -- 
>    Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>         http://www.vlsci.unimelb.edu.au/
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz
> 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy
> =stQg
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.16 (Darwin)

iEYEARECAAYFAk2sNsQACgkQo/GbGkBRnRr55QCdGyBkTKd7EsTWSvVPRWuMQbGA
kOQAniYFwJyMOlwcR3ITHS9nAfGRZndh
=iknW
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr 18 11:24:00 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 18 Apr 2011 11:24:00 -0400 (EDT)
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
	<4DAB8798.8070102@unimelb.edu.au>
	<DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>
Message-ID: <Pine.LNX.4.64.1104181122330.26910@coffee.psychology.mcmaster.ca>

not to be overly surly, 
but this really has nothing to do with beowulf
and is a rather specialized sge support issue...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From mathog at caltech.edu  Mon Apr 18 12:34:11 2011
From: mathog at caltech.edu (David Mathog)
Date: Mon, 18 Apr 2011 09:34:11 -0700
Subject: [Beowulf] Grid Engine build machine
Message-ID: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>

Rayson Ho <raysonlogin at gmail.com> wrote
> And compiling SGE from source is not
> simple neither -- I wrote a quick & dirty guide for those who don't
> want the add-ons but it's usually the extra stuff & dependencies that
> fail the build.

Does it still use aimk or has it finally gone over to autoconf, automake?
As I recall aimk was really touchy the last time I built this (4
years ago), with lots of futzing around to convince it to use library
files it should have found on its own.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Mon Apr 18 12:36:19 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 18 Apr 2011 18:36:19 +0200
Subject: [Beowulf] Grid Engine build machine
In-Reply-To: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>
References: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>
Message-ID: <B603D5CF-D3F8-43EE-A4AE-F4414DB451EE@staff.uni-marburg.de>

Am 18.04.2011 um 18:34 schrieb David Mathog:

> Rayson Ho <raysonlogin at gmail.com> wrote
>> And compiling SGE from source is not
>> simple neither -- I wrote a quick & dirty guide for those who don't
>> want the add-ons but it's usually the extra stuff & dependencies that
>> fail the build.
> 
> Does it still use aimk

Still aimk.

-- Reuti

> or has it finally gone over to autoconf, automake?
> As I recall aimk was really touchy the last time I built this (4
> years ago), with lots of futzing around to convince it to use library
> files it should have found on its own.
> 
> Regards,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Mon Apr 18 14:26:57 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 18 Apr 2011 14:26:57 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <4DA5E85D.4010801@ats.ucla.edu>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>
	<26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
	<BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com>
	<4DA5E85D.4010801@ats.ucla.edu>
Message-ID: <BANLkTinMFKDr7t6oARV5vYxkgj1iq1gYKQ@mail.gmail.com>

For those who had issues with earlier version, please try the latest
loadcheck v4:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

I compiled the binary on Oracle Linux, which is compatible with RHEL
5.x, Scientific Linux or Centos 5.x. I tested the binary on the
standard Red Hat kernel, and Oracle enhanced "Unbreakable Enterprise
Kernel", Fedora 13, Ubuntu 10.04 LTS.

Optimizing for AMD's NUMA machine characteristics is on the ToDo list.

Rayson


On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
> Hi Rayson,
>
> Do you have a statically linked version? Thanks.
>
> ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by
> ./loadcheck)
>
> Prakashan
>
>
>
> On 04/13/2011 09:21 AM, Rayson Ho wrote:
>>
>> Carlos,
>>
>> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
>> arch string, so I believe you are running the loadcheck from standard
>> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
>> the one from the Open Grid Scheduler page.
>>
>> The existing Grid Engine (including the latest Open Grid Scheduler
>> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and
>> it is known to be wrong on magny-cours.
>>
>> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from:
>> http://sourceforge.net/projects/gridscheduler/files/ )
>>
>>
>> Chansup on the Grid Engine mailing list (it's the general purpose Grid
>> Engine mailing list for now) tested the version I uploaded last night,
>> and seems to work on a dual-socket magny-cours AMD machine. It prints:
>>
>> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC
>>
>> However, I am still fixing the processor, core id mapping code:
>>
>> http://gridengine.org/pipermail/users/2011-April/000629.html
>> http://gridengine.org/pipermail/users/2011-April/000628.html
>>
>> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12,
>> so it may not work on machines running lower kernel or glibc versions,
>> you can download it from:
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Rayson
>>
>>
>>
>> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
>> <carlosf at cesga.es> ?wrote:
>>>
>>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD
>>> system
>>> (and seems to be wrong!):
>>>
>>> arch ? ? ? ? ? ?lx24-amd64
>>> num_proc ? ? ? ?24
>>> m_socket ? ? ? ?2
>>> m_core ? ? ? ? ?12
>>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
>>> load_short ? ? ?0.29
>>> load_medium ? ? 0.13
>>> load_long ? ? ? 0.04
>>> mem_free ? ? ? ?26257.382812M
>>> swap_free ? ? ? 8191.992188M
>>> virtual_free ? ?34449.375000M
>>> mem_total ? ? ? 32238.328125M
>>> swap_total ? ? ?8191.992188M
>>> virtual_total ? 40430.320312M
>>> mem_used ? ? ? ?5980.945312M
>>> swap_used ? ? ? 0.000000M
>>> virtual_used ? ?5980.945312M
>>> cpu ? ? ? ? ? ? 0.0%
>>>
>>>
>>> Carlos Fernandez Sanchez
>>> Systems Manager
>>> CESGA
>>> Avda. de Vigo s/n. Campus Vida
>>> Tel.: (+34) 981569810, ext. 232
>>> 15705 - Santiago de Compostela
>>> SPAIN
>>>
>>> --------------------------------------------------
>>> From: "Rayson Ho"<raysonlogin at gmail.com>
>>> Sent: Tuesday, April 12, 2011 10:31 PM
>>> To: "Beowulf List"<Beowulf at beowulf.org>
>>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
>>> -pre-alpha release
>>>
>>>> If you are using the "Job to Core Binding" feature in SGE and running
>>>> SGE on newer hardware, then please give the new hwloc enabled
>>>> loadcheck a try.
>>>>
>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>
>>>> The current hardware topology discovery library (Portable Linux
>>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>>>> hardware topology may not be detected correctly by PLPA.
>>>>
>>>> If you are running SGE on AMD Magny-Cours servers, please post your
>>>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>>>
>>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>>>> support in later releases of Grid Engine / Grid Scheduler.
>>>>
>>>> http://gridscheduler.sourceforge.net/
>>>>
>>>> Thanks!!
>>>> Rayson
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr 21 08:59:30 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 21 Apr 2011 14:59:30 +0200
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <4D9E5ECB.60608@ldeo.columbia.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
	<4D9E5ECB.60608@ldeo.columbia.edu>
Message-ID: <0F717DDD-470A-4B13-B1AF-FBCB034409DC@xs4all.nl>

hi,

Sometimes going through some old emails.

Note in the meantime i switched from AMD-CAL to OpenCL.

On Apr 8, 2011, at 3:03 AM, Gus Correa wrote:

> Thank you for the information about AMD-CAL and the AMD GPUs.
> Does AMD plan any GPU product with 64-bit and ECC,
> similar to Tesla/Fermi?

Actually DDR5 already calculates a CRC. Not as good as ECC, but it  
takes care you
have a form of checking. Also the amount of bitflips is so little as  
the quality of this DDR5 is so great,
according to some memory experts i spoke with, that this CRC is more  
than sufficient.

As i'm not a memory expert i would advice you to really speak with  
such a guy instead of some HPC guys here.

Now if your organisation wants ECC simply i'm not going to argue. A  
demand is a demand there.

I'm busy pricewise here how to build cheap something that delivers a  
big punch.

If you look objectively and then to gpgpu codes, then of course  
Nvidia has a few years more experience
setting up CUDA.

This is another problem of course, software support. Both suck at it,  
to say polite.

Yet we want to do calculations cheap huh.

Yet if performance matters, then AMD is a very cheap alternative.
In both cases of course, programming for a gpu is going to be the  
bottleneck;
historically organisations do not invest in good code, they only  
invest in hardware and in managers who
sit on their behind, drink coffee and do meetings.

Objectively most codes you can also code in 32 bits.

If we do a simple compare then the HD6990 is there for 540 euro in  
the shop here. Now that's European prices
where salestax is 19%, so in USA probably it's cheaper (if you  
calculate it back to euro's).

Let's now ignore the marketing nonsense ok, as marketing nonsense is  
marketing nonsense.
All those theoretic flops always, they shouldn't allow double  
counting specific instructions like multiply add.

The internals of these gpu's are all organized such that doing  
efficient matrix calculations on them is very well
possible. Not easy to solve well, as the bottleneck will be the  
bandwidth from the DDR3 cpu ram to the gpu,
  yet if you look to a lot of calculations, then it's algorithmic  
possible to do a lot more work at the execution unit
side than the bandwidth you need to another node; those execution  
units, PE's (processing elements)
  nowadays called, have huge GPR's which can proces all that. With  
that those tiny cheap power efficient cores
can easily take on huge expensive cpu cores. A single set of 4 PE's  
in case of AMD has a total of 1024 GPR's,
can read from a L1 cache when needed and write to a shared local  
cache of 32KB (shared by 64 pe's).

That L1 reads from the memory L2 and all that has a huge bandwidth.

That gives you PRACTICAL 3072 PE's @ 0.83 Ghz == 2.5+ Tflop in 32  
bits integers. It's not so hard to convert
that to 64 bits code if that's what you need. In fact i'm using it to  
approximate huge integers (prime numbers)
of million bit sizes (factorisation of them).

Using that efficiently is not easy, yet realize this is 2.5+ Tflop (i  
should actually say Tera 32 bits integer performance).
Good programmers can use todays GPU's very efficiently.

The 6000+ series of AMD and the Fermi series of Nvidia are very good  
and you can use them in a sustained manner.

Now the cheapest gpgpu of Nvidia is about $1200 which is the quadro  
6000 series and delivers 448 cores @ 1.2Ghz,
say roughly 550 Gflop.

Of course this is practical what you can achieve, i'm not counting of  
course multiply-add here as being 2 flops,
which is their own definition of how many gflops it gets; first of  
all i'm not interested in flops but in integers per cycle
and secondly i prefer a realistic measure, otherwise we have no  
measure on how efficiently we use the gpu.

If you look from a mathematical viewpoint, it's not so clever from  
most scientists at todays huge calculations to use
floating point. Single precision or double precision, in the end it  
all backtracks errors and you have complete non-deterministic results  
with big sureness.

Much better are integer transforms where you have 100% lossless  
calculations so big sureness your calculation is ok.

Yet i realize this is a very expertise field with most people who  
know something about that hiding in secrecy using fake names
and some even having fake burials, just in order to disappear. That  
in itself is all very sad, as progressing science doesn't
happen. As a result of that scientific world has focussed too much  
upon floating point.

Yet the cards can deliver that as well as we know.

The round off errors all those floating point calculations cause are  
always a huge multiple of bitflips of memory.
It's not even in the same league. Now of course calculating with 64  
bits integers it's easier to do huge transforms
and you can redo your calculation and at some spots you will have  
deterministic output in such case, in others of
course not (depends what you calculate of course - majority is non- 
deterministic).

With 32 bits integers you need a lot of CRT (Chinese Remainder  
Theorem) tricks to effectively use it for huge transforms,
or you simply emulate 64 bits calculations (so with 64 bits  
precision, please do not confuse with double precision
floating point).

Getting all that to work is very challenging and not easy, i realize  
that.

Yet look at the huge advantage you give to your scientists in such case.

They can look years ahead in the future which is a *huge* advantage.

In this manner you'll actually effectively get 2.x Tflop out of those  
6990, again that's 2 Tflop calculated in my manner, i'm looking  
simply at INSTRUCTION LEVEL where 1 instruction represents a single  
unit of 32 bits; counting the multiply-add instruction
as 2 flops is just too confusing for how efficient you manage to load  
your GPU, if you ask me.

In the transforms in fact multiply-add is very silly to use in many  
cases as that means you're doing some sort of inefficient
calculation.

Yet that chippie is just 500 euro, versus Nvidia delivers it for 1200  
dollar and the nvidia one is factor 3 slower,
though still lightyears faster than a CPU solution there (pricewise  
seen).

The quadro 6000 for those who don't realize it, is exactly the same  
like a Tesla. Just checkout the specs.

Yet of course for our lazy scientists all of the above is not so  
interesting. Just compiling your years 80 code,
pushing the enter button, is a lot easier.

If you care however for PERFORMANCE, consider spending a couple of  
thousands to hardware.

If you buy 1000 of those 6990's and program in opencl, you actually  
can also run that at nvidia hardware, might
nvidia be so lucky to release very quickly a 22 nm gpu some years  
from now. By then also nvidia's opencl will probably
be supporting the gpu hardware quite ok.

So my advice would be: program it in opencl. It's not the most  
efficient language on the planet, yet it'll work everywhere
and you can get probably around 2 Tflop out that 6990 AMD card.

That said of course there is a zillion problems still with opencl,  
yet if you want for $500k in gpu hardware
achieve 1 petaflop, you'll have to suffer a bit, and by the time your  
cluster is there, possibly all big bugs have
been fixed in opencl both by amd as well as by nvidia for their gpu  
lines.

Now all this said i do realize that you need a shift in thinking.  
Whether you use AMD-gpu's or Nvidia, in both cases you'll
need great new software. In fact it doesn't even matter whether you  
program it in OpenCL or CUDA. It's easy to port algorithms
from 1 entity to another; getting such algorithm to work is a lot  
harder than the question what language you program it in.

Translating CUDA to openCL is pretty much braindead work which many  
can carry out as we already saw in some examples.
The investment is in the software for the gpu's.

You don't buy that in from nvidia nor AMD. You'l have to hire people  
to program it, as your own scientists simply aren't good
enough to program efficiently for that GPU. The old fashionned vision  
of having scientists solve themselve how to do the
calculations is not going to work for gpgpu simply.

Now that is a big pitfall that is hard to overcome.

All this said, of course there is a few, really very few,  
applications where a full blown gpu nor hybrid solution is able to
solve the problems. Yet usually such claim that it is "not possible"  
gets done by scientists who are experts in their field,
but not very high level in finding solutions how to efficiently get  
their calculations done in HPC.

Regards,
Vincent

>
> The lack of a language standard may still be a hurdle here.
> I guess there were old postings here about CUDA and OpenGL.
> What fraction of the (non-gaming) GPU code is being written these days
> in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using
> compiler directives like those in the PGI compilers?
>
> Thank you,
> Gus Correa
>
> Vincent Diepeveen wrote:
>>
>> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>>
>>> Vincent Diepeveen wrote:
>>>
>>>> GPU monster box, which is basically a few videocards inside such a
>>>> box stacked up a tad, wil only add a couple of
>>>> thousands.
>>>>
>>>
>>> This price may be OK for the videocard-class GPUs,
>>> but sounds underestimated, at least for Fermi Tesla.
>>
>> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
>> note there is a 6 GB version, not aware of price will be $$$$ i bet.
>> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
>>
>> VERSUS
>>
>> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
>>
>> Factor 100 difference to those cards.
>>
>> A couple of thousands versus a couple of hundreds of thousands.
>> Hope i made my point clear.
>>
>>
>>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla  
>>> C2050,
>>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~ 
>>> $15k.
>>> If you care about ECC, that's the price you pay, right?
>>
>> When fermi released it was a great gpu.
>>
>> Regrettably they lobotomized the gamers card's double precision as i
>> understand,
>> So it hardly has double precision capabilities; if you go for  
>> nvidia you
>> sure need a Tesla,
>> no question about it.
>>
>> As a company i would buy in 6990's though, they're a lot cheaper and
>> roughly 3x faster
>> than the Nvidia's (for some more than 3x for other occassions less  
>> than
>> 3x, note the card
>> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
>>
>> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units  
>> for AMD
>> versus 448 cores nvidia with 448 execution units of 32 bits  
>> multiplication.
>>
>> Especially because multiplication has improved a lot.
>>
>> Already having written CUDA code some while ago, i wanted the cheap
>> gamers card with big
>> horse power now at home so  i'm toying on a 6970 now so will be  
>> able to
>> report to you what is possible to
>> achieve at that card with respect to prime numbers and such.
>>
>> I'm a bit amazed so little public initiatives write code for the  
>> AMD gpu's.
>>
>> Note that DDR5 ram doesn't have ECC by default, but has in case of  
>> AMD a
>> CRC calculation
>> (if i understand it correctly). It's a bit more primitive than  
>> ECC, but
>> works pretty ok and shows you
>> also when problems occured there, so figuring out remove what goes  
>> on is
>> possible.
>>
>> Make no mistake that this isn't ECC.
>> We know some HPC centers have as a hard requirement ECC, only  
>> nvidia is
>> an alternative then.
>>
>> In earlier posts from some time ago and some years ago i already  
>> wrote
>> on that governments should
>> adapt more to how hardware develops rather than demand that  
>> hardware has
>> to follow them.
>>
>> HPC has too little cash to demand that from industry.
>>
>> OpenCL i cannot advice at this moment (for a number of reasons).
>>
>> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but
>> majority of codes are possible
>> to port quite well (there is exceptions), or easy work arounds.
>>
>> Any company doing gpgpu i would advice developing both branches of  
>> code
>> at the same time,
>> as that gives the company a lot of extra choices for really very  
>> little
>> extra work. Maybe 1 coder,
>> and it always allows you to have the fastest setup run your  
>> production
>> code.
>>
>> That said we can safely expect that from raw performance coming years
>> AMD will keep the leading edge
>> from crunching viewpoint. Elsewhere i pointed out why.
>>
>> Even then i'd never bet at just 1 manufacturer. Go for both  
>> considering
>> the cheap price of it.
>>
>> For a lot of HPC centers the choice of nvidia will be an easy one, as
>> the price of the Fermi cards
>> is peanuts compared to the price rest of the system and considering
>> other demands that's what they'll go for.
>>
>> That might change once you stick in bunches of videocards in nodes.
>>
>> Please note that the gpu 'streamcores' or PE's whatever name you  
>> want to
>> give them, are so bloody fast,
>> that your code has to work within the PE's themselves and hardly  
>> use the
>> RAM.
>>
>> Both for Nvidia as well as AMD, the streamcores are so fast, that you
>> simply don't want to lose time on the RAM
>> when your software runs, let alone that you want to use huge RAM.
>>
>> Add to that, that nvidia (have to still figure out for AMD) can in
>> background stream from and to the gpu's RAM
>> from the CPU, so if you do really large calculations involving  
>> many nodes,
>> all that shouldn't be an issue in the first place.
>>
>> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that  
>> would
>> really amaze me, though i'm sure
>> there is cases where that happens. If we see however what was  
>> ordered it
>> mostly is the 3GB Tesla's,
>> at least on what has been reported, i have no global statistics on  
>> that...
>>
>> Now all choices are valid there, but even then we speak about peanuts
>> money compared to the price of
>> a single 8 socket Nehalem-ex box, which fully configured will be  
>> maybe
>> $300k-$400k or something?
>>
>> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD  
>> 6990
>> is 2000 euro.
>>
>> There won't be 2 gpu nvidia's any soon because of the choice they  
>> have
>> historically made for the memory controllers.
>> See explanation of intel fanboy David Kanter for that at  
>> realworldtech
>> in a special article he wrote there.
>>
>> Please note i'm not judging AMD nor Nvidia, they have made their  
>> choices
>> based upon totally different
>> businessmodels i suspect and we must be happy we have this rich  
>> choice
>> right now between cpu's from different
>> manufacturers and gpu's from different manufacturers.
>>
>> Nvidia really seems to aim at supercomputers, giving their tesla line
>> without lobotomization and lobotomizing their
>> gamers cards, where AMD aims at gamers and their gamercards have full
>> functionality
>> without lobotomization.
>>
>> Total different businessmodels. Both have their advantages and
>> disadvantages.
>>
>>  From pure performance viewpoint it's easy to see what's faster  
>> though.
>>
>> Yet right now i realize all too well that just too many still  
>> hesitate
>> between also offering gpu services additional to
>> cpu services, in which case having a gpu, regardless nvidia or amd,
>> kicks butt of course from throughput viewpoint.
>>
>> To be really honest with you guys, i had expected that by 2011 we  
>> would
>> have a gpu reaching far over 1 Teraflop double precision  
>> handsdown. If
>> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2
>> gpu's on a single card to get over that Teraflop double precision  
>> (claim
>> is 1.27 Teraflop double precision),
>> that really is underneath my expectations from a few years ago.
>>
>> Now of course i hope you realize i'm not coding double precision  
>> code at
>> all; i'm writing everything in integers of 32 bits for the AMD  
>> card and
>> the Nvidia equivalent also is using 32 bits integers. The ideal  
>> way to
>> do calculations on those cards, so also very big transforms, is using
>> the 32 x 32 == 64 bits instructions (that's 2 instructions in case  
>> of AMD).
>>
>> Regards,
>> Vincent
>>
>>
>>>
>>> Gus Correa
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr 21 09:11:54 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 21 Apr 2011 15:11:54 +0200
Subject: [Beowulf] Google: 1 billion computing core-hours for
	researchers to tackle huge scientific challenges
In-Reply-To: <4D9E4162.3030004@pathscale.com>
References: <4D9E4162.3030004@pathscale.com>
Message-ID: <A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>

Regrettably the link is not available anymore. Can you expand on it?

As they count the cloud computing in units of 1Ghz per cpunode hour,
1 billion computing core hours is something like 1000 gpu's for 1 week?

1 billion sounds impressive nevertheless.

Regards,
Vincent

On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote:

> I just saw this on another ML and thought it may be of interest
> ------------
> http://googleblog.blogspot.com/2011/04/1-billion-computing-core- 
> hours-for.html
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Thu Apr 21 09:15:13 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Thu, 21 Apr 2011 15:15:13 +0200
Subject: [Beowulf] Google: 1 billion computing core-hours for
	researchers to tackle huge scientific challenges
In-Reply-To: <A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>
References: <4D9E4162.3030004@pathscale.com>
	<A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>
Message-ID: <5188F4D0-1D69-4B8F-874D-D20FDAC25CF6@staff.uni-marburg.de>

Am 21.04.2011 um 15:11 schrieb Vincent Diepeveen:

> Regrettably the link is not available anymore. Can you expand on it?

For me it's still working. You selected both lines?

--Reuti

> As they count the cloud computing in units of 1Ghz per cpunode hour,
> 1 billion computing core hours is something like 1000 gpu's for 1 week?
> 
> 1 billion sounds impressive nevertheless.
> 
> Regards,
> Vincent
> 
> On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote:
> 
>> I just saw this on another ML and thought it may be of interest
>> ------------
>> http://googleblog.blogspot.com/2011/04/1-billion-computing-core- 
>> hours-for.html
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Fri Apr  1 10:47:28 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 01 Apr 2011 10:47:28 -0400
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D93B1E4.3080407@cora.nwra.com>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<4D93B1E4.3080407@cora.nwra.com>
Message-ID: <4D95E580.5090902@ias.edu>

On 03/30/2011 06:42 PM, Orion Poplawski wrote:
> On 03/21/2011 06:51 AM, Douglas Eadline wrote:
>> I got to thinking about how others are fairing (or not)
>> with GP-GPU technology. I put up a simple poll on
>> ClusterMonkey to help get a general idea.
>> (you can find it on the front page right top)
>> If you have a moment, please provide
>> your experience (results are available as well).
> 
> We've seen some reasonable speedup (12x) with some matlab code using Jacket. 
> It required up-to-the-minute bugfixes/enhancements from Accelereyes to get it 
> working though.  Ran into lots of limitations with some other code (sparse 
> matrices) that prevented it from being usable.  Have some reports of success 
> with gpulib and IDL.
> 
> 

I've installed 4 GPU-equipped servers in my environment; 2 are a part of
my cluster, and 2 are independent from the cluster so that users can
login interactively and program/debug/tinker/whatever. (My cluster
doesn't allow interactive logins by design).

A handful of users were interested in getting access to the GPUs, but so
far, not a single one has even logged into these systems to kick the
tires yet, and the systems have been online for approx. 9 months. It
just be that they're busy with other work. Most of my users are
post-docs who guide their own research, so they can create/modify their
own project schedules as they see fit.


-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From Shainer at Mellanox.com  Fri Apr  1 18:41:07 2011
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Fri, 1 Apr 2011 15:41:07 -0700
Subject: [Beowulf] AMD 8 cores vs 12 cores CPUs and Infiniband
References: <1301387847.1995.144.camel@mundo><9FA59C95FFCBB34EA5E42C1A8573784F037FD082@mtiexch01.mti.com>
	<Pine.LNX.4.64.1103292349500.16437@coffee.psychology.mcmaster.ca>
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F037FD5B7@mtiexch01.mti.com>

 
> > I have been using single card on Magny-Cours with no issues at all.
> You can
> 
> interesting.  what adjustments have you made to the MPI stack to
permit
> this?
> we've had a variety of apps that fail intermittently on high-core
> nodes.
> I have to say I was surprised such a thing came up - not sure whether
> it's
> inherent to IB or a result of the openmpi stack.  our usual way to
test
> this is to gradually reduce the ranks-per-node for the job until it
> starts
> to work.  an interesting cosmology code works at 1 pppn but not 3 ppn
> on our recent 12c MC, mellanox QDR cluster.


I will be more than happy to give it a try - have access to the
Magny-Cours system at
http://www.hpcadvisorycouncil.com/cluster_center.php


> 
> regards, mark hahn.
> _______________________________________________
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From akshar.bhosale at gmail.com  Sat Apr  2 08:41:07 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Sat, 2 Apr 2011 18:11:07 +0530
Subject: [Beowulf] error in job; jobs failing
Message-ID: <AANLkTikCOnF1a1hCZ2wmz5ETanwDg1dL5jvNbSuhCAZZ@mail.gmail.com>

Hi,
we are getting dapl 4003 event error. We have rhel 5.2 x64 and intel mpi
library 4.3;dapl-1.2.7-1.ofed1.3.1;
What can be the reason? we have torque and pbs setup for job runs.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110402/e2fdef3c/attachment-0001.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From herbert.fruchtl at st-andrews.ac.uk  Mon Apr  4 11:15:35 2011
From: herbert.fruchtl at st-andrews.ac.uk (Herbert Fruchtl)
Date: Mon, 04 Apr 2011 16:15:35 +0100
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <mailman.1.1301684401.12568.beowulf@beowulf.org>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
Message-ID: <4D99E097.7060807@st-andrews.ac.uk>

They hear great success stories (which in reality are often prototype 
implementations that do one carefully chosen benchmark well), then look at the 
API, look at their existing code, and postpone the start of their project until 
they have six months spare time for it. And we know when that is.

The current approach with more or less vendor specific libraries (be they "open" 
or not) limits the uptake of GPU computing to a few hardcore developers of 
experimental codes who don't mind rewriting their code every two years. It won't 
become mainstream until we have a compiler that turns standard Fortran (or C++, 
if it has to be) into GPU code. Anything that requires more change than let's 
say OpenMP directives is doomed, and rightly so.

   Herbert

>
> I've installed 4 GPU-equipped servers in my environment; 2 are a part of
> my cluster, and 2 are independent from the cluster so that users can
> login interactively and program/debug/tinker/whatever. (My cluster
> doesn't allow interactive logins by design).
>
> A handful of users were interested in getting access to the GPUs, but so
> far, not a single one has even logged into these systems to kick the
> tires yet, and the systems have been online for approx. 9 months. It
> just be that they're busy with other work. Most of my users are
> post-docs who guide their own research, so they can create/modify their
> own project schedules as they see fit.
>
>

-- 
Herbert Fruchtl
Senior Scientific Computing Officer
School of Chemistry, School of Mathematics and Statistics
University of St Andrews
--
The University of St Andrews is a charity registered in Scotland:
No SC013532
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Mon Apr  4 12:01:44 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Mon, 04 Apr 2011 23:01:44 +0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
Message-ID: <4D99EB68.4020800@pathscale.com>

Herbert Fruchtl wrote:
> They hear great success stories (which in reality are often prototype 
> implementations that do one carefully chosen benchmark well), then look at the 
> API, look at their existing code, and postpone the start of their project until 
> they have six months spare time for it. And we know when that is.
>
> The current approach with more or less vendor specific libraries (be they "open" 
> or not) limits the uptake of GPU computing to a few hardcore developers of 
> experimental codes who don't mind rewriting their code every two years. It won't 
> become mainstream until we have a compiler that turns standard Fortran (or C++, 
> if it has to be) into GPU code. Anything that requires more change than let's 
> say OpenMP directives is doomed, and rightly so.
>   
Hi Herbert,

I think your perspective pretty much nails it

(shameless self promotion)
http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)

This is really only the tip of the problem and there must also be 
solutions for scaling *efficiently* across the cluster.  (No MPI + CUDA 
or even HMPP is *not* the answer imho.)

./C
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From james.p.lux at jpl.nasa.gov  Mon Apr  4 12:53:22 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 4 Apr 2011 09:53:22 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47FF0C60D4455@ALTPHYEMBEVSP20.RES.AD.JPL>

You've described it pretty well..

Look how long it took for "standard libraries" to take advantage of things like MPI to become "of course we use that"..

If the original code used standard library calls for things like matrix math, and it's a "drop in" so you could do a "test case" in less than a day or so, you get pretty rapid acceptance.  If it requires weeks to just figure out how to make it work, it's going to be in the "when someone specifically funds me to do it".  

I've seen lots of really interesting things that I'd like to try, but not being independently wealthy or having a patron who is, I have to work on things that other people want done (and, presumably which I also find interesting).  I can write proposals to say "it would be really nice to do X because of speculative benefit Y"  and every once in a while, someone will say, "Yeah, that sounds good, go check it out".  And then we do.  


But it's a long and time consuming process.  For instance, I was just in a presentation last week discussing a recent call for proposals from NASA.. the *shortest* time from proposal to response (yes/no) was around 120 days, the median was around 200 days, and the max was around 400 days plus, depending on the year.
http://science.nasa.gov/researchers/sara/grant-stats/

A lot depends on what happens to the budgets as they wend their leisurely way through the program offices at the agencies, then get rolled up in the President's submission, then thrashed in Congress, then allocated, then back through the agency, and finally back down to the program.  To provide some perspective on the front end of the process, the program managers at the agencies are winding up their PPBE13 submissions (that's for FY13, starting October 2012, although it also affects FY12 funding) 

A "new technology" that hasn't been "on the radar" probably has a 2-3 year lag before significant money can be applied to it (at least from government funding sources).  Often, one can get smaller sums more quickly out of some general "investigate new technologies" kind of bucket (smaller sums = a few $10k), but right now, even those have essentially dried up  (Continuing resolutions, etc.)

To tie this back to the first question.. a few $10k would pay for the "Lets try recompiling with the new library and see if it works" sort of level of effort, but not for a "Let's rewrite our codes for the new hardware, and engage in a validation and verification effort to show that it still works"

James Lux, P.E.
Co-Principal Investigator, CoNNeCT Project
Task Manager, SOMD Software Defined Radios
Flight Communications Systems Section
Jet Propulsion Laboratory
4800 Oak Grove Drive, Mail Stop 161-213
Pasadena, CA, 91109
+1(818)354-2075 phone
+1(818)393-6875 fax 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Herbert Fruchtl
> Sent: Monday, April 04, 2011 8:16 AM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] GP-GPU experience
> 
> They hear great success stories (which in reality are often prototype
> implementations that do one carefully chosen benchmark well), then look at the
> API, look at their existing code, and postpone the start of their project until
> they have six months spare time for it. And we know when that is.
> 
> The current approach with more or less vendor specific libraries (be they "open"
> or not) limits the uptake of GPU computing to a few hardcore developers of
> experimental codes who don't mind rewriting their code every two years. It won't
> become mainstream until we have a compiler that turns standard Fortran (or C++,
> if it has to be) into GPU code. Anything that requires more change than let's
> say OpenMP directives is doomed, and rightly so.
> 
>    Herbert
> 
> >
> > I've installed 4 GPU-equipped servers in my environment; 2 are a part of
> > my cluster, and 2 are independent from the cluster so that users can
> > login interactively and program/debug/tinker/whatever. (My cluster
> > doesn't allow interactive logins by design).
> >
> > A handful of users were interested in getting access to the GPUs, but so
> > far, not a single one has even logged into these systems to kick the
> > tires yet, and the systems have been online for approx. 9 months. It
> > just be that they're busy with other work. Most of my users are
> > post-docs who guide their own research, so they can create/modify their
> > own project schedules as they see fit.
> >
> >
> 
> --
> Herbert Fruchtl
> Senior Scientific Computing Officer
> School of Chemistry, School of Mathematics and Statistics
> University of St Andrews
> --
> The University of St Andrews is a charity registered in Scotland:
> No SC013532
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From mfatica at gmail.com  Mon Apr  4 12:54:37 2011
From: mfatica at gmail.com (Massimiliano Fatica)
Date: Mon, 4 Apr 2011 09:54:37 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D99EB68.4020800@pathscale.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
Message-ID: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>

If you are old enough to remember the time when the first distribute
computers appeared on the scene,
this is a deja-vu. Developers used to program on shared memory (
mostly with directives) were complaining
about the new programming models ( PVM, MPL, MPI).
Even today, if you have a serial code there is no tool that will make
your code runs on a cluster.
Even on a single system, if you try an auto-parallel/auto-vectorizing
compiler on a real code, your results will probably be disappointing.

When you can get a 10x boost on a production code rewriting some
portions of your code to use the GPU, if time to solution is important
or you could perform simulations that were impossible  before ( for
example using algorithms that were just too slow on CPUs,
Discontinuous Galerkin method is a perfect example), there are a lot
of developers that will write the code.
The effort it is clearly dependent of the code, the programmer and the
tool used ( you can go from fully custom GPU code with CUDA or OpenCL,
to automatically generated CUF kernels from PGI, to directives using
HMPP or PGI Accelerator).
In situation where time  to solution relates to money,  for example
oil and gas, GPUs are the answer today ( you will be surprised
by the number of GPUs in Houston).
Look at   the performance and scaling of AMBER ( MPI+ CUDA),
http://ambermd.org/gpus/benchmarks.htm, and tell me that the results
were not worth the effort.

Is GPU programming for everyone: probably not, in the same measure
that parallel programming in not for everyone.
Better tools will lower the threshold, but a threshold will be always present.


Massimiliano
PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran,
applications porting with CUDA, MPI+CUDA).


2011/4/4 "C. Bergstr?m" <cbergstrom at pathscale.com>:
> Herbert Fruchtl wrote:
>> They hear great success stories (which in reality are often prototype
>> implementations that do one carefully chosen benchmark well), then look at the
>> API, look at their existing code, and postpone the start of their project until
>> they have six months spare time for it. And we know when that is.
>>
>> The current approach with more or less vendor specific libraries (be they "open"
>> or not) limits the uptake of GPU computing to a few hardcore developers of
>> experimental codes who don't mind rewriting their code every two years. It won't
>> become mainstream until we have a compiler that turns standard Fortran (or C++,
>> if it has to be) into GPU code. Anything that requires more change than let's
>> say OpenMP directives is doomed, and rightly so.
>>
> Hi Herbert,
>
> I think your perspective pretty much nails it
>
> (shameless self promotion)
> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)
>
> This is really only the tip of the problem and there must also be
> solutions for scaling *efficiently* across the cluster. ?(No MPI + CUDA
> or even HMPP is *not* the answer imho.)
>
> ./C
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:16:31 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:16:31 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <4D2C8B7C.30300@bull.co.uk>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>
	<4D2C8B7C.30300@bull.co.uk>
Message-ID: <C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>

hi,

sometimes i go through a lot of mails at the mailing list here and  
had missed this one.
please keep me up to date and/or add me to mailing lists there.

latency is superior of quadrics compared to all the infini* stuff.
drivers that integrate into kernels - well some modifications  
shouldn't be too hard.

Of course even the realtime linux kernel is rather crappy there, as  
it locks every action
from and to a socket (even RAW/UDP communication in fact),
so you need a 'hack' of that kernel anyway to get faster latencies.

secondhand the quadrics stuff is cheap it seems.

Vincent

On Jan 11, 2011, at 5:55 PM, Daniel Kidger wrote:

> Mark,
>
> I will let others step forward individually.
>
> I was one of the last employees to leave Quadrics , so I do know  
> who had
> support contracts at that time, plus the even larger set of sites that
> had expired support contracts but still were actively running their
> QsNet clusters.
>
> You know that a company called Vega took on the ongoing support? :
> here is the website I set up at the time: https:// 
> support.hpc.vega.co.uk/
>
> I agree too though that there should be a community of QsNet-owning
> enthusiasts, who could provide mutual support in this legacy era.
>
>
> Also off the record, I know that there is a lot of Elan4 stock sitting
> in a warehouse. As long as you are not looking for long term vendor
> support, I expect  you could acquire cards, cables and switches for a
> bargain price.
>
> Daniel
>
>
>> Are you still using Quadrics Elan4-based clusters?
>>
>> We would like to continue using Quadrics on one of our clusters,  
>> since it
>> is still quite good in latency.  Maintaining the Quadrics drivers,  
>> though,
>> is a bit of a pain going forward - would be nice to avoid  
>> duplicating effort,
>> if there are other groups also doing so.
>>
>> please follow up or email me if you are using Elan4, or know anything
>> relevant.
>>
>> thanks,
>> Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http:// 
>> www.sharcnet.ca
>>             | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140  
>> x24687
>>             | Compute/Calcul Canada                | http:// 
>> www.computecanada.org
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>>
>
>
> -- 
> Bull, Architect of an Open World TM
>
> Dr. Daniel Kidger, HPC Technical Consultant
> daniel.kidger at bull.co.uk
> +44 (0) 7966822177
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:20:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:20:15 +0200
Subject: [Beowulf]
 =?iso-8859-1?q?Chinese_supercomputers_to_use_=91homemad?=
 =?iso-8859-1?q?e=92_chips?=
In-Reply-To: <Pine.LNX.4.64.1103110108590.19223@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.1103110108590.19223@coffee.psychology.mcmaster.ca>
Message-ID: <D25DC520-6373-4BF7-A35C-C2F597F3BE0A@xs4all.nl>


On Mar 11, 2011, at 7:20 AM, Mark Hahn wrote:

>> Interesting:
>> Chinese supercomputers to use ?homemade? chips
>> http://timesofindia.indiatimes.com/tech/personal-tech/computing/ 
>> Chinese-supercomputers-to-use-homemade-chips/articleshow/7655183.cms
>
> it's important to remind ourselves that China is still a centrally- 
> planned,
> totalitarian dictatorship.  I mention this only because this  
> announcement
> is a bit like Putin et al announcing that they'll develop their own  
> linux distro because Russia is big and important and mustn't allow  
> itself to be vulnerable to foreign hegemony.
>
> so far, the very shallow reporting I've seen has said that future  
> generations will add wide FP vector units.  nothing wrong with that,
> though it's a bit unclear to me why other companies haven't done it
> if there is, in fact, lots of important vector codes that will run  
> efficiently on such a configuration.  adding/widening vector FP is  
> not breakthrough engineering afaikt.
>
> has anyone heard anything juicy about the Tianhe interconnect? 
> _______________________________________________

Not really but busy with an AMD-GPU now the 6970 (note the 6990 also  
is available having 2 gpu's) is so fast
that the real problem is bandwidth from and to the gpu; so for a big  
cluster calculation i can understand very well
the need for having your own interconnect, especially as they get  
produced in china anyway.

the cpu's you also need bigtime, but as i'm going to react onto a  
special GPU posting anyway let's move it to that subject.

> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 15:26:43 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 21:26:43 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
Message-ID: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>

you can forget about getting much info other than marketing data.

the companies and orgainsations that already calculate for years at  
gpu's
they are really good in keeping their mouth shut.

But if you realize that even with 16 fast AMD cores (which for this  
specific
prime number code are a LOT FASTER in ipc than any other x64 chip),
a box built cheap second hand by the way as it's 4 x 8356 are needed
to feed just 1 gpu, you start to realize the real problem.

GPU's completely annihilate cpu's everywhere.

The limitation is the bandwidth to the gpu, though i didn't fully  
test that
bandwidth yet.

The 6000 series from AMD has much improved multiplication logics,  
like 2.5x faster
than the previous generation and it'll take some time to optimize  
this code for it.

streamcores for a while got renamed to PE's nowadays, processing  
elements,
and it has 1536 per gpu.

The 6990 has 2 of 'em.

It took a while for a good driver for these gpu's. Last days of  
januari it was there.
AMD-CAL works great here now.

There is not much diff with CUDA, other than proprietary ways of how  
to access things
and limbs and a few function calls.

Programming is similar.

818 execution units that can do multiplication 32 x 32 bits == 64 bits.

That kicks butt. bye bye cpu's.


On Mar 21, 2011, at 1:51 PM, Douglas Eadline wrote:

>
> I was recently given a copy of "GPU Computing Gems"
> to review. It is basically research quality NVidia success
> stories, some of which are quite impressive.
>
> I got to thinking about how others are fairing (or not)
> with GP-GPU technology. I put up a simple poll on
> ClusterMonkey to help get a general idea.
> (you can find it on the front page right top)
> If you have a moment, please provide
> your experience (results are available as well).
>
>   http://www.clustermonkey.net/
>
> BTW: You can see all the previous polls
> and links to other market data  here:
>
> http://goo.gl/lDcUJ
>
>
> --
> Doug
>
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 16:07:31 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 22:07:31 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
Message-ID: <B917A553-E23E-4F10-8B09-C055E2969E82@xs4all.nl>


On Apr 4, 2011, at 6:54 PM, Massimiliano Fatica wrote:

> If you are old enough to remember the time when the first distribute
> computers appeared on the scene,
> this is a deja-vu. Developers used to program on shared memory (
> mostly with directives) were complaining
> about the new programming models ( PVM, MPL, MPI).
> Even today, if you have a serial code there is no tool that will make
> your code runs on a cluster.
> Even on a single system, if you try an auto-parallel/auto-vectorizing
> compiler on a real code, your results will probably be disappointing.
>
> When you can get a 10x boost on a production code rewriting some
> portions of your code to use the GPU, if time to solution is important

Oh comeon factor 10 is not realistic.

You're doing the usual compare here of a hobby coder who coded
a tad in C or slowish C++ (except for a SINGLE, so not several,
NCSA coder i'll have to find the first C++ guy who can write codes
equally fast to C for complex algorithms - granted for big companies
C++ makes more sense, just not when it's about performance)
and then compare that with a full blown sponsored project in CUDA
that uses the topend gpu and compare it versus a single core
instead of 4 sockets (as that's powerwise the same).

Moneywise of course is another issue, that's where the gpu's win it
bigtime.

Yet there is a hidden cost in gpu's, that's you can build something way
faster for less money with gpu's, but you also need to pay for a good
coder to write your code in either CUDA or AMD-CAL (or as the chinese
seem to support both at the same time, which is not so complicated
if you have setup things in the correct manner).

This last is a big problem for the western world; governments pay big
bucks for hardware, but paying good coders what they are worth they
seem to forget.

Secondly there is another problem, that's that NVIDIA hasn't even  
released
the instructoin set of their GPU. Try to figure that out without  
fulltime work for it.

It seems however pretty similar to AMD, despite other huge architectural
differences between the 2; the programming similarity is striking and  
selfexplains
the real purpose where they got designed for (GRAPHICS).

> or you could perform simulations that were impossible  before ( for
> example using algorithms that were just too slow on CPUs,

All true yet it takes a LOT OF TIME to write something that's fast on  
a gpu.

First of all you have to not write double precision code, as the  
gamers card
from nvidia seem to not have much double precision logic, they only have
32 bits logics.

So at double precision, AMD is like 10 times faster in money per  
gflop than Nvidia.

Yet try to figure that out without being fulltime busy with those gpu's.
Only the TESLA versions have those transistors it seems.

Secondly Nvidia seems to keep being busy maximizing the frequency of
the gpu.

Now that might be GREAT for games as high clocked cores work (see  
intel),
yet for throughput of course that's a dead end. In raw throughput  
AMD's (ATI's)
approach will always win it of course from nvidia, as clocking a  
processor
higher has a O ( n ^ 3 ) impact on power consumption.

Now a big problem with nvidia is also that they basically go over spec.
I didn't really figure it out, yet it seems pci-e got designed with  
300 watt in mind max.

Yet at this code i'm busy with, the CUDA version of it (mfaktc)  
consumes a whopping 400+ watt
and please realize that majority of the system time is only keeping  
the streamcores busy
and not caches at all nor much of a RAM.

It's only doing multiplications of course at full speed in 32 bits  
code, using the new Fermi's
instructions that allows multiplying 32 bits x 32 bits == 64 bits.

CUDA version of your code gets developed btw by a guy working for a  
HPC vendor
which, i guess, also sells those Tesla's.

So any performance bragging sure must keep in mind it's far over 33%  
over the specs in
terms of power consumption.

Note AMD seems to follow nvidia in its path there.

> Discontinuous Galerkin method is a perfect example), there are a lot
> of developers that will write the code.

Oh comeon, writing for gpu's is really complicated.

> The effort it is clearly dependent of the code, the programmer and the
> tool used ( you can go from fully custom GPU code with CUDA or OpenCL,

Forget OpenCL, not good enough.

Better to code in CUDA and AMD-CAL at the same time something.

> to automatically generated CUF kernels from PGI, to directives using
> HMPP or PGI Accelerator).
> In situation where time  to solution relates to money,  for example
> oil and gas, GPUs are the answer today ( you will be surprised
> by the number of GPUs in Houston).

Pardon me, those industries already were using vectorized solutoins  
long before CUDA was
there and are using massively GPU's to calculate of course as soon as  
nvidia released
a version that was programmable.

This is not new. All those industries will of course never say  
anything on the performance
nor how many they use.

> Look at   the performance and scaling of AMBER ( MPI+ CUDA),
> http://ambermd.org/gpus/benchmarks.htm, and tell me that the results
> were not worth the effort.
>
> Is GPU programming for everyone: probably not, in the same measure
> that parallel programming in not for everyone.
> Better tools will lower the threshold, but a threshold will be  
> always present.
>

I would argue that both AMD as well as Nvidia has really tried to  
give the 3d world nations an advantage
by stopping progress in the rich nations.

I will explain. The real big advantage of rich nations is that  
average persons have more cash.
Students are a good example there. They can afford gpu's easily.

Yet there is so little technical information available on latencies  
and in case of nvidia on instructoin set that
the gpu's support, that this gives a huge programming hurdle for  
students.

Also there is no good tips in nvidia documents how to program for  
those things.

The most fundamental lessons how to program a gpu i miss in all  
documents i scanned so far.

It's just a bunch of 'lectures' that's not going to create any  
topcoders.

A piece of information here and a tad there.
Very bad.

AMD also is a nightmare there, they can't even run more than 1  
program at the same time, despite claims
that the 4000 series gpu's already had hardware support to do it. The  
indian helpdesk in fact is so lazy that
they didn't even rename the word 'ati' in the documentation to AMD,  
and the library each few months gets a
new name. Stream SDK now it's another new fancy name. "we worked hard  
in India sahib, yes sahib, yes sahib".

Yet 5 years later still not much works. For example in opencl also  
the 2nd gpu doesn't work in case of AMD.
Result "undefined". Nice.

Default driver install at inux here doesn't get openCL to work in  
fact at the 6970.

Both nvidia as well as AMD are a total joke there and by means of  
incompetence,
the generic incompetence being complete and clear documentation just  
like we have documention on how
cpu's work. Be it intel or AMD or IBM.

Students who program now for those gpu's in CUDA or AMD-CAL, they  
will have to go to hell and back to get something
to work well on it, except some trivial stuff that works well at it.

We see that just a few manage.

That's not a problem of the students, but a problem for society,  
because doing calculations faster and especially
CHEAP, is a huge advantage to progress science.

NSA type organisations in 3d world nations are a lot bigger than  
here, simply because more people live there.
So right now more people over there code for gpu's than here, here  
where everyone can afford one.

Some big companies excepted of course, but this is not a small note  
on companies. This is a note on 1st world versus 3d
world. The real difference is students with budget over here.

They have budget for gpu's, yet there is no good documentation simply  
giving which instructions a gpu has let alone which
latencies.

If you google hard, you will find 1 guy who actually by means of  
measuring had to measure the latencies of simple
instructions that write to the same register. Why did an university  
guy need to measure this, why isn't this simply
in Nvidia documentation?

A few of those things will of course have majority, vaste vaste  
majority of students trying something on a gpu, completely fail.

Because they fail, they don't continue there and don't get back from  
those gpu's a faster running code that gives them
something very important: faster calculation speed for whatever they  
wanted to run.

This is where AMD and Nvidia, and i politely call it by means of  
incompetence, gives the rich nations no advantage
over the 3d world nations, as the students need to be compeltely  
fulltime busy to obtain knowledge on the internal workings
of the gpu's in order to get something going fast at them. Majority  
will fail therefore of course, which has simply avoided
gpu's from getting massively adapted.

I've seen so many students try and fail at gpu programming,  
especially CUDA.

It's bizarre. The fail % is so huge. Even a big succes doesn't get  
recognized as a big succes,
simply because the guy didn't know about a few bottlenecks in gpu  
programming, as no manual told him
the combination of problems he ran into, as there was no technical  
data available.

It is true gpu's can be fast, but i feel there is a big need for  
better technical documentation of them.
We can no longer ignore this now that 3d world nations are  
overrunning 1st world nations. Mainly
because the sneaky organisations that do know everything are of  
course bigger over there than here, by means of
population size. This where the huge advantage of the rich nations,  
namely that every student has such gpu
at home, is not getting taken advantage from as the hurdle to gpu  
programming is too high by means of lack of
accurate documentation. Of course in 3d world nations they have at  
most a mobile phone, and very very seldom a laptop (except for the  
rich elite), let alone a computer with a capable programmable gpu,  
which makes it impossible for majority
of 3d world nations students to do any gpu computation because of a  
shortage in cash.

>
> Massimiliano
> PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran,
> applications porting with CUDA, MPI+CUDA).
>
>
> 2011/4/4 "C. Bergstr?m" <cbergstrom at pathscale.com>:
>> Herbert Fruchtl wrote:
>>> They hear great success stories (which in reality are often  
>>> prototype
>>> implementations that do one carefully chosen benchmark well),  
>>> then look at the
>>> API, look at their existing code, and postpone the start of their  
>>> project until
>>> they have six months spare time for it. And we know when that is.
>>>
>>> The current approach with more or less vendor specific libraries  
>>> (be they "open"
>>> or not) limits the uptake of GPU computing to a few hardcore  
>>> developers of
>>> experimental codes who don't mind rewriting their code every two  
>>> years. It won't
>>> become mainstream until we have a compiler that turns standard  
>>> Fortran (or C++,
>>> if it has to be) into GPU code. Anything that requires more  
>>> change than let's
>>> say OpenMP directives is doomed, and rightly so.
>>>
>> Hi Herbert,
>>
>> I think your perspective pretty much nails it
>>
>> (shameless self promotion)
>> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
>> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
>> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to  
>> source)
>>
>> This is really only the tip of the problem and there must also be
>> solutions for scaling *efficiently* across the cluster.  (No MPI +  
>> CUDA
>> or even HMPP is *not* the answer imho.)
>>
>> ./C
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 16:20:02 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 16:20:02 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>

> GPU's completely annihilate cpu's everywhere.

this is complete nonsense.  GPUs do very nicely on a quite narrow 
set of problems.  for a somewhat larger set of problems, they do OK,
but pretty "meh", really, considering.  for many problems, GPUs 
are irrelevant, whether that's because the problem uses too much 
memory, or already scales well on non-GPU, or doesn't have a GPU-friendly
structure.

> 818 execution units that can do multiplication 32 x 32 bits == 64 bits.
> That kicks butt. bye bye cpu's.

well, for your application, which is quite narrow.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 16:34:19 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 4 Apr 2011 22:34:19 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
Message-ID: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>


On Apr 4, 2011, at 10:20 PM, Mark Hahn wrote:

>> GPU's completely annihilate cpu's everywhere.
>
> this is complete nonsense.  GPUs do very nicely on a quite narrow  
> set of problems.  for a somewhat larger set of problems, they do OK,
> but pretty "meh", really, considering.  for many problems, GPUs are  
> irrelevant, whether that's because the problem uses too much  
> memory, or already scales well on non-GPU, or doesn't have a GPU- 
> friendly
> structure.
>
>> 818 execution units that can do multiplication 32 x 32 bits == 64  
>> bits.
>> That kicks butt. bye bye cpu's.
>
> well, for your application, which is quite narrow.

Which is about any relevant domain where massive computation takes  
place.
The number of algorithms that really profit bigtime from a lot of  
RAM, in some cases you can also
replace by massive computation and a tad of memory, the cases where  
that cannot be the case
are very rare. For those few cases you order a few nodes with massive  
RAM rather than big cpu power.

yet majority of HPC calculations, especially if we add company codes  
there, the simulators and the oil,
gas, car and aviation industry.

So that makes 95% of all codes just need massive cpu power and can  
get away with relative small RAM sizes
per compute unit. Not to confuse btw with a compute unit of AMD as  
that is just a small part of a gpu, speaking
of redefinitions :)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 17:54:00 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 17:54:00 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>

>> well, for your application, which is quite narrow.
>
> Which is about any relevant domain where massive computation takes place.

you are given to hyperbole.  the massive domains I'm thinking of
are cosmology and explicit quantum condensed-matter calculations.
the experts in those fields I talk to both do use massive computation
and do not expect much benefit from GPUs.

> The number of algorithms that really profit bigtime from a lot of RAM, in 
> some cases you can also
> replace by massive computation and a tad of memory, the cases where that 
> cannot be the case
> are very rare.

no.  you are equating "uses lots of ram" with "uses memoization".

> yet majority of HPC calculations, especially if we add company codes there, 
> the simulators and the oil,
> gas, car and aviation industry.

jeez.
nevermind I said anything.  I'd forgotten about your style.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Mon Apr  4 18:10:44 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 00:10:44 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
	<Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
Message-ID: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>


On Apr 4, 2011, at 11:54 PM, Mark Hahn wrote:

>>> well, for your application, which is quite narrow.
>>
>> Which is about any relevant domain where massive computation takes  
>> place.
>
> you are given to hyperbole.  the massive domains I'm thinking of
> are cosmology and explicit quantum condensed-matter calculations.
> the experts in those fields I talk to both do use massive computation
> and do not expect much benefit from GPUs.

Even the field you give as an example: quantum mechanica:
Vaste majority of quantum mechanica calculations are massive matrix  
calculations.

Furthermore i didn't take a look to the field you're speaking about.
I did however take a look to 1 other quantum mechanica calculation,
where someone used 1 core of his quadcore box and massive RAM.

It took me 1 afternoon to explain the guy how to trivially use all 4  
cores doing that calculation
using the same RAM buffer.

You realize that you also can do combined calculations?

Just have a new chipset with big bandwidth to gpu, at cpu's, based  
upon a big RAM buffer, prepare
batches, ship batch to gpu, do tough calculation work on the gpu,  
ship results back.

That's how many use those gpu's.

My attempt to write a sieve directly into the gpu in order to do  
everything inside the gpu,
is of a different league sir than where you are talking.

Your kind of talking is: "there are no tanks in the city, we will  
drive all tanks out of the city, so that only
our cpu's are left again".

Those days are over. Just get creative and find a way to do it at a gpu.

I parallellized 1 quantum mechanica calculation there; i wasn't paid  
for that.
Just pay someone to useful use a GPU. If it ain't easy it doesn't  
mean it's impossible.

Most quantum mechanica guys might be brilliant in their field, in  
manners how to parallellize things
without losing their branching factor that a huge RAM buffer gives,  
they didn't figure out simply yet.

Now it won't be easy to solve for every field; but being a speedfreak  
and in advance saying some faster type of
hardware cannot be used is just monkeytalk. Go get clever and solve  
the problem. Find solutions, don't see just
problems.

>
>> The number of algorithms that really profit bigtime from a lot of  
>> RAM, in some cases you can also
>> replace by massive computation and a tad of memory, the cases  
>> where that cannot be the case
>> are very rare.
>
> no.  you are equating "uses lots of ram" with "uses memoization".
>
>> yet majority of HPC calculations, especially if we add company  
>> codes there, the simulators and the oil,
>> gas, car and aviation industry.
>
> jeez.
> nevermind I said anything.  I'd forgotten about your style.

Read the statistics on the reports what eats system time sir.
You have access to those papers as well if you know how to google.

Regards,
Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr  4 18:20:08 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Apr 2011 18:20:08 -0400 (EDT)
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>
	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>
	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>
	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>
	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<Pine.LNX.4.64.1104041616580.31929@coffee.psychology.mcmaster.ca>
	<7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl>
	<Pine.LNX.4.64.1104041748230.31929@coffee.psychology.mcmaster.ca>
	<7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl>
Message-ID: <Pine.LNX.4.64.1104041815010.31929@coffee.psychology.mcmaster.ca>

>>>> well, for your application, which is quite narrow.
>>> 
>>> Which is about any relevant domain where massive computation takes place.
>> 
>> you are given to hyperbole.  the massive domains I'm thinking of
>> are cosmology and explicit quantum condensed-matter calculations.
>> the experts in those fields I talk to both do use massive computation
>> and do not expect much benefit from GPUs.
>
> Even the field you give as an example: quantum mechanica:
> Vaste majority of quantum mechanica calculations are massive matrix 
> calculations.

yes, specifically very large sparse eigensystems.  do you have an example
of effectively using GPUs for this?

> Furthermore i didn't take a look to the field you're speaking about.
> I did however take a look to 1 other quantum mechanica calculation,
> where someone used 1 core of his quadcore box and massive RAM.

sorry, I'm talking thousands of cores, ideally with > 4GB/core.

> It took me 1 afternoon to explain the guy how to trivially use all 4 cores 
> doing that calculation
> using the same RAM buffer.

the point is that lots of serious science uses MPI already,
and doesn't care much about GPUs.  if they were free, sure,
they might be interesting.

> My attempt to write a sieve directly into the gpu in order to do everything 
> inside the gpu,
> is of a different league sir than where you are talking.

bully for you.  your application is a niche.

> Your kind of talking is: "there are no tanks in the city, we will drive all 
> tanks out of the city, so that only
> our cpu's are left again".

nonsense.  I'm saying that GPUs are a nice, specialized accelerator.
you can't have them without hosts, so you need to compare host vs host+GPU.

> Those days are over. Just get creative and find a way to do it at a gpu.

don't be silly.  GPUs have weaknesses as well as strengths.  packaging 
and system design is one of the minor sticking points with GPUs.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From lindahl at pbm.com  Tue Apr  5 01:22:39 2011
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 4 Apr 2011 22:22:39 -0700
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
Message-ID: <20110405052239.GA6130@bx9.net>

On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:

> If you are old enough to remember the time when the first distribute
> computers appeared on the scene,
> this is a deja-vu.

Not to mention the prior appearance of array processors. Oil+Gas
bought a lot of those, too. Some important radio astronomy data
reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
was 10X faster than the VAX by itself. Then microprocessor-based
workstations arrived, and the game was over, ease of use FTW.

> Even on a single system, if you try an auto-parallel/auto-vectorizing
> compiler on a real code, your results will probably be disappointing.

The wins from such compilers have been steadily decreasing, as main
memory gets farther and farther away from the CPU and caches.

-- greg


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From beat at 0x1b.ch  Tue Apr  5 01:52:41 2011
From: beat at 0x1b.ch (Beat Rubischon)
Date: Tue, 05 Apr 2011 07:52:41 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>	<4D2C8B7C.30300@bull.co.uk>
	<C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
Message-ID: <4D9AAE29.5090207@0x1b.ch>

Hi Vincent!

On 04.04.11 21:16, Vincent Diepeveen wrote:
> latency is superior of quadrics compared to all the infini* stuff.

Quadrics was great stuff - but it was outperformed once Mellanox invited
their ConnectX chips. Additional the Quadrics team never got their PCIe
chips (QSnet III) to fly. Finally the company closed their doors in may 09.

I really liked their hard- and software. But the time is over...

> Of course even the realtime linux kernel is rather crappy there, as
> it locks every action from and to a socket (even RAW/UDP
> communication in fact), so you need a 'hack' of that kernel anyway to
> get faster latencies.

When talking about Interconnects the kernel is not involved in
communication. Any context switch is avoided to keep the overhead small.
This basically means a real time kernel isn't needed as it would not
give you any additional benefit.

Beat

-- 
     \|/                           Beat Rubischon <beat at 0x1b.ch>
   ( 0-0 )                             http://www.0x1b.ch/~beat/
oOO--(_)--OOo---------------------------------------------------
Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 03:51:00 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 09:51:00 +0200
Subject: [Beowulf] Quadrics?
In-Reply-To: <4D9AAE29.5090207@0x1b.ch>
References: <Pine.LNX.4.64.1101111055290.21366@coffee.psychology.mcmaster.ca>	<4D2C8B7C.30300@bull.co.uk>
	<C3A25EFE-EFE4-4D54-B3F5-C17BCA2C0BA7@xs4all.nl>
	<4D9AAE29.5090207@0x1b.ch>
Message-ID: <75CD2C36-0B25-4CD2-B3F8-2645BE1A72DC@xs4all.nl>


On Apr 5, 2011, at 7:52 AM, Beat Rubischon wrote:

> Hi Vincent!
>
> On 04.04.11 21:16, Vincent Diepeveen wrote:
>> latency is superior of quadrics compared to all the infini* stuff.
>
> Quadrics was great stuff - but it was outperformed once Mellanox  
> invited
> their ConnectX chips. Additional the Quadrics team never got their  
> PCIe
> chips (QSnet III) to fly. Finally the company closed their doors in  
> may 09.
>
> I really liked their hard- and software. But the time is over...
>

of course there is new great pci-e solutions, yet the price per port  
there is
bigger than entire machine with latest gpu, that's a big problem to  
make cheap
clusters.

If you buy a cheap 6 core box of 350 euro then a new generation gpu is
318 euro or so  (that's a HD6970).

What's node price of the network?

>> Of course even the realtime linux kernel is rather crappy there, as
>> it locks every action from and to a socket (even RAW/UDP
>> communication in fact), so you need a 'hack' of that kernel anyway to
>> get faster latencies.
>
> When talking about Interconnects the kernel is not involved in
> communication. Any context switch is avoided to keep the overhead  
> small.
> This basically means a real time kernel isn't needed as it would not
> give you any additional benefit.

realtime kernel keeps other worst cases down bigtime, especially with  
respect to
scheduling.

>
> Beat
>
> -- 
>      \|/                           Beat Rubischon <beat at 0x1b.ch>
>    ( 0-0 )                             http://www.0x1b.ch/~beat/
> oOO--(_)--OOo---------------------------------------------------
> Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 03:58:47 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 09:58:47 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <20110405052239.GA6130@bx9.net>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
	<20110405052239.GA6130@bx9.net>
Message-ID: <F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>


On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote:

> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:
>
>> If you are old enough to remember the time when the first distribute
>> computers appeared on the scene,
>> this is a deja-vu.
>
> Not to mention the prior appearance of array processors. Oil+Gas
> bought a lot of those, too. Some important radio astronomy data
> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
> was 10X faster than the VAX by itself. Then microprocessor-based
> workstations arrived, and the game was over, ease of use FTW.
>
>> Even on a single system, if you try an auto-parallel/auto-vectorizing
>> compiler on a real code, your results will probably be disappointing.
>
> The wins from such compilers have been steadily decreasing, as main
> memory gets farther and farther away from the CPU and caches.
>
> -- greg

It's different this time indeed; classic cpu's will never again  
deliver big performance.

cache - coherency is simply too complicated with many cores.
cpu's also will need a manycore co-processor therefore.

furthermore manycores simply are cheaper to produce and they can eat  
a bigger powerbudget.

3 very powerful arguments which regrettably limits cpu's, but that's  
the price we pay for progress.

It won't mean cpu's will go away of course any soon, they're so  
generic and easy to program that
they will survive. Just offload the calculations to the manycores.

please don't estimate the argument of cheaper to produce.


>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 04:04:35 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 10:04:35 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>
References: <mailman.1.1301684401.12568.beowulf@beowulf.org>
	<4D99E097.7060807@st-andrews.ac.uk>
	<4D99EB68.4020800@pathscale.com>
	<BANLkTi=O+Sdfrhg2PvYRO2avmAqBHjONNg@mail.gmail.com>
	<20110405052239.GA6130@bx9.net>
	<F38D3DC1-3A9F-48ED-B183-1C687CE5B44C@xs4all.nl>
Message-ID: <B602B9D0-76A9-4F78-8CE7-F596CE76D946@xs4all.nl>


On Apr 5, 2011, at 9:58 AM, Vincent Diepeveen wrote:

>
> On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote:
>
>> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote:
>>
>>> If you are old enough to remember the time when the first distribute
>>> computers appeared on the scene,
>>> this is a deja-vu.
>>
>> Not to mention the prior appearance of array processors. Oil+Gas
>> bought a lot of those, too. Some important radio astronomy data
>> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B
>> was 10X faster than the VAX by itself. Then microprocessor-based
>> workstations arrived, and the game was over, ease of use FTW.
>>
>>> Even on a single system, if you try an auto-parallel/auto- 
>>> vectorizing
>>> compiler on a real code, your results will probably be  
>>> disappointing.
>>
>> The wins from such compilers have been steadily decreasing, as main
>> memory gets farther and farther away from the CPU and caches.
>>
>> -- greg
>

Early Morning oh oh oh oh, apologies the context might be clear yet
the sentences were written down wrong.

> It's different this time indeed; classic cpu's will never again
> deliver big performance.
>

ack

> cache - coherency is simply too complicated with many cores.


1) Cache-coherency is too complicated for CPU's

> cpu's also will need a manycore co-processor therefore.
>

ack

> furthermore manycores simply are cheaper to produce and they can eat
> a bigger powerbudget.
>

ack

> 3 very powerful arguments which regrettably limits cpu's, but that's
> the price we pay for progress.
>

ack

> It won't mean cpu's will go away of course any soon, they're so
> generic and easy to program that
> they will survive. Just offload the calculations to the manycores.
>

ack

> please don't estimate the argument of cheaper to produce.
>
>

please don't 	UNDERESTIMATE the argument of cheaper to produce

only 6 out of 8 score = 75% sharp in the morning

>
>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Tue Apr  5 05:10:28 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 05 Apr 2011 19:10:28 +1000
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
Message-ID: <4D9ADC84.7030804@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/04/11 05:26, Vincent Diepeveen wrote:

> GPU's completely annihilate cpu's everywhere.

Great!  Where can I get one with 1TB of on-card RAM to
keep our denovo reassembly people happy ?

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0
POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t
=OMhq
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Tue Apr  5 09:05:19 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 5 Apr 2011 15:05:19 +0200
Subject: [Beowulf] GP-GPU experience
In-Reply-To: <4D9ADC84.7030804@unimelb.edu.au>
References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1103032343240.24334@coffee.psychology.mcmaster.ca>	<E1PvvZz-0003ng-00.mikky_m-mail-ru@f121.mail.ru>	<Pine.LNX.4.64.1103072315300.8438@coffee.psychology.mcmaster.ca>	<46317.192.168.93.213.1299685688.squirrel@mail.eadline.org>	<45254.192.168.93.213.1300711866.squirrel@mail.eadline.org>
	<DCF6BCEC-CFC3-4EB8-84FE-28DB66B0F246@xs4all.nl>
	<4D9ADC84.7030804@unimelb.edu.au>
Message-ID: <2538ED2A-7F07-4524-B74E-6F0AE623916E@xs4all.nl>


On Apr 5, 2011, at 11:10 AM, Christopher Samuel wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/04/11 05:26, Vincent Diepeveen wrote:
>
>> GPU's completely annihilate cpu's everywhere.
>
> Great!  Where can I get one with 1TB of on-card RAM to
> keep our denovo reassembly people happy ?

There is already several projects in that area that tried incorporate  
GPU's and with succes.

Just google a bit, i got bunches of hits from all sorts of research  
institutes in that area,
most already over 2 years old, nothing new there.

Your reaction just shows your ignorance there.

Regards,
Vincent


>
> - --
>     Christopher Samuel - Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>          http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0
> POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t
> =OMhq
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Wed Apr  6 06:58:12 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Wed, 6 Apr 2011 11:58:12 +0100
Subject: [Beowulf] Westmere EX
Message-ID: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/

10 core Westmere EX on an eight socket box = 80 cores 
These would be a very nice machine.
Anyone know if machines like this will be built?
Do the sockets have enough Quickpath links to create an 8-way topology?


John Hearns | CFD Hardware Specialist | McLaren Racing Limited
McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

T:  +44 (0) 1483 261000
D:  +44 (0) 1483 262352
F:  +44 (0) 1483 261010
E:  john.hearns at mclaren.com
W:  www.mclaren.com


The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From brice.goglin at gmail.com  Wed Apr  6 07:05:55 2011
From: brice.goglin at gmail.com (Brice Goglin)
Date: Wed, 06 Apr 2011 13:05:55 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9C4913.10802@gmail.com>

Le 06/04/2011 12:58, Hearns, John a ?crit :
> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores 
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way topology?
>   

You only have 4 QPI links per sockets, no way to connect the entire graph.

Supermicro already announced such 8-way machines. See their QPI topology
on page 30 of the motherboard manual available at
http://www.supermicro.com/products/motherboard/Xeon7000/7500/X8OBN-F.cfm

Brice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cap at nsc.liu.se  Wed Apr  6 11:41:18 2011
From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=)
Date: Wed, 6 Apr 2011 17:41:18 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <201104061741.18972.cap@nsc.liu.se>

On Wednesday, April 06, 2011 12:58:12 pm Hearns, John wrote:
> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
> 
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way topology?
> 
> 
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

The HP DL980 is an 8 socket EX box but it's not glue-less (it uses HPs own 
numa interconnect). If you stuff 'em full of dimms then they're probably 
competitive with the 4 socket 580 (assuming the 980 uses 8G dimms instead of 
16G for the 580...).

/Peter
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Wed Apr  6 14:00:17 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 6 Apr 2011 20:00:17 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>


On Apr 6, 2011, at 12:58 PM, Hearns, John wrote:

> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.
> Anyone know if machines like this will be built?
> Do the sockets have enough Quickpath links to create an 8-way  
> topology?

What do you intend to use the machines for?
For a chessprogram they would be great, but none of those guys has  
the cash to pay for these
machines.

For financial world it would be a waste of money as well as the  
latency probably will be very very bad.

They seem to get equipped with a max of 512GB ram, not really much  
for those who badly need a lot of RAM,
if we consider the price of such a configured machine.

Same price like a power7.

>
>
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
>
> T:  +44 (0) 1483 261000
> D:  +44 (0) 1483 262352
> F:  +44 (0) 1483 261010
> E:  john.hearns at mclaren.com
> W:  www.mclaren.com
>
>
>
>
> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Wed Apr  6 14:12:56 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Wed, 6 Apr 2011 19:12:56 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
Message-ID: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> What do you intend to use the machines for?
> For a chessprogram they would be great, but none of those guys has
> the cash to pay for these
> machines.


The Supermicro board which Bruce Goglin refers to is said to support
16gbytes DIMMS.

Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
1024 Gbyte machine,
plus you can cook your dinner on it.

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Wed Apr  6 14:18:35 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Thu, 07 Apr 2011 01:18:35 +0700
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
	<207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9CAE7B.8000900@pathscale.com>

Hearns, John wrote:
>> What do you intend to use the machines for?
>> For a chessprogram they would be great, but none of those guys has
>> the cash to pay for these
>> machines.
>>     
>
>
>
> The Supermicro board which Bruce Goglin refers to is said to support
> 16gbytes DIMMS.
>
> Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
> 1024 Gbyte machine,
> plus you can cook your dinner on it.
>   
LOL.. (I have to admit that's kinda funny, but only because it's true)

I didn't look at the specs, but I wonder how many IOPS you could get off 
a ram disk on that thing.. $60k is I believe (I could be wrong) in the 
same ballpark as 1T 1U 1million IOPS appliances (albeit they offer 
persistence and probably consume less power as well)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hearnsj at googlemail.com  Wed Apr  6 18:02:47 2011
From: hearnsj at googlemail.com (John Hearns)
Date: Wed, 6 Apr 2011 23:02:47 +0100
Subject: [Beowulf] Westmere EX
In-Reply-To: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
Message-ID: <BANLkTimxRTWgYzmPb-2EpXhv-F7BJt=ryA@mail.gmail.com>

On 6 April 2011 19:00, Vincent Diepeveen <diep at xs4all.nl> wrote:
>
> On Apr 6, 2011, at 12:58 PM, Hearns, John wrote:
>
>> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/

> What do you intend to use the machines for?

Maybe something like:
http://www.youtube.com/watch?v=x2Z3h_Hx310&NR=1
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Wed Apr  6 20:39:19 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 6 Apr 2011 20:39:19 -0400 (EDT)
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>

> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
>
> 10 core Westmere EX on an eight socket box = 80 cores
> These would be a very nice machine.

shrug.  does anyone have serious experience with real apps on manycore
machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
but they're substantially more exotic/rare/expensive.)

I bet there will be 100x more 4s servers build with these chips than 8s. 
and 1000x more 2s than 4s...

a friend noticed something weird on intel's spec sheets:
http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E

notice it says 32GB max memory size.  even if that means 32GB/socket,
it's not all that much.

I don't know about everyone else, but I'm already bored with core counts ;)
these also seem fairly warm (130W), considering that they're the fancy
new 32nm process and run at modest clock rates...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From joshua_mora at usa.net  Wed Apr  6 20:57:43 2011
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Wed, 06 Apr 2011 19:57:43 -0500
Subject: [Beowulf] Westmere EX
Message-ID: <093PDga5r8464S02.1302137863@web02.cms.usa.net>

_3D_ FFT scaling will allow you to see how well balanced is the system.

Joshua
------ Original Message ------
Received: 07:40 PM CDT, 04/06/2011
From: Mark Hahn <hahn at mcmaster.ca>
To: Beowulf Mailing List <beowulf at beowulf.org>
Subject: Re: [Beowulf] Westmere EX

> > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
> >
> > 10 core Westmere EX on an eight socket box = 80 cores
> > These would be a very nice machine.
> 
> shrug.  does anyone have serious experience with real apps on manycore
> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> but they're substantially more exotic/rare/expensive.)
> 
> I bet there will be 100x more 4s servers build with these chips than 8s. 
> and 1000x more 2s than 4s...
> 
> a friend noticed something weird on intel's spec sheets:
>
http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E
> 
> notice it says 32GB max memory size.  even if that means 32GB/socket,
> it's not all that much.
> 
> I don't know about everyone else, but I'm already bored with core counts ;)
> these also seem fairly warm (130W), considering that they're the fancy
> new 32nm process and run at modest clock rates...
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From jlforrest at berkeley.edu  Wed Apr  6 22:15:17 2011
From: jlforrest at berkeley.edu (Jon Forrest)
Date: Wed, 06 Apr 2011 19:15:17 -0700
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <4D9D1E35.9040802@berkeley.edu>

On 4/6/2011 5:39 PM, Mark Hahn wrote:

> shrug.  does anyone have serious experience with real apps on manycore
> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> but they're substantially more exotic/rare/expensive.)

I have a couple 48-core 1U boxes. They can build
gcc and other large packages very quickly.

The scientists who run single process simulations
also like them but they're not real picky about
how long it takes for something to run. They also
generally spend close to no time at all optimizing
anything.

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 04:43:06 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 09:43:06 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
Message-ID: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> On 4/6/2011 5:39 PM, Mark Hahn wrote:
> 
> > shrug.  does anyone have serious experience with real apps on
> manycore
> > machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
> > but they're substantially more exotic/rare/expensive.)
> 
> I have a couple 48-core 1U boxes. They can build
> gcc and other large packages very quickly.
> 
> The scientists who run single process simulations
> also like them but they're not real picky about
> how long it takes for something to run. They also
> generally spend close to no time at all optimizing
> anything.

"Premature optimization is the root of all evil"  - Donald Knuth


I'm also interested in the response to Mark Hahn's question - I guess
that's why I started this thread really!

Also as I've said before, with the advent of affordable manycore systems
like this, we're going
to have to dust off those old skills practised in the age of SMP monster
machines - which were probably
something like the same specs as these affordable systems!

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From eugen at leitl.org  Thu Apr  7 04:56:33 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 7 Apr 2011 10:56:33 +0200
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
Message-ID: <20110407085633.GE23560@leitl.org>


http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 

10,000-core Linux supercomputer built in Amazon cloud

Cycle Computing builds cloud-based supercomputing cluster to boost scientific
research.

By Jon Brodkin, Network World

April 06, 2011 03:15 PM ET

High-performance computing expert Jason Stowe recently asked two of his
engineers a simple question: Can you build a 10,000-core cluster in the
cloud?

"It's a really nice round number," says Stowe, the CEO and founder of Cycle
Computing, a vendor that helps customers gain fast and efficient access to
the kind of supercomputing power usually reserved for universities and large
research organizations.

SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to
Linux

To continue reading, register here to become an Insider. You'll get free
access to premium content from CIO, Computerworld, CSO, InfoWorld, and
Network World. See more Insider content or sign in.

High-performance computing expert Jason Stowe recently asked two of his
engineers a simple question: Can you build a 10,000-core cluster in the
cloud?

"It's a really nice round number," says Stowe, the CEO and founder of Cycle
Computing, a vendor that helps customers gain fast and efficient access to
the kind of supercomputing power usually reserved for universities and large
research organizations.

SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to
Linux

Cycle Computing had already built a few clusters on Amazon's Elastic Compute
Cloud that scaled up to several thousand cores. But Stowe wanted to take it
to the next level. Provisioning 10,000 cores on Amazon has probably been done
numerous times, but Stowe says he's not aware of anyone else achieving that
number in an HPC cluster, meaning one that uses a batch scheduling technology
and runs an HPC-optimized application.

"We haven't found references to anything larger," Stowe says. Had it been
tested for speed, the Linux-based cluster Stowe ran on Amazon might have been
big enough to make the Top 500 list of the world's fastest supercomputers.

One of the first steps was finding a customer that would benefit from such a
large cluster. There's no sense in spinning up such a large environment
unless it's devoted to some real work.

The customer that opted for the 10,000-core cloud cluster was biotech company
Genentech in San Francisco, where scientist Jacob Corn needed computing power
to examine how proteins bind to each other, in research that might eventually
lead to medical treatments. Compared to the 10,000-core cluster, "we're a
tenth the size internally," Corn says.

Cycle Computing and Genentech spun up the cluster on March 1 a little after
midnight, based on Amazon's advice regarding the optimal time to request
10,000 cores. While Amazon offers virtual machine instances optimized for
high-performance computing, Cycle and Genentech instead opted for a "standard
vanilla CentOS" Linux cluster to save money, according to Stowe. CentOS is a
version of Linux based on Red Hat's Linux.

The 10,000 cores were composed of 1,250 instances with eight cores each, as
well as 8.75TB of RAM and 2PB disk space. Scaling up a couple of thousand
cores at a time, it took 45 minutes to provision the whole cluster. There
were no problems. "When we requested the 10,000th core, we got it," Stowe
said.

The cluster ran for eight hours at a cost of $8,500, including all the fees
to Amazon and Cycle Computing. (See also: Start-up transforms unused desktop
cycles into fast server clusters)

For Genentech, this was cheap and easy compared to the alternative of buying
10,000 cores for its own data center and having them idle away with no work
for most of their lives, Corn says. Using Genentech's existing resources to
perform the simulations would take weeks or months instead of the eight hours
it took on Amazon, he says. Genentech benefited from the high number of cores
because its calculations were "embarrassingly parallel," with no
communication between nodes, so performance stats "scaled linearly with the
number of cores," Corn said.

To provision the cluster, Cycle used its own CycleCloud software, the Condor
scheduling system and Chef, an open source configuration management
framework.

Cycle also used some of its own software to detect errors and restart nodes
when necessary, a shared file system, and a few extra nodes on top of the
10,000 to handle some of the legwork. To ensure security, the cluster was
engineered with secure-HTTP and 128/256-bit Advanced Encryption Standard
encryption, according to Cycle.

Cycle Computing boasted that the cluster was roughly equivalent to the 114th
fastest supercomputer in the world on the Top 500 list, which hit about 66
teraflops. In reality, they didn't run the speed benchmark required to submit
a cluster to the Top 500 list, but nearly all of the systems listed below No.
114 in the ranking contain fewer than 10,000 cores.

Genentech is still waiting to see whether the simulations lead to anything
useful in the real world, but Corn says the data "looks fantastic." He says
Genentech is "very open" to building out more Amazon clusters, and Cycle
Computing is looking ahead as well.

"We're already working on scaling up larger," Stowe says. All Cycle needs is
a customer with "a use case to take advantage of it."

Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin

Read more about data center in Network World's Data Center section. 
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr  7 08:47:54 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 14:47:54 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl>
	<207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <33396FD5-FBAA-4735-8694-B0D7FE7EAA84@xs4all.nl>


On Apr 6, 2011, at 8:12 PM, Hearns, John wrote:

>> What do you intend to use the machines for?
>> For a chessprogram they would be great, but none of those guys has
>> the cash to pay for these
>> machines.
>
>
>
> The Supermicro board which Bruce Goglin refers to is said to support
> 16gbytes DIMMS.
>
> Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a
> 1024 Gbyte machine,
> plus you can cook your dinner on it.
>

Except that you can't buy the machine equipped with that for $60k in  
a shop.

512GB equipped 8 socket nehalem-ex (8 core version 2.26Ghz) was  
introduced at $205k,
that's without further equipment such as huge storage, so basic  
configuration when ordered at Oracle.

So this box will probably be $250k or $300k or so?

Regards,
Vincent

> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr  7 08:52:43 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 14:52:43 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>


On Apr 7, 2011, at 10:43 AM, Hearns, John wrote:

>>
>> On 4/6/2011 5:39 PM, Mark Hahn wrote:
>>
>>> shrug.  does anyone have serious experience with real apps on
>> manycore
>>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>>> but they're substantially more exotic/rare/expensive.)
>>
>> I have a couple 48-core 1U boxes. They can build
>> gcc and other large packages very quickly.
>>
>> The scientists who run single process simulations
>> also like them but they're not real picky about
>> how long it takes for something to run. They also
>> generally spend close to no time at all optimizing
>> anything.
>
> "Premature optimization is the root of all evil"  - Donald Knuth
>
>
> I'm also interested in the response to Mark Hahn's question - I guess
> that's why I started this thread really!
>
> Also as I've said before, with the advent of affordable manycore  
> systems
> like this, we're going


> to have to dust off those old skills practised in the age of SMP  
> monster
> machines - which were probably
> something like the same specs as these affordable systems!
>

it's not clear what 'these' refers to.

48 core AMD multicore machine: $8000 on ebay i saw one for. Of course  
not much of a RAM and not fastest chip.
Let's say fully configured about double that price.

GPU monster box, which is basically a few videocards inside such a  
box stacked up a tad, wil only add a couple of
thousands.

But a 8 socket @ 10 core nehalem-ex, in basic configuration will be  
already far above $205k. Probably a $300k or
so when configured.

Huge price difference.

So i assume you didn't refer to the Nehalem-ex box.

> The contents of this email are confidential and for the exclusive  
> use of the intended recipient.  If you receive this email in error  
> you should not copy it, retransmit it, use it or disclose its  
> contents but should return it to the sender immediately and delete  
> your copy.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 09:49:09 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 09:49:09 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <4D9D1E35.9040802@berkeley.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
Message-ID: <4D9DC0D5.8060802@ias.edu>

On 04/06/2011 10:15 PM, Jon Forrest wrote:
> On 4/6/2011 5:39 PM, Mark Hahn wrote:
> 
>> shrug.  does anyone have serious experience with real apps on manycore
>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>> but they're substantially more exotic/rare/expensive.)
> 
> I have a couple 48-core 1U boxes. They can build
> gcc and other large packages very quickly.

But are the makes definitely running in parallel to take advantage of
the multiple cores? I haven't built gcc, so don't know if it uses make's
-j option to do parallel builds.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:03:47 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:03:47 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <20110407085633.GE23560@leitl.org>
References: <20110407085633.GE23560@leitl.org>
Message-ID: <4D9DC443.9080502@ias.edu>

On 04/07/2011 04:56 AM, Eugen Leitl wrote:
> 
> "It's a really nice round number," says Stowe, the CEO and founder of Cycle
> Computing,

Clearly he's a marketing man. Everyone know real computer guys think in
powers of 2.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:16:53 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:16:53 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <20110407085633.GE23560@leitl.org>
References: <20110407085633.GE23560@leitl.org>
Message-ID: <4D9DC755.5070004@ias.edu>

A great publicity stunt, but I still don't think it qualifies as a
"real" HPC cluster achievement.  See comments/objections in-line below.

On 04/07/2011 04:56 AM, Eugen Leitl wrote:
> 
> http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 
> 
> The cluster ran for eight hours

That's not very long for HPC jobs. How much would the performance have
degraded if it started to run into the daytime hours, when demand for
CPU cycles  in EC2 would be at their peak?

> Genentech benefited from the high number of cores
> because its calculations were "embarrassingly parallel," with no
> communication between nodes, so performance stats "scaled linearly with the
> number of cores," Corn said.
> 

So it wasn't really a cluster at all, but a giant batch scheduling system.

I probably have a stricter sense of what makes a cluster than some
others, so let's not argue on the the definition of cluster and split
hairs. In my book, a cluster involves parallel communication between the
processes using MPI, PVM or some other parallel communications paradigm.

And BTW, my comments are not directed Eugene for posting this. Just
starting a general discussion on this article...

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:21:19 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:21:19 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
Message-ID: <4D9DC85F.9080503@ias.edu>

Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

In all these commercials, the protagonists say "to the cloud" for their
solution, but then when they show them using Microsoft Windows to access
"the cloud", they're not using the cloud at all.

In fact, in one commercial, the one where the wife/mother is fixing the
family portrait, she's using a photoshop-like program on her own
desktop, not even the Internet is needed.

Not only do they use the term "cloud" incorrectly, they don't even show
how using Microsoft products give you and advantage for using "the cloud"

AAAAAAARRRRRRRGGGH!

Okay. Venting over. Whew! I feel better already.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 10:27:27 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 15:27:27 +0100
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu>
Message-ID: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I
am?
> 
> 
In London there is a saturation of Microsoft Cloud advert posters in the
mainline stations
and Tube lines serving the City (the financial district) and Canary
Wharf.

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 10:40:28 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 10:40:28 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <4D9DC85F.9080503@ias.edu>
	<207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4D9DCCDC.1080607@ias.edu>


On 04/07/2011 10:27 AM, Hearns, John wrote:
>>
>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I
> am?
>>
>>
> In London there is a saturation of Microsoft Cloud advert posters in the
> mainline stations
> and Tube lines serving the City (the financial district) and Canary
> Wharf.
> 

But do they annoy you? ;)

For those of you outside the US, here's the commercials I'm referring to:

1. http://youtu.be/-HRrbLA7rss

2. http://youtu.be/mjtqoQE_ezA

3. http://youtu.be/_lu6v6hE_bA

4. http://youtu.be/Lel3swo4RMc

Out of these only (1) could possibly be using the cloud, if they're
using Google docs or something similar to create and share their documents.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From scrusan at UR.Rochester.edu  Thu Apr  7 10:49:09 2011
From: scrusan at UR.Rochester.edu (Crusan, Steve)
Date: Thu, 7 Apr 2011 10:49:09 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu>
Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>

Windows HPC Server 2008 also has a builtin feature for an end user to submit excel docs to a windows cluster to do intense timesheet and office supplies calculations...

----------------------
Steve Crusan
System Administrator
Center for Research Computing


-----Original Message-----
From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
Sent: Thu 4/7/2011 10:21 AM
To: Beowulf Mailing List
Subject: [Beowulf] Microsoft "cloud" commercials.
 
Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

In all these commercials, the protagonists say "to the cloud" for their
solution, but then when they show them using Microsoft Windows to access
"the cloud", they're not using the cloud at all.

In fact, in one commercial, the one where the wife/mother is fixing the
family portrait, she's using a photoshop-like program on her own
desktop, not even the Internet is needed.

Not only do they use the term "cloud" incorrectly, they don't even show
how using Microsoft products give you and advantage for using "the cloud"

AAAAAAARRRRRRRGGGH!

Okay. Venting over. Whew! I feel better already.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/b92c4881/attachment-0001.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From dag at sonsorol.org  Thu Apr  7 11:03:25 2011
From: dag at sonsorol.org (Chris Dagdigian)
Date: Thu, 07 Apr 2011 11:03:25 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DC755.5070004@ias.edu>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
Message-ID: <4D9DD23D.8090908@sonsorol.org>


The CycleComputing folks are good people in my book and I bet more than 
a few are subscribed to this list. The founders are old-school Condor 
gurus with a long track record in this field.

One of the nice things about their work is how "usable" it is to real 
people with real production computing requirements - in the IAAS cloud 
space there are way too many marketing robots talking vague BS about 
"cloud bursting", "hybrid clusters" and storage aggregation/access 
across LAN/WAN distances. Cycle has built, deployed & delivered all of 
this with (what I'd consider) a bare minimum of marketing and chest 
thumping.

It's not a PR gimmick and limiting the definition of "cluster" to only 
systems that run parallel applications would alienate quite a few of us 
on this list :) In the life sciences a typical cluster might run a 
mixture of 80-90% serial jobs with a small scattering of real MPI apps 
running alongside.

I get cynical about this stuff because in the cloud space you see way 
too many commercial people promising the world without actually 
delivering anything (other than carefully hand-managed reference account 
projects) while the academic & supercomputing folks are all busy 
presenting and bragging about things that will never see the light of 
day after their thesis defense.

There are people like Cycle/Rightscale etc. etc. who actually rise above 
the hype and deliver clever & usable stuff with a minimum of marketing BS.

My $.02 of course

-Chris


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:05:53 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:05:53 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
References: <4D9DC85F.9080503@ias.edu>
	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
Message-ID: <4D9DD2D1.9070309@ias.edu>

"Cluster" != "Cloud"

The Cloud, by definition requires the Internet. Clusters do not. In
fact, I bet the NSA can show you many clusters that are not connect to
the Internet at all.

While I'm at it, "Grid" != ("Cluster" || "Cloud") either!


On 04/07/2011 10:49 AM, Crusan, Steve wrote:
> Windows HPC Server 2008 also has a builtin feature for an end user to
> submit excel docs to a windows cluster to do intense timesheet and
> office supplies calculations...
> 
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 10:21 AM
> To: Beowulf Mailing List
> Subject: [Beowulf] Microsoft "cloud" commercials.
> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
> 
> In all these commercials, the protagonists say "to the cloud" for their
> solution, but then when they show them using Microsoft Windows to access
> "the cloud", they're not using the cloud at all.
> 
> In fact, in one commercial, the one where the wife/mother is fixing the
> family portrait, she's using a photoshop-like program on her own
> desktop, not even the Internet is needed.
> 
> Not only do they use the term "cloud" incorrectly, they don't even show
> how using Microsoft products give you and advantage for using "the cloud"
> 
> AAAAAAARRRRRRRGGGH!
> 
> Okay. Venting over. Whew! I feel better already.
> 
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:13:43 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:13:43 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DD23D.8090908@sonsorol.org>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <4D9DD4A7.7060601@ias.edu>

On 04/07/2011 11:03 AM, Chris Dagdigian wrote:
> 
> The CycleComputing folks are good people in my book and I bet more than 
> a few are subscribed to this list. The founders are old-school Condor 
> gurus with a long track record in this field.
> 
> One of the nice things about their work is how "usable" it is to real 
> people with real production computing requirements - in the IAAS cloud 
> space there are way too many marketing robots talking vague BS about 
> "cloud bursting", "hybrid clusters" and storage aggregation/access 
> across LAN/WAN distances. Cycle has built, deployed & delivered all of 
> this with (what I'd consider) a bare minimum of marketing and chest 
> thumping.
> 
> It's not a PR gimmick and limiting the definition of "cluster" to only 
> systems that run parallel applications would alienate quite a few of us 
> on this list :) In the life sciences a typical cluster might run a 
> mixture of 80-90% serial jobs with a small scattering of real MPI apps 
> running alongside.

Do not confuse "scientific computing" or "high performance computing"
with "cluster". All terms are definitely related, but you can do
scientific/high-perfomance computing without a "cluster."

As someone who also works in life sciences, I know that there are a lot
of life science tasks that are embarrassingly parallel. Running these
tasks on a bunch of different machines simultaneously is definitely
scientific and high performance computing, but it doesn't necessarily
require a cluster. Folding at home, for example.

> 
> I get cynical about this stuff because in the cloud space you see way 
> too many commercial people promising the world without actually 
> delivering anything (other than carefully hand-managed reference account 
> projects) while the academic & supercomputing folks are all busy 
> presenting and bragging about things that will never see the light of 
> day after their thesis defense.

Me, too, which is why I started ranting about Microsoft's cloud
commercials in a separate thread. ;) It's also why I'm starting to get
picky about how the term "cluster" is used. More and more, I see people
confusing "cloud" with "cluster". I guess that cynicism is what caused
me to reply to the original post.

> 
> There are people like Cycle/Rightscale etc. etc. who actually rise above 
> the hype and deliver clever & usable stuff with a minimum of marketing BS.
> 
> My $.02 of course
> 
> -Chris
> 
> 
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From scrusan at UR.Rochester.edu  Thu Apr  7 11:13:32 2011
From: scrusan at UR.Rochester.edu (Crusan, Steve)
Date: Thu, 7 Apr 2011 11:13:32 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
	<4D9DD2D1.9070309@ias.edu>
Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>

Oh I understand the difference, but I thought I'd take this opportunity to bash MS.

But, since MS's cloud runs off of MS Azure and MS Server 2008, I would bet the excel functionality would be possible.

----------------------
Steve Crusan
System Administrator
Center for Research Computing


-----Original Message-----
From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
Sent: Thu 4/7/2011 11:05 AM
Cc: Beowulf Mailing List
Subject: Re: [Beowulf] Microsoft "cloud" commercials.
 
"Cluster" != "Cloud"

The Cloud, by definition requires the Internet. Clusters do not. In
fact, I bet the NSA can show you many clusters that are not connect to
the Internet at all.

While I'm at it, "Grid" != ("Cluster" || "Cloud") either!


On 04/07/2011 10:49 AM, Crusan, Steve wrote:
> Windows HPC Server 2008 also has a builtin feature for an end user to
> submit excel docs to a windows cluster to do intense timesheet and
> office supplies calculations...
> 
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 10:21 AM
> To: Beowulf Mailing List
> Subject: [Beowulf] Microsoft "cloud" commercials.
> 
> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
> 
> In all these commercials, the protagonists say "to the cloud" for their
> solution, but then when they show them using Microsoft Windows to access
> "the cloud", they're not using the cloud at all.
> 
> In fact, in one commercial, the one where the wife/mother is fixing the
> family portrait, she's using a photoshop-like program on her own
> desktop, not even the Internet is needed.
> 
> Not only do they use the term "cloud" incorrectly, they don't even show
> how using Microsoft products give you and advantage for using "the cloud"
> 
> AAAAAAARRRRRRRGGGH!
> 
> Okay. Venting over. Whew! I feel better already.
> 
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/bed316f8/attachment-0001.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at mclaren.com  Thu Apr  7 11:13:24 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 16:13:24 +0100
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> 
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <207BB2F60743C34496BE41039233A809042454EA@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> 
> There are people like Cycle/Rightscale etc. etc. who actually rise
> above
> the hype and deliver clever & usable stuff with a minimum of marketing
> BS.
> 
> My $.02 of course

Surely your $.02 per cpu per minute?

The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 11:15:58 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 11:15:58 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>
	<4D9DD2D1.9070309@ias.edu>
	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
Message-ID: <4D9DD52E.8040103@ias.edu>

Oh, sorry. I missed the sarcasm. I thought you were defending MS.

The "office supplies calculations" should have tripped my sarcasm
detector immediately!

Sorry. I'm in a rare (and ranting!) mood today. Must be time for a vacation.

Prentice


On 04/07/2011 11:13 AM, Crusan, Steve wrote:
> Oh I understand the difference, but I thought I'd take this opportunity
> to bash MS.
> 
> But, since MS's cloud runs off of MS Azure and MS Server 2008, I would
> bet the excel functionality would be possible.
> 
> ----------------------
> Steve Crusan
> System Administrator"Crusan, Steve" <scrusan at UR.Rochester.edu>
> Center for Research Computing
> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
> Sent: Thu 4/7/2011 11:05 AM
> Cc: Beowulf Mailing List
> Subject: Re: [Beowulf] Microsoft "cloud" commercials.
> 
> "Cluster" != "Cloud"
> 
> The Cloud, by definition requires the Internet. Clusters do not. In
> fact, I bet the NSA can show you many clusters that are not connect to
> the Internet at all.
> 
> While I'm at it, "Grid" != ("Cluster" || "Cloud") either!
> 
> 
> On 04/07/2011 10:49 AM, Crusan, Steve wrote:
>> Windows HPC Server 2008 also has a builtin feature for an end user to
>> submit excel docs to a windows cluster to do intense timesheet and
>> office supplies calculations...
>>
>> ----------------------
>> Steve Crusan
>> System Administrator
>> Center for Research Computing
>>
>>
>>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
>> Sent: Thu 4/7/2011 10:21 AM
>> To: Beowulf Mailing List
>> Subject: [Beowulf] Microsoft "cloud" commercials.
>>
>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?
>>
>> In all these commercials, the protagonists say "to the cloud" for their
>> solution, but then when they show them using Microsoft Windows to access
>> "the cloud", they're not using the cloud at all.
>>
>> In fact, in one commercial, the one where the wife/mother is fixing the
>> family portrait, she's using a photoshop-like program on her own
>> desktop, not even the Internet is needed.
>>
>> Not only do they use the term "cloud" incorrectly, they don't even show
>> how using Microsoft products give you and advantage for using "the cloud"
>>
>> AAAAAAARRRRRRRGGGH!
>>
>> Okay. Venting over. Whew! I feel better already.
>>
>> --
>> Prentice
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 11:35:24 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 11:35:24 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <4D9DC0D5.8060802@ias.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>
	<4D9DC0D5.8060802@ias.edu>
Message-ID: <4D9DD9BC.3090102@runnersroll.com>

On 04/07/11 09:49, Prentice Bisbal wrote:
> On 04/06/2011 10:15 PM, Jon Forrest wrote:
>> On 4/6/2011 5:39 PM, Mark Hahn wrote:
>>
>>> shrug.  does anyone have serious experience with real apps on manycore
>>> machines?  (I'm familiar with SGI boxes, where 80 is fairly ho-hum,
>>> but they're substantially more exotic/rare/expensive.)
>>
>> I have a couple 48-core 1U boxes. They can build
>> gcc and other large packages very quickly.
> 
> But are the makes definitely running in parallel to take advantage of
> the multiple cores? I haven't built gcc, so don't know if it uses make's
> -j option to do parallel builds.
> 

Yes, see:
http://gcc.gnu.org/install/build.html

In general I see quite nice speedups on my four-core machine at home
running Gentoo, but I find running -j > cores up to 2xcores tends to
produce better results as many packages (especially with recursive
makes) tend to mix configuration (low cpu usage) with makes (high cpu
usage).  The gentoo handbook itself suggests cores+1 for the -j
parameter.  Higher than core -j counts is purely a heuristic, and a few
packages will degrade a bit because in fact 8 (2x4cores) processes are
spawned, each contending heavily for the 4 cores and context switching
starts to slow things down and hurt locality.  Once again, I suppose
this is a YMMV situation.  It would be cool to hack make to dynamically
throttle parallelization based on cpu usage within some given bounds...

I have access to a 48 core box, so if I get a chance I'll generate a
graph for the list on gcc build times by -j count.  Note however that I
don't have root access so I can't clear caches, which should be taken
into account when examining results.

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 11:42:18 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 11:42:18 -0400
Subject: [Beowulf] Microsoft "cloud" commercials.
In-Reply-To: <4D9DD52E.8040103@ias.edu>
References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu>	<4D9DD2D1.9070309@ias.edu>	<9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu>
	<4D9DD52E.8040103@ias.edu>
Message-ID: <4D9DDB5A.3030700@runnersroll.com>

>>> -----Original Message-----
>>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal
>>> Sent: Thu 4/7/2011 10:21 AM
>>> To: Beowulf Mailing List
>>> Subject: [Beowulf] Microsoft "cloud" commercials.
>>>
>>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am?

I completely agree.  It's a darn shame all those Truth campaigns
concentrate on drugs - clarifying popular media is a desperately needed
service for so many domains (at least in US media).

Although I have to admit I'm not sure if the cloud misnomer or the
disgusting family dynamics of the Photoshop commercial are more
bothering to me.  Always the dopey dad with these commercials...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From john.hearns at mclaren.com  Thu Apr  7 11:53:31 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 7 Apr 2011 16:53:31 +0100
Subject: [Beowulf] Westmere EX
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu> 
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
Message-ID: <207BB2F60743C34496BE41039233A8090424562F@MRL-PWEXCHMB02.mil.tagmclarengroup.com>


> -----Original Message-----
> From: Vincent Diepeveen [mailto:diep at xs4all.nl]
> Sent: 07 April 2011 13:53


> 
> But a 8 socket @ 10 core nehalem-ex, in basic configuration will be
> already far above $205k. Probably a $300k or
> so when configured.
> 
> Huge price difference.
> 
> So i assume you didn't refer to the Nehalem-ex box.

I was referring to the Nehalem.


http://www.lasystems.be/Supermicro/SYS-5086B-TRF/Superserver5086B-TRF8-W
ay/product/248987.html


Add 8 CPUs at $4000 per cpu,
and 64 DIMMs at $944 per DIMM


The contents of this email are confidential and for the exclusive use of the intended recipient.  If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From ellis at runnersroll.com  Thu Apr  7 12:11:13 2011
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 07 Apr 2011 12:11:13 -0400
Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour
In-Reply-To: <4D9DD23D.8090908@sonsorol.org>
References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu>
	<4D9DD23D.8090908@sonsorol.org>
Message-ID: <4D9DE221.2040806@runnersroll.com>

On 04/07/11 11:03, Chris Dagdigian wrote:
> One of the nice things about their work is how "usable" it is to real 
> people with real production computing requirements - in the IAAS cloud 

I wonder what "real" people with "real" production computing
requirements means here.  See below for further thoughts on my thoughts
on "real" codes and where I suspect they arise.

> It's not a PR gimmick and limiting the definition of "cluster" to only 
> systems that run parallel applications would alienate quite a few of us 
> on this list :) In the life sciences a typical cluster might run a 
> mixture of 80-90% serial jobs with a small scattering of real MPI apps 
> running alongside.

I'm certainly a pragmatist here - use the machines as your organization
feels is best.  However I still have a strong suspicion that most jobs
are serial because of:

1. Lack of experience properly parallelizing codes
2. Lack of proper environment on one's own desktop (i.e. Linux or group
licenses)
3. In rare cases such rapid development and short lifetime of a code
that parallelizing it will take longer than poorly serially coding it
and tolerating the run-times.

I can only hope that within the decade the programming paradigm shifts
along with the hardware and the average bloke becomes at least exposed
to basic parallel programming concepts.

The machine is still a "cluster" - the way it's used shouldn't guide
what it is referred to.  That doesn't mean running serial jobs on a
machine tailored for parallel ones is the best way to use your
time/money.  Probably better for one to simply buy Linux desktops for
all the employees, put them on a typical GigE network and have the
employees submit jobs to some tiny server in the Bosses office which
routes jobs evenly to everyone's machine distributed throughout the
building.

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 12:25:35 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 12:25:35 -0400
Subject: [Beowulf] Westmere EX
In-Reply-To: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
Message-ID: <4D9DE57F.4040303@ldeo.columbia.edu>

Vincent Diepeveen wrote:

> GPU monster box, which is basically a few videocards inside such a  
> box stacked up a tad, wil only add a couple of
> thousands.
> 

This price may be OK for the videocard-class GPUs,
but sounds underestimated, at least for Fermi Tesla.
Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
with 448 cores and 3GB RAM per GPU, cost around $10k.
For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
If you care about ECC, that's the price you pay, right?

Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cap at nsc.liu.se  Thu Apr  7 13:26:51 2011
From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=)
Date: Thu, 7 Apr 2011 19:26:51 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <201104071926.56911.cap@nsc.liu.se>

On Thursday, April 07, 2011 02:39:19 am Mark Hahn wrote:
...
> I bet there will be 100x more 4s servers build with these chips than 8s.
> and 1000x more 2s than 4s...

Sounds about right :-) Not your average compute node by a long shot.

> a friend noticed something weird on intel's spec sheets:
> http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC
> 3E
> 
> notice it says 32GB max memory size.  even if that means 32GB/socket,
> it's not all that much.

Certainly looks odd on that page but does likely refer to max DIMM size. With 
64 DIMMs (4 socket example) that would then give you 2T.
 
> I don't know about everyone else, but I'm already bored with core counts ;)
> these also seem fairly warm (130W), considering that they're the fancy
> new 32nm process and run at modest clock rates...

It's the size of the beast... (caused by the number of cores and size of last 
level cache).

/Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20110407/b4bb76fd/attachment-0001.sig>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Thu Apr  7 15:26:57 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 7 Apr 2011 21:26:57 +0200
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <4D9DE57F.4040303@ldeo.columbia.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
Message-ID: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>


On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:

> Vincent Diepeveen wrote:
>
>> GPU monster box, which is basically a few videocards inside such a
>> box stacked up a tad, wil only add a couple of
>> thousands.
>>
>
> This price may be OK for the videocard-class GPUs,
> but sounds underestimated, at least for Fermi Tesla.

Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
note there is a 6 GB version, not aware of price will be $$$$ i bet.
or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro

VERSUS

8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.

Factor 100 difference to those cards.

A couple of thousands versus a couple of hundreds of thousands.
Hope i made my point clear.


> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
> with 448 cores and 3GB RAM per GPU, cost around $10k.
> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
> If you care about ECC, that's the price you pay, right?

When fermi released it was a great gpu.

Regrettably they lobotomized the gamers card's double precision as i  
understand,
So it hardly has double precision capabilities; if you go for nvidia  
you sure need a Tesla,
no question about it.

As a company i would buy in 6990's though, they're a lot cheaper and  
roughly 3x faster
than the Nvidia's (for some more than 3x for other occassions less  
than 3x, note the card
has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).

3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for  
AMD
versus 448 cores nvidia with 448 execution units of 32 bits  
multiplication.

Especially because multiplication has improved a lot.

Already having written CUDA code some while ago, i wanted the cheap  
gamers card with big
horse power now at home so  i'm toying on a 6970 now so will be able  
to report to you what is possible to
achieve at that card with respect to prime numbers and such.

I'm a bit amazed so little public initiatives write code for the AMD  
gpu's.

Note that DDR5 ram doesn't have ECC by default, but has in case of  
AMD a CRC calculation
(if i understand it correctly). It's a bit more primitive than ECC,  
but works pretty ok and shows you
also when problems occured there, so figuring out remove what goes on  
is possible.

Make no mistake that this isn't ECC.
We know some HPC centers have as a hard requirement ECC, only nvidia  
is an alternative then.

In earlier posts from some time ago and some years ago i already  
wrote on that governments should
adapt more to how hardware develops rather than demand that hardware  
has to follow them.

HPC has too little cash to demand that from industry.

OpenCL i cannot advice at this moment (for a number of reasons).

AMD-CAL and CUDA are somewhat similar. Sure there is differences, but  
majority of codes are possible
to port quite well (there is exceptions), or easy work arounds.

Any company doing gpgpu i would advice developing both branches of  
code at the same time,
as that gives the company a lot of extra choices for really very  
little extra work. Maybe 1 coder,
and it always allows you to have the fastest setup run your  
production code.

That said we can safely expect that from raw performance coming years  
AMD will keep the leading edge
from crunching viewpoint. Elsewhere i pointed out why.

Even then i'd never bet at just 1 manufacturer. Go for both  
considering the cheap price of it.

For a lot of HPC centers the choice of nvidia will be an easy one, as  
the price of the Fermi cards
is peanuts compared to the price rest of the system and considering  
other demands that's what they'll go for.

That might change once you stick in bunches of videocards in nodes.

Please note that the gpu 'streamcores' or PE's whatever name you want  
to give them, are so bloody fast,
that your code has to work within the PE's themselves and hardly use  
the RAM.

Both for Nvidia as well as AMD, the streamcores are so fast, that you  
simply don't want to lose time on the RAM
when your software runs, let alone that you want to use huge RAM.

Add to that, that nvidia (have to still figure out for AMD) can in  
background stream from and to the gpu's RAM
from the CPU, so if you do really large calculations involving many  
nodes,
all that shouldn't be an issue in the first place.

So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that  
would really amaze me, though i'm sure
there is cases where that happens. If we see however what was ordered  
it mostly is the 3GB Tesla's,
at least on what has been reported, i have no global statistics on  
that...

Now all choices are valid there, but even then we speak about peanuts  
money compared to the price of
a single 8 socket Nehalem-ex box, which fully configured will be  
maybe $300k-$400k or something?

Whereas a set of 4x nvidia will be probably under $15k and 4x AMD  
6990 is 2000 euro.

There won't be 2 gpu nvidia's any soon because of the choice they  
have historically made for the memory controllers.
See explanation of intel fanboy David Kanter for that at  
realworldtech in a special article he wrote there.

Please note i'm not judging AMD nor Nvidia, they have made their  
choices based upon totally different
businessmodels i suspect and we must be happy we have this rich  
choice right now between cpu's from different
manufacturers and gpu's from different manufacturers.

Nvidia really seems to aim at supercomputers, giving their tesla line  
without lobotomization and lobotomizing their
gamers cards, where AMD aims at gamers and their gamercards have full  
functionality
without lobotomization.

Total different businessmodels. Both have their advantages and  
disadvantages.

 From pure performance viewpoint it's easy to see what's faster though.

Yet right now i realize all too well that just too many still  
hesitate between also offering gpu services additional to
cpu services, in which case having a gpu, regardless nvidia or amd,  
kicks butt of course from throughput viewpoint.

To be really honest with you guys, i had expected that by 2011 we  
would have a gpu reaching far over 1 Teraflop double precision  
handsdown. If we see that Nvidia delivers somewhere around 515 Gflop  
and AMD has 2 gpu's on a single card to get over that Teraflop double  
precision (claim is 1.27 Teraflop double precision),
that really is underneath my expectations from a few years ago.

Now of course i hope you realize i'm not coding double precision code  
at all; i'm writing everything in integers of 32 bits for the AMD  
card and the Nvidia equivalent also is using 32 bits integers. The  
ideal way to do calculations on those cards, so also very big  
transforms, is using the 32 x 32 == 64 bits instructions (that's 2  
instructions in case of AMD).

Regards,
Vincent


>
> Gus Correa
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Thu Apr  7 15:44:25 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 07 Apr 2011 15:44:25 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E1419.9000408@ias.edu>

On 04/07/2011 03:26 PM, Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 
> 

You can't do a direct comparison between a CPU and a GPU. There are many
things that GPUs can't do (or can't do well) that are still better done
on a CPU. Even NVidia acknowledges in most of their promotional and
educational literature.

One example would be a code with a lot of branching.

--
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 16:37:46 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 16:37:46 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E209A.1040408@ldeo.columbia.edu>

Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 

Not so much.

In your original message you said:

"GPU monster box, which is basically a few videocards inside such a
box stacked up a tad, wil only add a couple of thousands."

So, first it was a few GPUs on a box (whatever else the box
might have inside) for a couple of thousand (if dollars or euros
you did not specify).

Now you checked out the real prices, and said
that a *single* Fermi Tesla C2070 cost ~$2,200
(just the GPU alone, price in US dollars I suppose),
which is more like the real thing.

However, instead of admitting that your previous numbers were mistaken,
you insist that:

"Hope i made my point clear.".

Is this how you play chess?  :)
Even if your opponent is a computer, he/she/it might get
a bit discouraged.
You always win, even before the game starts.

Anyway, I don't play chess, I am no GPU expert,
I don't know about the lobotomizing of Fermi (I hope you're not talking 
about Enrico, he's dead),
and I don't think we're going anywhere with this discussion.
However, the GPU prices you sent in your original
email to the list were underestimated,
although I am afraid I may not be able to make this point go
across to you.
The prices you sent were too low,
at least when it comes to GPUs with ECC,
which is what is reliable for HPC.

Thank you,
Gus Correa

> 
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> 
> When fermi released it was a great gpu.
> 
> Regrettably they lobotomized the gamers card's double precision as i 
> understand,
> So it hardly has double precision capabilities; if you go for nvidia you 
> sure need a Tesla,
> no question about it.
> 
> As a company i would buy in 6990's though, they're a lot cheaper and 
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less than 
> 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
> 
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD
> versus 448 cores nvidia with 448 execution units of 32 bits multiplication.
> 
> Especially because multiplication has improved a lot.
> 
> Already having written CUDA code some while ago, i wanted the cheap 
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able to 
> report to you what is possible to
> achieve at that card with respect to prime numbers and such.
> 
> I'm a bit amazed so little public initiatives write code for the AMD gpu's.
> 
> Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a 
> CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC, but 
> works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on is 
> possible.
> 
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia is 
> an alternative then.
> 
> In earlier posts from some time ago and some years ago i already wrote 
> on that governments should
> adapt more to how hardware develops rather than demand that hardware has 
> to follow them.
> 
> HPC has too little cash to demand that from industry.
> 
> OpenCL i cannot advice at this moment (for a number of reasons).
> 
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but 
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
> 
> Any company doing gpgpu i would advice developing both branches of code 
> at the same time,
> as that gives the company a lot of extra choices for really very little 
> extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your production 
> code.
> 
> That said we can safely expect that from raw performance coming years 
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
> 
> Even then i'd never bet at just 1 manufacturer. Go for both considering 
> the cheap price of it.
> 
> For a lot of HPC centers the choice of nvidia will be an easy one, as 
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering 
> other demands that's what they'll go for.
> 
> That might change once you stick in bunches of videocards in nodes.
> 
> Please note that the gpu 'streamcores' or PE's whatever name you want to 
> give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use the 
> RAM.
> 
> Both for Nvidia as well as AMD, the streamcores are so fast, that you 
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
> 
> Add to that, that nvidia (have to still figure out for AMD) can in 
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
> 
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would 
> really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered it 
> mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on that...
> 
> Now all choices are valid there, but even then we speak about peanuts 
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be maybe 
> $300k-$400k or something?
> 
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 
> is 2000 euro.
> 
> There won't be 2 gpu nvidia's any soon because of the choice they have 
> historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at realworldtech 
> in a special article he wrote there.
> 
> Please note i'm not judging AMD nor Nvidia, they have made their choices 
> based upon totally different
> businessmodels i suspect and we must be happy we have this rich choice 
> right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
> 
> Nvidia really seems to aim at supercomputers, giving their tesla line 
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full 
> functionality
> without lobotomization.
> 
> Total different businessmodels. Both have their advantages and 
> disadvantages.
> 
>  From pure performance viewpoint it's easy to see what's faster though.
> 
> Yet right now i realize all too well that just too many still hesitate 
> between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd, 
> kicks butt of course from throughput viewpoint.
> 
> To be really honest with you guys, i had expected that by 2011 we would 
> have a gpu reaching far over 1 Teraflop double precision handsdown. If 
> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 
> gpu's on a single card to get over that Teraflop double precision (claim 
> is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
> 
> Now of course i hope you realize i'm not coding double precision code at 
> all; i'm writing everything in integers of 32 bits for the AMD card and 
> the Nvidia equivalent also is using 32 bits integers. The ideal way to 
> do calculations on those cards, so also very big transforms, is using 
> the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD).
> 
> Regards,
> Vincent
> 
> 
>>
>> Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From cbergstrom at pathscale.com  Thu Apr  7 18:57:38 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Fri, 08 Apr 2011 05:57:38 +0700
Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to
 tackle huge scientific challenges
Message-ID: <4D9E4162.3030004@pathscale.com>

I just saw this on another ML and thought it may be of interest
------------
http://googleblog.blogspot.com/2011/04/1-billion-computing-core-hours-for.html
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From gus at ldeo.columbia.edu  Thu Apr  7 21:03:07 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 07 Apr 2011 21:03:07 -0400
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <4D9E5ECB.60608@ldeo.columbia.edu>

Thank you for the information about AMD-CAL and the AMD GPUs.
Does AMD plan any GPU product with 64-bit and ECC,
similar to Tesla/Fermi?

The lack of a language standard may still be a hurdle here.
I guess there were old postings here about CUDA and OpenGL.
What fraction of the (non-gaming) GPU code is being written these days
in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using
compiler directives like those in the PGI compilers?

Thank you,
Gus Correa

Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
> 
> 
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> 
> When fermi released it was a great gpu.
> 
> Regrettably they lobotomized the gamers card's double precision as i 
> understand,
> So it hardly has double precision capabilities; if you go for nvidia you 
> sure need a Tesla,
> no question about it.
> 
> As a company i would buy in 6990's though, they're a lot cheaper and 
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less than 
> 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
> 
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD
> versus 448 cores nvidia with 448 execution units of 32 bits multiplication.
> 
> Especially because multiplication has improved a lot.
> 
> Already having written CUDA code some while ago, i wanted the cheap 
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able to 
> report to you what is possible to
> achieve at that card with respect to prime numbers and such.
> 
> I'm a bit amazed so little public initiatives write code for the AMD gpu's.
> 
> Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a 
> CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC, but 
> works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on is 
> possible.
> 
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia is 
> an alternative then.
> 
> In earlier posts from some time ago and some years ago i already wrote 
> on that governments should
> adapt more to how hardware develops rather than demand that hardware has 
> to follow them.
> 
> HPC has too little cash to demand that from industry.
> 
> OpenCL i cannot advice at this moment (for a number of reasons).
> 
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but 
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
> 
> Any company doing gpgpu i would advice developing both branches of code 
> at the same time,
> as that gives the company a lot of extra choices for really very little 
> extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your production 
> code.
> 
> That said we can safely expect that from raw performance coming years 
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
> 
> Even then i'd never bet at just 1 manufacturer. Go for both considering 
> the cheap price of it.
> 
> For a lot of HPC centers the choice of nvidia will be an easy one, as 
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering 
> other demands that's what they'll go for.
> 
> That might change once you stick in bunches of videocards in nodes.
> 
> Please note that the gpu 'streamcores' or PE's whatever name you want to 
> give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use the 
> RAM.
> 
> Both for Nvidia as well as AMD, the streamcores are so fast, that you 
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
> 
> Add to that, that nvidia (have to still figure out for AMD) can in 
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
> 
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would 
> really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered it 
> mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on that...
> 
> Now all choices are valid there, but even then we speak about peanuts 
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be maybe 
> $300k-$400k or something?
> 
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 
> is 2000 euro.
> 
> There won't be 2 gpu nvidia's any soon because of the choice they have 
> historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at realworldtech 
> in a special article he wrote there.
> 
> Please note i'm not judging AMD nor Nvidia, they have made their choices 
> based upon totally different
> businessmodels i suspect and we must be happy we have this rich choice 
> right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
> 
> Nvidia really seems to aim at supercomputers, giving their tesla line 
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full 
> functionality
> without lobotomization.
> 
> Total different businessmodels. Both have their advantages and 
> disadvantages.
> 
>  From pure performance viewpoint it's easy to see what's faster though.
> 
> Yet right now i realize all too well that just too many still hesitate 
> between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd, 
> kicks butt of course from throughput viewpoint.
> 
> To be really honest with you guys, i had expected that by 2011 we would 
> have a gpu reaching far over 1 Teraflop double precision handsdown. If 
> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 
> gpu's on a single card to get over that Teraflop double precision (claim 
> is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
> 
> Now of course i hope you realize i'm not coding double precision code at 
> all; i'm writing everything in integers of 32 bits for the AMD card and 
> the Nvidia equivalent also is using 32 bits integers. The ideal way to 
> do calculations on those cards, so also very big transforms, is using 
> the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD).
> 
> Regards,
> Vincent
> 
> 
>>
>> Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From kilian.cavalotti.work at gmail.com  Fri Apr  8 07:08:01 2011
From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI)
Date: Fri, 8 Apr 2011 13:08:01 +0200
Subject: [Beowulf] Westmere EX
In-Reply-To: <Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
Message-ID: <BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>

Hi Mark,

On Thu, Apr 7, 2011 at 2:39 AM, Mark Hahn <hahn at mcmaster.ca> wrote:
> notice it says 32GB max memory size. ?even if that means 32GB/socket,
> it's not all that much.

It's actually 32GB per DIMM, so up to 512GB per socket.

Cheers,
-- 
Kilian
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From deadline at eadline.org  Fri Apr  8 08:45:09 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 8 Apr 2011 08:45:09 -0400 (EDT)
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<4D9D1E35.9040802@berkeley.edu>
	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
Message-ID: <47386.192.168.93.213.1302266709.squirrel@mail.eadline.org>

All:

This video may help clear things up:

  http://www.youtube.com/watch?v=usGkq7tAhfc

have a nice weekend

--
Doug


>
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, wil only add a couple of
>>> thousands.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
>
> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
> note there is a 6 GB version, not aware of price will be $$$$ i bet.
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
>
> VERSUS
>
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
>
> Factor 100 difference to those cards.
>
> A couple of thousands versus a couple of hundreds of thousands.
> Hope i made my point clear.
>
>
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
>
> When fermi released it was a great gpu.
>
> Regrettably they lobotomized the gamers card's double precision as i
> understand,
> So it hardly has double precision capabilities; if you go for nvidia
> you sure need a Tesla,
> no question about it.
>
> As a company i would buy in 6990's though, they're a lot cheaper and
> roughly 3x faster
> than the Nvidia's (for some more than 3x for other occassions less
> than 3x, note the card
> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
>
> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for
> AMD
> versus 448 cores nvidia with 448 execution units of 32 bits
> multiplication.
>
> Especially because multiplication has improved a lot.
>
> Already having written CUDA code some while ago, i wanted the cheap
> gamers card with big
> horse power now at home so  i'm toying on a 6970 now so will be able
> to report to you what is possible to
> achieve at that card with respect to prime numbers and such.
>
> I'm a bit amazed so little public initiatives write code for the AMD
> gpu's.
>
> Note that DDR5 ram doesn't have ECC by default, but has in case of
> AMD a CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC,
> but works pretty ok and shows you
> also when problems occured there, so figuring out remove what goes on
> is possible.
>
> Make no mistake that this isn't ECC.
> We know some HPC centers have as a hard requirement ECC, only nvidia
> is an alternative then.
>
> In earlier posts from some time ago and some years ago i already
> wrote on that governments should
> adapt more to how hardware develops rather than demand that hardware
> has to follow them.
>
> HPC has too little cash to demand that from industry.
>
> OpenCL i cannot advice at this moment (for a number of reasons).
>
> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but
> majority of codes are possible
> to port quite well (there is exceptions), or easy work arounds.
>
> Any company doing gpgpu i would advice developing both branches of
> code at the same time,
> as that gives the company a lot of extra choices for really very
> little extra work. Maybe 1 coder,
> and it always allows you to have the fastest setup run your
> production code.
>
> That said we can safely expect that from raw performance coming years
> AMD will keep the leading edge
> from crunching viewpoint. Elsewhere i pointed out why.
>
> Even then i'd never bet at just 1 manufacturer. Go for both
> considering the cheap price of it.
>
> For a lot of HPC centers the choice of nvidia will be an easy one, as
> the price of the Fermi cards
> is peanuts compared to the price rest of the system and considering
> other demands that's what they'll go for.
>
> That might change once you stick in bunches of videocards in nodes.
>
> Please note that the gpu 'streamcores' or PE's whatever name you want
> to give them, are so bloody fast,
> that your code has to work within the PE's themselves and hardly use
> the RAM.
>
> Both for Nvidia as well as AMD, the streamcores are so fast, that you
> simply don't want to lose time on the RAM
> when your software runs, let alone that you want to use huge RAM.
>
> Add to that, that nvidia (have to still figure out for AMD) can in
> background stream from and to the gpu's RAM
> from the CPU, so if you do really large calculations involving many
> nodes,
> all that shouldn't be an issue in the first place.
>
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that
> would really amaze me, though i'm sure
> there is cases where that happens. If we see however what was ordered
> it mostly is the 3GB Tesla's,
> at least on what has been reported, i have no global statistics on
> that...
>
> Now all choices are valid there, but even then we speak about peanuts
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be
> maybe $300k-$400k or something?
>
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD
> 6990 is 2000 euro.
>
> There won't be 2 gpu nvidia's any soon because of the choice they
> have historically made for the memory controllers.
> See explanation of intel fanboy David Kanter for that at
> realworldtech in a special article he wrote there.
>
> Please note i'm not judging AMD nor Nvidia, they have made their
> choices based upon totally different
> businessmodels i suspect and we must be happy we have this rich
> choice right now between cpu's from different
> manufacturers and gpu's from different manufacturers.
>
> Nvidia really seems to aim at supercomputers, giving their tesla line
> without lobotomization and lobotomizing their
> gamers cards, where AMD aims at gamers and their gamercards have full
> functionality
> without lobotomization.
>
> Total different businessmodels. Both have their advantages and
> disadvantages.
>
>  From pure performance viewpoint it's easy to see what's faster though.
>
> Yet right now i realize all too well that just too many still
> hesitate between also offering gpu services additional to
> cpu services, in which case having a gpu, regardless nvidia or amd,
> kicks butt of course from throughput viewpoint.
>
> To be really honest with you guys, i had expected that by 2011 we
> would have a gpu reaching far over 1 Teraflop double precision
> handsdown. If we see that Nvidia delivers somewhere around 515 Gflop
> and AMD has 2 gpu's on a single card to get over that Teraflop double
> precision (claim is 1.27 Teraflop double precision),
> that really is underneath my expectations from a few years ago.
>
> Now of course i hope you realize i'm not coding double precision code
> at all; i'm writing everything in integers of 32 bits for the AMD
> card and the Nvidia equivalent also is using 32 bits integers. The
> ideal way to do calculations on those cards, so also very big
> transforms, is using the 32 x 32 == 64 bits instructions (that's 2
> instructions in case of AMD).
>
> Regards,
> Vincent
>
>
>>
>> Gus Correa
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>


-- 
Doug

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From hahn at mcmaster.ca  Fri Apr  8 08:45:08 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 8 Apr 2011 08:45:08 -0400 (EDT)
Subject: [Beowulf] Westmere EX
In-Reply-To: <BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>
	<BANLkTikHZF7-n6W-wORPpoTBrSxgRrKevA@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1104080838380.12696@coffee.psychology.mcmaster.ca>

>> notice it says 32GB max memory size. ??even if that means 32GB/socket,
>> it's not all that much.
>
> It's actually 32GB per DIMM, so up to 512GB per socket.

right - I eventually found the non-marketing docs.  each socket 
has two memory controllers, each of which supports 2 "intel scalable memory"
channels, which support an intel scalable memory buffer, which supports
4 dimms.   (the ISMB actually referred to as "advanced memory buffer"
in one place, like from fbdimm days...)

it also has double-bit correction, triple bit detection on the last-level
cache.  definitely not designed for cheap or even compact systems...

-mark
-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org  Fri Apr  8 15:42:37 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Fri, 8 Apr 2011 21:42:37 +0200
Subject: [Beowulf] [FoRK] Cray help?? Re: FaceBook tries to cream Google
Message-ID: <20110408194237.GH23560@leitl.org>

----- Forwarded message from "J. Andrew Rogers" <andrew at ceruleansystems.com> -----

From: "J. Andrew Rogers" <andrew at ceruleansystems.com>
Date: Fri, 8 Apr 2011 12:27:35 -0700
To: Friends of Rohit Khare <fork at xent.com>
Subject: Re: [FoRK] Cray help??  Re:  FaceBook tries to cream Google
X-Mailer: Apple Mail (2.1084)
Reply-To: Friends of Rohit Khare <fork at xent.com>


On Apr 8, 2011, at 11:15 AM, Stephen Williams wrote:
> I used RabbitMQ not long ago.  Impressed with some of it, not with a lot of the rest.  Digging through Erlang to determine its real details and limitations was interesting.  The group that had chosen it assumed magic that was not there.  Bottlenecks were going to kill scalability using the naive design.


ZeroMQ is not an MQ despite its name. It is a high-performance implementation of messaging design patterns, including some that are MQ-like. I believe it had aspirations to be an MQ many years ago but turned into an MPI-like high-performance messaging library that abstracts network, IPC, and in-process communication.

The basic network performance and scalability of ZeroMQ is similar to MPI. Underneath the hood it is just a collection of lockless, async structures grafted to the usual operating system hooks. 

Thinking of it as a competitor to MPI in terms of basic functionality is probably the correct framing.

J. Andrew Rogers


_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From eugen at leitl.org  Fri Apr  8 15:42:51 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Fri, 8 Apr 2011 21:42:51 +0200
Subject: [Beowulf] [FoRK] FaceBook tries to cream Google
Message-ID: <20110408194251.GI23560@leitl.org>

----- Forwarded message from "J. Andrew Rogers" <andrew at ceruleansystems.com> -----

From: "J. Andrew Rogers" <andrew at ceruleansystems.com>
Date: Fri, 8 Apr 2011 10:36:31 -0700
To: Friends of Rohit Khare <fork at xent.com>
Subject: Re: [FoRK] FaceBook tries to cream Google
X-Mailer: Apple Mail (2.1084)
Reply-To: Friends of Rohit Khare <fork at xent.com>


On Apr 8, 2011, at 8:05 AM, Stephen Williams wrote:
> 
> Agreed.  Strange that MPI isn't more widely used (outside supercomputing projects).  Although, I'm not aware of it expecting and handling faults / rework as a good Mapreduce imitation, and similar systems, must.


It is not that strange, MPI is a bit brittle as a communication library standard. Implementations tend to make simplifying assumptions that are not valid for some parallel applications. You can patch it up to do anything but the level of effort required seems to relegate it to just being used in scientific computing for which it was designed. I've seen ZeroMQ being increasingly used for roughly the same purpose as MPI in "normal" distributed systems, and I personally do not see much reason to prefer the latter over the former for most things. 

The difference is history. MPI's weakness is that it started from a mediocre design that immediately became part of a standards process, with all the politics and buy-in that entails. It is also badly documented as a practical matter. ZMQ also started with a somewhat dodgy early design but as a library rather than a standard; it was iterated by hackers over several versions into a more sensible and capable design. ZMQ has been willing to break backward compatibility to fix behaviors that irritated the programmers that use it or add badly needed features, which is possible because the "standard" is the implementation.


J. Andrew Rogers


_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Tue Apr 12 16:31:41 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Tue, 12 Apr 2011 16:31:41 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -
	pre-alpha release
Message-ID: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>

If you are using the "Job to Core Binding" feature in SGE and running
SGE on newer hardware, then please give the new hwloc enabled
loadcheck a try.

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

The current hardware topology discovery library (Portable Linux
Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
hardware topology may not be detected correctly by PLPA.

If you are running SGE on AMD Magny-Cours servers, please post your
loadcheck output, as it is known to be wrong when handled by PLPA.

The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
support in later releases of Grid Engine / Grid Scheduler.

http://gridscheduler.sourceforge.net/

Thanks!!
Rayson
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Wed Apr 13 12:21:21 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Wed, 13 Apr 2011 12:21:21 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>
	<26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
Message-ID: <BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com>

Carlos,

I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
arch string, so I believe you are running the loadcheck from standard
Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
the one from the Open Grid Scheduler page.

The existing Grid Engine (including the latest Open Grid Scheduler
releases: SGE 6.2u5p1 & SGE 6.2u5p2, or Univa's fork) uses PLPA, and
it is known to be wrong on magny-cours.

(i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from:
http://sourceforge.net/projects/gridscheduler/files/ )


Chansup on the Grid Engine mailing list (it's the general purpose Grid
Engine mailing list for now) tested the version I uploaded last night,
and seems to work on a dual-socket magny-cours AMD machine. It prints:

m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC

However, I am still fixing the processor, core id mapping code:

http://gridengine.org/pipermail/users/2011-April/000629.html
http://gridengine.org/pipermail/users/2011-April/000628.html

I compiled the hwloc enabled loadcheck on kernel 2.6.34 & glibc 2.12,
so it may not work on machines running lower kernel or glibc versions,
you can download it from:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

Rayson


On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
<carlosf at cesga.es> wrote:
> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD system
> (and seems to be wrong!):
>
> arch ? ? ? ? ? ?lx24-amd64
> num_proc ? ? ? ?24
> m_socket ? ? ? ?2
> m_core ? ? ? ? ?12
> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
> load_short ? ? ?0.29
> load_medium ? ? 0.13
> load_long ? ? ? 0.04
> mem_free ? ? ? ?26257.382812M
> swap_free ? ? ? 8191.992188M
> virtual_free ? ?34449.375000M
> mem_total ? ? ? 32238.328125M
> swap_total ? ? ?8191.992188M
> virtual_total ? 40430.320312M
> mem_used ? ? ? ?5980.945312M
> swap_used ? ? ? 0.000000M
> virtual_used ? ?5980.945312M
> cpu ? ? ? ? ? ? 0.0%
>
>
> Carlos Fernandez Sanchez
> Systems Manager
> CESGA
> Avda. de Vigo s/n. Campus Vida
> Tel.: (+34) 981569810, ext. 232
> 15705 - Santiago de Compostela
> SPAIN
>
> --------------------------------------------------
> From: "Rayson Ho" <raysonlogin at gmail.com>
> Sent: Tuesday, April 12, 2011 10:31 PM
> To: "Beowulf List" <Beowulf at beowulf.org>
> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
> -pre-alpha release
>
>> If you are using the "Job to Core Binding" feature in SGE and running
>> SGE on newer hardware, then please give the new hwloc enabled
>> loadcheck a try.
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> The current hardware topology discovery library (Portable Linux
>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>> hardware topology may not be detected correctly by PLPA.
>>
>> If you are running SGE on AMD Magny-Cours servers, please post your
>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>
>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>> support in later releases of Grid Engine / Grid Scheduler.
>>
>> http://gridscheduler.sourceforge.net/
>>
>> Thanks!!
>> Rayson
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Fri Apr 15 10:12:00 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Fri, 15 Apr 2011 10:12:00 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding
	enhancement)
Message-ID: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>

Hi all,

Distributing Linux application binaries is proven to be a major issue
as a lot of people wanted to test the hwloc loadcheck but their Linux
versions are older than mine. And compiling SGE from source is not
simple neither -- I wrote a quick & dirty guide for those who don't
want the add-ons but it's usually the extra stuff & dependencies that
fail the build. So I would like to offer pre-compiled binaries and
upload them onto sourceforge.

I know it's a complicated question - what version of Linux should I
use to build Grid Engine / Open Grid Scheduler when the binaries are
for others to consume??

(In case you are interested, the quick compile guide is at:
http://gridscheduler.sourceforge.net/CompileGridEngineSource.html )

Prakashan: I tried to link it statically, and I even tried to compile
an older version of glibc on my machine, but I could not get either of
them to work!!

Rayson


On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
> Hi Rayson,
>
> Do you have a statically linked version? Thanks.
>
> ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by
> ./loadcheck)
>
> Prakashan
>
>
>
> On 04/13/2011 09:21 AM, Rayson Ho wrote:
>>
>> Carlos,
>>
>> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
>> arch string, so I believe you are running the loadcheck from standard
>> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
>> the one from the Open Grid Scheduler page.
>>
>> The existing Grid Engine (including the latest Open Grid Scheduler
>> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and
>> it is known to be wrong on magny-cours.
>>
>> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from:
>> http://sourceforge.net/projects/gridscheduler/files/ )
>>
>>
>> Chansup on the Grid Engine mailing list (it's the general purpose Grid
>> Engine mailing list for now) tested the version I uploaded last night,
>> and seems to work on a dual-socket magny-cours AMD machine. It prints:
>>
>> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC
>>
>> However, I am still fixing the processor, core id mapping code:
>>
>> http://gridengine.org/pipermail/users/2011-April/000629.html
>> http://gridengine.org/pipermail/users/2011-April/000628.html
>>
>> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12,
>> so it may not work on machines running lower kernel or glibc versions,
>> you can download it from:
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Rayson
>>
>>
>>
>> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
>> <carlosf at cesga.es> ?wrote:
>>>
>>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD
>>> system
>>> (and seems to be wrong!):
>>>
>>> arch ? ? ? ? ? ?lx24-amd64
>>> num_proc ? ? ? ?24
>>> m_socket ? ? ? ?2
>>> m_core ? ? ? ? ?12
>>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
>>> load_short ? ? ?0.29
>>> load_medium ? ? 0.13
>>> load_long ? ? ? 0.04
>>> mem_free ? ? ? ?26257.382812M
>>> swap_free ? ? ? 8191.992188M
>>> virtual_free ? ?34449.375000M
>>> mem_total ? ? ? 32238.328125M
>>> swap_total ? ? ?8191.992188M
>>> virtual_total ? 40430.320312M
>>> mem_used ? ? ? ?5980.945312M
>>> swap_used ? ? ? 0.000000M
>>> virtual_used ? ?5980.945312M
>>> cpu ? ? ? ? ? ? 0.0%
>>>
>>>
>>> Carlos Fernandez Sanchez
>>> Systems Manager
>>> CESGA
>>> Avda. de Vigo s/n. Campus Vida
>>> Tel.: (+34) 981569810, ext. 232
>>> 15705 - Santiago de Compostela
>>> SPAIN
>>>
>>> --------------------------------------------------
>>> From: "Rayson Ho"<raysonlogin at gmail.com>
>>> Sent: Tuesday, April 12, 2011 10:31 PM
>>> To: "Beowulf List"<Beowulf at beowulf.org>
>>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
>>> -pre-alpha release
>>>
>>>> If you are using the "Job to Core Binding" feature in SGE and running
>>>> SGE on newer hardware, then please give the new hwloc enabled
>>>> loadcheck a try.
>>>>
>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>
>>>> The current hardware topology discovery library (Portable Linux
>>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>>>> hardware topology may not be detected correctly by PLPA.
>>>>
>>>> If you are running SGE on AMD Magny-Cours servers, please post your
>>>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>>>
>>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>>>> support in later releases of Grid Engine / Grid Scheduler.
>>>>
>>>> http://gridscheduler.sourceforge.net/
>>>>
>>>> Thanks!!
>>>> Rayson
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From landman at scalableinformatics.com  Fri Apr 15 10:19:04 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Apr 2011 10:19:04 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
Message-ID: <4DA853D8.8000308@scalableinformatics.com>

On 04/15/2011 10:12 AM, Rayson Ho wrote:

> I know it's a complicated question - what version of Linux should I
> use to build Grid Engine / Open Grid Scheduler when the binaries are
> for others to consume??

I'd recommend a Centos 5.x variant, and possibly a SuSE variant.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Fri Apr 15 12:15:38 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Apr 2011 12:15:38 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <4DA853D8.8000308@scalableinformatics.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com>
Message-ID: <4DA86F2A.5010708@ias.edu>

On 04/15/2011 10:19 AM, Joe Landman wrote:
> On 04/15/2011 10:12 AM, Rayson Ho wrote:
> 
>> I know it's a complicated question - what version of Linux should I
>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>> for others to consume??
> 
> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
> 

I agree, but I think that if you can get your hands on an actual RHEL
image, that's what you should use, as long as you already have access to
it.

-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Fri Apr 15 12:25:10 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Fri, 15 Apr 2011 12:25:10 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <4DA86F2A.5010708@ias.edu>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com>
	<4DA86F2A.5010708@ias.edu>
Message-ID: <BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>

Thanks all!!

If I build on Centos 5.6, will the binaries run on SuSE & Ubuntu??

(Don't want what versions SuSE & Ubuntu most people are using -- I
have Ubuntu 10 & 11 on my machines, and F13.)

Rayson


On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal <prentice at ias.edu> wrote:
> On 04/15/2011 10:19 AM, Joe Landman wrote:
>> On 04/15/2011 10:12 AM, Rayson Ho wrote:
>>
>>> I know it's a complicated question - what version of Linux should I
>>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>>> for others to consume??
>>
>> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
>>
>
> I agree, but I think that if you can get your hands on an actual RHEL
> image, that's what you should use, as long as you already have access to
> it.
>
> --
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From prentice at ias.edu  Fri Apr 15 13:49:55 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Apr 2011 13:49:55 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTikKbnGQVkXhwSgfryxVN6zjWoMVbg@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTikKbnGQVkXhwSgfryxVN6zjWoMVbg@mail.gmail.com>
Message-ID: <4DA88543.6000603@ias.edu>


On 04/15/2011 01:40 PM, Chi Chan wrote:
> On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal <prentice at ias.edu> wrote:
>> I agree, but I think that if you can get your hands on an actual RHEL
>> image, that's what you should use, as long as you already have access to
>> it.
> 
> Or just use Oracle Linux, it is free to download and distribute, and
> can be used in production:
> 
> http://www.oracle.com/us/technologies/linux/competitive-335546.html
> http://www.oracle.com/us/technologies/027617.pdf
> 
> From my experience, Oracle Linux and RHEL are idential, you can
> compile applications on Oracle Linux and ship it to run on RHEL boxes.

I had recommended RHEL just because its the "gold standard" for all
RHEL-derived distros. CentOS and a few others *should* be identical.
However, I don't think Oracle is. Doesn't Oracle make some changes to
optimize it for running Oracle? I'm not sure of that, which is why I'm
asking and not stating.


-- 
Prentice
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Sun Apr 17 20:36:40 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Mon, 18 Apr 2011 10:36:40 +1000
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
Message-ID: <4DAB8798.8070102@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/04/11 02:25, Rayson Ho wrote:

> If I build on Centos 5.6, will the binaries run on
> SuSE & Ubuntu??

I'd suggest that if you want them to work (and especially
if you want to package them appropriately) then you're
far better off getting a build machine for the OS's you
want to support.  CentOS, SLES, Debian & Ubuntu.

We build all our x86 stuff on our CentOS5 cluster and
rsync it over to our RHEL5 cluster (sadly we can't just
share /usr/local/ between them due to circumstances
beyond our control) without issues.

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz
430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy
=stQg
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Mon Apr 18 09:03:50 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 18 Apr 2011 15:03:50 +0200
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
	binding enhancement)
In-Reply-To: <4DAB8798.8070102@unimelb.edu.au>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>	<4DA853D8.8000308@scalableinformatics.com>	<4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
	<4DAB8798.8070102@unimelb.edu.au>
Message-ID: <DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 18.04.2011 um 02:36 schrieb Christopher Samuel:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 16/04/11 02:25, Rayson Ho wrote:
> 
>> If I build on Centos 5.6, will the binaries run on
>> SuSE & Ubuntu??
> 
> I'd suggest that if you want them to work (and especially
> if you want to package them appropriately) then you're
> far better off getting a build machine for the OS's you
> want to support.  CentOS, SLES, Debian & Ubuntu.

Before there was only a common and a platform specific tarball. Does it imply to supply *.rpm in the future? It was always nice to just untar SGE and run it as a normal user w/o any root privilege (yes, rpm2cpio could do). And it was one tarball for all Linux variants. I would vote for staying with this.

- -- Reuti


> We build all our x86 stuff on our CentOS5 cluster and
> rsync it over to our RHEL5 cluster (sadly we can't just
> share /usr/local/ between them due to circumstances
> beyond our control) without issues.
> 
> cheers,
> Chris
> - -- 
>    Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>         http://www.vlsci.unimelb.edu.au/
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz
> 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy
> =stQg
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.16 (Darwin)

iEYEARECAAYFAk2sNsQACgkQo/GbGkBRnRr55QCdGyBkTKd7EsTWSvVPRWuMQbGA
kOQAniYFwJyMOlwcR3ITHS9nAfGRZndh
=iknW
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From hahn at mcmaster.ca  Mon Apr 18 11:24:00 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 18 Apr 2011 11:24:00 -0400 (EDT)
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread
 binding enhancement)
In-Reply-To: <DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>
References: <BANLkTikET_j6+rk6_m00qzYKP6U2zGJLnQ@mail.gmail.com>
	<4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu>
	<BANLkTimV0jLKq562uFnB3fBZUQ8bYusQ6A@mail.gmail.com>
	<4DAB8798.8070102@unimelb.edu.au>
	<DF74F810-AA89-408E-B35B-CBA9184F929E@staff.uni-marburg.de>
Message-ID: <Pine.LNX.4.64.1104181122330.26910@coffee.psychology.mcmaster.ca>

not to be overly surly, 
but this really has nothing to do with beowulf
and is a rather specialized sge support issue...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From mathog at caltech.edu  Mon Apr 18 12:34:11 2011
From: mathog at caltech.edu (David Mathog)
Date: Mon, 18 Apr 2011 09:34:11 -0700
Subject: [Beowulf] Grid Engine build machine
Message-ID: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>

Rayson Ho <raysonlogin at gmail.com> wrote
> And compiling SGE from source is not
> simple neither -- I wrote a quick & dirty guide for those who don't
> want the add-ons but it's usually the extra stuff & dependencies that
> fail the build.

Does it still use aimk or has it finally gone over to autoconf, automake?
As I recall aimk was really touchy the last time I built this (4
years ago), with lots of futzing around to convince it to use library
files it should have found on its own.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Mon Apr 18 12:36:19 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 18 Apr 2011 18:36:19 +0200
Subject: [Beowulf] Grid Engine build machine
In-Reply-To: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>
References: <E1QBrOt-0005OD-A5@mendel.bio.caltech.edu>
Message-ID: <B603D5CF-D3F8-43EE-A4AE-F4414DB451EE@staff.uni-marburg.de>

Am 18.04.2011 um 18:34 schrieb David Mathog:

> Rayson Ho <raysonlogin at gmail.com> wrote
>> And compiling SGE from source is not
>> simple neither -- I wrote a quick & dirty guide for those who don't
>> want the add-ons but it's usually the extra stuff & dependencies that
>> fail the build.
> 
> Does it still use aimk

Still aimk.

-- Reuti

> or has it finally gone over to autoconf, automake?
> As I recall aimk was really touchy the last time I built this (4
> years ago), with lots of futzing around to convince it to use library
> files it should have found on its own.
> 
> Regards,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From raysonlogin at gmail.com  Mon Apr 18 14:26:57 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 18 Apr 2011 14:26:57 -0400
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <4DA5E85D.4010801@ats.ucla.edu>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com>
	<26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2>
	<BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com>
	<4DA5E85D.4010801@ats.ucla.edu>
Message-ID: <BANLkTinMFKDr7t6oARV5vYxkgj1iq1gYKQ@mail.gmail.com>

For those who had issues with earlier version, please try the latest
loadcheck v4:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

I compiled the binary on Oracle Linux, which is compatible with RHEL
5.x, Scientific Linux or Centos 5.x. I tested the binary on the
standard Red Hat kernel, and Oracle enhanced "Unbreakable Enterprise
Kernel", Fedora 13, Ubuntu 10.04 LTS.

Optimizing for AMD's NUMA machine characteristics is on the ToDo list.

Rayson


On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
> Hi Rayson,
>
> Do you have a statically linked version? Thanks.
>
> ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by
> ./loadcheck)
>
> Prakashan
>
>
>
> On 04/13/2011 09:21 AM, Rayson Ho wrote:
>>
>> Carlos,
>>
>> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
>> arch string, so I believe you are running the loadcheck from standard
>> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
>> the one from the Open Grid Scheduler page.
>>
>> The existing Grid Engine (including the latest Open Grid Scheduler
>> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and
>> it is known to be wrong on magny-cours.
>>
>> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from:
>> http://sourceforge.net/projects/gridscheduler/files/ )
>>
>>
>> Chansup on the Grid Engine mailing list (it's the general purpose Grid
>> Engine mailing list for now) tested the version I uploaded last night,
>> and seems to work on a dual-socket magny-cours AMD machine. It prints:
>>
>> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC
>>
>> However, I am still fixing the processor, core id mapping code:
>>
>> http://gridengine.org/pipermail/users/2011-April/000629.html
>> http://gridengine.org/pipermail/users/2011-April/000628.html
>>
>> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12,
>> so it may not work on machines running lower kernel or glibc versions,
>> you can download it from:
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Rayson
>>
>>
>>
>> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
>> <carlosf at cesga.es> ?wrote:
>>>
>>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD
>>> system
>>> (and seems to be wrong!):
>>>
>>> arch ? ? ? ? ? ?lx24-amd64
>>> num_proc ? ? ? ?24
>>> m_socket ? ? ? ?2
>>> m_core ? ? ? ? ?12
>>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
>>> load_short ? ? ?0.29
>>> load_medium ? ? 0.13
>>> load_long ? ? ? 0.04
>>> mem_free ? ? ? ?26257.382812M
>>> swap_free ? ? ? 8191.992188M
>>> virtual_free ? ?34449.375000M
>>> mem_total ? ? ? 32238.328125M
>>> swap_total ? ? ?8191.992188M
>>> virtual_total ? 40430.320312M
>>> mem_used ? ? ? ?5980.945312M
>>> swap_used ? ? ? 0.000000M
>>> virtual_used ? ?5980.945312M
>>> cpu ? ? ? ? ? ? 0.0%
>>>
>>>
>>> Carlos Fernandez Sanchez
>>> Systems Manager
>>> CESGA
>>> Avda. de Vigo s/n. Campus Vida
>>> Tel.: (+34) 981569810, ext. 232
>>> 15705 - Santiago de Compostela
>>> SPAIN
>>>
>>> --------------------------------------------------
>>> From: "Rayson Ho"<raysonlogin at gmail.com>
>>> Sent: Tuesday, April 12, 2011 10:31 PM
>>> To: "Beowulf List"<Beowulf at beowulf.org>
>>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
>>> -pre-alpha release
>>>
>>>> If you are using the "Job to Core Binding" feature in SGE and running
>>>> SGE on newer hardware, then please give the new hwloc enabled
>>>> loadcheck a try.
>>>>
>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>
>>>> The current hardware topology discovery library (Portable Linux
>>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>>>> hardware topology may not be detected correctly by PLPA.
>>>>
>>>> If you are running SGE on AMD Magny-Cours servers, please post your
>>>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>>>
>>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>>>> support in later releases of Grid Engine / Grid Scheduler.
>>>>
>>>> http://gridscheduler.sourceforge.net/
>>>>
>>>> Thanks!!
>>>> Rayson
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr 21 08:59:30 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 21 Apr 2011 14:59:30 +0200
Subject: [Beowulf] GPU's - was Westmere EX
In-Reply-To: <4D9E5ECB.60608@ldeo.columbia.edu>
References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com><Pine.LNX.4.64.1104062021490.12538@coffee.psychology.mcmaster.ca>	<4D9D1E35.9040802@berkeley.edu>	<207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
	<5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl>
	<4D9DE57F.4040303@ldeo.columbia.edu>
	<59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl>
	<4D9E5ECB.60608@ldeo.columbia.edu>
Message-ID: <0F717DDD-470A-4B13-B1AF-FBCB034409DC@xs4all.nl>

hi,

Sometimes going through some old emails.

Note in the meantime i switched from AMD-CAL to OpenCL.

On Apr 8, 2011, at 3:03 AM, Gus Correa wrote:

> Thank you for the information about AMD-CAL and the AMD GPUs.
> Does AMD plan any GPU product with 64-bit and ECC,
> similar to Tesla/Fermi?

Actually DDR5 already calculates a CRC. Not as good as ECC, but it  
takes care you
have a form of checking. Also the amount of bitflips is so little as  
the quality of this DDR5 is so great,
according to some memory experts i spoke with, that this CRC is more  
than sufficient.

As i'm not a memory expert i would advice you to really speak with  
such a guy instead of some HPC guys here.

Now if your organisation wants ECC simply i'm not going to argue. A  
demand is a demand there.

I'm busy pricewise here how to build cheap something that delivers a  
big punch.

If you look objectively and then to gpgpu codes, then of course  
Nvidia has a few years more experience
setting up CUDA.

This is another problem of course, software support. Both suck at it,  
to say polite.

Yet we want to do calculations cheap huh.

Yet if performance matters, then AMD is a very cheap alternative.
In both cases of course, programming for a gpu is going to be the  
bottleneck;
historically organisations do not invest in good code, they only  
invest in hardware and in managers who
sit on their behind, drink coffee and do meetings.

Objectively most codes you can also code in 32 bits.

If we do a simple compare then the HD6990 is there for 540 euro in  
the shop here. Now that's European prices
where salestax is 19%, so in USA probably it's cheaper (if you  
calculate it back to euro's).

Let's now ignore the marketing nonsense ok, as marketing nonsense is  
marketing nonsense.
All those theoretic flops always, they shouldn't allow double  
counting specific instructions like multiply add.

The internals of these gpu's are all organized such that doing  
efficient matrix calculations on them is very well
possible. Not easy to solve well, as the bottleneck will be the  
bandwidth from the DDR3 cpu ram to the gpu,
  yet if you look to a lot of calculations, then it's algorithmic  
possible to do a lot more work at the execution unit
side than the bandwidth you need to another node; those execution  
units, PE's (processing elements)
  nowadays called, have huge GPR's which can proces all that. With  
that those tiny cheap power efficient cores
can easily take on huge expensive cpu cores. A single set of 4 PE's  
in case of AMD has a total of 1024 GPR's,
can read from a L1 cache when needed and write to a shared local  
cache of 32KB (shared by 64 pe's).

That L1 reads from the memory L2 and all that has a huge bandwidth.

That gives you PRACTICAL 3072 PE's @ 0.83 Ghz == 2.5+ Tflop in 32  
bits integers. It's not so hard to convert
that to 64 bits code if that's what you need. In fact i'm using it to  
approximate huge integers (prime numbers)
of million bit sizes (factorisation of them).

Using that efficiently is not easy, yet realize this is 2.5+ Tflop (i  
should actually say Tera 32 bits integer performance).
Good programmers can use todays GPU's very efficiently.

The 6000+ series of AMD and the Fermi series of Nvidia are very good  
and you can use them in a sustained manner.

Now the cheapest gpgpu of Nvidia is about $1200 which is the quadro  
6000 series and delivers 448 cores @ 1.2Ghz,
say roughly 550 Gflop.

Of course this is practical what you can achieve, i'm not counting of  
course multiply-add here as being 2 flops,
which is their own definition of how many gflops it gets; first of  
all i'm not interested in flops but in integers per cycle
and secondly i prefer a realistic measure, otherwise we have no  
measure on how efficiently we use the gpu.

If you look from a mathematical viewpoint, it's not so clever from  
most scientists at todays huge calculations to use
floating point. Single precision or double precision, in the end it  
all backtracks errors and you have complete non-deterministic results  
with big sureness.

Much better are integer transforms where you have 100% lossless  
calculations so big sureness your calculation is ok.

Yet i realize this is a very expertise field with most people who  
know something about that hiding in secrecy using fake names
and some even having fake burials, just in order to disappear. That  
in itself is all very sad, as progressing science doesn't
happen. As a result of that scientific world has focussed too much  
upon floating point.

Yet the cards can deliver that as well as we know.

The round off errors all those floating point calculations cause are  
always a huge multiple of bitflips of memory.
It's not even in the same league. Now of course calculating with 64  
bits integers it's easier to do huge transforms
and you can redo your calculation and at some spots you will have  
deterministic output in such case, in others of
course not (depends what you calculate of course - majority is non- 
deterministic).

With 32 bits integers you need a lot of CRT (Chinese Remainder  
Theorem) tricks to effectively use it for huge transforms,
or you simply emulate 64 bits calculations (so with 64 bits  
precision, please do not confuse with double precision
floating point).

Getting all that to work is very challenging and not easy, i realize  
that.

Yet look at the huge advantage you give to your scientists in such case.

They can look years ahead in the future which is a *huge* advantage.

In this manner you'll actually effectively get 2.x Tflop out of those  
6990, again that's 2 Tflop calculated in my manner, i'm looking  
simply at INSTRUCTION LEVEL where 1 instruction represents a single  
unit of 32 bits; counting the multiply-add instruction
as 2 flops is just too confusing for how efficient you manage to load  
your GPU, if you ask me.

In the transforms in fact multiply-add is very silly to use in many  
cases as that means you're doing some sort of inefficient
calculation.

Yet that chippie is just 500 euro, versus Nvidia delivers it for 1200  
dollar and the nvidia one is factor 3 slower,
though still lightyears faster than a CPU solution there (pricewise  
seen).

The quadro 6000 for those who don't realize it, is exactly the same  
like a Tesla. Just checkout the specs.

Yet of course for our lazy scientists all of the above is not so  
interesting. Just compiling your years 80 code,
pushing the enter button, is a lot easier.

If you care however for PERFORMANCE, consider spending a couple of  
thousands to hardware.

If you buy 1000 of those 6990's and program in opencl, you actually  
can also run that at nvidia hardware, might
nvidia be so lucky to release very quickly a 22 nm gpu some years  
from now. By then also nvidia's opencl will probably
be supporting the gpu hardware quite ok.

So my advice would be: program it in opencl. It's not the most  
efficient language on the planet, yet it'll work everywhere
and you can get probably around 2 Tflop out that 6990 AMD card.

That said of course there is a zillion problems still with opencl,  
yet if you want for $500k in gpu hardware
achieve 1 petaflop, you'll have to suffer a bit, and by the time your  
cluster is there, possibly all big bugs have
been fixed in opencl both by amd as well as by nvidia for their gpu  
lines.

Now all this said i do realize that you need a shift in thinking.  
Whether you use AMD-gpu's or Nvidia, in both cases you'll
need great new software. In fact it doesn't even matter whether you  
program it in OpenCL or CUDA. It's easy to port algorithms
from 1 entity to another; getting such algorithm to work is a lot  
harder than the question what language you program it in.

Translating CUDA to openCL is pretty much braindead work which many  
can carry out as we already saw in some examples.
The investment is in the software for the gpu's.

You don't buy that in from nvidia nor AMD. You'l have to hire people  
to program it, as your own scientists simply aren't good
enough to program efficiently for that GPU. The old fashionned vision  
of having scientists solve themselve how to do the
calculations is not going to work for gpgpu simply.

Now that is a big pitfall that is hard to overcome.

All this said, of course there is a few, really very few,  
applications where a full blown gpu nor hybrid solution is able to
solve the problems. Yet usually such claim that it is "not possible"  
gets done by scientists who are experts in their field,
but not very high level in finding solutions how to efficiently get  
their calculations done in HPC.

Regards,
Vincent

>
> The lack of a language standard may still be a hurdle here.
> I guess there were old postings here about CUDA and OpenGL.
> What fraction of the (non-gaming) GPU code is being written these days
> in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using
> compiler directives like those in the PGI compilers?
>
> Thank you,
> Gus Correa
>
> Vincent Diepeveen wrote:
>>
>> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>>
>>> Vincent Diepeveen wrote:
>>>
>>>> GPU monster box, which is basically a few videocards inside such a
>>>> box stacked up a tad, wil only add a couple of
>>>> thousands.
>>>>
>>>
>>> This price may be OK for the videocard-class GPUs,
>>> but sounds underestimated, at least for Fermi Tesla.
>>
>> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200
>> note there is a 6 GB version, not aware of price will be $$$$ i bet.
>> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro
>>
>> VERSUS
>>
>> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
>>
>> Factor 100 difference to those cards.
>>
>> A couple of thousands versus a couple of hundreds of thousands.
>> Hope i made my point clear.
>>
>>
>>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla  
>>> C2050,
>>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~ 
>>> $15k.
>>> If you care about ECC, that's the price you pay, right?
>>
>> When fermi released it was a great gpu.
>>
>> Regrettably they lobotomized the gamers card's double precision as i
>> understand,
>> So it hardly has double precision capabilities; if you go for  
>> nvidia you
>> sure need a Tesla,
>> no question about it.
>>
>> As a company i would buy in 6990's though, they're a lot cheaper and
>> roughly 3x faster
>> than the Nvidia's (for some more than 3x for other occassions less  
>> than
>> 3x, note the card
>> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu).
>>
>> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units  
>> for AMD
>> versus 448 cores nvidia with 448 execution units of 32 bits  
>> multiplication.
>>
>> Especially because multiplication has improved a lot.
>>
>> Already having written CUDA code some while ago, i wanted the cheap
>> gamers card with big
>> horse power now at home so  i'm toying on a 6970 now so will be  
>> able to
>> report to you what is possible to
>> achieve at that card with respect to prime numbers and such.
>>
>> I'm a bit amazed so little public initiatives write code for the  
>> AMD gpu's.
>>
>> Note that DDR5 ram doesn't have ECC by default, but has in case of  
>> AMD a
>> CRC calculation
>> (if i understand it correctly). It's a bit more primitive than  
>> ECC, but
>> works pretty ok and shows you
>> also when problems occured there, so figuring out remove what goes  
>> on is
>> possible.
>>
>> Make no mistake that this isn't ECC.
>> We know some HPC centers have as a hard requirement ECC, only  
>> nvidia is
>> an alternative then.
>>
>> In earlier posts from some time ago and some years ago i already  
>> wrote
>> on that governments should
>> adapt more to how hardware develops rather than demand that  
>> hardware has
>> to follow them.
>>
>> HPC has too little cash to demand that from industry.
>>
>> OpenCL i cannot advice at this moment (for a number of reasons).
>>
>> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but
>> majority of codes are possible
>> to port quite well (there is exceptions), or easy work arounds.
>>
>> Any company doing gpgpu i would advice developing both branches of  
>> code
>> at the same time,
>> as that gives the company a lot of extra choices for really very  
>> little
>> extra work. Maybe 1 coder,
>> and it always allows you to have the fastest setup run your  
>> production
>> code.
>>
>> That said we can safely expect that from raw performance coming years
>> AMD will keep the leading edge
>> from crunching viewpoint. Elsewhere i pointed out why.
>>
>> Even then i'd never bet at just 1 manufacturer. Go for both  
>> considering
>> the cheap price of it.
>>
>> For a lot of HPC centers the choice of nvidia will be an easy one, as
>> the price of the Fermi cards
>> is peanuts compared to the price rest of the system and considering
>> other demands that's what they'll go for.
>>
>> That might change once you stick in bunches of videocards in nodes.
>>
>> Please note that the gpu 'streamcores' or PE's whatever name you  
>> want to
>> give them, are so bloody fast,
>> that your code has to work within the PE's themselves and hardly  
>> use the
>> RAM.
>>
>> Both for Nvidia as well as AMD, the streamcores are so fast, that you
>> simply don't want to lose time on the RAM
>> when your software runs, let alone that you want to use huge RAM.
>>
>> Add to that, that nvidia (have to still figure out for AMD) can in
>> background stream from and to the gpu's RAM
>> from the CPU, so if you do really large calculations involving  
>> many nodes,
>> all that shouldn't be an issue in the first place.
>>
>> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that  
>> would
>> really amaze me, though i'm sure
>> there is cases where that happens. If we see however what was  
>> ordered it
>> mostly is the 3GB Tesla's,
>> at least on what has been reported, i have no global statistics on  
>> that...
>>
>> Now all choices are valid there, but even then we speak about peanuts
>> money compared to the price of
>> a single 8 socket Nehalem-ex box, which fully configured will be  
>> maybe
>> $300k-$400k or something?
>>
>> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD  
>> 6990
>> is 2000 euro.
>>
>> There won't be 2 gpu nvidia's any soon because of the choice they  
>> have
>> historically made for the memory controllers.
>> See explanation of intel fanboy David Kanter for that at  
>> realworldtech
>> in a special article he wrote there.
>>
>> Please note i'm not judging AMD nor Nvidia, they have made their  
>> choices
>> based upon totally different
>> businessmodels i suspect and we must be happy we have this rich  
>> choice
>> right now between cpu's from different
>> manufacturers and gpu's from different manufacturers.
>>
>> Nvidia really seems to aim at supercomputers, giving their tesla line
>> without lobotomization and lobotomizing their
>> gamers cards, where AMD aims at gamers and their gamercards have full
>> functionality
>> without lobotomization.
>>
>> Total different businessmodels. Both have their advantages and
>> disadvantages.
>>
>>  From pure performance viewpoint it's easy to see what's faster  
>> though.
>>
>> Yet right now i realize all too well that just too many still  
>> hesitate
>> between also offering gpu services additional to
>> cpu services, in which case having a gpu, regardless nvidia or amd,
>> kicks butt of course from throughput viewpoint.
>>
>> To be really honest with you guys, i had expected that by 2011 we  
>> would
>> have a gpu reaching far over 1 Teraflop double precision  
>> handsdown. If
>> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2
>> gpu's on a single card to get over that Teraflop double precision  
>> (claim
>> is 1.27 Teraflop double precision),
>> that really is underneath my expectations from a few years ago.
>>
>> Now of course i hope you realize i'm not coding double precision  
>> code at
>> all; i'm writing everything in integers of 32 bits for the AMD  
>> card and
>> the Nvidia equivalent also is using 32 bits integers. The ideal  
>> way to
>> do calculations on those cards, so also very big transforms, is using
>> the 32 x 32 == 64 bits instructions (that's 2 instructions in case  
>> of AMD).
>>
>> Regards,
>> Vincent
>>
>>
>>>
>>> Gus Correa
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Apr 21 09:11:54 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 21 Apr 2011 15:11:54 +0200
Subject: [Beowulf] Google: 1 billion computing core-hours for
	researchers to tackle huge scientific challenges
In-Reply-To: <4D9E4162.3030004@pathscale.com>
References: <4D9E4162.3030004@pathscale.com>
Message-ID: <A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>

Regrettably the link is not available anymore. Can you expand on it?

As they count the cloud computing in units of 1Ghz per cpunode hour,
1 billion computing core hours is something like 1000 gpu's for 1 week?

1 billion sounds impressive nevertheless.

Regards,
Vincent

On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote:

> I just saw this on another ML and thought it may be of interest
> ------------
> http://googleblog.blogspot.com/2011/04/1-billion-computing-core- 
> hours-for.html
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From reuti at staff.uni-marburg.de  Thu Apr 21 09:15:13 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Thu, 21 Apr 2011 15:15:13 +0200
Subject: [Beowulf] Google: 1 billion computing core-hours for
	researchers to tackle huge scientific challenges
In-Reply-To: <A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>
References: <4D9E4162.3030004@pathscale.com>
	<A5923D60-5B04-4EB6-B8BB-BE0E17CC5BD1@xs4all.nl>
Message-ID: <5188F4D0-1D69-4B8F-874D-D20FDAC25CF6@staff.uni-marburg.de>

Am 21.04.2011 um 15:11 schrieb Vincent Diepeveen:

> Regrettably the link is not available anymore. Can you expand on it?

For me it's still working. You selected both lines?

--Reuti

> As they count the cloud computing in units of 1Ghz per cpunode hour,
> 1 billion computing core hours is something like 1000 gpu's for 1 week?
> 
> 1 billion sounds impressive nevertheless.
> 
> Regards,
> Vincent
> 
> On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote:
> 
>> I just saw this on another ML and thought it may be of interest
>> ------------
>> http://googleblog.blogspot.com/2011/04/1-billion-computing-core- 
>> hours-for.html
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.