From prentice at ias.edu Fri Apr 1 10:47:28 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 01 Apr 2011 10:47:28 -0400 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D93B1E4.3080407@cora.nwra.com> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <4D93B1E4.3080407@cora.nwra.com> Message-ID: <4D95E580.5090902@ias.edu> On 03/30/2011 06:42 PM, Orion Poplawski wrote: > On 03/21/2011 06:51 AM, Douglas Eadline wrote: >> I got to thinking about how others are fairing (or not) >> with GP-GPU technology. I put up a simple poll on >> ClusterMonkey to help get a general idea. >> (you can find it on the front page right top) >> If you have a moment, please provide >> your experience (results are available as well). > > We've seen some reasonable speedup (12x) with some matlab code using Jacket. > It required up-to-the-minute bugfixes/enhancements from Accelereyes to get it > working though. Ran into lots of limitations with some other code (sparse > matrices) that prevented it from being usable. Have some reports of success > with gpulib and IDL. > > I've installed 4 GPU-equipped servers in my environment; 2 are a part of my cluster, and 2 are independent from the cluster so that users can login interactively and program/debug/tinker/whatever. (My cluster doesn't allow interactive logins by design). A handful of users were interested in getting access to the GPUs, but so far, not a single one has even logged into these systems to kick the tires yet, and the systems have been online for approx. 9 months. It just be that they're busy with other work. Most of my users are post-docs who guide their own research, so they can create/modify their own project schedules as they see fit. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Fri Apr 1 18:41:07 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Fri, 1 Apr 2011 15:41:07 -0700 Subject: [Beowulf] AMD 8 cores vs 12 cores CPUs and Infiniband References: <1301387847.1995.144.camel@mundo><9FA59C95FFCBB34EA5E42C1A8573784F037FD082@mtiexch01.mti.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F037FD5B7@mtiexch01.mti.com> > > I have been using single card on Magny-Cours with no issues at all. > You can > > interesting. what adjustments have you made to the MPI stack to permit > this? > we've had a variety of apps that fail intermittently on high-core > nodes. > I have to say I was surprised such a thing came up - not sure whether > it's > inherent to IB or a result of the openmpi stack. our usual way to test > this is to gradually reduce the ranks-per-node for the job until it > starts > to work. an interesting cosmology code works at 1 pppn but not 3 ppn > on our recent 12c MC, mellanox QDR cluster. I will be more than happy to give it a try - have access to the Magny-Cours system at http://www.hpcadvisorycouncil.com/cluster_center.php > > regards, mark hahn. 
> _______________________________________________ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From akshar.bhosale at gmail.com Sat Apr 2 08:41:07 2011 From: akshar.bhosale at gmail.com (akshar bhosale) Date: Sat, 2 Apr 2011 18:11:07 +0530 Subject: [Beowulf] error in job; jobs failing Message-ID: Hi, we are getting dapl 4003 event error. We have rhel 5.2 x64 and intel mpi library 4.3;dapl-1.2.7-1.ofed1.3.1; What can be the reason? we have torque and pbs setup for job runs. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From herbert.fruchtl at st-andrews.ac.uk Mon Apr 4 11:15:35 2011 From: herbert.fruchtl at st-andrews.ac.uk (Herbert Fruchtl) Date: Mon, 04 Apr 2011 16:15:35 +0100 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: Message-ID: <4D99E097.7060807@st-andrews.ac.uk> They hear great success stories (which in reality are often prototype implementations that do one carefully chosen benchmark well), then look at the API, look at their existing code, and postpone the start of their project until they have six months spare time for it. And we know when that is. The current approach with more or less vendor specific libraries (be they "open" or not) limits the uptake of GPU computing to a few hardcore developers of experimental codes who don't mind rewriting their code every two years. It won't become mainstream until we have a compiler that turns standard Fortran (or C++, if it has to be) into GPU code. Anything that requires more change than let's say OpenMP directives is doomed, and rightly so. Herbert > > I've installed 4 GPU-equipped servers in my environment; 2 are a part of > my cluster, and 2 are independent from the cluster so that users can > login interactively and program/debug/tinker/whatever. (My cluster > doesn't allow interactive logins by design). > > A handful of users were interested in getting access to the GPUs, but so > far, not a single one has even logged into these systems to kick the > tires yet, and the systems have been online for approx. 9 months. It > just be that they're busy with other work. Most of my users are > post-docs who guide their own research, so they can create/modify their > own project schedules as they see fit. > > -- Herbert Fruchtl Senior Scientific Computing Officer School of Chemistry, School of Mathematics and Statistics University of St Andrews -- The University of St Andrews is a charity registered in Scotland: No SC013532 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
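As a rough illustration of the gap Herbert describes -- a sketch only, not taken from any of the codes discussed in this thread -- here is the same loop written at the level of change he considers acceptable (a single OpenMP directive) and as the explicit CUDA rewrite it would otherwise require (device allocation and host/device copies omitted for brevity; all names are made up):

#include <cuda_runtime.h>

/* The level of change Herbert considers acceptable: one directive on the loop. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* The same loop as an explicit GPU rewrite: a new kernel plus a launch
   configuration, on top of the device allocation and host/device copies
   that are not shown here. */
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

void saxpy_gpu(int n, float a, const float *d_x, float *d_y)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
}

Directive-based GPU products (the PGI and HMPP tools mentioned later in this thread) aim to keep the source close to the first form and generate something like the second behind the scenes.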
From cbergstrom at pathscale.com Mon Apr 4 12:01:44 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Mon, 04 Apr 2011 23:01:44 +0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk> References: <4D99E097.7060807@st-andrews.ac.uk> Message-ID: <4D99EB68.4020800@pathscale.com> Herbert Fruchtl wrote: > They hear great success stories (which in reality are often prototype > implementations that do one carefully chosen benchmark well), then look at the > API, look at their existing code, and postpone the start of their project until > they have six months spare time for it. And we know when that is. > > The current approach with more or less vendor specific libraries (be they "open" > or not) limits the uptake of GPU computing to a few hardcore developers of > experimental codes who don't mind rewriting their code every two years. It won't > become mainstream until we have a compiler that turns standard Fortran (or C++, > if it has to be) into GPU code. Anything that requires more change than let's > say OpenMP directives is doomed, and rightly so. > Hi Herbert, I think your perspective pretty much nails it (shameless self promotion) http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source) This is really only the tip of the problem and there must also be solutions for scaling *efficiently* across the cluster. (No MPI + CUDA or even HMPP is *not* the answer imho.) ./C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Mon Apr 4 12:53:22 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 4 Apr 2011 09:53:22 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk> References: <4D99E097.7060807@st-andrews.ac.uk> Message-ID: You've described it pretty well.. Look how long it took for "standard libraries" to take advantage of things like MPI to become "of course we use that".. If the original code used standard library calls for things like matrix math, and it's a "drop in" so you could do a "test case" in less than a day or so, you get pretty rapid acceptance. If it requires weeks to just figure out how to make it work, it's going to be in the "when someone specifically funds me to do it". I've seen lots of really interesting things that I'd like to try, but not being independently wealthy or having a patron who is, I have to work on things that other people want done (and, presumably which I also find interesting). I can write proposals to say "it would be really nice to do X because of speculative benefit Y" and every once in a while, someone will say, "Yeah, that sounds good, go check it out". And then we do. But it's a long and time consuming process. For instance, I was just in a presentation last week discussing a recent call for proposals from NASA.. the *shortest* time from proposal to response (yes/no) was around 120 days, the median was around 200 days, and the max was around 400 days plus, depending on the year. 
http://science.nasa.gov/researchers/sara/grant-stats/ A lot depends on what happens to the budgets as they wend their leisurely way through the program offices at the agencies, then get rolled up in the President's submission, then thrashed in Congress, then allocated, then back through the agency, and finally back down to the program. To provide some perspective on the front end of the process, the program managers at the agencies are winding up their PPBE13 submissions (that's for FY13, starting October 2012, although it also affects FY12 funding) A "new technology" that hasn't been "on the radar" probably has a 2-3 year lag before significant money can be applied to it (at least from government funding sources). Often, one can get smaller sums more quickly out of some general "investigate new technologies" kind of bucket (smaller sums = a few $10k), but right now, even those have essentially dried up (Continuing resolutions, etc.) To tie this back to the first question.. a few $10k would pay for the "Lets try recompiling with the new library and see if it works" sort of level of effort, but not for a "Let's rewrite our codes for the new hardware, and engage in a validation and verification effort to show that it still works" James Lux, P.E. Co-Principal Investigator, CoNNeCT Project Task Manager, SOMD Software Defined Radios Flight Communications Systems Section Jet Propulsion Laboratory 4800 Oak Grove Drive, Mail Stop 161-213 Pasadena, CA, 91109 +1(818)354-2075 phone +1(818)393-6875 fax > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Herbert Fruchtl > Sent: Monday, April 04, 2011 8:16 AM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] GP-GPU experience > > They hear great success stories (which in reality are often prototype > implementations that do one carefully chosen benchmark well), then look at the > API, look at their existing code, and postpone the start of their project until > they have six months spare time for it. And we know when that is. > > The current approach with more or less vendor specific libraries (be they "open" > or not) limits the uptake of GPU computing to a few hardcore developers of > experimental codes who don't mind rewriting their code every two years. It won't > become mainstream until we have a compiler that turns standard Fortran (or C++, > if it has to be) into GPU code. Anything that requires more change than let's > say OpenMP directives is doomed, and rightly so. > > Herbert > > > > > I've installed 4 GPU-equipped servers in my environment; 2 are a part of > > my cluster, and 2 are independent from the cluster so that users can > > login interactively and program/debug/tinker/whatever. (My cluster > > doesn't allow interactive logins by design). > > > > A handful of users were interested in getting access to the GPUs, but so > > far, not a single one has even logged into these systems to kick the > > tires yet, and the systems have been online for approx. 9 months. It > > just be that they're busy with other work. Most of my users are > > post-docs who guide their own research, so they can create/modify their > > own project schedules as they see fit. 
> > > > > > -- > Herbert Fruchtl > Senior Scientific Computing Officer > School of Chemistry, School of Mathematics and Statistics > University of St Andrews > -- > The University of St Andrews is a charity registered in Scotland: > No SC013532 > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mfatica at gmail.com Mon Apr 4 12:54:37 2011 From: mfatica at gmail.com (Massimiliano Fatica) Date: Mon, 4 Apr 2011 09:54:37 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99EB68.4020800@pathscale.com> References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: If you are old enough to remember the time when the first distribute computers appeared on the scene, this is a deja-vu. Developers used to program on shared memory ( mostly with directives) were complaining about the new programming models ( PVM, MPL, MPI). Even today, if you have a serial code there is no tool that will make your code runs on a cluster. Even on a single system, if you try an auto-parallel/auto-vectorizing compiler on a real code, your results will probably be disappointing. When you can get a 10x boost on a production code rewriting some portions of your code to use the GPU, if time to solution is important or you could perform simulations that were impossible before ( for example using algorithms that were just too slow on CPUs, Discontinuous Galerkin method is a perfect example), there are a lot of developers that will write the code. The effort it is clearly dependent of the code, the programmer and the tool used ( you can go from fully custom GPU code with CUDA or OpenCL, to automatically generated CUF kernels from PGI, to directives using HMPP or PGI Accelerator). In situation where time to solution relates to money, for example oil and gas, GPUs are the answer today ( you will be surprised by the number of GPUs in Houston). Look at the performance and scaling of AMBER ( MPI+ CUDA), http://ambermd.org/gpus/benchmarks.htm, and tell me that the results were not worth the effort. Is GPU programming for everyone: probably not, in the same measure that parallel programming in not for everyone. Better tools will lower the threshold, but a threshold will be always present. Massimiliano PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran, applications porting with CUDA, MPI+CUDA). 2011/4/4 "C. Bergstr?m" : > Herbert Fruchtl wrote: >> They hear great success stories (which in reality are often prototype >> implementations that do one carefully chosen benchmark well), then look at the >> API, look at their existing code, and postpone the start of their project until >> they have six months spare time for it. And we know when that is. >> >> The current approach with more or less vendor specific libraries (be they "open" >> or not) limits the uptake of GPU computing to a few hardcore developers of >> experimental codes who don't mind rewriting their code every two years. 
It won't >> become mainstream until we have a compiler that turns standard Fortran (or C++, >> if it has to be) into GPU code. Anything that requires more change than let's >> say OpenMP directives is doomed, and rightly so. >> > Hi Herbert, > > I think your perspective pretty much nails it > > (shameless self promotion) > http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) > http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf > http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source) > > This is really only the tip of the problem and there must also be > solutions for scaling *efficiently* across the cluster. ?(No MPI + CUDA > or even HMPP is *not* the answer imho.) > > ./C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 15:16:31 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:16:31 +0200 Subject: [Beowulf] Quadrics? In-Reply-To: <4D2C8B7C.30300@bull.co.uk> References: <4D2C8B7C.30300@bull.co.uk> Message-ID: hi, sometimes i go through a lot of mails at the mailing list here and had missed this one. please keep me up to date and/or add me to mailing lists there. latency is superior of quadrics compared to all the infini* stuff. drivers that integrate into kernels - well some modifications shouldn't be too hard. Of course even the realtime linux kernel is rather crappy there, as it locks every action from and to a socket (even RAW/UDP communication in fact), so you need a 'hack' of that kernel anyway to get faster latencies. secondhand the quadrics stuff is cheap it seems. Vincent On Jan 11, 2011, at 5:55 PM, Daniel Kidger wrote: > Mark, > > I will let others step forward individually. > > I was one of the last employees to leave Quadrics , so I do know > who had > support contracts at that time, plus the even larger set of sites that > had expired support contracts but still were actively running their > QsNet clusters. > > You know that a company called Vega took on the ongoing support? : > here is the website I set up at the time: https:// > support.hpc.vega.co.uk/ > > I agree too though that there should be a community of QsNet-owning > enthusiasts, who could provide mutual support in this legacy era. > > > Also off the record, I know that there is a lot of Elan4 stock sitting > in a warehouse. As long as you are not looking for long term vendor > support, I expect you could acquire cards, cables and switches for a > bargain price. > > Daniel > > >> Are you still using Quadrics Elan4-based clusters? >> >> We would like to continue using Quadrics on one of our clusters, >> since it >> is still quite good in latency. Maintaining the Quadrics drivers, >> though, >> is a bit of a pain going forward - would be nice to avoid >> duplicating effort, >> if there are other groups also doing so. >> >> please follow up or email me if you are using Elan4, or know anything >> relevant. 
>> >> thanks, >> Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http:// >> www.sharcnet.ca >> | McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 >> x24687 >> | Compute/Calcul Canada | http:// >> www.computecanada.org >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > > > -- > Bull, Architect of an Open World TM > > Dr. Daniel Kidger, HPC Technical Consultant > daniel.kidger at bull.co.uk > +44 (0) 7966822177 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 15:20:15 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:20:15 +0200 Subject: [Beowulf] =?iso-8859-1?q?Chinese_supercomputers_to_use_=91homemad?= =?iso-8859-1?q?e=92_chips?= In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> Message-ID: On Mar 11, 2011, at 7:20 AM, Mark Hahn wrote: >> Interesting: >> Chinese supercomputers to use ?homemade? chips >> http://timesofindia.indiatimes.com/tech/personal-tech/computing/ >> Chinese-supercomputers-to-use-homemade-chips/articleshow/7655183.cms > > it's important to remind ourselves that China is still a centrally- > planned, > totalitarian dictatorship. I mention this only because this > announcement > is a bit like Putin et al announcing that they'll develop their own > linux distro because Russia is big and important and mustn't allow > itself to be vulnerable to foreign hegemony. > > so far, the very shallow reporting I've seen has said that future > generations will add wide FP vector units. nothing wrong with that, > though it's a bit unclear to me why other companies haven't done it > if there is, in fact, lots of important vector codes that will run > efficiently on such a configuration. adding/widening vector FP is > not breakthrough engineering afaikt. > > has anyone heard anything juicy about the Tianhe interconnect? > _______________________________________________ Not really but busy with an AMD-GPU now the 6970 (note the 6990 also is available having 2 gpu's) is so fast that the real problem is bandwidth from and to the gpu; so for a big cluster calculation i can understand very well the need for having your own interconnect, especially as they get produced in china anyway. the cpu's you also need bigtime, but as i'm going to react onto a special GPU posting anyway let's move it to that subject. 
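A quick way to put a number on that host-to-card bandwidth limit is simply to time a large transfer. The sketch below uses the CUDA runtime purely for brevity (the AMD APIs follow the same copy-then-compute pattern); the buffer size and repetition count are arbitrary choices, not measurements from any system mentioned here:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = 256UL << 20;   /* 256 MB test buffer */
    const int reps = 20;
    void *host, *dev;
    cudaMallocHost(&host, bytes);       /* pinned host memory */
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; i++)
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbs = (double)bytes * reps / (ms / 1000.0) / 1e9;
    printf("host -> device: %.2f GB/s\n", gbs);

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}

On the PCIe 2.0 x16 slots of this era such a test typically lands somewhere in the 3-6 GB/s range with pinned memory, which is exactly why feeding the card, rather than the arithmetic on it, becomes the bottleneck for many kernels.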
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 15:26:43 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:26:43 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: you can forget about getting much info other than marketing data. the companies and orgainsations that already calculate for years at gpu's they are really good in keeping their mouth shut. But if you realize that even with 16 fast AMD cores (which for this specific prime number code are a LOT FASTER in ipc than any other x64 chip), a box built cheap second hand by the way as it's 4 x 8356 are needed to feed just 1 gpu, you start to realize the real problem. GPU's completely annihilate cpu's everywhere. The limitation is the bandwidth to the gpu, though i didn't fully test that bandwidth yet. The 6000 series from AMD has much improved multiplication logics, like 2.5x faster than the previous generation and it'll take some time to optimize this code for it. streamcores for a while got renamed to PE's nowadays, processing elements, and it has 1536 per gpu. The 6990 has 2 of 'em. It took a while for a good driver for these gpu's. Last days of januari it was there. AMD-CAL works great here now. There is not much diff with CUDA, other than proprietary ways of how to access things and limbs and a few function calls. Programming is similar. 818 execution units that can do multiplication 32 x 32 bits == 64 bits. That kicks butt. bye bye cpu's. On Mar 21, 2011, at 1:51 PM, Douglas Eadline wrote: > > I was recently given a copy of "GPU Computing Gems" > to review. It is basically research quality NVidia success > stories, some of which are quite impressive. > > I got to thinking about how others are fairing (or not) > with GP-GPU technology. I put up a simple poll on > ClusterMonkey to help get a general idea. > (you can find it on the front page right top) > If you have a moment, please provide > your experience (results are available as well). > > http://www.clustermonkey.net/ > > BTW: You can see all the previous polls > and links to other market data here: > > http://goo.gl/lDcUJ > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 16:07:31 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 22:07:31 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: On Apr 4, 2011, at 6:54 PM, Massimiliano Fatica wrote: > If you are old enough to remember the time when the first distribute > computers appeared on the scene, > this is a deja-vu. Developers used to program on shared memory ( > mostly with directives) were complaining > about the new programming models ( PVM, MPL, MPI). > Even today, if you have a serial code there is no tool that will make > your code runs on a cluster. > Even on a single system, if you try an auto-parallel/auto-vectorizing > compiler on a real code, your results will probably be disappointing. > > When you can get a 10x boost on a production code rewriting some > portions of your code to use the GPU, if time to solution is important Oh comeon factor 10 is not realistic. You're doing the usual compare here of a hobby coder who coded a tad in C or slowish C++ (except for a SINGLE, so not several, NCSA coder i'll have to find the first C++ guy who can write codes equally fast to C for complex algorithms - granted for big companies C++ makes more sense, just not when it's about performance) and then compare that with a full blown sponsored project in CUDA that uses the topend gpu and compare it versus a single core instead of 4 sockets (as that's powerwise the same). Moneywise of course is another issue, that's where the gpu's win it bigtime. Yet there is a hidden cost in gpu's, that's you can build something way faster for less money with gpu's, but you also need to pay for a good coder to write your code in either CUDA or AMD-CAL (or as the chinese seem to support both at the same time, which is not so complicated if you have setup things in the correct manner). This last is a big problem for the western world; governments pay big bucks for hardware, but paying good coders what they are worth they seem to forget. Secondly there is another problem, that's that NVIDIA hasn't even released the instructoin set of their GPU. Try to figure that out without fulltime work for it. It seems however pretty similar to AMD, despite other huge architectural differences between the 2; the programming similarity is striking and selfexplains the real purpose where they got designed for (GRAPHICS). > or you could perform simulations that were impossible before ( for > example using algorithms that were just too slow on CPUs, All true yet it takes a LOT OF TIME to write something that's fast on a gpu. First of all you have to not write double precision code, as the gamers card from nvidia seem to not have much double precision logic, they only have 32 bits logics. So at double precision, AMD is like 10 times faster in money per gflop than Nvidia. 
Yet try to figure that out without being fulltime busy with those gpu's. Only the TESLA versions have those transistors it seems. Secondly Nvidia seems to keep being busy maximizing the frequency of the gpu. Now that might be GREAT for games as high clocked cores work (see intel), yet for throughput of course that's a dead end. In raw throughput AMD's (ATI's) approach will always win it of course from nvidia, as clocking a processor higher has a O ( n ^ 3 ) impact on power consumption. Now a big problem with nvidia is also that they basically go over spec. I didn't really figure it out, yet it seems pci-e got designed with 300 watt in mind max. Yet at this code i'm busy with, the CUDA version of it (mfaktc) consumes a whopping 400+ watt and please realize that majority of the system time is only keeping the streamcores busy and not caches at all nor much of a RAM. It's only doing multiplications of course at full speed in 32 bits code, using the new Fermi's instructions that allows multiplying 32 bits x 32 bits == 64 bits. CUDA version of your code gets developed btw by a guy working for a HPC vendor which, i guess, also sells those Tesla's. So any performance bragging sure must keep in mind it's far over 33% over the specs in terms of power consumption. Note AMD seems to follow nvidia in its path there. > Discontinuous Galerkin method is a perfect example), there are a lot > of developers that will write the code. Oh comeon, writing for gpu's is really complicated. > The effort it is clearly dependent of the code, the programmer and the > tool used ( you can go from fully custom GPU code with CUDA or OpenCL, Forget OpenCL, not good enough. Better to code in CUDA and AMD-CAL at the same time something. > to automatically generated CUF kernels from PGI, to directives using > HMPP or PGI Accelerator). > In situation where time to solution relates to money, for example > oil and gas, GPUs are the answer today ( you will be surprised > by the number of GPUs in Houston). Pardon me, those industries already were using vectorized solutoins long before CUDA was there and are using massively GPU's to calculate of course as soon as nvidia released a version that was programmable. This is not new. All those industries will of course never say anything on the performance nor how many they use. > Look at the performance and scaling of AMBER ( MPI+ CUDA), > http://ambermd.org/gpus/benchmarks.htm, and tell me that the results > were not worth the effort. > > Is GPU programming for everyone: probably not, in the same measure > that parallel programming in not for everyone. > Better tools will lower the threshold, but a threshold will be > always present. > I would argue that both AMD as well as Nvidia has really tried to give the 3d world nations an advantage by stopping progress in the rich nations. I will explain. The real big advantage of rich nations is that average persons have more cash. Students are a good example there. They can afford gpu's easily. Yet there is so little technical information available on latencies and in case of nvidia on instructoin set that the gpu's support, that this gives a huge programming hurdle for students. Also there is no good tips in nvidia documents how to program for those things. The most fundamental lessons how to program a gpu i miss in all documents i scanned so far. It's just a bunch of 'lectures' that's not going to create any topcoders. A piece of information here and a tad there. Very bad. 
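To give one concrete example of what is reachable even without the missing documentation: the 32 bits x 32 bits == 64 bits multiply mentioned earlier is exposed from plain CUDA C through an intrinsic, with no knowledge of the instruction encoding required. A minimal sketch, unrelated to the actual mfaktc source:

#include <cuda_runtime.h>

/* Full 64-bit product of two 32-bit operands, assembled from the low and
   high halves that the hardware multiplier produces. */
__global__ void widening_mul(const unsigned int *a, const unsigned int *b,
                             unsigned long long *prod, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int lo = a[i] * b[i];           /* low 32 bits of the product */
        unsigned int hi = __umulhi(a[i], b[i]);  /* high 32 bits of the product */
        prod[i] = ((unsigned long long)hi << 32) | lo;
    }
}

On the AMD side the OpenCL C built-in mul_hi() together with the ordinary * operator gives the same high/low pair.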
AMD also is a nightmare there, they can't even run more than 1 program at the same time, despite claims that the 4000 series gpu's already had hardware support to do it. The indian helpdesk in fact is so lazy that they didn't even rename the word 'ati' in the documentation to AMD, and the library each few months gets a new name. Stream SDK now it's another new fancy name. "we worked hard in India sahib, yes sahib, yes sahib". Yet 5 years later still not much works. For example in opencl also the 2nd gpu doesn't work in case of AMD. Result "undefined". Nice. Default driver install at inux here doesn't get openCL to work in fact at the 6970. Both nvidia as well as AMD are a total joke there and by means of incompetence, the generic incompetence being complete and clear documentation just like we have documention on how cpu's work. Be it intel or AMD or IBM. Students who program now for those gpu's in CUDA or AMD-CAL, they will have to go to hell and back to get something to work well on it, except some trivial stuff that works well at it. We see that just a few manage. That's not a problem of the students, but a problem for society, because doing calculations faster and especially CHEAP, is a huge advantage to progress science. NSA type organisations in 3d world nations are a lot bigger than here, simply because more people live there. So right now more people over there code for gpu's than here, here where everyone can afford one. Some big companies excepted of course, but this is not a small note on companies. This is a note on 1st world versus 3d world. The real difference is students with budget over here. They have budget for gpu's, yet there is no good documentation simply giving which instructions a gpu has let alone which latencies. If you google hard, you will find 1 guy who actually by means of measuring had to measure the latencies of simple instructions that write to the same register. Why did an university guy need to measure this, why isn't this simply in Nvidia documentation? A few of those things will of course have majority, vaste vaste majority of students trying something on a gpu, completely fail. Because they fail, they don't continue there and don't get back from those gpu's a faster running code that gives them something very important: faster calculation speed for whatever they wanted to run. This is where AMD and Nvidia, and i politely call it by means of incompetence, gives the rich nations no advantage over the 3d world nations, as the students need to be compeltely fulltime busy to obtain knowledge on the internal workings of the gpu's in order to get something going fast at them. Majority will fail therefore of course, which has simply avoided gpu's from getting massively adapted. I've seen so many students try and fail at gpu programming, especially CUDA. It's bizarre. The fail % is so huge. Even a big succes doesn't get recognized as a big succes, simply because the guy didn't know about a few bottlenecks in gpu programming, as no manual told him the combination of problems he ran into, as there was no technical data available. It is true gpu's can be fast, but i feel there is a big need for better technical documentation of them. We can no longer ignore this now that 3d world nations are overrunning 1st world nations. Mainly because the sneaky organisations that do know everything are of course bigger over there than here, by means of population size. 
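As an aside, the latency measurement mentioned above is usually done by timing a chain of instructions where each one depends on the previous result, using the on-chip cycle counter. A rough sketch, with an arbitrary iteration count and the loop overhead ignored:

#include <cuda_runtime.h>
#include <stdio.h>

/* One thread runs a dependent chain of multiply-adds and reads the SM cycle
   counter before and after; cycles / iterations approximates the latency of
   one dependent operation. */
__global__ void dependent_chain(unsigned int iters, unsigned int seed,
                                unsigned int *cycles, unsigned int *sink)
{
    unsigned int x = seed;
    clock_t t0 = clock();
    for (unsigned int i = 0; i < iters; i++)
        x = x * 1664525u + 1013904223u;   /* each step needs the previous x */
    clock_t t1 = clock();
    *cycles = (unsigned int)(t1 - t0);
    *sink = x;   /* keep the chain live so the compiler cannot remove it */
}

int main(void)
{
    unsigned int *d_cycles, *d_sink, cycles = 0;
    const unsigned int iters = 1 << 20;
    cudaMalloc((void **)&d_cycles, sizeof(unsigned int));
    cudaMalloc((void **)&d_sink, sizeof(unsigned int));
    dependent_chain<<<1, 1>>>(iters, 12345u, d_cycles, d_sink);
    cudaMemcpy(&cycles, d_cycles, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("~%.2f cycles per dependent multiply-add\n", (double)cycles / iters);
    cudaFree(d_cycles);
    cudaFree(d_sink);
    return 0;
}

Dividing the cycle count by the iteration count gives an estimate of the dependent-operation latency -- the kind of number one would prefer to simply read in the vendor documentation.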
This where the huge advantage of the rich nations, namely that every student has such gpu at home, is not getting taken advantage from as the hurdle to gpu programming is too high by means of lack of accurate documentation. Of course in 3d world nations they have at most a mobile phone, and very very seldom a laptop (except for the rich elite), let alone a computer with a capable programmable gpu, which makes it impossible for majority of 3d world nations students to do any gpu computation because of a shortage in cash. > > Massimiliano > PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran, > applications porting with CUDA, MPI+CUDA). > > > 2011/4/4 "C. Bergstr?m" : >> Herbert Fruchtl wrote: >>> They hear great success stories (which in reality are often >>> prototype >>> implementations that do one carefully chosen benchmark well), >>> then look at the >>> API, look at their existing code, and postpone the start of their >>> project until >>> they have six months spare time for it. And we know when that is. >>> >>> The current approach with more or less vendor specific libraries >>> (be they "open" >>> or not) limits the uptake of GPU computing to a few hardcore >>> developers of >>> experimental codes who don't mind rewriting their code every two >>> years. It won't >>> become mainstream until we have a compiler that turns standard >>> Fortran (or C++, >>> if it has to be) into GPU code. Anything that requires more >>> change than let's >>> say OpenMP directives is doomed, and rightly so. >>> >> Hi Herbert, >> >> I think your perspective pretty much nails it >> >> (shameless self promotion) >> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) >> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf >> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to >> source) >> >> This is really only the tip of the problem and there must also be >> solutions for scaling *efficiently* across the cluster. (No MPI + >> CUDA >> or even HMPP is *not* the answer imho.) >> >> ./C >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 16:20:02 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 16:20:02 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: > GPU's completely annihilate cpu's everywhere. this is complete nonsense. GPUs do very nicely on a quite narrow set of problems. for a somewhat larger set of problems, they do OK, but pretty "meh", really, considering. 
for many problems, GPUs are irrelevant, whether that's because the problem uses too much memory, or already scales well on non-GPU, or doesn't have a GPU-friendly structure. > 818 execution units that can do multiplication 32 x 32 bits == 64 bits. > That kicks butt. bye bye cpu's. well, for your application, which is quite narrow. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 16:34:19 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 22:34:19 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> On Apr 4, 2011, at 10:20 PM, Mark Hahn wrote: >> GPU's completely annihilate cpu's everywhere. > > this is complete nonsense. GPUs do very nicely on a quite narrow > set of problems. for a somewhat larger set of problems, they do OK, > but pretty "meh", really, considering. for many problems, GPUs are > irrelevant, whether that's because the problem uses too much > memory, or already scales well on non-GPU, or doesn't have a GPU- > friendly > structure. > >> 818 execution units that can do multiplication 32 x 32 bits == 64 >> bits. >> That kicks butt. bye bye cpu's. > > well, for your application, which is quite narrow. Which is about any relevant domain where massive computation takes place. The number of algorithms that really profit bigtime from a lot of RAM, in some cases you can also replace by massive computation and a tad of memory, the cases where that cannot be the case are very rare. For those few cases you order a few nodes with massive RAM rather than big cpu power. yet majority of HPC calculations, especially if we add company codes there, the simulators and the oil, gas, car and aviation industry. So that makes 95% of all codes just need massive cpu power and can get away with relative small RAM sizes per compute unit. Not to confuse btw with a compute unit of AMD as that is just a small part of a gpu, speaking of redefinitions :) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 17:54:00 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 17:54:00 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> Message-ID: >> well, for your application, which is quite narrow. > > Which is about any relevant domain where massive computation takes place. you are given to hyperbole. 
the massive domains I'm thinking of are cosmology and explicit quantum condensed-matter calculations. the experts in those fields I talk to both do use massive computation and do not expect much benefit from GPUs. > The number of algorithms that really profit bigtime from a lot of RAM, in > some cases you can also > replace by massive computation and a tad of memory, the cases where that > cannot be the case > are very rare. no. you are equating "uses lots of ram" with "uses memoization". > yet majority of HPC calculations, especially if we add company codes there, > the simulators and the oil, > gas, car and aviation industry. jeez. nevermind I said anything. I'd forgotten about your style. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 18:10:44 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 00:10:44 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> Message-ID: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> On Apr 4, 2011, at 11:54 PM, Mark Hahn wrote: >>> well, for your application, which is quite narrow. >> >> Which is about any relevant domain where massive computation takes >> place. > > you are given to hyperbole. the massive domains I'm thinking of > are cosmology and explicit quantum condensed-matter calculations. > the experts in those fields I talk to both do use massive computation > and do not expect much benefit from GPUs. Even the field you give as an example: quantum mechanica: Vaste majority of quantum mechanica calculations are massive matrix calculations. Furthermore i didn't take a look to the field you're speaking about. I did however take a look to 1 other quantum mechanica calculation, where someone used 1 core of his quadcore box and massive RAM. It took me 1 afternoon to explain the guy how to trivially use all 4 cores doing that calculation using the same RAM buffer. You realize that you also can do combined calculations? Just have a new chipset with big bandwidth to gpu, at cpu's, based upon a big RAM buffer, prepare batches, ship batch to gpu, do tough calculation work on the gpu, ship results back. That's how many use those gpu's. My attempt to write a sieve directly into the gpu in order to do everything inside the gpu, is of a different league sir than where you are talking. Your kind of talking is: "there are no tanks in the city, we will drive all tanks out of the city, so that only our cpu's are left again". Those days are over. Just get creative and find a way to do it at a gpu. I parallellized 1 quantum mechanica calculation there; i wasn't paid for that. Just pay someone to useful use a GPU. If it ain't easy it doesn't mean it's impossible. Most quantum mechanica guys might be brilliant in their field, in manners how to parallellize things without losing their branching factor that a huge RAM buffer gives, they didn't figure out simply yet. 
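For what the batch pattern above looks like in practice -- host prepares a batch, ships it to the card, the card does the heavy work, results come back -- here is a bare-bones CUDA sketch using two streams so that the transfers of one batch can overlap the compute of the previous one. Every name, kernel and size in it is made up purely for illustration:

#include <cuda_runtime.h>

/* Placeholder kernel standing in for the "tough calculation work". */
__global__ void crunch(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i] + 1.0f;
}

void process_batches(const float *host_in, float *host_out,
                     int batches, int batch_len)
{
    float *d_in[2], *d_out[2];
    cudaStream_t stream[2];
    size_t bytes = (size_t)batch_len * sizeof(float);

    for (int s = 0; s < 2; s++) {
        cudaStreamCreate(&stream[s]);
        cudaMalloc((void **)&d_in[s], bytes);
        cudaMalloc((void **)&d_out[s], bytes);
    }

    /* host_in/host_out should be pinned (cudaMallocHost); otherwise the
       asynchronous copies silently fall back to synchronous ones. */
    for (int b = 0; b < batches; b++) {
        int s = b % 2;   /* alternate between the two streams */
        const float *src = host_in + (size_t)b * batch_len;
        float *dst = host_out + (size_t)b * batch_len;

        cudaMemcpyAsync(d_in[s], src, bytes, cudaMemcpyHostToDevice, stream[s]);
        crunch<<<(batch_len + 255) / 256, 256, 0, stream[s]>>>(d_in[s], d_out[s], batch_len);
        cudaMemcpyAsync(dst, d_out[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < 2; s++) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_in[s]);
        cudaFree(d_out[s]);
    }
}

With only the default stream, or without pinned host buffers, the same loop degenerates into strictly serial copy-compute-copy, which is a large part of why it takes so many host cores just to keep one card fed.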
Now it won't be easy to solve for every field; but being a speedfreak and in advance saying some faster type of hardware cannot be used is just monkeytalk. Go get clever and solve the problem. Find solutions, don't see just problems. > >> The number of algorithms that really profit bigtime from a lot of >> RAM, in some cases you can also >> replace by massive computation and a tad of memory, the cases >> where that cannot be the case >> are very rare. > > no. you are equating "uses lots of ram" with "uses memoization". > >> yet majority of HPC calculations, especially if we add company >> codes there, the simulators and the oil, >> gas, car and aviation industry. > > jeez. > nevermind I said anything. I'd forgotten about your style. Read the statistics on the reports what eats system time sir. You have access to those papers as well if you know how to google. Regards, Vincent _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 18:20:08 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 18:20:08 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> Message-ID: >>>> well, for your application, which is quite narrow. >>> >>> Which is about any relevant domain where massive computation takes place. >> >> you are given to hyperbole. the massive domains I'm thinking of >> are cosmology and explicit quantum condensed-matter calculations. >> the experts in those fields I talk to both do use massive computation >> and do not expect much benefit from GPUs. > > Even the field you give as an example: quantum mechanica: > Vaste majority of quantum mechanica calculations are massive matrix > calculations. yes, specifically very large sparse eigensystems. do you have an example of effectively using GPUs for this? > Furthermore i didn't take a look to the field you're speaking about. > I did however take a look to 1 other quantum mechanica calculation, > where someone used 1 core of his quadcore box and massive RAM. sorry, I'm talking thousands of cores, ideally with > 4GB/core. > It took me 1 afternoon to explain the guy how to trivially use all 4 cores > doing that calculation > using the same RAM buffer. the point is that lots of serious science uses MPI already, and doesn't care much about GPUs. if they were free, sure, they might be interesting. > My attempt to write a sieve directly into the gpu in order to do everything > inside the gpu, > is of a different league sir than where you are talking. bully for you. your application is a niche. > Your kind of talking is: "there are no tanks in the city, we will drive all > tanks out of the city, so that only > our cpu's are left again". nonsense. I'm saying that GPUs are a nice, specialized accelerator. you can't have them without hosts, so you need to compare host vs host+GPU. > Those days are over. Just get creative and find a way to do it at a gpu. 
don't be silly. GPUs have weaknesses as well as strengths. packaging and system design is one of the minor sticking points with GPUs. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Tue Apr 5 01:22:39 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 4 Apr 2011 22:22:39 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: <20110405052239.GA6130@bx9.net> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: > If you are old enough to remember the time when the first distribute > computers appeared on the scene, > this is a deja-vu. Not to mention the prior appearance of array processors. Oil+Gas bought a lot of those, too. Some important radio astronomy data reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B was 10X faster than the VAX by itself. Then microprocessor-based workstations arrived, and the game was over, ease of use FTW. > Even on a single system, if you try an auto-parallel/auto-vectorizing > compiler on a real code, your results will probably be disappointing. The wins from such compilers have been steadily decreasing, as main memory gets farther and farther away from the CPU and caches. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From beat at 0x1b.ch Tue Apr 5 01:52:41 2011 From: beat at 0x1b.ch (Beat Rubischon) Date: Tue, 05 Apr 2011 07:52:41 +0200 Subject: [Beowulf] Quadrics? In-Reply-To: References: <4D2C8B7C.30300@bull.co.uk> Message-ID: <4D9AAE29.5090207@0x1b.ch> Hi Vincent! On 04.04.11 21:16, Vincent Diepeveen wrote: > latency is superior of quadrics compared to all the infini* stuff. Quadrics was great stuff - but it was outperformed once Mellanox invited their ConnectX chips. Additional the Quadrics team never got their PCIe chips (QSnet III) to fly. Finally the company closed their doors in may 09. I really liked their hard- and software. But the time is over... > Of course even the realtime linux kernel is rather crappy there, as > it locks every action from and to a socket (even RAW/UDP > communication in fact), so you need a 'hack' of that kernel anyway to > get faster latencies. When talking about Interconnects the kernel is not involved in communication. Any context switch is avoided to keep the overhead small. This basically means a real time kernel isn't needed as it would not give you any additional benefit. 
Beat -- \|/ Beat Rubischon ( 0-0 ) http://www.0x1b.ch/~beat/ oOO--(_)--OOo--------------------------------------------------- Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 03:51:00 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 09:51:00 +0200 Subject: [Beowulf] Quadrics? In-Reply-To: <4D9AAE29.5090207@0x1b.ch> References: <4D2C8B7C.30300@bull.co.uk> <4D9AAE29.5090207@0x1b.ch> Message-ID: <75CD2C36-0B25-4CD2-B3F8-2645BE1A72DC@xs4all.nl> On Apr 5, 2011, at 7:52 AM, Beat Rubischon wrote: > Hi Vincent! > > On 04.04.11 21:16, Vincent Diepeveen wrote: >> latency is superior of quadrics compared to all the infini* stuff. > > Quadrics was great stuff - but it was outperformed once Mellanox > invited > their ConnectX chips. Additional the Quadrics team never got their > PCIe > chips (QSnet III) to fly. Finally the company closed their doors in > may 09. > > I really liked their hard- and software. But the time is over... > of course there is new great pci-e solutions, yet the price per port there is bigger than entire machine with latest gpu, that's a big problem to make cheap clusters. If you buy a cheap 6 core box of 350 euro then a new generation gpu is 318 euro or so (that's a HD6970). What's node price of the network? >> Of course even the realtime linux kernel is rather crappy there, as >> it locks every action from and to a socket (even RAW/UDP >> communication in fact), so you need a 'hack' of that kernel anyway to >> get faster latencies. > > When talking about Interconnects the kernel is not involved in > communication. Any context switch is avoided to keep the overhead > small. > This basically means a real time kernel isn't needed as it would not > give you any additional benefit. realtime kernel keeps other worst cases down bigtime, especially with respect to scheduling. > > Beat > > -- > \|/ Beat Rubischon > ( 0-0 ) http://www.0x1b.ch/~beat/ > oOO--(_)--OOo--------------------------------------------------- > Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From diep at xs4all.nl Tue Apr 5 03:58:47 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 09:58:47 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <20110405052239.GA6130@bx9.net> References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> <20110405052239.GA6130@bx9.net> Message-ID: On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote: > On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: > >> If you are old enough to remember the time when the first distribute >> computers appeared on the scene, >> this is a deja-vu. > > Not to mention the prior appearance of array processors. Oil+Gas > bought a lot of those, too. Some important radio astronomy data > reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B > was 10X faster than the VAX by itself. Then microprocessor-based > workstations arrived, and the game was over, ease of use FTW. > >> Even on a single system, if you try an auto-parallel/auto-vectorizing >> compiler on a real code, your results will probably be disappointing. > > The wins from such compilers have been steadily decreasing, as main > memory gets farther and farther away from the CPU and caches. > > -- greg It's different this time indeed; classic cpu's will never again deliver big performance. cache - coherency is simply too complicated with many cores. cpu's also will need a manycore co-processor therefore. furthermore manycores simply are cheaper to produce and they can eat a bigger powerbudget. 3 very powerful arguments which regrettably limits cpu's, but that's the price we pay for progress. It won't mean cpu's will go away of course any soon, they're so generic and easy to program that they will survive. Just offload the calculations to the manycores. please don't estimate the argument of cheaper to produce. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 04:04:35 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 10:04:35 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> <20110405052239.GA6130@bx9.net> Message-ID: On Apr 5, 2011, at 9:58 AM, Vincent Diepeveen wrote: > > On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote: > >> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: >> >>> If you are old enough to remember the time when the first distribute >>> computers appeared on the scene, >>> this is a deja-vu. >> >> Not to mention the prior appearance of array processors. Oil+Gas >> bought a lot of those, too. Some important radio astronomy data >> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B >> was 10X faster than the VAX by itself. Then microprocessor-based >> workstations arrived, and the game was over, ease of use FTW. >> >>> Even on a single system, if you try an auto-parallel/auto- >>> vectorizing >>> compiler on a real code, your results will probably be >>> disappointing. 
>> >> The wins from such compilers have been steadily decreasing, as main >> memory gets farther and farther away from the CPU and caches. >> >> -- greg > Early Morning oh oh oh oh, apologies the context might be clear yet the sentences were written down wrong. > It's different this time indeed; classic cpu's will never again > deliver big performance. > ack > cache - coherency is simply too complicated with many cores. 1) Cache-coherency is too complicated for CPU's > cpu's also will need a manycore co-processor therefore. > ack > furthermore manycores simply are cheaper to produce and they can eat > a bigger powerbudget. > ack > 3 very powerful arguments which regrettably limits cpu's, but that's > the price we pay for progress. > ack > It won't mean cpu's will go away of course any soon, they're so > generic and easy to program that > they will survive. Just offload the calculations to the manycores. > ack > please don't estimate the argument of cheaper to produce. > > please don't UNDERESTIMATE the argument of cheaper to produce only 6 out of 8 score = 75% sharp in the morning > >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Tue Apr 5 05:10:28 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 05 Apr 2011 19:10:28 +1000 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: <4D9ADC84.7030804@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 05/04/11 05:26, Vincent Diepeveen wrote: > GPU's completely annihilate cpu's everywhere. Great! Where can I get one with 1TB of on-card RAM to keep our denovo reassembly people happy ? - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0 POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t =OMhq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
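Much of the CPU-versus-manycore back and forth above comes down to memory bandwidth rather than core count, which is also why the wins from auto-vectorizing compilers keep shrinking as main memory gets relatively slower. A quick way to see whether a given box or code is bandwidth-bound is a STREAM-style triad; the sketch below is only illustrative (the array size and threading are arbitrary choices), not a statement about any particular machine in the thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* STREAM-style triad: a[i] = b[i] + s*c[i].  The GB/s reported is a
     * rough measure of sustainable memory bandwidth. */
    #define N (64 * 1024 * 1024)   /* ~1.5 GB across three double arrays */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        double s = 3.0, t0, t1;
        long i;

        if (!a || !b || !c) { fprintf(stderr, "alloc failed\n"); return 1; }

        /* parallel init so pages are first-touched where they are used */
        #pragma omp parallel for
        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; a[i] = 0.0; }

        t0 = omp_get_wtime();
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];
        t1 = omp_get_wtime();

        /* 3 arrays * 8 bytes moved per element (2 reads + 1 write) */
        printf("triad: %.1f GB/s with %d threads\n",
               3.0 * N * sizeof(double) / (t1 - t0) / 1e9,
               omp_get_max_threads());
        free(a); free(b); free(c);
        return 0;
    }

Compile with something like gcc -O2 -fopenmp; if an application kernel already moves bytes at close to this rate, piling on more cores will not help it much.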
From diep at xs4all.nl Tue Apr 5 09:05:19 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 15:05:19 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D9ADC84.7030804@unimelb.edu.au> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <4D9ADC84.7030804@unimelb.edu.au> Message-ID: <2538ED2A-7F07-4524-B74E-6F0AE623916E@xs4all.nl> On Apr 5, 2011, at 11:10 AM, Christopher Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 05/04/11 05:26, Vincent Diepeveen wrote: > >> GPU's completely annihilate cpu's everywhere. > > Great! Where can I get one with 1TB of on-card RAM to > keep our denovo reassembly people happy ? There is already several projects in that area that tried incorporate GPU's and with succes. Just google a bit, i got bunches of hits from all sorts of research institutes in that area, most already over 2 years old, nothing new there. Your reaction just shows your ignorance there. Regards, Vincent > > - -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0 > POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t > =OMhq > -----END PGP SIGNATURE----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Apr 6 06:58:12 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 6 Apr 2011 11:58:12 +0100 Subject: [Beowulf] Westmere EX Message-ID: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ 10 core Westmere EX on an eight socket box = 80 cores These would be a very nice machine. Anyone know if machines like this will be built? Do the sockets have enough Quickpath links to create an 8-way topology? John Hearns | CFD Hardware Specialist | McLaren Racing Limited McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK T: +44 (0) 1483 261000 D: +44 (0) 1483 262352 F: +44 (0) 1483 261010 E: john.hearns at mclaren.com W: www.mclaren.com The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From brice.goglin at gmail.com Wed Apr 6 07:05:55 2011 From: brice.goglin at gmail.com (Brice Goglin) Date: Wed, 06 Apr 2011 13:05:55 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9C4913.10802@gmail.com> Le 06/04/2011 12:58, Hearns, John a ?crit : > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way topology? > You only have 4 QPI links per sockets, no way to connect the entire graph. Supermicro already announced such 8-way machines. See their QPI topology on page 30 of the motherboard manual available at http://www.supermicro.com/products/motherboard/Xeon7000/7500/X8OBN-F.cfm Brice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cap at nsc.liu.se Wed Apr 6 11:41:18 2011 From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=) Date: Wed, 6 Apr 2011 17:41:18 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <201104061741.18972.cap@nsc.liu.se> On Wednesday, April 06, 2011 12:58:12 pm Hearns, John wrote: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way topology? > > > John Hearns | CFD Hardware Specialist | McLaren Racing Limited > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK The HP DL980 is an 8 socket EX box but it's not glue-less (it uses HPs own numa interconnect). If you stuff 'em full of dimms then they're probably competitive with the 4 socket 580 (assuming the 980 uses 8G dimms instead of 16G for the 580...). /Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
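To make Brice's point concrete: with only four QPI links per socket, an eight-socket box cannot be a full mesh (that would need seven links per socket), so some socket pairs sit two or more hops apart, and the kernel reports that as larger NUMA distances. A small sketch that just dumps the ACPI SLIT rows is below; it assumes a Linux system exposing /sys/devices/system/node/nodeN/distance (numactl --hardware prints the same matrix).

    #include <stdio.h>

    /* Print the NUMA distance row for each node; the local entry is
     * normally 10 and remote entries grow with hop count. */
    int main(void)
    {
        char path[64], line[256];
        int node;

        for (node = 0; ; node++) {
            FILE *f;
            snprintf(path, sizeof path,
                     "/sys/devices/system/node/node%d/distance", node);
            f = fopen(path, "r");
            if (!f)
                break;                  /* no more nodes */
            if (fgets(line, sizeof line, f))
                printf("node %d: %s", node, line);
            fclose(f);
        }
        return 0;
    }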
From diep at xs4all.nl Wed Apr 6 14:00:17 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 6 Apr 2011 20:00:17 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> On Apr 6, 2011, at 12:58 PM, Hearns, John wrote: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way > topology? What do you intend to use the machines for? For a chessprogram they would be great, but none of those guys has the cash to pay for these machines. For financial world it would be a waste of money as well as the latency probably will be very very bad. They seem to get equipped with a max of 512GB ram, not really much for those who badly need a lot of RAM, if we consider the price of such a configured machine. Same price like a power7. > > > John Hearns | CFD Hardware Specialist | McLaren Racing Limited > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK > > T: +44 (0) 1483 261000 > D: +44 (0) 1483 262352 > F: +44 (0) 1483 261010 > E: john.hearns at mclaren.com > W: www.mclaren.com > > > > > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Apr 6 14:12:56 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 6 Apr 2011 19:12:56 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > What do you intend to use the machines for? > For a chessprogram they would be great, but none of those guys has > the cash to pay for these > machines. The Supermicro board which Bruce Goglin refers to is said to support 16gbytes DIMMS. Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a 1024 Gbyte machine, plus you can cook your dinner on it. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Wed Apr 6 14:18:35 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Thu, 07 Apr 2011 01:18:35 +0700 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9CAE7B.8000900@pathscale.com> Hearns, John wrote: >> What do you intend to use the machines for? >> For a chessprogram they would be great, but none of those guys has >> the cash to pay for these >> machines. >> > > > > The Supermicro board which Bruce Goglin refers to is said to support > 16gbytes DIMMS. > > Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a > 1024 Gbyte machine, > plus you can cook your dinner on it. > LOL.. (I have to admit that's kinda funny, but only because it's true) I didn't look at the specs, but I wonder how many IOPS you could get off a ram disk on that thing.. $60k is I believe (I could be wrong) in the same ballpark as 1T 1U 1million IOPS appliances (albeit they offer persistence and probably consume less power as well) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hearnsj at googlemail.com Wed Apr 6 18:02:47 2011 From: hearnsj at googlemail.com (John Hearns) Date: Wed, 6 Apr 2011 23:02:47 +0100 Subject: [Beowulf] Westmere EX In-Reply-To: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> Message-ID: On 6 April 2011 19:00, Vincent Diepeveen wrote: > > On Apr 6, 2011, at 12:58 PM, Hearns, John wrote: > >> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > What do you intend to use the machines for? Maybe something like: http://www.youtube.com/watch?v=x2Z3h_Hx310&NR=1 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Wed Apr 6 20:39:19 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 6 Apr 2011 20:39:19 -0400 (EDT) Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. 
shrug. does anyone have serious experience with real apps on manycore machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, but they're substantially more exotic/rare/expensive.) I bet there will be 100x more 4s servers build with these chips than 8s. and 1000x more 2s than 4s... a friend noticed something weird on intel's spec sheets: http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E notice it says 32GB max memory size. even if that means 32GB/socket, it's not all that much. I don't know about everyone else, but I'm already bored with core counts ;) these also seem fairly warm (130W), considering that they're the fancy new 32nm process and run at modest clock rates... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From joshua_mora at usa.net Wed Apr 6 20:57:43 2011 From: joshua_mora at usa.net (Joshua mora acosta) Date: Wed, 06 Apr 2011 19:57:43 -0500 Subject: [Beowulf] Westmere EX Message-ID: <093PDga5r8464S02.1302137863@web02.cms.usa.net> _3D_ FFT scaling will allow you to see how well balanced is the system. Joshua ------ Original Message ------ Received: 07:40 PM CDT, 04/06/2011 From: Mark Hahn To: Beowulf Mailing List Subject: Re: [Beowulf] Westmere EX > > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > > > 10 core Westmere EX on an eight socket box = 80 cores > > These would be a very nice machine. > > shrug. does anyone have serious experience with real apps on manycore > machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, > but they're substantially more exotic/rare/expensive.) > > I bet there will be 100x more 4s servers build with these chips than 8s. > and 1000x more 2s than 4s... > > a friend noticed something weird on intel's spec sheets: > http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E > > notice it says 32GB max memory size. even if that means 32GB/socket, > it's not all that much. > > I don't know about everyone else, but I'm already bored with core counts ;) > these also seem fairly warm (130W), considering that they're the fancy > new 32nm process and run at modest clock rates... > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlforrest at berkeley.edu Wed Apr 6 22:15:17 2011 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed, 06 Apr 2011 19:15:17 -0700 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9D1E35.9040802@berkeley.edu> On 4/6/2011 5:39 PM, Mark Hahn wrote: > shrug. does anyone have serious experience with real apps on manycore > machines? 
(I'm familiar with SGI boxes, where 80 is fairly ho-hum, > but they're substantially more exotic/rare/expensive.) I have a couple 48-core 1U boxes. They can build gcc and other large packages very quickly. The scientists who run single process simulations also like them but they're not real picky about how long it takes for something to run. They also generally spend close to no time at all optimizing anything. -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Thu Apr 7 04:43:06 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 09:43:06 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> Message-ID: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > On 4/6/2011 5:39 PM, Mark Hahn wrote: > > > shrug. does anyone have serious experience with real apps on > manycore > > machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, > > but they're substantially more exotic/rare/expensive.) > > I have a couple 48-core 1U boxes. They can build > gcc and other large packages very quickly. > > The scientists who run single process simulations > also like them but they're not real picky about > how long it takes for something to run. They also > generally spend close to no time at all optimizing > anything. "Premature optimization is the root of all evil" - Donald Knuth I'm also interested in the response to Mark Hahn's question - I guess that's why I started this thread really! Also as I've said before, with the advent of affordable manycore systems like this, we're going to have to dust off those old skills practised in the age of SMP monster machines - which were probably something like the same specs as these affordable systems! The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
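Picking up Joshua Mora's suggestion of 3D FFT scaling as a balance test: the all-to-all transposes inside a distributed 3D FFT stress memory and the interconnect at the same time, so they expose exactly the kind of imbalance a fat 80-core box can hide on Linpack. A minimal timing loop using FFTW's MPI interface is sketched below, assuming the FFTW 3.3-series MPI routines are available; the 512^3 problem size is arbitrary and error handling is omitted.

    #include <stddef.h>
    #include <stdio.h>
    #include <fftw3-mpi.h>

    /* Time one 512^3 complex-to-complex 3D FFT distributed over all ranks.
     * Re-run with different rank counts and compare the wall times. */
    int main(int argc, char **argv)
    {
        const ptrdiff_t N = 512;
        ptrdiff_t alloc_local, local_n0, local_0_start, i;
        fftw_complex *data;
        fftw_plan plan;
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        fftw_mpi_init();
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        alloc_local = fftw_mpi_local_size_3d(N, N, N, MPI_COMM_WORLD,
                                             &local_n0, &local_0_start);
        data = fftw_alloc_complex(alloc_local);
        plan = fftw_mpi_plan_dft_3d(N, N, N, data, data, MPI_COMM_WORLD,
                                    FFTW_FORWARD, FFTW_ESTIMATE);

        for (i = 0; i < alloc_local; i++) {   /* arbitrary input data */
            data[i][0] = 1.0;
            data[i][1] = 0.0;
        }

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        fftw_execute(plan);
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("512^3 FFT: %.3f s\n", t1 - t0);

        fftw_destroy_plan(plan);
        fftw_free(data);
        MPI_Finalize();
        return 0;
    }

Build with mpicc and link -lfftw3_mpi -lfftw3, then compare times as the rank count is varied within and across nodes.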
From eugen at leitl.org Thu Apr 7 04:56:33 2011 From: eugen at leitl.org (Eugen Leitl) Date: Thu, 7 Apr 2011 10:56:33 +0200 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour Message-ID: <20110407085633.GE23560@leitl.org> http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 10,000-core Linux supercomputer built in Amazon cloud Cycle Computing builds cloud-based supercomputing cluster to boost scientific research. By Jon Brodkin, Network World April 06, 2011 03:15 PM ET High-performance computing expert Jason Stowe recently asked two of his engineers a simple question: Can you build a 10,000-core cluster in the cloud? "It's a really nice round number," says Stowe, the CEO and founder of Cycle Computing, a vendor that helps customers gain fast and efficient access to the kind of supercomputing power usually reserved for universities and large research organizations. SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to Linux To continue reading, register here to become an Insider. You'll get free access to premium content from CIO, Computerworld, CSO, InfoWorld, and Network World. See more Insider content or sign in. High-performance computing expert Jason Stowe recently asked two of his engineers a simple question: Can you build a 10,000-core cluster in the cloud? "It's a really nice round number," says Stowe, the CEO and founder of Cycle Computing, a vendor that helps customers gain fast and efficient access to the kind of supercomputing power usually reserved for universities and large research organizations. SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to Linux Cycle Computing had already built a few clusters on Amazon's Elastic Compute Cloud that scaled up to several thousand cores. But Stowe wanted to take it to the next level. Provisioning 10,000 cores on Amazon has probably been done numerous times, but Stowe says he's not aware of anyone else achieving that number in an HPC cluster, meaning one that uses a batch scheduling technology and runs an HPC-optimized application. "We haven't found references to anything larger," Stowe says. Had it been tested for speed, the Linux-based cluster Stowe ran on Amazon might have been big enough to make the Top 500 list of the world's fastest supercomputers. One of the first steps was finding a customer that would benefit from such a large cluster. There's no sense in spinning up such a large environment unless it's devoted to some real work. The customer that opted for the 10,000-core cloud cluster was biotech company Genentech in San Francisco, where scientist Jacob Corn needed computing power to examine how proteins bind to each other, in research that might eventually lead to medical treatments. Compared to the 10,000-core cluster, "we're a tenth the size internally," Corn says. Cycle Computing and Genentech spun up the cluster on March 1 a little after midnight, based on Amazon's advice regarding the optimal time to request 10,000 cores. While Amazon offers virtual machine instances optimized for high-performance computing, Cycle and Genentech instead opted for a "standard vanilla CentOS" Linux cluster to save money, according to Stowe. CentOS is a version of Linux based on Red Hat's Linux. 
The 10,000 cores were composed of 1,250 instances with eight cores each, as well as 8.75TB of RAM and 2PB disk space. Scaling up a couple of thousand cores at a time, it took 45 minutes to provision the whole cluster. There were no problems. "When we requested the 10,000th core, we got it," Stowe said. The cluster ran for eight hours at a cost of $8,500, including all the fees to Amazon and Cycle Computing. (See also: Start-up transforms unused desktop cycles into fast server clusters) For Genentech, this was cheap and easy compared to the alternative of buying 10,000 cores for its own data center and having them idle away with no work for most of their lives, Corn says. Using Genentech's existing resources to perform the simulations would take weeks or months instead of the eight hours it took on Amazon, he says. Genentech benefited from the high number of cores because its calculations were "embarrassingly parallel," with no communication between nodes, so performance stats "scaled linearly with the number of cores," Corn said. To provision the cluster, Cycle used its own CycleCloud software, the Condor scheduling system and Chef, an open source configuration management framework. Cycle also used some of its own software to detect errors and restart nodes when necessary, a shared file system, and a few extra nodes on top of the 10,000 to handle some of the legwork. To ensure security, the cluster was engineered with secure-HTTP and 128/256-bit Advanced Encryption Standard encryption, according to Cycle. Cycle Computing boasted that the cluster was roughly equivalent to the 114th fastest supercomputer in the world on the Top 500 list, which hit about 66 teraflops. In reality, they didn't run the speed benchmark required to submit a cluster to the Top 500 list, but nearly all of the systems listed below No. 114 in the ranking contain fewer than 10,000 cores. Genentech is still waiting to see whether the simulations lead to anything useful in the real world, but Corn says the data "looks fantastic." He says Genentech is "very open" to building out more Amazon clusters, and Cycle Computing is looking ahead as well. "We're already working on scaling up larger," Stowe says. All Cycle needs is a customer with "a use case to take advantage of it." Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin Read more about data center in Network World's Data Center section. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 7 08:47:54 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 14:47:54 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <33396FD5-FBAA-4735-8694-B0D7FE7EAA84@xs4all.nl> On Apr 6, 2011, at 8:12 PM, Hearns, John wrote: >> What do you intend to use the machines for? >> For a chessprogram they would be great, but none of those guys has >> the cash to pay for these >> machines. 
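The article's own numbers, worked through (nothing here beyond straight arithmetic on the figures quoted above):

    1,250 instances x 8 cores             = 10,000 cores
    8.75 TB RAM / 1,250 instances         = 7 GB RAM per instance
    $8,500 / 8 hours                      = roughly $1,060 per hour
    $8,500 / (10,000 cores x 8 hours)     = roughly $0.11 per core-hour

which is where the ~1 kUSD/hour figure in the subject line comes from.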
> > > > The Supermicro board which Bruce Goglin refers to is said to support > 16gbytes DIMMS. > > Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a > 1024 Gbyte machine, > plus you can cook your dinner on it. > Except that you can't buy the machine equipped with that for $60k in a shop. 512GB equipped 8 socket nehalem-ex (8 core version 2.26Ghz) was introduced at $205k, that's without further equipment such as huge storage, so basic configuration when ordered at Oracle. So this box will probably be $250k or $300k or so? Regards, Vincent > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 7 08:52:43 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 14:52:43 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> On Apr 7, 2011, at 10:43 AM, Hearns, John wrote: >> >> On 4/6/2011 5:39 PM, Mark Hahn wrote: >> >>> shrug. does anyone have serious experience with real apps on >> manycore >>> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >>> but they're substantially more exotic/rare/expensive.) >> >> I have a couple 48-core 1U boxes. They can build >> gcc and other large packages very quickly. >> >> The scientists who run single process simulations >> also like them but they're not real picky about >> how long it takes for something to run. They also >> generally spend close to no time at all optimizing >> anything. > > "Premature optimization is the root of all evil" - Donald Knuth > > > I'm also interested in the response to Mark Hahn's question - I guess > that's why I started this thread really! > > Also as I've said before, with the advent of affordable manycore > systems > like this, we're going > to have to dust off those old skills practised in the age of SMP > monster > machines - which were probably > something like the same specs as these affordable systems! > it's not clear what 'these' refers to. 48 core AMD multicore machine: $8000 on ebay i saw one for. Of course not much of a RAM and not fastest chip. Let's say fully configured about double that price. GPU monster box, which is basically a few videocards inside such a box stacked up a tad, wil only add a couple of thousands. But a 8 socket @ 10 core nehalem-ex, in basic configuration will be already far above $205k. Probably a $300k or so when configured. Huge price difference. 
So i assume you didn't refer to the Nehalem-ex box. > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 09:49:09 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 09:49:09 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <4D9D1E35.9040802@berkeley.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> Message-ID: <4D9DC0D5.8060802@ias.edu> On 04/06/2011 10:15 PM, Jon Forrest wrote: > On 4/6/2011 5:39 PM, Mark Hahn wrote: > >> shrug. does anyone have serious experience with real apps on manycore >> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >> but they're substantially more exotic/rare/expensive.) > > I have a couple 48-core 1U boxes. They can build > gcc and other large packages very quickly. But are the makes definitely running in parallel to take advantage of the multiple cores? I haven't built gcc, so don't know if it uses make's -j option to do parallel builds. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:03:47 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:03:47 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <20110407085633.GE23560@leitl.org> References: <20110407085633.GE23560@leitl.org> Message-ID: <4D9DC443.9080502@ias.edu> On 04/07/2011 04:56 AM, Eugen Leitl wrote: > > "It's a really nice round number," says Stowe, the CEO and founder of Cycle > Computing, Clearly he's a marketing man. Everyone know real computer guys think in powers of 2. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
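On Prentice's question: GCC's top-level build does honor make's -j flag, including during bootstrap, so a 48-core box is genuinely useful for it. A typical sequence looks something like the lines below; the release number and install prefix are placeholders, and the usual GMP/MPFR/MPC prerequisites are assumed to be installed already.

    tar xjf gcc-4.6.0.tar.bz2
    mkdir objdir && cd objdir
    ../gcc-4.6.0/configure --prefix=$HOME/gcc
    make -j 48 bootstrap
    make install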
From prentice at ias.edu Thu Apr 7 10:16:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:16:53 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <20110407085633.GE23560@leitl.org> References: <20110407085633.GE23560@leitl.org> Message-ID: <4D9DC755.5070004@ias.edu> A great publicity stunt, but I still don't think it qualifies as a "real" HPC cluster achievement. See comments/objections in-line below. On 04/07/2011 04:56 AM, Eugen Leitl wrote: > > http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n > > The cluster ran for eight hours That's not very long for HPC jobs. How much would the performance have degraded if it started to run into the daytime hours, when demand for CPU cycles in EC2 would be at their peak? > Genentech benefited from the high number of cores > because its calculations were "embarrassingly parallel," with no > communication between nodes, so performance stats "scaled linearly with the > number of cores," Corn said. > So it wasn't really a cluster at all, but a giant batch scheduling system. I probably have a stricter sense of what makes a cluster than some others, so let's not argue on the the definition of cluster and split hairs. In my book, a cluster involves parallel communication between the processes using MPI, PVM or some other parallel communications paradigm. And BTW, my comments are not directed Eugene for posting this. Just starting a general discussion on this article... -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:21:19 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:21:19 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. Message-ID: <4D9DC85F.9080503@ias.edu> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? In all these commercials, the protagonists say "to the cloud" for their solution, but then when they show them using Microsoft Windows to access "the cloud", they're not using the cloud at all. In fact, in one commercial, the one where the wife/mother is fixing the family portrait, she's using a photoshop-like program on her own desktop, not even the Internet is needed. Not only do they use the term "cloud" incorrectly, they don't even show how using Microsoft products give you and advantage for using "the cloud" AAAAAAARRRRRRRGGGH! Okay. Venting over. Whew! I feel better already. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Thu Apr 7 10:27:27 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 15:27:27 +0100 Subject: [Beowulf] Microsoft "cloud" commercials. 
References: <4D9DC85F.9080503@ias.edu> Message-ID: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In London there is a saturation of Microsoft Cloud advert posters in the mainline stations and Tube lines serving the City (the financial district) and Canary Wharf. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:40:28 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:40:28 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <4D9DC85F.9080503@ias.edu> <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9DCCDC.1080607@ias.edu> On 04/07/2011 10:27 AM, Hearns, John wrote: >> >> Is anyone else as annoyed by the Microsoft "cloud" commercials as I > am? >> >> > In London there is a saturation of Microsoft Cloud advert posters in the > mainline stations > and Tube lines serving the City (the financial district) and Canary > Wharf. > But do they annoy you? ;) For those of you outside the US, here's the commercials I'm referring to: 1. http://youtu.be/-HRrbLA7rss 2. http://youtu.be/mjtqoQE_ezA 3. http://youtu.be/_lu6v6hE_bA 4. http://youtu.be/Lel3swo4RMc Out of these only (1) could possibly be using the cloud, if they're using Google docs or something similar to create and share their documents. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at UR.Rochester.edu Thu Apr 7 10:49:09 2011 From: scrusan at UR.Rochester.edu (Crusan, Steve) Date: Thu, 7 Apr 2011 10:49:09 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. References: <4D9DC85F.9080503@ias.edu> Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> Windows HPC Server 2008 also has a builtin feature for an end user to submit excel docs to a windows cluster to do intense timesheet and office supplies calculations... ---------------------- Steve Crusan System Administrator Center for Research Computing -----Original Message----- From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal Sent: Thu 4/7/2011 10:21 AM To: Beowulf Mailing List Subject: [Beowulf] Microsoft "cloud" commercials. Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? In all these commercials, the protagonists say "to the cloud" for their solution, but then when they show them using Microsoft Windows to access "the cloud", they're not using the cloud at all. 
In fact, in one commercial, the one where the wife/mother is fixing the family portrait, she's using a photoshop-like program on her own desktop, not even the Internet is needed. Not only do they use the term "cloud" incorrectly, they don't even show how using Microsoft products give you and advantage for using "the cloud" AAAAAAARRRRRRRGGGH! Okay. Venting over. Whew! I feel better already. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Apr 7 11:03:25 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu, 07 Apr 2011 11:03:25 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DC755.5070004@ias.edu> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> Message-ID: <4D9DD23D.8090908@sonsorol.org> The CycleComputing folks are good people in my book and I bet more than a few are subscribed to this list. The founders are old-school Condor gurus with a long track record in this field. One of the nice things about their work is how "usable" it is to real people with real production computing requirements - in the IAAS cloud space there are way too many marketing robots talking vague BS about "cloud bursting", "hybrid clusters" and storage aggregation/access across LAN/WAN distances. Cycle has built, deployed & delivered all of this with (what I'd consider) a bare minimum of marketing and chest thumping. It's not a PR gimmick and limiting the definition of "cluster" to only systems that run parallel applications would alienate quite a few of us on this list :) In the life sciences a typical cluster might run a mixture of 80-90% serial jobs with a small scattering of real MPI apps running alongside. I get cynical about this stuff because in the cloud space you see way too many commercial people promising the world without actually delivering anything (other than carefully hand-managed reference account projects) while the academic & supercomputing folks are all busy presenting and bragging about things that will never see the light of day after their thesis defense. There are people like Cycle/Rightscale etc. etc. who actually rise above the hype and deliver clever & usable stuff with a minimum of marketing BS. My $.02 of course -Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 11:05:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:05:53 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. 
In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> References: <4D9DC85F.9080503@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> Message-ID: <4D9DD2D1.9070309@ias.edu> "Cluster" != "Cloud" The Cloud, by definition requires the Internet. Clusters do not. In fact, I bet the NSA can show you many clusters that are not connect to the Internet at all. While I'm at it, "Grid" != ("Cluster" || "Cloud") either! On 04/07/2011 10:49 AM, Crusan, Steve wrote: > Windows HPC Server 2008 also has a builtin feature for an end user to > submit excel docs to a windows cluster to do intense timesheet and > office supplies calculations... > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 10:21 AM > To: Beowulf Mailing List > Subject: [Beowulf] Microsoft "cloud" commercials. > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In all these commercials, the protagonists say "to the cloud" for their > solution, but then when they show them using Microsoft Windows to access > "the cloud", they're not using the cloud at all. > > In fact, in one commercial, the one where the wife/mother is fixing the > family portrait, she's using a photoshop-like program on her own > desktop, not even the Internet is needed. > > Not only do they use the term "cloud" incorrectly, they don't even show > how using Microsoft products give you and advantage for using "the cloud" > > AAAAAAARRRRRRRGGGH! > > Okay. Venting over. Whew! I feel better already. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 11:13:43 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:13:43 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DD23D.8090908@sonsorol.org> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <4D9DD4A7.7060601@ias.edu> On 04/07/2011 11:03 AM, Chris Dagdigian wrote: > > The CycleComputing folks are good people in my book and I bet more than > a few are subscribed to this list. The founders are old-school Condor > gurus with a long track record in this field. > > One of the nice things about their work is how "usable" it is to real > people with real production computing requirements - in the IAAS cloud > space there are way too many marketing robots talking vague BS about > "cloud bursting", "hybrid clusters" and storage aggregation/access > across LAN/WAN distances. Cycle has built, deployed & delivered all of > this with (what I'd consider) a bare minimum of marketing and chest > thumping. 
> > It's not a PR gimmick and limiting the definition of "cluster" to only > systems that run parallel applications would alienate quite a few of us > on this list :) In the life sciences a typical cluster might run a > mixture of 80-90% serial jobs with a small scattering of real MPI apps > running alongside. Do not confuse "scientific computing" or "high performance computing" with "cluster". All terms are definitely related, but you can do scientific/high-perfomance computing without a "cluster." As someone who also works in life sciences, I know that there are a lot of life science tasks that are embarrassingly parallel. Running these tasks on a bunch of different machines simultaneously is definitely scientific and high performance computing, but it doesn't necessarily require a cluster. Folding at home, for example. > > I get cynical about this stuff because in the cloud space you see way > too many commercial people promising the world without actually > delivering anything (other than carefully hand-managed reference account > projects) while the academic & supercomputing folks are all busy > presenting and bragging about things that will never see the light of > day after their thesis defense. Me, too, which is why I started ranting about Microsoft's cloud commercials in a separate thread. ;) It's also why I'm starting to get picky about how the term "cluster" is used. More and more, I see people confusing "cloud" with "cluster". I guess that cynicism is what caused me to reply to the original post. > > There are people like Cycle/Rightscale etc. etc. who actually rise above > the hype and deliver clever & usable stuff with a minimum of marketing BS. > > My $.02 of course > > -Chris > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at UR.Rochester.edu Thu Apr 7 11:13:32 2011 From: scrusan at UR.Rochester.edu (Crusan, Steve) Date: Thu, 7 Apr 2011 11:13:32 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> Oh I understand the difference, but I thought I'd take this opportunity to bash MS. But, since MS's cloud runs off of MS Azure and MS Server 2008, I would bet the excel functionality would be possible. ---------------------- Steve Crusan System Administrator Center for Research Computing -----Original Message----- From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal Sent: Thu 4/7/2011 11:05 AM Cc: Beowulf Mailing List Subject: Re: [Beowulf] Microsoft "cloud" commercials. "Cluster" != "Cloud" The Cloud, by definition requires the Internet. Clusters do not. In fact, I bet the NSA can show you many clusters that are not connect to the Internet at all. 
While I'm at it, "Grid" != ("Cluster" || "Cloud") either! On 04/07/2011 10:49 AM, Crusan, Steve wrote: > Windows HPC Server 2008 also has a builtin feature for an end user to > submit excel docs to a windows cluster to do intense timesheet and > office supplies calculations... > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 10:21 AM > To: Beowulf Mailing List > Subject: [Beowulf] Microsoft "cloud" commercials. > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In all these commercials, the protagonists say "to the cloud" for their > solution, but then when they show them using Microsoft Windows to access > "the cloud", they're not using the cloud at all. > > In fact, in one commercial, the one where the wife/mother is fixing the > family portrait, she's using a photoshop-like program on her own > desktop, not even the Internet is needed. > > Not only do they use the term "cloud" incorrectly, they don't even show > how using Microsoft products give you and advantage for using "the cloud" > > AAAAAAARRRRRRRGGGH! > > Okay. Venting over. Whew! I feel better already. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at mclaren.com Thu Apr 7 11:13:24 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 16:13:24 +0100 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <207BB2F60743C34496BE41039233A809042454EA@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > There are people like Cycle/Rightscale etc. etc. who actually rise > above > the hype and deliver clever & usable stuff with a minimum of marketing > BS. > > My $.02 of course Surely your $.02 per cpu per minute? The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From prentice at ias.edu Thu Apr 7 11:15:58 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:15:58 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> Message-ID: <4D9DD52E.8040103@ias.edu> Oh, sorry. I missed the sarcasm. I thought you were defending MS. The "office supplies calculations" should have tripped my sarcasm detector immediately! Sorry. I'm in a rare (and ranting!) mood today. Must be time for a vacation. Prentice On 04/07/2011 11:13 AM, Crusan, Steve wrote: > Oh I understand the difference, but I thought I'd take this opportunity > to bash MS. > > But, since MS's cloud runs off of MS Azure and MS Server 2008, I would > bet the excel functionality would be possible. > > ---------------------- > Steve Crusan > System Administrator"Crusan, Steve" > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 11:05 AM > Cc: Beowulf Mailing List > Subject: Re: [Beowulf] Microsoft "cloud" commercials. > > "Cluster" != "Cloud" > > The Cloud, by definition requires the Internet. Clusters do not. In > fact, I bet the NSA can show you many clusters that are not connect to > the Internet at all. > > While I'm at it, "Grid" != ("Cluster" || "Cloud") either! > > > On 04/07/2011 10:49 AM, Crusan, Steve wrote: >> Windows HPC Server 2008 also has a builtin feature for an end user to >> submit excel docs to a windows cluster to do intense timesheet and >> office supplies calculations... >> >> ---------------------- >> Steve Crusan >> System Administrator >> Center for Research Computing >> >> >> >> -----Original Message----- >> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal >> Sent: Thu 4/7/2011 10:21 AM >> To: Beowulf Mailing List >> Subject: [Beowulf] Microsoft "cloud" commercials. >> >> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? >> >> In all these commercials, the protagonists say "to the cloud" for their >> solution, but then when they show them using Microsoft Windows to access >> "the cloud", they're not using the cloud at all. >> >> In fact, in one commercial, the one where the wife/mother is fixing the >> family portrait, she's using a photoshop-like program on her own >> desktop, not even the Internet is needed. >> >> Not only do they use the term "cloud" incorrectly, they don't even show >> how using Microsoft products give you and advantage for using "the cloud" >> >> AAAAAAARRRRRRRGGGH! >> >> Okay. Venting over. Whew! I feel better already. 
>> >> -- >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Thu Apr 7 11:35:24 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Thu, 07 Apr 2011 11:35:24 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <4D9DC0D5.8060802@ias.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <4D9DC0D5.8060802@ias.edu> Message-ID: <4D9DD9BC.3090102@runnersroll.com> On 04/07/11 09:49, Prentice Bisbal wrote: > On 04/06/2011 10:15 PM, Jon Forrest wrote: >> On 4/6/2011 5:39 PM, Mark Hahn wrote: >> >>> shrug. does anyone have serious experience with real apps on manycore >>> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >>> but they're substantially more exotic/rare/expensive.) >> >> I have a couple 48-core 1U boxes. They can build >> gcc and other large packages very quickly. > > But are the makes definitely running in parallel to take advantage of > the multiple cores? I haven't built gcc, so don't know if it uses make's > -j option to do parallel builds. > Yes, see: http://gcc.gnu.org/install/build.html In general I see quite nice speedups on my four-core machine at home running Gentoo, but I find running -j > cores up to 2xcores tends to produce better results as many packages (especially with recursive makes) tend to mix configuration (low cpu usage) with makes (high cpu usage). The gentoo handbook itself suggests cores+1 for the -j parameter. Higher than core -j counts is purely a heuristic, and a few packages will degrade a bit because in fact 8 (2x4cores) processes are spawned, each contending heavily for the 4 cores and context switching starts to slow things down and hurt locality. Once again, I suppose this is a YMMV situation. It would be cool to hack make to dynamically throttle parallelization based on cpu usage within some given bounds... I have access to a 48 core box, so if I get a chance I'll generate a graph for the list on gcc build times by -j count. Note however that I don't have root access so I can't clear caches, which should be taken into account when examining results. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
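[Editorial note: GNU make's -l/--max-load flag already gives a crude version of the load-based throttling Ellis wishes for above: it holds off starting new jobs while the one-minute load average is over a threshold. The sketch below is only an illustration of the "cores + 1" heuristic from the thread, not code anyone posted; it assumes a POSIX system with GNU make on the PATH.]

/* Illustrative only: pick -j with the "cores + 1" heuristic and cap it
   with make's load-average throttle. Assumes POSIX sysconf() and GNU make. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* online CPU count */
    if (cores < 1)
        cores = 1;

    char cmd[64];
    /* -j cores+1 keeps one extra job ready while others block on I/O;
       -l cores stops make from starting new jobs above that load average. */
    snprintf(cmd, sizeof(cmd), "make -j%ld -l%ld", cores + 1, cores);
    return system(cmd);
}

[On the 48-core box mentioned above this would run "make -j49 -l48"; the interesting part of the proposed experiment is whether the -l cap removes the slowdown seen when -j climbs toward 2x cores.]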
From ellis at runnersroll.com Thu Apr 7 11:42:18 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Thu, 07 Apr 2011 11:42:18 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <4D9DD52E.8040103@ias.edu> References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> <4D9DD52E.8040103@ias.edu> Message-ID: <4D9DDB5A.3030700@runnersroll.com> >>> -----Original Message----- >>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal >>> Sent: Thu 4/7/2011 10:21 AM >>> To: Beowulf Mailing List >>> Subject: [Beowulf] Microsoft "cloud" commercials. >>> >>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? I completely agree. It's a darn shame all those Truth campaigns concentrate on drugs - clarifying popular media is a desperately needed service for so many domains (at least in US media). Although I have to admit I'm not sure if the cloud misnomer or the disgusting family dynamics of the Photoshop commercial are more bothering to me. Always the dopey dad with these commercials... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Thu Apr 7 11:53:31 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 16:53:31 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A8090424562F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > -----Original Message----- > From: Vincent Diepeveen [mailto:diep at xs4all.nl] > Sent: 07 April 2011 13:53 > > But a 8 socket @ 10 core nehalem-ex, in basic configuration will be > already far above $205k. Probably a $300k or > so when configured. > > Huge price difference. > > So i assume you didn't refer to the Nehalem-ex box. I was referring to the Nehalem. http://www.lasystems.be/Supermicro/SYS-5086B-TRF/Superserver5086B-TRF8-W ay/product/248987.html Add 8 CPUs at $4000 per cpu, and 64 DIMMs at $944 per DIMM The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Thu Apr 7 12:11:13 2011 From: ellis at runnersroll.com (Ellis H. 
Wilson III) Date: Thu, 07 Apr 2011 12:11:13 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DD23D.8090908@sonsorol.org> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <4D9DE221.2040806@runnersroll.com> On 04/07/11 11:03, Chris Dagdigian wrote: > One of the nice things about their work is how "usable" it is to real > people with real production computing requirements - in the IAAS cloud I wonder what "real" people with "real" production computing requirements means here. See below for further thoughts on my thoughts on "real" codes and where I suspect they arise. > It's not a PR gimmick and limiting the definition of "cluster" to only > systems that run parallel applications would alienate quite a few of us > on this list :) In the life sciences a typical cluster might run a > mixture of 80-90% serial jobs with a small scattering of real MPI apps > running alongside. I'm certainly a pragmatist here - use the machines as your organization feels is best. However I still have a strong suspicion that most jobs are serial because of: 1. Lack of experience properly parallelizing codes 2. Lack of proper environment on one's own desktop (i.e. Linux or group licenses) 3. In rare cases such rapid development and short lifetime of a code that parallelizing it will take longer than poorly serially coding it and tolerating the run-times. I can only hope that within the decade the programming paradigm shifts along with the hardware and the average bloke becomes at least exposed to basic parallel programming concepts. The machine is still a "cluster" - the way it's used shouldn't guide what it is referred to. That doesn't mean running serial jobs on a machine tailored for parallel ones is the best way to use your time/money. Probably better for one to simply buy Linux desktops for all the employees, put them on a typical GigE network and have the employees submit jobs to some tiny server in the Bosses office which routes jobs evenly to everyone's machine distributed throughout the building. ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From gus at ldeo.columbia.edu Thu Apr 7 12:25:35 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 12:25:35 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> Message-ID: <4D9DE57F.4040303@ldeo.columbia.edu> Vincent Diepeveen wrote: > GPU monster box, which is basically a few videocards inside such a > box stacked up a tad, wil only add a couple of > thousands. > This price may be OK for the videocard-class GPUs, but sounds underestimated, at least for Fermi Tesla. Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, with 448 cores and 3GB RAM per GPU, cost around $10k. For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. If you care about ECC, that's the price you pay, right? 
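[Editorial note: whether a given board actually has ECC switched on is visible from the CUDA runtime. The sketch below is a minimal device query, not code from this thread; it assumes a CUDA toolkit new enough (Fermi/3.x era or later) to expose the ECCEnabled field, and is compiled with nvcc.]

/* Minimal CUDA device query (illustrative, not from the thread):
   report compute capability, memory size and ECC state per device. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("device %d: %s, sm_%d%d, %.1f GB, ECC %s\n",
               i, p.name, p.major, p.minor,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               p.ECCEnabled ? "on" : "off");
    }
    return 0;
}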
Gus Correa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cap at nsc.liu.se Thu Apr 7 13:26:51 2011 From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=) Date: Thu, 7 Apr 2011 19:26:51 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <201104071926.56911.cap@nsc.liu.se> On Thursday, April 07, 2011 02:39:19 am Mark Hahn wrote: ... > I bet there will be 100x more 4s servers build with these chips than 8s. > and 1000x more 2s than 4s... Sounds about right :-) Not your average compute node by a long shot. > a friend noticed something weird on intel's spec sheets: > http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC > 3E > > notice it says 32GB max memory size. even if that means 32GB/socket, > it's not all that much. Certainly looks odd on that page but does likely refer to max DIMM size. With 64 DIMMs (4 socket example) that would then give you 2T. > I don't know about everyone else, but I'm already bored with core counts ;) > these also seem fairly warm (130W), considering that they're the fancy > new 32nm process and run at modest clock rates... It's the size of the beast... (caused by the number of cores and size of last level cache). /Peter -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Thu Apr 7 15:26:57 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 21:26:57 +0200 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <4D9DE57F.4040303@ldeo.columbia.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> Message-ID: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > Vincent Diepeveen wrote: > >> GPU monster box, which is basically a few videocards inside such a >> box stacked up a tad, wil only add a couple of >> thousands. >> > > This price may be OK for the videocard-class GPUs, > but sounds underestimated, at least for Fermi Tesla. Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 note there is a 6 GB version, not aware of price will be $$$$ i bet. or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro VERSUS 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. Factor 100 difference to those cards. A couple of thousands versus a couple of hundreds of thousands. Hope i made my point clear. > Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, > with 448 cores and 3GB RAM per GPU, cost around $10k. > For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. 
> If you care about ECC, that's the price you pay, right? When fermi released it was a great gpu. Regrettably they lobotomized the gamers card's double precision as i understand, So it hardly has double precision capabilities; if you go for nvidia you sure need a Tesla, no question about it. As a company i would buy in 6990's though, they're a lot cheaper and roughly 3x faster than the Nvidia's (for some more than 3x for other occassions less than 3x, note the card has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD versus 448 cores nvidia with 448 execution units of 32 bits multiplication. Especially because multiplication has improved a lot. Already having written CUDA code some while ago, i wanted the cheap gamers card with big horse power now at home so i'm toying on a 6970 now so will be able to report to you what is possible to achieve at that card with respect to prime numbers and such. I'm a bit amazed so little public initiatives write code for the AMD gpu's. Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a CRC calculation (if i understand it correctly). It's a bit more primitive than ECC, but works pretty ok and shows you also when problems occured there, so figuring out remove what goes on is possible. Make no mistake that this isn't ECC. We know some HPC centers have as a hard requirement ECC, only nvidia is an alternative then. In earlier posts from some time ago and some years ago i already wrote on that governments should adapt more to how hardware develops rather than demand that hardware has to follow them. HPC has too little cash to demand that from industry. OpenCL i cannot advice at this moment (for a number of reasons). AMD-CAL and CUDA are somewhat similar. Sure there is differences, but majority of codes are possible to port quite well (there is exceptions), or easy work arounds. Any company doing gpgpu i would advice developing both branches of code at the same time, as that gives the company a lot of extra choices for really very little extra work. Maybe 1 coder, and it always allows you to have the fastest setup run your production code. That said we can safely expect that from raw performance coming years AMD will keep the leading edge from crunching viewpoint. Elsewhere i pointed out why. Even then i'd never bet at just 1 manufacturer. Go for both considering the cheap price of it. For a lot of HPC centers the choice of nvidia will be an easy one, as the price of the Fermi cards is peanuts compared to the price rest of the system and considering other demands that's what they'll go for. That might change once you stick in bunches of videocards in nodes. Please note that the gpu 'streamcores' or PE's whatever name you want to give them, are so bloody fast, that your code has to work within the PE's themselves and hardly use the RAM. Both for Nvidia as well as AMD, the streamcores are so fast, that you simply don't want to lose time on the RAM when your software runs, let alone that you want to use huge RAM. Add to that, that nvidia (have to still figure out for AMD) can in background stream from and to the gpu's RAM from the CPU, so if you do really large calculations involving many nodes, all that shouldn't be an issue in the first place. So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would really amaze me, though i'm sure there is cases where that happens. 
If we see however what was ordered it mostly is the 3GB Tesla's, at least on what has been reported, i have no global statistics on that... Now all choices are valid there, but even then we speak about peanuts money compared to the price of a single 8 socket Nehalem-ex box, which fully configured will be maybe $300k-$400k or something? Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 is 2000 euro. There won't be 2 gpu nvidia's any soon because of the choice they have historically made for the memory controllers. See explanation of intel fanboy David Kanter for that at realworldtech in a special article he wrote there. Please note i'm not judging AMD nor Nvidia, they have made their choices based upon totally different businessmodels i suspect and we must be happy we have this rich choice right now between cpu's from different manufacturers and gpu's from different manufacturers. Nvidia really seems to aim at supercomputers, giving their tesla line without lobotomization and lobotomizing their gamers cards, where AMD aims at gamers and their gamercards have full functionality without lobotomization. Total different businessmodels. Both have their advantages and disadvantages. From pure performance viewpoint it's easy to see what's faster though. Yet right now i realize all too well that just too many still hesitate between also offering gpu services additional to cpu services, in which case having a gpu, regardless nvidia or amd, kicks butt of course from throughput viewpoint. To be really honest with you guys, i had expected that by 2011 we would have a gpu reaching far over 1 Teraflop double precision handsdown. If we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 gpu's on a single card to get over that Teraflop double precision (claim is 1.27 Teraflop double precision), that really is underneath my expectations from a few years ago. Now of course i hope you realize i'm not coding double precision code at all; i'm writing everything in integers of 32 bits for the AMD card and the Nvidia equivalent also is using 32 bits integers. The ideal way to do calculations on those cards, so also very big transforms, is using the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD). Regards, Vincent > > Gus Correa > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
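[Editorial note: Vincent's closing point about "32 x 32 == 64 bits" multiplies maps directly onto CUDA's __umulhi() intrinsic, which returns the high 32 bits of an unsigned 32-bit product; the low half is the ordinary multiply. The example below is purely illustrative (array sizes and launch shape invented, compile with nvcc), not code from the thread; the "2 instructions" he mentions on the AMD side are the equivalent low/high multiply pair.]

/* Illustration of a full 32 x 32 -> 64 bit multiply per thread using
   __umulhi() for the high half. Invented example, not from the thread. */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void mul64(const unsigned int *a, const unsigned int *b,
                      unsigned long long *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int lo = a[i] * b[i];            /* low 32 bits  */
        unsigned int hi = __umulhi(a[i], b[i]);   /* high 32 bits */
        out[i] = ((unsigned long long)hi << 32) | lo;
    }
}

int main(void)
{
    const int n = 256;
    unsigned int ha[256], hb[256];
    unsigned long long hc[256];
    unsigned int *da, *db;
    unsigned long long *dc;

    for (int i = 0; i < n; i++) { ha[i] = 0xFFFFFFFFu - i; hb[i] = i + 1; }

    cudaMalloc((void **)&da, sizeof(ha));
    cudaMalloc((void **)&db, sizeof(hb));
    cudaMalloc((void **)&dc, sizeof(hc));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    mul64<<<1, n>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, sizeof(hc), cudaMemcpyDeviceToHost);

    printf("%u * %u = %llu\n", ha[1], hb[1], hc[1]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}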
From prentice at ias.edu Thu Apr 7 15:44:25 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 15:44:25 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E1419.9000408@ias.edu> On 04/07/2011 03:26 PM, Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > > You can't do a direct comparison between a CPU and a GPU. There are many things that GPUs can't do (or can't do well) that are still better done on a CPU. Even NVidia acknowledges in most of their promotional and educational literature. One example would be a code with a lot of branching. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From gus at ldeo.columbia.edu Thu Apr 7 16:37:46 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 16:37:46 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E209A.1040408@ldeo.columbia.edu> Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > Not so much. 
In your original message you said: "GPU monster box, which is basically a few videocards inside such a box stacked up a tad, wil only add a couple of thousands." So, first it was a few GPUs on a box (whatever else the box might have inside) for a couple of thousand (if dollars or euros you did not specify). Now you checked out the real prices, and said that a *single* Fermi Tesla C2070 cost ~$2,200 (just the GPU alone, price in US dollars I suppose), which is more like the real thing. However, instead of admitting that your previous numbers were mistaken, you insist that: "Hope i made my point clear.". Is this how you play chess? :) Even if your opponent is a computer, he/she/it might get a bit discouraged. You always win, even before the game starts. Anyway, I don't play chess, I am no GPU expert, I don't know about the lobotomizing of Fermi (I hope you're not talking about Enrico, he's dead), and I don't think we're going anywhere with this discussion. However, the GPU prices you sent in your original email to the list were underestimated, although I am afraid I may not be able to make this point go across to you. The prices you sent were too low, at least when it comes to GPUs with ECC, which is what is reliable for HPC. Thank you, Gus Correa > >> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, >> with 448 cores and 3GB RAM per GPU, cost around $10k. >> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. >> If you care about ECC, that's the price you pay, right? > > When fermi released it was a great gpu. > > Regrettably they lobotomized the gamers card's double precision as i > understand, > So it hardly has double precision capabilities; if you go for nvidia you > sure need a Tesla, > no question about it. > > As a company i would buy in 6990's though, they're a lot cheaper and > roughly 3x faster > than the Nvidia's (for some more than 3x for other occassions less than > 3x, note the card > has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). > > 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD > versus 448 cores nvidia with 448 execution units of 32 bits multiplication. > > Especially because multiplication has improved a lot. > > Already having written CUDA code some while ago, i wanted the cheap > gamers card with big > horse power now at home so i'm toying on a 6970 now so will be able to > report to you what is possible to > achieve at that card with respect to prime numbers and such. > > I'm a bit amazed so little public initiatives write code for the AMD gpu's. > > Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a > CRC calculation > (if i understand it correctly). It's a bit more primitive than ECC, but > works pretty ok and shows you > also when problems occured there, so figuring out remove what goes on is > possible. > > Make no mistake that this isn't ECC. > We know some HPC centers have as a hard requirement ECC, only nvidia is > an alternative then. > > In earlier posts from some time ago and some years ago i already wrote > on that governments should > adapt more to how hardware develops rather than demand that hardware has > to follow them. > > HPC has too little cash to demand that from industry. > > OpenCL i cannot advice at this moment (for a number of reasons). > > AMD-CAL and CUDA are somewhat similar. Sure there is differences, but > majority of codes are possible > to port quite well (there is exceptions), or easy work arounds. 
> [...]
If > we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 > gpu's on a single card to get over that Teraflop double precision (claim > is 1.27 Teraflop double precision), > that really is underneath my expectations from a few years ago. > > Now of course i hope you realize i'm not coding double precision code at > all; i'm writing everything in integers of 32 bits for the AMD card and > the Nvidia equivalent also is using 32 bits integers. The ideal way to > do calculations on those cards, so also very big transforms, is using > the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD). > > Regards, > Vincent > > >> >> Gus Correa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Thu Apr 7 18:57:38 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Fri, 08 Apr 2011 05:57:38 +0700 Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges Message-ID: <4D9E4162.3030004@pathscale.com> I just saw this on another ML and thought it may be of interest ------------ http://googleblog.blogspot.com/2011/04/1-billion-computing-core-hours-for.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From gus at ldeo.columbia.edu Thu Apr 7 21:03:07 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 21:03:07 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E5ECB.60608@ldeo.columbia.edu> Thank you for the information about AMD-CAL and the AMD GPUs. Does AMD plan any GPU product with 64-bit and ECC, similar to Tesla/Fermi? The lack of a language standard may still be a hurdle here. I guess there were old postings here about CUDA and OpenGL. What fraction of the (non-gaming) GPU code is being written these days in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using compiler directives like those in the PGI compilers? Thank you, Gus Correa Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. 
> [...]

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From kilian.cavalotti.work at gmail.com Fri Apr 8 07:08:01 2011 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Fri, 8 Apr 2011 13:08:01 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: Hi Mark, On Thu, Apr 7, 2011 at 2:39 AM, Mark Hahn wrote: > notice it says 32GB max memory size. ?even if that means 32GB/socket, > it's not all that much. It's actually 32GB per DIMM, so up to 512GB per socket. Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Fri Apr 8 08:45:09 2011 From: deadline at eadline.org (Douglas Eadline) Date: Fri, 8 Apr 2011 08:45:09 -0400 (EDT) Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <47386.192.168.93.213.1302266709.squirrel@mail.eadline.org> All: This video may help clear things up: http://www.youtube.com/watch?v=usGkq7tAhfc have a nice weekend -- Doug > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > > >> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, >> with 448 cores and 3GB RAM per GPU, cost around $10k. >> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. >> If you care about ECC, that's the price you pay, right? > > When fermi released it was a great gpu. > > Regrettably they lobotomized the gamers card's double precision as i > understand, > So it hardly has double precision capabilities; if you go for nvidia > you sure need a Tesla, > no question about it. > > As a company i would buy in 6990's though, they're a lot cheaper and > roughly 3x faster > than the Nvidia's (for some more than 3x for other occassions less > than 3x, note the card > has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). > > 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for > AMD > versus 448 cores nvidia with 448 execution units of 32 bits > multiplication. > > Especially because multiplication has improved a lot. 
> [...]
> See explanation of intel fanboy David Kanter for that at > realworldtech in a special article he wrote there. > > Please note i'm not judging AMD nor Nvidia, they have made their > choices based upon totally different > businessmodels i suspect and we must be happy we have this rich > choice right now between cpu's from different > manufacturers and gpu's from different manufacturers. > > Nvidia really seems to aim at supercomputers, giving their tesla line > without lobotomization and lobotomizing their > gamers cards, where AMD aims at gamers and their gamercards have full > functionality > without lobotomization. > > Total different businessmodels. Both have their advantages and > disadvantages. > > From pure performance viewpoint it's easy to see what's faster though. > > Yet right now i realize all too well that just too many still > hesitate between also offering gpu services additional to > cpu services, in which case having a gpu, regardless nvidia or amd, > kicks butt of course from throughput viewpoint. > > To be really honest with you guys, i had expected that by 2011 we > would have a gpu reaching far over 1 Teraflop double precision > handsdown. If we see that Nvidia delivers somewhere around 515 Gflop > and AMD has 2 gpu's on a single card to get over that Teraflop double > precision (claim is 1.27 Teraflop double precision), > that really is underneath my expectations from a few years ago. > > Now of course i hope you realize i'm not coding double precision code > at all; i'm writing everything in integers of 32 bits for the AMD > card and the Nvidia equivalent also is using 32 bits integers. The > ideal way to do calculations on those cards, so also very big > transforms, is using the 32 x 32 == 64 bits instructions (that's 2 > instructions in case of AMD). > > Regards, > Vincent > > >> >> Gus Correa >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Fri Apr 8 08:45:08 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri, 8 Apr 2011 08:45:08 -0400 (EDT) Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: >> notice it says 32GB max memory size. ??even if that means 32GB/socket, >> it's not all that much. > > It's actually 32GB per DIMM, so up to 512GB per socket. right - I eventually found the non-marketing docs. each socket has two memory controllers, each of which supports 2 "intel scalable memory" channels, which support an intel scalable memory buffer, which supports 4 dimms. 
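(That is 2 controllers x 2 channels x 4 DIMMs = 16 DIMM slots per socket; at 32GB per DIMM that matches Kilian's 512GB/socket figure, and a 4-socket box gets the 64 DIMMs / 2TB Peter mentioned.)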
(the ISMB actually referred to as "advanced memory buffer" in one place, like from fbdimm days...) it also has double-bit correction, triple bit detection on the last-level cache. definitely not designed for cheap or even compact systems... -mark -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eugen at leitl.org Fri Apr 8 15:42:37 2011 From: eugen at leitl.org (Eugen Leitl) Date: Fri, 8 Apr 2011 21:42:37 +0200 Subject: [Beowulf] [FoRK] Cray help?? Re: FaceBook tries to cream Google Message-ID: <20110408194237.GH23560@leitl.org> ----- Forwarded message from "J. Andrew Rogers" ----- From: "J. Andrew Rogers" Date: Fri, 8 Apr 2011 12:27:35 -0700 To: Friends of Rohit Khare Subject: Re: [FoRK] Cray help?? Re: FaceBook tries to cream Google X-Mailer: Apple Mail (2.1084) Reply-To: Friends of Rohit Khare On Apr 8, 2011, at 11:15 AM, Stephen Williams wrote: > I used RabbitMQ not long ago. Impressed with some of it, not with a lot of the rest. Digging through Erlang to determine its real details and limitations was interesting. The group that had chosen it assumed magic that was not there. Bottlenecks were going to kill scalability using the naive design. ZeroMQ is not an MQ despite its name. It is a high-performance implementation of messaging design patterns, including some that are MQ-like. I believe it had aspirations to be an MQ many years ago but turned into an MPI-like high-performance messaging library that abstracts network, IPC, and in-process communication. The basic network performance and scalability of ZeroMQ is similar to MPI. Underneath the hood it is just a collection of lockless, async structures grafted to the usual operating system hooks. Thinking of it as a competitor to MPI in terms of basic functionality is probably the correct framing. J. Andrew Rogers _______________________________________________ FoRK mailing list http://xent.com/mailman/listinfo/fork ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Fri Apr 8 15:42:51 2011 From: eugen at leitl.org (Eugen Leitl) Date: Fri, 8 Apr 2011 21:42:51 +0200 Subject: [Beowulf] [FoRK] FaceBook tries to cream Google Message-ID: <20110408194251.GI23560@leitl.org> ----- Forwarded message from "J. Andrew Rogers" ----- From: "J. Andrew Rogers" Date: Fri, 8 Apr 2011 10:36:31 -0700 To: Friends of Rohit Khare Subject: Re: [FoRK] FaceBook tries to cream Google X-Mailer: Apple Mail (2.1084) Reply-To: Friends of Rohit Khare On Apr 8, 2011, at 8:05 AM, Stephen Williams wrote: > > Agreed. Strange that MPI isn't more widely used (outside supercomputing projects). 
Although, I'm not aware of it expecting and handling faults / rework as a good Mapreduce imitation, and similar systems, must. It is not that strange, MPI is a bit brittle as a communication library standard. Implementations tend to make simplifying assumptions that are not valid for some parallel applications. You can patch it up to do anything but the level of effort required seems to relegate it to just being used in scientific computing for which it was designed. I've seen ZeroMQ being increasingly used for roughly the same purpose as MPI in "normal" distributed systems, and I personally do not see much reason to prefer the latter over the former for most things. The difference is history. MPI's weakness is that it started from a mediocre design that immediately became part of a standards process, with all the politics and buy-in that entails. It is also badly documented as a practical matter. ZMQ also started with a somewhat dodgy early design but as a library rather than a standard; it was iterated by hackers over several versions into a more sensible and capable design. ZMQ has been willing to break backward compatibility to fix behaviors that irritated the programmers that use it or add badly needed features, which is possible because the "standard" is the implementation. J. Andrew Rogers _______________________________________________ FoRK mailing list http://xent.com/mailman/listinfo/fork ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Apr 12 16:31:41 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 12 Apr 2011 16:31:41 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement - pre-alpha release Message-ID: If you are using the "Job to Core Binding" feature in SGE and running SGE on newer hardware, then please give the new hwloc enabled loadcheck a try. http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html The current hardware topology discovery library (Portable Linux Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new hardware topology may not be detected correctly by PLPA. If you are running SGE on AMD Magny-Cours servers, please post your loadcheck output, as it is known to be wrong when handled by PLPA. The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc support in later releases of Grid Engine / Grid Scheduler. http://gridscheduler.sourceforge.net/ Thanks!! Rayson _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
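For reference, this is roughly the topology query that the hwloc-enabled loadcheck performs. A minimal sketch against the hwloc 1.x C API (an illustration only, not Grid Engine's actual code):

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        int sockets, cores;

        hwloc_topology_init(&topo);      /* create an empty topology object */
        hwloc_topology_load(topo);       /* probe the machine */

        sockets = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);
        cores   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);

        /* the counts reported as m_socket / m_core in loadcheck output */
        printf("m_socket  %d\n", sockets);
        printf("m_core    %d\n", sockets > 0 ? cores / sockets : cores);

        hwloc_topology_destroy(topo);
        return 0;
    }

Built with "gcc topo.c -lhwloc"; on a dual-socket Magny-Cours box this should report 2 sockets and 12 cores per socket, matching the corrected S/C topology string discussed in the follow-up below.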
From raysonlogin at gmail.com Wed Apr 13 12:21:21 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed, 13 Apr 2011 12:21:21 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release In-Reply-To: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> Message-ID: Carlos, I notice that you have "lx24-amd64" instead of "lx26-amd64" for the arch string, so I believe you are running the loadcheck from standard Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of the one from the Open Grid Scheduler page. The existing Grid Engine (including the latest Open Grid Scheduler releases: SGE 6.2u5p1 & SGE 6.2u5p2, or Univa's fork) uses PLPA, and it is known to be wrong on magny-cours. (i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from: http://sourceforge.net/projects/gridscheduler/files/ ) Chansup on the Grid Engine mailing list (it's the general purpose Grid Engine mailing list for now) tested the version I uploaded last night, and seems to work on a dual-socket magny-cours AMD machine. It prints: m_topology SCCCCCCCCCCCCSCCCCCCCCCCCC However, I am still fixing the processor, core id mapping code: http://gridengine.org/pipermail/users/2011-April/000629.html http://gridengine.org/pipermail/users/2011-April/000628.html I compiled the hwloc enabled loadcheck on kernel 2.6.34 & glibc 2.12, so it may not work on machines running lower kernel or glibc versions, you can download it from: http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html Rayson On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez wrote: > This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD system > (and seems to be wrong!): > > arch ? ? ? ? ? ?lx24-amd64 > num_proc ? ? ? ?24 > m_socket ? ? ? ?2 > m_core ? ? ? ? ?12 > m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT > load_short ? ? ?0.29 > load_medium ? ? 0.13 > load_long ? ? ? 0.04 > mem_free ? ? ? ?26257.382812M > swap_free ? ? ? 8191.992188M > virtual_free ? ?34449.375000M > mem_total ? ? ? 32238.328125M > swap_total ? ? ?8191.992188M > virtual_total ? 40430.320312M > mem_used ? ? ? ?5980.945312M > swap_used ? ? ? 0.000000M > virtual_used ? ?5980.945312M > cpu ? ? ? ? ? ? 0.0% > > > Carlos Fernandez Sanchez > Systems Manager > CESGA > Avda. de Vigo s/n. Campus Vida > Tel.: (+34) 981569810, ext. 232 > 15705 - Santiago de Compostela > SPAIN > > -------------------------------------------------- > From: "Rayson Ho" > Sent: Tuesday, April 12, 2011 10:31 PM > To: "Beowulf List" > Subject: [Beowulf] Grid Engine multi-core thread binding enhancement > -pre-alpha release > >> If you are using the "Job to Core Binding" feature in SGE and running >> SGE on newer hardware, then please give the new hwloc enabled >> loadcheck a try. >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> The current hardware topology discovery library (Portable Linux >> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >> hardware topology may not be detected correctly by PLPA. >> >> If you are running SGE on AMD Magny-Cours servers, please post your >> loadcheck output, as it is known to be wrong when handled by PLPA. >> >> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >> support in later releases of Grid Engine / Grid Scheduler. >> >> http://gridscheduler.sourceforge.net/ >> >> Thanks!! 
>> Rayson >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Fri Apr 15 10:12:00 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Fri, 15 Apr 2011 10:12:00 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) Message-ID: Hi all, Distributing Linux application binaries is proven to be a major issue as a lot of people wanted to test the hwloc loadcheck but their Linux versions are older than mine. And compiling SGE from source is not simple neither -- I wrote a quick & dirty guide for those who don't want the add-ons but it's usually the extra stuff & dependencies that fail the build. So I would like to offer pre-compiled binaries and upload them onto sourceforge. I know it's a complicated question - what version of Linux should I use to build Grid Engine / Open Grid Scheduler when the binaries are for others to consume?? (In case you are interested, the quick compile guide is at: http://gridscheduler.sourceforge.net/CompileGridEngineSource.html ) Prakashan: I tried to link it statically, and I even tried to compile an older version of glibc on my machine, but I could not get either of them to work!! Rayson On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath wrote: > Hi Rayson, > > Do you have a statically linked version? Thanks. > > ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by > ./loadcheck) > > Prakashan > > > > On 04/13/2011 09:21 AM, Rayson Ho wrote: >> >> Carlos, >> >> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the >> arch string, so I believe you are running the loadcheck from standard >> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of >> the one from the Open Grid Scheduler page. >> >> The existing Grid Engine (including the latest Open Grid Scheduler >> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and >> it is known to be wrong on magny-cours. >> >> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from: >> http://sourceforge.net/projects/gridscheduler/files/ ) >> >> >> Chansup on the Grid Engine mailing list (it's the general purpose Grid >> Engine mailing list for now) tested the version I uploaded last night, >> and seems to work on a dual-socket magny-cours AMD machine. It prints: >> >> m_topology ? ? 
?SCCCCCCCCCCCCSCCCCCCCCCCCC >> >> However, I am still fixing the processor, core id mapping code: >> >> http://gridengine.org/pipermail/users/2011-April/000629.html >> http://gridengine.org/pipermail/users/2011-April/000628.html >> >> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12, >> so it may not work on machines running lower kernel or glibc versions, >> you can download it from: >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> Rayson >> >> >> >> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez >> ?wrote: >>> >>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD >>> system >>> (and seems to be wrong!): >>> >>> arch ? ? ? ? ? ?lx24-amd64 >>> num_proc ? ? ? ?24 >>> m_socket ? ? ? ?2 >>> m_core ? ? ? ? ?12 >>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT >>> load_short ? ? ?0.29 >>> load_medium ? ? 0.13 >>> load_long ? ? ? 0.04 >>> mem_free ? ? ? ?26257.382812M >>> swap_free ? ? ? 8191.992188M >>> virtual_free ? ?34449.375000M >>> mem_total ? ? ? 32238.328125M >>> swap_total ? ? ?8191.992188M >>> virtual_total ? 40430.320312M >>> mem_used ? ? ? ?5980.945312M >>> swap_used ? ? ? 0.000000M >>> virtual_used ? ?5980.945312M >>> cpu ? ? ? ? ? ? 0.0% >>> >>> >>> Carlos Fernandez Sanchez >>> Systems Manager >>> CESGA >>> Avda. de Vigo s/n. Campus Vida >>> Tel.: (+34) 981569810, ext. 232 >>> 15705 - Santiago de Compostela >>> SPAIN >>> >>> -------------------------------------------------- >>> From: "Rayson Ho" >>> Sent: Tuesday, April 12, 2011 10:31 PM >>> To: "Beowulf List" >>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement >>> -pre-alpha release >>> >>>> If you are using the "Job to Core Binding" feature in SGE and running >>>> SGE on newer hardware, then please give the new hwloc enabled >>>> loadcheck a try. >>>> >>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >>>> >>>> The current hardware topology discovery library (Portable Linux >>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >>>> hardware topology may not be detected correctly by PLPA. >>>> >>>> If you are running SGE on AMD Magny-Cours servers, please post your >>>> loadcheck output, as it is known to be wrong when handled by PLPA. >>>> >>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >>>> support in later releases of Grid Engine / Grid Scheduler. >>>> >>>> http://gridscheduler.sourceforge.net/ >>>> >>>> Thanks!! >>>> Rayson >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
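The "GLIBC_2.7 not found" failure quoted earlier in the thread is glibc symbol versioning: a binary linked on a newer glibc may refuse to load on an older one, which is why the choice of build distribution matters. A small check (plain glibc, nothing Grid Engine specific; an illustration only) that prints both versions involved:

    #include <stdio.h>
    #include <gnu/libc-version.h>   /* glibc-specific header */

    int main(void)
    {
        /* glibc this binary was compiled against (baked in at build time) */
        printf("built against glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);

        /* glibc actually loaded at run time on this host */
        printf("running on glibc %s\n", gnu_get_libc_version());
        return 0;
    }

If the build-time version is newer than the run-time one, the binary is exposed to exactly the loader error above; building on the oldest distribution to be supported (hence the CentOS 5.x suggestion in the replies that follow) avoids the problem.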
From landman at scalableinformatics.com Fri Apr 15 10:19:04 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Apr 2011 10:19:04 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement)
In-Reply-To:
References:
Message-ID: <4DA853D8.8000308@scalableinformatics.com>

On 04/15/2011 10:12 AM, Rayson Ho wrote:

> I know it's a complicated question - what version of Linux should I
> use to build Grid Engine / Open Grid Scheduler when the binaries are
> for others to consume??

I'd recommend a Centos 5.x variant, and possibly a SuSE variant.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From prentice at ias.edu Fri Apr 15 12:15:38 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Apr 2011 12:15:38 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement)
In-Reply-To: <4DA853D8.8000308@scalableinformatics.com>
References: <4DA853D8.8000308@scalableinformatics.com>
Message-ID: <4DA86F2A.5010708@ias.edu>

On 04/15/2011 10:19 AM, Joe Landman wrote:
> On 04/15/2011 10:12 AM, Rayson Ho wrote:
>
>> I know it's a complicated question - what version of Linux should I
>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>> for others to consume??
>
> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
>

I agree, but I think that if you can get your hands on an actual RHEL image, that's what you should use, as long as you already have access to it.

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From raysonlogin at gmail.com Fri Apr 15 12:25:10 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Fri, 15 Apr 2011 12:25:10 -0400
Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement)
In-Reply-To: <4DA86F2A.5010708@ias.edu>
References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu>
Message-ID:

Thanks all!! If I build on Centos 5.6, will the binaries run on SuSE & Ubuntu?? (I don't know what versions of SuSE & Ubuntu most people are using -- I have Ubuntu 10 & 11 on my machines, and F13.)

Rayson

On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal wrote:
> On 04/15/2011 10:19 AM, Joe Landman wrote:
>> On 04/15/2011 10:12 AM, Rayson Ho wrote:
>>
>>> I know it's a complicated question - what version of Linux should I
>>> use to build Grid Engine / Open Grid Scheduler when the binaries are
>>> for others to consume??
>>
>> I'd recommend a Centos 5.x variant, and possibly a SuSE variant.
>>
>
> I agree, but I think that if you can get your hands on an actual RHEL
> image, that's what you should use, as long as you already have access to
> it.
> > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Apr 15 13:49:55 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 15 Apr 2011 13:49:55 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> Message-ID: <4DA88543.6000603@ias.edu> On 04/15/2011 01:40 PM, Chi Chan wrote: > On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal wrote: >> I agree, but I think that if you can get your hands on an actual RHEL >> image, that's what you should use, as long as you already have access to >> it. > > Or just use Oracle Linux, it is free to download and distribute, and > can be used in production: > > http://www.oracle.com/us/technologies/linux/competitive-335546.html > http://www.oracle.com/us/technologies/027617.pdf > > From my experience, Oracle Linux and RHEL are idential, you can > compile applications on Oracle Linux and ship it to run on RHEL boxes. I had recommended RHEL just because its the "gold standard" for all RHEL-derived distros. CentOS and a few others *should* be identical. However, I don't think Oracle is. Doesn't Oracle make some changes to optimize it for running Oracle? I'm not sure of that, which is why I'm asking and not stating. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Sun Apr 17 20:36:40 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Mon, 18 Apr 2011 10:36:40 +1000 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> Message-ID: <4DAB8798.8070102@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 16/04/11 02:25, Rayson Ho wrote: > If I build on Centos 5.6, will the binaries run on > SuSE & Ubuntu?? I'd suggest that if you want them to work (and especially if you want to package them appropriately) then you're far better off getting a build machine for the OS's you want to support. CentOS, SLES, Debian & Ubuntu. We build all our x86 stuff on our CentOS5 cluster and rsync it over to our RHEL5 cluster (sadly we can't just share /usr/local/ between them due to circumstances beyond our control) without issues. 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy =stQg -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Apr 18 09:03:50 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 18 Apr 2011 15:03:50 +0200 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: <4DAB8798.8070102@unimelb.edu.au> References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> <4DAB8798.8070102@unimelb.edu.au> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 18.04.2011 um 02:36 schrieb Christopher Samuel: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 16/04/11 02:25, Rayson Ho wrote: > >> If I build on Centos 5.6, will the binaries run on >> SuSE & Ubuntu?? > > I'd suggest that if you want them to work (and especially > if you want to package them appropriately) then you're > far better off getting a build machine for the OS's you > want to support. CentOS, SLES, Debian & Ubuntu. Before there was only a common and a platform specific tarball. Does it imply to supply *.rpm in the future? It was always nice to just untar SGE and run it as a normal user w/o any root privilege (yes, rpm2cpio could do). And it was one tarball for all Linux variants. I would vote for staying with this. - -- Reuti > We build all our x86 stuff on our CentOS5 cluster and > rsync it over to our RHEL5 cluster (sadly we can't just > share /usr/local/ between them due to circumstances > beyond our control) without issues. > > cheers, > Chris > - -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz > 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy > =stQg > -----END PGP SIGNATURE----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.16 (Darwin) iEYEARECAAYFAk2sNsQACgkQo/GbGkBRnRr55QCdGyBkTKd7EsTWSvVPRWuMQbGA kOQAniYFwJyMOlwcR3ITHS9nAfGRZndh =iknW -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From hahn at mcmaster.ca Mon Apr 18 11:24:00 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 18 Apr 2011 11:24:00 -0400 (EDT) Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> <4DAB8798.8070102@unimelb.edu.au> Message-ID: not to be overly surly, but this really has nothing to do with beowulf and is a rather specialized sge support issue... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Mon Apr 18 12:34:11 2011 From: mathog at caltech.edu (David Mathog) Date: Mon, 18 Apr 2011 09:34:11 -0700 Subject: [Beowulf] Grid Engine build machine Message-ID: Rayson Ho wrote > And compiling SGE from source is not > simple neither -- I wrote a quick & dirty guide for those who don't > want the add-ons but it's usually the extra stuff & dependencies that > fail the build. Does it still use aimk or has it finally gone over to autoconf, automake? As I recall aimk was really touchy the last time I built this (4 years ago), with lots of futzing around to convince it to use library files it should have found on its own. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Apr 18 12:36:19 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 18 Apr 2011 18:36:19 +0200 Subject: [Beowulf] Grid Engine build machine In-Reply-To: References: Message-ID: Am 18.04.2011 um 18:34 schrieb David Mathog: > Rayson Ho wrote >> And compiling SGE from source is not >> simple neither -- I wrote a quick & dirty guide for those who don't >> want the add-ons but it's usually the extra stuff & dependencies that >> fail the build. > > Does it still use aimk Still aimk. -- Reuti > or has it finally gone over to autoconf, automake? > As I recall aimk was really touchy the last time I built this (4 > years ago), with lots of futzing around to convince it to use library > files it should have found on its own. > > Regards, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From raysonlogin at gmail.com Mon Apr 18 14:26:57 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Mon, 18 Apr 2011 14:26:57 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release In-Reply-To: <4DA5E85D.4010801@ats.ucla.edu> References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> <4DA5E85D.4010801@ats.ucla.edu> Message-ID: For those who had issues with earlier version, please try the latest loadcheck v4: http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html I compiled the binary on Oracle Linux, which is compatible with RHEL 5.x, Scientific Linux or Centos 5.x. I tested the binary on the standard Red Hat kernel, and Oracle enhanced "Unbreakable Enterprise Kernel", Fedora 13, Ubuntu 10.04 LTS. Optimizing for AMD's NUMA machine characteristics is on the ToDo list. Rayson On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath wrote: > Hi Rayson, > > Do you have a statically linked version? Thanks. > > ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by > ./loadcheck) > > Prakashan > > > > On 04/13/2011 09:21 AM, Rayson Ho wrote: >> >> Carlos, >> >> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the >> arch string, so I believe you are running the loadcheck from standard >> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of >> the one from the Open Grid Scheduler page. >> >> The existing Grid Engine (including the latest Open Grid Scheduler >> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and >> it is known to be wrong on magny-cours. >> >> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from: >> http://sourceforge.net/projects/gridscheduler/files/ ) >> >> >> Chansup on the Grid Engine mailing list (it's the general purpose Grid >> Engine mailing list for now) tested the version I uploaded last night, >> and seems to work on a dual-socket magny-cours AMD machine. It prints: >> >> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC >> >> However, I am still fixing the processor, core id mapping code: >> >> http://gridengine.org/pipermail/users/2011-April/000629.html >> http://gridengine.org/pipermail/users/2011-April/000628.html >> >> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12, >> so it may not work on machines running lower kernel or glibc versions, >> you can download it from: >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> Rayson >> >> >> >> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez >> ?wrote: >>> >>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD >>> system >>> (and seems to be wrong!): >>> >>> arch ? ? ? ? ? ?lx24-amd64 >>> num_proc ? ? ? ?24 >>> m_socket ? ? ? ?2 >>> m_core ? ? ? ? ?12 >>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT >>> load_short ? ? ?0.29 >>> load_medium ? ? 0.13 >>> load_long ? ? ? 0.04 >>> mem_free ? ? ? ?26257.382812M >>> swap_free ? ? ? 8191.992188M >>> virtual_free ? ?34449.375000M >>> mem_total ? ? ? 32238.328125M >>> swap_total ? ? ?8191.992188M >>> virtual_total ? 40430.320312M >>> mem_used ? ? ? ?5980.945312M >>> swap_used ? ? ? 0.000000M >>> virtual_used ? ?5980.945312M >>> cpu ? ? ? ? ? ? 0.0% >>> >>> >>> Carlos Fernandez Sanchez >>> Systems Manager >>> CESGA >>> Avda. de Vigo s/n. Campus Vida >>> Tel.: (+34) 981569810, ext. 
232 >>> 15705 - Santiago de Compostela >>> SPAIN >>> >>> -------------------------------------------------- >>> From: "Rayson Ho" >>> Sent: Tuesday, April 12, 2011 10:31 PM >>> To: "Beowulf List" >>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement >>> -pre-alpha release >>> >>>> If you are using the "Job to Core Binding" feature in SGE and running >>>> SGE on newer hardware, then please give the new hwloc enabled >>>> loadcheck a try. >>>> >>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >>>> >>>> The current hardware topology discovery library (Portable Linux >>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >>>> hardware topology may not be detected correctly by PLPA. >>>> >>>> If you are running SGE on AMD Magny-Cours servers, please post your >>>> loadcheck output, as it is known to be wrong when handled by PLPA. >>>> >>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >>>> support in later releases of Grid Engine / Grid Scheduler. >>>> >>>> http://gridscheduler.sourceforge.net/ >>>> >>>> Thanks!! >>>> Rayson >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 21 08:59:30 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 21 Apr 2011 14:59:30 +0200 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <4D9E5ECB.60608@ldeo.columbia.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> <4D9E5ECB.60608@ldeo.columbia.edu> Message-ID: <0F717DDD-470A-4B13-B1AF-FBCB034409DC@xs4all.nl> hi, Sometimes going through some old emails. Note in the meantime i switched from AMD-CAL to OpenCL. On Apr 8, 2011, at 3:03 AM, Gus Correa wrote: > Thank you for the information about AMD-CAL and the AMD GPUs. > Does AMD plan any GPU product with 64-bit and ECC, > similar to Tesla/Fermi? Actually DDR5 already calculates a CRC. Not as good as ECC, but it takes care you have a form of checking. Also the amount of bitflips is so little as the quality of this DDR5 is so great, according to some memory experts i spoke with, that this CRC is more than sufficient. As i'm not a memory expert i would advice you to really speak with such a guy instead of some HPC guys here. Now if your organisation wants ECC simply i'm not going to argue. A demand is a demand there. I'm busy pricewise here how to build cheap something that delivers a big punch. 
If you look objectively and then to gpgpu codes, then of course Nvidia has a few years more experience setting up CUDA. This is another problem of course, software support. Both suck at it, to say polite. Yet we want to do calculations cheap huh. Yet if performance matters, then AMD is a very cheap alternative. In both cases of course, programming for a gpu is going to be the bottleneck; historically organisations do not invest in good code, they only invest in hardware and in managers who sit on their behind, drink coffee and do meetings. Objectively most codes you can also code in 32 bits. If we do a simple compare then the HD6990 is there for 540 euro in the shop here. Now that's European prices where salestax is 19%, so in USA probably it's cheaper (if you calculate it back to euro's). Let's now ignore the marketing nonsense ok, as marketing nonsense is marketing nonsense. All those theoretic flops always, they shouldn't allow double counting specific instructions like multiply add. The internals of these gpu's are all organized such that doing efficient matrix calculations on them is very well possible. Not easy to solve well, as the bottleneck will be the bandwidth from the DDR3 cpu ram to the gpu, yet if you look to a lot of calculations, then it's algorithmic possible to do a lot more work at the execution unit side than the bandwidth you need to another node; those execution units, PE's (processing elements) nowadays called, have huge GPR's which can proces all that. With that those tiny cheap power efficient cores can easily take on huge expensive cpu cores. A single set of 4 PE's in case of AMD has a total of 1024 GPR's, can read from a L1 cache when needed and write to a shared local cache of 32KB (shared by 64 pe's). That L1 reads from the memory L2 and all that has a huge bandwidth. That gives you PRACTICAL 3072 PE's @ 0.83 Ghz == 2.5+ Tflop in 32 bits integers. It's not so hard to convert that to 64 bits code if that's what you need. In fact i'm using it to approximate huge integers (prime numbers) of million bit sizes (factorisation of them). Using that efficiently is not easy, yet realize this is 2.5+ Tflop (i should actually say Tera 32 bits integer performance). Good programmers can use todays GPU's very efficiently. The 6000+ series of AMD and the Fermi series of Nvidia are very good and you can use them in a sustained manner. Now the cheapest gpgpu of Nvidia is about $1200 which is the quadro 6000 series and delivers 448 cores @ 1.2Ghz, say roughly 550 Gflop. Of course this is practical what you can achieve, i'm not counting of course multiply-add here as being 2 flops, which is their own definition of how many gflops it gets; first of all i'm not interested in flops but in integers per cycle and secondly i prefer a realistic measure, otherwise we have no measure on how efficiently we use the gpu. If you look from a mathematical viewpoint, it's not so clever from most scientists at todays huge calculations to use floating point. Single precision or double precision, in the end it all backtracks errors and you have complete non-deterministic results with big sureness. Much better are integer transforms where you have 100% lossless calculations so big sureness your calculation is ok. Yet i realize this is a very expertise field with most people who know something about that hiding in secrecy using fake names and some even having fake burials, just in order to disappear. That in itself is all very sad, as progressing science doesn't happen. 
As a result of that scientific world has focussed too much upon floating point. Yet the cards can deliver that as well as we know. The round off errors all those floating point calculations cause are always a huge multiple of bitflips of memory. It's not even in the same league. Now of course calculating with 64 bits integers it's easier to do huge transforms and you can redo your calculation and at some spots you will have deterministic output in such case, in others of course not (depends what you calculate of course - majority is non- deterministic). With 32 bits integers you need a lot of CRT (Chinese Remainder Theorem) tricks to effectively use it for huge transforms, or you simply emulate 64 bits calculations (so with 64 bits precision, please do not confuse with double precision floating point). Getting all that to work is very challenging and not easy, i realize that. Yet look at the huge advantage you give to your scientists in such case. They can look years ahead in the future which is a *huge* advantage. In this manner you'll actually effectively get 2.x Tflop out of those 6990, again that's 2 Tflop calculated in my manner, i'm looking simply at INSTRUCTION LEVEL where 1 instruction represents a single unit of 32 bits; counting the multiply-add instruction as 2 flops is just too confusing for how efficient you manage to load your GPU, if you ask me. In the transforms in fact multiply-add is very silly to use in many cases as that means you're doing some sort of inefficient calculation. Yet that chippie is just 500 euro, versus Nvidia delivers it for 1200 dollar and the nvidia one is factor 3 slower, though still lightyears faster than a CPU solution there (pricewise seen). The quadro 6000 for those who don't realize it, is exactly the same like a Tesla. Just checkout the specs. Yet of course for our lazy scientists all of the above is not so interesting. Just compiling your years 80 code, pushing the enter button, is a lot easier. If you care however for PERFORMANCE, consider spending a couple of thousands to hardware. If you buy 1000 of those 6990's and program in opencl, you actually can also run that at nvidia hardware, might nvidia be so lucky to release very quickly a 22 nm gpu some years from now. By then also nvidia's opencl will probably be supporting the gpu hardware quite ok. So my advice would be: program it in opencl. It's not the most efficient language on the planet, yet it'll work everywhere and you can get probably around 2 Tflop out that 6990 AMD card. That said of course there is a zillion problems still with opencl, yet if you want for $500k in gpu hardware achieve 1 petaflop, you'll have to suffer a bit, and by the time your cluster is there, possibly all big bugs have been fixed in opencl both by amd as well as by nvidia for their gpu lines. Now all this said i do realize that you need a shift in thinking. Whether you use AMD-gpu's or Nvidia, in both cases you'll need great new software. In fact it doesn't even matter whether you program it in OpenCL or CUDA. It's easy to port algorithms from 1 entity to another; getting such algorithm to work is a lot harder than the question what language you program it in. Translating CUDA to openCL is pretty much braindead work which many can carry out as we already saw in some examples. The investment is in the software for the gpu's. You don't buy that in from nvidia nor AMD. You'l have to hire people to program it, as your own scientists simply aren't good enough to program efficiently for that GPU. 
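An illustrative aside: the 32-bit-by-32-bit multiply with a 64-bit result that these integer transforms are built from looks like this as a minimal OpenCL C kernel (kernel only, host-side setup omitted; the buffer names are invented for the example and are not from any existing code):

    /* One 32x32 -> 64-bit product per work-item. */
    __kernel void mul32x32(__global const uint *a,
                           __global const uint *b,
                           __global ulong *product)
    {
        size_t i = get_global_id(0);
        product[i] = (ulong)a[i] * (ulong)b[i];
        /* mul_hi(a[i], b[i]) returns just the high 32 bits when the
           full 64-bit result is not needed. */
    }

The same kernel source compiles unchanged on both AMD and NVIDIA OpenCL stacks, which is the portability argument being made here.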
The old fashionned vision of having scientists solve themselve how to do the calculations is not going to work for gpgpu simply. Now that is a big pitfall that is hard to overcome. All this said, of course there is a few, really very few, applications where a full blown gpu nor hybrid solution is able to solve the problems. Yet usually such claim that it is "not possible" gets done by scientists who are experts in their field, but not very high level in finding solutions how to efficiently get their calculations done in HPC. Regards, Vincent > > The lack of a language standard may still be a hurdle here. > I guess there were old postings here about CUDA and OpenGL. > What fraction of the (non-gaming) GPU code is being written these days > in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using > compiler directives like those in the PGI compilers? > > Thank you, > Gus Correa > > Vincent Diepeveen wrote: >> >> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: >> >>> Vincent Diepeveen wrote: >>> >>>> GPU monster box, which is basically a few videocards inside such a >>>> box stacked up a tad, wil only add a couple of >>>> thousands. >>>> >>> >>> This price may be OK for the videocard-class GPUs, >>> but sounds underestimated, at least for Fermi Tesla. >> >> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 >> note there is a 6 GB version, not aware of price will be $$$$ i bet. >> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro >> >> VERSUS >> >> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. >> >> Factor 100 difference to those cards. >> >> A couple of thousands versus a couple of hundreds of thousands. >> Hope i made my point clear. >> >> >>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla >>> C2050, >>> with 448 cores and 3GB RAM per GPU, cost around $10k. >>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~ >>> $15k. >>> If you care about ECC, that's the price you pay, right? >> >> When fermi released it was a great gpu. >> >> Regrettably they lobotomized the gamers card's double precision as i >> understand, >> So it hardly has double precision capabilities; if you go for >> nvidia you >> sure need a Tesla, >> no question about it. >> >> As a company i would buy in 6990's though, they're a lot cheaper and >> roughly 3x faster >> than the Nvidia's (for some more than 3x for other occassions less >> than >> 3x, note the card >> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). >> >> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units >> for AMD >> versus 448 cores nvidia with 448 execution units of 32 bits >> multiplication. >> >> Especially because multiplication has improved a lot. >> >> Already having written CUDA code some while ago, i wanted the cheap >> gamers card with big >> horse power now at home so i'm toying on a 6970 now so will be >> able to >> report to you what is possible to >> achieve at that card with respect to prime numbers and such. >> >> I'm a bit amazed so little public initiatives write code for the >> AMD gpu's. >> >> Note that DDR5 ram doesn't have ECC by default, but has in case of >> AMD a >> CRC calculation >> (if i understand it correctly). It's a bit more primitive than >> ECC, but >> works pretty ok and shows you >> also when problems occured there, so figuring out remove what goes >> on is >> possible. >> >> Make no mistake that this isn't ECC. >> We know some HPC centers have as a hard requirement ECC, only >> nvidia is >> an alternative then. 
>> >> In earlier posts from some time ago and some years ago i already >> wrote >> on that governments should >> adapt more to how hardware develops rather than demand that >> hardware has >> to follow them. >> >> HPC has too little cash to demand that from industry. >> >> OpenCL i cannot advice at this moment (for a number of reasons). >> >> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but >> majority of codes are possible >> to port quite well (there is exceptions), or easy work arounds. >> >> Any company doing gpgpu i would advice developing both branches of >> code >> at the same time, >> as that gives the company a lot of extra choices for really very >> little >> extra work. Maybe 1 coder, >> and it always allows you to have the fastest setup run your >> production >> code. >> >> That said we can safely expect that from raw performance coming years >> AMD will keep the leading edge >> from crunching viewpoint. Elsewhere i pointed out why. >> >> Even then i'd never bet at just 1 manufacturer. Go for both >> considering >> the cheap price of it. >> >> For a lot of HPC centers the choice of nvidia will be an easy one, as >> the price of the Fermi cards >> is peanuts compared to the price rest of the system and considering >> other demands that's what they'll go for. >> >> That might change once you stick in bunches of videocards in nodes. >> >> Please note that the gpu 'streamcores' or PE's whatever name you >> want to >> give them, are so bloody fast, >> that your code has to work within the PE's themselves and hardly >> use the >> RAM. >> >> Both for Nvidia as well as AMD, the streamcores are so fast, that you >> simply don't want to lose time on the RAM >> when your software runs, let alone that you want to use huge RAM. >> >> Add to that, that nvidia (have to still figure out for AMD) can in >> background stream from and to the gpu's RAM >> from the CPU, so if you do really large calculations involving >> many nodes, >> all that shouldn't be an issue in the first place. >> >> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that >> would >> really amaze me, though i'm sure >> there is cases where that happens. If we see however what was >> ordered it >> mostly is the 3GB Tesla's, >> at least on what has been reported, i have no global statistics on >> that... >> >> Now all choices are valid there, but even then we speak about peanuts >> money compared to the price of >> a single 8 socket Nehalem-ex box, which fully configured will be >> maybe >> $300k-$400k or something? >> >> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD >> 6990 >> is 2000 euro. >> >> There won't be 2 gpu nvidia's any soon because of the choice they >> have >> historically made for the memory controllers. >> See explanation of intel fanboy David Kanter for that at >> realworldtech >> in a special article he wrote there. >> >> Please note i'm not judging AMD nor Nvidia, they have made their >> choices >> based upon totally different >> businessmodels i suspect and we must be happy we have this rich >> choice >> right now between cpu's from different >> manufacturers and gpu's from different manufacturers. >> >> Nvidia really seems to aim at supercomputers, giving their tesla line >> without lobotomization and lobotomizing their >> gamers cards, where AMD aims at gamers and their gamercards have full >> functionality >> without lobotomization. >> >> Total different businessmodels. Both have their advantages and >> disadvantages. 
>> >> From pure performance viewpoint it's easy to see what's faster >> though. >> >> Yet right now i realize all too well that just too many still >> hesitate >> between also offering gpu services additional to >> cpu services, in which case having a gpu, regardless nvidia or amd, >> kicks butt of course from throughput viewpoint. >> >> To be really honest with you guys, i had expected that by 2011 we >> would >> have a gpu reaching far over 1 Teraflop double precision >> handsdown. If >> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 >> gpu's on a single card to get over that Teraflop double precision >> (claim >> is 1.27 Teraflop double precision), >> that really is underneath my expectations from a few years ago. >> >> Now of course i hope you realize i'm not coding double precision >> code at >> all; i'm writing everything in integers of 32 bits for the AMD >> card and >> the Nvidia equivalent also is using 32 bits integers. The ideal >> way to >> do calculations on those cards, so also very big transforms, is using >> the 32 x 32 == 64 bits instructions (that's 2 instructions in case >> of AMD). >> >> Regards, >> Vincent >> >> >>> >>> Gus Correa > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 21 09:11:54 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 21 Apr 2011 15:11:54 +0200 Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges In-Reply-To: <4D9E4162.3030004@pathscale.com> References: <4D9E4162.3030004@pathscale.com> Message-ID: Regrettably the link is not available anymore. Can you expand on it? As they count the cloud computing in units of 1Ghz per cpunode hour, 1 billion computing core hours is something like 1000 gpu's for 1 week? 1 billion sounds impressive nevertheless. Regards, Vincent On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote: > I just saw this on another ML and thought it may be of interest > ------------ > http://googleblog.blogspot.com/2011/04/1-billion-computing-core- > hours-for.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
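A rough sanity check on the "1000 gpu's for 1 week" guess, under stated assumptions (Google's figure counts conventional CPU core-hours; how a GPU would be accounted is purely an assumption here):

    1,000,000,000 core-hours / 8,760 h per year   ~ 114,000 core-years
    1,000 Fermi cards x 448 cores x 168 h         ~ 75 million core-hours per week
    1,000,000,000 / 75,264,000                    ~ 13 weeks

So even counting every CUDA core as a "core", a thousand Fermi-class GPUs would need roughly three months, not one week, to consume the grant; counted as whole devices, 1000 GPUs for a week is only 168,000 device-hours.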
From reuti at staff.uni-marburg.de Thu Apr 21 09:15:13 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Thu, 21 Apr 2011 15:15:13 +0200
Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges
In-Reply-To:
References: <4D9E4162.3030004@pathscale.com>
Message-ID: <5188F4D0-1D69-4B8F-874D-D20FDAC25CF6@staff.uni-marburg.de>

Am 21.04.2011 um 15:11 schrieb Vincent Diepeveen:

> Regrettably the link is not available anymore. Can you expand on it?

For me it's still working. You selected both lines?

--Reuti

> As they count the cloud computing in units of 1Ghz per cpunode hour,
> 1 billion computing core hours is something like 1000 gpu's for 1 week?
>
> 1 billion sounds impressive nevertheless.
>
> Regards,
> Vincent
>
> On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote:
>
>> I just saw this on another ML and thought it may be of interest
>> ------------
>> http://googleblog.blogspot.com/2011/04/1-billion-computing-core-hours-for.html
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
From cbergstrom at pathscale.com Mon Apr 4 12:01:44 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Mon, 04 Apr 2011 23:01:44 +0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk> References: <4D99E097.7060807@st-andrews.ac.uk> Message-ID: <4D99EB68.4020800@pathscale.com> Herbert Fruchtl wrote: > They hear great success stories (which in reality are often prototype > implementations that do one carefully chosen benchmark well), then look at the > API, look at their existing code, and postpone the start of their project until > they have six months spare time for it. And we know when that is. > > The current approach with more or less vendor specific libraries (be they "open" > or not) limits the uptake of GPU computing to a few hardcore developers of > experimental codes who don't mind rewriting their code every two years. It won't > become mainstream until we have a compiler that turns standard Fortran (or C++, > if it has to be) into GPU code. Anything that requires more change than let's > say OpenMP directives is doomed, and rightly so. > Hi Herbert, I think your perspective pretty much nails it (shameless self promotion) http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source) This is really only the tip of the problem and there must also be solutions for scaling *efficiently* across the cluster. (No MPI + CUDA or even HMPP is *not* the answer imho.)
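To make the "more change than OpenMP directives" point in this thread concrete, a minimal sketch of the same loop written both ways follows. This is an illustrative example only, not code from ENZO or HMPP; the CUDA version deliberately shows the extra steps (kernel, launch geometry, explicit copies) that a directive-based tool is meant to hide.

    /* Directive style: the existing loop plus one line. */
    void scale_omp(int n, float a, float *x)
    {
    #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] = a * x[i];
    }

    /* Hand-written CUDA: explicit kernel, launch and host<->device copies. */
    __global__ void scale_kernel(int n, float a, float *x)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            x[i] = a * x[i];
    }

    void scale_cuda(int n, float a, float *x_host)
    {
        float *x_dev;
        size_t bytes = n * sizeof(float);
        cudaMalloc(&x_dev, bytes);
        cudaMemcpy(x_dev, x_host, bytes, cudaMemcpyHostToDevice);
        scale_kernel<<<(n + 255) / 256, 256>>>(n, a, x_dev);
        cudaMemcpy(x_host, x_dev, bytes, cudaMemcpyDeviceToHost);
        cudaFree(x_dev);
    }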
./C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Mon Apr 4 12:53:22 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 4 Apr 2011 09:53:22 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99E097.7060807@st-andrews.ac.uk> References: <4D99E097.7060807@st-andrews.ac.uk> Message-ID: You've described it pretty well.. Look how long it took for "standard libraries" to take advantage of things like MPI to become "of course we use that".. If the original code used standard library calls for things like matrix math, and it's a "drop in" so you could do a "test case" in less than a day or so, you get pretty rapid acceptance. If it requires weeks to just figure out how to make it work, it's going to be in the "when someone specifically funds me to do it". I've seen lots of really interesting things that I'd like to try, but not being independently wealthy or having a patron who is, I have to work on things that other people want done (and, presumably which I also find interesting). I can write proposals to say "it would be really nice to do X because of speculative benefit Y" and every once in a while, someone will say, "Yeah, that sounds good, go check it out". And then we do. But it's a long and time consuming process. For instance, I was just in a presentation last week discussing a recent call for proposals from NASA.. the *shortest* time from proposal to response (yes/no) was around 120 days, the median was around 200 days, and the max was around 400 days plus, depending on the year. http://science.nasa.gov/researchers/sara/grant-stats/ A lot depends on what happens to the budgets as they wend their leisurely way through the program offices at the agencies, then get rolled up in the President's submission, then thrashed in Congress, then allocated, then back through the agency, and finally back down to the program. To provide some perspective on the front end of the process, the program managers at the agencies are winding up their PPBE13 submissions (that's for FY13, starting October 2012, although it also affects FY12 funding) A "new technology" that hasn't been "on the radar" probably has a 2-3 year lag before significant money can be applied to it (at least from government funding sources). Often, one can get smaller sums more quickly out of some general "investigate new technologies" kind of bucket (smaller sums = a few $10k), but right now, even those have essentially dried up (Continuing resolutions, etc.) To tie this back to the first question.. a few $10k would pay for the "Lets try recompiling with the new library and see if it works" sort of level of effort, but not for a "Let's rewrite our codes for the new hardware, and engage in a validation and verification effort to show that it still works" James Lux, P.E. 
Co-Principal Investigator, CoNNeCT Project Task Manager, SOMD Software Defined Radios Flight Communications Systems Section Jet Propulsion Laboratory 4800 Oak Grove Drive, Mail Stop 161-213 Pasadena, CA, 91109 +1(818)354-2075 phone +1(818)393-6875 fax > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Herbert Fruchtl > Sent: Monday, April 04, 2011 8:16 AM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] GP-GPU experience > > They hear great success stories (which in reality are often prototype > implementations that do one carefully chosen benchmark well), then look at the > API, look at their existing code, and postpone the start of their project until > they have six months spare time for it. And we know when that is. > > The current approach with more or less vendor specific libraries (be they "open" > or not) limits the uptake of GPU computing to a few hardcore developers of > experimental codes who don't mind rewriting their code every two years. It won't > become mainstream until we have a compiler that turns standard Fortran (or C++, > if it has to be) into GPU code. Anything that requires more change than let's > say OpenMP directives is doomed, and rightly so. > > Herbert > > > > > I've installed 4 GPU-equipped servers in my environment; 2 are a part of > > my cluster, and 2 are independent from the cluster so that users can > > login interactively and program/debug/tinker/whatever. (My cluster > > doesn't allow interactive logins by design). > > > > A handful of users were interested in getting access to the GPUs, but so > > far, not a single one has even logged into these systems to kick the > > tires yet, and the systems have been online for approx. 9 months. It > > just be that they're busy with other work. Most of my users are > > post-docs who guide their own research, so they can create/modify their > > own project schedules as they see fit. > > > > > > -- > Herbert Fruchtl > Senior Scientific Computing Officer > School of Chemistry, School of Mathematics and Statistics > University of St Andrews > -- > The University of St Andrews is a charity registered in Scotland: > No SC013532 > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mfatica at gmail.com Mon Apr 4 12:54:37 2011 From: mfatica at gmail.com (Massimiliano Fatica) Date: Mon, 4 Apr 2011 09:54:37 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D99EB68.4020800@pathscale.com> References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: If you are old enough to remember the time when the first distribute computers appeared on the scene, this is a deja-vu. Developers used to program on shared memory ( mostly with directives) were complaining about the new programming models ( PVM, MPL, MPI). Even today, if you have a serial code there is no tool that will make your code runs on a cluster. 
Even on a single system, if you try an auto-parallel/auto-vectorizing compiler on a real code, your results will probably be disappointing. When you can get a 10x boost on a production code rewriting some portions of your code to use the GPU, if time to solution is important or you could perform simulations that were impossible before ( for example using algorithms that were just too slow on CPUs, Discontinuous Galerkin method is a perfect example), there are a lot of developers that will write the code. The effort it is clearly dependent of the code, the programmer and the tool used ( you can go from fully custom GPU code with CUDA or OpenCL, to automatically generated CUF kernels from PGI, to directives using HMPP or PGI Accelerator). In situation where time to solution relates to money, for example oil and gas, GPUs are the answer today ( you will be surprised by the number of GPUs in Houston). Look at the performance and scaling of AMBER ( MPI+ CUDA), http://ambermd.org/gpus/benchmarks.htm, and tell me that the results were not worth the effort. Is GPU programming for everyone: probably not, in the same measure that parallel programming in not for everyone. Better tools will lower the threshold, but a threshold will be always present. Massimiliano PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran, applications porting with CUDA, MPI+CUDA). 2011/4/4 "C. Bergstr?m" : > Herbert Fruchtl wrote: >> They hear great success stories (which in reality are often prototype >> implementations that do one carefully chosen benchmark well), then look at the >> API, look at their existing code, and postpone the start of their project until >> they have six months spare time for it. And we know when that is. >> >> The current approach with more or less vendor specific libraries (be they "open" >> or not) limits the uptake of GPU computing to a few hardcore developers of >> experimental codes who don't mind rewriting their code every two years. It won't >> become mainstream until we have a compiler that turns standard Fortran (or C++, >> if it has to be) into GPU code. Anything that requires more change than let's >> say OpenMP directives is doomed, and rightly so. >> > Hi Herbert, > > I think your perspective pretty much nails it > > (shameless self promotion) > http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) > http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf > http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source) > > This is really only the tip of the problem and there must also be > solutions for scaling *efficiently* across the cluster. ?(No MPI + CUDA > or even HMPP is *not* the answer imho.) > > ./C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 15:16:31 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:16:31 +0200 Subject: [Beowulf] Quadrics? 
In-Reply-To: <4D2C8B7C.30300@bull.co.uk> References: <4D2C8B7C.30300@bull.co.uk> Message-ID: hi, sometimes i go through a lot of mails at the mailing list here and had missed this one. please keep me up to date and/or add me to mailing lists there. latency is superior of quadrics compared to all the infini* stuff. drivers that integrate into kernels - well some modifications shouldn't be too hard. Of course even the realtime linux kernel is rather crappy there, as it locks every action from and to a socket (even RAW/UDP communication in fact), so you need a 'hack' of that kernel anyway to get faster latencies. secondhand the quadrics stuff is cheap it seems. Vincent On Jan 11, 2011, at 5:55 PM, Daniel Kidger wrote: > Mark, > > I will let others step forward individually. > > I was one of the last employees to leave Quadrics , so I do know > who had > support contracts at that time, plus the even larger set of sites that > had expired support contracts but still were actively running their > QsNet clusters. > > You know that a company called Vega took on the ongoing support? : > here is the website I set up at the time: https:// > support.hpc.vega.co.uk/ > > I agree too though that there should be a community of QsNet-owning > enthusiasts, who could provide mutual support in this legacy era. > > > Also off the record, I know that there is a lot of Elan4 stock sitting > in a warehouse. As long as you are not looking for long term vendor > support, I expect you could acquire cards, cables and switches for a > bargain price. > > Daniel > > >> Are you still using Quadrics Elan4-based clusters? >> >> We would like to continue using Quadrics on one of our clusters, >> since it >> is still quite good in latency. Maintaining the Quadrics drivers, >> though, >> is a bit of a pain going forward - would be nice to avoid >> duplicating effort, >> if there are other groups also doing so. >> >> please follow up or email me if you are using Elan4, or know anything >> relevant. >> >> thanks, >> Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http:// >> www.sharcnet.ca >> | McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 >> x24687 >> | Compute/Calcul Canada | http:// >> www.computecanada.org >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > > > -- > Bull, Architect of an Open World TM > > Dr. Daniel Kidger, HPC Technical Consultant > daniel.kidger at bull.co.uk > +44 (0) 7966822177 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From diep at xs4all.nl Mon Apr 4 15:20:15 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:20:15 +0200 Subject: [Beowulf] =?iso-8859-1?q?Chinese_supercomputers_to_use_=91homemad?= =?iso-8859-1?q?e=92_chips?= In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> Message-ID: On Mar 11, 2011, at 7:20 AM, Mark Hahn wrote: >> Interesting: >> Chinese supercomputers to use ?homemade? chips >> http://timesofindia.indiatimes.com/tech/personal-tech/computing/ >> Chinese-supercomputers-to-use-homemade-chips/articleshow/7655183.cms > > it's important to remind ourselves that China is still a centrally- > planned, > totalitarian dictatorship. I mention this only because this > announcement > is a bit like Putin et al announcing that they'll develop their own > linux distro because Russia is big and important and mustn't allow > itself to be vulnerable to foreign hegemony. > > so far, the very shallow reporting I've seen has said that future > generations will add wide FP vector units. nothing wrong with that, > though it's a bit unclear to me why other companies haven't done it > if there is, in fact, lots of important vector codes that will run > efficiently on such a configuration. adding/widening vector FP is > not breakthrough engineering afaikt. > > has anyone heard anything juicy about the Tianhe interconnect? > _______________________________________________ Not really but busy with an AMD-GPU now the 6970 (note the 6990 also is available having 2 gpu's) is so fast that the real problem is bandwidth from and to the gpu; so for a big cluster calculation i can understand very well the need for having your own interconnect, especially as they get produced in china anyway. the cpu's you also need bigtime, but as i'm going to react onto a special GPU posting anyway let's move it to that subject. > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 15:26:43 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 21:26:43 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: you can forget about getting much info other than marketing data. the companies and orgainsations that already calculate for years at gpu's they are really good in keeping their mouth shut. But if you realize that even with 16 fast AMD cores (which for this specific prime number code are a LOT FASTER in ipc than any other x64 chip), a box built cheap second hand by the way as it's 4 x 8356 are needed to feed just 1 gpu, you start to realize the real problem. GPU's completely annihilate cpu's everywhere. The limitation is the bandwidth to the gpu, though i didn't fully test that bandwidth yet. 
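For what it is worth, the host-to-device bandwidth called the real limitation here can be measured in a few lines. A minimal CUDA sketch using pinned host memory and event timing is below (the AMD-CAL / OpenCL equivalent uses different calls); treat the buffer size and the single transfer as assumptions, a real test would repeat the copy and average.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t bytes = 256UL << 20;     /* 256 MB test buffer        */
        void *host, *dev;
        cudaMallocHost(&host, bytes);         /* pinned host memory        */
        cudaMalloc(&dev, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        /* Note: the very first transfer may include context set-up cost;
           repeat the copy a few times for a stable figure. */
        cudaEventRecord(start, 0);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("host->device: %.2f GB/s\n", (bytes / (ms / 1000.0)) / 1e9);

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }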
The 6000 series from AMD has much improved multiplication logics, like 2.5x faster than the previous generation and it'll take some time to optimize this code for it. streamcores for a while got renamed to PE's nowadays, processing elements, and it has 1536 per gpu. The 6990 has 2 of 'em. It took a while for a good driver for these gpu's. Last days of januari it was there. AMD-CAL works great here now. There is not much diff with CUDA, other than proprietary ways of how to access things and limbs and a few function calls. Programming is similar. 818 execution units that can do multiplication 32 x 32 bits == 64 bits. That kicks butt. bye bye cpu's. On Mar 21, 2011, at 1:51 PM, Douglas Eadline wrote: > > I was recently given a copy of "GPU Computing Gems" > to review. It is basically research quality NVidia success > stories, some of which are quite impressive. > > I got to thinking about how others are fairing (or not) > with GP-GPU technology. I put up a simple poll on > ClusterMonkey to help get a general idea. > (you can find it on the front page right top) > If you have a moment, please provide > your experience (results are available as well). > > http://www.clustermonkey.net/ > > BTW: You can see all the previous polls > and links to other market data here: > > http://goo.gl/lDcUJ > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 16:07:31 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 22:07:31 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: On Apr 4, 2011, at 6:54 PM, Massimiliano Fatica wrote: > If you are old enough to remember the time when the first distribute > computers appeared on the scene, > this is a deja-vu. Developers used to program on shared memory ( > mostly with directives) were complaining > about the new programming models ( PVM, MPL, MPI). > Even today, if you have a serial code there is no tool that will make > your code runs on a cluster. > Even on a single system, if you try an auto-parallel/auto-vectorizing > compiler on a real code, your results will probably be disappointing. > > When you can get a 10x boost on a production code rewriting some > portions of your code to use the GPU, if time to solution is important Oh comeon factor 10 is not realistic. 
You're doing the usual compare here of a hobby coder who coded a tad in C or slowish C++ (except for a SINGLE, so not several, NCSA coder i'll have to find the first C++ guy who can write codes equally fast to C for complex algorithms - granted for big companies C++ makes more sense, just not when it's about performance) and then compare that with a full blown sponsored project in CUDA that uses the topend gpu and compare it versus a single core instead of 4 sockets (as that's powerwise the same). Moneywise of course is another issue, that's where the gpu's win it bigtime. Yet there is a hidden cost in gpu's, that's you can build something way faster for less money with gpu's, but you also need to pay for a good coder to write your code in either CUDA or AMD-CAL (or as the chinese seem to support both at the same time, which is not so complicated if you have setup things in the correct manner). This last is a big problem for the western world; governments pay big bucks for hardware, but paying good coders what they are worth they seem to forget. Secondly there is another problem, that's that NVIDIA hasn't even released the instructoin set of their GPU. Try to figure that out without fulltime work for it. It seems however pretty similar to AMD, despite other huge architectural differences between the 2; the programming similarity is striking and selfexplains the real purpose where they got designed for (GRAPHICS). > or you could perform simulations that were impossible before ( for > example using algorithms that were just too slow on CPUs, All true yet it takes a LOT OF TIME to write something that's fast on a gpu. First of all you have to not write double precision code, as the gamers card from nvidia seem to not have much double precision logic, they only have 32 bits logics. So at double precision, AMD is like 10 times faster in money per gflop than Nvidia. Yet try to figure that out without being fulltime busy with those gpu's. Only the TESLA versions have those transistors it seems. Secondly Nvidia seems to keep being busy maximizing the frequency of the gpu. Now that might be GREAT for games as high clocked cores work (see intel), yet for throughput of course that's a dead end. In raw throughput AMD's (ATI's) approach will always win it of course from nvidia, as clocking a processor higher has a O ( n ^ 3 ) impact on power consumption. Now a big problem with nvidia is also that they basically go over spec. I didn't really figure it out, yet it seems pci-e got designed with 300 watt in mind max. Yet at this code i'm busy with, the CUDA version of it (mfaktc) consumes a whopping 400+ watt and please realize that majority of the system time is only keeping the streamcores busy and not caches at all nor much of a RAM. It's only doing multiplications of course at full speed in 32 bits code, using the new Fermi's instructions that allows multiplying 32 bits x 32 bits == 64 bits. CUDA version of your code gets developed btw by a guy working for a HPC vendor which, i guess, also sells those Tesla's. So any performance bragging sure must keep in mind it's far over 33% over the specs in terms of power consumption. Note AMD seems to follow nvidia in its path there. > Discontinuous Galerkin method is a perfect example), there are a lot > of developers that will write the code. Oh comeon, writing for gpu's is really complicated. 
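For readers wondering what the 32 x 32 == 64 bits multiply mentioned in this thread looks like in practice, a minimal CUDA sketch follows. It is a hypothetical illustration, not code from mfaktc: the low half of the product comes from an ordinary integer multiply and the high half from the __umulhi() intrinsic.

    /* Full 64-bit products of 32-bit operands, one element per thread. */
    __global__ void mul32x32(const unsigned *a, const unsigned *b,
                             unsigned long long *prod, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            unsigned lo = a[i] * b[i];              /* low 32 bits  */
            unsigned hi = __umulhi(a[i], b[i]);     /* high 32 bits */
            prod[i] = ((unsigned long long)hi << 32) | lo;
        }
    }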
> The effort it is clearly dependent of the code, the programmer and the > tool used ( you can go from fully custom GPU code with CUDA or OpenCL, Forget OpenCL, not good enough. Better to code in CUDA and AMD-CAL at the same time something. > to automatically generated CUF kernels from PGI, to directives using > HMPP or PGI Accelerator). > In situation where time to solution relates to money, for example > oil and gas, GPUs are the answer today ( you will be surprised > by the number of GPUs in Houston). Pardon me, those industries already were using vectorized solutoins long before CUDA was there and are using massively GPU's to calculate of course as soon as nvidia released a version that was programmable. This is not new. All those industries will of course never say anything on the performance nor how many they use. > Look at the performance and scaling of AMBER ( MPI+ CUDA), > http://ambermd.org/gpus/benchmarks.htm, and tell me that the results > were not worth the effort. > > Is GPU programming for everyone: probably not, in the same measure > that parallel programming in not for everyone. > Better tools will lower the threshold, but a threshold will be > always present. > I would argue that both AMD as well as Nvidia has really tried to give the 3d world nations an advantage by stopping progress in the rich nations. I will explain. The real big advantage of rich nations is that average persons have more cash. Students are a good example there. They can afford gpu's easily. Yet there is so little technical information available on latencies and in case of nvidia on instructoin set that the gpu's support, that this gives a huge programming hurdle for students. Also there is no good tips in nvidia documents how to program for those things. The most fundamental lessons how to program a gpu i miss in all documents i scanned so far. It's just a bunch of 'lectures' that's not going to create any topcoders. A piece of information here and a tad there. Very bad. AMD also is a nightmare there, they can't even run more than 1 program at the same time, despite claims that the 4000 series gpu's already had hardware support to do it. The indian helpdesk in fact is so lazy that they didn't even rename the word 'ati' in the documentation to AMD, and the library each few months gets a new name. Stream SDK now it's another new fancy name. "we worked hard in India sahib, yes sahib, yes sahib". Yet 5 years later still not much works. For example in opencl also the 2nd gpu doesn't work in case of AMD. Result "undefined". Nice. Default driver install at inux here doesn't get openCL to work in fact at the 6970. Both nvidia as well as AMD are a total joke there and by means of incompetence, the generic incompetence being complete and clear documentation just like we have documention on how cpu's work. Be it intel or AMD or IBM. Students who program now for those gpu's in CUDA or AMD-CAL, they will have to go to hell and back to get something to work well on it, except some trivial stuff that works well at it. We see that just a few manage. That's not a problem of the students, but a problem for society, because doing calculations faster and especially CHEAP, is a huge advantage to progress science. NSA type organisations in 3d world nations are a lot bigger than here, simply because more people live there. So right now more people over there code for gpu's than here, here where everyone can afford one. Some big companies excepted of course, but this is not a small note on companies. 
This is a note on 1st world versus 3d world. The real difference is students with budget over here. They have budget for gpu's, yet there is no good documentation simply giving which instructions a gpu has let alone which latencies. If you google hard, you will find 1 guy who actually by means of measuring had to measure the latencies of simple instructions that write to the same register. Why did an university guy need to measure this, why isn't this simply in Nvidia documentation? A few of those things will of course have majority, vaste vaste majority of students trying something on a gpu, completely fail. Because they fail, they don't continue there and don't get back from those gpu's a faster running code that gives them something very important: faster calculation speed for whatever they wanted to run. This is where AMD and Nvidia, and i politely call it by means of incompetence, gives the rich nations no advantage over the 3d world nations, as the students need to be compeltely fulltime busy to obtain knowledge on the internal workings of the gpu's in order to get something going fast at them. Majority will fail therefore of course, which has simply avoided gpu's from getting massively adapted. I've seen so many students try and fail at gpu programming, especially CUDA. It's bizarre. The fail % is so huge. Even a big succes doesn't get recognized as a big succes, simply because the guy didn't know about a few bottlenecks in gpu programming, as no manual told him the combination of problems he ran into, as there was no technical data available. It is true gpu's can be fast, but i feel there is a big need for better technical documentation of them. We can no longer ignore this now that 3d world nations are overrunning 1st world nations. Mainly because the sneaky organisations that do know everything are of course bigger over there than here, by means of population size. This where the huge advantage of the rich nations, namely that every student has such gpu at home, is not getting taken advantage from as the hurdle to gpu programming is too high by means of lack of accurate documentation. Of course in 3d world nations they have at most a mobile phone, and very very seldom a laptop (except for the rich elite), let alone a computer with a capable programmable gpu, which makes it impossible for majority of 3d world nations students to do any gpu computation because of a shortage in cash. > > Massimiliano > PS: Full disclosure, I work at Nvidia on CUDA ( CUDA Fortran, > applications porting with CUDA, MPI+CUDA). > > > 2011/4/4 "C. Bergstr?m" : >> Herbert Fruchtl wrote: >>> They hear great success stories (which in reality are often >>> prototype >>> implementations that do one carefully chosen benchmark well), >>> then look at the >>> API, look at their existing code, and postpone the start of their >>> project until >>> they have six months spare time for it. And we know when that is. >>> >>> The current approach with more or less vendor specific libraries >>> (be they "open" >>> or not) limits the uptake of GPU computing to a few hardcore >>> developers of >>> experimental codes who don't mind rewriting their code every two >>> years. It won't >>> become mainstream until we have a compiler that turns standard >>> Fortran (or C++, >>> if it has to be) into GPU code. Anything that requires more >>> change than let's >>> say OpenMP directives is doomed, and rightly so. 
>>> >> Hi Herbert, >> >> I think your perspective pretty much nails it >> >> (shameless self promotion) >> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen) >> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf >> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to >> source) >> >> This is really only the tip of the problem and there must also be >> solutions for scaling *efficiently* across the cluster. (No MPI + >> CUDA >> or even HMPP is *not* the answer imho.) >> >> ./C >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 16:20:02 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 16:20:02 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: > GPU's completely annihilate cpu's everywhere. this is complete nonsense. GPUs do very nicely on a quite narrow set of problems. for a somewhat larger set of problems, they do OK, but pretty "meh", really, considering. for many problems, GPUs are irrelevant, whether that's because the problem uses too much memory, or already scales well on non-GPU, or doesn't have a GPU-friendly structure. > 818 execution units that can do multiplication 32 x 32 bits == 64 bits. > That kicks butt. bye bye cpu's. well, for your application, which is quite narrow. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Apr 4 16:34:19 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 4 Apr 2011 22:34:19 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> On Apr 4, 2011, at 10:20 PM, Mark Hahn wrote: >> GPU's completely annihilate cpu's everywhere. > > this is complete nonsense. GPUs do very nicely on a quite narrow > set of problems. for a somewhat larger set of problems, they do OK, > but pretty "meh", really, considering. 
for many problems, GPUs are > irrelevant, whether that's because the problem uses too much > memory, or already scales well on non-GPU, or doesn't have a GPU- > friendly > structure. > >> 818 execution units that can do multiplication 32 x 32 bits == 64 >> bits. >> That kicks butt. bye bye cpu's. > > well, for your application, which is quite narrow. Which is about any relevant domain where massive computation takes place. The number of algorithms that really profit bigtime from a lot of RAM, in some cases you can also replace by massive computation and a tad of memory, the cases where that cannot be the case are very rare. For those few cases you order a few nodes with massive RAM rather than big cpu power. yet majority of HPC calculations, especially if we add company codes there, the simulators and the oil, gas, car and aviation industry. So that makes 95% of all codes just need massive cpu power and can get away with relative small RAM sizes per compute unit. Not to confuse btw with a compute unit of AMD as that is just a small part of a gpu, speaking of redefinitions :) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 17:54:00 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 17:54:00 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> Message-ID: >> well, for your application, which is quite narrow. > > Which is about any relevant domain where massive computation takes place. you are given to hyperbole. the massive domains I'm thinking of are cosmology and explicit quantum condensed-matter calculations. the experts in those fields I talk to both do use massive computation and do not expect much benefit from GPUs. > The number of algorithms that really profit bigtime from a lot of RAM, in > some cases you can also > replace by massive computation and a tad of memory, the cases where that > cannot be the case > are very rare. no. you are equating "uses lots of ram" with "uses memoization". > yet majority of HPC calculations, especially if we add company codes there, > the simulators and the oil, > gas, car and aviation industry. jeez. nevermind I said anything. I'd forgotten about your style. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From diep at xs4all.nl Mon Apr 4 18:10:44 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 00:10:44 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> Message-ID: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> On Apr 4, 2011, at 11:54 PM, Mark Hahn wrote: >>> well, for your application, which is quite narrow. >> >> Which is about any relevant domain where massive computation takes >> place. > > you are given to hyperbole. the massive domains I'm thinking of > are cosmology and explicit quantum condensed-matter calculations. > the experts in those fields I talk to both do use massive computation > and do not expect much benefit from GPUs. Even the field you give as an example: quantum mechanica: Vaste majority of quantum mechanica calculations are massive matrix calculations. Furthermore i didn't take a look to the field you're speaking about. I did however take a look to 1 other quantum mechanica calculation, where someone used 1 core of his quadcore box and massive RAM. It took me 1 afternoon to explain the guy how to trivially use all 4 cores doing that calculation using the same RAM buffer. You realize that you also can do combined calculations? Just have a new chipset with big bandwidth to gpu, at cpu's, based upon a big RAM buffer, prepare batches, ship batch to gpu, do tough calculation work on the gpu, ship results back. That's how many use those gpu's. My attempt to write a sieve directly into the gpu in order to do everything inside the gpu, is of a different league sir than where you are talking. Your kind of talking is: "there are no tanks in the city, we will drive all tanks out of the city, so that only our cpu's are left again". Those days are over. Just get creative and find a way to do it at a gpu. I parallellized 1 quantum mechanica calculation there; i wasn't paid for that. Just pay someone to useful use a GPU. If it ain't easy it doesn't mean it's impossible. Most quantum mechanica guys might be brilliant in their field, in manners how to parallellize things without losing their branching factor that a huge RAM buffer gives, they didn't figure out simply yet. Now it won't be easy to solve for every field; but being a speedfreak and in advance saying some faster type of hardware cannot be used is just monkeytalk. Go get clever and solve the problem. Find solutions, don't see just problems. > >> The number of algorithms that really profit bigtime from a lot of >> RAM, in some cases you can also >> replace by massive computation and a tad of memory, the cases >> where that cannot be the case >> are very rare. > > no. you are equating "uses lots of ram" with "uses memoization". > >> yet majority of HPC calculations, especially if we add company >> codes there, the simulators and the oil, >> gas, car and aviation industry. > > jeez. > nevermind I said anything. I'd forgotten about your style. Read the statistics on the reports what eats system time sir. You have access to those papers as well if you know how to google. 
Regards, Vincent _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 4 18:20:08 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 4 Apr 2011 18:20:08 -0400 (EDT) Subject: [Beowulf] GP-GPU experience In-Reply-To: <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <7FAA295F-3185-45E9-A297-10C2F61DEA52@xs4all.nl> <7385788F-FC47-4693-9EBD-7F551ABD93FE@xs4all.nl> Message-ID: >>>> well, for your application, which is quite narrow. >>> >>> Which is about any relevant domain where massive computation takes place. >> >> you are given to hyperbole. the massive domains I'm thinking of >> are cosmology and explicit quantum condensed-matter calculations. >> the experts in those fields I talk to both do use massive computation >> and do not expect much benefit from GPUs. > > Even the field you give as an example: quantum mechanica: > Vaste majority of quantum mechanica calculations are massive matrix > calculations. yes, specifically very large sparse eigensystems. do you have an example of effectively using GPUs for this? > Furthermore i didn't take a look to the field you're speaking about. > I did however take a look to 1 other quantum mechanica calculation, > where someone used 1 core of his quadcore box and massive RAM. sorry, I'm talking thousands of cores, ideally with > 4GB/core. > It took me 1 afternoon to explain the guy how to trivially use all 4 cores > doing that calculation > using the same RAM buffer. the point is that lots of serious science uses MPI already, and doesn't care much about GPUs. if they were free, sure, they might be interesting. > My attempt to write a sieve directly into the gpu in order to do everything > inside the gpu, > is of a different league sir than where you are talking. bully for you. your application is a niche. > Your kind of talking is: "there are no tanks in the city, we will drive all > tanks out of the city, so that only > our cpu's are left again". nonsense. I'm saying that GPUs are a nice, specialized accelerator. you can't have them without hosts, so you need to compare host vs host+GPU. > Those days are over. Just get creative and find a way to do it at a gpu. don't be silly. GPUs have weaknesses as well as strengths. packaging and system design is one of the minor sticking points with GPUs. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
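On the question of very large sparse eigensystems: whether a GPU helps depends mostly on whether the matrix fits in device memory and on double precision throughput, but the kernel that dominates Lanczos or Arnoldi type eigensolvers is a sparse matrix-vector product, and that part at least maps onto a GPU. A deliberately naive CSR sketch, one row per thread, is below; a serious implementation would use a tuned storage format or a vendor library rather than this layout.

    /* Naive CSR sparse matrix-vector product y = A*x, one row per thread. */
    __global__ void spmv_csr(int nrows, const int *rowptr, const int *col,
                             const double *val, const double *x, double *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < nrows) {
            double sum = 0.0;
            for (int j = rowptr[row]; j < rowptr[row + 1]; j++)
                sum += val[j] * x[col[j]];
            y[row] = sum;
        }
    }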
From lindahl at pbm.com Tue Apr 5 01:22:39 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 4 Apr 2011 22:22:39 -0700 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> Message-ID: <20110405052239.GA6130@bx9.net> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: > If you are old enough to remember the time when the first distribute > computers appeared on the scene, > this is a deja-vu. Not to mention the prior appearance of array processors. Oil+Gas bought a lot of those, too. Some important radio astronomy data reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B was 10X faster than the VAX by itself. Then microprocessor-based workstations arrived, and the game was over, ease of use FTW. > Even on a single system, if you try an auto-parallel/auto-vectorizing > compiler on a real code, your results will probably be disappointing. The wins from such compilers have been steadily decreasing, as main memory gets farther and farther away from the CPU and caches. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From beat at 0x1b.ch Tue Apr 5 01:52:41 2011 From: beat at 0x1b.ch (Beat Rubischon) Date: Tue, 05 Apr 2011 07:52:41 +0200 Subject: [Beowulf] Quadrics? In-Reply-To: References: <4D2C8B7C.30300@bull.co.uk> Message-ID: <4D9AAE29.5090207@0x1b.ch> Hi Vincent! On 04.04.11 21:16, Vincent Diepeveen wrote: > latency is superior of quadrics compared to all the infini* stuff. Quadrics was great stuff - but it was outperformed once Mellanox invited their ConnectX chips. Additional the Quadrics team never got their PCIe chips (QSnet III) to fly. Finally the company closed their doors in may 09. I really liked their hard- and software. But the time is over... > Of course even the realtime linux kernel is rather crappy there, as > it locks every action from and to a socket (even RAW/UDP > communication in fact), so you need a 'hack' of that kernel anyway to > get faster latencies. When talking about Interconnects the kernel is not involved in communication. Any context switch is avoided to keep the overhead small. This basically means a real time kernel isn't needed as it would not give you any additional benefit. Beat -- \|/ Beat Rubischon ( 0-0 ) http://www.0x1b.ch/~beat/ oOO--(_)--OOo--------------------------------------------------- Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 03:51:00 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 09:51:00 +0200 Subject: [Beowulf] Quadrics? In-Reply-To: <4D9AAE29.5090207@0x1b.ch> References: <4D2C8B7C.30300@bull.co.uk> <4D9AAE29.5090207@0x1b.ch> Message-ID: <75CD2C36-0B25-4CD2-B3F8-2645BE1A72DC@xs4all.nl> On Apr 5, 2011, at 7:52 AM, Beat Rubischon wrote: > Hi Vincent! 
> > On 04.04.11 21:16, Vincent Diepeveen wrote: >> latency is superior of quadrics compared to all the infini* stuff. > > Quadrics was great stuff - but it was outperformed once Mellanox > invited > their ConnectX chips. Additional the Quadrics team never got their > PCIe > chips (QSnet III) to fly. Finally the company closed their doors in > may 09. > > I really liked their hard- and software. But the time is over... > of course there is new great pci-e solutions, yet the price per port there is bigger than entire machine with latest gpu, that's a big problem to make cheap clusters. If you buy a cheap 6 core box of 350 euro then a new generation gpu is 318 euro or so (that's a HD6970). What's node price of the network? >> Of course even the realtime linux kernel is rather crappy there, as >> it locks every action from and to a socket (even RAW/UDP >> communication in fact), so you need a 'hack' of that kernel anyway to >> get faster latencies. > > When talking about Interconnects the kernel is not involved in > communication. Any context switch is avoided to keep the overhead > small. > This basically means a real time kernel isn't needed as it would not > give you any additional benefit. realtime kernel keeps other worst cases down bigtime, especially with respect to scheduling. > > Beat > > -- > \|/ Beat Rubischon > ( 0-0 ) http://www.0x1b.ch/~beat/ > oOO--(_)--OOo--------------------------------------------------- > Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 03:58:47 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 09:58:47 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <20110405052239.GA6130@bx9.net> References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> <20110405052239.GA6130@bx9.net> Message-ID: On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote: > On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: > >> If you are old enough to remember the time when the first distribute >> computers appeared on the scene, >> this is a deja-vu. > > Not to mention the prior appearance of array processors. Oil+Gas > bought a lot of those, too. Some important radio astronomy data > reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B > was 10X faster than the VAX by itself. Then microprocessor-based > workstations arrived, and the game was over, ease of use FTW. > >> Even on a single system, if you try an auto-parallel/auto-vectorizing >> compiler on a real code, your results will probably be disappointing. > > The wins from such compilers have been steadily decreasing, as main > memory gets farther and farther away from the CPU and caches. > > -- greg It's different this time indeed; classic cpu's will never again deliver big performance. cache - coherency is simply too complicated with many cores. cpu's also will need a manycore co-processor therefore. 
furthermore manycores simply are cheaper to produce and they can eat a bigger powerbudget. 3 very powerful arguments which regrettably limits cpu's, but that's the price we pay for progress. It won't mean cpu's will go away of course any soon, they're so generic and easy to program that they will survive. Just offload the calculations to the manycores. please don't estimate the argument of cheaper to produce. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 04:04:35 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 10:04:35 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <4D99E097.7060807@st-andrews.ac.uk> <4D99EB68.4020800@pathscale.com> <20110405052239.GA6130@bx9.net> Message-ID: On Apr 5, 2011, at 9:58 AM, Vincent Diepeveen wrote: > > On Apr 5, 2011, at 7:22 AM, Greg Lindahl wrote: > >> On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: >> >>> If you are old enough to remember the time when the first distribute >>> computers appeared on the scene, >>> this is a deja-vu. >> >> Not to mention the prior appearance of array processors. Oil+Gas >> bought a lot of those, too. Some important radio astronomy data >> reduction algorithms were coded for them -- a VAX 11/780+FPS AP120B >> was 10X faster than the VAX by itself. Then microprocessor-based >> workstations arrived, and the game was over, ease of use FTW. >> >>> Even on a single system, if you try an auto-parallel/auto- >>> vectorizing >>> compiler on a real code, your results will probably be >>> disappointing. >> >> The wins from such compilers have been steadily decreasing, as main >> memory gets farther and farther away from the CPU and caches. >> >> -- greg > Early Morning oh oh oh oh, apologies the context might be clear yet the sentences were written down wrong. > It's different this time indeed; classic cpu's will never again > deliver big performance. > ack > cache - coherency is simply too complicated with many cores. 1) Cache-coherency is too complicated for CPU's > cpu's also will need a manycore co-processor therefore. > ack > furthermore manycores simply are cheaper to produce and they can eat > a bigger powerbudget. > ack > 3 very powerful arguments which regrettably limits cpu's, but that's > the price we pay for progress. > ack > It won't mean cpu's will go away of course any soon, they're so > generic and easy to program that > they will survive. Just offload the calculations to the manycores. > ack > please don't estimate the argument of cheaper to produce. 
> > please don't UNDERESTIMATE the argument of cheaper to produce only 6 out of 8 score = 75% sharp in the morning > >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Tue Apr 5 05:10:28 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 05 Apr 2011 19:10:28 +1000 Subject: [Beowulf] GP-GPU experience In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> Message-ID: <4D9ADC84.7030804@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 05/04/11 05:26, Vincent Diepeveen wrote: > GPU's completely annihilate cpu's everywhere. Great! Where can I get one with 1TB of on-card RAM to keep our denovo reassembly people happy ? - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0 POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t =OMhq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Apr 5 09:05:19 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 5 Apr 2011 15:05:19 +0200 Subject: [Beowulf] GP-GPU experience In-Reply-To: <4D9ADC84.7030804@unimelb.edu.au> References: <68A57CCFD4005646957BD2D18E60667B1292EC74@milexchmb1.mil.tagmclarengroup.com> <46317.192.168.93.213.1299685688.squirrel@mail.eadline.org> <45254.192.168.93.213.1300711866.squirrel@mail.eadline.org> <4D9ADC84.7030804@unimelb.edu.au> Message-ID: <2538ED2A-7F07-4524-B74E-6F0AE623916E@xs4all.nl> On Apr 5, 2011, at 11:10 AM, Christopher Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 05/04/11 05:26, Vincent Diepeveen wrote: > >> GPU's completely annihilate cpu's everywhere. > > Great! Where can I get one with 1TB of on-card RAM to > keep our denovo reassembly people happy ? There is already several projects in that area that tried incorporate GPU's and with succes. Just google a bit, i got bunches of hits from all sorts of research institutes in that area, most already over 2 years old, nothing new there. 
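The practical content behind Vincent's "just offload the calculations to the manycores" is that, for now, the programmer stages the data onto the device and launches the kernel explicitly. A minimal sketch of that pattern in C, purely illustrative (it is none of the packages alluded to in this thread, and it assumes a compiler with OpenMP target-offload support, which arrived well after this exchange; with no device present the loop simply runs on the host):

/* offload_sketch.c: illustrative only.  Shows the offload pattern, not any
 * particular library: copy the inputs to the accelerator, run the loop
 * there, copy the result back.  Assumes a GCC/Clang built with an OpenMP
 * offload target; otherwise the loop falls back to the host.
 * Build (one possibility): gcc -O2 -fopenmp offload_sketch.c
 */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* Map a and b to the device, compute c there, map c back. */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %.1f\n", c[10]);
    return 0;
}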
Your reaction just shows your ignorance there. Regards, Vincent > > - -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk2a3IQACgkQO2KABBYQAh8HEwCfXfv8+1yhvtAxUStqBHI9zPv0 > POsAn1cs/vjgTV9s+F9+aIN9nIz+I87t > =OMhq > -----END PGP SIGNATURE----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Apr 6 06:58:12 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 6 Apr 2011 11:58:12 +0100 Subject: [Beowulf] Westmere EX Message-ID: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ 10 core Westmere EX on an eight socket box = 80 cores These would be a very nice machine. Anyone know if machines like this will be built? Do the sockets have enough Quickpath links to create an 8-way topology? John Hearns | CFD Hardware Specialist | McLaren Racing Limited McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK T: +44 (0) 1483 261000 D: +44 (0) 1483 262352 F: +44 (0) 1483 261010 E: john.hearns at mclaren.com W: www.mclaren.com The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From brice.goglin at gmail.com Wed Apr 6 07:05:55 2011 From: brice.goglin at gmail.com (Brice Goglin) Date: Wed, 06 Apr 2011 13:05:55 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9C4913.10802@gmail.com> Le 06/04/2011 12:58, Hearns, John a ?crit : > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way topology? > You only have 4 QPI links per sockets, no way to connect the entire graph. Supermicro already announced such 8-way machines. 
See their QPI topology on page 30 of the motherboard manual available at http://www.supermicro.com/products/motherboard/Xeon7000/7500/X8OBN-F.cfm Brice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cap at nsc.liu.se Wed Apr 6 11:41:18 2011 From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=) Date: Wed, 6 Apr 2011 17:41:18 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <201104061741.18972.cap@nsc.liu.se> On Wednesday, April 06, 2011 12:58:12 pm Hearns, John wrote: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way topology? > > > John Hearns | CFD Hardware Specialist | McLaren Racing Limited > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK The HP DL980 is an 8 socket EX box but it's not glue-less (it uses HPs own numa interconnect). If you stuff 'em full of dimms then they're probably competitive with the 4 socket 580 (assuming the 980 uses 8G dimms instead of 16G for the 580...). /Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Apr 6 14:00:17 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 6 Apr 2011 20:00:17 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> On Apr 6, 2011, at 12:58 PM, Hearns, John wrote: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. > Anyone know if machines like this will be built? > Do the sockets have enough Quickpath links to create an 8-way > topology? What do you intend to use the machines for? For a chessprogram they would be great, but none of those guys has the cash to pay for these machines. For financial world it would be a waste of money as well as the latency probably will be very very bad. They seem to get equipped with a max of 512GB ram, not really much for those who badly need a lot of RAM, if we consider the price of such a configured machine. Same price like a power7. 
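For reference, the arithmetic behind the QPI-topology question is straightforward: a fully connected mesh of 8 sockets needs 7 inter-socket links per socket (28 links in all), and Westmere-EX offers only 4 QPI links per socket. A glueless 8-socket board is therefore necessarily a partial mesh in which some socket pairs are two hops apart (the approach of the Supermicro X8OBN-F board Brice points to); the alternative is to go through external node controllers, which is how the HP DL980 that Peter mentions reaches 8 sockets.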
> > > John Hearns | CFD Hardware Specialist | McLaren Racing Limited > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK > > T: +44 (0) 1483 261000 > D: +44 (0) 1483 262352 > F: +44 (0) 1483 261010 > E: john.hearns at mclaren.com > W: www.mclaren.com > > > > > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Apr 6 14:12:56 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 6 Apr 2011 19:12:56 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > What do you intend to use the machines for? > For a chessprogram they would be great, but none of those guys has > the cash to pay for these > machines. The Supermicro board which Bruce Goglin refers to is said to support 16gbytes DIMMS. Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a 1024 Gbyte machine, plus you can cook your dinner on it. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Wed Apr 6 14:18:35 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Thu, 07 Apr 2011 01:18:35 +0700 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9CAE7B.8000900@pathscale.com> Hearns, John wrote: >> What do you intend to use the machines for? >> For a chessprogram they would be great, but none of those guys has >> the cash to pay for these >> machines. >> > > > > The Supermicro board which Bruce Goglin refers to is said to support > 16gbytes DIMMS. > > Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a > 1024 Gbyte machine, > plus you can cook your dinner on it. > LOL.. 
(I have to admit that's kinda funny, but only because it's true) I didn't look at the specs, but I wonder how many IOPS you could get off a ram disk on that thing.. $60k is I believe (I could be wrong) in the same ballpark as 1T 1U 1million IOPS appliances (albeit they offer persistence and probably consume less power as well) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hearnsj at googlemail.com Wed Apr 6 18:02:47 2011 From: hearnsj at googlemail.com (John Hearns) Date: Wed, 6 Apr 2011 23:02:47 +0100 Subject: [Beowulf] Westmere EX In-Reply-To: <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> Message-ID: On 6 April 2011 19:00, Vincent Diepeveen wrote: > > On Apr 6, 2011, at 12:58 PM, Hearns, John wrote: > >> http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > What do you intend to use the machines for? Maybe something like: http://www.youtube.com/watch?v=x2Z3h_Hx310&NR=1 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Wed Apr 6 20:39:19 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 6 Apr 2011 20:39:19 -0400 (EDT) Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > 10 core Westmere EX on an eight socket box = 80 cores > These would be a very nice machine. shrug. does anyone have serious experience with real apps on manycore machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, but they're substantially more exotic/rare/expensive.) I bet there will be 100x more 4s servers build with these chips than 8s. and 1000x more 2s than 4s... a friend noticed something weird on intel's spec sheets: http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E notice it says 32GB max memory size. even if that means 32GB/socket, it's not all that much. I don't know about everyone else, but I'm already bored with core counts ;) these also seem fairly warm (130W), considering that they're the fancy new 32nm process and run at modest clock rates... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
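As a sanity check on Hearns's memory arithmetic earlier in the thread: 1024 GB at 16 GB per DIMM is 64 DIMMs, and 64 x $944 comes to roughly $60,400, or about $59 per GB, before counting the sockets, the board, or the power needed to keep it all cool.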
From joshua_mora at usa.net Wed Apr 6 20:57:43 2011 From: joshua_mora at usa.net (Joshua mora acosta) Date: Wed, 06 Apr 2011 19:57:43 -0500 Subject: [Beowulf] Westmere EX Message-ID: <093PDga5r8464S02.1302137863@web02.cms.usa.net> _3D_ FFT scaling will allow you to see how well balanced is the system. Joshua ------ Original Message ------ Received: 07:40 PM CDT, 04/06/2011 From: Mark Hahn To: Beowulf Mailing List Subject: Re: [Beowulf] Westmere EX > > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/ > > > > 10 core Westmere EX on an eight socket box = 80 cores > > These would be a very nice machine. > > shrug. does anyone have serious experience with real apps on manycore > machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, > but they're substantially more exotic/rare/expensive.) > > I bet there will be 100x more 4s servers build with these chips than 8s. > and 1000x more 2s than 4s... > > a friend noticed something weird on intel's spec sheets: > http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC3E > > notice it says 32GB max memory size. even if that means 32GB/socket, > it's not all that much. > > I don't know about everyone else, but I'm already bored with core counts ;) > these also seem fairly warm (130W), considering that they're the fancy > new 32nm process and run at modest clock rates... > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlforrest at berkeley.edu Wed Apr 6 22:15:17 2011 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed, 06 Apr 2011 19:15:17 -0700 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9D1E35.9040802@berkeley.edu> On 4/6/2011 5:39 PM, Mark Hahn wrote: > shrug. does anyone have serious experience with real apps on manycore > machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, > but they're substantially more exotic/rare/expensive.) I have a couple 48-core 1U boxes. They can build gcc and other large packages very quickly. The scientists who run single process simulations also like them but they're not real picky about how long it takes for something to run. They also generally spend close to no time at all optimizing anything. -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
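Joshua's 3D-FFT suggestion is the right kind of balance test; for a quicker first look at how a box like Jon's 48-core 1U nodes behaves as threads are added, even a memory-bound loop is informative. Below is a minimal sketch (not from the thread; plain C plus OpenMP, assuming gcc with -fopenmp), meant to be rerun with different OMP_NUM_THREADS settings and timed:

/* omp_scale.c: a minimal, illustrative scaling probe (not from this thread).
 * Build: gcc -O2 -fopenmp omp_scale.c -o omp_scale
 * Run:   OMP_NUM_THREADS=1 ./omp_scale ; OMP_NUM_THREADS=48 ./omp_scale ; ...
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (64L * 1024 * 1024)   /* three arrays of doubles, ~1.5 GB total */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) { fprintf(stderr, "out of memory\n"); return 1; }

    /* Initialise in parallel so first-touch places pages on the NUMA node
     * of the thread that will later use them. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    double t0 = omp_get_wtime();
    for (int rep = 0; rep < 10; rep++) {
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            c[i] = a[i] + 3.0 * b[i];   /* STREAM-triad-like, memory bound */
    }
    double t1 = omp_get_wtime();

    printf("%d threads: %.3f s (c[1] = %.1f)\n",
           omp_get_max_threads(), t1 - t0, c[1]);
    free(a); free(b); free(c);
    return 0;
}

If the times stop improving long before the core count runs out, the limit is the sockets' memory channels rather than the cores, and a compute-bound application will tell a happier story than this loop does.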
From john.hearns at mclaren.com Thu Apr 7 04:43:06 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 09:43:06 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> Message-ID: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > On 4/6/2011 5:39 PM, Mark Hahn wrote: > > > shrug. does anyone have serious experience with real apps on > manycore > > machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, > > but they're substantially more exotic/rare/expensive.) > > I have a couple 48-core 1U boxes. They can build > gcc and other large packages very quickly. > > The scientists who run single process simulations > also like them but they're not real picky about > how long it takes for something to run. They also > generally spend close to no time at all optimizing > anything. "Premature optimization is the root of all evil" - Donald Knuth I'm also interested in the response to Mark Hahn's question - I guess that's why I started this thread really! Also as I've said before, with the advent of affordable manycore systems like this, we're going to have to dust off those old skills practised in the age of SMP monster machines - which were probably something like the same specs as these affordable systems! The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Thu Apr 7 04:56:33 2011 From: eugen at leitl.org (Eugen Leitl) Date: Thu, 7 Apr 2011 10:56:33 +0200 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour Message-ID: <20110407085633.GE23560@leitl.org> http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n 10,000-core Linux supercomputer built in Amazon cloud Cycle Computing builds cloud-based supercomputing cluster to boost scientific research. By Jon Brodkin, Network World April 06, 2011 03:15 PM ET High-performance computing expert Jason Stowe recently asked two of his engineers a simple question: Can you build a 10,000-core cluster in the cloud? "It's a really nice round number," says Stowe, the CEO and founder of Cycle Computing, a vendor that helps customers gain fast and efficient access to the kind of supercomputing power usually reserved for universities and large research organizations. SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to Linux To continue reading, register here to become an Insider. You'll get free access to premium content from CIO, Computerworld, CSO, InfoWorld, and Network World. See more Insider content or sign in. High-performance computing expert Jason Stowe recently asked two of his engineers a simple question: Can you build a 10,000-core cluster in the cloud? 
"It's a really nice round number," says Stowe, the CEO and founder of Cycle Computing, a vendor that helps customers gain fast and efficient access to the kind of supercomputing power usually reserved for universities and large research organizations. SUPERCOMPUTERS: Microsoft breaks petaflop barrier, loses Top 500 spot to Linux Cycle Computing had already built a few clusters on Amazon's Elastic Compute Cloud that scaled up to several thousand cores. But Stowe wanted to take it to the next level. Provisioning 10,000 cores on Amazon has probably been done numerous times, but Stowe says he's not aware of anyone else achieving that number in an HPC cluster, meaning one that uses a batch scheduling technology and runs an HPC-optimized application. "We haven't found references to anything larger," Stowe says. Had it been tested for speed, the Linux-based cluster Stowe ran on Amazon might have been big enough to make the Top 500 list of the world's fastest supercomputers. One of the first steps was finding a customer that would benefit from such a large cluster. There's no sense in spinning up such a large environment unless it's devoted to some real work. The customer that opted for the 10,000-core cloud cluster was biotech company Genentech in San Francisco, where scientist Jacob Corn needed computing power to examine how proteins bind to each other, in research that might eventually lead to medical treatments. Compared to the 10,000-core cluster, "we're a tenth the size internally," Corn says. Cycle Computing and Genentech spun up the cluster on March 1 a little after midnight, based on Amazon's advice regarding the optimal time to request 10,000 cores. While Amazon offers virtual machine instances optimized for high-performance computing, Cycle and Genentech instead opted for a "standard vanilla CentOS" Linux cluster to save money, according to Stowe. CentOS is a version of Linux based on Red Hat's Linux. The 10,000 cores were composed of 1,250 instances with eight cores each, as well as 8.75TB of RAM and 2PB disk space. Scaling up a couple of thousand cores at a time, it took 45 minutes to provision the whole cluster. There were no problems. "When we requested the 10,000th core, we got it," Stowe said. The cluster ran for eight hours at a cost of $8,500, including all the fees to Amazon and Cycle Computing. (See also: Start-up transforms unused desktop cycles into fast server clusters) For Genentech, this was cheap and easy compared to the alternative of buying 10,000 cores for its own data center and having them idle away with no work for most of their lives, Corn says. Using Genentech's existing resources to perform the simulations would take weeks or months instead of the eight hours it took on Amazon, he says. Genentech benefited from the high number of cores because its calculations were "embarrassingly parallel," with no communication between nodes, so performance stats "scaled linearly with the number of cores," Corn said. To provision the cluster, Cycle used its own CycleCloud software, the Condor scheduling system and Chef, an open source configuration management framework. Cycle also used some of its own software to detect errors and restart nodes when necessary, a shared file system, and a few extra nodes on top of the 10,000 to handle some of the legwork. To ensure security, the cluster was engineered with secure-HTTP and 128/256-bit Advanced Encryption Standard encryption, according to Cycle. 
Cycle Computing boasted that the cluster was roughly equivalent to the 114th fastest supercomputer in the world on the Top 500 list, which hit about 66 teraflops. In reality, they didn't run the speed benchmark required to submit a cluster to the Top 500 list, but nearly all of the systems listed below No. 114 in the ranking contain fewer than 10,000 cores. Genentech is still waiting to see whether the simulations lead to anything useful in the real world, but Corn says the data "looks fantastic." He says Genentech is "very open" to building out more Amazon clusters, and Cycle Computing is looking ahead as well. "We're already working on scaling up larger," Stowe says. All Cycle needs is a customer with "a use case to take advantage of it." Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin Read more about data center in Network World's Data Center section. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 7 08:47:54 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 14:47:54 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4A59BB0D-53F3-4352-8EFE-6F686FF97372@xs4all.nl> <207BB2F60743C34496BE41039233A809041E3DBD@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <33396FD5-FBAA-4735-8694-B0D7FE7EAA84@xs4all.nl> On Apr 6, 2011, at 8:12 PM, Hearns, John wrote: >> What do you intend to use the machines for? >> For a chessprogram they would be great, but none of those guys has >> the cash to pay for these >> machines. > > > > The Supermicro board which Bruce Goglin refers to is said to support > 16gbytes DIMMS. > > Quick Google says $944 dollars per DIMM, so $60 000 memory cost for a > 1024 Gbyte machine, > plus you can cook your dinner on it. > Except that you can't buy the machine equipped with that for $60k in a shop. 512GB equipped 8 socket nehalem-ex (8 core version 2.26Ghz) was introduced at $205k, that's without further equipment such as huge storage, so basic configuration when ordered at Oracle. So this box will probably be $250k or $300k or so? Regards, Vincent > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
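For what it's worth, the numbers in the article are self-consistent: $8,500 for eight hours is roughly $1,060 an hour (hence the ~1 kUSD/hour in the subject line), or about 10.6 cents per core-hour across 10,000 cores, and 8.75 TB spread over 1,250 eight-core instances is 7 GB of RAM apiece. The linear scaling also follows directly from the shape of the workload rather than from anything clever in the plumbing: the production run farmed out independent jobs through Condor, with no traffic between nodes. A hedged, MPI-flavored sketch of that structure (illustrative only, not Cycle's or Genentech's code; the task and its name are made up) looks like this:

/* ep_sketch.c: illustration of an embarrassingly parallel run.
 * Build: mpicc -O2 ep_sketch.c -o ep_sketch
 * Run:   mpirun -np <ranks> ./ep_sketch
 */
#include <stdio.h>
#include <mpi.h>

/* Stand-in for one independent unit of work (say, scoring one candidate
 * protein-protein binding pose); purely hypothetical. */
static double score_task(long task_id)
{
    double x = (double)task_id;
    for (int i = 0; i < 1000; i++)
        x = x * 0.999 + 1.0;
    return x;
}

int main(int argc, char **argv)
{
    const long ntasks = 1000000;     /* total independent tasks */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Static cyclic distribution: rank r takes tasks r, r+size, r+2*size...
     * No rank ever waits on another, so throughput scales with rank count. */
    double local = 0.0;
    for (long t = rank; t < ntasks; t += size)
        local += score_task(t);

    /* The only communication in the whole job: one number per rank at the end. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, %ld tasks, aggregate score %g\n", size, ntasks, total);

    MPI_Finalize();
    return 0;
}

Swap the toy scoring loop for a real per-task binary and the same shape is what a Condor vanilla-universe submission expresses, just with no MPI in the picture at all.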
From diep at xs4all.nl Thu Apr 7 08:52:43 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 14:52:43 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> On Apr 7, 2011, at 10:43 AM, Hearns, John wrote: >> >> On 4/6/2011 5:39 PM, Mark Hahn wrote: >> >>> shrug. does anyone have serious experience with real apps on >> manycore >>> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >>> but they're substantially more exotic/rare/expensive.) >> >> I have a couple 48-core 1U boxes. They can build >> gcc and other large packages very quickly. >> >> The scientists who run single process simulations >> also like them but they're not real picky about >> how long it takes for something to run. They also >> generally spend close to no time at all optimizing >> anything. > > "Premature optimization is the root of all evil" - Donald Knuth > > > I'm also interested in the response to Mark Hahn's question - I guess > that's why I started this thread really! > > Also as I've said before, with the advent of affordable manycore > systems > like this, we're going > to have to dust off those old skills practised in the age of SMP > monster > machines - which were probably > something like the same specs as these affordable systems! > it's not clear what 'these' refers to. 48 core AMD multicore machine: $8000 on ebay i saw one for. Of course not much of a RAM and not fastest chip. Let's say fully configured about double that price. GPU monster box, which is basically a few videocards inside such a box stacked up a tad, wil only add a couple of thousands. But a 8 socket @ 10 core nehalem-ex, in basic configuration will be already far above $205k. Probably a $300k or so when configured. Huge price difference. So i assume you didn't refer to the Nehalem-ex box. > The contents of this email are confidential and for the exclusive > use of the intended recipient. If you receive this email in error > you should not copy it, retransmit it, use it or disclose its > contents but should return it to the sender immediately and delete > your copy. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 09:49:09 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 09:49:09 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <4D9D1E35.9040802@berkeley.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> Message-ID: <4D9DC0D5.8060802@ias.edu> On 04/06/2011 10:15 PM, Jon Forrest wrote: > On 4/6/2011 5:39 PM, Mark Hahn wrote: > >> shrug. 
does anyone have serious experience with real apps on manycore >> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >> but they're substantially more exotic/rare/expensive.) > > I have a couple 48-core 1U boxes. They can build > gcc and other large packages very quickly. But are the makes definitely running in parallel to take advantage of the multiple cores? I haven't built gcc, so don't know if it uses make's -j option to do parallel builds. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:03:47 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:03:47 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <20110407085633.GE23560@leitl.org> References: <20110407085633.GE23560@leitl.org> Message-ID: <4D9DC443.9080502@ias.edu> On 04/07/2011 04:56 AM, Eugen Leitl wrote: > > "It's a really nice round number," says Stowe, the CEO and founder of Cycle > Computing, Clearly he's a marketing man. Everyone know real computer guys think in powers of 2. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:16:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:16:53 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <20110407085633.GE23560@leitl.org> References: <20110407085633.GE23560@leitl.org> Message-ID: <4D9DC755.5070004@ias.edu> A great publicity stunt, but I still don't think it qualifies as a "real" HPC cluster achievement. See comments/objections in-line below. On 04/07/2011 04:56 AM, Eugen Leitl wrote: > > http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2011/040611-linux-supercomputer.html&pagename=/news/2011/040611-linux-supercomputer.html&pageurl=http://www.networkworld.com/news/2011/040611-linux-supercomputer.html&site=datacenter&nsdr=n > > The cluster ran for eight hours That's not very long for HPC jobs. How much would the performance have degraded if it started to run into the daytime hours, when demand for CPU cycles in EC2 would be at their peak? > Genentech benefited from the high number of cores > because its calculations were "embarrassingly parallel," with no > communication between nodes, so performance stats "scaled linearly with the > number of cores," Corn said. > So it wasn't really a cluster at all, but a giant batch scheduling system. I probably have a stricter sense of what makes a cluster than some others, so let's not argue on the the definition of cluster and split hairs. In my book, a cluster involves parallel communication between the processes using MPI, PVM or some other parallel communications paradigm. And BTW, my comments are not directed Eugene for posting this. Just starting a general discussion on this article... 
-- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:21:19 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:21:19 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. Message-ID: <4D9DC85F.9080503@ias.edu> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? In all these commercials, the protagonists say "to the cloud" for their solution, but then when they show them using Microsoft Windows to access "the cloud", they're not using the cloud at all. In fact, in one commercial, the one where the wife/mother is fixing the family portrait, she's using a photoshop-like program on her own desktop, not even the Internet is needed. Not only do they use the term "cloud" incorrectly, they don't even show how using Microsoft products give you and advantage for using "the cloud" AAAAAAARRRRRRRGGGH! Okay. Venting over. Whew! I feel better already. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Thu Apr 7 10:27:27 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 15:27:27 +0100 Subject: [Beowulf] Microsoft "cloud" commercials. References: <4D9DC85F.9080503@ias.edu> Message-ID: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In London there is a saturation of Microsoft Cloud advert posters in the mainline stations and Tube lines serving the City (the financial district) and Canary Wharf. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 10:40:28 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 10:40:28 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <4D9DC85F.9080503@ias.edu> <207BB2F60743C34496BE41039233A8090424538F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4D9DCCDC.1080607@ias.edu> On 04/07/2011 10:27 AM, Hearns, John wrote: >> >> Is anyone else as annoyed by the Microsoft "cloud" commercials as I > am? >> >> > In London there is a saturation of Microsoft Cloud advert posters in the > mainline stations > and Tube lines serving the City (the financial district) and Canary > Wharf. > But do they annoy you? 
;) For those of you outside the US, here's the commercials I'm referring to: 1. http://youtu.be/-HRrbLA7rss 2. http://youtu.be/mjtqoQE_ezA 3. http://youtu.be/_lu6v6hE_bA 4. http://youtu.be/Lel3swo4RMc Out of these only (1) could possibly be using the cloud, if they're using Google docs or something similar to create and share their documents. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at UR.Rochester.edu Thu Apr 7 10:49:09 2011 From: scrusan at UR.Rochester.edu (Crusan, Steve) Date: Thu, 7 Apr 2011 10:49:09 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. References: <4D9DC85F.9080503@ias.edu> Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> Windows HPC Server 2008 also has a builtin feature for an end user to submit excel docs to a windows cluster to do intense timesheet and office supplies calculations... ---------------------- Steve Crusan System Administrator Center for Research Computing -----Original Message----- From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal Sent: Thu 4/7/2011 10:21 AM To: Beowulf Mailing List Subject: [Beowulf] Microsoft "cloud" commercials. Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? In all these commercials, the protagonists say "to the cloud" for their solution, but then when they show them using Microsoft Windows to access "the cloud", they're not using the cloud at all. In fact, in one commercial, the one where the wife/mother is fixing the family portrait, she's using a photoshop-like program on her own desktop, not even the Internet is needed. Not only do they use the term "cloud" incorrectly, they don't even show how using Microsoft products give you and advantage for using "the cloud" AAAAAAARRRRRRRGGGH! Okay. Venting over. Whew! I feel better already. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Apr 7 11:03:25 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu, 07 Apr 2011 11:03:25 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DC755.5070004@ias.edu> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> Message-ID: <4D9DD23D.8090908@sonsorol.org> The CycleComputing folks are good people in my book and I bet more than a few are subscribed to this list. The founders are old-school Condor gurus with a long track record in this field. 
One of the nice things about their work is how "usable" it is to real people with real production computing requirements - in the IAAS cloud space there are way too many marketing robots talking vague BS about "cloud bursting", "hybrid clusters" and storage aggregation/access across LAN/WAN distances. Cycle has built, deployed & delivered all of this with (what I'd consider) a bare minimum of marketing and chest thumping. It's not a PR gimmick and limiting the definition of "cluster" to only systems that run parallel applications would alienate quite a few of us on this list :) In the life sciences a typical cluster might run a mixture of 80-90% serial jobs with a small scattering of real MPI apps running alongside. I get cynical about this stuff because in the cloud space you see way too many commercial people promising the world without actually delivering anything (other than carefully hand-managed reference account projects) while the academic & supercomputing folks are all busy presenting and bragging about things that will never see the light of day after their thesis defense. There are people like Cycle/Rightscale etc. etc. who actually rise above the hype and deliver clever & usable stuff with a minimum of marketing BS. My $.02 of course -Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 11:05:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:05:53 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> References: <4D9DC85F.9080503@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> Message-ID: <4D9DD2D1.9070309@ias.edu> "Cluster" != "Cloud" The Cloud, by definition requires the Internet. Clusters do not. In fact, I bet the NSA can show you many clusters that are not connect to the Internet at all. While I'm at it, "Grid" != ("Cluster" || "Cloud") either! On 04/07/2011 10:49 AM, Crusan, Steve wrote: > Windows HPC Server 2008 also has a builtin feature for an end user to > submit excel docs to a windows cluster to do intense timesheet and > office supplies calculations... > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 10:21 AM > To: Beowulf Mailing List > Subject: [Beowulf] Microsoft "cloud" commercials. > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In all these commercials, the protagonists say "to the cloud" for their > solution, but then when they show them using Microsoft Windows to access > "the cloud", they're not using the cloud at all. > > In fact, in one commercial, the one where the wife/mother is fixing the > family portrait, she's using a photoshop-like program on her own > desktop, not even the Internet is needed. > > Not only do they use the term "cloud" incorrectly, they don't even show > how using Microsoft products give you and advantage for using "the cloud" > > AAAAAAARRRRRRRGGGH! > > Okay. Venting over. Whew! I feel better already. 
> > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 11:13:43 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:13:43 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DD23D.8090908@sonsorol.org> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <4D9DD4A7.7060601@ias.edu> On 04/07/2011 11:03 AM, Chris Dagdigian wrote: > > The CycleComputing folks are good people in my book and I bet more than > a few are subscribed to this list. The founders are old-school Condor > gurus with a long track record in this field. > > One of the nice things about their work is how "usable" it is to real > people with real production computing requirements - in the IAAS cloud > space there are way too many marketing robots talking vague BS about > "cloud bursting", "hybrid clusters" and storage aggregation/access > across LAN/WAN distances. Cycle has built, deployed & delivered all of > this with (what I'd consider) a bare minimum of marketing and chest > thumping. > > It's not a PR gimmick and limiting the definition of "cluster" to only > systems that run parallel applications would alienate quite a few of us > on this list :) In the life sciences a typical cluster might run a > mixture of 80-90% serial jobs with a small scattering of real MPI apps > running alongside. Do not confuse "scientific computing" or "high performance computing" with "cluster". All terms are definitely related, but you can do scientific/high-perfomance computing without a "cluster." As someone who also works in life sciences, I know that there are a lot of life science tasks that are embarrassingly parallel. Running these tasks on a bunch of different machines simultaneously is definitely scientific and high performance computing, but it doesn't necessarily require a cluster. Folding at home, for example. > > I get cynical about this stuff because in the cloud space you see way > too many commercial people promising the world without actually > delivering anything (other than carefully hand-managed reference account > projects) while the academic & supercomputing folks are all busy > presenting and bragging about things that will never see the light of > day after their thesis defense. Me, too, which is why I started ranting about Microsoft's cloud commercials in a separate thread. ;) It's also why I'm starting to get picky about how the term "cluster" is used. More and more, I see people confusing "cloud" with "cluster". I guess that cynicism is what caused me to reply to the original post. > > There are people like Cycle/Rightscale etc. etc. who actually rise above > the hype and deliver clever & usable stuff with a minimum of marketing BS. 
> > My $.02 of course > > -Chris > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at UR.Rochester.edu Thu Apr 7 11:13:32 2011 From: scrusan at UR.Rochester.edu (Crusan, Steve) Date: Thu, 7 Apr 2011 11:13:32 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> Message-ID: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> Oh I understand the difference, but I thought I'd take this opportunity to bash MS. But, since MS's cloud runs off of MS Azure and MS Server 2008, I would bet the excel functionality would be possible. ---------------------- Steve Crusan System Administrator Center for Research Computing -----Original Message----- From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal Sent: Thu 4/7/2011 11:05 AM Cc: Beowulf Mailing List Subject: Re: [Beowulf] Microsoft "cloud" commercials. "Cluster" != "Cloud" The Cloud, by definition requires the Internet. Clusters do not. In fact, I bet the NSA can show you many clusters that are not connect to the Internet at all. While I'm at it, "Grid" != ("Cluster" || "Cloud") either! On 04/07/2011 10:49 AM, Crusan, Steve wrote: > Windows HPC Server 2008 also has a builtin feature for an end user to > submit excel docs to a windows cluster to do intense timesheet and > office supplies calculations... > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 10:21 AM > To: Beowulf Mailing List > Subject: [Beowulf] Microsoft "cloud" commercials. > > Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? > > In all these commercials, the protagonists say "to the cloud" for their > solution, but then when they show them using Microsoft Windows to access > "the cloud", they're not using the cloud at all. > > In fact, in one commercial, the one where the wife/mother is fixing the > family portrait, she's using a photoshop-like program on her own > desktop, not even the Internet is needed. > > Not only do they use the term "cloud" incorrectly, they don't even show > how using Microsoft products give you and advantage for using "the cloud" > > AAAAAAARRRRRRRGGGH! > > Okay. Venting over. Whew! I feel better already. 
> > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at mclaren.com Thu Apr 7 11:13:24 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 16:13:24 +0100 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <207BB2F60743C34496BE41039233A809042454EA@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > There are people like Cycle/Rightscale etc. etc. who actually rise > above > the hype and deliver clever & usable stuff with a minimum of marketing > BS. > > My $.02 of course Surely your $.02 per cpu per minute? The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 11:15:58 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 11:15:58 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> Message-ID: <4D9DD52E.8040103@ias.edu> Oh, sorry. I missed the sarcasm. I thought you were defending MS. The "office supplies calculations" should have tripped my sarcasm detector immediately! Sorry. I'm in a rare (and ranting!) mood today. Must be time for a vacation. Prentice On 04/07/2011 11:13 AM, Crusan, Steve wrote: > Oh I understand the difference, but I thought I'd take this opportunity > to bash MS. > > But, since MS's cloud runs off of MS Azure and MS Server 2008, I would > bet the excel functionality would be possible. > > ---------------------- > Steve Crusan > System Administrator"Crusan, Steve" > Center for Research Computing > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal > Sent: Thu 4/7/2011 11:05 AM > Cc: Beowulf Mailing List > Subject: Re: [Beowulf] Microsoft "cloud" commercials. 
> > "Cluster" != "Cloud" > > The Cloud, by definition requires the Internet. Clusters do not. In > fact, I bet the NSA can show you many clusters that are not connect to > the Internet at all. > > While I'm at it, "Grid" != ("Cluster" || "Cloud") either! > > > On 04/07/2011 10:49 AM, Crusan, Steve wrote: >> Windows HPC Server 2008 also has a builtin feature for an end user to >> submit excel docs to a windows cluster to do intense timesheet and >> office supplies calculations... >> >> ---------------------- >> Steve Crusan >> System Administrator >> Center for Research Computing >> >> >> >> -----Original Message----- >> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal >> Sent: Thu 4/7/2011 10:21 AM >> To: Beowulf Mailing List >> Subject: [Beowulf] Microsoft "cloud" commercials. >> >> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? >> >> In all these commercials, the protagonists say "to the cloud" for their >> solution, but then when they show them using Microsoft Windows to access >> "the cloud", they're not using the cloud at all. >> >> In fact, in one commercial, the one where the wife/mother is fixing the >> family portrait, she's using a photoshop-like program on her own >> desktop, not even the Internet is needed. >> >> Not only do they use the term "cloud" incorrectly, they don't even show >> how using Microsoft products give you and advantage for using "the cloud" >> >> AAAAAAARRRRRRRGGGH! >> >> Okay. Venting over. Whew! I feel better already. >> >> -- >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Thu Apr 7 11:35:24 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Thu, 07 Apr 2011 11:35:24 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <4D9DC0D5.8060802@ias.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <4D9DC0D5.8060802@ias.edu> Message-ID: <4D9DD9BC.3090102@runnersroll.com> On 04/07/11 09:49, Prentice Bisbal wrote: > On 04/06/2011 10:15 PM, Jon Forrest wrote: >> On 4/6/2011 5:39 PM, Mark Hahn wrote: >> >>> shrug. does anyone have serious experience with real apps on manycore >>> machines? (I'm familiar with SGI boxes, where 80 is fairly ho-hum, >>> but they're substantially more exotic/rare/expensive.) >> >> I have a couple 48-core 1U boxes. They can build >> gcc and other large packages very quickly. > > But are the makes definitely running in parallel to take advantage of > the multiple cores? 
I haven't built gcc, so don't know if it uses make's > -j option to do parallel builds. > Yes, see: http://gcc.gnu.org/install/build.html In general I see quite nice speedups on my four-core machine at home running Gentoo, but I find that running -j above the core count, up to 2x the core count, tends to produce better results as many packages (especially with recursive makes) tend to mix configuration (low cpu usage) with makes (high cpu usage). The Gentoo Handbook itself suggests cores+1 for the -j parameter. Going higher than the core count with -j is purely a heuristic, and a few packages will degrade a bit because in fact 8 (2x4 cores) processes are spawned, each contending heavily for the 4 cores, and context switching starts to slow things down and hurt locality. Once again, I suppose this is a YMMV situation. It would be cool to hack make to dynamically throttle parallelization based on cpu usage within some given bounds... I have access to a 48-core box, so if I get a chance I'll generate a graph for the list on gcc build times by -j count. Note however that I don't have root access so I can't clear caches, which should be taken into account when examining results. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From ellis at runnersroll.com Thu Apr 7 11:42:18 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Thu, 07 Apr 2011 11:42:18 -0400 Subject: [Beowulf] Microsoft "cloud" commercials. In-Reply-To: <4D9DD52E.8040103@ias.edu> References: <4D9DC85F.9080503@ias.edu><9B78C75A4DA8554DBF89D5B53E6361DF08ECE6FF@ITS-EXC2.UR.Rochester.edu> <4D9DD2D1.9070309@ias.edu> <9B78C75A4DA8554DBF89D5B53E6361DF08ECE700@ITS-EXC2.UR.Rochester.edu> <4D9DD52E.8040103@ias.edu> Message-ID: <4D9DDB5A.3030700@runnersroll.com> >>> -----Original Message----- >>> From: beowulf-bounces at beowulf.org on behalf of Prentice Bisbal >>> Sent: Thu 4/7/2011 10:21 AM >>> To: Beowulf Mailing List >>> Subject: [Beowulf] Microsoft "cloud" commercials. >>> >>> Is anyone else as annoyed by the Microsoft "cloud" commercials as I am? I completely agree. It's a darn shame all those Truth campaigns concentrate on drugs - clarifying popular media is a desperately needed service for so many domains (at least in US media). Although I have to admit I'm not sure whether the cloud misnomer or the disgusting family dynamics of the Photoshop commercial bothers me more. Always the dopey dad with these commercials... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
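For anyone who wants to reproduce the -j sweep Ellis describes earlier in this digest (gcc build times as a function of make's -j count), a quick-and-dirty driver along the following lines would be enough. It is only a sketch under stated assumptions, not the harness Ellis is using: it assumes it is run from an already-configured build tree (e.g. a gcc objdir) where "make clean" is cheap, and the -j values in the table are just examples.

    /* Hypothetical -j sweep driver: rebuilds the tree at several -j levels
     * and prints wall-clock seconds.  Caches are not dropped between runs
     * (as Ellis notes, that needs root), so treat the first pass as warm-up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const int jobs[] = { 1, 2, 4, 8, 16, 32, 48, 96 };
        const size_t njobs = sizeof jobs / sizeof jobs[0];
        char cmd[128];

        for (size_t i = 0; i < njobs; i++) {
            if (system("make clean > /dev/null 2>&1") != 0)
                fprintf(stderr, "warning: make clean failed\n");

            time_t t0 = time(NULL);
            snprintf(cmd, sizeof cmd, "make -j%d > build-j%d.log 2>&1",
                     jobs[i], jobs[i]);
            int rc = system(cmd);
            time_t t1 = time(NULL);

            printf("-j%-3d  %6ld s   (exit status %d)\n",
                   jobs[i], (long)(t1 - t0), rc);
        }
        return 0;
    }

Compile with something like "gcc -O2 jsweep.c -o jsweep" (the file name is made up) and run it from the build directory; plotting the printed times against the -j column gives the kind of graph Ellis has in mind.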
From john.hearns at mclaren.com Thu Apr 7 11:53:31 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 7 Apr 2011 16:53:31 +0100 Subject: [Beowulf] Westmere EX References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A8090424562F@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > -----Original Message----- > From: Vincent Diepeveen [mailto:diep at xs4all.nl] > Sent: 07 April 2011 13:53 > > But a 8 socket @ 10 core nehalem-ex, in basic configuration will be > already far above $205k. Probably a $300k or > so when configured. > > Huge price difference. > > So i assume you didn't refer to the Nehalem-ex box. I was referring to the Nehalem. http://www.lasystems.be/Supermicro/SYS-5086B-TRF/Superserver5086B-TRF8-W ay/product/248987.html Add 8 CPUs at $4000 per cpu, and 64 DIMMs at $944 per DIMM The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Thu Apr 7 12:11:13 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Thu, 07 Apr 2011 12:11:13 -0400 Subject: [Beowulf] 10 kCore cluster in Amazon cloud, costs ~1 kUSD/hour In-Reply-To: <4D9DD23D.8090908@sonsorol.org> References: <20110407085633.GE23560@leitl.org> <4D9DC755.5070004@ias.edu> <4D9DD23D.8090908@sonsorol.org> Message-ID: <4D9DE221.2040806@runnersroll.com> On 04/07/11 11:03, Chris Dagdigian wrote: > One of the nice things about their work is how "usable" it is to real > people with real production computing requirements - in the IAAS cloud I wonder what "real" people with "real" production computing requirements means here. See below for further thoughts on my thoughts on "real" codes and where I suspect they arise. > It's not a PR gimmick and limiting the definition of "cluster" to only > systems that run parallel applications would alienate quite a few of us > on this list :) In the life sciences a typical cluster might run a > mixture of 80-90% serial jobs with a small scattering of real MPI apps > running alongside. I'm certainly a pragmatist here - use the machines as your organization feels is best. However I still have a strong suspicion that most jobs are serial because of: 1. Lack of experience properly parallelizing codes 2. Lack of proper environment on one's own desktop (i.e. Linux or group licenses) 3. In rare cases such rapid development and short lifetime of a code that parallelizing it will take longer than poorly serially coding it and tolerating the run-times. I can only hope that within the decade the programming paradigm shifts along with the hardware and the average bloke becomes at least exposed to basic parallel programming concepts. The machine is still a "cluster" - the way it's used shouldn't guide what it is referred to. 
That doesn't mean running serial jobs on a machine tailored for parallel ones is the best way to use your time/money. Probably better for one to simply buy Linux desktops for all the employees, put them on a typical GigE network and have the employees submit jobs to some tiny server in the Bosses office which routes jobs evenly to everyone's machine distributed throughout the building. ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From gus at ldeo.columbia.edu Thu Apr 7 12:25:35 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 12:25:35 -0400 Subject: [Beowulf] Westmere EX In-Reply-To: <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> Message-ID: <4D9DE57F.4040303@ldeo.columbia.edu> Vincent Diepeveen wrote: > GPU monster box, which is basically a few videocards inside such a > box stacked up a tad, wil only add a couple of > thousands. > This price may be OK for the videocard-class GPUs, but sounds underestimated, at least for Fermi Tesla. Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, with 448 cores and 3GB RAM per GPU, cost around $10k. For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. If you care about ECC, that's the price you pay, right? Gus Correa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cap at nsc.liu.se Thu Apr 7 13:26:51 2011 From: cap at nsc.liu.se (Peter =?iso-8859-1?q?Kjellstr=F6m?=) Date: Thu, 7 Apr 2011 19:26:51 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <201104071926.56911.cap@nsc.liu.se> On Thursday, April 07, 2011 02:39:19 am Mark Hahn wrote: ... > I bet there will be 100x more 4s servers build with these chips than 8s. > and 1000x more 2s than 4s... Sounds about right :-) Not your average compute node by a long shot. > a friend noticed something weird on intel's spec sheets: > http://ark.intel.com/Product.aspx?id=53580&processor=E7-8870&spec-codes=SLC > 3E > > notice it says 32GB max memory size. even if that means 32GB/socket, > it's not all that much. Certainly looks odd on that page but does likely refer to max DIMM size. With 64 DIMMs (4 socket example) that would then give you 2T. > I don't know about everyone else, but I'm already bored with core counts ;) > these also seem fairly warm (130W), considering that they're the fancy > new 32nm process and run at modest clock rates... It's the size of the beast... (caused by the number of cores and size of last level cache). /Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Thu Apr 7 15:26:57 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 7 Apr 2011 21:26:57 +0200 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <4D9DE57F.4040303@ldeo.columbia.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> Message-ID: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > Vincent Diepeveen wrote: > >> GPU monster box, which is basically a few videocards inside such a >> box stacked up a tad, wil only add a couple of >> thousands. >> > > This price may be OK for the videocard-class GPUs, > but sounds underestimated, at least for Fermi Tesla. Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 note there is a 6 GB version, not aware of price will be $$$$ i bet. or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro VERSUS 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. Factor 100 difference to those cards. A couple of thousands versus a couple of hundreds of thousands. Hope i made my point clear. > Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, > with 448 cores and 3GB RAM per GPU, cost around $10k. > For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. > If you care about ECC, that's the price you pay, right? When fermi released it was a great gpu. Regrettably they lobotomized the gamers card's double precision as i understand, So it hardly has double precision capabilities; if you go for nvidia you sure need a Tesla, no question about it. As a company i would buy in 6990's though, they're a lot cheaper and roughly 3x faster than the Nvidia's (for some more than 3x for other occassions less than 3x, note the card has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD versus 448 cores nvidia with 448 execution units of 32 bits multiplication. Especially because multiplication has improved a lot. Already having written CUDA code some while ago, i wanted the cheap gamers card with big horse power now at home so i'm toying on a 6970 now so will be able to report to you what is possible to achieve at that card with respect to prime numbers and such. I'm a bit amazed so little public initiatives write code for the AMD gpu's. Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a CRC calculation (if i understand it correctly). It's a bit more primitive than ECC, but works pretty ok and shows you also when problems occured there, so figuring out remove what goes on is possible. Make no mistake that this isn't ECC. We know some HPC centers have as a hard requirement ECC, only nvidia is an alternative then. In earlier posts from some time ago and some years ago i already wrote on that governments should adapt more to how hardware develops rather than demand that hardware has to follow them. 
HPC has too little cash to demand that from industry. OpenCL i cannot advice at this moment (for a number of reasons). AMD-CAL and CUDA are somewhat similar. Sure there is differences, but majority of codes are possible to port quite well (there is exceptions), or easy work arounds. Any company doing gpgpu i would advice developing both branches of code at the same time, as that gives the company a lot of extra choices for really very little extra work. Maybe 1 coder, and it always allows you to have the fastest setup run your production code. That said we can safely expect that from raw performance coming years AMD will keep the leading edge from crunching viewpoint. Elsewhere i pointed out why. Even then i'd never bet at just 1 manufacturer. Go for both considering the cheap price of it. For a lot of HPC centers the choice of nvidia will be an easy one, as the price of the Fermi cards is peanuts compared to the price rest of the system and considering other demands that's what they'll go for. That might change once you stick in bunches of videocards in nodes. Please note that the gpu 'streamcores' or PE's whatever name you want to give them, are so bloody fast, that your code has to work within the PE's themselves and hardly use the RAM. Both for Nvidia as well as AMD, the streamcores are so fast, that you simply don't want to lose time on the RAM when your software runs, let alone that you want to use huge RAM. Add to that, that nvidia (have to still figure out for AMD) can in background stream from and to the gpu's RAM from the CPU, so if you do really large calculations involving many nodes, all that shouldn't be an issue in the first place. So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would really amaze me, though i'm sure there is cases where that happens. If we see however what was ordered it mostly is the 3GB Tesla's, at least on what has been reported, i have no global statistics on that... Now all choices are valid there, but even then we speak about peanuts money compared to the price of a single 8 socket Nehalem-ex box, which fully configured will be maybe $300k-$400k or something? Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 is 2000 euro. There won't be 2 gpu nvidia's any soon because of the choice they have historically made for the memory controllers. See explanation of intel fanboy David Kanter for that at realworldtech in a special article he wrote there. Please note i'm not judging AMD nor Nvidia, they have made their choices based upon totally different businessmodels i suspect and we must be happy we have this rich choice right now between cpu's from different manufacturers and gpu's from different manufacturers. Nvidia really seems to aim at supercomputers, giving their tesla line without lobotomization and lobotomizing their gamers cards, where AMD aims at gamers and their gamercards have full functionality without lobotomization. Total different businessmodels. Both have their advantages and disadvantages. From pure performance viewpoint it's easy to see what's faster though. Yet right now i realize all too well that just too many still hesitate between also offering gpu services additional to cpu services, in which case having a gpu, regardless nvidia or amd, kicks butt of course from throughput viewpoint. To be really honest with you guys, i had expected that by 2011 we would have a gpu reaching far over 1 Teraflop double precision handsdown. 
If we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 gpu's on a single card to get over that Teraflop double precision (claim is 1.27 Teraflop double precision), that really is underneath my expectations from a few years ago. Now of course i hope you realize i'm not coding double precision code at all; i'm writing everything in integers of 32 bits for the AMD card and the Nvidia equivalent also is using 32 bits integers. The ideal way to do calculations on those cards, so also very big transforms, is using the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD). Regards, Vincent > > Gus Correa > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Apr 7 15:44:25 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 07 Apr 2011 15:44:25 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E1419.9000408@ias.edu> On 04/07/2011 03:26 PM, Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > > You can't do a direct comparison between a CPU and a GPU. There are many things that GPUs can't do (or can't do well) that are still better done on a CPU. Even NVidia acknowledges in most of their promotional and educational literature. One example would be a code with a lot of branching. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
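To make Prentice's branching point concrete, here is a toy pair of loops in plain host C; the function names and the clipping operation are invented purely for illustration. The data-dependent "if" in the first version is cheap on a CPU with a branch predictor, but on a GPU the threads of a warp/wavefront that take different sides of it execute both paths one after the other (divergence), which is why such code is usually rewritten into the predicated second form before it maps well onto a GPU.

    #include <stddef.h>

    /* branchy form: fine on a CPU, divergence-prone on SIMT hardware */
    void clip_branchy(const float *in, float *out, size_t n, float cut)
    {
        for (size_t i = 0; i < n; i++) {
            if (in[i] > cut)
                out[i] = cut;        /* one side of the data-dependent branch */
            else
                out[i] = in[i];      /* the other side */
        }
    }

    /* predicated form: the conditional usually compiles to a select/min,
     * so every lane does the same work and no divergence occurs */
    void clip_branchless(const float *in, float *out, size_t n, float cut)
    {
        for (size_t i = 0; i < n; i++) {
            float x = in[i];
            out[i] = (x > cut) ? cut : x;
        }
    }

Code with many irregular, data-dependent branches cannot always be flattened this way, which is the class of code that stays on the CPU.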
From gus at ldeo.columbia.edu Thu Apr 7 16:37:46 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 16:37:46 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E209A.1040408@ldeo.columbia.edu> Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > Not so much. In your original message you said: "GPU monster box, which is basically a few videocards inside such a box stacked up a tad, wil only add a couple of thousands." So, first it was a few GPUs on a box (whatever else the box might have inside) for a couple of thousand (if dollars or euros you did not specify). Now you checked out the real prices, and said that a *single* Fermi Tesla C2070 cost ~$2,200 (just the GPU alone, price in US dollars I suppose), which is more like the real thing. However, instead of admitting that your previous numbers were mistaken, you insist that: "Hope i made my point clear.". Is this how you play chess? :) Even if your opponent is a computer, he/she/it might get a bit discouraged. You always win, even before the game starts. Anyway, I don't play chess, I am no GPU expert, I don't know about the lobotomizing of Fermi (I hope you're not talking about Enrico, he's dead), and I don't think we're going anywhere with this discussion. However, the GPU prices you sent in your original email to the list were underestimated, although I am afraid I may not be able to make this point go across to you. The prices you sent were too low, at least when it comes to GPUs with ECC, which is what is reliable for HPC. Thank you, Gus Correa > >> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, >> with 448 cores and 3GB RAM per GPU, cost around $10k. >> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. >> If you care about ECC, that's the price you pay, right? > > When fermi released it was a great gpu. > > Regrettably they lobotomized the gamers card's double precision as i > understand, > So it hardly has double precision capabilities; if you go for nvidia you > sure need a Tesla, > no question about it. > > As a company i would buy in 6990's though, they're a lot cheaper and > roughly 3x faster > than the Nvidia's (for some more than 3x for other occassions less than > 3x, note the card > has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). 
> > 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD > versus 448 cores nvidia with 448 execution units of 32 bits multiplication. > > Especially because multiplication has improved a lot. > > Already having written CUDA code some while ago, i wanted the cheap > gamers card with big > horse power now at home so i'm toying on a 6970 now so will be able to > report to you what is possible to > achieve at that card with respect to prime numbers and such. > > I'm a bit amazed so little public initiatives write code for the AMD gpu's. > > Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a > CRC calculation > (if i understand it correctly). It's a bit more primitive than ECC, but > works pretty ok and shows you > also when problems occured there, so figuring out remove what goes on is > possible. > > Make no mistake that this isn't ECC. > We know some HPC centers have as a hard requirement ECC, only nvidia is > an alternative then. > > In earlier posts from some time ago and some years ago i already wrote > on that governments should > adapt more to how hardware develops rather than demand that hardware has > to follow them. > > HPC has too little cash to demand that from industry. > > OpenCL i cannot advice at this moment (for a number of reasons). > > AMD-CAL and CUDA are somewhat similar. Sure there is differences, but > majority of codes are possible > to port quite well (there is exceptions), or easy work arounds. > > Any company doing gpgpu i would advice developing both branches of code > at the same time, > as that gives the company a lot of extra choices for really very little > extra work. Maybe 1 coder, > and it always allows you to have the fastest setup run your production > code. > > That said we can safely expect that from raw performance coming years > AMD will keep the leading edge > from crunching viewpoint. Elsewhere i pointed out why. > > Even then i'd never bet at just 1 manufacturer. Go for both considering > the cheap price of it. > > For a lot of HPC centers the choice of nvidia will be an easy one, as > the price of the Fermi cards > is peanuts compared to the price rest of the system and considering > other demands that's what they'll go for. > > That might change once you stick in bunches of videocards in nodes. > > Please note that the gpu 'streamcores' or PE's whatever name you want to > give them, are so bloody fast, > that your code has to work within the PE's themselves and hardly use the > RAM. > > Both for Nvidia as well as AMD, the streamcores are so fast, that you > simply don't want to lose time on the RAM > when your software runs, let alone that you want to use huge RAM. > > Add to that, that nvidia (have to still figure out for AMD) can in > background stream from and to the gpu's RAM > from the CPU, so if you do really large calculations involving many nodes, > all that shouldn't be an issue in the first place. > > So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would > really amaze me, though i'm sure > there is cases where that happens. If we see however what was ordered it > mostly is the 3GB Tesla's, > at least on what has been reported, i have no global statistics on that... > > Now all choices are valid there, but even then we speak about peanuts > money compared to the price of > a single 8 socket Nehalem-ex box, which fully configured will be maybe > $300k-$400k or something? > > Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 > is 2000 euro. 
> > There won't be 2 gpu nvidia's any soon because of the choice they have > historically made for the memory controllers. > See explanation of intel fanboy David Kanter for that at realworldtech > in a special article he wrote there. > > Please note i'm not judging AMD nor Nvidia, they have made their choices > based upon totally different > businessmodels i suspect and we must be happy we have this rich choice > right now between cpu's from different > manufacturers and gpu's from different manufacturers. > > Nvidia really seems to aim at supercomputers, giving their tesla line > without lobotomization and lobotomizing their > gamers cards, where AMD aims at gamers and their gamercards have full > functionality > without lobotomization. > > Total different businessmodels. Both have their advantages and > disadvantages. > > From pure performance viewpoint it's easy to see what's faster though. > > Yet right now i realize all too well that just too many still hesitate > between also offering gpu services additional to > cpu services, in which case having a gpu, regardless nvidia or amd, > kicks butt of course from throughput viewpoint. > > To be really honest with you guys, i had expected that by 2011 we would > have a gpu reaching far over 1 Teraflop double precision handsdown. If > we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 > gpu's on a single card to get over that Teraflop double precision (claim > is 1.27 Teraflop double precision), > that really is underneath my expectations from a few years ago. > > Now of course i hope you realize i'm not coding double precision code at > all; i'm writing everything in integers of 32 bits for the AMD card and > the Nvidia equivalent also is using 32 bits integers. The ideal way to > do calculations on those cards, so also very big transforms, is using > the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD). > > Regards, > Vincent > > >> >> Gus Correa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Thu Apr 7 18:57:38 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Fri, 08 Apr 2011 05:57:38 +0700 Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges Message-ID: <4D9E4162.3030004@pathscale.com> I just saw this on another ML and thought it may be of interest ------------ http://googleblog.blogspot.com/2011/04/1-billion-computing-core-hours-for.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From gus at ldeo.columbia.edu Thu Apr 7 21:03:07 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 07 Apr 2011 21:03:07 -0400 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <4D9E5ECB.60608@ldeo.columbia.edu> Thank you for the information about AMD-CAL and the AMD GPUs. Does AMD plan any GPU product with 64-bit and ECC, similar to Tesla/Fermi? The lack of a language standard may still be a hurdle here. I guess there were old postings here about CUDA and OpenGL. What fraction of the (non-gaming) GPU code is being written these days in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using compiler directives like those in the PGI compilers? Thank you, Gus Correa Vincent Diepeveen wrote: > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. > or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > > >> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, >> with 448 cores and 3GB RAM per GPU, cost around $10k. >> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. >> If you care about ECC, that's the price you pay, right? > > When fermi released it was a great gpu. > > Regrettably they lobotomized the gamers card's double precision as i > understand, > So it hardly has double precision capabilities; if you go for nvidia you > sure need a Tesla, > no question about it. > > As a company i would buy in 6990's though, they're a lot cheaper and > roughly 3x faster > than the Nvidia's (for some more than 3x for other occassions less than > 3x, note the card > has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). > > 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for AMD > versus 448 cores nvidia with 448 execution units of 32 bits multiplication. > > Especially because multiplication has improved a lot. > > Already having written CUDA code some while ago, i wanted the cheap > gamers card with big > horse power now at home so i'm toying on a 6970 now so will be able to > report to you what is possible to > achieve at that card with respect to prime numbers and such. > > I'm a bit amazed so little public initiatives write code for the AMD gpu's. > > Note that DDR5 ram doesn't have ECC by default, but has in case of AMD a > CRC calculation > (if i understand it correctly). It's a bit more primitive than ECC, but > works pretty ok and shows you > also when problems occured there, so figuring out remove what goes on is > possible. > > Make no mistake that this isn't ECC. 
> We know some HPC centers have as a hard requirement ECC, only nvidia is > an alternative then. > > In earlier posts from some time ago and some years ago i already wrote > on that governments should > adapt more to how hardware develops rather than demand that hardware has > to follow them. > > HPC has too little cash to demand that from industry. > > OpenCL i cannot advice at this moment (for a number of reasons). > > AMD-CAL and CUDA are somewhat similar. Sure there is differences, but > majority of codes are possible > to port quite well (there is exceptions), or easy work arounds. > > Any company doing gpgpu i would advice developing both branches of code > at the same time, > as that gives the company a lot of extra choices for really very little > extra work. Maybe 1 coder, > and it always allows you to have the fastest setup run your production > code. > > That said we can safely expect that from raw performance coming years > AMD will keep the leading edge > from crunching viewpoint. Elsewhere i pointed out why. > > Even then i'd never bet at just 1 manufacturer. Go for both considering > the cheap price of it. > > For a lot of HPC centers the choice of nvidia will be an easy one, as > the price of the Fermi cards > is peanuts compared to the price rest of the system and considering > other demands that's what they'll go for. > > That might change once you stick in bunches of videocards in nodes. > > Please note that the gpu 'streamcores' or PE's whatever name you want to > give them, are so bloody fast, > that your code has to work within the PE's themselves and hardly use the > RAM. > > Both for Nvidia as well as AMD, the streamcores are so fast, that you > simply don't want to lose time on the RAM > when your software runs, let alone that you want to use huge RAM. > > Add to that, that nvidia (have to still figure out for AMD) can in > background stream from and to the gpu's RAM > from the CPU, so if you do really large calculations involving many nodes, > all that shouldn't be an issue in the first place. > > So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would > really amaze me, though i'm sure > there is cases where that happens. If we see however what was ordered it > mostly is the 3GB Tesla's, > at least on what has been reported, i have no global statistics on that... > > Now all choices are valid there, but even then we speak about peanuts > money compared to the price of > a single 8 socket Nehalem-ex box, which fully configured will be maybe > $300k-$400k or something? > > Whereas a set of 4x nvidia will be probably under $15k and 4x AMD 6990 > is 2000 euro. > > There won't be 2 gpu nvidia's any soon because of the choice they have > historically made for the memory controllers. > See explanation of intel fanboy David Kanter for that at realworldtech > in a special article he wrote there. > > Please note i'm not judging AMD nor Nvidia, they have made their choices > based upon totally different > businessmodels i suspect and we must be happy we have this rich choice > right now between cpu's from different > manufacturers and gpu's from different manufacturers. > > Nvidia really seems to aim at supercomputers, giving their tesla line > without lobotomization and lobotomizing their > gamers cards, where AMD aims at gamers and their gamercards have full > functionality > without lobotomization. > > Total different businessmodels. Both have their advantages and > disadvantages. > > From pure performance viewpoint it's easy to see what's faster though. 
> > Yet right now i realize all too well that just too many still hesitate > between also offering gpu services additional to > cpu services, in which case having a gpu, regardless nvidia or amd, > kicks butt of course from throughput viewpoint. > > To be really honest with you guys, i had expected that by 2011 we would > have a gpu reaching far over 1 Teraflop double precision handsdown. If > we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 > gpu's on a single card to get over that Teraflop double precision (claim > is 1.27 Teraflop double precision), > that really is underneath my expectations from a few years ago. > > Now of course i hope you realize i'm not coding double precision code at > all; i'm writing everything in integers of 32 bits for the AMD card and > the Nvidia equivalent also is using 32 bits integers. The ideal way to > do calculations on those cards, so also very big transforms, is using > the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD). > > Regards, > Vincent > > >> >> Gus Correa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From kilian.cavalotti.work at gmail.com Fri Apr 8 07:08:01 2011 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Fri, 8 Apr 2011 13:08:01 +0200 Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: Hi Mark, On Thu, Apr 7, 2011 at 2:39 AM, Mark Hahn wrote: > notice it says 32GB max memory size. ?even if that means 32GB/socket, > it's not all that much. It's actually 32GB per DIMM, so up to 512GB per socket. Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Fri Apr 8 08:45:09 2011 From: deadline at eadline.org (Douglas Eadline) Date: Fri, 8 Apr 2011 08:45:09 -0400 (EDT) Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> Message-ID: <47386.192.168.93.213.1302266709.squirrel@mail.eadline.org> All: This video may help clear things up: http://www.youtube.com/watch?v=usGkq7tAhfc have a nice weekend -- Doug > > On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: > >> Vincent Diepeveen wrote: >> >>> GPU monster box, which is basically a few videocards inside such a >>> box stacked up a tad, wil only add a couple of >>> thousands. >>> >> >> This price may be OK for the videocard-class GPUs, >> but sounds underestimated, at least for Fermi Tesla. > > Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 > note there is a 6 GB version, not aware of price will be $$$$ i bet. 
> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro > > VERSUS > > 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. > > Factor 100 difference to those cards. > > A couple of thousands versus a couple of hundreds of thousands. > Hope i made my point clear. > > >> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050, >> with 448 cores and 3GB RAM per GPU, cost around $10k. >> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~$15k. >> If you care about ECC, that's the price you pay, right? > > When fermi released it was a great gpu. > > Regrettably they lobotomized the gamers card's double precision as i > understand, > So it hardly has double precision capabilities; if you go for nvidia > you sure need a Tesla, > no question about it. > > As a company i would buy in 6990's though, they're a lot cheaper and > roughly 3x faster > than the Nvidia's (for some more than 3x for other occassions less > than 3x, note the card > has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). > > 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units for > AMD > versus 448 cores nvidia with 448 execution units of 32 bits > multiplication. > > Especially because multiplication has improved a lot. > > Already having written CUDA code some while ago, i wanted the cheap > gamers card with big > horse power now at home so i'm toying on a 6970 now so will be able > to report to you what is possible to > achieve at that card with respect to prime numbers and such. > > I'm a bit amazed so little public initiatives write code for the AMD > gpu's. > > Note that DDR5 ram doesn't have ECC by default, but has in case of > AMD a CRC calculation > (if i understand it correctly). It's a bit more primitive than ECC, > but works pretty ok and shows you > also when problems occured there, so figuring out remove what goes on > is possible. > > Make no mistake that this isn't ECC. > We know some HPC centers have as a hard requirement ECC, only nvidia > is an alternative then. > > In earlier posts from some time ago and some years ago i already > wrote on that governments should > adapt more to how hardware develops rather than demand that hardware > has to follow them. > > HPC has too little cash to demand that from industry. > > OpenCL i cannot advice at this moment (for a number of reasons). > > AMD-CAL and CUDA are somewhat similar. Sure there is differences, but > majority of codes are possible > to port quite well (there is exceptions), or easy work arounds. > > Any company doing gpgpu i would advice developing both branches of > code at the same time, > as that gives the company a lot of extra choices for really very > little extra work. Maybe 1 coder, > and it always allows you to have the fastest setup run your > production code. > > That said we can safely expect that from raw performance coming years > AMD will keep the leading edge > from crunching viewpoint. Elsewhere i pointed out why. > > Even then i'd never bet at just 1 manufacturer. Go for both > considering the cheap price of it. > > For a lot of HPC centers the choice of nvidia will be an easy one, as > the price of the Fermi cards > is peanuts compared to the price rest of the system and considering > other demands that's what they'll go for. > > That might change once you stick in bunches of videocards in nodes. 
> > Please note that the gpu 'streamcores' or PE's whatever name you want > to give them, are so bloody fast, > that your code has to work within the PE's themselves and hardly use > the RAM. > > Both for Nvidia as well as AMD, the streamcores are so fast, that you > simply don't want to lose time on the RAM > when your software runs, let alone that you want to use huge RAM. > > Add to that, that nvidia (have to still figure out for AMD) can in > background stream from and to the gpu's RAM > from the CPU, so if you do really large calculations involving many > nodes, > all that shouldn't be an issue in the first place. > > So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that > would really amaze me, though i'm sure > there is cases where that happens. If we see however what was ordered > it mostly is the 3GB Tesla's, > at least on what has been reported, i have no global statistics on > that... > > Now all choices are valid there, but even then we speak about peanuts > money compared to the price of > a single 8 socket Nehalem-ex box, which fully configured will be > maybe $300k-$400k or something? > > Whereas a set of 4x nvidia will be probably under $15k and 4x AMD > 6990 is 2000 euro. > > There won't be 2 gpu nvidia's any soon because of the choice they > have historically made for the memory controllers. > See explanation of intel fanboy David Kanter for that at > realworldtech in a special article he wrote there. > > Please note i'm not judging AMD nor Nvidia, they have made their > choices based upon totally different > businessmodels i suspect and we must be happy we have this rich > choice right now between cpu's from different > manufacturers and gpu's from different manufacturers. > > Nvidia really seems to aim at supercomputers, giving their tesla line > without lobotomization and lobotomizing their > gamers cards, where AMD aims at gamers and their gamercards have full > functionality > without lobotomization. > > Total different businessmodels. Both have their advantages and > disadvantages. > > From pure performance viewpoint it's easy to see what's faster though. > > Yet right now i realize all too well that just too many still > hesitate between also offering gpu services additional to > cpu services, in which case having a gpu, regardless nvidia or amd, > kicks butt of course from throughput viewpoint. > > To be really honest with you guys, i had expected that by 2011 we > would have a gpu reaching far over 1 Teraflop double precision > handsdown. If we see that Nvidia delivers somewhere around 515 Gflop > and AMD has 2 gpu's on a single card to get over that Teraflop double > precision (claim is 1.27 Teraflop double precision), > that really is underneath my expectations from a few years ago. > > Now of course i hope you realize i'm not coding double precision code > at all; i'm writing everything in integers of 32 bits for the AMD > card and the Nvidia equivalent also is using 32 bits integers. The > ideal way to do calculations on those cards, so also very big > transforms, is using the 32 x 32 == 64 bits instructions (that's 2 > instructions in case of AMD). 
> > Regards, > Vincent > > >> >> Gus Correa >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Fri Apr 8 08:45:08 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri, 8 Apr 2011 08:45:08 -0400 (EDT) Subject: [Beowulf] Westmere EX In-Reply-To: References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: >> notice it says 32GB max memory size. ??even if that means 32GB/socket, >> it's not all that much. > > It's actually 32GB per DIMM, so up to 512GB per socket. right - I eventually found the non-marketing docs. each socket has two memory controllers, each of which supports 2 "intel scalable memory" channels, which support an intel scalable memory buffer, which supports 4 dimms. (the ISMB actually referred to as "advanced memory buffer" in one place, like from fbdimm days...) it also has double-bit correction, triple bit detection on the last-level cache. definitely not designed for cheap or even compact systems... -mark -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eugen at leitl.org Fri Apr 8 15:42:37 2011 From: eugen at leitl.org (Eugen Leitl) Date: Fri, 8 Apr 2011 21:42:37 +0200 Subject: [Beowulf] [FoRK] Cray help?? Re: FaceBook tries to cream Google Message-ID: <20110408194237.GH23560@leitl.org> ----- Forwarded message from "J. Andrew Rogers" ----- From: "J. Andrew Rogers" Date: Fri, 8 Apr 2011 12:27:35 -0700 To: Friends of Rohit Khare Subject: Re: [FoRK] Cray help?? Re: FaceBook tries to cream Google X-Mailer: Apple Mail (2.1084) Reply-To: Friends of Rohit Khare On Apr 8, 2011, at 11:15 AM, Stephen Williams wrote: > I used RabbitMQ not long ago. Impressed with some of it, not with a lot of the rest. Digging through Erlang to determine its real details and limitations was interesting. The group that had chosen it assumed magic that was not there. Bottlenecks were going to kill scalability using the naive design. ZeroMQ is not an MQ despite its name. It is a high-performance implementation of messaging design patterns, including some that are MQ-like. I believe it had aspirations to be an MQ many years ago but turned into an MPI-like high-performance messaging library that abstracts network, IPC, and in-process communication. 
The basic network performance and scalability of ZeroMQ is similar to MPI. Underneath the hood it is just a collection of lockless, async structures grafted to the usual operating system hooks. Thinking of it as a competitor to MPI in terms of basic functionality is probably the correct framing. J. Andrew Rogers _______________________________________________ FoRK mailing list http://xent.com/mailman/listinfo/fork ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Fri Apr 8 15:42:51 2011 From: eugen at leitl.org (Eugen Leitl) Date: Fri, 8 Apr 2011 21:42:51 +0200 Subject: [Beowulf] [FoRK] FaceBook tries to cream Google Message-ID: <20110408194251.GI23560@leitl.org> ----- Forwarded message from "J. Andrew Rogers" ----- From: "J. Andrew Rogers" Date: Fri, 8 Apr 2011 10:36:31 -0700 To: Friends of Rohit Khare Subject: Re: [FoRK] FaceBook tries to cream Google X-Mailer: Apple Mail (2.1084) Reply-To: Friends of Rohit Khare On Apr 8, 2011, at 8:05 AM, Stephen Williams wrote: > > Agreed. Strange that MPI isn't more widely used (outside supercomputing projects). Although, I'm not aware of it expecting and handling faults / rework as a good Mapreduce imitation, and similar systems, must. It is not that strange, MPI is a bit brittle as a communication library standard. Implementations tend to make simplifying assumptions that are not valid for some parallel applications. You can patch it up to do anything but the level of effort required seems to relegate it to just being used in scientific computing for which it was designed. I've seen ZeroMQ being increasingly used for roughly the same purpose as MPI in "normal" distributed systems, and I personally do not see much reason to prefer the latter over the former for most things. The difference is history. MPI's weakness is that it started from a mediocre design that immediately became part of a standards process, with all the politics and buy-in that entails. It is also badly documented as a practical matter. ZMQ also started with a somewhat dodgy early design but as a library rather than a standard; it was iterated by hackers over several versions into a more sensible and capable design. ZMQ has been willing to break backward compatibility to fix behaviors that irritated the programmers that use it or add badly needed features, which is possible because the "standard" is the implementation. J. 
Andrew Rogers _______________________________________________ FoRK mailing list http://xent.com/mailman/listinfo/fork ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Apr 12 16:31:41 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 12 Apr 2011 16:31:41 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement - pre-alpha release Message-ID: If you are using the "Job to Core Binding" feature in SGE and running SGE on newer hardware, then please give the new hwloc enabled loadcheck a try. http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html The current hardware topology discovery library (Portable Linux Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new hardware topology may not be detected correctly by PLPA. If you are running SGE on AMD Magny-Cours servers, please post your loadcheck output, as it is known to be wrong when handled by PLPA. The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc support in later releases of Grid Engine / Grid Scheduler. http://gridscheduler.sourceforge.net/ Thanks!! Rayson _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Wed Apr 13 12:21:21 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed, 13 Apr 2011 12:21:21 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release In-Reply-To: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> Message-ID: Carlos, I notice that you have "lx24-amd64" instead of "lx26-amd64" for the arch string, so I believe you are running the loadcheck from standard Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of the one from the Open Grid Scheduler page. The existing Grid Engine (including the latest Open Grid Scheduler releases: SGE 6.2u5p1 & SGE 6.2u5p2, or Univa's fork) uses PLPA, and it is known to be wrong on magny-cours. (i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from: http://sourceforge.net/projects/gridscheduler/files/ ) Chansup on the Grid Engine mailing list (it's the general purpose Grid Engine mailing list for now) tested the version I uploaded last night, and seems to work on a dual-socket magny-cours AMD machine. 
It prints: m_topology SCCCCCCCCCCCCSCCCCCCCCCCCC However, I am still fixing the processor, core id mapping code: http://gridengine.org/pipermail/users/2011-April/000629.html http://gridengine.org/pipermail/users/2011-April/000628.html I compiled the hwloc enabled loadcheck on kernel 2.6.34 & glibc 2.12, so it may not work on machines running lower kernel or glibc versions, you can download it from: http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html Rayson On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez wrote: > This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD system > (and seems to be wrong!): > > arch ? ? ? ? ? ?lx24-amd64 > num_proc ? ? ? ?24 > m_socket ? ? ? ?2 > m_core ? ? ? ? ?12 > m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT > load_short ? ? ?0.29 > load_medium ? ? 0.13 > load_long ? ? ? 0.04 > mem_free ? ? ? ?26257.382812M > swap_free ? ? ? 8191.992188M > virtual_free ? ?34449.375000M > mem_total ? ? ? 32238.328125M > swap_total ? ? ?8191.992188M > virtual_total ? 40430.320312M > mem_used ? ? ? ?5980.945312M > swap_used ? ? ? 0.000000M > virtual_used ? ?5980.945312M > cpu ? ? ? ? ? ? 0.0% > > > Carlos Fernandez Sanchez > Systems Manager > CESGA > Avda. de Vigo s/n. Campus Vida > Tel.: (+34) 981569810, ext. 232 > 15705 - Santiago de Compostela > SPAIN > > -------------------------------------------------- > From: "Rayson Ho" > Sent: Tuesday, April 12, 2011 10:31 PM > To: "Beowulf List" > Subject: [Beowulf] Grid Engine multi-core thread binding enhancement > -pre-alpha release > >> If you are using the "Job to Core Binding" feature in SGE and running >> SGE on newer hardware, then please give the new hwloc enabled >> loadcheck a try. >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> The current hardware topology discovery library (Portable Linux >> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >> hardware topology may not be detected correctly by PLPA. >> >> If you are running SGE on AMD Magny-Cours servers, please post your >> loadcheck output, as it is known to be wrong when handled by PLPA. >> >> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >> support in later releases of Grid Engine / Grid Scheduler. >> >> http://gridscheduler.sourceforge.net/ >> >> Thanks!! >> Rayson >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Fri Apr 15 10:12:00 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Fri, 15 Apr 2011 10:12:00 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) Message-ID: Hi all, Distributing Linux application binaries is proven to be a major issue as a lot of people wanted to test the hwloc loadcheck but their Linux versions are older than mine. 
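For readers who have not followed the loadcheck discussion: what the hwloc-enabled loadcheck derives is essentially the socket/core string shown above, one 'S' per socket followed by one 'C' per core; the 'SCTT...' string in the quoted PLPA output reports hardware threads that Magny-Cours does not have, which is why it is flagged as wrong. A minimal, hypothetical hwloc 1.x sketch of the idea (not the Open Grid Scheduler code; newer hwloc renames HWLOC_OBJ_SOCKET to HWLOC_OBJ_PACKAGE):

    #include <stdio.h>
    #include <hwloc.h>

    /* Print an SGE-style topology string: 'S' per socket, 'C' per core. */
    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        int nsock = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);
        for (int s = 0; s < nsock; s++) {
            hwloc_obj_t sock = hwloc_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, s);
            int ncore = hwloc_get_nbobjs_inside_cpuset_by_type(topo, sock->cpuset,
                                                               HWLOC_OBJ_CORE);
            putchar('S');
            for (int c = 0; c < ncore; c++)
                putchar('C');
        }
        putchar('\n');

        hwloc_topology_destroy(topo);
        return 0;
    }

On a dual-socket Magny-Cours node this prints SCCCCCCCCCCCCSCCCCCCCCCCCC, matching the m_topology value above; compile with cc topo.c -lhwloc.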
And compiling SGE from source is not simple neither -- I wrote a quick & dirty guide for those who don't want the add-ons but it's usually the extra stuff & dependencies that fail the build. So I would like to offer pre-compiled binaries and upload them onto sourceforge. I know it's a complicated question - what version of Linux should I use to build Grid Engine / Open Grid Scheduler when the binaries are for others to consume?? (In case you are interested, the quick compile guide is at: http://gridscheduler.sourceforge.net/CompileGridEngineSource.html ) Prakashan: I tried to link it statically, and I even tried to compile an older version of glibc on my machine, but I could not get either of them to work!! Rayson On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath wrote: > Hi Rayson, > > Do you have a statically linked version? Thanks. > > ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by > ./loadcheck) > > Prakashan > > > > On 04/13/2011 09:21 AM, Rayson Ho wrote: >> >> Carlos, >> >> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the >> arch string, so I believe you are running the loadcheck from standard >> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of >> the one from the Open Grid Scheduler page. >> >> The existing Grid Engine (including the latest Open Grid Scheduler >> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and >> it is known to be wrong on magny-cours. >> >> (i.e. SGE 6.2u5p1& ?SGE 6.2u5p2 from: >> http://sourceforge.net/projects/gridscheduler/files/ ) >> >> >> Chansup on the Grid Engine mailing list (it's the general purpose Grid >> Engine mailing list for now) tested the version I uploaded last night, >> and seems to work on a dual-socket magny-cours AMD machine. It prints: >> >> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC >> >> However, I am still fixing the processor, core id mapping code: >> >> http://gridengine.org/pipermail/users/2011-April/000629.html >> http://gridengine.org/pipermail/users/2011-April/000628.html >> >> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12, >> so it may not work on machines running lower kernel or glibc versions, >> you can download it from: >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> Rayson >> >> >> >> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez >> ?wrote: >>> >>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD >>> system >>> (and seems to be wrong!): >>> >>> arch ? ? ? ? ? ?lx24-amd64 >>> num_proc ? ? ? ?24 >>> m_socket ? ? ? ?2 >>> m_core ? ? ? ? ?12 >>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT >>> load_short ? ? ?0.29 >>> load_medium ? ? 0.13 >>> load_long ? ? ? 0.04 >>> mem_free ? ? ? ?26257.382812M >>> swap_free ? ? ? 8191.992188M >>> virtual_free ? ?34449.375000M >>> mem_total ? ? ? 32238.328125M >>> swap_total ? ? ?8191.992188M >>> virtual_total ? 40430.320312M >>> mem_used ? ? ? ?5980.945312M >>> swap_used ? ? ? 0.000000M >>> virtual_used ? ?5980.945312M >>> cpu ? ? ? ? ? ? 0.0% >>> >>> >>> Carlos Fernandez Sanchez >>> Systems Manager >>> CESGA >>> Avda. de Vigo s/n. Campus Vida >>> Tel.: (+34) 981569810, ext. 
232 >>> 15705 - Santiago de Compostela >>> SPAIN >>> >>> -------------------------------------------------- >>> From: "Rayson Ho" >>> Sent: Tuesday, April 12, 2011 10:31 PM >>> To: "Beowulf List" >>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement >>> -pre-alpha release >>> >>>> If you are using the "Job to Core Binding" feature in SGE and running >>>> SGE on newer hardware, then please give the new hwloc enabled >>>> loadcheck a try. >>>> >>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >>>> >>>> The current hardware topology discovery library (Portable Linux >>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >>>> hardware topology may not be detected correctly by PLPA. >>>> >>>> If you are running SGE on AMD Magny-Cours servers, please post your >>>> loadcheck output, as it is known to be wrong when handled by PLPA. >>>> >>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >>>> support in later releases of Grid Engine / Grid Scheduler. >>>> >>>> http://gridscheduler.sourceforge.net/ >>>> >>>> Thanks!! >>>> Rayson >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Fri Apr 15 10:19:04 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 15 Apr 2011 10:19:04 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: Message-ID: <4DA853D8.8000308@scalableinformatics.com> On 04/15/2011 10:12 AM, Rayson Ho wrote: > I know it's a complicated question - what version of Linux should I > use to build Grid Engine / Open Grid Scheduler when the binaries are > for others to consume?? I'd recommend a Centos 5.x variant, and possibly a SuSE variant. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
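The "GLIBC_2.7 not found" failure earlier in the thread is the reason behind this advice: a dynamically linked binary records the glibc symbol versions it was linked against, and it will only start on machines whose glibc is at least that new, so building on the oldest distribution you intend to support (CentOS 5.x here, which ships glibc 2.5) maximises portability. A trivial, hypothetical check of what a given machine provides:

    #include <stdio.h>
    #include <gnu/libc-version.h>   /* glibc-specific header */

    /* Print the glibc version of the host this runs on; a binary that
     * references GLIBC_2.7 symbols will not start where this prints 2.5. */
    int main(void)
    {
        printf("runtime glibc: %s\n", gnu_get_libc_version());
        return 0;
    }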
From prentice at ias.edu Fri Apr 15 12:15:38 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 15 Apr 2011 12:15:38 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: <4DA853D8.8000308@scalableinformatics.com> References: <4DA853D8.8000308@scalableinformatics.com> Message-ID: <4DA86F2A.5010708@ias.edu> On 04/15/2011 10:19 AM, Joe Landman wrote: > On 04/15/2011 10:12 AM, Rayson Ho wrote: > >> I know it's a complicated question - what version of Linux should I >> use to build Grid Engine / Open Grid Scheduler when the binaries are >> for others to consume?? > > I'd recommend a Centos 5.x variant, and possibly a SuSE variant. > I agree, but I think that if you can get your hands on an actual RHEL image, that's what you should use, as long as you already have access to it. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Fri Apr 15 12:25:10 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Fri, 15 Apr 2011 12:25:10 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: <4DA86F2A.5010708@ias.edu> References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> Message-ID: Thanks all!! If I build on Centos 5.6, will the binaries run on SuSE & Ubuntu?? (Don't want what versions SuSE & Ubuntu most people are using -- I have Ubuntu 10 & 11 on my machines, and F13.) Rayson On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal wrote: > On 04/15/2011 10:19 AM, Joe Landman wrote: >> On 04/15/2011 10:12 AM, Rayson Ho wrote: >> >>> I know it's a complicated question - what version of Linux should I >>> use to build Grid Engine / Open Grid Scheduler when the binaries are >>> for others to consume?? >> >> I'd recommend a Centos 5.x variant, and possibly a SuSE variant. >> > > I agree, but I think that if you can get your hands on an actual RHEL > image, that's what you should use, as long as you already have access to > it. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Apr 15 13:49:55 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 15 Apr 2011 13:49:55 -0400 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> Message-ID: <4DA88543.6000603@ias.edu> On 04/15/2011 01:40 PM, Chi Chan wrote: > On Fri, Apr 15, 2011 at 12:15 PM, Prentice Bisbal wrote: >> I agree, but I think that if you can get your hands on an actual RHEL >> image, that's what you should use, as long as you already have access to >> it. 
> > Or just use Oracle Linux, it is free to download and distribute, and > can be used in production: > > http://www.oracle.com/us/technologies/linux/competitive-335546.html > http://www.oracle.com/us/technologies/027617.pdf > > From my experience, Oracle Linux and RHEL are idential, you can > compile applications on Oracle Linux and ship it to run on RHEL boxes. I had recommended RHEL just because its the "gold standard" for all RHEL-derived distros. CentOS and a few others *should* be identical. However, I don't think Oracle is. Doesn't Oracle make some changes to optimize it for running Oracle? I'm not sure of that, which is why I'm asking and not stating. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Sun Apr 17 20:36:40 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Mon, 18 Apr 2011 10:36:40 +1000 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> Message-ID: <4DAB8798.8070102@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 16/04/11 02:25, Rayson Ho wrote: > If I build on Centos 5.6, will the binaries run on > SuSE & Ubuntu?? I'd suggest that if you want them to work (and especially if you want to package them appropriately) then you're far better off getting a build machine for the OS's you want to support. CentOS, SLES, Debian & Ubuntu. We build all our x86 stuff on our CentOS5 cluster and rsync it over to our RHEL5 cluster (sadly we can't just share /usr/local/ between them due to circumstances beyond our control) without issues. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy =stQg -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Apr 18 09:03:50 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 18 Apr 2011 15:03:50 +0200 Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: <4DAB8798.8070102@unimelb.edu.au> References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> <4DAB8798.8070102@unimelb.edu.au> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 18.04.2011 um 02:36 schrieb Christopher Samuel: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 16/04/11 02:25, Rayson Ho wrote: > >> If I build on Centos 5.6, will the binaries run on >> SuSE & Ubuntu?? 
> > I'd suggest that if you want them to work (and especially > if you want to package them appropriately) then you're > far better off getting a build machine for the OS's you > want to support. CentOS, SLES, Debian & Ubuntu. Before there was only a common and a platform specific tarball. Does it imply to supply *.rpm in the future? It was always nice to just untar SGE and run it as a normal user w/o any root privilege (yes, rpm2cpio could do). And it was one tarball for all Linux variants. I would vote for staying with this. - -- Reuti > We build all our x86 stuff on our CentOS5 cluster and > rsync it over to our RHEL5 cluster (sadly we can't just > share /usr/local/ between them due to circumstances > beyond our control) without issues. > > cheers, > Chris > - -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk2rh5gACgkQO2KABBYQAh88rQCgh4JpW+uguOJktV6nMgAbc0mz > 430AnRVNuggLdGYH1rm5Fg2oDcFDoCmy > =stQg > -----END PGP SIGNATURE----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.16 (Darwin) iEYEARECAAYFAk2sNsQACgkQo/GbGkBRnRr55QCdGyBkTKd7EsTWSvVPRWuMQbGA kOQAniYFwJyMOlwcR3ITHS9nAfGRZndh =iknW -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Apr 18 11:24:00 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 18 Apr 2011 11:24:00 -0400 (EDT) Subject: [Beowulf] Grid Engine build machine (was: multi-core thread binding enhancement) In-Reply-To: References: <4DA853D8.8000308@scalableinformatics.com> <4DA86F2A.5010708@ias.edu> <4DAB8798.8070102@unimelb.edu.au> Message-ID: not to be overly surly, but this really has nothing to do with beowulf and is a rather specialized sge support issue... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Mon Apr 18 12:34:11 2011 From: mathog at caltech.edu (David Mathog) Date: Mon, 18 Apr 2011 09:34:11 -0700 Subject: [Beowulf] Grid Engine build machine Message-ID: Rayson Ho wrote > And compiling SGE from source is not > simple neither -- I wrote a quick & dirty guide for those who don't > want the add-ons but it's usually the extra stuff & dependencies that > fail the build. Does it still use aimk or has it finally gone over to autoconf, automake? As I recall aimk was really touchy the last time I built this (4 years ago), with lots of futzing around to convince it to use library files it should have found on its own. 
Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Apr 18 12:36:19 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 18 Apr 2011 18:36:19 +0200 Subject: [Beowulf] Grid Engine build machine In-Reply-To: References: Message-ID: Am 18.04.2011 um 18:34 schrieb David Mathog: > Rayson Ho wrote >> And compiling SGE from source is not >> simple neither -- I wrote a quick & dirty guide for those who don't >> want the add-ons but it's usually the extra stuff & dependencies that >> fail the build. > > Does it still use aimk Still aimk. -- Reuti > or has it finally gone over to autoconf, automake? > As I recall aimk was really touchy the last time I built this (4 > years ago), with lots of futzing around to convince it to use library > files it should have found on its own. > > Regards, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Mon Apr 18 14:26:57 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Mon, 18 Apr 2011 14:26:57 -0400 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release In-Reply-To: <4DA5E85D.4010801@ats.ucla.edu> References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2> <4DA5E85D.4010801@ats.ucla.edu> Message-ID: For those who had issues with earlier version, please try the latest loadcheck v4: http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html I compiled the binary on Oracle Linux, which is compatible with RHEL 5.x, Scientific Linux or Centos 5.x. I tested the binary on the standard Red Hat kernel, and Oracle enhanced "Unbreakable Enterprise Kernel", Fedora 13, Ubuntu 10.04 LTS. Optimizing for AMD's NUMA machine characteristics is on the ToDo list. Rayson On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath wrote: > Hi Rayson, > > Do you have a statically linked version? Thanks. > > ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by > ./loadcheck) > > Prakashan > > > > On 04/13/2011 09:21 AM, Rayson Ho wrote: >> >> Carlos, >> >> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the >> arch string, so I believe you are running the loadcheck from standard >> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of >> the one from the Open Grid Scheduler page. >> >> The existing Grid Engine (including the latest Open Grid Scheduler >> releases: SGE 6.2u5p1& ?SGE 6.2u5p2, or Univa's fork) uses PLPA, and >> it is known to be wrong on magny-cours. >> >> (i.e. 
SGE 6.2u5p1& ?SGE 6.2u5p2 from: >> http://sourceforge.net/projects/gridscheduler/files/ ) >> >> >> Chansup on the Grid Engine mailing list (it's the general purpose Grid >> Engine mailing list for now) tested the version I uploaded last night, >> and seems to work on a dual-socket magny-cours AMD machine. It prints: >> >> m_topology ? ? ?SCCCCCCCCCCCCSCCCCCCCCCCCC >> >> However, I am still fixing the processor, core id mapping code: >> >> http://gridengine.org/pipermail/users/2011-April/000629.html >> http://gridengine.org/pipermail/users/2011-April/000628.html >> >> I compiled the hwloc enabled loadcheck on kernel 2.6.34& ?glibc 2.12, >> so it may not work on machines running lower kernel or glibc versions, >> you can download it from: >> >> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >> >> Rayson >> >> >> >> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez >> ?wrote: >>> >>> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD >>> system >>> (and seems to be wrong!): >>> >>> arch ? ? ? ? ? ?lx24-amd64 >>> num_proc ? ? ? ?24 >>> m_socket ? ? ? ?2 >>> m_core ? ? ? ? ?12 >>> m_topology ? ? ?SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT >>> load_short ? ? ?0.29 >>> load_medium ? ? 0.13 >>> load_long ? ? ? 0.04 >>> mem_free ? ? ? ?26257.382812M >>> swap_free ? ? ? 8191.992188M >>> virtual_free ? ?34449.375000M >>> mem_total ? ? ? 32238.328125M >>> swap_total ? ? ?8191.992188M >>> virtual_total ? 40430.320312M >>> mem_used ? ? ? ?5980.945312M >>> swap_used ? ? ? 0.000000M >>> virtual_used ? ?5980.945312M >>> cpu ? ? ? ? ? ? 0.0% >>> >>> >>> Carlos Fernandez Sanchez >>> Systems Manager >>> CESGA >>> Avda. de Vigo s/n. Campus Vida >>> Tel.: (+34) 981569810, ext. 232 >>> 15705 - Santiago de Compostela >>> SPAIN >>> >>> -------------------------------------------------- >>> From: "Rayson Ho" >>> Sent: Tuesday, April 12, 2011 10:31 PM >>> To: "Beowulf List" >>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement >>> -pre-alpha release >>> >>>> If you are using the "Job to Core Binding" feature in SGE and running >>>> SGE on newer hardware, then please give the new hwloc enabled >>>> loadcheck a try. >>>> >>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html >>>> >>>> The current hardware topology discovery library (Portable Linux >>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new >>>> hardware topology may not be detected correctly by PLPA. >>>> >>>> If you are running SGE on AMD Magny-Cours servers, please post your >>>> loadcheck output, as it is known to be wrong when handled by PLPA. >>>> >>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc >>>> support in later releases of Grid Engine / Grid Scheduler. >>>> >>>> http://gridscheduler.sourceforge.net/ >>>> >>>> Thanks!! 
>>>> Rayson >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 21 08:59:30 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 21 Apr 2011 14:59:30 +0200 Subject: [Beowulf] GPU's - was Westmere EX In-Reply-To: <4D9E5ECB.60608@ldeo.columbia.edu> References: <207BB2F60743C34496BE41039233A809041944F3@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <4D9D1E35.9040802@berkeley.edu> <207BB2F60743C34496BE41039233A809041E4257@MRL-PWEXCHMB02.mil.tagmclarengroup.com> <5C69A796-AAFC-4F4C-AD31-304AEDF2736D@xs4all.nl> <4D9DE57F.4040303@ldeo.columbia.edu> <59EDC663-F38D-4CAD-8F57-AFB5C3E076C7@xs4all.nl> <4D9E5ECB.60608@ldeo.columbia.edu> Message-ID: <0F717DDD-470A-4B13-B1AF-FBCB034409DC@xs4all.nl> hi, Sometimes going through some old emails. Note in the meantime i switched from AMD-CAL to OpenCL. On Apr 8, 2011, at 3:03 AM, Gus Correa wrote: > Thank you for the information about AMD-CAL and the AMD GPUs. > Does AMD plan any GPU product with 64-bit and ECC, > similar to Tesla/Fermi? Actually DDR5 already calculates a CRC. Not as good as ECC, but it takes care you have a form of checking. Also the amount of bitflips is so little as the quality of this DDR5 is so great, according to some memory experts i spoke with, that this CRC is more than sufficient. As i'm not a memory expert i would advice you to really speak with such a guy instead of some HPC guys here. Now if your organisation wants ECC simply i'm not going to argue. A demand is a demand there. I'm busy pricewise here how to build cheap something that delivers a big punch. If you look objectively and then to gpgpu codes, then of course Nvidia has a few years more experience setting up CUDA. This is another problem of course, software support. Both suck at it, to say polite. Yet we want to do calculations cheap huh. Yet if performance matters, then AMD is a very cheap alternative. In both cases of course, programming for a gpu is going to be the bottleneck; historically organisations do not invest in good code, they only invest in hardware and in managers who sit on their behind, drink coffee and do meetings. Objectively most codes you can also code in 32 bits. If we do a simple compare then the HD6990 is there for 540 euro in the shop here. Now that's European prices where salestax is 19%, so in USA probably it's cheaper (if you calculate it back to euro's). Let's now ignore the marketing nonsense ok, as marketing nonsense is marketing nonsense. All those theoretic flops always, they shouldn't allow double counting specific instructions like multiply add. The internals of these gpu's are all organized such that doing efficient matrix calculations on them is very well possible. 
Not easy to solve well, as the bottleneck will be the bandwidth from the DDR3 cpu ram to the gpu, yet if you look to a lot of calculations, then it's algorithmic possible to do a lot more work at the execution unit side than the bandwidth you need to another node; those execution units, PE's (processing elements) nowadays called, have huge GPR's which can proces all that. With that those tiny cheap power efficient cores can easily take on huge expensive cpu cores. A single set of 4 PE's in case of AMD has a total of 1024 GPR's, can read from a L1 cache when needed and write to a shared local cache of 32KB (shared by 64 pe's). That L1 reads from the memory L2 and all that has a huge bandwidth. That gives you PRACTICAL 3072 PE's @ 0.83 Ghz == 2.5+ Tflop in 32 bits integers. It's not so hard to convert that to 64 bits code if that's what you need. In fact i'm using it to approximate huge integers (prime numbers) of million bit sizes (factorisation of them). Using that efficiently is not easy, yet realize this is 2.5+ Tflop (i should actually say Tera 32 bits integer performance). Good programmers can use todays GPU's very efficiently. The 6000+ series of AMD and the Fermi series of Nvidia are very good and you can use them in a sustained manner. Now the cheapest gpgpu of Nvidia is about $1200 which is the quadro 6000 series and delivers 448 cores @ 1.2Ghz, say roughly 550 Gflop. Of course this is practical what you can achieve, i'm not counting of course multiply-add here as being 2 flops, which is their own definition of how many gflops it gets; first of all i'm not interested in flops but in integers per cycle and secondly i prefer a realistic measure, otherwise we have no measure on how efficiently we use the gpu. If you look from a mathematical viewpoint, it's not so clever from most scientists at todays huge calculations to use floating point. Single precision or double precision, in the end it all backtracks errors and you have complete non-deterministic results with big sureness. Much better are integer transforms where you have 100% lossless calculations so big sureness your calculation is ok. Yet i realize this is a very expertise field with most people who know something about that hiding in secrecy using fake names and some even having fake burials, just in order to disappear. That in itself is all very sad, as progressing science doesn't happen. As a result of that scientific world has focussed too much upon floating point. Yet the cards can deliver that as well as we know. The round off errors all those floating point calculations cause are always a huge multiple of bitflips of memory. It's not even in the same league. Now of course calculating with 64 bits integers it's easier to do huge transforms and you can redo your calculation and at some spots you will have deterministic output in such case, in others of course not (depends what you calculate of course - majority is non- deterministic). With 32 bits integers you need a lot of CRT (Chinese Remainder Theorem) tricks to effectively use it for huge transforms, or you simply emulate 64 bits calculations (so with 64 bits precision, please do not confuse with double precision floating point). Getting all that to work is very challenging and not easy, i realize that. Yet look at the huge advantage you give to your scientists in such case. They can look years ahead in the future which is a *huge* advantage. 
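One way the 32-bit integer hardware described here gets used for large transforms is the CRT trick: do the heavy arithmetic modulo a few word-sized primes and recombine the residues with the Chinese Remainder Theorem afterwards. A hypothetical plain-C sketch of the recombination step (the moduli are two well-known transform-friendly primes; the value and helper names are invented, and the 128-bit intermediate relies on gcc/clang):

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
    {
        return (uint64_t)((unsigned __int128)a * b % m);
    }

    static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
    {
        uint64_t r = 1;
        while (e) {
            if (e & 1) r = mulmod(r, b, m);
            b = mulmod(b, b, m);
            e >>= 1;
        }
        return r;
    }

    int main(void)
    {
        const uint64_t m1 = 998244353ULL;      /* 119 * 2^23 + 1 */
        const uint64_t m2 = 2013265921ULL;     /*  15 * 2^27 + 1 */
        uint64_t x  = 123456789012345ULL;      /* value we pretend to have computed */
        uint64_t r1 = x % m1, r2 = x % m2;     /* the per-modulus residues          */

        /* Garner recombination: x = r1 + m1 * ((r2 - r1) * m1^-1 mod m2),
         * with the inverse taken by Fermat's little theorem (m2 is prime). */
        uint64_t inv  = powmod(m1 % m2, m2 - 2, m2);
        uint64_t diff = (r2 + m2 - (r1 % m2)) % m2;
        uint64_t y    = r1 + m1 * mulmod(diff, inv, m2);

        printf("%llu -> %llu\n", (unsigned long long)x, (unsigned long long)y);
        return 0;
    }

The per-modulus arithmetic is exactly the kind of independent 32-bit work the streamcores handle well; only the recombination needs anything wider.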
In this manner you'll actually effectively get 2.x Tflop out of those 6990, again that's 2 Tflop calculated in my manner, i'm looking simply at INSTRUCTION LEVEL where 1 instruction represents a single unit of 32 bits; counting the multiply-add instruction as 2 flops is just too confusing for how efficient you manage to load your GPU, if you ask me. In the transforms in fact multiply-add is very silly to use in many cases as that means you're doing some sort of inefficient calculation. Yet that chippie is just 500 euro, versus Nvidia delivers it for 1200 dollar and the nvidia one is factor 3 slower, though still lightyears faster than a CPU solution there (pricewise seen). The quadro 6000 for those who don't realize it, is exactly the same like a Tesla. Just checkout the specs. Yet of course for our lazy scientists all of the above is not so interesting. Just compiling your years 80 code, pushing the enter button, is a lot easier. If you care however for PERFORMANCE, consider spending a couple of thousands to hardware. If you buy 1000 of those 6990's and program in opencl, you actually can also run that at nvidia hardware, might nvidia be so lucky to release very quickly a 22 nm gpu some years from now. By then also nvidia's opencl will probably be supporting the gpu hardware quite ok. So my advice would be: program it in opencl. It's not the most efficient language on the planet, yet it'll work everywhere and you can get probably around 2 Tflop out that 6990 AMD card. That said of course there is a zillion problems still with opencl, yet if you want for $500k in gpu hardware achieve 1 petaflop, you'll have to suffer a bit, and by the time your cluster is there, possibly all big bugs have been fixed in opencl both by amd as well as by nvidia for their gpu lines. Now all this said i do realize that you need a shift in thinking. Whether you use AMD-gpu's or Nvidia, in both cases you'll need great new software. In fact it doesn't even matter whether you program it in OpenCL or CUDA. It's easy to port algorithms from 1 entity to another; getting such algorithm to work is a lot harder than the question what language you program it in. Translating CUDA to openCL is pretty much braindead work which many can carry out as we already saw in some examples. The investment is in the software for the gpu's. You don't buy that in from nvidia nor AMD. You'l have to hire people to program it, as your own scientists simply aren't good enough to program efficiently for that GPU. The old fashionned vision of having scientists solve themselve how to do the calculations is not going to work for gpgpu simply. Now that is a big pitfall that is hard to overcome. All this said, of course there is a few, really very few, applications where a full blown gpu nor hybrid solution is able to solve the problems. Yet usually such claim that it is "not possible" gets done by scientists who are experts in their field, but not very high level in finding solutions how to efficiently get their calculations done in HPC. Regards, Vincent > > The lack of a language standard may still be a hurdle here. > I guess there were old postings here about CUDA and OpenGL. > What fraction of the (non-gaming) GPU code is being written these days > in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using > compiler directives like those in the PGI compilers? 
> > Thank you, > Gus Correa > > Vincent Diepeveen wrote: >> >> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote: >> >>> Vincent Diepeveen wrote: >>> >>>> GPU monster box, which is basically a few videocards inside such a >>>> box stacked up a tad, wil only add a couple of >>>> thousands. >>>> >>> >>> This price may be OK for the videocard-class GPUs, >>> but sounds underestimated, at least for Fermi Tesla. >> >> Tesla (448 cores @ 1.15Ghz, 3GB ddr5) : $2.200 >> note there is a 6 GB version, not aware of price will be $$$$ i bet. >> or AMD 6990 (3072 PE's @ 0.83Ghz, 4GB ddr5) : 519 euro >> >> VERSUS >> >> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k. >> >> Factor 100 difference to those cards. >> >> A couple of thousands versus a couple of hundreds of thousands. >> Hope i made my point clear. >> >> >>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla >>> C2050, >>> with 448 cores and 3GB RAM per GPU, cost around $10k. >>> For the beefed up version with with C2070 (6GB/GPU) it bumps to ~ >>> $15k. >>> If you care about ECC, that's the price you pay, right? >> >> When fermi released it was a great gpu. >> >> Regrettably they lobotomized the gamers card's double precision as i >> understand, >> So it hardly has double precision capabilities; if you go for >> nvidia you >> sure need a Tesla, >> no question about it. >> >> As a company i would buy in 6990's though, they're a lot cheaper and >> roughly 3x faster >> than the Nvidia's (for some more than 3x for other occassions less >> than >> 3x, note the card >> has 2 GPU's and 2 x 2GB == 4 GB ram on board so 2GB per gpu). >> >> 3072 cores @ 0.83Ghz with 50% of 'em 32 bits multiplication units >> for AMD >> versus 448 cores nvidia with 448 execution units of 32 bits >> multiplication. >> >> Especially because multiplication has improved a lot. >> >> Already having written CUDA code some while ago, i wanted the cheap >> gamers card with big >> horse power now at home so i'm toying on a 6970 now so will be >> able to >> report to you what is possible to >> achieve at that card with respect to prime numbers and such. >> >> I'm a bit amazed so little public initiatives write code for the >> AMD gpu's. >> >> Note that DDR5 ram doesn't have ECC by default, but has in case of >> AMD a >> CRC calculation >> (if i understand it correctly). It's a bit more primitive than >> ECC, but >> works pretty ok and shows you >> also when problems occured there, so figuring out remove what goes >> on is >> possible. >> >> Make no mistake that this isn't ECC. >> We know some HPC centers have as a hard requirement ECC, only >> nvidia is >> an alternative then. >> >> In earlier posts from some time ago and some years ago i already >> wrote >> on that governments should >> adapt more to how hardware develops rather than demand that >> hardware has >> to follow them. >> >> HPC has too little cash to demand that from industry. >> >> OpenCL i cannot advice at this moment (for a number of reasons). >> >> AMD-CAL and CUDA are somewhat similar. Sure there is differences, but >> majority of codes are possible >> to port quite well (there is exceptions), or easy work arounds. >> >> Any company doing gpgpu i would advice developing both branches of >> code >> at the same time, >> as that gives the company a lot of extra choices for really very >> little >> extra work. Maybe 1 coder, >> and it always allows you to have the fastest setup run your >> production >> code. 
>> >> That said we can safely expect that from raw performance coming years >> AMD will keep the leading edge >> from crunching viewpoint. Elsewhere i pointed out why. >> >> Even then i'd never bet at just 1 manufacturer. Go for both >> considering >> the cheap price of it. >> >> For a lot of HPC centers the choice of nvidia will be an easy one, as >> the price of the Fermi cards >> is peanuts compared to the price rest of the system and considering >> other demands that's what they'll go for. >> >> That might change once you stick in bunches of videocards in nodes. >> >> Please note that the gpu 'streamcores' or PE's whatever name you >> want to >> give them, are so bloody fast, >> that your code has to work within the PE's themselves and hardly >> use the >> RAM. >> >> Both for Nvidia as well as AMD, the streamcores are so fast, that you >> simply don't want to lose time on the RAM >> when your software runs, let alone that you want to use huge RAM. >> >> Add to that, that nvidia (have to still figure out for AMD) can in >> background stream from and to the gpu's RAM >> from the CPU, so if you do really large calculations involving >> many nodes, >> all that shouldn't be an issue in the first place. >> >> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that >> would >> really amaze me, though i'm sure >> there is cases where that happens. If we see however what was >> ordered it >> mostly is the 3GB Tesla's, >> at least on what has been reported, i have no global statistics on >> that... >> >> Now all choices are valid there, but even then we speak about peanuts >> money compared to the price of >> a single 8 socket Nehalem-ex box, which fully configured will be >> maybe >> $300k-$400k or something? >> >> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD >> 6990 >> is 2000 euro. >> >> There won't be 2 gpu nvidia's any soon because of the choice they >> have >> historically made for the memory controllers. >> See explanation of intel fanboy David Kanter for that at >> realworldtech >> in a special article he wrote there. >> >> Please note i'm not judging AMD nor Nvidia, they have made their >> choices >> based upon totally different >> businessmodels i suspect and we must be happy we have this rich >> choice >> right now between cpu's from different >> manufacturers and gpu's from different manufacturers. >> >> Nvidia really seems to aim at supercomputers, giving their tesla line >> without lobotomization and lobotomizing their >> gamers cards, where AMD aims at gamers and their gamercards have full >> functionality >> without lobotomization. >> >> Total different businessmodels. Both have their advantages and >> disadvantages. >> >> From pure performance viewpoint it's easy to see what's faster >> though. >> >> Yet right now i realize all too well that just too many still >> hesitate >> between also offering gpu services additional to >> cpu services, in which case having a gpu, regardless nvidia or amd, >> kicks butt of course from throughput viewpoint. >> >> To be really honest with you guys, i had expected that by 2011 we >> would >> have a gpu reaching far over 1 Teraflop double precision >> handsdown. If >> we see that Nvidia delivers somewhere around 515 Gflop and AMD has 2 >> gpu's on a single card to get over that Teraflop double precision >> (claim >> is 1.27 Teraflop double precision), >> that really is underneath my expectations from a few years ago. 
>> >> Now of course i hope you realize i'm not coding double precision >> code at >> all; i'm writing everything in integers of 32 bits for the AMD >> card and >> the Nvidia equivalent also is using 32 bits integers. The ideal >> way to >> do calculations on those cards, so also very big transforms, is using >> the 32 x 32 == 64 bits instructions (that's 2 instructions in case >> of AMD). >> >> Regards, >> Vincent >> >> >>> >>> Gus Correa > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Apr 21 09:11:54 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 21 Apr 2011 15:11:54 +0200 Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges In-Reply-To: <4D9E4162.3030004@pathscale.com> References: <4D9E4162.3030004@pathscale.com> Message-ID: Regrettably the link is not available anymore. Can you expand on it? As they count the cloud computing in units of 1Ghz per cpunode hour, 1 billion computing core hours is something like 1000 gpu's for 1 week? 1 billion sounds impressive nevertheless. Regards, Vincent On Apr 8, 2011, at 12:57 AM, C. Bergstr?m wrote: > I just saw this on another ML and thought it may be of interest > ------------ > http://googleblog.blogspot.com/2011/04/1-billion-computing-core- > hours-for.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Thu Apr 21 09:15:13 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Thu, 21 Apr 2011 15:15:13 +0200 Subject: [Beowulf] Google: 1 billion computing core-hours for researchers to tackle huge scientific challenges In-Reply-To: References: <4D9E4162.3030004@pathscale.com> Message-ID: <5188F4D0-1D69-4B8F-874D-D20FDAC25CF6@staff.uni-marburg.de> Am 21.04.2011 um 15:11 schrieb Vincent Diepeveen: > Regrettably the link is not available anymore. Can you expand on it? For me it's still working. You selected both lines? --Reuti > As they count the cloud computing in units of 1Ghz per cpunode hour, > 1 billion computing core hours is something like 1000 gpu's for 1 week? > > 1 billion sounds impressive nevertheless. > > Regards, > Vincent > > On Apr 8, 2011, at 12:57 AM, C. 
Bergström wrote: > >> I just saw this on another ML and thought it may be of interest >> ------------ >> http://googleblog.blogspot.com/2011/04/1-billion-computing-core- >> hours-for.html >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
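As a closing scale check on the Exacycle thread just above (plain arithmetic on the announced figure; the 1 GHz-per-core-hour normalisation mentioned there is the poster's reading of Google's accounting, not something assumed here):

    #include <stdio.h>

    /* 1 billion core-hours expressed in more familiar units. */
    int main(void)
    {
        double core_hours = 1e9;
        printf("cores kept busy for a week: %.0f\n", core_hours / (24.0 * 7.0));
        printf("cores kept busy for a year: %.0f\n", core_hours / (24.0 * 365.0));
        return 0;
    }

That works out to roughly six million core-weeks, or about 114,000 cores running flat out for a year.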