From prentice at ias.edu  Mon Jan 2 14:12:47 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 02 Jan 2012 14:12:47 -0500
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To:
References: <4EFB5AAE.3030900@gmail.com> <715C5657-461B-41E7-9591-5DF89F3CC285@xs4all.nl> <4EFC8D03.4020406@gmail.com> <5AF52A05-28AA-4EE5-A081-EA60BD1E9B32@xs4all.nl> <4EFC9540.5010906@gmail.com>
Message-ID: <4F0201AF.6080509@ias.edu>

On 12/29/2011 02:49 PM, Mark Hahn wrote:
> guys, this isn't a dating site.

...yet.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu  Mon Jan 2 14:15:16 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 02 Jan 2012 14:15:16 -0500
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To:
References: <4EFB5AAE.3030900@gmail.com> <715C5657-461B-41E7-9591-5DF89F3CC285@xs4all.nl> <4EFC8D03.4020406@gmail.com> <5AF52A05-28AA-4EE5-A081-EA60BD1E9B32@xs4all.nl> <4EFC9540.5010906@gmail.com>
Message-ID: <4F020244.4040505@ias.edu>

On 12/29/2011 07:50 PM, Vincent Diepeveen wrote:
> it's very useful Mark, as we know now he works for the company and
> also for which nation.
>
> Vincent

For someone who's always bashing US foreign policy, you sure sound
like a Republican or a member of the Department of Homeland Security!

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org  Wed Jan 11 04:13:02 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Wed, 11 Jan 2012 10:13:02 +0100
Subject: [Beowulf] Course: Parallel Programming of High Performance Systems
Message-ID: <20120111091302.GU21917@leitl.org>

----- Forwarded message from Georg Hager -----

From: Georg Hager
Date: Wed, 11 Jan 2012 01:40:09 +0100 (CET)
To: eugen at leitl.org
Subject: Course: Parallel Programming of High Performance Systems

"Parallel Programming of High Performance Systems" is the yearly
course provided by LRZ and RRZE that gives students and scientists
a solid introduction to:

- Processor and HPC system architectures
- Code development and basic tools
- Scalar optimizations (generic and architecture-specific)
- Parallelization basics
- Parallel programming with OpenMP and MPI

There will also be an additional course with advanced topics, which
covers:

- Parallel performance tools for MPI and OpenMP
- Parallel I/O with MPI I/O
- I/O tuning and libraries

Hands-on sessions will enable participants to apply the concepts
right away. Although the federal HPC system at LRZ Munich is treated
in some detail, most of the concepts conveyed are of general use.
You can find the preliminary course agendas on the web:

Basic course:

Advanced course:

This year the basic course is hosted by RRZE in Erlangen and will be
available at LRZ in Garching via videoconferencing, if a sufficient
number of people are interested. Hands-on sessions will then be
provided at both locations. The advanced course will be hosted by LRZ
in Garching.

Basic course:
============
Location: RRZE, Martensstr. 1, 91058 Erlangen
Date: March 5-9, 2012, 9:00-18:00

Advanced course:
===============
Location: LRZ, Boltzmannstr. 1, 85748 Garching b. Muenchen
Date: March 19-22, 2012, 9:00-18:00

There is no course fee.

Please register for course "HPPP1W11" and/or "HPAT1W11" at the
following LRZ website:

Hoping to see you there,
G. Hager

--
Dr. Georg Hager, HPC Services
Friedrich-Alexander-Universitaet Erlangen-Nuernberg
Regionales RechenZentrum Erlangen (RRZE)
Martensstrasse 1, 91058 Erlangen, Germany
Tel. +49 9131 85-28973, Fax +49 9131 302941
mailto:georg.hager at rrze.uni-erlangen.de
http://www.hpc.rrze.uni-erlangen.de/

----- End forwarded message -----
--
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 10:36:48 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 16:36:48 +0100
Subject: [Beowulf] Course: Parallel Programming of High Performance Systems
In-Reply-To: <20120111091302.GU21917@leitl.org>
References: <20120111091302.GU21917@leitl.org>
Message-ID: <3DCEF7EC-45ED-43C0-9345-A59938AB9861@xs4all.nl>

Yeah, the slides there are from the 2003 lecture - filename
LRZ210703_1.pdf.

Very helpful if you have grey hair and want to port your 1980s
Fortran code to today's HPC hardware.

Vincent

On Jan 11, 2012, at 10:13 AM, Eugen Leitl wrote:

> ----- Forwarded message from Georg Hager -----
> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 11:09:00 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 08:09:00 -0800
Subject: [Beowulf] Course: Parallel Programming of High Performance Systems
In-Reply-To: <3DCEF7EC-45ED-43C0-9345-A59938AB9861@xs4all.nl>
Message-ID:

I don't have grey hair (part grey beard, I confess), but I have plenty
of 70s era FORTRAN that benefits from parallelization - Numerical
Electromagnetics Code V4, specifically.

The implementation has been thoroughly validated and has been used for
decades, finding all the little idiosyncrasies and dealing with
numerical precision issues, etc. There's extensive software around
that generates the card image input files it expects and parses the
line printer output files (with the 1 in column 1 for a page break).

Rewriting it from scratch would not be a very good use of time. You'd
have to revisit all the years of validation and make sure there were
no subtle differences in function, because while there's an official
validation suite, it's more to make sure that the compile worked OK
and there's no egregious problem. And who knows what users out there
have depended on some idiosyncratic implementation aspects.

I suspect the same is true for lots of fluid mechanics and other FEM
codes (NASTRAN, for instance).

So an incremental approach of parallelizing that old FORTRAN,
replacing pieces with "new FORTRAN", for instance, might be useful.

(and don't get me started on my experiences with the f2c engine)

On 1/11/12 7:36 AM, "Vincent Diepeveen" wrote:

> Yeah, the slides there are from the 2003 lecture - filename
> LRZ210703_1.pdf.
>
> Very helpful if you have grey hair and want to port your 1980s
> Fortran code to today's HPC hardware.
>
> Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
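As a concrete sketch of that incremental approach - hypothetical code,
not anything from NEC4, and in C++ with OpenMP rather than Fortran,
though the directive form is analogous (Fortran uses !$omp sentinels).
Compile with -fopenmp; note a parallel reduction can reorder
floating-point sums, so bit-for-bit agreement with the legacy code only
holds when the pragma is disabled:

    #include <cstdio>
    #include <vector>

    // Incremental parallelization: the serial loop body is untouched;
    // only the directive is new. Compiled without -fopenmp the pragma
    // is ignored and the validated serial behaviour is preserved.
    int main() {
        const int n = 1 << 20;
        std::vector<double> field(n, 1.0);
        double total = 0.0;

        #pragma omp parallel for reduction(+ : total)
        for (int i = 0; i < n; ++i) {
            field[i] = field[i] * 0.5 + 1.0;  // stand-in for the real kernel
            total += field[i];
        }

        std::printf("checksum %.6f\n", total);
        return 0;
    }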
From james.p.lux at jpl.nasa.gov  Wed Jan 11 11:18:41 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 08:18:41 -0800
Subject: [Beowulf] A cluster of Arduinos
Message-ID:

For educational purposes..

Has anyone done something where they implement some sort of message
passing API on a network of Arduinos? Since they cost only $20 each,
and have a fairly facile development environment, it seems you could
put together a simple demonstration of parallel processing and various
message passing things.

For instance, you could introduce errors in the message links and do
experiments with Byzantine General type algorithms, or with multiple
parallel routes, etc.

I've not actually tried hooking up multiple Arduinos through a USB hub
to one PC, but if that works, it gives you a nice "head node, debug
console" sort of interface.

Smaller, lighter, cheaper than lashing together MiniITX mobos or
building a Wal-Mart Cluster.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 12:00:43 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:00:43 +0100
Subject: [Beowulf] Course: Parallel Programming of High Performance Systems
In-Reply-To:
References:
Message-ID: <7B7DB325-4FFB-4C68-9602-2E1E71B41D12@xs4all.nl>

On Jan 11, 2012, at 5:09 PM, Lux, Jim (337C) wrote:

> I don't have grey hair (part grey beard, I confess), but I have
> plenty of 70s era FORTRAN that benefits from parallelization -
> Numerical Electromagnetics Code V4, specifically.
> [...]
> (and don't get me started on my experiences with the f2c engine)

No need to get started, Jim - NASA can ask the Russians about that as
well.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu  Wed Jan 11 11:58:59 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Wed, 11 Jan 2012 11:58:59 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID: <4F0DBFD3.3070503@ias.edu>

On 01/11/2012 11:18 AM, Lux, Jim (337C) wrote:
> For educational purposes..
>
> Has anyone done something where they implement some sort of message
> passing API on a network of Arduinos? Since they cost only $20 each,
> and have a fairly facile development environment, it seems you could
> put together a simple demonstration of parallel processing and
> various message passing things.
> [...]
> Smaller, lighter, cheaper than lashing together MiniITX mobos or
> building a Wal-Mart Cluster.

I started tinkering with Arduinos a couple of months ago. Got lots of
related goodies for Christmas, so I've been looking like a mad
scientist building Arduino things lately. I'm still a beginner Arduino
hacker, but I'd be game for giving this a try, if anyone else wants to
give this a go.

The Arduino Due, which is overdue in the marketplace, will have a
Cortex-M3 ARM processor.

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 12:30:30 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:30:30 +0100
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To: <4F020244.4040505@ias.edu>
References: <4EFB5AAE.3030900@gmail.com> <715C5657-461B-41E7-9591-5DF89F3CC285@xs4all.nl> <4EFC8D03.4020406@gmail.com> <5AF52A05-28AA-4EE5-A081-EA60BD1E9B32@xs4all.nl> <4EFC9540.5010906@gmail.com> <4F020244.4040505@ias.edu>
Message-ID: <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>

On Jan 2, 2012, at 8:15 PM, Prentice Bisbal wrote:

> On 12/29/2011 07:50 PM, Vincent Diepeveen wrote:
>> it's very useful Mark, as we know now he works for the company and
>> also for which nation.
>>
>> Vincent
>
> For someone who's always bashing US foreign policy, you sure sound
> like a Republican or a member of the Department of Homeland Security!

Where is my paycheck?

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From ntmoore at gmail.com  Wed Jan 11 12:31:30 2012
From: ntmoore at gmail.com (Nathan Moore)
Date: Wed, 11 Jan 2012 11:31:30 -0600
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0DBFD3.3070503@ias.edu>
References: <4F0DBFD3.3070503@ias.edu>
Message-ID:

I think something like the Raspberry Pi might be easier for this sort
of task. They'll also be about $25, but they'll run something like
ARM/Linux. Not out yet, though.

http://www.raspberrypi.org/

On Wed, Jan 11, 2012 at 10:58 AM, Prentice Bisbal wrote:
> I started tinkering with Arduinos a couple of months ago. [...] I'm
> still a beginner Arduino hacker, but I'd be game for giving this a
> try, if anyone else wants to give this a go.
>
> The Arduino Due, which is overdue in the marketplace, will have a
> Cortex-M3 ARM processor.

--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - - - - - - - - - - - - - - - -

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
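For a sense of what the message-passing node Jim proposes might look
like, here is a minimal sketch against the standard Arduino core. The
frame layout, checksum, and node ID are invented for illustration -
there is no established protocol implied:

    // Minimal framed-message node (Arduino C++). Frame format (made
    // up for this sketch): 0x7E, dest, src, len, payload..., XOR sum.
    const byte FRAME = 0x7E;
    const byte MY_ID = 1;            // set differently on each node

    byte buf[32];

    void setup() {
      Serial.begin(9600);            // link to USB hub / neighbour
    }

    void loop() {
      if (Serial.available() < 4) return;   // wait for a full header
      if (Serial.read() != FRAME) return;   // resync on the frame byte
      byte dest = Serial.read();
      byte src  = Serial.read();
      byte len  = Serial.read();
      if (len > sizeof(buf)) return;        // toy: drop oversized frames
      byte sum = dest ^ src ^ len;
      for (byte i = 0; i < len; ++i) {
        while (!Serial.available()) {}      // busy-wait for payload
        buf[i] = Serial.read();
        sum ^= buf[i];
      }
      while (!Serial.available()) {}
      if (Serial.read() != sum) return;     // drop corrupt frames
      if (dest != MY_ID) return;            // a router node would forward here
      // ...act on buf/len, e.g. toggle an LED for the demo...
    }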
From diep at xs4all.nl  Wed Jan 11 12:43:17 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:43:17 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0DBFD3.3070503@ias.edu>
References: <4F0DBFD3.3070503@ias.edu>
Message-ID: <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>

On Jan 11, 2012, at 5:58 PM, Prentice Bisbal wrote:
> [...]
> The Arduino Due, which is overdue in the marketplace, will have a
> Cortex-M3 ARM processor.

Completely superior chip, that Cortex-M3, though I haven't programmed
much for it so far - it's difficult to get contract jobs for. It can
do fast 32 x 32 bit multiplication; you can even implement RSA very
fast on that chip. Runs at 70 MHz or so?

Usually writing assembler for such CPUs is more efficient, by the way,
than using a compiler. Compilers are, to put it politely, not so
efficient for embedded CPUs. Writing assembler for such CPUs is pretty
straightforward, whereas in HPC things are far more complicated
because of vectorization. AVX is the latest there.

Speaking of AVX, is there already much HPC support for AVX? I see that
after years of wrestling, George Woltman released some prime number
code (GWNUM) that uses AVX - of course, as always, in beta for the
remainder of this century. Claims are that it's a tad faster than the
existing SIMD codes; I saw claims of even above 20% faster, which is
really a lot at that level of engineering - usually you work 6 months
for a 0.5% speedup.

Even if you improve the algorithm, you still lose to this code, as
your C/C++ code will by default be a factor of 10 slower, if not more.
I remember how I found a clever caching trick in 2006 for a Number
Theoretic Transform (that's an FFT over the integers, without the
rounding errors that floating-point FFTs give), yet after some hard
work my C code was still a factor of 8 slower than Woltman's SIMD
assembler.
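For readers who haven't seen AVX at the source level, a toy kernel
written with compiler intrinsics gives the flavour - this is only an
illustrative C++ sketch (compile with -mavx), nothing like Woltman's
hand-written assembler:

    #include <immintrin.h>
    #include <cstdio>

    // Toy AVX kernel: y[i] += a * x[i], 8 floats per iteration.
    void saxpy_avx(float a, const float* x, float* y, int n) {
        __m256 va = _mm256_set1_ps(a);
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 vx = _mm256_loadu_ps(x + i);
            __m256 vy = _mm256_loadu_ps(y + i);
            vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
            _mm256_storeu_ps(y + i, vy);
        }
        for (; i < n; ++i) y[i] += a * x[i];   // scalar tail
    }

    int main() {
        float x[16], y[16];
        for (int i = 0; i < 16; ++i) { x[i] = (float)i; y[i] = 1.0f; }
        saxpy_avx(2.0f, x, y, 16);
        std::printf("y[15] = %f\n", y[15]);    // expect 31.0
        return 0;
    }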
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 12:44:43 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:44:43 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <4F0DBFD3.3070503@ias.edu>
Message-ID: <940F5BCF-8CC3-4461-ABA4-79FBCF9BF057@xs4all.nl>

That's all very expensive considering the CPUs are under $1, I'd
guess.

I actually might need some of this stuff some months from now to build
some robots.

On Jan 11, 2012, at 6:31 PM, Nathan Moore wrote:
> I think something like the Raspberry Pi might be easier for this
> sort of task. They'll also be about $25, but they'll run something
> like ARM/Linux. Not out yet, though.
>
> http://www.raspberrypi.org/
> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 12:58:13 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 09:58:13 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <4F0DBFD3.3070503@ias.edu>
Message-ID:

Yes.. better the widget that one can whip on down to Radio Shack and
buy on my way home from work than the ghostware that may live for
Christmas future.

Also, does the Raspberry Pi $25 price point include a power supply?
The Arduino runs off the USB 5V power, so it's one less thing to
hassle with.

I don't know that performance is all that important in this
application. It's more to experiment with message passing in a
multiprocessor system. Slow is fine. (I can't think of a computational
application for an ArdWulf (combining Italian and Saxon) that wouldn't
be blown away by almost any single computer, including something like
a smart phone.)

Realistically, you're looking at bitbanging kinds of serial
interfaces. I can see several network implementations: SPI shared bus,
hypercubes, toroidal surfaces, etc.

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Nathan Moore
Sent: Wednesday, January 11, 2012 9:32 AM
To: Prentice Bisbal
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

I think something like the Raspberry Pi might be easier for this sort
of task. They'll also be about $25, but they'll run something like
ARM/Linux. Not out yet, though.

http://www.raspberrypi.org/
[...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
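A sketch of the SPI-shared-bus option from the master side, using the
stock Arduino SPI library; the pin assignments and the 0x01 "poll"
opcode are invented for illustration:

    #include <SPI.h>

    // Hypothetical 4-node SPI bus: one chip-select pin per worker.
    const int SS_PINS[] = {7, 8, 9, 10};
    const int N_NODES = 4;

    void setup() {
      SPI.begin();
      for (int i = 0; i < N_NODES; ++i) {
        pinMode(SS_PINS[i], OUTPUT);
        digitalWrite(SS_PINS[i], HIGH);   // deselect everyone
      }
    }

    // Send one command byte to node i and clock back its reply.
    byte exchange(int node, byte cmd) {
      digitalWrite(SS_PINS[node], LOW);   // select
      byte reply = SPI.transfer(cmd);     // full-duplex byte swap
      digitalWrite(SS_PINS[node], HIGH);  // deselect
      return reply;
    }

    void loop() {
      for (int i = 0; i < N_NODES; ++i) {
        byte status = exchange(i, 0x01);  // 0x01: invented poll opcode
        (void)status;                     // route/aggregate as desired
      }
      delay(10);
    }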
From james.p.lux at jpl.nasa.gov  Wed Jan 11 13:00:36 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:00:36 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>
Message-ID:

> The Arduino Due, which is overdue in the marketplace, will have a
> Cortex-M3 ARM processor.

Completely superior chip, that Cortex-M3. [...] Writing assembler for
such CPUs is pretty straightforward, whereas in HPC things are far
more complicated because of vectorization.

-->> ah, but this is not really an HPC application. It's a cluster
computer architecture demonstration platform. The Java-based Arduino
environment is pretty simple and multiplatform. Yes, it uses a sort of
weird C-like language, but there it is... it's easy to use.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 13:19:24 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:19:24 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>
Message-ID:

Yes.. And there have been a bunch of "value clusters" over the years
(StoneSouperComputer, for instance).. But that's still $3k. I could
see putting together 8 nodes for a few hundred dollars. Arduino Uno R3
is about $25 each in quantity.

Think in terms of a small class where you want to have, say, 10
mini-clusters, one per student. No sharing, etc.

-----Original Message-----
From: Alex Chekholko [mailto:alex.chekholko at gmail.com]
Sent: Wednesday, January 11, 2012 10:12 AM
To: Lux, Jim (337C)
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

The LittleFe cluster is designed specifically for teaching and
demonstration. Current cost is ~$3k. But it's all standard x86 and
runs Linux and even has GPUs.

http://littlefe.net/

I saw them build a bunch of them at SC11.

On Wed, Jan 11, 2012 at 10:00 AM, Lux, Jim (337C) wrote:
> It's a cluster computer architecture demonstration platform.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 13:27:31 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:27:31 -0800
Subject: [Beowulf] PAPERS interface
Message-ID:

Arghh.. my google-fu is failing me..

I'm looking for the papers on the PAPERS cluster interface (based on
using parallel ports, back in the 90s) and, of course, if you search
for the word "papers", you get nothing useful..

I can't remember who the authors were or where it was done (I'm
thinking in the Southeast US, for some reason, but I'm not sure).

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From sabujp at gmail.com  Wed Jan 11 13:35:17 2012
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Wed, 11 Jan 2012 12:35:17 -0600
Subject: [Beowulf] PAPERS interface
In-Reply-To:
References:
Message-ID:

https://www.google.com/search?hl=en&q=%22PAPERS%22%20parallel%20port%20interface&btnG=Google+Search

http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1183&context=ecetr

HTH,
Sabuj
Google Proxy Certified Search Partner

On Wed, Jan 11, 2012 at 12:27 PM, Lux, Jim (337C) wrote:
> Arghh.. my google-fu is failing me..
>
> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports, back in the 90s) and, of course, if you search
> for the word "papers", you get nothing useful..
> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 13:37:14 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:37:14 -0800
Subject: [Beowulf] PAPERS interface
In-Reply-To: <4F0DD65B.3060808@nasa.gov>
References: <4F0DD65B.3060808@nasa.gov>
Message-ID:

Thanks.. Also props to Juan Gallego, who found it too..

From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov]
Sent: Wednesday, January 11, 2012 10:35 AM
To: Lux, Jim (337C)
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] PAPERS interface

On 01/11/12 10:27, Lux, Jim (337C) wrote:
> Arghh.. my google-fu is failing me..
> [...]

Hi Jim. The lead author is Hank Dietz. The acronym is: PAPERS -
Purdue's Adapter for Parallel Execution and Rapid Synchronization.

Cheers from NASA Ames...
-jeff

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 13:39:41 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:39:41 -0800
Subject: [Beowulf] PAPERS interface
In-Reply-To:
References:
Message-ID:

Excellent.. Purdue.. and have we really been beowulfing since 1994?
I'll bet that the earliest clusters can legally buy alcohol now...

So, if I build a cluster with Arduinos using the PAPERS style
interface, what will it be called... BeoPaperDuino?

From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Lux, Jim (337C)
Sent: Wednesday, January 11, 2012 10:28 AM
To: beowulf at beowulf.org
Subject: [Beowulf] PAPERS interface

Arghh.. my google-fu is failing me..
[...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From atp at piskorski.com  Wed Jan 11 14:38:53 2012
From: atp at piskorski.com (Andrew Piskorski)
Date: Wed, 11 Jan 2012 14:38:53 -0500
Subject: [Beowulf] PAPERS interface
In-Reply-To:
References:
Message-ID: <20120111193853.GA86203@piskorski.com>

On Wed, Jan 11, 2012 at 10:27:31AM -0800, Lux, Jim (337C) wrote:

> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports, back in the 90s) and, of course, if you

It also came up a few times here on the list, e.g.:

http://www.beowulf.org/archive/2004-October/010934.html
From: Tim Mattox
Date: Sat Oct 16 15:15:14 PDT 2004

--
Andrew Piskorski
http://www.piskorski.com/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 17:47:00 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 23:47:00 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>
Message-ID: <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>

Jim, your microcontroller cluster is not a very good idea.

Latency didn't keep up with CPU speeds...

Today's nodes have 12 CPU cores, soon 16, which can execute - to take
a simple integer example, my chess program and its IPC - about 24
instructions per cycle in aggregate. So nothing SIMD, just simple
integer instructions mostly; of course loads that are effectively
served from L1 play an overwhelming role there.
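Making that back-of-envelope figure explicit (the 3 GHz clock is an
assumption, and this anticipates the correction Vincent posts in his
follow-up below):

    \[
      \text{instructions per remote read}
        = \underbrace{24}_{\text{inst/cycle}}
          \times \underbrace{3\times 10^{9}}_{\text{cycles/s}}
          \times \underbrace{10^{-6}\,\text{s}}_{\text{RDMA latency}}
        = 72{,}000 \approx 75\text{k}.
    \]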
Typical latency for a random memory read from a remote node, even with
the latest networks, is between 0.85 and 1.9 microseconds. Let's take
an optimistic 1 microsecond for an RDMA read... So in that timeframe
you can execute 24k+ instructions.

IPC on the cheapo CPUs is effectively far under 1 - around 0.25 for
most codes. CPUs at 70 MHz with that IPC execute one instruction every
four cycles or so. Now, we are working with rough measures here: let's
call the latency on such a 'cluster' 1/4 of a millisecond - even USB
1.1 sticks have latencies far under 1 millisecond.

So the actual latency of today's clusters, relative to CPU speed, is a
factor of 25k worse than this 'cluster'. In fact your microcontroller
cluster here has relative latencies that you do not even get core to
core within a single CPU today.

There is still too much 1980s and 1990s software out there, written by
the guys who wrote the books about how to parallelize, which simply
doesn't scale at all on modern hardware. Let me not quote too many
names there, as I've done that before. They were just too lazy to
throw away their old code and start over with a new parallel design
that works on today's hardware.

If we involve GPUs, then there is going to be an even bigger problem,
and that's that the bandwidth of the network can't keep up with what a
single GPU delivers. Who is to blame for that is quite a complicated
discussion, if anyone has to be blamed at all. We just need more
clever algorithms there.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 17:56:12 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 23:56:12 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
Message-ID: <106FFC0A-B488-4A39-8C55-7FD27C3BCFC1@xs4all.nl>

On Jan 11, 2012, at 11:47 PM, Vincent Diepeveen wrote:

> Jim, your microcontroller cluster is not a very good idea.
> [...]
> So in that timeframe you can execute 24k+ instructions.

Hah, how easy it is to make a mistake - sorry for that. I didn't even
multiply by the GHz clock of the CPUs yet. So if it's 3 GHz or so,
it's actually closer to a factor of 75k than 24k.

Furthermore, another problem is that you can't fully load networks, of
course. So to keep the network functioning well you want to do such
hammering over the network no more than once every 750k instructions.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov  Wed Jan 11 18:24:55 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 15:24:55 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Vincent Diepeveen
Sent: Wednesday, January 11, 2012 2:47 PM
To: Beowulf Mailing List
Subject: Re: [Beowulf] A cluster of Arduinos

Jim, your microcontroller cluster is not a very good idea.

Latency didn't keep up with CPU speeds...

--- You're missing the point of the cluster. It's not for performance
(where I can't imagine that the slowest single CPU PC out there
wouldn't blow the figurative doors off). It's to provide a very
inexpensive way to experiment/play/demonstrate loosely coupled
multiprocessor systems.

--> For example, you could experiment with redundant message routing
across a fabric of nodes. The algorithms are fairly simple, and this
gives you a testbed which is qualitatively different from just
simulating a bunch of nodes on a single PC. There is pedagogical value
in a system where you can force a link error by just disconnecting the
cable, and your blinky lights on each node show what's going on.

There is still too much 1980s and 1990s software out there, written by
the guys who wrote the books about how to parallelize, which simply
doesn't scale at all on modern hardware.

--> I think that a lot of the theory of parallel processes is speed
independent, and while some historical approaches might not be used in
a modern system for good implementation reasons, students and others
still need to learn about them, if only as the canonical approach.
Sure, you could do a simulation on a single PC (and I've seen them, in
Simulink and in other more specialized tools), but there's a lot of
appeal to a hands-on-the-cheap-hardware approach to learning.

--> To take an example, if you set a student a problem of lighting a
LED on each node in a specified node order at specified intervals,
where the node interconnects are not specified in advance, that's a
fairly interesting homework problem. You have to discover the network
connectivity graph, then figure out how to pass the message to the
appropriate node at the appropriate time. This is a classic "hot plug
network discovery" kind of problem, and in the face of intermittent
links, it's of great interest.

--> While that particular problem isn't exactly HPC, it DOES relate to
HPC in a world where you cannot assume perfect processor nodes and
perfect communications links. And that gets right to the whole
"scalability" thing in HPC. It wasn't till the implementation of error
correcting codes in logic that something like the Q7A computer was
even possible, because it was so large that you couldn't guarantee
that all the tubes would be working all the time. Likewise with many
other aspects of modern computing.

--> And, of course, in the spaceflight world, this kind of thing is
even more important. A concept of growing importance is the
"fractionated spacecraft", where all of the functions that would have
been all in one physical vehicle are now spread across many smaller
pieces. And one might reallocate spacecraft fractional pieces between
different virtual spacecraft. Maybe right now you need a lot of
processing power to do image compression and analysis, so you want to
allocate a lot of "processing pieces" to the job, with an ad hoc
network connection among them. Later, you don't need them, so you can
release them to other uses. The pieces might be in the immediate
vicinity, or they might be some distance away, which affects the data
rate in the link and its error rates.

--> You can legitimately ask whether this sort of thing (the
fractionated spacecraft) is a Beowulf (defined as a cluster
supercomputer built of commodity components) and I would say it shares
many of the same properties, especially in the early Beowulf days
before multicores and fancy interconnects were fashionable for
multi-thousand processor clusters. It's that idea of building a large
complex device out of many basically identical subunits, using open
source/simple software to manage it.

-->> In summary, it's not about performance.. it's about a teaching
tool for networking in the context of cluster computing. You claim we
need to cast off the shackles of old programming styles and get some
new blood and ideas. Well, you need to get people interested in
parallel computing and learning the basics (so at least they don't
reinvent the square wheel). One way might be challenges such as
parallelization of game play; another might be working with
parallelized databases; the way I propose is with experimenting with
message passing parallelization using dirt cheap hardware.
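The discovery half of the homework problem posed above reduces to
flooding/breadth-first search. A tiny host-side simulation sketch in
C++, with a made-up five-node topology standing in for the real links:

    #include <cstdio>
    #include <queue>
    #include <vector>

    // Simulated discovery of an unknown node graph by flooding from
    // node 0, recording each node's hop distance and upstream link -
    // the routing table the LED exercise needs. Adjacency is invented.
    int main() {
        std::vector<std::vector<int>> link = {
            {1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
        int n = (int)link.size();
        std::vector<int> hops(n, -1), via(n, -1);
        std::queue<int> q;
        hops[0] = 0;
        q.push(0);
        while (!q.empty()) {                 // each round = one message wave
            int u = q.front(); q.pop();
            for (int v : link[u]) {
                if (hops[v] != -1) continue; // already discovered
                hops[v] = hops[u] + 1;
                via[v] = u;                  // first-heard-from = route home
                q.push(v);
            }
        }
        for (int v = 0; v < n; ++v)
            std::printf("node %d: %d hops, via %d\n", v, hops[v], via[v]);
        return 0;
    }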
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org  Wed Jan 11 19:18:11 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 11 Jan 2012 19:18:11 -0500
Subject: [Beowulf] PAPERS interface
In-Reply-To:
References:
Message-ID: <2d6fa78f1fc44cea3df118e1c0a27f31.squirrel@mail.eadline.org>

Hank Dietz - was at Purdue, now at Kentucky; see aggregate.org

--
Doug

> Arghh.. my google-fu is failing me..
>
> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports, back in the 90s) and, of course, if you search
> for the word "papers", you get nothing useful..
>
> I can't remember who the authors were or where it was done (I'm
> thinking in the Southeast US, for some reason, but I'm not sure).

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl  Wed Jan 11 19:36:37 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 01:36:37 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
Message-ID:

Yes, this was impossible to explain to a bunch of MIT folks as well -
some of whom wrote your book, I bet. Yet the slower the processor, the
more of a true SMP system it is. It's obvious that you missed that
point. Writing code for a multicore is tougher, from an SMP
constraints viewpoint, than for a bunch of 70 MHz CPUs that have a
millisecond of latency to the other CPUs.

So it's far from demonstrating cluster programming - light-years away.
Emulation on a simple quad-core is in fact more representative than
this.

If you want to get closer to cluster programming than this, just buy
yourself off eBay some Barcelona-core SMP system with 4 sockets, say
with energy-efficient 1.8 GHz CPUs - so with one of the first
incarnations of HyperTransport, as of course later on it dramatically
improved. Latency from CPU to CPU is some 300+ ns if you look up
randomly. Even good programmers in game tree search have big problems
working with those latencies.
If i put in the name 'mellanox' in ebay i see bunches of cheap cards out there and also switches. With a single switch you can teach half a dozen students. You can just connect the machines you already got there onto a few switches and write MPI code like that. Average cost per student also will be a couple of hundreds of dollars. Vincent On Jan 12, 2012, at 12:24 AM, Lux, Jim (337C) wrote: > > > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Wednesday, January 11, 2012 2:47 PM > To: Beowulf Mailing List > Subject: Re: [Beowulf] A cluster of Arduinos > > Jim, your microcontroller cluster is not a rather good idea. > > Latency didn't keep up with the CPU speeds... > > --- You're missing the point of the cluster. It's not for > performance (where I can't imagine that the slowest single CPU PC > out there wouldn't blow the figurative doors off). It's to provide > a very inexpensive way to experiment/play/demonstrate loosely > coupled multiprocessor systems. > > --> for example, you could experiment with redundant message > routing across a fabric of nodes. The algorithms are fairly > simple, and this gives you a testbed which is qualitatively > different than just simulating a bunch of nodes on a single PC. > There is pedagogical value in a system where you can force a link > error by just disconnecting the cable, and your blinky lights on > each node show what's going on. > > > There is still too much years 80s and years 90s software out there, > written by the guys who wrote books about how to parallellize, > which simply doesn't scale at all at modern hardware. > > --> I think that a lot of the theory of parallel processes is > speed independent, and while some historical approaches might not > be used in a modern system for good implementation reasons, > students and others still need to learn about them, if only as the > canonical approach. Sure, you could do a simulation on a single > PC (and I've seen them, in Simulink, and in other more specialized > tools), but there's a lot of appeal to a hands-on-the-cheap- > hardware approach to learning. > > --> To take an example, if you set a student a problem of lighting > a LED on each node in a specified node order at specified > intervals, and where the node interconnects are not specified in > advance, that's a fairly interesting homework problem. You have to > discover the network connectivity graph, then figure out how to > pass the message to the appropriate node at the appropriate time. > This is a classic "hot plug network discovery" kind of problem, and > in the face of intermittent links, it's of great interest. > > --> While that particular problem isn't exactly HPC, it DOES relate > to HPC in a world where you cannot assume perfect processor nodes > and perfect communications links. And that gets right to the whole > "scalability" thing in HPC. It wasn't til the implementation of > Error Correcting Codes in logic that something like the Q7A > computer was even possible, because it was so large that you > couldn't guarantee that all the tubes would be working all the > time. Likewise with many other aspects of modern computing. > > --> And, of course, in the spaceflight world, this kind of thing is > even more important. A concept of growing importance is the > "fractionated spacecraft" where all of the functions that would > have been all in one physical vehicle are now spread across many > smaller pieces. 
> And one might reallocate spacecraft fractional pieces between different virtual spacecraft. Maybe right now, you need a lot of processing power to do image compression and analysis, so you want to allocate a lot of "processing pieces" to the job, with an ad hoc network connection among them. Later, you don't need them, so you can release them to other uses. The pieces might be in the immediate vicinity, or they might be some distance away, which affects the data rate in the link and its error rates.
>
> --> You can legitimately ask whether this sort of thing (the fractionated spacecraft) is a Beowulf (defined as a cluster supercomputer built of commodity components) and I would say it shares many of the same properties, especially in the early Beowulf days before multicores and fancy interconnects were fashionable for multi-thousand-processor clusters. It's that idea of building a large complex device out of many basically identical subunits, using open source/simple software to manage it.
>
> -->> in summary, it's not about performance.. it's about a teaching tool for networking in the context of cluster computing. You claim we need to cast off the shackles of old programming styles and get some new blood and ideas. Well, you need to get people interested in parallel computing and learning the basics (so at least they don't reinvent the square wheel). One way might be challenges such as parallelization of game play; another might be working with a parallelized database; the way I propose is experimenting with message-passing parallelization using dirt cheap hardware.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From samuel at unimelb.edu.au Wed Jan 11 19:59:18 2012
From: samuel at unimelb.edu.au (Chris Samuel)
Date: Thu, 12 Jan 2012 11:59:18 +1100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: References: Message-ID: <201201121159.18993.samuel@unimelb.edu.au>

On Thu, 12 Jan 2012 11:36:37 AM Vincent Diepeveen wrote:

> So it's far from demonstrating cluster programming. Light-years away.

Whatever happened to hacking on hardware just for the fun of it?

Just because it's not going to be useful doesn't mean you won't learn from the experience, even if the lesson is only "don't do it again". :-)

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From samuel at unimelb.edu.au Wed Jan 11 20:04:32 2012
From: samuel at unimelb.edu.au (Chris Samuel)
Date: Thu, 12 Jan 2012 12:04:32 +1100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: References: Message-ID: <201201121204.32332.samuel@unimelb.edu.au>

On Thu, 12 Jan 2012 04:58:13 AM Lux, Jim (337C) wrote:

> Also, does the Raspberry PI $25 price point include a power supply?

I thought the plan was for them to be powered from the HDMI connector, but it appears I was wrong; it looks like it can use either micro-USB or the GPIO header.

http://elinux.org/RaspberryPiBoard

# The board takes fixed 5V input, (with the 1V2 core voltage generated
# directly from the input using the internal switch-mode supply on the
# BCM2835 die). This permits adoption of the micro USB form factor,
# which, in turn, prevents the user from inadvertently plugging in
# out-of-range power inputs; that would be dangerous, since the 5V
# would go straight to HDMI and output USB ports, even though the
# problem should be mitigated by some protections applied to the input
# power: The board provides a polarity protection diode, a voltage
# clamp, and a self-resetting semiconductor fuse.

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Wed Jan 11 20:09:53 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 17:09:53 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Vincent Diepeveen
Sent: Wednesday, January 11, 2012 4:37 PM
To: Beowulf Mailing List
Subject: Re: [Beowulf] A cluster of Arduinos

Yes, this was impossible to explain to a bunch of MIT folks as well, some of whom wrote your book, I bet - yet the slower the processor, the more of a true SMP system it is. It's obvious that you missed that point.

Writing code for a multicore is tougher, from an SMP-constraints viewpoint, than for a bunch of 70 MHz CPUs that have a millisecond of latency to the other CPUs.

-> Yes, that's true... but that's also what I would think of as more advanced than understanding basic message passing or non-tightly-coupled multiprocessing systems. And there are lots of applications for the latter. Some might not be as sexy as others, but they exist.

So it's far from demonstrating cluster programming. Light-years away. Emulation on a simple quad-core is in fact more representative than this. If you want to get closer to cluster programming than this, just buy yourself, off eBay, some Barcelona-core SMP system with 4 sockets - say with energy-efficient 1.8 GHz CPUs, so with one of the first incarnations of HyperTransport (of course it improved dramatically later on). Latency from CPU to CPU is some 300+ ns if you look up randomly. Even good programmers in game-tree search have big problems working with those latencies.
-> but that's an entirely different sort of problem space and instructional area.

Clusters have latencies that are far worse than that. Yet as CPU speeds no longer increase much and the number of cores doesn't double that quickly, clusters are the way to go if you're CPU hungry. Setting up small clusters is cheap as well. If I put the name 'mellanox' into eBay I see bunches of cheap cards out there, and switches as well.

-> Oh, I'm sure the surplus market is full of things one could potentially use. But I suspect that by the time you lash together your $40 cards and $20 cables and several-hundred-$ switch, you're up in the total system price >$1k. And you're using surplus, so there's a support issue. If you're tinkering for yourself in the garage or as a one-off, then surplus is a fine way to go. If you want to be able to give a list of "go buy this" to a teacher, it needs to be off-the-shelf, currently-being-manufactured stuff.

-> Say you want to set up 10 demo systems with 8 nodes each, so that each student in a small class has their own to work with. There's a big difference between $30 Arduinos and $200 netbooks.

With a single switch you can teach half a dozen students. You can just connect the machines you already have onto a few switches and write MPI code like that.

-> The whole point is to give a student exclusive access to the system, without needing to share. Sure, we've all done the shared "computer lab" resource thing and managed to learn (in the late 1970s, I would have done quite a lot to have on-demand access to an 029 keypunch). That's part of what *personal* computers is all about. My program doesn't work right, I just hit the reset button and start over.

-> I confess, too, that there is an aspect of the "mass of boards on the desktop with cables strewn around" which is a learning experience in itself. On the other hand, the Arduino experience is a lot less hassle than, say, a mass of PC mobos, network cards, and power supplies and trying to get them to boot off the net or a USB drive.

Average cost per student will also be a couple of hundred dollars.

-> that's the "total cost of several thousand dollars divided by N students who share it", I suspect. We could get into a little BOM battle, and I'd venture that I can keep the off-the-shelf parts cost under $500, and give each student a dedicated system to play with. The only part that I don't know right off the top of my head is the actual interconnect hardware. I think you'd want to design some sort of board with a bunch of connectors that connects to the Arduinos with ribbon cables. But even there, that could be "here's your PCBExpress file.. order the board and you get 3 for $50".

-> over the years I've been involved in several of these "what can we set up for a demonstration" exercises, and I've converged to the realization that what you need is a parts list (preferably preloaded at Newark or DigiKey or Mouser or similar) and an explicit set of instructions. A setup that starts out with:
1) Find 8 motherboards on eBay or newegg with these sorts of specs
2) Find 8 power supplies that match the motherboards
is doomed to failure. You need "buy 3 of those and 6 of these, and hook them up this way".

This is the beauty of the whole Arduino culture. In fact, it's a bit too much of that.. there's not a lot of good overview tutorial material.. but lots of "here's how to do specific task X"... I got started looking at Arduinos because I want to build a multichannel temperature controller to smoke/cure sausage.
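To make the "message passing on dirt cheap hardware" exercise concrete: a node program can be a couple of dozen lines. The following is an illustrative sketch only - it assumes a hypothetical daisy-chained ring where each board's TX pin feeds the next board's RX, a fixed ring size, and a per-node ID edited before uploading; a PC on one node's USB port injects the first token:

// Toy "pass the token" node for a ring of Arduinos.
// Hypothetical wiring: this board's TX -> next board's RX, around the ring.
const int LED_PIN   = 13;  // onboard LED on an Uno
const int MY_ID     = 2;   // edit per node before uploading (0, 1, 2, ...)
const int NUM_NODES = 8;   // assumed ring size

void setup() {
  pinMode(LED_PIN, OUTPUT);
  Serial.begin(9600);      // the serial link doubles as the "interconnect"
}

void loop() {
  if (Serial.available() > 0) {
    int token = Serial.read();
    if (token == MY_ID) {
      digitalWrite(LED_PIN, HIGH);   // our turn: blink...
      delay(100);
      digitalWrite(LED_PIN, LOW);
      Serial.write((uint8_t)((token + 1) % NUM_NODES));  // ...then address the next node
    } else {
      Serial.write((uint8_t)token);  // not ours: forward unchanged
    }
  }
}

Pulling a jumper mid-run and watching the blink pattern stall is precisely the "force a link error by just disconnecting the cable" demonstration described above.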
But I've used just about every small single-board computer out there: Rabbit, Basic Stamp, various PIC boards, etc., not to mention various Mini-ITX and PC schemes. So far, the Arduino is the winner on dirt cheap and simple combined. Spend $30, plug in the USB cable, load the Java environment, done. Now I know why all those projects at the science fair are using them. You get to focus on what you want to do, rather than on getting a computer working.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Wed Jan 11 20:22:07 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 17:22:07 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <201201121204.32332.samuel@unimelb.edu.au> References: <201201121204.32332.samuel@unimelb.edu.au> Message-ID:

Interesting... That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 B-style board has Ethernet, and assuming one could netboot and operate "headless", then a stack o'Raspberry Pis and a cheap Ethernet switch might be an alternate approach.

The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run.

Drawing 700 mA off the micro-USB, though.. that's fairly hefty (although not a big deal in general..
you might need to have some better power supply scheme for a basket-o'-Pi cluster. (An Arduino Uno runs around 40-50 mA.)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Wed Jan 11 21:03:21 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 03:03:21 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> Message-ID: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>

The whole purpose of PCs is that they are generic to use. I remember how, in the past, the decision-takers bought low-clocked junk for a big price - much against the wishes of the sysadmins, who wanted a PC for every student exclusively. Outdated slow junk is not interesting to students. Now you and I might like that CPU as it's under $1, but to them it's just 70 MHz, a factor of 500 slower than their home PC's single core. What impresses is if you've got something that can beat their own machine at home.

In the end, in science we basically learn a lot more easily if we can take a look into the future - so being faster than a single PC is a good example of that. So let them do that. If you take care to launch 1 process on each machine, then with quad-core machines, not to mention i7s with hyperthreading, you can have 24 computers on 1 switch that serve 24 students, each using 12 logical cores (see the launch sketch below). And for demonstration purposes you can run successful applications on all 24 computers at the same time. Hey, there are switches with even more ports.

Average price per student is gonna beat the crap out of any junk solution you show up with - besides, how many are you gonna buy? Those computers are already there, one for each student I suspect. So they can toy away exclusively - for the switch it's not a real problem except if they really mess up. But most important, they learn something - by toying with 70 MHz hardware that's not representative, and only interesting to experts like you and me who are really good at embedded programming, they don't learn much.
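The one-process-per-machine launch sketched above is just a hostfile away. Illustrative only, assuming Open MPI; the host names are placeholders:

# hosts -- one slot per lab machine, so each rank lands on its own box
pc01 slots=1
pc02 slots=1
# ... continue through pc24 slots=1

mpirun --hostfile hosts -np 24 ./my_mpi_app

With slots=1 per entry, the 24 ranks spread out one per machine instead of piling onto the first few nodes.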
There is no replacement for the real thing to test upon. Besides, if you go programming embedded processors: me writing good, fast single-CPU code is probably gonna kick the hell out of you writing the same program for 8 CPUs - probably by a factor of 10+, my single core against your 8.

p.s. Not that it's disturbing, Jim, but your replies are always typed within my original message, so it's sometimes tough to read what you typed into the message I posted here - maybe this Apple MacBook Pro's mail program doesn't know how to handle it. FYI, I want to reformat the machine to Linux anyway - getting sick of being hacked silly each time by about every other consultant. But, well, this is all off topic - hence the postscriptum.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From eagles051387 at gmail.com Thu Jan 12 02:42:26 2012
From: eagles051387 at gmail.com (Jonathan Aquilina)
Date: Thu, 12 Jan 2012 08:42:26 +0100
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To: <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>
References: <4EFB5AAE.3030900@gmail.com> <715C5657-461B-41E7-9591-5DF89F3CC285@xs4all.nl> <4EFC8D03.4020406@gmail.com> <5AF52A05-28AA-4EE5-A081-EA60BD1E9B32@xs4all.nl> <4EFC9540.5010906@gmail.com> <4F020244.4040505@ias.edu> <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>
Message-ID: <4F0E8EE2.7040403@gmail.com>

On 11/01/2012 18:30, Vincent Diepeveen wrote:
> On Jan 2, 2012, at 8:15 PM, Prentice Bisbal wrote:
>> On 12/29/2011 07:50 PM, Vincent Diepeveen wrote:
>>> it's very useful Mark, as we know now he works for the company and
>>> also for which nation.
>>>
>>> Vincent
>> For someone who's always bashing on US Foreign policy, you sure sound
>> like a Republican or member of the Department of Homeland Security!
> Where is my paycheck?

FYI Vincent, I am now back in Malta.

Regards
Jonathan Aquilina

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Thu Jan 12 03:49:45 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 12 Jan 2012 09:49:45 +0100
Subject: [Beowulf] the Barcelona Supercomputing Center
Message-ID: <20120112084945.GD21917@leitl.org>

Just some cluster porn:

http://imgur.com/a/OoNVI

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at mclaren.com Thu Jan 12 05:16:28 2012
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 12 Jan 2012 10:16:28 -0000
Subject: [Beowulf] A cluster of Arduinos
References: <201201121204.32332.samuel@unimelb.edu.au>
Message-ID: <207BB2F60743C34496BE41039233A8090A7D728A@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> Interesting... That seems to be a growing trend, then. So, now we just have to wait
> for them to actually exist. The $35 B-style board has Ethernet, and
> assuming one could netboot and operate "headless", then a stack
> o'Raspberry Pis and a cheap Ethernet switch might be an alternate
> approach.

Regarding Ethernet switches, I had cause recently to look for a USB-powered switch. Such things exist; they are promoted for gamers.

http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casing-10-100-switch-usb-powered-lan-party!

You could imagine a cluster being powered by those USB adapters which fit into the cigarette lighter socket of a car. How about a cluster which fits in the glovebox or under the seat of a car?

The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From peter.st.john at gmail.com Thu Jan 12 08:49:16 2012
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 12 Jan 2012 08:49:16 -0500
Subject: [Beowulf] the Barcelona Supercomputing Center
In-Reply-To: <20120112084945.GD21917@leitl.org> References: <20120112084945.GD21917@leitl.org> Message-ID:

The architectural contrast (the building housing the racks is a chapel) is vivid. Sorta steampunkish. The place is described a bit at http://www.bsc.es/plantillaA.php?cat_id=1 (many of their pages seem to be in English).

Peter

On Thu, Jan 12, 2012 at 3:49 AM, Eugen Leitl wrote:
> Just some cluster porn:
>
> http://imgur.com/a/OoNVI

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ellis at runnersroll.com Thu Jan 12 08:58:20 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 08:58:20 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
Message-ID: <4F0EE6FC.2050002@runnersroll.com>

On 01/11/2012 09:03 PM, Vincent Diepeveen wrote:
> The whole purpose of PCs is that they are generic to use. I remember
> how, in the past, the decision-takers bought low-clocked junk for a big price -
> much against the wishes of the sysadmins, who wanted a PC for every
> student exclusively. Outdated slow junk is not interesting
> to students. Now you and I might like that CPU as it's under $1, but
> to them it's just 70 MHz, a factor of 500 slower than their home PC's single core.
> What impresses is if you've got something that can beat their own
> machine at home.
>
> In the end, in science we basically learn a lot more easily if we can take
> a look into the future - so being faster than a single PC is a good
> example of that.

Take this advice into any other area - let's say Chemical Engineering or Mechanical Engineering - and the students are going to come out of the experience with, at the least, chemical burns, and at most they'll blow up half of the building. In the best case all they do is screw up very, very expensive equipment. So I have to respectfully disagree that learning is only possible, and students only interested, when working on the stuff of the "future." I think this is likely the reason why many introductory engineering classes incorporate Lego Mindstorms robots rather than lunar rovers (or even overstock lunar rovers :D).

Case in point: I got interested in HPC/Beowulfery back in 2006, read RGB's book and a few other texts on it, and finally found a small group (4) of unused PIIIs to play on in the attic of one of my college's buildings. Did I learn how to set up a reasonable cluster? Yes. Was it slow as dirt compared to then-modern Intel and AMD processors? Of course. But did the experience get me so completely hooked on HPC/cluster research that I went on to pursue a PhD on the topic? Absolutely.

Granted, I'm just one data point, but I think Jim's idea has all the right components for a great educational experience.

Best,

ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:28:56 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:28:56 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: References: <201201121204.32332.samuel@unimelb.edu.au> Message-ID: <4F0EEE28.6030404@ias.edu>

On 01/11/2012 08:22 PM, Lux, Jim (337C) wrote:
> Interesting... That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 B-style board has Ethernet, and assuming one could netboot and operate "headless", then a stack o'Raspberry Pis and a cheap Ethernet switch might be an alternate approach.
>
> The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run.
You can get an Ethernet "shield" for the Arduino to add Ethernet capabilities, but at $35-50 each, your cost savings just went out the window, especially when compared to the Raspberry Pi. You can also buy the Arduino Ethernet, which is an Arduino board with Ethernet built in, but at a cost of ~$60 it is no better a value than buying an Arduino and the Ethernet shield separately.

> Drawing 700 mA off the micro-USB, though.. that's fairly hefty (although not a big deal in general.. you might need to have some better power supply scheme for a basket-o'-Pi cluster. (An Arduino Uno runs around 40-50 mA.)

The Arduino can be powered by USB or a 9V power supply, so if you plan on using lots of them (as Jim is, theoretically), you don't have to worry about overloading the USB bus.

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Thu Jan 12 09:35:50 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 06:35:50 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <207BB2F60743C34496BE41039233A8090A7D728A@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID:

On 1/12/12 2:16 AM, "Hearns, John" wrote:

>Regarding Ethernet switches, I had cause recently to look for a USB-powered switch.
>Such things exist; they are promoted for gamers.
>http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casing-10-100-switch-usb-powered-lan-party!
>
>You could imagine a cluster being powered by those USB adapters which
>fit into the cigarette lighter socket of a car.
>How about a cluster which fits in the glovebox or under the seat of a car?

Powering off the cigarette lighter socket (or 12V power socket, as they're now labeled) is probably feasible, but those USB widgets can't source a lot of power. Certainly not amps.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
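Some rough numbers, using the figures quoted earlier in this thread (the adapter rating is a typical value of the era, not a measured one):

  8 boards x 0.70 A x 5 V = 28 W  (a stack of eight Raspberry Pis)
  8 boards x 0.05 A x 5 V =  2 W  (the same stack built from Arduino Unos)

A typical 5 V car USB adapter supplies 0.5-1 A (2.5-5 W), so it could carry the whole Arduino stack but at best one Pi per adapter; a glovebox Pi cluster would need a proper supply off the 12 V rail.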
From diep at xs4all.nl Thu Jan 12 09:39:23 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 15:39:23 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EE6FC.2050002@runnersroll.com> References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> Message-ID: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>

The average guy is not interested in knowing all the details of how to play tennis with a wooden racket from the 1980s, from around the time McEnroe was out there on the tennis court. Most people are more interested in whether you can win that grand slam with what you produce.

The nerds, however, are interested in how well you can do with a wooden racket from the 1980s. Therefore, projecting your own interest upon those students will just get them disinterested, and you will be judged by them as an irrelevant person in their life, whose name they soon forget.

Vincent
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:38:13 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:38:13 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> Message-ID: <4F0EF055.3050609@ias.edu>

On 01/11/2012 09:03 PM, Vincent Diepeveen wrote:
> The whole purpose of PCs is that they are generic to use.

That is also the purpose of the Arduino. That's why they open-sourced its hardware design.

> I remember how, in the past, the decision-takers bought low-clocked junk for a big price -
> much against the wishes of the sysadmins, who wanted a PC for every
> student exclusively. Outdated slow junk is not interesting
> to students. Now you and I might like that CPU as it's under $1, but
> to them it's just 70 MHz, a factor of 500 slower than their home PC's single core.
> What impresses is if you've got something that can beat their own
> machine at home.

Wrong. What impresses students is teaching them something they didn't already know, or showing them how to do something new. Using baking soda and vinegar to build a volcano is very low-tech, but it still impresses students of all ages (even in this modern Apple i-everything world), and it's done with ingredients just about everyone already has in their kitchen. Show them sodium acetate crystallizing out of a supersaturated solution, and their heads practically explode. Also very low-tech.

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:50:05 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:50:05 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> Message-ID: <4F0EF31D.8010603@ias.edu>

On 01/12/2012 09:39 AM, Vincent Diepeveen wrote:
> The average guy is not interested in knowing all the details of how to
> play tennis with a wooden racket from the 1980s, from around
> the time McEnroe was out there on the tennis court.
> Most people are more interested in whether you can win that grand slam
> with what you produce.
> The nerds, however, are interested in how well you can do with a wooden
> racket from the 1980s. Therefore, projecting your own interest upon those students
> will just get them disinterested, and you will be judged by them as an
> irrelevant person in their life, whose name they soon forget.

Vincent, I think the only person projecting here is you. You refer to the 'average guy'. The word 'average' itself implies that statistics have been collected and analyzed. Can you please show us your statistics, and how you collected them, to determine what the average guy is interested in? And what about the average girl - what is she interested in? If you are merely citing the work of other researchers, please include citations.

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ellis at runnersroll.com Thu Jan 12 09:53:57 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 09:53:57 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EF31D.8010603@ias.edu> References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> <4F0EF31D.8010603@ias.edu> Message-ID: <4F0EF405.5070600@runnersroll.com>

On 01/12/2012 09:50 AM, Prentice Bisbal wrote:
> Vincent, I think the only person projecting here is you. You refer to
> the 'average guy'. The word 'average' itself implies that statistics
> have been collected and analyzed. [...]
> please include citations.

Guys, let's just let this one die in its traditional form of "Vincent disagrees with the list and there is nothing more that can be done." I recently read a blog that suggested (due to similar threads following these trajectories) that the Wulf list isn't what it used to be. Let's save the flames for editors,

ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From diep at xs4all.nl Thu Jan 12 10:03:49 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:03:49 +0100
Subject: [Beowulf] A cluster of Arduinos

Very simple. Wooden tennis rackets were dirt cheap in the 90s. No one
bought them. Instead, everyone bought a light-frame racket with a big
blade for the tennis court; in fact, those were pretty expensive in some
cases. Why did no one suddenly use those wooden rackets anymore?

How many people will watch the upcoming Australian grand slam? A lot.

How many will watch one or two dudes toy with a few embedded processors
using a language no one has heard of? Only a handful.

On Jan 12, 2012, at 3:50 PM, Prentice Bisbal wrote:
> Vincent, I think the only person projecting here is you. [...]

From james.p.lux at jpl.nasa.gov Thu Jan 12 10:10:40 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 07:10:40 -0800
Subject: [Beowulf] A cluster of Arduinos

On 1/12/12 6:39 AM, "Vincent Diepeveen" wrote:

> The average guy is not interested in knowing all the details of how to
> play tennis with a wooden racket from the 1980s, from around the time
> McEnroe was on the court.
>
> Most people are more interested in whether you can win a grand slam
> with what you produce.
>
> The nerds, however, are interested in how well you can do with a wooden
> racket from the 1980s. Projecting your own interests onto those students
> will just leave them uninterested, and they will judge you an irrelevant
> person in their lives whose name they will soon forget.

Having spent some time recently in Human Resources meetings about how to
better recruit software people for JPL, I'd say that something that
appeals to nerds and gives them something to do is not all bad. Part of
the educational process is to find and separate the people who are
interested and have a passion. I'm not sure that someone who starts
getting into clusters mostly because they are interested in breaking into
the Top500 is the target audience in any case.

If you look over the hobby clusters out there, the vast majority are "hey,
I heard about this interesting idea, I scrounged up N old/small/slow/easy
to find computers and tried to cluster them and do something. I learned
something about cluster administration, and it was fun, but I don't use it
anymore."

This is exactly the population you want to hit. Bring in 100 advanced
high school (grade 11-12 in US) students. Have them all use cheap
hardware to do a cluster. Some fraction will think, "this is kind of
cool, maybe I should major in CS instead of X." Some fraction will think,
"how lame, why not make the single processor faster," and they can be
CompEng or EE majors looking at how to reduce feature sizes and get the
heat out.

It's just like biology or chemistry classes. In high school biology
(9th/10th grade) most of it is mundane memorization (the Krebs cycle,
various descriptive stuff). Other than the use of cheap CMOS cameras,
microscopes used at this level haven't really changed much in the last
100 years (and the microscopes at my kids' school are probably 10-20
years old). They also do some more modern molecular biology in a series
of labs partly funded by Amgen: some recombinant DNA to put fluorescent
proteins in a bacterium, running some gels, etc. The vast majority of the
students will NOT go on to a career in biology, but some fraction do;
they get interested in some aspect, and they wind up majoring in bio, or
being a pre-med, etc.

Not everyone is looking for the world beater. A lot of kids start with
kart racing, even though even the fastest karts aren't as fast as F1 (or
even a Smart Car). How many engineers started by dismantling the
lawnmower engine?

For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure-prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to
have multiple cores).
From diep at xs4all.nl Thu Jan 12 10:13:00 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:13:00 +0100
Subject: [Beowulf] A cluster of Arduinos

On Jan 12, 2012, at 3:53 PM, Ellis H. Wilson III wrote:
> Guys, let's just let this one die in its traditional form of "Vincent
> disagrees with the list and there is nothing more that can be done."

Ah, no medicine seems to cure you.

Let me recall the original posting of Jim:

"it seems you could put together a simple demonstration of parallel
processing and various message passing things."

The insights presented here obviously render this platform no good for
that. It is not inspiring, and the clever students will certainly lose
interest; a bunch of them, out of disinterest, will probably not even
finish the course. Working with hardware that isn't even within a factor
of 500 of the speed of a normal CPU doesn't motivate, doesn't inspire,
and teaches a person very little.

Embedded CPUs are for professionals; leave it at that. They are too hard
for you to program efficiently.
From diep at xs4all.nl Thu Jan 12 10:21:54 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:21:54 +0100
Subject: [Beowulf] A cluster of Arduinos
Message-ID: <4B53A30C-FF3E-4996-916F-2D8455C90C5D@xs4all.nl>

On Jan 12, 2012, at 4:10 PM, Lux, Jim (337C) wrote:
> This is exactly the population you want to hit. Bring in 100 advanced
> high school (grade 11-12 in US) students. Have them all use cheap
> hardware to do a cluster. Some fraction will think, "this is kind of
> cool, maybe I should major in CS instead of X." [...]

Your example here will just ensure that a big number of students won't
want anything to do with those studies, because there are a few lame
nerds there toying with equipment that's a factor of 50k slower (adding
to the factor of 500 the object-oriented slowdown of a factor of 100)
than what they have at home, and it can do nothing useful.

In this specific case you'll just scare away students, and the really
clever ones will lose all interest because you are busy with lame-duck
CPUs. If you'd build a small Mars rover with it, that would be something
else, of course.
From james.p.lux at jpl.nasa.gov Thu Jan 12 10:35:41 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 07:35:41 -0800
Subject: [Beowulf] List traffic

On 1/12/12 6:53 AM, "Ellis H. Wilson III" wrote:
> I recently read a blog which suggested (because similar threads keep
> following these trajectories) that the Wulf list isn't what it used to
> be.

I think that's for a variety of reasons.

The cluster world has changed. Back 15-20 years ago, clusters were new,
novel, and pretty much roll-your-own, so there was a lot of traffic on the
list about how to do that. Remember all the mobo comparisons, and all the
carefully teased-out idiosyncrasies of various switches and network
schemes.

Back then, the idea of using a cluster for "big computing" was kind of
new, as well. People building clusters were doing it either because the
architecture was interesting OR because they had a computing problem to
solve, and a cluster was a cheap way to do it, especially with free labor.

I think clustering has evolved, and the concept of a cluster is totally
mature. You can buy a cluster essentially off the shelf, from a whole
variety of companies (some with people who were participating in this list
back then and still today), and it's interesting how the basic Beowulf
concept has evolved.

Back in the late 90s, it was still largely "commodity computers, commodity
interconnects," where the focus was on using "business class" computers
and networking hardware. Perhaps not consumer, as cheap as possible, but
certainly not fancy, schmancy rack-mounted 1U servers. The switches
people were using were just ordinary network switches, the same as in the
wiring closet down the hall.

Over time, though, there has developed a whole industry of supplying
components specifically aimed at clusters: high-speed interconnects,
computers, etc. Some of this just follows the IT industry in general.
There weren't as many "server farms" back in 1995 as there are now.

Maybe it's because the field has matured?

So, we're back to talking about "roll-your-own" clusters of one sort or
another. I think anyone serious about big cluster computing (>100 nodes)
probably won't be hanging on this list looking for hints on how to route
and label their network cables. There are too many other places to go get
that information, or, better yet, places to hire someone who already
knows.
I know that if I needed massive computational power at work, my first
thought these days isn't "hey, let's build a cluster"; it's "let's call up
the HPC folks and get an account on one of the existing clusters."

But I still see the need to bring people into the cluster world in some
way. I don't know where the cluster vendors find their people, or even
what sorts of skill sets they're looking for. Are they beating the bushes
at CMU, MIT, and other hotbeds of CS looking for prior cluster design
experience? I suspect not, just like most of the people JPL hires don't
have spacecraft experience in school, or anywhere. You look for bright
people who might be interested in what you're doing, and they learn the
details of cluster-wrangling on the job.

For myself, I like probing the edges of what you can do with a cluster.
Big computational problems don't excite me. I like thinking about things
like:

1) What can I use from the body of cluster knowledge to do something
different? A distributed cluster is topologically similar to one all
contained in a single rack, but it's different. How is it different
(latency, error rate)? Can I use analysis (particularly from early
cluster days) to do a better job?

2) I've always been a fan of *personal* computing (probably from many
years of negotiating for a piece of some shared resource). It's tricky
here, because as soon as you have a decent 8- or 16-node cluster that fits
under a desk, and have figured out all the hideous complexity of how to
port some single-user application to run on it, someone comes out with a
single-processor box that's just as fast, and a lot easier to use. Back
in the 80s, I designed, but did not build, an 80286 clone using discrete
ECL logic, the idea being to make a 100 MHz IBM PC-AT that would run
standard spreadsheet software 20 times faster (a big deal when your huge
spreadsheet takes hours to recalculate). However, Moore's law and Intel
made that idea a losing proposition.

But still, the idea of personal control over my computing resources is
appealing. Nobody watching to see "are you effectively using those CPU
cycles." No arguing about annual re-adjustment of chargeback rates, where
you take the total system budget and divide it by CPU-seconds. Oops, not
enough people used it, so your CPU costs just quadrupled.

3) I'm also interested in portable computing (yes, I have a NEC 8201 -
a TRS-80 Model 100 clone - and a TI-59; I did sell the Compaq, but I had
one of those too, etc.). This is another interesting problem space: no
big computer room with infrastructure. Here, the fascinating trade is
between local computer horsepower and cheap long-distance datacomm. At
some point, it's cheaper/easier to send your data via satellite link to a
big computer elsewhere and get the results back. It's the classic 60s
remote computing problem revisited once again.
From diep at xs4all.nl Thu Jan 12 10:56:32 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:56:32 +0100
Subject: [Beowulf] Robots
Message-ID: <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl>

On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:
> I think this is likely the reason why many introductory engineering
> classes incorporate use of Lego Mindstorm robots rather than lunar
> rovers (or even overstock lunar rovers :D).

I didn't comment on the other completely wrong examples, but I want to
highlight one. Your example of a Lego robot actually disproves your
statement.

Among the affordable, non-self-built robots, the Lego robot is a genius
robot. It is, so to speak, the i7-3960X among robots, to compare it with
the fastest i7 released to date. It is affordable, it is completely
programmable with a robot OS, and if you want to build something better,
you need to be pretty much a genius.

A custom robot - unless you build a really simple, stupid thing that can
do next to nothing - will be really expensive compared to such a Lego
robot, which goes for only a couple of hundred dollars. I see it for
around $280 online, and adding components is just a few dozen dollars per
component.

The normal way to build 'something better', if better at all, requires
building most components, for example, from aluminium. Each component
then costs roughly $5k and needs to be specially engineered. You need
many of those components. We assume it's not a commercial project;
otherwise royalties will also be involved for every component you build,
though that's a small part of the above price.

Most custom robots, which are hardly bigger than the Lego robot, are
actually pretty expensive. If you want to purchase components for a
somewhat bigger robot - just something with 4 wheels that can hold a
couple of dozen kilos - such components are already $5k-$10k. And those
are mass-produced components. So building something that is actually more
functional, better, is not going to be easy. It's a genius robot, it
really is. In itself, building a bigger robot is not really a lot more
expensive, if you produce it in the quantities at which Lego produces.

The reason the Lego robot is very small really has to do with safety. Big
robots are really dangerous, you know. Cars already use dozens of CPUs -
even 10+ year old cars easily have over 100 CPUs inside - just for
safety, with the intent that components of the car don't harm people.
Robot software is still far too primitive there: no safety concerns
whatsoever. In all of that, the Lego robot is really a genius thing. A
very bad example of what you 'tried' to show with some fake arguments.

> Case in point, I got interested in HPC/Beowulfery back in 2006, read
> RGB's book and a few other texts on it, and finally found a small group
> (4) of unused PIIIs to play on in the attic of one of my college's
> buildings. Did I learn how to setup a reasonable cluster? Yes. Was it
> slow as dirt compared to then-modern Intel and AMD processors? Of
> course. But did the experience get me so completely hooked on
> HPC/Cluster research that I went on to pursue a PhD on the topic?
> Absolutely.
>
> Granted, I'm just one data point, but I think Jim's idea has all the
> right components for a great educational experience.
>
> Best,
>
> ellis

From diep at xs4all.nl Thu Jan 12 11:45:29 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 17:45:29 +0100
Subject: [Beowulf] List traffic
Message-ID: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>

Well, I feel small clusters of, say, 2 computers might get more common in
the future. Yet let's start by asking: what is a cluster? That's not such
a simple question. Having a few computers at home connected via a router
with plain default Ethernet is something many have at home. Is that a
cluster? Maybe. Let me focus on the clusters with a decent network.

The decent-network clusters suffer from a number of problems. The biggest
problems for this list:

0) Yesterday I read in the newspaper that another Iranian scientist was
killed by a car bomb. Over the past few years I have really missed
experts posting in here, while some dorks who really have nothing to
contribute to the cluster world and are just here to be here, like
Jonathan Aquilina, come back in return. So experts leave and idiots come
back. This has completely killed this mailing list.

1) The lack of postings by RGB the past few months, especially the ones
where he explains how easy it is to build a nuke, given the right
ingredients, which gives interesting discussions.

Let's look at clusters:

10) The lack of software support for clusters.

This is the real big issue. Sure, you can get expensive commercial
software to run on clusters, but that's all interesting just for
scientists. Which game can effectively use cluster hardware and is dirt
cheap? This really is a big issue. Note I intend to contribute there
myself to change that, but that's just 1 person, of course - not an
entire market moving there.

11) The huge break-even point of using cluster hardware.

I can give examples. I sat here at home with, next to me, Don Dailey, the
programmer of Cilkchess, which used Cilk from Leiserson. We played Diep
on a single CPU against Cilkchess on a single CPU, and Cilkchess got
totally toasted. After having been fried for 4 consecutive games, Don had
enough of it and disconnected the connection to the cluster, from which
he had used 1 CPU for the games, and started to play with a version on
his laptop which did NOT use Cilk - so no parallel framework. It was a
factor of 40 faster.

Now note that at tournaments they showed up with 500 or even 1800 CPUs,
yet you can't have a cluster of 1800 CPUs at home. Usually building a
4-socket box is far easier, though not necessarily cheaper, and in
practice faster than a small cluster. AMD especially has a bunch of cheap
4-socket solutions in the market; if you buy those second-hand, there is
not really any competition from 4-socket clusters in the same price
range.

100) The huge increase lately in the power consumption of machines.
Up to 2002 I used to visit someone, Jan Louwman, who had 36 computers at
home for testing chess programs. So that wasn't a cluster, just a bunch
of machines; back then we connected sets of 2 machines with a special
cable to play machines against each other.

Nearly all of those machines drew 60-100 watts or so. He had divided his
computers over 3 rooms, with the majority in 1 room. There, the 16 amp @
230 volt power plug already had problems supplying that amount of
electricity: around the plug, the wall and the plastic of the plug itself
were completely blackened and burned. As there was only a single P4
machine among the computers, only 1 box really consumed a lot of power.

Try to run 36 computers at home nowadays. Most machines are well over 250
watts, and the fastest 2 machines I've got here eat 410 and 270 watts
respectively. That's excluding the video card in the 410-watt machine
(an AMD HD 6970), which is currently out of it, as the box has been set
up for GPGPU. 36 machines eat way, way too much power. This is a very
simple practical problem that one shouldn't overlook.

It's not realistic that the average Joe sets up a cluster of more than 2
machines for his favorite gaming program. A 2-machine cluster will never
beat a 2-socket machine, except when each node also has 2 sockets. So
clustering simple home computers together isn't really useful unless you
really cluster together half a dozen or more. Half a dozen machines,
using the 250-watt measure plus another 25 watts for each card and 200
watts for the switch, will eat 6 * 275 + 200 = 1850 watts. You really
need diehards for that. They are out there, and more of them than you and
I would guess, but they need SOFTWARE that interests them and that can
use the cluster efficiently, clearly proven to work well and easy to
install - which refers back to point 11.

101) Most people like to buy new stuff. New cluster hardware is very
expensive for more than 2 computers, as it needs a switch. Second-hand
it's a lot cheaper, sometimes even dirt cheap, but that's already not
what most people like to do.

110) Linux had a few setbacks and got less attractive. When we had Red
Hat at the end of the 90s with X Windows, it was slowly improving a lot.
Then x64 arrived with a big bang, and we went back years and years to
X.org. X.org threw Linux back 10 years in time: it eats massive RAM, it's
ugly, bad, slow, difficult to configure, and so on. Basically, there
aren't many good distributions now that are free. As most clusters work
well only under Linux, the difficulty of using Linux should really be
factored in. Have a problem under Linux? Then forget it as a normal user.

Now, for me Linux got MORE attractive, as I get hacked totally silly by
every consultant on this planet who knows how to hack on the internet,
but that's not representative of those with cash who can afford a
cluster. Note I don't fall into the cash group; my total income in 2011
was really little.

111) Usually the big cash to afford a cluster belongs to people with a
good job, or a tad older - usually a different group from the group that
can work with Linux. See the previous points for that.

Despite all that, I believe clusters will get more popular in the future,
for a simple reason: processors don't really clock higher anymore. So all
software that can use additional calculation power is already being
parallelized, or has already been parallelized. It's a matter of time
before some of those applications also work well on cluster hardware.
Yet this is a slow process, and it really requires software that works
very efficiently on a small number of nodes.

As an example of why I feel this will happen, I give you the popularity
among gamers of running 2 graphics cards connected to each other via a
bridge within 1 machine. The important factor there is that the games
really profit from doing that.

On Jan 12, 2012, at 4:35 PM, Lux, Jim (337C) wrote:
> I think that's for a variety of reasons.
>
> The cluster world has changed. Back 15-20 years ago, clusters were new,
> novel, and pretty much roll-your-own, so there was a lot of traffic on
> the list about how to do that. [...]
From deadline at eadline.org Thu Jan 12 11:49:25 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 12 Jan 2012 11:49:25 -0500
Subject: [Beowulf] A cluster of Arduinos

snip
> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure-prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to
> have multiple cores).

This is going to be an exascale issue, i.e., how to compute on a system
whose parts might be in a constant state of breaking. Another interesting
question is: how do you know you are getting the right answer on a
*really* large system?
Of course, I spend much of my time optimizing really small systems.

--
Doug

From diep at xs4all.nl Thu Jan 12 11:58:32 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 17:58:32 +0100
Subject: [Beowulf] Adding 1 point

There is another reason I should add for what really made small clusters
at home less attractive: the rise of cheap multi-socket machines.

A 2-socket machine is not so expensive anymore nowadays. So if you want
to go faster than 1 socket, you buy a 2-socket machine. If you want to go
faster than that, 4 sockets are there. That choice wasn't easily
available before the end of the 90s, and in the 21st century it has
become cheap.

Another delaying factor is the rise of so many cores per node. AMD and
Intel sell CPUs for their 4-socket lines with up to double the number of
cores you can have in a single-socket box, so a 4-socket machine is
nearly equivalent to 8 single-socket nodes, be it low-clocked. For that
reason clusters tend to get more effective only at a dozen nodes or more,
assuming cheap single-socket nodes.

From ellis at runnersroll.com Thu Jan 12 12:26:01 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:26:01 -0500
Subject: [Beowulf] A cluster of Arduinos
Message-ID: <4F0F17A9.7010400@runnersroll.com>

On 01/12/2012 10:21 AM, Vincent Diepeveen wrote:
> In this specific case you'll just scare away students, and the really
> clever ones will lose all interest because you are busy with lame-duck
> CPUs.

You have made it abundantly clear you aren't interested in enrolling in
such a course. Thanks for your comments.

On a related note, as I was thinking about 'lame duck' education, I
remembered that I took an undergraduate machine learning course in which
we designed players for Connect Four, which would compete using recently
learned techniques against other students' players in the class.
Despite that particular game being a solved one, we all had a blast and
got quite competitive trying to beat each other using the newly acquired
skills. I would encourage Jim to do something similar once the basics of
cluster administration are done - perhaps a mini SC Cluster Competition
would be a neat application for the Arduinos?

Best,

ellis

From ellis at runnersroll.com Thu Jan 12 12:35:11 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:35:11 -0500
Subject: [Beowulf] Robots
Message-ID: <4F0F19CF.2050603@runnersroll.com>

On 01/12/2012 10:56 AM, Vincent Diepeveen wrote:
> I didn't comment on the other completely wrong examples, but I want to
> highlight one. Your example of a Lego robot actually disproves your
> statement.

It was a price comparison, and without diving into the nitty-gritty of
how good or bad both the Arduino and the Mindstorms are in their
respective areas, it was spot on. Jim wants to give each student a
10-node cluster on the cheap (i.e., 20 to 30 bucks per node = 300 bucks);
universities want to give each student (or teams of students, sometimes)
a robot (~280). Both provide an approachable level of difficulty and
potential for education at a reasonable price.

Feel free to continue to disagree for the sake of disagreeing. It was
just an example.

Best,

ellis

From james.p.lux at jpl.nasa.gov Thu Jan 12 12:54:52 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 09:54:52 -0800
Subject: [Beowulf] A cluster of Arduinos

On Thursday, January 12, 2012 8:49 AM, Douglas Eadline wrote:
> This is going to be an exascale issue, i.e., how to compute on a system
> whose parts might be in a constant state of breaking. Another
> interesting question is: how do you know you are getting the right
> answer on a *really* large system?
> Of course, I spend much of my time optimizing really small systems.

Your point about scaling is well taken. So far, the computing world has
largely dealt with things by trying to make the processor perfect and
error-free. Some limited areas of error correction are popular (RAM).

But think in a bigger area: say your arithmetic unit has some infrequent
unknown errors (e.g., the FDIV bug on the Pentium). Could clever
algorithm design and multiple processors (or multiple cores) mitigate
this? E.g., instead of just computing Z = X/Y, you also compute
Z1 = (X*2)/(Y*2) and compare answers. That exact example's not great,
because you've added 2 operations, but I can see that there are other
clever techniques that might be possible.

What is nice is if you can do things like temporal redundancy (do the
calculation twice, and if the results differ, do it a third time), or,
even better, some sort of "check calculation" that takes a small time
compared to the mainline calculation.

This, I think, is somewhere that even the big iron/cluster folks could be
doing some research. What are the optimum communication fabrics to
support this kind of "side calculation," which may have different
communication patterns and data flow than the "mainline"? It has a
parallel in things like CRC checks in communications protocols. A lot of
hardware has a dedicated little CRC checker that is continuously
calculating the CRC as the bits arrive, so that when you get to the end
of the frame, the answer is already there.

And Doug, your small systems have a lot of the same issues, perhaps
because that small Limulus might be operated in environments other than
what the underlying hardware was designed for. I know people who have
been rudely surprised when they found that the design environment for a
laptop is a pretty narrow temperature range (e.g., an office desktop),
and when they put one in a car, subject to 0 C or 40 C temperatures, if
not wider, things don't work quite as well as expected.

Very small systems (a few nodes) have the same issues in some
environments - e.g., a cluster subject to single-event upsets or
functional interrupts in a high-radiation environment with a lot of
high-energy charged particles; it's not so much a total-dose thing as an
SEE thing. For Juno (which will be in polar orbit around Jupiter), we
shielded everything in a vault (a 1-meter cube with 1 cm thick titanium
walls), and still it's an issue. We don't get very long before everything
is cooked.

And I think that with a non-trivially small cluster (more than 4 nodes, I
think) you could do a lot of experimentation on techniques. (Oddly,
simulated fault injection is one of the trickier parts.)
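Since the thread is about what students could experiment with: the "do it
twice, vote on a third" idea above fits in a dozen lines of C. A minimal
sketch (mine, purely illustrative - the function name and the choice of
"second form" are made up, not from any real fault-tolerance library):

#include <stdio.h>

/* Compute x/y twice in algebraically different forms.  Scaling both
 * operands by a power of two is exact in IEEE-754 (ignoring overflow at
 * the extremes), so on fault-free hardware the two runs agree
 * bit-for-bit; any difference flags a transient fault.  A sketch, not a
 * hardened implementation. */
static double checked_div(double x, double y)
{
    double z1 = x / y;                  /* plain evaluation             */
    double z2 = (2.0 * x) / (2.0 * y);  /* algebraically equivalent run */

    if (z1 == z2)
        return z1;                      /* both runs agree: accept      */

    /* Disagreement: a fault hit one of the runs.  Evaluate a third form
     * and side with whichever earlier answer it reproduces - a majority
     * vote of three. */
    double z3 = (0.5 * x) / (0.5 * y);
    return (z3 == z1) ? z1 : z2;
}

int main(void)
{
    printf("%.15g\n", checked_div(355.0, 113.0));
    return 0;
}

The research-worthy part is which cheap "second forms" actually catch the
faults a given piece of hardware produces, and whether the vote runs on
the same core, a neighboring core, or another node - which is exactly
where the communication-fabric question above comes in.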
From ellis at runnersroll.com Thu Jan 12 12:55:41 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:55:41 -0500
Subject: [Beowulf] List traffic
Message-ID: <4F0F1E9D.9000800@runnersroll.com>

I really should be following Joe's advice circa 2008 and just not
responding, but I can't help myself.

On 01/12/2012 11:45 AM, Vincent Diepeveen wrote:
> The biggest problems for this list:
> 1) The lack of postings by RGB the past few months, especially the ones
> where he explains how easy it is to build a nuke, given the right
> ingredients, which gives interesting discussions.

The last post from RGB was a long, long discussion about how very wrong
you were about RNGs. You just don't get it. It's okay to be wrong once in
a while, Vincent, and even more so to just agree to disagree. Foolish,
unedited, and inflammatory diatribes with an unnatural dose of newlines
are what is killing this list, and are what the blog I referenced was
specifically disappointed with.

So please, I'm begging you: stop writing huge emails that trail off from
their original point. Try to say things in a non-inflammatory manner. Use
spell check, and try to read your emails once before sending them. And
last of all, remember that there are many people on this list who have
all sorts of different applications - not just chess. Your experience
does not generalize well to all areas.

Speaking of which, for anyone interested in doing serious work with
low-power processors, please see the paper named FAWN for an excellent
example of use cases where low-hertz, low-power processors can do some
great work. It's by David Andersen of CMU. I was lucky enough to be
invited to the CMU PDL retreat a few months back and had a nice
conversation about the project when we went for a run together. There are
some use cases that benefit massively from that kind of architecture.

Best,

ellis

From james.p.lux at jpl.nasa.gov Thu Jan 12 13:10:24 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 10:10:24 -0800
Subject: [Beowulf] A cluster of Arduinos

On Thursday, January 12, 2012 9:26 AM, Ellis H. Wilson III wrote:
> You have made it abundantly clear you aren't interested in enrolling in
> such a course. Thanks for your comments.
> On a related note, as I was thinking about 'lame duck' education, I
> remembered that I took an undergraduate machine learning course in
> which we designed players for Connect Four, which would compete using
> recently learned techniques against other students' players in the
> class. Despite that particular game being a solved one, we all had a
> blast and got quite competitive trying to beat each other using the
> newly acquired skills. I would encourage Jim to do something similar
> once the basics of cluster administration are done - perhaps a mini SC
> Cluster Competition would be a neat application for the Arduinos?

Ooohh.. that sounds *very* cool.

A bunch of slow processors. A simple problem to solve (e.g., 3D
tic-tac-toe) for which there might even be published parallel approaches.
The challenge is effectively using the limited system, warts and all. The
Raspberry Pi might be a better vehicle, if it hits the price/availability
targets: comparable to Arduinos in price, but a bit more sophisticated
and less contrived.

We've been talking about what kinds of software competitions JPL could
run as a recruiting tool at universities, and that's along those lines.
Hmm... I wonder if they'd be willing to spend recruiting funds on that?
(Probably not - we're all poor this fiscal year.)

And, on the undergrad education thing: at UCLA, I had to write stuff in
MIXAL to run on a simulated MIX machine and complained mightily to the
TAs, who just pointed to the sacred texts of Knuth rather than giving an
intelligent response as to why we didn't do something like work in PDP-11
assembly or System/360 BAL. (UCLA at the time had a monster 360, but I
don't know that they had many 11s, and realistically, BAL is not
something I'd inflict on second-quarter first-year students. We were a
PL/I or PL/C shop in the first couple of years' classes for the most
part, although there were people doing Algol.)

OTOH, I suspect I was an atypical incoming student for 1977. I had, the
previous year, done the Pascal courses at UCSD with p-machines running on
LSI-11s, as well as the Pascal system on the big Burroughs B6700, which
uses a form of Algol as its machine language and is a stack machine to
boot (how cool is that? Burroughs always did have cool machines - hey,
they built ILLIAC IV). I had also done some assembly on an 11/20 under
RT-11.

I guess that's characteristic of the differences in philosophy between
different CS departments. (UCSD was heading more in the direction of
software engineering being part of the School of Engineering and Applied
Sciences, while at UCLA it was part of the Math department. Little did I
know, as a cybernetics major, what the difference was: it sure as heck
isn't manifested in the course catalog, at least in a form that an
incoming student could discern. Going back now, I could probably look at
catalogs from the various universities of the era and divine their
philosophies, but that's clearly 20/20 hindsight.)
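Coming back to the competition idea: the first parallel approach students
would likely discover is plain root splitting - give each node a strided
slice of the legal first moves, search the slices independently, and
reduce with a max. A self-contained C sketch (mine, purely illustrative:
ordinary 3x3 tic-tac-toe stands in for the 3D game, and the N "nodes" are
simulated in one process; on real hardware the inner loop runs per node
and the reduce is a gather over whatever link the cluster has):

#include <stdio.h>

static int lines[8][3] = {
    {0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}
};

/* +1 if 'side' has three in a row, -1 if the opponent does, else 0. */
static int winner(const int b[9], int side)
{
    for (int i = 0; i < 8; i++) {
        int s = b[lines[i][0]] + b[lines[i][1]] + b[lines[i][2]];
        if (s ==  3 * side) return  1;
        if (s == -3 * side) return -1;
    }
    return 0;
}

/* Exhaustive negamax: value of the position for the side to move. */
static int negamax(int b[9], int side)
{
    int w = winner(b, side);
    if (w) return w;
    int best = -2, moved = 0;
    for (int m = 0; m < 9; m++) {
        if (b[m]) continue;
        moved = 1;
        b[m] = side;
        int score = -negamax(b, -side);  /* negamax sign flip */
        b[m] = 0;
        if (score > best) best = score;
    }
    return moved ? best : 0;             /* board full: draw  */
}

int main(void)
{
    const int N = 3;                     /* pretend we have 3 nodes */
    int board[9] = {0};
    int best_move = -1, best_score = -2;

    for (int rank = 0; rank < N; rank++) {      /* each "node"...      */
        for (int m = rank; m < 9; m += N) {     /* ...its move slice   */
            board[m] = 1;
            int score = -negamax(board, -1);
            board[m] = 0;
            if (score > best_score) { best_score = score; best_move = m; }
        }
    }                                           /* max-reduce complete */
    printf("best opening move: %d (score %d)\n", best_move, best_score);
    return 0;
}

Root splitting like this is embarrassingly parallel, so it makes a gentle
first lab. Load imbalance between the slices and sharing search state
between nodes are where it stops being easy - which is arguably the
lesson.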
From james.p.lux at jpl.nasa.gov Thu Jan 12 13:22:26 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 10:22:26 -0800
Subject: [Beowulf] FAWN

Fast Array of Wimpy Nodes:

http://www.cs.cmu.edu/~fawnproj/

Very cool stuff. Their original motivation (reduction of power) is at a
much larger scale than my work usually operates at (they're talking
megawatts in Google-ish clusters; I worry about watts derived from solar
panels and such).

But it's a whole 'nother twist on the idea of clustering low-performance
nodes (by some metric - they've got good nanojoules-per-operation
numbers). And they're doing a very clever thing where they work with the
very asymmetric read/write speeds of flash memory. (And flash memory is
something I spend a lot of time thinking about these days; it's what we
use in space for NVRAM.)

Looks like I've got some reading for the holiday weekend.

From ellis at runnersroll.com Thu Jan 12 13:26:26 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 13:26:26 -0500
Subject: [Beowulf] FAWN
Message-ID: <4F0F25D2.90305@runnersroll.com>

On 01/12/2012 01:22 PM, Lux, Jim (337C) wrote:
> But it's a whole 'nother twist on the idea of clustering low-performance
> nodes (by some metric - they've got good nanojoules-per-operation
> numbers).

Not just good; from a sorting perspective, the /best/:

http://sortbenchmark.org/
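The flash trick Jim mentions is easy to picture: random writes are what
flash is bad at, so a FAWN-style datastore turns every write into an
append to a sequential log and keeps a small in-memory index, so a read
costs one seek. A toy sketch in C (my own illustration of the idea, not
FAWN's actual code; the fixed-size table, linear probing, and missing
compaction pass are simplifications):

#include <stdio.h>
#include <string.h>

/* Toy append-only key-value store: writes always go to the end of the
 * log file (sequential, flash-friendly); an in-memory table maps each
 * key to the offset of its latest record.  Updating a key appends a
 * newer record; the stale one becomes garbage for a later compaction
 * pass (not shown). */

#define TABLE_SIZE 1024
#define KEY_LEN    16
#define VAL_LEN    32

struct slot { char key[KEY_LEN]; long offset; int used; };

static struct slot table[TABLE_SIZE];
static FILE *log_fp;

static unsigned hash(const char *k)
{
    unsigned h = 5381;
    while (*k) h = h * 33 + (unsigned char)*k++;
    return h % TABLE_SIZE;
}

static struct slot *find(const char *key)
{
    unsigned i = hash(key);
    while (table[i].used && strcmp(table[i].key, key) != 0)
        i = (i + 1) % TABLE_SIZE;          /* linear probing */
    return &table[i];
}

static void put(const char *key, const char *val)
{
    char rec[KEY_LEN + VAL_LEN] = {0};
    strncpy(rec, key, KEY_LEN - 1);
    strncpy(rec + KEY_LEN, val, VAL_LEN - 1);

    fseek(log_fp, 0, SEEK_END);            /* append: sequential write */
    struct slot *s = find(key);
    strncpy(s->key, key, KEY_LEN - 1);
    s->offset = ftell(log_fp);             /* remember where it landed */
    s->used = 1;
    fwrite(rec, sizeof rec, 1, log_fp);
}

static int get(const char *key, char *val_out)
{
    struct slot *s = find(key);
    if (!s->used) return 0;
    fseek(log_fp, s->offset + KEY_LEN, SEEK_SET);   /* one random read */
    return fread(val_out, VAL_LEN, 1, log_fp) == 1;
}

int main(void)
{
    char v[VAL_LEN];
    log_fp = fopen("toy.log", "w+b");
    if (!log_fp) return 1;
    put("alpha", "one");
    put("alpha", "two");       /* update = append a newer record */
    if (get("alpha", v)) printf("alpha = %s\n", v);
    fclose(log_fp);
    return 0;
}

Updating a key just appends and repoints the index entry, which is why
these stores get sequential-write speed out of cheap flash.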
email: landman at scalableinformatics.com
web : http://scalableinformatics.com http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Thu Jan 12 14:08:38 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 11:08:38 -0800
Subject: [Beowulf] FAWN
In-Reply-To: <4F0F25D2.90305@runnersroll.com>
References: <4F0F25D2.90305@runnersroll.com>
Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Ellis H. Wilson III
Sent: Thursday, January 12, 2012 10:26 AM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] FAWN

On 01/12/2012 01:22 PM, Lux, Jim (337C) wrote:
> But it's a whole 'nother twist on the idea of clustering of low
> performance nodes (by some metric.. they've got good nanojoule/operation
> metrics).
>

Not just good, from a sorting perspective, /best/: http://sortbenchmark.org/
-------------

I was thinking that their low powered nodes are poor from an absolute performance standpoint (i.e. MIPS), but actually quite good on a computation-work-per-joule basis.

Yes, for sorting, they are kicking rear.

This is interesting, but when you start talking power consumption, you need to be careful about where you draw the boundaries and what's "in the system". Do you count conversion efficiency in the power supply? At one level, you say, no, just worry about DC power consumption, but even there.. is it at the board edge, or at the chip? Something drawing 100 Amps at 0.5V is a very different beast from something drawing 10 Amps at 5V, and you can't locally optimize too far because your choices inside Box A start to affect the design and performance of Box B and Box C.

The contest rules point to a variety of power measurement systems, but based on what I see there, I think there's some scope for "gaming" the system. It sort of seems it's "wall plug power", but then, they do allow DC power systems. For instance, one could tune the power supply for the expected load conditions.. You could run those fans at warp speed before the test run starts to cool down as much as possible, and then slow them down (saving power) during the run, maybe even letting the processor get pretty hot.

Sort of like running a top fuel dragster. Only has to go fast for 3 or 4 seconds, so why bother putting in a water pump.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From ellis at runnersroll.com Thu Jan 12 14:40:15 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 14:40:15 -0500
Subject: [Beowulf] FAWN
In-Reply-To:
References: <4F0F25D2.90305@runnersroll.com>
Message-ID: <4F0F371F.2060704@runnersroll.com>

On 01/12/2012 02:08 PM, Lux, Jim (337C) wrote:
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Ellis H.
Wilson III
> Sent: Thursday, January 12, 2012 10:26 AM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] FAWN
>
> On 01/12/2012 01:22 PM, Lux, Jim (337C) wrote:
>> But it's a whole 'nother twist on the idea of clustering of low
>> performance nodes (by some metric.. they've got good
>> nanojoule/operation metrics).
>>
>
> Not just good, from a sorting perspective, /best/:
> http://sortbenchmark.org/
> -------------
>
> I was thinking that their low powered nodes are poor from an absolute
> performance standpoint (i.e. MIPS), but actually quite good on a
> computation-work-per-joule basis.
>
> [snip]

All fair points, and I can't contest the suggestion that they likely tune their algorithm and physical units very highly to perform well for this sorting environment. Dave actually keeps a pretty balanced perspective when discussing this, as shown in his reaction to Google talking down wimpy nodes. Wired has a nice article on it, with a link inside to Google's pub that discusses the other half of the coin: http://www.wired.com/wiredenterprise/2012/01/wimpy_nodes/

Some more reading material for the weekend ;).

ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Thu Jan 12 15:45:16 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 12 Jan 2012 15:45:16 -0500
Subject: [Beowulf] Partial OT: CPU grouping control for Windows 2008 R2 x64 server for big calcs
Message-ID: <4F0F465C.4010301@scalableinformatics.com>

Ok, this one is fun. For some definitions of fun. Unusual definitions of fun... And there is a question towards the end.

This is for folks who've been administering clusters and HPC systems with big Windows machines (32+ CPUs and large RAM). Imagine you have a machine as part of a very loose computing cluster. The end user wants to run Windows (2008 R2 x64 Enterprise) on it. This machine has 32 processor cores (real ones, no hyperthreading) and 1TB of RAM. Yeah, it's a fun machine to work on.
I won't discuss the OS choice here. You can see some of my playing with it here: http://scalability.org/?p=3541 and http://scalability.org/?p=3515

Windows machines can let up to 64 logical processors be part of a "group". A group is a scheduling artifice, and not necessarily directly related to the NUMA system ... think of it as a layer of abstraction above it. Ok, still with me?

This scheduling artifice, these groups, requires at minimum a recompilation to work with properly. It's actually more than that: they require some additional processor affinity bits to be handled. If you have a code which doesn't handle this correctly, it will probably crash. Or not work well. Or both. Matlab appears to be such a beast.

This isn't necessarily a Matlab issue per se; it appears to be something of a design compromise in Windows. Windows wasn't designed with large processor counts in mind. The changes they'd need to make in order to enable a single large spanning entity across all CPUs at once are quite likely not in the company's best interests, as there are very few customers with such machines. Still with me?

Here's the problem. Matlab seems to crash (according to the user) if run on a unit with more than one group. I've not been able to verify on the machine yet myself, but I have no reason to disbelieve this. The issue as it's been stated to me is that if there is more than one group of processors, Matlab crashes. This is the symptom.

When the unit boots by default, we have two 16-processor groups. So looking at bcdedit examples, I see how to turn off groups. One minor problem. It doesn't work. I can do a

  bcdedit /set groupaware off

and reboot, which should completely disable groups, so that all 32 processors are in one group. Still 2 groups. I can do a

  bcdedit /set groupsize 64

and reboot. Still 2 groups. So far, the only thing that seems to change this is if I install the Hyper-V role. With that, there is now 1 group. Looking at all the boot options with bcdedit /enum, there's only one config for boot, and it's the default.

So ... my questions:

1) Does Windows really ignore its approximate equivalent of boot options on a grub line?
2) Is there any way to compel Windows to do the right thing?

As noted, this is for a computing cluster. Our recommended OS isn't feasible right now for them and their application. Definitely annoying. I'd love there to be a BIOS setting to help Windows past its desire to ignore my requested number of groups. Not sure if adding in the Hyper-V role will impact performance (did some base testing with Scilab to see, and I didn't see anything I'd call significant). Will be bugging Microsoft about this as well (pretty obviously a bug in 2008 R2 x64).

And related to this, I read something about limits in the different Windows editions. Is anyone using Windows HPC cluster on big memory machines with lots of cores? Looking at the Microsoft docs, they indicate some relatively low limits on RAM and processor count. So does this mean that they won't be supporting Interlagos 4-socket machines (16 cores per socket) with 1/2 TB of RAM in compute nodes for Windows HPC? I am just imagining someone buying a few of those nodes and being required to buy Enterprise or Datacenter licenses for those machines (which clearly would not be used for anything more than HPC).
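For anyone poking at the same thing, a minimal sketch (mine, not anything from Matlab or Microsoft support) that reports how the scheduler actually grouped the processors, using only the stock Win32 topology calls available on Windows 7 / Server 2008 R2 and later. Run it before and after fiddling with bcdedit to see whether the setting took:

/* groups.c -- print the processor-group layout Windows built at boot.
   Needs Windows 7 / Server 2008 R2 or later.  Build:  cl groups.c    */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WORD groups = GetActiveProcessorGroupCount();
    printf("active processor groups: %u\n", groups);
    for (WORD g = 0; g < groups; g++)
        printf("  group %u: %lu logical processors\n",
               g, GetActiveProcessorCount(g));
    printf("total: %lu logical processors\n",
           GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
    return 0;
}

A group-unaware process gets assigned to a single group, which would explain an application only seeing 16 of the 32 cores here; a group-aware code would call SetThreadGroupAffinity per thread to span groups.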
-- Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From samuel at unimelb.edu.au Fri Jan 13 00:36:50 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 13 Jan 2012 16:36:50 +1100
Subject: [Beowulf] FAWN
In-Reply-To: <4F0F25D2.90305@runnersroll.com>
References: <4F0F25D2.90305@runnersroll.com>
Message-ID: <4F0FC2F2.5090606@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/01/12 05:26, Ellis H. Wilson III wrote:

> Not just good, from a sorting perspective, /best/:
> http://sortbenchmark.org/

But that algorithm isn't running on exactly wimpy hardware..

Intel Core i5-2400S 2.5 GHz, 16GB RAM and a bunch of SSDs

cheers!
Chris

- -- Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8PwvIACgkQO2KABBYQAh84cgCfQZN1ZpKfzxLmazCiZLg93n89
dwYAoIZHAFmUYENP2xwMwo5M3xile4F3
=4lFT
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 13 09:01:59 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 13 Jan 2012 15:01:59 +0100
Subject: [Beowulf] Robots
In-Reply-To: <4F0F19CF.2050603@runnersroll.com>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl> <4F0F19CF.2050603@runnersroll.com>
Message-ID: <01D34971-9054-4F19-9776-8F107B118A1D@xs4all.nl>

On Jan 12, 2012, at 6:35 PM, Ellis H. Wilson III wrote:

> On 01/12/2012 10:56 AM, Vincent Diepeveen wrote:
>> On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:
>>> I think this is likely the reason why many
>>> introductory engineering classes incorporate use of Lego Mindstorm
>>> robots rather than lunar rovers (or even overstock lunar rovers :D).
>>
>> I didn't comment on the other completely wrong examples, but I want to
>> highlight one. Your example of a Lego robot is actually disproving your
>> statement.
>
> It was a price comparison, and without diving into the nitty-gritty
> of how good or bad both the Arduino and the Mindstorms are in their
> respective areas, it was spot on. Jim wants to give each student a
> 10 node cluster on the cheap (i.e. 20 to 30 bucks per node = 300
> bucks), universities want to give each student (or teams of
> students sometimes) a robot (~280). Both provide an approachable
> level of difficulty and potential for education at a reasonable price.
>
> Feel free to continue to disagree for the sake of such. It was
> just an example.
>
> Best,
>
> ellis

It's not even spot on. You're light-years away with your comparison.

You're comparing one of the best mass-produced robots available with some freak thing for which there are 100 alternatives that work far better: alternatives that are 500x faster, cheaper if you want them to be, and above all better at the original goal of demonstrating SMP programming, since the freak hardware, thanks to its very low-clocked CPU, has negligible latency to the other CPUs.

Where the robot shows you how to work with robots, the educational purpose Jim wrote down is not served very well by the embedded CPUs, as the equipment has none of the typical problems you encounter in a normal SMP system, let alone a cluster environment; meanwhile it has totally different problems, which you will never encounter on real CPUs. Such as: embedded CPUs have severely limited caches and can execute just one instruction at a time. Embedded programming is totally different from CPU programming, and embedded latencies, thanks to the slow processor speed, are not even comparable with SMP programming between the cores of one CPU.

Such a multicore box definitely has a cost below $300. On eBay I see nodes with 8 cores for $200. And those are 500x faster. Myself, I'm looking at some socket 771 Xeon machines, say with an L5420. Though they eat a lot more power than Intel claims, it's still, I guess, 170 watts a machine or so under full load.

Note we still skipped the algorithmic discussion, as from an algorithmic viewpoint, if I look at artificial intelligence, getting something to work on 70MHz machines is going to behave totally differently and needs a totally different approach than today's hardware. It's not even in the same ballpark.

Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From ntmoore at gmail.com Fri Jan 13 09:33:33 2012
From: ntmoore at gmail.com (Nathan Moore)
Date: Fri, 13 Jan 2012 08:33:33 -0600
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
Message-ID:

Jim,

Have you ever interacted with the "Modeling Instruction" folks over at ASU? http://modeling.asu.edu/

They've done, for HS Physics, more or less what you're talking about in terms of making the subject engaging, compelling, and driven by student, not teacher, interest.

On Thu, Jan 12, 2012 at 9:10 AM, Lux, Jim (337C) wrote:
>
>
> On 1/12/12 6:39 AM, "Vincent Diepeveen" wrote:
>
>> The average guy is not interested in knowing all the details regarding
>> how to play tennis with a wooden racket from the 1980s, just around
>> the time when McEnroe was on the tennis court playing there.
>>
>> Most people are more interested in whether you can win that grand slam
>> with what you produce.
>>
>> The nerds however are interested in how well you can do with a wooden
>> racket from the 1980s; therefore projecting your own interest upon those
>> students will just get them disinterested and you will be judged by them
>> as an irrelevant person in their life, whose name they soon forget.
>>
>
> Having spent some time recently in Human Resources meetings about how to
> better recruit software people for JPL, I'd say that something that
> appeals to nerds and gives them something to do is not all bad. Part of
> the educational process is to find and separate the people who are
> interested and have a passion. I'm not sure that someone who starts
> getting into clusters mostly because they are interested in breaking into
> the Top500 is the target audience in any case.
>
> If you look over the hobby clusters out there, the vast majority are "hey,
> I heard about this interesting idea, I scrounged up N old/small/slow/easy
> to find computers and tried to cluster them and do something. I learned
> something about cluster administration, and it was fun, but I don't use it
> anymore"
>
> This is exactly the population you want to hit. Bring in 100 advanced
> high school (grade 11-12 in US) students. Have them all use cheap
> hardware to do a cluster. Some fraction will think, "this is kind of
> cool, maybe I should major in CS instead of X" Some fraction will think,
> "how lame, why not make the single processor faster", and they can be
> CompEng or EE majors looking at how to reduce feature sizes and get the
> heat out.
>
> It's just like biology or chemistry classes. In high school biology
> (9th/10th grade) most of it is mundane memorization (Krebs cycle, various
> descriptive stuff. Other than the use of cheap cmos cameras, microscopes
> used at this level haven't really changed much in the last 100 years (and
> the microscopes at my kids' school are probably 10-20 years old). They
> also do some more modern molecular biology in a series of labs partly
> funded by Amgen: some recombinant DNA to put fluorescent proteins in a
> bacteria, running some gels, etc. The vast majority of the students will
> NOT go on to a career in biology, but some fraction do, they get
> interested in some aspect, and they wind up majoring in bio, or being a
> pre-med, etc.
>
> Not everyone is looking for the world beater. A lot of kids start with
> Kart racing, even though even the fastest Karts aren't as fast as F1 (or
> even a Smart Car). How many engineers started with dismantling the
> lawnmower engine?
>
> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to have
> multiple cores).
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- - - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - - - - - - - - - - - - - - - -

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From deadline at eadline.org Fri Jan 13 09:38:28 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 13 Jan 2012 09:38:28 -0500
Subject: [Beowulf] FAWN
In-Reply-To: <4F0FC2F2.5090606@unimelb.edu.au>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au>
Message-ID: <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 13/01/12 05:26, Ellis H. Wilson III wrote:
>
>> Not just good, from a sorting perspective, /best/:
>> http://sortbenchmark.org/
>
> But that algorithm isn't running on exactly wimpy hardware..
>
> Intel Core i5-2400S 2.5 GHz, 16GB RAM and a bunch of SSDs

I can vouch for the i5-2400S processors, one of the best values out there; I got 200 GFLOPS on a Limulus using 4 of these. Some more benchmarks here: http://www.clustermonkey.net//content/view/306/1/

-- Doug

> cheers!
> Chris
> - -- Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk8PwvIACgkQO2KABBYQAh84cgCfQZN1ZpKfzxLmazCiZLg93n89
> dwYAoIZHAFmUYENP2xwMwo5M3xile4F3
> =4lFT
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> -- This message has been scanned for viruses and dangerous content by
> MailScanner, and is believed to be clean.

-- Doug

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Fri Jan 13 10:18:02 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 13 Jan 2012 10:18:02 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>

> -----Original Message-----
> From: Douglas Eadline [mailto:deadline at eadline.org]
> Sent: Thursday, January 12, 2012 8:49 AM
> To: Lux, Jim (337C)
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] A cluster of Arduinos
>
> snip
>>
>> For my own work, I'd rather have people who are interested in solving
>> problems by ganging up multiple failure prone processors, rather than
>> centralizing it all in one monolithic box (even if the box happens to
>> have multiple cores).
>
> This is going to be an exascale issue, i.e. how to compute on systems
> whose parts might be in a constant state of breaking. Another interesting
> question is how do you know you are getting the right answer on a *really*
> large system?
>
> Of course I spend much of my time optimizing really small systems.
>
> --
>
> Your point about scaling is well taken.. so far, the computing world has
> largely dealt with things by trying to make the processor perfect and
> error free. Some limited areas of error correction are popular (RAM).
> But think in a bigger area... say your arithmetic unit has some infrequent
> unknown errors (e.g. FDIV bug on Pentium)..
> could clever algorithm design and multiple processors (or multi cores)
> mitigate this (e.g. instead of just computing Z = X/Y you also compute
> Z1 = (X*2)/(Y*2).. and compare answers... that exact example's not great
> because you've added 2 operations, but I can see that there are other
> clever techniques that might be possible.. )
>
> What is nice is if you can do things like temporal redundancy (do the
> calculation twice, and if it's different, do it a third time), or even
> better some sort of "check calculation" that takes a small time compared
> to the mainline calculation.
>
> This, I think, is somewhere that even the big iron/cluster folks could be
> doing some research. What are the optimum communication fabrics to support
> this kind of "side calculation", which may have different communication
> patterns and data flow than the "mainline"? It has a parallel in things
> like CRC checks in communications protocols. A lot of hardware has a
> dedicated little CRC checker that is continuously calculating the CRC as
> the bits arrive, so that when you get to the end of the frame, the answer
> is already there.
>
> And Doug, your small systems have a lot of the same issues, perhaps
> because that small Limulus might be operated in environments other than
> what the underlying hardware was designed for. I know people who have
> been rudely surprised when they found that the design environment for a
> laptop is a pretty narrow temperature range (e.g. office desktop) and when
> they put them in a car, subject to 0C or 40C temperatures, if not wider,
> that things don't work quite as well as expected.

I will be curious to see where these things show up since all you really need is a power plug. (a little nervous actually)

> Very small systems (few nodes) have the same issues, in some environments
> (e.g. a cluster subject to single event upsets or functional interrupts in
> a high radiation environment with a lot of high energy charged particles.
> it's not so much a total dose thing, but a SEE thing)
>
> For Juno (which is in polar orbit around Jupiter), we shielded everything
> in a vault (a 1 meter cube with 1cm thick titanium walls) and still it's
> an issue. We don't get very long before everything is cooked.
>
> And I think that with a non-trivially small cluster (e.g. more than 4
> nodes, I think) you could do a lot of experimentation on techniques.

I agree. Four nodes is really small. BTW, the most fun in designing this system is a set of tighter constraints than are found on the typical cluster. Noise, power, space, cabling, low cost packaging, etc. I have been asked about a rack mount version, we'll see.

One thing I find interesting is the core/node efficiency (what I call "effective cores"). In general, *on some codes*, I found that fewer cores (1P micro-ATX 4-cores) is more efficient than many cores (2P server 12-core). Seems obvious, but I like to test things.

> (oddly, simulated fault injection is one of the trickier parts)

I would assume, because in a sense, the black swan* is by definition hard to predict.

(* the book by Nick Taleb, not the movie)
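The "check calculation" idea quoted above is cheap to prototype, by the way. A minimal sketch (untested, with inject() standing in for a hypothetically flaky arithmetic unit, nothing real):

/* tmr.c -- "do it twice, and if they differ do it a third time",
   in miniature.  compute() stands in for the real work; inject()
   models a rare, silent arithmetic fault.                         */
#include <stdio.h>
#include <stdlib.h>

static double compute(double x, double y) { return x / y; }

static double inject(double z)          /* rare random corruption   */
{
    return (rand() % 1000000 == 0) ? z + 1e-6 : z;
}

static double checked_divide(double x, double y)
{
    double a = inject(compute(x, y));
    double b = inject(compute(x, y));
    if (a == b)
        return a;                       /* agreement: accept        */
    double c = inject(compute(x, y));   /* disagreement: tiebreaker */
    return (c == a) ? a : b;            /* two-of-three vote; if all
                                           three differ you have
                                           bigger problems          */
}

int main(void)
{
    double worst = 0;
    for (int i = 1; i <= 10000000; i++) {
        double z = checked_divide(1.0, (double)i);
        double err = z - 1.0 / (double)i;
        if (err < 0) err = -err;
        if (err > worst) worst = err;
    }
    printf("worst residual error: %g\n", worst);
    return 0;
}

Presumably that's the point of the (X*2)/(Y*2) variant: a deterministic, operand-dependent error (like FDIV) gives the same wrong answer twice, so you have to change the operands, not just repeat the calculation on the same unit.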
-- Doug

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Fri Jan 13 11:26:29 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 13 Jan 2012 08:26:29 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
Message-ID:

On 1/13/12 7:18 AM, "Douglas Eadline" wrote:

>> And Doug, your small systems have a lot of the same issues, perhaps
>> because that small Limulus might be operated in environments other than
>> what the underlying hardware was designed for. I know people who have
>> been rudely surprised when they found that the design environment for a
>> laptop is a pretty narrow temperature range (e.g. office desktop) and
>> when they put them in a car, subject to 0C or 40C temperatures, if not
>> wider, that things don't work quite as well as expected.
>
> I will be curious to see where these things show up since
> all you really need is a power plug. (a little nervous actually)

Yes.. That *will* be interesting... And wait til someone has a cluster of Limuluses (Not sure of the proper alliterative collective noun, nor the plural form.. A litany of limuli? A school? A murder?)

> I agree. Four nodes is really small. BTW, the most fun in designing
> this system is a set of tighter constraints than are found on the typical
> cluster. Noise, power, space, cabling, low cost packaging, etc. I have
> been asked about a rack mount version, we'll see.
>
> One thing I find interesting is the core/node efficiency
> (what I call "effective cores"). In general, *on some codes*, I found
> that fewer cores (1P micro-ATX 4-cores) is more efficient than many
> cores (2P server 12-core). Seems obvious, but I like to test things.

Yes, because we're using, in general, commodity components/assemblies, we're subject to the results of optimizations and market/business forces in other user spaces. Someone designing a media PC for home use might not care about electrical efficiency (there are no big yellow energy tags on computers, yet), but would care about noise. Someone designing a rack mounted server cares not a whit about noise, but really cares about a 10% change in power consumption.

And, drop on top of that the non-synchronized differences in development/manufacturing/fabrication generations for the underlying parts. Consumer stuff comes out for the winter selling season. Commercial stuff probably is on a different cycle. It's not like everyone uses the same "model year changeover".

>> (oddly, simulated fault injection is one of the trickier parts)
>
> I would assume, because in a sense, the black swan* is
> by definition hard to predict.

Not so much that, as the actual mechanics of fault injection. Think about testing error detection and recovery for Flash memory. The underlying specification error rate is something like 1E-9 or 1E-10/read, and that's a worst case kind of spec, so errors aren't too common (i.e. you can't just run and wait for them to occur). So how do you cause errors to occur (without perturbing the system)?... In the flash case, because we developed our own flash controller logic in an FPGA, we can add "error injection logic" to the design, but that's not always the case. How would you simulate upsets in a CPU core? (short of blasting it with radiation, which is difficult and expensive.. I wish it was as easy as getting a little Co60 gamma source and putting it on top of the chip.. We hike to somewhere that has an accelerator (UC Davis, Brookhaven, etc) and shoot protons and heavy ions at it.)
> (* the book by Nick Taleb, not the movie)

Black swans in this case would be things like the Pentium divide bug. Yes.. That *would* be a challenge, but hey, we've got folks in our JPL Laboratory for Reliable Software (LARS) who sit around thinking of how to do that, among other things. (http://lars-lab.jpl.nasa.gov/) Hmm.. I'll have to go talk to those guys about clusters of Pi or Arduinos... They're big into formal verification, too, and model-based verification. So you could have a modeled system in SysML or UML and compare its behavior with that on your prototype.

> -- Doug
>
> -- This message has been scanned for viruses and dangerous content by
> MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Fri Jan 13 23:18:57 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 13 Jan 2012 23:18:57 -0500 (EST)
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID:

> care about electrical efficiency (there are no big yellow energy tags on
> computers, yet), but would care about noise. Someone designing a rack

the "80 Plus" branding is pretty ubiquitous now, and the best part is that commodity ATX parts are starting to show up at gold levels. server vendors have offered gold or platinum for a while now, but it's probably more important in the home, since personal machines spend more time idling, thus running the PSU at low demand. poor-quality PSUs are remarkably bad at low utilization.

regards, mark hahn.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From samuel at unimelb.edu.au Fri Jan 13 23:46:17 2012
From: samuel at unimelb.edu.au (Chris Samuel)
Date: Sat, 14 Jan 2012 15:46:17 +1100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
References: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
Message-ID: <201201141546.17872.samuel@unimelb.edu.au>

On Sat, 14 Jan 2012 02:18:02 AM Douglas Eadline wrote:

> I would assume, because in a sense, the black swan* is
> by definition hard to predict.

Ahem, not around here, they're all black [1]. Now a white swan, that would be something to see!

[1] http://www.flickr.com/photos/earthinmyeyes/4608041877/

cheers!
Chris

-- Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From deadline at eadline.org Thu Jan 19 09:46:26 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 19 Jan 2012 09:46:26 -0500
Subject: [Beowulf] Parallel Programming Survey Report
In-Reply-To: <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au> <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>
Message-ID: <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>

Last year Dr Dobb's did a survey of parallel programming. Today I received a copy of:

The Parallel Programming Landscape: Multicore has gone mainstream -- but are developers ready?

It is mostly about multi-core, a bit Intel-centric (they sponsored it), and not too much about HPC. Still interesting to see how the programming world is coping with multi-core. If you are interested in a copy you have to sign up here:

https://www.cmpadministration.com/ars/emailnew.do?mode=emailnew&P=P2&MZP=&L=&F=1003933&K=&cid_download

I'll probably read it closer and post a summary on Cluster Monkey at some point.

-- Doug

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Thu Jan 19 09:57:37 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 19 Jan 2012 15:57:37 +0100
Subject: [Beowulf] Parallel Programming Survey Report
In-Reply-To: <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au> <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org> <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>
Message-ID: <20120119145737.GK21917@leitl.org>

On Thu, Jan 19, 2012 at 09:46:26AM -0500, Douglas Eadline wrote:
> Last year Dr Dobb's did a survey of parallel programming.
> Today I received a copy of:
>
> The Parallel Programming Landscape: Multicore has gone mainstream --
> but are developers ready?
>
> It is mostly about multi-core, a bit Intel-centric (they
> sponsored it), and not too much about HPC. Still interesting
> to see how the programming world is coping with multi-core.
> If you are interested in a copy you have to sign up here:
>
> https://www.cmpadministration.com/ars/emailnew.do?mode=emailnew&P=P2&MZP=&L=&F=1003933&K=&cid_download
>
> I'll probably read it closer and post a summary on Cluster Monkey
> at some point.
While we're speaking about multicore, I recommend this 21-minute video interview (even if you dislike talking heads and smarmy interviewers) with David Ungar: http://channel9.msdn.com/Blogs/Charles/SPLASH-2011-David-Ungar-Self-ManyCore-and-Embracing-Non-Determinism

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org Mon Jan 23 08:45:10 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Mon, 23 Jan 2012 14:45:10 +0100
Subject: [Beowulf] CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy
Message-ID: <20120123134510.GF7343@leitl.org>

(Old idea, makes sense, will they be able to pull it off?)

http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/

CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy

Sunday, January 22, 2012 - by Joel Hruska

The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM onto a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by crazy conceptualizing and deeply suspect analytics.

The Multicore Problem:

There are three limiting factors, or walls, that limit the scaling of modern microprocessors. First, there's the memory wall, defined as the gap between the CPU and DRAM clock speed. Second, there's the ILP (Instruction Level Parallelism) wall, which refers to the difficulty of decoding enough instructions per clock cycle to keep a core completely busy. Finally, there's the power wall--the faster a CPU is and the more cores it has, the more power it consumes.

Attempting to compensate for one wall often risks running afoul of the other two. Adding more cache to decrease the impact of the CPU/DRAM speed discrepancy adds die complexity and draws more power, as does raising CPU clock speed. Combined, the three walls are a set of fundamental constraints--improving architectural efficiency and moving to a smaller process technology may make the room a bit bigger, but they don't remove the walls themselves.

TOMI attempts to redefine the problem by building a very different type of microprocessor. The TOMI Borealis is built using the same transistor structures as conventional DRAM; the chip trades clock speed and performance for ultra-low leakage. Its design is, by necessity, extremely simple. Not counting the cache, TOMI is a 22,000 transistor design, as compared to 30,000 transistors for the original ARM2. The company's early prototypes, built on legacy DRAM technology, ran at 500MHz on a 110nm process.

Instead of surrounding a CPU core with a substantial amount of L2 and L3 cache, Venray inserted a CPU core directly into a DRAM design. A TOMI Borealis core connects eight TOMI cores to a 1Gbit DRAM with a total of 16 ICs per 2GB DIMM. This works out to a total of 128 processor cores per DIMM. Because they're built using ultra-low-leakage processes and are so small, such cores cost very little to build and consume vanishingly small amounts of power (Venray claims power consumption is as low as 23mW per core at 500MHz).
It's an interesting idea.

The Bad:

When your CPU has fewer transistors than an architecture that debuted in 1986, there's a good chance that you left a few things out--like an FPU, branch prediction, pipelining, or any form of speculative execution. Venray may have created a chip with power consumption an order of magnitude lower than anything ARM builds and more memory bandwidth than Intel's highest-end Xeons, but it's an ultra-specialized, ultra-lightweight core that trades 25 years of flexibility and performance for scads of memory bandwidth.

The last few years have seen a dramatic surge in the number of low-power, many-core architectures being floated as the potential future of computing, but Venray's approach relies on the manufacturing expertise of companies who have no experience in building microprocessors and don't normally serve as foundries. This imposes fundamental restrictions on the CPU's ability to scale; DRAM is manufactured using a three-layer mask rather than the 10-12 layers Intel and AMD use for their CPUs. Venray already acknowledges that these conditions imposed substantial limitations on the original TOMI design.

Of course, there's still a chance that the TOMI uarch could be effective in certain bandwidth-hungry scenarios--but that's where the Venray Crazy Train goes flying off the track.

The Disingenuous and Crazy

Let's start here. In a graph like this, you expect the two bars to represent the same systems being compared across three different characteristics. That's not the case. When we spoke to Russell Fish in late November, he pointed us to this publicly available document and claimed that the results came from a customer with 384 2.1GHz Xeons. There's no such thing as an S5620 Xeon, and even if we grant that he meant the E5620 CPU, that's a 2.4GHz chip.

The "Power consumption" graphs show Oracle's maximum power consumption for a system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB (yes, TB) of Flash and 15x 10,000 RPM hard drives. It's not only a worst-case figure, it's a figure utterly unrelated to the workload shown in the Performance comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, ten of them only come out to 1.3kW--Oracle's 17.7kW figure means that the overwhelming majority of the cabinet's power consumption is driven by components other than its CPUs.

From here, things rapidly get worse. Fish makes his points about power walls by referring to unverified claims that prototype 90nm Tejas chips drew 150W at 2.8GHz back in 2004. That's like arguing that Ford can't build a decent car because the Edsel sucked.

After reading about the technology, you might think Venray was planning to market a small chip to high-end HPC niche markets... and you'd be wrong. The company expects the following to occur as a result of this revolutionary architecture (organized by least-to-most creepy):

Computer speech will be so common that devices will talk to other devices in the presence of their users.

Your cell phone camera will recognize the face of anyone it sees and scan the computer cloud for background red flags as well as six degrees of separation.

Common commands will be reduced to short verbal cues like clicking your tongue or sucking your lips.

Your personal history will be displayed for one and all to see... women will create search engines to find eligible, prosperous men. Men will create search engines to qualify women.
Criminals will find their jobs much more difficult because their history will be immediately known to anyone who encounters them.

TOMI technology will be built on flash memories, creating the elemental unit of a learning machine... the machines will be able to self-organize, build robust communicating structures, and collaborate to perform tasks.

A disposable diaper company will give away TOMI-enabled teddy bears that teach reading and arithmetic. It will be able to identify specific children... and from time to time remind Mom to buy a product. The bear will also diagnose a raspy throat, a cough, or a runny nose.

Conclusion:

Fish has spent decades in the microprocessor industry--he invented the first CPU to use a clock multiplier in conjunction with Chuck H. Moore--but his vision of the future is crazy enough to scare mad dogs and Englishmen. His idea for a CPU architecture is interesting, even underneath the obfuscation and false representation, but too practically limited to ever take off. Google, an enthusiastic and dedicated proponent of energy-efficient multi-core research, said it best in a paper titled "Brawny cores still beat wimpy cores, most of the time."

"Once a chip's single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult... So go forth and multiply your cores, but do it in moderation, or the sea of wimpy cores will stick to your programmers' boots like clay."

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From prentice at ias.edu Mon Jan 23 10:38:39 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 23 Jan 2012 10:38:39 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy
In-Reply-To: <20120123134510.GF7343@leitl.org>
References: <20120123134510.GF7343@leitl.org>
Message-ID: <4F1D7EFF.7080206@ias.edu>

If you read this PDF from Venray Technologies, which is linked to in the article, you see where the "Whole Bunch of Crazy" part comes from. After reading it, Venray lost a lot of credibility in my book.

https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf

-- Prentice

On 01/23/2012 08:45 AM, Eugen Leitl wrote:
> (Old idea, makes sense, will they be able to pull it off?)
>
> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/
>
> [snip -- full article quoted above]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Mon Jan 23 11:35:56 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 08:35:56 -0800
Subject: [Beowulf] CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy
In-Reply-To: <4F1D7EFF.7080206@ias.edu>
Message-ID:

The CPU reminds me of the old bipolar AMD 2901 CPU chip sets... RISC before it was called RISC.

The white paper sort of harps on the fact that one cannot accurately predict the future (hey, I was a 10th grader at NCC in 1975, and saw the Altair at the MITS display in their trailer and KNEW that I wanted one, but I also wanted lots of other things there, which didn't pan out).
(did Imhotep use some form of project planning tools? You bet he did)

However, true parallelism (MIMD) is harder to conceptualize. Vector and matrix math is one area, but I'd argue that it's just the same as EP tasks, just at a finer grain. Systolic arrays, vector pipelines, FFT boxes from Floating Point Systems, are all basically ways to use the underlying structure of the task, in an easy way (how long til there's a hardware implementation of the new faster-than-FFT algorithm published last week?) And in all those cases, you have to explicitly make use of the special capabilities. That is, in general, the compiler doesn't recognize it (although modern parallelizing compilers ARE really smart.. so they probably do find most of the cases).

I don't know that we have good conceptual tools to take a complex task and break it effectively into multiple disparate component tasks that can effectively run in parallel. It's a hard task even for something straightforward (e.g. designing a big system or building a spacecraft), and I don't know that any of the outputs of current project planning techniques (which are entirely manual) can be said to produce "generalized" optimum outputs. They produce *an* output for dividing the complex task up (or else the project can't be done), but I don't know that the output is provably optimum or even workable (an awful lot of projects over-run, and not just because of bad estimates for time/cost).

So the problem facing would-be users of new computing architectures (be they TOMI, HyperCube, ConnectionMachine, or Beowulf) is like that facing a project planner given a big project, and a brand new crew of workers who speak a different language, with skill sets totally different than the planner is used to.

This is what the computer user is facing: there's no compiler or problem description technique that will automatically generate a "work plan" to use that new architecture. It's all manual, and it's hard, and you're up against a brute force "why not just hook 500 people up to that rock and drag it" approach. The people who figure out the new way will certainly benefit society, but there's going to be a lot of false starts along the way. And, I'm not particularly sanguine about the process being automated (at least in the sense of automatic parallelizing compilers that recognize loops and repetitive stuff). I think that for the next few years (decades?) using new architectures is going to rely on skilled humans to figure out how to use it, on an ad hoc, unique to each application, basis.

[Back in the 80s, I had a loaner "sugarcube" 4 node Intel hypercube sitting on my desk for a while. I wanted to figure out something to do with it that is non-trivial, and not the examples given in the docs (which focused on stuff like LISP and Prolog). I started, as I'm sure many people do, by taking a multithreaded application I had, and distributing the threads to processors. You pretty quickly realize, though, that it's tough to evenly distribute the loads among processors, and you wind up with processor 1 waiting for something that processor 2 is doing, which in turn is waiting for something that processor 3 is doing, and so forth. In a "shared processor" this isn't a big deal, and is transparent: the processor is always working, and aside from deadlocks, there's no particular reason why you need to balance load among threads.

For what it's worth, the task I was doing was comparable to taking execution of a Matlab/simulink model and distributing it across multiple processors.
You had signals flowing among blocks, etc. These things are computationally intensive (especially if you have loops in the design, so you need an iterative solution of some sort) so the idea of putting multiple processors to work is attractive. But the "work" in each block in the diagram isn't known a priori and might vary during the course of the simulation, so it's not like you can come up with some sort of automatic partitioning algorithm.]

On 1/23/12 7:38 AM, "Prentice Bisbal" wrote:

>If you read this PDF from Venray Technologies, which is linked to in the article, you see where the "Whole Bunch of Crazy" part comes from. After reading it, Venray lost a lot of credibility in my book.
>
>https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf
>
>--
>Prentice
>
>On 01/23/2012 08:45 AM, Eugen Leitl wrote:
>> (Old idea, makes sense, will they be able to pull it off?)
>>
>> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/
>>
>> CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
>>
>> Sunday, January 22, 2012 - by Joel Hruska
>>
>> The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM onto a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by crazy conceptualizing and deeply suspect analytics.
>>
>> The Multicore Problem:
>>
>> There are three limiting factors, or walls, that constrain the scaling of modern microprocessors. First, there's the memory wall, defined as the gap between the CPU and DRAM clock speed. Second, there's the ILP (Instruction Level Parallelism) wall, which refers to the difficulty of decoding enough instructions per clock cycle to keep a core completely busy. Finally, there's the power wall--the faster a CPU is and the more cores it has, the more power it consumes.
>>
>> Attempting to compensate for one wall often risks running afoul of the other two. Adding more cache to decrease the impact of the CPU/DRAM speed discrepancy adds die complexity and draws more power, as does raising CPU clock speed. Combined, the three walls are a set of fundamental constraints--improving architectural efficiency and moving to a smaller process technology may make the room a bit bigger, but they don't remove the walls themselves.
>>
>> TOMI attempts to redefine the problem by building a very different type of microprocessor. The TOMI Borealis is built using the same transistor structures as conventional DRAM; the chip trades clock speed and performance for ultra-low leakage. Its design is, by necessity, extremely simple. Not counting the cache, TOMI is a 22,000 transistor design, as compared to 30,000 transistors for the original ARM2. The company's early prototypes, built on legacy DRAM technology, ran at 500MHz on a 110nm process.
>>
>> Instead of surrounding a CPU core with a substantial amount of L2 and L3 cache, Venray inserted a CPU core directly into a DRAM design. A TOMI Borealis core connects eight TOMI cores to a 1Gbit DRAM with a total of 16 ICs per 2GB DIMM. This works out to a total of 128 processor cores per DIMM. Because they're built using ultra-low-leakage processes and are so small, such cores cost very little to build and consume vanishingly small amounts of power (Venray claims power consumption is as low as 23mW per core at 500MHz).
>>
>> It's an interesting idea.
>>
>> The Bad:
>>
>> When your CPU has fewer transistors than an architecture that debuted in 1986, there's a good chance that you left a few things out--like an FPU, branch prediction, pipelining, or any form of speculative execution. Venray may have created a chip with power consumption an order of magnitude lower than anything ARM builds and more memory bandwidth than Intel's highest-end Xeons, but it's an ultra-specialized, ultra-lightweight core that trades 25 years of flexibility and performance for scads of memory bandwidth.
>>
>> The last few years have seen a dramatic surge in the number of low-power, many-core architectures being floated as the potential future of computing, but Venray's approach relies on the manufacturing expertise of companies who have no experience in building microprocessors and don't normally serve as foundries. This imposes fundamental restrictions on the CPU's ability to scale; DRAM is manufactured using a three layer mask rather than the 10-12 layers Intel and AMD use for their CPUs. Venray already acknowledges that these conditions imposed substantial limitations on the original TOMI design.
>>
>> Of course, there's still a chance that the TOMI uarch could be effective in certain bandwidth-hungry scenarios--but that's where the Venray Crazy Train goes flying off the track.
>>
>> [snip]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
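The headline numbers in the quoted article are easy to sanity-check. A back-of-envelope pass in Python (the per-core power, per-IC core count, and TDP figures are the article's claims; the derived totals are mine and only as good as those claims):

    # Figures quoted in the article above (Venray's claims, not measurements).
    ics_per_dimm = 16        # 1Gbit DRAM ICs on a 2GB DIMM
    cores_per_ic = 8         # TOMI cores per IC
    w_per_core   = 0.023     # 23mW per core at 500MHz

    cores_per_dimm = ics_per_dimm * cores_per_ic    # 128, matches the article
    gb_per_dimm    = ics_per_dimm * 1 / 8           # 16 x 1Gbit = 2 GB, consistent
    w_per_dimm     = cores_per_dimm * w_per_core    # ~2.9 W of CPU per DIMM

    # The Oracle comparison the article objects to:
    xeon_tdp_kw = 10 * 130 / 1000                   # ten 130W E7-8870s = 1.3 kW
    cpu_share   = xeon_tdp_kw / 17.7                # ~7% of the 17.7 kW cabinet

    print(cores_per_dimm, round(w_per_dimm, 2), round(cpu_share, 2))

So the DIMM-level arithmetic is internally consistent, and the article's complaint also checks out: even at full TDP, the ten Xeons account for only about 7% of the quoted cabinet power.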
From lindahl at pbm.com Mon Jan 23 14:28:26 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 23 Jan 2012 11:28:26 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
Message-ID: <20120123192826.GB17383@bx9.net>

http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Mon Jan 23 14:59:30 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 23 Jan 2012 20:59:30 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120123192826.GB17383@bx9.net>
References: <20120123192826.GB17383@bx9.net>
Message-ID:

Interesting article. Difficult for me to analyse - usually you sell your business when it's a success, or when you want to run away. Not sure which of the 2 it is here.

Maybe some years from now with some support from Intel that Qlogic also can unroll FDR. Right now they're stuck with QDR, which on their homepage they announce as 40 gigabit per second.

http://www.qlogic.com/Products/adapters/Pages/InfiniBandAdapters.aspx

Showing the Qlogic 7300 series.

Mellanox is slamdunking with FDR now, the new generation network which i suppose is double the bandwidth of QDR; it already got unrolled a few months ago and should be shipping by now. Qlogic AFAIK didn't even announce their next generation network yet, let alone display it, and still toys with QDR, which is what i toy at home with. The fact that they announced 'improving' the oldie QDR i would interpret as bad news for innovating to FDR.

Maybe someone from Mellanox wants to comment on FDR and whether it's double the bandwidth of QDR, as i suppose some will be monitoring this list.

On Jan 23, 2012, at 8:28 PM, Greg Lindahl wrote:
> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Mon Jan 23 15:00:07 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 23 Jan 2012 15:00:07 -0500 (EST)
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: <4F1D7EFF.7080206@ias.edu>
References: <20120123134510.GF7343@leitl.org> <4F1D7EFF.7080206@ias.edu>
Message-ID:

> If you read this PDF from Venray Technologies, which is linked to in the article, you see where the "Whole Bunch of Crazy" part comes from. After reading it, Venray lost a lot of credibility in my book.
>
> https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf

wow, you're not kidding. mostly it makes me wonder whether the economy is such that you can actually get first-round VC with collateral like that!
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Mon Jan 23 15:17:01 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 12:17:01 -0800
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To:
Message-ID:

I don't know.. Maybe the list of potential applications (some of which are speculative and well out there) is what it takes to justify VC.. Like DARPA.. High risk, high reward. The typical VC doesn't expect every investment to hit, but the ones that do, they want big returns from. If you're just interested in slogging through successive refinement, there are probably other sources of capital that are more appropriate.

While some of those things are downright creepy, none of them appear to violate the laws of physics, and if someone with cash is willing to put some up to run the idea forward and establish a position, why not? (Patent term is 20 years after all.. which is a long ways in the future in the technology world.) In 2030 there may be gripes on the equivalent of SlashDot about how this Venray had patents on all the fundamental things people are using. Think of hyperlinks, mice, etc.

On 1/23/12 12:00 PM, "Mark Hahn" wrote:

>> If you read this PDF from Venray Technologies, which is linked to in the article, you see where the "Whole Bunch of Crazy" part comes from. After reading it, Venray lost a lot of credibility in my book.
>>
>> https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf
>
>wow, you're not kidding. mostly it makes me wonder whether the economy is such that you can actually get first-round VC with collateral like that!

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From raysonlogin at gmail.com Mon Jan 23 15:50:09 2012
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 23 Jan 2012 15:50:09 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To:
References: <4F1D7EFF.7080206@ias.edu>
Message-ID:

On Mon, Jan 23, 2012 at 11:35 AM, Lux, Jim (337C) wrote:
> The "processors in a sea of memory" model has been around for a while (and, in fact, there were a lot of designs in the 80s, at the board if not the chip level: transputers, early hypercubes, etc.) So this is revisiting the architecture at a smaller level of integration.

I remember 12-15 years ago I was reading quite a few papers published by the Berkeley Intelligent RAM (IRAM) Project:

http://iram.cs.berkeley.edu/

So 15 years later someone suddenly thinks that it is a good idea to ship IRAM systems to real customers??
:-D

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/

> [snip]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Mon Jan 23 15:58:11 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 12:58:11 -0800
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To:
Message-ID:

On 1/23/12 12:50 PM, "Rayson Ho" wrote:

>On Mon, Jan 23, 2012 at 11:35 AM, Lux, Jim (337C) wrote:
>> The "processors in a sea of memory" model has been around for a while (and, in fact, there were a lot of designs in the 80s, at the board if not the chip level: transputers, early hypercubes, etc.) So this is revisiting the architecture at a smaller level of integration.
>
>I remember 12-15 years ago I was reading quite a few papers published by the Berkeley Intelligent RAM (IRAM) Project:
>
>http://iram.cs.berkeley.edu/
>
>So 15 years later someone suddenly thinks that it is a good idea to ship IRAM systems to real customers?? :-D
>
>Rayson

Or maybe, all good ideas keep coming up again, and each time, it's refined a bit, or there's another possible source of funding appearing. Look at "solar power transmitted by microwaves from orbit" as an example. That one has a 15-20 year cycle time.

You have an idea which is attractive.. You get some money to run it forward, and then insurmountable problems crop up, discoverable only with significant investment of time/money (>> 1 work month). That puts the idea to sleep for a while until either the reasons are forgotten, or technology has advanced to the point where what might have been unreasonable the previous time is reasonable now.

Certainly in the computing world, where 10-15 years is sufficient for many orders of magnitude change in performance along many axes, it pays to revisit things, since what may have been a good balance or trade back then, isn't now. And that's sort of the thrust of their white paper (justifying that now the time is right), as well as staking their claim to a bunch of general applications, few of which are uniquely enabled by their proposed technology.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
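Jim Lux's hypercube story earlier in the thread, processor 1 waiting on processor 2, which waits on processor 3, can be made concrete in a few lines. A toy model (a sketch in Python; the stage costs are invented for illustration, not measured): once per-stage work is unequal, steady-state throughput is pinned to the slowest stage, no matter how many processors you add.

    # Toy model of the load-imbalance problem from the hypercube anecdote.
    # One pipeline stage per processor; every item passes through each stage.
    # Stage costs are made-up numbers; only the imbalance matters.
    stage_cost = [1.0, 1.0, 3.5, 1.0]   # processor 3 is the bottleneck

    items = 1000
    one_processor   = items * sum(stage_cost)   # everything timeshared on one CPU
    four_processors = items * max(stage_cost)   # steady state: slowest stage gates
    print(round(one_processor / four_processors, 2), "x speedup, not 4x")

This prints a ~1.86x speedup on four processors; balancing the stages, which is exactly the hard part he describes, is what would recover the rest.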
From hahn at mcmaster.ca Mon Jan 23 16:19:34 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 23 Jan 2012 16:19:34 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References:
Message-ID:

> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?

I'm not crazy about Intel being a vertically-integrated HPC supplier (chips, systems, interconnect, mpi, compilers - I guess they still don't have their own scheduler or sexy cloud branding ;) the world is a better place when each level has internal competition based on useful, open (free), multi-implementation standards.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Mon Jan 23 16:33:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 23 Jan 2012 16:33:48 -0500
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DD23C.8080601@scalableinformatics.com>

On 01/23/2012 04:19 PM, Mark Hahn wrote:
> the world is a better place when each level has internal competition based on useful, open (free), multi-implementation standards.

Markets always go through these full-on vertical integration phases (for a while) before the assets are sold off (either voluntarily or via bankruptcy court). It's a natural part of the business cycle. Cisco is building servers now. Oracle, the whole stack. Pretty soon, some whippersnapper of a company is going to come along and eat their lunches, and then they will get competitive pressure to change.

This said, many *many* large university sites like dealing with "a single vendor" (that is until they get eventually screwed over by that one vendor, or realize that the "great deal" they are getting really isn't as great as it sounded ... ). Which is part of the reason it's so hard getting into accounts other vendors have locked up.

Sadly, lots of this works around the spirit (and probably skating very close to the edge of the letter) of the law surrounding most public acquisition processes, but that's life I guess.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
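On Vincent's earlier question about whether FDR is double QDR: the published lane rates say not quite. QDR signals at 10Gb/s per lane with 8b/10b encoding, FDR at 14.0625Gb/s per lane with 64b/66b. For a 4x link (a sketch; the rates are from the InfiniBand roadmap, the arithmetic is mine):

    # 4x InfiniBand link rates: signaling rate vs usable data rate.
    lanes = 4
    qdr_signal = lanes * 10.0         # 40 Gb/s signaling
    qdr_data   = qdr_signal * 8 / 10  # 8b/10b encoding -> 32 Gb/s data
    fdr_signal = lanes * 14.0625      # 56.25 Gb/s signaling
    fdr_data   = fdr_signal * 64 / 66 # 64b/66b encoding -> ~54.5 Gb/s data

    print(round(fdr_data / qdr_data, 2))   # ~1.7x

So the marketing "40 vs 56" comparison overstates it slightly in one direction and understates it in the other: in usable data rate, FDR is roughly 1.7x QDR, thanks largely to the more efficient encoding.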
From prentice at ias.edu Mon Jan 23 16:46:11 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 23 Jan 2012 16:46:11 -0500
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DD523.4020005@ias.edu>

On 01/23/2012 04:19 PM, Mark Hahn wrote:
>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?

That's what I'm thinking!

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Mon Jan 23 16:49:12 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 23 Jan 2012 22:49:12 +0100
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net>
Message-ID:

On Jan 23, 2012, at 10:19 PM, Mark Hahn wrote:
>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>
> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?

forget it

> I'm not crazy about Intel being a vertically-integrated HPC supplier (chips, systems, interconnect, mpi, compilers - I guess they still don't have their own scheduler or sexy cloud branding ;)

maybe they just want a new generation ethernet nic dirt cheap for their motherboards; if you produce it in those numbers as they do, probably anything gets dirt cheap. this doesn't hit the high end, yet it might be cheaper to buy qlogic than to pay royalties to any of the infiniband vendors, which would be either mellanox or qlogic.

Also they bought qlogic for 125 million dollar, though in cash, which doesn't seem to me exceptionally much from intels viewpoint, whereas they might intend to sell some of their upcoming line of vector cpu's which badly need a network of course. 125 million is just a few supercomputers.

maybe it was just a cheap buy, as qlogic doesn't have FDR yet, who knows?

What i wonder about is how wallstreet knew in advance about qlogic getting taken over. If we look carefully we see that since say roughly december 19th 2011, the nasdaq rose roughly 10.5% and qlogic rose quite a lot more, several percent. So it was significantly more in demand than the index, which is weird if we realize that qlogic has unrolled nothing those months whereas its competitor Mellanox has unrolled FDR.

It's obvious some traders knew this deal was coming, but real fingerpointing is not my job.

Vincent

> the world is a better place when each level has internal competition based on useful, open (free), multi-implementation standards.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From samuel at unimelb.edu.au Mon Jan 23 18:00:02 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 10:00:02 +1100
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DE672.6000602@unimelb.edu.au>

On 24/01/12 08:19, Mark Hahn wrote:
> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?

I remember way back hearing the IB was going to be the technology to replace all those various buses (PCI, etc) on a motherboard [1], then it all went quiet and then it re-emerged as an interconnect.

So perhaps Intel (who were part of one of the two groups that merged to create IB) have thoughts again on this?

cheers,
Chris

[1] interestingly a similar comment appears on the IB Wikipedia page under history, but sadly without references.. http://en.wikipedia.org/wiki/InfiniBand#History

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From joshua_mora at usa.net Mon Jan 23 18:02:12 2012
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Mon, 23 Jan 2012 17:02:12 -0600
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
Message-ID: <708qawXBm8848S02.1327359732@web02.cms.usa.net>

Do you mean IB over QPI?

Either way, High Node Count Coherence will be an issue.

In any case, by acquiring their IP it is a step forward towards SoC (System on Chip). A preliminary step (building block) for the Exascale strategy and for low cost enterprise/cloud solutions.

Joshua

------ Original Message ------
Received: 03:47 PM CST, 01/23/2012
From: Prentice Bisbal
To: beowulf at beowulf.org
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business

> On 01/23/2012 04:19 PM, Mark Hahn wrote:
> >> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
> > wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?
>
> That's what I'm thinking!
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Mon Jan 23 18:24:15 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 00:24:15 +0100
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <708qawXBm8848S02.1327359732@web02.cms.usa.net>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net>
Message-ID: <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl>

On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
> Do you mean IB over QPI?
> Either way, High Node Count Coherence will be an issue.

Just ignore his statement - it's total nonsense. Nanosecond latency of QPI using 2 rings versus something that has a latency up to a factor 1000 slower, with the pci-e as the slowest delaying factor.

Doing cache coherency over that, forget it.

From what i understand a big problem at modern cpu's is the crossbar. On the latest chip displayed, Bulldozer, it takes a significant amount of transistors. If you confront that crossbar suddenly with latencies a factor 4000 slower, that's not gonna let it perform better of course.

> In any case, by acquiring their IP it is a step forward towards SoC (System on Chip). A preliminary step (building block) for the Exascale strategy and for low cost enterprise/cloud solutions.

Not with intel. Intel sells fast equipment yet it always has a huge price, about the opposite of infiniband, which is a dirt cheap technology.

I guess we must see this much simpler. At such a giant as intel, paying a bit over 100 million is peanuts. Probably less than what they would need to pay for royalties to a manufacturer owning a bunch of patents in the ethernet NIC area; the HPC intel gets 'for free'. It allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap without needing to pay royalties to qlogic. Such a 10 gigabit ethernet nic will not be a big performer, yet price matters a lot of course when integrating. Every penny counts then.

What you typically see with intel is that for them the mass market is so important, read that's the 1 gigabit ethernet market right now, that all other products suffer there, as they will give their mass market products always, of course, priority. Itanium is a good example; it always was process generations behind their main products. It never was given a fair chance to compete. So where they win it with sandy bridge, because it's soon a process generation or 2 having the edge on AMD, there intels other products suffer from this, as they don't get that process technology.

meanwhile ethernet is totally crucial to have low latency for the financial world, as they can make dozens of billions a year by being faster than others at exchanges.

Now back to that mass market and integration of a good and especially cheap 10 gigabit nic into intels mainboards, this buy might be pretty interesting to intel. Yet that's a market so big, it has nothing to do with HPC i'd argue.
From HPC viewpoint i wouldn't see this takeover as a threat to anyone in HPC, i guess it basically means intel won't challenge for the crown in HPC, giving Mellanox monopoly for a while at FDR. It's about ethernet i bet. > > Joshua > ------ Original Message ------ > Received: 03:47 PM CST, 01/23/2012 > From: Prentice Bisbal > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business > >> >> On 01/23/2012 04:19 PM, Mark Hahn wrote: >>>> > http://www.hpcwire.com/hpcwire/2012-01-23/ > intel_to_buy_qlogic_s_infiniband_business.html >>> wonder what Intel's thinking - could do some very interesting stuff, >>> but it would take a bit of charisma. QPI-over-IB anyone? >> >> That's what I'm thinking! >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Mon Jan 23 19:03:14 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 23 Jan 2012 19:03:14 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> Message-ID: <4F1DF542.6050504@scalableinformatics.com> On 01/23/2012 06:24 PM, Vincent Diepeveen wrote: > > On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote: [...] > Nanosecond latency of QPI using 2 rings versus something that has a > latency up to factor 1000 slower > with the pci-e as the slowest delaying factor. > > Doing cache coherency over that forget it. Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't work!!! More seriously, with this acquisition, I could see serious contention for ScaleMP. SoC type stuff, using IB between many nodes, in smaller boxen. >> In any case, by acquiring their IP it is a step forward towards SoC >> (System on >> Chip). A preliminary step (building block) for the Exascale >> strategy and for >> low cost enterprise/cloud solutions. Yes. > Not with intel. Intel sells fast equipment yet it has a huge price > always, > about the opposite of infiniband which is a dirt cheap technology. Must use Shakespeare for this takedown: Methinks thou dost protesteth too much ... > > I guess we must see this much simpler. At such a giant as intel, > paying a bit over 100 million is peanuts. > Probably less than what they would need to pay for royalties to a > manufacturer owning a bunch of patents > in the ethernet NIC area; the HPC intel gets 'for free'. So ... exactly what are the existing intel 10GbE NIC's then ... Swiss Cheese? I see a fair number of vendors licensing Intel's IP, or, more to the point, using Intel silicon (hint: this might be a good reason for the acquisition) to build their stuff... 
> It allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap

... which they have been doing for years ...

> without needing to pay royalties to qlogic.

... not sure they were, but it's possible Qlogic has 10GbE IP that Intel licenses, but this transaction was about ... Infiniband ...

[...]

> meanwhile ethernet is totally crucial to have low latency for the financial world, as they can make dozens of billions a year by being faster than others at exchanges.

Errr ... given that this is one of our core markets, don't mind if I note that latency is critical to these players, so proximity to the exchange, and reliable and deterministic latency is absolutely critical. There are switches that are doing 300ns port to port in the Ethernet space now. With the NICs, you are looking in the 2-ish microsecond regime. These are not cheap. Compare this to QDR. 1 microsecond +/- some. Which has lower latency?

There are many reasons why exchanges (mostly) aren't on IB. A few of them are even valid technical reasons. Historical momentum, and conservative approaches to new technology rank pretty high. So does the inability to generally export IB far and wide. And the complexity of the stack. Ethernet is (almost) plug and play. It's just a network. IB is sort of kind of plug, install OFED, and play for a while over IPoIB until you can recode for some of the RDMA bits. And don't try to run file systems and other things with lots of traffic over IPoIB. It leaks and gradually you will catch some cool ... surprises.

Honestly, it's a shame that IPoIB never really got the attention it deserved like the other elements of the IB stack did. Getting a rock solid IP implementation atop a fast/low latency net could have driven many design wins outside of HPC. And would have been a gateway drug^H^H^H^Htechnology for using the other stack elements.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Mon Jan 23 19:06:43 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 23 Jan 2012 19:06:43 -0500
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F1DF542.6050504@scalableinformatics.com>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com>
Message-ID: <4F1DF613.1060603@scalableinformatics.com>

On 01/23/2012 07:03 PM, Joe Landman wrote:
> Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't work!!!

There is an implicit /sarc tag here BTW. vSMP does a wonderful job (where Vincent claims that things won't work ... they do work, and very well at that).

> More seriously, with this acquisition, I could see serious contention for ScaleMP. SoC type stuff, using IB between many nodes, in smaller boxen.

Serious contention to buy ScaleMP (as in potential acquirers)

Must be getting too much blood in the coffee stream. Can't communicate ...
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From atp at piskorski.com Mon Jan 23 19:30:30 2012
From: atp at piskorski.com (Andrew Piskorski)
Date: Mon, 23 Jan 2012 19:30:30 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy
In-Reply-To: References: Message-ID: <20120124003030.GA80957@piskorski.com>

On Mon, Jan 23, 2012 at 03:50:09PM -0500, Rayson Ho wrote:

> http://iram.cs.berkeley.edu/
>
> So 15 years later someone suddenly thinks that it is a good idea to
> ship IRAM systems to real customers?? :-D

Sure. But from when I last read about the IRAM stuff, I'm pretty sure it was strictly single core. Their VIRAM1 chip had 13 MB of DRAM, 1 cpu core, and 4 "vector lanes", with no mention of SMP or any sort of multi-chip parallelism at all. If Venray has a good design for using hundreds or more IRAM-like chips in a parallel machine, that sounds like a significant step forward. (The intended fab process and attendant design rules might also be quite different, although I'm not at all sure about that.)

--
Andrew Piskorski http://www.piskorski.com/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From diep at xs4all.nl Mon Jan 23 19:40:13 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 01:40:13 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F1DF542.6050504@scalableinformatics.com>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com>
Message-ID:

On Jan 24, 2012, at 1:03 AM, Joe Landman wrote:
> On 01/23/2012 06:24 PM, Vincent Diepeveen wrote:
>>
>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
>
> [...]
>
>> Nanosecond latency of QPI using 2 rings versus something that has a
>> latency up to factor 1000 slower
>> with the pci-e as the slowest delaying factor.
>>
>> Doing cache coherency over that forget it.
>
> Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't
> work!!!
>
> More seriously, with this acquisition, I could see serious contention
> for ScaleMP. SoC type stuff, using IB between many nodes, in
> smaller boxen.
>

That would be some BlueGene type machine you speak about that intel would produce with a low power SoC.

This is where, at this point, the bluegene type machines simply can't compete with the tiny processors that get produced by the dozens of millions.

"The tiny processors have won"
Linus Thorvalds

Intel has themselves a second law of Moore. You can google for it. Every new generation of factory that can produce chips with double the number of transistors, that factory also is 2x more expensive.
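The "second law of Moore" is usually attributed to Arthur Rock: the cost of a fab doubles roughly every four years. As a rough sanity check against the 2020 projection in the next paragraph (the ~$2.5B baseline for a 2008-era fab is an assumed round number, not an Intel figure):

    C(t) = C_0 \cdot 2^{(t - t_0)/4}, \qquad
    C(2020) \approx \$2.5\,\mathrm{B} \cdot 2^{(2020-2008)/4}
            = \$2.5\,\mathrm{B} \cdot 2^{3} = \$20\,\mathrm{B}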
A few years ago intel projected that by 2020 building a single factory would cost 20 billion dollar. Now Obama might contribute to this by overspending 40-50%, more overspending than Greece, Spain, UK and Portugal combined. So that will cause massive inflation, which will hurt the poor most, and it sure will help the 2nd law of Moore become a reality sooner rather than later. Yet if we move away from politics to money and mass production, i hope you realize that a few HPC cpu's won't pay back 20 billion dollar. In short, only cpu's that get mass produced can.

A good example of mass-produced processors are gpu's. If we look at the leading gpu's, which have by now thousands of cores, there is no way to compete with that with SoC's. What's the price of producing 1 gpu versus 200 SoC's with a small core?

Furthermore intel never really could compete in the SoC world so far with the low power cpu's that get produced by the billion a year, so betting on that would be a quite surprising, though not impossible, gamble. Intel always has been good in low latency designs. Yet obviously further integration of logic into the cpu means of course you also need a capable ethernet chip in your cpu. QLogic can provide that. Mass produce half a billion of those and then it's cheaper to buy a company with such technology than to pay royalties.

Another HPC problem with the bluegene type designs: all those soc's basically spread the calculation power over a bigger area than 1 big power eating chip will. Bigger area means bigger distance to transfer massive data, and that's in itself a very expensive thing.

Overall seen bluegene machines never really had a low power usage, despite some stupid professors shouting that. Per gflop it never was the performance king; they just compared with totally hopeless designs, and IBM usually delivered on time, something that is very important in HPC as well. IMHO the only reason bluegene could be competitive is because it was fighting dinosaur type HPC cpu's.

Now SoC's might be mighty interesting in the gamers world and in telecom to build new phones with, which makes it mighty interesting for intel to produce those dirt cheap, and maybe even put a more capable ethernet chip on them, again dirt cheap; as for the HPC world, i don't see it happen that this SoC can compete anyhow with a gpu or even a CPU. Better write some code in CUDA or OpenCL, i'd argue.

The latest AMD gpu, the Radeon HD 7970, is delivering 1 teraflop or so, and with a 2 gpu version soon coming on 1 card, that's gonna deliver close to 2 Tflop a card - double precision, yes. Multiply by 4 for single precision: 8+ Teraflop single precision. For a couple of hundred dollars. Nvidia will undoubtedly follow with their 1 teraflop gpu.

If you take a washing machine and pack it with cheapo SoC's, creating a 2 Tflop machine, do you guess you can SELL that for a couple of hundred dollars? Just the transport costs already will be more expensive than a single gpu card...

Intel cannot compete with that in HPC for the stuff that needs bandwidth and doesn't care about latency, as at a new process technology they first go produce a few FPGA cpu's, and after that they produce the world's fastest CPU. So there is simply no window in time to use the latest process technology for an HPC vector type chip. That's why AMD-ATI and Nvidia will win that contest hands down. And we sure hope intel will keep selling its cpu's very well, which, if it is the case, means that this won't change.
After all they already make cash on majority of supercomputers as each node also usually has 2 Xeon cpu's which go for a multiple of the price of the GPU that's in the box... > >>> In any case, by acquiring their IP it is a step forward towards SoC >>> (System on >>> Chip). A preliminary step (building block) for the Exascale >>> strategy and for >>> low cost enterprise/cloud solutions. > > Yes. > >> Not with intel. Intel sells fast equipment yet it has a huge price >> always, >> about the opposite of infiniband which is a dirt cheap technology. > > Must use Shakespeare for this takedown: Methinks thou dost protesteth > too much ... > >> >> I guess we must see this much simpler. At such a giant as intel, >> paying a bit over 100 million is peanuts. >> Probably less than what they would need to pay for royalties to a >> manufacturer owning a bunch of patents >> in the ethernet NIC area; the HPC intel gets 'for free'. > > So ... exactly what are the existing intel 10GbE NIC's then ... Swiss > Cheese? I see a fair number of vendors licensing Intel's IP, or, more > to the point, using Intel silicon (hint: this might be a good > reason for > the acquisition) to build their stuff... > >> Allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap > > ... which they have been doing for years ... > >> without needing to pay royalties to qlogic. > > ... not sure they were, but its possible Qlogic has 10GbE IP that > Intel > licenses, but this transaction was about ... Infiniband ... > > [...] > >> meanwhile ethernet is total crucial to have low latency for the >> financial world, as they can make dozens of billions a year by being >> faster >> than others at exchanges. > > Errr ... given that this is one of our core markets, don't mind if I > note that latency is critical to these players, so proximity to the > exchange, and reliable and deterministic latency is absolutely > critical. > There are switches that are doing 300ns port to port in the Ethernet > space now. With the NICs, you are looking in the 2-ish microsecond > regime. These are not cheap. > > Compare this to QDR. 1 microsecond +/- some. > > Which has lower latency? > > There are many reasons why exchanges (mostly) aren't on IB. A few of > them are even valid technical reasons. Historical momentum, and > conservative approaches to new technology rank pretty high. So > does the > inability to generally export IB far and wide. And the complexity of > the stack. Ethernet is (almost) plug and play. Its just a network. > > IB is sort of kind of plug, install OFED, and play for a while over > IPoIB until you can recode for some of the RDMA bits. And don't > try to > run file systems and other things with lots of traffic over IPoIB. It > leaks and gradually you will catch some cool ... surprises. > > Honestly, its a shame that IPoIB never really got the attention it > deserved like the other elements of the IB stack did. Getting a rock > solid IP implementation atop a fast/low latency net could have driven > many design wins outside of HPC. And would have been a gateway > drug^H^H^H^Htechnology for using the other stack elements. > > > > -- > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics Inc. 
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From samuel at unimelb.edu.au Mon Jan 23 19:51:59 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 11:51:59 +1100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com>
Message-ID: <4F1E00AF.4090206@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 11:40, Vincent Diepeveen wrote:

> Overall seen bluegene machines never really had a low power usage,
> despite some stupid professors shouting that.

So that's why the top 5 places on the last Green500 are all BlueGene..

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8eAK8ACgkQO2KABBYQAh+nIwCdH88tISGrx772Sq/57XquLFRb
GtcAni1urHGd2j+MIJA0LXG2sGk+YymR
=tfjM
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From diep at xs4all.nl Mon Jan 23 20:00:43 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 02:00:43 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F1E00AF.4090206@unimelb.edu.au>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E00AF.4090206@unimelb.edu.au>
Message-ID: <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl>

On Jan 24, 2012, at 1:51 AM, Christopher Samuel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 24/01/12 11:40, Vincent Diepeveen wrote:
>
>> Overall seen bluegene machines never really had a low power usage,
>> despite some stupid professors shouting that.
>
> So that's why the top 5 places on the last Green500 are all BlueGene..
>

I wondered about that as well. When i see 1 gpu get nearly 1 teraflop while eating probably a tad more power than official - say 250 watt it'll consume; i already use more power now than the specs, in fact - yet even then that's 4 gflop per watt. Last time i calculated bluegene, sure that's probably the previous generation, it was 3 watts per gflop, or a factor 12 more power per gflop than a Radeon HD 7970.
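Written out, taking both inputs above at face value (the 250 W / 1 teraflop GPU and the 3 W per gflop older-generation BlueGene are the poster's figures, not measured values):

    \frac{1000\ \mathrm{gflop/s}}{250\ \mathrm{W}} = 4\ \mathrm{gflop/s\ per\ watt}
        = 0.25\ \mathrm{W\ per\ gflop/s}, \qquad
    \frac{3\ \mathrm{W/gflop}}{0.25\ \mathrm{W/gflop}} = 12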
Please note that in the statements of most HPC centers claiming blue gene to be energy efficient, usually they do not release numbers.

But now the important question: what's the price of bluegene per teraflop? Let's have a look: it's around 500 euro or so for a Radeon HD7970 card.

Vincent

> - --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk8eAK8ACgkQO2KABBYQAh+nIwCdH88tISGrx772Sq/57XquLFRb
> GtcAni1urHGd2j+MIJA0LXG2sGk+YymR
> =tfjM
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From samuel at unimelb.edu.au Mon Jan 23 20:06:41 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 12:06:41 +1100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E00AF.4090206@unimelb.edu.au> <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl>
Message-ID: <4F1E0421.80009@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 12:00, Vincent Diepeveen wrote:

> But now the important question: what's the price of bluegene per
> teraflop? Let's have a look: it's around 500 euro or so for a Radeon
> HD7970 card.

What does that matter if you can't power or cool a similar performance GPU system? Let alone have any applications that will actually take advantage of it.

cheers,
Chris

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8eBCEACgkQO2KABBYQAh839wCdFz1MjiPGCKwvbKpANCmJZpnU
V4UAoJYIfKNf6VleNi0SduPcBtSkqxQq
=E7Rh
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
From deadline at eadline.org Mon Jan 23 20:07:58 2012 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 23 Jan 2012 20:07:58 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F1DD523.4020005@ias.edu> References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu> Message-ID: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org> > > On 01/23/2012 04:19 PM, Mark Hahn wrote: >>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html >> wonder what Intel's thinking - could do some very interesting stuff, >> but it would take a bit of charisma. QPI-over-IB anyone? > > That's what I'm thinking! Numascale does this already with SCI -- Doug > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at eadline.org Mon Jan 23 20:15:30 2012 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 23 Jan 2012 20:15:30 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> Message-ID: <2d90512c0be6a3eba887e5f6ab96b3c1.squirrel@mail.eadline.org> >> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html > > wonder what Intel's thinking - could do some very interesting stuff, > but it would take a bit of charisma. QPI-over-IB anyone? There were some exascale goals mentioned. I wonder if there is some plans for a MIC based exascale beast -- Doug > > I'm not crazy about Intel being a vertically-integrated HPC supplier > (chips, systems, interconnect, mpi, compilers - I guess they still > don't have their own scheduler or sexy cloud branding ;) > > the world is a better place when each level has internal competition > based on useful, open (free), multi-implementation standards. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ellis at cse.psu.edu Mon Jan 23 20:19:08 2012 From: ellis at cse.psu.edu (Ellis H. 
Wilson III) Date: Mon, 23 Jan 2012 20:19:08 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> Message-ID: <4F1E070C.4040107@cse.psu.edu> On 01/23/2012 07:40 PM, Vincent Diepeveen wrote: >>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote: >>> Nanosecond latency of QPI using 2 rings versus something that has a >>> latency up to factor 1000 slower >>> with the pci-e as the slowest delaying factor. >>> >>> Doing cache coherency over that forget it. >> >> Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't >> work!!! >> >> More seriously, with this acquisition, I could see serious contention >> for ScaleMP. SoC type stuff, using IB between many nodes, in >> smaller boxen. > > That would be some BlueGene type machine you speak about that intel > would produce with a low power SoC. > > This where at this point the bluegene type machines simply can't > compete with the tiny processors > that get produced by the dozens of millions. For...chess? ;D > "The tiny processors have won" > Linus Thorvalds *Torvalds, and if Linux (or any well-supported kernel/OS for that matter) currently had data structures designed for extremely high parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I would agree with this statement. As I currently see it, all we can really say is that someday, probably, perhaps even hopefully: "The tiny processors will win." That's after we work out all the nasty nuances involved with designing new data structures for OSes that can handle that number of cores, and probably design new applications that can use these new OS features. And no, GPU support in Linux doesn't count as this already having been done. We just farm out very specific code to run on those things. If somebody has an example of a full-blown, usable OS running on a GPU ALONE, I would stand (very interestingly) corrected. > Intel has themselves a second law of Moore. You can google for it. Thanks, for a moment there, I almost used AskJeeves. > A good example of massproduced processors are gpu's. Was waiting for the hook. Inevitable really. I think if we were discussing the efficacy and quality of resultant bread from various bread machines versus the numerous methods for making bread by hand somehow, someway, a GPU would make better bread. Might be a wholesome cyber-loaf of artisan wheat, but nonetheless, it would be better in every way. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Jan 23 20:44:10 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 24 Jan 2012 02:44:10 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F1E070C.4040107@cse.psu.edu> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> Message-ID: In hardware you cannot beat manycore performance CPU's at the same cost structure; cpu's have an exponential cost structure, for example to maintain cache-coherency. 
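The "exponential cost structure" claim, and the 1000 mm^2 yield argument in the next paragraph, can be made concrete with the standard Poisson defect model used for back-of-envelope die costing (the defect density D below is an assumed illustrative value, not a foundry number):

    Y = e^{-AD}, \qquad \text{cost per good die} \;\propto\; \frac{A}{Y} = A\,e^{AD}

With D = 0.5 defects/cm^2: a 100 mm^2 die yields e^{-0.5} ~ 61%, while a 1000 mm^2 die yields e^{-5} ~ 0.7%, so the cost per good die grows exponentially in area. A manycore flattens that curve because a die with a few dead cores can still be sold with those cores fused off.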
This has many implications, for example on size and scale. If you produce a 1000 mm^2 cpu this is extremely expensive with real low yields, whereas a 1000 mm^2 manycore is not a problem at all; cores that do not work you can just turn off. There is no coherency. So if you produce bigger cpu's, the price goes up per square millimeter, whereas with manycores it scales near linear.

If i remember well, in 2007 an NCSA director already had put the implication of this reality in his sheets, assuming that by 2010 NCSA would build supercomputers exclusively using manycores.

Note that manycores are not ideal for chess - they are however possible to use for the majority of system time that gets burned in HPC, as the majority of HPC needs throughput rather than latency. Comparing bluegene machines with gpu's makes perfect sense of course, as the latency on them is also total crap.

I see the bluegene system by IBM as a genius move from IBM, starting an evolution, moving away from huge expensive cpu's where you produce just a handful in a totally outdated process technology, with extremely bad yields, with a million dollar of startup costs, which by now would be, at today's factories, approaching 20 million dollar startup costs just to print a single batch of processors. IBM developing power8 will have a serious problem with newer generation factories. Every batch they print, every mistake it has, DANG, 20 million dollar gone.

This concept of using simple cpu's, yet not that massively produced yet, obviously evolved now into a gpu, which is 1 total mass-produced cheap chip that integrates all those tiny cores into 1 cpu, which is way cheaper.

What's the price of a bluegene system per teraflop? It's 500 euro for a 1 teraflop double precision Radeon HD7970...

On Jan 24, 2012, at 2:19 AM, Ellis H. Wilson III wrote:
> On 01/23/2012 07:40 PM, Vincent Diepeveen wrote:
>>>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
>>>> Nanosecond latency of QPI using 2 rings versus something that has a
>>>> latency up to factor 1000 slower
>>>> with the pci-e as the slowest delaying factor.
>>>>
>>>> Doing cache coherency over that forget it.
>>>
>>> Hear that Shai F? Stop work on vSMP now, cause Vincent says it
>>> can't
>>> work!!!
>>>
>>> More seriously, with this acquisition, I could see serious
>>> contention
>>> for ScaleMP. SoC type stuff, using IB between many nodes, in
>>> smaller boxen.
>>
>> That would be some BlueGene type machine you speak about that intel
>> would produce with a low power SoC.
>>
>> This is where, at this point, the bluegene type machines simply can't
>> compete with the tiny processors
>> that get produced by the dozens of millions.
>
> For...chess? ;D
>
>> "The tiny processors have won"
>> Linus Thorvalds
>
> *Torvalds, and if Linux (or any well-supported kernel/OS for that
> matter) currently had data structures designed for extremely high
> parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I
> would agree with this statement. As I currently see it, all we can
> really say is that someday, probably, perhaps even hopefully:
>
> "The tiny processors will win."
>
> That's after we work out all the nasty nuances involved with designing
> new data structures for OSes that can handle that number of cores, and
> probably design new applications that can use these new OS features.
> And no, GPU support in Linux doesn't count as this already having been
> done. We just farm out very specific code to run on those things.
> If somebody has an example of a full-blown, usable OS running on a GPU
> ALONE, I would stand (very interestingly) corrected.
>
>> Intel has themselves a second law of Moore. You can google for it.
>
> Thanks, for a moment there, I almost used AskJeeves.
>
>> A good example of mass-produced processors are gpu's.
>
> Was waiting for the hook. Inevitable really. I think if we were
> discussing the efficacy and quality of resultant bread from various
> bread machines versus the numerous methods for making bread by hand
> somehow, someway, a GPU would make better bread. Might be a wholesome
> cyber-loaf of artisan wheat, but nonetheless, it would be better in
> every way.
>
> Best,
>
> ellis
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From diep at xs4all.nl Mon Jan 23 20:55:41 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 02:55:41 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu> <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
Message-ID: <534AD42D-DC33-4199-B476-9ADED3E09073@xs4all.nl>

On Jan 24, 2012, at 2:07 AM, Douglas Eadline wrote:
>
>>
>> On 01/23/2012 04:19 PM, Mark Hahn wrote:
>>>> http://www.hpcwire.com/hpcwire/2012-01-23/
>>>> intel_to_buy_qlogic_s_infiniband_business.html
>>> wonder what Intel's thinking - could do some very interesting stuff,
>>> but it would take a bit of charisma. QPI-over-IB anyone?
>>
>> That's what I'm thinking!
>
> Numascale does this already with SCI

They sold 300 systems, is the claim on their homepage. Not exactly what intel aims for.

I bet they instead aim to sell half a billion cpu's with built in ethernet - let's face it, their NICs started to get outdated. For HPC it won't be a slamming success, let alone give you any performance. After all, what's the price of 1000 SoC's with 1000 tiny cpu's on them, that together produce you 1 teraflop, versus 1 manycore that produces 1 teraflop?

This is not what you buy QLogic for. Maybe it was just a cheap buy for the number of patents they possess, and the big need within intel for some engineers that can improve their cpu's with connectivity that the average user will like; as for HPC, with those engineers moving within intel to the areas where intel can make the most cash - that's with cpu's and not with HPC hardware - it seems Mellanox gets a monopoly on HPC network performance.

>
> --
> Doug
>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Mon Jan 23 23:55:41 2012 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 23 Jan 2012 20:55:41 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120123192826.GB17383@bx9.net> References: <20120123192826.GB17383@bx9.net> Message-ID: <20120124045541.GB10196@bx9.net> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote: > http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html I figured out the main why: http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets > Server-class 10Gb Ethernet Adapter and LOM revenues have recently > surpassed $100 million per quarter, and are on track for about fifty > percent annual growth, according to Crehan Research. That's the whole market, and QLogic says they are #1 in the FCoE adapter segment of this market, and #2 in the overall 10 gig adapter market (see http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript) Historically, QLogic had a fibre channel adapter business that was a huge cash cow, and they bought their way into various markets and had limited success with them: iscsi, fibre channel switches, and yes, InfiniBand, where QLogic managed to get some large sales (TriLabs 3 PF procurement) yet was at only 15%-20% market share. I'm surprised that QLogic could succeed in 10gige adapters given all the competition, but hey, I never understood why fibre channel was popular, either. Now that QLogic has found what the next best thing after fibre channel adapters is, they might as well concentrate on it. It'll be interesting what Intel plans to do in the exascale market. I've thought for a long time that non-cache-coherent processors like MIC ought to have InfiniPath-like hardware queues for sending and receiving short messages efficiently, even on-chip. Not to mention that whole exascale thing. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at ur.rochester.edu Tue Jan 24 00:02:26 2012 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Tue, 24 Jan 2012 00:02:26 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote: > > > It's 500 euro for a 1 teraflop double precision Radeon HD7970... 
Great, and nothing runs on it.

GPUs are insanely useful for certain tasks, but they aren't going to be able to handle most normal workloads (similar to the BG class of course). Any center that buys BGP (or Q at this point) gear is going to pay for a scientific programmer to adapt their code to take advantage of the BG's strengths: parallelism.

But it's nice that supercomputing centers use GPUs to boost their flops numbers. Any word on that Chinese system's efficiency?

If you look at the architecture of the new K computer in Japan, it's similar to the BlueGene line.

PS: I'm really not an IBMer.

> On Jan 24, 2012, at 2:19 AM, Ellis H. Wilson III wrote:
>
>> On 01/23/2012 07:40 PM, Vincent Diepeveen wrote:
>>>>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
>>>>> Nanosecond latency of QPI using 2 rings versus something that has a
>>>>> latency up to factor 1000 slower
>>>>> with the pci-e as the slowest delaying factor.
>>>>>
>>>>> Doing cache coherency over that forget it.
>>>>
>>>> Hear that Shai F? Stop work on vSMP now, cause Vincent says it
>>>> can't
>>>> work!!!
>>>>
>>>> More seriously, with this acquisition, I could see serious
>>>> contention
>>>> for ScaleMP. SoC type stuff, using IB between many nodes, in
>>>> smaller boxen.
>>>
>>> That would be some BlueGene type machine you speak about that intel
>>> would produce with a low power SoC.
>>>
>>> This is where, at this point, the bluegene type machines simply can't
>>> compete with the tiny processors
>>> that get produced by the dozens of millions.
>>
>> For...chess? ;D
>>
>>> "The tiny processors have won"
>>> Linus Thorvalds
>>
>> *Torvalds, and if Linux (or any well-supported kernel/OS for that
>> matter) currently had data structures designed for extremely high
>> parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I
>> would agree with this statement. As I currently see it, all we can
>> really say is that someday, probably, perhaps even hopefully:
>>
>> "The tiny processors will win."
>>
>> That's after we work out all the nasty nuances involved with designing
>> new data structures for OSes that can handle that number of cores, and
>> probably design new applications that can use these new OS features.
>> And no, GPU support in Linux doesn't count as this already having been
>> done. We just farm out very specific code to run on those things. If
>> somebody has an example of a full-blown, usable OS running on a GPU
>> ALONE, I would stand (very interestingly) corrected.
>>
>>> Intel has themselves a second law of Moore. You can google for it.
>>
>> Thanks, for a moment there, I almost used AskJeeves.
>>
>>> A good example of mass-produced processors are gpu's.
>>
>> Was waiting for the hook. Inevitable really. I think if we were
>> discussing the efficacy and quality of resultant bread from various
>> bread machines versus the numerous methods for making bread by hand
>> somehow, someway, a GPU would make better bread. Might be a wholesome
>> cyber-loaf of artisan wheat, but nonetheless, it would be better in
>> every way.
>> >> Best, >> >> ellis >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJPHjtzAAoJENS19LGOpgqKUHUH/Rvn6tXy8Kla86JNbNwt3KUJ B+70SwJL/aBDstcDG4ChT5uW0WCcuvS7qRx5e1Zwu68m7qFEZRvIwc0uu0bgHbxt KRynFRZ6suwudEp0o4HMpCBYNaC7uG7xkUeFbUHKfnfCflWDoz4Y9Fq3a/OhoriK a5JrQqjVI6HZij+xDqrFvyn80Ec8eSwfRYd8lxfq4abHtE1tKYm/cF5I5Bn2lD5l wVNvBQiU99ZPeqhcbL5XyvIsceB6ncodJ9zmBxIahrNIogMCq7UJbUhsikSRp6Dd cL7r0AekTyiRmvZaHZZKbuad68DfATT4hy9/HzodBqTWLxxTMlrW8vNH9a7dSOo= =oA7r -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Tue Jan 24 00:09:57 2012 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 24 Jan 2012 16:09:57 +1100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> Message-ID: <4F1E3D25.7000008@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/01/12 16:02, Steve Crusan wrote: > Any center that buys BGP (or Q at this point) gear is > going to pay for a scientific programmer to adapt their > code to take advantage of the BG's strengths; parallelism. The advantage of the BG platform though is that it's just MPI and threads, nothing that unusual at all - certainly no need to learn CUDA, OpenCL, etc.. - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8ePSUACgkQO2KABBYQAh+hPQCggfFgdr9R9G6H7hW0Dk1/sGK+ Fe8Aniu7M6CEThw0s7F2CtqTCmuNZMRg =mH9r -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
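As an illustration of how plain "just MPI and threads" is in practice, here is a minimal hybrid skeleton - nothing BG-specific about it (assumes an MPI with MPI_THREAD_FUNNELED support and a compiler with OpenMP; the per-thread work is a placeholder):

/* hybrid.c - skeleton of the MPI-plus-threads style described above.
 * Build: mpicc -fopenmp -O2 hybrid.c -o hybrid
 * Run:   OMP_NUM_THREADS=<threads> mpirun -np <ranks> ./hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* FUNNELED: only the thread that called MPI_Init_thread talks to MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        /* each thread computes its share of the node-local work;
         * placeholder: just sum the thread ids */
        local += omp_get_thread_num() + 1;
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks, global sum = %g\n", nranks, global);

    MPI_Finalize();
    return 0;
}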
From hahn at mcmaster.ca Tue Jan 24 00:32:08 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue, 24 Jan 2012 00:32:08 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu> <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
Message-ID:

>>> but it would take a bit of charisma. QPI-over-IB anyone?
>>
>> That's what I'm thinking!
>
> Numascale does this already with SCI

it's easy to source and build pretty big IB systems; how much so with SCI?

I actually like the idea of high-fanout-distributed-router systems, but they seem perpetually exotic. where are the hypercubes, FNNs? afaict, commodification of IB has snuffed topology as a design issue, except for cray/BG/k machine-level projects.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Jan 24 00:53:14 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 21:53:14 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: Message-ID:

Inevitably, though, massively parallel interconnects (all boxes connected to all other boxes) won't scale.

On 1/23/12 9:32 PM, "Mark Hahn" wrote:
>>>> but it would take a bit of charisma. QPI-over-IB anyone?
>>>
>>> That's what I'm thinking!
>>
>> Numascale does this already with SCI
>
> it's easy to source and build pretty big IB systems;
> how much so with SCI?
>
> I actually like the idea of high-fanout-distributed-router systems,
> but they seem perpetually exotic. where are the hypercubes, FNNs?
> afaict, commodification of IB has snuffed topology as a design issue,
> except for cray/BG/k machine-level projects.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From eugen at leitl.org Tue Jan 24 06:53:35 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 12:53:35 +0100
Subject: [Beowulf] CPU Startup Combines CPU+DRAM -- And A Whole Bunch Of Crazy
In-Reply-To: <20120124003030.GA80957@piskorski.com>
References: <20120124003030.GA80957@piskorski.com>
Message-ID: <20120124115335.GW7343@leitl.org>

On Mon, Jan 23, 2012 at 07:30:30PM -0500, Andrew Piskorski wrote:
> On Mon, Jan 23, 2012 at 03:50:09PM -0500, Rayson Ho wrote:
>
> > http://iram.cs.berkeley.edu/
> >
> > So 15 years later someone suddenly thinks that it is a good idea to
> > ship IRAM systems to real customers?? :-D
>
> Sure. But from when I last read about the IRAM stuff, I'm pretty sure
> it was strictly single core. Their VIRAM1 chip had 13 MB of DRAM, 1
> cpu core, and 4 "vector lanes", with no mention of SMP or any sort of
> multi-chip parallelism at all.
> If Venray has a good design for using
> hundreds or more IRAM-like chips in a parallel machine, that sounds
> like a significant step forward. (The intended fab process and
> attendant design rules might also be quite different, although I'm not
> at all sure about that.)

In order to make best use of eDRAM it's best to organize the CPU around the layout of the memory cells, treating it as an array. You'll need a refresh register, best as wide as possible, multi-kBit word sizes, add shifts (which helps the network processor), VLIW/SIMD, large integer addition and subtraction, and so on.

If you shrink the dies and use redundant connections to route around dead dies, you can have WSI with utilization rates of >90% of the real estate. Even without FPUs such a sea of nodes on a mesh maps very well to massively parallel physical problems, AI (spiking neurons), and such. Even as a particle swarm/game physics accelerator engine integrated into RAM it really helps with massively boosting game video and physics performance, with obvious applications in GPGPU as well.

This is not at all stupid, if only this wouldn't be pushed by apparent bozos.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From deadline at eadline.org Tue Jan 24 07:48:23 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Tue, 24 Jan 2012 07:48:23 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: Message-ID: <986a2d9cf54a1630130a3361fc25a547.squirrel@mail.eadline.org>

> Inevitably, though, massively parallel interconnects (all boxes connected
> to all other boxes) won't scale.
>

Indeed, when thinking about scale I always end up thinking about the masters of scale -- ants

--
Doug

> On 1/23/12 9:32 PM, "Mark Hahn" wrote:
>>>>> but it would take a bit of charisma. QPI-over-IB anyone?
>>>>
>>>> That's what I'm thinking!
>>>
>>> Numascale does this already with SCI
>>
>> it's easy to source and build pretty big IB systems;
>> how much so with SCI?
>>
>> I actually like the idea of high-fanout-distributed-router systems,
>> but they seem perpetually exotic. where are the hypercubes, FNNs?
>> afaict, commodification of IB has snuffed topology as a design issue,
>> except for cray/BG/k machine-level projects.
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>

--
Doug

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Tue Jan 24 07:51:54 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 13:51:54 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID:

On Jan 24, 2012, at 6:02 AM, Steve Crusan wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>>
>> It's 500 euro for a 1 teraflop double precision Radeon HD7970...
>
> Great, and nothing runs on it.

You build a system of millions of euro's altogether, NCSA having a huge budget, and you can't even pay for a few programmers who write some crunching code for gpu's????

> GPUs are insanely useful for certain tasks, but they aren't going
> to be able to handle most normal workloads (similar to the BG class
> of course). Any center that buys BGP (or Q at this point) gear is
> going to pay for a scientific programmer to adapt their code to
> take advantage of the BG's strengths: parallelism.
>

bluegene is ibm's equivalent of an HPC gpu, just it's a much more expensive box.

> But it's nice that supercomputing centers use GPUs to boost their
> flops numbers. Any word on that Chinese system's efficiency?

Actually, on this mailing list, if you scroll back in history and look in 2007, some chinese researchers here posted that their codes (we speak of the 512 streamcore ATI's) were already reaching 50% IPC, and it worked cross-platform at AMD and Nvidia. They got 25% efficiency at nvidia. Now if we realize that most codes on this planet can't use multiply-add, then 25% at nvidia and 50% at ATI was really good. If we look at all sorts of applications, we see that when 1 good programmer puts in the effort, suddenly it works great at gpu's.

> If you look at the architecture of the new K computer in Japan,
> it's similar to the BlueGene line.
>
> PS: I'm really not an IBMer.
>

I took a look at the latest BlueGene/Q and basically it's 4 threads per core @ 18 cores @ 1.6GHz or something they are gonna build. That's a much improved chip over the old bluegenes, which are 3 watts per gflop.

Yet to my surprise, or maybe not, it's still not in the league of gpu's. The not yet built bluegene/q supercomputer claims 2 gflops per watt now. GPU's are 4 gflops per watt now, and you can already buy them in a shop. And at least 1 chinese researcher posted here in 2007 to get 2 gflops per watt out of it. What works efficiently on such ibm hardware should also be no problem to port to a GPU.

I see no money amounts quoted on what bluegene/q is gonna cost, yet we can be sure it's gonna cost you more than a gpu in the shops. So a chip not yet sold by ibm, if i may believe wiki, especially designed for its purpose, can't compete with a gpu that's already in the shops, which has been designed for gamers. Realize that the gpu has been designed for single precision calculations and delivers 4x more single precision flops than double, and we are comparing it double precision here. BG/Q is using 45 nm process technology and the AMD 7970 is using 28 nm, to just show my point.

>
>>
>> On Jan 24, 2012, at 2:19 AM, Ellis H.
Wilson III wrote: >> >>> On 01/23/2012 07:40 PM, Vincent Diepeveen wrote: >>>>>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote: >>>>>> Nanosecond latency of QPI using 2 rings versus something that >>>>>> has a >>>>>> latency up to factor 1000 slower >>>>>> with the pci-e as the slowest delaying factor. >>>>>> >>>>>> Doing cache coherency over that forget it. >>>>> >>>>> Hear that Shai F? Stop work on vSMP now, cause Vincent says it >>>>> can't >>>>> work!!! >>>>> >>>>> More seriously, with this acquisition, I could see serious >>>>> contention >>>>> for ScaleMP. SoC type stuff, using IB between many nodes, in >>>>> smaller boxen. >>>> >>>> That would be some BlueGene type machine you speak about that intel >>>> would produce with a low power SoC. >>>> >>>> This where at this point the bluegene type machines simply can't >>>> compete with the tiny processors >>>> that get produced by the dozens of millions. >>> >>> For...chess? ;D >>> >>>> "The tiny processors have won" >>>> Linus Thorvalds >>> >>> *Torvalds, and if Linux (or any well-supported kernel/OS for that >>> matter) currently had data structures designed for extremely high >>> parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I >>> would agree with this statement. As I currently see it, all we can >>> really say is that someday, probably, perhaps even hopefully: >>> >>> "The tiny processors will win." >>> >>> That's after we work out all the nasty nuances involved with >>> designing >>> new data structures for OSes that can handle that number of >>> cores, and >>> probably design new applications that can use these new OS features. >>> And no, GPU support in Linux doesn't count as this already having >>> been >>> done. We just farm out very specific code to run on those >>> things. If >>> somebody has an example of a full-blown, usable OS running on a GPU >>> ALONE, I would stand (very interestingly) corrected. >>> >>>> Intel has themselves a second law of Moore. You can google for it. >>> >>> Thanks, for a moment there, I almost used AskJeeves. >>> >>>> A good example of massproduced processors are gpu's. >>> >>> Was waiting for the hook. Inevitable really. I think if we were >>> discussing the efficacy and quality of resultant bread from various >>> bread machines versus the numerous methods for making bread by hand >>> somehow, someway, a GPU would make better bread. Might be a >>> wholesome >>> cyber-loaf of artisan wheat, but nonetheless, it would be better in >>> every way. 
>>> >>> Best, >>> >>> ellis >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > University of Rochester > https://www.crc.rochester.edu/ > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.17 (Darwin) > Comment: GPGTools - http://gpgtools.org > > iQEcBAEBAgAGBQJPHjtzAAoJENS19LGOpgqKUHUH/Rvn6tXy8Kla86JNbNwt3KUJ > B+70SwJL/aBDstcDG4ChT5uW0WCcuvS7qRx5e1Zwu68m7qFEZRvIwc0uu0bgHbxt > KRynFRZ6suwudEp0o4HMpCBYNaC7uG7xkUeFbUHKfnfCflWDoz4Y9Fq3a/OhoriK > a5JrQqjVI6HZij+xDqrFvyn80Ec8eSwfRYd8lxfq4abHtE1tKYm/cF5I5Bn2lD5l > wVNvBQiU99ZPeqhcbL5XyvIsceB6ncodJ9zmBxIahrNIogMCq7UJbUhsikSRp6Dd > cL7r0AekTyiRmvZaHZZKbuad68DfATT4hy9/HzodBqTWLxxTMlrW8vNH9a7dSOo= > =oA7r > -----END PGP SIGNATURE----- > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Jan 24 07:52:46 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 24 Jan 2012 13:52:46 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F1E3D25.7000008@unimelb.edu.au> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> <4F1E3D25.7000008@unimelb.edu.au> Message-ID: <08826288-2842-4C6B-B16A-180E5CCCF9D1@xs4all.nl> On Jan 24, 2012, at 6:09 AM, Christopher Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 24/01/12 16:02, Steve Crusan wrote: > >> Any center that buys BGP (or Q at this point) gear is >> going to pay for a scientific programmer to adapt their >> code to take advantage of the BG's strengths; parallelism. > > The advantage of the BG platform though is that it's just MPI and > threads, nothing that unusual at all - certainly no need to learn > CUDA, > OpenCL, etc.. > If you don't learn opencl, you're gonna run behind. 
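For anyone wondering what that learning curve looks like, the canonical first OpenCL program is a vector add. A minimal sketch in C (error checking and resource release mostly omitted for brevity; assumes an OpenCL 1.x SDK and driver are installed):

/* vadd.c - "hello world" of OpenCL: C = A + B on the GPU if present.
 * Build: cc vadd.c -lOpenCL
 */
#include <CL/cl.h>
#include <stdio.h>

#define N 1024

static const char *src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    /* prefer a GPU, fall back to whatever the platform offers */
    if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS)
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;                       /* one work-item per element */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[42] = %g (expect %g)\n", c[42], 3.0f * 42);
    return 0;
}

The kernel itself is the easy part; the host-side boilerplate above is most of what people mean when they complain about the stack.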
Vincent

> - --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk8ePSUACgkQO2KABBYQAh+hPQCggfFgdr9R9G6H7hW0Dk1/sGK+
> Fe8Aniu7M6CEThw0s7F2CtqTCmuNZMRg
> =mH9r
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From eugen at leitl.org Tue Jan 24 08:20:40 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 14:20:40 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: Message-ID: <20120124132040.GC7343@leitl.org>

On Mon, Jan 23, 2012 at 09:53:14PM -0800, Lux, Jim (337C) wrote:
> Inevitably, though, massively parallel interconnects (all boxes connected
> to all other boxes) won't scale.

You can soup up a local 3d torus with a small network like connectivity. That keeps the node connectivity and number of wires still manageable.

Moreover, the universe does it with local connectivity (even quantum entanglement needs a relativistic channel to tell it from RNG) just fine. A 3d grid/torus would be a good match for anything that can do long-range by iterating short-range interactions.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From eugen at leitl.org Tue Jan 24 08:23:27 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 14:23:27 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120124132040.GC7343@leitl.org>
References: <20120124132040.GC7343@leitl.org>
Message-ID: <20120124132327.GE7343@leitl.org>

On Tue, Jan 24, 2012 at 02:20:40PM +0100, Eugen Leitl wrote:
> On Mon, Jan 23, 2012 at 09:53:14PM -0800, Lux, Jim (337C) wrote:
> > Inevitably, though, massively parallel interconnects (all boxes connected
> > to all other boxes) won't scale.
>
> You can soup up a local 3d torus with a small network

s/small network/small world network

> like connectivity. That keeps the node connectivity
> and number of wires still manageable.
>
> Moreover, the universe does it with local connectivity
> (even quantum entanglement needs a relativistic channel
> to tell it from RNG) just fine. A 3d grid/torus would
> be a good match for anything that can do long-range
> by iterating short-range interactions.
--
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Jan 24 11:21:54 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 24 Jan 2012 08:21:54 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <986a2d9cf54a1630130a3361fc25a547.squirrel@mail.eadline.org>
Message-ID:

On 1/24/12 4:48 AM, "Douglas Eadline" wrote:

>
>> Inevitably, though, massively parallel interconnects (all boxes
>> connected
>> to all other boxes) won't scale.
>>
> Indeed, when thinking about scale I always end up thinking about
> the masters of scale -- ants
>
> --

Unfortunately, ants only run a small set of specialized codes, and are not the generalized computing resource that we're looking for (and, frankly, we don't yet know how to effectively use it, if it were to exist).

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org Tue Jan 24 11:24:31 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 17:24:31 +0100
Subject: [Beowulf] MIT Genius Stuffs 100 Processors Into Single Chip
Message-ID: <20120124162431.GJ7343@leitl.org>

http://www.wired.com/wiredenterprise/2012/01/mit-genius-stu/

MIT Genius Stuffs 100 Processors Into Single Chip

By Eric Smalley
January 23, 2012 | 6:30 am | Categories: Big Data, Tiny Chips, Data Centers, Hardware, Microprocessors, Servers, Spin-offs

Anant Agarwal is crazy. If you say otherwise, he's not doing his job. Photo: Wired.com/Eric Smalley

WESTBOROUGH, Massachusetts -- Call Anant Agarwal's work crazy, and you've made him a happy man.

Agarwal directs the Massachusetts Institute of Technology's vaunted Computer Science and Artificial Intelligence Laboratory, or CSAIL. The lab is housed in the university's Stata Center, a Dr. Seussian hodgepodge of forms and angles that nicely reflects the unhindered-by-reality visionary research that goes on inside. Agarwal and his colleagues are figuring out how to build the computer chips of the future, looking a decade or two down the road. The aim is to do research that most people think is nuts.

"If people say you're not crazy," Agarwal tells Wired, "that means you're not thinking far out enough."

Agarwal has been at this a while, and periodically, when some of his pie-in-the-sky research becomes merely cutting-edge, he dons his serial entrepreneur hat and launches the technology into the world. His latest commercial venture is Tilera. The company's specialty is squeezing cores onto chips -- lots of cores. A core is a processor, the part of a computer chip that runs software and crunches data. Today's high-end computer chips have as many as 16 cores. But Tilera's top-of-the-line chip has 100.

The idea is to make servers more efficient.
If you pack lots of simple cores onto a single chip, you're not only saving power. You're shortening the distance between cores.

Today, Tilera sells chips with 16, 32, and 64 cores, and it's scheduled to ship that 100-core monster later this year. Tilera provides these chips to Quanta, the huge Taiwanese original design manufacturer (ODM) that supplies servers to Facebook and -- according to reports -- Google. Quanta servers sold to the big web companies don't yet include Tilera chips, as far as anyone is admitting. But the chips are on some of the companies' radar screens.

Agarwal's outfit is part of an ever growing movement to reinvent the server for the internet age. Facebook and Google are now designing their own servers for their sweeping online operations. Startups such as SeaMicro are cramming hundreds of mobile processors into servers in an effort to save power in the web data center. And Tilera is tackling this same task from a different angle, cramming the processors into a single chip.

Tilera grew out of a DARPA- and NSF-funded MIT project called RAW, which produced a prototype 16-core chip in 2002. The key idea was to combine a processor with a communications switch. Agarwal calls this creation a tile, and he's able to build these many tiles into a piece of silicon, creating what's known as a "mesh network."

"Before that you had the concept of a bunch of processors hanging off of a bus, and a bus tends to be a real bottleneck," Agarwal says. "With a mesh, every processor gets a switch and they all talk to each other... You can think of it as a peer-to-peer network."

What's more, Tilera made a critical improvement to the cache memory that's part of each core. Agarwal and company made the cache dynamic, so that every core has a consistent copy of the chip's data. This Dynamic Distributed Cache makes the cores act like a single chip so they can run standard software. The processors run the Linux operating system and programs written in C++, and a large chunk of Tilera's commercialization effort focused on programming tools, including compilers that let programmers recompile existing programs to run on Tilera processors.

The end result is a 64-core chip that handles more transactions and consumes less power than an equivalent batch of x86 chips. A 400-watt Tilera server can replace eight x86 servers that together draw 2,000 watts. Facebook's engineers have given the chip a thorough tire-kicking, and Tilera says it has a growing business selling its chips to networking and videoconferencing equipment makers. Tilera isn't naming names, but claims one of the top two videoconferencing companies and one of the top two firewall companies.

An Army of Wimps

There's a running debate in the server world over what are called wimpy nodes. Startups SeaMicro and Calxeda are carving out a niche for low-power servers based on processors originally built for cellphones and tablets. Carnegie Mellon professor Dave Andersen calls these chips "wimpy." The idea is that building servers with more but lower-power processors yields better performance for each watt of power. But some have downplayed the idea, pointing out that it only works for certain types of applications.

Tilera takes the position that wimpy cores are okay, but wimpy nodes -- aka wimpy chips -- are not. Keeping the individual cores wimpy is a plus because a wimpy core is low power. But if your cores are spread across hundreds of chips, Agarwal says, you run into problems: inter-chip communications are less efficient than on-chip communications.
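To see what the mesh buys over a bus in concrete terms, a back-of-envelope C sketch (the 8x8 grid is simply an assumption matching a 64-core part, and XY routing is a generic mesh scheme, not necessarily Tilera's documented one):

  /* Average and worst-case hop count for XY routing on an 8x8 mesh. */
  #include <stdio.h>
  #include <stdlib.h>

  #define W 8

  int main(void)
  {
      long total = 0;
      int worst = 0, pairs = 0;

      for (int sx = 0; sx < W; sx++)
        for (int sy = 0; sy < W; sy++)
          for (int dx = 0; dx < W; dx++)
            for (int dy = 0; dy < W; dy++) {
                /* XY routing: travel along x first, then along y */
                int hops = abs(sx - dx) + abs(sy - dy);
                total += hops;
                if (hops > worst) worst = hops;
                pairs++;
            }
      printf("8x8 mesh: average %.2f hops, worst case %d\n",
             (double)total / pairs, worst);
      return 0;
  }

This works out to about 5.25 hops on average (14 worst case), and -- the real point -- every link can carry traffic simultaneously, where a shared bus serializes all 64 cores.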
Tilera gets the best of both worlds by using wimpy cores but putting many cores on a chip. But it still has a ways to go.

There's also a limit to how wimpy your cores can be. Google's infrastructure guru, Urs Hölzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores.

Tilera is boosting the performance of its cores. The company's most recent generation of data center server chips, released in June, are 64-bit processors that run at 1.2 to 1.5 GHz. The company also doubled DRAM speed and quadrupled the amount of cache per core. "It's clear that cores have to get beefier," Agarwal says.

The whole debate, however, is somewhat academic. "At the end of the day, the customer doesn't care whether you're a wimpy core or a big core," Agarwal says. "They care about performance, and they care about performance per watt, and they care about total cost of ownership, TCO."

Tilera's performance per watt claims were validated by a paper published by Facebook engineers in July. The paper compared Tilera's second generation 64-core processor to Intel's Xeon and AMD's Opteron high end server processors. Facebook put the processors through their paces on Memcached, a high-performance database memory system for web applications.

According to the Facebook engineers, a tuned version of Memcached on the 64-core Tilera TILEPro64 yielded at least 67 percent higher throughput than low-power x86 servers. Taking power and node integration into account as well, a TILEPro64-based S2Q server with 8 processors handled at least three times as many transactions per second per Watt as the x86-based servers.

Despite the glowing words, Facebook hasn't thrown its arms around Tilera. The stumbling block, cited in the paper, is the limited amount of memory the Tilera processors support. Thirty-two-bit cores can only address about 4GB of memory. "A 32-bit architecture is a nonstarter for the cloud space," Agarwal says.

Tilera's 64-bit processors change the picture. These chips support as much as a terabyte of memory. Whether the improvement is enough to seal the deal with Facebook, Agarwal wouldn't say. "We have a good relationship," he says with a smile.

While Intel Lurks

Intel is also working on many-core chips, and it expects to ship a specialized 50-core processor, dubbed Knights Corner, in the next year or so as an accelerator for supercomputers. Unlike the Tilera processors, Knights Corner is optimized for floating point operations, which means it's designed to crunch the large numbers typical of high-performance computing applications.

In 2009, Intel announced an experimental 48-core processor code-named Rock Creek and officially labeled the Single-chip Cloud Computer (SCC). The chip giant has since backed off of some of the loftier claims it was making for many-core processors, and it focused its many-core efforts on high-performance computing. For now, Intel is sticking with the Xeon processor for high-end data center server products.

Dave Hill, who handles server product marketing for Intel, takes exception to the Facebook paper. "Really what they compared was a very optimized set of software running on Tilera versus the standard image that you get from the open source running on the x86 platforms," he says.
The Facebook engineers ran over a hundred different permutations in terms of the number of cores allocated to the Linux stack, the networking stack and the Memcached stack, Hill says. "They really kinda fine tuned it. If you optimize the x86 version, then the paper probably would have been more apples to apples."

Tilera's roadmap calls for its next generation of processors, code-named Stratton, to be released in 2013. The product line will expand the number of processors in both directions, down to as few as four and up to as many as 200 cores. The company is going from a 40-nm to a 28-nm process, meaning they're able to cram more circuits in a given area. The chip will have improvements to interfaces, memory, I/O and instruction set, and will have more cache memory.

But Agarwal isn't stopping there. As Tilera churns out the 100-core chip, he's leading a new MIT effort dubbed the Angstrom project. It's one of four DARPA-funded efforts aimed at building exascale supercomputers. In short, it's aiming for a chip with 1,000 cores.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Jan 24 13:13:17 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 24 Jan 2012 10:13:17 -0800
Subject: [Beowulf] balance between compute and communicate
Message-ID:

One of the lines in the article Eugen posted:

"There's also a limit to how wimpy your cores can be. Google's infrastructure guru, Urs Hölzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores."

Is interesting.. I think the real issue is one of "system engineering".. you want processor speed, memory size/bandwidth, and internode communication speed/bandwidth to be "balanced". Super duper 10GHz cores with 1k of RAM interconnected with 9600bps serial links is clearly an unbalanced system..

The paper is at http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36448.pdf

From the paper:

Typically, CPU power decreases by approximately O(k^2) when CPU frequency decreases by k,

Hmm.. this isn't necessarily true, with modern designs. In the bad old days, when core voltages were high and switching losses dominated, yes, this is the case, but with modern designs, the leakage losses are starting to be comparable to the switching losses (a toy model of this trade-off follows below). But that's ok, because he never comes back to the power issue again, and heads off on Amdahl's law (which we 'wulfers all know) and the inevitable single thread bottleneck that exists at some point.

However, I certainly agree with him when he says:

Cost numbers used by wimpy-core evangelists always exclude software development costs. Unfortunately, wimpy-core systems can require applications to be explicitly parallelized or otherwise optimized for acceptable performance....

But, I don't go for

Software development costs often dominate a company's overall technical expenses

I don't know that software development costs dominate.
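A toy model of the switching-versus-leakage trade-off raised above (all constants invented purely for illustration): with dynamic power alone, performance per watt keeps improving as you downclock, but adding a fixed leakage term produces a sweet spot below which wimpier cores stop paying off. The Amdahl's law mentioned is the other constraint -- speedup(N) = 1 / ((1-p) + p/N) for parallel fraction p, so the serial fraction eventually dominates no matter how many wimpy cores you add.

  /* perf/W vs clock, with and without a leakage floor.
     Toy model; c_dyn and leak are made-up constants. */
  #include <stdio.h>

  int main(void)
  {
      const double c_dyn = 1.0;   /* dynamic (switching) coefficient */
      const double leak  = 0.3;   /* leakage, frequency-independent  */

      for (double f = 0.2; f <= 1.01; f += 0.2) {
          double p_sw  = c_dyn * f * f * f;  /* ~f^3 if V tracks f */
          double p_tot = p_sw + leak;
          printf("f=%.1f  perf/W switching-only=%.2f  with leakage=%.2f\n",
                 f, f / p_sw, f / p_tot);
      }
      return 0;
  }

With these numbers the leakage-inclusive perf/W peaks near f = 0.6 rather than improving monotonically as the clock drops -- which is exactly the "limit to how wimpy" effect.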
If you're building a million computer data center (distributed geographically, perhaps), that's on the order of several billion dollars, and you can buy an awful lot of skilled developer time for a billion dollars. It might cost another billion to manage all of them, but that's still an awful lot of development. But maybe in his space, the development time is more costly than the hardware purchase and operating costs.

He summarizes with

Once a chip's single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making...

Which is essentially my system engineering balancing argument, in the context of expectations that the surrounding stuff is current generation.

So the real Computer Engineering question is: Is there some basic rule of thumb that one can use to determine appropriate balance, given things like speeds/bandwidth/power consumption? Could we, for instance, take moderately well understood implications and forecasts of future performance (e.g. Moore's law and its ilk) and predict what size machines with what performance would be reasonable in, say, 20 years? The scaling rules for CPUs, for Memory, and for Communications are fairly well understood.

(or maybe this is something that's covered in every lower division computer engineering class these days?.. I confess I'm woefully ignorant of what they teach at various levels these days)

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Tue Jan 24 13:25:07 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 19:25:07 +0100
Subject: [Beowulf] MIT Genius Stuffs 100 Processors Into Single Chip
In-Reply-To: <20120124162431.GJ7343@leitl.org>
References: <20120124162431.GJ7343@leitl.org>
Message-ID:

I remember the first announcement some years ago from Tilera.

Some people shipped emails to Tilera asking for more details. Some just asked - like me - others also offered money to buy a cpu. They all got a 'no'.

But now that there are more details, the chip sounds less impressive. Let's analyze, based upon the vague information on the homepage.

Lots of statements are there that a marketing department in India would write down: existing slogans reformulated into more political slogans, allowing you to deny later on that it performs very well. We know that trick all too well.

First of all, the homepage reports it's 23 watts, yet doesn't say whether that's idle or under full load. It just says 'active'. Active is a vague way of formulating it. I assume that's a core that isn't idle yet isn't under 100% load, so it eats only a portion of the power. So probably it's a watt or 50 under full load.

Then it says 64 cores in a grid @ 700MHz. 700MHz sounds like a clock frequency you can actually get if you're a professional (if I'd build something, count on it running at 300MHz or so). Doesn't seem like a weird claim.

64 * 0.7 = 44.8GHz aggregate.

Yet at the same time it claims on the homepage 443 billion operations per second. What is an operation? Is that an internal iop? It says it's 32 bits VLIW.
So that would mean it's processing 10 integers each cycle.

Now we know all other manufacturers cheat by a factor of 2, double counting when a single instruction does, for example, a fused multiply-add. So we can probably divide it by 2 and get to 220 Gops.

So then a vector would be 5 integers long, which seems like a weird measure. Maybe they rounded it up a tad and in reality mean 4 integers; that sounds most reasonable.

So then it's 64 cores in a grid, executing vectors of 4 units of 32 bits. Sounds plausible.

If we compare that with some GPUs which are in our notebooks from a few years ago, then suddenly it's not so impressive.

Vincent

On Jan 24, 2012, at 5:24 PM, Eugen Leitl wrote:

>
> http://www.wired.com/wiredenterprise/2012/01/mit-genius-stu/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From samuel at unimelb.edu.au Tue Jan 24 17:36:14 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Wed, 25 Jan 2012 09:36:14 +1100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID: <4F1F325E.9010109@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 23:51, Vincent Diepeveen wrote:

> You build a system of millions of euro's altogether, NCSA having a
> huge budget and you can't even pay for a few programmers who
> write some crunching code for gpu's????

I was at a meeting at SC'06 where the folks from various large institutions in the US were bemoaning the fact that there was all this money for petaflop hardware available but none for programmers or algorithm development to make apps scale out to the systems.

Just because the scientists say it's a good thing to have doesn't mean the US government funding people will listen to them.

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8fMl4ACgkQO2KABBYQAh95lwCfQodU25X1A0yngWOOwuAqmU2X
thAAoICeeMk8fwx33enCWQ/XGvatdsEc
=OFC+
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
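The back-of-envelope in the Tilera post above is easy to check mechanically; a tiny C sketch using only the numbers quoted there (64 cores, 700MHz, a claimed 443 billion ops/s):

  /* Sanity-check the Tilera ops/cycle estimate from the thread. */
  #include <stdio.h>

  int main(void)
  {
      double cores = 64, ghz = 0.7, claimed_gops = 443;
      double gcycles = cores * ghz;              /* aggregate cycles/s */
      double ops_per_cycle = claimed_gops / gcycles;

      printf("aggregate:            %.1f Gcycles/s\n", gcycles);
      printf("claimed ops/cycle:    %.1f per core\n", ops_per_cycle);
      printf("if FMA counts double: %.1f per core\n", ops_per_cycle / 2);
      return 0;
  }

That prints 44.8, 9.9 and 4.9 -- matching the ~10, ~5, "probably 4-wide" chain of reasoning above.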
From prentice at ias.edu Wed Jan 25 17:01:48 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Wed, 25 Jan 2012 17:01:48 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID: <4F207BCC.9010701@ias.edu>

On 01/24/2012 12:02 AM, Steve Crusan wrote:
>
>
> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>
>
>> It's 500 euro for a 1 teraflop double precision Radeon HD7970...
>
>
> Great, and nothing runs on it. GPUs are insanely useful for certain
> tasks, but they aren't going to be able to handle most normal
> workloads (similar to the BG class of course). Any center that buys BGP
> (or Q at this point) gear is going to pay for a scientific programmer
> to adapt their code to take advantage of the BG's strengths; parallelism.
>
> But It's nice that supercomputing centers use GPUs to boost their
> flops numbers. Any word on that Chinese system's efficiency? If you
> look at the architecture of the new K computer in Japan, it's similar
> to the BlueGene line.

I attended a presentation at Princeton U. on Monday about the state of HPC in China. The talk was given by someone who has been to China and spoken with the leaders of their HPC efforts. While the Chinese systems get great scores on LINPACK, even the Chinese concede that on their "real" applications, they are getting well below the theoretical max flops, because their codes aren't getting the most out of their systems. In other words, on real programs, they aren't all that efficient (yet).

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Wed Jan 25 19:46:57 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 26 Jan 2012 01:46:57 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F207BCC.9010701@ias.edu>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> <4F207BCC.9010701@ias.edu>
Message-ID: <76840233-6CA8-4B9E-BF66-4A1A93CD1F1F@xs4all.nl>

The supercomputing codes I saw run on processors were, to put it politely, losing it everywhere.

Also NASA, when porting from the Origin3800 to Itanium2 1.5GHz, publicly reported a speedup of factor 2 in the forums.

However my own chessprogram, not exactly optimized for itanium2, got a boost of factor 4 moving from the 500MHz R14000 (Origin3800) to itanium2 1.3GHz. That was just a single compile, and it's an integer program, whereas the itanium2 is a floating point processor.

The itanium2 1.5GHz has 6 gflops on paper, versus 1 gflop on paper for the R14k 500MHz.

Now a Chinese reporter posted on THIS mailing list, the beowulf mailing list, already for GPU hardware some generations ago, an IPC of 25% at nvidia and 50% at AMD.

At the same gpu's back then, most student projects got around 25% at nvidia; Volkov then went ahead and understood GPU's better and scored 70% efficiency - again at very old gpu's. Since then they really improved.
See: http://www.cs.berkeley.edu/~volkov/

So you want to build a supercomputer now 10x more expensive, and each generation lose more efficiency on newer hardware, whereas some who do the effort to write good new code get very high efficiency?

Just learn how to program and ignore the disinformation - if you have a box that fast, you really can get a lot of speed out of it.

You shouldn't ask for a 1 billion dollar box that can run your oldschool Fortran codes only as well as a 5 million dollar GPU box; look what you can do to write good codes for that manycore hardware. OpenCL works everywhere, CUDA just at nvidia.

Vincent

On Jan 25, 2012, at 11:01 PM, Prentice Bisbal wrote:

> On 01/24/2012 12:02 AM, Steve Crusan wrote:
>>
>>
>> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>>
>>
>>> It's 500 euro for a 1 teraflop double precision Radeon HD7970...
>>
>>
>> Great, and nothing runs on it. GPUs are insanely useful for certain
>> tasks, but they aren't going to be able to handle most normal
>> workloads (similar to the BG class of course). Any center that buys
>> BGP
>> (or Q at this point) gear is going to pay for a scientific programmer
>> to adapt their code to take advantage of the BG's strengths;
>> parallelism.
>>
>> But It's nice that supercomputing centers use GPUs to boost their
>> flops numbers. Any word on that Chinese system's efficiency? If you
>> look at the architecture of the new K computer in Japan, it's similar
>> to the BlueGene line.
>
> I attended a presentation at Princeton U. on Monday about the state of
> HPC in China. The talk was given by someone who has been to China and
> spoken with the leaders of their HPC efforts. While the Chinese
> systems
> get great scores on LINPACK, even the Chinese concede that on their
> "real" applications, they are getting well below the theoretical max
> flops, because their codes aren't getting the most out of their
> systems.
> In other words, on real programs, they aren't all that efficient
> (yet).
>
> --
> Prentice
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Thu Jan 26 00:04:31 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 25 Jan 2012 21:04:31 -0800
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F1F325E.9010109@unimelb.edu.au>
Message-ID:

On 1/24/12 2:36 PM, "Christopher Samuel" wrote:

> institutions in the US were bemoaning the fact that there was all this
> money for petaflop hardware available but none for programmers or
> algorithm development to make apps scale out to the systems.

That's partly because people are an expense, while hardware is an asset that sits on the balance sheet. If I fork out a million bucks for a computer, I now have an asset that is worth a million dollars. If I fork out a million dollars for 3 skilled developers for a year, at the end of the year, it's not clear I'll possess an asset that I can sell for a million dollars.
Obviously, the work product must be worth something, because otherwise we wouldn't have jobs, but the connection is more tenuous.

The other thing (when government funding is considered) is that the million dollar hardware purchase might turn into more jobs than the 3 software weenies, if only because "computer assemblers and deliverers" get paid a lot less, and when it comes to statistics, they don't look at "cumulative wages", they look at "number of people employed"

>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Thu Jan 26 07:28:41 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 26 Jan 2012 13:28:41 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

Mike, you replied to me, not to the mailing list.

Note that itanium2 released too late, and it was $100k a box initially and $7500 a cpu (1.5GHz) if you ordered a 1000. And it had the same IPC for integers as opteron at the time (later on compilers got pgo for opteron as well, and then opteron was faster, at least for Diep, in ipc).

Larrabee indeed resembles itanium to some extent, but not quite. Intel's expertise is producing high-clocked cpu's. itanium was a low clocked cpu and therefore failed. No one pays big bucks for a low clocked cpu - look on ebay, the cheapest cpu's are always the low-clocked ones.

larrabee is something in between a cpu and a gpu, so a totally different ballgame - intel moving to a market where they actually have competition and are not the ones owning the patents.

So that's not gonna be easy for intel some years from now if they show up with a 100% vectorized design and not some dreadnought in between cpu and gpu which is low clocked.

As for your infiniband remark, realize that it took 25 years or so to bugfix ethernet everywhere - forget 'setting a new standard' there for the average Joe. Not gonna work.

Infiniband is meant for HPC and uses MPI protocol to communicate. This is very powerful for clusters and the way to go when scaling at supercomputers, yet it's not gonna conquer average joe's machine, as there is a price to pay which is too high for now.

However, realize some of the sales of the HPC manufacturers go to low latency ethernet - my guess is that intel will use QLogic's know-how there to improve their cheapo cpu's and upgrade them with better ethernet. Seems a plausible goal and a very useful one; the rest, such as rivalling Mellanox at ethernet, that's not gonna happen.

On Jan 26, 2012, at 7:23 AM, MDG wrote:

> Technically the Itanium Chip was a failure, it was not 100% x86 compatible and actually was for servers, but often underperformed the traditional x86 chips; Intel let it quietly vanish as it came nowhere near the first advertised performance. It varied too far from the x86 architecture design, requiring special programming code, much like the GPUs, though they are actually able to run some parallel processes, both under Windows and Linux.
>
> There is a difference: the M series NVIDIA cards are more for servers and the C series, such as the C2070 or C2075, for Workstations. The M series also used the same numbering sequence and I think they are up to the 2090 or 2095 series, but you do need PCIe high speed slots for both sets of cards. As for resale cards, I have talked to a few sellers, and be careful: there are some knockoffs from mainland China, I verified this with NVIDIA.
>
> These GPUs are designed so that they are not seen as cores or cpus. Also most resale cards are pulled from, in one case, a pool of HP Workstations and servers, yet the seller had no idea of the difference between the C2070 and the M2070s, and as I said none of them had the required software; most did not even know it was needed! Otherwise the GPUs do not function. So, as for resales, it is a pretty expensive gamble, as they are untested, with no software to even try them with!
>
> The GPUs can be used if you write your own parallel code, usually in C++ per NVIDIA, but you still need the software to offload the work to them. If you are into heavy number crunching, assuming the task allows parallel processing versus the traditional linear method where A must always come before B and B before C in processes, you will see a lot more results than a typical program; in other things you will see little improvement. My talk with an NVIDIA technician confirmed this: you can get great results for creating, say, graphics, but very little improvement to display an already designed piece. Same for statistics, weather forecasting, geology; technically Intel has even used their network as a massive HPC to help design chips, so add engineering, along with most physics and nuclear explosion simulations, etc.
>
> Also, with fiber optics now coming down in price, the idea of multiple super-workstations and even super-servers, where in a client-server relationship the Server does most of the processing, will most likely grow into stable and usable systems before the average work-station.
>
> It will help some with a statistics driven database, but not that much for a pure relational database; it also works well with MATLAB and SPSS.
> > > > As I said I am watching the GPUs closely as so far they seem the > most likely next beak-through as software is written that can take > advantage of their unique abilities. Also from what I have read > they draw far less power than even the new generation of multi-core > x86 series. I am not an expert with these GPU systems but they do > hold a great promise as in a leap-forward than just adding x86 cores. > > The buying of Infiniband shows hat Intel is looking to move past > the copper Ethernet systems, which surpased Arcnet systems. the > only constamnt is change, while technically not an Intel Chip this > still shows Moore's law is being leveraged to other platforms > including GPUs > > Mike. > > --- On Wed, 1/25/12, Vincent Diepeveen wrote: > > From: Vincent Diepeveen > Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic > InfiniBand business > To: "Prentice Bisbal" > Cc: "Beowulf Mailing List" > Date: Wednesday, January 25, 2012, 2:46 PM > > The supercomputing codes i saw run on processors, to say polite, were > losing it everywhere. > > Also NASA when porting from Origin3800 to Itanium2 1.5Ghz, reported > publicly a speedup of factor 2 in the forums. > > However my own chessprogram, not exactly optimized for itanium2, got > a boost of factor 4 moving from 500Mhz R14000 (origin3800) > to itanium2 1.3Ghz. That was just a single compile, and it's an > integer program, whereas the itanium2 is a floating point processor. > > The itanium2 1.5Ghz has 6 gflops on paper versus the R14k 500Mhz has > 1 Gflop on paper. > > Now a Chinese reporter posted on THIS mailing list, the beowulf > mailing list, already at GPU hardware some generations ago > an IPC of 25% at nvidia and 50% at AMD. > > At the same gpu's back then, most studentprojects got around 25% at > nvidia; Volkov then went ahead and understood GPU's better > and scored 70% efficiency - again at very old gpu's. Sincethen they > really improved. > > See: http://www.cs.berkeley.edu/~volkov/ > > So you want to build a supercomptuer now 10x more expensive, and each > generation lose more efficiency on newer hardware, > whereas some who do effort to write new good code, they get very high > efficiency? > > Just learn how to program and ignore the desinformation - if you have > a box that fast you really can get a lot of speed out of it. > > You shouldn't ask for a 1 billion dollar box that can run your > oldschool Fortran codes as good as a 5 million GPU box, > look what you can do to write good codes for that manycore hardware. > OpenGL works at all, CUDA just at nvidia. > > Vincent > > On Jan 25, 2012, at 11:01 PM, Prentice Bisbal wrote: > > > On 01/24/2012 12:02 AM, Steve Crusan wrote: > >> > >> > >> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote: > >> > >> > >>> It's 500 euro for a 1 teraflop double precision Radeon HD7970... > >> > >> > >> Great, and nothing runs on it. GPUs are insanely useful for certain > >> tasks, but they aren't going to be able to handle most normal > >> workloads(similar to the BG class of course). Any center that buys > >> BGP > >> (or Q at this point) gear is going to pay for a scientific > programmer > >> to adapt their code to take advantage of the BG's strengths; > >> parallelism. > >> > >> But It's nice that supercomputing centers use GPUs to boost their > >> flops numbers. Any word on that Chinese system's efficiency? If you > >> look at the architecture of the new K computer in Japan, it's > similar > >> to the BlueGene line. > > > > I attended a presentation at Princeton U. 
on Monday about the > state of > > HPC in China. The talk was given by someone who has been to > China and > > spoken with the leaders of their HPC efforts. While the Chinese > > systems > > get great scores on LINPACK, even the Chinese concede that on their > > "real" applications, they are getting well below the theoretical max > > flops, because their codes aren't getting the most out of their > > systems. > > In other words, on real programs, they aren't all that efficient > > (yet). > > > > -- > > Prentice > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Jan 26 07:35:40 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 26 Jan 2012 13:35:40 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> Message-ID: On Jan 26, 2012, at 1:28 PM, Vincent Diepeveen wrote: > Mike you replied to me not to mailing list. > > note that itanium2 released too late and it was $100k a box > initially and $7500 a cpu (1.5Ghz) if you ordered a 1000. > And it had same IPC for integers like opteron at the time (later on > compilers got pgo for opteron as well and then opteron was faster, > at least for diep, in ipc). > > Larrabee indeed resembles itanium to some extend, but not quite. > intels expertise is producing highclocked cpu's. itanium was a low > clocked cpu and therefore failed. > no one pays big bucks for a low clocked cpu. look on ebay - > cheapest cpu's always the lowclocked ones. > > larrabee is something in between a cpu and a gpu so total other > ballgame - intel moving to a market where they actually have > competition > and are not the ones owning the patents. > > So that's not gonna be easy for intel some years from now if they > show up with a 100% vectorized design and not some dreadnought > in between cpu and gpu which is low clocked. > > As for your infiniband remark realize that it took 25 years or so > to bugfix ethernet everywhere - forget 'setting a new standard' > there for the average Joe. > Not gonna work. > > Infiniband is meant for HPC and uses MPI protocol to communicate. > This is very powerful for clusters and the way to go when scaling > at supercomputers, > yet it's not gonna conquer average joe's machine, as there is a > price to pay which is too high for now. > > However realize some of sales of the HPC manufacturers goes to low > latency ethernet - my guess is that intel will use qlogics know how > there to improve > their cheapo cpu's and upgrade them with better ethernet. Seems > plausible goal and a very useful one, the rest, such as rivalling > Mellanox at ethernet, > that's not gonna happen. 
>

Oops small typo during speedy write. "mellanox at ethernet" should of course be 'mellanox at HPC'.

The question is whether typical low latency ethernet products are gonna suffer from intels move. I doubt solarflare will. they already deliver this stuff only to those who really battle for every picosecond, so price is just not the issue there.

Vincent

> [snip -- the remainder re-quoted the preceding messages in full]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
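Since the thread keeps circling interconnect latency, here is the standard way such claims get measured in practice: an MPI ping-pong microbenchmark (a generic sketch -- MPI here is a library riding on whatever fabric sits underneath, InfiniBand verbs, low-latency ethernet, or plain TCP):

  /* Ping-pong: two ranks bounce a small message, report one-way latency.
     Run with exactly two ranks, e.g. mpirun -np 2 ./pingpong */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, iters = 10000;
      char buf[8] = {0};

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Barrier(MPI_COMM_WORLD);

      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
          if (rank == 0) {
              MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
          } else if (rank == 1) {
              MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      double t1 = MPI_Wtime();

      if (rank == 0)
          printf("one-way latency: %.2f usec\n",
                 (t1 - t0) / iters / 2.0 * 1e6);
      MPI_Finalize();
      return 0;
  }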
From samuel at unimelb.edu.au Thu Jan 26 18:27:21 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 27 Jan 2012 10:27:21 +1100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <4F21E159.7000905@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 26/01/12 23:28, Vincent Diepeveen wrote:

> Mike, you replied to me, not to the mailing list.

That was probably deliberate, and it is inconsiderate to post a reply publicly without checking with the writer that they are OK with that, especially as you quoted what they wrote - they may not have wanted that in the public domain.

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8h4VkACgkQO2KABBYQAh9lJgCfQXwsmDG9l1v4Jt9vUr5YYCr0
fDYAoJdJBbUJBApO5ZOh200gZ5+Lo/vt
=mpU4
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Thu Jan 26 20:48:55 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 26 Jan 2012 20:48:55 -0500 (EST)
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

> Larrabee indeed resembles itanium to some extent, but not quite.

wow, that has to be your most loosely-tethered-to-reality statement yet! it's true that Larrabee and Itanium are very close in the number of letters in their name.

> Infiniband is meant for HPC and uses MPI protocol to communicate.

no and no.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 01:04:17 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 07:04:17 +0100
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

On Jan 27, 2012, at 2:48 AM, Mark Hahn wrote:

>> Larrabee indeed resembles itanium to some extent, but not quite.
>
> wow, that has to be your most loosely-tethered-to-reality statement
> yet!
> it's true that Larrabee and Itanium are very close
> in the number of letters in their name.

Your personal attack seems to indicate you disagree with my qualification that the entire Larrabee line doesn't make any real sense in the long run.

Instead of throwing mud, mind explaining why Larrabee, an architecture far away from the mainstream, stands any chance of competing in HPC with the existing architectural concepts in the long run?
Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 01:06:07 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 07:06:07 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F21E159.7000905@unimelb.edu.au>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au>
Message-ID: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>

Why do you write this?

On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 26/01/12 23:28, Vincent Diepeveen wrote:
>
>> Mike, you replied to me, not to the mailing list.
>
> That was probably deliberate, and it is inconsiderate to post a reply
> publicly without checking with the writer that they are OK with that,
> especially as you quoted what they wrote - they may not have wanted
> that
> in the public domain.
>
> - --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk8h4VkACgkQO2KABBYQAh9lJgCfQXwsmDG9l1v4Jt9vUr5YYCr0
> fDYAoJdJBbUJBApO5ZOh200gZ5+Lo/vt
> =mpU4
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Fri Jan 27 10:37:43 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 27 Jan 2012 10:37:43 -0500 (EST)
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

>>> Larrabee indeed resembles itanium to some extent, but not quite.
>>
>> wow, that has to be your most loosely-tethered-to-reality statement
>> yet!
>> it's true that Larrabee and Itanium are very close
>> in the number of letters in their name.
>
> Your personal attack seems to indicate you disagree with my
> qualification that the entire Larrabee line doesn't make any real
> sense in the long run.

not surprisingly, no: I disagree that Larrabee and Itanium resemble each other in any but really silly ways.

Itanium is a custom, VLIW architecture; Larrabee is an on-chip cluster of non-VLIW, commodity x86_64 cores.
none of the distinctive features of Itanium (multi-instruction bundles,
dependency on compile-time scheduling, intended market, implementation,
success limited to predictable, high-bandwidth situations, directory-based
inter-node cache coherency) are anything close to the features of Larrabee
(standard x86_64 ISA, no special compiler needed, on-chip message-passing
network, suitable for complex/dynamic/unpredictable loads, possibly not even
cache-coherent across one chip.)

my guess is that you were thinking about how ia64 chips tended to
run at low clock rates, and thinking about how gpus (probably including
larrabee) also tend to be low-clocked.

> Instead of throwing mud, would you mind explaining why Larrabee,
> an architecture far away from the mainstream, stands any chance of
> competing in HPC with the existing architectural concepts in the
> long run?

as far as I know, larrabee will be a mesh of conventional x86_64 cores
that will run today's x86_64 code. I don't know whether Intel has stated
(or even decided) whether the cores will have full or partial cache
coherency, or whether they'll really be an MPI-like shared-nothing
cluster.

if you want to compare Larrabee to Fermi or AMD GCN, that might be
interesting. or to mainstream multicore - like bulldozer, with 32c
per package vs larrabee with ">=50".

but not ia64. it's best we all just forget about it.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Fri Jan 27 10:39:06 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 27 Jan 2012 10:39:06 -0500 (EST)
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
Message-ID: 

> Why do you write this?

because he thought you might be interested in improving your etiquette.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 10:42:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 10:42:48 -0500
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To: 
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <4F22C5F8.6010804@scalableinformatics.com>

On 01/27/2012 10:37 AM, Mark Hahn wrote:
>>>> Larrabee indeed resembles itanium to some extent, but not quite.
>>>
>>> wow, that has to be your most loosely-tethered-to-reality statement
>>> yet!
>>> it's true that Larrabee and Itanium are very close
>>> in the number of letters in their name.
>>
>> Your personal attack seems to indicate that you disagree with my
>> assessment that the entire Larrabee line has no realistic future.
>
> not surprisingly, no: I disagree that Larrabee and Itanium resemble
> each other in any but really silly ways.
>
> Itanium is a custom, VLIW architecture; Larrabee is an on-chip
> cluster of non-VLIW, commodity x86_64 cores.

But ... but .... they are both made of Silicon .... doesn't that mean
they are the same?

/sarc

(Sorry, it's been a fun week ... and this was just ... too ...
irresistible ...)

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From prentice at ias.edu Fri Jan 27 11:06:00 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 27 Jan 2012 11:06:00 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
Message-ID: <4F22CB68.3080605@ias.edu>

Vincent,

He wrote that because he's trying to educate you on proper mailing list
etiquette, which is something you appear to be lacking.

Chris is absolutely right - you should not reply to off-list e-mails
on-list.

--
Prentice

On 01/27/2012 01:06 AM, Vincent Diepeveen wrote:
> Why do you write this?
>
> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote:
>
> On 26/01/12 23:28, Vincent Diepeveen wrote:
>
>>> Mike you replied to me not to mailing list.
>
> That was probably deliberate, and it is inconsiderate to post a reply
> publicly without checking with the writer that they are OK with that,
> especially as you quoted what they wrote - they may not have wanted
> that
> in the public domain.
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 11:12:35 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 17:12:35 +0100
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To: 
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl>

On Jan 27, 2012, at 4:37 PM, Mark Hahn wrote:

>>>> Larrabee indeed resembles itanium to some extent, but not quite.
>>>
>>> wow, that has to be your most loosely-tethered-to-reality statement
>>> yet!
>>> it's true that Larrabee and Itanium are very close
>>> in the number of letters in their name.
>>
>> Your personal attack seems to indicate that you disagree with my
>> assessment that the entire Larrabee line has no realistic future.
>
> not surprisingly, no: I disagree that Larrabee and Itanium resemble
> each other in any but really silly ways.
>
> Itanium is a custom, VLIW architecture; Larrabee is an on-chip
> cluster of non-VLIW, commodity x86_64 cores.
>
> none of the distinctive features of Itanium (multi-instruction
> bundles, dependency on compile-time scheduling, intended market,
> implementation, success limited to predictable, high-bandwidth
> situations, directory-based inter-node cache coherency) are anything
> close to the features of Larrabee (standard x86_64 ISA, no special
> compiler needed, on-chip message-passing network, suitable for
> complex/dynamic/unpredictable loads, possibly not even cache-coherent
> across one chip.)
>
> my guess is that you were thinking about how ia64 chips tended to
> run at low clock rates, and thinking about how gpus (probably
> including larrabee) also tend to be low-clocked.
>

And both seem to be failures from the user viewpoint - maybe not from
Intel's income viewpoint, but from Intel's aim to replace and/or create
a new long-lasting architecture that can even *remotely* compete with
other manufacturers, not to mention the far too high price points for
such cpu's.

>> Instead of throwing mud, would you mind explaining why Larrabee,
>> an architecture far away from the mainstream, stands any chance of
>> competing in HPC with the existing architectural concepts in the
>> long run?
>
> as far as I know, larrabee will be a mesh of conventional x86_64 cores
> that will run today's x86_64 code. I don't know whether Intel has
> stated (or even decided) whether the cores will have full or partial
> cache coherency, or whether they'll really be an MPI-like
> shared-nothing cluster.

Assuming you're not completely born stupid, I assume you will realize
that IN ORDER to run most existing x64 codes it needs to have cache
coherency, and that it has always been presented as having exactly
that. Which is one of the reasons why the architecture doesn't scale,
of course.

Well, you can forget about running your x64 Fortran codes on it at any
decent speed.

You need to totally rewrite your code to be able to use vectors of
doubles. In contrast to GPU's - where you can do indirect array lookups
per PE or per 'compute core' (which in the AMD-ATI case is 4 PE's, each
able to execute 1 double a cycle) - such lookups are a disaster on
Larrabee, costing 7 cycles per indirect lookup, so you really need to
use the vectors.

Now I bet the majority of your old x64 code doesn't use such huge
vectors, so to get even remotely decent performance out of it, a total
rewrite of most code is needed - if it can work at all.

We can then also see that GPU's are totally superior to Larrabee in
most areas, most importantly for multiplication-heavy codes. As you
might know, GPU's are the world champion at doing multiplications and
CPU's are not.

Multiplication happens to be of major importance for the majority of
HPC codes - majority really meaning approaching 90% of the public
supercomputers.

Vincent

> if you want to compare Larrabee to Fermi or AMD GCN, that might be
> interesting. or to mainstream multicore - like bulldozer, with 32c
> per package vs larrabee with ">=50".
>
> but not ia64. it's best we all just forget about it.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.
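To make the two performance claims in the message above concrete, here
are two minimal C sketches. They are illustrative asides only - the
function names, sizes and numbers in them are invented for
illustration, and are not taken from the thread or from any Larrabee
documentation.

The first contrasts a unit-stride loop, which a vectorizing compiler
can map straight onto wide SIMD registers, with an indirect (gather)
loop, where every element goes through an index table so each vector
lane needs its own address - the access pattern claimed above to cost
several cycles per lookup on a wide-vector design:

#include <stdio.h>

#define N 1024

/* Unit-stride: maps directly onto vector registers (try gcc -O3). */
static void scale_contiguous(double *a, const double *b, double s, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* Indirect (gather): each element is fetched through an index table,
   so the compiler must either emit gather instructions or fall back
   to scalar code. */
static void scale_indirect(double *a, const double *b, const int *idx,
                           double s, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = s * b[idx[i]];
}

int main(void)
{
    static double a[N], b[N];
    static int idx[N];

    for (int i = 0; i < N; i++) {
        b[i] = (double)i;
        idx[i] = (i * 7) % N;   /* a scattered permutation of 0..N-1 */
    }

    scale_contiguous(a, b, 2.0, N);
    scale_indirect(a, b, idx, 2.0, N);
    printf("%f %f\n", a[10], a[N - 1]);
    return 0;
}

The second sketches the cache-coherency cost that the scaling argument
above leans on: two threads incrementing counters that share one cache
line force that line to bounce between cores on every write, while
padding the counters onto separate lines removes the coherency traffic.
It assumes 64-byte cache lines and 8-byte longs, and builds with
gcc -O2 -fopenmp:

#include <omp.h>
#include <stdio.h>

#define ITERS 100000000L

/* Two counters on one cache line: every increment by one thread
   invalidates the line in the other thread's cache. */
static volatile long shared_line[2];

/* The same counters padded onto separate 64-byte lines. */
static struct { volatile long v; char pad[56]; } padded[2];

int main(void)
{
    double t0 = omp_get_wtime();
#pragma omp parallel num_threads(2)
    {
        int me = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            shared_line[me]++;          /* line ping-pongs */
    }
    double t1 = omp_get_wtime();

#pragma omp parallel num_threads(2)
    {
        int me = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            padded[me].v++;             /* no line is shared */
    }
    double t2 = omp_get_wtime();

    printf("shared line: %.2f s  padded: %.2f s\n", t1 - t0, t2 - t1);
    return 0;
}

On most multicore machines the padded version runs several times
faster. Whether that per-line overhead actually dominates at the scale
of a whole Larrabee-class chip is exactly the point in dispute here.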
From diep at xs4all.nl Fri Jan 27 11:15:05 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 17:15:05 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F22CB68.3080605@ias.edu> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> Message-ID: And why do you post this? On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote: > Vincent, > > He wrote that because he's trying to educate you on proper mailing > list > etiquette, which is something you appear to be lacking. > > Chris is absolutely right - you should not reply to off-list e-mails > on-list. > > -- > Prentice > > On 01/27/2012 01:06 AM, Vincent Diepeveen wrote: >> Why do you write this? >> >> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote: >> >> On 26/01/12 23:28, Vincent Diepeveen wrote: >> >>>>> Mike you replied to me not to mailing list. >> >> That was probably deliberate, and it is inconsiderate to post a reply >> publicly without checking with the writer that they are OK with that, >> especially as you quoted what they wrote - they may not have wanted >> that >> in the public domain. >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at cse.psu.edu Fri Jan 27 11:25:15 2012 From: ellis at cse.psu.edu (Ellis H. Wilson III) Date: Fri, 27 Jan 2012 11:25:15 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> Message-ID: <4F22CFEB.6080404@cse.psu.edu> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: > And why do you post this? "Assuming you're not completely born stupid, i assume you will realize that IN ORDER to" write an effective email that conveys some idea or argument, it is extremely helpful to utilize some form of etiquette or at the very least, self-restraint in your writing so we all don't stop reading your emails. In fact, while it's not a terribly great book IMHO, it might still help to read "How to Win Friends and Influence People." Seems like you have enough time on your hands to write near-to-incoherent emails on this list and program near-to-impossible applications for GPUs, so perhaps if you can steal a little time from one or the other you can finish it in a day or so. But admittedly, perhaps requesting etiquette from you is truly an unthinkable thing to do. Hence your boggled state of mind. ellis > > On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote: > >> Vincent, >> >> He wrote that because he's trying to educate you on proper mailing >> list >> etiquette, which is something you appear to be lacking. >> >> Chris is absolutely right - you should not reply to off-list e-mails >> on-list. 
>> >> -- >> Prentice >> >> On 01/27/2012 01:06 AM, Vincent Diepeveen wrote: >>> Why do you write this? >>> >>> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote: >>> >>> On 26/01/12 23:28, Vincent Diepeveen wrote: >>> >>>>>> Mike you replied to me not to mailing list. >>> >>> That was probably deliberate, and it is inconsiderate to post a reply >>> publicly without checking with the writer that they are OK with that, >>> especially as you quoted what they wrote - they may not have wanted >>> that >>> in the public domain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Jan 27 11:34:41 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 27 Jan 2012 11:34:41 -0500 Subject: [Beowulf] Larrabee - Mark Hahn's personal attack In-Reply-To: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: <4F22D221.3020504@ias.edu> On 01/27/2012 11:12 AM, Vincent Diepeveen wrote: > And both are seem failures from user viewpoint, maybe not from intels > income viewpoint, > but from intels aim to replace and/or create a new long lasting > architecture > that can even *remotely* compete with other manufacturers, > not to mention far too high pricepoints for such cpu's. This argument is ridiculous. Just because two completely different technologies (architectures) both fail, doesn't make them similar. That's like saying a Ford Edsel and Pontiac Aztek are similar cars. > Assuming you're not completely born stupid, i assume you will realize > that IN ORDER to run Calling someone "completely born stupid" is unacceptable behavior. > most existing x64 codes, it needs to have cache coherency, and that > it always has been > presented as having exactly that. > Which is one of reasons why the architecture doesn't scale of course. Cache-coherent systems don't scale well? Really? SGI Origins were ccNUMA systems, and they scaled well. > Well you can forget about them running your x64 fortran codes on it > at any fast speed. > > You need to total rewrite your code to be able to use vectors of > doubles, > and in contradiction to GPU's where you can indirectly with arrays > see each PE or each 'compute core' > (which is 4 PE's of in case of AMD-ATI that can execute 1 double a This argument makes no sense in the context of this discussion. You need to do a significant rewrite of your code to take advantage of GPUs, too, so how are GPUs better? > cycle), > > Such lookups are a disaster at larrabee - having a cost of 7 cycles > for indirect lookups, > so you really need to use vectors. > > Now i bet majority of your oldie x64 code doesn't use such huge vectors, > so to even get some remote performance out of it, a total rewrite of > most code is needed, > if it can work at all. > > We can then also see the insight that GPU's are total superior to > larrabee at most terrains and > most importantly at multiplicative codes. > > As you might know GPU's are worldchampion in doing multiplications > and CPU's are not. > > Multiplication happens to be something that is of major importance > for the majority of HPC codes. 
> Majority i really mean - approaching 90% at the public supercomputers. I'm at a loss for words... Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Jan 27 11:38:02 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 27 Jan 2012 11:38:02 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> Message-ID: <4F22D2EA.1080309@ias.edu> Vincent, I posted that because you asked a question and I answered it, which is also good mailing list etiquette. Since you posted your question "Why do you write this?" to the mailing list instead of replying just to Chris, anyone on this list is free to reply to it. Again, this is basic mailing list etiquette. -- Prentice On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: > And why do you post this? > > On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote: > >> Vincent, >> >> He wrote that because he's trying to educate you on proper mailing >> list >> etiquette, which is something you appear to be lacking. >> >> Chris is absolutely right - you should not reply to off-list e-mails >> on-list. >> >> -- >> Prentice >> >> On 01/27/2012 01:06 AM, Vincent Diepeveen wrote: >>> Why do you write this? >>> >>> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote: >>> >>> On 26/01/12 23:28, Vincent Diepeveen wrote: >>> >>>>>> Mike you replied to me not to mailing list. >>> That was probably deliberate, and it is inconsiderate to post a reply >>> publicly without checking with the writer that they are OK with that, >>> especially as you quoted what they wrote - they may not have wanted >>> that >>> in the public domain. >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Jan 27 11:41:55 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 17:41:55 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F22CFEB.6080404@cse.psu.edu> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> Message-ID: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> On Jan 27, 2012, at 5:25 PM, Ellis H. 
Wilson III wrote:

> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote:
>> And why do you post this?

So you can follow all the etiquette, yet technically your mind is not
capable of following the discussions - so you just resorted to replying
about etiquette.

That says more about you than about me.

What everyone hates about politics is that people talk only about how
things are phrased instead of looking at the intention behind what was
said.

Why don't you go into politics? Maybe you'll do better there.

Vincent
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From ellis at cse.psu.edu Fri Jan 27 11:58:25 2012
From: ellis at cse.psu.edu (Ellis H. Wilson III)
Date: Fri, 27 Jan 2012 11:58:25 -0500
Subject: [Beowulf] The Absurdity of Diep - Was cpu's versus gpu's - Was Intel buys QLogic InfiniBand business
In-Reply-To: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl>
Message-ID: <4F22D7B1.4020508@cse.psu.edu>

On 01/27/2012 11:41 AM, Vincent Diepeveen wrote:
> On Jan 27, 2012, at 5:25 PM, Ellis H. Wilson III wrote:
>
>> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote:
>>> And why do you post this?
>
> So you can follow all the etiquette, yet technically your mind is not
> capable of following the discussions - so you just resorted to
> replying about etiquette.

No, I've given up writing technically when you're posting because:

a) You go into discussions to prove everyone wrong
b) You rapidly switch the topic if too many people disagree, which is
frustrating and confusing (hence "Intel buys QLogic" became "cpus
versus gpus", which became Itanium vs Larrabee somehow, and now it is
how poorly you communicate)
c) There is nothing to gain from having discussions with you

> That says more about you than about me.

My personal background is storage and communication protocol-heavy.
Not processor-oriented. You are right to suggest I am hesitant to post
on a thread that directly compares two seemingly different processors,
just like you hesitate to deal with the reality that you lack basic
social skills. Everyone caters to their own strengths, and generally
(if they are wise), takes a back-seat and tries to learn something in
areas they are weak.

> What everyone hates about politics is that people talk only about how
> things are phrased instead of looking at the intention behind what
> was said.
>
> Why don't you go into politics? Maybe you'll do better there.

Just because this is a list on Beowulfery and broadly covers everything
remotely attached to HPC does not mean it needs to be bereft of a
baseline of etiquette and respect for one another. I know quite a few
very nice but rather intelligent and technically-capable people. These
two qualities can in fact coexist in a person, believe it or not.
Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 12:03:38 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 18:03:38 +0100
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To: <4F22D221.3020504@ias.edu>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> <4F22D221.3020504@ias.edu>
Message-ID: <208B7C7D-3A3E-4134-A352-4D7D78B304D1@xs4all.nl>

On Jan 27, 2012, at 5:34 PM, Prentice Bisbal wrote:

> On 01/27/2012 11:12 AM, Vincent Diepeveen wrote:
>> And both seem to be failures from the user viewpoint - maybe not
>> from Intel's income viewpoint, but from Intel's aim to replace
>> and/or create a new long-lasting architecture that can even
>> *remotely* compete with other manufacturers, not to mention the far
>> too high price points for such cpu's.
>
> This argument is ridiculous. Just because two completely different
> technologies (architectures) both fail, doesn't make them similar.
>
> That's like saying a Ford Edsel and Pontiac Aztek are similar cars.
>
>> Assuming you're not completely born stupid, I assume you will realize
>> that IN ORDER to run
>
> Calling someone "completely born stupid" is unacceptable behavior.

Whereas everyone knows Intel's statements on Larrabee there, and that
without cache coherency you can't multithread and everything also has
to be done blocked - so there is zero compatibility with x64 then, and
any compatibility cannot be guaranteed. You know this really well - yet
you played dumb there, trying to score cheap points.

Without cache coherency it is of course easy to build big cpu's that
scale well, yet then they don't run x64. Of course Intel will be
forced, somewhere in the future, to come up with some kick-butt design
that is not x64 compatible at all and doesn't use things like cache
coherency. Which isn't remotely the idea of Larrabee. That's why you
wrote it down as such.

>> most existing x64 codes, it needs to have cache coherency, and that
>> it has always been presented as having exactly that.
>> Which is one of the reasons why the architecture doesn't scale, of
>> course.
>
> Cache-coherent systems don't scale well? Really? SGI Origins were
> ccNUMA systems, and they scaled well.

Indeed, and they didn't scale anywhere near linearly in price. Each
Origin 3800 @ 64 processors @ 1.5Ghz was exactly 1 million dollars,
whereas a simple normal x64 cpu at the time had a price similar to the
square root of that.

With GPU's everything scales very cheaply, and when you use cache
coherency you start to lose that scaling. Yields go down, of course.
Most manufacturers need a pretty high yield to sell a chip at any
decent price, so the production cost of a Larrabee chip in the same
process technology as a GPU, with the same performance, will be higher
by a huge factor. That will also cause Intel to sell really few of
them. Would you consider buying a Larrabee at 1 million dollars a card?

>> Well, you can forget about running your x64 Fortran codes on it at
>> any decent speed.
>>
>> You need to totally rewrite your code to be able to use vectors of
>> doubles. In contrast to GPU's - where you can do indirect array
>> lookups per PE or per 'compute core' (which in the AMD-ATI case is
>> 4 PE's, each able to execute 1 double a
>
> This argument makes no sense in the context of this discussion. You
> need to do a significant rewrite of your code to take advantage of
> GPUs, too, so how are GPUs better?

If you need to rewrite it anyway, why not get much faster performance
at a fraction of the price? It's the same effort either way.

>> cycle) - such lookups are a disaster on Larrabee, costing 7 cycles
>> per indirect lookup, so you really need to use the vectors.
>>
>> Now I bet the majority of your old x64 code doesn't use such huge
>> vectors, so to get even remotely decent performance out of it, a
>> total rewrite of most code is needed - if it can work at all.
>>
>> We can then also see that GPU's are totally superior to Larrabee in
>> most areas, most importantly for multiplication-heavy codes. As you
>> might know, GPU's are the world champion at doing multiplications
>> and CPU's are not.
>>
>> Multiplication happens to be of major importance for the majority of
>> HPC codes - majority really meaning approaching 90% of the public
>> supercomputers.
>
> I'm at a loss for words...
>

http://www.nwo.nl/nwohome.nsf/pages/NWOP_8DEEKL_Eng
title: "Overview of recent supercomputers 2010"
Author: Aad van der Steen

>
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From prentice at ias.edu Fri Jan 27 13:29:52 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 27 Jan 2012 13:29:52 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl>
Message-ID: <4F22ED20.7040105@ias.edu>

On 01/27/2012 11:41 AM, Vincent Diepeveen wrote:
> On Jan 27, 2012, at 5:25 PM, Ellis H. Wilson III wrote:
>
>> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote:
>>> And why do you post this?
>
> So you can follow all the etiquette, yet technically your mind is not
> capable of following the discussions - so you just resorted to
> replying about etiquette.
>
> That says more about you than about me.
>

What it says is that we've given up on discussing technology with you,
because your arguments are completely nonsensical. Since you clearly
don't understand technology, we're hoping you can at least understand
the simple concepts of basic etiquette.
-- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From glykos at mbg.duth.gr Fri Jan 27 13:57:31 2012 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Fri, 27 Jan 2012 20:57:31 +0200 (EET) Subject: [Beowulf] Signal to noise. In-Reply-To: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: Dear List, I have been a (mostly) quiet reader of this list for the last ~5 years and my intention is to continue reading the excellent posts that the members of this community contribute almost daily. Having said that, the recent Vincent-centric 'discussions' have ---as I am sure you all know--- significantly reduced the signal-to-noise ratio. Can we get back to normal, please ? Thanks, Nicholas -- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From moloney.brendan at gmail.com Fri Jan 27 14:26:12 2012 From: moloney.brendan at gmail.com (Brendan Moloney) Date: Fri, 27 Jan 2012 11:26:12 -0800 Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: I am in a similar position. I posted a question to this list quite some time ago but have remained subscribed to the list ever since. I have always (or at least until recently) enjoyed reading the discussions on here. I hope that one person does not ruin such a great resource. Thanks, Brendan On Fri, Jan 27, 2012 at 10:57 AM, Nicholas M Glykos wrote: > > Dear List, > > I have been a (mostly) quiet reader of this list for the last ~5 years and > my intention is to continue reading the excellent posts that the members > of this community contribute almost daily. Having said that, the recent > Vincent-centric 'discussions' have ---as I am sure you all know--- > significantly reduced the signal-to-noise ratio. Can we get back to > normal, please ? > > Thanks, > Nicholas > > -- > > > Nicholas M. Glykos, Department of Molecular Biology > and Genetics, Democritus University of Thrace, University Campus, > Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, > Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From h-bugge at online.no Fri Jan 27 14:29:35 2012
From: h-bugge at online.no (=?iso-8859-1?Q?H=E5kon_Bugge?=)
Date: Fri, 27 Jan 2012 11:29:35 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120124045541.GB10196@bx9.net>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net>
Message-ID: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>

Greg,

On 23. jan. 2012, at 20.55, Greg Lindahl wrote:

> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>
>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>
> I figured out the main why:
>
> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>
>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>> surpassed $100 million per quarter, and are on track for about fifty
>> percent annual growth, according to Crehan Research.
>
> That's the whole market, and QLogic says they are #1 in the FCoE
> adapter segment of this market, and #2 in the overall 10 gig adapter
> market (see
> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript)

That can explain why QLogic is selling, but not why Intel is buying.

10 years ago, Intel went _out_ of the Infiniband market, see
http://www.networkworld.com/newsletters/servers/2002/01383318.html

So has the IB business evolved so incredibly well compared to what
Intel expected back in 2002? I do not think so.

I would guess that we will see message passing/RDMA over Thunderbolt
or similar.

Håkon
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 15:06:54 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 21:06:54 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>
Message-ID: <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl>

On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:

> Greg,
>
>
> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>
>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>
>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>>
>> I figured out the main why:
>>
>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>
>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>> surpassed $100 million per quarter, and are on track for about fifty
>>> percent annual growth, according to Crehan Research.
>> That's the whole market, and QLogic says they are #1 in the FCoE
>> adapter segment of this market, and #2 in the overall 10 gig adapter
>> market (see
>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript)
>
> That can explain why QLogic is selling, but not why Intel is buying.
>
> 10 years ago, Intel went _out_ of the Infiniband market, see
> http://www.networkworld.com/newsletters/servers/2002/01383318.html
>
> So has the IB business evolved so incredibly well compared to what
> Intel expected back in 2002? I do not think so.
>
> I would guess that we will see message passing/RDMA over Thunderbolt
> or similar.

Qlogic offers QDR there. Mellanox is a generation newer, with FDR.
Both in latency and in bandwidth that is a huge difference.

> Håkon
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 15:19:31 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 15:19:31 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl>
Message-ID: <4F2306D3.4080509@scalableinformatics.com>

On 01/27/2012 03:06 PM, Vincent Diepeveen wrote:
>
> On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:
>
>> Greg,
>>
>>
>> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>>
>>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>>
>>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>>>
>>> I figured out the main why:
>>>
>>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>>
>>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>>> surpassed $100 million per quarter, and are on track for about
>>>> fifty percent annual growth, according to Crehan Research.
>>>
>>> That's the whole market, and QLogic says they are #1 in the FCoE
>>> adapter segment of this market, and #2 in the overall 10 gig
>>> adapter market (see
>>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript)
They are in 10GbE silicon and NICs, and being in IB silicon and HCAs gives them not only a hedge (10GbE while growing rapidly, is not the only high performance network market, and Intel is very good at getting economies of scale going with its silicon ... well ... most of its silicon ... ignoring Itanium here ...). Its quite likely that Intel would need IB for its PetaScale plans. Someone here postualted putting the silicon on the CPU. Not sure if this would happen, but I could see it on an IOH, easily. That would make sense (at least in terms of the Westmere designs ... for the Romley et al. I am not sure where it would make most sense). But Intel sees the HPC market growth, and I think they realize that there are interesting opportunities for them there with tighter high performance networking interconnects (Thunderbolt, USB3, IB, 10GbE native on all these systems). > Qlogic offers that QDR. > Mellanox is a generation newer there with FDR. > > Both in latency as well as in bandwidth a huge difference. Haven't looked much at FDR or EDR latency. Was it a huge delta (more than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us for a while, and switches are still ~150-300ns port to port. At some point I think you start hitting a latency floor, bounded in part by "c", but also by an optimal technology path length that you can't shorten without significant investment and new technology. Not sure how close we are to that point (maybe someone from Qlogic/Mellanox could comment on the headroom we have). Bandwidth wise, you need E5 with PCIe 3 to really take advantage of FDR. So again, its a natural fit, especially if its LOM .... Curiously, I think this suggests that ScaleMP could be in play on the software side ... imagine stringing together bunches of the LOM FDR/QDR motherboards with E5's and lots of ram into huge vSMPs (another thread). Shai may tell me I'm full of it (hope he doesn't), but I think this is a real possibility. The Qlogic purchase likely makes this even more interesting for Intel (or Cisco, others as a defensive acq). We sure do live in interesting times! -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Fri Jan 27 15:27:24 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 15:27:24 -0500 Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: <4F2308AC.9010704@scalableinformatics.com> On 01/27/2012 01:57 PM, Nicholas M Glykos wrote: > > Dear List, > > I have been a (mostly) quiet reader of this list for the last ~5 years and > my intention is to continue reading the excellent posts that the members > of this community contribute almost daily. Having said that, the recent > Vincent-centric 'discussions' have ---as I am sure you all know--- > significantly reduced the signal-to-noise ratio. 
> Can we get back to normal, please?

Greetings Nicholas and many others:

I've found that filters help. I have some simple procmail filters set
up in my mail directory that redirect some people's email (and in some
cases responses to them) to a file I ... well ... never read. By doing
so, I find the S/N ratio to be vastly improved. Only one person from
Beowulf is in this (not Vincent ... I am still deeply amused by some of
the emails, though that is fading fast with the personal attacks).

Procmail filters look like this:

:0:
* ^From:.*bad at person.com
$HOME/twit.filter

Then I never read the twit.filter. Just empty it out every now and
then. Maybe once every few years.

Doing this has dramatically improved S/N here and elsewhere. If you
don't have this capability directly, your mail client can probably fake
it. I use this as I have (far too) many mail clients and I don't want
to manage the rules on all of them.

If you are afflicted with Microsoft exchange as your mail server, I am
not sure what you can (easily) do.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From glykos at mbg.duth.gr Fri Jan 27 15:58:02 2012
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Fri, 27 Jan 2012 22:58:02 +0200 (EET)
Subject: [Beowulf] Signal to noise.
In-Reply-To: 
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl>
Message-ID: 

Hi Joe,

> I've found that filters help.

You are killing my daily digests.

> If you are afflicted with Microsoft ...

What is 'Microsoft'?
:-)

All the best (and apologies to the list for the email traffic),
Nicholas

-- 
Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 16:07:34 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 16:07:34 -0500
Subject: [Beowulf] Signal to noise.
In-Reply-To: 
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl>
Message-ID: <4F231216.3020703@scalableinformatics.com>

On 01/27/2012 03:58 PM, Nicholas M Glykos wrote:
>
> Hi Joe,
>
>
>> I've found that filters help.
>
> You are killing my daily digests.

D'oh! ... I seem to remember that you can do some more fancy
filtering ... Someone showed me something a few years ago, that would
break apart digests, filter, and reassemble.
Something like this:
http://easierbuntu.blogspot.com/2011/09/managing-your-email-with-fetchmail.html
(they have some interesting procmail recipes, and you can find ones
that do this if you really want to).

>
>> If you are afflicted with Microsoft ...
>
> What is 'Microsoft'?
> :-)

A small, very gentle company in the North West USA.

> All the best (and apologies to the list for the email traffic),
> Nicholas

:)

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 16:42:24 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 22:42:24 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F2306D3.4080509@scalableinformatics.com>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com>
Message-ID: <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl>

On Jan 27, 2012, at 9:19 PM, Joe Landman wrote:

> On 01/27/2012 03:06 PM, Vincent Diepeveen wrote:
>>
>> On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:
>>
>>> Greg,
>>>
>>>
>>> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>>>
>>>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>>>
>>>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>>>>
>>>> I figured out the main why:
>>>>
>>>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>>>
>>>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>>>> surpassed $100 million per quarter, and are on track for about
>>>>> fifty percent annual growth, according to Crehan Research.
>>>>
>>>> That's the whole market, and QLogic says they are #1 in the FCoE
>>>> adapter segment of this market, and #2 in the overall 10 gig
>>>> adapter market (see
>>>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript)
>
> I found that statement interesting. I've actually not known anything
> about their 10GbE products. My bad.
>
>>> That can explain why QLogic is selling, but not why Intel is buying.
>>>
>>> 10 years ago, Intel went _out_ of the Infiniband market, see
>>> http://www.networkworld.com/newsletters/servers/2002/01383318.html
>>>
>>> So has the IB business evolved so incredibly well compared to what
>>> Intel expected back in 2002? I do not think so.
>>>
>>> I would guess that we will see message passing/RDMA over
>>> Thunderbolt or similar.
>
> Intel buying makes quite a bit of sense IMO. They are in 10GbE
> silicon and NICs, and being in IB silicon and HCAs gives them a hedge
> (10GbE, while growing rapidly, is not the only high performance
> network market, and Intel is very good at getting economies of scale
> going with its silicon ... well ... most of its silicon ...
> ignoring Itanium here ...). It's quite likely that Intel would need
> IB for its PetaScale

Why buy previous generation IB in such a case?

It's about the ethernet of course...

They produce tens of millions of cpu's each quarter and have also
announced a SoC (socket on chip). Of SoC's the market actually produces
billions a year, so it's a lucrative market, yet a highly competitive
one. Having 10 gigabit ethernet on such a SoC, with the total at a low
price, would give Intel a huge lead there, worth dozens of billions a
year.

It's not clear to me where all their SoC plans go, but I bet right now
they are open to any market needing SoC's. Note that many SoC's are
dirt cheap. Even in very low volume we speak about some tens of
dollars, cpu included and other connectivity included. Price is
everything there, yet I guess Intel will be offering the 'top' SoC's,
with faster cpu's and 10 GigE. Then they produce a bunch of mainboards.

Think also of the upcoming generation of consoles, ipad 3's and similar
products - it's not clear yet which company gets the contracts for the
upcoming consoles; it's all wide open for now. Yet they might also sell
100+ million of those. Intel is an attractive company for console
manufacturers to do business with now. IBM's Cell kind of lost momentum
there and seems to have nothing new to offer that really outperforms.
Also the power usage of Cell was kind of disappointing: the initial
version of the PS3 was 220 watts on average, and at 100% usage it could
go up to 380+ watts. Try to put that on your couch.

Don't confuse this with the later number-crunching Cell version, a much
improved chip, used for some supercomputers. Yet if I remember well,
some reports - was it Aad v/d Steen? - already predicted it would not
be interesting for upcoming supercomputers, as it is some kind of
hybrid chip, which has no long-term future. He was right.

> plans. Someone here postulated putting the silicon on the CPU. Not
> sure if this would happen, but I could see it on an IOH, easily. That
> would make sense (at least in terms of the Westmere designs ... for
> the Romley et al. I am not sure where it would make most sense).
>
> But Intel sees the HPC market growth, and I think they realize that
> there are interesting opportunities for them there with tighter high
> performance networking interconnects (Thunderbolt, USB3, IB, 10GbE
> native on all these systems).
>

Undoubtedly they'll try something in the HPC market. If you have
already put lots of cash into developing a product, it's better to put
it on the market. Based upon their name they'll sell some. And some
years from now they should have something bigtime improved.

Yet realize how complicated it is to tape out a GPU on a new process
technology if you aren't sure you're going to sell 100+ million of
them. Such massive projects have to pay back the factories. A product
without the potential of selling for at least a few dozen billions of
dollars is not even interesting to develop. Just the startup costs for
a GPU on a new process technology are some dozens of millions for each
run, and the more complex the chip and the newer the process
technology, the more expensive it gets.

Realize that IBM produces its Power7 and the upcoming BlueGene/Q cpu in
45 nm technology, while GPU's now release in 28 nm. That gives,
theoretically, an advantage of a tad less than (45 / 28) ^ 2 = 2.58.
So a gpu from Intel needs to be a factor 2.58 better in the same
process technology than today's gpu's of AMD (28 nm already released)
and Nvidia (28 nm coming soon, I'd expect).
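For readers who want the arithmetic behind that factor spelled out, a
short worked form. The first-order assumption - that the transistor
budget of a fixed die area scales with the square of the linear
feature-size ratio - is an assumption added here for clarity, not
something stated in the message:

\[ \left( \frac{45\,\mathrm{nm}}{28\,\mathrm{nm}} \right)^{2} \approx 1.607^{2} \approx 2.58 \]

That is, a full shrink from 45 nm to 28 nm buys roughly 2.58x the
transistors on the same die area, which is where the "factor 2.58"
above comes from.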
This is where Intel's big advantage with cpu's always lies: they are
better at getting newer process technologies to work sooner than the
competition. Ivy Bridge will be 22 nm, according to the rumours I
heard.

>> Qlogic offers QDR there. Mellanox is a generation newer, with FDR.
>>
>> Both in latency and in bandwidth that is a huge difference.
>
> Haven't looked much at FDR or EDR latency. Was it a huge delta (more
> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us
> for a while, and switches are still ~150-300ns port to port. At some

From a posting here some months ago by Gilad Shainer: it's 0.85 us RDMA
latency for FDR versus 1.3 us or so for the other. More important for
clusters, though, is the bandwidth. I guess that pci-e 3.0 simply
allows much higher speeds, whereas the QDR cards are PCI-E 2.0 stuff.
Isn't pci-e 3.0 about 2x higher bandwidth than pci-e 2.0?

Now I might be happy with that last bit, but I guess that for big FFT's
or for matrices you still need massive bandwidth. Even if n is big in
O(k * n log n) - where k in the case of matrices is a tad bigger than
n, and in number theory is usually around the number of bits, so 3.32
times n or so - you still need k passes of n log n each. That's massive
bandwidth.

> point I think you start hitting a latency floor, bounded in part by
> "c", but also by an optimal technology path length that you can't
> shorten without significant investment and new technology. Not sure
> how close we are to that point (maybe someone from Qlogic/Mellanox
> could comment on the headroom we have).

There is a lot of headroom for better latencies from the software
viewpoint, as cpu's keep getting faster, yet the latency of the
networks of years ago was only marginally worse than what's there now.
On the hardware side I am really no expert.

>
> Bandwidth wise, you need E5 with PCIe 3 to really take advantage of
> FDR. So again, it's a natural fit, especially if it's LOM ....
>

All the socket 2011 boards that are in the shops now are PCI-e 3.0, and
a wave of 2-socket mainboards will release a few days before, or on the
same day, that Intel finally releases the Xeon version of Sandy Bridge.
It seems it hasn't released yet because it's not clocked very high, if
I look at this sample cpu :) It's 2Ghz to be precise (an 8-core Xeon).

> Curiously, I think this suggests that ScaleMP could be in play on the
> software side ... imagine stringing together bunches of the LOM
> FDR/QDR motherboards with E5's and lots of ram into huge vSMPs
> (another thread). Shai may tell me I'm full of it (hope he doesn't),
> but I think this is a real possibility. The Qlogic purchase likely
> makes this even more interesting for Intel (or Cisco, others as a
> defensive acq).
>

A technology that has sold just 300 machines is not an interesting
market for Intel. They have very expensive factories that each cost
many billions of dollars. These need to produce nonstop and sell
products, to pay back the factories and to make a profit. Intel used to
be worth over a 100 billion dollars at NASDAQ.

Wasting your most clever engineers - of which each company always has
too few - on products that can't keep your factories busy is a total
waste of time. So it's your huge base of B-class engineers - let me not
quote some mailing list names - that you then move over to Qlogic for
the HPC side. That's enough to keep it afloat for a while, in
combination with 'intel inside'.
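To put rough numbers on the pci-e 2.0 versus 3.0 question raised above,
a short worked check. These are the standard published signalling
rates, not figures taken from the thread:

\[ \text{PCIe 2.0 x8:}\quad 8 \times 5\,\mathrm{GT/s} \times \tfrac{8}{10} = 32\ \mathrm{Gbit/s} = 4\ \mathrm{GB/s}\ \text{per direction} \]

\[ \text{PCIe 3.0 x8:}\quad 8 \times 8\,\mathrm{GT/s} \times \tfrac{128}{130} \approx 63\ \mathrm{Gbit/s} \approx 7.9\ \mathrm{GB/s}\ \text{per direction} \]

So at equal lane count pci-e 3.0 carries roughly 2x the payload of
pci-e 2.0 (a faster clock plus a much lighter line encoding), and a 4x
FDR link - 56 Gbit/s of signalling, about 54 Gbit/s of data after
64b/66b encoding, i.e. roughly 6.8 GB/s - is about what a pci-e 2.0 x8
slot can no longer feed but a pci-e 3.0 x8 slot can.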
Intel's profit is too huge for them to be busy toying with tiny markets with a handful of customers, the majority of whom forgot to take their medicine when you propose rewriting their software for some new hardware platform you are going to roll out. A habit Intel is not exactly excited about either, of course, as they like to sell new technology each generation. Also, each Larrabee Intel would sell means they sell a bunch of Xeons less, of course.

> We sure do live in interesting times!

Not for everyone, I guess - many lost their jobs, and as I predicted some years ago, a guy with a Nobel prize might be carpet-bombing a huge nation this summer. Intel has 3 huge factories in Israel, last time I checked. That sure can give unpredicted results in the future.

> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com  Fri Jan 27 16:47:21 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 16:47:21 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl>
Message-ID: <4F231B69.1050404@scalableinformatics.com>

On 01/27/2012 04:42 PM, Vincent Diepeveen wrote:
>
> On Jan 27, 2012, at 9:19 PM, Joe Landman wrote:
>
>> On 01/27/2012 03:06 PM, Vincent Diepeveen wrote:

[... merciful trimming ...]

>>>> I would guess that we will see message passing/RDMA over
>>>> Thunderbolt or similar.
>>
>> Intel buying makes quite a bit of sense IMO. They are in 10GbE silicon
>> and NICs, and being in IB silicon and HCAs gives them not only a hedge
>> (10GbE while growing rapidly, is not the only high performance network
>> market, and Intel is very good at getting economies of scale going with
>> its silicon ... well ... most of its silicon ... ignoring Itanium here
>> ...). Its quite likely that Intel would need IB for its PetaScale

> Why buy previous-generation IB in that case?

IP. It's all about IP. It's always about IP. If ever you think it's not about IP, you should remember "Landman's N+1th rule of M&A: It's the IP man ... just da IP!"

> It's about the ethernet, of course...

... no it's not. Intel has its own ethernet. It's had it for a LONG time, and it did not buy QLogic's ethernet ... It's not about the ethernet. Say it with me ... IT'S NOT ABOUT THE ETHERNET ... There, don't you feel better now? I do ...

> They produce tens of millions of CPUs each quarter and have also
> announced a SoC (socket on chip)

SoC is "System On a Chip". Socket on a chip is ...
er ... cart before the horse?

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From lindahl at pbm.com  Fri Jan 27 17:13:12 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri, 27 Jan 2012 14:13:12 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>
Message-ID: <20120127221312.GA29961@bx9.net>

On Fri, Jan 27, 2012 at 11:29:35AM -0800, Håkon Bugge wrote:

> That can explain why QLogic is selling, but not why Intel is buying.

That's right. This was probably bought, not sold. If you look at the press release Intel put out, it's all about Exascale computing.

http://newsroom.intel.com/community/intel_newsroom/blog/2012/01/23/intel-takes-key-step-in-accelerating-high-performance-computing-with-infiniband-acquisition

If you want to put an IB HCA in a CPU or a {north,south}bridge, TrueScale née InfiniPath is a much smaller implementation than others, and most of the chip is memory, which Intel knows how to shrink drastically compared to the usual way people implement memory.

Also, keep in mind that Intel's benchmarking group in Moscow has a lot of experience with benchmarking real apps for bids using TrueScale head-to-head against other HCAs, and I wouldn't be surprised if it was the case that TrueScale QDR is faster than that other company's FDR on many real codes, for the usual reason that TrueScale's MPI-oriented InfiniBand extension is more suited for MPI than the standard InfiniBand has-more-features-than-MPI-requires protocols.

Finally, I haven't seen it mentioned whether or not QLogic's IB switch was part of the purchase. If it is, then you should note that it's not hard to make that chip speak ethernet, and Intel could probably dramatically improve it with their superior serdes technology.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Shainer at Mellanox.com  Fri Jan 27 17:25:58 2012
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Fri, 27 Jan 2012 22:25:58 +0000
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120127221312.GA29961@bx9.net>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net>
Message-ID:

> If you want to put an IB HCA in a CPU or a {north,south}bridge, TrueScale née
> InfiniPath is a much smaller implementation than others, and most of the chip
> is memory, which Intel knows how to shrink drastically compared to the usual
> way people implement memory.
So I wonder why multiple OEMs decided to use Mellanox for on-board solutions and no one used the QLogic silicon...

> Also, keep in mind that Intel's benchmarking group in Moscow has a lot of
> experience with benchmarking real apps for bids using TrueScale head-to-head
> against other HCAs, and I wouldn't be surprised if it was the case that TrueScale
> QDR is faster than that other company's FDR on many real codes,

Surprise surprise... this is no more than FUD. If you have real numbers to back it up, please send them. If it was so great, how come more people decided to use the Mellanox solutions? If QLogic was doing so great with their solution, I would guess they would not be selling the IB business...

> Finally, I haven't seen it mentioned whether or not QLogic's IB switch was part
> of the purchase. If it is, then you should note that it's not hard to make that chip
> speak ethernet, and Intel could probably dramatically improve it with their
> superior serdes technology.
>
> -- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From lindahl at pbm.com  Fri Jan 27 17:27:23 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri, 27 Jan 2012 14:27:23 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F2306D3.4080509@scalableinformatics.com>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com>
Message-ID: <20120127222723.GB29961@bx9.net>

On Fri, Jan 27, 2012 at 03:19:31PM -0500, Joe Landman wrote:

> >>> That's the whole market, and QLogic says they are #1 in the FCoE
> >>> adapter segment of this market, and #2 in the overall 10 gig adapter
> >>> market (see
> >>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-
> >>> f2q12-results-earnings-call-transcript)
>
> I found that statement interesting. I've actually not known anything
> about their 10GbE products. My bad.

I'm not surprised, as this 10ge adapter is aimed at the same part of the market that uses fibre channel, which isn't that common in HPC. It doesn't have the kind of TCP offload features which have been (futilely) marketed in HPC; it's all about running the same fibre channel software most enterprises have run for a long time, but having the network be ethernet.

> Haven't looked much at FDR or EDR latency. Was it a huge delta (more
> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us
> for a while, and switches are still ~150-300ns port to port.

Are you talking about the latency of 1 core on 1 system talking to 1 core on another system, or the kind of latency that real MPI programs see, running on all of the cores on a system and talking to many other systems? I assure you that the latter is not 0.8 for any IB system.

> At some
> point I think you start hitting a latency floor, bounded in part by "c",

Last time I did the computation, we were 10X that floor. And, of course, each increase in bandwidth usually makes latency worse, absent heroic efforts of implementers to make that headline latency look better.
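For a sense of scale, a sketch of that kind of floor computation; the 30 m one-way path and the 2/3-c propagation speed are assumptions, and the 1.6 us figure is a half-round-trip pingpong number quoted elsewhere in this thread:

    /* Speed-of-light latency floor, back of the envelope. Path length and
     * propagation speed are assumed values, not measurements. */
    #include <stdio.h>

    int main(void) {
        double path_m = 30.0;      /* assumed one-way cable path via switch */
        double v = 2.0e8;          /* ~2/3 c in copper/fiber, m/s */
        double floor_us = path_m / v * 1e6;
        double measured_us = 1.6;  /* half-RTT pingpong cited in thread */
        printf("wire-time floor ~ %.2f us, measured %.1f us -> ~%.0fx floor\n",
               floor_us, measured_us, measured_us / floor_us);
        return 0;
    }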
-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From tom.elken at qlogic.com  Fri Jan 27 18:08:58 2012
From: tom.elken at qlogic.com (Tom Elken)
Date: Fri, 27 Jan 2012 15:08:58 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120127221312.GA29961@bx9.net>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net>
Message-ID: <35AAF1E4A771E142979F27B51793A4888885B23AE5@AVEXMB1.qlogic.org>

> Finally, I haven't seen it mentioned whether or not QLogic's IB switch
> was part of the purchase.

From the QLogic press release: "QLogic Corp. ... today announced a definitive agreement to sell the product lines ... associated with its InfiniBand business to Intel Corporation ..."

So "the product lines" means both the switch and HCA product lines.

Last summer Intel acquired an Ethernet switch business: http://newsroom.intel.com/community/intel_newsroom/blog/2011/07/19/intel-to-acquire-fulcrum-microsystems so it is not unprecedented that they are interested in switching as well as host technologies.

-Tom

> If it is, then you should note that it's not
> hard to make that chip speak ethernet, and Intel could probably
> dramatically improve it with their superior serdes technology.
>
> -- greg
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca  Fri Jan 27 16:07:08 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 27 Jan 2012 16:07:08 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F2306D3.4080509@scalableinformatics.com>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com>
Message-ID:

>>>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-
>>>> f2q12-results-earnings-call-transcript)
>
> I found that statement interesting. I've actually not known anything
> about their 10GbE products. My bad.

I was a bit surprised that the entire transcript had only one sideways mention of IB. also interesting that they seem quite heavily into the heavily-offloaded adapter market (which is sort of the opposite of the original infinipath stuff.)
>>> I would guess that we will see message passing/RDMA over
>>> Thunderbolt or similar.

has there been any mention of Thunderbolt in a switched context? afaict it's just a weird "let's do faster USB and throw in video" thing.

> Intel buying makes quite a bit of sense IMO. They are in 10GbE silicon
> and NICs, and being in IB silicon and HCAs gives them not only a hedge
> (10GbE while growing rapidly, is not the only high performance network

weird to have redundant/competing parts in many of the same markets though. afaik, intel 10G has a reasonable rep; they presumably won't be junking their own products.

> ...). Its quite likely that Intel would need IB for its PetaScale
> plans.

I can't quite tell whether Qlogic's IB switches use Mellanox chips or not. afaik, Qlogic has their own adapter chips (and perhaps FC/eth).

> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us
> for a while, and switches are still ~150-300ns port to port. At some

mellanox qdr systems I've tested are about 1.6 us half-rtt pingpong. I don't think the switch latency is a big deal, since with 36x fanout, you don't need a very tall fat-tree.

> Curiously, I think this suggests that ScaleMP could be in play on the
> software side

really? I'd be interested in hearing from real people who've actually used it (not marketing, thanks). I don't really understand how ScaleMP can do the required coherency in units smaller than a page, which means that "non-embarrassing" programs will surely notice...

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From tom.elken at qlogic.com  Fri Jan 27 18:24:21 2012
From: tom.elken at qlogic.com (Tom Elken)
Date: Fri, 27 Jan 2012 15:24:21 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com>
Message-ID: <35AAF1E4A771E142979F27B51793A4888885B23AF3@AVEXMB1.qlogic.org>

> I can't quite tell whether Qlogic's IB switches use Mellanox chips or not.

With the QDR generation, QLogic developed its own IB switch chip, and uses it in the 12000 line of switches.

-Tom

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
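On Mark's coherency-granularity point above, here is a toy model of why page-granular sharing hurts; it is purely illustrative and not how vSMP is actually implemented:

    /* Toy model of page-granular coherency: two "nodes" take turns writing
     * adjacent 8-byte counters that happen to share one 4 KB page, so the
     * whole page must migrate before every write. All numbers are made up. */
    #include <stdio.h>

    int main(void) {
        const int PAGE_BYTES = 4096, WRITES = 1000;
        int page_owner = 0;          /* which node currently holds the page */
        long transfers = 0;
        for (int i = 0; i < WRITES; i++) {
            int writer = i % 2;      /* alternating writers, same page */
            if (page_owner != writer) {   /* page migrates before the write */
                transfers++;
                page_owner = writer;
            }
        }
        printf("%d 8-byte writes -> %ld transfers of %d bytes each\n",
               WRITES, transfers, PAGE_BYTES);
        return 0;
    }

With word- or cache-line granularity the same pattern would move a few bytes per write; at page granularity it moves 4 KB every time, which is why "non-embarrassing" programs notice.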
From bill at cse.ucdavis.edu  Fri Jan 27 21:10:02 2012
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Fri, 27 Jan 2012 18:10:02 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net>
Message-ID: <4F2358FA.4030009@cse.ucdavis.edu>

On 01/27/2012 02:25 PM, Gilad Shainer wrote:
> So I wonder why multiple OEMs decided to use Mellanox for on-board
> solutions and no one used the QLogic silicon...

That's a strange argument.

What does Intel want? Something to make them more money. In the past that's been integrating functionality into their CPU or support chipsets: SATA, USB, the memory controller, the PCIe controller, and GigE. The cost in transistors and die area seems very relevant to Intel's interests. Anyone have an estimate on how much latency a direct connect to QPI would save vs PCIe?

What do motherboard manufacturers want? Something to make them more money. So that's mostly marketing/reputation, pricing, and whatever they can do to differentiate themselves. If buying a $150 IB chip lets them charge $400 more, then it's a win, assuming they spend less than $250 of R&D to add it to the motherboard. I doubt the difference in transistors or a few watts would be a big deal either way.

>> Also, keep in mind that Intel's benchmarking group in Moscow has a
>> lot of experience with benchmarking real apps for bids using
>> TrueScale head-to-head
>> against other HCAs, and I wouldn't be surprised if it was the case
>> that TrueScale
>> QDR is faster than that other company's FDR on many real codes,
>
> Surprise surprise... this is no more than FUD. If you have real
> numbers to back it up, please send them. If it was so great, how come more
> people decided to use the Mellanox solutions? If QLogic was doing so
> great with their solution, I would guess they would not be selling the
> IB business...

FUD = Fear, Uncertainty, and Doubt. Doesn't sound like FUD to me. More like a cheap attack on Greg; I think we (the mailing list) can do better.

I've personally compared several generations of Myrinet and Infinipath to allegedly faster Mellanox adapters. Mellanox hasn't won yet, but I've not compared QDR or FDR yet. With that said, that is the reason I run the benchmarks: to find the best solution, and it might well be Mellanox next time. It would be irresponsible for a cluster provider to just pick Mellanox FDR over QLogic QDR because of the spec sheet. Of course, recommending QLogic over Mellanox without quantifying real-world performance would be just as irresponsible.

Maybe we could have fewer attacks, less complaining and hand waving, and more useful information? IMO Greg never came across as a commercial (which the beowulf list isn't an appropriate place for), but does regularly contribute useful info. Arguing market share as proof of performance superiority is just silly.

Speaking of which, you said:

  There is some added latency due to the new 64/66 encoding, but overall
  latency is lower than QDR. MPI is below 1us.

I googled for additional information, looked around the Mellanox website, and couldn't find anything. Is the above number relevant to HPC folks running clusters? Does it involve a switch? If it's not realistic, are there any realistic numbers available?
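For reference, vendor latency headlines of this kind usually come from a two-rank ping-pong; a minimal MPI sketch of that measurement (half the round trip of a small message; real results depend on core placement, switch hops, and how busy the other cores are):

    /* Minimal two-rank ping-pong: run with mpirun -np 2, one rank per node.
     * Reports half the average round-trip time of an 8-byte message. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, iters = 10000;
        char buf[8] = {0};
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double half_rtt_us = (MPI_Wtime() - t0) / iters / 2.0 * 1e6;
        if (rank == 0)
            printf("half round-trip latency: %.2f us\n", half_rtt_us);
        MPI_Finalize();
        return 0;
    }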
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com  Fri Jan 27 21:24:10 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 21:24:10 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120127222723.GB29961@bx9.net>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net>
Message-ID: <4F235C4A.8040409@scalableinformatics.com>

On 01/27/2012 05:27 PM, Greg Lindahl wrote:
> I'm not surprised, as this 10ge adapter is aimed at the same part of
> the market that uses fibre channel, which isn't that common in HPC. It
> doesn't have the kind of TCP offload features which have been
> (futilely) marketed in HPC; it's all about running the same fibre
> channel software most enterprises have run for a long time, but having
> the network be ethernet.

That makes sense.

>> Haven't looked much at FDR or EDR latency. Was it a huge delta (more
>> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us
>> for a while, and switches are still ~150-300ns port to port.
>
> Are you talking about the latency of 1 core on 1 system talking to 1
> core on another system, or the kind of latency that real MPI programs see,
> running on all of the cores on a system and talking to many other
> systems? I assure you that the latter is not 0.8 for any IB system.

I am looking at these things from a "best of all possible cases" scenario. So when someone comes at me with new "best of all possible cases" numbers, I can compare. Sadly this seems to be the state of many OEMs/integrators/manufacturers.

In storage, we see small-form-factor SSDs marketed generally with statements like 50k IOPs and 500 MB/s. They neglect to mention several specific issues with these, such as that the numbers assume writing all zeros, or that the 75k IOPs are sequential IOPs you get by taking the 600 MB/s interface and dividing by 8 kB operations on a sequential read. Actually do a real random read and write and you get very ... very different results. Especially with non-zero (real) data.

>> At some
>> point I think you start hitting a latency floor, bounded in part by "c",
>
> Last time I did the computation, we were 10X that floor. And, of
> course, each increase in bandwidth usually makes latency worse, absent
> heroic efforts of implementers to make that headline latency look
> better.

I think that's the point, though: moving that performance "knee" down to lower latency involves (potentially) significant cost, for a modest return ... in terms of real performance benefit to a code.

Thanks for the pointer on the computation. If we are 1000x off the floor, we can probably come up with a way to do better. At 10x, it's probably much harder than we think and not necessarily worth the effort.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl  Fri Jan 27 21:38:14 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Sat, 28 Jan 2012 03:38:14 +0100
Subject: [Beowulf] Setting up new benchmark
In-Reply-To: <4F235C4A.8040409@scalableinformatics.com>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net> <4F235C4A.8040409@scalableinformatics.com>
Message-ID: <8C9E1983-6805-4951-8DEB-79FA871940F1@xs4all.nl>

No worries - when, by mid-February, all the components from eBay have arrived and I've set up a small cluster here, I hope to write some MPI benchmarks doing all sorts of latency tests, to which I'll attach a GPL header, and which should measure everything from latency to bandwidth, mostly using RDMA reads, with all cores of every node busy. It will be interesting then to compare it all. Maybe several people here will want to run it.

When I first designed the latency benchmark, Paul Hsieh later managed to make the idea's implementation a bit more efficient: I jumped through memory with a random generator; Paul Hsieh optimized it to just jump randomly. Dieter Buerssner then wrote the single-CPU test to check whether it matched the output I got - which appeared to be the case. Setting up the random pattern took very long, though - then I optimized the setup of the random pattern to O(n log n).

The advantage of all this is that one really sees the impact with all cores busy at the same time, whereas most tests use a totally idle cluster and test one micro-tiny thing.

Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From lindahl at pbm.com  Sat Jan 28 00:29:36 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri, 27 Jan 2012 21:29:36 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F2358FA.4030009@cse.ucdavis.edu>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: <20120128052936.GF20008@bx9.net>

On Fri, Jan 27, 2012 at 06:10:02PM -0800, Bill Broadley wrote:

> Anyone have an estimate on how much latency a direct connect to QPI
> would save vs PCIe?

~ 0.2us. Remember that the first 2 generations of InfiniPath were both SDR: one for HyperTransport and one for PCIe. The difference was 0.3us back then; PathScale + QLogic did some heroic things since to shorten the pipeline stages & up the clock rate.

-- greg

(and if anyone needs a reminder, I no longer have any financial involvement with QLogic or Intel.)
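A single-node sketch of the random pointer-chase Vincent describes a few messages up; this uses a plain Fisher-Yates shuffle for setup rather than his O(n log n) construction, and the array size is an assumption:

    /* Build a random cycle through a big array, then time dependent loads.
     * Each load must finish before the next can start, so this measures
     * latency, not bandwidth. rand() is crude but fine for a sketch. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        size_t n = 1 << 24;                        /* 16M entries, ~128 MB per array */
        size_t *next = malloc(n * sizeof *next);
        size_t *perm = malloc(n * sizeof *perm);
        if (!next || !perm) return 1;
        for (size_t i = 0; i < n; i++) perm[i] = i;
        for (size_t i = n - 1; i > 0; i--) {       /* random permutation */
            size_t j = rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n; i++)             /* one big cycle */
            next[perm[i]] = perm[(i + 1) % n];
        struct timespec a, b;
        size_t p = perm[0];
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (size_t i = 0; i < n; i++) p = next[p];  /* dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &b);
        double ns = ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / n;
        printf("avg load-to-load latency: %.1f ns (p=%zu)\n", ns, p);
        free(next); free(perm);
        return 0;
    }

Running one copy per core while the NIC is being hammered, as Vincent proposes, is what turns this from a memory test into a loaded-cluster test.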
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From lindahl at pbm.com  Sat Jan 28 00:34:17 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri, 27 Jan 2012 21:34:17 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F235C4A.8040409@scalableinformatics.com>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net> <4F235C4A.8040409@scalableinformatics.com>
Message-ID: <20120128053417.GG20008@bx9.net>

On Fri, Jan 27, 2012 at 09:24:10PM -0500, Joe Landman wrote:

> > Are you talking about the latency of 1 core on 1 system talking to 1
> > core on another system, or the kind of latency that real MPI programs see,
> > running on all of the cores on a system and talking to many other
> > systems? I assure you that the latter is not 0.8 for any IB system.
>
> I am looking at these things from a "best of all possible cases"
> scenario. So when someone comes at me with new "best of all possible
> cases" numbers, I can compare. Sadly this seems to be the state of many
> OEMs/integrators/manufacturers.

The point I've been trying to make for the past 8 years is that one of the two chip families you're looking at doesn't degrade as much as the other from the "best of all possible cases" to a real cluster running a real code.

> In storage, we see small-form-factor SSDs marketed generally with
> statements like 50k IOPs and 500 MB/s.

And if you knew that one family of SSDs had a wildly different ratio of peak alleged perf to real application performance, would you ignore that? I suspect not.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org  Sat Jan 28 05:17:32 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Sat, 28 Jan 2012 11:17:32 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F22ED20.7040105@ias.edu>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu>
Message-ID: <20120128101732.GG7343@leitl.org>

On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote:

> What it says is that we've given up on discussing technology with you,
> because your arguments are completely nonsensical. Since you clearly
> don't understand technology, we're hoping you can at least understand
> the simple concepts of basic etiquette.

Who's the list moderator, by the way?
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org  Sat Jan 28 08:32:26 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Sat, 28 Jan 2012 14:32:26 +0100
Subject: [Beowulf] photonic buffer bloat
Message-ID: <20120128133226.GU7343@leitl.org>

Relevant for future clusters; see the PPT presentation linked at the URL below.

----- Forwarded message from Masataka Ohta -----

From: Masataka Ohta
Date: Sat, 28 Jan 2012 21:42:13 +0900
To: nanog at nanog.org
Subject: Re: photonic buffer bloat
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20111222 Thunderbird/9.0.1

Eugen Leitl wrote:

> In future photonic networks (which will do relativistic cut-through
> directly in a photonic crossbar without converting photons to electrons
> and back) the fiber is not just a transport channel but also a photonic
> buffer

Yes.

> (e.g. at 10 GBit/s Ethernet a short reach fiber already buffers
> a standard 1500 MTU).

Wrong. 10Gbps is too slow for optical buffering. At 1Tbps, you can use 100 times shorter fiber than at 10Gbps to buffer packets. A 1Tbps packet can be constructed by simultaneously encoding 100 wavelengths at 10Gbps.

> Of course photonic gates are expensive, individual delays do add up
> so even with slow light buffers

Don't try to make light slower. Slow light buffers have resonators, which means they have very, very, very narrow bandwidth. Instead, make communication speed faster, which shortens the fiber length of fiber delay-line buffers.

> or optical delay loops taken into consideration
> current TCP/IP header layout has not been optimized for leading edge
> containing most significant switching/routing information, or even
> local-knowledge routing (with no global routes). It's too bad IPv6
> was not radical enough, so today's legacy protocols have to be tunneled
> through the networks of the future.

Considering that, in practice, packet headers must be processed electrically, IPv4 at the photonic backbone is just fine, if most routing table entries are aggregated at /24 or better, which is the current practice. You only have to read a 16M-entry SRAM.

A problem of IPv6 with 128-bit addresses is that route lookup cannot be performed within a constant time of a few nanoseconds, which means packets will have overrun the fiber delay lines.

> I presume this future is some 20-30 years away still.

Not so much. Moore's law requires a much more rapid bandwidth increase.

My slides presented at the IEEE photonics society 2009 summer topical

ftp://chacha.hpcl.titech.ac.jp/IEEE-ST.ppt

might be interesting for you.
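The fiber-as-buffer arithmetic above works out roughly as follows, assuming ~2e8 m/s propagation in glass:

    /* How many metres of fiber one 1500-byte frame occupies in flight,
     * at several line rates. Propagation speed is an assumed value. */
    #include <stdio.h>

    int main(void) {
        double bits = 1500 * 8.0;   /* one MTU */
        double v = 2.0e8;           /* m/s in fiber, assumed */
        double rates[] = { 10e9, 100e9, 1e12 };
        for (int i = 0; i < 3; i++) {
            double s = bits / rates[i];
            printf("%5.0f Gbps: %.2f us on the wire = %.1f m of fiber\n",
                   rates[i] / 1e9, s * 1e6, s * v);
        }
        return 0;
    }

The 10 Gbps and 1 Tbps rows differ by exactly the factor of 100 in fiber length claimed above: ~240 m per frame at 10 Gbps versus ~2.4 m at 1 Tbps.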
Masataka Ohta

----- End forwarded message -----
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Shainer at Mellanox.com  Sat Jan 28 13:21:59 2012
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Sat, 28 Jan 2012 18:21:59 +0000
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F2358FA.4030009@cse.ucdavis.edu>
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID:

> > So I wonder why multiple OEMs decided to use Mellanox for on-board
> > solutions and no one used the QLogic silicon...
>
> That's a strange argument.

It is not an argument, it is stating a fact. If someone claims that a product provides 10x better performance, best fit, etc., and on the other side it gets very little traction, something does not make sense.

> What does Intel want? Something to make them more money.

Intel explained their move in their PR. They see lots of growth in HPC, definitely in the Exascale, and they see InfiniBand as a key to deliver the right solution. They also mention InfiniBand adoption in other markets, so a good validation for InfiniBand as a leading solution for any server and storage connectivity.

> >> Also, keep in mind that Intel's benchmarking group in Moscow has a
> >> lot of experience with benchmarking real apps for bids using
> >> TrueScale head-to-head
> >> against other HCAs, and I wouldn't be surprised if it was the case
> >> that TrueScale
> >> QDR is faster than that other company's FDR on many real codes,
> >
> > Surprise surprise... this is no more than FUD. If you have real
> > numbers to back it up, please send them. If it was so great, how come more
> > people decided to use the Mellanox solutions? If QLogic was doing so
> > great with their solution, I would guess they would not be selling the
> > IB business...
>
> FUD = Fear, Uncertainty, and Doubt. Doesn't sound like FUD to me.
> More like a cheap attack on Greg; I think we (the mailing list) can do better.

I never saw any genuine testing from PathScale and then QLogic comparing their stuff to Mellanox, and you are more than welcome to try and prove me wrong. The argument in this email thread is no more than a re-cap of QLogic's latest marketing campaign and yes, it is no more than FUD. Cheap attacks are not my game, so please....

> I've personally compared several generations of Myrinet and Infinipath to
> allegedly faster Mellanox adapters. Mellanox hasn't won yet, but I've not
> compared QDR or FDR yet. With that said, that is the reason I run the
> benchmarks: to find the best solution, and it might well be Mellanox next
> time. It would be irresponsible for a cluster provider to just pick Mellanox FDR
> over QLogic QDR because of the spec sheet.
> Of course, recommending QLogic over Mellanox without quantifying real-world
> performance would be just as irresponsible.

Going into a bit more of a technical discussion...
QLogic's way of networking is to do everything in the CPU, and Mellanox's way is to implement it all in the hardware (we all know that). The second option is a superset, therefore the worst case can be even performance. I encourage you to contact me directly for any application benchmarking you do, and I will be happy to provide you feedback on what you need in order to get the best out of the Mellanox products. That can be QDR vs QDR as well, no need to go to FDR - I am open to the competition any time...

> Maybe we could have fewer attacks, less complaining and hand waving, and
> more useful information? IMO Greg never came across as a commercial
> (which the beowulf list isn't an appropriate place for), but does regularly contribute
> useful info. Arguing market share as proof of performance superiority is just
> silly.

I am not sure about that... a quick search in past emails can show amazing things... I believe most of us are in agreement here. Less FUD, more facts.

> Speaking of which, you said:
> There is some added latency due to the new 64/66 encoding, but overall
> latency is lower than QDR. MPI is below 1us.
>
> I googled for additional information, looked around the Mellanox website, and
> couldn't find anything. Is the above number relevant to
> HPC folks running clusters? Does it involve a switch? If not

It is with a switch

-Gilad

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org  Sat Jan 28 13:41:56 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Sat, 28 Jan 2012 19:41:56 +0100
Subject: [Beowulf] What It'll Take to Go Exascale
Message-ID: <20120128184156.GB7343@leitl.org>

http://www.sciencemag.org/content/335/6067/394.full

Science 27 January 2012: Vol. 335 no. 6067 pp. 394-396
DOI: 10.1126/science.335.6067.394

Computer Science

What It'll Take to Go Exascale

Robert F. Service

Scientists hope the next generation of supercomputers will carry out a million trillion operations per second. But first they must change the way the machines are built and run.

On fire. More powerful supercomputers now in the design stage should make modeling turbulent gas flames more accurate and revolutionize engine designs. CREDIT: J. CHEN/CENTER FOR EXASCALE SIMULATION OF COMBUSTION IN TURBULENCE, SANDIA NATIONAL LABORATORIES

Using real climate data, scientists at Lawrence Berkeley National Laboratory (LBNL) in California recently ran a simulation on one of the world's most powerful supercomputers that replicated the number of tropical storms and hurricanes that had occurred over the past 30 years. Its accuracy was a landmark for computer modeling of global climate. But Michael Wehner and his LBNL colleagues have their eyes on a much bigger prize: understanding whether an increase in cloud cover from rising temperatures would retard climate change by reflecting more light back into space, or accelerate it by trapping additional heat close to Earth. To succeed, Wehner must be able to model individual cloud systems on a global scale.
To do that, he will need supercomputers more powerful than any yet designed. These so-called exascale computers would be capable of carrying out 10^18 floating point operations per second, or an exaflop. That's nearly 100 times more powerful than today's biggest supercomputer, Japan's "K Computer," which achieves 11.3 petaflops (10^15 flops) (see graph), and 1000 times faster than the Hopper supercomputer used by Wehner and his colleagues.

The United States now appears poised to reach for the exascale, as do China, Japan, Russia, India, and the European Union. It won't be easy. Advances in supercomputers have come at a steady pace over the past 20 years, enabled by the continual improvement in computer chip manufacturing. But this evolutionary approach won't cut it in getting to the exascale. Instead, computer scientists must first figure out ways to make future machines far more energy efficient and tolerant of errors, and find novel ways to program them. "The step we are about to take to exascale computing will be very, very difficult," says Robert Rosner, a physicist at the University of Chicago in Illinois, who chaired a recent Department of Energy (DOE) committee charged with exploring whether exascale computers would be achievable. Charles Shank, a former director of LBNL who recently headed a separate panel collecting widespread views on what it would take to build an exascale machine, agrees. "Nobody said it would be impossible," Shank says. "But there are significant unknowns."

Gaining support

The next generation of powerful supercomputers will be used to design high-efficiency engines tailored to burn biofuels, reveal the causes of supernova explosions, track the atomic workings of catalysts in real time, and study how persistent radiation damage might affect the metal casing surrounding nuclear weapons. "It's a technology that has become critically important for many scientific disciplines," says Horst Simon, LBNL's deputy director.

That versatility has made supercomputing an easy sell to politicians. The massive 2012 spending bill approved last month by Congress contained $1.06 billion for DOE's program in advanced computing, which includes a down payment to bring online the world's first exascale computer. Congress didn't specify exactly how much money should be spent on the exascale initiative, for which DOE had requested $126 million. But it asked for a detailed plan, due next month, with multiyear budget breakdowns listing who is expected to do what, when. Those familiar with the ways of Washington say that the request reflects an unusual bipartisan consensus on the importance of the initiative. "In today's political atmosphere, this is very unusual," says Jack Dongarra, a computer scientist at the University of Tennessee, Knoxville, who closely follows national and international high-performance computing trends. "It shows how critical it really is and the threat perceived of the U.S. losing its dominance in the field."

The threat is real: Japan and China have built and operate the three most powerful supercomputers in the world. The rest of the world also hopes that their efforts will make them less dependent on U.S. technology. Of today's top 500 supercomputers, the vast majority were built using processors from Intel, Advanced Micro Devices (AMD), and NVIDIA, all U.S.-based companies. But that's beginning to change, at least at the top. Japan's K machine is built using specially designed processors from Fujitsu, a Japanese company.
China, which had no supercomputers in the Top500 List in 2000, now has five petascale machines and is building another with processors made by a Chinese company. And an E.U. research effort plans to use ARM processing chips made by a U.K. company.

Getting over the bumps

Although bigger and faster, supercomputers aren't fundamentally different from our desktops and laptops, all of which rely on the same sorts of specialized components. Computer processors serve as the brains that carry out logical functions, such as adding two numbers together or sending a bit of data to a location where it is needed. Memory chips, by contrast, hold data for safekeeping for later use. A network of wires connects processors and memory and allows data to flow where and when they are needed.

For decades, the primary way of improving computers was creating chips with ever smaller and faster circuitry. This increased the processor's frequency, allowing it to churn through tasks at a faster clip. Through the 1990s, chipmakers steadily boosted the frequency of chips. But the improvements came at a price: The power demanded by a processor is proportional to its frequency cubed. So doubling a processor's frequency requires an eightfold increase in power.

New king. Japan has the fastest machine (bar), although the United States still has the most petascale computers (number in parentheses). CREDIT: ADAPTED FROM JACK DONGARRA/TOP 500 LIST/UNIVERSITY OF TENNESSEE

On the rise. The gap in available supercomputing capacity between the United States and the rest of the world has narrowed, with China gaining the most ground. CREDIT: ADAPTED FROM JACK DONGARRA/TOP 500 LIST/UNIVERSITY OF TENNESSEE

With the rise of mobile computing, chipmakers couldn't raise power demands beyond what batteries could store. So about 10 years ago, chip manufacturers began placing multiple processing "cores" side by side on single chips. This arrangement meant that only twice the power was needed to double a chip's performance. This trend swept through the world of supercomputers. Those with single souped-up processors gave way to today's "parallel" machines that couple vast numbers of off-the-shelf commercial processors together. This move to parallel computing "was a huge, disruptive change," says Robert Lucas, an electrical engineer at the University of Southern California's Information Sciences Institute in Los Angeles. Hardware makers and software designers had to learn how to split problems apart, send individual pieces to different processors, synchronize the results, and synthesize the final ensemble.

Today's top machine - Japan's "K Computer" - has 705,000 cores. If the trend continues, an exascale computer would have between 100 million and 1 billion processors. But simply scaling up today's models won't work. "Business as usual will not get us to the exascale," Simon says. "These computers are becoming so complicated that a number of issues have come up that were not there before," Rosner agrees.

The biggest issue relates to a supercomputer's overall power use. The largest supercomputers today use about 10 megawatts (MW) of power, enough to power 10,000 homes. If the current trend of power use continues, an exascale supercomputer would require 200 MW. "It would take a nuclear power reactor to run it," Shank says. Even if that much power were available, the cost would be prohibitive. At $1 million per megawatt per year, the electricity to run an exascale machine would cost $200 million annually. "That's a non-starter," Shank says.
So the current target is a machine that draws 20 MW at most. Even that goal will require a 300-fold improvement in flops per watt over today's technology. Ideas for getting to these low-power chips are already circulating. One would make use of different types of specialized cores. Today's top-of-the-line supercomputers already combine conventional processor chips, known as CPUs, with an alternative version called graphical processing units (GPUs), which are very fast at certain types of calculations. Chip manufacturers are now looking at going from "multicore" chips with four or eight cores to "many-core" chips, each containing potentially hundreds of CPU and GPU cores, allowing them to assign different calculations to specialized processors. That change is expected to make the overall chips more energy efficient. Intel, AMD, and other chip manufacturers have already announced plans to make hybrid many-core chips.

Another stumbling block is memory. As the number of processors in a supercomputer skyrockets, so, too, does the need to add memory to feed bits of data to the processors. Yet, over the next few years, memory manufacturers are not projected to increase the storage density of their chips fast enough to keep up with the performance gains of processors. Supercomputer makers can get around this by adding additional memory modules. But that's threatening to drive costs too high, Simon says.

Even if researchers could afford to add more memory modules, that still won't solve matters. Moving ever-growing streams of data back and forth to processors is already creating a backup for processors that can dramatically slow a computer's performance. Today's supercomputers use 70% of their power to move bits of data around from one place to another. One potential solution would stack memory chips on top of one another and run communication and power lines vertically through the stack. This more-compact architecture would require fewer steps to route data. Another approach would stack memory chips atop processors to minimize the distance bits need to travel.

A third issue is errors. Modern processors compute with stunning accuracy, but they aren't perfect. The average processor will produce one error per year, as a thermal fluctuation or a random electrical spike flips a bit of data from one value to another. Such errors are relatively easy to ferret out when the number of processors is low. But it gets much harder when 100 million to 1 billion processors are involved. And increasing complexity produces additional software errors as well. One possible solution is to have the supercomputer crunch different problems multiple times and "vote" for the most common solution. But that creates a new problem. "How can I do this without wasting double or triple the resources?" Lucas asks. "Solving this problem will probably require new circuit designs and algorithms."

Finally, there is the challenge of redesigning the software applications themselves, such as a novel climate model or a simulation of a chemical reaction. "Even if we can produce a machine with 1 billion processors, it's not clear that we can write software to use it efficiently," Lucas says. Current parallel computing machines use a strategy, known as message passing interface, that divides computational problems and parses out the pieces to individual processors, then collects the results. But coordinating all this traffic for millions of processors is becoming a programming nightmare.
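In code, the divide-and-collect strategy the article describes is the bread and butter of MPI programs; a toy sketch, with sizes and data chosen purely for illustration:

    /* Divide a vector across ranks, compute local sums, collect the total. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        const int chunk = 1000;
        double *data = NULL, local[1000], partial = 0, total = 0;
        if (rank == 0) {                       /* root owns the full problem */
            data = malloc(chunk * size * sizeof *data);
            for (int i = 0; i < chunk * size; i++) data[i] = 1.0;
        }
        MPI_Scatter(data, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);        /* divide and distribute */
        for (int i = 0; i < chunk; i++) partial += local[i];
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);            /* collect the results */
        if (rank == 0) { printf("sum = %g\n", total); free(data); }
        MPI_Finalize();
        return 0;
    }

At a handful of ranks this is trivial; the article's point is that coordinating the same pattern across hundreds of millions of processors is anything but.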
"There's a huge concern that the programming paradigm will have to change," Rosner says.

DOE has already begun laying the groundwork to tackle these and other challenges. Last year it began funding three "co-design" centers, multi-institution cooperatives led by researchers at Los Alamos, Argonne, and Sandia national laboratories. The centers bring together scientific users who write the software code and hardware makers to design complex software and computer architectures that work in the fastest and most energy-efficient manner. It poses a potential clash between scientists who favor openness and hardware companies that normally keep their activities secret for proprietary reasons. "But it's a worthy goal," agrees Wilfred Pinfold, Intel's director of extreme-scale programming in Hillsboro, Oregon.

Not so fast. Researchers have some ideas on how to overcome barriers to building exascale machines.

Coming up with the cash

Solving these challenges will take money, and lots of it. Two years ago, Simon says, DOE officials estimated that creating an exascale computer would cost $3 billion to $4 billion over 10 years. That amount would pay for one exascale computer for classified defense work, one for nonclassified work, and two 100-petaflops machines to work out some of the technology along the way.

Those projections assumed that Congress would deliver a promised 10-year doubling of the budget of DOE's Office of Science. But those assumptions are "out of the window," Simon says, replaced by the more likely scenario of budget cuts as Congress tries to reduce overall federal spending. Given that bleak fiscal picture, DOE officials must decide how aggressively they want to pursue an exascale computer. "What's the right balance of being aggressive to maintain a leadership position and having the plan sent back to the drawing board by [the Office of Management and Budget]?" Simon asks. "I'm curious to see." DOE's strategic plan, due out next month, should provide some answers.

The rest of the world faces a similar juggling act. China, Japan, the European Union, Russia, and India all have given indications that they hope to build an exascale computer within the next decade. Although none has released detailed plans, each will need to find the necessary resources despite these tight fiscal times.

The victor will reap more than scientific glory. Companies use 57% of the computing time on the machines on the Top500 List, looking to speed product design and gain other competitive advantages, Dongarra says. So government officials see exascale computing as giving their industries a leg up. That's particularly true for chip companies that plan to use exascale designs to improve future commodity electronics. "It will have dividends all the way down to the laptop," says Peter Beckman, who directs the Exascale Technology and Computing Initiative at Argonne National Laboratory in Illinois.

The race to provide the hardware needed for exascale computing "will be extremely competitive," Beckman predicts, and developing software and networking technology will be equally important, according to Dongarra. Even so, many observers think that the U.S. track record and the current alignment of its political and scientific forces makes it America's race to lose. Whatever happens, U.S. scientists are unlikely to be blindsided.
The task of building the world's first exascale computer is so complex, Simon says, that it will be nearly impossible for a potential winner to hide in the shadows and come out of nowhere to claim the prize. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Sat Jan 28 14:26:48 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 28 Jan 2012 14:26:48 -0500 (EST) Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <20120128101732.GG7343@leitl.org> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> Message-ID: >> the simple concepts of basic etiquette. > > Who's the list moderator, by the way? no, please - if there were a moderator who had to plow through all messages, no matter how long, meandering and low-worth, it would become a very unpleasant chore... the list doesn't get a lot of passing weirdos - pretty stable set of characters, fairly predictable in how much you want to read their messages, and how much good you expect to gain from them ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Sat Jan 28 16:28:09 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 28 Jan 2012 16:28:09 -0500 (EST) Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: >>> So I wonder why multiple OEMs decided to use Mellanox for on-board >>> solutions and no one used the QLogic silicon... >> >> That's a strange argument. > > It is not an argument, it is stating a fact. you are mistaken. you ask a pointed question - do not construe it as a statement of fact. if you wanted to state a fact, you might say: "multiple OEMs decided to use Mellanox and none have used Qlogic". by stating this, you are implying that Mellanox is superior in some way, though another perfectly adequate explanation could be that Qlogic didn't offer their chips to OEMs, or did so at a higher price. (in fact, the latter would suggest the possibility that Qlogic chips are actually worth more.) note my use of subjunctive here. in reality, Mellanox is the easy choice - widely known and used, the default. OEMs are fond of making easy choices: more comfortable to a lazy customer, possibly lower customer support costs, etc. this says nothing about whether an easy choice is a superior solution to the customer (that is, in performance, price, etc). 
> If someone claims that a product provides 10x better performance, best fit
> etc., and from the other side it has very little attraction, something does
> not make sense.

I saw no 10x performance claim here. there was some casual mention of a situation where Qlogic QDR performs similar to Mellanox FDR.

> good validation for InfiniBand as a leading solution for any server and
> storage connectivity.

besides Lustre, where do you see IB used for storage?

> Going into a bit more of a technical discussion... QLogic way of networking
> is doing everything in the CPU, and Mellanox way is to implement it all in
> the hardware (we all know that).

this is a dishonest statement: you know that QLogic isn't actually trying to do *everything* in the CPU.

> The second option is a superset, therefore
> worst case can be even performance.

this is also dishonest: making the adapter more intelligent clearly introduces some tradeoffs, so it's _not_ a superset. unless you are claiming that within every Mellanox adapter is _literally_ the same functionality, at the same performance, as is in a Qlogic adapter.

>> Maybe we could have a few less attacks, complaining and hand waving and
>> more useful information? IMO Greg never came across as a commercial
>> (which beowulf list isn't an appropriate place for), but does regularly contribute
>> useful info. Arguing market share as proof of performance superiority is just
>> silly.

> I am not sure about that... quick search in past emails can show amazing things...
> I believe most of us are in agreement here. Less FUD, more facts.

"facts" in this context (as opposed to FUD, arm-waving, etc) must be dispassionate and quantifiable. not hyperbole and suggestive rhetoric.

out of curiosity, has anyone set up a head-to-head comparison (two or more identical machines, both with a Qlogic and a Mellanox card of the same vintage)?

regards, mark hahn.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Sat Jan 28 19:12:59 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Sun, 29 Jan 2012 01:12:59 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: 

On Jan 28, 2012, at 10:28 PM, Mark Hahn wrote:
[snip]
> out of curiosity, has anyone set up a head-to-head comparison
> (two or more identical machines, both with a Qlogic and a Mellanox
> card of
> the same vintage)?
>
> regards, mark hahn.

Mark, i stumbled upon the same problem a few months ago: when i googled for 4x infiniband you can find something, but moving up to QDR it becomes more sporadic. Not to mention that the interesting test is where the cards are bad - latency. If you find anything, usually it's manufacturer-side statements without a clear test setup, and usually doing 0 byte tests. This is exactly why i intend to write a benchmark. What i personally believe is not important - whether FDR on pci-e 3.0 really delivers its considerably higher claimed bandwidth over pci-e 2.0 QDR. What i do believe is that one must measure objectively.
That's why i'm posting for a while now that as soon as the cluster works here i'm gonna write a benchmark to measure latencies, moving up the read length slowly so that it more and more becomes a bandwidth game, and simply present the graph for the interested readers.

We're not interested in theoretic tests where just 1 core is busy measuring the latency to a single core at the other side. A test really requires all cores busy and hammering on the network card.

In the end everything is a measure of bandwidth of course, but even then, objective tests of QDR, no matter *what manufacturer*, are in short supply; of the few that exist, some either tested 1 tiny thing or a theoretic thing, or just lacked all realism when i read the rest of the article.

All in all, after some days of googling, I found 1 tester who toyed with something using the same switch (good idea), but the graphs presenting the results are tough to interpret, and he was basically interested in something else than what's fast now for the network cards.

Running the same oldie tests, whereas all manufacturers have way faster alternatives now, such as RDMA reads, is just not interesting.

To be continued in some months...

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Shainer at Mellanox.com Sun Jan 29 00:03:31 2012
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Sun, 29 Jan 2012 05:03:31 +0000
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: 

> >>> So I wonder why multiple OEMs decided to use Mellanox for on-board
> >>> solutions and no one used the QLogic silicon...
> >>
> >> That's a strange argument.
> >
> > It is not an argument, it is stating a fact.
>
> you are mistaken. you ask a pointed question - do not construe it as a
> statement of fact. if you wanted to state a fact, you might say:
> "multiple OEMs decided to use Mellanox and none have used Qlogic".

You probably meant to say "I think differently" and not "you are mistaken".... Making this mailing list a little more polite will benefit us all.

> by stating this, you are implying that Mellanox is superior in some way, though
> another perfectly adequate explanation could be that Qlogic didn't offer their
> chips to OEMs, or did so at a higher price. (in fact, the latter would suggest the
> possibility that Qlogic chips are actually worth more.) note my use of
> subjunctive here.
>
> in reality, Mellanox is the easy choice - widely known and used, the default.
> OEMs are fond of making easy choices: more comfortable to a lazy customer,
> possibly lower customer support costs, etc.
>
> this says nothing about whether an easy choice is a superior solution to the
> customer (that is, in performance, price, etc).
OEMs don't place devices on the motherboard just because they can, nor because it is cheaper. They do so because they believe it will benefit their users, hence they will sell more. I can assure you that silicon was offered from both companies, and it wasn't an issue of price. From this point you can make any conclusion that you wish to.

> >good validation for InfiniBand as a leading solution for any server and
> >storage connectivity.
>
> besides Lustre, where do you see IB used for storage?

Protocols: iSER (iSCSI), NFSoRDMA, SRP, GPFS, SMB and others.
OEMs: DDN, Xyratex, Netapp, EMC, Oracle, SGI, HP, IBM and others.

> > Going into a bit more of a technical discussion... QLogic way of networking
> >is doing everything in the CPU, and Mellanox way is to implement it all in
> >the hardware (we all know that).
>
> this is a dishonest statement: you know that QLogic isn't actually trying
> to do *everything* in the CPU.

You are right, you do need a HW translation from PCIe to IB. But I am sure you know where the majority of the transport, error handling etc is being done....

> > The second option is a superset, therefore
> >worst case can be even performance.
>
> this is also dishonest: making the adapter more intelligent clearly
> introduces some tradeoffs, so it's _not_ a superset. unless you are
> claiming that within every Mellanox adapter is _literally_ the same
> functionality, at the same performance, as is in a Qlogic adapter.

It is not dishonest. In general offloading is a superset. You can choose to implement just offloading or to leave room for CPU control as well. There will always be parts that are better to be in HW, and if you have flexibility for the rest it is a superset.

> >> Maybe we could have a few less attacks, complaining and hand waving and
> >> more useful information? IMO Greg never came across as a commercial
> >> (which beowulf list isn't an appropriate place for), but does regularly
> contribute
> >> useful info. Arguing market share as proof of performance superiority is
> just
> >> silly.
> >
> > I am not sure about that... quick search in past emails can show amazing
> things...
> > I believe most of us are in agreement here. Less FUD, more facts.
>
> "facts" in this context (as opposed to FUD, arm-waving, etc) must be
> dispassionate and quantifiable. not hyperbole and suggestive rhetoric.

Maybe we read different emails.

> out of curiosity, has anyone set up a head-to-head comparison
> (two or more identical machines, both with a Qlogic and a Mellanox card of
> the same vintage)?
>
> regards, mark hahn.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Mon Jan 30 10:04:53 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 30 Jan 2012 10:04:53 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: 

>> out of curiosity, has anyone set up a head-to-head comparison
>> (two or more identical machines, both with a Qlogic and a Mellanox card of
>> the same vintage)?
>>
>> There was a bit of discussion of InfiniBand benchmarking in this thread
> and it seems it would be helpful to the casual readers like myself to have
> a few references to benchmarking toolkits and actual results.
>
> Most often reported results are gathered with either Netpipe from Ames or
> Intel MPI Benchmark (formerly known as Pallas Benchmark) or OSU
> Micro-benchmarks.
>
> Searching the web produced a recent report from Swiss CSCS where a Mellanox
> ConnectX3 QDR HCA with a Mellanox switch is set against a Qlogic 7300 QDR
> HCA connected to a Qlogic switch.
> http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/Performance_Analysis_IB-QDR_final-2.pdf

as far as I can tell, this paper mainly says "a coalescing stack delivers benchmark results showing a lot higher bandwidth and message rate than a non-coalescing stack." the comment on figure 8:

    To some extent, the environment variables mentioned before
    contribute to this outstanding result

which is remarkably droll. I'm not sure how well coalescing works for real applications.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From prentice at ias.edu Mon Jan 30 11:20:46 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 30 Jan 2012 11:20:46 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <20120128101732.GG7343@leitl.org>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org>
Message-ID: <4F26C35E.7060702@ias.edu>

On 01/28/2012 05:17 AM, Eugen Leitl wrote:
> On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote:
>
>> What it says is that we've given up on discussing technology with you,
>> because your arguments are completely nonsensical. Since you clearly
>> don't understand technology, we're hoping you can at least understand
>> the simple concepts of basic etiquette.
> Who's the list moderator, by the way?
>

I don't think there is one, hence all the noise. The mailing list and beowulf.org is maintained by Penguin Computing/Scyld Software. Maybe they'd be interested in appointing a moderator or 3.

---
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
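A footnote to the benchmarking sub-thread above: for anyone who wants to try the head-to-head comparison Mark asks about, or the message-size sweep Vincent describes, the C sketch below is about the smallest thing that produces a latency/bandwidth curve. It is only a sketch, not a replacement for the OSU Micro-benchmarks or the Intel MPI Benchmarks already mentioned; the hostnames in the run line are placeholders, and it assumes a working MPI stack over both adapters under test.

/*
 * pingpong.c - minimal MPI ping-pong message-size sweep (a sketch).
 * Build: mpicc -O2 pingpong.c -o pingpong
 * Run:   mpirun -np 2 -host nodeA,nodeB ./pingpong   (hostnames are placeholders)
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BYTES (1 << 22)   /* sweep 1 byte .. 4 MiB */
#define REPS      1000

int main(int argc, char **argv)
{
    int rank, bytes, i;
    char *buf;
    double t0, dt;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MAX_BYTES);

    for (bytes = 1; bytes <= MAX_BYTES; bytes *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {          /* ping */
                MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {   /* pong */
                MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        dt = MPI_Wtime() - t0;
        if (rank == 0)                /* half a round trip is one-way time */
            printf("%8d bytes %10.2f us one-way %10.2f MB/s\n",
                   bytes, dt * 1e6 / (2.0 * REPS),
                   2.0 * REPS * bytes / dt / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

Small messages show pure latency and large ones approach the wire bandwidth, so the sweep makes the latency-dominated regime turn into a bandwidth-dominated one, which is exactly the curve Vincent wants to plot. Running several such pairs per node at once (more ranks, paired off) would approximate the all-cores-hammering load he argues for, at the cost of more bookkeeping.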
From Shainer at Mellanox.com Mon Jan 30 14:22:24 2012
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Mon, 30 Jan 2012 19:22:24 +0000
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: 

> >> out of curiosity, has anyone set up a head-to-head comparison (two or
> >> more identical machines, both with a Qlogic and a Mellanox card of
> >> the same vintage)?
> >>
> >> There was a bit of discussion of InfiniBand benchmarking in this
> >> thread
> > and it seems it would be helpful to the casual readers like myself to
> > have a few references to benchmarking toolkits and actual results.
> >
> > Most often reported results are gathered with either Netpipe from Ames
> > or Intel MPI Benchmark (formerly known as Pallas Benchmark) or OSU
> > Micro-benchmarks.
> >
> > Searching the web produced a recent report from Swiss CSCS where a
> > Mellanox
> > ConnectX3 QDR HCA with a Mellanox switch is set against a Qlogic 7300
> > QDR HCA connected to a Qlogic switch.
> > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/P
> > erformance_Analysis_IB-QDR_final-2.pdf
>
> as far as I can tell, this paper mainly says "a coalescing stack delivers
> benchmark results showing a lot higher bandwidth and message rate than a
> non-coalescing stack." the comment on figure 8:
>
>     To some extent, the environment variables mentioned before
>     contribute to this outstanding result
>
> which is remarkably droll. I'm not sure how well coalescing works for real
> applications.

First, I looked at the paper and it includes latency and bandwidth comparisons as well, not only message rate. It is important for others to know that, and not to dismiss it. Second, both companies have options for message coalescing. You can choose to use it or not - I saw apps that got a benefit from it, and saw applications that do not. Without coalescing Mellanox provides around 30M messages per second.

-Gilad.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From peter.st.john at gmail.com Mon Jan 30 18:07:11 2012
From: peter.st.john at gmail.com (Peter St. John)
Date: Mon, 30 Jan 2012 18:07:11 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F26C35E.7060702@ias.edu>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID: 

Instead of appointing a moderator, we could grow one with recursive Page Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew about this type of thing a while ago because of "citation analysis", see the link).

Someone writes an open script and members of the list mail it with the answers to these three questions:
1. do you volunteer to moderate?
2. Who should moderate? (give email addresses)
3.
Who should judge who should moderate? (give email addresses).

Then you iterate over scoring people by "wisdom" and who gets the most "wise" votes, until the scores converge.

The biggest hurdle would probably be getting volunteers, though.
Peter

On Mon, Jan 30, 2012 at 11:20 AM, Prentice Bisbal wrote:

> On 01/28/2012 05:17 AM, Eugen Leitl wrote:
> > On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote:
> >
> >> What it says is that we've given up on discussing technology with you,
> >> because your arguments are completely nonsensical. Since you clearly
> >> don't understand technology, we're hoping you can at least understand
> >> the simple concepts of basic etiquette.
> > Who's the list moderator, by the way?
> >
>
> I don't think there is one, hence all the noise. The mailing list and
> beowulf.org is maintained by Penguin Computing/Scyld Software. Maybe
> they'd be interested in appointing a moderator or 3.
>
> ---
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Mon Jan 30 18:09:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 30 Jan 2012 18:09:48 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID: <4F27233C.8080508@scalableinformatics.com>

On 01/30/2012 06:07 PM, Peter St. John wrote:
> Instead of appointing a moderator, we could grow one with recursive Page
> Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> about this type of thing a while ago because of "citation analysis", see
> the link).

Please ... no moderator. Lists get boring while waiting for content filtering organisms to fulfill their voluntary tasks ...

If you don't like someone's writing, filter them.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
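An aside for the curious: the recursive ranking Peter sketches above is, in essence, power iteration on a vote matrix, the same computation as PageRank. Below is a toy C illustration with a made-up four-member vote matrix; none of the numbers come from the list, and the loop's stopping test is the "until the scores converge" step Peter mentions.

/*
 * wisdom.c - toy power iteration over a vote matrix (PageRank-style).
 * Build: cc -O2 wisdom.c -o wisdom -lm
 */
#include <stdio.h>
#include <math.h>

#define N    4
#define DAMP 0.85   /* standard PageRank damping factor */

int main(void)
{
    /* votes[i][j] = 1 if member j named member i as a good moderator.
     * This matrix is invented purely for illustration. */
    double votes[N][N] = {
        {0, 1, 1, 0},
        {1, 0, 1, 1},
        {0, 1, 0, 1},
        {1, 0, 0, 0}
    };
    double score[N] = {0.25, 0.25, 0.25, 0.25};
    double next[N];
    int i, j, k, iter;

    for (iter = 0; iter < 100; iter++) {
        double diff = 0.0;
        for (i = 0; i < N; i++) {
            double s = 0.0;
            for (j = 0; j < N; j++) {
                double out = 0.0;          /* how many votes j cast */
                for (k = 0; k < N; k++)
                    out += votes[k][j];
                if (votes[i][j] > 0.0 && out > 0.0)
                    s += score[j] / out;   /* j's endorsement, diluted */
            }
            next[i] = (1.0 - DAMP) / N + DAMP * s;
        }
        for (i = 0; i < N; i++) {
            diff += fabs(next[i] - score[i]);
            score[i] = next[i];
        }
        if (diff < 1e-9)                   /* scores converged */
            break;
    }
    for (i = 0; i < N; i++)
        printf("member %d: wisdom score %.4f\n", i, score[i]);
    return 0;
}

The scores settle in a few dozen iterations. Whether anyone would act on the output is another matter; as the thread shows, the hard part is the volunteering, not the linear algebra.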
From james.p.lux at jpl.nasa.gov Mon Jan 30 18:21:45 2012 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 30 Jan 2012 15:21:45 -0800 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> Message-ID: The biggest hurdle would probably be getting volunteers, though. Peter You got that right... Moderating takes a deft touch and a thick skin. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Jan 30 18:25:49 2012 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 30 Jan 2012 15:25:49 -0800 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F27233C.8080508@scalableinformatics.com> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <4F27233C.8080508@scalableinformatics.com> Message-ID: -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman Sent: Monday, January 30, 2012 3:10 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business On 01/30/2012 06:07 PM, Peter St. John wrote: > Instead of appointing a moderator, we could grow one with recursive > Page Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we > knew about this type of thing a while ago because of "citation > analysis", see the link). Please ... no moderator. Lists get boring while waiting for content filtering organisms to fulfill their voluntary tasks ... If you don't like someone's writing, filter them. -- I agree. However, there is also "after the fact moderation".. all posts go through by default, but someone acts as a "list conscience" and gently (or not so gently) applies a corrective force, presumably using some sort of adaptive algorithm (different people have different "plant characteristics" so the optimal controller changes). But that requires an even deft-er touch and thicker skin. All lists with participation by knowledgeable and opinionated people with varied interests and specialization tend to go off on tangents occasionally. You just delete when needed, and wait for the transient to die out. 
My best guess is that about 48 hours is how long the transient lasts (because it takes two cycles, for those who read the list once a day, to realize that it's died out and not keep feeding it) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Mon Jan 30 18:52:14 2012 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 30 Jan 2012 18:52:14 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> Message-ID: <294b053bd84fed49f071a631c79be7e8.squirrel@mail.eadline.org> I use my personal Zen type moderation. yea, whatever -- Doug > Instead of appointing a moderator, we could grow one with recursive Page > Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew > about this type of thing a while ago because of "citation analysis", see > the link). > > Someone writes an open script and members of the list mail it with the > answers to these three questions: > 1. do you volunteer to moderate? > 2. Who should moderate? (give email addresses) > 3. Who should judge who should moderate? (give email addresses). > > Then you iterate over scoring people by "wisdom" and who gets the most > "wise" votes, until the scores converge. > The biggest hurdle would probably be getting volunteers, though. > Peter > > On Mon, Jan 30, 2012 at 11:20 AM, Prentice Bisbal > wrote: > >> On 01/28/2012 05:17 AM, Eugen Leitl wrote: >> > On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote: >> > >> >> What it says is that we've given up on discussing technology with >> you, >> >> because your arguments are completely nonsensical. Since you clearly >> >> don't understand technology, we're hoping you can at least understand >> >> the simple concepts of basic etiquette. >> > Who's the list moderator, by the way? >> > >> >> I don't think there is one, hence all the noise. The mailing list and >> beowulf.org is maintained by Penguin Computing/Scyld Software. Maybe >> they'd be interested in appoint a moderator or 3. >> >> --- >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at pbm.com Tue Jan 31 02:53:18 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 30 Jan 2012 23:53:18 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: 
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: <20120131075318.GA2600@bx9.net>

On Mon, Jan 30, 2012 at 10:04:53AM -0500, Mark Hahn wrote:

> > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/Performance_Analysis_IB-QDR_final-2.pdf
>
> as far as I can tell, this paper mainly says "a coalescing stack delivers
> benchmark results showing a lot higher bandwidth and message rate than a
> non-coalescing stack." the comment on figure 8:
>
>     To some extent, the environment variables mentioned before
>     contribute to this outstanding result
>
> which is remarkably droll. I'm not sure how well coalescing works for real
> applications.

Note also that many of the benchmarks in this analysis weren't run using MPI -- if I remember correctly, the ib_* commands mentioned use InfiniBand verbs directly, which means they aren't accelerated on InfiniPath.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org Tue Jan 31 04:28:18 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 31 Jan 2012 10:28:18 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F27233C.8080508@scalableinformatics.com>
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <4F27233C.8080508@scalableinformatics.com>
Message-ID: <20120131092818.GW7343@leitl.org>

On Mon, Jan 30, 2012 at 06:09:48PM -0500, Joe Landman wrote:
> On 01/30/2012 06:07 PM, Peter St. John wrote:
> > Instead of appointing a moderator, we could grow one with recursive Page
> > Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> > about this type of thing a while ago because of "citation analysis", see
> > the link).
>
> Please ... no moderator. Lists get boring while waiting for content
> filtering organisms to fulfill their voluntary tasks ...

On all the lists I run and participate in you only turn moderation on by default for new list members and put known bozos on permanent moderation. The result is zero delay as soon as new list subscribers have produced their first non-spam non-bozo post.

> If you don't like someone's writing, filter them.

I already do, but content producers typically don't bother and vote with their feet. I have seen many communities die in that manner. Never surprising, still always sad.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Jan 31 04:31:04 2012 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 31 Jan 2012 10:31:04 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> Message-ID: <20120131093104.GX7343@leitl.org> On Mon, Jan 30, 2012 at 03:21:45PM -0800, Lux, Jim (337C) wrote: > > > The biggest hurdle would probably be getting volunteers, though. > Peter > > You got that right... Moderating takes a deft touch and a thick skin. I would have no issues moderating Beowulf@ since that would require only negligible additional workload. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Glen.Beane at jax.org Tue Jan 31 07:15:51 2012 From: Glen.Beane at jax.org (Glen Beane) Date: Tue, 31 Jan 2012 12:15:51 +0000 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <20120131093104.GX7343@leitl.org> References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org> Message-ID: On Jan 31, 2012, at 4:31 AM, Eugen Leitl wrote: > On Mon, Jan 30, 2012 at 03:21:45PM -0800, Lux, Jim (337C) wrote: >> >> >> The biggest hurdle would probably be getting volunteers, though. >> Peter >> >> You got that right... Moderating takes a deft touch and a thick skin. > > I would have no issues moderating Beowulf@ since that would > require only negligible additional workload. Did this list used to be moderated? I remember when I first joined there would be a significant delay for my email sent to the list, while I was waiting for my replies to show up a whole conversation would be unfolding between "veteran posters" -- Glen L. Beane Senior Software Engineer The Jackson Laboratory (207) 288-6153 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at cse.psu.edu Tue Jan 31 10:30:48 2012 From: ellis at cse.psu.edu (Ellis H. 
Wilson III) Date: Tue, 31 Jan 2012 10:30:48 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org> Message-ID: <4F280928.7080806@cse.psu.edu> On 01/31/2012 07:15 AM, Glen Beane wrote: > Did this list used to be moderated? I remember when I first joined there would be a significant delay for my email sent to the list, while I was waiting for my replies to show up a whole conversation would be unfolding between "veteran posters" Yea, same used to happen to me back in '06 when I first joined. Sent an email about it and got a response back from Don Becker stating that I was taken off the moderation list. I'm not sure if he's still the moderator anymore, however. While I think that's a great way to deal with newcomers, I'm not sure there is a fair way to determine which of the existing posters are and are not trolls deserving of moderation. Therefore I also vote to continue in a non-moderated fashion. On that note, my sincere apologies to the list if any of my replies served in any way to kindle this discussion. I got a bit colorful due to a building frustration from years of eye-rolling. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Tue Jan 31 10:40:48 2012 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Tue, 31 Jan 2012 22:40:48 +0700 Subject: [Beowulf] List moderation In-Reply-To: <4F280928.7080806@cse.psu.edu> References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org> <4F280928.7080806@cse.psu.edu> Message-ID: <4F280B80.6030800@pathscale.com> On 01/31/12 10:30 PM, Ellis H. Wilson III wrote: > On 01/31/2012 07:15 AM, Glen Beane wrote: >> Did this list used to be moderated? I remember when I first joined there would be a significant delay for my email sent to the list, while I was waiting for my replies to show up a whole conversation would be unfolding between "veteran posters" > Yea, same used to happen to me back in '06 when I first joined. Sent an > email about it and got a response back from Don Becker stating that I > was taken off the moderation list. I'm not sure if he's still the > moderator anymore, however. While I think that's a great way to deal > with newcomers, I'm not sure there is a fair way to determine which of > the existing posters are and are not trolls deserving of moderation. > Therefore I also vote to continue in a non-moderated fashion. -1 From a bystander perspective I'm all for moderation and reducing the noise. Even people who have their posts moderated would likely be understanding that it's for the greater good. Lets call it peer review instead of "moderation". 
imho someone with some guts just needs to do it so this doesn't turn into a bikeshed discussion

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From joshua_mora at usa.net Tue Jan 31 14:19:46 2012
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Tue, 31 Jan 2012 13:19:46 -0600
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
Message-ID: <525qaETsU7536S02.1328037586@web02.cms.usa.net>

I agree with Joe. Plus I know that most of us, if not all, truly want to share knowledge, and why not, opinions as well based on personal experiences as long as "we all do the effort to be respectful with both the individual and the technology and being open/receptive to be criticized as well". That is in fact the reason I like this distribution list.

Joshua.

------ Original Message ------
Received: 05:11 PM CST, 01/30/2012
From: Joe Landman
To: beowulf at beowulf.org
Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business

> On 01/30/2012 06:07 PM, Peter St. John wrote:
> > Instead of appointing a moderator, we could grow one with recursive Page
> > Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> > about this type of thing a while ago because of "citation analysis", see
> > the link).
>
> Please ... no moderator. Lists get boring while waiting for content
> filtering organisms to fulfill their voluntary tasks ...
>
> If you don't like someone's writing, filter them.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From mdidomenico4 at gmail.com Tue Jan 31 15:55:55 2012
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Tue, 31 Jan 2012 15:55:55 -0500
Subject: [Beowulf] rear door heat exchangers
Message-ID: 

i'm looking for, but have not found yet, a rear door heat exchanger with fans. the door should be able to support up to 35kw using chilled water. has anyone seen such an animal?

most of the ones i've seen utilize a side car that sits beside the rack. unfortunately, i'm space limited and i need something that will hang on the back of the rack.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
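Before the replies that follow, a quick back-of-the-envelope on what 35kw implies for the plumbing, from the heat balance Q = m-dot * c_p * delta-T. The 10 F (about 5.6 K) coolant temperature rise used below is an assumed figure for illustration, not a number from the thread:

% assumed: 5.6 K water temperature rise; c_p of water = 4.18 kJ/(kg K)
\[
\dot{m} \;=\; \frac{Q}{c_p\,\Delta T}
        \;=\; \frac{35\,\mathrm{kW}}{4.18\,\mathrm{kJ\,kg^{-1}\,K^{-1}} \times 5.6\,\mathrm{K}}
        \;\approx\; 1.5\,\mathrm{kg/s} \;\approx\; 90\,\mathrm{L/min} \;\approx\; 24\,\mathrm{gal/min}
\]

So a 35 kW door means moving chilled water at roughly 25-30 gal/min through a hinged panel. That squares with the ~9 gal/min per 10-11 kW reported for the LBNL doors later in the thread, and the weight of that much plumbing may be part of why (as Jim speculates below) commercial doors top out near 33 kW.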
From lathama at gmail.com Tue Jan 31 16:13:48 2012
From: lathama at gmail.com (Andrew Latham)
Date: Tue, 31 Jan 2012 18:13:48 -0300
Subject: [Beowulf] rear door heat exchangers
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jan 31, 2012 at 5:55 PM, Michael Di Domenico wrote:
> i'm looking for, but have not found yet, a rear door heat exchanger
> with fans. the door should be able to support up to 35kw using
> chilled water. has anyone seen such an animal?
>
> most of the ones i've seen utilize a side car that sits beside the
> rack. unfortunately, i'm space limited and i need something that will
> hang on the back of the rack.
> _____________________________

Maybe: http://www.hoffmanonline.com/product_catalog/section_index.aspx?cat_1=34&cat_2=2383&SelectCatId=2383&CatId=2383

Semi-related question: Has any research been done on cooling the racks/rails/metal infrastructure in the effort to cool the whole rack+systems?

-- 
~ Andrew "lathama" Latham lathama at gmail.com http://lathama.net ~

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Jan 31 18:47:18 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 31 Jan 2012 15:47:18 -0800
Subject: [Beowulf] rear door heat exchangers
In-Reply-To: 
References: 
Message-ID: 

Maybe there's an issue with the weight and/or flexible tubing on a swinging door?

The Hoffman products in Andrew's email, I think, aren't the kind that hang on a door; they hang more on the side of a large box/cabinet (Type 4, 12, 3R enclosure) or wall. They're also air/air heat exchangers or air conditioners (and vortex coolers.. but you don't want one of those unless you have a LOT of compressed air available)

http://www.42u.com/cooling/liquid-cooling/liquid-cooling.htm shows "in-row liquid cooling" but I think that's sort of in parallel. They do mention, lower down on the page, "Rear Door Liquid Cooling"

But I notice that the Liebert XDF-5, which is basically a rack and chiller deck in one, only pulls out 14kW.

From DoE: http://www1.eere.energy.gov/femp/pdfs/rdhe_cr.pdf
They refer to the ones installed at LBNL as RDHx units, but carefully avoid telling you the brand or any decent data. They do say they cost $6k/door, and suck up 10-11kW/rack with 9 gal/min flow of 72F water.

Googling RDHx turns up "CoolCentric.com"
http://www.coolcentric.com/resources/data_sheets/Coolcentric-Rear-Door-Heat-Exchanger-Data-Sheet.pdf
33kW is as good as they can do. I also note that they have no fans in them.

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Michael Di Domenico
Sent: Tuesday, January 31, 2012 12:56 PM
To: Beowulf Mailing List
Subject: [Beowulf] rear door heat exchangers

i'm looking for, but have not found yet, a rear door heat exchanger with fans. the door should be able to support up to 35kw using chilled water. has anyone seen such an animal?

most of the ones i've seen utilize a side car that sits beside the rack. unfortunately, i'm space limited and i need something that will hang on the back of the rack.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From sdm900 at gmail.com Tue Jan 31 18:54:48 2012
From: sdm900 at gmail.com (Stu Midgley)
Date: Wed, 1 Feb 2012 07:54:48 +0800
Subject: [Beowulf] rear door heat exchangers
In-Reply-To: 
References: 
Message-ID: 

Speak to SGI. We have about a dozen such racks, all from SGI.

On Wed, Feb 1, 2012 at 4:55 AM, Michael Di Domenico wrote:
> i'm looking for, but have not found yet, a rear door heat exchanger
> with fans. the door should be able to support up to 35kw using
> chilled water. has anyone seen such an animal?
>
> most of the ones i've seen utilize a side car that sits beside the
> rack. unfortunately, i'm space limited and i need something that will
> hang on the back of the rack.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Dr Stuart Midgley
sdm900 at gmail.com

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Herbert.Fruchtl at st-andrews.ac.uk Tue Jan 31 19:18:10 2012
From: Herbert.Fruchtl at st-andrews.ac.uk (Herbert Fruchtl)
Date: Wed, 1 Feb 2012 00:18:10 +0000
Subject: [Beowulf] moderation - was cpu's versus gpu's - was Intel buys QLogic
Message-ID: <97E75B730E3076479EB136E2BC5D410054874491@uos-dun-mbx1>

Folks,

I missed part of this discussion (for obvious reasons I lost interest), but since it seems to be moving in that direction, I'll throw in my two smallest-local-currency-units. I'm a lurker (in old usenet parlance) on this list: reading, but very rarely posting. There are probably many of us, but the others are posting even more rarely...

As long as we don't get real off-topic discussions that attract the weirdos of the Internet (global warming anybody? intelligent design? even C/Fortran tends to peter out quickly nowadays), I am opposed to censorship (aka moderation). The simplistic arguments are:

1) This is my own, selfish, most important argument: it costs time! When, every two years, I have a technical question for the list, I don't want to wait until the USA is out of bed and hope that the moderator isn't at a conference for a week.

2) You need a moderator. It's quite some work, so it will only be done by somebody who gets some satisfaction out of it. This means that the job will attract exactly the kind of people who will not moderate neutrally and dispassionately. Even if they try, there's the fact that power corrupts. You're tempted to censor views that are too far from your own ("ludicrous" is the word you would use), and in the end you have an in-crowd confirming each other's views.
3) You are opening yourself to lawsuits. If something is said on the list that, let's say Intel's corporate lawyers find defamatory, they may go after the moderator.

If you really find somebody's views (and their presentation) objectionable, just killfile them (it's called "filter" in the 21st century). And if certain people think ad hominem attacks help their case, ignore them instead of thinking you can look dignified in taking them on in their own game. You won't.

Back to those dark alleys where we lurkers feel at home...

  Herbert

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From diep at xs4all.nl Wed Jan 11 12:00:43 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:00:43 +0100
Subject: [Beowulf] Course: Parallel Programming of High Performance Systems
Message-ID: <7B7DB325-4FFB-4C68-9602-2E1E71B41D12@xs4all.nl>

On Jan 11, 2012, at 5:09 PM, Lux, Jim (337C) wrote:

> [...]
> (and don't get me started on my experiences with the f2c engine)

No need to get started, Jim - NASA can ask the Russians about that as
well.

From prentice at ias.edu Wed Jan 11 11:58:59 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Wed, 11 Jan 2012 11:58:59 -0500
Subject: [Beowulf] A cluster of Arduinos
Message-ID: <4F0DBFD3.3070503@ias.edu>

On 01/11/2012 11:18 AM, Lux, Jim (337C) wrote:

> Has anyone done something where they implement some sort of message
> passing API on a network of Arduinos? [...]

I started tinkering with Arduinos a couple of months ago. Got lots of
related goodies for Christmas, so I've been looking like a mad
scientist building Arduino things lately. I'm still a beginner Arduino
hacker, but I'd be game for giving this a try, if anyone else wants to
give this a go.

The Arduino Due, which is overdue in the marketplace, will have a
Cortex-M3 ARM processor.

-- 
Prentice

From diep at xs4all.nl Wed Jan 11 12:30:30 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:30:30 +0100
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To: <4F020244.4040505@ias.edu>
Message-ID: <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>

On Jan 2, 2012, at 8:15 PM, Prentice Bisbal wrote:

> On 12/29/2011 07:50 PM, Vincent Diepeveen wrote:
>> it's very useful Mark, as we know now he works for the company and
>> also for which nation.
>>
>> Vincent
>
> For someone who's always bashing on US foreign policy, you sure
> sound like a Republican or a member of the Department of Homeland
> Security!

Where is my paycheck?

From ntmoore at gmail.com Wed Jan 11 12:31:30 2012
From: ntmoore at gmail.com (Nathan Moore)
Date: Wed, 11 Jan 2012 11:31:30 -0600
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0DBFD3.3070503@ias.edu>

I think something like the Raspberry Pi might be easier for this sort
of task. They'll also be about $25, but they'll run something like
ARM/Linux. Not out yet though.

http://www.raspberrypi.org/

On Wed, Jan 11, 2012 at 10:58 AM, Prentice Bisbal wrote:

> I started tinkering with Arduinos a couple of months ago. [...] I'm
> still a beginner Arduino hacker, but I'd be game for giving this a
> try, if anyone else wants to give this a go.

-- 
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - -   - - - - - - -   - - - - - - -
From diep at xs4all.nl Wed Jan 11 12:43:17 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:43:17 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0DBFD3.3070503@ias.edu>
Message-ID: <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>

On Jan 11, 2012, at 5:58 PM, Prentice Bisbal wrote:

> The Arduino Due, which is overdue in the marketplace, will have a
> Cortex-M3 ARM processor.

Completely superior chip, that Cortex-M3 - though I haven't been able
to program much for it so far; it's difficult to get contract jobs
for. It can do fast 32 x 32 bit multiplication; you can even implement
RSA very fast on that chip. It runs at 70 MHz or so?

Usually writing assembler for such CPUs is more efficient than using a
compiler, by the way. Compilers are not so efficient, to put it
politely, for embedded CPUs. Writing assembler for such CPUs is pretty
straightforward, whereas in HPC things are far more complicated
because of vectorization. AVX is the latest there.

Speaking of AVX, is there already much HPC support for AVX? I see that
after years of wrestling George Woltman released some prime number
code (GWNUM) - of course, as always, in beta for the remainder of this
century - which uses AVX. Claims are that it's a tad faster than the
existing SIMD codes. I saw claims of even above 20% faster, which is
really a lot at that level of engineering; usually you work 6 months
for a 0.5% speedup.

Even if you improve the algorithm, you still lose to this code, as
your C/C++ code will by default be a factor of 10 slower, if not more.
I remember how I found a clever caching trick in 2006 for a Number
Theoretic Transform (that's an FFT over the integers, without the
rounding errors that floating-point FFTs give), yet after some hard
work my C code was still a factor of 8 slower than Woltman's SIMD
assembler.

From diep at xs4all.nl Wed Jan 11 12:44:43 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 18:44:43 +0100
Subject: [Beowulf] A cluster of Arduinos

That's all very expensive considering the CPUs are under $1 each, I'd
guess. I actually might need some of this stuff some months from now
to build some robots.

On Jan 11, 2012, at 6:31 PM, Nathan Moore wrote:

> I think something like the Raspberry Pi might be easier for this
> sort of task. They'll also be about $25, but they'll run something
> like ARM/Linux. Not out yet though.
>
> http://www.raspberrypi.org/
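On Vincent's AVX question: the intrinsics in <immintrin.h> are the
usual way in when the compiler won't auto-vectorize. A toy
illustration of the idea (a generic sketch, nothing to do with GWNUM's
hand-written assembler) - four doubles per register instead of one:

    // axpy_avx.cpp - build: g++ -mavx axpy_avx.cpp
    // Computes y = a*x + y over doubles, 4 lanes at a time.
    #include <immintrin.h>
    #include <cstdio>

    int main()
    {
        alignas(32) double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        alignas(32) double y[8] = {0};
        const __m256d a = _mm256_set1_pd(2.0);    // broadcast scalar

        for (int i = 0; i < 8; i += 4) {
            __m256d xv = _mm256_load_pd(x + i);   // 4 doubles at once
            __m256d yv = _mm256_load_pd(y + i);
            yv = _mm256_add_pd(_mm256_mul_pd(a, xv), yv);
            _mm256_store_pd(y + i, yv);
        }
        std::printf("y[0]=%g y[7]=%g\n", y[0], y[7]);  // 2 and 16
        return 0;
    }

The 20%-class gains Vincent mentions come from far more than this, of
course - transform-specific scheduling and cache blocking - but this
is the instruction set those codes are built on.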
From james.p.lux at jpl.nasa.gov Wed Jan 11 12:58:13 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 09:58:13 -0800
Subject: [Beowulf] A cluster of Arduinos
References: <4F0DBFD3.3070503@ias.edu>

Yes.. better the widget that one can whip on down to Radio Shack and
buy on the way home from work than the ghostware that may live for
Christmas future.

Also, does the Raspberry Pi $25 price point include a power supply?
The Arduino runs off the USB 5V power, so it's one less thing to
hassle with.

I don't know that performance is all that important in this
application. It's more to experiment with message passing in a
multiprocessor system. Slow is fine. (I can't think of a computational
application for an ArdWulf - combining Italian and Saxon - that
wouldn't be blown away by almost any single computer, including
something like a smart phone.)

Realistically, you're looking at bit-banging kinds of serial
interfaces. I can see several network implementations: SPI shared bus,
hypercubes, toroidal surfaces, etc.

-----Original Message-----
From: Nathan Moore
Sent: Wednesday, January 11, 2012 9:32 AM
Subject: Re: [Beowulf] A cluster of Arduinos

I think something like the Raspberry Pi might be easier for this sort
of task. They'll also be about $25, but they'll run something like
ARM/Linux. Not out yet though.

http://www.raspberrypi.org/

From james.p.lux at jpl.nasa.gov Wed Jan 11 13:00:36 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:00:36 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl>

> Completely superior chip, that Cortex-M3. [...] Usually writing
> assembler for such CPUs is more efficient than using a compiler.
> [...] Writing assembler for such CPUs is pretty straightforward,
> whereas in HPC things are far more complicated because of
> vectorization.

-->> ah, but this is not really an HPC application. It's a cluster
computer architecture demonstration platform. The Java-based Arduino
environment is pretty simple and multiplatform. Yes, it uses a sort of
weird C-like language, but there it is... it's easy to use.

From james.p.lux at jpl.nasa.gov Wed Jan 11 13:19:24 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:19:24 -0800
Subject: [Beowulf] A cluster of Arduinos

Yes.. and there's been a bunch of "value clusters" over the years
(StoneSouperComputer, for instance).. But that's still $3k. I could
see putting together 8 nodes for a few hundred dollars. The Arduino
Uno R3 is about $25 each in quantity.

Think in terms of a small class where you want to have, say, 10
mini-clusters, one per student. No sharing, etc.

-----Original Message-----
From: Alex Chekholko
Sent: Wednesday, January 11, 2012 10:12 AM
Subject: Re: [Beowulf] A cluster of Arduinos

The LittleFe cluster is designed specifically for teaching and
demonstration. Current cost is ~$3k. But it's all standard x86 and
runs Linux and even has GPUs.

http://littlefe.net/

I saw them build a bunch of them at SC11.

On Wed, Jan 11, 2012 at 10:00 AM, Lux, Jim (337C) wrote:
> It's a cluster computer architecture demonstration platform.
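The message-passing demo Jim is sketching needs surprisingly little
code on each node. Below is a hypothetical single-node sketch for a
token ring of Arduinos - the pin numbers, the 9600 baud rate, and the
wiring (each node's TX to the next node's RX) are all assumptions for
illustration; SoftwareSerial is the stock Arduino library:

    // ring_node.ino - one node of a toy Arduino token ring.
    #include <SoftwareSerial.h>

    const byte MY_ID = 0;            // set 0..N-1 per node at upload
    const int  LED   = 13;           // on-board LED on an Uno
    SoftwareSerial link(10, 11);     // RX from previous node, TX to next

    void setup() {
      pinMode(LED, OUTPUT);
      link.begin(9600);
      Serial.begin(9600);            // USB back to the head-node PC
      if (MY_ID == 0)
        link.write((byte)0);         // node 0 seeds the first token
    }

    void loop() {
      if (link.available()) {
        byte token = link.read();    // one-byte message arrives
        digitalWrite(LED, HIGH);     // blink to show the token passing
        delay(100);
        digitalWrite(LED, LOW);
        Serial.print("token ");      // report to the debug console
        Serial.println(token);
        link.write((byte)(token + 1));  // forward it, incremented
      }
    }

Unplugging one jumper gives exactly the visible link-failure
experiment described later in this thread: the blinking stops at the
break.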
From james.p.lux at jpl.nasa.gov Wed Jan 11 13:27:31 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:27:31 -0800
Subject: [Beowulf] PAPERS interface

Arghh.. my google-fu is failing me..

I'm looking for the papers on the PAPERS cluster interface (based on
using parallel ports.. back in the 90s) and, of course, if you search
for the word "papers", you get nothing useful..

I can't remember who the authors were or where it was done (I'm
thinking in the Southeast US, for some reason, but I'm not sure).

From sabujp at gmail.com Wed Jan 11 13:35:17 2012
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Wed, 11 Jan 2012 12:35:17 -0600
Subject: [Beowulf] PAPERS interface

https://www.google.com/search?hl=en&q=%22PAPERS%22%20parallel%20port%20interface&btnG=Google+Search

http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1183&context=ecetr

HTH,
Sabuj
Google Proxy Certified Search Partner

On Wed, Jan 11, 2012 at 12:27 PM, Lux, Jim (337C) wrote:
> Arghh.. my google-fu is failing me..
>
> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports.. back in the 90s) [...]

From james.p.lux at jpl.nasa.gov Wed Jan 11 13:37:14 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:37:14 -0800
Subject: [Beowulf] PAPERS interface
In-Reply-To: <4F0DD65B.3060808@nasa.gov>

Thanks.. Also props to Juan Gallego, who found it too..

From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov]
Sent: Wednesday, January 11, 2012 10:35 AM
Subject: Re: [Beowulf] PAPERS interface

Hi Jim. The lead author is Hank Dietz. The acronym is PAPERS: Purdue's
Adapter for Parallel Execution and Rapid Synchronization.

Cheers from NASA Ames...

-jeff

From james.p.lux at jpl.nasa.gov Wed Jan 11 13:39:41 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 10:39:41 -0800
Subject: [Beowulf] PAPERS interface

Excellent.. Purdue.. and have we really been beowulfing since 1994?
I'll bet that the earliest clusters can legally buy alcohol now...

So, if I build a cluster with Arduinos using the PAPERS-style
interface, what will it be called... BeoPaperDuino?

From atp at piskorski.com Wed Jan 11 14:38:53 2012
From: atp at piskorski.com (Andrew Piskorski)
Date: Wed, 11 Jan 2012 14:38:53 -0500
Subject: [Beowulf] PAPERS interface
Message-ID: <20120111193853.GA86203@piskorski.com>

On Wed, Jan 11, 2012 at 10:27:31AM -0800, Lux, Jim (337C) wrote:

> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports.. back in the 90s)

It also came up a few times here on the list, e.g.:

http://www.beowulf.org/archive/2004-October/010934.html
From: Tim Mattox
Date: Sat Oct 16 15:15:14 PDT 2004

-- 
Andrew Piskorski
http://www.piskorski.com/

From diep at xs4all.nl Wed Jan 11 17:47:00 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 23:47:00 +0100
Subject: [Beowulf] A cluster of Arduinos
Message-ID: <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>

Jim, your microcontroller cluster is not such a good idea.

Latency didn't keep up with the CPU speeds...

Today's nodes have a CPU with a core or 12, soon 16, which can execute
- to take a simple integer example, my chess program and its IPC -
about 24 instructions per cycle.

So nothing SIMD, just simple integer instructions mostly; of course
loads which effectively come from L1 play an overwhelming role there.

Typical latencies for a random memory read from a remote node, even
with the latest networks, are between 0.85 and 1.9 microseconds. Let's
take an optimistic 1 microsecond for an RDMA read... So in that
timeframe you can execute 24k+ instructions.

IPC at the cheapo CPUs is effectively far under 1 - around 0.25 for
most codes - so a 70 MHz CPU executes roughly one instruction every
four cycles. Now we are working with rough measures here; call the
latency of such a microcontroller cluster 1/4 millisecond. Even USB
1.1 sticks have latencies far under 1 millisecond.

So, relative to CPU speed, the latency of today's clusters is a factor
of 25k worse than this 'cluster'. In fact your microcontroller cluster
here has relative latencies that you do not even have core to core
within a single CPU today.

There is still too much 1980s and 1990s software out there, written by
the guys who wrote the books about how to parallelize, which simply
doesn't scale at all on modern hardware.

Let me not quote too many names there, as I've done before. They were
just too lazy to throw away their old code and start over, writing a
new parallel concept that works on today's hardware.

If we involve GPUs, there is going to be an even bigger problem, and
that's that the bandwidth of the network can't keep up with what a
single GPU delivers. Who is to blame for that is quite a complicated
discussion, if anyone has to be blamed at all. We just need more
clever algorithms there.

From diep at xs4all.nl Wed Jan 11 17:56:12 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 11 Jan 2012 23:56:12 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl>
Message-ID: <106FFC0A-B488-4A39-8C55-7FD27C3BCFC1@xs4all.nl>

On Jan 11, 2012, at 11:47 PM, Vincent Diepeveen wrote:

> So in that timeframe you can execute 24k+ instructions.

Hah, how easy it is to make a mistake - sorry for that. I didn't even
multiply by the GHz frequency of the CPUs yet. So if it's 3 GHz or so,
it's actually closer to a factor of 75k than 24k.

Furthermore, another problem is that you can't fully load networks, of
course.
So to keep the network behaving well you want to do such hammering
over the network no more than once every 750k instructions.

From james.p.lux at jpl.nasa.gov Wed Jan 11 18:24:55 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 15:24:55 -0800
Subject: [Beowulf] A cluster of Arduinos

-----Original Message-----
From: Vincent Diepeveen
Sent: Wednesday, January 11, 2012 2:47 PM
Subject: Re: [Beowulf] A cluster of Arduinos

> Jim, your microcontroller cluster is not such a good idea.
>
> Latency didn't keep up with the CPU speeds...

--- You're missing the point of the cluster. It's not for performance
(where I can't imagine that the slowest single-CPU PC out there
wouldn't blow the figurative doors off). It's to provide a very
inexpensive way to experiment/play/demonstrate loosely coupled
multiprocessor systems.

--> for example, you could experiment with redundant message routing
across a fabric of nodes. The algorithms are fairly simple, and this
gives you a testbed which is qualitatively different than just
simulating a bunch of nodes on a single PC. There is pedagogical value
in a system where you can force a link error by just disconnecting the
cable, and your blinky lights on each node show what's going on.
> There is still too much 1980s and 1990s software out there, written
> by the guys who wrote the books about how to parallelize, which
> simply doesn't scale at all on modern hardware.

--> I think that a lot of the theory of parallel processes is speed
independent, and while some historical approaches might not be used in
a modern system for good implementation reasons, students and others
still need to learn about them, if only as the canonical approach.
Sure, you could do a simulation on a single PC (and I've seen them, in
Simulink and in other more specialized tools), but there's a lot of
appeal to a hands-on-the-cheap-hardware approach to learning.

--> To take an example, if you set a student a problem of lighting an
LED on each node in a specified node order at specified intervals,
where the node interconnects are not specified in advance, that's a
fairly interesting homework problem. You have to discover the network
connectivity graph, then figure out how to pass the message to the
appropriate node at the appropriate time. This is a classic "hot plug
network discovery" kind of problem, and in the face of intermittent
links, it's of great interest.

--> While that particular problem isn't exactly HPC, it DOES relate to
HPC in a world where you cannot assume perfect processor nodes and
perfect communications links. And that gets right to the whole
"scalability" thing in HPC. It wasn't until the implementation of
error-correcting codes in logic that something like the Q7A computer
was even possible, because it was so large that you couldn't guarantee
that all the tubes would be working all the time. Likewise with many
other aspects of modern computing.

--> And, of course, in the spaceflight world, this kind of thing is
even more important. A concept of growing importance is the
"fractionated spacecraft", where all of the functions that would have
been in one physical vehicle are now spread across many smaller
pieces. And one might reallocate spacecraft fractional pieces between
different virtual spacecraft. Maybe right now you need a lot of
processing power to do image compression and analysis, so you want to
allocate a lot of "processing pieces" to the job, with an ad hoc
network connection among them. Later, you don't need them, so you can
release them to other uses. The pieces might be in the immediate
vicinity, or they might be some distance away, which affects the data
rate in the link and its error rates.

--> You can legitimately ask whether this sort of thing (the
fractionated spacecraft) is a Beowulf (defined as a cluster
supercomputer built of commodity components) and I would say it shares
many of the same properties, especially in the early Beowulf days
before multicores and fancy interconnects were fashionable for
multi-thousand-processor clusters. It's that idea of building a large
complex device out of many basically identical subunits, using open
source/simple software to manage it.

-->> in summary, it's not about performance.. it's about a teaching
tool for networking in the context of cluster computing. You claim we
need to cast off the shackles of old programming styles and get some
new blood and ideas. Well, you need to get people interested in
parallel computing and learning the basics (so at least they don't
reinvent the square wheel). One way might be challenges such as
parallelization of game play; another might be working with a
parallelized database; the way I propose is with experimenting with
message-passing parallelization using dirt cheap hardware.
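Jim's "hot plug network discovery" homework has an equally small
starting point. A hypothetical first step - each node announces its ID
on every link and records what it hears - might look like this (pins,
baud rate, and the two-link node are assumptions; note that only one
SoftwareSerial port can receive at a time, hence the listen() calls):

    // discover.ino - naive neighbour discovery on a two-link node.
    #include <SoftwareSerial.h>

    const byte MY_ID = 3;                      // set per node at upload
    SoftwareSerial linkA(10, 11), linkB(8, 9); // two neighbour links
    byte neighbourA = 0, neighbourB = 0;       // 0 = nothing heard yet

    void setup() {
      linkA.begin(4800);
      linkB.begin(4800);
      Serial.begin(9600);
    }

    void loop() {
      linkA.write(MY_ID);               // announce on both links
      linkB.write(MY_ID);

      linkA.listen();                   // receive on link A only
      delay(50);
      if (linkA.available()) neighbourA = linkA.read();

      linkB.listen();                   // then on link B
      delay(50);
      if (linkB.available()) neighbourB = linkB.read();

      Serial.print("neighbours: ");     // report edges to the head PC
      Serial.print(neighbourA);
      Serial.print(" ");
      Serial.println(neighbourB);
      delay(500);
    }

Collecting each node's report over USB gives the head node the full
connectivity graph; routing the "light LED n at time t" messages over
that graph is then the interesting part of the exercise.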
From deadline at eadline.org Wed Jan 11 19:18:11 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 11 Jan 2012 19:18:11 -0500
Subject: [Beowulf] PAPERS interface
Message-ID: <2d6fa78f1fc44cea3df118e1c0a27f31.squirrel@mail.eadline.org>

Hank Dietz was at Purdue, now at Kentucky - see aggregate.org.

> Arghh.. my google-fu is failing me..
>
> I'm looking for the papers on the PAPERS cluster interface (based on
> using parallel ports.. back in the 90s) [...]

-- 
Doug

From diep at xs4all.nl Wed Jan 11 19:36:37 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 01:36:37 +0100
Subject: [Beowulf] A cluster of Arduinos

Yes, this was impossible to explain to a bunch of MIT folks as well,
some of whom wrote your book, I bet - yet the slower the processor,
the more of a true SMP system it is. It's obvious that you missed that
point. Writing code for a multicore is tougher, from an SMP
constraints viewpoint, than for a bunch of 70 MHz CPUs that have a
millisecond latency to the other CPUs.

So it's far from demonstrating cluster programming. Light-years away.
Emulation on a simple quad-core is in fact more representative than
this.

If you want to get closer to cluster programming than this, just buy
yourself off eBay some Barcelona-core SMP system with 4 sockets - say,
with energy-efficient 1.8 GHz CPUs. That's with one of the first
incarnations of HyperTransport; of course later on it dramatically
improved. Latency from CPU to CPU is some 300+ ns if you look up
randomly. Even good programmers in game tree search have big problems
working with those latencies.

Clusters have latencies that are far worse than that. Yet as CPU
speeds no longer increase much and the number of cores doesn't double
that quickly, clusters are the way to go if you're CPU hungry. Setting
up small clusters is cheap as well.

If I type 'mellanox' into eBay I see bunches of cheap cards out there,
and also switches. With a single switch you can teach half a dozen
students. You can just connect the machines you already have onto a
few switches and write MPI code like that. The average cost per
student will also be a couple of hundred dollars.

Vincent

On Jan 12, 2012, at 12:24 AM, Lux, Jim (337C) wrote:

> --- You're missing the point of the cluster. It's not for
> performance [...] it's about a teaching tool for networking in the
> context of cluster computing.
From samuel at unimelb.edu.au Wed Jan 11 19:59:18 2012
From: samuel at unimelb.edu.au (Chris Samuel)
Date: Thu, 12 Jan 2012 11:59:18 +1100
Subject: [Beowulf] A cluster of Arduinos
Message-ID: <201201121159.18993.samuel@unimelb.edu.au>

On Thu, 12 Jan 2012 11:36:37 AM Vincent Diepeveen wrote:

> So it's far from demonstrating cluster programming. Light-years
> away.

Whatever happened to hacking on hardware just for the fun of it?

Just because it's not going to be useful doesn't mean you won't learn
from the experience, even if the lesson is only "don't do it again".
:-)

-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
-> but that's an entirely different sort of problem space and instructional area. Clusters are having latencies that are far worse than that. Yet as cpu speeds no longer increase much and number of cores doesn't double that quickly, clusters are the way to go if you're CPU hungry. Setting up small clusters is cheap as well. If i put in the name 'mellanox' in ebay i see bunches of cheap cards out there and also switches. -> Oh, Im sure the surplus market is full of things one could potentially use. But I suspect that by the time you lash together your $40 cards and $20 cables and several hundred $ switch, you're up in the total system price >$1k. And you're using surplus, so there's a support issue. If you're tinkering for yourself in the garage or as a one-off, then surplus is a fine way to go. If you want to be able to give a list of "go buy this" to a teacher, it needs to be off-the-shelf currently being manufactured stuff. -> Say you want to set up 10 demo systems with 8 nodes each, so that each student in a small class has their own to work with. There's a big difference between $30 Arduinos and $200 netbooks. With a single switch you can teach half a dozen students. You can just connect the machines you already got there onto a few switches and write MPI code like that. -> The whole point is to give a student exclusive access to the system, without needing to share. Sure, we've all done the shared "computer lab" resource thing and managed to learn(In the late 1970s, I would have done quite a lot to have on demand access to an 029 keypunch). That's part of what *personal* computers is all about. My program doesn't work right, I just hit the reset button and start over. -> I confess, too, that there is an aspect of the "mass of boards on the desktop with cables strewn around", which is a learning experience in itself. On the other hand, the Arduino experience is a lot less hassle than, say, a mass of PC mobos, network cards, and power supplies and trying to get them to boot off the net or a USB drive. Average cost per student also will be a couple of hundreds of dollars. -> that's the "total cost of several thousand dollars divided by N students who share it" I suspect. We could get into a little BOM battle, and I'd venture that I can keep the off the shelf parts cost under $500, and give each student a dedicated system to play with. The only part that I don't know right off the top of my head is the actual interconnect hardware. I think you'd want to design some sort of board with a bunch of connectors that connects to the Arduinos with ribbon cables. But even there, that could be "here's your PCBExpress file.. order the board and you get 3 for $50" -> over the years I've been involved in several of these "what can we set up for a demonstration", and I've converged to the realization that what you need is a parts list (preferably preloaded at Newark or DigiKey or Mouser or similar) and an explicit set of instructions. A setup that starts out with: 1) Find 8 motherboards on eBay or newegg with these sorts of specs 2) Find 8 power supplies that match the mother boards Is doomed to failure. You need "buy 3 of those and 6 of these, and hook them up this way" This is the beauty of the whole Arduino culture. In fact, it's a bit too much of that.. there's not a lot of good overview tutorial material.. but lots of "here's how to do specific task X"... I got started looking at Arduinos because I want to build a multichannel temperature controller to smoke/cure sausage. 
But I've used just about every small single board computer out there: Rabbit, Basic Stamp, various PIC boards, etc. not to mention various MiniITX and PC schemes. So far, the Arduino is the winner on dirt cheap and simple combined. Spend $30, plug in USB cable, load java environment, done. Now I know why all those projects at the science fair are using them. You get to focus on what you want to do, rather than getting a computer working. Vincent On Jan 12, 2012, at 12:24 AM, Lux, Jim (337C) wrote: > > > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Wednesday, January 11, 2012 2:47 PM > To: Beowulf Mailing List > Subject: Re: [Beowulf] A cluster of Arduinos > > Jim, your microcontroller cluster is not a rather good idea. > > Latency didn't keep up with the CPU speeds... > > --- You're missing the point of the cluster. It's not for performance > (where I can't imagine that the slowest single CPU PC out there > wouldn't blow the figurative doors off). It's to provide a very > inexpensive way to experiment/play/demonstrate loosely coupled > multiprocessor systems. > > --> for example, you could experiment with redundant message > routing across a fabric of nodes. The algorithms are fairly simple, > and this gives you a testbed which is qualitatively > different than just simulating a bunch of nodes on a single PC. > There is pedagogical value in a system where you can force a link > error by just disconnecting the cable, and your blinky lights on each > node show what's going on. > > > There is still too much years 80s and years 90s software out there, > written by the guys who wrote books about how to parallellize, which > simply doesn't scale at all at modern hardware. > > --> I think that a lot of the theory of parallel processes is > speed independent, and while some historical approaches might not be > used in a modern system for good implementation reasons, students and > others still need to learn about them, if only as the > canonical approach. Sure, you could do a simulation on a single > PC (and I've seen them, in Simulink, and in other more specialized > tools), but there's a lot of appeal to a hands-on-the-cheap- hardware > approach to learning. > > --> To take an example, if you set a student a problem of lighting > a LED on each node in a specified node order at specified intervals, > and where the node interconnects are not specified in advance, that's > a fairly interesting homework problem. You have to discover the > network connectivity graph, then figure out how to > pass the message to the appropriate node at the appropriate time. > This is a classic "hot plug network discovery" kind of problem, and in > the face of intermittent links, it's of great interest. > > --> While that particular problem isn't exactly HPC, it DOES relate > to HPC in a world where you cannot assume perfect processor nodes and > perfect communications links. And that gets right to the whole > "scalability" thing in HPC. It wasn't til the implementation of Error > Correcting Codes in logic that something like the Q7A computer was > even possible, because it was so large that you couldn't guarantee > that all the tubes would be working all the time. Likewise with many > other aspects of modern computing. > > --> And, of course, in the spaceflight world, this kind of thing is > even more important. 
> A concept of growing importance is the "fractionated spacecraft", where all of the functions that would have been all in one physical vehicle are now spread across many smaller pieces. And one might reallocate spacecraft fractional pieces between different virtual spacecraft. Maybe right now you need a lot of processing power to do image compression and analysis, so you want to allocate a lot of "processing pieces" to the job, with an ad hoc network connection among them. Later, you don't need them, so you can release them to other uses. The pieces might be in the immediate vicinity, or they might be some distance away, which affects the data rate in the link and its error rates.
>
> --> You can legitimately ask whether this sort of thing (the fractionated spacecraft) is a Beowulf (defined as a cluster supercomputer built of commodity components) and I would say it shares many of the same properties, especially in the early Beowulf days before multicores and fancy interconnects were fashionable for multi-thousand-processor clusters. It's that idea of building a large complex device out of many basically identical subunits, using open source/simple software to manage it.
>
> -->> in summary, it's not about performance.. it's about a teaching tool for networking in the context of cluster computing. You claim we need to cast off the shackles of old programming styles and get some new blood and ideas. Well, you need to get people interested in parallel computing and learning the basics (so at least they don't reinvent the square wheel). One way might be challenges such as parallelization of game play; another might be working with a parallelized database; the way I propose is experimenting with message-passing parallelization using dirt cheap hardware.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Wed Jan 11 20:22:07 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 11 Jan 2012 17:22:07 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <201201121204.32332.samuel@unimelb.edu.au>
References: <201201121204.32332.samuel@unimelb.edu.au>
Message-ID: 

Interesting... That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 Model B board has Ethernet, and assuming one could netboot and operate "headless", then a stack o' Raspberry Pis and a cheap Ethernet switch might be an alternate approach.

The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run. Drawing 700mA off the microUSB, though.. That's fairly hefty (although not a big deal in general..
you might need to have some better power supply scheme for a basket o' Pi cluster. (Arduino Uno runs around 40-50 mA.)

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Chris Samuel
Sent: Wednesday, January 11, 2012 5:05 PM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

On Thu, 12 Jan 2012 04:58:13 AM Lux, Jim (337C) wrote:

> Also, does the Raspberry PI $25 price point include a power supply?

I thought the plan was for them to be powered from the HDMI connector, but it appears I was wrong; it looks like it can use either microUSB or the GPIO header.

http://elinux.org/RaspberryPiBoard

# The board takes fixed 5V input, (with the 1V2 core voltage generated
# directly from the input using the internal switch-mode supply on the
# BCM2835 die). This permits adoption of the micro USB form factor,
# which, in turn, prevents the user from inadvertently plugging in
# out-of-range power inputs; that would be dangerous, since the 5V
# would go straight to HDMI and output USB ports, even though the
# problem should be mitigated by some protections applied to the input
# power: The board provides a polarity protection diode, a voltage
# clamp, and a self-resetting semiconductor fuse.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Wed Jan 11 21:03:21 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 03:03:21 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: 
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> 
Message-ID: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>

The whole purpose of PCs is that they are generic to use. I remember how, in the past, the decision takers bought low-clocked junk for a big price - much against the wishes of the sysadmins, who wanted a PC for every student exclusively. Outdated slow junk is not interesting to students. Now you and I might like that CPU as it's under $1, but to them it's just 70 MHz, a factor of 500 slower than a single core of their home PC. What impresses is if you've got something that can beat their own machine at home.

In the end, in science we basically learn a lot more easily if we can take a look into the future - so being faster than a single PC is a good example of that. So let them do that.

If you take care to launch one process on each machine, then with quad-core machines, not to mention i7s with hyperthreading, you can have 24 computers on one switch that serve 24 students, each using 12 logical cores. And for demonstration purposes you can run successful applications on all 24 computers at the same time. Hey, there are switches with even more ports.

The average price per student is going to beat the crap out of any junk solution you show up with - besides, how many are you going to buy? Those computers are already there, one for each student, I suspect. So they can toy with them exclusively - for the switch it's not a real problem, except if they really mess up. But most important, they learn something - by toying with 70 MHz hardware that's not representative, and only interesting to experts like you and me who are really good at embedded programming, they don't learn much.
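(For what it's worth, the one-process-per-machine layout Vincent describes is easy to express with most MPI launchers. A sketch assuming Open MPI's hostfile syntax; the hostnames are invented for illustration:)

    # hostfile: advertise one MPI slot per physical machine
    lab01 slots=1
    lab02 slots=1
    # ... one line per machine, through lab24

    $ mpirun --hostfile hostfile -np 24 ./demo_app

(Each rank then has a whole machine to itself and can spin up threads for the remaining cores, e.g. via OpenMP and OMP_NUM_THREADS.)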
There is no replacement for the real thing to test on. Besides, if we both go program embedded processors, my good, fast single-CPU code is probably going to kick the hell out of your version of the same program on 8 CPUs. It'll probably be a factor of 10+ faster on a single core than yours on 8.

p.s. Not that it's disturbing, Jim, but your replies are always typed within my original message, so it's sometimes tough to read what you typed into the message I posted here - maybe this Apple MacBook Pro's mail program doesn't know how to handle it. FYI, I want to reformat it to Linux anyway - I'm getting sick of being hacked silly every time by about every other consultant. But this is all off topic - hence the postscriptum.

On Jan 12, 2012, at 2:09 AM, Lux, Jim (337C) wrote:

> [full quote of Jim's previous message snipped]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From eagles051387 at gmail.com Thu Jan 12 02:42:26 2012
From: eagles051387 at gmail.com (Jonathan Aquilina)
Date: Thu, 12 Jan 2012 08:42:26 +0100
Subject: [Beowulf] clustering using off the shelf systems in a fish tank full of oil.
In-Reply-To: <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>
References: <4EFB5AAE.3030900@gmail.com> <715C5657-461B-41E7-9591-5DF89F3CC285@xs4all.nl> <4EFC8D03.4020406@gmail.com> <5AF52A05-28AA-4EE5-A081-EA60BD1E9B32@xs4all.nl> <4EFC9540.5010906@gmail.com> <4F020244.4040505@ias.edu> <1820F354-C0D4-4337-A9EB-DDBD9CB50761@xs4all.nl>
Message-ID: <4F0E8EE2.7040403@gmail.com>

On 11/01/2012 18:30, Vincent Diepeveen wrote:
> On Jan 2, 2012, at 8:15 PM, Prentice Bisbal wrote:
>> On 12/29/2011 07:50 PM, Vincent Diepeveen wrote:
>>> it's very useful Mark, as we know now he works for the company and also for which nation.
>>>
>>> Vincent
>> For someone who's always bashing on US foreign policy, you sure sound like a Republican or a member of the Department of Homeland Security!
> Where is my paycheck?

FYI Vincent, I am now back in Malta.

Regards
Jonathan Aquilina

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Thu Jan 12 03:49:45 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 12 Jan 2012 09:49:45 +0100
Subject: [Beowulf] the Barcelona Supercomputing Center
Message-ID: <20120112084945.GD21917@leitl.org>

Just some cluster porn:

http://imgur.com/a/OoNVI

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at mclaren.com Thu Jan 12 05:16:28 2012
From: john.hearns at mclaren.com (Hearns, John)
Date: Thu, 12 Jan 2012 10:16:28 -0000
Subject: [Beowulf] A cluster of Arduinos
References: <201201121204.32332.samuel@unimelb.edu.au>
Message-ID: <207BB2F60743C34496BE41039233A8090A7D728A@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

> Interesting... That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 Model B board has Ethernet, and assuming one could netboot and operate "headless", then a stack o' Raspberry Pis and a cheap Ethernet switch might be an alternate approach.

Regarding Ethernet switches, I had cause recently to look for a USB-powered switch. Such things exist; they are promoted for gamers.

http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casing-10-100-switch-usb-powered-lan-party!

You could imagine a cluster being powered by those USB adapters which fit into the cigarette lighter socket of a car. How about a cluster which fits in the glovebox or under the seat of a car?

The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From peter.st.john at gmail.com Thu Jan 12 08:49:16 2012
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 12 Jan 2012 08:49:16 -0500
Subject: [Beowulf] the Barcelona Supercomputing Center
In-Reply-To: <20120112084945.GD21917@leitl.org>
References: <20120112084945.GD21917@leitl.org>
Message-ID: 

The architectural contrast (the building housing the racks is a chapel) is vivid. Sorta steampunkish. The place is described some at http://www.bsc.es/plantillaA.php?cat_id=1 (many of their pages seem to be in English).

Peter

On Thu, Jan 12, 2012 at 3:49 AM, Eugen Leitl wrote:
> Just some cluster porn:
>
> http://imgur.com/a/OoNVI

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ellis at runnersroll.com Thu Jan 12 08:58:20 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 08:58:20 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
Message-ID: <4F0EE6FC.2050002@runnersroll.com>

On 01/11/2012 09:03 PM, Vincent Diepeveen wrote:
> The whole purpose of PCs is that they are generic to use. I remember how, in the past, the decision takers bought low-clocked junk for a big price - much against the wishes of the sysadmins, who wanted a PC for every student exclusively. Outdated slow junk is not interesting to students. Now you and I might like that CPU as it's under $1, but to them it's just 70 MHz, a factor of 500 slower than a single core of their home PC. What impresses is if you've got something that can beat their own machine at home.
>
> In the end, in science we basically learn a lot more easily if we can take a look into the future - so being faster than a single PC is a good example of that.

Take this advice in any other area - let's say chemical engineering or mechanical engineering - and the students are going to come out of the experience with chemical burns at the least, or having blown up half of the building at the most. In the best case all they do is screw up very, very expensive equipment. So I have to respectfully disagree that learning is only possible, and students only interested, when working on the stuff of the "future." I think this is likely the reason why many introductory engineering classes incorporate Lego Mindstorms robots rather than lunar rovers (or even overstock lunar rovers :D).

Case in point: I got interested in HPC/Beowulfery back in 2006, read RGB's book and a few other texts on it, and finally found a small group (4) of unused PIIIs to play on in the attic of one of my college's buildings. Did I learn how to set up a reasonable cluster? Yes. Was it slow as dirt compared to then-modern Intel and AMD processors? Of course. But did the experience get me so completely hooked on HPC/cluster research that I went on to pursue a PhD on the topic? Absolutely.

Granted, I'm just one data point, but I think Jim's idea has all the right components for a great educational experience.

Best,

ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:28:56 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:28:56 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: 
References: <201201121204.32332.samuel@unimelb.edu.au>
Message-ID: <4F0EEE28.6030404@ias.edu>

On 01/11/2012 08:22 PM, Lux, Jim (337C) wrote:
> Interesting... That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 Model B board has Ethernet, and assuming one could netboot and operate "headless", then a stack o' Raspberry Pis and a cheap Ethernet switch might be an alternate approach.
>
> The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run.
You can get an Ethernet "shield" for the Arduino to add Ethernet capability, but at $35-50 each, your cost savings just went out the window, especially when compared to the Raspberry Pi. You can also buy the Arduino Ethernet, which is an Arduino board with Ethernet built in, but at a cost of ~$60 it is no better a value than buying an Arduino and the Ethernet shield separately.

> Drawing 700mA off the microUSB, though.. That's fairly hefty (although not a big deal in general.. you might need to have some better power supply scheme for a basket o' Pi cluster. (Arduino Uno runs around 40-50 mA.)

The Arduino can be powered by USB or a 9V power supply, so if you plan on using lots of them (as Jim is, theoretically), you don't have to worry about overloading the USB bus.

-- 
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Thu Jan 12 09:35:50 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 06:35:50 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <207BB2F60743C34496BE41039233A8090A7D728A@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: 

On 1/12/12 2:16 AM, "Hearns, John" wrote:

> Regarding Ethernet switches, I had cause recently to look for a USB-powered switch. Such things exist; they are promoted for gamers.
> http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casing-10-100-switch-usb-powered-lan-party!
>
> You could imagine a cluster being powered by those USB adapters which fit into the cigarette lighter socket of a car. How about a cluster which fits in the glovebox or under the seat of a car?

Powering off the cigarette lighter socket (or 12V power socket, as they're now labeled) is probably feasible, but those USB widgets can't source a lot of power. Certainly not amps.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
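(Rough numbers for the power question, using only figures quoted in this thread; the converter efficiency is an assumption, so treat this as back-of-envelope:)

    8 Raspberry Pi nodes:   8 x 0.7 A x 5 V   = 28 W at 5 V
    From a 12 V car socket through a ~85%-efficient
    buck converter:         28 W / 0.85 / 12 V ~ 2.7 A at 12 V
    8 Arduino Unos:         8 x 0.05 A x 5 V  = 2 W (~0.4 A at 5 V)

(So a lighter socket, typically fused at 10-15 A, could carry a small Pi stack comfortably, while a single USB port's nominal 500 mA at 5 V can't even feed one Pi; a basket of Unos, by contrast, fits within one powered hub.)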
From diep at xs4all.nl Thu Jan 12 09:39:23 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 15:39:23 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EE6FC.2050002@runnersroll.com>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com>
Message-ID: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>

The average guy is not interested in knowing all the details of how to play tennis with a wooden racket from the 1980s, from around the time McEnroe was out there on the tennis court. Most people are more interested in whether you can win that grand slam with what you produce.

The nerds, however, are interested in how well you can do with a wooden racket from the 1980s. Therefore, projecting your own interest onto those students will just get them disinterested, and you will be judged by them as an irrelevant person in their life, whose name they soon forget.

Vincent

On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:

> [full quote of Ellis's message above snipped]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:38:13 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:38:13 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl>
Message-ID: <4F0EF055.3050609@ias.edu>

On 01/11/2012 09:03 PM, Vincent Diepeveen wrote:
> The whole purpose of PCs is that they are generic to use.

That is also the purpose of the Arduino. That's why they open-sourced its hardware design.

> I remember how, in the past, the decision takers bought low-clocked junk for a big price - much against the wishes of the sysadmins, who wanted a PC for every student exclusively. Outdated slow junk is not interesting to students. Now you and I might like that CPU as it's under $1, but to them it's just 70 MHz, a factor of 500 slower than a single core of their home PC. What impresses is if you've got something that can beat their own machine at home.

Wrong. What impresses students is teaching them something they didn't already know, or showing them how to do something new. Using baking soda and vinegar to build a volcano is very low-tech, but it still impresses students of all ages (even in this modern Apple i-everything world), and it's done with ingredients just about everyone already has in their kitchen. Show them sodium acetate crystallizing out of a supersaturated solution, and their heads practically explode. Also very low-tech.

-- 
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Thu Jan 12 09:50:05 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 12 Jan 2012 09:50:05 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
Message-ID: <4F0EF31D.8010603@ias.edu>

On 01/12/2012 09:39 AM, Vincent Diepeveen wrote:
> The average guy is not interested in knowing all the details of how to play tennis with a wooden racket from the 1980s, from around the time McEnroe was out there on the tennis court. Most people are more interested in whether you can win that grand slam with what you produce.
> The nerds, however, are interested in how well you can do with a wooden racket from the 1980s. Therefore, projecting your own interest onto those students will just get them disinterested, and you will be judged by them as an irrelevant person in their life, whose name they soon forget.

Vincent, I think the only person projecting here is you. You refer to the 'average guy'. The word 'average' itself implies that statistics have been collected and analyzed. Can you please show us your statistics, and how you collected them, to determine what the average guy is interested in? And what about the average girl - what is she interested in? If you are merely citing the work of other researchers, please include citations.

-- 
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ellis at runnersroll.com Thu Jan 12 09:53:57 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 09:53:57 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EF31D.8010603@ias.edu>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> <4F0EF31D.8010603@ias.edu>
Message-ID: <4F0EF405.5070600@runnersroll.com>

On 01/12/2012 09:50 AM, Prentice Bisbal wrote:
> [full quote of Prentice's message above snipped]

Guys, let's just let this one die in its traditional form of "Vincent disagrees with the list and there is nothing more that can be done." I recently read a blog that suggested (due to similar threads following these trajectories) that the Wulf list wasn't what it used to be.

Let's save the flames for editors,

ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From diep at xs4all.nl Thu Jan 12 10:03:49 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:03:49 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EF31D.8010603@ias.edu>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> <4F0EF31D.8010603@ias.edu>
Message-ID: 

Very simple. Wooden tennis rackets were dirt cheap in the 90s. No one bought them. Instead, they all bought a light-frame racket with a big blade for the tennis court; in fact, those were pretty expensive in some cases. Why did no one suddenly use those wooden rackets anymore?

How many people will watch the upcoming Australian grand slam? A lot. How many will watch one or two dudes toy with a few embedded processors using a language no one has heard of? Only a handful.

On Jan 12, 2012, at 3:50 PM, Prentice Bisbal wrote:

> [full quote of Prentice's message above snipped]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Thu Jan 12 10:10:40 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 07:10:40 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
Message-ID: 

On 1/12/12 6:39 AM, "Vincent Diepeveen" wrote:

> The average guy is not interested in knowing all the details of how to play tennis with a wooden racket from the 1980s, from around the time McEnroe was out there on the tennis court. Most people are more interested in whether you can win that grand slam with what you produce.
> The nerds, however, are interested in how well you can do with a wooden racket from the 1980s. Therefore, projecting your own interest onto those students will just get them disinterested, and you will be judged by them as an irrelevant person in their life, whose name they soon forget.

Having spent some time recently in Human Resources meetings about how to better recruit software people for JPL, I'd say that something that appeals to nerds and gives them something to do is not all bad. Part of the educational process is to find and separate the people who are interested and have a passion. I'm not sure that someone who starts getting into clusters mostly because they are interested in breaking into the Top500 is the target audience in any case.

If you look over the hobby clusters out there, the vast majority are "hey, I heard about this interesting idea, I scrounged up N old/small/slow/easy to find computers and tried to cluster them and do something. I learned something about cluster administration, and it was fun, but I don't use it anymore".

This is exactly the population you want to hit. Bring in 100 advanced high school (grade 11-12 in US) students. Have them all use cheap hardware to do a cluster. Some fraction will think, "this is kind of cool, maybe I should major in CS instead of X". Some fraction will think, "how lame, why not make the single processor faster", and they can be CompEng or EE majors looking at how to reduce feature sizes and get the heat out.

It's just like biology or chemistry classes. In high school biology (9th/10th grade) most of it is mundane memorization (the Krebs cycle, various descriptive stuff). Other than the use of cheap CMOS cameras, microscopes used at this level haven't really changed much in the last 100 years (and the microscopes at my kids' school are probably 10-20 years old). They also do some more modern molecular biology in a series of labs partly funded by Amgen: some recombinant DNA to put fluorescent proteins in bacteria, running some gels, etc. The vast majority of the students will NOT go on to a career in biology, but some fraction do; they get interested in some aspect, and they wind up majoring in bio, or being a pre-med, etc.

Not everyone is looking for the world beater. A lot of kids start with kart racing, even though even the fastest karts aren't as fast as F1 (or even a Smart Car). How many engineers started with dismantling the lawnmower engine?

For my own work, I'd rather have people who are interested in solving problems by ganging up multiple failure-prone processors, rather than centralizing it all in one monolithic box (even if the box happens to have multiple cores).

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
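(For concreteness, the flavor of the LED exercise Jim described earlier in the thread might start from a token-passing sketch like the one below. This is a minimal, hypothetical sketch assuming a chain of Arduinos wired as an I2C bus via the standard Wire library; the addresses are invented, and a real multi-board I2C setup needs care with pull-ups and bus contention, so treat it as an illustration rather than a tested lab handout:)

    #include <Wire.h>

    const int MY_ADDR   = 8;    // this board's I2C address (assumed; set per board)
    const int NEXT_ADDR = 9;    // downstream neighbour's address (assumed)
    const unsigned long HOLD_MS = 500;   // how long to keep the LED lit

    volatile bool haveToken = false;

    // Called by the Wire library when the upstream board sends us the token.
    void onReceive(int n) {
      while (Wire.available()) Wire.read();  // drain the payload; content unused
      haveToken = true;
    }

    void setup() {
      pinMode(LED_BUILTIN, OUTPUT);
      Wire.begin(MY_ADDR);            // join the bus, listening at our address
      Wire.onReceive(onReceive);
      // One designated board would inject the first token here.
    }

    void loop() {
      if (haveToken) {
        haveToken = false;
        digitalWrite(LED_BUILTIN, HIGH);     // show that the token has arrived
        delay(HOLD_MS);
        digitalWrite(LED_BUILTIN, LOW);
        Wire.beginTransmission(NEXT_ADDR);   // pass the token downstream
        Wire.write(1);
        Wire.endTransmission();
      }
    }

(The discovery variant - where NEXT_ADDR is not known in advance and has to be probed - is where this turns into the "hot plug network discovery" homework problem.)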
From diep at xs4all.nl Thu Jan 12 10:13:00 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:13:00 +0100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0EF405.5070600@runnersroll.com>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl> <4F0EF31D.8010603@ias.edu> <4F0EF405.5070600@runnersroll.com>
Message-ID: 

On Jan 12, 2012, at 3:53 PM, Ellis H. Wilson III wrote:
> On 01/12/2012 09:50 AM, Prentice Bisbal wrote:
>> [snipped]
>
> Guys, let's just let this one die in its traditional form of "Vincent disagrees with the list and there is nothing more that can be done."

Ah, no medicine seems to cure you. Let me recall the original posting of Jim:

"it seems you could put together a simple demonstration of parallel processing and various message passing things."

The insights presented here obviously render this platform as no good for that; it is not inspiring, and for sure the clever students will get totally disinterested, and a bunch of them, out of disinterest, will probably not even finish the course. Working with stuff that isn't even within a factor of 500 of the speed of a normal CPU doesn't motivate, doesn't inspire, and basically teaches a person very little.

Embedded CPUs are for professionals; leave it at that. They are too hard for you to program efficiently.

> I recently read a blog that suggested (due to similar threads following these trajectories) that the Wulf list wasn't what it used to be.
>
> Let's save the flames for editors,
>
> ellis

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From diep at xs4all.nl Thu Jan 12 10:21:54 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 12 Jan 2012 16:21:54 +0100 Subject: [Beowulf] A cluster of Arduinos In-Reply-To: References: Message-ID: <4B53A30C-FF3E-4996-916F-2D8455C90C5D@xs4all.nl> On Jan 12, 2012, at 4:10 PM, Lux, Jim (337C) wrote: > > > On 1/12/12 6:39 AM, "Vincent Diepeveen" wrote: > >> The average guy is not interested in knowing all details regarding >> how to >> play tennis with a wooden racket from the 1980s, just around >> the time when McEnroe was on the tennisfield playing there. >> >> Most people are more interested in whether you can win that grandslam >> with what you produce. >> >> The nerds however are interested in how well you can do with a wooden >> racket >> from 1980s,therefore projecting your own interest upon those students >> will just >> get them desinterested and you will be judged by them as an >> irrelevant person >> in their life, whose name they soon forget. >> > > Having spent some time recently in Human Resources meetings about > how to > better recruit software people for JPL, I'd say that something that > appeals to nerds and gives them something to do is not all bad. > Part of > the educational process is to find and separate the people who are > interested and have a passion. I'm not sure that someone who starts > getting into clusters mostly because they are interested in > breaking into > the Top500 is the target audience in any case. > > If you look over the hobby clusters out there, the vast majority > are "hey, > I heard about this interesting idea, I scrounged up N old/small/ > slow/easy > to find computers and tried to cluster them and do something. I > learned > something about cluster administration, and it was fun, but I don't > use it > anymore" > > This is exactly the population you want to hit. Bring in 100 advanced > high school (grade 11-12 in US) students. Have them all use cheap > hardware to do a cluster. Some fraction will think, "this is kind of > cool, maybe I should major in CS instead of X" Some fraction will > think, Your example here will just take care a big number of students don't want to have to do anything with those studies, as there is a few lame nerds there who toy with equipment that's factor 50k slower (adding to the factor 500 the object oriented slowdown of factor 100) than what they have at home, and it can do nothing useful. But in this specific case you'll just scare away students and the real clever ones will get total desinterested as you are busy with lame duck speed type cpu's. If you'd build a small marsrover with it that would be something else of course. > "how lame, why not make the single processor faster", and they can be > CompEng or EE majors looking at how to reduce feature sizes and get > the > heat out. > > It's just like biology or chemistry classes. In high school biology > (9th/10th grade) most of it is mundane memorization (Krebs cycle, > various > descriptive stuff. Other than the use of cheap cmos cameras, > microscopes > used at this level haven't really changed much in the last 100 > years (and > the microscopes at my kids' school are probably 10-20 years old). They > also do some more modern molecular biology in a series of labs partly > funded by Amgen: Some recombinant DNA to put fluorescent proteins > in a > bacteria, running some gels, etc. 
> The vast majority of the students will NOT go on to a career in
> biology, but some fraction do, they get interested in some aspect,
> and they wind up majoring in bio, or being a pre-med, etc.
>
> Not everyone is looking for the world beater. A lot of kids start
> with Kart racing, even though even the fastest Karts aren't as fast
> as F1 (or even a Smart Car). How many engineers started with
> dismantling the lawnmower engine?
>
> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure-prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to
> have multiple cores).

From james.p.lux at jpl.nasa.gov Thu Jan 12 10:35:41 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 07:35:41 -0800
Subject: [Beowulf] List traffic
In-Reply-To: <4F0EF405.5070600@runnersroll.com>
Message-ID:

On 1/12/12 6:53 AM, "Ellis H. Wilson III" wrote:

> I recently read a blog that suggested (due to similar threads
> following these trajectories) that the Wulf list wasn't what it used
> to be.

I think that's for a variety of reasons..

The cluster world has changed. Back 15-20 years ago, clusters were new, novel, and pretty much roll your own, so there was a lot of traffic on the list about how to do that. Remember all the mobo comparisons, and all the carefully teased out idiosyncrasies of various switches and network schemes.

Back then, the idea of using a cluster for "big computing" was kind of new, as well. People building clusters were doing it either because the architecture was interesting OR because they had a computing problem to solve, and a cluster was a cheap way to do it, especially with free labor.

I think clustering has evolved, and the concept of a cluster is totally mature. You can buy a cluster essentially off the shelf, from a whole variety of companies (some with people who were participating in this list back then and still today), and it's interesting how the basic Beowulf concept has evolved.

Back in the late 90s, it was still largely "commodity computers, commodity interconnects", where the focus was on using "business class" computers and networking hardware. Perhaps not consumer, as cheap as possible, but certainly not fancy, schmancy rack mounted 1U servers. The switches people were using were just ordinary network switches, the same as in the wiring closet down the hall.

Over time, though, there has developed a whole industry supplying components specifically aimed at clusters: high speed interconnects, computers, etc. Some of this just follows the IT industry in general; there weren't as many "server farms" back in 1995 as there are now.

Maybe it's because the field has matured?

So, we're back to talking about "roll-your-own" clusters of one sort or another. I think anyone serious about big cluster computing (>100 nodes) probably won't be hanging on this list looking for hints on how to route and label their network cables. There are too many other places to go get that information, or, better yet, places to hire someone who already knows.
I know that if I needed massive computational power at work, my first thought these days isn't "hey, let's build a cluster", it's "let's call up the HPC folks and get an account on one of the existing clusters".

But I still see the need to bring people into the cluster world in some way. I don't know where the cluster vendors find their people, or even what sorts of skill sets they're looking for. Are they beating the bushes at CMU, MIT, and other hotbeds of CS looking for prior cluster design experience? I suspect not, just like most of the people JPL hires don't have spacecraft experience in school, or anywhere. You look for bright people who might be interested in what you're doing, and they learn the details of cluster-wrangling on the job.

For myself, I like probing the edges of what you can do with a cluster. Big computational problems don't excite me. I like thinking about things like:

1) What can I use from the body of cluster knowledge to do something different? A distributed cluster is topologically similar to one all contained in a single rack, but it's different. How is it different (latency, error rate)? Can I use analysis (particularly from early cluster days) to do a better job?

2) I've always been a fan of *personal* computing (probably from many years of negotiating for a piece of some shared resource). It's tricky here, because as soon as you have a decent 8 or 16 node cluster that fits under a desk, and have figured out all the hideous complexity of how to port some single user application to run on it, someone comes out with a single processor box that's just as fast, and a lot easier to use. Back in the 80s, I designed, but did not build, an 80286 clone using discrete ECL logic, the idea being to make a 100MHz IBM PC-AT that would run standard spreadsheet software 20 times faster (a big deal when your huge spreadsheet takes hours to recalculate). However, Moore's law and Intel made that idea a losing proposition.

But still, the idea of personal control over my computing resources is appealing. Nobody watching to see "are you effectively using those cpu cycles". No arguing about annual re-adjustment of chargeback rates where you take the total system budget and divide it by CPU seconds. Ooops, not enough people used it, so your CPU costs just quadrupled.

3) I'm also interested in portable computing (yes, I have a NEC 8201 (a TRS-80 Model 100 clone) and a TI-59; I did sell the Compaq, but I had one of those too, etc.). This is another interesting problem space: no big computer room with infrastructure. Here, the fascinating trade is between local computer horsepower and cheap long distance datacomm. At some point, it's cheaper/easier to send your data via satellite link to a big computer elsewhere and get the results back. It's the classic 60s remote computing problem revisited once again.
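Jim's third point, the trade between local horsepower and cheap long-distance datacomm, is easy to put rough numbers on. A back-of-envelope sketch follows; every figure in it is an invented placeholder, not a measurement from anyone's system:

/* breakeven.c - when does shipping a job over a slow link to a big
 * remote machine beat running it locally? Toy model: remote time =
 * transfer time + remote compute time. All numbers are made up. */
#include <stdio.h>

int main(void)
{
    double work         = 1e12;     /* FLOPs in the job              */
    double local_flops  = 1e8;      /* small portable node           */
    double remote_flops = 1e11;     /* big machine at the far end    */
    double bytes_moved  = 2e8;      /* inputs out plus results back  */
    double link_Bps     = 1e6 / 8;  /* ~1 Mbit/s satellite link      */

    double t_local  = work / local_flops;
    double t_remote = bytes_moved / link_Bps + work / remote_flops;

    printf("local : %8.0f s\n", t_local);
    printf("remote: %8.0f s (of which %.0f s is datacomm)\n",
           t_remote, bytes_moved / link_Bps);
    printf("=> %s wins\n", t_local < t_remote ? "local" : "remote");
    return 0;
}

With these placeholder numbers the remote machine wins by roughly a factor of six despite the 1600 seconds spent on the link; shrink the job or fatten the data and the answer flips, which is the whole point of the trade.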
From diep at xs4all.nl Thu Jan 12 10:56:32 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 16:56:32 +0100
Subject: [Beowulf] Robots
In-Reply-To: <4F0EE6FC.2050002@runnersroll.com>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com>
Message-ID: <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl>

On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:

> I think this is likely the reason why many introductory engineering
> classes incorporate use of Lego Mindstorm robots rather than lunar
> rovers (or even overstock lunar rovers :D).

I didn't comment on the other completely wrong examples, but I want to
highlight one. Your example of a Lego robot actually disproves your
statement.

Amongst the affordable non-self-built robots, the Lego robot actually
is a genius robot. It is, so to speak, the i7-3960x of robots, to
compare it with the fastest i7 that has been released to date. It is
affordable, it is completely programmable with a robot OS, and if you
want to build something better you need to be pretty much a genius.

A custom robot, unless you build a really simple, stupid thing that
can do next to nothing, will be really expensive compared to such a
Lego robot, which goes for oh, a couple of hundred dollars only. I see
it for around 280 dollars online, and adding components is just a few
dozen dollars per component.

The normal way to build 'something better', if better at all, requires
building most components yourself, for example from aluminium. Each
component then has a price of, say, roughly $5k and needs to be
specially engineered. You need many of those components. We assume then
it's not a commercial project, otherwise royalties will also be
involved for every component you build; of course that's a small part
of the above price.

Most custom robots, which are hardly bigger in size than the Lego
robot, are actually pretty expensive. If you want to purchase
components for a tad bigger robot, just something with 4 wheels which
can hold a couple of dozen kilos, such components already run $5k -
$10k. And those are mass-produced components. So building something
that actually is more functional, better, is not gonna be easy.

It's a genius robot, it really is. In itself it's not really a lot
more expensive to build a bigger robot, if you produce it in the
quantities at which Lego produces it. The reason the Lego robot is
very small really has to do with safety. Big robots are really
dangerous, you know. Cars already use dozens of cpu's; 10+ year old
cars easily have over 100 cpu's inside, just for safety, with the
intent that components of the car don't harm people. Robot software is
far too primitive there yet; it has nothing like those safety
provisions. In all that, the Lego robot is really a genius thing.

It's a very bad example of what you 'tried' to show with some fake
arguments.

> Case in point: I got interested in HPC/Beowulfery back in 2006, read
> RGB's book and a few other texts on it, and finally found a small
> group (4) of unused PIIIs to play on in the attic of one of my
> college's buildings. Did I learn how to setup a reasonable cluster?
> Yes. Was it slow as dirt compared to then-modern Intel and AMD
> processors? Of course. But did the experience get me so completely
> hooked on HPC/Cluster research that I went on to pursue a PhD on the
> topic? Absolutely.
> Granted, I'm just one data point, but I think Jim's idea has all the
> right components for a great educational experience.
>
> Best,
>
> ellis

From diep at xs4all.nl Thu Jan 12 11:45:29 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 17:45:29 +0100
Subject: [Beowulf] List traffic
In-Reply-To:
References:
Message-ID: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>
Up to 2002 i used to visit someone, Jan Louwman, who had 36 computres at home, testing chessprograms at home. So that wasn't a cluster, just a bunch of machiens, in sets of 2 machines connected with a special cable we used to play back then machines against each other. Nearly all of those machines was 60-100 watt or so. He had divided his computers over 3 rooms or so, majority in 1 room though. There the 16 ampere @ 230 volt power plug already had problems supplying this amount of electricity. Around the power plug in the wall, the wall and plastic of the powerplug were completely black burned. As there was only a single P4 machine amongst the computers, only 1 box really consumed a lot of power. Try to run 36 computers at home nowadays. Most machines are well over 250 watt, and the fastest 2 machines i've got here eat 410 respectively 270 watt. That's excluding the videocard in the 410 watt machine, as it's out of it currently (AMD HD 6970), the box has been setup for gpgpu. 36 machines eat way way too much power. This is a very simple practical problem that one shouldn't overlook. It's not realistic that the average joe sets up at his popular gaming program a cluster of more than 2 machines or so. A 2 machine cluster will never beat a 2 socket machine, except when each node also has 2 sockets. So clustering simple home computers together isn't really useful except if you really cluster together half a dozen or more. Half a dozen machines, using the 250 watt measure and another 25 watt for each card and 200 watt for the switch, it's gonna eat 6 * 275 + 200 = 1850 watt. You really need diehards for that. They are there and more than you and i guess, but they need SOFTWARE that interests them that can use it in a very efficient manner, clearly proven to them to be working great and easy to install, which refers to point 11. 101) most people like to buy new stuff. new cluster hardware is very expensive for more than 2 computers as it needs a switch. Second hand it's a lot cheaper, sometimes even dirt cheap, yet that's already not what most people like to do 110) Linux had a few setbacks and got less attractive. Say when we had redhat end 90s with x-windows it was slowly improving a lot. Then x64 was there together with a big dang and we went back years and years to x.org. X.org threw back linux 10 years in time. It eats massive RAM, it's ugly bad, it's slow, it's difficult to configure etc. Basically there isn't many good distributions now that are for free. As most clusters work only very well under linux, the difficulty of using linux should really be factored in. Have a problem under linux? Then forget it as a normal user. Now for me linux got MORE attractive as i get hacked total silly by every consultant who on this planet knows how to hack on the internet, yet that's not representative for those with cash who can afford a cluster. Note i don't fall into the cash group. My total income in 2011 was real little. 111) Usually the big cash to afford a cluster is for people with a good job or a tad older, that's usually a different group than the group that can work with linux. See the previous points for that Despite all that i believe clusters will get more popular in future, for a simple reason: processors don't really clock higher. So all software that can use additional calculation power already is getting parallellized or already has been paralelllized. It's a matter of time before some of those applications also will work well at cluster hardware. 
Yet this is a slow process, and it really requires software that works really efficiently at small numbers of nodes.

As an example of why I feel this will happen, I give you the popularity amongst gamers of running 2 graphics cards connected to each other via a bridge within 1 machine. The important factor there is that the games really profit from doing that.

On Jan 12, 2012, at 4:35 PM, Lux, Jim (337C) wrote:

> [snip -- Jim's "List traffic" message, quoted here in full above]

From deadline at eadline.org Thu Jan 12 11:49:25 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 12 Jan 2012 11:49:25 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID:

snip

> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure-prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to
> have multiple cores).

This is going to be an exascale issue, i.e. how to compute on a system whose parts might be in a constant state of breaking. Another interesting question is: how do you know you are getting the right answer on a *really* large system?
Of course I spend much of my time optimizing really small systems.

--
Doug

From diep at xs4all.nl Thu Jan 12 11:58:32 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 12 Jan 2012 17:58:32 +0100
Subject: [Beowulf] Adding 1 point
In-Reply-To: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>
References: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>
Message-ID:

There is another reason I should add for what really made small clusters at home less attractive: the rise of cheap multi-socket machines.

A 2 socket machine is not so expensive anymore nowadays. So if you want faster than 1 socket, you buy a 2 socket machine. If you want faster than that, 4 sockets is there. That choice wasn't easily available before the end of the 90s, and in the 21st century it has become cheap.

Another delaying factor is the rise of so many cores per node. AMD and Intel sell cpu's for their 4 socket lines with up to double the number of cores of what you can have in a single socket box. So a 4 socket machine is nearly the equivalent of 8 single socket nodes, be it low clocked.

For that reason clusters tend to get effective only at a dozen nodes or more, assuming cheap single socket nodes.

From ellis at runnersroll.com Thu Jan 12 12:26:01 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:26:01 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4B53A30C-FF3E-4996-916F-2D8455C90C5D@xs4all.nl>
References: <4B53A30C-FF3E-4996-916F-2D8455C90C5D@xs4all.nl>
Message-ID: <4F0F17A9.7010400@runnersroll.com>

On 01/12/2012 10:21 AM, Vincent Diepeveen wrote:
> On Jan 12, 2012, at 4:10 PM, Lux, Jim (337C) wrote:
>> This is exactly the population you want to hit. Bring in 100
>> advanced high school (grade 11-12 in US) students. Have them all use
>> cheap hardware to do a cluster. Some fraction will think, "this is
>> kind of cool, maybe I should major in CS instead of X" Some fraction
>> will think,
>
> Your example here will just ensure a big number of students won't
> want to have anything to do with those studies, as there are a few
> lame nerds there toying with equipment that's a factor 50k slower
> (adding to the factor 500 the object-oriented slowdown of factor 100)
> than what they have at home, and it can do nothing useful.
>
> But in this specific case you'll just scare away students, and the
> really clever ones will get totally disinterested as you are busy
> with lame-duck-speed type cpu's.

You have made it abundantly clear you aren't interested in enrolling in such a course. Thanks for your comments.

On a related note, as I was thinking about 'lame duck' education, I remembered that I took an undergraduate machine learning course in which we designed players for connect-four, which would compete using recently learned techniques against other students in the class.
Despite that particular game being a solved one, we all had a blast and got quite competitive trying to beat each other out using the recently acquired skills. I would encourage Jim to do something similar once the basics of cluster administration are done; perhaps a mini SC Cluster Competition would be a neat application for the Arduinos?

Best,

ellis

From ellis at runnersroll.com Thu Jan 12 12:35:11 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:35:11 -0500
Subject: [Beowulf] Robots
In-Reply-To: <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl>
Message-ID: <4F0F19CF.2050603@runnersroll.com>

On 01/12/2012 10:56 AM, Vincent Diepeveen wrote:
> On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:
>> I think this is likely the reason why many introductory engineering
>> classes incorporate use of Lego Mindstorm robots rather than lunar
>> rovers (or even overstock lunar rovers :D).
>
> I didn't comment on the other completely wrong examples, but I want
> to highlight one. Your example of a Lego robot actually disproves
> your statement.

It was a price comparison, and without diving into the nitty-gritty of how good or bad both the Arduino and the Mindstorms are in their respective areas, it was spot on. Jim wants to give each student a 10 node cluster on the cheap (i.e. 20 to 30 bucks per node = 300 bucks); universities want to give each student (or teams of students sometimes) a robot (~280). Both provide an approachable level of difficulty and potential for education at a reasonable price.

Feel free to continue to disagree for the sake of such. It was just an example.

Best,

ellis

From james.p.lux at jpl.nasa.gov Thu Jan 12 12:54:52 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 09:54:52 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID:

-----Original Message-----
From: Douglas Eadline [mailto:deadline at eadline.org]
Sent: Thursday, January 12, 2012 8:49 AM
To: Lux, Jim (337C)
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

snip

> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to
> have multiple cores).

This is going to be an exascale issue, i.e. how to compute on a system whose parts might be in a constant state of breaking. Another interesting question is how do you know you are getting the right answer on a *really* large system?
Of course I spend much of my time optimizing really small systems.

--

Your point about scaling is well taken. So far, the computing world has largely dealt with things by trying to make the processor perfect and error free. Some limited areas of error correction are popular (RAM).

But think in a bigger area... say your arithmetic unit has some infrequent unknown errors (e.g. the FDIV bug on the Pentium). Could clever algorithm design and multiple processors (or multiple cores) mitigate this? (E.g. instead of just computing Z = X/Y you also compute Z1 = (X*2)/(Y*2) and compare answers. That exact example's not great because you've added 2 operations, but I can see that there are other clever techniques that might be possible.)

What is nice is if you can do things like temporal redundancy (do the calculation twice, and if it's different, do it a third time), or even better, some sort of "check calculation" that takes small time compared to the mainline calculation.

This, I think, is somewhere that even the big iron/cluster folks could be doing some research. What are the optimum communication fabrics to support this kind of "side calculation", which may have different communication patterns and data flow than the "mainline"?

It has a parallel in things like CRC checks in communications protocols. A lot of hardware has a dedicated little CRC checker that is continuously calculating the CRC as the bits arrive, so that when you get to the end of the frame, the answer is already there.

And Doug, your small systems have a lot of the same issues, perhaps because that small Limulus might be operated in environments other than what the underlying hardware was designed for. I know people who have been rudely surprised when they found that the design environment for a laptop is a pretty narrow temperature range (e.g. office desktop), and when they put them in a car, subject to 0C or 40C temperatures, if not wider, things don't work quite as well as expected.

Very small systems (few nodes) have the same issues in some environments (e.g. a cluster subject to single event upsets or functional interrupts in a high radiation environment with a lot of high energy charged particles; it's not so much a total dose thing, but an SEE thing). For Juno (which will be in a polar orbit around Jupiter), we shielded everything in a vault (a 1 meter cube with 1cm thick titanium walls) and still it's an issue. We don't get very long before everything is cooked.

And I think that with a non-trivially small cluster (e.g. more than 4 nodes, I think) you could do a lot of experimentation on techniques. (Oddly, simulated fault injection is one of the trickier parts.)
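The two tricks Jim describes, the scaled-operand division check and the run-twice-vote-on-mismatch form of temporal redundancy, fit in a few lines of C. A toy rendering follows, not anything from the original post; note the (2X)/(2Y) check is exact in IEEE 754 only as long as doubling neither overflows nor lands in the subnormals:

/* redundant_div.c - toy fault-detecting division.
 * checked_div: recompute X/Y as (2X)/(2Y); with correct rounding both
 *   quotients are bit-identical, so a mismatch flags broken hardware.
 * vote_div: temporal redundancy; divide twice, and only on a mismatch
 *   pay for a third divide and take the 2-of-3 majority. */
#include <stdio.h>

static double checked_div(double x, double y, int *suspect)
{
    double z  = x / y;
    double z1 = (2.0 * x) / (2.0 * y);  /* same value, different bits
                                           pushed through the divider */
    *suspect = (z != z1);
    return z;
}

/* volatile forces each divide to really happen instead of being
 * folded into one by the compiler */
static double vote_div(volatile double x, volatile double y)
{
    double a = x / y;
    double b = x / y;
    if (a == b)
        return a;            /* fast path: no fault detected  */
    double c = x / y;        /* tie-breaker, only on mismatch */
    return (c == a) ? a : b; /* 2-of-3 majority               */
}

int main(void)
{
    int suspect;
    double z = checked_div(355.0, 113.0, &suspect);
    printf("355/113 = %.15f (%s)\n", z, suspect ? "SUSPECT" : "ok");
    printf("voted   = %.15f\n", vote_div(355.0, 113.0));
    return 0;
}

As the post argues, the interesting research question isn't the check itself but the fabric: the check traffic has a different shape than the mainline computation.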
From ellis at runnersroll.com Thu Jan 12 12:55:41 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 12:55:41 -0500
Subject: [Beowulf] List traffic
In-Reply-To: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>
References: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl>
Message-ID: <4F0F1E9D.9000800@runnersroll.com>

I really should be following Joe's advice circa 2008 and just not responding, but I can't help myself.

On 01/12/2012 11:45 AM, Vincent Diepeveen wrote:
> The biggest problem for this list:
> 1) The lack of postings by RGB the past few months, especially the
> ones where he explains how easy it is to build a nuke, given the
> right ingredients, which gives interesting discussions.

The last post from RGB was a long, long discussion about how very wrong you were about RNGs. You just don't get it. It's okay to be wrong once in a while, Vincent, and even more so to just agree to disagree. Foolish, unedited and inflammatory diatribes with an unnatural dose of newlines are what is killing this list and what that blog I referenced was specifically disappointed with.

So please, I'm begging you. Stop writing huge emails that trail off from their original point. Try to say things in a non-inflammatory manner. Use spell-check, and try to read your emails once before sending them. And last of all, remember that there are many people on this list that have all sorts of different applications -- not just chess. Your experience does not generalize well to all areas.

Speaking of which, for anyone who is interested in doing serious work with low-power processors, please see a paper named FAWN for an excellent example of use cases where low-hertz, low-power processors can do some great work. It's by Dave Andersen of CMU. I was lucky enough to be invited to the CMU PDL retreat a few months back and had a nice conversation about the project when we went for a run together. There are some use cases that benefit massively from that kind of architecture.

Best,

ellis

From james.p.lux at jpl.nasa.gov Thu Jan 12 13:10:24 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 10:10:24 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <4F0F17A9.7010400@runnersroll.com>
References: <4B53A30C-FF3E-4996-916F-2D8455C90C5D@xs4all.nl> <4F0F17A9.7010400@runnersroll.com>
Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Ellis H. Wilson III
Sent: Thursday, January 12, 2012 9:26 AM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

snip
On a related note, as I was thinking about 'lame duck' education, I remembered that I took an undergraduate machine learning course in which we designed players for connect-four, which would compete using recently learned techniques against other students in the class. Despite that particular game being a solved one, we all had a blast and got quite competitive trying to beat each other out using the recently acquired skills. I would encourage Jim to do something similar once the basics of cluster administration are done; perhaps a mini SC Cluster Competition would be a neat application for the Arduinos?

----------------------------------------

Ooohh.. that sounds *very* cool. A bunch of slow processors. A simple problem to solve (e.g. 3D tic-tac-toe) for which there might even be published parallel approaches. The challenge is effectively using the limited system, warts and all.

The Raspberry Pi might be a better vehicle, if it hits the price/availability targets: comparable to Arduinos in price, but a bit more sophisticated and less contrived.

We've been talking about what kind of software competitions JPL could run as a recruiting tool at universities, and that's along those lines. Hmm... I wonder if they'd be willing to spend recruiting funds on that? (Probably not; we're all poor this fiscal year.)

And, on the undergrad education thing... At UCLA, I had to write stuff in MIXAL to run on a simulated MIX machine and complained mightily to the TAs, who just pointed to the sacred texts of Knuth, rather than giving an intelligent response as to why we didn't do something like work in PDP-11 ASM or System/360 BAL. (UCLA at the time had a monster 360, but I don't know that they had many 11s, and realistically, BAL is not something I'd inflict on 2nd quarter first year students. We were a PL/I or PL/C shop in the first couple years' classes for the most part, although there were people doing Algol.)

OTOH, I suspect I was an atypical incoming student for 1977. I had, the previous year, done the Pascal courses at UCSD with p-machines running on LSI-11s, as well as the Pascal system on the big Burroughs B6700, which uses a form of ALGOL as the machine language and is a stack machine to boot (how cool is that? Burroughs always did have cool machines.. hey, they built ILLIAC IV). I had also done some ASM stuff on an 11/20 under RT-11.

I guess that's characteristic of the differences in philosophy between different CS departments (UCSD was heading more in the direction of Software Engineering being part of the School of Engineering and Applied Sciences, while at UCLA it was part of the Math department). Little did I know, as a cybernetics major, what the difference was: it sure as heck isn't manifested in the course catalog, at least in a form that an incoming student could discern. (Going back now, I could probably look at catalogs from the various universities of the era and divine their philosophies, but that's clearly 20/20 hindsight.)
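For what it's worth, the contest idea maps naturally onto the simplest parallel search scheme there is: deal the root moves of the game tree out across the nodes, search each subtree locally, and reduce to the best answer. A sketch in MPI C; the evaluate() stub and the 4x4x4 move count are hypothetical stand-ins for a real 3D tic-tac-toe search, not anyone's actual contest code:

/* root_split.c - split the root moves of a game tree across ranks. */
#include <mpi.h>
#include <stdio.h>

#define NMOVES 64   /* a 4x4x4 board has 64 opening moves */

/* placeholder score; a real player would run minimax/negamax here */
static int evaluate(int move) { return (move * 37) % 101; }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* round-robin deal: each rank takes every size-th root move */
    struct { int score; int move; } local = { -1, -1 }, best;
    for (int m = rank; m < NMOVES; m += size) {
        int s = evaluate(m);
        if (s > local.score) { local.score = s; local.move = m; }
    }

    /* MPI_MAXLOC compares the first int and carries the second along,
     * so rank 0 learns both the best score and the winning move */
    MPI_Reduce(&local, &best, 1, MPI_2INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("best opening: move %d (score %d)\n", best.move, best.score);

    MPI_Finalize();
    return 0;
}

On hardware as slow as the Arduinos under discussion, the instructive part is that this embarrassingly parallel split scales almost perfectly, while anything needing a shared transposition table immediately does not.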
From james.p.lux at jpl.nasa.gov Thu Jan 12 13:22:26 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 10:22:26 -0800
Subject: [Beowulf] FAWN
Message-ID:

Fast Array of Wimpy Nodes..

http://www.cs.cmu.edu/~fawnproj/

Very cool stuff... Their original motivation (reduction of power) is at a much larger scale than my work usually operates at (they're talking megawatts in googleish clusters; I worry about watts derived from solar panels and such).

But it's a whole 'nother twist on the idea of clustering low performance nodes (by some metric; they've got good nanojoule/operation metrics). And they're doing a very clever thing where they work with the very asymmetric read/write speeds on flash memory. (And FLASH memory is something I spend a lot of time thinking about these days; it's what we use in space for NVRAM.)

Looks like I've got some reading for the holiday weekend.

From ellis at runnersroll.com Thu Jan 12 13:26:26 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 13:26:26 -0500
Subject: [Beowulf] FAWN
In-Reply-To:
References:
Message-ID: <4F0F25D2.90305@runnersroll.com>

On 01/12/2012 01:22 PM, Lux, Jim (337C) wrote:
> But it's a whole 'nother twist on the idea of clustering low
> performance nodes (by some metric; they've got good
> nanojoule/operation metrics).

Not just good, from a sorting perspective, /best/:

http://sortbenchmark.org/

From landman at scalableinformatics.com Thu Jan 12 13:47:21 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 12 Jan 2012 13:47:21 -0500
Subject: [Beowulf] List traffic
In-Reply-To: <4F0F1E9D.9000800@runnersroll.com>
References: <4AB87920-7A8F-41B4-8129-3C19218FD9AB@xs4all.nl> <4F0F1E9D.9000800@runnersroll.com>
Message-ID: <4F0F2AB9.5060105@scalableinformatics.com>

On 01/12/2012 12:55 PM, Ellis H. Wilson III wrote:
> I really should be following Joe's advice circa 2008 and just not
> responding, but I can't help myself.

huh ...?

> On 01/12/2012 11:45 AM, Vincent Diepeveen wrote:
>> The biggest problem for this list:
>> 1) The lack of postings by RGB the past few months, especially the
>> ones where he explains how easy it is to build a nuke, given the
>> right ingredients, which gives interesting discussions.
>
> The last post from RGB was a long, long discussion about how very
> wrong you were about RNGs. You just don't get it. It's okay to be
> wrong once in a while, Vincent, and even more so to just agree to
> disagree. Foolish, unedited and inflammatory diatribes with an
> unnatural dose of newlines are what is killing this list and what
> that blog I referenced was specifically disappointed with.
>
> So please, I'm begging you. Stop writing huge emails that trail off
> from their original point. Try to say things in a non-inflammatory

... oh ... never mind :)

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

From james.p.lux at jpl.nasa.gov Thu Jan 12 14:08:38 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 12 Jan 2012 11:08:38 -0800
Subject: [Beowulf] FAWN
In-Reply-To: <4F0F25D2.90305@runnersroll.com>
References: <4F0F25D2.90305@runnersroll.com>
Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Ellis H. Wilson III
Sent: Thursday, January 12, 2012 10:26 AM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] FAWN

On 01/12/2012 01:22 PM, Lux, Jim (337C) wrote:
> But it's a whole 'nother twist on the idea of clustering low
> performance nodes (by some metric; they've got good
> nanojoule/operation metrics).

Not just good, from a sorting perspective, /best/:

http://sortbenchmark.org/

-------------

I was thinking that their low powered nodes are poor from an absolute performance standpoint (i.e. MIPS), but actually quite good on a computation work per joule basis.

Yes, for sorting, they are kicking rear.

This is interesting, but when you start talking power consumption, one needs to be careful about where you draw the boundaries and what's "in the system". Do you count conversion efficiency in the power supply? At one level, you say no, just worry about DC power consumption, but even there: is it at the board edge, or at the chip? Something drawing 100 amps at 0.5V is a very different beast than something drawing 10 amps at 5V, and you can't locally optimize too far, because your choices inside box A start to affect the design and performance of box B and box C.

The contest rules point to a variety of power measurement systems, but based on what I see there, I think there's some scope for "gaming" the system. It sort of seems it's "wall plug power", but then, they do allow DC power systems.

For instance, one could tune the power supply for the expected load conditions. You could run those fans at warp speed before the test run starts, to cool down as much as possible, and then slow them down (saving power) during the run, maybe even letting the processor get pretty hot.

Sort of like running a top fuel dragster. It only has to go fast for 3 or 4 seconds, so why bother putting in a water pump.

From ellis at runnersroll.com Thu Jan 12 14:40:15 2012
From: ellis at runnersroll.com (Ellis H. Wilson III)
Date: Thu, 12 Jan 2012 14:40:15 -0500
Subject: [Beowulf] FAWN
In-Reply-To:
References: <4F0F25D2.90305@runnersroll.com>
Message-ID: <4F0F371F.2060704@runnersroll.com>

On 01/12/2012 02:08 PM, Lux, Jim (337C) wrote:
> [snip -- Jim's message, quoted in full above]

All fair points, and I can't contest the suggestion that they likely tune their algorithm and physical units very highly to perform well for this sorting environment. Dave actually keeps a pretty balanced perspective when discussing this, as shown in his reaction to Google talking down wimpy nodes. Wired has a nice article on it, with a link inside to Google's pub that discusses the other half of the coin:

http://www.wired.com/wiredenterprise/2012/01/wimpy_nodes/

Some more reading material for the weekend ;).

ellis

From landman at scalableinformatics.com Thu Jan 12 15:45:16 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 12 Jan 2012 15:45:16 -0500
Subject: [Beowulf] Partial OT: CPU grouping control for Windows 2008 R2 x64 server for big calcs
Message-ID: <4F0F465C.4010301@scalableinformatics.com>

Ok, this one is fun. For some definitions of fun. Unusual definitions of fun... And there is a question towards the end.

This is for folks who've been administering clusters and HPC systems with big Windows machines (32+ CPUs and large RAM).

Imagine you have a machine as part of a very loose computing cluster. The end user wants to run Windows (2008R2 x64 Enterprise) on it. This machine has 32 processor cores (real ones, no hyperthreading) and 1TB of RAM. Yeah, it's a fun machine to work on.
I won't discuss the OS choice here. You can see some of my playing with it here: http://scalability.org/?p=3541 and http://scalability.org/?p=3515

Windows machines can let up to 64 logical processors be part of a "group". A group is a scheduling artifice, and not necessarily directly related to the NUMA system... think of it as a layer of abstraction above this.

Ok, still with me?

This scheduling artifice, these groups, requires at minimum a recompilation to work properly with. It's actually more than that: they do require some additional processor affinity bits be handled. If you have code which doesn't handle this correctly, it will probably crash. Or not work well. Or both.

Matlab appears to be such a beast. This isn't necessarily a Matlab issue per se; it appears to be something of a design compromise in Windows. Windows wasn't designed with large processor counts in mind. The changes they'd need to make in order to enable a single large spanning entity across all CPUs at once are quite likely not in the company's best interests, as there are very few customers with such machines.

Still with me? Here's the problem. Matlab seems to crash (according to the user) if run on a unit with more than one group. I've not been able to verify it on the machine yet myself, but I have no reason to disbelieve this. The issue as it's been stated to me is that if there is more than one group of processors, Matlab crashes. This is the symptom. When the unit boots by default, we have two 16-processor groups.

So, looking at bcdedit examples, I see how to turn off groups. One minor problem: it doesn't work.

I can do a

  bcdedit /set groupaware off

and reboot, which should completely disable groups, so that all 32 processors are in one group. Still 2 groups. I can do a

  bcdedit /set groupsize 64

and reboot. Still 2 groups.

So far, the only thing that seems to change this is if I install the Hyper-V role. With that, there is now 1 group.

Looking at all the boot options with bcdedit /enum, there's only one config for boot, and it's the default.

So... my questions:

1) Does Windows really ignore its approximate equivalent of the boot options on a grub line?

2) Is there any way to compel Windows to do the right thing?

As noted, this is for a computing cluster. Our recommended OS isn't feasible right now for them and their application. Definitely annoying. I'd love there to be a BIOS setting to help Windows past its desire to ignore my requested number of groups. Not sure if adding in Hyper-V will impact performance (did some base testing with Scilab to see, and I didn't see anything I'd call significant). Will be bugging Microsoft about this as well (pretty obviously a bug in 2008R2 x64).

And related to this, I read something about limits in the different Windows editions. Is anyone using Windows HPC cluster on big memory machines with lots of cores? Looking at the Microsoft docs, they indicate some relatively low limits on RAM and processor count. So does this mean that they won't be supporting Interlagos 4 socket machines (16 cores per socket and 1/2 TB of RAM) in compute nodes for Windows HPC? I am just imagining someone buying a few of those nodes and being required to buy Enterprise or Datacenter licenses for machines which clearly would not be used for anything more than HPC.
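For readers wondering what those "additional processor affinity bits" look like in source: below is a minimal sketch against the Win32 group APIs (GetActiveProcessorGroupCount, GetActiveProcessorCount, SetThreadGroupAffinity). It is an illustration of the mechanism described above, not code from this posting, and error handling is omitted:

/* groups.c - enumerate processor groups and move the current thread
 * into the second group. A group-unaware thread stays in the group it
 * started in, which is why machines with multiple groups need
 * explicit calls like these. Needs Windows 7 / Server 2008 R2 or
 * later. */
#define _WIN32_WINNT 0x0601
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WORD g, ngroups = GetActiveProcessorGroupCount();
    printf("%u processor group(s)\n", ngroups);

    for (g = 0; g < ngroups; g++)
        printf("  group %u: %lu logical processors\n",
               g, GetActiveProcessorCount(g));

    if (ngroups > 1) {
        GROUP_AFFINITY ga = {0}, old;
        ga.Group = 1;                    /* second group...          */
        ga.Mask  = (KAFFINITY)0xFFFF;    /* ...all 16 LPs inside it  */
        if (SetThreadGroupAffinity(GetCurrentThread(), &ga, &old))
            printf("current thread moved to group 1\n");
    }
    return 0;
}

An application that never makes such calls only ever sees one group's worth of processors, and one that mishandles the group-aware structures can misbehave exactly as described above.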
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

From samuel at unimelb.edu.au Fri Jan 13 00:36:50 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 13 Jan 2012 16:36:50 +1100
Subject: [Beowulf] FAWN
In-Reply-To: <4F0F25D2.90305@runnersroll.com>
References: <4F0F25D2.90305@runnersroll.com>
Message-ID: <4F0FC2F2.5090606@unimelb.edu.au>

On 13/01/12 05:26, Ellis H. Wilson III wrote:

> Not just good, from a sorting perspective, /best/:
> http://sortbenchmark.org/

But that algorithm isn't running on exactly wimpy hardware:

Intel Core i5-2400S 2.5 GHz, 16GB RAM and a bunch of SSDs

cheers!
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

From diep at xs4all.nl Fri Jan 13 09:01:59 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 13 Jan 2012 15:01:59 +0100
Subject: [Beowulf] Robots
In-Reply-To: <4F0F19CF.2050603@runnersroll.com>
References: <4F0DBFD3.3070503@ias.edu> <2734FA76-6286-486F-B762-A48E2EAEF612@xs4all.nl> <0249BC0E-8C7A-470E-933B-A0CBCC272888@xs4all.nl> <7B548142-9871-4DCF-8EF1-90DA5A4BDEF6@xs4all.nl> <4F0EE6FC.2050002@runnersroll.com> <95F43560-B9AA-4B56-9B32-1A1979461B07@xs4all.nl> <4F0F19CF.2050603@runnersroll.com>
Message-ID: <01D34971-9054-4F19-9776-8F107B118A1D@xs4all.nl>

On Jan 12, 2012, at 6:35 PM, Ellis H. Wilson III wrote:

> On 01/12/2012 10:56 AM, Vincent Diepeveen wrote:
>> On Jan 12, 2012, at 2:58 PM, Ellis H. Wilson III wrote:
>>> I think this is likely the reason why many introductory engineering
>>> classes incorporate use of Lego Mindstorm robots rather than lunar
>>> rovers (or even overstock lunar rovers :D).
>>
>> I didn't comment on the other completely wrong examples, but I want
>> to highlight one. Your example of a Lego robot actually disproves
>> your statement.
>
> It was a price comparison, and without diving into the nitty-gritty
> of how good or bad both the Arduino and the Mindstorms are in their
> respective areas, it was spot on. Jim wants to give each student a
> 10 node cluster on the cheap (i.e. 20 to 30 bucks per node = 300
> bucks); universities want to give each student (or teams of students
> sometimes) a robot (~280). Both provide an approachable level of
> difficulty and potential for education at a reasonable price.
> Feel free to continue to disagree for the sake of such. It was
> just an example.
>
> Best,
>
> ellis

It's not even spot on; your comparison is light-years off. You're comparing one of the best mass-produced robots available against a niche board for which there are a hundred alternatives that work far better: alternatives that are 500x faster, cheaper if you want them to be, and, above all, better at the original goal of demonstrating SMP programming, because the freak hardware, thanks to its low-clocked CPU, has negligible latency to the other CPUs.

A robot shows you how to work with robots. The educational purpose Jim wrote down you won't get very well with the embedded CPUs, as that equipment has none of the typical problems you encounter in a normal SMP system, let alone in a cluster environment, while it has entirely different problems you will never encounter on mainstream CPUs, such as severely limited caches and executing just one instruction at a time. Embedded programming is totally different from CPU programming, and embedded latencies, thanks to the slow processor speed, are not even comparable with SMP programming between the cores of one CPU.

Such a multicore box definitely has a cost below $300: on eBay I see nodes with 8 cores for $200, and those are 500x faster. Myself, I'm looking at some Socket 771 Xeon machines, say with an L5420. Though they eat a lot more power than Intel claims, it's still, I guess, about 170 watts a machine under full load.

Note we have still skipped the algorithmic discussion. From an algorithmic viewpoint, if I look at artificial intelligence, getting something to work on 70 MHz machines is going to behave totally differently and needs a totally different approach than today's hardware. It's not even in the same ballpark.

Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ntmoore at gmail.com Fri Jan 13 09:33:33 2012
From: ntmoore at gmail.com (Nathan Moore)
Date: Fri, 13 Jan 2012 08:33:33 -0600
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
References: <41677598-47C5-4592-BDD6-314CB0EC860E@xs4all.nl>
Message-ID:

Jim,

Have you ever interacted with the "Modeling Instruction" folks over at ASU? http://modeling.asu.edu/ They've done, for HS Physics, more or less what you're talking about in terms of making the subject engaging, compelling, and driven by student, not teacher, interest.

On Thu, Jan 12, 2012 at 9:10 AM, Lux, Jim (337C) wrote:
>
> On 1/12/12 6:39 AM, "Vincent Diepeveen" wrote:
>
>> The average guy is not interested in knowing all the details of how to
>> play tennis with a wooden racket from the 1980s, from around the time
>> when McEnroe was out on the court.
>>
>> Most people are more interested in whether you can win that grand slam
>> with what you produce.
>>
>> The nerds, however, are interested in how well you can do with a wooden
>> racket from the 1980s; projecting your own interest onto those students
>> will just get them disinterested, and you will be judged by them as an
>> irrelevant person in their life, whose name they soon forget.
>>
> Having spent some time recently in Human Resources meetings about how to
> better recruit software people for JPL, I'd say that something that
> appeals to nerds and gives them something to do is not all bad. Part of
> the educational process is to find and separate the people who are
> interested and have a passion. I'm not sure that someone who starts
> getting into clusters mostly because they are interested in breaking into
> the Top500 is the target audience in any case.
>
> If you look over the hobby clusters out there, the vast majority are "hey,
> I heard about this interesting idea, I scrounged up N old/small/slow/easy
> to find computers and tried to cluster them and do something. I learned
> something about cluster administration, and it was fun, but I don't use it
> anymore."
>
> This is exactly the population you want to hit. Bring in 100 advanced
> high school (grade 11-12 in US) students. Have them all use cheap
> hardware to do a cluster. Some fraction will think, "this is kind of
> cool, maybe I should major in CS instead of X." Some fraction will think,
> "how lame, why not make the single processor faster," and they can be
> CompEng or EE majors looking at how to reduce feature sizes and get the
> heat out.
>
> It's just like biology or chemistry classes. In high school biology
> (9th/10th grade) most of it is mundane memorization (the Krebs cycle,
> various descriptive stuff). Other than the use of cheap CMOS cameras,
> microscopes used at this level haven't really changed much in the last
> 100 years (and the microscopes at my kids' school are probably 10-20
> years old). They also do some more modern molecular biology in a series
> of labs partly funded by Amgen: some recombinant DNA to put fluorescent
> proteins in a bacteria, running some gels, etc. The vast majority of the
> students will NOT go on to a career in biology, but some fraction do;
> they get interested in some aspect, and they wind up majoring in bio, or
> being a pre-med, etc.
>
> Not everyone is looking for the world beater. A lot of kids start with
> Kart racing, even though even the fastest Karts aren't as fast as F1 (or
> even a Smart Car). How many engineers started with dismantling the
> lawnmower engine?
>
> For my own work, I'd rather have people who are interested in solving
> problems by ganging up multiple failure-prone processors, rather than
> centralizing it all in one monolithic box (even if the box happens to have
> multiple cores).
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - -   - - - - - - -   - - - - - - -

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at eadline.org Fri Jan 13 09:38:28 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 13 Jan 2012 09:38:28 -0500
Subject: [Beowulf] FAWN
In-Reply-To: <4F0FC2F2.5090606@unimelb.edu.au>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au>
Message-ID: <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>

> On 13/01/12 05:26, Ellis H. Wilson III wrote:
>
>> Not just good, from a sorting perspective, /best/:
>> http://sortbenchmark.org/
>
> But that algorithm isn't running on exactly wimpy hardware:
>
> Intel Core i5-2400S 2.5 GHz, 16GB RAM and a bunch of SSDs

I can vouch for the i5-2400S processors; they are one of the best values out there. I got 200 GFLOPS on a Limulus using 4 of these. Some more benchmarks here:

http://www.clustermonkey.net//content/view/306/1/

--
Doug

> cheers!
> Chris
> --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Fri Jan 13 10:18:02 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Fri, 13 Jan 2012 10:18:02 -0500
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>

> -----Original Message-----
> From: Douglas Eadline [mailto:deadline at eadline.org]
> Sent: Thursday, January 12, 2012 8:49 AM
> To: Lux, Jim (337C)
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] A cluster of Arduinos
>
> snip
>>
>> For my own work, I'd rather have people who are interested in solving
>> problems by ganging up multiple failure prone processors, rather than
>> centralizing it all in one monolithic box (even if the box happens to
>> have multiple cores).
>
> This is going to be an exascale issue, i.e. how to compute on a system
> whose parts might be in a constant state of breaking. Another
> interesting question is: how do you know you are getting the right
> answer on a *really* large system?
>
> Of course I spend much of my time optimizing really small systems.

> Your point about scaling is well taken. So far, the computing world has
> largely dealt with things by trying to make the processor perfect and
> error free. Some limited areas of error correction are popular (RAM).
> But think in a bigger area... say your arithmetic unit has some infrequent
> unknown errors (e.g. the FDIV bug on the Pentium):
> could clever algorithm design and multiple processors (or multi-cores)
> mitigate this? E.g., instead of just computing Z = X/Y you also compute
> Z1 = (X*2)/(Y*2) and compare answers. That exact example's not great,
> because you've added two operations, but I can see that there are other
> clever techniques that might be possible.
>
> What would be nice is if you could do things like temporal redundancy
> (do the calculation twice, and if the results differ, do it a third
> time), or even better, some sort of "check calculation" that takes
> small time compared to the mainline calculation.
>
> This, I think, is somewhere that even the big iron/cluster folks could
> be doing some research. What are optimum communication fabrics to
> support this kind of "side calculation", which may have different
> communication patterns and data flow than the mainline? It has a
> parallel in things like CRC checks in communications protocols. A lot
> of hardware has a dedicated little CRC checker that is continuously
> calculating the CRC as the bits arrive, so that when you get to the end
> of the frame, the answer is already there.
>
> And Doug, your small systems have a lot of the same issues, perhaps
> because that small Limulus might be operated in environments other than
> what the underlying hardware was designed for. I know people who have
> been rudely surprised when they found that the design environment for a
> laptop is a pretty narrow temperature range (e.g. office desktop), and
> when they put them in a car, subject to 0C or 40C temperatures, if not
> wider, things don't work quite as well as expected.

I will be curious to see where these things show up, since all you really need is a power plug. (A little nervous, actually.)

> Very small systems (few nodes) have the same issues in some
> environments, e.g. a cluster subject to single-event upsets or
> functional interrupts in a high-radiation environment with a lot of
> high-energy charged particles; it's not so much a total-dose thing as
> an SEE thing.
>
> For Juno (which is in polar orbit around Jupiter), we shielded
> everything in a vault (a 1 meter cube with 1cm thick titanium walls)
> and still it's an issue. We don't get very long before everything is
> cooked.
>
> And I think that on a non-trivially small cluster (e.g. more than 4
> nodes, I think) you could do a lot of experimentation on techniques.

I agree. Four nodes is really small. BTW, the most fun in designing this system is a set of tighter constraints than are found on the typical cluster: noise, power, space, cabling, low-cost packaging, etc. I have been asked about a rack-mount version; we'll see.

One thing I find interesting is the core/node efficiency (what I call "effective cores"). In general, *on some codes*, I found that fewer cores (1P micro-ATX, 4 cores) are more efficient than many cores (2P server, 12 cores). Seems obvious, but I like to test things.

> (oddly, simulated fault injection is one of the trickier parts)

I would assume, because in a sense the black swan* is by definition hard to predict.

(* the book by Nick Taleb, not the movie)

--
Doug
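To make the temporal-redundancy idea concrete, here is a minimal sketch in C (my own illustration, not anything from real flight code) of the compute-twice, break-ties-with-a-third-run scheme described above:

    #include <stdio.h>

    /* Stand-in for an operation on untrusted hardware (a flaky FPU, say). */
    static double unreliable_div(double x, double y) { return x / y; }

    /* Temporal redundancy: run twice; only on disagreement pay for a
       third run and take the 2-of-3 majority answer. */
    static double safe_div(double x, double y)
    {
        double a = unreliable_div(x, y);
        double b = unreliable_div(x, y);
        if (a == b)
            return a;                /* common, cheap path */

        double c = unreliable_div(x, y);
        if (c == a) return a;        /* b was the glitch */
        if (c == b) return b;        /* a was the glitch */

        fprintf(stderr, "no majority: %g %g %g\n", a, b, c);
        return c;                    /* all three differ: flag it */
    }

    int main(void)
    {
        printf("%g\n", safe_div(1.0, 3.0));
        return 0;
    }

The exact-equality compare is deliberate: both runs execute the same instructions on the same inputs, so under a transient-fault model any difference is an upset, not roundoff. The "check calculation" variant would replace the second full run with something cheaper, e.g. testing that Z*Y comes out close to X.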
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Fri Jan 13 11:26:29 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 13 Jan 2012 08:26:29 -0800
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
Message-ID:

On 1/13/12 7:18 AM, "Douglas Eadline" wrote:

>> And Doug, your small systems have a lot of the same issues, perhaps
>> because that small Limulus might be operated in environments other than
>> what the underlying hardware was designed for. I know people who have
>> been rudely surprised when they found that the design environment for a
>> laptop is a pretty narrow temperature range (e.g. office desktop), and
>> when they put them in a car, subject to 0C or 40C temperatures, if not
>> wider, things don't work quite as well as expected.
>
> I will be curious to see where these things show up since
> all you really need is a power plug. (a little nervous actually)

Yes, that *will* be interesting. And wait till someone has a cluster of Limuluses (not sure of the proper alliterative collective noun, nor the plural form; a litany of limuli? A school? A murder?)

> I agree. Four nodes is really small. BTW, the most fun in designing
> this system is a set of tighter constraints than are found on the
> typical cluster: noise, power, space, cabling, low-cost packaging, etc.
> I have been asked about a rack-mount version; we'll see.
>
> One thing I find interesting is the core/node efficiency
> (what I call "effective cores"). In general, *on some codes*, I found
> that fewer cores (1P micro-ATX, 4 cores) are more efficient than many
> cores (2P server, 12 cores). Seems obvious, but I like to test things.

Yes. Because we're using, in general, commodity components/assemblies, we're subject to the results of optimizations and market/business forces in other user spaces. Someone designing a media PC for home use might not care about electrical efficiency (there are no big yellow energy tags on computers, yet), but would care about noise. Someone designing a rack-mounted server cares not a whit about noise, but really cares about a 10% change in power consumption.

And, drop on top of that the non-synchronized differences in development/manufacturing/fabrication generations for the underlying parts. Consumer stuff comes out for the winter selling season. Commercial stuff probably is on a different cycle. It's not like everyone uses the same "model year changeover".

>> (oddly, simulated fault injection is one of the trickier parts)
>
> I would assume, because in a sense, the black swan* is
> by definition hard to predict.

Not so much that as the actual mechanics of fault injection. Think about testing error detection and recovery for flash memory. The underlying specification error rate is something like 1E-9 or 1E-10 per read, and that's a worst-case kind of spec, so errors aren't too common (i.e., you can't just run and wait for them to occur). So how do you cause errors to occur (without perturbing the system)? In the flash case, because we developed our own flash controller logic in an FPGA, we can add "error injection logic" to the design, but that's not always the case. How would you simulate upsets in a CPU core, short of blasting it with radiation, which is difficult and expensive?
I wish it were as easy as getting a little Co60 gamma source and putting it on top of the chip. We hike to somewhere that has an accelerator (UC Davis, Brookhaven, etc.) and shoot protons and heavy ions at it.

> (* the book by Nick Taleb, not the movie)

Black swans in this case would be things like the Pentium divide bug. Yes, that *would* be a challenge, but hey, we've got folks in our JPL Laboratory for Reliable Software (LARS) who sit around thinking of how to do that, among other things. (http://lars-lab.jpl.nasa.gov/)

Hmm, I'll have to go talk to those guys about clusters of Pi or Arduinos... They're big into formal verification, too, and model-based verification. So you could have a modeled system in SysML or UML and compare its behavior with that of your prototype.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at mcmaster.ca Fri Jan 13 23:18:57 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 13 Jan 2012 23:18:57 -0500 (EST)
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To:
References:
Message-ID:

> care about electrical efficiency (there are no big yellow energy tags on
> computers, yet), but would care about noise. Someone designing a rack

The "80 Plus" branding is pretty ubiquitous now, and the best part is that commodity ATX parts are starting to show up at gold levels. Server vendors have offered gold or platinum for a while now, but it's probably more important in the home, since personal machines spend more time idling, thus running the PSU at low demand. Poor-quality PSUs are remarkably bad at low utilization.

regards, mark hahn.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From samuel at unimelb.edu.au Fri Jan 13 23:46:17 2012
From: samuel at unimelb.edu.au (Chris Samuel)
Date: Sat, 14 Jan 2012 15:46:17 +1100
Subject: [Beowulf] A cluster of Arduinos
In-Reply-To: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
References: <6ea096464afff4fe7e544e0cd28c5204.squirrel@mail.eadline.org>
Message-ID: <201201141546.17872.samuel@unimelb.edu.au>

On Sat, 14 Jan 2012 02:18:02 AM Douglas Eadline wrote:

> I would assume, because in a sense, the black swan* is
> by definition hard to predict.

Ahem, not around here; they're all black [1]. Now a white swan, that would be something to see!

[1] http://www.flickr.com/photos/earthinmyeyes/4608041877/

cheers!
Chris
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Thu Jan 19 09:46:26 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 19 Jan 2012 09:46:26 -0500
Subject: [Beowulf] Parallel Programming Survey Report
In-Reply-To: <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au> <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org>
Message-ID: <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>

Last year Dr Dobb's did a survey of parallel programming. Today I received a copy of:

The Parallel Programming Landscape: Multicore has gone mainstream -- but are developers ready?

It is mostly about multi-core, a bit Intel-centric (they sponsored it), and not too much about HPC. Still, it is interesting to see how the programming world is coping with multi-core. If you are interested in a copy you have to sign up here:

https://www.cmpadministration.com/ars/emailnew.do?mode=emailnew&P=P2&MZP=&L=&F=1003933&K=&cid_download

I'll probably read it more closely and post a summary on Cluster Monkey at some point.

--
Doug

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Thu Jan 19 09:57:37 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 19 Jan 2012 15:57:37 +0100
Subject: [Beowulf] Parallel Programming Survey Report
In-Reply-To: <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>
References: <4F0F25D2.90305@runnersroll.com> <4F0FC2F2.5090606@unimelb.edu.au> <4c1a72f2e8097c3585e495e1c95bfa2c.squirrel@mail.eadline.org> <6ec5ed08a6fb8c5b390b26bdfc18803a.squirrel@mail.eadline.org>
Message-ID: <20120119145737.GK21917@leitl.org>

On Thu, Jan 19, 2012 at 09:46:26AM -0500, Douglas Eadline wrote:

> Last year Dr Dobb's did a survey of parallel programming.
> [snip]
While speaking about multicore, I recommend this 21-minute video interview (even if you dislike talking heads and smarmy interviewers) with David Ungar:

http://channel9.msdn.com/Blogs/Charles/SPLASH-2011-David-Ungar-Self-ManyCore-and-Embracing-Non-Determinism

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Mon Jan 23 08:45:10 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Mon, 23 Jan 2012 14:45:10 +0100
Subject: [Beowulf] CPU Startup Combines CPU+DRAM - And A Whole Bunch Of Crazy
Message-ID: <20120123134510.GF7343@leitl.org>

(Old idea, makes sense, will they be able to pull it off?)

http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/

CPU Startup Combines CPU+DRAM - And A Whole Bunch Of Crazy

Sunday, January 22, 2012 - by Joel Hruska

The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM on to a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by crazy conceptualizing and deeply suspect analytics.

The Multicore Problem:

There are three limiting factors, or walls, that limit the scaling of modern microprocessors. First, there's the memory wall, defined as the gap between the CPU and DRAM clock speed. Second, there's the ILP (Instruction Level Parallelism) wall, which refers to the difficulty of decoding enough instructions per clock cycle to keep a core completely busy. Finally, there's the power wall: the faster a CPU is and the more cores it has, the more power it consumes.

Attempting to compensate for one wall often risks running afoul of the other two. Adding more cache to decrease the impact of the CPU/DRAM speed discrepancy adds die complexity and draws more power, as does raising CPU clock speed. Combined, the three walls are a set of fundamental constraints: improving architectural efficiency and moving to a smaller process technology may make the room a bit bigger, but they don't remove the walls themselves.

TOMI attempts to redefine the problem by building a very different type of microprocessor. The TOMI Borealis is built using the same transistor structures as conventional DRAM; the chip trades clock speed and performance for ultra-low leakage. Its design is, by necessity, extremely simple. Not counting the cache, TOMI is a 22,000-transistor design, as compared to 30,000 transistors for the original ARM2. The company's early prototypes, built on legacy DRAM technology, ran at 500MHz on a 110nm process.

Instead of surrounding a CPU core with a substantial amount of L2 and L3 cache, Venray inserted a CPU core directly into a DRAM design. A TOMI Borealis core connects eight TOMI cores to a 1Gbit DRAM with a total of 16 ICs per 2GB DIMM. This works out to a total of 128 processor cores per DIMM. Because they're built using ultra-low-leakage processes and are so small, such cores cost very little to build and consume vanishingly small amounts of power (Venray claims power consumption is as low as 23mW per core at 500MHz).
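The memory wall is easy to see on any commodity box: chase dependent pointers through a buffer much larger than cache and the cost per access jumps from a couple of cycles to full DRAM latency. A minimal sketch (my own toy demo, nothing from Venray; the buffer size and iteration count are arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* 64 MB of size_t slots: far larger than any 2012-era cache. */
        size_t n = (64u * 1024 * 1024) / sizeof(size_t);
        size_t *next = malloc(n * sizeof *next);
        if (!next) return 1;

        /* Sattolo's algorithm builds a single random cycle, so the
           chase visits every slot and the prefetcher can't guess the
           order.  (Crude rand() indexing; fine for a demo as long as
           RAND_MAX >= n.) */
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        size_t p = 0, iters = 20u * 1000 * 1000;
        clock_t t0 = clock();
        for (size_t k = 0; k < iters; k++)
            p = next[p];               /* each load depends on the last */
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;

        printf("%.1f ns per dependent load (ignore: %zu)\n",
               1e9 * s / iters, p);
        return 0;
    }

On typical commodity hardware this prints something in the tens of nanoseconds per load, i.e. a couple of hundred clock cycles, which is the gap Venray proposes to close by putting the core on the DRAM die.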
It's an interesting idea.

The Bad:

When your CPU has fewer transistors than an architecture that debuted in 1986, there's a good chance that you left a few things out, like an FPU, branch prediction, pipelining, or any form of speculative execution. Venray may have created a chip with power consumption an order of magnitude lower than anything ARM builds and more memory bandwidth than Intel's highest-end Xeons, but it's an ultra-specialized, ultra-lightweight core that trades 25 years of flexibility and performance for scads of memory bandwidth.

The last few years have seen a dramatic surge in the number of low-power, many-core architectures being floated as the potential future of computing, but Venray's approach relies on the manufacturing expertise of companies who have no experience in building microprocessors and don't normally serve as foundries. This imposes fundamental restrictions on the CPU's ability to scale; DRAM is manufactured using a three-layer mask rather than the 10-12 layers Intel and AMD use for their CPUs. Venray already acknowledges that these conditions imposed substantial limitations on the original TOMI design.

Of course, there's still a chance that the TOMI uarch could be effective in certain bandwidth-hungry scenarios -- but that's where the Venray Crazy Train goes flying off the track.

The Disingenuous and Crazy

Let's start here. In a graph like this, you expect the two bars to represent the same systems being compared across three different characteristics. That's not the case. When we spoke to Russell Fish in late November, he pointed us to this publicly available document and claimed that the results came from a customer with 384 2.1GHz Xeons. There's no such thing as an S5620 Xeon, and even if we grant that he meant the E5620 CPU, that's a 2.4GHz chip.

The "Power consumption" graphs show Oracle's maximum power consumption for a system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB (yes, TB) of Flash and 15x 10,000 RPM hard drives. It's not only a worst-case figure, it's a figure utterly unrelated to the workload shown in the performance comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, ten of them only come out to 1.3kW; Oracle's 17.7kW figure means that the overwhelming majority of the cabinet's power consumption is driven by components other than its CPUs.

From here, things rapidly get worse. Fish makes his points about power walls by referring to unverified claims that prototype 90nm Tejas chips drew 150W at 2.8GHz back in 2004. That's like arguing that Ford can't build a decent car because the Edsel sucked.

After reading about the technology, you might think Venray was planning to market a small chip to high-end HPC niche markets... and you'd be wrong. The company expects the following to occur as a result of this revolutionary architecture (organized by least-to-most creepy):

- Computer speech will be so common that devices will talk to other devices in the presence of their users.

- Your cell phone camera will recognize the face of anyone it sees and scan the computer cloud for background red flags as well as six degrees of separation.

- Common commands will be reduced to short verbal cues like clicking your tongue or sucking your lips.

- Your personal history will be displayed for one and all to see... women will create search engines to find eligible, prosperous men. Men will create search engines to qualify women.
- Criminals will find their jobs much more difficult, because their history will be immediately known to anyone who encounters them.

- TOMI Technology will be built on flash memories, creating the elemental unit of a learning machine... the machines will be able to self-organize, build robust communicating structures, and collaborate to perform tasks.

- A disposable diaper company will give away TOMI-enabled teddy bears that teach reading and arithmetic. It will be able to identify specific children... and from time to time remind Mom to buy a product. The bear will also diagnose a raspy throat, a cough, or a runny nose.

Conclusion:

Fish has spent decades in the microprocessor industry -- he invented the first CPU to use a clock multiplier, in conjunction with Chuck H. Moore -- but his vision of the future is crazy enough to scare mad dogs and Englishmen.

His idea for a CPU architecture is interesting, even underneath the obfuscation and false representation, but too practically limited to ever take off. Google, an enthusiastic and dedicated proponent of energy-efficient multi-core research, said it best in a paper titled "Brawny cores still beat wimpy cores, most of the time":

"Once a chip's single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult... So go forth and multiply your cores, but do it in moderation, or the sea of wimpy cores will stick to your programmers' boots like clay."

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Mon Jan 23 10:38:39 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 23 Jan 2012 10:38:39 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM - And A Whole Bunch Of Crazy
In-Reply-To: <20120123134510.GF7343@leitl.org>
References: <20120123134510.GF7343@leitl.org>
Message-ID: <4F1D7EFF.7080206@ias.edu>

If you read this PDF from Venray Technologies, which is linked to in the article, you see where the "Whole Bunch of Crazy" part comes from. After reading it, Venray lost a lot of credibility in my book.

https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf

--
Prentice

On 01/23/2012 08:45 AM, Eugen Leitl wrote:
> (Old idea, makes sense, will they be able to pull it off?)
>
> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/
>
> [article quoted in full -- snip]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
(did Imhotep use some form of project planning tools? You bet he did) However, true parallelism (MIMD) is harder to conceptualize. Vector and matrix math is one area, but I'd argue that it's just the same as EP tasks, just at a finer grain. Systolic arrays, vector pipelines, FFT boxes from FloatingPointSystems, are all basically ways to use the underlying structure of the task, in an easy way (how long til there's a hardware implementation of the new faster-than-FFT algorithm published last week?) And in all those cases, you have to explicitly make use of the special capabilities. That is, in general, the compiler doesn't recognize it (although, modern parallelizing compilers ARE really smart.. So they probably do find most of the cases) I don't know that we have good conceptual tools to take a complex task and break it effectively into multiple disparate component tasks that can effectively run in parallel. It's a hard task for something straightforward (e.g. Designing a big system or building a spacecraft), and I don't know that any of outputs of current project planning techniques (which are entirely manual) can be said to produce "generalized" optimum outputs. They produce *an* output for dividing the complex task up (or else the project can't be done), but I don't know that the output is provably optimum or even workable (an awful lot of projects over-run, and not just because of bad estimates for time/cost). So the problem facing would-be users of new computing architectures (be they TOMI, HyperCube, ConnectionMachine, or Beowulf) is like that facing a project planner given a big project, and a brand new crew of workers who speak a different language, with skill sets totally different than the planner is used to. This is what the computer user is facing: There's no compiler or problem description technique that will automatically generate a "work plan" to use that new architecture. It's all manual, and it's hard, and you're up against a brute force "why not just hook 500 people up to that rock and drag it" approach. The people who figure out the new way will certainly benefit society, but there's going to be a lot of false starts along the way. And, I'm not particularly sanguine about the process being automated (at least in the sense of automatic parallelizing compilers that recognize loops and repetitve stuff). I think that for the next few years (decades?) using new architectures is going to rely on skilled humans to figure out how to use it, on an ad hoc, unique to each application, basis. [Back in the 80s, I had a loaner "sugarcube" 4 node Intel hypercube sitting on my desk for a while. I wanted to figure out something to do with it that is non-trivial, and not the examples given in the docs (which focused on stuff like LISP and Prolog). I started, as I'm sure many people do, by taking a multithreaded application I had, and distributing the threads to processors. You pretty quickly realize, though, that it's tough to evenly distribute the loads among processors, and you wind up with processor 1 waiting for something that processor 2 is doing, which in turn is waiting for something that processor 3 is doing, and so forth. In a "shared processor" this isn't a big deal, and is transparent: the processor is always working, and aside from deadlocks, there's no particular reason why you need to balance load among threads. For what it's worth, the task I was doing was comparable to taking execution of a Matlab/simulink model and distributing it across multiple processors. 
You had signals flowing among blocks, etc. These things are computationally intensive (especially if you have loops in the design, so you need an iterative solution of some sort) so the idea of putting multiple processors to work is attractive. But the "work" in each block in the diagram isn't known a-priori and might vary during the course of the simulation, so it's not like you can come up with some sort of automatic partitioning algorithm. On 1/23/12 7:38 AM, "Prentice Bisbal" wrote: >If you read this PDF from Venray Technologies, which is linked to in the >article, you see where the 'Whole Bunch of Crazy" part comes from. After >reading it, Venray lost a lot of credibility in my book. > >https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf > >-- >Prentice > > >On 01/23/2012 08:45 AM, Eugen Leitl wrote: >> (Old idea, makes sense, will they be able to pull it off?) >> >> >>http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch >>-Of-Crazy/ >> >> CPU Startup Combines CPU+DRAM?And A Whole Bunch Of Crazy >> >> Sunday, January 22, 2012 - by Joel Hruska >> >> The CPU design firm Venray Technology announced a new product design >>this >> week that it claims can deliver enormous performance benefits by >>combining >> CPU and DRAM on to a single piece of silicon. We spent some time >>earlier this >> fall discussing the new TOMI (Thread Optimized Multiprocessor) with >>company >> CTO Russell Fish, but while the idea is interesting; its presentation is >> marred by crazy conceptualizing and deeply suspect analytics. >> >> The Multicore Problem: >> >> There are three limiting factors, or walls, that limit the scaling of >>modern >> microprocessors. First, there's the memory wall, defined as the gap >>between >> the CPU and DRAM clock speed. Second, there's the ILP (Instruction Level >> Parallelism) wall, which refers to the difficulty of decoding enough >> instructions per clock cycle to keep a core completely busy. Finally, >>there's >> the power wall--the faster a CPU is and the more cores it has, the more >>power >> it consumes. >> >> Attempting to compensate for one wall often risks running afoul of the >>other >> two. Adding more cache to decrease the impact of the CPU/DRAM speed >> discrepancy adds die complexity and draws more power, as does raising >>CPU >> clock speed. Combined, the three walls are a set of fundamental >> constraints--improving architectural efficiency and moving to a smaller >> process technology may make the room a bit bigger, but they don't >>remove the >> walls themselves. >> >> TOMI attempts to redefine the problem by building a very different type >>of >> microprocessor. The TOMI Borealis is built using the same transistor >> structures as conventional DRAM; the chip trades clock speed and >>performance >> for ultra-low low leakage. Its design is, by necessity, extremely >>simple. Not >> counting the cache, TOMI is a 22,000 transistor design, as compared to >>30,000 >> transistors for the original ARM2. The company's early prototypes, >>built on >> legacy DRAM technology, ran at 500MHz on a 110nm process. >> >> Instead of surrounding a CPU core with a substantial amount of L2 and L3 >> cache, Venray inserted a CPU core directly into a DRAM design. A TOMI >> Borealis core connects eight TOMI cores to a 1Gbit DRAM with a total of >>16 >> ICs per 2GB DIMM. This works out to a total of 128 processor cores per >>DIMM. 
>> Because they're built using ultra-low-leakage processes and are so >>small, >> such cores cost very little to build and consume vanishingly small >>amounts of >> power (Venray claims power consumption is as low as 23mW per core at >>500MHz). >> >> It's an interesting idea. >> >> The Bad: >> >> When your CPU has fewer transistors than an architecture that debuted in >> 1986, it's a good chance that you left a few things out--like an FPU, >>branch >> prediction, pipelining, or any form of speculative execution. Venray >>may have >> created a chip with power consumption an order of magnitude lower than >> anything ARM builds and more memory bandwidth than Intel's highest-end >>Xeons, >> but it's an ultra-specialized, ultra-lightweight core that trades 25 >>years of >> flexibility and performance for scads of memory bandwidth. >> >> >> The last few years have seen a dramatic surge in the number of >>low-power, >> many-core architectures being floated as the potential future of >>computing, >> but Venray's approach relies on the manufacturing expertise of >>companies who >> have no experience in building microprocessors and don't normally serve >>as >> foundries. This imposes fundamental restrictions on the CPU's ability to >> scale; DRAM is manufactured using a three layer mask rather than the >>10-12 >> layers Intel and AMD use for their CPUs. Venray already acknowledges >>that >> these conditions imposed substantial limitations on the original TOMI >>design. >> >> Of course, there's still a chance that the TOMI uarch could be >>effective in >> certain bandwidth-hungry scenarios--but that's where the Venray Crazy >>Train >> goes flying off the track. >> >> The Disingenuous and Crazy >> >> Let's start here. In a graph like this, you expect the two bars to >>represent >> the same systems being compared across three different characteristics. >> That's not the case. When we spoke to Russell Fish in late November, he >> pointed us to this publicly available document and claimed that the >>results >> came from a customer with 384 2.1GHz Xeons. There's no such thing as an >>S5620 >> Xeon and even if we grant that he meant the E5620 CPU, that's a 2.4GHz >>chip. >> >> The "Power consumption" graphs show Oracle's maximum power consumption >>for a >> system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB >>(yes, TB) >> of Flash and 15x 10,000 RPM hard drives. It's not only a worst-case >>figure, >> it's a figure utterly unrelated to the workload shown in the Performance >> comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, >>ten of >> them only come out to 1.3kW--Oracle's 17.7kW figure means that the >> overwhelming majority of the cabinet's power consumption is driven by >> components other than its CPUs. >> >> From here, things rapidly get worse. Fish makes his points about power >>walls >> by referring to unverified claims that prototype 90nm Tejas chips drew >>150W >> at 2.8GHz back in 2004. That's like arguing that Ford can't build a >>decent >> car because the Edsel sucked. >> >> After reading about the technology, you might think Venray was planning >>to >> market a small chip to high-end HPC niche markets... and you'd be >>wrong. The >> company expects the following to occur as a result of this revolutionary >> architecture (organized by least-to-most creepy): >> >> Computer speech will be so common that devices will talk to other >>devices >> in the presence of their users. 
>> >> Your cell phone camera will recognize the face of anyone it sees >>and scan >> the computer cloud for backround red flags as well as six degrees of >> separation >> >> Common commands will be reduced to short verbal cues like clicking >>your >> tongue or sucking your lips >> >> Your personal history will be displayed for one and all to >>see...women >> will create search engines to find eligible, prosperous men. Men will >>create >> search engines to qualify women. Criminals will find their jobs much >>more >> difficult because their history will be immediately known to anyone who >> encounters them. >> >> TOMI Technology will be built on flash memories creating the >>elemental >> unit of a learning machine... the machines will be able to self >>organize, >> build robust communicating structures, and collaborate to perform tasks. >> >> A disposable diaper company will give away TOMI enabled teddy bears >>that >> teach reading and arithmetic. It will be able to identify specific >> children... and from time to time remind Mom to buy a product. The bear >>will >> also diagnose a raspy throat, a cough, or runny nose. >> >> Conclusion: >> >> Fish has spent decades in the microprocessor industry--he invented the >>first >> CPU to use a clock multiplier in conjunction with Chuck H. Moore--but >>his >> vision of the future is crazy enough to scare mad dogs and Englishmen. >> >> His idea for a CPU architecture is interesting, even underneath the >> obfuscation and false representation, but too practically limited to >>ever >> take off. Google, an enthusiastic and dedicated proponent of energy >> efficient, multi-core research said it best in a paper titled "Brawny >>cores >> still beat wimpy cores, most of the time." >> >> "Once a chip?s single-core performance lags by more than a factor to >>two or >> so behind the higher end of current-generation commodity processors, >>making a >> business case for switching to the wimpy system becomes increasingly >> difficult... So go forth and multiply your cores, but do it in >>moderation, or >> the sea of wimpy cores will stick to your programmers? boots like clay." >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >>http://www.beowulf.org/mailman/listinfo/beowulf >> >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From lindahl at pbm.com Mon Jan 23 14:28:26 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 23 Jan 2012 11:28:26 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
Message-ID: <20120123192826.GB17383@bx9.net>

http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 14:59:30 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 23 Jan 2012 20:59:30 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120123192826.GB17383@bx9.net>
References: <20120123192826.GB17383@bx9.net>
Message-ID:

Interesting article. Difficult for me to analyse: usually you sell your business either when it's a success or when you want to run away. Not sure which of the two it is here.

Maybe some years from now, with some support from Intel, QLogic can also roll out FDR. Right now they're stuck with QDR, which on their homepage they announce as 40 gigabit per second.

http://www.qlogic.com/Products/adapters/Pages/InfiniBandAdapters.aspx

Showing the QLogic 7300 series.

Mellanox is slam-dunking with FDR now, the new network generation, which I suppose is double the bandwidth of QDR; it was rolled out a few months ago and should be shipping by now.

QLogic, AFAIK, hasn't even announced its next-generation network yet, let alone shown it, and still toys with QDR, which is what I toy with at home. The fact that they announced 'improving' the old QDR I would interpret as bad news for innovating towards FDR.

Maybe someone from Mellanox wants to comment on FDR and whether it's double the bandwidth of QDR, as I suppose some of them will be monitoring this list.

On Jan 23, 2012, at 8:28 PM, Greg Lindahl wrote:

> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at mcmaster.ca Mon Jan 23 15:00:07 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 23 Jan 2012 15:00:07 -0500 (EST)
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: <4F1D7EFF.7080206@ias.edu>
References: <20120123134510.GF7343@leitl.org> <4F1D7EFF.7080206@ias.edu>
Message-ID:

> If you read this PDF from Venray Technologies, which is linked to in the
> article, you see where the 'Whole Bunch of Crazy' part comes from. After
> reading it, Venray lost a lot of credibility in my book.
>
> https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf

Wow, you're not kidding. Mostly it makes me wonder whether the economy is such that you can actually get first-round VC with collateral like that!
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Mon Jan 23 15:17:01 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 12:17:01 -0800
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: Message-ID:

I don't know... Maybe the list of potential applications (some of which are speculative and well out there) is what it takes to justify VC. Like DARPA: high risk, high reward. The typical VC doesn't expect every investment to hit, but the ones that do, they want big returns from. If you're just interested in slogging through successive refinement, there are probably other sources of capital that are more appropriate.

While some of those things are downright creepy, none of them appears to violate the laws of physics, and someone with cash is willing to put some up to run the idea forward and establish a position (the patent term is 20 years, after all, which is a long way in the future in the technology world). In 2030 there may be gripes on the equivalent of Slashdot about how this Venray had patents on all the fundamental things people are using. Think of hyperlinks, mice, etc.

On 1/23/12 12:00 PM, "Mark Hahn" wrote:

>> If you read this PDF from Venray Technologies, which is linked to in the
>> article, you see where the 'Whole Bunch of Crazy' part comes from. After
>> reading it, Venray lost a lot of credibility in my book.
>>
>> https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf
>
> wow, you're not kidding. mostly it makes me wonder whether the economy
> is such that you can actually get first-round VC with collateral like
> that!
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From raysonlogin at gmail.com Mon Jan 23 15:50:09 2012
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 23 Jan 2012 15:50:09 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: References: <4F1D7EFF.7080206@ias.edu>
Message-ID:

On Mon, Jan 23, 2012 at 11:35 AM, Lux, Jim (337C) wrote:
> The "processors in a sea of memory" model has been around for a while
> (and, in fact, there were a lot of designs in the 80s, at the board if
> not the chip level: transputers, early hypercubes, etc.) So this is
> revisiting the architecture at a smaller level of integration.

I remember 12-15 years ago I was reading quite a few papers published by the Berkeley Intelligent RAM (IRAM) Project:

http://iram.cs.berkeley.edu/

So 15 years later someone suddenly thinks that it is a good idea to ship IRAM systems to real customers??
:-D

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/

> One thing about power consumption: those memory cells consume so little
> power because most of them are not being accessed. They're essentially
> "floating" capacitors, so the power consumption of the same transistor in
> a CPU (where the duty factor is 100%) is going to be higher than the
> power consumption in a memory cell (where the duty factor is 0.001% or
> something).
>
> And, as always, the challenge is in the software to effectively use the
> distributed computing architecture. When you think about it, we've had
> almost a century to figure out how to program single-instruction-stream
> computers of one sort or another, and it was easy, because we are
> single-stream (SISD) ourselves. We can create a simulation of multiple
> threads by timesharing in some sense (in either the human or machine
> models).
>
> And we have lots of experience with EP-type, or even scatter/gather-type
> processes (tilling land, building pyramids, assembly lines), so that
> model of software/hardware architecture can be argued to be a natural
> outgrowth of what humans already do, and have been figuring out how to do
> for millennia. (Did Imhotep use some form of project planning tools? You
> bet he did.)
>
> However, true parallelism (MIMD) is harder to conceptualize. Vector and
> matrix math is one area, but I'd argue that it's just the same as EP
> tasks, only at a finer grain. Systolic arrays, vector pipelines, and FFT
> boxes from Floating Point Systems are all basically ways to use the
> underlying structure of the task in an easy way (how long till there's a
> hardware implementation of the new faster-than-FFT algorithm published
> last week?). And in all those cases, you have to explicitly make use of
> the special capabilities. That is, in general, the compiler doesn't
> recognize it (although modern parallelizing compilers ARE really smart,
> so they probably do find most of the cases).
>
> I don't know that we have good conceptual tools to take a complex task
> and break it effectively into multiple disparate component tasks that
> can effectively run in parallel. It's a hard task for something
> straightforward (e.g. designing a big system or building a spacecraft),
> and I don't know that any of the outputs of current project planning
> techniques (which are entirely manual) can be said to produce
> "generalized" optimum outputs. They produce *an* output for dividing the
> complex task up (or else the project can't be done), but I don't know
> that the output is provably optimum or even workable (an awful lot of
> projects over-run, and not just because of bad estimates for time/cost).
>
> So the problem facing would-be users of new computing architectures (be
> they TOMI, HyperCube, ConnectionMachine, or Beowulf) is like that facing
> a project planner given a big project, and a brand new crew of workers
> who speak a different language, with skill sets totally different than
> the planner is used to.
>
> This is what the computer user is facing: there's no compiler or problem
> description technique that will automatically generate a "work plan" to
> use that new architecture. It's all manual, and it's hard, and you're up
> against a brute force "why not just hook 500 people up to that rock and
> drag it" approach.
> The people who figure out the new way will certainly benefit society,
> but there are going to be a lot of false starts along the way. And I'm
> not particularly sanguine about the process being automated (at least in
> the sense of automatic parallelizing compilers that recognize loops and
> repetitive stuff). I think that for the next few years (decades?) using
> new architectures is going to rely on skilled humans figuring out how to
> use them, on an ad hoc, unique-to-each-application basis.
>
> [Back in the 80s, I had a loaner "sugarcube" 4-node Intel hypercube
> sitting on my desk for a while. I wanted to figure out something to do
> with it that was non-trivial, and not the examples given in the docs
> (which focused on stuff like LISP and Prolog). I started, as I'm sure
> many people do, by taking a multithreaded application I had and
> distributing the threads to processors. You pretty quickly realize,
> though, that it's tough to evenly distribute the loads among processors,
> and you wind up with processor 1 waiting for something that processor 2
> is doing, which in turn is waiting for something that processor 3 is
> doing, and so forth. In a "shared processor" this isn't a big deal, and
> is transparent: the processor is always working, and aside from
> deadlocks, there's no particular reason why you need to balance load
> among threads.
>
> For what it's worth, the task I was doing was comparable to taking the
> execution of a Matlab/Simulink model and distributing it across multiple
> processors. You had signals flowing among blocks, etc. These things are
> computationally intensive (especially if you have loops in the design,
> so you need an iterative solution of some sort), so the idea of putting
> multiple processors to work is attractive. But the "work" in each block
> in the diagram isn't known a priori and might vary during the course of
> the simulation, so it's not like you can come up with some sort of
> automatic partitioning algorithm. (A toy sketch after this message
> illustrates the imbalance.)
>
> On 1/23/12 7:38 AM, "Prentice Bisbal" wrote:
>
>> If you read this PDF from Venray Technologies, which is linked to in
>> the article, you see where the 'Whole Bunch of Crazy' part comes from.
>> After reading it, Venray lost a lot of credibility in my book.
>>
>> https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf
>>
>> --
>> Prentice
>>
>> On 01/23/2012 08:45 AM, Eugen Leitl wrote:
>>> (Old idea, makes sense, will they be able to pull it off?)
>>>
>>> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/
>>>
>>> CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
>>>
>>> Sunday, January 22, 2012 - by Joel Hruska
>>>
>>> [...]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Mon Jan 23 15:58:11 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 12:58:11 -0800
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: Message-ID:

On 1/23/12 12:50 PM, "Rayson Ho" wrote:

> On Mon, Jan 23, 2012 at 11:35 AM, Lux, Jim (337C) wrote:
>> The "processors in a sea of memory" model has been around for a while
>> (and, in fact, there were a lot of designs in the 80s, at the board if
>> not the chip level: transputers, early hypercubes, etc.) So this is
>> revisiting the architecture at a smaller level of integration.
>
> I remember 12-15 years ago I was reading quite a few papers published
> by the Berkeley Intelligent RAM (IRAM) Project:
>
> http://iram.cs.berkeley.edu/
>
> So 15 years later someone suddenly thinks that it is a good idea to
> ship IRAM systems to real customers?? :-D
>
> Rayson

Or maybe all good ideas keep coming up again, and each time the idea is refined a bit, or another possible source of funding appears. Look at "solar power transmitted by microwaves from orbit" as an example. That one has a 15-20 year cycle time. You have an idea which is attractive... you get some money to run it forward, and then insurmountable problems crop up, discoverable only with significant investment of time/money (>> 1 work month). That puts the idea to sleep for a while, until either the reasons are forgotten or technology has advanced to the point where what was unreasonable the previous time is reasonable now.

Certainly in the computing world, where 10-15 years is sufficient for many orders of magnitude of change in performance along many axes, it pays to revisit things, since what may have been a good balance or trade back then isn't now. And that's sort of the thrust of their white paper (justifying that now the time is right), as well as staking their claim to a bunch of general applications, few of which are uniquely enabled by their proposed technology.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
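Jim's load-balancing observation above (threads pinned to processors end up waiting on one another) is easy to make concrete. A minimal sketch with invented task costs, comparing static round-robin pinning against greedy assignment to the least-loaded processor (a stand-in for workers pulling from a shared queue); only the qualitative gap matters:

    # Toy model of the point above: with uneven, unpredictable task costs,
    # static pinning leaves processors idle while one straggler finishes.
    # Task costs are invented for illustration.
    import heapq
    import random

    random.seed(42)
    tasks = [random.expovariate(1.0) for _ in range(64)]  # uneven block costs
    nprocs = 4

    # Static: round-robin pinning; finish time is the most-loaded processor.
    static_makespan = max(sum(tasks[p::nprocs]) for p in range(nprocs))

    # Dynamic: greedy longest-task-first onto the least-loaded processor,
    # a stand-in for workers pulling work from a shared queue.
    loads = [0.0] * nprocs
    heapq.heapify(loads)
    for t in sorted(tasks, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    dynamic_makespan = max(loads)

    print(f"static pinning: {static_makespan:.1f}")
    print(f"shared queue  : {dynamic_makespan:.1f}")  # usually noticeably smaller

The gap grows with the variance of the task costs, which is exactly the situation described for simulation blocks whose work isn't known a priori.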
From hahn at mcmaster.ca Mon Jan 23 16:19:34 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 23 Jan 2012 16:19:34 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120123192826.GB17383@bx9.net>
References: <20120123192826.GB17383@bx9.net>
Message-ID:

> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

Wonder what Intel's thinking - they could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB, anyone?

I'm not crazy about Intel being a vertically-integrated HPC supplier (chips, systems, interconnect, MPI, compilers - I guess they still don't have their own scheduler or sexy cloud branding ;)

The world is a better place when each level has internal competition based on useful, open (free), multi-implementation standards.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Mon Jan 23 16:33:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 23 Jan 2012 16:33:48 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DD23C.8080601@scalableinformatics.com>

On 01/23/2012 04:19 PM, Mark Hahn wrote:
> the world is a better place when each level has internal competition
> based on useful, open (free), multi-implementation standards.

Markets always go through these full-on vertical-integration phases (for a while) before the assets are sold off (either voluntarily or via bankruptcy court). It's a natural part of the business cycle. Cisco is building servers now. Oracle, the whole stack. Pretty soon some whippersnapper of a company is going to come along and eat their lunches, and then they will get competitive pressure to change.

This said, many *many* large university sites like dealing with "a single vendor" (that is, until they eventually get screwed over by that one vendor, or realize that the "great deal" they are getting really isn't as great as it sounded ...). Which is part of the reason it's so hard getting into accounts other vendors have locked up.

Sadly, lots of this works around the spirit (while probably skating very close to the edge of the letter) of the law surrounding most public acquisition processes, but that's life I guess.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From prentice at ias.edu Mon Jan 23 16:46:11 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 23 Jan 2012 16:46:11 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DD523.4020005@ias.edu>

On 01/23/2012 04:19 PM, Mark Hahn wrote:
>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
> wonder what Intel's thinking - could do some very interesting stuff,
> but it would take a bit of charisma. QPI-over-IB anyone?

That's what I'm thinking!
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 16:49:12 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 23 Jan 2012 22:49:12 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: <20120123192826.GB17383@bx9.net>
Message-ID:

On Jan 23, 2012, at 10:19 PM, Mark Hahn wrote:

>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>
> wonder what Intel's thinking - could do some very interesting stuff,
> but it would take a bit of charisma. QPI-over-IB anyone?

Forget it.

> I'm not crazy about Intel being a vertically-integrated HPC supplier
> (chips, systems, interconnect, mpi, compilers - I guess they still
> don't have their own scheduler or sexy cloud branding ;)

Maybe they just want a new-generation Ethernet NIC dirt cheap for their motherboards; if you produce it in the numbers they do, probably anything gets dirt cheap. This doesn't hit the high end, yet it might be cheaper to buy QLogic than to pay royalties to either of the InfiniBand vendors, which would be Mellanox or QLogic.

Also, they bought QLogic's InfiniBand business for 125 million dollars, though in cash, which doesn't seem exceptionally much from Intel's viewpoint, whereas they might intend to sell some of their upcoming line of vector CPUs, which badly need a network, of course. 125 million is just a few supercomputers.

Maybe it was just a cheap buy, as QLogic doesn't have FDR yet; who knows?

What I wonder about is how Wall Street knew in advance about QLogic getting taken over. If we look carefully, we see that since roughly December 19th, 2011, the NASDAQ rose roughly 10.5% and QLogic rose quite a lot more, by several percent. So it was in significantly more demand than the index, which is odd given that QLogic has rolled out nothing in those months, whereas its competitor Mellanox has rolled out FDR.

It's obvious some traders knew this deal was coming, but real finger-pointing is not my job.

Vincent

> the world is a better place when each level has internal competition
> based on useful, open (free), multi-implementation standards.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From samuel at unimelb.edu.au Mon Jan 23 18:00:02 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 10:00:02 +1100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: References: <20120123192826.GB17383@bx9.net>
Message-ID: <4F1DE672.6000602@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 08:19, Mark Hahn wrote:

> wonder what Intel's thinking - could do some very interesting stuff,
> but it would take a bit of charisma. QPI-over-IB anyone?

I remember hearing way back that IB was going to be the technology to replace all those various buses (PCI, etc) on a motherboard [1]; then it all went quiet, and then it re-emerged as an interconnect.

So perhaps Intel (who were part of one of the two groups that merged to create IB) have thoughts again on this?

cheers,
Chris

[1] Interestingly, a similar comment appears on the IB Wikipedia page under history, but sadly without references...
http://en.wikipedia.org/wiki/InfiniBand#History

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8d5nIACgkQO2KABBYQAh+rcACgjTSmbr9EC4clrh0J2EQUT8lX
Sz0AniUG4pdhBkliNWGq5E1tsXiOa8IV
=0k6Z
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From joshua_mora at usa.net Mon Jan 23 18:02:12 2012
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Mon, 23 Jan 2012 17:02:12 -0600
Subject: [Beowulf] Intel buys QLogic InfiniBand business
Message-ID: <708qawXBm8848S02.1327359732@web02.cms.usa.net>

Do you mean IB over QPI? Either way, high node count coherence will be an issue.

In any case, by acquiring their IP it is a step forward towards SoC (System on Chip): a preliminary step (building block) for the Exascale strategy and for low-cost enterprise/cloud solutions.

Joshua

------ Original Message ------
Received: 03:47 PM CST, 01/23/2012
From: Prentice Bisbal
To: beowulf at beowulf.org
Subject: Re: [Beowulf] Intel buys QLogic InfiniBand business

> On 01/23/2012 04:19 PM, Mark Hahn wrote:
>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>> wonder what Intel's thinking - could do some very interesting stuff,
>> but it would take a bit of charisma. QPI-over-IB anyone?
>
> That's what I'm thinking!
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 18:24:15 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 00:24:15 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <708qawXBm8848S02.1327359732@web02.cms.usa.net>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net>
Message-ID: <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl>

On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:

> Do you mean IB over QPI?
> Either way, high node count coherence will be an issue.

Just ignore his statement; it's total nonsense: nanosecond latency on QPI, using two rings, versus something with latency up to a factor of 1000 slower, with PCIe as the slowest delaying factor.

Doing cache coherency over that? Forget it.

From what I understand, a big problem in modern CPUs is the crossbar. In the latest chip shown, Bulldozer, it takes a significant number of transistors. If you suddenly confront that crossbar with latencies a factor of 4000 slower, that's not going to make it perform better, of course.

> In any case, by acquiring their IP it is a step forward towards SoC
> (System on Chip): a preliminary step (building block) for the Exascale
> strategy and for low-cost enterprise/cloud solutions.

Not with Intel. Intel sells fast equipment, yet always at a huge price, about the opposite of InfiniBand, which is a dirt-cheap technology.

I guess we must see this much more simply. For a giant such as Intel, paying a bit over 100 million is peanuts - probably less than what they would need to pay in royalties to a manufacturer owning a bunch of patents in the Ethernet NIC area; the HPC side Intel gets 'for free'. It allows them to produce maybe a 10 gigabit Ethernet NIC dirt cheap without needing to pay royalties to QLogic. Such a 10 gigabit Ethernet NIC will not be a big performer, yet price matters a lot when integrating. Every penny counts then.

What you typically see with Intel is that the mass market is so important to them (read: the 1 gigabit Ethernet market right now) that all their other products suffer, as they will always, of course, give their mass-market products priority. Itanium is a good example; it was always process generations behind their main products. It never was given a fair chance to compete. So where they win with Sandy Bridge, because it's soon a process generation or two ahead of AMD, their other products suffer, as they don't get that process technology.

Meanwhile, low latency in Ethernet is totally crucial for the financial world, as they can make dozens of billions a year by being faster than others at the exchanges.

Now back to that mass market and the integration of a good and especially cheap 10 gigabit NIC onto Intel's mainboards: this buy might be pretty interesting to Intel. Yet that's a market so big it has nothing to do with HPC, I'd argue.
From an HPC viewpoint I wouldn't see this takeover as a threat to anyone in HPC; I guess it basically means Intel won't challenge for the crown in HPC, giving Mellanox a monopoly at FDR for a while. It's about Ethernet, I bet.

> [...]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Mon Jan 23 19:03:14 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 23 Jan 2012 19:03:14 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl>
Message-ID: <4F1DF542.6050504@scalableinformatics.com>

On 01/23/2012 06:24 PM, Vincent Diepeveen wrote:
> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:

[...]

> Nanosecond latency on QPI, using two rings, versus something with
> latency up to a factor of 1000 slower, with PCIe as the slowest
> delaying factor.
>
> Doing cache coherency over that? Forget it.

Hear that, Shai F.? Stop work on vSMP now, 'cause Vincent says it can't work!!!

More seriously, with this acquisition I could see serious contention for ScaleMP. SoC-type stuff, using IB between many nodes, in smaller boxen.
> Allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap ... which they have been doing for years ... > without needing to pay royalties to qlogic. ... not sure they were, but its possible Qlogic has 10GbE IP that Intel licenses, but this transaction was about ... Infiniband ... [...] > meanwhile ethernet is total crucial to have low latency for the > financial world, as they can make dozens of billions a year by being > faster > than others at exchanges. Errr ... given that this is one of our core markets, don't mind if I note that latency is critical to these players, so proximity to the exchange, and reliable and deterministic latency is absolutely critical. There are switches that are doing 300ns port to port in the Ethernet space now. With the NICs, you are looking in the 2-ish microsecond regime. These are not cheap. Compare this to QDR. 1 microsecond +/- some. Which has lower latency? There are many reasons why exchanges (mostly) aren't on IB. A few of them are even valid technical reasons. Historical momentum, and conservative approaches to new technology rank pretty high. So does the inability to generally export IB far and wide. And the complexity of the stack. Ethernet is (almost) plug and play. Its just a network. IB is sort of kind of plug, install OFED, and play for a while over IPoIB until you can recode for some of the RDMA bits. And don't try to run file systems and other things with lots of traffic over IPoIB. It leaks and gradually you will catch some cool ... surprises. Honestly, its a shame that IPoIB never really got the attention it deserved like the other elements of the IB stack did. Getting a rock solid IP implementation atop a fast/low latency net could have driven many design wins outside of HPC. And would have been a gateway drug^H^H^H^Htechnology for using the other stack elements. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Mon Jan 23 19:06:43 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 23 Jan 2012 19:06:43 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F1DF542.6050504@scalableinformatics.com> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> Message-ID: <4F1DF613.1060603@scalableinformatics.com> On 01/23/2012 07:03 PM, Joe Landman wrote: > Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't > work!!! > There is an implicit /sarc tag here BTW. vSMP does a wonderful job (where Vincent claims that things won't work ... they do work, and very well at that). > More seriously, with this acquisition, I could see serious contention > for ScaleMP. SoC type stuff, using IB between many nodes, in smaller boxen. Serious contention to buy ScaleMP (as in potential acquirers) Must be getting too much blood in the coffee stream. Can't communicate ... 
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From atp at piskorski.com Mon Jan 23 19:30:30 2012
From: atp at piskorski.com (Andrew Piskorski)
Date: Mon, 23 Jan 2012 19:30:30 -0500
Subject: [Beowulf] CPU Startup Combines CPU+DRAM--And A Whole Bunch Of Crazy
In-Reply-To: References: Message-ID: <20120124003030.GA80957@piskorski.com>

On Mon, Jan 23, 2012 at 03:50:09PM -0500, Rayson Ho wrote:

> http://iram.cs.berkeley.edu/
>
> So 15 years later someone suddenly thinks that it is a good idea to
> ship IRAM systems to real customers?? :-D

Sure. But from when I last read about the IRAM stuff, I'm pretty sure it was strictly single-core. Their VIRAM1 chip had 13 MB of DRAM, 1 CPU core, and 4 "vector lanes", with no mention of SMP or any sort of multi-chip parallelism at all.

If Venray has a good design for using hundreds or more IRAM-like chips in a parallel machine, that sounds like a significant step forward. (The intended fab process and attendant design rules might also be quite different, although I'm not at all sure about that.)

--
Andrew Piskorski
http://www.piskorski.com/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 19:40:13 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 01:40:13 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F1DF542.6050504@scalableinformatics.com>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com>
Message-ID:

On Jan 24, 2012, at 1:03 AM, Joe Landman wrote:

> On 01/23/2012 06:24 PM, Vincent Diepeveen wrote:
>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
>
> [...]
>
>> Nanosecond latency on QPI, using two rings, versus something with
>> latency up to a factor of 1000 slower, with PCIe as the slowest
>> delaying factor.
>>
>> Doing cache coherency over that? Forget it.
>
> Hear that, Shai F.? Stop work on vSMP now, 'cause Vincent says it
> can't work!!!
>
> More seriously, with this acquisition I could see serious contention
> for ScaleMP. SoC-type stuff, using IB between many nodes, in smaller
> boxen.

That would be some BlueGene-type machine you're speaking of, which Intel would produce with a low-power SoC. And at this point the BlueGene-type machines simply can't compete with the tiny processors that get produced by the dozens of millions.

"The tiny processors have won" - Linus Torvalds

Intel has itself a second law of Moore; you can Google it: every new generation of factory that can produce chips with double the number of transistors is itself 2x more expensive to build.
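That doubling compounds quickly. A two-line sketch; the baseline cost and the two-year cadence are assumptions picked purely for illustration, chosen so the curve lands near the projection Vincent cites next:

    # Fab cost doubling per process generation (Moore's "second law",
    # a.k.a. Rock's law). Baseline cost and 2-year cadence are assumed.
    cost_billion, year = 2.5, 2014
    while year <= 2020:
        print(f"{year}: ~${cost_billion:.1f}B per fab")
        cost_billion *= 2
        year += 2
    # -> 2014: $2.5B, 2016: $5B, 2018: $10B, 2020: $20B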
A few years ago Intel projected that by 2020 building a single factory would cost 20 billion dollars. Now Obama might contribute to this by overspending 40-50%, more than the overspending of Greece, Spain, the UK and Portugal combined. That will cause massive inflation, which will hurt the poor most, and it sure will help the second law of Moore become reality sooner rather than later. Yet moving from politics back to money and mass production: I hope you realize that a few HPC CPUs won't pay back 20 billion dollars. In short, only CPUs that get mass-produced can.

A good example of mass-produced processors is GPUs. If we look at the leading GPUs, which by now have thousands of cores, there is no way to compete with that using SoCs. What's the price of producing one GPU versus 200 SoCs with a small core each?

Furthermore, Intel so far never really could compete in the SoC world with the low-power CPUs that get produced by the billion a year, so betting on that would be a quite surprising, though not impossible, gamble.

Intel has always been good at low-latency designs, yet obviously further integration of logic into the CPU means you also need a capable Ethernet chip in your CPU. QLogic can provide that. Mass-produce half a billion of those, and it's cheaper to buy a company with such technology than to pay royalties.

Another HPC problem with the BlueGene-type designs: all those SoCs basically spread the computing power over a bigger area than one big power-eating chip does. A bigger area means a bigger distance over which to move massive amounts of data, and that is in itself a very expensive thing.

Seen overall, BlueGene machines never really had low power usage, despite some stupid professors shouting that. Per gflop it was never the performance king; they just compared against totally hopeless designs, and IBM usually delivered on time, something that is very important in HPC as well. IMHO the only reason BlueGene could be competitive is that it was fighting dinosaur-type HPC CPUs.

Now SoCs might be mighty interesting in the gamers' world and in telecom for building new phones, which makes it mighty interesting for Intel to produce them dirt cheap, and maybe even put a more capable Ethernet chip on them, again dirt cheap. As for the HPC world, I don't see it happen that this SoC can compete in any way with a GPU or even a CPU. Better to write some code in CUDA or OpenCL, I'd argue.

The latest AMD GPU, the Radeon HD 7970, is delivering 1 teraflop or so, double precision, with a two-GPU version soon coming on one card that's going to deliver close to 2 Tflop a card. Multiply by 4 for single precision: 8+ teraflops single precision, for a couple of hundred dollars. Nvidia will undoubtedly follow with their 1-teraflop GPU.

If you take a washing machine and pack it with cheapo SoCs, creating a 2 Tflop machine, do you guess you can SELL that for a couple of hundred dollars? The transport costs alone will already be more than a single GPU card...

Intel cannot compete with that in HPC for the stuff that needs bandwidth and doesn't care about latency:
After all they already make cash on majority of supercomputers as each node also usually has 2 Xeon cpu's which go for a multiple of the price of the GPU that's in the box... > >>> In any case, by acquiring their IP it is a step forward towards SoC >>> (System on >>> Chip). A preliminary step (building block) for the Exascale >>> strategy and for >>> low cost enterprise/cloud solutions. > > Yes. > >> Not with intel. Intel sells fast equipment yet it has a huge price >> always, >> about the opposite of infiniband which is a dirt cheap technology. > > Must use Shakespeare for this takedown: Methinks thou dost protesteth > too much ... > >> >> I guess we must see this much simpler. At such a giant as intel, >> paying a bit over 100 million is peanuts. >> Probably less than what they would need to pay for royalties to a >> manufacturer owning a bunch of patents >> in the ethernet NIC area; the HPC intel gets 'for free'. > > So ... exactly what are the existing intel 10GbE NIC's then ... Swiss > Cheese? I see a fair number of vendors licensing Intel's IP, or, more > to the point, using Intel silicon (hint: this might be a good > reason for > the acquisition) to build their stuff... > >> Allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap > > ... which they have been doing for years ... > >> without needing to pay royalties to qlogic. > > ... not sure they were, but its possible Qlogic has 10GbE IP that > Intel > licenses, but this transaction was about ... Infiniband ... > > [...] > >> meanwhile ethernet is total crucial to have low latency for the >> financial world, as they can make dozens of billions a year by being >> faster >> than others at exchanges. > > Errr ... given that this is one of our core markets, don't mind if I > note that latency is critical to these players, so proximity to the > exchange, and reliable and deterministic latency is absolutely > critical. > There are switches that are doing 300ns port to port in the Ethernet > space now. With the NICs, you are looking in the 2-ish microsecond > regime. These are not cheap. > > Compare this to QDR. 1 microsecond +/- some. > > Which has lower latency? > > There are many reasons why exchanges (mostly) aren't on IB. A few of > them are even valid technical reasons. Historical momentum, and > conservative approaches to new technology rank pretty high. So > does the > inability to generally export IB far and wide. And the complexity of > the stack. Ethernet is (almost) plug and play. Its just a network. > > IB is sort of kind of plug, install OFED, and play for a while over > IPoIB until you can recode for some of the RDMA bits. And don't > try to > run file systems and other things with lots of traffic over IPoIB. It > leaks and gradually you will catch some cool ... surprises. > > Honestly, its a shame that IPoIB never really got the attention it > deserved like the other elements of the IB stack did. Getting a rock > solid IP implementation atop a fast/low latency net could have driven > many design wins outside of HPC. And would have been a gateway > drug^H^H^H^Htechnology for using the other stack elements. > > > > -- > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics Inc. 
> email: landman at scalableinformatics.com > web : http://scalableinformatics.com > http://scalableinformatics.com/sicluster > phone: +1 734 786 8423 x121 > fax : +1 866 888 3112 > cell : +1 734 612 4615 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Jan 23 19:51:59 2012 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 24 Jan 2012 11:51:59 +1100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> Message-ID: <4F1E00AF.4090206@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/01/12 11:40, Vincent Diepeveen wrote: > Overall seen bluegene machines never really had a low power usage, > despite some stupid professors shouting that. So that's why the top 5 places on the last Green500 are all BlueGene.. - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8eAK8ACgkQO2KABBYQAh+nIwCdH88tISGrx772Sq/57XquLFRb GtcAni1urHGd2j+MIJA0LXG2sGk+YymR =tfjM -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Jan 23 20:00:43 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 24 Jan 2012 02:00:43 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F1E00AF.4090206@unimelb.edu.au> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E00AF.4090206@unimelb.edu.au> Message-ID: <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl> On Jan 24, 2012, at 1:51 AM, Christopher Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 24/01/12 11:40, Vincent Diepeveen wrote: > >> Overall seen bluegene machines never really had a low power usage, >> despite some stupid professors shouting that. > > So that's why the top 5 places on the last Green500 are all BlueGene.. > I wondered about that as well. When i see 1 gpu get nearly 1 teraflop eating probably a tad more power than official, say a 250 watt it'll consume. I already use more power now than the specs in fact. Yet even then that's 4 gflop per watt. Last time i calculated bluegene, sure that's probably the previous generation, it was 3 watts per gflop, or factor 12 more power than a Radon HD 7970. 
Please note that in the statements of most HPC centers claiming BlueGene to be energy efficient, they usually do not release numbers.

But now the important question: what's the price of BlueGene per teraflop? Let's have a look: it's around 500 euro or so for a Radeon HD 7970 card.

Vincent

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From samuel at unimelb.edu.au Mon Jan 23 20:06:41 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 12:06:41 +1100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E00AF.4090206@unimelb.edu.au> <7C608C57-D51B-4369-A973-6943E2D2DB7C@xs4all.nl>
Message-ID: <4F1E0421.80009@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 12:00, Vincent Diepeveen wrote:

> But now the important question: what's the price of BlueGene per teraflop?
>
> Let's have a look: it's around 500 euro or so for a Radeon HD 7970 card.

What does that matter if you can't power or cool a similar performance GPU system? Let alone have any applications that will actually take advantage of it.

cheers,
Chris

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8eBCEACgkQO2KABBYQAh839wCdFz1MjiPGCKwvbKpANCmJZpnU
V4UAoJYIfKNf6VleNi0SduPcBtSkqxQq
=E7Rh
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
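Both sides of this exchange can be put on the same sheet of paper. A back-of-envelope sketch, where the electricity price, PUE, and lifetime are illustrative assumptions rather than figures from either poster:

    card_eur = 500.0     # claimed price of a ~1 DP TFLOP Radeon HD 7970
    card_watts = 250.0   # assumed draw under load
    years = 3            # assumed service life
    eur_per_kwh = 0.15   # assumed electricity tariff
    pue = 1.8            # assumed datacenter cooling/distribution overhead

    energy_eur = card_watts / 1000 * 24 * 365 * years * eur_per_kwh * pue
    print(f"card: {card_eur:.0f} EUR, 3y power+cooling: ~{energy_eur:.0f} EUR")

Under these assumptions the electricity (~1770 EUR) dwarfs the ~500 EUR card, which is exactly why the power-and-cooling objection matters more than the sticker price.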
From deadline at eadline.org Mon Jan 23 20:07:58 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 23 Jan 2012 20:07:58 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <4F1DD523.4020005@ias.edu>
References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu>
Message-ID: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>

> On 01/23/2012 04:19 PM, Mark Hahn wrote:
>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?
>
> That's what I'm thinking!

Numascale does this already with SCI

--
Doug

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Mon Jan 23 20:15:30 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 23 Jan 2012 20:15:30 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net>
Message-ID: <2d90512c0be6a3eba887e5f6ab96b3c1.squirrel@mail.eadline.org>

>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>
> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?

There were some exascale goals mentioned. I wonder if there are plans for a MIC-based exascale beast.

--
Doug

> I'm not crazy about Intel being a vertically-integrated HPC supplier (chips, systems, interconnect, MPI, compilers - I guess they still don't have their own scheduler or sexy cloud branding ;)
>
> the world is a better place when each level has internal competition based on useful, open (free), multi-implementation standards.

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ellis at cse.psu.edu Mon Jan 23 20:19:08 2012
From: ellis at cse.psu.edu (Ellis H. Wilson III)
Date: Mon, 23 Jan 2012 20:19:08 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com>
Message-ID: <4F1E070C.4040107@cse.psu.edu>

On 01/23/2012 07:40 PM, Vincent Diepeveen wrote:
>>>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote:
>>>> Nanosecond latency of QPI using 2 rings, versus something that has a latency up to a factor 1000 slower, with the PCIe as the slowest delaying factor.
>>>>
>>>> Doing cache coherency over that, forget it.
>>>
>>> Hear that Shai F? Stop work on vSMP now, cause Vincent says it can't work!!!
>>>
>>> More seriously, with this acquisition, I could see serious contention for ScaleMP. SoC type stuff, using IB between many nodes, in smaller boxen.
>>
>> That would be some BlueGene type machine you speak about that Intel would produce with a low power SoC.
>>
>> This is where, at this point, the bluegene type machines simply can't compete with the tiny processors that get produced by the dozens of millions.

For...chess? ;D

>> "The tiny processors have won"
>> Linus Thorvalds

*Torvalds, and if Linux (or any well-supported kernel/OS for that matter) currently had data structures designed for extremely high parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I would agree with this statement. As I currently see it, all we can really say is that someday, probably, perhaps even hopefully:

"The tiny processors will win."

That's after we work out all the nasty nuances involved with designing new data structures for OSes that can handle that number of cores, and probably design new applications that can use these new OS features. And no, GPU support in Linux doesn't count as this already having been done. We just farm out very specific code to run on those things. If somebody has an example of a full-blown, usable OS running on a GPU ALONE, I would stand (very interestingly) corrected.

>> Intel has themselves a second law of Moore. You can google for it.

Thanks, for a moment there, I almost used AskJeeves.

>> A good example of mass-produced processors are GPUs.

Was waiting for the hook. Inevitable really. I think if we were discussing the efficacy and quality of resultant bread from various bread machines versus the numerous methods for making bread by hand, somehow, someway, a GPU would make better bread. Might be a wholesome cyber-loaf of artisan wheat, but nonetheless, it would be better in every way.

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 20:44:10 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 02:44:10 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F1E070C.4040107@cse.psu.edu>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID:

In hardware you cannot beat manycore performance with big CPUs at the same cost; CPUs have an exponential cost structure, for example to maintain cache coherency.

This has many implications, for example on size and scale. If you produce a 1000 mm^2 CPU it is extremely expensive with really low yields, whereas a 1000 mm^2 manycore is not a problem at all; cores that do not work you can just turn off. There is no coherency. So if you produce bigger CPUs, the price per square millimeter goes up, while with manycores it scales near linearly.
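A toy defect-yield model makes that claim concrete. A sketch only: the defect density is an illustrative assumption, not a foundry number. The point is that a monolithic die must be defect-free, while a manycore die just fuses off bad cores:

    import math

    defects_per_mm2 = 0.002   # assumed defect density (illustrative)
    area_mm2 = 1000.0
    cores = 100               # manycore: assume 100 tiles of 10 mm^2

    # Poisson model: P(region has zero defects) = exp(-D * A)
    monolithic = math.exp(-defects_per_mm2 * area_mm2)
    per_core_ok = math.exp(-defects_per_mm2 * area_mm2 / cores)

    print(f"1000 mm^2 monolithic die yield: {monolithic:.0%}")        # ~14%
    print(f"expected good cores per manycore die: {per_core_ok:.0%}") # ~98%

Under these assumptions only ~1 in 7 monolithic dies is sellable, while nearly every manycore die ships with almost all cores working.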
If I remember well, in 2007 an NCSA director already had the implication of this reality in his sheets, assuming that by 2010 NCSA would build supercomputers exclusively using manycores.

Note that manycores are not ideal for chess; they are, however, usable for the majority of the system time that gets burned in HPC, as the majority of HPC needs throughput rather than latency. Comparing bluegene machines with GPUs makes perfect sense of course, as the latency on them is also totally crap.

I see the bluegene system as a genius move by IBM, starting an evolution away from huge expensive CPUs, where you produce just a handful in a totally outdated process technology, with extremely bad yields, and a million dollars of startup costs, which by now would be approaching 20 million dollars at today's factories just to print a single batch of processors. IBM, developing POWER8, will have a serious problem with newer generation factories. Every batch they print, every mistake it has: DANG, 20 million dollars gone.

This concept of using simple CPUs, yet not that massively produced, has obviously evolved now into the GPU, which is one mass-produced cheap chip that integrates all those tiny cores into one CPU, which is way cheaper.

What's the price of a bluegene system per teraflop?

It's 500 euro for a 1 teraflop double precision Radeon HD 7970...

On Jan 24, 2012, at 2:19 AM, Ellis H. Wilson III wrote:
> [...]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Mon Jan 23 20:55:41 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 02:55:41 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu> <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
Message-ID: <534AD42D-DC33-4199-B476-9ADED3E09073@xs4all.nl>

On Jan 24, 2012, at 2:07 AM, Douglas Eadline wrote:

>>> On 01/23/2012 04:19 PM, Mark Hahn wrote:
>>>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>>> wonder what Intel's thinking - could do some very interesting stuff, but it would take a bit of charisma. QPI-over-IB anyone?
>>
>> That's what I'm thinking!
>
> Numascale does this already with SCI

They sold 300 systems, is the claim on their homepage. Not exactly what Intel aims for.

I bet they instead aim to sell half a billion CPUs with built-in Ethernet; let's face it, their NICs started to get outdated. For HPC it won't be a slamming success, let alone give you any performance. After all, what's the price of 1000 SoCs with 1000 tiny CPUs on them, that together produce 1 teraflop, versus 1 manycore that produces 1 teraflop?

This is not what you buy QLogic for. Maybe it was just a cheap buy for the number of patents they possess, and for the big need within Intel for engineers who can improve their CPUs with connectivity that the average user will like. As for HPC: with those engineers moved within Intel to the areas where Intel can make the most cash, which is CPUs and not HPC hardware, it seems Mellanox gets a monopoly on HPC network performance.

> --
> Doug
> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at pbm.com Mon Jan 23 23:55:41 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 23 Jan 2012 20:55:41 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120123192826.GB17383@bx9.net>
References: <20120123192826.GB17383@bx9.net>
Message-ID: <20120124045541.GB10196@bx9.net>

On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:

> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

I figured out the main why:

http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets

> Server-class 10Gb Ethernet Adapter and LOM revenues have recently surpassed $100 million per quarter, and are on track for about fifty percent annual growth, according to Crehan Research.

That's the whole market, and QLogic says they are #1 in the FCoE adapter segment of this market, and #2 in the overall 10 gig adapter market (see http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript).

Historically, QLogic had a fibre channel adapter business that was a huge cash cow, and they bought their way into various markets with limited success: iSCSI, fibre channel switches, and yes, InfiniBand, where QLogic managed to get some large sales (TriLabs 3 PF procurement) yet was at only 15%-20% market share.

I'm surprised that QLogic could succeed in 10GigE adapters given all the competition, but hey, I never understood why fibre channel was popular, either. Now that QLogic has found what the next best thing after fibre channel adapters is, they might as well concentrate on it.

It'll be interesting to see what Intel plans to do in the exascale market. I've thought for a long time that non-cache-coherent processors like MIC ought to have InfiniPath-like hardware queues for sending and receiving short messages efficiently, even on-chip. Not to mention that whole exascale thing.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From scrusan at ur.rochester.edu Tue Jan 24 00:02:26 2012
From: scrusan at ur.rochester.edu (Steve Crusan)
Date: Tue, 24 Jan 2012 00:02:26 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>
> It's 500 euro for a 1 teraflop double precision Radeon HD 7970...
Great, and nothing runs on it.

GPUs are insanely useful for certain tasks, but they aren't going to be able to handle most normal workloads (similar to the BG class, of course). Any center that buys BGP (or Q at this point) gear is going to pay for a scientific programmer to adapt their code to take advantage of the BG's strength: parallelism.

But it's nice that supercomputing centers use GPUs to boost their flops numbers. Any word on that Chinese system's efficiency?

If you look at the architecture of the new K computer in Japan, it's similar to the BlueGene line.

PS: I'm really not an IBMer.

> [...]

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJPHjtzAAoJENS19LGOpgqKUHUH/Rvn6tXy8Kla86JNbNwt3KUJ
B+70SwJL/aBDstcDG4ChT5uW0WCcuvS7qRx5e1Zwu68m7qFEZRvIwc0uu0bgHbxt
KRynFRZ6suwudEp0o4HMpCBYNaC7uG7xkUeFbUHKfnfCflWDoz4Y9Fq3a/OhoriK
a5JrQqjVI6HZij+xDqrFvyn80Ec8eSwfRYd8lxfq4abHtE1tKYm/cF5I5Bn2lD5l
wVNvBQiU99ZPeqhcbL5XyvIsceB6ncodJ9zmBxIahrNIogMCq7UJbUhsikSRp6Dd
cL7r0AekTyiRmvZaHZZKbuad68DfATT4hy9/HzodBqTWLxxTMlrW8vNH9a7dSOo=
=oA7r
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From samuel at unimelb.edu.au Tue Jan 24 00:09:57 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 24 Jan 2012 16:09:57 +1100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID: <4F1E3D25.7000008@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 16:02, Steve Crusan wrote:

> Any center that buys BGP (or Q at this point) gear is going to pay for a scientific programmer to adapt their code to take advantage of the BG's strength: parallelism.

The advantage of the BG platform though is that it's just MPI and threads, nothing that unusual at all - certainly no need to learn CUDA, OpenCL, etc.

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8ePSUACgkQO2KABBYQAh+hPQCggfFgdr9R9G6H7hW0Dk1/sGK+
Fe8Aniu7M6CEThw0s7F2CtqTCmuNZMRg
=mH9r
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
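For a picture of what "just MPI and threads" looks like in practice, here is a minimal sketch; it assumes an MPI stack plus the mpi4py bindings are installed, and no accelerator language is involved:

    from mpi4py import MPI  # assumes MPI + mpi4py are available

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each rank sums its own slice; the reduction gathers the
    # partial sums on rank 0.
    local = sum(range(rank, 10_000_000, size))
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"sum = {total}")

Launched with something like "mpirun -n 4 python sum.py", the same programming model carries from a desktop cluster to a BG-class partition (with C or Fortran instead of Python on the real machine), which is the point being made above.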
From hahn at mcmaster.ca Tue Jan 24 00:32:08 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue, 24 Jan 2012 00:32:08 -0500 (EST)
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
References: <20120123192826.GB17383@bx9.net> <4F1DD523.4020005@ias.edu> <39e1ffabf2e6448ea3b65da4e34b712a.squirrel@mail.eadline.org>
Message-ID:

>>> but it would take a bit of charisma. QPI-over-IB anyone?
>>
>> That's what I'm thinking!
>
> Numascale does this already with SCI

it's easy to source and build pretty big IB systems; how much so with SCI?

I actually like the idea of high-fanout distributed-router systems, but they seem perpetually exotic. Where are the hypercubes, FNNs? Afaict, commodification of IB has snuffed out topology as a design issue, except for Cray/BG/K machine-level projects.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Tue Jan 24 00:53:14 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 23 Jan 2012 21:53:14 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
Message-ID:

Inevitably, though, massively parallel interconnects (all boxes connected to all other boxes) won't scale.

On 1/23/12 9:32 PM, "Mark Hahn" wrote:
> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Tue Jan 24 06:53:35 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 12:53:35 +0100
Subject: [Beowulf] CPU Startup Combines CPU+DRAM - And A Whole Bunch Of Crazy
In-Reply-To: <20120124003030.GA80957@piskorski.com>
References: <20120124003030.GA80957@piskorski.com>
Message-ID: <20120124115335.GW7343@leitl.org>

On Mon, Jan 23, 2012 at 07:30:30PM -0500, Andrew Piskorski wrote:

> On Mon, Jan 23, 2012 at 03:50:09PM -0500, Rayson Ho wrote:
>
>> http://iram.cs.berkeley.edu/
>>
>> So 15 years later someone suddenly thinks that it is a good idea to ship IRAM systems to real customers?? :-D
>
> Sure. But from when I last read about the IRAM stuff, I'm pretty sure it was strictly single core. Their VIRAM1 chip had 13 MB of DRAM, 1 cpu core, and 4 "vector lanes", with no mention of SMP or any sort of multi-chip parallelism at all.
> If Venray has a good design for using hundreds or more IRAM-like chips in a parallel machine, that sounds like a significant step forward. (The intended fab process and attendant design rules might also be quite different, although I'm not at all sure about that.)

In order to make best use of eDRAM it's best to organize the CPU around the layout of the memory cells, treating it as an array. You'll need a refresh register, as wide as possible; multi-kbit word sizes; adds and shifts (which help the network processor); VLIW/SIMD; large-integer addition and subtraction; and so on. If you shrink the dies and use redundant connections to route around dead dies, you can have WSI with utilization rates of >90% of the real estate.

Even without FPUs, such a sea of nodes on a mesh maps very well to massively parallel physical problems, AI (spiking neurons), and such. Even as a particle swarm/game physics accelerator engine integrated into RAM it would really help with massively boosting game video and physics performance, with obvious applications in GPGPU as well.

This is not at all stupid, if only it weren't being pushed by apparent bozos.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Tue Jan 24 07:48:23 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Tue, 24 Jan 2012 07:48:23 -0500
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References:
Message-ID: <986a2d9cf54a1630130a3361fc25a547.squirrel@mail.eadline.org>

> Inevitably, though, massively parallel interconnects (all boxes connected to all other boxes) won't scale.

Indeed, when thinking about scale I always end up thinking about the masters of scale -- ants

--
Doug

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Tue Jan 24 07:51:54 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 13:51:54 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID:

On Jan 24, 2012, at 6:02 AM, Steve Crusan wrote:

>> It's 500 euro for a 1 teraflop double precision Radeon HD 7970...
>
> Great, and nothing runs on it.

You build a system costing millions of euros altogether, NCSA having a huge budget, and you can't even pay for a few programmers who write some crunching code for GPUs????

> GPUs are insanely useful for certain tasks, but they aren't going to be able to handle most normal workloads (similar to the BG class, of course). Any center that buys BGP (or Q at this point) gear is going to pay for a scientific programmer to adapt their code to take advantage of the BG's strength: parallelism.

bluegene is IBM's equivalent of an HPC GPU, just a lot more expensive a box.

> But it's nice that supercomputing centers use GPUs to boost their flops numbers. Any word on that Chinese system's efficiency?

Actually, on this mailing list, if you scroll back in history and look in 2007, some Chinese researchers posted that their codes (we speak of the 512-streamcore ATIs) were already reaching 50% IPC, and that they worked cross-platform on AMD and Nvidia. They got 25% efficiency at Nvidia.

Now if we realize that most codes on this planet can't use multiply-add, then 25% at Nvidia and 50% at ATI was really good. If we look at all sorts of applications, we see that if one good programmer puts in the effort, suddenly it works great on GPUs.

> If you look at the architecture of the new K computer in Japan, it's similar to the BlueGene line.
>
> PS: I'm really not an IBMer.

I took a look at the latest BlueGene/Q and basically it's 4 threads per core @ 18 cores @ 1.6 GHz or something they are gonna build. That's a much improved chip over the old bluegenes, which are 3 watts per gflop. Yet to my surprise, or maybe not, it's still not in the league of GPUs. The not yet built bluegene/q supercomputer claims 2 gflops per watt now. GPUs are at 4 gflops per watt now, and you can already buy them in a shop. And at least one Chinese researcher posted here in 2007 getting 2 gflops per watt out of one. What works efficiently on such IBM hardware should also be no problem to port to a GPU.

I see no money amounts quoted on what bluegene/q is gonna cost, yet we can be sure it's gonna cost you more than a GPU in the shops. So a chip not yet sold by IBM, if I may believe the wiki, especially designed for its purpose, can't compete with a GPU that's already in the shops and that has been designed for gamers. Realize that the GPU has been designed for single precision calculations and delivers 4x more single precision flops than double, and we are comparing double precision here. BG/Q is using a 45 nm process and the AMD 7970 is using 28 nm process technology, just to show my point.

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Tue Jan 24 07:52:46 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 13:52:46 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F1E3D25.7000008@unimelb.edu.au>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> <4F1E3D25.7000008@unimelb.edu.au>
Message-ID: <08826288-2842-4C6B-B16A-180E5CCCF9D1@xs4all.nl>

On Jan 24, 2012, at 6:09 AM, Christopher Samuel wrote:

> The advantage of the BG platform though is that it's just MPI and threads, nothing that unusual at all - certainly no need to learn CUDA, OpenCL, etc.

If you don't learn OpenCL, you're gonna run behind.

Vincent

> [...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Tue Jan 24 08:20:40 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 14:20:40 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References:
Message-ID: <20120124132040.GC7343@leitl.org>

On Mon, Jan 23, 2012 at 09:53:14PM -0800, Lux, Jim (337C) wrote:

> Inevitably, though, massively parallel interconnects (all boxes connected to all other boxes) won't scale.

You can soup up a local 3d torus with a small network like connectivity. That keeps the node connectivity and number of wires still manageable.

Moreover, the universe does it with local connectivity (even quantum entanglement needs a relativistic channel to tell it from an RNG) just fine. A 3d grid/torus would be a good match for anything that can do long-range by iterating short-range interactions.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Tue Jan 24 08:23:27 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 14:23:27 +0100
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <20120124132040.GC7343@leitl.org>
References: <20120124132040.GC7343@leitl.org>
Message-ID: <20120124132327.GE7343@leitl.org>

On Tue, Jan 24, 2012 at 02:20:40PM +0100, Eugen Leitl wrote:

> You can soup up a local 3d torus with a small network

s/small network/small world network

> like connectivity. That keeps the node connectivity and number of wires still manageable.
>
> Moreover, the universe does it with local connectivity (even quantum entanglement needs a relativistic channel to tell it from an RNG) just fine. A 3d grid/torus would be a good match for anything that can do long-range by iterating short-range interactions.

--
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Tue Jan 24 11:21:54 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 24 Jan 2012 08:21:54 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To: <986a2d9cf54a1630130a3361fc25a547.squirrel@mail.eadline.org>
Message-ID:

On 1/24/12 4:48 AM, "Douglas Eadline" wrote:

> Indeed, when thinking about scale I always end up thinking about the masters of scale -- ants

Unfortunately, ants only run a small set of specialized codes, and are not the generalized computing resource that we're looking for (and, frankly, don't yet know how to effectively use, if it were to exist).
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Tue Jan 24 11:24:31 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 24 Jan 2012 17:24:31 +0100
Subject: [Beowulf] MIT Genius Stuffs 100 Processors Into Single Chip
Message-ID: <20120124162431.GJ7343@leitl.org>

http://www.wired.com/wiredenterprise/2012/01/mit-genius-stu/

MIT Genius Stuffs 100 Processors Into Single Chip

By Eric Smalley, January 23, 2012, 6:30 am
Categories: Big Data, Tiny Chips, Data Centers, Hardware, Microprocessors, Servers, Spin-offs

Anant Agarwal is crazy. If you say otherwise, he's not doing his job. Photo: Wired.com/Eric Smalley

WESTBOROUGH, Massachusetts - Call Anant Agarwal's work crazy, and you've made him a happy man.

Agarwal directs the Massachusetts Institute of Technology's vaunted Computer Science and Artificial Intelligence Laboratory, or CSAIL. The lab is housed in the university's Stata Center, a Dr. Seussian hodgepodge of forms and angles that nicely reflects the unhindered-by-reality visionary research that goes on inside. Agarwal and his colleagues are figuring out how to build the computer chips of the future, looking a decade or two down the road. The aim is to do research that most people think is nuts.

"If people say you're not crazy," Agarwal tells Wired, "that means you're not thinking far out enough."

Agarwal has been at this a while, and periodically, when some of his pie-in-the-sky research becomes merely cutting-edge, he dons his serial entrepreneur hat and launches the technology into the world. His latest commercial venture is Tilera. The company's specialty is squeezing cores onto chips - lots of cores. A core is a processor, the part of a computer chip that runs software and crunches data. Today's high-end computer chips have as many as 16 cores. But Tilera's top-of-the-line chip has 100.

The idea is to make servers more efficient. If you pack lots of simple cores onto a single chip, you're not only saving power. You're shortening the distance between cores. Today, Tilera sells chips with 16, 32, and 64 cores, and it's scheduled to ship that 100-core monster later this year. Tilera provides these chips to Quanta, the huge Taiwanese original design manufacturer (ODM) that supplies servers to Facebook and, according to reports, Google. Quanta servers sold to the big web companies don't yet include Tilera chips, as far as anyone is admitting. But the chips are on some of the companies' radar screens.

Agarwal's outfit is part of an ever growing movement to reinvent the server for the internet age. Facebook and Google are now designing their own servers for their sweeping online operations. Startups such as SeaMicro are cramming hundreds of mobile processors into servers in an effort to save power in the web data center. And Tilera is tackling this same task from a different angle, cramming the processors into a single chip.

Tilera grew out of a DARPA- and NSF-funded MIT project called RAW, which produced a prototype 16-core chip in 2002. The key idea was to combine a processor with a communications switch. Agarwal calls this creation a tile, and he's able to build many of these tiles into a piece of silicon, creating what's known as a "mesh network."

"Before that you had the concept of a bunch of processors hanging off of a bus, and a bus tends to be a real bottleneck," Agarwal says. "With a mesh, every processor gets a switch and they all talk to each other... You can think of it as a peer-to-peer network."

What's more, Tilera made a critical improvement to the cache memory that's part of each core. Agarwal and company made the cache dynamic, so that every core has a consistent copy of the chip's data. This Dynamic Distributed Cache makes the cores act like a single chip so they can run standard software. The processors run the Linux operating system and programs written in C++, and a large chunk of Tilera's commercialization effort focused on programming tools, including compilers that let programmers recompile existing programs to run on Tilera processors.

The end result is a 64-core chip that handles more transactions and consumes less power than an equivalent batch of x86 chips. A 400-watt Tilera server can replace eight x86 servers that together draw 2,000 watts. Facebook's engineers have given the chip a thorough tire-kicking, and Tilera says it has a growing business selling its chips to networking and videoconferencing equipment makers. Tilera isn't naming names, but claims one of the top two videoconferencing companies and one of the top two firewall companies.

An Army of Wimps

There's a running debate in the server world over what are called wimpy nodes. Startups SeaMicro and Calxeda are carving out a niche for low-power servers based on processors originally built for cellphones and tablets. Carnegie Mellon professor Dave Andersen calls these chips "wimpy." The idea is that building servers with more but lower-power processors yields better performance for each watt of power. But some have downplayed the idea, pointing out that it only works for certain types of applications.

Tilera takes the position that wimpy cores are okay, but wimpy nodes - aka wimpy chips - are not. Keeping the individual cores wimpy is a plus because a wimpy core is low power. But if your cores are spread across hundreds of chips, Agarwal says, you run into problems: inter-chip communications are less efficient than on-chip communications. Tilera gets the best of both worlds by using wimpy cores but putting many cores on a chip. But it still has a ways to go. There's also a limit to how wimpy your cores can be.

Google's infrastructure guru, Urs Hölzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores.

Tilera is boosting the performance of its cores. The company's most recent generation of data center server chips, released in June, are 64-bit processors that run at 1.2 to 1.5 GHz. The company also doubled DRAM speed and quadrupled the amount of cache per core. "It's clear that cores have to get beefier," Agarwal says.

The whole debate, however, is somewhat academic. "At the end of the day, the customer doesn't care whether you're a wimpy core or a big core," Agarwal says. "They care about performance, and they care about performance per watt, and they care about total cost of ownership, TCO."

Tilera's performance per watt claims were validated by a paper published by Facebook engineers in July. The paper compared Tilera's second generation 64-core processor to Intel's Xeon and AMD's Opteron high end server processors. Facebook put the processors through their paces on Memcached, a high-performance database memory system for web applications. According to the Facebook engineers, a tuned version of Memcached on the 64-core Tilera TILEPro64 yielded at least 67 percent higher throughput than low-power x86 servers. Taking power and node integration into account as well, a TILEPro64-based S2Q server with 8 processors handled at least three times as many transactions per second per watt as the x86-based servers.

Despite the glowing words, Facebook hasn't thrown its arms around Tilera. The stumbling block, cited in the paper, is the limited amount of memory the Tilera processors support. Thirty-two-bit cores can only address about 4GB of memory. "A 32-bit architecture is a nonstarter for the cloud space," Agarwal says. Tilera's 64-bit processors change the picture. These chips support as much as a terabyte of memory. Whether the improvement is enough to seal the deal with Facebook, Agarwal wouldn't say. "We have a good relationship," he says with a smile.

While Intel Lurks

Intel is also working on many-core chips, and it expects to ship a specialized 50-core processor, dubbed Knights Corner, in the next year or so as an accelerator for supercomputers. Unlike the Tilera processors, Knights Corner is optimized for floating point operations, which means it's designed to crunch the large numbers typical of high-performance computing applications.

In 2009, Intel announced an experimental 48-core processor code-named Rock Creek and officially labeled the Single-chip Cloud Computer (SCC). The chip giant has since backed off of some of the loftier claims it was making for many-core processors, and it focused its many-core efforts on high-performance computing. For now, Intel is sticking with the Xeon processor for high-end data center server products.

Dave Hill, who handles server product marketing for Intel, takes exception to the Facebook paper. "Really what they compared was a very optimized set of software running on Tilera versus the standard image that you get from the open source running on the x86 platforms," he says. The Facebook engineers ran over a hundred different permutations in terms of the number of cores allocated to the Linux stack, the networking stack and the Memcached stack, Hill says. "They really kinda fine tuned it. If you optimize the x86 version, then the paper probably would have been more apples to apples."

Tilera's roadmap calls for its next generation of processors, code-named Stratton, to be released in 2013. The product line will expand the number of processors in both directions, down to as few as four and up to as many as 200 cores. The company is going from a 40-nm to a 28-nm process, meaning they're able to cram more circuits in a given area. The chip will have improvements to interfaces, memory, I/O and instruction set, and will have more cache memory.

But Agarwal isn't stopping there. As Tilera churns out the 100-core chip, he's leading a new MIT effort dubbed the Angstrom project. It's one of four DARPA-funded efforts aimed at building exascale supercomputers. In short, it's aiming for a chip with 1,000 cores.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Tue Jan 24 13:13:17 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 24 Jan 2012 10:13:17 -0800
Subject: [Beowulf] balance between compute and communicate
Message-ID:

One of the lines in the article Eugen posted:

"There's also a limit to how wimpy your cores can be. Google's infrastructure guru, Urs Hölzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores."

is interesting. I think the real issue is one of "system engineering": you want processor speed, memory size/bandwidth, and internode communication speed/bandwidth to be "balanced". Super duper 10 GHz cores with 1 kB of RAM interconnected with 9600 bps serial links are clearly an unbalanced system.

The paper is at http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36448.pdf

From the paper: "Typically, CPU power decreases by approximately O(k^2) when CPU frequency decreases by k."

Hmm.. this isn't necessarily true with modern designs. In the bad old days, when core voltages were high and switching losses dominated, yes, this was the case, but with modern designs the leakage losses are starting to be comparable to the switching losses.
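A toy model shows both the quadratic claim and the leakage caveat; the capacitance constant and the near-linear voltage-frequency relation below are illustrative assumptions, not figures from the paper:

    # Toy CPU power model: P = C * V^2 * f + static leakage.
    def power_watts(freq_ghz, leak_w, c=25.0):
        v = 0.6 + 0.2 * freq_ghz   # assumed near-linear V-f relation
        return c * v**2 * freq_ghz + leak_w

    for leak in (0.0, 20.0):
        p_hi = power_watts(3.0, leak)
        p_lo = power_watts(1.5, leak)
        print(f"leakage {leak:>4.0f} W: 3.0 GHz ~{p_hi:.0f} W, "
              f"1.5 GHz ~{p_lo:.0f} W, ratio {p_hi/p_lo:.1f}x")

With zero leakage, halving the frequency cuts power about 3.6x under these assumptions, in the neighborhood of the O(k^2) = 4x the paper quotes; adding a 20 W leakage floor drops the saving to roughly 2.5x, which is the point about modern designs.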
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Tue Jan 24 07:51:54 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 24 Jan 2012 13:51:54 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> Message-ID: On Jan 24, 2012, at 6:02 AM, Steve Crusan wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote: >> >> >> It's 500 euro for a 1 teraflop double precision Radeon HD7970... > > > Great, and nothing runs on it. You build a system of millions of euro's alltogether, NCSA having a huge budget and you can't even pay for a few programmers who write some crunching code for gpu's???? > GPUs are insanely useful for certain tasks, but they aren't going > to be able to handle most normal workloads(similar to the BG class > of course). Any center that buys BGP (or Q at this point) gear is > going to pay for a scientific programmer to adapt their code to > take advantage of the BG's strengths; parallelism. > bluegene is ibm's equivalent of a HPC gpu, just it's a lot more expensive such box. > But It's nice that supercomputing centers use GPUs to boost their > flops numbers. Any word on that Chinese system's efficiency? Actually on this mailing list if you scroll back in history, and look in 2007, some chinese researchers here posted their codes were, we speak of the 512 streamcore ATI's, already reaching 50% IPC, and it worked crossplatform at AMD and Nvidia. They got 25% efficiency at nvidia. Now if we realize that most codes on this planet can't use multiply- add, then 25% at nvidia and 50% at ATI was really good. If we look to all sorts of applications and see that if 1 good programmer is doing effort, suddenly it works great at gpu's. > If you look at the architecture of the new K computer in Japan, > it's similar to the BlueGene line. > > PS: I'm really not an IBMer. > I took a look at latest BlueGene/Q and basically it's 4 threads per core @ 18 core @ 1.6Ghz or something they are gonna build. that's a much improved chip over the old bluegenes which are 3 watt per gflop. Yet to my surprise, or maybe not, it's still not in the league of gpu's. the not yet built bluegene/q supercomputer claims 2 flops per watt now. GPU's are 4 flops per watt now and already you can buy it in a shop. And at least 1 chinese researcher posted here in 2007 to get 2 flops per watt out of it. What works on such ibm hardware efficient should also be no problem to port to a GPU. I see no money amounts quoted on what bluegene/q is gonna cost, yet we can be sure it's gonna cost you more than a gpu in the shops. So a chip not yet sold by ibm, if i may believe wiki, especially designed for its purpose, can't compete with a gpu, that's already in the shops, which has been designed for gamers. Realize that the gpu has been designed for single precision calculations and delivers 4x more single precision flops than double, and we are comparing it double precision here. BG/Q is using 45 nm processors and AMD7970 is using 28 nm proces technology, to just show my point. > > >> >> >> >> On Jan 24, 2012, at 2:19 AM, Ellis H. 
Wilson III wrote: >> >>> On 01/23/2012 07:40 PM, Vincent Diepeveen wrote: >>>>>> On Jan 24, 2012, at 12:02 AM, Joshua mora acosta wrote: >>>>>> Nanosecond latency of QPI using 2 rings versus something that >>>>>> has a >>>>>> latency up to factor 1000 slower >>>>>> with the pci-e as the slowest delaying factor. >>>>>> >>>>>> Doing cache coherency over that forget it. >>>>> >>>>> Hear that Shai F? Stop work on vSMP now, cause Vincent says it >>>>> can't >>>>> work!!! >>>>> >>>>> More seriously, with this acquisition, I could see serious >>>>> contention >>>>> for ScaleMP. SoC type stuff, using IB between many nodes, in >>>>> smaller boxen. >>>> >>>> That would be some BlueGene type machine you speak about that intel >>>> would produce with a low power SoC. >>>> >>>> This where at this point the bluegene type machines simply can't >>>> compete with the tiny processors >>>> that get produced by the dozens of millions. >>> >>> For...chess? ;D >>> >>>> "The tiny processors have won" >>>> Linus Thorvalds >>> >>> *Torvalds, and if Linux (or any well-supported kernel/OS for that >>> matter) currently had data structures designed for extremely high >>> parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I >>> would agree with this statement. As I currently see it, all we can >>> really say is that someday, probably, perhaps even hopefully: >>> >>> "The tiny processors will win." >>> >>> That's after we work out all the nasty nuances involved with >>> designing >>> new data structures for OSes that can handle that number of >>> cores, and >>> probably design new applications that can use these new OS features. >>> And no, GPU support in Linux doesn't count as this already having >>> been >>> done. We just farm out very specific code to run on those >>> things. If >>> somebody has an example of a full-blown, usable OS running on a GPU >>> ALONE, I would stand (very interestingly) corrected. >>> >>>> Intel has themselves a second law of Moore. You can google for it. >>> >>> Thanks, for a moment there, I almost used AskJeeves. >>> >>>> A good example of massproduced processors are gpu's. >>> >>> Was waiting for the hook. Inevitable really. I think if we were >>> discussing the efficacy and quality of resultant bread from various >>> bread machines versus the numerous methods for making bread by hand >>> somehow, someway, a GPU would make better bread. Might be a >>> wholesome >>> cyber-loaf of artisan wheat, but nonetheless, it would be better in >>> every way. 
>>> >>> Best, >>> >>> ellis >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > ---------------------- > Steve Crusan > System Administrator > Center for Research Computing > University of Rochester > https://www.crc.rochester.edu/ > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.17 (Darwin) > Comment: GPGTools - http://gpgtools.org > > iQEcBAEBAgAGBQJPHjtzAAoJENS19LGOpgqKUHUH/Rvn6tXy8Kla86JNbNwt3KUJ > B+70SwJL/aBDstcDG4ChT5uW0WCcuvS7qRx5e1Zwu68m7qFEZRvIwc0uu0bgHbxt > KRynFRZ6suwudEp0o4HMpCBYNaC7uG7xkUeFbUHKfnfCflWDoz4Y9Fq3a/OhoriK > a5JrQqjVI6HZij+xDqrFvyn80Ec8eSwfRYd8lxfq4abHtE1tKYm/cF5I5Bn2lD5l > wVNvBQiU99ZPeqhcbL5XyvIsceB6ncodJ9zmBxIahrNIogMCq7UJbUhsikSRp6Dd > cL7r0AekTyiRmvZaHZZKbuad68DfATT4hy9/HzodBqTWLxxTMlrW8vNH9a7dSOo= > =oA7r > -----END PGP SIGNATURE----- > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Jan 24 07:52:46 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 24 Jan 2012 13:52:46 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F1E3D25.7000008@unimelb.edu.au> References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> <4F1E3D25.7000008@unimelb.edu.au> Message-ID: <08826288-2842-4C6B-B16A-180E5CCCF9D1@xs4all.nl> On Jan 24, 2012, at 6:09 AM, Christopher Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 24/01/12 16:02, Steve Crusan wrote: > >> Any center that buys BGP (or Q at this point) gear is >> going to pay for a scientific programmer to adapt their >> code to take advantage of the BG's strengths; parallelism. > > The advantage of the BG platform though is that it's just MPI and > threads, nothing that unusual at all - certainly no need to learn > CUDA, > OpenCL, etc.. > If you don't learn opencl, you're gonna run behind. 
Vincent > - -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.11 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk8ePSUACgkQO2KABBYQAh+hPQCggfFgdr9R9G6H7hW0Dk1/sGK+ > Fe8Aniu7M6CEThw0s7F2CtqTCmuNZMRg > =mH9r > -----END PGP SIGNATURE----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Jan 24 08:20:40 2012 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 24 Jan 2012 14:20:40 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: Message-ID: <20120124132040.GC7343@leitl.org> On Mon, Jan 23, 2012 at 09:53:14PM -0800, Lux, Jim (337C) wrote: > Inevitably, though, massively parallel interconnects (all boxes connected > to all other boxes) won't scale. You can soup up a local 3d torus with a small network like connectivity. That keeps the the node connectivity and number of wires still manageable. Moreover, the universe does it with local connectivity (even quantum entanglement needss a relativistic channel to tell it from RNG) just fine. A 3d grid/torus would be a good match for anything that can do long-range by iterating short-range interactions. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Jan 24 08:23:27 2012 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 24 Jan 2012 14:23:27 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120124132040.GC7343@leitl.org> References: <20120124132040.GC7343@leitl.org> Message-ID: <20120124132327.GE7343@leitl.org> On Tue, Jan 24, 2012 at 02:20:40PM +0100, Eugen Leitl wrote: > On Mon, Jan 23, 2012 at 09:53:14PM -0800, Lux, Jim (337C) wrote: > > Inevitably, though, massively parallel interconnects (all boxes connected > > to all other boxes) won't scale. > > You can soup up a local 3d torus with a small network s/small network/small world network > like connectivity. That keeps the the node connectivity > and number of wires still manageable. > > Moreover, the universe does it with local connectivity > (even quantum entanglement needss a relativistic channel > to tell it from RNG) just fine. A 3d grid/torus would > be a good match for anything that can do long-range > by iterating short-range interactions. 
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Jan 24 11:21:54 2012 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 24 Jan 2012 08:21:54 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <986a2d9cf54a1630130a3361fc25a547.squirrel@mail.eadline.org> Message-ID: On 1/24/12 4:48 AM, "Douglas Eadline" wrote: > >> Inevitably, though, massively parallel interconnects (all boxes >>connected >> to all other boxes) won't scale. >> >Indeed, when thinking about scale I always end up thinking about >the masters of scale -- ants > >-- Unfortunately, ants only run a small set of specialized codes, and are not the generalized computing resource that we're looking for (and, frankly, don't yet know how to effectively use, if it were to exist) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Jan 24 11:24:31 2012 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 24 Jan 2012 17:24:31 +0100 Subject: [Beowulf] MIT Genius Stuffs 100 Processors Into Single Chip Message-ID: <20120124162431.GJ7343@leitl.org> http://www.wired.com/wiredenterprise/2012/01/mit-genius-stu/ MIT Genius Stuffs 100 Processors Into Single Chip By Eric Smalley January 23, 2012 | 6:30 am | Categories: Big Data, Tiny Chips, Data Centers, Hardware, Microprocessors, Servers, Spin-offs Anant Agarwal is crazy. If you say otherwise, he's not doing his job. Photo: Wired.com/Eric Smalley WESTBOROUGH, Massachusetts ? Call Anant Agarwal?s work crazy, and you?ve made him a happy man. Agarwal directs the Massachusetts Institute of Technology?s vaunted Computer Science and Artificial Intelligence Laboratory, or CSAIL. The lab is housed in the university?s Stata Center, a Dr. Seussian hodgepodge of forms and angles that nicely reflects the unhindered-by-reality visionary research that goes on inside. Agarwal and his colleagues are figuring out how to build the computer chips of the future, looking a decade or two down the road. The aim is to do research that most people think is nuts. ?If people say you?re not crazy,? Agarwal tells Wired, ?that means you?re not thinking far out enough.? Agarwal has been at this a while, and periodically, when some of his pie-in-the-sky research becomes merely cutting-edge, he dons his serial entrepreneur hat and launches the technology into the world. His latest commercial venture is Tilera. The company?s specialty is squeezing cores onto chips ? lots of cores. A core is a processor, the part of a computer chip that runs software and crunches data. Today?s high-end computer chips have as many as 16 cores. But Tilera?s top-of-the-line chip has 100. The idea is to make servers more efficient. 
If you pack lots of simple cores onto a single chip, you?re not only saving power. You?re shortening the distance between cores. Today, Tilera sells chips with 16, 32, and 64 cores, and it?s scheduled to ship that 100-core monster later this year. Tilera provides these chips to Quanta, the huge Taiwanese original design manufacturer (ODM) that supplies servers to Facebook and ? according to reports, Google. Quanta servers sold to the big web companies don?t yet include Tilera chips, as far as anyone is admitting. But the chips are on some of the companies? radar screens. Agarwal?s outfit is part of an ever growing movement to reinvent the server for the internet age. Facebook and Google are now designing their own servers for their sweeping online operations. Startups such as SeaMicro are cramming hundreds of mobile processors into servers in an effort to save power in the web data center. And Tilera is tackling this same task from different angle, cramming the processors into a single chip. Tilera grew out of a DARPA- and NSF-funded MIT project called RAW, which produced a prototype 16-core chip in 2002. The key idea was to combine a processor with a communications switch. Agarwal calls this creation a tile, and he?s able to build these many tiles into a piece of silicon, creating what?s known as a ?mesh network.? ?Before that you had the concept of a bunch of processors hanging off of a bus, and a bus tends to be a real bottleneck,? Agarwal says. ?With a mesh, every processor gets a switch and they all talk to each other?. You can think of it as a peer-to-peer network.? What?s more, Tilera made a critical improvement to the cache memory that?s part of each core. Agarwal and company made the cache dynamic, so that every core has a consistent copy of the chip?s data. This Dynamic Distributed Cache makes the cores act like a single chip so they can run standard software. The processors run the Linux operating system and programs written in C++, and a large chunk of Tilera?s commercialization effort focused on programming tools, including compilers that let programmers recompile existing programs to run on Tilera processors. The end result is a 64-core chip that handles more transactions and consumes less power than an equivalent batch of x86 chips. A 400-watt Tilera server can replace eight x86 servers that together draw 2,000 watts. Facebook?s engineers have given the chip a thorough tire-kicking, and Tilera says it has a growing business selling its chips to networking and videoconferencing equipment makers. Tilera isn?t naming names, but claims one of the top two videoconferencing companies and one of the top two firewall companies. An Army of Wimps There?s a running debate in the server world over what are called wimpy nodes. Startups SeaMicro and Calxeda are carving out a niche for low-power servers based on processors originally built for cellphones and tablets. Carnegie Mellon professor Dave Andersen calls these chips ?wimpy.? The idea is that building servers with more but lower-power processors yields better performance for each watt of power. But some have downplayed the idea, pointing out that it only works for certain types of applications. Tilera takes the position that wimpy cores are okay, but wimpy nodes ? aka wimpy chips ? are not. Keeping the individual cores wimpy is a plus because a wimpy core is low power. But if your cores are spread across hundreds of chips, Agarwal says, you run into problems: inter-chip communications are less efficient than on-chip communications. 
Tilera gets the best of both worlds by using wimpy cores but putting many cores on a chip. But it still has a ways to go. There?s also a limit to how wimpy your cores can be. Google?s infrastructure guru, Urs H?lzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores. Tilera is boosting the performance of its cores. The company?s most recent generation of data center server chips, released in June, are 64-bit processors that run at 1.2 to 1.5 GHz. The company also doubled DRAM speed and quadrupled the amount of cache per core. ?It?s clear that cores have to get beefier,? Agarwal says. The whole debate, however, is somewhat academic. ?At the end of the day, the customer doesn?t care whether you?re a wimpy core or a big core,? Agarwal says. ?They care about performance, and they care about performance per watt, and they care about total cost of ownership, TCO.? Tilera?s performance per watt claims were validated by a paper published by Facebook engineers in July. The paper compared Tilera?s second generation 64-core processor to Intel?s Xeon and AMD?s Opteron high end server processors. Facebook put the processors through their paces on Memcached, a high-performance database memory system for web applications. According to the Facebook engineers, a tuned version of Memcached on the 64-core Tilera TILEPro64 yielded at least 67 percent higher throughput than low-power x86 servers. Taking power and node integration into account as well, a TILEPro64-based S2Q server with 8 processors handled at least three times as many transactions per second per Watt as the x86-based servers. Despite the glowing words, Facebook hasn?t thrown its arms around Tilera. The stumbling block, cited in the paper, is the limited amount of memory the Tilera processors support. Thirty-two-bit cores can only address about 4GB of memory. ?A 32-bit architecture is a nonstarter for the cloud space,? Agarwal says. Tilera?s 64-bit processors change the picture. These chips support as much as a terabyte of memory. Whether the improvement is enough to seal the deal with Facebook, Agarwal wouldn?t say. ?We have a good relationship,? he says with a smile. While Intel Lurks Intel is also working on many-core chips, and it expects to ship a specialized 50-core processor, dubbed Knights Corner, in the next year or so as an accelerator for supercomputers. Unlike the Tilera processors, Knights Corner is optimized for floating point operations, which means it?s designed to crunch the large numbers typical of high-performance computing applications. In 2009, Intel announced an experimental 48-core processor code-named Rock Creek and officially labeled the Single-chip Cloud Computer (SCC). The chip giant has since backed off of some of the loftier claims it was making for many-core processors, and it focused its many-core efforts on high-performance computing. For now, Intel is sticking with the Xeon processor for high-end data center server products. Dave Hill, who handles server product marketing for Intel, takes exception to the Facebook paper. ?Really what they compared was a very optimized set of software running on Tilera versus the standard image that you get from the open source running on the x86 platforms,? he says. 
The Facebook engineers ran over a hundred different permutations in terms of the number of cores allocated to the Linux stack, the networking stack and the Memcached stack, Hill says. ?They really kinda fine tuned it. If you optimize the x86 version, then the paper probably would have been more apples to apples.? Tilera?s roadmap calls for its next generation of processors, code-named Stratton, to be released in 2013. The product line will expand the number of processors in both directions, down to as few as four and up to as many as 200 cores. The company is going from a 40-nm to a 28-nm process, meaning they?re able to cram more circuits in a given area. The chip will have improvements to interfaces, memory, I/O and instruction set, and will have more cache memory. But Agarwal isn?t stopping there. As Tilera churns out the 100-core chip, he?s leading a new MIT effort dubbed the Angstrom project. It?s one of four DARPA-funded efforts aimed at building exascale supercomputers. In short, it?s aiming for a chip with 1,000 cores. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Jan 24 13:13:17 2012 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 24 Jan 2012 10:13:17 -0800 Subject: [Beowulf] balance between compute and communicate Message-ID: One of the lines in the article Eugen posted: "There's also a limit to how wimpy your cores can be. Google's infrastructure guru, Urs H?lzle, published an influential paper on the subject in 2010. He argued that in most cases brawny cores beat wimpy cores. To be effective, he argued, wimpy cores need to be no less than half the power of higher-end x86 cores." Is interesting.. I think the real issue is one of "system engineering".. you want processor speed, memory size/bandwidth, and internode communication speed/bandwidth to be "balanced". Super duper 10GHz cores with 1k of RAM interconnected with 9600bps serial links is clearly an unbalanced system.. The paper is at http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36448.pdf >From the paper: Typically, CPU power decreases by approximately O(k2) when CPU frequency decreases by k, Hmm.. this isn't necessarily true, with modern designs. In the bad old days, when core voltages were high and switching losses dominated, yes, this is the case, but with modern designs, the leakage losses are starting to be comparable to the switching losses. But that's ok, because he never comes back to the power issue again, and heads off on Amdahl's law (which we 'wulfers all know) and the inevitable single thread bottleneck that exists at some point. However, I certainly agree with him when he says: Cost numbers used by wimpy-core evangelists always exclude software development costs. Unfortunately, wimpy-core systems can require applications to be explicitly parallelized or otherwise optimized for acceptable performance.... But, I don't go for Software development costs often dominate a company's overall technical expenses I don't know that software development costs dominate. 
But that's OK, because he never comes back to the power issue again, and heads off on Amdahl's law (which we 'wulfers all know) and the inevitable single-thread bottleneck that exists at some point.

However, I certainly agree with him when he says: "Cost numbers used by wimpy-core evangelists always exclude software development costs. Unfortunately, wimpy-core systems can require applications to be explicitly parallelized or otherwise optimized for acceptable performance...."

But I don't go for "Software development costs often dominate a company's overall technical expenses." I don't know that software development costs dominate. If you're building a million-computer data center (distributed geographically, perhaps), that's on the order of several billion dollars, and you can buy an awful lot of skilled developer time for a billion dollars. It might cost another billion to manage all of them, but that's still an awful lot of development. But maybe in his space, development time is more costly than the hardware purchase and operating costs.

He summarizes with "Once a chip's single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making....." which is essentially my system-engineering balancing argument, in the context of the expectation that the surrounding stuff is current generation.

So the real computer engineering question is: Is there some basic rule of thumb one can use to determine appropriate balance, given things like speeds, bandwidths, and power consumption? Could we, for instance, take moderately well understood implications and forecasts of future performance (e.g. Moore's law and its ilk) and predict what size machines with what performance would be reasonable in, say, 20 years? The scaling rules for CPUs, for memory, and for communications are fairly well understood. (Or maybe this is something that's covered in every lower-division computer engineering class these days? I confess I'm woefully ignorant of what they teach at various levels these days.)

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Tue Jan 24 13:25:07 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Tue, 24 Jan 2012 19:25:07 +0100
Subject: [Beowulf] MIT Genius Stuffs 100 Processors Into Single Chip
In-Reply-To: <20120124162431.GJ7343@leitl.org>
References: <20120124162431.GJ7343@leitl.org>
Message-ID:

I remember the first announcement from Tilera some years ago. Several people emailed Tilera asking for more details; some just asked - like me - and others offered money to buy a CPU. They all got a 'no'.

But now that there are more details, the chip sounds less impressive. Let's analyze based upon the vague information on the homepage. There are also plenty of statements of the kind a marketing department would write: existing slogans reformulated into more political ones, leaving room to deny later on that any particular performance was promised. We know that trick all too well.

First of all, the homepage reports 23 watts, yet doesn't say whether that's idle or under full load. It just says 'active', which is a vague way of putting it. I assume that means a core that isn't idle yet isn't under 100% load either, so it eats only a portion of the maximum power. So it's probably 50 watts or so under full load.

Then it says 64 cores in a grid @ 700 MHz. 700 MHz sounds like a frequency you can actually get if you're a professional (if I built something myself, I'd count on it running at 300 MHz or so), so that doesn't seem like a weird claim. 64 cores x 0.7 GHz = 44.8 GHz aggregate. Yet at the same time the homepage claims 443 billion operations per second. What is an operation? Is that an internal iop? It says it's a 32-bit VLIW.
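Spelled out (a quick C sketch; the only inputs are the two homepage numbers, and the halving anticipates the double-counting point below):

    #include <stdio.h>

    int main(void)
    {
        double cores = 64.0, clock_ghz = 0.7;  /* homepage: 64 cores @ 700 MHz */
        double claimed_gops = 443.0;           /* homepage: 443 Gops/s */

        double aggregate = cores * clock_ghz;          /* 44.8 Gcycles/s */
        double ops_cycle = claimed_gops / aggregate;   /* ~9.9 ops/cycle/core */
        double halved    = ops_cycle / 2.0;            /* ~4.9, if FMA-style double counting */

        printf("aggregate clock  : %.1f Gcycles/s\n", aggregate);
        printf("claimed ops/cycle: %.1f per core\n", ops_cycle);
        printf("halved for FMA   : %.1f per core\n", halved);
        return 0;
    }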
So that would mean each core is processing about 10 integer operations per cycle. Now, we know all other manufacturers cheat by a factor of 2, double-counting when a single instruction does, for example, a fused multiply-add. So we can probably divide by 2 and get to about 220 Gops. A vector would then be 5 integers long, which seems like a weird measure; maybe they rounded it up a tad and in reality mean 4 integers, which sounds most reasonable. So then it's 64 cores in a grid executing vectors of 4 x 32-bit units. Sounds plausible. If we compare that with the GPUs that have been in our notebooks for a few years now, it's suddenly not so impressive.

Vincent

On Jan 24, 2012, at 5:24 PM, Eugen Leitl wrote:

>
> http://www.wired.com/wiredenterprise/2012/01/mit-genius-stu/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From samuel at unimelb.edu.au Tue Jan 24 17:36:14 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Wed, 25 Jan 2012 09:36:14 +1100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID: <4F1F325E.9010109@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/01/12 23:51, Vincent Diepeveen wrote:

> You build a system of millions of euros altogether, NCSA having a
> huge budget, and you can't even pay for a few programmers who
> write some crunching code for gpu's????

I was at a meeting at SC'06 where the folks from various large institutions in the US were bemoaning the fact that there was all this money for petaflop hardware available, but none for programmers or algorithm development to make apps scale out to the systems.

Just because the scientists say it's a good thing to have doesn't mean the US government funding people will listen to them.

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8fMl4ACgkQO2KABBYQAh95lwCfQodU25X1A0yngWOOwuAqmU2X
thAAoICeeMk8fwx33enCWQ/XGvatdsEc
=OFC+
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From prentice at ias.edu Wed Jan 25 17:01:48 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Wed, 25 Jan 2012 17:01:48 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu>
Message-ID: <4F207BCC.9010701@ias.edu>

On 01/24/2012 12:02 AM, Steve Crusan wrote:
>
> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>
>> It's 500 euro for a 1 teraflop double precision Radeon HD7970...
>
> Great, and nothing runs on it. GPUs are insanely useful for certain
> tasks, but they aren't going to be able to handle most normal
> workloads (similar to the BG class of course). Any center that buys
> BGP (or Q at this point) gear is going to pay for a scientific
> programmer to adapt their code to take advantage of the BG's
> strength: parallelism.
>
> But it's nice that supercomputing centers use GPUs to boost their
> flops numbers. Any word on that Chinese system's efficiency? If you
> look at the architecture of the new K computer in Japan, it's similar
> to the BlueGene line.

I attended a presentation at Princeton U. on Monday about the state of HPC in China. The talk was given by someone who has been to China and spoken with the leaders of their HPC efforts. While the Chinese systems get great scores on LINPACK, even the Chinese concede that on their "real" applications they are getting well below the theoretical max flops, because their codes aren't getting the most out of their systems. In other words, on real programs they aren't all that efficient (yet).

--
Prentice

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Wed Jan 25 19:46:57 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 26 Jan 2012 01:46:57 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F207BCC.9010701@ias.edu>
References: <708qawXBm8848S02.1327359732@web02.cms.usa.net> <3B284FD5-80E7-4384-9DBD-640D2F465E39@xs4all.nl> <4F1DF542.6050504@scalableinformatics.com> <4F1E070C.4040107@cse.psu.edu> <4F207BCC.9010701@ias.edu>
Message-ID: <76840233-6CA8-4B9E-BF66-4A1A93CD1F1F@xs4all.nl>

The supercomputing codes I saw run on processors were, to put it politely, losing it everywhere.

NASA, too, when porting from the Origin3800 to the 1.5 GHz Itanium2, publicly reported a speedup of factor 2 in the forums. However, my own chess program, not exactly optimized for Itanium2, got a boost of factor 4 moving from the 500 MHz R14000 (Origin3800) to a 1.3 GHz Itanium2. That was just a single compile, and it's an integer program, whereas the Itanium2 is a floating-point processor. The 1.5 GHz Itanium2 has 6 Gflops on paper, versus 1 Gflop on paper for the 500 MHz R14k.

Now, a Chinese reporter posted on THIS mailing list, the Beowulf mailing list, already some GPU generations ago, an IPC of 25% on Nvidia and 50% on AMD. On those same old GPUs, most student projects got around 25% on Nvidia; Volkov then went ahead, understood GPUs better, and scored 70% efficiency - again on very old GPUs. Since then they have really improved.
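Those percentages are just achieved-versus-peak ratios. In numbers (a sketch; the 1 Tflop peak is an assumed round number, only the percentages come from the text above):

    #include <stdio.h>

    int main(void)
    {
        double peak_gflops = 1000.0;   /* assumed round peak, for illustration */
        double eff[]        = { 0.25, 0.50, 0.70 };
        const char *label[] = { "typical student code, Nvidia",
                                "typical code, AMD",
                                "Volkov's tuned kernels" };

        for (int i = 0; i < 3; i++)
            printf("%-30s %3.0f%% of peak = %4.0f Gflops\n",
                   label[i], 100.0 * eff[i], peak_gflops * eff[i]);
        return 0;
    }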
See: http://www.cs.berkeley.edu/~volkov/

So you want to build a supercomputer that's now 10x more expensive, and with each generation lose more efficiency on newer hardware, whereas those who make the effort to write good new code get very high efficiency?

Just learn how to program and ignore the disinformation - if you have a box that fast, you really can get a lot of speed out of it. You shouldn't ask for a 1-billion-dollar box that runs your old-school Fortran codes only as well as a 5-million-dollar GPU box; look at what you can do by writing good codes for that manycore hardware. OpenCL works on everything; CUDA just on Nvidia.

Vincent

On Jan 25, 2012, at 11:01 PM, Prentice Bisbal wrote:

> I attended a presentation at Princeton U. on Monday about the state of
> HPC in China. [snip]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Thu Jan 26 00:04:31 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 25 Jan 2012 21:04:31 -0800
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F1F325E.9010109@unimelb.edu.au>
Message-ID:

On 1/24/12 2:36 PM, "Christopher Samuel" wrote:

> institutions in the US were bemoaning the fact that there was all this
> money for petaflop hardware available but none for programmers or
> algorithm development to make apps scale out to the systems.

That's partly because people are an expense, while hardware is an asset that sits on the balance sheet. If I fork out a million bucks for a computer, I now have an asset that is worth a million dollars. If I fork out a million dollars for 3 skilled developers for a year, at the end of the year it's not clear I'll possess an asset that I can sell for a million dollars.
Obviously, the work product must be worth something, because otherwise we wouldn't have jobs, but the connection is more tenuous.

The other thing (when government funding is considered) is that the million-dollar hardware purchase might turn into more jobs than the 3 software weenies, if only because "computer assemblers and deliverers" get paid a lot less; and when it comes to statistics, they don't look at "cumulative wages", they look at "number of people employed".

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Thu Jan 26 07:28:41 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 26 Jan 2012 13:28:41 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

Mike, you replied to me, not to the mailing list.

Note that the Itanium2 was released too late, and that it was $100k a box initially and $7,500 a CPU (1.5 GHz) if you ordered 1,000. And it had the same integer IPC as the Opteron of the time (later on, compilers got PGO for Opteron as well, and then Opteron was faster, at least for Diep, in IPC).

Larrabee indeed resembles Itanium to some extent, but not quite. Intel's expertise is producing high-clocked CPUs; the Itanium was a low-clocked CPU and therefore failed. No one pays big bucks for a low-clocked CPU - look on eBay, the cheapest CPUs are always the low-clocked ones. Larrabee is something in between a CPU and a GPU, so a totally different ballgame: Intel is moving to a market where it actually has competition and is not the one owning the patents. So that's not gonna be easy for Intel some years from now if they show up with a 100% vectorized design and not some dreadnought in between CPU and GPU which is low-clocked.

As for your Infiniband remark, realize that it took 25 years or so to bugfix Ethernet everywhere - forget 'setting a new standard' there for the average Joe. Not gonna work. Infiniband is meant for HPC and uses the MPI protocol to communicate. This is very powerful for clusters and the way to go when scaling on supercomputers, yet it's not gonna conquer the average Joe's machine, as there is a price to pay which is too high for now.

However, realize that some of the sales of the HPC manufacturers go to low-latency Ethernet - my guess is that Intel will use QLogic's know-how there to improve their cheapo CPUs and upgrade them with better Ethernet. That seems a plausible goal and a very useful one; the rest, such as rivalling Mellanox at Ethernet, is not gonna happen.

On Jan 26, 2012, at 7:23 AM, MDG wrote:

> Technically the Itanium chip was a failure: it was not 100% x86
> compatible and was actually meant for servers, but it often
> underperformed the traditional x86 chips. Intel let it quietly vanish
> as it came nowhere near the first advertised performance. It varied
> too far from the x86 architecture, requiring specially written code -
> much like the GPUs, though those actually are able to run some
> parallel processes, both under Windows and Linux.
> There is a difference: the M-series NVIDIA cards are more for servers,
> and the C series, such as the C2070 or C2075, for workstations. The M
> series also uses the same numbering sequence - I think they are up to
> the 2090 or 2095 by now - but you do need high-speed PCIe slots for
> both sets of cards. As for resale cards, I have talked to a few
> sellers; be careful, there are some knockoffs from mainland China - I
> verified this with NVIDIA.
>
> These GPUs are designed so that they are not seen as cores or CPUs.
> Also, most resales are pulled from, in one case, a pool of HP
> workstations and servers, yet the seller had no idea of the difference
> between the C2070 and the M2070s, and as I said none of them had the
> required software; most did not even know it was needed! Without it
> the GPUs do not function. So resales are a pretty expensive gamble, as
> the cards are untested, with no software to even try them with!
>
> The GPUs can be used if you write your own parallel code, usually in
> C++ per NVIDIA, but you still need the software to offload the work to
> them. If you are into heavy number crunching - assuming the problem
> allows parallel processing, versus the traditional linear method where
> A must always come before B and B before C - you will see far better
> results than with a typical program; in other things you will see
> little improvement. My talk with an NVIDIA technician confirmed this:
> you can get great results for creating, say, graphics, but very little
> improvement for displaying an already designed piece. The same goes
> for statistics, weather forecasting, geology - technically Intel has
> even used its own network as a massive HPC to help design chips, so
> add engineering - and beyond that, physics and nuclear explosion
> simulations, etc.
>
> Also, with fiber optics now coming down in price, the idea of multiple
> super-workstations and even super-servers - a client-server
> relationship where the server does most of the processing - will most
> likely grow into stable and usable systems before the average
> workstation does.
>
> It will help some with a statistics-driven database, but not that much
> for a pure relational database; it also works well with MATLAB and
> SPSS.
>
> Overall I would expect that the GPUs will soon have more code written
> for them as they become more plentiful in real-world applications.
> There is also open source code available and being further developed
> under Linux, which with Wine and WineX can run Windows software to
> some degree - not 100%, and as for Windows 7 I have not a clue whether
> it will run under Wine or WineX, though the Macintoshes now run
> Windows very well as a second operating system. Then again, I would
> like to have 4 twelve-core Xeons in my workstation, but that bill is
> far higher than a few 448-core GPU cards. As with any new technology,
> it starts at the high end and then, as it develops, works its way down
> the price chain - and I was shocked to see a twin six-core Xeon in a
> game machine! So things are moving faster than I anticipated.
>
> I know I am watching the GPU idea and cards carefully, as beyond just
> throwing more cores at the x86 architecture it seems to be moving far
> faster than when Intel started scaling upwards. Maybe you remember the
> hardware flaw in the first Pentiums, where simple math was processed
> incorrectly? Like all things, when you introduce new variables into a
> system, be it hardware or software, there are a lot of things that
> will not always work, or not work to the potential of the system.
> As I said, I am watching the GPUs closely, as so far they seem the
> most likely next breakthrough as software is written that can take
> advantage of their unique abilities. Also, from what I have read, they
> draw far less power than even the new generation of multi-core x86
> series. I am not an expert with these GPU systems, but they do hold
> great promise as a leap forward, rather than just adding x86 cores.
>
> The buying of the InfiniBand business shows that Intel is looking to
> move past the copper Ethernet systems, which surpassed ArcNet systems.
> The only constant is change; while technically not an Intel chip, this
> still shows Moore's law being leveraged on other platforms, including
> GPUs.
>
> Mike.
>
> --- On Wed, 1/25/12, Vincent Diepeveen wrote:
>
> From: Vincent Diepeveen
> Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
> To: "Prentice Bisbal"
> Cc: "Beowulf Mailing List"
> Date: Wednesday, January 25, 2012, 2:46 PM
>
> The supercomputing codes I saw run on processors were, to put it
> politely, losing it everywhere. [snip]
on Monday about the > state of > > HPC in China. The talk was given by someone who has been to > China and > > spoken with the leaders of their HPC efforts. While the Chinese > > systems > > get great scores on LINPACK, even the Chinese concede that on their > > "real" applications, they are getting well below the theoretical max > > flops, because their codes aren't getting the most out of their > > systems. > > In other words, on real programs, they aren't all that efficient > > (yet). > > > > -- > > Prentice > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Jan 26 07:35:40 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 26 Jan 2012 13:35:40 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> Message-ID: On Jan 26, 2012, at 1:28 PM, Vincent Diepeveen wrote: > Mike you replied to me not to mailing list. > > note that itanium2 released too late and it was $100k a box > initially and $7500 a cpu (1.5Ghz) if you ordered a 1000. > And it had same IPC for integers like opteron at the time (later on > compilers got pgo for opteron as well and then opteron was faster, > at least for diep, in ipc). > > Larrabee indeed resembles itanium to some extend, but not quite. > intels expertise is producing highclocked cpu's. itanium was a low > clocked cpu and therefore failed. > no one pays big bucks for a low clocked cpu. look on ebay - > cheapest cpu's always the lowclocked ones. > > larrabee is something in between a cpu and a gpu so total other > ballgame - intel moving to a market where they actually have > competition > and are not the ones owning the patents. > > So that's not gonna be easy for intel some years from now if they > show up with a 100% vectorized design and not some dreadnought > in between cpu and gpu which is low clocked. > > As for your infiniband remark realize that it took 25 years or so > to bugfix ethernet everywhere - forget 'setting a new standard' > there for the average Joe. > Not gonna work. > > Infiniband is meant for HPC and uses MPI protocol to communicate. > This is very powerful for clusters and the way to go when scaling > at supercomputers, > yet it's not gonna conquer average joe's machine, as there is a > price to pay which is too high for now. > > However realize some of sales of the HPC manufacturers goes to low > latency ethernet - my guess is that intel will use qlogics know how > there to improve > their cheapo cpu's and upgrade them with better ethernet. Seems > plausible goal and a very useful one, the rest, such as rivalling > Mellanox at ethernet, > that's not gonna happen. 
Oops, small typo during a speedy write: "Mellanox at Ethernet" should of course be "Mellanox at HPC". The question is whether typical low-latency Ethernet products are gonna suffer from Intel's move. I doubt Solarflare will; they already deliver this stuff only to those who really battle for every picosecond, so price is just not the issue there.

Vincent

> On Jan 26, 2012, at 7:23 AM, MDG wrote:
>
>> Technically the Itanium chip was a failure [snip]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From samuel at unimelb.edu.au Thu Jan 26 18:27:21 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 27 Jan 2012 10:27:21 +1100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <4F21E159.7000905@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 26/01/12 23:28, Vincent Diepeveen wrote:

> Mike, you replied to me, not to the mailing list.

That was probably deliberate, and it is inconsiderate to post a reply publicly without checking with the writer that they are OK with that, especially as you quoted what they wrote - they may not have wanted that in the public domain.

- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8h4VkACgkQO2KABBYQAh9lJgCfQXwsmDG9l1v4Jt9vUr5YYCr0
fDYAoJdJBbUJBApO5ZOh200gZ5+Lo/vt
=mpU4
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Thu Jan 26 20:48:55 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 26 Jan 2012 20:48:55 -0500 (EST)
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

> Larrabee indeed resembles Itanium to some extent, but not quite.

wow, that has to be your most loosely-tethered-to-reality statement yet!
it's true that Larrabee and Itanium are very close
in the number of letters in their name.

> Infiniband is meant for HPC and uses the MPI protocol to communicate.

no and no.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 01:04:17 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 07:04:17 +0100
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

On Jan 27, 2012, at 2:48 AM, Mark Hahn wrote:

>> Larrabee indeed resembles Itanium to some extent, but not quite.
>
> wow, that has to be your most loosely-tethered-to-reality statement
> yet!
> it's true that Larrabee and Itanium are very close
> in the number of letters in their name.

Your personal attack seems to indicate you disagree with my assessment that the entire Larrabee line has any future in the long run.

Instead of throwing mud, would you mind explaining why Larrabee, an architecture far away from the mainstream, stands any chance of competing in HPC with the existing architectural concepts in the long run?
Vincent

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 01:06:07 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 07:06:07 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F21E159.7000905@unimelb.edu.au>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au>
Message-ID: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>

Why do you write this?

On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote:

> On 26/01/12 23:28, Vincent Diepeveen wrote:
>
>> Mike, you replied to me, not to the mailing list.
>
> That was probably deliberate, and it is inconsiderate to post a reply
> publicly without checking with the writer that they are OK with that,
> especially as you quoted what they wrote - they may not have wanted
> that in the public domain.
> [snip]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Fri Jan 27 10:37:43 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 27 Jan 2012 10:37:43 -0500 (EST)
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID:

>>> Larrabee indeed resembles Itanium to some extent, but not quite.
>>
>> wow, that has to be your most loosely-tethered-to-reality statement
>> yet!
>> it's true that Larrabee and Itanium are very close
>> in the number of letters in their name.
>
> Your personal attack seems to indicate you disagree with my
> assessment that the entire Larrabee line has any future in the long
> run.

not surprisingly, no: I disagree that Larrabee and Itanium resemble
each other in any but really silly ways.

Itanium is a custom, VLIW architecture; Larrabee is an on-chip
cluster of non-VLIW, commodity x86_64 cores.
none of the distinctive features of Itanium (multi-instruction bundles,
dependency on compile-time scheduling, intended market, implementation,
success limited to predictable, high-bandwidth situations, directory-based
inter-node cache coherency) are anything close to the features of Larrabee
(standard x86_64 ISA, no special compiler needed, on-chip message-passing
network, suitable for complex/dynamic/unpredictable loads, possibly not even
cache-coherent across one chip.)

my guess is that you were thinking about how ia64 chips tended to
run at low clock rates, and thinking about how gpus (probably including
larrabee) also tend to be low-clocked.

> Instead of throwing mud, would you mind explaining why Larrabee,
> an architecture far away from the mainstream, stands any chance of
> competing in HPC with the existing architectural concepts in the
> long run?

as far as I know, larrabee will be a mesh of conventional x86_64 cores
that will run today's x86_64 code. I don't know whether Intel has stated
(or even decided) whether the cores will have full or partial cache
coherency, or whether they'll really be an MPI-like shared-nothing cluster.

if you want to compare Larrabee to Fermi or AMD GCN, that might be
interesting. or to mainstream multicore - like bulldozer, with 32c
per package vs larrabee with ">=50".

but not ia64. it's best we all just forget about it.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From hahn at mcmaster.ca Fri Jan 27 10:39:06 2012
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 27 Jan 2012 10:39:06 -0500 (EST)
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
Message-ID:

> Why do you write this?

because he thought you might be interested in improving your etiquette.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 10:42:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 27 Jan 2012 10:42:48 -0500
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <4F22C5F8.6010804@scalableinformatics.com>

On 01/27/2012 10:37 AM, Mark Hahn wrote:
>>>> Larrabee indeed resembles Itanium to some extent, but not quite.
>>>
>>> wow, that has to be your most loosely-tethered-to-reality statement
>>> yet!
>>> it's true that Larrabee and Itanium are very close
>>> in the number of letters in their name.
>>
>> Your personal attack seems to indicate you disagree with my
>> assessment that the entire Larrabee line has any future in the long
>> run.
>
> not surprisingly, no: I disagree that Larrabee and Itanium resemble
> each other in any but really silly ways.
> Itanium is a custom, VLIW architecture; Larrabee is an on-chip
> cluster of non-VLIW, commodity x86_64 cores.

But ... but .... they are both made of Silicon .... doesn't that mean they are the same?

/sarc

(Sorry, it's been a fun week ... and this was just ... too ... irresistible ...)

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From prentice at ias.edu Fri Jan 27 11:06:00 2012
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 27 Jan 2012 11:06:00 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl>
Message-ID: <4F22CB68.3080605@ias.edu>

Vincent,

He wrote that because he's trying to educate you on proper mailing list etiquette, which is something you appear to be lacking.

Chris is absolutely right - you should not reply to off-list e-mails on-list.

--
Prentice

On 01/27/2012 01:06 AM, Vincent Diepeveen wrote:
> Why do you write this?
>
> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote:
>
> On 26/01/12 23:28, Vincent Diepeveen wrote:
>
>>> Mike, you replied to me, not to the mailing list.
>
> That was probably deliberate, and it is inconsiderate to post a reply
> publicly without checking with the writer that they are OK with that,
> especially as you quoted what they wrote - they may not have wanted
> that in the public domain.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 11:12:35 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 17:12:35 +0100
Subject: [Beowulf] Larrabee - Mark Hahn's personal attack
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com>
Message-ID: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl>

On Jan 27, 2012, at 4:37 PM, Mark Hahn wrote:

>>>> Larrabee indeed resembles Itanium to some extent, but not quite.
>>>
>>> wow, that has to be your most loosely-tethered-to-reality statement
>>> yet!
>>> it's true that Larrabee and Itanium are very close
>>> in the number of letters in their name.
>>
>> Your personal attack seems to indicate you disagree with my
>> assessment that the entire Larrabee line has any future in the long
>> run.
>
> not surprisingly, no: I disagree that Larrabee and Itanium resemble
> each other in any but really silly ways.
>
> Itanium is a custom, VLIW architecture; Larrabee is an on-chip
> cluster of non-VLIW, commodity x86_64 cores.
> none of the distinctive features of Itanium (multi-instruction
> bundles, dependency on compile-time scheduling, intended market,
> implementation, success limited to predictable, high-bandwidth
> situations, directory-based inter-node cache coherency) are anything
> close to the features of Larrabee (standard x86_64 ISA, no special
> compiler needed, on-chip message-passing network, suitable for
> complex/dynamic/unpredictable loads, possibly not even cache-coherent
> across one chip.)
>
> my guess is that you were thinking about how ia64 chips tended to
> run at low clock rates, and thinking about how gpus (probably
> including larrabee) also tend to be low-clocked.

And both seem failures from the user's viewpoint - maybe not from Intel's income viewpoint, but from Intel's aim to replace and/or create a new long-lasting architecture that can even *remotely* compete with other manufacturers, not to mention the far too high price points for such CPUs.

>> Instead of throwing mud, would you mind explaining why Larrabee,
>> an architecture far away from the mainstream, stands any chance of
>> competing in HPC with the existing architectural concepts in the
>> long run?
>
> as far as I know, larrabee will be a mesh of conventional x86_64 cores
> that will run today's x86_64 code. I don't know whether Intel has
> stated (or even decided) whether the cores will have full or partial
> cache coherency, or whether they'll really be an MPI-like
> shared-nothing cluster.

Assuming you're not completely born stupid, i assume you will realize that IN ORDER to run most existing x64 codes, it needs to have cache coherency, and that it always has been presented as having exactly that. Which is one of the reasons why the architecture doesn't scale, of course.

Well, you can forget about it running your x64 Fortran codes at any fast speed. You need to totally rewrite your code to use vectors of doubles. And in contrast to GPUs, where you can indirectly address each PE or 'compute core' through arrays (in AMD-ATI's case that's 4 PEs, each able to execute 1 double a cycle), such indirect lookups are a disaster on Larrabee - having a cost of 7 cycles each - so you really need to use vectors. Now I bet the majority of your old x64 code doesn't use such huge vectors, so to get even some remote performance out of it, a total rewrite of most code is needed, if it can work at all.
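Roughly, the difference is between these two loop shapes (a generic C sketch - the point is the memory access pattern, not any particular chip):

    /* Contiguous, unit-stride loop: maps directly onto wide SIMD units. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }

    /* Indirect gather: every element goes through an index table, so the
     * compiler must emit per-lane gathers or scalarize the loop. */
    void gather_axpy(float *restrict y, const float *restrict x,
                     const int *restrict idx, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[idx[i]];
    }

The first loop vectorizes trivially; the second is exactly the kind of indirect lookup that gets expensive on a wide-vector machine.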
We can then also see the insight that GPUs are totally superior to Larrabee on most terrains, most importantly at multiplicative codes. As you might know, GPUs are world champions at multiplication and CPUs are not. Multiplication happens to be of major importance for the majority of HPC codes. By majority I really mean approaching 90% at the public supercomputers.

Vincent

> if you want to compare Larrabee to Fermi or AMD GCN, that might be
> interesting. or to mainstream multicore - like bulldozer, with 32c
> per package vs larrabee with ">=50".
>
> but not ia64. it's best we all just forget about it.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 11:15:05 2012
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 27 Jan 2012 17:15:05 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F22CB68.3080605@ias.edu>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu>
Message-ID:

And why do you post this?

On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote:

> Vincent,
>
> He wrote that because he's trying to educate you on proper mailing
> list etiquette, which is something you appear to be lacking.
>
> Chris is absolutely right - you should not reply to off-list e-mails
> on-list.
>
> --
> Prentice
> [snip]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From ellis at cse.psu.edu Fri Jan 27 11:25:15 2012
From: ellis at cse.psu.edu (Ellis H. Wilson III)
Date: Fri, 27 Jan 2012 11:25:15 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu>
Message-ID: <4F22CFEB.6080404@cse.psu.edu>

On 01/27/2012 11:15 AM, Vincent Diepeveen wrote:
> And why do you post this?

"Assuming you're not completely born stupid, i assume you will realize that IN ORDER to" write an effective email that conveys some idea or argument, it is extremely helpful to utilize some form of etiquette or, at the very least, self-restraint in your writing, so we all don't stop reading your emails. In fact, while it's not a terribly great book IMHO, it might still help to read "How to Win Friends and Influence People." It seems like you have enough time on your hands to write near-to-incoherent emails on this list and program near-to-impossible applications for GPUs, so perhaps if you can steal a little time from one or the other you can finish it in a day or so.

But admittedly, perhaps requesting etiquette from you is truly an unthinkable thing to do. Hence your boggled state of mind.

ellis

> On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote:
>
>> Vincent,
>>
>> He wrote that because he's trying to educate you on proper mailing
>> list etiquette, which is something you appear to be lacking.
>>
>> Chris is absolutely right - you should not reply to off-list e-mails
>> on-list.
>> >> -- >> Prentice >> >> On 01/27/2012 01:06 AM, Vincent Diepeveen wrote: >>> Why do you write this? >>> >>> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote: >>> >>> On 26/01/12 23:28, Vincent Diepeveen wrote: >>> >>>>>> Mike you replied to me not to mailing list. >>> >>> That was probably deliberate, and it is inconsiderate to post a reply >>> publicly without checking with the writer that they are OK with that, >>> especially as you quoted what they wrote - they may not have wanted >>> that >>> in the public domain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Jan 27 11:34:41 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 27 Jan 2012 11:34:41 -0500 Subject: [Beowulf] Larrabee - Mark Hahn's personal attack In-Reply-To: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: <4F22D221.3020504@ias.edu> On 01/27/2012 11:12 AM, Vincent Diepeveen wrote: > And both are seem failures from user viewpoint, maybe not from intels > income viewpoint, > but from intels aim to replace and/or create a new long lasting > architecture > that can even *remotely* compete with other manufacturers, > not to mention far too high pricepoints for such cpu's. This argument is ridiculous. Just because two completely different technologies (architectures) both fail, doesn't make them similar. That's like saying a Ford Edsel and Pontiac Aztek are similar cars. > Assuming you're not completely born stupid, i assume you will realize > that IN ORDER to run Calling someone "completely born stupid" is unacceptable behavior. > most existing x64 codes, it needs to have cache coherency, and that > it always has been > presented as having exactly that. > Which is one of reasons why the architecture doesn't scale of course. Cache-coherent systems don't scale well? Really? SGI Origins were ccNUMA systems, and they scaled well. > Well you can forget about them running your x64 fortran codes on it > at any fast speed. > > You need to total rewrite your code to be able to use vectors of > doubles, > and in contradiction to GPU's where you can indirectly with arrays > see each PE or each 'compute core' > (which is 4 PE's of in case of AMD-ATI that can execute 1 double a This argument makes no sense in the context of this discussion. You need to do a significant rewrite of your code to take advantage of GPUs, too, so how are GPUs better? > cycle), > > Such lookups are a disaster at larrabee - having a cost of 7 cycles > for indirect lookups, > so you really need to use vectors. > > Now i bet majority of your oldie x64 code doesn't use such huge vectors, > so to even get some remote performance out of it, a total rewrite of > most code is needed, > if it can work at all. > > We can then also see the insight that GPU's are total superior to > larrabee at most terrains and > most importantly at multiplicative codes. > > As you might know GPU's are worldchampion in doing multiplications > and CPU's are not. > > Multiplication happens to be something that is of major importance > for the majority of HPC codes. 
> Majority i really mean - approaching 90% at the public supercomputers. I'm at a loss for words... Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Jan 27 11:38:02 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 27 Jan 2012 11:38:02 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> Message-ID: <4F22D2EA.1080309@ias.edu> Vincent, I posted that because you asked a question and I answered it, which is also good mailing list etiquette. Since you posted your question "Why do you write this?" to the mailing list instead of replying just to Chris, anyone on this list is free to reply to it. Again, this is basic mailing list etiquette. -- Prentice On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: > And why do you post this? > > On Jan 27, 2012, at 5:06 PM, Prentice Bisbal wrote: > >> Vincent, >> >> He wrote that because he's trying to educate you on proper mailing >> list >> etiquette, which is something you appear to be lacking. >> >> Chris is absolutely right - you should not reply to off-list e-mails >> on-list. >> >> -- >> Prentice >> >> On 01/27/2012 01:06 AM, Vincent Diepeveen wrote: >>> Why do you write this? >>> >>> On Jan 27, 2012, at 12:27 AM, Christopher Samuel wrote: >>> >>> On 26/01/12 23:28, Vincent Diepeveen wrote: >>> >>>>>> Mike you replied to me not to mailing list. >>> That was probably deliberate, and it is inconsiderate to post a reply >>> publicly without checking with the writer that they are OK with that, >>> especially as you quoted what they wrote - they may not have wanted >>> that >>> in the public domain. >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Jan 27 11:41:55 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 17:41:55 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F22CFEB.6080404@cse.psu.edu> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> Message-ID: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> On Jan 27, 2012, at 5:25 PM, Ellis H. 
Wilson III wrote: > On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: >> And why do you post this? So you can follow all etiquette, yet only techincal your mind is not capable of following the discussions - so you just felt replying to etiquette. That says more about you, than about me. What everyone hates about politics is that people just speak about how things are phrased instead of looking at the intention of the phrased text. Why don't you go into politics, maybe you'll do better there. Vincent _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at cse.psu.edu Fri Jan 27 11:58:25 2012 From: ellis at cse.psu.edu (Ellis H. Wilson III) Date: Fri, 27 Jan 2012 11:58:25 -0500 Subject: [Beowulf] The Absurdity of Diep - Was cpu's versus gpu's - Was Intel buys QLogic InfiniBand business In-Reply-To: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> Message-ID: <4F22D7B1.4020508@cse.psu.edu> On 01/27/2012 11:41 AM, Vincent Diepeveen wrote: > On Jan 27, 2012, at 5:25 PM, Ellis H. Wilson III wrote: > >> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: >>> And why do you post this? > > So you can follow all etiquette, yet only techincal your mind is not > capable of following the discussions - > so you just felt replying to etiquette. No, I've given up writing technically when you're posting because: a) You go into discussions to prove everyone wrong b) You rapidly switch the topic if too many people disagree, which is frustrating and confusing (hence, was intel buys qlogic, then became cpus versus gpus, which became Itanium vs Larabee somehow, and now it is how poorly you communicate) c) There is nothing to gain from having discussions with you > That says more about you, than about me. My personal background is storage and communication protocol-heavy. Not processor-oriented. You are right to suggest I am hesitant to post on a thread that directly compares two seemingly different processors, just like you hesitate to deal with the reality that you lack basic social skills. Everyone caters to their own strengths, and generally (if they are wise), takes a back-seat and tries to learn something in areas they are weak. > What everyone hates about politics is that people just speak about > how things are phrased instead of looking at the intention of the > phrased text. > > Why don't you go into politics, maybe you'll do better there. Just because this is a list on Beowulfery and broadly covers everything remotely attached to HPC does not mean it needs to be bereft of a baseline of etiquette and respect for one another. I know quite a few very nice, but rather intelligent and technically-capable people. These two qualities can in fact coexist in a person, believe it or not. 
Best,

ellis

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 12:03:38 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 18:03:38 +0100 Subject: [Beowulf] Larrabee - Mark Hahn's personal attack In-Reply-To: <4F22D221.3020504@ias.edu> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> <4F22D221.3020504@ias.edu> Message-ID: <208B7C7D-3A3E-4134-A352-4D7D78B304D1@xs4all.nl>

On Jan 27, 2012, at 5:34 PM, Prentice Bisbal wrote:

> On 01/27/2012 11:12 AM, Vincent Diepeveen wrote:
>> And both are seem failures from user viewpoint, maybe not from intels
>> income viewpoint,
>> but from intels aim to replace and/or create a new long lasting
>> architecture
>> that can even *remotely* compete with other manufacturers,
>> not to mention far too high pricepoints for such cpu's.
>
> This argument is ridiculous. Just because two completely different
> technologies (architectures) both fail, doesn't make them similar.
>
> That's like saying a Ford Edsel and Pontiac Aztek are similar cars.
>
>> Assuming you're not completely born stupid, i assume you will realize
>> that IN ORDER to run
>
> Calling someone "completely born stupid" is unacceptable behavior.

Whereas everyone knows the statements of intel on larrabee there, and that without cache coherency you can't multithread and everything also has to be done blocked - so there is zero compatibility with x64 then, and any compatibility then cannot be guaranteed. You know this really well - yet you played dumb there trying to score a cheap point.

Without cache coherency it is of course easy to build big cpu's that scale well, yet they don't run x64 then. Of course intel will be forced to design some kick butt design somewhere in the future that's not x64 compatible at all and isn't using things like cache coherency. Which isn't remotely the idea of larrabee. That's why you wrote it down as such.

>> most existing x64 codes, it needs to have cache coherency, and that
>> it always has been
>> presented as having exactly that.
>> Which is one of reasons why the architecture doesn't scale of course.
>
> Cache-coherent systems don't scale well? Really? SGI Origins were
> ccNUMA
> systems, and they scaled well.
>

Indeed, yet they didn't scale near linear in price. Each Origin 3800 @ 64 processors @ 1.5Ghz was exactly 1 million dollar, whereas a simple normal x64 cpu at the time had a price similar to the square root of that.

With GPU's it all scales very cheaply, and when using cache coherency you start to lose that scaling. Yields will go down of course. Most manufacturers need a pretty high yield to sell a chip at any decent price, so the production costs of a larrabee chip in the same process technology as a GPU, having the same performance, will be a huge factor higher. That also will cause intel to really sell few of them.

Would you consider buying a larrabee at 1 million dollar a card?

>> Well you can forget about them running your x64 fortran codes on it
>> at any fast speed.
>> >> You need to total rewrite your code to be able to use vectors of >> doubles, >> and in contradiction to GPU's where you can indirectly with arrays >> see each PE or each 'compute core' >> (which is 4 PE's of in case of AMD-ATI that can execute 1 double a > > This argument makes no sense in the context of this discussion. You > need to do a significant rewrite of your code to take advantage of > GPUs, > too, so how are GPUs better? If you need to rewrite it anyway, why not get a much faster performance at part of the price? It's the same effort you have to do. > >> cycle), >> >> Such lookups are a disaster at larrabee - having a cost of 7 cycles >> for indirect lookups, >> so you really need to use vectors. >> >> Now i bet majority of your oldie x64 code doesn't use such huge >> vectors, >> so to even get some remote performance out of it, a total rewrite of >> most code is needed, >> if it can work at all. >> >> We can then also see the insight that GPU's are total superior to >> larrabee at most terrains and >> most importantly at multiplicative codes. >> >> As you might know GPU's are worldchampion in doing multiplications >> and CPU's are not. >> >> Multiplication happens to be something that is of major importance >> for the majority of HPC codes. >> Majority i really mean - approaching 90% at the public >> supercomputers. > > I'm at a loss for words... > http://www.nwo.nl/nwohome.nsf/pages/NWOP_8DEEKL_Eng title: "Overview of recent supercomputers 2010" Author: Aad van der Steen > > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Jan 27 13:29:52 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 27 Jan 2012 13:29:52 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> Message-ID: <4F22ED20.7040105@ias.edu> On 01/27/2012 11:41 AM, Vincent Diepeveen wrote: > On Jan 27, 2012, at 5:25 PM, Ellis H. Wilson III wrote: > >> On 01/27/2012 11:15 AM, Vincent Diepeveen wrote: >>> And why do you post this? > So you can follow all etiquette, yet only techincal your mind is not > capable of following the discussions - > so you just felt replying to etiquette. > > That says more about you, than about me. > What it says is that we've given up on discussing technology with you, because your arguments are completely nonsensical. Since you clearly don't understand technology, we're hoping you can at least understand the simple concepts of basic etiquette. 
-- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From glykos at mbg.duth.gr Fri Jan 27 13:57:31 2012 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Fri, 27 Jan 2012 20:57:31 +0200 (EET) Subject: [Beowulf] Signal to noise. In-Reply-To: <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: Dear List, I have been a (mostly) quiet reader of this list for the last ~5 years and my intention is to continue reading the excellent posts that the members of this community contribute almost daily. Having said that, the recent Vincent-centric 'discussions' have ---as I am sure you all know--- significantly reduced the signal-to-noise ratio. Can we get back to normal, please ? Thanks, Nicholas -- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From moloney.brendan at gmail.com Fri Jan 27 14:26:12 2012 From: moloney.brendan at gmail.com (Brendan Moloney) Date: Fri, 27 Jan 2012 11:26:12 -0800 Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: I am in a similar position. I posted a question to this list quite some time ago but have remained subscribed to the list ever since. I have always (or at least until recently) enjoyed reading the discussions on here. I hope that one person does not ruin such a great resource. Thanks, Brendan On Fri, Jan 27, 2012 at 10:57 AM, Nicholas M Glykos wrote: > > Dear List, > > I have been a (mostly) quiet reader of this list for the last ~5 years and > my intention is to continue reading the excellent posts that the members > of this community contribute almost daily. Having said that, the recent > Vincent-centric 'discussions' have ---as I am sure you all know--- > significantly reduced the signal-to-noise ratio. Can we get back to > normal, please ? > > Thanks, > Nicholas > > -- > > > Nicholas M. Glykos, Department of Molecular Biology > and Genetics, Democritus University of Thrace, University Campus, > Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, > Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From h-bugge at online.no Fri Jan 27 14:29:35 2012 From: h-bugge at online.no (Håkon Bugge) Date: Fri, 27 Jan 2012 11:29:35 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120124045541.GB10196@bx9.net> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> Message-ID: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no>

Greg,

On 23. jan. 2012, at 20.55, Greg Lindahl wrote:

> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>
>> http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html
>
> I figured out the main why:
>
> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>
>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>> surpassed $100 million per quarter, and are on track for about fifty
>> percent annual growth, according to Crehan Research.
>
> That's the whole market, and QLogic says they are #1 in the FCoE
> adapter segment of this market, and #2 in the overall 10 gig adapter
> market (see
> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-f2q12-results-earnings-call-transcript)

That can explain why QLogic is selling, but not why Intel is buying.

10 years ago, Intel went _out_ of the Infiniband market, see http://www.networkworld.com/newsletters/servers/2002/01383318.html

So has the IB business evolved so incredibly well compared to what Intel expected back in 2002? I do not think so.

I would guess that we will see message passing/RDMA over Thunderbolt or similar.

Håkon

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 15:06:54 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 21:06:54 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> Message-ID: <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl>

On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:

> Greg,
>
>
> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>
>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>
>>> http://www.hpcwire.com/hpcwire/2012-01-23/
>>> intel_to_buy_qlogic_s_infiniband_business.html
>>
>> I figured out the main why:
>>
>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-
>> share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>
>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>> surpassed $100 million per quarter, and are on track for about fifty
>>> percent annual growth, according to Crehan Research.
>>
>> That's the whole market, and QLogic says they are #1 in the FCoE
>> adapter segment of this market, and #2 in the overall 10 gig adapter
>> market (see
>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-
>> f2q12-results-earnings-call-transcript)
>
> That can explain why QLogic is selling, but not why Intel is buying.
>
> 10 years ago, Intel went _out_ of the Infiniband market, see http://
> www.networkworld.com/newsletters/servers/2002/01383318.html
>
> So has the IB business evolved so incredibly well compared to what
> Intel expected back in 2002? I do not think so.
>
> I would guess that we will see message passing/RDMA over
> Thunderbolt or similar.
>
>

Qlogic offers that at QDR. Mellanox is a generation newer there with FDR. Both in latency as well as in bandwidth it's a huge difference.

> Håkon
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 15:19:31 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 15:19:31 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> Message-ID: <4F2306D3.4080509@scalableinformatics.com>

On 01/27/2012 03:06 PM, Vincent Diepeveen wrote:
>
> On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:
>
>> Greg,
>>
>>
>> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>>
>>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>>
>>>> http://www.hpcwire.com/hpcwire/2012-01-23/
>>>> intel_to_buy_qlogic_s_infiniband_business.html
>>>
>>> I figured out the main why:
>>>
>>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-
>>> share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>>
>>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>>> surpassed $100 million per quarter, and are on track for about fifty
>>>> percent annual growth, according to Crehan Research.
>>>
>>> That's the whole market, and QLogic says they are #1 in the FCoE
>>> adapter segment of this market, and #2 in the overall 10 gig adapter
>>> market (see
>>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses-
>>> f2q12-results-earnings-call-transcript)

I found that statement interesting. I've actually not known anything about their 10GbE products. My bad.

>>
>> That can explain why QLogic is selling, but not why Intel is buying.
>>
>> 10 years ago, Intel went _out_ of the Infiniband market, see http://
>> www.networkworld.com/newsletters/servers/2002/01383318.html
>>
>> So has the IB business evolved so incredibly well compared to what
>> Intel expected back in 2002? I do not think so.
>>
>> I would guess that we will see message passing/RDMA over
>> Thunderbolt or similar.

Intel buying makes quite a bit of sense IMO.
They are in 10GbE silicon and NICs, and being in IB silicon and HCAs gives them not only a hedge (10GbE while growing rapidly, is not the only high performance network market, and Intel is very good at getting economies of scale going with its silicon ... well ... most of its silicon ... ignoring Itanium here ...). Its quite likely that Intel would need IB for its PetaScale plans. Someone here postulated putting the silicon on the CPU. Not sure if this would happen, but I could see it on an IOH, easily. That would make sense (at least in terms of the Westmere designs ... for the Romley et al. I am not sure where it would make most sense).

But Intel sees the HPC market growth, and I think they realize that there are interesting opportunities for them there with tighter high performance networking interconnects (Thunderbolt, USB3, IB, 10GbE native on all these systems).

> Qlogic offers that at QDR.
> Mellanox is a generation newer there with FDR.
>
> Both in latency as well as in bandwidth it's a huge difference.

Haven't looked much at FDR or EDR latency. Was it a huge delta (more than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us for a while, and switches are still ~150-300ns port to port. At some point I think you start hitting a latency floor, bounded in part by "c", but also by an optimal technology path length that you can't shorten without significant investment and new technology. Not sure how close we are to that point (maybe someone from Qlogic/Mellanox could comment on the headroom we have).

Bandwidth wise, you need E5 with PCIe 3 to really take advantage of FDR. So again, its a natural fit, especially if its LOM ....

Curiously, I think this suggests that ScaleMP could be in play on the software side ... imagine stringing together bunches of the LOM FDR/QDR motherboards with E5's and lots of ram into huge vSMPs (another thread). Shai may tell me I'm full of it (hope he doesn't), but I think this is a real possibility. The Qlogic purchase likely makes this even more interesting for Intel (or Cisco, others as a defensive acq).

We sure do live in interesting times!

-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 15:27:24 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 15:27:24 -0500 Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: <4F2308AC.9010704@scalableinformatics.com>

On 01/27/2012 01:57 PM, Nicholas M Glykos wrote:
>
> Dear List,
>
> I have been a (mostly) quiet reader of this list for the last ~5 years and
> my intention is to continue reading the excellent posts that the members
> of this community contribute almost daily. Having said that, the recent
> Vincent-centric 'discussions' have ---as I am sure you all know---
> significantly reduced the signal-to-noise ratio.
> Can we get back to normal, please ?
>

Greetings Nicholas and many others:

I've found that filters help. I have some simple procmail filters set up in my mail directory that redirect some people's email (and in some cases responses to them) to a file I ... well ... never read. By doing so, I find the S/N ratio to be vastly improved. Only one person from Beowulf is in this (not Vincent ... I am still deeply amused by some of the emails, though that is fading fast with the personal attacks).

Procmail filters look like this

    :0:
    * ^From:.*bad at person.com
    $HOME/twit.filter

Then I never read the twit.filter. Just empty it out every now and then. Maybe once every few years. Doing this has dramatically improved S/N here and elsewhere.

If you don't have this capability directly, your mail client can probably fake it. I use this as I have (far too) many mail clients and I don't want to manage the rules on all of them. If you are afflicted with Microsoft exchange as your mail server, I am not sure what you can (easily) do.

Joe

-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From glykos at mbg.duth.gr Fri Jan 27 15:58:02 2012 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Fri, 27 Jan 2012 22:58:02 +0200 (EET) Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID:

Hi Joe,

> I've found that filters help.

You are killing my daily digests.

> If you are afflicted with Microsoft ...

What is 'Microsoft' ?
:-)

All the best (and apologies to the list for the email traffic),
Nicholas

-- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Fri Jan 27 16:07:34 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 16:07:34 -0500 Subject: [Beowulf] Signal to noise. In-Reply-To: References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <109ADC53-91F0-4699-A3F9-0EFB57EEC25E@xs4all.nl> Message-ID: <4F231216.3020703@scalableinformatics.com>

On 01/27/2012 03:58 PM, Nicholas M Glykos wrote:
>
> Hi Joe,
>
>
>> I've found that filters help.
>
> You are killing my daily digests.

Do'h !

... I seem to remember that you can do some more fancy filtering ... Someone showed me something a few years ago, that would break apart digests, filter, and reassemble.
Something like this:

http://easierbuntu.blogspot.com/2011/09/managing-your-email-with-fetchmail.html

(they have some interesting procmail recipes, but you can find them to do this if you really want to).

>
>
>> If you are afflicted with Microsoft ...
>
> What is 'Microsoft' ?
> :-)

A small, very gentle company in the North West USA.

> All the best (and apologies to the list for the email traffic),
> Nicholas

:)

-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Fri Jan 27 16:42:24 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 27 Jan 2012 22:42:24 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F2306D3.4080509@scalableinformatics.com> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> Message-ID: <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl>

On Jan 27, 2012, at 9:19 PM, Joe Landman wrote:

> On 01/27/2012 03:06 PM, Vincent Diepeveen wrote:
>>
>> On Jan 27, 2012, at 8:29 PM, Håkon Bugge wrote:
>>
>>> Greg,
>>>
>>>
>>> On 23. jan. 2012, at 20.55, Greg Lindahl wrote:
>>>
>>>> On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote:
>>>>
>>>>> http://www.hpcwire.com/hpcwire/2012-01-23/
>>>>> intel_to_buy_qlogic_s_infiniband_business.html
>>>>
>>>> I figured out the main why:
>>>>
>>>> http://seekingalpha.com/news-article/2082171-qlogic-gains-market-
>>>> share-in-both-fibre-channel-and-10gb-ethernet-adapter-markets
>>>>
>>>>> Server-class 10Gb Ethernet Adapter and LOM revenues have recently
>>>>> surpassed $100 million per quarter, and are on track for about
>>>>> fifty
>>>>> percent annual growth, according to Crehan Research.
ignoring Itanium here > ...). Its quite likely that Intel would need IB for its PetaScale Why buy previous generation IB in such case? It's about the ethernet of course... They produce tens of millions of cpu's each quarter and also announced a SoC (socket on chip). From SoC's actually the market produces billions a year. So it's alucrative market, yet highly competative. Having 10 gigabit ethernet on such SoC and the total at a low price would give intel a huge lead there worth dozens of billions a year. It's not clear to me where all their SoC plans go, but i bet right now they are open to any market needing SoC's. Note that many SoC's are dirt cheap. Even in very low volume we speak about some tens of dollars, cpu included and other connectivity included. Price is everything there, yet i guess intel will be offering the 'top' SoC's there with faster cpu's and 10 GigE. Then they produce a bunch of mainboards. Think also of upcoming generation of consoles, ipad 3's and similar products etc - it's not clear yet which company gets the contracts for upcoming consoles, it's all wide open for now. Yet they might sell also a 100+ million of those. Intel is an attractive company to do business with for console manufacturers now. IBM's cell kind of lost momentum there and has nothing new to offer that really outperforms as it seems. Also power usage of cell was kind of disappointing. Initial version PS3 was 220 watts on average and 100% usage it could go up to 380+ watt. Try to put that on your couch. Don't confuse this with the later crunching CELL version, a much improved chip, used for some supercomputers. Yet if i remember well, some reports, was it Aad v/d Steen (?) already predicted it would be not interesting for upcoming supercomputers as it is some kind of hybrid chip - which has no long term future. He was right. > plans. Someone here postualted putting the silicon on the CPU. Not > sure if this would happen, but I could see it on an IOH, easily. That > would make sense (at least in terms of the Westmere designs ... for > the > Romley et al. I am not sure where it would make most sense). > > But Intel sees the HPC market growth, and I think they realize that > there are interesting opportunities for them there with tighter high > performance networking interconnects (Thunderbolt, USB3, IB, 10GbE > native on all these systems). > Undoubtfully they'll try something in the HPC market. If you already have put lots of cash in development of a product it's better to put it on the market. Based upon their name they'll sell some. And some years from now they should have something bigtime improved. Yet realize how complicated it is to tape out a GPU at a new process technology if you aren't sure you gonna sell a 100+ million of them. Such massive projects have to pay back for factories. A product that's having a potential of not even selling for over a few dozens of billions of dollars is not even interesting to develop. Just startup costs for a GPU at a new proces technology is some dozens of millions for each run and the more complex it is and the newer the proces technology the more expensive it is. Realize IBM produces its power7 and bluegene/q upcoming cpu at 45 nm technology. GPU's release now in 28 nm. That's giving theoretically an advantage of a tad less of (45 / 28) ^ 2 = 2.58 So a gpu of intel needs to be factor 2.58 better in the same proces technology than todays gpu's of AMD (already released 28 nm) and Nvidia (coming soon 28 nm i'd expect). 
This is where intels big advantage with cpu's comes in: they are always better at getting newer process technologies to work sooner than the competition. Ivy Bridge will be 22 nm, so i heard rumours.

>> Qlogic offers that at QDR.
>> Mellanox is a generation newer there with FDR.
>>
>> Both in latency as well as in bandwidth it's a huge difference.
>
> Haven't looked much at FDR or EDR latency. Was it a huge delta (more
> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us
> for a while, and switches are still ~150-300ns port to port. At some

Gilad Shainer posted here some months ago that it's 0.85 us RDMA for FDR versus 1.3 us or so for the other; more importantly for clusters is the bandwidth. I guess that pci-e 3.0 simply allows much higher speeds whereas the QDR is PCI-E 2.0 stuff. Isn't pci-e 3.0 about 2x higher bandwidth than pci-e 2.0?

Now i might be happy with that last, but i guess that for big FFT's, or be it matrices, you still need massive bandwidth. Even if n is big in O( k * n log n ), where k in case of matrices is a tad bigger than n, and in case of Number Theory is usually around the number of bits, so 3.32 times n or so, that means you still need k steps of n log n. That's massive bandwidth.

> point I think you start hitting a latency floor, bounded in part by
> "c",
> but also by an optimal technology path length that you can't shorten
> without significant investment and new technology. Not sure how close
> we are to that point (maybe someone from Qlogic/Mellanox could comment
> on the headroom we have).

There is a lot of headroom for better latencies from the software viewpoint, as cpu's keep getting faster yet the latency of the networks of years ago was just marginally worse than what's there now. In case of hardware i really am no expert there.

>
> Bandwidth wise, you need E5 with PCIe 3 to really take advantage of
> FDR.
> So again, its a natural fit, especially if its LOM ....
>

All the socket 2011 boards that are in the shops now are PCI-e 3.0, and a wave of mainboards with 2 sockets will release a few days before, or on the same day, that intel finally releases the Xeon version of Sandy Bridge. Seems it didn't release yet as it's not too high clocked, if i look at this sample cpu :) It's 2Ghz to be precise (8 cores Xeon).

> Curiously, I think this suggests that ScaleMP could be in play on the
> software side ... imagine stringing together bunches of the LOM FDR/
> QDR
> motherboards with E5's and lots of ram into huge vSMPs (another
> thread).
> Shai may tell me I'm full of it (hope he doesn't), but I think
> this is
> a real possibility. The Qlogic purchase likely makes this even more
> interesting for Intel (or Cisco, others as a defensive acq).
>

A technology that sold just 300 machines - this is not an interesting market for intel. They have very expensive factories that each cost many billions of dollars. These need to produce nonstop and sell products, to pay back the factories and to make a profit. Intel used to be worth over a 100 billion dollar at NASDAQ.

Wasting your most clever engineers, of which each company always has too few, on products that can't keep your factories busy, is a total waste of time. So your huge base of B-class engineers, let me not quote some mailing list names, that's the ones you move to Qlogic then for the HPC. That's enough to keep it afloat for a while in combination with 'intel inside'.
Intels profit is too huge to be busy toying with tiny markets with a handful of customers, from which majority forgot to take their medicine when you propose rewriting the software to some new hardware platform you are gonna unroll. A habit intel is not exactly excited about of course, as they like to sell each time new technology. Also each larrabee intel would sell means they sell a bunch of xeons less of course. > We sure do live in interesting times! > Not for everyone i guess - many lost their job and as i predicted some years ago a guy with a nobel prize might be carpet bombing a huge nation this summer. Intel has 3 huge factories in Israel last time i checked. It sure can give unpredicted results for future. > -- > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics Inc. > email: landman at scalableinformatics.com > web : http://scalableinformatics.com > http://scalableinformatics.com/sicluster > phone: +1 734 786 8423 x121 > fax : +1 866 888 3112 > cell : +1 734 612 4615 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Fri Jan 27 16:47:21 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 16:47:21 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <69BBD80B-05C9-4683-99F7-B48A0BDA285D@xs4all.nl> Message-ID: <4F231B69.1050404@scalableinformatics.com> On 01/27/2012 04:42 PM, Vincent Diepeveen wrote: > > On Jan 27, 2012, at 9:19 PM, Joe Landman wrote: > >> On 01/27/2012 03:06 PM, Vincent Diepeveen wrote: [... merciful trimming ...] >>>> I would guess that we will see message passing/RDMA over >>>> Thunderbolt or similar. >> >> Intel buying makes quite a bit of sense IMO. They are in 10GbE >> silicon >> and NICs, and being in IB silicon and HCAs gives them not only a hedge >> (10GbE while growing rapidly, is not the only high performance network >> market, and Intel is very good at getting economies of scale going >> with >> its silicon ... well ... most of its silicon ... ignoring Itanium here >> ...). Its quite likely that Intel would need IB for its PetaScale > > Why buy previous generation IB in such case? IP. Its all about IP. Its always about IP. If ever you think its not about IP, you should remember "Landman's N+1th rule of M&A: It's the IP man ... just da IP!" > It's about the ethernet of course... ... no its not. Intel has its own ethernet. Its had it for a LONG time, and it did not buy Qlogic ethernet ... Its not about the ethernet. Say it with me ... ITS NOT ABOUT THE ETHERNET ... There, don't you feel better now? I do ... > They produce tens of millions of cpu's each quarter and also > announced a SoC (socket on chip) SoC is "System On a Chip". Socket on a chip is ... 
er ... cart before the horse?

-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From lindahl at pbm.com Fri Jan 27 17:13:12 2012 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 27 Jan 2012 14:13:12 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> Message-ID: <20120127221312.GA29961@bx9.net>

On Fri, Jan 27, 2012 at 11:29:35AM -0800, Håkon Bugge wrote:

> That can explain why QLogic is selling, but not why Intel is buying.

That's right. This was probably bought, not sold. If you look at the press release Intel put out, it's all about Exascale computing.

http://newsroom.intel.com/community/intel_newsroom/blog/2012/01/23/intel-takes-key-step-in-accelerating-high-performance-computing-with-infiniband-acquisition

If you want to put an IB HCA in a CPU or a {north,south}bridge, TrueScale nee InfiniPath is a much smaller implementation than others, and most of the chip is memory, which Intel knows how to shrink drastically compared to the usual way people implement memory.

Also, keep in mind that Intel's benchmarking group in Moscow has a lot of experience with benchmarking real apps for bids using TrueScale head-to-head against other HCAs, and I wouldn't be surprised if it was the case that TrueScale QDR is faster than that other company's FDR on many real codes, for the usual reason that TrueScale's MPI-oriented InfiniBand extension is more suited for MPI than the standard InfiniBand has-more-features-than-MPI-requires protocols.

Finally, I haven't seen it mentioned whether or not QLogic's IB switch was part of the purchase. If it is, then you should note that it's not hard to make that chip speak ethernet, and Intel could probably dramatically improve it with their superior serdes technology.

-- greg

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Shainer at Mellanox.com Fri Jan 27 17:25:58 2012 From: Shainer at Mellanox.com (Gilad Shainer) Date: Fri, 27 Jan 2012 22:25:58 +0000 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120127221312.GA29961@bx9.net> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> Message-ID:

> If you want to put an IB HCA in a CPU or a {north,south}bridge, TrueScale nee
> InfiniPath is a much smaller implementation than others, and most of the chip
> is memory, which Intel knows how to shrink drastically compared to the usual
> way people implement memory.
So I wonder why multiple OEMs decided to use Mellanox for on-board solutions and no one used the QLogic silicon... > Also, keep in mind that Intel's benchmarking group in Moscow has a lot of > experience with benchmarking real apps for bids using TrueScale head-to-head > against other HCAs, and I wouldn't be surprised if it was the case that TrueScale > QDR is faster than that other company's FDR on many real codes, Surprise surprise... this is no more than FUD. If you have real numbers to back it up please send. If it was so great, how come more people decided to use the Mellanox solutions? If QLogic was doing so great with their solution, I would guess they would not be selling the IB business... > Finally, I haven't seen it mentioned whether or not QLogic's IB switch was part > of the purchase. If it is, then you should note that it's not hard to make that chip > speak ethernet, and Intel could probably dramatically improve it with their > superior serdes technology. > > -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Fri Jan 27 17:27:23 2012 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 27 Jan 2012 14:27:23 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F2306D3.4080509@scalableinformatics.com> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> Message-ID: <20120127222723.GB29961@bx9.net> On Fri, Jan 27, 2012 at 03:19:31PM -0500, Joe Landman wrote: > >>> That's the whole market, and QLogic says they are #1 in the FCoE > >>> adapter segment of this market, and #2 in the overall 10 gig adapter > >>> market (see > >>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses- > >>> f2q12-results-earnings-call-transcript) > > I found that statement interesting. I've actually not known anything > about their 10GbE products. My bad. I'm not surprised, as this 10ge adapter is aimed at the same part of the market that uses fibre channel, which isn't that common in HPC. It doesn't have the kind of TCP offload features which have been (futilely) marketed in HPC; it's all about running the same fibre channel software most enterprises have run for a long time, but having the network be ethernet. > Haven't looked much at FDR or EDR latency. Was it a huge delta (more > than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us > for a while, and switches are still ~150-300ns port to port. Are you talking about the latency of 1 core on 1 system talking to 1 core on one system, or the kind of latency that real MPI programs see, running on all of the cores on a system and talking to many other systems? I assure you that the latter is not 0.8 for any IB system. > At some > point I think you start hitting a latency floor, bounded in part by "c", Last time I did the computation, we were 10X that floor. And, of course, each increase in bandwidth usually makes latency worse, absent heroic efforts of implementers to make that headline latency look better. 
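The back-of-the-envelope behind that floor, with assumed numbers chosen purely for illustration (a few metres of cable, signals at roughly two thirds of c; this is not anyone's measured configuration):

    #include <stdio.h>

    int main(void)
    {
        double c = 3.0e8;             /* speed of light in vacuum, m/s */
        double v = 0.66 * c;          /* assumed signal speed in cable/fibre */
        double path_m = 5.0;          /* assumed node-to-node cable path, m */
        double wire_ns = path_m / v * 1e9;
        double headline_ns = 850.0;   /* ~0.85 us headline MPI latency cited
                                         in this thread */

        printf("propagation: %.0f ns, headline latency: %.0f ns, gap: %.0fx\n",
               wire_ns, headline_ns, headline_ns / wire_ns);
        return 0;
    }

How much of the remaining gap one counts as floor (switch silicon, serdes, host adapter, software) versus removable overhead is what determines whether the multiple comes out near 10X.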
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From tom.elken at qlogic.com Fri Jan 27 18:08:58 2012 From: tom.elken at qlogic.com (Tom Elken) Date: Fri, 27 Jan 2012 15:08:58 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120127221312.GA29961@bx9.net> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> Message-ID: <35AAF1E4A771E142979F27B51793A4888885B23AE5@AVEXMB1.qlogic.org> > Finally, I haven't seen it mentioned whether or not QLogic's IB switch > was part of the purchase. >From the QLogic press release: " QLogic Corp. ... today announced a definitive agreement to sell the product lines ... associated with its InfiniBand business to Intel Corporation ..." So "the product lines" means both the switch and HCA product lines. Last summer Intel acquired an Ethernet switch business: http://newsroom.intel.com/community/intel_newsroom/blog/2011/07/19/intel-to-acquire-fulcrum-microsystems so it is not unprecedented that they are interested in switching as well as host technologies. -Tom If it is, then you should note that it's not > hard to make that chip speak ethernet, and Intel could probably > dramatically improve it with their superior serdes technology. > > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Fri Jan 27 16:07:08 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri, 27 Jan 2012 16:07:08 -0500 (EST) Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F2306D3.4080509@scalableinformatics.com> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> Message-ID: >>>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses- >>>> f2q12-results-earnings-call-transcript) > > I found that statement interesting. I've actually not known anything > about their 10GbE products. My bad. I was a bit surprised that the entire transcript had only one side-ways mention of IB. also interesting that they seem quite heavily into the heavily-offloaded adapter market (which is sort of the opposite of the original infinipath stuff.) 
>>> I would guess that we will see message passing/RDMA over >>> Thunderbolt or similar. has there been any mention of Thunderbolt in a switched context? afaict it's just a weird "let's do faster USB and throw in video" thing. > Intel buying makes quite a bit of sense IMO. They are in 10GbE silicon > and NICs, and being in IB silicon and HCAs gives them not only a hedge > (10GbE while growing rapidly, is not the only high performance network weird to have redundant/competing parts in many of the same markets though. afaik, intel 10G has a reasonable rep; they presumably won't be junking their own products. > ...). It's quite likely that Intel would need IB for its PetaScale > plans. I can't quite tell whether Qlogic's IB switches use Mellanox chips or not. afaik, Qlogic has their own adapter chips (and perhaps FC/eth). > than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us > for a while, and switches are still ~150-300ns port to port. At some mellanox qdr systems I've tested are about 1.6 us half-rtt pingpong. I don't think the switch latency is a big deal, since with 36x fanout, you don't need a very tall fat-tree. > Curiously, I think this suggests that ScaleMP could be in play on the > software side really? I'd be interested in hearing from real people who've actually used it (not marketing, thanks). I don't really understand how ScaleMP can do the required coherency in units smaller than a page, which means that "non-embarrassing" programs will surely notice...
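As a back-of-envelope check on the fat-tree remark above, using the standard folded-Clos formulas (an assumption about topology, not a vendor spec): with radix-k switch silicon, a two-level fat-tree supports k^2/2 hosts and a three-level tree k^3/4. With k = 36:

\[ N_{2\text{-level}} = \frac{36^2}{2} = 648, \qquad N_{3\text{-level}} = \frac{36^3}{4} = 11\,664. \]

So anything up to a few hundred nodes needs at most three switch hops end to end, which is why per-hop switch latency contributes relatively little to the totals discussed in this thread.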
From bill at cse.ucdavis.edu Fri Jan 27 21:10:02 2012 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Fri, 27 Jan 2012 18:10:02 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> Message-ID: <4F2358FA.4030009@cse.ucdavis.edu> On 01/27/2012 02:25 PM, Gilad Shainer wrote: > So I wonder why multiple OEMs decided to use Mellanox for on-board > solutions and no one used the QLogic silicon... That's a strange argument. What does Intel want? Something to make them more money. In the past that's been integrating functionality into their CPU or support chipsets: SATA, USB, the memory controller, the PCIe controller, and GigE. The cost in transistors and die area seems very relevant to Intel's interests. Anyone have an estimate on how much latency a direct connect to QPI would save vs PCIe? What do motherboard manufacturers want? Something to make them more money. So that's mostly marketing/reputation, pricing, and whatever they can do to differentiate themselves. If buying a $150 IB chip lets them charge $400 more then it's a win, assuming they spend less than $250 of R&D to add it to the motherboard. I doubt the difference in transistors or a few watts would be a big deal either way. >> Also, keep in mind that Intel's benchmarking group in Moscow has a >> lot of experience with benchmarking real apps for bids using >> TrueScale head-to-head >> against other HCAs, and I wouldn't be surprised if it was the case that TrueScale >> QDR is faster than that other company's FDR on many real codes, > > > Surprise surprise... this is no more than FUD. If you have real > numbers to back it up please send. If it was so great, how come more > people decided to use the Mellanox solutions? If QLogic was doing so > great with their solution, I would guess they would not be selling the > IB business... FUD = Fear, Uncertainty, and Doubt. Doesn't sound like FUD to me; more like a cheap attack on Greg. I think we (the mailing list) can do better. I've personally compared several generations of Myrinet and InfiniPath to allegedly faster Mellanox adapters. Mellanox hasn't won yet, but I've not compared QDR or FDR yet. With that said, the reason I run the benchmarks is to find the best solution, and it might well be Mellanox next time. It would be irresponsible for a cluster provider to just pick Mellanox FDR over QLogic QDR because of the spec sheet. Of course, recommending QLogic over Mellanox without quantifying real-world performance would be just as irresponsible. Maybe we could have fewer attacks, less complaining and hand waving, and more useful information? IMO Greg never came across as a commercial (which the beowulf list isn't an appropriate place for), but he does regularly contribute useful info. Arguing market share as proof of performance superiority is just silly. Speaking of which, you said: There is some added latency due to the new 64b/66b encoding, but overall latency is lower than QDR. MPI is below 1us. I googled for additional information, looked around the Mellanox website, and couldn't find anything. Is that number relevant to HPC folks running clusters? Does it involve a switch? If it is not realistic, are there any realistic numbers available?
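The distinction being drawn here (one idle core ping-ponging versus every core loading the HCA at once) is easy to make concrete. Below is a minimal sketch of a loaded latency test, not any vendor's benchmark; the rank placement, message size, and iteration count are illustrative assumptions:

/* loaded_pingpong.c - minimal sketch of a "loaded" latency test:
 * every core on node A ping-pongs with a partner core on node B
 * simultaneously, so the HCA is hammered by all cores at once.
 * Assumes a block rank mapping (ranks 0..n/2-1 on node A), e.g.
 *   mpirun -np 16 -npernode 8 ./loaded_pingpong
 */
#include <mpi.h>
#include <stdio.h>

#define ITERS 10000
#define MSG   8                     /* message size in bytes */

int main(int argc, char **argv)
{
    int rank, size;
    char buf[MSG] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size % 2) {
        if (!rank) fprintf(stderr, "need an even rank count\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* first half of the ranks (node A) pairs with the second half (node B) */
    int partner = (rank < size / 2) ? rank + size / 2 : rank - size / 2;

    MPI_Barrier(MPI_COMM_WORLD);    /* start all pairs together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank < size / 2) {
            MPI_Send(buf, MSG, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, MSG, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
        }
    }
    double half_rtt = (MPI_Wtime() - t0) / (2.0 * ITERS) * 1e6; /* us */

    /* report the worst per-rank latency: under load, closer to what apps see */
    double worst;
    MPI_Reduce(&half_rtt, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (!rank) printf("loaded half-RTT latency: %.2f us (worst rank)\n", worst);
    MPI_Finalize();
    return 0;
}

Running the same binary first with one rank per node and then with all cores per node exposes exactly the gap under discussion; the MPI_MAX reduction reports the slowest rank, which is closer to what a tightly synchronized application experiences than a single-pair headline number.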
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Fri Jan 27 21:24:10 2012 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2012 21:24:10 -0500 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <20120127222723.GB29961@bx9.net> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net> Message-ID: <4F235C4A.8040409@scalableinformatics.com> On 01/27/2012 05:27 PM, Greg Lindahl wrote: > I'm not surprised, as this 10ge adapter is aimed at the same part of > the market that uses fibre channel, which isn't that common in HPC. It > doesn't have the kind of TCP offload features which have been > (futilely) marketed in HPC; it's all about running the same fibre > channel software most enterprises have run for a long time, but having > the network be ethernet. That makes sense. >> Haven't looked much at FDR or EDR latency. Was it a huge delta (more >> than 30%) better than QDR? I've been hearing numbers like 0.8-0.9 us >> for a while, and switches are still ~150-300ns port to port. > > Are you talking about the latency of 1 core on 1 system talking to 1 > core on one system, or the kind of latency that real MPI programs see, > running on all of the cores on a system and talking to many other > systems? I assure you that the latter is not 0.8 for any IB system. I am looking at these things from a "best of all possible cases" scenario. So when someone comes at me with new "best of all possible cases" numbers, I can compare. Sadly this seems to be the state of many OEMs/integrators/manufacturers. In storage, we see small disk form factor SSDs marketed generally with statements like 50k IOPs and 500 MB/s. They neglect to mention several specific caveats, such as that the numbers assume writing all zeros, or that the 75k IOPs are sequential IOPs you get by taking the 600 MB/s interface and dividing by 8k-byte operations on a sequential read. Actually do a real random read and write and you get very ... very different results. Especially with non-zero (real) data. >> At some >> point I think you start hitting a latency floor, bounded in part by "c", > > Last time I did the computation, we were 10X that floor. And, of > course, each increase in bandwidth usually makes latency worse, absent > heroic efforts of implementers to make that headline latency look > better. I think that's the point, though: moving that performance "knee" down to lower latency involves (potentially) significant cost, for a modest return ... in terms of real performance benefit to a code. Thanks for the pointer on the computation. If we were 1000x off the floor, we could probably come up with a way to do better. At 10x, it's probably much harder than we think and not necessarily worth the effort. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc.
email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Jan 27 21:38:14 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Sat, 28 Jan 2012 03:38:14 +0100 Subject: [Beowulf] Setting up new benchmark In-Reply-To: <4F235C4A.8040409@scalableinformatics.com> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net> <4F235C4A.8040409@scalableinformatics.com> Message-ID: <8C9E1983-6805-4951-8DEB-79FA871940F1@xs4all.nl> No worries - once all the components from eBay have arrived by mid-February and I've set up a small cluster here, I hope to write some MPI benchmarks (with a GPL header attached) that do all sorts of latency tests, measuring everything from latency to bandwidth, mostly using RDMA reads, with all cores of every node busy. It will be interesting to compare it all then. Maybe several people over here want to benchmark. When I first designed the latency benchmark, Paul Hsieh later made the implementation of the idea a bit more efficient: I jumped through memory with a random generator; Paul Hsieh optimized it to pure random jumping. Dieter Buerssner then wrote the single-CPU test to check whether it matched the output I got - which appeared to be the case. Setting up the random pattern took very long, though - I then optimized the setup of the random pattern to O(n log n). The advantage of all this is that one really sees the impact with all cores busy at the same time, whereas most tests use a totally idle cluster and test one micro-tiny thing. Vincent _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Sat Jan 28 00:29:36 2012 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 27 Jan 2012 21:29:36 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F2358FA.4030009@cse.ucdavis.edu> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: <20120128052936.GF20008@bx9.net> On Fri, Jan 27, 2012 at 06:10:02PM -0800, Bill Broadley wrote: > Anyone have an estimate on how much latency a direct connect to QPI > would save vs pci-e? ~ 0.2us. Remember that the first 2 generations of InfiniPath were both SDR: one for HyperTransport and one for PCIe. The difference was 0.3us back then; PathScale + QLogic did some heroic things since to shorten the pipeline stages & up the clock rate. -- greg (and if anyone needs a reminder, I no longer have any financial involvement with QLogic or Intel.)
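To put numbers on the speed-of-light "floor" mentioned earlier in the thread (the 10 m path length here is an illustrative assumption, not a figure from any post): a signal in copper or fiber propagates at roughly two-thirds of c, so

\[ t_{\text{floor}} = \frac{d}{v} \approx \frac{10\ \text{m}}{2\times 10^{8}\ \text{m/s}} = 50\ \text{ns}. \]

Against measured half-RTT pingpong latencies of 0.8-1.6 us, that is a factor of 16-32, consistent with the "10X that floor" order of magnitude; the remainder is serialization, PCIe or QPI traversal, switch hops, and software.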
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Sat Jan 28 00:34:17 2012 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 27 Jan 2012 21:34:17 -0800 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F235C4A.8040409@scalableinformatics.com> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <88F64A51-B5B9-494A-B6E8-52F3C5FCCAEE@xs4all.nl> <4F2306D3.4080509@scalableinformatics.com> <20120127222723.GB29961@bx9.net> <4F235C4A.8040409@scalableinformatics.com> Message-ID: <20120128053417.GG20008@bx9.net> On Fri, Jan 27, 2012 at 09:24:10PM -0500, Joe Landman wrote: > > Are you talking about the latency of 1 core on 1 system talking to 1 > > core on one system, or the kind of latency that real MPI programs see, > > running on all of the cores on a system and talking to many other > > systems? I assure you that the latter is not 0.8 for any IB system. > > I am looking at these things from a "best of all possible cases" > scenario. So when someone comes at me with new "best of all possible > cases" numbers, I can compare. Sadly this seems to be the state of many > OEM/integrators/manufacturers. The point I've been trying to make for the past 8 years is that one of the two chip families you're looking at doesn't degrade as much as the other from the "best of all possible cases" to a real cluster running a real code. > In storage, we see small disk form factor SSDs marketed generally, with > statments like 50k IOPs, and 500 MB/s. And if you knew that one family of SSDs had a wildly different ratio of peak alleged perf to real application performance, would you ignore that? I suspect not. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Sat Jan 28 05:17:32 2012 From: eugen at leitl.org (Eugen Leitl) Date: Sat, 28 Jan 2012 11:17:32 +0100 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <4F22ED20.7040105@ias.edu> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> Message-ID: <20120128101732.GG7343@leitl.org> On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote: > What it says is that we've given up on discussing technology with you, > because your arguments are completely nonsensical. Since you clearly > don't understand technology, we're hoping you can at least understand > the simple concepts of basic etiquette. Who's the list moderator, by the way? 
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Sat Jan 28 08:32:26 2012 From: eugen at leitl.org (Eugen Leitl) Date: Sat, 28 Jan 2012 14:32:26 +0100 Subject: [Beowulf] photonic buffer bloat Message-ID: <20120128133226.GU7343@leitl.org> Relevant for future clusters; see the PPT presentation linked in the URL below. ----- Forwarded message from Masataka Ohta ----- From: Masataka Ohta Date: Sat, 28 Jan 2012 21:42:13 +0900 To: nanog at nanog.org Subject: Re: photonic buffer bloat User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20111222 Thunderbird/9.0.1 Eugen Leitl wrote: > In future photonic networks (which will do relativistic cut-through > directly in a photonic crossbar without converting photons to electrons > and back) the fiber is not just a transport channel but also a photonic > buffer Yes. > (e.g. at 10 GBit/s Ethernet a short reach fiber already buffers > a standard 1500 MTU). Wrong. 10Gbps is too slow for optical buffering. At 1Tbps, you need 100 times less fiber length than at 10Gbps to buffer packets. A 1Tbps packet can be constructed by simultaneously encoding 100 wavelengths at 10Gbps. > Of course photonic gates are expensive, individual delays do add up > so even with slow light buffers Don't try to make light slower. Slow light buffers have resonators, which means they have very, very, very narrow bandwidth. Instead, make communication speed faster, which shortens the fiber length of fiber delay line buffers. > or optical delay loops taken into consideration > current TCP/IP header layout has not been optimized for leading edge > containing most significant switching/routing information, or even > local-knowledge routing (with no global routes). It's too bad IPv6 > was not radical enough, so today's legacy protocols have to be tunneled > through the networks of the future. Considering that, in practice, packet headers must be processed electrically, IPv4 at the photonic backbone is just fine, if most routing table entries are aggregated at /24 or better, which is the current practice. You only have to read a 16M entry SRAM. A problem of IPv6 with 128bit addresses is that route lookup cannot be performed within a constant time of a few nanoseconds, which means packets would overrun the fiber delay lines. > I presume this future is some 20-30 years away still. Not so much. Moore's law requires a much more rapid bandwidth increase. My slides presented at IEEE photonics society 2009 summer topical ftp://chacha.hpcl.titech.ac.jp/IEEE-ST.ppt might be interesting for you.
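The disagreement above is easy to quantify. A 1500-byte frame is 12,000 bits, and light in fiber propagates at about c/1.5, roughly 2 x 10^8 m/s, so the fiber length occupied by one frame at line rate R is

\[ L = \frac{12\,000\ \text{bits}}{R}\times 2\times10^{8}\ \tfrac{\text{m}}{\text{s}} = 240\ \text{m at } R = 10\ \text{Gbps}, \qquad 2.4\ \text{m at } R = 1\ \text{Tbps}. \]

240 m is hardly a "short reach" patch cable, which is Ohta's point: practical fiber delay-line buffering wants the roughly 100x shorter loops that terabit line rates allow.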
Masataka Ohta ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Sat Jan 28 13:21:59 2012 From: Shainer at Mellanox.com (Gilad Shainer) Date: Sat, 28 Jan 2012 18:21:59 +0000 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: <4F2358FA.4030009@cse.ucdavis.edu> References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: > > So I wonder why multiple OEMs decided to use Mellanox for on-board > > solutions and no one used the QLogic silicon... > > That's a strange argument. It is not an argument, it is stating a fact. If someone claims that a product provides 10x better performance, best fit, etc., and yet on the other side it has very little traction, something does not make sense. > What does Intel want? Something to make them more money. Intel explained their move in their PR. They see lots of growth in HPC, definitely in the Exascale, and they see InfiniBand as a key to delivering the right solution. They also mention InfiniBand adoption in other markets, which is good validation for InfiniBand as a leading solution for any server and storage connectivity. > >> Also, keep in mind that Intel's benchmarking group in Moscow has a > >> lot of experience with benchmarking real apps for bids using > >> TrueScale head-to-head > >> against other HCAs, and I wouldn't be surprised if it was the case that TrueScale > >> QDR is faster than that other company's FDR on many real codes, > > > > > > Surprise surprise... this is no more than FUD. If you have real > > numbers to back it up please send. If it was so great, how come more > > people decided to use the Mellanox solutions? If QLogic was doing so > > great with their solution, I would guess they would not be selling the > > IB business... > > FUD = Fear, Uncertainty, and Doubt. Doesn't sound like FUD to me. > More like a cheap attack on Greg, I think we (the mailing list) can do better. I never saw any genuine testing from PathScale and then QLogic comparing their stuff to Mellanox, and you are more than welcome to try and prove me wrong. The argument in this email thread is no more than a re-cap of QLogic's latest marketing campaign and yes, it is no more than FUD. Cheap attacks are not my game, so please.... > I've personally compared several generations of Myrinet and Infinipath to > allegedly faster Mellanox adapters. Mellanox hasn't won yet, but I've not > compared QDR or FDR yet. With that said, the reason I run the benchmarks is to > find the best solution, and it might well be Mellanox next time. It would be > irresponsible for a cluster provider to just pick Mellanox FDR > over QLogic QDR because of the spec sheet. > Of course recommending QLogic over Mellanox without quantifying real-world > performance would be just as irresponsible. Going into a bit more of a technical discussion...
The QLogic way of networking is to do everything in the CPU, and the Mellanox way is to implement it all in the hardware (we all know that). The second option is a superset; therefore the worst case is even performance. I encourage you to contact me directly for any application benchmarking you do, and I will be happy to provide you feedback on what you need in order to get the best out of the Mellanox products. That can be QDR vs QDR as well, no need to go to FDR - I am open for the competition any time... > Maybe we could have fewer attacks, less complaining and hand waving, and > more useful information? IMO Greg never came across as a commercial > (which the beowulf list isn't an appropriate place for), but he does regularly contribute > useful info. Arguing market share as proof of performance superiority is just > silly. I am not sure about that... quick search in past emails can show amazing things... I believe most of us are in agreement here. Less FUD, more facts. > Speaking of which, you said: > There is some added latency due to the new 64b/66b encoding, but overall > latency is lower than QDR. MPI is below 1us. > > I googled for additional information, looked around the Mellanox website, and > couldn't find anything. Is that number relevant to > HPC folks running clusters? Does it involve a switch? If not It is with a switch -Gilad > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Sat Jan 28 13:41:56 2012 From: eugen at leitl.org (Eugen Leitl) Date: Sat, 28 Jan 2012 19:41:56 +0100 Subject: [Beowulf] What It'll Take to Go Exascale Message-ID: <20120128184156.GB7343@leitl.org> http://www.sciencemag.org/content/335/6067/394.full Science 27 January 2012: Vol. 335 no. 6067 pp. 394-396 DOI: 10.1126/science.335.6067.394 Computer Science What It'll Take to Go Exascale Robert F. Service Scientists hope the next generation of supercomputers will carry out a million trillion operations per second. But first they must change the way the machines are built and run. On fire. More powerful supercomputers now in the design stage should make modeling turbulent gas flames more accurate and revolutionize engine designs. "CREDIT: J. CHEN/CENTER FOR EXASCALE SIMULATION OF COMBUSTION IN TURBULENCE, SANDIA NATIONAL LABORATORIES" Using real climate data, scientists at Lawrence Berkeley National Laboratory (LBNL) in California recently ran a simulation on one of the world's most powerful supercomputers that replicated the number of tropical storms and hurricanes that had occurred over the past 30 years. Its accuracy was a landmark for computer modeling of global climate. But Michael Wehner and his LBNL colleagues have their eyes on a much bigger prize: understanding whether an increase in cloud cover from rising temperatures would retard climate change by reflecting more light back into space, or accelerate it by trapping additional heat close to Earth. To succeed, Wehner must be able to model individual cloud systems on a global scale.
To do that, he will need supercomputers more powerful than any yet designed. These so-called exascale computers would be capable of carrying out 10^18 floating point operations per second, or an exaflop. That's nearly 100 times more powerful than today's biggest supercomputer, Japan's "K Computer," which achieves 11.3 petaflops (10^15 flops) (see graph), and 1000 times faster than the Hopper supercomputer used by Wehner and his colleagues. The United States now appears poised to reach for the exascale, as do China, Japan, Russia, India, and the European Union. It won't be easy. Advances in supercomputers have come at a steady pace over the past 20 years, enabled by the continual improvement in computer chip manufacturing. But this evolutionary approach won't cut it in getting to the exascale. Instead, computer scientists must first figure out ways to make future machines far more energy efficient and tolerant of errors, and find novel ways to program them. "The step we are about to take to exascale computing will be very, very difficult," says Robert Rosner, a physicist at the University of Chicago in Illinois, who chaired a recent Department of Energy (DOE) committee charged with exploring whether exascale computers would be achievable. Charles Shank, a former director of LBNL who recently headed a separate panel collecting widespread views on what it would take to build an exascale machine, agrees. "Nobody said it would be impossible," Shank says. "But there are significant unknowns." Gaining support The next generation of powerful supercomputers will be used to design high-efficiency engines tailored to burn biofuels, reveal the causes of supernova explosions, track the atomic workings of catalysts in real time, and study how persistent radiation damage might affect the metal casing surrounding nuclear weapons. "It's a technology that has become critically important for many scientific disciplines," says Horst Simon, LBNL's deputy director. That versatility has made supercomputing an easy sell to politicians. The massive 2012 spending bill approved last month by Congress contained $1.06 billion for DOE's program in advanced computing, which includes a down payment to bring online the world's first exascale computer. Congress didn't specify exactly how much money should be spent on the exascale initiative, for which DOE had requested $126 million. But it asked for a detailed plan, due next month, with multiyear budget breakdowns listing who is expected to do what, when. Those familiar with the ways of Washington say that the request reflects an unusual bipartisan consensus on the importance of the initiative. "In today's political atmosphere, this is very unusual," says Jack Dongarra, a computer scientist at the University of Tennessee, Knoxville, who closely follows national and international high-performance computing trends. "It shows how critical it really is and the threat perceived of the U.S. losing its dominance in the field." The threat is real: Japan and China have built and operate the three most powerful supercomputers in the world. The rest of the world also hopes that their efforts will make them less dependent on U.S. technology. Of today's top 500 supercomputers, the vast majority were built using processors from Intel, Advanced Micro Devices (AMD), and NVIDIA, all U.S.-based companies. But that's beginning to change, at least at the top. Japan's K machine is built using specially designed processors from Fujitsu, a Japanese company.
China, which had no supercomputers in the Top500 List in 2000, now has five petascale machines and is building another with processors made by a Chinese company. And an E.U. research effort plans to use ARM processing chips made by a U.K. company. Getting over the bumps Although bigger and faster, supercomputers aren't fundamentally different from our desktops and laptops, all of which rely on the same sorts of specialized components. Computer processors serve as the brains that carry out logical functions, such as adding two numbers together or sending a bit of data to a location where it is needed. Memory chips, by contrast, hold data for safekeeping for later use. A network of wires connects processors and memory and allows data to flow where and when they are needed. For decades, the primary way of improving computers was creating chips with ever smaller and faster circuitry. This increased the processor's frequency, allowing it to churn through tasks at a faster clip. Through the 1990s, chipmakers steadily boosted the frequency of chips. But the improvements came at a price: The power demanded by a processor is proportional to its frequency cubed. So doubling a processor's frequency requires an eightfold increase in power. New king. Japan has the fastest machine (bar), although the United States still has the most petascale computers (number in parentheses). "CREDIT: ADAPTED FROM JACK DONGARRA/TOP 500 LIST/UNIVERSITY OF TENNESSEE" On the rise. The gap in available supercomputing capacity between the United States and the rest of the world has narrowed, with China gaining the most ground. "CREDIT: ADAPTED FROM JACK DONGARRA/TOP 500 LIST/UNIVERSITY OF TENNESSEE" With the rise of mobile computing, chipmakers couldn't raise power demands beyond what batteries could store. So about 10 years ago, chip manufacturers began placing multiple processing "cores" side by side on single chips. This arrangement meant that only twice the power was needed to double a chip's performance. This trend swept through the world of supercomputers. Those with single souped-up processors gave way to today's "parallel" machines that couple vast numbers of off-the-shelf commercial processors together. This move to parallel computing "was a huge, disruptive change," says Robert Lucas, an electrical engineer at the University of Southern California's Information Sciences Institute in Los Angeles. Hardware makers and software designers had to learn how to split problems apart, send individual pieces to different processors, synchronize the results, and synthesize the final ensemble. Today's top machine, Japan's "K Computer," has 705,000 cores. If the trend continues, an exascale computer would have between 100 million and 1 billion processors. But simply scaling up today's models won't work. "Business as usual will not get us to the exascale," Simon says. "These computers are becoming so complicated that a number of issues have come up that were not there before," Rosner agrees. The biggest issue relates to a supercomputer's overall power use. The largest supercomputers today use about 10 megawatts (MW) of power, enough to power 10,000 homes. If the current trend of power use continues, an exascale supercomputer would require 200 MW. "It would take a nuclear power reactor to run it," Shank says. Even if that much power were available, the cost would be prohibitive. At $1 million per megawatt per year, the electricity to run an exascale machine would cost $200 million annually. "That's a non-starter," Shank says.
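The frequency-cubed rule of thumb (commonly derived from P ≈ C V^2 f with supply voltage scaled roughly in proportion to frequency) makes the multicore trade-off explicit:

\[ \frac{P(2f)}{P(f)} = 2^{3} = 8, \]

whereas two cores at the original frequency deliver the same doubling of peak throughput for roughly twice the power, a 4x advantage in performance per watt. That arithmetic, more than any single technology, is what ended frequency scaling.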
So the current target is a machine that draws 20 MW at most. Even that goal will require a 300-fold improvement in flops per watt over today's technology. Ideas for getting to these low-power chips are already circulating. One would make use of different types of specialized cores. Today's top-of-the-line supercomputers already combine conventional processor chips, known as CPUs, with an alternative version called graphical processing units (GPUs), which are very fast at certain types of calculations. Chip manufacturers are now looking at going from "multicore" chips with four or eight cores to "many-core" chips, each containing potentially hundreds of CPU and GPU cores, allowing them to assign different calculations to specialized processors. That change is expected to make the overall chips more energy efficient. Intel, AMD, and other chip manufacturers have already announced plans to make hybrid many-core chips. Another stumbling block is memory. As the number of processors in a supercomputer skyrockets, so, too, does the need to add memory to feed bits of data to the processors. Yet, over the next few years, memory manufacturers are not projected to increase the storage density of their chips fast enough to keep up with the performance gains of processors. Supercomputer makers can get around this by adding additional memory modules. But that's threatening to drive costs too high, Simon says. Even if researchers could afford to add more memory modules, that still won't solve matters. Moving ever-growing streams of data back and forth to processors is already creating a backup for processors that can dramatically slow a computer's performance. Today's supercomputers use 70% of their power to move bits of data around from one place to another. One potential solution would stack memory chips on top of one another and run communication and power lines vertically through the stack. This more-compact architecture would require fewer steps to route data. Another approach would stack memory chips atop processors to minimize the distance bits need to travel. A third issue is errors. Modern processors compute with stunning accuracy, but they aren't perfect. The average processor will produce one error per year, as a thermal fluctuation or a random electrical spike flips a bit of data from one value to another. Such errors are relatively easy to ferret out when the number of processors is low. But it gets much harder when 100 million to 1 billion processors are involved. And increasing complexity produces additional software errors as well. One possible solution is to have the supercomputer crunch different problems multiple times and "vote" for the most common solution. But that creates a new problem. "How can I do this without wasting double or triple the resources?" Lucas asks. "Solving this problem will probably require new circuit designs and algorithms." Finally, there is the challenge of redesigning the software applications themselves, such as a novel climate model or a simulation of a chemical reaction. "Even if we can produce a machine with 1 billion processors, it's not clear that we can write software to use it efficiently," Lucas says. Current parallel computing machines use a strategy, known as message passing interface, that divides computational problems and parses out the pieces to individual processors, then collects the results. But coordinating all this traffic for millions of processors is becoming a programming nightmare.
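The "vote" scheme the article sketches is classic triple modular redundancy. A minimal sketch follows; run_task() is a hypothetical stand-in for a real computation, with an artificial bit flip injected into one replica:

/* tmr_vote.c - minimal sketch of "compute several times and vote"
 * (triple modular redundancy). run_task() is a hypothetical stand-in
 * for real work whose result may be corrupted by a transient bit flip. */
#include <stdio.h>
#include <stdint.h>

static uint64_t run_task(int replica)
{
    uint64_t result = 42;            /* pretend this is expensive work */
    if (replica == 1)
        result ^= 4;                 /* simulate a bit flip in replica 1 */
    return result;
}

/* majority vote over three results; returns -1 if all three disagree */
static int vote(uint64_t a, uint64_t b, uint64_t c, uint64_t *out)
{
    if (a == b || a == c) { *out = a; return 0; }
    if (b == c)           { *out = b; return 0; }
    return -1;                       /* no majority: must rerun */
}

int main(void)
{
    uint64_t r[3], winner;
    for (int i = 0; i < 3; i++)
        r[i] = run_task(i);
    if (vote(r[0], r[1], r[2], &winner) == 0)
        printf("majority result: %llu\n", (unsigned long long)winner);
    else
        printf("no majority, rerun needed\n");
    return 0;
}

The cost is exactly the double-or-triple resource multiple Lucas worries about, which is why production systems lean on checkpoint/restart and replicate only the most error-sensitive work.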
"There's a huge concern that the programming paradigm will have to change," Rosner says. DOE has already begun laying the groundwork to tackle these and other challenges. Last year it began funding three "co-design" centers, multi-institution cooperatives led by researchers at Los Alamos, Argonne, and Sandia national laboratories. The centers bring together scientific users who write the software code and hardware makers to design complex software and computer architectures that work in the fastest and most energy-efficient manner. It poses a potential clash between scientists who favor openness and hardware companies that normally keep their activities secret for proprietary reasons. "But it's a worthy goal," agrees Wilfred Pinfold, Intel's director of extreme-scale programming in Hillsboro, Oregon. Not so fast. Researchers have some ideas on how to overcome barriers to building exascale machines. Coming up with the cash Solving these challenges will take money, and lots of it. Two years ago, Simon says, DOE officials estimated that creating an exascale computer would cost $3 billion to $4 billion over 10 years. That amount would pay for one exascale computer for classified defense work, one for nonclassified work, and two 100-petaflops machines to work out some of the technology along the way. Those projections assumed that Congress would deliver a promised 10-year doubling of the budget of DOE's Office of Science. But those assumptions are "out of the window," Simon says, replaced by the more likely scenario of budget cuts as Congress tries to reduce overall federal spending. Given that bleak fiscal picture, DOE officials must decide how aggressively they want to pursue an exascale computer. "What's the right balance of being aggressive to maintain a leadership position and having the plan sent back to the drawing board by [the Office of Management and Budget]?" Simon asks. "I'm curious to see." DOE's strategic plan, due out next month, should provide some answers. The rest of the world faces a similar juggling act. China, Japan, the European Union, Russia, and India all have given indications that they hope to build an exascale computer within the next decade. Although none has released detailed plans, each will need to find the necessary resources despite these tight fiscal times. The victor will reap more than scientific glory. Companies use 57% of the computing time on the machines on the Top500 List, looking to speed product design and gain other competitive advantages, Dongarra says. So government officials see exascale computing as giving their industries a leg up. That's particularly true for chip companies that plan to use exascale designs to improve future commodity electronics. "It will have dividends all the way down to the laptop," says Peter Beckman, who directs the Exascale Technology and Computing Initiative at Argonne National Laboratory in Illinois. The race to provide the hardware needed for exascale computing "will be extremely competitive," Beckman predicts, and developing software and networking technology will be equally important, according to Dongarra. Even so, many observers think that the U.S. track record and the current alignment of its political and scientific forces makes it America's race to lose. Whatever happens, U.S. scientists are unlikely to be blindsided.
The task of building the world's first exascale computer is so complex, Simon says, that it will be nearly impossible for a potential winner to hide in the shadows and come out of nowhere to claim the prize. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Sat Jan 28 14:26:48 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 28 Jan 2012 14:26:48 -0500 (EST) Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <20120128101732.GG7343@leitl.org> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> Message-ID: >> the simple concepts of basic etiquette. > > Who's the list moderator, by the way? no, please - if there were a moderator who had to plow through all messages, no matter how long, meandering and low-worth, it would become a very unpleasant chore... the list doesn't get a lot of passing weirdos - pretty stable set of characters, fairly predictable in how much you want to read their messages, and how much good you expect to gain from them ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Sat Jan 28 16:28:09 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 28 Jan 2012 16:28:09 -0500 (EST) Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: >>> So I wonder why multiple OEMs decided to use Mellanox for on-board >>> solutions and no one used the QLogic silicon... >> >> That's a strange argument. > > It is not an argument, it is stating a fact. you are mistaken. you ask a pointed question - do not construe it as a statement of fact. if you wanted to state a fact, you might say: "multiple OEMs decided to use Mellanox and none have used Qlogic". by stating this, you are implying that Mellanox is superior in some way, though another perfectly adequate explanation could be that Qlogic didn't offer their chips to OEMs, or did so at a higher price. (in fact, the latter would suggest the possibility that Qlogic chips are actually worth more.) note my use of subjunctive here. in reality, Mellanox is the easy choice - widely known and used, the default. OEMs are fond of making easy choices: more comfortable to a lazy customer, possibly lower customer support costs, etc. this says nothing about whether an easy choice is a superior solution to the customer (that is, in performance, price, etc). 
> If someone claims that a product provides 10x better performance, best fit > etc., and yet on the other side it has very little traction, something does > not make sense. I saw no 10x performance claim here. there was some casual mention of a situation where Qlogic QDR performs similar to Mellanox FDR. > good validation for InfiniBand as a leading solution for any server and > storage connectivity. besides Lustre, where do you see IB used for storage? > Going into a bit more of a technical discussion... The QLogic way of networking > is to do everything in the CPU, and the Mellanox way is to implement it all in > the hardware (we all know that). this is a dishonest statement: you know that QLogic isn't actually trying to do *everything* in the CPU. > The second option is a superset; therefore > the worst case is even performance. this is also dishonest: making the adapter more intelligent clearly introduces some tradeoffs, so it's _not_ a superset. unless you are claiming that within every Mellanox adapter is _literally_ the same functionality, at the same performance, as is in a Qlogic adapter. >> Maybe we could have fewer attacks, less complaining and hand waving, and >> more useful information? IMO Greg never came across as a commercial >> (which the beowulf list isn't an appropriate place for), but he does regularly contribute >> useful info. Arguing market share as proof of performance superiority is just >> silly. > > I am not sure about that... quick search in past emails can show amazing things... > I believe most of us are in agreement here. Less FUD, more facts. "facts" in this context (as opposed to FUD, arm-waving, etc) must be dispassionate and quantifiable. not hyperbole and suggestive rhetoric. out of curiosity, has anyone set up a head-to-head comparison (two or more identical machines, both with a Qlogic and a Mellanox card of the same vintage)? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Sat Jan 28 19:12:59 2012 From: diep at xs4all.nl (Vincent Diepeveen) Date: Sun, 29 Jan 2012 01:12:59 +0100 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: On Jan 28, 2012, at 10:28 PM, Mark Hahn wrote: [snip] > out of curiosity, has anyone set up a head-to-head comparison > (two or more identical machines, both with a Qlogic and a Mellanox > card of > the same vintage)? > > regards, mark hahn. Mark, I stumbled upon the same problem a few months ago. When you google for 4x InfiniBand you can find something; moving up to QDR, it becomes more sporadic. Not to mention that the interesting test is where the cards are weak - latency. If you find anything, it's usually manufacturer-side statements without a clear test setup, usually doing 0-byte tests. This is exactly why I intend to write a benchmark. What I personally believe about FDR, PCIe 3.0, and its considerably higher claimed bandwidth versus PCIe 2.0 QDR is not important. What I do believe is that one must measure objectively.
That's why I've been posting for a while now that, as soon as the cluster works here, I'm going to write a benchmark that measures latencies while slowly increasing the read length, so that it gradually becomes a bandwidth game, and simply present the graph for interested readers. We're not interested in theoretical tests where just one core is busy, measuring the latency to one core busy at the other side. A test really requires all cores busy and hammering the network card. In the end everything is always a measure of bandwidth, of course, but even then, scientists who have objectively tested QDR, no matter *what manufacturer*, are in short supply; some of them tested just one tiny thing or a theoretical thing, or just lacked all realism when I read the rest of the article. All in all, after some days of googling, I found one tester who tried something using the same switch (a good idea), but the graphs presenting the results are tough to interpret, and he was basically interested in something other than which network card is fastest now. Running the same old tests, when all manufacturers now have much faster alternatives such as RDMA reads, is just not interesting. To be continued in some months... > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Sun Jan 29 00:03:31 2012 From: Shainer at Mellanox.com (Gilad Shainer) Date: Sun, 29 Jan 2012 05:03:31 +0000 Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: > >>> So I wonder why multiple OEMs decided to use Mellanox for on-board > >>> solutions and no one used the QLogic silicon... > >> > >> That's a strange argument. > > > > It is not an argument, it is stating a fact. > > you are mistaken. you ask a pointed question - do not construe it as a > statement of fact. if you wanted to state a fact, you might say: > "multiple OEMs decided to use Mellanox and none have used Qlogic". You probably meant to say "I think differently" and not "you are mistaken".... Making this mailing list a little more polite will benefit us all. > by stating this, you are implying that Mellanox is superior in some way, though > another perfectly adequate explanation could be that Qlogic didn't offer their > chips to OEMs, or did so at a higher price. (in fact, the latter would suggest the > possibility that Qlogic chips are actually worth more.) note my use of > subjunctive here. > > in reality, Mellanox is the easy choice - widely known and used, the default. > OEMs are fond of making easy choices: more comfortable to a lazy customer, > possibly lower customer support costs, etc. > > this says nothing about whether an easy choice is a superior solution to the > customer (that is, in performance, price, etc). OEMs don't place devices on the motherboard just because they can, nor because it is cheaper. They do so because they believe it will benefit their users, hence they will sell more. I can assure you that silicon was offered from both companies, and it wasn't an issue of price. From this point you can make any conclusion that you wish to.
OEMs don't place devices on the motherboard just because they can, not because it is cheaper. They do so because they believe it will benefit their users, hence they will sell more. I can assure you that silicon was offered from both companies, and it wasn't an issue of price. From this point you can make any conclusion that you wish to. > >good validation for InfiniBand as a leading solution for any server and > >storage connectivity. > > besides Lustre, where do you see IB used for storage? Protocols: iSER (iSCSI), NFSoRDMA, SRP, GPFS, SMB and others OEMs: DDN, Xyratex, Netapp, EMC, Oracle, SGI, HP, IBM and others. > > Going into a bit more of a technical discussion... QLogic way of networking > >is doing everything in the CPU, and Mellanox way is to implement if all in > >the hardware (we all know that). > > this is a dishonest statement: you know that QLogic isn't actually trying > to do *everything* in the CPU. You are right, you do need a HW translation from PCIe to IB. But I am sure you know where the majority of the transport, error handling etc is being done.... > > The second option is a superset, therefore > >worse case can be even performance. > > this is also dishonest: making the adapter more intelligent clearly > introduces some tradeoffs, so it's _not_ a superset. unless you are > claiming that within every Mellanox adapter is _literally_ the same > functionality, at the same performance, as is in a Qlogic adapter. It is not dishonest. In general offloading is a superset. You can chose to implement just offloading or to leave room for CPU control as well. There will always be parts that are better to be in HW, and if you have flexibility for the rest it is a superset. > >> Maybe we could have a few less attacks, complaining and hand waving and > >> more useful information? IMO Greg never came across as a commercial > >> (which beowulf list isn't an appropriate place for), but does regularly > contribute > >> useful info. Arguing market share as proof of performance superiority is > just > >> silly. > > > > I am not sure about that... quick search in past emails can show amazing > things... > > I believe most of us are in agreement here. Less FUD, more facts. > > "facts" in this context (as opposed to FUD, armwaiving, etc) must be > dispassionate and quantifiable. not hyperbole and suggestive rhetoric. Maybe we read different emails. > out of curiosity, has anyone set up a head-to-head comparison > (two or more identical machines, both with a Qlogic and a Mellanox card of > the same vintage)? > > regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Mon Jan 30 10:04:53 2012 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 30 Jan 2012 10:04:53 -0500 (EST) Subject: [Beowulf] Intel buys QLogic InfiniBand business In-Reply-To: References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu> Message-ID: >> out of curiosity, has anyone set up a head-to-head comparison >> (two or more identical machines, both with a Qlogic and a Mellanox card of >> the same vintage)? 
>> >> There was a bit of discussion of InfiniBand benchmarking in this thread > and it seems it would be helpful to casual readers like myself to have > a few references to benchmarking toolkits and actual results. > > Most often reported results are gathered with either Netpipe from Ames or > Intel MPI Benchmark (formerly known as the Pallas Benchmark) or the OSU > Micro-benchmarks. > > Searching the web produced a recent report from Swiss CSCS where a Mellanox > ConnectX3 QDR HCA with a Mellanox switch is set against a Qlogic 7300 QDR > HCA connected to a Qlogic switch. > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/Performance_Analysis_IB-QDR_final-2.pdf as far as I can tell, this paper mainly says "a coalescing stack delivers benchmark results showing a lot higher bandwidth and message rate than a non-coalescing stack." the comment on figure 8: To some extent, the environment variables mentioned before contribute to this outstanding result which is remarkably droll. I'm not sure how well coalescing works for real applications. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Jan 30 11:20:46 2012 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 30 Jan 2012 11:20:46 -0500 Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business In-Reply-To: <20120128101732.GG7343@leitl.org> References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> Message-ID: <4F26C35E.7060702@ias.edu> On 01/28/2012 05:17 AM, Eugen Leitl wrote: > On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote: > >> What it says is that we've given up on discussing technology with you, >> because your arguments are completely nonsensical. Since you clearly >> don't understand technology, we're hoping you can at least understand >> the simple concepts of basic etiquette. > Who's the list moderator, by the way? > I don't think there is one, hence all the noise. The mailing list and beowulf.org are maintained by Penguin Computing/Scyld Software. Maybe they'd be interested in appointing a moderator or 3. --- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From Shainer at Mellanox.com Mon Jan 30 14:22:24 2012
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Mon, 30 Jan 2012 19:22:24 +0000
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID:

> >> out of curiosity, has anyone set up a head-to-head comparison (two or
> >> more identical machines, both with a Qlogic and a Mellanox card of
> >> the same vintage)?
> >>
> >> There was a bit of discussion of InfiniBand benchmarking in this
> >> thread
> > and it seems it would be helpful to casual readers like myself to
> > have a few references to benchmarking toolkits and actual results.
> >
> > Most often reported results are gathered with either NetPIPE from Ames,
> > the Intel MPI Benchmarks (formerly known as the Pallas benchmark), or
> > the OSU Micro-benchmarks.
> >
> > Searching the web produced a recent report from the Swiss CSCS where a
> > Mellanox
> > ConnectX3 QDR HCA with a Mellanox switch is set against a Qlogic 7300
> > QDR HCA connected to a Qlogic switch.
> > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/P
> > erformance_Analysis_IB-QDR_final-2.pdf
>
> as far as I can tell, this paper mainly says "a coalescing stack delivers
> benchmark results showing a lot higher bandwidth and message rate than a
> non-coalescing stack." the comment on figure 8:
>
> To some extent, the environment variables mentioned before
> contribute to this outstanding result
>
> which is remarkably droll. I'm not sure how well coalescing works for real
> applications.

First, I looked at the paper and it includes latency and bandwidth
comparisons as well, not only message rate. It is important for others to
know that, and not to dismiss it. Second, both companies have options for
message coalescing. You can choose to use it or not - I saw apps that got a
benefit from it, and saw applications that do not. Without coalescing
Mellanox provides around 30M messages per second.

-Gilad.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From peter.st.john at gmail.com Mon Jan 30 18:07:11 2012
From: peter.st.john at gmail.com (Peter St. John)
Date: Mon, 30 Jan 2012 18:07:11 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F26C35E.7060702@ias.edu>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID:

Instead of appointing a moderator, we could grow one with recursive Page
Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
about this type of thing a while ago because of "citation analysis", see
the link).

Someone writes an open script and members of the list mail it with the
answers to these three questions:
1. do you volunteer to moderate?
2. Who should moderate? (give email addresses)
3. Who should judge who should moderate? (give email addresses).

Then you iterate over scoring people by "wisdom" and who gets the most
"wise" votes, until the scores converge.
The biggest hurdle would probably be getting volunteers, though.
Peter

On Mon, Jan 30, 2012 at 11:20 AM, Prentice Bisbal wrote:

> On 01/28/2012 05:17 AM, Eugen Leitl wrote:
> > On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote:
> >
> >> What it says is that we've given up on discussing technology with you,
> >> because your arguments are completely nonsensical. Since you clearly
> >> don't understand technology, we're hoping you can at least understand
> >> the simple concepts of basic etiquette.
> > Who's the list moderator, by the way?
> >
>
> I don't think there is one, hence all the noise. The mailing list and
> beowulf.org is maintained by Penguin Computing/Scyld Software. Maybe
> they'd be interested in appointing a moderator or 3.
>
> ---
> Prentice
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Mon Jan 30 18:09:48 2012
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 30 Jan 2012 18:09:48 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID: <4F27233C.8080508@scalableinformatics.com>

On 01/30/2012 06:07 PM, Peter St. John wrote:
> Instead of appointing a moderator, we could grow one with recursive Page
> Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> about this type of thing a while ago because of "citation analysis", see
> the link).

Please ... no moderator. Lists get boring while waiting for content
filtering organisms to fulfill their voluntary tasks ...

If you don't like someone's writing, filter them.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
      http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
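Peter's "grow a moderator" scheme is, at bottom, power iteration over an endorsement graph: the answers to questions 2 and 3 form the edges, and a member's "wisdom" score is the damped sum of the scores flowing in from whoever nominated them, recomputed until nothing changes. A toy sketch of that iteration follows; the addresses, the damping factor, and the convergence threshold are illustrative assumptions, not part of Peter's proposal.

# Toy sketch of the recursive "wisdom" ranking (all addresses hypothetical).
# Each member nominates candidates; scores are iterated PageRank-style, so
# nominations from high-scoring members count for more.
nominations = {
    "alice@example.org": ["bob@example.org", "carol@example.org"],
    "bob@example.org":   ["carol@example.org"],
    "carol@example.org": ["bob@example.org"],
}

members = sorted(nominations)
wisdom = {m: 1.0 / len(members) for m in members}   # uniform starting scores
DAMPING = 0.85                                      # usual PageRank damping

for _ in range(100):
    new = {}
    for m in members:
        # score flows in from each nominator, split across their nominations
        inflow = sum(wisdom[v] / len(nominations[v])
                     for v in members if m in nominations[v])
        new[m] = (1 - DAMPING) / len(members) + DAMPING * inflow
    delta = max(abs(new[m] - wisdom[m]) for m in members)
    wisdom = new
    if delta < 1e-9:          # scores have converged
        break

for m in sorted(members, key=wisdom.get, reverse=True):
    print("%-20s %.4f" % (m, wisdom[m]))

As Peter says, the arithmetic converges quickly; finding anyone willing to occupy the top of the ranking is the hard part.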
From james.p.lux at jpl.nasa.gov Mon Jan 30 18:21:45 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 30 Jan 2012 15:21:45 -0800
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID:

The biggest hurdle would probably be getting volunteers, though.
Peter

You got that right... Moderating takes a deft touch and a thick skin.
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Mon Jan 30 18:25:49 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Mon, 30 Jan 2012 15:25:49 -0800
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F27233C.8080508@scalableinformatics.com>
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <4F27233C.8080508@scalableinformatics.com>
Message-ID:

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman
Sent: Monday, January 30, 2012 3:10 PM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business

On 01/30/2012 06:07 PM, Peter St. John wrote:
> Instead of appointing a moderator, we could grow one with recursive
> Page Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we
> knew about this type of thing a while ago because of "citation
> analysis", see the link).

Please ... no moderator. Lists get boring while waiting for content filtering organisms to fulfill their voluntary tasks ...

If you don't like someone's writing, filter them.

--

I agree. However, there is also "after the fact moderation".. all posts go through by default, but someone acts as a "list conscience" and gently (or not so gently) applies a corrective force, presumably using some sort of adaptive algorithm (different people have different "plant characteristics" so the optimal controller changes). But that requires an even deft-er touch and thicker skin.

All lists with participation by knowledgeable and opinionated people with varied interests and specialization tend to go off on tangents occasionally. You just delete when needed, and wait for the transient to die out.
My best guess is that about 48 hours is how long the transient lasts (because it takes two cycles, for those who read the list once a day, to realize that it's died out and not keep feeding it)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From deadline at eadline.org Mon Jan 30 18:52:14 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 30 Jan 2012 18:52:14 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <1327559032.32034.YahooMailClassic@web120504.mail.ne1.yahoo.com> <4F21E159.7000905@unimelb.edu.au> <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID: <294b053bd84fed49f071a631c79be7e8.squirrel@mail.eadline.org>

I use my personal Zen type moderation.

yea, whatever

--
Doug

> Instead of appointing a moderator, we could grow one with recursive Page
> Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> about this type of thing a while ago because of "citation analysis", see
> the link).
>
> Someone writes an open script and members of the list mail it with the
> answers to these three questions:
> 1. do you volunteer to moderate?
> 2. Who should moderate? (give email addresses)
> 3. Who should judge who should moderate? (give email addresses).
>
> Then you iterate over scoring people by "wisdom" and who gets the most
> "wise" votes, until the scores converge.
> The biggest hurdle would probably be getting volunteers, though.
> Peter
>
> On Mon, Jan 30, 2012 at 11:20 AM, Prentice Bisbal
> wrote:
>
>> On 01/28/2012 05:17 AM, Eugen Leitl wrote:
>> > On Fri, Jan 27, 2012 at 01:29:52PM -0500, Prentice Bisbal wrote:
>> >
>> >> What it says is that we've given up on discussing technology with
>> you,
>> >> because your arguments are completely nonsensical. Since you clearly
>> >> don't understand technology, we're hoping you can at least understand
>> >> the simple concepts of basic etiquette.
>> > Who's the list moderator, by the way?
>> >
>>
>> I don't think there is one, hence all the noise. The mailing list and
>> beowulf.org is maintained by Penguin Computing/Scyld Software. Maybe
>> they'd be interested in appointing a moderator or 3.
>>
>> ---
>> Prentice
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Doug

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at pbm.com Tue Jan 31 02:53:18 2012
From: lindahl at pbm.com (Greg Lindahl)
Date: Mon, 30 Jan 2012 23:53:18 -0800
Subject: [Beowulf] Intel buys QLogic InfiniBand business
In-Reply-To:
References: <20120123192826.GB17383@bx9.net> <20120124045541.GB10196@bx9.net> <8B4C5C67-6548-4B70-B250-CE6C930E5510@online.no> <20120127221312.GA29961@bx9.net> <4F2358FA.4030009@cse.ucdavis.edu>
Message-ID: <20120131075318.GA2600@bx9.net>

On Mon, Jan 30, 2012 at 10:04:53AM -0500, Mark Hahn wrote:

> > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/Performance_Analysis_IB-QDR_final-2.pdf
>
> as far as I can tell, this paper mainly says "a coalescing stack delivers
> benchmark results showing a lot higher bandwidth and message rate than a
> non-coalescing stack." the comment on figure 8:
>
> To some extent, the environment variables mentioned before
> contribute to this outstanding result
>
> which is remarkably droll. I'm not sure how well coalescing works for real
> applications.

Note also that many of the benchmarks in this analysis weren't run
using MPI -- if I remember correctly, the ib_* commands mentioned use
InfiniBand verbs directly, which means they aren't accelerated on
InfiniPath.

-- greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From eugen at leitl.org Tue Jan 31 04:28:18 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 31 Jan 2012 10:28:18 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <4F27233C.8080508@scalableinformatics.com>
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <4F27233C.8080508@scalableinformatics.com>
Message-ID: <20120131092818.GW7343@leitl.org>

On Mon, Jan 30, 2012 at 06:09:48PM -0500, Joe Landman wrote:
> On 01/30/2012 06:07 PM, Peter St. John wrote:
> > Instead of appointing a moderator, we could grow one with recursive Page
> > Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> > about this type of thing a while ago because of "citation analysis", see
> > the link).
>
> Please ... no moderator. Lists get boring while waiting for content
> filtering organisms to fulfill their voluntary tasks ...

On all the lists I run and participate in you only turn moderation on
by default for new list members and put known bozos on permanent
moderation. The result is zero delay as soon as new list subscribers
have produced their first non-spam non-bozo post.

> If you don't like someone's writing, filter them.

I already do, but content producers typically don't bother and vote
with their feet. I have seen many communities die in that manner.
Never surprising, still always sad.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From eugen at leitl.org Tue Jan 31 04:31:04 2012
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 31 Jan 2012 10:31:04 +0100
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu>
Message-ID: <20120131093104.GX7343@leitl.org>

On Mon, Jan 30, 2012 at 03:21:45PM -0800, Lux, Jim (337C) wrote:
>
>
> The biggest hurdle would probably be getting volunteers, though.
> Peter
>
> You got that right... Moderating takes a deft touch and a thick skin.

I would have no issues moderating Beowulf@ since that would
require only negligible additional workload.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From Glen.Beane at jax.org Tue Jan 31 07:15:51 2012
From: Glen.Beane at jax.org (Glen Beane)
Date: Tue, 31 Jan 2012 12:15:51 +0000
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To: <20120131093104.GX7343@leitl.org>
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org>
Message-ID:

On Jan 31, 2012, at 4:31 AM, Eugen Leitl wrote:

> On Mon, Jan 30, 2012 at 03:21:45PM -0800, Lux, Jim (337C) wrote:
>>
>>
>> The biggest hurdle would probably be getting volunteers, though.
>> Peter
>>
>> You got that right... Moderating takes a deft touch and a thick skin.
>
> I would have no issues moderating Beowulf@ since that would
> require only negligible additional workload.

Did this list use to be moderated? I remember when I first joined there
would be a significant delay for my email sent to the list, and while I
was waiting for my replies to show up a whole conversation would be
unfolding between "veteran posters"

--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From ellis at cse.psu.edu Tue Jan 31 10:30:48 2012
From: ellis at cse.psu.edu (Ellis H. Wilson III)
Date: Tue, 31 Jan 2012 10:30:48 -0500
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
In-Reply-To:
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org>
Message-ID: <4F280928.7080806@cse.psu.edu>

On 01/31/2012 07:15 AM, Glen Beane wrote:
> Did this list use to be moderated? I remember when I first joined there would be a significant delay for my email sent to the list, and while I was waiting for my replies to show up a whole conversation would be unfolding between "veteran posters"

Yea, same used to happen to me back in '06 when I first joined. Sent an
email about it and got a response back from Don Becker stating that I
was taken off the moderation list. I'm not sure if he's still the
moderator anymore, however. While I think that's a great way to deal
with newcomers, I'm not sure there is a fair way to determine which of
the existing posters are and are not trolls deserving of moderation.
Therefore I also vote to continue in a non-moderated fashion.

On that note, my sincere apologies to the list if any of my replies
served in any way to kindle this discussion. I got a bit colorful due
to a building frustration from years of eye-rolling.

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From cbergstrom at pathscale.com Tue Jan 31 10:40:48 2012
From: cbergstrom at pathscale.com ("C. Bergström")
Date: Tue, 31 Jan 2012 22:40:48 +0700
Subject: [Beowulf] List moderation
In-Reply-To: <4F280928.7080806@cse.psu.edu>
References: <95FE2494-EEF9-497E-B446-C44F970FF4DE@xs4all.nl> <4F22CB68.3080605@ias.edu> <4F22CFEB.6080404@cse.psu.edu> <3AA1B93B-6157-4A71-B800-083FEAFBE3C4@xs4all.nl> <4F22ED20.7040105@ias.edu> <20120128101732.GG7343@leitl.org> <4F26C35E.7060702@ias.edu> <20120131093104.GX7343@leitl.org> <4F280928.7080806@cse.psu.edu>
Message-ID: <4F280B80.6030800@pathscale.com>

On 01/31/12 10:30 PM, Ellis H. Wilson III wrote:
> On 01/31/2012 07:15 AM, Glen Beane wrote:
>> Did this list use to be moderated? I remember when I first joined there would be a significant delay for my email sent to the list, and while I was waiting for my replies to show up a whole conversation would be unfolding between "veteran posters"
> Yea, same used to happen to me back in '06 when I first joined. Sent an
> email about it and got a response back from Don Becker stating that I
> was taken off the moderation list. I'm not sure if he's still the
> moderator anymore, however. While I think that's a great way to deal
> with newcomers, I'm not sure there is a fair way to determine which of
> the existing posters are and are not trolls deserving of moderation.
> Therefore I also vote to continue in a non-moderated fashion.
-1

From a bystander perspective I'm all for moderation and reducing the
noise. Even people who have their posts moderated would likely be
understanding that it's for the greater good. Let's call it peer review
instead of "moderation".
imho someone with some guts just needs to do it so this doesn't turn into
a bikeshed discussion
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From joshua_mora at usa.net Tue Jan 31 14:19:46 2012
From: joshua_mora at usa.net (Joshua mora acosta)
Date: Tue, 31 Jan 2012 13:19:46 -0600
Subject: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business
Message-ID: <525qaETsU7536S02.1328037586@web02.cms.usa.net>

I agree with Joe. Plus I know that most of us, if not all, truly want to
share knowledge, and why not, opinions as well based on personal experiences,
as long as "we all make the effort to be respectful with both the individual
and the technology, and are open/receptive to criticism as well".
That is in fact the reason I like this distribution list.

Joshua.

------ Original Message ------
Received: 05:11 PM CST, 01/30/2012
From: Joe Landman
To: beowulf at beowulf.org
Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business

> On 01/30/2012 06:07 PM, Peter St. John wrote:
> > Instead of appointing a moderator, we could grow one with recursive Page
> > Ranking (http://en.wikipedia.org/wiki/Google_ranking) (in math we knew
> > about this type of thing a while ago because of "citation analysis", see
> > the link).
>
> Please ... no moderator. Lists get boring while waiting for content
> filtering organisms to fulfill their voluntary tasks ...
>
> If you don't like someone's writing, filter them.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web : http://scalableinformatics.com
>       http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax : +1 866 888 3112
> cell : +1 734 612 4615
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From mdidomenico4 at gmail.com Tue Jan 31 15:55:55 2012
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Tue, 31 Jan 2012 15:55:55 -0500
Subject: [Beowulf] rear door heat exchangers
Message-ID:

i'm looking for, but have not found yet, a rear door heat exchanger
with fans. the door should be able to support up to 35kw using
chilled water. has anyone seen such an animal?

most of the ones i've seen utilize a side car that sits beside the
rack. unfortunately, i'm space limited and i need something that will
hang on the back of the rack.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
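Before shopping for a door, it is worth a quick sanity check on what 35kw implies for the chilled-water loop, using the heat balance Q = mdot * cp * dT. The sketch below assumes a 10 F water-side temperature rise across the door; that figure is a guessed design point, not a vendor spec.

# Back-of-envelope water flow for a 35 kW rear-door heat exchanger.
# Heat balance: Q = mdot * cp * dT. The 10 F rise is an assumption.
Q_WATTS = 35000.0           # heat load to be removed
CP_WATER = 4186.0           # specific heat of water, J/(kg K)
DT_K = 10.0 * 5.0 / 9.0     # assumed 10 F water-side rise, in kelvin

mdot = Q_WATTS / (CP_WATER * DT_K)   # required flow in kg/s
gpm = mdot / 0.0631                  # 1 US gal/min of water is ~0.0631 kg/s
print("~%.2f kg/s, about %.0f US gal/min at a 10 F rise" % (mdot, gpm))

At that assumed rise the answer comes out near 24 gal/min per rack, so the facility loop, and not just the door, has to be sized for the load.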
From lathama at gmail.com Tue Jan 31 16:13:48 2012
From: lathama at gmail.com (Andrew Latham)
Date: Tue, 31 Jan 2012 18:13:48 -0300
Subject: [Beowulf] rear door heat exchangers
In-Reply-To:
References:
Message-ID:

On Tue, Jan 31, 2012 at 5:55 PM, Michael Di Domenico wrote:
> i'm looking for, but have not found yet, a rear door heat exchanger
> with fans. the door should be able to support up to 35kw using
> chilled water. has anyone seen such an animal?
>
> most of the ones i've seen utilize a side car that sits beside the
> rack. unfortunately, i'm space limited and i need something that will
> hang on the back of the rack.
> _____________________________

Maybe: http://www.hoffmanonline.com/product_catalog/section_index.aspx?cat_1=34&cat_2=2383&SelectCatId=2383&CatId=2383

Semi-related question: Has any research been done on cooling the
racks/rails/metal infrastructure in the effort to cool the whole
rack+systems?

--
~ Andrew "lathama" Latham lathama at gmail.com http://lathama.net ~
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Jan 31 18:47:18 2012
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 31 Jan 2012 15:47:18 -0800
Subject: [Beowulf] rear door heat exchangers
In-Reply-To:
References:
Message-ID:

Maybe there's an issue with the weight and/or flexible tubing on a swinging door?

The Hoffman products in Andrew's email, I think, aren't the kind that hang
on a door, more hang on the side of a large box/cabinet (Type 4, 12, 3R
enclosure) or wall. They're also air/air heat exchangers or air conditioners
(and vortex coolers... but you don't want one of those unless you have a LOT
of compressed air available).

http://www.42u.com/cooling/liquid-cooling/liquid-cooling.htm shows "in-row
liquid cooling" but I think that's sort of in parallel. They do mention,
lower down on the page, "Rear Door Liquid Cooling". But I notice that the
Liebert XDF-5, which is basically a rack and chiller deck in one, only pulls
out 14kW.

From DoE: http://www1.eere.energy.gov/femp/pdfs/rdhe_cr.pdf
They refer to the ones installed at LBNL as RDHx units, but carefully avoid
telling you the brand or any decent data. They do say they cost $6k/door,
and suck up 10-11kW/rack with 9 gal/min flow of 72F water.

Googling RDHx turns up "CoolCentric.com"
http://www.coolcentric.com/resources/data_sheets/Coolcentric-Rear-Door-Heat-Exchanger-Data-Sheet.pdf
33kW is as good as they can do. I also note that they have no fans in them.

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Michael Di Domenico
Sent: Tuesday, January 31, 2012 12:56 PM
To: Beowulf Mailing List
Subject: [Beowulf] rear door heat exchangers

i'm looking for, but have not found yet, a rear door heat exchanger with
fans. the door should be able to support up to 35kw using chilled water.
has anyone seen such an animal?

most of the ones i've seen utilize a side car that sits beside the rack.
unfortunately, i'm space limited and i need something that will hang on the
back of the rack.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From sdm900 at gmail.com Tue Jan 31 18:54:48 2012
From: sdm900 at gmail.com (Stu Midgley)
Date: Wed, 1 Feb 2012 07:54:48 +0800
Subject: [Beowulf] rear door heat exchangers
In-Reply-To:
References:
Message-ID:

Speak to SGI. We have about a dozen such racks, all from SGI.

On Wed, Feb 1, 2012 at 4:55 AM, Michael Di Domenico wrote:
> i'm looking for, but have not found yet, a rear door heat exchanger
> with fans. the door should be able to support up to 35kw using
> chilled water. has anyone seen such an animal?
>
> most of the ones i've seen utilize a side car that sits beside the
> rack. unfortunately, i'm space limited and i need something that will
> hang on the back of the rack.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Dr Stuart Midgley
sdm900 at gmail.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From Herbert.Fruchtl at st-andrews.ac.uk Tue Jan 31 19:18:10 2012
From: Herbert.Fruchtl at st-andrews.ac.uk (Herbert Fruchtl)
Date: Wed, 1 Feb 2012 00:18:10 +0000
Subject: [Beowulf] moderation - was cpu's versus gpu's - was Intel buys QLogic
Message-ID: <97E75B730E3076479EB136E2BC5D410054874491@uos-dun-mbx1>

Folks,

I missed part of this discussion (for obvious reasons I lost interest), but
since it seems to be moving in that direction, I'll throw in my two
smallest-local-currency-units. I'm a lurker (in old usenet parlance) on this
list: reading, but very rarely posting. There are probably many of us, but
the others are posting even more rarely...

As long as we don't get real off-topic discussions that attract the weirdos
of the Internet (global warming anybody? intelligent design? even C/Fortran
tends to peter out quickly nowadays), I am opposed to censorship (aka
moderation). The simplistic arguments are:

1) This is my own, selfish, most important argument: it costs time! When,
every two years, I have a technical question for the list, I don't want to
wait until the USA is out of bed and hope that the moderator isn't at a
conference for a week.

2) You need a moderator. It's quite some work, so it will only be done by
somebody who gets some satisfaction out of it. This means that the job will
attract exactly the kind of people who will not moderate neutrally and
dispassionately. Even if they try, there's the fact that power corrupts.
You're tempted to censor views that are too far from your own ("ludicrous"
is the word you would use), and in the end you have an in-crowd confirming
each other's views.
3) You are opening yourself to lawsuits. If something is said on the list
that, let's say, Intel's corporate lawyers find defamatory, they may go
after the moderator.

If you really find somebody's views (and their presentation) objectionable,
just killfile them (it's called "filter" in the 21st century). And if
certain people think ad hominem attacks help their case, ignore them instead
of thinking you can look dignified by taking them on at their own game. You
won't.

Back to those dark alleys where we lurkers feel at home...

  Herbert
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.