From prentice at ias.edu Tue Nov 1 11:31:36 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 01 Nov 2011 11:31:36 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EAB2AFE.7000901@ias.edu> Message-ID: <4EB010D8.6060606@ias.edu> IAS is definitely interesting place. I'll have to look up Dima in the directory. On 10/28/2011 07:16 PM, Peter St. John wrote: > Prentice, > No, I didin't mean to imply anything specific about e.g. your budget, > but IAS has a fantastic reputation. > Say hi to Dima for me, he plays Go and is an algebraic geometer > visiting this year. > Peter > > On Fri, Oct 28, 2011 at 6:21 PM, Prentice Bisbal > wrote: > > > On 10/28/2011 04:56 PM, Peter St. John wrote: > > I think Greg is right on the money. Particularly at a place like > IAS, > > where resources are good and users may be errant but are doing great > > things, > > Have you been a visitor, member or staff member at IAS? > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Tue Nov 1 17:42:08 2011 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 1 Nov 2011 17:42:08 -0400 (EDT) Subject: [Beowulf] Exascale Breakfast Message-ID: <39114.192.168.93.213.1320183728.squirrel@mail.eadline.org> I have nothing to say about screen, but I do want to mention I will be moderating a breakfast panel discussion on the push toward exascale at SC11 this year. It is sponsored but Panasas and SICORP. Here are the details http://www.clustermonkey.net//content/view/314/1/ Does an Exascale Breakfast mean lots of food? And, the Beobash announcement is imminent! Monday November 14th 9PM. Traditional witty invite to arrive soon. -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Tue Nov 1 22:10:00 2011 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Tue, 01 Nov 2011 19:10:00 -0700 Subject: [Beowulf] HP redstone servers Message-ID: <4EB0A678.9060602@cse.ucdavis.edu> The best summary I've found: http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ Specifications at for the ECX-1000: http://www.calxeda.com/products/energycore/ecx1000/techspecs And EnergyCard: http://www.calxeda.com/products/energycards/techspecs The only hint on price that I found was from theregister.co.uk: The sales pitch for the Redstone systems, says Santeler, is that a half rack of Redstone machines and their external switches implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, and costs $1.2m. So it sounds like for 6 watts and $750 you get a quad core 1.4 GHz arm 10G connected node. Comments? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Wed Nov 2 09:10:10 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 02 Nov 2011 09:10:10 -0400 Subject: [Beowulf] Exascale Breakfast In-Reply-To: <39114.192.168.93.213.1320183728.squirrel@mail.eadline.org> References: <39114.192.168.93.213.1320183728.squirrel@mail.eadline.org> Message-ID: <4EB14132.30902@ias.edu> On 11/01/2011 05:42 PM, Douglas Eadline wrote: > I have nothing to say about screen, I have a feeling that 10 years from now, I'm still going to be getting shit about this. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Thu Nov 3 07:53:44 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 3 Nov 2011 07:53:44 -0400 (EDT) Subject: [Beowulf] 2011 BeoBash In-Reply-To: <39114.192.168.93.213.1320183728.squirrel@mail.eadline.org> References: <39114.192.168.93.213.1320183728.squirrel@mail.eadline.org> Message-ID: <35890.67.249.182.227.1320321224.squirrel@mail.eadline.org> You asked for it, here it is: http://www.xandmarketing.com/beobash11/ I have one request, if you work for a company that is not already a sponsor, please ask them to consider helping with the event. It is a community party with lots of visibility and promotion. Please have them contact Lara at lara at xandmarketing.com as soon as possible. Since we have grown so big, if we do not get enough sponsors we may have to limit the attendees -- something we do not want to do. Give a little, get a lot... And, a big thank you to the sponsors! -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Sun Nov 6 18:01:02 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 00:01:02 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions Message-ID: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> hi, There is a lot of infiniband 4x stuff on ebay now. Most interesting for me to buy some and toy with it. However i have not much clue about infiniband as it always used to be 'out of my pricerange'. So looking for som expert advice here. A few questions: Now i assume what i need is: bunch of cards, copper cables, and a switch, and a big dosis of luck. There is a lot of cards on ebay with 2 ports. 10 gbit + 10 gbit. Do i need to connect both to the same switch? So in short with infiniband you lose 2 ports of the switch to 1 card, is that correct? CARDS: Do all switches work with all cards? Can you mix the many different cards that are out there of 4x infiniband? If not, can you mix from mellanox the different cards theirs? So mix from 1 manufacturer cards? SOCKET 1155: Do the cards work for socket 1155 if it's pci-e versions? (of course watercooled nodes each) Is there a limit on how much RAM the machine can have? (referring to the famous 4GB limit of the QM400 cards of quadrics) Does infiniband 4x work for x64 machines? DRIVERS: Drivers for cards now. Are those all open source, or does it require payment? Is the source released of all those cards drivers, and do they integrate into linux? MPI: The MPI library you can use with each card is that different manufacturer from manufacturer? Free to download and can work with different distro's? Does it compile? Is it just a library or a modified compiler? Note i assume it's possible to combine it all with pdsh. SWITCH: I see a bunch of topspin 120 switches there. Supposed to be 200 ns. there is a 47 manual page, yet it doesn't mention anything about a password needed to login, only the default password it mentions. What if it already has been set, as one ebay guy mentions he didn't manage to login there. Is it possible to reset that login or isn't it possible to modify login password? Is it possible to combine 2 switches and have so to speak a 48 port switch then? Oh btw i plan to ship messages sized 256 bytes massively over the switch. Would it work if i add a 2nd switch just to act as a 2nd rail? And a 3d and a 4th rail also work? So a rail or 4 would it work? Really important question that rail question. As that would allow more messages per second. Most messages will be a byte or 128-256 and for sure nothing will be bigger. Some messages are shorter. If 128 is that much faster i'd go for 128. What more do i need to know? Lots of simple questions in short! Many thanks in advance for answerring any question or raising new ones :) Regards, Vincent Diepeveen diep at xs4all.nl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From hearnsj at googlemail.com Mon Nov 7 06:10:50 2011 From: hearnsj at googlemail.com (John Hearns) Date: Mon, 7 Nov 2011 11:10:50 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: Vincent, I cannot answer all of your questions. I have a couple of answers: Regarding MPI, you will be looking for OpenMPI You will need a subnet manager running somewhere on the fabric. These can either run on the switch or on a host. If you are buying this equipment from eBay I would imagine you will be running the Open Fabrics subnet manager on a host on your cluster, rather than on a switch. I might be wrong - depends if the switch has a SM license. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Mon Nov 7 06:35:40 2011 From: eugen at leitl.org (Eugen Leitl) Date: Mon, 7 Nov 2011 12:35:40 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <20111107113540.GO31847@leitl.org> On Mon, Nov 07, 2011 at 11:10:50AM +0000, John Hearns wrote: > Vincent, > I cannot answer all of your questions. > I have a couple of answers: > > Regarding MPI, you will be looking for OpenMPI > > You will need a subnet manager running somewhere on the fabric. > These can either run on the switch or on a host. > If you are buying this equipment from eBay I would imagine you will be > running the Open Fabrics subnet manager > on a host on your cluster, rather than on a switch. > I might be wrong - depends if the switch has a SM license. Assuming ebay-sourced equipment, what price tag are we roughly looking at, per node, assuming small (8-16 nodes) cluster sizes? -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From robh at dongle.org.uk Mon Nov 7 06:44:49 2011 From: robh at dongle.org.uk (Robert Horton) Date: Mon, 07 Nov 2011 11:44:49 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <1320666289.1856.25.camel@moelwyn> Hi, Most of what I know about Infiniband came from the notes at http://www.hpcadvisorycouncil.com/events/switzerland_workshop/agenda.php (or John Hearns in his previous life!). On Mon, 2011-11-07 at 00:01 +0100, Vincent Diepeveen wrote: > Do i need to connect both to the same switch? > So in short with infiniband you lose 2 ports of the switch to 1 > card, is that correct? You probably want to just connect one port to a switch and leave the other one unconnected to start with. 
> CARDS: > Do all switches work with all cards? > Can you mix the many different cards that are out there of 4x > infiniband? > If not, can you mix from mellanox the different cards theirs? So mix > from 1 manufacturer cards? They will (or at least should) all work to a point but depending on what combination you are using you may not get some features. If you want an easy life keep it all from the same manufacturer > Does infiniband 4x work for x64 machines? The 4x bit is the number of links aggregated together. 4x is normal for connections from a switch to a node, higher numbers are sometimes used for inter-switch links. You also need to note the data rate (eg SDR, DDR, QDR etc). > DRIVERS: > Drivers for cards now. Are those all open source, or does it require > payment? Is the source released of > all those cards drivers, and do they integrate into linux? You should get everything you need from the Linux kernel and / or OFED. > MPI: > The MPI library you can use with each card is that different > manufacturer from manufacturer? Free to download > and can work with different distro's? Does it compile? Is it just a > library or a modified compiler? There are quite a lot to choose from but OpenMPI is probably a good starting point. Hope that's some help... Rob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 09:28:37 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 15:28:37 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> hi John, I had read already about subnet manager but i don't really understand this, except when it's only configuration tool. I assume it's not something that's critical in terms of bandwidth, it doesn't need nonstop bandwidth from the machine & switch is it? In case of a simple cluster consisting out of 1 switch with some nodes attached, is it really a problem? On Nov 7, 2011, at 12:10 PM, John Hearns wrote: > Vincent, > I cannot answer all of your questions. > I have a couple of answers: > > Regarding MPI, you will be looking for OpenMPI > > You will need a subnet manager running somewhere on the fabric. > These can either run on the switch or on a host. > If you are buying this equipment from eBay I would imagine you will be > running the Open Fabrics subnet manager > on a host on your cluster, rather than on a switch. > I might be wrong - depends if the switch has a SM license. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From diep at xs4all.nl Mon Nov 7 09:45:33 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 15:45:33 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <1320666289.1856.25.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> On Nov 7, 2011, at 12:44 PM, Robert Horton wrote: > Hi, > > Most of what I know about Infiniband came from the notes at > http://www.hpcadvisorycouncil.com/events/switzerland_workshop/ > agenda.php > (or John Hearns in his previous life!). > > On Mon, 2011-11-07 at 00:01 +0100, Vincent Diepeveen wrote: >> Do i need to connect both to the same switch? >> So in short with infiniband you lose 2 ports of the switch to 1 >> card, is that correct? > > You probably want to just connect one port to a switch and leave the > other one unconnected to start with. What's the second one doing, is this just in case the switch fails, a kind of 'backup' port? In my naivity i had thought that both ports together formed the bidirectional link to the switch. So i thought that 1 port was for 10 gigabit upstream and the other port was for 10 gigabit downstream, did i misunderstood that? > >> CARDS: >> Do all switches work with all cards? >> Can you mix the many different cards that are out there of 4x >> infiniband? >> If not, can you mix from mellanox the different cards theirs? So mix >> from 1 manufacturer cards? > > They will (or at least should) all work to a point but depending on > what > combination you are using you may not get some features. If you > want an > easy life keep it all from the same manufacturer I will load the switch (es) to the maximum number of messages a second it can handle, > >> Does infiniband 4x work for x64 machines? > > The 4x bit is the number of links aggregated together. 4x is normal > for > connections from a switch to a node, higher numbers are sometimes used > for inter-switch links. You also need to note the data rate (eg SDR, > DDR, QDR etc). > >> DRIVERS: >> Drivers for cards now. Are those all open source, or does it require >> payment? Is the source released of >> all those cards drivers, and do they integrate into linux? > > You should get everything you need from the Linux kernel and / or > OFED. > >> MPI: >> The MPI library you can use with each card is that different >> manufacturer from manufacturer? Free to download >> and can work with different distro's? Does it compile? Is it just a >> library or a modified compiler? > > There are quite a lot to choose from but OpenMPI is probably a good > starting point. > > Hope that's some help... > > Rob > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From j.wender at science-computing.de Mon Nov 7 09:45:38 2011 From: j.wender at science-computing.de (Jan Wender) Date: Mon, 07 Nov 2011 15:45:38 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <4EB7EF12.8000403@science-computing.de> Hi all, a relatively easy to read introduction to IB is found at http://members.infinibandta.org/kwspub/Intro_to_IB_for_End_Users.pdf Cheerio, Jan -- ---- Company Information ---- Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at mclaren.com Mon Nov 7 10:45:39 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Mon, 7 Nov 2011 15:45:39 -0000 Subject: [Beowulf] building Infiniband 4x cluster questions References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > hi John, > > I had read already about subnet manager but i don't really understand > this, except when it's only configuration tool. > > I assume it's not something that's critical in terms of bandwidth, it > doesn't need nonstop bandwidth from the machine & switch is it? > It is critical. I perhaps am not explaining this correctly. In an Ethernet network you have a MAC address and the process of ARPing - i.e. if you want to open a connection to another host on the Ethernet, you broadcast its IP address and you get returned a MAC address. Hey, that's why it's called an ETHERnet (geddit? Oh, the drollery of those Xerox engineers) Anyway, on an Infiniband network the Subnet Manager assigns new hosts a LID (local identifier) and keeps track of routing tables between them. No SM, no new hosts join the network. An Infiniband expert will be along in a minute and explain that you can operate a fabric without an SM and I shall stand corrected. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
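To make the subnet manager's role concrete: once an SM (opensm on a host, or an embedded SM on a managed switch) has swept the fabric, each port is assigned a LID and moves to the ACTIVE state; without one, ports never get there. A minimal libibverbs sketch for checking this on a node, assuming the libibverbs development headers from OFED are installed and with error handling trimmed:

/*
 * Sketch: query port 1 of the first RDMA device and print its state and LID.
 * If no subnet manager is running on the fabric, the port will not reach
 * ACTIVE and no LID will have been assigned.
 * Build (assumed): gcc check_port.c -libverbs -o check_port
 */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "cannot open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    struct ibv_port_attr attr;
    if (ibv_query_port(ctx, 1, &attr) == 0)
        printf("device %s port 1: state=%d (4 = ACTIVE), LID=0x%x\n",
               ibv_get_device_name(devs[0]), attr.state, attr.lid);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

The ibv_devinfo utility that ships with libibverbs reports the same fields, so the snippet is only meant to show where the SM-assigned LID actually surfaces in the verbs API.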
From sabujp at gmail.com Mon Nov 7 11:01:34 2011 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Mon, 7 Nov 2011 10:01:34 -0600 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: > Anyway, on an Infiniband network the Subnet Manager assigns new hosts a > LID (local identifier) > and keeps track of routing tables between them. > No SM, no new hosts join the network. Regardless, make sure you're running opensm on an at least one of the nodes connected to your IB switch. I didn't have to configure anything within the manager, just make sure it's running. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From robh at dongle.org.uk Mon Nov 7 11:07:07 2011 From: robh at dongle.org.uk (Robert Horton) Date: Mon, 07 Nov 2011 16:07:07 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> Message-ID: <1320682027.1856.61.camel@moelwyn> On Mon, 2011-11-07 at 15:45 +0100, Vincent Diepeveen wrote: > What's the second one doing, is this just in case the switch fails, > a > kind of 'backup' port? > > In my naivity i had thought that both ports together formed the > bidirectional link to the switch. > So i thought that 1 port was for 10 gigabit upstream and the other > port was for 10 gigabit downstream, > did i misunderstood that? It's "normal" to just use single port cards in a compute server. You might want to use 2 (or more) to increase the bandwidth to a particular machine (might be useful for a fileserver, for instance) or if you are linking nodes to each other (rather than via a switch) in a taurus-type topology. Rob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 11:13:13 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 16:13:13 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1320666289.1856.25.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: > > Do i need to connect both to the same switch? > > So in short with infiniband you lose 2 ports of the switch to 1 card, > > is that correct? > > You probably want to just connect one port to a switch and leave the other > one unconnected to start with. Correct. You only need to connect one port. The second port can be used for performance increase or fail over for example. > > CARDS: > > Do all switches work with all cards? 
> > Can you mix the many different cards that are out there of 4x > > infiniband? > > If not, can you mix from mellanox the different cards theirs? So mix > > from 1 manufacturer cards? > > They will (or at least should) all work to a point but depending on what > combination you are using you may not get some features. If you want an > easy life keep it all from the same manufacturer All cards and switches build according to the spec will work. > > Does infiniband 4x work for x64 machines? > > The 4x bit is the number of links aggregated together. 4x is normal for > connections from a switch to a node, higher numbers are sometimes used > for inter-switch links. You also need to note the data rate (eg SDR, DDR, QDR > etc). 4X means 4 network lanes (same as the PCIe convention - PCIe x4, x8 etc.). It is related to the port speed, not the server architecture. Most of the InfiniBand port out there are 4X > > > DRIVERS: > > Drivers for cards now. Are those all open source, or does it require > > payment? Is the source released of all those cards drivers, and do > > they integrate into linux? > > You should get everything you need from the Linux kernel and / or OFED. You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 11:36:22 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 11:36:22 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <4EB80906.4040501@ias.edu> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: > hi, > > There is a lot of infiniband 4x stuff on ebay now. Vincent, Do you mean 4x, or QDR? They refer to different parts of the IB architecture. 4x refers to the number of lanes for the data to travel down and QDR refers to the data signalling rate. It's probably irrelevant for this conversation, but if you are just learning about IB, It's good to understand that difference. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 11:50:36 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 11:50:36 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <4EB80C5C.6020605@ias.edu> >>> DRIVERS: >>> Drivers for cards now. Are those all open source, or does it require >>> payment? Is the source released of all those cards drivers, and do >>> they integrate into linux? >>> You should get everything you need from the Linux kernel and / or OFED. > > You can also find the drivers on the vendors sites. 
Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows > I bought some Supermicro systems about a year ago (maybe new than that), with newer Mellanox cards( Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0 5GT/s - IB DDR / 10GigE). That aren't fully supported by the OFED that comes with RHEL/CentOS, not even version 6.1, so I had to download the latest Mellanox OFED to get them to work. I can confirm Gilad's statement that you can download them for free, they are 100% open source, and you don't need to be a paying customer or register on the Mellanox site, or any of that BS. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Mon Nov 7 11:54:11 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 7 Nov 2011 08:54:11 -0800 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB7EF12.8000403@science-computing.de> Message-ID: An interesting writeup.. A sort of tangential question about that writeup.. They use the term "message oriented" with the description that the IB hardware takes care of segmentation and so forth, so that the application just says "send this" or "receive this" and the gory details are concealed. Then he distinguishes that from a TCP/IP stack, etc., where the software does a lot of this, with the implication that the user has to be involved in that. But it seems to me that the same processes are going on.. You have a big message, it needs to be broken up, etc. And for *most users* all that is hidden underneath the hood of, say, MPI. (obviously, if you are a message passing software writer, the distinction is important). On 11/7/11 6:45 AM, "Jan Wender" wrote: >Hi all, > >a relatively easy to read introduction to IB is found at >http://members.infinibandta.org/kwspub/Intro_to_IB_for_End_Users.pdf > >Cheerio, >Jan >-- >---- Company Information ---- >Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, >Dr. >Arno Steitz, Dr. Ingrid Zech >Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: >Philippe Miltin >Sitz/Registered Office: Tuebingen Registergericht/Registration Court: >Stuttgart >Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 11:58:41 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 17:58:41 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB80906.4040501@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: hi Prentice, I had noticed the diff between SDR up to QDR, the SDR cards are affordable, the QDR isn't. The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap prices in that pricerange yet. 
If i would want to build a network that's low latency and had a budget of $800 or so a node of course i would build a dolphin SCI network, as that's probably the fastest latency card sold for a $675 or so a piece. I do not really see a rival latency wise to Dolphin there. I bet most manufacturers selling clusters don't use it as they can make $100 more profit or so selling other networking stuff, and universities usually swallow that. So price total dominates the network. As it seems now infiniband 4x is not going to offer enough performance. The one-way pingpong latencies over a switch that i see of it, are not very convincing. I see remote writes to RAM are like nearly 10 microseconds for 4x infiniband and that card is the only one affordable. The old QM400's i have here are one-way pingpong 2.1 us or so, and QM500-B's are plentyful on the net (of course big disadvantage: needs pci-x), which are a 1.3 us or so there and have SHMEM. Not seeing a cheap switch for the QM500's though nor cables. You see price really dominates everything here. Small cheap nodes you cannot build if the port price, thanks to expensive network card, more than doubles. Power is not the real concern for now - if a factory already burns a couple of hundreds of megawatts, a small cluster somewhere on the attick eating a few kilowatts is not really a problem :) On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: > > On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >> hi, >> >> There is a lot of infiniband 4x stuff on ebay now. > > Vincent, > > Do you mean 4x, or QDR? They refer to different parts of the IB > architecture. 4x refers to the number of lanes for the data to travel > down and QDR refers to the data signalling rate. > > It's probably irrelevant for this conversation, but if you are just > learning about IB, It's good to understand that difference. > > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 12:02:30 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:02:30 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB80C5C.6020605@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <4EB80C5C.6020605@ias.edu> Message-ID: On Nov 7, 2011, at 5:50 PM, Prentice Bisbal wrote: >>>> DRIVERS: >>>> Drivers for cards now. Are those all open source, or does it >>>> require >>>> payment? Is the source released of all those cards drivers, and do >>>> they integrate into linux? >>>> You should get everything you need from the Linux kernel and / >>>> or OFED. >> >> You can also find the drivers on the vendors sites. Not sure about >> the rest, but for the Mellanox case it is open source and free - >> both for Linux and Windows >> > > I bought some Supermicro systems about a year ago (maybe new than > that), > with newer Mellanox cards( Mellanox Technologies MT26418 [ConnectX VPI > PCIe 2.0 5GT/s - IB DDR / 10GigE). 
That aren't fully supported by the > OFED that comes with RHEL/CentOS, not even version 6.1, so I had to > download the latest Mellanox OFED to get them to work. I can confirm > Gilad's statement that you can download them for free, they are 100% > open source, and you don't need to be a paying customer or > register on > the Mellanox site, or any of that BS. Yeah i saw that some websites charge money for that. I saw the Dolphin website wants $5000 for a developer license or something vague. I call that 'download rights for the SDK'. Sounds weird to me. As for the MT26418 that's $562, that's factors too much for a low budget cluster that's low latency. Another website i checked out was an Indian website: plx technologies. Seems in India. However didn't allow me to register. For bandiwdth you don't need to check them out, as in some webvideo i saw them speak about 600MB/s as if it was a lot, which is of course a joke even to old 4x infiniband, which gets handsdown 800MB/s. But for latency might not be bad idea. Yet didn't allow me to register that plxtechnologies, which is weird. > Prentice > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 12:20:24 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:20:24 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <1320682027.1856.61.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> <1320682027.1856.61.camel@moelwyn> Message-ID: <658BF446-A7C5-473B-93D8-EA8EAC256F47@xs4all.nl> On Nov 7, 2011, at 5:07 PM, Robert Horton wrote: > On Mon, 2011-11-07 at 15:45 +0100, Vincent Diepeveen wrote: >> What's the second one doing, is this just in case the switch fails, >> a >> kind of 'backup' port? >> >> In my naivity i had thought that both ports together formed the >> bidirectional link to the switch. >> So i thought that 1 port was for 10 gigabit upstream and the other >> port was for 10 gigabit downstream, >> did i misunderstood that? > > It's "normal" to just use single port cards in a compute server. You > might want to use 2 (or more) to increase the bandwidth to a > particular > machine (might be useful for a fileserver, for instance) or if you are > linking nodes to each other (rather than via a switch) in a taurus- > type > topology. > > Rob > It's still not clear to me what exactly the 2nd link is doing. If i want to ship th emaximum amount of short messages, say 128 bytes each message, is a 2nd cable gonna increase the number of messages i can ship? In fact the messages i'll be shipping out is requests to read remote in a blocking manner 128 bytes. So say this proces P at node N0 wants from some other node N1 exactly 128 bytes from the gigabytes big hashtable. That's a blocked read. 
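That kind of blocking 128-byte fetch from another node's memory maps fairly directly onto one-sided operations: SHMEM get on Quadrics, or RDMA reads exposed through MPI-2 RMA over InfiniBand. A rough sketch of one such lookup with MPI_Get; the table size, hashing and names are invented for illustration, and no claim is made that this is how Diep actually implements it:

/*
 * Sketch of a blocking 128-byte remote read from a distributed hash table
 * using MPI one-sided communication (RMA). Sizes and names are illustrative.
 */
#include <mpi.h>
#include <stdint.h>
#include <string.h>

#define BUCKET_BYTES  128                 /* 4 entries of 32 bytes, as described */
#define LOCAL_BUCKETS (1024 * 1024)       /* per-node share of the table          */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* every rank exposes its slice of the table through an MPI window */
    char *table;
    MPI_Win win;
    MPI_Alloc_mem((MPI_Aint)LOCAL_BUCKETS * BUCKET_BYTES, MPI_INFO_NULL, &table);
    memset(table, 0, (size_t)LOCAL_BUCKETS * BUCKET_BYTES);
    MPI_Win_create(table, (MPI_Aint)LOCAL_BUCKETS * BUCKET_BYTES, 1,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* one lookup: hash key -> owning rank + bucket offset, then fetch 128 bytes */
    uint64_t key  = 0x123456789abcdefULL;                 /* e.g. a Zobrist hash */
    int owner     = (int)(key % (uint64_t)nprocs);
    MPI_Aint disp = (MPI_Aint)((key / nprocs) % LOCAL_BUCKETS) * BUCKET_BYTES;

    char bucket[BUCKET_BYTES];
    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Get(bucket, BUCKET_BYTES, MPI_BYTE, owner, disp, BUCKET_BYTES, MPI_BYTE, win);
    MPI_Win_unlock(owner, win);   /* returns once the 128 bytes have arrived */

    /* ... scan the four entries in bucket[] locally for the wanted position ... */

    MPI_Win_free(&win);
    MPI_Free_mem(table);
    MPI_Finalize();
    return 0;
}

Whether this reaches the read rates discussed below depends far more on the RDMA-read latency of the HCA and switch than on the MPI syntax around it.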
The number of blocked reads per second that can read a 128 bytes is the only thing that matters for the network, nothing else. Note it will also do writes, but with writes you always can be doing things in a more tricky manner. So to speak you can queue up a bunch and ship them. Writes do not need to be non-blocking. If they flow at a tad slower speed to the correct node and get written that's also ok. The write is 32 bytes max. In fact i don't want to read 128 bytes. As that 128 bytes is 4 entries and as in such cluster network it's a layered system, if i would be able to modify the source code doing the read, all i would give is a location, the host processor then can do the read of 32 bytes and give that. As i assume the network to be silly and not able to execute remote code, i read 128 bytes and figure out here which of the 4 positions *possible* is the correct position stored (odds about a tad more than 5% that a position already was stored before). So ideally i'd be doing reads of 32 bytes, yet as the request for the read is not capable of selecting the correct position, it has to scan 128 bytes for it, so i get the entire 128 bytes. The number of 128 byte reads per second randomized over the hashtable that's spreaded over the nodes, is the speed at which the 'mainsearch' can search. I'm guessing blocked reads to eat nearly 10 microseconds with infiniband 4x, so that would mean i can do about a 100k lookups a card. Question is whether connecting the 2nd port would speedup that to more than 100k reads per second. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 12:19:54 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:19:54 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: > hi John, > > I had read already about subnet manager but i don't really understand this, > except when it's only configuration tool. > > I assume it's not something that's critical in terms of bandwidth, it doesn't > need nonstop bandwidth from the machine & switch is it? The subnet management is just an agent in the fabric that give identifiers to the ports and set the routing in the fabric (in case of static routing). It will also discover new nodes once connected to the fabric, or nodes that went down (in the later case, it can modify the routing accordingly). The agent requires negligible network resources, so no need to worry. You can run the subnet management from a server (head node for example using OpenSM for example) or from one of the switches. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From Shainer at Mellanox.com Mon Nov 7 12:14:56 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:14:56 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: > I had noticed the diff between SDR up to QDR, the SDR cards are affordable, > the QDR isn't. > > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap prices in > that pricerange yet. You can also find cards on www.colfaxdirect.com. You can also check with the HPC Advisory Council (www.hpcadvisorycouncil.com) - they are doing refresh cycles for their systems, and might have some older cards to donate. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 12:16:26 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:16:26 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB7EF12.8000403@science-computing.de> Message-ID: > They use the term "message oriented" with the description that the IB > hardware takes care of segmentation and so forth, so that the application > just says "send this" or "receive this" and the gory details are > concealed. Then he distinguishes that from a TCP/IP stack, etc., where > the software does a lot of this, with the implication that the user has to be > involved in that. > > But it seems to me that the same processes are going on.. You have a big > message, it needs to be broken up, etc. > And for *most users* all that is hidden underneath the hood of, say, MPI. > (obviously, if you are a message passing software writer, the distinction is > important). You can also post large message to the IB interface (up to 2GB I believe) and the IB transport will break it to the network MTU. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 12:26:41 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:26:41 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <6E607B84-8C7F-4C40-9916-48EAEFAF46A1@xs4all.nl> Thanks for the very clear explanation Gilad! You beated with just 2 lines entire wiki and lots of other homepages with endless of chatter :) On Nov 7, 2011, at 6:19 PM, Gilad Shainer wrote: >> hi John, >> >> I had read already about subnet manager but i don't really >> understand this, >> except when it's only configuration tool. >> >> I assume it's not something that's critical in terms of bandwidth, >> it doesn't >> need nonstop bandwidth from the machine & switch is it? > > The subnet management is just an agent in the fabric that give > identifiers to the ports and set the routing in the fabric (in case > of static routing). 
It will also discover new nodes once connected > to the fabric, or nodes that went down (in the later case, it can > modify the routing accordingly). The agent requires negligible > network resources, so no need to worry. You can run the subnet > management from a server (head node for example using OpenSM for > example) or from one of the switches. > > Gilad > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 13:16:00 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 13:16:00 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: <4EB82060.3050300@ias.edu> Vincent, Don't forget that between SDR and QDR, there is DDR. If SDR is too slow, and QDR is too expensive, DDR might be just right. -- Goldilocks On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > hi Prentice, > > I had noticed the diff between SDR up to QDR, > the SDR cards are affordable, the QDR isn't. > > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap > prices in that pricerange yet. > > If i would want to build a network that's low latency and had a budget > of $800 or so a node of course i would > build a dolphin SCI network, as that's probably the fastest latency > card sold for a $675 or so a piece. > > I do not really see a rival latency wise to Dolphin there. I bet most > manufacturers selling clusters don't use > it as they can make $100 more profit or so selling other networking > stuff, and universities usually swallow that. > > So price total dominates the network. As it seems now infiniband 4x is > not going to offer enough performance. > The one-way pingpong latencies over a switch that i see of it, are not > very convincing. I see remote writes to RAM > are like nearly 10 microseconds for 4x infiniband and that card is the > only one affordable. > > The old QM400's i have here are one-way pingpong 2.1 us or so, and > QM500-B's are plentyful on the net (of course big disadvantage: needs > pci-x), > which are a 1.3 us or so there and have SHMEM. Not seeing a cheap > switch for the QM500's though nor cables. > > You see price really dominates everything here. Small cheap nodes you > cannot build if the port price, thanks to expensive network card, > more than doubles. > > Power is not the real concern for now - if a factory already burns a > couple of hundreds of megawatts, a small cluster somewhere on the > attick eating > a few kilowatts is not really a problem :) > > > > On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: > >> >> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >>> hi, >>> >>> There is a lot of infiniband 4x stuff on ebay now. >> >> Vincent, >> >> Do you mean 4x, or QDR? They refer to different parts of the IB >> architecture. 4x refers to the number of lanes for the data to travel >> down and QDR refers to the data signalling rate. >> >> It's probably irrelevant for this conversation, but if you are just >> learning about IB, It's good to understand that difference. 
>> >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 13:51:28 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 19:51:28 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <20111107113540.GO31847@leitl.org> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <20111107113540.GO31847@leitl.org> Message-ID: <5B66256F-ACFA-489B-AE01-44F5DB8F2B61@xs4all.nl> hi Eugen, In Game Tree Search basically algorithmically it is a century further than many other sciences as the brilliant minds have been busy with it. For the brilliant guys it was possible to make CASH with it. In Math there is stil many challenges to design 1 kick butt algorithm, but you won't get rich with it. As a result from Alan Turing up to the latest Einstein, they al have put their focus upon Game Tree Search. I'm moving now towards robotica in fact, building a robot. Not the robot, as i suck in building robots, but the software part so far hasn't been realy developed very well for robots. Unexplored area still for civil use that is. But as for the chessprograms now, they combine a bunch of algorithms and every single one of them profits bigtime (exponential) from caching. That caching is of course random. So the cluster we look at in number of nodes you can probably count at one hand, yet i intend to put 4 network cards (4 rails for insiders here) into each single machine. Machine is a big word, it wil be stand alone mainboard of course to save costs. So the price of each network card is fairly important. As it seems now, the old quadrics network cards QM500-B that you can pick up for $30 each or so on ebay are most promising. At Home i have a full working QM400 setup which is 2.1 us latency one way ping pong. So i'd guess a blocked read has a latency not much above that. I can choose myself whether i want to do reads of 128 bytes or 256 bytes. No big deal in fact. It's scathered through the RAM, so each read is a random read fromthe RAM. With 4 nodes that would mean of course odds 25% it's a local RAM read (no nothing network read then), and 75% odds it's somewhere in the gigabytes of RAM from a remote machine. As it seems now 4x infiniband has a blocked read latency that's too slow and i don't know for which sockets 4x works, as all testreports i read the 4x infiniband just works for old socket 604. So am not sure it works for socket 1366 let alone socket 1155; those have a different memory architecture so it's never sure whether a much older network card that works DMA will work for it. Also i hear nothing about putting several cards in 1 machine. I want at least 4 rails of course from those old crap cards. You'll argue that for 4x infiniband this is not very cost effective, as the price of 4 cards and 4 cables is already gonna be nearly 400 dollar. That's also what i noticed. 
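For what it's worth, the 128-byte figure in this thread corresponds to a bucket of four 32-byte entries, fetched in a single read and then searched locally; with the table striped evenly over N nodes, roughly (N-1)/N of random probes go over the wire (75% for 4 nodes, as noted above). A hypothetical layout, just to pin the numbers down; the field names are invented:

/*
 * Hypothetical layout matching the sizes discussed in this thread:
 * a 32-byte entry, a 128-byte bucket of four entries (one remote read),
 * and a simple striping of buckets across nodes.
 */
#include <stdint.h>

struct tt_entry {            /* 32 bytes */
    uint64_t key;            /* full hash key, for verification          */
    uint64_t move_and_flags; /* packed best move, bound type, age        */
    int32_t  score;
    int32_t  depth;
    uint64_t pad;            /* pad out to 32 bytes                      */
};

struct tt_bucket {           /* 128 bytes = exactly one remote read */
    struct tt_entry e[4];
};

/* striping: low bits of the key pick the owning node, the rest pick the bucket */
static inline int owner_node(uint64_t key, int nnodes)
{
    return (int)(key % (uint64_t)nnodes);
}

static inline int is_local(uint64_t key, int my_node, int nnodes)
{
    /* with 4 nodes this is true ~25% of the time for random keys */
    return owner_node(key, nnodes) == my_node;
}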
But if i put in 2x QM500-B in for example a P6T professional, that's gonna be cheaper including the cables than $200 and it will be able to deliver i'd guess over a million blocked reads per second. By already doing 8 probes which is 192-256 bytes currently i already 'bandwidth optimized' the algorithm. Back in the days that Leierson at MiT ran cilkchess and other engines at the origin3800 there and some Sun supercomputers, they requested in slow manner a single probe of what will it have been, a byte or 8-12. So far it seems that 4 infiniband cards 4x can deliver me only 400k blocked reads a second, which is a workable number in fact (the amount i need depends largely upon how fast the node is) for a single socket machine. Yet i'm not aware whether infiniband allows multiple rails. Does it? The QM400 cards i have here, i'd guess can deliver with 4 rails around 1.2 million blocked reads a second, which already allows a lot faster nodes. The ideal kick butt machine so far is a simple supermicro mainboard with 4x pci-x and 4 sockets. Now it'll depend upon which cpu's i can get cheapest whether that's intel or AMD. If the 8 core socket 1366 cpu's are going to be cheap @ 22 nm, that's of course with some watercooling, say clock them to 4.5Ghz, gonna be kick butt nodes. Those mainboards allow "only" 2 rails, which definitely means that the QM400 cards, not to mention 4x infiniband is an underperformer. Up to 24 nodes, infiniband has cheap switches. But it seems only the newer infiniband cards have a latency that's sufficient, and all of them are far over $500, so that's far outside of budget. Even then they still can't beat a single QM500-B card. It's more than said that the top500 sporthall hardly needs bandwidth let alone latency. I saw that exactly a cluster in the same sporthall top500 with simple built in gigabit that isn't even DMA was only 2x slower than the same machines equipped with infiniband. Now some wil cry here that gigabit CAN have reasonable one way pingpong's, not to mention the $5k solarflare cards of 10 gigabit ethernet, yet in all sanity we must be honest that the built in gigabits from practical performance reasons are more like 500 microseconds latency if you have all cores busy. In fact even the realtime linux kernel will central lock every udp packet you ship or receive. Ugly ugly. That's no compare with the latencies of the HPC cards of course, whether you use MPI or SHMEM doesn't really matter there. That difference is so huge. As a result it seems there was never much of a push to having great network cards. That might change now with gpu's kicking butt, though those need of course massive bandwidth, not latency. For my tiny cluster latency is what matters. Usually 'one way pingpong' is a good representation of the speed of blocked reads, Quadrics excepted, as the SHMEM allows way faster blocked reads there than 2 times the price for a MPI one-way pingpong. Quadrics is dead and gone. Old junk. My cluster also will be old junk probably, with exception maybe of the cpu's. Yet if i don't find sponsorship for the cpu's, of course i'm on a big budget there as well. On Nov 7, 2011, at 12:35 PM, Eugen Leitl wrote: > On Mon, Nov 07, 2011 at 11:10:50AM +0000, John Hearns wrote: >> Vincent, >> I cannot answer all of your questions. >> I have a couple of answers: >> >> Regarding MPI, you will be looking for OpenMPI >> >> You will need a subnet manager running somewhere on the fabric. >> These can either run on the switch or on a host. 
>> If you are buying this equipment from eBay I would imagine you >> will be >> running the Open Fabrics subnet manager >> on a host on your cluster, rather than on a switch. >> I might be wrong - depends if the switch has a SM license. > > Assuming ebay-sourced equipment, what price tag > are we roughly looking at, per node, assuming small > (8-16 nodes) cluster sizes? > > -- > Eugen* Leitl leitl http://leitl.org > ______________________________________________________________ > ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 15:09:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 21:09:46 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB82060.3050300@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> <4EB82060.3050300@ias.edu> Message-ID: <0502713E-FED7-430B-9200-E9F19576C67C@xs4all.nl> It seems the latency of DDR infiniband to do a blocked read from remote memory (RDMA) is between that of SDR and quadrics, with quadrics being a lot faster. http://www.google.nl/url?sa=t&rct=j&q=rdma%20latency%20ddr% 20infiniband&source=web&cd=9&ved=0CF8QFjAI&url=http%3A%2F% 2Fwww.cse.scitech.ac.uk%2Fdisco%2Fmew18%2FPresentations%2FDay2% 2F5th_Session% 2FMarkLehrer.pdf&ei=tjW4ToWjOY2dOoD69esB&usg=AFQjCNEzRhG5ljCxmm1r0SMXVob nAbZUAQ&cad=rja If i click there i get to a MarkLehrer.pdf www.cse.scitech.ac.uk/disco/mew18/Presentations/.../MarkLehrer.pdf It claims a RDMA read has latency of 1.91 us. However i'll have to see that in my own benchmark first before i believe it when we hammer with many different processes at that card at the same time. You get problems like switch latencies and other nasty stuff then. This is a presentation slide and i need something that works in reality. HP 4X DDR InfiniBand Mezzanine HCA 410533-B21 SFF-8470 they're $75 but just 2 of them available on ebay. The next 'ddr' one is QLE7104 QLOGIC INFINIBAND 8X DDR SINGLE PORT HBA So that's a qlogic one, $108 just 3 of them available, but we already get at a dangerous price level. Remember i want well over a million reads getting done a second and i didn't count the pollution by writes even yet. HP 4X DDR InfiniBand Mezzanine HCA - 2 Ports 448262-B21 They're $121 and again just 2 available. This seems a problem with infiniband on ebay. Even if you search 16 cards, you can each time buy 2 or so max. As if sometimes a scientist takes 2 back home and puts 'em on ebay. No big 'old' masses get posted there. The first one to offer 10, that's http://www.ebay.com/itm/HP- INFINIBAND-4X-DDR-PCI-E-DUAL-PORT-HCA-448397B21-/110649801200? pt=COMP_EN_Hubs&hash=item19c33df9f0 That 's at $192.11 a piece. It seems DDR infiniband still isn't in my pricerange Prentice. The QM500-B's from quadrics go for between $30 and $50 however. 
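The "well over a million reads a second" target translates directly into a constraint on card latency and rail count: with one outstanding blocked (synchronous) read per rail, the rate is simply rails divided by the round-trip time. A toy sweep — the latencies below are illustrative values in the range quoted in this thread, not measurements:

#include <stdio.h>

/* Blocked-read budget: one outstanding synchronous read per rail,
 * so rate = rails / round_trip_latency.  The swept latencies are
 * illustrative values of the magnitude discussed in this thread. */
int main(void)
{
    const double rt_us[] = { 2.0, 4.0, 10.0 };   /* round trip per blocked read */
    const int    rails[] = { 1, 2, 4 };
    int i, j;

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++)
            printf("%5.1f us, %d rail(s): %6.0f k reads/s   ",
                   rt_us[i], rails[j], rails[j] * 1.0e3 / rt_us[i]);
        printf("\n");
    }
    return 0;
}

A ~2 us read with 4 rails lands around 2 million reads/s, while a ~10 us read with 4 rails lands around 400k/s, which is the spread the rest of the thread argues about.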
On Nov 7, 2011, at 7:16 PM, Prentice Bisbal wrote: > Vincent, > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > slow, and QDR is too expensive, DDR might be just right. > > -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> hi Prentice, >> >> I had noticed the diff between SDR up to QDR, >> the SDR cards are affordable, the QDR isn't. >> >> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> prices in that pricerange yet. >> >> If i would want to build a network that's low latency and had a >> budget >> of $800 or so a node of course i would >> build a dolphin SCI network, as that's probably the fastest latency >> card sold for a $675 or so a piece. >> >> I do not really see a rival latency wise to Dolphin there. I bet most >> manufacturers selling clusters don't use >> it as they can make $100 more profit or so selling other networking >> stuff, and universities usually swallow that. >> >> So price total dominates the network. As it seems now infiniband >> 4x is >> not going to offer enough performance. >> The one-way pingpong latencies over a switch that i see of it, are >> not >> very convincing. I see remote writes to RAM >> are like nearly 10 microseconds for 4x infiniband and that card is >> the >> only one affordable. >> >> The old QM400's i have here are one-way pingpong 2.1 us or so, and >> QM500-B's are plentyful on the net (of course big disadvantage: needs >> pci-x), >> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> switch for the QM500's though nor cables. >> >> You see price really dominates everything here. Small cheap nodes you >> cannot build if the port price, thanks to expensive network card, >> more than doubles. >> >> Power is not the real concern for now - if a factory already burns a >> couple of hundreds of megawatts, a small cluster somewhere on the >> attick eating >> a few kilowatts is not really a problem :) >> >> >> >> On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: >> >>> >>> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >>>> hi, >>>> >>>> There is a lot of infiniband 4x stuff on ebay now. >>> >>> Vincent, >>> >>> Do you mean 4x, or QDR? They refer to different parts of the IB >>> architecture. 4x refers to the number of lanes for the data to >>> travel >>> down and QDR refers to the data signalling rate. >>> >>> It's probably irrelevant for this conversation, but if you are just >>> learning about IB, It's good to understand that difference. >>> >>> Prentice >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
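For readers new to the naming: "4x" is the lane count, and SDR/DDR/QDR is the per-lane signalling rate (2.5, 5 and 10 Gbit/s), with 8b/10b coding taking 20% off the top, so the usable data rate follows directly. A few lines of arithmetic make the combinations concrete (FDR, which comes up later in the thread, switches to 64b/66b coding and is not covered here):

#include <stdio.h>

/* Usable data rate of a 4x InfiniBand link: lanes x per-lane signalling
 * rate x 8b/10b coding efficiency (SDR/DDR/QDR generations only). */
int main(void)
{
    const char  *gen[]  = { "SDR", "DDR", "QDR" };
    const double lane[] = { 2.5, 5.0, 10.0 };    /* Gbit/s signalling per lane */
    const int    width  = 4;                     /* "4x" = 4 lanes */
    int i;

    for (i = 0; i < 3; i++)
        printf("4x %s: %4.1f Gbit/s signalling, %4.1f Gbit/s data\n",
               gen[i], width * lane[i], width * lane[i] * 0.8);
    return 0;
}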
From Greg at Keller.net Mon Nov 7 15:21:51 2011 From: Greg at Keller.net (Greg Keller) Date: Mon, 07 Nov 2011 14:21:51 -0600 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: Message-ID: <4EB83DDF.5020902@Keller.net> > Date: Mon, 07 Nov 2011 13:16:00 -0500 > From: Prentice Bisbal > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > Cc: Beowulf Mailing List > Message-ID:<4EB82060.3050300 at ias.edu> > Content-Type: text/plain; charset=ISO-8859-1 > > Vincent, > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > slow, and QDR is too expensive, DDR might be just right. And for DDR a key thing is, when latency matters, "ConnectX" DDR is much better than the earlier "Infinihost III" DDR cards. We have 100's of each and the ConnectX make a large impact for some codes. Although nearly antique now, we actually have plans for the ConnectX cards in yet another round of updated systems. This is the 3rd Generation system I have been able to re-use the cards in (Harperton, Nehalem, and now Single Socket Sandy Bridge), which makes me very happy. A great investment that will likely live until PCI-Gen3 slots are the norm. -- Da Bears?! > -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> > hi Prentice, >> > >> > I had noticed the diff between SDR up to QDR, >> > the SDR cards are affordable, the QDR isn't. >> > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> > prices in that pricerange yet. >> > >> > If i would want to build a network that's low latency and had a budget >> > of $800 or so a node of course i would >> > build a dolphin SCI network, as that's probably the fastest latency >> > card sold for a $675 or so a piece. >> > >> > I do not really see a rival latency wise to Dolphin there. I bet most >> > manufacturers selling clusters don't use >> > it as they can make $100 more profit or so selling other networking >> > stuff, and universities usually swallow that. >> > >> > So price total dominates the network. As it seems now infiniband 4x is >> > not going to offer enough performance. >> > The one-way pingpong latencies over a switch that i see of it, are not >> > very convincing. I see remote writes to RAM >> > are like nearly 10 microseconds for 4x infiniband and that card is the >> > only one affordable. >> > >> > The old QM400's i have here are one-way pingpong 2.1 us or so, and >> > QM500-B's are plentyful on the net (of course big disadvantage: needs >> > pci-x), >> > which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> > switch for the QM500's though nor cables. >> > >> > You see price really dominates everything here. Small cheap nodes you >> > cannot build if the port price, thanks to expensive network card, >> > more than doubles. >> > >> > Power is not the real concern for now - if a factory already burns a >> > couple of hundreds of megawatts, a small cluster somewhere on the >> > attick eating >> > a few kilowatts is not really a problem:) >> > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
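Most of the figures being compared in this sub-thread ("one-way pingpong latency") are measurements of the following shape. This is only a sketch — the OSU and Intel MPI benchmark suites do the same thing far more carefully, with warm-up passes and message-size sweeps — and, like most published numbers, it measures a single pair of ranks on otherwise idle nodes, which is the easy case:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define ITERS 10000
#define MSG   8                      /* small message, latency-bound */

int main(int argc, char **argv)
{
    int rank, i;
    char buf[MSG];
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof buf);

    MPI_Barrier(MPI_COMM_WORLD);     /* no warm-up pass; good enough for a sketch */
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, MSG, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency ~ %.2f us\n",
               (t1 - t0) * 1.0e6 / (2.0 * ITERS));

    MPI_Finalize();
    return 0;
}

Run it with two ranks placed on different nodes; halving the measured round trip gives the usual quoted number.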
From diep at xs4all.nl Mon Nov 7 15:33:52 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 21:33:52 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB83DDF.5020902@Keller.net> References: <4EB83DDF.5020902@Keller.net> Message-ID: hi Greg, Very useful info! I already was wondering about the different timings i see for infiniband, but indeed it's the ConnectX that scores better in latency. $289 on ebay but that's directly QDR then. "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for Dell PowerEdge M1000e-Series Blade Servers" This 1.91 microseconds for a RDMA read is for a connectx. Not bad for Infiniband. Only 50% slower in latency than quadrics which is pci-x of course. Yet now needed is a cheap price for 'em :) It seems indeed all the 'cheap' offers are the infinihost III DDR versions. Regards, Vincent On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> Date: Mon, 07 Nov 2011 13:16:00 -0500 >> From: Prentice Bisbal >> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >> Cc: Beowulf Mailing List >> Message-ID:<4EB82060.3050300 at ias.edu> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> Vincent, >> >> Don't forget that between SDR and QDR, there is DDR. If SDR is too >> slow, and QDR is too expensive, DDR might be just right. > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much > better than the earlier "Infinihost III" DDR cards. We have 100's of > each and the ConnectX make a large impact for some codes. Although > nearly antique now, we actually have plans for the ConnectX cards > in yet > another round of updated systems. This is the 3rd Generation system I > have been able to re-use the cards in (Harperton, Nehalem, and now > Single Socket Sandy Bridge), which makes me very happy. A great > investment that will likely live until PCI-Gen3 slots are the norm. > -- > Da Bears?! > >> -- >> Goldilocks >> >> >> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>> hi Prentice, >>>> >>>> I had noticed the diff between SDR up to QDR, >>>> the SDR cards are affordable, the QDR isn't. >>>> >>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>> cheap >>>> prices in that pricerange yet. >>>> >>>> If i would want to build a network that's low latency and had a >>>> budget >>>> of $800 or so a node of course i would >>>> build a dolphin SCI network, as that's probably the fastest >>>> latency >>>> card sold for a $675 or so a piece. >>>> >>>> I do not really see a rival latency wise to Dolphin there. I >>>> bet most >>>> manufacturers selling clusters don't use >>>> it as they can make $100 more profit or so selling other >>>> networking >>>> stuff, and universities usually swallow that. >>>> >>>> So price total dominates the network. As it seems now >>>> infiniband 4x is >>>> not going to offer enough performance. >>>> The one-way pingpong latencies over a switch that i see of it, >>>> are not >>>> very convincing. I see remote writes to RAM >>>> are like nearly 10 microseconds for 4x infiniband and that card >>>> is the >>>> only one affordable. >>>> >>>> The old QM400's i have here are one-way pingpong 2.1 us or so, and >>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>> needs >>>> pci-x), >>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>> switch for the QM500's though nor cables. >>>> >>>> You see price really dominates everything here. 
Small cheap >>>> nodes you >>>> cannot build if the port price, thanks to expensive network card, >>>> more than doubles. >>>> >>>> Power is not the real concern for now - if a factory already >>>> burns a >>>> couple of hundreds of megawatts, a small cluster somewhere on the >>>> attick eating >>>> a few kilowatts is not really a problem:) >>>> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 17:07:51 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 22:07:51 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> Message-ID: RDMA read is a round trip operation and it is measured from host memory to host memory. I doubt if Quadrics had half of it for round trip operations measured from host memory to host memory. The PCI-X memory to card was around 0.7 by itself (one way).... Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Vincent Diepeveen Sent: Monday, November 07, 2011 12:33 PM To: Greg Keller Cc: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions hi Greg, Very useful info! I already was wondering about the different timings i see for infiniband, but indeed it's the ConnectX that scores better in latency. $289 on ebay but that's directly QDR then. "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for Dell PowerEdge M1000e-Series Blade Servers" This 1.91 microseconds for a RDMA read is for a connectx. Not bad for Infiniband. Only 50% slower in latency than quadrics which is pci-x of course. Yet now needed is a cheap price for 'em :) It seems indeed all the 'cheap' offers are the infinihost III DDR versions. Regards, Vincent On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> Date: Mon, 07 Nov 2011 13:16:00 -0500 >> From: Prentice Bisbal >> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >> Cc: Beowulf Mailing List >> Message-ID:<4EB82060.3050300 at ias.edu> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> Vincent, >> >> Don't forget that between SDR and QDR, there is DDR. If SDR is too >> slow, and QDR is too expensive, DDR might be just right. > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much better than the earlier "Infinihost III" DDR cards. We have > 100's of each and the ConnectX make a large impact for some codes. > Although nearly antique now, we actually have plans for the ConnectX > cards in yet another round of updated systems. This is the 3rd > Generation system I have been able to re-use the cards in (Harperton, > Nehalem, and now Single Socket Sandy Bridge), which makes me very > happy. A great investment that will likely live until PCI-Gen3 slots > are the norm. > -- > Da Bears?! 
> >> -- >> Goldilocks >> >> >> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>> hi Prentice, >>>> >>>> I had noticed the diff between SDR up to QDR, the SDR cards are >>>> affordable, the QDR isn't. >>>> >>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>> cheap prices in that pricerange yet. >>>> >>>> If i would want to build a network that's low latency and had a >>>> budget of $800 or so a node of course i would build a dolphin SCI >>>> network, as that's probably the fastest latency card sold for a >>>> $675 or so a piece. >>>> >>>> I do not really see a rival latency wise to Dolphin there. I bet >>>> most manufacturers selling clusters don't use it as they can make >>>> $100 more profit or so selling other networking stuff, and >>>> universities usually swallow that. >>>> >>>> So price total dominates the network. As it seems now infiniband >>>> 4x is not going to offer enough performance. >>>> The one-way pingpong latencies over a switch that i see of it, are >>>> not very convincing. I see remote writes to RAM are like nearly >>>> 10 microseconds for 4x infiniband and that card is the only one >>>> affordable. >>>> >>>> The old QM400's i have here are one-way pingpong 2.1 us or so, and >>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>> needs >>>> pci-x), >>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>> switch for the QM500's though nor cables. >>>> >>>> You see price really dominates everything here. Small cheap nodes >>>> you cannot build if the port price, thanks to expensive network >>>> card, more than doubles. >>>> >>>> Power is not the real concern for now - if a factory already burns >>>> a couple of hundreds of megawatts, a small cluster somewhere on >>>> the attick eating a few kilowatts is not really a problem:) >>>> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 18:25:56 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 00:25:56 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> Message-ID: <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Yeah well i'm no expert there what pci-x adds versus pci-e. I'm on a budget here :) I just test things and go for the fastest. But if we do theoretic math, SHMEM is difficult to beat of course. Google for measurements with shmem, not many out there. Fact that so few standardized/rewrote their floating point software to gpu's, is already saying enough about all the legacy codes in HPC world :) When some years ago i had a working 2 cluster node here with QM500- A , it had at 32 bits , 33Mhz pci long sleeve slots a blocked read latency of under 3 us is what i saw on my screen. 
Sure i had no switch in between it. Direct connection between the 2 elan4's. I'm not sure what pci-x adds to it when clocked at 133Mhz, but it won't be a big diff with pci-e. PCI-e probably only has a bigger bandwidth isn't it? Beating such hardware 2nd hand is difficult. $30 on ebay and i can install 4 rails or so. Didn't find the cables yet though... So i don't see how to outdo that with old infiniband cards which are $130 and upwards for the connectx, say $150 soon, which would allow only single rail or maybe at best 2 rails. So far didn't hear anyone yet who has more than single rail IB. Is it possible to install 2 rails with IB? So if i use your number in pessimistic manner, which means that there is some overhead of pci-x, then the connectx type IB, can do 1 million blocked reads per second theoretic with 2 rails. Which is $300 or so, cables not counted. Quadrics QM500 is around 2 million blocked reads per second for 4 rails @ $120 , cables not counted. Copper cables which have a cost of around 100 ns each 10 meters, if i use 1/3 of lightspeed for electrons in copper, those costs also are kept low with short cables. On Nov 7, 2011, at 11:07 PM, Gilad Shainer wrote: > RDMA read is a round trip operation and it is measured from host > memory to host memory. I doubt if Quadrics had half of it for round > trip operations measured from host memory to host memory. The PCI-X > memory to card was around 0.7 by itself (one way).... > > Gilad > > > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Monday, November 07, 2011 12:33 PM > To: Greg Keller > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > hi Greg, > > Very useful info! I already was wondering about the different > timings i see for infiniband, but indeed it's the ConnectX that > scores better in latency. > > $289 on ebay but that's directly QDR then. > > "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for > Dell PowerEdge M1000e-Series Blade Servers" > > This 1.91 microseconds for a RDMA read is for a connectx. Not bad > for Infiniband. > Only 50% slower in latency than quadrics which is pci-x of course. > > Yet now needed is a cheap price for 'em :) > > It seems indeed all the 'cheap' offers are the infinihost III DDR > versions. > > Regards, > Vincent > > On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> >>> Date: Mon, 07 Nov 2011 13:16:00 -0500 >>> From: Prentice Bisbal >>> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >>> Cc: Beowulf Mailing List >>> Message-ID:<4EB82060.3050300 at ias.edu> >>> Content-Type: text/plain; charset=ISO-8859-1 >>> >>> Vincent, >>> >>> Don't forget that between SDR and QDR, there is DDR. If SDR is too >>> slow, and QDR is too expensive, DDR might be just right. >> And for DDR a key thing is, when latency matters, "ConnectX" DDR is >> much better than the earlier "Infinihost III" DDR cards. We have >> 100's of each and the ConnectX make a large impact for some codes. >> Although nearly antique now, we actually have plans for the ConnectX >> cards in yet another round of updated systems. This is the 3rd >> Generation system I have been able to re-use the cards in (Harperton, >> Nehalem, and now Single Socket Sandy Bridge), which makes me very >> happy. A great investment that will likely live until PCI-Gen3 slots >> are the norm. >> -- >> Da Bears?! 
>> >>> -- >>> Goldilocks >>> >>> >>> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>>> hi Prentice, >>>>> >>>>> I had noticed the diff between SDR up to QDR, the SDR cards are >>>>> affordable, the QDR isn't. >>>>> >>>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>>> cheap prices in that pricerange yet. >>>>> >>>>> If i would want to build a network that's low latency and had a >>>>> budget of $800 or so a node of course i would build a dolphin >>>>> SCI >>>>> network, as that's probably the fastest latency card sold for a >>>>> $675 or so a piece. >>>>> >>>>> I do not really see a rival latency wise to Dolphin there. I bet >>>>> most manufacturers selling clusters don't use it as they can >>>>> make >>>>> $100 more profit or so selling other networking stuff, and >>>>> universities usually swallow that. >>>>> >>>>> So price total dominates the network. As it seems now infiniband >>>>> 4x is not going to offer enough performance. >>>>> The one-way pingpong latencies over a switch that i see of it, >>>>> are >>>>> not very convincing. I see remote writes to RAM are like nearly >>>>> 10 microseconds for 4x infiniband and that card is the only one >>>>> affordable. >>>>> >>>>> The old QM400's i have here are one-way pingpong 2.1 us or so, >>>>> and >>>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>>> needs >>>>> pci-x), >>>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>>> switch for the QM500's though nor cables. >>>>> >>>>> You see price really dominates everything here. Small cheap nodes >>>>> you cannot build if the port price, thanks to expensive network >>>>> card, more than doubles. >>>>> >>>>> Power is not the real concern for now - if a factory already >>>>> burns >>>>> a couple of hundreds of megawatts, a small cluster somewhere on >>>>> the attick eating a few kilowatts is not really a problem:) >>>>> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing To change your subscription (digest mode or unsubscribe) >> visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jhh3851 at yahoo.com Mon Nov 7 18:44:41 2011 From: jhh3851 at yahoo.com (Joseph Han) Date: Mon, 7 Nov 2011 15:44:41 -0800 (PST) Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: Message-ID: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> To further complicate issue, if latency is the key driving factor for older hardware, I think that the chips with the Infinipath/Pathscale lineage tend to have lower latencies than the Mellanox Inifinihost line. ? When in the DDR time frame, I measured Infinipath ping-pong latencies 3-4x better than that of DDR Mellanox silicon. ?Of course, the Infinipath silicon will require different kernel drivers than those from Mellanox (ipath versus mthca). 
?These were QLogic specific HCA's and not the rebranded Silverstorm HCA's sold by QLogic. ?(Confused yet?) ?I believe that the model number was QLogic 7240 for the DDR version and QLogic 7140 for the SDR one. Joseph Message: 2 Date: Mon, 07 Nov 2011 14:21:51 -0600 From: Greg Keller Subject: Re: [Beowulf] building Infiniband 4x cluster questions To: beowulf at beowulf.org Message-ID: <4EB83DDF.5020902 at Keller.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed > Date: Mon, 07 Nov 2011 13:16:00 -0500 > From: Prentice Bisbal > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > Cc: Beowulf Mailing List > Message-ID:<4EB82060.3050300 at ias.edu> > Content-Type: text/plain; charset=ISO-8859-1 > > Vincent, > > Don't forget that between SDR and QDR, there is DDR.? If SDR is too > slow, and QDR is too expensive, DDR might be just right. And for DDR a key thing is, when latency matters, "ConnectX" DDR is much better than the earlier "Infinihost III" DDR cards.? We have 100's of each and the ConnectX make a large impact for some codes.? Although nearly antique now, we actually have plans for the ConnectX cards in yet another round of updated systems.? This is the 3rd Generation system I have been able to re-use the cards in (Harperton, Nehalem, and now Single Socket Sandy Bridge), which makes me very happy.? A great investment that will likely live until PCI-Gen3 slots are the norm. -- Da Bears?! > -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> >? hi Prentice, >> > >> >? I had noticed the diff between SDR up to QDR, >> >? the SDR cards are affordable, the QDR isn't. >> > >> >? The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> >? prices in that pricerange yet. >> > >> >? If i would want to build a network that's low latency and had a budget >> >? of $800 or so a node of course i would >> >? build a dolphin SCI network, as that's probably the fastest latency >> >? card sold for a $675 or so a piece. >> > >> >? I do not really see a rival latency wise to Dolphin there. I bet most >> >? manufacturers selling clusters don't use >> >? it as they can make $100 more profit or so selling other networking >> >? stuff, and universities usually swallow that. >> > >> >? So price total dominates the network. As it seems now infiniband 4x is >> >? not going to offer enough performance. >> >? The one-way pingpong latencies over a switch that i see of it, are not >> >? very convincing. I see remote writes to RAM >> >? are like nearly 10 microseconds for 4x infiniband and that card is the >> >? only one affordable. >> > >> >? The old QM400's i have here are one-way pingpong 2.1 us or so, and >> >? QM500-B's are plentyful on the net (of course big disadvantage: needs >> >? pci-x), >> >? which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> >? switch for the QM500's though nor cables. >> > >> >? You see price really dominates everything here. Small cheap nodes you >> >? cannot build if the port price, thanks to expensive network card, >> >? more than doubles. >> > >> >? Power is not the real concern for now - if a factory already burns a >> >? couple of hundreds of megawatts, a small cluster somewhere on the >> >? attick eating >> >? a few kilowatts is not really a problem:) >> > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Mon Nov 7 18:57:45 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 00:57:45 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> Message-ID: <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> On Nov 8, 2011, at 12:44 AM, Joseph Han wrote: > To further complicate issue, if latency is the key driving factor > for older hardware, I think that the chips with the Infinipath/ > Pathscale lineage tend to have lower latencies than the Mellanox > Inifinihost line. > > When in the DDR time frame, I measured Infinipath ping-pong > latencies 3-4x better than that of DDR Mellanox silicon. Of > course, the Infinipath silicon will require different kernel > drivers than those from Mellanox (ipath versus mthca). These were > QLogic specific HCA's and not the rebranded Silverstorm HCA's sold > by QLogic. (Confused yet?) I believe that the model number was > QLogic 7240 for the DDR version and QLogic 7140 for the SDR one. > > Joseph > Claim of manufactuer is 1.2 us one-way pingpong for QLE7240. Of course to get to that number possibly they would've needed to use their grandmother analogue stopwatch, but even 1.2 us ain't bad :) 95 dollar on ebay. Anyone having even better news? Vincent > > > Message: 2 > Date: Mon, 07 Nov 2011 14:21:51 -0600 > From: Greg Keller > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > To: beowulf at beowulf.org > Message-ID: <4EB83DDF.5020902 at Keller.net> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > Date: Mon, 07 Nov 2011 13:16:00 -0500 > > From: Prentice Bisbal > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > Cc: Beowulf Mailing List > > Message-ID:<4EB82060.3050300 at ias.edu> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Vincent, > > > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > > slow, and QDR is too expensive, DDR might be just right. > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much > better than the earlier "Infinihost III" DDR cards. We have 100's of > each and the ConnectX make a large impact for some codes. Although > nearly antique now, we actually have plans for the ConnectX cards > in yet > another round of updated systems. This is the 3rd Generation system I > have been able to re-use the cards in (Harperton, Nehalem, and now > Single Socket Sandy Bridge), which makes me very happy. A great > investment that will likely live until PCI-Gen3 slots are the norm. > -- > Da Bears?! > > > -- > > Goldilocks > > > > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > >> > hi Prentice, > >> > > >> > I had noticed the diff between SDR up to QDR, > >> > the SDR cards are affordable, the QDR isn't. > >> > > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't > find cheap > >> > prices in that pricerange yet. 
> >> > > >> > If i would want to build a network that's low latency and had > a budget > >> > of $800 or so a node of course i would > >> > build a dolphin SCI network, as that's probably the fastest > latency > >> > card sold for a $675 or so a piece. > >> > > >> > I do not really see a rival latency wise to Dolphin there. I > bet most > >> > manufacturers selling clusters don't use > >> > it as they can make $100 more profit or so selling other > networking > >> > stuff, and universities usually swallow that. > >> > > >> > So price total dominates the network. As it seems now > infiniband 4x is > >> > not going to offer enough performance. > >> > The one-way pingpong latencies over a switch that i see of > it, are not > >> > very convincing. I see remote writes to RAM > >> > are like nearly 10 microseconds for 4x infiniband and that > card is the > >> > only one affordable. > >> > > >> > The old QM400's i have here are one-way pingpong 2.1 us or > so, and > >> > QM500-B's are plentyful on the net (of course big > disadvantage: needs > >> > pci-x), > >> > which are a 1.3 us or so there and have SHMEM. Not seeing a > cheap > >> > switch for the QM500's though nor cables. > >> > > >> > You see price really dominates everything here. Small cheap > nodes you > >> > cannot build if the port price, thanks to expensive network > card, > >> > more than doubles. > >> > > >> > Power is not the real concern for now - if a factory already > burns a > >> > couple of hundreds of megawatts, a small cluster somewhere on > the > >> > attick eating > >> > a few kilowatts is not really a problem:) > >> > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 20:46:38 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Tue, 8 Nov 2011 01:46:38 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> References: <4EB83DDF.5020902@Keller.net> <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Message-ID: > I just test things and go for the fastest. But if we do theoretic math, SHMEM > is difficult to beat of course. > Google for measurements with shmem, not many out there. SHMEM within the node or between nodes? > Fact that so few standardized/rewrote their floating point software to gpu's, > is already saying enough about all the legacy codes in HPC world :) > > When some years ago i had a working 2 cluster node here with QM500- A , it > had at 32 bits , 33Mhz pci long sleeve slots a blocked read latency of under 3 > us is what i saw on my screen. Sure i had no switch in between it. Direct > connection between the 2 elan4's. > > I'm not sure what pci-x adds to it when clocked at 133Mhz, but it won't be a > big diff with pci-e. There is a big different between PCIX and PCIe. PCIe is half the latency - from 0.7 to 0.3 more or less. > PCI-e probably only has a bigger bandwidth isn't it? Also bandwidth ...:-) > Beating such hardware 2nd hand is difficult. 
$30 on ebay and i can install 4 > rails or so. > Didn't find the cables yet though... > > So i don't see how to outdo that with old infiniband cards which are > $130 and upwards for the connectx, say $150 soon, which would allow only > single rail > or maybe at best 2 rails. So far didn't hear anyone yet who has more than > single rail IB. > > Is it possible to install 2 rails with IB? Yes, you can do dual rails > So if i use your number in pessimistic manner, which means that there is > some overhead of pci-x, then the connectx type IB, can do 1 million blocked > reads per second theoretic with 2 rails. Which is $300 or so, cables not > counted. Are you referring to RDMA reads? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 20:53:55 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Tue, 8 Nov 2011 01:53:55 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> Message-ID: The latency numbers are more or less the same between the IB vendors on SDR, DDR and QDR. Mellanox is the only vendor with FDR IB for now, and with PCIe 3.0 latency are below 1us (RDMA much below...). Question is what you are going to use the system for - which apps. Gilad > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Monday, November 07, 2011 3:58 PM > To: jhh3851 at yahoo.com > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > > On Nov 8, 2011, at 12:44 AM, Joseph Han wrote: > > > To further complicate issue, if latency is the key driving factor for > > older hardware, I think that the chips with the Infinipath/ Pathscale > > lineage tend to have lower latencies than the Mellanox Inifinihost > > line. > > > > When in the DDR time frame, I measured Infinipath ping-pong latencies > > 3-4x better than that of DDR Mellanox silicon. Of course, the > > Infinipath silicon will require different kernel drivers than those > > from Mellanox (ipath versus mthca). These were QLogic specific HCA's > > and not the rebranded Silverstorm HCA's sold by QLogic. (Confused > > yet?) I believe that the model number was QLogic 7240 for the DDR > > version and QLogic 7140 for the SDR one. > > > > Joseph > > > > Claim of manufactuer is 1.2 us one-way pingpong for QLE7240. Of course to > get to that number possibly they would've needed to use their grandmother > analogue stopwatch, but even 1.2 us ain't bad :) > > 95 dollar on ebay. > > Anyone having even better news? 
> > Vincent > > > > > > > Message: 2 > > Date: Mon, 07 Nov 2011 14:21:51 -0600 > > From: Greg Keller > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > To: beowulf at beowulf.org > > Message-ID: <4EB83DDF.5020902 at Keller.net> > > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > > > > Date: Mon, 07 Nov 2011 13:16:00 -0500 > > > From: Prentice Bisbal > > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > > Cc: Beowulf Mailing List > > > Message-ID:<4EB82060.3050300 at ias.edu> > > > Content-Type: text/plain; charset=ISO-8859-1 > > > > > > Vincent, > > > > > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > > > slow, and QDR is too expensive, DDR might be just right. > > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > > much better than the earlier "Infinihost III" DDR cards. We have > > 100's of each and the ConnectX make a large impact for some codes. > > Although nearly antique now, we actually have plans for the ConnectX > > cards in yet another round of updated systems. This is the 3rd > > Generation system I have been able to re-use the cards in (Harperton, > > Nehalem, and now Single Socket Sandy Bridge), which makes me very > > happy. A great investment that will likely live until PCI-Gen3 slots > > are the norm. > > -- > > Da Bears?! > > > > > -- > > > Goldilocks > > > > > > > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > > >> > hi Prentice, > > >> > > > >> > I had noticed the diff between SDR up to QDR, the SDR cards are > > >> > affordable, the QDR isn't. > > >> > > > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't > > find cheap > > >> > prices in that pricerange yet. > > >> > > > >> > If i would want to build a network that's low latency and had > > a budget > > >> > of $800 or so a node of course i would build a dolphin SCI > > >> > network, as that's probably the fastest > > latency > > >> > card sold for a $675 or so a piece. > > >> > > > >> > I do not really see a rival latency wise to Dolphin there. I > > bet most > > >> > manufacturers selling clusters don't use it as they can make > > >> > $100 more profit or so selling other > > networking > > >> > stuff, and universities usually swallow that. > > >> > > > >> > So price total dominates the network. As it seems now > > infiniband 4x is > > >> > not going to offer enough performance. > > >> > The one-way pingpong latencies over a switch that i see of > > it, are not > > >> > very convincing. I see remote writes to RAM are like nearly 10 > > >> > microseconds for 4x infiniband and that > > card is the > > >> > only one affordable. > > >> > > > >> > The old QM400's i have here are one-way pingpong 2.1 us or > > so, and > > >> > QM500-B's are plentyful on the net (of course big > > disadvantage: needs > > >> > pci-x), > > >> > which are a 1.3 us or so there and have SHMEM. Not seeing a > > cheap > > >> > switch for the QM500's though nor cables. > > >> > > > >> > You see price really dominates everything here. Small cheap > > nodes you > > >> > cannot build if the port price, thanks to expensive network > > card, > > >> > more than doubles. 
> > >> > > > >> > Power is not the real concern for now - if a factory already > > burns a > > >> > couple of hundreds of megawatts, a small cluster somewhere on > > the > > >> > attick eating > > >> > a few kilowatts is not really a problem:) > > >> > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing To change your subscription (digest mode or unsubscribe) > > visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 21:33:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 03:33:46 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Message-ID: <857E3421-8EB5-4260-81BF-4AADEFF706C2@xs4all.nl> On Nov 8, 2011, at 2:46 AM, Gilad Shainer wrote: >> I just test things and go for the fastest. But if we do theoretic >> math, SHMEM >> is difficult to beat of course. >> Google for measurements with shmem, not many out there. > > SHMEM within the node or between nodes? shmem is the programming library that cray had and that quadrics had. so basically your program doesn't need silly message catching mpi commands everywhere. You only define at program start whether an array is getting tracked by elan4 and which nodes it gets updated to etc. So no need to check for MPI overfows for the complex code of starting / stopping cpu's. Can reuse code there easily to start remote nodes and cpu's. So where the majority of the latency is needed for RDMA reads and/or reads from remote elan memory, the tough yet in overhead neglectible complicated code to start/stop cpu's, is a bit easier to program with SHMEM library. the caches on the quadrics cards have shmem so you don't access the RAM at all, it's already in the cards. didn't check whether those features got added to mpi somehow. so you just need to read the card - it's not gonna go through pci-x at all at the remote node. Yet of course all this is not so relevant to explain here - as quadrics is long gone, and i just search for a cheapo solution :) So you lose only 2x the pci-x latency, versus 4x pci-e latency in such case. In case of a RDMA read i doubt latency of DDR infiniband is faster than quadrics. that 0.7 you mentionned if it is microseconds sounds like a bit overestimated latency for pci-x. From the 1.3 us that the MPI-one-way pingpong is at QM500, if we multiply it by 2 it's 2.6 us. From that 2.6 us, according to your math it's already 2.8 us cost to pci-x, then , which has a cost of 2x pci-x, receiving elan has a cost of 130 ns, switch say 300 ns including cables for a 128 port router, 100 ns from the sending elan. that's 530 ns, and that times 2 is 1060 ns. There's really little left for the pci-x. as 2.6 - 1.06 = 1.44 us left for 4 times pci-x. 1.44 / 4 = 0.36 us for pci-x. 
I used the Los Alamos National Laboratory example numbers here for elan4. In the end it is about price, not user friendliness of programming :) > > >> Fact that so few standardized/rewrote their floating point >> software to gpu's, >> is already saying enough about all the legacy codes in HPC world :) >> >> When some years ago i had a working 2 cluster node here with >> QM500- A , it >> had at 32 bits , 33Mhz pci long sleeve slots a blocked read >> latency of under 3 >> us is what i saw on my screen. Sure i had no switch in between it. >> Direct >> connection between the 2 elan4's. >> >> I'm not sure what pci-x adds to it when clocked at 133Mhz, but it >> won't be a >> big diff with pci-e. > > There is a big different between PCIX and PCIe. PCIe is half the > latency - from 0.7 to 0.3 more or less. > Well i'm not so sure the difference is that huge. All those measurements in past was at oldie Xeon P4 machines, and i've never really seen a good comparision there. Furthermore fabrics like Dolphin at the time with a 66Mhz, 64 bits PCI card already got like 1.36 us one-way pingpong latencies, not exactly a lot slower than DDR infinibands qlogics of a claimed 1.2 us. >> PCI-e probably only has a bigger bandwidth isn't it? > > Also bandwidth ...:-) That's a non discussion here. I need latency :) If i'd really need big bandwidth for transport i'd use of course a boat - 90% of all cargo here gets transported over the rivers and hand dug canal; especially river Rhine. > >> Beating such hardware 2nd hand is difficult. $30 on ebay and i can >> install 4 >> rails or so. >> Didn't find the cables yet though... >> >> So i don't see how to outdo that with old infiniband cards which are >> $130 and upwards for the connectx, say $150 soon, which would >> allow only >> single rail >> or maybe at best 2 rails. So far didn't hear anyone yet who has >> more than >> single rail IB. >> >> Is it possible to install 2 rails with IB? > > Yes, you can do dual rails very well > >> So if i use your number in pessimistic manner, which means that >> there is >> some overhead of pci-x, then the connectx type IB, can do 1 >> million blocked >> reads per second theoretic with 2 rails. Which is $300 or so, >> cables not >> counted. > > Are you referring to RDMA reads? > As i use all cpu cores 100%, i simply cannot catch mpi messages, let alone overflow. So anything that has the cards processor do the job of digging inthe RAM rather than bug one of the very busy cores, is very welcome form of communication. 99.9% of all communication to remote nodes is 32 byte RDMA wites and 128-256 byte reads. I can set myself whether it's 128, 192 or 256. Probably i'll make it 128. The number of reads is a few percent more than writes. That other 0.01% is the very complex parallel algorithm that basically parallellizes a sequential algorithm. That algorithm is a 150 pages of a4 roughly full of insights and proofs why it works correct :) > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
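The latency budget in the post above can be re-run mechanically, keeping exactly the component figures quoted there (the LANL elan4 numbers plus the guessed 300 ns switch hop). Note the subtraction actually comes out at 1.54 us rather than 1.44, i.e. about 0.39 us per PCI-X crossing instead of 0.36; the figures remain illustrative rather than measured:

#include <stdio.h>

/* Re-derivation of the blocked-read latency budget from the post above.
 * Component figures are the ones quoted there (LANL elan4 numbers plus a
 * guessed switch hop); they are illustrative, not measured here. */
int main(void)
{
    double one_way_us = 1.3;                 /* QM500 MPI one-way pingpong */
    double round_trip = 2.0 * one_way_us;    /* a blocked read is a round trip */

    double nic_rx = 0.130;                   /* receiving elan */
    double nic_tx = 0.100;                   /* sending elan */
    double sw     = 0.300;                   /* switch + cables, 128-port router */
    double fixed  = 2.0 * (nic_rx + nic_tx + sw);

    double bus    = round_trip - fixed;      /* what is left for the host bus */
    printf("NICs + switch, both directions: %.2f us\n", fixed);
    printf("left for 4 PCI-X crossings:     %.2f us  (%.2f us each)\n",
           bus, bus / 4.0);
    return 0;
}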
From diep at xs4all.nl Tue Nov 8 05:24:07 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 11:24:07 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB7EF12.8000403@science-computing.de> Message-ID: On Nov 7, 2011, at 6:16 PM, Gilad Shainer wrote: >> They use the term "message oriented" with the description that the IB >> hardware takes care of segmentation and so forth, so that the >> application >> just says "send this" or "receive this" and the gory details are >> concealed. Then he distinguishes that from a TCP/IP stack, etc., >> where >> the software does a lot of this, with the implication that the >> user has to be >> involved in that. >> >> But it seems to me that the same processes are going on.. You have >> a big >> message, it needs to be broken up, etc. >> And for *most users* all that is hidden underneath the hood of, >> say, MPI. >> (obviously, if you are a message passing software writer, the >> distinction is >> important). > > You can also post large message to the IB interface (up to 2GB I > believe) and the IB transport will break it to the network MTU. > > Gilad > Please note that searching only requires massive amounts of short data requests, say 128 bytes and massive stores of 32 bytes. So latency of the network cards and how fast the cards can switch from proces to proces, those latencies play a far more important role than all those single core latencies that everyone always posts. Some cards when switching from helping 1 proces to another can have a penalty of dozens of microseconds; you never hear from all those hidden penalties as the few online tests done by all those academics are always single core tests with the rest of the cluster idle. Interesting is to hear experiences there, but reality is that you hardly ever hear that. You have to gamble what to buy. What i do know is that the MPI programming model is not very attractive; you have to catch all those messages shipped somewhere, check for overflow and so on. Yet i'm on a budget here, so price dominates everything. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Nov 8 16:03:27 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 8 Nov 2011 16:03:27 -0500 Subject: [Beowulf] HP redstone servers In-Reply-To: <4EB0A678.9060602@cse.ucdavis.edu> References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: ARM is an interesting platform that offers better performance/power ratio than x64 processors. I don't think ARM will eat into HPC shares of AMD/Intel/IBM POWER or enter the TOP500 list any time soon. However, I am expecting to see ARM in high throughput environments in the near future. Thus, we are announcing that the next version of Grid Engine released by the Grid Scheduler open project will support ARM Linux. We tested SGE on an ARMv7 box. 
As the SGE code is 64-bit clean, when 64-bit ARM processors come out in the next year or two, our version should/will compile & work out of the box. Rayson ================================= Grid Engine / Open Grid Scheduler http://gridscheduler.sourceforge.net On Tue, Nov 1, 2011 at 10:10 PM, Bill Broadley wrote: > The best summary I've found: > http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ > > Specifications at for the ECX-1000: > http://www.calxeda.com/products/energycore/ecx1000/techspecs > > And EnergyCard: > http://www.calxeda.com/products/energycards/techspecs > > The only hint on price that I found was from theregister.co.uk: > ?The sales pitch for the Redstone systems, says Santeler, is that a > ?half rack of Redstone machines and their external switches > ?implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, > ?and costs $1.2m. > > So it sounds like for 6 watts and $750 you get a quad core 1.4 GHz arm > 10G connected node. > > Comments? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Rayson ================================================== Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Nov 8 20:01:47 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 9 Nov 2011 02:01:47 +0100 Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: hi Rayson, Most interesting stuff. The question i ask myself. Why is it so expensive? If i do a silly compare, just looking to the Ghz. Then a quad core 1.4Ghz is similar to a single core i7 @ 1.5Ghz roughly for Diep. I rounded up optimistically the IPC of diep at a single ARM core to 0.5 (if you realize a bulldozer core gets like 0.73, you'll realize the problem of this optimistic guess, whereas an i7 core is over 1.73+ ). Diep being in principle a 32 bits integer program, just 64 bits compiled for a bigger caching range (hashtable) of course profits perfectly from ARM. You won't find much software that can run better on such ARM cpu's than a chessprogram. So 1600 nodes then is like 800 cores 3Ghz i7. Or a 100 socket machine i7 @ 8 cores a CPU, or a 128 socket machine i7 @ 6 cores a CPU. The 6 core Xeons actually are a tad higher clocked than 3Ghz, but let's forget about that now. Now getting that with a good network might not be so cheap, but so to speak there is a budget of far over 1.2 million / 128 = $9375 per socket. So that 's a 64 node switch and 64 nodes dual socket Xeon. That gives a budget of $18750 a node. Pretty easy to build i'd say so. Now performance a watt. Of course something ARM is good at. With 64 nodes that means 9900 watt / 64 = 154 watt per node. We can be sure that the Xeon burn more than that. Yet it's not much more than factor 2 off and everywhere so far i rounded off optimistically for the ARM. I took 3Ghz cpu's, in reality they're higher clocked. I took 6 cores, in reality they're soon 8 cores a node. 
I took an IPC of 0.5 for the arm cores, and we must still see they will get that IPC, most likely they won't. So it's nearly on par if we do a real accurate calculation. It's not like there is much of a margin in power consumption versus optimized i7 code. This factor 2 evaporates practical. Who would anyone be interested in buying this at this huge price with as far as i can see 0 advantages. On Nov 8, 2011, at 10:03 PM, Rayson Ho wrote: > ARM is an interesting platform that offers better performance/power > ratio than x64 processors. I don't think ARM will eat into HPC shares > of AMD/Intel/IBM POWER or enter the TOP500 list any time soon. > However, I am expecting to see ARM in high throughput environments in > the near future. Thus, we are announcing that the next version of Grid > Engine released by the Grid Scheduler open project will support ARM > Linux. > > We tested SGE on an ARMv7 box. As the SGE code is 64-bit clean, when > 64-bit ARM processors come out in the next year or two, our version > should/will compile & work out of the box. > > Rayson > > ================================= > Grid Engine / Open Grid Scheduler > http://gridscheduler.sourceforge.net > > > > On Tue, Nov 1, 2011 at 10:10 PM, Bill Broadley > wrote: >> The best summary I've found: >> http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ >> >> Specifications at for the ECX-1000: >> http://www.calxeda.com/products/energycore/ecx1000/techspecs >> >> And EnergyCard: >> http://www.calxeda.com/products/energycards/techspecs >> >> The only hint on price that I found was from theregister.co.uk: >> The sales pitch for the Redstone systems, says Santeler, is that a >> half rack of Redstone machines and their external switches >> implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, >> and costs $1.2m. >> >> So it sounds like for 6 watts and $750 you get a quad core 1.4 GHz >> arm >> 10G connected node. >> >> Comments? >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > > -- > Rayson > > ================================================== > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Thu Nov 10 12:04:44 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 10 Nov 2011 12:04:44 -0500 (EST) Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: > Who would anyone be interested in buying this at this huge price with > as far as i can see 0 advantages. it's for memcached. it's not for chess. I'm not sure how HPC-friendly the current rev is, in that it has fairly limited-seeming memory bandwidth. The most interesting aspect is the onchip networking (5-degree 10Gb). 
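The per-node and per-socket figures traded in this sub-thread are easy to sanity-check. A minimal sketch in Python, using only the numbers quoted from The Register piece; the Xeon equivalence (1,600 quad-core ARM nodes roughly matching 64 dual-socket six-core Xeon nodes) is Vincent's assumption, not a measurement:

    nodes     = 1600      # Calxeda ARM server nodes in the half rack
    price_usd = 1.2e6     # quoted price, including external switches
    power_w   = 9.9e3     # quoted power draw

    print("per ARM node: $%.0f, %.1f W" % (price_usd / nodes, power_w / nodes))
    # -> roughly $750 and ~6 W per node, as in Bill's summary

    xeon_sockets = 128    # Vincent's rough equivalent of 800 i7 cores at 3 GHz
    xeon_nodes   = 64     # i.e. 64 dual-socket six-core nodes
    print("budget per Xeon socket: $%.0f" % (price_usd / xeon_sockets))
    print("budget per Xeon node:   $%.0f" % (price_usd / xeon_nodes))
    print("power budget per node:  %.0f W" % (power_w / xeon_nodes))
    # -> ~$9375 per socket, ~$18750 and ~155 W per node, matching the post above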
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Thu Nov 10 17:02:27 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 10 Nov 2011 17:02:27 -0500 (EST) Subject: [Beowulf] SC 2011 BeoBash Reminder Message-ID: <45024.192.168.93.213.1320962547.squirrel@mail.eadline.org> In case you missed the previous announcement http://www.xandmarketing.com/beobash11/ -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Thu Nov 10 20:23:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 11 Nov 2011 02:23:46 +0100 Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: <602F6A42-E8E6-4AA0-9C0F-5372C651A680@xs4all.nl> On Nov 10, 2011, at 6:04 PM, Mark Hahn wrote: >> Who would anyone be interested in buying this at this huge price with >> as far as i can see 0 advantages. > > it's for memcached. it's not for chess. > > I'm not sure how HPC-friendly the current rev is, in that it has > fairly > limited-seeming memory bandwidth. The most interesting aspect > is the onchip networking (5-degree 10Gb). It's eating too much power and too expensive for telecom networking i'd guess. They always speak about power in telecom, but all examples i have there is that price dominates everything there. Power constraints soon get dropped when eating more power is cheaper. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From tegner at renget.se Fri Nov 11 02:10:59 2011 From: tegner at renget.se (Jon Tegner) Date: Fri, 11 Nov 2011 08:10:59 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <4EBCCA83.4010408@renget.se> >>> DRIVERS: >>> Drivers for cards now. Are those all open source, or does it require >>> payment? Is the source released of all those cards drivers, and do >>> they integrate into linux? >> You should get everything you need from the Linux kernel and / or OFED. > > You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows > > I'm using Qlogic drivers, it works well, but has the drawback that I'm limited to the kernel required for those drivers (which since I'm using CentOS means that I can only use CentOS-5.5). 
Would there be any disadvantages involved in instead use the stuff from the kernel/OFED directly? Regards, /jon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Nov 11 09:18:37 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 11 Nov 2011 09:18:37 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EBCCA83.4010408@renget.se> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <4EBCCA83.4010408@renget.se> Message-ID: <4EBD2EBD.1030901@ias.edu> On 11/11/2011 02:10 AM, Jon Tegner wrote: >>>> DRIVERS: >>>> Drivers for cards now. Are those all open source, or does it require >>>> payment? Is the source released of all those cards drivers, and do >>>> they integrate into linux? >>> You should get everything you need from the Linux kernel and / or OFED. >> You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows >> >> > I'm using Qlogic drivers, it works well, but has the drawback that I'm > limited to the kernel required for those drivers (which since I'm using > CentOS means that I can only use CentOS-5.5). > > Would there be any disadvantages involved in instead use the stuff from > the kernel/OFED directly? > Jon, The Mellanox OFED drivers come as an .iso file. Inside that ISO image is a script that will rebuild all the mellanox packages for newer/different kernels. It's a couple of extra steps (mount iso image, extract script, run it, yada, yada, yada), but it works very well. But that's Mellanox, and your concerned about QLogic. I don't know how the QLogic drivers are bundled, but look around the files provided to see if there's a utility script that does the same thing for QLogic, or at least instructions on how to recompile against different kernel versions. Since OFED is open source, QLogic should provide the source code to their drivers. The only disadvantage with using stuff directly from the kernel/OFED is that if you have newer cards with new features, the software to support those new features may not have trickled down into the official OFED distro or kernel, and then into your Linux distro of choice. For example, my current cluster was installed in the fall of 2008, using RHEL 5. The software/drivers in RHEL 5 worked just fine. Last year. I added a couple of new nodes with GPUs that had newer Mellanox HBAs. They wouldn't work with RHEL 5. I needed the OFED software provided from the Mellanox site. I was hoping Mellanox's additions in that OFED distro made it into RHEL 6, but I just upgraded my cluster nodes and still needed to download the Mellanox OFED package. I'm sure by RHEL 7 or 8, those HBAs will be supported directly by the distro. This is a problem of multiple lags. The vendor makes changes to the kernel/OFED software to support their latest technology, and then you have your first lag as the vendor tries to get the changes merged into the Linux kernel and the offiial OFED distro. Then there's a second lag as those changes get merged into the different Linux distros. For a conservative, stability-as-first-priority distro like RHEL, that second lag can be really long. 
:( -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Sun Nov 13 21:40:52 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Mon, 14 Nov 2011 13:40:52 +1100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> Message-ID: <4EC07FB4.2020902@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gilad, On 08/11/11 12:53, Gilad Shainer wrote: > The latency numbers are more or less the same between > the IB vendors on SDR, DDR and QDR. Mellanox is the > only vendor with FDR IB for now, and with PCIe 3.0 > latency are below 1us (RDMA much below...). Is that for an MPI message ? I'd heard that FDR might have higher latencies due to the coding changes that were happening - is this not the case ? cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7Af7MACgkQO2KABBYQAh8HyQCdG3AUK1k6QmyRd7SueQLp3MHZ 1wsAn3VxYqclLdQcqBv5yxd4LLvhsiLv =Hf0h -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Sun Nov 13 23:52:03 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 14 Nov 2011 04:52:03 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EC07FB4.2020902@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> <4EC07FB4.2020902@unimelb.edu.au> Message-ID: There is some add latency due to the 66/64 new encoding, but overall latency is lower than QDR. MPI is below 1us. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Christopher Samuel Sent: Sunday, November 13, 2011 6:42 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gilad, On 08/11/11 12:53, Gilad Shainer wrote: > The latency numbers are more or less the same between the IB vendors > on SDR, DDR and QDR. Mellanox is the only vendor with FDR IB for now, > and with PCIe 3.0 latency are below 1us (RDMA much below...). Is that for an MPI message ? I'd heard that FDR might have higher latencies due to the coding changes that were happening - is this not the case ? 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7Af7MACgkQO2KABBYQAh8HyQCdG3AUK1k6QmyRd7SueQLp3MHZ 1wsAn3VxYqclLdQcqBv5yxd4LLvhsiLv =Hf0h -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From Shainer at Mellanox.com Sun Nov 13 23:52:03 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 14 Nov 2011 04:52:03 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EC07FB4.2020902@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> <4EC07FB4.2020902@unimelb.edu.au> Message-ID:

There is some added latency due to the new 64b/66b encoding, but overall latency is lower than QDR. MPI latency is below 1us. Gilad

-----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Christopher Samuel Sent: Sunday, November 13, 2011 6:42 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gilad, On 08/11/11 12:53, Gilad Shainer wrote: > The latency numbers are more or less the same between the IB vendors > on SDR, DDR and QDR. Mellanox is the only vendor with FDR IB for now, > and with PCIe 3.0 latency are below 1us (RDMA much below...). Is that for an MPI message ? I'd heard that FDR might have higher latencies due to the coding changes that were happening - is this not the case ?
From Shainer at Mellanox.com Mon Nov 14 03:30:35 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 14 Nov 2011 08:30:35 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <201111141925.09072.samuel@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <4EC07FB4.2020902@unimelb.edu.au> <201111141925.09072.samuel@unimelb.edu.au> Message-ID: FDR adapters are lower latency than the QDR. So you will gain some with QDR, and the most with FDR. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Chris Samuel Sent: Monday, November 14, 2011 12:26 AM To: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions Hi Gilad, On Mon, 14 Nov 2011 03:52:03 PM Gilad Shainer wrote: > There is some add latency due to the 66/64 new encoding, but overall > latency is lower than QDR. MPI is below 1us. Thanks for that. So I'd guess a pair of nodes with QDR cards in the same slot would have lower latency again ? cheers! Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 14 03:38:08 2011 From: samuel at unimelb.edu.au (Chris Samuel) Date: Mon, 14 Nov 2011 19:38:08 +1100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <20111114052333.GB31084@bx9.net> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <4EC07FB4.2020902@unimelb.edu.au> <20111114052333.GB31084@bx9.net> Message-ID: <201111141938.08874.samuel@unimelb.edu.au> Hi Greg, On Mon, 14 Nov 2011 04:23:33 PM Greg Lindahl wrote: > When you're asking about MPI latency, are you interested in a > 2-node cluster, or a big one? I'm interested in the differencies in latency between QDR and FDR - whether that be node to node or in a large cluster. cheers! Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Mon Nov 14 14:41:44 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Mon, 14 Nov 2011 14:41:44 -0500 Subject: [Beowulf] Announcing Grid Engine 2011.11 and SC11 demo & presentation Message-ID: The Open Grid Scheduler Project is releasing a new major release: Grid Engine 2011.11. 
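The "usual MPI latency benchmark" Greg refers to is just a two-rank ping-pong; osu_latency from the OSU micro-benchmarks and IMB PingPong are the standard implementations. A minimal sketch of the same thing, assuming mpi4py and numpy are available, with one rank on each of two nodes:

    # e.g.  mpirun -np 2 -npernode 1 python pingpong.py   (launcher flags vary)
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    nbytes = 8                        # small-message latency
    iters  = 10000
    buf    = np.zeros(nbytes, dtype='b')

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=7)
            comm.Recv([buf, MPI.BYTE], source=1, tag=7)
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=7)
            comm.Send([buf, MPI.BYTE], dest=0, tag=7)
    t1 = MPI.Wtime()

    if rank == 0:
        print("%d-byte half-round-trip latency: %.2f us"
              % (nbytes, (t1 - t0) / iters / 2 * 1e6))

As Greg says, this number is taken with everything else idle; running the same loop with all cores on each node busy, or with many communicating pairs at once, usually tells a rather different story.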
We are using the open source model that was used by Sun Grid Engine (2001 - 2009), and we offer open source (not open core) Grid Engine with *optional* technical support. We have a growing Grid Engine Community and joined by companies and system integrators in the Grid Engine ecosystem. New features ============ * Berkeley DB Spooling Directory can now be placed on NFS (NFSv3 or older, basically any modern network filesystems) * Portable Hardware Locality Library (was under alpha/beta since April 2011) * CUDA GPU load sensor - uses NVIDIA Management Library (NVML) * User notification mails can be sent from a configurable user ID * Job exit status available to epilog - via $SGE_JOBEXIT_STAT * ARM Linux support - ARMv7. * qmake upgraded to version 3.82 * Support for Linux 3.0 * Perfstat library used on AIX, retiring ibm-loadsensor * Support for newer AIX versions * Tango qmon icons * Code quality improvements - static code verifier used for testing PQS API code cleaned up and we are releasing the PQS API Scheduler Plugin Interface as technology preview. SC11 Demo ========= Gridcore/Gompute is kind enough to offer part of their 20x20 booth for the Grid Engine 2011.11 demo. I am going to remotely give a presentation on the new features of Grid Engine 2011.11, and also the future of open source Grid Engine. The proposed time slots are: 12:00 and 15:00 on Tuesday and Wedneday. Please sign up for the presentation at the Gridcore/Gompute booth (#6002). Download links ============== Download the latest release from sourceforge: http://gridscheduler.sourceforge.net/ Download release notes from the Scalable Logic homepage: http://www.scalablelogic.com/ We are offering pre-compiled binaries. The temp. download site (before we can upload everything to Sourceforge) for x64 Linux is: http://dl.dropbox.com/u/47200624/ge2011.11-x64.tar.gz Rayson ================================================== Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Nov 15 04:25:23 2011 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 15 Nov 2011 10:25:23 +0100 Subject: [Beowulf] Nvidia's ARM chips power supercomputer Message-ID: <20111115092523.GK31847@leitl.org> http://news.cnet.com/8301-13924_3-57323948-64/nvidias-arm-chips-power-supercomputer/ Nvidia's ARM chips power supercomputer by Brooke Crothers November 14, 2011 6:00 AM PST Follow @mbrookec Barcelona Supercomputing Center. Barcelona Supercomputing Center is located in a former chapel. (Credit: Barcelona Supercomputing Center.) Nvidia's Tegra chips will for the first time power a supercomputer--more evidence that ARM is movin' on up into Intel territory. The chipmaker said today the Barcelona Supercomputing Center is developing a new hybrid supercomputer that, for the first time, combines energy-efficient Nvidia Tegra CPUs (central processing units), based on the ARM chip architecture, with Nvidia's graphics processing units (GPUs). The supercomputing center plans to develop a system that is two to five times more energy-efficient compared with today's efficient high-performance computing systems. Most of today's supercomputers use Intel processors. 
"In most current systems, CPUs alone consume the lion's share of the energy, often 40 percent or more," Alex Ramirez, leader of the Mont-Blanc Project at the Barcelona Supercomputing Center, said in a statement. "By comparison, the Mont-Blanc architecture will rely on energy-efficient compute accelerators and ARM processors...to achieve a 4- to 10-times increase in energy-efficiency by 2014." A development kit will feature a quad-core Nvidia Tegra 3 CPU accelerated by a discrete Nvidia GPU. It is expected to be available in the first half of 2012. Nvidia also announced today that the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign is deploying a Cray supercomputer accelerated by Nvidia's Tesla GPUs. That's part of the Blue Waters project to build one of the world's most powerful computer systems. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Wed Nov 16 04:52:38 2011 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 16 Nov 2011 10:52:38 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores Message-ID: <20111116095238.GX31847@leitl.org> http://seattletimes.nwsource.com/html/technologybrierdudleysblog/2016775145_wow_intel_unveils_1_teraflop_c.html Wow: Intel unveils 1 teraflop chip with 50-plus cores Posted by Brier Dudley I thought the prospect of quad-core tablet computers was exciting. Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 cores, that Intel unveiled today, running it on a test machine at the SC11 supercomputing conference in Seattle. That means my kids may take a teraflop laptop to college -- if their grades don't suffer too much having access to 50-core video game consoles. It wasn't that long ago that Intel was boasting about the first supercomputer with sustained 1 teraflop performance. That was in 1997, on a system with 9,298 Pentium II chips that filled 72 computing cabinets. Now Intel has squeezed that much performance onto a matchbook-sized chip, dubbed "Knights Ferry," based on its new "Many Integrated Core" architecture, or MIC. It was designed largely in the Portland area and has just started manufacturing. "In 15 years that's what we've been able to do. That is stupendous. You're witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general manager of Intel's technical computing group, said at an unveiling ceremony. (He holds up the chip here) A single teraflop is capable of a trillion floating point operations per second. On hand for the event -- in the cellar of the Ruth's Chris Steak House in Seattle -- were the directors of the National Center for Computational Sciences at Oak Ridge Laboratory and the Application Acceleration Center of Excellence. Also speaking was the chief science officer of the GENCI supercomputing organization in France, which has used its Intel-based system for molecular simulations of Alzheimer's, looking at issues such as plaque formation that's a hallmark of the disease. "The hardware is hardly exciting. ... The exciting part is doing the science," said Jeff Nichols, acting director of the computational center at Oak Ridge. The hardware was pretty cool, though. 
George Chrysos, the chief architect of Knights Ferry, came up from the Portland area with a test system running the new chip, which was connected to a speed meter on a laptop to show that it was running around 1 teraflop. Intel had the test system set up behind closed doors -- on a coffee table in a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take pictures of the setup. Nor would the company specify how many cores the chip has -- just more than 50 -- or its power requirement. If you're building a new system and want to future-proof it, the Knights Ferry chip uses a double PCI Express slot. Chrysos said the systems are also likely to run alongside a few Xeon processors. This means that Intel could be producing teraflop chips for personal computers within a few years, although there's lots of work to be done on the software side before you'd want one. Another question is whether you'd want a processor that powerful on a laptop, for instance, where you may prefer to have a system optimized for longer battery life, Hazra said. More important, Knights Ferry chips may help engineers build the next generation of supercomputing systems, which Intel and its partners hope to delivery by 2018. Power efficiency was a highlight of another big announcement this week at SC11. On Monday night, IBM announced its "next generation supercomputing project," the Blue Gene/Q system that's heading to Lawrence Livermore National Laboratory next year. Dubbed Sequoia, the system should run at 20 petaflops peak performance. IBM expects it to be the world's most power-efficient computer, processing 2 gigaflops per watt. The first 96 racks of the system could be delivered in December. The Department of Energy's National Nuclear Security Administration uses the systems to work on nuclear weapons, energy reseach and climate change, among other things. Sequoia complements another Blue Gene/Q system, a 10-petaflop setup called "Mira," which was previously announced by Argonne National Laboratory. A few images from the conference, which runs through Friday at the Washington State Convention & Trade Center, starting with perusal of Intel boards: Take home a Cray today! IBM was sporting Blue Genes, and it wasn't even casual Friday: A 94 teraflop rack: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Nov 16 06:04:50 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 16 Nov 2011 12:04:50 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116095238.GX31847@leitl.org> References: <20111116095238.GX31847@leitl.org> Message-ID: <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> Well, If it's gonna use 2 pci-express slots, for sure it's eating massive power, just like the gpu's. Furthermore the word 'double precision' is nowhere there, so we can safely assume single precision. Speaking of which - isn't nvidia and amd already delivering cards that deliver a lot? AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in openCL. 
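Since the argument here keeps coming back to price per flop and flops per watt, the comparison is quick to do as plain arithmetic. A rough sketch: the 500 euro price and the "5+ Tflop" single-precision figure are the ones quoted above, while the ~375 W board power and the DP rate being a quarter of SP are assumptions for this class of card, not numbers from the thread:

    hd6990_eur   = 500.0
    hd6990_sp_gf = 5000.0               # "5+ Tflop" single precision, peak
    hd6990_dp_gf = hd6990_sp_gf / 4     # assumed DP:SP ratio of 1:4
    hd6990_w     = 375.0                # assumed board power

    print("HD6990: %.2f EUR per SP GFLOP, %.2f EUR per DP GFLOP (peak)"
          % (hd6990_eur / hd6990_sp_gf, hd6990_eur / hd6990_dp_gf))
    print("HD6990: %.1f peak DP GFLOP/s per watt" % (hd6990_dp_gf / hd6990_w))
    # ~0.10 and ~0.40 EUR per GFLOP, and ~3.3 GFLOP/s per watt; compare the
    # ~2 GFLOP/s per watt quoted for Blue Gene/Q in the article above,
    # which is a whole-system figure rather than a bare card.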
Knowing intel is not delivering hardware dirt cheap - despite hammering the bulldozer, bulldozer so far is cheaper than any competative intel chip - though might change a few months from now when the 22nm parts are there. For crunching get gpu's - as for intel - i hope they release cheap sixcore cpu's and don't overprice 8 core Xeon... On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: > > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > 2016775145_wow_intel_unveils_1_teraflop_c.html > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > Posted by Brier Dudley > > I thought the prospect of quad-core tablet computers was exciting. > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > cores, that > Intel unveiled today, running it on a test machine at the SC11 > supercomputing > conference in Seattle. > > That means my kids may take a teraflop laptop to college -- if > their grades > don't suffer too much having access to 50-core video game consoles. > > It wasn't that long ago that Intel was boasting about the first > supercomputer > with sustained 1 teraflop performance. That was in 1997, on a > system with > 9,298 Pentium II chips that filled 72 computing cabinets. > > Now Intel has squeezed that much performance onto a matchbook-sized > chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" > architecture, > or MIC. > > It was designed largely in the Portland area and has just started > manufacturing. > > "In 15 years that's what we've been able to do. That is stupendous. > You're > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > manager of > Intel's technical computing group, said at an unveiling ceremony. > (He holds > up the chip here) > > A single teraflop is capable of a trillion floating point > operations per > second. > > On hand for the event -- in the cellar of the Ruth's Chris Steak > House in > Seattle -- were the directors of the National Center for Computational > Sciences at Oak Ridge Laboratory and the Application Acceleration > Center of > Excellence. > > Also speaking was the chief science officer of the GENCI > supercomputing > organization in France, which has used its Intel-based system for > molecular > simulations of Alzheimer's, looking at issues such as plaque > formation that's > a hallmark of the disease. > > "The hardware is hardly exciting. ... The exciting part is doing the > science," said Jeff Nichols, acting director of the computational > center at > Oak Ridge. > > The hardware was pretty cool, though. > > George Chrysos, the chief architect of Knights Ferry, came up from the > Portland area with a test system running the new chip, which was > connected to > a speed meter on a laptop to show that it was running around 1 > teraflop. > > Intel had the test system set up behind closed doors -- on a coffee > table in > a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take > pictures of the setup. > > Nor would the company specify how many cores the chip has -- just > more than > 50 -- or its power requirement. > > If you're building a new system and want to future-proof it, the > Knights > Ferry chip uses a double PCI Express slot. Chrysos said the systems > are also > likely to run alongside a few Xeon processors. > > This means that Intel could be producing teraflop chips for personal > computers within a few years, although there's lots of work to be > done on the > software side before you'd want one. 
> > Another question is whether you'd want a processor that powerful on > a laptop, > for instance, where you may prefer to have a system optimized for > longer > battery life, Hazra said. > > More important, Knights Ferry chips may help engineers build the next > generation of supercomputing systems, which Intel and its partners > hope to > delivery by 2018. > > Power efficiency was a highlight of another big announcement this > week at > SC11. On Monday night, IBM announced its "next generation > supercomputing > project," the Blue Gene/Q system that's heading to Lawrence Livermore > National Laboratory next year. > > Dubbed Sequoia, the system should run at 20 petaflops peak > performance. IBM > expects it to be the world's most power-efficient computer, > processing 2 > gigaflops per watt. > > The first 96 racks of the system could be delivered in December. The > Department of Energy's National Nuclear Security Administration > uses the > systems to work on nuclear weapons, energy reseach and climate > change, among > other things. > > Sequoia complements another Blue Gene/Q system, a 10-petaflop setup > called > "Mira," which was previously announced by Argonne National Laboratory. > > A few images from the conference, which runs through Friday at the > Washington > State Convention & Trade Center, starting with perusal of Intel > boards: > > > Take home a Cray today! > > IBM was sporting Blue Genes, and it wasn't even casual Friday: > > A 94 teraflop rack: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Wed Nov 16 06:27:47 2011 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 16 Nov 2011 12:27:47 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> References: <20111116095238.GX31847@leitl.org> <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> Message-ID: <20111116112747.GF31847@leitl.org> On Wed, Nov 16, 2011 at 12:04:50PM +0100, Vincent Diepeveen wrote: > Well, > > If it's gonna use 2 pci-express slots, for sure it's eating massive > power, just like the gpu's. It's not too bad for an 1997 Top500 equivalent (well, at least as far as matrix multiplication is concerned). > Furthermore the word 'double precision' is nowhere there, so we can > safely assume single precision. It's double precision. > Speaking of which - isn't nvidia and amd already delivering cards > that deliver a lot? Kepler is supposed to get 1.3 TFlops in DGEMM when it's out. Intel touts that Knights Corner produces 1 TFlop consistently indedepent of matrix (block) size. The vector unit is 512 bits, Knights Landing will boost that to 124 bits, supposedly. Source: http://www.heise.de/newsticker/meldung/Supercomputer-2011-CPU-mit-Many-Integrated-Cores-knackt-1-TFlops-1379625.html > AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in > openCL. 
> > Knowing intel is not delivering hardware dirt cheap - despite > hammering the bulldozer, bulldozer > so far is cheaper than any competative intel chip - though might > change a few months from now when the 22nm > parts are there. Parts like these will be useful for gamer markets, so presumably nVidia or AMD will be only too happy to leap into any gap that Intel offers. > For crunching get gpu's - as for intel - i hope they release cheap > sixcore cpu's and don't overprice 8 core Xeon... > > On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: > > > > > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > > 2016775145_wow_intel_unveils_1_teraflop_c.html > > > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > > > Posted by Brier Dudley > > > > I thought the prospect of quad-core tablet computers was exciting. > > > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > > cores, that > > Intel unveiled today, running it on a test machine at the SC11 > > supercomputing > > conference in Seattle. > > > > That means my kids may take a teraflop laptop to college -- if > > their grades > > don't suffer too much having access to 50-core video game consoles. > > > > It wasn't that long ago that Intel was boasting about the first > > supercomputer > > with sustained 1 teraflop performance. That was in 1997, on a > > system with > > 9,298 Pentium II chips that filled 72 computing cabinets. > > > > Now Intel has squeezed that much performance onto a matchbook-sized > > chip, > > dubbed "Knights Ferry," based on its new "Many Integrated Core" > > architecture, > > or MIC. > > > > It was designed largely in the Portland area and has just started > > manufacturing. > > > > "In 15 years that's what we've been able to do. That is stupendous. > > You're > > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > > manager of > > Intel's technical computing group, said at an unveiling ceremony. > > (He holds > > up the chip here) > > > > A single teraflop is capable of a trillion floating point > > operations per > > second. > > > > On hand for the event -- in the cellar of the Ruth's Chris Steak > > House in > > Seattle -- were the directors of the National Center for Computational > > Sciences at Oak Ridge Laboratory and the Application Acceleration > > Center of > > Excellence. > > > > Also speaking was the chief science officer of the GENCI > > supercomputing > > organization in France, which has used its Intel-based system for > > molecular > > simulations of Alzheimer's, looking at issues such as plaque > > formation that's > > a hallmark of the disease. > > > > "The hardware is hardly exciting. ... The exciting part is doing the > > science," said Jeff Nichols, acting director of the computational > > center at > > Oak Ridge. > > > > The hardware was pretty cool, though. > > > > George Chrysos, the chief architect of Knights Ferry, came up from the > > Portland area with a test system running the new chip, which was > > connected to > > a speed meter on a laptop to show that it was running around 1 > > teraflop. > > > > Intel had the test system set up behind closed doors -- on a coffee > > table in > > a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take > > pictures of the setup. > > > > Nor would the company specify how many cores the chip has -- just > > more than > > 50 -- or its power requirement. > > > > If you're building a new system and want to future-proof it, the > > Knights > > Ferry chip uses a double PCI Express slot. 
Chrysos said the systems > > are also > > likely to run alongside a few Xeon processors. > > > > This means that Intel could be producing teraflop chips for personal > > computers within a few years, although there's lots of work to be > > done on the > > software side before you'd want one. > > > > Another question is whether you'd want a processor that powerful on > > a laptop, > > for instance, where you may prefer to have a system optimized for > > longer > > battery life, Hazra said. > > > > More important, Knights Ferry chips may help engineers build the next > > generation of supercomputing systems, which Intel and its partners > > hope to > > delivery by 2018. > > > > Power efficiency was a highlight of another big announcement this > > week at > > SC11. On Monday night, IBM announced its "next generation > > supercomputing > > project," the Blue Gene/Q system that's heading to Lawrence Livermore > > National Laboratory next year. > > > > Dubbed Sequoia, the system should run at 20 petaflops peak > > performance. IBM > > expects it to be the world's most power-efficient computer, > > processing 2 > > gigaflops per watt. > > > > The first 96 racks of the system could be delivered in December. The > > Department of Energy's National Nuclear Security Administration > > uses the > > systems to work on nuclear weapons, energy reseach and climate > > change, among > > other things. > > > > Sequoia complements another Blue Gene/Q system, a 10-petaflop setup > > called > > "Mira," which was previously announced by Argonne National Laboratory. > > > > A few images from the conference, which runs through Friday at the > > Washington > > State Convention & Trade Center, starting with perusal of Intel > > boards: > > > > > > Take home a Cray today! > > > > IBM was sporting Blue Genes, and it wasn't even casual Friday: > > > > A 94 teraflop rack: > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
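For anyone wondering where a "1 TFLOP/s in DGEMM" number comes from: time a large double-precision matrix multiply and divide the 2*n^3 flops by the wall time. A minimal sketch, assuming numpy is linked against a decent BLAS, plus the peak arithmetic for a 50-core part with 512-bit vectors (the clock is a guess, since Intel has not disclosed it):

    import time
    import numpy as np

    n = 4096
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a.dot(b)                      # DGEMM under the hood
    dt = time.perf_counter() - t0
    print("measured: %.1f GFLOP/s" % (2.0 * n**3 / dt / 1e9))

    # Theoretical peak, counting a fused multiply-add as two flops, as
    # vendors do: cores * clock * DP lanes per vector * 2.
    cores, ghz, dp_lanes = 50, 1.2, 512 // 64   # the 1.2 GHz is an assumption
    print("peak estimate: %.0f GFLOP/s" % (cores * ghz * dp_lanes * 2))

With 50-odd cores, 8 DP lanes per vector and FMA counted as two flops, that lands right around 1 TFLOP/s, which is presumably how the headline number is built.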
From diep at xs4all.nl Wed Nov 16 07:43:31 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 16 Nov 2011 13:43:31 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116112747.GF31847@leitl.org> References: <20111116095238.GX31847@leitl.org> <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> <20111116112747.GF31847@leitl.org> Message-ID: <4E638224-7632-4E84-8C08-0C454F58AFD5@xs4all.nl> On Nov 16, 2011, at 12:27 PM, Eugen Leitl wrote: > On Wed, Nov 16, 2011 at 12:04:50PM +0100, Vincent Diepeveen wrote: >> Well, >> >> If it's gonna use 2 pci-express slots, for sure it's eating massive >> power, just like the gpu's. > > It's not too bad for an 1997 Top500 equivalent (well, at least > as far as matrix multiplication is concerned). > >> Furthermore the word 'double precision' is nowhere there, so we can >> safely assume single precision. > > It's double precision. And probably like AMD and Nvidia creative counting a multiply-add as 2 flops. > >> Speaking of which - isn't nvidia and amd already delivering cards >> that deliver a lot? > > Kepler is supposed to get 1.3 TFlops in DGEMM when it's out. > Intel touts that Knights Corner produces 1 TFlop consistently > indedepent of matrix (block) size. > > The vector unit is 512 bits, Knights Landing will boost > that to 124 bits, supposedly. > You mean vectors of 8 double precisions i assume. That renders the chip less generic than GPU's are. > Source: http://www.heise.de/newsticker/meldung/Supercomputer-2011- > CPU-mit-Many-Integrated-Cores-knackt-1-TFlops-1379625.html > >> AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in >> openCL. >> >> Knowing intel is not delivering hardware dirt cheap - despite >> hammering the bulldozer, bulldozer >> so far is cheaper than any competative intel chip - though might >> change a few months from now when the 22nm >> parts are there. > > Parts like these will be useful for gamer markets, so > presumably nVidia or AMD will be only too happy to leap > into any gap that Intel offers. It's the gamers market that keeps those GPU's cheap. GPU's that are custom made for HPC always will lose it in the end. First generation might be strong and after that it will be simply not worth it price versus performance. Those production factories for now are too expensive to run, to produce something that gets sold to just a few HPC organisations. A few small improvements in GPU's always will win it at the HPC market from custom made units. So if intel is gonna sell this Larrabee derivative, in general it's a good plan if you invest big cash in a product, to also sell it, you run the likely risk that after the first release, it will be behind next generation GPU's, and a lot. Intel has superior cpu's, based upon producing the cpu's at the latest technology they have available. We see how AMD is behind now, because they produce end 2011 at 32 nm whereas intel already did sell their 32 nm sixcores since march 2010. The GPU's are something like 40 nm now. For the gamersmarket the next gpu will be at the latest and fastest process they can affort. A special HPC part from intel that's for a specific market will not be able to use the latest process technology in the long run; so it will be total crushed by GPU's. Just total annihilated it will be. I'd not sign any deal with intel regarding such cpu's without paper signature of them that it will release within 6 months in the same proces technology like their latest CPU. 
Furthermore the gpu's will beat this by factor 10 in price or so. Add to that, this AMD 6990, though it has 2 gpu's, it's 500 euro and 1.37 Tflop double precision right now. How's something gonna compete with that what years later achieves this? It's about price per gflop. In future the big calculations will simply need to get redone 2 times, just like with prime numbers they get double checked. I see too much gibberish in the scientific calculations done at supercomputers. 99% of those projects in HPC is not even amateur level from algorithmic calculation viewpoint seen. The software of 99% of those projects doesn't even have remotely the same outcome when you rerun the same code, that's how crappy most of it is. We didn't even touch the word 'efficiency' yet. If i compare some commercial codes that do something similar to the software calculating the height of seawater levels... ...which from climate viewpoint by the way is pretty important as hundreds of billions get invested/changes hands based upon such calculations, then the word efficiency gets another dimension. It's wishful thinking that in future the fastest hardware is going to be 100% deterministic. The true nature of parallellism already makes everything non-deterministic. You can forever ignore GPU's, just like some ignored CPU's from intel&amd for a year or 10, but it's not very healthy. Especially if current generation is going to have already a factor 10 + advantage in price/performance to this chip. > >> For crunching get gpu's - as for intel - i hope they release cheap >> sixcore cpu's and don't overprice 8 core Xeon... >> >> On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: >> >>> >>> http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ >>> 2016775145_wow_intel_unveils_1_teraflop_c.html >>> >>> Wow: Intel unveils 1 teraflop chip with 50-plus cores >>> >>> Posted by Brier Dudley >>> >>> I thought the prospect of quad-core tablet computers was exciting. >>> >>> Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 >>> cores, that >>> Intel unveiled today, running it on a test machine at the SC11 >>> supercomputing >>> conference in Seattle. >>> >>> That means my kids may take a teraflop laptop to college -- if >>> their grades >>> don't suffer too much having access to 50-core video game consoles. >>> >>> It wasn't that long ago that Intel was boasting about the first >>> supercomputer >>> with sustained 1 teraflop performance. That was in 1997, on a >>> system with >>> 9,298 Pentium II chips that filled 72 computing cabinets. >>> >>> Now Intel has squeezed that much performance onto a matchbook-sized >>> chip, >>> dubbed "Knights Ferry," based on its new "Many Integrated Core" >>> architecture, >>> or MIC. >>> >>> It was designed largely in the Portland area and has just started >>> manufacturing. >>> >>> "In 15 years that's what we've been able to do. That is stupendous. >>> You're >>> witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general >>> manager of >>> Intel's technical computing group, said at an unveiling ceremony. >>> (He holds >>> up the chip here) >>> >>> A single teraflop is capable of a trillion floating point >>> operations per >>> second. >>> >>> On hand for the event -- in the cellar of the Ruth's Chris Steak >>> House in >>> Seattle -- were the directors of the National Center for >>> Computational >>> Sciences at Oak Ridge Laboratory and the Application Acceleration >>> Center of >>> Excellence. 
>>> >>> Also speaking was the chief science officer of the GENCI >>> supercomputing >>> organization in France, which has used its Intel-based system for >>> molecular >>> simulations of Alzheimer's, looking at issues such as plaque >>> formation that's >>> a hallmark of the disease. >>> >>> "The hardware is hardly exciting. ... The exciting part is doing the >>> science," said Jeff Nichols, acting director of the computational >>> center at >>> Oak Ridge. >>> >>> The hardware was pretty cool, though. >>> >>> George Chrysos, the chief architect of Knights Ferry, came up >>> from the >>> Portland area with a test system running the new chip, which was >>> connected to >>> a speed meter on a laptop to show that it was running around 1 >>> teraflop. >>> >>> Intel had the test system set up behind closed doors -- on a coffee >>> table in >>> a hotel suite at the Grand Hyatt, and wouldn't allow reporters to >>> take >>> pictures of the setup. >>> >>> Nor would the company specify how many cores the chip has -- just >>> more than >>> 50 -- or its power requirement. >>> >>> If you're building a new system and want to future-proof it, the >>> Knights >>> Ferry chip uses a double PCI Express slot. Chrysos said the systems >>> are also >>> likely to run alongside a few Xeon processors. >>> >>> This means that Intel could be producing teraflop chips for personal >>> computers within a few years, although there's lots of work to be >>> done on the >>> software side before you'd want one. >>> >>> Another question is whether you'd want a processor that powerful on >>> a laptop, >>> for instance, where you may prefer to have a system optimized for >>> longer >>> battery life, Hazra said. >>> >>> More important, Knights Ferry chips may help engineers build the >>> next >>> generation of supercomputing systems, which Intel and its partners >>> hope to >>> delivery by 2018. >>> >>> Power efficiency was a highlight of another big announcement this >>> week at >>> SC11. On Monday night, IBM announced its "next generation >>> supercomputing >>> project," the Blue Gene/Q system that's heading to Lawrence >>> Livermore >>> National Laboratory next year. >>> >>> Dubbed Sequoia, the system should run at 20 petaflops peak >>> performance. IBM >>> expects it to be the world's most power-efficient computer, >>> processing 2 >>> gigaflops per watt. >>> >>> The first 96 racks of the system could be delivered in December. The >>> Department of Energy's National Nuclear Security Administration >>> uses the >>> systems to work on nuclear weapons, energy reseach and climate >>> change, among >>> other things. >>> >>> Sequoia complements another Blue Gene/Q system, a 10-petaflop setup >>> called >>> "Mira," which was previously announced by Argonne National >>> Laboratory. >>> >>> A few images from the conference, which runs through Friday at the >>> Washington >>> State Convention & Trade Center, starting with perusal of Intel >>> boards: >>> >>> >>> Take home a Cray today! 
>>> >>> IBM was sporting Blue Genes, and it wasn't even casual Friday: >>> >>> A 94 teraflop rack: >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > -- > Eugen* Leitl leitl http://leitl.org > ______________________________________________________________ > ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From michf at post.tau.ac.il Wed Nov 16 07:49:13 2011 From: michf at post.tau.ac.il (Micha) Date: Wed, 16 Nov 2011 14:49:13 +0200 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116095238.GX31847@leitl.org> References: <20111116095238.GX31847@leitl.org> Message-ID: <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com> They are just busting the one teraflop but they are going with it into the GPU market, only without a GPU, i.e. they're competing with the Tesla GPU here. The Tesla admittedly is also about 1 TFlops but the consumer market has already gone past the 2 TFlop mark about a year ago and the next generation is just around the corner (will be operational before the mic). And the funny part is that its a discrete (over pci) card that is running a software micro-kernel ands scheduler that you can ssh into. I'm not sure how much I buy into the hype their selling that it's the next best thing because its x86 so you run the same code, although aparantly its not binary compatible, so you do need to recompile. And I think we all know that real world codes need a rework to transfer well to different vector sizes and communication/synchronization/etc. So why is it so much better than picking up an AMD or NVIDIA? Eugen Leitl wrote: http://seattletimes.nwsource.com/html/technologybrierdudleysblog/2016775145_wow_intel_unveils_1_teraflop_c.html Wow: Intel unveils 1 teraflop chip with 50-plus cores Posted by Brier Dudley I thought the prospect of quad-core tablet computers was exciting. Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 cores, that Intel unveiled today, running it on a test machine at the SC11 supercomputing conference in Seattle. That means my kids may take a teraflop laptop to college -- if their grades don't suffer too much having access to 50-core video game consoles. It wasn't that long ago that Intel was boasting about the first supercomputer with sustained 1 teraflop performance. That was in 1997, on a system with 9,298 Pentium II chips that filled 72 computing cabinets. 
Now Intel has squeezed that much performance onto a matchbook-sized chip, dubbed "Knights Ferry," based on its new "Many Integrated Core" architecture, or MIC. It was designed largely in the Portland area and has just started manufacturing. "In 15 years that's what we've been able to do. That is stupendous. You're witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general manager of Intel's technical computing group, said at an unveiling ceremony. (He holds up the chip here) A single teraflop is capable of a trillion floating point operations per second. On hand for the event -- in the cellar of the Ruth's Chris Steak House in Seattle -- were the directors of the National Center for Computational Sciences at Oak Ridge Laboratory and the Application Acceleration Center of Excellence. Also speaking was the chief science officer of the GENCI supercomputing organization in France, which has used its Intel-based system for molecular simulations of Alzheimer's, looking at issues such as plaque formation that's a hallmark of the disease. "The hardware is hardly exciting. ... The exciting part is doing the science," said Jeff Nichols, acting director of the computational center at Oak Ridge. The hardware was pretty cool, though. George Chrysos, the chief architect of Knights Ferry, came up from the Portland area with a test system running the new chip, which was connected to a speed meter on a laptop to show that it was running around 1 teraflop. Intel had the test system set up behind closed doors -- on a coffee table in a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take pictures of the setup. Nor would the company specify how many cores the chip has -- just more than 50 -- or its power requirement. If you're building a new system and want to future-proof it, the Knights Ferry chip uses a double PCI Express slot. Chrysos said the systems are also likely to run alongside a few Xeon processors. This means that Intel could be producing teraflop chips for personal computers within a few years, although there's lots of work to be done on the software side before you'd want one. Another question is whether you'd want a processor that powerful on a laptop, for instance, where you may prefer to have a system optimized for longer battery life, Hazra said. More important, Knights Ferry chips may help engineers build the next generation of supercomputing systems, which Intel and its partners hope to delivery by 2018. Power efficiency was a highlight of another big announcement this week at SC11. On Monday night, IBM announced its "next generation supercomputing project," the Blue Gene/Q system that's heading to Lawrence Livermore National Laboratory next year. Dubbed Sequoia, the system should run at 20 petaflops peak performance. IBM expects it to be the world's most power-efficient computer, processing 2 gigaflops per watt. The first 96 racks of the system could be delivered in December. The Department of Energy's National Nuclear Security Administration uses the systems to work on nuclear weapons, energy reseach and climate change, among other things. Sequoia complements another Blue Gene/Q system, a 10-petaflop setup called "Mira," which was previously announced by Argonne National Laboratory. A few images from the conference, which runs through Friday at the Washington State Convention & Trade Center, starting with perusal of Intel boards: Take home a Cray today! 
IBM was sporting Blue Genes, and it wasn't even casual Friday: A 94 teraflop rack: _____________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Wed Nov 16 12:44:43 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 16 Nov 2011 18:44:43 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com> References: <20111116095238.GX31847@leitl.org> <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com> Message-ID: <9E1E2790-30D0-4BF6-A5E1-530FA55EE616@xs4all.nl> Well look everyone here is looking to the part of the machine that is delivering the 'big punch' which either is the Tesla's or the AMD 6990. However we shouldn't forget that in its basis each node is a 2 node intel Xeon machine having 2 intel Xeon cpu's and requires a very fast network. The weakness of the network is not only the network, but gets especially determined by the quality of those 2 cpu's, as they have to feed the GPU and more importantly also the network. Furthermore a part of the software is uncapable of running at GPU's and has to run on the cpu. That said the big punch being a Tesla, it's obvious that this can't be so high clocked like the gamerscards, as it focuses more upon reliability. We see recently that the bandwidth transfers one can achieve from CPU to GPU have tremendeously improved in bandwidth. From 2 GB/s to many gigabytes per second now and approaching also a big part of the total bandwidth the ram CAN deliver. Suppose we do a big multiplication of some giant prime number using a safe form of FFT (of course there is specialized forms here that are faster, but for readability i call it and not DWT). Now we can see the FFT as something that in O ( log n ) steps is doing a number of things. Only in the last few phases of the O ( log n ) we actually need communication between all the nodes. Basically there is nothing that prevents us from doing a double check of the results at a different GPU, meanwhile we are busy with the finalizing steps, as the majority of the GPU's basically idle anyway at that point. So if we would calculate just a single number, we can rather efficiently do a double check. of our GPU calculations, as the crunching power of those things is much above anything else that it's always ahead of any other step. Only if you already run other independant calculations at the same time, you can keep those GPU's busy. However if you'd run independant calculations, where do you need that massive huge expensive cluster for, as you could also give each machine its own number and just sit and wait until they all finished with it. So that's an embarrassingly parallel approach where basically it's a JBOM, "just a bunch of machines". 
In order words if we take advantage of the cluster as a whole to speedup the calculation, then the crucial reliability part of the calculation gets done by the CPU's, not by the GPU; it would be easy to give the GPU double checking time of results previously calculated and a simple comparision which happens while we are already some steps further, would occur. With GPU's you simply do have the system time to double check and you MUST double check; there is no reason to not buy GPU's with millions of sustainability demands, as the reaosn why they're so fast also is the reason why it's cheap and that's also the reason why you need to double check. So cheap kick butt GPU's is the way to go for now. On Nov 16, 2011, at 1:49 PM, Micha wrote: > They are just busting the one teraflop but they are going with it > into the GPU market, only without a GPU, i.e. they're competing > with the Tesla GPU here. The Tesla admittedly is also about 1 > TFlops but the consumer market has already gone past the 2 TFlop > mark about a year ago and the next generation is just around the > corner (will be operational before the mic). And the funny part is > that its a discrete (over pci) card that is running a software > micro-kernel ands scheduler that you can ssh into. > I'm not sure how much I buy into the hype their selling that it's > the next best thing because its x86 so you run the same code, > although aparantly its not binary compatible, so you do need to > recompile. And I think we all know that real world codes need a > rework to transfer well to different vector sizes and communication/ > synchronization/etc. So why is it so much better than picking up an > AMD or NVIDIA? > > Eugen Leitl wrote: > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > 2016775145_wow_intel_unveils_1_teraflop_c.html > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > Posted by Brier Dudley > > I thought the prospect of quad-core tablet computers was exciting. > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > cores, that > Intel unveiled today, running it on a test machine at the SC11 > supercomputing > conference in Seattle. > > That means my kids may take a teraflop laptop to college -- if > their grades > don't suffer too much having access to 50-core video game consoles. > > It wasn't that long ago that Intel was boasting about the first > supercomputer > with sustained 1 teraflop performa nce. That was in 1997, on a > system with > 9,298 Pentium II chips that filled 72 computing cabinets. > > Now Intel has squeezed that much performance onto a matchbook-sized > chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" > architecture, > or MIC. > > It was designed largely in the Portland area and has just started > manufacturing. > > "In 15 years that's what we've been able to do. That is stupendous. > You're > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > manager of > Intel's technical computing group, said at an unveiling ceremony. > (He holds > up the chip here) > > A single teraflop is capable of a trillion floating point > operations per > second. > > On hand for the event -- in the cellar of the Ruth's Chris Steak > House in > Seattle -- were the directors of the National Center for Computational > Sciences at Oak Ridge Laboratory and the Application Acceleration > Center of > Excellence. 
> > Also speaking was the chief science officer of the GENCI > supercomputing > organization in France, which has used its Intel-based system for > molecular > simulations of Alzheimer's, looking at issues such as plaque > formation that's > a hallmark of the disease. > > "The hardware is hardly exciting. ... The exciting part is doing the > science," said Jeff Nichols, acting director of the computational > center at > Oak Ridge. > > The hardware was pretty cool, though. > > George Chrysos, the chief architect of Knights Ferry, came up from the > Portland area with a test system running the new chip, which was > connected to > a speed meter on a laptop to show that it was running around 1 > teraflop. > > Intel had the test system set up behind closed doors -- on a coffee > table in > a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take > pictures of the setup. > > Nor would the company spe cify how many cores the chip has -- just > more than > 50 -- or its power requirement. > > If you're building a new system and want to future-proof it, the > Knights > Ferry chip uses a double PCI Express slot. Chrysos said the systems > are also > likely to run alongside a few Xeon processors. > > This means that Intel could be producing teraflop chips for personal > computers within a few years, although there's lots of work to be > done on the > software side before you'd want one. > > Another question is whether you'd want a processor that powerful on > a laptop, > for instance, where you may prefer to have a system optimized for > longer > battery life, Hazra said. > > More important, Knights Ferry chips may help engineers build the next > generation of supercomputing systems, which Intel and its partners > hope to > delivery by 2018. > > Power efficiency was a highlight of another big announcement this > week at > SC11. On Mon day night, IBM announced its "next generation > supercomputing > project," the Blue Gene/Q system that's heading to Lawrence Livermore > National Laboratory next year. > > Dubbed Sequoia, the system should run at 20 petaflops peak > performance. IBM > expects it to be the world's most power-efficient computer, > processing 2 > gigaflops per watt. > > The first 96 racks of the system could be delivered in December. The > Department of Energy's National Nuclear Security Administration > uses the > systems to work on nuclear weapons, energy reseach and climate > change, among > other things. > > Sequoia complements another Blue Gene/Q system, a 10-petaflop setup > called > "Mira," which was previously announced by Argonne National Laboratory. 
> > A few images from the conference, which runs through Friday at the > Washington > State Convention & Trade Center, starting with perusal of Intel > boards: > > > Take home a Cray today!< br /> > IBM was sporting Blue Genes, and it wasn't even casual Friday: > > A 94 teraflop rack: > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Wed Nov 16 17:47:57 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Thu, 17 Nov 2011 09:47:57 +1100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116095238.GX31847@leitl.org> References: <20111116095238.GX31847@leitl.org> Message-ID: <4EC43D9D.4070509@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 16/11/11 20:52, Eugen Leitl wrote: [quoting a newspaper report] > Now Intel has squeezed that much performance onto a matchbook-sized chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" architecture, > or MIC. Actually that's wrong, this is Knights Corner that has been announced (Knights Ferry was announced over a year ago).. http://communities.intel.com/community/openportit/server/blog/2011/11/15/supercomputing-2011-day-2-knights-corner-shown-at-1tf-per-socket # Today.. for the first time, Intel showed our first # silicon from the Knights Corner Product. It runs. # Even more yet, it showed 1 teraflop double precision # -- 1997 was dozens of cabinet -- 2011 is a single # 22nm chip. - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7EPZ0ACgkQO2KABBYQAh8T9wCeIrYMtEB3ouzoGgwzbxzNbivu ToMAoJI3PrRi+uZR14M83rYKlg6KnM1p =1jkV -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
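Neither the article nor Intel's post gives the clock speed or per-core vector width behind the "1 teraflop per socket" figure, but the arithmetic is easy to sketch. The core count, clock and flops-per-cycle below are illustrative assumptions only (Intel would not specify them); the Sequoia figures (20 petaflops peak, 2 gigaflops per watt) are the ones quoted in the article, and the script simply works out the power draw they imply.

# Back-of-envelope peak-FLOPS arithmetic. The MIC parameters are
# assumptions for illustration, not disclosed specifications.

def peak_dp_gflops(cores, ghz, dp_flops_per_cycle):
    """Peak double-precision GFLOPS = cores * clock (GHz) * DP flops per core per cycle."""
    return cores * ghz * dp_flops_per_cycle

# Hypothetical MIC-style part: 52 cores at 1.2 GHz with an 8-wide DP
# vector unit doing fused multiply-add (16 DP flops per cycle).
print(peak_dp_gflops(52, 1.2, 16))        # ~998 GFLOPS, i.e. about 1 TF

# Sequoia, using the figures quoted above: 20 PF peak at 2 GFLOPS/W.
peak_gflops = 20e6
gflops_per_watt = 2.0
print(peak_gflops / gflops_per_watt / 1e6, "MW")   # ~10 MW at peak

Change any of the assumed MIC numbers and the product moves accordingly; the point is only that "more than 50 cores" times a modest clock times a wide double-precision vector unit lands in the neighbourhood of a teraflop.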
From eugen at leitl.org Thu Nov 17 02:27:02 2011 From: eugen at leitl.org (Eugen Leitl) Date: Thu, 17 Nov 2011 08:27:02 +0100 Subject: [Beowulf] HPC User Training Survey (PRACE) Message-ID: <20111117072702.GV31847@leitl.org>

----- Forwarded message from Rolf Rabenseifner ----- From: Rolf Rabenseifner Date: Thu, 17 Nov 2011 08:19:45 +0100 (CET) To: eugen at leitl.org Subject: HPC User Training Survey (PRACE)

[Translated from German:] Dear participant on my course invitation list, this time I have a bigger request: within PRACE we are running a survey on HPC training courses in Europe. It would help me a lot if you could take part as well. And as a thank-you there is even something to win. If you yourself give MPI, OpenMP or other HPC courses (not lectures), then the second survey, which I will send shortly in a second mail, applies to you. Many thanks in advance and with kind regards, Rolf Rabenseifner

-------------- Dear HPC user, We are writing to you on behalf of PRACE, the Partnership for Advanced Computing in Europe (www.prace-ri.eu), which has been established to create a persistent HPC infrastructure to provide Europe with world-class HPC resources. As part of this infrastructure, PRACE runs a training programme to enable users, such as yourself, to fully exploit HPC systems that are made available via its regular calls for proposals (http://www.prace-ri.eu/Call-Announcements). The PRACE project has designed a survey to assess the current level of knowledge and satisfaction with existing HPC training. Results from this will be used to guide training events that will be offered to you and your colleagues all over Europe. As a candidate/existing user of PRACE Tier-1 and/or Tier-0 systems, you are invited to take part in the survey. Please redistribute this email and survey link to the technical members (staff and students) of your group who are developing and/or using applications on high-end HPC resources and who may also benefit from PRACE training. All responses will be treated confidentially. Respondents' identities will not be disclosed in any publication of the survey results. Each participant who fully completes the survey will be eligible to enter a draw to win one of three Amazon Kindle devices. The link to the survey is as follows: http://survey.ipb.ac.rs/index.php?sid=32117&lang=en The survey will close at 17:00 (CET) on 25th November 2011. Thank you in advance for your contribution. Please direct any enquiries to danica at ipb.ac.rs Rolf Rabenseifner, on behalf of the PRACE training team --------------------------------------------------------------------- Dr. Rolf Rabenseifner .. . . . . . . . . . email rabenseifner at hlrs.de High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 University of Stuttgart .. . . . . . . . . fax : ++49(0)711/685-65832 Head of Dpmt Parallel Computing .. .. www.hlrs.de/people/rabenseifner Nobelstr. 19, D-70550 Stuttgart, Germany . .
(Office: Allmandring 30) --------------------------------------------------------------------- ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From eugen at leitl.org Thu Nov 17 02:27:06 2011 From: eugen at leitl.org (Eugen Leitl) Date: Thu, 17 Nov 2011 08:27:06 +0100 Subject: [Beowulf] HPC **Trainer** Survey (PRACE) Message-ID: <20111117072706.GW31847@leitl.org>

----- Forwarded message from Rolf Rabenseifner ----- From: Rolf Rabenseifner Date: Thu, 17 Nov 2011 08:22:06 +0100 (CET) To: eugen at leitl.org Subject: HPC **Trainer** Survey (PRACE)

[Translated from German:] Just in case you yourself give HPC training courses (not lectures): then I have a second request. Could you please (also) fill in the trainer survey. Many thanks in advance and with kind regards, Rolf Rabenseifner

-------------- Dear HPC Trainer, We are writing to invite you to participate in an HPC trainer survey, designed by PRACE, with the aim to better understand the current scope and level of HPC training expertise across Europe, as well as the views and needs of the HPC trainers themselves. You have been invited to participate in this survey as there are indications that you provide HPC training. Even if you are not involved in training for PRACE, your views and participation would be welcome and greatly appreciated. The link to the survey is as follows: http://survey.ipb.ac.rs/index.php?sid=68456&lang=en The survey will close at 17:00 (CET) on 25th November 2011. Thank you in advance for your contribution. Please direct any enquiries to danica at ipb.ac.rs Rolf Rabenseifner, on behalf of the PRACE training team --------------------------------------------------------------------- Dr. Rolf Rabenseifner .. . . . . . . . . . email rabenseifner at hlrs.de High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 University of Stuttgart .. . . . . . . . . fax : ++49(0)711/685-65832 Head of Dpmt Parallel Computing .. .. www.hlrs.de/people/rabenseifner Nobelstr. 19, D-70550 Stuttgart, Germany . . (Office: Allmandring 30) --------------------------------------------------------------------- ----- End forwarded message ----- -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From lindahl at pbm.com Mon Nov 21 14:28:04 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 21 Nov 2011 11:28:04 -0800 Subject: [Beowulf] Leif Nixon gets quoted in Forbes Message-ID: <20111121192804.GC29861@bx9.net>

Beowulf-list-participant Leif Nixon got quoted in Forbes! http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great-firewall-tests-mysterious-scans-on-encrypted-connections/ -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From nixon at nsc.liu.se Mon Nov 21 14:48:50 2011 From: nixon at nsc.liu.se (Leif Nixon) Date: Mon, 21 Nov 2011 20:48:50 +0100 Subject: Re: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <20111121192804.GC29861@bx9.net> (Greg Lindahl's message of "Mon, 21 Nov 2011 11:28:04 -0800") References: <20111121192804.GC29861@bx9.net> Message-ID: Greg Lindahl writes: > Beowulf-list-participant Leif Nixon got quoted in Forbes! > > http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great-firewall-tests-mysterious-scans-on-encrypted-connections/ Thank you. Yes, my little discovery seems to have attracted quite a bit of attention. Any list members with users logging in from China, please check your ssh logs. I'd be interested in comparing notes with you. -- Leif Nixon - Security officer National Supercomputer Centre - Swedish National Infrastructure for Computing Nordic Data Grid Facility - European Grid Infrastructure _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From diep at xs4all.nl Mon Nov 21 16:33:17 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 21 Nov 2011 22:33:17 +0100 Subject: Re: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: References: <20111121192804.GC29861@bx9.net> Message-ID: <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl>

I'm not sure I understand what is happening. In itself, just cracking SSH is rather easy for governments: it's just 1024- to 2048-bit RSA, and they can factor that with a reasonable beowulf cluster with special hardware quite easily. You can already show this to be the case for a rather old algorithm, and I'm sure the math guys have come up with something much better nowadays - and so will China. So what is the scan really doing? Is it a physical verification of where you communicate from - a physical address - mapping every user on this planet to a specific physical location? How sure is it that China is behind everything, knowing they can already crack everything anyway? China gets the blame for everything; in the meantime I have lots of Iranians lurking around here. Vincent On Nov 21, 2011, at 8:48 PM, Leif Nixon wrote: > Greg Lindahl writes: > >> Beowulf-list-participant Leif Nixon got quoted in Forbes! >> >> http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great- >> firewall-tests-mysterious-scans-on-encrypted-connections/ > > Thank you. Yes, my little discovery seems to have attracted quite a > bit > of attention.
> > Any list members with users logging in from China, please check > your ssh > logs. I'd be interested in comparing notes with you. > > -- > Leif Nixon - Security officer > National Supercomputer Centre - Swedish National Infrastructure for > Computing > Nordic Data Grid Facility - European Grid Infrastructure > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 21 21:33:31 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 22 Nov 2011 13:33:31 +1100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: References: <20111121192804.GC29861@bx9.net> Message-ID: <4ECB09FB.9040701@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/11 06:48, Leif Nixon wrote: > Any list members with users logging in from China, please check your ssh > logs. I'd be interested in comparing notes with you. I thought I might have seen something similar back in October 2010: Oct 19 00:22:13 merri sshd[11001]: Bad protocol version identification '\377\373\037\377\373 \377\373\030\377\373'\377\375\001\377\373\003\377\375\003\377\370\003' from UNKNOWN but it turns out that all the patterns here are the same, and Google leads me here: http://seclists.org/fulldisclosure/2004/Mar/1243 indicating a broken telnet program (for our case only). ;-) cheers! Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7LCfsACgkQO2KABBYQAh/lnwCeInJRW7wIK0msEmCBJOf9wMNR RTAAnjk/bT5qBvb7o14CtCPYbJS7+/wH =gcRb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 21 21:35:38 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 22 Nov 2011 13:35:38 +1100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl> References: <20111121192804.GC29861@bx9.net> <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl> Message-ID: <4ECB0A7A.7040301@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/11 08:33, Vincent Diepeveen wrote: > I'm not sure i understand what happens. Basically the suspicion is that it's a way of automatically spotting and blocking access to certain encrypted egress methods from China. I didn't see any indication this was to do with trying to break the crypto. 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7LCnoACgkQO2KABBYQAh8agACfUhKlWA03Y2rYNK5Sq7EWvb35 P3oAn3KVFEQfHzi/0nfWZyKeY3Vxc8wG =ELz9 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From nixon at nsc.liu.se Tue Nov 22 03:08:20 2011 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 22 Nov 2011 09:08:20 +0100 Subject: Re: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <4ECB09FB.9040701@unimelb.edu.au> (Christopher Samuel's message of "Tue, 22 Nov 2011 13:33:31 +1100") References: <20111121192804.GC29861@bx9.net> <4ECB09FB.9040701@unimelb.edu.au> Message-ID: Christopher Samuel writes: > On 22/11/11 06:48, Leif Nixon wrote: > >> Any list members with users logging in from China, please check your ssh >> logs. I'd be interested in comparing notes with you. > > I thought I might have seen something similar back in October 2010: > > Oct 19 00:22:13 merri sshd[11001]: Bad protocol version identification '\377\373\037\377\373 \377\373\030\377\373'\377\375\001\377\373\003\377\375\003\377\370\003' from UNKNOWN Yes, we see the odd telnet handshake coming in as well, but as you say, that is a different matter. Thanks for looking! -- Leif Nixon - Security officer National Supercomputer Centre - Swedish National Infrastructure for Computing Nordic Data Grid Facility - European Grid Infrastructure _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
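For anyone who wants to follow Leif's suggestion and go through their logs, a minimal sketch is below. The log location is an assumption (Debian-family systems put sshd messages in /var/log/auth.log, Red Hat-family in /var/log/secure), and the pattern is simply the "Bad protocol version identification" line Chris quoted; a real comparison of notes would also want timestamps and some idea of where the source addresses sit.

#!/usr/bin/env python3
# Count 'Bad protocol version identification' sshd log lines per source.
# Sketch only: adjust the log path for your distro (auth.log vs secure);
# rotated or compressed logs are not handled here.
import re
import sys
from collections import Counter

PATTERN = re.compile(r"Bad protocol version identification '(.*)' from (\S+)")

def scan(path):
    hits = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                hits[match.group(2)] += 1   # key on the reported source host/IP
    return hits

if __name__ == "__main__":
    for source, count in scan(sys.argv[1]).most_common():
        print(count, source)

Run it as "python3 scan_sshd.py /var/log/auth.log" (or wherever your sshd logs go); anything that shows up repeatedly from unexpected address ranges is worth a closer look.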
From diep at xs4all.nl Sun Nov 6 18:01:02 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 00:01:02 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions Message-ID: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl>

hi, There is a lot of infiniband 4x stuff on ebay now. Most interesting for me to buy some and toy with it. However i have not much clue about infiniband as it always used to be 'out of my pricerange'. So looking for som expert advice here. A few questions: Now i assume what i need is: bunch of cards, copper cables, and a switch, and a big dosis of luck. There is a lot of cards on ebay with 2 ports. 10 gbit + 10 gbit. Do i need to connect both to the same switch? So in short with infiniband you lose 2 ports of the switch to 1 card, is that correct? CARDS: Do all switches work with all cards? Can you mix the many different cards that are out there of 4x infiniband? If not, can you mix from mellanox the different cards theirs? So mix from 1 manufacturer cards? SOCKET 1155: Do the cards work for socket 1155 if it's pci-e versions? (of course watercooled nodes each) Is there a limit on how much RAM the machine can have? (referring to the famous 4GB limit of the QM400 cards of quadrics) Does infiniband 4x work for x64 machines? DRIVERS: Drivers for cards now. Are those all open source, or does it require payment? Is the source released of all those cards drivers, and do they integrate into linux? MPI: The MPI library you can use with each card is that different manufacturer from manufacturer? Free to download and can work with different distro's? Does it compile? Is it just a library or a modified compiler? Note i assume it's possible to combine it all with pdsh. SWITCH: I see a bunch of topspin 120 switches there. Supposed to be 200 ns.
there is a 47 manual page, yet it doesn't mention anything about a password needed to login, only the default password it mentions. What if it already has been set, as one ebay guy mentions he didn't manage to login there. Is it possible to reset that login or isn't it possible to modify login password? Is it possible to combine 2 switches and have so to speak a 48 port switch then? Oh btw i plan to ship messages sized 256 bytes massively over the switch. Would it work if i add a 2nd switch just to act as a 2nd rail? And a 3d and a 4th rail also work? So a rail or 4 would it work? Really important question that rail question. As that would allow more messages per second. Most messages will be a byte or 128-256 and for sure nothing will be bigger. Some messages are shorter. If 128 is that much faster i'd go for 128. What more do i need to know? Lots of simple questions in short! Many thanks in advance for answerring any question or raising new ones :) Regards, Vincent Diepeveen diep at xs4all.nl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hearnsj at googlemail.com Mon Nov 7 06:10:50 2011 From: hearnsj at googlemail.com (John Hearns) Date: Mon, 7 Nov 2011 11:10:50 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: Vincent, I cannot answer all of your questions. I have a couple of answers: Regarding MPI, you will be looking for OpenMPI You will need a subnet manager running somewhere on the fabric. These can either run on the switch or on a host. If you are buying this equipment from eBay I would imagine you will be running the Open Fabrics subnet manager on a host on your cluster, rather than on a switch. I might be wrong - depends if the switch has a SM license. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Mon Nov 7 06:35:40 2011 From: eugen at leitl.org (Eugen Leitl) Date: Mon, 7 Nov 2011 12:35:40 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <20111107113540.GO31847@leitl.org> On Mon, Nov 07, 2011 at 11:10:50AM +0000, John Hearns wrote: > Vincent, > I cannot answer all of your questions. > I have a couple of answers: > > Regarding MPI, you will be looking for OpenMPI > > You will need a subnet manager running somewhere on the fabric. > These can either run on the switch or on a host. > If you are buying this equipment from eBay I would imagine you will be > running the Open Fabrics subnet manager > on a host on your cluster, rather than on a switch. > I might be wrong - depends if the switch has a SM license. Assuming ebay-sourced equipment, what price tag are we roughly looking at, per node, assuming small (8-16 nodes) cluster sizes? 
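To put rough numbers on what the "10 gbit + 10 gbit" ports in Vincent's question actually deliver, and hence what any eBay price is buying: each InfiniBand lane signals at 2.5, 5 or 10 Gbit/s for SDR, DDR and QDR respectively, a 4x port bonds four lanes, and these generations use 8b/10b encoding, so only 80% of the signalling rate is payload. A small sketch of the arithmetic (the rates are from the InfiniBand spec in general; nothing here is specific to any particular card):

# InfiniBand 4x link arithmetic for the SDR/DDR/QDR generations,
# all of which use 8b/10b encoding on the wire.
LANE_GBIT = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}   # signalling rate per lane

def link_rates(generation, lanes=4):
    signalling = LANE_GBIT[generation] * lanes   # e.g. 4x SDR = 10 Gbit/s
    data = signalling * 8 / 10                   # 8b/10b coding overhead
    return signalling, data

for generation in ("SDR", "DDR", "QDR"):
    sig, data = link_rates(generation)
    print("4x %s: %g Gbit/s signalling, %g Gbit/s data" % (generation, sig, data))

A "10 gbit" port is therefore consistent with 4x SDR: 10 Gbit/s of signalling and 8 Gbit/s of payload per port, before any PCI-X or PCIe bottleneck on the host side is taken into account.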
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From robh at dongle.org.uk Mon Nov 7 06:44:49 2011 From: robh at dongle.org.uk (Robert Horton) Date: Mon, 07 Nov 2011 11:44:49 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <1320666289.1856.25.camel@moelwyn> Hi, Most of what I know about Infiniband came from the notes at http://www.hpcadvisorycouncil.com/events/switzerland_workshop/agenda.php (or John Hearns in his previous life!). On Mon, 2011-11-07 at 00:01 +0100, Vincent Diepeveen wrote: > Do i need to connect both to the same switch? > So in short with infiniband you lose 2 ports of the switch to 1 > card, is that correct? You probably want to just connect one port to a switch and leave the other one unconnected to start with. > CARDS: > Do all switches work with all cards? > Can you mix the many different cards that are out there of 4x > infiniband? > If not, can you mix from mellanox the different cards theirs? So mix > from 1 manufacturer cards? They will (or at least should) all work to a point but depending on what combination you are using you may not get some features. If you want an easy life keep it all from the same manufacturer > Does infiniband 4x work for x64 machines? The 4x bit is the number of links aggregated together. 4x is normal for connections from a switch to a node, higher numbers are sometimes used for inter-switch links. You also need to note the data rate (eg SDR, DDR, QDR etc). > DRIVERS: > Drivers for cards now. Are those all open source, or does it require > payment? Is the source released of > all those cards drivers, and do they integrate into linux? You should get everything you need from the Linux kernel and / or OFED. > MPI: > The MPI library you can use with each card is that different > manufacturer from manufacturer? Free to download > and can work with different distro's? Does it compile? Is it just a > library or a modified compiler? There are quite a lot to choose from but OpenMPI is probably a good starting point. Hope that's some help... Rob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 09:28:37 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 15:28:37 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> hi John, I had read already about subnet manager but i don't really understand this, except when it's only configuration tool. 
I assume it's not something that's critical in terms of bandwidth, it doesn't need nonstop bandwidth from the machine & switch is it? In case of a simple cluster consisting out of 1 switch with some nodes attached, is it really a problem? On Nov 7, 2011, at 12:10 PM, John Hearns wrote: > Vincent, > I cannot answer all of your questions. > I have a couple of answers: > > Regarding MPI, you will be looking for OpenMPI > > You will need a subnet manager running somewhere on the fabric. > These can either run on the switch or on a host. > If you are buying this equipment from eBay I would imagine you will be > running the Open Fabrics subnet manager > on a host on your cluster, rather than on a switch. > I might be wrong - depends if the switch has a SM license. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 09:45:33 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 15:45:33 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <1320666289.1856.25.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> On Nov 7, 2011, at 12:44 PM, Robert Horton wrote: > Hi, > > Most of what I know about Infiniband came from the notes at > http://www.hpcadvisorycouncil.com/events/switzerland_workshop/ > agenda.php > (or John Hearns in his previous life!). > > On Mon, 2011-11-07 at 00:01 +0100, Vincent Diepeveen wrote: >> Do i need to connect both to the same switch? >> So in short with infiniband you lose 2 ports of the switch to 1 >> card, is that correct? > > You probably want to just connect one port to a switch and leave the > other one unconnected to start with. What's the second one doing, is this just in case the switch fails, a kind of 'backup' port? In my naivity i had thought that both ports together formed the bidirectional link to the switch. So i thought that 1 port was for 10 gigabit upstream and the other port was for 10 gigabit downstream, did i misunderstood that? > >> CARDS: >> Do all switches work with all cards? >> Can you mix the many different cards that are out there of 4x >> infiniband? >> If not, can you mix from mellanox the different cards theirs? So mix >> from 1 manufacturer cards? > > They will (or at least should) all work to a point but depending on > what > combination you are using you may not get some features. If you > want an > easy life keep it all from the same manufacturer I will load the switch (es) to the maximum number of messages a second it can handle, > >> Does infiniband 4x work for x64 machines? > > The 4x bit is the number of links aggregated together. 4x is normal > for > connections from a switch to a node, higher numbers are sometimes used > for inter-switch links. You also need to note the data rate (eg SDR, > DDR, QDR etc). > >> DRIVERS: >> Drivers for cards now. 
Are those all open source, or does it require >> payment? Is the source released of >> all those cards drivers, and do they integrate into linux? > > You should get everything you need from the Linux kernel and / or > OFED. > >> MPI: >> The MPI library you can use with each card is that different >> manufacturer from manufacturer? Free to download >> and can work with different distro's? Does it compile? Is it just a >> library or a modified compiler? > > There are quite a lot to choose from but OpenMPI is probably a good > starting point. > > Hope that's some help... > > Rob > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From j.wender at science-computing.de Mon Nov 7 09:45:38 2011 From: j.wender at science-computing.de (Jan Wender) Date: Mon, 07 Nov 2011 15:45:38 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <4EB7EF12.8000403@science-computing.de> Hi all, a relatively easy to read introduction to IB is found at http://members.infinibandta.org/kwspub/Intro_to_IB_for_End_Users.pdf Cheerio, Jan -- ---- Company Information ---- Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- A non-text attachment was scrubbed... Name: j_wender.vcf Type: text/x-vcard Size: 338 bytes Desc: not available URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at mclaren.com Mon Nov 7 10:45:39 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Mon, 7 Nov 2011 15:45:39 -0000 Subject: [Beowulf] building Infiniband 4x cluster questions References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> > > hi John, > > I had read already about subnet manager but i don't really understand > this, except when it's only configuration tool. > > I assume it's not something that's critical in terms of bandwidth, it > doesn't need nonstop bandwidth from the machine & switch is it? > It is critical. I perhaps am not explaining this correctly. In an Ethernet network you have a MAC address and the process of ARPing - ie if you want to open a connection to another host on the Ethernet, you broadcast its IP address and you get returned a MAC address. Hey, that's why its called an ETHERnet (geddit? 
Oh, the drollery of those Xerox engineers) Anyway, on an Infiniband network the Subnet Manager assigns new hosts a LID (local identifier) and keeps track of routing tables between them. No SM, no new hosts join the network. An Infiniband expert will be along in a minute and explain that you can operate a fabric without an SM and I shall stand corrected. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From sabujp at gmail.com Mon Nov 7 11:01:34 2011 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Mon, 7 Nov 2011 10:01:34 -0600 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> <207BB2F60743C34496BE41039233A809092AD253@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: > Anyway, on an Infiniband network the Subnet Manager assigns new hosts a > LID (local identifier) > and keeps track of routing tables between them. > No SM, no new hosts join the network. Regardless, make sure you're running opensm on an at least one of the nodes connected to your IB switch. I didn't have to configure anything within the manager, just make sure it's running. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From robh at dongle.org.uk Mon Nov 7 11:07:07 2011 From: robh at dongle.org.uk (Robert Horton) Date: Mon, 07 Nov 2011 16:07:07 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> Message-ID: <1320682027.1856.61.camel@moelwyn> On Mon, 2011-11-07 at 15:45 +0100, Vincent Diepeveen wrote: > What's the second one doing, is this just in case the switch fails, > a > kind of 'backup' port? > > In my naivity i had thought that both ports together formed the > bidirectional link to the switch. > So i thought that 1 port was for 10 gigabit upstream and the other > port was for 10 gigabit downstream, > did i misunderstood that? It's "normal" to just use single port cards in a compute server. You might want to use 2 (or more) to increase the bandwidth to a particular machine (might be useful for a fileserver, for instance) or if you are linking nodes to each other (rather than via a switch) in a taurus-type topology. 
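Following on from the subnet manager discussion: a quick way to see whether opensm has actually brought your links up is to read the port files that the kernel IB core exposes under /sys/class/infiniband (a port sits in DOWN or INIT until a subnet manager has swept the fabric and assigned it a LID; ACTIVE means you are in business). A small sketch, assuming a reasonably recent kernel with the IB drivers loaded -- ibstat from the infiniband-diags package reports the same information if you prefer a ready-made tool:

#!/usr/bin/env python3
# Print the state and rate of every InfiniBand port on this host,
# using the sysfs files exposed by the kernel IB core. Paths may
# differ on very old or heavily vendor-patched driver stacks.
import glob
import os

def read(path):
    with open(path) as f:
        return f.read().strip()

for port_dir in sorted(glob.glob("/sys/class/infiniband/*/ports/*")):
    parts = port_dir.split(os.sep)
    hca, port = parts[-3], parts[-1]
    state = read(os.path.join(port_dir, "state"))   # e.g. "4: ACTIVE"
    rate = read(os.path.join(port_dir, "rate"))     # e.g. "10 Gb/sec (4X)"
    print("%s port %s: %s, %s" % (hca, port, state, rate))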
Rob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 11:13:13 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 16:13:13 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1320666289.1856.25.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: > > Do i need to connect both to the same switch? > > So in short with infiniband you lose 2 ports of the switch to 1 card, > > is that correct? > > You probably want to just connect one port to a switch and leave the other > one unconnected to start with. Correct. You only need to connect one port. The second port can be used for performance increase or fail over for example. > > CARDS: > > Do all switches work with all cards? > > Can you mix the many different cards that are out there of 4x > > infiniband? > > If not, can you mix from mellanox the different cards theirs? So mix > > from 1 manufacturer cards? > > They will (or at least should) all work to a point but depending on what > combination you are using you may not get some features. If you want an > easy life keep it all from the same manufacturer All cards and switches build according to the spec will work. > > Does infiniband 4x work for x64 machines? > > The 4x bit is the number of links aggregated together. 4x is normal for > connections from a switch to a node, higher numbers are sometimes used > for inter-switch links. You also need to note the data rate (eg SDR, DDR, QDR > etc). 4X means 4 network lanes (same as the PCIe convention - PCIe x4, x8 etc.). It is related to the port speed, not the server architecture. Most of the InfiniBand port out there are 4X > > > DRIVERS: > > Drivers for cards now. Are those all open source, or does it require > > payment? Is the source released of all those cards drivers, and do > > they integrate into linux? > > You should get everything you need from the Linux kernel and / or OFED. You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 11:36:22 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 11:36:22 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> Message-ID: <4EB80906.4040501@ias.edu> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: > hi, > > There is a lot of infiniband 4x stuff on ebay now. Vincent, Do you mean 4x, or QDR? They refer to different parts of the IB architecture. 4x refers to the number of lanes for the data to travel down and QDR refers to the data signalling rate. 
It's probably irrelevant for this conversation, but if you are just learning about IB, It's good to understand that difference. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 11:50:36 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 11:50:36 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <4EB80C5C.6020605@ias.edu> >>> DRIVERS: >>> Drivers for cards now. Are those all open source, or does it require >>> payment? Is the source released of all those cards drivers, and do >>> they integrate into linux? >>> You should get everything you need from the Linux kernel and / or OFED. > > You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows > I bought some Supermicro systems about a year ago (maybe new than that), with newer Mellanox cards( Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0 5GT/s - IB DDR / 10GigE). That aren't fully supported by the OFED that comes with RHEL/CentOS, not even version 6.1, so I had to download the latest Mellanox OFED to get them to work. I can confirm Gilad's statement that you can download them for free, they are 100% open source, and you don't need to be a paying customer or register on the Mellanox site, or any of that BS. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Mon Nov 7 11:54:11 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Mon, 7 Nov 2011 08:54:11 -0800 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB7EF12.8000403@science-computing.de> Message-ID: An interesting writeup.. A sort of tangential question about that writeup.. They use the term "message oriented" with the description that the IB hardware takes care of segmentation and so forth, so that the application just says "send this" or "receive this" and the gory details are concealed. Then he distinguishes that from a TCP/IP stack, etc., where the software does a lot of this, with the implication that the user has to be involved in that. But it seems to me that the same processes are going on.. You have a big message, it needs to be broken up, etc. And for *most users* all that is hidden underneath the hood of, say, MPI. (obviously, if you are a message passing software writer, the distinction is important). On 11/7/11 6:45 AM, "Jan Wender" wrote: >Hi all, > >a relatively easy to read introduction to IB is found at >http://members.infinibandta.org/kwspub/Intro_to_IB_for_End_Users.pdf > >Cheerio, >Jan >-- >---- Company Information ---- >Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, >Dr. >Arno Steitz, Dr. 
Ingrid Zech >Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: >Philippe Miltin >Sitz/Registered Office: Tuebingen Registergericht/Registration Court: >Stuttgart >Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 11:58:41 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 17:58:41 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB80906.4040501@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: hi Prentice, I had noticed the diff between SDR up to QDR, the SDR cards are affordable, the QDR isn't. The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap prices in that pricerange yet. If i would want to build a network that's low latency and had a budget of $800 or so a node of course i would build a dolphin SCI network, as that's probably the fastest latency card sold for a $675 or so a piece. I do not really see a rival latency wise to Dolphin there. I bet most manufacturers selling clusters don't use it as they can make $100 more profit or so selling other networking stuff, and universities usually swallow that. So price total dominates the network. As it seems now infiniband 4x is not going to offer enough performance. The one-way pingpong latencies over a switch that i see of it, are not very convincing. I see remote writes to RAM are like nearly 10 microseconds for 4x infiniband and that card is the only one affordable. The old QM400's i have here are one-way pingpong 2.1 us or so, and QM500-B's are plentyful on the net (of course big disadvantage: needs pci-x), which are a 1.3 us or so there and have SHMEM. Not seeing a cheap switch for the QM500's though nor cables. You see price really dominates everything here. Small cheap nodes you cannot build if the port price, thanks to expensive network card, more than doubles. Power is not the real concern for now - if a factory already burns a couple of hundreds of megawatts, a small cluster somewhere on the attick eating a few kilowatts is not really a problem :) On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: > > On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >> hi, >> >> There is a lot of infiniband 4x stuff on ebay now. > > Vincent, > > Do you mean 4x, or QDR? They refer to different parts of the IB > architecture. 4x refers to the number of lanes for the data to travel > down and QDR refers to the data signalling rate. > > It's probably irrelevant for this conversation, but if you are just > learning about IB, It's good to understand that difference. 
> > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 12:02:30 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:02:30 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB80C5C.6020605@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <4EB80C5C.6020605@ias.edu> Message-ID: On Nov 7, 2011, at 5:50 PM, Prentice Bisbal wrote: >>>> DRIVERS: >>>> Drivers for cards now. Are those all open source, or does it >>>> require >>>> payment? Is the source released of all those cards drivers, and do >>>> they integrate into linux? >>>> You should get everything you need from the Linux kernel and / >>>> or OFED. >> >> You can also find the drivers on the vendors sites. Not sure about >> the rest, but for the Mellanox case it is open source and free - >> both for Linux and Windows >> > > I bought some Supermicro systems about a year ago (maybe new than > that), > with newer Mellanox cards( Mellanox Technologies MT26418 [ConnectX VPI > PCIe 2.0 5GT/s - IB DDR / 10GigE). That aren't fully supported by the > OFED that comes with RHEL/CentOS, not even version 6.1, so I had to > download the latest Mellanox OFED to get them to work. I can confirm > Gilad's statement that you can download them for free, they are 100% > open source, and you don't need to be a paying customer or > register on > the Mellanox site, or any of that BS. Yeah i saw that some websites charge money for that. I saw the Dolphin website wants $5000 for a developer license or something vague. I call that 'download rights for the SDK'. Sounds weird to me. As for the MT26418 that's $562, that's factors too much for a low budget cluster that's low latency. Another website i checked out was an Indian website: plx technologies. Seems in India. However didn't allow me to register. For bandiwdth you don't need to check them out, as in some webvideo i saw them speak about 600MB/s as if it was a lot, which is of course a joke even to old 4x infiniband, which gets handsdown 800MB/s. But for latency might not be bad idea. Yet didn't allow me to register that plxtechnologies, which is weird. > Prentice > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
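
As background to the subnet-manager points made earlier in the thread (no LID, and no ACTIVE port, until opensm or a switch-resident SM has swept the fabric), the port state is easy to check from a few lines of C against the libibverbs API that ships with OFED. The program below is only an illustrative sketch, not something posted in the thread: it opens the first HCA it finds, queries physical port 1 and prints the state and LID. The ibv_devinfo utility from libibverbs reports the same information without writing any code.

#include <stdio.h>
#include <infiniband/verbs.h>

/* Build with something like: gcc -o portcheck portcheck.c -libverbs */
int main(void)
{
    struct ibv_device **devs;
    struct ibv_context *ctx;
    struct ibv_port_attr pa;
    int n = 0;

    devs = ibv_get_device_list(&n);
    if (!devs || n == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "cannot open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    if (ibv_query_port(ctx, 1, &pa)) {      /* 1 = first physical port */
        fprintf(stderr, "ibv_query_port failed\n");
        return 1;
    }

    /* With the link up but no subnet manager running, the port sits in
     * INIT with LID 0; once an SM assigns a LID the state becomes ACTIVE. */
    printf("%s port 1: state=%s lid=%u\n", ibv_get_device_name(devs[0]),
           ibv_port_state_str(pa.state), pa.lid);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}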
From diep at xs4all.nl Mon Nov 7 12:20:24 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:20:24 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions - using 1 port out of 2 In-Reply-To: <1320682027.1856.61.camel@moelwyn> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <29EE59EA-02F1-4F24-9FE4-8C317196B75C@xs4all.nl> <1320682027.1856.61.camel@moelwyn> Message-ID: <658BF446-A7C5-473B-93D8-EA8EAC256F47@xs4all.nl> On Nov 7, 2011, at 5:07 PM, Robert Horton wrote: > On Mon, 2011-11-07 at 15:45 +0100, Vincent Diepeveen wrote: >> What's the second one doing, is this just in case the switch fails, >> a >> kind of 'backup' port? >> >> In my naivity i had thought that both ports together formed the >> bidirectional link to the switch. >> So i thought that 1 port was for 10 gigabit upstream and the other >> port was for 10 gigabit downstream, >> did i misunderstood that? > > It's "normal" to just use single port cards in a compute server. You > might want to use 2 (or more) to increase the bandwidth to a > particular > machine (might be useful for a fileserver, for instance) or if you are > linking nodes to each other (rather than via a switch) in a taurus- > type > topology. > > Rob > It's still not clear to me what exactly the 2nd link is doing. If i want to ship th emaximum amount of short messages, say 128 bytes each message, is a 2nd cable gonna increase the number of messages i can ship? In fact the messages i'll be shipping out is requests to read remote in a blocking manner 128 bytes. So say this proces P at node N0 wants from some other node N1 exactly 128 bytes from the gigabytes big hashtable. That's a blocked read. The number of blocked reads per second that can read a 128 bytes is the only thing that matters for the network, nothing else. Note it will also do writes, but with writes you always can be doing things in a more tricky manner. So to speak you can queue up a bunch and ship them. Writes do not need to be non-blocking. If they flow at a tad slower speed to the correct node and get written that's also ok. The write is 32 bytes max. In fact i don't want to read 128 bytes. As that 128 bytes is 4 entries and as in such cluster network it's a layered system, if i would be able to modify the source code doing the read, all i would give is a location, the host processor then can do the read of 32 bytes and give that. As i assume the network to be silly and not able to execute remote code, i read 128 bytes and figure out here which of the 4 positions *possible* is the correct position stored (odds about a tad more than 5% that a position already was stored before). So ideally i'd be doing reads of 32 bytes, yet as the request for the read is not capable of selecting the correct position, it has to scan 128 bytes for it, so i get the entire 128 bytes. The number of 128 byte reads per second randomized over the hashtable that's spreaded over the nodes, is the speed at which the 'mainsearch' can search. I'm guessing blocked reads to eat nearly 10 microseconds with infiniband 4x, so that would mean i can do about a 100k lookups a card. Question is whether connecting the 2nd port would speedup that to more than 100k reads per second. 
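
In verbs terms the "blocked read" described above is an RDMA READ work request: the requesting side posts the read and the remote HCA serves the 128 bytes straight out of host memory without waking the remote CPU. The fragment below is a minimal sketch, not code from the thread; it assumes a connected reliable-connection queue pair, its send completion queue and a registered local buffer already exist, and that the remote virtual address and rkey were exchanged during setup (none of that plumbing is shown). All names are illustrative.

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* One blocking 128-byte remote lookup. qp must be a connected RC queue
 * pair and cq its send completion queue. */
int blocked_read_128(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
                     void *local_buf, uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge;
    struct ibv_send_wr wr, *bad_wr = NULL;
    struct ibv_wc wc;
    int n;

    memset(&sge, 0, sizeof sge);
    sge.addr   = (uintptr_t)local_buf;   /* registered local buffer        */
    sge.length = 128;                    /* one 4-entry hash bucket        */
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof wr);
    wr.opcode              = IBV_WR_RDMA_READ;  /* served by the remote HCA */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* ask for a completion     */
    wr.wr.rdma.remote_addr = remote_addr;       /* address in the remote hashtable */
    wr.wr.rdma.rkey        = rkey;              /* rkey exchanged at setup  */

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll for the completion; the post-to-completion time is the
     * "blocked read" latency being estimated in this thread. */
    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}

A strictly blocking loop like this caps out at roughly 1/latency lookups per second per process; any higher aggregate rate has to come from keeping several reads in flight at once (more work requests, more processes, or more rails).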
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 12:19:54 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:19:54 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: > hi John, > > I had read already about subnet manager but i don't really understand this, > except when it's only configuration tool. > > I assume it's not something that's critical in terms of bandwidth, it doesn't > need nonstop bandwidth from the machine & switch is it? The subnet management is just an agent in the fabric that give identifiers to the ports and set the routing in the fabric (in case of static routing). It will also discover new nodes once connected to the fabric, or nodes that went down (in the later case, it can modify the routing accordingly). The agent requires negligible network resources, so no need to worry. You can run the subnet management from a server (head node for example using OpenSM for example) or from one of the switches. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 12:14:56 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:14:56 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: > I had noticed the diff between SDR up to QDR, the SDR cards are affordable, > the QDR isn't. > > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap prices in > that pricerange yet. You can also find cards on www.colfaxdirect.com. You can also check with the HPC Advisory Council (www.hpcadvisorycouncil.com) - they are doing refresh cycles for their systems, and might have some older cards to donate. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 12:16:26 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 17:16:26 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB7EF12.8000403@science-computing.de> Message-ID: > They use the term "message oriented" with the description that the IB > hardware takes care of segmentation and so forth, so that the application > just says "send this" or "receive this" and the gory details are > concealed. 
Then he distinguishes that from a TCP/IP stack, etc., where > the software does a lot of this, with the implication that the user has to be > involved in that. > > But it seems to me that the same processes are going on.. You have a big > message, it needs to be broken up, etc. > And for *most users* all that is hidden underneath the hood of, say, MPI. > (obviously, if you are a message passing software writer, the distinction is > important). You can also post large message to the IB interface (up to 2GB I believe) and the IB transport will break it to the network MTU. Gilad _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 12:26:41 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 18:26:41 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <3D45761B-6EBE-4B8E-BFDA-1692E3B69703@xs4all.nl> Message-ID: <6E607B84-8C7F-4C40-9916-48EAEFAF46A1@xs4all.nl> Thanks for the very clear explanation Gilad! You beated with just 2 lines entire wiki and lots of other homepages with endless of chatter :) On Nov 7, 2011, at 6:19 PM, Gilad Shainer wrote: >> hi John, >> >> I had read already about subnet manager but i don't really >> understand this, >> except when it's only configuration tool. >> >> I assume it's not something that's critical in terms of bandwidth, >> it doesn't >> need nonstop bandwidth from the machine & switch is it? > > The subnet management is just an agent in the fabric that give > identifiers to the ports and set the routing in the fabric (in case > of static routing). It will also discover new nodes once connected > to the fabric, or nodes that went down (in the later case, it can > modify the routing accordingly). The agent requires negligible > network resources, so no need to worry. You can run the subnet > management from a server (head node for example using OpenSM for > example) or from one of the switches. > > Gilad > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Nov 7 13:16:00 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 07 Nov 2011 13:16:00 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> Message-ID: <4EB82060.3050300@ias.edu> Vincent, Don't forget that between SDR and QDR, there is DDR. If SDR is too slow, and QDR is too expensive, DDR might be just right. -- Goldilocks On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > hi Prentice, > > I had noticed the diff between SDR up to QDR, > the SDR cards are affordable, the QDR isn't. > > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap > prices in that pricerange yet. 
> > If i would want to build a network that's low latency and had a budget > of $800 or so a node of course i would > build a dolphin SCI network, as that's probably the fastest latency > card sold for a $675 or so a piece. > > I do not really see a rival latency wise to Dolphin there. I bet most > manufacturers selling clusters don't use > it as they can make $100 more profit or so selling other networking > stuff, and universities usually swallow that. > > So price total dominates the network. As it seems now infiniband 4x is > not going to offer enough performance. > The one-way pingpong latencies over a switch that i see of it, are not > very convincing. I see remote writes to RAM > are like nearly 10 microseconds for 4x infiniband and that card is the > only one affordable. > > The old QM400's i have here are one-way pingpong 2.1 us or so, and > QM500-B's are plentyful on the net (of course big disadvantage: needs > pci-x), > which are a 1.3 us or so there and have SHMEM. Not seeing a cheap > switch for the QM500's though nor cables. > > You see price really dominates everything here. Small cheap nodes you > cannot build if the port price, thanks to expensive network card, > more than doubles. > > Power is not the real concern for now - if a factory already burns a > couple of hundreds of megawatts, a small cluster somewhere on the > attick eating > a few kilowatts is not really a problem :) > > > > On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: > >> >> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >>> hi, >>> >>> There is a lot of infiniband 4x stuff on ebay now. >> >> Vincent, >> >> Do you mean 4x, or QDR? They refer to different parts of the IB >> architecture. 4x refers to the number of lanes for the data to travel >> down and QDR refers to the data signalling rate. >> >> It's probably irrelevant for this conversation, but if you are just >> learning about IB, It's good to understand that difference. >> >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 13:51:28 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 19:51:28 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <20111107113540.GO31847@leitl.org> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <20111107113540.GO31847@leitl.org> Message-ID: <5B66256F-ACFA-489B-AE01-44F5DB8F2B61@xs4all.nl> hi Eugen, In Game Tree Search basically algorithmically it is a century further than many other sciences as the brilliant minds have been busy with it. For the brilliant guys it was possible to make CASH with it. In Math there is stil many challenges to design 1 kick butt algorithm, but you won't get rich with it. As a result from Alan Turing up to the latest Einstein, they al have put their focus upon Game Tree Search. I'm moving now towards robotica in fact, building a robot. 
Not the robot, as i suck in building robots, but the software part so far hasn't been realy developed very well for robots. Unexplored area still for civil use that is. But as for the chessprograms now, they combine a bunch of algorithms and every single one of them profits bigtime (exponential) from caching. That caching is of course random. So the cluster we look at in number of nodes you can probably count at one hand, yet i intend to put 4 network cards (4 rails for insiders here) into each single machine. Machine is a big word, it wil be stand alone mainboard of course to save costs. So the price of each network card is fairly important. As it seems now, the old quadrics network cards QM500-B that you can pick up for $30 each or so on ebay are most promising. At Home i have a full working QM400 setup which is 2.1 us latency one way ping pong. So i'd guess a blocked read has a latency not much above that. I can choose myself whether i want to do reads of 128 bytes or 256 bytes. No big deal in fact. It's scathered through the RAM, so each read is a random read fromthe RAM. With 4 nodes that would mean of course odds 25% it's a local RAM read (no nothing network read then), and 75% odds it's somewhere in the gigabytes of RAM from a remote machine. As it seems now 4x infiniband has a blocked read latency that's too slow and i don't know for which sockets 4x works, as all testreports i read the 4x infiniband just works for old socket 604. So am not sure it works for socket 1366 let alone socket 1155; those have a different memory architecture so it's never sure whether a much older network card that works DMA will work for it. Also i hear nothing about putting several cards in 1 machine. I want at least 4 rails of course from those old crap cards. You'll argue that for 4x infiniband this is not very cost effective, as the price of 4 cards and 4 cables is already gonna be nearly 400 dollar. That's also what i noticed. But if i put in 2x QM500-B in for example a P6T professional, that's gonna be cheaper including the cables than $200 and it will be able to deliver i'd guess over a million blocked reads per second. By already doing 8 probes which is 192-256 bytes currently i already 'bandwidth optimized' the algorithm. Back in the days that Leierson at MiT ran cilkchess and other engines at the origin3800 there and some Sun supercomputers, they requested in slow manner a single probe of what will it have been, a byte or 8-12. So far it seems that 4 infiniband cards 4x can deliver me only 400k blocked reads a second, which is a workable number in fact (the amount i need depends largely upon how fast the node is) for a single socket machine. Yet i'm not aware whether infiniband allows multiple rails. Does it? The QM400 cards i have here, i'd guess can deliver with 4 rails around 1.2 million blocked reads a second, which already allows a lot faster nodes. The ideal kick butt machine so far is a simple supermicro mainboard with 4x pci-x and 4 sockets. Now it'll depend upon which cpu's i can get cheapest whether that's intel or AMD. If the 8 core socket 1366 cpu's are going to be cheap @ 22 nm, that's of course with some watercooling, say clock them to 4.5Ghz, gonna be kick butt nodes. Those mainboards allow "only" 2 rails, which definitely means that the QM400 cards, not to mention 4x infiniband is an underperformer. Up to 24 nodes, infiniband has cheap switches. 
But it seems only the newer infiniband cards have a latency that's sufficient, and all of them are far over $500, so that's far outside of budget. Even then they still can't beat a single QM500-B card. It's more than said that the top500 sporthall hardly needs bandwidth let alone latency. I saw that exactly a cluster in the same sporthall top500 with simple built in gigabit that isn't even DMA was only 2x slower than the same machines equipped with infiniband. Now some wil cry here that gigabit CAN have reasonable one way pingpong's, not to mention the $5k solarflare cards of 10 gigabit ethernet, yet in all sanity we must be honest that the built in gigabits from practical performance reasons are more like 500 microseconds latency if you have all cores busy. In fact even the realtime linux kernel will central lock every udp packet you ship or receive. Ugly ugly. That's no compare with the latencies of the HPC cards of course, whether you use MPI or SHMEM doesn't really matter there. That difference is so huge. As a result it seems there was never much of a push to having great network cards. That might change now with gpu's kicking butt, though those need of course massive bandwidth, not latency. For my tiny cluster latency is what matters. Usually 'one way pingpong' is a good representation of the speed of blocked reads, Quadrics excepted, as the SHMEM allows way faster blocked reads there than 2 times the price for a MPI one-way pingpong. Quadrics is dead and gone. Old junk. My cluster also will be old junk probably, with exception maybe of the cpu's. Yet if i don't find sponsorship for the cpu's, of course i'm on a big budget there as well. On Nov 7, 2011, at 12:35 PM, Eugen Leitl wrote: > On Mon, Nov 07, 2011 at 11:10:50AM +0000, John Hearns wrote: >> Vincent, >> I cannot answer all of your questions. >> I have a couple of answers: >> >> Regarding MPI, you will be looking for OpenMPI >> >> You will need a subnet manager running somewhere on the fabric. >> These can either run on the switch or on a host. >> If you are buying this equipment from eBay I would imagine you >> will be >> running the Open Fabrics subnet manager >> on a host on your cluster, rather than on a switch. >> I might be wrong - depends if the switch has a SM license. > > Assuming ebay-sourced equipment, what price tag > are we roughly looking at, per node, assuming small > (8-16 nodes) cluster sizes? > > -- > Eugen* Leitl leitl http://leitl.org > ______________________________________________________________ > ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
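
To put rough numbers on the design sketched in the message above, here is a small back-of-envelope calculator. It is an illustrative sketch, not a measurement: the inputs are the guesses quoted in this thread and can be edited freely. A process issuing strictly blocking reads completes at most 1/latency lookups per second, with the table split over N nodes only (N-1)/N of the probes leave the node, and that remote traffic is spread over however many rails are installed.

#include <stdio.h>

int main(void)
{
    /* Inputs: the thread's guesses, not measurements -- edit to taste. */
    double latency_us = 10.0;  /* assumed blocked-read latency, 4x SDR IB */
    int    procs      = 4;     /* search processes per node doing lookups */
    int    nodes      = 4;     /* nodes sharing the hashtable             */
    int    rails      = 4;     /* HCAs (rails) per node                   */

    double per_proc    = 1e6 / latency_us;             /* blocking lookups/s  */
    double remote_frac = (nodes - 1) / (double)nodes;  /* probes leaving node */
    double node_rate   = procs * per_proc;             /* node upper bound    */
    double per_rail    = node_rate * remote_frac / rails;

    printf("per process        : %9.0f lookups/s\n", per_proc);
    printf("remote fraction    : %9.1f %%\n", remote_frac * 100.0);
    printf("node upper bound   : %9.0f lookups/s\n", node_rate);
    printf("remote reads / rail: %9.0f per second\n", per_rail);
    return 0;
}

With these inputs it comes out to about 100k lookups per second per blocking process and 400k per node, in line with the figures above; lower latency or more outstanding reads per rail scales the totals accordingly.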
From diep at xs4all.nl Mon Nov 7 15:09:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 21:09:46 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB82060.3050300@ias.edu> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <4EB80906.4040501@ias.edu> <4EB82060.3050300@ias.edu> Message-ID: <0502713E-FED7-430B-9200-E9F19576C67C@xs4all.nl> It seems the latency of DDR infiniband to do a blocked read from remote memory (RDMA) is between that of SDR and quadrics, with quadrics being a lot faster. http://www.google.nl/url?sa=t&rct=j&q=rdma%20latency%20ddr% 20infiniband&source=web&cd=9&ved=0CF8QFjAI&url=http%3A%2F% 2Fwww.cse.scitech.ac.uk%2Fdisco%2Fmew18%2FPresentations%2FDay2% 2F5th_Session% 2FMarkLehrer.pdf&ei=tjW4ToWjOY2dOoD69esB&usg=AFQjCNEzRhG5ljCxmm1r0SMXVob nAbZUAQ&cad=rja If i click there i get to a MarkLehrer.pdf www.cse.scitech.ac.uk/disco/mew18/Presentations/.../MarkLehrer.pdf It claims a RDMA read has latency of 1.91 us. However i'll have to see that in my own benchmark first before i believe it when we hammer with many different processes at that card at the same time. You get problems like switch latencies and other nasty stuff then. This is a presentation slide and i need something that works in reality. HP 4X DDR InfiniBand Mezzanine HCA 410533-B21 SFF-8470 they're $75 but just 2 of them available on ebay. The next 'ddr' one is QLE7104 QLOGIC INFINIBAND 8X DDR SINGLE PORT HBA So that's a qlogic one, $108 just 3 of them available, but we already get at a dangerous price level. Remember i want well over a million reads getting done a second and i didn't count the pollution by writes even yet. HP 4X DDR InfiniBand Mezzanine HCA - 2 Ports 448262-B21 They're $121 and again just 2 available. This seems a problem with infiniband on ebay. Even if you search 16 cards, you can each time buy 2 or so max. As if sometimes a scientist takes 2 back home and puts 'em on ebay. No big 'old' masses get posted there. The first one to offer 10, that's http://www.ebay.com/itm/HP- INFINIBAND-4X-DDR-PCI-E-DUAL-PORT-HCA-448397B21-/110649801200? pt=COMP_EN_Hubs&hash=item19c33df9f0 That 's at $192.11 a piece. It seems DDR infiniband still isn't in my pricerange Prentice. The QM500-B's from quadrics go for between $30 and $50 however. On Nov 7, 2011, at 7:16 PM, Prentice Bisbal wrote: > Vincent, > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > slow, and QDR is too expensive, DDR might be just right. > > -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> hi Prentice, >> >> I had noticed the diff between SDR up to QDR, >> the SDR cards are affordable, the QDR isn't. >> >> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> prices in that pricerange yet. >> >> If i would want to build a network that's low latency and had a >> budget >> of $800 or so a node of course i would >> build a dolphin SCI network, as that's probably the fastest latency >> card sold for a $675 or so a piece. >> >> I do not really see a rival latency wise to Dolphin there. I bet most >> manufacturers selling clusters don't use >> it as they can make $100 more profit or so selling other networking >> stuff, and universities usually swallow that. >> >> So price total dominates the network. As it seems now infiniband >> 4x is >> not going to offer enough performance. >> The one-way pingpong latencies over a switch that i see of it, are >> not >> very convincing. 
I see remote writes to RAM >> are like nearly 10 microseconds for 4x infiniband and that card is >> the >> only one affordable. >> >> The old QM400's i have here are one-way pingpong 2.1 us or so, and >> QM500-B's are plentyful on the net (of course big disadvantage: needs >> pci-x), >> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> switch for the QM500's though nor cables. >> >> You see price really dominates everything here. Small cheap nodes you >> cannot build if the port price, thanks to expensive network card, >> more than doubles. >> >> Power is not the real concern for now - if a factory already burns a >> couple of hundreds of megawatts, a small cluster somewhere on the >> attick eating >> a few kilowatts is not really a problem :) >> >> >> >> On Nov 7, 2011, at 5:36 PM, Prentice Bisbal wrote: >> >>> >>> On 11/06/2011 06:01 PM, Vincent Diepeveen wrote: >>>> hi, >>>> >>>> There is a lot of infiniband 4x stuff on ebay now. >>> >>> Vincent, >>> >>> Do you mean 4x, or QDR? They refer to different parts of the IB >>> architecture. 4x refers to the number of lanes for the data to >>> travel >>> down and QDR refers to the data signalling rate. >>> >>> It's probably irrelevant for this conversation, but if you are just >>> learning about IB, It's good to understand that difference. >>> >>> Prentice >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Greg at Keller.net Mon Nov 7 15:21:51 2011 From: Greg at Keller.net (Greg Keller) Date: Mon, 07 Nov 2011 14:21:51 -0600 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: Message-ID: <4EB83DDF.5020902@Keller.net> > Date: Mon, 07 Nov 2011 13:16:00 -0500 > From: Prentice Bisbal > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > Cc: Beowulf Mailing List > Message-ID:<4EB82060.3050300 at ias.edu> > Content-Type: text/plain; charset=ISO-8859-1 > > Vincent, > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > slow, and QDR is too expensive, DDR might be just right. And for DDR a key thing is, when latency matters, "ConnectX" DDR is much better than the earlier "Infinihost III" DDR cards. We have 100's of each and the ConnectX make a large impact for some codes. Although nearly antique now, we actually have plans for the ConnectX cards in yet another round of updated systems. This is the 3rd Generation system I have been able to re-use the cards in (Harperton, Nehalem, and now Single Socket Sandy Bridge), which makes me very happy. A great investment that will likely live until PCI-Gen3 slots are the norm. -- Da Bears?! 
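
Since most of the latency figures traded in this thread are "one-way pingpong" numbers, it may help to spell out what that benchmark is. Below is a deliberately minimal MPI version, an illustrative sketch rather than anything from the thread; established suites such as the OSU micro-benchmarks or Intel IMB do the same measurement with more care. It bounces a 128-byte message between two ranks and reports half the average round trip as the one-way latency.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Run with one rank on each of two nodes, e.g. with Open MPI:
     *   mpirun -np 2 -host nodeA,nodeB ./pingpong                  */
    const int warmup = 100, iters = 10000;
    char buf[128] = {0};          /* small message: latency, not bandwidth */
    double t0 = 0.0, t1;
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < warmup + iters; i++) {
        if (i == warmup) {
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();     /* start timing after the warm-up round */
        }
        if (rank == 0) {
            MPI_Send(buf, 128, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 128, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)                /* half the round trip = one-way latency */
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / iters / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}

Half the round trip is the usual "one-way" figure; an RDMA read, by contrast, is itself a full round trip, which is why the two kinds of numbers quoted in this thread are not directly comparable.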
> -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> > hi Prentice, >> > >> > I had noticed the diff between SDR up to QDR, >> > the SDR cards are affordable, the QDR isn't. >> > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> > prices in that pricerange yet. >> > >> > If i would want to build a network that's low latency and had a budget >> > of $800 or so a node of course i would >> > build a dolphin SCI network, as that's probably the fastest latency >> > card sold for a $675 or so a piece. >> > >> > I do not really see a rival latency wise to Dolphin there. I bet most >> > manufacturers selling clusters don't use >> > it as they can make $100 more profit or so selling other networking >> > stuff, and universities usually swallow that. >> > >> > So price total dominates the network. As it seems now infiniband 4x is >> > not going to offer enough performance. >> > The one-way pingpong latencies over a switch that i see of it, are not >> > very convincing. I see remote writes to RAM >> > are like nearly 10 microseconds for 4x infiniband and that card is the >> > only one affordable. >> > >> > The old QM400's i have here are one-way pingpong 2.1 us or so, and >> > QM500-B's are plentyful on the net (of course big disadvantage: needs >> > pci-x), >> > which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> > switch for the QM500's though nor cables. >> > >> > You see price really dominates everything here. Small cheap nodes you >> > cannot build if the port price, thanks to expensive network card, >> > more than doubles. >> > >> > Power is not the real concern for now - if a factory already burns a >> > couple of hundreds of megawatts, a small cluster somewhere on the >> > attick eating >> > a few kilowatts is not really a problem:) >> > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 15:33:52 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 7 Nov 2011 21:33:52 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EB83DDF.5020902@Keller.net> References: <4EB83DDF.5020902@Keller.net> Message-ID: hi Greg, Very useful info! I already was wondering about the different timings i see for infiniband, but indeed it's the ConnectX that scores better in latency. $289 on ebay but that's directly QDR then. "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for Dell PowerEdge M1000e-Series Blade Servers" This 1.91 microseconds for a RDMA read is for a connectx. Not bad for Infiniband. Only 50% slower in latency than quadrics which is pci-x of course. Yet now needed is a cheap price for 'em :) It seems indeed all the 'cheap' offers are the infinihost III DDR versions. Regards, Vincent On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> Date: Mon, 07 Nov 2011 13:16:00 -0500 >> From: Prentice Bisbal >> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >> Cc: Beowulf Mailing List >> Message-ID:<4EB82060.3050300 at ias.edu> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> Vincent, >> >> Don't forget that between SDR and QDR, there is DDR. If SDR is too >> slow, and QDR is too expensive, DDR might be just right. 
> And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much > better than the earlier "Infinihost III" DDR cards. We have 100's of > each and the ConnectX make a large impact for some codes. Although > nearly antique now, we actually have plans for the ConnectX cards > in yet > another round of updated systems. This is the 3rd Generation system I > have been able to re-use the cards in (Harperton, Nehalem, and now > Single Socket Sandy Bridge), which makes me very happy. A great > investment that will likely live until PCI-Gen3 slots are the norm. > -- > Da Bears?! > >> -- >> Goldilocks >> >> >> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>> hi Prentice, >>>> >>>> I had noticed the diff between SDR up to QDR, >>>> the SDR cards are affordable, the QDR isn't. >>>> >>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>> cheap >>>> prices in that pricerange yet. >>>> >>>> If i would want to build a network that's low latency and had a >>>> budget >>>> of $800 or so a node of course i would >>>> build a dolphin SCI network, as that's probably the fastest >>>> latency >>>> card sold for a $675 or so a piece. >>>> >>>> I do not really see a rival latency wise to Dolphin there. I >>>> bet most >>>> manufacturers selling clusters don't use >>>> it as they can make $100 more profit or so selling other >>>> networking >>>> stuff, and universities usually swallow that. >>>> >>>> So price total dominates the network. As it seems now >>>> infiniband 4x is >>>> not going to offer enough performance. >>>> The one-way pingpong latencies over a switch that i see of it, >>>> are not >>>> very convincing. I see remote writes to RAM >>>> are like nearly 10 microseconds for 4x infiniband and that card >>>> is the >>>> only one affordable. >>>> >>>> The old QM400's i have here are one-way pingpong 2.1 us or so, and >>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>> needs >>>> pci-x), >>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>> switch for the QM500's though nor cables. >>>> >>>> You see price really dominates everything here. Small cheap >>>> nodes you >>>> cannot build if the port price, thanks to expensive network card, >>>> more than doubles. >>>> >>>> Power is not the real concern for now - if a factory already >>>> burns a >>>> couple of hundreds of megawatts, a small cluster somewhere on the >>>> attick eating >>>> a few kilowatts is not really a problem:) >>>> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 17:07:51 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 7 Nov 2011 22:07:51 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> Message-ID: RDMA read is a round trip operation and it is measured from host memory to host memory. I doubt if Quadrics had half of it for round trip operations measured from host memory to host memory. 
The PCI-X memory to card was around 0.7 by itself (one way).... Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Vincent Diepeveen Sent: Monday, November 07, 2011 12:33 PM To: Greg Keller Cc: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions hi Greg, Very useful info! I already was wondering about the different timings i see for infiniband, but indeed it's the ConnectX that scores better in latency. $289 on ebay but that's directly QDR then. "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for Dell PowerEdge M1000e-Series Blade Servers" This 1.91 microseconds for a RDMA read is for a connectx. Not bad for Infiniband. Only 50% slower in latency than quadrics which is pci-x of course. Yet now needed is a cheap price for 'em :) It seems indeed all the 'cheap' offers are the infinihost III DDR versions. Regards, Vincent On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> Date: Mon, 07 Nov 2011 13:16:00 -0500 >> From: Prentice Bisbal >> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >> Cc: Beowulf Mailing List >> Message-ID:<4EB82060.3050300 at ias.edu> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> Vincent, >> >> Don't forget that between SDR and QDR, there is DDR. If SDR is too >> slow, and QDR is too expensive, DDR might be just right. > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much better than the earlier "Infinihost III" DDR cards. We have > 100's of each and the ConnectX make a large impact for some codes. > Although nearly antique now, we actually have plans for the ConnectX > cards in yet another round of updated systems. This is the 3rd > Generation system I have been able to re-use the cards in (Harperton, > Nehalem, and now Single Socket Sandy Bridge), which makes me very > happy. A great investment that will likely live until PCI-Gen3 slots > are the norm. > -- > Da Bears?! > >> -- >> Goldilocks >> >> >> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>> hi Prentice, >>>> >>>> I had noticed the diff between SDR up to QDR, the SDR cards are >>>> affordable, the QDR isn't. >>>> >>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>> cheap prices in that pricerange yet. >>>> >>>> If i would want to build a network that's low latency and had a >>>> budget of $800 or so a node of course i would build a dolphin SCI >>>> network, as that's probably the fastest latency card sold for a >>>> $675 or so a piece. >>>> >>>> I do not really see a rival latency wise to Dolphin there. I bet >>>> most manufacturers selling clusters don't use it as they can make >>>> $100 more profit or so selling other networking stuff, and >>>> universities usually swallow that. >>>> >>>> So price total dominates the network. As it seems now infiniband >>>> 4x is not going to offer enough performance. >>>> The one-way pingpong latencies over a switch that i see of it, are >>>> not very convincing. I see remote writes to RAM are like nearly >>>> 10 microseconds for 4x infiniband and that card is the only one >>>> affordable. >>>> >>>> The old QM400's i have here are one-way pingpong 2.1 us or so, and >>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>> needs >>>> pci-x), >>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>> switch for the QM500's though nor cables. >>>> >>>> You see price really dominates everything here. 
Small cheap nodes >>>> you cannot build if the port price, thanks to expensive network >>>> card, more than doubles. >>>> >>>> Power is not the real concern for now - if a factory already burns >>>> a couple of hundreds of megawatts, a small cluster somewhere on >>>> the attick eating a few kilowatts is not really a problem:) >>>> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 18:25:56 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 00:25:56 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> Message-ID: <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Yeah well i'm no expert there what pci-x adds versus pci-e. I'm on a budget here :) I just test things and go for the fastest. But if we do theoretic math, SHMEM is difficult to beat of course. Google for measurements with shmem, not many out there. Fact that so few standardized/rewrote their floating point software to gpu's, is already saying enough about all the legacy codes in HPC world :) When some years ago i had a working 2 cluster node here with QM500- A , it had at 32 bits , 33Mhz pci long sleeve slots a blocked read latency of under 3 us is what i saw on my screen. Sure i had no switch in between it. Direct connection between the 2 elan4's. I'm not sure what pci-x adds to it when clocked at 133Mhz, but it won't be a big diff with pci-e. PCI-e probably only has a bigger bandwidth isn't it? Beating such hardware 2nd hand is difficult. $30 on ebay and i can install 4 rails or so. Didn't find the cables yet though... So i don't see how to outdo that with old infiniband cards which are $130 and upwards for the connectx, say $150 soon, which would allow only single rail or maybe at best 2 rails. So far didn't hear anyone yet who has more than single rail IB. Is it possible to install 2 rails with IB? So if i use your number in pessimistic manner, which means that there is some overhead of pci-x, then the connectx type IB, can do 1 million blocked reads per second theoretic with 2 rails. Which is $300 or so, cables not counted. Quadrics QM500 is around 2 million blocked reads per second for 4 rails @ $120 , cables not counted. Copper cables which have a cost of around 100 ns each 10 meters, if i use 1/3 of lightspeed for electrons in copper, those costs also are kept low with short cables. On Nov 7, 2011, at 11:07 PM, Gilad Shainer wrote: > RDMA read is a round trip operation and it is measured from host > memory to host memory. I doubt if Quadrics had half of it for round > trip operations measured from host memory to host memory. The PCI-X > memory to card was around 0.7 by itself (one way).... 
> > Gilad > > > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Monday, November 07, 2011 12:33 PM > To: Greg Keller > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > hi Greg, > > Very useful info! I already was wondering about the different > timings i see for infiniband, but indeed it's the ConnectX that > scores better in latency. > > $289 on ebay but that's directly QDR then. > > "ConnectX-2 Dual-Port VPI QDR Infiniband Mezzanine I/O Card for > Dell PowerEdge M1000e-Series Blade Servers" > > This 1.91 microseconds for a RDMA read is for a connectx. Not bad > for Infiniband. > Only 50% slower in latency than quadrics which is pci-x of course. > > Yet now needed is a cheap price for 'em :) > > It seems indeed all the 'cheap' offers are the infinihost III DDR > versions. > > Regards, > Vincent > > On Nov 7, 2011, at 9:21 PM, Greg Keller wrote: > >> >>> Date: Mon, 07 Nov 2011 13:16:00 -0500 >>> From: Prentice Bisbal >>> Subject: Re: [Beowulf] building Infiniband 4x cluster questions >>> Cc: Beowulf Mailing List >>> Message-ID:<4EB82060.3050300 at ias.edu> >>> Content-Type: text/plain; charset=ISO-8859-1 >>> >>> Vincent, >>> >>> Don't forget that between SDR and QDR, there is DDR. If SDR is too >>> slow, and QDR is too expensive, DDR might be just right. >> And for DDR a key thing is, when latency matters, "ConnectX" DDR is >> much better than the earlier "Infinihost III" DDR cards. We have >> 100's of each and the ConnectX make a large impact for some codes. >> Although nearly antique now, we actually have plans for the ConnectX >> cards in yet another round of updated systems. This is the 3rd >> Generation system I have been able to re-use the cards in (Harperton, >> Nehalem, and now Single Socket Sandy Bridge), which makes me very >> happy. A great investment that will likely live until PCI-Gen3 slots >> are the norm. >> -- >> Da Bears?! >> >>> -- >>> Goldilocks >>> >>> >>> On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >>>>> hi Prentice, >>>>> >>>>> I had noticed the diff between SDR up to QDR, the SDR cards are >>>>> affordable, the QDR isn't. >>>>> >>>>> The SDR's are all $50-$75 on ebay now. The QDR's i didn't find >>>>> cheap prices in that pricerange yet. >>>>> >>>>> If i would want to build a network that's low latency and had a >>>>> budget of $800 or so a node of course i would build a dolphin >>>>> SCI >>>>> network, as that's probably the fastest latency card sold for a >>>>> $675 or so a piece. >>>>> >>>>> I do not really see a rival latency wise to Dolphin there. I bet >>>>> most manufacturers selling clusters don't use it as they can >>>>> make >>>>> $100 more profit or so selling other networking stuff, and >>>>> universities usually swallow that. >>>>> >>>>> So price total dominates the network. As it seems now infiniband >>>>> 4x is not going to offer enough performance. >>>>> The one-way pingpong latencies over a switch that i see of it, >>>>> are >>>>> not very convincing. I see remote writes to RAM are like nearly >>>>> 10 microseconds for 4x infiniband and that card is the only one >>>>> affordable. >>>>> >>>>> The old QM400's i have here are one-way pingpong 2.1 us or so, >>>>> and >>>>> QM500-B's are plentyful on the net (of course big disadvantage: >>>>> needs >>>>> pci-x), >>>>> which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >>>>> switch for the QM500's though nor cables. 
>>>>> >>>>> You see price really dominates everything here. Small cheap nodes >>>>> you cannot build if the port price, thanks to expensive network >>>>> card, more than doubles. >>>>> >>>>> Power is not the real concern for now - if a factory already >>>>> burns >>>>> a couple of hundreds of megawatts, a small cluster somewhere on >>>>> the attick eating a few kilowatts is not really a problem:) >>>>> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing To change your subscription (digest mode or unsubscribe) >> visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jhh3851 at yahoo.com Mon Nov 7 18:44:41 2011 From: jhh3851 at yahoo.com (Joseph Han) Date: Mon, 7 Nov 2011 15:44:41 -0800 (PST) Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: Message-ID: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> To further complicate issue, if latency is the key driving factor for older hardware, I think that the chips with the Infinipath/Pathscale lineage tend to have lower latencies than the Mellanox Inifinihost line. ? When in the DDR time frame, I measured Infinipath ping-pong latencies 3-4x better than that of DDR Mellanox silicon. ?Of course, the Infinipath silicon will require different kernel drivers than those from Mellanox (ipath versus mthca). ?These were QLogic specific HCA's and not the rebranded Silverstorm HCA's sold by QLogic. ?(Confused yet?) ?I believe that the model number was QLogic 7240 for the DDR version and QLogic 7140 for the SDR one. Joseph Message: 2 Date: Mon, 07 Nov 2011 14:21:51 -0600 From: Greg Keller Subject: Re: [Beowulf] building Infiniband 4x cluster questions To: beowulf at beowulf.org Message-ID: <4EB83DDF.5020902 at Keller.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed > Date: Mon, 07 Nov 2011 13:16:00 -0500 > From: Prentice Bisbal > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > Cc: Beowulf Mailing List > Message-ID:<4EB82060.3050300 at ias.edu> > Content-Type: text/plain; charset=ISO-8859-1 > > Vincent, > > Don't forget that between SDR and QDR, there is DDR.? If SDR is too > slow, and QDR is too expensive, DDR might be just right. And for DDR a key thing is, when latency matters, "ConnectX" DDR is much better than the earlier "Infinihost III" DDR cards.? We have 100's of each and the ConnectX make a large impact for some codes.? Although nearly antique now, we actually have plans for the ConnectX cards in yet another round of updated systems.? This is the 3rd Generation system I have been able to re-use the cards in (Harperton, Nehalem, and now Single Socket Sandy Bridge), which makes me very happy.? A great investment that will likely live until PCI-Gen3 slots are the norm. -- Da Bears?! > -- > Goldilocks > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: >> >? hi Prentice, >> > >> >? 
I had noticed the diff between SDR up to QDR, >> >? the SDR cards are affordable, the QDR isn't. >> > >> >? The SDR's are all $50-$75 on ebay now. The QDR's i didn't find cheap >> >? prices in that pricerange yet. >> > >> >? If i would want to build a network that's low latency and had a budget >> >? of $800 or so a node of course i would >> >? build a dolphin SCI network, as that's probably the fastest latency >> >? card sold for a $675 or so a piece. >> > >> >? I do not really see a rival latency wise to Dolphin there. I bet most >> >? manufacturers selling clusters don't use >> >? it as they can make $100 more profit or so selling other networking >> >? stuff, and universities usually swallow that. >> > >> >? So price total dominates the network. As it seems now infiniband 4x is >> >? not going to offer enough performance. >> >? The one-way pingpong latencies over a switch that i see of it, are not >> >? very convincing. I see remote writes to RAM >> >? are like nearly 10 microseconds for 4x infiniband and that card is the >> >? only one affordable. >> > >> >? The old QM400's i have here are one-way pingpong 2.1 us or so, and >> >? QM500-B's are plentyful on the net (of course big disadvantage: needs >> >? pci-x), >> >? which are a 1.3 us or so there and have SHMEM. Not seeing a cheap >> >? switch for the QM500's though nor cables. >> > >> >? You see price really dominates everything here. Small cheap nodes you >> >? cannot build if the port price, thanks to expensive network card, >> >? more than doubles. >> > >> >? Power is not the real concern for now - if a factory already burns a >> >? couple of hundreds of megawatts, a small cluster somewhere on the >> >? attick eating >> >? a few kilowatts is not really a problem:) >> > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Mon Nov 7 18:57:45 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 00:57:45 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> Message-ID: <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> On Nov 8, 2011, at 12:44 AM, Joseph Han wrote: > To further complicate issue, if latency is the key driving factor > for older hardware, I think that the chips with the Infinipath/ > Pathscale lineage tend to have lower latencies than the Mellanox > Inifinihost line. > > When in the DDR time frame, I measured Infinipath ping-pong > latencies 3-4x better than that of DDR Mellanox silicon. Of > course, the Infinipath silicon will require different kernel > drivers than those from Mellanox (ipath versus mthca). These were > QLogic specific HCA's and not the rebranded Silverstorm HCA's sold > by QLogic. (Confused yet?) I believe that the model number was > QLogic 7240 for the DDR version and QLogic 7140 for the SDR one. > > Joseph > Claim of manufactuer is 1.2 us one-way pingpong for QLE7240. 
Of course to get to that number possibly they would've needed to use their grandmother analogue stopwatch, but even 1.2 us ain't bad :) 95 dollar on ebay. Anyone having even better news? Vincent > > > Message: 2 > Date: Mon, 07 Nov 2011 14:21:51 -0600 > From: Greg Keller > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > To: beowulf at beowulf.org > Message-ID: <4EB83DDF.5020902 at Keller.net> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > Date: Mon, 07 Nov 2011 13:16:00 -0500 > > From: Prentice Bisbal > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > Cc: Beowulf Mailing List > > Message-ID:<4EB82060.3050300 at ias.edu> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Vincent, > > > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > > slow, and QDR is too expensive, DDR might be just right. > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > much > better than the earlier "Infinihost III" DDR cards. We have 100's of > each and the ConnectX make a large impact for some codes. Although > nearly antique now, we actually have plans for the ConnectX cards > in yet > another round of updated systems. This is the 3rd Generation system I > have been able to re-use the cards in (Harperton, Nehalem, and now > Single Socket Sandy Bridge), which makes me very happy. A great > investment that will likely live until PCI-Gen3 slots are the norm. > -- > Da Bears?! > > > -- > > Goldilocks > > > > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > >> > hi Prentice, > >> > > >> > I had noticed the diff between SDR up to QDR, > >> > the SDR cards are affordable, the QDR isn't. > >> > > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't > find cheap > >> > prices in that pricerange yet. > >> > > >> > If i would want to build a network that's low latency and had > a budget > >> > of $800 or so a node of course i would > >> > build a dolphin SCI network, as that's probably the fastest > latency > >> > card sold for a $675 or so a piece. > >> > > >> > I do not really see a rival latency wise to Dolphin there. I > bet most > >> > manufacturers selling clusters don't use > >> > it as they can make $100 more profit or so selling other > networking > >> > stuff, and universities usually swallow that. > >> > > >> > So price total dominates the network. As it seems now > infiniband 4x is > >> > not going to offer enough performance. > >> > The one-way pingpong latencies over a switch that i see of > it, are not > >> > very convincing. I see remote writes to RAM > >> > are like nearly 10 microseconds for 4x infiniband and that > card is the > >> > only one affordable. > >> > > >> > The old QM400's i have here are one-way pingpong 2.1 us or > so, and > >> > QM500-B's are plentyful on the net (of course big > disadvantage: needs > >> > pci-x), > >> > which are a 1.3 us or so there and have SHMEM. Not seeing a > cheap > >> > switch for the QM500's though nor cables. > >> > > >> > You see price really dominates everything here. Small cheap > nodes you > >> > cannot build if the port price, thanks to expensive network > card, > >> > more than doubles. 
> >> > > >> > Power is not the real concern for now - if a factory already > burns a > >> > couple of hundreds of megawatts, a small cluster somewhere on > the > >> > attick eating > >> > a few kilowatts is not really a problem:) > >> > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Mon Nov 7 20:46:38 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Tue, 8 Nov 2011 01:46:38 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> References: <4EB83DDF.5020902@Keller.net> <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Message-ID: > I just test things and go for the fastest. But if we do theoretic math, SHMEM > is difficult to beat of course. > Google for measurements with shmem, not many out there. SHMEM within the node or between nodes? > Fact that so few standardized/rewrote their floating point software to gpu's, > is already saying enough about all the legacy codes in HPC world :) > > When some years ago i had a working 2 cluster node here with QM500- A , it > had at 32 bits , 33Mhz pci long sleeve slots a blocked read latency of under 3 > us is what i saw on my screen. Sure i had no switch in between it. Direct > connection between the 2 elan4's. > > I'm not sure what pci-x adds to it when clocked at 133Mhz, but it won't be a > big diff with pci-e. There is a big different between PCIX and PCIe. PCIe is half the latency - from 0.7 to 0.3 more or less. > PCI-e probably only has a bigger bandwidth isn't it? Also bandwidth ...:-) > Beating such hardware 2nd hand is difficult. $30 on ebay and i can install 4 > rails or so. > Didn't find the cables yet though... > > So i don't see how to outdo that with old infiniband cards which are > $130 and upwards for the connectx, say $150 soon, which would allow only > single rail > or maybe at best 2 rails. So far didn't hear anyone yet who has more than > single rail IB. > > Is it possible to install 2 rails with IB? Yes, you can do dual rails > So if i use your number in pessimistic manner, which means that there is > some overhead of pci-x, then the connectx type IB, can do 1 million blocked > reads per second theoretic with 2 rails. Which is $300 or so, cables not > counted. Are you referring to RDMA reads? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
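The exchange above comes down to one-sided access: small remote reads and writes that the busy remote cores never have to service, which is what Gilad's "RDMA reads" question is getting at. Without SHMEM hardware, the closest standard expression of that pattern is MPI-2 passive-target one-sided communication, which IB-capable MPI libraries of this period (MVAPICH2, Open MPI) can map onto RDMA for contiguous transfers. The sketch below is a minimal illustration under those assumptions, not anyone's production code; the window size, the displacements, and the 128-byte read / 32-byte write sizes echo figures from the thread and are otherwise arbitrary.

    /* one_sided.c -- minimal sketch of SHMEM-style access using MPI-2
     * one-sided operations: small remote reads/writes against a passive
     * target, so the remote process never posts a matching receive.
     * Sizes (128-byte read, 32-byte write) mirror figures in the thread
     * and are purely illustrative.  Build with e.g.: mpicc -O2 one_sided.c
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define WIN_BYTES (1 << 20)     /* 1 MiB exposed per rank, arbitrary */

    int main(int argc, char **argv)
    {
        int rank, size;
        char *base;                 /* memory exposed for remote access */
        char rdbuf[128], wrbuf[32];
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Allocate and expose a window; MPI_Alloc_mem lets the library
         * hand back registered (pinned) memory suitable for RDMA. */
        MPI_Alloc_mem(WIN_BYTES, MPI_INFO_NULL, &base);
        memset(base, rank, WIN_BYTES);
        MPI_Win_create(base, WIN_BYTES, 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);

        if (rank == 0 && size > 1) {
            int target = 1;         /* illustrative target rank */
            memset(wrbuf, 0xab, sizeof wrbuf);

            /* Passive-target epoch: the target's CPU is not involved. */
            MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
            MPI_Get(rdbuf, sizeof rdbuf, MPI_BYTE,
                    target, 0 /* displacement */,
                    sizeof rdbuf, MPI_BYTE, win);
            MPI_Put(wrbuf, sizeof wrbuf, MPI_BYTE,
                    target, 4096 /* displacement */,
                    sizeof wrbuf, MPI_BYTE, win);
            MPI_Win_unlock(target, win);   /* completes both operations */

            printf("rank 0 read byte 0x%02x from rank %d\n",
                   (unsigned char)rdbuf[0], target);
        }

        MPI_Win_free(&win);
        MPI_Free_mem(base);
        MPI_Finalize();
        return 0;
    }

Whether a given MPI_Get actually becomes a single RDMA read on the wire depends on the library, the transport and the synchronization mode used, so that should be verified against the MPI implementation's documentation or a profiler rather than assumed.
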
From Shainer at Mellanox.com Mon Nov 7 20:53:55 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Tue, 8 Nov 2011 01:53:55 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> Message-ID: The latency numbers are more or less the same between the IB vendors on SDR, DDR and QDR. Mellanox is the only vendor with FDR IB for now, and with PCIe 3.0 latency are below 1us (RDMA much below...). Question is what you are going to use the system for - which apps. Gilad > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf- > bounces at beowulf.org] On Behalf Of Vincent Diepeveen > Sent: Monday, November 07, 2011 3:58 PM > To: jhh3851 at yahoo.com > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > > On Nov 8, 2011, at 12:44 AM, Joseph Han wrote: > > > To further complicate issue, if latency is the key driving factor for > > older hardware, I think that the chips with the Infinipath/ Pathscale > > lineage tend to have lower latencies than the Mellanox Inifinihost > > line. > > > > When in the DDR time frame, I measured Infinipath ping-pong latencies > > 3-4x better than that of DDR Mellanox silicon. Of course, the > > Infinipath silicon will require different kernel drivers than those > > from Mellanox (ipath versus mthca). These were QLogic specific HCA's > > and not the rebranded Silverstorm HCA's sold by QLogic. (Confused > > yet?) I believe that the model number was QLogic 7240 for the DDR > > version and QLogic 7140 for the SDR one. > > > > Joseph > > > > Claim of manufactuer is 1.2 us one-way pingpong for QLE7240. Of course to > get to that number possibly they would've needed to use their grandmother > analogue stopwatch, but even 1.2 us ain't bad :) > > 95 dollar on ebay. > > Anyone having even better news? > > Vincent > > > > > > > Message: 2 > > Date: Mon, 07 Nov 2011 14:21:51 -0600 > > From: Greg Keller > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > To: beowulf at beowulf.org > > Message-ID: <4EB83DDF.5020902 at Keller.net> > > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > > > > Date: Mon, 07 Nov 2011 13:16:00 -0500 > > > From: Prentice Bisbal > > > Subject: Re: [Beowulf] building Infiniband 4x cluster questions > > > Cc: Beowulf Mailing List > > > Message-ID:<4EB82060.3050300 at ias.edu> > > > Content-Type: text/plain; charset=ISO-8859-1 > > > > > > Vincent, > > > > > > Don't forget that between SDR and QDR, there is DDR. If SDR is too > > > slow, and QDR is too expensive, DDR might be just right. > > And for DDR a key thing is, when latency matters, "ConnectX" DDR is > > much better than the earlier "Infinihost III" DDR cards. We have > > 100's of each and the ConnectX make a large impact for some codes. > > Although nearly antique now, we actually have plans for the ConnectX > > cards in yet another round of updated systems. This is the 3rd > > Generation system I have been able to re-use the cards in (Harperton, > > Nehalem, and now Single Socket Sandy Bridge), which makes me very > > happy. A great investment that will likely live until PCI-Gen3 slots > > are the norm. > > -- > > Da Bears?! 
> > > > > -- > > > Goldilocks > > > > > > > > > On 11/07/2011 11:58 AM, Vincent Diepeveen wrote: > > >> > hi Prentice, > > >> > > > >> > I had noticed the diff between SDR up to QDR, the SDR cards are > > >> > affordable, the QDR isn't. > > >> > > > >> > The SDR's are all $50-$75 on ebay now. The QDR's i didn't > > find cheap > > >> > prices in that pricerange yet. > > >> > > > >> > If i would want to build a network that's low latency and had > > a budget > > >> > of $800 or so a node of course i would build a dolphin SCI > > >> > network, as that's probably the fastest > > latency > > >> > card sold for a $675 or so a piece. > > >> > > > >> > I do not really see a rival latency wise to Dolphin there. I > > bet most > > >> > manufacturers selling clusters don't use it as they can make > > >> > $100 more profit or so selling other > > networking > > >> > stuff, and universities usually swallow that. > > >> > > > >> > So price total dominates the network. As it seems now > > infiniband 4x is > > >> > not going to offer enough performance. > > >> > The one-way pingpong latencies over a switch that i see of > > it, are not > > >> > very convincing. I see remote writes to RAM are like nearly 10 > > >> > microseconds for 4x infiniband and that > > card is the > > >> > only one affordable. > > >> > > > >> > The old QM400's i have here are one-way pingpong 2.1 us or > > so, and > > >> > QM500-B's are plentyful on the net (of course big > > disadvantage: needs > > >> > pci-x), > > >> > which are a 1.3 us or so there and have SHMEM. Not seeing a > > cheap > > >> > switch for the QM500's though nor cables. > > >> > > > >> > You see price really dominates everything here. Small cheap > > nodes you > > >> > cannot build if the port price, thanks to expensive network > > card, > > >> > more than doubles. > > >> > > > >> > Power is not the real concern for now - if a factory already > > burns a > > >> > couple of hundreds of megawatts, a small cluster somewhere on > > the > > >> > attick eating > > >> > a few kilowatts is not really a problem:) > > >> > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing To change your subscription (digest mode or unsubscribe) > > visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 7 21:33:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 03:33:46 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB83DDF.5020902@Keller.net> <147D1320-4C2E-423D-95BC-3F5F8AEEFDA5@xs4all.nl> Message-ID: <857E3421-8EB5-4260-81BF-4AADEFF706C2@xs4all.nl> On Nov 8, 2011, at 2:46 AM, Gilad Shainer wrote: >> I just test things and go for the fastest. But if we do theoretic >> math, SHMEM >> is difficult to beat of course. >> Google for measurements with shmem, not many out there. > > SHMEM within the node or between nodes? 
shmem is the programming library that cray had and that quadrics had. so basically your program doesn't need silly message catching mpi commands everywhere. You only define at program start whether an array is getting tracked by elan4 and which nodes it gets updated to etc. So no need to check for MPI overfows for the complex code of starting / stopping cpu's. Can reuse code there easily to start remote nodes and cpu's. So where the majority of the latency is needed for RDMA reads and/or reads from remote elan memory, the tough yet in overhead neglectible complicated code to start/stop cpu's, is a bit easier to program with SHMEM library. the caches on the quadrics cards have shmem so you don't access the RAM at all, it's already in the cards. didn't check whether those features got added to mpi somehow. so you just need to read the card - it's not gonna go through pci-x at all at the remote node. Yet of course all this is not so relevant to explain here - as quadrics is long gone, and i just search for a cheapo solution :) So you lose only 2x the pci-x latency, versus 4x pci-e latency in such case. In case of a RDMA read i doubt latency of DDR infiniband is faster than quadrics. that 0.7 you mentionned if it is microseconds sounds like a bit overestimated latency for pci-x. From the 1.3 us that the MPI-one-way pingpong is at QM500, if we multiply it by 2 it's 2.6 us. From that 2.6 us, according to your math it's already 2.8 us cost to pci-x, then , which has a cost of 2x pci-x, receiving elan has a cost of 130 ns, switch say 300 ns including cables for a 128 port router, 100 ns from the sending elan. that's 530 ns, and that times 2 is 1060 ns. There's really little left for the pci-x. as 2.6 - 1.06 = 1.44 us left for 4 times pci-x. 1.44 / 4 = 0.36 us for pci-x. I used the Los Alamos National Laboratory example numbers here for elan4. In the end it is about price, not user friendliness of programming :) > > >> Fact that so few standardized/rewrote their floating point >> software to gpu's, >> is already saying enough about all the legacy codes in HPC world :) >> >> When some years ago i had a working 2 cluster node here with >> QM500- A , it >> had at 32 bits , 33Mhz pci long sleeve slots a blocked read >> latency of under 3 >> us is what i saw on my screen. Sure i had no switch in between it. >> Direct >> connection between the 2 elan4's. >> >> I'm not sure what pci-x adds to it when clocked at 133Mhz, but it >> won't be a >> big diff with pci-e. > > There is a big different between PCIX and PCIe. PCIe is half the > latency - from 0.7 to 0.3 more or less. > Well i'm not so sure the difference is that huge. All those measurements in past was at oldie Xeon P4 machines, and i've never really seen a good comparision there. Furthermore fabrics like Dolphin at the time with a 66Mhz, 64 bits PCI card already got like 1.36 us one-way pingpong latencies, not exactly a lot slower than DDR infinibands qlogics of a claimed 1.2 us. >> PCI-e probably only has a bigger bandwidth isn't it? > > Also bandwidth ...:-) That's a non discussion here. I need latency :) If i'd really need big bandwidth for transport i'd use of course a boat - 90% of all cargo here gets transported over the rivers and hand dug canal; especially river Rhine. > >> Beating such hardware 2nd hand is difficult. $30 on ebay and i can >> install 4 >> rails or so. >> Didn't find the cables yet though... 
>> >> So i don't see how to outdo that with old infiniband cards which are >> $130 and upwards for the connectx, say $150 soon, which would >> allow only >> single rail >> or maybe at best 2 rails. So far didn't hear anyone yet who has >> more than >> single rail IB. >> >> Is it possible to install 2 rails with IB? > > Yes, you can do dual rails very well > >> So if i use your number in pessimistic manner, which means that >> there is >> some overhead of pci-x, then the connectx type IB, can do 1 >> million blocked >> reads per second theoretic with 2 rails. Which is $300 or so, >> cables not >> counted. > > Are you referring to RDMA reads? > As i use all cpu cores 100%, i simply cannot catch mpi messages, let alone overflow. So anything that has the cards processor do the job of digging inthe RAM rather than bug one of the very busy cores, is very welcome form of communication. 99.9% of all communication to remote nodes is 32 byte RDMA wites and 128-256 byte reads. I can set myself whether it's 128, 192 or 256. Probably i'll make it 128. The number of reads is a few percent more than writes. That other 0.01% is the very complex parallel algorithm that basically parallellizes a sequential algorithm. That algorithm is a 150 pages of a4 roughly full of insights and proofs why it works correct :) > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Tue Nov 8 05:24:07 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 8 Nov 2011 11:24:07 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <4EB7EF12.8000403@science-computing.de> Message-ID: On Nov 7, 2011, at 6:16 PM, Gilad Shainer wrote: >> They use the term "message oriented" with the description that the IB >> hardware takes care of segmentation and so forth, so that the >> application >> just says "send this" or "receive this" and the gory details are >> concealed. Then he distinguishes that from a TCP/IP stack, etc., >> where >> the software does a lot of this, with the implication that the >> user has to be >> involved in that. >> >> But it seems to me that the same processes are going on.. You have >> a big >> message, it needs to be broken up, etc. >> And for *most users* all that is hidden underneath the hood of, >> say, MPI. >> (obviously, if you are a message passing software writer, the >> distinction is >> important). > > You can also post large message to the IB interface (up to 2GB I > believe) and the IB transport will break it to the network MTU. > > Gilad > Please note that searching only requires massive amounts of short data requests, say 128 bytes and massive stores of 32 bytes. So latency of the network cards and how fast the cards can switch from proces to proces, those latencies play a far more important role than all those single core latencies that everyone always posts. Some cards when switching from helping 1 proces to another can have a penalty of dozens of microseconds; you never hear from all those hidden penalties as the few online tests done by all those academics are always single core tests with the rest of the cluster idle. Interesting is to hear experiences there, but reality is that you hardly ever hear that. You have to gamble what to buy. 
What i do know is that the MPI programming model is not very attractive; you have to catch all those messages shipped somewhere, check for overflow and so on. Yet i'm on a budget here, so price dominates everything. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Nov 8 16:03:27 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 8 Nov 2011 16:03:27 -0500 Subject: [Beowulf] HP redstone servers In-Reply-To: <4EB0A678.9060602@cse.ucdavis.edu> References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: ARM is an interesting platform that offers better performance/power ratio than x64 processors. I don't think ARM will eat into HPC shares of AMD/Intel/IBM POWER or enter the TOP500 list any time soon. However, I am expecting to see ARM in high throughput environments in the near future. Thus, we are announcing that the next version of Grid Engine released by the Grid Scheduler open project will support ARM Linux. We tested SGE on an ARMv7 box. As the SGE code is 64-bit clean, when 64-bit ARM processors come out in the next year or two, our version should/will compile & work out of the box. Rayson ================================= Grid Engine / Open Grid Scheduler http://gridscheduler.sourceforge.net On Tue, Nov 1, 2011 at 10:10 PM, Bill Broadley wrote: > The best summary I've found: > http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ > > Specifications at for the ECX-1000: > http://www.calxeda.com/products/energycore/ecx1000/techspecs > > And EnergyCard: > http://www.calxeda.com/products/energycards/techspecs > > The only hint on price that I found was from theregister.co.uk: > ?The sales pitch for the Redstone systems, says Santeler, is that a > ?half rack of Redstone machines and their external switches > ?implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, > ?and costs $1.2m. > > So it sounds like for 6 watts and $750 you get a quad core 1.4 GHz arm > 10G connected node. > > Comments? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Rayson ================================================== Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From diep at xs4all.nl Tue Nov 8 20:01:47 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 9 Nov 2011 02:01:47 +0100 Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: hi Rayson, Most interesting stuff. The question i ask myself. Why is it so expensive? If i do a silly compare, just looking to the Ghz. Then a quad core 1.4Ghz is similar to a single core i7 @ 1.5Ghz roughly for Diep. I rounded up optimistically the IPC of diep at a single ARM core to 0.5 (if you realize a bulldozer core gets like 0.73, you'll realize the problem of this optimistic guess, whereas an i7 core is over 1.73+ ). Diep being in principle a 32 bits integer program, just 64 bits compiled for a bigger caching range (hashtable) of course profits perfectly from ARM. You won't find much software that can run better on such ARM cpu's than a chessprogram. So 1600 nodes then is like 800 cores 3Ghz i7. Or a 100 socket machine i7 @ 8 cores a CPU, or a 128 socket machine i7 @ 6 cores a CPU. The 6 core Xeons actually are a tad higher clocked than 3Ghz, but let's forget about that now. Now getting that with a good network might not be so cheap, but so to speak there is a budget of far over 1.2 million / 128 = $9375 per socket. So that 's a 64 node switch and 64 nodes dual socket Xeon. That gives a budget of $18750 a node. Pretty easy to build i'd say so. Now performance a watt. Of course something ARM is good at. With 64 nodes that means 9900 watt / 64 = 154 watt per node. We can be sure that the Xeon burn more than that. Yet it's not much more than factor 2 off and everywhere so far i rounded off optimistically for the ARM. I took 3Ghz cpu's, in reality they're higher clocked. I took 6 cores, in reality they're soon 8 cores a node. I took an IPC of 0.5 for the arm cores, and we must still see they will get that IPC, most likely they won't. So it's nearly on par if we do a real accurate calculation. It's not like there is much of a margin in power consumption versus optimized i7 code. This factor 2 evaporates practical. Who would anyone be interested in buying this at this huge price with as far as i can see 0 advantages. On Nov 8, 2011, at 10:03 PM, Rayson Ho wrote: > ARM is an interesting platform that offers better performance/power > ratio than x64 processors. I don't think ARM will eat into HPC shares > of AMD/Intel/IBM POWER or enter the TOP500 list any time soon. > However, I am expecting to see ARM in high throughput environments in > the near future. Thus, we are announcing that the next version of Grid > Engine released by the Grid Scheduler open project will support ARM > Linux. > > We tested SGE on an ARMv7 box. As the SGE code is 64-bit clean, when > 64-bit ARM processors come out in the next year or two, our version > should/will compile & work out of the box. 
> > Rayson > > ================================= > Grid Engine / Open Grid Scheduler > http://gridscheduler.sourceforge.net > > > > On Tue, Nov 1, 2011 at 10:10 PM, Bill Broadley > wrote: >> The best summary I've found: >> http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ >> >> Specifications at for the ECX-1000: >> http://www.calxeda.com/products/energycore/ecx1000/techspecs >> >> And EnergyCard: >> http://www.calxeda.com/products/energycards/techspecs >> >> The only hint on price that I found was from theregister.co.uk: >> The sales pitch for the Redstone systems, says Santeler, is that a >> half rack of Redstone machines and their external switches >> implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, >> and costs $1.2m. >> >> So it sounds like for 6 watts and $750 you get a quad core 1.4 GHz >> arm >> 10G connected node. >> >> Comments? >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > > -- > Rayson > > ================================================== > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Thu Nov 10 12:04:44 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 10 Nov 2011 12:04:44 -0500 (EST) Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: > Who would anyone be interested in buying this at this huge price with > as far as i can see 0 advantages. it's for memcached. it's not for chess. I'm not sure how HPC-friendly the current rev is, in that it has fairly limited-seeming memory bandwidth. The most interesting aspect is the onchip networking (5-degree 10Gb). _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Thu Nov 10 17:02:27 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 10 Nov 2011 17:02:27 -0500 (EST) Subject: [Beowulf] SC 2011 BeoBash Reminder Message-ID: <45024.192.168.93.213.1320962547.squirrel@mail.eadline.org> In case you missed the previous announcement http://www.xandmarketing.com/beobash11/ -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Thu Nov 10 20:23:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 11 Nov 2011 02:23:46 +0100 Subject: [Beowulf] HP redstone servers In-Reply-To: References: <4EB0A678.9060602@cse.ucdavis.edu> Message-ID: <602F6A42-E8E6-4AA0-9C0F-5372C651A680@xs4all.nl> On Nov 10, 2011, at 6:04 PM, Mark Hahn wrote: >> Who would anyone be interested in buying this at this huge price with >> as far as i can see 0 advantages. > > it's for memcached. it's not for chess. > > I'm not sure how HPC-friendly the current rev is, in that it has > fairly > limited-seeming memory bandwidth. The most interesting aspect > is the onchip networking (5-degree 10Gb). It's eating too much power and too expensive for telecom networking i'd guess. They always speak about power in telecom, but all examples i have there is that price dominates everything there. Power constraints soon get dropped when eating more power is cheaper. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From tegner at renget.se Fri Nov 11 02:10:59 2011 From: tegner at renget.se (Jon Tegner) Date: Fri, 11 Nov 2011 08:10:59 +0100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> Message-ID: <4EBCCA83.4010408@renget.se> >>> DRIVERS: >>> Drivers for cards now. Are those all open source, or does it require >>> payment? Is the source released of all those cards drivers, and do >>> they integrate into linux? >> You should get everything you need from the Linux kernel and / or OFED. > > You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows > > I'm using Qlogic drivers, it works well, but has the drawback that I'm limited to the kernel required for those drivers (which since I'm using CentOS means that I can only use CentOS-5.5). Would there be any disadvantages involved in instead use the stuff from the kernel/OFED directly? Regards, /jon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From prentice at ias.edu Fri Nov 11 09:18:37 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 11 Nov 2011 09:18:37 -0500 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EBCCA83.4010408@renget.se> References: <701E477D-4F79-4E35-B035-C5E55097E6BD@xs4all.nl> <1320666289.1856.25.camel@moelwyn> <4EBCCA83.4010408@renget.se> Message-ID: <4EBD2EBD.1030901@ias.edu> On 11/11/2011 02:10 AM, Jon Tegner wrote: >>>> DRIVERS: >>>> Drivers for cards now. Are those all open source, or does it require >>>> payment? Is the source released of all those cards drivers, and do >>>> they integrate into linux? >>> You should get everything you need from the Linux kernel and / or OFED. >> You can also find the drivers on the vendors sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows >> >> > I'm using Qlogic drivers, it works well, but has the drawback that I'm > limited to the kernel required for those drivers (which since I'm using > CentOS means that I can only use CentOS-5.5). > > Would there be any disadvantages involved in instead use the stuff from > the kernel/OFED directly? > Jon, The Mellanox OFED drivers come as an .iso file. Inside that ISO image is a script that will rebuild all the mellanox packages for newer/different kernels. It's a couple of extra steps (mount iso image, extract script, run it, yada, yada, yada), but it works very well. But that's Mellanox, and your concerned about QLogic. I don't know how the QLogic drivers are bundled, but look around the files provided to see if there's a utility script that does the same thing for QLogic, or at least instructions on how to recompile against different kernel versions. Since OFED is open source, QLogic should provide the source code to their drivers. The only disadvantage with using stuff directly from the kernel/OFED is that if you have newer cards with new features, the software to support those new features may not have trickled down into the official OFED distro or kernel, and then into your Linux distro of choice. For example, my current cluster was installed in the fall of 2008, using RHEL 5. The software/drivers in RHEL 5 worked just fine. Last year. I added a couple of new nodes with GPUs that had newer Mellanox HBAs. They wouldn't work with RHEL 5. I needed the OFED software provided from the Mellanox site. I was hoping Mellanox's additions in that OFED distro made it into RHEL 6, but I just upgraded my cluster nodes and still needed to download the Mellanox OFED package. I'm sure by RHEL 7 or 8, those HBAs will be supported directly by the distro. This is a problem of multiple lags. The vendor makes changes to the kernel/OFED software to support their latest technology, and then you have your first lag as the vendor tries to get the changes merged into the Linux kernel and the offiial OFED distro. Then there's a second lag as those changes get merged into the different Linux distros. For a conservative, stability-as-first-priority distro like RHEL, that second lag can be really long. :( -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From samuel at unimelb.edu.au Sun Nov 13 21:40:52 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Mon, 14 Nov 2011 13:40:52 +1100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> Message-ID: <4EC07FB4.2020902@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gilad, On 08/11/11 12:53, Gilad Shainer wrote: > The latency numbers are more or less the same between > the IB vendors on SDR, DDR and QDR. Mellanox is the > only vendor with FDR IB for now, and with PCIe 3.0 > latency are below 1us (RDMA much below...). Is that for an MPI message ? I'd heard that FDR might have higher latencies due to the coding changes that were happening - is this not the case ? cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7Af7MACgkQO2KABBYQAh8HyQCdG3AUK1k6QmyRd7SueQLp3MHZ 1wsAn3VxYqclLdQcqBv5yxd4LLvhsiLv =Hf0h -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Sun Nov 13 23:52:03 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 14 Nov 2011 04:52:03 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EC07FB4.2020902@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> <4EC07FB4.2020902@unimelb.edu.au> Message-ID: There is some add latency due to the 66/64 new encoding, but overall latency is lower than QDR. MPI is below 1us. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Christopher Samuel Sent: Sunday, November 13, 2011 6:42 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gilad, On 08/11/11 12:53, Gilad Shainer wrote: > The latency numbers are more or less the same between the IB vendors > on SDR, DDR and QDR. Mellanox is the only vendor with FDR IB for now, > and with PCIe 3.0 latency are below 1us (RDMA much below...). Is that for an MPI message ? I'd heard that FDR might have higher latencies due to the coding changes that were happening - is this not the case ? 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7Af7MACgkQO2KABBYQAh8HyQCdG3AUK1k6QmyRd7SueQLp3MHZ 1wsAn3VxYqclLdQcqBv5yxd4LLvhsiLv =Hf0h -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Mon Nov 14 00:23:33 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Sun, 13 Nov 2011 21:23:33 -0800 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <4EC07FB4.2020902@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <1DA4BA55-588D-4CCE-8D8C-46F7E94F29BF@xs4all.nl> <4EC07FB4.2020902@unimelb.edu.au> Message-ID: <20111114052333.GB31084@bx9.net> > Is that for an MPI message ? I'd heard that FDR might > have higher latencies due to the coding changes that > were happening - is this not the case ? When you're asking about MPI latency, are you interested in a 2-node cluster, or a big one? The usual MPI latency benchmark only uses 1 core each on 2 nodes, and on some interconnects, that's a very different latency than what you'll see on 16 nodes or 100s of nodes. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 14 03:25:08 2011 From: samuel at unimelb.edu.au (Chris Samuel) Date: Mon, 14 Nov 2011 19:25:08 +1100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <4EC07FB4.2020902@unimelb.edu.au> Message-ID: <201111141925.09072.samuel@unimelb.edu.au> Hi Gilad, On Mon, 14 Nov 2011 03:52:03 PM Gilad Shainer wrote: > There is some add latency due to the 66/64 new encoding, but > overall latency is lower than QDR. MPI is below 1us. Thanks for that. So I'd guess a pair of nodes with QDR cards in the same slot would have lower latency again ? cheers! Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
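For reference, the "usual MPI latency benchmark" Greg mentions is essentially a small-message ping-pong between one core on each of two nodes, with the one-way figure reported as half the averaged round-trip time; that is also what the 1.2-1.3 us numbers quoted earlier in the thread measure. The sketch below is a minimal stand-in for such a test (it is not the OSU or Intel MPI benchmark), with arbitrary message size and iteration counts.

    /* pingpong.c -- minimal MPI ping-pong latency sketch; run with two
     * ranks, one per node, e.g.: mpirun -np 2 -host nodeA,nodeB ./pingpong
     * Reports one-way latency as half the averaged round-trip time, which
     * is how the microsecond figures quoted in this thread are derived.
     */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_BYTES 8        /* small message; vary to taste */
    #define WARMUP    1000
    #define ITERS     10000

    int main(int argc, char **argv)
    {
        char buf[MSG_BYTES] = {0};
        int rank, i;
        double t0 = 0.0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < WARMUP + ITERS; i++) {
            if (i == WARMUP) {             /* start timing after warm-up */
                MPI_Barrier(MPI_COMM_WORLD);
                t0 = MPI_Wtime();
            }
            if (rank == 0) {
                MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency: %.2f us (%d-byte messages)\n",
                   (t1 - t0) / ITERS / 2.0 * 1e6, MSG_BYTES);

        MPI_Finalize();
        return 0;
    }

Greg's caveat is the part that matters for sizing a real cluster: a two-rank ping-pong on an otherwise idle fabric says little about latency when every core is busy and the HCA is servicing many queue pairs at once.
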
From Shainer at Mellanox.com Mon Nov 14 03:30:35 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Mon, 14 Nov 2011 08:30:35 +0000 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <201111141925.09072.samuel@unimelb.edu.au> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <4EC07FB4.2020902@unimelb.edu.au> <201111141925.09072.samuel@unimelb.edu.au> Message-ID: FDR adapters are lower latency than the QDR. So you will gain some with QDR, and the most with FDR. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Chris Samuel Sent: Monday, November 14, 2011 12:26 AM To: beowulf at beowulf.org Subject: Re: [Beowulf] building Infiniband 4x cluster questions Hi Gilad, On Mon, 14 Nov 2011 03:52:03 PM Gilad Shainer wrote: > There is some add latency due to the 66/64 new encoding, but overall > latency is lower than QDR. MPI is below 1us. Thanks for that. So I'd guess a pair of nodes with QDR cards in the same slot would have lower latency again ? cheers! Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 14 03:38:08 2011 From: samuel at unimelb.edu.au (Chris Samuel) Date: Mon, 14 Nov 2011 19:38:08 +1100 Subject: [Beowulf] building Infiniband 4x cluster questions In-Reply-To: <20111114052333.GB31084@bx9.net> References: <1320709481.91083.YahooMailClassic@web161903.mail.bf1.yahoo.com> <4EC07FB4.2020902@unimelb.edu.au> <20111114052333.GB31084@bx9.net> Message-ID: <201111141938.08874.samuel@unimelb.edu.au> Hi Greg, On Mon, 14 Nov 2011 04:23:33 PM Greg Lindahl wrote: > When you're asking about MPI latency, are you interested in a > 2-node cluster, or a big one? I'm interested in the differencies in latency between QDR and FDR - whether that be node to node or in a large cluster. cheers! Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Mon Nov 14 14:41:44 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Mon, 14 Nov 2011 14:41:44 -0500 Subject: [Beowulf] Announcing Grid Engine 2011.11 and SC11 demo & presentation Message-ID: The Open Grid Scheduler Project is releasing a new major release: Grid Engine 2011.11. 
We are using the open source model that was used by Sun Grid Engine (2001 - 2009), and we offer open source (not open core) Grid Engine with *optional* technical support. We have a growing Grid Engine Community and joined by companies and system integrators in the Grid Engine ecosystem. New features ============ * Berkeley DB Spooling Directory can now be placed on NFS (NFSv3 or older, basically any modern network filesystems) * Portable Hardware Locality Library (was under alpha/beta since April 2011) * CUDA GPU load sensor - uses NVIDIA Management Library (NVML) * User notification mails can be sent from a configurable user ID * Job exit status available to epilog - via $SGE_JOBEXIT_STAT * ARM Linux support - ARMv7. * qmake upgraded to version 3.82 * Support for Linux 3.0 * Perfstat library used on AIX, retiring ibm-loadsensor * Support for newer AIX versions * Tango qmon icons * Code quality improvements - static code verifier used for testing PQS API code cleaned up and we are releasing the PQS API Scheduler Plugin Interface as technology preview. SC11 Demo ========= Gridcore/Gompute is kind enough to offer part of their 20x20 booth for the Grid Engine 2011.11 demo. I am going to remotely give a presentation on the new features of Grid Engine 2011.11, and also the future of open source Grid Engine. The proposed time slots are: 12:00 and 15:00 on Tuesday and Wedneday. Please sign up for the presentation at the Gridcore/Gompute booth (#6002). Download links ============== Download the latest release from sourceforge: http://gridscheduler.sourceforge.net/ Download release notes from the Scalable Logic homepage: http://www.scalablelogic.com/ We are offering pre-compiled binaries. The temp. download site (before we can upload everything to Sourceforge) for x64 Linux is: http://dl.dropbox.com/u/47200624/ge2011.11-x64.tar.gz Rayson ================================================== Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Tue Nov 15 04:25:23 2011 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 15 Nov 2011 10:25:23 +0100 Subject: [Beowulf] Nvidia's ARM chips power supercomputer Message-ID: <20111115092523.GK31847@leitl.org> http://news.cnet.com/8301-13924_3-57323948-64/nvidias-arm-chips-power-supercomputer/ Nvidia's ARM chips power supercomputer by Brooke Crothers November 14, 2011 6:00 AM PST Follow @mbrookec Barcelona Supercomputing Center. Barcelona Supercomputing Center is located in a former chapel. (Credit: Barcelona Supercomputing Center.) Nvidia's Tegra chips will for the first time power a supercomputer--more evidence that ARM is movin' on up into Intel territory. The chipmaker said today the Barcelona Supercomputing Center is developing a new hybrid supercomputer that, for the first time, combines energy-efficient Nvidia Tegra CPUs (central processing units), based on the ARM chip architecture, with Nvidia's graphics processing units (GPUs). The supercomputing center plans to develop a system that is two to five times more energy-efficient compared with today's efficient high-performance computing systems. Most of today's supercomputers use Intel processors. 
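Of the new features above, the CUDA GPU load sensor is the one most readers will want to poke at. The sketch below shows the general shape of such a sensor as a guess at how one could be written against NVML; it is not the code that ships with Grid Engine 2011.11. It assumes the documented sge_execd load sensor protocol (block on stdin, exit on "quit", answer each request with begin / host:name:value records / end), and the load value name gpu0_util etc. is made up here and would have to match a complex configured with qconf before the scheduler can use it.

    /* gpu_load_sensor.c -- illustrative sketch of a Grid Engine load
     * sensor that reports GPU utilisation via NVML, in the spirit of the
     * CUDA load sensor in the 2011.11 release notes (NOT that code).
     * Build (link name may vary by driver package): cc gpu_load_sensor.c -lnvidia-ml
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <nvml.h>

    int main(void)
    {
        char line[256], host[256];
        unsigned int ndev = 0, i;

        gethostname(host, sizeof host);
        if (nvmlInit() != NVML_SUCCESS)
            return 1;
        nvmlDeviceGetCount(&ndev);

        /* Load sensor loop: sge_execd writes a line to request a report
         * and "quit" to shut the sensor down. */
        while (fgets(line, sizeof line, stdin)) {
            if (strncmp(line, "quit", 4) == 0)
                break;
            printf("begin\n");
            for (i = 0; i < ndev; i++) {
                nvmlDevice_t dev;
                nvmlUtilization_t util;
                if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
                    nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
                    printf("%s:gpu%u_util:%u\n", host, i, util.gpu);
            }
            printf("end\n");
            fflush(stdout);
        }
        nvmlShutdown();
        return 0;
    }

The NVML calls used here (nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates) are part of the public NVML API; for how the released sensor actually reports its values, consult the Grid Engine source from the project page above.
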
"In most current systems, CPUs alone consume the lion's share of the energy, often 40 percent or more," Alex Ramirez, leader of the Mont-Blanc Project at the Barcelona Supercomputing Center, said in a statement. "By comparison, the Mont-Blanc architecture will rely on energy-efficient compute accelerators and ARM processors...to achieve a 4- to 10-times increase in energy-efficiency by 2014." A development kit will feature a quad-core Nvidia Tegra 3 CPU accelerated by a discrete Nvidia GPU. It is expected to be available in the first half of 2012. Nvidia also announced today that the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign is deploying a Cray supercomputer accelerated by Nvidia's Tesla GPUs. That's part of the Blue Waters project to build one of the world's most powerful computer systems. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Wed Nov 16 04:52:38 2011 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 16 Nov 2011 10:52:38 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores Message-ID: <20111116095238.GX31847@leitl.org> http://seattletimes.nwsource.com/html/technologybrierdudleysblog/2016775145_wow_intel_unveils_1_teraflop_c.html Wow: Intel unveils 1 teraflop chip with 50-plus cores Posted by Brier Dudley I thought the prospect of quad-core tablet computers was exciting. Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 cores, that Intel unveiled today, running it on a test machine at the SC11 supercomputing conference in Seattle. That means my kids may take a teraflop laptop to college -- if their grades don't suffer too much having access to 50-core video game consoles. It wasn't that long ago that Intel was boasting about the first supercomputer with sustained 1 teraflop performance. That was in 1997, on a system with 9,298 Pentium II chips that filled 72 computing cabinets. Now Intel has squeezed that much performance onto a matchbook-sized chip, dubbed "Knights Ferry," based on its new "Many Integrated Core" architecture, or MIC. It was designed largely in the Portland area and has just started manufacturing. "In 15 years that's what we've been able to do. That is stupendous. You're witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general manager of Intel's technical computing group, said at an unveiling ceremony. (He holds up the chip here) A single teraflop is capable of a trillion floating point operations per second. On hand for the event -- in the cellar of the Ruth's Chris Steak House in Seattle -- were the directors of the National Center for Computational Sciences at Oak Ridge Laboratory and the Application Acceleration Center of Excellence. Also speaking was the chief science officer of the GENCI supercomputing organization in France, which has used its Intel-based system for molecular simulations of Alzheimer's, looking at issues such as plaque formation that's a hallmark of the disease. "The hardware is hardly exciting. ... The exciting part is doing the science," said Jeff Nichols, acting director of the computational center at Oak Ridge. The hardware was pretty cool, though. 
George Chrysos, the chief architect of Knights Ferry, came up from the Portland area with a test system running the new chip, which was connected to a speed meter on a laptop to show that it was running around 1 teraflop. Intel had the test system set up behind closed doors -- on a coffee table in a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take pictures of the setup. Nor would the company specify how many cores the chip has -- just more than 50 -- or its power requirement. If you're building a new system and want to future-proof it, the Knights Ferry chip uses a double PCI Express slot. Chrysos said the systems are also likely to run alongside a few Xeon processors. This means that Intel could be producing teraflop chips for personal computers within a few years, although there's lots of work to be done on the software side before you'd want one. Another question is whether you'd want a processor that powerful on a laptop, for instance, where you may prefer to have a system optimized for longer battery life, Hazra said. More important, Knights Ferry chips may help engineers build the next generation of supercomputing systems, which Intel and its partners hope to delivery by 2018. Power efficiency was a highlight of another big announcement this week at SC11. On Monday night, IBM announced its "next generation supercomputing project," the Blue Gene/Q system that's heading to Lawrence Livermore National Laboratory next year. Dubbed Sequoia, the system should run at 20 petaflops peak performance. IBM expects it to be the world's most power-efficient computer, processing 2 gigaflops per watt. The first 96 racks of the system could be delivered in December. The Department of Energy's National Nuclear Security Administration uses the systems to work on nuclear weapons, energy reseach and climate change, among other things. Sequoia complements another Blue Gene/Q system, a 10-petaflop setup called "Mira," which was previously announced by Argonne National Laboratory. A few images from the conference, which runs through Friday at the Washington State Convention & Trade Center, starting with perusal of Intel boards: Take home a Cray today! IBM was sporting Blue Genes, and it wasn't even casual Friday: A 94 teraflop rack: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Nov 16 06:04:50 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed, 16 Nov 2011 12:04:50 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116095238.GX31847@leitl.org> References: <20111116095238.GX31847@leitl.org> Message-ID: <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> Well, If it's gonna use 2 pci-express slots, for sure it's eating massive power, just like the gpu's. Furthermore the word 'double precision' is nowhere there, so we can safely assume single precision. Speaking of which - isn't nvidia and amd already delivering cards that deliver a lot? AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in openCL. 
Knowing intel is not delivering hardware dirt cheap - despite hammering the bulldozer, bulldozer so far is cheaper than any competative intel chip - though might change a few months from now when the 22nm parts are there. For crunching get gpu's - as for intel - i hope they release cheap sixcore cpu's and don't overprice 8 core Xeon... On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: > > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > 2016775145_wow_intel_unveils_1_teraflop_c.html > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > Posted by Brier Dudley > > I thought the prospect of quad-core tablet computers was exciting. > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > cores, that > Intel unveiled today, running it on a test machine at the SC11 > supercomputing > conference in Seattle. > > That means my kids may take a teraflop laptop to college -- if > their grades > don't suffer too much having access to 50-core video game consoles. > > It wasn't that long ago that Intel was boasting about the first > supercomputer > with sustained 1 teraflop performance. That was in 1997, on a > system with > 9,298 Pentium II chips that filled 72 computing cabinets. > > Now Intel has squeezed that much performance onto a matchbook-sized > chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" > architecture, > or MIC. > > It was designed largely in the Portland area and has just started > manufacturing. > > "In 15 years that's what we've been able to do. That is stupendous. > You're > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > manager of > Intel's technical computing group, said at an unveiling ceremony. > (He holds > up the chip here) > > A single teraflop is capable of a trillion floating point > operations per > second. > > On hand for the event -- in the cellar of the Ruth's Chris Steak > House in > Seattle -- were the directors of the National Center for Computational > Sciences at Oak Ridge Laboratory and the Application Acceleration > Center of > Excellence. > > Also speaking was the chief science officer of the GENCI > supercomputing > organization in France, which has used its Intel-based system for > molecular > simulations of Alzheimer's, looking at issues such as plaque > formation that's > a hallmark of the disease. > > "The hardware is hardly exciting. ... The exciting part is doing the > science," said Jeff Nichols, acting director of the computational > center at > Oak Ridge. > > The hardware was pretty cool, though. > > George Chrysos, the chief architect of Knights Ferry, came up from the > Portland area with a test system running the new chip, which was > connected to > a speed meter on a laptop to show that it was running around 1 > teraflop. > > Intel had the test system set up behind closed doors -- on a coffee > table in > a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take > pictures of the setup. > > Nor would the company specify how many cores the chip has -- just > more than > 50 -- or its power requirement. > > If you're building a new system and want to future-proof it, the > Knights > Ferry chip uses a double PCI Express slot. Chrysos said the systems > are also > likely to run alongside a few Xeon processors. > > This means that Intel could be producing teraflop chips for personal > computers within a few years, although there's lots of work to be > done on the > software side before you'd want one. 
> > Another question is whether you'd want a processor that powerful on > a laptop, > for instance, where you may prefer to have a system optimized for > longer > battery life, Hazra said. > > More important, Knights Ferry chips may help engineers build the next > generation of supercomputing systems, which Intel and its partners > hope to > delivery by 2018. > > Power efficiency was a highlight of another big announcement this > week at > SC11. On Monday night, IBM announced its "next generation > supercomputing > project," the Blue Gene/Q system that's heading to Lawrence Livermore > National Laboratory next year. > > Dubbed Sequoia, the system should run at 20 petaflops peak > performance. IBM > expects it to be the world's most power-efficient computer, > processing 2 > gigaflops per watt. > > The first 96 racks of the system could be delivered in December. The > Department of Energy's National Nuclear Security Administration > uses the > systems to work on nuclear weapons, energy reseach and climate > change, among > other things. > > Sequoia complements another Blue Gene/Q system, a 10-petaflop setup > called > "Mira," which was previously announced by Argonne National Laboratory. > > A few images from the conference, which runs through Friday at the > Washington > State Convention & Trade Center, starting with perusal of Intel > boards: > > > Take home a Cray today! > > IBM was sporting Blue Genes, and it wasn't even casual Friday: > > A 94 teraflop rack: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From eugen at leitl.org Wed Nov 16 06:27:47 2011 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 16 Nov 2011 12:27:47 +0100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> References: <20111116095238.GX31847@leitl.org> <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> Message-ID: <20111116112747.GF31847@leitl.org> On Wed, Nov 16, 2011 at 12:04:50PM +0100, Vincent Diepeveen wrote: > Well, > > If it's gonna use 2 pci-express slots, for sure it's eating massive > power, just like the gpu's. It's not too bad for an 1997 Top500 equivalent (well, at least as far as matrix multiplication is concerned). > Furthermore the word 'double precision' is nowhere there, so we can > safely assume single precision. It's double precision. > Speaking of which - isn't nvidia and amd already delivering cards > that deliver a lot? Kepler is supposed to get 1.3 TFlops in DGEMM when it's out. Intel touts that Knights Corner produces 1 TFlop consistently indedepent of matrix (block) size. The vector unit is 512 bits, Knights Landing will boost that to 124 bits, supposedly. Source: http://www.heise.de/newsticker/meldung/Supercomputer-2011-CPU-mit-Many-Integrated-Cores-knackt-1-TFlops-1379625.html > AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in > openCL. 
> > Knowing intel is not delivering hardware dirt cheap - despite > hammering the bulldozer, bulldozer > so far is cheaper than any competative intel chip - though might > change a few months from now when the 22nm > parts are there. Parts like these will be useful for gamer markets, so presumably nVidia or AMD will be only too happy to leap into any gap that Intel offers. > For crunching get gpu's - as for intel - i hope they release cheap > sixcore cpu's and don't overprice 8 core Xeon... > > On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: > > > > > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > > 2016775145_wow_intel_unveils_1_teraflop_c.html > > > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > > > Posted by Brier Dudley > > > > I thought the prospect of quad-core tablet computers was exciting. > > > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > > cores, that > > Intel unveiled today, running it on a test machine at the SC11 > > supercomputing > > conference in Seattle. > > > > That means my kids may take a teraflop laptop to college -- if > > their grades > > don't suffer too much having access to 50-core video game consoles. > > > > It wasn't that long ago that Intel was boasting about the first > > supercomputer > > with sustained 1 teraflop performance. That was in 1997, on a > > system with > > 9,298 Pentium II chips that filled 72 computing cabinets. > > > > Now Intel has squeezed that much performance onto a matchbook-sized > > chip, > > dubbed "Knights Ferry," based on its new "Many Integrated Core" > > architecture, > > or MIC. > > > > It was designed largely in the Portland area and has just started > > manufacturing. > > > > "In 15 years that's what we've been able to do. That is stupendous. > > You're > > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > > manager of > > Intel's technical computing group, said at an unveiling ceremony. > > (He holds > > up the chip here) > > > > A single teraflop is capable of a trillion floating point > > operations per > > second. > > > > On hand for the event -- in the cellar of the Ruth's Chris Steak > > House in > > Seattle -- were the directors of the National Center for Computational > > Sciences at Oak Ridge Laboratory and the Application Acceleration > > Center of > > Excellence. > > > > Also speaking was the chief science officer of the GENCI > > supercomputing > > organization in France, which has used its Intel-based system for > > molecular > > simulations of Alzheimer's, looking at issues such as plaque > > formation that's > > a hallmark of the disease. > > > > "The hardware is hardly exciting. ... The exciting part is doing the > > science," said Jeff Nichols, acting director of the computational > > center at > > Oak Ridge. > > > > The hardware was pretty cool, though. > > > > George Chrysos, the chief architect of Knights Ferry, came up from the > > Portland area with a test system running the new chip, which was > > connected to > > a speed meter on a laptop to show that it was running around 1 > > teraflop. > > > > Intel had the test system set up behind closed doors -- on a coffee > > table in > > a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take > > pictures of the setup. > > > > Nor would the company specify how many cores the chip has -- just > > more than > > 50 -- or its power requirement. > > > > If you're building a new system and want to future-proof it, the > > Knights > > Ferry chip uses a double PCI Express slot. 
> > [... rest of quoted article snipped ...]

-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From diep at xs4all.nl Wed Nov 16 07:43:31 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 16 Nov 2011 13:43:31 +0100
Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores
In-Reply-To: <20111116112747.GF31847@leitl.org>
References: <20111116095238.GX31847@leitl.org> <72D8289B-FFA3-4AC6-93F3-3312D3445937@xs4all.nl> <20111116112747.GF31847@leitl.org>
Message-ID: <4E638224-7632-4E84-8C08-0C454F58AFD5@xs4all.nl>

On Nov 16, 2011, at 12:27 PM, Eugen Leitl wrote:

> On Wed, Nov 16, 2011 at 12:04:50PM +0100, Vincent Diepeveen wrote:
>> Well,
>>
>> If it's gonna use 2 pci-express slots, for sure it's eating massive
>> power, just like the gpu's.
>
> It's not too bad for an 1997 Top500 equivalent (well, at least
> as far as matrix multiplication is concerned).
>
>> Furthermore the word 'double precision' is nowhere there, so we can
>> safely assume single precision.
>
> It's double precision.

And probably, like AMD and Nvidia, creatively counting a multiply-add as 2 flops.

>> Speaking of which - isn't nvidia and amd already delivering cards
>> that deliver a lot?
>
> Kepler is supposed to get 1.3 TFlops in DGEMM when it's out.
> Intel touts that Knights Corner produces 1 TFlop consistently
> indedepent of matrix (block) size.
>
> The vector unit is 512 bits, Knights Landing will boost
> that to 124 bits, supposedly.

You mean vectors of 8 double precision values, I assume. That renders the chip less generic than GPUs are.

> Source: http://www.heise.de/newsticker/meldung/Supercomputer-2011-CPU-mit-Many-Integrated-Cores-knackt-1-TFlops-1379625.html
>
>> AMD's HD6990 is 500 euro and delivers a 5+ Tflop and supposedly so in
>> openCL.
>>
>> Knowing intel is not delivering hardware dirt cheap - despite
>> hammering the bulldozer, bulldozer
>> so far is cheaper than any competative intel chip - though might
>> change a few months from now when the 22nm
>> parts are there.
>
> Parts like these will be useful for gamer markets, so
> presumably nVidia or AMD will be only too happy to leap
> into any gap that Intel offers.

It's the gamers' market that keeps those GPUs cheap. GPUs that are custom-made for HPC will always lose in the end. The first generation might be strong, and after that it simply will not be worth it in price versus performance. The production factories are for now too expensive to run to produce something that gets sold to just a few HPC organisations. A few small improvements in GPUs will always win the HPC market back from custom-made units.

So if Intel is gonna sell this Larrabee derivative -- and in general, if you invest big cash in a product, it's a good plan to also sell it -- you run the likely risk that after the first release it will be behind the next generation of GPUs, and by a lot.

Intel has superior CPUs, based upon producing them with the latest process technology they have available. We see how AMD is behind now, because at the end of 2011 they are producing at 32 nm, whereas Intel has been selling its 32 nm sixcores since March 2010. The GPUs are at something like 40 nm now. For the gamers' market the next GPU will be on the latest and fastest process they can afford. A special HPC part from Intel that's aimed at a specific market will not be able to use the latest process technology in the long run, so it will be totally crushed by GPUs. Just totally annihilated.

I'd not sign any deal with Intel regarding such CPUs without a signature from them on paper that it will be released within 6 months on the same process technology as their latest CPU.
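To put numbers on the flop counting above: the headline figure is just cores x clock x SIMD lanes x 2, with the factor of 2 coming from counting a fused multiply-add as two flops. A back-of-the-envelope sketch -- the core count and clock below are illustrative assumptions only, since Intel disclosed neither:

# Back-of-the-envelope peak: cores x clock x SIMD lanes x 2 (FMA = 2 flops).
# Core count and clock are assumed for illustration; Intel disclosed neither.
def peak_gflops_dp(cores, clock_ghz, vector_bits, fma=True):
    lanes = vector_bits // 64            # 64-bit doubles per vector register
    return cores * clock_ghz * lanes * (2 if fma else 1)

if __name__ == "__main__":
    # e.g. 52 hypothetical cores at 1.2 GHz with 512-bit vectors:
    print("%.0f GFLOPS peak" % peak_gflops_dp(52, 1.2, 512))   # ~998, i.e. "1 TFlop"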
Furthermore the gpu's will beat this by factor 10 in price or so. Add to that, this AMD 6990, though it has 2 gpu's, it's 500 euro and 1.37 Tflop double precision right now. How's something gonna compete with that what years later achieves this? It's about price per gflop. In future the big calculations will simply need to get redone 2 times, just like with prime numbers they get double checked. I see too much gibberish in the scientific calculations done at supercomputers. 99% of those projects in HPC is not even amateur level from algorithmic calculation viewpoint seen. The software of 99% of those projects doesn't even have remotely the same outcome when you rerun the same code, that's how crappy most of it is. We didn't even touch the word 'efficiency' yet. If i compare some commercial codes that do something similar to the software calculating the height of seawater levels... ...which from climate viewpoint by the way is pretty important as hundreds of billions get invested/changes hands based upon such calculations, then the word efficiency gets another dimension. It's wishful thinking that in future the fastest hardware is going to be 100% deterministic. The true nature of parallellism already makes everything non-deterministic. You can forever ignore GPU's, just like some ignored CPU's from intel&amd for a year or 10, but it's not very healthy. Especially if current generation is going to have already a factor 10 + advantage in price/performance to this chip. > >> For crunching get gpu's - as for intel - i hope they release cheap >> sixcore cpu's and don't overprice 8 core Xeon... >> >> On Nov 16, 2011, at 10:52 AM, Eugen Leitl wrote: >> >>> >>> http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ >>> 2016775145_wow_intel_unveils_1_teraflop_c.html >>> >>> Wow: Intel unveils 1 teraflop chip with 50-plus cores >>> >>> Posted by Brier Dudley >>> >>> I thought the prospect of quad-core tablet computers was exciting. >>> >>> Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 >>> cores, that >>> Intel unveiled today, running it on a test machine at the SC11 >>> supercomputing >>> conference in Seattle. >>> >>> That means my kids may take a teraflop laptop to college -- if >>> their grades >>> don't suffer too much having access to 50-core video game consoles. >>> >>> It wasn't that long ago that Intel was boasting about the first >>> supercomputer >>> with sustained 1 teraflop performance. That was in 1997, on a >>> system with >>> 9,298 Pentium II chips that filled 72 computing cabinets. >>> >>> Now Intel has squeezed that much performance onto a matchbook-sized >>> chip, >>> dubbed "Knights Ferry," based on its new "Many Integrated Core" >>> architecture, >>> or MIC. >>> >>> It was designed largely in the Portland area and has just started >>> manufacturing. >>> >>> "In 15 years that's what we've been able to do. That is stupendous. >>> You're >>> witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general >>> manager of >>> Intel's technical computing group, said at an unveiling ceremony. >>> (He holds >>> up the chip here) >>> >>> A single teraflop is capable of a trillion floating point >>> operations per >>> second. >>> >>> On hand for the event -- in the cellar of the Ruth's Chris Steak >>> House in >>> Seattle -- were the directors of the National Center for >>> Computational >>> Sciences at Oak Ridge Laboratory and the Application Acceleration >>> Center of >>> Excellence. 
>>> [... rest of quoted article snipped ...]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From michf at post.tau.ac.il Wed Nov 16 07:49:13 2011
From: michf at post.tau.ac.il (Micha)
Date: Wed, 16 Nov 2011 14:49:13 +0200
Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores
In-Reply-To: <20111116095238.GX31847@leitl.org>
References: <20111116095238.GX31847@leitl.org>
Message-ID: <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com>

They are just now busting the one-teraflop mark, but they are going with it into the GPU market, only without a GPU, i.e. they're competing with the Tesla GPU here. The Tesla admittedly is also about 1 TFlops, but the consumer market already went past the 2 TFlop mark about a year ago, and the next generation is just around the corner (it will be operational before the MIC). And the funny part is that it's a discrete (over PCI) card that runs a software micro-kernel and scheduler that you can ssh into.

I'm not sure how much I buy into the hype they're selling, that it's the next best thing because it's x86 so you run the same code -- although apparently it's not binary compatible, so you do need to recompile. And I think we all know that real-world codes need a rework to transfer well to different vector sizes and communication/synchronization/etc. So why is it so much better than picking up an AMD or NVIDIA?

Eugen Leitl wrote:
[... quoted article snipped -- full text in the original post above ...]
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From diep at xs4all.nl Wed Nov 16 12:44:43 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 16 Nov 2011 18:44:43 +0100
Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores
In-Reply-To: <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com>
References: <20111116095238.GX31847@leitl.org> <9fc1b9b6-887a-4a2a-8757-0a587fe5181d@email.android.com>
Message-ID: <9E1E2790-30D0-4BF6-A5E1-530FA55EE616@xs4all.nl>

Well, look -- everyone here is looking at the part of the machine that delivers the 'big punch', which is either the Teslas or the AMD 6990. However, we shouldn't forget that at its basis each node is a 2-socket Intel Xeon machine with 2 Xeon CPUs, and it requires a very fast network. The weakness of the network is not only the network itself; it is especially determined by the quality of those 2 CPUs, as they have to feed the GPU and, more importantly, also the network. Furthermore, part of the software is incapable of running on GPUs and has to run on the CPU.

That said, the big punch being a Tesla, it's obvious that it can't be clocked as high as the gamer cards, as it focuses more on reliability.

We have seen recently that the transfer bandwidth one can achieve from CPU to GPU has improved tremendously -- from 2 GB/s to many gigabytes per second now, approaching a big share of the total bandwidth the RAM can deliver.

Suppose we do a big multiplication of some giant prime number using a safe form of FFT (of course there are specialized transforms here that are faster, but for readability I'll call it an FFT and not a DWT). We can see the FFT as something that does a number of things in O(log n) steps. Only in the last few phases of the O(log n) do we actually need communication between all the nodes.

Basically there is nothing that prevents us from doing a double check of the results on a different GPU while we are busy with the finalizing steps, as the majority of the GPUs are basically idle anyway at that point. So if we calculate just a single number, we can do a double check of our GPU calculations rather efficiently, as the crunching power of those things is so far above anything else that it is always ahead of any other step. Only if you already run other independent calculations at the same time can you keep those GPUs busy. However, if you'd run independent calculations, what do you need that massive, hugely expensive cluster for? You could also give each machine its own number and just sit and wait until they all finish. That's an embarrassingly parallel approach where basically it's a JBOM, "just a bunch of machines".
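As an aside on the double check itself: for big-number arithmetic you don't even have to redo the whole multiplication on a second GPU -- comparing residues modulo a few small primes catches silent errors almost for free. A minimal sketch, with plain Python integers standing in for whatever the FFT code on the GPU produced:

# Sketch of a cheap independent check on a huge product: compare residues
# modulo a few small primes. Python integers stand in for the GPU/FFT result;
# the check costs next to nothing compared with redoing the multiplication.
import random

SMALL_PRIMES = (999983, 1000003, 982451653)

def residues_match(a, b, product, primes=SMALL_PRIMES):
    return all((a % p) * (b % p) % p == product % p for p in primes)

if __name__ == "__main__":
    a = random.getrandbits(1 << 20)        # ~million-bit operands
    b = random.getrandbits(1 << 20)
    good = a * b
    print(residues_match(a, b, good))      # True
    print(residues_match(a, b, good ^ 1))  # single flipped bit -> False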
In order words if we take advantage of the cluster as a whole to speedup the calculation, then the crucial reliability part of the calculation gets done by the CPU's, not by the GPU; it would be easy to give the GPU double checking time of results previously calculated and a simple comparision which happens while we are already some steps further, would occur. With GPU's you simply do have the system time to double check and you MUST double check; there is no reason to not buy GPU's with millions of sustainability demands, as the reaosn why they're so fast also is the reason why it's cheap and that's also the reason why you need to double check. So cheap kick butt GPU's is the way to go for now. On Nov 16, 2011, at 1:49 PM, Micha wrote: > They are just busting the one teraflop but they are going with it > into the GPU market, only without a GPU, i.e. they're competing > with the Tesla GPU here. The Tesla admittedly is also about 1 > TFlops but the consumer market has already gone past the 2 TFlop > mark about a year ago and the next generation is just around the > corner (will be operational before the mic). And the funny part is > that its a discrete (over pci) card that is running a software > micro-kernel ands scheduler that you can ssh into. > I'm not sure how much I buy into the hype their selling that it's > the next best thing because its x86 so you run the same code, > although aparantly its not binary compatible, so you do need to > recompile. And I think we all know that real world codes need a > rework to transfer well to different vector sizes and communication/ > synchronization/etc. So why is it so much better than picking up an > AMD or NVIDIA? > > Eugen Leitl wrote: > http://seattletimes.nwsource.com/html/technologybrierdudleysblog/ > 2016775145_wow_intel_unveils_1_teraflop_c.html > > Wow: Intel unveils 1 teraflop chip with 50-plus cores > > Posted by Brier Dudley > > I thought the prospect of quad-core tablet computers was exciting. > > Then I saw Intel's latest -- a 1 teraflop chip, with more than 50 > cores, that > Intel unveiled today, running it on a test machine at the SC11 > supercomputing > conference in Seattle. > > That means my kids may take a teraflop laptop to college -- if > their grades > don't suffer too much having access to 50-core video game consoles. > > It wasn't that long ago that Intel was boasting about the first > supercomputer > with sustained 1 teraflop performa nce. That was in 1997, on a > system with > 9,298 Pentium II chips that filled 72 computing cabinets. > > Now Intel has squeezed that much performance onto a matchbook-sized > chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" > architecture, > or MIC. > > It was designed largely in the Portland area and has just started > manufacturing. > > "In 15 years that's what we've been able to do. That is stupendous. > You're > witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general > manager of > Intel's technical computing group, said at an unveiling ceremony. > (He holds > up the chip here) > > A single teraflop is capable of a trillion floating point > operations per > second. > > On hand for the event -- in the cellar of the Ruth's Chris Steak > House in > Seattle -- were the directors of the National Center for Computational > Sciences at Oak Ridge Laboratory and the Application Acceleration > Center of > Excellence. 
> [... quoted article text snipped ...]
> > A few images from the conference, which runs through Friday at the > Washington > State Convention & Trade Center, starting with perusal of Intel > boards: > > > Take home a Cray today!< br /> > IBM was sporting Blue Genes, and it wasn't even casual Friday: > > A 94 teraflop rack: > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Wed Nov 16 17:47:57 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Thu, 17 Nov 2011 09:47:57 +1100 Subject: [Beowulf] Intel unveils 1 teraflop chip with 50-plus cores In-Reply-To: <20111116095238.GX31847@leitl.org> References: <20111116095238.GX31847@leitl.org> Message-ID: <4EC43D9D.4070509@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 16/11/11 20:52, Eugen Leitl wrote: [quoting a newspaper report] > Now Intel has squeezed that much performance onto a matchbook-sized chip, > dubbed "Knights Ferry," based on its new "Many Integrated Core" architecture, > or MIC. Actually that's wrong, this is Knights Corner that has been announced (Knights Ferry was announced over a year ago).. http://communities.intel.com/community/openportit/server/blog/2011/11/15/supercomputing-2011-day-2-knights-corner-shown-at-1tf-per-socket # Today.. for the first time, Intel showed our first # silicon from the Knights Corner Product. It runs. # Even more yet, it showed 1 teraflop double precision # -- 1997 was dozens of cabinet -- 2011 is a single # 22nm chip. - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7EPZ0ACgkQO2KABBYQAh8T9wCeIrYMtEB3ouzoGgwzbxzNbivu ToMAoJI3PrRi+uZR14M83rYKlg6KnM1p =1jkV -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From eugen at leitl.org Thu Nov 17 02:27:02 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 17 Nov 2011 08:27:02 +0100
Subject: [Beowulf] HPC User Training Survey (PRACE)
Message-ID: <20111117072702.GV31847@leitl.org>

----- Forwarded message from Rolf Rabenseifner -----

From: Rolf Rabenseifner
Date: Thu, 17 Nov 2011 08:19:45 +0100 (CET)
To: eugen at leitl.org
Subject: HPC User Training Survey (PRACE)

Dear participant on my course invitation list,

I have a somewhat bigger request this time: within PRACE we are running a survey on HPC training in Europe. It would help me a great deal if you could take part as well. And as a thank-you there is even something to win.

If you yourself teach MPI, OpenMP or other HPC courses (not lectures), then the second survey, which I will send shortly in a second mail, concerns you.

Many thanks in advance and best regards,
Rolf Rabenseifner

--------------

Dear HPC user,

We are writing to you on behalf of PRACE, the Partnership for Advanced Computing in Europe (www.prace-ri.eu), which has been established to create a persistent HPC infrastructure to provide Europe with world-class HPC resources. As part of this infrastructure, PRACE runs a training programme to enable users, such as yourself, to fully exploit HPC systems that are made available via its regular calls for proposals (http://www.prace-ri.eu/Call-Announcements).

The PRACE project has designed a survey to assess the current level of knowledge and satisfaction with existing HPC training. Results from this will be used to guide training events that will be offered to you and your colleagues all over Europe.

As a candidate/existing user of PRACE Tier-1 and/or Tier-0 systems, you are invited to take part in the survey. Please redistribute this email and survey link to the technical members (staff and students) of your group who are developing and/or using applications on high-end HPC resources and who may also benefit from PRACE training.

All responses will be treated confidentially. Respondents' identities will not be disclosed in any publication of the survey results. Each participant who fully completes the survey will be eligible to enter a draw to win one of three Amazon Kindle devices.

The link to the survey is as follows:

http://survey.ipb.ac.rs/index.php?sid=32117&lang=en

The survey will close at 17:00 (CET) on 25th November 2011.

Thank you in advance for your contribution. Please direct any enquiries to danica at ipb.ac.rs

Rolf Rabenseifner, on behalf of the PRACE training team

---------------------------------------------------------------------
Dr. Rolf Rabenseifner .. . . . . . . . . . email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart .. . . . . . . . . fax : ++49(0)711/685-65832
Head of Dpmt Parallel Computing .. .. www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . .
(Office: Allmandring 30)
---------------------------------------------------------------------

----- End forwarded message -----
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eugen at leitl.org Thu Nov 17 02:27:06 2011
From: eugen at leitl.org (Eugen Leitl)
Date: Thu, 17 Nov 2011 08:27:06 +0100
Subject: [Beowulf] HPC **Trainer** Survey (PRACE)
Message-ID: <20111117072706.GW31847@leitl.org>

----- Forwarded message from Rolf Rabenseifner -----

From: Rolf Rabenseifner
Date: Thu, 17 Nov 2011 08:22:06 +0100 (CET)
To: eugen at leitl.org
Subject: HPC **Trainer** Survey (PRACE)

Just in case you give HPC training courses (not lectures) yourself, I have a second request: could you please (also) fill in the trainer survey.

Many thanks in advance and best regards,
Rolf Rabenseifner

--------------

Dear HPC Trainer,

We are writing to invite you to participate in an HPC trainer survey, designed by PRACE, with the aim of better understanding the current scope and level of HPC training expertise across Europe, as well as the views and needs of the HPC trainers themselves. You have been invited to participate in this survey as there are indications that you provide HPC training. Even if you are not involved in training for PRACE, your views and participation would be welcome and greatly appreciated.

The link to the survey is as follows:

http://survey.ipb.ac.rs/index.php?sid=68456&lang=en

The survey will close at 17:00 (CET) on 25th November 2011.

Thank you in advance for your contribution. Please direct any enquiries to danica at ipb.ac.rs

Rolf Rabenseifner, on behalf of the PRACE training team

---------------------------------------------------------------------
Dr. Rolf Rabenseifner .. . . . . . . . . . email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart .. . . . . . . . . fax : ++49(0)711/685-65832
Head of Dpmt Parallel Computing .. .. www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . (Office: Allmandring 30)
---------------------------------------------------------------------

----- End forwarded message -----
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at pbm.com Mon Nov 21 14:28:04 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 21 Nov 2011 11:28:04 -0800 Subject: [Beowulf] Leif Nixon gets quoted in Forbes Message-ID: <20111121192804.GC29861@bx9.net> Beowulf-list-participant Lief Nixon got quoted in Forbes! http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great-firewall-tests-mysterious-scans-on-encrypted-connections/ -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From nixon at nsc.liu.se Mon Nov 21 14:48:50 2011 From: nixon at nsc.liu.se (Leif Nixon) Date: Mon, 21 Nov 2011 20:48:50 +0100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <20111121192804.GC29861@bx9.net> (Greg Lindahl's message of "Mon, 21 Nov 2011 11:28:04 -0800") References: <20111121192804.GC29861@bx9.net> Message-ID: Greg Lindahl writes: > Beowulf-list-participant Lief Nixon got quoted in Forbes! > > http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great-firewall-tests-mysterious-scans-on-encrypted-connections/ Thank you. Yes, my little discovery seems to have attracted quite a bit of attention. Any list members with users logging in from China, please check your ssh logs. I'd be interested in comparing notes with you. -- Leif Nixon - Security officer National Supercomputer Centre - Swedish National Infrastructure for Computing Nordic Data Grid Facility - European Grid Infrastructure _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Mon Nov 21 16:33:17 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Mon, 21 Nov 2011 22:33:17 +0100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: References: <20111121192804.GC29861@bx9.net> Message-ID: <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl> I'm not sure i understand what happens. In itself just cracking SSH is rather easy for governments, it's just 1024 - 2048 bits RSA, and they can factor that with a reasonable beowulf cluster with special hardware quite easily. You can already prove this for a rather old algorithm to be the case, and i'm sure the math guys got up with something much better nowadays - so will China. So what's the scan really doing, is it a physical verification of where you communicate from, so a physical adress, so mapping every user on this planet to a specific physical location? How sure is it that China is behind everything knowing they already can crack everything anyway? China gets everywhere the blame, in the meantime i have lots of Irani's lurking around here. Vincent On Nov 21, 2011, at 8:48 PM, Leif Nixon wrote: > Greg Lindahl writes: > >> Beowulf-list-participant Lief Nixon got quoted in Forbes! >> >> http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great- >> firewall-tests-mysterious-scans-on-encrypted-connections/ > > Thank you. Yes, my little discovery seems to have attracted quite a > bit > of attention. 
> > Any list members with users logging in from China, please check > your ssh > logs. I'd be interested in comparing notes with you. > > -- > Leif Nixon - Security officer > National Supercomputer Centre - Swedish National Infrastructure for > Computing > Nordic Data Grid Facility - European Grid Infrastructure > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 21 21:33:31 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 22 Nov 2011 13:33:31 +1100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: References: <20111121192804.GC29861@bx9.net> Message-ID: <4ECB09FB.9040701@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/11 06:48, Leif Nixon wrote: > Any list members with users logging in from China, please check your ssh > logs. I'd be interested in comparing notes with you. I thought I might have seen something similar back in October 2010: Oct 19 00:22:13 merri sshd[11001]: Bad protocol version identification '\377\373\037\377\373 \377\373\030\377\373'\377\375\001\377\373\003\377\375\003\377\370\003' from UNKNOWN but it turns out that all the patterns here are the same, and Google leads me here: http://seclists.org/fulldisclosure/2004/Mar/1243 indicating a broken telnet program (for our case only). ;-) cheers! Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7LCfsACgkQO2KABBYQAh/lnwCeInJRW7wIK0msEmCBJOf9wMNR RTAAnjk/bT5qBvb7o14CtCPYbJS7+/wH =gcRb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Nov 21 21:35:38 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 22 Nov 2011 13:35:38 +1100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl> References: <20111121192804.GC29861@bx9.net> <9D87F26B-67B3-49A8-9FB3-A3ECDA86B137@xs4all.nl> Message-ID: <4ECB0A7A.7040301@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/11 08:33, Vincent Diepeveen wrote: > I'm not sure i understand what happens. Basically the suspicion is that it's a way of automatically spotting and blocking access to certain encrypted egress methods from China. I didn't see any indication this was to do with trying to break the crypto. 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7LCnoACgkQO2KABBYQAh8agACfUhKlWA03Y2rYNK5Sq7EWvb35 P3oAn3KVFEQfHzi/0nfWZyKeY3Vxc8wG =ELz9 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From nixon at nsc.liu.se Tue Nov 22 03:08:20 2011 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 22 Nov 2011 09:08:20 +0100 Subject: [Beowulf] Leif Nixon gets quoted in Forbes In-Reply-To: <4ECB09FB.9040701@unimelb.edu.au> (Christopher Samuel's message of "Tue, 22 Nov 2011 13:33:31 +1100") References: <20111121192804.GC29861@bx9.net> <4ECB09FB.9040701@unimelb.edu.au> Message-ID: Christopher Samuel writes: > On 22/11/11 06:48, Leif Nixon wrote: > >> Any list members with users logging in from China, please check your ssh >> logs. I'd be interested in comparing notes with you. > > I thought I might have seen something similar back in October 2010: > > Oct 19 00:22:13 merri sshd[11001]: Bad protocol version identification '\377\373\037\377\373 \377\373\030\377\373'\377\375\001\377\373\003\377\375\003\377\370\003' from UNKNOWN Yes, we see the odd telnet handshake coming in as well, but as you say, that is a different matter. Thanks for looking! -- Leif Nixon - Security officer National Supercomputer Centre - Swedish National Infrastructure for Computing Nordic Data Grid Facility - European Grid Infrastructure _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
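For anyone taking Leif up on comparing notes: the '\377\373...' strings in those "Bad protocol version identification" lines are telnet IAC option negotiation, which is how you tell a stray telnet client apart from something more interesting. A minimal sketch that pulls such lines out of an sshd log and decodes the octal escapes -- the log path and exact message format vary by distro, so treat both as assumptions:

# Sketch: scan an sshd log for "Bad protocol version identification" entries
# and decode the octal-escaped bytes as telnet IAC option negotiation.
# Log path and line format are assumptions; adjust for your syslog setup.
import re

IAC = 255
CMD = {251: "WILL", 252: "WONT", 253: "DO", 254: "DONT", 250: "SB", 240: "SE"}
OPT = {1: "ECHO", 3: "SUPPRESS-GO-AHEAD", 24: "TERMINAL-TYPE",
       31: "WINDOW-SIZE", 32: "TERMINAL-SPEED", 39: "NEW-ENVIRON"}

def unescape(s):
    # syslog prints unprintable bytes as \NNN octal escapes, printable ones as-is
    return [int(m.group(1), 8) if m.group(1) else ord(m.group(2))
            for m in re.finditer(r"\\(\d{1,3})|(.)", s)]

def decode(escaped):
    data, out, i = unescape(escaped), [], 0
    while i < len(data):
        if data[i] == IAC and i + 2 < len(data):
            out.append("IAC %s %s" % (CMD.get(data[i + 1], data[i + 1]),
                                      OPT.get(data[i + 2], data[i + 2])))
            i += 3
        else:
            out.append(str(data[i]))
            i += 1
    return ", ".join(out)

if __name__ == "__main__":
    with open("/var/log/auth.log") as log:     # /var/log/secure on RHEL-alikes
        for line in log:
            m = re.search(r"Bad protocol version identification '(.*)' from (\S+)", line)
            if m:
                print(m.group(2), "->", decode(m.group(1)))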