From amacater at galactic.demon.co.uk Sat Nov 1 05:05:07 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sat, 1 Nov 2003 10:05:07 +0000 Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian) In-Reply-To: <1067644757.5702.219.camel@haze.sr.unh.edu> References: <1067629499.21719.73.camel@localhost.localdomain> <1067644757.5702.219.camel@haze.sr.unh.edu> Message-ID: <20031101100507.GA623@galactic.demon.co.uk> On Fri, Oct 31, 2003 at 06:59:16PM -0500, Tod Hagan wrote: > > If Fedora achieves 2-3 upgrades per year then it will be fairly > different from Debian, which seems to be at 2-3 years per upgrade these > days, (well almost). I think it's averaged out at about 18 months overall for each major version release. Point releases of security fixes come out more frequently. Debian 2.2 was there for about two years with about 7 point releases, the last being made days before 3.0. 3.0 has only had one point release - but security fixes and so on are updated quickly. > > After a new release comes out Debian supports the old one for a period > of time (12 months?) with security updates before pulling the plug. > Given a two year release cycle, that means you may get three years of full support. We don't kill things off on fixed dates, necessarily, and it's open to every package maintainer to build fixes for "old stable" for as long as he wishes. One aim is to support upgrades from older systems easily: I'm fairly sure you can go from 2.1 - 2.2 - 3.0 - unstable with about four reboots - so thats about six or seven years of development in a couple of hours :) > Debian can be upgraded in place as opposed to requiring a full > resinstall; while this is great for desktops and servers, I'm not sure > if this is important for a cluster. Upgrades are relatively straightforward - unless you change kernels / a.out -> ELF / glibc major versions, you probably don't need to reboot. 
> > As a result of the extended release cycle Debian stable tends to lack > support for the newest hardware (Opteron 64-bit, for example). This is > why Knoppix, which is based on Debian, isn't derived from Debian stable, > but rather from packages in the newer releases (testing, unstable and > experimental). But the flip side is that the stable release, while > dated, tends to work well as it's had a lot of testing. Debian also works on 11 hardware architectures, with more coming along. We've had 64 bit issues on Alpha, Sparc and Itanium for a while. The 32 bit distribution works well on Opteron but there is also 64 bit stuff working. Testing is creeping asymptotically to release, as ever :) > > Debian could probably use more recognition as a target platform by > commercial software vendors but it incorporates a huge number of > packages including many open source applications pertinent to science. > Breadth in packaged applications is probably more important for > workstations since clusters tend to use small numbers of apps very > intensely. > There's a lot of stuff packaged by Debian people who want, for example, genome sequencing / heavy maths and other "stuff" :) > > Wow, I guess I just slung some FUD at Fedora, but maintaining a 2-3 > releases per year rate probably requires a small core, putting the bulk > of applications into the Extras category and thus increasing the chance > of conflict. (Wasn't that the original recipe for DLL hell?) Debian has > avoided this through a much larger core, which of course slows the > release cycle. > The key is tight dependency control and management. That's what has set Debian apart from the distributions based on .rpm. There's a heavy overhead for the maintainers but hopefully a lighter burden on users. 
[Full disclosure: I'm also amacater at debian.org :) ] Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Nov 1 10:40:24 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 1 Nov 2003 10:40:24 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Joel Jaeggli wrote: > > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > > claim support from RH for more than one of the systems. > > read the liscsense agreement for you redhat enterprise disks... I think his point is that there is some untested legal ground here for GPL distributions, "license agreement" or not. As in it remains to be seen whether it is possible to create a license agreement for a GPL or mostly-GPL distribution that restricts the redistribution of the binary images. To quote from the preamble: When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. Note the phrase "free software". Note also the various inheritance clauses. 
I'm not a lawyer; I don't know how this would ultimately untangle in a court if someone chose to just ignore RH's license agreement and install things as they wished, but I'm sure we'll eventually find out...;-) rgb > > > Regards, > > Steffen > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Nov 1 10:35:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 1 Nov 2003 10:35:03 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian) In-Reply-To: <1067644757.5702.219.camel@haze.sr.unh.edu> Message-ID: On 31 Oct 2003, Tod Hagan wrote: > While looking into the number of packages in Debian vs. Fedora I > stumbled across this frightening bit (gotta throw a Halloween reference > in somewhere) on the Fedora site: > > http://fedora.redhat.com/participate/terminology.html > > Packages in Fedora Extras should avoid conflicts with other packages > > in Fedora Extras to the fullest extent possible. Packages in Fedora > > Extras must not conflict with packages in Fedora Core. > > It seems that Fedora intends to achieve applications breadth through > "Fedora Extras" package sets in other repositories, but the prohibition > of conflicts between Extras packages isn't as strong as the absolute > prohibition of conflicts between Extras and Core packages. Could this > result in a new era of DLL hell a few years down the road? 
> > Wow, I guess I just slung some FUD at Fedora, but maintaining a 2-3 > releases per year rate probably requires a small core, putting the bulk > of applications into the Extras category and thus increasing the chance > of conflict. (Wasn't that the original recipe for DLL hell?) Debian has > avoided this through a much larger core, which of course slows the > release cycle. Pre-yum, I would have said yes, but I honestly think that yum has arrived in the knick of time to rescue RPM-based distributions of all sorts from precisely this. Fedora (and for that matter RH mainstream) appears to have embraced yum (perhaps somewhat reluctantly, given that it obsoletes up2date dramatically before up2date achieved anything like real traction in the marketplace) and AFAIK are yummifying their repositories or plan to soon (as well as provide yum as a part of the distribution). With yum, packages that conflict will have a very, very short lifetime in any public or private repository because the conflicts will be immediately exposed and the conflicting packages either rejected or rebuilt. Note well that with rpm-based distros one can put oneself into hell already by just using rpm --force (and who hasn't done this at least once, seriously:-). If one uses kickstart to install systems and yum to install additional packages and update/upgrade the ones you've got (religiously) one cannot enter hell as the gates are barred. One MAY well have packages with conflicts that one wishes to put into your repositories, but yum will reveal them in short order and you (or the group with whom you share an interest in the packages) will willy-nilly fix the packages or have the PACKAGES consigned to hell. 
The point is that FINALLY having a sensible toolset that can resolve all forward dependencies, revealing conflicts, obsoletes, dependency loops, file (as opposed to package) dependencies, and all the other Evil that rpm as a bare specification permits, at long last enables packages to be developed with something approximating rigor and discipline.

This is one of several reasons that I think that Fedora (or similar rpm-associated projects -- there will likely be more than one) will turn out in the very short run to be MORE reliable than RHE, and that there will be a very distinct flow of energy FROM Fedora BACK to RHE. In fact, I think that as Fedora becomes the "real" open source Red Hat, where development is rapid and problems are rapidly revealed and repaired on the dynamic edge by all the people who actually wrote and use the bulk of the software in ANY Linux distribution, people who buy RHE are increasingly going to be getting Fedora, repackaged and "tested and supported" by RH and resold to people who want to be insulated from the supposedly chaotic process that is PRECISELY what has been driving Linux stable and unstable distributions for years now to everybody's general satisfaction.

yum isn't even "finished", in my opinion, and will only get stronger as tools and concepts are added to the suite. There are some really significant changes in the wind that could conceivably trigger a long-overdue paradigm shift in the way packages in ALL Linux distributions, including e.g. Debian (not just rpm-based ones), are installed. Then there are several ideas out there for tools that don't just install binary/distro compatible rpms from a distro-specific repository, but rather install a binary/distro compatible "base" system and then download and dynamically build source rpms (either for a local install or to BUILD a binary/distro non-conflicted rpm repository for yum install and maintenance).
As I said in an earlier message, I think that the Internet's general response to RH's attempt to coerce money on the order of $100/seat/year (or more, conceivably MUCH more) from all its users is going to be very, very "interesting". Chinese curse "interesting".

(And yes, I think that this is totally absurd on a workstation or a node in a small (<32 node) compute cluster that these days might cost $500-600 full retail, where the license cost is more like 20% of the base hardware cost and even 2%/year would STILL be too much -- noting that small (<32 node) clusters comprise the bulk of installed clusters, and that $3200 or more is TOTALLY out of the question for most of these. I also see no reason whatsoever for a "workstation" distribution to be crippled by omitting http, nfs, and the various so-called "server" packages -- one of Linux's strengths for years has been that when a workstation needs to become a server, you just chkconfig the server features on, possibly after installing a couple of packages. In fact, I see absolutely no reason for any Linux distribution to partition out ANY packages for special treatment -- once they are built for a distribution, they are built; building/rebuilding most of them, once the source rpms are made consistent even one time, is mostly a matter of rpm --rebuild package.src.rpm.)

I could of course be wrong -- maybe we'll all be spending trans-microsoftian amounts of money. I'll be cheerily paying Red Hat several thousand dollars a year for the privilege of running an internal webserver and nfs file server in my house to serve a handful of computers on my kids' and wife's desks. Maybe Duke will just go "sure, we'll just raise tuition by a few hundred more dollars -- the kids and their parents won't care -- so we can give it to Red Hat as we'd MUCH rather pay them even more insane amounts of money than the $17/seat or thereabouts we currently pay to Microsoft."
Maybe the NSF and DOE will go "oh, hey, our bad" and ask the government to raise taxes a bit so that all the government labs that use linux can now spend hundreds to thousands of dollars extra per node/workstation/"server" (with Microsoft sitting there perfectly happy to "compete" head to head for the same privilege, invariably at LOWER PRICES). Maybe consumers in Best Buy will look at those spiffy box sets of Red Hat Linux that have finally started to grace their shelves and say "gee, here is an operating system that costs more than twice as much as WinXX, won't run any of these seven hundred off the shelf WinXX applications, requires considerable expertise to install it and maintain it and doesn't come preloaded on any system I might buy here -- I simply MUST have it." I personally think that Red Hat's board has lost its collective mind. This is dicey ground; their stock price has not coincidentally been skyrocketing as they present a public picture of "becoming another Microsoft or Sun" complete with prices to match. Of course, Microsoft AND Sun have slowly but surely been LOSING market share to linux, largely on the basis of COST-BENEFIT. What will happen if/when this rosy picture of huge profits turns out to be an "expectation bubble" and that it actually SLOWS the rate at which RH is adopted in the corporate marketplace compared to other commercial Unices and Microsofts competing products (not to mention crushes the potential consumer linux marketplace before it even was fully born)? Ugly... As Greg (in his role as a pundit:-) once remarked at a meeting I attended, the game of being a pundit is to try to see the future and make oracular predictions; to be provocative or evocative, right or wrong. So I could be wrong. Either way things will prove "interesting";-) rgb P.S. - to Greg, sorry if I'm misquoting you -- my memory is far from perfect but IIRC this was at the Atlanta ALS meeting and you were on a panel discussion:-) -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From hahn at physics.mcmaster.ca Sat Nov 1 11:45:40 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Sat, 1 Nov 2003 11:45:40 -0500 (EST)
Subject: what happened to deerfield?
Message-ID:

I was just wondering whether anyone had seen tangible evidence of Deerfield (Intel's low-voltage, low-power, 1.5 MB cache Itanium 2, whose 1 GHz version was claimed to sell for $800 or so). I'd be especially interested if any vendors have produced 1U duals: price, heat and performance...

thanks, mark hahn.

From brett at nssl.noaa.gov Sat Nov 1 14:51:44 2003
From: brett at nssl.noaa.gov (Brett Morrow)
Date: Sat, 01 Nov 2003 13:51:44 -0600
Subject: Scyld and MPI
In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
Message-ID: <3FA40ED0.90605@nssl.noaa.gov>

I am running the latest version of Scyld and having some trouble that I hope someone has seen before and can fix. I am running GM version 1.5.1 and MPICH 1.2.3 with the PGI compiler (I have tried versions 4.0-2 and 5.0). I have the SRPM for Scyld for MPICH so I can build the F90 compiler, and it all builds clean. The problem is that I am trying to get a model called WRF to run, and the jobs all start as they are supposed to on all the processors I specify by the np variable.
They all sit at 100% CPU usage, but I get no output. It is as if something is not being passed correctly. Before I switched to Scyld (which I love because it solves so many management headaches and is VERY easy to install on big clusters), I was running Red Hat 7.3 with GM 1.6 and MPICH 1.2.5, and the jobs ran just fine. Does anyone know what might cause this?

Thanks
-Brett Morrow
National Severe Storms Lab

From jsims at csiopen.com Mon Nov 3 00:32:09 2003
From: jsims at csiopen.com (Joey Sims)
Date: Mon, 3 Nov 2003 00:32:09 -0500
Subject: Low Voltage Itanium2
Message-ID: <812B16724C38EE45A802B03DD01FD547226266@exchange.concen.com>

I believe Intel is planning a production run of low voltage Itanium2 chips as their answer to AMD's offering of low voltage Opteron. Who really knows what Intel is doing lately....

Yes, 1U/2P Itanium2 boxes are available.

-joey

==================================================
Joey P. Sims              800.995.4274 - 242
Sales Manager             770.442.5896 - Fax
HPC/Storage Division      www.csilabs.net
Concentric Systems, Inc.  jsims at csiopen.com
====================================ISO9001:2000==

From mathiasbrito at yahoo.com.br Mon Nov 3 06:51:24 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 3 Nov 2003 08:51:24 -0300 (ART)
Subject: Turn on nodes through the network
Message-ID: <20031103115124.77919.qmail@web12203.mail.yahoo.com>

I have finished my Beowulf cluster, and first of all I would like to thank everybody who helped me through this mailing list. I'm sure I'll have a lot of other problems in this new phase, and I already have one. I would like to know how to boot my machines (nodes) over the network: when I turn on my master node, I would like the slave nodes to wake automatically. What can I do? Or where can I find more information?

Thanks
Mathias Brito

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

Yahoo! Mail - o melhor webmail do Brasil
http://mail.yahoo.com.br

From mathiasbrito at yahoo.com.br Mon Nov 3 07:53:54 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 3 Nov 2003 09:53:54 -0300 (ART)
Subject: What`s wrong with my code
Message-ID: <20031103125354.8071.qmail@web12205.mail.yahoo.com>

Well, I had avoided sending my code because it is not the best way to solve the problem, and I am using only the basic MPI calls. The program sums two matrices: it generates two matrices randomly and adds them. But it doesn't work with a matrix larger than 834x834, and I don't know why. Some variables and functions have Portuguese names, but I commented them to say what they do.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

#define LINHAS 835   /* number of rows */
#define COLUNAS 835  /* number of columns */
#define TRUE 1
#define FALSE 0

void juntar(int *, int *);         /* copy a local result row into the final result matrix */
void somar(int *, int *, int *);   /* add two rows */
void imprimir(int[][COLUNAS]);     /* print a matrix */
void inicializar(int[][COLUNAS]);  /* initialize a matrix with random numbers */

int main(int argc, char *argv[]) {
    int minha_parte1[LINHAS],      /* my_part1, my_part2, my_result */
        minha_parte2[LINHAS],
        meu_resultado[LINHAS] = {0};
    int size, my_rank;
    int i, j, tag = 0;
    int master = 0;
    int sair = 0;                  /* exit = 0 */
    MPI_Status status;

    srand(time(NULL));

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if(my_rank == master) {
        int matriz1[LINHAS][COLUNAS],
            matriz2[LINHAS][COLUNAS],
            resultado[LINHAS][COLUNAS];
        int linhas_env = 0;        /* number of rows sent */
        int linhas_rec = 0;        /* number of rows received */

        inicializar(matriz1);
        inicializar(matriz2);

        //imprimir(matriz1);
        //imprimir(matriz2);

        /* hand one row of each matrix to every worker */
        for(i = 1; i < size; i++) {
            if(linhas_env < LINHAS) {
                if(MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD) == MPI_ERR_BUFFER)
                    printf("ERRO\n");
                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                //printf("MASTER: Enviando dados para o processo %d\n", i);
                linhas_env++;
            }
        }

        /* collect results round-robin and send out the remaining rows */
        i = 1;
        while(TRUE) {
            if(linhas_rec < LINHAS) {
                MPI_Recv(&meu_resultado, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
                juntar(&resultado[linhas_rec][0], meu_resultado);
                //printf("MASTER: Recebendo dados do processo %d. Total de linhas recebidas = %d\n", i, linhas_rec + 1);
                linhas_rec++;
            }
            else
                break;

            if(linhas_env < LINHAS) {
                MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                //printf("MASTER: Enviado mais dados para o processo %d. Total de linhas enviadas = %d\n", i, linhas_env + 1);
                linhas_env++;
            }
            if(i == size - 1)
                i = 1;
            else
                i++;
        }

        /* a first element of 0 tells each worker to stop */
        for(i = 1; i < size; i++) {
            MPI_Send(&sair, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
            //printf("MASTER: Finalizando processo %d\n", i);
        }

        printf("\n\n");
        //imprimir(resultado);
    }
    else {
        while(TRUE) {
            MPI_Recv(&minha_parte1, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
            if(minha_parte1[0] == 0)
                break;

            MPI_Recv(&minha_parte2, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);

            somar(minha_parte1, minha_parte2, meu_resultado);

            MPI_Send(&meu_resultado, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

void juntar(int *m, int *r) {
    int i;
    for(i = 0; i < COLUNAS; i++) {
        m[i] = r[i];
    }
}

void somar(int m[], int n[], int r[]) {
    int i;
    for(i = 0; i < COLUNAS; i++)
        r[i] = m[i] + n[i];
}

void imprimir(int m[][COLUNAS]) {
    int i, j;
    for(i = 0; i < LINHAS; i++) {
        for(j = 0; j < COLUNAS; j++) {
            printf("%d\t", m[i][j]);
        }
        printf("\n");
    }
    printf("\n\n");
}

void inicializar(int m[][COLUNAS]) {
    int i, j;
    for(i = 0; i < LINHAS; i++) {
        for(j = 0; j < COLUNAS; j++) {
            m[i][j] = (rand() % 10) + 1;
        }
    }
}

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

Yahoo!
Mail - o melhor webmail do Brasil
http://mail.yahoo.com.br

From Rafael.Tinoco at sun.com Mon Nov 3 07:11:16 2003
From: Rafael.Tinoco at sun.com (rafael david tinoco)
Date: Mon, 03 Nov 2003 10:11:16 -0200
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
References: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID: <1067861475.5670.2.camel@dhcp-sao01-194-186.Brazil.Sun.COM>

Hello Mathias,

I know Sun has a feature called Serial Over LAN; with it, you can power up all stations using the "LAN CONSOLE" in the v60/65 (Intel-based) machines. Try finding something like this.

Regards,
Rafael David Tinoco
Sun Professional Services - Brazil
rafael.tinoco at sun.com

From erwan at mandrakesoft.com Mon Nov 3 05:52:29 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Mon, 03 Nov 2003 11:52:29 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain>
References: <1067629499.21719.73.camel@localhost.localdomain>
Message-ID: <1067856749.902.33.camel@revolution.mandrakesoft.com>

> Hmm... Let's take the case of a 1000 node system. If we assume a
> $3000/node cost (probably low once rack, UPS, hardware support, and
> interconnect are added in), we arrive at an approximate hardware cost of
> $3,000,000. If we were to use the RHEL WS list price of $179/node, we
> get $179,000 or about 6% of the hardware cost. That is assuming RedHat
> will not provide any discount on large volume purchases (unlikely). Is
> 6% unreasonable?

6% is reasonable for a clustering-aware distribution, but not for a general-purpose distro.

> What are the alternatives?
[...]
> - Mandrake - Mandrake has their clustering distribution, which could be
> a good possibility, but the cost is as high or higher than RedHat.

We've already talked about that on this mailing list:
http://www.beowulf.org/pipermail/beowulf/2003-September/008032.html

CLIC & MandrakeClustering are not comparable with RedHat because we really offer a Linux distribution specially redesigned for clustering (the tools, installation, and configuration have been made to meet clustering needs).

> The cluster management portion of the software stack would be great to
> have integrated in to the product, but if third party vendors (Linux
> Networx, OSCAR, Rocks, etc...) can provide the cluster management
> portion on top of the distribution, a solution can be found. In some
> ways this is even better since your cluster management decision is
> independent of the OS vendor.

Our vision is to provide a real distribution based on a generalist distro (MDK 9.0) with lots of applications and modifications for clusters. For example, drakcluster helps you manage your cluster (add/remove nodes or users in a Maui partition, etc.).

--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From yudong at hsb.gsfc.nasa.gov Mon Nov 3 10:30:47 2003
From: yudong at hsb.gsfc.nasa.gov (Yudong Tian)
Date: Mon, 3 Nov 2003 10:30:47 -0500
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID:

WOL (Wake on LAN) will do the trick. If your NICs support WOL, your BIOS lets you turn it on, your nodes have decent power supplies, and your network switches behave normally, then you can have your master node turn on the slave nodes automatically. You can turn them on one by one in whatever sequence you desire.

------------------------------------------------------------
Falun Dafa: The Tao of Meditation (http://www.falundafa.org)
------------------------------------------------------------
Yudong Tian, Ph.D.
NASA/GSFC (301) 286-2275

From bhalevy at panasas.com Mon Nov 3 11:02:54 2003
From: bhalevy at panasas.com (Halevy, Benny)
Date: Mon, 3 Nov 2003 11:02:54 -0500
Subject: What`s wrong with my code
Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF922@PIKES.panasas.com>

Mathias,

I suspect you are running out of stack with higher values of LINHAS and COLUNAS. Try calculating how much memory these automatic variables need...
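
A rough sketch of that calculation for the posted program, covering the three LINHAS x COLUNAS matrices in the master branch plus the three per-row work arrays in main(). It assumes a 4-byte int and an 8 MiB stack limit, which is a common Linux default but is not stated anywhere in the thread (check `ulimit -s` on the real nodes). The function names are made up for illustration, and the last helper shows one possible way to move a matrix to the heap instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 8 MiB stack limit (a common Linux default, not from the post). */
#define ASSUMED_STACK_LIMIT ((size_t)8 * 1024 * 1024)

/* Bytes of automatic storage used by three rows x cols int matrices
 * (matriz1, matriz2, resultado) plus three row-sized work arrays
 * (minha_parte1, minha_parte2, meu_resultado). */
size_t automatic_bytes(size_t rows, size_t cols, size_t int_size)
{
    assert(int_size > 0);
    size_t matrices = 3 * rows * cols * int_size;
    size_t row_buffers = 3 * rows * int_size;
    return matrices + row_buffers;
}

/* Bytes left for everything else on the stack (argv, environment,
 * library call frames). Returns 0 if the arrays alone exceed the limit. */
size_t stack_headroom(size_t rows, size_t cols, size_t int_size)
{
    size_t used = automatic_bytes(rows, cols, int_size);
    return used >= ASSUMED_STACK_LIMIT ? 0 : ASSUMED_STACK_LIMIT - used;
}

/* Hypothetical heap alternative: one zeroed, contiguous block, so that
 * row r starts at &m[r * cols] and can still be passed to a send call
 * as a single buffer. Returns NULL on failure; release with free(). */
int *alloc_matrix(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0 || rows > SIZE_MAX / cols / sizeof(int))
        return NULL;
    return calloc(rows * cols, sizeof(int));
}
```

Under these assumptions, automatic_bytes(835, 835, 4) is 8,376,720 bytes, which leaves only 11,888 bytes (about 11.6 KiB) of the assumed 8 MiB for everything else. That is consistent with the failure appearing just above 834x834, though it does not prove the diagnosis.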
> if(my_rank == master) {
>     int matriz1[LINHAS][COLUNAS],
>         matriz2[LINHAS][COLUNAS], resultado[LINHAS][COLUNAS];

You should consider allocating these matrices dynamically using malloc().

- Benny

>-----Original Message-----
>From: Mathias Brito [mailto:mathiasbrito at yahoo.com.br]
>Sent: Monday, November 03, 2003 7:54 AM
>To: beowulf at beowulf.org
>Subject: What`s wrong with my code
>
>Well, I had avoided sending my code, because it is not the
>best way to solve the problem, and I am using only the
>basic MPI calls. The program computes the sum of two
>matrices. It generates two matrices randomly and
>sums them. But it doesn't work with a matrix larger than
>834x834. I don't know why. Some variables and functions
>have Portuguese names, but I commented them to say what
>they do.
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <time.h>
>#include <mpi.h>
>
>#define LINHAS 835 /*Number of lines*/
>#define COLUNAS 835 /*Number of columns*/
>#define TRUE 1
>#define FALSE 0
>
>void juntar(int *, int*); /*put the result of local operation in the final result matrix*/
>void somar(int *, int*, int*); /*make the sum*/
>void imprimir(int[][COLUNAS]); /*print a matrix*/
>void inicializar(int[][COLUNAS]); /*initialize matrix with random numbers*/
>
>int main(int argc, char *argv[]) {
>    int minha_parte1[LINHAS], /*my_part, my_result*/
>        minha_parte2[LINHAS], meu_resultado[LINHAS] = {0};
>    int size, my_rank;
>    int i, j, tag = 0;
>    int master = 0;
>    int sair = 0; /*exit = 0*/
>    MPI_Status status;
>
>    srand(time(NULL));
>
>    MPI_Init(&argc, &argv);
>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>
>    if(my_rank == master) {
>        int matriz1[LINHAS][COLUNAS],
>            matriz2[LINHAS][COLUNAS], resultado[LINHAS][COLUNAS];
>        int linhas_env = 0; /*number of lines sent*/
>        int linhas_rec = 0; /*number of lines received*/
>
>        inicializar(matriz1);
>        inicializar(matriz2);
>
>        //imprimir(matriz1);
>        //imprimir(matriz2);
>
>        for(i = 1; i < size; i++) {
>            if(linhas_env < LINHAS) {
>                if(MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD) == MPI_ERR_BUFFER)
>                    printf("ERRO\n");
>                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                //printf("MASTER: Enviando dados para o processo %d\n", i);
>                linhas_env++;
>            }
>        }
>
>        i = 1;
>        while(TRUE) {
>            if(linhas_rec < LINHAS) {
>                MPI_Recv(&meu_resultado, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
>                juntar(&resultado[linhas_rec][0], meu_resultado);
>                //printf("MASTER: Recebendo dados do processo %d. Total de linhas recebidas = %d\n", i, linhas_rec + 1);
>                linhas_rec++;
>            }
>            else
>                break;
>
>            if(linhas_env < LINHAS) {
>                MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                //printf("MASTER: Enviado mais dados para o processo %d. Total de linhas enviadas = %d\n", i, linhas_env + 1);
>                linhas_env++;
>            }
>            if(i == size - 1)
>                i = 1;
>            else
>                i++;
>        }
>
>        for(i = 1; i < size; i++) {
>            MPI_Send(&sair, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>            //printf("MASTER: Finalizando processo %d\n", i);
>        }
>
>        printf("\n\n");
>        //imprimir(resultado);
>
>    }
>    else {
>        while(TRUE) {
>            MPI_Recv(&minha_parte1, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
>            if(minha_parte1[0] == 0)
>                break;
>
>            MPI_Recv(&minha_parte2, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
>
>            somar(minha_parte1, minha_parte2, meu_resultado);
>
>            MPI_Send(&meu_resultado, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD);
>        }
>    }
>
>    MPI_Finalize();
>    return 0;
>}
>
>void juntar(int *m, int *r) {
>    int i;
>    for(i = 0; i < COLUNAS; i++) {
>        m[i] = r[i];
>    }
>}
>
>void somar(int m[], int n[], int r[]) {
>    int i;
>    for(i = 0; i < COLUNAS; i++)
>        r[i] = m[i] + n[i];
>}
>
>void imprimir(int m[][COLUNAS]) {
>    int i, j;
>    for(i = 0;i < LINHAS; i++) {
>        for(j = 0; j < COLUNAS; j++) {
>            printf("%d\t", m[i][j]);
>        }
>        printf("\n");
>    }
>    printf("\n\n");
>}
>
>void inicializar(int m[][COLUNAS]) {
>    int i, j;
>    for(i = 0; i < LINHAS; i++) {
>        for(j = 0; j < COLUNAS; j++) {
>            m[i][j] = (rand() % 10) + 1;
>        }
>    }
>}
>
>=====
>Mathias Brito
>Universidade Estadual de Santa Cruz - UESC
>Departamento de Ciências Exatas e Tecnológicas
>Computer Science student

From becker at scyld.com Mon Nov 3 11:52:24 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 11:52:24 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To: <1067861475.5670.2.camel@dhcp-sao01-194-186.Brazil.Sun.COM>
Message-ID:

On Mon, 3 Nov 2003, rafael david tinoco wrote:

> i know sun has one thing called: Serial Over Lan, with that, you can
> power up all stations using the "LAN CONSOLE" in the v60/65 (intel
> based) machines.

This is a standard feature of the Intel IPMI 1.5 specification.

Most implementations of IPMI also allow setting up the BIOS. While not part of the standard, it's a natural connection of BIOS-over-serial and serial-over-LAN.
-- 
Donald Becker                      becker at scyld.com
Scyld Computing Corporation        http://www.scyld.com
914 Bay Ridge Road, Suite 220      Scyld Beowulf cluster system
Annapolis MD 21403                 410-990-9993

From becker at scyld.com Mon Nov 3 11:50:42 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 11:50:42 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID:

On Mon, 3 Nov 2003, Mathias Brito wrote:

> I have finished my Beowulf cluster, and first of all I would
> like to thank everybody who helped me through this
> mailing list. I'm sure that I'll have a lot of
> other problems in this new phase, and I already have
> one. I would like to know how to boot my
> machines (nodes) over the network: when I turn on
> my master node, the slave nodes should wake
> automatically. What can I do? Or where can I
> find more information?

We developed the driver support (needed with some cards) and ether-wake code to do that several years ago:
http://www.scyld.com/expert/wake-on-lan.html

This requires both Wake-on-LAN hardware and soft-power-off, but most modern machines have that.

A more reliable and sophisticated approach is to use systems with IPMI 1.5 support. That generally requires a Baseboard Management Controller (BMC) on the motherboard, which adds $25-$150 to the price.

We have demoed software that hooks into the load monitoring to automatically bring up more cluster nodes as needed. That takes advantage of our ability to boot nodes in only a few seconds, but you might still consider booting your nodes on demand.
-- 
Donald Becker                      becker at scyld.com
Scyld Computing Corporation        http://www.scyld.com
914 Bay Ridge Road, Suite 220      Scyld Beowulf cluster system
Annapolis MD 21403                 410-990-9993

From erwan at mandrakesoft.com Mon Nov 3 10:28:44 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Mon, 03 Nov 2003 16:28:44 +0100
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
References: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID: <1067873324.902.44.camel@revolution.mandrakesoft.com>

On Mon 03/11/2003 at 12:51, Mathias Brito wrote:

> I have finished my Beowulf cluster, and first of all I would
> like to thank everybody who helped me through this
> mailing list. I'm sure that I'll have a lot of
> other problems in this new phase, and I already have
> one. I would like to know how to boot my
> machines (nodes) over the network: when I turn on
> my master node, the slave nodes should wake
> automatically. What can I do? Or where can I
> find more information?

You can add a script to your rc.local that makes a series of calls to ether-wake. Your nodes must be wakeable over the network, but most new computers can do that.
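For readers wondering what a Wake-on-LAN tool has to deliver to a sleeping node, here is a minimal sketch of the standard "magic packet" payload: six 0xFF bytes followed by the target MAC address repeated 16 times. This is not ether-wake's source; build_magic_packet and the constants are hypothetical names, and the transmission step (raw Ethernet frame or UDP broadcast) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6
#define MAGIC_PACKET_LEN (6 + 16 * MAC_LEN) /* 102 bytes in total */

/* Fill `out` with a Wake-on-LAN magic packet for `mac`:
   six 0xFF bytes, then the 6-byte MAC address repeated 16 times.
   Hypothetical helper for illustration; it only builds the payload
   and does not send anything on the network. */
void build_magic_packet(const unsigned char mac[MAC_LEN],
                        unsigned char out[MAGIC_PACKET_LEN])
{
    size_t i;

    memset(out, 0xFF, 6);
    for (i = 0; i < 16; i++)
        memcpy(out + 6 + i * MAC_LEN, mac, MAC_LEN);
}
```

A startup script of the kind described above would repeat this once per node, using a site-specific list of node MAC addresses.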
-- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 3 13:08:29 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 3 Nov 2003 13:08:29 -0500 (EST) Subject: Turn on nodes through the network In-Reply-To: Message-ID: > 1.5 support. That generally requires a Baseboard Management Controller > (BMC) on the motherboard, which adds $25-$150 to the price. I'd very much appreciate seeing an example of this. or do you mean "BMC adds $25-150 to the price of an already gold-plated system"? as a concrete example, Tyan's S2723 is a reasonable example of a board you might find in a cluster. IPMI/BMC is an option via the qlogic zircon, but I have never found a real price for it - one vendor quoted me a little under $Cdn 1000 for the daughtercard, which is just plain ridiculous for a <$500 motherboard. thanks, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Mon Nov 3 14:29:40 2003 From: djholm at fnal.gov (Don Holmgren) Date: Mon, 03 Nov 2003 13:29:40 -0600 Subject: Turn on nodes through the network In-Reply-To: References: Message-ID: On Mon, 3 Nov 2003, Mark Hahn wrote: > > 1.5 support. That generally requires a Baseboard Management Controller > > (BMC) on the motherboard, which adds $25-$150 to the price. > > I'd very much appreciate seeing an example of this. 
or do you mean > "BMC adds $25-150 to the price of an already gold-plated system"? > > as a concrete example, Tyan's S2723 is a reasonable example of a board > you might find in a cluster. IPMI/BMC is an option via the qlogic > zircon, but I have never found a real price for it - one vendor quoted > me a little under $Cdn 1000 for the daughtercard, which is just plain > ridiculous for a <$500 motherboard. > > thanks, mark hahn. On clusters we've built with Supermicro E7500 or E7501 chipset motherboards (P4DPE, X5DPE), which I believe are roughly equivalent to the Tyan S2723 in features and price, there's an IPMI/BMC option card based on the Agilent BMC available. We've paid between $90 and $100 for these cards, depending on volume. I've not purchased Intel motherboards in quantity, but from doing a quick web search, it looks like the incremental price between boards without (SE7501CW2) and with (SE7501BR2) IPMI is no more than $150. Don Holmgren _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Nov 3 14:00:46 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 3 Nov 2003 11:00:46 -0800 Subject: opteron VS Itanium 2 In-Reply-To: References: Message-ID: <20031103190046.GF1167@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: > Yeah, me too. As someone who just ponied up for a rather large IB > installation, I'm not sure that most people realize what a substantial > percentage of the cost of the cluster the IB might be. >From all public indications, IB prices are roughly the same as Myrinet. Nothing new there... 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Mon Nov 3 14:31:40 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Mon, 3 Nov 2003 13:31:40 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <20031103190046.GF1167@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 3 Nov 2003, Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: > > > Yeah, me too. As someone who just ponied up for a rather large IB > > installation, I'm not sure that most people realize what a substantial > > percentage of the cost of the cluster the IB might be. > > From all public indications, IB prices are roughly the same as > Myrinet. Nothing new there... > > -- greg > IB costs significantly more than Myrinet... -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Mon Nov 3 15:07:24 2003 From: michael.worsham at mci.com (Michael Worsham) Date: Mon, 03 Nov 2003 15:07:24 -0500 Subject: Freebee RH Releases... Message-ID: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> As Per Slashdot (http://slashdot.org/) Received a missive this morning from the Red Hat Network, stating that they will discontinue maintenance on Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the end of April, 2004. And, more ominously: 'Red Hat does not plan to release another product in the Red Hat Linux line.' [The full text of the email is on Newsforge.] 
Does this mean that we will all have to using WS or ES version of RedHat, thus getting ripped a bit on support and updates? Anyone have a cluster running on anything else non-RH based and any details for how to do it? -- Michael _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Nov 3 16:51:17 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 3 Nov 2003 13:51:17 -0800 (PST) Subject: Freebee RH Releases... In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: On Mon, 3 Nov 2003, Michael Worsham wrote: > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network, stating that they > will discontinue maintenance on > Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the end > of April, 2004. And, more ominously: 'Red Hat does not plan to release > another product in the Red Hat Linux line.' [The full text > of the email is > on Newsforge.] > > Does this mean that we will all have to using WS or ES version of RedHat, > thus getting ripped a bit on support and updates? Anyone have a cluster > running on anything else non-RH based and any details for how to do it? 
fedora.redhat.com

> -- Michael

-- 
--------------------------------------------------------------------------
Joel Jaeggli  Unix Consulting  joelja at darkwing.uoregon.edu
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

From becker at scyld.com Mon Nov 3 17:36:16 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 17:36:16 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To:
Message-ID:

On Mon, 3 Nov 2003, Mark Hahn wrote:

> > 1.5 support. That generally requires a Baseboard Management Controller
> > (BMC) on the motherboard, which adds $25-$150 to the price.
>
> I'd very much appreciate seeing an example of this. or do you mean
> "BMC adds $25-150 to the price of an already gold-plated system"?

I've seen a quote of +$26 to populate the BMC chip on a board that supported it. Or more precisely, -$26 to delete the chip from the standard config. While not a low-end board, this was a motherboard definitely in the commodity range.

Daughtercard implementations cost at the high end of the range, $70-$150, if you can find them.

> as a concrete example, Tyan's S2723 is a reasonable example of a board
> you might find in a cluster. IPMI/BMC is an option via the qlogic
> zircon, but I have never found a real price for it - one vendor quoted
> me a little under $Cdn 1000 for the daughtercard, which is just plain
> ridiculous for a <$500 motherboard.

That means "we don't know how much it costs, but for $1K we'll find out".

Much like the old VGA feature connector or IRDA header, a BMC header is worthless.
The only way you'll get a BMC is when one is packaged and priced (and tested) with the motherboard. The Zircon chip is easily most common controller, but it seems that the firmware must be tweaked for each implementation. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 3 18:27:50 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 3 Nov 2003 18:27:50 -0500 (EST) Subject: Freebee RH Releases... In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: On Mon, 3 Nov 2003, Michael Worsham wrote: > Does this mean that we will all have to using WS or ES version of RedHat, > thus getting ripped a bit on support and updates? Anyone have a cluster > running on anything else non-RH based and any details for how to do it? Fedora, fedora, fedora. http://fedora.redhat.com/ AFAICT, this is going to de facto be "Red Hat 10" (except that one can probably not say that because of trademark issues and so forth), but with more "community involvement". Community involvement that will probably be a GOOD thing, by the way. Fedora will come pre-yummified at the core and will have RH engineers continuing to be heavily involved. This is only sensible as I expect fedora to become the core of RH's development cycle, as they aren't going to be able to offer "rawhide" of any sort at RHE prices. However, with the community really participating, I also don't expect fedora to be in any sense "alpha" or "beta" versions of RHE -- more like RHL, reasonably stable, reasonably supported, but don't expect to be able to just call RH and demand to talk to a systems engineer and get help. 
Which, by the way, one doesn't really do now. So nobody RHish panic, just start looking into fedora, maybe join its list(s). BTW, I expect there to be opteron support in fedora pretty soon as well. There better be; I'm getting a bunch of them...;-) rgb > > -- Michael > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 3 18:44:18 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 3 Nov 2003 18:44:18 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) (fwd) Message-ID: Andrew sent me this but forgot to add the list address, so I'm forwarding it on to the list for him...:-) I'll probably send my reply to this later. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu ---------- Forwarded message ---------- Date: Sat, 1 Nov 2003 19:48:52 +0000 From: Andrew M.A. Cater To: Robert G. Brown Subject: Re: Cluster Poll Results (tangent into OS choices) On Sat, Nov 01, 2003 at 10:40:24AM -0500, Robert G. Brown wrote: > On Fri, 31 Oct 2003, Joel Jaeggli wrote: > > > > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > > > claim support from RH for more than one of the systems. 
> > > > read the license agreement for your Red Hat Enterprise disks...
> RH can request that they be allowed to audit your cluster, IIRC.

I think the idea is that RH Enterprise [Server/Advanced Server/Workstation] is trademarked, copyrighted and contains some non-GPL portions. RH can therefore insist that you install only one copy per single machine as per your licence - you can't just copy the binaries and put the one copy on to your other 1023 nodes. But you do get (up to) five years support. [You can't buy RHE without support, IIRC].

If you add non-GPL software to an otherwise GPL'd distribution, you can charge for it: you can also restrict the use of the whole distribution thus created, as I understand it, because it contains your proprietary code. If you modify GPL'd software in order to create your proprietary added value, however, then that modified software must in turn be available under the GPL.

FWIW, SUSE operate the same way: you can't buy SUSE .iso's unless you buy the box and you are not licensed to make copies thereof. [SUSE do, however, make it possible to install the whole distribution via ftp from their site].

> Note the phrase "free software". Note also the various inheritance
> clauses. I'm not a lawyer; I don't know how this would ultimately
> untangle in a court if someone chose to just ignore RH's license
> agreement and install things as they wished, but I'm sure we'll
> eventually find out...;-)

I trained as an (English) lawyer - but didn't pursue that to practice. It would depend on the jurisdiction. Another potentially good reason to go with Debian - which doesn't restrict use, modification, distribution or field of endeavour. I won't mention other purported Linux distributions which now require you to sign a non-GPL licence before you can download GPL-licensed updates, but that too is an interesting case. :)

Andy

[Potentially OT PS: Yum appears to be a re-invention of apt functionality with some improvements.
You've hit the same dependency problems that may already have been solved by apt three years ago. The _real_ trick is to sort dependency issues properly at a distribution-wide level. My problem with RH has always been that RH doesn't include enough packages, and those packages that are not packaged directly by RH can be of variable quality - hence digging the package out from the 'Net somewhere and DLL hell. (It also doesn't help that there are five or six vendors out there with "different" .rpms of the same thing for SUSE/Mandrake/RH7.x/8.0/9.0 because RPM has been interpreted/implemented in a variable way).

Debian isn't perfect - a spell spent on reading the mailing list archives would _easily_ prove this :) - but, perversely, having 8710 packages in the "stable" tree and about 14000 in unstable has meant that the main distribution often contains exactly the package you were looking for, ready to drop in.

[It's also quite useful, for example, to re-use legacy lab hardware and run your new cluster on Opterons but display the pretty output graphs on your Suns and do post-processing of the data on your old Alphas. I couldn't do that with the same versions of the software on all three architectures on any other current GNU/Linux distribution :) ]

For those who haven't used Debian and wonder what all the fuss is about:

    apt-cache show foo

will give you all the details of foo

    apt-get install foo

will install foo and all its dependencies in one operation

And by using the following command line

    apt-get update ; apt-get dist-upgrade

your entire machine will be brought up to date.

[Where apt-get update updates your package list and apt-get dist-upgrade resolves the interdependencies and fetches the necessary packages.]
] _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 3 17:30:51 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 03 Nov 2003 17:30:51 -0500 Subject: opteron VS Itanium 2 In-Reply-To: References: Message-ID: <3FA6D71B.1020900@comcast.net> Rocky McGaugh wrote: >On Mon, 3 Nov 2003, Greg Lindahl wrote: > > > >>On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: >> >> >> >>>Yeah, me too. As someone who just ponied up for a rather large IB >>>installation, I'm not sure that most people realize what a substantial >>>percentage of the cost of the cluster the IB might be. >>> >>> >>From all public indications, IB prices are roughly the same as >>Myrinet. Nothing new there... >> >>-- greg >> >> >> > >IB costs significantly more than Myrinet... > > Are you sure? In the quotes I've gotten, it's about the same as Myrinet except for very small clusters (perhaps 4 nodes or less). In fact in some cases, it's cheaper than Myrinet. :) Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 3 18:50:33 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 03 Nov 2003 18:50:33 -0500 Subject: Freebee RH Releases... In-Reply-To: References: Message-ID: <3FA6E9C9.7010804@comcast.net> Robert G. Brown wrote: >On Mon, 3 Nov 2003, Michael Worsham wrote: > > >>Does this mean that we will all have to using WS or ES version of RedHat, >>thus getting ripped a bit on support and updates? Anyone have a cluster >>running on anything else non-RH based and any details for how to do it? >> >> > >Fedora, fedora, fedora. 
> > http://fedora.redhat.com/ > >AFAICT, this is going to de facto be "Red Hat 10" (except that one can >probably not say that because of trademark issues and so forth), but >with more "community involvement". Community involvement that will >probably be a GOOD thing, by the way. > >Fedora will come pre-yummified at the core and will have RH engineers >continuing to be heavily involved. This is only sensible as I expect >fedora to become the core of RH's development cycle, as they aren't >going to be able to offer "rawhide" of any sort at RHE prices. However, >with the community really participating, I also don't expect fedora to >be in any sense "alpha" or "beta" versions of RHE -- more like RHL, >reasonably stable, reasonably supported, but don't expect to be able to >just call RH and demand to talk to a systems engineer and get help. > >Which, by the way, one doesn't really do now. > >So nobody RHish panic, just start looking into fedora, maybe join its >list(s). > >BTW, I expect there to be opteron support in fedora pretty soon as well. >There better be; I'm getting a bunch of them...;-) > Let me mention cAos (caosity.org). It's a community supported RH based OS built on RH EL (3.0 I think). It's following the letter of the law in removing trademarks from RH EL and rebuilding the distribution with some add features. If you look at the list of people involved, I think you'll see some familiar names from this mailing list. Consequently, I think cAos will be built with clusters in mind. Anyway, just a suggestion. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Mon Nov 3 18:31:03 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Mon, 3 Nov 2003 17:31:03 -0600 (CST) Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: I can hold my tongue no longer. Most of us are faced with similar problems. Several groups are in process of making a freely distributable OS for scientific use. The cAos project (http://www.caosity.org) is one such, and one that I feel is worth looking into. The website does not reflect their expanded goals as well as it could. cAos seeks to fill many needs. There will be four main flavors of cAos to match the needs expressed by the community. They are best described here: http://caosity.org/pipermail/caos/2003-September/000385.html cAosel-2 will be built from the SRPMS located at: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS It is not yet ready for release, but it is close. It should provide a long-term base that will be great for use in clustering and server environments. There will be x86, x86_64, and ia64 versions. Other flavors of cAos will provide a good base for scientific desktops. -- Rocky McGaugh _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Mon Nov 3 18:51:41 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 16:51:41 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031031203745.GU1408@aminor.cs.uiuc.edu>; from weideng@uiuc.edu on Fri, Oct 31, 2003 at 02:37:45PM -0600 References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> Message-ID: <20031103165141.A3153@lnxi.com> On Fri, Oct 31 2003 at 13:37, Wei Deng wrote: > On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > > - OSCAR / Rocks / etc... - generally installed on top of another > > distribution. We still have to pick a base distribution. 
> > From what I heard from Rocks mailing list, they will release 3.1.0 the > next Month, which will be based on RHEL 3.0, compiled from source code > that is publicly available, and free of charge. Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test for corporations trying to coexist and actually work with Red Hat. Why not focus that questionable rebuilding effort on a more worthwhile task? E.g. porting Fedora Core to support amd64, ia64, etc; adding features to Fedora Core that are relevant to clustering, etc. > Even though Rocks is based on RedHat distribution, it is complete, which > means you only need to download Rocks ISOs to accomplish your > installation. All well and good, but basing a "complete" clustering solution on a reverse engineered RHEL is completely underhanded and wrong (regardless of whether you feel RH is being greedy or whatever). Ripping off RHEL is a pretty cheap contribution to the advancement of free clustering technology. But maybe this type of thing gets peoples' ROCKS off? Mike (these views are my own; I just happen to work for a clustering company ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmkurtzer at lbl.gov Mon Nov 3 20:42:19 2003 From: gmkurtzer at lbl.gov (Greg Kurtzer) Date: Mon, 3 Nov 2003 17:42:19 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103165141.A3153@lnxi.com> References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> <20031103165141.A3153@lnxi.com> Message-ID: <20031104014219.GB32428@tux.lbl.gov> On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > for corporations trying to coexist and actually work with Red Hat. 
Why > not focus that questionable rebuilding effort on a more worthwhile task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > Fedora Core that are relevant to clustering, etc. I guess what some would consider a worth while task others would consider a waste of time. From what I see, Fedora core is an unreasonable solution for me and I will not be contributing to it while RH holds every seat on the steering committee and rules all directions. Not that I have anything against RH, it is just that there is a major conflict of interest, don't you think? If Fedora gets too good, won't it take business from RHEL? > > Even though Rocks is based on RedHat distribution, it is complete, which > > means you only need to download Rocks ISOs to accomplish your > > installation. > > All well and good, but basing a "complete" clustering solution on a reverse > engineered RHEL is completely underhanded and wrong (regardless of whether > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > cheap contribution to the advancement of free clustering technology. But > maybe this type of thing gets peoples' ROCKS off? Uhmm, what is reversed engineered? The source _is_ open ya know... ;) Not that I have anything against what RH is doing, but to prove a point... Isn't RH taking code from the community, and selling it back to the community with limitations on redistribution? It seems to me that to accuse the community of "ripping off" OSS software is a bit harsh. So as RH has stated, their business model is not about the code, rather their support models around the code, and their trademark. Now I do want to mention that I think that RH's new direction is what is needed for Linux to become a suitable Enterprise solution. This move however left a vacancy in the community which is why projects are emerging or changing direction to fix this. It is OSS evolution (see: http://caosity.org/). 
> (these views are my own; I just happen to work for a clustering company ;) My views are also mine and not necessarily shared by my employers. ;) Greg -- Greg M. Kurtzer, CSE: Linux cluster specialist Lawrence Berkeley National Laboratory Contact: O=510.495.2307, P=510.448.4540, M=510.928.9953 1 Cyclotron Road MS:50C-3396, Berkeley, CA 94720 http://www.lbl.gov, http://scs.lbl.gov/, http://lug.lbl.gov/ Email: GMKurtzer_at_lbl.gov, Text: 5109289953_at_mobileatt.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Nov 3 21:08:34 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Mon, 03 Nov 2003 21:08:34 -0500 Subject: Freebee RH Releases... In-Reply-To: <3FA6E9C9.7010804@comcast.net> References: <3FA6E9C9.7010804@comcast.net> Message-ID: <1067911714.4434.32.camel@protein.scalableinformatics.com> On Mon, 2003-11-03 at 18:50, Jeffrey B. Layton wrote: > Let me mention cAos (caosity.org). It's a community supported RH > based OS built on RH EL (3.0 I think). It's following the letter of the > law in removing trademarks from RH EL and rebuilding the distribution > with some add features. > If you look at the list of people involved, I think you'll see some > familiar names from this mailing list. Consequently, I think cAos will > be built with clusters in mind. On a related note, has anyone played with Warewulf? I'd like to hear back from end users who have built their clusters using this system. Please send me email offline so as not to pollute the ongoing RH discussion... 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 3 21:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 3 Nov 2003 18:13:21 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) - options In-Reply-To: <20031103165141.A3153@lnxi.com> Message-ID: hi ya On Mon, 3 Nov 2003, Mike Snitzer wrote: > On Fri, Oct 31 2003 at 13:37, > Wei Deng wrote: > > > On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > > > - OSCAR / Rocks / etc... - generally installed on top of another > > > distribution. We still have to pick a base distribution. > > > > From what I heard from Rocks mailing list, they will release 3.1.0 the > > next Month, which will be based on RHEL 3.0, compiled from source code > > that is publicly available, and free of charge. > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > for corporations trying to coexist and actually work with Red Hat. Why > not focus that questionable rebuilding effort on a more worthwhile task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > Fedora Core that are relevant to clustering, etc. i think that any proprietary sw should be avoided if it requires $$$ and licenses; unfortunately, sometimes, 3rd party sw is built and tested against things like RH - AS and its permutations and derivatives .. so my feeling is to avoid those 3rd party vendors too - it's a choice of: - pay RH licenses ( cheaper than an inhouse $150K/yr linux guy ??) 
- get an in-house linux dude that can support all the GPL stuff and tweak it to your needs/requirements - buy/get a "free" distro that has most of the apps you need - working on the latest/greatest pre-release or beta/alpha release implies you have lots of in-house development expertise or the ability to manipulate the vendor's priorities to fix the bugs you find - am thinking, ( naively ? ) why is it so hard to get a distro that does what one needs and avoid "license fees" - what's so special ... - in every instance that a 3rd party vendor required xx-linux-version-0.5 ... i've been able to make those apps work on the latest/greatest version of said vendor or other distro - reading the various sw licenses is a full time job too :-) - support should be done by in-house staff, or outsourced to expensive "support outfits" like rh, ibm, and a few others - doing support in house is best, if staff is available, in which case, the fact that redhat is not providing support for older distros is a non-issue .. - the fact that redhat and other distros want to collect license fees or break something that used to work in prior releases is a big problem in my book, especially when 90%-99% of the apps that are running are all GPL'd - but, on the bright side, at least rh is directly or indirectly doing and supporting a lot of development work that is released as gpl have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Mon Nov 3 21:02:23 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Mon, 3 Nov 2003 21:02:23 -0500 Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> References: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: <20031103210223.C1594@www2> I thought that this article: http://news.com.com/2100-7344_3-5094774.html did a pretty good job of explaining what Red Hat is up to, and what some of the implications are. --Bob On Mon, Nov 03, 2003 at 03:07:24PM -0500, Michael Worsham wrote: > > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Mon Nov 3 22:19:48 2003 From: jsims at csiopen.com (Joey Sims) Date: Mon, 3 Nov 2003 22:19:48 -0500 Subject: IB vs Myrinet Message-ID: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> I believe IB is a much better interconnect technology than Myrinet, period. Plus, you don't have to deal with Myricom. IB is about to find major traction in this industry and Myricom will not have the guns to stop it. As adoption rates increase the price will decrease quite rapidly. I've been working with Mellanox and Topspin, both using Mellanox chips, but their product positioning is different. The difference between the two is that Topspin offers a more "value added" flavor of Mellanox silicon with various hardware tweaks and a more robust software package. It depends on how you're looking at the cost of IB. First of all, it's comparative to Myrinet in "cost per port". Not too long ago, Myrinet was higher in price than IB is today and they haven't come out with anything "new" in forever. Well, except a PCI-X version when PCI Express is around the corner. Myricom has a lot of installations worldwide and they are highly credible without a doubt, but this industry moves very fast and new things are not a new thing. 
At 3x the performance of Myrinet, "comparative" is still a better value. IB has many different options such as bridging between IB, GbE, or FC so you could hang your storage boxes off the IB switch without much hassle. Up to 10Gb/sec is fairly fat today. The roadmap for IB has this interconnect technology ratcheted up way higher than 10Gb. Regards, ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Mon Nov 3 22:50:42 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: Mon, 03 Nov 2003 21:50:42 -0600 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <1067917842.3219.88.camel@terra> On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. > Hmmm, all my dealings with Myricom have been excellent. We had a frame failure right before a holiday and they happily cross-shipped a replacement. We were back in business very quickly. All our questions of support have been answered quickly and accurately. -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at cimav.edu.mx Mon Nov 3 21:13:32 2003 From: luis.licon at cimav.edu.mx (luis.licon at cimav.edu.mx) Date: Mon, 3 Nov 2003 19:13:32 -0700 (MST) Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> References: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: <37053.148.223.46.10.1067912012.squirrel@www3.cimav.edu.mx> Fedora ;) (http://fedora.redhat.com) cheerz, Luis > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network, stating that > they > will discontinue maintenance > on > Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the > end > of April, 2004. And, more ominously: 'Red Hat does not plan to release > another product in the Red Hat Linux line.' [The full text > of the email is > on Newsforge.] > > Does this mean that we will all have to using WS or ES version of RedHat, > thus getting ripped a bit on support and updates? Anyone have a cluster > running on anything else non-RH based and any details for how to do it? > > -- Michael > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Mon Nov 3 22:48:08 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Mon, 3 Nov 2003 19:48:08 -0800 Subject: Fwd: Cluster Poll Results (tangent into OS choices) Message-ID: Begin forwarded message: > From: Glen Otero > Date: Mon Nov 3, 2003 6:42:09 PM US/Pacific > To: beowulf at beowulf.org, Mike Snitzer > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: Cluster Poll Results (tangent into OS choices) > > > On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > >> On Fri, Oct 31 2003 at 13:37, >> Wei Deng wrote: >> >>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>>> - OSCAR / Rocks / etc... 
- generally installed on top of another >>>> distribution. We still have to pick a base distribution. >>> >>> From what I heard from Rocks mailing list, they will release 3.1.0 >>> the >>> next Month, which will be based on RHEL 3.0, compiled from source >>> code >>> that is publicly available, and free of charge. >> >> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >> smile-test >> for corporations trying to coexist and actually work with Red Hat. > > Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with > and/or sell Rocks-based clusters? Because it won't pass the smile test > inside a corporation? > >> Why >> not focus that questionable rebuilding effort on a more worthwhile >> task? >> E.g. porting Fedora Core to support amd64, ia64, etc; adding features >> to >> Fedora Core that are relevant to clustering, etc. >> >>> Even though Rocks is based on RedHat distribution, it is complete, >>> which >>> means you only need to download Rocks ISOs to accomplish your >>> installation. >> >> All well and good, but basing a "complete" clustering solution on a >> reverse >> engineered RHEL is completely underhanded and wrong (regardless of >> whether >> you feel RH is being greedy or whatever). Ripping off RHEL is a >> pretty >> cheap contribution to the advancement of free clustering technology. >> But >> maybe this type of thing gets peoples' ROCKS off? > > It's hardly reverse engineered, underhanded, or wrong. The Rocks guys > have been releasing their software for years based on standard Red Hat > releases. In order to make their cluster software freely available on > ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They > also had planned to base the Rocks software on RH9 in the near future, > but RH decided to stop supporting everything but RHEL. 
So, in order to > continue to provide the community with the latest and greatest > clustering software with a Red Hat foundation, the Rocks guys are > migrating to a RHEL release. And in order to keep it free of charge, > they are building it all from scratch using RHEL srpms. And don't > think they are pulling one over on Red Hat or ripping Red Hat off. The > Rocks crew communicates frequently with Red Hat regarding these very > issues. Red Hat knows exactly what they are doing and supports it. > Besides, the technology that makes Rocks what it is is hardly due to > anything Red Hat creates. It's all the software that the Rocks crew > has written and packaged on top of Red Hat that matters. > >> Mike >> >> (these views are my own; I just happen to work for a clustering >> company ;) > > These views are my own. I just happen to own a clustering company. > > Glen Otero, Ph.D. > Linux Prophet > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Mon Nov 3 23:14:43 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Mon, 3 Nov 2003 20:14:43 -0800 Subject: Fwd: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS choices) Message-ID: <6AB8D414-0E7D-11D8-9947-000393911A90@linuxprophet.com> Begin forwarded message: > From: "Philip Papadopoulos" > Date: Mon Nov 3, 2003 7:40:44 PM US/Pacific > To: "Glen Otero" > To: npaci-rocks-discussion-admin at sdsc.edu > To: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS > choices) > Reply-To: phil at sdsc.edu > > Since I don't read the beowulf list ... Somebody can forward. > > 1) Rocks isn't cheaply ripping off redhat. 
It certainly is our right > under gpl to provide the distro in the fashion we do. > 2) I would like redhat to have a cluster pricing that > academics/companies can afford and makes sense so they don't opt out > of using a tested product. That price is > $0. > 3) Redhat does an immense amount of work for the entire linux > community. I'd like them to have a way for us to give them a > reasonable amount of money for clusters. It takes real people, time, > and money to build a complete, tested distro. > 4) Fedora is really a rolling beta. The community needs this. But, > people who run "production" clusters desire regression-tested distros > and more slowly moving software. > 5) I don't work for or run a clustering company. I don't own stock in > redhat. > > 6) Open source means freedom in software. It doesn't mean free beer. > > -p > -----Original Message----- > From: Glen Otero > Date: Mon, 3 Nov 2003 18:49:51 > To:npaci-rocks-discussion at sdsc.edu > Subject: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS > choices) > > > > Begin forwarded message: > >> From: Glen Otero >> Date: Mon Nov 3, 2003 6:42:09 PM US/Pacific >> To: beowulf at beowulf.org, Mike Snitzer >> Cc: npaci-rocks-discussion at sdsc.edu >> Subject: Re: Cluster Poll Results (tangent into OS choices) >> >> >> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: >> >>> On Fri, Oct 31 2003 at 13:37, >>> Wei Deng wrote: >>> >>>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>>>> - OSCAR / Rocks / etc... - generally installed on top of another >>>>> distribution. We still have to pick a base distribution. >>>> >>>> From what I heard from Rocks mailing list, they will release 3.1.0 >>>> the >>>> next Month, which will be based on RHEL 3.0, compiled from source >>>> code >>>> that is publicly available, and free of charge. 
>>> >>> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >>> smile-test >>> for corporations trying to coexist and actually work with Red Hat. >> >> Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with >> and/or sell Rocks-based clusters? Because it won't pass the smile test >> inside a corporation? >> >>> Why >>> not focus that questionable rebuilding effort on a more worthwhile >>> task? >>> E.g. porting Fedora Core to support amd64, ia64, etc; adding features >>> to >>> Fedora Core that are relevant to clustering, etc. >>> >>>> Even though Rocks is based on RedHat distribution, it is complete, >>>> which >>>> means you only need to download Rocks ISOs to accomplish your >>>> installation. >>> >>> All well and good, but basing a "complete" clustering solution on a >>> reverse >>> engineered RHEL is completely underhanded and wrong (regardless of >>> whether >>> you feel RH is being greedy or whatever). Ripping off RHEL is a >>> pretty >>> cheap contribution to the advancement of free clustering technology. >>> But >>> maybe this type of thing gets peoples' ROCKS off? >> >> It's hardly reverse engineered, underhanded, or wrong. The Rocks guys >> have been releasing their software for years based on standard Red Hat >> releases. In order to make their cluster software freely available on >> ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They >> also had planned to base the Rocks software on RH9 in the near future, >> but RH decided to stop supporting everything but RHEL. So, in order to >> continue to provide the community with the latest and greatest >> clustering software with a Red Hat foundation, the Rocks guys are >> migrating to a RHEL release. And in order to keep it free of charge, >> they are building it all from scratch using RHEL srpms. And don't >> think they are pulling one over on Red Hat or ripping Red Hat off. The >> Rocks crew communicates frequently with Red Hat regarding these very >> issues. 
Red Hat knows exactly what they are doing and supports it. >> Besides, the technology that makes Rocks what it is is hardly due to >> anything Red Hat creates. It's all the software that the Rocks crew >> has written and packaged on top of Red Hat that matters. >> >>> Mike >>> >>> (these views are my own; I just happen to work for a clustering >>> company ;) >> >> These views are my own. I just happen to own a clustering company. >> >> Glen Otero, Ph.D. >> Linux Prophet > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > > Sent via BlackBerry - a service from AT&T Wireless. > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at callident.com Mon Nov 3 21:42:09 2003 From: glen at callident.com (Glen Otero) Date: Mon, 3 Nov 2003 18:42:09 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103165141.A3153@lnxi.com> Message-ID: <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > On Fri, Oct 31 2003 at 13:37, > Wei Deng wrote: > >> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>> - OSCAR / Rocks / etc... - generally installed on top of another >>> distribution. We still have to pick a base distribution. >> >> From what I heard from Rocks mailing list, they will release 3.1.0 the >> next Month, which will be based on RHEL 3.0, compiled from source code >> that is publicly available, and free of charge. > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the > smile-test > for corporations trying to coexist and actually work with Red Hat. Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with and/or sell Rocks-based clusters? Because it won't pass the smile test inside a corporation? 
> Why > not focus that questionable rebuilding effort on a more worthwhile > task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features > to > Fedora Core that are relevant to clustering, etc. > >> Even though Rocks is based on RedHat distribution, it is complete, >> which >> means you only need to download Rocks ISOs to accomplish your >> installation. > > All well and good, but basing a "complete" clustering solution on a > reverse > engineered RHEL is completely underhanded and wrong (regardless of > whether > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > cheap contribution to the advancement of free clustering technology. > But > maybe this type of thing gets peoples' ROCKS off? It's hardly reverse engineered, underhanded, or wrong. The Rocks guys have been releasing their software for years based on standard Red Hat releases. In order to make their cluster software freely available on ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They also had planned to base the Rocks software on RH9 in the near future, but RH decided to stop supporting everything but RHEL. So, in order to continue to provide the community with the latest and greatest clustering software with a Red Hat foundation, the Rocks guys are migrating to a RHEL release. And in order to keep it free of charge, they are building it all from scratch using RHEL srpms. And don't think they are pulling one over on Red Hat or ripping Red Hat off. The Rocks crew communicates frequently with Red Hat regarding these very issues. Red Hat knows exactly what they are doing and supports it. Besides, the technology that makes Rocks what it is is hardly due to anything Red Hat creates. It's all the software that the Rocks crew has written and packaged on top of Red Hat that matters. > Mike > > (these views are my own; I just happen to work for a clustering > company ;) These views are my own. I just happen to own a clustering company. Glen Otero, Ph.D. 
Linux Prophet Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Mon Nov 3 23:28:49 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 21:28:49 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104014219.GB32428@tux.lbl.gov>; from gmkurtzer@lbl.gov on Mon, Nov 03, 2003 at 05:42:19PM -0800 References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> <20031103165141.A3153@lnxi.com> <20031104014219.GB32428@tux.lbl.gov> Message-ID: <20031103212849.A4021@lnxi.com> On Mon, Nov 03 2003 at 18:42, Greg Kurtzer wrote: > On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > > for corporations trying to coexist and actually work with Red Hat. Why > > not focus that questionable rebuilding effort on a more worthwhile task? > > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > > Fedora Core that are relevant to clustering, etc. > > I guess what some would consider a worth while task others would consider a > waste of time. From what I see, Fedora core is an unreasonable solution for me > and I will not be contributing to it while RH holds every seat on the steering > committee and rules all directions. Not that I have anything against RH, it is > just that there is a major conflict of interest, don't you think? > > If Fedora gets too good, won't it take business from RHEL? I have the same concerns but think it would be better to challenge the level of control that the RedHat-only committee will exert on the Fedora Project sooner rather than later. 
Below you reference how RedHat says it's not about the code; so why should Red Hat _really_ care if Fedora is even better than the enterprise offering? If RedHat holds Fedora too close to their chest they'll give people a _real_ reason to defect to other solutions. > > > Even though Rocks is based on RedHat distribution, it is complete, which > > > means you only need to download Rocks ISOs to accomplish your > > > installation. > > > > All well and good, but basing a "complete" clustering solution on a reverse > > engineered RHEL is completely underhanded and wrong (regardless of whether > > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > > cheap contribution to the advancement of free clustering technology. But > > maybe this type of thing gets peoples' ROCKS off? > > Uhmm, what is reversed engineered? The source _is_ open ya know... ;) Yep, reverse engineered is the wrong term; how about time spent uncovering what is RH-specific that needs to be removed/replaced? I'd be inclined to say that the sustained engineering effort that is proposed for cAosel would be better spent innovating Fedora; but maybe that's just me. Today, RedHat developers openly stated on the fedora-devel list that RHELv3 code (specifically amd64 code) is open for all to filter into Fedora. Now that's a true test of the RedHat-only committee, no? > Not that I have anything against what RH is doing, but to prove a point... > Isn't RH taking code from the community, and selling it back to the community > with limitations on redistribution? It seems to me that to accuse the > community of "ripping off" OSS software is a bit harsh. > > So as RH has stated, their business model is not about the code, rather their > support models around the code, and their trademark. > > Now I do want to mention that I think that RH's new direction is what is > needed for Linux to become a suitable Enterprise solution. 
This move however > left a vacancy in the community which is why projects are emerging or changing > direction to fix this. It is OSS evolution (see: http://caosity.org/). Fair enough, but keep in mind that the polished innovations that RedHat has put into the Red Hat product are free too; hence the ability to just rebuild their RHEL SRPMs. Red Hat realized there was a large segment of the OSS community that would be left in the cold by their move; they balanced that fact with Fedora. Conspiracy theories on the RedHat-only committee aside, Fedora is a pretty good peace offering. Time will tell if Fedora truly is good for OSS; but to just go off and further splinter the RPM-based Linux distro space (with cAos, or whatever) is short-sighted. OSCAR, ROCKS, and Warewulf could very easily take the time to make Fedora into what they need it to be. Given that moderately innovative competing solutions to the same problem have been the chosen path for clustering, why not seal the same fate for the Linux distributions that they're based on, right? This is a fun debate, but might be too off-topic... feel free to email me either way. Thanks, Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Mon Nov 3 23:13:32 2003 From: jsims at csiopen.com (Joey Sims) Date: Mon, 3 Nov 2003 23:13:32 -0500 Subject: IB vs Myrinet Message-ID: <812B16724C38EE45A802B03DD01FD547226268@exchange.concen.com> Hello Dean, I was stating this opinion from a manufacturer's viewpoint. A viewpoint expressed outside of their circle of distribution partners. You have to have outstanding products, service and support to be as successful as Myricom. -joey ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. 
jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Nov 3 23:58:16 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 03 Nov 2003 23:58:16 -0500 Subject: Fwd: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <3FA731E8.2060603@scalableinformatics.com> Glen Otero wrote: > > > Begin forwarded message: > >> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: >> >>> On Fri, Oct 31 2003 at 13:37, >>> Wei Deng wrote: >>> >>>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>> [...] >>> >>> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >>> smile-test >>> for corporations trying to coexist and actually work with Red Hat. >> >> >> Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with >> and/or sell Rocks-based clusters? Because it won't pass the smile >> test inside a corporation? > The "smile" test? I thought it was all about risks, support, etc. ROCKS appears to be in significant use as indicated by the ROCKS counter page. Remember that RedHat's added value is in packaging, bug fixes, etc. They bundle many peoples' code (Don's and probably a number of others here). They have added value back to the community as a whole. That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. >> >>> Why >>> not focus that questionable rebuilding effort on a more worthwhile >>> task? >>> E.g. porting Fedora Core to support amd64, ia64, etc; adding >>> features to >>> Fedora Core that are relevant to clustering, etc. >>> I would argue that Fedora is more like a permanent beta. 
It doesn't look like we will get good things into Fedora anytime soon (x86_64, XFS et al), and the release/support cycle is too short to be useful for long term customer support. The risks of that platform would be somewhat high for a commercial deployment, and I would find it hard to justify installing this for a customer knowing full well that next year, they are support free. >>>> Even though Rocks is based on RedHat distribution, it is complete, >>>> which >>>> means you only need to download Rocks ISOs to accomplish your >>>> installation. >>> >>> >>> All well and good, but basing a "complete" clustering solution on a >>> reverse >>> engineered RHEL is completely underhanded and wrong (regardless of >>> whether >>> you feel RH is being greedy or whatever). Ripping off RHEL is a pretty >>> cheap contribution to the advancement of free clustering >>> technology. But >>> maybe this type of thing gets peoples' ROCKS off? >> Ripping off RedHat? I thought they were packaging GPL and similar software... how is taking GPL software which is Libre' and redistributing recompiled versions of it (allowable under the license) ripping off the folks who have their own packaging of it? >> >>> Mike >>> >>> (these views are my own; I just happen to work for a clustering >>> company ;) >> >> >> These views are my own. I just happen to own a clustering company. > RedHat is focused upon its primary market, which appears to be Unix/Windows server displacement. Mike's employer is focused upon selling hardware. Glen's company is focused upon good quality cluster software. For companies like mine, the issue is a stable reliable platform to build our product offerings. The problem with things like the permanent beta cycles of Fedora is that we will have to focus more upon the underlying issues of the platform changes (which will not be focused upon HPC needs) than on our own development. This is a moving target. This is "Not A Good Thing(TM)". 
A whole bunch of commercial software vendors have "old" and "outdated" OS support for their wares. I have to carefully check the software OS support matrix when building engineering or bioclusters. RedHat 7.3 is long in the tooth, and it happens to be a very good cluster distribution, in large part because so many commercial codes have been ported in the RH7.x time frame. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Tue Nov 4 00:58:14 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 22:58:14 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com>; from glen@callident.com on Mon, Nov 03, 2003 at 06:42:09PM -0800 References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> Message-ID: <20031103225814.B4021@lnxi.com> On Mon, Nov 03 2003 at 19:42, Glen Otero wrote: > > On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the > > smile-test for corporations trying to coexist and actually work with > > Red Hat. > > Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with > and/or sell Rocks-based clusters? Because it won't pass the smile test > inside a corporation? Could be that the larger corporations in your list embraced Rocks before this enterprise distro vs. no-cost distro became an issue. I _really_ doubt those corporations would do themselves any justice in the eyes of RedHat by undermining RedHat's enterprise offering by having an educational institution broker RHEL rebuilds. 
All of this debate over RHEL repackaging appropriateness is interesting to me. I have explored this as an option and arrived at the fact that it really doesn't offer anything of real value; simply offers a free-beer solution to an otherwise expensive product. Which obviously is invaluable to Rocks and many others on this list. > > Why not focus that questionable rebuilding effort on a more worthwhile > > task? E.g. porting Fedora Core to support amd64, ia64, etc; adding > > features to Fedora Core that are relevant to clustering, etc. > > > >> Even though Rocks is based on RedHat distribution, it is complete, > >> which means you only need to download Rocks ISOs to accomplish your > >> installation. > > > > All well and good, but basing a "complete" clustering solution on a > > reverse engineered RHEL is completely underhanded and wrong > > (regardless of whether you feel RH is being greedy or whatever). > > It's hardly reverse engineered, underhanded, or wrong. The Rocks guys > have been releasing their software for years based on standard Red Hat > releases. In order to make their cluster software freely available on > ia64, they built RH AS 2.1 from srpms, which is perfectly legal. I never said rebuilding RHEL is illegal; simply stated that I felt it was underhanded and wrong; we're all entitled to our opinions. I guess the Rocks people are at peace with their chosen engineering roadmap. > Besides, the technology that makes Rocks what it is is hardly due to > anything Red Hat creates. It's all the software that the Rocks crew has > written and packaged on top of Red Hat that matters. Thats a bold statement; Rocks' dependency on RH is implicit and hacking RHEL to be "free" requires significant effort on the part of rocks developers (even though they play it down). 
Also there is this post that points out just how important Red Hat is to Rocks: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003307.html Also, nice to see you cross posted to the rocks-discussion, for the benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an informative reply: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html It would appear as though Rocks is free and clear to openly redistribute RHEL SRPM-rebuilds; this is an interesting loop-hole: - Rocks released by an academic institution, which means it has a license to use the RedHat trademark. This also means no one can charge for Rocks software (only support). Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Tue Nov 4 02:11:47 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Tue, 4 Nov 2003 00:11:47 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA731E8.2060603@scalableinformatics.com>; from landman@scalableinformatics.com on Mon, Nov 03, 2003 at 11:58:16PM -0500 References: <3FA731E8.2060603@scalableinformatics.com> Message-ID: <20031104001147.D4021@lnxi.com> On Mon, Nov 03 2003 at 21:58, Joe Landman wrote: > The "smile" test? I thought it was all about risks, support, etc. > ROCKS appears to be in significant use as indicated by the ROCKS counter > page. In my _personal_ utopia of the industry smile-tests are worthy; I do however realize business is business and people want stable yet affordable solutions before anything else. That said, smiles can be had along the way. > Remember that RedHat's added value is in packaging, bug fixes, etc. Not to mention numerous contributions to the Linux kernel, low-level libraries (nptl), compilers and much more. > I would argue that Fedora is more like a permanent beta. 
It doesn't > look like we will get good things into Fedora anytime soon (x86_64, XFS > et al), and the release/support cycle is too short to be useful for long > term customer support. The risks of that platform would be somewhat > high for a commercial deployment, and I would find it hard to justify > installing this for a customer knowing full well that next year, they > are support free. It all comes down to opportunity cost; time spent working with the Fedora project (and its evolving policies) to add required features is time consuming and takes away from _real_ HPC innovation. BUT, if the entire HPC community actually worked together to bring about that change it wouldn't be that hard. Too idealistic? It would appear so based on the resounding cry for rebuilt RHEL solutions. Keep in mind that customers want "the real thing". > Ripping off RedHat? I thought they were packaging GPL and similar > software... how is taking GPL software which is Libre' and > redistributing recompiled versions of it (allowable under the license) > ripping off the folks who have their own packaging of it? It comes down to the unfortunate reality that many in the HPC community would rather continuously fork/reinvent RHEL than work with Red Hat to arrive at a mutually beneficial arrangement. > RedHat is focused upon its primary market, which appears to be > Unix/Windows server displacement. Mike's employer is focused upon > selling hardware. Glen's company is focused upon good quality cluster > software. While I appreciate you associating me and my views with my employer, I have expressed my _personal_ views. However, your assessment of my employer's focus is not accurate; but I'm not going to get into that discussion. > For companies like mine, the issue is a stable reliable platform to > build our product offerings.
The problem with things like the permanent > beta cycles of Fedora is that we will have to focus more upon the > underlying issues of the platform changes (which will not be focused > upon HPC needs) than on our own development. This is a moving target. > This is "Not A Good Thing(TM)". > > A whole bunch of commercial software vendors have "old" and "outdated" > OS support for their wares. I have to carefully check the software OS > support matrix when building engineering or bioclusters. RedHat 7.3 is > long in the tooth, and it happens to be a very good cluster > distribution, in large part because so many commercial codes have been > ported in the RH7.x time frame. Make no mistake about it, it's not good for any commercial company that historically relied upon Red Hat Linux; hence the extensive attention this debate has received all over the Internet. You have blatantly attempted to spin this thread in a self-serving/tangential direction of company vs company; and it wasn't about that. Now I know why this list is predominantly technical and _tries_ to stay away from the commercial interests of any one vendor. Mike From daniel.pfenniger at obs.unige.ch Tue Nov 4 04:01:40 2003 From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger) Date: Tue, 04 Nov 2003 10:01:40 +0100 Subject: opteron VS Itanium 2 In-Reply-To: <3FA6D71B.1020900@comcast.net> References: <3FA6D71B.1020900@comcast.net> Message-ID: <3FA76AF4.7080800@obs.unige.ch> Jeffrey B. Layton wrote: > Rocky McGaugh wrote: > >> On Mon, 3 Nov 2003, Greg Lindahl wrote: >>> On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: >>>> Yeah, me too.
As someone who just ponied up for a rather large IB >>>> installation, I'm not sure that most people realize what a substantial >>>> percentage of the cost of the cluster the IB might be. >>> From all public indications, IB prices are roughly the same as >>> Myrinet. Nothing new there... >>> >>> -- greg >>> >> IB costs significantly more than Myrinet... > > Are you sure? In the quotes I've gotten, it's about the same as Myrinet > except for very small clusters (perhaps 4 nodes or less). In fact in some > cases, it's cheaper than Myrinet. :) I can confirm this, because we bought such a 24-node cluster with switched IB. The hardware cost of IB was 2/3 of Myrinet with better specs, but without software support as good as that provided by Myricom. For example, the free mpich over GM provided by Myricom corresponds to $200 per processor if a commercial MPI must be purchased. Daniel Pfenniger From pesch at attglobal.net Tue Nov 4 12:10:36 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Tue, 04 Nov 2003 09:10:36 -0800 Subject: Turn on nodes through the network References: Message-ID: <3FA7DD8C.2ED177B2@attglobal.net> Would it be possible to tweak the BIOS to behave like a BMC? Paul Schenker Don Holmgren wrote: > On Mon, 3 Nov 2003, Mark Hahn wrote: > > > > 1.5 support. That generally requires a Baseboard Management Controller > > > (BMC) on the motherboard, which adds $25-$150 to the price. > > > > I'd very much appreciate seeing an example of this. or do you mean > > "BMC adds $25-150 to the price of an already gold-plated system"? > > > > as a concrete example, Tyan's S2723 is a reasonable example of a board > > you might find in a cluster.
IPMI/BMC is an option via the qlogic > > zircon, but I have never found a real price for it - one vendor quoted > > me a little under $Cdn 1000 for the daughtercard, which is just plain > > ridiculous for a <$500 motherboard. > > > > thanks, mark hahn. > > On clusters we've built with Supermicro E7500 or E7501 chipset > motherboards (P4DPE, X5DPE), which I believe are roughly equivalent to > the Tyan S2723 in features and price, there's an IPMI/BMC option card > based on the Agilent BMC available. We've paid between $90 and $100 for > these cards, depending on volume. > > I've not purchased Intel motherboards in quantity, but from doing a > quick web search, it looks like the incremental price between boards > without (SE7501CW2) and with (SE7501BR2) IPMI is no more than $150. > > Don Holmgren > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Nov 4 04:18:37 2003 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 4 Nov 2003 10:18:37 +0100 (CET) Subject: Freebee RH Releases... In-Reply-To: Message-ID: On Mon, 3 Nov 2003, Robert G. Brown wrote: > On Mon, 3 Nov 2003, Michael Worsham wrote: > > Fedora, fedora, fedora. > I agree. But please never be caught on video prancing about the stage shouting this :-) > http://fedora.redhat.com/ > > > Fedora will come pre-yummified at the core and will have RH engineers I did a yum update last night to the fedora test release. Seemed to work fine! 
Yum is nice (I've been using apt-for-rpm until now). The Fedora release 1 is due out in the next couple of days (there was a slip in the release date, which was supposed to be Monday). > So nobody RHish panic, just start looking into fedora, maybe join its > list(s). > BTW, I expect there to be opteron support in fedora pretty soon as well. > There better be; I'm getting a bunch of them...;-) Ooooh.... that sounds interesting. From bill at math.ucdavis.edu Tue Nov 4 05:37:47 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Tue, 4 Nov 2003 02:37:47 -0800 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <20031104103747.GA836@sphere.math.ucdavis.edu> On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. The people I've talked to are pretty happy with Myricom; of course there has been the occasional complaint. But in general I'd say they deliver what they promise, and the result is pretty much as advertised. > IB is about to find major traction in this industry and Myricom will not I've heard this statement for 2 years running, not that it couldn't become true. > have the guns to stop it. As adoption rates increase the price will > decrease quite rapidly. I've been working with Mellanox and Topspin > both using Mellanox chips but, their product positioning is different. > The difference between the two being that Topspin offers a more "value > added" flavor of Mellanox silicon with various hardware tweaks and a > more robust software package. > > It depends on how you're looking at the cost of IB.
First of all, it's > comparative to Myrinet in "cost per port". Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when PCI Express > is around the corner. Myricom has a lot of installations worldwide and > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different 3x the performance of Myrinet? Actual observed performance in a real life situation? Are we talking programmer visible bandwidth? Latency? Or both? At the process? Port? Or switch level? > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. Roadmaps are great, easy, and cheap. I'm most interested in what I can build a cluster with today. Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what conditions? Where can I download a linux compatible driver? Linux compatible MPI implementation? Linux/AMD64 drivers/MPI implementations? -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 07:38:30 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 04 Nov 2003 07:38:30 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104001147.D4021@lnxi.com> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> Message-ID: <3FA79DC6.9040608@scalableinformatics.com> Mike Snitzer wrote: >On Mon, Nov 03 2003 at 21:58, >Joe Landman wrote: > > > >>The "smile" test? I thought it was all about risks, support, etc. 
>>ROCKS appears to be in significant use as indicated by the ROCKS counter >>page. >> >> > >In my _personal_ utopia of the industry smile-tests are worthy; I do > > Ahh... so thats what I need to win more business. The smile test... (...) >>Remember that RedHat's added value is in packaging, bug fixes, etc. >> >> > >Not to mention numerous contributions to the Linux kernel, low-level >libraries (nptl), compilers and much more. > > This was specifically implied/mentioned in my post. "They bundle many peoples' code (Don's and probably a number of others here). They have added value back to the community as a whole." > > >>I would argue that Fedora is more like a permanent beta. It doesn't >>look like we will get good things into Fedora anytime soon (x86_64, XFS >>et al), and the release/support cycle is too short to be useful for long >>term customer support. The risks of that platform would be somewhat >>high for a commercial deployment, and I would find it hard to justify >>installing this for a customer knowing full well that next year, they >>are support free. >> >> > >It all comes down to opportunity cost; time spent working with the Fedora >project (and its evolving policies) to add required features is >time consuming and takes away from _real_ HPC innovation. > Yes. That is the point after all, that if you spend all your time discussing whether or not your masters^H^H^H^H^H^H^H benefactors will accept XFS or other HPC relevant features in the kernel, you do not then have time to put them in. Part of it is opportunity cost, the other part is a zero sum game of time. > BUT, if the >entire HPC community actually worked together to bring about that change >it wouldn't be that hard. Too idealistic? > I believe it might be too idealistic. This crowd, if you read this forum and some of the others, likes to innovate and create its own value atop some sort of standard offering. 
If I am reading you correctly, you are advising focusing on making one particular platform that you personally (to separate you from your employer here) like the standard, and stopping all the bickering about going in another direction (that you personally do not like). Is this a fair read? >It would appear so based on >the resounding cry for rebuilt RHEL solutions. Keep in mind that >customers want "the real thing". > > Hmmm. The one thing that customers repeatedly tell me is that they want their solutions to be supportable. They don't want one-offs, or other things that significantly increase their risks. If the "real thing" represents a risk, they will not go for it. If the "non-real-thing" ala ROCKS, Warewulf, CLIC, OSCAR, et al. are well supported, and slowly varying enough, they are content to live with a few warts. More important than this is a specific understanding on the part of the customer that RedHat is focused upon a different market. As I noted in the post to which you replied "That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. " They, in this case, are RedHat. The "Real-Thing"(TM) doesn't matter in this case, if it is missing key functionality/features et al. Moreover, Fedora will be no more the real thing than the RHEL based versions. Though, the RHEL will vary more slowly over time than Fedora, which is a very good thing for a stable commercial/academic cycle shop. > > >>Ripping off RedHat? I thought they were packaging GPL and similar >>software... how is taking GPL software which is Libre' and >>redistributing recompiled versions of it (allowable under the license) >>ripping off the folks who have their own packaging of it? >> >> > >It comes down to the unfortunate reality that many in the HPC community >would rather continuously fork/reinvent RHEL than work with Red Hat to >arrive at a mutually beneficial arrangement.
> > As noted previously "That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. " This market is not significant to them. It will not drive hundreds of thousands of additional unit sales. It requires levels of support that they may be unwilling to supply due to the individuality of the products offered. Aside from that, finding and paying for real HPC people (e.g. more than a few years' experience) is not cheap/easy. This increases their marginal costs without really increasing their marginal utility. I don't blame them for this. It is simple economics. It leaves a market hole that (as Glen pointed out with his company) people are willing to step up to fill. > > >>RedHat is focused upon its primary market, which appears to be >>Unix/Windows server displacement. Mike's employer is focused upon >>selling hardware. Glen's company is focused upon good quality cluster >>software. >> >> > >While I appreciate you associating me and my views with my employer, I >have expressed my _personal_ views. However, your assessment of my >employer's focus is not accurate; but I'm not going to get into that >discussion. > > > Highly oversimplistic assessment on my part. I assumed that someone writing from an lnxi.com email address would be expressing corporate philosophy. This is your own _personal_ opinion then? >>For companies like mine, the issue is a stable reliable platform to >>build our product offerings. The problem with things like the permanent >>beta cycles of Fedora is that we will have to focus more upon the >>underlying issues of the platform changes (which will not be focused >>upon HPC needs) than on our own development. This is a moving target. >>This is "Not A Good Thing(TM)". >> >>A whole bunch of commercial software vendors have "old" and "outdated" >>OS support for their wares.
I have to carefully check the software OS >>support matrix when building engineering or bioclusters. RedHat 7.3 is >>long in the tooth, and it happens to be a very good cluster >>distribution, in large part because so many commercial codes have been >>ported in the RH7.x time frame. >> >> > >Make no mistake about it, it's not good for any commercial company that >historically relied upon Red Hat Linux; hence the extensive attention this >debate has received all over the Internet. You have blatantly attempted >to spin this thread in a self-serving/tangential direction of company >vs company; and it wasn't about that. Now I know why this list is > > Uh... no. Not even close, and mind the defensive bit, no one was attacking you. There is no "spin" here. Commercial software developers need stable bases for their products, pure and simple. Nothing self-serving about that. I didn't state whether or not I worked for a clustering company. Though you asked us above to consider your opinion to be a personal one, this statement ties your posts to your company (IMO), and using your words, " attempted to spin this thread in a self-serving ..." Glen gave his point of view, which I am guessing reflects his corporate positions. If you are saying that Glen's statement is self serving, I guess I am at a loss to understand how yours is not. I gave my point of view, and yes, it does reflect my company's philosophy. I do not believe it is "self serving" to state what we perceive as product developers. >predominantly technical and _tries_ to stay away from the commercial >interests of any one vendor. > > I may be (slightly but purposely) dense at the moment, but I would suggest that you might wish to place a disclaimer in your posts here stating that these are your views and not those of the good folks at Linux Networx. Otherwise, folks like me, Glen, and a number of others might make a mistake and assume you are posting their views.
>Mike > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 08:26:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 08:26:39 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104014219.GB32428@tux.lbl.gov> Message-ID: On Mon, 3 Nov 2003, Greg Kurtzer wrote: > On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > > for corporations trying to coexist and actually work with Red Hat. Why > > not focus that questionable rebuilding effort on a more worthwhile task? > > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > > Fedora Core that are relevant to clustering, etc. > > I guess what some would consider a worth while task others would consider a > waste of time. From what I see, Fedora core is an unreasonable solution for me > and I will not be contributing to it while RH holds every seat on the steering > committee and rules all directions. Not that I have anything against RH, it is > just that there is a major conflict of interest, don't you think? > > If Fedora gets too good, won't it take business from RHEL? It already is, but it is mostly business that RHEL will never get anyway. 
Basically, it will be a cold day in hell before Duke or any other University starts paying Red Hat a million dollars a year for access to a linux distribution consisting of 99% GPL/free packages and 1% logos, given that we do NOT consume RH "support", but are rather a net contributor (we run a primary RH mirror, for example, as well as several GPL projects some of which have in the past de facto shamelessly promoted RH and which now will shamelessly promote fedora just by existing with that as their primary base). Remember also that RH >>needs<< fedora, or something like it. They CANNOT release RHEL in rawhide (like I'm not only going to pay RH on the order of $100/seat, but I'm going to get and install beta-level software for that price and help debug it? Oh yeah.) One reason RH has been, for the most part, rock-solid as largish linux distributions go is because all the squishiness in each new release has been squeezed from the rock in rawhide. There are several other reasons that fedora will likely work just fine and not be subverted by RH. One is that (as we've been discussing on the list) it is by no means clear that RH can legally restrict even the reinstallation of binary RPM's whose primary content is GPL software in any way. I personally, after reading the GPL carefully yet again, think that this is the case. In fact, I think that if they do something like mix a lot of "trademark" logos in with e.g. GPL/gnome icons that are required by GPL packages in e.g. redhat-artwork, then redhat-artwork de facto inherits a GPL -- the GPL is explicitly written to keep free software free as in air. Note well that I >>cannot<< make a GPL package "proprietary" in any way -- certainly not by adding a header that says something like "this package is copyright and trademark and belongs to rgb". Not even by actually writing something that adds value that IS copyright rgb. Add to the GPL base, become part of the GPL base -- that is the rule.
The only packages RH can legitimately constrain the reinstallation of are: packages containing proprietary Red Hat code with no full GPL (or equivalent) components or library dependencies and packages containing strictly RH logos and trademarks, ditto. Nobody cares about either one. I don't think there are any of the former (I could be wrong) and the latter is advertising. Like I should pay RH enormous sums of money for installing their advertising on my system? Finally, one REASON that RH is splitting out fedora is to GET more work out of "the community". It costs them a fair bit to keep up the RHL releases for years after they are obsoleted. They'd like to have their developers working on RH 10, but they're still supporting RH 7, 8 and 9 and have to constantly fix bugs, backport security patches, etc. So they're putting fedora out there in part to armtwist US into keeping old releases up (or not), as a tacit acknowledgement of the realities of the GPL (which apply to RHEL whether or not they like it and frankly whether or not one rebuilds the source rpm's -- there isn't any way to restrict the installation or redistribution of a BINARY package of GPL code as I read the license), and to maintain the absolutely essential community involvement in the rawhide process that leads TO a "RHEL" that they can market with some degree of success to corporations. As I've noted and will continue to note again, I think that we are on the verge of a paradigm shift in linux anyway, and that this is going to likely kick it over the edge. The existence of yum AND apt-tools finally make it natural to consider merger, and completely alter the scaling paradigms at the institutional support level. We are seeing the very last releases of RH where CD ISO's are in any way relevant, for example. The future is going to focus completely on the network and the concept of "the repository", on cross-distribution standardized package metadata, on fully automated rebuilds from source packaging. 
Yum conceivably makes the entire concept of "a distribution release" obsolete -- we'll have to wait and see, but I suspect that this will be the case. Instead of upgrading systems on a timescale with granularity of years, we may be entering a universe where systems are microincrementally updated nightly, with immediate feedback and repair, and with a user/admin determined lag relative to the primary repositories to ensure a level of institutional stability deemed locally acceptable. In a way this is DEMANDED by security anyway -- security requirements are a major driver of the paradigm shift. RH will eventually be making its money from RHEL by inserting themselves into this stream with a FIXED delay and a certification process required by e.g. banks and other corporations that have due diligence and government audit laws to satisfy. Everybody else will ride the wave (and generally be more secure than said banks, but it will take years for lawmakers to catch up to the new paradigm). > Now I do want to mention that I think that RH's new direction is what is > needed for Linux to become a suitable Enterprise solution. This move however > left a vacancy in the community which is why projects are emerging or changing > direction to fix this. It is OSS evolution (see: http://caosity.org/). It will be VERY interesting to see what the proliferation of community efforts produces. Perhaps they'll one day merge. Perhaps not. Open source is a rich environment for the evolution of new ideas, and these will be "interesting" times indeed:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Nov 4 08:31:39 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Tue, 04 Nov 2003 08:31:39 -0500 Subject: IB vs Myrinet In-Reply-To: <1067917842.3219.88.camel@terra> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <1067917842.3219.88.camel@terra> Message-ID: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> On Mon, 2003-11-03 at 22:50, Dean Johnson wrote: > On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > > I believe IB is a much better interconnect technology than Myrinet > > period. Plus, you don't have to deal with Myricom. > > > > Hmmm, all my dealings with Myricom have been excellent. We had a frame > failure right before a holiday and they happily cross-shipped a > replacement. We were back in business very quickly. All our questions of > support have been answered quickly and accurately. Same here -- the support has been great. We did run into a problem where they were short on line cards to send as replacements, but they did keep us posted and let us know what was going on. BTW -- did no one notice that the statement was 'comparable prices per port' but that you had to _pay_ for the mpi implementation? I feel alot better knowing there is a free and open source implementation of MPI over gm from LAM and mpich. Nic -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Tue Nov 4 09:34:56 2003 From: roger at ERC.MsState.Edu (Roger L. 
Smith) Date: Tue, 4 Nov 2003 08:34:56 -0600 Subject: IB vs Myrinet In-Reply-To: <20031104103747.GA836@sphere.math.ucdavis.edu> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <20031104103747.GA836@sphere.math.ucdavis.edu> Message-ID: On Tue, 4 Nov 2003, Bill Broadley wrote: > On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > > > IB is about to find major traction in this industry and Myricom will not > > I've heard this statement for 2 years running, not that it couldn't > become true. Just look at all of the recent press releases for IB clusters being built. The hardware is finally actually available, and a lot of HPC clusters are starting to be built with it. In the spirit of full disclosure, I have three engineers on-site today from an IB vendor working with me to install a 192 node diskless IB cluster. > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > > interconnect technology ratcheted up way higher than 10GB. > > Roadmaps are great, easy, and cheap. I'm most interested in > what I can build a cluster with today. > > Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what > conditions? Where can I download a linux compatible driver? Linux > compatible MPI implementation? Linux/AMD64 drivers/MPI implementations? It's 10 gigabits per second (theoretical). Linux drivers are available from all of the vendors. Certain vendors (including the one I purchased my IB from) provide open-source drivers. There are a few MPI implementations, there are commercial versions MPI/Pro and ChaMPIon from MPI Software Technology, Inc. MVAPICH is available from OSC, and I'm hearing that there may be a version of LAM in the near future. I'm not sure of the status of the AMD64 drivers, although I know of at least one AMD64 cluster currently being built with IB, so at least some level of support exists. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. 
Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Nov 4 09:53:55 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 04 Nov 2003 09:53:55 -0500 Subject: IB vs Myrinet In-Reply-To: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> References: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <3FA7BD83.2060901@lmco.com> Nicholas Henke wrote: > On Mon, 2003-11-03 at 22:50, Dean Johnson wrote: > > On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > > > I believe IB is a much better interconnect technology than Myrinet > > > period. Plus, you don't have to deal with Myricom. > > > > > > > Hmmm, all my dealings with Myricom have been excellent. We had a frame > > failure right before a holiday and they happily cross-shipped a > > replacement. We were back in business very quickly. All our > questions of > > support have been answered quickly and accurately. > > Same here -- the support has been great. We did run into a problem where > they were short on line cards to send as replacements, but they did keep > us posted and let us know what was going on. > > BTW -- did no one notice that the statement was 'comparable prices per > port' but that you had to _pay_ for the mpi implementation? I feel alot > better knowing there is a free and open source implementation of MPI > over gm from LAM and mpich. > I want to interject a comment here. In the past (recent and a few years back) we've had trouble with the open source MPI implementations with our codes. 
When we contacted them about our problem we got a lukewarm (at best) response. When we contacted a commercial MPI vendor, they fixed the problem in less than a day. Plus, our codes ran about 30% faster with the commercial MPI than with the open-source ones. However, we continue to look at LAM, MPICH, and others.

While I'm a big proponent of open-source for many reasons, at least for MPI, we've found that a commercial vendor is worthwhile for us. The one we've used provides a very good and fast product for our systems. Also, their technical support is extremely good (I normally reserve that phrase, but it truly applies in this case). More importantly, we've found that most of our problems beyond the first few months that a cluster is in production are with MPI. Having a company to help us diagnose and fix the problem quickly means a great deal to us (we're in production 24/7 and downtime is a true killer).

So for us, when we look at per port costs, we include a commercial MPI for whatever network we're looking at (well, with one exception). While there are differences in MPI costs based on the type of interconnect, the difference is in the noise for price/performance for us.

One of the hidden costs from my perspective, and one that allows us to compare interconnects, is a product of the cost of diagnosing problems, the cost of fixing them, and how frequently they occur. We have experience with one high-speed interconnect in this regard and that number is very large. This has made us gun-shy about trying any other high-speed interconnect on a production basis (although we continue to test).

Just my 2 cents this morning.

Thanks!

Jeff

--
Dr.
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Nov 4 10:10:48 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 04 Nov 2003 15:10:48 +0000 Subject: IB vs Myrinet In-Reply-To: Message from Nicholas Henke of "Tue, 04 Nov 2003 08:31:39 EST." <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <1AH2pU-2Gz-00@etnus.com> > BTW -- did no one notice that the statement was 'comparable prices > per port' but that you had to _pay_ for the mpi implementation? I > feel alot better knowing there is a free and open source > implementation of MPI over gm from LAM and mpich. There's at least one free, open source MPICH over Infiniband project :- http://nowlab.cis.ohio-state.edu/projects/mpi-iba/ -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Tue Nov 4 10:08:09 2003 From: waitt at saic.com (Tim Wait) Date: Tue, 04 Nov 2003 10:08:09 -0500 Subject: IB vs Myrinet In-Reply-To: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <1067917842.3219.88.camel@terra> <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <3FA7C0D9.3050104@saic.com> > BTW -- did no one notice that the statement was 'comparable prices per > port' but that you had to _pay_ for the mpi implementation? I feel alot > better knowing there is a free and open source implementation of MPI > over gm from LAM and mpich. 
There is, it's called MVAPICH: http://nowlab.cis.ohio-state.edu/projects/mpi-iba/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Nov 4 10:06:16 2003 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 4 Nov 2003 16:06:16 +0100 (CET) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Tue, 4 Nov 2003, Robert G. Brown wrote: > > As I've noted and will continue to note again, I think that we are on > the verge of a paradigm shift in linux anyway, and that this is going to > likely kick it over the edge. The existence of yum AND apt-tools > > Instead of upgrading systems on a timescale with granularity of years, > we may be entering a universe where systems are microincrementally > updated nightly, with immediate feedback and repair, and with a > user/admin determined lag relative to the primary repositories to insure > a level of institutional stability deemed locally acceptable. In a way Agree. A couple of days ago you made a post re. updating mechanisms, where you talked about yum, or yum-like mechanisms which would recompile packages to suit the target node, or something along those lines. How about we think of applications along the same lines - but not necessarily being recompiled. In the context of a grid world, it still worries me that people are saying that application Z will run anywhere - on some machine maintained on another campus - who knows if it is even running the same Linux distribution? (OK - maybe everything needs to be statically linked then.) If I may, I'll join Bob in a blue sky thought. How about applications being installed as RPMs? Then the RPM would have dependencies - application Z needs library-b between x and y. 
Could you then get a message back saying 'sorry - this cluster won't support this application: it needs x-y'

Sorry - a huge blue sky here. But we do keep hearing about grid and infrastructure on demand.

From egan at sense.net Tue Nov 4 13:18:35 2003
From: egan at sense.net (Egan Ford)
Date: Tue, 4 Nov 2003 11:18:35 -0700
Subject: IB vs Myrinet
In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com>
Message-ID: <027d01c3a300$13c57120$27b358c7@titan>

What about software? Where are the stable free OSS IB kernel modules and a proven current MPICH implementation (don't point me to MVICH either, it is no longer being maintained)?

What about people? Myricom has many HPC experts.

I need to look at the total picture. Support, service, software, and history are as important as raw hardware specs.

> -----Original Message-----
> From: beowulf-admin at scyld.com
> [mailto:beowulf-admin at scyld.com] On Behalf Of Joey Sims
> Sent: Monday, November 03, 2003 8:20 PM
> To: beowulf at beowulf.org
> Subject: IB vs Myrinet
>
>
> I believe IB is a much better interconnect technology than Myrinet
> period. Plus, you don't have to deal with Myricom.
>
> IB is about to find major traction in this industry and Myricom will not
> have the guns to stop it. As adoption rates increase the price will
> decrease quite rapidly. I've been working with Mellanox and Topspin
> both using Mellanox chips but, their product positioning is different.
> The difference between the two being that Topspin offers a more "value
> added" flavor of Mellanox silicon with various hardware tweaks and a
> more robust software package.
>
> It depends on how you're looking at the cost of IB. First of all, it's
> comparative to Myrinet in "cost per port".
Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when > PCI Express > is around the corner. Myricom has a lot of installations > worldwide and > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different > options such as bridging between IB, GbE, or FC so you could hang your > storage boxes off the IB switch without much hassle. > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. > > Regards, > > ================================================== > Joey P. Sims 800.995.4274 - 242 > Sales Manager 770.442.5896 - Fax > HPC/Storage Division www.csilabs.net > Concentric Systems, Inc. jsims at csiopen.com > ====================================ISO9001:2000== > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ktpedre at sandia.gov Tue Nov 4 12:35:40 2003 From: ktpedre at sandia.gov (Kevin Pedretti) Date: Tue, 4 Nov 2003 09:35:40 -0800 Subject: IB vs. Myrinet In-Reply-To: <200311040417.hA44H0g27831@NewBlue.scyld.com> References: <200311040417.hA44H0g27831@NewBlue.scyld.com> Message-ID: <200311040935.40505.ktpedre@sandia.gov> > Subject: IB vs Myrinet > Date: Mon, 3 Nov 2003 22:19:48 -0500 > From: "Joey Sims" > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. 
Things like Myrinet and Quadrics do have at least one architectural advantage over IB for HPC -- they have a programmable processor on the NIC. Myricom's MX will, presumably, use the NIC processor to offload MPI receive matching. Quadrics Tports also offloads MPI matching. Offloading theoretically lowers host CPU overhead (less interrupts) and lowers latency (less trips across the PCI bus). If Ohio State's MVAPICH really scales beyond 8 nodes well (I've only seen 8 node benchmarks), then maybe my point is irrelevant. Still, in my opinion the offload approach is more elegant. I've heard some IB HBAs are programmable but there is no standardization and documentation is scarce. Does anybody have more information? Kevin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Tue Nov 4 10:00:12 2003 From: ctierney at hpti.com (Craig Tierney) Date: Tue, 4 Nov 2003 08:00:12 -0700 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <20031104150011.GC1872@hpti.com> On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. > > IB is about to find major traction in this industry and Myricom will not > have the guns to stop it. As adoption rates increase the price will > decrease quite rapidly. I've been working with Mellanox and Topspin > both using Mellanox chips but, their product positioning is different. > The difference between the two being that Topspin offers a more "value > added" flavor of Mellanox silicon with various hardware tweaks and a > more robust software package. > > It depends on how you're looking at the cost of IB. 
First of all, it's
> comparative to Myrinet in "cost per port". Not too long ago, Myrinet
> was higher in price than IB is today and they haven't came out with
> anything "new" in forever. Well except a PCI-X version when PCI Express
> is around the corner. Myricom has a lot of installations worldwide and
> they are highly credible without a doubt but, this industry moves very
> fast and new things are not a new thing. At 3x the performance of
> Myrinet, "comparative" is still a better value. IB has many different
> options such as bridging between IB, GbE, or FC so you could hang your
> storage boxes off the IB switch without much hassle.

Comparative is very relative. Pricing I have seen does not show that. You state 3x performance; that might be true in bandwidth for the PCIX-D card but not the E-card. The E-card does saturate the PCIX slot.

For both cards, latency is better on Myrinet than on IB. For some that is more important than bandwidth. Also, Myrinet is going to release new software (MX) that reduces the latency further.

Show me a very large IB system that scales well. VaTech does not count. I hope they figure that out though. I like the IB specs and their roadmap, but it has to work as well.

Craig

>
> Up to 10GB/sec is fairly fat today. The roadmap for IB has this
> interconnect technology ratcheted up way higher than 10GB.
>
> Regards,
>
> ==================================================
> Joey P. Sims 800.995.4274 - 242
> Sales Manager 770.442.5896 - Fax
> HPC/Storage Division www.csilabs.net
> Concentric Systems, Inc.
jsims at csiopen.com > ====================================ISO9001:2000== > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney at hpti.com) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 14:59:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 14:59:30 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Tue, 4 Nov 2003, John Hearns wrote: > If I may, I'll join Bob in a blue sky thought. How about applications > being installed as RPMs? Then the RPM would have dependencies - > application Z needs library-b between x and y. > Could you then get a message back saying 'sorry - this cluster won't > support this application: it needs x-y' > > Sorry - a huge bly sky here. But we do keep hearing about grid and > infrastructure on demand. Curiously and not terribly coincidentally, we REQUIRE all our linux applications to be installed as RPMs at Duke. Even commercial ones we get in some other form, we typically repackage into an RPM. Otherwise it IS very difficult to know whether dependencies are being met or warped, and a good rpm also facilitates de-installation (an rpm --erase or yum remove away). This isn't to say that individuals may not install applications that aren't rpm's on their own systems from time to time or force rpm installs from mismatched distributions without a rebuild, but this is the Dark Path and leads to RPM Hell. This also means that you don't even get the message above. 
If you use yum as in:

  yum install clusterapp

and clusterapp is on one of the repositories in its /etc/yum.conf, then yum will fire back a message such as

  clusterapp needs packages: clusterlib clustergui clusterdevel to be installed. Install (y/n)?:

Press y and all four packages are grabbed and installed so clusterapp is ready to run, possibly from the clustergui you may not have known existed. Or use

  yum -y -d 0 install clusterapp

to install clusterapp AND its dependencies right now, as though you mean it, saying something only if it encounters a condition that makes the install fail.

You also get warnings if clusterapp contains and wishes to replace files "belonging" to installed packages, if there are obsoletes in its dependencies, if there are dependency loops -- basically if there is anything whatsoever that cannot be automagically resolved and requires human intervention to make happen safely. (Getting such a message usually means that your system is in RPM Hell from a previous RPM force; NOT getting such a message when you should have usually means that you built something and installed it from source so that it isn't in the rpm db but is on the system anyway.)

If you install strictly RPM's strictly with anaconda or yum, and never override or force anything, your system has an excellent chance of staying out of RPM Hell and being consistently automagically installable, upgradeable, updateable, and so forth. If nothing else, when you put a "bad rpm" on your repository (and there are plenty out there) it won't install, and you'll be forced to fix/rebuild it so that it does instead of breaking your system with a force.

   rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From amacater at galactic.demon.co.uk Tue Nov 4 16:36:32 2003
From: amacater at galactic.demon.co.uk (Andrew M.A. Cater)
Date: Tue, 4 Nov 2003 21:36:32 +0000
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <3FA79DC6.9040608@scalableinformatics.com>
References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com>
Message-ID: <20031104213632.GA1662@galactic.demon.co.uk>

On Tue, Nov 04, 2003 at 07:38:30AM -0500, Joe Landman wrote:
> >
> >BUT, if the
> >entire HPC community actually worked together to bring about that change
> >it wouldn't be that hard. Too idealistic?
> >
>
> I believe it might be too idealistic. This crowd, if you read this
> forum and some of the others, likes to innovate and create its own value
> atop some sort of standard offering. If I am reading you correctly, you
> are advising focusing on making on particular platform that you
> personally (to separate you from your employer here) like, as the
> standard, and stop all the bickering about doing another direction (that
> you personally do not like). Is this a fair read?
>

Everyone,

There is a problem here: the sky is falling, and no-one is listening to us :)

Red Hat Linux as we have known it for the past few years has changed focus. Most of the Red Hat Linux boxes out there will be unsupported after December this year - the remainder will be unsupported after April 2004 (unless they happen to convert to RH Enterprise Linux in one or other variant).

Lots of smaller Red Hat based specialist distributions will also potentially feel the knock-on effect as software is not updated or you can't get the base OS on which to build any more.
That's OK - it just doesn't feel comfortable at this point.

The replacement is still a beta - although the beta test period is over, Fedora Core 1.0 isn't out yet. The new model of Fedora Core / Extra and so on is going to be hard to get used to - as is the accelerated speed of change and potential lack of bug fixes. "Don't fix, upgrade" may be the new model.

There are calls to rip out the proprietary bits of RHEL and build a Libre version: that would possibly be unwise - you are still tying your efforts to someone else's code: this is also likely to be code where fixes are made relatively slowly on a long timescale and where the vendor may have other people's values in mind: the typical EL customer is not necessarily the typical cluster owner.

Forking small special purpose distributions is potentially a bad idea. Rocks/Warewulf/Scyld/Caos (if I'm spelling correctly) are all RH based - on older code - and all in the same marketplace.

BUT

There may be an alternative which will guarantee you code freedom, won't charge for licenses in any event, won't bow to "commercial" pressure and won't restrict your code use/re-use/modification/distribution. If you want an ultra stable platform to which you can freely contribute code and which you can use for any purpose - try Debian "stable".

The release cycle is long, it's updated relatively infrequently but security patches and major bugs are fixed. It won't vary wildly from quarter to quarter. It takes two years between major releases - but the code will have been tested for longer and, potentially, on more platforms. [It's been said that Debian is the test bed of choice for the X Window System, for example, because it is made to work on many architectures and tends to find obscure bugs]

Debian "testing" is in a state of slow flux.
The name is a misnomer in that the code is not necessarily beta quality: it should always be the "testing" for the next build of the full release such that it's perfectly usable from day to day / week to week / month to month and should be releasable at relatively short notice. Occasionally, major changes may break stuff for a few days: there is a transition at the moment, for example, from KDE 2.x (which was released in stable over a year ago) to KDE 3.x (which has been working in "unstable" for some months). Because of incompatibilities, some KDE components may not work for a couple of days until it settles down. For 90% of folk using a cluster, that sort of thing may be irrelevant. Testing is asymptotically approaching the next stable release - but won't be released as stable until its ready :) Debian "unstable" - may change fairly dynamically: may break but is quickly fixed. Latest bleeding edge software trickles down from here through a defined process until it reaches testing and then, ultimately, becomes part of the next stable release. Probably not for clusters - though I have a toy/evaluation cluster running at work on unstable purely for the very latest GCC, for example. Debian encourages everyone to build specialist distributions based on Debian. If the hassle's too much, feed in your cluster packages to become part of the main distribution. There is at least one Debian newsgroup (debian beowulf) where clusters are of high interest. Check out what's already within Debian. As noted in a previous post, you may find what you want has already been put in place in the 8000+ packages. This is a purely personal post. It does _not_ represent my employers or any other person. 
Andy

From landman at scalableinformatics.com Tue Nov 4 17:50:57 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 04 Nov 2003 17:50:57 -0500
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <20031104213632.GA1662@galactic.demon.co.uk>
References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk>
Message-ID: <3FA82D51.3070102@scalableinformatics.com>

Andrew M.A. Cater wrote:

>Everyone,
>
>There is a problem here: the sky is falling, and no-one is listening to
>us :)
>
>

And I thought it was just a business sea change :0

[...]

>BUT
>
>There may be an alternative which will guarantee you code freedom,
>won't charge for licenses in any event, won't bow to "commercial"
>pressure and won't restrict your code use/re-use/modification/distribution.
>If you want an ultra stable platform to which you can freely contribute
>code and which you can use for any purpose - try Debian "stable".
>
>

There are interesting bits in Debian. I am not sure it is necessarily the right choice for clusters due to the specific lack of commercial support for cluster-specific items such as Myrinet, and the other high speed interconnects. Commercial compiler support for Debian (e.g. Intel, Absoft, et al) is largely nonexistent as far as I know (please do correct me if I am wrong). Few if any commercial applications are certified to work on Debian (Oracle, Legato, ....) and again, please correct me if I am wrong.

I simply don't see this as a universally viable alternative. Debian does indeed have lots of nice technical things going for it. Maybe I am missing some obvious point here.
I do know some people have built clusters using it, but a few clusters does not a clustering distribution make. I believe someone at Cornell built Windows 2000 into a cluster. Doesn't make Win2k a clustering OS though.

The distribution matters less than the overall support for what you want to do with it. I believe that it might be possible to build a Gentoo based cluster, though I would be concerned about the length of time for an OS load, among other things.

One of the hardest parts of a cluster is getting the OS on. ROCKS, BioBrew (and I understand Warewulf) make this ridiculously easy. Increasing the setup/management time, or making your life harder in general, doesn't make much sense.

There is a Knoppix variant that does clustering (OpenMosix style). Not sure it is the best solution, but I would like to hear from anyone using it.

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From patrick at myri.com Tue Nov 4 17:33:57 2003
From: patrick at myri.com (Patrick Geoffray)
Date: 04 Nov 2003 17:33:57 -0500
Subject: IB vs Myrinet
In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com>
References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com>
Message-ID: <1067985237.1208.392.camel@asterix>

Hi Joey,

On Mon, 2003-11-03 at 22:19, Joey Sims wrote:
> I believe IB is a much better interconnect technology than Myrinet
> period.

Free country.

> Plus, you don't have to deal with Myricom.

Would you share the horror story of your dealings with Myricom? Did Myricom do something bad to you or your company?

> IB is about to find major traction in this industry and Myricom will not
> have the guns to stop it.
As adoption rates increase the price will

Which industry? That's the real question. If you say HPC, it's bad news for IB, for several reasons. First, it's a tiny market. The VCs don't find it very appealing; it's not the billion-dollar market they were promised. Storage and data center are worth it, but IB did not succeed in penetrating these markets. Look at the fate of the last IB company to close its doors (Fabric Networks, if I remember correctly). The press release said that they were taking their money to go do something else, because the current market was not worth it.

The second reason is that HPC has very special needs. You can get some success by having a big pipe, but it's usually not enough. MPI is important, application performance is important. That's not what storage and the data center needed, and that's not what IB was designed for.

> decrease quite rapidly. I've been working with Mellanox and Topspin
> both using Mellanox chips but, their product positioning is different.

There is something very bad in this sentence: "both using Mellanox chips". Where are the dozens of silicon vendors that were supposed to flood the market and drive the price down? They died over the last 2 years. Today, it's not Infiniband, it's Mellanox and resellers. Not that different from Myricom and resellers...

> It depends on how you're looking at the cost of IB. First of all, it's

It really depends on how you are looking at the cost of IB. Mellanox has been, and still is I believe, burning VC cash, as they don't have the sales volume to sustain their internal cost. Today's prices for IB products are not sustainable prices; they are aggressive penetration prices, which means they are near cost or below cost. That's why so many players died.

> comparative to Myrinet in "cost per port". Not too long ago, Myrinet
> was higher in price than IB is today and they haven't came out with
> anything "new" in forever. Well except a PCI-X version when PCI Express
> is around the corner.
Myricom has a lot of installations worldwide and The current PCI-X NIC is 1X, the second one to ship by SC03 is 2X, the next one is 4X. It's not that hard to add links and aggregate bandwidth, the rest is more important (like being able to do bidirectional faster than unidirectional...). Why do you want to have a PCI-Express product when no customers ask for it because PCI-Express is not shipping in volume yet ? Don't worry, when you can buy PCI-Express nodes in volume, you will be able to buy a PCI-Express Myrinet NIC to put it in. > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different New things are not always hardware. We will demo a completely new software stack at SC. Same hardware, much better application performance. As I said, adding links to aggregate bandwidth is easy, but doing the right thing to run applications faster is another level of difficulties. Now, when you have the right software design, just ramp up the pipe performance to please the spec believers and you have what customer wants. > options such as bridging between IB, GbE, or FC so you could hang your > storage boxes off the IB switch without much hassle. You can bridge Myrinet and GigE. Not FC, the protocol stinks too much. > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you feel more comfortable. Patrick -- Patrick Geoffray Myricom, Inc. 
http://www.myri.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Nov 4 19:01:27 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 04 Nov 2003 16:01:27 -0800 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067985237.1208.392.camel@asterix> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> > > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > > interconnect technology ratcheted up way higher than 10GB. > >Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you >feel more comfortable. > >Patrick >-- Wait a minute here... You might run into some fundamental physics problems, especially when getting "way higher" than 10 Pb/sec... I'd like to see what you've paved that road map with, and make sure it doesn't ruin my shoes when I walk on it . Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of 10, here, and 100 Gb/sec is just too hard to envision..) 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 Giga, right?) Where are you going to fit those million wires/fibers/connectors? Let's say you're using optical fibers that are 10 micron in diameter (which is a fairly impressive feat). Assuming you space them by 5 micron, you can pack 1000x1000 of them in 5 mm x 5 mm... There is a bit of a problem with interconnects, etc., but perhaps you can terminate it right on top of a die, and the circuitry for one channel is small enough to fit? How tolerant is Myricom's hardware of skew and jitter between the parallel lines? 
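As a rough check of the packing estimate above, here is a minimal Python sketch. The 10 Gb/s per-fiber rate and the million-fiber count come from the estimate itself; the function name and the two pitch values are illustrative assumptions, not anything from a vendor. Reading "space them by 5 micron" as a 5 micron center-to-center pitch reproduces the 5 mm x 5 mm figure, while a 10 micron fiber plus a 5 micron gap (15 micron pitch) would come out nearer 15 mm x 15 mm.

```python
import math

PETA = 10**15
GIGA = 10**9

target_bps = 10 * PETA      # 10 Pb/s aggregate, the figure under discussion
per_fiber_bps = 10 * GIGA   # 10 Gb/s per fiber, as assumed in the estimate

# Number of parallel fibers needed, and fibers per side of a square bundle.
fibers = target_bps // per_fiber_bps
side = math.isqrt(fibers)


def bundle_side_mm(fibers_per_side, pitch_um):
    """Side length in mm of a square bundle at a given center-to-center pitch.

    Hypothetical helper for illustration only.
    """
    return fibers_per_side * pitch_um / 1000


# Assumption A: 5 micron pitch (matches the 5 mm x 5 mm figure).
side_a = bundle_side_mm(side, 5)
# Assumption B: 10 micron fiber + 5 micron gap = 15 micron pitch.
side_b = bundle_side_mm(side, 15)

print(fibers, side, side_a, side_b)
```

Either way the bundle is centimeter-scale at most, so the harder questions remain the ones about skew, jitter and terminating a million channels.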
At least with a million lines, you can use statistical techniques to characterize it, and you'd almost have to use some form of forward error correction, so the extreme outliers wouldn't give you troubles. You might be able to push the bit rate a bit higher.. We've got some components working at 94 GHz here, and there are some novel techniques with propagating the wave in the boundary outside a dielectric rod, so the loss is reasonable. We haven't figured out how to turn a corner yet, but that wouldn't violate any laws of physics. The distance is short, so maybe waveguide can work (optical fiber is waveguide and fairly low loss) Hmm.. now, about that X-ium or X-lon mobo that is going to send/accept/process the 10 Pb/s data stream.... What is the physical limit on memory speed? The cells can only be so small, and you've got to propagate the charge across it. I suppose, theoretically, one could use a charge as small as 1 electron, so that sort of provides a lower bound. I've heard of CMOS processes with fT of 10 GHz in very small feature sizes (the wireless market really wants to do RF and digital on the same chip). Say you get that 10 GHz memory... you'll need million way interleaving. This starts to make the SIMD systolic arrays look more attractive doesn't it. Maybe free space optical interconnects with monolithically fabricated optics over the chip might be a solution? HeNe lasers are about 474 THz, as I recall, so if you baseband encode your 10 Pb/sec bitstream, you're only looking at 30 nm extreme UV kinds of bandwidth. Lends new meaning to the RF designer's term: DC to light... James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Tue Nov 4 19:27:36 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 04 Nov 2003 17:27:36 -0700 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> Message-ID: <1067992056.21779.50.camel@fpga.sandia.gov> > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of > 10, here, and 100 Gb/sec is just too hard to envision..) > > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 > Giga, right?) Um: http://pr.fujitsu.com/en/news/2000/09/25.html or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html Multiple Tb/s on a single fiber... Of course, I don't have any idea what they are going to do with the data when they get it there either ;-) (i.e. issues with memory, buses, processor speeds, etc.) Or, for that matter, how they are going to get that data out of any silicon chip... But, I assume these are planned for after we are doing all optical computing, right? ;-) Keith -- Keith D. 
Underwood From msnitzer at lnxi.com Tue Nov 4 19:04:12 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Tue, 4 Nov 2003 17:04:12 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104213632.GA1662@galactic.demon.co.uk>; from amacater@galactic.demon.co.uk on Tue, Nov 04, 2003 at 09:36:32PM +0000 References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> Message-ID: <20031104170412.A9868@lnxi.com> On Tue, Nov 04 2003 at 14:36, Andrew M.A. Cater wrote: > There may be an alternative which will guarantee you code freedom, > won't charge for licenses in any event, won't bow to "commercial" > pressure and won't restrict your code use/re-use/modification/distribution. > If you want an ultra stable platform to which you can freely contribute > code and which you can use for any purpose - try Debian "stable". ... > Debian encourages everyone to build specialist distributions based on > Debian. If the hassle's too much, feed in your cluster packages to > become part of the main distribution. There is at least one Debian > newsgroup (debian beowulf) where clusters are of high interest. > Check out what's already within Debian. As noted in a previous post, > you may find what you want has already been put in place in the 8000+ > packages. Debian is at a disadvantage in that RPM is not its native package format; BUT debs are also what make Debian so robust. However, RPM is the package format of choice for HPC, the enterprise, and let's not forget the LSB.
It's unfortunate really, but Debian has generally prided itself on making aspiring Debian developers run the deb packaging gauntlet in order to prove they've got the required deb-fu. That's something that'll have to be lessened; possibly by leveraging some of the build systems that are coming to light from developers in the Debian community. If anything I'd say that Debian's lack of native RPM support is the biggest hurdle for Debian to have a break-out run as the Linux distro of choice for many. BUT there is hope; Progeny recently announced that they ported RedHat's Anaconda to Debian (still under development). This is significant in that projects like ROCKS _could_ theoretically drop Debian in as a replacement for the underlying Linux distro and still maintain the complete clustering feature set that ROCKS offers (at least kickstart compatibility). Granted, extensive RedHat-isms would need to be ported to Debian-isms; but this is where the LSB is _supposed_ to weigh in. But as Joe Landman pointed out, certification and commercial software support for Debian is non-existent (AFAIK); that _could_ change in the near future if Bruce Perens and others in the Debian community have anything to say about it. As Bruce Perens recently mentioned on an lwn.net forum: Bruce: What's wrong with RedHat? (Posted Oct 11, 2003 16:02 UTC (Sat) by BrucePerens) The most important things that a user-driven distribution can provide over RHAS are that the free version will be the certified one, that there won't be a lock on support information, and that it won't be dominated by one company. I am having talks with sponsors now. You'll hear from me in a few weeks. > This is a purely personal post. It does _not_ represent my employers > or any other person. Likewise...
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Tue Nov 4 19:05:13 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Wed, 5 Nov 2003 00:05:13 +0000 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA82D51.3070102@scalableinformatics.com> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> Message-ID: <20031105000513.GA2101@galactic.demon.co.uk> On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: > > > Andrew M.A. Cater wrote: > > [...] > > >BUT > > > >If you want an ultra stable platform to which you can freely contribute > >code and which you can use for any purpose - try Debian "stable". > > > > It's an idea. > > There are interesting bits in debian. I am not sure it is necessarily > the right choice for clusters due to the specific lack of commercial > support for cluster specific items such as Myrinet, and the other high > speed interconnects. Dan - if I build a _really big_ cluster, will you get Quadrics to do Debian :) Same goes for any other vendor - if you ask them nicely and make it worth their while, they'll do it. In many cases, it's only a recompile of a device driver to account for library differences, after all. HP use Debian internally, IIRC. Some of the Debian developers are also HP folk - HP are potentially looking to support more of their products under Linux? [See, for example, Debian Weekly News for today :) ] > Commercial compiler support for Debian (e.g. > Intel, Absoft, et al) is largely non-existant as far as I know (please > do correct me if I am wrong). Compaq Alpha compilers work on the Alpha port or can be tweaked to IIRC. 
I have no current expertise on big commercial compilers, however. > Few if any commercial applications are certified to work on Debian > (Oracle, Legato, ....) and again, please correct me if I am wrong. > Many of these will run fine without formal certification from the vendor. Few, if any, current commercial apps run on Red Hat 4.2 / 5.0 - and current Red Hat 7.x/8.x/9.x is now as commercially relevant. The big commercial apps will have to retrench their markets, potentially, to (one/both) of Novell / RH Enterprise Linux at ??$ per licence. Unless it says RH/Novell on the box, they won't certify on something "less but Libre" based on RH. But this is Linux - a commercial Linux app. will run on other distributions with a little thought / planning. I'm not sure they'll run Oracle on Scyld / ROCKS, for example. > I simply don't see this as a universally viable alternative. Debian > does indeed have lots of nice technical things going for it. Maybe I am > missing some obvious point here. I do know some people have built > clusters using it, but a few clusters does not a clustering distribution > make. I believe someone at Cornell built Windows 2000 into a cluster. > Doesn't make Win2k a clustering OS though. If the HPC on Linux community wants to build a clustering distribution on their terms they can within Debian. A thousand coders worldwide who have more than a passing interest in fun stuff can work wonders if they see the motivation in quality and good code - a character trait I'm sure they have in common with many cluster folk, academics and researchers :) > > The distribution matters less than the overall support for what you want > to do with it. I believe that it might be possible to build a Gentoo > based cluster, though I would be concerned about the length of time for > an OS load, among other things. One of the hardest parts of a cluster > is getting the OS on. 
Getting Debian nodes up is no harder than anything else on any other distribution - provided it's not your first ever experience of Linux :) The minimal Debian install really is fairly minimal, if that's what you want - you can readily build from there. Want a full featured X Window System - apt-get install x-window-system. Want vi ?? apt-get install vi / elvis / vim / nvi ... :) > ROCKS, BioBrew (and I understand Warewulf) make > this ridiculously easy. Increasing the setup/management time, or making > your life harder in general, doesn't make much sense. There is a > Knoppix variant that does clustering (OpenMosix style). Not sure it is > the best solution, but I would like to hear from anyone using it. > This is fun if you want an ad-hoc StoneSouperComputer - the 512 node machine built in a night on a German TV show or the four node proof of concept idea for a show and tell in someone's office - but I'm not entirely sure I'd trust my most valuable data to it. But hey, like most things KNOPPIX based it's an ultra cool demo :) Have fun - at 0015 or so Zulu time, I'd better get some rest :) Andy From patrick at myri.com Tue Nov 4 20:27:03 2003 From: patrick at myri.com (Patrick Geoffray) Date: 04 Nov 2003 20:27:03 -0500 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> Message-ID: <1067995623.1208.431.camel@asterix> Hi Jim, On Tue, 2003-11-04 at 19:01, Jim Lux wrote: > >Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you > >feel more comfortable. > Wait a minute here...
You might run into some fundamental physics problems, > especially when getting "way higher" than 10 Pb/sec... I'd like to see what > you've paved that road map with, and make sure it doesn't ruin my shoes > when I walk on it . It is supposed to bring you luck in some countries, right ? :-) But what about this twin photon spin stuff ? You send a photon, you keep its twin and when you spin the second one, the first one spins the same way a few thousand miles away.... Seriously, it was a very ironic comment, I should have marked it that way to be sure it would not be taken literally. I wrote Pb because I don't know what comes after "Peta". I thought about googooplexb/s but didn't know if it was valid. The point is that having something in your roadmap has no more value than what just ruined your shoes. Some vendors also have a public roadmap and a private roadmap, you need an NDA to access the private roadmap, either because you lie in the public roadmap or because the private roadmap contains strategic choices that you don't want to share with your competitors. So let's say that Myricom's roadmap goes up way higher than 10 googooplexb/s around 2030 :-) Patrick --- Patrick Geoffray Myricom, Inc.
http://www.myri.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 19:29:13 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 04 Nov 2003 19:29:13 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031105000513.GA2101@galactic.demon.co.uk> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> <20031105000513.GA2101@galactic.demon.co.uk> Message-ID: <1067992153.4513.24.camel@protein.scalableinformatics.com> On Tue, 2003-11-04 at 19:05, Andrew M.A. Cater wrote: > On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: [...] > > There are interesting bits in debian. I am not sure it is necessarily > > the right choice for clusters due to the specific lack of commercial > > support for cluster specific items such as Myrinet, and the other high > > speed interconnects. > > Dan - if I build a _really big_ cluster, will you get Quadrics to do > Debian :) I think the question is, if you buy $10M in interconnects from them, would they please port to distro X. Likely it would be worth their while in that case. Are you going to build such a big cluster? :) > Same goes for any other vendor - if you ask them nicely and make it > worth their while, they'll do it. In many cases, it's only a recompile > of a device driver to account for library differences, after all. Not always. The issue is not simply a port, but also the support costs. Support in the sense of qualifying the port against a standard load. Coming up with the standard load, building the regression tests, educating the staff on the new support ... 
They may simply make the port and say, good luck, you are on your own. > HP use Debian internally, IIRC. Depends upon who you ask. Bruce Perens had some effect there, but as I remember, they use SUSE, RedHat, ROCKS, etc. > Some of the Debian developers are also > HP folk - HP are potentially looking to support more of their products > under Linux? [See, for example, Debian Weekly News for today :) ] Following on others foray into this, I am going to take a pragmatic position. I will believe it when I see it (.deb's from HP and others). > > Commercial compiler support for Debian (e.g. > > Intel, Absoft, et al) is largely non-existant as far as I know (please > > do correct me if I am wrong). > > Compaq Alpha compilers work on the Alpha port or can be tweaked to IIRC. > I have no current expertise on big commercial compilers, however. :) I seem to remember HP recently EOLing the Alpha in favor of some other chip... can't remember its name ... ;-) I can run Debian on my SGI Indy. I am not, but I can. Doesn't mean much as the market for Indy's has basically dried up. > > Few if any commercial applications are certified to work on Debian > > (Oracle, Legato, ....) and again, please correct me if I am wrong. > > > > Many of these will run fine without formal certification from the > vendor. Ok. Now sell that to a CIO/CTO, or someone responsible for making the infrastructure work. Mike at Linux Networx (though speaking for himself) called it the "smile test" or something like that. The question you will be asked is, if something breaks in our critical business application, who are we going to call if we are using the un-certified OS distribution? This is a hard sell. > Few, if any, current commercial apps run on Red Hat 4.2 / 5.0 - and > current Red Hat 7.x/8.x/9.x is now as commercially relevant. I respectfully disagree with the last portion of the statement. Most of the engineering code that I have played with recently spec out RH7.x as their linux supported platform. 
Anything else and you are on your own. The bio and chem codes which come pre-compiled tend to have a "requirements" section as well, listing RH7.x. Remember that RHAS2.1 will be supported for a few more years, and it is ostensibly RH7.x. > The big > commercial apps will have to retrench their markets, potentially, to > (one/both) of Novell / RH Enterprise Linux at ??$ per licence. Unless > it says RH/Novell on the box, they won't certify on something "less but > Libre" based on RH. But this is Linux - a commercial Linux app. will run > on other distributions with a little thought / planning. I'm not sure > they'll run Oracle on Scyld / ROCKS, for example. Some distros are more (for lack of a better term) engineered than others. There are code issues with some of these which break the "de facto" standard Linux. As for Oracle on ROCKS, well, Oracle does run in a supported mode on RHAS2.1 (see above), and ROCKS == RH7.3, so the rest is left to the reader. As the underlying OS is RedHat, with a meta layer atop it called ROCKS, there should be no reason for Oracle not to work under this environment in a supported manner. That said, I am not sure that is what you want to do with Oracle. [...] > > > ROCKS, BioBrew (and I understand Warewulf) make > > this ridiculously easy. Increasing the setup/management time, or making > > your life harder in general, doesn't make much sense. There is a > > Knoppix variant that does clustering (OpenMosix style). Not sure it is > > the best solution, but I would like to hear from anyone using it. > > > > This is fun if you want an ad-hoc StoneSouperComputer - the 512 node > machine built in a night on a German TV show or the four node proof > of concept idea for a show and tell in someone's office - but I'm > not entirely sure I'd trust my most valuable data to it.
But hey, like > most things KNOPPIX based its an ultra cool demo :) I think the irony in all of this is that the one disk I carry with me everywhere is a Knoppix disk (Debian based). I like it, it is technically neat. > > Have fun - at 0015 or so Zulu time, I'd better get some rest :) > > Andy -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Tue Nov 4 20:08:02 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue, 4 Nov 2003 17:08:02 -0800 Subject: IB vs Myrinet In-Reply-To: <3FA7BD83.2060901@lmco.com> References: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> <3FA7BD83.2060901@lmco.com> Message-ID: <20031105010802.GE4682@greglaptop.internal.keyresearch.com> On Tue, Nov 04, 2003 at 09:53:55AM -0500, Jeff Layton wrote: > I want to interject a comment here. In the past (recent and a few > years back) we've had trouble with the open source MPI implementations > with our codes. When we contacted them about our problem we got > a luke warm (at best) response. Jeff, Whom did you contact, the people maintaining MPICH, or the people who sold you the cluster? System integrators can do more than deliver piles of parts, they can also support things. So there's more to life than asking open source maintainers to fix something, or switching to a closed-source commercial MPI vendor. > One of the hidden costs from my prospective, that allows us to > compare interconnects, is a product of the cost of diagnosing problems, > fixing problems, and how frequently the problems occur. We have > experience with one high-speed interconnect in this regard and that > number is very large. Amen. 
Again, the front line of interconnect support is generally the system integrator. Myricom, for example, has a reseller model, in which front-line support should be provided by your system integrator. -- greg From jsims at csiopen.com Tue Nov 4 20:53:52 2003 From: jsims at csiopen.com (Joey Sims) Date: Tue, 4 Nov 2003 20:53:52 -0500 Subject: IB vs Mryinet Message-ID: <812B16724C38EE45A802B03DD01FD547226269@exchange.concen.com> Hello Patrick, First of all, I want to apologize for my message initially coming off as if there were problems dealing with Myricom in regards to your products and/or customer service and its user base. Not so. It was not intended to be directed to Myrinet users at all. I clarified that immediately. Secondly, I do not want to use this message board, which I enjoy visiting, chatting with others on, and learning a lot from, as some tool to sling mud back and forth. Personally, I don't think that would be very courteous to the owners of this site or its visitors. Moving forward with respect to others, I will keep this response brief and no more need be said about it. At least you guys are consistent with your "shock and awe" defense reflex at the first mention of a competitor's product when compared to yours. Friendly competition is good for everyone for many reasons. Matter of fact, I've purchased your product from one of my own competitors and he was awesome. He helped me out big time. For the record, we still haven't found that missing card.... Be treated the way you would want to be treated. I am not the only one who has witnessed the show of arrogance and disrespect towards someone coming to you for years trying to sell your product and being treated like we just stuck you with bad debt. We're just trying to do our jobs. Have a great evening.
Sincerely, ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 20:32:16 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 04 Nov 2003 20:32:16 -0500 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067992056.21779.50.camel@fpga.sandia.gov> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> <1067992056.21779.50.camel@fpga.sandia.gov> Message-ID: <1067995935.4513.51.camel@protein.scalableinformatics.com> On Tue, 2003-11-04 at 19:27, Keith D. Underwood wrote: > > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of > > 10, here, and 100 Gb/sec is just too hard to envision..) > > > > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 > > Giga, right?) > > Um: http://pr.fujitsu.com/en/news/2000/09/25.html > or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html > > Multiple Tb/s on a single fiber... Get yer lambda's here... red hot... lots of them... 10000 to a fiber ... Think of this like the cable TV coax coming into your house (theoretically if need be). One wire, lots of bandwidth. I have heard (e.g. bad memory) that TV signals require 6MHz of bandwidth, so "hundreds" of TV stations require somewhat less than a GHz in bandwidth. Same effect, using different 1/lambda's for each "channel". > Of course, I don't have any idea what they are going to do with the data > when they get it there either ;-) (i.e. 
issues with memory, buses, > processor speeds, etc.) Or, for that matter, how they are going to get > that data out of any silicon chip... But, I assume these are planned > for after we are doing all optical computing, right? ;-) The entropy generation rate must be huge with all that data going into the bit bucket, http://www.nature.com/nature/journal/v406/n6799/box/4061047a0_bx1.html unless of course a fast enough receiver can do something about this... k(B) ln 2 for each bit "erased", so something like ((10**15) k(B) ln 2)/second, if you consider that bits dropped on the floor are erased. On the order of 10**(-8) W/K. I defer to practicing physicists on the interpretation (as it is not my field). -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmyers1400 at comcast.net Tue Nov 4 10:23:36 2003 From: rmyers1400 at comcast.net (Robert Myers) Date: Tue, 04 Nov 2003 10:23:36 -0500 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3FA7C478.9060908@comcast.net> Butti Gabriele - Dottorati di Ricerca wrote: > > > >The code we want to run on these machines >is basically a home-made code, not fully optimized, which allocates >around 500 Mb of RAM per node. > I'm surprised no one has picked up on this comment. If they have, I've missed it. If you don't want to mess with optimizing and tuning, don't even consider itanium. Let somebody else be the pioneer. You'll live longer. There are people who are seriously into this kind of stuff, and plainly you're not. 
RM From widyono at cis.upenn.edu Tue Nov 4 12:34:17 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 4 Nov 2003 12:34:17 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103225814.B4021@lnxi.com> References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> <20031103225814.B4021@lnxi.com> Message-ID: <20031104173417.GA22590@central.cis.upenn.edu> I just asked Mason about the source of his statement, and he referred me to RedHat's own site, where I found, at http://www.redhat.com/about/corporate/trademark/guidelines/page9.html the following quote (taken out of the Educational Institutions paragraph): "This permission is not applicable to Red Hat Enterprise Linux or any Red Hat subscription product. Of course, you are always permitted to redistribute the code without utilizing Red Hat's trademark so long as you otherwise comply with the GNU General Public License and Red Hat's Trademark Guidelines." cAos's work is sounding mighty tasty. Thanks for the cross-post, Mike. Dan W. > Also, nice to see you cross posted to the rocks-discussion, for the > benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an > informative reply: > https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html > > It would appear as though Rocks is free and clear to openly redistribute > RHEL SRPM-rebuilds; this is an interesting loop-hole: > > - Rocks released by an academic institution, which means it has a > license to use the RedHat trademark. This also means no one can charge > for Rocks software (only support).
From phil at sdsc.edu Tue Nov 4 16:05:21 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Tue, 04 Nov 2003 13:05:21 -0800
Subject: [Rocks-Discuss]Re: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <20031104173417.GA22590@central.cis.upenn.edu>
References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> <20031103225814.B4021@lnxi.com> <20031104173417.GA22590@central.cis.upenn.edu>
Message-ID: <3FA81491.9030501@sdsc.edu>

Daniel Widyono wrote:

>I just asked Mason about the source of his statement, and he referred me to
>RedHat's own site, from which I found at
>
>http://www.redhat.com/about/corporate/trademark/guidelines/page9.html
>
>the following quote (taken out of the Educational Institutions paragraph):
>
>"This permission is not applicable to Red Hat® Enterprise Linux® or any Red
>Hat subscription product. Of course, you are always permitted to
>redistribute the code without utilizing Red Hat's trademark so long as you
>otherwise comply with the GNU General Public License and Red Hat's Trademark
>Guidelines."
>

If you read the _entire_ paragraph, this refers to a redistribution of their binaries (which we did for distributions <= 7.3). For IA-64, we recompiled everything that was on RedHat's open-source _source_ directory. I very much doubt that the enterprise-specific crown jewels (failover, etc) are in this source directory. In particular, we are not redistributing redhat-created binaries. We went down the path of trying to get permission to re-distribute IA-64 to a small select group of folks, but could not and therefore we didn't -- hence, we "figured" out how to rebuild a complete IA-64 distro from the open-source sources. Also, we are only working with the advanced workstation (AW) source tree.
I don't know (and don't care) if this is the same base set that is used in the advanced server (AS) and enterprise server (ES) versions that Redhat sells as well. Our _current_ clustering needs are met quite well by the re-compiled open-source rpms of AW. And we are not trying to re-engineer RedHat's entire line (even though some may think we are). If one needs AS or ES, the web address is http://www.redhat.com.

So, are there things in AW that aren't in Rocks? Probably. I haven't looked in detail. Are there things in Rocks that aren't in AW? Absolutely (go check our cvs repository). Are there open-source add-ons in Rocks that we didn't author? Uh. Yeah. Try a whole litany of base cluster tools: MPICH, GM, SGE, Globus, Condor-G, ... Are there open-source things that Redhat didn't author? Duh.

So if it's all open source, what does redhat sell? Services, patches, updates, notifications, integration. All of this is very very valuable. They make key contributions to linux development and have authored a whole bunch of critical software -- their (open-source) packaging format is the de facto standard. Even SuSE uses it.

We like Redhat (a lot), want to support them (both morally and with $$). There is, however, a reality of how much money people have and how much they are willing to spend. RedHat will find that balance (I hope) for clusters, universities, and others. I believe that most folks agree that O($200/node/yr) does not match either the amount of money people have or how much money they are willing to spend for the support in a clustered environment.

-P

>
>cAos's work is sounding mighty tasty. Thanks for the cross-post, Mike.
>
>Dan W.
>
>
>
>>Also, nice to see you cross posted to the rocks-discussion, for the
>>benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an
>>informative reply:
>>https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html
>>
>>It would appear as though Rocks is free and clear to openly redistribute
>>RHEL SRPM-rebuilds; this is an interesting loop-hole:
>>
>> - Rocks released by an academic institution, which means it has a
>>license to use the RedHat trademark. This also means no one can charge
>>for Rocks software (only support).
>>
>>

--
== Philip Papadopoulos, Ph.D.
== Program Director for
San Diego Supercomputer Center == Grid and Cluster Computing
9500 Gilman Drive == Ph: (858) 822-3628
University of California, San Diego == FAX: (858) 822-5407
La Jolla, CA 92093-0505

From James.P.Lux at jpl.nasa.gov Tue Nov 4 20:20:57 2003
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Tue, 04 Nov 2003 17:20:57 -0800
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <1067992056.21779.50.camel@fpga.sandia.gov>
References: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov>
Message-ID: <5.2.0.9.2.20031104171841.030df030@mailhost4.jpl.nasa.gov>

At 05:27 PM 11/4/2003 -0700, Keith D. Underwood wrote:
> > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of
> > 10, here, and 100 Gb/sec is just too hard to envision..)
> >
> > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000
> > Giga, right?)
> >Um: http://pr.fujitsu.com/en/news/2000/09/25.html >or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html > >Multiple Tb/s on a single fiber... > >Of course, I don't have any idea what they are going to do with the data >when they get it there either ;-) (i.e. issues with memory, buses, >processor speeds, etc.) Or, for that matter, how they are going to get >that data out of any silicon chip... But, I assume these are planned >for after we are doing all optical computing, right? ;-) > And I'll bet that fancy Wavelength Division Multiplexing (WDM) hardware that Fujitsu used is physically quite large to mux those 200 channels together. You'd run into a light time across the box problem. That nanosecond per foot adds up when you're sending bits at 10 Gbps. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 23:05:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 23:05:09 -0500 (EST) Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067995623.1208.431.camel@asterix> Message-ID: On 4 Nov 2003, Patrick Geoffray wrote: > But what about this twin photon spin stuff ? You send a photon, you keep > his twin brother and when you spin the second one, the first one spin > the same way a few thousand miles away.... Oooo, don't get me started on EPR. All I can say is no, no, and no. Can't describe relativistic phenomena with non-relativistic physics, and you can't "spin one photon one way" and do anything at all to the other thousands of miles away. Time reversal invariance. Generalized master equation. Quantum mechanics of closed systems is fully deterministic. 
Measurement represents classical interference (all measurement apparati are classical and of indeterminate/stochastic phase and known only via a trace in the GME).

Ahhh, my head is exploding....must...stop...thinking...about...quantum...measurement...theory.

> So let's say that Myricom's roadmap goes up way higher than 10 googooplexb/s
> around 2030 :-)

Let's see, if we drive a truck containing 125 terabyte-sized RAID arrays through a big garage door fast enough to pass through in about one second...I guess that makes about a Pbps, right? So you're wrong, Jim. We can accomplish this today, at least in a burst. If we can afford a whole line of said trucks, we might even achieve it sustained...;-)

All Myricom needs is a bunch of trucks. BIG trucks...:-)

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From erwan at mandrakesoft.com Tue Nov 4 05:31:51 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Tue, 04 Nov 2003 11:31:51 +0100
Subject: Freebee RH Releases...
In-Reply-To:
References:
Message-ID: <1067941911.902.126.camel@revolution.mandrakesoft.com>

On Tue 04/11/2003 at 00:31, Rocky McGaugh wrote:
> I can hold my tongue no longer.
> Most of us are faced with similar problems.
> Several groups are in process of making a freely distributable OS
> for scientific use.

I must say that's also our goal :)
This is the basis of the CLIC project but combined with an OS approach.
Making a clustering OS for scientists.
--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From rossini at blindglobe.net Tue Nov 4 23:35:56 2003
From: rossini at blindglobe.net (A.J. Rossini)
Date: Tue, 04 Nov 2003 20:35:56 -0800
Subject: Freebee RH Releases...
In-Reply-To: <1067941911.902.126.camel@revolution.mandrakesoft.com> (Erwan Velu's message of "Tue, 04 Nov 2003 11:31:51 +0100")
References: <1067941911.902.126.camel@revolution.mandrakesoft.com>
Message-ID: <85oevrph3n.fsf@blindglobe.net>

Erwan Velu writes:

> On Tue 04/11/2003 at 00:31, Rocky McGaugh wrote:
>> I can hold my tongue no longer.
>> Most of us are faced with similar problems.
>> Several groups are in process of making a freely distributable OS
>> for scientific use.
> I must say that's also our goal :)
> This is the basis of the CLIC project but combined with an OS approach.
> Making a clustering OS for scientists.

It's many people's goal. It's one of the reasons for Quantian (a ClusterKnoppix repackaging, focusing on analytic and quantitative tools for data analysis).

best,
-tony

--
rossini at u.washington.edu http://www.analytics.washington.edu/
Biomedical and Health Informatics University of Washington
Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email

CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you.
From lathama at yahoo.com Tue Nov 4 23:46:07 2003
From: lathama at yahoo.com (Andrew Latham)
Date: Tue, 4 Nov 2003 20:46:07 -0800 (PST)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To:
Message-ID: <20031105044607.37924.qmail@web60307.mail.yahoo.com>

Speaking of big trucks (I almost hurt myself laughing): did anyone ever use anything with the Silk Road technology? I read about them making fiber adapters with internal robotics to grab the fiber and automagically align the laser for best use. It sounded cool a few years ago. Just curious.

--- "Robert G. Brown" wrote:
> On 4 Nov 2003, Patrick Geoffray wrote:
>
> > But what about this twin photon spin stuff ? You send a photon, you keep
> > his twin brother and when you spin the second one, the first one spin
> > the same way a few thousand miles away....
>
> Oooo, don't get me started on EPR. All I can say is no, no, and no.
>
> Can't describe relativistic phenomena with non-relativistic physics, and
> you can't "spin one photon one way" and do anything at all to the other
> thousands of miles away.
>
> Time reversal invariance. Generalized master equation. Quantum
> mechanics of closed systems is fully deterministic. Measurement
> represents classical interference (all measurement apparati are
> classical and of indeterminate/stochastic phase and known only via a
> trace in the GME).
>
> Ahhh, my head is
> exploding....must...stop...thinking...about...quantum...measurement...theory.
>
> > So let's say that Myricom's roadmap goes up way higher than 10 googooplexb/s
> > around 2030 :-)
>
> Let's see, if we drive a truck containing 125 terabyte sized RAID arrays
> through a big garage door fast enough to pass through in about one
> second...I guess that makes about a Pbps, right? So you're wrong, Jim.
> We can accomplish this today, at least in a burst. If we can afford a
> whole line of said trucks, we might even achieve it sustained...;-)
>
> All myricom needs is a bunch of trucks. BIG trucks...:-)
>
> rgb
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

=====
/---------------------------------------------------------------------------------------------------\
Andrew Latham - Penguin Loving, Moralist Agnostic.
What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as a god and the future with which religions are concerned. Or, if not impossible, at least impossible at the present time.
LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com
\---------------------------------------------------------------------------------------------------/

From brian.dobbins at yale.edu Wed Nov 5 00:38:03 2003
From: brian.dobbins at yale.edu (Brian Dobbins)
Date: Wed, 5 Nov 2003 00:38:03 -0500 (EST)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To:
Message-ID:

Hah! Big trucks?

Actually, if we use a collection of 8-disk RAID 5 arrays, equipped with Maxtor's 320GB EIDE HDDs, we'd get effectively 2.24 TB per array, right? That can fit in a 2U rack, and we'd need 56 of them to reach 125 TB, so if we separate that into 2 x 28, we'd need two rows, each 19" wide, 24" (approx.) tall, and 28 x 3.5" long - or 8.17 feet. (Laying them straight up, in this case.) Hmm...
let's make it 2 x 2 rows, 14 deep, so we've got a bulk that's 38" wide, 48" tall, and 49" long. Add a little for padding, I guess, but since this is just for throughput, not counting, uh, well, lots of error correcting that may be needed if the road is bumpy, we'd get something a little over three feet wide, 4 feet tall, and just over 4 feet long, right?

Taking a look at Chevy.com, it appears that the basic Silverado 1500 has a cargo box width of 49 inches, so we'd just make it, and a whopping 78.6 inches in length, so we can fit all of our RAID arrays, plus plenty of room left over for beer to celebrate the milestone with afterwards. :)

> All myricom needs is a bunch of trucks. BIG trucks...:-)

I'm betting with a U-Haul we could update the roadmaps even more!

Cheers,
- Brian

Brian Dobbins
Yale University Mechanical Engineering
-------------------------------------------------------------------
"Be nice to other people... they outnumber you six billion to one."

From eric at fnordsystems.com Wed Nov 5 01:49:27 2003
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Tue, 04 Nov 2003 22:49:27 -0800
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To:
References:
Message-ID: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>

So now you have a van full of RAID arrays... How do you load and unload it? It would be quite comical to run a spool of armored/weatherized multimode fiber from the computer room, down the hallway, out the door and into the parking lot. :-)

Using the previously mentioned figure of 125TB in a truck, and an estimated coast-to-coast driving time of 60 hours, this works out to a one-way transfer rate of 2.08333 terabytes per hour, not counting the load and unload time.
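
The arithmetic in this subthread is easy to check. What follows is only a rough sketch, assuming decimal units (1 TB = 1000 GB = 8e12 bits) and taking the 125 TB payload, the 320 GB drives, the 8-disk RAID 5 arrays and the 60-hour drive straight from the posts above. The function names are made up for illustration.

```python
import math

def raid5_usable_tb(disks, disk_gb):
    """Usable capacity of one RAID 5 array: one disk's worth goes to parity."""
    return (disks - 1) * disk_gb / 1000

def arrays_needed(payload_tb, per_array_tb):
    """Whole arrays required to hold the payload."""
    return math.ceil(payload_tb / per_array_tb)

def truck_rate_bps(payload_tb, transit_seconds):
    """Average one-way rate in bits per second (decimal terabytes assumed)."""
    return payload_tb * 1e12 * 8 / transit_seconds

if __name__ == "__main__":
    per_array = raid5_usable_tb(8, 320)
    print(per_array)                             # 2.24 TB per array
    print(arrays_needed(125, per_array))         # 56 arrays
    print(truck_rate_bps(125, 1.0) / 1e15)       # 1.0 Pb/s through the garage door
    print(125 / 60)                              # about 2.08 TB per hour coast to coast
    print(truck_rate_bps(125, 60 * 3600) / 1e9)  # about 4.63 Gb/s averaged over the drive
```

So the burst figure and the sustained figure differ by a factor of 216,000, which is just the number of seconds in the 60-hour drive.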
> Taking a look at Chevy.com, it appears that the basic Silverado 1500 has >a cargo box width of 49 inches, so we'd just make it, and a whopping 78.6 >inches in length, so we can fit all of our RAID arrays, plus plenty of >room left over for beer to celebrate the milestone with afterwards. :) > > > All myricom needs is a bunch of trucks. BIG trucks...:-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 5 00:30:35 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 5 Nov 2003 16:30:35 +1100 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: References: Message-ID: <200311051630.37595.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 5 Nov 2003 03:05 pm, Robert G. Brown wrote: > On 4 Nov 2003, Patrick Geoffray wrote: > > But what about this twin photon spin stuff ? You send a photon, you keep > > his twin brother and when you spin the second one, the first one spin > > the same way a few thousand miles away.... > > Oooo, don't get me started on EPR. All I can say is no, no, and no. > > Can't describe relativistic phenomena with non-relativistic physics, and > you can't "spin one photon one way" and do anything at all to the other > thousands of miles away. I think he's talking about things like using entanglement for crypto key exchanges, etc, which has already been done. Viz: http://www.quiprocone.org/pressrelease_JRarity.htm Jan 2001 - DERA Scientists achieve world record 1.9km range for free- space secure key exchange using quantum cryptography. My favourite quote: To avoid air turbulence effects the experiment was carried out over an elevated path with the receiver on the DERA Malvern site and the transmitter located in a rented room in a conveniently situated pub on the side of the Malvern hills. 
These are smart cookies working on this, John is an Institute of Physics medal winner (1995).

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qIr7O2KABBYQAh8RAi0pAJ0U+T7O7DKD4FA8hX+vwWNIBb+hcQCgi/S8
hY56fIUjydfLhcU+VcmrSP8=
=PlZQ
-----END PGP SIGNATURE-----

From john.hearns at clustervision.com Wed Nov 5 05:01:15 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Wed, 5 Nov 2003 11:01:15 +0100 (CET)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>
Message-ID:

On Tue, 4 Nov 2003, Eric Kuhnke wrote:

> So now you have a van full of RAID arrays... How do you load and unload
> it? It would be quite comical to run a spool of armored/weatherized

And quite comical to turn corners (depending on how the disks are oriented of course).

But don't joke about things like this. This is exactly how e-cinema is currently arranged. As I recall, Lucasfilm delivered RAID arrays with the digital files of Star Wars Episode 1 to a cinema in Leicester Square, ready for digital projection. Star Wars is the first film to be shot entirely digitally - from the cameras right through to projection.
From scheinin at crs4.it Wed Nov 5 05:22:53 2003
From: scheinin at crs4.it (Alan Scheinine)
Date: Wed, 5 Nov 2003 11:22:53 +0100
Subject: Cluster Poll Results (tangent into OS choices)
Message-ID: <200311051022.hA5AMrw02523@dali.crs4.it>

Joe Landman wrote:
> There are interesting bits in debian. I am not sure it is necessarily
> the right choice for clusters due to the specific lack of commercial
> support for cluster specific items such as Myrinet, and the other high
> speed interconnects.

The above comment is just one of many that seem to me to describe the situation as being dependent on a commercial distribution whereas, in my experience, Red Hat was not sufficient. The most stable Red Hat for a long time was 7.3, but to solve a problem with the ext3 file system, I installed the most recent kernel from kernel.org. Moreover, from time to time I update gcc/g77/g++ from GNU. If I waited for Red Hat, I would be running out-of-date software for the compiler and the kernel, for a non-trivial period of time. Perhaps as a slogan I should write: it's not that we need a distribution as good as Red Hat, we need something even better.

I was motivated to write this when I read the reference to Myrinet. The gm driver needs to be compiled with a specific kernel. Using the most recent kernel (at the time I did the work, around the beginning of 2003) from kernel.org and using the most recent gcc from GNU, I built gm and MPICH-gm and it works fine.

With regard to the comment by Joe Landman, I assume he is referring to a cluster-specific distribution such as ROCKS, whereas my comments make reference to Red Hat.
My intention is to raise a general question: wouldn't any RedHat-like distribution be sufficient as the base such that one person could do the rest of the work needed to build and maintain a cluster? On our cluster we have MPICH_pgi, MPICH_intel, MPICH_gcc, MPICH_pgi_myrinet, MPICH_intel_myrinet and most of these also compiled for debugging. It was not fun to build but I want to give users a choice. Would I have the same flexibility automatically with a distribution oriented towards clusters?

Alan Scheinine

From john.hearns at clustervision.com Wed Nov 5 04:58:38 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Wed, 5 Nov 2003 10:58:38 +0100 (CET)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>
Message-ID:

On Tue, 4 Nov 2003, Eric Kuhnke wrote:

> So now you have a van full of RAID arrays... How do you load and unload
> it? It would be quite comical to run a spool of armored/weatherized

And quite comical to turn corners (depending of course on how the disks are oriented).

From mathog at mendel.bio.caltech.edu Wed Nov 5 11:59:29 2003
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 05 Nov 2003 08:59:29 -0800
Subject: A Tyan S2466 gotcha
Message-ID:

Has anybody else seen this?

Last week I was going nuts trying to figure out why one of our Tyan S2466 nodes was running slower than all the others. It was physically the same as the other nodes (1 GB memory, S2466 motherboard, single Athlon MP 2200+) yet CPU bound jobs ran about 25% slower on only this node.
Finally it crossed my mind that maybe the vendor had somehow or other stuck the wrong chip in it, and sure enough, /proc/cpuinfo showed that one node having an Athlon MP 1600+. Except it wasn't really. Simply rebooting the node changed /proc/cpuinfo back to Athlon MP 2200+.

Apparently under some circumstances this motherboard will throttle back from 266 MHz to 200 MHz, at which point it misreports the identity of the CPU. Asus mobos do something similar when they shut down funny (for instance on a power failure), but they stay at the slower "safe" setting until it is changed manually in the BIOS. This Tyan board bounces back up to the higher setting on the next reboot.

Anyway, the take home lesson seems to be that one should scan the /proc/cpuinfo on all nodes following a reboot to verify that all came up at the rated speed.

Is there some way to configure these nodes so that they cannot drop into the lower speed?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From rgb at phy.duke.edu Wed Nov 5 13:00:13 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 13:00:13 -0500 (EST)
Subject: A Tyan S2466 gotcha
In-Reply-To:
Message-ID:

On Wed, 5 Nov 2003, David Mathog wrote:

> Anyway, the take home lesson seems to be that one should
> scan the /proc/cpuinfo on all nodes following a reboot to
> verify that all came up at the rated speed.
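
For readers without a monitoring daemon, here is a minimal sketch of the post-reboot check David describes. It assumes the usual x86 Linux /proc/cpuinfo layout with `model name` and `cpu MHz` fields. The function names, the expected-model string, the 1700 MHz threshold and the sample text are all hypothetical, and collecting the file from every node (rsh, ssh, or whatever the cluster uses) is left out.

```python
# Invented sample of what a throttled node might report; not taken from the thread.
SAMPLE = """\
processor : 0
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) MP 1600+
cpu MHz : 1350.000

"""

def parse_cpuinfo(text):
    """Return one dict of fields per processor block in /proc/cpuinfo text."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus

def misreporting_cpus(text, expected_model, min_mhz):
    """List (index, model, MHz) for CPUs that do not match what was purchased."""
    bad = []
    for index, cpu in enumerate(parse_cpuinfo(text)):
        model = cpu.get("model name", "")
        mhz = float(cpu.get("cpu MHz", "0"))
        if expected_model not in model or mhz < min_mhz:
            bad.append((index, model, mhz))
    return bad

if __name__ == "__main__":
    for index, model, mhz in misreporting_cpus(SAMPLE, "MP 2200+", 1700.0):
        print(f"cpu{index}: {model} at {mhz} MHz looks throttled")
```

On a real node the text would come from reading /proc/cpuinfo rather than from the sample string.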
xmlsysd:

Content-Length: 728

AuthenticAMD 6 6 AMD Athlon(tm) MP 1900+ 1600.096 256
AuthenticAMD 6 6 AMD Athlon(tm) Processor 1600.096 256

plus wulfstat:

r00 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 21d:04h:56m:09s| 98
r01 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:30 pm| 15d:23h:17m:58s| 94
r02 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 15d:23h:17m:50s| 93
r03 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 15d:23h:17m:26s| 93
r04 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 21d:04h:55m:39s| 98
...

make it easy to scan a cluster for this particular problem -- all the rnodes are 2466's:-)

Did the clock drop on just ONE CPU or on both? xmlsysd provides both as you can see, but up to now I have only displayed the clock of the first one in wulfstat, as it never occurred to me that they might be different.

> Is there some way to configure these nodes so that
> they cannot drop into the lower speed?

What BIOS revision are you running? Most of the problems we've had with 2466's are related to running an older BIOS. It should be at least 4.03 I think to run fairly stably.

Although if this is a thermal throttling to avoid processor burnout, what it may be telling you is that this particular node has a bad CPU cooler or a ribbon cable somewhere that is partially obstructing airflow. The Tyan/Athlon combination >>really<< hates heat and responds to an excess with temper tantrums and worse. We've found that just having CPU-coolers that "work" but rattle a bit while working is enough to induce node failure under load. You might not WANT to override the BIOS action here, but rather tweak the node to run cooler.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From raysonlogin at yahoo.com Wed Nov 5 12:16:32 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Wed, 5 Nov 2003 09:16:32 -0800 (PST)
Subject: Fwd: [Mauiusers] SC 2003 Activities
Message-ID: <20031105171632.17590.qmail@web11401.mail.yahoo.com>

--- David Jackson wrote:
> News Brief
>
> Cluster Resources to show the latest advances in HPC Scheduling\Resource Management at Supercomputing 2003 with "Moab" releases.
>
> - Unveiling of "Moab Cluster Scheduler" - Next Generation Scheduler
> - Demonstration of "Moab Cluster Manager" - Graphical Mgmt Tool
> - Early Access Release of "Moab Grid Scheduler" (a.k.a. Silver)
> - Maui Consortium - Great Advancements, Great Benefits, Join the Team
> - BOF - Birds of a Feather Meeting: Nov 18th at Noon to 1:00 P.M.
> - Visit us in the Ames Lab, CHPC, and Indiana University Booths
>
> Nov. 04, 2003 - Cluster Resources is pleased to announce that demonstrations and overviews of our latest developments and plans will be highlighted at Supercomputing 2003 in Phoenix. Join us at a "BOF" meeting on the 18th at 12:00 to 1:00 P.M. in room 16-18, and view demonstrations and discuss the latest advancements in the booths of Ames Lab, CHPC and Indiana.
>
> Cluster Resources will be unveiling the advancements being made in the next generation Moab Cluster Scheduler as well as demonstrating the Open-alpha of our Moab Cluster Monitor/Manager product that makes HPC scheduling a significantly simpler task. Grid scheduling will also make a leap forward with the Early Access Release of Moab Grid Scheduler.
>
> Maui Consortium will unveil some of the latest advancements that have just completed as well as projects that are underway in the "BOF" meeting, and organizations that wish to be candidates for membership may either approach us or any other of the founding members in that meeting or in the above mentioned booths. This year benefits go beyond at-cost development projects to cover regular group training sessions, discounted support and usage of Early Access Release software. We extend our thanks to CHPC for sponsoring this year's Maui Consortium "BOF" and for Ames Lab, CHPC and Indiana Universities for their consideration, in providing access in their booths.
>
> About Cluster Resources, Inc.
> Cluster Resources, Inc. is an industry-leading provider of resource scheduling and management software for cluster and grid environments.
>
> Our vision is to provide tools and services that enable organizations to understand, control, and fully optimize compute resources, allowing organizations to realize the full potential of their compute resource investment in a way that maximizes the service delivered to the organization's most critical objectives. Copyright ©2001-2003 Cluster Resources, Inc. All Rights Reserved. For more information call (801) 873-3400 or visit www.clusterresources.com.
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://supercluster.org/mailman/listinfo/mauiusers

__________________________________
Do you Yahoo!?
Protect your identity with Yahoo!
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Nov 5 13:03:51 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Wed, 05 Nov 2003 13:03:51 -0500 Subject: Petabits/sec, and the like In-Reply-To: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> Message-ID: <1068055431.11118.33.camel@roughneck.liniac.upenn.edu> On Wed, 2003-11-05 at 12:40, Jim Lux wrote: > All of the enjoyable chat about achieving stupendous data rates with disk > drives in trucks is quite interesting. By the way, I don't know why you > insist on having the drives mounted in racks..why not just leave them in > their original shipping containers. There's also the concept of how many > bits are being moved in, say, a container load of Britney Spears DVDs. > (leaving aside questions about redundancy, information entropy, and whether > there is any information content in Britney Spears to begin with) > ..snipped... > For example.... old style 10Mbps thinnet ethernet used solid dielectric > coax, which had a propagation velocity of about 0.66 c. twisted pair is > probably around 0.75, fiber optics are a bit tricky, depending on the mode > of propagation, but probably around 0.85. The pickup truck full of disks > is about 1E-7. The units of the new measure would be, what, (bits per > second)*(meters per second) or bit meters per second squared. I'd normalize > by c, to make the units more useful..I'd modestly propose calling the new > unit the Lux, but it's already been used, so perhaps we should recognize > rgb's contributions by calling it the "Brown" 10Mbps over thinnet would > then be 6MegaBrowns. 100mbps over twisted pair would be 70MegaBrowns. 
The > 1 Pb/s truckload of disks would be 100MegaBrowns. > Couldn't resist -- 'What can Brown do for you' ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Nov 5 12:40:09 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 05 Nov 2003 09:40:09 -0800 Subject: Petabits/sec, and the like Message-ID: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> All of the enjoyable chat about achieving stupendous data rates with disk drives in trucks is quite interesting. By the way, I don't know why you insist on having the drives mounted in racks..why not just leave them in their original shipping containers? There's also the concept of how many bits are being moved in, say, a container load of Britney Spears DVDs. (leaving aside questions about redundancy, information entropy, and whether there is any information content in Britney Spears to begin with) But, on to a more practical aspect. It seems that a mere bits per second number isn't useful, because it doesn't embody some practically important things, like latency or transport time, both of which can be significant. This is of particular concern to me, because I'm used to having to deal with networks where the round trip light time is significant. So, I propose that an interesting single metric might be to scale the bit rate by the latency with which the bits appear at the other end of the pipe. As illustrious an early high performance computing pioneer as Seymour Cray recognized that this could be significant when you're looking at pumping lots of bits real fast. And, there's a handy yardstick to measure by (issues of quantum entanglement and photon twinning aside), the in vacuo speed of light. For example.... 
old style 10Mbps thinnet ethernet used solid dielectric coax, which had a propagation velocity of about 0.66 c. twisted pair is probably around 0.75, fiber optics are a bit tricky, depending on the mode of propagation, but probably around 0.85. The pickup truck full of disks is about 1E-7. The units of the new measure would be, what, (bits per second)*(meters per second) or bit meters per second squared. I'd normalize by c, to make the units more useful..I'd modestly propose calling the new unit the Lux, but it's already been used, so perhaps we should recognize rgb's contributions by calling it the "Brown" 10Mbps over thinnet would then be 6MegaBrowns. 100mbps over twisted pair would be 70MegaBrowns. The 1 Pb/s truckload of disks would be 100MegaBrowns. This is clearly the "raw pipe speed" too... not taking into account the headers and any coding that's going on. The disk drive pipe hides all the coding and sector headers, so the measurement is a real data transfer throughput. The Ethernet scheme on the other hand, is just the signalling rate, and there is some significant non-zero overhead. One might also ask whether physical size of the system being communicated within should be factored in (say, when talking about bisection bandwidth). Clearly, a cluster with a physical dimension of 100meters is going to be slower than one with a physical dimension of 1 meter, all other things (processor speed, comm speed, etc.) being equal. One has to also consider the bandwidth of the entrance and exit to the pipe... merely having the capability to transport Tb of disk drives rapidly doesn't mean that you can put data onto those disks at a Pb/s and get it off at the other end of the shipping channel. This is where those "use free air as a communication medium" schemes get into trouble. 
Sure, the optical bandwidth of air (or optical fiber) is pretty darn wide (on the order of 0.5 PetaHertz (a unit I never thought I'd ever use) for just the visible spectrum) but the modulation and demodulation might prove to be a problem. There's also the issue of real computing efficiency.. speed is not everything in some applications... some applications might optimize for calculations per Dollar/Euro or calculations/Joule. Coming up with a metric for the calculation is a bit tricky. The calculations could be viewed as extracting information bits from a redundant data set (a coding/decoding process), or as creating new information (although, hmmm... this gets a bit metaphysical) I leave the selection of appropriate units and names to the community. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 5 13:39:54 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Nov 2003 13:39:54 -0500 (EST) Subject: Petabits/sec, and the like In-Reply-To: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> Message-ID: On Wed, 5 Nov 2003, Jim Lux wrote: > by c, to make the units more useful..I'd modestly propose calling the new > unit the Lux, but it's already been used, so perhaps we should recognize > rgb's contributions by calling it the "Brown" 10Mbps over thinnet would > then be 6MegaBrowns. 100mbps over twisted pair would be 70MegaBrowns. The > 1 Pb/s truckload of disks would be 100MegaBrowns. You are clearly an evil man, and children and pets probably cross the street to avoid you. For the love of God, don't name a unit the Brown. Megabrowns. Sheeesh. > This is clearly the "raw pipe speed" too... 
not taking into account the > headers and any coding that's going on. The disk drive pipe hides all the > coding and sector headers, so the measurement is a real data transfer > throughput. The Ethernet scheme on the other hand, is just the signalling > rate, and there is some significant non-zero overhead. > > One might also ask whether physical size of the system being communicated > within should be factored in (say, when talking about bisection > bandwidth). Clearly, a cluster with a physical dimension of 100meters is > going to be slower than one with a physical dimension of 1 meter, all other > things (processor speed, comm speed, etc.) being equal. > > One has to also consider the bandwidth of the entrance and exit to the > pipe... merely having the capability to transport Tb of disk drives rapidly > doesn't mean that you can put data onto those disks at a Pb/s and get it > off at the other end of the shipping channel. This is where those "use > free air as a communication medium" schemes get into trouble. Sure, the > optical bandwidth of air (or optical fiber) is pretty darn wide (on the > order of 0.5 PetaHertz (a unit I never thought I'd ever use) for just the > visible spectrum) but the modulation and demodulation might prove to be a > problem. > > > There's also the issue of real computing efficiency.. speed is not > everything in some applications... some applications might optimize for > calculations per Dollar/Euro or calculations/Joule. Coming up with a > metric for the calculation is a bit tricky. The calculations could be > viewed as extracting information bits from a redundant data set (a > coding/decoding process), or as creating new information (although, hmmm... > this gets a bit metaphysical) > > I leave the selection of appropriate units and names to the community. By the time you add dollars to the problem those truckfulls of disks look pretty damn good, actually. 
Which one is cheaper: Building an optical fiber network capable of distributing the kinds of datasets they accumulate at the big accelerator labs to the participating Universities (often on the other side of the country) with enough bandwidth to be useful, or cross-shipping a RAID that gets refilled and emptied at the ends? Consider a metaphor: Fermilab is a river of data. People at Duke are thirsty, but they can only drink just so much just so fast. It is very likely much cheaper to just ship Duke an occasional truckful of bottled water -- I mean data -- than to build a cross-country pipeline just to put a high capacity spigot in a single room. It is also useful to consider how long it takes to FILL a terabyte RAID. Even at (say) 100 MB/sec it is still 10,000 seconds, or about three hours. A petabyte would require 3000 hours (admittedly potentially in parallel). That would be a goodly chunk of a year. By the time bottlenecks like this are considered, the time and cost of overnight shipping a containerized PB across the country are relatively insignificant. Interesting transformations between time and spatial dimensions are involved in all of this: wire/fiber carrier frequency, wire/fiber bundle density and multiplexing/termination costs plus the cost of the wire/fiber itself vs achieving a very high spatial information density using a storage VOLUME and moving the space, with THOSE associated costs. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tony at mpi-softtech.com Wed Nov 5 13:22:04 2003 From: tony at mpi-softtech.com (Anthony Skjellum) Date: Wed, 5 Nov 2003 12:22:04 -0600 Subject: IB vs Myrinet References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <20031104103747.GA836@sphere.math.ucdavis.edu> Message-ID: <000a01c3a3cd$01575670$a900a8c0@cis.uab.edu> Roger, FYI, we have had only trivial issues supporting Opteron-based IB; for instance, things worked in a matter of hours with Racksaver-based Opteron dual 1U boxes with Mellanox cards and VAPI drivers. This is with a SuSE/UNITED Linux type build that Racksaver generally ships (nothing specialized). In polling mode, we had about 4.5us latency from the Mellanox A1 cards with VAPI out of the box from MPI/Pro (about 4-5 months ago). After that, we ran ChaMPIon/Pro on some larger cluster configs with dual Opteron too, all very straightforward. -Tony ----- Original Message ----- From: "Roger L. Smith" To: "Bill Broadley" Cc: "Joey Sims" ; Sent: Tuesday, November 04, 2003 8:34 AM Subject: Re: IB vs Myrinet > On Tue, 4 Nov 2003, Bill Broadley wrote: > > > On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > > > > > IB is about to find major traction in this industry and Myricom will > not > > > > I've heard this statement for 2 years running, not that it couldn't > > become true. > > Just look at all of the recent press releases for IB clusters being built. > The hardware is finally actually available, and a lot of HPC clusters are > starting to be built with it. > > In the spirit of full disclosure, I have three engineers on-site today > from an IB vendor working with me to install a 192 node diskless IB > cluster. 
> > > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > > > interconnect technology ratcheted up way higher than 10GB. > > > > Roadmaps are great, easy, and cheap. I'm most interested in > > what I can build a cluster with today. > > > > Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what > > conditions? Where can I download a linux compatible driver? Linux > > compatible MPI implementation? Linux/AMD64 drivers/MPI implementations? > > It's 10 gigabits per second (theoretical). Linux drivers are available > from all of the vendors. Certain vendors (including the one I purchased > my IB from) provide open-source drivers. > > There are a few MPI implementations, there are commercial versions MPI/Pro > and ChaMPIon from MPI Software Technology, Inc. MVAPICH is available from > OSC, and I'm hearing that there may be a version of LAM in the near > future. > > I'm not sure of the status of the AMD64 drivers, although I know of at > least one AMD64 cluster currently being built with IB, so at least some > level of support exists. > > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ > | Roger L. Smith Phone: 662-325-3625 | > | Sr. 
Systems Administrator FAX: 662-325-7692 | > | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | > | Mississippi State University | > |____________________________________ERC__________________________________| > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From konstantin_kudin at yahoo.com Wed Nov 5 17:54:05 2003 From: konstantin_kudin at yahoo.com (Konstantin Kudin) Date: Wed, 5 Nov 2003 14:54:05 -0800 (PST) Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux Message-ID: <20031105225405.60387.qmail@web21205.mail.yahoo.com> Could anyone please share experiences with these boards under Linux? Is it still a risky proposition at this time? It seems like there are drivers for the AMD-8111/8131/8151 chipset on the AMD page, and drivers for the Broadcom network chip in other places. Any feedback on SATA support for the Silicon Image Sil3114 SATA RAID Accelerator and on SATA support in general? Any other caveats? Thanks in advance for any help! Konstantin __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Nov 5 16:41:49 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 05 Nov 2003 13:41:49 -0800 Subject: Petabits/sec, and the like In-Reply-To: References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6> For those interested in long distance optical transmission of research data, Bill St. Arnaud of CANARIE/CAnet4 posts many interesting things to the CANET-NEWS list: http://morris.canarie.ca/MLISTS/news2003/index.html Quoting from a post to the list: "Another example is the Canadian Virtual Observatory that will require to transfer data files and stream instrumentation data of over half a terabyte a day (!!) from facilities to Hawaii, France and UK http://www.risq.qc.ca/risq2003-canw2003/en/conferenciers/_sgaudet.html " >Consider a metaphor: Fermilab is a river of data. People at Duke are >thirsty, but they can only drink just so much just so fast. It is very >likely much cheaper to just ship Duke an occasional truckfull of bottled >water -- I mean data -- than to build a crosscountry pipeline just to >put a high capacity spigot in a single room. > >It is also useful to consider how long it takes to FILL a terabyte RAID. >Even at (say) 100 MB/sec it is still 10,000 seconds, or about three >hours. A petabyte would require 3000 hours (admittedly potentially in >parallel). That would be a goodly chunk of a year. By the time >bottlenecks like this are considered, the time and cost of overnight >shipping a containerized PB across the country are relatively >insignicant. 
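As a rough check on the fill-time arithmetic quoted above, here is a minimal Python sketch. It is only a back-of-the-envelope illustration: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a single writer sustaining 100 MB/sec, and no filesystem overhead. The function names and the 24-hour door-to-door shipping time are hypothetical, chosen for illustration rather than taken from the thread.

```python
# Back-of-the-envelope fill times for a RAID at a sustained write rate.
# Assumptions (not from the thread): decimal units, one serial writer,
# and no filesystem or protocol overhead.

TB = 10**12  # bytes
PB = 10**15  # bytes
MB = 10**6   # bytes


def fill_time_hours(capacity_bytes, rate_bytes_per_sec):
    """Hours needed to write capacity_bytes at rate_bytes_per_sec."""
    return capacity_bytes / rate_bytes_per_sec / 3600.0


def effective_rate_mb_per_sec(capacity_bytes, transit_hours):
    """Average MB/sec of a shipment that takes transit_hours door to door."""
    return capacity_bytes / (transit_hours * 3600.0) / MB


if __name__ == "__main__":
    rate = 100 * MB
    print("1 TB at 100 MB/sec: %.1f hours" % fill_time_hours(TB, rate))
    print("1 PB at 100 MB/sec: %.0f hours" % fill_time_hours(PB, rate))
    # Hypothetical 24-hour overnight shipment of a full petabyte.
    print("1 PB shipped in 24 h: %.0f MB/sec" % effective_rate_mb_per_sec(PB, 24))
```

Under these assumptions the terabyte takes a little under three hours and the petabyte a little under 3000 hours when written serially, which is consistent with the rounded figures in the quoted text. The last line only shows that the transit leg is a small part of the total once filling and emptying the disks are counted.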
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 5 17:23:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 6 Nov 2003 09:23:39 +1100 Subject: Petabits/sec, and the like In-Reply-To: <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6> References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6> Message-ID: <200311060923.40670.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 6 Nov 2003 08:41 am, Eric Kuhnke wrote: > Quoting from a post to the list: > "Another example is the Canadian Virtual Observatory that will require to > transfer data files and stream instrumentation data of over half a terabyte > a day (!!) from facilities to Hawaii, France and UK There are higher rates on the horizon; for instance, the LOFAR (Low Frequency Array) telescope that's proposed will reportedly be delivering multi-terabits a second from each detector, which will need to be processed on site. Part of a CSIRO webpage on the project, if it were to be located in Western Australia, says: http://www.atnf.csiro.au/projects/ska/general/lofar.html [quote] Specific technologies that would be developed for LOFAR in WA include: * The construction of a 6 terabit/second optic-fibre link from the heart of inland WA to coastal Geraldton. This is a higher data-rate than systems in general use today. LOFAR would therefore represent a non-commercial test-bed for developing technologies. 
[/quote] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/qXhrO2KABBYQAh8RAp0AAKCD2ccIYF4psvK78skXKd58Twg0rwCeMjCZ f+Shi+yhowKtXPpRI9agGHY= =K4rn -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 5 17:46:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Nov 2003 17:46:29 -0500 (EST) Subject: Petabits/sec, and the like In-Reply-To: <200311060923.40670.csamuel@vpac.org> Message-ID: On Thu, 6 Nov 2003, Chris Samuel wrote: > There are higher rates on the horizon, for instance the LOFAR (Low Frequency > Array) telescope that's proposed will reportedly be delivering > multi-terrabits a second from each detector, which will need to be processed > on site. > > Part of a CSIRO webpage on the project, if it were to be located in Western > Australia, says: > > http://www.atnf.csiro.au/projects/ska/general/lofar.html > > [quote] > > Specific technologies that would be developed for LOFAR in WA include: > > * The construction of a 6 terabit/second optic-fibre link from the heart of > inland WA to coastal Geraldton. This is a higher data-rate than systems in > general use today. LOFAR would therefore represent a non-commercial test-bed > for developing technologies. > > [/quote] Ah, did I mention that building this sort of thing is a very important jobs program for struggling telcom industries;-)? That should be very exciting. Very, very expensive, but very exciting. Just mux/demuxing the data should be "interesting", as should finding someplace to put it as it comes through. 
Sort of like catching the aforementioned metaphorical river as it flows and splitting it into lots of small pipes that go into bottles that just exactly fill all without spilling a drop. Unless it is all about peak transmission times for small bundles of data, that is... rgb > > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQE/qXhrO2KABBYQAh8RAp0AAKCD2ccIYF4psvK78skXKd58Twg0rwCeMjCZ > f+Shi+yhowKtXPpRI9agGHY= > =K4rn > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 5 17:52:17 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Nov 2003 17:52:17 -0500 (EST) Subject: Petabits/sec, and the like In-Reply-To: <200311060923.40670.csamuel@vpac.org> Message-ID: On Thu, 6 Nov 2003, Chris Samuel wrote: > Part of a CSIRO webpage on the project, if it were to be located in Western > Australia, says: > > http://www.atnf.csiro.au/projects/ska/general/lofar.html > > [quote] > > Specific technologies that would be developed for LOFAR in WA include: > > * The construction of a 6 terabit/second optic-fibre link from the heart of > inland WA to coastal Geraldton. This is a higher data-rate than systems in > general use today. 
LOFAR would therefore represent a non-commercial test-bed > for developing technologies. On a second thought, I suspect that the OF link is going to be delivering real time analog data. This is very similar to a plan for an even bigger radiotelescope that I've had for years -- one that spans a continent, or even continents. The key to making a radiotelescope is being able to deliver realtime traces of the received signals with very precise time/phase delay information to a centralized location where the traces can be recombined and used to create an interference projection of the sky. Perhaps they're going to digitize the signal(s) first, but I don't see why they would offhand. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Nov 5 19:15:38 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 05 Nov 2003 16:15:38 -0800 Subject: Petabits/sec, and the like In-Reply-To: References: <200311060923.40670.csamuel@vpac.org> Message-ID: <5.2.0.9.2.20031105160600.02b003c0@216.82.101.6> Re: Australian terabit project The budget for routers alone will be astronomical... Loading up a Juniper T640 with OC-192 PICs isn't cheap! To the best of my knowledge there are submarine cables commercially available from Alcatel, Fujitsu/Siemens and KDDI with capacities in the 640Gb range. This requires not only multiple OC-192 capable routers, but vastly expensive DWDM terminal equipment at each end to insert multiple lambdas in fiber pairs. 
20 singlemode fibre Rx/Tx pairs * 20 DWDM wavelengths per fiber pair * real-world 9.2Gb/s per DWDM wavelength = one immense problem processing/receiving at the destination of your data stream. Currently, with the telecom capacity glut, the highest capacity single-purpose cables laid in 2001 such as 360Atlantic (Boston USA to UK) and Tyco Transatlantic are operating at 80Gb or less. As a price reference, the FLAG group (recently purchased by Reliance Telecom) spent something like $1.6 to $2.0 billion to build a worldwide network four years ago with 80Gb capacity. >That should be very exciting. Very, very expensive, but very exciting. >Just mux/demuxing the data should be "interesting", as should finding >someplace to put it as it comes through. Sort of like catching the >aforementioned metaphorical river as it flows and splitting it into lots >of small pipes that go into bottles that _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 5 17:57:07 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 6 Nov 2003 09:57:07 +1100 Subject: Petabits/sec, and the like In-Reply-To: References: Message-ID: <200311060957.10972.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 6 Nov 2003 05:39 am, Robert G. Brown wrote: > By the time you add dollars to the problem those truckfulls of disks > look pretty damn good, actually. To take an example, I know of a local group doing work at CERN. Apparently they are participating in experiments that generate around 1TB a day of data (no idea if that's compressed/uncompressed or how compressible it would be). At the standard AARNET rate for international traffic of AU$22.50 per gig that's over $22,000 dollars a day if you wanted to pull that back over the 'net (assuming sufficient bandwidth to be able to do that). 
It then becomes obvious why they choose to fly it back with them. :-) - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/qYBDO2KABBYQAh8RAnfvAJ4rAXhYoKnZmJiNOt6UhO8Jq5EZGwCdE1TG OsIsfqAvJ+3setXpVCA8v8A= =6mur -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 5 18:00:15 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 6 Nov 2003 10:00:15 +1100 Subject: Petabits/sec, and the like In-Reply-To: References: Message-ID: <200311061000.16897.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 6 Nov 2003 09:52 am, Robert G. Brown wrote: > On a second thought, I suspect that the OF link is going to be > delivering real time analog data. This is very similar to a plan for an > even bigger radiotelescope that I've had for years -- one that spans a > continent, or even continents. The key to making a radiotelescope is > being able to deliver realtime traces of the received signals with very > precise time/phase delay information to a centralized location where the > traces can be recombined and used to create an interference projection > of the sky. LOFAR is an interferometer in its own right, and it'll be the only 'scope going down to those frequencies (AFAIK) and so there won't be anything else to combine it with. 
:-) Website: http://www.lofar.org/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/qYD/O2KABBYQAh8RAv6qAJ9z5FxCcxMDoA+F8mWqcyaf6y772wCfTQta IGEZ3BAmH3fnCMppwoCur2I= =YWrJ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 5 19:18:26 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Nov 2003 19:18:26 -0500 (EST) Subject: Petabits/sec, and the like In-Reply-To: <200311061000.16897.csamuel@vpac.org> Message-ID: On Thu, 6 Nov 2003, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 6 Nov 2003 09:52 am, Robert G. Brown wrote: > > > On a second thought, I suspect that the OF link is going to be > > delivering real time analog data. This is very similar to a plan for an > > even bigger radiotelescope that I've had for years -- one that spans a > > continent, or even continents. The key to making a radiotelescope is > > being able to deliver realtime traces of the received signals with very > > precise time/phase delay information to a centralized location where the > > traces can be recombined and used to create an interference projection > > of the sky. > > LOFAR is an interferometer in its own right, and it'll be the only 'scope > going down to those frequencies (AFAIK) and so there won't be anything else > to combine it with. :-) I meant the "outlying stations" that give it a large baseline for high resolution. My idea has (for years now) been to transform all the cell phone towers in a country into a gigantic radiotelescope. 
You lose the single receiver directionality that LOFAR has with a tight array of parabolic receivers, but it is potentially SO cheap and there are SO many stations with SUCH a large baseline that overall brightness and resolution should be quite satisfactory. The north american continent, for example, would have an aperture of what, roughly 5000 km with towers strung in irregular networked distributions -- every few km along major highways and in dense clusters near cities and increasingly near even small highways and small towns. There must be tens to hundreds of thousands of towers by now, with interference brightening of 10^8 or so along the selected direction. I actually have a student working on this idea to a limited extent at this very moment -- sort of a preliminary feasibility study. In fundamental terms this means determining if the cell tower owners are willing to permit a dual public use of their receivers (which should be passive and utterly irrelevant to their function as cell phone antennae). Otherwise, I expect that all of the towers have fiber to them already; it is just a matter of piggybacking...;-) Using GPS and/or atomic clocks to establish a precise time base, local PC's should be able to record a generalized radio trace at a particular frequency. The same GPS can be used to precisely locate the towers. With a precise physical map of the receivers and a precise signal against a common time base, recombining the signals with various delays to assemble an image is then straightforward. In fact, with the time base, one could even do (I think) Hanbury-Brown-Twiss correlation studies, which I imagine is also a goal of LOFAR via its outlier stations although I haven't read far enough to find out. In this context I don't know whether or not the traces from the individual towers would be best sent digitized or not. In the LOFAR context they probably are. 
Alas, I'm a theorist and so I'm not sufficiently familiar with the hardware requirements one has to work with to capture, save, send, and ultimately recombine the signals, although I can visualize the math easily enough. I'll see if I can get my student to join the LOFAR discussion group. I think he's a bit behind on this anyway, with all the work he has this semester. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Nov 6 04:23:50 2003 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 6 Nov 2003 10:23:50 +0100 (CET) Subject: Petabits/sec, and the like In-Reply-To: <200311060957.10972.csamuel@vpac.org> Message-ID: On Thu, 6 Nov 2003, Chris Samuel wrote: > To take an example, I know of a local group doing work at CERN. Apparently > they are participating in experiments that generate around 1TB a day of data > (no idea if that's compressed/uncompressed or how compressible it would be). > I THINK (though don't quote me) that this is the raw data rate. What happens in an HEP experiment is that raw data comes from the detector. It is passed through three levels of trigger processors, from a very simple (are 1st level inside the detector at LHC???) to a third level, which is run on PCs. I guess this 1TB rate is the raw event rate after the level 3 trigger. The data is then sent to a reconstruction farm, where the raw levels are combined into tracks and energy deposits, using the physical data and calibrations of the detector. The physicists then work on the resulting DST - data summary tape, which is much less data than the raw data. 
I'm not sure of the plans for processing raw data at LHC - maybe all is processed at the main site, maybe some is shipped off to the Tier 1 centres. I really don't know the answer here.

From djholm at fnal.gov Thu Nov 6 07:39:13 2003
From: djholm at fnal.gov (Don Holmgren)
Date: Thu, 06 Nov 2003 06:39:13 -0600
Subject: Petabits/sec, and the like
In-Reply-To:
References:
Message-ID:

On Thu, 6 Nov 2003, John Hearns wrote:

> On Thu, 6 Nov 2003, Chris Samuel wrote:
>
> > To take an example, I know of a local group doing work at CERN. Apparently
> > they are participating in experiments that generate around 1TB a day of data
> > (no idea if that's compressed/uncompressed or how compressible it would be).
> >
> I THINK (though don't quote me) that this is the raw data rate.
> What happens in an HEP experiment is that raw data comes from the
> detector.
> It is passed through three levels of trigger processors, from
> a very simple (are 1st level inside the detector at LHC???) to a third
> level, which is run on PCs.
>
> I guess this 1TB rate is the raw event rate after the level 3 trigger.
> The data is then sent to a reconstruction farm, where the raw levels
> are combined into tracks and energy deposits, using the physical
> data and calibrations of the detector.
>
> The physicists then work on the resulting DST - data summary tape,
> which is much less data than the raw data.
>
> I'm not sure of the plans for processing raw data at LHC -
> maybe all is processed at the main site, maybe some is shipped off
> to the Tier 1 centres. I really don't know the answer here.
>

I was part of the team that implemented the level 3 trigger at the CDF experiment at FNAL.
The order of magnitude data rate out of the detector is 1 TByte/sec - collisions at O(1 MHz), O(1 million) data channels, O(1 byte/channel). That rate gets reduced through Level 1, 2, and 3 triggers. The level 3 trigger Linux computers do event building (assembling full events from event fragments sent via an ATM switch) and reconstruction (full events distributed via fast ethernet, data "inverted" to produce particle tracks and energies).

Here were the specifications we worked from in 1997 for L3:

  - event rate into L3: 250 to 1000 Hz
  - event size: 250 KB avg
  - accept rate: 72 Hz

The accept rate translates into 18 MB/sec, written to mass storage.

At this 18 MB/sec (set by the tape budget, BTW), CDF currently writes ~ 1.5 TB/day to tape. The D0 experiment at Fermilab is writing a similar amount. On typical days, the Fermilab mass storage system moves tens of TB/day - I think the record is something like 35 TB/day.

I'm not sure of LHC design numbers, but believe they are more like 1 GB/sec to storage.

Don Holmgren
Fermilab

From john.hearns at clustervision.com Thu Nov 6 07:55:04 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Thu, 6 Nov 2003 13:55:04 +0100 (CET)
Subject: Petabits/sec, and the like
In-Reply-To:
Message-ID:

On Thu, 6 Nov 2003, Don Holmgren wrote:

>
>   - event rate into L3: 250 to 1000 Hz
>   - event size: 250 KB avg
>   - accept rate: 72 Hz
>
> The accept rate translates into 18 MB/sec, written to mass storage.

Wow.
(I was part of a LEP experiment, and we of course had much less data to contend with. I still remember though in the days of low capacity leased lines that I ftp'd the first LEP event back to Glasgow Uni.
I was soundly rapped over the knuckles for tying up the whole line)

>
>
> I'm not sure of LHC design numbers, but believe they are more like 1
> GB/sec to storage.

We are in the wrong game. Time to buy shares in a tape manufacturer like 3M.

From rgb at phy.duke.edu Thu Nov 6 08:02:06 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 6 Nov 2003 08:02:06 -0500 (EST)
Subject: Petabits/sec, and the like
In-Reply-To:
Message-ID:

On Thu, 6 Nov 2003, John Hearns wrote:

> I'm not sure of the plans for processing raw data at LHC -
> maybe all is processed at the main site, maybe some is shipped off
> to the Tier 1 centres. I really don't know the answer here.

We have some people who work at CERN and Fermilab, and they do indeed talk about needing very, very fat pipes, big big disk, and the biggest problem -- backup to match. Or if you prefer, viewing tape as a reasonably compact high-data-density transport medium, these labs have shipped tapes around from time immemorial -- we just happen to live in a time where tape densities have been most unfortunately bypassed by hard media in both data density in one direction and cost in the other. My feeling is that the labs are still in the process of reacting to this and reengineering the data transport problem, balanced between wildly varying costs and ease of use for different alternatives, political pressure (I wasn't kidding about the jobs program for telcoms part -- lots of politicians would LOVE to see billion dollar programs for fat pipes funded), and the actual facility/infrastructure realities at both ends.
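As a rough sanity check on the figures quoted earlier in this thread (a 72 Hz accept rate, 250 KB average events, about 1.5 TB/day to tape, and a guessed 1 GB/sec for LHC), the arithmetic can be written out as below. Decimal units are assumed throughout, and the fully utilized 1 Gbit/s link is a hypothetical added only for comparison; neither assumption comes from the posters.

```python
KB, MB, TB = 10**3, 10**6, 10**12  # decimal units (an assumption)
SECONDS_PER_DAY = 86_400


def storage_rate(accept_hz, event_bytes):
    """Bytes per second written out for a given accept rate and event size."""
    return accept_hz * event_bytes


def per_day(bytes_per_sec):
    """Bytes accumulated over 24 hours at a constant rate."""
    return bytes_per_sec * SECONDS_PER_DAY


# O(1 MHz) collisions x O(1 million) channels x O(1 byte/channel)
detector_rate = 10**6 * 10**6 * 1

cdf_rate = storage_rate(72, 250 * KB)   # Level 3 accept rate x event size
cdf_daily = per_day(cdf_rate)

lhc_daily = per_day(10**9)              # the "more like 1 GB/sec" guess
gigabit_daily = per_day(10**9 // 8)     # hypothetical saturated 1 Gbit/s link

if __name__ == "__main__":
    print(f"detector: {detector_rate / TB:.1f} TB/s")
    print(f"CDF to storage: {cdf_rate / MB:.0f} MB/s, {cdf_daily / TB:.2f} TB/day")
    print(f"LHC guess: {lhc_daily / TB:.1f} TB/day")
    print(f"1 Gbit/s link: {gigabit_daily / TB:.1f} TB/day")
```

On these assumptions a single saturated gigabit link could carry the CDF output several times over, but falls well short of the LHC guess, which fits the talk of needing much fatter pipes.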
Duke, for example, is on one of the experimental gigabit networks, which (at least when the project was started) was pretty bleeding edge, but this is still only 0.1% of a terabps, connectivity is far from uniform across the net, the pipe is shared by many users (and it isn't just the HEP community that makes fat data sets -- medical centers like to ship around images of their own:-). At the one or two meetings I've sat in on with these guys (discussing beowulfery and data transport) its like they look at the primary campus feed and kind of shrug their shoulders and ask if they can get a few of those for themselves -- one isn't enough. I personally think that there are always going to be bleeding edge consumers of advances on any of the primary computing/data processing bottlenecks. Even with terabyte RAIDS (a number that would have been unthinkably expensive just five years ago that I could now build for myself upstairs using leftover development account money, if I had the slightest use for a TB:-) some people are blocked by too little disk. LOTS of people ride the Moore's Law curves on raw CPU and memory (size and speed both). Others pray for networks that could carry orders of magnitude more than a "mere" Gbps. Most of us on this list likely wish for whole combinations of the above -- a desktop RAID holding a petabyte of data backed up to a holographic optical crystal, 100x faster CPUs with 1000x larger and faster memory (to get memory speed closer to CPU speed) fed by networks with 1000x the bw and 1/1000th the latency (c'mon, admit it, network latency on the order of a nanosecond would be lovely. Too bad about that pesky speed of light thing...:-). And while we're messing with that holographic crystal in our imaginations, let's just make everything optical and built on top of nanoscale devices, shall we? One thing of great beauty is that Moore's Law makes it quite likely that at least some of this "insane" wish list will come true over the next decade. 
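To put a number on the speed-of-light remark, here is a minimal sketch of how far a signal can get in a given latency budget. The velocity factor of 0.67 for fiber is an assumed typical value, not something from this thread, and the 5000 km figure is borrowed from the continental aperture estimate earlier in this digest.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def one_way_reach_m(latency_s, velocity_factor=1.0):
    """Greatest distance a signal can cover in latency_s seconds."""
    return C * velocity_factor * latency_s


def min_latency_s(distance_m, velocity_factor=1.0):
    """Smallest possible one-way latency over distance_m metres."""
    return distance_m / (C * velocity_factor)


if __name__ == "__main__":
    ns = 1e-9
    print(f"1 ns in vacuum: {one_way_reach_m(ns) * 100:.1f} cm")
    print(f"1 ns in fiber (assumed factor 0.67): {one_way_reach_m(ns, 0.67) * 100:.1f} cm")
    print(f"5000 km in vacuum: {min_latency_s(5_000_000) * 1e3:.1f} ms")
```

A round trip halves the reach again, so a nanosecond request/response only works across roughly a hand's width of fiber.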
Not the ns-latency network though...at least if you want to talk to things more than a few cm away.

[Although hey, if y'all think one can violate causality AND TRANSMIT MESSAGES by "twisting" one of a pair of correlated photons, a little thing like ns latency networks across the entire continent becomes straightforward, right? Cannot use non-relativistic Schrodinger equations or even concepts to describe relativistic field propagation, grumble... no such thing as "wavefunction collapse", grumble, not time-reversal invariant, grumble, violates causal propagation of field UNLESS one looks at advanced field and Wheeler-Feynman and Dirac which do not permit separation of local field interaction of eventual absorber/measurement device from system even "back" at emission event on same light cone, grumble. Having a grumbly day. Stayed up too late working on something for a slave-driv... I mean "friend" of mine on this very list...Grumble;-)]

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

From ZukaitAJ at nv.doe.gov Thu Nov 6 11:55:20 2003
From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony)
Date: Thu, 6 Nov 2003 08:55:20 -0800
Subject: Scyld and MPICH.
Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov>

I am having a problem with MPI_Reduce and I believe that it is a buffer size error. Is there a way to calculate the maximum size of the buffer and what is the maximum size of the buffer allowed? It does not seem to be linear with the number of processors.
From mike.sullivan at alltec.com Thu Nov 6 13:39:51 2003
From: mike.sullivan at alltec.com (Mike Sullivan)
Date: Thu, 06 Nov 2003 13:39:51 -0500
Subject: Tyan 2880 and 2885
Message-ID: <3FAA9577.2040001@alltec.com>

>Could anyone please share experiences with these
>boards under linux? Is it still a risky proposition at
>this time?

I have used the 2880 under RedHat AS 2.1 and gingin64 and it works fine except for the SATA controller. I did not get the Promise chip to work but did not spend a lot of time on it. The GigE interface works. The board was stable and I have been using them in NAS devices with 3ware cards. The SMDC option for these units works fairly well with the most recent console and you can get sensor data.

> It seems like there are drivers for AMD-8111/8131/8151
>chipset on the AMD page, drivers for the Broadcom
>network chip in other places. Any feedback on SATA
>support for the Silicon Image Sil3114 SATA RAID
>Accelerator and on SATA support in general? Any other
>caveats?

I also have both a 2882 and 2885 that I will be testing early next week with Suse Linux 9 for AMD64 and will post my findings.

Thanks in advance for any help!
Konstantin -- Mike Sullivan Director Performance Computing @lliance Technologies, Voice: (416) 385-3255 x 228, 18 Wynford Dr, Suite 407 Fax: (416) 385-1774 Toronto, ON, Canada, M3C-3S2 Toll Free:1-877-216-3199 http://www.alltec.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Nov 6 13:37:59 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 06 Nov 2003 10:37:59 -0800 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: <20031105225405.60387.qmail@web21205.mail.yahoo.com> References: <20031105225405.60387.qmail@web21205.mail.yahoo.com> Message-ID: <3FAA9507.2000508@cert.ucr.edu> Konstantin Kudin wrote: > Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? > > We have a few of the s2880's. They were real problematic at first in that they'd constantly crash. But it turned out that when I downgraded the bios, all of our problems went away. Of course I also needed to install the latest 2.4.22 kernel before the machines would boot with the older bios installed. I'm not sure what to tell you about the serial ata support, as I've never played with it. Linux seems to support the nic just fine though. Hope that helps, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Thu Nov 6 12:48:24 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Thu, 06 Nov 2003 11:48:24 -0600 Subject: Scyld and MPICH. 
In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov>
References: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov>
Message-ID: <6.0.0.22.2.20031106114611.039168b8@localhost>

At 10:55 AM 11/6/2003, Zukaitis, Anthony wrote:
>I am having a problem with MPI_reduce and I believe that it is a buffer size
>error. Is there a way to calculate the maximum size of the buffer and what
>is the maximum size of the buffer allowed? It does not seem to be linear
>with the number of processors.

There should be no maximum buffer size, though the ch_p4 device does impose a limit when shared memory is used to transfer a message. Do you have an example program that we could test? (Bug reports for MPICH should be sent to mpi-maint at mcs.anl.gov.)

Bill

From tod at gust.sr.unh.edu Thu Nov 6 15:02:25 2003
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: 06 Nov 2003 15:02:25 -0500
Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config
Message-ID: <1068148945.24918.28.camel@haze.sr.unh.edu>

http://www.theregister.co.uk/content/3/33791.html

It also mentions the ClearSpeed chip that was discussed here recently.
From aasmund at simula.no Thu Nov 6 17:52:28 2003
From: aasmund at simula.no (Åsmund Ødegård)
Date: Thu, 06 Nov 2003 23:52:28 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <20031105000513.GA2101@galactic.demon.co.uk>
References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> <20031105000513.GA2101@galactic.demon.co.uk>
Message-ID:

On Wed, 5 Nov 2003 00:05:13 +0000, Andrew M.A. Cater wrote:
>
> On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote:
>>
>> There are interesting bits in debian. I am not sure it is necessarily
>> the right choice for clusters due to the specific lack of commercial
>> support for cluster specific items such as Myrinet, and the other high
>> speed interconnects.
>
> Dan - if I build a _really big_ cluster, will you get Quadrics to do
> Debian :)
>
> Same goes for any other vendor - if you ask them nicely and make it
> worth their while, they'll do it. In many cases, it's only a recompile
> of a device driver to account for library differences, after all.
>
> HP use Debian internally, IIRC. Some of the Debian developers are also
> HP folk - HP are potentially looking to support more of their products
> under Linux? [See, for example, Debian Weekly News for today :) ]

Actually, we have quite recently installed an Itanium2 based cluster, using debian, because we want debian. We got HP to do it for us, using the (former Compaq) CMU tool. They did some porting to support debian in this tool...
So, ask nicely (and put it as a requirement to let them get the deal), and you can get whatever you want ;-)

>> Commercial compiler support for Debian (e.g.
>> Intel, Absoft, et al) is largely non-existent as far as I know (please
>> do correct me if I am wrong).

No problem with Intel compilers on Debian (alien does the trick).

--
[simula.research laboratory]
Åsmund Ødegård
Scientific Programmer / Chief Sys.Adm
phone: 67828291 / 90069915
http://www.simula.no/~aasmundo

From raysonlogin at yahoo.com Thu Nov 6 19:59:51 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Thu, 6 Nov 2003 16:59:51 -0800 (PST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <20031106145623.GA5867@iib.unsam.edu.ar>
Message-ID: <20031107005951.2157.qmail@web11407.mail.yahoo.com>

A very good paper about building HPC clusters with FreeBSD:

"Building a High-performance Computing Cluster Using FreeBSD"

http://people.freebsd.org/~brooks/papers/bsdcon2003/

The author talked about hardware issues: KVM, BIOS redirection, CPU choices; and then talked about why he chose FreeBSD instead of Linux... he also did the port of GridEngine (SGE) to FreeBSD.

Anyone tried to set up HPC clusters with *BSD??

Rayson

--- Fernan Aguero wrote:
> Any FreeBSD users willing to share clustering experiences
> out there?
>
> Fernan

__________________________________
Do you Yahoo!?
Protect your identity with Yahoo!
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Nov 6 22:07:53 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 6 Nov 2003 22:07:53 -0500 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Message-ID: <812B16724C38EE45A802B03DD01FD5471E049E@exchange.concen.com> Maybe someone could lend a hand and help Intel find out what their unknown material is. Be careful! Don't spill it in your lap for goodness sake.... Dohh! :-O I found this amusing: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL 11.07.03 by Jennifer Tabor HPCwire ======================================================================== ====== Chip makers are searching for ways to create smaller and smaller computer chips, and researchers at Intel believe they have discovered a new material that would help them to do just that. Intel's announcement will garner much attention in an industry where the demand for products that push fundamental physical limits is ever increasing. A problem afflicting many chip makers today is the prevention of electrical currents from leaking outside their proper patches. Because the transistor gates are now becoming as small as just five atomic layers, chips need more power. In turn, they also need a more efficient cooling system. Intel has been having difficulties with the cooling of its chips -- the smaller they get (with etchings as small as 90-130 nanometers), the hotter they become. Recent reports say that the problem has even caused a delay in the Prescott, Intel's most advanced version of the Pentium. Though the new technology would not debut until approximately 2007, Intel is planning to scale down their current 90 nanometer chip size over the years to 65, followed by 45. 
It is at this point that Intel's new material, which is still unknown, would be introduced. Intel's discovery comes at the height of an intense industry wide search for a new material to replace silicon dioxide, which is used as insulator between the gate and the channel through which current flows in an active transistor. Intel researchers have been working on solving the chip predicament for five years in efforts to keep pace with Moore's Law. Gordon E. Moore, co-founder of Intel, believed that the number of transistors in the same space should double every 18 months. Intel believes they can continue to make short strides, despite the thoughts of many who doubt their ability to keep up such a pace. Though many researchers and competitors agree that Intel's announcement revolves around the most important research area in the chip industry, some feel that the lack of specific technical detail will deter scientists from assessing their claims. ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodrigc at crodrigues.org Thu Nov 6 23:04:15 2003 From: rodrigc at crodrigues.org (Craig Rodrigues) Date: Thu, 6 Nov 2003 23:04:15 -0500 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) 
In-Reply-To: <20031107005951.2157.qmail@web11407.mail.yahoo.com>
References: <20031106145623.GA5867@iib.unsam.edu.ar> <20031107005951.2157.qmail@web11407.mail.yahoo.com>
Message-ID: <20031107040415.GA5711@crodrigues.org>

On Thu, Nov 06, 2003 at 04:59:51PM -0800, Rayson Ho wrote:
> A very good paper about building HPC clusters with FreeBSD:
>
> "Building a High-performance Computing Cluster Using FreeBSD"
>
> http://people.freebsd.org/~brooks/papers/bsdcon2003/
>
> The author talked about hardware issues: KVM, BIOS redirection, CPU
> choices; and then talked about why he chose FreeBSD instead of Linux...
> he also did the port of GridEngine (SGE) to FreeBSD.
>
> Anyone tried to setup HPC clusters with *BSD??

Hi,

Not quite the same as an HPC cluster, but take a look at the University of Utah's Emulab: http://www.emulab.net

It is heavily based on FreeBSD (i.e. makes use of FreeBSD routing, Dummynet, etc.). The Emulab is a remotely accessible testbed that researchers can use to conduct network experiments. It consists of about 200 PC nodes.

The same company that Brooks works for (Aerospace) has apparently set up an internal testbed based on the Emulab software developed at Utah.

I use the Emulab every day as part of my research work at BBN, and it is an excellent facility.
--
Craig Rodrigues
http://crodrigues.org
rodrigc at crodrigues.org

From john.hearns at clustervision.com Fri Nov 7 04:13:40 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Fri, 07 Nov 2003 10:13:40 +0100
Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config
In-Reply-To: <1068148945.24918.28.camel@haze.sr.unh.edu>
References: <1068148945.24918.28.camel@haze.sr.unh.edu>
Message-ID: <1068196420.17694.8.camel@penguin>

And also on The Reg: http://www.theregister.co.uk/content/3/33813.html

The Reg reckons Opteron 250s by early next year.

From franz.marini at mi.infn.it Fri Nov 7 07:56:28 2003
From: franz.marini at mi.infn.it (Franz Marini)
Date: Fri, 7 Nov 2003 13:56:28 +0100 (CET)
Subject: OctigaBay 12K
In-Reply-To: <1068196420.17694.8.camel@penguin>
References: <1068148945.24918.28.camel@haze.sr.unh.edu> <1068196420.17694.8.camel@penguin>
Message-ID:

Hello,

I just discovered this interesting, imho, company and its first product:

http://www.octigabay.com/

Their first product is a linux opteron-based cluster that they say could scale up to 12K processors. The base system is a 3.5U shelf with 12 opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor latency and 77GB/s aggregate mem bandwidth.

Seems nice, I would like to know what rgb and some of the other people in here think about it :)

Have a nice day,

Franz

---------------------------------------------------------
Franz Marini
Sys Admin and Software Analyst,
Dept. of Physics, University of Milan, Italy.
email : franz.marini at mi.infn.it --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Nov 7 08:44:11 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 7 Nov 2003 08:44:11 -0500 (EST) Subject: OctigaBay 12K In-Reply-To: Message-ID: On Fri, 7 Nov 2003, Franz Marini wrote: > Hello, > > just discover this interesting, imho, company and its first product : > > http://www.octigabay.com/ > > Their first product is a linux opteron-based cluster that they said > could scale up to 12K processors. The base system is a 3.5U shelf with 12 > opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor > latency and 77GB/s aggregate mem bandwidth. > > Seems nice, I would like to know what rgb and some of the other people > in here think about it :) Why, it looks simply lovely, as hardware I've never actually tried goes. I mean, if the octigabay people want to send me one for free just so I can write a review for it on this list and the brahma website, well, from the look of it I wouldn't kick it out of my machine room for chewing crackers... and I >>can<< be bought, folks, yes I can, just look at the brahma vendors page and my brazen demand for t-shirts in exchange for space:-) I'll even dig up something fine grained to run on it so that I can pretend to really test it. The bottom line is, well, the bottom line. Pretty isn't enough. Performance (even performance that is absolutely everything promised) isn't enough. It is PRICE performance that matters, or better yet cost-benefit. How does the cost compare to the benefits the design delivers in your environment. 
For my own personal code, for example, I don't NEED their fancy interconnect, and I can rack up a bunch of opterons for the cost of the basic hardware and a nice case to put them in. They'd therefore have to literally give it to me to make it a cost-benefit win (especially true since I just spent the last of my money in this grant cycle buying hey, whaddya know, a stack of 9 dual Opteron 242's for a hair over $20K). However, there are people out there who run fine grained synchronous parallel code that is bottlenecked at the network IPC level. Even THERE the computations have some intrinsic "value" in that there are finite amounts of money people are willing to pay to get them done, and there are choices. So ultimately it will come down to whether there is a match between the value of the computation (amount people are willing to pay to get it done), the needs of the computation, and the marketplace. It's one of these people that you need to ask about whether or not this is a good deal or good arrangement. My knee jerk reaction is that it is lovely but a bit too far into the big iron side (SP3-ish) to be likely to win a hard-nosed CB comparison relative to a DIY cluster with e.g. myrinet or SCI for MANY clustervolken (the market gets smaller and smaller the further up one travels to super-high-speed networks), but corporate consumers and the larger government consumers shy away from DIY, and even in the intermediate market it comes down to price/performance, eh? If they price it competitively with the other high speed networks and it has clear benefits (as it looks like it might) well then, who knows? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netmeister.org Fri Nov 7 10:00:56 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Fri, 7 Nov 2003 10:00:56 -0500 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: <20031107005951.2157.qmail@web11407.mail.yahoo.com> References: <20031106145623.GA5867@iib.unsam.edu.ar> <20031107005951.2157.qmail@web11407.mail.yahoo.com> Message-ID: <20031107150056.GA16835@netmeister.org> [Resending; this message was originally sent last night across the various mailing lists, but beowulf at beowulf.org chokes on the gpg signature. :-/ ] Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? I have a 30 node NetBSD/i386 cluster, and just recently created the tech-cluster at netbsd.org mailing list. Some people are working on a port of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in particular for cluster usage in the near future. 
Some URLs of relevance:

http://guinness.cs.stevens-tech.edu/~jschauma/hpcf/
http://www.netbsd.org/MailingLists/#tech-cluster
http://www.netbsd.org/
http://eurobsdcon.org/papers/#souvatzis
http://bsd.slashdot.org/article.pl?sid=03/10/20/1523252&mode=thread&tid=122&tid=185&tid=190
http://bsd.slashdot.org/bsd/03/11/05/1536226.shtml?tid=122&tid=185&tid=190

-Jan

--
"Life," said Marvin, "don't talk to me about life."

From raysonlogin at yahoo.com Fri Nov 7 12:17:23 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Fri, 7 Nov 2003 09:17:23 -0800 (PST)
Subject: Fwd: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
Message-ID: <20031107171723.61183.qmail@web11409.mail.yahoo.com>

Forwarding... (Brooks is not on the beowulf list)

Rayson

--- Brooks Davis wrote:
> We (my department, but mostly different people than Fellowship) have a
> small 10-node setup (though each node does have 6 gigabit ports :-). I
> think we're aiming to upgrade to around 48 nodes in the next year.
>
> Our HPC cluster is currently pretty close to what's described in the
> paper, though we are up to 160 nodes and we're adding rack space for
> another 192 this year.
>
> The short version of my take on which OS to run on your cluster is
> that so long as it runs the apps you need, the best OS is one you know
> how to admin well since that's most of the work. I've spent a few
> weeks here and there porting applications or improving their ports, but
> by and large, most key systems are already ported to the major UNIX
> platforms.
The free MPI implemntations work on just about anything, > the > base Ganglia metrics work nearly everywhere (FreeBSD and Linux are at > feature parity in the upcoming release), and SGE works on a wide > range > of platforms. On an amusing note, we were the launch customer for > Grid > Mathematica despite not running a supported OS because the Linux > version > runs just fine on FreeBSD. > > -- Brooks > > -- > Any statement of the form "X is the one, true Y" is FALSE. > PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 > > ATTACHMENT part 2 application/pgp-signature __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Nov 7 14:17:48 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 7 Nov 2003 14:17:48 -0500 (EST) Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: <20031107150056.GA16835@netmeister.org> Message-ID: > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > particular for cluster usage in the near future. why? I've never understood the evangelical aspect to *BSD (or for that matter Debian). is there a tangible, measurable benefit? I'm not sure it's effective to advocate a niche OS/dist for ideological reasons or just plain personal preference... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Fri Nov 7 16:36:48 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. 
Cater) Date: Fri, 7 Nov 2003 21:36:48 +0000 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <20031107213648.GA11665@galactic.demon.co.uk> On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > why? I've never understood the evangelical aspect to *BSD > (or for that matter Debian). is there a tangible, measurable benefit? For some of the list this is old news :( Sorry, but as the co-author of the Distributions HOWTO (and I know it badly needs updating :) ), I can't let this pass. In the beginning was chaos: build your own Linux machine from random bits and pieces, bootstrapping from Minix if necessary. Soft Landing Systems (SLS) introduced the whole concept of a Linux distribution - a collection of code more or less known to work together. In 1993, Patrick Volkerding got fed up with the problems of SLS and founded Slackware. Coincidentally, within a couple of months, a university student named Ian Murdock did almost the same and founded Debian, named for himself and his then-girlfriend Debra. Marc Ewing and the Red Hat founders came along a little later, liked some of the concepts from Debian and the idea of package management and introduced the Red Hat Package Manager which spawned .rpm packages. SuSE came out of Slackware via Unifix and Jurix and adopted .rpm's a little later. Mandrake was a French attempt to localise Red Hat and introduce some better packages ... and so on. Don Becker can probably back me up on this - the first cluster was a quick project to fill a specific need for NASA. Cheap, commodity hardware and a quick win. Its name was Beowulf (for the mythical hero). It wasn't intended to go further than NASA, and it was meant to be a short-term thing.
But it snowballed to the point where everyone wanted "a Beowulf". The first Extreme Linux disk was based on Red Hat 5.x because that was what happened to be around at the time. I've still got mine somewhere - a quick bootstrap to experiment with a cluster (in the days when 4 x 486's still counted for something). Much the same with Extreme Linux 2 and the (semi-commercial/commercial) Scyld. Then commerce woke up to clusters and "commoditised" them. Clusters don't have to be Red Hat - nothing Linux _has_ to be anything - but many of them are. There have always been various distributions: various clustering solutions/hardware have come and gone. Everyone has "their" cluster and "their" problem/solution set. It may still be quicker to build your own minimalist system / cheaper to use cast-offs / more economic for you to use ultra-high performance interconnects and networking - that's for you / your budget holder / sysadmins / vendor / cooling plant vendors to decide :) I advocate Debian not just because I use it a lot (for the record, I've also used Red Hat 4.2/5.*/6.*/7.*/8/9/Fedora beta, Mandrake, SuSE, Slackware and the late lamented Linux-FT) but because it has some good qualities, runs on lots of hardware platforms and is relatively unencumbered by nasty legal agreements/high fees. That's my choice - I'll happily help others to run it/port programs so they fit in the Debian distribution/ask vendors nicely if they'll consider support for Linux. It may not be your choice or the choice of others - but it is always worth trying stuff out and being open to change. [Rabid flame mode on: For the record, Debian, despite being maintained by an army of multi-national, multi-lingual volunteers who occasionally manage the semblance of close formation, is _not_ a niche OS - some figures put it second to RH in terms of popularity as a Linux distribution :) Go and write GNU/Linux 1000 times as a penance.] As far as *BSD goes: The BSDs have a longer pedigree.
Some people swear by (others swear at) their networking capability. If you've grown up with SunOS and many of the other commercial Unices, much may feel familiar. NetBSD will run on (almost) anything, FreeBSD is less hardware agnostic and differently focused ... and so it goes. A lot of arguments have been thrashed out over the years which generate more heat than light (vi vs. emacs vs. any other editor / .rpm vs. .deb / apt vs. yum vs. urpmi vs. up2date :) ) and this is probably one of them, so it's probably not worth a massive OT thread to follow up. > I'm not sure it's effective to advocate a niche OS/dist for > ideological reasons or just plain personal preference... > See above: your niche OS may be someone else's ideal. An awful lot of niche Linux variants have tried to set themselves up as "the standard" - and vanished without trace. Andy From rmyers1400 at comcast.net Fri Nov 7 11:16:56 2003 From: rmyers1400 at comcast.net (Robert Myers) Date: Fri, 07 Nov 2003 11:16:56 -0500 Subject: OctigaBay 12K In-Reply-To: References: Message-ID: <3FABC578.8050503@comcast.net> Robert G. Brown wrote: >However, there are people out there who run fine grained synchronous >parallel code that is bottlenecked at the network IPC level. Even THERE >the computations have some intrinsic "value" in that there are finite >amounts of money people are willing to pay to get them done, and there >are choices. > As a reader of one of my undergraduate essays commented, that's the kind of knowing comment that Henry James might have written in a letter to his brother William.
You may have the status of Henry James in this particular field (my grading professor plainly did not think that I did in the field on which I was holding forth) and therefore be entitled to such remarks with no defense, but could you perhaps elaborate in light of http://www.lanl.gov/orgs/ccn/salishan2003/pdf/camp.pdf particularly slide 25 et seq.? Thanks RM From edwardsa at plk.af.mil Fri Nov 7 17:50:14 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Fri, 7 Nov 2003 15:50:14 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <200311072249.hA7MnG3T000139@knockout.kirtland.af.mil> On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > particular for cluster usage in the near future. > > why? I've never understood the evangelical aspect to *BSD > (or for that matter Debian). is there a tangible, measurable benefit? > I'm not sure it's effective to advocate a niche OS/dist for > ideological reasons or just plain personal preference... It is true that Debian has a core of developers that are very committed to a particular definition of free software, and this often carries with it a surprisingly consistent set of philosophical POVs. Nonetheless, Debian is remarkably stable and it has, by far, the best package management system I have ever come across. I use it on all my machines with great overall happiness. With RH moving toward the Big-Bucks model of software, I will not be surprised if I see many new Debian users over the next few months.
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) From hahn at physics.mcmaster.ca Sat Nov 8 11:54:26 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 11:54:26 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: <20031108162037.GA835@netmeister.org> Message-ID: > > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > > particular for cluster usage in the near future. > > > > why? > > I believe that NetBSD, even though it is a very, very clean and clear OS > with high-quality code, is much neglected. It deserves more attention > and is perfectly suitable for the task at hand. there's lots of good stuff out there; I notice you didn't state that *BSD is actually cleaner/clearer/higher-quality than Linux though. and such a statement would be manifestly untrue. the 'deserves' thing is purely an aesthetic judgement, which is perfectly fine, but not a reason to switch... > > I've never understood the evangelical aspect to *BSD (or for that > > matter Debian). > > Evangelical? The entire Linux ``movement'' is based on such evangelism, > much more so than Free- or NetBSD. how strange! absolutely everyone I know who uses Linux (MANY) use it for purely practical reasons - speed, robustness, ease-of-whatever, and sometimes simply because it's the most prevalent Unix. wait, OK, I do know one Debianista, but there's one in every crowd ;) > > is there a tangible, measurable benefit?
> > Well, the same as mentioned in the FreeBSD paper, for example. If all what paper is that? > my other machines are NetBSD machines, then having the cluster be NetBSD > makes administration an order of magnitude easier. sure. as I said, *BSD seems mainly preferred by people who are already committed to *BSD. and precisely what I'm looking for is any reason to go BSD for the majority who already have Linux. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netmeister.org Sat Nov 8 12:17:08 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Sat, 8 Nov 2003 12:17:08 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: <20031108162037.GA835@netmeister.org> Message-ID: <20031108171708.GA24376@netmeister.org> Mark Hahn wrote: > > I believe that NetBSD, even though is a very, very clean and clear OS > > with high-quality code, is much neglected. It deserves more attention > > and is perfectly suitable for the task at end. > > there's lots of good stuff out there; I notice you didn't state that > *BSD is actually cleaner/clearer/higher-quality than Linux though. I'm not familiar enough with the Linux code to warrant such a statement. I do know people who are and who would argue that such a statement has a basis. But this should probably be discussed off-list, if at all. > > > is there a tangible, measurable benefit? > > > > Well, the same as mentioned in the FreeBSD paper, for example. If all > > what paper is that? The one posted here. Message-ID: <20031107005951.2157.qmail at web11407.mail.yahoo.com> http://people.freebsd.org/~brooks/papers/bsdcon2003/ > and precisely what I'm looking for is any reason to go BSD for the > majority who already have Linux. s/BSD/Linux/ s/Linux/Solaris/ That's probably pretty much the same question posed before Linux was the new bandwagon to jump. 
*shrug* -Jan -- When I'm dead, just let someone come at me with resurrection or the like, and I'll punch him one! (Anonymous) From jschauma at netmeister.org Sat Nov 8 11:20:37 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Sat, 8 Nov 2003 11:20:37 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <20031108162037.GA835@netmeister.org> Mark Hahn wrote: > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > particular for cluster usage in the near future. > > why? I believe that NetBSD, even though it is a very, very clean and clear OS with high-quality code, is much neglected. It deserves more attention and is perfectly suitable for the task at hand. > I've never understood the evangelical aspect to *BSD (or for that > matter Debian). Evangelical? The entire Linux ``movement'' is based on such evangelism, much more so than Free- or NetBSD. > is there a tangible, measurable benefit? Well, the same as mentioned in the FreeBSD paper, for example. If all my other machines are NetBSD machines, then having the cluster be NetBSD makes administration an order of magnitude easier. -Jan -- "I am so amazingly cool you could keep a side of meat in me for a month. I am so hip I have difficulty seeing over my pelvis."
From hahn at physics.mcmaster.ca Sat Nov 8 14:17:49 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 14:17:49 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: <3FAD2B7C.2050302@mail2.vcu.edu> Message-ID: > It seems that you are trying to logically argue that people who > evangelize whatever form of linux/unix are being zealots. However, you > avoid providing us with any logical reasons for this assertion. what? I'm doing the exact opposite: asking for a non-religious reason for choosing a particular OS or dist. if there's a good, factual reason for using *BSD (say, its gigabit latency is 10 us) then I'd seriously consider switching. From Fetrovsky at netscape.net Sat Nov 8 16:09:47 2003 From: Fetrovsky at netscape.net (Daniel Jesús Valencia Sánchez) Date: Sat, 08 Nov 2003 13:09:47 -0800 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <3FAD5B9B.4050207@netscape.net> hahn at physics.mcmaster.ca wrote: >>It seems that you are trying to logically argue that people who >>evangelize whatever form of linux/unix are being zealots. However, you >>avoid providing us with any logical reasons for this assertion. >> >> > >what? I'm doing the exact opposite: asking for a non-religious >reason for choosing a particular OS or dist. if there's a good, >factual reason for using *BSD (say, its gigabit latency is 10 us) >then I'd seriously consider switching. > > I'm sorry... I couldn't help myself.
After several years using linux, and dealing with stability problems, I switched to FreeBSD, and since then (about 4 yrs ago) I've had no problems. My code actually runs faster and I don't have to deal with several FreeBSD distributions. If I have to hack an OS kernel or distribution in order to make it work, then I don't want it. That's why I switched to Freebie. From mwheeler at startext.co.uk Sat Nov 8 16:32:41 2003 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sat, 8 Nov 2003 21:32:41 +0000 (UTC) Subject: Linux vs FreeBSD clusters In-Reply-To: Message-ID: On Sat, 8 Nov 2003, Mark Hahn wrote: > and precisely what I'm looking for is any reason > to go BSD We-e-ell ... For starters, the experience might help get rid of this inexplicable: 'Red Hat Is The Only One True God' attitude you seem to want to inflict on the list readership. Doesn't really go down a storm here in Europe, where *real* Linux experts are automatically expected to be experienced with SUSE, Mandrake, Debian -- and of course, Red Hat. As well as others. I'm afraid the "Oh, we don't use anything but Red Hat" freaks come over as being extremely blinkered, and tend not to get very far (or are quickly relegated to the Red Hat niche areas). Other countries, other customs. (Oh, and of course there are valid technical reasons behind everyone's preferred choice for carrying out a particular task. However, blanket dismissal of all but the reigning high-visibility sales publicity leader does not count as a technical reason for most.) Sorry; but too many remarks on this list over the past two weeks have been allowed to pass without comment, and have increasingly pressed the wrong buttons for me. As far as I'm concerned, Red Hat is the Cobol of the 21st century.
-- Martin Wheeler Long-time enthusiastic user of "the real thing". Sometime user of Red Hat (but only when paid to do so). From hahn at physics.mcmaster.ca Sat Nov 8 17:45:21 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 17:45:21 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: Message-ID: > > and precisely what I'm looking for is any reason > > to go BSD > > > > We-e-ell ... For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. you must have me confused with someone else: I could not care less whether it's RH or not. dists are unimportant, just a way to install a set of precompiled tools/libs. they are at best merely non-problematic. kernels are important. compilers are important. some libraries (libc, mpi) are important.
dists seem to spend most of their effort on things like installers (which my cluster needed just once), GUI junk, and things like which 20 files in /etc contribute to the network config ;) I value Suse and RH as organizations simply because they both provide meaningful support to certain projects that I'm interested in (kernel, gcc, x86-64, etc) From jmdavis at mail2.vcu.edu Sat Nov 8 12:44:28 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Sat, 08 Nov 2003 12:44:28 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <3FAD2B7C.2050302@mail2.vcu.edu> Mark, It seems that you are trying to logically argue that people who evangelize whatever form of linux/unix are being zealots. However, you avoid providing us with any logical reasons for this assertion. So, Debian works, as do Suse and Red Hat. Debian has a better package system than either Suse or RH. I haven't seen a press release from Debian telling me that Novell's worldwide support network makes Debian a better OS. I still can't figure out what CNAs and CNEs have to do with linux, but I guess a tech person is a tech person as far as a CEO is concerned. In addition, I haven't seen a Debian press release saying that they were going to stop development of the current release model and create the "New Improved Enterprise Server" which they will happily sell you at $300-$1500 per year for maintenance and support that you don't use but you must buy to download patches for BUGS and errors. Hmm. So, Suse is trying to convince me that people who know nothing about linux are good support and RH is trying to convince me that I should pay for something that I don't use so that I may download patches. Seems to me that the Debianistas (as you refer to them) may be on to something.
On the OpenBSD side, it works. No questions, no doubts. I've used it over the years in a variety of roles ranging from Firewall and packet filters, to servers, and to research type machines. Mike Davis Mark Hahn wrote: >>>>of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in >>>>particular for cluster usage in the near future. >>>> >>>> >>>why? >>> >>> >>I believe that NetBSD, even though is a very, very clean and clear OS >>with high-quality code, is much neglected. It deserves more attention >>and is perfectly suitable for the task at end. >> >> > >there's lots of good stuff out there; I notice you didn't state that >*BSD is actually cleaner/clearer/higher-quality than Linux though. >and such a statement would be manifestly untrue. the 'deserves' thing >is purely an aesthetic judgement, which is perfectly fine, but not >a reason to switch... > > > >>> I've never understood the evangelical aspect to *BSD (or for that >>> matter Debian). >>> >>> >>Evangelical? The entire Linux ``movement'' is based on such evangelism, >>much more so than Free- or NetBSD. >> >> > >how strange! absolutely everyone I know who uses Linux (MANY) >use it for purely practical reasons - speed, robustness, ease-of-whatever, >and sometimes simply because it's the most prevalent Unix. wait, OK, >I do know one Debianista, but there's on in every crowd ;) > > > >>>is there a tangible, measurable benefit? >>> >>> >>Well, the same as mentioned in the FreeBSD paper, for example. If all >> >> > >what paper is that? > > > >>my other machines are NetBSD machines, then having the cluster be NetBSD >>makes administration an order of magnitude easier. >> >> > >sure. as I said, *BSD seems mainly preferred by people who are already >committed to *BSD. and precisely what I'm looking for is any reason >to go BSD for the majority who already have Linux. 
> >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cosmik.debris at elec.canterbury.ac.nz Sun Nov 9 15:02:07 2003 From: cosmik.debris at elec.canterbury.ac.nz (Cosmik Debris) Date: Mon, 10 Nov 2003 09:02:07 +1300 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Message-ID: <74C3DBA1ACA54844B781615F22D0DB18DA14D8@claude.elec.canterbury.ac.nz> > -----Original Message----- > From: Joey Sims [mailto:jsims at csiopen.com] > Sent: Friday, 7 November 2003 16:08 p.m. > To: beowulf at beowulf.org > Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL > > > Maybe someone could lend a hand and help Intel find out what > their unknown material is. Be careful! Don't spill it in > your lap for goodness sake.... Dohh! :-O > > I found this amusing: > > INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL > 11.07.03 > by Jennifer Tabor > HPCwire > ============================================================== Diamond???? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 9 17:48:00 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 09:48:00 +1100 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <200311100948.05192.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 9 Nov 2003 08:32 am, Martin WHEELER wrote: > On Sat, 8 Nov 2003, Mark Hahn wrote: > > and precisely what I'm looking for is any reason > > to go BSD > > > > We-e-ell ... 
For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. Mark is looking for benchmarks, basically. Some hard facts (well, OK, figures, benchmarks often don't quite qualify as fact :-) ) that will give some actual measure to whether one is better than the other. Also note that he's stated he's looking to be persuaded, and that he's not tied to Linux. He just wants whatever will get his jobs done fastest. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/rsQjO2KABBYQAh8RAvh6AJ9mA52+tBkmW1FEmS9Iuhl1CcJrrACeO+wp t/XI0f1tW9/dScTkCvBWB2c= =1P+Y -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 9 18:05:17 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 10:05:17 +1100 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: <20031107213648.GA11665@galactic.demon.co.uk> References: <20031107150056.GA16835@netmeister.org> <20031107213648.GA11665@galactic.demon.co.uk> Message-ID: <200311101005.23978.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 8 Nov 2003 08:36 am, Andrew M.A. Cater wrote: > Soft Landing Systems (SLS) introduced the whole concept of a Linux > distribution - a collection of code more or less known to work together. Not quite correct I believe. My understanding is that SLS is predated by the Manchester Computing Centre (MCC) distribution (the first distro I used around 199[23]). 
For instance v2.03 of the c.o.l.a FAQ (27th Jan 1993) says on distributions: MCC and SLS are more complete systems that contain most of what is needed for normal use. MCC is older, SLS includes X. ...and the Linux Distribution List says this: http://lightning.prohosting.com/~ldl/cgi-bin/show.cgi?action=2&show=169 MCC Interim Linux is currently the oldest distribution listed on the LDL. It was started by the Manchester Computing Centre in February of 1992, after they made Linux available on their FTP site in November of 1991. The distribution was one of the first to use a combined boot/root disk. Several distributions were based off of MCC Interim Linux, including TAMU, MJ, and SLS (which later morphed into Slackware Linux, a distribution that's still alive today). Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/rsgtO2KABBYQAh8RAm6PAKCBenVvS1Ob7AgiCTWyRfcg25j+BACfUx8K XFjheVGrgqo3WpHUYEHmLXk= =IXy8 -----END PGP SIGNATURE----- From edwardsa at plk.af.mil Mon Nov 10 00:22:33 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Sun, 9 Nov 2003 22:22:33 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <3FAF1B62.1090900@tamu.edu> References: <20031107150056.GA16835@netmeister.org> <200311072249.hA7MnG3T000139@knockout.kirtland.af.mil> <3FAF1B62.1090900@tamu.edu> Message-ID: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> On Sun, Nov 09, 2003 at 11:00:18PM -0600, Gerry Creager N5JXS wrote: > Until the Debian model allows me to find libs where I am used to looking > for libs (and doesn't mandate /share over /usr/local) as well as a few > more little nagging issues, I'm not going there. I don't, honestly, > have time for me to adapt, and to make my students adapt! > > One of my hopes for LSB was that I'd be able to go from distro to distro > without some of those headaches. Hasn't happened yet. > > gerry It is interesting because one of the initial attractions for Debian was its organization of libraries and configuration files. After RH, it seemed totally transparent. I guess this is just a matter of personal taste. I would be surprised, though, if after trying apt-get, you could ever go back to the rpm model. Art > > Arthur H. Edwards wrote: > >On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > > >>>of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > >>>particular for cluster usage in the near future. > >> > >>why? I've never understood the evangelical aspect to *BSD > >>(or for that matter Debian). is there a tangible, measurable benefit? > >>I'm not sure it's effective to advocate a niche OS/dist for > >>ideological reasons or just plain personal preference... > > > > > >It is true that Debian has a core of developers that are very > >committed to a particular definition of free software, and this often > >carries with it a surprisingly consistent set of philosophical > >POVs. Nonetheless, Debian is remarkably stable and it has, by far, the > >best package management system I have ever come across. I use it on > >all my machines with great overall happiness.
With RH moving toward > >the Big-Bucks model of software, I will not be surprised if I see many > >new Debian users over the next few months. > > > >Art Edwards > > > > > >>_______________________________________________ > >>Beowulf mailing list, Beowulf at beowulf.org > >>To change your subscription (digest mode or unsubscribe) visit > >>http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Nov 10 00:46:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 16:46:38 +1100 Subject: Linux package management (was Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)) In-Reply-To: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> References: <20031107150056.GA16835@netmeister.org> <3FAF1B62.1090900@tamu.edu> <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> Message-ID: <200311101646.39648.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 10 Nov 2003 04:22 pm, Arthur H. Edwards wrote: > I would be surprised, though, if after trying apt-get, you > could ever go back to the rpm model. Mandrake's urpmi is very apt-get like (works out dependencies, you can add other sources, etc) and works very well, and there's also apt4rpm. I've used both (as well as apt-get on Debian) and they're very capable. 
Yum has a good reputation, though I've never tried it. Let's face it, trying to convince people that distro X is better than distro Y is an exercise in futility, and quite pointless as it all comes down to a subjective judgement of what someone finds more appealing or capable. To me the diversity is good for the future of Linux, it means there's a lot of people with their own ideas that can grow and evolve and spread. The more the merrier. :-) Of course all this idealism tends to come down with a bump when you hit the hard reality of whether vendor X will support product Y on distro Z, and how much having (or not having) support means to you. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/ryY+O2KABBYQAh8RAj1lAJ4jLnFqT+PWGpYS1tYv6HgD6CU4yACcDUCs 7+e+rX3lh0rttbQ5BwKFfuo= =Ahvj -----END PGP SIGNATURE----- From scheinin at crs4.it Mon Nov 10 03:53:52 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Mon, 10 Nov 2003 09:53:52 +0100 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux Message-ID: <200311100853.hAA8rqI02462@dali.crs4.it> Glen Kaukola gave me a more precise explanation of the downgrade that resulted in a stable O/S for the Tyan S2880. He wrote: "I downgraded from 2.01 to 1.07." and mentioned the URL http://www.tyan.com/support/html/b_s2880.html That page describes 1.07 as the first BIOS and 2.01 as the second, the latter released 20 August 2003.
Curiously, the AMD page for recommended motherboards
http://www.amd.com/us-en/recmobo/ResultsHandler/1,,30_2252_869_8819%5E8821~68707,00.html
describes the second release as 19 august 2003, version PON, but then describes a newer version "TOY", version 2.01p, with date 8 october 2004. Where I live, the only readily available board for Opteron is from Tyan, so I hope we can collectively shed more light on the situation.

best regards,
Alan Scheinine

From mathiasbrito at yahoo.com.br Mon Nov 10 06:53:10 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 10 Nov 2003 08:53:10 -0300 (ART)
Subject: Compiling HPL
Message-ID: <20031110115310.40488.qmail@web12201.mail.yahoo.com>

I am trying to compile HPL. The build starts and then stops with the following messages:

mpicc -o HPL_pdinfo.o -c -fomit-frame-pointer -O3 -funroll-loops -W -Wall -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I/work1/mathias/hpl/include -I/work1/mathias/hpl/include/Linux_ATHLON_FBLAS -I/opt/mpich/include ../HPL_pdinfo.c
mpicc -o HPL_pdtest.o -c -fomit-frame-pointer -O3 -funroll-loops -W -Wall -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I/work1/mathias/hpl/include -I/work1/mathias/hpl/include/Linux_ATHLON_FBLAS -I/opt/mpich/include ../HPL_pdtest.c
g77 -fomit-frame-pointer -O3 -funroll-loops -W -Wall -o /work1/mathias/hpl/bin/Linux_ATHLON_FBLAS/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /work1/mathias/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a /usr/lib/libblas.a /usr/lib/libatlas.a /opt/mpich/lib/libmpich.a
/opt/mpich/lib/libmpich.a(comm_split.o)(.text+0x138): In function `MPI_Comm_split':
: undefined reference to `PMPI_Allreduce'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x63): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Allreduce'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x94): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Bcast'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x111): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Sendrecv'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x154): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Allreduce'
collect2: ld returned 1 exit status
make[2]: ** [dexe.grd] Erro 1
make[2]: Leaving directory `/work1/mathias/hpl/testing/ptest/Linux_ATHLON_FBLAS'
make[1]: ** [build_tst] Erro 2
make[1]: Leaving directory `/work1/mathias/hpl'
make: ** [build] Erro 2

I think the problem is with the MPI installation. I based my makefile on Make.Linux_ATHLON_FBLAS and made the necessary modifications. Did I forget something?

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Computer Science student

From rgb at phy.duke.edu Mon Nov 10 07:30:13 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 10 Nov 2003 07:30:13 -0500 (EST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil>
Message-ID:

On Sun, 9 Nov 2003, Arthur H. Edwards wrote:

>
> It is interesting because one of the initial attractions for Debian
> was its organization of libraries and configuration files. Afer RH, it
> seemed totally transparent. I guess this is just a matter of personal
> taste. I would be surprised, though, if after trying apt-get, you
> could ever go back to the rpm model.
> > Art It isn't "the rpm model" -- in both cases the packaging and metadata are adequate. Comparing apt to rpm is apples to oranges -- apt-get is a toplevel toolset to extract and resolve dependencies from the debian packages and use them to retrieve and install package(s) and their entire consistent dependency trees, by revision. The problem is that in the past there has been no comparable toolset for RPM packages and all the distributions that rely on them. For the last two or three years, there has been (first yup, now yum). It, too, is "totally transparent" and has, arguably features that some administrators might prefer (including considerable and increasingly fine-grained control over their own, local, repository images). Whether or not you've looked at yum and tried yum and compared yum's operation and features to apt, the existence of choices appears to be a good thing, as does "competition" of sorts (the friendly, slightly religious sort that tends to exist in the open source world:-). I know yum's primary developers quite well (since they work about fifty meters away from my office in the same building:-) and they are very, very dedicated and not at all religiously inclined towards Red Hat per se. Yum has been successfully used to make RPM-based repositories for just about all the primary RPM linux distributions, and I believe that people have even used it to distribute/maintain RPMs on Solaris boxes. At this particular moment, I think that yum makes RH (or if you prefer, Fedora) slightly preferrable to Debian in a scaled/automated LAN installation because both effectively automaintain after installation, but RH/Fedora permits the easy use of PXE/kickstart. Kickstart, after all, is (IMO) the reason RH maintained its dominance in spite of the otherwise pain of manipulating RPMs compared to Debian, and the reason it remained dominant among RPM-based distros as well. 
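
The dependency-tree idea above (work out everything a package needs, then install it all in a consistent order) can be sketched in a few lines of shell. This is only an illustration, not how apt-get or yum are actually implemented: the package names and the dependency pairs are hypothetical, and the ordering is delegated to the standard POSIX tsort utility.

```shell
#!/bin/sh
# Sketch of dependency-ordered installation, the core idea behind
# tools such as apt-get and yum. The package names and dependency
# pairs below are hypothetical, chosen only for illustration.

# Each input line is "A B", meaning A must be installed before B.
# tsort (a standard POSIX utility) prints one valid total ordering,
# one package per line.
install_order() {
    tsort <<'EOF'
libc blas
libc mpich
blas hpl
mpich hpl
EOF
}

# Print the order in which a resolver could install the packages.
install_order
```

With this graph the first package printed is always libc and the last is always hpl; the relative order of the two middle packages is left to tsort. A real resolver also has to handle versions, conflicts and repositories, which this sketch ignores.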
In part because of its existence, there is actually some talk of coming up with a rational unification of linux packaging schemes, reviewing and getting rid of package features that have proven to be more Evil than Good over the years, developing an XML schema, and lots of other good things that might actually reduce the "us and them" barriers for linux in general. I personally think that this would be a good thing. As Mark has been saying -- most of us are religious about open source, stability, functionality, but at best we are "used" to particular distributions and could be convinced to change fairly easily if advantages associated with the change outweighed the hassle of learning something new.

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From gerry.creager at tamu.edu Sun Nov 9 23:21:07 2003
From: gerry.creager at tamu.edu (Gerry Creager N5JXS)
Date: Sun, 09 Nov 2003 22:21:07 -0600
Subject: Linux vs FreeBSD clusters
In-Reply-To:
References:
Message-ID: <3FAF1233.5030301@tamu.edu>

OK, so let's get back on track. I *am* an RH user for my systems, and less likely to sway in the near future if Fedora pans out. I went (somewhat reluctantly) to RH for its RPM capability, back when dselect had user interface problems (well, more precisely, its user interface was user-hostile) and Slack's package management was tarballs with no accountability. SuSE didn't load reliably, and Mandrake hadn't started snagging RedHat RPMs and repackaging them yet. Debian has always, in its quest to be the "one pure linux", placed things where they weren't easy for me to find.
I am a proponent of the OpenSource concept and movement, but I'm also too busy to have to chase where one distro puts things, determine why another doesn't clean up after itself, and finally have to fight almost every one for some form of accountability in package management. RedHat offered the package management piece at a time when that's what was needed to give me a little breathing room. As one friend put it this weekend, RedHat is as quirky as all the other distributions, but I seem to know where and what the quirks are. The change to Fedora has me reexamining the potential for change. It'd be easiest for me, and my lab and other activities, to go to Mandrake. SuSE has a good reputation with folks I know, work with, and trust. Debian still hasn't decided where they're going to put pieces of the code, and while the distro appears internally consistent (all the debian installs I've worked with _worked_), I often have to customize things, and I don't have time these days to go looking for where someone decided they felt libs or modules belonged, in contravention to the rest of the world. I'm not so sure why you're so hard on RedHat. I don't think I'd characterize it as the COBOL (it's an acronym) of the 21st century. Unless you're simply looking for a tag that usually raises the ire of scientific programmers who might have had to take a COBOL course in their academic past... This tends to be a rather high-end group of computational expertise. I learn a lot here. I've been known to contribute a bit in the past (back when I got to do high performance computing instead of managing a pack of grad students who now get the fun stuff). We don't really need the elitist plugs. If you don't like RedHat, fine. If someone else does, fine. Just recognize that you don't have to use my distro and I don't have to use yours. 
Exasperatedly yours, Gerry Martin WHEELER wrote: > On Sat, 8 Nov 2003, Mark Hahn wrote: > > >> and precisely what I'm looking for is any reason >>to go BSD > > > > > We-e-ell ... For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. > > Doesn't really go down a storm here in Europe, where *real* Linux > experts are automatically expected to be experienced with SUSE, > Mandrake, Debian -- and of course, Red Hat. As well as others. > > I'm afraid the "Oh, we don't use anything but Red Hat" freaks come > over as being extremely blinkered, and tend not to get very far (or > quickly relegated to the Red Hat niche areas). > > Autre pays, autre moeurs. > > (Oh, and of course there are valid technical reasons behind everyone's > preferred choice for carrying out a particular task. However, blanket > dismissal of all but the reigning high-visibility sales publicity leader > does not count as a technical reason for most.) > > Sorry; but too many remarks on this list over the past two weeks have > been allowed to pass without comment, and have increasingly pressed the > wrong buttons for me. > > As far as I'm concerned, Red Hat is the Cobol of the 21st century. > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Mon Nov 10 11:26:48 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Mon, 10 Nov 2003 09:26:48 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) 
In-Reply-To:
References: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil>
Message-ID: <200311101626.hAAGPs3R013517@knockout.kirtland.af.mil>

I think your point about newer package management tools is well taken. I have tried apt for RPMs (when I was running the free Scyld distribution) and it was clearly better. I have not tried yum, but I have not had enough (any) frustration with apt-get, and now apt-proxy, to make a move desirable. I also agree that my attachment is more to open source than to Debian per se, although after using RH, SuSE, and even TurboLinux, I have stuck with Debian. I actually wish SuSE (now part of Novell) well, and I am sorry to see the demise of RH as we know it, because they are where the increased user base comes from. However, I don't know whether SuSE will have better luck at generating revenue than RH did, and they may well go in the same direction. It is that possibility that makes me think that Debian, or a similar volunteer-based distribution, may have the greater longevity.

Art Edwards

On Mon, Nov 10, 2003 at 07:30:13AM -0500, Robert G. Brown wrote:
> On Sun, 9 Nov 2003, Arthur H. Edwards wrote:
>
> >
> > It is interesting because one of the initial attractions for Debian
> > was its organization of libraries and configuration files. Afer RH, it
> > seemed totally transparent. I guess this is just a matter of personal
> > taste. I would be surprised, though, if after trying apt-get, you
> > could ever go back to the rpm model.
> >
> > Art
>
> It isn't "the rpm model" -- in both cases the packaging and metadata are
> adequate. Comparing apt to rpm is apples to oranges -- apt-get is a
> toplevel toolset to extract and resolve dependencies from the debian
> packages and use them to retrieve and install package(s) and their
> entire consistent dependency trees, by revision.
>
> The problem is that in the past there has been no comparable toolset for
> RPM packages and all the distributions that rely on them.
For the last > two or three years, there has been (first yup, now yum). It, too, is > "totally transparent" and has, arguably features that some > administrators might prefer (including considerable and increasingly > fine-grained control over their own, local, repository images). > > Whether or not you've looked at yum and tried yum and compared yum's > operation and features to apt, the existence of choices appears to be a > good thing, as does "competition" of sorts (the friendly, slightly > religious sort that tends to exist in the open source world:-). I know > yum's primary developers quite well (since they work about fifty meters > away from my office in the same building:-) and they are very, very > dedicated and not at all religiously inclined towards Red Hat per se. > Yum has been successfully used to make RPM-based repositories for just > about all the primary RPM linux distributions, and I believe that people > have even used it to distribute/maintain RPMs on Solaris boxes. At this > particular moment, I think that yum makes RH (or if you prefer, Fedora) > slightly preferrable to Debian in a scaled/automated LAN installation > because both effectively automaintain after installation, but RH/Fedora > permits the easy use of PXE/kickstart. Kickstart, after all, is (IMO) > the reason RH maintained its dominance in spite of the otherwise pain of > manipulating RPMs compared to Debian, and the reason it remained > dominant among RPM-based distros as well. > > In part because of its existence, there is actually some talk of coming > up with a rational unification of linux packaging schemes, reviewing and > getting rid of package features that have proven to be more Evil than > Good over the years, developing an XML schema, and lots of other good > things that might actually reduce the "us and them" barriers for linux > in general. I personally think that this would be a good thing. 
As
> Mark has been saying -- most of us are religious about open source,
> stability, functionality, but at best we are "used" to particular
> distributions and could be convinced to change fairly easily if
> advantages associated with the change outweighed the hassle of learning
> something new.
>
> rgb
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
>
>

--
Art Edwards
Senior Research Physicist
Air Force Research Laboratory
Electronics Foundations Branch
KAFB, New Mexico
(505) 853-6042 (v)
(505) 846-2290 (f)

From hahn at physics.mcmaster.ca Mon Nov 10 12:20:20 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Mon, 10 Nov 2003 12:20:20 -0500 (EST)
Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To: <200311100853.hAA8rqI02462@dali.crs4.it>
Message-ID:

> "I downgraded from 2.01 to 1.07." and mentioned the URL
> http://www.tyan.com/support/html/b_s2880.html
> That page describe 1.07 as the first BIOS and 2.01 as the second, the
> latter release 20 august 2003. Curiously the AMD page for
> recommended motherboards
> http://www.amd.com/us-en/recmobo/ResultsHandler/1,,30_2252_869_8819%5E8821~68707,00.html
> describes the second release as 19 august 2003, version PON but then
> describes a newer version "TOY" version 2.01p with date 8 october 2004.

I recently upgraded from 1.0.1 to 2880201l.rom (which seems to be the most recent from the site above). I did so mainly to let me boot my shiny new nodes without a keyboard ;(

I haven't noticed any problems with the new bios. admittedly, mine are fairly simple machines: if pxe, the nics and userspace work, I'm happy...

"dd if=/dev/mem bs=1k skip=640 count=384 | strings|less" shows:
  BIOS Date: 08/18/03 15:19:23 Ver: 08.00.08
  TYAN Thunder K8S V2.01l BIOS

that reminds me: has anyone done the gruntwork to figure out how to run a flash upgrade without a windows-formatted floppy and floppy drive? I'd actually feel fairly sanguine about pxe-booting to a faked floppy image that ran the installer...

regards, mark hahn.

From glen at mail.cert.ucr.edu Mon Nov 10 13:26:31 2003
From: glen at mail.cert.ucr.edu (Glen Kaukola)
Date: Mon, 10 Nov 2003 10:26:31 -0800
Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To:
References:
Message-ID: <3FAFD857.2010303@cert.ucr.edu>

Mark Hahn wrote:

>that reminds me: has anyone done the gruntwork to figure out how to run
>a flash upgrade without a windows-formatted floppy and floppy drive?
>I'd actually feel fairly sanguine about pxe-booting to a faked floppy
>image that ran the installer...
>

The best I've been able to come up with is turning a floppy image into a bootable cdrom.

From one of the little howtos I've made for myself:

mkdir /tmp/foo
cp floppy.img /tmp/foo/
cd /tmp/foo
mkisofs -J -r -b floppy.img . > /tmp/floppy.iso

Cheers,
Glen

From raysonlogin at yahoo.com Sat Nov 8 20:11:48 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Sat, 8 Nov 2003 17:11:48 -0800 (PST)
Subject: Performance Tuning on Clusters Course
Message-ID: <20031109011148.63342.qmail@web11405.mail.yahoo.com>

http://webct.ncsa.uiuc.edu:8900/public/PTCLUST/

Rayson

From shewa at inel.gov Mon Nov 10 14:34:23 2003
From: shewa at inel.gov (Andrew Shewmaker)
Date: Mon, 10 Nov 2003 12:34:23 -0700
Subject: Compiling HPL
In-Reply-To: <20031110115310.40488.qmail@web12201.mail.yahoo.com>
References: <20031110115310.40488.qmail@web12201.mail.yahoo.com>
Message-ID: <3FAFE83F.7000809@inel.gov>

Mathias Brito wrote:

> I trying to compile the HPL, it starts the process and
> stop with the following message:
>
> /opt/mpich/lib/libmpich.a(comm_split.o)(.text+0x138):
> In function `MPI_Comm_split':
> : undefined reference to `PMPI_Allreduce'
>
> I think the problem is with the mpi.
>
> Well I change the makefile with the
> Make.Linux_ATHLON_FBLAS with the necessary
> modifications. Did I forget something?

Mathias,

You might want to check whether you are mixing LAM and MPICH, using "rpm -qf `which mpicc`" (note the backticks) on an rpm-based distro.

If it is an MPICH problem, then you will get the best help from mpi-maint at mcs.anl.gov, and you will have to provide them with more information. Read their FAQ to find out what they expect from you in order to provide support.

http://www.mcs.anl.gov/mpi/mpich/docs/faq.htm

Make sure you tell them what version of mpich you have installed and whether or not you built it yourself.

Andrew

--
Andrew Shewmaker, Associate Engineer
Phone: 1-208-526-1415
Idaho National Eng. and Environmental Lab.
P.O. Box 1625, M.S.
3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfaerber at penguincomputing.com Mon Nov 10 13:56:40 2003 From: nfaerber at penguincomputing.com (Nate Faerber) Date: 10 Nov 2003 10:56:40 -0800 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: References: Message-ID: <1068490599.28875.35.camel@m10.penguincomputing.com> > that reminds me: has anyone done the gruntwork to figure out how to run > a flash upgrade without a windows-formatted floppy and floppy drive? > I'd actually feel fairly sanguine about pxe-booting to a faked floppy > image that ran the installer... > You can use FreeDOS boot floppies if you are trying to free yourself from MS. Then if you want to free yourself from the floppy drive, try out MEMDISK from H. Peter Anvin (SYSLINUX) with your floppy image. Unfortunately, we have found that the Phoenix Flash utility (phlash16.exe) has not been working lately over a network with PXE/MEMDISK. We haven't contacted H. Peter about this, yet. It could be a MEMDISK issue or it could be a FreeDOS issue or maybe a bit of both. If you don't mind burning tiny CDs for BIOS upgrades, another option instead of floppy is CDR. Try DOSEMU with your floppy image. 
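
For the MEMDISK route mentioned above, a minimal sketch of the PXELINUX side might look like the script below. The label name, the config file location and the image name floppy.img are assumptions for illustration only, and, as noted above, a particular flash utility may still fail when booted this way.

```shell
#!/bin/sh
# Append a PXELINUX menu entry that boots a floppy image via MEMDISK.
# The label name "biosflash", the config path and the image name are
# assumptions for illustration; adjust them to your TFTP layout.
write_memdisk_entry() {
    cfg_file=$1   # e.g. a file under pxelinux.cfg/ in the TFTP root (assumed)
    image=$2      # floppy image name, relative to the TFTP root

    # memdisk is loaded as the "kernel"; the floppy image is handed
    # to it as the initrd.
    cat >> "$cfg_file" <<EOF
label biosflash
  kernel memdisk
  append initrd=$image
EOF
}

# Demonstration against a scratch file rather than a live TFTP tree.
tmp_cfg=$(mktemp)
write_memdisk_entry "$tmp_cfg" floppy.img
cat "$tmp_cfg"
```

Both the memdisk binary from the SYSLINUX distribution and the floppy image need to sit where the TFTP server can serve them; a node then boots the image by choosing the biosflash label at the PXELINUX prompt.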
-- Nate Faerber, Engineer Tel: 415-358-2666 Fax: 415-358-2646 Toll Free: 888-PENGUIN PENGUIN COMPUTING www.penguincomputing.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 10 18:00:12 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 10 Nov 2003 15:00:12 -0800 (PST) Subject: floppy images - Re: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: <3FAFD857.2010303@cert.ucr.edu> Message-ID: hi ya glen On Mon, 10 Nov 2003, Glen Kaukola wrote: > Mark Hahn wrote: > > >that reminds me: has anyone done the gruntwork to figure out how to run > >a flash upgrade without a windows-formatted floppy and floppy drive? > >I'd actually feel fairly sanguine about pxe-booting to a faked floppy > >image that ran the installer... > > > > Best I've been able to come up with is turning a floppy image into a > bootable cdrom. > > From one of the little howto's I've made for myself: > mkdir /tmp/foo > cp floppy.img /tmp/foo/ > cd /tmp/foo > mkisofs -J -r -b floppy.img . > /tmp/floppy.iso if you're trying to do upgrades ... 
to a faked floppy drive

  http://www.linux-consulting/www-linux/Boot/Boot.Loop.txt
  - lots of other booting/installer stuff

c ya
alvin

#
# Loopback Device
#
#
# http://burks.brighton.ac.uk/burks/linux/rute/node19.htm
#
#
#
# dd if=/dev/zero of=/dev/ram0 count=1440 bs=1024
#
dd if=/dev/zero of=/tmp/floppy count=1440 bs=1024
#
losetup /dev/loop0 /tmp/floppy
mke2fs /dev/loop0
#
mount /dev/loop0 /mnt/test
#
# copy files to the loopback device ( will go to floppy later )
#
# get a copy of tom's root-boot for sample contents
# to copy onto your faked floppy version
# ( shows you what files are required )
#
# - use your minimized kernel and no modules
#
ls -al /mnt/test
#
umount /mnt/test
losetup -d /dev/loop0
#
#
# Now copy the data to the floppy drive
#
# dd if=/dev/ram0 of=/dev/fd0 count=1440 bs=1024
dd if=/tmp/floppy of=/dev/fd0 count=1440 bs=1024
#
# End of file

From alvin at Mail.Linux-Consulting.com Mon Nov 10 17:55:47 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Mon, 10 Nov 2003 14:55:47 -0800 (PST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <200311101626.hAAGPs3R013517@knockout.kirtland.af.mil>
Message-ID:

hi ya art

On Mon, 10 Nov 2003, Arthur H. Edwards wrote:

> I think your point about newer package management tools is
> well-taken. I have tried the apt for rpms (when I was running the free
> scyld distribution) and it was clealy better. I have not tried yum,
> but I have not had enough (any) frustration with apt-get, and now
> apt-proxy, to make a move desirable. I also agree that my attachment
> is more to open-source than to Debian per-se, although after using RH,
> SUSe, and even turbo, Linux, I have stuck with Debian.
I actually wish
> SUSe (now part of Novell) well, and I am sorry to see the demise of RH
> as we know it, because they are where increased user base comes
> from. However, I don't know whether SUSe will have better luck at
> generating revenue than did RH, and they may well go the same
> direction. It is that possibility that makes me think that Debian, or a
> similar, volunteer-based distribution may have the greater longevity.

good point ...

- i think that "volunteer-based distro" will survive all the commercial methodologies ...
  - commercial folks are out to make $$$$ to attempt to cover the costs of marketing, sales, advertisement and analysts' expectations
- volunteers do what they do because it's what they like doing, and will probably continue doing so for the next few eons

*.rpm or *.deb or *.foo or *.tgz package managers...
  - i can make *.deb break its dependencies just as easily as *.rpm would be barfing ...
  - if the dependencies aren't installed, the app you're trying to install fails, and if you use --nodeps then even worse things happen when you brute force things

my choice/methodology is *.tgz original sources, and run my update scripts
  - updates/upgrades/installing/patches are distro independent ( linux, bsd, solaris, sgi, etc )
  - copy the old files FIRST into a date-stamped tar ball backup ( you should always be able to restore what it used to be before the failed updates/upgrades )
  - merge the old files with the new configs
  - overwrite with the merged data and binaries

other cluster installers
  http://www.Linux-Consulting.com/Cluster

c ya
alvin

From alvin at Mail.Linux-Consulting.com Mon Nov 10 18:40:36 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Mon, 10 Nov 2003 15:40:36 -0800 (PST)
Subject: floppy images - fix - Re: Tyan S2880
(K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To:
Message-ID:

hi again

ooppss ( fixing the "i dont have a drive" case at the bottom )

On Mon, 10 Nov 2003, Alvin Oga wrote:

> > Best I've been able to come up with is turning a floppy image into a
> > bootable cdrom.
> >
> > From one of the little howto's I've made for myself:
> > mkdir /tmp/foo
> > cp floppy.img /tmp/foo/
> > cd /tmp/foo
> > mkisofs -J -r -b floppy.img . > /tmp/floppy.iso
>
> if you're trying to do upgrades ... to a faked floppy drive
>
>
> http://www.linux-consulting/www-linux/Boot/Boot.Loop.txt
> - lots of other booting/installer stuff
>
>
> c ya
> alvin
>
> #
> # Loopback Device
> #
> #
> # http://burks.brighton.ac.uk/burks/linux/rute/node19.htm
> #
> #
> #
> # dd if=/dev/zero of=/dev/ram0 count=1440 bs=1024
> #
> dd if=/dev/zero of=/tmp/floppy count=1440 bs=1024
> #
> losetup /dev/loop0 /tmp/floppy
> mke2fs /dev/loop0
> #
> mount /dev/loop0 /mnt/test
> #
> # copy files to the loopback device ( will go to floppy later )
> #
> # get a copy of tom's root-boot for sample contents
> # to copy onto your faked floppy version
> # ( shows you what files are required
> #
> # - use your minimized kernel and no modules
> #
> ls -al /mnt/test
> #
> umount /mnt/test
> losetup -d /dev/loop0
> #
> #
> # Now copy the data to the floppy drive
> #
> # dd if=/dev/ram0 of=/dev/fd0 count=1440 bs=1024
### > dd if=/tmp/floppy of=/dev/fd0 count=1440 bs=1024
dd if=/tmp/floppy of=/tmp/initrd.fakefloppy count=1440 bs=1024
gzip /tmp/initrd.fakefloppy
#
# your new boot stuff
#
cp /tmp/initrd.fakefloppy.gz /boot
#
# - look mahh, no /dev/fd0 :-)
#
> #
> # End of file
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or
unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 10 19:45:55 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 10 Nov 2003 19:45:55 -0500 (EST) Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: Message-ID: On Mon, 10 Nov 2003, Alvin Oga wrote: > > hi ya art > > On Mon, 10 Nov 2003, Arthur H. Edwards wrote: > > > I think your point about newer package management tools is > > well-taken. I have tried the apt for rpms (when I was running the free > > scyld distribution) and it was clealy better. I have not tried yum, > > but I have not had enough (any) frustration with apt-get, and now > > apt-proxy, to make a move desirable. I also agree that my attachment > > is more to open-source than to Debian per-se, although after using RH, > > SUSe, and even turbo, Linux, I have stuck with Debian. I actually wish > > SUSe (now part of Novell) well, and I am sorry to see the demise of RH > > as we know it, because they are where increased user base comes > > from. However, I don't know whether SUSe will have better luck at > > generating revenue than did RH, and they may well go the same > > direction. It is that possibility that makes me think that Debian, or a > > similar, volunteer-based distribution may have the greater longevity. > > good point ... > > - i think that "volunteer-based distro" will survive all the commercial > methodologies ... > - commercial folks are out to make $$$$ to attempt to cover the > costs of marketing, sales, advertisement and analysts expectations > > - voluteers do what they do, because its what they like doing and will > probably continue doing so for the next few eons Alvin and Arthur, I don't think either one will disappear anytime soon and think that we've entered an era where the two can exhibit excellent synthesis. 
There is nothing wrong with commercial distributions, or commercial distributions making money, as long as they remember: a) They don't own their product. b) They are therefore at best selling added value, such as support. c) This puts pretty strict limits on what they can sanely charge. Some of the major distributions may be forgetting c) just a bit, but the market will correct this soon enough:-) Or maybe this is just wishful thinking and some marketing hype to give their stocks a bit of a bounce. Truthfully, the commercial distributions and individual/volunteer developers have achieved a moderately healthy synergism in all the various Linuxes. In fact, the commericial folks have provided a variety of valuable services -- collecting packages, running a moderately systematic debugging service (which actually has worked adequately for serious bugs, however poorly/slowly it works for ignorable/annoying bugs), applying critical patches, contributing whole applications and much needed support services to the development process. I certainly don't begrudge them a living and have gone out of my way to buy something from them periodically in the hope that the golden goose stays fat (enough) and will continue laying. I just think it is pretty silly of them to try to charge as much or more than their major commercial competitors for a product that they don't own and (for the most part) didn't develop. They won't get it and in the meantime they'll irritate a lot of people they should be (and have in the past) been working with, as well as for. On the other hand, it is perfectly reasonable to try to re-engage the volunteer community in the development process and to give the notion of "supported versions" a bit of a goose. In recent years, we all may literally have become somewhat complacent, trusting the commercial groups to PROVIDE those valuable services without our strong participation. If this is all Fedora (for example) is about, I'm all for it. 
However, the whole process >>has<< already started spawning new alternatives (such as caos) and I think "the community" has plenty of capacity to support plenty of non-commercial or very low margin alternatives (as one would expect, given the existence of Gnu itself, freebsd, debian, all of which support themselves by means of low margin events like T-shirt sales and donations). So the commercials may find that they've created something of a monster. Or four. Overall, I'm less cynical about the process than I was a month ago because I see some benefit that could arise from it. Companies like Red Hat, Mandrake, SuSE/Novell may learn from all this what the limits are on what they can charge and what added value they need to provide and where to earn a fair living (in contrast to a gross Bill Gates billionaire profit). They also are firmly reminded that they need us (the "volunteers" who as often as not own the software they sell) more than we need them. As I've said before and will repeat -- they aren't going to get anything like the prices they wish to charge businesses for "rawhide", and they >>must<< have a meaningful rawhide and community development process in order to sustain Linux's legendary stability and universal utility. The "community" may be reminded that although the software is free, supporting it in a distributional form isn't free, and if they don't take steps to fairly compensate whoever it is that is providing the service, they'd better make arrangements to do it themselves individually or collectively. The stress will also very likely create a spate of new programs, which is a very good thing. Some of them may even be revolutionary products, as I think we're about to enter an era where computers build themselves an operating environment from certified open sources in real-time and on demand, except where administrators deliberately do or mirror a prebuilt repository based on the same tools and sources for efficiency reasons. 
The GPL sanctifies the source package, new XML-based packaging schema and the web itself can guarantee cross-linux, maybe even cross-linux+BSD combined build compatibility. That is, SOMEWHERE in the very near future I think we are going to see the emergence of a completely new paradigm, one that finally ends the era of commercial software as we have known it in the past. The requisite tool components are all there, the broadband connections to the home required to sustain it are there, and I think the creative juices are cooking in developers' minds. The coming revolution will make even the notions of java and .net look tame and in retrospect a bit silly, as the entire notion of java and source application delivery will be just a tiny fraction of a source application delivery system that can and will deliver the entire operating system and all derivative tools (including java). C, perl, python, java, html, php -- sources of all sorts bundled into GPL packages with attached development processes and delivered directly to your system on demand for prices ranging from nothing (as most web services are delivered today) to a trivial amount for a snazzy subscription/security service. Hmm, sounds like a whole new .com concept, doesn't it? But I really think that is where we are going, fairly rapidly now. I do think that consumer Linux, especially, will suffer tremendously (indeed already is) from the largely unnecessary and unjustifiable price inflation -- as if Linux is somehow more expensive to support (badly) than Windows. 2004 should have been the year of consumer linux, and still would be if Red Hat would get out there with a $35 box set of the absolute kitchen sink followed by perhaps $25/year full update support per household. Or even less -- they now risk the community providing the installation and update support for free so efficiently that they can't charge even this. You can get pretty rich selling $15 objects to people, if you sell them to a LOT of people. 
Just ask J. Rowling (and her publisher). But who would buy even the best of Harry Potter books for $150 a copy? Especially when they could get pretty much the same book for free, or in an inexpensive $5 paperback version. Sounds like Econ 141 time -- I vaguely remember some nifty concepts such as "supply and demand", and "elastic and inelastic markets". We'll see if RH, Mandrake, SuSE/Novell remember them too. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Nov 11 10:05:10 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 11 Nov 2003 09:05:10 -0600 Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: <3FB0FAA6.2030502@tamu.edu> One key element to look at is fabric speed. If the backplane can't keep up to wire-speeds, you're going to suffer some slowdown and latencies associated with the network. Whether that's a problem in your installation, or not, we can't tell from this range. However, if there's sufficient money I'd be buying the most capable switch I could from a backplane and port sustainability point as I could. gerry Keyan Mehravaran wrote: > Hi, > > I am planning to connect 8 dual Xeon PCs > with onboard gigabit through a switch and > I only need access to the "zeroth" node. > I have two questions: > > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > > 2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? 
> If the answer is yes, then should all the > 16 ports connect to the same switch? > > Please advise. > > Thank You, > > Kian Mehravaran > Research Assistant > 4110 Engineering > Michigan State University > East Lansing, MI 48823 > > __________________________________ > Do you Yahoo!? > Protect your identity with Yahoo! Mail AddressGuard > http://antispam.yahoo.com/whatsnewfree > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From keyanm at yahoo.com Tue Nov 11 09:24:45 2003 From: keyanm at yahoo.com (Keyan Mehravaran) Date: Tue, 11 Nov 2003 06:24:45 -0800 (PST) Subject: Gigabit Switch Message-ID: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Hi, I am planning to connect 8 dual Xeon PCs with onboard gigabit through a switch and I only need access to the "zeroth" node. I have two questions: 1) Is there any benefit to using "managed" switch rather than the "unmanaged" ones? 2) Is it possible to increase bandwidth by adding an extra gigabit NIC to each node? If the answer is yes, then should all the 16 ports connect to the same switch? Please advise. Thank You, Kian Mehravaran Research Assistant 4110 Engineering Michigan State University East Lansing, MI 48823 __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree From bob at drzyzgula.org Tue Nov 11 11:17:50 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Tue, 11 Nov 2003 11:17:50 -0500 Subject: Gigabit Switch In-Reply-To: <3FB0FAA6.2030502@tamu.edu> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> <3FB0FAA6.2030502@tamu.edu> Message-ID: <20031111111750.D9711@www2> I will add that the ability of a second gigabit adapter to add to the total bandwidth available to a node will depend a great deal on (a) the architecture of the node and (b) the adapter chosen. For example, two 32-bit gigabit adapters on the same PCI bus aren't going to do you much good, while two 64-bit adapters on separate PCI busses might. --Bob On Tue, Nov 11, 2003 at 09:05:10AM -0600, Gerry Creager N5JXS wrote: > > One key element to look at is fabric speed. If the backplane can't keep > up to wire-speeds, you're going to suffer some slowdown and latencies > associated with the network. Whether that's a problem in your > installation, or not, we can't tell from this range. However, if > there's sufficient money I'd be buying the most capable switch I could > from a backplane and port sustainability point as I could. > > gerry > > Keyan Mehravaran wrote: > >Hi, > > > >I am planning to connect 8 dual Xeon PCs > >with onboard gigabit through a switch and > >I only need access to the "zeroth" node. > >I have two questions: > > > >1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > > >2) Is it possible to increase bandwidth by > > adding an extra gigabit NIC to each node? > > If the answer is yes, then should all the > > 16 ports connect to the same switch?
From becker at scyld.com Tue Nov 11 12:32:32 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 11 Nov 2003 12:32:32 -0500 (EST) Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > I am planning to connect 8 dual Xeon PCs > with onboard gigabit through a switch and > I only need access to the "zeroth" node. > I have two questions: > > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? Frequently "managed" switches are a negative. An Ethernet switch should "just work". Providing configuration options just encourages setting the switch to flawed modes, such as forced-full-duplex or filtering packet types you thought you were not using. > 2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? Yes, you can marginally increase bandwidth. But it's not worth it. If you channel bond GbE, you'll likely get out-of-order packets on the receiving side and consume much more CPU to reassemble them. If you trunk, you will not see higher peak bandwidth, and may still suffer from bad cache or interrupt affinity effects. You should use separate switches for channel bonding. Although it's possible to use VLANs to avoid this, that brings us back to the switch configuration issue. And two half-size switches are less expensive than one full-size switch. Bottom line: use a single GbE channel unless there is a specific application reason to do otherwise.
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Rafael.Tinoco at sun.com Tue Nov 11 11:41:31 2003 From: Rafael.Tinoco at sun.com (Rafael David Tinoco) Date: Tue, 11 Nov 2003 14:41:31 -0200 Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: <3FB1113B.2090903@sun.com> ooops, sorry about one thing i forget.. to make bonding connections with 2 NICs in each node, you would have to have 2 VLANS in the switch, 1 nic for each node in vlan0 (for ex) and the other in vlan1. so bonding could work regards rafael david tinoco sun professional services - brazil rafael.tinoco at sun.com Keyan Mehravaran wrote: >Hi, > >I am planning to connect 8 dual Xeon PCs >with onboard gigabit through a switch and >I only need access to the "zeroth" node. >I have two questions: > >1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > >2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? > >Please advise. > >Thank You, > >Kian Mehravaran >Research Assistant >4110 Engineering >Michigan State University >East Lansing, MI 48823 > >__________________________________ >Do you Yahoo!? >Protect your identity with Yahoo! 
Mail AddressGuard >http://antispam.yahoo.com/whatsnewfree From hahn at physics.mcmaster.ca Tue Nov 11 13:27:34 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 11 Nov 2003 13:27:34 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: > > 1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". indeed. I usually take an Occam's approach to features, too. but I was chatting with a big-gbe switch vendor last week, and thought of a few features which could be useful for HPC: 1. suppose you could attach a QOS/TOS tag to small packets, and have the switch give them preferential treatment. for instance, if there's a congested port with a backlog, let small packets "cut" the queue. 2. the vendor claims that multicast is reliable. I've often pondered whether hardware multicast would be worth using in clusters, since it's going to be faster than even a tree-based software multicast (as MPICH/LAM do, I think). 3. it would be neat to be able to query performance/load/queueing stats from the switch on a per-port basis.
From Rafael.Tinoco at sun.com Tue Nov 11 11:39:51 2003 From: Rafael.Tinoco at sun.com (Rafael David Tinoco) Date: Tue, 11 Nov 2003 14:39:51 -0200 Subject: [Fwd: Re: Gigabit Switch] Message-ID: <3FB110D7.4030800@sun.com> hello keyan, We built a 16-node cluster for a project and used 2 gigabit NICs per node with Linux bonding (to balance load). There was no big change in bandwidth at all, because our cluster does not exchange much data over the network. For 2 NICs per node to pay off, I think your application would have to exchange A LOT of data between the nodes, and when I say A LOT, I really mean it!! hehe Our cluster was: 16 Sun V60 hosts with dual Xeon 2.8, 1 GB RAM and 2 SCSI drives. About the managed switches: I don't think there is any difference, but I'm not completely sure. regards rafael david tinoco sun professional services - brazil rafael.tinoco at sun.com Keyan Mehravaran wrote: >Hi, > >I am planning to connect 8 dual Xeon PCs >with onboard gigabit through a switch and >I only need access to the "zeroth" node. >I have two questions: > >1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > >2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? > >Please advise. > >Thank You, > >Kian Mehravaran >Research Assistant >4110 Engineering >Michigan State University >East Lansing, MI 48823 > >__________________________________ >Do you Yahoo!? >Protect your identity with Yahoo!
Mail AddressGuard >http://antispam.yahoo.com/whatsnewfree >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ranjansm at psu.edu Tue Nov 11 13:41:10 2003 From: ranjansm at psu.edu (Ranjan S. Mehta) Date: Tue, 11 Nov 2003 13:41:10 -0500 Subject: Peculiar Problem :Any Help would be appreciated Message-ID: <3FB12D46.9010300@psu.edu> Hi all, I have a serial application, which allws me to hook into itself, using dynamically shared objects ( .so ). I wanted to use this .so file to do some parallel processing of the data, which I get from the application at run-time and then feed the results back. Now to add more complexity, I have to do it in Fortran !! ( my advisor needs it like that ). Has anyone done something like this. If yes, any help is appreciated. If NO, tell me why it cannot be done. Please comment, Thanks and regards Ranjan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Nov 11 15:27:48 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 11 Nov 2003 21:27:48 +0100 (CET) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > 1. suppose you could attach a QOS/TOS tag to small packets, and > have the switch give them preferential treatment. for instance, > if there's a congested port with a backlog, let small packets > "cut" the queue. That means that you actually have a backlog. 
Many switches, especially the cheap ones, have small buffers that can be filled fast, so further (small) packets might not make it to the switch at all. Furthermore, by letting packets go out-of-order, you make life harder for the receiver... > 2. the vendor claims that multicast is reliable. See Donald's answers to this very question in this thread: http://marc.theaimsgroup.com/?l=linux-net&m=106665132425192&w=2 > 3. it would be neat to be able to query performance/load/queueing stats > from the switch on a per-port basis. That is actually one of the 2 things that I use out of a managed switch... Although I think the information should be available via SNMP, I use it only when I try to find out if there is some networking problem, which happens rarely enough that I have always used the switch CLI or web interface to do it. The second thing that I use from a managed switch is VLAN - not for splitting it in half for bonding, but for things like separating the control/login/NFS connection from the one used for parallel computation (in case of at least 2 NICs/node). -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From hahn at physics.mcmaster.ca Tue Nov 11 16:47:50 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 11 Nov 2003 16:47:50 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: > > 1. suppose you could attach a QOS/TOS tag to small packets, and > > have the switch give them preferential treatment. for instance, > > if there's a congested port with a backlog, let small packets > > "cut" the queue. > > That means that you actually have a backlog.
if you ever have two nodes sending to one node (e.g. gather), you will. > Many switches, especially the > cheap ones, have small buffers that can be filled fast, so further (small) right, but irrelevant. the topic is "is there any point to manageable or otherwise fancy switches?" > Furthermore, by letting packets go out-of-order, you make life harder for > the receiver... TCP is good at dealing with out-of-order. practically by definition! > > 2. the vendor claims that multicast is reliable. > > See Donald's answers to this very question in this thread: > > http://marc.theaimsgroup.com/?l=linux-net&m=106665132425192&w=2 I remember. the point is that the switch vendor claimed that full multicast was not lossy, contradicting Becker's claim. this vendor specializes in large, big-backplane chassis switches, so they might be right. it may be that Don was thinking of a cluster with multiple switches. I think latency is the real appeal of hw-supported multicast - if you want to do a barrier across 256 nodes, do you want a ~8-deep tree of user-level processes farming out your tinygrams (say, 8x50=400 us), or do you want a single 30 us multicast? > > 3. it would be neat to be able to query performance/load/queueing stats > > from the switch on a per-port basis. > > That is actually one of the 2 things that I use out of a managed switch... > Although I think the information should be available as SNMP, I use it > only when I try to find out if there is some networking problem, which is > rarely enough, so I always used the switch CLI or web interface to do it. which misses the point. I'd like to be able to let a user find out that his node 12 is always bottlenecking the sim because of a problem with his domain decomp, for instance. summary stats are not useful, only per-port (preferably per-flow, but...) regards, mark hahn.
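A rough check of the tree-versus-multicast arithmetic in the message above: the sketch below counts how many forwarding levels a software tree needs to reach every node and multiplies that by a per-hop cost. The 50 us per hop comes from the message's own guess; the function names and the binary fan-out are assumptions made for illustration, and the model ignores acknowledgements, retransmissions and switch queueing.

```python
def tree_depth(nodes, fanout=2):
    """Levels needed for a tree-style broadcast, assuming the number of
    informed nodes grows by a factor of `fanout` at every level."""
    depth, reached = 0, 1
    while reached < nodes:
        reached *= fanout
        depth += 1
    return depth


def tree_latency_us(nodes, per_hop_us, fanout=2):
    """Latency estimate: one user-level forwarding hop per tree level."""
    return tree_depth(nodes, fanout) * per_hop_us


if __name__ == "__main__":
    nodes = 256
    print("tree depth:", tree_depth(nodes))
    print("tree latency (us):", tree_latency_us(nodes, per_hop_us=50))
```

With 256 nodes and a fan-out of 2 this gives 8 levels and 400 us, matching the "8x50" figure quoted above. A wider fan-out shortens the tree but makes each level send more messages, which this simple model does not charge for.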
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zarquon at zarq.dhs.org Tue Nov 11 19:40:41 2003 From: zarquon at zarq.dhs.org (zarquon at zarq.dhs.org) Date: Tue, 11 Nov 2003 19:40:41 -0500 Subject: Gigabit Switch In-Reply-To: <3FB1113B.2090903@sun.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> <3FB1113B.2090903@sun.com> Message-ID: <20031112004041.GB29819@earendel.org> On Tue, Nov 11, 2003 at 02:41:31PM -0200, Rafael David Tinoco wrote: > ooops, sorry about one thing i forget.. > > to make bonding connections with 2 NICs in each node, you would have > to have 2 VLANS in the switch, 1 nic for each node in vlan0 (for ex) and > the other > in vlan1. Depends on the switch. Some switches have a single mac address / port table, even if they have VLAN support. We had a big managed HP switch that behaved that way. R C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 11 19:07:16 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 11 Nov 2003 19:07:16 -0500 (EST) Subject: Peculiar Problem :Any Help would be appreciated In-Reply-To: <3FB12D46.9010300@psu.edu> Message-ID: On Tue, 11 Nov 2003, Ranjan S. Mehta wrote: > Hi all, > > I have a serial application, which allws me to hook into itself, using > dynamically shared objects ( .so ). > > I wanted to use this .so file to do some parallel processing of the > data, which I get from the application at run-time and then feed the > results back. > > Now to add more complexity, I have to do it in Fortran !! ( my advisor > needs it like that ). > > Has anyone done something like this. If yes, any help is appreciated. > > If NO, tell me why it cannot be done. 
> > Please comment, I think you'll have to do a better job of describing your program's expected flow, as I at least am very confused. You are making a library? With some sort of recursive call? You have to use Fortran (ooo, quel drag, mon!)? What has this to do with clusters? I can think of lots of ways to implement lots of things "like" what you may be trying to describe (but not in Fortran, which I took a vow never to code in again -- unless of course somebody offers me obscene quantities of money to do so:-). Most of them don't need shared libraries per se, which is one of the things I don't understand. The other is why beowulf list -- are you trying to invoke this recursion across a cluster or something? So that the "shared libraries" live on different systems? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Nov 11 19:37:38 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 11 Nov 2003 16:37:38 -0800 Subject: Gigabit Switch In-Reply-To: References: Message-ID: <20031112003738.GA4558@greglaptop.internal.keyresearch.com> On Tue, Nov 11, 2003 at 04:47:50PM -0500, Mark Hahn wrote: > TCP is good at dealing with out-of-order. practically by by definition! Oh? Please share some test results. The reality is that out-of-order packets are a moderate load on the CPU at best, and Linux isn't exactly great at handling them, especially with multiple cpus and multiple interfaces. 
> I think latency is the real appeal of hw-supported multicast - if you > want to do a barrier across 256 nodes, do you want a ~8-deep tree of > user-level processes farming out your tinygrams (say, 8x50=400 us), > or do you want a single 30 us multicast? A reliable barrier built using unreliable multicast isn't 30 usec. And MPI programs rarely have barriers. Perhaps you know of an MPI application that really needs a barrier operation? I don't think I've run into one yet. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Nov 11 22:33:44 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 12 Nov 2003 11:33:44 +0800 (CST) Subject: Gridengine 6.0 (new features) Message-ID: <20031112033344.1533.qmail@web16809.mail.tpe.yahoo.com> It's actually old news, but no one mentioned it on this list: http://gridengine.sunsource.net/workshop22-24.09.03/proceedings.html They have presented a lot of new SGE 6.0 features, and i think the most famous one is the cluster queues, and the most interesting one is the SGE P2P client (just like UD or SETI at home). Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel.leiva at uam.es Wed Nov 12 07:58:07 2003 From: angel.leiva at uam.es (Rafael Angel Garcia Leiva) Date: Wed, 12 Nov 2003 12:58:07 +0000 Subject: Q: ATM Beowulf Message-ID: <200311121258.07155.angel.leiva@uam.es> Hi everybody, I am planning to build a cluster (around 300 nodes) for Monte-Carlo simulation. 
I will run the same program, but with different input data files, on each node. I expect that the computation time is much greater than communication time, and that I will have to transfer large amounts of (input and output) data files from working nodes to the master server. Does it make sense to use LAN emulation over ATM for this kind of cluster? Has anyone experimented with ATM interconnections? Do you think it is cost-effective today (especially compared to Fast / Gigabit Ethernet)? Thanks in advance. -- Rafael Angel Garcia Leiva Universidad Autonoma Madrid http://www.uam.es/angel.leiva From rauch at inf.ethz.ch Wed Nov 12 07:05:04 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 12 Nov 2003 13:05:04 +0100 (CET) Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? For a cluster scenario I would go for the unmanaged switch, since management features might reduce the performance of the switch and are not really needed. In a cluster, all that you want your switch to do is layer-2 switching. It is important that you know your switch's real performance. Sometimes, technical data sheets might help to find out about the capabilities of your switch, but sometimes data sheets are "inaccurate" (not to say "written based on wishful dreaming"). We described such a switch and how we found out about its capabilities in our paper "Cost/Performance Tradeoffs in Network Interconnects for Clusters of Commodity PCs" [1]. So, before you order a cluster, specify exactly what your switch must be able to do.
If it doesn't fulfil your specification, you might get a free upgrade ;-) - Felix [1] http://www.inf.ethz.ch/~rauch/#cac03 -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Nov 12 07:54:07 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 12 Nov 2003 13:54:07 +0100 (CET) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > right, but irrelevant. the topic is "is there any point to managable > or otherwise fancy switches?" Sorry, but I don't see the "irrelevant" part. You mentioned QOS/TOS and I replied that I don't see it as an advantage... Anyway, another point is that unmanaged switches with large number of ports are not so common. Maybe your switch vendor can explain why ? > TCP is good at dealing with out-of-order. practically by by definition! Sure. But at what cost ? Do you want to do some computation too on that node ? :-) If you talk about multicast, you eliminate TCP from the discussion. Then how do you synchronize between data transmitted over TCP and some zero-payload (barrier) sent through other protocol ? The stack guarantees in-order delivery of data for the same socket, not for different sockets or even more for different protocols. > the point is that the switch vendor claimed that full multicast was not > lossy, contradicting Becker's claim. Not only the switch has to be non-lossy but the stack as well. Packets can be dropped for example at network driver level (let's say Rx overrun) or simply transmission errors, so the forwarding logic in the switch is not involved. 
> this vendor specializes in large, big-backplane chassis switches, so > they might be right. I never thought of this but I can probably build a switch that never drops a correctly received packet by putting insane amount of buffers on it. But at what price (money as well as performance) ? And what do you do with transmission errors ? > or do you want a single 30 us multicast? Pardon me, but "reliable" and "single" do not match in my view. Reliable to me means that receiver acknowledges, which can be done by unicast or multicast again. And the problem (and latency) multiplies... > I'd like to be able to let a user find out ... You're too kind to your users :-) I've never been given information related to communication problems on any parallel computer I tried to make CHARMM run on for my group. Sure, I often got CPU performance counters, but never communication parameters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Tue Nov 11 15:13:08 2003 From: nixon at nsc.liu.se (nixon at nsc.liu.se) Date: Tue, 11 Nov 2003 21:13:08 +0100 Subject: Gigabit Switch In-Reply-To: (Donald Becker's message of "Tue, 11 Nov 2003 12:32:32 -0500 (EST)") References: Message-ID: Donald Becker writes: > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". > Providing configuration options just encourages setting the switch to > flawed modes, such as forced-full-duplex or filtering packet types you > thought you were not using. On the other hand, in the real world autonegotiation doesn't always work. 
And when you get in that spot, it's *very* nice to be able to lock down a port's mode. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From rgb at phy.duke.edu Wed Nov 12 09:10:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 12 Nov 2003 09:10:35 -0500 (EST) Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> Message-ID: On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote: > > Hi everybody, > > I am planning to build a cluster (around 300 nodes) for Monte-Carlo > simulation. I will run the same program, but with different input data files, > on each node. I expect that the computation time is much greater than > communication time, and that I will have to transfer large amount of (input > and output) data files from working nodes to the master server. > > Does make sense to use LAN emulation over ATM for this kind of clusters? Has > anyone experimented with ATM interconnections? Do you think is it > cost-effective today (specially compared to Fast / Gigabit Ethernet)? From what you describe, it perhaps depends on what "large amounts of input and output files" works out to in more detail, but the answer is almost certainly not. The problem is embarrassingly parallel (completely independent programs) which makes it relatively easy to figure out how performance is likely to depend on the actual sizes (transfer times) of the programs relative to their run time. What you probably need to do is set up (or borrow from a friendly vendor -- most serious cluster vendors have a test cluster and will cheerily loan you an account) a few test nodes with gigabit interconnects.
Measure the time it takes to actually run your program alone, then the time it takes to run your program WHILE copying its "next" input data set in and its "last" data set out (without any sort of e.g. ssh encryption -- use as raw as possible a data transfer tool). Depending on how effective your NIC is at doing DMA transfers and how I/O bound the MC code is, copying large files while your job is running may not count as a "serial penalty" against your CPU/memory bound computation. It will also give you a pretty accurate idea of what the actual transfer times are on Gbps ethernet relative to run times. This in turn will give you some clue as to required server capacity and whether or how to distribute/gather the files from a single server or multiple servers (whether or not this will help will of course depend on what use you make of the files when you get them). Part of the problem with your question is that as you frame it nobody can answer it -- yet. It requires detailed data. If by "large" you mean a 10 MB input file (which is yeah, pretty large) and a 100 MB output file (ditto), well, that is roughly 1-2 seconds on a 100BT connection for input transfer, 10-15 seconds for output transfer. If the program runs for 24 hours per input and output transfer, well, you could run on 300 nodes with 100BT and never warm up the lines. If by "large" you mean 10x (100 MB in, 1 GB out), but still 24 hours computation it would STILL run pretty perfectly on 100BT. If by "large" you mean ANOTHER 10x (1 GB in, 10 GB out), you're finally up to a significant fraction of an hour for the data transfer at 100BT relative to a daylong run. However, at 1000BT you are still on the order of minutes of I/O total (maybe 90 Gbits to transfer on a 1 Gbps line at perhaps 50% efficiency -- three minutes or so?) and keeping all 300 nodes fed takes only 900 minutes (fifteen hours), which is less than the 1440 minutes of a day.
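As a rough check on the estimates above, here is a minimal Python sketch of the same arithmetic. The helper names (transfer_seconds, server_keeps_up) and the flat 50% link efficiency applied to both 100BT and gigabit are assumptions made for illustration, not measurements; real throughput depends on the NIC, the disks and the transfer tool.

```python
def transfer_seconds(megabytes, link_mbps, efficiency=0.5):
    """Seconds needed to move `megabytes` (10**6 bytes each) over a link
    rated at `link_mbps`, assuming only `efficiency` of that rate is usable.
    The 0.5 default is an assumed figure, not a measured one."""
    bits = megabytes * 8e6
    return bits / (link_mbps * 1e6 * efficiency)


def server_keeps_up(nodes, io_seconds_per_job, run_seconds_per_job):
    """True if a single server can finish the serial I/O for every node,
    one node at a time (round robin), within one job's run time."""
    return nodes * io_seconds_per_job <= run_seconds_per_job


if __name__ == "__main__":
    day = 24 * 3600  # assumed run time per job, in seconds
    # (input MB, output MB) for the three cases discussed above
    cases = [(10, 100), (100, 1000), (1000, 10000)]
    for mb_in, mb_out in cases:
        for link in (100, 1000):  # 100BT and gigabit, in Mbps
            t = transfer_seconds(mb_in + mb_out, link)
            ok = server_keeps_up(300, t, day)
            print(f"{mb_in + mb_out:>6} MB per job at {link:>4} Mbps: "
                  f"{t:8.1f} s per node, one server enough for 300 nodes: {ok}")
```

Under these assumptions the largest case (11 GB per job) comes out at 176 seconds per node on gigabit, in line with the "three minutes or so" quoted above. Replacing the assumed efficiency with a number measured on a few test nodes is the obvious next step.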
So a single server with Gbps ethernet could distribute and collect results from 15+ hour long computations with 3 minutes of pure serial I/O per computation on 300 nodes and (barely) not block. If your NIC and disk channel manage DMA and can run modestly in parallel with your computation, it simply improves things. Obviously the important thing is the RATIO of computation to additional per node serial communication (assuming optimal round-robin task organization); if this ratio remains at roughly 300:1 a single server with the cheapest network that can sustain the ratio should suffice. If the ratio is less than this, you have to start to think. For example, would it be better (or even possible) to a) channel bond to increase bandwidth and decrease serial I/O time; b) use more than one server (servers are cheap relative to high end networks, and fortunately your task can use stacked relatively cheap switches as you'll only use a single channel at a time to the nodes in round robin -- IIRC gigabit ethernet gets more expensive than alternative high end networks if you insist on putting 300 nodes on a single switching fabric); c) use a high end network. I don't usually think of ATM in this last category -- I'd think of Myrinet or SCI, probably the latter because it is switchless and perhaps a bit cheaper per node while still adequate, and you're VERY LIKELY to be able to find a task organization where it is adequate. Also, I'm pretty sure both Myrinet and SCI use DMA very effectively and will largely parallelize the actual data transfer with your computation. Ultimately, when you work out the actual numbers for the different networks (at least approximately) you have to do a cost benefit analysis and just pick the cheapest alternative that will scale to 300 nodes. Fortunately, with an EP computation it is pretty easy to actually do this very systematically and be pretty confident that you have a near-optimal design. rgb > > Thanks in advance. > > -- Robert G.
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From john.hearns at clustervision.com Wed Nov 12 08:56:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 12 Nov 2003 14:56:20 +0100 (CET) Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> Message-ID: On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote: > > Does make sense to use LAN emulation over ATM for this kind of clusters? Has > anyone experimented with ATM interconnections? Do you think is it > cost-effective today (specially compared to Fast / Gigabit Ethernet)? The list will know that a few years ago I was very enthusiastic about ATM. I put in a leading edge ATM network at a hospital in the UK for medical imaging. I was a proponent of using ATM for clustering. These days, I would say the idea is not so good. You get built-in Gigabit network interfaces on many motherboards, and Gigabit switches are really cheap. From ZukaitAJ at nv.doe.gov Wed Nov 12 09:43:21 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Wed, 12 Nov 2003 06:43:21 -0800 Subject: mpirun + Scyld MPI Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E2B@lao-exchpo1-nt.nv.doe.gov> I am currently using the MPI distributed with Scyld, which I believe is MPICH. I have 6 dual-CPU nodes for a total of 12 CPUs. Whenever I try to use 12 processors it puts 3 processes on one of the nodes and only one process on the master node.
I have tried using a machinefile like master:2 .0:2 .1:2 .2:2 .3:2 .4:2 and -map and it doesnt seem to help. Any hints? -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Friday, November 07, 2003 10:04 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1533 - 13 msgs Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. Re:Scyld and MPICH. (William Gropp) 2. Re:Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux (Glen Kaukola) 3. Tyan 2880 and 2885 (Mike Sullivan) 4. Article: Sony Cell CPU to deliver two teraflops in 64-core config (Tod Hagan) 5. Re:Cluster Poll Results (tangent into OS choices) (=?iso-8859-1?Q?=C5smund_=D8deg=E5rd?=) 6. Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Rayson Ho) 7. INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL (Joey Sims) 8. Re:Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Craig Rodrigues) 9. Re:Article: Sony Cell CPU to deliver two teraflops in 64-core config (John Hearns) 10. OctigaBay 12K (Franz Marini) 11. Re:OctigaBay 12K (Robert G. Brown) 12. Re:Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Jan Schaumann) --__--__-- Message: 1 Date: Thu, 06 Nov 2003 11:48:24 -0600 To: "Zukaitis, Anthony" From: William Gropp Subject: Re: Scyld and MPICH. 
Cc: "'beowulf at scyld.com'" , mpi-maint at mcs.anl.gov At 10:55 AM 11/6/2003, Zukaitis, Anthony wrote: >I am having a problem with MPI_reduce and I believe that it is a buffer size >error. Is there a way to calculate the maximum size of the buffer and what >is the maximum size of the buffer allowed? It does not seem to be linear >with the number of processors. There should be no maximum buffer size, though the ch_p4 device does impose a limit when shared memory is used to transfer a message. Do you have an example program that we could test (Bug reports for MPICH should be sent to mpi-maint at mcs.anl.gov) Bill --__--__-- Message: 2 Date: Thu, 06 Nov 2003 10:37:59 -0800 From: Glen Kaukola To: Konstantin Kudin CC: beowulf at beowulf.org Subject: Re: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux Konstantin Kudin wrote: > Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? > > We have a few of the s2880's. They were real problematic at first in that they'd constantly crash. But it turned out that when I downgraded the bios, all of our problems went away. Of course I also needed to install the latest 2.4.22 kernel before the machines would boot with the older bios installed. I'm not sure what to tell you about the serial ata support, as I've never played with it. Linux seems to support the nic just fine though. Hope that helps, Glen --__--__-- Message: 3 Date: Thu, 06 Nov 2003 13:39:51 -0500 From: Mike Sullivan Reply-To: mike.sullivan at alltec.com To: beowulf at beowulf.org Subject: Tyan 2880 and 2885 >Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? I have used the 2880 under RedHat AS 2.1 and gingin64 and it works fine execpt for the SATA controller. I did not get the promise chip to work but did not spend a lot of time on it. The GigE interface works. 
The board was stable and I have been using them in NAS devices with 3ware cards. The SMDC option for these units works fairly well with the most recent console and you can get sensor data. > It seem like there are drivers for AMD-8111/8131/8151 >chipset on the AMD page, drivers for the Broadcom >network chip in other places. Any feedback on SATA >support for the Silicon Image Sil3114 SATA RAID >Accelerator and on SATA support in general? Any other >caveats? I also have both a 2882 and 2885 that I will be testing early next week with Suse Linux 9 for AMD64 and would will post my findings. Thanks in advance for any help! Konstantin -- Mike Sullivan Director Performance Computing @lliance Technologies, Voice: (416) 385-3255 x 228, 18 Wynford Dr, Suite 407 Fax: (416) 385-1774 Toronto, ON, Canada, M3C-3S2 Toll Free:1-877-216-3199 http://www.alltec.com --__--__-- Message: 4 Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config From: Tod Hagan To: Beowulf List Date: 06 Nov 2003 15:02:25 -0500 http://www.theregister.co.uk/content/3/33791.html It also mentions the ClearSpeed chip that was discussed here recently. --__--__-- Message: 5 Date: Thu, 06 Nov 2003 23:52:28 +0100 To: beowulf at beowulf.org Subject: Re: Cluster Poll Results (tangent into OS choices) Reply-To: aasmund at simula.no From: =?iso-8859-1?Q?=C5smund_=D8deg=E5rd?= Organization: Simula Research Laboratory AS On Wed, 5 Nov 2003 00:05:13 +0000, Andrew M.A. Cater wrote: > > On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: >> >> There are interesting bits in debian. I am not sure it is necessarily >> the right choice for clusters due to the specific lack of commercial >> support for cluster specific items such as Myrinet, and the other high >> speed interconnects. > > Dan - if I build a _really big_ cluster, will you get Quadrics to do > Debian :) > Same goes for any other vendor - if you ask them nicely and make it > worth their while, they'll do it. 
In many cases, it's only a recompile > of a device driver to account for library differences, after all. > > HP use Debian internally, IIRC. Some of the Debian developers are also > HP folk - HP are potentially looking to support more of their products > under Linux? [See, for example, Debian Weekly News for today :) ]' Actually, we have quite recently installed a Itanium2 based cluster, using debian, because we want debian. We got HP to do it for us, using the (former Compaq) CMU tool. They did some porting to support debian in this tool... So, ask nicely (and put it as a requirement to let them get the deal), and you can get what ever you want ;-) >> Commercial compiler support for Debian (e.g. >> Intel, Absoft, et al) is largely non-existant as far as I know (please >> do correct me if I am wrong). No problem with Intel compilers on Debian (alien do the trick). -- [simula.research laboratory] ?smund ?deg?rd Scientific Programmer / Chief Sys.Adm phone: 67828291 / 90069915 http://www.simula.no/~aasmundo --__--__-- Message: 6 Date: Thu, 6 Nov 2003 16:59:51 -0800 (PST) From: Rayson Ho Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) To: bioclusters at bioinformatics.org, beowulf , Linux Cluster , List A very good paper about building HPC clusters with FreeBSD: "Building a High-performance Computing Cluster Using FreeBSD" http://people.freebsd.org/~brooks/papers/bsdcon2003/ The author talked about hardware issues: KVM, BIOS redirection, CPU choices; and then talked about why he chose FreeBSD instead of Linux... he also did the port of GridEngine (SGE) to FreeBSD. Anyone tried to setup HPC clusters with *BSD?? Rayson --- Fernan Aguero wrote: > Any FreeBSD users willing to share clustering experiences > out there? > > Fernan __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree --__--__-- Message: 7 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Date: Thu, 6 Nov 2003 22:07:53 -0500 From: "Joey Sims" To: Maybe someone could lend a hand and help Intel find out what their unknown material is. Be careful! Don't spill it in your lap for goodness sake.... Dohh! :-O I found this amusing: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL 11.07.03 by Jennifer Tabor HPCwire ======================================================================== ====== Chip makers are searching for ways to create smaller and smaller computer chips, and researchers at Intel believe they have discovered a new material that would help them to do just that. Intel's announcement will garner much attention in an industry where the demand for products that push fundamental physical limits is ever increasing. A problem afflicting many chip makers today is the prevention of electrical currents from leaking outside their proper patches. Because the transistor gates are now becoming as small as just five atomic layers, chips need more power. In turn, they also need a more efficient cooling system. Intel has been having difficulties with the cooling of its chips -- the smaller they get (with etchings as small as 90-130 nanometers), the hotter they become. Recent reports say that the problem has even caused a delay in the Prescott, Intel's most advanced version of the Pentium. Though the new technology would not debut until approximately 2007, Intel is planning to scale down their current 90 nanometer chip size over the years to 65, followed by 45. It is at this point that Intel's new material, which is still unknown, would be introduced. Intel's discovery comes at the height of an intense industry wide search for a new material to replace silicon dioxide, which is used as insulator between the gate and the channel through which current flows in an active transistor. 
Intel researchers have been working on solving the chip predicament for five years in efforts to keep pace with Moore's Law. Gordon E. Moore, co-founder of Intel, believed that the number of transistors in the same space should double every 18 months. Intel believes they can continue to make short strides, despite the thoughts of many who doubt their ability to keep up such a pace. Though many researchers and competitors agree that Intel's announcement revolves around the most important research area in the chip industry, some feel that the lack of specific technical detail will deter scientists from assessing their claims. ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== --__--__-- Message: 8 Date: Thu, 6 Nov 2003 23:04:15 -0500 From: Craig Rodrigues To: Rayson Ho Cc: bioclusters at bioinformatics.org, beowulf , Linux Cluster , List Subject: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) On Thu, Nov 06, 2003 at 04:59:51PM -0800, Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? Hi, Not quite the same as an HPC cluster, but take a look at the University of Utah's Emulab: http://www.emulab.net It is heavily based on FreeBSD (i.e. makes use of FreeBSD routing, Dummynet, etc.). The Emulab is a remotely accessible testbed that researchers can use to conduct network experiments. It consists of about 200 PC nodes. 
The same company that Brooks works for (Aerospace), has apparently set up an internal testbed based on the Emulab software developed at Utah. I use the Emulab every day as party of my research work at BBN, and it is an excellent facility. -- Craig Rodrigues http://crodrigues.org rodrigc at crodrigues.org --__--__-- Message: 9 Subject: Re: Article: Sony Cell CPU to deliver two teraflops in 64-core config From: John Hearns To: beowulf at beowulf.org Organization: Clustervision Date: Fri, 07 Nov 2003 10:13:40 +0100 And also on The Reg: http://www.theregister.co.uk/content/3/33813.html The Reg reckons Opteron 250s by early next year. --__--__-- Message: 10 Date: Fri, 7 Nov 2003 13:56:28 +0100 (CET) From: Franz Marini To: beowulf at beowulf.org Subject: OctigaBay 12K Hello, just discover this interesting, imho, company and its first product : http://www.octigabay.com/ Their first product is a linux opteron-based cluster that they said could scale up to 12K processors. The base system is a 3.5U shelf with 12 opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor latency and 77GB/s aggregate mem bandwidth. Seems nice, I would like to know what rgb and some of the other people in here think about it :) Have a nice day, Franz --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it --------------------------------------------------------- --__--__-- Message: 11 Date: Fri, 7 Nov 2003 08:44:11 -0500 (EST) From: "Robert G. Brown" To: Franz Marini Cc: beowulf at beowulf.org Subject: Re: OctigaBay 12K On Fri, 7 Nov 2003, Franz Marini wrote: > Hello, > > just discover this interesting, imho, company and its first product : > > http://www.octigabay.com/ > > Their first product is a linux opteron-based cluster that they said > could scale up to 12K processors. 
The base system is a 3.5U shelf with 12 > opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor > latency and 77GB/s aggregate mem bandwidth. > > Seems nice, I would like to know what rgb and some of the other people > in here think about it :) Why, it looks simply lovely, as hardware I've never actually tried goes. I mean, if the octigabay people want to send me one for free just so I can write a review for it on this list and the brahma website, well, from the look of it I wouldn't kick it out of my machine room for chewing crackers... and I >>can<< be bought, folks, yes I can, just look at the brahma vendors page and my brazen demand for t-shirts in exchange for space:-) I'll even dig up something fine grained to run on it so that I can pretend to really test it. The bottom line is, well, the bottom line. Pretty isn't enough. Performance (even performance that is absolutely everything promised) isn't enough. It is PRICE performance that matters, or better yet cost-benefit. How does the cost compare to the benefits the design delivers in your environment. For my own personal code, for example, I don't NEED their fancy interconnect, and I can rack up a bunch of opterons for the cost of the basic hardware and a nice case to put them in. They'd therefore have to literally give it to me to make it a cost-benefit win (especially true since I just spent the last of my money in this grant cycle buying hey, whaddya know, a stack of 9 dual Opteron 242's for a hair over $20K). However, there are people out there who run fine grained synchronous parallel code that is bottlenecked at the network IPC level. Even THERE the computations have some intrinsic "value" in that there are finite amounts of money people are willing to pay to get them done, and there are choices. 
So ultimately it will come down to whether there is a match between the value of the computation (amount people are willing to pay to get it done), the needs of the computation, and the marketplace. It's one of these people that you need to ask about whether or not this is a good deal or good arrangement. My knee jerk reaction is that it is lovely but a bit too far into the big iron side (SP3-ish) to be likely to win a hard-nosed CB comparison relative to a DIY cluster with e.g. myrinet or SCI for MANY clustervolken (the market gets smaller and smaller the further up one travels to super-high-speed networks), but corporate consumers and the larger government consumers shy away from DIY, and even in the intermediate market it comes down to price/performance, eh? If they price it competitively with the other high speed networks and it has clear benefits (as it looks like it might) well then, who knows? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu --__--__-- Message: 12 Date: Fri, 7 Nov 2003 10:00:56 -0500 From: Jan Schaumann To: beowulf at beowulf.org Subject: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) [Resending; this message was originally sent last night across the various mailing lists, but beowulf at beowulf.org chokes on the gpg signature. :-/ ] Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? 
I have a 30 node NetBSD/i386 cluster, and just recently created the tech-cluster at netbsd.org mailing list. Some people are working on a port of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in particular for cluster usage in the near future. Some URLs of relevance: http://guinness.cs.stevens-tech.edu/~jschauma/hpcf/ http://www.netbsd.org/MailingLists/#tech-cluster http://www.netbsd.org/ http://eurobsdcon.org/papers/#souvatzis http://bsd.slashdot.org/article.pl?sid=03/10/20/1523252&mode=thread&tid=122& tid=185&tid=190 http://bsd.slashdot.org/bsd/03/11/05/1536226.shtml?tid=122&tid=185&tid=190 -Jan -- Life," said Marvin, "don't talk to me about life." --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Nov 12 10:58:38 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 10:58:38 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > indeed. I usually take an Occam's apprach to features, too. > but I was chatting with a big-gbe switch vendor last week, > and thought of a couple of features which could be useful for HPC: > > 1. suppose you could attach a QOS/TOS tag to small packets, and > have the switch give them preferential treatment. for instance, > if there's a congested port with a backlog, let small packets > "cut" the queue. QOS/TOS tags already exist, but consider very carefully before you wish for a LAN switch that observes them. Using QOS/TOS is a good idea on multitraffic, multipath WANs, where bulk transfer would otherwise block telnet-like traffic. 
But ACKs bypassing data packets on a LAN will likely lead to congestion, with higher overall latency and dropped packets. This is compounded by now-common flow control, which only works within the LAN. > 2. the vendor claims that multicast is reliable. "Our equipment is perfect, and is not limited by fundamental principles". > I've often pondered whether multicast would be worth using in > clusters, since it's going to be faster than even a tree-based > multicast (as MPICH/LAM do, I think). You can construct a set of hardware+protocol+tuning that will work for a demo. But we now use Ethernet switches, not repeaters. Consider that modern switched Ethernet has many of the same constraints as Myrinet, SCI, and IB, where multicast must be emulated. Again, multicast is good for service discovery and low-rate communication. Just don't rely on it for bulk data delivery. > 3. it would be neat to be able to query performance/load/queueing stats > from the switch on a per-port basis. You can get some of the info from the connected machines. But you likely want the time-averaged FIFO length and high-water-mark in packets and bytes, right? -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From becker at scyld.com Wed Nov 12 12:23:52 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 12:23:52 -0500 (EST) Subject: Q: ATM Beowulf In-Reply-To: Message-ID: On Wed, 12 Nov 2003, Robert G. Brown wrote: [[ Long, informative text deleted. ]] > Part of the problem with your question is that as you frame it nobody > can answer it -- yet. It requires detailed data.
If by "large" you > mean a 10 MB input file (which is yeah, pretty large) and a 100 MB > output file (ditto), well, that is roughly 1-2 seconds on a 100BT Recalibrate your idea of "large". We recently encountered an application that was using PVFS with 50MB files over GbE. With the PVFS filesystem spread over 16 servers, that's only 3MB per machine, or 30-40 msec. of data transfer to permute the file contents. The 50MB files were too small to ignore the start-up overhead and get an accurate performance baseline! -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From becker at scyld.com Wed Nov 12 11:46:10 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 11:46:10 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003 nixon at nsc.liu.se wrote: > Donald Becker writes: > > > Frequently "managed" switches are a negative. > > An Ethernet switch should "just work". > > Providing configuration options just encourages setting the switch to > > flawed modes, such as forced-full-duplex or filtering packet types you > > thought you were not using. > > On the other hand, in the real world autonegotiation doesn't always work. > And when you get in that spot, it's *very* nice to be able to lock > down a port's mode. The only autonegotiation problems I'm aware of are firmware bugs in early Cisco and 3Com switches. The switches would autonegotiate, but sometimes would not notice the parameter changes. To draw an automotive analogy: "sometimes starter motors fail, thus all cars should have a hand crank in the front". The proper solution is to replace it with working equipment.
A fallback is to disable autonegotiation and use 10/100 speed sensing half duplex. A flawed approach (unfortunately the one recommended by Cisco) was to force speed and full-duplex. A great thing about autonegotiation is that it is automatic, transparent and extensible. Most installations are now using Ethernet flow control. Because it is configured using autonegotiation, almost no one knows that they have it. Things just work better. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From asabigue at fing.edu.uy Wed Nov 12 06:16:48 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Wed, 12 Nov 2003 14:16:48 +0300 Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> References: <200311121258.07155.angel.leiva@uam.es> Message-ID: <3FB216A0.6070100@fing.edu.uy> Rafael Angel Garcia Leiva wrote: >Hi everybody, > >I am planning to build a cluster (around 300 nodes) for Monte-Carlo >simulation. I will run the same program, but with different input data files, >on each node. I expect that the computation time is much greater than >communication time, and that I will have to transfer large amount of (input >and output) data files from working nodes to the master server. > > I might be quite optimistic, but it looks like you do not need an expensive network at all. All you need is a good file-server and that's it. If you are not able to feed the nodes with a single fileserver, maybe you can rsync 6 of them and have each of them serve 50 nodes. You can have your own commodity-gigabit-SAN between servers and attach 2x24 port switches to them.
Let me emphasize this: I might be getting the wrong picture of your problem, but it seems that it tolerates jitter and delay. From your description I assume that there is no need for expensive synchronization. I think you will save lots of money and even get a better solution. Ariel >Does make sense to use LAN emulation over ATM for this kind of clusters? Has >anyone experimented with ATM interconnections? Do you think is it >cost-effective today (specially compared to Fast / Gigabit Ethernet)? > >Thanks in advance. > > >
From gerry.creager at tamu.edu Wed Nov 12 09:38:10 2003 From: gerry.creager at tamu.edu (Gerry Creager (N5JXS)) Date: Wed, 12 Nov 2003 08:38:10 -0600 Subject: Q: ATM Beowulf In-Reply-To: References: Message-ID: <3FB245D2.7060300@tamu.edu> To expand on that a little bit, the overhead associated with LANE is also going to be a problem for you, even if communications will be a small segment of your cluster activity. If your application were capable of talking directly at the ATM layer, without having to go through the framing and conversion issues of either LANE or IPOA, then you could see some advantages. However, sending 1500 byte... or 9kB... packets over Ethernet, taking advantage of the somewhat faster encoding we see for Ethernet today, far outstrips the potential benefits of ATM in a cluster environment. And making up 1500 byte packets, then resending them as 53 byte cells, with 10% overhead, just doesn't make sense anymore save in a connection-oriented network like a WAN or carrier network. And, fwiw, the carriers are also dropping ATM for GbE and 10GbE, carrying same over multiplexed lambdas in the glass now. Gerry John Hearns wrote: > On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote: > > >>Does make sense to use LAN emulation over ATM for this kind of clusters? 
Has >>anyone experimented with ATM interconnections? Do you think is it >>cost-effective today (specially compared to Fast / Gigabit Ethernet)? > > > The list will know that a few years ago I was very enthusuastic > about ATM. I put in a leading edge ATM network at a hospital in > the UK for medical imaging. I was a proponent of using ATM for clustering. > > These days, I would say the idea is not so good. > You get built-in Gigabit network interfaces on many motherboards, > and Gigabit switches are really cheap. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Office: 979.458.4020 FAX: 979.847.8578 Cell: 979.229.5301 Pager: 979.228.0173 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 10 21:34:45 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 10 Nov 2003 18:34:45 -0800 (PST) Subject: Linux vs FreeBSD clusters ... In-Reply-To: Message-ID: hi ya robert - i truncated the long subject ... hope it didnt mess up anybody's procmail On Mon, 10 Nov 2003, Robert G. Brown wrote: > > good point ... > > > > - i think that "volunteer-based distro" will survive all the commercial > > methodologies ... 
> > - commercial folks are out to make $$$$ to attempt to cover the > > costs of marketing, sales, advertisement and analysts expectations > > > > - voluteers do what they do, because its what they like doing and will > > probably continue doing so for the next few eons > > Alvin and Arthur, > > I don't think either one will disappear anytime soon and think that > we've entered an era where the two can exhibit excellent synthesis. > There is nothing wrong with commercial distributions, or commercial > distributions making money, as long as they remember: > > a) They don't own their product. > > b) They are therefore at best selling added value, such as support. > > c) This puts pretty strict limits on what they can sanely charge. > > Some of the major distributions may be forgetting c) just a bit, but the > market will correct this soon enough:-) Or maybe this is just wishful > thinking and some marketing hype to give their stocks a bit of a bounce.

yes.. that's the problem ... all the others ( the suits ) tend to forget where all the new widgets they are able to sell ( at high margins w/o any overhead r/d expenses ) are coming from

i always buy the full blown cdroms from whichever distro the clients want to use ... ( my little contribution ) vs burning my own cdrom of other people's distro

paying for "support" ( per phone call, per email, per task, per contract is fine ... ) making everybody pay at least $1500 for a "pre-packaged product" is not "fine" ( in my book, and lowering the "value" you get for it )

and unfortunately, if one big-boy does it, all the other equivalent big boys or medium sized boys will also try to bump their prices and try to compete with similar business models

when they see their revenues drop from too high a price, then they might adjust their plans 6mon, a year later ... and just announce quarterly losses for a while ...

- wish i can tell the landlords and ISPs that we suffered a big $$$ loss this quarter .. 
so you should buy more stock :-) have fun alvin - crystal ball says: "there needs to be a new generation of GPL licenses, that's free for non-commercial use, otherwise pay up..." and there's 20-30 different variations of the licenses ..
From ole at scali.com Wed Nov 12 10:18:47 2003 From: ole at scali.com (Ole W. Saastad) Date: 12 Nov 2003 16:18:47 +0100 Subject: Gigabit Switch In-Reply-To: <200311121257.hACCvxS24146@NewBlue.scyld.com> References: <200311121257.hACCvxS24146@NewBlue.scyld.com> Message-ID: <1068650327.26645.85.camel@pc-2.office.scali.no> Some comments about the channel aggregation of Gigabit Ethernet channels and switches. > > Message: 1 > Date: Tue, 11 Nov 2003 12:32:32 -0500 (EST) > From: Donald Becker > To: Keyan Mehravaran > cc: beowulf at beowulf.org > Subject: Re: Gigabit Switch > > On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > > > I am planning to connect 8 dual Xeon PCs > > with onboard gigabit through a switch and > > I only need access to the "zeroth" node. > > I have two questions: > > > > 1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". > Providing configuration options just encourages setting the switch to > flawed modes, such as forced-full-duplex or filtering packet types you > thought you were not using. > > > 2) Is it possible to increase bandwidth by > > adding an extra gigabit NIC to each node? > > If the answer is yes, then should all the > > 16 ports connect to the same switch? > Scali has also addressed this issue and developed a device for Scali MPI Connect (SMC) called Direct Ethernet Transport, DET. This bypasses the TCP/IP stack and works well with a single Gigabit Ethernet channel. 
However, the main benefit of DET is that it is very simple to bond two NICs to a single device, usually named det2. For ScaMPI usage you just select det2 at run time instead of det0, tcp, myr0 or sci. > Yes, you can marginally increase bandwidth. But it's not worth it. > I do not agree: we see only a marginally lower latency, but the bandwidth increase when going from one to two gigabit channels is on the order of 50-60%. Where we get approx 110 MB/sec using one channel, this approach yields 165 to 175 MB/sec. When doing a full duplex exchange we can quote a number like 350 MB/sec. > If you channel bond GbE, you'll likely get out-of-order packets on > the receiving side and consume much more CPU to reassemble. > If you trunk, you will not see higher peak bandwidth, and may still > suffer from bad cache or interrupt affinity effects. Yes, this is true, and if you are not really constrained by bandwidth it does not pay off. Latency is the constraint most of the time. Check your application with a test setup using channel aggregation and measure for yourself. > You should use separate switches for channel bonding. Although it's > possible to use VLAN to avoid this, that's brings us back to the > switch configuration issue. And two half size switches are less > expensive than one. > Switches up to 24 ports are so cheap today that you can just buy an extra one. For large switches the algebra becomes more complex. The cost per port becomes so high that you can consider using a high performance interconnect like Myrinet, Infiniband or SCI. In addition to high bandwidth, this will give you very low latency, which is beneficial for most applications. > Bottom line: use a single GbE channel unless there is a specific > application reason to do otherwise. Agreed, but as said before, if you really need bandwidth there is an option you can try with Gigabit Ethernet. 
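As a rough check on the numbers quoted above (approx 110 MB/sec on one channel, 165 to 175 MB/sec on two bonded channels), the arithmetic can be sketched as below. This is only an illustration: the function name and the comparison against ideal 2x scaling are assumptions made for the example, not anything provided by DET or ScaMPI.

```python
def bonding_gain(single_mb_s: float, bonded_mb_s: float, channels: int = 2) -> dict:
    """Relative gain of a bonded link over one channel, and how close
    the bonded link comes to ideal linear scaling (hypothetical helper)."""
    # Percentage increase over the single-channel figure.
    gain_pct = (bonded_mb_s / single_mb_s - 1.0) * 100.0
    # Fraction of the ideal "channels x single" bandwidth actually achieved.
    efficiency_pct = bonded_mb_s / (single_mb_s * channels) * 100.0
    return {"gain_pct": gain_pct, "efficiency_pct": efficiency_pct}


# Figures taken from the message above: 110 MB/sec single, 165-175 MB/sec bonded.
for bonded in (165.0, 175.0):
    result = bonding_gain(110.0, bonded)
    print(f"{bonded:.0f} MB/sec: +{result['gain_pct']:.0f}% over one channel, "
          f"{result['efficiency_pct']:.0f}% of ideal 2x scaling")
```

With those figures the gain works out to roughly 50% and 59%, consistent with the "50-60%" stated, while the second channel delivers about three quarters of what perfect scaling would give. Whether that is worth a second NIC and switch depends on the application, so measuring on a test setup remains the real answer.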
> > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 -- Ole W. Saastad, Dr.Scient. Manager, ISV relations/Business Dev. dir. +47 22 62 89 68 fax. +47 22 62 89 51 mob. +47 93 05 74 87 ole at scali.com Scali - www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 12 18:38:22 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 13 Nov 2003 10:38:22 +1100 Subject: list managemnt issue In-Reply-To: References: Message-ID: <200311131038.27648.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages sent to the list for weeks... Also note that it should be trivial to automatically bin email from postmaster at systemsfirm.net in your MUA. 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/ssRuO2KABBYQAh8RAiPTAJsH8EjXUgpj2IMRKR8ro7zch9vudACfSABa hp+rS9x/nshVQT+s9QZn9t4= =FiSc -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Nov 12 18:22:36 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 12 Nov 2003 15:22:36 -0800 (PST) Subject: list managemnt issue Message-ID: can some plase blackhole anyone at systemsfirm.net their mailserver has been perodically bouncing messages sent to the list for weeks... Date: Wed, 12 Nov 2003 18:16:00 -0500 From: postmaster at systemsfirm.net To: joelja at darkwing.uoregon.edu Subject: Delivery Status Notification (Failure) Parts/Attachments: 1 Shown 8 lines Text (charset: Unknown) 2 Shown 338 bytes Message, "Delivery Status" 3 Shown 5.4 KB Message, "Re: building a RAID system" 3.1 Shown ~43 lines Text ---------------------------------------- Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Wed, 12 Nov 2003 18:03:31 -0500 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, 15 Oct 2003 00:36:09 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Wed, 15 Oct 2003 01:24:42 -0400 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, 14 Oct 2003 23:02:08 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 23:50:47 -0400 Return-Path: Received: from topcat.systemsfirm.net 
([68.49.45.176]) by ; Tue, 14 Oct 2003 21:59:52 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 22:59:51 -0400 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, 14 Oct 2003 21:07:45 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 22:07:45 -0400 Return-Path: Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, 08 Oct 2003 17:42:03 -0500 Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.10.2/8.10.2) with ESMTP id h98L0tb28952; Wed, 8 Oct 2003 17:00:55 -0400 Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu [128.223.142.13]) by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id h98Kvqb28616 for ; Wed, 8 Oct 2003 16:57:53 -0400 Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id h98KwHEA005624 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NOT); Wed, 8 Oct 2003 13:58:17 -0700 (PDT) From: Joel Jaeggli X-X-Sender: joelja at twin.uoregon.edu To: Daniel Fernandez cc: beowulf at beowulf.org -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 00:11:35 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 00:11:35 -0500 (EST) Subject: list managemnt issue In-Reply-To: Message-ID: On Wed, 12 Nov 2003, Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages 
sent to the list for weeks... The logs report that the address was automatically deleted because of bounces on October 20. The problem is old queued messages, and an apparent mail loop. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993
From bill at math.ucdavis.edu Wed Nov 12 23:23:35 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 12 Nov 2003 20:23:35 -0800 Subject: list managemnt issue In-Reply-To: <200311131038.27648.csamuel@vpac.org> References: <200311131038.27648.csamuel@vpac.org> Message-ID: <20031113042335.GA17561@sphere.math.ucdavis.edu> On Thu, Nov 13, 2003 at 10:38:22AM +1100, Chris Samuel wrote: > On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote: > > > can some plase blackhole anyone at systemsfirm.net their mailserver has > > been perodically bouncing messages sent to the list for weeks... > > Also note that it should be trivial to automatically bin email from > postmaster at systemsfirm.net in your MUA. > > Chris So instead of the list admin blocking an obviously broken mail setup, every poster to this list should have to set up a separate filter? Please unsubscribe dan at systemsfirm.com. I'm getting hourly bounces for a posting of over a month ago; I suspect every other poster is as well. 
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 00:51:56 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 12 Nov 2003 21:51:56 -0800 (PST) Subject: list managemnt issue In-Reply-To: Message-ID: On Thu, 13 Nov 2003, Donald Becker wrote: > On Wed, 12 Nov 2003, Joel Jaeggli wrote: > > > can some plase blackhole anyone at systemsfirm.net their mailserver has > > been perodically bouncing messages sent to the list for weeks... > > The logs report that the address was automatically deleted because of > bounces on October 20. > > The problem is old queued messages, and an apparent mail loop. sometimes *-you-*, the subscriber have to clean things up and this systemsfirm.net stuff is simple to get rid of that junk .. and nope ... i dont get those systemsfirm.net junk anymore .... c ya alvin # # i added their ip# and domains to the /etc/mail/access list # # # cd /etc/mail # # make # # restart sendmail or exim or ?? # # if you don't have access or control of your mta, this might be a good # time to rethink your "how do i get email" strategy # or add some pop mail filtering before you get your mail, etc # systemsfirm.net REJECT - geez .. do you need help to fix your PC # 1.2.3.4 REJECT - more junk # # .. more ip# for their junk .. 
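For those who cannot edit their MTA's access list, the same rejection can be approximated on the receiving side, along the lines of the "bin it in your MUA" suggestion earlier in this thread. The following is a minimal sketch only: it assumes the bounces carry a From: address in the systemsfirm.net domain (as the postmaster notices quoted in this thread do), and the function name and the blocked-domain set are made up for the example rather than taken from any mail client.

```python
from email import message_from_string
from email.utils import parseaddr

# Assumption: the set of sender domains whose mail should be binned.
BLOCKED_DOMAINS = {"systemsfirm.net"}


def should_bin(raw_message: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the From: header's domain (or a subdomain of it)
    is in the blocked set. Hypothetical helper, not a real MUA hook."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Name <user@host>" into (name, address).
    _name, addr = parseaddr(msg.get("From", ""))
    # Everything after the last "@"; empty string if there is no address.
    domain = addr.rpartition("@")[2].lower()
    return domain in blocked or any(domain.endswith("." + d) for d in blocked)


bounce = (
    "From: postmaster@systemsfirm.net\n"
    "To: someone@example.org\n"
    "Subject: Delivery Status Notification (Failure)\n"
    "\n"
    "body\n"
)
print(should_bin(bounce))
```

A filter like this only hides the symptom for one recipient; it does nothing about the mail loop on the remote server, which is why contacting the domain's administrator is still needed.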
From csamuel at vpac.org Wed Nov 12 23:52:51 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 13 Nov 2003 15:52:51 +1100 Subject: list managemnt issue In-Reply-To: <20031113042335.GA17561@sphere.math.ucdavis.edu> References: <200311131038.27648.csamuel@vpac.org> <20031113042335.GA17561@sphere.math.ucdavis.edu> Message-ID: <200311131553.05445.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 13 Nov 2003 03:23 pm, Bill Broadley wrote: > So instead of the list admin blocking a obviously broken mail setup, > every poster to this list should have to setup a seperate filter? No, I'm saying that *as* *well* as unsubscribing them they should filter those emails out. That's why I said "also" in my email. Just unsubscribing them won't fix this quickly. I'm getting multiple bounces for messages I sent to the beowulf list over a month ago now, so even if they are unsubscribed I think we'll all be getting these bounces for the foreseeable future. Can anyone in the US get in contact with them by phone and tell them what's going on, please? 
The WHOIS data for them says: Organization: The Systems.Firm Daniel Philpott 348 Rutgers Street Rockville, MD 20850 US Phone: 3016109635 Fax..: 3016109636 Email: dphilpott at ex-pressnet.com Registrar Name....: Register.com Registrar Whois...: whois.register.com Registrar Homepage: http://www.register.com Domain Name: SYSTEMSFIRM.NET Created on..............: Tue, Jun 15, 1999 Expires on..............: Thu, Jun 15, 2006 Record last updated on..: Wed, Dec 04, 2002 Administrative Contact: The Systems.Firm Daniel Philpott 348 Rutgers Street Rockville, MD 20850 US Phone: 3016109635 Fax..: 3016109636 Email: dphilpott at ex-pressnet.com Technical Contact, Zone Contact: Register.Com Domain Registrar 575 8th Avenue - 11th Floor New York, NY 10018 US Phone: 902-749-2701 Fax..: 902-749-5429 Email: domain-registrar at register.com thanks, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/sw4jO2KABBYQAh8RAuEjAKCPnEyGif7S17OL+/ykJsJAc7kiIgCcDV2U AxTo9T98ZZIdhm8Ap8pliB8= =5kzH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Thu Nov 13 05:16:15 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Thu, 13 Nov 2003 05:16:15 -0500 Subject: list managemnt issue In-Reply-To: <20031113042335.GA17561@sphere.math.ucdavis.edu> References: <200311131038.27648.csamuel@vpac.org> <20031113042335.GA17561@sphere.math.ucdavis.edu> Message-ID: <20031113051615.I9711@www2> On Wed, Nov 12, 2003 at 08:23:35PM -0800, Bill Broadley wrote: > > On Thu, Nov 13, 2003 at 10:38:22AM +1100, Chris Samuel wrote: > > On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote: > > > > > can 
some plase blackhole anyone at systemsfirm.net their mailserver has > > > been perodically bouncing messages sent to the list for weeks... > > > > Also note that it should be trivial to automatically bin email from > > postmaster at systemsfirm.net in your MUA. > > > > Chris > > So instead of the list admin blocking a obviously broken mail setup, > every poster to this list should have to setup a seperate filter? > > Please unsubscribe dan at systemsfirm.com. > > I'm getting hourly bounces to my posting of over a month ago, I suspect > every other poster is as well. But that is just the problem: Only posters -- not non-posting subscribers -- are getting these messages, because they are coming directly to the poster without going back through the list. There is absolutely nothing that Donald can do to fix that. Note that there are no such bounces from recent posts; this is because that address was removed from the list a long time ago. --Bob
From joelja at darkwing.uoregon.edu Thu Nov 13 10:56:17 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 13 Nov 2003 07:56:17 -0800 (PST) Subject: list managemnt issue In-Reply-To: <3FB3A5C9.80609@tamu.edu> Message-ID: It's not spam. It's a bunged mailserver, and messages bounce to the sender rather than the list admins. Mail to both postmaster and the technical admin contact for that domain has failed. I have thirteen messages in my lamer folder from yesterday from their mailer telling me it couldn't deliver the message. If you can't configure your MTA properly you don't belong on mailing lists, period, end of story. joelja On Thu, 13 Nov 2003, Gerry Creager N5JXS wrote: > Can someone *NOT* blackhole anyone? > > I'm sorry Joel. This is a hot-button. 
I've found myself blackholed in > the past because I was on an ISDN modem, on DSL, from a University, and > once for an open relay... that I didn't run. > > Getting out of the blackhole list is a PITA, and sometimes unachievable. > > I've firmly decided that blackhole/blacklisting spammers/potential > spammers/someone I just don't like/etc. isn't the answer. I've had > considerable success with graylisting, but that's not the problem here. > > What I guess I'm asking here is for the listadmin to unceremoniously > unsubscribe *@systemsfirm.net for much the same reason you asked for > them to be blackholed. > > Blacklist/blackhole implementations are, IMO, broken at best, and a > number of the administrators of same I've dealt with are pompous > juveniles who can't interact with a human when they make a mistake. > > gerry > > Joel Jaeggli wrote: > > can some plase blackhole anyone at systemsfirm.net their mailserver has > > been perodically bouncing messages sent to the list for weeks... > > > > Date: Wed, 12 Nov 2003 18:16:00 -0500 > > From: postmaster at systemsfirm.net > > To: joelja at darkwing.uoregon.edu > > Subject: Delivery Status Notification (Failure) > > Parts/Attachments: > > 1 Shown 8 lines Text (charset: Unknown) > > 2 Shown 338 bytes Message, "Delivery Status" > > 3 Shown 5.4 KB Message, "Re: building a RAID system" > > 3.1 Shown ~43 lines Text > > ---------------------------------------- > > > > > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Wed, 12 Nov 2003 18:03:31 -0500 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, > > 15 Oct 2003 00:36:09 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Wed, 15 Oct 2003 01:24:42 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 23:02:08 -0500 > > Received: from topcat 
([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 23:50:47 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 21:59:52 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 22:59:51 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 21:07:45 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 22:07:45 -0400 > > Return-Path: > > Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, > > 08 Oct 2003 17:42:03 -0500 > > Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) > > by localhost.localdomain (8.10.2/8.10.2) with ESMTP id > > h98L0tb28952; > > Wed, 8 Oct 2003 17:00:55 -0400 > > Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu > > [128.223.142.13]) > > by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id h98Kvqb28616 > > for ; Wed, 8 Oct 2003 16:57:53 -0400 > > Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) > > by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id > > h98KwHEA005624 > > (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 > > verify=NOT); > > Wed, 8 Oct 2003 13:58:17 -0700 (PDT) > > From: Joel Jaeggli > > X-X-Sender: joelja at twin.uoregon.edu > > To: Daniel Fernandez > > cc: beowulf at beowulf.org > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at 
linux-mag.com Thu Nov 13 11:55:43 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 13 Nov 2003 11:55:43 -0500 (EST) Subject: LECCIBG at SC2003 In-Reply-To: <20031113051615.I9711@www2> Message-ID: For those attending SC2003, there does not seem to be a Beowulf Bash on Monday night after the Opening Gala. There is however a LECCIBG: http://www.cluster-rant.com/LECCIBG/LECCIBG.html Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Nov 13 12:53:34 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 13 Nov 2003 12:53:34 -0500 (EST) Subject: Alternate LECCIBG notice In-Reply-To: <20031113051615.I9711@www2> Message-ID: If you have trouble getting to the cluster-rant.com try this: http://www.hpc-design.com/LECCIBG/ Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 12:59:56 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 13 Nov 2003 17:59:56 GMT Subject: list managemnt issue In-Reply-To: References: Message-ID: <20031113175956.23801.qmail@houston.wolf.com> Joel Jaeggli writes: > Its not spam. it's a bunged mailserver and messages bounce to the sender > rather than the list admins. mail to both postmaster and the techincal > admin contact for that domain have failed. I have thirteen messages in my > lamer folder from yesterday from their mailer telling me it couldn't > deliver the message. If you can't configure your mta properly you don't > belong on mailing lists, period, end of story. Took the easy way out. 
Reported them to rfc-ignorant.org (should RBL them upon verification) and added them to our local RBL, so that's that.
From clwang at csis.hku.hk Thu Nov 13 05:24:54 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Thu, 13 Nov 2003 18:24:54 +0800 Subject: Cluster2003: Advance Registration (Due: Nov. 15) References: <3F80CC86.FAFBFAD2@csis.hku.hk> Message-ID: <3FB35BF6.A9466ABF@csis.hku.hk>
----------------------------------------------------------------
2003 IEEE International Conference on Cluster Computing
December 1-4, 2003
Sheraton Hong Kong Hotel & Towers
Tsim Sha Tsui, Kowloon, Hong Kong
Sponsored by:
IEEE Task Force on Cluster Computing
IEEE Computer Society
The University of Hong Kong
---------------------------------------------------------------
Dear Colleagues,

The deadline for Cluster2003 advance registration is approaching (Nov. 15). Please remember to register and reserve your hotel in advance. For more detailed information about the conference activities, please visit our web site at http://www.csis.hku.hk/cluster2003/

Regards
Cho-Li Wang and Daniel Katz
Cluster2003 Program Co-Chairs
----------------------------------------------------------------
Conference Highlights

** There will be 48 contributed papers presented on Dec. 2-4, covering a wide range of subjects in cluster computing research. See our full program at: http://www.csis.hku.hk/cluster2003/advance-program.html

** Keynote addresses given by world-class researchers:
- Linux Clusters for Extremely Large Scientific Simulation (Mark K. Seager, LLNL)
- Distributed Security Enforcement for Trusted Cluster and Grid Computing (Kai Hwang, USC)
- Cluster Computing for Financial Engineering (Thomas F. Coleman, Cornell)
- Towards Grid and Cluster Federations (Satoshi Sekiguchi, AIST)
- ...
More details: http://www.csis.hku.hk/cluster2003/keynote.htm

** Panel discussion on Dec. 3: "Top Problems in Cluster Computing and Systems and Possible Solutions", led by
- Rusty Lusk, Argonne National Lab., USA
- Dhabaleswar K. Panda, Ohio State University, USA
- Phil Papadopoulos, San Diego Supercomputer Center, USA
- Thomas Stricker, ETH, Switzerland
- Chip Watson, DOE Jefferson Lab, USA
- Zhiwei Xu, Institute of Computing Technology, China
- Xiaodong Zhang (Moderator), NSF and College of William and Mary, USA

** Four tutorials, introducing state-of-the-art clustering technologies on Dec. 1:
1. Designing Next Generation Clusters with Infiniband: Opportunities and Challenges
2. Using MPI-2: Advanced Features of the Message Passing Interface
3. The Gridbus Toolkit for Grid and Utility Computing
4. Building and Managing Clusters with NPACI Rocks
More: http://www.csis.hku.hk/cluster2003/tutorials.htm

** Vendor technical talks and exhibitions by: HP, Microsoft, IBM, Extreme Networks, Sun Microsystems, Dawning, Intel, DELL, Cluster File System, Linux Networx, Mellanox, RackSaver.
More: http://www.csis.hku.hk/cluster2003/vender.htm

** A live Grid demo session on Dec. 4, featuring 8 innovative Grid applications/systems.
More: http://www.csis.hku.hk/cluster2003/griddemo.html

** A boat trip on the beautiful and romantic Victoria Harbour on the evening of Dec. 4.
More: http://www.csis.hku.hk/~chyu/cluster2003/boat.html

** One-day tour on Dec. 5:
- Enjoy the fascinating view of Victoria Harbour from Victoria Peak
- The breathtaking views over sandy beaches at Repulse Bay
- Visit the "Ling Chi" ("herb of the gods") plantation garden.
- "Big Bowl Feast" ("Pun Choi" in Chinese) -- food served in wooden basins More http://www.csis.hku.hk/~chyu/cluster2003/tour.html ----------------------------------------------------------------- Conference/Tutorial Registration : http://www.csis.hku.hk/cluster2003/registration.htm Hotel Reservation: http://www.csis.hku.hk/cluster2003/hotel.htm --------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Nov 13 10:39:53 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 13 Nov 2003 09:39:53 -0600 Subject: list managemnt issue In-Reply-To: References: Message-ID: <3FB3A5C9.80609@tamu.edu> Can someone *NOT* blackhole anyone? I'm sorry Joel. This is a hot-button. I've found myself blackholed in the past because I was on an ISDN modem, on DSL, from a University, and once for an open relay... that I didn't run. Getting out of the blackhole list is a PITA, and sometimes unachievable. I've firmly decided that blackhole/blacklisting spammers/potential spammers/someone I just don't like/etc. isn't the answer. I've had considerable success with graylisting, but that's not the problem here. What I guess I'm asking here is for the listadmin to unceremoniously unsubscribe *@systemsfirm.net for much the same reason you asked for them to be blackholed. Blacklist/blackhole implementations are, IMO, broken at best, and a number of the administrators of same I've dealt with are pompous juveniles who can't interact with a human when they make a mistake. gerry Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages sent to the list for weeks... 
> > Date: Wed, 12 Nov 2003 18:16:00 -0500 > From: postmaster at systemsfirm.net > To: joelja at darkwing.uoregon.edu > Subject: Delivery Status Notification (Failure) > Parts/Attachments: > 1 Shown 8 lines Text (charset: Unknown) > 2 Shown 338 bytes Message, "Delivery Status" > 3 Shown 5.4 KB Message, "Re: building a RAID system" > 3.1 Shown ~43 lines Text > ---------------------------------------- > > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Wed, 12 Nov 2003 18:03:31 -0500 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, > 15 Oct 2003 00:36:09 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Wed, 15 Oct 2003 01:24:42 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 23:02:08 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 23:50:47 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 21:59:52 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 22:59:51 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 21:07:45 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 22:07:45 -0400 > Return-Path: > Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, > 08 Oct 2003 17:42:03 -0500 > Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) > by localhost.localdomain (8.10.2/8.10.2) with ESMTP id > h98L0tb28952; > Wed, 8 Oct 2003 17:00:55 -0400 > Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu > [128.223.142.13]) > by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id 
h98Kvqb28616 > for ; Wed, 8 Oct 2003 16:57:53 -0400 > Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) > by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id > h98KwHEA005624 > (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 > verify=NOT); > Wed, 8 Oct 2003 13:58:17 -0700 (PDT) > From: Joel Jaeggli > X-X-Sender: joelja at twin.uoregon.edu > To: Daniel Fernandez > cc: beowulf at beowulf.org > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jampuero at Princeton.EDU Thu Nov 13 14:44:32 2003 From: jampuero at Princeton.EDU (Jean Paul Ampuero) Date: Thu, 13 Nov 2003 14:44:32 -0500 Subject: bpcp with globbing Message-ID: <3FB3DF20.6040506@princeton.edu> I am trying to gather output files from the slaves to the master node using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* ~ampuero) But globbing does not work the way I'd like: bpcp tries to expand the * in the master, instead of in the slave. Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero". Is there a workaround ? 
-- Jean Paul (Pablo) AMPUERO Post-Doctoral Research Associate Princeton University - Department of Geosciences Guyot Hall, Room 321 B - Princeton NJ 08544 Office: (609) 258 2598 Mobile: (609) 638 0106 Fax : (609) 258 1671 http://geoweb.princeton.edu/people/resstaff/ampuero.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 15:52:06 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 13 Nov 2003 20:52:06 GMT Subject: list managemnt issue In-Reply-To: <3FB3A5C9.80609@tamu.edu> References: <3FB3A5C9.80609@tamu.edu> Message-ID: <20031113205206.3460.qmail@houston.wolf.com> Gerry Creager N5JXS writes: > Can someone *NOT* blackhole anyone? > > I'm sorry Joel. This is a hot-button. I've found myself blackholed in > the past because I was on an ISDN modem, on DSL, from a University, and > once for an open relay... that I didn't run. > > Getting out of the blackhole list is a PITA, and sometimes unachievable. > > I've firmly decided that blackhole/blacklisting spammers/potential > spammers/someone I just don't like/etc. isn't the answer. I've had > considerable success with graylisting, but that's not the problem here. > > What I guess I'm asking here is for the listadmin to unceremoniously > unsubscribe *@systemsfirm.net for much the same reason you asked for them > to be blackholed. > > Blacklist/blackhole implementations are, IMO, broken at best, and a number > of the administrators of same I've dealt with are pompous juveniles who > can't interact with a human when they make a mistake. Knee-jerk reactions are never good, no matter what side of the RBL question you are on. I love RBLs. They do exactly what they are supposed to do: block abuse of my systems from the incompetent (at best), or the deliberately abusive (at worst), without having to add more of a burden to me and my users.
Also, I can with a two-line entry control access to all my boxes. Don't wanna get RBL'd? Keep your system tightened down. Someone does not get into RBLs by keeping their system configured correctly. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dstanzi at clemson.edu Thu Nov 13 16:22:07 2003 From: dstanzi at clemson.edu (Dan Stanzione) Date: Thu, 13 Nov 2003 16:22:07 -0500 Subject: bpcp with globbing References: <3FB3DF20.6040506@princeton.edu> Message-ID: <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> >From: "Jean Paul Ampuero" >To: >Sent: Thursday, November 13, 2003 2:44 PM >Subject: bpcp with globbing > > I am trying to gather output files from the slaves to the master node > using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* > ~ampuero) > But globbing does not work the way I'd like: bpcp tries to expand the * > in the master, > instead of in the slave. > Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero". > Is there a workaround ? I'm assuming you're running Scyld? That's actually a problem with the shell itself more than bpcp; and unlike some utilities (like, say, scp) you can't get around it by simply putting the whole thing in quotes. I haven't had the patience to find a "good" fix, but it seems from your example you have all the files you need on an NFS or other shared directory, and you're just trying to move them to local disk. A really ugly work-around (but it takes 10 seconds) is just to put the cp command with the wildcard argument into a one-line script, then run "bpsh -a " and the arguments will be expanded on the slaves. There's got to be a better way to do this, but that will get you through the night. Any ideas, Don?
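A minimal sketch of the wrapper-script idea described above, with the assumptions labeled: the file names, the script name gather.sh and the temporary directory are made up for illustration, and the demonstration runs entirely on one machine with plain sh rather than through bpsh. It only shows that a wildcard stored inside a script is expanded by the shell that runs the script, not by the shell that launches it.

```shell
#!/bin/sh
# Hypothetical stand-in for a node's scratch directory.
demo=$(mktemp -d)
touch "$demo/S001.a" "$demo/S001.b" "$demo/other.dat"

# The wrapper script. The heredoc delimiter is quoted, so the launching
# shell writes the wildcard into the file literally instead of expanding it.
cat > "$demo/gather.sh" <<'EOF'
#!/bin/sh
# $1 = directory holding the output files, $2 = destination directory
cp "$1"/S001* "$2"/
EOF
# With the "#!/bin/sh" line and the execute bit the script could also be
# run directly; it is invoked through sh here to keep the demo portable.
chmod +x "$demo/gather.sh"

mkdir "$demo/out"
sh "$demo/gather.sh" "$demo" "$demo/out"

# Only the two S001 files should have been copied.
copied=$(ls "$demo/out" | wc -l | tr -d ' ')
echo "copied $copied files"
rm -rf "$demo"
```

On a cluster, a script of this shape is what would be handed to bpsh -a in place of the bare cp command. Whether cp and the script itself are reachable on the compute nodes is a separate question that this sketch does not address.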
Dan ---------------------------------------------- Dan Stanzione, PhD dstanzio at nsf.gov AAAS Fellow Division of Graduate Education National Science Foundation (703)292-8121 Fax: (703) 292-9048 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Nov 13 17:01:20 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Thu, 13 Nov 2003 17:01:20 -0500 Subject: clubmask-0.6b1 released Message-ID: <1068760880.3988.17.camel@roughneck.liniac.upenn.edu>

On a sourceforge mirror near you:

Name     : Clubmask
Version  : 0.6
Release  : b1
Group    : Cluster Resource Management and Scheduling
Vendor   : Liniac Project, University of Pennsylvania
License  : GPL-2
URL      : http://clubmask.sourceforge.net
Download : http://sourceforge.net/project/showfiles.php?group_id=1316&release_id=197383

What is Clubmask
------------------------------------------------------------------------------
Clubmask is a resource manager designed to allow Bproc-based clusters to enjoy the full scheduling power and configuration of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment. Clubmask also provides built-in support for a supermon2ganglia translator that allows a standard Ganglia web backend to contact supermon and get XML data that will display through the Ganglia web interface. Clubmask is currently running on around 10 clusters, varying in size from 8 to 128 nodes, and has been tested up to 5000 jobs.
Notes/warnings on this release:
------------------------------------------------------------------------------
Before upgrading, please make sure to save your /etc/clubmask/clubmask.conf file, as it may get overwritten. There are a few new variables in clubmask.conf, so beware! To use the resource requests, you must be running the latest snapshot of maui.

Changes since 0.5:
------------------------------------------------------------------------------
- Change the name from the god-awful absolute timestamp to a more normal "string.number" format, where "string" is an arbitrary job name and "number" is the Nth time that the job name is being used. EX: root.1, root.2, ...
- Fix cmnodesshknownhosts to get the -n information from the bproc nodenumber that is given as the argument.
- Update to the latest supermon APIs.
- Feature Request #790938: add 'cmsubmit -r ' to run a job in a maui reservation.
- Fixed bug #791396: make sure processes get killed in Interactive jobs.
- Make sure bproc is running when starting resource_manager.
- Fix cmsubmit -h. It is now cleaner and easier to understand.
- Add support for resource requirements on the nodes. swap, mem, disk, qos, reservation, and processors per node are supported now. See cmsubmit -h for more information.
- Add infrastructure for architecture, os, network, and arbitrary features as node resource requests. We do not get this information dynamically yet, so there is no point in letting people muck with it.
- Add supermon_state daemon to manage the nodelist for supermon. This keeps that logic out of resource_manager.
- Make sure there is at most one 'R' command in the pipeline for down nodes at any given time. No sense in asking nodes to revive if they have not responded to the last request yet.
- Clean up setup to perform RPM builds more cleanly.
- Split /etc/clubmask/clubmask.conf into /etc/clubmask/{system,clubmask}.conf to allow variables that need user editing to live in clubmask.conf and the rest of the system variables to live in system.conf.
This will let a user update to a newer version of Clubmask, and just copy over the old clubmask.conf to restore their configuration. migrate all docs from Docbook XML to Lyx/latex. All of the docs -- pdf, html single, and html multiple can be generated with a simple 'make' in the docs/ directory. add --secret-key to setup.py args for building maui and clubmask with same checksum key. This removes the need to edit setup.py when installing clubmask. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Cheers~ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Nov 13 18:06:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 14 Nov 2003 10:06:55 +1100 Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> Message-ID: <200311141006.56305.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote: > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > get into RBLs by keeping their system configured correctly. This is rapidly getting off-topic, but this needed addressing. People *can* get into blacklists without doing anything wrong, if the maintainers are overly broad with their brush (such as listing entire class-C networks at hosting companies) or because of malicious/clueless submission of reports. 
Debian blacklisted: http://lists.debian.org/debian-devel/2002/debian-devel-200207/msg00044.html The Age newspaper report on SpamCop blocking entire /24's (and hence Politech and others): http://www.theage.com.au/articles/2002/12/19/1040174329829.html Politech blocked 3 times by SpamCop: http://www.politechbot.com/p-04121.html Peacefire become collateral damage to a netblock blocked by MAPS for hosting a site selling spam software: http://slashdot.org/yro/00/12/13/1853237.shtml RFC-ignorant blacklists the entire 202/7 netblock: http://www.apnic.net/mailing-lists/apops/archive/2001/10/msg00009.html Now note I'm not saying that blacklists are bad, just that there *is* collateral damage from them, and as long as people are aware of that and decide they can tolerate the risk then that's fine. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/tA6PO2KABBYQAh8RAtkNAJoCTbVE4xnRJFJSY5wHkszrC5zVQQCffzLW zL4ppbE6JHN1f7y2xWv9cxo= =jNQu -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 17:54:07 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 17:54:07 -0500 (EST) Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> Message-ID: Just to summarize the immediate issue: - messages from about a month ago were being bounced back to the posters - the address associated with the 'borken system was removed Oct. 
20 - there is nothing that can be done at beowulf.org to prevent the bogus bounces, as the bounces were not routed through here On Thu, 13 Nov 2003, Angel Rivera wrote: > Gerry Creager N5JXS writes: > > I'm sorry Joel. This is a hot-button. I've found myself blackholed in > > the past because I was on an ISDN modem, on DSL, from a University, and .. > I love RBLs. They do exactly what they are supposed to do, block abuse of .. > Don't wanna get RBL'd? Keep your system tighened down. Someone does not get > into RBLs by keeping their system configured correctly. Scyld runs a bunch of Linux and Beowulf-related mailing lists, with all of them moderated or posting limited to members. I average about 40 minutes a day moderating and adding to the spam filters. Even with that care, we have been on a number of RBLs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 18:54:56 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 13 Nov 2003 15:54:56 -0800 (PST) Subject: list managemnt issue - rbl In-Reply-To: <200311141006.56305.csamuel@vpac.org> Message-ID: hi ya a list admin cannot do nothing about stopping spam other than making it members only ... rest of the spam fighting applies to all lists and all regular user emails too thanx donald and crew for the list... its a lot of work to keep a list going On Fri, 14 Nov 2003, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote: > > > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > > get into RBLs by keeping their system configured correctly. 
> > This is rapidly getting off-topic, but this needed addressing. yes and no, 75% or more of the spams come from "mis-managed" clusters ( at least when i'm collecting data on sven virus ) http://www.Linux-Sec.net/Mail/SpamVirus/Sven/ all that "compute power" for sending out spam .. :-) ( *pout* ) > People *can* get into blacklists without doing anything wrong, if the - only way is if you sent spam .. - or if you inherited an ip# of a spammer - or if the rbl db admin decides to block all ip# in the class-C, class-B, country > maintainers are overly broad with their brush (such as listing entire class-C > networks at hosting companies) or because of malicious/clueless submission of > reports. BL works when: - the blacklister has a copy of the spam to prove their case ( it works when you run your own RBL lists .. or whatever way ( you/your corp decide to fight spam - building your own rbl is trivial or complicated ..depending on what you want it to do .. http://www.UCEAS.org/RBL.Server/ BL does NOT work when: - its done by a 3rd party - its done for free on tehir t1 or t3 line for everybody to use - the bl db maintainer adds any incoming report w/o checking - the bl db maintainers does NOT remove people from the bl db - the bl db mainterners adds the entire class-C, class-B or entire country to their bad-boy list > Debian blacklisted: > > http://lists.debian.org/debian-devel/2002/debian-devel-200207/msg00044.html mailing lists should be open for all, liek they are, in which case spam can get thru if mailing lists are members only, its one more hurdle for the spammer sw to subscribe, spam the list, and unsubscribe -------- whitelisting doesn't work, you dont know where your business inquiry is coming from challenge response system is too much of a pain for people to start a (business/social) conversation .... 
but does work, but again it tells the other business you dont know how else to stop spam without considering everybody a potential spammer ( a bad impression in my book ) tar pitting works if enough people implements it and slows down the sending ( misconfigured open relay or cracked ) server simple bouncing ( rejecting ) the incoming spam will fill the sending guilty spam server of rejected/bounced spam - dropping the spam is a bad idea, since it confirm to them that the email address is valid and you'd get more of it ---- 80% of all DNS is misconfigured too ... :-) c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Nov 13 20:51:47 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 13 Nov 2003 17:51:47 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: We are having an ongoing issue with our compute cluster, running Scyld 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute nodes and 1 master node. We are running the Navy's weather model. The problem: The model runs fine when run on 4 processors (1 on each compute node). However, when I use the SMP capabilities of the machine and try to run on, say, 8 processors (using both CPUs on each compute node), everything will run fine for a while. Then, at a non-consistent time, a node will invariably freeze up. The cluster loses its connection to the node and I cannot communicate with it using any of the cluster tools - sometimes it will automatically reboot, but usually it requires me to go perform a hard reset on the node. However, I have found that in most cases if I run 2 jobs in parallel (i.e. 2 4-cpu processes, each using only 1 CPU on each node) things seem to work fine. Nodes may still freeze from time to time but not nearly as often. 
The hardware: The cluster was obtained pre-built from PSSC Labs. Each compute node is a dual-processor Tyan MB with 2 Athlon MP CPUs. They are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We are using the BeoMPI 1.0.7 implementation of MPICH compiled with: --with-device=ch_p4 --with-comm=bproc (note that I had to recompile BeoMPI with the PGI compiler to get it to work with the model) Again, we use Scyld Beowulf 28cz4 for the operating system uname -a gives Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 i686 unknown _Please_ help if you have _any_ suggestions whatsoever. I am at the end of my rope, and this is presenting a serious impediment to our research! If you need more information, let me know and I will be happy to provide it! Thanks... Tim Whitcomb Meteorologist University of Washington Applied Physics Lab twhitcomb at apl.washington.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 20:52:45 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 14 Nov 2003 01:52:45 GMT Subject: list managemnt issue In-Reply-To: <200311141006.56305.csamuel@vpac.org> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> <200311141006.56305.csamuel@vpac.org> Message-ID: <20031114015245.24191.qmail@houston.wolf.com> Chris Samuel writes: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote: > >> Don't wanna get RBL'd? Keep your system tighened down. Someone does not >> get into RBLs by keeping their system configured correctly. > > This is rapidly getting off-topic, but this needed addressing.
> > People *can* get into blacklists without doing anything wrong, if the > maintainers are overly broad with their brush (such as listing entire class-C > networks at hosting companies) or because of malicious/clueless submission of > reports. It is off-topic but... While I agree it can and has happen, I do not believe that it is that common-at least in my experience. I am one who believes in expaning blocks to include large swaths if IP space where abuse it rampant. Our policy is we LART 100% of the time for spam and block that IP. If the ISPs postmaster and/or abuse does not work we submit them to rfc-ignorant.org (an RBL we also use). if they do nothing, we do nothing also and they stay blocked. If we get another spam in the same C-equiv we block the entire C. Heck, I block entire countries. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 21:38:18 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 13 Nov 2003 18:38:18 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: Message-ID: hi ya tim i usually was able to fix random cpu crashes by changing the kernel to the latest/greatest one at the time ( 2.4.22 ) if i were to use a new smp kernel today if the latest kernel has no effect, than there's some other serious hw problems ... timing issues ?? 
- make sure the kernel is compiled for athlon and not p4 and smp enabled
- memory clock speeds, marginal memory sticks ( get rid of generic no-name-brand memory sticks )
- swap memory sticks and see if the problem follows the memory ( keep good track of it so you can easily identify it if all the memory was thrown on the floor all at the same time )
- make sure you only have 1 ide disk on each cable to help identify any other hw issues
- blow air, with a household 24"-36" fan, in the same direction as normal airflow of the system and see if it helps any
- replace the home-made nic cables with molded cat-5 cables where its obvious that a person didnt hand-crimp the wires
- swap the ports that the nic cables are connected to
- inexpensive hubs is the next to swap out

c ya alvin On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote: > We are having an ongoing issue with our compute cluster, running Scyld > 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute > nodes and 1 master node. We are running the Navy's weather model. > > The problem: > The model runs fine when run on 4 processors (1 on each compute node). > However, when I use the SMP capabilities of the machine and try to run on, > say, 8 processors (using both CPUs on each compute node), everything will > run fine for a while. Then, at a non-consistent time, a node will > invariably freeze up. The cluster loses its connection to the > node and I cannot communicate with it using any of the cluster tools - > sometimes it will automatically reboot, but usually it requires me to go > perform a hard reset on the node. > > However, I have found that in most cases if I run 2 jobs in parallel (i.e. > 2 4-cpu processes, each using only 1 CPU on each node) things seem to work > fine. Nodes may still freeze from time to time but not nearly as often. > > The hardware: > The cluster was obtained pre-built from PSSC Labs. Each compute node is a > dual-processor Tyan MB with 2 Athlon MP CPUs.
They > are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 > Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We > are using the BeoMPI 1.0.7 implementation of MPICH compiled with: > --with-device=ch_p4 --with-comm=bproc > (note that I had to recompile BeoMPI with the PGI compiler to get it to > work with the model) > Again, we use Scyld Beowulf 28cz4 for the operating system > uname -a gives > Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 > i686 unknown > > _Please_ help if you have _any_ suggestions whatsoever. I am at the end > of my rope, and this is presenting a serious impediment to our research! > If you need more information, let me know and I will be happy to provide > it! > > Thanks... > > Tim Whitcomb > Meteorologist > University of Washington Applied Physics Lab > twhitcomb at apl.washington.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jampuero at Princeton.EDU Thu Nov 13 16:45:06 2003 From: jampuero at Princeton.EDU (Jean Paul Ampuero) Date: Thu, 13 Nov 2003 16:45:06 -0500 Subject: bpcp with globbing In-Reply-To: <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> References: <3FB3DF20.6040506@princeton.edu> <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> Message-ID: <3FB3FB62.3010503@princeton.edu> Right, I am running Scyld. I already tried your "ugly" work-around, without success. I get: execmove: Exec format error Dan Stanzione wrote: > I'm assuming you're running Scyld? 
> >That's actually a problem with the shell itself more than bpcp; and unlike >some utilities (like say, scp) you can't get around by simply putting the >whole >thing in quotes. I haven't had the patience to find a "good" fix, but >it seems from your example you have all the files you need on an NFS >or other shared directory, and you're just trying to move them to local >disk. > >A really ugly work-around (but it takes 10 seconds) is just to put the >cp command with the wildcard argument into a one-line script, then >run "bpsh -a " and the arguments will be expanded on >the slaves. > >There's got to be a better way to do this, but that will get you >through the night. Any ideas, Don? > > Dan > >---------------------------------------------- >Dan Stanzione, PhD dstanzio at nsf.gov >AAAS Fellow >Division of Graduate Education >National Science Foundation >(703)292-8121 Fax: (703) 292-9048 > > > -- Jean Paul (Pablo) AMPUERO Post-Doctoral Research Associate Princeton University - Department of Geosciences Guyot Hall, Room 321 B - Princeton NJ 08544 Office: (609) 258 2598 Mobile: (609) 638 0106 Fax : (609) 258 1671 http://geoweb.princeton.edu/people/resstaff/ampuero.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 21:31:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 21:31:49 -0500 (EST) Subject: bpcp with globbing In-Reply-To: <3FB3DF20.6040506@princeton.edu> Message-ID: On Thu, 13 Nov 2003, Jean Paul Ampuero wrote: > I am trying to gather output files from the slaves to the master node > using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* > ~ampuero) > But globbing does not work the way I'd like: bpcp tries to expand the * > in the master, > instead of in the slave. 
This is the behavior you would expect: globbing is done by the local shell before the expanded args are passed to bpcp. > Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero". > Is there a workaround ? To get the exact semantics of shell globbing, you must use the shell itself. Here is a broken-out way of doing what you wish:

    FILES=`bpsh 2 /bin/sh -c 'echo /scratch/ampuero/SCEC1/S001*'`
    for file in $FILES; do
        bpcp 2:$file ~ampuero
    done

The clearest way to do this in a single line is to 'reverse' the direction of the copy:

    bpsh 2 bpcp `bpsh 2 /bin/sh -c 'echo /scratch/ampuero/SCEC1/S001*'` master:~ampuero

Some detailed notes about this:
- By running the 'bpcp' on the "read-from" node we don't have to prepend "2:" to each file name.
- 'bpcp' is modeled after 'rcp', which has a similar local globbing issue.
- You might be able to simplify the specific commands to not use shell globbing. But using '/bin/sh' prevents misinterpretations, such as confusing regexp and globbing: does ".*" mean all files or just dot files?
- The command above relies on 'echo' being a shell built-in.

-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dstanzi at clemson.edu Thu Nov 13 17:25:35 2003 From: dstanzi at clemson.edu (Dan Stanzione) Date: Thu, 13 Nov 2003 17:25:35 -0500 Subject: bpcp with globbing References: <3FB3DF20.6040506@princeton.edu> <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> <3FB3FB62.3010503@princeton.edu> Message-ID: <198a01c3aa35$0e9b6740$795d9680@slimowitzXP> > Right, I am running Scyld. > I already tried your "ugly" work-around, without success.
I get: > execmove: Exec format error > > OK, I knew I'd done that before, but I was able to re-create your problem. There are two gotchas to make this work: -Your shell script has to be a script, not just commands in a file you source (i.e., it starts #!/bin/sh). That takes care of the execmove error. -If you run a command that's not on the nodes (like cp) from a script, bpsh doesn't know to move that command out there, so, everything you use must reside on the node; you may need to stick a copy of /bin/cp out there somewhere. With those two caveats, it works just fine... I reiterate that there must be a better way... Dan ---------------------------------------------- Dan Stanzione, PhD dstanzio at nsf.gov AAAS Fellow Division of Graduate Education National Science Foundation (703)292-8121 Fax: (703) 292-9048 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 22:00:36 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 14 Nov 2003 03:00:36 GMT Subject: list managemnt issue In-Reply-To: <3FB44137.5090603@tamu.edu> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> <3FB44137.5090603@tamu.edu> Message-ID: <20031114030036.28845.qmail@houston.wolf.com> Gerry Creager N5JXS writes: > Count to 10. Don't respond initially with what you wanted to say... > Okay, I've followed the advice. Good. Perhaps I should have done that too-I am very passionate about spam and fighting all forms of network abuse. But: *timeout here* I was not talking about "you" as in you or the beowulf list. It was a generic "you." > > Reread the initial portion of my e-mail. I *DO* keep my system tight. > The last known compromise was a buffer overflow in apache, exploited > before it was announced by apache or bugtraq. 
And fixed appropriately as > soon as a patch was available (within hours). Because of system configs and > safeguards, no spam emitted from the site. The one previous to that was > caused by a buffer overflow exploit in wuftpd. That represents the last > time wuftpd ran on one of my systems. It also resulted in forensics > running back thru 3 other compromised systems in the US, and to 2 > originating machines in Germany. And some detentions (I never got final > word on arrests/convictions, if any). This is not what I would consider an open system. I certainly spend an awful lot of time keeping and eye on my system and fighting all of the slick ways they find to get spam through all my rbl, filters and avs. I stopped a hacker from UPenn (I think it was) as he was hacking. When they got to his house he was asleep with his girlfriend-someone had hacked into this linux box that was wide open. That I do consider negligent. > I've not had a documented case of an open relay. I've not been > appropriately accused of having spam transit any of my systems. I perform > periodic security audits. I no longer run honey-pots and tarpits because > of an Attorney General's opinion on their legality, but I have. > > AND YOU ARE GOING TO TELL ME TO TIGHTEN UP MY SYSTEM? See above. I am not sure they wouldn't pass muster. if someone is not predisposed to being a criminal and tresspassing and stealing from you-then having them is of no negative value. I am not Don Quixote. I am not trying to track down and chase spammers to ground.I do not run them. I do not smtp scan other boxes. All I am trying to do is keep spam out my box and those of my 2000 or so email users and when it does, I log it, keep a copy of the spam (kinda hard to protest one's innocent under those conditions)and RBL them until they get it fixed and hades freezes over-which ever comes first. I have been subject to one semi-spam complaint. Years ago. You can find it in NANE. 
It was a camera company that used my domain name internally and they spammed. > Sorry about your _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Nov 13 21:43:03 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 13 Nov 2003 20:43:03 -0600 Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> Message-ID: <3FB44137.5090603@tamu.edu> Count to 10. Don't respond initially with what you wanted to say... Okay, I've followed the advice. Reread the initial portion of my e-mail. I *DO* keep my system tight. The last known compromise was a buffer overflow in apache, exploited before it was announced by apache or bugtraq. And fixed appropriately as soon as a patch was available (within hours). Because of system configs and safeguards, no spam emitted from the site. The one previous to that was caused by a buffer overflow exploit in wuftpd. That represents the last time wuftpd ran on one of my systems. It also resulted in forensics running back thru 3 other compromised systems in the US, and to 2 originating machines in Germany. And some detentions (I never got final word on arrests/convictions, if any). I've not had a documented case of an open relay. I've not been appropriately accused of having spam transit any of my systems. I perform periodic security audits. I no longer run honey-pots and tarpits because of an Attorney General's opinion on their legality, but I have. AND YOU ARE GOING TO TELL ME TO TIGHTEN UP MY SYSTEM? Angel Rivera wrote: > Gerry Creager N5JXS writes: > >> Can someone *NOT* blackhole anyone? >> I'm sorry Joel. This is a hot-button. 
I've found myself blackholed >> in the past because I was on an ISDN modem, on DSL, from a University, >> and once for an open relay... that I didn't run. >> Getting out of the blackhole list is a PITA, and sometimes unachievable. >> I've firmly decided that blackhole/blacklisting spammers/potential >> spammers/someone I just don't like/etc. isn't the answer. I've had >> considerable success with graylisting, but that's not the problem here. >> What I guess I'm asking here is for the listadmin to unceremoniously >> unsubscribe *@systemsfirm.net for much the same reason you asked for >> them to be blackholed. >> Blacklist/blackhole implementations are, IMO, broken at best, and a >> number of the administrators of same I've dealt with are pompous >> juveniles who can't interact with a human when they make a mistake. > > > Knee jerk reactions are never good-no matter what side of the RBL > question you are on. > I love RBLs. They do exactly what they are supposed to do, block abuse > of my systems from the incompetent (at best), or deliberate abusive (at > worse) without having to add more of a burden to my and my users. Also, > I can with a two line entry control access to all my boxes. > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > get into RBLs by keeping their system configured correctly. 
-- 
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843

From becker at scyld.com Fri Nov 14 01:11:20 2003
From: becker at scyld.com (Donald Becker)
Date: Fri, 14 Nov 2003 01:11:20 -0500 (EST)
Subject: mpirun + Scyld MPI
In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E2B@lao-exchpo1-nt.nv.doe.gov>
Message-ID:

On Wed, 12 Nov 2003, Zukaitis, Anthony wrote:

> I am currently using MPI distributed with scyld which I believe is
> MPICH.

Which version of Scyld?

> I have 6 dual CPU nodes for a total of 12 cpu's. Whenever I try to use 12
> processors it puts 3 processes on one of the nodes and only one process on
> the master node.

That's the preferred behavior, and thus the default. The initial single
process, which will become MPI Rank 0, is on the master. The
initialization and scheduling is done single threaded. Additional
processes are created when MPI_Init() is called.

An alternate behavior is putting all processes on compute nodes. This
leaves the master free to manage the jobs. Rank 0 will be on a compute
node and thus may not have access to the full set of file systems and
scheduling information.

> I have tried using a machinefile like
> master:2
> .0:2
> .1:2
> .2:2
> .3:2
> .4:2

Using a 'machinefile' is old-fashioned and inflexible. Read the
'beomap' section in the manual for details on the many scheduling
options available with Scyld.

I'm guessing that you want the control of specifying an explicit job map
with the environment variable or command-line option:

   --map                 BEOWULF_JOB_MAP
      Use the colon-delimited list to specify which nodes to run on.
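As a rough illustration of the colon-delimited job map mentioned above,
the sketch below builds such a list for "N processes on each listed
node". The helper name build_job_map, the node numbering (-1 for the
master, 0 to 4 for the compute nodes) and the commented mpirun line are
assumptions made for the example, not taken from the Scyld manual; check
the 'beomap' section for the exact syntax on your release.

```shell
# Hypothetical helper: print a colon-delimited node list with
# <slots> entries for every node id given after the first argument.
build_job_map() {
    slots=$1
    shift
    map=""
    for node in "$@"; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            # Append ":<node>", without a leading colon on the first entry.
            map="${map:+$map:}$node"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$map"
}

# Assumed layout: 2 processes on the master (-1) and on nodes 0 through 4.
BEOWULF_JOB_MAP=$(build_job_map 2 -1 0 1 2 3 4)
export BEOWULF_JOB_MAP
echo "$BEOWULF_JOB_MAP"

# Assumed follow-up (not run here): mpirun -np 12 ./your_mpi_program
```

With the variable exported, the first entry in the list would be where
MPI Rank 0 lands, so reordering the node ids is the way to move Rank 0
off (or onto) the master in this sketch.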
It's also possible for the application to influence or specify a
process mapping, or for the administrator to install an alternate
scheduler as a dynamic library.

-- 
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From becker at scyld.com Fri Nov 14 01:16:02 2003
From: becker at scyld.com (Donald Becker)
Date: Fri, 14 Nov 2003 01:16:02 -0500 (EST)
Subject: 5th Annual Beowulf Bash at SC2003 -- early Wednesday evening
Message-ID:

Everyone attending SC2003 in Phoenix next week is invited to the
5th Annual Extreme Beowulf Bash. Yes, 5th already!

As usual, Scyld continues its role as a founding sponsor for the
traditional event.

WHEN: Wednesday November 18th / 6-8pm
WHERE: Hyatt Regency Hotel, Phoenix
       The Atrium
SPONSORS: AMD
          Penguin Computing
          Scyld

Other sponsors are welcome. Contact jcarrington at scyld.com.

For updates and additions check back at
http://www.beowulf.org/beowulf/bash
in the days before the event!
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From beowulf-announce-admin at scyld.com Fri Nov 14 08:10:02 2003 From: beowulf-announce-admin at scyld.com (beowulf-announce-admin at scyld.com) Date: Fri, 14 Nov 2003 08:10:02 -0500 Subject: Your message to Beowulf-announce awaits moderator approval Message-ID: <200311141310.hAEDA2S11540@NewBlue.scyld.com> Your mail to 'Beowulf-announce' with the subject ClusterWorld Is being held until the list moderator can review it for approval. The reason it is being held: Post to moderated list Either the message will get posted to the list, or you will receive notification of the moderator's decision. From deadline at linux-mag.com Fri Nov 14 08:47:14 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 14 Nov 2003 08:47:14 -0500 (EST) Subject: ClusterWorld In-Reply-To: Message-ID: Hi everyone, I am happy to announce ClusterWorld a new magazine about clusters. If you are going to SC2003 come see us at booth 637 for a free copy and sign up for three free issues. You can also check it out on-line right now at: http://www.clusterworld.com Did I mention the four node Microway AMD Opteron cluster we are giving away. 
Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Nov 14 10:03:20 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 14 Nov 2003 07:03:20 -0800 Subject: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF1C@orsmsx402.jf.intel.com> From: Alvin Oga [deletia] > > if the latest kernel has no effect, than there's some other > serious hw problems ... timing issues ?? > - make sure the kernel is compiled for athlon and not p4 > and smp enabled > > - memory clock speeds, marginal memeory sticks > ( get rid of generic no-name-brand memory sticks > - swap memory sticks and see if the problem > follow the memory ( keep good track of it > so you can easily identify it if all the memory > was thrown on the floor all at the same time > Instead of guessing, try memtest86 at http://www.memtest86.com/memtest86-3.0.tar.gz The easiest way to use it is download the package, copy the pre-built binary to a floppy, and boot the node with the floppy. It will run until you stop it. This should point to a memory problem if it exists. Make sure you read the docs, as there are some Athlon-specific comments IIRC. -- David N. Lombard My comments represent my opinions, not those of Intel. 
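The floppy step described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the helper name
write_boot_image is made up, and the unpacked directory name
(memtest86-3.0) and pre-built image name (precomp.bin) are guesses about
the tarball layout, so check the README in the package before running
anything against a real device.

```shell
# Hypothetical helper: copy a boot image onto a target device or file.
write_boot_image() {
    image=$1
    target=$2
    # Refuse to continue if the image is missing or unreadable.
    if [ ! -r "$image" ]; then
        echo "cannot read image: $image" >&2
        return 1
    fi
    # Raw block copy; dd's transfer statistics are discarded.
    dd if="$image" of="$target" bs=8192 2>/dev/null
}

# Assumed usage on a machine with a floppy drive (paths are guesses):
#   tar xzf memtest86-3.0.tar.gz
#   write_boot_image memtest86-3.0/precomp.bin /dev/fd0
```

Writing to a plain file first (instead of /dev/fd0) is a cheap way to
confirm the copy works before touching a real floppy.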
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 11:11:42 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 11:11:42 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068826302.28176.22.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. Does it work when using Lam/MPI ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Fri Nov 14 10:58:10 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Fri, 14 Nov 2003 09:58:10 -0600 (CST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: Message-ID: Hi, This is very similar to problems we're seeing on our dual Athlon MP 2600+ cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. -Jim On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote: > We are having an ongoing issue with our compute cluster, running Scyld > 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute > nodes and 1 master node. We are running the Navy's weather model. > > The problem: > The model runs fine when run on 4 processors (1 on each compute node). 
> However, when I use the SMP capabilities of the machine and try to run on, > say, 8 processors (using both CPUs on each compute node), everything will > run fine for a while. Then, at a non-consistent time, a node will > invariably freeze up. The cluster loses its connection to the > node and I cannot communicate with it using any of the cluster tools - > sometimes it will automatically reboot, but usually it requires me to go > perform a hard reset on the node. > > However, I have found that in most cases if I run 2 jobs in parallel (i.e. > 2 4-cpu processes, each using only 1 CPU on each node) things seem to work > fine. Nodes may still freeze from time to time but not nearly as often. > > The hardware: > The cluster was obtained pre-built from PSSC LabsEach compute node is a > dual-processor Tyan MB with 2 Athlon MP CPUS. They > are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 > Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We > are using the BeoMPI 1.0.7 implementation of MPICH compiled with: > --with-device=ch_p4 --with-comm=bproc > (note that I had to recompile BeoMPI with the PGI compiler to get it to > work with the model) > Again, we use Scyld Beowulf 28cz4 for the operating system > uname -a gives > Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 > i686 unknown > > _Please_ help if you have _any_ suggestions whatsoever. I am at the end > of my rope, and this is presenting a serious impediment to our research! > If you need more information, let me know and I will be happy to provide > it! > > Thanks... 
> > Tim Whitcomb > Meteorologist > University of Washington Applied Physics Lab > twhitcomb at apl.washington.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Fri Nov 14 11:19:28 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Fri, 14 Nov 2003 10:19:28 -0600 (CST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: <1068826302.28176.22.camel@roughneck.liniac.upenn.edu> Message-ID: On Fri, 14 Nov 2003, Nicholas Henke wrote: > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > Hi, > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > Does it work when using Lam/MPI ? I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam bproc-compatible now? Do you have some theory that the sockets code is somehow responsible (other than simply stressing the machine)? 
-Jim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 11:31:57 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 11:31:57 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 11:19, Jim Phillips wrote: > On Fri, 14 Nov 2003, Nicholas Henke wrote: > > > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > > Hi, > > > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > > > Does it work when using Lam/MPI ? > > I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam > bproc-compatible now? Do you have some theory that the sockets code is > somehow responsible (other than simply stressing the machine)? The 7.X tree of lam is bproc 'aware', and as far as why I would try it -- just another datapoint. It may help to isolate what exactly is causing the hangs. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. 
of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Nov 14 12:29:52 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Fri, 14 Nov 2003 12:29:52 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <3FB51110.9060002@scalableinformatics.com> Hi Jim: I commented to Timothy offline that I am seeing stability problems on my customers machines based upon Tyan 2466 MB's. Some success via MB replacement (after isolating subsystems through memory tests/exchange, IO loads, net loads,...). Some were CPU replacement, the CPUs seemed to be burned. Failure was very difficult to isolate, lots of symptoms, few were repeatable. Joe Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > -Jim > > > On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote: > > >>We are having an ongoing issue with our compute cluster, running Scyld >>28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute >>nodes and 1 master node. We are running the Navy's weather model. >> >>The problem: >>The model runs fine when run on 4 processors (1 on each compute node). >>However, when I use the SMP capabilities of the machine and try to run on, >>say, 8 processors (using both CPUs on each compute node), everything will >>run fine for a while. Then, at a non-consistent time, a node will >>invariably freeze up. The cluster loses its connection to the >>node and I cannot communicate with it using any of the cluster tools - >>sometimes it will automatically reboot, but usually it requires me to go >>perform a hard reset on the node. 
>> >>However, I have found that in most cases if I run 2 jobs in parallel (i.e. >>2 4-cpu processes, each using only 1 CPU on each node) things seem to work >>fine. Nodes may still freeze from time to time but not nearly as often. >> >>The hardware: >>The cluster was obtained pre-built from PSSC LabsEach compute node is a >>dual-processor Tyan MB with 2 Athlon MP CPUS. They >>are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 >>Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We >>are using the BeoMPI 1.0.7 implementation of MPICH compiled with: >>--with-device=ch_p4 --with-comm=bproc >>(note that I had to recompile BeoMPI with the PGI compiler to get it to >>work with the model) >>Again, we use Scyld Beowulf 28cz4 for the operating system >>uname -a gives >>Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 >>i686 unknown >> >>_Please_ help if you have _any_ suggestions whatsoever. I am at the end >>of my rope, and this is presenting a serious impediment to our research! >>If you need more information, let me know and I will be happy to provide >>it! >> >>Thanks... 
>> >>Tim Whitcomb >>Meteorologist >>University of Washington Applied Physics Lab >>twhitcomb at apl.washington.edu >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 12:00:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 12:00:49 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> References: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> Message-ID: <1068829249.28174.26.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 11:31, Nicholas Henke wrote: > On Fri, 2003-11-14 at 11:19, Jim Phillips wrote: > > On Fri, 14 Nov 2003, Nicholas Henke wrote: > > > > > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > > > Hi, > > > > > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > > > > > Does it work when using Lam/MPI ? > > > > I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam > > bproc-compatible now? 
Do you have some theory that the sockets code is > > somehow responsible (other than simply stressing the machine)? > > The 7.X tree of lam is bproc 'aware', and as far as why I would try it > -- just another datapoint. It may help to isolate what exactly is > causing the hangs. BTW -- what version of bproc + kernel are you running ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Fri Nov 14 14:02:11 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Fri, 14 Nov 2003 11:02:11 -0800 Subject: limiting cpu usage on grid engine Message-ID: <3FB526B3.2020509@cert.ucr.edu> Hi, I was wondering if anyone would know an easy way to limit the amount of cpu's a user can request when they run an mpich job under grid engine? I was hoping to set the limit to around 16, and if a user requests more than that, then I'd like their job to be rejected. Thanks in advance, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfaerber at penguincomputing.com Fri Nov 14 12:42:57 2003 From: nfaerber at penguincomputing.com (Nate Faerber) Date: 14 Nov 2003 09:42:57 -0800 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068831777.12797.47.camel@m10.penguincomputing.com> > The problem: > The model runs fine when run on 4 processors (1 on each compute node). > However, when I use the SMP capabilities of the machine and try to run on, > say, 8 processors (using both CPUs on each compute node), everything will > run fine for a while. Then, at a non-consistent time, a node will > invariably freeze up. 
The cluster loses its connection to the
> node and I cannot communicate with it using any of the cluster tools -
> sometimes it will automatically reboot, but usually it requires me to go
> perform a hard reset on the node.
>
> However, I have found that in most cases if I run 2 jobs in parallel (i.e.
> 2 4-cpu processes, each using only 1 CPU on each node) things seem to work
> fine. Nodes may still freeze from time to time but not nearly as often.

Do you have experience running this software on other clusters with SMP?
I have seen a software package that did not perform well (or properly)
on SMP systems. We have a customer that could only run one process per
system. This limitation was known to the software vendor and
customer. It may not be the case now for that piece of software if it
has matured since then.

-- 
Nate Faerber, Engineer
Tel: 415-358-2666 Fax: 415-358-2646 Toll Free: 888-PENGUIN
PENGUIN COMPUTING
www.penguincomputing.com

From ajt at rri.sari.ac.uk Fri Nov 14 12:30:01 2003
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Fri, 14 Nov 2003 17:30:01 +0000
Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC
Message-ID: <3FB51119.2070506@rri.sari.ac.uk>

I've encountered problems using multiple 3C905CX-TXM NIC's in MSI KT3/4
AMD motherboards using the Scyld 3c59x kernel driver module with a Linux
2.4.22-openmosix1 kernel under Red Hat 8.0 Linux (updated by apt-get
from the Fedora Linux rpm repository).

Installing a single NIC is detected by kudzu, and it works correctly.
Sometimes, installing a second NIC works, but sometimes it compromises
the first NIC, sometimes the second NIC works. Similarly with a third
NIC...
The puzzling thing is that by selecting cards from a 'pool' of NIC's I
bought for the cluster I can eventually get three that *will* work
together. All these NIC's are brand new.

The NIC's are connected to a Cisco switch and auto-negotiate with it.
Using netdiag 'vortex-diag' I can verify that the NIC's are installed
correctly and I can reset them using 'mii-diag -R' after which they
re-negotiate with the switch and work correctly, but they do not work
again from a cold boot (power off/on and reboot) without being reset
like this manually with mii-diag.

I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not
tested them exhaustively (all are brand new) but this problem appears to
be common to all of them. I wonder if anyone else has a similar problem?

[I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same
3c59x driver+kernel on my original openMosix cluster but I didn't have
problems like this!].

The present configuration of NIC's in my motherboards works correctly
from a cold boot, but not with just 'any' 3C905CX-TXM NIC's fitted:

KT4V-L (head node of 8-node 'bobcat' cluster 7 at GA-7ZXE's)
    eth0    3COM 3C905CX
    eth1    3COM 3C905CX
    eth2    3COM 3C905CX
    eth3    VIA "Rhine" on-board LAN adapter (disabled)

KT4AV-L (head node of 24-node 'topcat' cluster 23 at KT3Ultra-2)
    eth0    3COM 3C905CX
    eth1    3COM 3C2000-T
    eth2    VIA "Rhine" on-board LAN adapter

Although I've 'solved' the problem for the two head nodes, I've got
another 23 systems to configure and I'd like to know if there is a
work-around for this problem that will let me install 'any' NIC?

Tony.
-- 
Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.
| fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Nov 14 15:06:23 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 14 Nov 2003 15:06:23 -0500 (EST) Subject: Beowulf Bash update: date is Wednesday 19th, 2003 at SC2003 Message-ID: My cut-n-paste from last year's text turned into a snarf-n-barf when I missed changing the date. The web site contains the accurate information, and has just been updated: http://www.beowulf.org/beowulf/bash Our links to pictures of past Bashes have faded with time. If anyone has on-line copies, let me know so that I can add them to the page. ________________ 5th Annual Extreme Beowulf Bash. WHEN: Wednesday November 19th / 6-8pm WHERE: Hyatt Regency Hotel, Phoenix The Atrium SPONSORS: AMD Penguin Computing Scyld Other sponsors are welcome. Contact jcarrington at scyld.com. For updates and additions check back at http://www.beowulf.org/beowulf/bash in the days before the event! 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Nov 14 16:11:06 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 14 Nov 2003 13:11:06 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) - memory In-Reply-To: Message-ID: hi ya jim On Fri, 14 Nov 2003, Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. cpu/nic is very specific how about specific manufacturer of the memory ? ( and it does make a very big difference ) - mushkin, corsair, kingston would be amongst the first few vendors that we would start from - lei, century(?), couple other is the next tier - i would not use any of the rest of the vendors in any dual-cpu systems and more importantly, do you know if they used "brand new memory" or recycled memory from other dead/randomly crashing systems ( returned parts ) - systems usually worked, solved itself, when i bought brand new memory from the distributor or a pc store i trusted would sell me new parts vs returned parts thanx alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Nov 14 16:00:46 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 14 Nov 2003 13:00:46 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: 
<187D3A7CAB42A54DB61F1D05F012572201D4BF1C@orsmsx402.jf.intel.com>
Message-ID:

hi ya

On Fri, 14 Nov 2003, Lombard, David N wrote:

> From: Alvin Oga
> [deletia]
> >
> > if the latest kernel has no effect, then there's some other
> > serious hw problems ... timing issues ??
> > - make sure the kernel is compiled for athlon and not p4
> >   and smp enabled
> >
> > - memory clock speeds, marginal memory sticks
> >   ( get rid of generic no-name-brand memory sticks )
> > - swap memory sticks and see if the problem
> >   follows the memory ( keep good track of it
> >   so you can easily identify it if all the memory
> >   was thrown on the floor all at the same time )
> >
> Instead of guessing, try memtest86 at
> http://www.memtest86.com/memtest86-3.0.tar.gz

i have yet to see memtest find a failure that's real
or pass the memory that works in a given system ...
where you know the system crashes, and yet moving the mem stick
to another system gives you identical failures or passes
the "application running tests"

other memory testers
http://www.Linux-1U.net/Diags/#Mem
http://www.Linux-1U.net/Memory/#Test

> The easiest way to use it is download the package, copy the pre-built
> binary to a floppy, and boot the node with the floppy. It will run
> until you stop it.
>
> This should point to a memory problem if it exists. Make sure you read
> the docs, as there are some Athlon-specific comments IIRC.
have fun
alvin

From bogdan.costescu at iwr.uni-heidelberg.de Fri Nov 14 18:55:36 2003
From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Sat, 15 Nov 2003 00:55:36 +0100 (CET)
Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC
In-Reply-To: <3FB51119.2070506@rri.sari.ac.uk>
Message-ID:

On Fri, 14 Nov 2003, Tony Travis wrote:

> 2.4.22-openmosix1 kernel under Red Hat 8.0 Linux (updated by apt-get
> from the Fedora Linux rpm repository).

It's not clear what part of Fedora you are using now. Are you using kudzu
from Fedora? It appears to create some problems with 3C905 cards; there
are some bug reports in Red Hat's Bugzilla, but so far nothing conclusive.
The only "solution" is to disable kudzu...

> Installing a single NIC is detected by kudzu, and it works correctly.

You can try deactivating kudzu ("chkconfig kudzu off") and run it manually
only when adding cards.

> The puzzling thing is that by selecting cards from a 'pool' of NIC's I
> bought for the cluster I can eventually get three that *will* work
> together.

That's interesting. And after you find these 3 cards that work together
they will _always_ work even after reboot?

> I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not
> ...
> [I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same

My guess is that it's something related to ACPI. GA-7ZXE didn't have
support for it.
--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De

From ajt at rri.sari.ac.uk Sat Nov 15 11:46:54 2003
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Sat, 15 Nov 2003 16:46:54 +0000
Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC
In-Reply-To:
References:
Message-ID: <3FB6587E.6020209@rri.sari.ac.uk>

Bogdan Costescu wrote:

> [...]
> It's not clear what part of Fedora you are using now. Are you using kudzu
> from Fedora? It appears to create some problems with 3C905 cards; there
> are some bug reports in Red Hat's Bugzilla, but so far nothing conclusive.
> The only "solution" is to disable kudzu...

Hello, Bogdan.

I originally installed RH8.0 from the 'Psyche' iso distribution, then
updated it periodically from RHN. Recently, I installed apt-get from the
Fedora RH8.0 repository. I now update and upgrade from there instead.

>>Installing a single NIC is detected by kudzu, and it works correctly.
>
>
> You can try deactivating kudzu ("chkconfig kudzu off") and run it manually
> only when adding cards.

Tried that - makes no difference: These 3C905CX NIC's are failing to
auto-negotiate with the Cisco switch at a low level, not failing to be
detected and installed by kudzu.

>[...]
> That's interesting. And after you find these 3 cards that work together
> they will _always_ work even after reboot?

Yes, that's right: once I have a 'set' of NIC's that work, they continue
to work reliably. Even NIC's that don't initialise correctly on a cold
boot will re-negotiate and connect properly if the transceiver is reset
manually using "mii-diag -R".
>>I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not
>>...
>>[I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same
>
>
> My guess is that it's something related to ACPI. GA-7ZXE didn't have
> support for it.

We spent quite a while deciding which boards to use: MSI are recommended
by AMD. I've no other complaint about the boards.

I'm not sure what level of auto-configuration the NIC's are capable of at
PC BIOS level. I'm using the 3COM 3c2000 Linux 2.4 driver for the
3C2000-T NIC and it doesn't negotiate with the switch until the Linux
driver is loaded.

The 3C905CX's appear to wake up as soon as the ATX PSU AC is powered on.
The status LED is green, which indicates 10Base-T, and the NIC's appear as
10Base-T on the Cisco switch display panel. The NIC's negotiate with the
Cisco switch as soon as the motherboard power switch is pressed. NIC's
that fail to auto-negotiate end up with a flashing amber LED. A steady
amber LED is present on NIC's that work. This all happens before GRUB
boots the system (i.e. it is done at BIOS level).

When Linux boots, the cards are all seen but, as I described, sometimes
they don't work. The cards appear to be started by the Linux kernel 'OK'
and can be seen by ifconfig, but if they have a flashing amber LED, they
will not work until manually reset using "mii-diag -R".

I thought the 3c59x driver would do something similar to initialise the
NIC's instead of relying on the BIOS to do it, or have I misunderstood the
problem?

Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.
| fax:+44 (0)1224 716687

From larry at pssclabs.com Sat Nov 15 14:01:53 2003
From: larry at pssclabs.com (Larry Lesser)
Date: Sat, 15 Nov 2003 11:01:53 -0800
Subject: Scyld Nodes Freezing w/ SMP (fwd)
In-Reply-To:
Message-ID: <5.1.0.14.2.20031115110107.035e5698@mail.pssclabs.com>

Tim:

As this cluster has been running for over a year without any crashes, I
would suspect that the hardware is fine. In general, the Tyan 2466
supports SMP applications fairly well. We have installed many Beowulfs
using the Tyan 2466 without any SMP issues. However, most customers use
Red Hat.

Have you tried running the model with both processors only on the head
node? If that fails, you may want to install a current version of Red Hat
and see if that works better.

Larry

At 05:51 PM 11/13/2003 -0800, you wrote:
>We are having an ongoing issue with our compute cluster, running Scyld
>28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute
>nodes and 1 master node. We are running the Navy's weather model.
>
>The problem:
>The model runs fine when run on 4 processors (1 on each compute node).
>However, when I use the SMP capabilities of the machine and try to run on,
>say, 8 processors (using both CPUs on each compute node), everything will
>run fine for a while. Then, at a non-consistent time, a node will
>invariably freeze up. The cluster loses its connection to the
>node and I cannot communicate with it using any of the cluster tools -
>sometimes it will automatically reboot, but usually it requires me to go
>perform a hard reset on the node.
>
>However, I have found that in most cases if I run 2 jobs in parallel (i.e.
>2 4-cpu processes, each using only 1 CPU on each node) things seem to work
>fine. Nodes may still freeze from time to time but not nearly as often.
>
>The hardware:
>The cluster was obtained pre-built from PSSC Labs. Each compute node is a
>dual-processor Tyan MB with 2 Athlon MP CPUs. They
>are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982
>Dual Port Server Cyclone rev 78 and the 3c59x kernel driver is used). We
>are using the BeoMPI 1.0.7 implementation of MPICH compiled with:
>--with-device=ch_p4 --with-comm=bproc
>(note that I had to recompile BeoMPI with the PGI compiler to get it to
>work with the model)
>Again, we use Scyld Beowulf 28cz4 for the operating system
>uname -a gives
>Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002
>i686 unknown
>
>_Please_ help if you have _any_ suggestions whatsoever. I am at the end
>of my rope, and this is presenting a serious impediment to our research!
>If you need more information, let me know and I will be happy to provide
>it!
>
>Thanks...
>
>Tim Whitcomb
>Meteorologist
>University of Washington Applied Physics Lab
>twhitcomb at apl.washington.edu

Larry Lesser
949-380-7288
www.pssclabs.com

From fredruopp at yahoo.com Sun Nov 16 10:44:01 2003
From: fredruopp at yahoo.com (Fred Ruopp)
Date: Sun, 16 Nov 2003 07:44:01 -0800 (PST)
Subject: Optimal SMP Stucture for Opteron
Message-ID: <20031116154401.19933.qmail@web60309.mail.yahoo.com>

In order to build a 16 to 32 processor Opteron machine without corporate
resources, a high performance and economic approach would seem to be a
cluster of Quad motherboards interconnected by InfiniBand host channel
adapters (a la SBS Technologies) or, possibly, a less expensive RemoteDMA data
transfer PCI card.

This approach stems from the little I know of the Opteron memory model; it
seems that the Opteron leans towards NUMA memory management in an SMP
system with more than 8 CPU's. Many have opined that Opteron's current
HyperTransport bus becomes saturated with 8 CPU's on one board. The
locality of each CPU's memory seems to fit a NUMA model best and, more so,
as the number of CPU's rises.

SGI's Altix has an approach somewhat similar to this. One distinctive
feature that SGI has added to the Linux kernel to empower their NUMA model
is process affinity - linking a process to one (or a group) of CPU's.

If someone with an intimate knowledge of NUMA could critique this general
approach for an SMP Opteron system, I would appreciate it greatly.

From laytonjb at comcast.net Sun Nov 16 11:11:26 2003
From: laytonjb at comcast.net (Jeffrey B. Layton)
Date: Sun, 16 Nov 2003 11:11:26 -0500
Subject: Optimal SMP Stucture for Opteron
In-Reply-To: <20031116154401.19933.qmail@web60309.mail.yahoo.com>
References: <20031116154401.19933.qmail@web60309.mail.yahoo.com>
Message-ID: <3FB7A1AE.5020307@comcast.net>

Fred,

I think the first question to answer is: what do you want to do with the
cluster? In other words, what are your applications? Also, what do you
mean by 'without corporate resources'? If you can start filling in the
answers to these questions, it becomes a little easier to give advice
(although that has never stopped me before) :)

Let me embellish a little.
If your application doesn't require much network bandwidth or if network
latency is not important, then you can consider a slower, cheaper network
such as FastE or GigE (GigE is pretty cheap for smaller systems right now
or you can use the smaller GigE switches in some kind of tree
arrangement). Even if bandwidth and latency are important to some degree
you could also start looking at dual Opteron boxes instead of 4-way or
8-way boxes. This may get you the performance you need or may have a
better price/performance ratio.

One last comment. This next week is SC2003 so many of the regular posters
to this list won't be posting much. So don't be surprised if you don't get
many answers to your questions right away. However, in the meantime, I
think I can safely say that if you start thinking about the answers to
these first few questions, the more likely you are to get more concrete
answers.

Jeff

P.S. And to all of you folks going to SC2003 while I sit at
work sucking on the glass teat, green with envy, the lot of you
are all bastards! Bastards I say! Where's the love? I need T-shirts!

> In order to build a 16 to 32 processor Opteron
>machine
>without corporate resources, a high performance and
>economic approach would seem to be a cluster of Quad
>motherboards interconnected by InfiniBand host channel
>adapters (a la SBS Technologies) or, possibly, a less
>expensive RemoteDMA data transfer PCI card.
>
> This approach stems from the little I know of the
>Opteron memory model; it seems that the Opteron leans
>towards NUMA memory management in an SMP system with
>more than 8 CPU's. Many have opined that Opteron's
>current HyperTransport bus becomes saturated with 8
>CPU's on one board. The locality of each CPU's memory
>seems to fit a NUMA model best
>and, more so, as the number of CPU's rises.
>
> SGI's Altix has an approach somewhat similar to
>this.
> One distinctive feature that SGI has added to
>the Linux kernel to empower their NUMA model is
>process affinity - linking a process to one (or a
>group) of CPU's.
>
> If someone with an intimate knowledge of NUMA could
>critique this general approach for a SMP Opteron
>system, I would appreciate it greatly.

From andrewxwang at yahoo.com.tw Sun Nov 16 20:18:01 2003
From: andrewxwang at yahoo.com.tw (Andrew Wang)
Date: Mon, 17 Nov 2003 09:18:01 +0800 (CST)
Subject: top500 list (was: opteron VS Itanium 2)
Message-ID: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com>

Sorry Greg, top500 list came out and you lost!

BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops
Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops

Apple BigMac is number 3, while the Opteron cluster is
number 6.

Also, the 1936-CPU IA64 cluster is at the 5th place,
at 8.6 TFlops.

http://www.top500.org/dlist/2003/11/

Andrew.

> > This would place the Big Mac in the 3rd place on
> > the top500 list
>
> Except that there are several other new large
> clusters that will
> likely place higher -- LANL announced a 2,048 cpu
> Opteron cluster a
> while back, and LLNL has something new, too, I
> think. Comparing
> yourself to the obsolete list in multiple press
> releases isn't very clever.

From lindahl at pathscale.com Mon Nov 17 02:01:11 2003
From: lindahl at pathscale.com (Greg Lindahl)
Date: Sun, 16 Nov 2003 23:01:11 -0800
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com>
References: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com>
Message-ID: <20031117070111.GB18073@greglaptop.greghome.keyresearch.com>

On Mon, Nov 17, 2003 at 09:18:01AM +0800, Andrew Wang wrote:

> Sorry Greg, top500 list came out and you lost!

'tis true, I did lose. It is nice to see a bunch of new clusters in
the top 10.

-- greg

From lindahl at pathscale.com Mon Nov 17 01:59:39 2003
From: lindahl at pathscale.com (Greg Lindahl)
Date: Sun, 16 Nov 2003 22:59:39 -0800
Subject: Optimal SMP Stucture for Opteron
In-Reply-To: <3FB7A1AE.5020307@comcast.net>
References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <3FB7A1AE.5020307@comcast.net>
Message-ID: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com>

On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote:

> P.S. And to all of you folks going to SC2003 while I sit at
> work sucking on the glass teat, green with envy, the lot of you
> are all bastards! Bastards I say! Where's the love? I need T-shirts!

We'll drink a beer on your behalf, before returning home with piles
and piles of loot. Unfortunately, Yotta Yotta is kaput, so no more of
the cute fuzzy orange cubes.
-- greg

From gerry.creager at tamu.edu Mon Nov 17 02:36:02 2003
From: gerry.creager at tamu.edu (Gerry Creager N5JXS)
Date: Mon, 17 Nov 2003 01:36:02 -0600
Subject: Optimal SMP Stucture for Opteron
In-Reply-To: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com>
References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <3FB7A1AE.5020307@comcast.net> <20031117065939.GA18073@greglaptop.greghome.keyresearch.com>
Message-ID: <3FB87A62.3000406@tamu.edu>

and all the TAMU folks going are gonna snag the toys for themselves! I'm
helping plan their Grid Network for the State and I didn't even get a
lousy T-shirt!

This work stuff is getting in the way of all the cool meetings.

gerry

Greg Lindahl wrote:
> On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote:
>
>
>>P.S. And to all of you folks going to SC2003 while I sit at
>>work sucking on the glass teat, green with envy, the lot of you
>>are all bastards! Bastards I say! Where's the love? I need T-shirts!
>
>
> We'll drink a beer on your behalf, before returning home with piles
> and piles of loot. Unfortunately, Yotta Yotta is kaput, so no more of
> the cute fuzzy orange cubes.
>
> -- greg

--
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020
FAX: 979.847.8578 Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843

From john.hearns at clustervision.com Mon Nov 17 06:23:13 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Mon, 17 Nov 2003 12:23:13 +0100 (CET)
Subject: ClusterWorld
In-Reply-To:
Message-ID:

Three month trial only valid within the US :-(

If I cried and pleaded, and said I regularly bought Linux Magazine
retail in Borders in the UK, could I get a sample copy?

Hope everyone is having a good time at SC.

From becker at scyld.com Mon Nov 17 07:24:32 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 17 Nov 2003 07:24:32 -0500 (EST)
Subject: Optimal SMP Stucture for Opteron
In-Reply-To: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com>
Message-ID:

On Sun, 16 Nov 2003, Greg Lindahl wrote:

> On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote:
>
> > P.S. And to all of you folks going to SC2003 while I sit at
> > work sucking on the glass teat, green with envy, the lot of you
> > are all bastards! Bastards I say! Where's the love? I need T-shirts!
>
> We'll drink a beer on your behalf, before returning home with piles
> and piles of loot.

Another reminder: the Beowulf Bash is Wednesday night, and we'll have some
interesting things to announce.

Jeff, if it makes you feel any better, T-shirts were way down last year.
And for most of them you had to sit through a sales presentation.

And we'll wait until after the show to gloat about how cool the swag
was this year ;->

> Unfortunately, Yotta Yotta is kaput, so no more of
> the cute fuzzy orange cubes.

Doh! I _loved_ the sound clip they had! "Yotta Yotta" really fast and
high pitched. My cubes disappeared as soon as they got back to the office.

--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From andrewxwang at yahoo.com.tw Mon Nov 17 08:15:34 2003
From: andrewxwang at yahoo.com.tw (Andrew Wang)
Date: Mon, 17 Nov 2003 21:15:34 +0800 (CST)
Subject: limiting cpu usage on grid engine
Message-ID: <20031117131534.38200.qmail@web16807.mail.tpe.yahoo.com>

You can send questions to the mailing lists hosted on
the project homepage:

http://gridengine.sunsource.net

Andrew.

From raysonlogin at yahoo.com Mon Nov 17 10:20:49 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Mon, 17 Nov 2003 07:20:49 -0800 (PST)
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
Message-ID: <20031117152049.99076.qmail@web11406.mail.yahoo.com>

--- Mark Hahn wrote:
> and their yield is around 59%. not to mention the little bit
> of missing ECC...

I didn't follow, which "yield" are you referring to??

> I'd like very much to know the actual prices and discounts for Big
> Mac.
> it's a shame this isn't required for Top500...

The price is around 1100*3000*(discount for edu) + cost of interconnect

5.2M $ is not too far away from the "actual price"...

Rayson

From hahn at physics.mcmaster.ca Mon Nov 17 09:33:51 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Mon, 17 Nov 2003 09:33:51 -0500 (EST)
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com>
Message-ID:

> Sorry Greg, top500 list came out and you lost!

sigh. Apple bought a benchmark. does this make Apple products better?

> BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops

OK, so we already know that their pricing disclosures are ah,
"optimistic". (nodes probably cost >6k apiece, not half that)
and their yield is around 59%. not to mention the little bit
of missing ECC...

actually, I'd love to know what their yield is if they had only
4GB per node - almost certainly lower.

> Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops

yield is 72%.

> Apple BigMac is number 3, while the Opteron cluster is
> number 6.

for the billionth time: rmax is just a matter of how much money
you have. rmax/rpeak is the only part of top500 that matters.

I'd like very much to know the actual prices and discounts for Big Mac.
it's a shame this isn't required for Top500...

From dtj at uberh4x0r.org Mon Nov 17 10:28:10 2003
From: dtj at uberh4x0r.org (Dean Johnson)
Date: Mon, 17 Nov 2003 09:28:10 -0600
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References:
Message-ID: <1069082890.2659.69.camel@terra>

On Mon, 2003-11-17 at 08:33, Mark Hahn wrote:
> > Sorry Greg, top500 list came out and you lost!
>
> sigh. Apple bought a benchmark. does this make Apple products better?
>

Virginia bought a benchmark, not Apple. According to the reports, he paid
list for them. Course he got extra goodies as a result of buying so many,
but I suspect that behavior would occur with any vendor.

BTW, yes, they are pretty damned good. Running them head to head against
other machines, even without benchmarking heroics, they stand up quite
well. One Amber benchmark that a colleague ran showed that a 2 GHz G5 was
nearly twice as fast as a 2.8 GHz Xeon. That's one datapoint, but other
testing has shown that it continues to do pretty well. We can't do real
apples-to-apples comparisons, no pun intended, because we only have
gigabit on the G5's and things like Amber and Gromacs seem to run into
that pretty quickly.

> > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops
>
> OK, so we already know that their pricing disclosures are ah,
> "optimistic".
> (nodes probably cost >6k apiece, not half that)
> and their yield is around 59%. not to mention the little bit
> of missing ECC...

I'm sure that the infiniband figures into the discrepancy a bit. $3k is
roughly the list for a stock box.

>
> actually, I'd love to know what their yield is if they had only
> 4GB per node - almost certainly lower.
>
> > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops
>
> yield is 72%.
>
> > Apple BigMac is number 3, while the Opteron cluster is
> > number 6.
>
> for the billionth time: rmax is just a matter of how much money
> you have. rmax/rpeak is the only part of top500 that matters.

Perhaps for shallow people who put stock in such lists. What really
matters is how much of that peak can be applied to actual computationally
intense problems that the owner considers needing solved. It doesn't
matter how fast the Ferrari can go, as the speed limit is still only 70.

>
> I'd like very much to know the actual prices and discounts for Big Mac.
> it's a shame this isn't required for Top500...
>

Does Virginia have to include the 600-700 pizzas that they bought for the
volunteers during the construction phase? ;-) Perhaps they should also
have something like watts per gigaflop, or cubic feet occupied per
gigaflop.

--
-Dean

From jeffrey.b.layton at lmco.com Mon Nov 17 11:28:25 2003
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Mon, 17 Nov 2003 11:28:25 -0500
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References:
Message-ID: <3FB8F729.3080709@lmco.com>

Mark Hahn wrote:

> > Sorry Greg, top500 list came out and you lost!
>
> sigh. Apple bought a benchmark. does this make Apple products better?
>
> > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops
>
> OK, so we already know that their pricing disclosures are ah,
> "optimistic". (nodes probably cost >6k apiece, not half that)
> and their yield is around 59%. not to mention the little bit
> of missing ECC...
>

I went through a straw-man on pricing at one time. Let me dig
that up....

I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of
memory, and about 80 Gigs of storage for $3k (commercial
pricing in single units). So, for laughs, let's assume $3k per box
for the 4 Gigs of memory and the storage.

Node cost: 1100 x $3k = $3.3 million

For the IB network, I've been using $1500-$1600 per node based
on quotes I've gotten from other companies.

IB cost: 1100 x $1500 = $1.65 million

For the Cisco network, I have no idea. Why anybody would use
Cisco crap in an HPC system even for a management network
is beyond me. So, let's just guess, $300k.

Cisco network cost: $300k

Total so far: $5.25 million

Rack cost - again I have no idea. I'll be kind and gentle and
guesstimate about $1500 a rack. It looks like they're getting about
12 nodes per rack, so assume 92 racks.

Rack cost: $1500 x 92 = $138k

Let's exclude the floor space, windows, pizzas, chillers, etc.
and figure out the total:

Total = $5.388 million

I guess I'm not too far off. Personally I think the big unknown is
the rack cost. That could be very expensive since it's specialized
(although 92 racks in a single sale might be considered a
commodity). Also, the Cisco costs could be high as well (Cisco
never does anything it can't make money off of).

This was just for laughs. I still think there is a sugar daddy
somewhere in there. Be it Cisco, Apple, IBM, etc., there are some
costs not being mentioned.

Jeff

--
Dr.
Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta

From hahn at physics.mcmaster.ca Mon Nov 17 11:50:14 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Mon, 17 Nov 2003 11:50:14 -0500 (EST)
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <3FB8F729.3080709@lmco.com>
Message-ID:

> I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of
> memory, and about 80 Gigs of storage for $3k (commercial
> pricing in single units). So, for laughs, let's assume $3k per box
> for the 4 Gigs of memory and the storage.

I did the pricing too. afaict, it's actually 8GB per node,
so the price was just under $5900 list. I'd guess that the IB
hardware is at least $1500 list.

> Node cost: 1100 x $3k = $3.3 million

around 8M list.

> Total = $5.388 million

it's a very good price, no doubt. it would be nice if Top500
would require full price disclosures - for instance, could I
take a same-sized pile of cash to Apple/Mellanox and get the
same discount? I doubt it.

besides price, lack of ECC is a big question. how many
other Top500 scores are ECC-less? does anyone know the FIT rate
for dram nowadays? I figure BigMac has either 7e4 or 1.4e5
dram chips...

how much does it help your HPL score to run 4GB/cpu?
I'd guess that most clusters are lighter than that.

regards, mark hahn.
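
For anyone who wants to redo the arithmetic in this sub-thread, here is a rough sketch in Python. The inputs are the guesses posted above (1100 nodes, $3k versus roughly $5900 per node, $1500 per node for IB, $300k for the Cisco gear, 92 racks at $1500, 8GB per node). The helper names are made up for illustration, and the DRAM densities (1 Gbit and 512 Mbit per chip) are an assumption, chosen only because they happen to reproduce the 7e4 and 1.4e5 chip counts.

```python
# Back-of-envelope numbers taken from the thread; not official pricing.
NODES = 1100


def cluster_cost(node_price, ib_per_node=1500, network=300_000,
                 racks=92, rack_price=1500):
    """Total hardware cost in dollars (hypothetical helper)."""
    return (NODES * node_price + NODES * ib_per_node
            + network + racks * rack_price)


def dram_chips(gb_per_node, chip_megabits):
    """DRAM chip count for the whole cluster, ignoring ECC or spare devices."""
    chip_megabytes = chip_megabits // 8
    return NODES * gb_per_node * 1024 // chip_megabytes


strawman = cluster_cost(3000)  # $3k-per-node straw-man
at_list = cluster_cost(5900)   # ~$5900 list price for an 8GB node

print(f"straw-man total:   ${strawman:,}")
print(f"list-price total:  ${at_list:,}")
print(f"chips at 1 Gbit:   {dram_chips(8, 1024):,}")  # assumed density
print(f"chips at 512 Mbit: {dram_chips(8, 512):,}")   # assumed density
```

The straw-man case lands on the $5.388 million total quoted above; the list-price case only swaps the per-node price, so the gap between the two totals is one way to read the size of the discount being argued about.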
From ctierney at hpti.com Mon Nov 17 10:06:40 2003
From: ctierney at hpti.com (Craig Tierney)
Date: 17 Nov 2003 08:06:40 -0700
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References:
Message-ID: <1069081600.14602.57.camel@localhost.localdomain>

On Mon, 2003-11-17 at 07:33, Mark Hahn wrote:
> > Sorry Greg, top500 list came out and you lost!
>
> sigh. Apple bought a benchmark. does this make Apple products better?

Dell bought a benchmark at NCSA (See Tungsten at #4). Does this mean that Dell knows clusters? I would be more than happy to have a vendor decide to let me have their stuff at a significant discount so they can have a good benchmark.

>
> > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops
>
> OK, so we already know that their pricing disclosures are ah,
> "optimistic". (nodes probably cost >6k apiece, not half that)
> and their yield is around 59%. not to mention the little bit
> of missing ECC...

I am really curious to know why they are only getting 59% efficiency when a Quadrics system with similar node counts is above 70%. We know it isn't an issue with the node speed, because the X cluster ran at 80% efficiency with 128 nodes.

What's going on with the Infiniband?????

>
> actually, I'd love to know what their yield is if they had only
> 4GB per node - almost certainly lower.
>
> > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops
>
> yield is 72%.

How much was this system? You sure Linux Networx didn't 'buy' the business? It seems to be happening a lot these days with large systems.

Craig

>
> > Apple BigMac is number 3, while the Opteron cluster is
> > number 6.
>
> for the billionth time: rmax is just a matter of how much money
> you have. rmax/rpeak is the only part of top500 that matters.
> Yes but besides to politicians and the people funding the systems, how relevant is the rmax/rpeak ratio? Craig > I'd like very much to know the actual prices and discounts for Big Mac. > it's a shame this isn't required for Top500... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 11:54:42 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 11:54:42 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117152049.99076.qmail@web11406.mail.yahoo.com> Message-ID: > > and their yield is around 59%. not to mention the little bit > > of missing ECC... > > I didn't follow, which "yield" are you refering to?? rmax/rpeak. it's really the most interesting part of the list; the actual ranking is just a matter of funding. > > I'd like very much to know the actual prices and discounts for Big > > Mac. > > it's a shame this isn't required for Top500... > > The price is around 1100*3000*(discount for edu) + cost interconnect 3000 would be an impressive discount, since the list is around $5900. > 5.2M $ is not too far away from the "actual price"... it's massively discounted, and probably not repeatable. it would be nice to know the best-published prices, as well as what it would cost to "fix" the ECC. regards, mark hahn. 
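
For anyone following along, the "yield" figures in this thread can be reproduced from the list entries. This is only a sketch: the per-CPU peak rates (8 GFLOPs for the 2.0 GHz G5, 4 GFLOPs for the Opteron) are the ones quoted elsewhere in the thread, and the function names are illustrative.

```python
# Reproduce the rmax/rpeak "yield" figures discussed in this thread.
# Per-CPU peak rates are the values quoted by posters, taken as given.


def rpeak_tflops(cpus: int, peak_gflops_per_cpu: float) -> float:
    """Theoretical peak of the whole machine in TFlops."""
    return cpus * peak_gflops_per_cpu / 1000.0


def yield_fraction(rmax_tflops: float, cpus: int,
                   peak_gflops_per_cpu: float) -> float:
    """rmax / rpeak, the efficiency number called 'yield' above."""
    return rmax_tflops / rpeak_tflops(cpus, peak_gflops_per_cpu)


def sustained_gflops_per_cpu(rmax_tflops: float, cpus: int) -> float:
    """Measured LINPACK rate per processor in GFlops."""
    return rmax_tflops * 1000.0 / cpus


# name: (rmax in TFlops, CPU count, assumed peak GFlops per CPU)
SYSTEMS = {
    "BigMac G5": (10.3, 2200, 8.0),
    "Lightning (LANL Opteron)": (8.1, 2816, 4.0),
}

if __name__ == "__main__":
    for name, (rmax, cpus, peak) in SYSTEMS.items():
        y = yield_fraction(rmax, cpus, peak)
        per_cpu = sustained_gflops_per_cpu(rmax, cpus)
        print(f"{name}: yield {y:.1%}, {per_cpu:.2f} GFlops/CPU sustained")
```

The same two inputs give both views of the result: the Opteron machine has the higher yield, while the G5 machine has the higher sustained rate per processor.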
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Mon Nov 17 10:08:41 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Mon, 17 Nov 2003 07:08:41 -0800 Subject: FW: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF2C@orsmsx402.jf.intel.com> From: Alvin Oga [mailto:alvin at Mail.Linux-Consulting.com] > > Instead of guessing, try memtest86 at > > http://www.memtest86.com/memtest86-3.0.tar.gz > > i have yet to see memtest find a failure thats real > or pass the memory that works > in a given system ... where you know the system crashes > > and yet moving the mem stick to another system give you > identical failures or passes the "application running tests" > Hmmm. My experiences were consistently different, i.e., consistently useful. In a previous job, I made it a standard boot option on installed systems so that customers could just boot right into it, including via PXE boot, and save us a diagnostic visit to only find out new memory was needed. > other memory testors s/testors/testers/ > http://www.Linux-1U.net/Diags/#Mem > http://www.Linux-1U.net/Memory/#Test Thanks for the pointers, I'll check them out! -- David N. Lombard My comments represent my opinons, not those of Intel. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 12:18:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 12:18:32 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8F729.3080709@lmco.com> Message-ID: On Mon, 17 Nov 2003, Jeff Layton wrote: > Let's exclude the floor space, windows, pizzas, chillers, etc. 
> and figure out the total: > > Total = $5.388 million > > I guess I'm not too far off. Personally I think the big unknown > is the rack cost. That could be very expensive since it's specialized > (although 92 racks in a single sale might be considered a commodity). > Also, the Cisco costs could be high as well (Cisco never does anything > that can't make money off of). Don't leave out the wiring and chillers if you're going to include the racks and cisco stuff -- 1100 nodes burning (at an as you say humorous but generous guess) 250 Watts apiece is, um, 275,000 watts. As in forgetting the capital costs of the chillers and wiring, just buying the power to run this puppy for a year will cost around $275K/year (more than the racks themselves). The cost for the chillers, blowers, transformers, and primary wiring infrastructure to actually move this power in and waste heat out of their space will likely add a pretty big chunk to the total. Perhaps 180 20 amp circuits? A chiller the size of a small destroyer? Their own nuclear power plant (just kidding:-)? So add another seven digit number to the above, at a guess...;-) The pizza I agree is free... rgb > > > This was just for laughs. I still think there is a sugar daddy > somewhere in there. Be it Cisco, Apple, IBM, etc., there are some > costs not being mentioned. > > Jeff > > > -- > Dr. Jeff Layton > Aerodynamics and CFD > Lockheed-Martin Aeronautical Company - Marietta > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Mon Nov 17 11:53:25 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 17 Nov 2003 09:53:25 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069081600.14602.57.camel@localhost.localdomain> References: <1069081600.14602.57.camel@localhost.localdomain> Message-ID: <1069088005.8428.1185.camel@thinkpad> > I am really curious to know why their are only > getting 59% efficiency when Quadrics system with > similar node counts is above 70%. We know it > isn't an issue with the node speed, because > the X cluster ran at 80% efficiency with 128 nodes. > > Whats going on with the Infiniband????? It's not just the Infiniband, a lot of it is the processor. Actually, it's the artificially inflated FLOPs of the processor. Everyone who believes that a G5 should be rated at 8 GFLOPs, please speak up... Multiply-adds are great for some stuff, but not everything... (LINPACK happens to be among the stuff it's pretty good for). Keith _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 12:01:38 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 12:01:38 -0500 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB8FEF2.9090009@lmco.com> Mark Hahn wrote: > > I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of > > memory, and about 80 Gigs of storage for $3k (commercial > > pricing in single units). So, for laughs, let's assume $3k per box > > for the 4 Gigs of memory and the storage. 
>
> I did the pricing too. afaikt, it's actually 8GB per node,
> so the price was just under $5900 list. I'd guess that the IB
> hardware is at least $1500 list.
>

From the pdf at the VTech website, it's 4 GB per node.
(http://don.cc.vt.edu/tcfslides.pdf)

> > Node cost: 1100 x $3k = $3.3 million
>
> around 8M list.
>
> > Total = $5.388 million
>
> it's a very good price, no doubt. it would be nice if Top500
> would require full price disclosures
>

It would be interesting. I'm sure a number of the clusters in there were 'bought' by the vendor. Of course, they make their money back by screwing you later on with really high maintenance fees.

I know people like to get hardware for free (I've had arguments with people about this), but anytime it looks like a vendor is offering something for nothing, the little hairs on the back of my neck stand up and I reach over and grab my ankles. Of course, management seldom sees things the same way. They are very short-term focused. :)

> how much does it help your HPL score to run 4GB/cpu? I'd guess that most
> clusters are lighter than that.
>

It depends. You can process more data per node, so you are cutting down on communications. But if I remember the rules correctly, you can pick whatever size problem you want.

Jeff

--
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta

From hahn at physics.mcmaster.ca Mon Nov 17 12:37:37 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Mon, 17 Nov 2003 12:37:37 -0500 (EST)
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <1069088211.8428.1193.camel@thinkpad>
Message-ID:

> > > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops
> >
> > yield is 72%.
>
> BTW, 60% of 8 GF == 4.8 GF per processor. 72% of 4 GF == 2.88.
If you > use LINPACK as a metric, why do you think the latter wins? because rmax/rpeak as being a sort of "balance-like" measure. it's also scale-invariant, to the first order at least. within the same category of hardware (say, desktop microprocessors and a premium but off-the-shelf interconnect), rmax/rpeak is interesting, since $/cpu are very roughly comparable. > > for the billionth time: rmax is just a matter of how much money > > you have. rmax/rpeak is the only part of top500 that matters. > > You have to include cost. I assume that if top500 reported prices, they'd be fairly wonky. for instance, in the DB domain, are $/TPC numbers all that useful? > Or, put another > way, a vendor would be better off building a slower processor with a > modern memory system that achieved 95% of peak. yes, you've just described a trad vector box. looking at rmax/rpeak is indeed a "vector super-ness" measure. > You can always put more > of them together with more money, right? right, which is why I want to somehow regress scale out of the measure. > (I'm not sure if I know of any > networks that scale to 100,000 processors). grid ;) > rmax/rpeak is just as bad (or worse) of a metric as rmax if it is the > only metric. It's not like LINPACk is terribly communication bound or > anything (in which case, rmax/rpeak might mean something). I wish I had a 1K CPU cluster with gigabit, Myri, Quadrics, *and* IB ;) squinting at top500, it looks like there is a fairly significant dependence of rmax/rpeak upon type of interconnect. that's quite interesting. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Mon Nov 17 11:56:51 2003 From: kdunder at sandia.gov (Keith D. 
Underwood)
Date: 17 Nov 2003 09:56:51 -0700
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References:
Message-ID: <1069088211.8428.1193.camel@thinkpad>

> > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops
>
> yield is 72%.

BTW, 60% of 8 GF == 4.8 GF per processor. 72% of 4 GF == 2.88. If you use LINPACK as a metric, why do you think the latter wins? (note the if. I'm not suggesting LINPACK is the right benchmark.)

> for the billionth time: rmax is just a matter of how much money
> you have. rmax/rpeak is the only part of top500 that matters.

You have to include cost. Otherwise, if I could buy a 1 PetaFLOP system that yielded 10% efficiency for the same price as a 100 TeraFLOP system that yielded 75% efficiency, I should buy the latter? Or, put another way, a vendor would be better off building a slower processor with a modern memory system that achieved 95% of peak. You can always put more of them together with more money, right? (I'm not sure if I know of any networks that scale to 100,000 processors).

rmax/rpeak is just as bad (or worse) of a metric as rmax if it is the only metric. It's not like LINPACK is terribly communication bound or anything (in which case, rmax/rpeak might mean something).

Keith

From jeffrey.b.layton at lmco.com Mon Nov 17 13:09:51 2003
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Mon, 17 Nov 2003 13:09:51 -0500
Subject: top500 list
In-Reply-To: <3FB90E52.6020204@lmco.com>
References: <3FB90E52.6020204@lmco.com>
Message-ID: <3FB90EEF.80206@lmco.com>

Of course, I forgot to adjust for the memory difference. According to Mark, the 4G dual boxes on the Apple store are $5349 each.

Nodes: 1100 x $5349 = $5.884 million

Which brings the total to: $7.972 million

Sorry about the confusion.
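
The running totals in this sub-thread are easy to lose track of, so here is a minimal sketch of the straw-man arithmetic. Every input is a guess taken from the messages above ($1500 per node for IB, $300k for the Cisco network, $1500 per rack at 12 nodes per rack); the function and parameter names are illustrative only.

```python
import math

# Straw-man cluster pricing, using the guesses made in this thread.
NODES = 1100
NODES_PER_RACK = 12


def rack_count(nodes: int, nodes_per_rack: int) -> int:
    """Racks needed, rounding up for the partly filled last rack."""
    return math.ceil(nodes / nodes_per_rack)


def cluster_cost(node_price: float,
                 ib_per_node: float = 1500.0,
                 mgmt_network: float = 300_000.0,
                 rack_price: float = 1500.0) -> float:
    """Nodes + IB + management network + racks, in dollars.

    Floor space, power, cooling and staff are deliberately left out,
    as in the original estimate.
    """
    racks = rack_count(NODES, NODES_PER_RACK)
    return (NODES * node_price
            + NODES * ib_per_node
            + mgmt_network
            + racks * rack_price)


if __name__ == "__main__":
    for label, price in (("512M config", 3000.0), ("4G config", 5349.0)):
        total = cluster_cost(price)
        print(f"{label} at ${price:,.0f}/node: ${total / 1e6:.3f} million")
```

Changing only the node price moves the total from about $5.39 million to about $7.97 million, so the memory configuration dominates the uncertainty far more than the rack or network guesses do.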
Jeff > > > > > So, Mark's numbers are correct. So my 'adjusted' estimate is, > > Nodes: 1100 x $8k = $8.8 million > IB network: $1.65 million > Cisco Crap^h^h^h^hNetwork: $300k > Racks: $138k > > Total: $10.889 million > > Who's your Sugar Daddy VTech? > > > > Thanks! > > Jeff > > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 13:02:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 13:02:30 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8FEF2.9090009@lmco.com> Message-ID: > From the pdf at the VTech website, it's 4 GB per node. > > (http://don.cc.vt.edu/tcfslides.pdf) hmm, you're right (I found 4GB/node in multiple sources). that means that the current list price is $5349, rather than $8k. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 13:07:14 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 13:07:14 -0500 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB90E52.6020204@lmco.com> Mark Hahn wrote: > > > > > I'd like very much to know the actual prices and discounts for > Big > > > > > Mac. > > > > > it's a shame this isn't required for Top500... > > > > > > > > The price is around 1100*3000*(discount for edu) + cost > interconnect > > > > > > 3000 would be an impressive discount, since the list is around $5900. 
> > > > > > > > http://www.microcenter.com/single_product_results.phtml?product_id=0161922 > > > when I view that page, it lists $3k for the dual 2.0 with 512M, > which is exactly what store.apple.com says. > > the VT config is with 8GB per box, which store.apple.com says will > list at $7949! wow, that has to be higher than last time I looked... > Wow! Apple is charging $5k to go to 8 Gigs! What a rip! It's just plain-jane DDR memory that you can get anywhere. It's not even ECC! If you don't mind, I'm cc-ing this to the beowulf list. So, Mark's numbers are correct. So my 'adjusted' estimate is, Nodes: 1100 x $8k = $8.8 million IB network: $1.65 million Cisco Crap^h^h^h^hNetwork: $300k Racks: $138k Total: $10.889 million Who's your Sugar Daddy VTech? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Mon Nov 17 13:04:45 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 17 Nov 2003 11:04:45 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <1069092285.8431.1295.camel@thinkpad> > because rmax/rpeak as being a sort of "balance-like" measure. > it's also scale-invariant, to the first order at least. > > within the same category of hardware (say, desktop microprocessors and a > premium but off-the-shelf interconnect), rmax/rpeak is interesting, > since $/cpu are very roughly comparable. But the rpeaks vary by a factor of 2 or more... > > You can always put more > > of them together with more money, right? > > right, which is why I want to somehow regress scale out of the measure. no - that was sarcasm. 
At 10,000 processors it is hard enough to build a box that will stay up long enough to do a useful amount of work with apps that run across all of the nodes. At 100,000 processors, today, it is pretty close to impossible. And that is IF you can get your app to scale that well. Big IF. (yes, monte carlo simulations can probably scale that high. Yes, you could probably build fault tolerant monte carlo simulations. Yes, it would be nice to run something other than monte carlo on the machine.) Keith _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 12:05:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 12:05:35 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069082890.2659.69.camel@terra> Message-ID: On Mon, 17 Nov 2003, Dean Johnson wrote: > Perhaps they should also have something like watts per gigaflop, or > cubic feet occupied per gigaflop. I actually think that it would be very interesting to plot watts/flop over time, for integrated systems (not just "the CPU", but CPU and whole box supporting). My personal theory is that it is actually decreasing, on average, because of interactions between Moore's Law scaling of CPU speed and CPU power consumption and because of the cost of feeding the REST of the system, which tends to hang nearly constant (effectively dividing it as the relative power of the CPU is increased). At any rate, as I run a 300 MHz Celeron side by side with a 2200 MHz Celeron at home, I don't think the 2200 MHz Celeron eats 7+ times the power. One day I'll liberate my kill-a-watt and take it home and find out. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 14:15:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 14:15:12 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8F729.3080709@lmco.com> Message-ID: On Mon, 17 Nov 2003, Jeff Layton wrote: > Let's exclude the floor space, windows, pizzas, chillers, etc. > and figure out the total: > > Total = $5.388 million > > I guess I'm not too far off. Personally I think the big unknown > is the rack cost. That could be very expensive since it's specialized > (although 92 racks in a single sale might be considered a commodity). > Also, the Cisco costs could be high as well (Cisco never does anything > that can't make money off of). With 1100 dual CPU nodes drawing perhaps 250 Watts apiece, the room needs some 275 KW of capacity, maybe 180 20 amp circuits (assuming one can drive roughly six nodes per circuit). This costs ballpark estimate of $275,000/year just to feed and cool the nodes, more than the racks themselves. The capital cost of the circuits, transformers, space renovation, and the chillers required to drive this cluster would likely add another seven digit number to your estimate and is a lot less ignorable than the cost of the racks or network;-) Small nuclear power plant optional... Now the pizza cost, that can be ignored. However, the human cost is another "interesting" question. With 1100 systems running 24x7 under stress, I would expect to rack up system failures nearly every day after the cluster was roughly a year old and beyond. If operating system installation and administration scaled nearly perfectly (which with linux is not insanely impossible, but for a cluster this size e.g. 
pxe-automated installs are absolutely essential) one's ability to manage the cluster is likely limited by user support (which is beyond prediction, as it depends on task mix and expertise of user base) and hardware maintenance capacity. They also need proactive administration -- hot and cold running help for emergencies given the large productivity cost when the cluster is down. I'm going to guess that they have 5-6 full time people just to care for and feed the cluster and to sacrifice the odd chicken here and there. Maybe another $300K in salaries and benefits. So I'd go to over $6 million (maybe even over $7 million) total including infrastructure, with perhaps a $600-750K/year operating budget. > This was just for laughs. I still think there is a sugar daddy > somewhere in there. Be it Cisco, Apple, IBM, etc., there are some > costs not being mentioned. It >>does<< seem to be a lot of money for a cluster, doesn't it. Not exactly pocket change, or University startup money. DoD, DOE, NIH, perhaps, it seems a lot for NSF unless, as you suggest, there are corporate sponsors contributing. The other thing that always amuses me about clusters like this is the Moore's Law effect. They buy it this year, after spending a year (easily) preparing the site and building the requisite infrastructure. They operate it for three years (spending $2.25 million, say). In the meantime, node power at constant cost has increased by a factor of 4. If they invested their capital in bonds for those three years (including the operating budget), and bought that 4x faster node hardware, they would BREAK EVEN on the amount of work they get done by year four, and have saved three years operating expenses plus interest in addition to the interest on the entire capital amount for three years -- an easy $3+ million. 
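
The break-even claim above can be checked with a few lines. This is a sketch assuming an 18 month doubling time for compute per dollar (which gives the factor of 4 over three years), counting work in units of "one year of the year-zero cluster", and ignoring operating costs and interest. The names are illustrative.

```python
# Buy-now versus wait-and-buy-faster, under an assumed doubling time.
DOUBLING_YEARS = 1.5  # assumption: compute per dollar doubles every 18 months


def relative_speed(years_waited: float) -> float:
    """Speed of hardware bought after waiting, relative to buying today."""
    return 2.0 ** (years_waited / DOUBLING_YEARS)


def work_done(buy_year: float, horizon_years: float) -> float:
    """Total work by the horizon for a cluster bought at buy_year."""
    run_time = max(0.0, horizon_years - buy_year)
    return run_time * relative_speed(buy_year)


if __name__ == "__main__":
    for horizon in (3, 4, 5, 6):
        now = work_done(0, horizon)
        later = work_done(3, horizon)
        print(f"after {horizon} years: buy now {now:.1f}, "
              f"wait 3 years {later:.1f}")
```

Under these assumptions the two strategies tie at the end of year four, and the late purchase pulls ahead every year after that.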
To put it another way, it is bloody silly to take an N year budget and spend it all in year one on computing hardware, because compute capacity that can be purchased at constant cost grows exponentially while compute capacity that has been purchased AT fixed cost depreciates exponentially and has a rather high baseline operating cost.

It also means that you >>really<< pay for a design error. If this enormous 1100 node cluster, designed and purchased all at once, has any design flaw with a repair cost that scales like the number of nodes, it would be ruinous. If one had only bought (say) 1/4 of the nodes in year one, 1/4 more in year two, 1/4 more in year three, and 1/4 more in year 4, one would get roughly:

    4 years @ 0.25 capacity
   +3 years @ 0.40 capacity
   +2 years @ 0.63 capacity
   +1 year  @ 1.00 capacity
   ==========================
    4.46 capacity-years

(assuming an 18 month Moore's Law doubling time) and would have numerous opportunities to repair design flaws at minimal cost and to exploit special deals and opportunities that exceed this "average" performance.

Sigh,

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

From deadline at linux-mag.com Mon Nov 17 14:28:42 2003
From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine)
Date: Mon, 17 Nov 2003 14:28:42 -0500 (EST)
Subject: ClusterWorld
In-Reply-To:
Message-ID:

On Mon, 17 Nov 2003, John Hearns wrote:

> Three month trial only valid within the US :-(

Sorry.

>
> If I cried and pleaded, and said I regularly bought Linux Magazine
> retail in Borders in the UK could I get a sample copy?
Unfortunately, I do not think CW will show up at Borders or Barnes and Noble unless the general public gets really excited about clusters. Who knows.

>
> Hope everyone is having a good time at SC.

Do you know anyone who is going? Have them come by and get you a copy.

Doug

From James.P.Lux at jpl.nasa.gov Mon Nov 17 13:59:21 2003
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Mon, 17 Nov 2003 10:59:21 -0800
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <3FB8FEF2.9090009@lmco.com>
References: <
Message-ID: <5.2.0.9.2.20031117104002.018c6618@mailhost4.jpl.nasa.gov>

At 12:01 PM 11/17/2003 -0500, Jeff Layton wrote:
>Mark Hahn wrote:
>
>> > Node cost: 1100 x $3k = $3.3 million
>>
>>around 8M list.
>>
>> > Total = $5.388 million
>>
>>it's a very good price, no doubt. it would be nice if Top500
>>would require full price disclosures
>
> It would be interesting. I'm sure a number of the clusters
>in there were 'bought' by the vendor. Of course, they make
>their money back by screwing you later on with really
>high maintenance fees.

Also, bear in mind that the apparent cost of the node, to the manufacturer, is somewhat less than it would be to the eventual retail consumer, even for volume purchases. Depending on how the mfr does their accounting, the actual "cost" (as in, bottom line effect) of the node being provided gratis to an educational institution may be quite low, because it may not have things like an apportionment of marketing and distribution costs.
On the other hand, the mfr can probably claim a "retail value of $X Million" for their tax deduction (subject to some restrictions.. you can't claim costs you didn't actually incur). Also, consider that if a company like Dell or Apple spends, say, 10% of their budget on sales and marketing (Apple, for instance, spent 898M on "selling, general, and administrative" costs on $4,492M in sales), that a few million dollars in computers isn't a huge advertising expense (compared to buying ads... Web portals typically get $20-30/CPM (CPM=cost per thousand (views/impressions)) A full page color ad in a Sunday Paper with a circulation of several hundred thousand might be $100/CPM I don't have rate cards in front of me, but I found some information on the web (of course!) Printed PC Magazine, International edition, 4 color ad page for 2 wk period ($52K) which works out to $56/CPM Printed Wired Magazine, full page, $51/CPM Compare this to the "free publicity" from getting your cluster on the list, and featured in some articles. Giving away a million bucks worth of computers might actually be a better deal than buying a million bucks worth of ads. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Mon Nov 17 14:37:38 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Mon, 17 Nov 2003 11:37:38 -0800 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: Message from Mark Hahn of "Mon, 17 Nov 2003 12:37:37 EST." Message-ID: <200311171937.hAHJbcfr011705@pookie.nersc.gov> I think the Big Mac guys deserve snaps for pulling this system off. The VA Tech guys accomplished a real feat and I suspect they worked their collective butts off to do it. Who would have predicted a Mac based cluster to be in the top 5? Not me. I still suspect this is an anomaly. 
I don't think we are going to see a bunch of Mac based clusters breaking into the list next year. Which begs the question "Why not?" My feeling is when you build a system that large, you want to know you can get real work done with it. That's where committed vendors and large user communities become important. At this point, Big Mac is a one-of-a-kind. The Apple crowd has never even looked at HPC before this (probably because it's typically a money loser). Meanwhile there seems to be a growing community of people that want to use the Opteron for HPC. That's why I expect we will see more Opteron clusters over time. But hey, maybe Big Mac will make people look at the Apple stuff more closely. There still seem to be a lot of missing pieces though (parallel debuggers, profiling tools, libraries, etc). The long term measure for the Big Mac is to see how well they can use the system, especially for generic codes.

Regarding the top500, I see the point of the top500 as being a ranking of capability of various machines. Unfortunately, it's difficult to come up with a benchmark that accurately measures capability that is super portable and easy to run. Personally I don't think LINPACK should necessarily be that code, but at least it forces people to run a consistent problem across the entries. I think adding costs would be interesting since any real purchase has to take this into consideration, but it would be more for comparison purposes and not ranking. NERSC (#9) has used (sustained performance)/$, where the sustained performance is calculated from a collection of standard codes used by the NERSC community. I think this approach has served us well, but it can be challenging to get an apples to apples comparison when you are talking about projecting the performance of codes to large scales. Each vendor does the projection their own way and it's tricky to know how much to believe, especially if it's on non-existent hardware or at untested scales.
My true measure for the top500 would be the value of the science (or work) accomplished with it, a difficult to impossible thing to determine. NERSC puts all the emphasis on the science. This means considering: how usable the system is; how hard it is to harness the full capability of the system; what the sustained performance will be. Then we try to squeeze every cycle out of the system. We've run Seaborg (#9) at 90%+ utilization for years now. We've gotten tons of science done with it, just like we did with the T3E before it. It can be a little disappointing to watch your system slide down the rankings, when you know it's still being used to do great stuff and it's still making a large impact. But I guess that's just the nature of Moore's law.

I think this year's top 500 raises all sorts of interesting questions. How will the X-1 evolve? Will Opteron systems become a big player? What about Itanium? Will the Blue Gene based systems make an impact? It's certainly more interesting than a few years ago, when there were just a handful of vendors and no clear direction where things were heading.

--Shane

Disclaimer: These statements represent my own opinions and not those of NERSC.
------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Mon Nov 17 13:50:19 2003 From: agrajag at dragaera.net (Sean Dilda) Date: Mon, 17 Nov 2003 13:50:19 -0500 Subject: top500 list In-Reply-To: <3FB90EEF.80206@lmco.com>; from jeffrey.b.layton@lmco.com on Mon, Nov 17, 2003 at 01:09:51PM -0500 References: <3FB90E52.6020204@lmco.com> <3FB90EEF.80206@lmco.com> Message-ID: <20031117135019.A30183@vallista.dragaera.net> On Mon, 17 Nov 2003, Jeff Layton wrote: > > Of course, I forgot to adjust for the memory difference. > According to Mark, the 4G dual boxes on the Apple store > are $5349 each. > > Nodes: 1100 x $5349 = $5.884 million > > Which brings the total to: $7.972 Wow! That's over $2k/node for just a few Gig of RAM. There's also a chance that they didn't buy the RAM from Apple. If you look at their pictures site (http://don.cc.vt.edu/g5modify/) you can see that they in fact modified all of the boxes. I didn't see it say what was modified, but lets assume they added RAM. Now if you go look at crucial (http://www.crucial.com/store/listparts.asp?Mfr%2BProductline=Apple%2BPower+Mac&mfr=Apple&cat=RAM&model=Power+Mac+G5+%28Dual+2.0GHz+DDR%29&submit=Go) you find they can get a 512M stick of ram for $93.99 Even if they replaced all th ram that apple sent them with new RAM, that's only around $752/node for the RAM, not $2349. So, 1100 * $3752 = $4.127 million, and the total up to $6.215 million (using your numbers). 
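A quick check of the arithmetic above, as a rough sketch and not anyone's actual invoice. The $3000 base price per node and the 8 sticks of 512M per node (for 4G) are assumptions inferred from the $3752 and $752 figures; the "other costs" number is simply Jeff's $7.972 million total minus his $5.884 million for nodes.

```python
# Rough re-derivation of the node cost estimate discussed in this thread.
# BASE_PRICE and STICKS_PER_NODE are assumptions inferred from the quoted
# figures, not numbers confirmed by Virginia Tech or Apple.
NODES = 1100
BASE_PRICE = 3000        # assumed per-node price before extra RAM (USD)
STICK_PRICE = 93.99      # quoted Crucial price for one 512M stick (USD)
STICKS_PER_NODE = 8      # assumed: 8 x 512M = 4G per node

ram_per_node = round(STICKS_PER_NODE * STICK_PRICE)   # about $752
node_price = BASE_PRICE + ram_per_node                # $3752
nodes_total = NODES * node_price                      # about $4.127 million

# Everything that is not a node, taken from Jeff's figures in this thread.
other_costs = 7_972_000 - 5_884_000                   # $2.088 million
total = nodes_total + other_costs                     # about $6.215 million

print(f"RAM per node:  ${ram_per_node}")
print(f"Node price:    ${node_price}")
print(f"All nodes:     ${nodes_total / 1e6:.3f} million")
print(f"Total:         ${total / 1e6:.3f} million")
```

Changing STICK_PRICE or BASE_PRICE shows how sensitive the total is to the per-node figure, since every dollar per node moves the total by $1100.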
Not as low as your first numbers, but not has high as your new ones. And there are some other adjustments that could be made. Like the racks for instance. I have heard of big name vendors throwing in racks with large purchases, especially for repeat customers. So, its possible that the racks and maybe some other stuff were given away by Apple. They may not give the same deal to everyone, but I imagine anyone buying that many machines at once can talk the sales rep into wheeling and dealing quite a bit. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Nov 17 13:23:56 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 17 Nov 2003 10:23:56 -0800 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069082890.2659.69.camel@terra> References: <1069082890.2659.69.camel@terra> Message-ID: <20031117182356.GA19831@greglaptop.greghome.keyresearch.com> On Mon, Nov 17, 2003 at 09:28:10AM -0600, Dean Johnson wrote: > Virginia bought a benchmark, not Apple. "Virginia" is the University of Virginia. Virginia Tech is that *other* school. -- greg, alumnus of the real thing _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 13:58:54 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 13:58:54 -0500 Subject: top500 list In-Reply-To: <20031117135019.A30183@vallista.dragaera.net> References: <20031117135019.A30183@vallista.dragaera.net> Message-ID: <3FB91A6E.2060703@lmco.com> Sean Dilda wrote: > On Mon, 17 Nov 2003, Jeff Layton wrote: > > > > > Of course, I forgot to adjust for the memory difference. 
> > According to Mark, the 4G dual boxes on the Apple store > > are $5349 each. > > > > Nodes: 1100 x $5349 = $5.884 million > > > > Which brings the total to: $7.972 > > Wow! That's over $2k/node for just a few Gig of RAM. There's also a > chance that they didn't buy the RAM from Apple. If you look at their > pictures site (http://don.cc.vt.edu/g5modify/) you can see that they in > fact modified all of the boxes. I didn't see it say what was modified, > but lets assume they added RAM. Now if you go look at crucial > (http://www.crucial.com/store/listparts.asp?Mfr%2BProductline=Apple%2BPower+Mac&mfr=Apple&cat=RAM&model=Power+Mac+G5+%28Dual+2.0GHz+DDR%29&submit=Go > ) > > you find they can get a 512M stick of ram for $93.99 Even if they > replaced all th ram that apple sent them with new RAM, that's only > around $752/node for the RAM, not $2349. > Good point. I forgot they popped the cases and did something to them (never did bother to figure out what though). > So, 1100 * $3752 = $4.127 million, and the total up to $6.215 million > (using your numbers). > > Not as low as your first numbers, but not has high as your new ones. > And there are some other adjustments that could be made. Like the racks > for instance. I have heard of big name vendors throwing in racks with > large purchases, especially for repeat customers. So, its possible that > the racks and maybe some other stuff were given away by Apple. They may > not give the same deal to everyone, but I imagine anyone buying that > many machines at once can talk the sales rep into wheeling and dealing > quite a bit. > The presentations I've seen said that they contacted the Rack manufacturer directly and that custom racks were designed and built. I'm sure the rack manufacturer got paid by someone, just not sure who. :) Still, in my quick analysis, you only drop $138k. Still not close to the $5.2 million that's floating around. Let's have some more fun! 
Let's assume that all the vendors but Apple got paid what I projected. So the difference between the quoted (5.2) and the projected (6.215) is $1.015 million. Divide that by the number of nodes and you get $923 per node. Furthermore, let's assume that VTech got a volume discount of $923 per node. Then we get the $5.2 million. So, in fact VTech paid Apple about $2k per node instead of $3k. Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Mon Nov 17 15:21:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Mon, 17 Nov 2003 14:21:25 -0600 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117152049.99076.qmail@web11406.mail.yahoo.com> References: <20031117152049.99076.qmail@web11406.mail.yahoo.com> Message-ID: On Mon, 17 Nov 2003, Rayson Ho wrote: > --- Mark Hahn wrote: > > and their yield is around 59%. not to mention the little bit > > of missing ECC... > > I didn't follow, which "yield" are you refering to?? > > > I'd like very much to know the actual prices and discounts for Big > > Mac. > > it's a shame this isn't required for Top500... > > The price is around 1100*3000*(discount for edu) + cost interconnect > > 5.2M $ is not too far away from the "actual price"... You've apparently never priced 27 96-port IB switches (plus cables)! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. 
Systems Administrator FAX: 662-325-7692 |
| roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger |
| Mississippi State University |
|____________________________________ERC__________________________________|

From bill at math.ucdavis.edu Mon Nov 17 16:08:40 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Mon, 17 Nov 2003 13:08:40 -0800
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References: <3FB8F729.3080709@lmco.com>
Message-ID: <20031117210840.GC25979@sphere.math.ucdavis.edu>

> To put it another way, it is bloody silly to take an N year budget and
> spend it all in year one on computing hardware, because compute capacity
> that can be purchased at constant cost grows exponentially while compute
> capacity that has been purchased AT fixed cost depreciates exponentially
> and has a rather high baseline operating cost. It also means that you
> >>really<< pay for a design error. If this enormous 1100 node cluster,
> designed and purchased all at once, has any design flaw with a repair
> cost that scales like the number of nodes, it would be ruinous. If one
> had only bought (say) 1/4 of the nodes in year one, 1/4 more in year
> two, 1/4 more in year three, and 1/4 more in year 4, one would get
> roughly:

Having just sat through a Production Clusters talk at SC2003, I figured it would be worth mentioning the downside of yearly upgrades.

Heterogeneous clusters are a nightmare: support costs scale at least linearly, and if you're running large codes you can get zero scaling. I.e., buying 250 nodes a year, at the end of 4 years you can run on 250 fast nodes, or on 1000 nodes at the speed of the first year's. The opinion of the 4 speakers giving the talk was to buy a cluster large enough to keep until it is replaced.
This dramatically decreases support costs, keeps things simple for the end users, keeps the batch queue simpler, and stops silly things like a BIOS upgrade for some of the nodes taking down the entire cluster.

Certifying a large body of applications, user tools, quota monitoring, sensor monitoring, etc. for a particular configuration is a lot of work. Numerous nightmares were reported even for "identical" nodes that ended up coming from different factories. Large site installations spend a lot of sweat and tears becoming intimately familiar with their hardware: analyzing failure rates, learning how to read various temp sensors, monitoring of various types, etc.

Building a cluster 1 year at a time can work of course, especially if your jobs are never bigger than a single year's purchase, but it's not free. In many cases when your support staff is limited (seems very common) you might be better off with a cluster every couple of years.

--
Bill Broadley
Mathematics
UC Davis

From bill at math.ucdavis.edu Mon Nov 17 16:36:28 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Mon, 17 Nov 2003 13:36:28 -0800
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To:
References: <20031117152049.99076.qmail@web11406.mail.yahoo.com>
Message-ID: <20031117213628.GA26471@sphere.math.ucdavis.edu>

After all this discussion of the top 500 list, it got me thinking about a "better" benchmark. Where "better" means more useful for evaluating my idea of cluster goodness.

So what is hard about large clusters? Seems to me like it is primarily scaling. What controls the scaling? Mostly the interconnect. So we primarily need to evaluate the interconnect and how it performs in a large cluster environment.
Additionally, getting an account or even the hardware to evaluate single cpu performance of an IT2, G5, P4, or Opteron is fairly easy and direct. Of course there are characteristics inside the box that affect scaling outside, but I'd argue these effects are much smaller than the effects of the interconnect.

So what would a better benchmark look like? Bisectional bandwidth is of course interesting, although it's a fairly gross measure. How about something along the lines of:

* Minimal CPU work, only enough to ensure correctness.
* MPI based (focus on user visible performance)
* Provide scores for sending messages of 1, 10, 100, 1000, and 10000 64-bit numbers
* Have a random mode (any node can talk to any other)
* Have a nearest neighbor mode (end user can define arbitrary mapping of virtual nodes to physical nodes for maximum performance.)
* Run on 8, 16, ... 2^N nodes (for pretty scaling graphs)

For shared memory machines it's much tougher; I don't know of any portable way to ensure remote page allocation. Maybe have each cpu allocate a 512 MB array, access it a million times, then swap pointers, start the clock and measure the bandwidth per CPU to that memory (wherever it was allocated).

Does anyone know of similar tools for doing this? If not, do people think it would be worthwhile? If so, I'd be willing to take a shot at writing the MPI version.

Anyone interested in a SC2003 BOF to discuss it? Feedback? Comments?
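To make the proposal above a bit more concrete, here is a minimal sketch of the test matrix only (message sizes, node counts, and who talks to whom), with no MPI calls and no timing. Everything in it is an assumption made for illustration: the function names are hypothetical, "random mode" is read as shuffling the ranks and pairing them off, and "nearest neighbor mode" is read as a 1-D ring over virtual ranks with a user-supplied virtual-to-physical mapping. A real benchmark would wrap sends and receives around each pair.

```python
import random

WORD_BYTES = 8  # one 64-bit number


def message_sizes():
    """Message sizes in bytes for 1, 10, 100, 1000 and 10000 64-bit numbers."""
    return [n * WORD_BYTES for n in (1, 10, 100, 1000, 10000)]


def node_counts(max_nodes):
    """Node counts 8, 16, ... doubling up to and including max_nodes."""
    counts = []
    n = 8
    while n <= max_nodes:
        counts.append(n)
        n *= 2
    return counts


def random_pairs(n_nodes, rng):
    """Random mode (assumed reading): shuffle the ranks and pair them off."""
    ranks = list(range(n_nodes))
    rng.shuffle(ranks)
    return [(ranks[i], ranks[i + 1]) for i in range(0, n_nodes - 1, 2)]


def neighbor_pairs(n_nodes, mapping=None):
    """Nearest neighbor mode (assumed reading): virtual rank v sends to v+1
    on a ring; mapping[v] gives the physical node for virtual rank v."""
    if mapping is None:
        mapping = list(range(n_nodes))
    return [(mapping[v], mapping[(v + 1) % n_nodes]) for v in range(n_nodes)]


if __name__ == "__main__":
    rng = random.Random(2003)  # fixed seed so runs are repeatable
    for nodes in node_counts(32):
        for size in message_sizes():
            pairs = random_pairs(nodes, rng)
            print(f"{nodes:3d} nodes, {size:6d} bytes, {len(pairs)} random pairs")
```

Keeping the pairing logic separate from the communication code would make it easy to plug in a site-specific mapping for the nearest neighbor mode without touching the timing loop.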
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Nov 17 17:02:49 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 17 Nov 2003 14:02:49 -0800 Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: <3FB8F729.3080709@lmco.com> Message-ID: <5.2.0.9.2.20031117134023.037dd200@mailhost4.jpl.nasa.gov> rgb wrote: >With 1100 dual CPU nodes drawing perhaps 250 Watts apiece, the room >needs some 275 KW of capacity, maybe 180 20 amp circuits (assuming one >can drive roughly six nodes per circuit). This costs ballpark estimate >of $275,000/year just to feed and cool the nodes, more than the racks >themselves. The capital cost of the circuits, transformers, space >renovation, and the chillers required to drive this cluster would likely >add another seven digit number to your estimate and is a lot less >ignorable than the cost of the racks or network;-) Just the AC receptacles, boxes, and conduit (along with electricians to install it) alone will be a significant cost.. For comparison, when my tract house was built, they charged a flat fee of $50 to add a receptacle; for putting in conduit, installing a duplex receptacle, pulling the wire, and attaching it to the distribution panel in an industrial environment, you could figure about $30-50 in materials and a couple hours in labor (@ $50/hr fully burdened). Just to do some quick back of the enveloping, lets assume $150/receptacle. Say 200 circuits (based on rgb's calculation above), so you're at $30K, just for the end of the wire. A typical 50 kVA pad mount single phase transformer runs about $1500-2000, plus about $700 to install it, and you'd need at least 6, probably more like 9, so that's another $20K. 
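The back-of-the-envelope numbers above can be re-derived in a few lines. This is only a sketch under stated assumptions: it treats kW and kVA as interchangeable (power factor of 1, no headroom), and it uses the unit costs quoted in this message as rough ranges, not as real bids.

```python
import math

# Figures quoted in the thread; treat all of them as rough estimates.
NODES = 1100
WATTS_PER_NODE = 250
NODES_PER_CIRCUIT = 6          # rgb's guess for a 20 amp circuit
CIRCUITS_BUDGETED = 200        # rounded-up circuit count used for costing
COST_PER_RECEPTACLE = 150      # USD, materials plus labor
TRANSFORMER_KVA = 50

load_kw = NODES * WATTS_PER_NODE / 1000                   # 275 kW
circuits_needed = math.ceil(NODES / NODES_PER_CIRCUIT)    # 184 circuits
receptacle_cost = CIRCUITS_BUDGETED * COST_PER_RECEPTACLE # $30K

# Assumption: power factor of 1 and no margin, so kW == kVA.
min_transformers = math.ceil(load_kw / TRANSFORMER_KVA)   # at least 6

# 6 units at the low unit price up to 9 units at the high unit price,
# each plus about $700 to install.
transformer_low = 6 * (1500 + 700)
transformer_high = 9 * (2000 + 700)

print(f"Load:            {load_kw:.0f} kW")
print(f"Circuits needed: {circuits_needed}")
print(f"Receptacles:     ${receptacle_cost}")
print(f"Transformers:    at least {min_transformers}, "
      f"${transformer_low} to ${transformer_high}")
```

The "another $20K" figure sits inside the range this produces; none of it covers panels, protection, grounding, or engineering fees.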
There are also panels, overcurrent protection, grounding, etc., and getting the P.E. to design the system and sign and seal plans (and we licensed engineers don't come cheap). The infrastructure for a job like this would be many hundreds of thousands of dollars, before you rolled in the first rack of computers.

>Small nuclear power plant optional...
>
>Now the pizza cost, that can be ignored.

Unless it's a government funded facility, where OMB guidelines (and, more importantly, institutional interpretation) say that provision of meals (as distinct from snacks at a meeting) is verboten (donuts: OK, bagels: NO; because bagels are food and doughnuts are not).

>The other thing that always amuses me about clusters like this is the
>Moore's Law effect. They buy it this year, after spending a year
>(easily) preparing the site and building the requisite infrastructure.
>They operate it for three years (spending $2.25 million, say). In the
>meantime, node power at constant cost has increased by a factor of 4.
>If they invested their capital in bonds for those three years (including
>the operating budget), and bought that 4x faster node hardware, they
>would BREAK EVEN on the amount of work they get done by year four, and
>have saved three years operating expenses plus interest in addition to
>the interest on the entire capital amount for three years -- an easy $3+
>million.
>

Unless one gets partial results early on that make the later years of analysis and computing more efficient. Difficult to quantify, but an important factor. Also, there is a certain fixed amount of labor for "fiddling around to get it all to work" that will apply at the beginning of the computation, and earlier is better, because you're paying with non-inflated dollars.

In fact, here is a great argument for scalable clusters.
You can invest in all the infrastructure up front (because it's generally cheaper to buy things like buildings all at once) and implement a smaller cluster to get through the teething pains, and then, as the performance of the hardware improves, upgrade the cluster along the way. If you haven't tied the computation inextricably to the particular implementation, then this may provide a more efficient/optimum use of a fixed amount of capital. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Mon Nov 17 17:31:32 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Mon, 17 Nov 2003 14:31:32 -0800 Subject: top500 list (was: opteron VS Itanium 2) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com> From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM > ... At this point, > Big Mac is a one-of-a-kind. The Apple crowd has never even > looked at HPC before this (probably because its typically a money loser). Actually, the "Apple crowd" had been making the rounds, at least the ISV rounds, at Cluster World in June of this year. Don't know how long before that they were (certainly not at LW in Jan) or if their approach was a locality affect (again, LW in Jan in NYC). Perhaps though, the cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not being *that* dumb ;^) -- David N. Lombard My comments represent my opinions, not those of Intel. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Nov 17 17:46:54 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 18 Nov 2003 09:46:54 +1100 Subject: Sun to start selling Opteron systems - official + Sun/AMD to work with community to create 64-bit Linux ABI Message-ID: <200311180946.58367.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Usual PR stuff, but the last part I've quoted (which ran on from the bit about 64-bit Solaris for Opteron originally) about joint work on a 64-bit Linux ABI seems the most interesting part to me. http://www.sun.com/smi/Press/sunflash/2003-11/sunflash.20031117.2.html [quote] With today's announcement that Sun Microsystems, Inc. (Nasdaq: SUNW) and AMD (NYSE: AMD) have formed an alliance to deliver a broad range of AMD Opteron[tm] processor-based systems, Sun also announced it plans to offer its Java Enterprise System on the AMD Opteron processor and is significantly extending the reach of its Solaris Operating System (OS) and leadership in the 64-bit space. [...] [...] The Solaris OS on the 64-bit AMD Opteron processor platform is expected to be available in the first half of 2004 through Sun's innovative early-access Software Express for Solaris program. Furthermore, Sun and AMD intend to work jointly with the Linux community to define and promote a 64-bit UNIX(r)-Linux Application Binary Interface (ABI) to enable interoperability. UNIX or Linux applications could run natively on any operating systems supporting this ABI. 
[/quote] Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/uU/eO2KABBYQAh8RAgxCAJ99Qp58juNNNSSecu+WtaaaXTLuOQCdFgmG w6MdeWwomodfTZ41E/B2YnA= =sbfy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Nov 17 18:28:34 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 17 Nov 2003 15:28:34 -0800 (PST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com> Message-ID: On Mon, 17 Nov 2003, Lombard, David N wrote: > From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM > > ... At this point, > > Big Mac is a one-of-a-kind. The Apple crowd has never even > > looked at HPC before this (probably because its typically a money > loser). they're had a cluster version of the xserve since the last major rev of the platform... making the rounds, and "actually releveant" to people building clusters are kind of different things. I have a mac (among several other machines) on my desk, and while it runs linux fairly well, I'm not terribly convinced that my goals and those of steve jobs/apple computer are terribly well aligned. > Actually, the "Apple crowd" had been making the rounds, at least the ISV > rounds, at Cluster World in June of this year. Don't know how long > before that they were (certainly not at LW in Jan) or if their approach > was a locality affect (again, LW in Jan in NYC). 
Perhaps though, the > cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not > being *that* dumb ;^) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Mon Nov 17 18:44:19 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: Mon, 17 Nov 2003 17:44:19 -0600 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com> Message-ID: On Monday, November 17, 2003, at 04:31 PM, Lombard, David N wrote: > From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM >> ... At this point, >> Big Mac is a one-of-a-kind. The Apple crowd has never even >> looked at HPC before this (probably because its typically a money > loser). > > Actually, the "Apple crowd" had been making the rounds, at least the > ISV > rounds, at Cluster World in June of this year. Don't know how long > before that they were (certainly not at LW in Jan) or if their approach > was a locality affect (again, LW in Jan in NYC). Perhaps though, the > cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not > being *that* dumb ;^) > Actually I think Apple folks have been sniffing around bioinformatics for a while, but overall lacked in the floating point arena to make an impact in other areas of HPC. You also have to keep in mind that it is no longer just the "apple crowd". IBM is also sniffing around using the PPC for HPC. The G5 blades that they are working on are pretty good evidence of that. I think they should hold the January LW in Minneapolis. That would *definitely* indicate who is dedicated and who is not. 
;-) -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 19:56:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 19:56:46 -0500 (EST) Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) In-Reply-To: <5.2.0.9.2.20031117134023.037dd200@mailhost4.jpl.nasa.gov> Message-ID: On Mon, 17 Nov 2003, Jim Lux wrote: > Say 200 circuits (based on rgb's calculation above), so you're at $30K, > just for the end of the wire. A typical 50 kVA pad mount single phase > transformer runs about $1500-2000, plus about $700 to install it, and you'd > need at least 6, probably more like 9, so that's another $20K. There's also > panels, overcurrent protection, grounding, etc., getting the P.E. to design > the system and sign and seal plans (and we licensed engineers don't come > cheap). The infrastructure for a job like this would be many hundreds > of thousands of dollars, before you rolled in the first rack of computers. Ya. And the AC might well cost several times the electrical circuits. And don't forget all the network wiring (as opposed to NICs and switches). Lots of pulls, cable trays, maybe raised floor action (this looks like a high rent cluster likely to have a raised floor design and custom cabinets). Infrastructure and renovation costs pretty much EQUALLED the costs of the first 100+ nodes we moved into our new cluster space, and their capacity looks like it is many times ours. > >The other thing that always amuses me about clusters like this is the > >Moore's Law effect. They buy it this year, after spending a year > >(easily) preparing the site and building the requisite infrastructure. > >They operate it for three years (spending $2.25 million, say). In the > >meantime, node power at constant cost has increased by a factor of 4. 
> >If they invested their capital in bonds for those three years (including > >the operating budget), and bought that 4x faster node hardware, they > >would BREAK EVEN on the amount of work they get done by year four, and > >have saved three years operating expenses plus interest in addition to > >the interest on the entire capital amount for three years -- an easy $3+ > >million. > > Unless one gets partial results early on that make the later years of > analysis and computing more efficient. Difficult to quantify, but an > important factor. Also, there is a certain fixed amount of labor for > "fiddling around to get it all to work" that will apply at the beginning of > the computation, and earlier is better, because you're paying with > non-inflated dollars. Oh, yeah, you and Bill are right. My argument was simplistic and won't apply in all cases (especially as Bill noted if you're trying to scale a real parallel computation across all N nodes all at once, and not just divvying up compute cycles amongst a large number of users none of whom are running computations that can scale to more than N/4 nodes anyway). > In fact, here is a great argument for scalable clusters. You can invest in > all the infrastructure up front (because it's generally cheaper to buy > things like buildings all at once) and implement a smaller cluster to get > through the teething pains, and then, as the performance of the hardware > improves, upgrade the cluster along the way. > If you haven't tied the computation inextricably to the particular > implementation, then this may provide a more efficient/optimum use of a > fixed amount of capital. This is my general feeling. The other point is that for VTech to get funding for a "supercluster" like this once is a strike of lightning -- $10 million dollar projects don't fall in your lap every day. 
However, in 4-5 years tops, the hardware is going to be aged out (in six years contemporary computers will have a LOT more memory per node, processors that are estimatable to be 16x as fast, we might be up to REALLY fast networks or fast networks might be really cheap -- who knows?). Some joker like me will be able to build a cluster in their basement for $100K and equal its throughput, especially when scaling penalties on 1100 nodes are taken into account. So they'll have to go BACK to the well early and often, just like a real supercomputer center, or be obsoleted out of relevance by Moore's Law. And if they go back to the well every year, well, they're adopting a scalable cluster model. This is the killer -- what exactly will they DO with the cluster that is worth $7 million, plus the better part of a million a year just to run it? Not a whole lot of projects out there that are worth the up-front investment. It's really a matter of mindset. I've seen or heard of lots of very very expensive computers designed and assembled to accomplish some "really important" computation "really fast" that have been funded by all sorts of deep pocketed government agencies. In some of those cases, building the computer was so difficult that it didn't even get finished before Moore's Law overtook it at 1/10th the cost using commodity hardware (anything that takes years to build is at real risk of this). Worse, a lot of the research funded this way isn't really burning issue stuff in that the outcome won't change people's lives. Worth doing, sure, but not worth spending millions on to get a year or two earlier. Moore's Law just trundles right along, and now we're spending huge amounts to reach for teraflops, where a decade ago we were spending huge amounts to reach for gigaflops and a decade before THAT a megaflop was awesomely expensive. Well hell, I do gigaflops at home these days, for a few thousand dollars total. 
In ten more years, Inshallah, I'll be doing teraflops on my desktop and my personal digital assistant in my shirt pocket will be doing gigaflops:-). It really is a matter of waiting or not waiting to accomplish particular tasks. The REALLY big iron guys (or REALLY big cluster guys:-) hate to hear that -- they make a living from their really big supercomputers that live out on the bleeding edge. So I'm not surprised to hear that four out of four reject a scalable approach in favor of the big project model. The big science guys hate it too. Doesn't stop it from being true...at least for some projects. YMMV, and I'm not trying to break anybody's dolly;-) rgb P.S. -- anybody remember the good old days, when you'd have been arrested and put in jail as a traitor to the American Way if you'd sold a Russian or Chinese person a Gigaflop-capable computer because they could use it to Simulate Nuclear Devices? Developing GHz CPUs sort of put a squeeze on THAT idea, ay? Especially with the beowulf model to pursue. Now beowulfs are being built that follow the big iron model. We have met the enemy and it is us... > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Nov 18 00:01:21 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 17 Nov 2003 21:01:21 -0800 Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) References: Message-ID: <008601c3ad94$4f2648e0$32a8a8c0@laptop152422> Some philosophical comments below (and what is a list like this for, if not philosophical comments) rgb wrote: > It's really a matter of mindset. I've seen or heard of lots of very > very expensive computers designed and assembled to accomplish some > "really important" computation "really fast" that have been funded by > all sorts of deep pocketed government agencies. In some of those cases, > building the computer was so difficult that it didn't even get finished > before Moore's Law overtook it at 1/10th the cost using commodity > hardware (anything that takes years to build is at real risk of this). > Worse, a lot of the research funded this way isn't really burning issue > stuff in that the outcome won't change people's lives. Of course, one could make this argument about particle physics or deep space exploration. Whether we find that next particle or discover life on Europa or verify Einstein or find water on Mars won't affect a significant fraction of the lives on Earth anytime soon (except those, like me, who get paid to facilitate such exploration). However, aside from the "white collar welfare" aspect (not an aspect to be totally disregarded, what with pork barrels and such), there are practical and immediate benefits. While the actual application may not have much immediate need, it might provide a framework, and specific application, that drives a development which has general application. 
Sometimes, a specific problem is needed to get work rolling, rather than sitting in a "what might be the optimum general solution" analysis mode for years. If the problem is stated as "determine X", then something needs to get done, clusters need to get built (however inefficient), technology needs to be developed, which is then "inserted" into succeeding projects/missions etc.

Also, for anything novel, there's always the "I'm not going first" problem. Like penguins wondering if there's a leopard seal in the water, someone's got to jump in and show that you won't die instantly. Sometimes, those programs of perceived little value (and hence, little opprobrium if you fail) provide the mechanism to demonstrate a new technology.

Jim Lux

From jakob at unthought.net Tue Nov 18 02:12:46 2003
From: jakob at unthought.net (Jakob Oestergaard)
Date: Tue, 18 Nov 2003 08:12:46 +0100
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To: <20031117213628.GA26471@sphere.math.ucdavis.edu>
References: <20031117152049.99076.qmail@web11406.mail.yahoo.com> <20031117213628.GA26471@sphere.math.ucdavis.edu>
Message-ID: <20031118071246.GC17558@unthought.net>

On Mon, Nov 17, 2003 at 01:36:28PM -0800, Bill Broadley wrote:
>
> After all this discussion of the top 500 list, it got me thinking about a
> "better" benchmark. Where "better" means more useful to evaluating my
> idea of cluster goodness.

There are lies, damn lies, and statistics...

Your points about a more appropriate benchmark are valid - but we must realize that there is no such thing as "the one true benchmark".

Some clusters are tailored for one specific workload - one app. that has been written for the cluster, as the cluster was built for the app.
In those situations, you can run that app on the cluster and get your "true performance" metric.

For most of the top machines, I'd be rather surprised if there hadn't been a pretty clear idea about what the machines would be running, prior to purchase.

A general list such as Top500 needs one benchmark which will arguably be both unfair and even irrelevant for a large number of the systems on the list. (example: if all I do is factor large numbers, I don't care what the Linpack performance of my machine is - I may well have a system that does factoring 10 times faster than the Earth Simulator, while my system cannot even make the Top500).

All in all - for a list such as Top500, having *one* *simple* benchmark that is *well known*, is really the true value of the list. Having a "fairer" benchmark with more numbers (one number is as you argue and as per my previous example, irrelevant in many if not most cases), would in my opinion not be a gain for the usefulness of the list as such. It's not what the Top500 is for.

The Top500 is for "who's got big iron that can do Linpack really fast". Chances are such big iron will perform other tasks really fast as well, but we don't know, and if the Top500 could tell us, the list would be so massively complicated that we couldn't use it for anything at all in the first place anyway.

I think that having one poor (but well known and simple) metric is the better solution.

--
................................................................
: jakob at unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand.
: :.........................:............{Konkhra}...............:

From lars at meshtechnologies.com Tue Nov 18 04:41:56 2003
From: lars at meshtechnologies.com (Lars Henriksen)
Date: Tue, 18 Nov 2003 09:41:56 +0000
Subject: GenericNQS batch system
Message-ID: <1069148516.7118.26.camel@tp1.mesh-hq>

Dear beowulfers

I'm having some problems with the Generic NQS batch system. Creating and using queues on a single host works fine, but when I try to submit jobs to queues on remote hosts, it does not work. Does anyone have experience with that kind of operation?

Here is what I've done:

On the scheduling host (host1):

# qmgr create pipe sched-queue destination = exe-in at host2
# qmgr set lb_out sched-queue
# qmgr enable queue sched-queue

On the host that has to do the job execution (host2):

# qmgr create batch exe-queue pipeonly
# qmgr create pipe exe-in pipeonly destination exe-queue
# qmgr set lb_in exe-in
# qmgr enable queue exe-queue
# qmgr enable queue run-in
# qmgr set scheduler host1

In 'nmapmgr' on both hosts, entries have been added both for principal names and aliases.

/etc/hosts.nqs looks like this on both hosts:

* *

So when I try to submit a job to the system on host1:

(top of job description file:)
-------
#QSUB-q sched-queue
#QSUB-eo
#QSUB-r test
-------

nothing happens :-(

Edited syslog from the host where submission is made:

host1 NQS daemon[7467]: psc_spawn: Rqst not scheduled due to none there.
host1 NQS daemon[7467]: psc_spawn: Rqst not scheduled due to none there.
host1 NQS Pipeclient[5899]: Process logging started at Tue Nov 18 10:24:36 2003
host1 NQS Netdaemon[5900]: Netdaemon: Connection from host1
host1 NQS Pipeclient[5899]: Unable to deliver request 31 to a destination
host1 NQS Pipeclient[5899]: Msg #2:Scheduling request for retry at a later time
host1 NQS Pipeclient[5899]: Msg #2:Request rescheduled; exiting

A 'qstat -x' shows this:

Destset = {exe-in at host2 [RETRY] };

I'm kinda baffled by this...

Well thanks for your patience in reading this. I hope some of you can give me some pointers...

best regards
Lars

--
Lars Henriksen               | MESH-Technologies A/S
Systems Manager & Consultant | Forskerparken 10
www.meshtechnologies.com     | DK-5230 Odense M, Denmark
lars at meshtechnologies.com | mobile: +45 2291 2904
direct: +45 6315 7310        | fax: +45 6315 7314

From jeffrey.b.layton at lmco.com Tue Nov 18 06:01:12 2003
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Tue, 18 Nov 2003 06:01:12 -0500
Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References:
Message-ID: <3FB9FBF8.9080400@lmco.com>

Robert G. Brown wrote:

> It's really a matter of mindset. I've seen or heard of lots of very
> very expensive computers designed and assembled to accomplish some
> "really important" computation "really fast" that have been funded by
> all sorts of deep pocketed government agencies. In some of those cases,
> building the computer was so difficult that it didn't even get finished
> before Moore's Law overtook it at 1/10th the cost using commodity
> hardware (anything that takes years to build is at real risk of this).
> Worse, a lot of the research funded this way isn't really burning issue
> stuff in that the outcome won't change people's lives.
Worth doing,
> sure, but not worth spending millions on to get a year or two earlier.
> Moore's Law just trundles right along, and now we're spending huge
> amounts to reach for teraflops, where a decade ago we were spending huge
> amounts to reach for gigaflops and a decade before THAT a megaflop was
> awesomely expensive.
>
> Well hell, I do gigaflops at home these days, for a few thousand dollars
> total. In ten more years, Inshallah, I'll be doing teraflops on my
> desktop and my personal digital assistant in my shirt pocket will be
> doing gigaflops:-). It really is a matter of waiting or not waiting to
> accomplish particular tasks. The REALLY big iron guys (or REALLY big
> cluster guys:-) hate to hear that -- they make a living from their
> really big supercomputers that live out on the bleeding edge. So I'm
> not surprised to hear that four out of four reject a scalable approach
> in favor of the big project model. The big science guys hate it too.
>

Bob,

I think it's about time you posted a quick review of the little scenario you came up with regarding having a pot of money and a project to finish in a certain amount of time. It's the one where you showed that it's better (more cost effective) to wait until the project is almost due, buy the fastest cluster you need, and run the code, rather than buy the fastest machine at the beginning of the project and compute the rest of the time. This analysis was beautiful and very insightful. I think a lot of people would benefit from reading it.

Jeff

--
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta

From rgb at phy.duke.edu Tue Nov 18 09:21:23 2003
From: rgb at phy.duke.edu (Robert G.
Brown) Date: Tue, 18 Nov 2003 09:21:23 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: <20031118071246.GC17558@unthought.net> Message-ID: On Tue, 18 Nov 2003, Jakob Oestergaard wrote: > On Mon, Nov 17, 2003 at 01:36:28PM -0800, Bill Broadley wrote: > > > > After all this discussion of the top 500 list, it got me thinking about a > > "better" benchmark. Where "better" means more useful to evaluating my > > idea of cluster goodness. > > There are lies, damn lies, and statistics... > > Your points about a more appropriate benchmark are valid - but we must > realize that there is not such thing as "the one true benchmark". > > Some clusters are tailored for one specific workload - one app. that has > been written for the cluster, as the cluster was built for the app. In > those situations, you can run that app on the cluster and get your "true > performance" metric. I agree and disagree. I personally have a deep and abiding mistrust of high end benchmarks -- benchmarks of complex code -- unless they are MY complex code. Things like linpack and spec are useful only to the extent that one or more components "resembles" your application. Screw resemblance -- test your application. However, I think Bill's points are very well taken, so much so that I saved the article in my "List Ideas" directory for eventual reconsideration and mention in an article or the book. I also think that MICROBENCHMARKS are very useful indeed to systems and cluster engineers. Things like lmbench or stream or netpipes are small (generally nearly trivial code) and relatively insensitive to compiler/architecture quirks, or at least if they are they are likely to be sensitive in ways that do translate to arbitrary applications that use the tested operations. They are also a LOT harder to "fool", especially if the microbenchmarks can be run by anybody from a GPL source base. 
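To make the "nearly trivial code" point concrete, here is a minimal sketch of a microbenchmark: a tiny kernel timed in a loop, with the cost of an empty call subtracted and the best of several repeats reported. This is only an illustration under stated assumptions. It is written in Python for brevity (lmbench, stream and friends are C), the names `time_kernel` and `triad` are hypothetical, and it is not taken from any of the tools mentioned.

```python
import time


def time_kernel(kernel, n_iter=1000, repeats=5):
    """Estimate seconds per call of `kernel`.

    Takes the best total over `repeats` runs of `n_iter` calls, then
    subtracts the best total for an empty function so that loop and
    call overhead are (approximately) removed.
    """
    def empty():
        pass

    def best_total(fn):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(n_iter):
                fn()
            best = min(best, time.perf_counter() - start)
        return best

    overhead = best_total(empty)
    raw = best_total(kernel)
    # Clamp at zero: on a noisy machine the overhead estimate can exceed
    # the raw time for a very cheap kernel.
    return max(raw - overhead, 0.0) / n_iter


# Hypothetical kernel: a stream-like "triad" over two small vectors.
N = 1000
a = [1.0] * N
b = [2.0] * N


def triad():
    return [x + 3.0 * y for x, y in zip(a, b)]


if __name__ == "__main__":
    per_call = time_kernel(triad)
    print(f"triad on {N} elements: {per_call * 1e6:.2f} microseconds per call")
```

Because the whole thing is a few dozen lines of source, anyone can read it, compile or run it on a vanilla install, and see exactly which operation is being timed.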
The vendor cannot easily fudge a benchmark if you put your benchmark source on a vanilla Linux install, compile it, and run it. Or again, if they do "fudge" somehow under those circumstances (perhaps by warping an entire architecture to optimize some result:-) it is likely that a real application will benefit from the optimized operation, even if other operations elsewhere suffer. The latter sort of tradeoff is why Larry McVoy insists that lmbench (which can be run, of course, any way a user likes, a microbenchmark at a time) can only be used to publish >>results<< if a full suite of results are published, not "selected" ones on which a vendor does well. This is intended to prevent the kind of abuse that early benchmarks were notorious for attracting (and that likely continues today). Chip real estate ALSO goes through various opportunity cost decisioning processes (re: previous post on grant processes:-) and a new LU to optimize process X comes at the expense of e.g. on-chip context storage, more registers, heat production and hence higher clock. At some point you are robbing peter to pay paul, and the issue becomes one of balance. The balance issue extends out to the rest of the architecture, as has increasingly been a list focus. CPU clock has consistently outpaced memory (in Moore's Law exponent); both have WAY outpaced the network. Disk has outpaced even the CPU in volume, but lagged even the network in speed. So I personally would like to see a full suite of microbenchmarks -- literally trivial components wrapped in a timing harness. These should measure core functions that are building blocks of real programs. Many of these computational component measurements exist for standalone systems; not so many for clustered systems. I think this is the intriguing element of Bill's suggestions. A benchmark graph of just how long it takes to use raw UDP or TCP sockets, MPI, PVM to pass a message according to one of several patterns, plotted as a 2d/3d function of e.g. 
message size and number of nodes, together with stream results (and perhaps some of the other cpu_rate or lmbench benchmarks, depending on your arithmetic mix) would be a lot more openly informative than what gets published now. For one thing, it would separate out a lot of the bullshit associated with "top 500-ness". We could look at two clusters and compare their actual performance in important metrics at a glance, instead of wondering who could possibly give a rodent's furry behind about tools that de facto are just ONE possible measure of aggregate CPU in ONE set of fairly complex operations out of a practical infinity that might actually occur in our code. > For most of the top machines, I'd be rather surprised if there hadn't > been a pretty clear idea about what the machines would be running, prior > to purchase. ;-) I think you're right... > I think that having one poor (but well known and simple) metric is the > better solution. It does make it simple, but it doesn't make it better. It's the old issue -- "how many MFLOPS -> GFLOPS -> TFLOPS is your cluster?" (arrows indicate the progress of roughly decades). Who's di..um, I mean "cluster" is bigger. First, tell me what the HELL a MFLOP is. My microbenchmark measurements of a MFLOP don't agree with any of the accepted definitions, and vary significantly with whether or not e.g. division is included in the "floating point operations" tested. Since division is so slow, it is almost always omitted from computations of FLOPS. Since division is so common, people wonder why even their simple loops with division in them don't ever achieve the blazing throughput they expected. Then there are the rather immense variations in performance observed as e.g. the size of vectors is varied, code is driven from local/sequential to nonlocal/random. Cluster engineers are not stupid. 
Well, maybe SOME of them are stupid, somewhere, but I haven't met any that happened to be drooling and looking off in the distance with a vacant expression. Unless a beer happened to be sitting in front of them, of course. I think that they could manage to learn to use a very complex (but well documented, GPL) instrument set to support intelligent cluster design. Hell, I think most of the good people on this list use a complex but NOT terribly integrated set to support intelligent cluster design now! As I said, stream, netpipes, even spec (there ARE people whose tasks match decently with at least one component). And of course, the best of benchmarks, your application, but >>even optimizing your application<< requires knowledge only a microbenchmark can provide. The benefits of using this sort of information intelligently can equal the output of your entire cluster put together. Dongarra's ATLAS project is a shining beacon for what can be done in this regard. Factors of 2-3 speedup are not unknown for what CAN be core operations in many computations, just automagically adjusting algorithm and stride to take maximal advantage of register/L1/L2/memory latencies and bandwidths and the underlying CPU/chipset. It is pretty much the ONLY way one can achieve superlinear speedup -- know where significant nonlinearities in bottleneck speed occur and partition the task accordingly. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 18 08:40:33 2003 From: rgb at phy.duke.edu (Robert G. 
Brown)
Date: Tue, 18 Nov 2003 08:40:33 -0500 (EST)
Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <008601c3ad94$4f2648e0$32a8a8c0@laptop152422>
Message-ID:

On Mon, 17 Nov 2003, Jim Lux wrote:

> Also, for anything novel, there's always the "I'm not going first" problem.
> Like penguins wondering if there's a leopard seal in the water, someone's
> got to jump in and show that you won't die instantly. Sometimes, those
> programs of perceived little value (and hence, little opprobrium if you
> fail) provide the mechanism to demonstrate a new technology.

Agreed. However, the fundamental underlying issue is economics. You have an approximately fixed budget of X billions of dollars allocated to publicly funded research in the US. This money is distributed among many agencies for targeted disbursement. The target selection process (as you note) contains elements of national, state, local, and scientific politics -- there is plenty of pork in it. Some does indeed get distributed as a sort of jobs program for starving corporations who not completely coincidentally made large donations to selected politicians (often on both sides of a race). Other parts go to fund some scientific director's pet project.

However, at the crux of each funding decision, politics or no, is the issue of opportunity cost. It was opportunity cost that ultimately brought down the SSC. It is never an "and" operator with funding, it is inherently an "or" operator, given a fixed budget (and if the budget is deliberately expanded to include somebody's pet project, the "or" operator needs to apply to the expanded but still fixed pool, even if that decisioning is done at a very coarse granularity and one decision level up, e.g. the US Senate).

So I totally agree with everything you say. Sure, we need to climb certain scientific mountains just because they are there, and trust that new worlds lie on the other side of some of them.
HOWEVER, that does not release us from the obligation of making choices. For every project that is funded, the pool of funds is diminished, and alternative projects are rejected and not funded. My personal research colleague is an ARO grant officer, and I am fairly frequently treated to a view of this from the other side -- so much he'd LIKE to fund, so finite a pool of resources to fund from, so much politics that sends huge chunks of money to specific venues outside of the normal review and selection process.

It is difficult to raise oneself to a sufficiently elevated level to even begin to judge a lot of this. However, waste openly abounds. I know of quite a few places, for example, that have bought SP2's to do HPC computations in years past. These are (were), recall, quite expensive boxes. Naturally, they were publicly funded from various grants. At the time they were purchased, the beowulf model was already well known, and on a per-processor/per-cycle basis a competing beowulf cost perhaps 1/5th as much. The grantholders even KNEW about the beowulf model, and were using the systems to run primarily embarrassingly parallel applications that would have run efficiently on a pile of PCs and sneakernet.

However, politics or open ignorance or "deals" cut by IBM, between one thing and another, there they were with $500,000 computers whose actual benefit to their owners could easily be matched by $100,000 beowulfs, even on considerably finer grained code than they were running. Then there is the ADDITIONAL issue of whether the work being done was worth the cost of the hardware, compared to all the other work that might have been done with that money.

I'm sure that the money from sales like these floated IBM's boat through tough times, and kept its sales and engineering force from having to go on welfare, and I'm not even sarcastic about it. The same hand ultimately feeds me, after all, and I have no wish to bite it.
However, there >>is<< the opportunity cost issue of the extra $400K or so. If the work was really valuable, perhaps it could have been completed much faster with a more intelligent cluster model. If an intelligent cluster model had been used at the lower rate, perhaps some other deserving project could have been funded to keep ITS researchers and support people out of homeless shelters.

Choice is essential. Cost-benefit is at the heart of economic choice.

Where admittedly, the liver is politics...;-)

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Tue Nov 18 13:23:13 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue, 18 Nov 2003 13:23:13 -0500 (EST)
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To:
Message-ID:

On Tue, 18 Nov 2003, John Hearns wrote:

> There's a page from Paralogic with a packaged set of
> benchmarking tools http://www.plogic.com/bps/
>
> Maybe could be a start to your ideas?

Doug (Eadline) and I have talked about this for years now, and he has put together a small package of tools that are used for design purposes, I believe, at Paralogic. Maybe he'll write an article about this himself in his new mag...:-)

I don't think that they are quite "done", though, (at least the last time I checked) so yes, I'd call it a "start" to the idea. Not really my idea, as you can see. I think there are lots of folks who have thought on this, and lots more that have a de facto suite they use whether or not they are packaged. lmbench, netpipe, netperf, bonnie, memtest86 -- lots of tools out there for doing bits of this, some of them very nice.
cpu_rate (which is available on my own website under the Beowulf link) is another such tool. I've no time to work on it just now, but I'm in midstream on a fairly major rewrite to really separate out the timing harness and test invocation process so that code snippets can be wrapped in a standard subroutine pro/epilogue and timed, with correct subtraction of the subroutine overhead. This isn't as easy as you might think (at least to get consistent results) but when it is finished cpu_rate should be a highly extensible way to wrap up anything from microbenchmarks to specific code fragments you want to test.

One day we might even get a little group together at a meeting and kick around specs for a really nice, full GPL cluster exerciser toolset that can test, benchmark, and help debug problems with clusters large and small.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From john.hearns at clustervision.com Tue Nov 18 13:00:32 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Tue, 18 Nov 2003 19:00:32 +0100 (CET)
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To:
Message-ID:

There's a page from Paralogic with a packaged set of benchmarking tools http://www.plogic.com/bps/

Maybe could be a start to your ideas?

From James.P.Lux at jpl.nasa.gov Tue Nov 18 13:54:33 2003
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Tue, 18 Nov 2003 10:54:33 -0800
Subject: Heat, computers, etc.
Message-ID: <5.2.0.9.2.20031118105317.02f8e608@mailhost4.jpl.nasa.gov>

An interesting column from Robert X. Cringely talking about infrastructure issues, particularly power density in racked computers.

http://www.pbs.org/cringely/pulpit/pulpit20031106.html

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

From rbw at ahpcrc.org Tue Nov 18 15:06:54 2003
From: rbw at ahpcrc.org (Richard Walsh)
Date: Tue, 18 Nov 2003 14:06:54 -0600
Subject: Heat, computers, etc.
Message-ID: <200311182006.hAIK6sP26138@mycroft.ahpcrc.org>

James Lux wrote:

>An interesting column from Robert X. Cringely talking about infrastructure
>issues, particularly power density in racked computers.
>
>http://www.pbs.org/cringely/pulpit/pulpit20031106.html
>

Jim and All,

Another paper I have found useful on the same subject is among the APC white paper list (paper #46) at:

http://www.apc.com/tools/mytools/index.cfm?action=search&category=whitepaper

Search with power, cooling, and racks. It makes the point that the goal should not be simply to endlessly reduce rack square footage, because high power density models have non-linear effects on ancillary power and cooling costs, both in terms of the square feet they occupy on their own and their intrinsic cost. APC posits this begins to occur around 4 kW per rack.

The bottom line then (if you believe them) is that as per-rack compute density goes up, per-chip wattage (and general per-node wattage) must go down to retain the savings of a smaller footprint.

Regards,

rbw

#---------------------------------------------------
# Richard Walsh
# Project Manager, Cluster Computing, Computational
# Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org # #--------------------------------------------------- # "What you can do, or dream you can, begin it; # Boldness has genius, power, and magic in it." # -Goethe #--------------------------------------------------- # "Without mystery, there can be no authority." # -Charles DeGaulle #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Tue Nov 18 16:37:09 2003 From: seth at hogg.org (Simon Hogg) Date: Tue, 18 Nov 2003 21:37:09 +0000 Subject: Optimal SMP Stucture for Opteron In-Reply-To: <3FB7A1AE.5020307@comcast.net> References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <20031116154401.19933.qmail@web60309.mail.yahoo.com> Message-ID: <4.3.2.7.2.20031118213530.00accf00@pop.clara.net> At 11:11 16/11/03 -0500, Jeffrey B. Layton wrote: >One last comment. This next week is SC2003 so many of the >regular posters to this list won't be posting much. Having been away for 2 days (not at SC2003) and just checking my mail, I would just like to say 'au contraire'. Simon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Tue Nov 18 16:24:27 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Tue, 18 Nov 2003 16:24:27 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Tue, 18 Nov 2003, Robert G. 
Brown wrote:

> On Tue, 18 Nov 2003, John Hearns wrote:
>
> > There's a page from Paralogic with a packaged set of
> > benchmarking tools http://www.plogic.com/bps/
> >
> > Maybe could be a start to your ideas?
>
> Doug (Eadline) and I have talked about this for years now, and he has
> put together a small package of tools that are used for design purposes, I
> believe, at Paralogic. Maybe he'll write an article about this himself
> in his new mag...:-)

Well, yes and more. We are going to address the benchmark thing in a bit more detail in the future. The BPS package is described at http://www.cluster-rant.com/article.pl?sid=03/03/17/1838236. It will be getting an upgrade soon and there will be some real codes added as well. Stay tuned. We will have an issue featuring benchmarking as well.

You will notice that it is called BPS (Beowulf Performance Suite) and not BBS (Beowulf Benchmark Suite). The reason is that BPS was not supposed to be a benchmark per se. It was intended to generate a baseline from which to measure the effect of changes to the cluster (e.g. new driver, new kernel, etc.) and to diagnose some problems. I intentionally omitted HPL because I did not want the suite to become a contest until it could provide good data on which good engineering decisions could be made.

Doug

>
> I don't think that they are quite "done", though, (at least the last
> time I checked) so yes, I'd call it a "start" to the idea. Not really
> my idea, as you can see. I think there are lots of folks who have
> thought on this, and lots more that have a de facto suite they use
> whether or not they are packaged. lmbench, netpipe, netperf, bonnie,
> memtest86 -- lots of tools out there for doing bits of this, some of
> them very nice.
>
> cpu_rate (which is available on my own website under the Beowulf link)
> is another such tool.
I've no time to work on it just now, but I'm in > midstream on a fairly major rewrite to really separate out the timing > harness and test invocation process so that code snippets can be wrapped > in a standard subroutine pro/epilogue and timed, with correct > subtraction of the subroutine overhead. This isn't as easy as you > might think (at least to get consistent results) but when it is finished > cpu_rate should be a highly extensible way to wrap up anything from > microbenchmarks to specific code fragments you want to test. > > One day we might even get a little group together at a meeting and kick > around specs for a really nice, full GPL cluster exerciser toolset that > can test, benchmark, and help debug problems with clusters large and > small. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From brian.dobbins at yale.edu Tue Nov 18 15:26:09 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Tue, 18 Nov 2003 15:26:09 -0500 (EST) Subject: Q: Any info on the PathScale compilers? Message-ID: Hi guys, I recently came across the announcement of the (upcoming) PathScale compilers for the Opteron platform - does anyone have any experience with them yet? Apparently they're at SC2003, so if any of you who happen to be there have come across them, what's the latest news? For those of you who aren't familiar with them, check out: http://www.pathscale.com/products1.html Thanks in advance!
:-) - Brian From galitz at uclink.berkeley.edu Tue Nov 18 19:26:12 2003 From: galitz at uclink.berkeley.edu (Geoff Galitz) Date: Tue, 18 Nov 2003 16:26:12 -0800 Subject: thermal sensing Message-ID: G'day. I need to put together a little system which can monitor the temperature of a machine room, and when a certain threshold is reached, trigger a program to run. I can handle the software side, but I'm not really sure where to begin looking on the hardware side. I've been to a few engineering web sites and catalogues but haven't really found just what I need in terms of hardware. I am looking for a temperature sensor that can simply go high or low when the threshold is reached. Any recommendations? If there is already a device or howto out there on how to do this, that would be great too. Thanks, -geoff From bill at math.ucdavis.edu Tue Nov 18 18:39:06 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Tue, 18 Nov 2003 15:39:06 -0800 Subject: Q: Any info on the PathScale compilers? In-Reply-To: References: Message-ID: <20031118233906.GA520@sphere.math.ucdavis.edu> I picked up a brochure; they seem to be claiming to beat the competition and have full SPEC runs labeled as estimates because they don't expect to ship for 4 months (SPEC has a 3 month rule). I'll post more details when the material and my email access are in the same place.
On Tue, Nov 18, 2003 at 03:26:09PM -0500, Brian Dobbins wrote: > > Hi guys, > > I recently came across the announcement of the (upcoming) PathScale > compilers for the Opteron platform - does anyone have any experience with > them yet? Apparently they're at SC2003, so if any of you who happen to be > there have come across them, what's the latest news? > > For those of you who aren't familiar with them, check out: > http://www.pathscale.com/products1.html > > Thanks in advance! :-) > - Brian -- Bill Broadley Mathematics UC Davis From Hans.Schwengeler at unibas.ch Mon Nov 17 08:08:19 2003 From: Hans.Schwengeler at unibas.ch (Hans Schwengeler) Date: Mon, 17 Nov 2003 14:08:19 +0100 Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC Message-ID: <200311171308.hAHD8J0A003109@ida.astro.unibas.ch> Dear Tony, I once had problems getting two 3C905CX cards to work in our slaves. One alone would work OK, but not two. I solved the problem by using the 3c90x driver instead of the 3c59x.
Changes in /etc/modules.conf (for the master):
alias eth1 3c90x
alias eth2 3c90x
Changes in /etc/beowulf/config.boot (for the slaves):
pci 0x10b7 0x9200 3c90x
pci 0x10b7 0x9800 3c90x
pci 0x10b7 0x9805 3c90x
I have a Scyld bz27-8 system. Yours, Hans.
From mof at labf.org Tue Nov 18 03:31:48 2003 From: mof at labf.org (Mof) Date: Tue, 18 Nov 2003 19:01:48 +1030 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <200311171937.hAHJbcfr011705@pookie.nersc.gov> References: <200311171937.hAHJbcfr011705@pookie.nersc.gov> Message-ID: <200311181901.49107.mof@labf.org> Speaking of which, does anyone know what VT intend to use the cluster for? Mof. On Tue, 18 Nov 2003 06:07 am, canon at nersc.gov wrote: > My true measure for the top500 would be the value of the > science (or work) accomplished with it, a difficult to > impossible thing to determine. NERSC puts all the > emphasis on the science. This means considering: how usable the system > is; how hard it is to harness the full capability of the system; > what the sustained performance will be. Then we try to squeeze > every cycle out of the system. We've run Seaborg (#9) > with over 90% utilization for years now. We've gotten tons of > science done with it, just like we did with the T3E before it. > It can be a little disappointing to watch your system slide > down the rankings, when you know it's still being used to do great > stuff and it's still making a large impact. But I guess that's > just the nature of Moore's law.
From andrewxwang at yahoo.com.tw Tue Nov 18 22:24:30 2003 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Wed, 19 Nov 2003 11:24:30 +0800 (CST) Subject: GenericNQS batch system In-Reply-To: <1069148516.7118.26.camel@tp1.mesh-hq> Message-ID: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com> GNQS is really old, and there have been no improvements for a long time. Are you supporting legacy systems? Is using a different batch system an option? The two most popular batch systems these days are Gridengine (SGE) and Scalable PBS (SPBS). SGE is backed by Sun, so more R&D (and money too) is put into it, and more companies use it. On the other hand, SPBS is backed by Supercluster.org, which means that it should work better with Maui/Silver/Gold, and a lot of existing HPC sites are switching from OpenPBS (which has no new development) to ScalablePBS. Both SGE and SPBS are open source. Lastly, the Condor team told me that once they clean up the build environment, they will release the source! SPBS: http://www.supercluster.org SGE: http://gridengine.sunsource.net Condor: http://www.cs.wisc.edu/condor/ Andrew. --- Lars Henriksen wrote: > Dear beowulfers > > I'm having some problems with the Generic NQS batch > system. > > Creating and using queues on a single host works > fine, but when I try > to submit jobs to queues on remote hosts, it does > not work. Does anyone > have experience with that kind of operation?
> > Here is what i've done: > > On the scheduling host (host1): > > # qmgr create pipe sched-queue destination = > exe-in at host2 > # qmgr set lb_out sched-queue > # qmgr enable queue sched-queue > > On the host that has to do the job execution > (host2): > > # qmgr create batch exe-queue pipeonly > # qmgr create pipe exe-in pipeonly destination > exe-queue > # qmgr set lb_in exe-in > # qmgr enable queue exe-queue > # qmgr enable queue run-in > # qmgr set scheduler host1 > > In 'nmapmgr' on both host, entries has been added > both for principal > names and aliases. > > /etc/hosts.nqs looks like this on both hosts: > * * > > So when i try to submit at job to the system on > host1: > (top of job description file:) > ------- > #QSUB-q sched-queue > #QSUB-eo > #QSUB-r test > > ------- > > nothing happens :-( > > edited syslog from the host where submission is > made: > > host1 NQS daemon[7467]: psc_spawn: Rqst not > scheduled due to none there. > host1 NQS daemon[7467]: psc_spawn: Rqst not > scheduled due to none there. > host1 NQS Pipeclient[5899]: Process logging started > at Tue Nov 18 > 10:24:36 2003 > host1 NQS Netdaemon[5900]: Netdaemon: Connection > from host1 > host1 NQS Pipeclient[5899]: Unable to deliver > request 31 to a > destination > host1 NQS Pipeclient[5899]: Msg #2:Scheduling > request for retry at a > later time > host1 NQS Pipeclient[5899]: Msg #2:Request > rescheduled; exiting > > A 'qstat -x' shows this: > > > Destset = {exe-in at host2 [RETRY] > 12:35:10 CET 2003> > CET 2003> > }; > > > I'm kinda baffled by this... > > Well thanks for your patience in reading this. I > hope some of you can > give me some pointers... 
> > best regards > > Lars > -- > Lars Henriksen | MESH-Technologies > A/S > Systems Manager & Consultant | Forskerparken 10 > www.meshtechnologies.com | DK-5230 Odense M, > Denmark > lars at meshtechnologies.com | mobile: +45 2291 > 2904 > direct: +45 6315 7310 | fax: +45 6315 7314 From andrewxwang at yahoo.com.tw Tue Nov 18 22:35:22 2003 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Wed, 19 Nov 2003 11:35:22 +0800 (CST) Subject: Fwd: Open to the Public Colloquium with VT's Srinidhi Varadarajan In-Reply-To: <54B4C31E-1950-11D8-A838-000393838B9E@linguamediagroup.com> Message-ID: <20031119033522.23658.qmail@web16803.mail.tpe.yahoo.com> Since there are way too many guesses and "I think..." posts (which then continue with several thousand words describing how bad a Mac cluster would be!) about BigMac, why don't you go to the following colloquium to find out the truth? Andrew. --- Garrett Cobarr wrote: > The Johns Hopkins Applied Physics Lab in Laurel, > Maryland will host a > colloquium with Virginia Tech's Srinidhi Varadarajan > on December 5 > that's open to the public. > > http://www.jhuapl.edu/colloquium/schedule.html > _______________________________________________ > clusters mailing list | clusters at lists.apple.com > Help/Unsubscribe/Archives: > http://www.lists.apple.com/mailman/listinfo/clusters > Do not post admin requests to the list. They will be ignored.
From hahn at physics.mcmaster.ca Tue Nov 18 23:39:00 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 18 Nov 2003 23:39:00 -0500 (EST) Subject: thermal sensing In-Reply-To: Message-ID: > I can handle the software side, but I'm not really > sure where to begin looking on the hardware side. ibutton and a serial interface, $25 or so. > hardware. I am looking for a temperature sensor that > can simply go high or low when the threshold is reached. gross. data is cheap, machines are fast; why not collect 16ths of a degree every few seconds? www.ibutton.com. From lars at meshtechnologies.com Wed Nov 19 01:58:13 2003 From: lars at meshtechnologies.com (Lars Henriksen) Date: 19 Nov 2003 06:58:13 +0000 Subject: GenericNQS batch system In-Reply-To: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com> References: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <1069225093.2286.6.camel@fermi> On Wed, 2003-11-19 at 03:24, Andrew Wang wrote: > Are you supporting legacy systems? Is using a different > batch system an option? Well, short of rewriting a large scripted system, I have no choice but to use GNQS :-( > The two most popular batch systems these days are > Gridengine (SGE) and Scalable PBS (SPBS). Yeah, I usually use SPBS.
Thanks for your input, best regards Lars -- Lars Henriksen | MESH-Technologies A/S Systems Manager & Consultant | Forskerparken 10 www.meshtechnologies.com | DK-5260 Odense M, Denmark lars at meshtechnologies.com | mobile: +45 2291 2904 direct: +45 6315 7310 | fax: +45 6315 7314 From rgb at phy.duke.edu Wed Nov 19 07:52:38 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 19 Nov 2003 07:52:38 -0500 (EST) Subject: thermal sensing In-Reply-To: Message-ID: On Tue, 18 Nov 2003, Geoff Galitz wrote: > > > G'day. > > I need to put together a little system which can > monitor the temperature of a machine room, and > when a certain threshold is reached, trigger a > program to run. > > I can handle the software side, but I'm not really > sure where to begin looking on the hardware side. > I've been to a few engineering web sites and catalogues > but haven't really found just what I need in terms of > hardware. I am looking for a temperature sensor that > can simply go high or low when the threshold is reached. > Any recommendations? There are a bunch of links on http://www.phy.duke.edu/brahma for temperature sensors, and there is even a place where you can get a "kit" of components and build your own. Prices for ready-to-run solutions range from $100-200 on up to netbotz, which can be pretty expensive but which have lots of fabulous features and sensors. rgb > > If there is already a device or howto out there on how to > do this, that would be great too. > > Thanks, > -geoff > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept.
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From nixon at nsc.liu.se Wed Nov 19 04:28:11 2003 From: nixon at nsc.liu.se (nixon at nsc.liu.se) Date: Wed, 19 Nov 2003 10:28:11 +0100 Subject: thermal sensing In-Reply-To: (Geoff Galitz's message of "Tue, 18 Nov 2003 16:26:12 -0800") References: Message-ID: Geoff Galitz writes: > I've been to a few engineering web sites and catalogues > but haven't really found just what I need in terms of > hardware. I am looking for a temperature sensor that > can simply go high or low when the threshold is reached. > Any recommendations? Picotech's stuff is nice. Linux drivers are supplied. http://www.picotech.com/thermistor.html -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From becker at scyld.com Wed Nov 19 08:08:09 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 19 Nov 2003 08:08:09 -0500 (EST) Subject: Reminder: the 5th Annual Beowulf Bash is tonight! Message-ID: All of the information is front-and-center at http://www.Beowulf.org and http://www.Beowulf.org/beowulf/bash The summary is The Annual Beowulf Bash is held in conjunction with the IEEE SC conferences. The party is tonight, Wednesday November 19th, 2003, at the Phoenix Hyatt, directly across the street from the SC2003 venue. It will be held on the second floor atrium, and we'll have large signs posted.
We are pleased to introduce a new magazine as a sponsor, and welcome back Etnus, a founding sponsor from 1999 and 2000. Other sponsors are AMD, Penguin and Scyld (a founding sponsor). A note to attendees: please bring a camera: we'll be collecting for a pictorial on beowulf.org. Please note blackmail-worthy images so that we can fund next year's bash ;-> -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From rgb at phy.duke.edu Wed Nov 19 09:15:33 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 19 Nov 2003 09:15:33 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Wed, 19 Nov 2003, Felix Rauch wrote: > On Tue, 18 Nov 2003, Robert G. Brown wrote: > > I don't think that they are quite "done", though, (at least the last > > time I checked) so yes, I'd call it a "start" to the idea. Not really > > my idea, as you can see. I think there are lots of folks who have > > thought on this, and lots more that have a de facto suite they use > > whether or not they are packaged. lmbench, netpipe, netperf, bonnie, > > memtest86 -- lots of tools out there for doing bits of this, some of > > them very nice. > > Please correct me if I'm wrong, but if I remember correctly netpipe > and netperf are one-to-one benchmarks. While these are important to > find out more about (and tune) the performance of your NICs, we need > more to find out about the overall performance of the whole cluster > network. No, of course I agree in detail with all of the observations below.
This was what I meant when I suggested tests involving various message passing communications patterns in raw sockets, MPI, PVM -- in more detail, master-slave (boring but often relevant), tree distribution, all-to-all with and without some effort to avoid collisions, etc. Netpipe is very nice and does let you test PVM and MPI, but isn't really engineered for driving a cluster switch to its figurative knees. > > There are switches whose backplane offers only half bisectional > bandwidth, which might be fine for some applications. Other switches > are advertised to offer full bisectional bandwidth, but they simply > can't keep the promise. Other switches are expensive but deliver real > full bisection bandwidth. Some applications don't care if they don't > have a full-bisection-bandwidth network -- others do. > > So, for a comprehensive cluster benchmark, we should also have tools > to get insight into the inner workings of the network. Our research > group introduced such a benchmark as part of our paper > "Cost/Performance Tradeoffs in Network Interconnects for Clusters of > Commodity PCs" presented at this year's CAC workshop (see [1]). We > found out that some switches perform rather poorly for some > communication patterns and that a full bisection bandwidth can play a > role for the performance of some applications (e.g. car traffic > simulation). > > While we don't have a ready-to-be-used-for-all-clusters kind of > benchmark, I still hope the ideas might be valuable for this > discussion. > > - Felix > > [1] http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 This is the kind of thing that should ultimately be a component of any full suite. What we really need are some handy dandy students who want to write and GPL all of this stuff and publish it. Alas, I'm a physicist and don't have the right kind of students, and although I do work on writing it myself I lack the time to really put it all together.
It does seem like the sort of project a CS department with research efforts in cluster computing might want to tackle and "own", the way the Clemson guys own PVFS. Maybe I'll talk to my CS cluster colleagues here at Duke and see if a joint proposal can be worked out, perhaps collaboratively with a few other interested groups elsewhere. I seriously think that there is real computer science work to be done here, with an end stage goal being the creation of a daemon or kernel module that automagically generates microbenchmark numbers (ideally from a suite of modules that can be added or deleted at any time by e.g. dropping a suitably instrumented program file in a suitable directory) that are subsequently published in /proc (I've suggested this on the lmbench list at least twice now, to no avail). The advantage of this is that one COULD then rewrite e.g. ATLAS so that instead of having to be rebuilt for each micro-architecture on which it might run (a tedious and time consuming process) it simply drops its basic parametric tests in (if they aren't already in the default set) and runs. When it runs it reads in increasingly accurate numbers from /proc and dynamically autotunes. One could likely add a damped gradient search to the autotuning routine so that it can actually adjust itself (gradually) to very specific features of the system on which it is running, including the effect of the rest of its typical dynamic load. And not just ATLAS, of course. ANY program that might need to switch algorithm or access pattern based on microperformance metrics could benefit. As a single example, it might be possible to write a PVM or MPI program that automagically selects an optimal message passing pattern IF there were microbenchmark results immediately available indicating message passing efficiency at various scales (varying message size, distribution pattern, number of nodes). rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept.
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rauch at inf.ethz.ch Wed Nov 19 08:56:33 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 19 Nov 2003 14:56:33 +0100 (CET) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Tue, 18 Nov 2003, Robert G. Brown wrote: > I don't think that they are quite "done", though, (at least the last > time I checked) so yes, I'd call it a "start" to the idea. Not really > my idea, as you can see. I think there are lots of folks who have > thought on this, and lots more that have a de facto suite they use > whether or not they are packaged. lmbench, netpipe, netperf, bonnie, > memtest86 -- lots of tools out there for doing bits of this, some of > them very nice. Please correct me if I'm wrong, but if I remember correctly netpipe and netperf are one-to-one benchmarks. While these are important to find out more about (and tune) the performance of your NICs, we need more to find out about the overall performance of the whole cluster network. There are switches whose backplane offers only half bisectional bandwidth, which might be fine for some applications. Other switches are advertised to offer full bisectional bandwidth, but they simply can't keep the promise. Other switches are expensive but deliver real full bisection bandwidth. Some applications don't care if they don't have a full-bisection-bandwidth network -- others do. So, for a comprehensive cluster benchmark, we should also have tools to get insight into the inner workings of the network.
Our research group introduced such a benchmark as part of our paper "Cost/Performance Tradeoffs in Network Interconnects for Clusters of Commodity PCs" presented at this year's CAC workshop (see [1]). We found out that some switches perform rather poorly for some communication patterns and that a full bisection bandwidth can play a role for the performance of some applications (e.g. car traffic simulation). While we don't have a ready-to-be-used-for-all-clusters kind of benchmark, I still hope the ideas might be valuable for this discussion. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 From jcownie at etnus.com Wed Nov 19 11:08:20 2003 From: jcownie at etnus.com (James Cownie) Date: Wed, 19 Nov 2003 16:08:20 +0000 Subject: Yotta Yotta Message-ID: <1AMUsO-752-00@etnus.com> Despite reports on this list to the contrary, Yotta Yotta are still in business, and have a stand here at SC. If you ask Wayne _really_ nicely he even has a few Yotta Yotta cubes :-) -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From jcownie at etnus.com Wed Nov 19 11:17:27 2003 From: jcownie at etnus.com (James Cownie) Date: Wed, 19 Nov 2003 16:17:27 +0000 Subject: Q: Any info on the PathScale compilers? In-Reply-To: Message from Bill Broadley of "Tue, 18 Nov 2003 15:39:06 PST."
<20031118233906.GA520@sphere.math.ucdavis.edu> Message-ID: <1AMV1D-75g-00@etnus.com> I attended a talk by one of the PathScale folks on the IBM booth. The compilers are based on the Open64 sources released under GPL by SGI. (Presumably they have some expert GPL lawyers). The SPEC numbers quoted were unlabelled as to whether they were peak or base. Some marginally conflicting claims were made: "The only compiler designed from the ground up for Opteron" and "A stable code base from Open64" (presumably they mean the code-generator was designed from scratch for Opteron). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From bill at math.ucdavis.edu Wed Nov 19 17:21:24 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 19 Nov 2003 14:21:24 -0800 Subject: Q: Any info on the PathScale compilers? In-Reply-To: <1AMV1D-75g-00@etnus.com> References: <20031118233906.GA520@sphere.math.ucdavis.edu> <1AMV1D-75g-00@etnus.com> Message-ID: <20031119222124.GA6034@sphere.math.ucdavis.edu> Alas my ethernet AND wireless seem to be dying on the Dell laptop I'm using, keeping my notes off the network. At least unless I can find a smallish Phillips screwdriver in the downtown Phoenix area. In any case the sheets I got are labeled speculative I believe, I have the ratios handy: Estimated ratios for an IBM eServer 325 dual 2.0 GHz with PC3200:
CINT2000 = 1065 953 1364 615 1714 935 1605 1362 1138 2206 1086 1011
CFP2000 = 1733 2225 1526 1277 1660 2425 1347 1341 1654 1134 1415 1275 613 1150
INT 1200 est, INT base 1173
FP 1416 est, FP base 1237
I don't have similar numbers for NAG, PGI, or anyone else who has an Opteron compiler handy.
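SPEC composite scores are geometric means of the per-benchmark ratios, so the estimated ratios listed above can be sanity-checked against the quoted INT and FP estimates. The following is only a minimal sketch: it assumes the twelve CINT2000 and fourteen CFP2000 values were transcribed completely and correctly, and the function name is made up for illustration.

```python
import math

def geometric_mean(ratios):
    """Geometric mean: the n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Estimated per-benchmark ratios as quoted in the message above.
cint2000 = [1065, 953, 1364, 615, 1714, 935, 1605, 1362, 1138, 2206, 1086, 1011]
cfp2000 = [1733, 2225, 1526, 1277, 1660, 2425, 1347, 1341,
           1654, 1134, 1415, 1275, 613, 1150]

print(f"CINT2000 geometric mean: {geometric_mean(cint2000):.0f}")
print(f"CFP2000 geometric mean:  {geometric_mean(cfp2000):.0f}")
```

If the ratios are transcribed correctly, both results should land close to the quoted estimates of 1200 and 1416.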
-- Bill Broadley Mathematics UC Davis From josip at lanl.gov Thu Nov 20 00:49:31 2003 From: josip at lanl.gov (Josip Loncaric) Date: Wed, 19 Nov 2003 22:49:31 -0700 Subject: thermal sensing In-Reply-To: References: Message-ID: <3FBC55EB.4040009@lanl.gov> Geoff Galitz wrote: > I need to put together a little system which can > monitor the temperature of a machine room, and > when a certain threshold is reached, trigger a > program to run. If cost must be minimized, how about a cheap $10 thermostat suitably wired to a serial port DCD or CTS line? This may provide on/off thermal signaling (e.g. some UPS units use this method to signal power failures). On a related note, Mark Hahn mentioned this back in June: http://www.ibutton.com/ibuttons/thermochron.html which could be useful in somewhat different situations... Josip From scameron at ubi.com Thu Nov 20 11:05:29 2003 From: scameron at ubi.com (Scott Cameron) Date: Thu, 20 Nov 2003 11:05:29 -0500 Subject: Linux 2.4.20 + bonding troubles Message-ID: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> Hi there, I'm not sure where to look for information regarding this. I've been trying to implement an etherchannel setup for one of my systems and have been seeing varied success. I have the first etherchannel set up on 2 Intel Etherexpress 100 cards (e100 driver), it seems to work with little problem -- the only issue with this etherchannel I have seen is that the channel cannot seem to go above 100 megabits, while I certainly generate enough throughput to go beyond 100 megabits.
The second etherchannel is on 2 Intel 1000 Mbit cards (e1000 driver). I've had the most trouble with this channel -- when I bring it up, the interface begins showing CRC errors & collisions for Tx (not Rx). However, I don't see the collisions on the switch -- just the CRC errors. Both channels are running in load-balancing round-robin mode. On the switch I have the port-channel configured to do source XOR destination IP load-balancing. The switch I'm connecting to is a Catalyst 6006 running the integrated IOS. I can't see any errors in the log on the switch, and I'm not sure how to proceed to figure out where the CRC errors are coming from. If anyone could point me in the right direction that would be great. Scott Switch: Cisco Catalyst 6006 (integrated IOS) Linux box: P3-1.4 GHz, 2.4.20 kernel 2x Intel PRO/1000 (driver 5.2.20) 4x Intel PRO/100 (driver 2.1.24-k1) From siegert at sfu.ca Thu Nov 20 13:01:07 2003 From: siegert at sfu.ca (Martin Siegert) Date: Thu, 20 Nov 2003 10:01:07 -0800 Subject: Linux 2.4.20 + bonding troubles In-Reply-To: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> References: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> Message-ID: <20031120180107.GA10741@stikine.ucs.sfu.ca> Hi Scott, we ran into the same problem here: the problem is not Linux (as you mentioned you can set the channels to round-robin under Linux), but the Cisco switch: you cannot set the Cisco to round-robin mode on the etherchannel [partially to blame is the IEEE 802.3ad standard, which does not specify round-robin mode; but that standard was probably intended for Telco situations (serving many connections at the same time) instead of HPC situations (aiming at high throughput)].
As a consequence the Cisco will always forward all packets to a single leg of the outgoing trunk. If your receiving trunk is made out of 100Mbit/s connections this will limit you to 100Mbit/s. There is not much you can do about this: If all machines that connect to that network have two NICs, you can create two VLANs on the Cisco and connect the first of two NICs of each box to VLAN 1 and the second VLAN 2. If you are not in that situation (and we aren't) the only thing that you can do is to forklift the Cisco out of the way and buy a switch that supports round-robin mode on etherchannels, e.g., Extreme's Black Diamond switches. Cheers, Martin -- Martin Siegert Manager, Research Services WestGrid Site Manager Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6
From jmoyer at redhat.com Thu Nov 20 16:25:58 2003 From: jmoyer at redhat.com (Jeff Moyer) Date: Thu, 20 Nov 2003 16:25:58 -0500 Subject: Linux 2.4.20 + bonding troubles In-Reply-To: <20031120180107.GA10741@stikine.ucs.sfu.ca> References: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> <20031120180107.GA10741@stikine.ucs.sfu.ca> Message-ID: <16317.12646.988920.171852@segfault.boston.redhat.com> ==> Regarding Re: Linux 2.4.20 + bonding troubles; Martin Siegert adds: [snip] siegert> There is not much you can do about this: If all machines that siegert> connect to that network have two NICs, you can create two VLANs on siegert> the Cisco and connect the first of two NICs of each box to VLAN 1 siegert> and the second VLAN 2. If you are not in that situation (and we siegert> aren't) the only thing that you can do is to forklift the Cisco siegert> out of the way and buy a switch that supports round-robin mode on siegert> etherchannels, e.g., Extreme's Black Diamond switches. Note that a simple round robin scheme for sending packets can cause performance issues as well if you get tcp packet reordering.
See, for example: http://roland.grc.nasa.gov/~mallman/papers/tcp-reorder-ccr.ps Cheers, Jeff
From andrewxwang at yahoo.com.tw Thu Nov 20 22:39:05 2003 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Fri, 21 Nov 2003 11:39:05 +0800 (CST) Subject: News, FYI: High Productivity Computing Systems & PBS 5.4 Message-ID: <20031121033905.45494.qmail@web16806.mail.tpe.yahoo.com> "... Part of this (High Productivity Computing System) is looking into better super-computing benchmarks": http://www.aceshardware.com/#75000448 Also, some news about PBS from sc2003: 1) http://www.supercomputingonline.com/article.php?sid=5079 2) http://www.supercomputingonline.com/article.php?sid=5089 3) http://www.supercomputingonline.com/article.php?sid=5090 Andrew.
From neil.brown at syngenta.com Fri Nov 21 10:35:10 2003 From: neil.brown at syngenta.com (neil.brown at syngenta.com) Date: Fri, 21 Nov 2003 15:35:10 -0000 Subject: RHEL Copyright Removal Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Hi all, We're having a bit of a dilemma here, as I'm sure many others are, about what to use as our standard Linux distro with the end of life of the Red Hat family. RHEL or SLES are looking favourites in terms of supportability, but of course there's the not insignificant problem of cost.
The thought of having to pay at least $179 per server, with around 50 compute nodes, along with various other non-beowulf Linux servers doesn't appeal. I've been trying to find out how much effort it takes to strip the RH copyrighted bits out of RHEL and compile it for our own use and whether doing so reduces its functionality a great deal. I've trawled the web and usenet, but not found much to write home about on the subject. Have any of you had experiences with such an exercise? Were they positive? How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure whether it's ES or AS) so it surely can't be that bad as a cluster oriented distro. Thanks for any suggestions, Neil
From tlovie at pokey.mine.nu Fri Nov 21 11:00:17 2003 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Fri, 21 Nov 2003 11:00:17 -0500 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Unfortunately, this is not a trivial task. I had attempted to re-build 2.1AS, and getting everything to build is quite tricky, since various packages have dependency lists that sometimes conflict. From what I understand, building 3.0AS is even more difficult. But there are others that share the same dilemma, and much progress has been made on doing this so far. You might want to check out this mailing list: rhel-rebuild mailing list rhel-rebuild-l at uibk.ac.at Hosted at the University of Innsbruck, Austria And also a distribution called cAos at: caosity.org (I believe they have a mailing list) Tom Lovie.
From rgb at phy.duke.edu Fri Nov 21 11:30:36 2003 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Fri, 21 Nov 2003 11:30:36 -0500 (EST) Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 21 Nov 2003 neil.brown at syngenta.com wrote: > Hi all, > > We're having a bit of a dilemma here, as I'm sure many others are, about > what to use as our standard Linux distro with the end of life of the Red Hat > family. RHEL or SLES are looking favourites in terms of supportability, but > of course there's the not insignificant problem of cost. The thought of > having to pay at least $179 per server, with around 50 compute nodes, along > with various other non-beowulf Linux servers doesn't appeal. > > I've been trying to find out how much effort it takes to strip the RH > copyrighted bits out of RHEL and compile it for our own use and whether > doing so reduces it's functionality a great deal. I've trawled the web and > usenet, but not found much to write home about on the subject. > This will soon be a FAQ. The best solutions kicked around so far (if you wish to stick with basically free RPM-based full-service kickstartable/pxe installable distros) are two community supported efforts: Fedora: http://fedora.redhat.com This is basically a core that is RH9 with all the logos etc stripped down to where they inherit GPL (eventually completely, I imagine). It is a community supported model, where I believe they are looking for people to take on pieces of the bug triage tree -- Adopt a Package Today! It is designed in layers, with a "core" that should be fully functional at the server and workstation level and supported as well as anything out there, a legacy layer, and a contributed/kitchen sink layer with less stable but bleeding-edge useful stuff. The project is yummified from the beginning, which means that it is very simple to create/rsync your own repository mirror and then use yum to maintain a LAN or cluster from it. 
At a guess, NEARLY anything you have set up for RH 9 will eventually be quite portable to fedora, although on the yum list I hear of occasional exceptions, as one might expect until things settle down. The fedora core is "in production" now at version 1, I believe, although I expect that only hardy admins and developer types are adopting it at first, until (and to help) it settles in. Note well that www.fedora.org is a site that will just ask you to go away, it is NOT associated with this project...:-( Note well that Red Hat IS associated with this project. This may or may not make you feel good about going this route. I personally think they are strongly committed to it as they rely on SOMETHING to create a rawhide -> semistable released -> rockstable corporate chain; they damn well can't unleash barely-out-of-rawhide code on people paying big bucks per seat and disinclined to participate in the debugging process. Caosity: http://www.caosity.org This is Community Linux WITHOUT corporate strings, run at least in part by clustervolken. They too are stripping RH as a base, but plan to eventually diverge. At a guess, at some point there will be Much Synthesis and sharing between the two projects as it would be silly not to. They too are soliciting humans to help out. I know people heavily engaged in both projects, and expect both projects to be stable at the starting level of RH9 before RHL support ceases. One or the other will likely be the most successful at setting up and organizing the bug triage network and perhaps eventually dominate, although they are also likely to differentiate in focus (Caos has a very definite cluster/scientific computing flavor due to the work environments of a lot of the primary drivers). HTH. I personally have stopped worrying about the transition, and plan to convert my personal machines to fedora "soon" to start screwing around with it prior to a campus conversion likely in the spring. "Soon" as in my first rsync of the oceanic fedora core to my home repository server is being slurped through a DSL straw as I type this, likely done sometime today. I have a totally idle box all lined up to be first, PXE and kickstart all happy -- I'll cheerfully report my experiences as soon as I have any. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
From canon at nersc.gov Fri Nov 21 12:24:06 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Fri, 21 Nov 2003 09:24:06 -0800 Subject: RHEL Copyright Removal In-Reply-To: Message from "Thomas Lovie" of "Fri, 21 Nov 2003 11:00:17 EST." <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: <200311211724.hALHO744026500@pookie.nersc.gov> Neil, The two projects/groups Tom mentioned are a good starting point. I have rebuilt 3.0AS without too much trouble and that's without purchasing a copy (which would have given me a better jumping off point). Another project that might be appealing is whitebox. http://www.beau.org/~jmorris/linux/whitebox/index.html This is an already rebuilt RHEL with all trademarks removed.
--Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------
From j.c.burton at gats-inc.com Fri Nov 21 12:36:26 2003 From: j.c.burton at gats-inc.com (John Burton) Date: Fri, 21 Nov 2003 12:36:26 -0500 Subject: RHEL Copyright Removal In-Reply-To: References: Message-ID: <3FBE4D1A.6030803@gats-inc.com> I'm running Fedora on one of my machines and am pretty happy with it - it's RedHat Linux with the names and logos changed. It uses a slightly newer kernel than RH9 (2.4.22 vs 2.4.20 IIRC). One minor difficulty I had came from trying to compile a third party (nvidia) kernel module. The kernel is compiled with gcc32, but the default compiler is gcc33. Both are supplied on the system, so you just have to be careful about specifying which compiler to use. I'm guessing there is some issue with gcc33 and the kernel... So far so good... we'll probably go with fedora for development or personal workstations and RHEL for our production servers... John
From joelja at darkwing.uoregon.edu Fri Nov 21 11:16:29 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 21 Nov 2003 08:16:29 -0800 (PST) Subject: RHEL Copyright Removal In-Reply-To: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: what was your build environment? joelja On Fri, 21 Nov 2003, Thomas Lovie wrote: > Unfortunately, this is not a trivial task. I had attempted to re-build > 2.1AS, and to get everything to build is quite tricky, since various > packages have dependency lists that sometimes conflict. From what I > understand, building 3.0AS is even more difficult. But there are others > that share the same dilema, and much progress has been made on doing this so > far. You might want to check out this mailing list: > > rhel-rebuild mailing list > rhel-rebuild-l at uibk.ac.at > Hosted at the University of Innsbruck, Austria > > And also a distribution called cAos at: caosity.org > (I believe they have a mailing list) > > Tom Lovie. -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2
From p.pennaz at tiscali.it Fri Nov 21 13:27:04 2003 From: p.pennaz at tiscali.it (p.pennaz at tiscali.it) Date: Fri, 21 Nov 2003 19:27:04 +0100 Subject: booting from usb pen drive Message-ID: <3FAA831D0001F2C1@mail-1.tiscali.it> Does anyone know if it is possible to boot a Linux PC system from a USB cartridge? My USB subsystem is working fine. Thank you
From rocky at atipa.com Fri Nov 21 11:25:56 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Fri, 21 Nov 2003 10:25:56 -0600 (CST) Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 21 Nov 2003 neil.brown at syngenta.com wrote: > Hi all, > > We're having a bit of a dilemma here, as I'm sure many others are, about > what to use as our standard Linux distro with the end of life of the Red Hat > family. RHEL or SLES are looking favourites in terms of supportability, but > of course there's the not insignificant problem of cost. The thought of > having to pay at least $179 per server, with around 50 compute nodes, along > with various other non-beowulf Linux servers doesn't appeal. > > I've been trying to find out how much effort it takes to strip the RH > copyrighted bits out of RHEL and compile it for our own use and whether > doing so reduces it's functionality a great deal. I've trawled the web and > usenet, but not found much to write home about on the subject. > > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure > whether it's ES or AS) so it surely can't be that bad as a cluster oriented > distro. For 2.1, I am unsure if there are any binary releases out there or not. For 3, there are a couple groups doing work with this. One is called White Box Enterprise Linux and has binaries up at http://www.beau.org/~jmorris/linux/whitebox/index.html . Another group, www.caosity.org, is doing the same thing, but does not yet have ISO's available. I am involved with this project and we expect to have a testing release out in the next week or so.
I've also heard that ROCKS is putting together a rebuild for their own use, but I was unable to find any information about it after a short search. I also know of several other groups that have internal projects to do the same thing. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");'
From jbecker at northwestern.edu Fri Nov 21 10:51:00 2003 From: jbecker at northwestern.edu (Jesse Becker) Date: Fri, 21 Nov 2003 09:51:00 -0600 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <20031121155100.GD8468@northwestern.edu> On Fri, Nov 21, 2003 at 03:35:10PM -0000, neil.brown at syngenta.com wrote: > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure > whether it's ES or AS) so it surely can't be that bad as a cluster oriented > distro. Actually, I believe that ROCKS is based on RHEL 2.1 WS. I've used it a few times, and parts of it are quite nice. The ROCKS guys have automated most of the recompile process, but I don't know if the automation includes stripping out the RH stuff. -- Jesse Becker GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2
From alvin at Mail.Linux-Consulting.com Fri Nov 21 19:15:32 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 21 Nov 2003 16:15:32 -0800 (PST) Subject: RHEL Copyright Removal In-Reply-To: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: hi ya neil/thomas On Fri, 21 Nov 2003, Thomas Lovie wrote: > Unfortunately, this is not a trivial task. I had attempted to re-build > 2.1AS, and to get everything to build is quite tricky, since various > packages have dependency lists that sometimes conflict. i think that strictly depends on "how linux" was installed and which "versions" ... dependencies are relatively easy to solve ... compared to the problems you folks are trying to solve with the clusters. the only problems in the last few (5) years that i've seen that couldn't be solved were a mix-n-match of the latest versions of php, perl, mysql, gcc, bugzilla, mozilla, apache ... (bugzilla would not work when some upgrades/patches were applied, and d/l'ing the latest patches of each at the time didn't work either). - most all other apps have worked on any other distro that i tend to use (rh, slackware, suse, custom, ..), ie customers do not need to be locked down to a particular older version and forced to pay $$ for it. knowing which additional "user application software" you need to have running is what makes 95% of the difference in which distro to use or not, and the rest is system tweaking and debugging and patches. - i think, imho, "support" is the most expensive part of the cluster's TCO and the hardware is relatively inexpensive in comparison .. - $200/server * 50 machines ($10K) is still inexpensive compared to hiring outsourced "linux support" > From what I > understand, building 3.0AS is even more difficult. But there are others > that share the same dilema, and much progress has been made on doing this so > far.
You might want to check out this mailing list: > > rhel-rebuild mailing list > rhel-rebuild-l at uibk.ac.at > Hosted at the University of Innsbruck, Austria > > And also a distribution called cAos at: caosity.org > (I believe they have a mailing list) > > Tom Lovie. thanx alvin -- if anybody is local in silicon valley, and want to build alternative cluster distro's using existing "free distro", i'm game .. -- cluster apps that people seem to use http://www.Linux-Consulting.com/Cluster _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From virtualsuresh at yahoo.co.in Sat Nov 22 00:34:52 2003 From: virtualsuresh at yahoo.co.in (=?iso-8859-1?q?Suresh=20Chandra=20Mannava?=) Date: Sat, 22 Nov 2003 05:34:52 +0000 (GMT) Subject: distributed computing applications Message-ID: <20031122053452.63176.qmail@web8005.mail.in.yahoo.com> distributed computing efforts. Sir, I am interested in the area of Distibuted/ Parallel/High performnace computing, as a part of my study I am preparing a list of applications that can utilise distributed computing power. I made a small list by searching on the internet, there are much more applications yet to added. I request you to provide pointers for latest applications and the applictaions I missed. I also request you to provide pointers for applications specific to Beowulf clusters Here is the list: (They are not properly organised) 1) Visualization, image processing, rendering, special effects Parallel ray-tracing University of Bristol (Computer Graphics group) http://www.cs.bris.ac.uk/research/graphics Kwangu Institute of Science & technology (Information System group) http://parallel.kjist.ac.kr/projects.htm 2)Data mining PAPIA -PArallel Protein Information Analysis http://www.rwcp.or.jp/papia PADMA-PArallel Data Mining Agents http://www.eecs.wsu.edu/~hillol/padma.html 3)Goggle ? 
Web Search Engine with Linux cluster (more than 10,000 servers). 4) High Performance, High availability web servers: eddieware, khttpd (static pages) 5) Computing in Computational fluid dynamics 6) Search for Extraterrestrial Intelligence (SETI) Radio signals are analysed by computationally intensive algorithms http://setiathome.ssl.berkeley.edu/ 7) Folding@Home: An effort to understand protein folding and aggregation for use toward fighting degenerative diseases. http://www.stanford.edu/group/pandegroup/folding/ 8) Find-a-Drug: http://www.find-a-drug.org/ Evaluates the potential of different molecules to interact with certain protein targets. Molecules that are found to be "hits" can become new drug candidates used for treating important diseases. 9) GIMPS: The Great Internet Mersenne Prime Search: http://www.mersenne.org Searches for record-size Mersenne prime numbers. Discovered the 39th known Mersenne prime number, 2^13,466,917 - 1, on November 14, 2001. 10) Distributed Search for Fermat Number Divisors: http://www.fermatsearch.org/ Searches for additional divisors of Fermat numbers. Found 7 new divisors in 2003. 11) Brute force attacks on cryptographic keys Cracking RC4, RC5, DES Cracking Passwords http://www.isg.inf.ethz.ch/docu/security/passwd/crackcluster.html http://www.cisiar.org/proyectos/cisilia/home_en.php 12) Other Applications Computing for Genomic Analysis Genetic programming "Big Science" problems involving modeling, simulating and understanding large complex systems, for example: cosmology, subatomic physics, climate modeling Biomedicine and Biochemistry ===== --------------------------- Research Scholar, VIT, India. ________________________________________________________________________ Yahoo! India Mobile: Download the latest polyphonic ringtones.
Go to http://in.mobile.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From award at andorra.ad Sat Nov 22 05:30:45 2003 From: award at andorra.ad (Alan Ward) Date: Sat, 22 Nov 2003 11:30:45 +0100 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> Message-ID: <3FBF3AD5.4040301@andorra.ad> Yes, but for the time being you need to boot the kernel off a diskette or network. You also need to hack the kernel a wee bit. I just sent an article on this to linuxgazette.net, and will keep you posted when it comes out (probably next month). Best regards, Alan Ward p.pennaz at tiscali.it wrote: > Does anyone know if it is possible to boot a linux PC system via a USB > cartridge? > My usb subsystem is working fine. > Thank you > > __________________________________________________________________ > Tiscali ADSL WITH NO LINE RENTAL, you pay only when you browse! > And what's more, the modem is FREE! Subscribe now. > http://point.tiscali.it/adsl/index.shtml > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Nov 22 07:52:54 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 22 Nov 2003 20:52:54 +0800 (CST) Subject: distributed computing applications In-Reply-To: <20031122053452.63176.qmail@web8005.mail.in.yahoo.com> Message-ID: <20031122125254.16605.qmail@web16811.mail.tpe.yahoo.com> Also don't miss Condor, PBS, and Grid Engine.
They are the enabling technologies for distributed/parallel applications. Andrew. --- Suresh Chandra Mannava wrote: > distributed computing efforts. > Sir, > I am interested in the area of Distributed/ > Parallel/High performance computing, as a part of > my > study I am preparing a list of applications that can > utilise distributed computing power. > I made a small list by searching on the internet, > there are many more applications yet to be added. I > request you to provide pointers for latest > applications and the applications I missed. > I also request you to provide pointers for > applications specific to Beowulf clusters > > Here is the list: > > (They are not properly organised) > > 1) Visualization, image processing, rendering, > special > effects > Parallel ray-tracing > University of Bristol (Computer Graphics group) > http://www.cs.bris.ac.uk/research/graphics > Kwangju Institute of Science & Technology > (Information > System group) > http://parallel.kjist.ac.kr/projects.htm > > 2) Data mining > > PAPIA - PArallel Protein Information Analysis > http://www.rwcp.or.jp/papia > PADMA - PArallel Data Mining Agents > http://www.eecs.wsu.edu/~hillol/padma.html > > 3) Google: Web Search Engine with Linux cluster > (more > than 10,000 servers). > > 4) High Performance, High availability web servers > eddieware, khttpd (static pages) > > 5) Computing in Computational fluid dynamics > > 6) Search for Extraterrestrial Intelligence (SETI) > Radio signals are analysed by > computationally intensive > algorithms > http://setiathome.ssl.berkeley.edu/ > > 7) Folding@Home: > An effort to understand protein folding and > aggregation for use toward fighting degenerative > diseases. > http://www.stanford.edu/group/pandegroup/folding/ > > 8) Find-a-Drug: http://www.find-a-drug.org/ > Evaluates the potential of different molecules to > interact with certain protein targets. Molecules > that > are found to be "hits" can become new drug > candidates > used for treating important diseases.
> > 9) GIMPS: The Great Internet Mersenne Prime Search: > http://www.mersenne.org > Searches for record-size Mersenne prime numbers. > Discovered the 39th known Mersenne prime number, > 2^13,466,917 - 1, on November 14, 2001. > > 10) Distributed Search for Fermat Number Divisors: > http://www.fermatsearch.org/ > Searches for additional divisors of Fermat > numbers. > Found 7 new divisors in 2003. > > 11) Brute force attacks on cryptographic keys > Cracking RC4, RC5, DES > Cracking Passwords > http://www.isg.inf.ethz.ch/docu/security/passwd/crackcluster.html > http://www.cisiar.org/proyectos/cisilia/home_en.php > > 12) Other Applications > > Computing for Genomic Analysis > Genetic programming > "Big Science" problems involving modeling, > simulating > and understanding large complex systems, for example: > cosmology > subatomic physics > climate modeling > Biomedicine and Biochemistry > > > ===== > --------------------------- Research Scholar, VIT, > India. > > ________________________________________________________________________ > Yahoo! India Mobile: Download the latest polyphonic > ringtones. > Go to http://in.mobile.yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sat Nov 22 10:52:13 2003 From: agrajag at dragaera.net (Jag) Date: 22 Nov 2003 10:52:13 -0500 Subject: booting from usb pen drive In-Reply-To: <3FAA831D0001F2C1@mail-1.tiscali.it> References: <3FAA831D0001F2C1@mail-1.tiscali.it> Message-ID: <1069516333.2018.1.camel@loiosh> On Fri, 2003-11-21 at 13:27, p.pennaz at tiscali.it wrote: > Does anyone know if it is a possibility in boot a linux PC system via USB > cartridge? > My usb subsystem is working fine. The short answer is yes. The long answer is, it depends on your BIOS. It's kinda like a few years ago when some systems would boot from CD and some wouldn't. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Sat Nov 22 09:11:27 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Sat, 22 Nov 2003 06:11:27 -0800 Subject: booting from odd sources was Re: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad> Message-ID: <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422> Along the same lines (oddly, I was wondering about just this idea (booting from USB)), one can get an IDE<->Compact Flash adapter for about $20 that mounts right on the motherboard (space permitting). One CAN boot off the CF drive (and you could use sneakernet to get the stuff on the drive in the first place).
Check out http://www.mini-box.com/ As far as CF goes, places like Best Buy are selling 64MB for $35, 256MB for $80-85, but I'm sure a bit of research would turn it up cheaper (I was just looking through the inserts in the morning paper). Smart Media, Memory Sticks, and "secure digital memory" appear to be in the same general price range but I don't know about interfaces. And another question.. has anyone done a network boot off a network adapter attached to the USB port? Or, more to my precise need, has anyone done a network boot over a wireless network adapter of any kind? Do the wireless adapters have the PXE or bootrom capability? Does the bios even allow you to specify the "non-wired to the bus" adapters as a boot device? Is there some fundamental chipset reason why they can't? Jim Lux ----- Original Message ----- From: "Alan Ward" To: Cc: Sent: Saturday, November 22, 2003 2:30 AM Subject: Re: booting from usb pen drive > Yes, but for the time being you need to boot the kernel > off a diskette or network. You also need to hack the > kernel a wee bit. > > I just sent an article on this to linuxgazette.net, and > will keep you posted when it comes out (probably next > month). > > Best regards, > Alan Ward > > > p.pennaz at tiscali.it wrote: > > Does anyone know if it is possible to boot a linux PC system via a USB > > cartridge? > > My usb subsystem is working fine. > > Thank you > > > > __________________________________________________________________ > > Tiscali ADSL WITH NO LINE RENTAL, you pay only when you browse! > > And what's more, the modem is FREE! Subscribe now.
> > http://point.tiscali.it/adsl/index.shtml > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Sat Nov 22 13:37:11 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Sat, 22 Nov 2003 10:37:11 -0800 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> Message-ID: <000601c3b127$a5845560$32a8a8c0@laptop152422> Aiee.. an answer just long enough to really whet our appetites. A bit longer answer please? Which BIOS? Which mobo? How could one tell (without having the mobo sitting in front of you)? This could be a very elegant solution for booting diskless nodes, since virtually every mobo made today has USB interfaces on it, and would save you the hassle of putting CDROM or Floppy drives out there. I'd point out that NOT every mobo out there has PXE or network boot capability, so this is a nice alternative. ----- Original Message ----- From: "Jag" To: Cc: Sent: Saturday, November 22, 2003 7:52 AM Subject: Re: booting from usb pen drive > On Fri, 2003-11-21 at 13:27, p.pennaz at tiscali.it wrote: > > Does anyone know if it is a possibility in boot a linux PC system via USB > > cartridge? > > My usb subsystem is working fine. > > The short answer is yes. The long answer is, it depends on your BIOS. 
> It's kinda like a few years ago when some systems would boot from CD and > some wouldn't. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Nov 22 14:51:46 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 22 Nov 2003 14:51:46 -0500 (EST) Subject: RHEL Copyright Removal In-Reply-To: Message-ID: > - i think, imho, "support" is the most expensive part of the cluster's > TCO and the hardware is relatively inexpensive in comparison .. depends. some components of TCO scale with cluster size (fixing hardware failures, initial hardware cost, power, cooling). others scale with some function of the userbase (clueless ones require more support). but in this case, we're talking about the kind of support offered by dists and OS's: it doesn't scale with cluster size at all, since the cluster is basically a single machine. > - $ 200/server * 50 machines ( $10K ) is still inexpensive compared to > hiring an outsourced "linux support" but that's silly - for the cost of a person, you get a lot more than what's offered by OS/dist support. in summary, RH is doing a sensible thing. there's a market for OS/dists sold to "high-maintenance" users who can't or don't want to learn the details, and don't have someone else to do it. but it's silly to think that that kind of maintenance contract should scale with cluster size. it's also clear that there will continue to be low-maintenance dists (and afaict, that's exactly what Fedora is.) I suppose that in a very indirect way, the cost to support a large cluster does scale with size. that is, if you have 1K CPUs that won't work at all, you should be willing to pay more than $200 for support. $200/machine is silly though, since these days, a node can easily be $2k or less, and 10% is simply too much.
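The arithmetic behind the two positions above can be sketched in a few lines of Python. This is only an illustration: the $200 per-server fee, the 50 machines and the $2k node price are the figures quoted in the thread, while the function names are made up here and not part of any tool.

```python
# Minimal sketch of the per-node support arithmetic discussed in the thread.
# Figures ($200/server, 50 machines, $2k/node) are the ones quoted above;
# the function names are hypothetical.

def total_support_cost(fee_per_node: float, nodes: int) -> float:
    """Total yearly fee if support is charged per node."""
    return fee_per_node * nodes


def support_fraction(fee_per_node: float, node_price: float) -> float:
    """Per-node support fee as a fraction of the node's hardware price."""
    return fee_per_node / node_price


cluster_fee = total_support_cost(200, 50)   # the "$10K" figure
overhead = support_fraction(200, 2000)      # the "10%" figure
print(f"per-node licensing: ${cluster_fee:,.0f} total, "
      f"{overhead:.0%} on top of each node's price")
```

A flat, per-cluster contract would keep the first number constant as the node count grows, which is the point being argued: the support effort for the OS does not grow with the number of identical nodes.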
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tim at otten.co.uk Fri Nov 21 15:59:52 2003 From: tim at otten.co.uk (Tim) Date: Fri, 21 Nov 2003 20:59:52 -0000 Subject: booting from usb pen drive In-Reply-To: <3FAA831D0001F2C1@mail-1.tiscali.it> Message-ID: <20031121205946.MZBN13700.mta05-svc.ntlworld.com@methodman> Depends if your mobo has a boot from usb option. -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of p.pennaz at tiscali.it Sent: 21 November 2003 18:27 To: beowulf at beowulf.org Subject: booting from usb pen drive Does anyone know if it is a possibility in boot a linux PC system via USB cartridge? My usb subsystem is working fine. Thank you __________________________________________________________________ Tiscali ADSL SENZA CANONE, paghi solo quando navighi! E in pi? il modem e' GRATIS! Abbonati subito. http://point.tiscali.it/adsl/index.shtml _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cmwoo at hkucc.hku.hk Fri Nov 21 23:41:33 2003 From: cmwoo at hkucc.hku.hk (Woo Chat Ming) Date: Sat, 22 Nov 2003 12:41:33 +0800 Subject: How : up2date 128 nodes of Redhat 9 ? Message-ID: <3FC7020D@webmaila.hku.hk> Dear beowulf friends, We are a university in Hong Kong and we have a Redhat Linux 9 beowulf cluster consisting of 128 nodes. All of them have real IP address and are connected to the Internet. May I know how can I up2date all those nodes using a single command ? 
Thanks in advance for your information. Regards, Woo Chat Ming. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Nov 22 17:07:57 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 22 Nov 2003 17:07:57 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: <20031121205946.MZBN13700.mta05-svc.ntlworld.com@methodman> Message-ID: > Depends if your mobo has a boot from usb option. imagine that! I wonder how bootable usb-keys work. it would be pretty useless if the bios only had enough smarts to load a bootsector and run it. the bios must at least contain enough of a usb-block driver to let it emulate a floppy disk. if so, I'd expect linux to "just work"... as long as you can somehow get even a bare kernel loaded, you're home free. things like gui bootloaders or even initrd's are just icing ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laurenceliew at yahoo.com.sg Sat Nov 22 18:22:55 2003 From: laurenceliew at yahoo.com.sg (Laurence Liew) Date: Sun, 23 Nov 2003 07:22:55 +0800 Subject: How : up2date 128 nodes of Redhat 9 ? In-Reply-To: <3FC7020D@webmaila.hku.hk> References: <3FC7020D@webmaila.hku.hk> Message-ID: <1069543373.2179.6.camel@scalable> Hi, by strict definition, your 128 nodes are not a beowulf cluster but a NOW (network of workstations). but anyway, u need a batch system, parallel shell or a script to launch up2date with commandline options. If you have sge, pbs or lsf etc installed, u could launch up2date and schedule the updates.. a better method is to explore other means to update only your master node and have your master node push, or your compute nodes pull, the updates... better overall security and manageability.
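The "script to launch up2date" option mentioned above could look roughly like the Python sketch below. It is an assumption-laden illustration, not a tested recipe: it assumes password-less ssh from the head node to every compute node, hostnames of the hypothetical form node001 ... node128, and that `up2date -u` is the update command wanted on each node. All function names are invented for this example.

```python
# Sketch: fan one command out to many nodes over ssh, a few at a time.
# Assumptions (not from the thread): key-based ssh works to every node,
# nodes are named node001..nodeNNN, and the remote command is "up2date -u".
import subprocess
from concurrent.futures import ThreadPoolExecutor


def node_names(count, prefix="node"):
    """Build hypothetical hostnames: node001, node002, ..."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def ssh_command(host, remote_cmd):
    """Argument list for a non-interactive ssh call to one host."""
    return ["ssh", "-o", "BatchMode=yes", host, remote_cmd]


def run_one(host, remote_cmd):
    """Run the command on one host and return (host, exit status)."""
    proc = subprocess.run(ssh_command(host, remote_cmd),
                          capture_output=True, text=True)
    return host, proc.returncode


def run_everywhere(hosts, remote_cmd, runner=run_one, workers=16):
    """Run remote_cmd on all hosts, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda h: runner(h, remote_cmd), hosts))
```

Usage would be something like `status = run_everywhere(node_names(128), "up2date -u")`, followed by a check for hosts whose exit status is non-zero. Limiting the number of workers keeps 128 nodes from hitting the update servers at the same moment; pointing the nodes at a local mirror on the master, as suggested above, avoids that load altogether.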
take a look at Rocks, Scyld, Oscar etc.. these cluster toolkits helps remove many manual tasks of managing a cluster. Cheers! laurence ps. If u are going for ieee cluster 2003 in hong kong, we can meet and discuss further. On Sat, 2003-11-22 at 12:41, Woo Chat Ming wrote: > Dear beowulf friends, > > We are a university in Hong Kong and we have a Redhat Linux 9 > beowulf cluster consisting of 128 nodes. All of them have real > IP address and are connected to the Internet. > May I know how can I up2date all those nodes using a single > command ? > Thanks in advance for your information. > > Regards, > Woo Chat Ming. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Sat Nov 22 20:58:06 2003 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sun, 23 Nov 2003 01:58:06 +0000 (UTC) Subject: [OT] statistical calculations In-Reply-To: Message-ID: This is off-topic for this list, I know; but coming from my background (linguistics) I can't think of a better place to ask. It's probably not the usual size problem list-members deal with, but to me it feels like it. I have to process a group of several thousand acquired datasets, each containing well over one hundred numerical items; and eventually, I'm going to have to work with a statistician to pull some meaningful figures out of it all. In other words, the data have to be massaged in some pretty fancy ways. For various reasons outwith my control this is being done principally via a spreadsheet (wouldn't have been an obvious choice for me, but hey, I only know about words, not numbers). 
Can anyone on this list used to doing this stuff point me towards a GPLed spreadsheet with built-in statistical functions? or an add-in to gnumeric / OpenOffice etc.? (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? Please correct me if I'm barking up a wrong tree here. Any help appreciated, -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Nov 23 02:09:12 2003 From: agrajag at dragaera.net (Jag) Date: 23 Nov 2003 02:09:12 -0500 Subject: How : up2date 128 nodes of Redhat 9 ? In-Reply-To: <3FC7020D@webmaila.hku.hk> References: <3FC7020D@webmaila.hku.hk> Message-ID: <1069571352.2022.18.camel@loiosh> On Fri, 2003-11-21 at 23:41, Woo Chat Ming wrote: > Dear beowulf friends, > > We are a university in Hong Kong and we have a Redhat Linux 9 > beowulf cluster consisting of 128 nodes. All of them have real > IP address and are connected to the Internet. > May I know how can I up2date all those nodes using a single > command ? > Thanks in advance for your information. RHN (https://rhn.redhat.com/) can handle this for you. If you get all your machines registered with RHN, you can log into the RHN webpage, and with a few mouse clicks tell it what packages to update on them. Your machines should be running the rhn client daemon, which will regularly connect to RHN's servers and download the appropriate updates. up2date is a part of RHN. If you're not up for paying RH for this service, I suggest looking into yum (http://linux.duke.edu/projects/yum/). 
With it you can setup a local repository that you update with new updates from Red Hat. You then have a cronjob on all your machines to run yum, which will update them off your local repository. Jag _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Nov 23 02:03:14 2003 From: agrajag at dragaera.net (Jag) Date: 23 Nov 2003 02:03:14 -0500 Subject: booting from usb pen drive In-Reply-To: <000601c3b127$a5845560$32a8a8c0@laptop152422> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> Message-ID: <1069570994.2022.10.camel@loiosh> On Sat, 2003-11-22 at 13:37, Jim Lux wrote: > Aiee.. an answer just long enough to really whet our appetites. > A bit longer answer please? Which BIOS? Which mobo? I don't have any specific machine information as to what does and doesn't support it. > How could one tell > (without having the mobo sitting in front of you)? Check online specs/user guides/feature lists from your manufacturer. Ask your sales rep. Consult the almighty oracle that resides at google.com. And any other method you'd normally use to find out information on specific motherboards and other hardware components. > This could be a very elegant solution for booting diskless nodes, since > virtually every mobo made today has USB interfaces on it, and would save you > the hassle of putting CDROM or Floppy drives out there. I'm not sure I'd be a fan of it. On one hand, you could just have one usb pen drive that you use to boot all the nodes. Nice in theory, but I really don't want to have to touch a slave node just to reboot it. 
Other than that, you'd have a nice usb key sticking out of either the front or rear of all your machines like a sore thumb, and would be quite easy to accidently brush against and break/pull-out/snap-off in your usb port. I have heard of people using usb devices (ipod's) to boot public kiosk machines so that if a machine were cracked into, the real system files couldn't be tampered with, and a reboot would wipe any added back doors. But that's a very different situation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Sun Nov 23 09:46:10 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Sun, 23 Nov 2003 06:46:10 -0800 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> Message-ID: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422> > > How could one tell > > (without having the mobo sitting in front of you)? > > Check online specs/user guides/feature lists from your manufacturer. > Ask your sales rep. Consult the almighty oracle that resides at > google.com. And any other method you'd normally use to find out > information on specific motherboards and other hardware components. Online spec sheets are usually a bit sketchy, and, while one can download the usermanual for the mobo most of the time, it often resorts to some hokey "Press F2 to get the boot selection menu, then press + or - to move the order around, see Figure nn" which you KNOW isn't the BIOS version you're going to get. As for the almighty oracle that is google, isn't that what this list is??? Perhaps one could email the mfr of the mobo and get an answer.. 
> > > This could be a very elegant solution for booting diskless nodes, since > > virtually every mobo made today has USB interfaces on it, and would save you > > the hassle of putting CDROM or Floppy drives out there. > > I'm not sure I'd be a fan of it. On one hand, you could just have one > usb pen drive that you use to boot all the nodes. Nice in theory, but I > really don't want to have to touch a slave node just to reboot it. I was thinking of a USB drive on each and every diskless node, not moving the one drive around. > Other than that, you'd have a nice usb key sticking out of either the > front or rear of all your machines like a sore thumb, and would be quite > easy to accidently brush against and break/pull-out/snap-off in your usb > port. Only if you packaged it that way... Lots of mobos have USB ports that come out to a header and they expect you to put a little adapter dohickey (which can cost as much as the USB drive) to create the USB jack on the front panel. Leave the USB drive inside the case. > > > I have heard of people using usb devices (ipod's) to boot public kiosk > machines so that if a machine were cracked into, the real system files > couldn't be tampered with, and a reboot would wipe any added back doors. > But that's a very different situation. As far as I know, you can't make a USB pod readonly, which is what I'd want for a non-tamperable, non-hackable, backup. Not so different. For what it's worth, this is how they do electronic voting machines, except I think they use Compact Flash. There's an "interesting" story about mass software updates of machines in Georgia over a weekend on the internet. (and you think managing the software configuration of a cluster is a challenge!) In my specific case, I'm looking for a cheap, off the shelf diskless boot solution that is compatible with having only wireless access to the node. 
My application is almost embarassingly parallel (by deliberate design) and the goal is to show that "useful work" can be done with power being the only physical connection to each node. So far, the CF/IDE adapter looks like a winner... > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sun Nov 23 12:19:12 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sun, 23 Nov 2003 17:19:12 +0000 Subject: booting from usb pen drive In-Reply-To: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> <002101c3b1d0$89def4b0$36a8a8c0@laptop152422> Message-ID: <20031123171912.GA533@galactic.demon.co.uk> On Sun, Nov 23, 2003 at 06:46:10AM -0800, Jim Lux wrote: > > I was thinking of a USB drive on each and every diskless node, not moving > the one drive around. > > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be quite > > easy to accidently brush against and break/pull-out/snap-off in your usb > > port. > Only if you packaged it that way... Lots of mobos have USB ports that come > out to a header and they expect you to put a little adapter dohickey (which > can cost as much as the USB drive) to create the USB jack on the front > panel. Leave the USB drive inside the case. Fine if you can. If you can't the smallest 32M USB drive I've just seen is barely big enough to protrude beyond the rear of the case. Another has a neat cable to extend the USB "plug" by about two feet / 60cm. 
Just leave it dangling neatly and run a cable tie round it to tie it to the ethernet cable :) > > > > > > > I have heard of people using usb devices (ipod's) to boot public kiosk > > machines so that if a machine were cracked into, the real system files > > couldn't be tampered with, and a reboot would wipe any added back doors. > > But that's a very different situation. > As far as I know, you can't make a USB pod readonly, which is what I'd want > for a non-tamperable, non-hackable, backup. At least one of those I saw yesterday for round the GBP30 mark had a physical R/W switch. > > Not so different. For what it's worth, this is how they do electronic > voting machines, except I think they use Compact Flash. There's an > "interesting" story about mass software updates of machines in Georgia over > a weekend on the internet. (and you think managing the software > configuration of a cluster is a challenge!) > > In my specific case, I'm looking for a cheap, off the shelf diskless boot > solution that is compatible with having only wireless access to the node. My > application is almost embarassingly parallel (by deliberate design) and the > goal is to show that "useful work" can be done with power being the only > physical connection to each node. So far, the CF/IDE adapter looks like a > winner... > > This is effectively only a form factor converter. CF == IDE if you pull one pin low/high. Pull it whichever way (I can't remember right now :) ) and you can write to it as an IDE disk. Unassert it and it becomes CF and read only :) Google for the Soekris wireless devices / the Openbrick low power low form factor devices used primarily as firewalls and WiFi devices - they do more or less exactly this, as do some of the low power mini-ITX boards. CF doesn't like too many writes but read is forever IIRC. 
HTH, Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Sun Nov 23 16:49:01 2003 From: seth at hogg.org (Simon Hogg) Date: Sun, 23 Nov 2003 21:49:01 +0000 Subject: booting from usb pen drive In-Reply-To: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> Message-ID: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> > > > This could be a very elegant solution for booting diskless nodes, since > > > virtually every mobo made today has USB interfaces on it, and would save >you > > > the hassle of putting CDROM or Floppy drives out there. > > > > I'm not sure I'd be a fan of it. On one hand, you could just have one > > usb pen drive that you use to boot all the nodes. Nice in theory, but I > > really don't want to have to touch a slave node just to reboot it. > >I was thinking of a USB drive on each and every diskless node, not moving >the one drive around. > > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be quite > > easy to accidently brush against and break/pull-out/snap-off in your usb > > port. >Only if you packaged it that way... Lots of mobos have USB ports that come >out to a header and they expect you to put a little adapter dohickey (which >can cost as much as the USB drive) to create the USB jack on the front >panel. Leave the USB drive inside the case. > >In my specific case, I'm looking for a cheap, off the shelf diskless boot >solution that is compatible with having only wireless access to the node. 
My
>application is almost embarrassingly parallel (by deliberate design) and the
>goal is to show that "useful work" can be done with power being the only
>physical connection to each node. So far, the CF/IDE adapter looks like a
>winner...

Forgive my intrusion, but I don't see why this approach is so very different from having a disk (sure, it's a solid state disk, but still, it's kind of a disk). And for all the messing about with trying to install a usb pen drive in each node, why not just stick a CD-ROM drive in it to boot from (apart from size)? At least that's pretty much guaranteed to be read-only.

But on a related note (and I *think* I have seen it on this list before), how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE? It even comes in a 2.5" disk form factor.

URL is at http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for one of them; not sure if there are other developers out there, and I have no idea of cost.

Simn

From lathama at yahoo.com Sun Nov 23 17:27:22 2003
From: lathama at yahoo.com (Andrew Latham)
Date: Sun, 23 Nov 2003 14:27:22 -0800 (PST)
Subject: booting from usb pen drive
In-Reply-To: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net>
Message-ID: <20031123222722.2858.qmail@web60310.mail.yahoo.com>

The idea is that using a 32meg usb memory device to boot a system gives you no moving parts, is cheap, is 3l33t. The WEBASDISK/x is cool but I am assuming that they are not under $100USD.

--- Simon Hogg wrote:
>
> > > > This could be a very elegant solution for booting diskless nodes, since
> > > > virtually every mobo made today has USB interfaces on it, and would save you
> > > > the hassle of putting CDROM or Floppy drives out there.
> > >
> > > I'm not sure I'd be a fan of it.
On one hand, you could just have one > > > usb pen drive that you use to boot all the nodes. Nice in theory, but I > > > really don't want to have to touch a slave node just to reboot it. > > > >I was thinking of a USB drive on each and every diskless node, not moving > >the one drive around. > > > > > Other than that, you'd have a nice usb key sticking out of either the > > > front or rear of all your machines like a sore thumb, and would be quite > > > easy to accidently brush against and break/pull-out/snap-off in your usb > > > port. > >Only if you packaged it that way... Lots of mobos have USB ports that come > >out to a header and they expect you to put a little adapter dohickey (which > >can cost as much as the USB drive) to create the USB jack on the front > >panel. Leave the USB drive inside the case. > > > >In my specific case, I'm looking for a cheap, off the shelf diskless boot > >solution that is compatible with having only wireless access to the node. My > >application is almost embarassingly parallel (by deliberate design) and the > >goal is to show that "useful work" can be done with power being the only > >physical connection to each node. So far, the CF/IDE adapter looks like a > >winner... > > Forgive my intrusion, but I don't see why this approach is so very > different from having a disk (sure, it's a solid state disk, but still, > it's kind of a disk) and for all the messing with trying to install a usb > pen drive in each node, why not just stick a CD-ROM in it to boot from > (apart from size)? At least that's pretty much guaranteed to be read-only. > > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. It > even comes in a 2.5" disk form factor. > > URL is at > http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for > one of them, not sure if there are other developers out there, and I have > no idea of cost. 
> > Simn > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /---------------------------------------------------------------------------------------------------\ Andrew Latham -LathamA - Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a god or the future with which religions are concerned with. Or, if not impossible, at least impossible at the present time. LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com \---------------------------------------------------------------------------------------------------/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Sun Nov 23 18:26:40 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Sun, 23 Nov 2003 15:26:40 -0800 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> Message-ID: <000e01c3b21a$038f4280$36a8a8c0@laptop152422> It may be a virtual disk, but it's not a device with moving parts or one that requires anywhere as much cooling or power as a real disk. It also allows one to "power on boot" the cluster and be up and running relatively quickly, even with a low bandwidth link among nodes (i.e. wireless network), since one doesn't have to load the entire software image over the net. 
There are all manner of weird and wonderful adapters and solid state disk emulators aimed at the industrial market, among others, but I was looking for something very consumer/mass market (read cheap), since this is only going to have to work in a lab environment, albeit, no moving parts and DC supply. ----- Original Message ----- From: "Simon Hogg" To: "Jim Lux" ; "Jag" Cc: > > > >In my specific case, I'm looking for a cheap, off the shelf diskless boot > >solution that is compatible with having only wireless access to the node. My > >application is almost embarassingly parallel (by deliberate design) and the > >goal is to show that "useful work" can be done with power being the only > >physical connection to each node. So far, the CF/IDE adapter looks like a > >winner... > > Forgive my intrusion, but I don't see why this approach is so very > different from having a disk (sure, it's a solid state disk, but still, > it's kind of a disk) and for all the messing with trying to install a usb > pen drive in each node, why not just stick a CD-ROM in it to boot from > (apart from size)? At least that's pretty much guaranteed to be read-only. > > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. It > even comes in a 2.5" disk form factor. > > URL is at > http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for > one of them, not sure if there are other developers out there, and I have > no idea of cost. 
> > Simn > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sun Nov 23 18:49:03 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 23 Nov 2003 18:49:03 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: <1069570994.2022.10.camel@loiosh> Message-ID: > Other than that, you'd have a nice usb key sticking out of either the > front or rear of all your machines like a sore thumb, and would be quite many motherboards have several additional USB ports (as headers) located outside the standard ATX backplate. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sun Nov 23 19:57:50 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sun, 23 Nov 2003 16:57:50 -0800 (PST) Subject: booting from usb pen drive In-Reply-To: Message-ID: <20031124005750.33631.qmail@web60310.mail.yahoo.com> I would also urge the use of a dongle. just a small one to maybe make a custom mount. Crazy thought of the week. What about KVMs that allow the access of shared USB devices!?!? --- Mark Hahn wrote: > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be quite > > many motherboards have several additional USB ports (as headers) > located outside the standard ATX backplate. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /---------------------------------------------------------------------------------------------------\ Andrew Latham -LathamA - Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a god or the future with which religions are concerned with. Or, if not impossible, at least impossible at the present time. LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com \---------------------------------------------------------------------------------------------------/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 23 21:39:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 24 Nov 2003 13:39:41 +1100 Subject: booting from usb pen drive In-Reply-To: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069570994.2022.10.camel@loiosh> <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> Message-ID: <200311241339.49659.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 24 Nov 2003 08:49 am, Simon Hogg wrote: > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. Looking at the website for this it looks like what it actually does is IDE over TCP/IP, rather than the other way around. Still, interesting gadget. 
:-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wW9zO2KABBYQAh8RAtiWAKCBTpUU00OlUzJ5+pJtfefkRUp90wCfT1TO zSvR7BTh/M7r/1tyGsisB4c= =gq0o -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 23 22:17:02 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 24 Nov 2003 14:17:02 +1100 Subject: RHEL Copyright Removal In-Reply-To: <20031121155100.GD8468@northwestern.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> <20031121155100.GD8468@northwestern.edu> Message-ID: <200311241417.03362.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 22 Nov 2003 02:51 am, Jesse Becker wrote: > Actaully, I believe that ROCKS is based on RHEL 2.1 WS. This is only true for the IA64 version of Rocks 3.0.0 (the current version), the release notes say: http://rocks.npaci.edu/rocks-documentation/3.0.0/release-notes.html Based on RedHat 7.3 for x86 and RedHat Advanced Workstation 2.1 for ia64 (all packages recompiled from publicly available source). > I've used it a few times, and parts of it are quite nice. Rocks is pretty cool, we've recently put it on an IA32 cluster owned by one of our member institutions which we manage for them and we've tweaked the installed systems a little (removed OpenPBS and MAUI and put Scalable PBS and the latest MAUI on instead), but that said it just works (for us). YMMV. :-) > The ROCKS guys have automated most of the recompile process, but I don't > know if the automation includes stripping out the RH stuff. 
Not tried the IA64 version (yet), so can't comment on that yet. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wXguO2KABBYQAh8RArPlAJ9topK3mzXCVkAWljRoXxNhEsxS9wCZAWJG 2pjxpWIAx9/Rpjkvh4Dd4E8= =a+pR -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anand at novaglobal.com.sg Mon Nov 24 04:30:01 2003 From: anand at novaglobal.com.sg (Anand Vaidya) Date: Mon, 24 Nov 2003 17:30:01 +0800 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <200311241730.08129.anand@novaglobal.com.sg> You can checkout http://www.whiteboxlinux.org They seem to have successfully produced ISOs from RHEL3 sources. The dist is at RC1 now. Regards, Anand On Friday 21 November 2003 23:35, neil.brown at syngenta.com wrote: > Hi all, > > We're having a bit of a dilemma here, as I'm sure many others are, about > what to use as our standard Linux distro with the end of life of the Red > Hat family. RHEL or SLES are looking favourites in terms of supportability, > but of course there's the not insignificant problem of cost. The thought of > having to pay at least $179 per server, with around 50 compute nodes, along > with various other non-beowulf Linux servers doesn't appeal. > > I've been trying to find out how much effort it takes to strip the RH > copyrighted bits out of RHEL and compile it for our own use and whether > doing so reduces it's functionality a great deal. 
I've trawled the web and > usenet, but not found much to write home about on the subject. > > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not > sure whether it's ES or AS) so it surely can't be that bad as a cluster > oriented distro. > > Thanks for any suggestions, > Neil > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From neil.brown at syngenta.com Mon Nov 24 03:53:17 2003 From: neil.brown at syngenta.com (neil.brown at syngenta.com) Date: Mon, 24 Nov 2003 08:53:17 -0000 Subject: booting from usb pen drive Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F7@ukjhmbx12.ukjh.zeneca.com> > -----Original Message----- > From: Jim Lux [mailto:james.p.lux at jpl.nasa.gov] > Sent: 22 November 2003 18:37 > To: Jag > Cc: beowulf at beowulf.org > Subject: Re: booting from usb pen drive > > > Aiee.. an answer just long enough to really whet our appetites. > A bit longer answer please? Which BIOS? Which mobo? How > could one tell > (without having the mobo sitting in front of you)? > This could be a very elegant solution for booting diskless > nodes, since > virtually every mobo made today has USB interfaces on it, and > would save you > the hassle of putting CDROM or Floppy drives out there. I'd > point out that > NOT every mobo out there has PXE or network boot capability, > so this is a > nice alternative. Not sure about specific BIOS/Mobo models, you'd probably need to look at their specs on the respective manufacturers web sites, but Dell PC's have had this functionality built in for a while now. 
Ford made a big effort to get rid of floppy disk drives and use USB to boot their PCs when they needed to be rebuilt with their standard Ghost image (albeit this was most probably Windoze). See http://tinyurl.com/wajo for more about that.

My HP desktop PC that I'm writing this on also has support for USB boot, as, I imagine, do most modern desktop PCs. As for 1U server motherboards, I'm not so sure, although again I'd imagine that most newish boards would have this capability.

As you say, not every mobo has PXE capability, and USB boot would certainly be a nice alternative to floppy booting in these cases. However, I think that if a mobo doesn't have PXE boot capability, it's not likely to have USB boot support either. Given the choice of the two, I'd go for PXE boot in a cluster computing environment unless I was doing a "one off" sort of thing where it wasn't worth the effort on the server side of the PXE boot.

Just my tuppence worth,
Neil

From neil.brown at syngenta.com Mon Nov 24 04:06:36 2003
From: neil.brown at syngenta.com (neil.brown at syngenta.com)
Date: Mon, 24 Nov 2003 09:06:36 -0000
Subject: RHEL Copyright Removal
Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com>

> Actually, I believe that ROCKS is based on RHEL 2.1 WS. I've used it a few
> times, and parts of it are quite nice. The ROCKS guys have automated
> most of the recompile process, but I don't know if the automation includes
> stripping out the RH stuff.
>
> --
> Jesse Becker
> GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2

Thanks everyone for your replies on this topic.
I think part of our problem is that we're ideally looking for a standard distro that we can use on our Linux servers and desktop PC's as well as on our cluster. This would be nice, as it'd make administration easier with the commonality between Linux boxes. Perhaps this isn't the best way of doing it though. I'm beginning to think that maybe something like Fedora would be good for the cluster. I've had a play with it and it seems VERY similar to RH9. The fast paced release cycle wouldn't be so bad for the cluster, as it's easy to rebuild and we wouldn't need to upgrade EVERY time a new Fedora release came out. For the other servers, we often run Oracle and we really need to run a supported distro. The problem is, about the only supported Linux distro's later than RH7.1 are "paid for" ones like RHEL and SLES. They do support UnitedLinux too though. What would be nice is if there was a free Linux distro based on UnitedLinux. I've looked at cAos before. Looks good, I'd like to try it when a release becomes available. Not heard of White Box before, but I'll have a look at it. Thanks again, Neil _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 24 07:13:57 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 24 Nov 2003 07:13:57 -0500 (EST) Subject: [OT] statistical calculations In-Reply-To: Message-ID: On Sun, 23 Nov 2003, Martin WHEELER wrote: > This is off-topic for this list, I know; but coming from my background > (linguistics) I can't think of a better place to ask. > It's probably not the usual size problem list-members deal with, but to > me it feels like it. 
> > I have to process a group of several thousand acquired datasets, each > containing well over one hundred numerical items; and eventually, I'm > going to have to work with a statistician to pull some meaningful > figures out of it all. > In other words, the data have to be massaged in some pretty fancy ways. > > For various reasons outwith my control this is being done principally > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > I only know about words, not numbers). Can anyone on this list used to > doing this stuff point me towards a GPLed spreadsheet with built-in > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > Please correct me if I'm barking up a wrong tree here. Ask on the GSL (Gnu Scientific Library) list. There have been mentions on the list of people wrapping/encapsulating list functions in various ways, but I can't remember offhand if any of them were inside a spreadsheet per se. It also depends to some extent on what you mean by "built in statistical functions" -- GSL has the basic functions but is not a package like R. Which is the second thing you should probably look at on: www.r-project.org. R is a full-service stats suite with a variety of interfaces including web -- hopefully somebody has wrapped it up into a spreadsheet of some sort. rgb > > Any help appreciated, > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 24 08:20:28 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 24 Nov 2003 14:20:28 +0100 Subject: booting from usb pen drive In-Reply-To: <1069570994.2022.10.camel@loiosh> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> Message-ID: <1069680028.1218.5.camel@penguin> On Sun, 2003-11-23 at 08:03, Jag wrote: > > > This could be a very elegant solution for booting diskless nodes, since > > virtually every mobo made today has USB interfaces on it, and would save you > > the hassle of putting CDROM or Floppy drives out there. > > I'm not sure I'd be a fan of it. On one hand, you could just have one > usb pen drive that you use to boot all the nodes. Nice in theory, but I > really don't want to have to touch a slave node just to reboot it. > Other than that, you'd have a nice usb key sticking out of either the > front or rear of all your machines like a sore thumb, and would be quite > easy to accidently brush against and break/pull-out/snap-off in your usb > port. > I'll be happy to help anyone who wants to get Stresslinux running. Also another potential use would be for BIOS updates to nodes without floppies. Yes, I know it is just as easy to have a USB floppy drive attached. But the USB keychain things are just so portable. 

From john.hearns at clustervision.com Mon Nov 24 08:17:46 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Mon, 24 Nov 2003 14:17:46 +0100
Subject: booting from odd sources was Re: booting from usb pen drive
In-Reply-To: <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad> <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422>
Message-ID: <1069679866.1218.2.camel@penguin>

On Sat, 2003-11-22 at 15:11, Jim Lux wrote:
> Along the same lines (oddly, I was wondering about just this idea (booting
> from USB)), one can get an IDE<>compact flash adapter for about $20 that
> mounts right on the motherboard (space permitting). One CAN boot off the CF
> drive (and you could use sneakernet to get the stuff on the drive in the
> first place).

I have booted the mini-ITX boards off Compact Flash and USB. It's quite easy. The secret, though, with the M1000 board is to completely power it off first.

I have booted Tyan boards with a USB stick having Stresslinux on it.
http://www.stresslinux.org
Good to have in your toolkit - does CPU burn, memtest, Bonnie++, lm_sensors

From jeffrey.b.layton at lmco.com Mon Nov 24 09:07:52 2003
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Mon, 24 Nov 2003 09:07:52 -0500
Subject: booting from usb pen drive
In-Reply-To: <1069680028.1218.5.camel@penguin>
References: <1069680028.1218.5.camel@penguin>
Message-ID: <3FC210B8.60503@lmco.com>

Good morning!

A friend of mine and I have been talking about this type of thing for about a year now.
Our idea was to put a base install on a CF card and boot from it. Prices on CF aren't too bad until you get to the high end (yes I know hard drives are cheaper) and with some cluster distributions, you only need 128 Megs and you would have plenty of space (may be able to get that down to 64 megs). Our goal behind using CF cards was to eliminate hard drives from the nodes as a possible source of downtime. One neat little gizmo we found is a 7-in-1 reader which can also handle floppies: http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=170109 It's a little pricey, but you can have pretty much whatever solid-state media is out there. It fits into a floppy bay. As others have pointed out, if your motherboard can boot off USB then this should work. Enjoy! Jeff > On Sun, 2003-11-23 at 08:03, Jag wrote: > > > > > > This could be a very elegant solution for booting diskless nodes, > since > > > virtually every mobo made today has USB interfaces on it, and > would save you > > > the hassle of putting CDROM or Floppy drives out there. > > > > I'm not sure I'd be a fan of it. On one hand, you could just have one > > usb pen drive that you use to boot all the nodes. Nice in theory, > but I > > really don't want to have to touch a slave node just to reboot it. > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be > quite > > easy to accidently brush against and break/pull-out/snap-off in your > usb > > port. > > > -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Mon Nov 24 09:29:18 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Mon, 24 Nov 2003 09:29:18 -0500 Subject: [OT] statistical calculations In-Reply-To: <200311241217.hAOCHMS31645@NewBlue.scyld.com> References: <200311241217.hAOCHMS31645@NewBlue.scyld.com> Message-ID: <20031124142918.GA52661@piskorski.com> > From: "Robert G. Brown" > To: Martin WHEELER > On Sun, 23 Nov 2003, Martin WHEELER wrote: > > I have to process a group of several thousand acquired datasets, each > > containing well over one hundred numerical items; and eventually, I'm > > going to have to work with a statistician to pull some meaningful > > figures out of it all. > > In other words, the data have to be massaged in some pretty fancy ways. > > > > For various reasons outwith my control this is being done principally > > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > > I only know about words, not numbers). Can anyone on this list used to > > doing this stuff point me towards a GPLed spreadsheet with built-in > > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > > Please correct me if I'm barking up a wrong tree here. > Ask on the GSL (Gnu Scientific Library) list. There have been mentions > on the list of people wrapping/encapsulating list functions in various > ways, but I can't remember offhand if any of them were inside a > spreadsheet per se. It also depends to some extent on what you mean by > "built in statistical functions" -- GSL has the basic functions but is > not a package like R. 
Which is the second thing you should probably
> look at on: www.r-project.org. R is a full-service stats suite with a
> variety of interfaces including web -- hopefully somebody has wrapped it
> up into a spreadsheet of some sort.

Martin, R should definitely do whatever statistical stuff you want. There is also an R plugin for the Gnumeric spreadsheet, and some stuff to let MS Excel call R. I've never tried either of those plugins, but they might be good if you don't want to use R directly:

http://www.omegahat.org/RGnumeric/

For general vendor data clean-up and conversion issues, well, that depends. :) You didn't say enough for me to know whether you need to worry about that or not, but most of the vendor data I've seen (not in linguistics) has needed cleanup of some sort!

In my own line of work, for that sort of thing (which means for financial/market data), I mostly write Tcl code to read and manipulate the files, shove all the data into an RDBMS like Oracle or PostgreSQL, then sometimes do additional processing in the database. This works well, but if you're not already using an RDBMS you probably do NOT want to get into that just for this one application.

Most likely, as long as your data all fits (or almost fits?) into RAM, and you don't need the many-readers many-writers (concurrency, atomicity, etc.) support that a real RDBMS provides, stuffing all your data into R's built-in matrix or dataframe types should be fine. Depending on what the vendor files look like to begin with, you may want to pre-process them a bit with a Tcl, Perl, Python, or whatever script first to make them easier to get into R via R's read.table() function.
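As a rough illustration of the kind of pre-processing pass described above, here is a minimal Python sketch. It assumes comma-delimited vendor files with a header row, '#' comment lines, stray whitespace and the occasional short row. None of that comes from the thread, and the function names clean_rows and to_tsv are hypothetical, so adjust the rules to whatever the real files look like.

```python
import csv
import io


def clean_rows(lines, delimiter=",", n_fields=None):
    """Yield cleaned rows from messy delimited text (assumed layout, see above).

    Skips blank lines and '#' comment lines, strips whitespace from every
    field, replaces empty fields with "NA", and drops rows whose field
    count differs from the first kept row (normally the header).
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if not row or all(not field.strip() for field in row):
            continue  # blank line
        if row[0].lstrip().startswith("#"):
            continue  # comment line
        fields = [field.strip() or "NA" for field in row]
        if n_fields is None:
            n_fields = len(fields)  # first kept row fixes the expected width
        if len(fields) != n_fields:
            continue  # malformed row: wrong number of fields
        yield fields


def to_tsv(lines, **kwargs):
    """Return the cleaned rows as one tab-separated string."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(clean_rows(lines, **kwargs))
    return out.getvalue()


if __name__ == "__main__":
    raw = [
        "id, score ,group\n",
        "\n",
        "# vendor comment\n",
        "1, 3.5 ,a\n",
        "2,,b\n",
        "3,4.0\n",  # short row, dropped
    ]
    print(to_tsv(raw), end="")
```

Written to a file (say clean.tsv, a made-up name), the result should load with read.table("clean.tsv", header=TRUE, sep="\t"); "NA" is already read.table's default missing-value string.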
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 24 09:12:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 24 Nov 2003 15:12:15 +0100 Subject: booting from usb pen drive In-Reply-To: <3FC210B8.60503@lmco.com> References: <1069680028.1218.5.camel@penguin> <3FC210B8.60503@lmco.com> Message-ID: <1069683134.1218.18.camel@penguin> On Mon, 2003-11-24 at 15:07, Jeff Layton wrote: > Good morning! > > A friend of mine and I have been talking about this type > of thing for about a year now. Our idea was to put a base > install on a CF card and boot from it. All you need is a CF to IDE adapter. Google will find plenty, eg. http://www.cfide.co.uk/compact_flash_ide_adapters.shtml The Compact Flash card then plugs straight on an IDE cable. If you put a Linux image on the CF card the machine will boot it just the same as from a hard disk. (Of course I mean you read from the CF and boot to RAM) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 24 09:22:08 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 24 Nov 2003 09:22:08 -0500 Subject: booting from usb pen drive In-Reply-To: <1069683134.1218.18.camel@penguin> References: <1069683134.1218.18.camel@penguin> Message-ID: <3FC21410.3050908@lmco.com> John Hearns wrote: > On Mon, 2003-11-24 at 15:07, Jeff Layton wrote: > > Good morning! > > > > A friend of mine and I have been talking about this type > > of thing for about a year now. Our idea was to put a base > > install on a CF card and boot from it. > > All you need is a CF to IDE adapter. 
> Google will find plenty, eg. > http://www.cfide.co.uk/compact_flash_ide_adapters.shtml > The Compact Flash card then plugs straight on an IDE cable. > If you put a Linux image on the CF card the machine will boot it just > the same as from a hard disk. > (Of course I mean you read from the CF and boot to RAM) > We were thinking of actually booting and running off the CF card with it being mounted as RO. We would have to move certain things to a RAM disk such as parts of /var, /dev (probably), and a few others. However, using it with something like Warewulf which has already solved most of the details would be really neat. I'll have to think about trying this one. Thanks for the pointer! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laurenceliew at yahoo.com.sg Mon Nov 24 09:01:30 2003 From: laurenceliew at yahoo.com.sg (Laurence Liew) Date: Mon, 24 Nov 2003 22:01:30 +0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> Message-ID: <1069682488.2179.127.camel@scalable> Hi all, RedHat have annouced academic pricing at USD25 per desktop (RHEL WS based) and USD50 for Academic server (RHEL ES based) a week or so ago. >Raleigh, N.C.-based Red Hat, the top seller of the open-source >operating system, will sell students its Red Hat Academic Desktop >product for $25 and sell schools its Red Hat Academic Server product >for $50, including online software updates but no telephone support. >The products will be offered first in the United States, but will be >available internationally by the end of the year, said John Young, vice >president of marketing. 
I have been building clusters for 5 - 6 years for various customers, and have seen the arrival and disappearance of distros and cluster distros.... The cluster community has done very well and today, large commercial organisations are adopting linux clusters as one of the tools they use to solve their complex problems. But I find this talk of "stripping" RHEL copyright to create yet another distro to be counterproductive as linux beowulf clusters go into commercial mainstream computing.... where customers have specific support demands. (And yes... commercial customers WILL PAY the full list price of RHEL to build a cluster). Now... I believe the USD25 and USD50 are acceptable pricing for the value that RHEL + RHN brings to the customer (academic). The cost of the OS is a small fraction of the total value of the cluster. Most of our users want a stable and supported OS, but more importantly, most of them run commercial software of one form or another... and this means that these 3rd party ISV packages are most likely to be certified on RHEL. It would do me no good if I build a cluster with a "RHEL with copyright removed" or a fedora core as my customers would not be able to get support for their Ansys, Fluent, Matlab and so on and so forth... yes technically they can be the same.. but the commercial support matrix says otherwise. BTW ROCKS V3 is based on RHEL 3.0 WS... With the new RHEL academic pricing model, I would encourage all to go for the academic pricing for RHEL and focus on the real problem on hand which is building better cluster systems on top of a commercial quality, robust and supported OS, rather than try to roll-your-own distro.. and support updates etc etc... Linux has enough distros already. What we should be concentrating on is to create more value on top of existing distros such as RHEL... create better cluster toolkits like what the Rocks and Oscar guys are doing, or improve on Ganglia, PVFS, distributed shared mem, checkpointing etc....
or focus on getting your apps to run faster... There are a lot of cluster problems that need to be addressed and I believe the community would benefit more if we focus on these issues rather than another distro.... let Redhat make what they deserve, let them continue to engage the ISVs and get them to certify and support RHEL... the wider the base of ISVs running on RHEL.. the faster and wider the adoption of Linux not only in the schools but also in the enterprises. if the community continues to fork a project just because it charges some $$$$, our progress would be very slow.... Redhat have listened to the customer and partners and have created an academic pricing model for cluster builders... so we should accept that and move on. today the linux market is anchored by Redhat and a few other linux vendors... imagine if Redhat were to become unprofitable and close shop.... the impact would be tremendous. yes.. there will always be another linux company that will try to take over Redhat's position in the market..., but the credibility of the linux community and the opensource business model would be thrown into disarray and you will see droves of commercial ISVs abandoning linux and moving back to UNIX and Windows.... where would that leave us? without commercial apps, linux would never sustain and grow in the commercial arena. cheers! laurence On Mon, 2003-11-24 at 17:06, neil.brown at syngenta.com wrote: > > > Actually, I believe that ROCKS is based on RHEL 2.1 WS. I've > > used it a few > > times, and parts of it are quite nice. The ROCKS guys have automated > > most of the recompile process, but I don't know if the > > automation includes > > stripping out the RH stuff. > > > > -- > > Jesse Becker > > GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2 > > Thanks everyone for your replies on this topic.
> > I think part of our problem is that we're ideally looking for a standard > distro that we can use on our Linux servers and desktop PC's as well as on > our cluster. This would be nice, as it'd make administration easier with the > commonality between Linux boxes. Perhaps this isn't the best way of doing it > though. I'm beginning to think that maybe something like Fedora would be > good for the cluster. I've had a play with it and it seems VERY similar to > RH9. The fast paced release cycle wouldn't be so bad for the cluster, as > it's easy to rebuild and we wouldn't need to upgrade EVERY time a new Fedora > release came out. > > For the other servers, we often run Oracle and we really need to run a > supported distro. The problem is, about the only supported Linux distro's > later than RH7.1 are "paid for" ones like RHEL and SLES. They do support > UnitedLinux too though. What would be nice is if there was a free Linux > distro based on UnitedLinux. > > I've looked at cAos before. Looks good, I'd like to try it when a release > becomes available. Not heard of White Box before, but I'll have a look at > it. > > Thanks again, > Neil > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From azubrow at galton.uchicago.edu Mon Nov 24 11:38:19 2003 From: azubrow at galton.uchicago.edu (Alexis Zubrow) Date: Mon, 24 Nov 2003 10:38:19 -0600 (CST) Subject: [OT] statistical calculations In-Reply-To: Message-ID: Martin- A related possibility is to use some sort of database. You might be able to "easily" translate the original datasets into one of the SQL based database formats. 
If you can do that, I know that some of them can be accessed via python or R, which will give you a much larger suite of computational possibilities. One database that I've tried out is mySQL: http://www.mysql.com I know that this can be accessed via python and R, as well as a bunch of other programming languages. Though it doesn't sound like you want or need to parallelize this, both python and R have wrappers around MPI code. Best, Alexis > > For various reasons outwith my control this is being done principally > > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > > I only know about words, not numbers). Can anyone on this list used to > > doing this stuff point me towards a GPLed spreadsheet with built-in > > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > > Please correct me if I'm barking up a wrong tree here. > > Ask on the GSL (Gnu Scientific Library) list. There have been mentions > on the list of people wrapping/encapsulating list functions in various > ways, but I can't remember offhand if any of them were inside a > spreadsheet per se. It also depends to some extent on what you mean by > "built in statistical functions" -- GSL has the basic functions but is > not a package like R. Which is the second thing you should probably > look at on: www.r-project.org. R is a full-service stats suite with a > variety of interfaces including web -- hopefully somebody has wrapped it > up into a spreadsheet of some sort. 
> > rgb > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Mon Nov 24 13:29:13 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Mon, 24 Nov 2003 13:29:13 -0500 Subject: [OT] statistical calculations In-Reply-To: <200311241702.hAOH2IS02987@NewBlue.scyld.com> References: <200311241702.hAOH2IS02987@NewBlue.scyld.com> Message-ID: <20031124182913.GA12259@piskorski.com> On Mon, Nov 24, 2003 at 12:02:18PM -0500, beowulf-request at scyld.com wrote: > A related possibility is to use some sort of database. You might be able Yes indeed. > computational possibilities. One database that I've tried out is mySQL: > http://www.mysql.com This is getting way, way off topic for this list, but as someone who's done a lot of database programming, I feel compelled to point out that, generally speaking, you should never, ever use MySQL for anything important unless you BOTH: 1. Have very specific technical requirements which you have assured yourself MySQL is capable of meeting. (This will be many fewer applications than you might think.) 2. Have specific reasons why MySQL is a better choice for you than any other database. (E.g., you are really cheap, and can find a shared hosting service offering MySQL cheaper than one offering PostgreSQL.) There are many, many reasons why MySQL is usually a poor choice for database applications, but if you care, here are two links to get you started: http://openacs.org/philosophy/why-not-mysql.html http://sql-info.de/mysql/gotchas.html But if you don't want to worry about any of that, the answer is simple: just use PostgreSQL instead. (Or perhaps Firebird or SAPdb; but PostgreSQL would be my first choice in any open source database.)
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netmeister.org Mon Nov 24 09:53:47 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Mon, 24 Nov 2003 09:53:47 -0500 Subject: booting from odd sources was Re: booting from usb pen drive In-Reply-To: <1069679866.1218.2.camel@penguin> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad> <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422> <1069679866.1218.2.camel@penguin> Message-ID: <20031124145347.GA15355@netmeister.org> John Hearns wrote: > I have booted Tyan boards with a USB stick having Stresslinux on it. > http://www.stresslinux.org `` Hey, it worked ! The SSL/TLS-aware Apache webserver was successfully installed on this website. If you can see this page, then the people who own this website have just installed the Apache Web server software and the Apache Interface to OpenSSL (mod_ssl) successfully. They now have to add content to this directory and replace this placeholder page, or else point the server at their real content. [...]'' Hehe. -Jan -- 'I have reached an age where my main purpose is not to receive messages.' --- Umberto Eco, quoted in the New Yorker
From mp00aa at cosc.brocku.ca Sun Nov 23 18:25:49 2003 From: mp00aa at cosc.brocku.ca (Matthew Timothy Pratola) Date: Sun, 23 Nov 2003 18:25:49 -0500 Subject: [OT] statistical calculations (Martin WHEELER) In-Reply-To: <200311231701.hANH1CS16397@NewBlue.scyld.com> References: <200311231701.hANH1CS16397@NewBlue.scyld.com> Message-ID: Hello Martin, The primary open source package used in statistical analyses is R, which you can find at www.r-project.org. R is an OSS implementation of the S language, which is also the basis for the commercial package S-Plus. A quick search of "using R in gnumeric" gives the following link: http://www.omegahat.org/RGnumeric/Docs/introduction.pdf which may be helpful. I don't know if any of the other spreadsheet programs have an R plugin written; I would suspect not. At any rate, if the data you are working with will require some pretty fancy approaches, I'd be pretty surprised if any spreadsheet program (i.e. without said plugin) were able to do anything half-decent with any degree of reliability, especially for large datasets. Anyhow, to keep this slightly on-topic, in a recent conversation with someone from the R project, I was told that there is maybe a rough, non-widely distributed implementation of MPI in R, which I think would be nice, but currently searching R and MPI on google does not yield much. Actually the person I spoke to gave me a name to search for, but I don't have that information in front of me right now... -Matt ps - I'm a starving grad student just heading home for xmas vacation, so I don't have a lot to do for the next 3 weeks if you are looking for some short-term R coding work to be done... ....................................................................... Matthew T.
Pratola http://zynec.homelinux.net mtpratol _at_ cs.sfu.ca Home: 604.899.8845 Office: 604.291.4983 Department of Statistics and Actuarial Science, Simon Fraser University ....................................................................... > I have to process a group of several thousand acquired datasets, each > containing well over one hundred numerical items; and eventually, I'm > going to have to work with a statistician to pull some meaningful > figures out of it all. > In other words, the data have to be massaged in some pretty fancy ways. > > For various reasons outwith my control this is being done principally > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > I only know about words, not numbers). Can anyone on this list used to > doing this stuff point me towards a GPLed spreadsheet with built-in > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > Please correct me if I'm barking up a wrong tree here. > > Any help appreciated, > -- > Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England > mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ > GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB > - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From r.grenyer at imperial.ac.uk Mon Nov 24 10:53:29 2003 From: r.grenyer at imperial.ac.uk (Rich Grenyer) Date: Mon, 24 Nov 2003 15:53:29 +0000 Subject: [OT] statistical calculations In-Reply-To: <20031124142918.GA52661@piskorski.com> Message-ID: <58C0AF58-1E96-11D8-8E20-003065F0ED32@imperial.ac.uk> Likewise, as a heavy-ish R user, I'd say go look, immediately. 
R is a stunning piece of software anyway, but I suspect the level of interface between most major language packages (as the previous poster said, it talks both ways to Tcl, Perl and Python to name but a few) and database implementations alone would make it your first stop. *Most* statisticians would love you for it, too. Rich On Monday, Nov 24, 2003, at 14:29 Europe/London, Andrew Piskorski wrote: >> From: "Robert G. Brown" >> To: Martin WHEELER > >> On Sun, 23 Nov 2003, Martin WHEELER wrote: >>> I have to process a group of several thousand acquired datasets, each >>> containing well over one hundred numerical items; and eventually, I'm >>> going to have to work with a statistician to pull some meaningful >>> figures out of it all. >>> In other words, the data have to be massaged in some pretty fancy >>> ways. >>> >>> For various reasons outwith my control this is being done principally >>> via a spreadsheet (wouldn't have been an obvious choice for me, but >>> hey, >>> I only know about words, not numbers). Can anyone on this list used >>> to >>> doing this stuff point me towards a GPLed spreadsheet with built-in >>> statistical functions? or an add-in to gnumeric / OpenOffice etc.? >>> (I believe such exist.) Or maybe a library of GPLed spreadsheet >>> macros? >>> Please correct me if I'm barking up a wrong tree here. > >> Ask on the GSL (Gnu Scientific Library) list. There have been >> mentions >> on the list of people wrapping/encapsulating list functions in various >> ways, but I can't remember offhand if any of them were inside a >> spreadsheet per se. It also depends to some extent on what you mean >> by >> "built in statistical functions" -- GSL has the basic functions but is >> not a package like R. Which is the second thing you should probably >> look at on: www.r-project.org. R is a full-service stats suite with a >> variety of interfaces including web -- hopefully somebody has wrapped >> it >> up into a spreadsheet of some sort. 
> > Martin, R should definitely do whatever statistical stuff you want. > There is also an R plugin for the Gnumeric spreadsheet, and some stuff > to let MS Excel call R. I've never tried either of those plugins, but > they might be good if you don't want to use R directly: > > http://www.omegahat.org/RGnumeric/ > > For general vendor data clean-up and conversion issues, well, that > depends. :) You didn't say enough for me to know whether you need to > worry about that or not, but most of the vendor data I've seen (not in > linguistics) has always needed cleanup of some sort! > > In my own line of work, for that sort of thing (which means for > financial/market data), I mostly write Tcl code to read and manipulate > the files, shove all the data into an RDBMS like Oracle or PostgreSQL, > then sometimes do additional processing in the database. This works > well, but if you're not already using an RDBMS you probably should NOT > want to get into that for just for this one application. > > Most likely, as long as your data all fits (or almost fits?) into RAM, > and you don't need the many-readers many-writers (concurrency, > atomicity, etc.) support that a real RDBMS provides, stuffing all your > data into a R's built in matrix or dataframe types should be fine. > Depending on what the vendor files look like to begin with, you may > want to pre-process them a bit with a Tcl, Perl, Python, or whatever > script first to make them easier to get into R via R's read.table() > function. 
> > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From william.mandra at us.army.mil Sun Nov 23 22:36:01 2003 From: william.mandra at us.army.mil (William J Mandra) Date: Sun, 23 Nov 2003 22:36:01 -0500 Subject: Need a little help getting started Message-ID: Hello all. I am new to this list and apologize in advance if any of the questions that I have are silly, but here it goes. I am in the design phase of a cluster and I am having some trouble figuring out which software packages to use. The cluster will originally consist of 12 nodes linked via 100BaseT switched ethernet and a cluster controller. The following are some of my requirements: 1. All nodes netboot off of the cluster controller 2. automatic process migration and load balancing (openMOSIX) 3. distributed shared memory The cluster controller will be connected to both the main network and the private cluster network and I would like to be able to start applications on the cluster remotely via the cluster controller. I have been doing an exhaustive amount of research on all of the different software available to accomplish this, but I have fallen short in figuring out which ones will work together. I am planning on using Red Hat 9 on all of the nodes in the cluster. I just need a little more information to give me that push in the right direction. I do have some time, as I am not planning to start building the cluster until March or April.
Thanks in advance, William Mandra _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Mon Nov 24 15:04:55 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Mon, 24 Nov 2003 12:04:55 -0800 Subject: [OT] statistical calculations (Martin WHEELER) In-Reply-To: (Matthew Timothy Pratola's message of "Sun, 23 Nov 2003 18:25:49 -0500") References: <200311231701.hANH1CS16397@NewBlue.scyld.com> Message-ID: <851xrxh6s8.fsf@blindglobe.net> Matthew Timothy Pratola writes: > Anyhow, to keep this slightly on-topic, in a recent conversation with > someone from the R project, i was told that there is maybe a rough, > non-widely distributed implementation of MPI in R, which i think would be > nice, but currently searching R and MPI on google does not yield much. > Actually the person i spoke to gave me a name to search for, but i don't > have that information in front of me right now... Look for Rmpi. I believe it's in the contrib non-current directory on CRAN. It works with LAM-MPI, though we've talked about extending it to MPICH. If interested in programming statistical calculations on a beowulf, one might consider SNOW, which is an R library which provides a higher (but simpler) level implementation (independent of PVM or MPI -- will even use socket-based communication on a cluster if you don't have it), and integrates transparently with SPRNG (the scalable parallel RNG). See http://www.analytics.washington.edu/~rossini/courses/cph-statcomp/ and Lecture/Lab 4 for description/issues in interactively computing statistical quantities on a computational cluster (for statisticians who don't want to figure out communication, and just want to get results faster). 
(I'm biased in my view -- we wrote the wrappers to PVM and SPRNG for R, as well as contributed to SNOW, and just need to extend the current set of MPI wrappers). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Mon Nov 24 16:51:06 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 16:51:06 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: Message-ID: On Sat, 22 Nov 2003, Mark Hahn wrote: > > Depends if your mobo has a boot from usb option. It's slightly more complex than that: only some (many, but not all) USB memory devices are usable as boot media. The Intel-branded Itanium-2 (I2) machines can boot from USB devices.
Intel might be the best source for a list of usable USB boot devices. The I2 might be the only interesting case for USB booting: an I2 kernel can't even come close to fitting in 1.44 or 2.88 MB! > I wonder how bootable usb-keys work. it would be pretty useless > if the bios only had enough smarts to load a bootsector and run it. > the bios must at least contain enough of a usb-block driver to let > it emulate a floppy disk. if so, I'd expect linux to "just work"... We've been doing this for years with Scyld BeoBoot: use the BIOS to load both the kernel and an ramdisk '/'. The now-standard Linux approach is loading an "initrd", which accomplishes the same thing with a slightly different environment. The advantage here is that the kernel doesn't require USB support built-in, or any USB support at all! Everything needed from the boot media is loaded into memory by the boot ROM + BIOS. But bottom line is that booting is no longer a hotly-debated cluster issue. Essentially every current system has PXE network booting. Approaches such as BeoBoot stage 1 or USB booting are only needed for legacy machines. With x86 machines you can use PXE to do BIOS updates, hardware diagnostics, or boot the machine as a cluster node, all without touching the hardware. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Nov 24 17:13:53 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 17:13:53 -0500 (EST) Subject: Need a little help getting started In-Reply-To: Message-ID: On Sun, 23 Nov 2003, William J Mandra wrote: > Hello all. 
I am new to this list and apologize in advance if any of the > questions that I have are silly but here it goes. I am in the design phase > of a cluster and I am having some trouble figuring out which software > packages to use. The cluster will originally consist of 12 nodes linked via > 100BaseT switched ethernet and a cluster controller. The following are some > of my requirements: > 1. All nodes netboot off of the cluster controller > 2. automatic process migration and load balancing (openMOSIX) Do you require transparent process migration at run-time (e.g. Mosix), which imposes significant overhead, or will directed process migration work? > 3. distributed shared memory Ahhh, you have control of your application, which implies that you likely won't benefit from transparent process migration. There are several Distributed Shared Memory (DSM) systems, with different design tradeoffs. Since it's very easy to thrash a DSM system, you should select one that matches your application's needs and then carefully tune your application. You should treat the DSM system exactly the same as MPI or the message-passing subsystem of PVM: a library that fits with the rest of the system, not the piece around which everything else revolves. > The cluster controller will be connected to both the main network and the > private cluster network and I would like to be able to start applications on > the cluster remotely via the cluster controller. That's a normal configuration. Almost every cluster design configures one (or a small number of) master and designates the other machines as compute nodes. The Scyld system goes further by making the compute slaves capable of only running processes initiated and controlled by the master. > I have been doing an exhaustive amount of research on all of the different > software available to accomplish this, but I have fallen short in figuring > out which ones will work together.
You'll find two approaches: - Monolithic designs that have no independently replaceable subsystems - Component designs that use independent subsystems The challenge is implementing component designs using an over-all architecture that results in a simple system. Most approaches using independent components end up being unable to evolve. The result is overly feature-full, complex subsystems as individuals try to address new problems using only the subsystem they understand and have control over. > I am planning on using Red Hat 9 on all of the nodes in the cluster. You should understand what you are asking for: perhaps you mean "I need library and application compatibility with Red Hat 9", because you aren't going to get process migration and DSM without modifying the kernel and/or libraries. > I just need a little more information to give me that push in the right > direction. I do have some time, as I am not planning to start building the > cluster until March or April. You should consider Gigabit Ethernet a likely baseline network by then. If your application requires DSM, there is a fair chance that you would benefit from Remote DMA (RDMA) or Remote Write in SCI, Myrinet, Quadrics or Infiniband. Selecting one of those will impose a library interface, and you may find that you have few additional decisions to make.
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Nov 24 17:41:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 17:41:04 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: Message-ID: On Mon, 24 Nov 2003, Mark Hahn wrote: > > > > Depends if your mobo has a boot from usb option. > > > > It's slightly more complex than that: only some (many, but not all) USB > > memory devices are usable as boot media. > > I've seen that advertised, but it was unclear to me whether it was a > purely marketing feature or not. > what does the device need to do to support booting? It surprised me that Intel needed to list which USB memory devices were usable as boot devices. > > We've been doing this for years with Scyld BeoBoot: use the BIOS to load > > both the kernel and an ramdisk '/'. The now-standard Linux approach is > > right, but this is actually two-step, no? that is, the bios only loads > the bootsector and jumps to it. your code in the bootsector (or just > the generic code in the kernel's boot.S) is then responsible for making > further bios calls for reading more than that 512B. so if the bios > doesn't provide a floppy-like block driver, it wouldn't work. Correct. The bootloader - is in 16 bit mode, - may only use the basic BIOS entry points for reading blocks, - must follow rules such as periodically calling the keyboard-read loop The key is that your bootloader must load everything the final system might need before exiting 16 bit mode, 'cause there ain't no goin' back. > I guess what I'm wondering is whether a bios that provides USB-booting > does actually provide a block driver. 
Yes, but it's a BIOS block driver -- it's not suitable for general
purpose use. The functionality might be divided between polling
hardware with interrupts disabled and doing things within the
keyboard-read calls. It might re-program the timer and PIC chips, or
use the System Management Mode (SMM) of the processor.

--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From derek.richardson at pgs.com Mon Nov 24 18:12:16 2003
From: derek.richardson at pgs.com (Derek Richardson)
Date: Mon, 24 Nov 2003 17:12:16 -0600
Subject: Opteron kernel
Message-ID: <3FC29050.6000003@pgs.com>

All,
Does anyone know where to find info on tuning the linux kernel for
Opterons? Googling hasn't turned up much useful information.

Thanks,
Derek R.

--
Linux Administrator
derek.richardson at pgs.com
derek.richardson at ieee.org
Office 713-781-4000
Cell 713-817-1197
Madison's Inquiry: If you have to travel on the Titanic, why not go first class?

From jducom at nd.edu Mon Nov 24 19:17:54 2003
From: jducom at nd.edu (Jean-Christophe Ducom)
Date: Mon, 24 Nov 2003 19:17:54 -0500
Subject: Beowulf of bare motherboards
Message-ID: <3FC29FB2.5070504@nd.edu>

I tried to find a link to an 'old' project where people were using racks to put
barebone motherboards (to save the cost of the case basically).
It was similar to the following project but was more elaborate (it was possible
to pull the bare motherboards out of the shelf, etc...)
http://www.abo.fi/~physcomp/cluster/celeron.html I spent hours to find it on google..without success. Could anyone remember it? Please send the link. Thanks a lot JC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 24 20:14:51 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 24 Nov 2003 20:14:51 -0500 Subject: Beowulf of bare motherboards In-Reply-To: <3FC29FB2.5070504@nd.edu> References: <3FC29FB2.5070504@nd.edu> Message-ID: <3FC2AD0B.4090301@comcast.net> Is this it? http://www.clustercompute.com/ Jeff > I tried to find a link to a 'old' project where people were using > racks to put barebone motherboards (to save the cost of the case > basically). > It was similar to the following project but was more elaborated (it > was possible to pull out the bare motherboards of the shelf, etc...) > http://www.abo.fi/~physcomp/cluster/celeron.html > > I spent hours to find it on google..without success. > Could anyone remember it? Please send the link. 
> Thanks a lot > > JC > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Mon Nov 24 20:30:08 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Mon, 24 Nov 2003 17:30:08 -0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <1069682488.2179.127.camel@scalable> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> Message-ID: <20031125013008.GA6416@sphere.math.ucdavis.edu> On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote: > Hi all, > > RedHat have annouced academic pricing at USD25 per desktop (RHEL WS > based) and USD50 for Academic server (RHEL ES based) a week or so ago. This sounded relatively attractive to me, until I found out that USD25 per desktop for RHEL WS did NOT include the Opteron version. To add insult to injury RHEL ES does not support opteron. > Now... I believe the USD25 and USD50 are acceptable pricing for the > value that RHEL + RHN brings to the customer (academic). The cost of the > OS is a small fraction of the total value of the cluster. Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of $792. If you want named, dhcpd, and friends it's $1992. 
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 24 20:10:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 24 Nov 2003 17:10:39 -0800 (PST) Subject: Beowulf of bare motherboards In-Reply-To: <3FC29FB2.5070504@nd.edu> Message-ID: hi ya On Mon, 24 Nov 2003, Jean-Christophe Ducom wrote: > I tried to find a link to a 'old' project where people were using racks to put > barebone motherboards (to save the cost of the case basically). hotmail and google used those motherboard in the 19" (kingstarusa.com) racks -- looks like its discontinued ?? - a flat piece of (aluminum/steel) metal (from home depot/orchard) will work too you know - just add a couple holes on stand off for the mb and power supply - or get a sheet metal shop to bend and drill a few holes w rack mounting ears > It was similar to the following project but was more elaborated (it was possible > to pull out the bare motherboards of the shelf, etc...) > http://www.abo.fi/~physcomp/cluster/celeron.html i'm very interested in those systems ... - to build a cluster w/ just motherboards and optionally w/ disks - power supply will be simple +12vDC wall adaptor ... - P4-3G equivalent mb/cpu - it'd be a good engineering challenge :-) ( big question is what holds up the back of the "caseless" ( motherboards and disks c ya alvin > I spent hours to find it on google..without success. > Could anyone remember it? Please send the link. 
> Thanks a lot there are other pc104 based caseless clusters http://eri.ca.sandia.gov/eri/howto.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Nov 24 19:40:38 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 24 Nov 2003 16:40:38 -0800 Subject: booting from usb pen drive In-Reply-To: References: < Message-ID: <5.2.0.9.2.20031124163754.018c7b58@mailhost4.jpl.nasa.gov> At 04:51 PM 11/24/2003 -0500, Donald Becker wrote: >But bottom line is that booting is no longer a hotly-debated cluster >issue. Essentially every current system has PXE network booting. Every "x86, wired ethernet" cluster has PXE booting. >Approaches such as BeoBoot stage 1 or USB booting are only needed for >legacy machines. Or for clusters built with some other processor (still COTS, but not necessarily "currently sold x86 mobo in the consumer/office market" COTS). > With x86 machines you can use PXE to do BIOS >updates, hardware diagnostics, or boot the machine as a cluster node, >all without touching the hardware. James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Nov 24 23:00:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 25 Nov 2003 15:00:55 +1100 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: <200311251500.56467.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 25 Nov 2003 12:30 pm, Bill Broadley wrote: > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. Release Candidate 1 of Mandrake 9.2 for AMD64 is now available for download. http://www.mandrakelinux.com/en/92amd64beta.php3 There's also an experimental Gentoo build available, and a Debian port is in the works. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wtP3O2KABBYQAh8RAk6zAJ4xqhx0pCbf2BJehd+pkwb7uXpEoQCeINBF e7gR5Gnx7f33dKueUF7UiUQ= =Bsw1 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From herrold at owlriver.com Mon Nov 24 23:53:14 2003 From: herrold at owlriver.com (R P Herrold) Date: Mon, 24 Nov 2003 23:53:14 -0500 (EST) Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: On Mon, 24 Nov 2003, Bill Broadley wrote: > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. Goodness, such pessimism. The new Red Hat pricing model allows for you to avail yourselves of their release integration for a long lived product tail, 'instant ISO' download, and for various scaled support models (including their 'up2date' tool for remote console administration, update pools, and update scheduling) Clever folks saved copies of the beta ISO's and are 90 percent there already. Daemon applications simply don't change that fast, unless there is a security matter. Buy the low end stripped model, and get the kernel and libraries updates by RH; invest a week to learn yum and package building and signing, and add whatever application layer tools you want. Or pay a third party to build them for you, to a SLA you can afford. 
Owl River has sold such third-party services for years, as have the
nice folks of the KRUD distribution, the Wirex folks, and so forth.
It's a small group of people doing this, coming from both inside and
outside the RH private beta testers group; many are listed toward the
bottom of:
   http://www.owlriver.com/projects/packaging/

As binaries built by a third party are not on the manifest for that
package, the 'up2date' update channel from Red Hat should simply
ignore them, as it would any other 'foreign' package. One has to
assume their client is smart enough to ignore updating non-channel
content. If not, you gain a windfall (or maybe are harmed if it
updates something you did not expect it to); if so, download the
updates from the trusted alternative archive as planned.

I have pushed the development of yum, and published proof of concept
code under the GPL to use yum for some of the server-side functions
similar to RH's 'up2date', and published added kickstart integration,
as well as for the more familiar client side tasks. Large parts of our
future work will continue to be available under the GPL through our
'cAos' participation.
   http://www.owlriver.com/support/yum/

For those unwilling to read, experiment, maintain the needed devel
lab, and develop, I am happy to sell such services.
   http://www.owlriver.com/support/wings/

BTW: The RH exit from the mass 'free download/free support/forever'
market should come as no great surprise to folks; I note that our page
is untouched since early June when this outcome was pretty obvious
(and before RH formally even named their new line).
-- Russ Herrold

From hunting at ix.netcom.com Tue Nov 25 01:19:16 2003
From: hunting at ix.netcom.com (Michael Huntingdon)
Date: Mon, 24 Nov 2003 22:19:16 -0800
Subject: LONG RANT [RE: RHEL Copyright Removal]
In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu>
References: <1069682488.2179.127.camel@scalable> <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable>
Message-ID: <3.0.3.32.20031124221916.01250910@popd.ix.netcom.com>

Huge surprise to all of us. Someone or a group of folks will have to jump
in, write the code and fill the void. Not sure something like this will
take long.

At 05:30 PM 11/24/2003 -0800, Bill Broadley wrote:
>On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote:
>> Hi all,
>>
>> RedHat have announced academic pricing at USD25 per desktop (RHEL WS
>> based) and USD50 for Academic server (RHEL ES based) a week or so ago.
>
>This sounded relatively attractive to me, until I found out that
>USD25 per desktop for RHEL WS did NOT include the Opteron version.
>
>To add insult to injury RHEL ES does not support opteron.
>
>> Now... I believe the USD25 and USD50 are acceptable pricing for the
>> value that RHEL + RHN brings to the customer (academic). The cost of the
>> OS is a small fraction of the total value of the cluster.
>
>Alas if you buy an AMD64 laptop, desktop, or server it's a minimum of
>$792. If you want named, dhcpd, and friends it's $1992.
> >-- >Bill Broadley >Mathematics >UC Davis >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laurenceliew at yahoo.com.sg Tue Nov 25 09:29:50 2003 From: laurenceliew at yahoo.com.sg (Laurence Liew) Date: Tue, 25 Nov 2003 22:29:50 +0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: <1069770053.2179.224.camel@scalable> Hi, I am still waiting for my Red Hat rep to get me official pricing which should also provide me the platforms offered under the Academic program. If currently there is no academic pricing for AMD64, then what I would suggest you do is write to your Red Hat rep/sales manager and ask for it. Over in Singapore/Asia Pac, I have asked for Academic pricing for RHEL for HPC clusters for the last 8 months.. and I guess part of my prayers have been answered. The pricing announced is very close (better in fact) to what I have requested for at USD50 per node. It is not easy convincing Red Hat that I needed a HPC pricing model, but it can be done and I guess they have listened (to this mailing list, their customers and their partners). Most vendors need to get feedback on what is wanted/required, and it is up to the community to let Red Hat know what that is... just be reasonable. As a business, they need to survive and make a profit. If we can argue a win-win proposal, I am sure they will listen. 
BTW, we have used RHEL WS to build clusters and it seemed to include all required daemons (sorry do not have access to AMD64 yet... so cannot comment). Cheers! laurence On Tue, 2003-11-25 at 09:30, Bill Broadley wrote: > On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote: > > Hi all, > > > > RedHat have annouced academic pricing at USD25 per desktop (RHEL WS > > based) and USD50 for Academic server (RHEL ES based) a week or so ago. > > This sounded relatively attractive to me, until I found out that > USD25 per desktop for RHEL WS did NOT include the Opteron version. > > To add insult to injury RHEL ES does not support opteron. > > > Now... I believe the USD25 and USD50 are acceptable pricing for the > > value that RHEL + RHN brings to the customer (academic). The cost of the > > OS is a small fraction of the total value of the cluster. > > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Tue Nov 25 13:38:55 2003 From: michael.worsham at mci.com (Michael Worsham) Date: Tue, 25 Nov 2003 13:38:55 -0500 Subject: Serious processing power... Message-ID: <001101c3b383$62515190$9c9832a6@Wcomnet.com> This article might interest a few of you with some serious processing power... Meet the real star of Lord of the Rings - a 1,600-box server farm. 
http://www.wired.com/wired/archive/11.12/play.html?pg=2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Nov 25 15:05:55 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 25 Nov 2003 15:05:55 -0500 (EST) Subject: Opteron kernel In-Reply-To: <3FC29050.6000003@pgs.com> Message-ID: On Mon, 24 Nov 2003, Derek Richardson wrote: > Does anyone know where to find info on tuning the linux kernel for > Opterons? Googling hasn't turned up much useful information. What type of tuning? PCI bus transactions (the Itanium required more, but the Opteron still benefits)? Scheduling? Processor affinity? What kernel version? If you ask specific questions, there is likely someone on the list that knows the specific answer. The easiest performance improvement comes from proper memory DIMM configuration to match the application layout. Each processor has its own local memory controller, and understanding how the memory slots are filled and the options e.g. interleave can make a 30% difference on a dual processor system. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Nov 25 18:21:49 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 25 Nov 2003 15:21:49 -0800 Subject: Opteron kernel In-Reply-To: References: <3FC29050.6000003@pgs.com> Message-ID: <20031125232149.GA2995@greglaptop.internal.keyresearch.com> On Tue, Nov 25, 2003 at 03:05:55PM -0500, Donald Becker wrote: > The easiest performance improvement comes from proper memory DIMM > configuration to match the application layout. Each processor has its > own local memory controller, and understanding how the memory slots are > filled and the options e.g. interleave can make a 30% difference on a > dual processor system. I second this -- don't trust what you think you *know* (we all know it only has 1 memory channel, so you shouldn't have to fill all the dimm slots) and instead measure (filling all the dimms slots helps perf.) The 2.6 kernels have a bit better performance than 2.4, and there are bugs that simply aren't fixed in 2.4, including one that our compiler stomps on frequently. You will definitely want the "runon" command for processor affinity... but it will change your choice of interleave in the BIOS. 
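To make the processor-affinity point above concrete, here is a minimal sketch. The "runon" command itself is not shown anywhere in this thread, so the example does not try to reproduce it; it assumes a Linux system and uses Python's standard-library `os.sched_setaffinity` / `os.sched_getaffinity` wrappers around the kernel affinity calls. The helper name `pin_to_cpu` is hypothetical, chosen only for illustration.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the calling process to a single CPU and return the new CPU set.

    Hypothetical helper for illustration only. Linux-specific:
    os.sched_setaffinity is not available on every platform.
    """
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    if cpu is None:
        cpu = min(allowed)              # default: lowest-numbered allowed CPU
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now restricted to CPUs:", sorted(pin_to_cpu()))
```

Pinning only controls where the process runs. Whether its memory ends up on the local memory controller still depends on the kernel and on the BIOS interleave setting discussed above, so measure both configurations rather than assuming.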
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 25 22:28:54 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 25 Nov 2003 22:28:54 -0500 Subject: Opteron kernel In-Reply-To: <20031125232149.GA2995@greglaptop.internal.keyresearch.com> References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> Message-ID: <1069817334.8326.122.camel@protein.scalableinformatics.com> On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote: > You will definitely want the "runon" command for processor affinity... > but it will change your choice of interleave in the BIOS. Hi Greg: Has anyone implemented a real runon, or built something like the old IRIX dplace stuff yet? I had been looking into this, and don't want to re-invent a working thing... Joe > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sebastian.wangnick at eurocontrol.int Wed Nov 26 02:30:14 2003 From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian) Date: Wed, 26 Nov 2003 08:30:14 +0100 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doe sn't succeed Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2EB@agnnl02.mas.eurocontrol.be> Hello Corey, thanks, I'll try to challenge SuperMicro with that ... 
Actually, I'm currently evaluating PCs for a replacement programme of one of
our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the
   problems described below.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor,
   but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI,
   however, but runs a web server on the remote management card. This card
   has its own backup power supply and network connection (which is nice
   since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard,
which would be capable of one of the "standard" IPMI authentication
mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the
SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport,
Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an
OEM authentication type, as you have already figured out. That means
you have to find out what algorithm they are using and implement it.
OpenIPMI doesn't currently have an interface to register authentication
algorithms, but it needs one, I guess.

Also, you probably want "admin" privilege level, not "user", as user
cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5).
It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>    ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not
>supported (supporting 0x%x only)", lan->authtype, (unsigned int)
>msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
>    #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
>    { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
>      ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
>    } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
>        authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
>    ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the
>board doesn't respond to the Activate Session request:
>
>

____

This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender.

From bill at cse.ucdavis.edu Wed Nov 26 04:00:08 2003
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 26 Nov 2003 01:00:08 -0800
Subject: Cluster benchmark summary
Message-ID: <20031126090008.GA15109@cse.ucdavis.edu>

Greetings all.
Many thanks for the many responses.

My main frustration and motivation for my benchmark proposal was the
relatively poor relationship between advertised link latencies and
bandwidths and actual application level scaling/efficiency.

Turns out this was a popular topic at SC2003; I discussed it with many
people, and attended a few discussions on it.

McCalpin had a talk discussing (from memory, so expect inaccuracies)
how MFlops predict spec cpu_rate (very poorly) and how memory bandwidth
predicts cpu_rate (less poorly). He then discussed a hybrid model
using MFlops, cache size, and memory bandwidth. Something along the
lines of 0.8 bytes per flop with zero cache and 0.1 bytes per flop with
8MB of cache was used for a model to predict Spec cpu_rate based on
MFlops and memory bandwidth.

Using this fairly simple model, MFlops * cache efficiency * bandwidth led
to a pretty good correlation with the 900+ spec cpu_rate numbers he
collected (my vague memory wants to claim +/- 10 or 20%). I interpret
this as somewhat of a validation of microbenchmarks acting as a
predictor of real application performance (for applications that are
well understood).

If I find the slides online I'll post (if someone else does please
follow up). At least he has a convincing graph (I know, a great way to
lie) on his predictions for 923 spec_cpu_rate results.

The most noteworthy benchmark suite mentioned at SC2003 was:
  http://icl.cs.utk.edu/hpcc/

Basically 5 benchmarks (well 4 + the top-500 HP Linpack) to help
quantify cluster performance and scaling: McCalpin's stream, Random
Access (I believe I heard this referred to as something that sounded
like Gups), Ptrans (parallel matrix transpose), and b_eff (effective
bandwidth benchmark).

Current version is at 0.4 alpha, so here is your chance people, improve
it while you can. I'm assuming that input is welcome, and patches
doubly so. I think this is a great start.
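As a rough illustration of the kind of balance model recalled earlier in this message, here is a small sketch. It is not McCalpin's actual formula: only the two endpoints (0.8 bytes per flop with no cache, 0.1 bytes per flop with 8MB of cache) come from the recollection above. The linear interpolation between them, the clamp beyond 8MB, the "lesser of peak and memory-fed rate" rule, and both function names are assumptions made purely for the example.

```python
def required_bytes_per_flop(cache_mb, no_cache=0.8, big_cache=0.1, big_cache_mb=8.0):
    """Memory traffic needed per flop, as a function of cache size in MB.

    Only the endpoints (0.8 at 0 MB, 0.1 at 8 MB) are from the post;
    the linear interpolation and the clamp outside 0..8 MB are assumptions.
    """
    frac = min(max(cache_mb / big_cache_mb, 0.0), 1.0)
    return no_cache + frac * (big_cache - no_cache)


def sustained_mflops(peak_mflops, bandwidth_mb_s, cache_mb):
    """Estimate sustained MFlops as the lesser of peak and what memory can feed.

    (MB/s) divided by (bytes per flop) gives millions of flops per second.
    """
    memory_bound = bandwidth_mb_s / required_bytes_per_flop(cache_mb)
    return min(peak_mflops, memory_bound)


if __name__ == "__main__":
    # Made-up machine: 4000 MFlops peak, 2000 MB/s of memory bandwidth.
    for cache_mb in (0, 1, 8):
        print(cache_mb, "MB cache:", round(sustained_mflops(4000, 2000, cache_mb)))
```

With these made-up inputs the estimate is memory-bound for small caches and reaches peak only once the cache is large, which is the qualitative behaviour the talk summary describes. The numbers themselves carry no predictive weight.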
Currently submitted results are for Cray (vector), Alphaserver,
Itanium2, Altix, and Power 4 based clusters. I'd love to see
additional numbers for Myrinet, Dolphin, Quadrics and Infiniband
clusters. Submit yours today!

Oh and most importantly (no Spec mistakes here), source is available,
so have at it and report results (click on archive or upload). I have
no idea what the license status of the source is; it is available for
download but doesn't mention any licensing terms. Ideally it will be
GPL or similar.

I believe source code optimization is legal AFTER reporting baseline
results from the unmodified source. I also believe that ALL results must
be posted, mainly to avoid cherry picking. Of course the URL mentioned
is the authoritative source for such info.

I heard rumors from several different people that top500.org was going
to collect these performance numbers, but still rank only on HPL. Of
course people can download the results and rank however they want.

So hopefully this will lead to interconnect companies competing on
complete cluster performance instead of link speeds and latencies.

============================================================================

I'll list here any other benchmarks people brought to my attention.
Please follow up if I missed anything; many of the messages came in at
the conference after my ethernet and wireless died (damn Dell laptop),
and of course upon my return I've been swamped with email.

Felix Rauch mentioned the Switchmark discussed in a paper at:
  http://www.cs.inf.ethz.ch/CoPs/publications/#cac03

A collection of benchmarks, mentioned at SC2003, is available at:
  http://www.ipacs-benchmark.org

John Hearns mentioned:
  http://www.plogic.com/bps (beowulf performance suite)

More related discussion at:
  http://www.cluster-rant.com/article.pl?sid=03/03/17/1838236.
-- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sebastian.wangnick at eurocontrol.int Wed Nov 26 03:29:49 2003 From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian) Date: Wed, 26 Nov 2003 09:29:49 +0100 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doe sn't succeed Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2EF@agnnl02.mas.eurocontrol.be> Oh, yes. My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 
0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. 
= Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... 
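The IPMI part of Frame 4 can be checked by hand against what Ethereal reports. What follows is a minimal sketch, assuming the IPMI 1.5 LAN message layout: two 8-bit two's-complement checksums, then a 4-byte temporary session ID (least significant byte first) and a 16-byte challenge string. The function names are illustrative and are not part of OpenIPMI.

```python
def ipmi_checksum(data: bytes) -> int:
    """8-bit two's-complement checksum: data plus checksum sums to 0 mod 256."""
    return (-sum(data)) & 0xFF


def parse_session_challenge_response(msg: bytes):
    """Validate both checksums, then return (completion_code, session_id, challenge)."""
    if len(msg) != 28:
        raise ValueError("expected a 28-byte message, as in Frame 4")
    if ipmi_checksum(msg[0:2]) != msg[2]:
        raise ValueError("checksum 1 mismatch")
    if ipmi_checksum(msg[3:-1]) != msg[-1]:
        raise ValueError("checksum 2 mismatch")
    completion_code = msg[6]
    session_id = int.from_bytes(msg[7:11], "little")  # bytes 10 3d 22 9c on the wire
    challenge = msg[11:27]                            # 16-byte challenge string
    return completion_code, session_id, challenge


# IPMI message bytes of Frame 4, copied from the hex dump (after the 0x1c length byte).
FRAME4_MSG = bytes.fromhex(
    "81 1c 63 "        # requester address, NetFn/LUN, checksum 1
    "20 08 39 00 "     # responder address, Seq/LUN, command, completion code
    "10 3d 22 9c "     # temporary session ID
    "75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 "  # challenge string
    "dd"               # checksum 2
)

if __name__ == "__main__":
    code, session_id, challenge = parse_session_challenge_response(FRAME4_MSG)
    print(hex(code), hex(session_id), challenge.hex())
```

The session ID decoded here (0x9c223d10) is the one OpenIPMI then uses in its Activate Session request in Frame 5, so the challenge exchange itself appears to be framed correctly.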
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... 
..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ 
....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... 
..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... 
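The Activate Session request from OpenIPMI (Frame 5 of the first trace) and the one from the Java tool (Frame 5 just above) can be compared field by field. This is a rough sketch, assuming the IPMI 1.5 layout of the 22-byte request data: authentication type, maximum privilege level, 16-byte challenge string, 4-byte initial outbound sequence number (least significant byte first). The class and function names are made up for illustration.

```python
from typing import NamedTuple


class ActivateSessionRequest(NamedTuple):
    auth_type: int
    max_privilege: int
    challenge: bytes
    initial_outbound_seq: int


def decode_activate_session(data: bytes) -> ActivateSessionRequest:
    """Split the 22 data bytes of an Activate Session request into fields."""
    if len(data) != 22:
        raise ValueError("Activate Session request data must be 22 bytes")
    return ActivateSessionRequest(
        auth_type=data[0],
        max_privilege=data[1] & 0x0F,  # privilege level is in the low nibble
        challenge=data[2:18],
        initial_outbound_seq=int.from_bytes(data[18:22], "little"),
    )


def differing_fields(a: ActivateSessionRequest, b: ActivateSessionRequest) -> list:
    """Names of the fields whose values differ between two requests."""
    return [name for name in a._fields if getattr(a, name) != getattr(b, name)]


# Data bytes copied from the two hex dumps (between the command byte 0x3a and checksum 2).
OPENIPMI_DATA = bytes.fromhex(
    "05 02 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 dd 76 ef fd"
)
JAVA_TOOL_DATA = bytes.fromhex("05 04" + " 00" * 20)

if __name__ == "__main__":
    ours = decode_activate_session(OPENIPMI_DATA)
    theirs = decode_activate_session(JAVA_TOOL_DATA)
    print(differing_fields(ours, theirs))
```

In these captures the Java tool asks for privilege level 4 and sends an all-zero challenge and sequence number, while OpenIPMI asks for level 2 and echoes the BMC's challenge. The dumps alone do not show which of these differences, if any, the BMC actually cares about; the authentication code differs as well.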
Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes)
0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E.
0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L..@.@..f......
0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........
0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H.....
0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:..
0050 10 30 d2 b3 00 00 00 00 04 d8 .0........
--- snip ---
Regards, Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300
-----Original Message-----
From: Zheng, Jeff [mailto:jeff.zheng at intel.com]
Sent: Wednesday 26 November 2003 08:56
To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Did you enable LAN support?
> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.
-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Hello Corey,
thanks, I'll try to challenge SuperMicro with that ...
Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the described problems.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server.
That one doesn't support IPMI, however; instead it runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).
Could anyone on this mailing list recommend another dual-processor mainboard that is capable of one of the "standard" IPMI authentication mechanisms?
I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...
Regards, Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300
-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it. OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess.
Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.
-Corey
WANGNICK Sebastian wrote:
>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>  ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not supported (supporting 0x%x only)", lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
> >Now, to overcome the problem, I've added to ipmi_auth.h: > #define IPMI_AUTHTYPE_OEM 5 >changed in ipmi_auth.c ipmi_auths[5] to: > { ipmi_md2_authcode_init, ipmi_md2_authcode_gen, > ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup } >and added to sample.c:main: > } else if (strcmp(argv[curr_arg+3], "oem") == 0) { > authtype = IPMI_AUTHTYPE_OEM; > >But still, when trying > ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the >board doesn't respond to the Activate Session request: > > ____ This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sebastian.wangnick at eurocontrol.int Wed Nov 26 07:55:11 2003 From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian) Date: Wed, 26 Nov 2003 13:55:11 +0100 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doe sn't succeed Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2F7@agnnl02.mas.eurocontrol.be> Hello Jeff, when doing ./ipmicmd -k "0f 00 06 01" lan 200.200.200.4 623 md2 admin ADMIN ADMIN I'm getting Requested authentication 1 not supported (supporting 0x20 only)Unable to setup connection: 16 as explained below. This is OpenIPMI-1.5.5, modified as described further down. 
Is 172.16.211.198 a SuperMicro or an Intel machine?
Regards, Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300
-----Original Message-----
From: Zheng, Jeff [mailto:jeff.zheng at intel.com]
Sent: Wednesday 26 November 2003 13:24
To: WANGNICK Sebastian; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
I can connect using the following commands:
[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 none user "" ""
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25
After setting the LAN password with SSU:
[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 md2 user "" "123456"
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25
> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.
-----Original Message-----
From: WANGNICK Sebastian [mailto:sebastian.wangnick at eurocontrol.int]
Sent: Wednesday, November 26, 2003 4:30 PM
To: Zheng, Jeff; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Oh, yes.
My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... 
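Frame 2 is where the "supporting 0x20 only" value comes from, so it is a useful frame to take apart. The sketch below splits the UDP payload into RMCP header, session header and IPMI message, then decodes the authentication type support bitmask. It assumes the IPMI 1.5 session header layout (auth type, 4-byte sequence number, 4-byte session ID, a 16-byte auth code only when the auth type is non-zero, then a length byte) and the usual bit assignments (bit 0 none, 1 MD2, 2 MD5, 4 straight password, 5 OEM). All names here are illustrative, not OpenIPMI's.

```python
import struct

# Bit position -> authentication type, as assumed above.
AUTH_TYPE_BITS = {0: "none", 1: "md2", 2: "md5", 4: "straight password", 5: "oem"}


def split_rmcp_ipmi(payload: bytes) -> dict:
    """Split a UDP payload into RMCP header fields, session header and IPMI message."""
    version, _reserved, rmcp_seq, msg_class = struct.unpack_from("4B", payload, 0)
    if version != 0x06 or (msg_class & 0x1F) != 0x07:
        raise ValueError("not an RMCP packet of class IPMI")
    auth_type, session_seq, session_id = struct.unpack_from("<BII", payload, 4)
    offset = 13
    auth_code = b""
    if auth_type != 0:  # the auth code is only present for authenticated packets
        auth_code = payload[offset:offset + 16]
        offset += 16
    length = payload[offset]
    message = payload[offset + 1:offset + 1 + length]
    if len(message) != length:
        raise ValueError("truncated IPMI message")
    return {
        "rmcp_seq": rmcp_seq,
        "auth_type": auth_type,
        "session_seq": session_seq,
        "session_id": session_id,
        "auth_code": auth_code,
        "message": message,
    }


def supported_auth_types(mask: int) -> list:
    """Names of the authentication types whose bit is set in the support mask."""
    return [name for bit, name in AUTH_TYPE_BITS.items() if mask & (1 << bit)]


# UDP payload of Frame 2, copied from the hex dump (everything after the UDP header).
FRAME2_PAYLOAD = bytes.fromhex(
    "06 00 ff 07 "   # RMCP: version, reserved, sequence, class
    "00 "            # authentication type: none
    "00 00 00 00 "   # session sequence number
    "00 00 00 00 "   # session ID
    "10 "            # IPMI message length (16)
    "81 1c 63 20 04 38 00 01 20 1c 00 bd 13 00 00 97"
)

if __name__ == "__main__":
    parts = split_rmcp_ipmi(FRAME2_PAYLOAD)
    # message[6] is the completion code, [7] the channel, [8] the support mask
    print(supported_auth_types(parts["message"][8]))
```

With this decoding the board advertises only the OEM type, which is consistent with the "Requested authentication 1 not supported" error when MD2 (type 1) is requested.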
Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... 
..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 
0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. 
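As a cross-check of the dissector output, the Get Session Challenge response in frame 4 can be taken apart by hand. The Python sketch below copies the 28 message bytes from the hex dump, verifies both checksums (assuming the usual IPMI two's-complement checksum, where the covered bytes plus the checksum sum to zero modulo 256), and extracts the temporary session ID and the 16-byte challenge. The helper name `ipmi_checksum` is illustrative only.

```python
def ipmi_checksum(data):
    """Two's-complement checksum: covered bytes plus checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


# IPMI message of frame 4 (hex dump offsets 0x38..0x53), without the trailing pad byte.
msg = bytes.fromhex(
    "001ce4"                            # requester address, NetFn/LUN, checksum 1
    "20003900"                          # responder address, Seq/LUN, command, completion code
    "1030d2b3"                          # temporary session ID, least significant byte first
    "b4665c8d59951f9e481d51b34fcdeb3f"  # 16-byte challenge string
    "85"                                # checksum 2
)

checksum1_ok = ipmi_checksum(msg[0:2]) == msg[2]
checksum2_ok = ipmi_checksum(msg[3:-1]) == msg[-1]

session_id = int.from_bytes(msg[7:11], "little")
challenge = msg[11:27]

print("checksums ok:", checksum1_ok, checksum2_ok)
print("session id: 0x%08x" % session_id)
print("challenge:", challenge.hex())
```

The session ID obtained this way is the value that reappears in the session header of the Activate Session request in the next frame.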
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L.. at .@..f...... 0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........ 0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H..... 0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:.. 0050 10 30 d2 b3 00 00 00 00 04 d8 .0........ --- snip --- Regards, Sebastian -- Dipl.-Inform. Sebastian Wangnick Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300 -----Original Message----- From: Zheng, Jeff [mailto:jeff.zheng at intel.com] Sent: Wednesday 26 November 2003 08:56 To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Did you enable lan support? > Thanks > Jeff Jeff.Zheng at intel.com > BTW, I speak for myself, not for Intel Corp. 
-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the described problems.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard that is capable of one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it. OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess.

Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
> ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not supported (supporting 0x%x only)", lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
> #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
> { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
> ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
> } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
> authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
> ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN
>the board doesn't respond to the Activate Session request:
>
>

____
This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Wed Nov 26 10:37:51 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Wed, 26 Nov 2003 08:37:51 -0700 Subject: Opteron kernel In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com>; from landman@scalableinformatics.com on Tue, Nov 25, 2003 at 10:28:54PM -0500 References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> <1069817334.8326.122.camel@protein.scalableinformatics.com> Message-ID: <20031126083751.A6434@lnxi.com> On Tue, Nov 25 2003 at 20:28, Joe Landman wrote: > On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote: > > > You will definitely want the "runon" command for processor affinity... > > but it will change your choice of interleave in the BIOS. > > Hi Greg: > > Has anyone implemented a real runon, or built something like the old > IRIX dplace stuff yet? I had been looking into this, and don't want to > re-invent a working thing...
http://www.tech9.net/rml/schedutils/ hth, Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Wed Nov 26 11:11:21 2003 From: egan at sense.net (Egan Ford) Date: Wed, 26 Nov 2003 09:11:21 -0700 Subject: Opteron kernel In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com> Message-ID: <066901c3b437$ef2d9a10$27b358c7@titan> node15:~ # numactl usage: numactl [--interleave=nodes] [--homenode=homenode] [--cpubind=nodes] [--membind=nodes] [--localalloc] command args ... numactl [--show] nodes is a comma delimited list of node numbers. You can get this as part of SLES8 SP3, however it appears to only work with the 2.4.19 included kernel, not 2.4.21. > -----Original Message----- > From: beowulf-admin at scyld.com > [mailto:beowulf-admin at scyld.com] On Behalf Of Joe Landman > Sent: Tuesday, November 25, 2003 8:29 PM > To: Greg Lindahl > Cc: Beowulf > Subject: Re: Opteron kernel > > > On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote: > > > You will definitely want the "runon" command for processor > affinity... > > but it will change your choice of interleave in the BIOS. > > Hi Greg: > > Has anyone implemented a real runon, or built something like the old > IRIX dplace stuff yet? I had been looking into this, and > don't want to > re-invent a working thing... 
> > Joe > > > > -- greg > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeff.zheng at intel.com Wed Nov 26 02:55:35 2003 From: jeff.zheng at intel.com (Zheng, Jeff) Date: Wed, 26 Nov 2003 15:55:35 +0800 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Message-ID: <37FBBA5F3A361C41AB7CE44558C3448E011959F1@pdsmsx403.ccr.corp.intel.com> Did you enable lan support? > Thanks > Jeff Jeff.Zheng at intel.com > BTW, I speak for myself, not for Intel Corp.
------------------------------------------------------- This SF.net email is sponsored by: SF.net Giveback Program. Does SourceForge.net help you be more productive? Does it help you create better code? SHARE THE LOVE, and help us help YOU!
Click Here: http://sourceforge.net/donate/
_______________________________________________
Openipmi-developer mailing list
Openipmi-developer at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sebastian.wangnick at eurocontrol.int Wed Nov 26 11:57:43 2003
From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian)
Date: Wed, 26 Nov 2003 17:57:43 +0100
Subject: SuperMicro IPMI authentication
Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2F9@agnnl02.mas.eurocontrol.be>

Dear Peter,

thanks for your quick answer. I'll cross-post your answer to the Beowulf and Openipmi mailing lists.

Please note that this approach renders your IPMI support proprietary, and unusable for us (your tool won't pass our safety assessment).

However, I'm not interested in the source code of IPMIview (Java is not an option for us anyway). What I'm asking for is the specification of the IPMI type 5 authentication.

I just learned via the Beowulf mailing list that the Intel server boards do fully support type 0, 1, 2 and 4 authentication as specified in the standard.

May I ask you, based on this clarification, to reconsider your position?

Thanks in advance,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Support_Europe [mailto:Support at supermicro.nl]
Sent: Wednesday 26 November 2003 15:00
To: Sebastian Wangnick
Subject: RE: IPMI authorisation

Hello Sir,

while the IPMI 1.5 standard is implemented with our IPMI card we have to customize our IPMIview program, this is the only program to use with the IPMI card. While we can't give free the source code of the program.

Best Regards,
Peter Maas
Supermicro Computer B.V.
Het Sterrenbeeld 28
5215 ML 's-Hertogenbosch
The Netherlands
T: +31 (0)73-6400390
F: +31 (0)73-6416525

-----Original Message-----
From: Sebastian Wangnick [mailto:sebastian.wangnick at eurocontrol.int]
Sent: Wednesday, November 26, 2003 11:24 AM
To: support at supermicro.nl
Subject: IPMI authorisation

Name: Sebastian Wangnick
E-mail: sebastian.wangnick at eurocontrol.int
Phone: +31 43366 1370
Model: SM-X5DPL-iGM with IPM
Question or Comment:

Dear Madam, dear Sir,

I'm trying in vain to create an IPMI LAN session to my SuperMicro system. In its GetChannelAuthCapabilities response the system always reports that it doesn't support any standard authorisation scheme (0=No auth, 1=MD2, 2=MD5, 4=Straight), only an OEM-specific one (5=OEM), and neither MD2 nor MD5 happens to work in its place.

Which authorisation algorithm am I to use to successfully activate the IPMI session? Note that for system engineering reasons I cannot rely on your IPMI-View tool.

Regards,
Sebastian Wangnick
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport,
Tel: +31-433661-370, Fax: -300
____

This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful.

Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy.

Any views expressed in this message are those of the sender.
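
For anyone following along: the authentication types listed in the question (0, 1, 2, 4, 5) are reported by the BMC as a bitmask, where, as far as I read the IPMI 1.5 specification, bit N set means type N is supported. Below is a minimal Python sketch of that decoding. The function name is hypothetical and not part of OpenIPMI or any vendor tool. The sample value 0x20 is what I take to be the support byte in the Get Channel Auth Capabilities response captured elsewhere in this thread; 0x17 is a made-up mask for a board that offers all four standard types.

```python
# Sketch only: names are illustrative, not part of OpenIPMI or any vendor tool.
# Assumption: bit N of the support mask corresponds to authentication type N.
AUTH_TYPES = {0: "none", 1: "MD2", 2: "MD5", 4: "straight password", 5: "OEM"}


def supported_auth_types(mask):
    """Return the names of the auth types whose bit is set in the mask."""
    return [name for bit, name in AUTH_TYPES.items() if mask & (1 << bit)]


if __name__ == "__main__":
    # 0x20 has only bit 5 set, i.e. only the OEM type is offered.
    print(supported_auth_types(0x20))
    # Hypothetical mask with bits 0, 1, 2 and 4 set (all standard types).
    print(supported_auth_types(0x17))
```

With a mask of 0x20 a client that only implements the standard types has nothing to negotiate, which matches the behaviour described above.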

From c00jsh00 at nchc.gov.tw Wed Nov 26 01:10:57 2003
From: c00jsh00 at nchc.gov.tw (Jyh-Shyong Ho)
Date: Wed, 26 Nov 2003 14:10:57 +0800
Subject: MPICH-1.2.5 on Opteron
Message-ID: <3FC443F1.DDF317E7@nchc.gov.tw>

Hi,

I wonder if this is the right place to post this question, but I would appreciate any suggestions.

We have a 1+4 node dual-Opteron cluster running SuSE Linux Enterprise 8 for AMD64. I installed MPICH-1.2.5 with the PGI 5.1 64-bit compiler on the system. The MPICH configuration file /opt/mpich/ch_p4/share/machines.LINUX has the following lines:

Zephyr:2
Eurus1:2
Eurus2:2
Eurus3:2
Eurus4:2

When we ran an MPI program, it would not run with 10 CPUs, but it did run with 8 CPUs. What might be the reason that not all CPUs can be used?

Jyh-Shyong Ho, PhD.
Research Scientist
National Center for High-Performance Computing
Hsinchu, Taiwan, ROC

From erwan at mandrakesoft.com Wed Nov 26 04:58:32 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Wed, 26 Nov 2003 10:58:32 +0100
Subject: How : up2date 128 nodes of Redhat 9 ?
In-Reply-To: <3FC7020D@webmaila.hku.hk>
References: <3FC7020D@webmaila.hku.hk>
Message-ID: <1069840712.7839.24.camel@revolution.mandrakesoft.com>

On Sat 22/11/2003 at 05:41, Woo Chat Ming wrote:
> Dear beowulf friends,

Hi,

> We are a university in Hong Kong and we have a Redhat Linux 9
> beowulf cluster consisting of 128 nodes. All of them have real
> IP address and are connected to the Internet.

Wow! Your nodes are using public IPs? Why not masquerade them?

> May I know how can I up2date all those nodes using a single
> command ?

You may use a parallel command such as rshp or c3 to ask your nodes to update themselves.

On MandrakeClustering/CLIC, urpmi parallel allows you to install packages on a full cluster. In your case, the following commands, executed on the server, ask each node to choose the updates it needs from the sources the server knows (the main distribution, updates from the internet, your own packages, etc.) and then use rshp and mput from KA-Tools (http://ka-tools.sourceforge.net/) to copy and install the rpms efficiently on each node. It takes roughly the same time for 1 or 200 nodes!

urpmi.update -a
    # <-- Read the latest updates for your rpm sources
urpmi --parallel cluster --auto-select
    # <-- Ask each node of the "cluster" group to update itself with the
    #     list of rpms that the server knows.

Best regards,
--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir
75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From graham.mullier at syngenta.com Wed Nov 26 12:48:54 2003
From: graham.mullier at syngenta.com (graham.mullier at syngenta.com)
Date: Wed, 26 Nov 2003 17:48:54 -0000
Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal])
Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>

Laurence Liew wrote:
[...]
> The cluster community have done very well and today, large commercial
> organisations are adopting linux clusters as one of the tools they use
> to solve their complex problems.

Yup, that's us - we are currently adopting a variety of open source tools, including Linux, to help tackle at least some of our HPC needs.

>
> But I find this talk of "stripping" RHEL copyright to create yet another
> distro to be counter productive as linux beowulf clusters go into
> commercial mainstream computing.... where customers have specific
> support demands. (And yes... commercial customers WILL PAY the full list
> price of RHEL to build a cluster).
>
> Now... I believe the USD25 and USD50 are acceptable pricing for the
> value that RHEL + RHN brings to the customer (academic). The cost of the
> OS is a small fraction of the total value of the cluster.
>
> Most of our users want a stable and supported OS, but more importantly,
> most of them run a commercial software of one form or another... and
> this means that these 3rd party ISV softwares are most likely to be
> certified on RHEL.
[...]

I think you are confusing things here (I know you are ranting but let's try to keep the arguments coherent, please! ;)

I'm running a project within a commercial company, so academic rates are of no use to me. I am willing to pay for what I get, but I'm not willing to pay simply to give us a warm glow that we are "supported". If I get some value I'll pay. I don't think I get value if I'm expected to pay separately for each copy of RHEL-AS on each of 42 compute nodes, and the only price I'm offered is an extreme full list price.

I would be willing to buy into a model where I'm paying for a clean, well-tested patch stream. But that model cannot scale cost linearly with the number of installed nodes - I'm not even convinced it can scale as the log of the number of nodes.

>
> if the community continues to fork a project just because it charges
> some $$$$, our progress would be very slow.... Redhat have listened to
> the customer and partners and have created an academic pricing model for
> cluster builders... so we should accept that and move on.

As I've said above, this is simply confused and does nothing for me or my project.

The community depends on people contributing work - and in some cases those people contribute work in exchange for remuneration. But in other cases we as a community find ways of driving development forward through what amounts to barter - we all get value from the open source software, and we all contribute to it in some way.

RH is (or at least appears to be) going down the restrictive licence, over-priced model pushed by MS. They've also learned the 'force frequent upgrades' trick. That leaves me uncomfortable about them as a vendor with whom I believe I'll have a good long-term relationship. But in the short-term software I use needs "RH 7.1", or "supported only on RH 7.3" or "RHEL-AS 2.1". Great. So I want ways of using RH that reduce my risks (what if RH stop making binaries available - can I still operate? If not, I want to be able to recompile from the source, and need to avoid copyright infringement problems).

[...]
> disarray and you will see droves of commercial ISVs abandoning linux and
> moving back to UNIX and Windows....
>
> where would that leave us? without commercial apps, linux would never
> sustain and grow in the commercial arena.
>

ah, well, now you've moved off into another universe. This isn't the one I'm in. Closed source is bad - it gets in my way, makes my life difficult, and increases my project's risks enormously. Why should I pay RH huge sums of money for Linux AND have to fight to get acceptance of Linux internally when I could take the "easy" option and just buy Windows? [by the way, I know why, and I'm fighting - and winning]

Where I am now is a small part of the commercial arena, it uses commercial apps that run on Linux because we, customers, demand that they do. If RH make life difficult for us (awkward licence model and/or high price per node) we will start looking for ways around the problem, because it is worthwhile.
Maybe we'll shift to another distro, maybe we'll take the time and sort out how to build it ourselves - and once we've done that, what use are RH to us? And if they are no use, will they get any money - no I don't think so.

Open source is a whole new way of working - and the money has to come in a different way. If we're offered useful services that we can't or don't want to handle internally, we'll look at buying them. But if the price is too high we won't bother.

Graham
(long term IRIX user, computational chemist, and now chemoinformatics specialist. I put up with Windows for office use but wouldn't want to rely on it for anything important...)

From william.mandra at us.army.mil Tue Nov 25 20:53:01 2003
From: william.mandra at us.army.mil (William J Mandra)
Date: Tue, 25 Nov 2003 20:53:01 -0500
Subject: Could clusters soon be a thing of the past? .....
Message-ID:

If anyone is interested in the possibility of a desktop computer making the Top500 list check out this article:

http://www.wired.com/news/technology/0,1282,60791,00.html

The product will not be available until next year so I will wait and hold my breath (maybe some benchmarks will come out of the company before it disappears). :)

It would be very interesting to see what kind of performance a current cluster could attain with an add-on like this.

-----
William Mandra
william.mandra at us.army.mil
-----

From derek.richardson at pgs.com Wed Nov 26 13:23:47 2003
From: derek.richardson at pgs.com (Derek Richardson)
Date: Wed, 26 Nov 2003 12:23:47 -0600
Subject: Opteron kernel
In-Reply-To:
References:
Message-ID: <3FC4EFB3.10708@pgs.com>

Donald,

Sorry for the late reply, bloody Exchange server didn't drop it in my inbox until late this morning.

Memory and scheduling would probably be the biggest factors. Processor affinity doesn't matter as much, because in my experience we haven't had problems w/ processes bouncing between CPUs. PCI bus is almost a non-issue, since our application is embarrassingly parallel and therefore has no need for > 100 Mbit ethernet, and there is no disk on a PCI-attached controller, so we have very little information passing over the PCI bus.

By interleaving, I assume you mean at the physical level, which I had a quick peek at when we got the system ( it's an IBM eServer 325, a loaner for testing ) and I assumed to be correct. But given the poor performance I have seen ( 2 GHz Opterons coming in at ~15% slower than a 3 GHz P4 on a compute/memory intensive application when most benchmarks I have seen would imply the inverse ), I will double-check that when given a chance.

I will probably just try the latest 2.6 kernel and a few other tweaks as well, and AMD has also offered help, but that would more likely be at the application layer ( which I don't have control of, unfortunately ).

Thanks for the response, and my apologies for the vagueness of the question.

Derek R.

Donald Becker wrote:

>On Mon, 24 Nov 2003, Derek Richardson wrote:
>
>>Does anyone know where to find info on tuning the linux kernel for
>>Opterons? Googling hasn't turned up much useful information.
>
>What type of tuning?
>PCI bus transactions (the Itanium required more, but the Opteron still
>benefits)? Scheduling? Processor affinity? What kernel version?
>If you ask specific questions, there is likely someone on the list that
>knows the specific answer.
>
>The easiest performance improvement comes from proper memory DIMM
>configuration to match the application layout. Each processor has its
>own local memory controller, and understanding how the memory slots are
>filled and the options e.g. interleave can make a 30% difference on a
>dual processor system.

--
Linux Administrator
derek.richardson at pgs.com
derek.richardson at ieee.org
Office 713-781-4000
Cell 713-817-1197

Madison's Inquiry: If you have to travel on the Titanic, why not go first class?

From jeff.zheng at intel.com Wed Nov 26 07:24:27 2003
From: jeff.zheng at intel.com (Zheng, Jeff)
Date: Wed, 26 Nov 2003 20:24:27 +0800
Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Message-ID: <37FBBA5F3A361C41AB7CE44558C3448E011959F2@pdsmsx403.ccr.corp.intel.com>

I can connect using the following commands:

[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 none user "" ""
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

After setting the LAN password with SSU:

[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 md2 user "" "123456"
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.
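
A note for readers unfamiliar with the raw bytes: the reply above looks like a Get Device ID response (NetFn App, command 0x01). The Python sketch below decodes it under two assumptions that are mine and not stated in the post: that ipmicmd prints address, NetFn, LUN, command and completion code before the data bytes, and that the data follows the IPMI 1.5 Get Device ID layout. The function name is hypothetical.

```python
# Sketch only: decode_device_id is an illustrative name, not an OpenIPMI API.
def decode_device_id(hex_line):
    """Decode a Get Device ID reply given as space-separated hex bytes."""
    raw = [int(b, 16) for b in hex_line.split()]
    # Assumed prefix: address, NetFn, LUN, command, completion code.
    addr, netfn, lun, cmd, cc = raw[:5]
    data = raw[5:]
    if netfn != 0x07 or cmd != 0x01:
        raise ValueError("not a Get Device ID response")
    if cc != 0x00:
        raise ValueError("completion code 0x%02x" % cc)
    return {
        "device_id": data[0],
        "device_revision": data[1] & 0x0F,
        # Major revision is binary (7 bits), minor revision is BCD.
        "firmware": "%d.%02x" % (data[2] & 0x7F, data[3]),
        # BCD: low nibble is the major digit, high nibble the minor digit.
        "ipmi_version": "%d.%d" % (data[4] & 0x0F, data[4] >> 4),
        # Manufacturer and product IDs are stored least significant byte first.
        "manufacturer_id": data[6] | data[7] << 8 | data[8] << 16,
        "product_id": data[9] | data[10] << 8,
    }


if __name__ == "__main__":
    reply = "0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25"
    for key, value in decode_device_id(reply).items():
        print(key, value)
```

If those assumptions hold, the reply reports IPMI version 1.5 and manufacturer ID 343, which I believe is Intel's IANA enterprise number. That would be consistent with an Intel board answering both the unauthenticated and the MD2 session.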
-----Original Message----- From: WANGNICK Sebastian [mailto:sebastian.wangnick at eurocontrol.int] Sent: Wednesday, November 26, 2003 4:30 PM To: Zheng, Jeff; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Oh, yes. My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... 
Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... 
..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 
0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. 
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L.. at .@..f...... 0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........ 0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H..... 0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:.. 0050 10 30 d2 b3 00 00 00 00 04 d8 .0........ --- snip --- Regards, Sebastian -- Dipl.-Inform. Sebastian Wangnick Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300 -----Original Message----- From: Zheng, Jeff [mailto:jeff.zheng at intel.com] Sent: Wednesday 26 November 2003 08:56 To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Did you enable lan support? > Thanks > Jeff Jeff.Zheng at intel.com > BTW, I speak for myself, not for Intel Corp. 
-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the problems described.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard which would be capable of one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it.
OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess. Also, you probably want "admin" privilege level, not "user" as user cannot really do very much. -Corey WANGNICK Sebastian wrote: >Dear all, > >I'm trying in vain to connect via LAN to a SuperMicro systems using >OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers >Authorisation Capability 5 (that is OEM according to the Ethereal IPMI >code). > >[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem: > ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not >supported (supporting 0x%x only)", lan->authtype, (unsigned int) >msg->data[2]); ] > >Seems that the SuperMicro BMC sets msg->data[2] to 0x20. > >Now, to overcome the problem, I've added to ipmi_auth.h: > #define IPMI_AUTHTYPE_OEM 5 >changed in ipmi_auth.c ipmi_auths[5] to: > { ipmi_md2_authcode_init, ipmi_md2_authcode_gen, > ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup } >and added to sample.c:main: > } else if (strcmp(argv[curr_arg+3], "oem") == 0) { > authtype = IPMI_AUTHTYPE_OEM; > >But still, when trying > ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the >board doesn't respond to the Activate Session request: > > ____ This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender. 
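The 0x20 value reported for msg->data[2] in the thread above can be decoded with a few lines. This is a minimal sketch, assuming the IPMI 1.5 layout in which bit N of the authentication support byte stands for authentication type N. The table and function names are illustrative only and do not come from OpenIPMI.

```python
# Sketch: decode the "authentication type support" byte returned by
# Get Channel Authentication Capabilities.
# Assumption: bit N set means authentication type N is offered (IPMI 1.5).

AUTH_TYPES = {
    0: "none",
    1: "MD2",
    2: "MD5",
    4: "straight password",
    5: "OEM",
}


def supported_auth_types(mask: int) -> list:
    """Return the names of all authentication types enabled in mask."""
    return [name for bit, name in AUTH_TYPES.items() if mask & (1 << bit)]


if __name__ == "__main__":
    # Value reported for the SuperMicro BMC in the thread.
    print(supported_auth_types(0x20))
```

With 0x20 only bit 5 is set, which matches the observation that the board offers nothing but the OEM type.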
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From erwan at mandrakesoft.com Wed Nov 26 04:51:25 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Wed, 26 Nov 2003 10:51:25 +0100
Subject: booting from usb pen drive
In-Reply-To: <1069516333.2018.1.camel@loiosh>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh>
Message-ID: <1069840284.7852.16.camel@revolution.mandrakesoft.com>

> The short answer is yes. The long answer is, it depends on your BIOS.
> Its kinda like a few years ago when some systems would boot from CD and
> some wouldn't.

Agreed, but I've played with it a bit and it seems there are many ways a BIOS can boot a USB key. The USB key could be detected as USB-FDD, USB-ZIP, USB-HDD or USB-CDROM. The geometry and the bootloader you use can change the way it is detected.

Many BIOSes don't offer a choice between these options; there is only a "Boot USB" option, which usually equals USB-ZIP or USB-FDD.

If you are using a FAT filesystem you can use syslinux; grub or lilo work on all filesystems (FAT included).

I'm using my USB key as a firmware/BIOS updater when PXE is not available, and/or for booting a rescue Linux to repair some Linux boxes.

Best regards,

PS: On my Asus A7N8x-Deluxe, if the BIOS option "USB LEGACY MOUSE" is not activated I can't boot from my USB key ! :((( It took me some time to find that. I've reported it to ASUS but no news, no fix :(. I know this is not a usual clustering mobo but such a trick could help some of you.
-- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Nov 26 13:59:33 2003 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 26 Nov 2003 10:59:33 -0800 Subject: Opteron kernel In-Reply-To: <3FC4EFB3.10708@pgs.com> References: <3FC4EFB3.10708@pgs.com> Message-ID: <20031126185933.GB16806@cse.ucdavis.edu> > for testing ) and I assumed to be correct. But given the poor > performance I have seen ( 2 GHz Opterons coming in at ~15% slower than a > 3 GHz P4 on a compute/memory intensive application when most benchmarks > I have seen would imply the inverse ), I will double-check that when I've seen this repeatedly. Did each opteron cpu have 2 or 4 dimms attached? Did you benchmark TWO jobs on the opteron vs TWO jobs on the P4? Is the memory at least PC2700? Have you played with the BIOS settings, I've seen significant speedups playing with both the node interleaving and the memory interleaving. I can provide a benchmark that should show 2GB/sec to main memory on a single opteron, and 3 GB/sec to a dual opteron if it's properly setup. 
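As a very rough first check before running a real benchmark, copy bandwidth can be estimated with the standard library alone. This is not the benchmark offered above, only a sketch under the assumption that a large bytearray copy is dominated by memory traffic. The function name and default sizes are arbitrary, and the numbers it reports are not comparable to STREAM-style results.

```python
# Sketch: crude single-threaded memory copy bandwidth estimate.
# Assumption: copying a buffer much larger than the caches is memory-bound.
import time


def copy_bandwidth_gb_s(size_mib: int = 64, repeats: int = 5) -> float:
    """Return the best observed copy bandwidth in GB/s (read + write counted)."""
    src = bytearray(size_mib * 1024 * 1024)
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                      # one read and one write of the buffer
        best = min(best, time.perf_counter() - start)
    return 2 * len(src) / best / 1e9


if __name__ == "__main__":
    print(f"{copy_bandwidth_gb_s():.2f} GB/s")
```

Running two copies at once, one per CPU, would be the closer analogue of the "TWO jobs" comparison asked about above.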
-- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From derek.richardson at pgs.com Wed Nov 26 14:37:32 2003 From: derek.richardson at pgs.com (Derek Richardson) Date: Wed, 26 Nov 2003 13:37:32 -0600 Subject: Opteron kernel In-Reply-To: <20031126185933.GB16806@cse.ucdavis.edu> References: <3FC4EFB3.10708@pgs.com> <20031126185933.GB16806@cse.ucdavis.edu> Message-ID: <3FC500FC.7080101@pgs.com> Bill, It had 4 DIMMs, w/ 6 slots on of the motherboard. I am going to go confirm which banks correspond to what shortly, just in case IBM put it together w/ an imbalance. Yes, we have done the 2 job scenario ( that's our primary mode of operating, running 2 jobs ( called executive shells, or es's ) on a dual-cpu node, w/ anywhere from 10 up to 125 nodes participating in a job ), the memory should be DDR333 IIRC. The BIOS doesn't have much in the way of tuning options, more's the pity. What's the benchmark? If it's a publicly available one, I probably have it installed already, if not, yes, I would appreciate it. When you have seen it repeatedly, you mean an improper distribution of memory to CPU? Derek R. Bill Broadley wrote: >>for testing ) and I assumed to be correct. But given the poor >>performance I have seen ( 2 GHz Opterons coming in at ~15% slower than a >>3 GHz P4 on a compute/memory intensive application when most benchmarks >>I have seen would imply the inverse ), I will double-check that when >> >> > >I've seen this repeatedly. > >Did each opteron cpu have 2 or 4 dimms attached? > >Did you benchmark TWO jobs on the opteron vs TWO jobs on the P4? > >Is the memory at least PC2700? > >Have you played with the BIOS settings, I've seen significant speedups >playing with both the node interleaving and the memory interleaving. 
> >I can provide a benchmark that should show 2GB/sec to main memory >on a single opteron, and 3 GB/sec to a dual opteron if it's properly >setup. > > > -- Linux Administrator derek.richardson at pgs.com derek.richardson at ieee.org Office 713-781-4000 Cell 713-817-1197 Madison's Inquiry: If you have to travel on the Titanic, why not go first class? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Nov 26 15:43:29 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 26 Nov 2003 12:43:29 -0800 Subject: Opteron kernel In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com> References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> <1069817334.8326.122.camel@protein.scalableinformatics.com> Message-ID: <20031126204329.GE2793@greglaptop.internal.keyresearch.com> On Tue, Nov 25, 2003 at 10:28:54PM -0500, Joe Landman wrote: > Has anyone implemented a real runon, or built something like the old > IRIX dplace stuff yet? I had been looking into this, and don't want to > re-invent a working thing... In addition to the SUSE thing posted already, Fedora has some kind of user utility too. We're using an in-house thingie for now, same functionality, different name... 
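For readers on a current Linux system, the effect of a runon-style tool can be sketched with the sched_setaffinity interface that Python exposes in the os module. This is an illustration only. It is neither the SUSE or Fedora utility nor the in-house tool mentioned above, and the name `run_on` is made up here.

```python
# Sketch: run a command pinned to one CPU (Linux only).
# Assumption: the kernel provides sched_setaffinity, exposed as os.sched_setaffinity.
import os
import subprocess
import sys


def run_on(cpu: int, argv: list) -> int:
    """Run argv restricted to a single CPU and return its exit status."""
    def pin():
        # Runs in the child just before exec; pid 0 means "the calling process".
        os.sched_setaffinity(0, {cpu})
    return subprocess.run(argv, preexec_fn=pin).returncode


if __name__ == "__main__":
    first_cpu = min(os.sched_getaffinity(0))
    run_on(first_cpu, [sys.executable, "-c",
                       "import os; print(os.sched_getaffinity(0))"])
```

Memory placement, which dplace also handled on IRIX, is a separate problem and is not addressed by this sketch.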
--
greg

From laurenceliew at yahoo.com.sg Wed Nov 26 17:56:53 2003
From: laurenceliew at yahoo.com.sg (Laurence Liew)
Date: Thu, 27 Nov 2003 06:56:53 +0800
Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal])
In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>
References: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>
Message-ID: <1069887413.2179.577.camel@scalable>

Hi,

It is great that you have convinced your management to use Linux and are winning the game :-)

You should look at using RHEL WS for the compute nodes and RHEL ES or AS for the frontend. It will lower your costs quite a bit: 1 frontend with AS (USD 1499) plus 42 compute nodes (42 x USD 179) = USD 9017. (For 42 nodes, you probably can and should get discounts!)

The cost of your hardware would probably amount to around USD 100K, so the OS cost comes to about 10%... I believe 10% for the OS for a cluster is about right.

I understand how you feel about RHEL policies etc., and I am hopeful that RH will have specific HPC pricing further down the road. I would encourage you to speak to your RH rep nicely and explain what you are doing and why you think you should get "HPC" pricing (i.e. more discounts on the compute nodes).

You will be surprised that not everyone at Red Hat appreciates HPC and what we do, and why their current pricing model does not work for us.

As for alternative distros, you may wish to look at Novell/SUSE Linux and use it as a counterbalance to RH.

Again, you will note that I can only encourage the use of these "commercial" distros, as they will probably be part of an ISV's support matrix. Most of my customers are sticky about such support and demand that the OS used is a supported OS for their applications.
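The licence arithmetic above is easy to redo for other node counts. The sketch below uses only the list prices and the rough hardware figure quoted in this message; the constant names are arbitrary.

```python
# Sketch: OS licence cost for one frontend plus N compute nodes,
# using the prices quoted in the message above.
FRONTEND_AS_USD = 1499      # RHEL AS, one frontend
COMPUTE_WS_USD = 179        # RHEL WS, per compute node
COMPUTE_NODES = 42
HARDWARE_USD = 100_000      # rough hardware estimate from the message

os_cost = FRONTEND_AS_USD + COMPUTE_NODES * COMPUTE_WS_USD
share = os_cost / HARDWARE_USD

if __name__ == "__main__":
    print(os_cost, f"{share:.1%}")   # prints: 9017 9.0%
```

Because the cost grows linearly with the node count, the share of the hardware budget stays roughly constant as the cluster grows, which is the scaling behaviour the quoted objection is about.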
I look forward to the day Novell Linux offers a HPC pricing model. But again for them to do so and provide the support and patch stream, there will be costs and I hope it will be reasonable and which something the community can accept. Cheers! laurence On Thu, 2003-11-27 at 01:48, graham.mullier at syngenta.com wrote: > Laurence Liew wrote: > [...] > > The cluster community have done very well and today, large commercial > > organisations are adopting linux clusters as one of the tools they use > > to solve their complex problems. > > Yup, that's us - we are currently adopting a variety of open source tools, > including Linux, to help tackle at least some of our HPC needs. > > > > But I find this talk of "stripping" RHEL copyright to create > > yet another > > distro to be counter productive as linux beowulf clusters goes into > > commercial mainstream computing.... where customers have specific > > support demands. (And yes... commercial customers WILL PAY > > the full list > > price of RHEL to build a cluster). > > > > Now... I believe the USD25 and USD50 are acceptable pricing for the > > value that RHEL + RHN brings to the customer (academic). The > > cost of the > > OS is a small fraction of the total value of the cluster. > > > > Most of our users want a stable and supported OS, but more > > importantly, > > most of them run a commercial software of one form or another... and > > this means that these 3rd party ISV softwares are most likely to be > > certified on RHEL. > [...] > > I think you are confusing things here (I know you are ranting but let's try > to keep the arguments coherent, please! ;) > I'm running a project within a commercial company, so academic rates are of > no use to me. I am willing to pay for what I get, but I'm not willing to pay > simply to give us a warm glow that we are "supported". If I get some value > I'll pay. 
I don't think I get value if I'm expected to pay separately for > each copy of RHEL-AS on each of 42 compute nodes, and the only price I'm > offered is an extreme full list price. I would be willing to buy into a > model where I'm paying for a clean, well-tested patch stream. But that model > can not scale cost linearly with number of installed nodes - I'm not even > convinced it can scale as the log of the number of nodes. > > > > if the community continues to fork a project just becauses it charges > > some $$$$, our progress would be very slow.... Redhat have listened to > > the customer and partners and have created a academic pricing > > model for > > cluster builders... so we should accept that and move on. > As I've said above, this is simply confused and does nothing for me or my > project. The community depends on people contributing work - and in some > cases those people contribute work in exchange for remuneration. But in > other cases we as a community find ways of driving development forward > through what amounts to barter - we all get value from the open source > software, and we all contribute to it in some way. > > RH is (or at least appears to be) going down the restrictive licence, > over-priced model pushed by MS. They've also learned the 'force frequent > upgrades' trick. That leaves me uncomfortable about them as a vendor with > whom I believe I'll have a good long-term relationship. > > But in the short-term software I use needs "RH 7.1", or "supported only on > RH 7.3" or "RHEL-AS 2.1". Great. So I want ways of using RH that reduce my > risks (what if RH stop making binaries available - can I still operate? If > not, I want to be able to recompile from the source, and need to avoid > copyright infringement problems). > [...] > > disarray and you will see droves of commercial ISVs > > abandoning linux and > > moving back to UNIX and Windows.... > > > > where would that leave us? 
without commercial apps, linux would never > > sustain and grow in the commercial arena. > > > ah, well, now you've moved off into another universe. This isn't the one I'm > in. Closed source is bad - it gets in my way, makes my life difficult, and > increases my project's risks enormously. > > Why should I pay RH huge sums of money for Linux AND have to fight to get > acceptance of Linux internally when I could take the "easy" option and just > buy Windows? [by the way, I know why, and I'm fighting - and winning] > > Where I am now is a small part of the commercial arena, it uses commercial > apps that run on Linux because we, customers, demand that they do. If RH > make life difficult for us (awkward licence model and/or high price per > node) we will start looking for ways around the problem, because it is > worthwhile. Maybe we'll shift to another distro, maybe we'll take the time > and sort out how to build it ourselves - and once we've done that, what use > are RH to us? And if they are no use, will they get any money - no I don't > think so. > > Open source is a whole new way of working - and the money has to come in a > different way. If we're offered useful services that we can't or don't want > to handle internally, we'll look at buying them. But if the price is too > high we won't bother. > > Graham > > (long term IRIX user, computational chemist, and now chemoinformatics > specialist. I put up with Windows for office use but wouldn't want to rely > on it for anything important...) 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Nov 26 20:59:44 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 26 Nov 2003 17:59:44 -0800 Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal]) In-Reply-To: <1069887413.2179.577.camel@scalable> References: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com> <1069887413.2179.577.camel@scalable> Message-ID: <20031127015944.GB4959@greglaptop.internal.keyresearch.com> On Thu, Nov 27, 2003 at 06:56:53AM +0800, Laurence Liew wrote: > Your cost of your hardware would probable amount to around USD100K. So > the OS costs comes up to about 10%... I believe 10% for OS for a cluster > is about right. That might be true if the OS was cluster-aware and helped out with cluster problems. RHEL doesn't meet this standard. > Most of my customers are sticky about such support and demands > that the OS used is a supported OS for their applications. Then by all means sell them whatever they will buy. But don't be surprised if they think they're getting ripped off paying so much for a non-cluster-aware OS. 
--
greg

From eccf at super.unam.mx Thu Nov 27 19:38:35 2003
From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores)
Date: Thu, 27 Nov 2003 18:38:35 -0600 (CST)
Subject: Grand Challenge
Message-ID:

Hi to all,

Does anybody know if there were any articles talking about the Grand Challenge years ago?

Sorry if this is a bit off topic.

cafe

From daniel at labtie.mmt.upc.es Fri Nov 28 15:04:30 2003
From: daniel at labtie.mmt.upc.es (Daniel Fernandez)
Date: Fri, 28 Nov 2003 21:04:30 +0100
Subject: Mainboard identification and BIOS dump
Message-ID: <1070049870.528.34.camel@qeldroma.cttc.org>

Hi again,

We have a fully remote OS installation setup to recover crashed nodes or upgrade them. They're configured and installed through BOOTP, NFS and some scripting, but our cluster and workstation machines are not uniform at all and some critical configuration and monitoring depends on the motherboard model.

On PCs the BIOS is found in the last 64 KB of the first megabyte, as reported by the "System ROM" entry in /proc/iomem:

00000000-0009efff : System RAM
0009f000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000d0000-000d5fff : Extension ROM
000f0000-000fffff : System ROM
00100000-1fffbfff : System RAM
  00100000-0023d67d : Kernel code
  0023d67e-002b8f1f : Kernel data

We have just written a simple BIOS dump script first, avoiding trouble with kernel calls in C:

dd if=/dev/mem bs=1048575 count=1 | tail -c 65535 > dumpbios.bin

Therefore, we just need to parse this "dumpbios.bin" file and check against a small database file whether a known motherboard string is present.
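The parsing step described above can be sketched as follows. The signature table is purely hypothetical (the entries only reuse board names that appear elsewhere in this archive), and the function names are made up for illustration. Note that the System ROM window 000f0000-000fffff is 65536 bytes, so the sketch reads that many; the dd pipeline above captures one byte fewer, which should not matter for string matching.

```python
# Sketch: look for known motherboard strings in a BIOS dump.
# Assumptions: the ROM window is 0xF0000-0xFFFFF and the board name appears
# as plain ASCII somewhere in it. KNOWN_BOARDS is a hypothetical database.
import re

ROM_START = 0xF0000
ROM_SIZE = 0x10000          # 64 KiB

KNOWN_BOARDS = {
    b"A7N8X": "Asus A7N8X",
    b"X5DPL": "SuperMicro X5DPL",
}


def read_system_rom(path: str = "/dev/mem") -> bytes:
    """Read the System ROM window (needs root; may be restricted by the kernel)."""
    with open(path, "rb") as mem:
        mem.seek(ROM_START)
        return mem.read(ROM_SIZE)


def printable_strings(blob: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII of at least min_len bytes."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)


def identify_board(blob: bytes, known=KNOWN_BOARDS):
    """Return the first known board whose signature occurs in the dump, else None."""
    for text in printable_strings(blob):
        for signature, name in known.items():
            if signature in text:
                return name
    return None


if __name__ == "__main__":
    with open("dumpbios.bin", "rb") as dump:
        print(identify_board(dump.read()))
```

A brute-force scan like this takes a negligible amount of time on a 64 KB file, so fixed offsets per BIOS vendor are an optimisation rather than a requirement.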
I think data strings are put in the same place across different models (assuming the same BIOS manufacturer), so brute-force parsing of this file won't be needed... anyway, this file is damn short.

Is there any motherboard-identifying utility for Linux? We could also mess with kernel calls, but this method should suffice? Any thoughts? Thank you in advance.

--
Daniel Fernandez
Laboratori de Termotècnia i Energia - CTTC
www.upc.edu/lte
c/ Colom nº11
UPC Campus Terrassa

From alvin at Mail.Linux-Consulting.com Fri Nov 28 20:28:53 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Fri, 28 Nov 2003 17:28:53 -0800 (PST)
Subject: Mainboard identification and BIOS dump
In-Reply-To: <1070049870.528.34.camel@qeldroma.cttc.org>
Message-ID:

hi ya daniel

assuming that one can uniquely identify a motherboard model...

a) yes ... sometimes that info ( Asus p4-aaaa ) is in the bios but usually not

b) why play with dd if=/dev/mem ....

   it's easier to save a copy of the output of dmesg on bootups

	/etc/rc.d/rc.local
		echo ""
		echo "save some startup info"
		dmesg > /etc/rc.d/rc.local.dmesg

	- append info from /proc/io /proc/cpuinfo /proc/pci /proc/iomem /proc/meminfo
		cat /proc/cpuinfo >> /etc/rc.d/rc.local.dmesg
		....

	- poke around at that rc.local.dmesg file when you wanna know which mb it might be

	- you'd have to make a list of mapping/signatures from the chipset back to the mb manufacturer and model#

making a kernel that supports all your hardware is the easiest way to handle the non-homogeneous network

c ya
alvin

On Fri, 28 Nov 2003, Daniel Fernandez wrote:

> Hi again,
>
> We have a fully OS remote installation to recover crashed nodes or
> upgrade them.
They're configured and installed through BOOTP, NFS and > some scripting, but our cluster and workstation machines are not uniform > at all and some critical configuration and monitoring depends on > motherboard model. > > BIOS on PC's is found at the last 64 Kb as reported by the "System ROM" > entry at /proc/iomem: > > 00000000-0009efff : System RAM > 0009f000-0009ffff : reserved > 000a0000-000bffff : Video RAM area > 000c0000-000c7fff : Video ROM > 000d0000-000d5fff : Extension ROM > 000f0000-000fffff : System ROM > 00100000-1fffbfff : System RAM > 00100000-0023d67d : Kernel code > 0023d67e-002b8f1f : Kernel data > > We have just done a simply BIOS dump script first, avoiding trouble with > kernel calls in C language: > > dd if=/dev/mem bs=1048575 count=1 | tail -c 65535 > dumpbios.bin > > Therefore, we just need to parse this "dumpbios.bin" file and check > against a small database file if a known motherboard string is present. > I think data strings are put at the same place through different models > ( supposing same bios manufacturer ), so brute force parsing this file > won't be needed... anyway this file is damn short. > > Is there any motherboard identifying utility for linux ? We could also > mess with kernel calls as well but that method should suffice ? any > thoughts ? Thank you in advance. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nishanth at mec.ac.in Fri Nov 28 21:16:48 2003 From: nishanth at mec.ac.in (Nishanth Rajan) Date: Sat, 29 Nov 2003 07:46:48 +0530 (IST) Subject: MOSIX In-Reply-To: <200311281704.hASH4SS04011@NewBlue.scyld.com> References: <200311281704.hASH4SS04011@NewBlue.scyld.com> Message-ID: <4541.202.88.246.210.1070072208.squirrel@mail.mec.ac.in> hi everybody, This is Nishanth from Cochin , India. I am new comer into this mailing list.. 
Iam an engineering student and am intending to do MOSIX for my project... If anyone could help me in this regard...pls contact me. Thanks Nishanth <-=+||+=-> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nashif at planux.com Sat Nov 29 23:28:32 2003 From: nashif at planux.com (Anas Nashif) Date: Sat, 29 Nov 2003 23:28:32 -0500 Subject: Mainboard identification and BIOS dump In-Reply-To: <1070049870.528.34.camel@qeldroma.cttc.org> References: <1070049870.528.34.camel@qeldroma.cttc.org> Message-ID: <3FC971F0.7060807@planux.com> Hi, DMI decode is your friend http://www.nongnu.org/dmidecode/ Anas Daniel Fernandez wrote: > Hi again, > > We have a fully OS remote installation to recover crashed nodes or > upgrade them. They're configured and installed through BOOTP, NFS and > some scripting, but our cluster and workstation machines are not uniform > at all and some critical configuration and monitoring depends on > motherboard model. > > BIOS on PC's is found at the last 64 Kb as reported by the "System ROM" > entry at /proc/iomem: > > 00000000-0009efff : System RAM > 0009f000-0009ffff : reserved > 000a0000-000bffff : Video RAM area > 000c0000-000c7fff : Video ROM > 000d0000-000d5fff : Extension ROM > 000f0000-000fffff : System ROM > 00100000-1fffbfff : System RAM > 00100000-0023d67d : Kernel code > 0023d67e-002b8f1f : Kernel data > > We have just done a simply BIOS dump script first, avoiding trouble with > kernel calls in C language: > > dd if=/dev/mem bs=1048575 count=1 | tail -c 65535 > dumpbios.bin > > Therefore, we just need to parse this "dumpbios.bin" file and check > against a small database file if a known motherboard string is present. > I think data strings are put at the same place through different models > ( supposing same bios manufacturer ), so brute force parsing this file > won't be needed... 
anyway this file is damn short. > > Is there any motherboard identifying utility for linux ? We could also > mess with kernel calls as well but that method should suffice ? any > thoughts ? Thank you in advance. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sun Nov 30 01:58:35 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 29 Nov 2003 22:58:35 -0800 (PST) Subject: Mainboard identification and BIOS dump In-Reply-To: <3FC971F0.7060807@planux.com> Message-ID: hi ya anas On Sat, 29 Nov 2003, Anas Nashif wrote: > Hi, > > DMI decode is your friend > > http://www.nongnu.org/dmidecode/ very nice !! c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Nov 1 10:40:24 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 1 Nov 2003 10:40:24 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Joel Jaeggli wrote: > > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > > claim support from RH for more than one of the systems. > > read the liscsense agreement for you redhat enterprise disks... I think his point is that there is some untested legal ground here for GPL distributions, "license agreement" or not. As in it remains to be seen whether it is possible to create a license agreement for a GPL or mostly-GPL distribution that restricts the redistribution of the binary images.
To quote from the preamble: When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. Note the phrase "free software". Note also the various inheritance clauses. I'm not a lawyer; I don't know how this would ultimately untangle in a court if someone chose to just ignore RH's license agreement and install things as they wished, but I'm sure we'll eventually find out...;-) rgb > > > Regards, > > Steffen > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Nov 1 10:35:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 1 Nov 2003 10:35:03 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian) In-Reply-To: <1067644757.5702.219.camel@haze.sr.unh.edu> Message-ID: On 31 Oct 2003, Tod Hagan wrote: > While looking into the number of packages in Debian vs. 
Fedora I > stumbled across this frightening bit (gotta throw a Halloween reference > in somewhere) on the Fedora site: > > http://fedora.redhat.com/participate/terminology.html > > Packages in Fedora Extras should avoid conflicts with other packages > > in Fedora Extras to the fullest extent possible. Packages in Fedora > > Extras must not conflict with packages in Fedora Core. > > It seems that Fedora intends to achieve applications breadth through > "Fedora Extras" package sets in other repositories, but the prohibition > of conflicts between Extras packages isn't as strong as the absolute > prohibition of conflicts between Extras and Core packages. Could this > result in a new era of DLL hell a few years down the road? > > Wow, I guess I just slung some FUD at Fedora, but maintaining a 2-3 > releases per year rate probably requires a small core, putting the bulk > of applications into the Extras category and thus increasing the chance > of conflict. (Wasn't that the original recipe for DLL hell?) Debian has > avoided this through a much larger core, which of course slows the > release cycle. Pre-yum, I would have said yes, but I honestly think that yum has arrived in the knick of time to rescue RPM-based distributions of all sorts from precisely this. Fedora (and for that matter RH mainstream) appears to have embraced yum (perhaps somewhat reluctantly, given that it obsoletes up2date dramatically before up2date achieved anything like real traction in the marketplace) and AFAIK are yummifying their repositories or plan to soon (as well as provide yum as a part of the distribution). With yum, packages that conflict will have a very, very short lifetime in any public or private repository because the conflicts will be immediately exposed and the conflicting packages either rejected or rebuilt. Note well that with rpm-based distros one can put oneself into hell already by just using rpm --force (and who hasn't done this at least once, seriously:-). 
If one uses kickstart to install systems and yum to install additional packages and (religiously) update/upgrade the ones already installed, one cannot enter hell, as the gates are barred. One MAY well have packages with conflicts that one wishes to put into one's repositories, but yum will reveal them in short order, and you (or the group with whom you share an interest in the packages) will willy-nilly fix the packages or have the PACKAGES consigned to hell.

The point is that FINALLY having a sensible toolset that can resolve all forward dependencies, revealing conflicts, obsoletes, dependency loops, file (as opposed to package) dependencies, and all the other Evil that rpm as a bare specification permits, at long last enables packages to be developed with something approximating rigor and discipline.

This is one of several reasons that I think that Fedora (or similar rpm-associated projects -- there will likely be more than one) will turn out in the very short run to be MORE reliable than RHE, and that there will be a very distinct flow of energy FROM Fedora BACK to RHE. In fact, I think that as Fedora becomes the "real" open source Red Hat, where development is rapid and problems are rapidly revealed and repaired on the dynamic edge by all the people who actually wrote and use the bulk of the software in ANY linux distribution, people who buy RHE are increasingly going to be getting Fedora, repackaged and "tested and supported" by RH and resold to people who want to be insulated from the supposedly chaotic process that is PRECISELY what has been driving linux stable and unstable distributions for years now, to everybody's general satisfaction.

yum isn't even "finished", in my opinion, and will only get stronger as tools and concepts are added to the suite. There are some really significant changes in the wind that could conceivably trigger a long-overdue paradigm shift in the way packages in ALL linux distributions (not just rpm-based ones, but also e.g. Debian) are installed.
Then there are several ideas out there for tools that don't just install binary/distro compatible rpm's from a distro-specific repository, but rather install a binary/distro compatible "base" system and then download and dynamically build source rpm's (either for a local install or to BUILD a binary/distro non-conflicted rpm repository for yum install and maintenance). As I said in an earlier message, I think that the Internet's general response to RH's attempt to coerce money on the order of $100/seat/year (or more, conceivably MUCH more) from all its users is going to be very, very "interesting". Chinese curse "interesting". (And yes, I think that this is totally absurd on a workstation or small (<32 node) compute node that these days might cost $500-600 full retail, where its costs is more like 20% of the base hardware where 2%/year would STILL be too much -- noting that small <32 node clusters that comprise the bulk of installed clusters, and that $3200 or more is TOTALLY out of the question for most of these. I also see no reason whatsoever for a "workstation" distribution to be crippled by omitting http, nfs, and the various so-called "server" packages -- one of linux's strengths for years has been that when a workstation needs to become a server, you just chkconfig the server features on, possibly after installing a couple of packages. In fact, I see absolutely no reason for any linux distribution to partition out ANY packages for special treatment -- once they are built for a distribution, they are built, building/rebuilding most of them once the source rpm's are made consistent even one time is mostly a matter of rpm --rebuild package.src.rpm.) I could of course be wrong -- maybe we'll all be spending trans-microsoftian amounts of money. I'll be cheerily paying Red Hat several thousand dollars a year for the privilege of running an internal webserver and nfs file server in my house to serve a handful of computers on my kids' and wife's desks. 
Maybe Duke will just go "sure, we'll just raise tuition by a few hundred more dollars -- the kids and their parents won't care -- so we can give it to Red Hat as we'd MUCH rather pay them even more insane amounts of money than the $17/seat or thereabouts we currently pay to Microsoft." Maybe the NSF and DOE will go "oh, hey, our bad" and ask the government to raise taxes a bit so that all the government labs that use linux can now spend hundreds to thousands of dollars extra per node/workstation/"server" (with Microsoft sitting there perfectly happy to "compete" head to head for the same privilege, invariably at LOWER PRICES). Maybe consumers in Best Buy will look at those spiffy box sets of Red Hat Linux that have finally started to grace their shelves and say "gee, here is an operating system that costs more than twice as much as WinXX, won't run any of these seven hundred off the shelf WinXX applications, requires considerable expertise to install it and maintain it and doesn't come preloaded on any system I might buy here -- I simply MUST have it." I personally think that Red Hat's board has lost its collective mind. This is dicey ground; their stock price has not coincidentally been skyrocketing as they present a public picture of "becoming another Microsoft or Sun" complete with prices to match. Of course, Microsoft AND Sun have slowly but surely been LOSING market share to linux, largely on the basis of COST-BENEFIT. What will happen if/when this rosy picture of huge profits turns out to be an "expectation bubble" and that it actually SLOWS the rate at which RH is adopted in the corporate marketplace compared to other commercial Unices and Microsofts competing products (not to mention crushes the potential consumer linux marketplace before it even was fully born)? Ugly... 
As Greg (in his role as a pundit:-) once remarked at a meeting I attended, the game of being a pundit is to try to see the future and make oracular predictions; to be provocative or evocative, right or wrong. So I could be wrong. Either way things will prove "interesting";-) rgb P.S. - to Greg, sorry if I'm misquoting you -- my memory is far from perfect but IIRC this was at the Atlanta ALS meeting and you were on a panel discussion:-) -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Nov 1 11:45:40 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 1 Nov 2003 11:45:40 -0500 (EST) Subject: what happened to deerfield? Message-ID: I was just wondering whether anyone had seen tangible evidence of deerfield (Intel's low-voltage, low-power, 1.5M cache it2 whose 1G version was claimed to sell for $800 or so). I'd be especially interested if any vendors have produced 1U duals: price, heat and performance... thanks, mark hahn. 

From brett at nssl.noaa.gov Sat Nov 1 14:51:44 2003
From: brett at nssl.noaa.gov (Brett Morrow)
Date: Sat, 01 Nov 2003 13:51:44 -0600
Subject: Scyld and MPI
In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
Message-ID: <3FA40ED0.90605@nssl.noaa.gov>

I am running the latest version of Scyld and having some trouble that I hope someone has seen before and can fix. I am running GM version 1.5.1 and MPICH 1.2.3 with the PGI compiler (I have tried versions 4.0-2 and 5.0). I have the Scyld SRPM for MPICH, so I can build it with the F90 compiler, and it all builds cleanly.

The problem is that I am trying to get a model called WRF to run. The jobs all start as they are supposed to on all the processors I specify with the np variable, and they all sit at 100% CPU usage, but I get no output. It is as if something is not being passed correctly.

Before I switched to Scyld (which I love because it solves so many management headaches and is VERY easy to install on big clusters), I was running Red Hat 7.3 with GM 1.6 and MPICH 1.2.5, and the jobs ran just fine. Does anyone know what might cause this?

Thanks
-Brett Morrow
National Severe Storms Lab

From jsims at csiopen.com Mon Nov 3 00:32:09 2003
From: jsims at csiopen.com (Joey Sims)
Date: Mon, 3 Nov 2003 00:32:09 -0500
Subject: Low Voltage Itanium2
Message-ID: <812B16724C38EE45A802B03DD01FD547226266@exchange.concen.com>

I believe Intel is planning a production run of low-voltage Itanium2 chips as their answer to AMD's offering of low-voltage Opteron. Who really knows what Intel is doing lately....

Yes, 1U/2P Itanium2 boxes are available.

-joey

==================================================
Joey P. Sims               800.995.4274 - 242
Sales Manager              770.442.5896 - Fax
HPC/Storage Division       www.csilabs.net
Concentric Systems, Inc.   jsims at csiopen.com
====================================ISO9001:2000==

From mathiasbrito at yahoo.com.br Mon Nov 3 06:51:24 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 3 Nov 2003 08:51:24 -0300 (ART)
Subject: Turn on nodes through the network
Message-ID: <20031103115124.77919.qmail@web12203.mail.yahoo.com>

I have finished my Beowulf cluster and, first of all, I would like to thank everybody who helped me through this mailing list. I'm sure I'll have plenty of other problems in this new phase, and I already have one. I would like to know how to boot my machines (nodes) over the network: when I turn on my master node, the slave nodes should wake automatically. What can I do? Or where can I find more information?

Thanks
Mathias Brito

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

From mathiasbrito at yahoo.com.br Mon Nov 3 07:53:54 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 3 Nov 2003 09:53:54 -0300 (ART)
Subject: What`s wrong with my code
Message-ID: <20031103125354.8071.qmail@web12205.mail.yahoo.com>

Well, I avoided sending my code because it is not the best way to solve the problem, and I am using only the basic MPI calls. The program computes the sum of two matrices: it generates two matrices randomly and adds them. But it doesn't work with matrices larger than 834x834, and I don't know why. Some variables and functions have Portuguese names, but I have commented them to say what they do.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

#define LINHAS 835  /* Number of lines */
#define COLUNAS 835 /* Number of columns */
#define TRUE 1
#define FALSE 0

void juntar(int *, int *);        /* put the result of the local operation in the final result matrix */
void somar(int *, int *, int *);  /* make the sum */
void imprimir(int[][COLUNAS]);    /* print a matrix */
void inicializar(int[][COLUNAS]); /* initialize a matrix with random numbers */

int main(int argc, char *argv[]) {
    int minha_parte1[LINHAS], /* my_part, my_result */
        minha_parte2[LINHAS], meu_resultado[LINHAS] = {0};
    int size, my_rank;
    int i, j, tag = 0;
    int master = 0;
    int sair = 0; /* exit = 0 */
    MPI_Status status;

    srand(time(NULL));

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if(my_rank == master) {
        int matriz1[LINHAS][COLUNAS], matriz2[LINHAS][COLUNAS], resultado[LINHAS][COLUNAS];
        int linhas_env = 0; /* number of lines sent */
        int linhas_rec = 0; /* number of lines received */

        inicializar(matriz1);
        inicializar(matriz2);

        //imprimir(matriz1);
        //imprimir(matriz2);

        for(i = 1; i < size; i++) {
            if(linhas_env < LINHAS) {
                if(MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD) == MPI_ERR_BUFFER)
                    printf("ERRO\n");
                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                //printf("MASTER: Enviando dados para o processo %d\n", i);
                linhas_env++;
            }
        }

        i = 1;
        while(TRUE) {
            if(linhas_rec < LINHAS) {
                MPI_Recv(&meu_resultado, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
                juntar(&resultado[linhas_rec][0], meu_resultado);
                //printf("MASTER: Recebendo dados do processo %d. Total de linhas recebidas = %d\n", i, linhas_rec + 1);
                linhas_rec++;
            }
            else
                break;

            if(linhas_env < LINHAS) {
                MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
                //printf("MASTER: Enviado mais dados para o processo %d. Total de linhas enviadas = %d\n", i, linhas_env + 1);
                linhas_env++;
            }
            if(i == size - 1)
                i = 1;
            else
                i++;
        }

        for(i = 1; i < size; i++) {
            MPI_Send(&sair, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
            //printf("MASTER: Finalizando processo %d\n", i);
        }

        printf("\n\n");
        //imprimir(resultado);
    }
    else {
        while(TRUE) {
            MPI_Recv(&minha_parte1, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
            if(minha_parte1[0] == 0)
                break;

            MPI_Recv(&minha_parte2, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);

            somar(minha_parte1, minha_parte2, meu_resultado);

            MPI_Send(&meu_resultado, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

void juntar(int *m, int *r) {
    int i;
    for(i = 0; i < COLUNAS; i++) {
        m[i] = r[i];
    }
}

void somar(int m[], int n[], int r[]) {
    int i;
    for(i = 0; i < COLUNAS; i++)
        r[i] = m[i] + n[i];
}

void imprimir(int m[][COLUNAS]) {
    int i, j;
    for(i = 0; i < LINHAS; i++) {
        for(j = 0; j < COLUNAS; j++) {
            printf("%d\t", m[i][j]);
        }
        printf("\n");
    }
    printf("\n\n");
}

void inicializar(int m[][COLUNAS]) {
    int i, j;
    for(i = 0; i < LINHAS; i++) {
        for(j = 0; j < COLUNAS; j++) {
            m[i][j] = (rand() % 10) + 1;
        }
    }
}

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

Yahoo!
Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Rafael.Tinoco at sun.com Mon Nov 3 07:11:16 2003 From: Rafael.Tinoco at sun.com (rafael david tinoco) Date: Mon, 03 Nov 2003 10:11:16 -0200 Subject: Turn on nodes through the network In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com> References: <20031103115124.77919.qmail@web12203.mail.yahoo.com> Message-ID: <1067861475.5670.2.camel@dhcp-sao01-194-186.Brazil.Sun.COM> Hello mathias, i know sun has one thing called: Serial Over Lan, with that, you can power up all stations using the "LAN CONSOLE" in the v60/65 (intel based) machines. try finding something like this.. regards rafael david tinoco sun professional services - brazil rafael.tinoco at sun.com On Mon, 2003-11-03 at 09:51, Mathias Brito wrote: > I finish my cluster beowulf, and first of all, I wold > like to thanks everybody that help me through this > mailing list, now I`m sure that I`ll have a lot of > other problems in this new phase. And i already have > one. I would like to know Howto boot my > machines(nodes) using the network, I would like that > turning on my master node the slave nodes > automatically wake. What can I do? or, where can I > find more information? > > Thanks > Mathias Brito > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci??ncias Exatas e Tecnol??gicas > Estudante do Curso de Ci??ncia da Computa????o > > Yahoo! 
Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erwan at mandrakesoft.com Mon Nov 3 05:52:29 2003 From: erwan at mandrakesoft.com (Erwan Velu) Date: Mon, 03 Nov 2003 11:52:29 +0100 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <1067856749.902.33.camel@revolution.mandrakesoft.com> > Hmm... Let's take the case of a 1000 node system. If we assume a > $3000/node cost (probably low once rack, UPS, hardware support, and > interconnect are added in), we arrive at an approximate hardware cost of > $3,000,000. If we were to use the RHEL WS list price of $179/node, we > get $179,000 or about 6% of the hardware cost. That is assuming RedHat > will not provide any discount on large volume purchases (unlikely). Is > 6% unreasonable? 6% is reasonable but for a clustering awared distribution not for a general use distro. > What are the alternatives? [...] > - Mandrake - Mandrake has their clustering distribution, which could be > a good possibility, but the cost is as high or higher than RedHat. We've already talk about that on this mailing list. http://www.beowulf.org/pipermail/beowulf/2003-September/008032.html CLIC & MandrakeClustering are not comparable with RedHat because we really offer a Linux distribution specially redesigned for the clustering (tools, installation, configuration has been made for meeting the clustering needs). 
> The cluster management portion of the software stack would be great to > have integrated in to the product, but if third party vendors (Linux > Networx, OSCAR, Rocks, etc...) can provide the cluster management > portion on top of the distribution, a solution can be found. In some > ways this is even better since your cluster management decision is > independent of the OS vendor. Our vision is to provide a real distribution based on a generalist distro (MDK 9.0) with a lots of applications and modifications for the cluster. For example drakcluster helps you to manage your cluster (add/remove nodes or users in maui partition, etc..) -- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From yudong at hsb.gsfc.nasa.gov Mon Nov 3 10:30:47 2003 From: yudong at hsb.gsfc.nasa.gov (Yudong Tian) Date: Mon, 3 Nov 2003 10:30:47 -0500 Subject: Turn on nodes through the network In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com> Message-ID: WOL (Wake on LAN) will do the trick. If your NIC cards support WOL, your BIOS lets you turn it on, your nodes have decent power supplies, and your network switches behave normally, then you can let your master node to turn on the slave nodes automatically. You can turn them on one by one in whatever sequence you desire. ------------------------------------------------------------ Falun Dafa: The Tao of Meditation (http://www.falundafa.org) ------------------------------------------------------------ Yudong Tian, Ph.D. 
NASA/GSFC (301) 286-2275 > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf > Of Mathias Brito > Sent: Monday, November 03, 2003 6:51 AM > To: beowulf at beowulf.org > Subject: Turn on nodes through the network > > > I finish my cluster beowulf, and first of all, I wold > like to thanks everybody that help me through this > mailing list, now I`m sure that I`ll have a lot of > other problems in this new phase. And i already have > one. I would like to know Howto boot my > machines(nodes) using the network, I would like that > turning on my master node the slave nodes > automatically wake. What can I do? or, where can I > find more information? > > Thanks > Mathias Brito > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci?ncias Exatas e Tecnol?gicas > Estudante do Curso de Ci?ncia da Computa??o > > Yahoo! Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bhalevy at panasas.com Mon Nov 3 11:02:54 2003 From: bhalevy at panasas.com (Halevy, Benny) Date: Mon, 3 Nov 2003 11:02:54 -0500 Subject: What`s wrong with my code Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF922@PIKES.panasas.com> Mathias, I suspect you run out of stack with higher values of LINHAS and COLUNAS. Try calculating how much memory these automatic variables need... 
>    if(my_rank == master) {
>        int matriz1[LINHAS][COLUNAS],
>            matriz2[LINHAS][COLUNAS], resultado[LINHAS][COLUNAS];

You should consider allocating these matrices dynamically using malloc().

- Benny

>-----Original Message-----
>From: Mathias Brito [mailto:mathiasbrito at yahoo.com.br]
>Sent: Monday, November 03, 2003 7:54 AM
>To: beowulf at beowulf.org
>Subject: What`s wrong with my code
>
>
>Well, I avoided sending my code, because it is not the
>best way to solve the problem, and I am using only the
>basic MPI calls. The program computes the sum of two
>matrices. It generates two matrices randomly and
>sums them. But it didn't work with a matrix larger than
>834x834. I don't know why. Some variables and functions
>have Portuguese names, but I commented them to say what
>they do.
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <time.h>
>#include <mpi.h>
>
>#define LINHAS 835 /*Number of lines*/
>#define COLUNAS 835 /*Number of columns*/
>#define TRUE 1
>#define FALSE 0
>
>void juntar(int *, int*); /*put the result of local operation in the final result matrix*/
>void somar(int *, int*, int*); /*make the sum*/
>void imprimir(int[][COLUNAS]); /*print a matrix*/
>void inicializar(int[][COLUNAS]); /*initialize matrix with random numbers*/
>
>int main(int argc, char *argv[]) {
>    int minha_parte1[LINHAS], /*my_part, my_result*/
>        minha_parte2[LINHAS], meu_resultado[LINHAS] = {0};
>    int size, my_rank;
>    int i, j, tag = 0;
>    int master = 0;
>    int sair = 0; /*exit = 0*/
>    MPI_Status status;
>
>    srand(time(NULL));
>
>    MPI_Init(&argc, &argv);
>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>
>    if(my_rank == master) {
>        int matriz1[LINHAS][COLUNAS],
>            matriz2[LINHAS][COLUNAS], resultado[LINHAS][COLUNAS];
>        int linhas_env = 0; /*number of lines sent*/
>        int linhas_rec = 0; /*number of lines received*/
>
>        inicializar(matriz1);
>        inicializar(matriz2);
>
>        //imprimir(matriz1);
>        //imprimir(matriz2);
>
>        for(i = 1; i < size; i++) {
>            if(linhas_env < LINHAS) {
>                if(MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD) == MPI_ERR_BUFFER)
>                    printf("ERROR\n");
>                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                //printf("MASTER: Sending data to process %d\n", i);
>                linhas_env++;
>            }
>        }
>
>
>        i = 1;
>        while(TRUE) {
>
>            if(linhas_rec < LINHAS) {
>                MPI_Recv(&meu_resultado, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
>                juntar(&resultado[linhas_rec][0], meu_resultado);
>                //printf("MASTER: Receiving data from process %d. Total lines received = %d\n", i, linhas_rec + 1);
>                linhas_rec++;
>            }
>            else
>                break;
>
>            if(linhas_env < LINHAS) {
>                MPI_Send(&matriz1[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                MPI_Send(&matriz2[linhas_env][0], COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>                //printf("MASTER: Sent more data to process %d. Total lines sent = %d\n", i, linhas_env + 1);
>                linhas_env++;
>            }
>            if(i == size - 1)
>                i = 1;
>            else
>                i++;
>        }
>
>        for(i = 1; i < size; i++) {
>            MPI_Send(&sair, COLUNAS, MPI_INT, i, tag, MPI_COMM_WORLD);
>            //printf("MASTER: Finishing process %d\n", i);
>        }
>
>        printf("\n\n");
>        //imprimir(resultado);
>
>    }
>    else {
>        while(TRUE) {
>            MPI_Recv(&minha_parte1, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
>            if(minha_parte1[0] == 0)
>                break;
>
>            MPI_Recv(&minha_parte2, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD, &status);
>
>            somar(minha_parte1, minha_parte2, meu_resultado);
>
>            MPI_Send(&meu_resultado, COLUNAS, MPI_INT, master, tag, MPI_COMM_WORLD);
>        }
>    }
>
>    MPI_Finalize();
>    return 0;
>}
>
>void juntar(int *m, int *r) {
>    int i;
>    for(i = 0; i < COLUNAS; i++) {
>        m[i] = r[i];
>    }
>}
>
>void somar(int m[], int n[], int r[]) {
>    int i;
>    for(i = 0; i < COLUNAS; i++)
>        r[i] = m[i] + n[i];
>}
>
>void imprimir(int m[][COLUNAS]) {
>    int i, j;
>    for(i = 0; i < LINHAS; i++) {
>        for(j = 0; j < COLUNAS; j++) {
>            printf("%d\t", m[i][j]);
>        }
>        printf("\n");
>    }
>    printf("\n\n");
>}
>
>void inicializar(int m[][COLUNAS]) {
>    int i, j;
>    for(i = 0; i < LINHAS; i++) {
>        for(j = 0; j < COLUNAS; j++) {
>            m[i][j] = (rand() % 10) + 1;
>        }
>    }
>}
>
>
>=====
>Mathias Brito
>Universidade Estadual de Santa Cruz - UESC
>Departamento de Ciências Exatas e Tecnológicas
>Estudante do Curso de Ciência da Computação

From becker at scyld.com Mon Nov 3 11:52:24 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 11:52:24 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To: <1067861475.5670.2.camel@dhcp-sao01-194-186.Brazil.Sun.COM>
Message-ID:

On Mon, 3 Nov 2003, rafael david tinoco wrote:

> I know Sun has one thing called Serial Over LAN; with that, you can
> power up all stations using the "LAN CONSOLE" in the v60/65 (Intel
> based) machines.

This is a standard feature of the Intel IPMI 1.5 specification.

Most implementations of IPMI also allow setting up the BIOS. While not part of the standard, it's a natural connection of BIOS-over-serial and serial-over-LAN.
--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From becker at scyld.com Mon Nov 3 11:50:42 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 11:50:42 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID:

On Mon, 3 Nov 2003, Mathias Brito wrote:

> I finished my Beowulf cluster, and first of all I would
> like to thank everybody who helped me through this
> mailing list; now I'm sure that I'll have a lot of
> other problems in this new phase. And I already have
> one. I would like to know how to boot my
> machines (nodes) using the network; I would like the
> slave nodes to wake automatically when I turn on my
> master node. What can I do? Or, where can I
> find more information?

We developed the driver support (needed with some cards) and ether-wake code to do that several years ago:
    http://www.scyld.com/expert/wake-on-lan.html

This requires both Wake-on-LAN hardware and soft-power-off, but most modern machines have that.

A more reliable and sophisticated approach is to use systems with IPMI 1.5 support. That generally requires a Baseboard Management Controller (BMC) on the motherboard, which adds $25-$150 to the price.

We have demoed software that hooks into the load monitoring to automatically bring up more cluster nodes as needed. That takes advantage of our ability to boot nodes in only a few seconds, but you might still consider booting your nodes on demand.
--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From erwan at mandrakesoft.com Mon Nov 3 10:28:44 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Mon, 03 Nov 2003 16:28:44 +0100
Subject: Turn on nodes through the network
In-Reply-To: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
References: <20031103115124.77919.qmail@web12203.mail.yahoo.com>
Message-ID: <1067873324.902.44.camel@revolution.mandrakesoft.com>

On Mon 03/11/2003 at 12:51, Mathias Brito wrote:

> I finished my Beowulf cluster, and first of all I would
> like to thank everybody who helped me through this
> mailing list; now I'm sure that I'll have a lot of
> other problems in this new phase. And I already have
> one. I would like to know how to boot my
> machines (nodes) using the network; I would like the
> slave nodes to wake automatically when I turn on my
> master node. What can I do? Or, where can I
> find more information?

You can add a script to your rc.local that makes a series of calls to ether-wake. Your nodes must be wakeable over the network, but most new computers can do that.
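As a minimal sketch of what each such wake-up call amounts to (this is not the ether-wake source): a Wake-on-LAN "magic packet" is 6 bytes of 0xFF followed by the target MAC address repeated 16 times. The function names below are hypothetical, and sending the packet as a UDP broadcast to port 9 is an assumption made here for simplicity; ether-wake itself, as far as I know, sends a raw Ethernet frame instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAC_LEN 6
#define MAGIC_LEN (6 + 16 * MAC_LEN) /* 102 bytes */

/* Parse "00:11:22:33:44:55" into six bytes; 0 on success, -1 on error. */
int parse_mac(const char *text, unsigned char mac[MAC_LEN])
{
    unsigned int b[MAC_LEN];
    char extra;
    int i;

    assert(text != NULL && mac != NULL);
    /* The trailing %c only converts if junk follows the address. */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != MAC_LEN)
        return -1;
    for (i = 0; i < MAC_LEN; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/* Fill out[] with the magic packet: 6 x 0xFF, then the MAC 16 times. */
void build_magic_packet(const unsigned char mac[MAC_LEN],
                        unsigned char out[MAGIC_LEN])
{
    int rep;

    assert(mac != NULL && out != NULL);
    memset(out, 0xFF, 6);
    for (rep = 0; rep < 16; rep++)
        memcpy(out + 6 + rep * MAC_LEN, mac, MAC_LEN);
}

/* Broadcast the magic packet for one node over UDP (port 9 assumed).
 * Returns 0 on success, -1 on a bad address or a socket error. */
int wake_node(const char *mac_text)
{
    unsigned char mac[MAC_LEN], packet[MAGIC_LEN];
    struct sockaddr_in dest;
    int on = 1, fd, rc = -1;

    if (parse_mac(mac_text, mac) != 0)
        return -1;
    build_magic_packet(mac, packet);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(9);
    dest.sin_addr.s_addr = htonl(INADDR_BROADCAST);
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) == 0 &&
        sendto(fd, packet, sizeof packet, 0,
               (struct sockaddr *)&dest, sizeof dest) == (ssize_t)sizeof packet)
        rc = 0;
    close(fd);
    return rc;
}
```

A small program started from rc.local on the master could call wake_node() once per node MAC address, which is roughly the effect of a series of ether-wake calls. Whether a given NIC wakes on a UDP-carried packet depends on the hardware and BIOS settings.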
-- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 3 13:08:29 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 3 Nov 2003 13:08:29 -0500 (EST) Subject: Turn on nodes through the network In-Reply-To: Message-ID: > 1.5 support. That generally requires a Baseboard Management Controller > (BMC) on the motherboard, which adds $25-$150 to the price. I'd very much appreciate seeing an example of this. or do you mean "BMC adds $25-150 to the price of an already gold-plated system"? as a concrete example, Tyan's S2723 is a reasonable example of a board you might find in a cluster. IPMI/BMC is an option via the qlogic zircon, but I have never found a real price for it - one vendor quoted me a little under $Cdn 1000 for the daughtercard, which is just plain ridiculous for a <$500 motherboard. thanks, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Mon Nov 3 14:29:40 2003 From: djholm at fnal.gov (Don Holmgren) Date: Mon, 03 Nov 2003 13:29:40 -0600 Subject: Turn on nodes through the network In-Reply-To: References: Message-ID: On Mon, 3 Nov 2003, Mark Hahn wrote: > > 1.5 support. That generally requires a Baseboard Management Controller > > (BMC) on the motherboard, which adds $25-$150 to the price. > > I'd very much appreciate seeing an example of this. 
or do you mean
> "BMC adds $25-150 to the price of an already gold-plated system"?
>
> as a concrete example, Tyan's S2723 is a reasonable example of a board
> you might find in a cluster. IPMI/BMC is an option via the qlogic
> zircon, but I have never found a real price for it - one vendor quoted
> me a little under $Cdn 1000 for the daughtercard, which is just plain
> ridiculous for a <$500 motherboard.
>
> thanks, mark hahn.

On clusters we've built with Supermicro E7500 or E7501 chipset motherboards (P4DPE, X5DPE), which I believe are roughly equivalent to the Tyan S2723 in features and price, there's an IPMI/BMC option card based on the Agilent BMC available. We've paid between $90 and $100 for these cards, depending on volume.

I've not purchased Intel motherboards in quantity, but from doing a quick web search, it looks like the incremental price between boards without (SE7501CW2) and with (SE7501BR2) IPMI is no more than $150.

Don Holmgren

From lindahl at keyresearch.com Mon Nov 3 14:00:46 2003
From: lindahl at keyresearch.com (Greg Lindahl)
Date: Mon, 3 Nov 2003 11:00:46 -0800
Subject: opteron VS Itanium 2
In-Reply-To:
References:
Message-ID: <20031103190046.GF1167@greglaptop.internal.keyresearch.com>

On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote:

> Yeah, me too. As someone who just ponied up for a rather large IB
> installation, I'm not sure that most people realize what a substantial
> percentage of the cost of the cluster the IB might be.

From all public indications, IB prices are roughly the same as Myrinet. Nothing new there...
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Mon Nov 3 14:31:40 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Mon, 3 Nov 2003 13:31:40 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <20031103190046.GF1167@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 3 Nov 2003, Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: > > > Yeah, me too. As someone who just ponied up for a rather large IB > > installation, I'm not sure that most people realize what a substantial > > percentage of the cost of the cluster the IB might be. > > From all public indications, IB prices are roughly the same as > Myrinet. Nothing new there... > > -- greg > IB costs significantly more than Myrinet... -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Mon Nov 3 15:07:24 2003 From: michael.worsham at mci.com (Michael Worsham) Date: Mon, 03 Nov 2003 15:07:24 -0500 Subject: Freebee RH Releases... Message-ID: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> As Per Slashdot (http://slashdot.org/) Received a missive this morning from the Red Hat Network, stating that they will discontinue maintenance on Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the end of April, 2004. And, more ominously: 'Red Hat does not plan to release another product in the Red Hat Linux line.' [The full text of the email is on Newsforge.] 
Does this mean that we will all have to using WS or ES version of RedHat, thus getting ripped a bit on support and updates? Anyone have a cluster running on anything else non-RH based and any details for how to do it? -- Michael _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Nov 3 16:51:17 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 3 Nov 2003 13:51:17 -0800 (PST) Subject: Freebee RH Releases... In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: On Mon, 3 Nov 2003, Michael Worsham wrote: > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network, stating that they > will discontinue maintenance on > Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the end > of April, 2004. And, more ominously: 'Red Hat does not plan to release > another product in the Red Hat Linux line.' [The full text > of the email is > on Newsforge.] > > Does this mean that we will all have to using WS or ES version of RedHat, > thus getting ripped a bit on support and updates? Anyone have a cluster > running on anything else non-RH based and any details for how to do it? 
fedora.redhat.com

> -- Michael

--
--------------------------------------------------------------------------
Joel Jaeggli           Unix Consulting           joelja at darkwing.uoregon.edu
GPG Key Fingerprint:   5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

From becker at scyld.com Mon Nov 3 17:36:16 2003
From: becker at scyld.com (Donald Becker)
Date: Mon, 3 Nov 2003 17:36:16 -0500 (EST)
Subject: Turn on nodes through the network
In-Reply-To:
Message-ID:

On Mon, 3 Nov 2003, Mark Hahn wrote:

> > 1.5 support. That generally requires a Baseboard Management Controller
> > (BMC) on the motherboard, which adds $25-$150 to the price.
>
> I'd very much appreciate seeing an example of this. or do you mean
> "BMC adds $25-150 to the price of an already gold-plated system"?

I've seen a quote of +$26 to populate the BMC chip on a board that supported it. Or more precisely, -$26 to delete the chip from the standard config. While not a low-end board, this was definitely a motherboard in the commodity range.

Daughtercard implementations cost at the high end of the range, $70-$150, if you can find them.

> as a concrete example, Tyan's S2723 is a reasonable example of a board
> you might find in a cluster. IPMI/BMC is an option via the qlogic
> zircon, but I have never found a real price for it - one vendor quoted
> me a little under $Cdn 1000 for the daughtercard, which is just plain
> ridiculous for a <$500 motherboard.

That means "we don't know how much it costs, but for $1K we'll find out".

Much like the old VGA feature connector or IRDA header, a BMC header is worthless.
The only way you'll get a BMC is when one is packaged and priced (and tested) with the motherboard. The Zircon chip is easily the most common controller, but it seems that the firmware must be tweaked for each implementation.

--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From rgb at phy.duke.edu Mon Nov 3 18:27:50 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 3 Nov 2003 18:27:50 -0500 (EST)
Subject: Freebee RH Releases...
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com>
Message-ID:

On Mon, 3 Nov 2003, Michael Worsham wrote:

> Does this mean that we will all have to use the WS or ES version of RedHat,
> thus getting ripped a bit on support and updates? Anyone have a cluster
> running on anything else non-RH based and any details for how to do it?

Fedora, fedora, fedora.

    http://fedora.redhat.com/

AFAICT, this is going to de facto be "Red Hat 10" (except that one can probably not say that because of trademark issues and so forth), but with more "community involvement". Community involvement that will probably be a GOOD thing, by the way.

Fedora will come pre-yummified at the core and will have RH engineers continuing to be heavily involved. This is only sensible as I expect fedora to become the core of RH's development cycle, as they aren't going to be able to offer "rawhide" of any sort at RHE prices. However, with the community really participating, I also don't expect fedora to be in any sense "alpha" or "beta" versions of RHE -- more like RHL, reasonably stable, reasonably supported, but don't expect to be able to just call RH and demand to talk to a systems engineer and get help.
Which, by the way, one doesn't really do now.

So nobody RHish panic, just start looking into fedora, maybe join its list(s).

BTW, I expect there to be opteron support in fedora pretty soon as well. There better be; I'm getting a bunch of them...;-)

   rgb

>
> -- Michael

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

From rgb at phy.duke.edu Mon Nov 3 18:44:18 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 3 Nov 2003 18:44:18 -0500 (EST)
Subject: Cluster Poll Results (tangent into OS choices) (fwd)
Message-ID:

Andrew sent me this but forgot to add the list address, so I'm forwarding it on to the list for him...:-)

I'll probably send my reply to this later.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

---------- Forwarded message ----------
Date: Sat, 1 Nov 2003 19:48:52 +0000
From: Andrew M.A. Cater
To: Robert G. Brown
Subject: Re: Cluster Poll Results (tangent into OS choices)

On Sat, Nov 01, 2003 at 10:40:24AM -0500, Robert G. Brown wrote:
> On Fri, 31 Oct 2003, Joel Jaeggli wrote:
>
> > > Who says you have to pay 1024*$792? Why not only 1 license? AFAIK you may use that binary image as you like inside your cluster since it is covered by GPL, but you can't
> > > claim support from RH for more than one of the systems.
> >
> > read the license agreement for your redhat enterprise disks...
> RH can request that they be allowed to audit your cluster, IIRC.

I think the idea is that RH Enterprise [Server/Advanced Server/Workstation] is trademarked, copyrighted and contains some non-GPL portions. RH can therefore insist that you install only one copy per single machine as per your licence - you can't just copy the binaries and put the one copy on to your other 1023 nodes. But you do get (up to) five years support. [You can't buy RHE without support, IIRC].

If you add non-GPL software to an otherwise GPL'd distribution, you can charge for it: you can also restrict the use of the whole distribution thus created, as I understand it, because it contains your proprietary code. If you modify GPL'd software in order to create your proprietary added value, however, then that modified software must in turn be available under the GPL.

FWIW, SUSE operate the same way: you can't buy SUSE .iso's unless you buy the box and you are not licensed to make copies thereof. [SUSE do, however, make it possible to install the whole distribution via ftp from their site].

>
> Note the phrase "free software". Note also the various inheritance
> clauses. I'm not a lawyer; I don't know how this would ultimately
> untangle in a court if someone chose to just ignore RH's license
> agreement and install things as they wished, but I'm sure we'll
> eventually find out...;-)
>

I trained as an (English) lawyer - but didn't pursue that to practice. It would depend on the jurisdiction. Another potentially good reason to go with Debian - which doesn't restrict use, modification, distribution or field of endeavour. I won't mention other purported Linux distributions which now require you to sign a non-GPL licence before you can download GPL licenced updates but that too is an interesting case. :)

Andy

[Potentially OT PS: Yum appears to be a re-invention of apt functionality with some improvements.
You've hit the same dependency problems that may already have been solved by apt three years ago. The _real_ trick is to sort out dependency issues properly at a distribution-wide level. My problem with RH has always been that RH doesn't include enough packages, and those packages that are not packaged directly by RH can be of variable quality - hence digging the package out from the 'Net somewhere, and DLL hell. (It also doesn't help that there are five or six vendors out there with "different" .rpms of the same thing for SUSE/Mandrake/RH7.x/8.0/9.0 because RPM has been interpreted/implemented in a variable way).

Debian isn't perfect - a spell spent on reading the mailing list archives would _easily_ prove this :) - but, perversely, having 8710 packages in the "stable" tree and about 14000 in unstable has meant that the main distribution often contains exactly the package you were looking for, ready to drop in.

[It's also quite useful, for example, to re-use legacy lab hardware and run your new cluster on Opterons but display the pretty output graphs on your Suns and do post-processing of the data on your old Alphas. I couldn't do that with the same versions of the software on all three architectures on any other current GNU/Linux distribution :) ]

For those who haven't used Debian and wonder what all the fuss is about:

    apt-cache show foo

will give you all the details of foo

    apt-get install foo

will install foo and all its dependencies in one operation. And by using the following command line

    apt-get update ; apt-get dist-upgrade

your entire machine will be brought up to date.

[Where apt-get update updates your package list and apt-get dist-upgrade resolves the interdependencies and fetches the necessary packages.]
] _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 3 17:30:51 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 03 Nov 2003 17:30:51 -0500 Subject: opteron VS Itanium 2 In-Reply-To: References: Message-ID: <3FA6D71B.1020900@comcast.net> Rocky McGaugh wrote: >On Mon, 3 Nov 2003, Greg Lindahl wrote: > > > >>On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote: >> >> >> >>>Yeah, me too. As someone who just ponied up for a rather large IB >>>installation, I'm not sure that most people realize what a substantial >>>percentage of the cost of the cluster the IB might be. >>> >>> >>From all public indications, IB prices are roughly the same as >>Myrinet. Nothing new there... >> >>-- greg >> >> >> > >IB costs significantly more than Myrinet... > > Are you sure? In the quotes I've gotten, it's about the same as Myrinet except for very small clusters (perhaps 4 nodes or less). In fact in some cases, it's cheaper than Myrinet. :) Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 3 18:50:33 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 03 Nov 2003 18:50:33 -0500 Subject: Freebee RH Releases... In-Reply-To: References: Message-ID: <3FA6E9C9.7010804@comcast.net> Robert G. Brown wrote: >On Mon, 3 Nov 2003, Michael Worsham wrote: > > >>Does this mean that we will all have to using WS or ES version of RedHat, >>thus getting ripped a bit on support and updates? Anyone have a cluster >>running on anything else non-RH based and any details for how to do it? >> >> > >Fedora, fedora, fedora. 
> > http://fedora.redhat.com/ > >AFAICT, this is going to de facto be "Red Hat 10" (except that one can >probably not say that because of trademark issues and so forth), but >with more "community involvement". Community involvement that will >probably be a GOOD thing, by the way. > >Fedora will come pre-yummified at the core and will have RH engineers >continuing to be heavily involved. This is only sensible as I expect >fedora to become the core of RH's development cycle, as they aren't >going to be able to offer "rawhide" of any sort at RHE prices. However, >with the community really participating, I also don't expect fedora to >be in any sense "alpha" or "beta" versions of RHE -- more like RHL, >reasonably stable, reasonably supported, but don't expect to be able to >just call RH and demand to talk to a systems engineer and get help. > >Which, by the way, one doesn't really do now. > >So nobody RHish panic, just start looking into fedora, maybe join its >list(s). > >BTW, I expect there to be opteron support in fedora pretty soon as well. >There better be; I'm getting a bunch of them...;-) > Let me mention cAos (caosity.org). It's a community supported RH based OS built on RH EL (3.0 I think). It's following the letter of the law in removing trademarks from RH EL and rebuilding the distribution with some add features. If you look at the list of people involved, I think you'll see some familiar names from this mailing list. Consequently, I think cAos will be built with clusters in mind. Anyway, just a suggestion. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Mon Nov 3 18:31:03 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Mon, 3 Nov 2003 17:31:03 -0600 (CST) Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: I can hold my tongue no longer. Most of us are faced with similar problems. Several groups are in process of making a freely distributable OS for scientific use. The cAos project (http://www.caosity.org) is one such, and one that I feel is worth looking into. The website does not reflect their expanded goals as well as it could. cAos seeks to fill many needs. There will be four main flavors of cAos to match the needs expressed by the community. They are best described here: http://caosity.org/pipermail/caos/2003-September/000385.html cAosel-2 will be built from the SRPMS located at: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS It is not yet ready for release, but it is close. It should provide a long-term base that will be great for use in clustering and server environments. There will be x86, x86_64, and ia64 versions. Other flavors of cAos will provide a good base for scientific desktops. -- Rocky McGaugh _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Mon Nov 3 18:51:41 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 16:51:41 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031031203745.GU1408@aminor.cs.uiuc.edu>; from weideng@uiuc.edu on Fri, Oct 31, 2003 at 02:37:45PM -0600 References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> Message-ID: <20031103165141.A3153@lnxi.com> On Fri, Oct 31 2003 at 13:37, Wei Deng wrote: > On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > > - OSCAR / Rocks / etc... - generally installed on top of another > > distribution. We still have to pick a base distribution. 
> > From what I heard from Rocks mailing list, they will release 3.1.0 the > next Month, which will be based on RHEL 3.0, compiled from source code > that is publicly available, and free of charge. Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test for corporations trying to coexist and actually work with Red Hat. Why not focus that questionable rebuilding effort on a more worthwhile task? E.g. porting Fedora Core to support amd64, ia64, etc; adding features to Fedora Core that are relevant to clustering, etc. > Even though Rocks is based on RedHat distribution, it is complete, which > means you only need to download Rocks ISOs to accomplish your > installation. All well and good, but basing a "complete" clustering solution on a reverse engineered RHEL is completely underhanded and wrong (regardless of whether you feel RH is being greedy or whatever). Ripping off RHEL is a pretty cheap contribution to the advancement of free clustering technology. But maybe this type of thing gets peoples' ROCKS off? Mike (these views are my own; I just happen to work for a clustering company ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmkurtzer at lbl.gov Mon Nov 3 20:42:19 2003 From: gmkurtzer at lbl.gov (Greg Kurtzer) Date: Mon, 3 Nov 2003 17:42:19 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103165141.A3153@lnxi.com> References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> <20031103165141.A3153@lnxi.com> Message-ID: <20031104014219.GB32428@tux.lbl.gov> On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > for corporations trying to coexist and actually work with Red Hat. 
Why > not focus that questionable rebuilding effort on a more worthwhile task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > Fedora Core that are relevant to clustering, etc. I guess what some would consider a worth while task others would consider a waste of time. From what I see, Fedora core is an unreasonable solution for me and I will not be contributing to it while RH holds every seat on the steering committee and rules all directions. Not that I have anything against RH, it is just that there is a major conflict of interest, don't you think? If Fedora gets too good, won't it take business from RHEL? > > Even though Rocks is based on RedHat distribution, it is complete, which > > means you only need to download Rocks ISOs to accomplish your > > installation. > > All well and good, but basing a "complete" clustering solution on a reverse > engineered RHEL is completely underhanded and wrong (regardless of whether > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > cheap contribution to the advancement of free clustering technology. But > maybe this type of thing gets peoples' ROCKS off? Uhmm, what is reversed engineered? The source _is_ open ya know... ;) Not that I have anything against what RH is doing, but to prove a point... Isn't RH taking code from the community, and selling it back to the community with limitations on redistribution? It seems to me that to accuse the community of "ripping off" OSS software is a bit harsh. So as RH has stated, their business model is not about the code, rather their support models around the code, and their trademark. Now I do want to mention that I think that RH's new direction is what is needed for Linux to become a suitable Enterprise solution. This move however left a vacancy in the community which is why projects are emerging or changing direction to fix this. It is OSS evolution (see: http://caosity.org/). 
> (these views are my own; I just happen to work for a clustering company ;) My views are also mine and not necessarily shared by my employers. ;) Greg -- Greg M. Kurtzer, CSE: Linux cluster specialist Lawrence Berkeley National Laboratory Contact: O=510.495.2307, P=510.448.4540, M=510.928.9953 1 Cyclotron Road MS:50C-3396, Berkeley, CA 94720 http://www.lbl.gov, http://scs.lbl.gov/, http://lug.lbl.gov/ Email: GMKurtzer_at_lbl.gov, Text: 5109289953_at_mobileatt.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Nov 3 21:08:34 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Mon, 03 Nov 2003 21:08:34 -0500 Subject: Freebee RH Releases... In-Reply-To: <3FA6E9C9.7010804@comcast.net> References: <3FA6E9C9.7010804@comcast.net> Message-ID: <1067911714.4434.32.camel@protein.scalableinformatics.com> On Mon, 2003-11-03 at 18:50, Jeffrey B. Layton wrote: > Let me mention cAos (caosity.org). It's a community supported RH > based OS built on RH EL (3.0 I think). It's following the letter of the > law in removing trademarks from RH EL and rebuilding the distribution > with some add features. > If you look at the list of people involved, I think you'll see some > familiar names from this mailing list. Consequently, I think cAos will > be built with clusters in mind. On a related note, has anyone played with Warewulf? I'd like to hear back from end users who have built their clusters using this system. Please send me email offline so as not to pollute the ongoing RH discussion... 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 3 21:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 3 Nov 2003 18:13:21 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) - options In-Reply-To: <20031103165141.A3153@lnxi.com> Message-ID: hi ya On Mon, 3 Nov 2003, Mike Snitzer wrote: > On Fri, Oct 31 2003 at 13:37, > Wei Deng wrote: > > > On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > > > - OSCAR / Rocks / etc... - generally installed on top of another > > > distribution. We still have to pick a base distribution. > > > > From what I heard from Rocks mailing list, they will release 3.1.0 the > > next Month, which will be based on RHEL 3.0, compiled from source code > > that is publicly available, and free of charge. > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > for corporations trying to coexist and actually work with Red Hat. Why > not focus that questionable rebuilding effort on a more worthwhile task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > Fedora Core that are relevant to clustering, etc. i think that any proprietary sw should be avoided if it requires $$$ and licenses unfortunately, sometimes, 3rd party sw is built and tested against things like RH - AS and its permutations and derivatives .. so my feeling is to avoid those 3rd party vendors too - it's a choice of: - pay RH licenses ( cheaper than an inhouse $150K/yr linux guy ??) 
- get an in-house linux dude that can support all the GPL stuff and tweak it to your needs/requirements - buy/get a "free" distro that has most of the apps you need - working on the latest/greatest pre-release or beta/alpha release implies you have lots of in-house development expertise or the ability to manipulate the vendor's priorities to fix the bugs you find - am thinking, ( naively ? ) why is it so hard to get a distro that does what one needs and avoid "license fees" - what's so special ... - in every instance that a 3rd party vendor required xx-linux-version-0.5 ... i've been able to make those apps work on the latest/greatest version of said vendor or other distro - reading the various sw licenses is also a full time job too :-) - support should be done by in-house staff, or outsourced to expensive "support outfits" like rh, ibm, and few others - doing support in house is best, if staff is available, in which case, the fact that redhat is not providing support for older distros is a non-issue .. - the fact that redhat and other distros want to collect license fees or break something that used to work in prior releases is a big problem in my book, especially when 90%-99% of the apps that are running are all GPL'd - but, on the bright side, at least rh, is directly or indirectly doing and supporting a lot of development work that is released as gpl have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Mon Nov 3 21:02:23 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Mon, 3 Nov 2003 21:02:23 -0500 Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> References: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: <20031103210223.C1594@www2> I thought that this article: http://news.com.com/2100-7344_3-5094774.html did a pretty good job of explaining what Red Hat is up to, and what some of the implications are. --Bob On Mon, Nov 03, 2003 at 03:07:24PM -0500, Michael Worsham wrote: > > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Mon Nov 3 22:19:48 2003 From: jsims at csiopen.com (Joey Sims) Date: Mon, 3 Nov 2003 22:19:48 -0500 Subject: IB vs Myrinet Message-ID: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> I believe IB is a much better interconnect technology than Myrinet, period. Plus, you don't have to deal with Myricom. IB is about to find major traction in this industry and Myricom will not have the guns to stop it. As adoption rates increase, the price will decrease quite rapidly. I've been working with Mellanox and Topspin. Both use Mellanox chips, but their product positioning is different: Topspin offers a more "value added" flavor of Mellanox silicon with various hardware tweaks and a more robust software package. It depends on how you're looking at the cost of IB. First of all, it's comparable to Myrinet in "cost per port". Not too long ago, Myrinet was higher in price than IB is today, and they haven't come out with anything "new" in forever. Well, except a PCI-X version when PCI Express is around the corner. Myricom has a lot of installations worldwide and they are highly credible without a doubt, but this industry moves very fast and new things are not a new thing. 
At 3x the performance of Myrinet, "comparable" is still a better value. IB has many different options such as bridging between IB, GbE, or FC so you could hang your storage boxes off the IB switch without much hassle. Up to 10 Gb/s is fairly fat today. The roadmap for IB has this interconnect technology ratcheted up way higher than 10 Gb/s. Regards, ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Mon Nov 3 22:50:42 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: Mon, 03 Nov 2003 21:50:42 -0600 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <1067917842.3219.88.camel@terra> On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. > Hmmm, all my dealings with Myricom have been excellent. We had a frame failure right before a holiday and they happily cross-shipped a replacement. We were back in business very quickly. All our questions of support have been answered quickly and accurately. -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at cimav.edu.mx Mon Nov 3 21:13:32 2003 From: luis.licon at cimav.edu.mx (luis.licon at cimav.edu.mx) Date: Mon, 3 Nov 2003 19:13:32 -0700 (MST) Subject: Freebee RH Releases... 
In-Reply-To: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> References: <003001c3a246$19aeea90$9e9f32a6@Wcomnet.com> Message-ID: <37053.148.223.46.10.1067912012.squirrel@www3.cimav.edu.mx> Fedora ;) (http://fedora.redhat.com) cheerz, Luis > As Per Slashdot (http://slashdot.org/) > > Received a missive this morning from the Red Hat Network, stating that > they > will discontinue maintenance > on > Red Hat Linux 7.x and 8.0 by the end of 2003, and on Red Hat 9.0 by the > end > of April, 2004. And, more ominously: 'Red Hat does not plan to release > another product in the Red Hat Linux line.' [The full text > of the email is > on Newsforge.] > > Does this mean that we will all have to using WS or ES version of RedHat, > thus getting ripped a bit on support and updates? Anyone have a cluster > running on anything else non-RH based and any details for how to do it? > > -- Michael > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Mon Nov 3 22:48:08 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Mon, 3 Nov 2003 19:48:08 -0800 Subject: Fwd: Cluster Poll Results (tangent into OS choices) Message-ID: Begin forwarded message: > From: Glen Otero > Date: Mon Nov 3, 2003 6:42:09 PM US/Pacific > To: beowulf at beowulf.org, Mike Snitzer > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: Cluster Poll Results (tangent into OS choices) > > > On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > >> On Fri, Oct 31 2003 at 13:37, >> Wei Deng wrote: >> >>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>>> - OSCAR / Rocks / etc... 
- generally installed on top of another >>>> distribution. We still have to pick a base distribution. >>> >>> From what I heard from Rocks mailing list, they will release 3.1.0 >>> the >>> next Month, which will be based on RHEL 3.0, compiled from source >>> code >>> that is publicly available, and free of charge. >> >> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >> smile-test >> for corporations trying to coexist and actually work with Red Hat. > > Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with > and/or sell Rocks-based clusters? Because it won't pass the smile test > inside a corporation? > >> Why >> not focus that questionable rebuilding effort on a more worthwhile >> task? >> E.g. porting Fedora Core to support amd64, ia64, etc; adding features >> to >> Fedora Core that are relevant to clustering, etc. >> >>> Even though Rocks is based on RedHat distribution, it is complete, >>> which >>> means you only need to download Rocks ISOs to accomplish your >>> installation. >> >> All well and good, but basing a "complete" clustering solution on a >> reverse >> engineered RHEL is completely underhanded and wrong (regardless of >> whether >> you feel RH is being greedy or whatever). Ripping off RHEL is a >> pretty >> cheap contribution to the advancement of free clustering technology. >> But >> maybe this type of thing gets peoples' ROCKS off? > > It's hardly reverse engineered, underhanded, or wrong. The Rocks guys > have been releasing their software for years based on standard Red Hat > releases. In order to make their cluster software freely available on > ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They > also had planned to base the Rocks software on RH9 in the near future, > but RH decided to stop supporting everything but RHEL. 
So, in order to > continue to provide the community with the latest and greatest > clustering software with a Red Hat foundation, the Rocks guys are > migrating to a RHEL release. And in order to keep it free of charge, > they are building it all from scratch using RHEL srpms. And don't > think they are pulling one over on Red Hat or ripping Red Hat off. The > Rocks crew communicates frequently with Red Hat regarding these very > issues. Red Hat knows exactly what they are doing and supports it. > Besides, the technology that makes Rocks what it is is hardly due to > anything Red Hat creates. It's all the software that the Rocks crew > has written and packaged on top of Red Hat that matters. > >> Mike >> >> (these views are my own; I just happen to work for a clustering >> company ;) > > These views are my own. I just happen to own a clustering company. > > Glen Otero, Ph.D. > Linux Prophet > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Mon Nov 3 23:14:43 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Mon, 3 Nov 2003 20:14:43 -0800 Subject: Fwd: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS choices) Message-ID: <6AB8D414-0E7D-11D8-9947-000393911A90@linuxprophet.com> Begin forwarded message: > From: "Philip Papadopoulos" > Date: Mon Nov 3, 2003 7:40:44 PM US/Pacific > To: "Glen Otero" > To: npaci-rocks-discussion-admin at sdsc.edu > To: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS > choices) > Reply-To: phil at sdsc.edu > > Since I don't read the beowulf list ... Somebody can forward. > > 1) Rocks isn't cheaply ripping off redhat. 
It certainly is our right > under gpl to provide the distro in the fashion we do. > 2) I would like redhat to have a cluster pricing that > academics/companies can afford and makes sense so they don't opt out > of using a tested product. That price is > $0. > 3) Redhat does an immense amount of work for the entire linux > community. I'd like them to have a way for us to give them a > reasonable amount of money for clusters. It takes real people, time, > and money to build a complete, tested distro. > 4) Fedora is really a rolling beta. The community needs this. But, > people who run "production" clusters desire regression-tested distros > and more slowly moving software. > 5) I don't work for or run a clustering company. I don't own stock in > redhat. > > 6) Open source means freedom in software. It doesn't mean free beer. > > -p > -----Original Message----- > From: Glen Otero > Date: Mon, 3 Nov 2003 18:49:51 > To:npaci-rocks-discussion at sdsc.edu > Subject: [Rocks-Discuss]Fwd: Cluster Poll Results (tangent into OS > choices) > > > > Begin forwarded message: > >> From: Glen Otero >> Date: Mon Nov 3, 2003 6:42:09 PM US/Pacific >> To: beowulf at beowulf.org, Mike Snitzer >> Cc: npaci-rocks-discussion at sdsc.edu >> Subject: Re: Cluster Poll Results (tangent into OS choices) >> >> >> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: >> >>> On Fri, Oct 31 2003 at 13:37, >>> Wei Deng wrote: >>> >>>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>>>> - OSCAR / Rocks / etc... - generally installed on top of another >>>>> distribution. We still have to pick a base distribution. >>>> >>>> From what I heard from Rocks mailing list, they will release 3.1.0 >>>> the >>>> next Month, which will be based on RHEL 3.0, compiled from source >>>> code >>>> that is publicly available, and free of charge. 
>>> >>> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >>> smile-test >>> for corporations trying to coexist and actually work with Red Hat. >> >> Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with >> and/or sell Rocks-based clusters? Because it won't pass the smile test >> inside a corporation? >> >>> Why >>> not focus that questionable rebuilding effort on a more worthwhile >>> task? >>> E.g. porting Fedora Core to support amd64, ia64, etc; adding features >>> to >>> Fedora Core that are relevant to clustering, etc. >>> >>>> Even though Rocks is based on RedHat distribution, it is complete, >>>> which >>>> means you only need to download Rocks ISOs to accomplish your >>>> installation. >>> >>> All well and good, but basing a "complete" clustering solution on a >>> reverse >>> engineered RHEL is completely underhanded and wrong (regardless of >>> whether >>> you feel RH is being greedy or whatever). Ripping off RHEL is a >>> pretty >>> cheap contribution to the advancement of free clustering technology. >>> But >>> maybe this type of thing gets peoples' ROCKS off? >> >> It's hardly reverse engineered, underhanded, or wrong. The Rocks guys >> have been releasing their software for years based on standard Red Hat >> releases. In order to make their cluster software freely available on >> ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They >> also had planned to base the Rocks software on RH9 in the near future, >> but RH decided to stop supporting everything but RHEL. So, in order to >> continue to provide the community with the latest and greatest >> clustering software with a Red Hat foundation, the Rocks guys are >> migrating to a RHEL release. And in order to keep it free of charge, >> they are building it all from scratch using RHEL srpms. And don't >> think they are pulling one over on Red Hat or ripping Red Hat off. The >> Rocks crew communicates frequently with Red Hat regarding these very >> issues. 
Red Hat knows exactly what they are doing and supports it. >> Besides, the technology that makes Rocks what it is is hardly due to >> anything Red Hat creates. It's all the software that the Rocks crew >> has written and packaged on top of Red Hat that matters. >> >>> Mike >>> >>> (these views are my own; I just happen to work for a clustering >>> company ;) >> >> These views are my own. I just happen to own a clustering company. >> >> Glen Otero, Ph.D. >> Linux Prophet > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > > Sent via BlackBerry - a service from AT&T Wireless. > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at callident.com Mon Nov 3 21:42:09 2003 From: glen at callident.com (Glen Otero) Date: Mon, 3 Nov 2003 18:42:09 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103165141.A3153@lnxi.com> Message-ID: <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > On Fri, Oct 31 2003 at 13:37, > Wei Deng wrote: > >> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>> - OSCAR / Rocks / etc... - generally installed on top of another >>> distribution. We still have to pick a base distribution. >> >> From what I heard from Rocks mailing list, they will release 3.1.0 the >> next Month, which will be based on RHEL 3.0, compiled from source code >> that is publicly available, and free of charge. > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the > smile-test > for corporations trying to coexist and actually work with Red Hat. Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with and/or sell Rocks-based clusters? Because it won't pass the smile test inside a corporation? 
> Why > not focus that questionable rebuilding effort on a more worthwhile > task? > E.g. porting Fedora Core to support amd64, ia64, etc; adding features > to > Fedora Core that are relevant to clustering, etc. > >> Even though Rocks is based on RedHat distribution, it is complete, >> which >> means you only need to download Rocks ISOs to accomplish your >> installation. > > All well and good, but basing a "complete" clustering solution on a > reverse > engineered RHEL is completely underhanded and wrong (regardless of > whether > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > cheap contribution to the advancement of free clustering technology. > But > maybe this type of thing gets peoples' ROCKS off? It's hardly reverse engineered, underhanded, or wrong. The Rocks guys have been releasing their software for years based on standard Red Hat releases. In order to make their cluster software freely available on ia64, they built RH AS 2.1 from srpms, which is perfectly legal. They also had planned to base the Rocks software on RH9 in the near future, but RH decided to stop supporting everything but RHEL. So, in order to continue to provide the community with the latest and greatest clustering software with a Red Hat foundation, the Rocks guys are migrating to a RHEL release. And in order to keep it free of charge, they are building it all from scratch using RHEL srpms. And don't think they are pulling one over on Red Hat or ripping Red Hat off. The Rocks crew communicates frequently with Red Hat regarding these very issues. Red Hat knows exactly what they are doing and supports it. Besides, the technology that makes Rocks what it is is hardly due to anything Red Hat creates. It's all the software that the Rocks crew has written and packaged on top of Red Hat that matters. > Mike > > (these views are my own; I just happen to work for a clustering > company ;) These views are my own. I just happen to own a clustering company. Glen Otero, Ph.D. 
Linux Prophet Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Mon Nov 3 23:28:49 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 21:28:49 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104014219.GB32428@tux.lbl.gov>; from gmkurtzer@lbl.gov on Mon, Nov 03, 2003 at 05:42:19PM -0800 References: <1067629499.21719.73.camel@localhost.localdomain> <20031031203745.GU1408@aminor.cs.uiuc.edu> <20031103165141.A3153@lnxi.com> <20031104014219.GB32428@tux.lbl.gov> Message-ID: <20031103212849.A4021@lnxi.com> On Mon, Nov 03 2003 at 18:42, Greg Kurtzer wrote: > On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > > for corporations trying to coexist and actually work with Red Hat. Why > > not focus that questionable rebuilding effort on a more worthwhile task? > > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > > Fedora Core that are relevant to clustering, etc. > > I guess what some would consider a worth while task others would consider a > waste of time. From what I see, Fedora core is an unreasonable solution for me > and I will not be contributing to it while RH holds every seat on the steering > committee and rules all directions. Not that I have anything against RH, it is > just that there is a major conflict of interest, don't you think? > > If Fedora gets too good, won't it take business from RHEL? I have the same concerns but think it would be better to challenge the level of control that the RedHat-only committee will exert on the Fedora Project sooner rather than later. 
Below you reference how RedHat says it's not about the code; so why should Red Hat _really_ care if Fedora is even better than the enterprise offering? If RedHat holds Fedora too close to their chest they'll give people a _real_ reason to defect to other solutions. > > > Even though Rocks is based on RedHat distribution, it is complete, which > > > means you only need to download Rocks ISOs to accomplish your > > > installation. > > > > All well and good, but basing a "complete" clustering solution on a reverse > > engineered RHEL is completely underhanded and wrong (regardless of whether > > you feel RH is being greedy or whatever). Ripping off RHEL is a pretty > > cheap contribution to the advancement of free clustering technology. But > > maybe this type of thing gets peoples' ROCKS off? > > Uhmm, what is reversed engineered? The source _is_ open ya know... ;) Yeap, reverse engineered is the wrong term; how about time spent uncovering what is RH-specific that needs to be removed/replaced? I'd be inclined to say that the sustained engineering effort that is proposed for cAosel would be better spent innovating Fedora; but maybe that's just me. Today, RedHat developers openly stated on the fedora-devel list that RHELv3 code (specifically amd64 code) is open for all to filter into Fedora. Now that's a true test of the RedHat-only committee, no? > Not that I have anything against what RH is doing, but to prove a point... > Isn't RH taking code from the community, and selling it back to the community > with limitations on redistribution? It seems to me that to accuse the > community of "ripping off" OSS software is a bit harsh. > > So as RH has stated, their business model is not about the code, rather their > support models around the code, and their trademark. > > Now I do want to mention that I think that RH's new direction is what is > needed for Linux to become a suitable Enterprise solution. 
This move however > left a vacancy in the community which is why projects are emerging or changing > direction to fix this. It is OSS evolution (see: http://caosity.org/). Fair enough, but keep in mind that the polished innovations that RedHat has put into the Red Hat product are free too; hence the ability to just rebuild their RHEL SRPMs. Red Hat realized there was a large segment of the OSS community that would be left in the cold by their move; they balanced that fact with Fedora. Conspiracy theories on the RedHat-only committee aside, Fedora is a pretty good peace offering. Time will tell if Fedora truly is good for OSS; but to just go off and further splinter the RPM-based Linux distro space (with cAos, or whatever) is short-sighted. OSCAR, ROCKS, and Warewulf could very easily take the time to make Fedora into what they need it to be. Given that moderately innovative competing solutions to the same problem have been the chosen path for clustering, why not seal the same fate for the Linux distributions that they're based on, right? This is a fun debate, but might be too off-topic... feel free to email me either way. Thanks, Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Mon Nov 3 23:13:32 2003 From: jsims at csiopen.com (Joey Sims) Date: Mon, 3 Nov 2003 23:13:32 -0500 Subject: IB vs Myrinet Message-ID: <812B16724C38EE45A802B03DD01FD547226268@exchange.concen.com> Hello Dean, I was stating this opinion from a manufacturer's viewpoint. A viewpoint expressed outside of their circle of distribution partners. You have to have outstanding products, service and support to be as successful as Myricom. -joey ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. 
jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Nov 3 23:58:16 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 03 Nov 2003 23:58:16 -0500 Subject: Fwd: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <3FA731E8.2060603@scalableinformatics.com> Glen Otero wrote: > > > Begin forwarded message: > >> On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: >> >>> On Fri, Oct 31 2003 at 13:37, >>> Wei Deng wrote: >>> >>>> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: >>> [...] >>> >>> Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the >>> smile-test >>> for corporations trying to coexist and actually work with Red Hat. >> >> >> Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with >> and/or sell Rocks-based clusters? Because it won't pass the smile >> test inside a corporation? > The "smile" test? I thought it was all about risks, support, etc. ROCKS appears to be in significant use as indicated by the ROCKS counter page. Remember that RedHat's added value is in packaging, bug fixes, etc. They bundle many peoples' code (Don's and probably a number of others here). They have added value back to the community as a whole. That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. >> >>> Why >>> not focus that questionable rebuilding effort on a more worthwhile >>> task? >>> E.g. porting Fedora Core to support amd64, ia64, etc; adding >>> features to >>> Fedora Core that are relevant to clustering, etc. >>> I would argue that Fedora is more like a permanent beta. 
It doesn't look like we will get good things into Fedora anytime soon (x86_64, XFS et al), and the release/support cycle is too short to be useful for long term customer support. The risks of that platform would be somewhat high for a commercial deployment, and I would find it hard to justify installing this for a customer knowing full well that next year, they will be without support. >>>> Even though Rocks is based on RedHat distribution, it is complete, >>>> which >>>> means you only need to download Rocks ISOs to accomplish your >>>> installation. >>> >>> >>> All well and good, but basing a "complete" clustering solution on a >>> reverse >>> engineered RHEL is completely underhanded and wrong (regardless of >>> whether >>> you feel RH is being greedy or whatever). Ripping off RHEL is a pretty >>> cheap contribution to the advancement of free clustering >>> technology. But >>> maybe this type of thing gets peoples' ROCKS off? >> Ripping off RedHat? I thought they were packaging GPL and similar software... how is taking GPL software which is Libre' and redistributing recompiled versions of it (allowable under the license) ripping off the folks who have their own packaging of it? >> >>> Mike >>> >>> (these views are my own; I just happen to work for a clustering >>> company ;) >> >> >> These views are my own. I just happen to own a clustering company. > RedHat is focused upon its primary market, which appears to be Unix/Windows server displacement. Mike's employer is focused upon selling hardware. Glen's company is focused upon good quality cluster software. For companies like mine, the issue is a stable, reliable platform on which to build our product offerings. The problem with things like the permanent beta cycles of Fedora is that we will have to focus more upon the underlying issues of the platform changes (which will not be focused upon HPC needs) than on our own development. This is a moving target. This is "Not A Good Thing(TM)". 
A whole bunch of commercial software vendors have "old" and "outdated" OS support for their wares. I have to carefully check the software OS support matrix when building engineering or bioclusters. RedHat 7.3 is long in the tooth, and it happens to be a very good cluster distribution, in large part because so many commercial codes have been ported in the RH7.x time frame. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Tue Nov 4 00:58:14 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Mon, 3 Nov 2003 22:58:14 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com>; from glen@callident.com on Mon, Nov 03, 2003 at 06:42:09PM -0800 References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> Message-ID: <20031103225814.B4021@lnxi.com> On Mon, Nov 03 2003 at 19:42, Glen Otero wrote: > > On Monday, November 3, 2003, at 03:51 PM, Mike Snitzer wrote: > > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the > > smile-test for corporations trying to coexist and actually work with > > Red Hat. > > Really? Is that why Dell, HP, Cray, Promicro, and Intel all work with > and/or sell Rocks-based clusters? Because it won't pass the smile test > inside a corporation? Could be that the larger corporations in your list embraced Rocks before this enterprise distro vs. no-cost distro became an issue. I _really_ doubt those corporations would do themselves any justice in the eyes of RedHat by undermining RedHat's enterprise offering by having an educational institution broker RHEL rebuilds. 
All of this debate over RHEL repackaging appropriateness is interesting to me. I have explored this as an option and arrived at the fact that it really doesn't offer anything of real value; simply offers a free-beer solution to an otherwise expensive product. Which obviously is invaluable to Rocks and many others on this list. > > Why not focus that questionable rebuilding effort on a more worthwhile > > task? E.g. porting Fedora Core to support amd64, ia64, etc; adding > > features to Fedora Core that are relevant to clustering, etc. > > > >> Even though Rocks is based on RedHat distribution, it is complete, > >> which means you only need to download Rocks ISOs to accomplish your > >> installation. > > > > All well and good, but basing a "complete" clustering solution on a > > reverse engineered RHEL is completely underhanded and wrong > > (regardless of whether you feel RH is being greedy or whatever). > > It's hardly reverse engineered, underhanded, or wrong. The Rocks guys > have been releasing their software for years based on standard Red Hat > releases. In order to make their cluster software freely available on > ia64, they built RH AS 2.1 from srpms, which is perfectly legal. I never said rebuilding RHEL is illegal; simply stated that I felt it was underhanded and wrong; we're all entitled to our opinions. I guess the Rocks people are at peace with their chosen engineering roadmap. > Besides, the technology that makes Rocks what it is is hardly due to > anything Red Hat creates. It's all the software that the Rocks crew has > written and packaged on top of Red Hat that matters. Thats a bold statement; Rocks' dependency on RH is implicit and hacking RHEL to be "free" requires significant effort on the part of rocks developers (even though they play it down). 
Also there is this post that points out just how important Red Hat is to Rocks: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003307.html Also, nice to see you cross posted to the rocks-discussion, for the benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an informative reply: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html It would appear as though Rocks is free and clear to openly redistribute RHEL SRPM-rebuilds; this is an interesting loop-hole: - Rocks released by an academic institution, which means it has a license to use the RedHat trademark. This also means no one can charge for Rocks software (only support). Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Tue Nov 4 02:11:47 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Tue, 4 Nov 2003 00:11:47 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA731E8.2060603@scalableinformatics.com>; from landman@scalableinformatics.com on Mon, Nov 03, 2003 at 11:58:16PM -0500 References: <3FA731E8.2060603@scalableinformatics.com> Message-ID: <20031104001147.D4021@lnxi.com> On Mon, Nov 03 2003 at 21:58, Joe Landman wrote: > The "smile" test? I thought it was all about risks, support, etc. > ROCKS appears to be in significant use as indicated by the ROCKS counter > page. In my _personal_ utopia of the industry smile-tests are worthy; I do however realize business is business and people want stable yet affordable solutions before anything else. That said, smiles can be had along the way. > Remember that RedHat's added value is in packaging, bug fixes, etc. Not to mention numerous contributions to the Linux kernel, low-level libraries (nptl), compilers and much more. > I would argue that Fedora is more like a permanent beta. 
It doesn't > look like we will get good things into Fedora anytime soon (x86_64, XFS > et al), and the release/support cycle is too short to be useful for long > term customer support. The risks of that platform would be somewhat > high for a commercial deployment, and I would find it hard to justify > installing this for a customer knowing full well that next year, they > are support free. It all comes down to opportunity cost; time spent working with the Fedora project (and its evolving policies) to add required features is time consuming and takes away from _real_ HPC innovation. BUT, if the entire HPC community actually worked together to bring about that change it wouldn't be that hard. Too idealistic? It would appear so based on the resounding cry for rebuilt RHEL solutions. Keep in mind that customers want "the real thing". > Ripping of RedHat? I thought they were packaging GPL and similar > software... how is taking GPL software which is Libre' and > redistributing recompiled versions of it (allowable under the license) > ripping off the folks who have a their own packaging of it? It comes down to the unfortunate reality that many in the HPC community would rather continuously fork/reinvent RHEL than work with Red Hat to arrive at a mutually beneficial arrangement. > RedHat is focused upon its primary market, which appears to be > Unix/Windows server displacement. Mike's employer is focused upon > selling hardware. Glen's company is focused upon good quality cluster > software. While I appreciate you associating myself and my views with my employeer I have expressed my _personal_ views. However, your assessment of my employeer's focus is not accurate; but I'm not going to get into that discussion. > For companies like mine, the issue is a stable reliable platform to > build our product offerings. 
The problem with things like the permanent
> beta cycles of Fedora is that we will have to focus more upon the
> underlying issues of the platform changes (which will not be focused
> upon HPC needs) than on our own development. This is a moving target.
> This is "Not A Good Thing(TM)".
>
> A whole bunch of commercial software vendors have "old" and "outdated"
> OS support for their wares. I have to carefully check the software OS
> support matrix when building engineering or bioclusters. RedHat 7.3 is
> long in the tooth, and it happens to be a very good cluster
> distribution, in large part because so many commercial codes have been
> ported in the RH7.x time frame.

Make no mistake about it, it's not good for any commercial company that historically relied upon Red Hat Linux; hence the extensive attention this debate has received all over the Internet. You have blatantly attempted to spin this thread in a self-serving/tangential direction of company vs company; and it wasn't about that. Now I know why this list is predominantly technical and _tries_ to stay away from the commercial interests of any one vendor.

Mike

From daniel.pfenniger at obs.unige.ch Tue Nov 4 04:01:40 2003
From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger)
Date: Tue, 04 Nov 2003 10:01:40 +0100
Subject: opteron VS Itanium 2
In-Reply-To: <3FA6D71B.1020900@comcast.net>
References: <3FA6D71B.1020900@comcast.net>
Message-ID: <3FA76AF4.7080800@obs.unige.ch>

Jeffrey B. Layton wrote:
> Rocky McGaugh wrote:
>
>> On Mon, 3 Nov 2003, Greg Lindahl wrote:
>>> On Fri, Oct 31, 2003 at 03:14:35PM -0600, Roger L. Smith wrote:
>>>> Yeah, me too.
As someone who just ponied up for a rather large IB
>>>> installation, I'm not sure that most people realize what a substantial
>>>> percentage of the cost of the cluster the IB might be.
>>> From all public indications, IB prices are roughly the same as
>>> Myrinet. Nothing new there...
>>>
>>> -- greg
>>>
>> IB costs significantly more than Myrinet...
>
> Are you sure? In the quotes I've gotten, it's about the same as Myrinet
> except for very small clusters (perhaps 4 nodes or less). In fact in some
> cases, it's cheaper than Myrinet. :)

I confirm because we bought such a 24 node cluster with switched IB. The hardware cost of IB was 2/3 of Myrinet with better specs, but without software support as good as that provided by Myricom. For example the free mpich over GM provided by Myricom corresponds to $200 per processor if a commercial MPI must be purchased.

Daniel Pfenniger

From pesch at attglobal.net Tue Nov 4 12:10:36 2003
From: pesch at attglobal.net (pesch at attglobal.net)
Date: Tue, 04 Nov 2003 09:10:36 -0800
Subject: Turn on nodes through the network
References:
Message-ID: <3FA7DD8C.2ED177B2@attglobal.net>

Would it be possible to tweak the BIOS to behave like a BMC?

Paul Schenker

Don Holmgren wrote:

> On Mon, 3 Nov 2003, Mark Hahn wrote:
>
> > > 1.5 support. That generally requires a Baseboard Management Controller
> > > (BMC) on the motherboard, which adds $25-$150 to the price.
> >
> > I'd very much appreciate seeing an example of this. or do you mean
> > "BMC adds $25-150 to the price of an already gold-plated system"?
> >
> > as a concrete example, Tyan's S2723 is a reasonable example of a board
> > you might find in a cluster.
IPMI/BMC is an option via the qlogic > > zircon, but I have never found a real price for it - one vendor quoted > > me a little under $Cdn 1000 for the daughtercard, which is just plain > > ridiculous for a <$500 motherboard. > > > > thanks, mark hahn. > > On clusters we've built with Supermicro E7500 or E7501 chipset > motherboards (P4DPE, X5DPE), which I believe are roughly equivalent to > the Tyan S2723 in features and price, there's an IPMI/BMC option card > based on the Agilent BMC available. We've paid between $90 and $100 for > these cards, depending on volume. > > I've not purchased Intel motherboards in quantity, but from doing a > quick web search, it looks like the incremental price between boards > without (SE7501CW2) and with (SE7501BR2) IPMI is no more than $150. > > Don Holmgren > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Nov 4 04:18:37 2003 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 4 Nov 2003 10:18:37 +0100 (CET) Subject: Freebee RH Releases... In-Reply-To: Message-ID: On Mon, 3 Nov 2003, Robert G. Brown wrote: > On Mon, 3 Nov 2003, Michael Worsham wrote: > > Fedora, fedora, fedora. > I agree. But please never be caught on video prancing about the stage shouting this :-) > http://fedora.redhat.com/ > > > Fedora will come pre-yummified at the core and will have RH engineers I did a yum update last night to the fedora test release. Seemed to work fine! 
Yum is nice (I've been using apt-for-rpm until now).

The Fedora release 1 is due out in the next couple of days (there was a slip in the release date, which was supposed to be Monday).

> So nobody RHish panic, just start looking into fedora, maybe join its
> list(s).
> BTW, I expect there to be opteron support in fedora pretty soon as well.
> There better be; I'm getting a bunch of them...;-)

Ooooh.... that sounds interesting.

From bill at math.ucdavis.edu Tue Nov 4 05:37:47 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Tue, 4 Nov 2003 02:37:47 -0800
Subject: IB vs Myrinet
In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com>
References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com>
Message-ID: <20031104103747.GA836@sphere.math.ucdavis.edu>

On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote:
> I believe IB is a much better interconnect technology than Myrinet
> period. Plus, you don't have to deal with Myricom.

People I've talked to are pretty happy with Myricom; of course there has been the occasional complaint. But in general I'd say they deliver what they promise, and the result is pretty much as advertised.

> IB is about to find major traction in this industry and Myricom will not

I've heard this statement for 2 years running, not that it couldn't become true.

> have the guns to stop it. As adoption rates increase the price will
> decrease quite rapidly. I've been working with Mellanox and Topspin
> both using Mellanox chips but, their product positioning is different.
> The difference between the two being that Topspin offers a more "value
> added" flavor of Mellanox silicon with various hardware tweaks and a
> more robust software package.
>
> It depends on how you're looking at the cost of IB.
First of all, it's > comparative to Myrinet in "cost per port". Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when PCI Express > is around the corner. Myricom has a lot of installations worldwide and > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different 3x the performance of Myrinet? Actual observed performance in a real life situation? Are we talking programmer visible bandwidth? Latency? Or both? At the process? Port? Or switch level? > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. Roadmaps are great, easy, and cheap. I'm most interested in what I can build a cluster with today. Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what conditions? Where can I download a linux compatible driver? Linux compatible MPI implementation? Linux/AMD64 drivers/MPI implementations? -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 07:38:30 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 04 Nov 2003 07:38:30 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104001147.D4021@lnxi.com> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> Message-ID: <3FA79DC6.9040608@scalableinformatics.com> Mike Snitzer wrote: >On Mon, Nov 03 2003 at 21:58, >Joe Landman wrote: > > > >>The "smile" test? I thought it was all about risks, support, etc. 
>>ROCKS appears to be in significant use as indicated by the ROCKS counter >>page. >> >> > >In my _personal_ utopia of the industry smile-tests are worthy; I do > > Ahh... so thats what I need to win more business. The smile test... (...) >>Remember that RedHat's added value is in packaging, bug fixes, etc. >> >> > >Not to mention numerous contributions to the Linux kernel, low-level >libraries (nptl), compilers and much more. > > This was specifically implied/mentioned in my post. "They bundle many peoples' code (Don's and probably a number of others here). They have added value back to the community as a whole." > > >>I would argue that Fedora is more like a permanent beta. It doesn't >>look like we will get good things into Fedora anytime soon (x86_64, XFS >>et al), and the release/support cycle is too short to be useful for long >>term customer support. The risks of that platform would be somewhat >>high for a commercial deployment, and I would find it hard to justify >>installing this for a customer knowing full well that next year, they >>are support free. >> >> > >It all comes down to opportunity cost; time spent working with the Fedora >project (and its evolving policies) to add required features is >time consuming and takes away from _real_ HPC innovation. > Yes. That is the point after all, that if you spend all your time discussing whether or not your masters^H^H^H^H^H^H^H benefactors will accept XFS or other HPC relevant features in the kernel, you do not then have time to put them in. Part of it is opportunity cost, the other part is a zero sum game of time. > BUT, if the >entire HPC community actually worked together to bring about that change >it wouldn't be that hard. Too idealistic? > I believe it might be too idealistic. This crowd, if you read this forum and some of the others, likes to innovate and create its own value atop some sort of standard offering. 
If I am reading you correctly, you are advising focusing on making on particular platform that you personally (to separate you from your employer here) like, as the standard, and stop all the bickering about doing another direction (that you personally do not like). Is this a fair read? >It would appear so based on >the resounding cry for rebuilt RHEL solutions. Keep in mind that >customers want "the real thing". > > Hmmm. The one thing that customers repeatedly tell me is that they want their solutions to be supportable. They don't want one-offs, or other things that significantly increase their risks. If the "real thing" represents a risk, they will not go for it. If the "non-real-thing" ala ROCKS, Warewulf, CLIC, OSCAR, et al. are well supported, and slowly varying enough, they are content to live with a few warts. More important than this is a specific understanding on the part of the customer that RedHat is focused upon a different market. As I noted in the post to which you replied "That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. " They, in this case, are RedHat. The "Real-Thing"(TM) doesn't matter in this case, if it is missing key functionality/features et al. Moreover, Fedora will be no-more the real thing than the RHEL based versions. Though, the RHEL will vary more slowly over time than Fedora, which is a very good thing for a stable commercial/academic cycle shop. > > >>Ripping of RedHat? I thought they were packaging GPL and similar >>software... how is taking GPL software which is Libre' and >>redistributing recompiled versions of it (allowable under the license) >>ripping off the folks who have a their own packaging of it? >> >> > >It comes down to the unfortunate reality that many in the HPC community >would rather continuously fork/reinvent RHEL than work with Red Hat to >arrive at a mutually beneficial arrangement. 
> > As noted previously "That said, they are not terribly interested in HPC from what I can see. Might be due to the size of this market compared to their total addressable market. " This market is not significant to them. It will not drive hundreds of thousands of additional unit sales. It requires levels of support that they may be unwilling to supply due to the individuality of the products offered. Aside from that, finding and paying for real HPC people (e.g. more than a few years experience) is not cheap/easy. This increases their marginal costs without really increasing their marginal utility. I don't blame them for this. It is simple economics. It leaves a market hole that (as Glen pointed out with his company), people are willing to step up to fill. > > >>RedHat is focused upon its primary market, which appears to be >>Unix/Windows server displacement. Mike's employer is focused upon >>selling hardware. Glen's company is focused upon good quality cluster >>software. >> >> > >While I appreciate you associating myself and my views with my employeer I >have expressed my _personal_ views. However, your assessment of my >employeer's focus is not accurate; but I'm not going to get into that >discussion. > > > Highly oversimplistic assessment on my part. I assumed that someone writing from an lnxi.com email address would be expressing corporate philosophy. This is your own _personal_ opinion then? >>For companies like mine, the issue is a stable reliable platform to >>build our product offerings. The problem with things like the permanent >>beta cycles of Fedora is that we will have to focus more upon the >>underlying issues of the platform changes (which will not be focused >>upon HPC needs) than on our own development. This is a moving target. >>This is "Not A Good Thing(TM)". >> >>A whole bunch of commercial software vendors have "old" and "outdated" >>OS support for their wares. 
I have to carefully check the software OS
>>support matrix when building engineering or bioclusters. RedHat 7.3 is
>>long in the tooth, and it happens to be a very good cluster
>>distribution, in large part because so many commercial codes have been
>>ported in the RH7.x time frame.
>>
>>
>
>Make no mistake about it, it's not good for any commercial company that
>historically relied upon Red Hat Linux; hence the extensive attention this
>debate has received all over the Internet. You have blatantly attempted
>to spin this thread in a self-serving/tangential direction of company
>vs company; and it wasn't about that. Now I know why this list is
>
>

Uh... no. Not even close, and mind the defensive bit, no one was attacking you. There is no "spin" here. Commercial software developers need stable bases for their products, pure and simple. Nothing self-serving about that. I didn't state whether or not I worked for a clustering company. Though you asked us above to consider your opinion to be a personal one, this statement ties your posts to your company (IMO), and using your words, "attempted to spin this thread in a self-serving ..." Glen gave his point of view, which I am guessing reflects his corporate positions. If you are saying that Glen's statement is self serving, I guess I am at a loss to understand how yours is not. I gave my point of view, and yes, it does reflect my company's philosophy. I do not believe it is "self serving" to state what we perceive as product developers.

>predominantly technical and _tries_ to stay away from the commercial
>interests of any one vendor.
>
>

I may be (slightly but purposely) dense at the moment, but I would suggest that you might wish to place a disclaimer in your posts, when you post here, stating that these are your views and not those of the good folks at Linux Networx. Otherwise, folks like me, Glen, and a number of others might make a mistake and assume you are posting their views.
>Mike > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 08:26:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 08:26:39 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104014219.GB32428@tux.lbl.gov> Message-ID: On Mon, 3 Nov 2003, Greg Kurtzer wrote: > On Mon, Nov 03, 2003 at 04:51:41PM -0700, Mike Snitzer told me: > > Rebuilding RHEL3 into a freebie-ripoff version doesn't pass the smile-test > > for corporations trying to coexist and actually work with Red Hat. Why > > not focus that questionable rebuilding effort on a more worthwhile task? > > E.g. porting Fedora Core to support amd64, ia64, etc; adding features to > > Fedora Core that are relevant to clustering, etc. > > I guess what some would consider a worth while task others would consider a > waste of time. From what I see, Fedora core is an unreasonable solution for me > and I will not be contributing to it while RH holds every seat on the steering > committee and rules all directions. Not that I have anything against RH, it is > just that there is a major conflict of interest, don't you think? > > If Fedora gets too good, won't it take business from RHEL? It already is, but it is mostly business that RHEL will never get anyway. 
Basically, it will be a cold day in hell before Duke or any other University starts paying Red Hat a million dollars a year for access to a linux distribution consisting of 99% GPL/free packages and 1% logos, given that we do NOT consume RH "support", but are rather a net contributer (we run a primary RH mirror, for example, as well as several GPL projects some of which have in the past de facto shamelessly promoted RH and which now will shamelessly promote fedora just by existing with that as their primary base). Remember also that RH >>needs<< fedora, or something like it. They CANNOT release RHEL in rawhide (like I'm not only going to pay RH order of $100/seat, but I'm going to get and install beta-level software for that price and help debug it? Oh yeah.) One reason RH has been, for the most part, rock-solid as largish linux distributions go is because all the squishiness in each new release has been squeezed from the rock in rawhide. There are several other reasons that fedora will likely work just fine and not be subverted by RH. One is that (as we've been discussing on the list) it is by no means clear that RH can legally restrict even the reinstallation of binary RPM's whose primary content is GPL software in any way. I personally, after reading the GPL carefully yet again, think that this is the case. In fact, I think that if they do something like mix a lot of "trademark" logos in with e.g. GPL/gnome icons that are required by GPL packages in e.g. redhat-artwork that redhat-artwork de facto inherits a GPL -- the GPL is explicitly written to keep free software free as in air. Note well that I >>cannot<< make a GPL package "proprietary" in any way -- certainly not by adding a header that says something like "this package is copyright and trademark and belongs to rgb". Not even by actually writing something that adds value that IS copyright rgb. Add to the GPL base, become part of the GPL base -- that is the rule. 
The only packages RH can legitimately constrain the reinstallation of are: packages containing proprietary Red Hat code with no full GPL (or equivalent) components or library dependencies and packages containing strictly RH logos and trademarks, ditto. Nobody cares about either one. I don't think there are any of the former (I could be wrong) and the latter is advertising. Like I should pay RH enormous sums of money for installing their advertising on my system? Finally, one REASON that RH is splitting out fedora is to GET more work out of "the community". It costs them a fair bit to keep up the RHL releases for years after they are obsoleted. They'd like to have their developers working on RH 10, but they're still supporting RH 7, 8 and 9 and have to constantly fix bugs, backport security patches, etc. So they're putting fedora out there in part to armtwist US into keeping old releases up (or not), as a tacit acknowledgement of the realities of the GPL (which apply to RHEL whether or not they like it and frankly whether or not one rebuilds the source rpm's -- there isn't any way to restrict the installation or redistribution of a BINARY package of GPL code as I read the license), and to maintain the absolutely essential community involvement in the rawhide process that leads TO a "RHEL" that they can market with some degree of success to corporations. As I've noted and will continue to note again, I think that we are on the verge of a paradigm shift in linux anyway, and that this is going to likely kick it over the edge. The existence of yum AND apt-tools finally make it natural to consider merger, and completely alter the scaling paradigms at the institutional support level. We are seeing the very last releases of RH where CD ISO's are in any way relevant, for example. The future is going to focus completely on the network and the concept of "the repository", on cross-distribution standardized package metadata, on fully automated rebuilds from source packaging. 
Yum conceivably makes the entire concept of "a distribution release" obsolete -- we'll have to wait and see, but I suspect that this will be the case. Instead of upgrading systems on a timescale with granularity of years, we may be entering a universe where systems are microincrementally updated nightly, with immediate feedback and repair, and with a user/admin determined lag relative to the primary repositories to insure a level of institutional stability deemed locally acceptable. In a way this is DEMANDED by security anyway -- security requirements are a major driver of the paradigm shift. RH will eventually be making its money from RHEL by inserting themselves into this stream with a FIXED delay and a certification process required by e.g. banks and other corporations that have due diligence and government audit laws to satisfy. Everybody else will ride the wave (and generally be more secure than said banks), but it will take years for lawmakers to catch up to the new paradigm.

> Now I do want to mention that I think that RH's new direction is what is
> needed for Linux to become a suitable Enterprise solution. This move however
> left a vacancy in the community which is why projects are emerging or changing
> direction to fix this. It is OSS evolution (see: http://caosity.org/).

It will be VERY interesting to see what the proliferation of community efforts produces. Perhaps they'll one day merge. Perhaps not. Open source is a rich environment for the evolution of new ideas, and these will be "interesting" times indeed:-)

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Nov 4 08:31:39 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Tue, 04 Nov 2003 08:31:39 -0500 Subject: IB vs Myrinet In-Reply-To: <1067917842.3219.88.camel@terra> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <1067917842.3219.88.camel@terra> Message-ID: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> On Mon, 2003-11-03 at 22:50, Dean Johnson wrote: > On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > > I believe IB is a much better interconnect technology than Myrinet > > period. Plus, you don't have to deal with Myricom. > > > > Hmmm, all my dealings with Myricom have been excellent. We had a frame > failure right before a holiday and they happily cross-shipped a > replacement. We were back in business very quickly. All our questions of > support have been answered quickly and accurately. Same here -- the support has been great. We did run into a problem where they were short on line cards to send as replacements, but they did keep us posted and let us know what was going on. BTW -- did no one notice that the statement was 'comparable prices per port' but that you had to _pay_ for the mpi implementation? I feel alot better knowing there is a free and open source implementation of MPI over gm from LAM and mpich. Nic -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Tue Nov 4 09:34:56 2003 From: roger at ERC.MsState.Edu (Roger L. 
Smith) Date: Tue, 4 Nov 2003 08:34:56 -0600 Subject: IB vs Myrinet In-Reply-To: <20031104103747.GA836@sphere.math.ucdavis.edu> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <20031104103747.GA836@sphere.math.ucdavis.edu> Message-ID: On Tue, 4 Nov 2003, Bill Broadley wrote: > On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > > > IB is about to find major traction in this industry and Myricom will not > > I've heard this statement for 2 years running, not that it couldn't > become true. Just look at all of the recent press releases for IB clusters being built. The hardware is finally actually available, and a lot of HPC clusters are starting to be built with it. In the spirit of full disclosure, I have three engineers on-site today from an IB vendor working with me to install a 192 node diskless IB cluster. > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > > interconnect technology ratcheted up way higher than 10GB. > > Roadmaps are great, easy, and cheap. I'm most interested in > what I can build a cluster with today. > > Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what > conditions? Where can I download a linux compatible driver? Linux > compatible MPI implementation? Linux/AMD64 drivers/MPI implementations? It's 10 gigabits per second (theoretical). Linux drivers are available from all of the vendors. Certain vendors (including the one I purchased my IB from) provide open-source drivers. There are a few MPI implementations, there are commercial versions MPI/Pro and ChaMPIon from MPI Software Technology, Inc. MVAPICH is available from OSC, and I'm hearing that there may be a version of LAM in the near future. I'm not sure of the status of the AMD64 drivers, although I know of at least one AMD64 cluster currently being built with IB, so at least some level of support exists. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. 
Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Nov 4 09:53:55 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 04 Nov 2003 09:53:55 -0500 Subject: IB vs Myrinet In-Reply-To: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> References: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <3FA7BD83.2060901@lmco.com> Nicholas Henke wrote: > On Mon, 2003-11-03 at 22:50, Dean Johnson wrote: > > On Mon, 2003-11-03 at 21:19, Joey Sims wrote: > > > I believe IB is a much better interconnect technology than Myrinet > > > period. Plus, you don't have to deal with Myricom. > > > > > > > Hmmm, all my dealings with Myricom have been excellent. We had a frame > > failure right before a holiday and they happily cross-shipped a > > replacement. We were back in business very quickly. All our > questions of > > support have been answered quickly and accurately. > > Same here -- the support has been great. We did run into a problem where > they were short on line cards to send as replacements, but they did keep > us posted and let us know what was going on. > > BTW -- did no one notice that the statement was 'comparable prices per > port' but that you had to _pay_ for the mpi implementation? I feel alot > better knowing there is a free and open source implementation of MPI > over gm from LAM and mpich. > I want to interject a comment here. In the past (recent and a few years back) we've had trouble with the open source MPI implementations with our codes. 
When we contacted them about our problem we got a lukewarm (at best) response. When we contacted a commercial MPI vendor, they fixed the problem in less than a day. Plus, our codes ran about 30% faster with the commercial MPI than with the open-source ones. However, we continue to look at LAM, MPICH, and others. While I'm a big proponent of open-source for many reasons, at least for MPI, we've found that a commercial vendor is worthwhile for us. The one we've used provides a very good and fast product for our systems. Also, their technical support is extremely good (I normally reserve that phrase, but it truly applies in this case). More importantly, we've found that most of our problems, beyond the first few months that a cluster is in production, are with MPI. Having a company to help us diagnose and fix the problem quickly means a great deal to us (we're in production 24/7 and downtime is a true killer). So for us, when we look at per-port costs, we include a commercial MPI for whatever network we're looking at, well, with one exception. While there are differences in MPI costs based on the type of interconnect, the difference is in the noise for price/performance for us. One of the hidden costs from my perspective, one that allows us to compare interconnects, is the product of the cost of diagnosing problems, the cost of fixing problems, and how frequently the problems occur. We have experience with one high-speed interconnect in this regard and that number is very large. This has made us gun-shy about trying any other high-speed interconnect on a production basis (although we continue to test). Just my 2 cents this morning. Thanks! Jeff -- Dr.
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Nov 4 10:10:48 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 04 Nov 2003 15:10:48 +0000 Subject: IB vs Myrinet In-Reply-To: Message from Nicholas Henke of "Tue, 04 Nov 2003 08:31:39 EST." <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <1AH2pU-2Gz-00@etnus.com> > BTW -- did no one notice that the statement was 'comparable prices > per port' but that you had to _pay_ for the mpi implementation? I > feel alot better knowing there is a free and open source > implementation of MPI over gm from LAM and mpich. There's at least one free, open source MPICH over Infiniband project :- http://nowlab.cis.ohio-state.edu/projects/mpi-iba/ -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Tue Nov 4 10:08:09 2003 From: waitt at saic.com (Tim Wait) Date: Tue, 04 Nov 2003 10:08:09 -0500 Subject: IB vs Myrinet In-Reply-To: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <1067917842.3219.88.camel@terra> <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> Message-ID: <3FA7C0D9.3050104@saic.com> > BTW -- did no one notice that the statement was 'comparable prices per > port' but that you had to _pay_ for the mpi implementation? I feel alot > better knowing there is a free and open source implementation of MPI > over gm from LAM and mpich. 
There is, it's called MVAPICH: http://nowlab.cis.ohio-state.edu/projects/mpi-iba/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Nov 4 10:06:16 2003 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 4 Nov 2003 16:06:16 +0100 (CET) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Tue, 4 Nov 2003, Robert G. Brown wrote: > > As I've noted and will continue to note again, I think that we are on > the verge of a paradigm shift in linux anyway, and that this is going to > likely kick it over the edge. The existence of yum AND apt-tools > > Instead of upgrading systems on a timescale with granularity of years, > we may be entering a universe where systems are microincrementally > updated nightly, with immediate feedback and repair, and with a > user/admin determined lag relative to the primary repositories to insure > a level of institutional stability deemed locally acceptable. In a way Agree. A couple of days ago you made a post re. updating mechanisms, where you talked about yum, or yum-like mechanisms which would recompile packages to suit the target node, or something along those lines. How about we think of applications along the same lines - but not necessarily being recompiled. In the context of a grid world, it still worries me that people are saying that application Z will run anywhere - on some machine maintained on another campus - who knows if it is even running the same Linux distribution? (OK - maybe everything needs to be statically linked then.) If I may, I'll join Bob in a blue sky thought. How about applications being installed as RPMs? Then the RPM would have dependencies - application Z needs library-b between x and y. 
Could you then get a message back saying 'sorry - this cluster won't support this application: it needs x-y' Sorry - a huge bly sky here. But we do keep hearing about grid and infrastructure on demand. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Tue Nov 4 13:18:35 2003 From: egan at sense.net (Egan Ford) Date: Tue, 4 Nov 2003 11:18:35 -0700 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <027d01c3a300$13c57120$27b358c7@titan> What about software? Where is the stable free OSS IB kernel modules and proven current MPICH implementation (don't point me to MVICH either, it is no longer being maintained)? What about people? Myricom has many HPC experts. I need to look at the total picture. Support, service, software, and history are equally important to raw hardware specs. > -----Original Message----- > From: beowulf-admin at scyld.com > [mailto:beowulf-admin at scyld.com] On Behalf Of Joey Sims > Sent: Monday, November 03, 2003 8:20 PM > To: beowulf at beowulf.org > Subject: IB vs Myrinet > > > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. > > IB is about to find major traction in this industry and > Myricom will not > have the guns to stop it. As adoption rates increase the price will > decrease quite rapidly. I've been working with Mellanox and Topspin > both using Mellanox chips but, their product positioning is different. > The difference between the two being that Topspin offers a more "value > added" flavor of Mellanox silicon with various hardware tweaks and a > more robust software package. > > It depends on how you're looking at the cost of IB. First of > all, it's > comparative to Myrinet in "cost per port". 
Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when > PCI Express > is around the corner. Myricom has a lot of installations > worldwide and > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different > options such as bridging between IB, GbE, or FC so you could hang your > storage boxes off the IB switch without much hassle. > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. > > Regards, > > ================================================== > Joey P. Sims 800.995.4274 - 242 > Sales Manager 770.442.5896 - Fax > HPC/Storage Division www.csilabs.net > Concentric Systems, Inc. jsims at csiopen.com > ====================================ISO9001:2000== > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ktpedre at sandia.gov Tue Nov 4 12:35:40 2003 From: ktpedre at sandia.gov (Kevin Pedretti) Date: Tue, 4 Nov 2003 09:35:40 -0800 Subject: IB vs. Myrinet In-Reply-To: <200311040417.hA44H0g27831@NewBlue.scyld.com> References: <200311040417.hA44H0g27831@NewBlue.scyld.com> Message-ID: <200311040935.40505.ktpedre@sandia.gov> > Subject: IB vs Myrinet > Date: Mon, 3 Nov 2003 22:19:48 -0500 > From: "Joey Sims" > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. 
Things like Myrinet and Quadrics do have at least one architectural advantage over IB for HPC -- they have a programmable processor on the NIC. Myricom's MX will, presumably, use the NIC processor to offload MPI receive matching. Quadrics Tports also offloads MPI matching. Offloading theoretically lowers host CPU overhead (less interrupts) and lowers latency (less trips across the PCI bus). If Ohio State's MVAPICH really scales beyond 8 nodes well (I've only seen 8 node benchmarks), then maybe my point is irrelevant. Still, in my opinion the offload approach is more elegant. I've heard some IB HBAs are programmable but there is no standardization and documentation is scarce. Does anybody have more information? Kevin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Tue Nov 4 10:00:12 2003 From: ctierney at hpti.com (Craig Tierney) Date: Tue, 4 Nov 2003 08:00:12 -0700 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <20031104150011.GC1872@hpti.com> On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Plus, you don't have to deal with Myricom. > > IB is about to find major traction in this industry and Myricom will not > have the guns to stop it. As adoption rates increase the price will > decrease quite rapidly. I've been working with Mellanox and Topspin > both using Mellanox chips but, their product positioning is different. > The difference between the two being that Topspin offers a more "value > added" flavor of Mellanox silicon with various hardware tweaks and a > more robust software package. > > It depends on how you're looking at the cost of IB. 
First of all, it's > comparative to Myrinet in "cost per port". Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when PCI Express > is around the corner. Myricom has a lot of installations worldwide and > they are highly credible without a doubt but, this industry moves very > fast and new things are not a new thing. At 3x the performance of > Myrinet, "comparative" is still a better value. IB has many different > options such as bridging between IB, GbE, or FC so you could hang your > storage boxes off the IB switch without much hassle. Comparative is very relative. Pricing I have seen does not show that. You state 3x performance, that might be true in bandwidth for the PCIX-D card but not the E-card. The E-card does saturate the PCIX slot. For both cards, latency is better on Myrinet than on IB. For some that is more important than bandwidth. Also, Myrinet is going to release new software (MX) that reduces the latency futher. Show me a very large IB system that scales well. VaTech does not count. I hope they figure that out though. I like the IB specs and their roadmap, but it has to work as well. Craig > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > interconnect technology ratcheted up way higher than 10GB. > > Regards, > > ================================================== > Joey P. Sims 800.995.4274 - 242 > Sales Manager 770.442.5896 - Fax > HPC/Storage Division www.csilabs.net > Concentric Systems, Inc. 
jsims at csiopen.com > ====================================ISO9001:2000== > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney at hpti.com) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 14:59:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 14:59:30 -0500 (EST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: Message-ID: On Tue, 4 Nov 2003, John Hearns wrote: > If I may, I'll join Bob in a blue sky thought. How about applications > being installed as RPMs? Then the RPM would have dependencies - > application Z needs library-b between x and y. > Could you then get a message back saying 'sorry - this cluster won't > support this application: it needs x-y' > > Sorry - a huge bly sky here. But we do keep hearing about grid and > infrastructure on demand. Curiously and not terribly coincidentally, we REQUIRE all our linux applications to be installed as RPMs at Duke. Even commercial ones we get in some other form, we typically repackage into an RPM. Otherwise it IS very difficult to know whether dependencies are being met or warped, and a good rpm also facilitates de-installation (an rpm --erase or yum remove away). This isn't to say that individuals may not install applications that aren't rpm's on their own systems from time to time or force rpm installs from mismatched distributions without a rebuild, but this is the Dark Path and leads to RPM Hell. This also means that you don't even get the message above. 
If you use yum as in: yum install clusterapp and clusterapp is on one of the repositories in its /etc/yum.conf, then yum will fire back a message such as "clusterapp needs packages: clusterlib clustergui clusterdevel to be installed. Install (y/n)?:" Press y and all four packages are grabbed and installed so clusterapp is ready to run, possibly from the clustergui you may not have known existed. Or use yum -y -d 0 install clusterapp to install clusterapp AND its dependencies right now, like you mean it, and only say something if it encounters a condition that makes the install fail. You also get warnings if clusterapp contains and wishes to replace files "belonging" to installed packages, if there are obsoletes in its dependencies, if there are dependency loops -- basically if there is anything whatsoever that cannot be automagically resolved and requires human intervention to make happen safely. (Getting such a message usually means that your system is in RPM Hell from a previous RPM force; NOT getting such a message when you should have usually means that you built something and installed it from source so that it isn't in the rpm db but is on the system anyway.) If you install strictly RPM's strictly with anaconda or yum, and never override or force anything, your system has an excellent chance of staying out of RPM Hell and being consistently automagically installable, upgradeable, updateable, and so forth. If nothing else, when you put a "bad rpm" on your repository (and there are plenty out there) it won't install and you'll be forced to fix/rebuild it so that it does instead of breaking your system with a force. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Tue Nov 4 16:36:32 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Tue, 4 Nov 2003 21:36:32 +0000 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA79DC6.9040608@scalableinformatics.com> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> Message-ID: <20031104213632.GA1662@galactic.demon.co.uk> On Tue, Nov 04, 2003 at 07:38:30AM -0500, Joe Landman wrote: > > > >BUT, if the > >entire HPC community actually worked together to bring about that change > >it wouldn't be that hard. Too idealistic? > > > > I believe it might be too idealistic. This crowd, if you read this > forum and some of the others, likes to innovate and create its own value > atop some sort of standard offering. If I am reading you correctly, you > are advising focusing on making on particular platform that you > personally (to separate you from your employer here) like, as the > standard, and stop all the bickering about doing another direction (that > you personally do not like). Is this a fair read? > Everyone, There is a problem here: the sky is falling, and no-one is listening to us :) Red Hat Linux as we have known it for the past few years has changed focus. Most of the Red Hat Linux boxes out there will be unsupported after December this year - the remainder will be unsupported after April 2004 (unless they happen to convert to RH Enterprise Linux in one or other variant)> Lots of smaller Red Hat based specialist distributions will also potentially feel the knock on effect as software is not updated or you can't get the base OS on which to build any more. 
That's OK - it just doesn't feel comfortable at this point. The replacement is still a beta - although the beta test period is over, Fedora Core 1.0 isn't out yet. The new model of Fedora Core / Extra and so on is going to be hard to get used to - as is the accelerated speed of change and potential lack of bug fixes. "Don't fix, upgrade" may be the new model. There are calls to rip out the proprietary bits of RHEL and build a Libre version: that would possibly be unwise - you are still tying your efforts to someone else's code: this is also likely to be code where fixes are made relatively slowly on a long timescale and where the vendor may have other peoples values in mind: the typical EL customer is not necessarily the typical cluster owner. Forking small special purpose distributions is potentially a bad idea. Rocks/Warewulf/Scyld/Caos(if I'm spelling correctly) are all RH based - on older code - and all in the same marketplace. BUT There may be an alternative which will guarantee you code freedom, won't charge for licenses in any event, won't bow to "commercial" pressure and won't restrict your code use/re-use/modification/distribution. If you want an ultra stable platform to which you can freely contribute code and which you can use for any purpose - try Debian "stable". The release cycle is long, it's updated relatively infrequently but security patches and major bugs are fixed. It won't vary wildly from quarter to quarter. It takes two years between major releases - but the code will have been tested for longer and, potentially on more platforms. [It's been said that Debian is the test bed of choice for the X Window System, for example, because it is made to work on many architectures and tends to find obscure bugs] Debian "testing" is in a state of slow flux. 
The name is a misnomer in that the code is not necessarily beta quality: it should always be the "testing" for the next build of the full release such that it's perfectly usable from day to day / week to week / month to month and should be releasable at relatively short notice. Occasionally, major changes may break stuff for a few days: there is a transition at the moment, for example, from KDE 2.x (which was released in stable over a year ago) to KDE 3.x (which has been working in "unstable" for some months). Because of incompatibilities, some KDE components may not work for a couple of days until it settles down. For 90% of folk using a cluster, that sort of thing may be irrelevant. Testing is asymptotically approaching the next stable release - but won't be released as stable until its ready :) Debian "unstable" - may change fairly dynamically: may break but is quickly fixed. Latest bleeding edge software trickles down from here through a defined process until it reaches testing and then, ultimately, becomes part of the next stable release. Probably not for clusters - though I have a toy/evaluation cluster running at work on unstable purely for the very latest GCC, for example. Debian encourages everyone to build specialist distributions based on Debian. If the hassle's too much, feed in your cluster packages to become part of the main distribution. There is at least one Debian newsgroup (debian beowulf) where clusters are of high interest. Check out what's already within Debian. As noted in a previous post, you may find what you want has already been put in place in the 8000+ packages. This is a purely personal post. It does _not_ represent my employers or any other person. 
Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 17:50:57 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 04 Nov 2003 17:50:57 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104213632.GA1662@galactic.demon.co.uk> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> Message-ID: <3FA82D51.3070102@scalableinformatics.com> Andrew M.A. Cater wrote: >Everyone, > >There is a problem here: the sky is falling, and no-one is listening to >us :) > > And I thought it was just a business sea change :0 [...] >BUT > >There may be an alternative which will guarantee you code freedom, >won't charge for licenses in any event, won't bow to "commercial" >pressure and won't restrict your code use/re-use/modification/distribution. >If you want an ultra stable platform to which you can freely contribute >code and which you can use for any purpose - try Debian "stable". > > There are interesting bits in debian. I am not sure it is necessarily the right choice for clusters due to the specific lack of commercial support for cluster specific items such as Myrinet, and the other high speed interconnects. Commercial compiler support for Debian (e.g. Intel, Absoft, et al) is largely non-existant as far as I know (please do correct me if I am wrong). Few if any commercial applications are certified to work on Debian (Oracle, Legato, ....) and again, please correct me if I am wrong. I simply don't see this as a universally viable alternative. Debian does indeed have lots of nice technical things going for it. Maybe I am missing some obvious point here. 
I do know some people have built clusters using it, but a few clusters does not a clustering distribution make. I believe someone at Cornell built Windows 2000 into a cluster. Doesn't make Win2k a clustering OS though. The distribution matters less than the overall support for what you want to do with it. I believe that it might be possible to build a Gentoo based cluster, though I would be concerned about the length of time for an OS load, among other things. One of the hardest parts of a cluster is getting the OS on. ROCKS, BioBrew (and I understand Warewulf) make this ridiculously easy. Increasing the setup/management time, or making your life harder in general, doesn't make much sense. There is a Knoppix variant that does clustering (OpenMosix style). Not sure it is the best solution, but I would like to hear from anyone using it. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Tue Nov 4 17:33:57 2003 From: patrick at myri.com (Patrick Geoffray) Date: 04 Nov 2003 17:33:57 -0500 Subject: IB vs Myrinet In-Reply-To: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <1067985237.1208.392.camel@asterix> Hi Joey, On Mon, 2003-11-03 at 22:19, Joey Sims wrote: > I believe IB is a much better interconnect technology than Myrinet > period. Free country. > Plus, you don't have to deal with Myricom. Would you share the horror story of you dealing with Myricom ? Did Myricom did something bad to you or your company ? > IB is about to find major traction in this industry and Myricom will not > have the guns to stop it. 
As adoption rates increase the price will Which industry ? That's the real question. If you say HPC, it's bad news for IB, for several reasons. First, it's a tiny market. The VC don't find it very appealing, it's not the billions dollar market they were promised. Storage and data center are worth it, but IB did not succeed to penetrate these markets. Look at the fate of the last IB company to close its doors (Fabric Networks if I remember well). The press release was saying that they were taking their money to go do something else, because the current market was not worth it. The second reason is that HPC has very special needs. You can get some success by having a big pipe, but it's usually not enough. MPI is important, application performance is important. That's not what the storage and data center needed, and that's not what IB was designed for. > decrease quite rapidly. I've been working with Mellanox and Topspin > both using Mellanox chips but, their product positioning is different. There is something very bad in this sentence: "both using Mellanox chips". Where are the dozens of silicon vendors that were supposed to flood the market and drive the price down ? They died the last 2 years. Today, it's not Infiniband, it's Mellanox and resellers. Not that different from Myricom and resellers... > It depends on how you're looking at the cost of IB. First of all, it's It really depends on how your are looking at the cost of IB. Mellanox has been, and still is I believe, burning VC cash, as they don't have the sales volume to sustain their internal cost. Today's price for IB products are not sustainable price, they are aggressive penetration price, that means it's near cost or below cost. That's why so many players died. > comparative to Myrinet in "cost per port". Not too long ago, Myrinet > was higher in price than IB is today and they haven't came out with > anything "new" in forever. Well except a PCI-X version when PCI Express > is around the corner. 
> Myricom has a lot of installations worldwide and

The current PCI-X NIC is 1X, the second one to ship by SC03 is 2X, the next one is 4X. It's not that hard to add links and aggregate bandwidth; the rest is more important (like being able to do bidirectional faster than unidirectional...). Why do you want to have a PCI-Express product when no customers ask for it, because PCI-Express is not shipping in volume yet? Don't worry, when you can buy PCI-Express nodes in volume, you will be able to buy a PCI-Express Myrinet NIC to put in them.

> they are highly credible without a doubt but, this industry moves very
> fast and new things are not a new thing. At 3x the performance of
> Myrinet, "comparative" is still a better value. IB has many different

New things are not always hardware. We will demo a completely new software stack at SC. Same hardware, much better application performance. As I said, adding links to aggregate bandwidth is easy, but doing the right thing to run applications faster is another level of difficulty. Now, when you have the right software design, just ramp up the pipe performance to please the spec believers and you have what customers want.

> options such as bridging between IB, GbE, or FC so you could hang your
> storage boxes off the IB switch without much hassle.

You can bridge Myrinet and GigE. Not FC, the protocol stinks too much.

> Up to 10GB/sec is fairly fat today. The roadmap for IB has this
> interconnect technology ratcheted up way higher than 10GB.

Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you feel more comfortable.

Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Nov 4 19:01:27 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 04 Nov 2003 16:01:27 -0800 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067985237.1208.392.camel@asterix> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> Message-ID: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> > > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this > > interconnect technology ratcheted up way higher than 10GB. > >Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you >feel more comfortable. > >Patrick >-- Wait a minute here... You might run into some fundamental physics problems, especially when getting "way higher" than 10 Pb/sec... I'd like to see what you've paved that road map with, and make sure it doesn't ruin my shoes when I walk on it . Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of 10, here, and 100 Gb/sec is just too hard to envision..) 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 Giga, right?) Where are you going to fit those million wires/fibers/connectors? Let's say you're using optical fibers that are 10 micron in diameter (which is a fairly impressive feat). Assuming you space them by 5 micron, you can pack 1000x1000 of them in 5 mm x 5 mm... There is a bit of a problem with interconnects, etc., but perhaps you can terminate it right on top of a die, and the circuitry for one channel is small enough to fit? How tolerant is Myricom's hardware of skew and jitter between the parallel lines? 
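The fiber-count arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the function names are invented for illustration, the fibers are assumed to sit on a square grid, and "space them by 5 micron" is read as a 5 micron gap between 10 micron fibers (a 15 micron pitch). Under that reading the bundle comes out nearer 15 mm on a side; a 5 mm side would correspond to a 5 micron center-to-center pitch.

```python
import math

# Figures quoted in the message; the pitch interpretation is an assumption.
TARGET_BPS = 10 * 10**15       # 10 Pb/s aggregate
PER_FIBER_BPS = 10 * 10**9     # 10 Gb/s on each fiber
FIBER_DIAMETER_UM = 10         # fiber diameter in microns
GAP_UM = 5                     # assumed edge-to-edge spacing in microns


def fibers_needed(target_bps: int, per_fiber_bps: int) -> int:
    """Number of parallel fibers needed to reach the target rate (rounded up)."""
    return -(-target_bps // per_fiber_bps)


def bundle_side_mm(n_fibers: int, pitch_um: float) -> float:
    """Side length in mm of a square grid holding n_fibers at the given pitch."""
    per_side = math.isqrt(n_fibers - 1) + 1   # ceiling of sqrt(n_fibers)
    return per_side * pitch_um / 1000.0


if __name__ == "__main__":
    n = fibers_needed(TARGET_BPS, PER_FIBER_BPS)
    print("fibers needed:", n)
    print("side at 15 um pitch (mm):", bundle_side_mm(n, FIBER_DIAMETER_UM + GAP_UM))
    print("side at 5 um pitch (mm):", bundle_side_mm(n, 5))
```

Either way the conclusion in the message stands: a million channels fit in a footprint comparable to a large die, so the packing is less of a problem than the termination, skew and jitter.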
At least with a million lines, you can use statistical techniques to characterize it, and you'd almost have to use some form of forward error correction, so the extreme outliers wouldn't give you troubles. You might be able to push the bit rate a bit higher.. We've got some components working at 94 GHz here, and there are some novel techniques with propagating the wave in the boundary outside a dielectric rod, so the loss is reasonable. We haven't figured out how to turn a corner yet, but that wouldn't violate any laws of physics. The distance is short, so maybe waveguide can work (optical fiber is waveguide and fairly low loss) Hmm.. now, about that X-ium or X-lon mobo that is going to send/accept/process the 10 Pb/s data stream.... What is the physical limit on memory speed? The cells can only be so small, and you've got to propagate the charge across it. I suppose, theoretically, one could use a charge as small as 1 electron, so that sort of provides a lower bound. I've heard of CMOS processes with fT of 10 GHz in very small feature sizes (the wireless market really wants to do RF and digital on the same chip). Say you get that 10 GHz memory... you'll need million way interleaving. This starts to make the SIMD systolic arrays look more attractive doesn't it. Maybe free space optical interconnects with monolithically fabricated optics over the chip might be a solution? HeNe lasers are about 474 THz, as I recall, so if you baseband encode your 10 Pb/sec bitstream, you're only looking at 30 nm extreme UV kinds of bandwidth. Lends new meaning to the RF designer's term: DC to light... James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Tue Nov 4 19:27:36 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 04 Nov 2003 17:27:36 -0700 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> Message-ID: <1067992056.21779.50.camel@fpga.sandia.gov> > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of > 10, here, and 100 Gb/sec is just too hard to envision..) > > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 > Giga, right?) Um: http://pr.fujitsu.com/en/news/2000/09/25.html or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html Multiple Tb/s on a single fiber... Of course, I don't have any idea what they are going to do with the data when they get it there either ;-) (i.e. issues with memory, buses, processor speeds, etc.) Or, for that matter, how they are going to get that data out of any silicon chip... But, I assume these are planned for after we are doing all optical computing, right? ;-) Keith -- Keith D. 
Underwood _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Tue Nov 4 19:04:12 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Tue, 4 Nov 2003 17:04:12 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031104213632.GA1662@galactic.demon.co.uk>; from amacater@galactic.demon.co.uk on Tue, Nov 04, 2003 at 09:36:32PM +0000 References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> Message-ID: <20031104170412.A9868@lnxi.com> On Tue, Nov 04 2003 at 14:36, Andrew M.A. Cater wrote: > There may be an alternative which will guarantee you code freedom, > won't charge for licenses in any event, won't bow to "commercial" > pressure and won't restrict your code use/re-use/modification/distribution. > If you want an ultra stable platform to which you can freely contribute > code and which you can use for any purpose - try Debian "stable". ... > Debian encourages everyone to build specialist distributions based on > Debian. If the hassle's too much, feed in your cluster packages to > become part of the main distribution. There is at least one Debian > newsgroup (debian beowulf) where clusters are of high interest. > Check out what's already within Debian. As noted in a previous post, > you may find what you want has already been put in place in the 8000+ > packages. Debian is at a disadvantage in that RPM is not its native package format; BUT debs are also what make Debian so robust. However, RPM is the package format of choice for HPC, the enterprise, and lets not forget the LSB. 
It's unfortunate really, but Debian has generally prided itself on making aspiring Debian developers run the deb packaging gauntlet in order to prove they've got the required deb-fu. That's something that'll have to be lessened, possibly by leveraging some of the build systems that are coming to light from developers in the Debian community.

If anything I'd say that Debian's lack of native RPM support is the biggest hurdle for Debian to have a break-out run as the Linux distro of choice for many. BUT there is hope; Progeny recently announced that they ported RedHat's Anaconda to Debian (still under development). This is significant in that projects like ROCKS _could_ theoretically drop Debian in as a replacement for the underlying Linux distro and still maintain the complete clustering feature set that ROCKS offers (at least kickstart compatibility). Granted, extensive RedHat-isms would need to be ported to be Debian-isms; but this is where the LSB is _supposed_ to weigh in.

But as Joe Landman pointed out, certification and commercial software support for Debian is non-existent (AFAIK); that _could_ change in the near future if Bruce Perens and others in the Debian community have anything to say about it. As Bruce Perens recently mentioned on an lwn.net forum:

Bruce: What's wrong with RedHat? (Posted Oct 11, 2003 16:02 UTC (Sat) by BrucePerens)

The most important things that a user-driven distribution can provide over RHAS are that the free version will be the certified one, that there won't be a lock on support information, and that it won't be dominated by one company. I am having talks with sponsors now. You'll hear from me in a few weeks.

> This is a purely personal post. It does _not_ represent my employers
> or any other person.

Likewise...
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Tue Nov 4 19:05:13 2003 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Wed, 5 Nov 2003 00:05:13 +0000 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA82D51.3070102@scalableinformatics.com> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> Message-ID: <20031105000513.GA2101@galactic.demon.co.uk> On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: > > > Andrew M.A. Cater wrote: > > [...] > > >BUT > > > >If you want an ultra stable platform to which you can freely contribute > >code and which you can use for any purpose - try Debian "stable". > > > > It's an idea. > > There are interesting bits in debian. I am not sure it is necessarily > the right choice for clusters due to the specific lack of commercial > support for cluster specific items such as Myrinet, and the other high > speed interconnects. Dan - if I build a _really big_ cluster, will you get Quadrics to do Debian :) Same goes for any other vendor - if you ask them nicely and make it worth their while, they'll do it. In many cases, it's only a recompile of a device driver to account for library differences, after all. HP use Debian internally, IIRC. Some of the Debian developers are also HP folk - HP are potentially looking to support more of their products under Linux? [See, for example, Debian Weekly News for today :) ] > Commercial compiler support for Debian (e.g. > Intel, Absoft, et al) is largely non-existant as far as I know (please > do correct me if I am wrong). Compaq Alpha compilers work on the Alpha port or can be tweaked to IIRC. 
I have no current expertise on big commercial compilers, however. > Few if any commercial applications are certified to work on Debian > (Oracle, Legato, ....) and again, please correct me if I am wrong. > Many of these will run fine without formal certification from the vendor. Few, if any, current commercial apps run on Red Hat 4.2 / 5.0 - and current Red Hat 7.x/8.x/9.x is now as commercially relevant. The big commercial apps will have to retrench their markets, potentially, to (one/both) of Novell / RH Enterprise Linux at ??$ per licence. Unless it says RH/Novell on the box, they won't certify on something "less but Libre" based on RH. But this is Linux - a commercial Linux app. will run on other distributions with a little thought / planning. I'm not sure they'll run Oracle on Scyld / ROCKS, for example. > I simply don't see this as a universally viable alternative. Debian > does indeed have lots of nice technical things going for it. Maybe I am > missing some obvious point here. I do know some people have built > clusters using it, but a few clusters does not a clustering distribution > make. I believe someone at Cornell built Windows 2000 into a cluster. > Doesn't make Win2k a clustering OS though. If the HPC on Linux community wants to build a clustering distribution on their terms they can within Debian. A thousand coders worldwide who have more than a passing interest in fun stuff can work wonders if they see the motivation in quality and good code - a character trait I'm sure they have in common with many cluster folk, academics and researchers :) > > The distribution matters less than the overall support for what you want > to do with it. I believe that it might be possible to build a Gentoo > based cluster, though I would be concerned about the length of time for > an OS load, among other things. One of the hardest parts of a cluster > is getting the OS on. 
Getting Debian nodes up is no harder than anything else on any other distribution - provided it's not your first ever experience of Linux :) The minimal Debian install really is fairly minimal, if that's what you want - you can readily build from there. Want a full featured X Window System - apt-get install x-window-system. Want vi ?? apt-get install vi / elvis / vim / nvi ... :)

> ROCKS, BioBrew (and I understand Warewulf) make
> this ridiculously easy. Increasing the setup/management time, or making
> your life harder in general, doesn't make much sense. There is a
> Knoppix variant that does clustering (OpenMosix style). Not sure it is
> the best solution, but I would like to hear from anyone using it.

This is fun if you want an ad-hoc StoneSouperComputer - the 512 node machine built in a night on a German TV show or the four node proof of concept idea for a show and tell in someone's office - but I'm not entirely sure I'd trust my most valuable data to it. But hey, like most things KNOPPIX based it's an ultra cool demo :)

Have fun - at 0015 or so Zulu time, I'd better get some rest :)

Andy

From patrick at myri.com Tue Nov 4 20:27:03 2003
From: patrick at myri.com (Patrick Geoffray)
Date: 04 Nov 2003 20:27:03 -0500
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov>
References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov>
Message-ID: <1067995623.1208.431.camel@asterix>

Hi Jim,

On Tue, 2003-11-04 at 19:01, Jim Lux wrote:
> >Myricom's roadmap goes up way higher than 10 Pb/s, if that makes you
> >feel more comfortable.

> Wait a minute here...
You might run into some fundamental physics problems, > especially when getting "way higher" than 10 Pb/sec... I'd like to see what > you've paved that road map with, and make sure it doesn't ruin my shoes > when I walk on it . It supposed to wish you luck in some countries, right ? :-) But what about this twin photon spin stuff ? You send a photon, you keep his twin brother and when you spin the second one, the first one spin the same way a few thousand miles away.... Seriously, it was a very ironical comment, I should have marked it that way to be sure it would not be taken at the letter. I wrote Pb because I don't know what there is after "Peta". I thought about googooplexb/s but didn't know if it was valid. The point is that having something in your roadmap has no more value than what just ruined your shoes. Some vendors also have a public roadmap and a private roadmap, you need an NDA to access the private roadmap, either because you lie in the public roadmap or because the private roadmap contains strategic choices that you don't want to share with your competitors. So let's say that Myricom's roadmap goes up way higher than 10 googooplexb/s around 2030 :-) Patrick --- Patrick Geoffray Myricom, Inc. 
http://www.myri.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 19:29:13 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 04 Nov 2003 19:29:13 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031105000513.GA2101@galactic.demon.co.uk> References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> <20031105000513.GA2101@galactic.demon.co.uk> Message-ID: <1067992153.4513.24.camel@protein.scalableinformatics.com> On Tue, 2003-11-04 at 19:05, Andrew M.A. Cater wrote: > On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: [...] > > There are interesting bits in debian. I am not sure it is necessarily > > the right choice for clusters due to the specific lack of commercial > > support for cluster specific items such as Myrinet, and the other high > > speed interconnects. > > Dan - if I build a _really big_ cluster, will you get Quadrics to do > Debian :) I think the question is, if you buy $10M in interconnects from them, would they please port to distro X. Likely it would be worth their while in that case. Are you going to build such a big cluster? :) > Same goes for any other vendor - if you ask them nicely and make it > worth their while, they'll do it. In many cases, it's only a recompile > of a device driver to account for library differences, after all. Not always. The issue is not simply a port, but also the support costs. Support in the sense of qualifying the port against a standard load. Coming up with the standard load, building the regression tests, educating the staff on the new support ... 
They may simply make the port and say, good luck, you are on your own. > HP use Debian internally, IIRC. Depends upon who you ask. Bruce Perens had some effect there, but as I remember, they use SUSE, RedHat, ROCKS, etc. > Some of the Debian developers are also > HP folk - HP are potentially looking to support more of their products > under Linux? [See, for example, Debian Weekly News for today :) ] Following on others foray into this, I am going to take a pragmatic position. I will believe it when I see it (.deb's from HP and others). > > Commercial compiler support for Debian (e.g. > > Intel, Absoft, et al) is largely non-existant as far as I know (please > > do correct me if I am wrong). > > Compaq Alpha compilers work on the Alpha port or can be tweaked to IIRC. > I have no current expertise on big commercial compilers, however. :) I seem to remember HP recently EOLing the Alpha in favor of some other chip... can't remember its name ... ;-) I can run Debian on my SGI Indy. I am not, but I can. Doesn't mean much as the market for Indy's has basically dried up. > > Few if any commercial applications are certified to work on Debian > > (Oracle, Legato, ....) and again, please correct me if I am wrong. > > > > Many of these will run fine without formal certification from the > vendor. Ok. Now sell that to a CIO/CTO, or someone responsible for making the infrastructure work. Mike at Linux Networx (though speaking for himself) called it the "smile test" or something like that. The question you will be asked is, if something breaks in our critical business application, who are we going to call if we are using the un-certified OS distribution? This is a hard sell. > Few, if any, current commercial apps run on Red Hat 4.2 / 5.0 - and > current Red Hat 7.x/8.x/9.x is now as commercially relevant. I respectfully disagree with the last portion of the statement. Most of the engineering code that I have played with recently spec out RH7.x as their linux supported platform. 
Anything else and you are on your own. The bio and chem codes which come pre-compiled tend to have a "requirements" section as well, listing RH7.x. Remember that RHAS2.1 will be supported a few more years, and it is ostensibly RH7.x. > The big > commercial apps will have to retrench their markets, potentially, to > (one/both) of Novell / RH Enterprise Linux at ??$ per licence. Unless > it says RH/Novell on the box, they won't certify on something "less but > Libre" based on RH. But this is Linux - a commercial Linux app. will run > on other distributions with a little thought / planning. I'm not sure > they'll run Oracle on Scyld / ROCKS, for example. Some distros are more (for lack of a better term) engineered than others. There are some code issues which some of these which break the "defacto" standard Linux. As for Oracle on ROCKS, well, Oracle does run in a supported mode on RHAS2.1 (see above), and ROCKS == RH7.3, so the rest is left to the reader. As the underlying OS is RedHat, with a meta layer atop it called ROCKS, Oracle should not see any reason not to work under this environment in a supported manner. That said, I am not sure that is what you want to do with Oracle though. [...] > > > ROCKS, BioBrew (and I understand Warewulf) make > > this ridiculously easy. Increasing the setup/management time, or making > > your life harder in general, doesn't make much sense. There is a > > Knoppix variant that does clustering (OpenMosix style). Not sure it is > > the best solution, but I would like to hear from anyone using it. > > > > This is fun if you want an ad-hoc StoneSouperComputer - the 512 node > machine built in a night on a German TV show or the four node proof > of concept idea for a show and tell in someone's office - but I'm > not entirely sure I'd trust my most valuable data to it. 
But hey, like > most things KNOPPIX based its an ultra cool demo :) I think the irony in all of this is that the one disk I carry with me everywhere is a Knoppix disk (Debian based). I like it, it is technically neat. > > Have fun - at 0015 or so Zulu time, I'd better get some rest :) > > Andy -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Tue Nov 4 20:08:02 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue, 4 Nov 2003 17:08:02 -0800 Subject: IB vs Myrinet In-Reply-To: <3FA7BD83.2060901@lmco.com> References: <1067952697.15517.2.camel@QUIGLEY.LINIAC.UPENN.EDU> <3FA7BD83.2060901@lmco.com> Message-ID: <20031105010802.GE4682@greglaptop.internal.keyresearch.com> On Tue, Nov 04, 2003 at 09:53:55AM -0500, Jeff Layton wrote: > I want to interject a comment here. In the past (recent and a few > years back) we've had trouble with the open source MPI implementations > with our codes. When we contacted them about our problem we got > a luke warm (at best) response. Jeff, Whom did you contact, the people maintaining MPICH, or the people who sold you the cluster? System integrators can do more than deliver piles of parts, they can also support things. So there's more to life than asking open source maintainers to fix something, or switching to a closed-source commercial MPI vendor. > One of the hidden costs from my prospective, that allows us to > compare interconnects, is a product of the cost of diagnosing problems, > fixing problems, and how frequently the problems occur. We have > experience with one high-speed interconnect in this regard and that > number is very large. Amen. 
Again, the front-line of interconnect support is generally the system integrator. Myricom, for example, has a reseller model, in which front-line support should be provided by your system integrator. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Tue Nov 4 20:53:52 2003 From: jsims at csiopen.com (Joey Sims) Date: Tue, 4 Nov 2003 20:53:52 -0500 Subject: IB vs Mryinet Message-ID: <812B16724C38EE45A802B03DD01FD547226269@exchange.concen.com> Hello Patrick, First of all, I want to apologize for my message initially coming off as if there were problems dealing with Myricom in regards to your products and/or customer service and it's user base. Not so. It was not intended to be directed to Myrinet users at all. I clarified that immediately. Secondly, I do not want to use this message board which I enjoy visiting, chatting with others, and learning a lot from as some tool to sling mud back and forth. Personally, I don't think that would be very courteous to the owners of this site nor its visitors. Moving forward with respect to others, I will keep this response brief and no more need be said about it. At least you guys are consistent with your "shock and awe" defense reflex at the first mention of a competitors product when compared to yours. Friendly competition is good for everyone for many reasons. Matter of fact, I've purchased your product from one of my own competitors and he was awesome. He helped me out big time. For the record, we still haven't found that missing card.... Be treated the way you would want to be treated. I am not the only one who has witnessed the show of arrogance and disrespect towards someone coming to you for years trying to sell your product and be treated like we just stuck you with bad debt. We're just trying to do our jobs. Have a great evening. 
Sincerely, ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Nov 4 20:32:16 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 04 Nov 2003 20:32:16 -0500 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067992056.21779.50.camel@fpga.sandia.gov> References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> <1067992056.21779.50.camel@fpga.sandia.gov> Message-ID: <1067995935.4513.51.camel@protein.scalableinformatics.com> On Tue, 2003-11-04 at 19:27, Keith D. Underwood wrote: > > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in powers of > > 10, here, and 100 Gb/sec is just too hard to envision..) > > > > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 > > Giga, right?) > > Um: http://pr.fujitsu.com/en/news/2000/09/25.html > or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html > > Multiple Tb/s on a single fiber... Get yer lambda's here... red hot... lots of them... 10000 to a fiber ... Think of this like the cable TV coax coming into your house (theoretically if need be). One wire, lots of bandwidth. I have heard (e.g. bad memory) that TV signals require 6MHz of bandwidth, so "hundreds" of TV stations require somewhat less than a GHz in bandwidth. Same effect, using different 1/lambda's for each "channel". > Of course, I don't have any idea what they are going to do with the data > when they get it there either ;-) (i.e. 
issues with memory, buses, > processor speeds, etc.) Or, for that matter, how they are going to get > that data out of any silicon chip... But, I assume these are planned > for after we are doing all optical computing, right? ;-) The entropy generation rate must be huge with all that data going into the bit bucket, http://www.nature.com/nature/journal/v406/n6799/box/4061047a0_bx1.html unless of course a fast enough receiver can do something about this... k(B) ln 2 for each bit "erased", so something like ((10**15) k(B) ln 2)/second, if you consider that bits dropped on the floor are erased. On the order of 10**(-8) W/K. I defer to practicing physicists on the interpretation (as it is not my field). -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmyers1400 at comcast.net Tue Nov 4 10:23:36 2003 From: rmyers1400 at comcast.net (Robert Myers) Date: Tue, 04 Nov 2003 10:23:36 -0500 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3FA7C478.9060908@comcast.net> Butti Gabriele - Dottorati di Ricerca wrote: > > > >The code we want to run on these machines >is basically a home-made code, not fully optimized, which allocates >around 500 Mb of RAM per node. > I'm surprised no one has picked up on this comment. If they have, I've missed it. If you don't want to mess with optimizing and tuning, don't even consider itanium. Let somebody else be the pioneer. You'll live longer. There are people who are seriously into this kind of stuff, and plainly you're not. 
RM _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From widyono at cis.upenn.edu Tue Nov 4 12:34:17 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 4 Nov 2003 12:34:17 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <20031103225814.B4021@lnxi.com> References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> <20031103225814.B4021@lnxi.com> Message-ID: <20031104173417.GA22590@central.cis.upenn.edu> I just asked Mason about the source of his statement, and he referred me to RedHat's own site, from which I found at http://www.redhat.com/about/corporate/trademark/guidelines/page9.html the following quote (taken out of the Educational Institutions paragraph): "This permission is not applicable to Red Hat. Enterprise Linux. or any Red Hat subscription product. Of course, you are always permitted to redistribute the code without utilizing Red Hat's trademark so long as you otherwise comply with the GNU General Public License and Red Hat's Trademark Guidelines." cAos's work is sounding mighty tasty. Thanks for the cross-post, Mike. Dan W. > Also, nice to see you cross posted to the rocks-discussion, for the > benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an > informative reply: > https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html > > It would appear as though Rocks is free and clear to openly redistribute > RHEL SRPM-rebuilds; this is an interesting loop-hole: > > - Rocks released by an academic institution, which means it has a > license to use the RedHat trademark. This also means no one can charge > for Rocks software (only support). 
From phil at sdsc.edu Tue Nov 4 16:05:21 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Tue, 04 Nov 2003 13:05:21 -0800
Subject: [Rocks-Discuss]Re: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <20031104173417.GA22590@central.cis.upenn.edu>
References: <20031103165141.A3153@lnxi.com> <7C4A0D93-0E70-11D8-9947-000393911A90@callident.com> <20031103225814.B4021@lnxi.com> <20031104173417.GA22590@central.cis.upenn.edu>
Message-ID: <3FA81491.9030501@sdsc.edu>

Daniel Widyono wrote:

>I just asked Mason about the source of his statement, and he referred me to
>RedHat's own site, from which I found at
>
>http://www.redhat.com/about/corporate/trademark/guidelines/page9.html
>
>the following quote (taken out of the Educational Institutions paragraph):
>
>"This permission is not applicable to Red Hat. Enterprise Linux. or any Red
>Hat subscription product. Of course, you are always permitted to
>redistribute the code without utilizing Red Hat's trademark so long as you
>otherwise comply with the GNU General Public License and Red Hat's Trademark
>Guidelines."
>

If you read the _entire_ paragraph, this refers to a redistribution of their binaries (which we did for distributions <= 7.3). For IA-64, we recompiled everything that was on RedHat's open-source _source_ directory. I very much doubt that the enterprise-specific crown jewels (failover, etc) are in this source directory. In particular, we are not redistributing redhat-created binaries. We went down the path of trying to get permission to re-distribute IA-64 to a small select group of folks, but could not and therefore we didn't -- hence, we "figured" out how to rebuild a complete IA-64 distro from the open-source sources.

Also, we are only working with the advanced workstation (AW) source tree.
I don't know (and don't care) if this is the same base set that is used in the advanced server (AS) and enterprise server (ES) versions that Redhat sells as well. Our _current_ clustering needs are met quite well by the re-compiled open-source rpms of AW. And we are not trying to re-engineer RedHat's entire line (even though some may think we are). If one needs AS or ES, the web address is http://www.redhat.com.

So, are there things in AW that aren't in Rocks? Probably. I haven't looked in detail.
Are there things in Rocks that aren't in AW? Absolutely (go check our cvs repository).
Are there open-source add-ons in Rocks that we didn't author? Uh. Yeah. Try a whole litany of base cluster tools. MPICH, GM, SGE, Globus, Condor-G. ... .
Are there open-source things that Redhat didn't author? Duh.

So if it's all open source, what does redhat sell? Services, patches, updates, notifications, integration. All of this is very very valuable. They make key contributions to linux development and have authored a whole bunch of critical software -- their (open-source) packaging format is a de facto standard. Even SuSE uses it.

We like Redhat (a lot), want to support them (both morally and with $$). There is, however, a reality of how much money people have and how much they are willing to spend. RedHat will find that balance (I hope) for clusters, universities, and others. I believe that most folks agree that O($200/node/yr) does not match either the amount of money people have or how much money they are willing to spend for the support in a clustered environment.

-P

>
>cAos's work is sounding mighty tasty. Thanks for the cross-post, Mike.
>
>Dan W.
> > > >>Also, nice to see you cross posted to the rocks-discussion, for the >>benefit of those on the beowulf list, Mason Katz (mjk at sdsc.edu) had an >>informative reply: >>https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-November/003567.html >> >>It would appear as though Rocks is free and clear to openly redistribute >>RHEL SRPM-rebuilds; this is an interesting loop-hole: >> >> - Rocks released by an academic institution, which means it has a >>license to use the RedHat trademark. This also means no one can charge >>for Rocks software (only support). >> >> -- == Philip Papadopoulos, Ph.D. == Program Director for San Diego Supercomputer Center == Grid and Cluster Computing 9500 Gilman Drive == Ph: (858) 822-3628 University of California, San Diego == FAX: (858) 822-5407 La Jolla, CA 92093-0505 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Nov 4 20:20:57 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 04 Nov 2003 17:20:57 -0800 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067992056.21779.50.camel@fpga.sandia.gov> References: <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <5.2.0.9.2.20031104154142.018be380@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031104171841.030df030@mailhost4.jpl.nasa.gov> At 05:27 PM 11/4/2003 -0700, Keith D. Underwood wrote: > > Say you can run a wire/fiber/whathaveyou at 10 Gb/sec (working in > powers of > > 10, here, and 100 Gb/sec is just too hard to envision..) > > > > 10 Pb/sec would be a million times faster (Peta = 1000 Tera = 1000,000 > > Giga, right?) 
> >Um: http://pr.fujitsu.com/en/news/2000/09/25.html >or: http://www.siemens.com/page/1,3771,228164-1-999_4_0-0-,00.html > >Multiple Tb/s on a single fiber... > >Of course, I don't have any idea what they are going to do with the data >when they get it there either ;-) (i.e. issues with memory, buses, >processor speeds, etc.) Or, for that matter, how they are going to get >that data out of any silicon chip... But, I assume these are planned >for after we are doing all optical computing, right? ;-) > And I'll bet that fancy Wavelength Division Multiplexing (WDM) hardware that Fujitsu used is physically quite large to mux those 200 channels together. You'd run into a light time across the box problem. That nanosecond per foot adds up when you're sending bits at 10 Gbps. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 4 23:05:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Nov 2003 23:05:09 -0500 (EST) Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: <1067995623.1208.431.camel@asterix> Message-ID: On 4 Nov 2003, Patrick Geoffray wrote: > But what about this twin photon spin stuff ? You send a photon, you keep > his twin brother and when you spin the second one, the first one spin > the same way a few thousand miles away.... Oooo, don't get me started on EPR. All I can say is no, no, and no. Can't describe relativistic phenomena with non-relativistic physics, and you can't "spin one photon one way" and do anything at all to the other thousands of miles away. Time reversal invariance. Generalized master equation. Quantum mechanics of closed systems is fully deterministic. 
Measurement represents classical interference (all measurement apparati are classical and of indeterminate/stochastic phase and known only via a trace in the GME).

Ahhh, my head is exploding....must...stop...thinking...about...quantum...measurement...theory.

> So let's say that Myricom's roadmap goes up way higher than 10 googooplexb/s
> around 2030 :-)

Let's see, if we drive a truck containing 125 terabyte-sized RAID arrays through a big garage door fast enough to pass through in about one second...I guess that makes about a Pbps, right? So you're wrong, Jim. We can accomplish this today, at least in a burst. If we can afford a whole line of said trucks, we might even achieve it sustained...;-)

All myricom needs is a bunch of trucks. BIG trucks...:-)

   rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From erwan at mandrakesoft.com Tue Nov 4 05:31:51 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Tue, 04 Nov 2003 11:31:51 +0100
Subject: Freebee RH Releases...
In-Reply-To:
References:
Message-ID: <1067941911.902.126.camel@revolution.mandrakesoft.com>

On Tue 04/11/2003 at 00:31, Rocky McGaugh wrote:
> I can hold my tongue no longer.
> Most of us are faced with similar problems.
> Several groups are in process of making a freely distributable OS
> for scientific use.

I must say that's also our goal :)
This is the basis of the CLIC project but combined with an OS approach.
Making a clustering OS for scientists.
--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From rossini at blindglobe.net Tue Nov 4 23:35:56 2003
From: rossini at blindglobe.net (A.J. Rossini)
Date: Tue, 04 Nov 2003 20:35:56 -0800
Subject: Freebee RH Releases...
In-Reply-To: <1067941911.902.126.camel@revolution.mandrakesoft.com> (Erwan Velu's message of "Tue, 04 Nov 2003 11:31:51 +0100")
References: <1067941911.902.126.camel@revolution.mandrakesoft.com>
Message-ID: <85oevrph3n.fsf@blindglobe.net>

Erwan Velu writes:

> On Tue 04/11/2003 at 00:31, Rocky McGaugh wrote:
>> I can hold my tongue no longer.
>> Most of us are faced with similar problems.
>> Several groups are in process of making a freely distributable OS
>> for scientific use.
> I must say that's also our goal :)
> This is the basis of the CLIC project but combined with an OS approach.
> Making a clustering OS for scientists.

It's many people's goal. It's one of the reasons for Quantian (a ClusterKnoppix repackaging, focusing on analytic and quantitative tools for data analysis).

best,
-tony

--
rossini at u.washington.edu http://www.analytics.washington.edu/
Biomedical and Health Informatics University of Washington
Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email

CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Tue Nov 4 23:46:07 2003 From: lathama at yahoo.com (Andrew Latham) Date: Tue, 4 Nov 2003 20:46:07 -0800 (PST) Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: Message-ID: <20031105044607.37924.qmail@web60307.mail.yahoo.com> speacking of big trucks (I almost hurt myself laughing). did anyone ever use anything with the silk road technology. I read about them making fiber adapters with internal robotics to grab the fiber and automagicly align the laser for best use. It sounded cool a few years ago, Just curious. --- "Robert G. Brown" wrote: > On 4 Nov 2003, Patrick Geoffray wrote: > > > But what about this twin photon spin stuff ? You send a photon, you keep > > his twin brother and when you spin the second one, the first one spin > > the same way a few thousand miles away.... > > Oooo, don't get me started on EPR. All I can say is no, no, and no. > > Can't describe relativistic phenomena with non-relativistic physics, and > you can't "spin one photon one way" and do anything at all to the other > thousands of miles away. > > Time reversal invariance. Generalized master equation. Quantum > mechanics of closed systems is fully deterministic. Measurement > represents classical interference (all measurement apparati are > classical and of indeterminate/stochastic phase and known only via a > trace in the GME). > > Ahhh, my head is > exploding....must...stop...thinking...about...quantum...measurement...theory. > > > > > So let's say that Myricom's roadmap goes up way higher than 10 > googooplexb/s > > around 2030 :-) > > Let's see, if we drive a truck containing 125 terabyte sized RAID arrays > through a big garage door fast enough to pass through in about one > second...I guess that makes about a Pbps, right? So you're wrong, Jim. 
> We can accomplish this today, at least in a burst. If we can afford a
> whole line of said trucks, we might even achieve it sustained...;-)
>
> All myricom needs is a bunch of trucks. BIG trucks...:-)
>
> rgb
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

=====
/---------------------------------------------------------------------------------------------------\
Andrew Latham - Penguin Loving, Moralist Agnostic.
What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a god and the future with witch religions are concerned with. Or, if not impossible, at least impossible at the present time.
LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com
\---------------------------------------------------------------------------------------------------/

From brian.dobbins at yale.edu Wed Nov 5 00:38:03 2003
From: brian.dobbins at yale.edu (Brian Dobbins)
Date: Wed, 5 Nov 2003 00:38:03 -0500 (EST)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To:
Message-ID:

Hah! Big trucks?

Actually, if we use a collection of 8-disk RAID 5 arrays, equipped with Maxtor's 320GB EIDE HDD, we'd get effectively 2.24 TB per array, right? That can fit in a 2U rack, and we'd need 56 of them to reach 125 TB, so if we separate that into 2 x 28, we'd need two rows, each 19" wide, 24" (approx.) tall, and 28 x 3.5" long - or 8.17 feet. (Laying them straight up, in this case)

Hmm...
let's make it 2 x 2 rows, 14 deep, so we've got a bulk that's 38" wide, 48" tall, and 49" long. Add a little for padding, I guess, but since this is just for throughput, not counting, uh, well, lots of error correcting that may be needed if the road is bumpy, we'd get something a little over three feet wide, 4 feet tall, and just over 4 feet long, right?

Taking a look at Chevy.com, it appears that the basic Silverado 1500 has a cargo box width of 49 inches, so we'd just make it, and a whopping 78.6 inches in length, so we can fit all of our RAID arrays, plus plenty of room left over for beer to celebrate the milestone with afterwards. :)

> All myricom needs is a bunch of trucks. BIG trucks...:-)

I'm betting with a U-Haul we could update the roadmaps even more!

Cheers,
 - Brian

Brian Dobbins
Yale University Mechanical Engineering
-------------------------------------------------------------------
"Be nice to other people... they outnumber you six billion to one."

From eric at fnordsystems.com Wed Nov 5 01:49:27 2003
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Tue, 04 Nov 2003 22:49:27 -0800
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To:
References:
Message-ID: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>

So now you have a van full of RAID arrays... How do you load and unload it? It would be quite comical to run a spool of armored/weatherized multimode fiber from the computer room, down the hallway, out the door and into the parking lot. :-)

Using the previously mentioned figure of 125TB in a truck, and an estimated coast-to-coast driving time of 60 hours, this works out to a one-way transfer rate of 2.08333 terabytes per hour, not counting the load and unload time.
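
The arithmetic behind that figure can be checked with a short sketch. It assumes decimal units (1 TB = 10**12 bytes) and uses the 125 TB payload and 60-hour drive quoted above; the function name `truck_throughput` is made up for illustration and is not from the thread.

```python
def truck_throughput(payload_tb: float, trip_hours: float) -> dict:
    """Effective one-way throughput of a payload of disks in transit.

    Assumes decimal units (1 TB = 10**12 bytes) and ignores the time
    needed to load and unload the disks at either end.
    """
    payload_bits = payload_tb * 1e12 * 8   # terabytes -> bits
    trip_seconds = trip_hours * 3600       # hours -> seconds
    return {
        "tb_per_hour": payload_tb / trip_hours,
        "gbit_per_second": payload_bits / trip_seconds / 1e9,
    }


rates = truck_throughput(payload_tb=125, trip_hours=60)
print(f"{rates['tb_per_hour']:.3f} TB/hour")
print(f"{rates['gbit_per_second']:.2f} Gb/s")
```

Under these assumptions the coast-to-coast run sustains roughly 4.6 Gb/s, which is a long way below the 1 Pb/s burst obtained by moving the same 125 TB through a doorway in one second.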
> Taking a look at Chevy.com, it appears that the basic Silverado 1500 has >a cargo box width of 49 inches, so we'd just make it, and a whopping 78.6 >inches in length, so we can fit all of our RAID arrays, plus plenty of >room left over for beer to celebrate the milestone with afterwards. :) > > > All myricom needs is a bunch of trucks. BIG trucks...:-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 5 00:30:35 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 5 Nov 2003 16:30:35 +1100 Subject: 10 Pb/sec? Re: IB vs Myrinet In-Reply-To: References: Message-ID: <200311051630.37595.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 5 Nov 2003 03:05 pm, Robert G. Brown wrote: > On 4 Nov 2003, Patrick Geoffray wrote: > > But what about this twin photon spin stuff ? You send a photon, you keep > > his twin brother and when you spin the second one, the first one spin > > the same way a few thousand miles away.... > > Oooo, don't get me started on EPR. All I can say is no, no, and no. > > Can't describe relativistic phenomena with non-relativistic physics, and > you can't "spin one photon one way" and do anything at all to the other > thousands of miles away. I think he's talking about things like using entanglement for crypto key exchanges, etc, which has already been done. Viz: http://www.quiprocone.org/pressrelease_JRarity.htm Jan 2001 - DERA Scientists achieve world record 1.9km range for free- space secure key exchange using quantum cryptography. My favourite quote: To avoid air turbulence effects the experiment was carried out over an elevated path with the receiver on the DERA Malvern site and the transmitter located in a rented room in a conveniently situated pub on the side of the Malvern hills. 
These are smart cookies working on this, John is an Institute of Physics medal winner (1995).

- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qIr7O2KABBYQAh8RAi0pAJ0U+T7O7DKD4FA8hX+vwWNIBb+hcQCgi/S8
hY56fIUjydfLhcU+VcmrSP8=
=PlZQ
-----END PGP SIGNATURE-----

From john.hearns at clustervision.com Wed Nov 5 05:01:15 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Wed, 5 Nov 2003 11:01:15 +0100 (CET)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>
Message-ID:

On Tue, 4 Nov 2003, Eric Kuhnke wrote:

> So now you have a van full of RAID arrays... How do you load and unload
> it? It would be quite comical to run a spool of armored/weatherized

And quite comical to turn corners (depending on how the disks are oriented of course)

But don't joke about things like this. This is exactly how e-cinema is currently arranged. As I recall, Lucasfilm delivered RAID arrays with the digital files of Star Wars Episode 1 to a cinema in Leicester Square, ready for digital projection. Star Wars is the first film to be shot entirely digitally - from the cameras right through to projection.
From scheinin at crs4.it Wed Nov 5 05:22:53 2003
From: scheinin at crs4.it (Alan Scheinine)
Date: Wed, 5 Nov 2003 11:22:53 +0100
Subject: Cluster Poll Results (tangent into OS choices)
Message-ID: <200311051022.hA5AMrw02523@dali.crs4.it>

Joe Landman wrote:
> There are interesting bits in debian. I am not sure it is necessarily
> the right choice for clusters due to the specific lack of commercial
> support for cluster specific items such as Myrinet, and the other high
> speed interconnects.

The above comment is just one of many that seem to me to describe the situation as being dependent on a commercial distribution whereas, in my experience, Red Hat was not sufficient. The most stable Red Hat for a long time was 7.3 but to solve a problem with the ext3 file system, I installed the most recent kernel from kernel.org. Moreover, from time to time I update gcc/g77/g++ from GNU. If I waited for Red Hat, I would be running out-of-date software for the compiler and the kernel, for a non-trivial period of time. Perhaps as a slogan I should write: it's not that we need a distribution as good as Red Hat, we need something even better.

I was motivated to write this when I read the reference to Myrinet. The gm driver needs to be compiled with a specific kernel. Using the most recent kernel (at the time I did the work around the beginning of 2003) from kernel.org and using the most recent gcc from GNU, I built gm and MPICH-gm and it works fine.

With regard to the comment by Joe Landman, I assume he is referring to a cluster-specific distribution such as ROCKS, whereas my comments make reference to Red Hat.
My intention is to raise a general question: wouldn't any RedHat-like distribution be sufficient as the base such that one person could do the rest of the work needed to build and maintain a cluster? On our cluster we have MPICH_pgi, MPICH_intel, MPICH_gcc, MPICH_pgi_myrinet, MPICH_intel_myrinet and most of these also compiled for debugging. It was not fun to build but I want to give users a choice. Would I have the same flexibility automatically with a distribution oriented towards clusters?

Alan Scheinine

From john.hearns at clustervision.com Wed Nov 5 04:58:38 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Wed, 5 Nov 2003 10:58:38 +0100 (CET)
Subject: 10 Pb/sec? Re: IB vs Myrinet
In-Reply-To: <5.2.0.9.2.20031104224018.04590310@216.82.101.6>
Message-ID:

On Tue, 4 Nov 2003, Eric Kuhnke wrote:

> So now you have a van full of RAID arrays... How do you load and unload
> it? It would be quite comical to run a spool of armored/weatherized

And quite comical to turn corners (depending of course on how the disks are oriented).

From mathog at mendel.bio.caltech.edu Wed Nov 5 11:59:29 2003
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 05 Nov 2003 08:59:29 -0800
Subject: A Tyan S2466 gotcha
Message-ID:

Has anybody else seen this?

Last week I was going nuts trying to figure out why one of our Tyan S2466 nodes was running slower than all the others. It was physically the same as the other nodes (1 GB memory, S2466 motherboard, single Athlon MP 2200+) yet CPU-bound jobs ran about 25% slower on only this node.
Finally it crossed my mind that maybe the vendor had somehow or other stuck the wrong chip in it, and sure enough, /proc/cpuinfo showed that one node as having an Athlon MP 1600+.

Except it wasn't really. Simply rebooting the node changed /proc/cpuinfo back to Athlon MP 2200+. Apparently under some circumstances this motherboard will throttle back from 266 MHz to 200 MHz, at which point it misreports the identity of the CPU. Asus mobos do something similar when they shut down funny (for instance on a power failure), but they stay at the slower "safe" setting until it is changed manually in the BIOS. This Tyan board bounces back up to the higher setting on the next reboot.

Anyway, the take home lesson seems to be that one should scan the /proc/cpuinfo on all nodes following a reboot to verify that all came up at the rated speed.

Is there some way to configure these nodes so that they cannot drop into the lower speed?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From rgb at phy.duke.edu Wed Nov 5 13:00:13 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 13:00:13 -0500 (EST)
Subject: A Tyan S2466 gotcha
In-Reply-To:
Message-ID:

On Wed, 5 Nov 2003, David Mathog wrote:

> Anyway, the take home lesson seems to be that one should
> scan the /proc/cpuinfo on all nodes following a reboot to
> verify that all came up at the rated speed.
xmlsysd:

Content-Length: 728

AuthenticAMD 6 6 AMD Athlon(tm) MP 1900+ 1600.096 256
AuthenticAMD 6 6 AMD Athlon(tm) Processor 1600.096 256

plus wulfstat:

r00 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 21d:04h:56m:09s| 98
r01 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:30 pm| 15d:23h:17m:58s| 94
r02 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 15d:23h:17m:50s| 93
r03 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 15d:23h:17m:26s| 93
r04 |AMD Athlon(tm) MP 1900+ |1600| 256|12:51:31 pm| 21d:04h:55m:39s| 98
...

make it easy to scan a cluster for this particular problem -- all the rnodes are 2466's:-)

Did the clock drop on just ONE CPU or on both? xmlsysd provides both as you can see, but up to now I have only displayed the clock of the first one in wulfstat as it never occurred to me that they might be different.

> Is there some way to configure these nodes so that
> they cannot drop into the lower speed?

What BIOS revision are you running? Most of the problems we've had with 2466's are related to running an older BIOS. It should be at least 4.03 I think to run fairly stably.

Although if this is a thermal throttling to avoid processor burnout, what it may be telling you is that this particular node has a bad CPU cooler or a ribbon cable somewhere that is partially obstructing airflow. The Tyan/Athlon combination >>really<< hates heat and responds to an excess with temper tantrums and worse. We've found that just having CPU-coolers that "work" but rattle a bit while working is enough to induce node failure under load. You might not WANT to override the BIOS action here, but rather tweak the node to run cooler.

   rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Wed Nov 5 12:16:32 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed, 5 Nov 2003 09:16:32 -0800 (PST) Subject: Fwd: [Mauiusers] SC 2003 Activities Message-ID: <20031105171632.17590.qmail@web11401.mail.yahoo.com> --- David Jackson wrote: > News Brief > > Cluster Resources to show the latest advances in HPC > Scheduling\Resource > Management at Supercomputing 2003 with "Moab" releases. > > - Unveiling of "Moab Cluster Scheduler" - Next Generation Scheduler > - Demonstration of "Moab Cluster Manager" - Graphical Mgmt Tool > - Early Access Release of "Moab Grid Scheduler" (a.k.a. Silver) > - Maui Consortium - Great Advancements, Great Benefits, Join the Team > - BOF - Birds of a Feather Meeting: Nov 18th at Noon to 1:00 P.M. > - Visit us in the Ames Lab, CHPC, and Indiana University Booths > > Nov. 04, 2003 - Cluster Resources is pleased to announce that > demonstrations and overviews of our latest developments and plans > will be > highlighted at Supercomputing 2003 in Phoenix. Join us at a "BOF" > meeting > on the 18th at 12:00 to 1:00 P.M. in room 16-18, and view > demonstrations > and discuss the latest advancements in the booths of Ames Lab, CHPC > and > Indiana. > > Cluster Resources will be unveiling the advancements being made in > the > next generation Moab Cluster Scheduler as well as demonstrating the > Open-alpha of our Moab Cluster Monitor/Manager product that makes HPC > > scheduling a significantly simpler task. Grid scheduling will also > make a > leap forward with the Early Access Release of Moab Grid Scheduler. 
> > Maui Consortium will unveil some of the latest advancements that have > just > completed as well as projects that are underway in the "BOF" meeting, > and > organizations that wish to be candidates for membership may either > approach us or any other of the founding members in that meeting or > in the > above mentioned booths. This year benefits go beyond at-cost > development > projects to cover regular group training sessions, discounted support > and > usage of Early Access Release software. We extend our thanks to CHPC > for > sponsoring this year's Maui Consortium "BOF" and for Ames Lab, CHPC > and > Indiana Universities for their consideration, in providing access in > their > booths. > > About Cluster Resource, Inc. > Cluster Resources, Inc. is an industry-leading provider of resource > scheduling and management software for cluster and grid environments. > > Our vision is to provide tools and services that enable organizations > to > understand, control, and fully optimize compute resources, allowing > organizations to realize the full potential of their compute resource > investment in a way that maximizes the service delivered to the > organization.s most critical objectives. Copyright ?2001-2003 > Cluster > Resources, Inc All Rights Reserved. For more information call (801) > > 873-3400 or visit www.clusterresources.com. > > > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://supercluster.org/mailman/listinfo/mauiusers __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Nov 5 13:03:51 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Wed, 05 Nov 2003 13:03:51 -0500 Subject: Petabits/sec, and the like In-Reply-To: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> Message-ID: <1068055431.11118.33.camel@roughneck.liniac.upenn.edu> On Wed, 2003-11-05 at 12:40, Jim Lux wrote: > All of the enjoyable chat about achieving stupendous data rates with disk > drives in trucks is quite interesting. By the way, I don't know why you > insist on having the drives mounted in racks..why not just leave them in > their original shipping containers. There's also the concept of how many > bits are being moved in, say, a container load of Britney Spears DVDs. > (leaving aside questions about redundancy, information entropy, and whether > there is any information content in Britney Spears to begin with) > ..snipped... > For example.... old style 10Mbps thinnet ethernet used solid dielectric > coax, which had a propagation velocity of about 0.66 c. twisted pair is > probably around 0.75, fiber optics are a bit tricky, depending on the mode > of propagation, but probably around 0.85. The pickup truck full of disks > is about 1E-7. The units of the new measure would be, what, (bits per > second)*(meters per second) or bit meters per second squared. I'd normalize > by c, to make the units more useful..I'd modestly propose calling the new > unit the Lux, but it's already been used, so perhaps we should recognize > rgb's contributions by calling it the "Brown" 10Mbps over thinnet would > then be 6MegaBrowns. 100mbps over twisted pair would be 70MegaBrowns. 
> The 1 Pb/s truckload of disks would be 100MegaBrowns.
>

Couldn't resist -- 'What can Brown do for you'?

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania

From James.P.Lux at jpl.nasa.gov Wed Nov 5 12:40:09 2003
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 05 Nov 2003 09:40:09 -0800
Subject: Petabits/sec, and the like
Message-ID: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov>

All of the enjoyable chat about achieving stupendous data rates with disk
drives in trucks is quite interesting. By the way, I don't know why you
insist on having the drives mounted in racks... why not just leave them in
their original shipping containers? There's also the concept of how many
bits are being moved in, say, a container load of Britney Spears DVDs
(leaving aside questions about redundancy, information entropy, and whether
there is any information content in Britney Spears to begin with).

But, on to a more practical aspect. It seems that a mere bits per second
number isn't useful, because it doesn't embody some practically important
things, like latency or transport time, both of which can be significant.
This is of particular concern to me, because I'm used to having to deal with
networks where the round trip light time is significant.

So, I propose that an interesting single metric might be to scale the bit
rate by the latency with which the bits appear at the other end of the pipe.
As illustrious an early high performance computing figure as Seymour Cray
recognized that this could be significant when you're looking at pumping lots
of bits real fast. And, there's a handy yardstick to measure by (issues of
quantum entanglement and photon twinning aside): the in vacuo speed of light.

For example....
old style 10Mbps thinnet ethernet used solid dielectric coax, which had a
propagation velocity of about 0.66 c. Twisted pair is probably around 0.75;
fiber optics are a bit tricky, depending on the mode of propagation, but
probably around 0.85. The pickup truck full of disks is about 1E-7. The
units of the new measure would be, what, (bits per second)*(meters per
second), or bit meters per second squared. I'd normalize by c, to make the
units more useful... I'd modestly propose calling the new unit the Lux, but
it's already been used, so perhaps we should recognize rgb's contributions by
calling it the "Brown". 10Mbps over thinnet would then be 6 MegaBrowns.
100Mbps over twisted pair would be 70 MegaBrowns. The 1 Pb/s truckload of
disks would be 100 MegaBrowns.

This is clearly the "raw pipe speed" too... not taking into account the
headers and any coding that's going on. The disk drive pipe hides all the
coding and sector headers, so the measurement is a real data transfer
throughput. The Ethernet figure, on the other hand, is just the signalling
rate, and there is some significant non-zero overhead.

One might also ask whether the physical size of the system being communicated
within should be factored in (say, when talking about bisection bandwidth).
Clearly, a cluster with a physical dimension of 100 meters is going to be
slower than one with a physical dimension of 1 meter, all other things
(processor speed, comm speed, etc.) being equal.

One has to also consider the bandwidth of the entrance and exit to the
pipe... merely having the capability to transport Tb of disk drives rapidly
doesn't mean that you can put data onto those disks at a Pb/s and get it off
at the other end of the shipping channel. This is where those "use free air
as a communication medium" schemes get into trouble.
Sure, the optical bandwidth of air (or optical fiber) is pretty darn wide (on
the order of 0.5 PetaHertz (a unit I never thought I'd ever use) for just the
visible spectrum) but the modulation and demodulation might prove to be a
problem.

There's also the issue of real computing efficiency... speed is not
everything in some applications... some applications might optimize for
calculations per Dollar/Euro or calculations/Joule. Coming up with a metric
for the calculation is a bit tricky. The calculations could be viewed as
extracting information bits from a redundant data set (a coding/decoding
process), or as creating new information (although, hmmm... this gets a bit
metaphysical).

I leave the selection of appropriate units and names to the community.

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

From rgb at phy.duke.edu Wed Nov 5 13:39:54 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 13:39:54 -0500 (EST)
Subject: Petabits/sec, and the like
In-Reply-To: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov>
Message-ID:

On Wed, 5 Nov 2003, Jim Lux wrote:

> by c, to make the units more useful..I'd modestly propose calling the new
> unit the Lux, but it's already been used, so perhaps we should recognize
> rgb's contributions by calling it the "Brown" 10Mbps over thinnet would
> then be 6MegaBrowns. 100Mbps over twisted pair would be 70MegaBrowns. The
> 1 Pb/s truckload of disks would be 100MegaBrowns.

You are clearly an evil man, and children and pets probably cross the street
to avoid you.

For the love of God, don't name a unit the Brown. Megabrowns. Sheeesh.

> This is clearly the "raw pipe speed" too...
> not taking into account the
> headers and any coding that's going on. The disk drive pipe hides all the
> coding and sector headers, so the measurement is a real data transfer
> throughput. The Ethernet scheme on the other hand, is just the signalling
> rate, and there is some significant non-zero overhead.
>
> One might also ask whether physical size of the system being communicated
> within should be factored in (say, when talking about bisection
> bandwidth). Clearly, a cluster with a physical dimension of 100meters is
> going to be slower than one with a physical dimension of 1 meter, all other
> things (processor speed, comm speed, etc.) being equal.
>
> One has to also consider the bandwidth of the entrance and exit to the
> pipe... merely having the capability to transport Tb of disk drives rapidly
> doesn't mean that you can put data onto those disks at a Pb/s and get it
> off at the other end of the shipping channel. This is where those "use
> free air as a communication medium" schemes get into trouble. Sure, the
> optical bandwidth of air (or optical fiber) is pretty darn wide (on the
> order of 0.5 PetaHertz (a unit I never thought I'd ever use) for just the
> visible spectrum) but the modulation and demodulation might prove to be a
> problem.
>
>
> There's also the issue of real computing efficiency.. speed is not
> everything in some applications... some applications might optimize for
> calculations per Dollar/Euro or calculations/Joule. Coming up with a
> metric for the calculation is a bit tricky. The calculations could be
> viewed as extracting information bits from a redundant data set (a
> coding/decoding process), or as creating new information (although, hmmm...
> this gets a bit metaphysical)
>
> I leave the selection of appropriate units and names to the community.

By the time you add dollars to the problem those truckfuls of disks look
pretty damn good, actually.
Which one is cheaper: Building an optical fiber network capable of
distributing the kinds of datasets they accumulate at the big accelerator
labs to the participating Universities (often on the other side of the
country) with enough bandwidth to be useful, or cross-shipping a RAID that
gets refilled and emptied at the ends?

Consider a metaphor: Fermilab is a river of data. People at Duke are
thirsty, but they can only drink just so much just so fast. It is very
likely much cheaper to just ship Duke an occasional truckful of bottled
water -- I mean data -- than to build a cross-country pipeline just to put a
high capacity spigot in a single room.

It is also useful to consider how long it takes to FILL a terabyte RAID.
Even at (say) 100 MB/sec it is still 10,000 seconds, or about three hours. A
petabyte would require 3000 hours (admittedly potentially in parallel). That
would be a goodly chunk of a year. By the time bottlenecks like this are
considered, the time and cost of overnight shipping a containerized PB across
the country are relatively insignificant.

Interesting transformations between time and spatial dimensions are involved
in all of this: wire/fiber carrier frequency, wire/fiber bundle density and
multiplexing/termination costs plus the cost of the wire/fiber itself, vs
achieving a very high spatial information density using a storage VOLUME and
moving the space, with THOSE associated costs.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From tony at mpi-softtech.com Wed Nov 5 13:22:04 2003
From: tony at mpi-softtech.com (Anthony Skjellum)
Date: Wed, 5 Nov 2003 12:22:04 -0600
Subject: IB vs Myrinet
References: <812B16724C38EE45A802B03DD01FD547226267@exchange.concen.com> <20031104103747.GA836@sphere.math.ucdavis.edu>
Message-ID: <000a01c3a3cd$01575670$a900a8c0@cis.uab.edu>

Roger, FYI, we have had only trivial issues supporting Opteron-based IB. For
instance, things worked in a matter of hours with Racksaver-based Opteron
dual 1U boxes with Mellanox cards and VAPI drivers. This is with a
SuSE/UNITED linux type build that Racksaver generally ships (nothing
specialized).

In polling mode, we had about 4.5us latency from the Mellanox A1 cards with
VAPI out of the box from MPI/Pro (about 4-5 months ago). After that, we ran
ChaMPIon/Pro on some larger cluster configs with dual Opteron too, all very
straightforward.

-Tony

----- Original Message -----
From: "Roger L. Smith"
To: "Bill Broadley"
Cc: "Joey Sims" ;
Sent: Tuesday, November 04, 2003 8:34 AM
Subject: Re: IB vs Myrinet

> On Tue, 4 Nov 2003, Bill Broadley wrote:
>
> > On Mon, Nov 03, 2003 at 10:19:48PM -0500, Joey Sims wrote:
> >
> > > IB is about to find major traction in this industry and Myricom will not
> >
> > I've heard this statement for 2 years running, not that it couldn't
> > become true.
>
> Just look at all of the recent press releases for IB clusters being built.
> The hardware is finally actually available, and a lot of HPC clusters are
> starting to be built with it.
>
> In the spirit of full disclosure, I have three engineers on-site today
> from an IB vendor working with me to install a 192 node diskless IB
> cluster.
>
> > > Up to 10GB/sec is fairly fat today. The roadmap for IB has this
> > > interconnect technology ratcheted up way higher than 10GB.
> >
> > Roadmaps are great, easy, and cheap. I'm most interested in
> > what I can build a cluster with today.
> >
> > Do you mean 10 GB/sec = 10 Gigabytes/sec or 10 Gigabits/sec? In what
> > conditions? Where can I download a linux compatible driver? Linux
> > compatible MPI implementation? Linux/AMD64 drivers/MPI implementations?
>
> It's 10 gigabits per second (theoretical). Linux drivers are available
> from all of the vendors. Certain vendors (including the one I purchased
> my IB from) provide open-source drivers.
>
> There are a few MPI implementations, there are commercial versions MPI/Pro
> and ChaMPIon from MPI Software Technology, Inc. MVAPICH is available from
> OSC, and I'm hearing that there may be a version of LAM in the near
> future.
>
> I'm not sure of the status of the AMD64 drivers, although I know of at
> least one AMD64 cluster currently being built with IB, so at least some
> level of support exists.
>
> _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
> | Roger L. Smith                           Phone: 662-325-3625           |
> | Sr. Systems Administrator                FAX:   662-325-7692           |
> | roger at ERC.MsState.Edu                http://WWW.ERC.MsState.Edu/~roger |
> |                     Mississippi State University                       |
> |____________________________________ERC__________________________________|

From konstantin_kudin at yahoo.com Wed Nov 5 17:54:05 2003
From: konstantin_kudin at yahoo.com (Konstantin Kudin)
Date: Wed, 5 Nov 2003 14:54:05 -0800 (PST)
Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux
Message-ID: <20031105225405.60387.qmail@web21205.mail.yahoo.com>

Could anyone please share experiences with these boards under Linux? Is it
still a risky proposition at this time?

It seems like there are drivers for the AMD-8111/8131/8151 chipset on the AMD
page, and drivers for the Broadcom network chip in other places. Any
feedback on SATA support for the Silicon Image Sil3114 SATA RAID Accelerator
and on SATA support in general? Any other caveats?

Thanks in advance for any help!

Konstantin

From eric at fnordsystems.com Wed Nov 5 16:41:49 2003
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Wed, 05 Nov 2003 13:41:49 -0800
Subject: Petabits/sec, and the like
In-Reply-To:
References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov>
Message-ID: <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6>

For those interested in long distance optical transmission of research data,
Bill St. Arnaud of CANARIE/CAnet4 posts many interesting things to the
CANET-NEWS list:

http://morris.canarie.ca/MLISTS/news2003/index.html

Quoting from a post to the list:
"Another example is the Canadian Virtual Observatory that will require to
transfer data files and stream instrumentation data of over half a terabyte
a day (!!) from facilities to Hawaii, France and UK
http://www.risq.qc.ca/risq2003-canw2003/en/conferenciers/_sgaudet.html "

>Consider a metaphor: Fermilab is a river of data. People at Duke are
>thirsty, but they can only drink just so much just so fast. It is very
>likely much cheaper to just ship Duke an occasional truckful of bottled
>water -- I mean data -- than to build a cross-country pipeline just to
>put a high capacity spigot in a single room.
>
>It is also useful to consider how long it takes to FILL a terabyte RAID.
>Even at (say) 100 MB/sec it is still 10,000 seconds, or about three
>hours. A petabyte would require 3000 hours (admittedly potentially in
>parallel). That would be a goodly chunk of a year. By the time
>bottlenecks like this are considered, the time and cost of overnight
>shipping a containerized PB across the country are relatively
>insignificant.
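
The fill-time figures quoted above can be checked with a short sketch. It
assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a single
serial writer, and the sustained 100 MB/sec rate from the quoted text; the
function and constant names are illustrative only and do not come from the
thread.

```python
# Rough fill-time estimate for a storage array at a sustained write rate.
# Assumptions (not from the thread): decimal units, one serial writer.

TB = 10**12          # bytes in a terabyte (decimal)
PB = 10**15          # bytes in a petabyte (decimal)
RATE = 100 * 10**6   # 100 MB/sec, the rate used in the quoted text


def fill_time_seconds(capacity_bytes, rate_bytes_per_sec):
    """Seconds needed to write capacity_bytes at a constant rate."""
    return capacity_bytes / rate_bytes_per_sec


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("1 PB", PB)):
        secs = fill_time_seconds(size, RATE)
        hours = to_hours(secs)
        print(f"{label}: {secs:,.0f} s = {hours:,.1f} h = {hours / 24:,.1f} days")
```

Under these assumptions a terabyte takes 10,000 seconds (a little under
three hours) and a petabyte roughly 2,800 hours, or about 116 days, if
written serially; writing to N arrays in parallel divides the time by N.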
From csamuel at vpac.org Wed Nov 5 17:23:39 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 6 Nov 2003 09:23:39 +1100
Subject: Petabits/sec, and the like
In-Reply-To: <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6>
References: <5.2.0.9.2.20031105085531.02f7bc38@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031105133632.03f6f078@216.82.101.6>
Message-ID: <200311060923.40670.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 6 Nov 2003 08:41 am, Eric Kuhnke wrote:

> Quoting from a post to the list:
> "Another example is the Canadian Virtual Observatory that will require to
> transfer data files and stream instrumentation data of over half a terabyte
> a day (!!) from facilities to Hawaii, France and UK

There are higher rates on the horizon, for instance the LOFAR (Low Frequency
Array) telescope that's proposed will reportedly be delivering
multi-terabits a second from each detector, which will need to be processed
on site.

Part of a CSIRO webpage on the project, if it were to be located in Western
Australia, says:

http://www.atnf.csiro.au/projects/ska/general/lofar.html

[quote]

Specific technologies that would be developed for LOFAR in WA include:

* The construction of a 6 terabit/second optic-fibre link from the heart of
inland WA to coastal Geraldton. This is a higher data-rate than systems in
general use today. LOFAR would therefore represent a non-commercial test-bed
for developing technologies.
[/quote]

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qXhrO2KABBYQAh8RAp0AAKCD2ccIYF4psvK78skXKd58Twg0rwCeMjCZ
f+Shi+yhowKtXPpRI9agGHY=
=K4rn
-----END PGP SIGNATURE-----

From rgb at phy.duke.edu Wed Nov 5 17:46:29 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 17:46:29 -0500 (EST)
Subject: Petabits/sec, and the like
In-Reply-To: <200311060923.40670.csamuel@vpac.org>
Message-ID:

On Thu, 6 Nov 2003, Chris Samuel wrote:

> There are higher rates on the horizon, for instance the LOFAR (Low Frequency
> Array) telescope that's proposed will reportedly be delivering
> multi-terabits a second from each detector, which will need to be processed
> on site.
>
> Part of a CSIRO webpage on the project, if it were to be located in Western
> Australia, says:
>
> http://www.atnf.csiro.au/projects/ska/general/lofar.html
>
> [quote]
>
> Specific technologies that would be developed for LOFAR in WA include:
>
> * The construction of a 6 terabit/second optic-fibre link from the heart of
> inland WA to coastal Geraldton. This is a higher data-rate than systems in
> general use today. LOFAR would therefore represent a non-commercial test-bed
> for developing technologies.
>
> [/quote]

Ah, did I mention that building this sort of thing is a very important jobs
program for struggling telecom industries ;-)?

That should be very exciting. Very, very expensive, but very exciting.
Just mux/demuxing the data should be "interesting", as should finding
someplace to put it as it comes through.
Sort of like catching the aforementioned metaphorical river as it flows and
splitting it into lots of small pipes that go into bottles that just exactly
fill all without spilling a drop.

Unless it is all about peak transmission times for small bundles of data,
that is...

   rgb

>
> - --
> Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
>
> iD8DBQE/qXhrO2KABBYQAh8RAp0AAKCD2ccIYF4psvK78skXKd58Twg0rwCeMjCZ
> f+Shi+yhowKtXPpRI9agGHY=
> =K4rn
> -----END PGP SIGNATURE-----
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Wed Nov 5 17:52:17 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 17:52:17 -0500 (EST)
Subject: Petabits/sec, and the like
In-Reply-To: <200311060923.40670.csamuel@vpac.org>
Message-ID:

On Thu, 6 Nov 2003, Chris Samuel wrote:

> Part of a CSIRO webpage on the project, if it were to be located in Western
> Australia, says:
>
> http://www.atnf.csiro.au/projects/ska/general/lofar.html
>
> [quote]
>
> Specific technologies that would be developed for LOFAR in WA include:
>
> * The construction of a 6 terabit/second optic-fibre link from the heart of
> inland WA to coastal Geraldton. This is a higher data-rate than systems in
> general use today.
> LOFAR would therefore represent a non-commercial test-bed
> for developing technologies.

On second thought, I suspect that the OF link is going to be delivering real
time analog data. This is very similar to a plan for an even bigger
radiotelescope that I've had for years -- one that spans a continent, or even
continents. The key to making a radiotelescope is being able to deliver
realtime traces of the received signals with very precise time/phase delay
information to a centralized location where the traces can be recombined and
used to create an interference projection of the sky.

Perhaps they're going to digitize the signal(s) first, but I don't see why
they would offhand.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From eric at fnordsystems.com Wed Nov 5 19:15:38 2003
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Wed, 05 Nov 2003 16:15:38 -0800
Subject: Petabits/sec, and the like
In-Reply-To:
References: <200311060923.40670.csamuel@vpac.org>
Message-ID: <5.2.0.9.2.20031105160600.02b003c0@216.82.101.6>

Re: Australian terabit project

The budget for routers alone will be astronomical... Loading up a Juniper
T640 with OC-192 PICs isn't cheap! To the best of my knowledge there are
submarine cables commercially available from Alcatel, Fujitsu/Siemens and
KDDI with capacities in the 640Gb range. This requires not only multiple
OC-192 capable routers, but vastly expensive DWDM terminal equipment at each
end to insert multiple lambdas in fiber pairs.
20 singlemode fibre Rx/Tx pairs * 20 DWDM wavelengths per fiber pair *
real-world 9.2Gb/s per DWDM wavelength = one immense problem
processing/receiving at the destination of your data stream.

Currently, with the telecom capacity glut, the highest capacity
single-purpose cables laid in 2001 such as 360Atlantic (Boston USA to UK) and
Tyco Transatlantic are operating at 80Gb or less. As a price reference, the
FLAG group (recently purchased by Reliance Telecom) spent something like $1.6
to $2.0 billion to build a worldwide network four years ago with 80Gb
capacity.

>That should be very exciting. Very, very expensive, but very exciting.
>Just mux/demuxing the data should be "interesting", as should finding
>someplace to put it as it comes through. Sort of like catching the
>aforementioned metaphorical river as it flows and splitting it into lots
>of small pipes that go into bottles that

From csamuel at vpac.org Wed Nov 5 17:57:07 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 6 Nov 2003 09:57:07 +1100
Subject: Petabits/sec, and the like
In-Reply-To:
References:
Message-ID: <200311060957.10972.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 6 Nov 2003 05:39 am, Robert G. Brown wrote:

> By the time you add dollars to the problem those truckfuls of disks
> look pretty damn good, actually.

To take an example, I know of a local group doing work at CERN. Apparently
they are participating in experiments that generate around 1TB a day of data
(no idea if that's compressed/uncompressed or how compressible it would be).

At the standard AARNET rate for international traffic of AU$22.50 per gig
that's over $22,000 a day if you wanted to pull that back over the 'net
(assuming sufficient bandwidth to be able to do that).
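
The cost figure above can be checked with a short sketch. It assumes that
1 TB is counted as 1000 GB and that the AU$22.50 per gigabyte rate applies
uniformly to all of the traffic; both are simplifications, and the names
below are illustrative only.

```python
# Daily cost of pulling a fixed data volume over a per-gigabyte tariff.
# Assumptions (not from the thread): flat rate, 1 TB = 1000 GB by default.

AUD_PER_GB = 22.50   # AARNET international rate quoted above
GB_PER_TB = 1000     # decimal convention; pass 1024 for the binary one


def daily_transfer_cost(tb_per_day, aud_per_gb=AUD_PER_GB, gb_per_tb=GB_PER_TB):
    """Cost in AU$ of transferring tb_per_day terabytes in one day."""
    return tb_per_day * gb_per_tb * aud_per_gb


if __name__ == "__main__":
    cost = daily_transfer_cost(1.0)
    print(f"1 TB/day: AU${cost:,.2f} per day, AU${cost * 365:,.0f} per year")
```

With either the decimal or the binary convention the result lands just
above AU$22,000 a day, consistent with the estimate in the message.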
It then becomes obvious why they choose to fly it back with them. :-)

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qYBDO2KABBYQAh8RAnfvAJ4rAXhYoKnZmJiNOt6UhO8Jq5EZGwCdE1TG
OsIsfqAvJ+3setXpVCA8v8A=
=6mur
-----END PGP SIGNATURE-----

From csamuel at vpac.org Wed Nov 5 18:00:15 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 6 Nov 2003 10:00:15 +1100
Subject: Petabits/sec, and the like
In-Reply-To:
References:
Message-ID: <200311061000.16897.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 6 Nov 2003 09:52 am, Robert G. Brown wrote:

> On a second thought, I suspect that the OF link is going to be
> delivering real time analog data. This is very similar to a plan for an
> even bigger radiotelescope that I've had for years -- one that spans a
> continent, or even continents. The key to making a radiotelescope is
> being able to deliver realtime traces of the received signals with very
> precise time/phase delay information to a centralized location where the
> traces can be recombined and used to create an interference projection
> of the sky.

LOFAR is an interferometer in its own right, and it'll be the only 'scope
going down to those frequencies (AFAIK) and so there won't be anything else
to combine it with.
:-)

Website: http://www.lofar.org/

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qYD/O2KABBYQAh8RAv6qAJ9z5FxCcxMDoA+F8mWqcyaf6y772wCfTQta
IGEZ3BAmH3fnCMppwoCur2I=
=YWrJ
-----END PGP SIGNATURE-----

From rgb at phy.duke.edu Wed Nov 5 19:18:26 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 5 Nov 2003 19:18:26 -0500 (EST)
Subject: Petabits/sec, and the like
In-Reply-To: <200311061000.16897.csamuel@vpac.org>
Message-ID:

On Thu, 6 Nov 2003, Chris Samuel wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Thu, 6 Nov 2003 09:52 am, Robert G. Brown wrote:
>
> > On a second thought, I suspect that the OF link is going to be
> > delivering real time analog data. This is very similar to a plan for an
> > even bigger radiotelescope that I've had for years -- one that spans a
> > continent, or even continents. The key to making a radiotelescope is
> > being able to deliver realtime traces of the received signals with very
> > precise time/phase delay information to a centralized location where the
> > traces can be recombined and used to create an interference projection
> > of the sky.
>
> LOFAR is an interferometer in its own right, and it'll be the only 'scope
> going down to those frequencies (AFAIK) and so there won't be anything else
> to combine it with. :-)

I meant the "outlying stations" that give it a large baseline for high
resolution.

My idea has (for years now) been to transform all the cell phone towers in a
country into a gigantic radiotelescope.
You lose the single receiver directionality that LOFAR has with a tight array
of parabolic receivers, but it is potentially SO cheap and there are SO many
stations with SUCH a large baseline that overall brightness and resolution
should be quite satisfactory. The North American continent, for example,
would have an aperture of what, roughly 5000 km with towers strung in
irregular networked distributions -- every few km along major highways and in
dense clusters near cities and increasingly near even small highways and
small towns. There must be tens to hundreds of thousands of towers by now,
with interference brightening of 10^8 or so along the selected direction.

I actually have a student working on this idea to a limited extent at this
very moment -- sort of a preliminary feasibility study. In fundamental terms
this means determining if the cell tower owners are willing to permit a dual
public use of their receivers (which should be passive and utterly irrelevant
to their function as cell phone antennae). Otherwise, I expect that all of
the towers have fiber to them already; it is just a matter of
piggybacking... ;-)

Using GPS and/or atomic clocks to establish a precise time base, local PCs
should be able to record a generalized radio trace at a particular frequency.
The same GPS can be used to precisely locate the towers. With a precise
physical map of the receivers and a precise signal against a common time
base, recombining the signals with various delays to assemble an image is
then straightforward. In fact, with the time base, one could even do (I
think) Hanbury-Brown-Twiss correlation studies, which I imagine is also a
goal of LOFAR via its outlier stations although I haven't read far enough to
find out.

In this context I don't know whether or not the traces from the individual
towers would be best sent digitized or not. In the LOFAR context they
probably are.
Alas, I'm a theorist and so I'm not sufficiently familiar with the hardware
requirements one has to work with to capture, save, send, and ultimately
recombine the signals, although I can visualize the math easily enough.

I'll see if I can get my student to join the LOFAR discussion group. I think
he's a bit behind on this anyway, with all the work he has this semester.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From john.hearns at clustervision.com Thu Nov 6 04:23:50 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Thu, 6 Nov 2003 10:23:50 +0100 (CET)
Subject: Petabits/sec, and the like
In-Reply-To: <200311060957.10972.csamuel@vpac.org>
Message-ID:

On Thu, 6 Nov 2003, Chris Samuel wrote:

> To take an example, I know of a local group doing work at CERN. Apparently
> they are participating in experiments that generate around 1TB a day of data
> (no idea if that's compressed/uncompressed or how compressible it would be).
>

I THINK (though don't quote me) that this is the raw data rate.
What happens in an HEP experiment is that raw data comes from the detector.
It is passed through three levels of trigger processors, from a very simple
(are 1st level inside the detector at LHC???) to a third level, which is run
on PCs.

I guess this 1TB rate is the raw event rate after the level 3 trigger.
The data is then sent to a reconstruction farm, where the raw levels are
combined into tracks and energy deposits, using the physical data and
calibrations of the detector.

The physicists then work on the resulting DST - data summary tape, which is
much less data than the raw data.
I'm not sure of the plans for processing raw data at LHC - maybe all is processed at the main site, maybe some is shipped off to the Tier 1 centres. I really don't know the answer here.

From djholm at fnal.gov Thu Nov 6 07:39:13 2003
From: djholm at fnal.gov (Don Holmgren)
Date: Thu, 06 Nov 2003 06:39:13 -0600
Subject: Petabits/sec, and the like

On Thu, 6 Nov 2003, John Hearns wrote:

> On Thu, 6 Nov 2003, Chris Samuel wrote:
>
> > To take an example, I know of a local group doing work at CERN. Apparently
> > they are participating in experiments that generate around 1TB a day of data
> > (no idea if that's compressed/uncompressed or how compressible it would be).
> >
> I THINK (though don't quote me) that this is the raw data rate.
> What happens in an HEP experiment is that raw data comes from the
> detector.
> It is passed through three levels of trigger processors, from
> a very simple (are 1st level inside the detector at LHC???) to a third
> level, which is run on PCs.
>
> I guess this 1TB rate is the raw event rate after the level 3 trigger.
> The data is then sent to a reconstruction farm, where the raw levels
> are combined into tracks and energy deposits, using the physical
> data and calibrations of the detector.
>
> The physicists then work on the resulting DST - data summary tape,
> which is much less data than the raw data.
>
> I'm not sure of the plans for processing raw data at LHC -
> maybe all is processed at the main site, maybe some is shipped off
> to the Tier 1 centres. I really don't know the answer here.

I was part of the team that implemented the level 3 trigger at the CDF experiment at FNAL.
The order of magnitude data rate out of the detector is 1 TByte/sec - collisions at O(1 MHz), O(1 million) data channels, O(1 byte/channel). That rate gets reduced through Level 1, 2, and 3 triggers. The level 3 trigger Linux computers do event building (assembling full events from event fragments sent via an ATM switch) and reconstruction (full events distributed via fast ethernet, data "inverted" to produce particle tracks and energies).

Here were the specifications we worked from in 1997 for L3:

 - event rate into L3: 250 to 1000 Hz
 - event size: 250 KB avg
 - accept rate: 72 Hz

The accept rate translates into 18 MB/sec (72 events/sec x 250 KB), written to mass storage. At this 18 MB/sec (set by the tape budget, BTW), CDF currently writes ~ 1.5 TB/day to tape. The D0 experiment at Fermilab is writing a similar amount. On typical days, the Fermilab mass storage system moves tens of TB/day - I think the record is something like 35 TB/day.

I'm not sure of LHC design numbers, but believe they are more like 1 GB/sec to storage.

Don Holmgren
Fermilab

From john.hearns at clustervision.com Thu Nov 6 07:55:04 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Thu, 6 Nov 2003 13:55:04 +0100 (CET)
Subject: Petabits/sec, and the like

On Thu, 6 Nov 2003, Don Holmgren wrote:

>  - event rate into L3: 250 to 1000 Hz
>  - event size: 250 KB avg
>  - accept rate: 72 Hz
>
> The accept rate translates into 18 MB/sec, written to mass storage.

Wow. (I was part of a LEP experiment, and we of course had much less data to contend with. I still remember though in the days of low capacity leased lines that I ftp'd the first LEP event back to Glasgow Uni.
I was soundly rapped over the knuckles for tying up the whole line.)

> I'm not sure of LHC design numbers, but believe they are more like 1
> GB/sec to storage.

We are in the wrong game. Time to buy shares in a tape manufacturer like 3M.

From rgb at phy.duke.edu Thu Nov 6 08:02:06 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 6 Nov 2003 08:02:06 -0500 (EST)
Subject: Petabits/sec, and the like

On Thu, 6 Nov 2003, John Hearns wrote:

> I'm not sure of the plans for processing raw data at LHC -
> maybe all is processed at the main site, maybe some is shipped off
> to the Tier 1 centres. I really don't know the answer here.

We have some people who work at CERN and Fermilab, and they do indeed talk about needing very, very fat pipes, big big disk, and the biggest problem -- backup to match. Or if you prefer, viewing tape as a reasonably compact high-data-density transport medium, these labs have shipped tapes around from time immemorial -- we just happen to live in a time where tape has most unfortunately been overtaken by hard media, in data density on the one hand and in cost on the other.

My feeling is that the labs are still in the process of reacting to this and reengineering the data transport problem, balanced between wildly varying costs and ease of use for different alternatives, political pressure (I wasn't kidding about the jobs program for telcoms part -- lots of politicians would LOVE to see billion dollar programs for fat pipes funded), and the actual facility/infrastructure realities at both ends.
Duke, for example, is on one of the experimental gigabit networks, which (at least when the project was started) was pretty bleeding edge, but this is still only 0.1% of a terabps, connectivity is far from uniform across the net, the pipe is shared by many users (and it isn't just the HEP community that makes fat data sets -- medical centers like to ship around images of their own:-). At the one or two meetings I've sat in on with these guys (discussing beowulfery and data transport) its like they look at the primary campus feed and kind of shrug their shoulders and ask if they can get a few of those for themselves -- one isn't enough. I personally think that there are always going to be bleeding edge consumers of advances on any of the primary computing/data processing bottlenecks. Even with terabyte RAIDS (a number that would have been unthinkably expensive just five years ago that I could now build for myself upstairs using leftover development account money, if I had the slightest use for a TB:-) some people are blocked by too little disk. LOTS of people ride the Moore's Law curves on raw CPU and memory (size and speed both). Others pray for networks that could carry orders of magnitude more than a "mere" Gbps. Most of us on this list likely wish for whole combinations of the above -- a desktop RAID holding a petabyte of data backed up to a holographic optical crystal, 100x faster CPUs with 1000x larger and faster memory (to get memory speed closer to CPU speed) fed by networks with 1000x the bw and 1/1000th the latency (c'mon, admit it, network latency on the order of a nanosecond would be lovely. Too bad about that pesky speed of light thing...:-). And while we're messing with that holographic crystal in our imaginations, let's just make everything optical and built on top of nanoscale devices, shall we? One thing of great beauty is that Moore's Law makes it quite likely that at least some of this "insane" wish list will come true over the next decade. 
Not the ns-latency network though...at least if you want to talk to things more than a few cm away. . [Although hey, if y'all think one can violate causality AND TRANSMIT MESSAGES by "twisting" one of a pair of correlated photons, a little thing like ns latency networks across the entire continent become straightforward, right? Cannot use non-relativistic Schrodinger equations or even concepts to describe relativistic field propagation, grumble... no such thing as "wavefunction collapse", grumble, not time-reversal invariant, grumble, violates causal propagation of field UNLESS one looks at advanced field and Wheeler-Feynman and Dirac which do not permit separation of local field interaction of eventual absorber/measurement device from system even "back" at emission event on same light cone, grumble. Having a grumbly day. Stayed up too late working on something for a slave-driv... I mean "friend" of mine on this very list...Grumble;-)] rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ZukaitAJ at nv.doe.gov Thu Nov 6 11:55:20 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Thu, 6 Nov 2003 08:55:20 -0800 Subject: Scyld and MPICH. Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov> I am having a problem with MPI_reduce and I believe that it is a buffer size error. Is there a way to calculate the maximum size of the buffer and what is the maximum size of the buffer allowed? It does not seem to be linear with the number of processors. 
From mike.sullivan at alltec.com Thu Nov 6 13:39:51 2003
From: mike.sullivan at alltec.com (Mike Sullivan)
Date: Thu, 06 Nov 2003 13:39:51 -0500
Subject: Tyan 2880 and 2885
Message-ID: <3FAA9577.2040001@alltec.com>

>Could anyone please share experiences with these
>boards under linux? Is it still a risky proposition at
>this time?

I have used the 2880 under RedHat AS 2.1 and gingin64 and it works fine except for the SATA controller. I did not get the Promise chip to work but did not spend a lot of time on it. The GigE interface works. The board was stable and I have been using them in NAS devices with 3ware cards. The SMDC option for these units works fairly well with the most recent console and you can get sensor data.

> It seems like there are drivers for AMD-8111/8131/8151
>chipset on the AMD page, drivers for the Broadcom
>network chip in other places. Any feedback on SATA
>support for the Silicon Image Sil3114 SATA RAID
>Accelerator and on SATA support in general? Any other
>caveats?

I also have both a 2882 and 2885 that I will be testing early next week with Suse Linux 9 for AMD64 and will post my findings.

Thanks in advance for any help!
Konstantin -- Mike Sullivan Director Performance Computing @lliance Technologies, Voice: (416) 385-3255 x 228, 18 Wynford Dr, Suite 407 Fax: (416) 385-1774 Toronto, ON, Canada, M3C-3S2 Toll Free:1-877-216-3199 http://www.alltec.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Nov 6 13:37:59 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 06 Nov 2003 10:37:59 -0800 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: <20031105225405.60387.qmail@web21205.mail.yahoo.com> References: <20031105225405.60387.qmail@web21205.mail.yahoo.com> Message-ID: <3FAA9507.2000508@cert.ucr.edu> Konstantin Kudin wrote: > Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? > > We have a few of the s2880's. They were real problematic at first in that they'd constantly crash. But it turned out that when I downgraded the bios, all of our problems went away. Of course I also needed to install the latest 2.4.22 kernel before the machines would boot with the older bios installed. I'm not sure what to tell you about the serial ata support, as I've never played with it. Linux seems to support the nic just fine though. Hope that helps, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Thu Nov 6 12:48:24 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Thu, 06 Nov 2003 11:48:24 -0600 Subject: Scyld and MPICH. 
In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov>
References: <09AE3D324A22D511A1A50002A5289F2101030E1E@lao-exchpo1-nt.nv.doe.gov>
Message-ID: <6.0.0.22.2.20031106114611.039168b8@localhost>

At 10:55 AM 11/6/2003, Zukaitis, Anthony wrote:
>I am having a problem with MPI_reduce and I believe that it is a buffer size
>error. Is there a way to calculate the maximum size of the buffer and what
>is the maximum size of the buffer allowed? It does not seem to be linear
>with the number of processors.

There should be no maximum buffer size, though the ch_p4 device does impose a limit when shared memory is used to transfer a message. Do you have an example program that we could test? (Bug reports for MPICH should be sent to mpi-maint at mcs.anl.gov.)

Bill

From tod at gust.sr.unh.edu Thu Nov 6 15:02:25 2003
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: 06 Nov 2003 15:02:25 -0500
Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config
Message-ID: <1068148945.24918.28.camel@haze.sr.unh.edu>

http://www.theregister.co.uk/content/3/33791.html

It also mentions the ClearSpeed chip that was discussed here recently.
From aasmund at simula.no Thu Nov 6 17:52:28 2003
From: aasmund at simula.no (Åsmund Ødegård)
Date: Thu, 06 Nov 2003 23:52:28 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <20031105000513.GA2101@galactic.demon.co.uk>
References: <3FA731E8.2060603@scalableinformatics.com> <20031104001147.D4021@lnxi.com> <3FA79DC6.9040608@scalableinformatics.com> <20031104213632.GA1662@galactic.demon.co.uk> <3FA82D51.3070102@scalableinformatics.com> <20031105000513.GA2101@galactic.demon.co.uk>

On Wed, 5 Nov 2003 00:05:13 +0000, Andrew M.A. Cater wrote:
>
> On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote:
>>
>> There are interesting bits in debian. I am not sure it is necessarily
>> the right choice for clusters due to the specific lack of commercial
>> support for cluster specific items such as Myrinet, and the other high
>> speed interconnects.
>
> Dan - if I build a _really big_ cluster, will you get Quadrics to do
> Debian :)
> Same goes for any other vendor - if you ask them nicely and make it
> worth their while, they'll do it. In many cases, it's only a recompile
> of a device driver to account for library differences, after all.
>
> HP use Debian internally, IIRC. Some of the Debian developers are also
> HP folk - HP are potentially looking to support more of their products
> under Linux? [See, for example, Debian Weekly News for today :) ]

Actually, we have quite recently installed an Itanium2 based cluster, using Debian, because we want Debian. We got HP to do it for us, using the (former Compaq) CMU tool. They did some porting to support Debian in this tool...
So, ask nicely (and put it as a requirement to let them get the deal), and you can get whatever you want ;-)

>> Commercial compiler support for Debian (e.g.
>> Intel, Absoft, et al) is largely non-existent as far as I know (please
>> do correct me if I am wrong).

No problem with Intel compilers on Debian (alien does the trick).

--
[simula.research laboratory]
Åsmund Ødegård
Scientific Programmer / Chief Sys.Adm
phone: 67828291 / 90069915
http://www.simula.no/~aasmundo

From raysonlogin at yahoo.com Thu Nov 6 19:59:51 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Thu, 6 Nov 2003 16:59:51 -0800 (PST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <20031106145623.GA5867@iib.unsam.edu.ar>
Message-ID: <20031107005951.2157.qmail@web11407.mail.yahoo.com>

A very good paper about building HPC clusters with FreeBSD:

"Building a High-performance Computing Cluster Using FreeBSD"
http://people.freebsd.org/~brooks/papers/bsdcon2003/

The author talked about hardware issues: KVM, BIOS redirection, CPU choices; and then talked about why he chose FreeBSD instead of Linux... he also did the port of GridEngine (SGE) to FreeBSD.

Has anyone tried to set up HPC clusters with *BSD?

Rayson

--- Fernan Aguero wrote:
> Any FreeBSD users willing to share clustering experiences
> out there?
>
> Fernan

__________________________________
Do you Yahoo!? Protect your identity with Yahoo!
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Nov 6 22:07:53 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 6 Nov 2003 22:07:53 -0500 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Message-ID: <812B16724C38EE45A802B03DD01FD5471E049E@exchange.concen.com> Maybe someone could lend a hand and help Intel find out what their unknown material is. Be careful! Don't spill it in your lap for goodness sake.... Dohh! :-O I found this amusing: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL 11.07.03 by Jennifer Tabor HPCwire ======================================================================== ====== Chip makers are searching for ways to create smaller and smaller computer chips, and researchers at Intel believe they have discovered a new material that would help them to do just that. Intel's announcement will garner much attention in an industry where the demand for products that push fundamental physical limits is ever increasing. A problem afflicting many chip makers today is the prevention of electrical currents from leaking outside their proper patches. Because the transistor gates are now becoming as small as just five atomic layers, chips need more power. In turn, they also need a more efficient cooling system. Intel has been having difficulties with the cooling of its chips -- the smaller they get (with etchings as small as 90-130 nanometers), the hotter they become. Recent reports say that the problem has even caused a delay in the Prescott, Intel's most advanced version of the Pentium. Though the new technology would not debut until approximately 2007, Intel is planning to scale down their current 90 nanometer chip size over the years to 65, followed by 45. 
It is at this point that Intel's new material, which is still unknown, would be introduced. Intel's discovery comes at the height of an intense industry wide search for a new material to replace silicon dioxide, which is used as insulator between the gate and the channel through which current flows in an active transistor. Intel researchers have been working on solving the chip predicament for five years in efforts to keep pace with Moore's Law. Gordon E. Moore, co-founder of Intel, believed that the number of transistors in the same space should double every 18 months. Intel believes they can continue to make short strides, despite the thoughts of many who doubt their ability to keep up such a pace. Though many researchers and competitors agree that Intel's announcement revolves around the most important research area in the chip industry, some feel that the lack of specific technical detail will deter scientists from assessing their claims. ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodrigc at crodrigues.org Thu Nov 6 23:04:15 2003 From: rodrigc at crodrigues.org (Craig Rodrigues) Date: Thu, 6 Nov 2003 23:04:15 -0500 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) 
In-Reply-To: <20031107005951.2157.qmail@web11407.mail.yahoo.com>
References: <20031106145623.GA5867@iib.unsam.edu.ar> <20031107005951.2157.qmail@web11407.mail.yahoo.com>
Message-ID: <20031107040415.GA5711@crodrigues.org>

On Thu, Nov 06, 2003 at 04:59:51PM -0800, Rayson Ho wrote:
> A very good paper about building HPC clusters with FreeBSD:
>
> "Building a High-performance Computing Cluster Using FreeBSD"
>
> http://people.freebsd.org/~brooks/papers/bsdcon2003/
>
> The author talked about hardware issues: KVM, BIOS redirection, CPU
> choices; and then talked about why he chose FreeBSD instead of Linux...
> he also did the port of GridEngine (SGE) to FreeBSD.
>
> Anyone tried to setup HPC clusters with *BSD??

Hi,

Not quite the same as an HPC cluster, but take a look at the University of Utah's Emulab:

http://www.emulab.net

It is heavily based on FreeBSD (i.e. makes use of FreeBSD routing, Dummynet, etc.). The Emulab is a remotely accessible testbed that researchers can use to conduct network experiments. It consists of about 200 PC nodes.

The same company that Brooks works for (Aerospace) has apparently set up an internal testbed based on the Emulab software developed at Utah.

I use the Emulab every day as part of my research work at BBN, and it is an excellent facility.
--
Craig Rodrigues
http://crodrigues.org
rodrigc at crodrigues.org

From john.hearns at clustervision.com Fri Nov 7 04:13:40 2003
From: john.hearns at clustervision.com (John Hearns)
Date: Fri, 07 Nov 2003 10:13:40 +0100
Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config
In-Reply-To: <1068148945.24918.28.camel@haze.sr.unh.edu>
References: <1068148945.24918.28.camel@haze.sr.unh.edu>
Message-ID: <1068196420.17694.8.camel@penguin>

And also on The Reg:

http://www.theregister.co.uk/content/3/33813.html

The Reg reckons Opteron 250s by early next year.

From franz.marini at mi.infn.it Fri Nov 7 07:56:28 2003
From: franz.marini at mi.infn.it (Franz Marini)
Date: Fri, 7 Nov 2003 13:56:28 +0100 (CET)
Subject: OctigaBay 12K
In-Reply-To: <1068196420.17694.8.camel@penguin>
References: <1068148945.24918.28.camel@haze.sr.unh.edu> <1068196420.17694.8.camel@penguin>

Hello,

I just discovered this interesting (imho) company and its first product:

http://www.octigabay.com/

Their first product is a Linux Opteron-based cluster that they say could scale up to 12K processors. The base system is a 3.5U shelf with 12 Opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor latency and 77GB/s aggregate mem bandwidth.

Seems nice, I would like to know what rgb and some of the other people in here think about it :)

Have a nice day,

Franz

---------------------------------------------------------
Franz Marini
Sys Admin and Software Analyst,
Dept. of Physics, University of Milan, Italy.
email : franz.marini at mi.infn.it --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Nov 7 08:44:11 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 7 Nov 2003 08:44:11 -0500 (EST) Subject: OctigaBay 12K In-Reply-To: Message-ID: On Fri, 7 Nov 2003, Franz Marini wrote: > Hello, > > just discover this interesting, imho, company and its first product : > > http://www.octigabay.com/ > > Their first product is a linux opteron-based cluster that they said > could scale up to 12K processors. The base system is a 3.5U shelf with 12 > opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor > latency and 77GB/s aggregate mem bandwidth. > > Seems nice, I would like to know what rgb and some of the other people > in here think about it :) Why, it looks simply lovely, as hardware I've never actually tried goes. I mean, if the octigabay people want to send me one for free just so I can write a review for it on this list and the brahma website, well, from the look of it I wouldn't kick it out of my machine room for chewing crackers... and I >>can<< be bought, folks, yes I can, just look at the brahma vendors page and my brazen demand for t-shirts in exchange for space:-) I'll even dig up something fine grained to run on it so that I can pretend to really test it. The bottom line is, well, the bottom line. Pretty isn't enough. Performance (even performance that is absolutely everything promised) isn't enough. It is PRICE performance that matters, or better yet cost-benefit. How does the cost compare to the benefits the design delivers in your environment. 
For my own personal code, for example, I don't NEED their fancy interconnect, and I can rack up a bunch of opterons for the cost of the basic hardware and a nice case to put them in. They'd therefore have to literally give it to me to make it a cost-benefit win (especially true since I just spent the last of my money in this grant cycle buying hey, whaddya know, a stack of 9 dual Opteron 242's for a hair over $20K). However, there are people out there who run fine grained synchronous parallel code that is bottlenecked at the network IPC level. Even THERE the computations have some intrinsic "value" in that there are finite amounts of money people are willing to pay to get them done, and there are choices. So ultimately it will come down to whether there is a match between the value of the computation (amount people are willing to pay to get it done), the needs of the computation, and the marketplace. It's one of these people that you need to ask about whether or not this is a good deal or good arrangement. My knee jerk reaction is that it is lovely but a bit too far into the big iron side (SP3-ish) to be likely to win a hard-nosed CB comparison relative to a DIY cluster with e.g. myrinet or SCI for MANY clustervolken (the market gets smaller and smaller the further up one travels to super-high-speed networks), but corporate consumers and the larger government consumers shy away from DIY, and even in the intermediate market it comes down to price/performance, eh? If they price it competitively with the other high speed networks and it has clear benefits (as it looks like it might) well then, who knows? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netmeister.org Fri Nov 7 10:00:56 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Fri, 7 Nov 2003 10:00:56 -0500 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: <20031107005951.2157.qmail@web11407.mail.yahoo.com> References: <20031106145623.GA5867@iib.unsam.edu.ar> <20031107005951.2157.qmail@web11407.mail.yahoo.com> Message-ID: <20031107150056.GA16835@netmeister.org> [Resending; this message was originally sent last night across the various mailing lists, but beowulf at beowulf.org chokes on the gpg signature. :-/ ] Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? I have a 30 node NetBSD/i386 cluster, and just recently created the tech-cluster at netbsd.org mailing list. Some people are working on a port of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in particular for cluster usage in the near future. 
Some URLs of relevance:

http://guinness.cs.stevens-tech.edu/~jschauma/hpcf/
http://www.netbsd.org/MailingLists/#tech-cluster
http://www.netbsd.org/
http://eurobsdcon.org/papers/#souvatzis
http://bsd.slashdot.org/article.pl?sid=03/10/20/1523252&mode=thread&tid=122&tid=185&tid=190
http://bsd.slashdot.org/bsd/03/11/05/1536226.shtml?tid=122&tid=185&tid=190

-Jan

--
"Life," said Marvin, "don't talk to me about life."

From raysonlogin at yahoo.com Fri Nov 7 12:17:23 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Fri, 7 Nov 2003 09:17:23 -0800 (PST)
Subject: Fwd: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
Message-ID: <20031107171723.61183.qmail@web11409.mail.yahoo.com>

Forwarding... (Brooks is not on the beowulf list)

Rayson

--- Brooks Davis wrote:
> We (my department, but mostly different people than Fellowship) have a
> small 10-node setup (though each node does have 6 gigabit ports :-). I
> think we're aiming to upgrade to around 48 nodes in the next year.
>
> Our HPC cluster is currently pretty close to what's described in the
> paper, though we are up to 160 nodes and we're adding rack space for
> another 192 this year.
>
> The short version of my take on which OS to run on your cluster is
> that so long as it runs the apps you need, the best OS is one you know
> how to admin well since that's most of the work. I've spent a few
> weeks here and there porting applications or improving their ports, but
> by and large, most key systems are already ported to the major UNIX
> platforms.
> The free MPI implementations work on just about anything, the
> base Ganglia metrics work nearly everywhere (FreeBSD and Linux are at
> feature parity in the upcoming release), and SGE works on a wide range
> of platforms. On an amusing note, we were the launch customer for Grid
> Mathematica despite not running a supported OS because the Linux version
> runs just fine on FreeBSD.
>
> -- Brooks
>
> --
> Any statement of the form "X is the one, true Y" is FALSE.
> PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4

From hahn at physics.mcmaster.ca Fri Nov 7 14:17:48 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Fri, 7 Nov 2003 14:17:48 -0500 (EST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <20031107150056.GA16835@netmeister.org>

> of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in
> particular for cluster usage in the near future.

why? I've never understood the evangelical aspect to *BSD (or for that matter Debian). is there a tangible, measurable benefit? I'm not sure it's effective to advocate a niche OS/dist for ideological reasons or just plain personal preference...

From amacater at galactic.demon.co.uk Fri Nov 7 16:36:48 2003
From: amacater at galactic.demon.co.uk (Andrew M.A.
Cater) Date: Fri, 7 Nov 2003 21:36:48 +0000 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <20031107213648.GA11665@galactic.demon.co.uk> On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > why? I've never understood the evangelical aspect to *BSD > (or for that matter Debian). is there a tangible, measurable benefit? For some of the list this is old news :( Sorry, but as the co-author of the Distributions HOWTO (and I know it badly needs updating :) ), I can't let this pass. In the beginning was chaos: build your own Linux machine from random bits and pieces, bootstrapping from Minix if necessary. Soft Landing Systems (SLS) introduced the whole concept of a Linux distribution - a collection of code more or less known to work together. In 1993, Patrick Volkerding got fed up with the problems of SLS and founded Slackware. Coincidentally, within a couple of months, a university student named Ian Murdock did almost the same and founded Debian, named for himself and his then girlfriend Debra. Marc Ewing and the Red Hat founders came along a little later, liked some of the concepts from Debian and the idea of package management and introduced the Red Hat Package Manager which spawned .rpm packages. SuSE came out of Slackware via Unifix and Jurix and adopted .rpm's a little later. Mandrake was a French attempt to localise Red Hat and introduce some better packages ... and so on. Don Becker can probably back me up on this - the first cluster was a quick project to fill a specific need for NASA. Cheap, commodity hardware and a quick win. Its name was Beowulf (for the mythical hero). It wasn't intended to go further than NASA and was meant to be a short-term thing.
But it snowballed to the point where everyone wanted "a Beowulf". The first Extreme Linux disk was based on Red Hat 5.x because that was what happened to be around at the time. I've still got mine somewhere - a quick bootstrap to experiment with a cluster (in the days when 4 x 486's still counted for something). Much the same with Extreme Linux 2 and the (semi-commercial/commercial) Scyld. Then commerce woke up to clusters and "commoditised" them. Clusters don't have to be Red Hat - nothing Linux _has_ to be anything - but many of them are. There have always been various distributions: various clustering solutions/hardware have come and gone. Everyone has "their" cluster and "their" problem/solution set. It may still be quicker to build your own minimalist system / cheaper to use cast-offs / more economic for you to use ultra-high performance interconnects and networking - that's for you / your budget holder / sysadmins / vendor / cooling plant vendors to decide :) I advocate Debian not just because I use it a lot (for the record, I've also used Red Hat 4.2/5.*/6.*/7.*/8/9/Fedora beta, Mandrake, SuSE, Slackware and the late lamented Linux-FT) but because it has some good qualities, runs on lots of hardware platforms and is relatively unencumbered by nasty legal agreements/high fees. That's my choice - I'll happily help others to run it/port programs so they fit in the Debian distribution/ask vendors nicely if they'll consider support for Linux. It may not be your choice or the choice of others - but it is always worth trying stuff out and being open to change. [Rabid flame mode on: For the record, Debian, despite being maintained by an army of multi-national, multi-lingual volunteers who occasionally manage the semblance of close formation, is _not_ a niche OS - some figures put it second to RH in terms of popularity as a Linux distribution :) Go and write GNU/Linux 1000 times as a penance.] As far as *BSD goes: The BSD's have a longer pedigree.
Some people swear by (others swear at) their networking capability. If you've grown up with SunOS and many of the other commercial Unices, much may feel familiar. NetBSD will run on (almost) anything, FreeBSD is less hardware agnostic and differently focused ... and so it goes. A lot of arguments have been thrashed out over the years which generate more heat than light (vi vs. emacs vs. any other editor / .rpm vs. .deb / apt vs. yum vs. urpmi vs. up2date :) ) and this is probably one of them, so it's probably not worth a massive OT thread to follow up. > I'm not sure it's effective to advocate a niche OS/dist for > ideological reasons or just plain personal preference... > See above: your niche OS may be someone else's ideal. An awful lot of niche Linux variants have tried to set themselves up as "the standard" - and vanished without trace. Andy From rmyers1400 at comcast.net Fri Nov 7 11:16:56 2003 From: rmyers1400 at comcast.net (Robert Myers) Date: Fri, 07 Nov 2003 11:16:56 -0500 Subject: OctigaBay 12K In-Reply-To: References: Message-ID: <3FABC578.8050503@comcast.net> Robert G. Brown wrote: >However, there are people out there who run fine grained synchronous >parallel code that is bottlenecked at the network IPC level. Even THERE >the computations have some intrinsic "value" in that there are finite >amounts of money people are willing to pay to get them done, and there >are choices. > As a reader of one of my undergraduate essays commented, that's the kind of knowing comment that Henry James might have written in a letter to his brother William.
You may have the status of Henry James in this particular field (my grading professor plainly did not think that I did in the field on which I was holding forth) and therefore be entitled to such remarks with no defense, but could you perhaps elaborate in light of http://www.lanl.gov/orgs/ccn/salishan2003/pdf/camp.pdf particularly slide 25 et seq.? Thanks RM From edwardsa at plk.af.mil Fri Nov 7 17:50:14 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Fri, 7 Nov 2003 15:50:14 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <200311072249.hA7MnG3T000139@knockout.kirtland.af.mil> On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > particular for cluster usage in the near future. > > why? I've never understood the evangelical aspect to *BSD > (or for that matter Debian). is there a tangible, measurable benefit? > I'm not sure it's effective to advocate a niche OS/dist for > ideological reasons or just plain personal preference... It is true that Debian has a core of developers that are very committed to a particular definition of free software, and this often carries with it a surprisingly consistent set of philosophical POVs. Nonetheless, Debian is remarkably stable and it has, by far, the best package management system I have ever come across. I use it on all my machines with great overall happiness. With RH moving toward the Big-Bucks model of software, I will not be surprised if I see many new Debian users over the next few months.
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) From hahn at physics.mcmaster.ca Sat Nov 8 11:54:26 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 11:54:26 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: <20031108162037.GA835@netmeister.org> Message-ID: > > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > > particular for cluster usage in the near future. > > > > why? > > I believe that NetBSD, even though it is a very, very clean and clear OS > with high-quality code, is much neglected. It deserves more attention > and is perfectly suitable for the task at hand. there's lots of good stuff out there; I notice you didn't state that *BSD is actually cleaner/clearer/higher-quality than Linux though. and such a statement would be manifestly untrue. the 'deserves' thing is purely an aesthetic judgement, which is perfectly fine, but not a reason to switch... > > I've never understood the evangelical aspect to *BSD (or for that > > matter Debian). > > Evangelical? The entire Linux ``movement'' is based on such evangelism, > much more so than Free- or NetBSD. how strange! absolutely everyone I know who uses Linux (MANY) use it for purely practical reasons - speed, robustness, ease-of-whatever, and sometimes simply because it's the most prevalent Unix. wait, OK, I do know one Debianista, but there's one in every crowd ;) > > is there a tangible, measurable benefit?
> > Well, the same as mentioned in the FreeBSD paper, for example. If all what paper is that? > my other machines are NetBSD machines, then having the cluster be NetBSD > makes administration an order of magnitude easier. sure. as I said, *BSD seems mainly preferred by people who are already committed to *BSD. and precisely what I'm looking for is any reason to go BSD for the majority who already have Linux. From jschauma at netmeister.org Sat Nov 8 12:17:08 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Sat, 8 Nov 2003 12:17:08 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: <20031108162037.GA835@netmeister.org> Message-ID: <20031108171708.GA24376@netmeister.org> Mark Hahn wrote: > > I believe that NetBSD, even though it is a very, very clean and clear OS > > with high-quality code, is much neglected. It deserves more attention > > and is perfectly suitable for the task at hand. > > there's lots of good stuff out there; I notice you didn't state that > *BSD is actually cleaner/clearer/higher-quality than Linux though. I'm not familiar enough with the Linux code to warrant such a statement. I do know people who are and who would argue that such a statement has a basis. But this should probably be discussed off-list, if at all. > > > is there a tangible, measurable benefit? > > > > Well, the same as mentioned in the FreeBSD paper, for example. If all > > what paper is that? The one posted here. Message-ID: <20031107005951.2157.qmail at web11407.mail.yahoo.com> http://people.freebsd.org/~brooks/papers/bsdcon2003/ > and precisely what I'm looking for is any reason to go BSD for the > majority who already have Linux. s/BSD/Linux/ s/Linux/Solaris/ That's probably pretty much the same question posed before Linux was the new bandwagon to jump on.
*shrug* -Jan -- Wenn ich tot bin, mir soll mal Einer mit Auferstehung oder so kommen, ich hau ihm eine rein! (Anonym) From jschauma at netmeister.org Sat Nov 8 11:20:37 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Sat, 8 Nov 2003 11:20:37 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: <20031107150056.GA16835@netmeister.org> Message-ID: <20031108162037.GA835@netmeister.org> Mark Hahn wrote: > > of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > > particular for cluster usage in the near future. > > why? I believe that NetBSD, even though it is a very, very clean and clear OS with high-quality code, is much neglected. It deserves more attention and is perfectly suitable for the task at hand. > I've never understood the evangelical aspect to *BSD (or for that > matter Debian). Evangelical? The entire Linux ``movement'' is based on such evangelism, much more so than Free- or NetBSD. > is there a tangible, measurable benefit? Well, the same as mentioned in the FreeBSD paper, for example. If all my other machines are NetBSD machines, then having the cluster be NetBSD makes administration an order of magnitude easier. -Jan -- "I am so amazingly cool you could keep a side of meat in me for a month. I am so hip I have difficulty seeing over my pelvis."
From hahn at physics.mcmaster.ca Sat Nov 8 14:17:49 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 14:17:49 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: <3FAD2B7C.2050302@mail2.vcu.edu> Message-ID: > It seems that you are trying to logically argue that people who > evangelize whatever form of linux/unix are being zealots. However, you > avoid providing us with any logical reasons for this assertion. what? I'm doing the exact opposite: asking for non-religious reason for choosing a particular OS or dist. if there's a good, factual reason for using *BSD (say, its gigabit latency is 10 us) then I'd seriously consider switching. From Fetrovsky at netscape.net Sat Nov 8 16:09:47 2003 From: Fetrovsky at netscape.net (Daniel Jesús Valencia Sánchez) Date: Sat, 08 Nov 2003 13:09:47 -0800 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <3FAD5B9B.4050207@netscape.net> hahn at physics.mcmaster.ca wrote: >>It seems that you are trying to logically argue that people who >>evangelize whatever form of linux/unix are being zealots. However, you >>avoid providing us with any logical reasons for this assertion. >> >> > >what? I'm doing the exact opposite: asking for non-religious >reason for choosing a particular OS or dist. if there's a good, >factual reason for using *BSD (say, its gigabit latency is 10 us) >then I'd seriously consider switching. > > I'm sorry... I couldn't help myself.
After several years using linux, and dealing with stability problems, I switched to FreeBSD, and since then (about 4 yrs ago) I've had no problems. My code actually runs faster and I don't have to deal with several FreeBSD distributions. If I have to hack an OS kernel or distribution in order to make it work, then I don't want it. That's why I switched to Freebie. From mwheeler at startext.co.uk Sat Nov 8 16:32:41 2003 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sat, 8 Nov 2003 21:32:41 +0000 (UTC) Subject: Linux vs FreeBSD clusters In-Reply-To: Message-ID: On Sat, 8 Nov 2003, Mark Hahn wrote: > and precisely what I'm looking for is any reason > to go BSD We-e-ell ... For starters, the experience might help get rid of this inexplicable: 'Red Hat Is The Only One True God' attitude you seem to want to inflict on the list readership. Doesn't really go down a storm here in Europe, where *real* Linux experts are automatically expected to be experienced with SUSE, Mandrake, Debian -- and of course, Red Hat. As well as others. I'm afraid the "Oh, we don't use anything but Red Hat" freaks come over as being extremely blinkered, and tend not to get very far (or quickly relegated to the Red Hat niche areas). Autre pays, autre moeurs. (Oh, and of course there are valid technical reasons behind everyone's preferred choice for carrying out a particular task. However, blanket dismissal of all but the reigning high-visibility sales publicity leader does not count as a technical reason for most.) Sorry; but too many remarks on this list over the past two weeks have been allowed to pass without comment, and have increasingly pressed the wrong buttons for me. As far as I'm concerned, Red Hat is the Cobol of the 21st century.
-- Martin Wheeler Long-time enthusiastic user of "the real thing". Sometime user of Red Hat (but only when paid to do so). From hahn at physics.mcmaster.ca Sat Nov 8 17:45:21 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 8 Nov 2003 17:45:21 -0500 (EST) Subject: Linux vs FreeBSD clusters In-Reply-To: Message-ID: > > and precisely what I'm looking for is any reason > > to go BSD > > > > We-e-ell ... For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. you must have me confused with someone else: I could not care less whether it's RH or not. dists are unimportant, just a way to install a set of precompiled tools/libs. they are at best merely non-problematic. kernels are important. compilers are important. some libraries (libc, mpi) are important.
dists seem to spend most of their effort on things like installers (which my cluster needed just once), GUI junk, and things like which 20 files in /etc contribute to the network config ;) I value Suse and RH as organizations simply because they both provide meaningful support to certain projects that I'm interested in (kernel, gcc, x86-64, etc) From jmdavis at mail2.vcu.edu Sat Nov 8 12:44:28 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Sat, 08 Nov 2003 12:44:28 -0500 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <3FAD2B7C.2050302@mail2.vcu.edu> Mark, It seems that you are trying to logically argue that people who evangelize whatever form of linux/unix are being zealots. However, you avoid providing us with any logical reasons for this assertion. So, Debian works, as do Suse and Red Hat. Debian has a better package system than either Suse or RH. I haven't seen a press release from Debian telling me that Novell's worldwide support network makes Debian a better OS. I still can't figure out what CNA's and CNE's have to do with linux, but I guess a tech person is a tech person as far as a CEO is concerned. In addition, I haven't seen a Debian press release saying that they were going to stop development of the current release model and create the "New Improved Enterprise Server" which they will happily sell you at $300-$1500 per year for maintenance and support that you don't use but you must buy to download patches for BUGS and errors. Hmm. So, Suse is trying to convince me that people who know nothing about linux are good support and RH is trying to convince me that I should pay for something that I don't use so that I may download patches. Seems to me that the Debianistas (as you refer to them) may be on to something.
On the OpenBSD side, it works. No questions, no doubts. I've used it over the years in a variety of roles ranging from Firewall and packet filters, to servers, and to research type machines. Mike Davis Mark Hahn wrote: >>>>of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in >>>>particular for cluster usage in the near future. >>>> >>>> >>>why? >>> >>> >>I believe that NetBSD, even though it is a very, very clean and clear OS >>with high-quality code, is much neglected. It deserves more attention >>and is perfectly suitable for the task at hand. >> >> > >there's lots of good stuff out there; I notice you didn't state that >*BSD is actually cleaner/clearer/higher-quality than Linux though. >and such a statement would be manifestly untrue. the 'deserves' thing >is purely an aesthetic judgement, which is perfectly fine, but not >a reason to switch... > > > >>> I've never understood the evangelical aspect to *BSD (or for that >>> matter Debian). >>> >>> >>Evangelical? The entire Linux ``movement'' is based on such evangelism, >>much more so than Free- or NetBSD. >> >> > >how strange! absolutely everyone I know who uses Linux (MANY) >use it for purely practical reasons - speed, robustness, ease-of-whatever, >and sometimes simply because it's the most prevalent Unix. wait, OK, >I do know one Debianista, but there's one in every crowd ;) > > > >>>is there a tangible, measurable benefit? >>> >>> >>Well, the same as mentioned in the FreeBSD paper, for example. If all >> >> > >what paper is that? > > > >>my other machines are NetBSD machines, then having the cluster be NetBSD >>makes administration an order of magnitude easier. >> >> > >sure. as I said, *BSD seems mainly preferred by people who are already >committed to *BSD. and precisely what I'm looking for is any reason >to go BSD for the majority who already have Linux.
From cosmik.debris at elec.canterbury.ac.nz Sun Nov 9 15:02:07 2003 From: cosmik.debris at elec.canterbury.ac.nz (Cosmik Debris) Date: Mon, 10 Nov 2003 09:02:07 +1300 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Message-ID: <74C3DBA1ACA54844B781615F22D0DB18DA14D8@claude.elec.canterbury.ac.nz> > -----Original Message----- > From: Joey Sims [mailto:jsims at csiopen.com] > Sent: Friday, 7 November 2003 16:08 p.m. > To: beowulf at beowulf.org > Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL > > > Maybe someone could lend a hand and help Intel find out what > their unknown material is. Be careful! Don't spill it in > your lap for goodness sake.... Dohh! :-O > > I found this amusing: > > INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL > 11.07.03 > by Jennifer Tabor > HPCwire > ============================================================== Diamond???? From csamuel at vpac.org Sun Nov 9 17:48:00 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 09:48:00 +1100 Subject: Linux vs FreeBSD clusters In-Reply-To: References: Message-ID: <200311100948.05192.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 9 Nov 2003 08:32 am, Martin WHEELER wrote: > On Sat, 8 Nov 2003, Mark Hahn wrote: > > and precisely what I'm looking for is any reason > > to go BSD > > > > We-e-ell ...
For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. Mark is looking for benchmarks, basically. Some hard facts (well, OK, figures, benchmarks often don't quite qualify as fact :-) ) that will give some actual measure to whether one is better than the other. Also note that he's stated he's looking to be persuaded, and that he's not tied to Linux. He just wants whatever will get his jobs done fastest. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/rsQjO2KABBYQAh8RAvh6AJ9mA52+tBkmW1FEmS9Iuhl1CcJrrACeO+wp t/XI0f1tW9/dScTkCvBWB2c= =1P+Y -----END PGP SIGNATURE----- From csamuel at vpac.org Sun Nov 9 18:05:17 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 10:05:17 +1100 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: <20031107213648.GA11665@galactic.demon.co.uk> References: <20031107150056.GA16835@netmeister.org> <20031107213648.GA11665@galactic.demon.co.uk> Message-ID: <200311101005.23978.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 8 Nov 2003 08:36 am, Andrew M.A. Cater wrote: > Soft Landing Systems (SLS) introduced the whole concept of a Linux > distribution - a collection of code more or less known to work together. Not quite correct I believe. My understanding is that SLS is predated by the Manchester Computing Centre (MCC) distribution (the first distro I used around 199[23]).
For instance v2.03 of the c.o.l.a FAQ (27th Jan 1993) says on distributions: MCC and SLS are more complete systems that contain most of what is needed for normal use. MCC is older, SLS includes X. ...and the Linux Distribution List says this: http://lightning.prohosting.com/~ldl/cgi-bin/show.cgi?action=2&show=169 MCC Interim Linux is currently the oldest distribution listed on the LDL. It was started by the Manchester Computing Centre in February of 1992, after they made Linux available on their FTP site in November of 1991. The distribution was one of the first to use a combined boot/root disk. Several distributions were based off of MCC Interim Linux, including TAMU, MJ, and SLS (which later morphed into Slackware Linux, a distribution that's still alive today). Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/rsgtO2KABBYQAh8RAm6PAKCBenVvS1Ob7AgiCTWyRfcg25j+BACfUx8K XFjheVGrgqo3WpHUYEHmLXk= =IXy8 -----END PGP SIGNATURE----- From edwardsa at plk.af.mil Mon Nov 10 00:22:33 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Sun, 9 Nov 2003 22:22:33 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <3FAF1B62.1090900@tamu.edu> References: <20031107150056.GA16835@netmeister.org> <200311072249.hA7MnG3T000139@knockout.kirtland.af.mil> <3FAF1B62.1090900@tamu.edu> Message-ID: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> On Sun, Nov 09, 2003 at 11:00:18PM -0600, Gerry Creager N5JXS wrote: > Until the Debian model allows me to find libs where I am used to looking > for libs (and doesn't mandate /share over /usr/local) as well as a few > more little nagging issues, I'm not going there. I don't, honestly, > have time for me to adapt, and to make my students adapt! > > One of my hopes for LSB was that I'd be able to go from distro to distro > without some of those headaches. Hasn't happened yet. > > gerry It is interesting because one of the initial attractions for Debian was its organization of libraries and configuration files. After RH, it seemed totally transparent. I guess this is just a matter of personal taste. I would be surprised, though, if after trying apt-get, you could ever go back to the rpm model. Art > > Arthur H. Edwards wrote: > >On Fri, Nov 07, 2003 at 02:17:48PM -0500, Mark Hahn wrote: > > > >>>of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in > >>>particular for cluster usage in the near future. > >> > >>why? I've never understood the evangelical aspect to *BSD > >>(or for that matter Debian). is there a tangible, measurable benefit? > >>I'm not sure it's effective to advocate a niche OS/dist for > >>ideological reasons or just plain personal preference... > > > > > >It is true that Debian has a core of developers that are very > >committed to a particular definition of free software, and this often > >carries with it a surprisingly consistent set of philosophical > >POVs. Nonetheless, Debian is remarkably stable and it has, by far, the > >best package management system I have ever come across. I use it on > >all my machines with great overall happiness.
With RH moving toward > >the Big-Bucks model of software, I will not be surprised if I see many > >new Debian users over the next few months. > > > >Art Edwards > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) From csamuel at vpac.org Mon Nov 10 00:46:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 10 Nov 2003 16:46:38 +1100 Subject: Linux package management (was Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)) In-Reply-To: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> References: <20031107150056.GA16835@netmeister.org> <3FAF1B62.1090900@tamu.edu> <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> Message-ID: <200311101646.39648.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 10 Nov 2003 04:22 pm, Arthur H. Edwards wrote: > I would be surprised, though, if after trying apt-get, you > could ever go back to the rpm model. Mandrake's urpmi is very apt-get like (works out dependencies, you can add other sources, etc) and works very well, and there's also apt4rpm. I've used both (as well as apt-get on Debian) and they're very capable.
Yum has a good reputation, though I've never tried it. Let's face it, trying to convince people that distro X is better than distro Y is an exercise in futility, and quite pointless as it all comes down to a subjective judgement of what someone finds more appealing or capable. To me the diversity is good for the future of Linux, it means there's a lot of people with their own ideas that can grow and evolve and spread. The more the merrier. :-) Of course all this idealism tends to come down with a bump when you hit the hard reality of will vendor X support product Y on distro Z, and how much having (or not having) support means to you. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/ryY+O2KABBYQAh8RAj1lAJ4jLnFqT+PWGpYS1tYv6HgD6CU4yACcDUCs 7+e+rX3lh0rttbQ5BwKFfuo= =Ahvj -----END PGP SIGNATURE----- From scheinin at crs4.it Mon Nov 10 03:53:52 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Mon, 10 Nov 2003 09:53:52 +0100 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux Message-ID: <200311100853.hAA8rqI02462@dali.crs4.it> Glen Kaukola gave me a more precise explanation of the downgrade that resulted in a stable O/S for Tyan S2880. He wrote: "I downgraded from 2.01 to 1.07." and mentioned the URL http://www.tyan.com/support/html/b_s2880.html That page describes 1.07 as the first BIOS and 2.01 as the second, the latter released 20 August 2003.
Curiously the AMD page for recommended motherboards
http://www.amd.com/us-en/recmobo/ResultsHandler/1,,30_2252_869_8819%5E8821~68707,00.html
describes the second release as 19 August 2003, version PON, but then describes a newer version "TOY", version 2.01p, with date 8 October 2004.

Where I live, the only readily available board for Opteron is from Tyan, so I hope we collectively shed more light on the situation.

best regards,
Alan Scheinine

From mathiasbrito at yahoo.com.br Mon Nov 10 06:53:10 2003
From: mathiasbrito at yahoo.com.br (Mathias Brito)
Date: Mon, 10 Nov 2003 08:53:10 -0300 (ART)
Subject: Compiling HPL
Message-ID: <20031110115310.40488.qmail@web12201.mail.yahoo.com>

I am trying to compile HPL. The build starts and then stops with the following messages:

mpicc -o HPL_pdinfo.o -c -fomit-frame-pointer -O3 -funroll-loops -W -Wall -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I/work1/mathias/hpl/include -I/work1/mathias/hpl/include/Linux_ATHLON_FBLAS -I/opt/mpich/include ../HPL_pdinfo.c
mpicc -o HPL_pdtest.o -c -fomit-frame-pointer -O3 -funroll-loops -W -Wall -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I/work1/mathias/hpl/include -I/work1/mathias/hpl/include/Linux_ATHLON_FBLAS -I/opt/mpich/include ../HPL_pdtest.c
g77 -fomit-frame-pointer -O3 -funroll-loops -W -Wall -o /work1/mathias/hpl/bin/Linux_ATHLON_FBLAS/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /work1/mathias/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a /usr/lib/libblas.a /usr/lib/libatlas.a /opt/mpich/lib/libmpich.a
/opt/mpich/lib/libmpich.a(comm_split.o)(.text+0x138): In function `MPI_Comm_split':
: undefined reference to `PMPI_Allreduce'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x63): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Allreduce'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x94): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Bcast'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x111): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Sendrecv'
/opt/mpich/lib/libmpich.a(context_util.o)(.text+0x154): In function `MPIR_Context_alloc':
: undefined reference to `PMPI_Allreduce'
collect2: ld returned 1 exit status
make[2]: ** [dexe.grd] Erro 1
make[2]: Leaving directory `/work1/mathias/hpl/testing/ptest/Linux_ATHLON_FBLAS'
make[1]: ** [build_tst] Erro 2
make[1]: Leaving directory `/work1/mathias/hpl'
make: ** [build] Erro 2

I think the problem is with MPI.

I based my makefile on Make.Linux_ATHLON_FBLAS, with the necessary modifications. Did I forget something?

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

From rgb at phy.duke.edu Mon Nov 10 07:30:13 2003
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 10 Nov 2003 07:30:13 -0500 (EST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil>
Message-ID:

On Sun, 9 Nov 2003, Arthur H. Edwards wrote:

> > It is interesting because one of the initial attractions for Debian
> was its organization of libraries and configuration files. After RH, it
> seemed totally transparent. I guess this is just a matter of personal
> taste. I would be surprised, though, if after trying apt-get, you
> could ever go back to the rpm model.
> > Art It isn't "the rpm model" -- in both cases the packaging and metadata are adequate. Comparing apt to rpm is apples to oranges -- apt-get is a toplevel toolset to extract and resolve dependencies from the debian packages and use them to retrieve and install package(s) and their entire consistent dependency trees, by revision. The problem is that in the past there has been no comparable toolset for RPM packages and all the distributions that rely on them. For the last two or three years, there has been (first yup, now yum). It, too, is "totally transparent" and has, arguably features that some administrators might prefer (including considerable and increasingly fine-grained control over their own, local, repository images). Whether or not you've looked at yum and tried yum and compared yum's operation and features to apt, the existence of choices appears to be a good thing, as does "competition" of sorts (the friendly, slightly religious sort that tends to exist in the open source world:-). I know yum's primary developers quite well (since they work about fifty meters away from my office in the same building:-) and they are very, very dedicated and not at all religiously inclined towards Red Hat per se. Yum has been successfully used to make RPM-based repositories for just about all the primary RPM linux distributions, and I believe that people have even used it to distribute/maintain RPMs on Solaris boxes. At this particular moment, I think that yum makes RH (or if you prefer, Fedora) slightly preferrable to Debian in a scaled/automated LAN installation because both effectively automaintain after installation, but RH/Fedora permits the easy use of PXE/kickstart. Kickstart, after all, is (IMO) the reason RH maintained its dominance in spite of the otherwise pain of manipulating RPMs compared to Debian, and the reason it remained dominant among RPM-based distros as well. 
In part because of its existence, there is actually some talk of coming up with a rational unification of linux packaging schemes, reviewing and getting rid of package features that have proven to be more Evil than Good over the years, developing an XML schema, and lots of other good things that might actually reduce the "us and them" barriers for linux in general. I personally think that this would be a good thing. As Mark has been saying -- most of us are religious about open source, stability, functionality, but at best we are "used" to particular distributions and could be convinced to change fairly easily if advantages associated with the change outweighed the hassle of learning something new.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

From gerry.creager at tamu.edu Sun Nov 9 23:21:07 2003
From: gerry.creager at tamu.edu (Gerry Creager N5JXS)
Date: Sun, 09 Nov 2003 22:21:07 -0600
Subject: Linux vs FreeBSD clusters
In-Reply-To:
References:
Message-ID: <3FAF1233.5030301@tamu.edu>

OK, so let's get back on-track.

I *am* a RH user for my systems, and less likely to sway in the near future if Fedora pans out. I went (somewhat reluctantly) to RH for its RPM capability, back when dselect had user interface problems (well, more precisely, its user interface was user-hostile) and Slack's package management was tarballs with no accountability. SuSE didn't load reliably, and Mandrake hadn't started snagging RedHat RPMs and repackaging them yet. Debian has always, in its quest to be the "one pure linux", placed things where they weren't easy for me to find.
I am a proponent of the OpenSource concept and movement, but I'm also too busy to have to chase where one distro puts things, determine why another doesn't clean up after itself, and finally have to fight almost every one for some form of accountability in package management. RedHat offered the package management piece at a time when that's what was needed to give me a little breathing room. As one friend put it this weekend, RedHat is as quirky as all the other distributions, but I seem to know where and what the quirks are. The change to Fedora has me reexamining the potential for change. It'd be easiest for me, and my lab and other activities, to go to Mandrake. SuSE has a good reputation with folks I know, work with, and trust. Debian still hasn't decided where they're going to put pieces of the code, and while the distro appears internally consistent (all the debian installs I've worked with _worked_), I often have to customize things, and I don't have time these days to go looking for where someone decided they felt libs or modules belonged, in contravention to the rest of the world. I'm not so sure why you're so hard on RedHat. I don't think I'd characterize it as the COBOL (it's an acronym) of the 21st century. Unless you're simply looking for a tag that usually raises the ire of scientific programmers who might have had to take a COBOL course in their academic past... This tends to be a rather high-end group of computational expertise. I learn a lot here. I've been known to contribute a bit in the past (back when I got to do high performance computing instead of managing a pack of grad students who now get the fun stuff). We don't really need the elitist plugs. If you don't like RedHat, fine. If someone else does, fine. Just recognize that you don't have to use my distro and I don't have to use yours. 
Exasperatedly yours, Gerry Martin WHEELER wrote: > On Sat, 8 Nov 2003, Mark Hahn wrote: > > >> and precisely what I'm looking for is any reason >>to go BSD > > > > > We-e-ell ... For starters, the experience might help get rid of this > inexplicable: 'Red Hat Is The Only One True God' attitude you seem to > want to inflict on the list readership. > > Doesn't really go down a storm here in Europe, where *real* Linux > experts are automatically expected to be experienced with SUSE, > Mandrake, Debian -- and of course, Red Hat. As well as others. > > I'm afraid the "Oh, we don't use anything but Red Hat" freaks come > over as being extremely blinkered, and tend not to get very far (or > quickly relegated to the Red Hat niche areas). > > Autre pays, autre moeurs. > > (Oh, and of course there are valid technical reasons behind everyone's > preferred choice for carrying out a particular task. However, blanket > dismissal of all but the reigning high-visibility sales publicity leader > does not count as a technical reason for most.) > > Sorry; but too many remarks on this list over the past two weeks have > been allowed to pass without comment, and have increasingly pressed the > wrong buttons for me. > > As far as I'm concerned, Red Hat is the Cobol of the 21st century. > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Mon Nov 10 11:26:48 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Mon, 10 Nov 2003 09:26:48 -0700 Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) 
In-Reply-To: References: <200311100521.hAA5Li3P011069@knockout.kirtland.af.mil> Message-ID: <200311101626.hAAGPs3R013517@knockout.kirtland.af.mil> I think your point about newer package management tools is well-taken. I have tried the apt for rpms (when I was running the free scyld distribution) and it was clealy better. I have not tried yum, but I have not had enough (any) frustration with apt-get, and now apt-proxy, to make a move desirable. I also agree that my attachment is more to open-source than to Debian per-se, although after using RH, SUSe, and even turbo, Linux, I have stuck with Debian. I actually wish SUSe (now part of Novell) well, and I am sorry to see the demise of RH as we know it, because they are where increased user base comes from. However, I don't know whether SUSe will have better luck at generating revenue than did RH, and they may well go the same direction. It is that possibility that makes me think that Debian, or a similar, volunteer-based distribution may have the greater longevity. Art Edwards On Mon, Nov 10, 2003 at 07:30:13AM -0500, Robert G. Brown wrote: > On Sun, 9 Nov 2003, Arthur H. Edwards wrote: > > > > > It is interesting because one of the initial attractions for Debian > > was its organization of libraries and configuration files. Afer RH, it > > seemed totally transparent. I guess this is just a matter of personal > > taste. I would be surprised, though, if after trying apt-get, you > > could ever go back to the rpm model. > > > > Art > > It isn't "the rpm model" -- in both cases the packaging and metadata are > adequate. Comparing apt to rpm is apples to oranges -- apt-get is a > toplevel toolset to extract and resolve dependencies from the debian > packages and use them to retrieve and install package(s) and their > entire consistent dependency trees, by revision. > > The problem is that in the past there has been no comparable toolset for > RPM packages and all the distributions that rely on them. 
For the last > two or three years, there has been (first yup, now yum). It, too, is > "totally transparent" and has, arguably features that some > administrators might prefer (including considerable and increasingly > fine-grained control over their own, local, repository images). > > Whether or not you've looked at yum and tried yum and compared yum's > operation and features to apt, the existence of choices appears to be a > good thing, as does "competition" of sorts (the friendly, slightly > religious sort that tends to exist in the open source world:-). I know > yum's primary developers quite well (since they work about fifty meters > away from my office in the same building:-) and they are very, very > dedicated and not at all religiously inclined towards Red Hat per se. > Yum has been successfully used to make RPM-based repositories for just > about all the primary RPM linux distributions, and I believe that people > have even used it to distribute/maintain RPMs on Solaris boxes. At this > particular moment, I think that yum makes RH (or if you prefer, Fedora) > slightly preferrable to Debian in a scaled/automated LAN installation > because both effectively automaintain after installation, but RH/Fedora > permits the easy use of PXE/kickstart. Kickstart, after all, is (IMO) > the reason RH maintained its dominance in spite of the otherwise pain of > manipulating RPMs compared to Debian, and the reason it remained > dominant among RPM-based distros as well. > > In part because of its existence, there is actually some talk of coming > up with a rational unification of linux packaging schemes, reviewing and > getting rid of package features that have proven to be more Evil than > Good over the years, developing an XML schema, and lots of other good > things that might actually reduce the "us and them" barriers for linux > in general. I personally think that this would be a good thing. 
> As Mark has been saying -- most of us are religious about open source,
> stability, functionality, but at best we are "used" to particular
> distributions and could be convinced to change fairly easily if
> advantages associated with the change outweighed the hassle of learning
> something new.
>
> rgb
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>

-- 
Art Edwards
Senior Research Physicist
Air Force Research Laboratory
Electronics Foundations Branch
KAFB, New Mexico
(505) 853-6042 (v)
(505) 846-2290 (f)

From hahn at physics.mcmaster.ca Mon Nov 10 12:20:20 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Mon, 10 Nov 2003 12:20:20 -0500 (EST)
Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To: <200311100853.hAA8rqI02462@dali.crs4.it>
Message-ID:

> "I downgraded from 2.01 to 1.07." and mentioned the URL
> http://www.tyan.com/support/html/b_s2880.html
> That page describe 1.07 as the first BIOS and 2.01 as the second, the
> latter release 20 august 2003. Curiously the AMD page for
> recommended motherboards
> http://www.amd.com/us-en/recmobo/ResultsHandler/1,,30_2252_869_8819%5E8821~68707,00.html
> describes the second release as 19 august 2003, version PON but then
> describes a newer version "TOY" version 2.01p with date 8 october 2004.

I recently upgraded from 1.0.1 to 2880201l.rom (which seems to be the most recent from the site above). I did so mainly to let me boot my shiny new nodes without a keyboard ;(

I haven't noticed any problems with the new bios. admittedly, mine are fairly simple machines: if pxe, the nics and userspace work, I'm happy...

"dd if=/dev/mem bs=1k skip=640 count=384 | strings|less" shows:

	BIOS Date: 08/18/03 15:19:23 Ver: 08.00.08
	TYAN Thunder K8S V2.01l BIOS

that reminds me: has anyone done the gruntwork to figure out how to run a flash upgrade without a windows-formatted floppy and floppy drive? I'd actually feel fairly sanguine about pxe-booting to a faked floppy image that ran the installer...

regards, mark hahn.

From glen at mail.cert.ucr.edu Mon Nov 10 13:26:31 2003
From: glen at mail.cert.ucr.edu (Glen Kaukola)
Date: Mon, 10 Nov 2003 10:26:31 -0800
Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To:
References:
Message-ID: <3FAFD857.2010303@cert.ucr.edu>

Mark Hahn wrote:

>that reminds me: has anyone done the gruntwork to figure out how to run
>a flash upgrade without a windows-formatted floppy and floppy drive?
>I'd actually feel fairly sanguine about pxe-booting to a faked floppy
>image that ran the installer...
>

The best I've been able to come up with is turning a floppy image into a bootable cdrom.

From one of the little howtos I've made for myself:

mkdir /tmp/foo
cp floppy.img /tmp/foo/
cd /tmp/foo
mkisofs -J -r -b floppy.img . > /tmp/floppy.iso

Cheers,
Glen

From raysonlogin at yahoo.com Sat Nov 8 20:11:48 2003
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Sat, 8 Nov 2003 17:11:48 -0800 (PST)
Subject: Performance Tuning on Clusters Course
Message-ID: <20031109011148.63342.qmail@web11405.mail.yahoo.com>

http://webct.ncsa.uiuc.edu:8900/public/PTCLUST/

Rayson

From shewa at inel.gov Mon Nov 10 14:34:23 2003
From: shewa at inel.gov (Andrew Shewmaker)
Date: Mon, 10 Nov 2003 12:34:23 -0700
Subject: Compiling HPL
In-Reply-To: <20031110115310.40488.qmail@web12201.mail.yahoo.com>
References: <20031110115310.40488.qmail@web12201.mail.yahoo.com>
Message-ID: <3FAFE83F.7000809@inel.gov>

Mathias Brito wrote:

> I trying to compile the HPL, it starts the process and
> stop with the following message:
>
> /opt/mpich/lib/libmpich.a(comm_split.o)(.text+0x138):
> In function `MPI_Comm_split':
> : undefined reference to `PMPI_Allreduce'
>
> I think the problem is with the mpi.
>
> Well I change the makefile with the
> Make.Linux_ATHLON_FBLAS with the necessary
> modifications. Did I forget something?

Mathias,

You might want to check whether you are mixing LAM and MPICH, using "rpm -qf `which mpicc`" (note the backticks) on an rpm-based distro.

If it is an MPICH problem then you will get the best help from mpi-maint at mcs.anl.gov, and you will have to provide them with more information. Read their FAQ to find out what they expect from you in order to provide support.

http://www.mcs.anl.gov/mpi/mpich/docs/faq.htm

Make sure you tell them what version of mpich you have installed and whether or not you built it yourself.

Andrew

-- 
Andrew Shewmaker, Associate Engineer
Phone: 1-208-526-1415
Idaho National Eng. and Environmental Lab.
P.O. Box 1625, M.S.
3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfaerber at penguincomputing.com Mon Nov 10 13:56:40 2003 From: nfaerber at penguincomputing.com (Nate Faerber) Date: 10 Nov 2003 10:56:40 -0800 Subject: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: References: Message-ID: <1068490599.28875.35.camel@m10.penguincomputing.com> > that reminds me: has anyone done the gruntwork to figure out how to run > a flash upgrade without a windows-formatted floppy and floppy drive? > I'd actually feel fairly sanguine about pxe-booting to a faked floppy > image that ran the installer... > You can use FreeDOS boot floppies if you are trying to free yourself from MS. Then if you want to free yourself from the floppy drive, try out MEMDISK from H. Peter Anvin (SYSLINUX) with your floppy image. Unfortunately, we have found that the Phoenix Flash utility (phlash16.exe) has not been working lately over a network with PXE/MEMDISK. We haven't contacted H. Peter about this, yet. It could be a MEMDISK issue or it could be a FreeDOS issue or maybe a bit of both. If you don't mind burning tiny CDs for BIOS upgrades, another option instead of floppy is CDR. Try DOSEMU with your floppy image. 
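For reference, the MEMDISK route mentioned above usually comes down to a short PXELINUX menu entry. The sketch below is only an illustration under assumptions: `TFTP_ROOT`, the `biosflash` label and the file name `floppy.img` are placeholder names, not anything taken from this thread, and the `memdisk` binary from the SYSLINUX distribution plus the DOS floppy image still have to be copied into the TFTP root by hand. As noted above, phlash16.exe may not cooperate with this setup, so treat it as a starting point rather than a known-good recipe.

```shell
#!/bin/sh
# Sketch only: write a PXELINUX menu entry that boots a floppy image via MEMDISK.
# TFTP_ROOT, the "biosflash" label and floppy.img are hypothetical names.
# Defaults to a scratch directory so nothing on a live TFTP server is touched.
TFTP_ROOT="${TFTP_ROOT:-./tftpboot}"
CFG="$TFTP_ROOT/pxelinux.cfg/default"

mkdir -p "$TFTP_ROOT/pxelinux.cfg"

# "kernel memdisk" loads the MEMDISK driver; the floppy image is passed
# to it as the initrd and appears to DOS as a floppy drive.
cat > "$CFG" <<'EOF'
default biosflash
label biosflash
  kernel memdisk
  append initrd=floppy.img
EOF

echo "wrote $CFG"
```

Setting `TFTP_ROOT` to the real TFTP directory before running would write the entry there instead; whether a given flash utility then runs correctly from the emulated floppy is a separate question.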
-- Nate Faerber, Engineer Tel: 415-358-2666 Fax: 415-358-2646 Toll Free: 888-PENGUIN PENGUIN COMPUTING www.penguincomputing.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 10 18:00:12 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 10 Nov 2003 15:00:12 -0800 (PST) Subject: floppy images - Re: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux In-Reply-To: <3FAFD857.2010303@cert.ucr.edu> Message-ID: hi ya glen On Mon, 10 Nov 2003, Glen Kaukola wrote: > Mark Hahn wrote: > > >that reminds me: has anyone done the gruntwork to figure out how to run > >a flash upgrade without a windows-formatted floppy and floppy drive? > >I'd actually feel fairly sanguine about pxe-booting to a faked floppy > >image that ran the installer... > > > > Best I've been able to come up with is turning a floppy image into a > bootable cdrom. > > From one of the little howto's I've made for myself: > mkdir /tmp/foo > cp floppy.img /tmp/foo/ > cd /tmp/foo > mkisofs -J -r -b floppy.img . > /tmp/floppy.iso if you're trying to do upgrades ... 
to a faked floppy drive

	http://www.linux-consulting/www-linux/Boot/Boot.Loop.txt
	- lots of other booting/installer stuff

c ya
alvin

#
# Loopback Device
#
#
# http://burks.brighton.ac.uk/burks/linux/rute/node19.htm
#
#
#
# dd if=/dev/zero of=/dev/ram0 count=1440 bs=1024
#
dd if=/dev/zero of=/tmp/floppy count=1440 bs=1024
#
losetup /dev/loop0 /tmp/floppy
mke2fs /dev/loop0
#
mount /dev/loop0 /mnt/test
#
# copy files to the loopback device ( will go to floppy later )
#
# get a copy of tom's root-boot for sample contents
# to copy onto your faked floppy version
# ( shows you what files are required )
#
# - use your minimized kernel and no modules
#
ls -al /mnt/test
#
umount /mnt/test
losetup -d /dev/loop0
#
#
# Now copy the data to the floppy drive
#
# dd if=/dev/ram0 of=/dev/fd0 count=1440 bs=1024
dd if=/tmp/floppy of=/dev/fd0 count=1440 bs=1024
#
# End of file

From alvin at Mail.Linux-Consulting.com Mon Nov 10 17:55:47 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Mon, 10 Nov 2003 14:55:47 -0800 (PST)
Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?)
In-Reply-To: <200311101626.hAAGPs3R013517@knockout.kirtland.af.mil>
Message-ID:

hi ya art

On Mon, 10 Nov 2003, Arthur H. Edwards wrote:

> I think your point about newer package management tools is
> well-taken. I have tried the apt for rpms (when I was running the free
> scyld distribution) and it was clearly better. I have not tried yum,
> but I have not had enough (any) frustration with apt-get, and now
> apt-proxy, to make a move desirable. I also agree that my attachment
> is more to open-source than to Debian per-se, although after using RH,
> SUSe, and even turbo, Linux, I have stuck with Debian.
> I actually wish
> SUSe (now part of Novell) well, and I am sorry to see the demise of RH
> as we know it, because they are where increased user base comes
> from. However, I don't know whether SUSe will have better luck at
> generating revenue than did RH, and they may well go the same
> direction. It is that possibility that makes me think that Debian, or a
> similar, volunteer-based distribution may have the greater longevity.

good point ...

- i think that "volunteer-based distro" will survive all the commercial
  methodologies ...

	- commercial folks are out to make $$$$ to attempt to cover the
	  costs of marketing, sales, advertisement and analysts' expectations

	- volunteers do what they do, because it's what they like doing and
	  will probably continue doing so for the next few eons

*.rpm or *.deb or *.foo or *.tgz package managers...

	- i can make *.deb break its dependencies equally easily as *.rpm
	  would be barfing ...

	- that is, if the dependencies aren't installed, the app you're trying
	  to install fails, or if you use --no-deps then even worse things
	  happen when you brute force things

my choice/methodology is *.tgz original sources, and run my update scripts

	- updates/upgrades/installing/patches are distro independent
	  ( linux, bsd, solaris, sgi, etc )

	- copy the old files FIRST into a date-stamped tar ball backup
	  ( you should always be able to restore what it used to be
	  before the failed updates/upgrades )

	- merge the old files with the new configs

	- overwrite with the merged data and binaries

other cluster installers
	http://www.Linux-Consulting.com/Cluster

c ya
alvin

From alvin at Mail.Linux-Consulting.com Mon Nov 10 18:40:36 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Mon, 10 Nov 2003 15:40:36 -0800 (PST)
Subject: floppy images - fix - Re: Tyan S2880
(K8S) /S2885 (K8W) Opteron boards under Linux
In-Reply-To:
Message-ID:

hi again

ooppss ( fixing the "i don't have a floppy drive" case at the bottom )

On Mon, 10 Nov 2003, Alvin Oga wrote:

> > Best I've been able to come up with is turning a floppy image into a
> > bootable cdrom.
> >
> > From one of the little howto's I've made for myself:
> > mkdir /tmp/foo
> > cp floppy.img /tmp/foo/
> > cd /tmp/foo
> > mkisofs -J -r -b floppy.img . > /tmp/floppy.iso
>
> if you're trying to do upgrades ... to a faked floppy drive
>
> http://www.linux-consulting/www-linux/Boot/Boot.Loop.txt
> - lots of other booting/installer stuff
>
> c ya
> alvin
>
> #
> # Loopback Device
> #
> #
> # http://burks.brighton.ac.uk/burks/linux/rute/node19.htm
> #
> #
> #
> # dd if=/dev/zero of=/dev/ram0 count=1440 bs=1024
> #
> dd if=/dev/zero of=/tmp/floppy count=1440 bs=1024
> #
> losetup /dev/loop0 /tmp/floppy
> mke2fs /dev/loop0
> #
> mount /dev/loop0 /mnt/test
> #
> # copy files to the loopback device ( will go to floppy later )
> #
> # get a copy of tom's root-boot for sample contents
> # to copy onto your faked floppy version
> # ( shows you what files are required
> #
> # - use your minimized kernel and no modules
> #
> ls -al /mnt/test
> #
> umount /mnt/test
> losetup -d /dev/loop0
> #
> #
> # Now copy the data to the floppy drive
> #
> # dd if=/dev/ram0 of=/dev/fd0 count=1440 bs=1024
### > dd if=/tmp/floppy of=/dev/fd0 count=1440 bs=1024

dd if=/tmp/floppy of=/tmp/initrd.fakefloppy count=1440 bs=1024
gzip /tmp/initrd.fakefloppy
#
# your new boot stuff
#
cp /tmp/initrd.fakefloppy.gz /boot
#
# - look mahh, no /dev/fd0 :-)
#

> #
> # End of file
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or
unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 10 19:45:55 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 10 Nov 2003 19:45:55 -0500 (EST) Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) In-Reply-To: Message-ID: On Mon, 10 Nov 2003, Alvin Oga wrote: > > hi ya art > > On Mon, 10 Nov 2003, Arthur H. Edwards wrote: > > > I think your point about newer package management tools is > > well-taken. I have tried the apt for rpms (when I was running the free > > scyld distribution) and it was clealy better. I have not tried yum, > > but I have not had enough (any) frustration with apt-get, and now > > apt-proxy, to make a move desirable. I also agree that my attachment > > is more to open-source than to Debian per-se, although after using RH, > > SUSe, and even turbo, Linux, I have stuck with Debian. I actually wish > > SUSe (now part of Novell) well, and I am sorry to see the demise of RH > > as we know it, because they are where increased user base comes > > from. However, I don't know whether SUSe will have better luck at > > generating revenue than did RH, and they may well go the same > > direction. It is that possibility that makes me think that Debian, or a > > similar, volunteer-based distribution may have the greater longevity. > > good point ... > > - i think that "volunteer-based distro" will survive all the commercial > methodologies ... > - commercial folks are out to make $$$$ to attempt to cover the > costs of marketing, sales, advertisement and analysts expectations > > - voluteers do what they do, because its what they like doing and will > probably continue doing so for the next few eons Alvin and Arthur, I don't think either one will disappear anytime soon and think that we've entered an era where the two can exhibit excellent synthesis. 
There is nothing wrong with commercial distributions, or commercial distributions making money, as long as they remember:

  a) They don't own their product.
  b) They are therefore at best selling added value, such as support.
  c) This puts pretty strict limits on what they can sanely charge.

Some of the major distributions may be forgetting c) just a bit, but the market will correct this soon enough:-) Or maybe this is just wishful thinking and some marketing hype to give their stocks a bit of a bounce.

Truthfully, the commercial distributions and individual/volunteer developers have achieved a moderately healthy synergism in all the various Linuxes. In fact, the commercial folks have provided a variety of valuable services -- collecting packages, running a moderately systematic debugging service (which actually has worked adequately for serious bugs, however poorly/slowly it works for ignorable/annoying bugs), applying critical patches, contributing whole applications and much needed support services to the development process. I certainly don't begrudge them a living and have gone out of my way to buy something from them periodically in the hope that the golden goose stays fat (enough) and will continue laying.

I just think it is pretty silly of them to try to charge as much or more than their major commercial competitors for a product that they don't own and (for the most part) didn't develop. They won't get it, and in the meantime they'll irritate a lot of people they should be (and in the past have been) working with, as well as for.

On the other hand, it is perfectly reasonable to try to re-engage the volunteer community in the development process and to give the notion of "supported versions" a bit of a goose. In recent years, we all may literally have become somewhat complacent, trusting the commercial groups to PROVIDE those valuable services without our strong participation. If this is all Fedora (for example) is about, I'm all for it.
However, the whole process >>has<< already started spawning new alternatives (such as caos) and I think "the community" has plenty of capacity to support plenty of non-commercial or very low margin alternatives (as one would expect, given the existence of Gnu itself, freebsd, debian, all of which support themselves by means of low margin events like T-shirt sales and donations). So the commercials may find that they've created something of a monster. Or four. Overall, I'm less cynical about the process than I was a month ago because I see some benefit that could arise from it. Companies like Red Hat, Mandrake, SuSE/Novell may learn from all this what the limits are on what they can charge and what added value they need to provide and where to earn a fair living (in contrast to a gross Bill Gates billionaire profit). They also are firmly reminded that they need us (the "volunteers" who as often as not own the software they sell) more than we need them. As I've said before and will repeat -- they aren't going to get anything like the prices they wish to charge businesses for "rawhide", and they >>must<< have a meaningful rawhide and community development process in order to sustain Linux's legendary stability and universal utility. The "community" may be reminded that although the software is free, supporting it in a distributional form isn't free, and if they don't take steps to fairly compensate whoever it is that is providing the service, they'd better make arrangements to do it themselves individually or collectively. The stress will also very likely create a spate of new programs, which is a very good thing. Some of them may even be revolutionary products, as I think we're about to enter an era where computers build themselves an operating environment from certified open sources in real-time and on demand, except where administrators deliberately do or mirror a prebuilt repository based on the same tools and sources for efficiency reasons. 
The GPL sanctifies the source package, new XML-based packaging schema and the web itself can guarantee cross-linux, maybe even cross-linux+BSD combined build compatibility. That is, SOMEWHERE in the very near future I think we are going to see the emergence of a completely new paradigm, one that finally ends the era of commercial software as we have known it in the past. The requisite tool components are all there, the broadband connections to the home required to sustain it are there, and I think the creative juices are cooking in developers' minds. The coming revolution will make even the notions of java and .net look tame and in retrospect a bit silly, as the entire notion of java and source application delivery will be just a tiny fraction of a source application delivery system that can and will deliver the entire operating system and all derivative tools (including java). C, perl, python, java, html, php -- sources of all sorts bundled into GPL packages with attached development processes and delivered directly to your system on demand for prices ranging from nothing (as most web services are delivered today) to a trivial amount for a snazzy subscription/security service. Hmm, sounds like a whole new .com concept, doesn't it? But I really think that is where we are going, fairly rapidly now. I do think that consumer Linux, especially, will suffer tremendously (indeed already is) from the largely unnecessary and unjustifiable price inflation -- as if Linux is somehow more expensive to support (badly) than Windows. 2004 should have been the year of consumer linux, and still would be if Red Hat would get out there with a $35 box set of the absolute kitchen sink followed by perhaps $25/year full update support per household. Or even less -- they now risk the community providing the installation and update support for free so efficiently that they can't charge even this. You can get pretty rich selling $15 objects to people, if you sell them to a LOT of people. 
Just ask J. Rowling (and her publisher). But who would buy even the best of Harry Potter books for $150 a copy? Especially when they could get pretty much the same book for free, or in an inexpensive $5 paperback version. Sounds like Econ 141 time -- I vaguely remember some nifty concepts such as "supply and demand", and "elastic and inelastic markets". We'll see if RH, Mandrake, SuSE/Novell remember them too. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Nov 11 10:05:10 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 11 Nov 2003 09:05:10 -0600 Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: <3FB0FAA6.2030502@tamu.edu> One key element to look at is fabric speed. If the backplane can't keep up to wire-speeds, you're going to suffer some slowdown and latencies associated with the network. Whether that's a problem in your installation, or not, we can't tell from this range. However, if there's sufficient money I'd be buying the most capable switch I could from a backplane and port sustainability point as I could. gerry Keyan Mehravaran wrote: > Hi, > > I am planning to connect 8 dual Xeon PCs > with onboard gigabit through a switch and > I only need access to the "zeroth" node. > I have two questions: > > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > > 2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? 
> If the answer is yes, then should all the > 16 ports connect to the same switch? > > Please advise. > > Thank You, > > Kian Mehravaran > Research Assistant > 4110 Engineering > Michigan State University > East Lansing, MI 48823 > > __________________________________ > Do you Yahoo!? > Protect your identity with Yahoo! Mail AddressGuard > http://antispam.yahoo.com/whatsnewfree > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From keyanm at yahoo.com Tue Nov 11 09:24:45 2003 From: keyanm at yahoo.com (Keyan Mehravaran) Date: Tue, 11 Nov 2003 06:24:45 -0800 (PST) Subject: Gigabit Switch Message-ID: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Hi, I am planning to connect 8 dual Xeon PCs with onboard gigabit through a switch and I only need access to the "zeroth" node. I have two questions: 1) Is there any benefit to using "managed" switch rather than the "unmanaged" ones? 2) Is it possible to increase bandwidth by adding an extra gigabit NIC to each node? If the answer is yes, then should all the 16 ports connect to the same switch? Please advise. Thank You, Kian Mehravaran Research Assistant 4110 Engineering Michigan State University East Lansing, MI 48823 __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Tue Nov 11 11:17:50 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Tue, 11 Nov 2003 11:17:50 -0500 Subject: Gigabit Switch In-Reply-To: <3FB0FAA6.2030502@tamu.edu> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> <3FB0FAA6.2030502@tamu.edu> Message-ID: <20031111111750.D9711@www2> I will add that the ability of a second gigabit adapter to add to the total bandwith available to a node will depend a great deal on (a) the architecture of the node and (b) the adapter chosen. For example, two 32-bit gigabit adapters on the same PCI bus aren't going to do you much good, while two 64-bit adapters on separate PCI busses might. --Bob On Tue, Nov 11, 2003 at 09:05:10AM -0600, Gerry Creager N5JXS wrote: > > One key element to look at is fabric speed. If the backplane can't keep > up to wire-speeds, you're going to suffer some slowdown and latencies > associated with the network. Whether that's a problem in your > installation, or not, we can't tell from this range. However, if > there's sufficient money I'd be buying the most capable switch I could > from a backplane and port sustainability point as I could. > > gerry > > Keyan Mehravaran wrote: > >Hi, > > > >I am planning to connect 8 dual Xeon PCs > >with onboard gigabit through a switch and > >I only need access to the "zeroth" node. > >I have two questions: > > > >1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > > >2) Is it possible to increase bandwidth by > > adding an extra gigabit NIC to each node? > > If the answer is yes, then should all the > > 16 ports connect to the same switch? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Nov 11 12:32:32 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 11 Nov 2003 12:32:32 -0500 (EST) Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > I am planning to connect 8 dual Xeon PCs > with onboard gigabit through a switch and > I only need access to the "zeroth" node. > I have two questions: > > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? Frequently "managed" switches are a negative. An Ethernet switch should "just work". Providing configuration options just encourages setting the switch to flawed modes, such as forced-full-duplex or filtering packet types you thought you were not using. > 2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? Yes, you can marginally increase bandwidth. But it's not worth it. If you channel bond GbE, you'll likely get out-of-order packets on the receiving side and consume much more CPU to reassemble. If you trunk, you will not see higher peak bandwidth, and may still suffer from bad cache or interrupt affinity effects. You should use separate switches for channel bonding. Although it's possible to use VLAN to avoid this, that brings us back to the switch configuration issue. And two half size switches are less expensive than one. Bottom line: use a single GbE channel unless there is a specific application reason to do otherwise. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Rafael.Tinoco at sun.com Tue Nov 11 11:41:31 2003 From: Rafael.Tinoco at sun.com (Rafael David Tinoco) Date: Tue, 11 Nov 2003 14:41:31 -0200 Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: <3FB1113B.2090903@sun.com> ooops, sorry about one thing i forget.. to make bonding connections with 2 NICs in each node, you would have to have 2 VLANS in the switch, 1 nic for each node in vlan0 (for ex) and the other in vlan1. so bonding could work regards rafael david tinoco sun professional services - brazil rafael.tinoco at sun.com Keyan Mehravaran wrote: >Hi, > >I am planning to connect 8 dual Xeon PCs >with onboard gigabit through a switch and >I only need access to the "zeroth" node. >I have two questions: > >1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > >2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? > >Please advise. > >Thank You, > >Kian Mehravaran >Research Assistant >4110 Engineering >Michigan State University >East Lansing, MI 48823 > >__________________________________ >Do you Yahoo!? >Protect your identity with Yahoo! 
Mail AddressGuard >http://antispam.yahoo.com/whatsnewfree >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Nov 11 13:27:34 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 11 Nov 2003 13:27:34 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: > > 1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". indeed. I usually take an Occam's approach to features, too. but I was chatting with a big-gbe switch vendor last week, and thought of a couple of features which could be useful for HPC: 1. suppose you could attach a QOS/TOS tag to small packets, and have the switch give them preferential treatment. for instance, if there's a congested port with a backlog, let small packets "cut" the queue. 2. the vendor claims that multicast is reliable. I've often pondered whether multicast would be worth using in clusters, since it's going to be faster than even a tree-based multicast (as MPICH/LAM do, I think). 3. it would be neat to be able to query performance/load/queueing stats from the switch on a per-port basis. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Rafael.Tinoco at sun.com Tue Nov 11 11:39:51 2003 From: Rafael.Tinoco at sun.com (Rafael David Tinoco) Date: Tue, 11 Nov 2003 14:39:51 -0200 Subject: [Fwd: Re: Gigabit Switch] Message-ID: <3FB110D7.4030800@sun.com> hello keyan, We've made a 16 node cluster in a project and we've used 2 gigabit NIC with linux BONDING (to balance) for each node. There is no big changes at all in bandwidth because our cluster is not exchanging to much network information. To have 2 NIC in each node i think your application would have to exchange TOO MUCH information between the nodes, and when i say TOO MUCH .. i really mean it!! hehe Our cluster was: 16 hosts V60 SUN with dual XEON 2.8 - 1G RAM and 2 SCSI DRIVES about the managed switchs, i dont think there is any difference. but im not completly sure. regards rafael david tinoco sun professional services - brazil rafael.tinoco at sun.com Keyan Mehravaran wrote: >Hi, > >I am planning to connect 8 dual Xeon PCs >with onboard gigabit through a switch and >I only need access to the "zeroth" node. >I have two questions: > >1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? > >2) Is it possible to increase bandwidth by > adding an extra gigabit NIC to each node? > If the answer is yes, then should all the > 16 ports connect to the same switch? > >Please advise. > >Thank You, > >Kian Mehravaran >Research Assistant >4110 Engineering >Michigan State University >East Lansing, MI 48823 > >__________________________________ >Do you Yahoo!? >Protect your identity with Yahoo! 
Mail AddressGuard >http://antispam.yahoo.com/whatsnewfree >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ranjansm at psu.edu Tue Nov 11 13:41:10 2003 From: ranjansm at psu.edu (Ranjan S. Mehta) Date: Tue, 11 Nov 2003 13:41:10 -0500 Subject: Peculiar Problem :Any Help would be appreciated Message-ID: <3FB12D46.9010300@psu.edu> Hi all, I have a serial application, which allows me to hook into itself, using dynamically shared objects ( .so ). I wanted to use this .so file to do some parallel processing of the data, which I get from the application at run-time and then feed the results back. Now to add more complexity, I have to do it in Fortran !! ( my advisor needs it like that ). Has anyone done something like this? If yes, any help is appreciated. If NO, tell me why it cannot be done. Please comment, Thanks and regards Ranjan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Nov 11 15:27:48 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 11 Nov 2003 21:27:48 +0100 (CET) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > 1. suppose you could attach a QOS/TOS tag to small packets, and > have the switch give them preferential treatment. for instance, > if there's a congested port with a backlog, let small packets > "cut" the queue. That means that you actually have a backlog. 
Many switches, especially the cheap ones, have small buffers that can be filled fast, so further (small) packets might not make it to the switch at all. Furthermore, by letting packets go out-of-order, you make life harder for the receiver... > 2. the vendor claims that multicast is reliable. See Donald's answers to this very question in this thread: http://marc.theaimsgroup.com/?l=linux-net&m=106665132425192&w=2 > 3. it would be neat to be able to query performance/load/queueing stats > from the switch on a per-port basis. That is actually one of the 2 things that I use out of a managed switch... Although I think the information should be available as SNMP, I use it only when I try to find out if there is some networking problem, which is rarely enough, so I always used the switch CLI or web interface to do it. The second thing that I use from a managed switch is VLAN - not for splitting it in half for bonding, but for things like separating the control/login/NFS connection from the one used for parallel computation (in case of at least 2 NICs/node). -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Nov 11 16:47:50 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 11 Nov 2003 16:47:50 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: > > 1. suppose you could attach a QOS/TOS tag to small packets, and > > have the switch give them preferential treatment. for instance, > > if there's a congested port with a backlog, let small packets > > "cut" the queue. > > That means that you actually have a backlog. 
if you ever have two nodes sending to one node (eg gather), you will. > Many switches, especially the > cheap ones, have small buffers that can be filled fast, so further (small) right, but irrelevant. the topic is "is there any point to manageable or otherwise fancy switches?" > Furthermore, by letting packets go out-of-order, you make life harder for > the receiver... TCP is good at dealing with out-of-order. practically by definition! > > 2. the vendor claims that multicast is reliable. > > See Donald's answers to this very question in this thread: > > http://marc.theaimsgroup.com/?l=linux-net&m=106665132425192&w=2 I remember. the point is that the switch vendor claimed that full multicast was not lossy, contradicting Becker's claim. this vendor specializes in large, big-backplane chassis switches, so they might be right. it may be that Don was thinking of a cluster with multiple switches. I think latency is the real appeal of hw-supported multicast - if you want to do a barrier across 256 nodes, do you want a ~8-deep tree of user-level processes farming out your tinygrams (say, 8x50=400 us), or do you want a single 30 us multicast? > > 3. it would be neat to be able to query performance/load/queueing stats > > from the switch on a per-port basis. > > That is actually one of the 2 things that I use out of a managed switch... > Althought I think the information should be available as SNMP, I use it > only when I try to find out if there is some networking problem, which is > rarely enough, so I always used the switch CLI or web interface to do it. which misses the point. I'd like to be able to let a user find out that his node 12 is always bottlenecking the sim because of a problem with his domain decomp, for instance. summary stats are not useful, only per-port (preferably per-flow, but...) regards, mark hahn. 
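The barrier comparison in the message above (an ~8-deep tree at about 50 us per level versus a single 30 us multicast) can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes a binary fan-out tree, takes the 50 us and 30 us figures straight from the message, and ignores any acknowledgement traffic a reliable barrier would need. The function names are hypothetical, chosen for illustration.

```python
def tree_depth(nodes, fanout=2):
    """Number of levels needed for a fan-out tree to reach `nodes` nodes."""
    depth, reach = 0, 1
    while reach < nodes:
        reach *= fanout
        depth += 1
    return depth


def tree_barrier_latency_us(nodes, per_level_us, fanout=2):
    """Latency estimate: one message hop per tree level."""
    return tree_depth(nodes, fanout) * per_level_us


# Figures quoted in the message: 256 nodes, ~50 us per level, 30 us multicast.
tree_us = tree_barrier_latency_us(256, 50)  # 8 levels * 50 us = 400 us
multicast_us = 30
print(tree_us, multicast_us)
```

A wider fan-out shortens the tree (a fan-out of 4 needs only 4 levels for 256 nodes), so the gap depends heavily on how the software tree is built.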
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zarquon at zarq.dhs.org Tue Nov 11 19:40:41 2003 From: zarquon at zarq.dhs.org (zarquon at zarq.dhs.org) Date: Tue, 11 Nov 2003 19:40:41 -0500 Subject: Gigabit Switch In-Reply-To: <3FB1113B.2090903@sun.com> References: <20031111142445.44251.qmail@web12101.mail.yahoo.com> <3FB1113B.2090903@sun.com> Message-ID: <20031112004041.GB29819@earendel.org> On Tue, Nov 11, 2003 at 02:41:31PM -0200, Rafael David Tinoco wrote: > ooops, sorry about one thing i forget.. > > to make bonding connections with 2 NICs in each node, you would have > to have 2 VLANS in the switch, 1 nic for each node in vlan0 (for ex) and > the other > in vlan1. Depends on the switch. Some switches have a single mac address / port table, even if they have VLAN support. We had a big managed HP switch that behaved that way. R C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 11 19:07:16 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 11 Nov 2003 19:07:16 -0500 (EST) Subject: Peculiar Problem :Any Help would be appreciated In-Reply-To: <3FB12D46.9010300@psu.edu> Message-ID: On Tue, 11 Nov 2003, Ranjan S. Mehta wrote: > Hi all, > > I have a serial application, which allws me to hook into itself, using > dynamically shared objects ( .so ). > > I wanted to use this .so file to do some parallel processing of the > data, which I get from the application at run-time and then feed the > results back. > > Now to add more complexity, I have to do it in Fortran !! ( my advisor > needs it like that ). > > Has anyone done something like this. If yes, any help is appreciated. > > If NO, tell me why it cannot be done. 
> > Please comment, I think you'll have to do a better job of describing your program's expected flow, as I at least am very confused. You are making a library? With some sort of recursive call? You have to use Fortran (ooo, quel drag, mon!)? What has this to do with clusters? I can think of lots of ways to implement lots of things "like" what you may be trying to describe (but not in Fortran, which I took a vow never to code in again -- unless of course somebody offers me obscene quantities of money to do so:-). Most of them don't need shared libraries per se, which is one of the things I don't understand. The other is why beowulf list -- are you trying to invoke this recursion across a cluster or something? So that the "shared libraries" live on different systems? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Nov 11 19:37:38 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 11 Nov 2003 16:37:38 -0800 Subject: Gigabit Switch In-Reply-To: References: Message-ID: <20031112003738.GA4558@greglaptop.internal.keyresearch.com> On Tue, Nov 11, 2003 at 04:47:50PM -0500, Mark Hahn wrote: > TCP is good at dealing with out-of-order. practically by by definition! Oh? Please share some test results. The reality is that out-of-order packets are a moderate load on the CPU at best, and Linux isn't exactly great at handling them, especially with multiple cpus and multiple interfaces. 
> I think latency is the real appeal of hw-supported multicast - if you > want to do a barrier across 256 nodes, do you want a ~8-deep tree of > user-level processes farming out your tinygrams (say, 8x50=400 us), > or do you want a single 30 us multicast? A reliable barrier built using unreliable multicast isn't 30 usec. And MPI programs rarely have barriers. Perhaps you know of an MPI application that really needs a barrier operation? I don't think I've run into one yet. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Nov 11 22:33:44 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 12 Nov 2003 11:33:44 +0800 (CST) Subject: Gridengine 6.0 (new features) Message-ID: <20031112033344.1533.qmail@web16809.mail.tpe.yahoo.com> It's actually old news, but no one mentioned it on this list: http://gridengine.sunsource.net/workshop22-24.09.03/proceedings.html They have presented a lot of new SGE 6.0 features, and i think the most famous one is the cluster queues, and the most interesting one is the SGE P2P client (just like UD or SETI at home). Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel.leiva at uam.es Wed Nov 12 07:58:07 2003 From: angel.leiva at uam.es (Rafael Angel Garcia Leiva) Date: Wed, 12 Nov 2003 12:58:07 +0000 Subject: Q: ATM Beowulf Message-ID: <200311121258.07155.angel.leiva@uam.es> Hi everybody, I am planning to build a cluster (around 300 nodes) for Monte-Carlo simulation. 
I will run the same program, but with different input data files, on each node. I expect that the computation time is much greater than communication time, and that I will have to transfer large amount of (input and output) data files from working nodes to the master server. Does it make sense to use LAN emulation over ATM for this kind of clusters? Has anyone experimented with ATM interconnections? Do you think it is cost-effective today (especially compared to Fast / Gigabit Ethernet)? Thanks in advance. -- Rafael Angel Garcia Leiva Universidad Autonoma Madrid http.//www.uam.es/angel.leiva _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Nov 12 07:05:04 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 12 Nov 2003 13:05:04 +0100 (CET) Subject: Gigabit Switch In-Reply-To: <20031111142445.44251.qmail@web12101.mail.yahoo.com> Message-ID: On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > 1) Is there any benefit to using "managed" > switch rather than the "unmanaged" ones? For a cluster scenario I would go for the unmanaged switch, since management features might reduce the performance of the switch and are not really needed. In a cluster, all that you want your switch to do is layer-2 switching. It is important that you know your switch's real performance. Sometimes, technical data sheets might help to find out about the capabilities of your switch, but sometimes data sheets are "inaccurate" (not to say "written based on wishful dreaming"). We described such a switch and how we found out about its capabilities in our paper "Cost/Performance Tradeoffs in Network Interconnects for Clusters of Commodity PCs" [1]. So, before you order a cluster, specify exactly what your switch must be able to do. 
If it doesn't fulfil your specification, you might get a free upgrade ;-) - Felix [1] http://www.inf.ethz.ch/~rauch/#cac03 -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Nov 12 07:54:07 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 12 Nov 2003 13:54:07 +0100 (CET) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > right, but irrelevant. the topic is "is there any point to managable > or otherwise fancy switches?" Sorry, but I don't see the "irrelevant" part. You mentioned QOS/TOS and I replied that I don't see it as an advantage... Anyway, another point is that unmanaged switches with large number of ports are not so common. Maybe your switch vendor can explain why ? > TCP is good at dealing with out-of-order. practically by by definition! Sure. But at what cost ? Do you want to do some computation too on that node ? :-) If you talk about multicast, you eliminate TCP from the discussion. Then how do you synchronize between data transmitted over TCP and some zero-payload (barrier) sent through other protocol ? The stack guarantees in-order delivery of data for the same socket, not for different sockets or even more for different protocols. > the point is that the switch vendor claimed that full multicast was not > lossy, contradicting Becker's claim. Not only the switch has to be non-lossy but the stack as well. Packets can be dropped for example at network driver level (let's say Rx overrun) or simply transmission errors, so the forwarding logic in the switch is not involved. 
> this vendor specializes in large, big-backplane chassis switches, so > they might be right. I never thought of this but I can probably build a switch that never drops a correctly received packet by putting insane amount of buffers on it. But at what price (money as well as performance) ? And what do you do with transmission errors ? > or do you want a single 30 us multicast? Pardon me, but "reliable" and "single" do not match in my view. Reliable to me means that receiver acknowledges, which can be done by unicast or multicast again. And the problem (and latency) multiplies... > I'd like to be able to let a user find out ... You're too kind to your users :-) I've never been given information related to communication problems on any parallel computer I tried to make CHARMM run on for my group. Sure, I often got CPU performance counters, but never communication parameters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Tue Nov 11 15:13:08 2003 From: nixon at nsc.liu.se (nixon at nsc.liu.se) Date: Tue, 11 Nov 2003 21:13:08 +0100 Subject: Gigabit Switch In-Reply-To: (Donald Becker's message of "Tue, 11 Nov 2003 12:32:32 -0500 (EST)") References: Message-ID: Donald Becker writes: > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". > Providing configuration options just encourages setting the switch to > flawed modes, such as forced-full-duplex or filtering packet types you > thought you were not using. On the other hand, in the real world autonegotiation doesn't always work. 
And when you get in that spot, it's *very* nice to be able to lock down a port's mode. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From rgb at phy.duke.edu Wed Nov 12 09:10:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 12 Nov 2003 09:10:35 -0500 (EST) Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> Message-ID: On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote: > > Hi everybody, > > I am planning to build a cluster (around 300 nodes) for Monte-Carlo > simulation. I will run the same program, but with different input data files, > on each node. I expect that the computation time is much greater than > communication time, and that I will have to transfer large amount of (input > and output) data files from working nodes to the master server. > > Does make sense to use LAN emulation over ATM for this kind of clusters? Has > anyone experimented with ATM interconnections? Do you think is it > cost-effective today (specially compared to Fast / Gigabit Ethernet)? From what you describe, it perhaps depends on what "large amounts of input and output files" works out to in more detail, but the answer is almost certainly not. The problem is embarrassingly parallel (completely independent programs) which makes it relatively easy to figure out how performance is likely to depend on the actual sizes (transfer times) of the data files relative to the run time. What you probably need to do is set up (or borrow from a friendly vendor -- most serious cluster vendors have a test cluster and will cheerily loan you an account) a few test nodes with gigabit interconnects.
Measure the time it takes to actually run your program alone, then the time it takes to run your program WHILE copying its "next" input data set in and its "last" data set out (without any sort of e.g. ssh encryption -- use as raw as possible a data transfer tool). Depending on how effective your NIC is at doing DMA transfers and how I/O bound the MC code is, copying large files while your job is running may not count as a "serial penalty" against your CPU/memory bound computation. It will also give you a pretty accurate idea of what the actual transfer times are on Gbps ethernet relative to run times. This in turn will give you some clue as to required server capacity and whether or how to distribute/gather the files from a single server or multiple servers (whether or not this will help will of course depend on what use you make of the files when you get them). Part of the problem with your question is that as you frame it nobody can answer it -- yet. It requires detailed data. If by "large" you mean a 10 MB input file (which is yeah, pretty large) and a 100 MB output file (ditto), well, that is roughly 1-2 seconds on a 100BT connection for input transfer, 10-15 seconds for output transfer. If the program runs for 24 hours per input and output transfer, well, you could run on 300 nodes with 100BT and never warm up the lines. If by "large" you mean 10x (100 MB in, 1 GB out), but still 24 hours computation it would STILL run pretty perfectly on 100BT. If by "large" you mean ANOTHER 10x (1 GB in, 10 GB out), you're finally up to a significant fraction of an hour for the data transfer at 100BT relative to a daylong run. However, at 1000BT you are still on the order of minutes of I/O total (maybe 90 Gbits to transfer on a 1 Gbps line at perhaps 50% efficiency -- three minutes or so?) and keeping all 300 nodes fed takes only 900 minutes (fifteen hours), which is less than the 1440 minutes of a day.
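As a rough check on the arithmetic above, here is a minimal Python sketch. The function names and the flat 50% link efficiency are assumptions made for illustration, not measurements, and the model ignores disk speed and protocol start-up costs.

```python
def transfer_seconds(size_bytes, link_bits_per_s, efficiency=0.5):
    """Seconds needed to move size_bytes over a link running at the given efficiency."""
    return size_bytes * 8 / (link_bits_per_s * efficiency)


def server_busy_fraction(nodes, io_seconds_per_job, compute_seconds_per_job):
    """Fraction of one compute period a single server spends feeding all nodes in turn."""
    return nodes * io_seconds_per_job / compute_seconds_per_job


GB = 1e9  # decimal gigabyte, in bytes

# Largest case discussed: 1 GB in plus 10 GB out per job, GbE at an assumed 50% efficiency.
io = transfer_seconds(11 * GB, 1e9)

# 300 nodes served round-robin by one server, each job computing for 24 hours.
busy = server_busy_fraction(300, io, 24 * 3600)

print(f"I/O per job: {io / 60:.1f} min, server busy: {busy:.0%}")
# prints: I/O per job: 2.9 min, server busy: 61%
```

With unrounded figures the server is busy for about 880 minutes rather than 900, so the conclusion does not change: under these assumptions the serial I/O for 300 nodes fits inside a 1440-minute day.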
So a single server with Gbps ethernet could distribute and collect results from 15+ hour long computations with 3 minutes of pure serial I/O per computation on 300 nodes and (barely) not block. If your NIC and disk channel manage DMA and can run modestly in parallel with your computation, it simply improves things. Obviously the important thing is the RATIO of computation to additional per node serial communication (assuming optimal round-robin task organization); if this ratio remains at roughly 300:1 or better, a single server with the cheapest network that can sustain the ratio should suffice. If the ratio is less than this, you have to start to think. For example, would it be better (or even possible) to a) channel bond to increase bandwidth and decrease serial I/O time; b) use more than one server (servers are cheap relative to high end networks, and fortunately your task can use stacked relatively cheap switches as you'll only use a single channel at a time to the nodes in round robin -- IIRC gigabit ethernet gets more expensive than alternative high end networks if you insist on putting 300 nodes on a single switching fabric); c) use a high end network. I don't usually think of ATM in this last category -- I'd think of Myrinet or SCI, probably the latter because it is switchless and perhaps a bit cheaper per node while still adequate, and you're VERY LIKELY to be able to find a task organization where it is adequate. Also, I'm pretty sure both Myrinet and SCI use DMA very effectively and will largely parallelize the actual data transfer with your computation. Ultimately, when you work out the actual numbers for the different networks (at least approximately) you have to do a cost benefit analysis and just pick the cheapest alternative that will scale to 300 nodes. Fortunately, with an EP computation it is pretty easy to actually do this very systematically and be pretty confident that you have a near-optimal design. rgb > > Thanks in advance. > > -- Robert G.
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From john.hearns at clustervision.com Wed Nov 12 08:56:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 12 Nov 2003 14:56:20 +0100 (CET) Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> Message-ID: On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote: > > Does make sense to use LAN emulation over ATM for this kind of clusters? Has > anyone experimented with ATM interconnections? Do you think is it > cost-effective today (specially compared to Fast / Gigabit Ethernet)? The list will know that a few years ago I was very enthusiastic about ATM. I put in a leading edge ATM network at a hospital in the UK for medical imaging. I was a proponent of using ATM for clustering. These days, I would say the idea is not so good. You get built-in Gigabit network interfaces on many motherboards, and Gigabit switches are really cheap. From ZukaitAJ at nv.doe.gov Wed Nov 12 09:43:21 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Wed, 12 Nov 2003 06:43:21 -0800 Subject: mpirun + Scyld MPI Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E2B@lao-exchpo1-nt.nv.doe.gov> I am currently using the MPI distributed with Scyld, which I believe is MPICH. I have 6 dual CPU nodes for a total of 12 CPUs. Whenever I try to use 12 processors it puts 3 processes on one of the nodes and only one process on the master node.
I have tried using a machinefile like master:2 .0:2 .1:2 .2:2 .3:2 .4:2 and -map and it doesnt seem to help. Any hints? -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Friday, November 07, 2003 10:04 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1533 - 13 msgs Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. Re:Scyld and MPICH. (William Gropp) 2. Re:Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux (Glen Kaukola) 3. Tyan 2880 and 2885 (Mike Sullivan) 4. Article: Sony Cell CPU to deliver two teraflops in 64-core config (Tod Hagan) 5. Re:Cluster Poll Results (tangent into OS choices) (=?iso-8859-1?Q?=C5smund_=D8deg=E5rd?=) 6. Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Rayson Ho) 7. INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL (Joey Sims) 8. Re:Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Craig Rodrigues) 9. Re:Article: Sony Cell CPU to deliver two teraflops in 64-core config (John Hearns) 10. OctigaBay 12K (Franz Marini) 11. Re:OctigaBay 12K (Robert G. Brown) 12. Re:Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) (Jan Schaumann) --__--__-- Message: 1 Date: Thu, 06 Nov 2003 11:48:24 -0600 To: "Zukaitis, Anthony" From: William Gropp Subject: Re: Scyld and MPICH. 
Cc: "'beowulf at scyld.com'" , mpi-maint at mcs.anl.gov At 10:55 AM 11/6/2003, Zukaitis, Anthony wrote: >I am having a problem with MPI_reduce and I believe that it is a buffer size >error. Is there a way to calculate the maximum size of the buffer and what >is the maximum size of the buffer allowed? It does not seem to be linear >with the number of processors. There should be no maximum buffer size, though the ch_p4 device does impose a limit when shared memory is used to transfer a message. Do you have an example program that we could test (Bug reports for MPICH should be sent to mpi-maint at mcs.anl.gov) Bill --__--__-- Message: 2 Date: Thu, 06 Nov 2003 10:37:59 -0800 From: Glen Kaukola To: Konstantin Kudin CC: beowulf at beowulf.org Subject: Re: Tyan S2880 (K8S) /S2885 (K8W) Opteron boards under Linux Konstantin Kudin wrote: > Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? > > We have a few of the s2880's. They were real problematic at first in that they'd constantly crash. But it turned out that when I downgraded the bios, all of our problems went away. Of course I also needed to install the latest 2.4.22 kernel before the machines would boot with the older bios installed. I'm not sure what to tell you about the serial ata support, as I've never played with it. Linux seems to support the nic just fine though. Hope that helps, Glen --__--__-- Message: 3 Date: Thu, 06 Nov 2003 13:39:51 -0500 From: Mike Sullivan Reply-To: mike.sullivan at alltec.com To: beowulf at beowulf.org Subject: Tyan 2880 and 2885 >Could anyone please share experiences with these >boards under linux? Is it still a risky proposition at >this time? I have used the 2880 under RedHat AS 2.1 and gingin64 and it works fine execpt for the SATA controller. I did not get the promise chip to work but did not spend a lot of time on it. The GigE interface works. 
The board was stable and I have been using them in NAS devices with 3ware cards. The SMDC option for these units works fairly well with the most recent console and you can get sensor data. > It seems like there are drivers for AMD-8111/8131/8151 >chipset on the AMD page, drivers for the Broadcom >network chip in other places. Any feedback on SATA >support for the Silicon Image Sil3114 SATA RAID >Accelerator and on SATA support in general? Any other >caveats? I also have both a 2882 and 2885 that I will be testing early next week with Suse Linux 9 for AMD64 and will post my findings. Thanks in advance for any help! Konstantin -- Mike Sullivan Director Performance Computing @lliance Technologies, Voice: (416) 385-3255 x 228, 18 Wynford Dr, Suite 407 Fax: (416) 385-1774 Toronto, ON, Canada, M3C-3S2 Toll Free:1-877-216-3199 http://www.alltec.com --__--__-- Message: 4 Subject: Article: Sony Cell CPU to deliver two teraflops in 64-core config From: Tod Hagan To: Beowulf List Date: 06 Nov 2003 15:02:25 -0500 http://www.theregister.co.uk/content/3/33791.html It also mentions the ClearSpeed chip that was discussed here recently. --__--__-- Message: 5 Date: Thu, 06 Nov 2003 23:52:28 +0100 To: beowulf at beowulf.org Subject: Re: Cluster Poll Results (tangent into OS choices) Reply-To: aasmund at simula.no From: Åsmund Ødegård Organization: Simula Research Laboratory AS On Wed, 5 Nov 2003 00:05:13 +0000, Andrew M.A. Cater wrote: > > On Tue, Nov 04, 2003 at 05:50:57PM -0500, Joe Landman wrote: >> >> There are interesting bits in debian. I am not sure it is necessarily >> the right choice for clusters due to the specific lack of commercial >> support for cluster specific items such as Myrinet, and the other high >> speed interconnects. > > Dan - if I build a _really big_ cluster, will you get Quadrics to do > Debian :) > Same goes for any other vendor - if you ask them nicely and make it > worth their while, they'll do it.
In many cases, it's only a recompile > of a device driver to account for library differences, after all. > > HP use Debian internally, IIRC. Some of the Debian developers are also > HP folk - HP are potentially looking to support more of their products > under Linux? [See, for example, Debian Weekly News for today :) ] Actually, we have quite recently installed an Itanium2 based cluster, using debian, because we want debian. We got HP to do it for us, using the (former Compaq) CMU tool. They did some porting to support debian in this tool... So, ask nicely (and put it as a requirement to let them get the deal), and you can get whatever you want ;-) >> Commercial compiler support for Debian (e.g. >> Intel, Absoft, et al) is largely non-existent as far as I know (please >> do correct me if I am wrong). No problem with Intel compilers on Debian (alien does the trick). -- [simula.research laboratory] Åsmund Ødegård Scientific Programmer / Chief Sys.Adm phone: 67828291 / 90069915 http://www.simula.no/~aasmundo --__--__-- Message: 6 Date: Thu, 6 Nov 2003 16:59:51 -0800 (PST) From: Rayson Ho Subject: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) To: bioclusters at bioinformatics.org, beowulf , Linux Cluster , List A very good paper about building HPC clusters with FreeBSD: "Building a High-performance Computing Cluster Using FreeBSD" http://people.freebsd.org/~brooks/papers/bsdcon2003/ The author talked about hardware issues: KVM, BIOS redirection, CPU choices; and then talked about why he chose FreeBSD instead of Linux... he also did the port of GridEngine (SGE) to FreeBSD. Anyone tried to setup HPC clusters with *BSD?? Rayson --- Fernan Aguero wrote: > Any FreeBSD users willing to share clustering experiences > out there? > > Fernan __________________________________ Do you Yahoo!? Protect your identity with Yahoo!
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree --__--__-- Message: 7 Subject: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL Date: Thu, 6 Nov 2003 22:07:53 -0500 From: "Joey Sims" To: Maybe someone could lend a hand and help Intel find out what their unknown material is. Be careful! Don't spill it in your lap for goodness sake.... Dohh! :-O I found this amusing: INTEL DISCOVERS NEW CHIP-SHRINKING MATERIAL 11.07.03 by Jennifer Tabor HPCwire ======================================================================== ====== Chip makers are searching for ways to create smaller and smaller computer chips, and researchers at Intel believe they have discovered a new material that would help them to do just that. Intel's announcement will garner much attention in an industry where the demand for products that push fundamental physical limits is ever increasing. A problem afflicting many chip makers today is the prevention of electrical currents from leaking outside their proper patches. Because the transistor gates are now becoming as small as just five atomic layers, chips need more power. In turn, they also need a more efficient cooling system. Intel has been having difficulties with the cooling of its chips -- the smaller they get (with etchings as small as 90-130 nanometers), the hotter they become. Recent reports say that the problem has even caused a delay in the Prescott, Intel's most advanced version of the Pentium. Though the new technology would not debut until approximately 2007, Intel is planning to scale down their current 90 nanometer chip size over the years to 65, followed by 45. It is at this point that Intel's new material, which is still unknown, would be introduced. Intel's discovery comes at the height of an intense industry wide search for a new material to replace silicon dioxide, which is used as insulator between the gate and the channel through which current flows in an active transistor. 
Intel researchers have been working on solving the chip predicament for five years in efforts to keep pace with Moore's Law. Gordon E. Moore, co-founder of Intel, believed that the number of transistors in the same space should double every 18 months. Intel believes they can continue to make short strides, despite the thoughts of many who doubt their ability to keep up such a pace. Though many researchers and competitors agree that Intel's announcement revolves around the most important research area in the chip industry, some feel that the lack of specific technical detail will deter scientists from assessing their claims. ================================================== Joey P. Sims 800.995.4274 - 242 Sales Manager 770.442.5896 - Fax HPC/Storage Division www.csilabs.net Concentric Systems, Inc. jsims at csiopen.com ====================================ISO9001:2000== --__--__-- Message: 8 Date: Thu, 6 Nov 2003 23:04:15 -0500 From: Craig Rodrigues To: Rayson Ho Cc: bioclusters at bioinformatics.org, beowulf , Linux Cluster , List Subject: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) On Thu, Nov 06, 2003 at 04:59:51PM -0800, Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? Hi, Not quite the same as an HPC cluster, but take a look at the University of Utah's Emulab: http://www.emulab.net It is heavily based on FreeBSD (i.e. makes use of FreeBSD routing, Dummynet, etc.). The Emulab is a remotely accessible testbed that researchers can use to conduct network experiments. It consists of about 200 PC nodes. 
The same company that Brooks works for (Aerospace) has apparently set up an internal testbed based on the Emulab software developed at Utah. I use the Emulab every day as part of my research work at BBN, and it is an excellent facility. -- Craig Rodrigues http://crodrigues.org rodrigc at crodrigues.org --__--__-- Message: 9 Subject: Re: Article: Sony Cell CPU to deliver two teraflops in 64-core config From: John Hearns To: beowulf at beowulf.org Organization: Clustervision Date: Fri, 07 Nov 2003 10:13:40 +0100 And also on The Reg: http://www.theregister.co.uk/content/3/33813.html The Reg reckons Opteron 250s by early next year. --__--__-- Message: 10 Date: Fri, 7 Nov 2003 13:56:28 +0100 (CET) From: Franz Marini To: beowulf at beowulf.org Subject: OctigaBay 12K Hello, just discovered this interesting, imho, company and its first product : http://www.octigabay.com/ Their first product is a linux opteron-based cluster that they said could scale up to 12K processors. The base system is a 3.5U shelf with 12 opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor latency and 77GB/s aggregate mem bandwidth. Seems nice, I would like to know what rgb and some of the other people in here think about it :) Have a nice day, Franz --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it --------------------------------------------------------- --__--__-- Message: 11 Date: Fri, 7 Nov 2003 08:44:11 -0500 (EST) From: "Robert G. Brown" To: Franz Marini Cc: beowulf at beowulf.org Subject: Re: OctigaBay 12K On Fri, 7 Nov 2003, Franz Marini wrote: > Hello, > > just discovered this interesting, imho, company and its first product : > > http://www.octigabay.com/ > > Their first product is a linux opteron-based cluster that they said > could scale up to 12K processors.
The base system is a 3.5U shelf with 12 > opterons, 1Tb/s aggregate switching capacity, 1 microsec interprocessor > latency and 77GB/s aggregate mem bandwidth. > > Seems nice, I would like to know what rgb and some of the other people > in here think about it :) Why, it looks simply lovely, as hardware I've never actually tried goes. I mean, if the octigabay people want to send me one for free just so I can write a review for it on this list and the brahma website, well, from the look of it I wouldn't kick it out of my machine room for chewing crackers... and I >>can<< be bought, folks, yes I can, just look at the brahma vendors page and my brazen demand for t-shirts in exchange for space:-) I'll even dig up something fine grained to run on it so that I can pretend to really test it. The bottom line is, well, the bottom line. Pretty isn't enough. Performance (even performance that is absolutely everything promised) isn't enough. It is PRICE performance that matters, or better yet cost-benefit. How does the cost compare to the benefits the design delivers in your environment. For my own personal code, for example, I don't NEED their fancy interconnect, and I can rack up a bunch of opterons for the cost of the basic hardware and a nice case to put them in. They'd therefore have to literally give it to me to make it a cost-benefit win (especially true since I just spent the last of my money in this grant cycle buying hey, whaddya know, a stack of 9 dual Opteron 242's for a hair over $20K). However, there are people out there who run fine grained synchronous parallel code that is bottlenecked at the network IPC level. Even THERE the computations have some intrinsic "value" in that there are finite amounts of money people are willing to pay to get them done, and there are choices. 
So ultimately it will come down to whether there is a match between the value of the computation (amount people are willing to pay to get it done), the needs of the computation, and the marketplace. It's one of these people that you need to ask about whether or not this is a good deal or good arrangement. My knee jerk reaction is that it is lovely but a bit too far into the big iron side (SP3-ish) to be likely to win a hard-nosed CB comparison relative to a DIY cluster with e.g. myrinet or SCI for MANY clustervolken (the market gets smaller and smaller the further up one travels to super-high-speed networks), but corporate consumers and the larger government consumers shy away from DIY, and even in the intermediate market it comes down to price/performance, eh? If they price it competitively with the other high speed networks and it has clear benefits (as it looks like it might) well then, who knows? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu --__--__-- Message: 12 Date: Fri, 7 Nov 2003 10:00:56 -0500 From: Jan Schaumann To: beowulf at beowulf.org Subject: Re: Linux vs FreeBSD clusters (was: how are the Redhat product changes affecting existing and future plans?) [Resending; this message was originally sent last night across the various mailing lists, but beowulf at beowulf.org chokes on the gpg signature. :-/ ] Rayson Ho wrote: > A very good paper about building HPC clusters with FreeBSD: > > "Building a High-performance Computing Cluster Using FreeBSD" > > http://people.freebsd.org/~brooks/papers/bsdcon2003/ > > The author talked about hardware issues: KVM, BIOS redirection, CPU > choices; and then talked about why he chose FreeBSD instead of Linux... > he also did the port of GridEngine (SGE) to FreeBSD. > > Anyone tried to setup HPC clusters with *BSD?? 
I have a 30 node NetBSD/i386 cluster, and just recently created the tech-cluster at netbsd.org mailing list. Some people are working on a port of SGE to NetBSD, too. I hope to expand the awareness of NetBSD in particular for cluster usage in the near future. Some URLs of relevance: http://guinness.cs.stevens-tech.edu/~jschauma/hpcf/ http://www.netbsd.org/MailingLists/#tech-cluster http://www.netbsd.org/ http://eurobsdcon.org/papers/#souvatzis http://bsd.slashdot.org/article.pl?sid=03/10/20/1523252&mode=thread&tid=122&tid=185&tid=190 http://bsd.slashdot.org/bsd/03/11/05/1536226.shtml?tid=122&tid=185&tid=190 -Jan -- "Life," said Marvin, "don't talk to me about life." --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest From becker at scyld.com Wed Nov 12 10:58:38 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 10:58:38 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003, Mark Hahn wrote: > indeed. I usually take an Occam's approach to features, too. > but I was chatting with a big-gbe switch vendor last week, > and thought of a couple of features which could be useful for HPC: > > 1. suppose you could attach a QOS/TOS tag to small packets, and > have the switch give them preferential treatment. for instance, > if there's a congested port with a backlog, let small packets > "cut" the queue. QOS/TOS tags already exist, but consider very carefully before you wish for a LAN switch that observes them. Using QOS/TOS is a good idea on multitraffic, multipath WANs, where bulk transfer would otherwise block telnet-like traffic.
But ACKs bypassing data packets on a LAN will likely lead to congestion, with higher overall latency and dropped packets. This is compounded by now-common flow control, which only works within the LAN. > 2. the vendor claims that multicast is reliable. "Our equipment is perfect, and is not limited by fundamental principles". > I've often pondered whether multicast would be worth using in > clusters, since it's going to be faster than even a tree-based > multicast (as MPICH/LAM do, I think). You can construct a set of hardware+protocol+tuning that will work for a demo. But we now use Ethernet switches, not repeaters. Consider that modern switched Ethernet has many of the same constraints as Myrinet, SCI, and IB, where multicast must be emulated. Again, multicast is good for service discovery and low-rate communication. Just don't rely on it for bulk data delivery. > 3. it would be neat to be able to query performance/load/queueing stats > from the switch on a per-port basis. You can get some of the info from the connected machines. But you likely want the time-averaged FIFO length and high-water-mark in packets and bytes, right? -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From becker at scyld.com Wed Nov 12 12:23:52 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 12:23:52 -0500 (EST) Subject: Q: ATM Beowulf In-Reply-To: Message-ID: On Wed, 12 Nov 2003, Robert G. Brown wrote: [[ Long, informative text deleted. ]] > Part of the problem with your question is that as you frame it nobody > can answer it -- yet. It requires detailed data.
If by "large" you > mean a 10 MB input file (which is yeah, pretty large) and a 100 MB > output file (ditto), well, that is roughly 1-2 seconds on a 100BT Recalibrate your idea of "large". We recently encountered an application that was using PVFS with 50MB files over GbE. With the PVFS filesystem spread over 16 servers, that's only 3MB per machine, or 30-40 msec. of data transfer to permute the file contents. The 50MB files were too small to ignore the start-up overhead and get an accurate performance baseline! -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From becker at scyld.com Wed Nov 12 11:46:10 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 12 Nov 2003 11:46:10 -0500 (EST) Subject: Gigabit Switch In-Reply-To: Message-ID: On Tue, 11 Nov 2003 nixon at nsc.liu.se wrote: > Donald Becker writes: > > > Frequently "managed" switches are a negative. > > An Ethernet switch should "just work". > > Providing configuration options just encourages setting the switch to > > flawed modes, such as forced-full-duplex or filtering packet types you > > thought you were not using. > > On the other hand, in the real world autonegotiation doesn't always work. > And when you get in that spot, it's *very* nice to be able to lock > down a port's mode. The only autonegotiation problems I'm aware of are firmware bugs in early Cisco and 3Com switches. The switches would autonegotiate, but sometimes would not notice the parameter changes. To draw an automotive analogy "sometimes starter motors fail, thus all cars should have a hand crank in the front". The proper solution is to replace with working equipment.
A fall back is to disable autonegotiation and use 10/100 speed sensing half duplex. A flawed approach (unfortunately the one recommended by Cisco) was to force speed and full-duplex. A great thing about autonegotiation is that it is automatic, transparent and extensible. Most installations are now using Ethernet flow control. Because it is configured using autonegotiation, almost no one knows that they have it. Things just work better. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at fing.edu.uy Wed Nov 12 06:16:48 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Wed, 12 Nov 2003 14:16:48 +0300 Subject: Q: ATM Beowulf In-Reply-To: <200311121258.07155.angel.leiva@uam.es> References: <200311121258.07155.angel.leiva@uam.es> Message-ID: <3FB216A0.6070100@fing.edu.uy> Rafael Angel Garcia Leiva wrote: >Hi everybody, > >I am planning to build a cluster (around 300 nodes) for Monte-Carlo >simulation. I will run the same program, but with different input data files, >on each node. I expect that the computation time is much greater than >communication time, and that I will have to transfer large amount of (input >and output) data files from working nodes to the master server. > > I might be quite optimistic, but it looks like you do not need an expensive network at all. All you need is a good file-server and that's it. if you are not able to feed the nodes with a single fileserver, maybe you can rsync 6 of them an each of them feed serve 50 nodes. You can have your own comodity-gigabit-SAN between servers and attach 2x24 port switches to them. 
Let me emphasize this: I might be getting a wrong picture of your problem, but it seems that it accepts jitter and delay. From your description I assume that there is no need for expensive synchronization. I think you will save lots of money and even get a better solution.

Ariel

>Does make sense to use LAN emulation over ATM for this kind of clusters? Has
>anyone experimented with ATM interconnections? Do you think is it
>cost-effective today (specially compared to Fast / Gigabit Ethernet)?
>
>Thanks in advance.
>
>

From gerry.creager at tamu.edu Wed Nov 12 09:38:10 2003
From: gerry.creager at tamu.edu (Gerry Creager (N5JXS))
Date: Wed, 12 Nov 2003 08:38:10 -0600
Subject: Q: ATM Beowulf
In-Reply-To:
References:
Message-ID: <3FB245D2.7060300@tamu.edu>

To expand on that a little bit, the overhead associated with LANE is also going to be a problem for you, even if communications will be a small segment of your cluster activity. If your application were capable of talking directly at the ATM layer, and did not have to go through the framing and conversion issues with either LANE or IPOA, then you could see some advantages. However, the issue of sending 1500 byte... or 9kB... packets over ethernet, taking advantage of the somewhat faster encoding we see for ethernet today, far outstrips the potential benefits of ATM in a cluster environment. And making up 1500 byte packets, then resending them as 53 byte cells, with 10% overhead, just doesn't make sense anymore save in a connection-oriented network like WAN or Carrier. And, fwiw, the carriers are also dropping ATM for GBE, 10GBE and carrying same over multiplexed lambdas in the glass now.

Gerry

John Hearns wrote:
> On Wed, 12 Nov 2003, Rafael Angel Garcia Leiva wrote:
>
>
>>Does make sense to use LAN emulation over ATM for this kind of clusters?
Has >>anyone experimented with ATM interconnections? Do you think is it >>cost-effective today (specially compared to Fast / Gigabit Ethernet)? > > > The list will know that a few years ago I was very enthusuastic > about ATM. I put in a leading edge ATM network at a hospital in > the UK for medical imaging. I was a proponent of using ATM for clustering. > > These days, I would say the idea is not so good. > You get built-in Gigabit network interfaces on many motherboards, > and Gigabit switches are really cheap. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Office: 979.458.4020 FAX: 979.847.8578 Cell: 979.229.5301 Pager: 979.228.0173 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 10 21:34:45 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 10 Nov 2003 18:34:45 -0800 (PST) Subject: Linux vs FreeBSD clusters ... In-Reply-To: Message-ID: hi ya robert - i truncated the long subject ... hope it didnt mess up anybody's procmail On Mon, 10 Nov 2003, Robert G. Brown wrote: > > good point ... > > > > - i think that "volunteer-based distro" will survive all the commercial > > methodologies ... 
> > - commercial folks are out to make $$$$ to attempt to cover the > > costs of marketing, sales, advertisement and analysts expectations > > > > - voluteers do what they do, because its what they like doing and will > > probably continue doing so for the next few eons > > Alvin and Arthur, > > I don't think either one will disappear anytime soon and think that > we've entered an era where the two can exhibit excellent synthesis. > There is nothing wrong with commercial distributions, or commercial > distributions making money, as long as they remember: > > a) They don't own their product. > > b) They are therefore at best selling added value, such as support. > > c) This puts pretty strict limits on what they can sanely charge. > > Some of the major distributions may be forgetting c) just a bit, but the > market will correct this soon enough:-) Or maybe this is just wishful > thinking and some marketing hype to give their stocks a bit of a bounce. yes.. thats the problem ... all the others ( the suits ) tend to forget where all their new widgets tehy are able to sell ( at high margins w/o any overhead r/d expenses ) are coming from i always buy the full blown cdroms from which ever distro the clients want to use ... ( my little contribution ) vs burning my own cdrom of other people's distro paying for "support" ( per phone call, per email, per task, per contract is fine ... ) making everybody pay at least $1500 for a "pre-packaged product" is not "fine" ( in my book, and lowering the "value" you get for it ) and unfortunately, if one big-boy does it, all the other equivalent big boys or medium sized boys will also try to bump their prices and try to compete with similar business models when they see their revenues drop from too high a price, than they might adjust their plans 6mon, a year later ... and jsut announce quarterly losses for a while ... - wish i can tell the landlords and ISPs that we suffered a big $$$ loss this quarter .. 
so you should buy more stock :-) have fun alvin - crystal ball says: "there needs to be a new generation of GPL licenses, that's free for non-commercial use, otherwise pay up..." and there's 20-30 different variations of the licenses .. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ole at scali.com Wed Nov 12 10:18:47 2003 From: ole at scali.com (Ole W. Saastad) Date: 12 Nov 2003 16:18:47 +0100 Subject: Gigabit Switch In-Reply-To: <200311121257.hACCvxS24146@NewBlue.scyld.com> References: <200311121257.hACCvxS24146@NewBlue.scyld.com> Message-ID: <1068650327.26645.85.camel@pc-2.office.scali.no> Some comments about the channel aggregation of Gibabit ethernet channels and switches. > > Message: 1 > Date: Tue, 11 Nov 2003 12:32:32 -0500 (EST) > From: Donald Becker > To: Keyan Mehravaran > cc: beowulf at beowulf.org > Subject: Re: Gigabit Switch > > On Tue, 11 Nov 2003, Keyan Mehravaran wrote: > > > I am planning to connect 8 dual Xeon PCs > > with onboard gigabit through a switch and > > I only need access to the "zeroth" node. > > I have two questions: > > > > 1) Is there any benefit to using "managed" > > switch rather than the "unmanaged" ones? > > Frequently "managed" switches are a negative. > An Ethernet switch should "just work". > Providing configuration options just encourages setting the switch to > flawed modes, such as forced-full-duplex or filtering packet types you > thought you were not using. > > > 2) Is it possible to increase bandwidth by > > adding an extra gigabit NIC to each node? > > If the answer is yes, then should all the > > 16 ports connect to the same switch? > Scali has also addressed this issue and developed a device for Scali MPI Connect (SMC) called Direct Ethernet Transport, DET. This bypasses the tcp/ip stack and works well with a single gigabit ethernet channel. 
However, the main benefit of DET is that it is very simple to bond two NICs to a single device, usually named det2. For ScaMPI usage you just select det2 at run time instead of det0, tcp, myr0 or sci.

> Yes, you can marginally increase bandwidth. But it's not worth it.
>

I do not agree: we see only a marginally lower latency, but the bandwidth increase when going from one to two gigabit channels is on the order of 50-60%. When we get approx 110 MB/sec using one channel, this approach yields 165 to 175 MB/sec. For a full-duplex exchange we can quote a number like 350 MB/sec.

> If you channel bond GbE, you'll likely get out-of-order packets on
> the receiving side and consume much more CPU to reassemble.
> If you trunk, you will not see higher peak bandwidth, and may still
> suffer from bad cache or interrupt affinity effects.

Yes, this is true, and if you are not really constrained by bandwidth it does not pay off. Latency is the constraint most of the time. Check your application with a test setup using channel aggregation and measure it yourself.

> You should use separate switches for channel bonding. Although it's
> possible to use VLAN to avoid this, that's brings us back to the
> switch configuration issue. And two half size switches are less
> expensive than one.
>

Switches up to 24 ports are so cheap today that you can just buy an extra one. For large switches the algebra becomes more complex. The cost per port becomes so high that you can consider using a high performance interconnect like Myrinet, Infiniband or SCI. This will, in addition to high bandwidth, give you very low latency, which is beneficial for most applications.

> Bottom line: use a single GbE channel unless there is a specific
> application reason to do otherwise.

Agreed, but as said before, if you really need bandwidth there is an option you can try with Gigabit Ethernet.
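As a quick sanity check on the figures quoted above, the relative gain from bonding can be worked out directly. This is only a sketch of the arithmetic in Python; the bandwidth numbers are the ones given in this message, and the helper name `relative_gain` is made up for illustration.

```python
# Figures quoted in the message above, in MB/sec.
SINGLE_CHANNEL = 110.0          # one gigabit channel
BONDED_RANGE = (165.0, 175.0)   # two bonded channels, low and high end


def relative_gain(single, bonded):
    """Return the fractional bandwidth gain of bonded over single."""
    return (bonded - single) / single


for bonded in BONDED_RANGE:
    gain = relative_gain(SINGLE_CHANNEL, bonded)
    # Fraction of the ideal 2x that the bonded pair actually reaches.
    efficiency = bonded / (2 * SINGLE_CHANNEL)
    print(f"{bonded:.0f} MB/sec bonded: gain {gain:.0%}, "
          f"{efficiency:.0%} of ideal two-channel bandwidth")
```

The gain comes out at roughly 50% to 59%, which matches the 50-60% range stated above; the second channel delivers well under a full extra channel's worth of bandwidth.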
> > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 -- Ole W. Saastad, Dr.Scient. Manager, ISV relations/Business Dev. dir. +47 22 62 89 68 fax. +47 22 62 89 51 mob. +47 93 05 74 87 ole at scali.com Scali - www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 12 18:38:22 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 13 Nov 2003 10:38:22 +1100 Subject: list managemnt issue In-Reply-To: References: Message-ID: <200311131038.27648.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages sent to the list for weeks... Also note that it should be trivial to automatically bin email from postmaster at systemsfirm.net in your MUA. 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/ssRuO2KABBYQAh8RAiPTAJsH8EjXUgpj2IMRKR8ro7zch9vudACfSABa hp+rS9x/nshVQT+s9QZn9t4= =FiSc -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Nov 12 18:22:36 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 12 Nov 2003 15:22:36 -0800 (PST) Subject: list managemnt issue Message-ID: can some plase blackhole anyone at systemsfirm.net their mailserver has been perodically bouncing messages sent to the list for weeks... Date: Wed, 12 Nov 2003 18:16:00 -0500 From: postmaster at systemsfirm.net To: joelja at darkwing.uoregon.edu Subject: Delivery Status Notification (Failure) Parts/Attachments: 1 Shown 8 lines Text (charset: Unknown) 2 Shown 338 bytes Message, "Delivery Status" 3 Shown 5.4 KB Message, "Re: building a RAID system" 3.1 Shown ~43 lines Text ---------------------------------------- Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Wed, 12 Nov 2003 18:03:31 -0500 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, 15 Oct 2003 00:36:09 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Wed, 15 Oct 2003 01:24:42 -0400 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, 14 Oct 2003 23:02:08 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 23:50:47 -0400 Return-Path: Received: from topcat.systemsfirm.net 
([68.49.45.176]) by ; Tue, 14 Oct 2003 21:59:52 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 22:59:51 -0400 Return-Path: Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, 14 Oct 2003 21:07:45 -0500 Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with Microsoft SMTPSVC(5.0.2195.6713); Tue, 14 Oct 2003 22:07:45 -0400 Return-Path: Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, 08 Oct 2003 17:42:03 -0500 Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.10.2/8.10.2) with ESMTP id h98L0tb28952; Wed, 8 Oct 2003 17:00:55 -0400 Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu [128.223.142.13]) by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id h98Kvqb28616 for ; Wed, 8 Oct 2003 16:57:53 -0400 Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id h98KwHEA005624 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NOT); Wed, 8 Oct 2003 13:58:17 -0700 (PDT) From: Joel Jaeggli X-X-Sender: joelja at twin.uoregon.edu To: Daniel Fernandez cc: beowulf at beowulf.org -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 00:11:35 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 00:11:35 -0500 (EST) Subject: list managemnt issue In-Reply-To: Message-ID: On Wed, 12 Nov 2003, Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages 
sent to the list for weeks...

The logs report that the address was automatically deleted because of bounces on October 20.

The problem is old queued messages, and an apparent mail loop.

--
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993

From bill at math.ucdavis.edu Wed Nov 12 23:23:35 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Wed, 12 Nov 2003 20:23:35 -0800
Subject: list managemnt issue
In-Reply-To: <200311131038.27648.csamuel@vpac.org>
References: <200311131038.27648.csamuel@vpac.org>
Message-ID: <20031113042335.GA17561@sphere.math.ucdavis.edu>

On Thu, Nov 13, 2003 at 10:38:22AM +1100, Chris Samuel wrote:
> On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote:
>
> > can some plase blackhole anyone at systemsfirm.net their mailserver has
> > been perodically bouncing messages sent to the list for weeks...
>
> Also note that it should be trivial to automatically bin email from
> postmaster at systemsfirm.net in your MUA.
>
> Chris

So instead of the list admin blocking an obviously broken mail setup, every poster to this list should have to set up a separate filter?

Please unsubscribe dan at systemsfirm.com.

I'm getting hourly bounces to my posting of over a month ago; I suspect every other poster is as well.
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 00:51:56 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 12 Nov 2003 21:51:56 -0800 (PST) Subject: list managemnt issue In-Reply-To: Message-ID: On Thu, 13 Nov 2003, Donald Becker wrote: > On Wed, 12 Nov 2003, Joel Jaeggli wrote: > > > can some plase blackhole anyone at systemsfirm.net their mailserver has > > been perodically bouncing messages sent to the list for weeks... > > The logs report that the address was automatically deleted because of > bounces on October 20. > > The problem is old queued messages, and an apparent mail loop. sometimes *-you-*, the subscriber have to clean things up and this systemsfirm.net stuff is simple to get rid of that junk .. and nope ... i dont get those systemsfirm.net junk anymore .... c ya alvin # # i added their ip# and domains to the /etc/mail/access list # # # cd /etc/mail # # make # # restart sendmail or exim or ?? # # if you don't have access or control of your mta, this might be a good # time to rethink your "how do i get email" strategy # or add some pop mail filtering before you get your mail, etc # systemsfirm.net REJECT - geez .. do you need help to fix your PC # 1.2.3.4 REJECT - more junk # # .. more ip# for their junk .. 
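For those without control of their MTA, roughly the same rejection can be done on the client side, along the lines of the "pop mail filtering" mentioned above. The following is a minimal sketch in Python, assuming each message is available as raw RFC 822 text; the function name `is_unwanted_bounce` and the `BLOCKED_DOMAINS` set are hypothetical, and only the standard library `email` package is used.

```python
from email import message_from_string
from email.utils import parseaddr

# Domain taken from this thread; the set itself is an illustrative blocklist.
BLOCKED_DOMAINS = {"systemsfirm.net"}


def is_unwanted_bounce(raw_message: str) -> bool:
    """Return True if the From: address belongs to a blocked domain."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Name <user@host>" into (name, address).
    _, addr = parseaddr(msg.get("From", ""))
    # Everything after the last "@" is the domain; empty if no address.
    domain = addr.rpartition("@")[2].lower()
    return domain in BLOCKED_DOMAINS


sample = (
    "From: postmaster@systemsfirm.net\n"
    "Subject: Delivery Status Notification (Failure)\n"
    "\n"
    "body\n"
)
print(is_unwanted_bounce(sample))
```

This only inspects the From: header, so it is much weaker than rejecting at the MTA by IP address and domain as in the access list above; it is meant for binning the bounces, not for blocking the sender.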
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Nov 12 23:52:51 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 13 Nov 2003 15:52:51 +1100 Subject: list managemnt issue In-Reply-To: <20031113042335.GA17561@sphere.math.ucdavis.edu> References: <200311131038.27648.csamuel@vpac.org> <20031113042335.GA17561@sphere.math.ucdavis.edu> Message-ID: <200311131553.05445.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 13 Nov 2003 03:23 pm, Bill Broadley wrote: > So instead of the list admin blocking a obviously broken mail setup, > every poster to this list should have to setup a seperate filter? No, I'm saying that *as* *well* as unsubscribing them they should filter those emails out. That's why I said "also" in my email. Just unsubscribing them won't fix this quickly. I'm getting multiple bounces for messages I sent to the beowulf list over a month ago now, so even if they are unsubscribed I think we'll all be getting these bounces for the forseeable future. Can anyone in the US get in contact with them by phone and tell them what's going on please ? 
The WHOIS data for them says: Organization: The Systems.Firm Daniel Philpott 348 Rutgers Street Rockville, MD 20850 US Phone: 3016109635 Fax..: 3016109636 Email: dphilpott at ex-pressnet.com Registrar Name....: Register.com Registrar Whois...: whois.register.com Registrar Homepage: http://www.register.com Domain Name: SYSTEMSFIRM.NET Created on..............: Tue, Jun 15, 1999 Expires on..............: Thu, Jun 15, 2006 Record last updated on..: Wed, Dec 04, 2002 Administrative Contact: The Systems.Firm Daniel Philpott 348 Rutgers Street Rockville, MD 20850 US Phone: 3016109635 Fax..: 3016109636 Email: dphilpott at ex-pressnet.com Technical Contact, Zone Contact: Register.Com Domain Registrar 575 8th Avenue - 11th Floor New York, NY 10018 US Phone: 902-749-2701 Fax..: 902-749-5429 Email: domain-registrar at register.com thanks, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/sw4jO2KABBYQAh8RAuEjAKCPnEyGif7S17OL+/ykJsJAc7kiIgCcDV2U AxTo9T98ZZIdhm8Ap8pliB8= =5kzH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Thu Nov 13 05:16:15 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Thu, 13 Nov 2003 05:16:15 -0500 Subject: list managemnt issue In-Reply-To: <20031113042335.GA17561@sphere.math.ucdavis.edu> References: <200311131038.27648.csamuel@vpac.org> <20031113042335.GA17561@sphere.math.ucdavis.edu> Message-ID: <20031113051615.I9711@www2> On Wed, Nov 12, 2003 at 08:23:35PM -0800, Bill Broadley wrote: > > On Thu, Nov 13, 2003 at 10:38:22AM +1100, Chris Samuel wrote: > > On Thu, 13 Nov 2003 10:22 am, Joel Jaeggli wrote: > > > > > can 
some plase blackhole anyone at systemsfirm.net their mailserver has > > > been perodically bouncing messages sent to the list for weeks... > > > > Also note that it should be trivial to automatically bin email from > > postmaster at systemsfirm.net in your MUA. > > > > Chris > > So instead of the list admin blocking a obviously broken mail setup, > every poster to this list should have to setup a seperate filter? > > Please unsubscribe dan at systemsfirm.com. > > I'm getting hourly bounces to my posting of over a month ago, I suspect > every other poster is as well. But that is just the problem: Only posters -- not non-posting subscribers -- are getting these messages because they are coming direct to the poster without going back through the list. There is absolutely nothing that Donald can do to fix that. Note that there are no such bounces from recent posts; this is because that address was removed from the list a long time ago. --Bob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Nov 13 10:56:17 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 13 Nov 2003 07:56:17 -0800 (PST) Subject: list managemnt issue In-Reply-To: <3FB3A5C9.80609@tamu.edu> Message-ID: Its not spam. it's a bunged mailserver and messages bounce to the sender rather than the list admins. mail to both postmaster and the techincal admin contact for that domain have failed. I have thirteen messages in my lamer folder from yesterday from their mailer telling me it couldn't deliver the message. If you can't configure your mta properly you don't belong on mailing lists, period, end of story. joelja On Thu, 13 Nov 2003, Gerry Creager N5JXS wrote: > Can someone *NOT* blackhole anyone? > > I'm sorry Joel. This is a hot-button. 
I've found myself blackholed in > the past because I was on an ISDN modem, on DSL, from a University, and > once for an open relay... that I didn't run. > > Getting out of the blackhole list is a PITA, and sometimes unachievable. > > I've firmly decided that blackhole/blacklisting spammers/potential > spammers/someone I just don't like/etc. isn't the answer. I've had > considerable success with graylisting, but that's not the problem here. > > What I guess I'm asking here is for the listadmin to unceremoniously > unsubscribe *@systemsfirm.net for much the same reason you asked for > them to be blackholed. > > Blacklist/blackhole implementations are, IMO, broken at best, and a > number of the administrators of same I've dealt with are pompous > juveniles who can't interact with a human when they make a mistake. > > gerry > > Joel Jaeggli wrote: > > can some plase blackhole anyone at systemsfirm.net their mailserver has > > been perodically bouncing messages sent to the list for weeks... > > > > Date: Wed, 12 Nov 2003 18:16:00 -0500 > > From: postmaster at systemsfirm.net > > To: joelja at darkwing.uoregon.edu > > Subject: Delivery Status Notification (Failure) > > Parts/Attachments: > > 1 Shown 8 lines Text (charset: Unknown) > > 2 Shown 338 bytes Message, "Delivery Status" > > 3 Shown 5.4 KB Message, "Re: building a RAID system" > > 3.1 Shown ~43 lines Text > > ---------------------------------------- > > > > > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Wed, 12 Nov 2003 18:03:31 -0500 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, > > 15 Oct 2003 00:36:09 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Wed, 15 Oct 2003 01:24:42 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 23:02:08 -0500 > > Received: from topcat 
([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 23:50:47 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 21:59:52 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 22:59:51 -0400 > > Return-Path: > > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > > 14 Oct 2003 21:07:45 -0500 > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > > Microsoft SMTPSVC(5.0.2195.6713); > > Tue, 14 Oct 2003 22:07:45 -0400 > > Return-Path: > > Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, > > 08 Oct 2003 17:42:03 -0500 > > Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) > > by localhost.localdomain (8.10.2/8.10.2) with ESMTP id > > h98L0tb28952; > > Wed, 8 Oct 2003 17:00:55 -0400 > > Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu > > [128.223.142.13]) > > by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id h98Kvqb28616 > > for ; Wed, 8 Oct 2003 16:57:53 -0400 > > Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) > > by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id > > h98KwHEA005624 > > (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 > > verify=NOT); > > Wed, 8 Oct 2003 13:58:17 -0700 (PDT) > > From: Joel Jaeggli > > X-X-Sender: joelja at twin.uoregon.edu > > To: Daniel Fernandez > > cc: beowulf at beowulf.org > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at 
linux-mag.com Thu Nov 13 11:55:43 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 13 Nov 2003 11:55:43 -0500 (EST) Subject: LECCIBG at SC2003 In-Reply-To: <20031113051615.I9711@www2> Message-ID: For those attending SC2003, there does not seem to be a Beowulf Bash on Monday night after the Opening Gala. There is however a LECCIBG: http://www.cluster-rant.com/LECCIBG/LECCIBG.html Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Nov 13 12:53:34 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 13 Nov 2003 12:53:34 -0500 (EST) Subject: Alternate LECCIBG notice In-Reply-To: <20031113051615.I9711@www2> Message-ID: If you have trouble getting to the cluster-rant.com try this: http://www.hpc-design.com/LECCIBG/ Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 12:59:56 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 13 Nov 2003 17:59:56 GMT Subject: list managemnt issue In-Reply-To: References: Message-ID: <20031113175956.23801.qmail@houston.wolf.com> Joel Jaeggli writes: > Its not spam. it's a bunged mailserver and messages bounce to the sender > rather than the list admins. mail to both postmaster and the techincal > admin contact for that domain have failed. I have thirteen messages in my > lamer folder from yesterday from their mailer telling me it couldn't > deliver the message. If you can't configure your mta properly you don't > belong on mailing lists, period, end of story. Took the easy way out. 
reported them to rfc-ignorant.org (should RBL them upon verification) and added them to our local RBL-so that's that. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Thu Nov 13 05:24:54 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Thu, 13 Nov 2003 18:24:54 +0800 Subject: Cluster2003: Advance Registration (Due: Nov. 15) References: <3F80CC86.FAFBFAD2@csis.hku.hk> Message-ID: <3FB35BF6.A9466ABF@csis.hku.hk> ---------------------------------------------------------------- 2003 IEEE International Conference on Cluster Computing December 1-4, 2003 Sheraton Hong Kong Hotel & Towers Tsim Sha Tsui, Kowloon, Hong Kong Sponsored by: IEEE Task Force on Cluster Computing IEEE Computer Society The University of Hong Kong --------------------------------------------------------------- Dear Colleagues The deadline for Cluster2003 advance registration is approaching (Nov. 15). You are reminded to make your registration and hotel reservation ahead. For more detailed information about the conference activities, please visit our web site at http://www.csis.hku.hk/cluster2003/ Regards Cho-Li Wang and Daniel Katz Cluster2003 Program Co-Chairs ---------------------------------------------------------------- Conference Highlights ** There will be 48 contributed papers to be presented on Dec. 2-4, covering a wide range of subjects in cluster computing research. See our full program in : http://www.csis.hku.hk/cluster2003/advance-program.html ** Keynote addresses given by world-class researchers: - Linux Clusters for Extremely Large Scientific Simulation (Mark K. Seager, LLNL) - Distributed Security Enforcement for Trusted Cluster and Grid Computing (Kai Hwang, USC) - Cluster Computing for Financial Engineering (Thomas F. 
Coleman, Cornell)
 - Towards Grid and Cluster Federations (Satoshi Sekiguchi, AIST)
 - ...
 More details: http://www.csis.hku.hk/cluster2003/keynote.htm

** Panel discussion on Dec. 3: "Top Problems in Cluster Computing and Systems and Possible Solutions", led by
 - Rusty Lusk, Argonne National Lab., USA
 - Dhabaleswar K. Panda, Ohio State University, USA
 - Phil Papadopoulos, San Diego Supercomputer Center, USA
 - Thomas Stricker, ETH, Switzerland
 - Chip Watson, DOE Jefferson Lab, USA
 - Zhiwei Xu, Institute of Computing Technology, China
 - Xiaodong Zhang (Moderator), NSF and College of William and Mary, USA

** Four tutorials, introducing state-of-the-art clustering technologies on Dec. 1:
 1. Designing Next Generation Clusters with Infiniband: Opportunities and Challenges
 2. Using MPI-2: Advanced Features of the Message Passing Interface
 3. The Gridbus Toolkit for Grid and Utility Computing
 4. Building and Managing Clusters with NPACI Rocks
 More: http://www.csis.hku.hk/cluster2003/tutorials.htm

** Vendor technical talks and exhibitions by: HP, Microsoft, IBM, Extreme Networks, Sun Microsystems, Dawning, Intel, DELL, Cluster File System, Linux Networx, Mellanox, RackSaver.
 More: http://www.csis.hku.hk/cluster2003/vender.htm

** A live Grid demo session on Dec. 4, featuring 8 innovative Grid applications/systems.
 More: http://www.csis.hku.hk/cluster2003/griddemo.html

** A boat trip on the beautiful and romantic Victoria Harbour on the evening of Dec. 4
 More: http://www.csis.hku.hk/~chyu/cluster2003/boat.html

** One-day tour on Dec. 5:
 - Enjoy the fascinating view of Victoria Harbour from Victoria Peak
 - The breathtaking views over sandy beaches at Repulse Bay
 - Visit the "Ling Chi" ("herb of the gods") plantation garden.
- "Big Bowl Feast" ("Pun Choi" in Chinese) -- food served in wooden basins More http://www.csis.hku.hk/~chyu/cluster2003/tour.html ----------------------------------------------------------------- Conference/Tutorial Registration : http://www.csis.hku.hk/cluster2003/registration.htm Hotel Reservation: http://www.csis.hku.hk/cluster2003/hotel.htm --------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Nov 13 10:39:53 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 13 Nov 2003 09:39:53 -0600 Subject: list managemnt issue In-Reply-To: References: Message-ID: <3FB3A5C9.80609@tamu.edu> Can someone *NOT* blackhole anyone? I'm sorry Joel. This is a hot-button. I've found myself blackholed in the past because I was on an ISDN modem, on DSL, from a University, and once for an open relay... that I didn't run. Getting out of the blackhole list is a PITA, and sometimes unachievable. I've firmly decided that blackhole/blacklisting spammers/potential spammers/someone I just don't like/etc. isn't the answer. I've had considerable success with graylisting, but that's not the problem here. What I guess I'm asking here is for the listadmin to unceremoniously unsubscribe *@systemsfirm.net for much the same reason you asked for them to be blackholed. Blacklist/blackhole implementations are, IMO, broken at best, and a number of the administrators of same I've dealt with are pompous juveniles who can't interact with a human when they make a mistake. gerry Joel Jaeggli wrote: > can some plase blackhole anyone at systemsfirm.net their mailserver has > been perodically bouncing messages sent to the list for weeks... 
> > Date: Wed, 12 Nov 2003 18:16:00 -0500 > From: postmaster at systemsfirm.net > To: joelja at darkwing.uoregon.edu > Subject: Delivery Status Notification (Failure) > Parts/Attachments: > 1 Shown 8 lines Text (charset: Unknown) > 2 Shown 338 bytes Message, "Delivery Status" > 3 Shown 5.4 KB Message, "Re: building a RAID system" > 3.1 Shown ~43 lines Text > ---------------------------------------- > > > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Wed, 12 Nov 2003 18:03:31 -0500 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Wed, > 15 Oct 2003 00:36:09 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Wed, 15 Oct 2003 01:24:42 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 23:02:08 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 23:50:47 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 21:59:52 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 22:59:51 -0400 > Return-Path: > Received: from topcat.systemsfirm.net ([68.49.45.176]) by ; Tue, > 14 Oct 2003 21:07:45 -0500 > Received: from topcat ([111.111.111.3]) by topcat.systemsfirm.net with > Microsoft SMTPSVC(5.0.2195.6713); > Tue, 14 Oct 2003 22:07:45 -0400 > Return-Path: > Received: from newblue.scyld.com ([64.237.107.19]) by ; Wed, > 08 Oct 2003 17:42:03 -0500 > Received: from NewBlue.scyld.com (localhost.localdomain [127.0.0.1]) > by localhost.localdomain (8.10.2/8.10.2) with ESMTP id > h98L0tb28952; > Wed, 8 Oct 2003 17:00:55 -0400 > Received: from darkwing.uoregon.edu (root at darkwing.uoregon.edu > [128.223.142.13]) > by NewBlue.scyld.com (8.10.2/8.10.2) with ESMTP id 
h98Kvqb28616 > for ; Wed, 8 Oct 2003 16:57:53 -0400 > Received: from twin.uoregon.edu (twin.uoregon.edu [128.223.214.27]) > by darkwing.uoregon.edu (8.12.10/8.12.10) with ESMTP id > h98KwHEA005624 > (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 > verify=NOT); > Wed, 8 Oct 2003 13:58:17 -0700 (PDT) > From: Joel Jaeggli > X-X-Sender: joelja at twin.uoregon.edu > To: Daniel Fernandez > cc: beowulf at beowulf.org > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jampuero at Princeton.EDU Thu Nov 13 14:44:32 2003 From: jampuero at Princeton.EDU (Jean Paul Ampuero) Date: Thu, 13 Nov 2003 14:44:32 -0500 Subject: bpcp with globbing Message-ID: <3FB3DF20.6040506@princeton.edu> I am trying to gather output files from the slaves to the master node using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* ~ampuero) But globbing does not work the way I'd like: bpcp tries to expand the * in the master, instead of in the slave. Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero". Is there a workaround ? 
-- Jean Paul (Pablo) AMPUERO Post-Doctoral Research Associate Princeton University - Department of Geosciences Guyot Hall, Room 321 B - Princeton NJ 08544 Office: (609) 258 2598 Mobile: (609) 638 0106 Fax : (609) 258 1671 http://geoweb.princeton.edu/people/resstaff/ampuero.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 15:52:06 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 13 Nov 2003 20:52:06 GMT Subject: list managemnt issue In-Reply-To: <3FB3A5C9.80609@tamu.edu> References: <3FB3A5C9.80609@tamu.edu> Message-ID: <20031113205206.3460.qmail@houston.wolf.com> Gerry Creager N5JXS writes: > Can someone *NOT* blackhole anyone? > > I'm sorry Joel. This is a hot-button. I've found myself blackholed in > the past because I was on an ISDN modem, on DSL, from a University, and > once for an open relay... that I didn't run. > > Getting out of the blackhole list is a PITA, and sometimes unachievable. > > I've firmly decided that blackhole/blacklisting spammers/potential > spammers/someone I just don't like/etc. isn't the answer. I've had > considerable success with graylisting, but that's not the problem here. > > What I guess I'm asking here is for the listadmin to unceremoniously > unsubscribe *@systemsfirm.net for much the same reason you asked for them > to be blackholed. > > Blacklist/blackhole implementations are, IMO, broken at best, and a number > of the administrators of same I've dealt with are pompous juveniles who > can't interact with a human when they make a mistake. Knee jerk reactions are never good-no matter what side of the RBL question you are on. I love RBLs. They do exactly what they are supposed to do, block abuse of my systems from the incompetent (at best), or deliberate abusive (at worse) without having to add more of a burden to my and my users. 
Also, I can with a two line entry control access to all my boxes. Don't wanna get RBL'd? Keep your system tighened down. Someone does not get into RBLs by keeping their system configured correctly. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dstanzi at clemson.edu Thu Nov 13 16:22:07 2003 From: dstanzi at clemson.edu (Dan Stanzione) Date: Thu, 13 Nov 2003 16:22:07 -0500 Subject: bpcp with globbing References: <3FB3DF20.6040506@princeton.edu> Message-ID: <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> >From: "Jean Paul Ampuero" >To: >Sent: Thursday, November 13, 2003 2:44 PM >Subject: bpcp with globbing > > I am trying to gather output files from the slaves to the master node > using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* > ~ampuero) > But globbing does not work the way I'd like: bpcp tries to expand the * > in the master, > instead of in the slave. > Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero". > Is there a workaround ? I'm assuming you're running Scyld? That's actually a problem with the shell itself more than bpcp; and unlike some utilities (like say, scp) you can't get around by simply putting the whole thing in quotes. I haven't had the patience to find a "good" fix, but it seems from your example you have all the files you need on an NFS or other shared directory, and you're just trying to move them to local disk. A really ugly work-around (but it takes 10 seconds) is just to put the cp command with the wildcard argument into a one-line script, then run "bpsh -a " and the arguments will be expanded on the slaves. There's got to be a better way to do this, but that will get you through the night. Any ideas, Don? 
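A minimal local sketch of the point above, that the wildcard is expanded by whichever shell parses the command line. It does not use bpsh or bpcp at all: a second `sh -c` started in a scratch directory stands in for the shell on the slave node, and every directory and file name in it is made up for illustration.

```shell
#!/bin/sh
# Local stand-in only: "node" plays the slave's scratch directory and
# "master" plays the directory the command is typed in. Hypothetical names.
set -eu

work=$(mktemp -d)
mkdir "$work/node" "$work/master"
touch "$work/node/S001.dat" "$work/node/S002.dat"

cd "$work/master"

# Unquoted: the calling shell expands the pattern where it is running,
# finds no match, and passes the pattern through unchanged.
early=$(echo S0*)

# Single-quoted and handed to a second shell: the pattern reaches that
# shell untouched, and it expands the pattern in its own directory.
late=$(cd "$work/node" && sh -c 'echo S0*')

echo "expanded by caller: $early"
echo "expanded by child:  $late"

cd /
rm -rf "$work"
```

The single quotes are what keep the calling shell from touching the pattern; the explicit `sh -c` is what gives the other side a shell to expand it with.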
Dan
----------------------------------------------
Dan Stanzione, PhD dstanzio at nsf.gov
AAAS Fellow
Division of Graduate Education
National Science Foundation
(703)292-8121 Fax: (703) 292-9048

From henken at seas.upenn.edu Thu Nov 13 17:01:20 2003
From: henken at seas.upenn.edu (Nicholas Henke)
Date: Thu, 13 Nov 2003 17:01:20 -0500
Subject: clubmask-0.6b1 released
Message-ID: <1068760880.3988.17.camel@roughneck.liniac.upenn.edu>

On a sourceforge mirror near you:

Name     : Clubmask
Version  : 0.6
Release  : b1
Group    : Cluster Resource Management and Scheduling
Vendor   : Liniac Project, University of Pennsylvania
License  : GPL-2
URL      : http://clubmask.sourceforge.net
Download : http://sourceforge.net/project/showfiles.php?group_id=1316&release_id=197383

What is Clubmask
------------------------------------------------------------------------------
Clubmask is a resource manager designed to allow Bproc based clusters to
enjoy the full scheduling power and configuration of the Maui HPC
Scheduler. Clubmask uses a modified version of the Supermon resource
monitoring software to gather resource information from the cluster
nodes. This information is combined with job submission data and
delivered to the Maui scheduler. Maui issues job control commands back
to Clubmask, which then starts or stops the job scripts using the Bproc
environment. Clubmask also provides builtin support for a
supermon2ganglia translator that allows a standard Ganglia web backend
to contact supermon and get XML data that will display through the
Ganglia web interface.

Clubmask is currently running on around 10 clusters, varying in size
from 8 to 128 nodes, and has been tested up to 5000 jobs.

Notes/warnings on this release:
------------------------------------------------------------------------------
Before upgrading, please make sure to save your
/etc/clubmask/clubmask.conf file, as it may get overwritten.

There are a few new variables in clubmask.conf, so beware!

To use the resource requests, you must be running the latest snapshot
of maui.

Changes since 0.5:
------------------------------------------------------------------------------
- Change the name from the god-awful absolute timestamp to a more
  normal "string.number" format, where "string" is an arbitrary job
  name and "number" is the Nth time that the job name is being used.
  Ex: root.1, root.2, ...
- fix cmnodesshknownhosts to get the -n information from the bproc
  nodenumber that is given as the argument
- update to latest supermon APIs
- Feature Request #790938: add 'cmsubmit -r ' to run a job in a maui
  reservation.
- Fixed bug #791396: make sure processes get killed in Interactive jobs
- make sure bproc is running when starting resource_manager
- fix cmsubmit -h. It is now cleaner and easier to understand
- add support for resource requirements on the nodes. swap, mem, disk,
  qos, reservation, and processors per node are supported now. See
  cmsubmit -h for more information.
- add infrastructure for architecture, os, network, and arbitrary
  features as node resource requests. We do not get this information
  dynamically yet, so there is no sense in letting people muck with it.
- add supermon_state daemon to manage the nodelist for supermon. This
  keeps that logic out of resource_manager
- make sure there is at most one 'R' command in the pipeline for down
  nodes at any given time. No sense in asking nodes to revive if they
  have not responded to the last request yet.
- clean up setup so that RPM builds are cleaner
- split /etc/clubmask/clubmask.conf to
  /etc/clubmask/{system,clubmask}.conf to allow variables that need
  user editing to live in clubmask.conf and the rest of the system
  variables to live in system.conf.
This will let a user update to a newer version of Clubmask, and just copy over the old clubmask.conf to restore their configuration. migrate all docs from Docbook XML to Lyx/latex. All of the docs -- pdf, html single, and html multiple can be generated with a simple 'make' in the docs/ directory. add --secret-key to setup.py args for building maui and clubmask with same checksum key. This removes the need to edit setup.py when installing clubmask. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Cheers~ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Nov 13 18:06:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 14 Nov 2003 10:06:55 +1100 Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> Message-ID: <200311141006.56305.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote: > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > get into RBLs by keeping their system configured correctly. This is rapidly getting off-topic, but this needed addressing. People *can* get into blacklists without doing anything wrong, if the maintainers are overly broad with their brush (such as listing entire class-C networks at hosting companies) or because of malicious/clueless submission of reports. 
Debian blacklisted: http://lists.debian.org/debian-devel/2002/debian-devel-200207/msg00044.html The Age newspaper report on SpamCop blocking entire /24's (and hence Politech and others): http://www.theage.com.au/articles/2002/12/19/1040174329829.html Politech blocked 3 times by SpamCop: http://www.politechbot.com/p-04121.html Peacefire become collateral damage to a netblock blocked by MAPS for hosting a site selling spam software: http://slashdot.org/yro/00/12/13/1853237.shtml RFC-ignorant blacklists the entire 202/7 netblock: http://www.apnic.net/mailing-lists/apops/archive/2001/10/msg00009.html Now note I'm not saying that blacklists are bad, just that there *is* collateral damage from them, and as long as people are aware of that and decide they can tolerate the risk then that's fine. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/tA6PO2KABBYQAh8RAtkNAJoCTbVE4xnRJFJSY5wHkszrC5zVQQCffzLW zL4ppbE6JHN1f7y2xWv9cxo= =jNQu -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 17:54:07 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 17:54:07 -0500 (EST) Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> Message-ID: Just to summarize the immediate issue: - messages from about a month ago were being bounced back to the posters - the address associated with the 'borken system was removed Oct. 
20 - there is nothing that can be done at beowulf.org to prevent the bogus bounces, as the bounces were not routed through here On Thu, 13 Nov 2003, Angel Rivera wrote: > Gerry Creager N5JXS writes: > > I'm sorry Joel. This is a hot-button. I've found myself blackholed in > > the past because I was on an ISDN modem, on DSL, from a University, and .. > I love RBLs. They do exactly what they are supposed to do, block abuse of .. > Don't wanna get RBL'd? Keep your system tighened down. Someone does not get > into RBLs by keeping their system configured correctly. Scyld runs a bunch of Linux and Beowulf-related mailing lists, with all of them moderated or posting limited to members. I average about 40 minutes a day moderating and adding to the spam filters. Even with that care, we have been on a number of RBLs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 18:54:56 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 13 Nov 2003 15:54:56 -0800 (PST) Subject: list managemnt issue - rbl In-Reply-To: <200311141006.56305.csamuel@vpac.org> Message-ID: hi ya a list admin cannot do nothing about stopping spam other than making it members only ... rest of the spam fighting applies to all lists and all regular user emails too thanx donald and crew for the list... its a lot of work to keep a list going On Fri, 14 Nov 2003, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote: > > > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > > get into RBLs by keeping their system configured correctly. 
> > This is rapidly getting off-topic, but this needed addressing. yes and no, 75% or more of the spams come from "mis-managed" clusters ( at least when i'm collecting data on sven virus ) http://www.Linux-Sec.net/Mail/SpamVirus/Sven/ all that "compute power" for sending out spam .. :-) ( *pout* ) > People *can* get into blacklists without doing anything wrong, if the - only way is if you sent spam .. - or if you inherited an ip# of a spammer - or if the rbl db admin decides to block all ip# in the class-C, class-B, country > maintainers are overly broad with their brush (such as listing entire class-C > networks at hosting companies) or because of malicious/clueless submission of > reports. BL works when: - the blacklister has a copy of the spam to prove their case ( it works when you run your own RBL lists .. or whatever way ( you/your corp decide to fight spam - building your own rbl is trivial or complicated ..depending on what you want it to do .. http://www.UCEAS.org/RBL.Server/ BL does NOT work when: - its done by a 3rd party - its done for free on tehir t1 or t3 line for everybody to use - the bl db maintainer adds any incoming report w/o checking - the bl db maintainers does NOT remove people from the bl db - the bl db mainterners adds the entire class-C, class-B or entire country to their bad-boy list > Debian blacklisted: > > http://lists.debian.org/debian-devel/2002/debian-devel-200207/msg00044.html mailing lists should be open for all, liek they are, in which case spam can get thru if mailing lists are members only, its one more hurdle for the spammer sw to subscribe, spam the list, and unsubscribe -------- whitelisting doesn't work, you dont know where your business inquiry is coming from challenge response system is too much of a pain for people to start a (business/social) conversation .... 
but does work, but again it tells the other business you dont know how else to stop spam without considering everybody a potential spammer ( a bad impression in my book ) tar pitting works if enough people implements it and slows down the sending ( misconfigured open relay or cracked ) server simple bouncing ( rejecting ) the incoming spam will fill the sending guilty spam server of rejected/bounced spam - dropping the spam is a bad idea, since it confirm to them that the email address is valid and you'd get more of it ---- 80% of all DNS is misconfigured too ... :-) c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Nov 13 20:51:47 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 13 Nov 2003 17:51:47 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: We are having an ongoing issue with our compute cluster, running Scyld 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute nodes and 1 master node. We are running the Navy's weather model. The problem: The model runs fine when run on 4 processors (1 on each compute node). However, when I use the SMP capabilities of the machine and try to run on, say, 8 processors (using both CPUs on each compute node), everything will run fine for a while. Then, at a non-consistent time, a node will invariably freeze up. The cluster loses its connection to the node and I cannot communicate with it using any of the cluster tools - sometimes it will automatically reboot, but usually it requires me to go perform a hard reset on the node. However, I have found that in most cases if I run 2 jobs in parallel (i.e. 2 4-cpu processes, each using only 1 CPU on each node) things seem to work fine. Nodes may still freeze from time to time but not nearly as often. 

The hardware:
The cluster was obtained pre-built from PSSC Labs. Each compute node is
a dual-processor Tyan MB with 2 Athlon MP CPUs. They are also equipped
with 2 on-board NICs (lspci gives them as 3com 3c982 Dual Port Server
Cyclone rev 78 and the 3c59x kernel driver is used). We are using the
BeoMPI 1.0.7 implementation of MPICH compiled with:
    --with-device=ch_p4 --with-comm=bproc
(note that I had to recompile BeoMPI with the PGI compiler to get it to
work with the model)
Again, we use Scyld Beowulf 28cz4 for the operating system.
uname -a gives
    Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 i686 unknown

_Please_ help if you have _any_ suggestions whatsoever. I am at the end
of my rope, and this is presenting a serious impediment to our research!
If you need more information, let me know and I will be happy to provide
it!

Thanks...

Tim Whitcomb
Meteorologist
University of Washington Applied Physics Lab
twhitcomb at apl.washington.edu

From angel at wolf.com Thu Nov 13 20:52:45 2003
From: angel at wolf.com (Angel Rivera)
Date: Fri, 14 Nov 2003 01:52:45 GMT
Subject: list managemnt issue
In-Reply-To: <200311141006.56305.csamuel@vpac.org>
References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> <200311141006.56305.csamuel@vpac.org>
Message-ID: <20031114015245.24191.qmail@houston.wolf.com>

Chris Samuel writes:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Fri, 14 Nov 2003 07:52 am, Angel Rivera wrote:
>
>> Don't wanna get RBL'd? Keep your system tightened down. Someone does not
>> get into RBLs by keeping their system configured correctly.
>
> This is rapidly getting off-topic, but this needed addressing.
> > People *can* get into blacklists without doing anything wrong, if the > maintainers are overly broad with their brush (such as listing entire class-C > networks at hosting companies) or because of malicious/clueless submission of > reports. It is off-topic but... While I agree it can and has happen, I do not believe that it is that common-at least in my experience. I am one who believes in expaning blocks to include large swaths if IP space where abuse it rampant. Our policy is we LART 100% of the time for spam and block that IP. If the ISPs postmaster and/or abuse does not work we submit them to rfc-ignorant.org (an RBL we also use). if they do nothing, we do nothing also and they stay blocked. If we get another spam in the same C-equiv we block the entire C. Heck, I block entire countries. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Nov 13 21:38:18 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 13 Nov 2003 18:38:18 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: Message-ID: hi ya tim i usually was able to fix random cpu crashes by changing the kernel to the latest/greatest one at the time ( 2.4.22 ) if i were to use a new smp kernel today if the latest kernel has no effect, than there's some other serious hw problems ... timing issues ?? 

- make sure the kernel is compiled for athlon and not p4, and smp enabled

- memory clock speeds, marginal memory sticks
  ( get rid of generic no-name-brand memory sticks )

- swap memory sticks and see if the problem follows the memory
  ( keep good track of it so you can easily identify it even if all the
  memory was thrown on the floor all at the same time )

- make sure you only have 1 ide disk on each cable to help identify
  any other hw issues

- blow air, with a household 24"-36" fan, in the same direction as the
  normal airflow of the system and see if it helps any

- replace the home-made nic cables with molded cat-5 cables where it's
  obvious that a person didn't hand-crimp the wires

- swap the ports that the nic cables are connected to

- inexpensive hubs are the next to swap out

c ya
alvin

On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote:

> We are having an ongoing issue with our compute cluster, running Scyld
> 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute
> nodes and 1 master node. We are running the Navy's weather model.
>
> The problem:
> The model runs fine when run on 4 processors (1 on each compute node).
> However, when I use the SMP capabilities of the machine and try to run on,
> say, 8 processors (using both CPUs on each compute node), everything will
> run fine for a while. Then, at a non-consistent time, a node will
> invariably freeze up. The cluster loses its connection to the
> node and I cannot communicate with it using any of the cluster tools -
> sometimes it will automatically reboot, but usually it requires me to go
> perform a hard reset on the node.
>
> However, I have found that in most cases if I run 2 jobs in parallel (i.e.
> 2 4-cpu processes, each using only 1 CPU on each node) things seem to work
> fine. Nodes may still freeze from time to time but not nearly as often.
>
> The hardware:
> The cluster was obtained pre-built from PSSC Labs. Each compute node is a
> dual-processor Tyan MB with 2 Athlon MP CPUs.
They > are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 > Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We > are using the BeoMPI 1.0.7 implementation of MPICH compiled with: > --with-device=ch_p4 --with-comm=bproc > (note that I had to recompile BeoMPI with the PGI compiler to get it to > work with the model) > Again, we use Scyld Beowulf 28cz4 for the operating system > uname -a gives > Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 > i686 unknown > > _Please_ help if you have _any_ suggestions whatsoever. I am at the end > of my rope, and this is presenting a serious impediment to our research! > If you need more information, let me know and I will be happy to provide > it! > > Thanks... > > Tim Whitcomb > Meteorologist > University of Washington Applied Physics Lab > twhitcomb at apl.washington.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jampuero at Princeton.EDU Thu Nov 13 16:45:06 2003 From: jampuero at Princeton.EDU (Jean Paul Ampuero) Date: Thu, 13 Nov 2003 16:45:06 -0500 Subject: bpcp with globbing In-Reply-To: <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> References: <3FB3DF20.6040506@princeton.edu> <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> Message-ID: <3FB3FB62.3010503@princeton.edu> Right, I am running Scyld. I already tried your "ugly" work-around, without success. I get: execmove: Exec format error Dan Stanzione wrote: > I'm assuming you're running Scyld? 
> >That's actually a problem with the shell itself more than bpcp; and unlike >some utilities (like say, scp) you can't get around by simply putting the >whole >thing in quotes. I haven't had the patience to find a "good" fix, but >it seems from your example you have all the files you need on an NFS >or other shared directory, and you're just trying to move them to local >disk. > >A really ugly work-around (but it takes 10 seconds) is just to put the >cp command with the wildcard argument into a one-line script, then >run "bpsh -a " and the arguments will be expanded on >the slaves. > >There's got to be a better way to do this, but that will get you >through the night. Any ideas, Don? > > Dan > >---------------------------------------------- >Dan Stanzione, PhD dstanzio at nsf.gov >AAAS Fellow >Division of Graduate Education >National Science Foundation >(703)292-8121 Fax: (703) 292-9048 > > > -- Jean Paul (Pablo) AMPUERO Post-Doctoral Research Associate Princeton University - Department of Geosciences Guyot Hall, Room 321 B - Princeton NJ 08544 Office: (609) 258 2598 Mobile: (609) 638 0106 Fax : (609) 258 1671 http://geoweb.princeton.edu/people/resstaff/ampuero.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 13 21:31:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 13 Nov 2003 21:31:49 -0500 (EST) Subject: bpcp with globbing In-Reply-To: <3FB3DF20.6040506@princeton.edu> Message-ID: On Thu, 13 Nov 2003, Jean Paul Ampuero wrote: > I am trying to gather output files from the slaves to the master node > using the bpcp command (example: bpcp 2:/scratch/ampuero/SCEC1/S001* > ~ampuero) > But globbing does not work the way I'd like: bpcp tries to expand the * > in the master, > instead of in the slave. 

This is the behavior you would expect: globbing is done by the local
shell before the expanded args are passed to bpcp.

> Similar problem with "bpsh -a cp /scratch/ampuero/SCEC1/S0* ~ampuero".
> Is there a workaround ?

To get the exact semantics of shell globbing, you must use the shell
itself. Here is a broken-out way of doing what you wish:

    FILES=`bpsh 2 /bin/sh -c 'echo /scratch/ampuero/SCEC1/S001*'`
    for file in $FILES; do
        bpcp 2:$file ~ampuero
    done

The clearest way to do this in a single line is to 'reverse' the
direction of the copy:

    bpsh 2 bpcp `bpsh 2 /bin/sh -c 'echo /scratch/ampuero/SCEC1/S001*'` master:~ampuero

Some detailed notes about this:
- By running the 'bpcp' on the "read-from" node we don't have to
  prepend "2:" to each file name.
- 'bpcp' is modeled after 'rcp', which has a similar local globbing
  issue.
- You might be able to simplify the specific commands to not use shell
  globbing. But using '/bin/sh' prevents misinterpretations, such as
  confusing regexp and globbing: does ".*" mean all files or just dot
  files?
- The command above relies on 'echo' being a shell built-in.

--
Donald Becker                       becker at scyld.com
Scyld Computing Corporation         http://www.scyld.com
914 Bay Ridge Road, Suite 220       Scyld Beowulf cluster system
Annapolis MD 21403                  410-990-9993

From dstanzi at clemson.edu Thu Nov 13 17:25:35 2003
From: dstanzi at clemson.edu (Dan Stanzione)
Date: Thu, 13 Nov 2003 17:25:35 -0500
Subject: bpcp with globbing
References: <3FB3DF20.6040506@princeton.edu> <195c01c3aa2c$30eb0f20$795d9680@slimowitzXP> <3FB3FB62.3010503@princeton.edu>
Message-ID: <198a01c3aa35$0e9b6740$795d9680@slimowitzXP>

> Right, I am running Scyld.
> I already tried your "ugly" work-around, without success.
I get: > execmove: Exec format error > > OK, I knew I'd done that before, but I was able to re-create your problem. There are two gotchas to make this work: -Your shell script has to be a script, not just commands in a file you source (i.e., it starts #!/bin/sh). That takes care of the execmove error. -If you run a command that's not on the nodes (like cp) from a script, bpsh doesn't know to move that command out there, so, everything you use must reside on the node; you may need to stick a copy of /bin/cp out there somewhere. With those two caveats, it works just fine... I reiterate that there must be a better way... Dan ---------------------------------------------- Dan Stanzione, PhD dstanzio at nsf.gov AAAS Fellow Division of Graduate Education National Science Foundation (703)292-8121 Fax: (703) 292-9048 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Nov 13 22:00:36 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 14 Nov 2003 03:00:36 GMT Subject: list managemnt issue In-Reply-To: <3FB44137.5090603@tamu.edu> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> <3FB44137.5090603@tamu.edu> Message-ID: <20031114030036.28845.qmail@houston.wolf.com> Gerry Creager N5JXS writes: > Count to 10. Don't respond initially with what you wanted to say... > Okay, I've followed the advice. Good. Perhaps I should have done that too-I am very passionate about spam and fighting all forms of network abuse. But: *timeout here* I was not talking about "you" as in you or the beowulf list. It was a generic "you." > > Reread the initial portion of my e-mail. I *DO* keep my system tight. > The last known compromise was a buffer overflow in apache, exploited > before it was announced by apache or bugtraq. 
And fixed appropriately as > soon as a patch was available (within hours). Because of system configs and > safeguards, no spam emitted from the site. The one previous to that was > caused by a buffer overflow exploit in wuftpd. That represents the last > time wuftpd ran on one of my systems. It also resulted in forensics > running back thru 3 other compromised systems in the US, and to 2 > originating machines in Germany. And some detentions (I never got final > word on arrests/convictions, if any). This is not what I would consider an open system. I certainly spend an awful lot of time keeping and eye on my system and fighting all of the slick ways they find to get spam through all my rbl, filters and avs. I stopped a hacker from UPenn (I think it was) as he was hacking. When they got to his house he was asleep with his girlfriend-someone had hacked into this linux box that was wide open. That I do consider negligent. > I've not had a documented case of an open relay. I've not been > appropriately accused of having spam transit any of my systems. I perform > periodic security audits. I no longer run honey-pots and tarpits because > of an Attorney General's opinion on their legality, but I have. > > AND YOU ARE GOING TO TELL ME TO TIGHTEN UP MY SYSTEM? See above. I am not sure they wouldn't pass muster. if someone is not predisposed to being a criminal and tresspassing and stealing from you-then having them is of no negative value. I am not Don Quixote. I am not trying to track down and chase spammers to ground.I do not run them. I do not smtp scan other boxes. All I am trying to do is keep spam out my box and those of my 2000 or so email users and when it does, I log it, keep a copy of the spam (kinda hard to protest one's innocent under those conditions)and RBL them until they get it fixed and hades freezes over-which ever comes first. I have been subject to one semi-spam complaint. Years ago. You can find it in NANE. 
It was a camera company that used my domain name internally and they spammed. > Sorry about your _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Nov 13 21:43:03 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 13 Nov 2003 20:43:03 -0600 Subject: list managemnt issue In-Reply-To: <20031113205206.3460.qmail@houston.wolf.com> References: <3FB3A5C9.80609@tamu.edu> <20031113205206.3460.qmail@houston.wolf.com> Message-ID: <3FB44137.5090603@tamu.edu> Count to 10. Don't respond initially with what you wanted to say... Okay, I've followed the advice. Reread the initial portion of my e-mail. I *DO* keep my system tight. The last known compromise was a buffer overflow in apache, exploited before it was announced by apache or bugtraq. And fixed appropriately as soon as a patch was available (within hours). Because of system configs and safeguards, no spam emitted from the site. The one previous to that was caused by a buffer overflow exploit in wuftpd. That represents the last time wuftpd ran on one of my systems. It also resulted in forensics running back thru 3 other compromised systems in the US, and to 2 originating machines in Germany. And some detentions (I never got final word on arrests/convictions, if any). I've not had a documented case of an open relay. I've not been appropriately accused of having spam transit any of my systems. I perform periodic security audits. I no longer run honey-pots and tarpits because of an Attorney General's opinion on their legality, but I have. AND YOU ARE GOING TO TELL ME TO TIGHTEN UP MY SYSTEM? Angel Rivera wrote: > Gerry Creager N5JXS writes: > >> Can someone *NOT* blackhole anyone? >> I'm sorry Joel. This is a hot-button. 
I've found myself blackholed >> in the past because I was on an ISDN modem, on DSL, from a University, >> and once for an open relay... that I didn't run. >> Getting out of the blackhole list is a PITA, and sometimes unachievable. >> I've firmly decided that blackhole/blacklisting spammers/potential >> spammers/someone I just don't like/etc. isn't the answer. I've had >> considerable success with graylisting, but that's not the problem here. >> What I guess I'm asking here is for the listadmin to unceremoniously >> unsubscribe *@systemsfirm.net for much the same reason you asked for >> them to be blackholed. >> Blacklist/blackhole implementations are, IMO, broken at best, and a >> number of the administrators of same I've dealt with are pompous >> juveniles who can't interact with a human when they make a mistake. > > > Knee jerk reactions are never good-no matter what side of the RBL > question you are on. > I love RBLs. They do exactly what they are supposed to do, block abuse > of my systems from the incompetent (at best), or deliberate abusive (at > worse) without having to add more of a burden to my and my users. Also, > I can with a two line entry control access to all my boxes. > Don't wanna get RBL'd? Keep your system tighened down. Someone does not > get into RBLs by keeping their system configured correctly. 
-- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Nov 14 01:11:20 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 14 Nov 2003 01:11:20 -0500 (EST) Subject: mpirun + Scyld MPI In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E2B@lao-exchpo1-nt.nv.doe.gov> Message-ID: On Wed, 12 Nov 2003, Zukaitis, Anthony wrote: > I am currently using MPI distributed with scyld which I believe is > MPICH. Which version of Scyld? > I have 6 dual CPU nodes for a total of 12 cpu's. When ever I try to use 12 > processors it puts 3 processes on one of the nodes and only one process on > the master node. That's the preferred behavior, and thus the default. The initial single process, which will become MPI Rank 0, is on the master. The initialization and scheduling is done single threaded. Additional processes are created when MPI_Init() is called. An alternate behavior is putting all processes on compute nodes. This leaves the master free to manage the jobs. Rank 0 will be on a compute node and thus may not have access to the full set of file systems and scheduling information. > I have tried using a machinefile like > master:2 > .0:2 > .1:2 > .2:2 > .3:2 > .4:2 Using a 'machinefile' is old-fashioned and inflexible. Read the 'beomap' section in the manual for details on the many scheduling options available with Scyld. I'm guessing that you want the control of specifying an explicit job map with the environment variable or command-line option: --map BEOWULF_JOB_MAP Use the colon-delimited list to specify which nodes to run on. 
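As a rough sketch of what such an explicit map could look like, the
snippet below builds a colon-delimited list with a fixed number of ranks
per node. The helper name build_job_map, the node numbers 0-4 and the
two-ranks-per-node layout are assumptions made for illustration only;
the exact map syntax accepted by --map and BEOWULF_JOB_MAP should be
checked against the 'beomap' section of the manual.

```shell
#!/bin/sh
# Build a colon-delimited job map with a fixed number of ranks per node.
# build_job_map is a hypothetical helper, not part of Scyld.
# Usage: build_job_map RANKS_PER_NODE NODE...
build_job_map() {
    per_node=$1
    shift
    map=""
    for node in "$@"; do
        i=0
        while [ "$i" -lt "$per_node" ]; do
            # Append the node number, adding ':' only between entries.
            map="${map:+$map:}$node"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$map"
}

# Assumed layout: compute nodes 0-4, two ranks on each.
BEOWULF_JOB_MAP=$(build_job_map 2 0 1 2 3 4)
export BEOWULF_JOB_MAP
echo "BEOWULF_JOB_MAP=$BEOWULF_JOB_MAP"
```

The exported value would then be picked up when the MPI job is started,
or the same string could be given to the --map option instead. How the
master node is named in the map is not covered by this sketch.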
It's also possible for the application to influence or specify a
process mapping, or for the administrator to install an alternate
scheduler as a dynamic library.

--
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
914 Bay Ridge Road, Suite 220           Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993

From becker at scyld.com Fri Nov 14 01:16:02 2003
From: becker at scyld.com (Donald Becker)
Date: Fri, 14 Nov 2003 01:16:02 -0500 (EST)
Subject: 5th Annual Beowulf Bash at SC2003 -- early Wednesday evening
Message-ID:

Everyone attending SC2003 in Phoenix next week is invited to the
5th Annual Extreme Beowulf Bash. Yes, 5th already!

As usual, Scyld continues its role as a founding sponsor for the
traditional event.

WHEN: Wednesday November 18th / 6-8pm
WHERE: Hyatt Regency Hotel, Phoenix
       The Atrium
SPONSORS: AMD
          Penguin Computing
          Scyld

Other sponsors are welcome. Contact jcarrington at scyld.com.

For updates and additions check back at
    http://www.beowulf.org/beowulf/bash
in the days before the event!
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From beowulf-announce-admin at scyld.com Fri Nov 14 08:10:02 2003 From: beowulf-announce-admin at scyld.com (beowulf-announce-admin at scyld.com) Date: Fri, 14 Nov 2003 08:10:02 -0500 Subject: Your message to Beowulf-announce awaits moderator approval Message-ID: <200311141310.hAEDA2S11540@NewBlue.scyld.com> Your mail to 'Beowulf-announce' with the subject ClusterWorld Is being held until the list moderator can review it for approval. The reason it is being held: Post to moderated list Either the message will get posted to the list, or you will receive notification of the moderator's decision. From deadline at linux-mag.com Fri Nov 14 08:47:14 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 14 Nov 2003 08:47:14 -0500 (EST) Subject: ClusterWorld In-Reply-To: Message-ID: Hi everyone, I am happy to announce ClusterWorld a new magazine about clusters. If you are going to SC2003 come see us at booth 637 for a free copy and sign up for three free issues. You can also check it out on-line right now at: http://www.clusterworld.com Did I mention the four node Microway AMD Opteron cluster we are giving away. 
Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Nov 14 10:03:20 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 14 Nov 2003 07:03:20 -0800 Subject: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF1C@orsmsx402.jf.intel.com> From: Alvin Oga [deletia] > > if the latest kernel has no effect, than there's some other > serious hw problems ... timing issues ?? > - make sure the kernel is compiled for athlon and not p4 > and smp enabled > > - memory clock speeds, marginal memeory sticks > ( get rid of generic no-name-brand memory sticks > - swap memory sticks and see if the problem > follow the memory ( keep good track of it > so you can easily identify it if all the memory > was thrown on the floor all at the same time > Instead of guessing, try memtest86 at http://www.memtest86.com/memtest86-3.0.tar.gz The easiest way to use it is download the package, copy the pre-built binary to a floppy, and boot the node with the floppy. It will run until you stop it. This should point to a memory problem if it exists. Make sure you read the docs, as there are some Athlon-specific comments IIRC. -- David N. Lombard My comments represent my opinions, not those of Intel. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 11:11:42 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 11:11:42 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068826302.28176.22.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. Does it work when using Lam/MPI ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Fri Nov 14 10:58:10 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Fri, 14 Nov 2003 09:58:10 -0600 (CST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: Message-ID: Hi, This is very similar to problems we're seeing on our dual Athlon MP 2600+ cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. -Jim On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote: > We are having an ongoing issue with our compute cluster, running Scyld > 28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute > nodes and 1 master node. We are running the Navy's weather model. > > The problem: > The model runs fine when run on 4 processors (1 on each compute node). 
> However, when I use the SMP capabilities of the machine and try to run on, > say, 8 processors (using both CPUs on each compute node), everything will > run fine for a while. Then, at a non-consistent time, a node will > invariably freeze up. The cluster loses its connection to the > node and I cannot communicate with it using any of the cluster tools - > sometimes it will automatically reboot, but usually it requires me to go > perform a hard reset on the node. > > However, I have found that in most cases if I run 2 jobs in parallel (i.e. > 2 4-cpu processes, each using only 1 CPU on each node) things seem to work > fine. Nodes may still freeze from time to time but not nearly as often. > > The hardware: > The cluster was obtained pre-built from PSSC LabsEach compute node is a > dual-processor Tyan MB with 2 Athlon MP CPUS. They > are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 > Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We > are using the BeoMPI 1.0.7 implementation of MPICH compiled with: > --with-device=ch_p4 --with-comm=bproc > (note that I had to recompile BeoMPI with the PGI compiler to get it to > work with the model) > Again, we use Scyld Beowulf 28cz4 for the operating system > uname -a gives > Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 > i686 unknown > > _Please_ help if you have _any_ suggestions whatsoever. I am at the end > of my rope, and this is presenting a serious impediment to our research! > If you need more information, let me know and I will be happy to provide > it! > > Thanks... 
> > Tim Whitcomb > Meteorologist > University of Washington Applied Physics Lab > twhitcomb at apl.washington.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Fri Nov 14 11:19:28 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Fri, 14 Nov 2003 10:19:28 -0600 (CST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: <1068826302.28176.22.camel@roughneck.liniac.upenn.edu> Message-ID: On Fri, 14 Nov 2003, Nicholas Henke wrote: > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > Hi, > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > Does it work when using Lam/MPI ? I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam bproc-compatible now? Do you have some theory that the sockets code is somehow responsible (other than simply stressing the machine)? 
-Jim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 11:31:57 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 11:31:57 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 11:19, Jim Phillips wrote: > On Fri, 14 Nov 2003, Nicholas Henke wrote: > > > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > > Hi, > > > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > > > Does it work when using Lam/MPI ? > > I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam > bproc-compatible now? Do you have some theory that the sockets code is > somehow responsible (other than simply stressing the machine)? The 7.X tree of lam is bproc 'aware', and as far as why I would try it -- just another datapoint. It may help to isolate what exactly is causing the hangs. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. 
of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Nov 14 12:29:52 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Fri, 14 Nov 2003 12:29:52 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <3FB51110.9060002@scalableinformatics.com> Hi Jim: I commented to Timothy offline that I am seeing stability problems on my customers machines based upon Tyan 2466 MB's. Some success via MB replacement (after isolating subsystems through memory tests/exchange, IO loads, net loads,...). Some were CPU replacement, the CPUs seemed to be burned. Failure was very difficult to isolate, lots of symptoms, few were repeatable. Joe Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > -Jim > > > On Thu, 13 Nov 2003, Timothy R. Whitcomb wrote: > > >>We are having an ongoing issue with our compute cluster, running Scyld >>28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute >>nodes and 1 master node. We are running the Navy's weather model. >> >>The problem: >>The model runs fine when run on 4 processors (1 on each compute node). >>However, when I use the SMP capabilities of the machine and try to run on, >>say, 8 processors (using both CPUs on each compute node), everything will >>run fine for a while. Then, at a non-consistent time, a node will >>invariably freeze up. The cluster loses its connection to the >>node and I cannot communicate with it using any of the cluster tools - >>sometimes it will automatically reboot, but usually it requires me to go >>perform a hard reset on the node. 
>> >>However, I have found that in most cases if I run 2 jobs in parallel (i.e. >>2 4-cpu processes, each using only 1 CPU on each node) things seem to work >>fine. Nodes may still freeze from time to time but not nearly as often. >> >>The hardware: >>The cluster was obtained pre-built from PSSC LabsEach compute node is a >>dual-processor Tyan MB with 2 Athlon MP CPUS. They >>are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 >>Dual Port Server Cyclon rev 78 and the 3c59x kernel driver is used). We >>are using the BeoMPI 1.0.7 implementation of MPICH compiled with: >>--with-device=ch_p4 --with-comm=bproc >>(note that I had to recompile BeoMPI with the PGI compiler to get it to >>work with the model) >>Again, we use Scyld Beowulf 28cz4 for the operating system >>uname -a gives >>Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 >>i686 unknown >> >>_Please_ help if you have _any_ suggestions whatsoever. I am at the end >>of my rope, and this is presenting a serious impediment to our research! >>If you need more information, let me know and I will be happy to provide >>it! >> >>Thanks... 
>> >>Tim Whitcomb >>Meteorologist >>University of Washington Applied Physics Lab >>twhitcomb at apl.washington.edu >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Nov 14 12:00:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Fri, 14 Nov 2003 12:00:49 -0500 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> References: <1068827516.28176.24.camel@roughneck.liniac.upenn.edu> Message-ID: <1068829249.28174.26.camel@roughneck.liniac.upenn.edu> On Fri, 2003-11-14 at 11:31, Nicholas Henke wrote: > On Fri, 2003-11-14 at 11:19, Jim Phillips wrote: > > On Fri, 14 Nov 2003, Nicholas Henke wrote: > > > > > On Fri, 2003-11-14 at 10:58, Jim Phillips wrote: > > > > Hi, > > > > > > > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > > > > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > > > > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. > > > > > > Does it work when using Lam/MPI ? > > > > I don't know. We run (and write) NAMD using direct TCP sockets. Is Lam > > bproc-compatible now? 
Do you have some theory that the sockets code is > > somehow responsible (other than simply stressing the machine)? > > The 7.X tree of lam is bproc 'aware', and as far as why I would try it > -- just another datapoint. It may help to isolate what exactly is > causing the hangs. BTW -- what version of bproc + kernel are you running ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Fri Nov 14 14:02:11 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Fri, 14 Nov 2003 11:02:11 -0800 Subject: limiting cpu usage on grid engine Message-ID: <3FB526B3.2020509@cert.ucr.edu> Hi, I was wondering if anyone would know an easy way to limit the amount of cpu's a user can request when they run an mpich job under grid engine? I was hoping to set the limit to around 16, and if a user requests more than that, then I'd like their job to be rejected. Thanks in advance, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfaerber at penguincomputing.com Fri Nov 14 12:42:57 2003 From: nfaerber at penguincomputing.com (Nate Faerber) Date: 14 Nov 2003 09:42:57 -0800 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: References: Message-ID: <1068831777.12797.47.camel@m10.penguincomputing.com> > The problem: > The model runs fine when run on 4 processors (1 on each compute node). > However, when I use the SMP capabilities of the machine and try to run on, > say, 8 processors (using both CPUs on each compute node), everything will > run fine for a while. Then, at a non-consistent time, a node will > invariably freeze up. 
The cluster loses its connection to the
> node and I cannot communicate with it using any of the cluster tools -
> sometimes it will automatically reboot, but usually it requires me to go
> perform a hard reset on the node.
>
> However, I have found that in most cases if I run 2 jobs in parallel (i.e.
> 2 4-cpu processes, each using only 1 CPU on each node) things seem to work
> fine. Nodes may still freeze from time to time but not nearly as often.

Do you have experience running this software on other clusters with SMP?
I have seen a software package that did not perform well (or properly)
on SMP systems. We have a customer that could only run one process per
system. This limitation was known to the software vendor and customer.
It may not be the case now for that piece of software if it has matured
since then.

--
Nate Faerber, Engineer
Tel: 415-358-2666 Fax: 415-358-2646 Toll Free: 888-PENGUIN
PENGUIN COMPUTING
www.penguincomputing.com

From ajt at rri.sari.ac.uk Fri Nov 14 12:30:01 2003
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Fri, 14 Nov 2003 17:30:01 +0000
Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC
Message-ID: <3FB51119.2070506@rri.sari.ac.uk>

I've encountered problems using multiple 3C905CX-TXM NIC's in MSI KT3/4
AMD motherboards using the Scyld 3c59x kernel driver module with a Linux
2.4.22-openmosix1 kernel under Red Hat 8.0 Linux (updated by apt-get
from the Fedora Linux rpm repository).

Installing a single NIC is detected by kudzu, and it works correctly.
Sometimes, installing a second NIC works, but sometimes it compromises
the first NIC, sometimes the second NIC works. Similarly with a third
NIC...
The puzzling thing is that by selecting cards from a 'pool' of NIC's I bought for the cluster I can eventually get three that *will* work together. All these NIC's are brand new. The NIC's are connected to a Cisco switch and auto-negotiate with it. Using netdiag 'vortex-diag' I can verify that the NIC's are installed correctly and I can reset them using 'mii-diag -R' after which they re-negotiate with the switch and work correctly, but they do not work again from a cold boot (power off/on and reboot) without being reset like this manually with mii-diag. I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not tested them exhaustively (all are brand new) but this problem appears to be common to all of them. I wonder if anyone else has a similar problem? [I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same 3c59x driver+kernel on my original openMosix cluster but I didn't have problems like this!]. The present configuration of NIC's in my motherboards works correctly from a cold boot, but not with just 'any' 3C905CX-TXM NIC's fitted: KT4V-L (head node of 8-node 'bobcat' cluster 7 at GA-7ZXE's) eth0 3COM 3C905CX eth1 3COM 3C905CX eth2 3COM 3C905CX eth3 VIA "Rhine" on-board LAN adapter (disabled) KT4AV-L (head node of 24-node 'topcat' cluster 23 at KT3Ultra-2) eth0 3COM 3C905CX eth1 3COM 3C2000-T eth2 VIA "Rhine" on-board LAN adapter Although I've 'solved' the problem for the two head nodes, I've got another 23 systems to configure and I'd like to know if there is a work-around for this problem that will let me install 'any' NIC? Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. 
| fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Nov 14 15:06:23 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 14 Nov 2003 15:06:23 -0500 (EST) Subject: Beowulf Bash update: date is Wednesday 19th, 2003 at SC2003 Message-ID: My cut-n-paste from last year's text turned into a snarf-n-barf when I missed changing the date. The web site contains the accurate information, and has just been updated: http://www.beowulf.org/beowulf/bash Our links to pictures of past Bashes have faded with time. If anyone has on-line copies, let me know so that I can add them to the page. ________________ 5th Annual Extreme Beowulf Bash. WHEN: Wednesday November 19th / 6-8pm WHERE: Hyatt Regency Hotel, Phoenix The Atrium SPONSORS: AMD Penguin Computing Scyld Other sponsors are welcome. Contact jcarrington at scyld.com. For updates and additions check back at http://www.beowulf.org/beowulf/bash in the days before the event! 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Nov 14 16:11:06 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 14 Nov 2003 13:11:06 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) - memory In-Reply-To: Message-ID: hi ya jim On Fri, 14 Nov 2003, Jim Phillips wrote: > Hi, > > This is very similar to problems we're seeing on our dual Athlon MP 2600+ > cluster with Gigabyte GA-7DPXDW+ motherboards, Intel PRO/1000 MT Server > network cards, and Clustermatic 3 (on Red Hat 8). No solution, though. cpu/nic is very specific how about specific manufacturer of the memory ? ( and it does make a very big difference ) - mushkin, corsair, kingston would be amongst the first few vendors that we would start from - lei, century(?), couple other is the next tier - i would not use any of the rest of the vendors in any dual-cpu systems and more importantly, do you know if they used "brand new memory" or recycled memory from other dead/randomly crashing systems ( returned parts ) - systems usually worked, solved itself, when i bought brand new memory from the distributor or a pc store i trusted would sell me new parts vs returned parts thanx alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Nov 14 16:00:46 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 14 Nov 2003 13:00:46 -0800 (PST) Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: 
<187D3A7CAB42A54DB61F1D05F012572201D4BF1C@orsmsx402.jf.intel.com> Message-ID: hi ya On Fri, 14 Nov 2003, Lombard, David N wrote: > From: Alvin Oga > [deletia] > > > > if the latest kernel has no effect, than there's some other > > serious hw problems ... timing issues ?? > > - make sure the kernel is compiled for athlon and not p4 > > and smp enabled > > > > - memory clock speeds, marginal memeory sticks > > ( get rid of generic no-name-brand memory sticks > > - swap memory sticks and see if the problem > > follow the memory ( keep good track of it > > so you can easily identify it if all the memory > > was thrown on the floor all at the same time > > > Instead of guessing, try memtest86 at > http://www.memtest86.com/memtest86-3.0.tar.gz i have yet to see memtest find a failure thats real or pass the memory that works in a given system ... where you know the system crashes and yet moving the mem stick to another system give you identical failures or passes the "application running tests" other memory testors http://www.Linux-1U.net/Diags/#Mem http://www.Linux-1U.net/Memory/#Test > The easiest way to use it is download the package, copy the pre-built > binary to a floppy, and boot the node with the floppy. It will run > until you stop it. > > This should point to a memory problem if it exists. Make sure you read > the docs, as there are some Athlon-specific comments IIRC. 
have fun alvin From bogdan.costescu at iwr.uni-heidelberg.de Fri Nov 14 18:55:36 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sat, 15 Nov 2003 00:55:36 +0100 (CET) Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC In-Reply-To: <3FB51119.2070506@rri.sari.ac.uk> Message-ID: On Fri, 14 Nov 2003, Tony Travis wrote: > 2.4.22-openmosix1 kernel under Red Hat 8.0 Linux (updated by apt-get > from the Fedora Linux rpm repository). It's not clear what part of Fedora you are using now. Are you using kudzu from Fedora? It appears to create some problems with 3C905 cards; there are some bug reports in Red Hat's Bugzilla, but so far nothing conclusive. The only "solution" is to disable kudzu... > Installing a single NIC is detected by kudzu, and it works correctly. You can try deactivating kudzu ("chkconfig kudzu off") and run it manually only when adding cards. > The puzzling thing is that by selecting cards from a 'pool' of NIC's I > bought for the cluster I can eventually get three that *will* work > together. That's interesting. And after you find these 3 cards that work together, will they _always_ work, even after a reboot? > I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not > ... > [I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same My guess is that it's something related to ACPI. The GA-7ZXE didn't have support for it. 
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From ajt at rri.sari.ac.uk Sat Nov 15 11:46:54 2003 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Sat, 15 Nov 2003 16:46:54 +0000 Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC In-Reply-To: References: Message-ID: <3FB6587E.6020209@rri.sari.ac.uk> Bogdan Costescu wrote: > [...] > It's not clear what part of Fedora you are using now. Are you using kudzu > from Fedora? It appears to create some problems with 3C905 cards; there > are some bug reports in Red Hat's Bugzilla, but so far nothing conclusive. > The only "solution" is to disable kudzu... Hello, Bogdan. I originally installed RH8.0 from the 'Psyche' ISO distribution, then updated it periodically from RHN. Recently, I installed apt-get from the Fedora RH8.0 repository. I now update and upgrade from there instead. >>Installing a single NIC is detected by kudzu, and it works correctly. > > > You can try deactivating kudzu ("chkconfig kudzu off") and run it manually > only when adding cards. Tried that - makes no difference: these 3C905CX NIC's are failing to auto-negotiate with the Cisco switch at a low level, not failing to be detected and installed by kudzu. >[...] > That's interesting. And after you find these 3 cards that work together, > will they _always_ work, even after a reboot? Yes, that's right: once I have a 'set' of NIC's that work, they continue to work reliably. Even NIC's that don't initialise correctly on a cold boot will re-negotiate and connect properly if the transceiver is reset manually using "mii-diag -R". 
>>I have an MSI KT4V-L, KT4AV-L and 23 KT3Ultra-2 motherboards - I've not >>... >>[I put multiple NIC's in Gigabyte GA-7ZXE motherboards with the same > > > My guess is that it's something related to ACPI. GA-7ZXE didn't have > support it. We spent quite a while deciding which boards to use: MSI are recommended by AMD. I've no other complaint about the boards. I'm not sure what level of auto-configuration the NIC's are capable of at PC BIOS level. I'm using the 3COM 3c2000 Linux 2.4 driver for the 3C2000-T NIC and it doesn't negotiate with the switch until the Linux driver is loaded. The 3C905CX's appear to wake up as soon as the ATX PSU AC is powered on. The status LED is green, which indicates 10Base-T and the NIC's appear as 10Base-T on the Cisco switch display panel. The NIC's negotiate with the Cisco switch as soon as the motherboard power switch is pressed. NIC's that fail to auto-negotiate end up with a flashing amber LED. A steady amber LED is present on NIC's that work. This all happens before GRUB boots the system (i.e. it is done at BIOS level). When Linux boots, the cards are all seen but, as I described, sometimes they don't work. The cards appear to be started by the Linux kernel 'OK' and can be seen by ifconfig but if they have a flashing amber LED, they will not work until manually reset using "mii-diag -R". I thought the 3c59x driver would do something similar to initialise the NIC's instead of relying on the BIOS to do it or have I misunderstood the problem? Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. 
| fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From larry at pssclabs.com Sat Nov 15 14:01:53 2003 From: larry at pssclabs.com (Larry Lesser) Date: Sat, 15 Nov 2003 11:01:53 -0800 Subject: Scyld Nodes Freezing w/ SMP (fwd) In-Reply-To: Message-ID: <5.1.0.14.2.20031115110107.035e5698@mail.pssclabs.com> Tim: As this cluster has been running for over a year without any crashes, I would suspect that the hardware is fine. In general, the Tyan 2466 supports SMP applications fairly well. We have installed many Beowulfs using the Tyan 2466 without any SMP issues. However, most customers use Redhat. Have you tried running the model with both processors only on the head node ? If that fails, you may want to install a current version of Red Hat and see if that works better. Larry At 05:51 PM 11/13/2003 -0800, you wrote: >We are having an ongoing issue with our compute cluster, running Scyld >28cz4. It's a 5-node cluster (each node is dual-processor) with 4 compute >nodes and 1 master node. We are running the Navy's weather model. > >The problem: >The model runs fine when run on 4 processors (1 on each compute node). >However, when I use the SMP capabilities of the machine and try to run on, >say, 8 processors (using both CPUs on each compute node), everything will >run fine for a while. Then, at a non-consistent time, a node will >invariably freeze up. The cluster loses its connection to the >node and I cannot communicate with it using any of the cluster tools - >sometimes it will automatically reboot, but usually it requires me to go >perform a hard reset on the node. > >However, I have found that in most cases if I run 2 jobs in parallel (i.e. >2 4-cpu processes, each using only 1 CPU on each node) things seem to work >fine. Nodes may still freeze from time to time but not nearly as often. 
> >The hardware: >The cluster was obtained pre-built from PSSC Labs. Each compute node is a >dual-processor Tyan MB with 2 Athlon MP CPUs. They >are also equipped with 2 on-board NICs (lspci gives them as 3com 3c982 >Dual Port Server Cyclone rev 78 and the 3c59x kernel driver is used). We >are using the BeoMPI 1.0.7 implementation of MPICH compiled with: >--with-device=ch_p4 --with-comm=bproc >(note that I had to recompile BeoMPI with the PGI compiler to get it to >work with the model) >Again, we use Scyld Beowulf 28cz4 for the operating system >uname -a gives >Linux nashi 2.4.17-0.18.18_Scyldsmp #1 SMP Thu Jul 11 19:26:54 EDT 2002 >i686 unknown > >_Please_ help if you have _any_ suggestions whatsoever. I am at the end >of my rope, and this is presenting a serious impediment to our research! >If you need more information, let me know and I will be happy to provide >it! > >Thanks... > >Tim Whitcomb >Meteorologist >University of Washington Applied Physics Lab >twhitcomb at apl.washington.edu Larry Lesser 949-380-7288 www.pssclabs.com From fredruopp at yahoo.com Sun Nov 16 10:44:01 2003 From: fredruopp at yahoo.com (Fred Ruopp) Date: Sun, 16 Nov 2003 07:44:01 -0800 (PST) Subject: Optimal SMP Stucture for Opteron Message-ID: <20031116154401.19933.qmail@web60309.mail.yahoo.com> In order to build a 16 to 32 processor Opteron machine without corporate resources, a high-performance and economical approach would seem to be a cluster of quad-processor motherboards interconnected by InfiniBand host channel adapters (a la SBS Technologies) or, possibly, a less expensive RemoteDMA data 
transfer PCI card. This approach stems from the little I know of the Opteron memory model; it seems that the Opteron leans towards NUMA memory management in an SMP system with more than 8 CPU's. Many have opined that Opteron's current HyperTransport bus becomes saturated with 8 CPU's on one board. The locality of each CPU's memory seems to fit a NUMA model best, and more so as the number of CPU's rises. SGI's Altix has an approach somewhat similar to this. One distinctive feature that SGI has added to the Linux kernel to empower their NUMA model is process affinity - linking a process to one CPU (or a group of CPU's). If someone with an intimate knowledge of NUMA could critique this general approach for an SMP Opteron system, I would appreciate it greatly. From laytonjb at comcast.net Sun Nov 16 11:11:26 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Sun, 16 Nov 2003 11:11:26 -0500 Subject: Optimal SMP Stucture for Opteron In-Reply-To: <20031116154401.19933.qmail@web60309.mail.yahoo.com> References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> Message-ID: <3FB7A1AE.5020307@comcast.net> Fred, I think the first question to answer is: what do you want to do with the cluster? In other words, what are your applications? Also, what do you mean by 'without corporate resources'? If you can start filling in the answers to these questions, it becomes a little easier to give advice (although that has never stopped me before) :) Let me embellish a little. 
If your application doesn't require much network bandwidth or if network latency is not important, then you can consider a slower, cheaper network such as FastE or GigE (GigE is pretty cheap for smaller systems right now, or you can use the smaller GigE switches in some kind of tree arrangement). Even if bandwidth and latency are important to some degree, you could also start looking at dual Opteron boxes instead of 4-way or 8-way boxes. This may get you the performance you need or may have a better price/performance ratio. One last comment. This next week is SC2003, so many of the regular posters to this list won't be posting much. So don't be surprised if you don't get many answers to your questions right away. However, in the meantime, I think I can safely say that if you start thinking about the answers to these first few questions, you are more likely to get concrete answers. Jeff P.S. And to all of you folks going to SC2003 while I sit at work sucking on the glass teat, green with envy, the lot of you are all bastards! Bastards I say! Where's the love? I need T-shirts! > In order to build a 16 to 32 processor Opteron >machine >without corporate resources, a high-performance and >economical approach would seem to be a cluster of quad-processor >motherboards interconnected by InfiniBand host channel >adapters (a la SBS Technologies) or, possibly, a less >expensive RemoteDMA data transfer PCI card. > > This approach stems from the little I know of the >Opteron memory model; it seems that the Opteron leans >towards NUMA memory management in an SMP system with >more than 8 CPU's. Many have opined that Opteron's >current HyperTransport bus becomes saturated with 8 >CPU's on one board. The locality of each CPU's memory >seems to fit a NUMA model best, >and more so as the number of CPU's rises. > > SGI's Altix has an approach somewhat similar to >this. 
One distinctive feature that SGI has added to >the Linux kernel to empower their NUMA model is >process affinity - linking a process to one (or a >group) of CPU's. > > If someone with an intimate knowledge of NUMA could >critique this general approach for a SMP Opteron >system, I would appreciate it greatly. > > > >__________________________________ >Do you Yahoo!? >Protect your identity with Yahoo! Mail AddressGuard >http://antispam.yahoo.com/whatsnewfree >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Nov 16 20:18:01 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 17 Nov 2003 09:18:01 +0800 (CST) Subject: top500 list (was: opteron VS Itanium 2) Message-ID: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com> Sorry Greg, top500 list came out and you lost! BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops Apple BigMac is number 3, while the Opteron cluster is number 6. Also, the 1936-CPU IA64 cluster is at the 5th place, at 8.6 TFlops. http://www.top500.org/dlist/2003/11/ Andrew. > > This would place the Big Mac in the 3rd place on > > the top500 list > > Except that there are several other new large > clusters that will > likely place higher -- LANL announced a 2,048 cpu > Opteron cluster a > while back, and LLNL has something new, too, I > think. Comparing > yourself to the obsolete list in multiple press > releases isn't very clever. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Nov 17 02:01:11 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Sun, 16 Nov 2003 23:01:11 -0800 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com> References: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <20031117070111.GB18073@greglaptop.greghome.keyresearch.com> On Mon, Nov 17, 2003 at 09:18:01AM +0800, Andrew Wang wrote: > Sorry Greg, top500 list came out and you lost! 'tis true, I did lose. It is nice to see a bunch of new clusters in the top 10. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Nov 17 01:59:39 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Sun, 16 Nov 2003 22:59:39 -0800 Subject: Optimal SMP Stucture for Opteron In-Reply-To: <3FB7A1AE.5020307@comcast.net> References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <3FB7A1AE.5020307@comcast.net> Message-ID: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com> On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote: > P.S. And to all of you folks going to SC2003 while I sit at > work sucking on the glass teet, green with envy, the lot of you > are all bastards! Bastards I say! Where's the love? I need T-shirts! We'll drink a beer on your behalf, before returning home with piles and piles of loot. Unfortunately, Yotta Yotta is kaput, so no more of the cute fuzzy orange cubes. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Mon Nov 17 02:36:02 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Mon, 17 Nov 2003 01:36:02 -0600 Subject: Optimal SMP Stucture for Opteron In-Reply-To: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com> References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <3FB7A1AE.5020307@comcast.net> <20031117065939.GA18073@greglaptop.greghome.keyresearch.com> Message-ID: <3FB87A62.3000406@tamu.edu> and all the TAMU folks going are gonna snag the toys for themselves! I'm helping plan their Grid Network for the State and I didn't even get a lousy T-shirt! This work stuff is getting in the way of all the cool meetings. gerry Greg Lindahl wrote: > On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote: > > >>P.S. And to all of you folks going to SC2003 while I sit at >>work sucking on the glass teet, green with envy, the lot of you >>are all bastards! Bastards I say! Where's the love? I need T-shirts! > > > We'll drink a beer on your behalf, before returning home with piles > and piles of loot. Unfortunately, Yotta Yotta is kaput, so no more of > the cute fuzzy orange cubes. 
> > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 17 06:23:13 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 17 Nov 2003 12:23:13 +0100 (CET) Subject: ClusterWorld In-Reply-To: Message-ID: Three month trial only valid within the US :-( If I cried and pleaded, adn said I regularly bought Linux Magazine retail in Borders in the UK could I get a sample copy? Hope everyone is having a good time at SC. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Nov 17 07:24:32 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 17 Nov 2003 07:24:32 -0500 (EST) Subject: Optimal SMP Stucture for Opteron In-Reply-To: <20031117065939.GA18073@greglaptop.greghome.keyresearch.com> Message-ID: On Sun, 16 Nov 2003, Greg Lindahl wrote: > On Sun, Nov 16, 2003 at 11:11:26AM -0500, Jeffrey B. Layton wrote: > > > P.S. And to all of you folks going to SC2003 while I sit at > > work sucking on the glass teet, green with envy, the lot of you > > are all bastards! Bastards I say! Where's the love? I need T-shirts! > > We'll drink a beer on your behalf, before returning home with piles > and piles of loot. 
Another reminder: the Beowulf Bash is Wednesday night, and we'll have some interesting things to announce. Jeff, if it makes you feel any better, T-shirts were way down last year. And for most of them you had to sit through a sales presentation. And we'll wait until after the show to gloat about how cool the swag was this year ;-> > Unfortunately, Yotta Yotta is kaput, so no more of > the cute fuzzy orange cubes. Doh! I _loved_ the sound clip they had! "Yotta Yotta" really fast and high pitched. My cubes disappeared as soon as they got back to the office. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Nov 17 08:15:34 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 17 Nov 2003 21:15:34 +0800 (CST) Subject: limiting cpu usage on grid engine Message-ID: <20031117131534.38200.qmail@web16807.mail.tpe.yahoo.com> You can send questions to the mailing lists hosted on the project homepage: http://gridengine.sunsource.net Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Mon Nov 17 10:20:49 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Mon, 17 Nov 2003 07:20:49 -0800 (PST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: Message-ID: <20031117152049.99076.qmail@web11406.mail.yahoo.com> --- Mark Hahn wrote: > and their yield is around 59%. not to mention the little bit > of missing ECC... I didn't follow, which "yield" are you refering to?? > I'd like very much to know the actual prices and discounts for Big > Mac. > it's a shame this isn't required for Top500... The price is around 1100*3000*(discount for edu) + cost interconnect 5.2M $ is not too far away from the "actual price"... Rayson __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 09:33:51 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 09:33:51 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117011801.14899.qmail@web16803.mail.tpe.yahoo.com> Message-ID: > Sorry Greg, top500 list came out and you lost! sigh. Apple bought a benchmark. does this make Apple products better? > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops OK, so we already know that their pricing disclosures are ah, "optimistic". (nodes probably cost >6k apiece, not half that) and their yield is around 59%. not to mention the little bit of missing ECC... 
actually, I'd love to know what their yield is if they had only 4GB per node - almost certainly lower. > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops yield is 72%. > Apple BigMac is number 3, while the Opteron cluster is > number 6. for the billionth time: rmax is just a matter of how much money you have. rmax/rpeak is the only part of top500 that matters. I'd like very much to know the actual prices and discounts for Big Mac. it's a shame this isn't required for Top500... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Mon Nov 17 10:28:10 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: Mon, 17 Nov 2003 09:28:10 -0600 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <1069082890.2659.69.camel@terra> On Mon, 2003-11-17 at 08:33, Mark Hahn wrote: > > Sorry Greg, top500 list came out and you lost! > > sigh. Apple bought a benchmark. does this make Apple products better? > Virginia bought a benchmark, not Apple. According to the reports, he paid list for them. Course he got extra goodies as a result of buying so many, but I suspect that behavior would occur with any vendor. BTW, Yes they are pretty damned good. Running them head to head against other machines, even without benchmarking heroics, they stand up quite well. One Amber benchmark that a colleague ran showed that a 2Ghz G5 was nearly twice as fast as a 2.8Ghz Xeon. Thats one datapoint, but other testing has shown that it continues to do pretty well. We can't do real apple-to-apples comparisons, no pun intended, because we only have gigabit on the G5's and things like Amber and Gromacs seem to run into that pretty quickly. > > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops > > OK, so we already know that their pricing disclosures are ah, > "optimistic". 
(nodes probably cost >6k apiece, not half that) > and their yield is around 59%. not to mention the little bit > of missing ECC... I'm sure that the infiniband figures into the discrepancy a bit. $3k is roughly the list for a stock box. > > actually, I'd love to know what their yield is if they had only > 4GB per node - almost certainly lower. > > > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops > > yield is 72%. > > > Apple BigMac is number 3, while the Opteron cluster is > > number 6. > > for the billionth time: rmax is just a matter of how much money > you have. rmax/rpeak is the only part of top500 that matters. Perhaps for shallow people who put stock in such lists. What really matters is how much of that peak can be applied to actual computationally intense problems that the owner considers needing solved. It doesn't matter how fast the Ferrari can go, as the speed limit is still only 70. > > I'd like very much to know the actual prices and discounts for Big Mac. > it's a shame this isn't required for Top500... > Does Virginia have to include the 600-700 pizzas that they bought for the volunteers during the construction phase. ;-) Perhaps they should also have something like watts per gigaflop, or cubic feet occupied per gigaflop. -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 11:28:25 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 11:28:25 -0500 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB8F729.3080709@lmco.com> Mark Hahn wrote: > > Sorry Greg, top500 list came out and you lost! > > sigh. Apple bought a benchmark. does this make Apple products better? 
> > > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops > > OK, so we already know that their pricing disclosures are ah, > "optimistic". (nodes probably cost >6k apiece, not half that) > and their yield is around 59%. not to mention the little bit > of missing ECC... > I went through a straw-man on pricing at one time. Let me dig that up.... I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of memory, and about 80 Gigs of storage for $3k (commercial pricing in single units). So, for laughs, let's assume $3k per box for the 4 Gigs of memory and the storage. Node cost: 1100 x $3k = $3.3 million For the IB network, I've been using $1500-$1600 per node based on quotes I've gotten from other companies. IB cost: 1100 x $1500 = $1.65 million For the Cisco network, I have no idea. Why anybody would use Cisco crap in a HPC system even for a management network is beyond me. So, let's just guess, $300k. Cisco network cost: $300k Total so far: $5.25 million Rack cost - again I have no idea. I'll be kind and gentle and guesstimate about $1500 a rack. It looks like they're getting about 12 nodes per rack, so assume 92 racks. Rack cost: $1500 x 92 = $138k Let's exclude the floor space, windows, pizzas, chillers, etc. and figure out the total: Total = $5.388 million I guess I'm not too far off. Personally I think the big unknown is the rack cost. That could be very expensive since it's specialized (although 92 racks in a single sale might be considered a commodity). Also, the Cisco costs could be high as well (Cisco never does anything that can't make money off of). This was just for laughs. I still think there is a sugar daddy somewhere in there. Be it Cisco, Apple, IBM, etc., there are some costs not being mentioned. Jeff -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 11:50:14 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 11:50:14 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8F729.3080709@lmco.com> Message-ID: > I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of > memory, and about 80 Gigs of storage for $3k (commercial > pricing in single units). So, for laughs, let's assume $3k per box > for the 4 Gigs of memory and the storage. I did the pricing too. afaikt, it's actually 8GB per node, so the price was just under $5900 list. I'd guess that the IB hardware is at least $1500 list. > Node cost: 1100 x $3k = $3.3 million around 8M list. > Total = $5.388 million it's a very good price, no doubt. it would be nice if Top500 would require full price disclosures - for instance, could I take a same-sized pile of cash to Apple/Mellanox and get the same discount? I doubt it. besides price, lack of ECC is a big question. how many other Top500 scores are ECC-less? does anyone know the FIT rate for dram nowadays? I figure BigMac has at either 7e4 or 1.4e5 dram chips... how much does it help your HPL score to run 4GB/cpu? I'd guess that most clusters are lighter than that. regards, mark hahn. 
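For anyone who wants to replay the straw-man arithmetic quoted above, here is a rough Python sketch. Every number in it is a guess taken from this thread (1100 nodes, $3k or roughly $5900 per node, $1500 per node for IB, $300k for the Cisco network, $1500 per rack, about 12 nodes per rack), not a disclosed price, and the function names are made up for illustration. The DRAM chip count assumes 8 DIMMs per node with either 8 or 16 chips per DIMM; that is only one reading that happens to reproduce the 7e4 and 1.4e5 figures, not something stated in the thread.

```python
import math

# Back-of-envelope cost model for the "Big Mac" estimates in this thread.
# All defaults are guesses quoted by posters, not published prices.
NODES = 1100            # dual-CPU G5 boxes (2200 CPUs / 2)
NODES_PER_RACK = 12     # "about 12 nodes per rack"


def rack_count(nodes=NODES, per_rack=NODES_PER_RACK):
    """Number of racks needed, rounding up a partly filled rack."""
    return math.ceil(nodes / per_rack)


def cluster_cost(node_price, ib_per_node=1500, mgmt_network=300_000,
                 rack_price=1500):
    """Return (subtotal_without_racks, total_with_racks) in dollars."""
    subtotal = NODES * node_price + NODES * ib_per_node + mgmt_network
    return subtotal, subtotal + rack_count() * rack_price


def dram_chips(dimms_per_node=8, chips_per_dimm=8):
    """Total DRAM chips in the machine (assumed DIMM layout, see above)."""
    return NODES * dimms_per_node * chips_per_dimm


if __name__ == "__main__":
    for label, price in (("$3k/node guess", 3000), ("~$5900/node list", 5900)):
        subtotal, total = cluster_cost(price)
        print(f"{label}: subtotal ${subtotal:,}, with racks ${total:,}")
    print("DRAM chips:", dram_chips(chips_per_dimm=8),
          "or", dram_chips(chips_per_dimm=16))
```

With the $3k per node guess this gives back the $5.25 million subtotal and the $5.388 million total from the straw-man; swapping in the list price only changes the node term, which is where the two estimates in the thread differ.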
From ctierney at hpti.com Mon Nov 17 10:06:40 2003 From: ctierney at hpti.com (Craig Tierney) Date: 17 Nov 2003 08:06:40 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <1069081600.14602.57.camel@localhost.localdomain> On Mon, 2003-11-17 at 07:33, Mark Hahn wrote: > > Sorry Greg, top500 list came out and you lost! > > sigh. Apple bought a benchmark. does this make Apple products better? Dell bought a benchmark at NCSA (see Tungsten at #4). Does this mean that Dell knows clusters? I would be more than happy to have a vendor decide to let me have their stuff at a significant discount so they can have a good benchmark. > > > BigMac G5 (aka "X") (2200 CPUs) - 10.3 TFlops > > OK, so we already know that their pricing disclosures are ah, > "optimistic". (nodes probably cost >6k apiece, not half that) > and their yield is around 59%. not to mention the little bit > of missing ECC... I am really curious to know why they are only getting 59% efficiency when a Quadrics system with a similar node count is above 70%. We know it isn't an issue with the node speed, because the X cluster ran at 80% efficiency with 128 nodes. What's going on with the Infiniband? > > actually, I'd love to know what their yield is if they had only > 4GB per node - almost certainly lower. > > > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops > > yield is 72%. How much was this system? You sure Linux Networx didn't 'buy' the business? It seems to be happening a lot these days with large systems. Craig > > > Apple BigMac is number 3, while the Opteron cluster is > > number 6. > > for the billionth time: rmax is just a matter of how much money > you have. rmax/rpeak is the only part of top500 that matters.
> Yes but besides to politicians and the people funding the systems, how relevant is the rmax/rpeak ratio? Craig > I'd like very much to know the actual prices and discounts for Big Mac. > it's a shame this isn't required for Top500... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 11:54:42 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 11:54:42 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117152049.99076.qmail@web11406.mail.yahoo.com> Message-ID: > > and their yield is around 59%. not to mention the little bit > > of missing ECC... > > I didn't follow, which "yield" are you refering to?? rmax/rpeak. it's really the most interesting part of the list; the actual ranking is just a matter of funding. > > I'd like very much to know the actual prices and discounts for Big > > Mac. > > it's a shame this isn't required for Top500... > > The price is around 1100*3000*(discount for edu) + cost interconnect 3000 would be an impressive discount, since the list is around $5900. > 5.2M $ is not too far away from the "actual price"... it's massively discounted, and probably not repeatable. it would be nice to know the best-published prices, as well as what it would cost to "fix" the ECC. regards, mark hahn. 
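For anyone who wants to check the "yield" numbers, a minimal sketch in Python. It assumes the per-CPU peak figures used elsewhere in this thread (8 GFlops for the 2.0 GHz G5, 4 GFlops for the Opteron); the function name is made up for illustration.

```python
def yield_fraction(rmax_tflops, cpus, peak_gflops_per_cpu):
    """rmax/rpeak, with rpeak computed as cpus * per-CPU peak."""
    rpeak_tflops = cpus * peak_gflops_per_cpu / 1000.0
    return rmax_tflops / rpeak_tflops


# Figures quoted in the thread; per-CPU peaks are the assumed 8 and 4 GFlops.
big_mac = yield_fraction(10.3, 2200, 8.0)    # about 0.585, i.e. "around 59%"
lightning = yield_fraction(8.1, 2816, 4.0)   # about 0.719, i.e. "72%"
```

Note that the result depends entirely on what per-CPU peak you accept, which is the crux of the argument about counting multiply-adds.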
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Mon Nov 17 10:08:41 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Mon, 17 Nov 2003 07:08:41 -0800 Subject: FW: Scyld Nodes Freezing w/ SMP (fwd) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF2C@orsmsx402.jf.intel.com> From: Alvin Oga [mailto:alvin at Mail.Linux-Consulting.com] > > Instead of guessing, try memtest86 at > > http://www.memtest86.com/memtest86-3.0.tar.gz > > i have yet to see memtest find a failure thats real > or pass the memory that works > in a given system ... where you know the system crashes > > and yet moving the mem stick to another system give you > identical failures or passes the "application running tests" > Hmmm. My experiences were consistently different, i.e., consistently useful. In a previous job, I made it a standard boot option on installed systems so that customers could just boot right into it, including via PXE boot, and save us a diagnostic visit to only find out new memory was needed. > other memory testors s/testors/testers/ > http://www.Linux-1U.net/Diags/#Mem > http://www.Linux-1U.net/Memory/#Test Thanks for the pointers, I'll check them out! -- David N. Lombard My comments represent my opinons, not those of Intel. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 12:18:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 12:18:32 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8F729.3080709@lmco.com> Message-ID: On Mon, 17 Nov 2003, Jeff Layton wrote: > Let's exclude the floor space, windows, pizzas, chillers, etc. 
> and figure out the total: > > Total = $5.388 million > > I guess I'm not too far off. Personally I think the big unknown > is the rack cost. That could be very expensive since it's specialized > (although 92 racks in a single sale might be considered a commodity). > Also, the Cisco costs could be high as well (Cisco never does anything > that can't make money off of). Don't leave out the wiring and chillers if you're going to include the racks and cisco stuff -- 1100 nodes burning (at an as you say humorous but generous guess) 250 Watts apiece is, um, 275,000 watts. As in forgetting the capital costs of the chillers and wiring, just buying the power to run this puppy for a year will cost around $275K/year (more than the racks themselves). The cost for the chillers, blowers, transformers, and primary wiring infrastructure to actually move this power in and waste heat out of their space will likely add a pretty big chunk to the total. Perhaps 180 20 amp circuits? A chiller the size of a small destroyer? Their own nuclear power plant (just kidding:-)? So add another seven digit number to the above, at a guess...;-) The pizza I agree is free... rgb > > > This was just for laughs. I still think there is a sugar daddy > somewhere in there. Be it Cisco, Apple, IBM, etc., there are some > costs not being mentioned. > > Jeff > > > -- > Dr. Jeff Layton > Aerodynamics and CFD > Lockheed-Martin Aeronautical Company - Marietta > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Mon Nov 17 11:53:25 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 17 Nov 2003 09:53:25 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069081600.14602.57.camel@localhost.localdomain> References: <1069081600.14602.57.camel@localhost.localdomain> Message-ID: <1069088005.8428.1185.camel@thinkpad> > I am really curious to know why their are only > getting 59% efficiency when Quadrics system with > similar node counts is above 70%. We know it > isn't an issue with the node speed, because > the X cluster ran at 80% efficiency with 128 nodes. > > Whats going on with the Infiniband????? It's not just the Infiniband, a lot of it is the processor. Actually, it's the artificially inflated FLOPs of the processor. Everyone who believes that a G5 should be rated at 8 GFLOPs, please speak up... Multiply-adds are great for some stuff, but not everything... (LINPACK happens to be among the stuff it's pretty good for). Keith _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 12:01:38 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 12:01:38 -0500 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB8FEF2.9090009@lmco.com> Mark Hahn wrote: > > I'm seeing dual 2.0 G5 boxes with the Superdrive, 512 Megs of > > memory, and about 80 Gigs of storage for $3k (commercial > > pricing in single units). So, for laughs, let's assume $3k per box > > for the 4 Gigs of memory and the storage. 
> > I did the pricing too. afaikt, it's actually 8GB per node, > so the price was just under $5900 list. I'd guess that the IB > hardware is at least $1500 list. > From the pdf at the VTech website, it's 4 GB per node. (http://don.cc.vt.edu/tcfslides.pdf) > > Node cost: 1100 x $3k = $3.3 million > > around 8M list. > > > Total = $5.388 million > > it's a very good price, no doubt. it would be nice if Top500 > would require full price disclosures > It would be interesting. I'm sure a number of the clusters in there were 'bought' by the vendor. Of course, they make their money back by screwing you later on with really high maintenance fees. I know people like to get hardware for free (I've had arguments with people about this), but any time it looks like a vendor is offering something for nothing, the little hairs on the back of my neck stand up and I reach over and grab my ankles. Of course, management seldom sees things the same way. They are very short-term focused. :) > how much does it help your HPL score to run 4GB/cpu? I'd guess that most > clusters are lighter than that. > It depends. You can process more data per node, so you are cutting down on communications. But if I remember the rules correctly, you can pick whatever size problem you want. Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta
From hahn at physics.mcmaster.ca Mon Nov 17 12:37:37 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 12:37:37 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069088211.8428.1193.camel@thinkpad> Message-ID: > > > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops > > > > yield is 72%. > > BTW, 60% of 8 GF == 4.8 GF per processor. 72% of 4 GF == 2.88.
If you > use LINPACK as a metric, why do you think the latter wins? because I see rmax/rpeak as being a sort of "balance-like" measure. it's also scale-invariant, to the first order at least. within the same category of hardware (say, desktop microprocessors and a premium but off-the-shelf interconnect), rmax/rpeak is interesting, since $/cpu is very roughly comparable. > > for the billionth time: rmax is just a matter of how much money > > you have. rmax/rpeak is the only part of top500 that matters. > > You have to include cost. I assume that if top500 reported prices, they'd be fairly wonky. for instance, in the DB domain, are $/TPC numbers all that useful? > Or, put another > way, a vendor would be better off building a slower processor with a > modern memory system that achieved 95% of peak. yes, you've just described a trad vector box. looking at rmax/rpeak is indeed a "vector super-ness" measure. > You can always put more > of them together with more money, right? right, which is why I want to somehow regress scale out of the measure. > (I'm not sure if I know of any > networks that scale to 100,000 processors). grid ;) > rmax/rpeak is just as bad (or worse) of a metric as rmax if it is the > only metric. It's not like LINPACK is terribly communication bound or > anything (in which case, rmax/rpeak might mean something). I wish I had a 1K CPU cluster with gigabit, Myri, Quadrics, *and* IB ;) squinting at top500, it looks like there is a fairly significant dependence of rmax/rpeak upon type of interconnect. that's quite interesting.
From kdunder at sandia.gov Mon Nov 17 11:56:51 2003 From: kdunder at sandia.gov (Keith D.
Underwood) Date: 17 Nov 2003 09:56:51 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <1069088211.8428.1193.camel@thinkpad> > > Lightning (LANL Opteron) (2816 CPUs) - 8.1 TFlops > > yield is 72%. BTW, 60% of 8 GF == 4.8 GF per processor. 72% of 4 GF == 2.88 GF. If you use LINPACK as a metric, why do you think the latter wins? (note the if. I'm not suggesting LINPACK is the right benchmark.) > for the billionth time: rmax is just a matter of how much money > you have. rmax/rpeak is the only part of top500 that matters. You have to include cost. Otherwise, if I could buy a 1 PetaFLOP system that yielded 10% efficiency for the same price as a 100 TeraFLOP system that yielded 75% efficiency, I should buy the latter? Or, put another way, a vendor would be better off building a slower processor with a modern memory system that achieved 95% of peak. You can always put more of them together with more money, right? (I'm not sure if I know of any networks that scale to 100,000 processors). rmax/rpeak is just as bad (or worse) of a metric as rmax if it is the only metric. It's not like LINPACK is terribly communication bound or anything (in which case, rmax/rpeak might mean something). Keith
From jeffrey.b.layton at lmco.com Mon Nov 17 13:09:51 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 13:09:51 -0500 Subject: top500 list In-Reply-To: <3FB90E52.6020204@lmco.com> References: <3FB90E52.6020204@lmco.com> Message-ID: <3FB90EEF.80206@lmco.com> Of course, I forgot to adjust for the memory difference. According to Mark, the 4G dual boxes on the Apple store are $5349 each. Nodes: 1100 x $5349 = $5.884 million Which brings the total to: $7.972 million. Sorry about the confusion.
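The running estimate is easy to re-derive. A quick sketch in Python, using only the guesses quoted in this thread (1100 nodes, $1500 per node for IB, $300k for the Cisco network, 92 racks at $1500); the function and parameter names are hypothetical.

```python
def cluster_cost(nodes, node_price, ib_per_node, mgmt_network, racks, rack_price):
    """Hardware-only estimate in dollars: nodes + IB + management network + racks.
    Floor space, power, cooling and staff are deliberately excluded."""
    return (nodes * node_price
            + nodes * ib_per_node
            + mgmt_network
            + racks * rack_price)


# Original straw-man: $3k nodes.
strawman = cluster_cost(1100, 3000, 1500, 300_000, 92, 1500)   # 5_388_000
# Adjusted for 4 GB per node at Apple store pricing.
adjusted = cluster_cost(1100, 5349, 1500, 300_000, 92, 1500)   # 7_971_900
```

Changing only the node price moves the total from $5.388 million to about $7.972 million, so the node price dominates every other guess in the estimate.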
Jeff > > > > > So, Mark's numbers are correct. So my 'adjusted' estimate is, > > Nodes: 1100 x $8k = $8.8 million > IB network: $1.65 million > Cisco Crap^h^h^h^hNetwork: $300k > Racks: $138k > > Total: $10.889 million > > Who's your Sugar Daddy VTech? > > > > Thanks! > > Jeff > > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Nov 17 13:02:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 17 Nov 2003 13:02:30 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8FEF2.9090009@lmco.com> Message-ID: > From the pdf at the VTech website, it's 4 GB per node. > > (http://don.cc.vt.edu/tcfslides.pdf) hmm, you're right (I found 4GB/node in multiple sources). that means that the current list price is $5349, rather than $8k. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 17 13:07:14 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 17 Nov 2003 13:07:14 -0500 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB90E52.6020204@lmco.com> Mark Hahn wrote: > > > > > I'd like very much to know the actual prices and discounts for > Big > > > > > Mac. > > > > > it's a shame this isn't required for Top500... > > > > > > > > The price is around 1100*3000*(discount for edu) + cost > interconnect > > > > > > 3000 would be an impressive discount, since the list is around $5900. 
> > > > > > > > http://www.microcenter.com/single_product_results.phtml?product_id=0161922 > > > when I view that page, it lists $3k for the dual 2.0 with 512M, > which is exactly what store.apple.com says. > > the VT config is with 8GB per box, which store.apple.com says will > list at $7949! wow, that has to be higher than last time I looked... > Wow! Apple is charging $5k to go to 8 Gigs! What a rip! It's just plain-jane DDR memory that you can get anywhere. It's not even ECC! If you don't mind, I'm cc-ing this to the beowulf list. So, Mark's numbers are correct. So my 'adjusted' estimate is, Nodes: 1100 x $8k = $8.8 million IB network: $1.65 million Cisco Crap^h^h^h^hNetwork: $300k Racks: $138k Total: $10.889 million Who's your Sugar Daddy VTech? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kdunder at sandia.gov Mon Nov 17 13:04:45 2003 From: kdunder at sandia.gov (Keith D. Underwood) Date: 17 Nov 2003 11:04:45 -0700 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <1069092285.8431.1295.camel@thinkpad> > because rmax/rpeak as being a sort of "balance-like" measure. > it's also scale-invariant, to the first order at least. > > within the same category of hardware (say, desktop microprocessors and a > premium but off-the-shelf interconnect), rmax/rpeak is interesting, > since $/cpu are very roughly comparable. But the rpeaks vary by a factor of 2 or more... > > You can always put more > > of them together with more money, right? > > right, which is why I want to somehow regress scale out of the measure. no - that was sarcasm. 
At 10,000 processors it is hard enough to build a box that will stay up long enough to do a useful amount of work with apps that run across all of the nodes. At 100,000 processors, today, it is pretty close to impossible. And that is IF you can get your app to scale that well. Big IF. (yes, monte carlo simulations can probably scale that high. Yes, you could probably build fault tolerant monte carlo simulations. Yes, it would be nice to run something other than monte carlo on the machine.) Keith _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 12:05:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 12:05:35 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <1069082890.2659.69.camel@terra> Message-ID: On Mon, 17 Nov 2003, Dean Johnson wrote: > Perhaps they should also have something like watts per gigaflop, or > cubic feet occupied per gigaflop. I actually think that it would be very interesting to plot watts/flop over time, for integrated systems (not just "the CPU", but CPU and whole box supporting). My personal theory is that it is actually decreasing, on average, because of interactions between Moore's Law scaling of CPU speed and CPU power consumption and because of the cost of feeding the REST of the system, which tends to hang nearly constant (effectively dividing it as the relative power of the CPU is increased). At any rate, as I run a 300 MHz Celeron side by side with a 2200 MHz Celeron at home, I don't think the 2200 MHz Celeron eats 7+ times the power. One day I'll liberate my kill-a-watt and take it home and find out. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 14:15:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 14:15:12 -0500 (EST) Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8F729.3080709@lmco.com> Message-ID: On Mon, 17 Nov 2003, Jeff Layton wrote: > Let's exclude the floor space, windows, pizzas, chillers, etc. > and figure out the total: > > Total = $5.388 million > > I guess I'm not too far off. Personally I think the big unknown > is the rack cost. That could be very expensive since it's specialized > (although 92 racks in a single sale might be considered a commodity). > Also, the Cisco costs could be high as well (Cisco never does anything > that can't make money off of). With 1100 dual CPU nodes drawing perhaps 250 Watts apiece, the room needs some 275 KW of capacity, maybe 180 20 amp circuits (assuming one can drive roughly six nodes per circuit). This costs ballpark estimate of $275,000/year just to feed and cool the nodes, more than the racks themselves. The capital cost of the circuits, transformers, space renovation, and the chillers required to drive this cluster would likely add another seven digit number to your estimate and is a lot less ignorable than the cost of the racks or network;-) Small nuclear power plant optional... Now the pizza cost, that can be ignored. However, the human cost is another "interesting" question. With 1100 systems running 24x7 under stress, I would expect to rack up system failures nearly every day after the cluster was roughly a year old and beyond. If operating system installation and administration scaled nearly perfectly (which with linux is not insanely impossible, but for a cluster this size e.g. 
pxe-automated installs are absolutely essential) one's ability to manage the cluster is likely limited by user support (which is beyond prediction, as it depends on task mix and expertise of user base) and hardware maintenance capacity. They also need proactive administration -- hot and cold running help for emergencies given the large productivity cost when the cluster is down. I'm going to guess that they have 5-6 full time people just to care for and feed the cluster and to sacrifice the odd chicken here and there. Maybe another $300K in salaries and benefits. So I'd go to over $6 million (maybe even over $7 million) total including infrastructure, with perhaps a $600-750K/year operating budget. > This was just for laughs. I still think there is a sugar daddy > somewhere in there. Be it Cisco, Apple, IBM, etc., there are some > costs not being mentioned. It >>does<< seem to be a lot of money for a cluster, doesn't it. Not exactly pocket change, or University startup money. DoD, DOE, NIH, perhaps, it seems a lot for NSF unless, as you suggest, there are corporate sponsors contributing. The other thing that always amuses me about clusters like this is the Moore's Law effect. They buy it this year, after spending a year (easily) preparing the site and building the requisite infrastructure. They operate it for three years (spending $2.25 million, say). In the meantime, node power at constant cost has increased by a factor of 4. If they invested their capital in bonds for those three years (including the operating budget), and bought that 4x faster node hardware, they would BREAK EVEN on the amount of work they get done by year four, and have saved three years operating expenses plus interest in addition to the interest on the entire capital amount for three years -- an easy $3+ million. 
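The break-even claim can be checked with a few lines of Python. This is a sketch under one assumption: node power at constant cost doubles every 18 months, which is what a factor of 4 in three years implies. The function names are made up for illustration.

```python
def relative_node_power(years, doubling_time_years=1.5):
    """Compute power per dollar relative to today, after `years` of waiting."""
    return 2.0 ** (years / doubling_time_years)


def work_done(capacity, years_running):
    """Total work in capacity-years, ignoring downtime and operating cost."""
    return capacity * years_running


# Buy everything now and run it for four years.
buy_now = work_done(1.0, 4)
# Wait three years, buy hardware that is 4x faster, run it for one year.
wait_and_buy = work_done(relative_node_power(3), 1)
```

Both strategies deliver 4 capacity-years by the end of year four, but the second one skips three years of operating costs and earns interest on the capital in the meantime.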
To put it another way, it is bloody silly to take an N year budget and spend it all in year one on computing hardware, because compute capacity that can be purchased at constant cost grows exponentially while compute capacity that has been purchased AT fixed cost depreciates exponentially and has a rather high baseline operating cost. It also means that you >>really<< pay for a design error. If this enormous 1100 node cluster, designed and purchased all at once, has any design flaw with a repair cost that scales like the number of nodes, it would be ruinous. If one had only bought (say) 1/4 of the nodes in year one, 1/4 more in year two, 1/4 more in year three, and 1/4 more in year 4, one would get roughly:
  4 years @ 0.25 capacity
 +3 years @ 0.40 capacity
 +2 years @ 0.63 capacity
 +1 year  @ 1.00 capacity
 ==========================
  4.46 capacity-years
(assuming an 18 month ML doubling time) and would have numerous opportunities to repair design flaws at minimal cost and to exploit special deals and opportunities that exceed this "average" performance. Sigh, rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
From deadline at linux-mag.com Mon Nov 17 14:28:42 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Mon, 17 Nov 2003 14:28:42 -0500 (EST) Subject: ClusterWorld In-Reply-To: Message-ID: On Mon, 17 Nov 2003, John Hearns wrote: > Three month trial only valid within the US :-( Sorry. > > If I cried and pleaded, and said I regularly bought Linux Magazine > retail in Borders in the UK could I get a sample copy?
Unfortunately, I do not think CW will show up at Borders or Barnes and Noble unless the general public gets really excited about clusters. Who knows. > > Hope everyone is having a good time at SC. Do you know anyone who is going? Have them come by and get you a copy. Doug
From James.P.Lux at jpl.nasa.gov Mon Nov 17 13:59:21 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 17 Nov 2003 10:59:21 -0800 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <3FB8FEF2.9090009@lmco.com> References: < Message-ID: <5.2.0.9.2.20031117104002.018c6618@mailhost4.jpl.nasa.gov> At 12:01 PM 11/17/2003 -0500, Jeff Layton wrote: >Mark Hahn wrote: > >> > Node cost: 1100 x $3k = $3.3 million >> >>around 8M list. >> >> > Total = $5.388 million >> >>it's a very good price, no doubt. it would be nice if Top500 >>would require full price disclosures > > It would be interesting. I'm sure a number of the clusters >in there were 'bought' by the vendor. Of course, they make >their money back by screwing you later on with really >high maintenance fees. Also, bear in mind that the apparent cost of the node, to the manufacturer, is somewhat less than it would be to the eventual retail consumer, even for volume purchases. Depending on how the mfr does their accounting, the actual "cost" (as in, bottom line effect) of the node being provided gratis to an educational institution may be quite low, because it may not have things like an apportionment of marketing and distribution costs.
On the other hand, the mfr can probably claim a "retail value of $X Million" for their tax deduction (subject to some restrictions.. you can't claim costs you didn't actually incur). Also, consider that if a company like Dell or Apple spends, say, 10% of their budget on sales and marketing (Apple, for instance, spent $898M on "selling, general, and administrative" costs on $4,492M in sales), then a few million dollars in computers isn't a huge advertising expense compared to buying ads:
Web portals typically get $20-30/CPM (CPM = cost per thousand views/impressions)
A full page color ad in a Sunday paper with a circulation of several hundred thousand might be $100/CPM
I don't have rate cards in front of me, but I found some information on the web (of course!)
Printed PC Magazine, International edition, 4 color ad page for 2 wk period ($52K), which works out to $56/CPM
Printed Wired Magazine, full page, $51/CPM
Compare this to the "free publicity" from getting your cluster on the list, and featured in some articles. Giving away a million bucks worth of computers might actually be a better deal than buying a million bucks worth of ads.
From canon at nersc.gov Mon Nov 17 14:37:38 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Mon, 17 Nov 2003 11:37:38 -0800 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: Message from Mark Hahn of "Mon, 17 Nov 2003 12:37:37 EST." Message-ID: <200311171937.hAHJbcfr011705@pookie.nersc.gov> I think the Big Mac guys deserve snaps for pulling this system off. The VA Tech guys accomplished a real feat and I suspect they worked their collective butts off to do it. Who would have predicted a Mac based cluster to be in the top 5? Not me. I still suspect this is an anomaly.
I don't think we are going to see a bunch of Mac based clusters breaking into the list next year. Which raises the question "Why not?" My feeling is when you build a system that large, you want to know you can get real work done with it. That's where committed vendors and large user communities become important. At this point, Big Mac is a one-of-a-kind. The Apple crowd has never even looked at HPC before this (probably because it's typically a money loser). Meanwhile there seems to be a growing community of people that want to use the Opteron for HPC. That's why I expect we will see more Opteron clusters over time. But hey, maybe Big Mac will make people look at the Apple stuff more closely. There still seem to be a lot of missing pieces though (parallel debuggers, profiling tools, libraries, etc). The long term measure for the Big Mac is to see how well they can use the system, especially for generic codes. Regarding the top500, I see the point of the top500 as being a ranking of capability of various machines. Unfortunately, it's difficult to come up with a benchmark that accurately measures capability and is also super portable and easy to run. Personally I don't think LINPACK should necessarily be that code, but at least it forces people to run a consistent problem across the entries. I think adding costs would be interesting since any real purchase has to take this into consideration, but it would be more for comparison purposes and not ranking. NERSC (#9) has used (sustained performance)/$, where the sustained performance is calculated from a collection of standard codes used by the NERSC community. I think this approach has served us well, but it can be challenging to get an apples-to-apples comparison when you are talking about projecting the performance of codes to large scales. Each vendor does the projection their own way and it's tricky to know how much to believe, especially if it's on non-existent hardware or at untested scales.
My true measure for the top500 would be the value of the science (or work) accomplished with it, a difficult to impossible thing to determine. NERSC puts all the emphasis on the science. This means considering: how usable the system is; how hard it is to harness the full capability of the system; what the sustained performance will be. Then we try to squeeze every cycle out of the system. We've run Seaborg (#9) at 90+% utilization for years now. We've gotten tons of science done with it, just like we did with the T3E before it. It can be a little disappointing to watch your system slide down the rankings, when you know it's still being used to do great stuff and it's still making a large impact. But I guess that's just the nature of Moore's law.

I think this year's top 500 raises all sorts of interesting questions. How will the X-1 evolve? Will Opteron systems become a big player? What about Itanium? Will the Blue Gene based systems make an impact? It's certainly more interesting than a few years ago, when there were just a handful of vendors and no clear direction where things were heading.

--Shane

Disclaimer: These statements represent my own opinions and not those of NERSC.
------------------------------------------------------------------------
Shane Canon                             voice: 510-486-6981
PSDF Project Lead                       fax:   510-486-7520
National Energy Research Scientific Computing Center
1 Cyclotron Road Mailstop 943-256
Berkeley, CA 94720                      canon at nersc.gov
------------------------------------------------------------------------

From agrajag at dragaera.net Mon Nov 17 13:50:19 2003
From: agrajag at dragaera.net (Sean Dilda)
Date: Mon, 17 Nov 2003 13:50:19 -0500
Subject: top500 list
In-Reply-To: <3FB90EEF.80206@lmco.com>; from jeffrey.b.layton@lmco.com on Mon, Nov 17, 2003 at 01:09:51PM -0500
References: <3FB90E52.6020204@lmco.com> <3FB90EEF.80206@lmco.com>
Message-ID: <20031117135019.A30183@vallista.dragaera.net>

On Mon, 17 Nov 2003, Jeff Layton wrote:

> Of course, I forgot to adjust for the memory difference.
> According to Mark, the 4G dual boxes on the Apple store
> are $5349 each.
>
> Nodes: 1100 x $5349 = $5.884 million
>
> Which brings the total to: $7.972

Wow! That's over $2k/node for just a few gig of RAM. There's also a chance that they didn't buy the RAM from Apple. If you look at their pictures site (http://don.cc.vt.edu/g5modify/) you can see that they in fact modified all of the boxes. I didn't see it say what was modified, but let's assume they added RAM. Now if you go look at Crucial (http://www.crucial.com/store/listparts.asp?Mfr%2BProductline=Apple%2BPower+Mac&mfr=Apple&cat=RAM&model=Power+Mac+G5+%28Dual+2.0GHz+DDR%29&submit=Go) you find they can get a 512M stick of RAM for $93.99. Even if they replaced all the RAM that Apple sent them with new RAM, that's only around $752/node for the RAM (eight sticks for 4G), not $2349.

So, 1100 * $3752 = $4.127 million, and the total comes to $6.215 million (using your numbers).
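For anyone who wants to replay the arithmetic above, here is a minimal sketch. The prices are the ones quoted in the thread; the eight-sticks-per-node count, the $3000 base price (the $5349 Apple price minus the $2349 RAM premium), and the "everything else" cost (Jeff's $7.972 million total minus his node subtotal) are inferences, not figures anyone posted directly.

```python
# Figures quoted in the thread
NODES = 1100
APPLE_4G_PRICE = 5349.00       # Apple store price for a 4G dual G5
APPLE_RAM_PREMIUM = 2349.00    # RAM premium named in the message
STICK_PRICE = 93.99            # Crucial price for one 512M stick
QUOTED_TOTAL = 7.972e6         # Jeff's total with Apple-priced RAM

# Assumptions (inferred, not stated outright in the thread)
STICKS_PER_NODE = 8            # 8 x 512M = 4G
BASE_PRICE = APPLE_4G_PRICE - APPLE_RAM_PREMIUM        # node without the extra RAM
OTHER_COSTS = QUOTED_TOTAL - NODES * APPLE_4G_PRICE    # interconnect, racks, etc.


def cluster_total(ram_per_node):
    """Total cluster cost for a given per-node RAM cost."""
    return NODES * (BASE_PRICE + ram_per_node) + OTHER_COSTS


third_party_ram = STICKS_PER_NODE * STICK_PRICE

print(f"RAM per node:   ${third_party_ram:,.2f}")
print(f"Node subtotal:  ${NODES * (BASE_PRICE + third_party_ram):,.0f}")
print(f"Cluster total:  ${cluster_total(third_party_ram):,.0f}")
```

With third-party RAM the total lands close to the $6.215 million figure; plugging in APPLE_RAM_PREMIUM instead reproduces the $7.972 million starting point.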
Not as low as your first numbers, but not as high as your new ones. And there are some other adjustments that could be made. Like the racks, for instance. I have heard of big name vendors throwing in racks with large purchases, especially for repeat customers. So, it's possible that the racks and maybe some other stuff were given away by Apple. They may not give the same deal to everyone, but I imagine anyone buying that many machines at once can talk the sales rep into wheeling and dealing quite a bit.

From lindahl at pathscale.com Mon Nov 17 13:23:56 2003
From: lindahl at pathscale.com (Greg Lindahl)
Date: Mon, 17 Nov 2003 10:23:56 -0800
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <1069082890.2659.69.camel@terra>
References: <1069082890.2659.69.camel@terra>
Message-ID: <20031117182356.GA19831@greglaptop.greghome.keyresearch.com>

On Mon, Nov 17, 2003 at 09:28:10AM -0600, Dean Johnson wrote:

> Virginia bought a benchmark, not Apple.

"Virginia" is the University of Virginia. Virginia Tech is that *other* school.

-- greg, alumnus of the real thing

From jeffrey.b.layton at lmco.com Mon Nov 17 13:58:54 2003
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Mon, 17 Nov 2003 13:58:54 -0500
Subject: top500 list
In-Reply-To: <20031117135019.A30183@vallista.dragaera.net>
References: <20031117135019.A30183@vallista.dragaera.net>
Message-ID: <3FB91A6E.2060703@lmco.com>

Sean Dilda wrote:

> On Mon, 17 Nov 2003, Jeff Layton wrote:
>
> > Of course, I forgot to adjust for the memory difference.
> > According to Mark, the 4G dual boxes on the Apple store > > are $5349 each. > > > > Nodes: 1100 x $5349 = $5.884 million > > > > Which brings the total to: $7.972 > > Wow! That's over $2k/node for just a few Gig of RAM. There's also a > chance that they didn't buy the RAM from Apple. If you look at their > pictures site (http://don.cc.vt.edu/g5modify/) you can see that they in > fact modified all of the boxes. I didn't see it say what was modified, > but lets assume they added RAM. Now if you go look at crucial > (http://www.crucial.com/store/listparts.asp?Mfr%2BProductline=Apple%2BPower+Mac&mfr=Apple&cat=RAM&model=Power+Mac+G5+%28Dual+2.0GHz+DDR%29&submit=Go > ) > > you find they can get a 512M stick of ram for $93.99 Even if they > replaced all th ram that apple sent them with new RAM, that's only > around $752/node for the RAM, not $2349. > Good point. I forgot they popped the cases and did something to them (never did bother to figure out what though). > So, 1100 * $3752 = $4.127 million, and the total up to $6.215 million > (using your numbers). > > Not as low as your first numbers, but not has high as your new ones. > And there are some other adjustments that could be made. Like the racks > for instance. I have heard of big name vendors throwing in racks with > large purchases, especially for repeat customers. So, its possible that > the racks and maybe some other stuff were given away by Apple. They may > not give the same deal to everyone, but I imagine anyone buying that > many machines at once can talk the sales rep into wheeling and dealing > quite a bit. > The presentations I've seen said that they contacted the Rack manufacturer directly and that custom racks were designed and built. I'm sure the rack manufacturer got paid by someone, just not sure who. :) Still, in my quick analysis, you only drop $138k. Still not close to the $5.2 million that's floating around. Let's have some more fun! 
Let's assume that all the vendors but Apple got paid what I projected. So the difference between the quoted (5.2) and the projected (6.215) is $1.015 million. Divide that by the number of nodes and you get $923 per node. Furthermore, let's assume that VTech got a volume discount of $923 per node. Then we get the $5.2 million. So, in fact VTech paid Apple about $2k per node instead of $3k. Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Mon Nov 17 15:21:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Mon, 17 Nov 2003 14:21:25 -0600 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <20031117152049.99076.qmail@web11406.mail.yahoo.com> References: <20031117152049.99076.qmail@web11406.mail.yahoo.com> Message-ID: On Mon, 17 Nov 2003, Rayson Ho wrote: > --- Mark Hahn wrote: > > and their yield is around 59%. not to mention the little bit > > of missing ECC... > > I didn't follow, which "yield" are you refering to?? > > > I'd like very much to know the actual prices and discounts for Big > > Mac. > > it's a shame this isn't required for Top500... > > The price is around 1100*3000*(discount for edu) + cost interconnect > > 5.2M $ is not too far away from the "actual price"... You've apparently never priced 27 96-port IB switches (plus cables)! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. 
Systems Administrator                      FAX: 662-325-7692 |
| roger at ERC.MsState.Edu       http://WWW.ERC.MsState.Edu/~roger |
|                     Mississippi State University                       |
|____________________________________ERC__________________________________|

From bill at math.ucdavis.edu Mon Nov 17 16:08:40 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Mon, 17 Nov 2003 13:08:40 -0800
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References: <3FB8F729.3080709@lmco.com>
Message-ID: <20031117210840.GC25979@sphere.math.ucdavis.edu>

> To put it another way, it is bloody silly to take an N year budget and
> spend it all in year one on computing hardware, because compute capacity
> that can be purchased at constant cost grows exponentially while compute
> capacity that has been purchased AT fixed cost depreciates exponentially
> and has a rather high baseline operating cost. It also means that you
> >>really<< pay for a design error. If this enormous 1100 node cluster,
> designed and purchased all at once, has any design flaw with a repair
> cost that scales like the number of nodes, it would be ruinous. If one
> had only bought (say) 1/4 of the nodes in year one, 1/4 more in year
> two, 1/4 more in year three, and 1/4 more in year 4, one would get
> roughly:

Having just sat through a Production Clusters talk at SC2003, I figured it would be worth mentioning the downside of yearly upgrades. Heterogeneous clusters are a nightmare, with at least linear scaling in support costs, and if you're running large codes you can get zero scaling. I.e. with 250 nodes a year, at the end of 4 years you can run 250 fast nodes, or 1000 nodes at the speed of the 1st year's. The opinion of the 4 speakers giving the talk was buy a cluster large enough to keep it till replaced.
This dramatically decreases support costs, keeps things simple for the end users, keeps the batch queue simpler, and stops silly things like a BIOS upgrade for some of the nodes taking down the entire cluster.

Certifying a large body of applications, user tools, quota monitoring, sensor monitoring etc for a particular configuration is a lot of work. Numerous nightmares were reported even for "identical" nodes that ended up coming from different factories. Large site installations spend a lot of sweat and tears becoming intimately familiar with their hardware: analyzing failure rates, learning how to read various temp sensors, monitoring of various types, etc.

Building a cluster 1 year at a time can work of course, especially if your jobs are never bigger than a single year's purchase, but it's not free. In many cases when your support staff is limited (seems very common) you might be better off with a cluster every couple of years.

-- Bill Broadley
Mathematics
UC Davis

From bill at math.ucdavis.edu Mon Nov 17 16:36:28 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Mon, 17 Nov 2003 13:36:28 -0800
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To:
References: <20031117152049.99076.qmail@web11406.mail.yahoo.com>
Message-ID: <20031117213628.GA26471@sphere.math.ucdavis.edu>

After all this discussion of the top 500 list, it got me thinking about a "better" benchmark. Where "better" means more useful for evaluating my idea of cluster goodness.

So what is hard about large clusters? Seems to me like it is primarily scaling. What controls the scaling? Mostly the interconnect. So we primarily need to evaluate the interconnect and how it performs in a large cluster environment.
Additionally, getting an account or even the hardware to evaluate single CPU performance of an IT2, G5, P4, or Opteron is fairly easy and direct. Of course there are characteristics inside the box that affect scaling outside, but I'd argue these effects are much smaller than the effects of the interconnect.

So what would a better benchmark look like? Bisectional bandwidth is of course interesting, although it's a fairly gross measure. How about something along the lines of:

* Minimal CPU work, only enough to ensure correctness.
* MPI based (focus on user visible performance)
* Provide scores for sending messages of 1, 10, 100, 1000, and 10000 64-bit numbers
* Have a random mode (any node can talk to any other)
* Have a nearest neighbor mode (end user can define arbitrary mapping of virtual nodes to physical nodes for maximum performance.)
* Run on 8, 16, ... 2^N nodes (for pretty scaling graphs)

For shared memory machines it's much tougher; I don't know of any portable way to ensure remote page allocation. Maybe have each CPU allocate a 512 MB array, access it a million times, then swap pointers, start the clock, and measure the bandwidth per CPU to that memory (wherever it was allocated).

Does anyone know of similar tools for doing this? If not, do people think it would be worthwhile? If so I'd be willing to take a shot at writing the MPI version.

Anyone interested in a SC2003 BOF to discuss it?

Feedback? Comments?
-- Bill Broadley
Mathematics
UC Davis

From James.P.Lux at jpl.nasa.gov Mon Nov 17 17:02:49 2003
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Mon, 17 Nov 2003 14:02:49 -0800
Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2)
In-Reply-To:
References: <3FB8F729.3080709@lmco.com>
Message-ID: <5.2.0.9.2.20031117134023.037dd200@mailhost4.jpl.nasa.gov>

rgb wrote:

>With 1100 dual CPU nodes drawing perhaps 250 Watts apiece, the room
>needs some 275 KW of capacity, maybe 180 20 amp circuits (assuming one
>can drive roughly six nodes per circuit). This costs ballpark estimate
>of $275,000/year just to feed and cool the nodes, more than the racks
>themselves. The capital cost of the circuits, transformers, space
>renovation, and the chillers required to drive this cluster would likely
>add another seven digit number to your estimate and is a lot less
>ignorable than the cost of the racks or network;-)

Just the AC receptacles, boxes, and conduit (along with electricians to install it) alone will be a significant cost. For comparison, when my tract house was built, they charged a flat fee of $50 to add a receptacle; for putting in conduit, installing a duplex receptacle, pulling the wire, and attaching it to the distribution panel in an industrial environment, you could figure about $30-50 in materials and a couple hours in labor (@ $50/hr fully burdened). Just to do some quick back of the enveloping, let's assume $150/receptacle.

Say 200 circuits (based on rgb's calculation above), so you're at $30K, just for the end of the wire. A typical 50 kVA pad mount single phase transformer runs about $1500-2000, plus about $700 to install it, and you'd need at least 6, probably more like 9, so that's another $20K.
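The numbers so far can be checked with a few lines. This is only a sketch of the arithmetic using the figures quoted above; treating kW as kVA (unity power factor) when sizing transformers is a simplifying assumption, and the variable names are made up for illustration.

```python
import math

# Figures taken from the thread
NODES = 1100
WATTS_PER_NODE = 250
NODES_PER_CIRCUIT = 6
CIRCUITS_BUDGETED = 200
COST_PER_RECEPTACLE = 150            # dollars, installed
TRANSFORMER_KVA = 50
TRANSFORMER_PRICE_RANGE = (1500, 2000)
TRANSFORMER_INSTALL = 700
TRANSFORMER_COUNT_RANGE = (6, 9)

load_kw = NODES * WATTS_PER_NODE / 1000
circuits_needed = math.ceil(NODES / NODES_PER_CIRCUIT)
receptacle_cost = CIRCUITS_BUDGETED * COST_PER_RECEPTACLE

# Assumption: unity power factor, so kW is treated as kVA
min_transformers = math.ceil(load_kw / TRANSFORMER_KVA)

transformer_cost_low = TRANSFORMER_COUNT_RANGE[0] * (
    TRANSFORMER_PRICE_RANGE[0] + TRANSFORMER_INSTALL)
transformer_cost_high = TRANSFORMER_COUNT_RANGE[1] * (
    TRANSFORMER_PRICE_RANGE[1] + TRANSFORMER_INSTALL)

print(f"Load: {load_kw:.0f} kW on at least {circuits_needed} circuits")
print(f"Receptacles: ${receptacle_cost:,}")
print(f"Transformers: at least {min_transformers}, "
      f"${transformer_cost_low:,} to ${transformer_cost_high:,}")
```

The transformer range brackets the "another $20K" estimate, and the circuit count sits between rgb's "maybe 180" and the 200 used for the receptacle cost.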
There are also panels, overcurrent protection, grounding, etc., and getting the P.E. to design the system and sign and seal plans (and we licensed engineers don't come cheap). The infrastructure for a job like this would be many hundreds of thousands of dollars, before you rolled in the first rack of computers.

>Small nuclear power plant optional...
>
>Now the pizza cost, that can be ignored.

Unless it's a government funded facility, where OMB guidelines (and, more importantly, institutional interpretation) say that provision of meals (in distinction to snacks at a meeting) is verboten (donuts: OK, bagels: NO; because bagels are food and doughnuts are not).

>The other thing that always amuses me about clusters like this is the
>Moore's Law effect. They buy it this year, after spending a year
>(easily) preparing the site and building the requisite infrastructure.
>They operate it for three years (spending $2.25 million, say). In the
>meantime, node power at constant cost has increased by a factor of 4.
>If they invested their capital in bonds for those three years (including
>the operating budget), and bought that 4x faster node hardware, they
>would BREAK EVEN on the amount of work they get done by year four, and
>have saved three years operating expenses plus interest in addition to
>the interest on the entire capital amount for three years -- an easy $3+
>million.

Unless one gets partial results early on that make the later years of analysis and computing more efficient. Difficult to quantify, but an important factor. Also, there is a certain fixed amount of labor for "fiddling around to get it all to work" that will apply at the beginning of the computation, and earlier is better, because you're paying with non-inflated dollars.

In fact, here is a great argument for scalable clusters.
You can invest in all the infrastructure up front (because it's generally cheaper to buy things like buildings all at once) and implement a smaller cluster to get through the teething pains, and then, as the performance of the hardware improves, upgrade the cluster along the way. If you haven't tied the computation inextricably to the particular implementation, then this may provide a more efficient/optimum use of a fixed amount of capital. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Mon Nov 17 17:31:32 2003 From: david.n.lombard at intel.com (Lombard, David N) Date: Mon, 17 Nov 2003 14:31:32 -0800 Subject: top500 list (was: opteron VS Itanium 2) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com> From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM > ... At this point, > Big Mac is a one-of-a-kind. The Apple crowd has never even > looked at HPC before this (probably because its typically a money loser). Actually, the "Apple crowd" had been making the rounds, at least the ISV rounds, at Cluster World in June of this year. Don't know how long before that they were (certainly not at LW in Jan) or if their approach was a locality affect (again, LW in Jan in NYC). Perhaps though, the cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not being *that* dumb ;^) -- David N. Lombard My comments represent my opinions, not those of Intel. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Nov 17 17:46:54 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 18 Nov 2003 09:46:54 +1100 Subject: Sun to start selling Opteron systems - official + Sun/AMD to work with community to create 64-bit Linux ABI Message-ID: <200311180946.58367.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Usual PR stuff, but the last part I've quoted (which ran on from the bit about 64-bit Solaris for Opteron originally) about joint work on a 64-bit Linux ABI seems the most interesting part to me. http://www.sun.com/smi/Press/sunflash/2003-11/sunflash.20031117.2.html [quote] With today's announcement that Sun Microsystems, Inc. (Nasdaq: SUNW) and AMD (NYSE: AMD) have formed an alliance to deliver a broad range of AMD Opteron[tm] processor-based systems, Sun also announced it plans to offer its Java Enterprise System on the AMD Opteron processor and is significantly extending the reach of its Solaris Operating System (OS) and leadership in the 64-bit space. [...] [...] The Solaris OS on the 64-bit AMD Opteron processor platform is expected to be available in the first half of 2004 through Sun's innovative early-access Software Express for Solaris program. Furthermore, Sun and AMD intend to work jointly with the Linux community to define and promote a 64-bit UNIX(r)-Linux Application Binary Interface (ABI) to enable interoperability. UNIX or Linux applications could run natively on any operating systems supporting this ABI. 
[/quote]

Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/uU/eO2KABBYQAh8RAgxCAJ99Qp58juNNNSSecu+WtaaaXTLuOQCdFgmG
w6MdeWwomodfTZ41E/B2YnA=
=sbfy
-----END PGP SIGNATURE-----

From joelja at darkwing.uoregon.edu Mon Nov 17 18:28:34 2003
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Mon, 17 Nov 2003 15:28:34 -0800 (PST)
Subject: top500 list (was: opteron VS Itanium 2)
In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com>
Message-ID:

On Mon, 17 Nov 2003, Lombard, David N wrote:

> From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM
> > ... At this point,
> > Big Mac is a one-of-a-kind. The Apple crowd has never even
> > looked at HPC before this (probably because its typically a money
> loser).

they've had a cluster version of the xserve since the last major rev of the platform... making the rounds, and being "actually relevant" to people building clusters, are kind of different things. I have a mac (among several other machines) on my desk, and while it runs linux fairly well, I'm not terribly convinced that my goals and those of steve jobs/apple computer are terribly well aligned.

> Actually, the "Apple crowd" had been making the rounds, at least the ISV
> rounds, at Cluster World in June of this year. Don't know how long
> before that they were (certainly not at LW in Jan) or if their approach
> was a locality affect (again, LW in Jan in NYC).
Perhaps though, the > cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not > being *that* dumb ;^) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Mon Nov 17 18:44:19 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: Mon, 17 Nov 2003 17:44:19 -0600 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4BF32@orsmsx402.jf.intel.com> Message-ID: On Monday, November 17, 2003, at 04:31 PM, Lombard, David N wrote: > From: canon at nersc.gov, Monday, November 17, 2003 11:38 AM >> ... At this point, >> Big Mac is a one-of-a-kind. The Apple crowd has never even >> looked at HPC before this (probably because its typically a money > loser). > > Actually, the "Apple crowd" had been making the rounds, at least the > ISV > rounds, at Cluster World in June of this year. Don't know how long > before that they were (certainly not at LW in Jan) or if their approach > was a locality affect (again, LW in Jan in NYC). Perhaps though, the > cold^H^H^H^HFRIGID weather kept them away from LW at NYC (kudos for not > being *that* dumb ;^) > Actually I think Apple folks have been sniffing around bioinformatics for a while, but overall lacked in the floating point arena to make an impact in other areas of HPC. You also have to keep in mind that it is no longer just the "apple crowd". IBM is also sniffing around using the PPC for HPC. The G5 blades that they are working on are pretty good evidence of that. I think they should hold the January LW in Minneapolis. That would *definitely* indicate who is dedicated and who is not. 
;-) -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 17 19:56:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 17 Nov 2003 19:56:46 -0500 (EST) Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) In-Reply-To: <5.2.0.9.2.20031117134023.037dd200@mailhost4.jpl.nasa.gov> Message-ID: On Mon, 17 Nov 2003, Jim Lux wrote: > Say 200 circuits (based on rgb's calculation above), so you're at $30K, > just for the end of the wire. A typical 50 kVA pad mount single phase > transformer runs about $1500-2000, plus about $700 to install it, and you'd > need at least 6, probably more like 9, so that's another $20K. There's also > panels, overcurrent protection, grounding, etc., getting the P.E. to design > the system and sign and seal plans (and we licensed engineers don't come > cheap). The infrastructure for a job like this would be many hundreds > of thousands of dollars, before you rolled in the first rack of computers. Ya. And the AC might well cost several times the electrical circuits. And don't forget all the network wiring (as opposed to NICs and switches). Lots of pulls, cable trays, maybe raised floor action (this looks like a high rent cluster likely to have a raised floor design and custom cabinets). Infrastructure and renovation costs pretty much EQUALLED the costs of the first 100+ nodes we moved into our new cluster space, and their capacity looks like it is many times ours. > >The other thing that always amuses me about clusters like this is the > >Moore's Law effect. They buy it this year, after spending a year > >(easily) preparing the site and building the requisite infrastructure. > >They operate it for three years (spending $2.25 million, say). In the > >meantime, node power at constant cost has increased by a factor of 4. 
> >If they invested their capital in bonds for those three years (including > >the operating budget), and bought that 4x faster node hardware, they > >would BREAK EVEN on the amount of work they get done by year four, and > >have saved three years operating expenses plus interest in addition to > >the interest on the entire capital amount for three years -- an easy $3+ > >million. > > Unless one gets partial results early on that make the later years of > analysis and computing more efficient. Difficult to quantify, but an > important factor. Also, there is a certain fixed amount of labor for > "fiddling around to get it all to work" that will apply at the beginning of > the computation, and earlier is better, because you're paying with > non-inflated dollars. Oh, yeah, you and Bill are right. My argument was simplistic and won't apply in all cases (especially as Bill noted if you're trying to scale a real parallel computation across all N nodes all at once, and not just divvying up compute cycles amongst a large number of users none of whom are running computations that can scale to more than N/4 nodes anyway). > In fact, here is a great argument for scalable clusters. You can invest in > all the infrastructure up front (because it's generally cheaper to buy > things like buildings all at once) and implement a smaller cluster to get > through the teething pains, and then, as the performance of the hardware > improves, upgrade the cluster along the way. > If you haven't tied the computation inextricably to the particular > implementation, then this may provide a more efficient/optimum use of a > fixed amount of capital. This is my general feeling. The other point is that for VTech to get funding for a "supercluster" like this once is a strike of lightning -- $10 million dollar projects don't fall in your lap every day. 
However, in 4-5 years tops, the hardware is going to be aged out (in six years contemporary computers will have a LOT more memory per node, processors that are estimatable to be 16x as fast, we might be up to REALLY fast networks or fast networks might be really cheap -- who knows?). Some joker like me will be able to build a cluster in their basement for $100K and equal its throughput, especially when scaling penalties on 1100 nodes are taken into account. So they'll have to go BACK to the well early and often, just like a real supercomputer center, or be obsoleted out of relevance by Moore's Law. And if they go back to the well every year, well, they're adopting a scalable cluster model. This is the killer -- what exactly will they DO with the cluster that is worth $7 million, plus the better part of a million a year just to run it? Not a whole lot of projects out there that are worth the up-front investment. It's really a matter of mindset. I've seen or heard of lots of very very expensive computers designed and assembled to accomplish some "really important" computation "really fast" that have been funded by all sorts of deep pocketed government agencies. In some of those cases, building the computer was so difficult that it didn't even get finished before Moore's Law overtook it at 1/10th the cost using commodity hardware (anything that takes years to build is at real risk of this). Worse, a lot of the research funded this way isn't really burning issue stuff in that the outcome won't change people's lives. Worth doing, sure, but not worth spending millions on to get a year or two earlier. Moore's Law just trundles right along, and now we're spending huge amounts to reach for teraflops, where a decade ago we were spending huge amounts to reach for gigaflops and a decade before THAT a megaflop was awesomely expensive. Well hell, I do gigaflops at home these days, for a few thousand dollars total. 
In ten more years, Inshallah, I'll be doing teraflops on my desktop and my personal digital assistant in my shirt pocket will be doing gigaflops:-). It really is a matter of waiting or not waiting to accomplish particular tasks. The REALLY big iron guys (or REALLY big cluster guys:-) hate to hear that -- they make a living from their really big supercomputers that live out on the bleeding edge. So I'm not surprised to hear that four out of four reject a scalable approach in favor of the big project model. The big science guys hate it too. Doesn't stop it from being true...at least for some projects. YMMV, and I'm not trying to break anybody's dolly;-) rgb P.S. -- anybody remember the good old days, when you'd have been arrested and put in jail as a traitor to the American Way if you'd sold a Russian or Chinese person a Gigaflop-capable computer because they could use it to Simulate Nuclear Devices? Developing GHz CPUs sort of put a squeeze on THAT idea, ay? Especially with the beowulf model to pursue. Now beowulfs are being built that follow the big iron model. We have met the enemy and it is us... > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Nov 18 00:01:21 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 17 Nov 2003 21:01:21 -0800 Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) References: Message-ID: <008601c3ad94$4f2648e0$32a8a8c0@laptop152422> Some philosophical comments below (and what is a list like this for, if not philosophical comments) rgb wrote: > It's really a matter of mindset. I've seen or heard of lots of very > very expensive computers designed and assembled to accomplish some > "really important" computation "really fast" that have been funded by > all sorts of deep pocketed government agencies. In some of those cases, > building the computer was so difficult that it didn't even get finished > before Moore's Law overtook it at 1/10th the cost using commodity > hardware (anything that takes years to build is at real risk of this). > Worse, a lot of the research funded this way isn't really burning issue > stuff in that the outcome won't change people's lives. Of course, one could make this argument about particle physics or deep space exploration. Whether we find that next particle or discover life on Europa or verify Einstein or find water on Mars won't affect a significant fraction of the lives on Earth anytime soon (except those, like me, who get paid to facilitate such exploration). However, aside from the "white collar welfare" aspect (not an aspect to be totally disregarded, what with pork barrels and such), there are practical and immediate benefits. While the actual application may not have much immediate need, it might provide a framework, and specific application, that drives a development which has general application. 
Sometimes, a specific problem is needed to get work rolling, rather than sitting in a "what might be the optimum general solution" analysis mode for years. If the problem is stated as "determine X", then something needs to get done, clusters need to get built (however inefficient), technology needs to be developed, which is then "inserted" into succeeding projects/missions etc. Also, for anything novel, there's always the "I'm not going first" problem. Like penguins wondering if there's a leopard seal in the water, someone's got to jump in and show that you won't die instantly. Sometimes, those programs of perceived little value (and hence, little opprobrium if you fail) provide the mechanism to demonstrate a new technology. Jim Lux _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Nov 18 02:12:46 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 18 Nov 2003 08:12:46 +0100 Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: <20031117213628.GA26471@sphere.math.ucdavis.edu> References: <20031117152049.99076.qmail@web11406.mail.yahoo.com> <20031117213628.GA26471@sphere.math.ucdavis.edu> Message-ID: <20031118071246.GC17558@unthought.net> On Mon, Nov 17, 2003 at 01:36:28PM -0800, Bill Broadley wrote: > > After all this discussion of the top 500 list, it got me thinking about a > "better" benchmark. Where "better" means more useful to evaluating my > idea of cluster goodness. There are lies, damn lies, and statistics... Your points about a more appropriate benchmark are valid - but we must realize that there is not such thing as "the one true benchmark". Some clusters are tailored for one specific workload - one app. that has been written for the cluster, as the cluster was built for the app. 
In those situations, you can run that app on the cluster and get your "true performance" metric. For most of the top machines, I'd be rather surprised if there hadn't been a pretty clear idea about what the machines would be running, prior to purchase. A general list such as Top500 needs one benchmark which will arguably be both unfair and even irrelevant for a large number of the systems on the list. (Example: if all I do is factor large numbers, I don't care what the Linpack performance of my machine is - I may well have a system that does factoring 10 times faster than the Earth Simulator, while my system cannot even make the Top500). All in all - for a list such as Top500, having *one* *simple* benchmark that is *well known*, is really the true value of the list. Having a "fairer" benchmark with more numbers (one number is, as you argue and as per my previous example, irrelevant in many if not most cases), would in my opinion not be a gain for the usefulness of the list as such. It's not what the Top500 is for. The Top500 is for "who's got big iron that can do Linpack really fast". Chances are such big iron will perform other tasks really fast as well, but we don't know, and if the Top500 could tell us, the list would be so massively complicated that we couldn't use it for anything at all in the first place anyway. I think that having one poor (but well known and simple) metric is the better solution. -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand.
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lars at meshtechnologies.com Tue Nov 18 04:41:56 2003
From: lars at meshtechnologies.com (Lars Henriksen)
Date: Tue, 18 Nov 2003 09:41:56 +0000
Subject: GenericNQS batch system
Message-ID: <1069148516.7118.26.camel@tp1.mesh-hq>

Dear beowulfers

I'm having some problems with the Generic NQS batch system. Creating and using queues on a single host works fine, but when I try to submit jobs to queues on remote hosts, it does not work. Does anyone have experience with that kind of operation?

Here is what I've done:

On the scheduling host (host1):

# qmgr create pipe sched-queue destination = exe-in at host2
# qmgr set lb_out sched-queue
# qmgr enable queue sched-queue

On the host that has to do the job execution (host2):

# qmgr create batch exe-queue pipeonly
# qmgr create pipe exe-in pipeonly destination exe-queue
# qmgr set lb_in exe-in
# qmgr enable queue exe-queue
# qmgr enable queue run-in
# qmgr set scheduler host1

In 'nmapmgr' on both hosts, entries have been added both for principal names and aliases.

/etc/hosts.nqs looks like this on both hosts:

* *

So when I try to submit a job to the system on host1:

(top of job description file:)
-------
#QSUB-q sched-queue
#QSUB-eo
#QSUB-r test
-------

nothing happens :-(

Edited syslog from the host where the submission is made:

host1 NQS daemon[7467]: psc_spawn: Rqst not scheduled due to none there.
host1 NQS daemon[7467]: psc_spawn: Rqst not scheduled due to none there.
host1 NQS Pipeclient[5899]: Process logging started at Tue Nov 18 10:24:36 2003
host1 NQS Netdaemon[5900]: Netdaemon: Connection from host1
host1 NQS Pipeclient[5899]: Unable to deliver request 31 to a destination
host1 NQS Pipeclient[5899]: Msg #2:Scheduling request for retry at a later time
host1 NQS Pipeclient[5899]: Msg #2:Request rescheduled; exiting

A 'qstat -x' shows this:

Destset = {exe-in at host2 [RETRY] };

I'm kinda baffled by this... Well, thanks for your patience in reading this. I hope some of you can give me some pointers...

best regards
Lars

--
Lars Henriksen | MESH-Technologies A/S
Systems Manager & Consultant | Forskerparken 10
www.meshtechnologies.com | DK-5230 Odense M, Denmark
lars at meshtechnologies.com | mobile: +45 2291 2904
direct: +45 6315 7310 | fax: +45 6315 7314

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From jeffrey.b.layton at lmco.com Tue Nov 18 06:01:12 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 18 Nov 2003 06:01:12 -0500 Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) In-Reply-To: References: Message-ID: <3FB9FBF8.9080400@lmco.com> Robert G. Brown wrote: > It's really a matter of mindset. I've seen or heard of lots of very > very expensive computers designed and assembled to accomplish some > "really important" computation "really fast" that have been funded by > all sorts of deep pocketed government agencies. In some of those cases, > building the computer was so difficult that it didn't even get finished > before Moore's Law overtook it at 1/10th the cost using commodity > hardware (anything that takes years to build is at real risk of this). > Worse, a lot of the research funded this way isn't really burning issue > stuff in that the outcome won't change people's lives.
Worth doing, > sure, but not worth spending millions on to get a year or two earlier. > Moore's Law just trundles right along, and now we're spending huge > amounts to reach for teraflops, where a decade ago we were spending huge > amounts to reach for gigaflops and a decade before THAT a megaflop was > awesomely expensive. > > Well hell, I do gigaflops at home these days, for a few thousand dollars > total. In ten more years, Inshallah, I'll be doing teraflops on my > desktop and my personal digital assistant in my shirt pocket will be > doing gigaflops:-). It really is a matter of waiting or not waiting to > accomplish particular tasks. The REALLY big iron guys (or REALLY big > cluster guys:-) hate to hear that -- they make a living from their > really big supercomputers that live out on the bleeding edge. So I'm > not surprised to hear that four out of four reject a scalable approach > in favor of the big project model. The big science guys hate it too. > Bob, I think it's about time you posted a quick review of the little scenario you came up with regarding having a pot of money and a project to finish in a certain amount of time. It's the one where you showed that it's better (more cost effective) to wait until the project is almost due, buy the fastest cluster you need, and run the code, rather than buy the fastest machine at the beginning of the project and compute the rest of the time. This analysis was beautiful and very insightful. I think alot of people would benefit from reading it. Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 18 09:21:23 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 18 Nov 2003 09:21:23 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: <20031118071246.GC17558@unthought.net> Message-ID: On Tue, 18 Nov 2003, Jakob Oestergaard wrote: > On Mon, Nov 17, 2003 at 01:36:28PM -0800, Bill Broadley wrote: > > > > After all this discussion of the top 500 list, it got me thinking about a > > "better" benchmark. Where "better" means more useful to evaluating my > > idea of cluster goodness. > > There are lies, damn lies, and statistics... > > Your points about a more appropriate benchmark are valid - but we must > realize that there is not such thing as "the one true benchmark". > > Some clusters are tailored for one specific workload - one app. that has > been written for the cluster, as the cluster was built for the app. In > those situations, you can run that app on the cluster and get your "true > performance" metric. I agree and disagree. I personally have a deep and abiding mistrust of high end benchmarks -- benchmarks of complex code -- unless they are MY complex code. Things like linpack and spec are useful only to the extent that one or more components "resembles" your application. Screw resemblance -- test your application. However, I think Bill's points are very well taken, so much so that I saved the article in my "List Ideas" directory for eventual reconsideration and mention in an article or the book. I also think that MICROBENCHMARKS are very useful indeed to systems and cluster engineers. Things like lmbench or stream or netpipes are small (generally nearly trivial code) and relatively insensitive to compiler/architecture quirks, or at least if they are they are likely to be sensitive in ways that do translate to arbitrary applications that use the tested operations. They are also a LOT harder to "fool", especially if the microbenchmarks can be run by anybody from a GPL source base. 
The vendor cannot easily fudge a benchmark if you put your benchmark source on a vanilla Linux install, compile it, and run it. Or again, if they do "fudge" somehow under those circumstances (perhaps by warping an entire architecture to optimize some result:-) it is likely that a real application will benefit from the optimized operation, even if other operations elsewhere suffer. The latter sort of tradeoff is why Larry McVoy insists that lmbench (which can be run, of course, any way a user likes, a microbenchmark at a time) can only be used to publish >>results<< if a full suite of results are published, not "selected" ones on which a vendor does well. This is intended to prevent the kind of abuse that early benchmarks were notorious for attracting (and that likely continues today). Chip real estate ALSO goes through various opportunity cost decisioning processes (re: previous post on grant processes:-) and a new LU to optimize process X comes at the expense of e.g. on-chip context storage, more registers, heat production and hence higher clock. At some point you are robbing peter to pay paul, and the issue becomes one of balance. The balance issue extends out to the rest of the architecture, as has increasingly been a list focus. CPU clock has consistently outpaced memory (in Moore's Law exponent); both have WAY outpaced the network. Disk has outpaced even the CPU in volume, but lagged even the network in speed. So I personally would like to see a full suite of microbenchmarks -- literally trivial components wrapped in a timing harness. These should measure core functions that are building blocks of real programs. Many of these computational component measurements exist for standalone systems; not so many for clustered systems. I think this is the intriguing element of Bill's suggestions. A benchmark graph of just how long it takes to use raw UDP or TCP sockets, MPI, PVM to pass a message according to one of several patterns, plotted as a 2d/3d function of e.g. 
message size and number of nodes, together with stream results (and perhaps some of the other cpu_rate or lmbench benchmarks, depending on your arithmetic mix) would be a lot more openly informative than what gets published now. For one thing, it would separate out a lot of the bullshit associated with "top 500-ness". We could look at two clusters and compare their actual performance in important metrics at a glance, instead of wondering who could possibly give a rodent's furry behind about tools that de facto are just ONE possible measure of aggregate CPU in ONE set of fairly complex operations out of a practical infinity that might actually occur in our code. > For most of the top machines, I'd be rather surprised if there hadn't > been a pretty clear idea about what the machines would be running, prior > to purchase. ;-) I think you're right... > I think that having one poor (but well known and simple) metric is the > better solution. It does make it simple, but it doesn't make it better. It's the old issue -- "how many MFLOPS -> GFLOPS -> TFLOPS is your cluster?" (arrows indicate the progress of roughly decades). Who's di..um, I mean "cluster" is bigger. First, tell me what the HELL a MFLOP is. My microbenchmark measurements of a MFLOP don't agree with any of the accepted definitions, and vary significantly with whether or not e.g. division is included in the "floating point operations" tested. Since division is so slow, it is almost always omitted from computations of FLOPS. Since division is so common, people wonder why even their simple loops with division in them don't ever achieve the blazing throughput they expected. Then there are the rather immense variations in performance observed as e.g. the size of vectors is varied, code is driven from local/sequential to nonlocal/random. Cluster engineers are not stupid. 
Well, maybe SOME of them are stupid, somewhere, but I haven't met any that happened to be drooling and looking off in the distance with a vacant expression. Unless a beer happened to be sitting in front of them, of course. I think that they could manage to learn to use a very complex (but well documented, GPL) instrument set to support intelligent cluster design. Hell, I think most of the good people on this list use a complex but NOT terribly integrated set to support intelligent cluster design now! As I said, stream, netpipes, even spec (there ARE people whose tasks match decently with at least one component). And of course, the best of benchmarks, your application, but >>even optimizing your application<< requires knowledge only a microbenchmark can provide. The benefits of using this sort of information intelligently can equal the output of your entire cluster put together. Dongarra's ATLAS project is a shining beacon for what can be done in this regard. Factors of 2-3 speedup are not unknown for what CAN be core operations in many computations, just automagically adjusting algorithm and stride to take maximal advantage of register/L1/L2/memory latencies and bandwidths and the underlying CPU/chipset. It is pretty much the ONLY way one can achieve superlinear speedup -- know where significant nonlinearities in bottleneck speed occur and partition the task accordingly. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 18 08:40:33 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 18 Nov 2003 08:40:33 -0500 (EST) Subject: Economics of clusters was Re: top500 list (was: opteron VS Itanium 2) In-Reply-To: <008601c3ad94$4f2648e0$32a8a8c0@laptop152422> Message-ID: On Mon, 17 Nov 2003, Jim Lux wrote: > Also, for anything novel, there's always the "I'm not going first" problem. > Like penguins wondering if there's a leopard seal in the water, someone's > got to jump in and show that you won't die instantly. Sometimes, those > programs of perceived little value (and hence, little opprobrium if you > fail) provide the mechanism to demonstrate a new technology. Agreed. However, the fundamental underlying issue is economics. You have an approximately fixed budget of X billions of dollars allocated to publicly funded research in the US. This money is distributed among many agencies for targeted disbursement. The target selection process (as you note) contains elements of national, state, local, and scientific politics -- there is plenty of pork in it. Some does indeed get distributed as a sort of jobs program for starving corporations who not completely coincidentally made large donations to selected politicians (often on both sides of a race). Other parts go to fund some scientific director's pet project. However, at the crux of each funding decision, politics or no, is the issue of opportunity cost. It was opportunity cost that ultimately brought down the SSC. It is never an "and" operator with funding, it is inherently an "or" operator, given a fixed budget (and if the budget is deliberately expanded to include somebody's pet project, the "or" operator needs to apply to the expanded but still fixed pool, even if that decisioning is done at a very coarse granularity and one decision level up, e.g. the US Senate). So I totally agree with everything you say. Sure, we need to climb certain scientific mountains just because they are there, and trust that new worlds lie on the other side of some of them.
HOWEVER, that does not release us from the obligation of making choices. For every project that is funded, the pool of funds is diminished, and alternative projects are rejected and not funded. My personal research colleague is an ARO grant officer, and I am fairly frequently treated to a view of this from the other side -- so much he'd LIKE to fund, so finite a pool of resources to fund from, so much politics that sends huge chunks of money to specific venues outside of the normal review and selection process. It is difficult to raise oneself to a sufficiently elevated level to even begin to judge a lot of this. However, waste openly abounds. I know of quite a few places, for example, that have bought e.g. SP2's to do HPC computations in years past. These are (were), recall, quite expensive boxes. Naturally, they were publicly funded from various grants. At the time they were purchased, the beowulf model was already well known, and on a per-processor/per-cycle basis a competing beowulf cost perhaps 1/5th as much. The grantholders even KNEW about the beowulf model, and were using the systems to run primarily embarrassingly parallel applications that would have run efficiently on a pile of PCs and sneakernet. However, politics or open ignorance or "deals" cut by IBM, between one thing or another there they were with $500,000 computers whose actual benefit to their owners could easily be matched by $100,000 beowulfs, even on considerably finer grained code than they were running. Then there is the ADDITIONAL issue of whether the work being done was worth the cost of the hardware, compared to all the other work that might have been done with that money. I'm sure that the money from sales like these floated IBM's boat through tough times, and kept its sales and engineering force from having to go on welfare, and I'm not even sarcastic about it. The same hand ultimately feeds me, after all, and I have no wish to bite it.
However, there >>is<< the opportunity cost issue of the extra $400K or so. If the work was really valuable, perhaps it could have been completed much faster with a more intelligent cluster model. If an intelligent cluster model had been used at the lower rate, perhaps some other deserving project could have been funded to keep ITS researchers and support people out of homeless shelters. Choice is essential. Cost-benefit is at the heart of economic choice. Where admittedly, the liver is politics...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Nov 18 13:23:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 18 Nov 2003 13:23:13 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Tue, 18 Nov 2003, John Hearns wrote: > There's a page from Paralogic with a packaged set of > benchmarking tools http://www.plogic.com/bps/ > > Maybe could be a start to your ideas? Doug (Eadline) and I have talked about this for years now, and he has put together a small package that is used for design purposes, I believe, at Paralogic. Maybe he'll write an article about this himself in his new mag...:-) I don't think that they are quite "done", though, (at least the last time I checked) so yes, I'd call it a "start" to the idea. Not really my idea, as you can see. I think there are lots of folks who have thought on this, and lots more that have a de facto suite they use whether or not they are packaged. lmbench, netpipe, netperf, bonnie, memtest86 -- lots of tools out there for doing bits of this, some of them very nice.
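As a rough illustration of the kind of microbenchmark discussed in this thread, here is a minimal sketch in Python of a trivial operation wrapped in a timing harness, with the cost of an empty call subtracted from the result. It is not code from cpu_rate, lmbench, or any of the tools named above; the function names (time_call, net_time, empty, multiply, divide) are hypothetical and chosen only for this example. In Python the interpreter overhead dominates, so the numbers say little about the hardware; the point is only the structure of the harness.

```python
import time


def time_call(func, repeats=5, loops=100_000):
    """Return the best (minimum) seconds per call of func over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        elapsed = time.perf_counter() - start
        # Keep the fastest repeat: it is the one least disturbed by other load.
        best = min(best, elapsed / loops)
    return best


def empty():
    """Do nothing; timing this measures loop and call overhead alone."""
    pass


def net_time(func, repeats=5, loops=100_000):
    """Time func and subtract the empty-call overhead, clamping at zero."""
    overhead = time_call(empty, repeats, loops)
    gross = time_call(func, repeats, loops)
    return max(gross - overhead, 0.0)


def multiply():
    """One floating point multiplication (hypothetical test operation)."""
    x = 1.000001
    return x * x


def divide():
    """One floating point division (hypothetical test operation)."""
    x = 1.000001
    return x / x


if __name__ == "__main__":
    for name, op in (("multiply", multiply), ("divide", divide)):
        ns = net_time(op) * 1e9
        print(f"{name}: {ns:.1f} ns per call (overhead subtracted)")
```

Taking the minimum over several repeats and clamping the difference at zero are simple ways to cope with timing noise; a serious harness would need more care than this to get consistent results.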
cpu_rate (which is available on my own website under the Beowulf link) is another such tool. I've no time to work on it just now, but I'm in midstream on a fairly major rewrite to really separate out the timing harness and test invocation process so that code snippets can be wrapped in a standard subroutine pro/epilogue and timed, with correct subtraction of the subroutine overhead. This isn't as easy as you might think (at least to get consistent results) but when it is finished cpu_rate should be a highly extensible way to wrap up anything from microbenchmarks to specific code fragments you want to test. One day we might even get a little group together at a meeting and kick around specs for a really nice, full GPL cluster exerciser toolset that can test, benchmark, and help debug problems with clusters large and small. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Nov 18 13:00:32 2003 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 18 Nov 2003 19:00:32 +0100 (CET) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: There's a page from Paralogic with a packaged set of benchmarking tools http://www.plogic.com/bps/ Maybe could be a start to your ideas? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Nov 18 13:54:33 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 18 Nov 2003 10:54:33 -0800 Subject: Heat, computers, etc.
Message-ID: <5.2.0.9.2.20031118105317.02f8e608@mailhost4.jpl.nasa.gov> An interesting column from Robert X. Cringely talking about infrastructure issues, particularly power density in racked computers. http://www.pbs.org/cringely/pulpit/pulpit20031106.html James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Tue Nov 18 15:06:54 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Tue, 18 Nov 2003 14:06:54 -0600 Subject: Heat, computers, etc. Message-ID: <200311182006.hAIK6sP26138@mycroft.ahpcrc.org> James Lux wrote: >An interesting column from Robert X. Cringely talking about infrastructure >issues, particularly power density in racked computers. > >http://www.pbs.org/cringely/pulpit/pulpit20031106.html > Jim and All, Another paper I have found useful on the same subject is among the APC white paper list (paper #46) at: http://www.apc.com/tools/mytools/index.cfm?action=search&category=whitepaper Search with power, cooling, and racks. It makes the point that the goal should not be simply to endlessly reduce rack square footage, because high power density models have non-linear effects on ancillary power and cooling costs, both in terms of the square feet they occupy on their own and their intrinsic cost. APC posits this begins to occur around 4 KW per rack. The bottom line then (if you believe them) is that as per-rack compute density goes up, per-chip wattage (and general per-node wattage) must go down to retain the savings of a smaller footprint. Regards, rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc.
# 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org # #--------------------------------------------------- # "What you can do, or dream you can, begin it; # Boldness has genius, power, and magic in it." # -Goethe #--------------------------------------------------- # "Without mystery, there can be no authority." # -Charles DeGaulle #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Tue Nov 18 16:37:09 2003 From: seth at hogg.org (Simon Hogg) Date: Tue, 18 Nov 2003 21:37:09 +0000 Subject: Optimal SMP Stucture for Opteron In-Reply-To: <3FB7A1AE.5020307@comcast.net> References: <20031116154401.19933.qmail@web60309.mail.yahoo.com> <20031116154401.19933.qmail@web60309.mail.yahoo.com> Message-ID: <4.3.2.7.2.20031118213530.00accf00@pop.clara.net> At 11:11 16/11/03 -0500, Jeffrey B. Layton wrote: >One last comment. This next week is SC2003 so many of the >regular posters to this list won't be posting much. Having been away for 2 days (not at SC2003) and just checking my mail, I would just like to say 'au contraire'. Simon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Tue Nov 18 16:24:27 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Tue, 18 Nov 2003 16:24:27 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Tue, 18 Nov 2003, Robert G. 
Brown wrote: > On Tue, 18 Nov 2003, John Hearns wrote: > > > There's a page from Paralogic with a packaged set of > > benchmarking tools http://www.plogic.com/bps/ > > > > Maybe could be a start to your ideas? > > Doug (Eadline) and I have talked about this for years now, and he has > put together a small package that is used for design purposes, I > believe, at Paralogic. Maybe he'll write an article about this himself > in his new mag...:-) Well, yes and more. We are going to address the benchmark thing in a bit more detail in the future. The BPS package is described at http://www.cluster-rant.com/article.pl?sid=03/03/17/1838236. It will be getting an upgrade soon and there will be some real codes added as well. Stay tuned. We will have an issue featuring benchmarking as well. You will notice that it is called BPS (Beowulf Performance Suite) and not BBS (Beowulf Benchmark Suite). The reason is that BPS was not supposed to be a benchmark per se. It was intended to generate a baseline from which to measure the effect of changes to the cluster (e.g. new driver, new kernel, etc.) and to diagnose some problems. I intentionally omitted HPL because I did not want the suite to become a contest until it could provide good data on which good engineering decisions could be made. Doug > > I don't think that they are quite "done", though, (at least the last > time I checked) so yes, I'd call it a "start" to the idea. Not really > my idea, as you can see. I think there are lots of folks who have > thought on this, and lots more that have a de facto suite they use > whether or not they are packaged. lmbench, netpipe, netperf, bonnie, > memtest86 -- lots of tools out there for doing bits of this, some of > them very nice. > > cpu_rate (which is available on my own website under the Beowulf link) > is another such tool.
I've no time to work on it just now, but I'm in > midstream on a fairly major rewrite to really separate out the timing > harness and test invocation process so that code snippets can be wrapped > in a standard subroutine pro/epilogue and timed, with correct > subtraction of the subtroutine overhead. This isn't as easy as you > might think (at least to get consistent results) but when it is finished > cpu_rate should be a highly extensible way to wrap up anything from > microbenchmarks to specific code fragments you want to test. > > One day we might even get a little group together at a meeting and kick > around specs for a really nice, full GPL cluster exerciser toolset that > can test, benchmark, and help debug problems with clusters large and > small. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian.dobbins at yale.edu Tue Nov 18 15:26:09 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Tue, 18 Nov 2003 15:26:09 -0500 (EST) Subject: Q: Any info on the PathScale compilers? Message-ID: Hi guys, I recently came across the announcement of the (upcoming) PathScale compilers for the Opteron platform - does anyone have any experience with them yet? Apparently they're at SC2003, so if any of you who happen to be there have come across them, what's the latest news? For those of you who aren't familiar with them, check out: http://www.pathscale.com/products1.html Thanks in advance! 
:-)
- Brian

From galitz at uclink.berkeley.edu Tue Nov 18 19:26:12 2003
From: galitz at uclink.berkeley.edu (Geoff Galitz)
Date: Tue, 18 Nov 2003 16:26:12 -0800
Subject: thermal sensing
Message-ID:

G'day.

I need to put together a little system which can monitor the temperature of a machine room, and when a certain threshold is reached, trigger a program to run.

I can handle the software side, but I'm not really sure where to begin looking on the hardware side. I've been to a few engineering web sites and catalogues but haven't really found just what I need in terms of hardware. I am looking for a temperature sensor that can simply go high or low when the threshold is reached. Any recommendations?

If there is already a device or howto out there on how to do this, that would be great too.

Thanks,
-geoff

From bill at math.ucdavis.edu Tue Nov 18 18:39:06 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Tue, 18 Nov 2003 15:39:06 -0800
Subject: Q: Any info on the PathScale compilers?
In-Reply-To:
References:
Message-ID: <20031118233906.GA520@sphere.math.ucdavis.edu>

I picked up a brochure. They seem to be claiming to beat the competition, and they have full SPEC runs labeled as estimates because they don't expect to ship for 4 months (SPEC has a 3 month rule).

I'll post more details when the material and my email access are in the same place.

On Tue, Nov 18, 2003 at 03:26:09PM -0500, Brian Dobbins wrote:
>
> Hi guys,
>
> I recently came across the announcement of the (upcoming) PathScale
> compilers for the Opteron platform - does anyone have any experience with
> them yet? Apparently they're at SC2003, so if any of you who happen to be
> there have come across them, what's the latest news?
>
> For those of you who aren't familiar with them, check out:
> http://www.pathscale.com/products1.html
>
> Thanks in advance! :-)
> - Brian

--
Bill Broadley
Mathematics
UC Davis

From Hans.Schwengeler at unibas.ch Mon Nov 17 08:08:19 2003
From: Hans.Schwengeler at unibas.ch (Hans Schwengeler)
Date: Mon, 17 Nov 2003 14:08:19 +0100
Subject: MSI KT3/4 AMD motherboards and 3C905CX-TXM NIC
Message-ID: <200311171308.hAHD8J0A003109@ida.astro.unibas.ch>

Dear Tony,

I once had problems getting two 3C905CX cards to work in our slaves. One alone would work OK, but not two. I could solve the problem by using the 3c90x driver instead of the 3c59x.

Changes in /etc/modules.conf (for the master):

    alias eth1 3c90x
    alias eth2 3c90x

In /etc/beowulf/config.boot (for the slaves):

    pci 0x10b7 0x9200 3c90x
    pci 0x10b7 0x9800 3c90x
    pci 0x10b7 0x9805 3c90x

I have a Scyld bz27-8 system.

Yours, Hans.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mof at labf.org Tue Nov 18 03:31:48 2003 From: mof at labf.org (Mof) Date: Tue, 18 Nov 2003 19:01:48 +1030 Subject: top500 list (was: opteron VS Itanium 2) In-Reply-To: <200311171937.hAHJbcfr011705@pookie.nersc.gov> References: <200311171937.hAHJbcfr011705@pookie.nersc.gov> Message-ID: <200311181901.49107.mof@labf.org> Speaking of which, does anyone know what VT intend to use the cluster for ? Mof. On Tue, 18 Nov 2003 06:07 am, canon at nersc.gov wrote: > My true measure for the top500 would be the value of the > science (or work) accomplished with it, a difficult to > impossible thing to determine. NERSC's puts all the > emphasis on the science. This means considering: how usable the system > is; how hard is it to harness the full capability of the system; > what will the sustained performance be. Then we try to squeeze > every cycle out of the system. We've ran Seaborg (#9) > with +90% utilization for years now. We've gotten tons of > science done with it, just like we did the T3E before it. > It can be a little disappointing to watch your system slide > down the rankings, when you know its still being used to do great > stuff and its still making a large impact. But I guess that's > just the nature of Moore's law. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Nov 18 22:24:30 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 19 Nov 2003 11:24:30 +0800 (CST) Subject: GenericNQS batch system In-Reply-To: <1069148516.7118.26.camel@tp1.mesh-hq> Message-ID: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com> GNQS is really old, and there have been no improvements for a long time. Are u supporting legacy systems? Is using a different batch system an option? The two most popular batch systems these days are Gridengine (SGE) and Scalable PBS (SPBS). Since SGE is backed by Sun, so more R&D (and money too) are put into it, and more companies use it. On the other hand, SPBS is backed by Supercluster.org, which means that it should work better with Maui/Silver/Gold, and a lot of existing HPC sites are switching from OpenPBS (which has no new development) to ScalablePBS. Both SGE and SPBS are opensource. Lastly, the Condor team told me that once they clean up the build environment, they will release the source! SPBS: http://www.supercluster.org SGE: http://gridengine.sunsource.net Condor: http://www.cs.wisc.edu/condor/ Andrew. --- Lars Henriksen ???? > Dear beowulfers > > I'm having some problems with the Generic NQS batch > system. > > Creating and using queues on a single host works > fine,, but when i try > to submit jobs to queues on remote hosts, it does > not work. Does anyone > have experience with that kind of operation? 
> > Here is what i've done: > > On the scheduling host (host1): > > # qmgr create pipe sched-queue destination = > exe-in at host2 > # qmgr set lb_out sched-queue > # qmgr enable queue sched-queue > > On the host that has to do the job execution > (host2): > > # qmgr create batch exe-queue pipeonly > # qmgr create pipe exe-in pipeonly destination > exe-queue > # qmgr set lb_in exe-in > # qmgr enable queue exe-queue > # qmgr enable queue run-in > # qmgr set scheduler host1 > > In 'nmapmgr' on both host, entries has been added > both for principal > names and aliases. > > /etc/hosts.nqs looks like this on both hosts: > * * > > So when i try to submit at job to the system on > host1: > (top of job description file:) > ------- > #QSUB-q sched-queue > #QSUB-eo > #QSUB-r test > > ------- > > nothing happens :-( > > edited syslog from the host where submission is > made: > > host1 NQS daemon[7467]: psc_spawn: Rqst not > scheduled due to none there. > host1 NQS daemon[7467]: psc_spawn: Rqst not > scheduled due to none there. > host1 NQS Pipeclient[5899]: Process logging started > at Tue Nov 18 > 10:24:36 2003 > host1 NQS Netdaemon[5900]: Netdaemon: Connection > from host1 > host1 NQS Pipeclient[5899]: Unable to deliver > request 31 to a > destination > host1 NQS Pipeclient[5899]: Msg #2:Scheduling > request for retry at a > later time > host1 NQS Pipeclient[5899]: Msg #2:Request > rescheduled; exiting > > A 'qstat -x' shows this: > > > Destset = {exe-in at host2 [RETRY] > 12:35:10 CET 2003> > CET 2003> > }; > > > I'm kinda baffled by this... > > Well thanks for your patience in reading this. I > hope some of you can > give me some pointers... 
>
> best regards
>
> Lars
> --
> Lars Henriksen               | MESH-Technologies A/S
> Systems Manager & Consultant | Forskerparken 10
> www.meshtechnologies.com     | DK-5230 Odense M, Denmark
> lars at meshtechnologies.com | mobile: +45 2291 2904
> direct: +45 6315 7310        | fax: +45 6315 7314

From andrewxwang at yahoo.com.tw Tue Nov 18 22:35:22 2003
From: andrewxwang at yahoo.com.tw (Andrew Wang)
Date: Wed, 19 Nov 2003 11:35:22 +0800 (CST)
Subject: Fwd: Open to the Public Colloquium with VT's Srinidhi Varadarajan
In-Reply-To: <54B4C31E-1950-11D8-A838-000393838B9E@linguamediagroup.com>
Message-ID: <20031119033522.23658.qmail@web16803.mail.tpe.yahoo.com>

Since there are way too many guesses and "I think..." posts (which then continue with several thousand words describing how bad a Mac cluster would be!) about BigMac, why don't you go to the following colloquium to find out the truth?

Andrew.

--- Garrett Cobarr wrote:
> The Johns Hopkins Applied Physics Lab in Laurel,
> Maryland will host a
> colloquium with Virginia Tech's Srinidhi Varadarajan
> on December 5
> that's open to the public.
>
> http://www.jhuapl.edu/colloquium/schedule.html
> _______________________________________________
> clusters mailing list | clusters at lists.apple.com
> Help/Unsubscribe/Archives:
> http://www.lists.apple.com/mailman/listinfo/clusters
> Do not post admin requests to the list. They will be ignored.

From hahn at physics.mcmaster.ca Tue Nov 18 23:39:00 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Tue, 18 Nov 2003 23:39:00 -0500 (EST)
Subject: thermal sensing
In-Reply-To:
Message-ID:

> I can handle the software side, but I'm not really
> sure where to begin looking on the hardware side.

ibutton and a serial interface, $25 or so.

> hardware. I am looking for a temperature sensor that
> can simply go high or low when the threshold is reached.

gross. data is cheap, machines are fast; why not collect 16ths of a degree every few seconds? www.ibutton.com.

From lars at meshtechnologies.com Wed Nov 19 01:58:13 2003
From: lars at meshtechnologies.com (Lars Henriksen)
Date: 19 Nov 2003 06:58:13 +0000
Subject: GenericNQS batch system
In-Reply-To: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com>
References: <20031119032430.22152.qmail@web16803.mail.tpe.yahoo.com>
Message-ID: <1069225093.2286.6.camel@fermi>

On Wed, 2003-11-19 at 03:24, Andrew Wang wrote:
> Are u supporting legacy systems? Is using a different
> batch system an option?

Well, short of rewriting a large scripted system, I have no choice but to use GNQS :-(

> The two most popular batch systems these days are
> Gridengine (SGE) and Scalable PBS (SPBS).

Yeah, I usually use SPBS.
Thanks for your input, best regards Lars -- Lars Henriksen | MESH-Technologies A/S Systems Manager & Consultant | Forskerparken 10 www.meshtechnologies.com | DK-5260 Odense M, Denmark lars at meshtechnologies.com | mobile: +45 2291 2904 direct: +45 6315 7310 | fax: +45 6315 7314 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 19 07:52:38 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 19 Nov 2003 07:52:38 -0500 (EST) Subject: thermal sensing In-Reply-To: Message-ID: On Tue, 18 Nov 2003, Geoff Galitz wrote: > > > G'day. > > I need to put together a little system which can > monitor the temperature of a machine room, and > when a certain threshold is reached, trigger a > program to run. > > I can handle the software side, but I'm not really > sure where to begin looking on the hardware side. > I've been to a few engineering web sites and catalogues > but haven't really found just what I need in terms of > hardware. I am looking for a temperature sensor that > can simply go high or low when the threshold is reached. > Any recommendations? There are a bunch of links on http://www.phy.duke.edu/brahma for temperature sensors, and there is even a place where you can get a "kit" of components and build your own. Prices for read to run solutions range from $100-200 on up to netbotz, which can be pretty expensive but which have lots of fabulous features and sensors. rgb > > If there is already a device or howto out there on how to > do this, that would be great too. > > Thanks, > -geoff > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. 
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Nov 19 04:28:11 2003 From: nixon at nsc.liu.se (nixon at nsc.liu.se) Date: Wed, 19 Nov 2003 10:28:11 +0100 Subject: thermal sensing In-Reply-To: (Geoff Galitz's message of "Tue, 18 Nov 2003 16:26:12 -0800") References: Message-ID: Geoff Galitz writes: > I've been to a few engineering web sites and catalogues > but haven't really found just what I need in terms of > hardware. I am looking for a temperature sensor that > can simply go high or low when the threshold is reached. > Any recommendations? Picotech's stuff is nice. Linux drivers are supplied. http://www.picotech.com/thermistor.html -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Nov 19 08:08:09 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 19 Nov 2003 08:08:09 -0500 (EST) Subject: Reminder: the 5th Annual Beowulf Bash is tonight! Message-ID: All of the information is front-and-center at http://www.Beowulf.org and http://www.Beowulf.org/beowulf/bash The summary is The Annual Beowulf Bash is held in conjunction with the IEEE SC conferences. The party is tonight, Wednesday November 19th, 2003 at the Phoenix Hyatt directly, across the street from the SC2003 venue. It's be held on the second floor atrium, and we'll have large signs posted. 
We are pleased to introduce a new magazine as a sponsor, and welcome back Etnus, a founding sponsor from 1999 and 2000. Other sponsors are AMD, Penguin and Scyld (a founding sponsor). A note to attendee: please bring a camera: we'll be collecting for a pictorial on beowulf.org. Please note blackmail-worthy images so that we can fund next year's bash ;-> -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Nov 19 09:15:33 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 19 Nov 2003 09:15:33 -0500 (EST) Subject: New cluster benchmark proposal (Re: top500 list) In-Reply-To: Message-ID: On Wed, 19 Nov 2003, Felix Rauch wrote: > On Tue, 18 Nov 2003, Robert G. Brown wrote: > > I don't think that they are quite "done", though, (at least the last > > time I checked) so yes, I'd call it a "start" to the idea. Not really > > my idea, as you can see. I think there are lots of folks who have > > thought on this, and lots more that have a de facto suite they use > > whether or not they are packaged. lmbench, netpipe, netperf, bonnie, > > memtest86 -- lots of tools out there for doing bits of this, some of > > them very nice. > > Please correct me if I'm wrong, but if I remember correctly netpipe > and netperf are one-to-one benchmarks. While these are important to > find out more about (and tune) the performance of your NICs, we need > more to find out about the overall performance of the whole cluster > network. No, of course I agree in detail with all of the observations below. 
This was what I meant when I suggested tests involving various message passing communications patterns in raw sockets, MPI, PVM -- in more detail, master-slave (boring but often relevant), tree distribution, all-to-all with and without some effort to avoid collisions, etc. Netpipes is very nice and does let you test PVM and MPI, but isn't really engineered for driving a cluster switch to its figurative knees. > > There are switches who's backplane offers only half bisectional > bandwidth, which might be fine for some applications. Other switches > are advertized to offer full bisectional bandwidth, but they simply > can't hold the promise. Other switches are expensive but deliver real > full bisection bandwidth. Some applications don't care if they don't > have a full-bisection-bandwidth network -- others do. > > So, for a comprehensive cluster benchmark, we should also have tools > to get insight into the inner workings of the network. Our reserach > group introduced such a benchmark as part of our paper > "Cost/Performance Tradeoffs in Network Interconnects for Clusters of > Commodity PCs" presented at this years CAC workshop (see [1]). We > found out that some switches perform rather poorly for some > communication patterns and that a full bisection bandwidth can play a > role for the performance of some applications (e.g. car traffic > simulation). > > While we don't have a ready-to-be-used-for-all-clusters kind of > benchmark, I still hope the ideas might be valuable for this > discussion. > > - Felix > > [1] http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 This is the kind of thing that should ultimately be a component of any full suite. What we really need are some handy dandy students who want to write and GPL all of this stuff and publish it. Alas, I'm a physicist and don't have the right kind of students, and although I do work on writing it myself I lack the time to really put it all together. 
It does seem like the sort of project a CS department with research efforts in cluster computing might want to tackle and "own", the way the Clemson guys own PVFS. Maybe I'll talk to my CS cluster colleagues here at Duke and see if a joint proposal can be worked out, perhaps collaboratively with a few other interested groups elsewhere.

I seriously think that there is real computer science work to be done here, with an end stage goal being the creation of a daemon or kernel module that automagically generates microbenchmark numbers (ideally from a suite of modules that can be added or deleted at any time by e.g. dropping a suitably instrumented program file in a suitable directory) that are subsequently published in /proc (I've suggested this on the lmbench list at least twice now, to no avail).

The advantage of this is that one COULD then rewrite e.g. ATLAS so that instead of having to be rebuilt for each micro-architecture on which it might run (a tedious and time-consuming process) it simply drops its basic parametric tests in (if they aren't already in the default set) and runs. When it runs it reads in increasingly accurate numbers from /proc and dynamically autotunes. One could likely add a damped gradient search to the autotuning routine so that it can actually adjust itself (gradually) to very specific features of the system on which it is running, including the effect of the rest of its typical dynamic load.

And not just ATLAS, of course. ANY program that might need to switch algorithm or access pattern based on microperformance metrics could benefit. As a single example, it might be possible to write a PVM or MPI program that automagically selects an optimal message passing pattern IF there were microbenchmark results immediately available indicating message passing efficiency at various scales (varying message size, distribution pattern, number of nodes).

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept.
of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rauch at inf.ethz.ch Wed Nov 19 08:56:33 2003
From: rauch at inf.ethz.ch (Felix Rauch)
Date: Wed, 19 Nov 2003 14:56:33 +0100 (CET)
Subject: New cluster benchmark proposal (Re: top500 list)
In-Reply-To:
Message-ID:

On Tue, 18 Nov 2003, Robert G. Brown wrote:
> I don't think that they are quite "done", though, (at least the last
> time I checked) so yes, I'd call it a "start" to the idea. Not really
> my idea, as you can see. I think there are lots of folks who have
> thought on this, and lots more that have a de facto suite they use
> whether or not they are packaged. lmbench, netpipe, netperf, bonnie,
> memtest86 -- lots of tools out there for doing bits of this, some of
> them very nice.

Please correct me if I'm wrong, but if I remember correctly netpipe and netperf are one-to-one benchmarks. While these are important to find out more about (and tune) the performance of your NICs, we need more to find out about the overall performance of the whole cluster network.

There are switches whose backplane offers only half bisection bandwidth, which might be fine for some applications. Other switches are advertised as offering full bisection bandwidth, but they simply can't keep that promise. Other switches are expensive but deliver real full bisection bandwidth. Some applications don't care if they don't have a full-bisection-bandwidth network -- others do.

So, for a comprehensive cluster benchmark, we should also have tools to get insight into the inner workings of the network.
Our research group introduced such a benchmark as part of our paper "Cost/Performance Tradeoffs in Network Interconnects for Clusters of Commodity PCs", presented at this year's CAC workshop (see [1]). We found out that some switches perform rather poorly for some communication patterns and that full bisection bandwidth can play a role in the performance of some applications (e.g. car traffic simulation).

While we don't have a ready-to-be-used-for-all-clusters kind of benchmark, I still hope the ideas might be valuable for this discussion.

- Felix

[1] http://www.cs.inf.ethz.ch/CoPs/publications/#cac03

--
Felix Rauch                     | Email: rauch at inf.ethz.ch
Institute for Computer Systems  | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H16            | Phone: +41 1 632 7489
CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307

From jcownie at etnus.com Wed Nov 19 11:08:20 2003
From: jcownie at etnus.com (James Cownie)
Date: Wed, 19 Nov 2003 16:08:20 +0000
Subject: Yotta Yotta
Message-ID: <1AMUsO-752-00@etnus.com>

Despite reports on this list to the contrary, Yotta Yotta are still in business, and have a stand here at SC.

If you ask Wayne _really_ nicely he even has a few Yotta Yotta cubes :-)

-- Jim
James Cownie
Etnus, LLC. +44 117 9071438
http://www.etnus.com

From jcownie at etnus.com Wed Nov 19 11:17:27 2003
From: jcownie at etnus.com (James Cownie)
Date: Wed, 19 Nov 2003 16:17:27 +0000
Subject: Q: Any info on the PathScale compilers?
In-Reply-To: Message from Bill Broadley of "Tue, 18 Nov 2003 15:39:06 PST."
<20031118233906.GA520@sphere.math.ucdavis.edu>
Message-ID: <1AMV1D-75g-00@etnus.com>

I attended a talk by one of the PathScale folks on the IBM booth.

The compilers are based on the Open64 sources released under GPL by SGI. (Presumably they have some expert GPL lawyers).

The SPEC numbers quoted were unlabelled as to whether they were peak or base.

Some marginally conflicting claims were made :-
  The only compiler designed from the ground up for Opteron
  A stable code base from Open64

(presumably they mean the code-generator was designed from scratch for Opteron).

-- Jim
James Cownie
Etnus, LLC. +44 117 9071438
http://www.etnus.com

From bill at math.ucdavis.edu Wed Nov 19 17:21:24 2003
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Wed, 19 Nov 2003 14:21:24 -0800
Subject: Q: Any info on the PathScale compilers?
In-Reply-To: <1AMV1D-75g-00@etnus.com>
References: <20031118233906.GA520@sphere.math.ucdavis.edu> <1AMV1D-75g-00@etnus.com>
Message-ID: <20031119222124.GA6034@sphere.math.ucdavis.edu>

Alas, my ethernet AND wireless seem to be dying on the Dell laptop I'm using, keeping my notes off the network. At least unless I can find a smallish Phillips screwdriver in the downtown Phoenix area.

In any case, the sheets I got are labeled speculative, I believe. I have the ratios handy:

Estimated ratios for an IBM eServer 325, dual 2.0 GHz, with PC3200:

CINT2000 = 1065  953 1364  615 1714  935 1605 1362 1138 2206 1086 1011
CFP2000  = 1733 2225 1526 1277 1660 2425 1347 1341 1654 1134 1415 1275  613 1150

INT 1200 est, INT base 1173
FP  1416 est, FP  base 1237

I don't have similar numbers for NAG, PGI, or anyone else who has an Opteron compiler handy.
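
For readers who want to check the summary figures against the per-benchmark ratios quoted above: SPEC CPU2000 composite scores are geometric means of the individual benchmark ratios, so each list can be reduced to a single number. The Python below is only an illustrative sketch, not something taken from the PathScale sheet. The function and variable names are invented here, and it assumes the listed ratios are the per-benchmark figures behind the "est" numbers rather than the "base" ones.

```python
import math

# Per-benchmark ratios exactly as quoted in the message above.
# These are vendor estimates, not official SPEC results.
CINT2000_RATIOS = [1065, 953, 1364, 615, 1714, 935,
                   1605, 1362, 1138, 2206, 1086, 1011]
CFP2000_RATIOS = [1733, 2225, 1526, 1277, 1660, 2425, 1347,
                  1341, 1654, 1134, 1415, 1275, 613, 1150]


def geometric_mean(ratios):
    """Return the geometric mean, computed in log space for stability."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    print("CINT2000 composite: %.0f" % geometric_mean(CINT2000_RATIOS))
    print("CFP2000 composite:  %.0f" % geometric_mean(CFP2000_RATIOS))
```

If the assumption holds, the two results should land close to the quoted 1200 and 1416 estimates; a large gap would suggest the list and the summary line come from different runs.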

--
Bill Broadley
Mathematics
UC Davis

From josip at lanl.gov Thu Nov 20 00:49:31 2003
From: josip at lanl.gov (Josip Loncaric)
Date: Wed, 19 Nov 2003 22:49:31 -0700
Subject: thermal sensing
In-Reply-To:
References:
Message-ID: <3FBC55EB.4040009@lanl.gov>

Geoff Galitz wrote:
> I need to put together a little system which can
> monitor the temperature of a machine room, and
> when a certain threshold is reached, trigger a
> program to run.

If cost must be minimized, how about a cheap $10 thermostat suitably wired to a serial port DCD or CTS line? This may provide on/off thermal signaling (e.g. some UPS units use this method to signal power failures).

On a related note, Mark Hahn mentioned this back in June:

http://www.ibutton.com/ibuttons/thermochron.html

which could be useful in somewhat different situations...

Josip

From scameron at ubi.com Thu Nov 20 11:05:29 2003
From: scameron at ubi.com (Scott Cameron)
Date: Thu, 20 Nov 2003 11:05:29 -0500
Subject: Linux 2.4.20 + bonding troubles
Message-ID: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org>

Hi there,

I'm not sure where to look for information regarding this.

I've been trying to implement an etherchannel setup for one of my systems and have been seeing varied success. I have the first etherchannel set up on 2 Intel Etherexpress 100 cards (e100 driver); it seems to work with little problem -- the only issue with this etherchannel I have seen is that the channel cannot seem to go above 100 megabits, while I certainly generate enough throughput to go beyond 100 megabits.
The second etherchannel is on 2 Intel 1000 Mbit cards (e1000 driver). I've had the most trouble with this channel -- when I bring it up, the interface begins showing CRC errors & collisions for Tx (not Rx). However, I don't see the collisions on the switch -- just the CRC errors. Both channels are running in load-balancing round-robin mode. On the switch I have the port-channel configured to do source XOR destination IP load-balancing. The switch I'm connecting to is a Catalyst 6006 running the integrated IOS. I can't see any errors in the log on the switch, and not sure how to proceed to figure out where the CRC errors are coming from. If anyone could point me in the right direction that would be great. Scott Switch: Cisco Catalyst 6006 (integrated IOS) Linux box: P3-1.4 GHz, 2.4.20 kernel 2x Intel PRO/1000 (driver 5.2.20) 4x Intel PRO/100 (driver 2.1.24-k1) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From siegert at sfu.ca Thu Nov 20 13:01:07 2003 From: siegert at sfu.ca (Martin Siegert) Date: Thu, 20 Nov 2003 10:01:07 -0800 Subject: Linux 2.4.20 + bonding troubles In-Reply-To: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> References: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> Message-ID: <20031120180107.GA10741@stikine.ucs.sfu.ca> Hi Scott, we ran into the same problem here: the problem is not Linux (as you mentioned you can set the channels to round-robin under Linux), but the Cisco switch: you cannot set the Cisco to round-robin mode on the etherchannel [partially to blame is the IEEE 802.3ad standard, which does not specify round-robin mode; but that standard was probably intended for Telco situations (serving many connections at the same time) instead of HPC situations (aiming at high throughput)]. 
As a consequence the Cisco will always forward all packets for a given source/destination pair to a single leg of the outgoing trunk. If your receiving trunk is made out of 100Mbit/s connections this will limit you to 100Mbit/s. There is not much you can do about this: If all machines that connect to that network have two NICs, you can create two VLANs on the Cisco and connect the first of two NICs of each box to VLAN 1 and the second to VLAN 2. If you are not in that situation (and we aren't) the only thing that you can do is to forklift the Cisco out of the way and buy a switch that supports round-robin mode on etherchannels, e.g., Extreme's Black Diamond switches. Cheers, Martin -- Martin Siegert Manager, Research Services WestGrid Site Manager Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 From jmoyer at redhat.com Thu Nov 20 16:25:58 2003 From: jmoyer at redhat.com (Jeff Moyer) Date: Thu, 20 Nov 2003 16:25:58 -0500 Subject: Linux 2.4.20 + bonding troubles In-Reply-To: <20031120180107.GA10741@stikine.ucs.sfu.ca> References: <2E17AC0ED12EA54F9BEF813457413A6401C91433@srvmail1-mtl.montreal.ubisoft.org> <20031120180107.GA10741@stikine.ucs.sfu.ca> Message-ID: <16317.12646.988920.171852@segfault.boston.redhat.com> ==> Regarding Re: Linux 2.4.20 + bonding troubles; Martin Siegert adds: [snip] siegert> There is not much you can do about this: If all machines that siegert> connect to that network have two NICs, you can create two VLANs on siegert> the Cisco and connect the first of two NICs of each box to VLAN 1 siegert> and the second VLAN 2. If you are not in that situation (and we siegert> aren't) the only thing that you can do is to forklift the Cisco siegert> out of the way and buy a switch that supports round-robin mode on siegert> etherchannels, e.g., Extreme's Black Diamond switches. Note that a simple round-robin scheme for sending packets can cause performance issues as well if you get TCP packet reordering.
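The reordering effect is easy to reproduce in a toy model. The Python sketch below assumes a sender that stripes packets round-robin over links with fixed one-way delays; the delay values, the time units, and the function names are illustrative assumptions, not measurements of any real trunk.

```python
def arrival_order(n_packets, link_delays, send_interval):
    """Return packet sequence numbers in the order they reach the receiver.

    Packet i is sent at time i * send_interval on link i % len(link_delays)
    and arrives after that link's fixed delay (arbitrary time units).
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delays)
        arrivals.append((seq * send_interval + link_delays[link], seq))
    arrivals.sort()  # sort by arrival time
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a higher-numbered packet."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late


# Two legs with identical delay versus two legs with a delay skew.
equal_order = arrival_order(8, [5.0, 5.0], send_interval=1.0)
skewed_order = arrival_order(8, [5.0, 7.5], send_interval=1.0)

print(equal_order, count_out_of_order(equal_order))
print(skewed_order, count_out_of_order(skewed_order))
```

With equal delays the packets arrive in sequence. Once one leg is slower than the other by more than the send interval, packets from the faster leg overtake their predecessors, and in this example three of the eight packets arrive late. A TCP receiver answers each such arrival with a duplicate ACK, and if enough of them accumulate the sender can misread reordering as loss and back off.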
See, for example: http://roland.grc.nasa.gov/~mallman/papers/tcp-reorder-ccr.ps Cheers, Jeff From andrewxwang at yahoo.com.tw Thu Nov 20 22:39:05 2003 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Fri, 21 Nov 2003 11:39:05 +0800 (CST) Subject: News, FYI: High Productivity Computing Systems & PBS 5.4 Message-ID: <20031121033905.45494.qmail@web16806.mail.tpe.yahoo.com> "... Part of this (High Productivity Computing System) is looking into better super-computing benchmarks": http://www.aceshardware.com/#75000448 Also, some news about PBS from SC2003: 1) http://www.supercomputingonline.com/article.php?sid=5079 2) http://www.supercomputingonline.com/article.php?sid=5089 3) http://www.supercomputingonline.com/article.php?sid=5090 Andrew. From neil.brown at syngenta.com Fri Nov 21 10:35:10 2003 From: neil.brown at syngenta.com (neil.brown at syngenta.com) Date: Fri, 21 Nov 2003 15:35:10 -0000 Subject: RHEL Copyright Removal Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Hi all, We're having a bit of a dilemma here, as I'm sure many others are, about what to use as our standard Linux distro with the end of life of the Red Hat family. RHEL or SLES are looking favourites in terms of supportability, but of course there's the not insignificant problem of cost.
The thought of having to pay at least $179 per server, with around 50 compute nodes, along with various other non-beowulf Linux servers doesn't appeal. I've been trying to find out how much effort it takes to strip the RH copyrighted bits out of RHEL and compile it for our own use and whether doing so reduces its functionality a great deal. I've trawled the web and usenet, but not found much to write home about on the subject. Have any of you had experiences with such an exercise? Were they positive? How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure whether it's ES or AS) so it surely can't be that bad as a cluster oriented distro. Thanks for any suggestions, Neil From tlovie at pokey.mine.nu Fri Nov 21 11:00:17 2003 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Fri, 21 Nov 2003 11:00:17 -0500 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Unfortunately, this is not a trivial task. I attempted to rebuild 2.1AS, and getting everything to build is quite tricky, since various packages have dependency lists that sometimes conflict. From what I understand, building 3.0AS is even more difficult. But there are others who share the same dilemma, and much progress has been made on this so far. You might want to check out this mailing list: rhel-rebuild mailing list rhel-rebuild-l at uibk.ac.at Hosted at the University of Innsbruck, Austria And also a distribution called cAos at: caosity.org (I believe they have a mailing list) Tom Lovie.
From rgb at phy.duke.edu Fri Nov 21 11:30:36 2003 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Fri, 21 Nov 2003 11:30:36 -0500 (EST) Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 21 Nov 2003 neil.brown at syngenta.com wrote: > Hi all, > > We're having a bit of a dilemma here, as I'm sure many others are, about > what to use as our standard Linux distro with the end of life of the Red Hat > family. RHEL or SLES are looking favourites in terms of supportability, but > of course there's the not insignificant problem of cost. The thought of > having to pay at least $179 per server, with around 50 compute nodes, along > with various other non-beowulf Linux servers doesn't appeal. > > I've been trying to find out how much effort it takes to strip the RH > copyrighted bits out of RHEL and compile it for our own use and whether > doing so reduces it's functionality a great deal. I've trawled the web and > usenet, but not found much to write home about on the subject. > This will soon be a FAQ. The best solutions kicked around so far (if you wish to stick with basically free RPM-based full-service kickstartable/pxe installable distros) are two community supported efforts: Fedora: http://fedora.redhat.com This is basically a core that is RH9 with all the logos etc stripped down to where they inherit GPL (eventually completely, I imagine). It is a community supported model, where I believe they are looking for people to take on pieces of the bug triage tree -- Adopt a Package Today! It is designed in layers, with a "core" that should be fully functional at the server and workstation level and supported as well as anything out there, a legacy layer, and a contributed/kitchen sink layer with less stable but bleeding-edge useful stuff. The project is yummified from the beginning, which means that it is very simple to create/rsync your own repository mirror and then use yum to maintain a LAN or cluster from it. 
At a guess, NEARLY anything you have set up for RH 9 will eventually be quite portable to Fedora, although on the yum list I hear of occasional exceptions, as one might expect until things settle down. The Fedora core is "in production" now at version 1, I believe, although I expect that at first only hardy admins and developer types are adopting it, in part to help it settle in. Note well that www.fedora.org is a site that will just ask you to go away; it is NOT associated with this project...:-( Note well that Red Hat IS associated with this project. This may or may not make you feel good about going this route. I personally think they are strongly committed to it as they rely on SOMETHING to create a rawhide -> semistable released -> rockstable corporate chain; they damn well can't unleash barely-out-of-rawhide code on people paying big bucks per seat and disinclined to participate in the debugging process. Caosity: http://www.caosity.org This is Community Linux WITHOUT corporate strings, run at least in part by clustervolken. They too are stripping RH as a base, but plan to eventually diverge. At a guess, at some point there will be Much Synthesis and sharing between the two projects as it would be silly not to. They too are soliciting humans to help out. I know people heavily engaged in both projects, and expect both projects to be stable at the starting level of RH9 before RHL support ceases. One or the other will likely be the most successful at setting up and organizing the bug triage network and perhaps eventually dominate, although they are also likely to differentiate in focus (Caos has a very definite cluster/scientific computing flavor due to the work environments of a lot of the primary drivers). HTH. I personally have stopped worrying about the transition, and plan to convert my personal machines to Fedora "soon" to start screwing around with it prior to a campus conversion likely in the spring. "Soon" as in my first rsync of the oceanic Fedora core to my home repository server is being slurped through a DSL straw as I type this, likely done sometime today. I have a totally idle box all lined up to be first, PXE and kickstart all happy -- I'll cheerfully report my experiences as soon as I have any. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From canon at nersc.gov Fri Nov 21 12:24:06 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Fri, 21 Nov 2003 09:24:06 -0800 Subject: RHEL Copyright Removal In-Reply-To: Message from "Thomas Lovie" of "Fri, 21 Nov 2003 11:00:17 EST." <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: <200311211724.hALHO744026500@pookie.nersc.gov> Neil, The two projects/groups Tom mentioned are a good starting point. I have rebuilt 3.0AS without too much trouble and that's without purchasing a copy (which would have given me a better jumping off point). Another project that might be appealing is whitebox. http://www.beau.org/~jmorris/linux/whitebox/index.html This is an already rebuilt RHEL with all trademarks removed.
--Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ From j.c.burton at gats-inc.com Fri Nov 21 12:36:26 2003 From: j.c.burton at gats-inc.com (John Burton) Date: Fri, 21 Nov 2003 12:36:26 -0500 Subject: RHEL Copyright Removal In-Reply-To: References: Message-ID: <3FBE4D1A.6030803@gats-inc.com> I'm running Fedora on one of my machines and am pretty happy with it - it's Red Hat Linux with the names and logos changed. It uses a slightly newer kernel than RH9 (2.4.22 vs 2.4.20 IIRC). One minor difficulty I had came from trying to compile a third party (nvidia) kernel module. The kernel is compiled with gcc32, but the default compiler is gcc33. Both are supplied on the system, so you just have to be careful about specifying which compiler to use. I'm guessing there is some issue with gcc33 and the kernel... So far so good... we'll probably go with Fedora for development or personal workstations and RHEL for our production servers... John From joelja at darkwing.uoregon.edu Fri Nov 21 11:16:29 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 21 Nov 2003 08:16:29 -0800 (PST) Subject: RHEL Copyright Removal In-Reply-To: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: What was your build environment? joelja On Fri, 21 Nov 2003, Thomas Lovie wrote: > Unfortunately, this is not a trivial task. I had attempted to re-build > 2.1AS, and to get everything to build is quite tricky, since various > packages have dependency lists that sometimes conflict. From what I > understand, building 3.0AS is even more difficult. But there are others > that share the same dilemma, and much progress has been made on doing this so > far. You might want to check out this mailing list: > > rhel-rebuild mailing list > rhel-rebuild-l at uibk.ac.at > Hosted at the University of Innsbruck, Austria > > And also a distribution called cAos at: caosity.org > (I believe they have a mailing list) > > Tom Lovie.
-- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 From p.pennaz at tiscali.it Fri Nov 21 13:27:04 2003 From: p.pennaz at tiscali.it (p.pennaz at tiscali.it) Date: Fri, 21 Nov 2003 19:27:04 +0100 Subject: booting from usb pen drive Message-ID: <3FAA831D0001F2C1@mail-1.tiscali.it> Does anyone know whether it is possible to boot a Linux PC from a USB pen drive? My USB subsystem is working fine. Thank you From rocky at atipa.com Fri Nov 21 11:25:56 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Fri, 21 Nov 2003 10:25:56 -0600 (CST) Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 21 Nov 2003 neil.brown at syngenta.com wrote: [snip] > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure > whether it's ES or AS) so it surely can't be that bad as a cluster oriented > distro. For 2.1, I am unsure if there are any binary releases out there or not. For 3, there are a couple of groups working on this. One is called White Box Enterprise Linux and has binaries up at http://www.beau.org/~jmorris/linux/whitebox/index.html . Another group, www.caosity.org, is doing the same thing, but does not yet have ISOs available. I am involved with this project and we expect to have a testing release out in the next week or so.
I've also heard that ROCKS is putting together a rebuild for their own use, but I was unable to find any information about it after a short search. I also know of several other groups that have internal projects to do the same thing. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' From jbecker at northwestern.edu Fri Nov 21 10:51:00 2003 From: jbecker at northwestern.edu (Jesse Becker) Date: Fri, 21 Nov 2003 09:51:00 -0600 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <20031121155100.GD8468@northwestern.edu> On Fri, Nov 21, 2003 at 03:35:10PM -0000, neil.brown at syngenta.com wrote: > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not sure > whether it's ES or AS) so it surely can't be that bad as a cluster oriented > distro. Actually, I believe that ROCKS is based on RHEL 2.1 WS. I've used it a few times, and parts of it are quite nice. The ROCKS guys have automated most of the recompile process, but I don't know if the automation includes stripping out the RH stuff. -- Jesse Becker GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2 -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From alvin at Mail.Linux-Consulting.com Fri Nov 21 19:15:32 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 21 Nov 2003 16:15:32 -0800 (PST) Subject: RHEL Copyright Removal In-Reply-To: <001201c3b048$91dcdad0$3006a8c0@fishnet.exigentsi.com> Message-ID: hi ya neil/thomas On Fri, 21 Nov 2003, Thomas Lovie wrote: > Unfortunately, this is not a trivial task. I had attempted to re-build > 2.1AS, and to get everything to build is quite tricky, since various > packages have dependency lists that sometimes conflict. i think that strictly depends on "how linux" was installed and which "versions" ... dependencies are relatively easy to solve ... compared to the problems you folks are trying to solve with the clusters. the only problems in the last few (5) years that i've seen that couldn't be solved were a mix-n-match of latest versions of php, perl, mysql, gcc, bugzilla, mozilla, apache ... ( bugzilla would not work when some upgrades/patches were applied, and d/l'ing the latest patches at the time of each didn't work either ) - almost all other apps have worked on any other distro that i tend to use ( rh, slackware, suse, custom, .. ) ( ie .. customers do not need to be locked down to a particular older version and forced to pay $$ for it ) knowing which additional "user application software" you need to have running is what makes 95% of the difference in which distro to use or not, and the rest is system tweaking and debugging and patches - i think, imho, "support" is the most expensive part of the cluster's TCO and the hardware is relatively inexpensive in comparison .. - $ 200/server * 50 machines ( $10K ) is still inexpensive compared to hiring outsourced "linux support" > From what I > understand, building 3.0AS is even more difficult. But there are others > that share the same dilemma, and much progress has been made on doing this so > far.
You might want to check out this mailing list: > > rhel-rebuild mailing list > rhel-rebuild-l at uibk.ac.at > Hosted at the University of Innsbruck, Austria > > And also a distribution called cAos at: caosity.org > (I believe they have a mailing list) > > Tom Lovie. thanx alvin -- if anybody is local in silicon valley, and wants to build alternative cluster distros using an existing "free distro", i'm game .. -- cluster apps that people seem to use http://www.Linux-Consulting.com/Cluster
From virtualsuresh at yahoo.co.in Sat Nov 22 00:34:52 2003 From: virtualsuresh at yahoo.co.in (Suresh Chandra Mannava) Date: Sat, 22 Nov 2003 05:34:52 +0000 (GMT) Subject: distributed computing applications Message-ID: <20031122053452.63176.qmail@web8005.mail.in.yahoo.com>
distributed computing efforts. Sir, I am interested in the area of distributed/parallel/high performance computing. As a part of my study I am preparing a list of applications that can utilise distributed computing power. I made a small list by searching on the internet; there are many more applications yet to be added. I request you to provide pointers to the latest applications and the applications I missed. I also request you to provide pointers to applications specific to Beowulf clusters.
Here is the list (it is not properly organised):
1) Visualization, image processing, rendering, special effects
   Parallel ray-tracing
   University of Bristol (Computer Graphics group) http://www.cs.bris.ac.uk/research/graphics
   Kwangju Institute of Science & Technology (Information System group) http://parallel.kjist.ac.kr/projects.htm
2) Data mining
   PAPIA - PArallel Protein Information Analysis http://www.rwcp.or.jp/papia
   PADMA - PArallel Data Mining Agents http://www.eecs.wsu.edu/~hillol/padma.html
3) Google: web search engine running on a Linux cluster (more than 10,000 servers)
4) High-performance, high-availability web servers: Eddieware, khttpd (static pages)
5) Computational fluid dynamics
6) Search for Extraterrestrial Intelligence (SETI): radio signals are analysed by computationally intensive algorithms http://setiathome.ssl.berkeley.edu/
7) Folding@Home: an effort to understand protein folding and aggregation for use toward fighting degenerative diseases http://www.stanford.edu/group/pandegroup/folding/
8) Find-a-Drug: http://www.find-a-drug.org/ Evaluates the potential of different molecules to interact with certain protein targets. Molecules that are found to be "hits" can become new drug candidates used for treating important diseases.
9) GIMPS, the Great Internet Mersenne Prime Search: http://www.mersenne.org Searches for record-size Mersenne prime numbers. Discovered the 39th known Mersenne prime number, 2^13,466,917 - 1, on November 14, 2001.
10) Distributed Search for Fermat Number Divisors: http://www.fermatsearch.org/ Searches for additional divisors of Fermat numbers. Found 7 new divisors in 2003.
11) Brute force attacks on cryptographic keys
   Cracking RC4, RC5, DES
   Cracking passwords http://www.isg.inf.ethz.ch/docu/security/passwd/crackcluster.html http://www.cisiar.org/proyectos/cisilia/home_en.php
12) Other applications
   Computing for genomic analysis
   Genetic programming
   "Big Science" problems involving modeling, simulating and understanding large complex systems, for example: cosmology, subatomic physics, climate modeling
   Biomedicine and biochemistry
Research Scholar, VIT, India.
From award at andorra.ad Sat Nov 22 05:30:45 2003 From: award at andorra.ad (Alan Ward) Date: Sat, 22 Nov 2003 11:30:45 +0100 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> Message-ID: <3FBF3AD5.4040301@andorra.ad> Yes, but for the time being you need to boot the kernel off a diskette or network. You also need to hack the kernel a wee bit. I just sent an article on this to linuxgazette.net, and will keep you posted when it comes out (probably next month). Best regards, Alan Ward p.pennaz at tiscali.it wrote: > Does anyone know if it is a possibility in boot a linux PC system via USB > cartridge? > My usb subsystem is working fine. > Thank you
From andrewxwang at yahoo.com.tw Sat Nov 22 07:52:54 2003 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Sat, 22 Nov 2003 20:52:54 +0800 (CST) Subject: distributed computing applications In-Reply-To: <20031122053452.63176.qmail@web8005.mail.in.yahoo.com> Message-ID: <20031122125254.16605.qmail@web16811.mail.tpe.yahoo.com> Also don't miss Condor, PBS, and Grid Engine.
They are the enabling technologies for distributed/parallel applications.

Andrew.

From agrajag at dragaera.net Sat Nov 22 10:52:13 2003
From: agrajag at dragaera.net (Jag)
Date: 22 Nov 2003 10:52:13 -0500
Subject: booting from usb pen drive
In-Reply-To: <3FAA831D0001F2C1@mail-1.tiscali.it>
References: <3FAA831D0001F2C1@mail-1.tiscali.it>
Message-ID: <1069516333.2018.1.camel@loiosh>

On Fri, 2003-11-21 at 13:27, p.pennaz at tiscali.it wrote:
> Does anyone know if it is a possibility in boot a linux PC system via USB
> cartridge?
> My usb subsystem is working fine.

The short answer is yes. The long answer is, it depends on your BIOS. It's kinda like a few years ago when some systems would boot from CD and some wouldn't.

From james.p.lux at jpl.nasa.gov Sat Nov 22 09:11:27 2003
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Sat, 22 Nov 2003 06:11:27 -0800
Subject: booting from odd sources was Re: booting from usb pen drive
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad>
Message-ID: <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422>

Along the same lines (oddly, I was wondering about just this idea (booting from USB)), one can get an IDE<>compact flash adapter for about $20 that mounts right on the motherboard (space permitting). One CAN boot off the CF drive (and you could use sneakernet to get the stuff on the drive in the first place).
Check out http://www.mini-box.com/

As far as CF goes, places like Best Buy are selling 64MB for $35, 256MB for $80-85, but I'm sure a bit of research would turn it up cheaper (I was just looking through the inserts in the morning paper). Smart Media, Memory Sticks, and "secure digital memory" appear to be in the same general price range but I don't know about interfaces.

And another question.. has anyone done a network boot off a network adapter attached to the USB port? Or, more to my precise need, has anyone done a network boot over a wireless network adapter of any kind? Do the wireless adapters have the PXE or bootrom capability? Does the BIOS even allow you to specify the "non-wired to the bus" adapters as a boot device? Is there some fundamental chipset reason why they can't?

Jim Lux

From james.p.lux at jpl.nasa.gov Sat Nov 22 13:37:11 2003
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Sat, 22 Nov 2003 10:37:11 -0800
Subject: booting from usb pen drive
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh>
Message-ID: <000601c3b127$a5845560$32a8a8c0@laptop152422>

Aiee.. an answer just long enough to really whet our appetites. A bit longer answer please? Which BIOS? Which mobo? How could one tell (without having the mobo sitting in front of you)?

This could be a very elegant solution for booting diskless nodes, since virtually every mobo made today has USB interfaces on it, and would save you the hassle of putting CDROM or Floppy drives out there.

I'd point out that NOT every mobo out there has PXE or network boot capability, so this is a nice alternative.

From hahn at physics.mcmaster.ca Sat Nov 22 14:51:46 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Sat, 22 Nov 2003 14:51:46 -0500 (EST)
Subject: RHEL Copyright Removal
In-Reply-To:
Message-ID:

> - i think, imho, "support" is the most expensive part of the cluster's
> TCO and the hardware is relatively inexpensive in comparison ..

depends. some components of TCO scale with cluster size (fixing hardware failures, initial hardware cost, power, cooling). others scale with some function of the userbase (clueless ones require more support). but in this case, we're talking about the kind of support offered by dists and OS's: it doesn't scale with cluster size at all, since the cluster is basically a single machine.

> - $ 200/server * 50 machines ( $10K ) is still inexpensive compared to
> hiring an outsourced "linux support"

but that's silly - for the cost of a person, you get a lot more than what's offered by OS/dist support.

in summary, RH is doing a sensible thing. there's a market for OS/dists sold to "high-maintenance" users who can't or don't want to learn the details, and don't have someone else to do it. but it's silly to think that that kind of maintenance contract should scale with cluster size. it's also clear that there will continue to be low-maintenance dists (and afaict, that's exactly what Fedora is.)

I suppose that in a very indirect way, the cost to support a large cluster does scale with size. that is, if you have 1K CPUs that won't work at all, you should be willing to pay more than $200 for support. $200/machine is silly though, since these days, a node can easily be $2k or less, and 10% is simply too much.
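The arithmetic behind this exchange can be sketched in a few lines of Python. This is only an illustration: the inputs are the figures quoted in the thread ($200 per server, 50 machines, a $2k node), and the function names are made up for the example.

```python
def per_node_support_total(fee_per_node, nodes):
    """Total support bill when the fee is charged once per node."""
    return fee_per_node * nodes


def support_share_of_node_cost(fee_per_node, node_cost):
    """Support fee expressed as a fraction of one node's hardware cost."""
    return fee_per_node / node_cost


if __name__ == "__main__":
    # Figures taken from the thread: $200/server, 50 machines, $2k per node.
    total = per_node_support_total(200, 50)
    share = support_share_of_node_cost(200, 2000)
    print(f"50 nodes at $200 each: ${total}")
    print(f"fee as a share of a $2k node: {share:.0%}")
```

The first figure reproduces the $10K mentioned in the quoted message; the second is the 10% that the reply objects to. A flat, per-cluster fee would keep the first number constant as the node count grows, which is the point being argued.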

From tim at otten.co.uk Fri Nov 21 15:59:52 2003
From: tim at otten.co.uk (Tim)
Date: Fri, 21 Nov 2003 20:59:52 -0000
Subject: booting from usb pen drive
In-Reply-To: <3FAA831D0001F2C1@mail-1.tiscali.it>
Message-ID: <20031121205946.MZBN13700.mta05-svc.ntlworld.com@methodman>

Depends if your mobo has a boot from usb option.

From cmwoo at hkucc.hku.hk Fri Nov 21 23:41:33 2003
From: cmwoo at hkucc.hku.hk (Woo Chat Ming)
Date: Sat, 22 Nov 2003 12:41:33 +0800
Subject: How : up2date 128 nodes of Redhat 9 ?
Message-ID: <3FC7020D@webmaila.hku.hk>

Dear beowulf friends,

We are a university in Hong Kong and we have a Redhat Linux 9 beowulf cluster consisting of 128 nodes. All of them have real IP addresses and are connected to the Internet. May I know how I can up2date all those nodes using a single command?
Thanks in advance for your information.

Regards,
Woo Chat Ming.

From hahn at physics.mcmaster.ca Sat Nov 22 17:07:57 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Sat, 22 Nov 2003 17:07:57 -0500 (EST)
Subject: booting from usb pen drive
In-Reply-To: <20031121205946.MZBN13700.mta05-svc.ntlworld.com@methodman>
Message-ID:

> Depends if your mobo has a boot from usb option.

imagine that! I wonder how bootable usb-keys work. it would be pretty useless if the bios only had enough smarts to load a bootsector and run it. the bios must at least contain enough of a usb-block driver to let it emulate a floppy disk. if so, I'd expect linux to "just work"...

as long as you can somehow get even a bare kernel loaded, you're home free. things like gui bootloaders or even initrd's are just icing ;)

From laurenceliew at yahoo.com.sg Sat Nov 22 18:22:55 2003
From: laurenceliew at yahoo.com.sg (Laurence Liew)
Date: Sun, 23 Nov 2003 07:22:55 +0800
Subject: How : up2date 128 nodes of Redhat 9 ?
In-Reply-To: <3FC7020D@webmaila.hku.hk>
References: <3FC7020D@webmaila.hku.hk>
Message-ID: <1069543373.2179.6.camel@scalable>

Hi,

by strict definition, your 128 nodes are not a beowulf cluster but a NOW (network of workstations).

but anyway, u need a batch system, parallel shell or a script to launch up2date with commandline options. If you have sge, pbs or lsf etc installed, u could launch up2date and schedule the updates..

a better method is to explore other means to update only your master node and have your master node push, or your compute nodes pull, the updates... better overall security and manageability.
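As a rough illustration of the "script" option mentioned above, here is a minimal Python sketch that runs one command on every node over ssh, a few nodes at a time. Everything specific in it is an assumption rather than something from the thread: the host names `node001` to `node128` are hypothetical, passwordless ssh to each node is taken for granted, and `up2date -u` is only the assumed update invocation (check `up2date --help` on your release before relying on it).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names; replace with your own host list.
NODES = [f"node{i:03d}" for i in range(1, 129)]

# Assumed update command; verify the flags on your own systems.
UPDATE_CMD = "up2date -u"


def ssh_run(host, command):
    """Run `command` on `host` over ssh and return its exit status."""
    result = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode


def run_everywhere(hosts, command, runner=ssh_run, workers=16):
    """Run `command` on every host in the list, `workers` at a time.

    Returns a dict mapping host -> exit status, so that failed nodes
    are easy to spot and retry. `hosts` must be a list, not a generator.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, statuses))


def failed(results):
    """Return the sorted list of hosts whose command exited non-zero."""
    return sorted(h for h, status in results.items() if status != 0)


if __name__ == "__main__":
    results = run_everywhere(NODES, UPDATE_CMD)
    print("failed nodes:", failed(results) or "none")
```

The `runner` argument exists so the fan-out logic can be tried with a stand-in function before pointing it at real machines. Note that this is the "every node talks to the outside" approach; the push/pull-from-master method described above would point the nodes at a local mirror instead.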
take a look at Rocks, Scyld, Oscar etc.. these cluster toolkits help remove many manual tasks of managing a cluster.

Cheers!
laurence

ps. If u are going to ieee cluster 2003 in hong kong, we can meet and discuss further.

From mwheeler at startext.co.uk Sat Nov 22 20:58:06 2003
From: mwheeler at startext.co.uk (Martin WHEELER)
Date: Sun, 23 Nov 2003 01:58:06 +0000 (UTC)
Subject: [OT] statistical calculations
In-Reply-To:
Message-ID:

This is off-topic for this list, I know; but coming from my background (linguistics) I can't think of a better place to ask. It's probably not the usual size problem list-members deal with, but to me it feels like it.

I have to process a group of several thousand acquired datasets, each containing well over one hundred numerical items; and eventually, I'm going to have to work with a statistician to pull some meaningful figures out of it all. In other words, the data have to be massaged in some pretty fancy ways.

For various reasons outwith my control this is being done principally via a spreadsheet (wouldn't have been an obvious choice for me, but hey, I only know about words, not numbers).
Can anyone on this list used to doing this stuff point me towards a GPLed spreadsheet with built-in statistical functions? or an add-in to gnumeric / OpenOffice etc.? (I believe such exist.) Or maybe a library of GPLed spreadsheet macros?

Please correct me if I'm barking up a wrong tree here.

Any help appreciated,
--
Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England
mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/
GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB
- Share your knowledge. It's a way of achieving immortality. -

From agrajag at dragaera.net Sun Nov 23 02:09:12 2003
From: agrajag at dragaera.net (Jag)
Date: 23 Nov 2003 02:09:12 -0500
Subject: How : up2date 128 nodes of Redhat 9 ?
In-Reply-To: <3FC7020D@webmaila.hku.hk>
References: <3FC7020D@webmaila.hku.hk>
Message-ID: <1069571352.2022.18.camel@loiosh>

On Fri, 2003-11-21 at 23:41, Woo Chat Ming wrote:
> Dear beowulf friends,
>
> We are a university in Hong Kong and we have a Redhat Linux 9
> beowulf cluster consisting of 128 nodes. All of them have real
> IP address and are connected to the Internet.
> May I know how can I up2date all those nodes using a single
> command ?
> Thanks in advance for your information.

RHN (https://rhn.redhat.com/) can handle this for you. If you get all your machines registered with RHN, you can log into the RHN webpage, and with a few mouse clicks tell it what packages to update on them. Your machines should be running the rhn client daemon, which will regularly connect to RHN's servers and download the appropriate updates. up2date is a part of RHN.

If you're not up for paying RH for this service, I suggest looking into yum (http://linux.duke.edu/projects/yum/).
With it you can set up a local repository that you update with new updates from Red Hat. You then have a cronjob on all your machines to run yum, which will update them off your local repository.

Jag

From agrajag at dragaera.net Sun Nov 23 02:03:14 2003
From: agrajag at dragaera.net (Jag)
Date: 23 Nov 2003 02:03:14 -0500
Subject: booting from usb pen drive
In-Reply-To: <000601c3b127$a5845560$32a8a8c0@laptop152422>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422>
Message-ID: <1069570994.2022.10.camel@loiosh>

On Sat, 2003-11-22 at 13:37, Jim Lux wrote:
> Aiee.. an answer just long enough to really whet our appetites.
> A bit longer answer please? Which BIOS? Which mobo?

I don't have any specific machine information as to what does and doesn't support it.

> How could one tell
> (without having the mobo sitting in front of you)?

Check online specs/user guides/feature lists from your manufacturer. Ask your sales rep. Consult the almighty oracle that resides at google.com. And any other method you'd normally use to find out information on specific motherboards and other hardware components.

> This could be a very elegant solution for booting diskless nodes, since
> virtually every mobo made today has USB interfaces on it, and would save you
> the hassle of putting CDROM or Floppy drives out there.

I'm not sure I'd be a fan of it. On one hand, you could just have one usb pen drive that you use to boot all the nodes. Nice in theory, but I really don't want to have to touch a slave node just to reboot it.
Other than that, you'd have a nice usb key sticking out of either the front or rear of all your machines like a sore thumb, and it would be quite easy to accidentally brush against it and break/pull-out/snap-off in your usb port.

I have heard of people using usb devices (iPods) to boot public kiosk machines so that if a machine were cracked into, the real system files couldn't be tampered with, and a reboot would wipe any added back doors. But that's a very different situation.

From james.p.lux at jpl.nasa.gov Sun Nov 23 09:46:10 2003
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Sun, 23 Nov 2003 06:46:10 -0800
Subject: booting from usb pen drive
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh>
Message-ID: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422>

> > How could one tell
> > (without having the mobo sitting in front of you)?
>
> Check online specs/user guides/feature lists from your manufacturer.
> Ask your sales rep. Consult the almighty oracle that resides at
> google.com. And any other method you'd normally use to find out
> information on specific motherboards and other hardware components.

Online spec sheets are usually a bit sketchy, and, while one can download the user manual for the mobo most of the time, it often resorts to some hokey "Press F2 to get the boot selection menu, then press + or - to move the order around, see Figure nn" which you KNOW isn't the BIOS version you're going to get. As for the almighty oracle that is google, isn't that what this list is??? Perhaps one could email the mfr of the mobo and get an answer..
>
> > This could be a very elegant solution for booting diskless nodes, since
> > virtually every mobo made today has USB interfaces on it, and would save you
> > the hassle of putting CDROM or Floppy drives out there.
>
> I'm not sure I'd be a fan of it. On one hand, you could just have one
> usb pen drive that you use to boot all the nodes. Nice in theory, but I
> really don't want to have to touch a slave node just to reboot it.

I was thinking of a USB drive on each and every diskless node, not moving the one drive around.

> Other than that, you'd have a nice usb key sticking out of either the
> front or rear of all your machines like a sore thumb, and would be quite
> easy to accidentally brush against and break/pull-out/snap-off in your usb
> port.

Only if you packaged it that way... Lots of mobos have USB ports that come out to a header and they expect you to put a little adapter doohickey (which can cost as much as the USB drive) to create the USB jack on the front panel. Leave the USB drive inside the case.

> I have heard of people using usb devices (ipod's) to boot public kiosk
> machines so that if a machine were cracked into, the real system files
> couldn't be tampered with, and a reboot would wipe any added back doors.
> But that's a very different situation.

As far as I know, you can't make a USB pod read-only, which is what I'd want for a non-tamperable, non-hackable, backup.

Not so different. For what it's worth, this is how they do electronic voting machines, except I think they use Compact Flash. There's an "interesting" story about mass software updates of machines in Georgia over a weekend on the internet. (and you think managing the software configuration of a cluster is a challenge!)

In my specific case, I'm looking for a cheap, off the shelf diskless boot solution that is compatible with having only wireless access to the node.
My application is almost embarrassingly parallel (by deliberate design) and the goal is to show that "useful work" can be done with power being the only physical connection to each node. So far, the CF/IDE adapter looks like a winner...

From amacater at galactic.demon.co.uk Sun Nov 23 12:19:12 2003
From: amacater at galactic.demon.co.uk (Andrew M.A. Cater)
Date: Sun, 23 Nov 2003 17:19:12 +0000
Subject: booting from usb pen drive
In-Reply-To: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> <002101c3b1d0$89def4b0$36a8a8c0@laptop152422>
Message-ID: <20031123171912.GA533@galactic.demon.co.uk>

On Sun, Nov 23, 2003 at 06:46:10AM -0800, Jim Lux wrote:
>
> I was thinking of a USB drive on each and every diskless node, not moving
> the one drive around.
>
> > Other than that, you'd have a nice usb key sticking out of either the
> > front or rear of all your machines like a sore thumb, and would be quite
> > easy to accidentally brush against and break/pull-out/snap-off in your usb
> > port.
> Only if you packaged it that way... Lots of mobos have USB ports that come
> out to a header and they expect you to put a little adapter doohickey (which
> can cost as much as the USB drive) to create the USB jack on the front
> panel. Leave the USB drive inside the case.

Fine if you can. If you can't, the smallest 32M USB drive I've just seen is barely big enough to protrude beyond the rear of the case. Another has a neat cable to extend the USB "plug" by about two feet / 60cm.
Just leave it dangling neatly and run a cable tie round it to tie it to the ethernet cable :)

> > I have heard of people using usb devices (ipod's) to boot public kiosk
> > machines so that if a machine were cracked into, the real system files
> > couldn't be tampered with, and a reboot would wipe any added back doors.
> > But that's a very different situation.
> As far as I know, you can't make a USB pod read-only, which is what I'd want
> for a non-tamperable, non-hackable, backup.

At least one of those I saw yesterday for round the GBP30 mark had a physical R/W switch.

> Not so different. For what it's worth, this is how they do electronic
> voting machines, except I think they use Compact Flash. There's an
> "interesting" story about mass software updates of machines in Georgia over
> a weekend on the internet. (and you think managing the software
> configuration of a cluster is a challenge!)
>
> In my specific case, I'm looking for a cheap, off the shelf diskless boot
> solution that is compatible with having only wireless access to the node. My
> application is almost embarrassingly parallel (by deliberate design) and the
> goal is to show that "useful work" can be done with power being the only
> physical connection to each node. So far, the CF/IDE adapter looks like a
> winner...

This is effectively only a form factor converter. CF == IDE if you pull one pin low/high. Pull it whichever way (I can't remember right now :) ) and you can write to it as an IDE disk. Unassert it and it becomes CF and read only :)

Google for the Soekris wireless devices / the Openbrick low power low form factor devices used primarily as firewalls and WiFi devices - they do more or less exactly this, as do some of the low power mini-ITX boards. CF doesn't like too many writes but read is forever IIRC.
HTH,

Andy

From seth at hogg.org Sun Nov 23 16:49:01 2003
From: seth at hogg.org (Simon Hogg)
Date: Sun, 23 Nov 2003 21:49:01 +0000
Subject: booting from usb pen drive
In-Reply-To: <002101c3b1d0$89def4b0$36a8a8c0@laptop152422>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh>
Message-ID: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net>

> > > This could be a very elegant solution for booting diskless nodes, since
> > > virtually every mobo made today has USB interfaces on it, and would save you
> > > the hassle of putting CDROM or Floppy drives out there.
> >
> > I'm not sure I'd be a fan of it. On one hand, you could just have one
> > usb pen drive that you use to boot all the nodes. Nice in theory, but I
> > really don't want to have to touch a slave node just to reboot it.
>
>I was thinking of a USB drive on each and every diskless node, not moving
>the one drive around.
>
> > Other than that, you'd have a nice usb key sticking out of either the
> > front or rear of all your machines like a sore thumb, and would be quite
> > easy to accidentally brush against and break/pull-out/snap-off in your usb
> > port.
>Only if you packaged it that way... Lots of mobos have USB ports that come
>out to a header and they expect you to put a little adapter doohickey (which
>can cost as much as the USB drive) to create the USB jack on the front
>panel. Leave the USB drive inside the case.
>
>In my specific case, I'm looking for a cheap, off the shelf diskless boot
>solution that is compatible with having only wireless access to the node.
>My application is almost embarrassingly parallel (by deliberate design) and the
>goal is to show that "useful work" can be done with power being the only
>physical connection to each node. So far, the CF/IDE adapter looks like a
>winner...

Forgive my intrusion, but I don't see why this approach is so very different from having a disk (sure, it's a solid state disk, but still, it's kind of a disk) and for all the messing with trying to install a usb pen drive in each node, why not just stick a CD-ROM in it to boot from (apart from size)? At least that's pretty much guaranteed to be read-only.

But on a related note (and I *think* I have seen it on this list before) how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. It even comes in a 2.5" disk form factor.

URL is at http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for one of them, not sure if there are other developers out there, and I have no idea of cost.

Simon

From lathama at yahoo.com Sun Nov 23 17:27:22 2003
From: lathama at yahoo.com (Andrew Latham)
Date: Sun, 23 Nov 2003 14:27:22 -0800 (PST)
Subject: booting from usb pen drive
In-Reply-To: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net>
Message-ID: <20031123222722.2858.qmail@web60310.mail.yahoo.com>

The idea is that using a 32 MB USB memory device to boot a system gives you no moving parts, is cheap, and is 3l33t. The WEBASDISK/x is cool, but I am assuming that they are not under $100 USD.

--- Simon Hogg wrote:
>
> > > > This could be a very elegant solution for booting diskless nodes, since
> > > > virtually every mobo made today has USB interfaces on it, and would save you
> > > > the hassle of putting CDROM or Floppy drives out there.
> > >
> > > I'm not sure I'd be a fan of it.
On one hand, you could just have one > > > usb pen drive that you use to boot all the nodes. Nice in theory, but I > > > really don't want to have to touch a slave node just to reboot it. > > > >I was thinking of a USB drive on each and every diskless node, not moving > >the one drive around. > > > > > Other than that, you'd have a nice usb key sticking out of either the > > > front or rear of all your machines like a sore thumb, and would be quite > > > easy to accidently brush against and break/pull-out/snap-off in your usb > > > port. > >Only if you packaged it that way... Lots of mobos have USB ports that come > >out to a header and they expect you to put a little adapter dohickey (which > >can cost as much as the USB drive) to create the USB jack on the front > >panel. Leave the USB drive inside the case. > > > >In my specific case, I'm looking for a cheap, off the shelf diskless boot > >solution that is compatible with having only wireless access to the node. My > >application is almost embarassingly parallel (by deliberate design) and the > >goal is to show that "useful work" can be done with power being the only > >physical connection to each node. So far, the CF/IDE adapter looks like a > >winner... > > Forgive my intrusion, but I don't see why this approach is so very > different from having a disk (sure, it's a solid state disk, but still, > it's kind of a disk) and for all the messing with trying to install a usb > pen drive in each node, why not just stick a CD-ROM in it to boot from > (apart from size)? At least that's pretty much guaranteed to be read-only. > > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. It > even comes in a 2.5" disk form factor. > > URL is at > http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for > one of them, not sure if there are other developers out there, and I have > no idea of cost. 
> > Simn > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /---------------------------------------------------------------------------------------------------\ Andrew Latham -LathamA - Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a god or the future with which religions are concerned with. Or, if not impossible, at least impossible at the present time. LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com \---------------------------------------------------------------------------------------------------/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Sun Nov 23 18:26:40 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Sun, 23 Nov 2003 15:26:40 -0800 Subject: booting from usb pen drive References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> Message-ID: <000e01c3b21a$038f4280$36a8a8c0@laptop152422> It may be a virtual disk, but it's not a device with moving parts or one that requires anywhere as much cooling or power as a real disk. It also allows one to "power on boot" the cluster and be up and running relatively quickly, even with a low bandwidth link among nodes (i.e. wireless network), since one doesn't have to load the entire software image over the net. 
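A rough back-of-the-envelope sketch of why not loading the whole image over the net matters on a slow link. The image sizes (32 MB, 128 MB) are figures mentioned elsewhere in this thread; the 5 Mbit/s usable throughput, the 16-node count, and the idea that all nodes share one wireless channel are illustrative assumptions, not measurements of any real cluster.

```python
def image_load_seconds(image_mb, link_mbit_per_s, nodes=1):
    """Seconds to push one boot image to each of `nodes` nodes over a shared link.

    Assumes every node shares the same link bandwidth and ignores protocol
    overhead and retransmissions, so a real transfer would take longer.
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link rate must be positive")
    total_megabits = image_mb * 8 * nodes  # 8 bits per byte
    return total_megabits / link_mbit_per_s


if __name__ == "__main__":
    # Link rate and node count are assumptions chosen only for illustration.
    for size_mb in (32, 128):
        secs = image_load_seconds(size_mb, link_mbit_per_s=5, nodes=16)
        print(f"{size_mb} MB image, 16 nodes: about {secs / 60:.1f} minutes")
```

With a local boot device (USB key or CF card) that transfer time drops out of the power-on sequence entirely, whatever the real link rate turns out to be.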
There are all manner of weird and wonderful adapters and solid state disk emulators aimed at the industrial market, among others, but I was looking for something very consumer/mass market (read cheap), since this is only going to have to work in a lab environment, albeit, no moving parts and DC supply. ----- Original Message ----- From: "Simon Hogg" To: "Jim Lux" ; "Jag" Cc: > > > >In my specific case, I'm looking for a cheap, off the shelf diskless boot > >solution that is compatible with having only wireless access to the node. My > >application is almost embarassingly parallel (by deliberate design) and the > >goal is to show that "useful work" can be done with power being the only > >physical connection to each node. So far, the CF/IDE adapter looks like a > >winner... > > Forgive my intrusion, but I don't see why this approach is so very > different from having a disk (sure, it's a solid state disk, but still, > it's kind of a disk) and for all the messing with trying to install a usb > pen drive in each node, why not just stick a CD-ROM in it to boot from > (apart from size)? At least that's pretty much guaranteed to be read-only. > > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. It > even comes in a 2.5" disk form factor. > > URL is at > http://www.kontron.com/products/pdproductdetail.cfm?keyProduct=31731 for > one of them, not sure if there are other developers out there, and I have > no idea of cost. 
> > Simn > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sun Nov 23 18:49:03 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 23 Nov 2003 18:49:03 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: <1069570994.2022.10.camel@loiosh> Message-ID: > Other than that, you'd have a nice usb key sticking out of either the > front or rear of all your machines like a sore thumb, and would be quite many motherboards have several additional USB ports (as headers) located outside the standard ATX backplate. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sun Nov 23 19:57:50 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sun, 23 Nov 2003 16:57:50 -0800 (PST) Subject: booting from usb pen drive In-Reply-To: Message-ID: <20031124005750.33631.qmail@web60310.mail.yahoo.com> I would also urge the use of a dongle. just a small one to maybe make a custom mount. Crazy thought of the week. What about KVMs that allow the access of shared USB devices!?!? --- Mark Hahn wrote: > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be quite > > many motherboards have several additional USB ports (as headers) > located outside the standard ATX backplate. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /---------------------------------------------------------------------------------------------------\ Andrew Latham -LathamA - Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a god or the future with which religions are concerned with. Or, if not impossible, at least impossible at the present time. LathamA.com - (lay-th-ham-eh) - lathama at lathama.com - lathama at yahoo.com \---------------------------------------------------------------------------------------------------/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 23 21:39:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 24 Nov 2003 13:39:41 +1100 Subject: booting from usb pen drive In-Reply-To: <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069570994.2022.10.camel@loiosh> <4.3.2.7.2.20031123214109.00b599d0@pop.clara.net> Message-ID: <200311241339.49659.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 24 Nov 2003 08:49 am, Simon Hogg wrote: > But on a related note (and I *think* I have seen it on this list before) > how about the "WEBasIDE" little gubbins which provides TCP/IP over IDE. Looking at the website for this it looks like what it actually does is IDE over TCP/IP, rather than the other way around. Still, interesting gadget. 
:-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wW9zO2KABBYQAh8RAtiWAKCBTpUU00OlUzJ5+pJtfefkRUp90wCfT1TO zSvR7BTh/M7r/1tyGsisB4c= =gq0o -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Nov 23 22:17:02 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 24 Nov 2003 14:17:02 +1100 Subject: RHEL Copyright Removal In-Reply-To: <20031121155100.GD8468@northwestern.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> <20031121155100.GD8468@northwestern.edu> Message-ID: <200311241417.03362.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 22 Nov 2003 02:51 am, Jesse Becker wrote: > Actaully, I believe that ROCKS is based on RHEL 2.1 WS. This is only true for the IA64 version of Rocks 3.0.0 (the current version), the release notes say: http://rocks.npaci.edu/rocks-documentation/3.0.0/release-notes.html Based on RedHat 7.3 for x86 and RedHat Advanced Workstation 2.1 for ia64 (all packages recompiled from publicly available source). > I've used it a few times, and parts of it are quite nice. Rocks is pretty cool, we've recently put it on an IA32 cluster owned by one of our member institutions which we manage for them and we've tweaked the installed systems a little (removed OpenPBS and MAUI and put Scalable PBS and the latest MAUI on instead), but that said it just works (for us). YMMV. :-) > The ROCKS guys have automated most of the recompile process, but I don't > know if the automation includes stripping out the RH stuff. 
Not tried the IA64 version (yet), so can't comment on that yet. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wXguO2KABBYQAh8RArPlAJ9topK3mzXCVkAWljRoXxNhEsxS9wCZAWJG 2pjxpWIAx9/Rpjkvh4Dd4E8= =a+pR -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anand at novaglobal.com.sg Mon Nov 24 04:30:01 2003 From: anand at novaglobal.com.sg (Anand Vaidya) Date: Mon, 24 Nov 2003 17:30:01 +0800 Subject: RHEL Copyright Removal In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F4@ukjhmbx12.ukjh.zeneca.com> Message-ID: <200311241730.08129.anand@novaglobal.com.sg> You can checkout http://www.whiteboxlinux.org They seem to have successfully produced ISOs from RHEL3 sources. The dist is at RC1 now. Regards, Anand On Friday 21 November 2003 23:35, neil.brown at syngenta.com wrote: > Hi all, > > We're having a bit of a dilemma here, as I'm sure many others are, about > what to use as our standard Linux distro with the end of life of the Red > Hat family. RHEL or SLES are looking favourites in terms of supportability, > but of course there's the not insignificant problem of cost. The thought of > having to pay at least $179 per server, with around 50 compute nodes, along > with various other non-beowulf Linux servers doesn't appeal. > > I've been trying to find out how much effort it takes to strip the RH > copyrighted bits out of RHEL and compile it for our own use and whether > doing so reduces it's functionality a great deal. 
I've trawled the web and > usenet, but not found much to write home about on the subject. > > Have any of you had experiences with such an exercise? Were they positive? > How much effort was required? I believe ROCKS is based on RHEL 2.1 (not > sure whether it's ES or AS) so it surely can't be that bad as a cluster > oriented distro. > > Thanks for any suggestions, > Neil > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From neil.brown at syngenta.com Mon Nov 24 03:53:17 2003 From: neil.brown at syngenta.com (neil.brown at syngenta.com) Date: Mon, 24 Nov 2003 08:53:17 -0000 Subject: booting from usb pen drive Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F7@ukjhmbx12.ukjh.zeneca.com> > -----Original Message----- > From: Jim Lux [mailto:james.p.lux at jpl.nasa.gov] > Sent: 22 November 2003 18:37 > To: Jag > Cc: beowulf at beowulf.org > Subject: Re: booting from usb pen drive > > > Aiee.. an answer just long enough to really whet our appetites. > A bit longer answer please? Which BIOS? Which mobo? How > could one tell > (without having the mobo sitting in front of you)? > This could be a very elegant solution for booting diskless > nodes, since > virtually every mobo made today has USB interfaces on it, and > would save you > the hassle of putting CDROM or Floppy drives out there. I'd > point out that > NOT every mobo out there has PXE or network boot capability, > so this is a > nice alternative. Not sure about specific BIOS/Mobo models, you'd probably need to look at their specs on the respective manufacturers web sites, but Dell PC's have had this functionality built in for a while now. 
Ford made a big effort to get rid of floppy disk drives and use USB to boot their PC's when they needed to be rebuilt with their standard ghost image (albeit this was most probably Windoze). See http://tinyurl.com/wajo for more about that. My HP desktop PC that I'm writing this on also has support for USB boot, as, I imagine do most modern desktop PC's. As for 1U server motherboards, I'm not so sure, although again I'd imagine that most newish boards would have this capability. As you say, not every mobo has PXE capability and USB boot would certainly be a nice alternative to floppy booting in these cases. However, I think it's likely that if a mobo doesn't have PXE boot capability, it's not likely to have USB boot support either. Given the choice of the two, I'd go for PXE boot in a cluster computing environment unless I was doing a "one off" sort of thing where it wasn't worth the effort on the server side of the PXE boot. Just my tuppence worth, Neil _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From neil.brown at syngenta.com Mon Nov 24 04:06:36 2003 From: neil.brown at syngenta.com (neil.brown at syngenta.com) Date: Mon, 24 Nov 2003 09:06:36 -0000 Subject: RHEL Copyright Removal Message-ID: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> > Actaully, I believe that ROCKS is based on RHEL 2.1 WS. I've > used it a few > times, and parts of it are quite nice. The ROCKS guys have automated > most of the recompile process, but I don't know if the > automation includes > stripping out the RH stuff. > > -- > Jesse Becker > GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2 Thanks everyone for your replies on this topic. 
I think part of our problem is that we're ideally looking for a standard distro that we can use on our Linux servers and desktop PC's as well as on our cluster. This would be nice, as it'd make administration easier with the commonality between Linux boxes. Perhaps this isn't the best way of doing it though. I'm beginning to think that maybe something like Fedora would be good for the cluster. I've had a play with it and it seems VERY similar to RH9. The fast paced release cycle wouldn't be so bad for the cluster, as it's easy to rebuild and we wouldn't need to upgrade EVERY time a new Fedora release came out. For the other servers, we often run Oracle and we really need to run a supported distro. The problem is, about the only supported Linux distro's later than RH7.1 are "paid for" ones like RHEL and SLES. They do support UnitedLinux too though. What would be nice is if there was a free Linux distro based on UnitedLinux. I've looked at cAos before. Looks good, I'd like to try it when a release becomes available. Not heard of White Box before, but I'll have a look at it. Thanks again, Neil _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Nov 24 07:13:57 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 24 Nov 2003 07:13:57 -0500 (EST) Subject: [OT] statistical calculations In-Reply-To: Message-ID: On Sun, 23 Nov 2003, Martin WHEELER wrote: > This is off-topic for this list, I know; but coming from my background > (linguistics) I can't think of a better place to ask. > It's probably not the usual size problem list-members deal with, but to > me it feels like it. 
> > I have to process a group of several thousand acquired datasets, each > containing well over one hundred numerical items; and eventually, I'm > going to have to work with a statistician to pull some meaningful > figures out of it all. > In other words, the data have to be massaged in some pretty fancy ways. > > For various reasons outwith my control this is being done principally > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > I only know about words, not numbers). Can anyone on this list used to > doing this stuff point me towards a GPLed spreadsheet with built-in > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > Please correct me if I'm barking up a wrong tree here. Ask on the GSL (Gnu Scientific Library) list. There have been mentions on the list of people wrapping/encapsulating list functions in various ways, but I can't remember offhand if any of them were inside a spreadsheet per se. It also depends to some extent on what you mean by "built in statistical functions" -- GSL has the basic functions but is not a package like R. Which is the second thing you should probably look at on: www.r-project.org. R is a full-service stats suite with a variety of interfaces including web -- hopefully somebody has wrapped it up into a spreadsheet of some sort. rgb > > Any help appreciated, > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 24 08:20:28 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 24 Nov 2003 14:20:28 +0100 Subject: booting from usb pen drive In-Reply-To: <1069570994.2022.10.camel@loiosh> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh> <000601c3b127$a5845560$32a8a8c0@laptop152422> <1069570994.2022.10.camel@loiosh> Message-ID: <1069680028.1218.5.camel@penguin> On Sun, 2003-11-23 at 08:03, Jag wrote: > > > This could be a very elegant solution for booting diskless nodes, since > > virtually every mobo made today has USB interfaces on it, and would save you > > the hassle of putting CDROM or Floppy drives out there. > > I'm not sure I'd be a fan of it. On one hand, you could just have one > usb pen drive that you use to boot all the nodes. Nice in theory, but I > really don't want to have to touch a slave node just to reboot it. > Other than that, you'd have a nice usb key sticking out of either the > front or rear of all your machines like a sore thumb, and would be quite > easy to accidently brush against and break/pull-out/snap-off in your usb > port. > I'll be happy to help anyone who wants to get Stresslinux running. Also another potential use would be for BIOS updates to nodes without floppies. Yes, I know it is just as easy to have a USB floppy drive attached. But the USB keychain things are just so portable. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 24 08:17:46 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 24 Nov 2003 14:17:46 +0100 Subject: booting from odd sources was Re: booting from usb pen drive In-Reply-To: <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad> <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422> Message-ID: <1069679866.1218.2.camel@penguin> On Sat, 2003-11-22 at 15:11, Jim Lux wrote: > Along the same lines (oddly, I was wondering about just this idea (booting > from USB)), one can get a IDE<>compact flash adapter for about $20 that > mounts right on the motherboard (space permitting). One CAN boot off the CF > drive (and you could use sneakernet to get the stuff on the drive in the > first place). I have booted the mini-ITX boards off Compact Flash and USB. Its quite easy. The secret though with the M1000 board is to completely power it off first. I have booted Tyan boards with a USB stick having Stresslinux on it. http://www.stresslinux.org Good to have in your toolkit - does CPU burn, memtest, Bonnie++, lm_sensors _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 24 09:07:52 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 24 Nov 2003 09:07:52 -0500 Subject: booting from usb pen drive In-Reply-To: <1069680028.1218.5.camel@penguin> References: <1069680028.1218.5.camel@penguin> Message-ID: <3FC210B8.60503@lmco.com> Good morning! A friend of mine and I have been talking about this type of thing for about a year now. 
Our idea was to put a base install on a CF card and boot from it. Prices on CF aren't too bad until you get to the high end (yes I know hard drives are cheaper) and with some cluster distributions, you only need 128 Megs and you would have plenty of space (may be able to get that down to 64 megs). Our goal behind using CF cards was to eliminate hard drives from the nodes as a possible source of downtime. One neat little gizmo we found is a 7-in-1 reader which can also handle floppies: http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=170109 It's a little pricey, but you can have pretty much whatever solid-state media is out there. It fits into a floppy bay. As others have pointed out, if your motherboard can boot off USB then this should work. Enjoy! Jeff > On Sun, 2003-11-23 at 08:03, Jag wrote: > > > > > > This could be a very elegant solution for booting diskless nodes, > since > > > virtually every mobo made today has USB interfaces on it, and > would save you > > > the hassle of putting CDROM or Floppy drives out there. > > > > I'm not sure I'd be a fan of it. On one hand, you could just have one > > usb pen drive that you use to boot all the nodes. Nice in theory, > but I > > really don't want to have to touch a slave node just to reboot it. > > Other than that, you'd have a nice usb key sticking out of either the > > front or rear of all your machines like a sore thumb, and would be > quite > > easy to accidently brush against and break/pull-out/snap-off in your > usb > > port. > > > -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Mon Nov 24 09:29:18 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Mon, 24 Nov 2003 09:29:18 -0500 Subject: [OT] statistical calculations In-Reply-To: <200311241217.hAOCHMS31645@NewBlue.scyld.com> References: <200311241217.hAOCHMS31645@NewBlue.scyld.com> Message-ID: <20031124142918.GA52661@piskorski.com> > From: "Robert G. Brown" > To: Martin WHEELER > On Sun, 23 Nov 2003, Martin WHEELER wrote: > > I have to process a group of several thousand acquired datasets, each > > containing well over one hundred numerical items; and eventually, I'm > > going to have to work with a statistician to pull some meaningful > > figures out of it all. > > In other words, the data have to be massaged in some pretty fancy ways. > > > > For various reasons outwith my control this is being done principally > > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > > I only know about words, not numbers). Can anyone on this list used to > > doing this stuff point me towards a GPLed spreadsheet with built-in > > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > > Please correct me if I'm barking up a wrong tree here. > Ask on the GSL (Gnu Scientific Library) list. There have been mentions > on the list of people wrapping/encapsulating list functions in various > ways, but I can't remember offhand if any of them were inside a > spreadsheet per se. It also depends to some extent on what you mean by > "built in statistical functions" -- GSL has the basic functions but is > not a package like R. 
> Which is the second thing you should probably
> look at on: www.r-project.org. R is a full-service stats suite with a
> variety of interfaces including web -- hopefully somebody has wrapped it
> up into a spreadsheet of some sort.

Martin, R should definitely do whatever statistical stuff you want. There is also an R plugin for the Gnumeric spreadsheet, and some stuff to let MS Excel call R. I've never tried either of those plugins, but they might be good if you don't want to use R directly:

http://www.omegahat.org/RGnumeric/

For general vendor data clean-up and conversion issues, well, that depends. :) You didn't say enough for me to know whether you need to worry about that or not, but most of the vendor data I've seen (not in linguistics) has needed cleanup of some sort!

In my own line of work, for that sort of thing (which means for financial/market data), I mostly write Tcl code to read and manipulate the files, shove all the data into an RDBMS like Oracle or PostgreSQL, then sometimes do additional processing in the database. This works well, but if you're not already using an RDBMS you probably do NOT want to get into that just for this one application.

Most likely, as long as your data all fits (or almost fits?) into RAM, and you don't need the many-readers many-writers (concurrency, atomicity, etc.) support that a real RDBMS provides, stuffing all your data into R's built-in matrix or data frame types should be fine. Depending on what the vendor files look like to begin with, you may want to pre-process them a bit with a Tcl, Perl, Python, or whatever script first to make them easier to get into R via R's read.table() function.
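As a minimal sketch of that kind of pre-processing step, here is one way it might look in Python. The input layout (semicolon-delimited, "#" comment lines, a handful of missing-value markers) is entirely hypothetical, since the thread never says what the real files look like; adjust the delimiter and markers to the actual data.

```python
import csv
import io

# Hypothetical markers a vendor file might use for missing values (assumption).
MISSING_MARKERS = {"", "-", "n/a", "N/A", "null", "NULL"}


def clean_rows(lines, delimiter=";"):
    """Yield cleaned rows from raw text lines.

    Blank lines and '#' comment lines are dropped, cells are trimmed, and
    missing-value markers become "NA", which R's read.table() treats as
    missing by default.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        cells = [cell.strip() for cell in stripped.split(delimiter)]
        yield ["NA" if cell in MISSING_MARKERS else cell for cell in cells]


def to_tsv(lines, delimiter=";"):
    """Return the cleaned data as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in clean_rows(lines, delimiter):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    raw = [
        "# export from vendor tool",
        "id ; score ; label",
        "1 ; 3.5 ; noun",
        "",
        "2 ; n/a ; verb",
    ]
    print(to_tsv(raw), end="")
```

If the result is saved to a file (say clean.tsv, a made-up name), something like read.table("clean.tsv", header=TRUE, sep="\t") on the R side should then load it as a data frame.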
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Nov 24 09:12:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 24 Nov 2003 15:12:15 +0100 Subject: booting from usb pen drive In-Reply-To: <3FC210B8.60503@lmco.com> References: <1069680028.1218.5.camel@penguin> <3FC210B8.60503@lmco.com> Message-ID: <1069683134.1218.18.camel@penguin> On Mon, 2003-11-24 at 15:07, Jeff Layton wrote: > Good morning! > > A friend of mine and I have been talking about this type > of thing for about a year now. Our idea was to put a base > install on a CF card and boot from it. All you need is a CF to IDE adapter. Google will find plenty, eg. http://www.cfide.co.uk/compact_flash_ide_adapters.shtml The Compact Flash card then plugs straight on an IDE cable. If you put a Linux image on the CF card the machine will boot it just the same as from a hard disk. (Of course I mean you read from the CF and boot to RAM) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Nov 24 09:22:08 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 24 Nov 2003 09:22:08 -0500 Subject: booting from usb pen drive In-Reply-To: <1069683134.1218.18.camel@penguin> References: <1069683134.1218.18.camel@penguin> Message-ID: <3FC21410.3050908@lmco.com> John Hearns wrote: > On Mon, 2003-11-24 at 15:07, Jeff Layton wrote: > > Good morning! > > > > A friend of mine and I have been talking about this type > > of thing for about a year now. Our idea was to put a base > > install on a CF card and boot from it. > > All you need is a CF to IDE adapter. 
> Google will find plenty, eg. > http://www.cfide.co.uk/compact_flash_ide_adapters.shtml > The Compact Flash card then plugs straight on an IDE cable. > If you put a Linux image on the CF card the machine will boot it just > the same as from a hard disk. > (Of course I mean you read from the CF and boot to RAM) > We were thinking of actually booting and running off the CF card, with it mounted read-only (RO). We would have to move certain things, such as parts of /var, /dev (probably), and a few others, to a RAM disk. However, using it with something like Warewulf, which has already solved most of the details, would be really neat. I'll have to think about trying this one. Thanks for the pointer! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laurenceliew at yahoo.com.sg Mon Nov 24 09:01:30 2003 From: laurenceliew at yahoo.com.sg (Laurence Liew) Date: Mon, 24 Nov 2003 22:01:30 +0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> Message-ID: <1069682488.2179.127.camel@scalable> Hi all, Red Hat announced academic pricing a week or so ago: USD25 per desktop (RHEL WS based) and USD50 per academic server (RHEL ES based). >Raleigh, N.C.-based Red Hat, the top seller of the open-source >operating system, will sell students its Red Hat Academic Desktop >product for $25 and sell schools its Red Hat Academic Server product >for $50, including online software updates but no telephone support. >The products will be offered first in the United States, but will be >available internationally by the end of the year, said John Young, vice >president of marketing. 
I have been building clusters for 5 - 6 years for various customers, and have seen the arrival and disappearance of distros and cluster distros.... The cluster community has done very well, and today large commercial organisations are adopting Linux clusters as one of the tools they use to solve their complex problems. But I find this talk of "stripping" RHEL copyright to create yet another distro to be counterproductive as Linux Beowulf clusters go into commercial mainstream computing.... where customers have specific support demands. (And yes... commercial customers WILL PAY the full list price of RHEL to build a cluster). Now... I believe USD25 and USD50 are acceptable prices for the value that RHEL + RHN brings to the (academic) customer. The cost of the OS is a small fraction of the total value of the cluster. Most of our users want a stable and supported OS, but more importantly, most of them run commercial software of one form or another... and these 3rd party ISV packages are most likely to be certified on RHEL. It would do me no good to build a cluster with a "RHEL with copyright removed" or a Fedora Core, as my customers would not be able to get support for their Ansys, Fluent, Matlab and so on and so forth... yes, technically they can be the same.. but the commercial support matrix says otherwise. BTW ROCKS V3 is based on RHEL 3.0 WS... With the new RHEL academic pricing model, I would encourage all to go for the academic pricing for RHEL and focus on the real problem at hand, which is building better cluster systems on top of a commercial quality, robust and supported OS, rather than trying to roll your own distro.. and support updates etc etc... Linux has enough distros already. What we should be concentrating on is creating more value on top of existing distros such as RHEL... create better cluster toolkits like what the Rocks and Oscar guys are doing, or improve on Ganglia, PVFS, distributed shared mem, checkpointing etc.... 
or focus on getting your apps to run faster... There are a lot of cluster problems that need to be addressed, and I believe the community would benefit more if we focused on these issues rather than on another distro.... Let Red Hat make what they deserve, and let them continue to engage the ISVs and get them to certify and support RHEL... the wider the base of ISVs running on RHEL, the faster and wider the adoption of Linux, not only in schools but also in enterprises. If the community continues to fork a project just because it charges some $$$$, our progress will be very slow.... Red Hat has listened to its customers and partners and has created an academic pricing model for cluster builders... so we should accept that and move on. Today the Linux market is anchored by Red Hat and a few other Linux vendors... imagine if Red Hat were to become unprofitable and close shop.... the impact would be tremendous. Yes.. there will always be another Linux company that will try to take over Red Hat's position in the market..., but the credibility of the Linux community and the open source business model would be thrown into disarray, and you would see droves of commercial ISVs abandoning Linux and moving back to UNIX and Windows.... where would that leave us? Without commercial apps, Linux would never sustain itself and grow in the commercial arena. cheers! laurence On Mon, 2003-11-24 at 17:06, neil.brown at syngenta.com wrote: > > > Actually, I believe that ROCKS is based on RHEL 2.1 WS. I've > > used it a few > > times, and parts of it are quite nice. The ROCKS guys have automated > > most of the recompile process, but I don't know if the > > automation includes > > stripping out the RH stuff. > > > > -- > > Jesse Becker > > GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2 > > Thanks everyone for your replies on this topic. 
> > I think part of our problem is that we're ideally looking for a standard > distro that we can use on our Linux servers and desktop PC's as well as on > our cluster. This would be nice, as it'd make administration easier with the > commonality between Linux boxes. Perhaps this isn't the best way of doing it > though. I'm beginning to think that maybe something like Fedora would be > good for the cluster. I've had a play with it and it seems VERY similar to > RH9. The fast paced release cycle wouldn't be so bad for the cluster, as > it's easy to rebuild and we wouldn't need to upgrade EVERY time a new Fedora > release came out. > > For the other servers, we often run Oracle and we really need to run a > supported distro. The problem is, about the only supported Linux distro's > later than RH7.1 are "paid for" ones like RHEL and SLES. They do support > UnitedLinux too though. What would be nice is if there was a free Linux > distro based on UnitedLinux. > > I've looked at cAos before. Looks good, I'd like to try it when a release > becomes available. Not heard of White Box before, but I'll have a look at > it. > > Thanks again, > Neil > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From azubrow at galton.uchicago.edu Mon Nov 24 11:38:19 2003 From: azubrow at galton.uchicago.edu (Alexis Zubrow) Date: Mon, 24 Nov 2003 10:38:19 -0600 (CST) Subject: [OT] statistical calculations In-Reply-To: Message-ID: Martin- A related possibility is to use some sort of database. You might be able to "easily" translate the original datasets into one of the SQL based database formats. 
If you can do that, I know that some of them can be accessed via python or R, which will give you a much larger suite of computational possibilities. One database that I've tried out is mySQL: http://www.mysql.com I know that this can be accessed via python and R, as well as a bunch of other programming languages. Though it doesn't sound like you want or need to parallelize this, both python and R have wrappers around MPI code. Best, Alexis > > For various reasons outwith my control this is being done principally > > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > > I only know about words, not numbers). Can anyone on this list used to > > doing this stuff point me towards a GPLed spreadsheet with built-in > > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > > Please correct me if I'm barking up a wrong tree here. > > Ask on the GSL (Gnu Scientific Library) list. There have been mentions > on the list of people wrapping/encapsulating list functions in various > ways, but I can't remember offhand if any of them were inside a > spreadsheet per se. It also depends to some extent on what you mean by > "built in statistical functions" -- GSL has the basic functions but is > not a package like R. Which is the second thing you should probably > look at on: www.r-project.org. R is a full-service stats suite with a > variety of interfaces including web -- hopefully somebody has wrapped it > up into a spreadsheet of some sort. 
> > rgb > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Mon Nov 24 13:29:13 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Mon, 24 Nov 2003 13:29:13 -0500 Subject: [OT] statistical calculations In-Reply-To: <200311241702.hAOH2IS02987@NewBlue.scyld.com> References: <200311241702.hAOH2IS02987@NewBlue.scyld.com> Message-ID: <20031124182913.GA12259@piskorski.com> On Mon, Nov 24, 2003 at 12:02:18PM -0500, beowulf-request at scyld.com wrote: > A related possibility is to use some sort of database. You might be able Yes indeed. > computational possibilities. One database that I've tried out is mySQL: > http://www.mysql.com This is getting way, way off topic for this list, but as someone who's done a lot of database programming, I feel compelled to point out that, generally speaking, you should never, ever use MySQL for anything important unless you BOTH: 1. Have very specific technical requirements which you have assured yourself MySQL is capable of meeting. (This will be many fewer applications than you might think.) 2. Have specific reasons why MySQL is a better choice for you than any other database. (E.g., you are really cheap, and can find a shared hosting service offering MySQL cheaper than one offering PostgreSQL.) There are many, many reasons why MySQL is usually a poor choice for database applications, but if you care, here are two links to get you started: http://openacs.org/philosophy/why-not-mysql.html http://sql-info.de/mysql/gotchas.html But if you don't want to worry about any of that, the answer is simple: just use PostgreSQL instead. (Or perhaps Firebird or SAPdb; but PostgreSQL would be my first choice in any open source database.) 
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netmeister.org Mon Nov 24 09:53:47 2003 From: jschauma at netmeister.org (Jan Schaumann) Date: Mon, 24 Nov 2003 09:53:47 -0500 Subject: booting from odd sources was Re: booting from usb pen drive In-Reply-To: <1069679866.1218.2.camel@penguin> References: <3FAA831D0001F2C1@mail-1.tiscali.it> <3FBF3AD5.4040301@andorra.ad> <002a01c3b102$85d4c2b0$32a8a8c0@laptop152422> <1069679866.1218.2.camel@penguin> Message-ID: <20031124145347.GA15355@netmeister.org> John Hearns wrote: > I have booted Tyan boards with a USB stick having Stresslinux on it. > http://www.stresslinux.org `` Hey, it worked ! The SSL/TLS-aware Apache webserver was successfully installed on this website. If you can see this page, then the people who own this website have just installed the Apache Web server software and the Apache Interface to OpenSSL (mod_ssl) successfully. They now have to add content to this directory and replace this placeholder page, or else point the server at their real content. [...]'' Hehe. -Jan -- 'I have reached an age where my main purpose is not to receive messages.' --- Umberto Eco, quoted in the New Yorker -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 186 bytes Desc: not available URL: From mp00aa at cosc.brocku.ca Sun Nov 23 18:25:49 2003 From: mp00aa at cosc.brocku.ca (Matthew Timothy Pratola) Date: Sun, 23 Nov 2003 18:25:49 -0500 Subject: [OT] statistical calculations (Martin WHEELER) In-Reply-To: <200311231701.hANH1CS16397@NewBlue.scyld.com> References: <200311231701.hANH1CS16397@NewBlue.scyld.com> Message-ID: Hello Martin, The primary opensource package used in statistical analyses is R, which you can find at www.r-project.org. R is an OSS implementation of the S language, which is also the basis for the commercial package S-Plus. A quick search of "using R in gnumeric" gives the following link: http://www.omegahat.org/RGnumeric/Docs/introduction.pdf which may be helpful. I don't know if any of the other spreadsheet programs have an R plugin written, i would suspect not. At any rate, if the data you are working with will require some pretty fancy approaches, i'd be pretty suprised that any spreadsheet program (ie without said plugin) would be able to do anything half-decent with any degree of reliability. Especially for large datasets. Anyhow, to keep this slightly on-topic, in a recent conversation with someone from the R project, i was told that there is maybe a rough, non-widely distributed implementation of MPI in R, which i think would be nice, but currently searching R and MPI on google does not yield much. Actually the person i spoke to gave me a name to search for, but i don't have that information in front of me right now... -Matt ps - i'm a starving grad student just heading home for xmas vacation, so i don't have a lot to do for the next 3 weeks if you are looking for some short-term R coding work to be done... ....................................................................... Matthew T. 
Pratola http://zynec.homelinux.net mtpratol _at_ cs.sfu.ca Home: 604.899.8845 Office: 604.291.4983 Department of Statistics and Actuarial Science, Simon Fraser University ....................................................................... > I have to process a group of several thousand acquired datasets, each > containing well over one hundred numerical items; and eventually, I'm > going to have to work with a statistician to pull some meaningful > figures out of it all. > In other words, the data have to be massaged in some pretty fancy ways. > > For various reasons outwith my control this is being done principally > via a spreadsheet (wouldn't have been an obvious choice for me, but hey, > I only know about words, not numbers). Can anyone on this list used to > doing this stuff point me towards a GPLed spreadsheet with built-in > statistical functions? or an add-in to gnumeric / OpenOffice etc.? > (I believe such exist.) Or maybe a library of GPLed spreadsheet macros? > Please correct me if I'm barking up a wrong tree here. > > Any help appreciated, > -- > Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England > mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ > GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB > - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From r.grenyer at imperial.ac.uk Mon Nov 24 10:53:29 2003 From: r.grenyer at imperial.ac.uk (Rich Grenyer) Date: Mon, 24 Nov 2003 15:53:29 +0000 Subject: [OT] statistical calculations In-Reply-To: <20031124142918.GA52661@piskorski.com> Message-ID: <58C0AF58-1E96-11D8-8E20-003065F0ED32@imperial.ac.uk> Likewise, as a heavy-ish R user, I'd say go look, immediately. 
R is a stunning piece of software anyway, but I suspect the level of interface between most major language packages (as the previous poster said, it talks both ways to Tcl, Perl and Python to name but a few) and database implementations alone would make it your first stop. *Most* statisticians would love you for it, too. Rich On Monday, Nov 24, 2003, at 14:29 Europe/London, Andrew Piskorski wrote: >> From: "Robert G. Brown" >> To: Martin WHEELER > >> On Sun, 23 Nov 2003, Martin WHEELER wrote: >>> I have to process a group of several thousand acquired datasets, each >>> containing well over one hundred numerical items; and eventually, I'm >>> going to have to work with a statistician to pull some meaningful >>> figures out of it all. >>> In other words, the data have to be massaged in some pretty fancy >>> ways. >>> >>> For various reasons outwith my control this is being done principally >>> via a spreadsheet (wouldn't have been an obvious choice for me, but >>> hey, >>> I only know about words, not numbers). Can anyone on this list used >>> to >>> doing this stuff point me towards a GPLed spreadsheet with built-in >>> statistical functions? or an add-in to gnumeric / OpenOffice etc.? >>> (I believe such exist.) Or maybe a library of GPLed spreadsheet >>> macros? >>> Please correct me if I'm barking up a wrong tree here. > >> Ask on the GSL (Gnu Scientific Library) list. There have been >> mentions >> on the list of people wrapping/encapsulating list functions in various >> ways, but I can't remember offhand if any of them were inside a >> spreadsheet per se. It also depends to some extent on what you mean >> by >> "built in statistical functions" -- GSL has the basic functions but is >> not a package like R. Which is the second thing you should probably >> look at on: www.r-project.org. R is a full-service stats suite with a >> variety of interfaces including web -- hopefully somebody has wrapped >> it >> up into a spreadsheet of some sort. 
> > Martin, R should definitely do whatever statistical stuff you want. > There is also an R plugin for the Gnumeric spreadsheet, and some stuff > to let MS Excel call R. I've never tried either of those plugins, but > they might be good if you don't want to use R directly: > > http://www.omegahat.org/RGnumeric/ > > For general vendor data clean-up and conversion issues, well, that > depends. :) You didn't say enough for me to know whether you need to > worry about that or not, but most of the vendor data I've seen (not in > linguistics) has always needed cleanup of some sort! > > In my own line of work, for that sort of thing (which means for > financial/market data), I mostly write Tcl code to read and manipulate > the files, shove all the data into an RDBMS like Oracle or PostgreSQL, > then sometimes do additional processing in the database. This works > well, but if you're not already using an RDBMS you probably should NOT > want to get into that for just for this one application. > > Most likely, as long as your data all fits (or almost fits?) into RAM, > and you don't need the many-readers many-writers (concurrency, > atomicity, etc.) support that a real RDBMS provides, stuffing all your > data into a R's built in matrix or dataframe types should be fine. > Depending on what the vendor files look like to begin with, you may > want to pre-process them a bit with a Tcl, Perl, Python, or whatever > script first to make them easier to get into R via R's read.table() > function. 
> > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From william.mandra at us.army.mil Sun Nov 23 22:36:01 2003 From: william.mandra at us.army.mil (William J Mandra) Date: Sun, 23 Nov 2003 22:36:01 -0500 Subject: Need a little help getting started Message-ID: Hello all. I am new to this list and apologize in advance if any of my questions are silly, but here goes. I am in the design phase of a cluster and I am having some trouble figuring out which software packages to use. The cluster will initially consist of 12 nodes linked via 100BaseT switched Ethernet and a cluster controller. The following are some of my requirements: 1. All nodes netboot off of the cluster controller 2. automatic process migration and load balancing (openMOSIX) 3. distributed shared memory The cluster controller will be connected to both the main network and the private cluster network and I would like to be able to start applications on the cluster remotely via the cluster controller. I have been doing an exhaustive amount of research on all of the different software available to accomplish this, but I have fallen short in figuring out which packages will work together. I am planning on using Red Hat 9 on all of the nodes in the cluster. I just need a little more information to give me that push in the right direction. I do have some time, as I am not planning to start building the cluster until March or April. 
Thanks in advance, William Mandra _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Mon Nov 24 15:04:55 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Mon, 24 Nov 2003 12:04:55 -0800 Subject: [OT] statistical calculations (Martin WHEELER) In-Reply-To: (Matthew Timothy Pratola's message of "Sun, 23 Nov 2003 18:25:49 -0500") References: <200311231701.hANH1CS16397@NewBlue.scyld.com> Message-ID: <851xrxh6s8.fsf@blindglobe.net> Matthew Timothy Pratola writes: > Anyhow, to keep this slightly on-topic, in a recent conversation with > someone from the R project, i was told that there is maybe a rough, > non-widely distributed implementation of MPI in R, which i think would be > nice, but currently searching R and MPI on google does not yield much. > Actually the person i spoke to gave me a name to search for, but i don't > have that information in front of me right now... Look for Rmpi. I believe it's in the contrib non-current directory on CRAN. It works with LAM-MPI, though we've talked about extending it to MPICH. If interested in programming statistical calculations on a beowulf, one might consider SNOW, which is an R library which provides a higher (but simpler) level implementation (independent of PVM or MPI -- will even use socket-based communication on a cluster if you don't have it), and integrates transparently with SPRNG (the scalable parallel RNG). See http://www.analytics.washington.edu/~rossini/courses/cph-statcomp/ and Lecture/Lab 4 for description/issues in interactively computing statistical quantities on a computational cluster (for statisticians who don't want to figure out communication, and just want to get results faster). 
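For readers who have not seen this style of computing, the following is a rough Python analogue of the idea. It is not SNOW, Rmpi, or SPRNG, and it uses none of their APIs. It farms independent bootstrap replicates out to worker processes on a single machine, giving each task its own seeded generator. The data values and function names are made up for illustration. Seeding each task with a different integer is only a crude stand-in for what a proper parallel RNG such as SPRNG provides, since distinct seeds do not by themselves guarantee independent streams.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor

# Made-up sample data, for illustration only.
DATA = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]


def bootstrap_mean(seed):
    """One bootstrap replicate: resample DATA with replacement, return the mean."""
    rng = random.Random(seed)   # per-task generator, so reruns are reproducible
    sample = [rng.choice(DATA) for _ in DATA]
    return statistics.mean(sample)


def run_parallel(seeds, workers=4):
    """Farm the replicates out to worker processes; results come back in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bootstrap_mean, seeds))


if __name__ == "__main__":
    means = run_parallel(range(200))
    print("bootstrap SE of the mean: %.3f" % statistics.stdev(means))
```

The point of the design is that each replicate is a pure function of its seed, so the work can be split across any number of workers (or run serially) without changing the answer.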
(I'm biased in my view -- we wrote the wrappers to PVM and SPRNG for R, as well as contributed to SNOW, and just need to extend the current set of MPI wrappers). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf 
From becker at scyld.com Mon Nov 24 16:51:06 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 16:51:06 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: Message-ID: On Sat, 22 Nov 2003, Mark Hahn wrote: > > Depends if your mobo has a boot from usb option. It's slightly more complex than that: only some (many, but not all) USB memory devices are usable as boot media. The Intel-branded Itanium-2 (I2) machines can boot from USB devices. 
Intel might be the best source for a list of usable USB boot devices. The I2 might be the only interesting case for USB booting: an I2 kernel can't even come close to fitting on a 1.44 or 2.88 MB floppy! > I wonder how bootable usb-keys work. it would be pretty useless > if the bios only had enough smarts to load a bootsector and run it. > the bios must at least contain enough of a usb-block driver to let > it emulate a floppy disk. if so, I'd expect linux to "just work"... We've been doing this for years with Scyld BeoBoot: use the BIOS to load both the kernel and a ramdisk '/'. The now-standard Linux approach is loading an "initrd", which accomplishes the same thing with a slightly different environment. The advantage here is that the kernel doesn't require USB support built-in, or any USB support at all! Everything needed from the boot media is loaded into memory by the boot ROM + BIOS. But the bottom line is that booting is no longer a hotly-debated cluster issue. Essentially every current system has PXE network booting. Approaches such as BeoBoot stage 1 or USB booting are only needed for legacy machines. With x86 machines you can use PXE to do BIOS updates, hardware diagnostics, or boot the machine as a cluster node, all without touching the hardware. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Nov 24 17:13:53 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 17:13:53 -0500 (EST) Subject: Need a little help getting started In-Reply-To: Message-ID: On Sun, 23 Nov 2003, William J Mandra wrote: > Hello all. 
I am new to this lit and apologize in advance if any of the > questions that I have are silly but here it goes. I am in the design phase > of a cluster and I am having some trouble figuring out which software > packages to use. The cluster will originally consist of 12 nodes linked via > 100BaseT switched ethernet and a cluster controller. The following are some > of my requirements: > 1. All nodes netboot off of the cluster controller > 2. automatic process migration and load balancing (openMOSIX) Do you require transparent process migration at run-time (e.g. Mosix), which imposes significant overhead, or will directed process migration work? > 3. distributed shared memory Ahhh, you have control of your application, which implies that you likely won't benefit from transparent process migration. There are several Distributed Shared Memory (DSM) systems, with different design tradeoffs. Since it's very easy to thrash a DSM system, you should select one that matches your application's needs and then carefully tune your application. You should treat the DSM system exactly the same as MPI or the message-passing subsystem of PVM: a library that fits with the rest of the system, not the piece around which everything else revolves. > The cluster controller will be connected to both the main network and the > private cluster network and I would like to be able to start applications on > the cluster remotely via the cluster controller. That's a normal configuration. Almost every cluster design configures one master (or a small number of them) and designates the other machines as compute nodes. The Scyld system goes further by making the compute slaves capable of running only processes initiated and controlled by the master. > I have been doing an exhaustive amount of research on all of the different > software available to accomplish this, but I have fallen short in figuring > out which ones will work together. 
You'll find two approaches: - Monolithic designs, that have no independently replaceable subsystems - Component designs, that use independent subsystems The challenge is implementing component designs using an over-all architecture that results in a simple system. Most approaches using independent components end up being unable to evolve. The result is overly feature-full, complex subsystems as individuals try to address new problems using only the subsystem they understand and have control over. > I am planning on using Red Hat 9 on all of the nodes in the cluster. You should understand what you are asking for: perhaps you mean "I need library and application compatibility with Red Hat 9", because you aren't going to get process migration and DSM without modifying the kernel and/or libraries. > I just need a little more information to give me that push in the right > direction. I do have some time, s I am not planning to start building the > cluster until March or April. You should consider Gigabit Ethernet a likely baseline network by then. If your application requires DSM, there is a fair chance that you would benefit from Remote DMA (RDMA) or Remote Write in SCI, Myrinet, Quadrics or InfiniBand. Selecting one of those will impose a library interface, and you may find that you have few additional decisions to make. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Nov 24 17:41:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 24 Nov 2003 17:41:04 -0500 (EST) Subject: booting from usb pen drive In-Reply-To: Message-ID: On Mon, 24 Nov 2003, Mark Hahn wrote: > > > > Depends if your mobo has a boot from usb option. > > > > It's slightly more complex than that: only some (many, but not all) USB > > memory devices are usable as boot media. > > I've seen that advertised, but it was unclear to me whether it was a > purely marketing feature or not. > what does the device need to do to support booting? It surprised me that Intel needed to list which USB memory devices were usable as boot devices. > > We've been doing this for years with Scyld BeoBoot: use the BIOS to load > > both the kernel and an ramdisk '/'. The now-standard Linux approach is > > right, but this is actually two-step, no? that is, the bios only loads > the bootsector and jumps to it. your code in the bootsector (or just > the generic code in the kernel's boot.S) is then responsible for making > further bios calls for reading more than that 512B. so if the bios > doesn't provide a floppy-like block driver, it wouldn't work. Correct. The bootloader - is in 16 bit mode, - may only use the basic BIOS entry points for reading blocks, - must follow rules such as periodically calling the keyboard-read loop The key is that your bootloader must load everything the final system might need before exiting 16 bit mode, 'cause there ain't no goin' back. > I guess what I'm wondering is whether a bios that provides USB-booting > does actually provide a block driver. 
Yes, but it's a BIOS block driver -- it's not suitable for general purpose use. The functionality might be divided between polling hardware with interrupts disabled and doing things within the keyboard-read calls. It might re-program the timer and PIC chips, or use the SMM mode of the processor. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From derek.richardson at pgs.com Mon Nov 24 18:12:16 2003 From: derek.richardson at pgs.com (Derek Richardson) Date: Mon, 24 Nov 2003 17:12:16 -0600 Subject: Opteron kernel Message-ID: <3FC29050.6000003@pgs.com> All, Does anyone know where to find info on tuning the linux kernel for Opterons? Googling hasn't turned up much useful information. Thanks, Derek R. -- Linux Administrator derek.richardson at pgs.com derek.richardson at ieee.org Office 713-781-4000 Cell 713-817-1197 Madison's Inquiry: If you have to travel on the Titanic, why not go first class? From jducom at nd.edu Mon Nov 24 19:17:54 2003 From: jducom at nd.edu (Jean-Christophe Ducom) Date: Mon, 24 Nov 2003 19:17:54 -0500 Subject: Beowulf of bare motherboards Message-ID: <3FC29FB2.5070504@nd.edu> I tried to find a link to a 'old' project where people were using racks to put barebone motherboards (to save the cost of the case basically). It was similar to the following project but was more elaborated (it was possible to pull out the bare motherboards of the shelf, etc...)
http://www.abo.fi/~physcomp/cluster/celeron.html I spent hours to find it on google..without success. Could anyone remember it? Please send the link. Thanks a lot JC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Mon Nov 24 20:14:51 2003 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Mon, 24 Nov 2003 20:14:51 -0500 Subject: Beowulf of bare motherboards In-Reply-To: <3FC29FB2.5070504@nd.edu> References: <3FC29FB2.5070504@nd.edu> Message-ID: <3FC2AD0B.4090301@comcast.net> Is this it? http://www.clustercompute.com/ Jeff > I tried to find a link to a 'old' project where people were using > racks to put barebone motherboards (to save the cost of the case > basically). > It was similar to the following project but was more elaborated (it > was possible to pull out the bare motherboards of the shelf, etc...) > http://www.abo.fi/~physcomp/cluster/celeron.html > > I spent hours to find it on google..without success. > Could anyone remember it? Please send the link. 
> Thanks a lot > > JC > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Mon Nov 24 20:30:08 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Mon, 24 Nov 2003 17:30:08 -0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <1069682488.2179.127.camel@scalable> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> Message-ID: <20031125013008.GA6416@sphere.math.ucdavis.edu> On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote: > Hi all, > > RedHat have annouced academic pricing at USD25 per desktop (RHEL WS > based) and USD50 for Academic server (RHEL ES based) a week or so ago. This sounded relatively attractive to me, until I found out that USD25 per desktop for RHEL WS did NOT include the Opteron version. To add insult to injury RHEL ES does not support opteron. > Now... I believe the USD25 and USD50 are acceptable pricing for the > value that RHEL + RHN brings to the customer (academic). The cost of the > OS is a small fraction of the total value of the cluster. Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of $792. If you want named, dhcpd, and friends it's $1992. 
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Nov 24 20:10:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 24 Nov 2003 17:10:39 -0800 (PST) Subject: Beowulf of bare motherboards In-Reply-To: <3FC29FB2.5070504@nd.edu> Message-ID: hi ya On Mon, 24 Nov 2003, Jean-Christophe Ducom wrote: > I tried to find a link to a 'old' project where people were using racks to put > barebone motherboards (to save the cost of the case basically). hotmail and google used those motherboard in the 19" (kingstarusa.com) racks -- looks like its discontinued ?? - a flat piece of (aluminum/steel) metal (from home depot/orchard) will work too you know - just add a couple holes on stand off for the mb and power supply - or get a sheet metal shop to bend and drill a few holes w rack mounting ears > It was similar to the following project but was more elaborated (it was possible > to pull out the bare motherboards of the shelf, etc...) > http://www.abo.fi/~physcomp/cluster/celeron.html i'm very interested in those systems ... - to build a cluster w/ just motherboards and optionally w/ disks - power supply will be simple +12vDC wall adaptor ... - P4-3G equivalent mb/cpu - it'd be a good engineering challenge :-) ( big question is what holds up the back of the "caseless" ( motherboards and disks c ya alvin > I spent hours to find it on google..without success. > Could anyone remember it? Please send the link. 
> Thanks a lot there are other pc104 based caseless clusters http://eri.ca.sandia.gov/eri/howto.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Nov 24 19:40:38 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 24 Nov 2003 16:40:38 -0800 Subject: booting from usb pen drive In-Reply-To: References: < Message-ID: <5.2.0.9.2.20031124163754.018c7b58@mailhost4.jpl.nasa.gov> At 04:51 PM 11/24/2003 -0500, Donald Becker wrote: >But bottom line is that booting is no longer a hotly-debated cluster >issue. Essentially every current system has PXE network booting. Every "x86, wired ethernet" cluster has PXE booting. >Approaches such as BeoBoot stage 1 or USB booting are only needed for >legacy machines. Or for clusters built with some other processor (still COTS, but not necessarily "currently sold x86 mobo in the consumer/office market" COTS). > With x86 machines you can use PXE to do BIOS >updates, hardware diagnostics, or boot the machine as a cluster node, >all without touching the hardware. James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Nov 24 23:00:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 25 Nov 2003 15:00:55 +1100 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: <200311251500.56467.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 25 Nov 2003 12:30 pm, Bill Broadley wrote: > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. Release Candidate 1 of Mandrake 9.2 for AMD64 is now available for download. http://www.mandrakelinux.com/en/92amd64beta.php3 There's also an experimental Gentoo build available, and a Debian port is in the works. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/wtP3O2KABBYQAh8RAk6zAJ4xqhx0pCbf2BJehd+pkwb7uXpEoQCeINBF e7gR5Gnx7f33dKueUF7UiUQ= =Bsw1 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From herrold at owlriver.com Mon Nov 24 23:53:14 2003 From: herrold at owlriver.com (R P Herrold) Date: Mon, 24 Nov 2003 23:53:14 -0500 (EST) Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: On Mon, 24 Nov 2003, Bill Broadley wrote: > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. Goodness, such pessimism. The new Red Hat pricing model allows for you to avail yourselves of their release integration for a long lived product tail, 'instant ISO' download, and for various scaled support models (including their 'up2date' tool for remote console administration, update pools, and update scheduling) Clever folks saved copies of the beta ISO's and are 90 percent there already. Daemon applications simply don't change that fast, unless there is a security matter. Buy the low end stripped model, and get the kernel and libraries updates by RH; invest a week to learn yum and package building and signing, and add whatever application layer tools you want. Or pay a third party to build them for you, to a SLA you can afford. 
Owl River has sold such third-party services for years, as have the nice folks of the KRUD distribution, the Wirex folks, and so forth. It's a small group of people doing this, coming from both inside and outside the RH private beta testers group; many are listed toward the bottom of: http://www.owlriver.com/projects/packaging/ As binaries built by a third party are not on the manifest for that package, the 'up2date' update channel from Red Hat should simply ignore them, as it would any other 'foreign' package. One has to assume their client is smart enough to ignore updating non-channel content. If not, you gain a windfall (or maybe are harmed if it updates something you did not expect it to); if so, download the updates from the trusted alternative archive as planned. I have pushed the development of yum, and published proof of concept code under the GPL to use yum for some of the server-side functions similar to RH's 'up2date', and published added kickstart integration, as well as for the more familiar client side tasks. Large parts of our future work will continue to be available under the GPL through our 'cAos' participation. http://www.owlriver.com/support/yum/ For those unwilling to read, experiment, maintain the needed devel lab, and develop, I am happy to sell such services. http://www.owlriver.com/support/wings/ BTW: The RH exit from the mass 'free download/free support/forever' market should come as no great surprise to folks; I note that our page is untouched since early June when this outcome was pretty obvious (and before RH formally even named their new line).
-- Russ Herrold From hunting at ix.netcom.com Tue Nov 25 01:19:16 2003 From: hunting at ix.netcom.com (Michael Huntingdon) Date: Mon, 24 Nov 2003 22:19:16 -0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <1069682488.2179.127.camel@scalable> <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> Message-ID: <3.0.3.32.20031124221916.01250910@popd.ix.netcom.com> Huge surprise to all of us. Someone or a group of folks will have to jump in, write the code and fill the void. Not sure something like this will take long. At 05:30 PM 11/24/2003 -0800, Bill Broadley wrote: >On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote: >> Hi all, >> >> RedHat have annouced academic pricing at USD25 per desktop (RHEL WS >> based) and USD50 for Academic server (RHEL ES based) a week or so ago. > >This sounded relatively attractive to me, until I found out that >USD25 per desktop for RHEL WS did NOT include the Opteron version. > >To add insult to injury RHEL ES does not support opteron. > >> Now... I believe the USD25 and USD50 are acceptable pricing for the >> value that RHEL + RHN brings to the customer (academic). The cost of the >> OS is a small fraction of the total value of the cluster. > >Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of >$792. If you want named, dhcpd, and friends it's $1992.
> >-- >Bill Broadley >Mathematics >UC Davis >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laurenceliew at yahoo.com.sg Tue Nov 25 09:29:50 2003 From: laurenceliew at yahoo.com.sg (Laurence Liew) Date: Tue, 25 Nov 2003 22:29:50 +0800 Subject: LONG RANT [RE: RHEL Copyright Removal] In-Reply-To: <20031125013008.GA6416@sphere.math.ucdavis.edu> References: <0B27450D68F1D511993E0001FA7ED2B3036EE4F8@ukjhmbx12.ukjh.zeneca.com> <1069682488.2179.127.camel@scalable> <20031125013008.GA6416@sphere.math.ucdavis.edu> Message-ID: <1069770053.2179.224.camel@scalable> Hi, I am still waiting for my Red Hat rep to get me official pricing which should also provide me the platforms offered under the Academic program. If currently there is no academic pricing for AMD64, then what I would suggest you do is write to your Red Hat rep/sales manager and ask for it. Over in Singapore/Asia Pac, I have asked for Academic pricing for RHEL for HPC clusters for the last 8 months.. and I guess part of my prayers have been answered. The pricing announced is very close (better in fact) to what I have requested for at USD50 per node. It is not easy convincing Red Hat that I needed a HPC pricing model, but it can be done and I guess they have listened (to this mailing list, their customers and their partners). Most vendors need to get feedback on what is wanted/required, and it is up to the community to let Red Hat know what that is... just be reasonable. As a business, they need to survive and make a profit. If we can argue a win-win proposal, I am sure they will listen. 
BTW, we have used RHEL WS to build clusters and it seemed to include all required daemons (sorry do not have access to AMD64 yet... so cannot comment). Cheers! laurence On Tue, 2003-11-25 at 09:30, Bill Broadley wrote: > On Mon, Nov 24, 2003 at 10:01:30PM +0800, Laurence Liew wrote: > > Hi all, > > > > RedHat have annouced academic pricing at USD25 per desktop (RHEL WS > > based) and USD50 for Academic server (RHEL ES based) a week or so ago. > > This sounded relatively attractive to me, until I found out that > USD25 per desktop for RHEL WS did NOT include the Opteron version. > > To add insult to injury RHEL ES does not support opteron. > > > Now... I believe the USD25 and USD50 are acceptable pricing for the > > value that RHEL + RHN brings to the customer (academic). The cost of the > > OS is a small fraction of the total value of the cluster. > > Alas if you buy a AMD64 laptop, desktop, or server it's a minimum of > $792. If you want named, dhcpd, and friends it's $1992. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Tue Nov 25 13:38:55 2003 From: michael.worsham at mci.com (Michael Worsham) Date: Tue, 25 Nov 2003 13:38:55 -0500 Subject: Serious processing power... Message-ID: <001101c3b383$62515190$9c9832a6@Wcomnet.com> This article might interest a few of you with some serious processing power... Meet the real star of Lord of the Rings - a 1,600-box server farm. 
http://www.wired.com/wired/archive/11.12/play.html?pg=2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Nov 25 15:05:55 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 25 Nov 2003 15:05:55 -0500 (EST) Subject: Opteron kernel In-Reply-To: <3FC29050.6000003@pgs.com> Message-ID: On Mon, 24 Nov 2003, Derek Richardson wrote: > Does anyone know where to find info on tuning the linux kernel for > Opterons? Googling hasn't turned up much useful information. What type of tuning? PCI bus transactions (the Itanium required more, but the Opteron still benefits)? Scheduling? Processor affinity? What kernel version? If you ask specific questions, there is likely someone on the list that knows the specific answer. The easiest performance improvement comes from proper memory DIMM configuration to match the application layout. Each processor has its own local memory controller, and understanding how the memory slots are filled and the options e.g. interleave can make a 30% difference on a dual processor system. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Nov 25 18:21:49 2003 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 25 Nov 2003 15:21:49 -0800 Subject: Opteron kernel In-Reply-To: References: <3FC29050.6000003@pgs.com> Message-ID: <20031125232149.GA2995@greglaptop.internal.keyresearch.com> On Tue, Nov 25, 2003 at 03:05:55PM -0500, Donald Becker wrote: > The easiest performance improvement comes from proper memory DIMM > configuration to match the application layout. Each processor has its > own local memory controller, and understanding how the memory slots are > filled and the options e.g. interleave can make a 30% difference on a > dual processor system. I second this -- don't trust what you think you *know* (we all know it only has 1 memory channel, so you shouldn't have to fill all the dimm slots) and instead measure (filling all the dimms slots helps perf.) The 2.6 kernels have a bit better performance than 2.4, and there are bugs that simply aren't fixed in 2.4, including one that our compiler stomps on frequently. You will definitely want the "runon" command for processor affinity... but it will change your choice of interleave in the BIOS. 
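To make the "measure, don't assume" advice concrete, here is a minimal sketch of pinning a process to one CPU and timing a memory copy, so the same run can be repeated after changing DIMM population or BIOS interleave. It is only an illustration under stated assumptions: it assumes a Linux system with a recent Python, it uses the standard `os.sched_setaffinity` call rather than the "runon" command mentioned above, and the helper names `pin_to_cpu` and `copy_bandwidth_mb_s` are hypothetical. A plain buffer copy is a crude stand-in for a real memory benchmark such as STREAM.

```python
import os
import time


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only).

    Returns the affinity set the kernel reports afterwards.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "this process"
    return os.sched_getaffinity(0)


def copy_bandwidth_mb_s(size_mb=64, repeats=5):
    """Best-of-N copy rate of a plain buffer copy, in MB/s.

    This is a rough indicator only; it is not the STREAM benchmark.
    """
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # forces a full read of src and write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    return size_mb / best


if __name__ == "__main__":
    # Try the first two CPUs this process is allowed to run on.
    for cpu in sorted(os.sched_getaffinity(0))[:2]:
        pin_to_cpu(cpu)
        print("cpu", cpu, "copy rate MB/s:", round(copy_bandwidth_mb_s()))
```

Pinning only controls where the process runs; where its memory lands still depends on the kernel's allocation policy and the BIOS interleave setting, which is why the numbers are worth re-measuring after each configuration change.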
-- greg From landman at scalableinformatics.com Tue Nov 25 22:28:54 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 25 Nov 2003 22:28:54 -0500 Subject: Opteron kernel In-Reply-To: <20031125232149.GA2995@greglaptop.internal.keyresearch.com> References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> Message-ID: <1069817334.8326.122.camel@protein.scalableinformatics.com> On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote: > You will definitely want the "runon" command for processor affinity... > but it will change your choice of interleave in the BIOS. Hi Greg: Has anyone implemented a real runon, or built something like the old IRIX dplace stuff yet? I had been looking into this, and don't want to re-invent a working thing... Joe > -- greg From sebastian.wangnick at eurocontrol.int Wed Nov 26 02:30:14 2003 From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian) Date: Wed, 26 Nov 2003 08:30:14 +0100 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doe sn't succeed Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2EB@agnnl02.mas.eurocontrol.be> Hello Corey, thanks, I'll try to challenge SuperMicro with that ...
Actually, I'm currenly evaluating PC's for a replacement programme of one of our systems (about 120 machines): *) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the prescribed problems. *) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair. *) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP). Could anyone on this mailing recommend another dual-processor mainboard, which would be capable of one of the "standard" IPMI authentication mechanisms? I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ... Regards, Sebastian -- Dipl.-Inform. Sebastian Wangnick Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300 -----Original Message----- From: Corey Minyard [mailto:minyard at acm.org] Sent: Tuesday 25 November 2003 15:10 To: WANGNICK Sebastian Cc: 'openipmi-developer at lists.sourceforge.net' Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it. OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess. Also, you probably want "admin" privilege level, not "user" as user cannot really do very much. -Corey WANGNICK Sebastian wrote: >Dear all, > >I'm trying in vain to connect via LAN to a SuperMicro systems using >OpenIPMI (1.1.5). 
It seems that the SuperMicro *only* offers >Authorisation Capability 5 (that is OEM according to the Ethereal IPMI >code). > >[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem: > ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not >supported (supporting 0x%x only)", lan->authtype, (unsigned int) >msg->data[2]); ] > >Seems that the SuperMicro BMC sets msg->data[2] to 0x20. > >Now, to overcome the problem, I've added to ipmi_auth.h: > #define IPMI_AUTHTYPE_OEM 5 >changed in ipmi_auth.c ipmi_auths[5] to: > { ipmi_md2_authcode_init, ipmi_md2_authcode_gen, > ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup } >and added to sample.c:main: > } else if (strcmp(argv[curr_arg+3], "oem") == 0) { > authtype = IPMI_AUTHTYPE_OEM; > >But still, when trying > ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the >board doesn't respond to the Activate Session request: > > ____ This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Nov 26 04:00:08 2003 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 26 Nov 2003 01:00:08 -0800 Subject: Cluster benchmark summary Message-ID: <20031126090008.GA15109@cse.ucdavis.edu> Greetings all. 
Many thanks for the many responses. My main frustration and motivation for my benchmark proposal was the relatively poor relationship between advertised link latencies and bandwidths and actual application level scaling/efficiency. Turns out this was a popular topic at SC2003; I discussed it with many people and attended a few discussions on it. McCalpin had a talk discussing (from memory, so expect inaccuracies) how MFlops predict spec cpu_rate (very poorly) and how memory bandwidth predicts cpu_rate (less poorly). He then discussed a hybrid model using MFlops, cache size, and memory bandwidth. Something along the lines of 0.8 bytes per flop with zero cache and 0.1 bytes per flop with 8MB of cache was used for a model to predict Spec cpu_rate based on MFlops and memory bandwidth. This fairly simple model (Mflops * cache efficiency * bandwidth) led to a pretty good correlation with the 900+ spec cpu_rate numbers he collected (my vague memory wants to claim +/- 10 or 20%). Interpreted by me as somewhat of a validation of microbenchmarks acting as a predictor of real application performance (for applications that are well understood). If I find the slides online I'll post (if someone else does please follow up). At least he has a convincing graph (I know, a great way to lie) on his predictions for 923 spec_cpu_rate results. The most noteworthy benchmark suite mentioned at SC2003 was: http://icl.cs.utk.edu/hpcc/ Basically 5 benchmarks (well 4 + the top-500 HP Linpack) to help quantify cluster performance and scaling. McCalpin's stream, Random Access (I believe I heard this referred to as something that sounded like GUPS), Ptrans (parallel matrix transpose), and b_eff (effective bandwidth benchmark). Current version is at 0.4 alpha, so here is your chance people, improve it while you can. I'm assuming that input is welcome, and patches doubly so. I think this is a great start.
Currently submitted results are for a Cray (vector), Alphaserver, Itanium2, Altix, and Power 4 based clusters. I'd love to see additional numbers for Myrinet, Dolphin, Quadrics and Infiniband clusters. Submit yours today! Oh and most importantly (no Spec mistakes here), source is available, so have at it and report results (click on archive or upload). I have no idea what the license status of the source is; it is available for download but doesn't mention any licensing terms. Ideally it will be GPL or similar. I believe source code optimization is legal AFTER reporting results based on the unmodified source. I also believe that ALL results must be posted, mainly to avoid cherry picking. Of course the URL mentioned is the authoritative source for such info. I heard rumors from several different people that top500.org was going to collect these performance numbers, but still rank only on HPL. Of course people can download the results and rank however they want. So hopefully this will lead to interconnect companies competing on complete cluster performance instead of link speeds and latencies. ============================================================================ I'll list here any other benchmarks people brought to my attention, please follow up if I missed anything, many of the messages came in at the conference after my ethernet and wireless died (damn Dell laptop), of course upon my return I've been swamped with email. Felix Rauch mentioned the Switchmark discussed in a paper at: http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 A collection of benchmarks, mentioned at SC2003, is available at: http://www.ipacs-benchmark.org John Hearns mentioned: http://www.plogic.com/bps (beowulf performance suite) More related discussion at: http://www.cluster-rant.com/article.pl?sid=03/03/17/1838236.
-- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sebastian.wangnick at eurocontrol.int Wed Nov 26 03:29:49 2003 From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian) Date: Wed, 26 Nov 2003 09:29:49 +0100 Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doe sn't succeed Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2EF@agnnl02.mas.eurocontrol.be> Oh, yes. My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 
0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. 
= Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... 
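The "Checksum 1" and "Checksum 2" fields in the frames above can be re-checked by hand. What follows is a minimal Python sketch, assuming the IPMI 1.5 LAN message layout (rsAddr, netFn/LUN, checksum 1, rqAddr, rqSeq/LUN, cmd, data, checksum 2) and the two's-complement checksum rule. The function names are made up for illustration and are not part of OpenIPMI. The byte strings are copied from the hex dumps of Frame 1 and Frame 3.

```python
def ipmi_checksum(data: bytes) -> int:
    """Two's-complement checksum: covered bytes plus checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


def split_ipmi_lan_message(msg: bytes) -> dict:
    """Split an IPMI LAN message into its fields and verify both checksums."""
    header, cksum1 = msg[:2], msg[2]      # rsAddr, netFn/LUN | checksum 1
    body, cksum2 = msg[3:-1], msg[-1]     # rqAddr, rqSeq/LUN, cmd, data | checksum 2
    return {
        "netfn": header[1] >> 2,          # upper six bits of the netFn/LUN byte
        "lun": header[1] & 0x03,          # lower two bits
        "seq": body[1] >> 2,              # upper six bits of the Seq/LUN byte
        "cmd": body[2],
        "data": body[3:],
        "checksum1_ok": ipmi_checksum(header) == cksum1,
        "checksum2_ok": ipmi_checksum(body) == cksum2,
    }


# Frame 1 (Get Channel Auth Capabilities request), dump offsets 0x38..0x40
FRAME1_MSG = bytes.fromhex("20 18 c8 81 04 38 0e 02 33")
# Frame 3 (Get Session Challenge request), dump offsets 0x38..0x4f
FRAME3_MSG = bytes.fromhex("20 18 c8 81 08 39 05 41 44 4d 49 4e" + " 00" * 11 + " d0")

if __name__ == "__main__":
    for name, msg in (("Frame 1", FRAME1_MSG), ("Frame 3", FRAME3_MSG)):
        print(name, split_ipmi_lan_message(msg))
```

For Frame 3 the data field decodes to the requested authentication type (5) followed by the user name ADMIN padded with zeros to 16 bytes, which matches the "Data (17 bytes)" line in the dump.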
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... 
..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ 
....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... 
..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... 
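To compare the two Activate Session requests field by field, here is a small Python sketch. It assumes the IPMI 1.5 request layout: one byte authentication type, one byte requested maximum privilege level, a 16-byte challenge string, and a 4-byte initial outbound sequence number with the least significant byte first. The function name and the privilege table are illustrative only. The data bytes are copied from the two Frame 5 hex dumps (OpenIPMI and the SuperMicro Java tool).

```python
import struct

# Privilege level names, as assumed from the IPMI 1.5 specification.
PRIVILEGE_LEVELS = {1: "callback", 2: "user", 3: "operator", 4: "admin", 5: "OEM"}


def parse_activate_session_request(data: bytes) -> dict:
    """Decode the 22 data bytes of an Activate Session request."""
    if len(data) != 22:
        raise ValueError(f"expected 22 data bytes, got {len(data)}")
    privilege = data[1] & 0x0F                      # low nibble holds the level
    (outbound_seq,) = struct.unpack("<I", data[18:22])  # little-endian 32-bit
    return {
        "auth_type": data[0],
        "privilege": PRIVILEGE_LEVELS.get(privilege, "reserved"),
        "challenge": data[2:18],
        "initial_outbound_seq": outbound_seq,
    }


# Data field of Frame 5 as sent by ipmisample (OpenIPMI)
OPENIPMI_REQ = bytes.fromhex(
    "05 02 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 dd 76 ef fd"
)
# Data field of Frame 5 as sent by the SuperMicro Java tool
JAVA_TOOL_REQ = bytes.fromhex("05 04" + " 00" * 20)

if __name__ == "__main__":
    for name, req in (("OpenIPMI", OPENIPMI_REQ), ("Java tool", JAVA_TOOL_REQ)):
        print(name, parse_activate_session_request(req))
```

Decoded this way, the OpenIPMI request asks for privilege level 2 (user) and echoes the challenge returned in its Frame 4, while the Java tool's request asks for level 4 (admin) and carries an all-zero challenge field and sequence number. The captures alone do not show which of these differences, if any, matters to this BMC.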
Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes)
0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E.
0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L..@.@..f......
0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........
0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H.....
0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:..
0050 10 30 d2 b3 00 00 00 00 04 d8 .0........
--- snip ---

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Zheng, Jeff [mailto:jeff.zheng at intel.com]
Sent: Wednesday 26 November 2003 08:56
To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Did you enable LAN support?

> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.

-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net]On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the problems described.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server.
That one doesn't support IPMI, however; instead it runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard that supports one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it. OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess.

Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authentication Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>  ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not supported (supporting 0x%x only)",
>           lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
>  #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
>  { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
>    ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
>  } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
>      authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
>  ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN
>the board doesn't respond to the Activate Session request:
>
>

____
This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sebastian.wangnick at eurocontrol.int Wed Nov 26 07:55:11 2003
From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian)
Date: Wed, 26 Nov 2003 13:55:11 +0100
Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2F7@agnnl02.mas.eurocontrol.be>

Hello Jeff,

when doing
./ipmicmd -k "0f 00 06 01" lan 200.200.200.4 623 md2 admin ADMIN ADMIN
I'm getting
Requested authentication 1 not supported (supporting 0x20 only)Unable to setup connection: 16
as explained below. This is OpenIPMI-1.5.5, modified as described further down.
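The 0x20 in that error message is the authentication type support byte from the Get Channel Authentication Capabilities response. Below is a minimal Python sketch of how it decodes, assuming the bit assignments of the IPMI 1.5 specification (bit n set means authentication type n is supported). The table and function names are illustrative and not taken from OpenIPMI.

```python
# Authentication type numbers, as assumed from the IPMI 1.5 specification.
# Type 3 is reserved and therefore left out.
AUTH_TYPES = {
    0: "none",
    1: "MD2",
    2: "MD5",
    4: "straight password",
    5: "OEM proprietary",
}


def supported_auth_types(mask: int) -> list:
    """Return the names of all authentication types whose bit is set in mask."""
    return [name for bit, name in sorted(AUTH_TYPES.items()) if mask & (1 << bit)]


def is_supported(authtype: int, mask: int) -> bool:
    """True if the BMC advertises the given authentication type number."""
    return bool(mask & (1 << authtype))


if __name__ == "__main__":
    mask = 0x20  # value reported by the SuperMicro BMC
    print(supported_auth_types(mask))
    print("md2 usable:", is_supported(1, mask))
    print("oem usable:", is_supported(5, mask))
```

With a mask of 0x20 only bit 5 is set, so a request for type 1 (md2) is rejected before any session is attempted, which is what the log line reports.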
Is 172.16.211.198 a SuperMicro or an Intel machine?

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Zheng, Jeff [mailto:jeff.zheng at intel.com]
Sent: Wednesday 26 November 2003 13:24
To: WANGNICK Sebastian; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

I can connect using the following commands:

[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 none user "" ""
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

After setting the LAN password with SSU:

[root@jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 md2 user "" "123456"
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.

-----Original Message-----
From: WANGNICK Sebastian [mailto:sebastian.wangnick at eurocontrol.int]
Sent: Wednesday, November 26, 2003 4:30 PM
To: Zheng, Jeff; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Oh, yes.
My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... 
Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... 
..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 
0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. 
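The temporary session ID in the Get Session Challenge response travels least-significant byte first, which is why the bytes 10 30 d2 b3 in the Frame 4 dump reappear as Session ID 0xb3d23010 in the Activate Session request. A minimal sketch of that decoding follows; the helper name le32_decode and the array names are hypothetical, and the byte values are copied from the two captures in this message.

```c
#include <stdint.h>

/* Hypothetical helper (not an OpenIPMI function): assemble a 32-bit value
 * from four bytes that were sent least-significant byte first. */
static uint32_t le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Session ID bytes as they appear on the wire in the SuperMicro Java tool capture */
static const uint8_t java_tool_id[4] = { 0x10, 0x30, 0xd2, 0xb3 };

/* Session ID bytes as they appear on the wire in the OpenIPMI capture */
static const uint8_t openipmi_id[4] = { 0x10, 0x3d, 0x22, 0x9c };
```

Both captures echo the session ID back in the same byte order, so the session ID handling looks the same in the two tools and the difference between them has to be somewhere else in the Activate Session request.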
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L.. at .@..f...... 0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........ 0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H..... 0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:.. 0050 10 30 d2 b3 00 00 00 00 04 d8 .0........ --- snip --- Regards, Sebastian -- Dipl.-Inform. Sebastian Wangnick Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300 -----Original Message----- From: Zheng, Jeff [mailto:jeff.zheng at intel.com] Sent: Wednesday 26 November 2003 08:56 To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Did you enable lan support? > Thanks > Jeff Jeff.Zheng at intel.com > BTW, I speak for myself, not for Intel Corp. 
-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the problems described.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard that would be capable of one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it.
OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess. Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>    ipmi_log(IPMI_LOG_ERR_INFO,
>             "Requested authentication %d not supported (supporting 0x%x only)",
>             lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
>    #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
>    { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
>      ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
>    } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
>        authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
>    ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN
>the board doesn't respond to the Activate Session request:
>
>

____
This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL unless it is confirmed by appropriately signed hard copy. Any views expressed in this message are those of the sender.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From msnitzer at lnxi.com Wed Nov 26 10:37:51 2003
From: msnitzer at lnxi.com (Mike Snitzer)
Date: Wed, 26 Nov 2003 08:37:51 -0700
Subject: Opteron kernel
In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com>; from landman@scalableinformatics.com on Tue, Nov 25, 2003 at 10:28:54PM -0500
References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> <1069817334.8326.122.camel@protein.scalableinformatics.com>
Message-ID: <20031126083751.A6434@lnxi.com>

On Tue, Nov 25 2003 at 20:28, Joe Landman wrote:

> On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote:
>
> > You will definitely want the "runon" command for processor affinity...
> > but it will change your choice of interleave in the BIOS.
>
> Hi Greg:
>
> Has anyone implemented a real runon, or built something like the old
> IRIX dplace stuff yet? I had been looking into this, and don't want to
> re-invent a working thing...
http://www.tech9.net/rml/schedutils/

hth,
Mike

From egan at sense.net Wed Nov 26 11:11:21 2003
From: egan at sense.net (Egan Ford)
Date: Wed, 26 Nov 2003 09:11:21 -0700
Subject: Opteron kernel
In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com>
Message-ID: <066901c3b437$ef2d9a10$27b358c7@titan>

node15:~ # numactl
usage: numactl [--interleave=nodes] [--homenode=homenode] [--cpubind=nodes] [--membind=nodes] [--localalloc] command args ...
       numactl [--show]
nodes is a comma delimited list of node numbers.

You can get this as part of SLES8 SP3; however, it appears to work only with the included 2.4.19 kernel, not with 2.4.21.

> -----Original Message-----
> From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of Joe Landman
> Sent: Tuesday, November 25, 2003 8:29 PM
> To: Greg Lindahl
> Cc: Beowulf
> Subject: Re: Opteron kernel
>
> On Tue, 2003-11-25 at 18:21, Greg Lindahl wrote:
>
> > You will definitely want the "runon" command for processor affinity...
> > but it will change your choice of interleave in the BIOS.
>
> Hi Greg:
>
> Has anyone implemented a real runon, or built something like the old
> IRIX dplace stuff yet? I had been looking into this, and don't want to
> re-invent a working thing...
>
> Joe
>
> > -- greg

From jeff.zheng at intel.com Wed Nov 26 02:55:35 2003
From: jeff.zheng at intel.com (Zheng, Jeff)
Date: Wed, 26 Nov 2003 15:55:35 +0800
Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Message-ID: <37FBBA5F3A361C41AB7CE44558C3448E011959F1@pdsmsx403.ccr.corp.intel.com>

Did you enable LAN support?

> Thanks
> Jeff Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.

-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):
*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the problems described.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card.
This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard that would be capable of one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it. OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess. Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>    ipmi_log(IPMI_LOG_ERR_INFO,
>             "Requested authentication %d not supported (supporting 0x%x only)",
>             lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
>    #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
>    { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
>      ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
>    } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
>        authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
>    ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN
>the board doesn't respond to the Activate Session request:
>
>

_______________________________________________
Openipmi-developer mailing list
Openipmi-developer at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

From sebastian.wangnick at eurocontrol.int Wed Nov 26 11:57:43 2003
From: sebastian.wangnick at eurocontrol.int (WANGNICK Sebastian)
Date: Wed, 26 Nov 2003 17:57:43 +0100
Subject: SuperMicro IPMI authentication
Message-ID: <15ACCB26C192D411BADE0000F64664CB01BFC2F9@agnnl02.mas.eurocontrol.be>

Dear Peter,

thanks for your quick answer. I'll cross-post your answer to the Beowulf and Openipmi mailing lists.

Please note that this approach renders your IPMI support proprietary, and unusable for us (your tool won't pass our safety assessment).

However, I'm not interested in the source code of IPMIview (Java is not an option for us anyway). What I'm asking for is the specification of the IPMI type 5 authentication.

I just learned via the Beowulf mailing list that the Intel server boards do fully support type 0, 1, 2 and 4 authentication as specified in the standard.

May I ask you, based on this clarification, to reconsider your position?

Thanks in advance,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Support_Europe [mailto:Support at supermicro.nl]
Sent: Wednesday 26 November 2003 15:00
To: Sebastian Wangnick
Subject: RE: IPMI authorisation

Hello Sir,

While the IPMI 1.5 standard is implemented on our IPMI card, we have to customize our IPMIview program, and this is the only program to use with the IPMI card. We can't give out the source code of the program.

Best Regards,
Peter Maas
Supermicro Computer B.V.
Het Sterrenbeeld 28
5215 ML 's-Hertogenbosch
The Netherlands
T: +31 (0)73-6400390
F: +31 (0)73-6416525

-----Original Message-----
From: Sebastian Wangnick [mailto:sebastian.wangnick at eurocontrol.int]
Sent: Wednesday, November 26, 2003 11:24 AM
To: support at supermicro.nl
Subject: IPMI authorisation

Name: Sebastian Wangnick
E-mail: sebastian.wangnick at eurocontrol.int
Phone: +31 43366 1370
Model: SM-X5DPL-iGM with IPM
Question or Comment:

Dear Madam, dear Sir,

I'm trying in vain to create an IPMI LAN session to my SuperMicro system. The system always replies in the GetChannelAuthCapabilities response that it doesn't support any standard authentication scheme (0=No auth, 1=MD2, 2=MD5, 4=Straight), only an OEM-specific one (5=OEM), and neither MD2 nor MD5 seems to fit by chance.

Which authentication algorithm am I to use to successfully activate the IPMI session? Note that for system engineering reasons I cannot rely on your IPMI-View tool.

Regards,
Sebastian Wangnick
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

From c00jsh00 at nchc.gov.tw Wed Nov 26 01:10:57 2003
From: c00jsh00 at nchc.gov.tw (Jyh-Shyong Ho)
Date: Wed, 26 Nov 2003 14:10:57 +0800
Subject: MPICH-1.2.5 on Opteron
Message-ID: <3FC443F1.DDF317E7@nchc.gov.tw>

Hi,

I wonder if this is the right place to post this question; however, I'd appreciate it if anyone could offer some suggestions.

We have a 1+4 node dual-Opteron cluster running SuSE Linux Enterprise 8 for AMD64. I installed MPICH-1.2.5 with the PGI 5.1 64-bit compiler on the system. The MPICH configuration file /opt/mpich/ch_p4/share/machines.LINUX has the following lines:

Zephyr:2
Eurus1:2
Eurus2:2
Eurus3:2
Eurus4:2

When we ran an MPI program, it did not run with 10 CPUs; however, it ran with 8 CPUs. What might be the reason that not all CPUs can be used?

Jyh-Shyong Ho, PhD.
Research Scientist
National Center for High-Performance Computing
Hsinchu, Taiwan, ROC

From erwan at mandrakesoft.com Wed Nov 26 04:58:32 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Wed, 26 Nov 2003 10:58:32 +0100
Subject: How : up2date 128 nodes of Redhat 9 ?
In-Reply-To: <3FC7020D@webmaila.hku.hk>
References: <3FC7020D@webmaila.hku.hk>
Message-ID: <1069840712.7839.24.camel@revolution.mandrakesoft.com>

On Sat 22/11/2003 at 05:41, Woo Chat Ming wrote:
> Dear beowulf friends,

Hi,

> We are a university in Hong Kong and we have a Redhat Linux 9
> beowulf cluster consisting of 128 nodes. All of them have real
> IP address and are connected to the Internet.

Wow! Your nodes are using public IPs? Why not use MASQUERADING for them?
> May I know how can I up2date all those nodes using a single
> command ?

You may use a parallel command such as rshp or c3 to ask your nodes to update themselves.

On MandrakeClustering/CLIC, urpmi parallel allows you to install packages on a full cluster. In your case, the following commands, executed on the server, ask each node to choose the updates it needs from the sources the server knows (the main distribution, updates from the Internet, your own packages, etc.) and then use rshp and mput from KA-Tools (http://ka-tools.sourceforge.net/) to copy and install the rpms efficiently on each node. It takes about the same time for 1 or 200 nodes!

    urpmi.update -a                          # <-- Read the latest updates for your rpm sources
    urpmi --parallel cluster --auto-select   # <-- Ask each node of the "cluster" group to update itself with the list of rpms that the server knows

Best regards,
--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From graham.mullier at syngenta.com Wed Nov 26 12:48:54 2003
From: graham.mullier at syngenta.com (graham.mullier at syngenta.com)
Date: Wed, 26 Nov 2003 17:48:54 -0000
Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal])
Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>

Laurence Liew wrote:
[...]
> The cluster community have done very well and today, large commercial
> organisations are adopting linux clusters as one of the tools they use
> to solve their complex problems.

Yup, that's us - we are currently adopting a variety of open source tools, including Linux, to help tackle at least some of our HPC needs.
>
> But I find this talk of "stripping" RHEL copyright to create yet another
> distro to be counter productive as linux beowulf clusters goes into
> commercial mainstream computing.... where customers have specific
> support demands. (And yes... commercial customers WILL PAY the full list
> price of RHEL to build a cluster).
>
> Now... I believe the USD25 and USD50 are acceptable pricing for the
> value that RHEL + RHN brings to the customer (academic). The cost of the
> OS is a small fraction of the total value of the cluster.
>
> Most of our users want a stable and supported OS, but more importantly,
> most of them run a commercial software of one form or another... and
> this means that these 3rd party ISV softwares are most likely to be
> certified on RHEL.
[...]

I think you are confusing things here (I know you are ranting but let's try to keep the arguments coherent, please! ;)

I'm running a project within a commercial company, so academic rates are of no use to me. I am willing to pay for what I get, but I'm not willing to pay simply to give us a warm glow that we are "supported". If I get some value I'll pay. I don't think I get value if I'm expected to pay separately for each copy of RHEL-AS on each of 42 compute nodes, and the only price I'm offered is an extreme full list price.

I would be willing to buy into a model where I'm paying for a clean, well-tested patch stream. But that model cannot scale cost linearly with the number of installed nodes - I'm not even convinced it can scale as the log of the number of nodes.

>
> if the community continues to fork a project just because it charges
> some $$$$, our progress would be very slow.... Redhat have listened to
> the customer and partners and have created a academic pricing model for
> cluster builders... so we should accept that and move on.

As I've said above, this is simply confused and does nothing for me or my project.
The community depends on people contributing work - and in some cases those people contribute work in exchange for remuneration. But in other cases we as a community find ways of driving development forward through what amounts to barter - we all get value from the open source software, and we all contribute to it in some way.

RH is (or at least appears to be) going down the restrictive licence, over-priced model pushed by MS. They've also learned the 'force frequent upgrades' trick. That leaves me uncomfortable about them as a vendor with whom I can expect to have a good long-term relationship.

But in the short term, software I use needs "RH 7.1", or "supported only on RH 7.3" or "RHEL-AS 2.1". Great. So I want ways of using RH that reduce my risks (what if RH stop making binaries available - can I still operate? If not, I want to be able to recompile from the source, and need to avoid copyright infringement problems).

[...]
> disarray and you will see droves of commercial ISVs abandoning linux and
> moving back to UNIX and Windows....
>
> where would that leave us? without commercial apps, linux would never
> sustain and grow in the commercial arena.
>

Ah, well, now you've moved off into another universe. This isn't the one I'm in. Closed source is bad - it gets in my way, makes my life difficult, and increases my project's risks enormously. Why should I pay RH huge sums of money for Linux AND have to fight to get acceptance of Linux internally when I could take the "easy" option and just buy Windows? [by the way, I know why, and I'm fighting - and winning]

Where I am now is a small part of the commercial arena; it uses commercial apps that run on Linux because we, customers, demand that they do. If RH make life difficult for us (awkward licence model and/or high price per node) we will start looking for ways around the problem, because it is worthwhile.
Maybe we'll shift to another distro, maybe we'll take the time and sort out how to build it ourselves - and once we've done that, what use are RH to us? And if they are no use, will they get any money - no, I don't think so.

Open source is a whole new way of working - and the money has to come in a different way. If we're offered useful services that we can't or don't want to handle internally, we'll look at buying them. But if the price is too high we won't bother.

Graham
(long term IRIX user, computational chemist, and now chemoinformatics specialist. I put up with Windows for office use but wouldn't want to rely on it for anything important...)

From william.mandra at us.army.mil Tue Nov 25 20:53:01 2003
From: william.mandra at us.army.mil (William J Mandra)
Date: Tue, 25 Nov 2003 20:53:01 -0500
Subject: Could clusters soon be a thing of the past? .....
Message-ID:

If anyone is interested in the possibility of a desktop computer making the Top500 list, check out this article:

http://www.wired.com/news/technology/0,1282,60791,00.html

The product will not be available until next year so I will wait and hold my breath (maybe some benchmarks will come out of the company before it disappears). :)

It would be very interesting to see what kind of performance a current cluster could attain with an add-on like this.
-----
William Mandra
william.mandra at us.army.mil
-----

From derek.richardson at pgs.com Wed Nov 26 13:23:47 2003
From: derek.richardson at pgs.com (Derek Richardson)
Date: Wed, 26 Nov 2003 12:23:47 -0600
Subject: Opteron kernel
In-Reply-To:
References:
Message-ID: <3FC4EFB3.10708@pgs.com>

Donald,

Sorry for the late reply, bloody Exchange server didn't drop it in my inbox until late this morning.

Memory and scheduling would probably be the biggest factor. Processor affinity doesn't matter as much, because in my experience we haven't had problems w/ processes bouncing between CPUs. PCI bus is almost a non-issue, since our application is embarrassingly parallel and therefore has no need for more than 100 Mbit Ethernet, and there is no disk on a PCI-attached controller, so we have very little information passing over the PCI bus.

By interleaving, I assume you mean at the physical level, which I had a quick peek at when we got the system (it's an IBM eServer 325, a loaner for testing) and I assumed to be correct. But given the poor performance I have seen (2 GHz Opterons coming in at ~15% slower than a 3 GHz P4 on a compute/memory intensive application when most benchmarks I have seen would imply the inverse), I will double-check that when given a chance.

I will probably just try the latest 2.6 kernel and a few other tweaks as well, and AMD has also offered help, but that would more likely be at the application layer (which I don't have control of, unfortunately).

Thanks for the response, and my apologies for the vagueness of the question.

Derek R.

Donald Becker wrote:

>On Mon, 24 Nov 2003, Derek Richardson wrote:
>
>>Does anyone know where to find info on tuning the linux kernel for
>>Opterons? Googling hasn't turned up much useful information.
>
>What type of tuning?
>PCI bus transactions (the Itanium required more, but the Opteron still
>benefits)? Scheduling? Processor affinity? What kernel version?
>If you ask specific questions, there is likely someone on the list that
>knows the specific answer.
>
>The easiest performance improvement comes from proper memory DIMM
>configuration to match the application layout. Each processor has its
>own local memory controller, and understanding how the memory slots are
>filled and the options e.g. interleave can make a 30% difference on a
>dual processor system.
>

--
Linux Administrator
derek.richardson at pgs.com
derek.richardson at ieee.org
Office 713-781-4000
Cell 713-817-1197
Madison's Inquiry: If you have to travel on the Titanic, why not go first class?

From jeff.zheng at intel.com Wed Nov 26 07:24:27 2003
From: jeff.zheng at intel.com (Zheng, Jeff)
Date: Wed, 26 Nov 2003 20:24:27 +0800
Subject: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed
Message-ID: <37FBBA5F3A361C41AB7CE44558C3448E011959F2@pdsmsx403.ccr.corp.intel.com>

I can connect using the following commands:

[root at jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 none user "" ""
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

After setting the LAN password with SSU:

[root at jeff home]# ipmicmd -k "0f 00 06 01" lan 172.16.211.198 623 md2 user "" "123456"
Connection 0 to the BMC is upConnection to the BMC restoredNo IPMB address specified
0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25

> Thanks
> Jeff
> Jeff.Zheng at intel.com
> BTW, I speak for myself, not for Intel Corp.
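
For anyone decoding such replies by hand: the byte string returned by the command above looks like a Get Device ID response (NetFn 0x06, command 0x01). The following is only a sketch in Python. It assumes that the first four bytes printed by ipmicmd (0f 07 00 01) echo the addressing, NetFn, LUN and command, and that the remaining bytes are the completion code followed by the response data laid out as in the IPMI 1.5 specification. The function and field names are made up for illustration and are not part of OpenIPMI.

```python
# Sketch only: decode the Get Device ID reply printed by ipmicmd above.
# Assumption: the first four bytes are ipmicmd's echo of
# channel / NetFn / LUN / command, so the IPMI payload starts at offset 4.

def decode_get_device_id(hex_line):
    raw = bytes.fromhex(hex_line)
    payload = raw[4:]                 # completion code + response data
    completion = payload[0]
    data = payload[1:]
    return {
        "completion_code": completion,
        "device_id": data[0],
        "device_revision": data[1] & 0x0F,
        # firmware major is binary (7 bits), minor is BCD
        "firmware": "%d.%02x" % (data[2] & 0x7F, data[3]),
        # IPMI version is BCD: low nibble = major, high nibble = minor
        "ipmi_version": "%d.%d" % (data[4] & 0x0F, data[4] >> 4),
        # manufacturer and product IDs are least significant byte first
        "manufacturer_id": int.from_bytes(data[6:9], "little"),
        "product_id": int.from_bytes(data[9:11], "little"),
    }

REPLY = "0f 07 00 01 00 20 81 00 19 51 9f 57 01 00 0e 00 00 10 01 25"
info = decode_get_device_id(REPLY)
for key, value in info.items():
    print(key, value)
```

Under those assumptions the reply decodes to IPMI version 1.5 and manufacturer ID 343 (0x000157), which, as far as I know, is the IANA enterprise number registered to Intel.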
-----Original Message----- From: WANGNICK Sebastian [mailto:sebastian.wangnick at eurocontrol.int] Sent: Wednesday, November 26, 2003 4:30 PM To: Zheng, Jeff; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Oh, yes. My original email contained the Ethereal dumps: --- snip --- But still, when trying ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN the board doesn't respond to the Activate Session request: Frame 1 (65 bytes on wire, 65 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.094931000 Time delta from previous packet: 0.000000000 seconds Time relative to first packet: 0.000000000 seconds Frame Number: 1 Packet Length: 65 bytes Capture Length: 65 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 51 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1923 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 31 Checksum: 0x8d6b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 9 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Request LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Checksum 2: 0x33 Data (2 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 33 00 00 40 00 40 11 19 23 c8 c8 c8 01 c8 c8 .3.. at .@..#...... 0020 c8 04 1b 59 02 6f 00 1f 8d 6b 06 00 ff 07 00 00 ...Y.o...k...... 0030 00 00 00 00 00 00 00 09 20 18 c8 81 04 38 0e 02 ........ ....8.. 0040 33 3 Frame 2 (72 bytes on wire, 72 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321677000 Time delta from previous packet: 0.226746000 seconds Time relative to first packet: 0.226746000 seconds Frame Number: 2 Packet Length: 72 bytes Capture Length: 72 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 58 Identification: 0x0a9c (2716) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e70 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 38 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Channel Auth Capabilities (0x38) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 16 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x04 0000 01.. = Sequence: 0x01 .... ..00 = Response LUN: 0x00 Command: Get Channel Auth Capabilities (0x38) Completion Code: Command completed normally (0x00) Checksum 2: 0x97 Data (8 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 3a 0a 9c 40 00 40 11 0e 70 c8 c8 c8 04 c8 c8 .:.. at .@..p...... 0020 c8 01 02 6f 1b 59 00 26 00 00 06 00 ff 07 00 00 ...o.Y.&........ 0030 00 00 00 00 00 00 00 10 81 1c 63 20 04 38 00 01 ..........c .8.. 0040 20 1c 00 bd 13 00 00 97 ....... 
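
The two frames above can be checked by hand. What follows is a minimal Python sketch, assuming the IPMI 1.5 LAN message layout (a two's-complement checksum over the first two bytes and a second one over the rest of the message) and the bit assignment of the "authentication type support" byte in the Get Channel Authentication Capabilities response (bit 0 = none, bit 1 = MD2, bit 2 = MD5, bit 4 = straight password, bit 5 = OEM). The helper names are invented for illustration.

```python
# Sketch only: verify the IPMI checksums of Frames 1 and 2 and decode the
# authentication types the BMC advertises. Helper names are hypothetical.

AUTH_TYPES = {0: "none", 1: "MD2", 2: "MD5", 4: "straight password", 5: "OEM"}

def ipmi_checksum(data):
    """Two's-complement checksum: data plus checksum sums to 0 mod 256."""
    return (-sum(data)) & 0xFF

def supported_auth_types(mask):
    """Translate the auth-type-support bitmask into a list of names."""
    return [name for bit, name in sorted(AUTH_TYPES.items()) if mask & (1 << bit)]

# IPMI message bytes of Frame 1 (request) and Frame 2 (response),
# copied from offset 0x38 onward in the hex dumps above.
frame1 = bytes.fromhex("20 18 c8 81 04 38 0e 02 33")
frame2 = bytes.fromhex("81 1c 63 20 04 38 00 01 20 1c 00 bd 13 00 00 97")

def checksums_ok(msg):
    header_ok = ipmi_checksum(msg[0:2]) == msg[2]
    body_ok = ipmi_checksum(msg[3:-1]) == msg[-1]
    return header_ok and body_ok

request_ok = checksums_ok(frame1)
response_ok = checksums_ok(frame2)

# Response layout after the 6-byte header: completion code, channel number,
# then the authentication type support bitmask.
auth_mask = frame2[8]
auth_names = supported_auth_types(auth_mask)

print(request_ok, response_ok, hex(auth_mask), auth_names)
# prints: True True 0x20 ['OEM']
```

The mask 0x20 has only bit 5 set, which agrees with the observation that the board offers nothing but the OEM authentication type.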
Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.321885000 Time delta from previous packet: 0.000208000 seconds Time relative to first packet: 0.226954000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0x3693 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... 
..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0xd0 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 1b 59 02 6f 00 2e 36 93 06 00 ff 07 00 00 ...Y.o..6....... 0030 00 00 00 00 00 00 00 18 20 18 c8 81 08 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 d0 DMIN............ Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.436904000 Time delta from previous packet: 0.115019000 seconds Time relative to first packet: 0.341973000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a9d (2717) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e62 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: afs3-callback (7001) Source port: aux_bus_shunt (623) Destination port: afs3-callback (7001) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Remote Console Software ID (0x81) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0x63 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x08 0000 10.. = Sequence: 0x02 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0xdd Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 9d 40 00 40 11 0e 62 c8 c8 c8 04 c8 c8 .G.. at .@..b...... 0020 c8 01 02 6f 1b 59 00 33 00 00 06 00 ff 07 00 00 ...o.Y.3........ 0030 00 00 00 00 00 00 00 1c 81 1c 63 20 08 39 00 10 ..........c .9.. 0040 3d 22 9c 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 =".uN.9..Vao ..x 0050 6b 86 c2 dd 00 k.... Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:50:10.437332000 Time delta from previous packet: 0.000428000 seconds Time relative to first packet: 0.342401000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (odslin4.mas.eurocontrol.be) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: afs3-callback (7001), Dst Port: aux_bus_shunt (623) Source port: afs3-callback (7001) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0xc2b9 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0x9c223d10 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0x9c223d10 Authentication Code: DD4D2F557F83B6DFE6E9CACAE38CA53E Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Remote Console Software ID (0x81) Seq/LUN: 0x0c 0000 11.. = Sequence: 0x03 .... ..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0x3c Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 1b 59 02 6f 00 43 c2 b9 06 00 ff 07 05 00 ...Y.o.C........ 0030 00 00 00 10 3d 22 9c dd 4d 2f 55 7f 83 b6 df e6 ....="..M/U..... 0040 e9 ca ca e3 8c a5 3e 1d 20 18 c8 81 0c 3a 05 02 ......>. ....:.. 0050 75 4e d2 39 a4 0b 56 61 6f 20 df ea 78 6b 86 c2 uN.9..Vao ..xk.. 
0060 dd 76 ef fd 3c .v..< The same holds true when changing ipmi_auths[5] to: { ipmi_md5_authcode_init, ipmi_md5_authcode_gen, ipmi_md5_authcode_check, ipmi_md5_authcode_cleanup } However, the Java IPMI tool that SuperMicro delivers with the BMC is able to activate the session: Frame 3 (80 bytes on wire, 80 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.427023000 Time delta from previous packet: 0.002488000 seconds Time relative to first packet: 0.123603000 seconds Frame Number: 3 Packet Length: 80 bytes Capture Length: 80 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 66 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x1914 (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 46 Checksum: 0xcd34 (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 24 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Request LUN: 0x00 Command: Get Session Challenge (0x39) Checksum 2: 0x59 Data (17 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 42 00 00 40 00 40 11 19 14 c8 c8 c8 01 c8 c8 .B.. at .@......... 0020 c8 04 8d af 02 6f 00 2e cd 34 06 00 ff 07 00 00 .....o...4...... 0030 00 00 00 00 00 00 00 18 20 18 c8 00 00 39 05 41 ........ ....9.A 0040 44 4d 49 4e 00 00 00 00 00 00 00 00 00 00 00 59 DMIN...........Y Frame 4 (85 bytes on wire, 85 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.542351000 Time delta from previous packet: 0.115328000 seconds Time relative to first packet: 0.238931000 seconds Frame Number: 4 Packet Length: 85 bytes Capture Length: 85 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 71 Identification: 0x0a93 (2707) Flags: 0x04 .1.. = Don't fragment: Set ..0. 
= More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e6c (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 51 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Get Session Challenge (0x39) Session: ID 0x00000000 (9 bytes) Authentication Type: NONE (0x00) Session Sequence Number: 0x00000000 Session ID: 0x00000000 Message Length: 28 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Get Session Challenge (0x39) Completion Code: Command completed normally (0x00) Checksum 2: 0x85 Data (20 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 47 0a 93 40 00 40 11 0e 6c c8 c8 c8 04 c8 c8 .G.. at .@..l...... 0020 c8 01 02 6f 8d af 00 33 00 00 06 00 ff 07 00 00 ...o...3........ 0030 00 00 00 00 00 00 00 1c 00 1c e4 20 00 39 00 10 ........... .9.. 0040 30 d2 b3 b4 66 5c 8d 59 95 1f 9e 48 1d 51 b3 4f 0...f\.Y...H.Q.O 0050 cd eb 3f 85 00 ..?.. 
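
The session ID used in the Activate Session request that follows comes straight out of this response. A short Python sketch, assuming the IPMI 1.5 layout of the Get Session Challenge response data (a 4-byte temporary session ID, least significant byte first, followed by a 16-byte challenge string); the function name is made up for illustration.

```python
# Sketch only: pull the temporary session ID and the challenge string out of
# the Get Session Challenge response in Frame 4 above.

def parse_session_challenge(msg):
    """msg: IPMI message bytes starting at the requester address."""
    completion = msg[6]            # byte after the 6-byte message header
    data = msg[7:-1]               # response data without trailing checksum
    session_id = int.from_bytes(data[0:4], "little")
    challenge = data[4:20]
    return completion, session_id, challenge

# Bytes copied from offset 0x38 onward in the Frame 4 hex dump.
frame4 = bytes.fromhex(
    "00 1c e4 20 00 39 00 10 30 d2 b3 b4 66 5c 8d 59 "
    "95 1f 9e 48 1d 51 b3 4f cd eb 3f 85"
)

completion, session_id, challenge = parse_session_challenge(frame4)
print(hex(completion), hex(session_id), challenge.hex())
```

The decoded session ID is 0xb3d23010, the same value Ethereal shows in the session header of the next request.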
Frame 5 (101 bytes on wire, 101 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.549163000 Time delta from previous packet: 0.006812000 seconds Time relative to first packet: 0.245743000 seconds Frame Number: 5 Packet Length: 101 bytes Capture Length: 101 bytes Ethernet II, Src: 00:04:76:0f:3f:b5, Dst: 00:30:48:25:61:4d Destination: 00:30:48:25:61:4d (Supermic_25:61:4d) Source: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Type: IP (0x0800) Internet Protocol, Src Addr: odslin4.mas.eurocontrol.be (200.200.200.1), Dst Addr: 200.200.200.4 (200.200.200.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 87 Identification: 0x0000 (0) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x18ff (correct) Source: odslin4.mas.eurocontrol.be (200.200.200.1) Destination: 200.200.200.4 (200.200.200.4) User Datagram Protocol, Src Port: 36271 (36271), Dst Port: aux_bus_shunt (623) Source port: 36271 (36271) Destination port: aux_bus_shunt (623) Length: 67 Checksum: 0x836b (correct) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... = Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Request (0x06), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000000 Session ID: 0xb3d23010 Authentication Code: 5A50292FC164E754A3E7846B0A96880F Message Length: 29 Response Address: BMC Slave Address (0x20) NetFn/LUN: Application Request 0001 10.. = NetFn: Application Request (0x06) .... ..00 = Response LUN: 0x00 Checksum 1: 0xc8 Request Address: Unknown (0x00) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... 
..00 = Request LUN: 0x00 Command: Activate Session (0x3a) Checksum 2: 0xbd Data (22 bytes) 0000 00 30 48 25 61 4d 00 04 76 0f 3f b5 08 00 45 00 .0H%aM..v.?...E. 0010 00 57 00 00 40 00 40 11 18 ff c8 c8 c8 01 c8 c8 .W.. at .@......... 0020 c8 04 8d af 02 6f 00 43 83 6b 06 00 ff 07 05 00 .....o.C.k...... 0030 00 00 00 10 30 d2 b3 5a 50 29 2f c1 64 e7 54 a3 ....0..ZP)/.d.T. 0040 e7 84 6b 0a 96 88 0f 1d 20 18 c8 00 00 3a 05 04 ..k..... ....:.. 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0060 00 00 00 00 bd ..... Frame 6 (90 bytes on wire, 90 bytes captured) Arrival Time: Nov 25, 2003 10:44:18.627386000 Time delta from previous packet: 0.078223000 seconds Time relative to first packet: 0.323966000 seconds Frame Number: 6 Packet Length: 90 bytes Capture Length: 90 bytes Ethernet II, Src: 00:30:48:25:61:4d, Dst: 00:04:76:0f:3f:b5 Destination: 00:04:76:0f:3f:b5 (3Com_0f:3f:b5) Source: 00:30:48:25:61:4d (Supermic_25:61:4d) Type: IP (0x0800) Internet Protocol, Src Addr: 200.200.200.4 (200.200.200.4), Dst Addr: odslin4.mas.eurocontrol.be (200.200.200.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00) 0001 00.. = Differentiated Services Codepoint: Unknown (0x04) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 76 Identification: 0x0a94 (2708) Flags: 0x04 .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: UDP (0x11) Header checksum: 0x0e66 (correct) Source: 200.200.200.4 (200.200.200.4) Destination: odslin4.mas.eurocontrol.be (200.200.200.1) User Datagram Protocol, Src Port: aux_bus_shunt (623), Dst Port: 36271 (36271) Source port: aux_bus_shunt (623) Destination port: 36271 (36271) Length: 56 Checksum: 0x0000 (none) Remote Management Control Protocol, Class: IPMI Version: 0x06 Sequence: 0xff Type: Normal RMCP, Class: IPMI ...0 0111 = Class: IPMI (0x07) 0... .... 
= Message Type: Normal RMCP (0x00) Intelligent Platform Management Interface, NetFn: Application Response (0x07), Cmd: Activate Session (0x3a) Session: ID 0xb3d23010 (25 bytes) Authentication Type: OEM (0x05) Session Sequence Number: 0x00000001 Session ID: 0xb3d23010 Authentication Code: 1C000048D88D000200000000000000A4 Message Length: 18 Request Address: Unknown (0x00) NetFn/LUN: Application Response 0001 11.. = NetFn: Application Response (0x07) .... ..00 = Request LUN: 0x00 Checksum 1: 0xe4 Response Address: BMC Slave Address (0x20) Seq/LUN: 0x00 0000 00.. = Sequence: 0x00 .... ..00 = Response LUN: 0x00 Command: Activate Session (0x3a) Completion Code: Command completed normally (0x00) Checksum 2: 0xd8 Data (10 bytes) 0000 00 04 76 0f 3f b5 00 30 48 25 61 4d 08 00 45 10 ..v.?..0H%aM..E. 0010 00 4c 0a 94 40 00 40 11 0e 66 c8 c8 c8 04 c8 c8 .L.. at .@..f...... 0020 c8 01 02 6f 8d af 00 38 00 00 06 00 ff 07 05 01 ...o...8........ 0030 00 00 00 10 30 d2 b3 1c 00 00 48 d8 8d 00 02 00 ....0.....H..... 0040 00 00 00 00 00 00 a4 12 00 1c e4 20 00 3a 00 05 ........... .:.. 0050 10 30 d2 b3 00 00 00 00 04 d8 .0........ --- snip --- Regards, Sebastian -- Dipl.-Inform. Sebastian Wangnick Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300 -----Original Message----- From: Zheng, Jeff [mailto:jeff.zheng at intel.com] Sent: Wednesday 26 November 2003 08:56 To: WANGNICK Sebastian; openipmi-developer at lists.sourceforge.net; Beowulf Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed Did you enable lan support? > Thanks > Jeff Jeff.Zheng at intel.com > BTW, I speak for myself, not for Intel Corp. 
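For anyone checking the traces above by hand: the "Checksum 1" and "Checksum 2" fields that Ethereal shows in each IPMI message are plain 8-bit two's-complement checksums, chosen so that the covered bytes plus the checksum add up to zero modulo 256. The sketch below is only an illustration and is not OpenIPMI code; the function name is made up, and the byte values are copied from the Get Session Challenge request in the hex dump.

```python
def ipmi_checksum(data: bytes) -> int:
    """8-bit two's-complement checksum: (sum(data) + checksum) % 256 == 0."""
    return (-sum(data)) & 0xFF


# Bytes copied from the Get Session Challenge request in the hex dump.
header = bytes([0x20, 0x18])            # responder address, NetFn/LUN
payload = bytes([0x00, 0x00, 0x39])     # requester address, Seq/LUN, command
payload += bytes([0x05]) + b"ADMIN" + bytes(11)  # 17 data bytes: auth type + padded user name

checksum1 = ipmi_checksum(header)
checksum2 = ipmi_checksum(payload)
print(hex(checksum1), hex(checksum2))   # 0xc8 0x59, the values shown in the dump
```

The same function applied to the first two bytes of the response (0x00, 0x1c) gives 0xe4, which is the "Checksum 1" of the reply frame, so the framing in the trace looks self-consistent.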
-----Original Message-----
From: openipmi-developer-admin at lists.sourceforge.net [mailto:openipmi-developer-admin at lists.sourceforge.net] On Behalf Of WANGNICK Sebastian
Sent: Wednesday, November 26, 2003 3:30 PM
To: 'openipmi-developer at lists.sourceforge.net'; Beowulf
Subject: RE: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

Hello Corey,

thanks, I'll try to challenge SuperMicro with that ...

Actually, I'm currently evaluating PCs for a replacement programme of one of our systems (about 120 machines):

*) I've got the SuperMicro SM-X5DPL-iGM with the IPMI 1.5 kit, which has the problems already described.
*) I've also got a Tyan S2469UGN with the M3289 remote supervisor adaptor, but the M3289 didn't work and was returned for repair.
*) I've also got an IBM E-Series x225 server. That one doesn't support IPMI, however, but runs a web server on the remote management card. This card has its own backup power supply and network connection (which is nice since you can provide it with its IP address via DHCP).

Could anyone on this mailing list recommend another dual-processor mainboard that is capable of one of the "standard" IPMI authentication mechanisms?

I'm cross-posting to the beowulf mailing list since Google tells me that the SuperMicro issue has been discussed there before as well ...

Regards,
Sebastian
--
Dipl.-Inform. Sebastian Wangnick
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6199AC Maastricht-Airport, Tel: +31-433661-370, Fax: -300

-----Original Message-----
From: Corey Minyard [mailto:minyard at acm.org]
Sent: Tuesday 25 November 2003 15:10
To: WANGNICK Sebastian
Cc: 'openipmi-developer at lists.sourceforge.net'
Subject: Re: [Openipmi-developer] OpenIPMI via LAN to SuperMicro board doesn't succeed

According to that authentication support bitmask, it only supports an OEM authentication type, as you have already figured out. That means you have to find out what algorithm they are using and implement it.
OpenIPMI doesn't currently have an interface to register authentication algorithms, but it needs one, I guess.

Also, you probably want "admin" privilege level, not "user", as user cannot really do very much.

-Corey

WANGNICK Sebastian wrote:

>Dear all,
>
>I'm trying in vain to connect via LAN to a SuperMicro system using
>OpenIPMI (1.1.5). It seems that the SuperMicro *only* offers
>Authorisation Capability 5 (that is OEM according to the Ethereal IPMI
>code).
>
>[I've changed ipmi_lan.c:auth_cap_done to provide the details of the problem:
>    ipmi_log(IPMI_LOG_ERR_INFO, "Requested authentication %d not supported (supporting 0x%x only)", lan->authtype, (unsigned int) msg->data[2]); ]
>
>Seems that the SuperMicro BMC sets msg->data[2] to 0x20.
>
>Now, to overcome the problem, I've added to ipmi_auth.h:
>    #define IPMI_AUTHTYPE_OEM 5
>changed in ipmi_auth.c ipmi_auths[5] to:
>    { ipmi_md2_authcode_init, ipmi_md2_authcode_gen,
>      ipmi_md2_authcode_check, ipmi_md2_authcode_cleanup }
>and added to sample.c:main:
>    } else if (strcmp(argv[curr_arg+3], "oem") == 0) {
>        authtype = IPMI_AUTHTYPE_OEM;
>
>But still, when trying
>    ./ipmisample -dmsg lan 200.200.200.4 623 oem user ADMIN ADMIN
>the board doesn't respond to the Activate Session request:
>
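The 0x20 reported for msg->data[2] in the quoted message is a bitmask in which bit n set means that authentication type n is supported, which is why 0x20 reads as "OEM only" (type 5). A small decoding sketch follows. It assumes the IPMI 1.5 numbering of authentication types (0 none, 1 MD2, 2 MD5, 4 straight password, 5 OEM), and the helper name is hypothetical, so it is worth checking against the specification before relying on it.

```python
# Assumed IPMI 1.5 authentication type numbers; bit n of the mask <-> type n.
AUTH_TYPES = {
    0: "none",
    1: "MD2",
    2: "MD5",
    4: "straight password/key",
    5: "OEM proprietary",
}


def decode_auth_support(mask: int) -> list:
    """List the authentication types whose bit is set in the support mask."""
    return [name for bit, name in sorted(AUTH_TYPES.items()) if mask & (1 << bit)]


print(decode_auth_support(0x20))  # ['OEM proprietary']
```

A board offering the standard mechanisms would show several low bits set instead; 0x17, for example, would decode to none, MD2, MD5 and straight password.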
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From erwan at mandrakesoft.com Wed Nov 26 04:51:25 2003
From: erwan at mandrakesoft.com (Erwan Velu)
Date: Wed, 26 Nov 2003 10:51:25 +0100
Subject: booting from usb pen drive
In-Reply-To: <1069516333.2018.1.camel@loiosh>
References: <3FAA831D0001F2C1@mail-1.tiscali.it> <1069516333.2018.1.camel@loiosh>
Message-ID: <1069840284.7852.16.camel@revolution.mandrakesoft.com>

> The short answer is yes. The long answer is, it depends on your BIOS.
> It's kinda like a few years ago when some systems would boot from CD and
> some wouldn't.

Agreed, but I've played a bit with it and it seems there are many ways a BIOS can boot a USB key.

The USB key could be detected as a USB-FDD, USB-ZIP, USB-HDD or USB-CDROM. The geometry and the bootloader you use can change the way it is detected. Many BIOSes don't give a choice between these options; there is only a "Boot USB" option, which usually equals USB-ZIP or USB-FDD.

Then, if you are using a FAT filesystem you can use syslinux, or grub/lilo on all the others (FAT included).

I'm using my USB key as a firmware/BIOS updater when PXE is not available, and/or for booting a rescue Linux for repairing some Linux boxes.

Best regards,

PS: On my Asus A7N8x-Deluxe, if the BIOS option "USB LEGACY MOUSE" is not activated I can't boot from my USB key ! :((( It took me some time to find that out. I've reported it to ASUS but no news, no fix :(. I know this is not a usual clustering mobo, but such a trick could help some of you.
--
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number : +33 (0) 1 40 41 92 00
Web site : http://www.mandrakesoft.com
OpenPGP key : http://www.mandrakesecure.net/cks/

From bill at cse.ucdavis.edu Wed Nov 26 13:59:33 2003
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 26 Nov 2003 10:59:33 -0800
Subject: Opteron kernel
In-Reply-To: <3FC4EFB3.10708@pgs.com>
References: <3FC4EFB3.10708@pgs.com>
Message-ID: <20031126185933.GB16806@cse.ucdavis.edu>

> for testing ) and I assumed to be correct. But given the poor
> performance I have seen ( 2 GHz Opterons coming in at ~15% slower than a
> 3 GHz P4 on a compute/memory intensive application when most benchmarks
> I have seen would imply the inverse ), I will double-check that when

I've seen this repeatedly.

Did each Opteron CPU have 2 or 4 DIMMs attached?

Did you benchmark TWO jobs on the Opteron vs TWO jobs on the P4?

Is the memory at least PC2700?

Have you played with the BIOS settings? I've seen significant speedups playing with both the node interleaving and the memory interleaving.

I can provide a benchmark that should show 2 GB/sec to main memory on a single Opteron, and 3 GB/sec on a dual Opteron, if it's properly set up.
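The benchmark itself is not included in this message. As a rough stand-in only, and not the benchmark referred to above, a copy-bandwidth check can be sketched as below: it times a large buffer-to-buffer copy and counts both the bytes read and the bytes written. The function name, buffer size and repeat count are all assumptions, and a proper tool such as STREAM will give more trustworthy numbers.

```python
import time


def copy_bandwidth_gb_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate memory copy bandwidth in GB/s from the fastest of several copies."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                      # one bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)                # guard against a zero timer reading
    return 2 * size_bytes / best / 1e9    # bytes read plus bytes written


if __name__ == "__main__":
    print("approx. copy bandwidth: %.2f GB/s" % copy_bandwidth_gb_s())
```

To mimic the "two jobs on a dual node" case discussed in this thread, one would start two copies at the same time and add up the results; the buffer has to be much larger than the CPU caches for the figure to mean anything.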
--
Bill Broadley
Computational Science and Engineering
UC Davis

From derek.richardson at pgs.com Wed Nov 26 14:37:32 2003
From: derek.richardson at pgs.com (Derek Richardson)
Date: Wed, 26 Nov 2003 13:37:32 -0600
Subject: Opteron kernel
In-Reply-To: <20031126185933.GB16806@cse.ucdavis.edu>
References: <3FC4EFB3.10708@pgs.com> <20031126185933.GB16806@cse.ucdavis.edu>
Message-ID: <3FC500FC.7080101@pgs.com>

Bill,

It had 4 DIMMs, w/ 6 slots on the motherboard. I am going to go confirm which banks correspond to what shortly, just in case IBM put it together w/ an imbalance. Yes, we have done the 2 job scenario ( that's our primary mode of operating: running 2 jobs ( called executive shells, or es's ) on a dual-cpu node, w/ anywhere from 10 up to 125 nodes participating in a job ). The memory should be DDR333, IIRC. The BIOS doesn't have much in the way of tuning options, more's the pity.

What's the benchmark? If it's a publicly available one, I probably have it installed already; if not, yes, I would appreciate it. When you say you have seen it repeatedly, do you mean an improper distribution of memory to CPU?

Derek R.

--
Linux Administrator
derek.richardson at pgs.com
derek.richardson at ieee.org
Office 713-781-4000
Cell 713-817-1197
Madison's Inquiry: If you have to travel on the Titanic, why not go first class?

From lindahl at pathscale.com Wed Nov 26 15:43:29 2003
From: lindahl at pathscale.com (Greg Lindahl)
Date: Wed, 26 Nov 2003 12:43:29 -0800
Subject: Opteron kernel
In-Reply-To: <1069817334.8326.122.camel@protein.scalableinformatics.com>
References: <3FC29050.6000003@pgs.com> <20031125232149.GA2995@greglaptop.internal.keyresearch.com> <1069817334.8326.122.camel@protein.scalableinformatics.com>
Message-ID: <20031126204329.GE2793@greglaptop.internal.keyresearch.com>

On Tue, Nov 25, 2003 at 10:28:54PM -0500, Joe Landman wrote:

> Has anyone implemented a real runon, or built something like the old
> IRIX dplace stuff yet? I had been looking into this, and don't want to
> re-invent a working thing...

In addition to the SUSE thing posted already, Fedora has some kind of user utility too. We're using an in-house thingie for now, same functionality, different name...
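None of the utilities mentioned here are shown in the thread, but the core operation behind a runon-style tool is setting a process's CPU affinity mask. As a minimal sketch under that assumption: Python's os module exposes the Linux scheduler affinity calls, and the helper name below is made up for illustration. It is not the in-house tool or either distribution's utility.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the caller) to one CPU. Linux only."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity calls are not available on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


# Usage sketch: pin the current process to the first CPU it is allowed to use,
# then run the memory-bound job from here so it stays on that CPU.
#
#   first = min(os.sched_getaffinity(0))
#   pin_to_cpu(first)
```

On a dual Opteron the point of pinning is to keep a job on the CPU whose memory controller holds its data; whether the kernel then allocates memory locally depends on the NUMA support in the kernel being run.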
--
greg

From laurenceliew at yahoo.com.sg Wed Nov 26 17:56:53 2003
From: laurenceliew at yahoo.com.sg (Laurence Liew)
Date: Thu, 27 Nov 2003 06:56:53 +0800
Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal])
In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>
References: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com>
Message-ID: <1069887413.2179.577.camel@scalable>

Hi,

It is great that you have convinced your management to use Linux and are winning the game :-)

You should look at using RHEL WS for the compute nodes and RHEL ES or AS for the frontend. It will lower your costs quite a bit.

For 1 frontend with AS (USD 1499) and 42 compute nodes (42 x USD 179), the total is USD 9017. (For 42 nodes, you probably can and should get discounts!)

Your hardware would probably cost around USD 100K. So the OS cost comes to about 10%... I believe 10% for the OS for a cluster is about right.

I understand how you feel about RHEL policies etc., and I am hopeful that RH will have specific HPC pricing further down the road.

I would encourage you to speak to your RH rep nicely and explain what you are doing and why you think you should get "HPC" pricing (i.e. more discounts on the compute nodes).

You will be surprised that not everyone at Red Hat appreciates HPC and what we do, or why their current pricing model does not work for us.

As for alternative distros, you may wish to look at Novell/Suse Linux and use it as a counterbalance to RH.

Again, you will note that I can only encourage the use of these "commercial" distros, as they will probably be part of an ISV's support matrix. Most of my customers are sticky about such support and demand that the OS used is a supported OS for their applications.
I look forward to the day Novell Linux offers a HPC pricing model. But again for them to do so and provide the support and patch stream, there will be costs and I hope it will be reasonable and which something the community can accept. Cheers! laurence On Thu, 2003-11-27 at 01:48, graham.mullier at syngenta.com wrote: > Laurence Liew wrote: > [...] > > The cluster community have done very well and today, large commercial > > organisations are adopting linux clusters as one of the tools they use > > to solve their complex problems. > > Yup, that's us - we are currently adopting a variety of open source tools, > including Linux, to help tackle at least some of our HPC needs. > > > > But I find this talk of "stripping" RHEL copyright to create > > yet another > > distro to be counter productive as linux beowulf clusters goes into > > commercial mainstream computing.... where customers have specific > > support demands. (And yes... commercial customers WILL PAY > > the full list > > price of RHEL to build a cluster). > > > > Now... I believe the USD25 and USD50 are acceptable pricing for the > > value that RHEL + RHN brings to the customer (academic). The > > cost of the > > OS is a small fraction of the total value of the cluster. > > > > Most of our users want a stable and supported OS, but more > > importantly, > > most of them run a commercial software of one form or another... and > > this means that these 3rd party ISV softwares are most likely to be > > certified on RHEL. > [...] > > I think you are confusing things here (I know you are ranting but let's try > to keep the arguments coherent, please! ;) > I'm running a project within a commercial company, so academic rates are of > no use to me. I am willing to pay for what I get, but I'm not willing to pay > simply to give us a warm glow that we are "supported". If I get some value > I'll pay. 
I don't think I get value if I'm expected to pay separately for > each copy of RHEL-AS on each of 42 compute nodes, and the only price I'm > offered is an extreme full list price. I would be willing to buy into a > model where I'm paying for a clean, well-tested patch stream. But that model > can not scale cost linearly with number of installed nodes - I'm not even > convinced it can scale as the log of the number of nodes. > > > > if the community continues to fork a project just becauses it charges > > some $$$$, our progress would be very slow.... Redhat have listened to > > the customer and partners and have created a academic pricing > > model for > > cluster builders... so we should accept that and move on. > As I've said above, this is simply confused and does nothing for me or my > project. The community depends on people contributing work - and in some > cases those people contribute work in exchange for remuneration. But in > other cases we as a community find ways of driving development forward > through what amounts to barter - we all get value from the open source > software, and we all contribute to it in some way. > > RH is (or at least appears to be) going down the restrictive licence, > over-priced model pushed by MS. They've also learned the 'force frequent > upgrades' trick. That leaves me uncomfortable about them as a vendor with > whom I believe I'll have a good long-term relationship. > > But in the short-term software I use needs "RH 7.1", or "supported only on > RH 7.3" or "RHEL-AS 2.1". Great. So I want ways of using RH that reduce my > risks (what if RH stop making binaries available - can I still operate? If > not, I want to be able to recompile from the source, and need to avoid > copyright infringement problems). > [...] > > disarray and you will see droves of commercial ISVs > > abandoning linux and > > moving back to UNIX and Windows.... > > > > where would that leave us? 
without commercial apps, linux would never > > sustain and grow in the commercial arena. > > > ah, well, now you've moved off into another universe. This isn't the one I'm > in. Closed source is bad - it gets in my way, makes my life difficult, and > increases my project's risks enormously. > > Why should I pay RH huge sums of money for Linux AND have to fight to get > acceptance of Linux internally when I could take the "easy" option and just > buy Windows? [by the way, I know why, and I'm fighting - and winning] > > Where I am now is a small part of the commercial arena, it uses commercial > apps that run on Linux because we, customers, demand that they do. If RH > make life difficult for us (awkward licence model and/or high price per > node) we will start looking for ways around the problem, because it is > worthwhile. Maybe we'll shift to another distro, maybe we'll take the time > and sort out how to build it ourselves - and once we've done that, what use > are RH to us? And if they are no use, will they get any money - no I don't > think so. > > Open source is a whole new way of working - and the money has to come in a > different way. If we're offered useful services that we can't or don't want > to handle internally, we'll look at buying them. But if the price is too > high we won't bother. > > Graham > > (long term IRIX user, computational chemist, and now chemoinformatics > specialist. I put up with Windows for office use but wouldn't want to rely > on it for anything important...) 

From lindahl at pathscale.com Wed Nov 26 20:59:44 2003
From: lindahl at pathscale.com (Greg Lindahl)
Date: Wed, 26 Nov 2003 17:59:44 -0800
Subject: long reply (was RE: LONG RANT [RE: RHEL Copyright Removal])
In-Reply-To: <1069887413.2179.577.camel@scalable>
References: <0B27450D68F1D511993E0001FA7ED2B30437CEB8@ukjhmbx12.ukjh.zeneca.com> <1069887413.2179.577.camel@scalable>
Message-ID: <20031127015944.GB4959@greglaptop.internal.keyresearch.com>

On Thu, Nov 27, 2003 at 06:56:53AM +0800, Laurence Liew wrote:

> Your cost of your hardware would probable amount to around USD100K. So
> the OS costs comes up to about 10%... I believe 10% for OS for a cluster
> is about right.

That might be true if the OS was cluster-aware and helped out with cluster problems. RHEL doesn't meet this standard.

> Most of my customers are sticky about such support and demands
> that the OS used is a supported OS for their applications.

Then by all means sell them whatever they will buy. But don't be surprised if they think they're getting ripped off paying so much for a non-cluster-aware OS.
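The figure being argued over is easy to reproduce from the list prices quoted earlier in this thread (USD 1499 for the frontend, USD 179 per compute node, 42 nodes, roughly USD 100K of hardware). The snippet below is just that arithmetic written out; the function name is made up, and the hardware figure is the round estimate from the thread rather than a real quote.

```python
def os_licence_cost(frontend_price, node_price, nodes):
    """Total OS cost for one frontend plus `nodes` compute nodes, in USD."""
    return frontend_price + node_price * nodes


total = os_licence_cost(1499, 179, 42)
share = total / 100_000              # hardware estimate from the thread
print(total, f"{share:.1%}")         # 9017 9.0%
```

Because the per-node term dominates, the total grows almost linearly with the node count, which is the scaling behaviour objected to in the message being replied to.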
--
greg

From eccf at super.unam.mx Thu Nov 27 19:38:35 2003
From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores)
Date: Thu, 27 Nov 2003 18:38:35 -0600 (CST)
Subject: Grand Challenge
Message-ID:

Hi to all,

Does anybody know if there were any articles talking about the Grand Challenge years ago?

Sorry if this is a bit off topic.

cafe

From daniel at labtie.mmt.upc.es Fri Nov 28 15:04:30 2003
From: daniel at labtie.mmt.upc.es (Daniel Fernandez)
Date: Fri, 28 Nov 2003 21:04:30 +0100
Subject: Mainboard identification and BIOS dump
Message-ID: <1070049870.528.34.camel@qeldroma.cttc.org>

Hi again,

We have a fully remote OS installation setup to recover crashed nodes or upgrade them. They're configured and installed through BOOTP, NFS and some scripting, but our cluster and workstation machines are not uniform at all, and some critical configuration and monitoring depends on the motherboard model.

On PCs the BIOS is found in the last 64 KB of the first megabyte, as reported by the "System ROM" entry in /proc/iomem:

00000000-0009efff : System RAM
0009f000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000d0000-000d5fff : Extension ROM
000f0000-000fffff : System ROM
00100000-1fffbfff : System RAM
  00100000-0023d67d : Kernel code
  0023d67e-002b8f1f : Kernel data

As a first step we have written a simple BIOS dump script, avoiding trouble with kernel calls in C:

dd if=/dev/mem bs=1048576 count=1 | tail -c 65536 > dumpbios.bin

Therefore, we just need to parse this "dumpbios.bin" file and check against a small database file if a known motherboard string is present.
I think the data strings are put at the same place across different models ( supposing the same BIOS manufacturer ), so brute-force parsing of this file won't be needed... anyway, this file is damn short.

Is there any motherboard identifying utility for Linux? We could also mess with kernel calls, but this method should suffice? Any thoughts? Thank you in advance.

--
Daniel Fernandez
Laboratori de Termotècnia i Energia - CTTC
www.upc.edu/lte
c/ Colom nº11
UPC Campus Terrassa

From alvin at Mail.Linux-Consulting.com Fri Nov 28 20:28:53 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Fri, 28 Nov 2003 17:28:53 -0800 (PST)
Subject: Mainboard identification and BIOS dump
In-Reply-To: <1070049870.528.34.camel@qeldroma.cttc.org>
Message-ID:

hi ya daniel

assuming that one can uniquely identify a motherboard model...

a) yes ... sometimes that info ( e.g. Asus p4-aaaa ) is in the bios, but usually not

b) why play with dd if=/dev/mem .... it's easier to save a copy of the output of dmesg on bootups

    /etc/rc.d/rc.local

        echo ""
        echo "save some startup info"
        dmesg > /etc/rc.d/rc.local.dmesg

    - append info from /proc/io /proc/cpuinfo /proc/pci /proc/iomem /proc/meminfo

        cat /proc/cpuinfo >> /etc/rc.d/rc.local.dmesg
        ....

    - poke around at that rc.local.dmesg file when you wanna know which mb it might be

    - you'd have to make a list of mapping/signatures from the chipset back to the mb manufacturer and model#

making a kernel that supports all your hardware is the easiest way to handle the non-homogeneous network

c ya
alvin

From nishanth at mec.ac.in Fri Nov 28 21:16:48 2003
From: nishanth at mec.ac.in (Nishanth Rajan)
Date: Sat, 29 Nov 2003 07:46:48 +0530 (IST)
Subject: MOSIX
In-Reply-To: <200311281704.hASH4SS04011@NewBlue.scyld.com>
References: <200311281704.hASH4SS04011@NewBlue.scyld.com>
Message-ID: <4541.202.88.246.210.1070072208.squirrel@mail.mec.ac.in>

hi everybody,

This is Nishanth from Cochin, India. I am a newcomer to this mailing list..
I am an engineering student and am intending to do MOSIX for my project... If anyone could help me in this regard, please contact me.

Thanks
Nishanth

<-=+||+=->

From nashif at planux.com Sat Nov 29 23:28:32 2003
From: nashif at planux.com (Anas Nashif)
Date: Sat, 29 Nov 2003 23:28:32 -0500
Subject: Mainboard identification and BIOS dump
In-Reply-To: <1070049870.528.34.camel@qeldroma.cttc.org>
References: <1070049870.528.34.camel@qeldroma.cttc.org>
Message-ID: <3FC971F0.7060807@planux.com>

Hi,

DMI decode is your friend

http://www.nongnu.org/dmidecode/

Anas

From alvin at Mail.Linux-Consulting.com Sun Nov 30 01:58:35 2003
From: alvin at Mail.Linux-Consulting.com (Alvin Oga)
Date: Sat, 29 Nov 2003 22:58:35 -0800 (PST)
Subject: Mainboard identification and BIOS dump
In-Reply-To: <3FC971F0.7060807@planux.com>
Message-ID:

hi ya anas

On Sat, 29 Nov 2003, Anas Nashif wrote:

> Hi,
>
> DMI decode is your friend
>
> http://www.nongnu.org/dmidecode/

very nice !!

c ya
alvin
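The thread settles on dmidecode, which is probably the saner route. For completeness, the string-matching idea from the original question (scan the dumped BIOS image for a known board signature) could be sketched roughly as follows. Everything named here is hypothetical: the signature table, the model label and the helper names are placeholders, and real BIOS images may not contain a usable board string at all.

```python
import re

# Hypothetical signature database: substring in the BIOS image -> board model.
KNOWN_BOARDS = {
    "ExampleBoard X1": "example-x1",
}


def printable_strings(blob, min_len=6):
    """Return runs of printable ASCII at least min_len long, like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]


def identify_board(blob, known_boards=KNOWN_BOARDS):
    """Return the model whose signature occurs in the image, or None."""
    for text in printable_strings(blob):
        for signature, model in known_boards.items():
            if signature in text:
                return model
    return None


# Usage, assuming dumpbios.bin was produced by the dd command in the thread:
#
#   with open("dumpbios.bin", "rb") as f:
#       print(identify_board(f.read()))
```

Printing the output of printable_strings on a few known machines is a quick way to see which substrings, if any, are distinctive enough to go into the table.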