From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 25 May 103 21:25:17 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC Message-ID: <200305251725.VAA20503@nocserv.free.net>
I'm testing some Fortran benchmarks on a 2-CPU Opteron 1.6 GHz server we want to use in a Beowulf cluster. In particular, I need to measure small time intervals, for which I want to use an RDTSC-based "function" (for example, the one attached below, published by T. Prince). But it requires some minor modifications, I believe, to work properly on x86-64. I use gcc-3.2 under SuSE SLES8 and call this function from source compiled by pgf90-5.0beta2 (64-bit mode). The original source version of the function by T. Prince gives assembler errors because i386 is not pre-defined. I simply defined both i386 and _M_IX86; gcc -c is now OK and creates a 64-bit object module, but after linking and running the test the measured time is wrong :-( (negative in some cases). I would appreciate any ideas on what I should modify in the source (attached below) to resolve the problem. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net
> ===================================================
> #define _IFC 1
>
> #define CLOCK_RATE 1600000000
> /* SET THIS AND RECOMPILE FOR TARGET MACHINE */
> #undef _WIN32
> /* set not to use API calls even on Windows */
> #ifdef _WIN32
> #include <windows.h>
> #endif
> unsigned long long int rdtsc( )
> {
> #ifdef _M_IA64
>
> unsigned __int64 __getReg(int whichReg);
> #pragma intrinsic(__getReg);
> #define INL_REGID_APITC 3116
>
> return __getReg(INL_REGID_APITC);
> #elif defined(_WIN32)
> unsigned long long int qpc;
> (void)QueryPerformanceCounter((LARGE_INTEGER *)&qpc);
> return qpc;
> #elif defined(__GNUC__)
> #ifdef i386
> long long a;
> asm volatile("rdtsc":"=A" (a));
> return a;
> #else
> unsigned long result;
> /* gcc-IA64 version */
> __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory");
> while (__builtin_expect ((int) result == -1, 0))
> __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory");
> return result;
>
> #endif
> #elif defined(_M_IX86)
> _asm
> {
> _emit 0x0f /* rdtsc */
> _emit 0x31
>
> }
> return;
> #else
> #error "only supports IA64,IX86,GNUC"
> #endif
> }
>
> #ifdef _G77
> double g77_etime_0__ (float tarray[2])
> #elif defined (_IFC)
> double g77_etime_0_ (float tarray[2])
> #else
> double g77_etime_0 (float tarray[2])
> #endif
>
> {
> static int win32_platform = -1;
> double usertime, systime;
>
> {
> static double clock_per=1./(long long)CLOCK_RATE;
> static unsigned long long int old_count;
> unsigned long long count;
> if(!old_count){
> #ifdef _WIN32
> unsigned long long int qpf;
> if(QueryPerformanceFrequency((LARGE_INTEGER *)&qpf))
> clock_per=1./(long long)qpf;
> #endif
> old_count=rdtsc();
> }
>
> count = rdtsc();
> tarray[0] = usertime = (long long)(count - old_count) * clock_per;
> tarray[1] = 0;
> }
> return usertime ;
>
> }
>
> #ifdef _G77
> void f90_cputime4__(float *time){ // Intel Fortran call
> #elif defined (_IFC)
> void f90_cputime4_(float *time){
> #else
> void f90_cputime4 (float *time){
> #endif
> float tarray[2];
> #ifdef _G77
> *time=(float)g77_etime_0__ (tarray);
> #else
> *time=(float)g77_etime_0_ (tarray);
> #endif
> }
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
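The negative intervals are consistent with a known pitfall in the i386 branch above: under gcc on x86-64 the "=A" constraint no longer denotes the edx:eax pair, so only part of the 64-bit counter reaches the variable and differences can come out negative after a wrap. A minimal sketch of a variant that should behave correctly under gcc in both 32-bit and 64-bit mode (an untested illustration only; the name rdtsc_gcc is arbitrary, and conversion to seconds still relies on the hard-coded CLOCK_RATE):

/* Hedged sketch: read the TSC through two 32-bit outputs instead of "=A",
   which works the same way for 32-bit and 64-bit gcc targets. */
static inline unsigned long long rdtsc_gcc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;   /* full 64-bit count */
}

/* Example use, mirroring the etime-style wrapper above:
   double seconds = (double)(rdtsc_gcc() - old_count) / CLOCK_RATE;  */

From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at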
free.net (Mikhail Kuzminsky) Date: Mon, 26 May 103 22:43:43 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC In-Reply-To: <200305251725.VAA20503@nocserv.free.net> from "Mikhail Kuzminsky" at May 25, 3 09:56:38 pm Message-ID: <200305261843.WAA10134@nocserv.free.net> According to Mikhail Kuzminsky > > I'm testing some fortran benchmarks on 2-CPUs Opteron 1.6 Hhz > server we want to use in Beowulf cluster. In particular, I need to measure > small time intervals, for which I want to use RDTSC-based "function" > (for example I attach below one - published by T.Prince). But it requires > some minor modifications, I beleive, to work properly on x86-64. > I found now that all is OK if I'm using calls from g77-33 (#define for 386 and _M_IX86 as I wrote in previous message are enough). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 11 Jun 103 22:10:00 +0400 (MSD) Subject: NAS Parallel Benchmarks for Current Hardware In-Reply-To: <3EE609F7.BE430A1E@ideafix.litec.csic.es> from "A.P.Manners" at Jun 10, 3 05:40:23 pm Message-ID: <200306111810.WAA01122@nocserv.free.net> According to A.P.Manners > > I am looking to put together a small cluster for numerical simulation > and have been surprised at how few NPB benchmark results using current > hardware I can find via google. > It's common situation w/NPB (in opposition to Linpack, SPECcpu e.a.) :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:05:31 +0400 (MSD) Subject: what is a flop In-Reply-To: <3EEF5F48.5020505@roma2.infn.it> from "Roberto Ammendola" at Jun 17, 3 08:34:48 pm Message-ID: <200306181605.UAA24772@nocserv.free.net> According to Roberto Ammendola > The "Floating point operations per clock cycle" depends on the > processor, obviously, and on which instructions you use in your code. > For example in a processor with the SSE instruction set you can perform > 4 operations (on 32 bit register each) per clock cycle. One processor > (Xeon or P4) running at 2.0 GHz can reach 8 GigaFlops. Taking into account that throughput of FMUL and FADD units in P4/Xeon is 2 cycles, i.e. FP result may be received on any 2nd sycle only, the peak Performance of P4/2 Ghz must be 4 GFLOPS. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:19:35 +0400 (MSD) Subject: SMP CPUs scaling factors (was "what is a flop") In-Reply-To: from "Franz Marini" at Jun 18, 3 10:53:17 am Message-ID: <200306181619.UAA24910@nocserv.free.net> According to Franz Marini > On Tue, 17 Jun 2003, Maurice Hilarius wrote: > > And I would say dual CPU boards do not sale at a factor of 2:1 over singles. 
> > ... > > As a general ( really general as it changes a lot with code and > compilers) > > the rule I know : > > Dual P3 ( VIA chipset): 1.5 : 1 > > Dual XEON P4 ( Intel 7501 chipset): 1.3 : 1 > ... > > Dual AthlonMP ( AMD 760MPX chipset) 1.4 : 1 > > Does anyone have some real world application figures regarding the > performance ratio between single and two-way (and maybe four-way) SMP > systems based on the P4 Xeon processor ? I may say about SMP speedups for AthlonMP/760MP, for P4 they will depends from chipset (kind of FSB and memory used). On G98 speedup for 2 CPUs is between 1.4-1.8 depending from calc. method and problem size. For Opteron/1.6 Ghz they are higher (up to 1.97 in some G98 tests). 4-way P4 SMP may be not too attractive if 4 CPUs will share common bus to memory. 4-way Opteron's system must be very good (they may be will arrive soon in the market). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:42:01 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: from "Stefano" at Jun 18, 3 11:22:25 pm Message-ID: <200306201342.RAA28782@nocserv.free.net> According to Stefano > As I am going to receive some funding this fall, I was wondering of buying > an opteron cluster for my research. > Mainlym the cluster will run VASP (an ab-initio quantum program, > written by a group in Wien), with myrinet. > Is somebody who is using AMD opterons yet ? We tested 2-way SMP server based on RioWorks mobo. But I should not recommend this motherboard for using: by default it has no monitoring (temperature etc) chips on the board, it's necessary to buy special additional card ! Unfortunately as a result I don't have data about lm_sensors work. Moreover, the choice of SMP boards is very restricted now: Tyan S2880 and MSI K8D. > ... > I think some fortran vendor has announced the port of their F90 to > the opteron. Well, it would be nice to recompile VASP for 64bits and see > how fast it goes. There is some possibilities: pgf90, Intel ifc(32 bit only), g77-3.3 (now really is very good, but f77 only) and Absoft. We tested 3 first compilers. But I'm not sure that you'll receive just now essential speed-up from 64 bit mode itself. SSE2 is supported in 32 bit mode also, but it looks that SSE2 in Opteron is realized "more worse" than in P4 (in the sense of microarchitecture). Yes, some compilers can now generate codes which use additional registers from x86-64 architecture extensions, but we didn't find essential speed-up on simple loops like DAXPY. > With the itanium2 (compiled in 2 version 32 and 64 > bits), it not so fast to justify the HUGE cost of an itanium cluster. > Maybe the opteron will shake high-performace scientific computing ! I beleive yes, but for 64-bit calculations. The price for Opteron- based servers is high, and price/performance ratio in comparison w/Xeon is not clear. 
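For reference, a minimal sketch of the kind of "simple loop like DAXPY" referred to above (the array length, repetition count and gettimeofday-based timing are arbitrary choices for illustration, not the actual test we ran):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* y = a*x + y, the classic DAXPY kernel: 2 flops per element */
static void daxpy(int n, double a, const double *x, double *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

int main(void)
{
    const int n = 1000000, reps = 100;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    struct timeval t0, t1;
    double sec;
    int i, r;

    if (!x || !y) return 1;
    for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < reps; r++)
        daxpy(n, 3.0, x, y);
    gettimeofday(&t1, NULL);

    sec = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
    printf("%.3f s, %.1f MFLOP/s\n", sec, 2.0 * n * reps / sec / 1e6);
    free(x); free(y);
    return 0;
}

Built once for 32-bit and once for 64-bit code generation (for example with -m32 versus the default 64-bit mode of a biarch gcc, or the corresponding pgcc/pgf90 switches), a loop like this is one way to check whether the extra x86-64 registers and SSE2 code generation actually change anything for a given compiler.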
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:57:28 +0400 (MSD) Subject: cluster of AOD Opteron (Stefano) In-Reply-To: <000401c33683$aaf403c0$0b01a8c0@redstorm> from "moor007@bellsouth.net" at Jun 19, 3 11:56:02 am Message-ID: <200306201357.RAA28995@nocserv.free.net> According to moor007 at bellsouth.net > I just received my hardware yesterday for my opteron cluster. My tech will > start putting it together today or tomorrow. I am building a 16 CPU cluster > w/ the 240 processor onboard the Tyan 2880. I will be using the 2D wulfkit > running SuSE enterprise server and Portland Group Server for the Opteron. I > am hoping it will be fast. Of course, that is relative. Anyway, I said all > that to say that I will begin posting performance benchmarks as they become > available. We compared Opteron/1.6 w/dual DDR266 CL2.5 and Athlon MP 1800+ w/close frequency (1533 MHz) and DDR266 also. Speedup for Gamess-US (ifc 7.1, opt for P4) and for binary G98 version (pgf77, optimized for PIII) on a set of different computational methods (in the sense of cache localization, memory throughput requirements etc) is about 1.5-1.9. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 18:09:51 +0400 (MSD) Subject: [OT] Maximum performance on single processor ? In-Reply-To: <4.3.2.7.2.20030620140207.00ae23a0@pop.freeuk.net> from "Simon Hogg" at Jun 20, 3 02:15:47 pm Message-ID: <200306201409.SAA29175@nocserv.free.net> According to Simon Hogg > > At 14:44 20/06/03 +0200, Marc Baaden wrote: > >I have an existing application which is part of a project. I have > >the source code. It is Fortran. It *can* be parallelized, but we > >would rather spend our time on the other parts of the project > >which need to be written from scratch *first*. > > > >The application is to run in real time, that is the user does something > >and as a function of user input and the calculation with the fortran > >program that I described, there is a correponding feedback to the > >user on the screen (and in some Virtual Reality equipment). > > > >Right now, even on simple test cases, the "response time" (eg calculation > >time for a single step) of our program is on the order of the second. > >(this is for an athlon MP 2600+) > >We need to get that down to a fraction of seconds, best milli-seconds, > >in order to be usable in real time. (makes it a factor of roughly 1000) > > > >As I said the code can indeed be parallelized - maybe even simply cleaned > >up in some parts - but unfortunately there remains very much other important > >stuff to do. So we'd rather spend some money on a really fast CPU and not > >touch the code at the moment. > > > >So my question was more, what is the fastest CPU I can get for $20000 > >at the moment (without explicitly parallelizing, hyperthreading or > >vectorizing my code). 
> > I'm sure some other people will give 'better' answers, but from having a > look at your web pages, I would be tempted to go down the route of > second-hand SGI equipment. > > For example (and no, I don't know how the performance stacks up, I'm > looking partly at a general bio-informatics / SGI link if that makes sense) > I can see for sale an Origin 2000 Quad 500MHz / 4GB RAM for UKP 15,725. W/o parallelization it looks as bad choice: any CPU will be more slow than the same Opteron or P4. If FP performance is important, Power4+ or Itanium 2 (or, more exactly, Madison one month later) may be the best choice. And, at least, optimize your program as possible :-) Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:48:28 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <005701c33792$c7c1ddf0$6501a8c0@sims.nrc.ca> from "Serguei Patchkovskii" at Jun 20, 3 09:16:44 pm Message-ID: <200306211348.RAA15586@nocserv.free.net> According to Serguei Patchkovskii > for Opteron- > > based servers is high, and price/performance ratio in comparison > > w/Xeon is not clear. > Once you start populating your systems with "interesting" amounts of memory > (i.e. anything above 2Gbytes), the price difference between dual Opterons > and > dual Xeons is really in the noise - at least at the places we buy. If your > suppliers > charge you a lot more for Opterons, may be you should look for another > source? > There is currently not "too wide" choice of possible sources of dual Opteron systems now in Russia :-) I agree that high memory price (for DIMMs from 1 GB, but the price will decrease) lower the percent of differences in total price, but if you use 512MB DIMMs for complectation, price difference is essential. Pls sorry: I assume, that in general the prices here in Russia are similar to other countries, but I didn't check just now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:16:15 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <1056121119.9688.7.camel@picard.lab.atipa.com> from "Curt Moore" at Jun 20, 3 09:58:40 am Message-ID: <200306211316.RAA15134@nocserv.free.net> According to Curt Moore > The RioWorks HDAMA (Arima) motherboard does have on-board sensors, > adm1026 based. 1) there is no information about environment monitoring chips in the HDAMA motherboard guide (at least in the guide we had) 2) sensors-detect utility (I used version from SuSe enterprise Linux beta-version distribution) didn't find any monitoring chips at the testing > Arima does have planned both a mini BMC which does just > management type functions and also a full BMC with will do other neat > things, I believe, such as KVM over LAN. Below is a lm_sensors dump > from an Arima HDAMA. It's good. But which lm_sensors version should be used and what are the necessary settings for lm_sensors kernel modules (taking into account that lm_sensors didn't find anything ) ? 
> > adm1026-i2c-0-2c > Adapter: SMBus AMD8111 adapter at 80e0 > Algorithm: Non-I2C SMBus adapter > in0: +1.15 V (min = +0.00 V, max = +2.99 V) > in1: +1.59 V (min = +0.00 V, max = +2.99 V) > in2: +1.57 V (min = +0.00 V, max = +2.99 V) > in3: +1.19 V (min = +0.00 V, max = +2.99 V) > in4: +1.18 V (min = +0.00 V, max = +2.99 V) > in5: +1.14 V (min = +0.00 V, max = +2.99 V) > in6: +1.24 V (min = +0.00 V, max = +2.49 V) > in7: +1.59 V (min = +0.00 V, max = +2.49 V) > in8: +0.00 V (min = +0.00 V, max = +2.49 V) > in9: +0.45 V (min = +1.25 V, max = +0.98 V) > in10: +2.70 V (min = +0.00 V, max = +3.98 V) > in11: +3.33 V (min = +0.00 V, max = +4.42 V) > in12: +3.38 V (min = +0.00 V, max = +4.42 V) > in13: +5.12 V (min = +0.00 V, max = +6.63 V) > in14: +1.57 V (min = +0.00 V, max = +2.99 V) > in15: +11.88 V (min = +0.00 V, max = +15.94 V) > in16: -12.03 V (min = +2.43 V, max = -16.00 V) > fan0: 0 RPM (min = 0 RPM, div = 2) > fan1: 0 RPM (min = 0 RPM, div = 2) > fan2: 0 RPM (min = 0 RPM, div = 2) > fan3: 0 RPM (min = 0 RPM, div = 2) > fan4: 0 RPM (min = 0 RPM, div = 1) > fan5: 0 RPM (min = 0 RPM, div = 1) > fan6: -1 RPM (min = 0 RPM, div = 1) > fan7: -1 RPM (min = 0 RPM, div = 1) > temp1: +37?C (min = -128?C, max = +80?C) > temp2: +46?C (min = -128?C, max = +100?C) > temp3: +46?C (min = -128?C, max = +100?C) > vid: +1.850 V (VRM Version 9.1) > Sorry, what does it means ? adm1026 has no enough possibilities to measure the values (in this case only 3 temperatures but no any RPM value) or lm_sensors version don't work correctly ? Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 24 Jun 103 20:12:35 +0400 (MSD) Subject: Opteron (x86-64) compute farms/clusters? In-Reply-To: <3EF809A4.1050802@dlr.de> from "Thomas Alrutz" at Jun 24, 3 10:19:48 am Message-ID: <200306241612.UAA09513@nocserv.free.net> According to Thomas Alrutz > > I just made some benchmarks on a Opteron 240 (1.4 GHz) node running with > Suse/United Linux Enterprise edition. > I have sucessfully compiled mpich-1.2.4 in 64 bit without any problems > (./configure -device=ch_p4 -commtype=shared). The default compiler is > the gcc-3.2.2 (maybe a Suse patch) and is set to 64Bit, the Portland > (5.0beta) compiler didn't worked at all ! > > I tried our CFD-code (TAU) to run 3 aerodynamik configurations on this > machine with both CPUs and the results are better then estimated. > We achieved in full multigrid (5 cycles, 1 equation turbulence model) a > efficiency of about 97%, 92% and 101 % for the second CPU. > Those results are much better as the results we get on the Intel Xeons > (around 50%). It looks that this results are predictable: Xeon CPUs require high memory bandwidth, but both CPUs share common system bus. Opteron CPUs have own memory buses and scale in this sense excellent. Better SPECrate results for Opteron (i.e. work on a mix of tasks) confirm (in particular) this features. CFD codes, I beleive, require high memory throughput ... 
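A quick way to see that shared-bus effect, independent of any application, is to time a simple out-of-cache triad loop with one copy running and then with one copy per CPU: on a shared front-side bus the per-process figure drops, while with per-CPU memory controllers it should stay roughly flat. A rough sketch (the array size is an arbitrary "much larger than cache" choice, and this is not the real STREAM benchmark):

#include <stdio.h>
#include <sys/time.h>

#define N (2 * 1024 * 1024)   /* three 16 MB arrays: far beyond any cache */

int main(void)
{
    static double a[N], b[N], c[N];
    struct timeval t0, t1;
    double sec, bytes;
    int i, r, reps = 10;

    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < reps; r++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];   /* triad: 2 loads + 1 store per element */
    gettimeofday(&t1, NULL);

    sec = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
    bytes = 3.0 * sizeof(double) * N * reps;
    printf("%.2f MB/s per process\n", bytes / sec / 1e6);
    return 0;
}

Running two instances at once (one per CPU) and comparing the per-process numbers gives a crude picture of how much memory bandwidth the second processor really adds.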
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 27 Jun 103 21:01:49 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFBEA29.60602@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 08:54:33 am Message-ID: <200306271701.VAA12659@nocserv.free.net> According to Daniel Pfenniger > > For a small experimental cluster (24 dual Xeon nodes) > we decided to use InfiniBand technology, which from specs is > 4 times faster (8Gb/s), 1.5 lower latency (~5musec) than > Myrinet for approximately the same cost/port. Could you pls compare them a bit more detailed ? Infiniband card costs (as I heard) about $1000-, (HCA-Net from FabricNetworks, former InfiniSwitch ?), what is close to Myrinet. But what is about switches (I heard about high prices) ? In particular, I'm interesting in very small switches; FabricNetworks produce 8-port 800-series switch, but I don't know about prices. May be there is 6 or 4 port switches ? BTW, is it possible to connect pair of nodes by means of "cross-over" cable (as in Ethernet), i.e. w/o switch ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 29 Jun 103 18:14:48 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFCA093.4090006@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 09:52:51 pm Message-ID: <200306291414.SAA12281@nocserv.free.net> According to Daniel Pfenniger > Patrick Geoffray wrote: > > On Fri, 2003-06-27 at 13:46, Daniel Pfenniger wrote: > >>The exact costs are presently not well fixed because several companies > >>enter the market. The nice thing about IB is that it is an open > >>standard, the components from different companies are compatible, > >>which is good for pressing costs down. > > > > With the slicon coming from one company (actually 2 but the second one > > does only switch chip), the price adjustment would mainly affect the > > reseller, where the margin are not that high. I don't expect much a > > price war in the Infiniband market, mainly because many IB shops are > > already just burning (limited) VC cash. > > The main point for price advantage of IB is if the volume goes up. It's > > a very different problem that the multiple-vendors-marketing-stuff. One > > can argue that HPC does not yield such high volumes, only a business > > market like the Databases one does. > > > > Remember Gigabit Ethernet. It was very expensive when the early adopters > > were the HPC crowd and the price didn't drop until it made its way to > > the desktop. It's the case for 10GE today. > > ... > > Patrick Geoffray > > Myricom, Inc. > > Yes I mostly agree with your analysis, database is the only significant > potential market for IB. > > However the problem with 1GBE or 10GBE is that the latency remains poor > for HPC applications, while IB goes in the right direction. 
> The real comparison to be made is not between GE and IB, but between > IB and Myricom products, which belong to an especially protected niche. > As a result for years the Myrinet products did hardly drop in price > for a sub-Moore's-law increase in performance, because of a lack of > competition (the price we paid for our Myricom cards and switch > 18 months ago is today *exactly* the same). I agree with you both. From the viewpoint of HPC clusters the IB competitor is Myrinet (and SCI etc). But there are many applications w/coarse-grained parallelism, where bandwidth is the main thing, not the latency (I think, quantum chemistry applications are bandwidth- limited). In this case (i.e. if latnecy is less important) 10Gb Ethernet is also IB competitor. Moreover, IB, I beleive, will be used for TCP/IP connections also - in opposition to Myrinet etc. (I beleive there is no TCP/IP drivers for Myrinet - am I correct ?) Again, from the veiwpoint of some real appilications, there are some applications which use TCP/IP stack for parallelization (I agree that is bad, but ...) - for example Linda tools (used in Gaussian) work over TCP/IP, Gamess-US DDI "subsystem" works over TCP/IP. In the case of IB or 10Gb Ethernet TCP/IP is possible. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Jul 103 20:27:51 +0400 (MSD) Subject: Linux support for AMD Opteron with Broadcom NICs In-Reply-To: <20030701224808.GA15167@stikine.ucs.sfu.ca> from "Martin Siegert" at Jul 1, 3 03:48:08 pm Message-ID: <200307031627.UAA02885@nocserv.free.net> According to Martin Siegert > > Hello, > I have a dual AMD Opteron for a week or so as a demo and try to install > Linux on it - so far with little success. > First of all: doing a google search for x86-64 Linux turns up a lot of > press releases but not much more, particularly nothing one could download > and install. Even a direct search on the SuSE and Mandrake sites shows > only press releases. Sigh. > Anyway: I found a few ftp sites that supply a Mandrake-9.0 x86_64 version. > Thus I did a ftp installation which after (many) hickups actually worked. > However, that distribution does not support the onboard Broadcom 5704 > NICs. I also could not get the driver from the broadcom web site to work > (insmod fails with "could not find MAC address in NVRAM"). > Thus I tried to compile the 2.4.21 kernel which worked, but > "insmod tg3" freezes the machine instantly. > Thus, so far I am not impressed. > For those of you who have such a box: which distribution are you using? > Any advice on how to get those GigE Broadcom NICs to work? I may only add to the list of AMD64-oriented distributions Turbolinux 8 for AMD64. I'm not sure that "promotional" version of Turbolinux is complete enough, but "commercial" version costs only about $70 (w/o support ;-)). BTW, does somebody try it ? We worked w/SuSE SLES8: it looks today as the only "reliable" choice of 64-bit ditribution :-( Let me congratulate our colleagues in USA w/4th July ! Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 18:28:33 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <200307161516.09818.joachim@ccrl-nece.de> from "Joachim Worringen" at Jul 16, 3 03:16:09 pm Message-ID: <200307161428.SAA28224@nocserv.free.net> According to Joachim Worringen > Franz Marini: > > being in the process of deciding which net infrastructure to use for our > > next cluster (Myrinet, SCI/Dolphin or Quadrics), I was looking at the > > specs for the different types of hw. > > Provided that SCI/Dolphin implements RDMA, I was wondering why so little > > effort seems to be put into implementing a GSM solution for x86 clusters. > > Because MPI is what most people want to achieve code- and > performance-portability. Partially I may agree, partially not: MPI is not the best in the sense of portability (for example, optimization requires knowledge of the interconnect topology, which may vary from cluster to cluster, and of course from MPP to MPP computer). I think that if there is a relatively cheap and effective way to build a ccNUMA system from a cluster, it may have success. > > > The only (maybe big, maybe not) problem I see in the Dolphin hw is the > > lack of support for cache coherency. > > > > I think that having GSM support in (almost) commodity clusters would be > > a really-nice-thing(tm). > > Martin Schulz (formerly TU München, now Cornell Theory Center) has developed > exactly the thing you are looking for. See > http://wwwbode.cs.tum.edu/Par/arch/smile/software/shmem/ . You will also find > his PhD thesis there which describes the complete software. > > I do not know about the exact status of the SW right now (his approach > required some patches to the SCI driver, and it will probably be necessary to > apply them to the current drivers). Very interesting approach, though. > > Other, non SCI approaches like MOSIX and the various DSM/SVM libraries also > offer you some sort of global shared memory - but most do only use TCP/IP for > communication. > Joachim > Joachim Worringen - NEC C&C research lab St.Augustin > fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de > Even a hardware implementation of CPU cache coherence for a large number of processors can become a bottleneck: broadcast-based MOESI generates heavy coherence traffic, which is why ccNUMA systems use a directory-based cache-coherence approach. Software solutions are in general not efficient, and hardware solutions (if they appear) will be expensive :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 22:31:15 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <19coKN-5n4-00@etnus.com> from "James Cownie" at Jul 16, 3 04:36:23 pm Message-ID: <200307161831.WAA02082@nocserv.free.net> According to James Cownie > > > > Because MPI is what most people want to achieve code- and > > > performance-portability.
> > > Partially I may agree, partially - not: MPI is not the best in the > > sense of portability (for example, optimiziation requires knowledge > > of interconnect topology, which may vary from cluster to cluster, > > and of course from MPP to MPP computer). > > MPI has specific support for this in Rolf Hempel's topology code, > which is intended to allow you to have the system help you to choose a > good mapping of your processes onto the processors in the system. Unfortunately I do not know about that codes :-( but for the best optimization I'll re-build the algorithm itself to "fit" for target topology. > > This seems to me to be _more_ than you have in a portable way on the > ccNUMA machines, where you have to worry about > > 1) where every page of data lives, not just how close each process is > to another one (and you have more pages than processes/threads to > worry about !) > > 2) the scheduler choosing to move your processes/threads around the > machine. Yes, but "by default" I beleive that they are the tasks of operating system, or, as maximum, the information I'm supplying to OS, *after* translation and linking of the program. > > > I think that if there is relative cheap and effective way to build > > ccNUMA system from cluster - it may have success. > > Which is, of course, what SCI was _intended_ to be, and we saw how > well that succeeded :-( > > -- Jim > James Cownie > Etnus, LLC. +44 117 9071438 > http://www.etnus.com Mikhail Kuzminsky Zelinsky Institute of Organic Chemsitry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 25 Jul 103 20:55:49 +0400 (MSD) Subject: Infiniband: cost-effective switchless configurations Message-ID: <200307251655.UAA08132@nocserv.free.net> It's possible to build 3-nodes switchless Infiniband-connected cluster w/following topology (I assume one 2-ports Mellanox HCA card per node): node2 -------IB------Central node-----IB-----node1 ! ! ! ! ----------------------IB----------------------- It gives complete nodes connectivity and I assume to have 3 separate subnets w/own subnet manager for each. But I think that in the case if MPI broadcasting must use hardware multicasting, MPI broadcast will not work from nodes 1,2 (is it right ?). OK. But may be it's possible also to build the following topology (I assume 2 x 2-ports Mellanox HCAs per node, and it gives also complete connectivity of nodes) ? : node 2----IB-------- C e n t r a l n o d e -----IB------node1 \ / \ / \ / \ / \ / \ / \--node3 node4-- and I establish also additional IB links (2-1, 2-4, 3-1, 3-4, not presenetd in the "picture") which gives me complete nodes connectivity. Sorry, is it possible (I don't think about changes in device drivers)? If yes, it's good way to build very small and cost effective IB-based switchless clusters ! BTW, if I will use IPoIB service, is it possible to use netperf and/or netpipe tools for measurements of TCP/IP performance ? 
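On the last question: since IPoIB shows up as an ordinary IP interface, netperf and the TCP mode of NetPIPE should run over it unchanged. At the MPI level, a minimal ping-pong (the kind of loop NetPIPE automates) is enough for a first latency/bandwidth check of each link; a sketch, with the message size taken from the command line and an arbitrary repetition count:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, reps = 1000;
    int nbytes = (argc > 1) ? atoi(argv[1]) : 1;
    char *buf;
    double t0, t1, rtt;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    buf = malloc(nbytes > 0 ? nbytes : 1);
    if (!buf) { MPI_Abort(MPI_COMM_WORLD, 1); }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {                 /* ping */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {          /* pong */
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        rtt = (t1 - t0) / reps;          /* average round-trip time */
        printf("%d bytes: %.1f us one-way, %.2f MB/s\n",
               nbytes, rtt / 2.0 * 1e6, 2.0 * nbytes / rtt / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}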
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 20 Aug 103 20:09:20 +0400 (MSD) Subject: SGE on AMD Opteron ? Message-ID: <200308201609.UAA08558@nocserv.free.net> Sorry, is here somebody who works w/Sun GrideEngine on AMD Opteron platform ? I'm interesting in any information - about binary SGE distribution in 32-bit mode, or about compilation from the source for x86-64 mode, under SuSE or RedHat distribution etc. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Aug 103 22:15:01 +0400 (MSD) Subject: PCI-X/133 NICs on PCI-X/100 Message-ID: <200308221815.WAA27091@nocserv.free.net> I'm interesting in any experience about work of PCI-X/133 NICs with PCI-X/100 slot. Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards work w/PCI-X/100 slots on Opteron-based mobos (most of them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro mobos) - i.e. how high is the probability that they are incompatible ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemnistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 14:49:07 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <200310201236.28901.kinghorn@pqs-chem.com> from "Donald B. Kinghorn" at Oct 20, 3 12:36:28 pm Message-ID: <200310211049.OAA18031@nocserv.free.net> According to Donald B. Kinghorn > > Does anyone know of any recent progress on parallel eigensolvers suitable for > beowulf clusters running over gigabit ethernet? > It would be nice to have something that scaled moderately well and at least > gave reasonable approximations to some subset of eigenvalues and vectors for > large (10,000x10,000) symmetric systems. > My interests are primarily for quantum chemistry. > In the case you think about semiempirical fockian diagonalisation, there is a set of alternative methods for direct construction of density matrix avoiding preliminary finding of eigenvectors. This methods are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. For non-empirical quantum chemistry diagonalisation usually doesn't limit common performance. In the case of methods like CI it's necessary to find only some eigenvectors, and it is better to use special diagonalization methods. There is special parallel solver package, but I don't have exact reference w/me :-( Mikhail Kuzminsky Zelinsky Inst. 
of Orgamic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 22:10:23 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <20031021150637.GA8076@plk.af.mil> from "Arthur H. Edwards" at Oct 21, 3 09:06:37 am Message-ID: <200310211810.WAA08779@nocserv.free.net> According to Arthur H. Edwards > > I should point out that density function theorcan be compute-bound on > diagonalization. QUEST, a Sandia Code, easily handles several hundred > atoms, but the eigen solve dominates by ~300-400 atoms. Thus, > intermediate size diagonalization is of strong interest. > > Art Edwards > Yes, I agree w/you about DFT. Yours Mikhail Kuzminsky _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 30 Dec 103 18:23:32 +0300 (MSK) Subject: [Beowulf] X-window, MPICH, MPE, Cluster performance test In-Reply-To: from "D. Scott" at Dec 29, 3 11:27:21 am Message-ID: <200312301523.SAA06085@nocserv.free.net> According to D. Scott > > At last! My cluster is now online. I would like to thank everyone for they > help. I thinking of putting a website together covering my experience in > putting this cluster together. Will this be of use to anyone? Is they > website that covers top 100 list of small cluster?. > Now it is online I would like to test it. > > MPICH comes with test program, eg mpptest. Programs works and it produce > nice graph. Is they any documentation/tutorial that explains meaning of > these graphs? > MPICH also comes with MPE graphic test programs, mandel. Problem is that I > have only got X-window installed on the master node. But, when I run > pmandel, it returms an error, staying that it can not find shared library > for X-window on other nodes. How can I make X-window shared across other > nodes from the Master node? You may use NFS for access to master node. > Same me install GUI programs on other nodes. > This could be related problem, but when I complied life (that uses MPE > libraries) it returns error that MPE libraries are undefined. Any ideas? > Can I install both LAM/MPICH and MPICH-1.2.5 on the same machine? Yes, of course you may work w/both LAM and MPICH. BTW, let me congratulate Beowulf maillist subscribers w/New Year ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 23 Jan 104 15:35:32 +0300 (MSK) Subject: [Beowulf] cluster on suse In-Reply-To: from "Anand TNC" at Jan 23, 4 10:40:43 am Message-ID: <200401231235.PAA05593@nocserv.free.net> According to Anand TNC > > Hi, > > I'm new to clustering...does anyone know of some clustering software which > works on Suse 8.2 or Suse 9.0? All of the usual cluster software will work succesfully w/SuSE Linux. 
If you say about software *included* in distribution as RPM-packages, then also yes, SuSE Linux has most important things such as MPI for example. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Thanks > > regards, > > Anand > > -- > Anand TNC > PhD Student, > Engine Research Laboratory U-55 IISc Hostels, > Dept. of Mechanical Engg., Indian Institute of Science, > Indian Institute of Science, Bangalore 560 012. > Bangalore 560 012. Ph: 080 293 2591 > Lab Ph: 293 2352 080 293 2624 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 10 Feb 104 21:27:22 +0300 (MSK) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> from "=?big5?q?Andrew=20Wang?=" at Feb 10, 4 11:42:32 am Message-ID: <200402101827.VAA05978@nocserv.free.net> According to =?big5?q?Andrew=20Wang?= > From comp.arch: "One of the things that the version > 8.0 of the Intel compiler included was an > "Intel-specific" flag." > > But looks like the purpose is to slow down AMD: > http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com > > If intel releases 64-bit x86 CPUs and compilers, then > AMD may get even better benchmarks results. The danger of this "slow-down" is not too extremally large now: SPECcpu2000 results (perhaps the best obtained) published for "high-end" Opterons are based on Portland compiler, not on ifc. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 14 May 104 22:27:21 +0400 (MSD) Subject: [Beowulf] Athlon64 / Opteron test In-Reply-To: <40A4E4D8.9010001@mscsoftware.com> from "Joe Griffin" at May 14, 4 08:25:12 am Message-ID: <200405141827.WAA12362@nocserv.free.net> According to Joe Griffin > > ... > Below is a web site comparing IA32, IA64 (linux and HPUX), Opteron > and an IBM P655 running AIX. The site should only be used to > compare hardare platforms when running our software. I am sure > that Fluent, LSTC/Dyna, Star-CD have similar sites. I recomend > finding out about the software that you will be using. > > MSC.Nastran Hardware comparison: > > http://www.mscsoftware.com/support/prod_support/nastran/performance/v04_sngl.cfm > > Regards, > Joe Griffin > This page contains very interesting tables w/description of hardware used, but at first look I found only the data about OSes, not about compilers/run time libraries used. The (relative bad) data for IBM e325/Opteron 2 Ghz looks "nontrivial"; I beleive some interptretation of "why?" will be helpful. 
May be some applications used are relative cache-friendly and have working set placing in large Itanium 2 cache? May be it depends from compiler and Math library used ? BTW, for LGQDF test: I/O is relative small (compare pls elapsed and CPU times which are very close); but Windows time for Dell P4/3.2 Ghz (4480 sec) is much more worse than for Linux on the same hardware (3713 sec). IMHO, in this case they must be very close in the case of using same comlilers&libraries (I don't like Windows, but this result is too bad for this OS :-)) Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 10 Jun 104 19:11:31 +0400 (MSD) Subject: [Beowulf] Setting memory limits on a compute node In-Reply-To: from "Brent M. Clements" at Jun 8, 4 10:42:43 am Message-ID: <200406101511.TAA17314@nocserv.free.net> According to Brent M. Clements > > We have a user who submits a job to a compute node. > > The application is gaussian. The parent gaussian process can spawn a few > child processes. It appears that the gaussian application is exhausting > all of the memory in the system essentially stopping the machine from > working. You can still ping the machine but can't ssh. Anyway's I know the > fundementals of why this is happening. My question, is there any way to > limit a user's total addressable space that his processes can use so that > it doesn't kill the node? This situation may depends strongly from real method of calculation used in frames of Gaussian (and may be from objects of calculations, i.e. molecules). We work w/G98 (I beleive G03 will have the same behaviour) jobs and didn't have like problems. You may try to restrict (if it's really necessary) the memory used for particular Gaussian job by means of setting up of %mem value in the input Gaussian data; there is also default settings for %mem value in gaussian configuration file. G98 can't exceed %mem value. We inform our G98 users about upper limit of %mem value which don't leads to high paging. You may also try to setup ulimit/limit values for stack and data in the shell script used for G98 job submitting . Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jun 104 20:05:24 +0400 (MSD) Subject: [Beowulf] CCL:Experiences with 64 bits AMD processors (fwd from In-Reply-To: <20040616042135.GH12847@leitl.org> from "Eugen Leitl" at Jun 16, 4 06:21:35 am Message-ID: <200406161605.UAA24654@nocserv.free.net> According to Eugen Leitl > > > From: Marc Noguera Julian > Date: Tue, 10 Jun 2003 19:09:00 +0200 > To: chemistry at ccl.net > Subject: CCL:Experiences with 64 bits AMD processors > User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 > > Hello, > we are interested in buying some more computational resources. In our > group we are interested in 64 bit AMD processors, but we do not know > about their compatibility. 
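One application-independent way to do what is asked here is to cap each process's address space with setrlimit(RLIMIT_AS), which is the same limit "ulimit -v" sets in a submit script and is inherited by spawned child processes, so a runaway job gets allocation failures instead of swapping the node to death. A hedged sketch of a tiny wrapper (the 2 GB figure is only an example; the Gaussian-specific %mem approach is discussed in the reply that follows):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical wrapper: cap the address space, then exec the real job.
   Children spawned by the job inherit the limit. */
int main(int argc, char **argv)
{
    struct rlimit rl;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    rl.rlim_cur = rl.rlim_max = (rlim_t)2048 * 1024 * 1024;  /* example: 2 GB */
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    execvp(argv[1], &argv[1]);
    perror("execvp");   /* only reached if exec fails */
    return 1;
}

Note that this caps each process separately, not the sum over all of a user's processes on the node.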
They are supposed, as AMD says, to be32 bit > compatible and therefore AMD 64 bit processor should be able to run any > 32 bit application. Is that true? Any experience about this will help us > a lot. We run, in particular, Gaussian-98 (32 bit binary version) on Opteron servers w/SuSE SLES8. > By the way, we are running mainly gaussian jobs, and have some other 32 > bit binaries like turbomole and jaguar. We have source code license for > gaussian 03. Has anyone tried to compile Gaussian 03 for a AMD 64 bit > machine? Do 32 bit pentium binaries run correctly on a 64 bit processor > which is the increase on the performance? Yes, G03 is compiled at least by Gaussian, Inc itself: there is G03 64-bit binary version for Opteron in the price list. We have significant speed-up on Opteron in comparison w/Athlons. We run also 32-bit binaries codes translated for Pentium on Opteron. > Do Turbomole and Jaguar > binaries run on 64 bit AMD processors? anyone tried? > Any information will be helpful. > Thanks a lot > Marc > > --------------------------- > Marc Noguera Julian > Thcnic Especialista de Suport a la Recerca > Qummica Fisica, Universitat Autrnoma de Barcelona. > Tlf: 00-34-935812173 > Fax: 00-34-935812920 > e-mail: marc at klingon.uab.es > --------------------------------------- > > Eugen* Leitl leitl > ______________________________________________________________ > ICBM: 48.07078, 11.61144 http://www.leitl.org > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > http://moleculardevices.org http://nanomachines.net > Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Jun 104 20:15:23 +0400 (MSD) Subject: [Beowulf] cluster on Mellanox Infiniband Message-ID: <200406181615.UAA19878@nocserv.free.net> We are purchasing a pair of Mellanox Infiniband 4x HCA cards (PCI-X/133) for building of small 2-nodes 4-processor switchless testing cluster on the base of AMD Opteron w/Tyan S2880 boards. The nodes work under SuSE Linux 9.0 for AMD64. I'll be very appreciate in receiving any information about following: 1) Do we need to buy some additional software from Mellanox ? (like THCA-3 or HPC Gold CD Distrib etc) 2) Any information about potential problems of building and using of this hard/software. To be more exactly, we want to install also MVAPICH (for MPI-1) or new VMI 2.0 from NCSA for work w/MPI. For example, VMI 2.0, I beleive, requires THCA-3 and HPC Gold CD for installation. But I don't know, will we receive this software w/Mellanox cards or we should buy this software additionally ? I need this data badly, because we are very restricted in money ;-) ! Thanks for your help ! 
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 21 Jun 104 17:46:23 +0400 (MSD) Subject: [Beowulf] cluster on Mellanox Infiniband In-Reply-To: from "Franz Marini" at Jun 21, 4 10:24:58 am Message-ID: <200406211346.RAA17895@nocserv.free.net> According to Franz Marini > Hi, > > On Fri, 18 Jun 104, Mikhail Kuzminsky wrote: > > > 1) Do we need to buy some additional software from Mellanox ? > > (like THCA-3 or HPC Gold CD Distrib etc) > > You shouldn't have to. Thank you VERY much for your fast reply !! I'm glad to hear ... > > 2) Any information about potential problems of building and using > > of this hard/software. > > > To be more exactly, we want to install also MVAPICH (for MPI-1) or > > new VMI 2.0 from NCSA for work w/MPI. > > For example, VMI 2.0, I beleive, requires THCA-3 and HPC Gold CD for > > installation. But I don't know, will we receive this software w/Mellanox > > cards or we should buy this software additionally ? > > Hrm, no, VMI 2.0 doesn't require neither THCA-3 nor HPC Gold CD (whatever > it is ;)). The NCSA site for VMI says "Infiniband device is linked against THCA-3. OpenIB device is linked using HPC Gold CD distrib". What does it means ? I must install VMI for Opteron + SuSE 9.0, there is no such binary RPM, i.e. I must install VMI from the source. I thought that I must use software cited above for building of my bibary VMI version. I beleive that Software/Driver THCA Linux 3.1.1 will be delivered w/Mellanox cards. OpenSM 0.3.1 - I hope, also. But I don'n know nothing about "HPC Gold CD distrib" :-( > > We have a small (6 dual Xeon nodes, plus server) testbed cluster with > Mellanox Infiniband (switched, obviously). > > So far, it's been really good. We tested the net performance with SKaMPI4 > ( http://liinwww.ira.uka.de/~skampi/ ), the results should be in the > online db soon, if you want to check them out. > > Seeing that you are at the Institute of Organic Chemistry, I guess you're > interested in running programs like Gromacs or CPMD. So far both of them > worked great with our cluster, as far as only one cpu per node is used > (running two different runs of gromacs and/or CPMD on both cpus on each > node gives good results, but running only one instance of either program > on both cpus on each node results in very poor scaling). It looks that it gives conflicts on bus to shared memory ? Thanks for help Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Have a good day, > > Franz > > > --------------------------------------------------------- > Franz Marini > Sys Admin and Software Analyst, > Dept. of Physics, University of Milan, Italy. > email : franz.marini at mi.infn.it > --------------------------------------------------------- > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at eadline.org Mon Jul 16 15:48:53 2012 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 16 Jul 2012 15:48:53 -0400 Subject: [Beowulf] A few Cluster Monkey things ... 
Message-ID: Happy summer everyone, I have had a poll up for while now on Cluster Monkey asking about social media and HPC. If the interest in this poll is any indication, I think I can guess the final results, but if you have a minute, head on over and take the poll: http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html As always our polls and results are on the site for your viewing. BTW, I think it might be worth while to re-ask some of the older poll questions. http://www.clustermonkey.net/Cluster/HPC-Polls-and-Surveys/ Also, if you have a burning question, let me know I'll put it up as a poll. Finally, while you are there check out the HPC500 program that Intersect360 has launched. Seems interesting and great way to help influence the industry. http://clustermonkey.net/Select-News/are-you-leading-the-hpc-charge.html Thanks! Doug Eadline -- Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dnlombar at ichips.intel.com Mon Jul 16 16:20:28 2012 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Mon, 16 Jul 2012 13:20:28 -0700 Subject: [Beowulf] A few Cluster Monkey things ... In-Reply-To: References: Message-ID: <20120716202028.GA29118@nlxcldnl2.cl.intel.com> On Mon, Jul 16, 2012 at 03:48:53PM -0400, Douglas Eadline wrote: > > Happy summer everyone, > > I have had a poll up for while now on Cluster Monkey asking about social > media and HPC. If the interest in this poll is any indication, I think I > can guess the final results, but if you have a minute, head on over and > take the poll: > > http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html Hmmm. This doesn't distinguish usages. It would be nice to see how people view social media as a professional tool. Something like "What kind of social media do you turn to for technical information?" The choices you have for your question fit this, too :) -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From newton at netific.com.netific.com Mon Jul 23 12:44:16 2012 From: newton at netific.com.netific.com (Wing Newton) Date: Mon, 11 Sep 100 08:02:09 -0700 (PDT) Subject: multi-ethernets LAN Message-ID: <200009111502.IAA30083@ws132.netific.com> Greetings, I am looking a Linux driver for combining multiple ethernet segments into 1 LAN using several ethernet cards to scale the LAN bandwidth from 10/100 to x*10/100. Thank you for your help.
Newton _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 25 Sep 100 23:04:28 +0400 (MSD) Subject: kickstart RH6.2 installation problems Message-ID: <200009251904.XAA01603@nocserv.free.net> Dear netters, I'm installing RH6.2 on the nodes of new Beowulf cluster. I want to do kickstart RH 6.2 installation on HDDs, having the necessary partitions (but they are w/o ext2fs created). But after start of installation I receive following traceback messages: File /usr/bin/anaconda, line 341 in ? extraModules=extraModules) File /usr/lib/anaconda/todo.py, line 332 in __init__ self.setClass(instClass) File /usr/lib/anaconda/todo.py, line 822, in setClass todo.addmount(dev,mntpoint,fstype,reformat) File /usr/lib/anaconda/todo.py, line 395, in addMount and install exited abnormally. It looks that the problems are w/partitions. I'll be very appreciate in ideas what is reason of errors. ks.cfg file contents is : lang en_US network --bootproto static --ip 192.168.0.10 --netmask 255.255.255.0 --gateway 192.168.0.4 ### Source File Location cdrom keyboard us ### Partitioning Information #zerombr yes zerombr no #clearpart --linux part / --size 141 --onpart hda2 ^^^^^^^^^^ (the result dosn't depend from the presence of size keywords) part swap --size 133 --onpart hda3 part /usr --size 3004 --onpart hda5 install ### Mouse Configuration mouse genericps/2 --emulthree ### Time Zone Configuration timezone --utc US/Eastern ### X Configuration xconfig --vsync 60 ### Root Password Designation rootpw paSSword ### Authorization Configuration auth --useshadow --enablemd5 ### Lilo Configuration lilo --linear --location mbr ### Package Designation %packages @ Base chkfontpath groff-perl ... ### Commands To Be Run Post-Installation %post echo "This is in the chroot" > /tmp/message Thanks for your help ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 5 Nov 101 23:11:12 +0300 (MSK) Subject: Athlon MP vs Athlon XP Message-ID: <200111052011.XAA04207@nocserv.free.net> Dear colleagues, I think about buying of Tyan S2460 motherboards for Beowulf. According the data I have, Athlon XP (Palomino core) microprocessors can work successfully w/this mobos. But there is also Athlon MP microprocessors w/same Palomino core w/same OPGA package w/same voltages and w/same frequencies beginning from 1333 (1500+). They costs, as I understand, higher than corresponding MP models. Sorry, what is the difference between MP and XP chips ? Both, if my source was correct, supports cache coherence. 
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 6 Nov 101 19:59:45 +0300 (MSK) Subject: Athlon MP vs Athlon XP In-Reply-To: <3C4B2812.D133809D@lnxi.com> from "Patrice Duffort" at Jan 20, 2 01:26:58 pm Message-ID: <200111061659.TAA10767@nocserv.free.net> According to Patrice Duffort > > Dear Mikhail, > > The XPs and MPs have the same core but the XP is essentially a crippled MP. Lesser overall performance. > I'm sorry, do you have some test results of performance or some data about microarchitecture difference ? It's not too obviously how to prepare chip w/decreased MP performance, but working correctly in SMP environment. For example, I should suppress something like split transactions handling etc. It looks too expensive to prepare special chip with a bit different microarchitecture. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 3 Dec 101 20:02:08 +0300 (MSK) Subject: GCC/Fortran 90/95 questions In-Reply-To: <20011201035522.85826.qmail@web14706.mail.yahoo.com> from "Ron Chen" at Nov 30, 1 07:55:22 pm Message-ID: <200112031702.UAA12622@nocserv.free.net> According to Ron Chen > There is a compiler called open64, which is SGI's > compiler for IA64. They have a C front-end, which is > based on gcc, and they have another for f90. (I don't > know the details...) > Recently, they have ported the f90 front-end and > run-time to other compiler back-ends. Please read the > note below for details. > http://open64.sourceforge.net/ > http://sourceforge.net/tracker/?group_id=34861&atid=413342 > > ... > > =========================================================== > Porting open64 F90 front-end to Solaris > This patch ports the open64 Fortran90 compiler front > end to sparc_solaris platform. Specifically, it ports > these three executable programs: "mfef90", "ir_tools", > and "whirl2f". ANY OTHER COMPONENT OF OPEN64 IS NOT IN > THE SCOPE OF THIS PATCH. > Tested platforms include sparc_solaris, mips_irix and > ia32_linux, using both GNU gcc and vendor compiler. > Makefiles, some header files and some c/c++ source > files were modified for porting. It's very interesting information. As I know, SGI discontinuued the development and support of SGI Pro64 developmnet tools. Sorry, where you found the data about IA-32 /Linux platform support by open64? 
At the first look I don't see them on references you sent :-( Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 10 Dec 101 21:15:25 +0300 (MSK) Subject: NetGear FA-31x In-Reply-To: from "Donald Becker" at Dec 10, 1 12:41:03 pm Message-ID: <200112101815.VAA06836@nocserv.free.net> According to Donald Becker > On Mon, 10 Dec 2001, Javier Iglesias wrote: > > There was a post some time ago that mentioned problems > > using NetGear FA311 NICs > > (-> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=5) > > with some "specific AMD chipsets" > > (-> http://www.beowulf.org/pipermail/beowulf/2001-October/001668.html) > That wasn't a specific report. It was pretty much "something doesn't > work". > > > We are experiencing some problems getting the highly > > recommended FA310 cards > > (-> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4) > > The FA310 (Lite-On PNIC-2 or ADM Comet chip) is completely unrelated to > the FA311 (NatSemi DP83815 chip). Any problem common between the two is > likely from the motherboard, not the NIC. I heard about some problems w/south bridge on Tyan Tiger MP, but (if I'm correct) somebody (sorry, I don't remember) wrote in our mailing list that some GigE NICs, in particular from Intel, works successfully w/Tiger MP. We have both (Tiger MP and Intel Pro 1000 T), but didn't check them. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 11 Jan 102 20:00:50 +0300 (MSK) Subject: Fastest Intel Processors In-Reply-To: from "Joel Jaeggli" at Jan 11, 2 08:38:35 am Message-ID: <200201111700.UAA11054@nocserv.free.net> According to Joel Jaeggli > > There are 2.2ghz p4's, these are based on the .13 micron northwood core > rather than the willamete. to date I haven't heard of anyone having issues > with these... drop one on your socket 478 mainbaord and go to town... ;) > As I understand, It'll be not right for all the motherboards. Northwood will have different available voltage values for different I (ampers), so you need really special VRM version which may be not present on your motherboard. At least for Tualatin core it's just as I said. Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 7 Feb 102 22:50:17 +0300 (MSK) Subject: PCI-64: how to find Message-ID: <200202071950.WAA27670@nocserv.free.net> Dear netters. I want to find some confirmation that my installed RH 7.2 "understand" that it works with PCI-64. (We are using some dual mobos from Tyan which supports 64-bit PCI slots). We have Intel Pro1000T NICs installed in PCI-64 slots, but I didn't find any information that Linux works in "PCI-64" mode. 
May be this information must be presented somewhere (/proc/pci, /var/log/messages etc ) ? Thanks for help ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 15 Feb 102 22:58:29 +0300 (MSK) Subject: Intel Pro/1000T frames dropping&overruns Message-ID: <200202151958.WAA27863@nocserv.free.net> Dear netters, we are testing connection of 2 dual nodes w/Pentium III Tualatin 1266 Mhz CPUs (Tyan Thunder S2518 mobos). Both nodes uses Intel Pro/1000T NIC installed on PCI-64/66 Mhz slots. (They are "old" cards: we buy them in begin of 2001; if I'm correct, Intel produce now new modification). RH 7.2 (kernel 2.4.7-10enterprise, driver e1000 from RH distribution) is installed on both nodes. We found that ping -s 2048 or TCP_STREAM tests of netperf-2.2alpha leads to "hang up" of connection. Ifconfig says that there is Rx errors: for something about 300 packets received we have about 30 dropped and overrruns. Setting options e1000 RxIntDelay=nn (nn was decreased from 64 to 48, 32, 8, 1,0) didn't help. Setting of Jumbo=0 don't helps also. If we transmit small packages (usual ping w/o -s), there is no problems. Both NICs worked successfully on Athlon/700 Mhz 1-CPU nodes (but w/PCI-32 and 33 Mhz). I beleive that PIII/1266 CPU performance is enough for GigE; but I don't know what may be the other source of packets droppings. Should I try new e1000 version from Intel ? I'll be very appreciate in any ideas how to improve the situation. Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 15 Mar 102 22:03:14 +0300 (MSK) Subject: gige benchmark performance In-Reply-To: from "Mark Hartner" at Mar 14, 2 08:40:57 pm Message-ID: <200203151903.WAA02340@nocserv.free.net> According to Mark Hartner > > For the Intel Pro/1000T and Netgear GA620 there was only a slight > performance difference between 32 and 64 bit PCI (the 64bit slot did > slightly better). We found very big difference between 32-bit PCI on Athlon/700 Mhz nodes (Gigabyte GA-7VX mobos) and 64-bit/33 Mhz PCI on Tyan S2460 for Intel Pro/1000T cards. 32-bit PCI gives for netperf TCP_STREAM tests only about 300 Mbit/s, but on S2460 we received excellent results - about 910 Mbit/s for TCP_STREAM. The tests looks as not CPU bound. Theoretically it may be also due to difference in software : for Athlon/700 we tested RH 6.2 (2.2.14-5 kernel) and RH 7.1 (if I remember correctly, kernel 2.4 was standard in distribution) w/3.0.10 version of e1000. On S2460 we worked w/RH 7.2 (2.4.7-10) and e1000-drivers 3.1.22 and 4.0.7 (last was found as much more stable, but it's other talk...). But I beleive that the difference is too high and the reason of difference is hardware (if there is no problems w/south bridge on GA-7VX, it must be the difference in PCI buses). 
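For reference, TCP_STREAM figures like these come from netperf invocations of roughly the following form; a minimal sketch, where the peer host name, socket buffer sizes and send size are only illustrative:

# on the receiving node
netserver

# on the sending node: 60-second TCP_STREAM run against node2
netperf -H node2 -t TCP_STREAM -l 60 -- -s 262144 -S 262144 -m 16384

The test-specific options after "--" (local/remote socket buffers, message size) matter a great deal at gigabit speeds, so they should be held identical when comparing the 32-bit and 64-bit PCI configurations.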
Mikhail Kuzminsky, Zelinsky Institute of Organic Chemistry, Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 26 Mar 102 22:59:52 +0300 (MSK) Subject: LFS and Fortran in Scyld Beowulf In-Reply-To: <3CA0875E.3030604@mscsoftware.com> from "Joe Griffin" at Mar 26, 2 06:36:14 am Message-ID: <200203261959.WAA08503@nocserv.free.net> According to Joe Griffin > > g77 may be recompiled for large files. > We use: -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > This allows: > Formatted/Sequential > 4 Gb > Unformatted/Sequential < 4 Gb > Unformatted/Direct < 4 Gb > Not quite limitless, but is gains from > the 2 Gb limit. I think that like restrictions are very importatn for many "beowulfers". By my opinion, large unformatted files are more important than formatted, and 4 Gbytes restriction is inappropriate. Of course, below I say about 32-bit CPUs( IA-32). There is 2 different ways to supprot large files: a) use special subset of system calls to open/read/write which allows to work with large files. This leads potentially to possible changes in the source. The pluses of this way is that they may be realized for more old versions of kernel. b) Use modern features of kernels 2.4.x and ext3fs Unfortunately I'm not familiar w/restrictions in file sizes at usual system calls in this environment. But theoreticaly it's clear that I want to have large files w/usual open/read/write. Sorry, if you say about restrictions in sizes (here and below) - what do you mean - the ways w/changing of compiler source (and run-time library sources) for more old kernels, or it's necessary also for 2.4.x&ext3fs ? > The CURRENT Intel compiler - 5.0.1 has a > 2 Gb limit. The next release to come out > at the end of this month (6.0) will have > the limit removed. > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 27 Mar 102 20:50:29 +0300 (MSK) Subject: LFS and Fortran Message-ID: <200203271750.UAA16216@nocserv.free.net> > On Tue, 26 Mar 102, Mikhail Kuzminsky wrote: > > According to Joe Griffin > > > g77 may be recompiled for large files. > > > We use: -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > .. > > > Unformatted/Sequential < 4 Gb > > I think that like restrictions are very importatn for many > > "beowulfers". By my opinion, large unformatted files are more > > important than formatted, and 4 Gbytes restriction is inappropriate. > > LFS support is much more important for Beowulf systems than the average > workstation user. > > > a) use special subset of system calls to open/read/write > .. > > b) Use modern features of kernels 2.4.x and ext3fs > > The LFS kernel support has some limitations, many of which remain true > for 2.4 kernels. First, the offset is really only 40/41/42 bits, not 64 > bits, because we still use 32 bit block offsets with 512/1K blocks. > Very few places have a single file larger than 4TB, so this isn't a > current problem. 
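To make the -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE approach concrete, here is a minimal C sketch (file name and offset are only illustrative): with those defines glibc makes off_t 64-bit and transparently maps the ordinary open/lseek/write calls onto their 64-bit variants, so the same source can cross the 2 GB line without calling open64() etc. explicitly.

/* minimal LFS sketch; compile with:
 *   gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE lfs_test.c -o lfs_test
 * needs a reasonably recent glibc and a kernel/filesystem with LFS support */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big.dat", O_WRONLY | O_CREAT, 0644);
    off_t where = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GB, past the 2 GB limit */

    if (fd < 0) { perror("open"); return 1; }
    if (lseek(fd, where, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }
    write(fd, "x", 1);   /* creates a sparse file with a byte beyond 2 GB */
    close(fd);
    return 0;
}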
> In many cases 40-42 bits is enough (at least it gives more disc space than it exist on the node - if I use some modern HDDs per node). My questions concern more to Linux itself than to Scyld distribution. There is 3 possible "levels" of large files support for application programs written on F77 a) I use statically linked binaries created under more old Linux versions. I suppose that this binaries use standard run-time libraries w/o special 64-bit open/read/write calls. Is it possible to work w/large files if I'll run this binaries on 2.4.x w/ext3 or ext2 ? b) I use the same binaries but w/dynamic linking, and may change g77/pgf77/ifc run-time library to more new version. Is it enough for work w/large files under 2.4.x w/ext3 or ext2 ? c) It's possible to re-translate f77 source with some modern version of compiler to receive large files support. It's clear now that g77 3.1 (or specially precompiled but more old g77 version) and ifc 6.0 will be enough ; but was is about pgf77 versions ? Yours Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 27 Mar 102 21:26:55 +0300 (MSK) Subject: e1000-4.1.7 (new driver version) Message-ID: <200203271826.VAA16681@nocserv.free.net> I've received today the information about availability of new version of e1000 driver for Intel Pro/1000T NICs. Taking into account that upgrade from more old to previous version of e1000 was very important for our cluster based on Athlon Tyan S2460 nodes (w/o this upgrade netperf tests simple hang-up connection after a series of packets droppings and overruns; but the problems remain on dual PIII Tualatin Tyan mobos), this information may be interesting also for subscribers of Beowulf mail list. According to issuppor at mailbox.cps.intel.com > From issuppor at mailbox.cps.intel.com Wed Mar 27 07:12:41 2002 > Date: Wed, 27 Mar 2002 04:12:37 GMT > Subject: RE: Re: Re: Re: Re: e1000-4.0.7 installation > From: issuppor at mailbox.cps.intel.com > Reply-To: "joseph m" > To: kus at free.net > > We just released version 4.1.7 of the driver. There have been a few performance cleanups that should speed up the driver. > > http://support.intel.com/support/go/linux/e1000.htm > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 3 Apr 102 16:06:17 +0400 (MSD) Subject: Hyperthreading in P4 Xeon Message-ID: <200204031206.QAA10335@nocserv.free.net> According to William Park > From beowulf-admin at beowulf.org Wed Apr 3 11:52:00 2002 > From: William Park > To: beowulf at beowulf.org > Subject: Hyperthreading in P4 Xeon (question) > > What is the realistic effect of "hyperthreading" in P4 Xeon? I didn't see any data about applications which are typical for clusters. But there is some other results on Intel Web-site. The success will depend from application strongly. 
For example, if you have an application that needs the full cache size for its working set of pages, the performance of such an application will degrade, because when 2 processes run simultaneously the cache is shared between them. > I'm not > versed in the latest CPU trends. Does it mean that dual-P4Xeon will > behave like 4-way SMP? Yes, every physical CPU appears as 2 logical CPUs, and you may use OpenMP etc. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 16 Apr 102 23:02:59 +0400 (MSD) Subject: again OpenPBS vs SGE Message-ID: <200204161902.XAA04166@nocserv.free.net> I'm in the process of choosing a *free* batch queue system for new Linux cluster(s). We are using GNQS on many SMP systems and we are happy with it, but GNQS isn't developed any more. The real competition is, IMHO, between OpenPBS and Codine/SGE (which was much praised earlier on our mailing list, in particular by Chris Black). Some comparisons are presented by Omar Hassaine from Sun (www.sun.com/products-n-solutions/edu/hpc/presentations/june01/ omar_hassaine.pdf). IMHO, some of these estimations are inconsistent w/some of Chris Black's statements. So I'll try below to formulate briefly a few advantages and disadvantages of OpenPBS and SGE that look important to me. I'd very much appreciate any remarks, opinions etc. (especially where I'm wrong).
I. Some PBS minuses
1) The main one is unstable operation of the daemons
2) PBS doesn't support user checkpoint migration. For example, I run a Gaussian98 job (which creates its own checkpoint file) on one node, and there is then a subsequent G98 job which may run on another (free) node, but that other node doesn't have the necessary G98 checkpoint file
3) Absence of an interface w/Globus Grid - if it's OpenPBS (not PBSPro)
II. Some PBS pluses
- it looks like the most popular choice for Linux clusters
- it's possible to receive a job on one node and send it to run on a node of an *other cluster*
III. Some SGE minuses
1) Does not support "multiclustering"
2) The scheduling algorithms are restricted to the one default (this is inconsistent w/Chris Black's message, as I understand)
IV. Some SGE pluses
1) Reliable operation
2) Globus Grid is integrated (?? is it correct ?)
3) There is support for job migration
I don't worry about the absence of SGE source today (I believe it'll be available in the near future). Thanks for the future help, Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 17 Apr 102 12:16:19 +0400 (MSD) Subject: again OpenPBS vs SGE Message-ID: <200204170816.MAA08490@nocserv.free.net> According to Rayson Ho > From raysonlogin at yahoo.com Tue Apr 16 23:39:42 2002 > Date: Tue, 16 Apr 2002 12:39:39 -0700 (PDT) > From: Rayson Ho > Subject: Re: again OpenPBS vs SGE > To: Mikhail Kuzminsky , beowulf at beowulf.org > ... > > > 2) The schedule algorithms are restricted to only one > > default (this is inconsistent w/Chris Black message, as > > I understand) > > You talking about SGE 5.2.x?
Yes, I wrote about 5.2.3.1 which is last "production" version currently available. > Chris Black must be talking about SGE 5.3, which has several advanced > nice scheduler features: > > http://www.hardi.se/products/literature/sun_grid_engine.pdf > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 25 Apr 102 20:20:14 +0400 (MSD) Subject: Tyan Tiger 2460 (Re) Message-ID: <200204251620.UAA24432@nocserv.free.net> According to Robert G. Brown > From beowulf-admin at beowulf.org Thu Apr 25 11:50:34 2002 > From: "Robert G. Brown" > To: Beowulf Mailing List > Subject: Tyan Tiger 2460 > > We've had problems (as have others on this list) getting our 2U > rackmount Tyan Tiger 2460 motherboards to boot/install/run reliably and > stably. > > ... to conclude that this is a > reproducible BUG in the 2460 Tiger motherboard, either in the BIOS or > (worse) in the physical wiring of slot 1... > BTW, so far the 2466 runs fine, as noted by many listvolken. > It's not only problem w/Tyan dual motherboards. The problem exist also w/correct work of Hardware Monitor chips (for work of lm_sensors it's necessary to do (at the boot) some trick w/BIOS), for both 2460 and 2466. Moreover, for Thunder w/Tualatin chips lm_sensors can't work. May be Supermicro boards are more stable ... Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 7 May 102 22:23:39 +0400 (MSD) Subject: opinion on XFS (Re:) Message-ID: <200205071823.WAA15765@nocserv.free.net> According to Yudong Tian > From beowulf-admin at beowulf.org Tue May 7 19:43:09 2002 > From: "Yudong Tian" > To: "Beowulf \(E-mail\)" > Subject: opinion on XFS > Date: Tue, 7 May 2002 11:28:23 -0400 > > > Hello, > Has anyone tested the water of using SGI's XFS on a Linux cluster Can > you kindly share any experience and insights? Unfortunately we don't use xfs on Linux nodes because it's not standard kernel feature, but we has big experience w/xfs under some generations of SGI Irix on different hardware from workstations to SMP/ccNUMA servers, and xfs looks as very reliable and appropriate for intensive I/O. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 18 May 102 20:22:21 +0400 (MSD) Subject: real temperature on Tyan S2460 Message-ID: <200205181622.UAA03151@nocserv.free.net> I'm working w/cluster based on Tyan S2460 mobos w/AMD Athlon XP1800+ (we have old BIOS which allows to use Athlon XP instead of MP). There is well known problem w/lm_sensors on this motherboards and well known trick w/BIOS and lm_sensors setting. After that measures lm_sensors works successfully on our nodes. The question is about real temperature values. 
BIOS shows us too high values (about 76 C) which formally corresponds to W83627 data obtained from lm_sensors package (simultaneously W83782d chip shows something about 39-43 C). But after running sensors -s the lm_sensors data by W83627 (i.e. output of "sensors" command) is decreasing to "good" 40-45 C because of setting the kind of sensors to 3904 transistors (in the sensors.conf file). I asked Tyan staff about real kind of sensors but didn't receive answer. All known me installations use 3904 setting. At increasing of CPU load the W83627 data from lm_sensors w/3904 setting comes from 42 to 46.5 C, but with setting to tiristors - only from 76 to 77 C. It looks therefore that BIOS data are wrong ("tuned" to wrong kind of sensors). Sorry, am I right about BIOS ? Mikhail Kuzmisnky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 May 102 19:58:36 +0400 (MSD) Subject: Fortran Compilers for Scyld (Re:) Message-ID: <200205211558.TAA02073@nocserv.free.net> According to Arnie Miles > From beowulf-admin at beowulf.org Mon May 20 19:14:15 2002 > Subject: Fortran Compilers for Scyld > From: Arnie Miles > To: beowulf at beowulf.org > Date: Mon, 20 May 2002 10:49:04 -0400 > > Does anyone have input on using the Intel ifc Fortran 95 compiler on a > Scyld cluster? Is it compatible? > As I understand, ifc dosn't "depend" from particular Linux/cluster software distribution. The only thing where ifc "depends" strongly from parallelization is the support of OpenMP, but it includes only parallelization for SMP nodes. AFAIK ifc is sompatible w/g77. You may use ifc for work w/MPI etc. Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 27 May 102 20:50:42 +0400 (MSD) Subject: Infiniband and Intel (Re:) Message-ID: <200205271650.UAA00167@nocserv.free.net> According to Patrick Geoffray > From beowulf-admin at beowulf.org Sat May 25 01:45:00 2002 > From: Patrick Geoffray > To: Beowulf mailinglist > Subject: Infiniband and Intel > Sender: beowulf-admin at beowulf.org > Date: Fri, 24 May 2002 17:33:04 -0400 > > Intel is pulling out from Infiniband: > http://story.news.yahoo.com/news?tmpl=story&ncid=70&e=1&cid=70&u=/cn/20020524/tc_cn/intel_cancels_infiniband_products > > Considering Intel's weight, that's a bad sign. > I agree. Infiniband may be very attractive as potential interconnect for cluster nodes and potentially may compete w/Myrinet ;-) At the last IDF (Spring-2002, San-Francisco, where I was) it were presented many Infiniband solutions. Moreover, it was presented special track (by NCSA) about using of Infiniband for MPI communications. 
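The sort of microbenchmark usually used to judge such MPI-over-interconnect work is a plain point-to-point ping-pong; a minimal C sketch (purely illustrative, nothing Infiniband-specific in the source; build/run names such as mpicc and mpirun depend on the MPI installation):

/* minimal MPI ping-pong sketch: ranks 0 and 1 bounce a small message
 * and report the average round-trip time */
#include <mpi.h>
#include <stdio.h>

#define NBYTES 1024
#define REPS   1000

int main(int argc, char **argv)
{
    char buf[NBYTES] = {0};
    int rank, i;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg round trip: %g us\n", (t1 - t0) / REPS * 1e6);
    MPI_Finalize();
    return 0;
}

Run with two processes on two nodes, half of the reported round trip gives a rough one-way latency for the interconnect.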
Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 21 Jun 102 12:44:57 +0400 (MSD) Subject: ATHLON vs XEON: number crunching Message-ID: <200206210844.MAA11709@nocserv.free.net> According to Richard Walsh > From beowulf-admin at beowulf.org Thu Jun 20 23:42:38 2002 > From: Richard Walsh > To: beowulf at beowulf.org, lindahl at keyresearch.com > Subject: Re: ATHLON vs XEON: number crunching > > "Under heavy load conditions, the latency of SDRAM deteriorates > rapidly. RDRAM holds up quite gracefully ... under heavy load, > where memory performance is crucial to CPU performance, RDRAM > has far lower latency than SDRAM." > > Also, I note that both the McKinley/ZX1 from HP, EV7, and Cray SV2 will > use RDRAM. Would you argue that this is for bandwidth reasons only? > > Perhaps this is a total versus component latency difference? The choice of RDRAM may be was done simple because this decision was done a lot of time ago (i.e. the time of development is too high), when DDR was not available as good alternative to RDRAM. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 4 Jul 102 21:53:17 +0400 (MSD) Subject: CCL:Origin300, Linux Cluster (Re:) Message-ID: <200207041753.VAA16733@nocserv.free.net> According to Eugen Leitl > From beowulf-admin at beowulf.org Thu Jul 4 20:36:41 2002 > From: Eugen Leitl > To: > Subject: CCL:Origin300, Linux Cluster (fwd) > ---------- Forwarded message ---------- > Date: Thu, 4 Jul 2002 12:04:07 -0400 > From: Jianhui Wu > To: chemistry at ccl.net > Cc: amber at heimdal.compchem.ucsf.edu > Subject: CCL:Origin300, Linux Cluster > Dear Colleagues, > I have a budget around $40k CN to shop for a new computer system, which > will be used for MD simulation, virtual screening and some bioinformative > stuff. Currently, I am looking at two options: Origin 300 (2 cpu) or PC > Linux Cluster. I would like to hear your experience with these systems and > spend the limited budget right. > > (1) An Origin 300 2cpu 500MHZ cost around $35k. Are you using this kind of > system? Do you have benchmark of MD simulation (such as Amber) for this > system? Do you regret your purchase? Some time ago I looked somewhere on //www.sgi.com a set of benchmarks results, in particular on Amber, for some SGI systems. But it's absolutely clear that you'll have much more high performance and much more better price/performance ratio if you'll build cluster of x86-based PC's w/1-2CPU's per node. Moreover, usually you'll have better performance simple per CPU, i.e. at equal number of processors. The main reasons for choice of Origin 300 may be a)the presence of some chemical software which may "exist" for IRIX but absent for Linux; b) The total cost of ownership, because cluster requires much more "human time" for installation and administration. Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 23 Aug 102 17:47:18 +0400 (MSD) Subject: Wanted: Good mobo for Intel 850E chipset and 1066 RDRAM In-Reply-To: <54sn167rtb.fsf@intech19.enhanced.com> from "Camm Maguire" at Aug 22, 2 12:59:28 pm Message-ID: <200208231347.RAA22077@nocserv.free.net> According to Camm Maguire > Greetings! We're upgrading our 16-node cluster. Our code heavily > uses matrix-vector BLAS level2 operations. Memory bandwidth is the > bottleneck, and our preliminary tests show that rambus is at present > the clear winner in terms of performance. This of course is > unfortunate, given the legal manipulations surrounding the > technology. We would much prefer to go with dual channel DDR, but > this doesn't appear to be available anytime soon. > Dual Channel DDR is supported, but for AMD Athlon (in nVidia chipset), and the speed-up for memory-limited applications like Gaussian98, is essential.
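A quick way to see the kind of memory-bandwidth difference such BLAS level-2 codes care about is a STREAM-style triad loop; a minimal C sketch (array size and timing method are only illustrative, serious comparisons should use the full STREAM benchmark):

/* STREAM-style triad sketch: a[i] = b[i] + s*c[i] over arrays much larger
 * than cache, timed with gettimeofday, reporting MB/s */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1000 * 1000)   /* ~96 MB total for the three double arrays */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double s = 3.0, secs, mbytes;
    struct timeval t0, t1;
    int i;

    if (!a || !b || !c) return 1;
    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    mbytes = 3.0 * N * sizeof(double) / 1e6;   /* read b and c, write a */
    printf("triad: %.1f MB/s\n", mbytes / secs);
    return 0;
}

The arrays are far larger than any x86 cache of that generation, so the MB/s figure mostly reflects the memory subsystem, which is exactly what separates single-channel SDRAM, RDRAM and dual-channel DDR here.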
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 28 Aug 102 19:44:20 +0400 (MSD) Subject: >2 p4 processor systems In-Reply-To: <8724B9A26BBD904495EECACA7A7DA505011B6FF6@gemini.diversa.com> from "Brian LaMere" at Aug 27, 2 11:59:53 am Message-ID: <200208281544.TAA06897@nocserv.free.net> According to Brian LaMere > So I'm trying to find out if anyone knows of a 4-way p4 system out there. > I'm wanting to bring a couple dual-p4's in here just so they'll see that the > performance far surpases the current per-node performance we have on our > cluster, but...brick wall. The guy above me agrees with me, the guy above > him won't talk to me about it. He just gets all excited about a 6-way p3 > server in 1u. Whoopie. > So...help? Anyone know of any 4-way p4 systems? And no, amd isn't an > option (unfortunately). > I want to add some words to "minuses" of x86 SMPs. We use 2-CPUs Tyan S2460 w/Athlon MP which don't require such many memory throughput as P4 for "obtaining" of high performance. We tested S2460 w/Athlon MP 1800+ under STREAM tests (using OpenMP parallelization of loops with ifc 5.0) and found that 2-CPU (2-thread) results are not better than for 1 CPU. You may find close results for 2-CPUs SMPs at //www.streambench.org. Some applications are scaled relative well from 1 to 2 CPUs SMP. The examples are 1) Linpack(n=100 and n=1000) which is localized in cache 2) Gaussian 98 SCF method where localization in cache is also high. In last case the speed-up on test178 is something about 1.7 (I don't remember exactly). But high-performance calculations, in particular many methods realized in g98 are memory-bounded now. So you should expect bad speedup on 2-CPUs x86 systems because of memory bottlenecks. Most 2-CPUs x86 SMPs have 1-port main memory, and the competition for memory of modern x86 CPUs will be high (especially for P4, where SPECfp2000 data depends significantly from memory throughput). So it's not clear for me that 2-CPUs SMP are more attractive than 2*single-CPUs nodes (yes, we should calculate price/performance ratio ...). What is about 4-CPUs SMPs then I looked in some cases that the architecture is bad in the sense of memory throughput scaling (but this was for more old PIII-based systems). Therefore it's necessary to be sure that 4-CPUs P4 systems has efficient memory throughput, else 2*2 CPUs SMP may be better. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 28 Aug 102 19:49:22 +0400 (MSD) Subject: Intel's MKL WAS RE: >2 p4 processor systems In-Reply-To: from "Rocky McGaugh" at Aug 27, 2 03:22:35 pm Message-ID: <200208281549.TAA06946@nocserv.free.net> According to Rocky McGaugh > With HPL, ive seen consistently better performance from atlas than i have > with Intel's MKL. Granted, this is only a single application. Does anyone > have any testimonials about the MKL? 
> I've tested a set of different x86 CPUs on Linpack (n=100 and n=1000) and found that in most cases Atlas gives more high performance. But this may be because of using recursive LU in dgetrf of Atlas (this algorithm is the most fast today) - but I don't know about algorithms used in MKL 5.01 version of dgetrf. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 30 Aug 102 22:38:47 +0400 (MSD) Subject: Performance Benchmarks In-Reply-To: from "Tom A Krewson" at Aug 30, 2 01:00:58 pm Message-ID: <200208301838.WAA01246@nocserv.free.net> According to Tom A Krewson > Does anyone know of a good objective benchmark for Linux clusters running > MPI? I have tried Linpack but failed to get the results I need. It seems > to need tuning for each cluster which is makes it hard to be objective > with reguards to its results. I also have used llcbench and have gotten > some nifty graphs but nothing to compare what I have in my cluster > objectively to other clusters. > I may recommend you Linpack High Parallel benchmark which is used also in TOP500 table where custers also are presented. URL: //netlib2.cs.utk.edu/benchmark/hpl There is a set of other known benhcmarks, in particular for MPI itself, but if you want to see to computational performance of your cluster, Linpack parallel looks as the simplest way. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 12 Sep 102 19:52:56 +0400 (MSD) Subject: Tyan S2468UGN In-Reply-To: <200209120951.LAA21905@dylandog.crs4.it> from "Alan Scheinine" at Sep 12, 2 11:51:26 am Message-ID: <200209121552.TAA08121@nocserv.free.net> According to Alan Scheinine > > While we are on the subject on the Tyan AMD mother board, > I have a question concerning the S2466 and S2468. The manual > of AMD for the AMD MP 2000+ and 2200+ says that the chip has > a thermal diode but the mother board must have circuitry to > read the diode. Tom's hardware web site has a movie of what > happens to the AMD if the heatsink is detached, it begins to > smoke after about one second. The manual of these Tyan boards > at the Tyan site does not mention thermal shutdown protection. > Do these Tyan boards have a thermal shutdown that would protect > the board and the even greater risk of a fire? There is only few mobos which *really* do shutdown looking to Athlon chip diode (in particular, one mobo from Fujitsu-Siemens). And I know only about 1-CPU mobos :-( Tyan don't say nothing simple because their mobos don't do this :-( Beginning from July 2002 AMD certifies only mobos having this feature, but Tyan mobos, as I understand, were developed before this date. 
Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 13 Sep 102 17:25:45 +0400 (MSD) Subject: Disk noises and Tyan S2468UGN In-Reply-To: from "Joel Jaeggli" at Sep 12, 2 12:28:36 pm Message-ID: <200209131325.RAA18949@nocserv.free.net> According to Joel Jaeggli > Thermal recalibration of the head sounds like the most likely cause... I thought about the same, but every 20 seconds ???!!! Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow > > On Thu, 12 Sep 2002, David Mathog wrote: > > > Our S2468UGN has 5 x 36 GB IBM disks in it, 2 on one > > SCSI bus, 3 on the other. These are IBM IC35L036UWD210-0. > > They work fine, pass all diagnostics, including exhaustive > > surface testing using IBM's Drive Fitness Test 3.10. But the odd > > thing is that there is a semiperiodic (mean maybe 20 seconds, but > > huge variance) noise from one or more of the disks which sounds > > for all the world like a quieter version of a DLT tape repositioning. > > That is, a longish (1.5 seconds?) whir followed immediately by a > > shorter sort of "shunk" at the end. Some sort of movement sound, > > but not anything that sounds like an overt failure. > > > > I can't see the individual drive lights on these disks because > > of the way they are mounted, in fact, I don't even know that > > they have drive lights, so I can't really say if this is one drive > > doing this or all 5 drives doing it once in a while. The main system > > drive light does not come on when this sound is made. I upgraded > > to the latest Tyan BIOS (v4.03) and it still occurs. The sound is > > produced whenever there is power: sitting in the BIOS, waiting in > > DOS, running linux, etc. > > > > Has anybody else observed this? > > Any idea what it might be? > > > > Thanks, > > > > David Mathog > > mathog at caltech.edu > > Manager, Sequence Analysis Facility, Biology Division, Caltech > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu > -- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -- > In Dr. Johnson's famous dictionary patriotism is defined as the last > resort of the scoundrel. With all due respect to an enlightened but > inferior lexicographer I beg to submit that it is the first. 
> -- Ambrose Bierce, "The Devil's Dictionary" > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 23 Oct 102 16:08:27 +0400 (MSD) Subject: thermal kill switch In-Reply-To: from "alvin@Maggie.Linux-Consulting.com" at Oct 22, 2 06:25:04 pm Message-ID: <200210231208.QAA21664@nocserv.free.net> According to alvin at Maggie.Linux-Consulting.com > some motherboards have health monitors... > - you can go into the bios and tell it what to do > ( shutdown when the temp hits a value ) > But are you sure that Linux shutdown will be correct in that case ? Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ares at lanpartynw.com Mon Jul 23 12:44:16 2012 From: ares at lanpartynw.com (Ares) Date: Wed, 23 Oct 102 12:03:09 EDT Subject: Help needed Message-ID: <200210231203590.SM00960@lanpartynw.com> My name is Derek Pryor. I am a senior in high school and to graduate we have to do a big project. I am creating a beowulf cluster for my project. One of the requirements is that we have a mentor help us out. I have not found anyone in my local area (Seattle, WA) so I asking online now. What this would involve is helping me plan out the design of the software. Also I would need some help creating a benchmark test so I could mesure the proformance increase. I have knowledge in Linux and C Programming and Linux Socket Programming. If you are intersted or have any questions feel free to talk to me. Email: ares at lanpartynw.com Aim: sith lord 1226 (I'm on most of the time from 3pm ? 9pm PST) Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Nov 102 22:35:22 +0300 (MSK) Subject: Linda problems (under work w/G98) Message-ID: <200211221935.WAA29625@nocserv.free.net> I've installed binary Linda 6.2 version (for homogenous cluster) for our Giagbit Ethernet-based cluster (nodes works under RH 7.2). The main task of Linda for us is to support inter-nodes parallelization of one application (binary version of Gaussian -98 Rev.A11). But we found that this application starts parallel processes on cluster nodes and "hang-ups" because of Linda/network problems (it looks that the problem is not w/G98 itself). I'll be very appreciate in any ideas what may be the real source of our problem ! A bit more detailed description of our situation follows. 1) We tested G98+Linda on 2 "equal" SMP nodes w/default Linda configuration file, i.e. w/Tsnet.Appl.maxprocspernode: 1 (i.e. Linda starts 1 master process on master node, and 1 additional process on 2nd node). The clocks on both nodes are synchronized through ntpd. NFS is not used. 
2) This nodes has equal .tsnet.config files in home directories of the same user on different nodes. This files has 1 string: Tsnet.Appl.nodelist: host1 host2 3) At start of g98l (application executable) on host1 we see following ntsnet messages: ... ntsnet starting master process on host1 ntsnet starting 1 worker on host2 ntsnet waiting for Linda group messages ntsnet received Linda group message: group has 2 members ... and now we see parallel processes working on both nodes, but it looks that they can't exchange (send/receive) the messages: they are mainly in waiting state, strace gives - select/gettimeofday/sendto/recvfrom (last -w/"resource temporary unavailable") syscalls in a loop - on host1 (master) - select/gettimeofday/sendto syscalls in a loop - on host2 After some time interval we see on host1 the message: ntsnet: worker on node host2 exited abnormally - and the run is finished. 4) At start of g98l on host2 (i.e. host2 is now master node) the situation is not the same (not symmetrical): ntsnet starting master process on host2 ntsnet starting 1 worker on host1 ntsnet waiting for Linda group message Linda Error: node host1(0) warning: sendto failed: Network is unreachable ntsnet received Linda group message: group has 2 members ... and then a lot of Linda error messages - that Network is unreachable. And as in previous case we see parallel (waiting) processes on both nodes. 5) At the time when parallel processes on both nodes can't "negotiate" successfully, ping and rsh between this nodes works normally. Ping gives various delays for host1-->host2 and host2-->host1 (90-130 microseconds), but it looks appropriate. Ifconfig says that there is no network errors. Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 8 Jan 103 19:22:27 +0300 (MSK) Subject: SCMS question(s) Message-ID: <200301081622.TAA26098@nocserv.free.net> I've read some overviews/user manual/... about SCE and, in particular(especially), SCMS, and it looks for me that it's the best choice today (I looked also on bWatch, CMS, SGI PCP). But few SCMS-2.0 features are not clear for me just now. I'll be very appreciate if somebody will help me w/answers. 1) I have PIII Tualatin CPU's on frontal cluster node but Athlon MP on compute nodes. The documentation says that I must have equal "kinds" of CPUs. Is it "strong" requirement ? What really will not work in SCMS in my case ? 2) HARDWARE plugin for cms_rms has the possibility to control fan speeds and temperature values. How it's organised ? Is it work through lm_sensors package or directly ? 
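Whichever way a monitoring plugin gets the values (directly from the chips or through lm_sensors), they can be cross-checked from the shell; a trivial polling sketch, assuming the lm_sensors userland ("sensors") is installed and configured on the compute nodes, with labels that differ per board:

# print temperatures and fan speeds every 30 seconds
while true; do
    date
    sensors | egrep -i 'temp|fan'
    sleep 30
done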
Yours Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 10 Jan 103 21:25:07 +0300 (MSK) Subject: question about Intel P4 versus Alpha's In-Reply-To: from "Dominic Wu" at Jan 10, 3 09:53:21 am Message-ID: <200301101825.VAA23830@nocserv.free.net> According to Dominic Wu > Is HT anything more than a thinly-veiled attempt at luring more software > developers to develop multi-threaded applications so as to help Intel sell > more CPU in the future? (I.E. the new fangled software that is optimized > for HT can really benefit from additional REAL processors instead of using > just HT?) No. It looks that the "reasons" were other. 1) It is known that a lot of "execution resources" (in particular, execution units) of superscalar microprocessors are not used (simultaneously) by many applications - especially which don't give heavy load for CPUs. 2) The possibility to organize multitherading execution by modern superscalar chips "costs" (in the sense of additional hardware ) very low, and was practically realized by Intel a lot of time ago. But they opened this possibility for users only early. 3) One of the powerful "reasons" to propose HT was the war w/AMD; HT is excellent marketing step. BTW, IMHO, we must say "thanks" to Intel for introducing HT: now paralelization will come to desktop computer, and the corresponding parallelization technologies will be necessary for new areas of applications. Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 13 Jan 103 16:55:50 +0300 (MSK) Subject: question about Intel P4 versus Alphas In-Reply-To: <200301130819.JAA04462@dylandog.crs4.it> from "Alan Scheinine" at Jan 13, 3 09:19:45 am Message-ID: <200301131355.QAA24384@nocserv.free.net> According to Alan Scheinine > In a previous message, Mikhail Kuzminksy spoke about multithreading > and hyperthreading, and also superscalar microprocessors. I would like > to add a few remarks for the sake of greater precision. Superscalar > and out-of-order execution has a high hardware cost because of the > large amount of logic needed to organize the execution steps dynamically. > The primary motivation for the Itanium was that this organizing of > the work at a fine-grained level would be done by the compiler. > Multithreading means that the processors gives time slices to various > threads. The state of the CPU for each thread is switched between > thread. Hyperthreading has several threads executing at the same time, > so exceptions and condition codes may be for one or another thread > at the same time. > For clusters, parallel execution generally uses message passing > so the user does not write the code as a multithreaded program. > As a consequence the application program would not be using hyperthreading. 
I'm sorry, one additional remark for clarification :-) I beleive that if I have 2 logical CPUs as in the case of HT, then I may use (for writing of my application) some parallelization tools, not only OpenMP for shared memory, but also MPI- for doing paralelization. When I wrote about "thanks" to Intel from beowulfers, I thought about any kinds of parallelization, in particular MPI. The questions "is in this case MPI better than OpenMP or pthreads etc" or "is it reasonable for some application(s) to use MPI for parallelization not only between the nodes, but also inside the HT-P4-nodes ?" are another questions. Mikhail Kuzminsky, Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 20 Mar 103 22:59:32 +0300 (MSK) Subject: Linux distributives for Opteron Message-ID: <200303201959.WAA05046@nocserv.free.net> By RedHat it was declared that x86-64 support will be realized in the frames of RH Linux Advanced Server distributive. But what is known about much more cheap RH Linux professional (according our experience, it is enough for building beowulf clusters w/2-cpu's nodes) ? According my data, x86-64 support in RH may be realized just at summer. The other choice for Opterons coming in May may be SuSe Linux profesional (it's cheap, but we traditionally based on RH). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 26 Mar 103 19:31:14 +0300 (MSK) Subject: sun grid engine? In-Reply-To: <81D14648D6BD694CBDB4F45536E81CBC280A48@aquarius.diversa.com> from "Brian LaMere" at Mar 25, 3 03:39:44 pm Message-ID: <200303261631.TAA00228@nocserv.free.net> According to Brian LaMere > > this is not at all a request to be contacted by salepeople - please. All > such emails will be ignored. I have a Sun rep already. > > To the point - does anyone actually use Sun's Grid Engine, and what sort of > pro's and con's have they experienced? I can say about free SGE 5.3 version we are using. By my opinion, the pluses, in particular, are (in addition to answers to your questions) simple installation, good documentation, good graphical interface for administrator, some Globus features (we don't use them currently, but paln to use in the future). > Run well? Yes. > Enough functionality? By my opinion, yes. The main weak point is, IMHO, not " too advanced" sheduler. In particular, in OpenPBS I may implement MAUI source. > Stable? Yes, it was one of the main reasons of our choice of SGE in comparison w/OpenPBS. 
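For anyone weighing the two in practice, an SGE job script generally takes the following form; a minimal sketch, where the parallel environment name "mpi", the limits and the application name are site-specific assumptions:

#!/bin/sh
# minimal SGE submit script; submit with: qsub run_job.sh
#$ -cwd                 # run in the directory the job was submitted from
#$ -N g98test           # job name
#$ -pe mpi 4            # parallel environment and slot count (site-specific)
#$ -l h_cpu=12:00:00    # hard CPU time limit
./my_application        # placeholder for the real command

The #$ lines are ordinary qsub options embedded in the script, so the same flags can also be given on the qsub command line.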
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry RAS Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 27 Mar 103 20:08:39 +0300 (MSK) Subject: sun grid engine Message-ID: <200303271708.UAA17249@nocserv.free.net> According to hanzl at noel.feld.cvut.cz > > > The main weak point is, IMHO, not > > " too advanced" sheduler. In particular, in OpenPBS I may > > implement MAUI source. > > SGE too is integrated with MAUI. I did not try it myself but I guess the > integration is far enough to be usable (those of you wo did try - please > comment on this). May be it's integrated into SGE 5.3 Enterprise Edition ? I said about *free* SGE 5.3. Both "Sun ONE Grid Engine Administartor and User's Guide" and "Sun ONE Grid Engine Release Notes" don't have just the word "MAUI". Moreover, the only sheduler algorithm allowed in usual (free) SGE 5.3 is "standard" (see SGE Administrator & User's guide, p.225). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 27 Mar 103 21:59:28 +0300 (MSK) Subject: sun grid engine? Message-ID: <200303271859.VAA18743@nocserv.free.net> According to hanzl at noel.feld.cvut.cz > It is easy to get confused by SGE versions. > Enterprise Edition is also free. MAUI was integrated with it - most of > this work was done by MAUI team with help from SGE team. > > Regarding SGE versions, I think it works as follows: > 1) Developers create opensource SGE version. They work using publicly > available CVS software repository. All new features come to this > version. > ... > 2) 'Commercial' part of SUN takes these sources (probably without any > important changes) and compiles 'commercial' SGE and SGEEE. They add > word 'ONE' to the name. They create nice manuals. You can buy this > software and get usual support you expect for commercial software. > You can still download the manuals for free. Just skip word 'ONE' > while reading them - they are perfectly usable for free SGE as well. > They just may be out of date because the free version already has new > features (like MAUI integration). They may also never mention MAUI > integration because the 'commercial' part of SUN has no support for > it. > ... > PBS is older than SGE (and yes, PBS did many good things, no doubt) > and everybody knew PBS when opensource SGE was born. And many people > could easily expect that SGE used the same model as PBS did. (It was > easy to think that SGE EE is the commercial version - no, it is not.) Thanks ! I was sure that SGE model is the same as PBS :-) Now I'll like SGE much more :-) - SGE EE has additionaly nice features for heterogenous clusters/sets of clusters etc ! Mikhail Kuzminsky Zelisnky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 28 Mar 103 19:45:39 +0300 (MSK) Subject: sun grid engine? Message-ID: <200303281645.TAA03663@nocserv.free.net> According to Mikhail Kuzminsky > According to hanzl at noel.feld.cvut.cz > > ... > > PBS is older than SGE (and yes, PBS did many good things, no doubt) > > and everybody knew PBS when opensource SGE was born. And many people > > could easily expect that SGE used the same model as PBS did. (It was > > easy to think that SGE EE is the commercial version - no, it is not.) > Thanks ! I was sure that SGE model is the same as PBS :-) > Now I'll like SGE much more :-) - SGE EE has additionaly nice > features for heterogenous clusters/sets of clusters etc ! My :-) above related to me myself (not to SGE) - SGE is nice product, and in comparison w/NQS (we use, btw, Generic NQS on some old SGI serevrs) SGE has additionally not only PE support, but also for example Globus features. But the "type" of MAUI integration w/SGE looks (from this discussion) not clear: From: Ron Chen > ... >1. read the document: >http://supercluster.org/documentation/maui/sgeintegration.html >2. you need a password to get the latest versions of >Maui scheduler, which is in Alpha/Beta state. You can >get it from help at supercluster.org. I understand, that I may use latest version of Maui if I'll compile its source. According Alan Scheinine message here, MAUI simple may submit things to OpenPBS/SGE. But what means then MAUI integration in Sun binary version of SGEEE ? Does it means that I have pre-compiled versions of both MAUI and SGEE or only that binary SGEEE includes all the necessary corrections, allowing it to work w/more old Maui version ? Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Apr 103 19:59:22 +0400 (MSD) Subject: small cluster In-Reply-To: <1049316063.1932.4.camel@skull.america.net> from "Dennis Sarvis, II" at Apr 2, 3 03:41:03 pm Message-ID: <200304031559.TAA01399@nocserv.free.net> According to Dennis Sarvis, II > How does one go about creating a 2 PC cluster? I have a redhat 400Mhz > PII and a Debian Celeron 550Mhz. Can I do something like use 2 NICs in > the controller and one in the slave (1 NIC for the office > network/internet and the other connecting via crossover 10baseT to the > NIC on node1 slave)? Yes, I use like configuration in my home (but w/o permanent external link to Internet). 
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 11 Apr 103 19:05:13 +0400 (MSD) Subject: [Linux-IA64] Itanium gets supercomputer software Message-ID: <200304111505.TAA23036@nocserv.free.net> From: David Mosberger > Duraid> You and I both know the only real barrier to Itanium > Duraid> adoption is the price. Can anyone here shed some light on > Duraid> this? Why is Itanium hardware still so expensive? >Remember that Intel is targeting Itanium 2 against Power4 and SPARC. >In that space, the price of Itanium 2 is very competitive. That's absolutely right. But after AMD starts producing the Opteron, the situation may change. Opteron will have much lower performance but a much better price/performance ratio (and the same problem with the absence of 64-bit software ;-)). In the case of Opteron's success, Intel will have to make a choice: a) realize "Plan B" (extend x86 to 64 bits in some new chip(s)), which, unfortunately (I like the RISC & IA-64 architectures), has a serious probability in my opinion, or b) change the price/performance situation drastically - by means of Deerfield. I'll be happy if the latter way (b) is realized. (Of course, there is also choice c) - to ignore x86-64 :-)). BTW, does somebody know something about *real* IPC (instructions per cycle) values obtained for some programs w/Itanium 2 and Power4? Taking into account the extensive out-of-order execution of groups of instructions in Power4, it's not clear to me which IPC is higher (the theoretical limit for It2 is 6, for Power4 is 5). From: Duraid Madina >David Mosberger wrote: >> Remember that Intel is targeting Itanium 2 against Power4 and SPARC. >> In that space, the price of Itanium 2 is very competitive. >OK, I want to be clear on this. I asked why Itanium hardware is still so >expensive. Your answer seems to be marketing speak for "The prices are >still high because we are _happy_ selling small quantities of this >equipment to people used to paying through the nose for good quality >hardware." Is this correct? >Can I then conclude that Intel has not yet had any interest whatsoever >in driving IA64 into the realm of reasonable prices? It's sad to see so >much work being put into this Linux port when, if things remain as they >are, it will hardly be used. It looks like there is some "gentlemen's" agreement between Intel and the companies manufacturing IA64-based systems about keeping prices high. It may be "not official", but I'm sure that it's a reality. It's typical for companies working in the market of expensive, mainly RISC-based servers. I understand that Intel does not want to destroy this "approach" and the initial agreements :-( But we should also take into account the real production cost for Intel. Does somebody know what it is? It depends mainly on die size, but we also don't know the percentage of good chips. From: Matt Chapman >> Can I then conclude that Intel has not yet had any interest whatsoever >> in driving IA64 into the realm of reasonable prices? >My understanding is that Deerfield will be targeted at the lower cost >market, though I haven't seen much info about it recently. In my last talks w/Intel staff they confirmed to me that Deerfield will be oriented in particular to the cluster market.
Taking into account that Madison will arrive something about summer of current year, Deerfiled will be available, by my estiamtion, something at end of 2003. The main question will be, by my opinion, price/performance ratio which is absolutly unclear now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 14 Apr 103 17:23:55 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <16023.13579.676695.490297@napali.hpl.hp.com> from "David Mosberger" at Apr 11, 3 02:35:07 pm Message-ID: <200304141323.RAA06099@nocserv.free.net> According to David Mosberger > > As for what the future holds, I guess we'll just have to wait and see. > Remember though: just a year ago, the cheapest ia64 workstation you > could get was priced at $7k+ 2-cpu Itanium 2 server manufactured by HP, w/maximim academic discount was proposed for me with "not too essentially more high" price, but I can't disclose details. In any case the prcie is too high for Beowulf (IMHO), we should wait Madison/Deerfield. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 14 Apr 103 18:25:06 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <3E97263E.5010605@octopus.com.au> from "Duraid Madina" at Apr 12, 3 06:31:58 am Message-ID: <200304141425.SAA07170@nocserv.free.net> According to Duraid Madina > David, > Itanium 2 isn't even competitive with other offerings from your own > company. Compare: > David Mosberger wrote: > > Here is one real price point for an Itanium 2 workstation: > > > > - hp workstation zx2000 (Linux software enablement kit) > > - Intel? Itanium 2 900MHz Processor with 1.5MB on-chip L3 cache > > ... > > - $3,298 > > with: > - HP server rp2430 > - 1xHP PA-8700 650MHz CPU with 2.25MB on-chip L1 cache > - $1,095 > I bought one of these, and it is excellent (if a little loud. ;) I > would happily buy a bare-bones Itanium 2 system at the same price. Taking into account that Itanium 2 has much more high performance, the price from HP looks reasonable. Moreover, I found that HP prices for Itanium 2 computers are lower than the prices for Itanium 2 servers manufactured by other "non-brand" companies ! So we should look to HP prices as to the best "indicator" of prices (I don't work for HP :-) ). It must be some "pressure" from users to computer manufacturers, they must understand that it exist now more cheap alternatives. > This > doesn't seem to like it's going to be possible any time soon. In less > than two weeks, I will be able to buy an Opteron system that runs a > great deal faster at the same price. Yes, Opteron may give good alternative, but I'm not sure that price/performance ratio for Opteron servers will be better than for P4 Xeon dual servers. Only if you need badly 64-bit processor ... 
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 15 Apr 103 17:38:50 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <3E9B40D2.9010400@octopus.com.au> from "Duraid Madina" at Apr 15, 3 09:14:26 am Message-ID: <200304151338.RAA24401@nocserv.free.net> According to Duraid Madina > > > SPECfp2000 is ~1170 for a 2GHz 1MB L2 Opteron. Not too bad. The SPECint > figure is fantastic though (~1200). If you'll re-calcualte SPECcpus data to frequencies of Opteron will be available just now (1.4 and 1.6 Ghz, and 1.8 Ghz in May - according unofficial russian source), then Xeon has more high performance. What is about price, then 1.6 Hhz will have the price about $670-$690 (but 1.4 Hhz chips will be *much more* cheap). "Today" is not the best day for IA64, because It2 will be very soon excahnged to Madison w/1.3-1.5x speedup. I don't like x86 architecture (IA-32), but today I can't wait Deerfield :-( and I think about Opteroon also ... Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 15 Apr 103 20:27:17 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <1050421292.27085.9.camel@sadl16603.sandia.gov> from "Keith D. Underwood" at Apr 15, 3 09:41:32 am Message-ID: <200304151627.UAA26477@nocserv.free.net> According to Keith D. Underwood > You should actually look at those numbers. See here: > > http://www.spec.org/cpu2000/results/res2002q4/cpu2000-20021119-01859.html > > The only way you get graphs like that is when a couple of your > benchmarks actually fit in cache. Benchmarks running from cache are not > terribly representative of most real applications. > Sorry, I'm not familiar w/details of cache behaviour of separate tests from SPECfp2000: are you sure that tests "working sizes" fit to 3 MB L2 but will not fit (i.e. gives a lot of cache misses) in 1 MB on Xeon for example ? (I don't say even about more large L3 cache in Power4 or about 1.75 MB in Alpha 21364 or 1.5 MB (D-cache) in PA-8700). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Apr 103 19:11:43 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <20030416203811.GB1149@greglaptop.internal.keyresearch.com> from "Greg Lindahl" at Apr 16, 3 01:38:11 pm Message-ID: <200304181511.TAA20527@nocserv.free.net> According to Greg Lindahl > > Open64 has a GPLed IA64 backend. While it's unfortunate that SGI has > stopped GPLing new work on it, it's still a pretty good compiler. It's bad for beowulf community not only because SGI has great compiler team. 
IMHO, in the case of IA-64, program optimisation for next generation chips is very sensitive to microarchitecture details because of needs to prepare simultaneously executed contens in bundles. But like restrictions (allowing to do parallel execution in bundles) are changed really (for example from Itanium to Itanium 2) and it's necessary to re-construct the optimizations block of compiler. This means that old compilers will lost their efficiency :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Apr 103 19:00:28 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <200304170756.h3H7umB02357@dali.crs4.it> from "Alan Scheinine" at Apr 17, 3 09:56:48 am Message-ID: <200304181500.TAA20408@nocserv.free.net> According to Alan Scheinine > > I do not think there was a promise that getting efficiency would > be easier with EPIC. My understanding of the situation is that > the logic of dynamic allocation of resources, that is, the various > tricks done in silicon, could not scale to a large number of > processing units on a chip. That is, the complexity grows faster > than linear, much faster. I beleive you are absolutely right. One of main reasons of IA64/EPIC developmnet were difficulties just in development hardware logic of superscalar out-of order calculations. But pls look to the current (and nearest future) IA-64 chips. The number of execution units don't increase: the main advantages of McKinley in comparison w/Itanium (in microarchitectural sense) was allowing to do more parallel/simultaneous instructions in pair of bundles (elimination of a set of restrictions in Itanium) plus, of course, cache, frequency etc. The number of execution unints in Madison will be, as I understand, the same. Next IA-64 chips will have >1 microprocessor cores, what means, by my opinion, that every microprocessor core will have again the same number of execution units. It looks that Intel increase size of cache, frequency, insert simultaneous multi-threading etc, but I don't see incerase of execution units number. This means that some potential advantages of IA-64/EPIC are not realized. IMHO, it may be simple because of compilers problem. If compiler can't realize high average IPC (instructions -per-cycle) value for real applications, why I'll add new execution units ? Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 22 Apr 103 18:48:48 +0400 (MSD) Subject: Opteron announcement In-Reply-To: <20030422053550.GA6923@sphere.math.ucdavis.edu> from "Bill Broadley" at Apr 21, 3 10:35:51 pm Message-ID: <200304221448.SAA21439@nocserv.free.net> According to Bill Broadley > Apparently the link to http://www.amd.com/opteronservers just went > live. Tons of cool docs/benchmarks. > > ... 
> Oh and one more interesting link: > Software Optimization Guide for AMD athlon 64 and AMD Opteron Processors > http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7203,00.html > > Amusingly all the submissions that I looked at the full reports for > use the Intel compiler. So the Opterons extra registers are ignored. > > Time will tell if 3rd party compilers that fully utilize the additional > registers can win benchmarks against Intel's compiler. PGI (Portland Group) 5.0 will have Opteron support. The product will be available at summer (June, if I remember correctly). It'll be very interesting to compare ! Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 26 May 103 22:43:43 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC In-Reply-To: <200305251725.VAA20503@nocserv.free.net> from "Mikhail Kuzminsky" at May 25, 3 09:56:38 pm Message-ID: <200305261843.WAA10134@nocserv.free.net> According to Mikhail Kuzminsky > > I'm testing some fortran benchmarks on 2-CPUs Opteron 1.6 Hhz > server we want to use in Beowulf cluster. In particular, I need to measure > small time intervals, for which I want to use RDTSC-based "function" > (for example I attach below one - published by T.Prince). But it requires > some minor modifications, I beleive, to work properly on x86-64. > I found now that all is OK if I'm using calls from g77-33 (#define for 386 and _M_IX86 as I wrote in previous message are enough).
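For the record, the likely reason for the negative intervals in 64-bit mode is the "=A" asm constraint: it denotes the edx:eax register pair only when compiling for i386, so under x86-64 only the low 32 bits of the counter tend to be captured and the value wraps around every few seconds. A minimal sketch of a TSC read that works with gcc on both i386 and x86-64 (this is an illustration, not the code from T.Prince's original; the function name is arbitrary):

    /* Read the time-stamp counter.  RDTSC puts the low 32 bits in EAX
       and the high 32 bits in EDX; combining the halves explicitly
       avoids relying on the "=A" constraint, which does not describe
       a 64-bit value correctly when compiling for x86-64. */
    static inline unsigned long long rdtsc64(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long) hi << 32) | lo;
    }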
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 11 Jun 103 22:10:00 +0400 (MSD) Subject: NAS Parallel Benchmarks for Current Hardware In-Reply-To: <3EE609F7.BE430A1E@ideafix.litec.csic.es> from "A.P.Manners" at Jun 10, 3 05:40:23 pm Message-ID: <200306111810.WAA01122@nocserv.free.net> According to A.P.Manners > > I am looking to put together a small cluster for numerical simulation > and have been surprised at how few NPB benchmark results using current > hardware I can find via google. > It's common situation w/NPB (in opposition to Linpack, SPECcpu e.a.) :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:05:31 +0400 (MSD) Subject: what is a flop In-Reply-To: <3EEF5F48.5020505@roma2.infn.it> from "Roberto Ammendola" at Jun 17, 3 08:34:48 pm Message-ID: <200306181605.UAA24772@nocserv.free.net> According to Roberto Ammendola > The "Floating point operations per clock cycle" depends on the > processor, obviously, and on which instructions you use in your code. > For example in a processor with the SSE instruction set you can perform > 4 operations (on 32 bit register each) per clock cycle. One processor > (Xeon or P4) running at 2.0 GHz can reach 8 GigaFlops. Taking into account that throughput of FMUL and FADD units in P4/Xeon is 2 cycles, i.e. FP result may be received on any 2nd sycle only, the peak Performance of P4/2 Ghz must be 4 GFLOPS. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:19:35 +0400 (MSD) Subject: SMP CPUs scaling factors (was "what is a flop") In-Reply-To: from "Franz Marini" at Jun 18, 3 10:53:17 am Message-ID: <200306181619.UAA24910@nocserv.free.net> According to Franz Marini > On Tue, 17 Jun 2003, Maurice Hilarius wrote: > > And I would say dual CPU boards do not sale at a factor of 2:1 over singles. > > ... > > As a general ( really general as it changes a lot with code and > compilers) > > the rule I know : > > Dual P3 ( VIA chipset): 1.5 : 1 > > Dual XEON P4 ( Intel 7501 chipset): 1.3 : 1 > ... > > Dual AthlonMP ( AMD 760MPX chipset) 1.4 : 1 > > Does anyone have some real world application figures regarding the > performance ratio between single and two-way (and maybe four-way) SMP > systems based on the P4 Xeon processor ? I may say about SMP speedups for AthlonMP/760MP, for P4 they will depends from chipset (kind of FSB and memory used). On G98 speedup for 2 CPUs is between 1.4-1.8 depending from calc. method and problem size. For Opteron/1.6 Ghz they are higher (up to 1.97 in some G98 tests). 4-way P4 SMP may be not too attractive if 4 CPUs will share common bus to memory. 
4-way Opteron systems should be very good (they may arrive on the market soon). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:42:01 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: from "Stefano" at Jun 18, 3 11:22:25 pm Message-ID: <200306201342.RAA28782@nocserv.free.net> According to Stefano > As I am going to receive some funding this fall, I was wondering about buying > an opteron cluster for my research. > Mainly, the cluster will run VASP (an ab-initio quantum program, > written by a group in Wien), with myrinet. > Is somebody using AMD opterons yet ? We tested a 2-way SMP server based on a RioWorks motherboard. But I would not recommend this motherboard: by default it has no monitoring (temperature etc.) chips on the board, and it's necessary to buy a special additional card! Unfortunately, as a result I don't have data about how lm_sensors works on it. Moreover, the choice of SMP boards is very limited now: Tyan S2880 and MSI K8D. > ... > I think some fortran vendor has announced the port of their F90 to > the opteron. Well, it would be nice to recompile VASP for 64bits and see > how fast it goes. There are several possibilities: pgf90, Intel ifc (32-bit only), g77-3.3 (now really very good, but f77 only) and Absoft. We tested the first 3 compilers. But I'm not sure that you'll get a substantial speed-up from 64-bit mode itself right now. SSE2 is supported in 32-bit mode also, but it looks like SSE2 in the Opteron is implemented worse than in the P4 (in the sense of microarchitecture). Yes, some compilers can now generate code which uses the additional registers from the x86-64 architecture extensions, but we didn't find a substantial speed-up on simple loops like DAXPY. > With the itanium2 (compiled in 2 versions, 32 and 64 > bits), it is not so fast as to justify the HUGE cost of an itanium cluster. > Maybe the opteron will shake high-performance scientific computing ! I believe yes, but for 64-bit calculations. The price for Opteron-based servers is high, and the price/performance ratio in comparison w/Xeon is not clear.
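To be concrete, the DAXPY-type loop meant above is simply the following (a sketch, not our actual benchmark code):

    /* y = a*x + y in double precision - the classic DAXPY kernel.
       A streaming loop like this is limited mainly by memory bandwidth
       and SSE2 throughput, which may be why the extra x86-64 registers
       gave no substantial speed-up on it. */
    void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }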
We compared Opteron/1.6 w/dual DDR266 CL2.5 and Athlon MP 1800+ w/close frequency (1533 MHz) and DDR266 also. Speedup for Gamess-US (ifc 7.1, opt for P4) and for binary G98 version (pgf77, optimized for PIII) on a set of different computational methods (in the sense of cache localization, memory throughput requirements etc) is about 1.5-1.9. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 18:09:51 +0400 (MSD) Subject: [OT] Maximum performance on single processor ? In-Reply-To: <4.3.2.7.2.20030620140207.00ae23a0@pop.freeuk.net> from "Simon Hogg" at Jun 20, 3 02:15:47 pm Message-ID: <200306201409.SAA29175@nocserv.free.net> According to Simon Hogg > > At 14:44 20/06/03 +0200, Marc Baaden wrote: > >I have an existing application which is part of a project. I have > >the source code. It is Fortran. It *can* be parallelized, but we > >would rather spend our time on the other parts of the project > >which need to be written from scratch *first*. > > > >The application is to run in real time, that is the user does something > >and as a function of user input and the calculation with the fortran > >program that I described, there is a correponding feedback to the > >user on the screen (and in some Virtual Reality equipment). > > > >Right now, even on simple test cases, the "response time" (eg calculation > >time for a single step) of our program is on the order of the second. > >(this is for an athlon MP 2600+) > >We need to get that down to a fraction of seconds, best milli-seconds, > >in order to be usable in real time. (makes it a factor of roughly 1000) > > > >As I said the code can indeed be parallelized - maybe even simply cleaned > >up in some parts - but unfortunately there remains very much other important > >stuff to do. So we'd rather spend some money on a really fast CPU and not > >touch the code at the moment. > > > >So my question was more, what is the fastest CPU I can get for $20000 > >at the moment (without explicitly parallelizing, hyperthreading or > >vectorizing my code). > > I'm sure some other people will give 'better' answers, but from having a > look at your web pages, I would be tempted to go down the route of > second-hand SGI equipment. > > For example (and no, I don't know how the performance stacks up, I'm > looking partly at a general bio-informatics / SGI link if that makes sense) > I can see for sale an Origin 2000 Quad 500MHz / 4GB RAM for UKP 15,725. W/o parallelization it looks as bad choice: any CPU will be more slow than the same Opteron or P4. If FP performance is important, Power4+ or Itanium 2 (or, more exactly, Madison one month later) may be the best choice. 
And, at least, optimize your program as possible :-) Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:48:28 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <005701c33792$c7c1ddf0$6501a8c0@sims.nrc.ca> from "Serguei Patchkovskii" at Jun 20, 3 09:16:44 pm Message-ID: <200306211348.RAA15586@nocserv.free.net> According to Serguei Patchkovskii > for Opteron- > > based servers is high, and price/performance ratio in comparison > > w/Xeon is not clear. > Once you start populating your systems with "interesting" amounts of memory > (i.e. anything above 2Gbytes), the price difference between dual Opterons > and > dual Xeons is really in the noise - at least at the places we buy. If your > suppliers > charge you a lot more for Opterons, may be you should look for another > source? > There is currently not "too wide" choice of possible sources of dual Opteron systems now in Russia :-) I agree that high memory price (for DIMMs from 1 GB, but the price will decrease) lower the percent of differences in total price, but if you use 512MB DIMMs for complectation, price difference is essential. Pls sorry: I assume, that in general the prices here in Russia are similar to other countries, but I didn't check just now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:16:15 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <1056121119.9688.7.camel@picard.lab.atipa.com> from "Curt Moore" at Jun 20, 3 09:58:40 am Message-ID: <200306211316.RAA15134@nocserv.free.net> According to Curt Moore > The RioWorks HDAMA (Arima) motherboard does have on-board sensors, > adm1026 based. 1) there is no information about environment monitoring chips in the HDAMA motherboard guide (at least in the guide we had) 2) sensors-detect utility (I used version from SuSe enterprise Linux beta-version distribution) didn't find any monitoring chips at the testing > Arima does have planned both a mini BMC which does just > management type functions and also a full BMC with will do other neat > things, I believe, such as KVM over LAN. Below is a lm_sensors dump > from an Arima HDAMA. It's good. But which lm_sensors version should be used and what are the necessary settings for lm_sensors kernel modules (taking into account that lm_sensors didn't find anything ) ? 
> > adm1026-i2c-0-2c > Adapter: SMBus AMD8111 adapter at 80e0 > Algorithm: Non-I2C SMBus adapter > in0: +1.15 V (min = +0.00 V, max = +2.99 V) > in1: +1.59 V (min = +0.00 V, max = +2.99 V) > in2: +1.57 V (min = +0.00 V, max = +2.99 V) > in3: +1.19 V (min = +0.00 V, max = +2.99 V) > in4: +1.18 V (min = +0.00 V, max = +2.99 V) > in5: +1.14 V (min = +0.00 V, max = +2.99 V) > in6: +1.24 V (min = +0.00 V, max = +2.49 V) > in7: +1.59 V (min = +0.00 V, max = +2.49 V) > in8: +0.00 V (min = +0.00 V, max = +2.49 V) > in9: +0.45 V (min = +1.25 V, max = +0.98 V) > in10: +2.70 V (min = +0.00 V, max = +3.98 V) > in11: +3.33 V (min = +0.00 V, max = +4.42 V) > in12: +3.38 V (min = +0.00 V, max = +4.42 V) > in13: +5.12 V (min = +0.00 V, max = +6.63 V) > in14: +1.57 V (min = +0.00 V, max = +2.99 V) > in15: +11.88 V (min = +0.00 V, max = +15.94 V) > in16: -12.03 V (min = +2.43 V, max = -16.00 V) > fan0: 0 RPM (min = 0 RPM, div = 2) > fan1: 0 RPM (min = 0 RPM, div = 2) > fan2: 0 RPM (min = 0 RPM, div = 2) > fan3: 0 RPM (min = 0 RPM, div = 2) > fan4: 0 RPM (min = 0 RPM, div = 1) > fan5: 0 RPM (min = 0 RPM, div = 1) > fan6: -1 RPM (min = 0 RPM, div = 1) > fan7: -1 RPM (min = 0 RPM, div = 1) > temp1: +37?C (min = -128?C, max = +80?C) > temp2: +46?C (min = -128?C, max = +100?C) > temp3: +46?C (min = -128?C, max = +100?C) > vid: +1.850 V (VRM Version 9.1) > Sorry, what does it means ? adm1026 has no enough possibilities to measure the values (in this case only 3 temperatures but no any RPM value) or lm_sensors version don't work correctly ? Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 24 Jun 103 20:12:35 +0400 (MSD) Subject: Opteron (x86-64) compute farms/clusters? In-Reply-To: <3EF809A4.1050802@dlr.de> from "Thomas Alrutz" at Jun 24, 3 10:19:48 am Message-ID: <200306241612.UAA09513@nocserv.free.net> According to Thomas Alrutz > > I just made some benchmarks on a Opteron 240 (1.4 GHz) node running with > Suse/United Linux Enterprise edition. > I have sucessfully compiled mpich-1.2.4 in 64 bit without any problems > (./configure -device=ch_p4 -commtype=shared). The default compiler is > the gcc-3.2.2 (maybe a Suse patch) and is set to 64Bit, the Portland > (5.0beta) compiler didn't worked at all ! > > I tried our CFD-code (TAU) to run 3 aerodynamik configurations on this > machine with both CPUs and the results are better then estimated. > We achieved in full multigrid (5 cycles, 1 equation turbulence model) a > efficiency of about 97%, 92% and 101 % for the second CPU. > Those results are much better as the results we get on the Intel Xeons > (around 50%). It looks that this results are predictable: Xeon CPUs require high memory bandwidth, but both CPUs share common system bus. Opteron CPUs have own memory buses and scale in this sense excellent. Better SPECrate results for Opteron (i.e. work on a mix of tasks) confirm (in particular) this features. CFD codes, I beleive, require high memory throughput ... 
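To make the memory-throughput point concrete, the inner kernels of such codes look roughly like the STREAM-style triad below (a generic sketch, not taken from TAU or any code mentioned here); when both processors run loops of this kind, a dual Xeon has them contending for one front-side bus, while a dual Opteron lets each processor stream through its own memory controller:

    /* Triad: a[i] = b[i] + s*c[i].  Each iteration moves three doubles
       (two loads, one store) but does only two flops, so the sustained
       rate is set by memory bandwidth rather than by the FPU. */
    void triad(int n, double s, double *a, const double *b, const double *c)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }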
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 27 Jun 103 21:01:49 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFBEA29.60602@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 08:54:33 am Message-ID: <200306271701.VAA12659@nocserv.free.net> According to Daniel Pfenniger > > For a small experimental cluster (24 dual Xeon nodes) > we decided to use InfiniBand technology, which from specs is > 4 times faster (8Gb/s), 1.5 lower latency (~5musec) than > Myrinet for approximately the same cost/port. Could you pls compare them a bit more detailed ? Infiniband card costs (as I heard) about $1000-, (HCA-Net from FabricNetworks, former InfiniSwitch ?), what is close to Myrinet. But what is about switches (I heard about high prices) ? In particular, I'm interesting in very small switches; FabricNetworks produce 8-port 800-series switch, but I don't know about prices. May be there is 6 or 4 port switches ? BTW, is it possible to connect pair of nodes by means of "cross-over" cable (as in Ethernet), i.e. w/o switch ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 29 Jun 103 18:14:48 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFCA093.4090006@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 09:52:51 pm Message-ID: <200306291414.SAA12281@nocserv.free.net> According to Daniel Pfenniger > Patrick Geoffray wrote: > > On Fri, 2003-06-27 at 13:46, Daniel Pfenniger wrote: > >>The exact costs are presently not well fixed because several companies > >>enter the market. The nice thing about IB is that it is an open > >>standard, the components from different companies are compatible, > >>which is good for pressing costs down. > > > > With the slicon coming from one company (actually 2 but the second one > > does only switch chip), the price adjustment would mainly affect the > > reseller, where the margin are not that high. I don't expect much a > > price war in the Infiniband market, mainly because many IB shops are > > already just burning (limited) VC cash. > > The main point for price advantage of IB is if the volume goes up. It's > > a very different problem that the multiple-vendors-marketing-stuff. One > > can argue that HPC does not yield such high volumes, only a business > > market like the Databases one does. > > > > Remember Gigabit Ethernet. It was very expensive when the early adopters > > were the HPC crowd and the price didn't drop until it made its way to > > the desktop. It's the case for 10GE today. > > ... > > Patrick Geoffray > > Myricom, Inc. > > Yes I mostly agree with your analysis, database is the only significant > potential market for IB. > > However the problem with 1GBE or 10GBE is that the latency remains poor > for HPC applications, while IB goes in the right direction. 
> The real comparison to be made is not between GE and IB, but between > IB and Myricom products, which belong to an especially protected niche. > As a result for years the Myrinet products did hardly drop in price > for a sub-Moore's-law increase in performance, because of a lack of > competition (the price we paid for our Myricom cards and switch > 18 months ago is today *exactly* the same). I agree with you both. From the viewpoint of HPC clusters the IB competitor is Myrinet (and SCI etc). But there are many applications w/coarse-grained parallelism, where bandwidth is the main thing, not the latency (I think, quantum chemistry applications are bandwidth- limited). In this case (i.e. if latnecy is less important) 10Gb Ethernet is also IB competitor. Moreover, IB, I beleive, will be used for TCP/IP connections also - in opposition to Myrinet etc. (I beleive there is no TCP/IP drivers for Myrinet - am I correct ?) Again, from the veiwpoint of some real appilications, there are some applications which use TCP/IP stack for parallelization (I agree that is bad, but ...) - for example Linda tools (used in Gaussian) work over TCP/IP, Gamess-US DDI "subsystem" works over TCP/IP. In the case of IB or 10Gb Ethernet TCP/IP is possible. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Jul 103 20:27:51 +0400 (MSD) Subject: Linux support for AMD Opteron with Broadcom NICs In-Reply-To: <20030701224808.GA15167@stikine.ucs.sfu.ca> from "Martin Siegert" at Jul 1, 3 03:48:08 pm Message-ID: <200307031627.UAA02885@nocserv.free.net> According to Martin Siegert > > Hello, > I have a dual AMD Opteron for a week or so as a demo and try to install > Linux on it - so far with little success. > First of all: doing a google search for x86-64 Linux turns up a lot of > press releases but not much more, particularly nothing one could download > and install. Even a direct search on the SuSE and Mandrake sites shows > only press releases. Sigh. > Anyway: I found a few ftp sites that supply a Mandrake-9.0 x86_64 version. > Thus I did a ftp installation which after (many) hickups actually worked. > However, that distribution does not support the onboard Broadcom 5704 > NICs. I also could not get the driver from the broadcom web site to work > (insmod fails with "could not find MAC address in NVRAM"). > Thus I tried to compile the 2.4.21 kernel which worked, but > "insmod tg3" freezes the machine instantly. > Thus, so far I am not impressed. > For those of you who have such a box: which distribution are you using? > Any advice on how to get those GigE Broadcom NICs to work? I may only add to the list of AMD64-oriented distributions Turbolinux 8 for AMD64. I'm not sure that "promotional" version of Turbolinux is complete enough, but "commercial" version costs only about $70 (w/o support ;-)). BTW, does somebody try it ? We worked w/SuSE SLES8: it looks today as the only "reliable" choice of 64-bit ditribution :-( Let me congratulate our colleagues in USA w/4th July ! Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 18:28:33 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <200307161516.09818.joachim@ccrl-nece.de> from "Joachim Worringen" at Jul 16, 3 03:16:09 pm Message-ID: <200307161428.SAA28224@nocserv.free.net> According to Joachim Worringen > Franz Marini: > > being in the process of deciding which net infrastructure to use for our > > next cluster (Myrinet, SCI/Dolphin or Quadrics), I was looking at the > > specs for the different types of hw. > > Provided that SCI/Dolphin implements RDMA, I was wondering why so little > > effort seems to be put into implementing a GSM solution for x86 clusters. > > Because MPI is what most people want to achieve code- and > performance-portability. Partially I may agree, partially not: MPI is not the best in the sense of portability (for example, optimization requires knowledge of the interconnect topology, which may vary from cluster to cluster, and of course from MPP to MPP computer). I think that if there is a relatively cheap and effective way to build a ccNUMA system from a cluster, it may have success. > > > The only (maybe big, maybe not) problem I see in the Dolphin hw is the > > lack of support for cache coherency. > > > > I think that having GSM support in (almost) commodity clusters would be > > a really-nice-thing(tm). > > Martin Schulz (formerly TU München, now Cornell Theory Center) has developed > exactly the thing you are looking for. See > http://wwwbode.cs.tum.edu/Par/arch/smile/software/shmem/ . You will also find > his PhD thesis there which describes the complete software. > > I do not know about the exact status of the SW right now (his approach > required some patches to the SCI driver, and it will probably be necessary to > apply them to the current drivers). Very interesting approach, though. > > Other, non SCI approaches like MOSIX and the various DSM/SVM libraries also > offer you some sort of global shared memory - but most do only use TCP/IP for > communication. > Joachim > Joachim Worringen - NEC C&C research lab St.Augustin > fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de Even a hardware implementation of CPU cache coherence for a large number of processors may become a bottleneck. Broadcast-based MOESI gives high coherence traffic; ccNUMA systems use a directory-based cache-coherence approach instead. Software solutions are in general not efficient, but hardware solutions (if they appear) will be expensive :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 22:31:15 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <19coKN-5n4-00@etnus.com> from "James Cownie" at Jul 16, 3 04:36:23 pm Message-ID: <200307161831.WAA02082@nocserv.free.net> According to James Cownie > > > > Because MPI is what most people want to achieve code- and > > > performance-portability.
> > > Partially I may agree, partially - not: MPI is not the best in the > > sense of portability (for example, optimiziation requires knowledge > > of interconnect topology, which may vary from cluster to cluster, > > and of course from MPP to MPP computer). > > MPI has specific support for this in Rolf Hempel's topology code, > which is intended to allow you to have the system help you to choose a > good mapping of your processes onto the processors in the system. Unfortunately I do not know about that codes :-( but for the best optimization I'll re-build the algorithm itself to "fit" for target topology. > > This seems to me to be _more_ than you have in a portable way on the > ccNUMA machines, where you have to worry about > > 1) where every page of data lives, not just how close each process is > to another one (and you have more pages than processes/threads to > worry about !) > > 2) the scheduler choosing to move your processes/threads around the > machine. Yes, but "by default" I beleive that they are the tasks of operating system, or, as maximum, the information I'm supplying to OS, *after* translation and linking of the program. > > > I think that if there is relative cheap and effective way to build > > ccNUMA system from cluster - it may have success. > > Which is, of course, what SCI was _intended_ to be, and we saw how > well that succeeded :-( > > -- Jim > James Cownie > Etnus, LLC. +44 117 9071438 > http://www.etnus.com Mikhail Kuzminsky Zelinsky Institute of Organic Chemsitry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 25 Jul 103 20:55:49 +0400 (MSD) Subject: Infiniband: cost-effective switchless configurations Message-ID: <200307251655.UAA08132@nocserv.free.net> It's possible to build 3-nodes switchless Infiniband-connected cluster w/following topology (I assume one 2-ports Mellanox HCA card per node): node2 -------IB------Central node-----IB-----node1 ! ! ! ! ----------------------IB----------------------- It gives complete nodes connectivity and I assume to have 3 separate subnets w/own subnet manager for each. But I think that in the case if MPI broadcasting must use hardware multicasting, MPI broadcast will not work from nodes 1,2 (is it right ?). OK. But may be it's possible also to build the following topology (I assume 2 x 2-ports Mellanox HCAs per node, and it gives also complete connectivity of nodes) ? : node 2----IB-------- C e n t r a l n o d e -----IB------node1 \ / \ / \ / \ / \ / \ / \--node3 node4-- and I establish also additional IB links (2-1, 2-4, 3-1, 3-4, not presenetd in the "picture") which gives me complete nodes connectivity. Sorry, is it possible (I don't think about changes in device drivers)? If yes, it's good way to build very small and cost effective IB-based switchless clusters ! BTW, if I will use IPoIB service, is it possible to use netperf and/or netpipe tools for measurements of TCP/IP performance ? 
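Since IPoIB shows up as an ordinary IP network interface, sockets-based tools such as netperf or NetPIPE should work over it unchanged. For a quick sanity check even a trivial sockets program is enough; a minimal sketch (the port number, transfer size and file name are arbitrary, and error handling is kept to a minimum):

    /* tcpbw.c - crude TCP throughput probe.
       Receiver:  ./tcpbw            (start this on one node first)
       Sender:    ./tcpbw <recv-ip>  (prints the achieved MB/s)      */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PORT    5001             /* arbitrary free TCP port   */
    #define CHUNK   (1 << 20)        /* 1 MB per send/recv call   */
    #define NCHUNKS 256              /* 256 MB total transfer     */

    static double now(void)          /* wall-clock time in seconds */
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1e-6 * tv.tv_usec;
    }

    int main(int argc, char **argv)
    {
        static char buf[CHUNK];
        struct sockaddr_in a;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        memset(&a, 0, sizeof a);
        a.sin_family = AF_INET;
        a.sin_port = htons(PORT);

        if (argc == 1) {                           /* receiver side */
            int c;
            a.sin_addr.s_addr = htonl(INADDR_ANY);
            bind(s, (struct sockaddr *) &a, sizeof a);
            listen(s, 1);
            c = accept(s, NULL, NULL);
            while (read(c, buf, CHUNK) > 0)
                ;                                  /* just drain the stream */
            close(c);
        } else {                                   /* sender side */
            double t0, mb = 0.0;
            int i;
            a.sin_addr.s_addr = inet_addr(argv[1]);
            if (connect(s, (struct sockaddr *) &a, sizeof a) < 0) {
                perror("connect");
                return 1;
            }
            t0 = now();
            for (i = 0; i < NCHUNKS; i++) {
                ssize_t k = write(s, buf, CHUNK);  /* count bytes actually sent */
                if (k <= 0)
                    break;
                mb += k / (1024.0 * 1024.0);
            }
            printf("%.1f MB/s\n", mb / (now() - t0));
        }
        close(s);
        return 0;
    }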
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 20 Aug 103 20:09:20 +0400 (MSD) Subject: SGE on AMD Opteron ? Message-ID: <200308201609.UAA08558@nocserv.free.net> Sorry, is here somebody who works w/Sun GrideEngine on AMD Opteron platform ? I'm interesting in any information - about binary SGE distribution in 32-bit mode, or about compilation from the source for x86-64 mode, under SuSE or RedHat distribution etc. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Aug 103 22:15:01 +0400 (MSD) Subject: PCI-X/133 NICs on PCI-X/100 Message-ID: <200308221815.WAA27091@nocserv.free.net> I'm interesting in any experience about work of PCI-X/133 NICs with PCI-X/100 slot. Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards work w/PCI-X/100 slots on Opteron-based mobos (most of them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro mobos) - i.e. how high is the probability that they are incompatible ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemnistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 14:49:07 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <200310201236.28901.kinghorn@pqs-chem.com> from "Donald B. Kinghorn" at Oct 20, 3 12:36:28 pm Message-ID: <200310211049.OAA18031@nocserv.free.net> According to Donald B. Kinghorn > > Does anyone know of any recent progress on parallel eigensolvers suitable for > beowulf clusters running over gigabit ethernet? > It would be nice to have something that scaled moderately well and at least > gave reasonable approximations to some subset of eigenvalues and vectors for > large (10,000x10,000) symmetric systems. > My interests are primarily for quantum chemistry. > In the case you think about semiempirical fockian diagonalisation, there is a set of alternative methods for direct construction of density matrix avoiding preliminary finding of eigenvectors. This methods are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. For non-empirical quantum chemistry diagonalisation usually doesn't limit common performance. In the case of methods like CI it's necessary to find only some eigenvectors, and it is better to use special diagonalization methods. There is special parallel solver package, but I don't have exact reference w/me :-( Mikhail Kuzminsky Zelinsky Inst. 
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 21 Oct 2003 22:10:23 +0400 (MSD)
Subject: parallel eigen solvers
In-Reply-To: <20031021150637.GA8076@plk.af.mil> from "Arthur H. Edwards" at Oct 21, 3 09:06:37 am
Message-ID: <200310211810.WAA08779@nocserv.free.net>

According to Arthur H. Edwards
>
> I should point out that density functional theory can be compute-bound on
> diagonalization. QUEST, a Sandia code, easily handles several hundred
> atoms, but the eigensolve dominates by ~300-400 atoms. Thus,
> intermediate-size diagonalization is of strong interest.
>
> Art Edwards
>

Yes, I agree with you about DFT.

Yours
Mikhail Kuzminsky

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 30 Dec 2003 18:23:32 +0300 (MSK)
Subject: [Beowulf] X-window, MPICH, MPE, Cluster performance test
In-Reply-To: from "D. Scott" at Dec 29, 3 11:27:21 am
Message-ID: <200312301523.SAA06085@nocserv.free.net>

According to D. Scott
>
> At last! My cluster is now online. I would like to thank everyone for their
> help. I am thinking of putting a website together covering my experience in
> putting this cluster together. Would this be of use to anyone? Is there a
> website that covers a top-100 list of small clusters?
> Now that it is online I would like to test it.
>
> MPICH comes with test programs, e.g. mpptest. The programs work and produce
> nice graphs. Is there any documentation/tutorial that explains the meaning
> of these graphs?
> MPICH also comes with the MPE graphics test programs, e.g. mandel. The
> problem is that I only have X-window installed on the master node. When I
> run pmandel, it returns an error stating that it cannot find the X-window
> shared library on the other nodes. How can I make X-window shared across
> the other nodes from the master node?

You may use NFS for access to the master node.

> That would save me installing GUI programs on the other nodes.
> This could be a related problem, but when I compiled "life" (which uses the
> MPE libraries) it returned errors that the MPE libraries are undefined. Any
> ideas?
> Can I install both LAM/MPI and MPICH-1.2.5 on the same machine?

Yes, of course you may work with both LAM and MPICH.

BTW, let me congratulate the Beowulf mailing list subscribers on the New Year!

Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
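On the question above about the meaning of the mpptest graphs: they essentially plot message-passing time (or bandwidth) against message size, measured with repeated point-to-point exchanges. A minimal hand-rolled ping-pong in C shows the kind of measurement behind such a graph; the message sizes and repetition count are arbitrary choices for the sketch:

/* Minimal MPI ping-pong: ranks 0 and 1 bounce a buffer back and forth
 * and report the average one-way time for each message size.
 * Run with two ranks, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Status st;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (int bytes = 1; bytes <= 1 << 20; bytes *= 4) {
        char *buf = calloc(bytes, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8d bytes: %10.2f us one-way\n",
                   bytes, dt / (2.0 * reps) * 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}

Half of the round-trip time at small sizes approximates the latency; bytes divided by the one-way time at large sizes approximates the bandwidth - the two regimes that mpptest-style plots show.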
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 23 Jan 2004 15:35:32 +0300 (MSK)
Subject: [Beowulf] cluster on suse
In-Reply-To: from "Anand TNC" at Jan 23, 4 10:40:43 am
Message-ID: <200401231235.PAA05593@nocserv.free.net>

According to Anand TNC
>
> Hi,
>
> I'm new to clustering...does anyone know of some clustering software which
> works on Suse 8.2 or Suse 9.0?

All of the usual cluster software will work successfully with SuSE Linux.
If you mean software *included* in the distribution as RPM packages, then
the answer is also yes - SuSE Linux has the most important things, such as
MPI, for example.

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Thanks
>
> regards,
>
> Anand
>
> --
> Anand TNC
> PhD Student,
> Engine Research Laboratory              U-55 IISc Hostels,
> Dept. of Mechanical Engg.,              Indian Institute of Science,
> Indian Institute of Science,            Bangalore 560 012.
> Bangalore 560 012.                      Ph: 080 293 2591
> Lab Ph: 293 2352                        080 293 2624
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 10 Feb 2004 21:27:22 +0300 (MSK)
Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?)
In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> from "Andrew Wang" at Feb 10, 4 11:42:32 am
Message-ID: <200402101827.VAA05978@nocserv.free.net>

According to Andrew Wang
> From comp.arch: "One of the things that the version
> 8.0 of the Intel compiler included was an
> "Intel-specific" flag."
>
> But it looks like the purpose is to slow down AMD:
> http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com
>
> If Intel releases 64-bit x86 CPUs and compilers, then
> AMD may get even better benchmark results.

The danger of this "slow-down" is not extremely large now: the SPEC CPU2000
results (perhaps the best obtained) published for "high-end" Opterons are
based on the Portland Group compiler, not on ifc.

Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Again, no matter how pretty the benchmark results
> look, in the end we still need to run on the real
> system. So, what's the point of having benchmarks?
>
> Andrew.
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
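Some background on how an "Intel-specific" run-time check can work at all: dispatch code typically reads the CPUID vendor string and/or feature bits and then selects a code path. A small sketch using gcc's <cpuid.h> helper (a gcc-specific header; the printed messages are illustrative only):

/* Read the CPUID vendor string ("GenuineIntel", "AuthenticAMD", ...) the
 * way run-time dispatchers do before choosing a code path.  gcc/x86 only. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID not supported\n");
        return 1;
    }
    /* the vendor string is laid out across EBX, EDX, ECX in that order */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);

    printf("vendor: %s\n", vendor);
    if (strcmp(vendor, "GenuineIntel") == 0)
        printf("a vendor-based dispatcher would take the fast path here\n");
    else
        printf("a vendor-based dispatcher would fall back to generic code here\n");
    return 0;
}

Dispatching on the feature bits from CPUID leaf 1 (SSE2 and friends) instead of on the vendor string is what avoids penalizing non-Intel CPUs that implement the same instructions.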
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 14 May 2004 22:27:21 +0400 (MSD)
Subject: [Beowulf] Athlon64 / Opteron test
In-Reply-To: <40A4E4D8.9010001@mscsoftware.com> from "Joe Griffin" at May 14, 4 08:25:12 am
Message-ID: <200405141827.WAA12362@nocserv.free.net>

According to Joe Griffin
>
> ...
> Below is a web site comparing IA32, IA64 (linux and HPUX), Opteron
> and an IBM P655 running AIX. The site should only be used to
> compare hardware platforms when running our software. I am sure
> that Fluent, LSTC/Dyna, Star-CD have similar sites. I recommend
> finding out about the software that you will be using.
>
> MSC.Nastran Hardware comparison:
>
> http://www.mscsoftware.com/support/prod_support/nastran/performance/v04_sngl.cfm
>
> Regards,
> Joe Griffin
>

This page contains very interesting tables with a description of the hardware
used, but at first look I found only data about the OSes, not about the
compilers/runtime libraries used. The (relatively bad) figures for the IBM
e325/Opteron 2 GHz look "nontrivial"; I believe some interpretation of "why?"
would be helpful.

Maybe some of the applications used are relatively cache-friendly and their
working sets fit in the large Itanium 2 cache? Maybe it depends on the
compiler and math library used?

BTW, for the LGQDF test: I/O is relatively small (please compare the elapsed
and CPU times, which are very close); but the Windows time for the Dell
P4/3.2 GHz (4480 sec) is much worse than for Linux on the same hardware
(3713 sec). IMHO, in this case they should be very close if the same
compilers and libraries are used (I don't like Windows, but this result is
too bad even for this OS :-))

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu, 10 Jun 2004 19:11:31 +0400 (MSD)
Subject: [Beowulf] Setting memory limits on a compute node
In-Reply-To: from "Brent M. Clements" at Jun 8, 4 10:42:43 am
Message-ID: <200406101511.TAA17314@nocserv.free.net>

According to Brent M. Clements
>
> We have a user who submits a job to a compute node.
>
> The application is Gaussian. The parent Gaussian process can spawn a few
> child processes. It appears that the Gaussian application is exhausting
> all of the memory in the system, essentially stopping the machine from
> working. You can still ping the machine but can't ssh. Anyway, I know the
> fundamentals of why this is happening. My question: is there any way to
> limit the total address space that a user's processes can use, so that
> it doesn't kill the node?

This situation may depend strongly on the actual calculation method used
within Gaussian (and maybe on the objects of the calculations, i.e. the
molecules). We run G98 jobs (I believe G03 will behave the same way) and have
not had such problems.

You may try to restrict (if it is really necessary) the memory used by a
particular Gaussian job by setting the %mem value in the Gaussian input data;
there are also default settings for the %mem value in the Gaussian
configuration file. G98 cannot exceed the %mem value. We tell our G98 users
the upper limit of the %mem value that does not lead to heavy paging.

You may also try to set ulimit/limit values for stack and data in the shell
script used to submit the G98 job.

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
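The ulimit suggestion can also be expressed directly in C as a tiny launcher that caps the address space of whatever job it execs; the 2 GB cap and the command being launched are placeholders for the sketch, and RLIMIT_AS is Linux-specific:

/* Toy launcher: set an address-space limit, then exec the real job.
 * Equivalent in spirit to "ulimit -v" in the submission script.
 * Hypothetical usage: ./memcap g98 job.com */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    /* placeholder cap: 2 GB of total address space */
    struct rlimit rl = { 2UL << 30, 2UL << 30 };
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    execvp(argv[1], &argv[1]);
    perror("execvp");   /* reached only if the exec failed */
    return 1;
}

Because resource limits are inherited across fork()/exec(), the cap also applies to any child processes the job spawns.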
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Wed, 16 Jun 2004 20:05:24 +0400 (MSD)
Subject: [Beowulf] CCL:Experiences with 64 bits AMD processors (fwd from
In-Reply-To: <20040616042135.GH12847@leitl.org> from "Eugen Leitl" at Jun 16, 4 06:21:35 am
Message-ID: <200406161605.UAA24654@nocserv.free.net>

According to Eugen Leitl
>
> From: Marc Noguera Julian
> Date: Tue, 10 Jun 2003 19:09:00 +0200
> To: chemistry at ccl.net
> Subject: CCL:Experiences with 64 bits AMD processors
> User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
>
> Hello,
> we are interested in buying some more computational resources. In our
> group we are interested in 64-bit AMD processors, but we do not know
> about their compatibility. They are supposed, as AMD says, to be 32-bit
> compatible, and therefore an AMD 64-bit processor should be able to run
> any 32-bit application. Is that true? Any experience about this will help
> us a lot.

We run, in particular, Gaussian-98 (the 32-bit binary version) on Opteron
servers with SuSE SLES8.

> By the way, we are running mainly Gaussian jobs, and have some other 32-bit
> binaries like Turbomole and Jaguar. We have a source code license for
> Gaussian 03. Has anyone tried to compile Gaussian 03 for an AMD 64-bit
> machine? Do 32-bit Pentium binaries run correctly on a 64-bit processor,
> and what is the increase in performance?

Yes, G03 has been compiled at least by Gaussian, Inc. itself: there is a
64-bit G03 binary version for Opteron in the price list. We see a significant
speed-up on Opteron in comparison with Athlons. We also run 32-bit binary
codes built for Pentium on Opteron.

> Do Turbomole and Jaguar binaries run on 64-bit AMD processors? Has anyone
> tried?
> Any information will be helpful.
> Thanks a lot
> Marc
>
> ---------------------------
> Marc Noguera Julian
> Tècnic Especialista de Suport a la Recerca
> Química Física, Universitat Autònoma de Barcelona.
> Tlf: 00-34-935812173
> Fax: 00-34-935812920
> e-mail: marc at klingon.uab.es
> ---------------------------------------
>
> Eugen* Leitl leitl
> ______________________________________________________________
> ICBM: 48.07078, 11.61144 http://www.leitl.org
> 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
> http://moleculardevices.org http://nanomachines.net
>

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 18 Jun 2004 20:15:23 +0400 (MSD)
Subject: [Beowulf] cluster on Mellanox Infiniband
Message-ID: <200406181615.UAA19878@nocserv.free.net>

We are purchasing a pair of Mellanox Infiniband 4x HCA cards (PCI-X/133) to
build a small 2-node, 4-processor switchless testing cluster based on AMD
Opteron with Tyan S2880 boards. The nodes run SuSE Linux 9.0 for AMD64.

I would greatly appreciate any information about the following:

1) Do we need to buy some additional software from Mellanox (like THCA-3 or
the HPC Gold CD distribution, etc.)?

2) Any information about potential problems in building and using this
hardware/software.

To be more exact, we also want to install MVAPICH (for MPI-1) or the new
VMI 2.0 from NCSA for MPI work. For example, VMI 2.0, I believe, requires
THCA-3 and the HPC Gold CD for installation. But I don't know whether we will
receive this software with the Mellanox cards or whether we should buy it
separately. I need this information badly, because we are very restricted in
money ;-) !

Thanks for your help!
Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Mon, 21 Jun 2004 17:46:23 +0400 (MSD)
Subject: [Beowulf] cluster on Mellanox Infiniband
In-Reply-To: from "Franz Marini" at Jun 21, 4 10:24:58 am
Message-ID: <200406211346.RAA17895@nocserv.free.net>

According to Franz Marini
> Hi,
>
> On Fri, 18 Jun 2004, Mikhail Kuzminsky wrote:
>
> > 1) Do we need to buy some additional software from Mellanox (like THCA-3
> > or the HPC Gold CD distribution, etc.)?
>
> You shouldn't have to.

Thank you VERY much for your fast reply!! I'm glad to hear it...

> > 2) Any information about potential problems in building and using this
> > hardware/software.
>
> > To be more exact, we also want to install MVAPICH (for MPI-1) or the new
> > VMI 2.0 from NCSA for MPI work.
> > For example, VMI 2.0, I believe, requires THCA-3 and the HPC Gold CD for
> > installation. But I don't know whether we will receive this software with
> > the Mellanox cards or whether we should buy it separately.
>
> Hrm, no, VMI 2.0 requires neither THCA-3 nor the HPC Gold CD (whatever
> it is ;)).

The NCSA site for VMI says "Infiniband device is linked against THCA-3.
OpenIB device is linked using HPC Gold CD distrib". What does that mean?
I have to install VMI for Opteron + SuSE 9.0; there is no such binary RPM,
i.e. I have to build VMI from source. I thought that I had to use the
software cited above to build my binary VMI version. I believe that the
THCA Linux 3.1.1 software/driver will be delivered with the Mellanox cards,
and OpenSM 0.3.1 - I hope - as well. But I know nothing about the
"HPC Gold CD distrib" :-(

> We have a small (6 dual Xeon nodes, plus server) testbed cluster with
> Mellanox Infiniband (switched, obviously).
>
> So far, it's been really good. We tested the net performance with SKaMPI4
> ( http://liinwww.ira.uka.de/~skampi/ ), the results should be in the
> online db soon, if you want to check them out.
>
> Seeing that you are at the Institute of Organic Chemistry, I guess you're
> interested in running programs like Gromacs or CPMD. So far both of them
> worked great with our cluster, as long as only one CPU per node is used
> (running two different runs of Gromacs and/or CPMD on both CPUs of each
> node gives good results, but running only one instance of either program
> on both CPUs of each node results in very poor scaling).

It looks like that causes contention on the bus to shared memory?

Thanks for the help
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Have a good day,
>
> Franz
>
>
> ---------------------------------------------------------
> Franz Marini
> Sys Admin and Software Analyst,
> Dept. of Physics, University of Milan, Italy.
> email : franz.marini at mi.infn.it
> ---------------------------------------------------------
>
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at eadline.org Mon Jul 16 15:48:53 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 16 Jul 2012 15:48:53 -0400
Subject: [Beowulf] A few Cluster Monkey things ...
Message-ID:

Happy summer everyone,

I have had a poll up for a while now on Cluster Monkey asking about social
media and HPC. If the interest in this poll is any indication, I think I can
guess the final results, but if you have a minute, head on over and take the
poll:

http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html

As always, our polls and results are on the site for your viewing. BTW, I
think it might be worthwhile to re-ask some of the older poll questions.

http://www.clustermonkey.net/Cluster/HPC-Polls-and-Surveys/

Also, if you have a burning question, let me know and I'll put it up as a
poll.

Finally, while you are there, check out the HPC500 program that Intersect360
has launched. Seems interesting and a great way to help influence the
industry.

http://clustermonkey.net/Select-News/are-you-leading-the-hpc-charge.html

Thanks!

Doug Eadline

--
Doug

--
Mailscanner: Clean

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From dnlombar at ichips.intel.com Mon Jul 16 16:20:28 2012
From: dnlombar at ichips.intel.com (David N. Lombard)
Date: Mon, 16 Jul 2012 13:20:28 -0700
Subject: [Beowulf] A few Cluster Monkey things ...
In-Reply-To:
References:
Message-ID: <20120716202028.GA29118@nlxcldnl2.cl.intel.com>

On Mon, Jul 16, 2012 at 03:48:53PM -0400, Douglas Eadline wrote:
>
> Happy summer everyone,
>
> I have had a poll up for a while now on Cluster Monkey asking about social
> media and HPC. If the interest in this poll is any indication, I think I
> can guess the final results, but if you have a minute, head on over and
> take the poll:
>
> http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html

Hmmm. This doesn't distinguish usages. It would be nice to see how people
view social media as a professional tool. Something like "What kind of
social media do you turn to for technical information?" The choices you
have for your question fit this, too :)

--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Mailscanner: Clean