From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 25 May 103 21:25:17 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC Message-ID: <200305251725.VAA20503@nocserv.free.net>
I'm testing some Fortran benchmarks on a 2-CPU Opteron 1.6 GHz server we want to use in a Beowulf cluster. In particular, I need to measure small time intervals, for which I want to use an RDTSC-based "function" (for example, the one attached below, published by T. Prince). But it requires some minor modifications, I believe, to work properly on x86-64. I use gcc-3.2 under SuSE SLES8 and call this function from source compiled by pgf90-5.0beta2 (64-bit mode). The original source version of the function by T. Prince gives assembler errors because i386 is not pre-defined. I simply defined both i386 and _M_IX86; gcc -c is now OK and creates a 64-bit object module, but after linking and running the test the measured time is wrong :-( (negative in some cases). I would appreciate any ideas on what I should modify in the source (attached below) to resolve the problem. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net
> ===================================================
> #define _IFC 1
>
> #define CLOCK_RATE 1600000000
> /* SET THIS AND RECOMPILE FOR TARGET MACHINE */
> #undef _WIN32
> /* set not to use API calls even on Windows */
> #ifdef _WIN32
> #include <windows.h>
> #endif
> unsigned long long int rdtsc( )
> {
> #ifdef _M_IA64
>
> unsigned __int64 __getReg(int whichReg);
> #pragma intrinsic(__getReg);
> #define INL_REGID_APITC 3116
>
> return __getReg(INL_REGID_APITC);
> #elif defined(_WIN32)
> unsigned long long int qpc;
> (void)QueryPerformanceCounter((LARGE_INTEGER *)&qpc);
> return qpc;
> #elif defined(__GNUC__)
> #ifdef i386
> long long a;
> asm volatile("rdtsc":"=A" (a));
> return a;
> #else
> unsigned long result;
> /* gcc-IA64 version */
> __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory");
> while (__builtin_expect ((int) result == -1, 0))
> __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory");
> return result;
>
> #endif
> #elif defined(_M_IX86)
> _asm
> {
> _emit 0x0f /* rdtsc */
> _emit 0x31
>
> }
> return;
> #else
> #error "only supports IA64,IX86,GNUC"
> #endif
> }
>
> #ifdef _G77
> double g77_etime_0__ (float tarray[2])
> #elif defined (_IFC)
> double g77_etime_0_ (float tarray[2])
> #else
> double g77_etime_0 (float tarray[2])
> #endif
>
> {
> static int win32_platform = -1;
> double usertime, systime;
>
> {
> static double clock_per=1./(long long)CLOCK_RATE;
> static unsigned long long int old_count;
> unsigned long long count;
> if(!old_count){
> #ifdef _WIN32
> unsigned long long int qpf;
> if(QueryPerformanceFrequency((LARGE_INTEGER *)&qpf))
> clock_per=1./(long long)qpf;
> #endif
> old_count=rdtsc();
> }
>
> count = rdtsc();
> tarray[0] = usertime = (long long)(count - old_count) * clock_per;
> tarray[1] = 0;
> }
> return usertime ;
>
> }
>
> #ifdef _G77
> void f90_cputime4__(float *time){ // Intel Fortran call
> #elif defined (_IFC)
> void f90_cputime4_(float *time){
> #else
> void f90_cputime4 (float *time){
> #endif
> float tarray[2];
> #ifdef _G77
> *time=(float)g77_etime_0__ (tarray);
> #else
> *time=(float)g77_etime_0_ (tarray);
> #endif
> }
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
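The negative intervals are consistent with a known pitfall in the i386 branch above: under gcc on x86-64 the "=A" constraint no longer denotes the edx:eax pair, so only part of the 64-bit counter reaches the variable and differences can come out negative after a wrap. A minimal sketch of a variant that should behave correctly under gcc in both 32-bit and 64-bit mode (an untested illustration only; the name rdtsc_gcc is arbitrary, and conversion to seconds still relies on the hard-coded CLOCK_RATE):

/* Hedged sketch: read the TSC through two 32-bit outputs instead of "=A",
   which works the same way for 32-bit and 64-bit gcc targets. */
static inline unsigned long long rdtsc_gcc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;   /* full 64-bit count */
}

/* Example use, mirroring the etime-style wrapper above:
   double seconds = (double)(rdtsc_gcc() - old_count) / CLOCK_RATE;  */

From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at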
free.net (Mikhail Kuzminsky) Date: Mon, 26 May 103 22:43:43 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC In-Reply-To: <200305251725.VAA20503@nocserv.free.net> from "Mikhail Kuzminsky" at May 25, 3 09:56:38 pm Message-ID: <200305261843.WAA10134@nocserv.free.net> According to Mikhail Kuzminsky > > I'm testing some fortran benchmarks on 2-CPUs Opteron 1.6 Hhz > server we want to use in Beowulf cluster. In particular, I need to measure > small time intervals, for which I want to use RDTSC-based "function" > (for example I attach below one - published by T.Prince). But it requires > some minor modifications, I beleive, to work properly on x86-64. > I found now that all is OK if I'm using calls from g77-33 (#define for 386 and _M_IX86 as I wrote in previous message are enough). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 11 Jun 103 22:10:00 +0400 (MSD) Subject: NAS Parallel Benchmarks for Current Hardware In-Reply-To: <3EE609F7.BE430A1E@ideafix.litec.csic.es> from "A.P.Manners" at Jun 10, 3 05:40:23 pm Message-ID: <200306111810.WAA01122@nocserv.free.net> According to A.P.Manners > > I am looking to put together a small cluster for numerical simulation > and have been surprised at how few NPB benchmark results using current > hardware I can find via google. > It's common situation w/NPB (in opposition to Linpack, SPECcpu e.a.) :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:05:31 +0400 (MSD) Subject: what is a flop In-Reply-To: <3EEF5F48.5020505@roma2.infn.it> from "Roberto Ammendola" at Jun 17, 3 08:34:48 pm Message-ID: <200306181605.UAA24772@nocserv.free.net> According to Roberto Ammendola > The "Floating point operations per clock cycle" depends on the > processor, obviously, and on which instructions you use in your code. > For example in a processor with the SSE instruction set you can perform > 4 operations (on 32 bit register each) per clock cycle. One processor > (Xeon or P4) running at 2.0 GHz can reach 8 GigaFlops. Taking into account that throughput of FMUL and FADD units in P4/Xeon is 2 cycles, i.e. FP result may be received on any 2nd sycle only, the peak Performance of P4/2 Ghz must be 4 GFLOPS. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:19:35 +0400 (MSD) Subject: SMP CPUs scaling factors (was "what is a flop") In-Reply-To: from "Franz Marini" at Jun 18, 3 10:53:17 am Message-ID: <200306181619.UAA24910@nocserv.free.net> According to Franz Marini > On Tue, 17 Jun 2003, Maurice Hilarius wrote: > > And I would say dual CPU boards do not sale at a factor of 2:1 over singles. 
> > ... > > As a general ( really general as it changes a lot with code and > compilers) > > the rule I know : > > Dual P3 ( VIA chipset): 1.5 : 1 > > Dual XEON P4 ( Intel 7501 chipset): 1.3 : 1 > ... > > Dual AthlonMP ( AMD 760MPX chipset) 1.4 : 1 > > Does anyone have some real world application figures regarding the > performance ratio between single and two-way (and maybe four-way) SMP > systems based on the P4 Xeon processor ? I may say about SMP speedups for AthlonMP/760MP, for P4 they will depends from chipset (kind of FSB and memory used). On G98 speedup for 2 CPUs is between 1.4-1.8 depending from calc. method and problem size. For Opteron/1.6 Ghz they are higher (up to 1.97 in some G98 tests). 4-way P4 SMP may be not too attractive if 4 CPUs will share common bus to memory. 4-way Opteron's system must be very good (they may be will arrive soon in the market). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:42:01 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: from "Stefano" at Jun 18, 3 11:22:25 pm Message-ID: <200306201342.RAA28782@nocserv.free.net> According to Stefano > As I am going to receive some funding this fall, I was wondering of buying > an opteron cluster for my research. > Mainlym the cluster will run VASP (an ab-initio quantum program, > written by a group in Wien), with myrinet. > Is somebody who is using AMD opterons yet ? We tested 2-way SMP server based on RioWorks mobo. But I should not recommend this motherboard for using: by default it has no monitoring (temperature etc) chips on the board, it's necessary to buy special additional card ! Unfortunately as a result I don't have data about lm_sensors work. Moreover, the choice of SMP boards is very restricted now: Tyan S2880 and MSI K8D. > ... > I think some fortran vendor has announced the port of their F90 to > the opteron. Well, it would be nice to recompile VASP for 64bits and see > how fast it goes. There is some possibilities: pgf90, Intel ifc(32 bit only), g77-3.3 (now really is very good, but f77 only) and Absoft. We tested 3 first compilers. But I'm not sure that you'll receive just now essential speed-up from 64 bit mode itself. SSE2 is supported in 32 bit mode also, but it looks that SSE2 in Opteron is realized "more worse" than in P4 (in the sense of microarchitecture). Yes, some compilers can now generate codes which use additional registers from x86-64 architecture extensions, but we didn't find essential speed-up on simple loops like DAXPY. > With the itanium2 (compiled in 2 version 32 and 64 > bits), it not so fast to justify the HUGE cost of an itanium cluster. > Maybe the opteron will shake high-performace scientific computing ! I beleive yes, but for 64-bit calculations. The price for Opteron- based servers is high, and price/performance ratio in comparison w/Xeon is not clear. 
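For reference, a minimal sketch of the kind of "simple loop like DAXPY" referred to above (the array length, repetition count and gettimeofday-based timing are arbitrary choices for illustration, not the actual test we ran):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* y = a*x + y, the classic DAXPY kernel: 2 flops per element */
static void daxpy(int n, double a, const double *x, double *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

int main(void)
{
    const int n = 1000000, reps = 100;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    struct timeval t0, t1;
    double sec;
    int i, r;

    if (!x || !y) return 1;
    for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < reps; r++)
        daxpy(n, 3.0, x, y);
    gettimeofday(&t1, NULL);

    sec = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
    printf("%.3f s, %.1f MFLOP/s\n", sec, 2.0 * n * reps / sec / 1e6);
    free(x); free(y);
    return 0;
}

Built once for 32-bit and once for 64-bit code generation (for example with -m32 versus the default 64-bit mode of a biarch gcc, or the corresponding pgcc/pgf90 switches), a loop like this is one way to check whether the extra x86-64 registers and SSE2 code generation actually change anything for a given compiler.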
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:57:28 +0400 (MSD) Subject: cluster of AOD Opteron (Stefano) In-Reply-To: <000401c33683$aaf403c0$0b01a8c0@redstorm> from "moor007@bellsouth.net" at Jun 19, 3 11:56:02 am Message-ID: <200306201357.RAA28995@nocserv.free.net> According to moor007 at bellsouth.net > I just received my hardware yesterday for my opteron cluster. My tech will > start putting it together today or tomorrow. I am building a 16 CPU cluster > w/ the 240 processor onboard the Tyan 2880. I will be using the 2D wulfkit > running SuSE enterprise server and Portland Group Server for the Opteron. I > am hoping it will be fast. Of course, that is relative. Anyway, I said all > that to say that I will begin posting performance benchmarks as they become > available. We compared Opteron/1.6 w/dual DDR266 CL2.5 and Athlon MP 1800+ w/close frequency (1533 MHz) and DDR266 also. Speedup for Gamess-US (ifc 7.1, opt for P4) and for binary G98 version (pgf77, optimized for PIII) on a set of different computational methods (in the sense of cache localization, memory throughput requirements etc) is about 1.5-1.9. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 18:09:51 +0400 (MSD) Subject: [OT] Maximum performance on single processor ? In-Reply-To: <4.3.2.7.2.20030620140207.00ae23a0@pop.freeuk.net> from "Simon Hogg" at Jun 20, 3 02:15:47 pm Message-ID: <200306201409.SAA29175@nocserv.free.net> According to Simon Hogg > > At 14:44 20/06/03 +0200, Marc Baaden wrote: > >I have an existing application which is part of a project. I have > >the source code. It is Fortran. It *can* be parallelized, but we > >would rather spend our time on the other parts of the project > >which need to be written from scratch *first*. > > > >The application is to run in real time, that is the user does something > >and as a function of user input and the calculation with the fortran > >program that I described, there is a correponding feedback to the > >user on the screen (and in some Virtual Reality equipment). > > > >Right now, even on simple test cases, the "response time" (eg calculation > >time for a single step) of our program is on the order of the second. > >(this is for an athlon MP 2600+) > >We need to get that down to a fraction of seconds, best milli-seconds, > >in order to be usable in real time. (makes it a factor of roughly 1000) > > > >As I said the code can indeed be parallelized - maybe even simply cleaned > >up in some parts - but unfortunately there remains very much other important > >stuff to do. So we'd rather spend some money on a really fast CPU and not > >touch the code at the moment. > > > >So my question was more, what is the fastest CPU I can get for $20000 > >at the moment (without explicitly parallelizing, hyperthreading or > >vectorizing my code). 
> > I'm sure some other people will give 'better' answers, but from having a > look at your web pages, I would be tempted to go down the route of > second-hand SGI equipment. > > For example (and no, I don't know how the performance stacks up, I'm > looking partly at a general bio-informatics / SGI link if that makes sense) > I can see for sale an Origin 2000 Quad 500MHz / 4GB RAM for UKP 15,725. W/o parallelization it looks as bad choice: any CPU will be more slow than the same Opteron or P4. If FP performance is important, Power4+ or Itanium 2 (or, more exactly, Madison one month later) may be the best choice. And, at least, optimize your program as possible :-) Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:48:28 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <005701c33792$c7c1ddf0$6501a8c0@sims.nrc.ca> from "Serguei Patchkovskii" at Jun 20, 3 09:16:44 pm Message-ID: <200306211348.RAA15586@nocserv.free.net> According to Serguei Patchkovskii > for Opteron- > > based servers is high, and price/performance ratio in comparison > > w/Xeon is not clear. > Once you start populating your systems with "interesting" amounts of memory > (i.e. anything above 2Gbytes), the price difference between dual Opterons > and > dual Xeons is really in the noise - at least at the places we buy. If your > suppliers > charge you a lot more for Opterons, may be you should look for another > source? > There is currently not "too wide" choice of possible sources of dual Opteron systems now in Russia :-) I agree that high memory price (for DIMMs from 1 GB, but the price will decrease) lower the percent of differences in total price, but if you use 512MB DIMMs for complectation, price difference is essential. Pls sorry: I assume, that in general the prices here in Russia are similar to other countries, but I didn't check just now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:16:15 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <1056121119.9688.7.camel@picard.lab.atipa.com> from "Curt Moore" at Jun 20, 3 09:58:40 am Message-ID: <200306211316.RAA15134@nocserv.free.net> According to Curt Moore > The RioWorks HDAMA (Arima) motherboard does have on-board sensors, > adm1026 based. 1) there is no information about environment monitoring chips in the HDAMA motherboard guide (at least in the guide we had) 2) sensors-detect utility (I used version from SuSe enterprise Linux beta-version distribution) didn't find any monitoring chips at the testing > Arima does have planned both a mini BMC which does just > management type functions and also a full BMC with will do other neat > things, I believe, such as KVM over LAN. Below is a lm_sensors dump > from an Arima HDAMA. It's good. But which lm_sensors version should be used and what are the necessary settings for lm_sensors kernel modules (taking into account that lm_sensors didn't find anything ) ? 
> > adm1026-i2c-0-2c > Adapter: SMBus AMD8111 adapter at 80e0 > Algorithm: Non-I2C SMBus adapter > in0: +1.15 V (min = +0.00 V, max = +2.99 V) > in1: +1.59 V (min = +0.00 V, max = +2.99 V) > in2: +1.57 V (min = +0.00 V, max = +2.99 V) > in3: +1.19 V (min = +0.00 V, max = +2.99 V) > in4: +1.18 V (min = +0.00 V, max = +2.99 V) > in5: +1.14 V (min = +0.00 V, max = +2.99 V) > in6: +1.24 V (min = +0.00 V, max = +2.49 V) > in7: +1.59 V (min = +0.00 V, max = +2.49 V) > in8: +0.00 V (min = +0.00 V, max = +2.49 V) > in9: +0.45 V (min = +1.25 V, max = +0.98 V) > in10: +2.70 V (min = +0.00 V, max = +3.98 V) > in11: +3.33 V (min = +0.00 V, max = +4.42 V) > in12: +3.38 V (min = +0.00 V, max = +4.42 V) > in13: +5.12 V (min = +0.00 V, max = +6.63 V) > in14: +1.57 V (min = +0.00 V, max = +2.99 V) > in15: +11.88 V (min = +0.00 V, max = +15.94 V) > in16: -12.03 V (min = +2.43 V, max = -16.00 V) > fan0: 0 RPM (min = 0 RPM, div = 2) > fan1: 0 RPM (min = 0 RPM, div = 2) > fan2: 0 RPM (min = 0 RPM, div = 2) > fan3: 0 RPM (min = 0 RPM, div = 2) > fan4: 0 RPM (min = 0 RPM, div = 1) > fan5: 0 RPM (min = 0 RPM, div = 1) > fan6: -1 RPM (min = 0 RPM, div = 1) > fan7: -1 RPM (min = 0 RPM, div = 1) > temp1: +37?C (min = -128?C, max = +80?C) > temp2: +46?C (min = -128?C, max = +100?C) > temp3: +46?C (min = -128?C, max = +100?C) > vid: +1.850 V (VRM Version 9.1) > Sorry, what does it means ? adm1026 has no enough possibilities to measure the values (in this case only 3 temperatures but no any RPM value) or lm_sensors version don't work correctly ? Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 24 Jun 103 20:12:35 +0400 (MSD) Subject: Opteron (x86-64) compute farms/clusters? In-Reply-To: <3EF809A4.1050802@dlr.de> from "Thomas Alrutz" at Jun 24, 3 10:19:48 am Message-ID: <200306241612.UAA09513@nocserv.free.net> According to Thomas Alrutz > > I just made some benchmarks on a Opteron 240 (1.4 GHz) node running with > Suse/United Linux Enterprise edition. > I have sucessfully compiled mpich-1.2.4 in 64 bit without any problems > (./configure -device=ch_p4 -commtype=shared). The default compiler is > the gcc-3.2.2 (maybe a Suse patch) and is set to 64Bit, the Portland > (5.0beta) compiler didn't worked at all ! > > I tried our CFD-code (TAU) to run 3 aerodynamik configurations on this > machine with both CPUs and the results are better then estimated. > We achieved in full multigrid (5 cycles, 1 equation turbulence model) a > efficiency of about 97%, 92% and 101 % for the second CPU. > Those results are much better as the results we get on the Intel Xeons > (around 50%). It looks that this results are predictable: Xeon CPUs require high memory bandwidth, but both CPUs share common system bus. Opteron CPUs have own memory buses and scale in this sense excellent. Better SPECrate results for Opteron (i.e. work on a mix of tasks) confirm (in particular) this features. CFD codes, I beleive, require high memory throughput ... 
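A quick way to see that shared-bus effect, independent of any application, is to time a simple out-of-cache triad loop with one copy running and then with one copy per CPU: on a shared front-side bus the per-process figure drops, while with per-CPU memory controllers it should stay roughly flat. A rough sketch (the array size is an arbitrary "much larger than cache" choice, and this is not the real STREAM benchmark):

#include <stdio.h>
#include <sys/time.h>

#define N (2 * 1024 * 1024)   /* three 16 MB arrays: far beyond any cache */

int main(void)
{
    static double a[N], b[N], c[N];
    struct timeval t0, t1;
    double sec, bytes;
    int i, r, reps = 10;

    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < reps; r++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];   /* triad: 2 loads + 1 store per element */
    gettimeofday(&t1, NULL);

    sec = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
    bytes = 3.0 * sizeof(double) * N * reps;
    printf("%.2f MB/s per process\n", bytes / sec / 1e6);
    return 0;
}

Running two instances at once (one per CPU) and comparing the per-process numbers gives a crude picture of how much memory bandwidth the second processor really adds.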
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 27 Jun 103 21:01:49 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFBEA29.60602@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 08:54:33 am Message-ID: <200306271701.VAA12659@nocserv.free.net> According to Daniel Pfenniger > > For a small experimental cluster (24 dual Xeon nodes) > we decided to use InfiniBand technology, which from specs is > 4 times faster (8Gb/s), 1.5 lower latency (~5musec) than > Myrinet for approximately the same cost/port. Could you pls compare them a bit more detailed ? Infiniband card costs (as I heard) about $1000-, (HCA-Net from FabricNetworks, former InfiniSwitch ?), what is close to Myrinet. But what is about switches (I heard about high prices) ? In particular, I'm interesting in very small switches; FabricNetworks produce 8-port 800-series switch, but I don't know about prices. May be there is 6 or 4 port switches ? BTW, is it possible to connect pair of nodes by means of "cross-over" cable (as in Ethernet), i.e. w/o switch ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 29 Jun 103 18:14:48 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFCA093.4090006@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 09:52:51 pm Message-ID: <200306291414.SAA12281@nocserv.free.net> According to Daniel Pfenniger > Patrick Geoffray wrote: > > On Fri, 2003-06-27 at 13:46, Daniel Pfenniger wrote: > >>The exact costs are presently not well fixed because several companies > >>enter the market. The nice thing about IB is that it is an open > >>standard, the components from different companies are compatible, > >>which is good for pressing costs down. > > > > With the slicon coming from one company (actually 2 but the second one > > does only switch chip), the price adjustment would mainly affect the > > reseller, where the margin are not that high. I don't expect much a > > price war in the Infiniband market, mainly because many IB shops are > > already just burning (limited) VC cash. > > The main point for price advantage of IB is if the volume goes up. It's > > a very different problem that the multiple-vendors-marketing-stuff. One > > can argue that HPC does not yield such high volumes, only a business > > market like the Databases one does. > > > > Remember Gigabit Ethernet. It was very expensive when the early adopters > > were the HPC crowd and the price didn't drop until it made its way to > > the desktop. It's the case for 10GE today. > > ... > > Patrick Geoffray > > Myricom, Inc. > > Yes I mostly agree with your analysis, database is the only significant > potential market for IB. > > However the problem with 1GBE or 10GBE is that the latency remains poor > for HPC applications, while IB goes in the right direction. 
> The real comparison to be made is not between GE and IB, but between > IB and Myricom products, which belong to an especially protected niche. > As a result for years the Myrinet products did hardly drop in price > for a sub-Moore's-law increase in performance, because of a lack of > competition (the price we paid for our Myricom cards and switch > 18 months ago is today *exactly* the same). I agree with you both. From the viewpoint of HPC clusters the IB competitor is Myrinet (and SCI etc). But there are many applications w/coarse-grained parallelism, where bandwidth is the main thing, not the latency (I think, quantum chemistry applications are bandwidth- limited). In this case (i.e. if latnecy is less important) 10Gb Ethernet is also IB competitor. Moreover, IB, I beleive, will be used for TCP/IP connections also - in opposition to Myrinet etc. (I beleive there is no TCP/IP drivers for Myrinet - am I correct ?) Again, from the veiwpoint of some real appilications, there are some applications which use TCP/IP stack for parallelization (I agree that is bad, but ...) - for example Linda tools (used in Gaussian) work over TCP/IP, Gamess-US DDI "subsystem" works over TCP/IP. In the case of IB or 10Gb Ethernet TCP/IP is possible. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Jul 103 20:27:51 +0400 (MSD) Subject: Linux support for AMD Opteron with Broadcom NICs In-Reply-To: <20030701224808.GA15167@stikine.ucs.sfu.ca> from "Martin Siegert" at Jul 1, 3 03:48:08 pm Message-ID: <200307031627.UAA02885@nocserv.free.net> According to Martin Siegert > > Hello, > I have a dual AMD Opteron for a week or so as a demo and try to install > Linux on it - so far with little success. > First of all: doing a google search for x86-64 Linux turns up a lot of > press releases but not much more, particularly nothing one could download > and install. Even a direct search on the SuSE and Mandrake sites shows > only press releases. Sigh. > Anyway: I found a few ftp sites that supply a Mandrake-9.0 x86_64 version. > Thus I did a ftp installation which after (many) hickups actually worked. > However, that distribution does not support the onboard Broadcom 5704 > NICs. I also could not get the driver from the broadcom web site to work > (insmod fails with "could not find MAC address in NVRAM"). > Thus I tried to compile the 2.4.21 kernel which worked, but > "insmod tg3" freezes the machine instantly. > Thus, so far I am not impressed. > For those of you who have such a box: which distribution are you using? > Any advice on how to get those GigE Broadcom NICs to work? I may only add to the list of AMD64-oriented distributions Turbolinux 8 for AMD64. I'm not sure that "promotional" version of Turbolinux is complete enough, but "commercial" version costs only about $70 (w/o support ;-)). BTW, does somebody try it ? We worked w/SuSE SLES8: it looks today as the only "reliable" choice of 64-bit ditribution :-( Let me congratulate our colleagues in USA w/4th July ! Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 18:28:33 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <200307161516.09818.joachim@ccrl-nece.de> from "Joachim Worringen" at Jul 16, 3 03:16:09 pm Message-ID: <200307161428.SAA28224@nocserv.free.net> According to Joachim Worringen > Franz Marini: > > being in the process of deciding which net infrastructure to use for our > > next cluster (Myrinet, SCI/Dolphin or Quadrics), I was looking at the > > specs for the different types of hw. > > Provided that SCI/Dolphin implements RDMA, I was wondering why so little > > effort seems to be put into implementing a GSM solution for x86 clusters. > > Because MPI is what most people want to achieve code- and > performance-portability. Partially I may agree, partially not: MPI is not the best in the sense of portability (for example, optimization requires knowledge of the interconnect topology, which may vary from cluster to cluster, and of course from MPP to MPP computer). I think that if there is a relatively cheap and effective way to build a ccNUMA system from a cluster, it may have success. > > > The only (maybe big, maybe not) problem I see in the Dolphin hw is the > > lack of support for cache coherency. > > > > I think that having GSM support in (almost) commodity clusters would be > > a really-nice-thing(tm). > > Martin Schulz (formerly TU München, now Cornell Theory Center) has developed > exactly the thing you are looking for. See > http://wwwbode.cs.tum.edu/Par/arch/smile/software/shmem/ . You will also find > his PhD thesis there which describes the complete software. > > I do not know about the exact status of the SW right now (his approach > required some patches to the SCI driver, and it will probably be necessary to > apply them to the current drivers). Very interesting approach, though. > > Other, non SCI approaches like MOSIX and the various DSM/SVM libraries also > offer you some sort of global shared memory - but most do only use TCP/IP for > communication. > Joachim > Joachim Worringen - NEC C&C research lab St.Augustin > fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de > Even a hardware implementation of CPU cache coherence for a large number of processors can become a bottleneck: broadcast-based MOESI generates heavy coherence traffic, which is why ccNUMA systems use a directory-based cache-coherence approach. Software solutions are in general not efficient, and hardware solutions (if they appear) will be expensive :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 22:31:15 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <19coKN-5n4-00@etnus.com> from "James Cownie" at Jul 16, 3 04:36:23 pm Message-ID: <200307161831.WAA02082@nocserv.free.net> According to James Cownie > > > > Because MPI is what most people want to achieve code- and > > > performance-portability.
> > > Partially I may agree, partially - not: MPI is not the best in the > > sense of portability (for example, optimiziation requires knowledge > > of interconnect topology, which may vary from cluster to cluster, > > and of course from MPP to MPP computer). > > MPI has specific support for this in Rolf Hempel's topology code, > which is intended to allow you to have the system help you to choose a > good mapping of your processes onto the processors in the system. Unfortunately I do not know about that codes :-( but for the best optimization I'll re-build the algorithm itself to "fit" for target topology. > > This seems to me to be _more_ than you have in a portable way on the > ccNUMA machines, where you have to worry about > > 1) where every page of data lives, not just how close each process is > to another one (and you have more pages than processes/threads to > worry about !) > > 2) the scheduler choosing to move your processes/threads around the > machine. Yes, but "by default" I beleive that they are the tasks of operating system, or, as maximum, the information I'm supplying to OS, *after* translation and linking of the program. > > > I think that if there is relative cheap and effective way to build > > ccNUMA system from cluster - it may have success. > > Which is, of course, what SCI was _intended_ to be, and we saw how > well that succeeded :-( > > -- Jim > James Cownie > Etnus, LLC. +44 117 9071438 > http://www.etnus.com Mikhail Kuzminsky Zelinsky Institute of Organic Chemsitry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 25 Jul 103 20:55:49 +0400 (MSD) Subject: Infiniband: cost-effective switchless configurations Message-ID: <200307251655.UAA08132@nocserv.free.net> It's possible to build 3-nodes switchless Infiniband-connected cluster w/following topology (I assume one 2-ports Mellanox HCA card per node): node2 -------IB------Central node-----IB-----node1 ! ! ! ! ----------------------IB----------------------- It gives complete nodes connectivity and I assume to have 3 separate subnets w/own subnet manager for each. But I think that in the case if MPI broadcasting must use hardware multicasting, MPI broadcast will not work from nodes 1,2 (is it right ?). OK. But may be it's possible also to build the following topology (I assume 2 x 2-ports Mellanox HCAs per node, and it gives also complete connectivity of nodes) ? : node 2----IB-------- C e n t r a l n o d e -----IB------node1 \ / \ / \ / \ / \ / \ / \--node3 node4-- and I establish also additional IB links (2-1, 2-4, 3-1, 3-4, not presenetd in the "picture") which gives me complete nodes connectivity. Sorry, is it possible (I don't think about changes in device drivers)? If yes, it's good way to build very small and cost effective IB-based switchless clusters ! BTW, if I will use IPoIB service, is it possible to use netperf and/or netpipe tools for measurements of TCP/IP performance ? 
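On the last question: since IPoIB shows up as an ordinary IP interface, netperf and the TCP mode of NetPIPE should run over it unchanged. At the MPI level, a minimal ping-pong (the kind of loop NetPIPE automates) is enough for a first latency/bandwidth check of each link; a sketch, with the message size taken from the command line and an arbitrary repetition count:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, reps = 1000;
    int nbytes = (argc > 1) ? atoi(argv[1]) : 1;
    char *buf;
    double t0, t1, rtt;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    buf = malloc(nbytes > 0 ? nbytes : 1);
    if (!buf) { MPI_Abort(MPI_COMM_WORLD, 1); }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {                 /* ping */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {          /* pong */
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        rtt = (t1 - t0) / reps;          /* average round-trip time */
        printf("%d bytes: %.1f us one-way, %.2f MB/s\n",
               nbytes, rtt / 2.0 * 1e6, 2.0 * nbytes / rtt / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}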
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 20 Aug 103 20:09:20 +0400 (MSD) Subject: SGE on AMD Opteron ? Message-ID: <200308201609.UAA08558@nocserv.free.net> Sorry, is here somebody who works w/Sun GrideEngine on AMD Opteron platform ? I'm interesting in any information - about binary SGE distribution in 32-bit mode, or about compilation from the source for x86-64 mode, under SuSE or RedHat distribution etc. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Aug 103 22:15:01 +0400 (MSD) Subject: PCI-X/133 NICs on PCI-X/100 Message-ID: <200308221815.WAA27091@nocserv.free.net> I'm interesting in any experience about work of PCI-X/133 NICs with PCI-X/100 slot. Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards work w/PCI-X/100 slots on Opteron-based mobos (most of them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro mobos) - i.e. how high is the probability that they are incompatible ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemnistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 14:49:07 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <200310201236.28901.kinghorn@pqs-chem.com> from "Donald B. Kinghorn" at Oct 20, 3 12:36:28 pm Message-ID: <200310211049.OAA18031@nocserv.free.net> According to Donald B. Kinghorn > > Does anyone know of any recent progress on parallel eigensolvers suitable for > beowulf clusters running over gigabit ethernet? > It would be nice to have something that scaled moderately well and at least > gave reasonable approximations to some subset of eigenvalues and vectors for > large (10,000x10,000) symmetric systems. > My interests are primarily for quantum chemistry. > In the case you think about semiempirical fockian diagonalisation, there is a set of alternative methods for direct construction of density matrix avoiding preliminary finding of eigenvectors. This methods are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. For non-empirical quantum chemistry diagonalisation usually doesn't limit common performance. In the case of methods like CI it's necessary to find only some eigenvectors, and it is better to use special diagonalization methods. There is special parallel solver package, but I don't have exact reference w/me :-( Mikhail Kuzminsky Zelinsky Inst. 
of Orgamic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 22:10:23 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <20031021150637.GA8076@plk.af.mil> from "Arthur H. Edwards" at Oct 21, 3 09:06:37 am Message-ID: <200310211810.WAA08779@nocserv.free.net> According to Arthur H. Edwards > > I should point out that density function theorcan be compute-bound on > diagonalization. QUEST, a Sandia Code, easily handles several hundred > atoms, but the eigen solve dominates by ~300-400 atoms. Thus, > intermediate size diagonalization is of strong interest. > > Art Edwards > Yes, I agree w/you about DFT. Yours Mikhail Kuzminsky _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 30 Dec 103 18:23:32 +0300 (MSK) Subject: [Beowulf] X-window, MPICH, MPE, Cluster performance test In-Reply-To: from "D. Scott" at Dec 29, 3 11:27:21 am Message-ID: <200312301523.SAA06085@nocserv.free.net> According to D. Scott > > At last! My cluster is now online. I would like to thank everyone for they > help. I thinking of putting a website together covering my experience in > putting this cluster together. Will this be of use to anyone? Is they > website that covers top 100 list of small cluster?. > Now it is online I would like to test it. > > MPICH comes with test program, eg mpptest. Programs works and it produce > nice graph. Is they any documentation/tutorial that explains meaning of > these graphs? > MPICH also comes with MPE graphic test programs, mandel. Problem is that I > have only got X-window installed on the master node. But, when I run > pmandel, it returms an error, staying that it can not find shared library > for X-window on other nodes. How can I make X-window shared across other > nodes from the Master node? You may use NFS for access to master node. > Same me install GUI programs on other nodes. > This could be related problem, but when I complied life (that uses MPE > libraries) it returns error that MPE libraries are undefined. Any ideas? > Can I install both LAM/MPICH and MPICH-1.2.5 on the same machine? Yes, of course you may work w/both LAM and MPICH. BTW, let me congratulate Beowulf maillist subscribers w/New Year ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 23 Jan 104 15:35:32 +0300 (MSK) Subject: [Beowulf] cluster on suse In-Reply-To: from "Anand TNC" at Jan 23, 4 10:40:43 am Message-ID: <200401231235.PAA05593@nocserv.free.net> According to Anand TNC > > Hi, > > I'm new to clustering...does anyone know of some clustering software which > works on Suse 8.2 or Suse 9.0? All of the usual cluster software will work succesfully w/SuSE Linux. 
If you say about software *included* in distribution as RPM-packages, then also yes, SuSE Linux has most important things such as MPI for example. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Thanks > > regards, > > Anand > > -- > Anand TNC > PhD Student, > Engine Research Laboratory U-55 IISc Hostels, > Dept. of Mechanical Engg., Indian Institute of Science, > Indian Institute of Science, Bangalore 560 012. > Bangalore 560 012. Ph: 080 293 2591 > Lab Ph: 293 2352 080 293 2624 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 10 Feb 104 21:27:22 +0300 (MSK) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> from "=?big5?q?Andrew=20Wang?=" at Feb 10, 4 11:42:32 am Message-ID: <200402101827.VAA05978@nocserv.free.net> According to =?big5?q?Andrew=20Wang?= > From comp.arch: "One of the things that the version > 8.0 of the Intel compiler included was an > "Intel-specific" flag." > > But looks like the purpose is to slow down AMD: > http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com > > If intel releases 64-bit x86 CPUs and compilers, then > AMD may get even better benchmarks results. The danger of this "slow-down" is not too extremally large now: SPECcpu2000 results (perhaps the best obtained) published for "high-end" Opterons are based on Portland compiler, not on ifc. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 14 May 104 22:27:21 +0400 (MSD) Subject: [Beowulf] Athlon64 / Opteron test In-Reply-To: <40A4E4D8.9010001@mscsoftware.com> from "Joe Griffin" at May 14, 4 08:25:12 am Message-ID: <200405141827.WAA12362@nocserv.free.net> According to Joe Griffin > > ... > Below is a web site comparing IA32, IA64 (linux and HPUX), Opteron > and an IBM P655 running AIX. The site should only be used to > compare hardare platforms when running our software. I am sure > that Fluent, LSTC/Dyna, Star-CD have similar sites. I recomend > finding out about the software that you will be using. > > MSC.Nastran Hardware comparison: > > http://www.mscsoftware.com/support/prod_support/nastran/performance/v04_sngl.cfm > > Regards, > Joe Griffin > This page contains very interesting tables w/description of hardware used, but at first look I found only the data about OSes, not about compilers/run time libraries used. The (relative bad) data for IBM e325/Opteron 2 Ghz looks "nontrivial"; I beleive some interptretation of "why?" will be helpful. 
May be some applications used are relative cache-friendly and have working set placing in large Itanium 2 cache? May be it depends from compiler and Math library used ? BTW, for LGQDF test: I/O is relative small (compare pls elapsed and CPU times which are very close); but Windows time for Dell P4/3.2 Ghz (4480 sec) is much more worse than for Linux on the same hardware (3713 sec). IMHO, in this case they must be very close in the case of using same comlilers&libraries (I don't like Windows, but this result is too bad for this OS :-)) Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 10 Jun 104 19:11:31 +0400 (MSD) Subject: [Beowulf] Setting memory limits on a compute node In-Reply-To: from "Brent M. Clements" at Jun 8, 4 10:42:43 am Message-ID: <200406101511.TAA17314@nocserv.free.net> According to Brent M. Clements > > We have a user who submits a job to a compute node. > > The application is gaussian. The parent gaussian process can spawn a few > child processes. It appears that the gaussian application is exhausting > all of the memory in the system essentially stopping the machine from > working. You can still ping the machine but can't ssh. Anyway's I know the > fundementals of why this is happening. My question, is there any way to > limit a user's total addressable space that his processes can use so that > it doesn't kill the node? This situation may depends strongly from real method of calculation used in frames of Gaussian (and may be from objects of calculations, i.e. molecules). We work w/G98 (I beleive G03 will have the same behaviour) jobs and didn't have like problems. You may try to restrict (if it's really necessary) the memory used for particular Gaussian job by means of setting up of %mem value in the input Gaussian data; there is also default settings for %mem value in gaussian configuration file. G98 can't exceed %mem value. We inform our G98 users about upper limit of %mem value which don't leads to high paging. You may also try to setup ulimit/limit values for stack and data in the shell script used for G98 job submitting . Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jun 104 20:05:24 +0400 (MSD) Subject: [Beowulf] CCL:Experiences with 64 bits AMD processors (fwd from In-Reply-To: <20040616042135.GH12847@leitl.org> from "Eugen Leitl" at Jun 16, 4 06:21:35 am Message-ID: <200406161605.UAA24654@nocserv.free.net> According to Eugen Leitl > > > From: Marc Noguera Julian > Date: Tue, 10 Jun 2003 19:09:00 +0200 > To: chemistry at ccl.net > Subject: CCL:Experiences with 64 bits AMD processors > User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 > > Hello, > we are interested in buying some more computational resources. In our > group we are interested in 64 bit AMD processors, but we do not know > about their compatibility. 
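One application-independent way to do what is asked here is to cap each process's address space with setrlimit(RLIMIT_AS), which is the same limit "ulimit -v" sets in a submit script and is inherited by spawned child processes, so a runaway job gets allocation failures instead of swapping the node to death. A hedged sketch of a tiny wrapper (the 2 GB figure is only an example; the Gaussian-specific %mem approach is discussed in the reply that follows):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical wrapper: cap the address space, then exec the real job.
   Children spawned by the job inherit the limit. */
int main(int argc, char **argv)
{
    struct rlimit rl;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    rl.rlim_cur = rl.rlim_max = (rlim_t)2048 * 1024 * 1024;  /* example: 2 GB */
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    execvp(argv[1], &argv[1]);
    perror("execvp");   /* only reached if exec fails */
    return 1;
}

Note that this caps each process separately, not the sum over all of a user's processes on the node.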
They are supposed, as AMD says, to be32 bit > compatible and therefore AMD 64 bit processor should be able to run any > 32 bit application. Is that true? Any experience about this will help us > a lot. We run, in particular, Gaussian-98 (32 bit binary version) on Opteron servers w/SuSE SLES8. > By the way, we are running mainly gaussian jobs, and have some other 32 > bit binaries like turbomole and jaguar. We have source code license for > gaussian 03. Has anyone tried to compile Gaussian 03 for a AMD 64 bit > machine? Do 32 bit pentium binaries run correctly on a 64 bit processor > which is the increase on the performance? Yes, G03 is compiled at least by Gaussian, Inc itself: there is G03 64-bit binary version for Opteron in the price list. We have significant speed-up on Opteron in comparison w/Athlons. We run also 32-bit binaries codes translated for Pentium on Opteron. > Do Turbomole and Jaguar > binaries run on 64 bit AMD processors? anyone tried? > Any information will be helpful. > Thanks a lot > Marc > > --------------------------- > Marc Noguera Julian > Thcnic Especialista de Suport a la Recerca > Qummica Fisica, Universitat Autrnoma de Barcelona. > Tlf: 00-34-935812173 > Fax: 00-34-935812920 > e-mail: marc at klingon.uab.es > --------------------------------------- > > Eugen* Leitl leitl > ______________________________________________________________ > ICBM: 48.07078, 11.61144 http://www.leitl.org > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > http://moleculardevices.org http://nanomachines.net > Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Jun 104 20:15:23 +0400 (MSD) Subject: [Beowulf] cluster on Mellanox Infiniband Message-ID: <200406181615.UAA19878@nocserv.free.net> We are purchasing a pair of Mellanox Infiniband 4x HCA cards (PCI-X/133) for building of small 2-nodes 4-processor switchless testing cluster on the base of AMD Opteron w/Tyan S2880 boards. The nodes work under SuSE Linux 9.0 for AMD64. I'll be very appreciate in receiving any information about following: 1) Do we need to buy some additional software from Mellanox ? (like THCA-3 or HPC Gold CD Distrib etc) 2) Any information about potential problems of building and using of this hard/software. To be more exactly, we want to install also MVAPICH (for MPI-1) or new VMI 2.0 from NCSA for work w/MPI. For example, VMI 2.0, I beleive, requires THCA-3 and HPC Gold CD for installation. But I don't know, will we receive this software w/Mellanox cards or we should buy this software additionally ? I need this data badly, because we are very restricted in money ;-) ! Thanks for your help ! 
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:09:39 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 21 Jun 104 17:46:23 +0400 (MSD) Subject: [Beowulf] cluster on Mellanox Infiniband In-Reply-To: from "Franz Marini" at Jun 21, 4 10:24:58 am Message-ID: <200406211346.RAA17895@nocserv.free.net> According to Franz Marini > Hi, > > On Fri, 18 Jun 104, Mikhail Kuzminsky wrote: > > > 1) Do we need to buy some additional software from Mellanox ? > > (like THCA-3 or HPC Gold CD Distrib etc) > > You shouldn't have to. Thank you VERY much for your fast reply !! I'm glad to hear ... > > 2) Any information about potential problems of building and using > > of this hard/software. > > > To be more exactly, we want to install also MVAPICH (for MPI-1) or > > new VMI 2.0 from NCSA for work w/MPI. > > For example, VMI 2.0, I beleive, requires THCA-3 and HPC Gold CD for > > installation. But I don't know, will we receive this software w/Mellanox > > cards or we should buy this software additionally ? > > Hrm, no, VMI 2.0 doesn't require neither THCA-3 nor HPC Gold CD (whatever > it is ;)). The NCSA site for VMI says "Infiniband device is linked against THCA-3. OpenIB device is linked using HPC Gold CD distrib". What does it means ? I must install VMI for Opteron + SuSE 9.0, there is no such binary RPM, i.e. I must install VMI from the source. I thought that I must use software cited above for building of my bibary VMI version. I beleive that Software/Driver THCA Linux 3.1.1 will be delivered w/Mellanox cards. OpenSM 0.3.1 - I hope, also. But I don'n know nothing about "HPC Gold CD distrib" :-( > > We have a small (6 dual Xeon nodes, plus server) testbed cluster with > Mellanox Infiniband (switched, obviously). > > So far, it's been really good. We tested the net performance with SKaMPI4 > ( http://liinwww.ira.uka.de/~skampi/ ), the results should be in the > online db soon, if you want to check them out. > > Seeing that you are at the Institute of Organic Chemistry, I guess you're > interested in running programs like Gromacs or CPMD. So far both of them > worked great with our cluster, as far as only one cpu per node is used > (running two different runs of gromacs and/or CPMD on both cpus on each > node gives good results, but running only one instance of either program > on both cpus on each node results in very poor scaling). It looks that it gives conflicts on bus to shared memory ? Thanks for help Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow > > Have a good day, > > Franz > > > --------------------------------------------------------- > Franz Marini > Sys Admin and Software Analyst, > Dept. of Physics, University of Milan, Italy. > email : franz.marini at mi.infn.it > --------------------------------------------------------- > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at eadline.org Mon Jul 16 15:48:53 2012 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 16 Jul 2012 15:48:53 -0400 Subject: [Beowulf] A few Cluster Monkey things ... 
Message-ID: Happy summer everyone, I have had a poll up for while now on Cluster Monkey asking about social media and HPC. If the interest in this poll is any indication, I think I can guess the final results, but if you have a minute, head on over and take the poll: http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html As always our polls and results are on the site for your viewing. BTW, I think it might be worth while to re-ask some of the older poll questions. http://www.clustermonkey.net/Cluster/HPC-Polls-and-Surveys/ Also, if you have a burning question, let me know I'll put it up as a poll. Finally, while you are there check out the HPC500 program that Intersect360 has launched. Seems interesting and great way to help influence the industry. http://clustermonkey.net/Select-News/are-you-leading-the-hpc-charge.html Thanks! Doug Eadline -- Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dnlombar at ichips.intel.com Mon Jul 16 16:20:28 2012 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Mon, 16 Jul 2012 13:20:28 -0700 Subject: [Beowulf] A few Cluster Monkey things ... In-Reply-To: References: Message-ID: <20120716202028.GA29118@nlxcldnl2.cl.intel.com> On Mon, Jul 16, 2012 at 03:48:53PM -0400, Douglas Eadline wrote: > > Happy summer everyone, > > I have had a poll up for while now on Cluster Monkey asking about social > media and HPC. If the interest in this poll is any indication, I think I > can guess the final results, but if you have a minute, head on over and > take the poll: > > http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html Hmmm. This doesn't distinguish usages. It would be nice to see how people view social media as a professional tool. Something like "What kind of social media do you turn to for technical information?" The choices you have for your question fit this, too :) -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From newton at netific.com.netific.com Mon Jul 23 12:44:16 2012 From: newton at netific.com.netific.com (Wing Newton) Date: Mon, 11 Sep 100 08:02:09 -0700 (PDT) Subject: multi-ethernets LAN Message-ID: <200009111502.IAA30083@ws132.netific.com> Greetings, I am looking a Linux driver for combining multiple ethernet segments into 1 LAN using several ethernet cards to scale the LAN bandwidth from 10/100 to x*10/100. Thank you for your help.
Newton _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 25 Sep 100 23:04:28 +0400 (MSD) Subject: kickstart RH6.2 installation problems Message-ID: <200009251904.XAA01603@nocserv.free.net> Dear netters, I'm installing RH6.2 on the nodes of new Beowulf cluster. I want to do kickstart RH 6.2 installation on HDDs, having the necessary partitions (but they are w/o ext2fs created). But after start of installation I receive following traceback messages: File /usr/bin/anaconda, line 341 in ? extraModules=extraModules) File /usr/lib/anaconda/todo.py, line 332 in __init__ self.setClass(instClass) File /usr/lib/anaconda/todo.py, line 822, in setClass todo.addmount(dev,mntpoint,fstype,reformat) File /usr/lib/anaconda/todo.py, line 395, in addMount and install exited abnormally. It looks that the problems are w/partitions. I'll be very appreciate in ideas what is reason of errors. ks.cfg file contents is : lang en_US network --bootproto static --ip 192.168.0.10 --netmask 255.255.255.0 --gateway 192.168.0.4 ### Source File Location cdrom keyboard us ### Partitioning Information #zerombr yes zerombr no #clearpart --linux part / --size 141 --onpart hda2 ^^^^^^^^^^ (the result dosn't depend from the presence of size keywords) part swap --size 133 --onpart hda3 part /usr --size 3004 --onpart hda5 install ### Mouse Configuration mouse genericps/2 --emulthree ### Time Zone Configuration timezone --utc US/Eastern ### X Configuration xconfig --vsync 60 ### Root Password Designation rootpw paSSword ### Authorization Configuration auth --useshadow --enablemd5 ### Lilo Configuration lilo --linear --location mbr ### Package Designation %packages @ Base chkfontpath groff-perl ... ### Commands To Be Run Post-Installation %post echo "This is in the chroot" > /tmp/message Thanks for your help ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 5 Nov 101 23:11:12 +0300 (MSK) Subject: Athlon MP vs Athlon XP Message-ID: <200111052011.XAA04207@nocserv.free.net> Dear colleagues, I think about buying of Tyan S2460 motherboards for Beowulf. According the data I have, Athlon XP (Palomino core) microprocessors can work successfully w/this mobos. But there is also Athlon MP microprocessors w/same Palomino core w/same OPGA package w/same voltages and w/same frequencies beginning from 1333 (1500+). They costs, as I understand, higher than corresponding MP models. Sorry, what is the difference between MP and XP chips ? Both, if my source was correct, supports cache coherence. 
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 6 Nov 101 19:59:45 +0300 (MSK) Subject: Athlon MP vs Athlon XP In-Reply-To: <3C4B2812.D133809D@lnxi.com> from "Patrice Duffort" at Jan 20, 2 01:26:58 pm Message-ID: <200111061659.TAA10767@nocserv.free.net> According to Patrice Duffort > > Dear Mikhail, > > The XPs and MPs have the same core but the XP is essentially a crippled MP. Lesser overall performance. > I'm sorry, do you have some test results of performance or some data about microarchitecture difference ? It's not too obviously how to prepare chip w/decreased MP performance, but working correctly in SMP environment. For example, I should suppress something like split transactions handling etc. It looks too expensive to prepare special chip with a bit different microarchitecture. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 3 Dec 101 20:02:08 +0300 (MSK) Subject: GCC/Fortran 90/95 questions In-Reply-To: <20011201035522.85826.qmail@web14706.mail.yahoo.com> from "Ron Chen" at Nov 30, 1 07:55:22 pm Message-ID: <200112031702.UAA12622@nocserv.free.net> According to Ron Chen > There is a compiler called open64, which is SGI's > compiler for IA64. They have a C front-end, which is > based on gcc, and they have another for f90. (I don't > know the details...) > Recently, they have ported the f90 front-end and > run-time to other compiler back-ends. Please read the > note below for details. > http://open64.sourceforge.net/ > http://sourceforge.net/tracker/?group_id=34861&atid=413342 > > ... > > =========================================================== > Porting open64 F90 front-end to Solaris > This patch ports the open64 Fortran90 compiler front > end to sparc_solaris platform. Specifically, it ports > these three executable programs: "mfef90", "ir_tools", > and "whirl2f". ANY OTHER COMPONENT OF OPEN64 IS NOT IN > THE SCOPE OF THIS PATCH. > Tested platforms include sparc_solaris, mips_irix and > ia32_linux, using both GNU gcc and vendor compiler. > Makefiles, some header files and some c/c++ source > files were modified for porting. It's very interesting information. As I know, SGI discontinuued the development and support of SGI Pro64 developmnet tools. Sorry, where you found the data about IA-32 /Linux platform support by open64? 
At the first look I don't see them on references you sent :-( Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 10 Dec 101 21:15:25 +0300 (MSK) Subject: NetGear FA-31x In-Reply-To: from "Donald Becker" at Dec 10, 1 12:41:03 pm Message-ID: <200112101815.VAA06836@nocserv.free.net> According to Donald Becker > On Mon, 10 Dec 2001, Javier Iglesias wrote: > > There was a post some time ago that mentioned problems > > using NetGear FA311 NICs > > (-> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=5) > > with some "specific AMD chipsets" > > (-> http://www.beowulf.org/pipermail/beowulf/2001-October/001668.html) > That wasn't a specific report. It was pretty much "something doesn't > work". > > > We are experiencing some problems getting the highly > > recommended FA310 cards > > (-> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4) > > The FA310 (Lite-On PNIC-2 or ADM Comet chip) is completely unrelated to > the FA311 (NatSemi DP83815 chip). Any problem common between the two is > likely from the motherboard, not the NIC. I heard about some problems w/south bridge on Tyan Tiger MP, but (if I'm correct) somebody (sorry, I don't remember) wrote in our mailing list that some GigE NICs, in particular from Intel, works successfully w/Tiger MP. We have both (Tiger MP and Intel Pro 1000 T), but didn't check them. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 11 Jan 102 20:00:50 +0300 (MSK) Subject: Fastest Intel Processors In-Reply-To: from "Joel Jaeggli" at Jan 11, 2 08:38:35 am Message-ID: <200201111700.UAA11054@nocserv.free.net> According to Joel Jaeggli > > There are 2.2ghz p4's, these are based on the .13 micron northwood core > rather than the willamete. to date I haven't heard of anyone having issues > with these... drop one on your socket 478 mainbaord and go to town... ;) > As I understand, It'll be not right for all the motherboards. Northwood will have different available voltage values for different I (ampers), so you need really special VRM version which may be not present on your motherboard. At least for Tualatin core it's just as I said. Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 7 Feb 102 22:50:17 +0300 (MSK) Subject: PCI-64: how to find Message-ID: <200202071950.WAA27670@nocserv.free.net> Dear netters. I want to find some confirmation that my installed RH 7.2 "understand" that it works with PCI-64. (We are using some dual mobos from Tyan which supports 64-bit PCI slots). We have Intel Pro1000T NICs installed in PCI-64 slots, but I didn't find any information that Linux works in "PCI-64" mode. 
May be this information must be presented somewhere (/proc/pci, /var/log/messages etc ) ? Thanks for help ! Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 15 Feb 102 22:58:29 +0300 (MSK) Subject: Intel Pro/1000T frames dropping&overruns Message-ID: <200202151958.WAA27863@nocserv.free.net> Dear netters, we are testing connection of 2 dual nodes w/Pentium III Tualatin 1266 Mhz CPUs (Tyan Thunder S2518 mobos). Both nodes uses Intel Pro/1000T NIC installed on PCI-64/66 Mhz slots. (They are "old" cards: we buy them in begin of 2001; if I'm correct, Intel produce now new modification). RH 7.2 (kernel 2.4.7-10enterprise, driver e1000 from RH distribution) is installed on both nodes. We found that ping -s 2048 or TCP_STREAM tests of netperf-2.2alpha leads to "hang up" of connection. Ifconfig says that there is Rx errors: for something about 300 packets received we have about 30 dropped and overrruns. Setting options e1000 RxIntDelay=nn (nn was decreased from 64 to 48, 32, 8, 1,0) didn't help. Setting of Jumbo=0 don't helps also. If we transmit small packages (usual ping w/o -s), there is no problems. Both NICs worked successfully on Athlon/700 Mhz 1-CPU nodes (but w/PCI-32 and 33 Mhz). I beleive that PIII/1266 CPU performance is enough for GigE; but I don't know what may be the other source of packets droppings. Should I try new e1000 version from Intel ? I'll be very appreciate in any ideas how to improve the situation. Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 15 Mar 102 22:03:14 +0300 (MSK) Subject: gige benchmark performance In-Reply-To: from "Mark Hartner" at Mar 14, 2 08:40:57 pm Message-ID: <200203151903.WAA02340@nocserv.free.net> According to Mark Hartner > > For the Intel Pro/1000T and Netgear GA620 there was only a slight > performance difference between 32 and 64 bit PCI (the 64bit slot did > slightly better). We found very big difference between 32-bit PCI on Athlon/700 Mhz nodes (Gigabyte GA-7VX mobos) and 64-bit/33 Mhz PCI on Tyan S2460 for Intel Pro/1000T cards. 32-bit PCI gives for netperf TCP_STREAM tests only about 300 Mbit/s, but on S2460 we received excellent results - about 910 Mbit/s for TCP_STREAM. The tests looks as not CPU bound. Theoretically it may be also due to difference in software : for Athlon/700 we tested RH 6.2 (2.2.14-5 kernel) and RH 7.1 (if I remember correctly, kernel 2.4 was standard in distribution) w/3.0.10 version of e1000. On S2460 we worked w/RH 7.2 (2.4.7-10) and e1000-drivers 3.1.22 and 4.0.7 (last was found as much more stable, but it's other talk...). But I beleive that the difference is too high and the reason of difference is hardware (if there is no problems w/south bridge on GA-7VX, it must be the difference in PCI buses). 
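For reference, TCP_STREAM figures like these come from netperf invocations of roughly the following form; a minimal sketch, where the peer host name, socket buffer sizes and send size are only illustrative:

# on the receiving node
netserver

# on the sending node: 60-second TCP_STREAM run against node2
netperf -H node2 -t TCP_STREAM -l 60 -- -s 262144 -S 262144 -m 16384

The test-specific options after "--" (local/remote socket buffers, message size) matter a great deal at gigabit speeds, so they should be held identical when comparing the 32-bit and 64-bit PCI configurations.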
Mikhail Kuzminsky, Zelinsky Institute of Organic Chemistry, Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 26 Mar 102 22:59:52 +0300 (MSK) Subject: LFS and Fortran in Scyld Beowulf In-Reply-To: <3CA0875E.3030604@mscsoftware.com> from "Joe Griffin" at Mar 26, 2 06:36:14 am Message-ID: <200203261959.WAA08503@nocserv.free.net> According to Joe Griffin > > g77 may be recompiled for large files. > We use: -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > This allows: > Formatted/Sequential > 4 Gb > Unformatted/Sequential < 4 Gb > Unformatted/Direct < 4 Gb > Not quite limitless, but is gains from > the 2 Gb limit. I think that like restrictions are very importatn for many "beowulfers". By my opinion, large unformatted files are more important than formatted, and 4 Gbytes restriction is inappropriate. Of course, below I say about 32-bit CPUs( IA-32). There is 2 different ways to supprot large files: a) use special subset of system calls to open/read/write which allows to work with large files. This leads potentially to possible changes in the source. The pluses of this way is that they may be realized for more old versions of kernel. b) Use modern features of kernels 2.4.x and ext3fs Unfortunately I'm not familiar w/restrictions in file sizes at usual system calls in this environment. But theoreticaly it's clear that I want to have large files w/usual open/read/write. Sorry, if you say about restrictions in sizes (here and below) - what do you mean - the ways w/changing of compiler source (and run-time library sources) for more old kernels, or it's necessary also for 2.4.x&ext3fs ? > The CURRENT Intel compiler - 5.0.1 has a > 2 Gb limit. The next release to come out > at the end of this month (6.0) will have > the limit removed. > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 27 Mar 102 20:50:29 +0300 (MSK) Subject: LFS and Fortran Message-ID: <200203271750.UAA16216@nocserv.free.net> > On Tue, 26 Mar 102, Mikhail Kuzminsky wrote: > > According to Joe Griffin > > > g77 may be recompiled for large files. > > > We use: -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > .. > > > Unformatted/Sequential < 4 Gb > > I think that like restrictions are very importatn for many > > "beowulfers". By my opinion, large unformatted files are more > > important than formatted, and 4 Gbytes restriction is inappropriate. > > LFS support is much more important for Beowulf systems than the average > workstation user. > > > a) use special subset of system calls to open/read/write > .. > > b) Use modern features of kernels 2.4.x and ext3fs > > The LFS kernel support has some limitations, many of which remain true > for 2.4 kernels. First, the offset is really only 40/41/42 bits, not 64 > bits, because we still use 32 bit block offsets with 512/1K blocks. > Very few places have a single file larger than 4TB, so this isn't a > current problem. 
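To make the -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE approach concrete, here is a minimal C sketch (file name and offset are only illustrative): with those defines glibc makes off_t 64-bit and transparently maps the ordinary open/lseek/write calls onto their 64-bit variants, so the same source can cross the 2 GB line without calling open64() etc. explicitly.

/* minimal LFS sketch; compile with:
 *   gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE lfs_test.c -o lfs_test
 * needs a reasonably recent glibc and a kernel/filesystem with LFS support */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big.dat", O_WRONLY | O_CREAT, 0644);
    off_t where = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GB, past the 2 GB limit */

    if (fd < 0) { perror("open"); return 1; }
    if (lseek(fd, where, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }
    write(fd, "x", 1);   /* creates a sparse file with a byte beyond 2 GB */
    close(fd);
    return 0;
}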
> In many cases 40-42 bits is enough (at least it gives more disc space than it exist on the node - if I use some modern HDDs per node). My questions concern more to Linux itself than to Scyld distribution. There is 3 possible "levels" of large files support for application programs written on F77 a) I use statically linked binaries created under more old Linux versions. I suppose that this binaries use standard run-time libraries w/o special 64-bit open/read/write calls. Is it possible to work w/large files if I'll run this binaries on 2.4.x w/ext3 or ext2 ? b) I use the same binaries but w/dynamic linking, and may change g77/pgf77/ifc run-time library to more new version. Is it enough for work w/large files under 2.4.x w/ext3 or ext2 ? c) It's possible to re-translate f77 source with some modern version of compiler to receive large files support. It's clear now that g77 3.1 (or specially precompiled but more old g77 version) and ifc 6.0 will be enough ; but was is about pgf77 versions ? Yours Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 27 Mar 102 21:26:55 +0300 (MSK) Subject: e1000-4.1.7 (new driver version) Message-ID: <200203271826.VAA16681@nocserv.free.net> I've received today the information about availability of new version of e1000 driver for Intel Pro/1000T NICs. Taking into account that upgrade from more old to previous version of e1000 was very important for our cluster based on Athlon Tyan S2460 nodes (w/o this upgrade netperf tests simple hang-up connection after a series of packets droppings and overruns; but the problems remain on dual PIII Tualatin Tyan mobos), this information may be interesting also for subscribers of Beowulf mail list. According to issuppor at mailbox.cps.intel.com > From issuppor at mailbox.cps.intel.com Wed Mar 27 07:12:41 2002 > Date: Wed, 27 Mar 2002 04:12:37 GMT > Subject: RE: Re: Re: Re: Re: e1000-4.0.7 installation > From: issuppor at mailbox.cps.intel.com > Reply-To: "joseph m" > To: kus at free.net > > We just released version 4.1.7 of the driver. There have been a few performance cleanups that should speed up the driver. > > http://support.intel.com/support/go/linux/e1000.htm > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 3 Apr 102 16:06:17 +0400 (MSD) Subject: Hyperthreading in P4 Xeon Message-ID: <200204031206.QAA10335@nocserv.free.net> According to William Park > From beowulf-admin at beowulf.org Wed Apr 3 11:52:00 2002 > From: William Park > To: beowulf at beowulf.org > Subject: Hyperthreading in P4 Xeon (question) > > What is the realistic effect of "hyperthreading" in P4 Xeon? I didn't see any data about applications which are typical for clusters. But there is some other results on Intel Web-site. The success will depend from application strongly. 
For example, if you have an application that needs the full cache size for its working set of pages, the performance of such an application will degrade, because when 2 processes run simultaneously the cache is shared between them. > I'm not > versed in the latest CPU trends. Does it mean that dual-P4Xeon will > behave like 4-way SMP? Yes, every physical CPU appears as 2 logical CPUs, and you may use OpenMP etc. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 16 Apr 102 23:02:59 +0400 (MSD) Subject: again OpenPBS vs SGE Message-ID: <200204161902.XAA04166@nocserv.free.net> I'm in the process of choosing a *free* batch queue system for new Linux cluster(s). We are using GNQS on many SMP systems and we are happy with it, but GNQS isn't developed any more. The real competition is, IMHO, between OpenPBS and Codine/SGE (which was much praised earlier on our mailing list, in particular by Chris Black). Some comparisons are presented by Omar Hassaine from Sun (www.sun.com/products-n-solutions/edu/hpc/presentations/june01/ omar_hassaine.pdf). IMHO, some of these estimations are inconsistent w/some of Chris Black's statements. So I'll try below to formulate briefly a few advantages and disadvantages of OpenPBS and SGE that look important to me. I'd very much appreciate any remarks, opinions etc. (especially where I'm wrong).
I. Some PBS minuses
1) The main one is unstable operation of the daemons
2) PBS doesn't support user checkpoint migration. For example, I run a Gaussian98 job (which creates its own checkpoint file) on one node, and there is then a subsequent G98 job which may run on another (free) node, but that other node doesn't have the necessary G98 checkpoint file
3) Absence of an interface w/Globus Grid - if it's OpenPBS (not PBSPro)
II. Some PBS pluses
- it looks like the most popular choice for Linux clusters
- it's possible to receive a job on one node and send it to run on a node of an *other cluster*
III. Some SGE minuses
1) Does not support "multiclustering"
2) The scheduling algorithms are restricted to the one default (this is inconsistent w/Chris Black's message, as I understand)
IV. Some SGE pluses
1) Reliable operation
2) Globus Grid is integrated (?? is it correct ?)
3) There is support for job migration
I don't worry about the absence of SGE source today (I believe it'll be available in the near future). Thanks for the future help, Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 17 Apr 102 12:16:19 +0400 (MSD) Subject: again OpenPBS vs SGE Message-ID: <200204170816.MAA08490@nocserv.free.net> According to Rayson Ho > From raysonlogin at yahoo.com Tue Apr 16 23:39:42 2002 > Date: Tue, 16 Apr 2002 12:39:39 -0700 (PDT) > From: Rayson Ho > Subject: Re: again OpenPBS vs SGE > To: Mikhail Kuzminsky , beowulf at beowulf.org > ... > > > 2) The schedule algorithms are restricted to only one > > default (this is inconsistent w/Chris Black message, as > > I understand) > > You talking about SGE 5.2.x?
Yes, I wrote about 5.2.3.1 which is last "production" version currently available. > Chris Black must be talking about SGE 5.3, which has several advanced > nice scheduler features: > > http://www.hardi.se/products/literature/sun_grid_engine.pdf > Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 25 Apr 102 20:20:14 +0400 (MSD) Subject: Tyan Tiger 2460 (Re) Message-ID: <200204251620.UAA24432@nocserv.free.net> According to Robert G. Brown > From beowulf-admin at beowulf.org Thu Apr 25 11:50:34 2002 > From: "Robert G. Brown" > To: Beowulf Mailing List > Subject: Tyan Tiger 2460 > > We've had problems (as have others on this list) getting our 2U > rackmount Tyan Tiger 2460 motherboards to boot/install/run reliably and > stably. > > ... to conclude that this is a > reproducible BUG in the 2460 Tiger motherboard, either in the BIOS or > (worse) in the physical wiring of slot 1... > BTW, so far the 2466 runs fine, as noted by many listvolken. > It's not only problem w/Tyan dual motherboards. The problem exist also w/correct work of Hardware Monitor chips (for work of lm_sensors it's necessary to do (at the boot) some trick w/BIOS), for both 2460 and 2466. Moreover, for Thunder w/Tualatin chips lm_sensors can't work. May be Supermicro boards are more stable ... Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 7 May 102 22:23:39 +0400 (MSD) Subject: opinion on XFS (Re:) Message-ID: <200205071823.WAA15765@nocserv.free.net> According to Yudong Tian > From beowulf-admin at beowulf.org Tue May 7 19:43:09 2002 > From: "Yudong Tian" > To: "Beowulf \(E-mail\)" > Subject: opinion on XFS > Date: Tue, 7 May 2002 11:28:23 -0400 > > > Hello, > Has anyone tested the water of using SGI's XFS on a Linux cluster Can > you kindly share any experience and insights? Unfortunately we don't use xfs on Linux nodes because it's not standard kernel feature, but we has big experience w/xfs under some generations of SGI Irix on different hardware from workstations to SMP/ccNUMA servers, and xfs looks as very reliable and appropriate for intensive I/O. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 18 May 102 20:22:21 +0400 (MSD) Subject: real temperature on Tyan S2460 Message-ID: <200205181622.UAA03151@nocserv.free.net> I'm working w/cluster based on Tyan S2460 mobos w/AMD Athlon XP1800+ (we have old BIOS which allows to use Athlon XP instead of MP). There is well known problem w/lm_sensors on this motherboards and well known trick w/BIOS and lm_sensors setting. After that measures lm_sensors works successfully on our nodes. The question is about real temperature values. 
BIOS shows us too high values (about 76 C) which formally corresponds to W83627 data obtained from lm_sensors package (simultaneously W83782d chip shows something about 39-43 C). But after running sensors -s the lm_sensors data by W83627 (i.e. output of "sensors" command) is decreasing to "good" 40-45 C because of setting the kind of sensors to 3904 transistors (in the sensors.conf file). I asked Tyan staff about real kind of sensors but didn't receive answer. All known me installations use 3904 setting. At increasing of CPU load the W83627 data from lm_sensors w/3904 setting comes from 42 to 46.5 C, but with setting to tiristors - only from 76 to 77 C. It looks therefore that BIOS data are wrong ("tuned" to wrong kind of sensors). Sorry, am I right about BIOS ? Mikhail Kuzmisnky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 May 102 19:58:36 +0400 (MSD) Subject: Fortran Compilers for Scyld (Re:) Message-ID: <200205211558.TAA02073@nocserv.free.net> According to Arnie Miles > From beowulf-admin at beowulf.org Mon May 20 19:14:15 2002 > Subject: Fortran Compilers for Scyld > From: Arnie Miles > To: beowulf at beowulf.org > Date: Mon, 20 May 2002 10:49:04 -0400 > > Does anyone have input on using the Intel ifc Fortran 95 compiler on a > Scyld cluster? Is it compatible? > As I understand, ifc dosn't "depend" from particular Linux/cluster software distribution. The only thing where ifc "depends" strongly from parallelization is the support of OpenMP, but it includes only parallelization for SMP nodes. AFAIK ifc is sompatible w/g77. You may use ifc for work w/MPI etc. Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 27 May 102 20:50:42 +0400 (MSD) Subject: Infiniband and Intel (Re:) Message-ID: <200205271650.UAA00167@nocserv.free.net> According to Patrick Geoffray > From beowulf-admin at beowulf.org Sat May 25 01:45:00 2002 > From: Patrick Geoffray > To: Beowulf mailinglist > Subject: Infiniband and Intel > Sender: beowulf-admin at beowulf.org > Date: Fri, 24 May 2002 17:33:04 -0400 > > Intel is pulling out from Infiniband: > http://story.news.yahoo.com/news?tmpl=story&ncid=70&e=1&cid=70&u=/cn/20020524/tc_cn/intel_cancels_infiniband_products > > Considering Intel's weight, that's a bad sign. > I agree. Infiniband may be very attractive as potential interconnect for cluster nodes and potentially may compete w/Myrinet ;-) At the last IDF (Spring-2002, San-Francisco, where I was) it were presented many Infiniband solutions. Moreover, it was presented special track (by NCSA) about using of Infiniband for MPI communications. 
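The sort of microbenchmark usually used to judge such MPI-over-interconnect work is a plain point-to-point ping-pong; a minimal C sketch (purely illustrative, nothing Infiniband-specific in the source; build/run names such as mpicc and mpirun depend on the MPI installation):

/* minimal MPI ping-pong sketch: ranks 0 and 1 bounce a small message
 * and report the average round-trip time */
#include <mpi.h>
#include <stdio.h>

#define NBYTES 1024
#define REPS   1000

int main(int argc, char **argv)
{
    char buf[NBYTES] = {0};
    int rank, i;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg round trip: %g us\n", (t1 - t0) / REPS * 1e6);
    MPI_Finalize();
    return 0;
}

Run with two processes on two nodes, half of the reported round trip gives a rough one-way latency for the interconnect.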
Mikhail Kuzminsky Zelisnky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 21 Jun 102 12:44:57 +0400 (MSD) Subject: ATHLON vs XEON: number crunching Message-ID: <200206210844.MAA11709@nocserv.free.net> According to Richard Walsh > From beowulf-admin at beowulf.org Thu Jun 20 23:42:38 2002 > From: Richard Walsh > To: beowulf at beowulf.org, lindahl at keyresearch.com > Subject: Re: ATHLON vs XEON: number crunching > > "Under heavy load conditions, the latency of SDRAM deteriorates > rapidly. RDRAM holds up quite gracefully ... under heavy load, > where memory performance is crucial to CPU performance, RDRAM > has far lower latency than SDRAM." > > Also, I note that both the McKinley/ZX1 from HP, EV7, and Cray SV2 will > use RDRAM. Would you argue that this is for bandwidth reasons only? > > Perhaps this is a total versus component latency difference? The choice of RDRAM may be was done simple because this decision was done a lot of time ago (i.e. the time of development is too high), when DDR was not available as good alternative to RDRAM. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 4 Jul 102 21:53:17 +0400 (MSD) Subject: CCL:Origin300, Linux Cluster (Re:) Message-ID: <200207041753.VAA16733@nocserv.free.net> According to Eugen Leitl > From beowulf-admin at beowulf.org Thu Jul 4 20:36:41 2002 > From: Eugen Leitl > To: > Subject: CCL:Origin300, Linux Cluster (fwd) > ---------- Forwarded message ---------- > Date: Thu, 4 Jul 2002 12:04:07 -0400 > From: Jianhui Wu > To: chemistry at ccl.net > Cc: amber at heimdal.compchem.ucsf.edu > Subject: CCL:Origin300, Linux Cluster > Dear Colleagues, > I have a budget around $40k CN to shop for a new computer system, which > will be used for MD simulation, virtual screening and some bioinformative > stuff. Currently, I am looking at two options: Origin 300 (2 cpu) or PC > Linux Cluster. I would like to hear your experience with these systems and > spend the limited budget right. > > (1) An Origin 300 2cpu 500MHZ cost around $35k. Are you using this kind of > system? Do you have benchmark of MD simulation (such as Amber) for this > system? Do you regret your purchase? Some time ago I looked somewhere on //www.sgi.com a set of benchmarks results, in particular on Amber, for some SGI systems. But it's absolutely clear that you'll have much more high performance and much more better price/performance ratio if you'll build cluster of x86-based PC's w/1-2CPU's per node. Moreover, usually you'll have better performance simple per CPU, i.e. at equal number of processors. The main reasons for choice of Origin 300 may be a)the presence of some chemical software which may "exist" for IRIX but absent for Linux; b) The total cost of ownership, because cluster requires much more "human time" for installation and administration. Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 23 Aug 102 17:47:18 +0400 (MSD) Subject: Wanted: Good mobo for Intel 850E chipset and 1066 RDRAM In-Reply-To: <54sn167rtb.fsf@intech19.enhanced.com> from "Camm Maguire" at Aug 22, 2 12:59:28 pm Message-ID: <200208231347.RAA22077@nocserv.free.net> According to Camm Maguire > Greetings! We're upgrading our 16-node cluster. Our code heavily > uses matrix-vector BLAS level2 operations. Memory bandwidth is the > bottleneck, and our preliminary tests show that rambus is at present > the clear winner in terms of performance. This of course is > unfortunate, given the legal manipulations surrounding the > technology. We would much prefer to go with dual channel DDR, but > this doesn't appear to be available anytime soon. > Dual Channel DDR is supported, but for AMD Athlon (in nVidia chipset), and the speed-up for memory-limited applications like Gaussian98, is essential.
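A quick way to see the kind of memory-bandwidth difference such BLAS level-2 codes care about is a STREAM-style triad loop; a minimal C sketch (array size and timing method are only illustrative, serious comparisons should use the full STREAM benchmark):

/* STREAM-style triad sketch: a[i] = b[i] + s*c[i] over arrays much larger
 * than cache, timed with gettimeofday, reporting MB/s */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1000 * 1000)   /* ~96 MB total for the three double arrays */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double s = 3.0, secs, mbytes;
    struct timeval t0, t1;
    int i;

    if (!a || !b || !c) return 1;
    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    mbytes = 3.0 * N * sizeof(double) / 1e6;   /* read b and c, write a */
    printf("triad: %.1f MB/s\n", mbytes / secs);
    return 0;
}

The arrays are far larger than any x86 cache of that generation, so the MB/s figure mostly reflects the memory subsystem, which is exactly what separates single-channel SDRAM, RDRAM and dual-channel DDR here.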
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 28 Aug 102 19:44:20 +0400 (MSD) Subject: >2 p4 processor systems In-Reply-To: <8724B9A26BBD904495EECACA7A7DA505011B6FF6@gemini.diversa.com> from "Brian LaMere" at Aug 27, 2 11:59:53 am Message-ID: <200208281544.TAA06897@nocserv.free.net> According to Brian LaMere > So I'm trying to find out if anyone knows of a 4-way p4 system out there. > I'm wanting to bring a couple dual-p4's in here just so they'll see that the > performance far surpases the current per-node performance we have on our > cluster, but...brick wall. The guy above me agrees with me, the guy above > him won't talk to me about it. He just gets all excited about a 6-way p3 > server in 1u. Whoopie. > So...help? Anyone know of any 4-way p4 systems? And no, amd isn't an > option (unfortunately). > I want to add some words to "minuses" of x86 SMPs. We use 2-CPUs Tyan S2460 w/Athlon MP which don't require such many memory throughput as P4 for "obtaining" of high performance. We tested S2460 w/Athlon MP 1800+ under STREAM tests (using OpenMP parallelization of loops with ifc 5.0) and found that 2-CPU (2-thread) results are not better than for 1 CPU. You may find close results for 2-CPUs SMPs at //www.streambench.org. Some applications are scaled relative well from 1 to 2 CPUs SMP. The examples are 1) Linpack(n=100 and n=1000) which is localized in cache 2) Gaussian 98 SCF method where localization in cache is also high. In last case the speed-up on test178 is something about 1.7 (I don't remember exactly). But high-performance calculations, in particular many methods realized in g98 are memory-bounded now. So you should expect bad speedup on 2-CPUs x86 systems because of memory bottlenecks. Most 2-CPUs x86 SMPs have 1-port main memory, and the competition for memory of modern x86 CPUs will be high (especially for P4, where SPECfp2000 data depends significantly from memory throughput). So it's not clear for me that 2-CPUs SMP are more attractive than 2*single-CPUs nodes (yes, we should calculate price/performance ratio ...). What is about 4-CPUs SMPs then I looked in some cases that the architecture is bad in the sense of memory throughput scaling (but this was for more old PIII-based systems). Therefore it's necessary to be sure that 4-CPUs P4 systems has efficient memory throughput, else 2*2 CPUs SMP may be better. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 28 Aug 102 19:49:22 +0400 (MSD) Subject: Intel's MKL WAS RE: >2 p4 processor systems In-Reply-To: from "Rocky McGaugh" at Aug 27, 2 03:22:35 pm Message-ID: <200208281549.TAA06946@nocserv.free.net> According to Rocky McGaugh > With HPL, ive seen consistently better performance from atlas than i have > with Intel's MKL. Granted, this is only a single application. Does anyone > have any testimonials about the MKL? 
> I've tested a set of different x86 CPUs on Linpack (n=100 and n=1000) and found that in most cases Atlas gives more high performance. But this may be because of using recursive LU in dgetrf of Atlas (this algorithm is the most fast today) - but I don't know about algorithms used in MKL 5.01 version of dgetrf. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 30 Aug 102 22:38:47 +0400 (MSD) Subject: Performance Benchmarks In-Reply-To: from "Tom A Krewson" at Aug 30, 2 01:00:58 pm Message-ID: <200208301838.WAA01246@nocserv.free.net> According to Tom A Krewson > Does anyone know of a good objective benchmark for Linux clusters running > MPI? I have tried Linpack but failed to get the results I need. It seems > to need tuning for each cluster which is makes it hard to be objective > with reguards to its results. I also have used llcbench and have gotten > some nifty graphs but nothing to compare what I have in my cluster > objectively to other clusters. > I may recommend you Linpack High Parallel benchmark which is used also in TOP500 table where custers also are presented. URL: //netlib2.cs.utk.edu/benchmark/hpl There is a set of other known benhcmarks, in particular for MPI itself, but if you want to see to computational performance of your cluster, Linpack parallel looks as the simplest way. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 12 Sep 102 19:52:56 +0400 (MSD) Subject: Tyan S2468UGN In-Reply-To: <200209120951.LAA21905@dylandog.crs4.it> from "Alan Scheinine" at Sep 12, 2 11:51:26 am Message-ID: <200209121552.TAA08121@nocserv.free.net> According to Alan Scheinine > > While we are on the subject on the Tyan AMD mother board, > I have a question concerning the S2466 and S2468. The manual > of AMD for the AMD MP 2000+ and 2200+ says that the chip has > a thermal diode but the mother board must have circuitry to > read the diode. Tom's hardware web site has a movie of what > happens to the AMD if the heatsink is detached, it begins to > smoke after about one second. The manual of these Tyan boards > at the Tyan site does not mention thermal shutdown protection. > Do these Tyan boards have a thermal shutdown that would protect > the board and the even greater risk of a fire? There is only few mobos which *really* do shutdown looking to Athlon chip diode (in particular, one mobo from Fujitsu-Siemens). And I know only about 1-CPU mobos :-( Tyan don't say nothing simple because their mobos don't do this :-( Beginning from July 2002 AMD certifies only mobos having this feature, but Tyan mobos, as I understand, were developed before this date. 
Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 13 Sep 102 17:25:45 +0400 (MSD) Subject: Disk noises and Tyan S2468UGN In-Reply-To: from "Joel Jaeggli" at Sep 12, 2 12:28:36 pm Message-ID: <200209131325.RAA18949@nocserv.free.net> According to Joel Jaeggli > Thermal recalibration of the head sounds like the most likely cause... I thought about the same, but every 20 seconds ???!!! Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow > > On Thu, 12 Sep 2002, David Mathog wrote: > > > Our S2468UGN has 5 x 36 GB IBM disks in it, 2 on one > > SCSI bus, 3 on the other. These are IBM IC35L036UWD210-0. > > They work fine, pass all diagnostics, including exhaustive > > surface testing using IBM's Drive Fitness Test 3.10. But the odd > > thing is that there is a semiperiodic (mean maybe 20 seconds, but > > huge variance) noise from one or more of the disks which sounds > > for all the world like a quieter version of a DLT tape repositioning. > > That is, a longish (1.5 seconds?) whir followed immediately by a > > shorter sort of "shunk" at the end. Some sort of movement sound, > > but not anything that sounds like an overt failure. > > > > I can't see the individual drive lights on these disks because > > of the way they are mounted, in fact, I don't even know that > > they have drive lights, so I can't really say if this is one drive > > doing this or all 5 drives doing it once in a while. The main system > > drive light does not come on when this sound is made. I upgraded > > to the latest Tyan BIOS (v4.03) and it still occurs. The sound is > > produced whenever there is power: sitting in the BIOS, waiting in > > DOS, running linux, etc. > > > > Has anybody else observed this? > > Any idea what it might be? > > > > Thanks, > > > > David Mathog > > mathog at caltech.edu > > Manager, Sequence Analysis Facility, Biology Division, Caltech > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu > -- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -- > In Dr. Johnson's famous dictionary patriotism is defined as the last > resort of the scoundrel. With all due respect to an enlightened but > inferior lexicographer I beg to submit that it is the first. 
> -- Ambrose Bierce, "The Devil's Dictionary" > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 23 Oct 102 16:08:27 +0400 (MSD) Subject: thermal kill switch In-Reply-To: from "alvin@Maggie.Linux-Consulting.com" at Oct 22, 2 06:25:04 pm Message-ID: <200210231208.QAA21664@nocserv.free.net> According to alvin at Maggie.Linux-Consulting.com > some motherboards have health monitors... > - you can go into the bios and tell it what to do > ( shutdown when the temp hits a value ) > But are you sure that Linux shutdown will be correct in that case ? Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ares at lanpartynw.com Mon Jul 23 12:44:16 2012 From: ares at lanpartynw.com (Ares) Date: Wed, 23 Oct 102 12:03:09 EDT Subject: Help needed Message-ID: <200210231203590.SM00960@lanpartynw.com> My name is Derek Pryor. I am a senior in high school and to graduate we have to do a big project. I am creating a beowulf cluster for my project. One of the requirements is that we have a mentor help us out. I have not found anyone in my local area (Seattle, WA) so I asking online now. What this would involve is helping me plan out the design of the software. Also I would need some help creating a benchmark test so I could mesure the proformance increase. I have knowledge in Linux and C Programming and Linux Socket Programming. If you are intersted or have any questions feel free to talk to me. Email: ares at lanpartynw.com Aim: sith lord 1226 (I'm on most of the time from 3pm ? 9pm PST) Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Nov 102 22:35:22 +0300 (MSK) Subject: Linda problems (under work w/G98) Message-ID: <200211221935.WAA29625@nocserv.free.net> I've installed binary Linda 6.2 version (for homogenous cluster) for our Giagbit Ethernet-based cluster (nodes works under RH 7.2). The main task of Linda for us is to support inter-nodes parallelization of one application (binary version of Gaussian -98 Rev.A11). But we found that this application starts parallel processes on cluster nodes and "hang-ups" because of Linda/network problems (it looks that the problem is not w/G98 itself). I'll be very appreciate in any ideas what may be the real source of our problem ! A bit more detailed description of our situation follows. 1) We tested G98+Linda on 2 "equal" SMP nodes w/default Linda configuration file, i.e. w/Tsnet.Appl.maxprocspernode: 1 (i.e. Linda starts 1 master process on master node, and 1 additional process on 2nd node). The clocks on both nodes are synchronized through ntpd. NFS is not used. 
2) This nodes has equal .tsnet.config files in home directories of the same user on different nodes. This files has 1 string: Tsnet.Appl.nodelist: host1 host2 3) At start of g98l (application executable) on host1 we see following ntsnet messages: ... ntsnet starting master process on host1 ntsnet starting 1 worker on host2 ntsnet waiting for Linda group messages ntsnet received Linda group message: group has 2 members ... and now we see parallel processes working on both nodes, but it looks that they can't exchange (send/receive) the messages: they are mainly in waiting state, strace gives - select/gettimeofday/sendto/recvfrom (last -w/"resource temporary unavailable") syscalls in a loop - on host1 (master) - select/gettimeofday/sendto syscalls in a loop - on host2 After some time interval we see on host1 the message: ntsnet: worker on node host2 exited abnormally - and the run is finished. 4) At start of g98l on host2 (i.e. host2 is now master node) the situation is not the same (not symmetrical): ntsnet starting master process on host2 ntsnet starting 1 worker on host1 ntsnet waiting for Linda group message Linda Error: node host1(0) warning: sendto failed: Network is unreachable ntsnet received Linda group message: group has 2 members ... and then a lot of Linda error messages - that Network is unreachable. And as in previous case we see parallel (waiting) processes on both nodes. 5) At the time when parallel processes on both nodes can't "negotiate" successfully, ping and rsh between this nodes works normally. Ping gives various delays for host1-->host2 and host2-->host1 (90-130 microseconds), but it looks appropriate. Ifconfig says that there is no network errors. Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 8 Jan 103 19:22:27 +0300 (MSK) Subject: SCMS question(s) Message-ID: <200301081622.TAA26098@nocserv.free.net> I've read some overviews/user manual/... about SCE and, in particular(especially), SCMS, and it looks for me that it's the best choice today (I looked also on bWatch, CMS, SGI PCP). But few SCMS-2.0 features are not clear for me just now. I'll be very appreciate if somebody will help me w/answers. 1) I have PIII Tualatin CPU's on frontal cluster node but Athlon MP on compute nodes. The documentation says that I must have equal "kinds" of CPUs. Is it "strong" requirement ? What really will not work in SCMS in my case ? 2) HARDWARE plugin for cms_rms has the possibility to control fan speeds and temperature values. How it's organised ? Is it work through lm_sensors package or directly ? 
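Whichever way a monitoring plugin gets the values (directly from the chips or through lm_sensors), they can be cross-checked from the shell; a trivial polling sketch, assuming the lm_sensors userland ("sensors") is installed and configured on the compute nodes, with labels that differ per board:

# print temperatures and fan speeds every 30 seconds
while true; do
    date
    sensors | egrep -i 'temp|fan'
    sleep 30
done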
Yours Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 10 Jan 103 21:25:07 +0300 (MSK) Subject: question about Intel P4 versus Alpha's In-Reply-To: from "Dominic Wu" at Jan 10, 3 09:53:21 am Message-ID: <200301101825.VAA23830@nocserv.free.net> According to Dominic Wu > Is HT anything more than a thinly-veiled attempt at luring more software > developers to develop multi-threaded applications so as to help Intel sell > more CPU in the future? (I.E. the new fangled software that is optimized > for HT can really benefit from additional REAL processors instead of using > just HT?) No. It looks that the "reasons" were other. 1) It is known that a lot of "execution resources" (in particular, execution units) of superscalar microprocessors are not used (simultaneously) by many applications - especially which don't give heavy load for CPUs. 2) The possibility to organize multitherading execution by modern superscalar chips "costs" (in the sense of additional hardware ) very low, and was practically realized by Intel a lot of time ago. But they opened this possibility for users only early. 3) One of the powerful "reasons" to propose HT was the war w/AMD; HT is excellent marketing step. BTW, IMHO, we must say "thanks" to Intel for introducing HT: now paralelization will come to desktop computer, and the corresponding parallelization technologies will be necessary for new areas of applications. Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 13 Jan 103 16:55:50 +0300 (MSK) Subject: question about Intel P4 versus Alphas In-Reply-To: <200301130819.JAA04462@dylandog.crs4.it> from "Alan Scheinine" at Jan 13, 3 09:19:45 am Message-ID: <200301131355.QAA24384@nocserv.free.net> According to Alan Scheinine > In a previous message, Mikhail Kuzminksy spoke about multithreading > and hyperthreading, and also superscalar microprocessors. I would like > to add a few remarks for the sake of greater precision. Superscalar > and out-of-order execution has a high hardware cost because of the > large amount of logic needed to organize the execution steps dynamically. > The primary motivation for the Itanium was that this organizing of > the work at a fine-grained level would be done by the compiler. > Multithreading means that the processors gives time slices to various > threads. The state of the CPU for each thread is switched between > thread. Hyperthreading has several threads executing at the same time, > so exceptions and condition codes may be for one or another thread > at the same time. > For clusters, parallel execution generally uses message passing > so the user does not write the code as a multithreaded program. > As a consequence the application program would not be using hyperthreading. 
I'm sorry, one additional remark for clarification :-) I beleive that if I have 2 logical CPUs as in the case of HT, then I may use (for writing of my application) some parallelization tools, not only OpenMP for shared memory, but also MPI- for doing paralelization. When I wrote about "thanks" to Intel from beowulfers, I thought about any kinds of parallelization, in particular MPI. The questions "is in this case MPI better than OpenMP or pthreads etc" or "is it reasonable for some application(s) to use MPI for parallelization not only between the nodes, but also inside the HT-P4-nodes ?" are another questions. Mikhail Kuzminsky, Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 20 Mar 103 22:59:32 +0300 (MSK) Subject: Linux distributives for Opteron Message-ID: <200303201959.WAA05046@nocserv.free.net> By RedHat it was declared that x86-64 support will be realized in the frames of RH Linux Advanced Server distributive. But what is known about much more cheap RH Linux professional (according our experience, it is enough for building beowulf clusters w/2-cpu's nodes) ? According my data, x86-64 support in RH may be realized just at summer. The other choice for Opterons coming in May may be SuSe Linux profesional (it's cheap, but we traditionally based on RH). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 26 Mar 103 19:31:14 +0300 (MSK) Subject: sun grid engine? In-Reply-To: <81D14648D6BD694CBDB4F45536E81CBC280A48@aquarius.diversa.com> from "Brian LaMere" at Mar 25, 3 03:39:44 pm Message-ID: <200303261631.TAA00228@nocserv.free.net> According to Brian LaMere > > this is not at all a request to be contacted by salepeople - please. All > such emails will be ignored. I have a Sun rep already. > > To the point - does anyone actually use Sun's Grid Engine, and what sort of > pro's and con's have they experienced? I can say about free SGE 5.3 version we are using. By my opinion, the pluses, in particular, are (in addition to answers to your questions) simple installation, good documentation, good graphical interface for administrator, some Globus features (we don't use them currently, but paln to use in the future). > Run well? Yes. > Enough functionality? By my opinion, yes. The main weak point is, IMHO, not " too advanced" sheduler. In particular, in OpenPBS I may implement MAUI source. > Stable? Yes, it was one of the main reasons of our choice of SGE in comparison w/OpenPBS. 
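For anyone weighing the two in practice, an SGE job script generally takes the following form; a minimal sketch, where the parallel environment name "mpi", the limits and the application name are site-specific assumptions:

#!/bin/sh
# minimal SGE submit script; submit with: qsub run_job.sh
#$ -cwd                 # run in the directory the job was submitted from
#$ -N g98test           # job name
#$ -pe mpi 4            # parallel environment and slot count (site-specific)
#$ -l h_cpu=12:00:00    # hard CPU time limit
./my_application        # placeholder for the real command

The #$ lines are ordinary qsub options embedded in the script, so the same flags can also be given on the qsub command line.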
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry RAS Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 27 Mar 103 20:08:39 +0300 (MSK) Subject: sun grid engine Message-ID: <200303271708.UAA17249@nocserv.free.net> According to hanzl at noel.feld.cvut.cz > > > The main weak point is, IMHO, not > > " too advanced" sheduler. In particular, in OpenPBS I may > > implement MAUI source. > > SGE too is integrated with MAUI. I did not try it myself but I guess the > integration is far enough to be usable (those of you wo did try - please > comment on this). May be it's integrated into SGE 5.3 Enterprise Edition ? I said about *free* SGE 5.3. Both "Sun ONE Grid Engine Administartor and User's Guide" and "Sun ONE Grid Engine Release Notes" don't have just the word "MAUI". Moreover, the only sheduler algorithm allowed in usual (free) SGE 5.3 is "standard" (see SGE Administrator & User's guide, p.225). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 27 Mar 103 21:59:28 +0300 (MSK) Subject: sun grid engine? Message-ID: <200303271859.VAA18743@nocserv.free.net> According to hanzl at noel.feld.cvut.cz > It is easy to get confused by SGE versions. > Enterprise Edition is also free. MAUI was integrated with it - most of > this work was done by MAUI team with help from SGE team. > > Regarding SGE versions, I think it works as follows: > 1) Developers create opensource SGE version. They work using publicly > available CVS software repository. All new features come to this > version. > ... > 2) 'Commercial' part of SUN takes these sources (probably without any > important changes) and compiles 'commercial' SGE and SGEEE. They add > word 'ONE' to the name. They create nice manuals. You can buy this > software and get usual support you expect for commercial software. > You can still download the manuals for free. Just skip word 'ONE' > while reading them - they are perfectly usable for free SGE as well. > They just may be out of date because the free version already has new > features (like MAUI integration). They may also never mention MAUI > integration because the 'commercial' part of SUN has no support for > it. > ... > PBS is older than SGE (and yes, PBS did many good things, no doubt) > and everybody knew PBS when opensource SGE was born. And many people > could easily expect that SGE used the same model as PBS did. (It was > easy to think that SGE EE is the commercial version - no, it is not.) Thanks ! I was sure that SGE model is the same as PBS :-) Now I'll like SGE much more :-) - SGE EE has additionaly nice features for heterogenous clusters/sets of clusters etc ! Mikhail Kuzminsky Zelisnky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 28 Mar 103 19:45:39 +0300 (MSK) Subject: sun grid engine? Message-ID: <200303281645.TAA03663@nocserv.free.net> According to Mikhail Kuzminsky > According to hanzl at noel.feld.cvut.cz > > ... > > PBS is older than SGE (and yes, PBS did many good things, no doubt) > > and everybody knew PBS when opensource SGE was born. And many people > > could easily expect that SGE used the same model as PBS did. (It was > > easy to think that SGE EE is the commercial version - no, it is not.) > Thanks ! I was sure that SGE model is the same as PBS :-) > Now I'll like SGE much more :-) - SGE EE has additionaly nice > features for heterogenous clusters/sets of clusters etc ! My :-) above related to me myself (not to SGE) - SGE is nice product, and in comparison w/NQS (we use, btw, Generic NQS on some old SGI serevrs) SGE has additionally not only PE support, but also for example Globus features. But the "type" of MAUI integration w/SGE looks (from this discussion) not clear: From: Ron Chen > ... >1. read the document: >http://supercluster.org/documentation/maui/sgeintegration.html >2. you need a password to get the latest versions of >Maui scheduler, which is in Alpha/Beta state. You can >get it from help at supercluster.org. I understand, that I may use latest version of Maui if I'll compile its source. According Alan Scheinine message here, MAUI simple may submit things to OpenPBS/SGE. But what means then MAUI integration in Sun binary version of SGEEE ? Does it means that I have pre-compiled versions of both MAUI and SGEE or only that binary SGEEE includes all the necessary corrections, allowing it to work w/more old Maui version ? Yours Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Apr 103 19:59:22 +0400 (MSD) Subject: small cluster In-Reply-To: <1049316063.1932.4.camel@skull.america.net> from "Dennis Sarvis, II" at Apr 2, 3 03:41:03 pm Message-ID: <200304031559.TAA01399@nocserv.free.net> According to Dennis Sarvis, II > How does one go about creating a 2 PC cluster? I have a redhat 400Mhz > PII and a Debian Celeron 550Mhz. Can I do something like use 2 NICs in > the controller and one in the slave (1 NIC for the office > network/internet and the other connecting via crossover 10baseT to the > NIC on node1 slave)? Yes, I use like configuration in my home (but w/o permanent external link to Internet). 
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 11 Apr 103 19:05:13 +0400 (MSD) Subject: [Linux-IA64] Itanium gets supercomputer software Message-ID: <200304111505.TAA23036@nocserv.free.net> From: David Mosberger > Duraid> You and I both know the only real barrier to Itanium > Duraid> adoption is the price. Can anyone here shed some light on > Duraid> this? Why is Itanium hardware still so expensive? >Remember that Intel is targeting Itanium 2 against Power4 and SPARC. >In that space, the price of Itanium 2 is very competitive. That's absolutely right. But after AMD starts producing the Opteron, the situation may change. Opteron will have much lower performance but a much better price/performance ratio (and the same problem with the absence of 64-bit software ;-)). In the case of Opteron's success, Intel will have to make a choice: a) realize "Plan B" (extend x86 to 64 bits in some new chip(s)), which, unfortunately (I like the RISC & IA-64 architectures), has a serious probability in my opinion, or b) change the price/performance situation drastically - by means of Deerfield. I'll be happy if the latter way (b) is realized. (Of course, there is also choice c) - to ignore x86-64 :-)). BTW, does somebody know something about *real* IPC (instructions per cycle) values obtained for some programs w/Itanium 2 and Power4? Taking into account the extensive out-of-order execution of groups of instructions in Power4, it's not clear to me which IPC is higher (the theoretical limit for It2 is 6, for Power4 is 5). From: Duraid Madina >David Mosberger wrote: >> Remember that Intel is targeting Itanium 2 against Power4 and SPARC. >> In that space, the price of Itanium 2 is very competitive. >OK, I want to be clear on this. I asked why Itanium hardware is still so >expensive. Your answer seems to be marketing speak for "The prices are >still high because we are _happy_ selling small quantities of this >equipment to people used to paying through the nose for good quality >hardware." Is this correct? >Can I then conclude that Intel has not yet had any interest whatsoever >in driving IA64 into the realm of reasonable prices? It's sad to see so >much work being put into this Linux port when, if things remain as they >are, it will hardly be used. It looks like there is some "gentlemen's" agreement between Intel and the companies manufacturing IA64-based systems about keeping prices high. It may be "not official", but I'm sure that it's a reality. It's typical for companies working in the market of expensive, mainly RISC-based servers. I understand that Intel does not want to destroy this "approach" and the initial agreements :-( But we should also take into account the real production cost for Intel. Does somebody know what it is? It depends mainly on die size, but we also don't know the percentage of good chips. From: Matt Chapman >> Can I then conclude that Intel has not yet had any interest whatsoever >> in driving IA64 into the realm of reasonable prices? >My understanding is that Deerfield will be targeted at the lower cost >market, though I haven't seen much info about it recently. In my last talks w/Intel staff they confirmed to me that Deerfield will be oriented in particular to the cluster market.
Taking into account that Madison will arrive something about summer of current year, Deerfiled will be available, by my estiamtion, something at end of 2003. The main question will be, by my opinion, price/performance ratio which is absolutly unclear now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 14 Apr 103 17:23:55 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <16023.13579.676695.490297@napali.hpl.hp.com> from "David Mosberger" at Apr 11, 3 02:35:07 pm Message-ID: <200304141323.RAA06099@nocserv.free.net> According to David Mosberger > > As for what the future holds, I guess we'll just have to wait and see. > Remember though: just a year ago, the cheapest ia64 workstation you > could get was priced at $7k+ 2-cpu Itanium 2 server manufactured by HP, w/maximim academic discount was proposed for me with "not too essentially more high" price, but I can't disclose details. In any case the prcie is too high for Beowulf (IMHO), we should wait Madison/Deerfield. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 14 Apr 103 18:25:06 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <3E97263E.5010605@octopus.com.au> from "Duraid Madina" at Apr 12, 3 06:31:58 am Message-ID: <200304141425.SAA07170@nocserv.free.net> According to Duraid Madina > David, > Itanium 2 isn't even competitive with other offerings from your own > company. Compare: > David Mosberger wrote: > > Here is one real price point for an Itanium 2 workstation: > > > > - hp workstation zx2000 (Linux software enablement kit) > > - Intel? Itanium 2 900MHz Processor with 1.5MB on-chip L3 cache > > ... > > - $3,298 > > with: > - HP server rp2430 > - 1xHP PA-8700 650MHz CPU with 2.25MB on-chip L1 cache > - $1,095 > I bought one of these, and it is excellent (if a little loud. ;) I > would happily buy a bare-bones Itanium 2 system at the same price. Taking into account that Itanium 2 has much more high performance, the price from HP looks reasonable. Moreover, I found that HP prices for Itanium 2 computers are lower than the prices for Itanium 2 servers manufactured by other "non-brand" companies ! So we should look to HP prices as to the best "indicator" of prices (I don't work for HP :-) ). It must be some "pressure" from users to computer manufacturers, they must understand that it exist now more cheap alternatives. > This > doesn't seem to like it's going to be possible any time soon. In less > than two weeks, I will be able to buy an Opteron system that runs a > great deal faster at the same price. Yes, Opteron may give good alternative, but I'm not sure that price/performance ratio for Opteron servers will be better than for P4 Xeon dual servers. Only if you need badly 64-bit processor ... 
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 15 Apr 103 17:38:50 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <3E9B40D2.9010400@octopus.com.au> from "Duraid Madina" at Apr 15, 3 09:14:26 am Message-ID: <200304151338.RAA24401@nocserv.free.net> According to Duraid Madina > > > SPECfp2000 is ~1170 for a 2GHz 1MB L2 Opteron. Not too bad. The SPECint > figure is fantastic though (~1200). If you'll re-calcualte SPECcpus data to frequencies of Opteron will be available just now (1.4 and 1.6 Ghz, and 1.8 Ghz in May - according unofficial russian source), then Xeon has more high performance. What is about price, then 1.6 Hhz will have the price about $670-$690 (but 1.4 Hhz chips will be *much more* cheap). "Today" is not the best day for IA64, because It2 will be very soon excahnged to Madison w/1.3-1.5x speedup. I don't like x86 architecture (IA-32), but today I can't wait Deerfield :-( and I think about Opteroon also ... Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 15 Apr 103 20:27:17 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <1050421292.27085.9.camel@sadl16603.sandia.gov> from "Keith D. Underwood" at Apr 15, 3 09:41:32 am Message-ID: <200304151627.UAA26477@nocserv.free.net> According to Keith D. Underwood > You should actually look at those numbers. See here: > > http://www.spec.org/cpu2000/results/res2002q4/cpu2000-20021119-01859.html > > The only way you get graphs like that is when a couple of your > benchmarks actually fit in cache. Benchmarks running from cache are not > terribly representative of most real applications. > Sorry, I'm not familiar w/details of cache behaviour of separate tests from SPECfp2000: are you sure that tests "working sizes" fit to 3 MB L2 but will not fit (i.e. gives a lot of cache misses) in 1 MB on Xeon for example ? (I don't say even about more large L3 cache in Power4 or about 1.75 MB in Alpha 21364 or 1.5 MB (D-cache) in PA-8700). Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Apr 103 19:11:43 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <20030416203811.GB1149@greglaptop.internal.keyresearch.com> from "Greg Lindahl" at Apr 16, 3 01:38:11 pm Message-ID: <200304181511.TAA20527@nocserv.free.net> According to Greg Lindahl > > Open64 has a GPLed IA64 backend. While it's unfortunate that SGI has > stopped GPLing new work on it, it's still a pretty good compiler. It's bad for beowulf community not only because SGI has great compiler team. 
IMHO, in the case of IA-64, program optimisation for next generation chips is very sensitive to microarchitecture details because of needs to prepare simultaneously executed contens in bundles. But like restrictions (allowing to do parallel execution in bundles) are changed really (for example from Itanium to Itanium 2) and it's necessary to re-construct the optimizations block of compiler. This means that old compilers will lost their efficiency :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 18 Apr 103 19:00:28 +0400 (MSD) Subject: [Linux-ia64] Itanium gets supercomputing software In-Reply-To: <200304170756.h3H7umB02357@dali.crs4.it> from "Alan Scheinine" at Apr 17, 3 09:56:48 am Message-ID: <200304181500.TAA20408@nocserv.free.net> According to Alan Scheinine > > I do not think there was a promise that getting efficiency would > be easier with EPIC. My understanding of the situation is that > the logic of dynamic allocation of resources, that is, the various > tricks done in silicon, could not scale to a large number of > processing units on a chip. That is, the complexity grows faster > than linear, much faster. I beleive you are absolutely right. One of main reasons of IA64/EPIC developmnet were difficulties just in development hardware logic of superscalar out-of order calculations. But pls look to the current (and nearest future) IA-64 chips. The number of execution units don't increase: the main advantages of McKinley in comparison w/Itanium (in microarchitectural sense) was allowing to do more parallel/simultaneous instructions in pair of bundles (elimination of a set of restrictions in Itanium) plus, of course, cache, frequency etc. The number of execution unints in Madison will be, as I understand, the same. Next IA-64 chips will have >1 microprocessor cores, what means, by my opinion, that every microprocessor core will have again the same number of execution units. It looks that Intel increase size of cache, frequency, insert simultaneous multi-threading etc, but I don't see incerase of execution units number. This means that some potential advantages of IA-64/EPIC are not realized. IMHO, it may be simple because of compilers problem. If compiler can't realize high average IPC (instructions -per-cycle) value for real applications, why I'll add new execution units ? Mikhail Kuzminksy Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 22 Apr 103 18:48:48 +0400 (MSD) Subject: Opteron announcement In-Reply-To: <20030422053550.GA6923@sphere.math.ucdavis.edu> from "Bill Broadley" at Apr 21, 3 10:35:51 pm Message-ID: <200304221448.SAA21439@nocserv.free.net> According to Bill Broadley > Apparently the link to http://www.amd.com/opteronservers just went > live. Tons of cool docs/benchmarks. > > ... 
> Oh and one more interesting link: > Software Optimization Guide for AMD athlon 64 and AMD Opteron Processors > http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7203,00.html > > Amusingly all the submissions that I looked at the full reports for > use the Intel compiler. So the Opterons extra registers are ignored. > > Time will tell if 3rd party compilers that fully utilize the additional > registers can win benchmarks against Intel's compiler. PGI (Portland Group) 5.0 will have Opteron support. The product will be available at summer (June, if I remember correctly). It'll be very interesting to compare ! Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 26 May 103 22:43:43 +0400 (MSD) Subject: Opteron-based nodes benchmarks: RDTSC In-Reply-To: <200305251725.VAA20503@nocserv.free.net> from "Mikhail Kuzminsky" at May 25, 3 09:56:38 pm Message-ID: <200305261843.WAA10134@nocserv.free.net> According to Mikhail Kuzminsky > > I'm testing some fortran benchmarks on 2-CPUs Opteron 1.6 Hhz > server we want to use in Beowulf cluster. In particular, I need to measure > small time intervals, for which I want to use RDTSC-based "function" > (for example I attach below one - published by T.Prince). But it requires > some minor modifications, I beleive, to work properly on x86-64. > I found now that all is OK if I'm using calls from g77-33 (#define for 386 and _M_IX86 as I wrote in previous message are enough).
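For the record, the likely reason for the negative intervals in 64-bit mode is the "=A" asm constraint: it denotes the edx:eax register pair only when compiling for i386, so under x86-64 only the low 32 bits of the counter tend to be captured and the value wraps around every few seconds. A minimal sketch of a TSC read that works with gcc on both i386 and x86-64 (this is an illustration, not the code from T.Prince's original; the function name is arbitrary):

    /* Read the time-stamp counter.  RDTSC puts the low 32 bits in EAX
       and the high 32 bits in EDX; combining the halves explicitly
       avoids relying on the "=A" constraint, which does not describe
       a 64-bit value correctly when compiling for x86-64. */
    static inline unsigned long long rdtsc64(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long) hi << 32) | lo;
    }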
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow kus at free.net _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 11 Jun 103 22:10:00 +0400 (MSD) Subject: NAS Parallel Benchmarks for Current Hardware In-Reply-To: <3EE609F7.BE430A1E@ideafix.litec.csic.es> from "A.P.Manners" at Jun 10, 3 05:40:23 pm Message-ID: <200306111810.WAA01122@nocserv.free.net> According to A.P.Manners > > I am looking to put together a small cluster for numerical simulation > and have been surprised at how few NPB benchmark results using current > hardware I can find via google. > It's common situation w/NPB (in opposition to Linpack, SPECcpu e.a.) :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:05:31 +0400 (MSD) Subject: what is a flop In-Reply-To: <3EEF5F48.5020505@roma2.infn.it> from "Roberto Ammendola" at Jun 17, 3 08:34:48 pm Message-ID: <200306181605.UAA24772@nocserv.free.net> According to Roberto Ammendola > The "Floating point operations per clock cycle" depends on the > processor, obviously, and on which instructions you use in your code. > For example in a processor with the SSE instruction set you can perform > 4 operations (on 32 bit register each) per clock cycle. One processor > (Xeon or P4) running at 2.0 GHz can reach 8 GigaFlops. Taking into account that throughput of FMUL and FADD units in P4/Xeon is 2 cycles, i.e. FP result may be received on any 2nd sycle only, the peak Performance of P4/2 Ghz must be 4 GFLOPS. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 18 Jun 103 20:19:35 +0400 (MSD) Subject: SMP CPUs scaling factors (was "what is a flop") In-Reply-To: from "Franz Marini" at Jun 18, 3 10:53:17 am Message-ID: <200306181619.UAA24910@nocserv.free.net> According to Franz Marini > On Tue, 17 Jun 2003, Maurice Hilarius wrote: > > And I would say dual CPU boards do not sale at a factor of 2:1 over singles. > > ... > > As a general ( really general as it changes a lot with code and > compilers) > > the rule I know : > > Dual P3 ( VIA chipset): 1.5 : 1 > > Dual XEON P4 ( Intel 7501 chipset): 1.3 : 1 > ... > > Dual AthlonMP ( AMD 760MPX chipset) 1.4 : 1 > > Does anyone have some real world application figures regarding the > performance ratio between single and two-way (and maybe four-way) SMP > systems based on the P4 Xeon processor ? I may say about SMP speedups for AthlonMP/760MP, for P4 they will depends from chipset (kind of FSB and memory used). On G98 speedup for 2 CPUs is between 1.4-1.8 depending from calc. method and problem size. For Opteron/1.6 Ghz they are higher (up to 1.97 in some G98 tests). 4-way P4 SMP may be not too attractive if 4 CPUs will share common bus to memory. 
4-way Opteron systems should be very good (they may arrive on the market soon). Mikhail Kuzminsky Zelinsky Inst. of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 17:42:01 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: from "Stefano" at Jun 18, 3 11:22:25 pm Message-ID: <200306201342.RAA28782@nocserv.free.net> According to Stefano > As I am going to receive some funding this fall, I was wondering about buying > an opteron cluster for my research. > Mainly, the cluster will run VASP (an ab-initio quantum program, > written by a group in Wien), with myrinet. > Is somebody using AMD opterons yet ? We tested a 2-way SMP server based on a RioWorks motherboard. But I would not recommend this motherboard: by default it has no monitoring (temperature etc.) chips on the board, and it's necessary to buy a special additional card! Unfortunately, as a result I don't have data about how lm_sensors works on it. Moreover, the choice of SMP boards is very limited now: Tyan S2880 and MSI K8D. > ... > I think some fortran vendor has announced the port of their F90 to > the opteron. Well, it would be nice to recompile VASP for 64bits and see > how fast it goes. There are several possibilities: pgf90, Intel ifc (32-bit only), g77-3.3 (now really very good, but f77 only) and Absoft. We tested the first 3 compilers. But I'm not sure that you'll get a substantial speed-up from 64-bit mode itself right now. SSE2 is supported in 32-bit mode also, but it looks like SSE2 in the Opteron is implemented worse than in the P4 (in the sense of microarchitecture). Yes, some compilers can now generate code which uses the additional registers from the x86-64 architecture extensions, but we didn't find a substantial speed-up on simple loops like DAXPY. > With the itanium2 (compiled in 2 versions, 32 and 64 > bits), it is not so fast as to justify the HUGE cost of an itanium cluster. > Maybe the opteron will shake high-performance scientific computing ! I believe yes, but for 64-bit calculations. The price for Opteron-based servers is high, and the price/performance ratio in comparison w/Xeon is not clear.
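To be concrete, the DAXPY-type loop meant above is simply the following (a sketch, not our actual benchmark code):

    /* y = a*x + y in double precision - the classic DAXPY kernel.
       A streaming loop like this is limited mainly by memory bandwidth
       and SSE2 throughput, which may be why the extra x86-64 registers
       gave no substantial speed-up on it. */
    void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }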
We compared Opteron/1.6 w/dual DDR266 CL2.5 and Athlon MP 1800+ w/close frequency (1533 MHz) and DDR266 also. Speedup for Gamess-US (ifc 7.1, opt for P4) and for binary G98 version (pgf77, optimized for PIII) on a set of different computational methods (in the sense of cache localization, memory throughput requirements etc) is about 1.5-1.9. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 20 Jun 103 18:09:51 +0400 (MSD) Subject: [OT] Maximum performance on single processor ? In-Reply-To: <4.3.2.7.2.20030620140207.00ae23a0@pop.freeuk.net> from "Simon Hogg" at Jun 20, 3 02:15:47 pm Message-ID: <200306201409.SAA29175@nocserv.free.net> According to Simon Hogg > > At 14:44 20/06/03 +0200, Marc Baaden wrote: > >I have an existing application which is part of a project. I have > >the source code. It is Fortran. It *can* be parallelized, but we > >would rather spend our time on the other parts of the project > >which need to be written from scratch *first*. > > > >The application is to run in real time, that is the user does something > >and as a function of user input and the calculation with the fortran > >program that I described, there is a correponding feedback to the > >user on the screen (and in some Virtual Reality equipment). > > > >Right now, even on simple test cases, the "response time" (eg calculation > >time for a single step) of our program is on the order of the second. > >(this is for an athlon MP 2600+) > >We need to get that down to a fraction of seconds, best milli-seconds, > >in order to be usable in real time. (makes it a factor of roughly 1000) > > > >As I said the code can indeed be parallelized - maybe even simply cleaned > >up in some parts - but unfortunately there remains very much other important > >stuff to do. So we'd rather spend some money on a really fast CPU and not > >touch the code at the moment. > > > >So my question was more, what is the fastest CPU I can get for $20000 > >at the moment (without explicitly parallelizing, hyperthreading or > >vectorizing my code). > > I'm sure some other people will give 'better' answers, but from having a > look at your web pages, I would be tempted to go down the route of > second-hand SGI equipment. > > For example (and no, I don't know how the performance stacks up, I'm > looking partly at a general bio-informatics / SGI link if that makes sense) > I can see for sale an Origin 2000 Quad 500MHz / 4GB RAM for UKP 15,725. W/o parallelization it looks as bad choice: any CPU will be more slow than the same Opteron or P4. If FP performance is important, Power4+ or Itanium 2 (or, more exactly, Madison one month later) may be the best choice. 
And, at least, optimize your program as possible :-) Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:48:28 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <005701c33792$c7c1ddf0$6501a8c0@sims.nrc.ca> from "Serguei Patchkovskii" at Jun 20, 3 09:16:44 pm Message-ID: <200306211348.RAA15586@nocserv.free.net> According to Serguei Patchkovskii > for Opteron- > > based servers is high, and price/performance ratio in comparison > > w/Xeon is not clear. > Once you start populating your systems with "interesting" amounts of memory > (i.e. anything above 2Gbytes), the price difference between dual Opterons > and > dual Xeons is really in the noise - at least at the places we buy. If your > suppliers > charge you a lot more for Opterons, may be you should look for another > source? > There is currently not "too wide" choice of possible sources of dual Opteron systems now in Russia :-) I agree that high memory price (for DIMMs from 1 GB, but the price will decrease) lower the percent of differences in total price, but if you use 512MB DIMMs for complectation, price difference is essential. Pls sorry: I assume, that in general the prices here in Russia are similar to other countries, but I didn't check just now. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sat, 21 Jun 103 17:16:15 +0400 (MSD) Subject: cluster of AOD Opteron In-Reply-To: <1056121119.9688.7.camel@picard.lab.atipa.com> from "Curt Moore" at Jun 20, 3 09:58:40 am Message-ID: <200306211316.RAA15134@nocserv.free.net> According to Curt Moore > The RioWorks HDAMA (Arima) motherboard does have on-board sensors, > adm1026 based. 1) there is no information about environment monitoring chips in the HDAMA motherboard guide (at least in the guide we had) 2) sensors-detect utility (I used version from SuSe enterprise Linux beta-version distribution) didn't find any monitoring chips at the testing > Arima does have planned both a mini BMC which does just > management type functions and also a full BMC with will do other neat > things, I believe, such as KVM over LAN. Below is a lm_sensors dump > from an Arima HDAMA. It's good. But which lm_sensors version should be used and what are the necessary settings for lm_sensors kernel modules (taking into account that lm_sensors didn't find anything ) ? 
> > adm1026-i2c-0-2c > Adapter: SMBus AMD8111 adapter at 80e0 > Algorithm: Non-I2C SMBus adapter > in0: +1.15 V (min = +0.00 V, max = +2.99 V) > in1: +1.59 V (min = +0.00 V, max = +2.99 V) > in2: +1.57 V (min = +0.00 V, max = +2.99 V) > in3: +1.19 V (min = +0.00 V, max = +2.99 V) > in4: +1.18 V (min = +0.00 V, max = +2.99 V) > in5: +1.14 V (min = +0.00 V, max = +2.99 V) > in6: +1.24 V (min = +0.00 V, max = +2.49 V) > in7: +1.59 V (min = +0.00 V, max = +2.49 V) > in8: +0.00 V (min = +0.00 V, max = +2.49 V) > in9: +0.45 V (min = +1.25 V, max = +0.98 V) > in10: +2.70 V (min = +0.00 V, max = +3.98 V) > in11: +3.33 V (min = +0.00 V, max = +4.42 V) > in12: +3.38 V (min = +0.00 V, max = +4.42 V) > in13: +5.12 V (min = +0.00 V, max = +6.63 V) > in14: +1.57 V (min = +0.00 V, max = +2.99 V) > in15: +11.88 V (min = +0.00 V, max = +15.94 V) > in16: -12.03 V (min = +2.43 V, max = -16.00 V) > fan0: 0 RPM (min = 0 RPM, div = 2) > fan1: 0 RPM (min = 0 RPM, div = 2) > fan2: 0 RPM (min = 0 RPM, div = 2) > fan3: 0 RPM (min = 0 RPM, div = 2) > fan4: 0 RPM (min = 0 RPM, div = 1) > fan5: 0 RPM (min = 0 RPM, div = 1) > fan6: -1 RPM (min = 0 RPM, div = 1) > fan7: -1 RPM (min = 0 RPM, div = 1) > temp1: +37?C (min = -128?C, max = +80?C) > temp2: +46?C (min = -128?C, max = +100?C) > temp3: +46?C (min = -128?C, max = +100?C) > vid: +1.850 V (VRM Version 9.1) > Sorry, what does it means ? adm1026 has no enough possibilities to measure the values (in this case only 3 temperatures but no any RPM value) or lm_sensors version don't work correctly ? Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 24 Jun 103 20:12:35 +0400 (MSD) Subject: Opteron (x86-64) compute farms/clusters? In-Reply-To: <3EF809A4.1050802@dlr.de> from "Thomas Alrutz" at Jun 24, 3 10:19:48 am Message-ID: <200306241612.UAA09513@nocserv.free.net> According to Thomas Alrutz > > I just made some benchmarks on a Opteron 240 (1.4 GHz) node running with > Suse/United Linux Enterprise edition. > I have sucessfully compiled mpich-1.2.4 in 64 bit without any problems > (./configure -device=ch_p4 -commtype=shared). The default compiler is > the gcc-3.2.2 (maybe a Suse patch) and is set to 64Bit, the Portland > (5.0beta) compiler didn't worked at all ! > > I tried our CFD-code (TAU) to run 3 aerodynamik configurations on this > machine with both CPUs and the results are better then estimated. > We achieved in full multigrid (5 cycles, 1 equation turbulence model) a > efficiency of about 97%, 92% and 101 % for the second CPU. > Those results are much better as the results we get on the Intel Xeons > (around 50%). It looks that this results are predictable: Xeon CPUs require high memory bandwidth, but both CPUs share common system bus. Opteron CPUs have own memory buses and scale in this sense excellent. Better SPECrate results for Opteron (i.e. work on a mix of tasks) confirm (in particular) this features. CFD codes, I beleive, require high memory throughput ... 
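To make the memory-throughput point concrete, the inner kernels of such codes look roughly like the STREAM-style triad below (a generic sketch, not taken from TAU or any code mentioned here); when both processors run loops of this kind, a dual Xeon has them contending for one front-side bus, while a dual Opteron lets each processor stream through its own memory controller:

    /* Triad: a[i] = b[i] + s*c[i].  Each iteration moves three doubles
       (two loads, one store) but does only two flops, so the sustained
       rate is set by memory bandwidth rather than by the FPU. */
    void triad(int n, double s, double *a, const double *b, const double *c)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }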
Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 27 Jun 103 21:01:49 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFBEA29.60602@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 08:54:33 am Message-ID: <200306271701.VAA12659@nocserv.free.net> According to Daniel Pfenniger > > For a small experimental cluster (24 dual Xeon nodes) > we decided to use InfiniBand technology, which from specs is > 4 times faster (8Gb/s), 1.5 lower latency (~5musec) than > Myrinet for approximately the same cost/port. Could you pls compare them a bit more detailed ? Infiniband card costs (as I heard) about $1000-, (HCA-Net from FabricNetworks, former InfiniSwitch ?), what is close to Myrinet. But what is about switches (I heard about high prices) ? In particular, I'm interesting in very small switches; FabricNetworks produce 8-port 800-series switch, but I don't know about prices. May be there is 6 or 4 port switches ? BTW, is it possible to connect pair of nodes by means of "cross-over" cable (as in Ethernet), i.e. w/o switch ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Sun, 29 Jun 103 18:14:48 +0400 (MSD) Subject: Intel PRO/1000CT Gigabit ethernet with CSA In-Reply-To: <3EFCA093.4090006@obs.unige.ch> from "Daniel Pfenniger" at Jun 27, 3 09:52:51 pm Message-ID: <200306291414.SAA12281@nocserv.free.net> According to Daniel Pfenniger > Patrick Geoffray wrote: > > On Fri, 2003-06-27 at 13:46, Daniel Pfenniger wrote: > >>The exact costs are presently not well fixed because several companies > >>enter the market. The nice thing about IB is that it is an open > >>standard, the components from different companies are compatible, > >>which is good for pressing costs down. > > > > With the slicon coming from one company (actually 2 but the second one > > does only switch chip), the price adjustment would mainly affect the > > reseller, where the margin are not that high. I don't expect much a > > price war in the Infiniband market, mainly because many IB shops are > > already just burning (limited) VC cash. > > The main point for price advantage of IB is if the volume goes up. It's > > a very different problem that the multiple-vendors-marketing-stuff. One > > can argue that HPC does not yield such high volumes, only a business > > market like the Databases one does. > > > > Remember Gigabit Ethernet. It was very expensive when the early adopters > > were the HPC crowd and the price didn't drop until it made its way to > > the desktop. It's the case for 10GE today. > > ... > > Patrick Geoffray > > Myricom, Inc. > > Yes I mostly agree with your analysis, database is the only significant > potential market for IB. > > However the problem with 1GBE or 10GBE is that the latency remains poor > for HPC applications, while IB goes in the right direction. 
> The real comparison to be made is not between GE and IB, but between > IB and Myricom products, which belong to an especially protected niche. > As a result for years the Myrinet products did hardly drop in price > for a sub-Moore's-law increase in performance, because of a lack of > competition (the price we paid for our Myricom cards and switch > 18 months ago is today *exactly* the same). I agree with you both. From the viewpoint of HPC clusters the IB competitor is Myrinet (and SCI etc). But there are many applications w/coarse-grained parallelism, where bandwidth is the main thing, not the latency (I think, quantum chemistry applications are bandwidth- limited). In this case (i.e. if latnecy is less important) 10Gb Ethernet is also IB competitor. Moreover, IB, I beleive, will be used for TCP/IP connections also - in opposition to Myrinet etc. (I beleive there is no TCP/IP drivers for Myrinet - am I correct ?) Again, from the veiwpoint of some real appilications, there are some applications which use TCP/IP stack for parallelization (I agree that is bad, but ...) - for example Linda tools (used in Gaussian) work over TCP/IP, Gamess-US DDI "subsystem" works over TCP/IP. In the case of IB or 10Gb Ethernet TCP/IP is possible. Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 3 Jul 103 20:27:51 +0400 (MSD) Subject: Linux support for AMD Opteron with Broadcom NICs In-Reply-To: <20030701224808.GA15167@stikine.ucs.sfu.ca> from "Martin Siegert" at Jul 1, 3 03:48:08 pm Message-ID: <200307031627.UAA02885@nocserv.free.net> According to Martin Siegert > > Hello, > I have a dual AMD Opteron for a week or so as a demo and try to install > Linux on it - so far with little success. > First of all: doing a google search for x86-64 Linux turns up a lot of > press releases but not much more, particularly nothing one could download > and install. Even a direct search on the SuSE and Mandrake sites shows > only press releases. Sigh. > Anyway: I found a few ftp sites that supply a Mandrake-9.0 x86_64 version. > Thus I did a ftp installation which after (many) hickups actually worked. > However, that distribution does not support the onboard Broadcom 5704 > NICs. I also could not get the driver from the broadcom web site to work > (insmod fails with "could not find MAC address in NVRAM"). > Thus I tried to compile the 2.4.21 kernel which worked, but > "insmod tg3" freezes the machine instantly. > Thus, so far I am not impressed. > For those of you who have such a box: which distribution are you using? > Any advice on how to get those GigE Broadcom NICs to work? I may only add to the list of AMD64-oriented distributions Turbolinux 8 for AMD64. I'm not sure that "promotional" version of Turbolinux is complete enough, but "commercial" version costs only about $70 (w/o support ;-)). BTW, does somebody try it ? We worked w/SuSE SLES8: it looks today as the only "reliable" choice of 64-bit ditribution :-( Let me congratulate our colleagues in USA w/4th July ! Mikhail Kuzminsky Zelinsky Inst. 
of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 18:28:33 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <200307161516.09818.joachim@ccrl-nece.de> from "Joachim Worringen" at Jul 16, 3 03:16:09 pm Message-ID: <200307161428.SAA28224@nocserv.free.net> According to Joachim Worringen > Franz Marini: > > being in the process of deciding which net infrastructure to use for our > > next cluster (Myrinet, SCI/Dolphin or Quadrics), I was looking at the > > specs for the different types of hw. > > Provided that SCI/Dolphin implements RDMA, I was wondering why so little > > effort seems to be put into implementing a GSM solution for x86 clusters. > > Because MPI is what most people want to achieve code- and > performance-portability. Partially I may agree, partially not: MPI is not the best in the sense of portability (for example, optimization requires knowledge of the interconnect topology, which may vary from cluster to cluster, and of course from MPP to MPP computer). I think that if there is a relatively cheap and effective way to build a ccNUMA system from a cluster, it may have success. > > > The only (maybe big, maybe not) problem I see in the Dolphin hw is the > > lack of support for cache coherency. > > > > I think that having GSM support in (almost) commodity clusters would be > > a really-nice-thing(tm). > > Martin Schulz (formerly TU München, now Cornell Theory Center) has developed > exactly the thing you are looking for. See > http://wwwbode.cs.tum.edu/Par/arch/smile/software/shmem/ . You will also find > his PhD thesis there which describes the complete software. > > I do not know about the exact status of the SW right now (his approach > required some patches to the SCI driver, and it will probably be necessary to > apply them to the current drivers). Very interesting approach, though. > > Other, non SCI approaches like MOSIX and the various DSM/SVM libraries also > offer you some sort of global shared memory - but most do only use TCP/IP for > communication. > Joachim > Joachim Worringen - NEC C&C research lab St.Augustin > fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de Even a hardware implementation of CPU cache coherence for a large number of processors may become a bottleneck. Broadcast-based MOESI gives high coherence traffic; ccNUMA systems use a directory-based cache-coherence approach instead. Software solutions are in general not efficient, but hardware solutions (if they appear) will be expensive :-( Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 16 Jul 103 22:31:15 +0400 (MSD) Subject: Global Shared Memory and SCI/Dolphin In-Reply-To: <19coKN-5n4-00@etnus.com> from "James Cownie" at Jul 16, 3 04:36:23 pm Message-ID: <200307161831.WAA02082@nocserv.free.net> According to James Cownie > > > > Because MPI is what most people want to achieve code- and > > > performance-portability.
> > > Partially I may agree, partially - not: MPI is not the best in the > > sense of portability (for example, optimiziation requires knowledge > > of interconnect topology, which may vary from cluster to cluster, > > and of course from MPP to MPP computer). > > MPI has specific support for this in Rolf Hempel's topology code, > which is intended to allow you to have the system help you to choose a > good mapping of your processes onto the processors in the system. Unfortunately I do not know about that codes :-( but for the best optimization I'll re-build the algorithm itself to "fit" for target topology. > > This seems to me to be _more_ than you have in a portable way on the > ccNUMA machines, where you have to worry about > > 1) where every page of data lives, not just how close each process is > to another one (and you have more pages than processes/threads to > worry about !) > > 2) the scheduler choosing to move your processes/threads around the > machine. Yes, but "by default" I beleive that they are the tasks of operating system, or, as maximum, the information I'm supplying to OS, *after* translation and linking of the program. > > > I think that if there is relative cheap and effective way to build > > ccNUMA system from cluster - it may have success. > > Which is, of course, what SCI was _intended_ to be, and we saw how > well that succeeded :-( > > -- Jim > James Cownie > Etnus, LLC. +44 117 9071438 > http://www.etnus.com Mikhail Kuzminsky Zelinsky Institute of Organic Chemsitry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 25 Jul 103 20:55:49 +0400 (MSD) Subject: Infiniband: cost-effective switchless configurations Message-ID: <200307251655.UAA08132@nocserv.free.net> It's possible to build 3-nodes switchless Infiniband-connected cluster w/following topology (I assume one 2-ports Mellanox HCA card per node): node2 -------IB------Central node-----IB-----node1 ! ! ! ! ----------------------IB----------------------- It gives complete nodes connectivity and I assume to have 3 separate subnets w/own subnet manager for each. But I think that in the case if MPI broadcasting must use hardware multicasting, MPI broadcast will not work from nodes 1,2 (is it right ?). OK. But may be it's possible also to build the following topology (I assume 2 x 2-ports Mellanox HCAs per node, and it gives also complete connectivity of nodes) ? : node 2----IB-------- C e n t r a l n o d e -----IB------node1 \ / \ / \ / \ / \ / \ / \--node3 node4-- and I establish also additional IB links (2-1, 2-4, 3-1, 3-4, not presenetd in the "picture") which gives me complete nodes connectivity. Sorry, is it possible (I don't think about changes in device drivers)? If yes, it's good way to build very small and cost effective IB-based switchless clusters ! BTW, if I will use IPoIB service, is it possible to use netperf and/or netpipe tools for measurements of TCP/IP performance ? 
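Since IPoIB shows up as an ordinary IP network interface, sockets-based tools such as netperf or NetPIPE should work over it unchanged. For a quick sanity check even a trivial sockets program is enough; a minimal sketch (the port number, transfer size and file name are arbitrary, and error handling is kept to a minimum):

    /* tcpbw.c - crude TCP throughput probe.
       Receiver:  ./tcpbw            (start this on one node first)
       Sender:    ./tcpbw <recv-ip>  (prints the achieved MB/s)      */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PORT    5001             /* arbitrary free TCP port   */
    #define CHUNK   (1 << 20)        /* 1 MB per send/recv call   */
    #define NCHUNKS 256              /* 256 MB total transfer     */

    static double now(void)          /* wall-clock time in seconds */
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1e-6 * tv.tv_usec;
    }

    int main(int argc, char **argv)
    {
        static char buf[CHUNK];
        struct sockaddr_in a;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        memset(&a, 0, sizeof a);
        a.sin_family = AF_INET;
        a.sin_port = htons(PORT);

        if (argc == 1) {                           /* receiver side */
            int c;
            a.sin_addr.s_addr = htonl(INADDR_ANY);
            bind(s, (struct sockaddr *) &a, sizeof a);
            listen(s, 1);
            c = accept(s, NULL, NULL);
            while (read(c, buf, CHUNK) > 0)
                ;                                  /* just drain the stream */
            close(c);
        } else {                                   /* sender side */
            double t0, mb = 0.0;
            int i;
            a.sin_addr.s_addr = inet_addr(argv[1]);
            if (connect(s, (struct sockaddr *) &a, sizeof a) < 0) {
                perror("connect");
                return 1;
            }
            t0 = now();
            for (i = 0; i < NCHUNKS; i++) {
                ssize_t k = write(s, buf, CHUNK);  /* count bytes actually sent */
                if (k <= 0)
                    break;
                mb += k / (1024.0 * 1024.0);
            }
            printf("%.1f MB/s\n", mb / (now() - t0));
        }
        close(s);
        return 0;
    }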
Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 20 Aug 103 20:09:20 +0400 (MSD) Subject: SGE on AMD Opteron ? Message-ID: <200308201609.UAA08558@nocserv.free.net> Sorry, is here somebody who works w/Sun GrideEngine on AMD Opteron platform ? I'm interesting in any information - about binary SGE distribution in 32-bit mode, or about compilation from the source for x86-64 mode, under SuSE or RedHat distribution etc. Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 22 Aug 103 22:15:01 +0400 (MSD) Subject: PCI-X/133 NICs on PCI-X/100 Message-ID: <200308221815.WAA27091@nocserv.free.net> I'm interesting in any experience about work of PCI-X/133 NICs with PCI-X/100 slot. Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards work w/PCI-X/100 slots on Opteron-based mobos (most of them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro mobos) - i.e. how high is the probability that they are incompatible ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemnistry Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kus at free.net Mon Jul 23 12:44:16 2012 From: kus at free.net (Mikhail Kuzminsky) Date: Tue, 21 Oct 103 14:49:07 +0400 (MSD) Subject: parllel eigen solvers In-Reply-To: <200310201236.28901.kinghorn@pqs-chem.com> from "Donald B. Kinghorn" at Oct 20, 3 12:36:28 pm Message-ID: <200310211049.OAA18031@nocserv.free.net> According to Donald B. Kinghorn > > Does anyone know of any recent progress on parallel eigensolvers suitable for > beowulf clusters running over gigabit ethernet? > It would be nice to have something that scaled moderately well and at least > gave reasonable approximations to some subset of eigenvalues and vectors for > large (10,000x10,000) symmetric systems. > My interests are primarily for quantum chemistry. > In the case you think about semiempirical fockian diagonalisation, there is a set of alternative methods for direct construction of density matrix avoiding preliminary finding of eigenvectors. This methods are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. For non-empirical quantum chemistry diagonalisation usually doesn't limit common performance. In the case of methods like CI it's necessary to find only some eigenvectors, and it is better to use special diagonalization methods. There is special parallel solver package, but I don't have exact reference w/me :-( Mikhail Kuzminsky Zelinsky Inst. 
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 21 Oct 2003 22:10:23 +0400 (MSD)
Subject: parallel eigen solvers
In-Reply-To: <20031021150637.GA8076@plk.af.mil> from "Arthur H. Edwards" at Oct 21, 3 09:06:37 am
Message-ID: <200310211810.WAA08779@nocserv.free.net>

According to Arthur H. Edwards
>
> I should point out that density functional theory can be compute-bound on
> diagonalization. QUEST, a Sandia code, easily handles several hundred
> atoms, but the eigensolve dominates by ~300-400 atoms. Thus,
> intermediate-size diagonalization is of strong interest.
>
> Art Edwards
>

Yes, I agree with you about DFT.

Yours
Mikhail Kuzminsky

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 30 Dec 2003 18:23:32 +0300 (MSK)
Subject: [Beowulf] X-window, MPICH, MPE, Cluster performance test
In-Reply-To: from "D. Scott" at Dec 29, 3 11:27:21 am
Message-ID: <200312301523.SAA06085@nocserv.free.net>

According to D. Scott
>
> At last! My cluster is now online. I would like to thank everyone for their
> help. I am thinking of putting a website together covering my experience in
> putting this cluster together. Would this be of use to anyone? Is there a
> website that covers a top-100 list of small clusters?
> Now that it is online I would like to test it.
>
> MPICH comes with test programs, e.g. mpptest. The programs work and produce
> nice graphs. Is there any documentation/tutorial that explains the meaning
> of these graphs?
> MPICH also comes with the MPE graphics test programs, e.g. mandel. The
> problem is that I only have X-window installed on the master node. When I
> run pmandel, it returns an error stating that it cannot find the X-window
> shared library on the other nodes. How can I make X-window shared across
> the other nodes from the master node?

You may use NFS for access to the master node.

> That would save me installing GUI programs on the other nodes.
> This could be a related problem, but when I compiled "life" (which uses the
> MPE libraries) it returned errors that the MPE libraries are undefined. Any
> ideas?
> Can I install both LAM/MPI and MPICH-1.2.5 on the same machine?

Yes, of course you may work with both LAM and MPICH.

BTW, let me congratulate the Beowulf mailing list subscribers on the New Year!

Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
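On the question above about the meaning of the mpptest graphs: they essentially plot message-passing time (or bandwidth) against message size, measured with repeated point-to-point exchanges. A minimal hand-rolled ping-pong in C shows the kind of measurement behind such a graph; the message sizes and repetition count are arbitrary choices for the sketch:

/* Minimal MPI ping-pong: ranks 0 and 1 bounce a buffer back and forth
 * and report the average one-way time for each message size.
 * Run with two ranks, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Status st;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (int bytes = 1; bytes <= 1 << 20; bytes *= 4) {
        char *buf = calloc(bytes, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8d bytes: %10.2f us one-way\n",
                   bytes, dt / (2.0 * reps) * 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}

Half of the round-trip time at small sizes approximates the latency; bytes divided by the one-way time at large sizes approximates the bandwidth - the two regimes that mpptest-style plots show.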
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 23 Jan 2004 15:35:32 +0300 (MSK)
Subject: [Beowulf] cluster on suse
In-Reply-To: from "Anand TNC" at Jan 23, 4 10:40:43 am
Message-ID: <200401231235.PAA05593@nocserv.free.net>

According to Anand TNC
>
> Hi,
>
> I'm new to clustering...does anyone know of some clustering software which
> works on Suse 8.2 or Suse 9.0?

All of the usual cluster software will work successfully with SuSE Linux.
If you mean software *included* in the distribution as RPM packages, then
the answer is also yes - SuSE Linux has the most important things, such as
MPI, for example.

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Thanks
>
> regards,
>
> Anand
>
> --
> Anand TNC
> PhD Student,
> Engine Research Laboratory              U-55 IISc Hostels,
> Dept. of Mechanical Engg.,              Indian Institute of Science,
> Indian Institute of Science,            Bangalore 560 012.
> Bangalore 560 012.                      Ph: 080 293 2591
> Lab Ph: 293 2352                        080 293 2624
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue, 10 Feb 2004 21:27:22 +0300 (MSK)
Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?)
In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> from "Andrew Wang" at Feb 10, 4 11:42:32 am
Message-ID: <200402101827.VAA05978@nocserv.free.net>

According to Andrew Wang
> From comp.arch: "One of the things that the version
> 8.0 of the Intel compiler included was an
> "Intel-specific" flag."
>
> But it looks like the purpose is to slow down AMD:
> http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com
>
> If Intel releases 64-bit x86 CPUs and compilers, then
> AMD may get even better benchmark results.

The danger of this "slow-down" is not extremely large now: the SPEC CPU2000
results (perhaps the best obtained) published for "high-end" Opterons are
based on the Portland Group compiler, not on ifc.

Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Again, no matter how pretty the benchmark results
> look, in the end we still need to run on the real
> system. So, what's the point of having benchmarks?
>
> Andrew.
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
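Some background on how an "Intel-specific" run-time check can work at all: dispatch code typically reads the CPUID vendor string and/or feature bits and then selects a code path. A small sketch using gcc's <cpuid.h> helper (a gcc-specific header; the printed messages are illustrative only):

/* Read the CPUID vendor string ("GenuineIntel", "AuthenticAMD", ...) the
 * way run-time dispatchers do before choosing a code path.  gcc/x86 only. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID not supported\n");
        return 1;
    }
    /* the vendor string is laid out across EBX, EDX, ECX in that order */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);

    printf("vendor: %s\n", vendor);
    if (strcmp(vendor, "GenuineIntel") == 0)
        printf("a vendor-based dispatcher would take the fast path here\n");
    else
        printf("a vendor-based dispatcher would fall back to generic code here\n");
    return 0;
}

Dispatching on the feature bits from CPUID leaf 1 (SSE2 and friends) instead of on the vendor string is what avoids penalizing non-Intel CPUs that implement the same instructions.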
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 14 May 2004 22:27:21 +0400 (MSD)
Subject: [Beowulf] Athlon64 / Opteron test
In-Reply-To: <40A4E4D8.9010001@mscsoftware.com> from "Joe Griffin" at May 14, 4 08:25:12 am
Message-ID: <200405141827.WAA12362@nocserv.free.net>

According to Joe Griffin
>
> ...
> Below is a web site comparing IA32, IA64 (linux and HPUX), Opteron
> and an IBM P655 running AIX. The site should only be used to
> compare hardware platforms when running our software. I am sure
> that Fluent, LSTC/Dyna, Star-CD have similar sites. I recommend
> finding out about the software that you will be using.
>
> MSC.Nastran Hardware comparison:
>
> http://www.mscsoftware.com/support/prod_support/nastran/performance/v04_sngl.cfm
>
> Regards,
> Joe Griffin
>

This page contains very interesting tables with a description of the hardware
used, but at first look I found only data about the OSes, not about the
compilers/runtime libraries used. The (relatively bad) figures for the IBM
e325/Opteron 2 GHz look "nontrivial"; I believe some interpretation of "why?"
would be helpful.

Maybe some of the applications used are relatively cache-friendly and their
working sets fit in the large Itanium 2 cache? Maybe it depends on the
compiler and math library used?

BTW, for the LGQDF test: I/O is relatively small (please compare the elapsed
and CPU times, which are very close); but the Windows time for the Dell
P4/3.2 GHz (4480 sec) is much worse than for Linux on the same hardware
(3713 sec). IMHO, in this case they should be very close if the same
compilers and libraries are used (I don't like Windows, but this result is
too bad even for this OS :-))

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu, 10 Jun 2004 19:11:31 +0400 (MSD)
Subject: [Beowulf] Setting memory limits on a compute node
In-Reply-To: from "Brent M. Clements" at Jun 8, 4 10:42:43 am
Message-ID: <200406101511.TAA17314@nocserv.free.net>

According to Brent M. Clements
>
> We have a user who submits a job to a compute node.
>
> The application is Gaussian. The parent Gaussian process can spawn a few
> child processes. It appears that the Gaussian application is exhausting
> all of the memory in the system, essentially stopping the machine from
> working. You can still ping the machine but can't ssh. Anyway, I know the
> fundamentals of why this is happening. My question: is there any way to
> limit the total address space that a user's processes can use, so that
> it doesn't kill the node?

This situation may depend strongly on the actual calculation method used
within Gaussian (and maybe on the objects of the calculations, i.e. the
molecules). We run G98 jobs (I believe G03 will behave the same way) and have
not had such problems.

You may try to restrict (if it is really necessary) the memory used by a
particular Gaussian job by setting the %mem value in the Gaussian input data;
there are also default settings for the %mem value in the Gaussian
configuration file. G98 cannot exceed the %mem value. We tell our G98 users
the upper limit of the %mem value that does not lead to heavy paging.

You may also try to set ulimit/limit values for stack and data in the shell
script used to submit the G98 job.

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
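The ulimit suggestion can also be expressed directly in C as a tiny launcher that caps the address space of whatever job it execs; the 2 GB cap and the command being launched are placeholders for the sketch, and RLIMIT_AS is Linux-specific:

/* Toy launcher: set an address-space limit, then exec the real job.
 * Equivalent in spirit to "ulimit -v" in the submission script.
 * Hypothetical usage: ./memcap g98 job.com */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    /* placeholder cap: 2 GB of total address space */
    struct rlimit rl = { 2UL << 30, 2UL << 30 };
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    execvp(argv[1], &argv[1]);
    perror("execvp");   /* reached only if the exec failed */
    return 1;
}

Because resource limits are inherited across fork()/exec(), the cap also applies to any child processes the job spawns.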
From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Wed, 16 Jun 2004 20:05:24 +0400 (MSD)
Subject: [Beowulf] CCL:Experiences with 64 bits AMD processors (fwd from
In-Reply-To: <20040616042135.GH12847@leitl.org> from "Eugen Leitl" at Jun 16, 4 06:21:35 am
Message-ID: <200406161605.UAA24654@nocserv.free.net>

According to Eugen Leitl
>
> From: Marc Noguera Julian
> Date: Tue, 10 Jun 2003 19:09:00 +0200
> To: chemistry at ccl.net
> Subject: CCL:Experiences with 64 bits AMD processors
> User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
>
> Hello,
> we are interested in buying some more computational resources. In our
> group we are interested in 64-bit AMD processors, but we do not know
> about their compatibility. They are supposed, as AMD says, to be 32-bit
> compatible, and therefore an AMD 64-bit processor should be able to run
> any 32-bit application. Is that true? Any experience about this will help
> us a lot.

We run, in particular, Gaussian-98 (the 32-bit binary version) on Opteron
servers with SuSE SLES8.

> By the way, we are running mainly Gaussian jobs, and have some other 32-bit
> binaries like Turbomole and Jaguar. We have a source code license for
> Gaussian 03. Has anyone tried to compile Gaussian 03 for an AMD 64-bit
> machine? Do 32-bit Pentium binaries run correctly on a 64-bit processor,
> and what is the increase in performance?

Yes, G03 has been compiled at least by Gaussian, Inc. itself: there is a
64-bit G03 binary version for Opteron in the price list. We see a significant
speed-up on Opteron in comparison with Athlons. We also run 32-bit binary
codes built for Pentium on Opteron.

> Do Turbomole and Jaguar binaries run on 64-bit AMD processors? Has anyone
> tried?
> Any information will be helpful.
> Thanks a lot
> Marc
>
> ---------------------------
> Marc Noguera Julian
> Tècnic Especialista de Suport a la Recerca
> Química Física, Universitat Autònoma de Barcelona.
> Tlf: 00-34-935812173
> Fax: 00-34-935812920
> e-mail: marc at klingon.uab.es
> ---------------------------------------
>
> Eugen* Leitl leitl
> ______________________________________________________________
> ICBM: 48.07078, 11.61144 http://www.leitl.org
> 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
> http://moleculardevices.org http://nanomachines.net
>

Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri, 18 Jun 2004 20:15:23 +0400 (MSD)
Subject: [Beowulf] cluster on Mellanox Infiniband
Message-ID: <200406181615.UAA19878@nocserv.free.net>

We are purchasing a pair of Mellanox Infiniband 4x HCA cards (PCI-X/133) to
build a small 2-node, 4-processor switchless testing cluster based on AMD
Opteron with Tyan S2880 boards. The nodes run SuSE Linux 9.0 for AMD64.

I would greatly appreciate any information about the following:

1) Do we need to buy some additional software from Mellanox (like THCA-3 or
the HPC Gold CD distribution, etc.)?

2) Any information about potential problems in building and using this
hardware/software.

To be more exact, we also want to install MVAPICH (for MPI-1) or the new
VMI 2.0 from NCSA for MPI work. For example, VMI 2.0, I believe, requires
THCA-3 and the HPC Gold CD for installation. But I don't know whether we will
receive this software with the Mellanox cards or whether we should buy it
separately. I need this information badly, because we are very restricted in
money ;-) !

Thanks for your help!
Yours
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Jul 23 12:44:16 2012
From: kus at free.net (Mikhail Kuzminsky)
Date: Mon, 21 Jun 2004 17:46:23 +0400 (MSD)
Subject: [Beowulf] cluster on Mellanox Infiniband
In-Reply-To: from "Franz Marini" at Jun 21, 4 10:24:58 am
Message-ID: <200406211346.RAA17895@nocserv.free.net>

According to Franz Marini
> Hi,
>
> On Fri, 18 Jun 2004, Mikhail Kuzminsky wrote:
>
> > 1) Do we need to buy some additional software from Mellanox (like THCA-3
> > or the HPC Gold CD distribution, etc.)?
>
> You shouldn't have to.

Thank you VERY much for your fast reply!! I'm glad to hear it...

> > 2) Any information about potential problems in building and using this
> > hardware/software.
>
> > To be more exact, we also want to install MVAPICH (for MPI-1) or the new
> > VMI 2.0 from NCSA for MPI work.
> > For example, VMI 2.0, I believe, requires THCA-3 and the HPC Gold CD for
> > installation. But I don't know whether we will receive this software with
> > the Mellanox cards or whether we should buy it separately.
>
> Hrm, no, VMI 2.0 requires neither THCA-3 nor the HPC Gold CD (whatever
> it is ;)).

The NCSA site for VMI says "Infiniband device is linked against THCA-3.
OpenIB device is linked using HPC Gold CD distrib". What does that mean?
I have to install VMI for Opteron + SuSE 9.0; there is no such binary RPM,
i.e. I have to build VMI from source. I thought that I had to use the
software cited above to build my binary VMI version. I believe that the
THCA Linux 3.1.1 software/driver will be delivered with the Mellanox cards,
and OpenSM 0.3.1 - I hope - as well. But I know nothing about the
"HPC Gold CD distrib" :-(

> We have a small (6 dual Xeon nodes, plus server) testbed cluster with
> Mellanox Infiniband (switched, obviously).
>
> So far, it's been really good. We tested the net performance with SKaMPI4
> ( http://liinwww.ira.uka.de/~skampi/ ), the results should be in the
> online db soon, if you want to check them out.
>
> Seeing that you are at the Institute of Organic Chemistry, I guess you're
> interested in running programs like Gromacs or CPMD. So far both of them
> worked great with our cluster, as long as only one CPU per node is used
> (running two different runs of Gromacs and/or CPMD on both CPUs of each
> node gives good results, but running only one instance of either program
> on both CPUs of each node results in very poor scaling).

It looks like that causes contention on the bus to shared memory?

Thanks for the help
Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow

>
> Have a good day,
>
> Franz
>
>
> ---------------------------------------------------------
> Franz Marini
> Sys Admin and Software Analyst,
> Dept. of Physics, University of Milan, Italy.
> email : franz.marini at mi.infn.it
> ---------------------------------------------------------
>
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at eadline.org Mon Jul 16 15:48:53 2012
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 16 Jul 2012 15:48:53 -0400
Subject: [Beowulf] A few Cluster Monkey things ...
Message-ID:

Happy summer everyone,

I have had a poll up for a while now on Cluster Monkey asking about social
media and HPC. If the interest in this poll is any indication, I think I can
guess the final results, but if you have a minute, head on over and take the
poll:

http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html

As always, our polls and results are on the site for your viewing. BTW, I
think it might be worthwhile to re-ask some of the older poll questions.

http://www.clustermonkey.net/Cluster/HPC-Polls-and-Surveys/

Also, if you have a burning question, let me know and I'll put it up as a
poll.

Finally, while you are there, check out the HPC500 program that Intersect360
has launched. Seems interesting and a great way to help influence the
industry.

http://clustermonkey.net/Select-News/are-you-leading-the-hpc-charge.html

Thanks!

Doug Eadline

--
Doug

--
Mailscanner: Clean

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From dnlombar at ichips.intel.com Mon Jul 16 16:20:28 2012
From: dnlombar at ichips.intel.com (David N. Lombard)
Date: Mon, 16 Jul 2012 13:20:28 -0700
Subject: [Beowulf] A few Cluster Monkey things ...
In-Reply-To:
References:
Message-ID: <20120716202028.GA29118@nlxcldnl2.cl.intel.com>

On Mon, Jul 16, 2012 at 03:48:53PM -0400, Douglas Eadline wrote:
>
> Happy summer everyone,
>
> I have had a poll up for a while now on Cluster Monkey asking about social
> media and HPC. If the interest in this poll is any indication, I think I
> can guess the final results, but if you have a minute, head on over and
> take the poll:
>
> http://clustermonkey.net/poll/2-what-kind-of-social-media-do-you-use-the-most.html

Hmmm. This doesn't distinguish usages. It would be nice to see how people
view social media as a professional tool. Something like "What kind of
social media do you turn to for technical information?" The choices you
have for your question fit this, too :)

--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Mailscanner: Clean