From stig at mono.org Thu Feb 1 05:12:50 2001 From: stig at mono.org (stig) Date: Thu, 1 Feb 2001 10:12:50 +0000 (GMT) Subject: Scyld and Red Hat 7 In-Reply-To: <3A78A2D6.DCAD1B6F@cs.tamu.edu> Message-ID: As long as the system includes the main libs, a kernel and the popular package managers (well RPM) does it really matter what distribution it is based on? Would there be this discussion if they 'based' it on their own compilation of binaries instead of those of RedHats. David On Wed, 31 Jan 2001, Gerry Creager wrote: > Ken wrote: > > > > Martin Siegert wrote: > > > > > 2. with respect to hardware support: most of that comes with the kernel, > > > particularly everything that is loaded as modules (e.g., and NIC drivers). > > > Hence, upgrading to a 2.4 kernel probably gets you better hardware > > > support than upgrading to RH 7.0. > > > > > > > I can agree with you in that since I upgraded to RH7 I've decided to use > > Mandrake instead. ;-) > > The original questions was about baseing the next release of Scyld in > > RH7. It is hard to upgrade a distro that doesn't load in the first > > place. If upgrading the kernel is enough, then that's fine. I can > > think of plenty of reasons to keep the distro as simple and functional > > as possible. > > While my experiences with Mandrake have generally been horror stories, > my experiences with RH7 have been disasters of truly epic proportion. > I've just about got RH7 tamed... as long as I don't mount my CDROM and > use tcp/ip at the same time... and NO! I'm not kidding. > > I've stuck to RH6.2 for production. > -- > Gerry Creager | Never ascribe to Malice that > AATLT | which can adequately be > Texas A&M University | explained by Stupidity. > 979.458.4020 (Phone) | -- Lazarus Long > 979.847.8578 (Fax) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From valentin at olagrande.net Thu Feb 1 04:58:23 2001 From: valentin at olagrande.net (valentin at olagrande.net) Date: Thu, 1 Feb 2001 03:58:23 -0600 (CST) Subject: diskless nodes with scyld Message-ID: <200102010958.DAA28459@og1.olagrande.net> I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought previous articles in the archive suggest that this is possible, I have found no instruction anywhere on how to do this. My nodes are single P3-800/133 with 128Mb of RAM, floppy drives, and 3 10/100 ethernet ports. They also can send dhcp requests. >From an article in the December archives, I assumed that all I need to do is change the /etc/beowulf/fstab file. Mine currently has the following entries: /dev/ram3 / ext2 fs_size=65536 0 0 none /proc proc defaults 0 0 none /dev/pts devpts gid=5,mode=620 0 0 $MASTER:/home /home nfs defaults 0 0 This fails, and I have trouble interpreting the error log attached at the end of this message. Now, who can help me? 1. What are the instructions for booting a diskless node with Scyld? 2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM? My nodes send out DHCP requests. Can I simply setup a dchp server to hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports? 
Valentin --------------cut here------------- [root at scyld beowulf]# cat /var/log/beowulf/node.0 node_up: Setting system clock. mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mountFilesystem label= OS type: Linux Block size=1024 (log=0) Fragment size=1024 (log=0) 128 inodes, 1024 blocks 51 blocks (4.98%) reserved for the super user First data block=1 1 block group 8192 blocks per group, 8192 fragments per group 128 inodes per group Writing inode tables: done Writing superblocks and filesystem accounting information: : No such file or directory while determining whether /dev/ram1 is mounted. done node_up: TODO set interface netmask. node_up: Configuring loopback interface. /dev/hda: No such device beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing /usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or directory setup_fs: Checking /dev/ram3 (type=ext2)... e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted. Couldn't find ext2 superblock, trying backup blocks... The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 e2fsck: Bad magic number in super-block while trying to open /dev/ram3 setup_fs: FSCK failure. setup_fs: Creating ext2 on /dev/ram3... mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted. setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults) setup_fs: Checking 192.168.2.1:/home (type=nfs)... setup_fs: Mounting 192.168.2.1:/home on /rootfs//home... 
(type=nfs; options=defaults) beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing /usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or director\y /usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or directory node_modprobe: installing kernel module: nfs /tmp/nfs.o: unresolved symbol rpc_register_sysctl_Rbf9a77c0 /tmp/nfs.o: unresolved symbol rpc_wake_up_task_Rffa78ed9 /tmp/nfs.o: unresolved symbol rpc_do_call_R0fae8de2 /tmp/nfs.o: unresolved symbol rpc_proc_unregister_R5bd26000 /tmp/nfs.o: unresolved symbol rpc_allocate_R0cd1c989 /tmp/nfs.o: unresolved symbol rpcauth_lookupcred_R0366fdf8 /tmp/nfs.o: unresolved symbol rpc_clnt_sigunmask_R17abaa09 /tmp/nfs.o: unresolved symbol xdr_encode_string_Rabc0fe0c /tmp/nfs.o: unresolved symbol rpc_init_task_Rf4c99bc4 /tmp/nfs.o: unresolved symbol rpc_sleep_on_R41929c92 /tmp/nfs.o: unresolved symbol rpc_shutdown_client_Rb50bc549 /tmp/nfs.o: unresolved symbol rpc_create_client_R4589e663 /tmp/nfs.o: unresolved symbol rpciod_up_R375492a4 /tmp/nfs.o: unresolved symbol rpc_call_setup_R6f2441da /tmp/nfs.o: unresolved symbol rpc_proc_init_Rf56e5632 /tmp/nfs.o: unresolved symbol rpc_killall_tasks_R66ae6aea /tmp/nfs.o: unresolved symbol rpc_release_task_Re71e954e /tmp/nfs.o: unresolved symbol nlmclnt_proc_Rc02cb40f /tmp/nfs.o: unresolved symbol nfs_debug_Raf5bf6ef /tmp/nfs.o: unresolved symbol rpc_execute_R2f4e83ce /tmp/nfs.o: unresolved symbol rpc_clnt_sigmask_R3b8df6d4 /tmp/nfs.o: unresolved symbol xprt_create_proto_Rc88e4139 /tmp/nfs.o: unresolved symbol rpciod_down_Rbabf0f35 /tmp/nfs.o: unresolved symbol rpc_proc_register_R83e79004 /tmp/nfs.o: unresolved symbol xprt_destroy_Rea15ebb6 /tmp/nfs.o: unresolved symbol rpc_wake_up_next_R134f0e35 mount: fs type nfs not supported by kernel Failed to mount 192.168.2.1:/home on /home. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sjarczyk at wist.net.pl Thu Feb 1 05:54:11 2001 From: sjarczyk at wist.net.pl (Sergiusz Jarczyk) Date: Thu, 1 Feb 2001 11:54:11 +0100 (CET) Subject: Q: Any parallel DBs for the cluster computers ? In-Reply-To: <005001c08bef$71056f00$5f72f2cb@TEST> Message-ID: On Thu, 1 Feb 2001, Yoon Jae Ho wrote: > I am seeking the Parallel Database for the linux clusters for 2 years. > but failed. > > Is there any information about Parallel Database using PVFS or GFS or itself > filesystem or any other parallel filesystem ? > > Is there anyone here making the Parallel Database for the linux cluster > including Scyld Beowulf ? > > I will be happy if I get any information about Parallal Database for the > linux . > You should check clustra: http://www.clustra.com Sergiusz _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 1 07:35:09 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 1 Feb 2001 07:35:09 -0500 (EST) Subject: Scyld and Red Hat 7 In-Reply-To: Message-ID: On Thu, 1 Feb 2001, stig wrote: > As long as the system includes the main libs, a kernel and the popular > package managers (well RPM) does it really matter what distribution it is > based on? 
> > Would there be this discussion if they 'based' it on their own compilation > of binaries instead of those of RedHats. The reasons to periodically upgrade an operating system distribution (theirs or anybody else's), and not just the kernel, are many and valid. By the numbers: a) Improved compilers and support libraries. This is probably the number one reason to upgrade a whole distribution rather than just the kernel. Sure, you can just upgrade compilers alone, and kernels alone, and libraries alone, but at some point (especially for major e.g. libc revisions) you find that you have to rebuild everything anyway and the whole point of distributions and kickstart and yellow dog's "yup" tool is to make it easy to get from tested configuration to tested configuration. I've done systems management piecemeal and it is no fun at all. This is currently a highly nontrivial reason in my mind. I'm in the middle of fixing an extremely serious bug in the cpu-rate tool I've been using to measure floating point performance on nodes and have uncovered a rat's nest of wierdness somewhere in the gcc/linux interaction on 6.2 systems. As in I can run the same benchmark code with the same parameters and get two completely different timings, depending literally one whether I set a parameter by a fallthrough default or "override" the parameter to the exact same value on the command line. Or change the order of initialization statements. Different by a factor of two -- not a small difference. This SEEMS to be fixed in RH 7.0 although I'm still testing. b) Improved kernel. For example, NFS is basically and maddeningly broken in pre-2.18 kernels (but MAY be fixed in 2.18) -- I've actually survived a server crash without having to reboot all my NFS clients since upgrading my (non-scyld) cluster. Yes, one can rebuild the kernel by hand, but some of the scyld advantages (and other useful beowulf stuff) interface directly with the kernel. These days one sometimes has to upgrade the base compiler to upgrade the kernel. This is less important to a scyld beowulf than to a more general purpose cluster node, but scyld cannot remain stagnant at a given kernel revision forever. c) Improved everything else. This isn't too important to scyld but again, even e.g. MPI marches along. Bugs are fixed, optimizations are tuned. Scyld may not have to remain sync'd to RH's development cycle, but it has to re-release its OWN distribution package periodically to keep everything up to date and/or users will have to periodically upgrade node or server packages piecemeal. RH 7 has definitely got some problems, but 7.1beta comes out what, today? and reportedly fixes a lot of those problems (as do the many updates already released). Since RH 7 has an incompatible RPM relative to 6.2, the 6.2->7 upgrade requires a pretty serious commitment and lots of folks are holding off until its problems diminish. I therefore don't think that the issue is whether scyld should rebuild on the 7.x distribution -- it is rather a question of when. This is thus a reasonable question to ask, although there is (as noted) less pressure for them to do it immediately. There is also the question of how difficult it is to do the rebuild -- if the distribution is RPM packaged, rebuilding really shouldn't take long at all; it is the testing and stabilizing that takes the time. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andresfc at ideam.gov.co Thu Feb 1 10:07:56 2001 From: andresfc at ideam.gov.co (Andres Felipe CALDERON) Date: Thu, 1 Feb 2001 10:07:56 -0500 Subject: Any parallel DBs for the cluster computers ? References: <005001c08bef$71056f00$5f72f2cb@TEST> Message-ID: <001701c08c60$c741c820$0200000a@casa.zipa.sdc> Oracle Parallel Server? ----- Original message ----- From: Yoon Jae Ho To: beowulf at beowulf.org Sent: Wednesday, January 31, 2001 08:36 p.m. Subject: Q: Any parallel DBs for the cluster computers ? I am seeking the Parallel Database for the linux clusters for 2 years. but failed. Is there any information about Parallel Database using PVFS or GFS or itself filesystem or any other parallel filesystem ? Is there anyone here making the Parallel Database for the linux cluster including Scyld Beowulf ? I will be happy if I get any information about Parallal Database for the linux . Is there anyone to make parallel mysql to be used for the cluster ? Thank you in advance --------------------------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon at bh.kyungpook.ac.kr jhyoon at mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer Imagination is more important than knowledge. A. Einstein ---------------------------------------------------------------------------------------- From yocum at linuxcare.com Thu Feb 1 11:12:49 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Thu, 01 Feb 2001 10:12:49 -0600 Subject: Q: Any parallel DBs for the cluster computers ? References: <005001c08bef$71056f00$5f72f2cb@TEST> Message-ID: <3A798B01.54AEBE34@linuxcare.com> This probably isn't completely related to the beowulf list (probably more related to the linux-ha list), but has anyone run a DB (pick a DB, any DB) on a cluster using DBD (distributed block device) and C-Ensemble's distributed lock manager (http://www.northforknet.com)? Cheers, Dan > Yoon Jae Ho wrote: > > I am seeking the Parallel Database for the linux clusters for 2 years. > but failed. > > Is there any information about Parallel Database using PVFS or GFS or > itself filesystem or any other parallel filesystem ? > > Is there anyone here making the Parallel Database for the linux > cluster including Scyld Beowulf ? > > I will be happy if I get any information about Parallal Database for > the linux . > > Is there anyone to make parallel mysql to be used for the cluster ? -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alangrimes at starpower.net Thu Feb 1 13:30:48 2001 From: alangrimes at starpower.net (Alan Grimes) Date: Thu, 01 Feb 2001 13:30:48 -0500 Subject: Big Iorn Message-ID: <3A79AB58.47921B55@starpower.net> Hey, I have been hearing a lot of things about MVS, the inherant superiority of the 390, and all sorts of stuff about how all these big machines are so radicaly advanced that its not even funny... This has finally piqued my interest to the point where I now would like to know more about how these machines work and what they can actually do. Since this list is tangentaly related to that field I am sure there are at least a few here who could give me some useful pointers. =) -- Perhaps I will upgrade my OS from Win 3.11... But It has to be more sophisticated than Win 3.11. As well as less complicated than Win 3.11. *AND* It must run on THE MACHINE!!!! http://users.erols.com/alangrimes/ Message-ID: On Thu, 1 Feb 2001 valentin at olagrande.net wrote: > I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought > previous articles in the archive suggest that this is possible, I have found > no instruction anywhere on how to do this. We have an updated CD out now (see our website) that runs disklessly by default (based on popular demand). > >From an article in the December archives, I assumed that all I need to do > is change the /etc/beowulf/fstab file. Mine currently has the following > entries: > > /dev/ram3 / ext2 fs_size=65536 0 0 > none /proc proc defaults 0 0 > none /dev/pts devpts gid=5,mode=620 0 0 > $MASTER:/home /home nfs defaults 0 0 That looks right. > This fails, and I have trouble interpreting the error log attached at the > end of this message. Now, who can help me? > 1. What are the instructions for booting a diskless node with Scyld? Comment out the /home mount in your fstab. Your nodes are having NFS problems that are keeping them from coming up. (the node NFS fs module is failing to load for some reason). > 2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM? > My nodes send out DHCP requests. Can I simply setup a dchp server to > hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports? Basically no problem. I'll whip up some instructions for people who want to PXE boot their boxes. With the latest Scyld release, you basically do: beoboot -2 -i to make phase-2 images with the kernel and initrd split out so that you can use them with any boot strategy you choose. Regards, Dan Ridge Scyld Computing Corporation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From siegert at sfu.ca Thu Feb 1 17:01:03 2001 From: siegert at sfu.ca (Martin Siegert) Date: Thu, 1 Feb 2001 14:01:03 -0800 Subject: Scyld and Red Hat 7 In-Reply-To: ; from rgb@phy.duke.edu on Thu, Feb 01, 2001 at 07:35:09AM -0500 References: Message-ID: <20010201140103.A2727@stikine.ucs.sfu.ca> On Thu, Feb 01, 2001 at 07:35:09AM -0500, Robert G. Brown wrote: > On Thu, 1 Feb 2001, stig wrote: > > > As long as the system includes the main libs, a kernel and the popular > > package managers (well RPM) does it really matter what distribution it is > > based on? 
With respect to applications it matters on which version of glibc the distro is based on. > > Would there be this discussion if they 'based' it on their own compilation > > of binaries instead of those of RedHats. > > The reasons to periodically upgrade an operating system distribution > (theirs or anybody else's), and not just the kernel, are many and valid. > By the numbers: > > a) Improved compilers and support libraries. This is probably the > number one reason to upgrade a whole distribution rather than just the > kernel. Sure, you can just upgrade compilers alone, and kernels alone, > and libraries alone, but at some point (especially for major e.g. libc > revisions) you find that you have to rebuild everything anyway and the > whole point of distributions and kickstart and yellow dog's "yup" tool > is to make it easy to get from tested configuration to tested > configuration. I've done systems management piecemeal and it is no fun > at all. This is also the #1 reason for me not to upgrade: if a new distribution comes with a glibc that is not downward compatible with the commercial compilers and scientific libraries that I purchased, I simply cannot use it without spending lots of $$. There must be very good reasons for that. Right now I doubt that, e.g., Portland compilers aren't even available for glibc-2.2; no NAG library either. > This is currently a highly nontrivial reason in my mind. I'm in the > middle of fixing an extremely serious bug in the cpu-rate tool I've been > using to measure floating point performance on nodes and have uncovered > a rat's nest of wierdness somewhere in the gcc/linux interaction on 6.2 > systems. As in I can run the same benchmark code with the same > parameters and get two completely different timings, depending literally > one whether I set a parameter by a fallthrough default or "override" the > parameter to the exact same value on the command line. Or change the > order of initialization statements. Different by a factor of two -- not > a small difference. This SEEMS to be fixed in RH 7.0 although I'm still > testing. I admire you - benchmarking is an art by itself. Just look at the stream benchmark: The comments in the code (stream_d.f) tell you that you can either use static or f90-type allocatable arrays. They don't tell you that the results will be dramatically different (you see the same difference with with stream_d.c when you malloc the array). So which way should you do it? Probably the slow way if you want a meaningful result for your application - I at least malloc almost everything at run time. However, stream results are never quoted that way. > b) Improved kernel. For example, NFS is basically and maddeningly > broken in pre-2.18 kernels (but MAY be fixed in 2.18) -- I've actually > survived a server crash without having to reboot all my NFS clients > since upgrading my (non-scyld) cluster. Yes, one can rebuild the kernel > by hand, but some of the scyld advantages (and other useful beowulf > stuff) interface directly with the kernel. These days one sometimes has > to upgrade the base compiler to upgrade the kernel. This is less > important to a scyld beowulf than to a more general purpose cluster > node, but scyld cannot remain stagnant at a given kernel revision > forever. That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3 And as long as I can do it without going to glibc-2.2 I'll probably upgrade. 
Now it doesn't look as if RH will be releasing a 2.4 kernel rpm for 6.2 (although I can't see a reason why they couldn't). [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?] With respect to Scyld (and RH and whoever) this means: I would welcome upgrades as long as the distribution remains downward compatible. The showstopper is glibc here and not the kernel. Sure there are limits to that, but the reasons for giving up downward compatibility must be very good: so good that the $$ reasons given above don't count anymore. > c) Improved everything else. This isn't too important to scyld but > again, even e.g. MPI marches along. Bugs are fixed, optimizations are > tuned. Scyld may not have to remain sync'd to RH's development cycle, > but it has to re-release its OWN distribution package periodically to > keep everything up to date and/or users will have to periodically > upgrade node or server packages piecemeal. > > RH 7 has definitely got some problems, but 7.1beta comes out what, > today? and reportedly fixes a lot of those problems (as do the many > updates already released). Since RH 7 has an incompatible RPM relative > to 6.2, the 6.2->7 upgrade requires a pretty serious commitment and lots > of folks are holding off until its problems diminish. > > I therefore don't think that the issue is whether scyld should rebuild > on the 7.x distribution -- it is rather a question of when. This is > thus a reasonable question to ask, although there is (as noted) less > pressure for them to do it immediately. There is also the question of > how difficult it is to do the rebuild -- if the distribution is RPM > packaged, rebuilding really shouldn't take long at all; it is the > testing and stabilizing that takes the time. ... and when they decide to rebuild based on 7.x they hopefully consider keeping a branch based on glibc-2.1. Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From siegert at sfu.ca Thu Feb 1 17:27:11 2001 From: siegert at sfu.ca (Martin Siegert) Date: Thu, 1 Feb 2001 14:27:11 -0800 Subject: Alpha beowulf: True64 or Linux? Message-ID: <20010201142711.B2727@stikine.ucs.sfu.ca> We are in the planning stages of setting up a small Alpha cluster. One of the questions that came up is: should we use True64 or Linux? Now I don't need any flame wars here, but serious arguments. You don't even have to convince me (I probably have to run the thing. Since I am familiar with Linux and I'll continue to support our Pentium based cluster, Linux just means less work - which is one good argument, but I need more than that). Thus: - are there performance differences? - software availability? I heard that Compaq's development suite (compilers, debuggers, etc.) is available on both platforms. What about scientific libraries, etc. - my guess is that both OS are fully 64bit OS (files > 2GB, etc.). How about the compilers? Can I have 128bit precision for floating point operations? - if we buy 4 processor smp boxes: How is the support under either OS? (OpenMP, etc.) 
- How good is the smp performance (i.e., is it worth it in comparison to myrinet?)? - what other pros and cons? I'd appreciate all comments and remarks that'll help me to come to a decision one way or the other. Thanks. Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wsb at paralleldata.com Thu Feb 1 17:54:51 2001 From: wsb at paralleldata.com (W Bauske) Date: Thu, 01 Feb 2001 16:54:51 -0600 Subject: Scyld and Red Hat 7 References: <20010201140103.A2727@stikine.ucs.sfu.ca> Message-ID: <3A79E93B.B3C1E0BB@paralleldata.com> Martin Siegert wrote: > > That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3 > And as long as I can do it without going to glibc-2.2 I'll probably > upgrade. Now it doesn't look as if RH will be releasing a 2.4 kernel > rpm for 6.2 (although I can't see a reason why they couldn't). > [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?] I run 2.4.x on two x86's with LFS working. Not too heavy of testing but it seems to behave. You do need to compile your code with the correct defines to access them. Also, on a RH6.2 system, you need to recompile certain other programs with the right defines too, like your shell. Wes _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JParker at coinstar.com Thu Feb 1 17:36:35 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Thu, 1 Feb 2001 14:36:35 -0800 Subject: Scyld and Red Hat 7 Message-ID: G'Day ! >> As long as the system includes the main libs, a kernel and the popular >> package managers (well RPM) does it really matter what distribution it is >> based on? >The reasons to periodically upgrade an operating system distribution >(theirs or anybody else's), and not just the kernel, are many and valid. >By the numbers: Well the question is still valid. Very few would disagree that you should update your system from time to time with the latest version of your distribution of choice. I happen to prefer Debian ;-) So the question remains ... is Schyld compatable with the other major distributions ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From fmuldoo at alpha2.eng.lsu.edu Thu Feb 1 17:42:33 2001 From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon) Date: Thu, 01 Feb 2001 16:42:33 -0600 Subject: Alpha beowulf: True64 or Linux? References: <20010201142711.B2727@stikine.ucs.sfu.ca> Message-ID: <3A79E658.DA17BF2A@me.lsu.edu> I have tested my CFD code using Dec's Fortran 90/95 compiler on 2 identical Alpha 21264's @500Mhz. The ratio of time to finish for Tru64/linux was .85. This is right in line with what Dec was saying the performance penalty for using Linux on their machines was. Does anyone know why this is? I heard something about Linux not having page coloring, which I am not familiar with. 
-- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-388-5217 (w) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bapper at piratehaven.org Thu Feb 1 17:58:05 2001 From: bapper at piratehaven.org (Brian Pomerantz) Date: Thu, 1 Feb 2001 14:58:05 -0800 Subject: Alpha beowulf: True64 or Linux? In-Reply-To: <3A79E658.DA17BF2A@me.lsu.edu>; from fmuldoo@alpha2.eng.lsu.edu on Thu, Feb 01, 2001 at 04:42:33PM -0600 References: <20010201142711.B2727@stikine.ucs.sfu.ca> <3A79E658.DA17BF2A@me.lsu.edu> Message-ID: <20010201145805.A22564@skull.piratehaven.org> On Thu, Feb 01, 2001 at 04:42:33PM -0600, Frank Muldoon wrote: > I have tested my CFD code using Dec's Fortran 90/95 compiler on 2 > identical Alpha 21264's @500Mhz. The ratio of time to finish for > Tru64/linux was .85. This is right in line with what Dec was saying > the performance penalty for using Linux on their machines was. Does > anyone know why this is? I heard something about Linux not having > page coloring, which I am not familiar with. > Page coloring has to do with how cache lines map to pages in memory. Here is a brief blurb on page coloring from the BSD people: We'll end with the page coloring optimizations. Page coloring is a performance optimization designed to ensure that accesses to contiguous pages in virtual memory make the best use of the processor cache. In ancient times (i.e. 10+ years ago) processor caches tended to map virtual memory rather than physical memory. This led to a huge number of problems including having to clear the cache on every context switch in some cases, and problems with data aliasing in the cache. Modern processor caches map physical memory precisely to solve those problems. This means that two side-by-side pages in a processes address space may not correspond to two side-by-side pages in the cache. In fact, if you aren't careful side-by-side pages in virtual memory could wind up using the same page in the processor cache -- leading to cacheable data being thrown away prematurely and reducing CPU performance. This is true even with multi-way set-associative caches (though the effect is mitigated somewhat). FreeBSD's memory allocation code implements page coloring optimizations, which means that the memory allocation code will attempt to locate free pages that are contiguous from the point of view of the cache. For example, if page 16 of physical memory is assigned to page 0 of a process's virtual memory and the cache can hold 4 pages, the page coloring code will not assign page 20 of physical memory to page 1 of a process's virtual memory. It would, instead, assign page 21 of physical memory. The page coloring code attempts to avoid assigning page 20 because this maps over the same cache memory as page 16 and would result in non-optimal caching. This code adds a significant amount of complexity to the VM memory allocation subsystem as you can well imagine, but the result is well worth the effort. Page Coloring makes VM memory as deterministic as physical memory in regards to cache performance. There has been a lot of arguing back and forth about whether there is any benefit to page coloring when you take into consideration that it is very time consuming and difficult to set up and get right. 
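To make the mapping concrete, here is a toy C illustration of the idea (my own sketch, not from the FreeBSD text above): for a physically indexed cache, a page's "color" is just its physical page number modulo the number of pages that fit in the cache, and two pages with the same color compete for the same cache lines. The 4-page cache and pages 16/20/21 below mirror the example in the quoted paragraph; real caches and set associativity complicate the picture.

#include <stdio.h>

#define PAGE_SIZE  4096                    /* assumed 4 KB pages */
#define CACHE_SIZE (4 * PAGE_SIZE)         /* assumed 4-page cache, as in the example */

/* Color of a physical page: which cache "slot" its lines land in. */
static unsigned page_color(unsigned long phys_page)
{
    return (unsigned)(phys_page % (CACHE_SIZE / PAGE_SIZE));
}

int main(void)
{
    /* Pages 16 and 20 share color 0 and evict each other; page 21 does not. */
    printf("color(16)=%u color(20)=%u color(21)=%u\n",
           page_color(16), page_color(20), page_color(21));
    return 0;
}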
The thing that I here REALLY increases performance on many scientific apps is the use of super pages. BAPper _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Fri Feb 2 05:52:03 2001 From: jcownie at etnus.com (James Cownie) Date: Fri, 02 Feb 2001 10:52:03 +0000 Subject: Alpha beowulf: True64 or Linux? In-Reply-To: Your message of "Thu, 01 Feb 2001 14:27:11 PST." <20010201142711.B2727@stikine.ucs.sfu.ca> Message-ID: <14Odox-0pB-00@etnus.com> Martin Siegert asked :- > Should we use True64 or Linux? > - software availability? I heard that Compaq's development suite (compilers, > debuggers, etc.) is available on both platforms. What about scientific > libraries, etc. Our Totalview debugger is available for either operating system (and supports MPI on either). Compaq's compilers are available on either, however I believe that the Compaq compilers on Linux do _not_ support either HPF or OpenMP. (For HPF they like their own message passing system, and for OpenMP they like their own thread library). Good luck. -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 2 08:28:38 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 2 Feb 2001 14:28:38 +0100 (CET) Subject: Scyld and Red Hat 7 In-Reply-To: <20010201140103.A2727@stikine.ucs.sfu.ca> Message-ID: On Thu, 1 Feb 2001, Martin Siegert wrote: > That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3 NFSv3 support is present in 2.2.18, however NVSv3 over TCP doesn't work right at this point (this is valid for 2.4). All the patches that were floating around and were integrated by major vendors in their kernels were also integrated in 2.2.18. However, you have to compile it yourself, it doesn't come as RH update... NFS FAQ and mailing list at http://nfs.sourceforge.net > [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?] Yes, but you also need LFS support from your glibc. AFAIK, glibc-2.1 from RH 6.2 is not compiled for LFS. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glindahl at hpti.com Fri Feb 2 09:30:43 2001 From: glindahl at hpti.com (Greg Lindahl) Date: Fri, 2 Feb 2001 09:30:43 -0500 Subject: Q: Any parallel DBs for the cluster computers ? In-Reply-To: <005001c08bef$71056f00$5f72f2cb@TEST>; from yoon@bh.kyungpook.ac.kr on Thu, Feb 01, 2001 at 10:36:46AM +0900 References: <005001c08bef$71056f00$5f72f2cb@TEST> Message-ID: <20010202093043.A1138@wumpus.hpti.com> > Is there any information about Parallel Database using PVFS or GFS or > itself filesystem or any other parallel filesystem ? Parallel databases don't necessarily use parallel filesystems. That's a detail which the database vendor generally hides from you. 
Oracle, for example, has a parallel database which doesn't require a shared filesystem; it only really requires shared data. Then they have their own lock manager, which provides all they need. Unfortunately I don't think this is available on Linux. However, depending on your problem, you might be able to use N separate databases and do the parallel part yourself. For example, you would run all queries on all the databases and combine all of the results. I've seen some non-Linux software which does this over any SQL database, but I haven't seen such a system for Linux. -- g _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sshealy at asgnet.psc.sc.edu Fri Feb 2 11:49:50 2001 From: sshealy at asgnet.psc.sc.edu (Scott Shealy) Date: Fri, 2 Feb 2001 11:49:50 -0500 Subject: Q: Any parallel DBs for the cluster computers ? Message-ID: <5773B442597BD2118B9800105A1901EE1B4D4B@asgnet2> I think if you are looking for polished open source stuff .... you are probably out of luck. But if you will accept a commercial solution look into IBM's DB2 Extended Enterprise Edtion. We have used this DB for large warehousing and data mining projects extensively on our IBM SP(really nothing more than a real fancy beowulf) and have been pleased with its performance and awesome scalablilty. Unlike Oracle OPS(really has components of a shared architecture which doesnt scale as well), DB2 EEE uses a shared nothing architecture that is difficult to configure,administer, and adds an extra dimension for the DBA's and data architects to deal with .... but really kicks! Recently IBM has released it for linux and I think you can download a trial from them. We are getting ready to give it a whirl on our linux cluster. You probably already know this but you are probably going to have to configure your beowulf nodes a little differently than you typically do for other computational tasks. You will need to spend alot of money on the IO subsystem on each node(thats the bottleneck in a DB) and if you are need to gurantee uptime you going to have to think about fail over for each node. Anyway have fun! Scott Shealy E811 Inc sshealy at E811.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wsb at paralleldata.com Fri Feb 2 14:40:42 2001 From: wsb at paralleldata.com (W Bauske) Date: Fri, 02 Feb 2001 13:40:42 -0600 Subject: Scyld and Red Hat 7 References: Message-ID: <3A7B0D3A.5D351FC3@paralleldata.com> Bogdan Costescu wrote: > > On Thu, 1 Feb 2001, Martin Siegert wrote: > > > That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3 > > NFSv3 support is present in 2.2.18, however NVSv3 over TCP doesn't work > right at this point (this is valid for 2.4). All the patches that were > floating around and were integrated by major vendors in their kernels were > also integrated in 2.2.18. However, you have to compile it yourself, it > doesn't come as RH update... NFS FAQ and mailing list at > http://nfs.sourceforge.net > > > [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?] > > Yes, but you also need LFS support from your glibc. AFAIK, glibc-2.1 from > RH 6.2 is not compiled for LFS. > How about trying it before commenting? I did and it appears to work... 
[wsb at wsb62 wsb]$ cat /etc/*lease Red Hat Linux release 6.2 (Zoot) [wsb at wsb62 wsb]$ ls -l /z/wsb62f total 10496032 -rw-r--r-- 1 root root 10737418240 Nov 30 22:18 junk1 drwxr-xr-x 2 root root 16384 Nov 25 19:08 lost+found [wsb at wsb62 wsb]$ dmesg | more Linux version 2.4.0-test10 (root at wsb62.paralleldata.com) (gcc version egcs-2.91. 66 19990314/Linux (egcs-1.1.2 release)) #6 SMP Sat Nov 25 15:52:34 CST 2000 Wes _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Thu Feb 1 16:59:58 2001 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Thu, 01 Feb 2001 22:59:58 +0100 Subject: [Fwd: Scyld and Red Hat 7] Message-ID: <3A79DC5D.529F9A0D@moene.indiv.nluug.nl> Sorry - meant for the list. -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction) -------------- next part -------------- An embedded message was scrubbed... From: Toon Moene Subject: Re: Scyld and Red Hat 7 Date: Thu, 01 Feb 2001 22:31:13 +0100 Size: 2188 URL: From Todd_Henderson at readwo.com Thu Feb 1 14:28:46 2001 From: Todd_Henderson at readwo.com (Todd Henderson) Date: Thu, 01 Feb 2001 13:28:46 -0600 Subject: diskless nodes with scyld References: Message-ID: <3A79B8EE.D0D7336A@readwo.com> What is the oldest Intel that the Scyld will install and run on? I have a couple of old 486's at home I was thinking about playing around with? Thanks, Todd Daniel Ridge wrote: > On Thu, 1 Feb 2001 valentin at olagrande.net wrote: > > > I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought > > previous articles in the archive suggest that this is possible, I have found > > no instruction anywhere on how to do this. > > We have an updated CD out now (see our website) that runs disklessly > by default (based on popular demand). > > > >From an article in the December archives, I assumed that all I need to do > > is change the /etc/beowulf/fstab file. Mine currently has the following > > entries: > > > > /dev/ram3 / ext2 fs_size=65536 0 0 > > none /proc proc defaults 0 0 > > none /dev/pts devpts gid=5,mode=620 0 0 > > $MASTER:/home /home nfs defaults 0 0 > > That looks right. > > > This fails, and I have trouble interpreting the error log attached at the > > end of this message. Now, who can help me? > > > 1. What are the instructions for booting a diskless node with Scyld? > > Comment out the /home mount in your fstab. Your nodes are having NFS > problems that are keeping them from coming up. (the node NFS fs module is > failing to load for some reason). > > > 2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM? > > My nodes send out DHCP requests. Can I simply setup a dchp server to > > hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports? > > Basically no problem. I'll whip up some instructions for people who want > to PXE boot their boxes. With the latest Scyld release, you basically do: > beoboot -2 -i to make phase-2 images with the kernel and initrd split out > so that you can use them with any boot strategy you choose. 
> > Regards, > Dan Ridge > Scyld Computing Corporation > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ddj at mookie.cis.brown.edu Sun Feb 4 00:15:58 2001 From: ddj at mookie.cis.brown.edu (Dave Johnson) Date: Sun, 4 Feb 2001 00:15:58 -0500 Subject: Scyld + myrinet mpich-gm? Message-ID: <200102040515.f145FwN18773@mookie.cis.brown.edu> I've gotten myself involved in bringing a small cluster up and into production. I'm learning as I go, with the help of the archives of this mailing list. Unfortunately the searchable archives at Supercomputer.org seem to be off line (I get internal server error), and out of date (the last messages seem to be from around May 2000). The current setup is one master with 100base-T to the world, gigabit fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with 10/100 and myrinet interfaces. The Scyld release of last Monday is up and running, and I can bpsh to my heart's content. I'm stuck at the point of trying to deploy MPI. Scyld supplies mpi-beowulf which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun which do. I've built the mpich-gm from Myricom, but their mpirun command does not grok bpsh, and expects either rsh or ssh daemons on each slave. I've tried a number of approaches that start out looking like they might work, but have gotten stuck after a few hours down each cowpath. Here is a list of some of the snags (I've lost track of some others): bpsh is not a full blown shell, doesn't deal well with redirection, changing directory before running a command, and in particular it can't be swapped for rsh or ssh when configuring mpich (ie -rsh=bpsh). The master node is outside the myrinet, I haven't a clue how to get it to cooperate with the slaves over ethernet yet have the slaves use myrinet as much as possible. I tried hacking on the first test in mpich-1.2..4/examples/test (pt2pt/third) that you get when you do make testing or runtests -check. Tried to get it to use /usr/bin/mpirun. Had to get rid of -mvhome and -mvback args first, then tried to use bpsh to start up the mpirun on one node, hoping it could use GM to start up on the other slaves. After creating the directory in /var where it could create shm_beostat, Now I get truckloads of errors: shmblk_open: Couldn't open shared memory file: /shm_beostat shmblk_open failed. I suppose these might be from the other nodes, expecting everyone is sharing /var, but I'm leery of nfs mounting all of the master's /var on each slave. I tried applying the Scyld patches against the 1.2.0 mpich sources to the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4 directory, which is not built when --with-device=ch_gm is specified. Then I thought I'd look into the mpprun sources, but I couldn't get them to build even before I started hacking on them... decided to look elsewhere for a while. Tried getting sshd2 up and running on a slave node. So far it insists on asking for my password and won't accept it at all. Has anyone got a working cluster anything like the one we're building? What did you have to do differently to make the various packages and drivers play nice with each other? Where did I go wrong? 
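(One workaround that is sometimes suggested for the "-rsh=bpsh" snag is a tiny rsh-lookalike front end that just execs bpsh, and then configuring mpich with -rsh= pointing at that wrapper. The sketch below is hypothetical and untested against mpich-gm's mpirun: it assumes the slave hostnames end in their bproc node number, and it simply skips the -n and -l options that rsh-style launchers commonly pass, so treat every detail as an assumption to adapt rather than a known-good recipe.)

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char *host, *node, **nargv;
    int i, j;

    if (argc < 3) {
        fprintf(stderr, "usage: %s host [-n] [-l user] command [args...]\n", argv[0]);
        return 1;
    }

    /* Take the trailing digits of the host argument as the bproc node number. */
    host = argv[1];
    node = host + strlen(host);
    while (node > host && isdigit((unsigned char)node[-1]))
        node--;

    /* Skip rsh-style options such as -n and -l <user>. */
    i = 2;
    while (i < argc && argv[i][0] == '-') {
        if (strcmp(argv[i], "-l") == 0 && i + 1 < argc)
            i++;
        i++;
    }

    /* Build "bpsh <node> command args..." and exec it. */
    nargv = malloc((size_t)(argc - i + 3) * sizeof(char *));
    if (nargv == NULL)
        return 1;
    j = 0;
    nargv[j++] = "bpsh";
    nargv[j++] = (*node != '\0') ? node : "0";
    while (i < argc)
        nargv[j++] = argv[i++];
    nargv[j] = NULL;

    execvp("bpsh", nargv);
    perror("execvp bpsh");
    return 127;
}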
Thanks, -- ddj Dave Johnson ddj at cascv.brown.edu Brown University TCASCV _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tibbs at math.uh.edu Sun Feb 4 02:40:39 2001 From: tibbs at math.uh.edu (Jason L Tibbitts III) Date: 04 Feb 2001 01:40:39 -0600 Subject: Scyld + myrinet mpich-gm? In-Reply-To: Dave Johnson's message of "Sun, 4 Feb 2001 00:15:58 -0500" References: <200102040515.f145FwN18773@mookie.cis.brown.edu> Message-ID: >>>>> "DJ" == Dave Johnson writes: DJ> Has anyone got a working cluster anything like the one we're building? We have the same basic structure: Gigabit Ethernet from front end to switch, 100MBps Ethernet from switch to nodes, and Myrinet between just the nodes. In our case, we have 32 nodes plus the front end and the previous generation 16 port Myrinet switches, so getting the front end on the Myrinet would be rather expensive. With the new switch setup it wouldn't be so bad. I had a short exchange with Donald Becker about our configuration; I don't want to speak for him, but the impression I got was that they hadn't really anticipated this configuration. Their setup lets you run entirely over Myrinet, but it assumes that the front end is on the Myrinet as well. With a support contract, it's possible that they could work this out, but I can't push that funding for an existing cluster so I've backed off the Scyld setup for now. I'll specify it with our next cluster purchase. -- Jason L Tibbitts III - tibbs at uh.edu - 713/743-3486 - 660PGH - 94 PC800 System Manager: University of Houston Department of Mathematics Born alone beneath pale sardonic skies. One love, one life, one sorrow.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From per at computer.org Sun Feb 4 14:09:09 2001 From: per at computer.org (Per Jessen) Date: Sun, 04 Feb 2001 14:09:09 Subject: Big Iorn Message-ID: <200102041407.f14E7Ol18306@mercury.nildram.co.uk> On Thu, 01 Feb 2001 13:30:48 -0500, Alan Grimes wrote: >Hey, I have been hearing a lot of things about MVS, the inherant >superiority of the 390, and all sorts of stuff about how all these big >machines are so radicaly advanced that its not even funny... > >This has finally piqued my interest to the point where I now would like >to know more about how these machines work and what they can actually >do. >Since this list is tangentaly related to that field I am sure there are >at least a few here who could give me some useful pointers. =) What would you like to know ? I doubt if the z-server architecture is particularly advanced, but it's probably on a par with other modern processors. I've done system-level development (mostly assembler) for the 370 and 390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF - not much else runs on 390 - except for Linux now. regards, Per Jessen regards, Per Jessen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 4 09:47:29 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 4 Feb 2001 09:47:29 -0500 (EST) Subject: Kickstart Installation problems In-Reply-To: Message-ID: On Wed, 31 Jan 2001, Mallik Vonteddu wrote: > After booting from the floppy, it could able to get the IP address from > the DHCP server,but it fails to mount the NFS partition. > It comes out with an error message" Mount: RPC timeout " . > > Checked the following daemons Portmapper,nfsd,mountd and rpcinfo. > Executing the command "exportfs" shows the exported partitions too. > Evertyhing seems to work on the nfs server, but when it tries to mount > the nfs partition, it hangs there for some time and comes out > as " Mount : RPC timeout " . Have you checked to make sure that the ip number you are granting still has permissions to mount? Have you tried booting a rescue floppy and mounting the NFS partition by hand? Is the NFS partition mountable by other clients in the net (if they are given permission to mount)? I'm sorry if these suggestions sound lame, but you've already checked a lot, it sounds like, and it worked and now it doesn't. Either something changed or something broke (hardware or software). First hypothesis is that something changed, so look for something that changed -- an extra character that somehow got typed in the kickstart line in its dhcpd entry, an address from the wrong block -- typos can be killers because everything "works" but -- doesn't. Second hypothesis is software, so make sure that the NFS client-server connection is valid for the exported space for some other reliable client. Check to be sure that your kickstart floppy is valid, unbroken, current, and works for some other client (if you can). At this point, you've checked the entire install path, and you're down to client hardware. Which does break, although I wouldn't expect it to produce an RPC error (only) if it did. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. 
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 4 10:23:04 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 4 Feb 2001 10:23:04 -0500 (EST) Subject: Scyld and Red Hat 7 In-Reply-To: <3A787495.830E5454@hsc.vcu.edu> Message-ID: On Wed, 31 Jan 2001, Mike Davis wrote: > For a production server, I'm in complete agreement with Martin. The most > important thing that a > research computer can do is continually compute research. Flippant as > that might sound, it is the > truth. While I have upgraded my desktop and some webservers to RH7, I > have no overwhelming > desire to upgrade our cluster for the reasons mentioned. It's anecdotal, to be sure, but after the RH 5.x->RH 6.x upgrade in our department all my compiled research binaries ran some 20% faster. We made back the one day of downtime in one week of production, and of course there were other tremendous benefits in even slightly broken 6.0 compared to 5.2. There were library issues associated with upgrades as well back then and all of these arguments were advanced and debated. The tension between stability and improvement is as old as code itself. Most people find a happy medium that is reasonably economic -- they get things stable and productive and then leave them alone until their friends start to make fun of them and then they upgrade, grumbling all the while, get things stable, and then leave them alone (iterate indefinitely). As long as they have smart and helpful friends who live close enough to the bleeding edge that it eventually is stabilized, this is probably just fine. It can easily be carried to a fault, though, as my anecdote makes clear. We'd ALL pretty much make fun of somebody still installing and running 5.2 on brand new hardware (and only buying peripheral hardware from the limited list of supported devices from that time), wouldn't we? There are real improvements associated with upgrades, and at some point it becomes clearly worth it to pay the "cost" of the upgrade (time, hassle, money, instability, recompiling, and so forth, which is actually pretty damn minimal for RH based systems with kickstart) to gain the benefits. Piecemeal upgrade isn't a good answer either, at least not in the long run (although it is essential for prototyping an organizational upgrade). It becomes increasingly difficult to manage an "island" of obsoleted systems in a sea of current ones for a variety of reasons: the rpm incompatibility between 6.2 and 7.0, the hassle of tracking two different update lists to ensure that your overall operation remains secure (a step often skipped, but then lots of operations just aren't particularly secure), the "missing application" problem when something you get used to on the one distro isn't on the other, and in the case of desktops, the lack of backwards compatibility in many of the X/gnome improvements that really screw things up if one shifts between distributions with a common NFS mounted home) it starts costing one MORE time and MORE productivity to keep things heterogeneous than it would to upgrade. Homogeneity equals administrative scalability, and this contributes to overall productivity too. 
But you know all this -- I'm just trying to provide some perspective for less experienced readers, so they don't get the impression that we're linux-luddites of some sort who plan to be running 6.2 two years from now...:-) So my point wasn't that everybody should stop everything and upgrade to 7 NOW or that Scyld should do so right away -- it was that at some point (the point where in the mind of the individual the costs and benefits start to balance) one SHOULD upgrade, and that Scyld is likely to do so when in their judgement that point is reached. For what it's worth we haven't upgraded to 7.0 yet either, but at some point in the not too distant future we will (possibly to 7.1 instead of 7.0). We'll do it "all at once", by prototyping and thoroughly testing a few archetypical systems, certifying a particular collection of rpm's and updates that "works", and then using kickstart to simply convert over all the department systems in a day (or at most two). Not much lost productivity, although we will also not be stupid and do this right before some important event (like finals) when the systems HAVE to be up in case there are problems. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lowther at att.net Sun Feb 4 10:37:41 2001 From: lowther at att.net (Ken) Date: Sun, 04 Feb 2001 10:37:41 -0500 Subject: diskless nodes with scyld References: <3A79B8EE.D0D7336A@readwo.com> Message-ID: <3A7D7745.782E00CD@att.net> Todd Henderson wrote: > > What is the oldest Intel that the Scyld will install and run on? I have a couple of old 486's at home I was > thinking about playing around with? > It should go on an i386. OK for playing around with, but not really useful given today's prices on hardware vs. electricity. ;-) Ken _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Sun Feb 4 23:25:10 2001 From: wsb at paralleldata.com (W Bauske) Date: Sun, 04 Feb 2001 22:25:10 -0600 Subject: Big Iron References: <200102041407.f14E7Ol18306@mercury.nildram.co.uk> Message-ID: <3A7E2B26.B8091B7D@paralleldata.com> Per Jessen wrote: > > On Thu, 01 Feb 2001 13:30:48 -0500, Alan Grimes wrote: > > >Hey, I have been hearing a lot of things about MVS, the inherant > >superiority of the 390, and all sorts of stuff about how all these big > >machines are so radicaly advanced that its not even funny... > > > >This has finally piqued my interest to the point where I now would like > >to know more about how these machines work and what they can actually > >do. > >Since this list is tangentaly related to that field I am sure there are > >at least a few here who could give me some useful pointers. =) > > What would you like to know ? > I doubt if the z-server architecture is particularly advanced, but it's > probably on a par with other modern processors. > I've done system-level development (mostly assembler) for the 370 and > 390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF - > not much else runs on 390 - except for Linux now. > Go to IBM's site and look up "zseries" and "linux 390". Look thru that and search for whatever else you're curious about. Wes _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Jon.Tegner at wiglaf.se Sun Feb 4 13:54:50 2001 From: Jon.Tegner at wiglaf.se (Jon Tegner) Date: Sun, 04 Feb 2001 19:54:50 +0100 Subject: Managing rpms Message-ID: <3A7DA57A.612FB7A7@wiglaf.se> In a post a while back the yup package for maintaining rpms was mentioned, and I was wondering if someone has experience of that or some other package which automatically takes care of updating rpms in a system (on the page http://www.rpm.org/software.html there seem to be several candidates). Regards, /jon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Carl_Notfors at vdgc.com.sg Mon Feb 5 02:23:42 2001 From: Carl_Notfors at vdgc.com.sg (Carl_Notfors at vdgc.com.sg) Date: Mon, 5 Feb 2001 15:23:42 +0800 Subject: Fault tolerance and MPI Message-ID: Our computational model is quite simple. We have a master node and a number of slave nodes. All communication is between the master and the slaves, i.e. no internode communication, so all communication is done with MPI_Send and MPI_Recv (we are using LAM/MPI). The problem with MPI is that there is no fault tolerance: if a slave node "dies" the whole process goes down. According to the LAM documentation it should be possible to achieve some fault tolerance but we have as yet not tried this. Is there anyone who has got this working? Is there fault tolerance in any other MPI implementations? Would it be better to use PVM if you want fault tolerance?
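For readers who have not written this kind of code, here is a minimal sketch of the master/slave pattern described above, using only MPI_Send and MPI_Recv; the tags, the task count and the dummy "work" are invented for illustration and are not from the original post:

    /* master_slave.c -- minimal sketch of the master/slave pattern above.
       Build with an MPI C compiler wrapper (e.g. mpicc master_slave.c). */
    #include <stdio.h>
    #include <mpi.h>

    #define WORK_TAG 1
    #define STOP_TAG 2
    #define NTASKS   100

    int main(int argc, char **argv)
    {
        int rank, size, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                      /* master */
            int next = 0, done = 0;
            double result;
            MPI_Status status;

            /* prime every slave with one task */
            for (i = 1; i < size && next < NTASKS; i++) {
                MPI_Send(&next, 1, MPI_INT, i, WORK_TAG, MPI_COMM_WORLD);
                next++;
            }
            /* collect results and hand out the remaining tasks */
            while (done < next) {
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                done++;
                if (next < NTASKS) {
                    MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, WORK_TAG,
                             MPI_COMM_WORLD);
                    next++;
                }
            }
            /* shut the slaves down */
            for (i = 1; i < size; i++)
                MPI_Send(&next, 1, MPI_INT, i, STOP_TAG, MPI_COMM_WORLD);
            printf("master: %d tasks done\n", done);
        } else {                              /* slave */
            int task;
            double result;
            MPI_Status status;

            for (;;) {
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                if (status.MPI_TAG == STOP_TAG)
                    break;
                result = 2.0 * task;          /* stand-in for real work */
                MPI_Send(&result, 1, MPI_DOUBLE, 0, WORK_TAG, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }

As written, if a slave dies the master simply blocks in MPI_Recv forever (or the MPI runtime aborts the whole job), which is exactly the fault-tolerance gap being asked about.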
Carl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bahnsen at theo-physik.uni-kiel.de Mon Feb 5 05:30:20 2001 From: bahnsen at theo-physik.uni-kiel.de (Robert Bahnsen) Date: Mon, 5 Feb 2001 11:30:20 +0100 (MET) Subject: Alpha beowulf: True64 or Linux? In-Reply-To: <200102021700.MAA11198@blueraja.scyld.com> from "beowulf-admin@beowulf.org" at Feb 02, 2001 12:00:06 PM Message-ID: <200102051030.LAA03119@berg.theo-physik.uni-kiel.de> Martin, as far as the NAG fl90 Library is concerned the following versions are available for Compaq Alpha: FNDAU04DB Release 4 / Compaq Alpha UNIX / Compaq compiler FNDAU03D9 Release 3 / Compaq Alpha UNIX / NAGWare compiler FNDAL03D9 Release 3 / Compaq Alpha Linux / NAGWare compiler The combination Compaq Alpha Linux + (free/cheap) Compaq compiler is missing, and NAG said they would not release one in the near future. Take this pro Tru64 or con NAG, as you like. HTH, Robert > - are there performance differences? > - software availability? I heard that Compaq's development suite (compilers, > debuggers, etc.) is available on both platforms. What about scientific > libraries, etc. > - my guess is that both OS are fully 64bit OS (files > 2GB, etc.). > How about the compilers? Can I have 128bit precision for floating point > operations? > - if we buy 4 processor smp boxes: How is the support under either OS? > (OpenMP, etc.) > - How good is the smp performance (i.e., is it worth it in comparison to > myrinet?)? > - what other pros and cons? -- Dipl.-Phys. Robert Bahnsen Institut f. Theoretische Physik und Astrophysik CAU Kiel, Leibnizstr. 15, D-24098 Kiel, Germany Fon: +49 (0)431 8804112 Fax: +49 (0)431 8804094 E-Mail: bahnsen at tp.cau.de www.theo-physik.uni-kiel.de/~bahnsen/index.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 5 07:21:13 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 5 Feb 2001 07:21:13 -0500 (EST) Subject: Managing rpms In-Reply-To: <3A7DA57A.612FB7A7@wiglaf.se> Message-ID: On Sun, 4 Feb 2001, Jon Tegner wrote: > In a post awhile back the yup-package for maintaining rpms was > mentioned, and I was wondering if someone has experiences of that or > some other package which automatically takes care of updating rpms in a > system (on the page http://www.rpm.org/software.html there seems to be > several canditates). > > Regards, > > /jon I've just started using yup personally, as it is being prototyped as a method for automating the generally incredibly painful process of keeping a system or set of systems both consistent and secure (in the sense of being up to date with respect to security patches and the like). yup does a lot of things for an RPM-based distribution that we are used to seeing only from e.g. Debian -- it is dependency aware and can update an entire dependency tree with one call. It also does sanity checks and effectively forces one to eliminate inconsistencies from an RPM tree before it will run -- on one of my oldest systems, I had multiple rpm revisions of some packages installed which had survived the 6.2 upgrade. 
yup patiently went through this and helped me figure out what was bollixed up and remove or hand update things until it was satisfied that the distribution itself on the system was at least not overtly broken somewhere. It can also be used to generate a plain list of all installed packages. In application, once one has a clean system it becomes a simple client-side call. It can be run nightly in a cron script, for example, on all clients. The clients are directed to an FTP server which has yup configuration information and distribution/update directories. Everything is then done automagically -- it compares what you have to what you should have, retrieves and caches copies of rpm's that need updating and all their dependencies, installs them, removes the cache copies, and goes away. It can also be run from the command line targeted at specific packages. For example, on the aforementioned host I still have a bug that is preventing a full update (a bug which might well be in yup -- the package isn't yet perfect). However, it still works fine for individual packages, and I'm working my way through "important" packages one at a time. Below is a trace of operation for updating e.g. lpr (which is actually not that important on this host, but is out of date): rgb at rgb|T:3#more /tmp/rpm-list rgb at rgb|T:4#yup update lpr Reading RPM database... (100%) Performing dependencies sanity check... Checking for package list updates... Done transfering... 280B in 0.0s at 115kB per/sec Package list is up to date... Reading package list... (100%) As requested, I will do the following: [update: lpr] Downloading lpr-0.50-7.6.x.i386.rpm Done transfering... 89.6kB in 2.0s at 44.8kB per/sec Reading packages... Done lpr-0.50-7.6.x.i386.rpm [..........] 42.770user 0.780sys 86.8%, 0ib 0ob 0tx 0da 0to 0swp 0:50.16 This appears to be a bit easier than: a) Figuring out the package of lpr that I have. b) Finding out if it is superceded by an update c) Hand-ftp'ing the update rpm (and dependencies, if any) from a Red Hat mirror. d) installing the rpm(s) by hand. One call, all automated. If run a second time, it returns: rgb at rgb|T:5#yup update lpr Reading RPM database... (100%) Performing dependencies sanity check... Checking for package list updates... Done transfering... 280B in 0.0s at 114kB per/sec Package list is up to date... Reading package list... (100%) Error: Package lpr is already installed and is the latest version 40.620user 0.650sys 99.6%, 0ib 0ob 0tx 0da 0to 0swp 0:41.41 That's all I know at the moment from a user perspective (somebody else is managing the FTP site and master yup configuration). I believe that this configuration process isn't too arduous, though. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tony at MPI-Softtech.Com Mon Feb 5 09:49:12 2001 From: tony at MPI-Softtech.Com (Tony Skjellum) Date: Mon, 5 Feb 2001 08:49:12 -0600 (CST) Subject: Fault tolerance and MPI In-Reply-To: Message-ID: You can see our initial paper on this subject at http://www.mpi-softtech.com/publications/mpift-paper-dsm2001.pdf It contains references to other known works in this area. 
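One partial, application-level workaround sometimes used with the master/slave layout discussed above (sketched below; the function name and the idea of a fixed timeout are invented here, not something LAM or the paper above prescribes) is to make the master's receive non-blocking and give up on a slave that stays silent too long, so its work unit can be reassigned. This only helps if the MPI run itself survives the lost node, which the earlier posts suggest is not guaranteed:

    /* recv_timeout.c -- master-side receive with a crude liveness timeout.
       Returns 1 if a result arrived within `timeout' seconds, 0 otherwise. */
    #include <mpi.h>

    int recv_result_with_timeout(double *result, double timeout,
                                 MPI_Status *status)
    {
        MPI_Request req;
        int flag = 0;
        double t0;

        MPI_Irecv(result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &req);
        t0 = MPI_Wtime();
        while (!flag && MPI_Wtime() - t0 < timeout)
            MPI_Test(&req, &flag, status);

        if (!flag) {
            /* nothing arrived in time: cancel the pending receive so the
               caller can mark the outstanding task for reassignment */
            MPI_Cancel(&req);
            MPI_Wait(&req, status);
        }
        return flag;
    }

The caller still has to track which task went to which slave and resend it elsewhere; approaches like the one in the paper above aim to handle more of that inside the MPI layer itself.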
-Tony Anthony Skjellum, PhD, President (tony at mpi-softtech.com) MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759 +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com "Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters." On Mon, 5 Feb 2001 Carl_Notfors at vdgc.com.sg wrote: > > > Our computational model is quite simple. We have a master node and a > number of slave nodes. All communication is between the master and the > slaves, ie. no internode communication, so all communication is done with > MPI_Send and MPI_Recv (we are using LAM/MPI). > > The problem with MPI is that there is no fault tolerance, if a slave node > "dies" the whole process goes down. According to the LAM documentation it > should be possible to achieve some fault tolerance but we have as yet not > tried this. > > Is there anyone who has got this working? Is there fault tolerance in any > othe MPI implementations? Would it be better to use PVM if you want fault > tolerance? > > > Carl > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From keithu at parl.clemson.edu Mon Feb 5 10:14:52 2001 From: keithu at parl.clemson.edu (Keith Underwood) Date: Mon, 5 Feb 2001 10:14:52 -0500 (EST) Subject: Scyld + myrinet mpich-gm? In-Reply-To: <200102040515.f145FwN18773@mookie.cis.brown.edu> Message-ID: Hmmm... we have something similar, but not quite the same. We have a master w/ 100base-T to the world, gigabit fiber to a 24-10/100 + 2-1000 switch and 16 slaves (not diskless) with 10/100 and gigabit interfaces. We only have 16 ports on our gigabit switch and out master is a different type of machine from the 16 slaves. We have successfully convinced the machines to communicate over the gigabit exclusively while communicating with the master over the 10/100. You do need to use the Scyld MPI though. I seriously doubt that you will get another MPI running as is. Anyway, what we did was: after bringing the nodes up: bpsh -a route add -host 192.168.1.1 eth0 bpsh -a route del default bpsh -a modprobe sk98lin then on each node: bpsh ifconfig eth1 up Then to run an MPI job that DOES NOT run on the head: NO_INLINE_MPIRUN=true bpsh 0 mpiapp -p4pg /tmp/pgfile where /tmp/pgfile is a p4 process group file. This is a real sketchy config so don't expect too much support on it just yet ;-) On Sun, 4 Feb 2001, Dave Johnson wrote: > I've gotten myself involved in bringing a small cluster up and > into production. I'm learning as I go, with the help of the > archives of this mailing list. Unfortunately the searchable > archives at Supercomputer.org seem to be off line (I get internal > server error), and out of date (the last messages seem to be from > around May 2000). > > The current setup is one master with 100base-T to the world, gigabit > fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with > 10/100 and myrinet interfaces. The Scyld release of last Monday is > up and running, and I can bpsh to my heart's content. > > I'm stuck at the point of trying to deploy MPI. Scyld supplies mpi-beowulf > which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun > which do. 
I've built the mpich-gm from Myricom, but their mpirun command > does not grok bpsh, and expects either rsh or ssh daemons on each slave. > > I've tried a number of approaches that start out looking like they might > work, but have gotten stuck after a few hours down each cowpath. > > Here is a list of some of the snags (I've lost track of some others): > > bpsh is not a full blown shell, doesn't deal well with redirection, changing > directory before running a command, and in particular it can't be swapped for > rsh or ssh when configuring mpich (ie -rsh=bpsh). > > The master node is outside the myrinet, I haven't a clue how to get > it to cooperate with the slaves over ethernet yet have the slaves > use myrinet as much as possible. > > I tried hacking on the first test in mpich-1.2..4/examples/test > (pt2pt/third) that you get when you do make testing or runtests -check. > Tried to get it to use /usr/bin/mpirun. Had to get rid of -mvhome and > -mvback args first, then tried to use bpsh to start up the mpirun on > one node, hoping it could use GM to start up on the other slaves. > After creating the directory in /var where it could create shm_beostat, > > Now I get truckloads of errors: > shmblk_open: Couldn't open shared memory file: /shm_beostat > shmblk_open failed. > > I suppose these might be from the other nodes, expecting everyone is > sharing /var, but I'm leery of nfs mounting all of the master's /var > on each slave. > > I tried applying the Scyld patches against the 1.2.0 mpich sources to > the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4 > directory, which is not built when --with-device=ch_gm is specified. > > Then I thought I'd look into the mpprun sources, but I couldn't get > them to build even before I started hacking on them... decided to look > elsewhere for a while. > > Tried getting sshd2 up and running on a slave node. So far it insists > on asking for my password and won't accept it at all. > > Has anyone got a working cluster anything like the one we're building? > What did you have to do differently to make the various packages and > drivers play nice with each other? Where did I go wrong? > > Thanks, > > -- ddj > > Dave Johnson > ddj at cascv.brown.edu > Brown University TCASCV > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu at parl.clemson.edu Clemson University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From newt at scyld.com Mon Feb 5 14:47:24 2001 From: newt at scyld.com (Daniel Ridge) Date: Mon, 5 Feb 2001 14:47:24 -0500 (EST) Subject: diskless nodes with scyld In-Reply-To: <3A79B8EE.D0D7336A@readwo.com> Message-ID: Todd, On Thu, 1 Feb 2001, Todd Henderson wrote: > What is the oldest Intel that the Scyld will install and run on? I have a couple of old 486's at home I was > thinking about playing around with? Our distribution will (out of the box) run on slave nodes which have a PCI bus. If you have more time than money, our software can certianly be made to run on older machines. 
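For anyone puzzled by the /tmp/pgfile in Keith Underwood's recipe above: a ch_p4 process group file lists, one line per host, the host name, how many processes to start there, and the path to the executable; the first line stands for the process you start by hand, so its count is 0. A purely hypothetical example (node addresses and path are invented, and whether plain IPs behave this way under bproc is exactly the sort of thing that needs testing):

    local       0
    10.0.0.11   1  /home/user/mpiapp
    10.0.0.12   1  /home/user/mpiapp
    10.0.0.13   1  /home/user/mpiapp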
Regards, Dan Ridge Scyld Computing Corporation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shahin at labf.org Tue Feb 6 03:00:52 2001 From: shahin at labf.org (Mofeed Shahin) Date: Tue, 6 Feb 2001 08:00:52 +0000 Subject: MP PowerPC Message-ID: <01020608005202.14355@localhost.localdomain> Has anyone had a look these ? http://www.totalimpact.com/G3_MP.html What do people think of them? Mof. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lowther at att.net Mon Feb 5 17:01:46 2001 From: lowther at att.net (Ken) Date: Mon, 05 Feb 2001 17:01:46 -0500 Subject: MP PowerPC References: <01020608005202.14355@localhost.localdomain> Message-ID: <3A7F22CA.36F13FFF@att.net> Mofeed Shahin wrote: > > Has anyone had a look these ? > > http://www.totalimpact.com/G3_MP.html > > What do people think of them? > I've heard they are expensive. They say up to 8 boards can work together, but don't give a height diminsion. I kind of doubt you could populate all your slots with them. At least not on my board. :( Ken _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mrao2001 at yahoo.com Mon Feb 5 15:37:03 2001 From: mrao2001 at yahoo.com (mrao2001 at yahoo.com) Date: Mon, 5 Feb 2001 12:37:03 -0800 Subject: Kickstart Installation problems References: Message-ID: <007401c08fb3$663c9c20$2464a8c0@quova.com> Hi Robert You are right. Something has changed i.e some one had plugged his linux test machine on the same network, which is also running DHCP leasing different IP addresses on the network. After removing the test box, installation went on smoothly. Thanks a lot for your help Regards Mallik ----- Original Message ----- From: "Robert G. Brown" To: "Mallik Vonteddu" Cc: Sent: Sunday, February 04, 2001 6:47 AM Subject: Re: Kickstart Installation problems > On Wed, 31 Jan 2001, Mallik Vonteddu wrote: > > > After booting from the floppy, it could able to get the IP address from > > the DHCP server,but it fails to mount the NFS partition. > > It comes out with an error message" Mount: RPC timeout " . > > > > Checked the following daemons Portmapper,nfsd,mountd and rpcinfo. > > Executing the command "exportfs" shows the exported partitions too. > > Evertyhing seems to work on the nfs server, but when it tries to mount > > the nfs partition, it hangs there for some time and comes out > > as " Mount : RPC timeout " . > > Have you checked to make sure that the ip number you are granting still > has permissions to mount? > > Have you tried booting a rescue floppy and mounting the NFS partition by > hand? > > Is the NFS partition mountable by other clients in the net (if they are > given permission to mount)? > > I'm sorry if these suggestions sound lame, but you've already checked a > lot, it sounds like, and it worked and now it doesn't. Either something > changed or something broke (hardware or software). 
First hypothesis is > that something changed, so look for something that changed -- an extra > character that somehow got typed in the kickstart line in its dhcpd > entry, an address from the wrong block -- typos can be killers because > everything "works" but -- doesn't. Second hypothesis is software, so > make sure that the NFS client-server connection is valid for the > exported space for some other reliable client. Check to be sure that > your kickstart floppy is valid, unbroken, current, and works for some > other client (if you can). At this point, you've checked the entire > install path, and you're down to client hardware. Which does break, > although I wouldn't expect it to produce an RPC error (only) if it did. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RSchilling at affiliatedhealth.org Mon Feb 5 21:14:14 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Mon, 5 Feb 2001 18:14:14 -0800 Subject: MP PowerPC Message-ID: <51FCCCF0C130D211BE550008C724149EBE1039@mail1.affiliatedhealth.org> I'd also want to take a look at the driver code as well. The page indicated they are capable of running ELF code, but I'd look for more information about how the GNU environment is used on a specific installation. It'd be nice if it works well though, 'cause you might actually have something that compares to the Intel daughter cards that were made for the Power PC. --Richard Schilling > -----Original Message----- > From: Ken [mailto:lowther at att.net] > Sent: Monday, February 05, 2001 2:02 PM > To: shahin at labf.org > Cc: beowulf at beowulf.org > Subject: Re: MP PowerPC > > > Mofeed Shahin wrote: > > > > Has anyone had a look these ? > > > > http://www.totalimpact.com/G3_MP.html > > > > What do people think of them? > > > > I've heard they are expensive. They say up to 8 boards can work > together, but don't give a height diminsion. I kind of doubt > you could > populate all your slots with them. At least not on my board. :( > > Ken > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zolia at lydys.sc-uni.ktu.lt Tue Feb 6 05:53:19 2001 From: zolia at lydys.sc-uni.ktu.lt (zolia) Date: Tue, 6 Feb 2001 12:53:19 +0200 (EET) Subject: BSc diploma & beowulf Message-ID: hello, i was reading this list nearly a year. I've made a small cluster based on debian and mpi; ran mpqc w/ mpipro and did some successful computations, but for my BSc diploma i have to create some full functional application, and i would like it to run on my cluster. I thought about few things: implement face morphing algorithm and parallelize it. 
Other would be to write some monitoring/management programs, maybe with snmp, but in this case i don't know for sure what it would be (what tasks to manage, monitor etc..) :/ If you have any suggestions or new ideas what program would be usefull, please let me know. thanx, ==================================================================== Antanas Masevicius Kaunas University of Technology Studentu 48a-101 Computer Center LT-3028 Kaunas LITNET NOC UNIX Systems Administrator Lithuania E-mail: zolia at sc.ktu.lt _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kragen at pobox.com Tue Feb 6 14:11:46 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Tue, 6 Feb 2001 14:11:46 -0500 (EST) Subject: Scyld and Red Hat 7 Message-ID: <200102061911.OAA16710@kirk.dnaco.net> "Stephen Gaudet" writes: > > The one reason that could make me upgrade is the installation of a 2.4 > > kernel. Since RH 7.0 does not have it, there is no reason to upgrade yet. > > Here's another reason you might be interested in if looking to use large > data sets. Presumably you're talking about being interested in 2.4, not RH 7. > Latest Linux kernel holds appeal for IT > > The keepers of the Linux operating system have made improvements to the core > technology that should make it easier to find lost data. > The biggest addition to the release of Linux kernel 2.4.1 is the ReiserFS, > which is a journaling file system. Journaling file systems are key to > operating systems and applications used over extended corporate networks > because they allow administrators to more quickly recover data in the event > of system failure. IMHO, this is not a particularly good explanation of the situation; metadata-journaling filesystems like ReiserFS sacrifice a little performance in the average case (when everything is working fine) to make fscking after a crash very quick. If you're losing any significant amount of your cluster time to fscking, you probably have bigger problems that you should address first. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kragen at pobox.com Tue Feb 6 14:11:47 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Tue, 6 Feb 2001 14:11:47 -0500 (EST) Subject: Big Iorn Message-ID: <200102061911.OAA16720@kirk.dnaco.net> "Per Jessen" writes: > What would you like to know ? > I doubt if the z-server architecture is particularly advanced, but it's > probably on a par with other modern processors. > I've done system-level development (mostly assembler) for the 370 and > 390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF - > not much else runs on 390 - except for Linux now. I'm curious: - in pure (integer, symbolic, or floating-point) computational speed, without much memory access, how do the 390 processors compare to other modern CPUs? (I know that's not what they're sold for, but I'm interested to hear the answer.) - in memory bandwidth (stream benchmarks, for example), how do they compare? - in I/O bandwidth, how do they compare? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Tue Feb 6 15:12:24 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Tue, 06 Feb 2001 21:12:24 +0100 Subject: Scyld and Red Hat 7 References: <200102061911.OAA16710@kirk.dnaco.net> Message-ID: <3A805AA8.D3E08970@lrz.uni-muenchen.de> kragen at pobox.com wrote: > IMHO, this is not a particularly good explanation of the situation; > metadata-journaling filesystems like ReiserFS sacrifice a little > performance in the average case (when everything is working fine) to > make fscking after a crash very quick. I have the impression ReiserFS also offers much better performance and noticeably better raw bit utilization in case of many small files. Also, the roadmap is at where the goodies are. It is not just a fs... For a good time call: http://www.namesys.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lowther at att.net Wed Feb 7 08:56:54 2001 From: lowther at att.net (Ken) Date: Wed, 07 Feb 2001 08:56:54 -0500 Subject: Scyld and Red Hat 7 References: <200102061911.OAA16710@kirk.dnaco.net> <3A805AA8.D3E08970@lrz.uni-muenchen.de> Message-ID: <3A815426.138B251E@att.net> Eugene.Leitl at lrz.uni-muenchen.de wrote: > > kragen at pobox.com wrote: > > > IMHO, this is not a particularly good explanation of the situation; > > metadata-journaling filesystems like ReiserFS sacrifice a little > > performance in the average case (when everything is working fine) to > > make fscking after a crash very quick.
> > I have the impression ReiserFS also offers much better performance > and noticeably better raw bit utilization in case of many small files. > Also, the roadmap is at where the goodies are. It is not just a fs... > > For a good time call: > http://www.namesys.com/ > I have had crashes where fscheck required manual intervention and ended with the statement: "File system altered!". Maybe not too bad on an individual node, but I'd rather see the ReiserFS give me that little "using old" blurb rush by on the head node after a crash. RSF is effectively putting all that data you crunched into a new file and keeping the old on hand until the new is successfully written as opposed to opening the old and overwriting it. If you crash during the write, you lose that file. Of course, you could always have the software writing dupicates in case of a crash. Ken _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kragen at pobox.com Wed Feb 7 12:41:23 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Wed, 7 Feb 2001 12:41:23 -0500 (EST) Subject: Scyld and Red Hat 7 Message-ID: <200102071741.MAA03604@kirk.dnaco.net> Ken writes: > I have had crashes where fscheck required manual intervention and ended > with the statement: "File system altered!". On ext2fs, reiserfs, or what? > Maybe not too bad on an individual node, but I'd rather see the ReiserFS > give me that little "using old" blurb rush by on the head node after a > crash. "using old"? > RSF is effectively putting all that data you crunched into a new "RSF"? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at coffee.psychology.mcmaster.ca Wed Feb 7 14:26:01 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed, 7 Feb 2001 14:26:01 -0500 (EST) Subject: ServerWorks HEsl reviewed Message-ID: http://www.anandtech.com/showdoc.html?i=1414&p=18 I'm embarassed to admit that I noticed this review. why? it's on anandtech. I *can* however, honestly claim I only did it because I was putting off cleaning the litter boxes (4 of them!) anyway, the short story is: extremely mundane Sandra scores. double-wide PC133 doesn't deliver anything there, though most of the other benchmarks were modestly better than other boards. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RSchilling at affiliatedhealth.org Wed Feb 7 17:58:29 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed, 7 Feb 2001 14:58:29 -0800 Subject: An IT Research and Development center Message-ID: <51FCCCF0C130D211BE550008C724149EBE104C@mail1.affiliatedhealth.org> I have recently been given cause to research the feasibility of opening up an advanced technology research center, and I'm wondering if any of you or your organizations would be interested in using one if it were available. The goal of the center would be to host a place where organizations, researchers, and students could go to get their hands on systems that might not be otherwise available. 
This would include:

  - virtual reality simulators, static and full-motion
  - beowulf clusters
  - geographic information systems

The list is not finalized, but one aim would be to open the center to as many disciplines as possible. The center would also aim to host conferences, and provide educational programs, such as an after school technology program for youths. So far, it looks like if the participants are willing to share the costs, it could be an affordable resource. Thanks for considering . . . Richard Schilling Webmaster / Web Integration Programmer Affiliated Health Services Mount Vernon, WA USA phone 01 360 856 7129 -------------- next part -------------- An HTML attachment was scrubbed... URL: