Ready for Real Parallel Computation, as if there was any other
In the last column we introduced the Parallel Virtual Machine (PVM) subroutine library, the original toolset that permitted users to convert a variety of computers on a single network into a "virtual supercomputer". We reviewed its history and discussed how it works, then turned our attention to what you might have to do to install it and make it work on your own cluster playground (which might well be a very simple Network of Workstations -- NOW cluster -- that are ordinary workstations on an ordinary local area network).
New readers are here advised to consult previous columns to get up to date. To play along, you'll need a few Linux-based computers on a network, with account access on all of them and ideally a home directory shared among them. Now it is time to set up PVM so that it can be used (in the next installment) in a Real Parallel Computation.
There are several steps involved in getting PVM to where one can use it in a simple calculation.
- Set up a remote shell such as rsh or ssh so that you can log in from your "head node" (the workstation on which you actually run PVM and the parallelized task) to all the compute nodes in your cluster without a password. We learned in the last column how to install and set up ssh (secure shell) to accomplish this preliminary but essential step.
- Install PVM itself, ideally in a packaged form (.rpm or .deb), on all the nodes.
- Perform some fairly routine systems administration tasks: arrange for a common file space and user access on all the nodes, if this is not already done on your LAN or cluster.
- Set some environment variables. In a packaged version of PVM these are likely set for you when you start PVM, but it doesn't hurt to set them up permanently.
- Start PVM either from the xpvm (graphical) or the pvm (tty) console and configure your nodes into a virtual supercomputer.
- Run a PVM task.
In this column we will explore how to get your computer to where PVM is installed and working, so that you can create a virtual supercomputer. This accomplishment will set the stage for redoing our original "generate random numbers" problem, but using PVM as a base instead of a perl script and a binary, in next month's column.
Installing and Running PVM
For some years now PVM has been part of the regular Red Hat distribution and can be installed like any other RPM package. It is also available for most of the other RPM-based distributions and for Debian on a similar basis. The easiest way to proceed is to get the packages from your distribution's repository or CD set, copy them to a shared directory, and enter (for example; your revision numbers may differ):
    #rpm -Uvh pvm-3.4.4-12.i386.rpm
    #rpm -Uvh pvm-gui-3.4.4-12.i386.rpm
Note that we're also installing XPVM, PVM's nifty graphical front end as it will really help you visualize and debug the virtual computer while getting started.
Alternatively, you can visit the PVM home page. There you can follow instructions to download a tarball of the pvm sources and build it locally. This method has some advantages, but for beginners the disadvantages (such as figuring out the correct paths and coping with aimk, PVM's awesomely complex "Architecture Independent Make" utility) outweigh them, so the prebuilt RPMs are the better choice.
We are almost done. We have to set up the environment to make PVM function "automatically" for us. It would almost work out of the RPM box, because the "executable" installed by the rpm is really a shell script wrapper that sets most of what you need. However, we have to tell PVM to use ssh instead of rsh, so we might as well set all of the variables. If your default shell is bash, add the following to your .bashrc on all nodes (likely only one addition, assuming your home directory is NFS shared):
    # PVM environment variables
    PVM_ROOT=/usr/share/pvm3
    PVM_RSH=/usr/bin/ssh
    XPVM_ROOT=/usr/share/pvm3/xpvm
    export PVM_ROOT
    export PVM_RSH
    export XPVM_ROOT
If your default shell is csh or tcsh, add the following to your .cshrc or .tcshrc:
    # PVM environment variables
    setenv PVM_ROOT /usr/share/pvm3
    setenv PVM_RSH /usr/bin/ssh
    setenv XPVM_ROOT /usr/share/pvm3/xpvm
Now log out and log in again so that your current shell session has these variables correctly set.
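Once you are logged back in, it is worth confirming that the three variables really did make it into your environment. The following is a minimal sketch for sh or bash; the function name check_pvm_env is our own invention for this example and is not part of PVM.

```shell
#!/bin/sh
# Sketch: report whether the three PVM variables from the text are set.
# check_pvm_env is a hypothetical helper name, not part of PVM.
check_pvm_env() {
    missing=0
    for name in PVM_ROOT PVM_RSH XPVM_ROOT; do
        # eval reads the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    # Returns 0 only when all three variables are set and non-empty
    return $missing
}

check_pvm_env || echo "Some PVM variables are missing; check your shell startup file."
```

If any of the three is reported as not set, recheck the lines you added to your startup file before going on.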
It's time to test the installation by starting pvm on our master node and adding a compute node on a remote system. The sidebar shows this procedure. Before attempting this, recall that you must be able to log in remotely to the compute node without a password using ssh, as discussed in last month's article. If you missed this column, don't panic -- a few minutes with Google and the web should find you online HOWTO resources on how to set up ssh so you can log in without a password -- it is frequently discussed on a number of archived lists and at least one web document is devoted to this alone.
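If you would like to check the passwordless login before involving PVM at all, a small loop over your node names will do it. This is only a sketch: check_nodes and SSH_CMD are hypothetical names chosen for the example, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: confirm that each node accepts an ssh login without a password.
# SSH_CMD and check_nodes are hypothetical names used only in this example.
SSH_CMD=${SSH_CMD:-ssh}

check_nodes() {
    failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
            </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    # Returns 0 only when every node accepted the login
    return $failed
}

# Example use (node name from the sidebar):
#   check_nodes lilith
```

Any node reported as FAILED needs its ssh keys sorted out before PVM will be able to start a daemon there.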
Sidebar One: Example PVM Start-Up
Start up an xterm or other terminal window and enter (changing the names to match those of your network):
    $pvm
    pvm> add lilith
    add lilith
    1 successful
                        HOST     DTID
                      lilith    80000
    pvm> conf
    conf
    2 hosts, 1 data format
                        HOST     DTID     ARCH   SPEED       DSIG
                     lucifer    40000 LINUXI386    1000 0x00408841
                      lilith    80000 LINUXI386    1000 0x00408841
    pvm>
If you were able to reproduce something similar to the sidebar, you have a virtual supercomputer running with two nodes, lucifer (the head node) and lilith! We could add more nodes this way (and you should feel free to do so and otherwise experiment). If you read the pvm documentation (available online at the URLs given in the Resources sidebar, as well as in the pvm man pages that accompanied your distribution) you can easily learn to add a whole list of hosts at once by putting their names in a hostfile and running pvm hostfile. There are still other, and better, ways to add a lot of nodes all at once, but this is enough to get us started.
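To make the hostfile idea concrete, here is a minimal sketch that writes one. It assumes only the simplest hostfile layout, one host name per line with lines starting with # treated as comments; the helper name make_hostfile and the file name pvm_hosts are hypothetical.

```shell
#!/bin/sh
# Sketch: write a minimal PVM hostfile, one host name per line.
# make_hostfile and pvm_hosts are hypothetical names for this example.
make_hostfile() {
    file=$1
    shift
    {
        echo "# PVM hostfile: one host per line"
        for host in "$@"; do
            echo "$host"
        done
    } > "$file"
}

# Example use (node name from the sidebar; add your own after it):
#   make_hostfile pvm_hosts lilith
#   pvm pvm_hosts
```

The man pages describe per-host options that can follow each name, but the bare list is enough for a first experiment.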
Before we move on, we should learn one more thing about pvm: how to quit. There are actually two ways to exit the console. The "quit" command exits the console but leaves pvm running on all the nodes. In this way, you can start pvm, build a cluster, exit the console, run tasks, play games, log out and go home, come back the next day and crank up the pvm console, and there is your cluster, still configured or still working. Try it -- quit from pvm and then start it up again. When you type conf at the prompt, you should see your cluster still there.
On the other hand, the "halt" command stops pvm and all the pvmds on all the nodes! It destroys your virtual cluster completely. It is important to know how to exit this way as well. PVM creates lock files on all the nodes in a cluster that prevent their use by anyone else running PVM, including you (you can't start pvm twice, or start pvm on a node and see or start a different cluster). The halt command SHOULD remove all of those lock files and trace files and restore a node to the state where a new PVM cluster can be built, possibly by another user.
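Since the operative word is SHOULD, it can be handy to look for a leftover file yourself. The sketch below assumes that the daemon's per-user file is named pvmd.<uid> and lives in /tmp, which is the usual PVM 3 convention but worth confirming against your own man pages. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check a node for the per-user pvmd file.
# Assumption: the file is /tmp/pvmd.<uid>; verify this on your system.
# pvmd_lockfile and pvmd_file_exists are hypothetical helper names.
pvmd_lockfile() {
    dir=${1:-/tmp}
    uid=${2:-$(id -u)}
    echo "$dir/pvmd.$uid"
}

pvmd_file_exists() {
    [ -e "$(pvmd_lockfile "$@")" ]
}

if pvmd_file_exists; then
    echo "found $(pvmd_lockfile); is a pvmd still running here?"
else
    echo "no pvmd file found; this node looks clean"
fi
```

If the file is present after a halt, make sure no pvmd process is still running under your account before you remove it by hand.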