Maui Installation
A basic installation of Maui is straightforward, but if you plan on using it, make sure you read the Online Administrator's Guide to understand all that it has to offer:

-bash-3.00# wget -nd http://www.clusterresources.com/downloads/maui/maui-3.2.6p14.tar.gz
-bash-3.00# tar -zxf maui-3.2.6p14.tar.gz
-bash-3.00# cd maui-3.2.6p14
-bash-3.00# ./configure --with-pbs=/usr/local/torque/torque-2.1.2/
This version seems to have a problem with the Makefile: it looks for the libpbs library, which was the name of the Torque library in previous versions but is now called libtorque. So in line 26 of the Makefile we change -lpbs to -ltorque, and then we continue:
-bash-3.00# yum install libnet
-bash-3.00# make
-bash-3.00# make install
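The Makefile change described above can also be scripted rather than made by hand in an editor. The following is only a sketch: fix_pbs_lib is a hypothetical helper name, and it replaces every occurrence of -lpbs in the file instead of touching line 26 specifically, on the assumption that no other use of that flag needs to stay. The demonstration runs on a scratch file with a made-up line, so nothing real is modified.

```shell
#!/bin/sh
# Sketch: swap the old PBS library flag for the Torque one in a Makefile.
# fix_pbs_lib is a hypothetical helper name, not part of Maui or Torque.
fix_pbs_lib() {
    # $1 = path to the Makefile; a backup copy is kept as <file>.orig
    sed -i.orig 's/-lpbs/-ltorque/g' "$1"
}

# Demonstration on a scratch file (the line below is made up for illustration).
scratch=$(mktemp)
echo 'LIBS = -lpbs' > "$scratch"
fix_pbs_lib "$scratch"
cat "$scratch"                 # prints: LIBS = -ltorque
rm -f "$scratch" "$scratch.orig"
```

In the Maui source directory this would be called as fix_pbs_lib Makefile before running make.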
As we did for Torque earlier, in order to use the Modules package with Maui, we create the directory /usr/local/Modules/3.2.3/modulefiles/maui/ and inside it we create two files:
-bash-3.00# cat /usr/local/Modules/3.2.3/modulefiles/maui/.version
#%Module1.0###########################################################
##
## version file for Maui
##
set ModulesVersion "326p14"

-bash-3.00# cat /usr/local/Modules/3.2.3/modulefiles/maui/maui326p14
#%Module1.0#####################################################################
##
## Maui 3.2.6p14 modulefile
##
## modulefiles/maui/maui326p14
##
proc ModulesHelp { } {
    global version mauiroot

    puts stderr "\tmaui 3.2.6p14 - loads MAUI version 3.2.6p14"
    puts stderr "\n\tThis adds $mauiroot/* to several of the"
    puts stderr "\tenvironment variables.\n"
}

module-whatis "loads MAUI 3.2.6p14"

# for Tcl script use only
set version 3.2.6p14
set mauiroot /usr/local/maui

prepend-path PATH $mauiroot/bin
prepend-path MANPATH $mauiroot/man
prepend-path LD_LIBRARY_PATH $mauiroot/lib
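To make the effect of the three prepend-path lines concrete, here is a rough shell equivalent. It is an illustration only, not how the Modules package is implemented, and prepend_path is a hypothetical helper name.

```shell
#!/bin/sh
# Rough shell equivalent of the prepend-path lines in the modulefile.
# prepend_path is a hypothetical helper, not a Modules command.
prepend_path() {
    # $1 = variable name, $2 = directory to put in front
    pp_var="$1"
    pp_dir="$2"
    eval "pp_current=\${$pp_var:-}"        # read the variable named by $1
    if [ -z "$pp_current" ]; then
        eval "$pp_var=\$pp_dir"
    else
        eval "$pp_var=\$pp_dir:\$pp_current"
    fi
    export "$pp_var"
}

mauiroot=/usr/local/maui
prepend_path PATH            "$mauiroot/bin"
prepend_path MANPATH         "$mauiroot/man"
prepend_path LD_LIBRARY_PATH "$mauiroot/lib"

echo "${PATH%%:*}"     # prints: /usr/local/maui/bin
```

One advantage of the modulefile over a hand-written snippet like this is that the Modules package can also undo the changes with module unload.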
Also, as with Torque, we want users and root to have access to the binaries, the libraries, etc., so we add the following line to both /etc/bashrc and /root/.bashrc:
module load maui/maui326p14
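If you repeat these steps while experimenting, it is easy to end up with the same line several times in a startup file. A minimal sketch of an idempotent append follows; add_line_once is a hypothetical helper name, and the demonstration uses a scratch file standing in for /etc/bashrc.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper name.
add_line_once() {
    # $1 = file, $2 = line to add
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file standing in for /etc/bashrc.
rc=$(mktemp)
add_line_once "$rc" 'module load maui/maui326p14'
add_line_once "$rc" 'module load maui/maui326p14'   # second call changes nothing
grep -c 'module load maui' "$rc"                    # prints: 1
rm -f "$rc"
```

On the real system the helper would be called once for /etc/bashrc and once for /root/.bashrc.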
We restart the root session and start Maui manually:
-bash-3.00# /usr/local/maui/sbin/maui
If all goes well you can check the job queue with the command showq from either the root or the angelv user account. We will test this after the following section on how to make Maui start at boot time.
Automatic Start Of Maui At Boot Time
For the automatic start of Maui at boot time, we just need to create the init file maui (a similar script can be found in the etc/maui.d directory of the source code). Remember to change the permissions to 755 with the command chmod 755 /etc/init.d/maui, and then to create the necessary symbolic links with the command chkconfig --add maui:

-bash-3.00# cat /etc/init.d/maui
#!/bin/bash
#
# Red Hat Linux Maui Resource script
#
# chkconfig: 345 90 90
# description: Maui is a cluster scheduler which uses
#              TORQUE to schedule jobs on that cluster.

# Source function library.
. /etc/init.d/functions

MAUIBINARY="/usr/local/maui/sbin/maui"
# Name shown in the status messages below.
prog="maui"

start() {
    if [ -x $MAUIBINARY ]; then
        daemon $MAUIBINARY
        RETVAL=$?
        return $RETVAL
    else
        echo "$0 ERROR: Maui program not found"
        return 1
    fi
}

stop() {
    echo -n $"Stopping $prog: "
    killproc $MAUIBINARY
    RETVAL=$?
    echo
    return $RETVAL
}

restart() {
    stop
    start
}

reload() {
    restart
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    reload|restart)
        restart
        ;;
    status)
        status $MAUIBINARY
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|reload|status}"
        exit 1
esac

# Exit with the status of the action that was run.
exit $?
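The "# chkconfig: 345 90 90" comment in the script above is what chkconfig --add reads: the first field lists the runlevels in which the service is enabled (3, 4 and 5), and the next two are the start and stop priorities used to name the symbolic links. As a small sketch, the fields can be pulled out of an init script like this; chkconfig_fields is a hypothetical helper name, and the demonstration uses a scratch file rather than the real /etc/init.d/maui.

```shell
#!/bin/sh
# Print the runlevels and priorities declared in an init script's chkconfig header.
# chkconfig_fields is a hypothetical helper name.
chkconfig_fields() {
    # Matches a line such as "# chkconfig: 345 90 90"
    awk '/^# *chkconfig:/ {
             printf "runlevels=%s start=%s stop=%s\n", $3, $4, $5
             exit
         }' "$1"
}

# Demonstration on a scratch file holding the same header line.
init=$(mktemp)
printf '#!/bin/bash\n# chkconfig: 345 90 90\n' > "$init"
chkconfig_fields "$init"     # prints: runlevels=345 start=90 stop=90
rm -f "$init"
```

After running chkconfig --add maui, the command chkconfig --list maui should report the service as on for those runlevels.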
Verification Of Parallel Programming Execution
After all this work, we are nearly finished with our first version of the virtual cluster. To test that everything is working correctly, we will execute a parallel program, submitted to the cluster through Maui. First of all, we should reboot the cluster (remember the recipe we saw above) to verify that all the services are started correctly at boot time. Then, as an example of how to submit jobs to Maui and to verify the execution of parallel programs submitted to the queue, we create two files in the angelv user account (cpu-eater.c and submit-eater):

[angelv@boldo ~]$ cat cpu-eater.c
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int t;
    long i, j = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello World from process %d of %d\n", rank, size);

    /* Busy loop so the job stays in the queue for a while;
       only rank 0 prints a progress dot every 10000 outer iterations. */
    for (i = 0; i < 100000; i++)
        for (j = 0; j < 100000; j++)
            if (!(i % 10000) && (j == 0) && (rank == 0))
                printf(".");

    MPI_Finalize();
    return 0;
}

[angelv@boldo ~]$ cat submit-eater
#!/bin/sh

# This finds out the number of nodes we have
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')

cd $PBS_O_WORKDIR

# Make the MPI call
mpirun -np $NP -machinefile $PBS_NODEFILE ./cpu-eater
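The NP line in submit-eater works because the file named by $PBS_NODEFILE contains one line per allocated processor slot, so a host appears once for each processor granted on it. A minimal sketch of that count, using a made-up node file with hypothetical host names (count_slots is also a hypothetical helper name):

```shell
#!/bin/sh
# Count the processor slots listed in a PBS-style node file (one line per slot).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Made-up node file for a nodes=2:ppn=2 request; the host names are hypothetical.
nodefile=$(mktemp)
printf '%s\n' node1 node1 node2 node2 > "$nodefile"

NP=$(count_slots "$nodefile")
echo "mpirun would be started with -np $NP"   # prints: mpirun would be started with -np 4
rm -f "$nodefile"
```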
cpu-eater.c is just the parallel version of your typical "Hello World" program, with a wasteful loop, so that the job does not complete immediately. submit-eater is the file needed to submit this job to our queuing system. We compile it, launch it a number of times to the queue with Maui, and verify that everything is working as expected:
[angelv@boldo ~]$ mpicc -o cpu-eater cpu-eater.c
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=4:ppn=4 submit-eater
[angelv@boldo ~]$ showq
ACTIVE JOBS--------------------
JOBNAME   USERNAME   STATE     PROC   REMAINING   STARTTIME

9         angelv     Running   4      00:58:58    Fri Jun 16 01:58:35
10        angelv     Running   4      00:59:14    Fri Jun 16 01:58:51
11        angelv     Running   4      00:59:16    Fri Jun 16 01:58:53
12        angelv     Running   4      00:59:17    Fri Jun 16 01:58:54

     4 Active Jobs      16 of 16 Processors Active (100.00%)
                         4 of  4 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME   USERNAME   STATE     PROC   WCLIMIT     QUEUETIME

13        angelv     Idle      16     1:00:00     Fri Jun 16 01:59:05

1 Idle Job

BLOCKED JOBS----------------
JOBNAME   USERNAME   STATE     PROC   WCLIMIT     QUEUETIME


Total Jobs: 5   Active Jobs: 4   Idle Jobs: 1   Blocked Jobs: 0
[angelv@boldo ~]$
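The PROC column follows directly from the resource requests: nodes=2:ppn=2 asks for 2 x 2 = 4 processors, so the first four jobs together fill all 16 processors, while the fifth job asks for 4 x 4 = 16 and has to wait in the idle queue. A small sketch of that arithmetic follows; procs_requested is a hypothetical helper, and it only handles the simple nodes=N:ppn=P form used here.

```shell
#!/bin/sh
# Compute the processors requested by a "nodes=N:ppn=P" resource string.
# procs_requested is a hypothetical helper, not a Torque or Maui command.
procs_requested() {
    pr_spec="$1"
    pr_nodes=${pr_spec#nodes=}      # drop the leading "nodes="
    pr_nodes=${pr_nodes%%:*}        # keep what precedes the first ":"
    pr_ppn=${pr_spec##*ppn=}        # keep what follows "ppn="
    echo $((pr_nodes * pr_ppn))
}

procs_requested nodes=2:ppn=2   # prints: 4
procs_requested nodes=4:ppn=4   # prints: 16
```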
Conclusions
Phew!! We did quite a lot of work to get here, but now we have a more or less functional virtual cluster. Many improvements can be made, but by now you should have the basic understanding needed to configure your own cluster. I suggest you create a snapshot of the cluster (as we saw above) and continue experimenting with the many other features that you might want in a real production cluster (DHCP, LDAP, SystemImager, highly available services, parallel file systems, cluster monitoring software, etc.).

As mentioned in the introduction, if you just want to try out the virtual cluster obtained by following the steps in Parts One and Two of this article, without doing all the configuration steps yourself, you can obtain a ready-made cluster image from the download:contrib:cluster page at Jailtime.org. Happy (virtual) clustering!

Angel de Vicente, Ph.D., has worked for the last three years at the Instituto de Astrofisica de Canarias, supporting the astrophysicists with scientific software and being in charge of supercomputing at the institute. Being in the process of upgrading their Beowulf cluster, he lives of late in a world of virtual machines and networks, where he feels free to experiment.