Building a Virtual Cluster with Xen (Part Two)


Maui Installation

A basic installation of Maui is straightforward, but if you plan to use it, make sure you read the Online Administrator's Guide to understand all that it has to offer:

  
-bash-3.00# wget -nd http://www.clusterresources.com/downloads/maui/maui-3.2.6p14.tar.gz
-bash-3.00# tar -zxf maui-3.2.6p14.tar.gz
-bash-3.00# cd maui-3.2.6p14
-bash-3.00# ./configure --with-pbs=/usr/local/torque/torque-2.1.2/

With this version there seems to be a problem with the Makefile: it looks for the libpbs library, which was the name of the Torque library in previous versions but is now called libtorque. So in line 26 of the Makefile we change -lpbs to -ltorque, and then we continue:

 
-bash-3.00# yum install libnet
-bash-3.00# make
-bash-3.00# make install
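
The Makefile change can also be scripted rather than made by hand. The following is only a minimal sketch and not part of the original procedure: fix_pbs_lib is a hypothetical helper, and the line number 26 is taken from the text above and may differ in other Maui releases, so inspect the line before editing it.

```shell
# Hypothetical helper: swap -lpbs for -ltorque on one line of a Makefile.
# Usage: fix_pbs_lib <makefile> [line-number]   (line defaults to 26)
fix_pbs_lib() {
    makefile=$1
    line=${2:-26}
    # Keep a backup so the edit can be undone.
    cp "$makefile" "$makefile.orig" || return 1
    # Substitute only on the requested line; other lines pass through unchanged.
    sed "${line}s/-lpbs/-ltorque/" "$makefile.orig" > "$makefile"
}

# Example (run inside the maui-3.2.6p14 directory, after ./configure):
#   sed -n '26p' Makefile      # inspect the line first
#   fix_pbs_lib Makefile 26
```

Restricting the substitution to a single line keeps the change as narrow as the manual edit, and the .orig copy makes it easy to revert.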

As we did for Torque, in order to use the modules package with Maui we create the directory /usr/local/Modules/3.2.3/modulefiles/maui/ and inside it we create two files:

  
-bash-3.00# cat /usr/local/Modules/3.2.3/modulefiles/maui/.version
#%Module1.0###########################################################
##
## version file for Maui
##
set ModulesVersion      "326p14"

-bash-3.00# cat /usr/local/Modules/3.2.3/modulefiles/maui/maui326p14
#%Module1.0#####################################################################
##
## Maui 3.2.6p14 modulefile
##
## modulefiles/maui/maui326p14
##
proc ModulesHelp { } {
        global version mauiroot

        puts stderr "\tmaui 3.2.6p14 - loads MAUI version 3.2.6p14"
        puts stderr "\n\tThis adds $mauiroot/* to several of the"
        puts stderr "\tenvironment variables.\n"
}

module-whatis   "loads MAUI 3.2.6p14"

# for Tcl script use only
set     version         3.2.6p14
set     mauiroot       /usr/local/maui

prepend-path    PATH            $mauiroot/bin
prepend-path    MANPATH         $mauiroot/man
prepend-path    LD_LIBRARY_PATH $mauiroot/lib
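
For readers unfamiliar with the modules package, the three prepend-path lines in the modulefile have roughly the same effect as the plain shell assignments below. This is only an illustrative sketch: the modules package can also undo the changes on module unload, which this cannot. The mauiroot value is the one set in the modulefile.

```shell
# Rough shell equivalent of the prepend-path lines in the modulefile above.
mauiroot=/usr/local/maui

# ${VAR:+:$VAR} appends the old value only if it was non-empty, which
# avoids a stray trailing colon in the result.
PATH="$mauiroot/bin${PATH:+:$PATH}"
MANPATH="$mauiroot/man${MANPATH:+:$MANPATH}"
LD_LIBRARY_PATH="$mauiroot/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH MANPATH LD_LIBRARY_PATH
```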

As with Torque, we want both regular users and root to have access to the binaries, the libraries, etc., so we add the following line to both /etc/bashrc and /root/.bashrc:

  
module load maui/maui326p14

We restart the root session and we start maui manually.

  
-bash-3.00# /usr/local/maui/sbin/maui

If all goes well you can check the job queue with the command showq from either the root or the angelv user account. We will test this after the following section on how to make Maui start at boot time.

Automatic Start Of Maui At Boot Time

For the automatic start of Maui at boot time, we just need to create the init file maui (a similar script can be found in the etc/maui.d directory of the source code). Remember to change the permissions to 755, with the command chmod 755 /etc/init.d/maui, and then to create the necessary symbolic links with the command chkconfig --add maui:

  
-bash-3.00# cat /etc/init.d/maui
#!/bin/bash
#
# Red Hat Linux Maui Resource script
#
# chkconfig: 345 90 90
# description: Maui is a cluster scheduler which uses
# TORQUE to schedule jobs on that cluster.


# Source function library.
. /etc/init.d/functions

MAUIBINARY="/usr/local/maui/sbin/maui"
# Name shown in the "Stopping ..." message
prog="maui"

start() {
        if [ -x "$MAUIBINARY" ]; then
          daemon "$MAUIBINARY"
          RETVAL=$?
          return $RETVAL
        else
          echo "$0 ERROR: Maui program not found"
          # Report the failure to the caller
          return 1
        fi
}

stop() {
        echo -n $"Stopping $prog: "
        killproc "$MAUIBINARY"
        RETVAL=$?
        echo
        return $RETVAL
}

restart() {
        stop
        start
}

reload() {
        restart
}

case "$1" in
start)
        start
        ;;
stop)
        stop
        ;;
reload|restart)
        restart
        ;;
status)
        status $MAUIBINARY
        ;;
*)
        echo $"Usage: $0 {start|stop|restart|reload|status}"
        exit 1
esac

exit $?
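
The "# chkconfig: 345 90 90" comment in the script is what chkconfig --add reads: the first field lists the runlevels in which the service is enabled (3, 4 and 5), the second is the start priority and the third is the stop priority. As a small sketch, the hypothetical helper below (not part of chkconfig itself) extracts those fields from an init script, which can be handy for checking a script before registering it. It assumes the header is written exactly in the "# chkconfig:" form used above.

```shell
# Hypothetical helper: print the fields of the "# chkconfig:" header line.
# Usage: chkconfig_header /etc/init.d/maui
chkconfig_header() {
    # Match the first "# chkconfig: <levels> <start> <stop>" line and print
    # it in labelled form; awk stops after the first match.
    awk '/^# chkconfig:/ {
             printf "runlevels=%s start=%s stop=%s\n", $3, $4, $5
             exit
         }' "$1"
}
```

For the script above this prints runlevels=345 start=90 stop=90. Once the service has been registered, chkconfig --list maui shows the resulting per-runlevel settings.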

Verification Of Parallel Programming Execution

After all this work, we are nearly finished with our first version of the virtual cluster. To test that everything is working correctly, we will execute a parallel program submitted to the cluster through Maui. First, we should reboot the cluster (remember the recipe we saw above) to verify that all the services start correctly at boot time. Then, as an example of how to submit jobs to Maui and to verify that parallel programs submitted to the queue actually run, we create two files in the angelv user account (cpu-eater.c and submit-eater):

  
[angelv@boldo ~]$ cat cpu-eater.c
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int rank, size;
  int t;
  long i,j = 0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("Hello World from process %d of %d\n", rank, size);

  for (i=0;i<100000;i++)
    for(j=0;j<100000;j++)
      if (!(i % 10000) && (j == 0) && (rank == 0))
        printf(".");

  MPI_Finalize();
  return 0;
}

[angelv@boldo ~]$ cat submit-eater
#!/bin/sh

# This finds out the number of processors allocated to the job
# ($PBS_NODEFILE has one line per processor, so a node can appear several times)
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
cd $PBS_O_WORKDIR

# Make the MPI call
mpirun -np $NP -machinefile $PBS_NODEFILE ./cpu-eater
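
To see what the script computes, recall that Torque writes one line to $PBS_NODEFILE per allocated processor slot, so a node appears once for each of its slots. The sketch below imitates a nodefile for a nodes=2:ppn=2 request. The host names node1 and node2 are made up for illustration, and the temporary file is only a stand-in for the real $PBS_NODEFILE.

```shell
# Build a stand-in for $PBS_NODEFILE as it might look for nodes=2:ppn=2
# (hypothetical host names; a real job gets this file from Torque).
nodefile=$(mktemp)
printf '%s\n' node1 node1 node2 node2 > "$nodefile"

# Lines in the file = processor slots = value passed to mpirun -np.
NP=$(wc -l < "$nodefile")
NP=$((NP))                      # normalise any padding from wc

# Distinct host names = number of nodes.
NODES=$(sort -u "$nodefile" | wc -l)
NODES=$((NODES))

echo "slots: $NP, nodes: $NODES"
rm -f "$nodefile"
```

With this sample file the script reports 4 slots on 2 nodes, which is why mpirun is started with -np 4 for such a job.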

cpu-eater.c is just a parallel version of the typical "Hello World" program, with a wasteful loop added so that the job does not complete immediately. submit-eater is the script needed to submit this job to our queuing system. We compile the program, submit it to the queue several times, and use showq to verify that everything is working as expected:

  
[angelv@boldo ~]$ mpicc -o cpu-eater cpu-eater.c
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=2:ppn=2 submit-eater
[angelv@boldo ~]$ qsub -l nodes=4:ppn=4 submit-eater

[angelv@boldo ~]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

9                    angelv    Running     4    00:58:58  Fri Jun 16 01:58:35
10                   angelv    Running     4    00:59:14  Fri Jun 16 01:58:51
11                   angelv    Running     4    00:59:16  Fri Jun 16 01:58:53
12                   angelv    Running     4    00:59:17  Fri Jun 16 01:58:54

     4 Active Jobs      16 of   16 Processors Active (100.00%)
                         4 of    4 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

13                   angelv       Idle    16     1:00:00  Fri Jun 16 01:59:05

1 Idle Job

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 5   Active Jobs: 4   Idle Jobs: 1   Blocked Jobs: 0
[angelv@boldo ~]$
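
The showq output follows from simple arithmetic: each nodes=2:ppn=2 job asks for 2 x 2 = 4 processors, so the first four jobs fill all 16 processors, and the nodes=4:ppn=4 job, which needs all 16 at once, stays idle until they finish. A small sketch of that calculation follows, using a hypothetical helper that only understands the simple nodes=N:ppn=M form:

```shell
# Hypothetical helper: processors requested by a "nodes=N:ppn=M" string.
procs_requested() {
    nodes=${1#nodes=}      # strip leading "nodes="  -> "N:ppn=M"
    nodes=${nodes%%:*}     # keep text before ":"    -> "N"
    ppn=${1##*ppn=}        # keep text after "ppn="  -> "M"
    echo $((nodes * ppn))
}

# The four jobs that showq lists as Running.
total=0
for req in nodes=2:ppn=2 nodes=2:ppn=2 nodes=2:ppn=2 nodes=2:ppn=2; do
    total=$((total + $(procs_requested "$req")))
done
echo "running jobs use $total processors"
echo "fifth job needs $(procs_requested nodes=4:ppn=4) processors"
```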


Conclusions

Phew!! We did quite a lot of work to get here, but now we have a more or less functional virtual cluster. Many improvements can be made, but by now you should have the basic understanding needed to configure your own cluster. I suggest you create a snapshot of the cluster (as we saw above) and continue experimenting with the many other features that you might want in a real production cluster (DHCP, LDAP, SystemImager, Highly Available services, Parallel File Systems, cluster monitoring software, etc.). As mentioned in the introduction, if you just want to try out the virtual cluster obtained by following the steps in Parts One and Two of this article, without doing all the configuration steps yourself, you can obtain a ready-made cluster image from the download:contrib:cluster page at Jailtime.org. Happy (virtual) clustering!

Angel de Vicente, Ph.D., has been working for the last three years at the Instituto de Astrofisica de Canarias, supporting the astrophysicists with scientific software and being in charge of supercomputing at the institute. While in the process of upgrading their Beowulf cluster, he lives of late in a world of virtual machines and networks, where he feels free to experiment.


    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.