Authentication and disk help on the way
The Beowulf mailing list provides detailed discussions about issues
concerning Linux HPC clusters. In this article I review some postings
to the Beowulf list on user authentication within clusters, and some
postings to the smartmontools mailing list on monitoring disks.
Authentication Within Clusters
A very good cluster topic for discussion is how people authenticate
users within a cluster. Authentication is the process of determining
who you are; together with authorization, it determines what you can
do on a system. In layman's terms, authentication is what allows you
to log into a node and run jobs. On January 30, 2004, Brent Clements
asked the Beowulf mailing list how people handled authentication on
their clusters.
As one might expect, this question drew a number of responses. The
first came from Daniel Widyono, who explained that they use one form
of authentication to log into the head node and their own system for
authentication inside the cluster. They copy /etc/passwd to all of
the nodes via a cron script, and they have written wrappers for
useradd and userdel that copy /etc/passwd and /etc/shadow to the
nodes whenever a user is added or removed. They use /etc/passwd for
account information, and they update an authentication token on each
node once it is assigned to a user (through a scheduling system).
Before execution begins, ssh checks the authentication token using a
PAM module. They also use BProc to determine ownership on the head
node.
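Daniel did not post his PAM module, so the following is only a minimal Python sketch of the kind of check such a module might perform. It rests on assumptions that do not come from the posting: the scheduler writes the name of the user who currently owns a node into a single token file (the path /var/run/node-owner and all function names are hypothetical), and a login is allowed only when the requesting user matches that name. A real implementation would live inside a PAM module, not a standalone script.

```python
import os
import tempfile

# Hypothetical path: assume the scheduler writes the owning user's name here.
TOKEN_PATH = "/var/run/node-owner"


def assign_node(user: str, token_path: str = TOKEN_PATH) -> None:
    """Scheduler side: record that this node now belongs to `user`."""
    # Write to a temporary file in the same directory, then rename it into
    # place, so a concurrent check never reads a half-written token.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(token_path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(user + "\n")
    os.replace(tmp, token_path)


def release_node(token_path: str = TOKEN_PATH) -> None:
    """Scheduler side: the job has finished, so remove the token."""
    try:
        os.remove(token_path)
    except FileNotFoundError:
        pass


def may_run_on_node(user: str, token_path: str = TOKEN_PATH) -> bool:
    """Login side: the check a PAM hook might make before ssh starts a session."""
    try:
        with open(token_path) as f:
            owner = f.read().strip()
    except FileNotFoundError:
        return False  # node is not assigned to anyone, so refuse
    return owner == user
```

The rename in assign_node is a deliberate choice. Because the token is replaced atomically, a login attempt that races with the scheduler sees either the old owner or the new one, never an empty file.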
Robert Brown (RGB to his friends) then pointed out what many
experienced cluster people know: NIS is a high-overhead protocol that
hurts cluster performance. There have been past discussions about NIS
usage in clusters, and if you search the web for "NIS" and "cluster"
you should be able to find them (try adding "beowulf" to refine the
search). RGB pointed out that you will get NIS traffic any time a file
stat is performed (for example, whenever ls -l looks up the user name
that owns each file). Imagine this across many nodes and you will see
how NIS can become a drain on network performance. RGB also discussed
the security aspects of NIS. NIS has many well-known problems,
including the fact that it sends information in the clear (i.e.,
unencrypted). RGB then pointed out that many people use rsync to copy
/etc/passwd and /etc/shadow to the nodes (in much the same fashion
Daniel described). However, RGB noted that you have to watch for
password changes and copy the appropriate files to the nodes when they
occur (you could write a wrapper for passwd to handle part of this).
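RGB's suggestion of a passwd wrapper could look something like the following. It is a hypothetical Python sketch, not anything posted to the list. The node names are placeholders. The wrapper fingerprints /etc/passwd and /etc/shadow, runs the real passwd, and then uses rsync to push whichever files changed. The command runner is a parameter, so the logic can be exercised without touching the real system.

```python
import hashlib
import subprocess
from typing import Callable, Dict, List, Sequence

# Hypothetical compute node names; replace with your own.
NODES = ["node01", "node02", "node03"]
ACCOUNT_FILES = ["/etc/passwd", "/etc/shadow"]


def fingerprint(paths: Sequence[str]) -> Dict[str, str]:
    """Map each file to the SHA-256 digest of its contents."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = hashlib.sha256(f.read()).hexdigest()
    return result


def run_checked(cmd: List[str]) -> None:
    """Default runner: execute a command and raise if it fails."""
    subprocess.run(cmd, check=True)


def passwd_wrapper(args: Sequence[str],
                   files: Sequence[str] = ACCOUNT_FILES,
                   nodes: Sequence[str] = NODES,
                   run: Callable[[List[str]], None] = run_checked) -> List[str]:
    """Run passwd, then rsync whichever account files it changed to every node."""
    before = fingerprint(files)
    run(["passwd", *args])
    after = fingerprint(files)
    changed = [p for p in files if before[p] != after[p]]
    for node in nodes:
        for path in changed:
            # -a keeps permissions (and ownership, when run as root).
            run(["rsync", "-a", path, f"{node}:{path}"])
    return changed
```

A password change normally rewrites only /etc/shadow, so in practice usually just that file gets pushed. The wrapper needs root privileges, because reading /etc/shadow requires them. It would be invoked as, for example, passwd_wrapper(["alice"]).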
A poster going by "Jag" replied that at his university they configured
PAM on the head node to authenticate against the main Kerberos server,
but they remap the home directories and other settings for the cluster.
They also use NIS within the cluster, but only for name service
information. To access the compute nodes they use host-based
authentication with ssh. Jag also suggested that people using NIS could
run NSCD (the Name Service Cache Daemon), which is part of glibc. NSCD
doesn't stop the NIS traffic, but it reduces it by caching lookup
results so that repeated requests are answered locally. Leif Nixon
posted that he was suspicious of NSCD because he had seen it hang on to
stale information for no good reason.
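The trade-off Jag and Leif describe can be illustrated with a toy time-to-live cache in Python. This is not how nscd is implemented, since its real behaviour is configured per database in nscd.conf, with separate lifetimes for positive and negative answers. It is a minimal sketch in which a dictionary stands in for the NIS server. It shows both why a cache cuts server traffic and why it can keep serving stale data until an entry expires.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Answer repeated lookups locally for `ttl` seconds before asking again."""

    def __init__(self, backend: Callable[[str], Optional[str]], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend  # stands in for a query to the NIS server
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Optional[str]]] = {}
        self.backend_calls = 0  # how many requests actually hit the server

    def lookup(self, key: str) -> Optional[str]:
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # cache hit: no network traffic
        self.backend_calls += 1
        value = self.backend(key)
        self.entries[key] = (now, value)
        return value


if __name__ == "__main__":
    nis_map = {"alice": "1001"}  # simulated NIS passwd map: user -> uid
    fake_now = [0.0]             # simulated clock, in seconds
    cache = TTLCache(nis_map.get, ttl=600, clock=lambda: fake_now[0])

    for _ in range(1000):        # e.g. a script doing 1000 lookups
        cache.lookup("alice")
    print(cache.backend_calls)   # 1: only the first lookup reached the server

    nis_map["alice"] = "2001"    # the server's map changes...
    print(cache.lookup("alice")) # 1001: the cache still serves the stale value
    fake_now[0] = 601.0          # ...until the entry's lifetime runs out
    print(cache.lookup("alice")) # 2001
```

A longer ttl saves more traffic but widens the window in which a changed account still shows its old value. That window is exactly the behaviour Leif was wary of.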
Mark Hahn chipped in that he uses the ubiquitous approach of rsync-ing
the password and shadow files, and uses ssh to reach the nodes inside
the cluster. Mark also explained why he doesn't like campus-wide
centralized authentication: it creates a central point of failure
(despite fail-over servers, etc.), it creates a network hotspot, and it
can increase the workload on the poor person who has to administer the
central authentication system.
Joe Landman posted that he was very leery of NIS because he had seen
customers crash it, while it was serving login information, just by
running a simple script across the cluster. Joe said that he prefers to
push name service lookups through DNS, in particular dnsmasq. He added
that configuring a full-blown named/BIND system for a cluster is
significant overkill in many cases. For authentication, Joe had been
hoping that LDAP would solve his problems, but he had not been able to
consistently build a working LDAP server with its databases. He said
that he was beginning to think about a simple database with PAM modules
on the front end (such as pam-mysql).
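Here is one way Joe's dnsmasq approach could work in practice. By default, dnsmasq answers DNS queries from the head node's /etc/hosts, so node names only have to be maintained in one file, with the compute nodes pointing their resolver at the head node. The short Python helper below generates such /etc/hosts lines. The naming scheme (node001, node002, ...), the domain, and the starting address are hypothetical choices, not anything Joe specified.

```python
import ipaddress
from typing import List


def cluster_hosts(prefix: str, count: int, first_ip: str,
                  domain: str = "cluster") -> List[str]:
    """Return /etc/hosts lines ("IP  fqdn  short-name") for `count` nodes."""
    start = ipaddress.IPv4Address(first_ip)
    lines = []
    for i in range(count):
        name = f"{prefix}{i + 1:03d}"      # node001, node002, ...
        ip = start + i                     # consecutive addresses
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical layout: 4 compute nodes starting at 10.0.0.101.
    for line in cluster_hosts("node", 4, "10.0.0.101"):
        print(line)
```

Run as shown, it prints four tab-separated lines, from 10.0.0.101 node001.cluster node001 through 10.0.0.104 node004.cluster node004. Appending those lines to /etc/hosts on the head node is enough for dnsmasq to start serving the names.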
Brent Clements responded that they had been using LDAP and found that
it worked very well, especially with Red Hat. They like it because they
can integrate it with a web-based account management system for various
groups within the campus. Joe responded that he thought the client side
of LDAP was very easy to configure and run, but that the server side
was where he had trouble. He used Red Hat's LDAP RPMs and tried various
things, but could never get it to work the way he wanted.
The final poster, Steve Timm, had some good information about NIS.
Steve had used NIS on their cluster but found problems with it. In
particular, when a job (such as a cron job that runs a script) starts
on all the nodes at once, the NIS server is hammered by all the nodes
(a.k.a. an "NIS storm"). In an effort to prevent NIS storms, they tried
making each node in the cluster an NIS slave, but found that the map
transfer protocol is not perfect: there were always a few slaves that
were a map or two behind. Steve said they ended up pushing the password
and shadow files out from the head node to the compute nodes using
rsync.
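Steve's "a map or two behind" problem can at least be detected. Each NIS map carries an order number (the value yppoll reports), so a slave whose order number is lower than the master's is serving an old copy. The Python sketch below compares order numbers collected from a master and its slaves. Gathering those numbers from real servers is not shown, and the host names and values are made up for illustration.

```python
from typing import Dict, List


def stale_slaves(master: Dict[str, int],
                 slaves: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """For each slave, list the maps whose order number is behind the master's.

    A map missing entirely from a slave counts as behind.
    """
    report: Dict[str, List[str]] = {}
    for host in sorted(slaves):
        maps = slaves[host]
        behind = [name for name in sorted(master)
                  if maps.get(name, -1) < master[name]]
        if behind:
            report[host] = behind
    return report


if __name__ == "__main__":
    # Hypothetical order numbers, as they might be collected from each server.
    master = {"passwd.byname": 1075500200, "group.byname": 1075500100}
    slaves = {
        "node01": {"passwd.byname": 1075500200, "group.byname": 1075500100},
        "node02": {"passwd.byname": 1075400000, "group.byname": 1075500100},
        "node03": {"passwd.byname": 1075500200},
    }
    print(stale_slaves(master, slaves))
    # {'node02': ['passwd.byname'], 'node03': ['group.byname']}
```

Detection only tells you which slaves need a forced map transfer. It does not remove the underlying unreliability, which is why pushing the files with rsync ended up simpler for Steve.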
For the time being, it seems that many people prefer using rsync to
copy the password and shadow files to the compute nodes. While not an
ideal method, it is very simple and effective, and it has a very low
impact on the network (unlike NIS). Perhaps some ingenious person will
come up with a better way some day (hint, hint).