There are many options with iSCSI. You can use a central server
(target) with a number of disks, or several targets or several
networks, or combinations (as the Vulcans say, "infinite diversity
in infinite combinations" -- while not infinite, there are a number
of options). iSCSI makes block devices available over an Ethernet network.
To build block devices suitable for iSCSI, you can use md (Multiple
Devices, or software RAID), lvm, evms, etc. These tools allow you to
create block devices from
partitions or different disks. Using these tools also allows you to
export block devices from one target to multiple nodes (initiators).
iSCSI also allows you to export block devices over various Ethernet
networks to various initiators.
Using software RAID (md) on the server (target) machine you can
combine block devices before exposing them to iSCSI initiators. This
allows you to use some kind of RAID protection before exposing the
block devices. For example, you could create a software RAID 5 on
the target machine using all of the disks, then use LVM to create
volume group(s) and logical volumes across all of the disks, and
then expose the logical volumes to nodes (initiators). This way if
a disk is lost, it can be replaced without losing the file system
on any node. However, if the server (target) goes down, you lose
the storage on the nodes (initiators) it was serving.
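As a rough illustration of the RAID 5 plus LVM layout just described, the sketch below assembles the commands one might run on the target. It is a dry run only: nothing is executed, and the device names, volume group name, and volume sizes are hypothetical placeholders, not values from this article. Exporting the resulting logical volumes depends on the iSCSI target software you use, so that step is left out.

```python
def target_storage_commands(disks, volume_group, node_volumes, md_device="/dev/md0"):
    """Assemble (but do not run) the commands for a RAID 5 + LVM target.

    disks        -- member block devices for the md array (hypothetical names)
    volume_group -- name for the LVM volume group (hypothetical)
    node_volumes -- mapping of logical volume name to size, one per node
    """
    if len(disks) < 3:
        raise ValueError("RAID 5 needs at least three member devices")

    commands = [
        # Build one RAID 5 array out of all the disks.
        ["mdadm", "--create", md_device, "--level=5",
         f"--raid-devices={len(disks)}", *disks],
        # Put LVM on top of the array.
        ["pvcreate", md_device],
        ["vgcreate", volume_group, md_device],
    ]
    # One logical volume per node; each would later be exported over iSCSI.
    for name, size in node_volumes.items():
        commands.append(["lvcreate", "-L", size, "-n", name, volume_group])
    return commands


if __name__ == "__main__":
    plan = target_storage_commands(
        disks=["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"],
        volume_group="vg_iscsi",
        node_volumes={"node01": "100G", "node02": "100G"},
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Printing the plan first lets you review the commands before running anything that touches real disks.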
An alternative is to take a number of servers with disks, expose
a set of block devices from each target to a set of nodes such that
each node (initiator) mounts one block device from a given target.
Then the node would take the iSCSI block devices, use software RAID-5
(or RAID-1) and LVM to create a final block device that is formatted
with a file system. This configuration allows an entire target machine
to go down without losing the storage on the nodes (initiators): because
the final block device is RAID-5, or at least RAID-1, you still have
access to the data. You can also use RAID-5 on the targets so that
the loss of a single disk will not interrupt the initiators. This
configuration might also have some speed advantages depending upon
how the storage is used.
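To make the trade-off concrete, here is a minimal sketch of how many lost targets an initiator-side array can tolerate. It assumes each array member comes from a different target, so losing a target means losing exactly one member. The function names are my own, chosen for illustration.

```python
def tolerated_failures(level, members):
    """How many member devices a software RAID array can lose and keep serving data."""
    if level == 1:
        if members < 2:
            raise ValueError("RAID-1 needs at least two members")
        return members - 1      # any single surviving mirror is enough
    if level == 5:
        if members < 3:
            raise ValueError("RAID-5 needs at least three members")
        return 1                # single parity: one lost member
    raise ValueError("only RAID-1 and RAID-5 are modelled here")


def usable_members(level, members):
    """Capacity left for data, measured in member devices."""
    tolerated_failures(level, members)   # reuse the validation above
    return 1 if level == 1 else members - 1


def survives(level, members, failed_targets):
    """True if the node still has its data after failed_targets targets go down."""
    return failed_targets <= tolerated_failures(level, members)


if __name__ == "__main__":
    # A node building RAID-5 across block devices from four targets.
    print(survives(5, 4, failed_targets=1))   # True
    print(survives(5, 4, failed_targets=2))   # False
    print(usable_members(5, 4))               # 3
```

RAID-1 across the same four targets would survive three lost targets, but leaves only one member's worth of capacity.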
You can also use striping via RAID or lvm to improve the disk
performance on the target prior to exposing the block device to
the initiator(s). However, this will likely move the bottleneck to
the network. You could also stripe on the initiator side by using
the block devices from various targets in md to create the final
block device for the file system.
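A back-of-the-envelope sketch shows why striping on the target tends to move the bottleneck to the network. It assumes striping scales ideally with the number of disks and treats a GigE link as 125 MB/s (1 Gbit/s divided by 8, ignoring protocol overhead). The 60 MB/s per-disk figure is a made-up example value, not a measurement.

```python
GIGE_MB_PER_S = 125.0  # 1 Gbit/s / 8 bits per byte; theoretical ceiling only


def striped_throughput(n_disks, disk_mb_per_s, link_mb_per_s=GIGE_MB_PER_S):
    """Return (deliverable MB/s, limiting component) for a striped target.

    Assumes ideal striping: aggregate disk rate = n_disks * disk_mb_per_s.
    """
    disk_side = n_disks * disk_mb_per_s
    if disk_side > link_mb_per_s:
        return link_mb_per_s, "network"
    return disk_side, "disks"


if __name__ == "__main__":
    # Hypothetical disks at 60 MB/s each.
    for n in (1, 2, 4):
        rate, limit = striped_throughput(n, 60.0)
        print(f"{n} disk(s): {rate:.1f} MB/s, limited by {limit}")
```

Under these assumptions, a stripe of three or more such disks can already outrun a single GigE link.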
Since Gigabit Ethernet (GigE) is relatively inexpensive today, it's
possible to have the target machines expose block devices on various
networks. This approach reduces the number of block devices
communicating over a given network, thus improving throughput.
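The effect of adding networks can be sketched with simple arithmetic. The example below spreads exported block devices over the available networks round-robin and computes each device's fair share of a link when all devices are busy at once. The device and network names are hypothetical, and 125 MB/s is the theoretical GigE ceiling, not a measured figure.

```python
GIGE_MB_PER_S = 125.0  # theoretical 1 Gbit/s link, ignoring protocol overhead


def assign_round_robin(devices, networks):
    """Spread exported block devices evenly over the available networks."""
    if not networks:
        raise ValueError("need at least one network")
    assignment = {net: [] for net in networks}
    for i, dev in enumerate(devices):
        assignment[networks[i % len(networks)]].append(dev)
    return assignment


def per_device_share(assignment, link_mb_per_s=GIGE_MB_PER_S):
    """Fair share of each link per device, assuming all devices are busy at once."""
    return {
        net: (link_mb_per_s / len(devs) if devs else None)
        for net, devs in assignment.items()
    }


if __name__ == "__main__":
    devices = [f"lv_node{i:02d}" for i in range(1, 9)]   # eight exported volumes
    for nets in (["eth0"], ["eth0", "eth1"]):
        shares = per_device_share(assign_round_robin(devices, nets))
        print(nets, shares)
```

With eight devices, going from one network to two raises each device's ceiling from 15.625 MB/s to 31.25 MB/s in this simplified model.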
There are many possible ways to configure an iSCSI storage solution.
Using md and lvm or evms, you can create block
devices on the targets and expose those to the initiators. Then you can
use exposed devices from various targets on a single initiator to get good
performance and improve resiliency.
HyperSCSI
HyperSCSI can also be used to provide local storage on the nodes.
HyperSCSI is a network storage protocol like iSCSI, but rather than
use IP as iSCSI does, it uses its own packets over raw Ethernet. By
doing so, it can be more efficient because of the reduction in the
TCP/IP overhead. However, because it doesn't use IP packets it's not
a routable protocol. For small to medium clusters this is not
likely to be an issue.
Configurations for HyperSCSI are conceptually very similar to
iSCSI configurations. It uses block devices as does iSCSI and it
uses Ethernet networks. As I said before, the big advantage of
HyperSCSI is that it doesn't use IP, but its own packets. This
feature can make for an extremely efficient network storage protocol
and is very well suited for clusters since they typically don't
use routed networks inside the cluster.
Commercial Offerings
There are several commercial options for providing storage, both
locally and for global file systems. For example, one could use
Lustre, IBRIX, GPFS, or Terrascale with various storage devices, or use
the Panasas ActiveScale Storage Cluster. One could also use Coraid's
ATA-over-Ethernet product to provide local storage for each node
in a fashion similar to iSCSI or HyperSCSI.
For smaller clusters, these solutions are likely to be too expensive.
For larger clusters, perhaps from 32 nodes and up, they might prove
to be a price/performance winner. However, there are some
applications that are very I/O intensive and could benefit from a
high performance file system regardless of the size of the cluster.
Summary
As you can see there are a number of options for providing either global
storage to diskless nodes or local storage for diskless nodes. Depending
upon your code(s), you can choose to use either global storage or local
storage or a combination of the two.
For small to medium clusters, which I define as up to 64 or 128 nodes, NFS
will work well enough if you have a good storage subsystem behind it
and your I/O usage isn't too large (high I/O rates can easily kill performance
over NFS). In addition, AFS offers some very attractive features compared
to NFS, so you should seriously consider it. If you need lots of I/O, then
PVFS or PVFS2 will work well, provided
you understand that it is a high-speed scratch file system and not a place
for storing your files on a longer-term basis, as a home file
system requires.
If you need storage local to each node for running your codes then
either iSCSI or HyperSCSI will work well. Plus they are very flexible
and can be configured in just about any way you want or need. In some
cases you might also have to use global storage such as NFS to help.
In my next installment I'll discuss commercial options in more
depth as I continue discussing file system options for diskless clusters
larger than 128 nodes.
The core of this article was originally published in ClusterWorld Magazine.
It has been updated and formatted for the web. If you want to read more about
HPC clusters and Linux you may wish to visit
Linux Magazine.
Dr. Jeff Layton hopes to someday have a 20 TB file system in his home
computer (donations gladly accepted). He can sometimes be found lounging
at a nearby Fry's, dreaming of
hardware and drinking coffee (but never during working hours).