iSCSI
With the advent of high-speed networks and faster processors, the ability
to centralize storage and allocate it to various machines on
the network has taken off. SAN systems use this approach but use
expensive and proprietary Fibre Channel (FC) networks and in some
cases proprietary storage media. An open initiative to replace the FC
network with common IP based networks and common storage media was
begun. This initiative, called iSCSI (Internet SCSI), was developed by
the Internet Engineering Task Force (IETF).
iSCSI encapsulates SCSI commands in TCP (Transmission Control Protocol)
packets and sends them to the target computer over IP (Internet
Protocol) on an Ethernet network. The target then unpacks the TCP/IP
packet and processes the SCSI commands. Since SCSI is bi-directional,
any results or data in response to the original request are passed
back to the originating system. Thus a system can access storage over
the network using standard SCSI commands. In fact, the client
computer (called an initiator) does not even need a hard drive in it
at all and can access storage space on the target computer using
iSCSI. Using iSCSI, the storage space appears as though it's
physically attached (via a block device) and a file system can be
built on it.
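
The encapsulation idea can be sketched in a few lines of Python. This
is only an illustration: the header layout below (a magic string, a task
tag, and a length) is a hypothetical toy format, not the real iSCSI
packet layout, and the names encapsulate and decapsulate are made up for
this example. The SCSI command itself is a standard 10-byte READ(10)
command block.

```python
import struct
from typing import Tuple

# Hypothetical toy header: 4-byte magic, 4-byte task tag, 2-byte payload length.
# This is NOT the real iSCSI packet layout; it only illustrates the wrapping step.
HEADER = struct.Struct("!4sIH")
MAGIC = b"TOY1"


def encapsulate(task_tag: int, cdb: bytes) -> bytes:
    """Prefix a SCSI command descriptor block (CDB) with the toy header."""
    return HEADER.pack(MAGIC, task_tag, len(cdb)) + cdb


def decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Strip the toy header and return (task_tag, cdb)."""
    magic, task_tag, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not a toy packet")
    cdb = packet[HEADER.size:HEADER.size + length]
    if len(cdb) != length:
        raise ValueError("truncated packet")
    return task_tag, cdb


# A 10-byte READ(10) CDB: opcode 0x28, starting block 2048, 8 blocks to read.
read10 = struct.pack("!BBIBHB", 0x28, 0, 2048, 0, 8, 0)

packet = encapsulate(7, read10)      # what the initiator would hand to TCP
tag, cdb = decapsulate(packet)       # what the target recovers on arrival
```

On a real system the initiator would write the packet to a TCP
connection and the target would pass the recovered command block to its
SCSI layer; both steps are left out here.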
The overall basic process for iSCSI is fairly simple. Assume that a user
or an application on the initiator makes a request of the iSCSI
storage space. The operating system creates the corresponding SCSI
commands, encapsulates them, perhaps encrypting them, and puts a header
on the packet. It then sends them over the IP network to the target.
The target decrypts the packet (if encrypted) and then separates out
the SCSI commands. The SCSI commands are then sent to the SCSI
controller and any results of the command are returned to the
original requester. Since IP networks can be lossy, with packets
being dropped, resent, or delivered out of order, the
iSCSI protocol has had to develop techniques to accommodate these and
similar situations.
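
One common way to cope with duplicated or out-of-order arrivals is to
number each command and hold back early arrivals until the gap is
filled. The Python sketch below shows that general technique only. It is
not the algorithm the iSCSI specification defines, and the class name
ReorderBuffer and the command strings are invented for the example.

```python
class ReorderBuffer:
    """Deliver commands in sequence-number order, holding back early arrivals."""

    def __init__(self, first_seq: int = 0):
        self.expected = first_seq   # next sequence number we may deliver
        self.pending = {}           # early arrivals, keyed by sequence number

    def receive(self, seq: int, command: bytes) -> list:
        """Accept one arrival and return the commands that are now deliverable."""
        if seq < self.expected or seq in self.pending:
            return []               # duplicate caused by a resend: ignore it
        self.pending[seq] = command
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready


buf = ReorderBuffer()
delivered = []
# Command 1 arrives before command 0, and is later resent.
for seq, cmd in [(1, b"WRITE"), (0, b"READ"), (1, b"WRITE"), (2, b"SYNC")]:
    delivered.extend(buf.receive(seq, cmd))
```

After the loop, delivered holds the three commands in their original
order, with the resent copy of command 1 discarded.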
There are several desirable aspects to iSCSI. First, no new hardware is
required either by the initiator (client) or the target (server). The
same physical disks, network cards, and network can be used for an
iSCSI network. Consequently the startup costs are much lower than for
a SAN. Second, iSCSI can be used over wide area networks (WANs) that
span multiple routers, whereas SANs are limited in distance by
their configuration. Also, theoretically, since iSCSI is a standard
protocol, you can mix and match initiators and targets across various
operating systems.
There are several Linux iSCSI projects. The most prominent is an
iSCSI initiator
that was developed by Cisco and open-sourced. There are
patches for 2.4 and 2.6 kernels. Many Linux distributions ship with
the initiator already in the kernel. An iSCSI
target package
is also available, but only for 2.4 kernels (this package is sometimes
called the Ardistech target package). It allows Linux machines
to be used as targets for iSCSI initiators. There is also a
project
originally developed by Intel and open-sourced.
A fork of the Ardistech iSCSI target package was made with an
eye towards porting it to the Linux 2.6 kernel and adding features
to it (the original Ardistech iSCSI target package has not been
developed for some time). Then this project was combined with
the iSCSI initiator
project to develop a combined initiator and target package for
Linux. This
package is under very
active development and fully supports the Linux 2.6 kernel series.
There is a very good
HOWTO on
how to use the Cisco initiator and the Ardistech target package in Linux.
There is also an article on how to use
iSCSI as the root disk
for nodes in a cluster. This could be used to boot diskless compute
nodes and provide them with an operating system located on the network.
There are several ways to use iSCSI with a cluster. A simple way would be
to use a few disk-full nodes within a cluster as targets for the rest
of the compute nodes in the cluster that are the initiators. The
compute nodes can even be made diskless. Parts of the disk subsystem
on each target node would be allocated to a compute node. A separate
storage network can be utilized to increase throughput of iSCSI. The
compute node can then format and mount the disk as though it were a
local disk. This architecture allows the storage to be concentrated
in a few nodes for easier management. Moreover, LVM can be used to
provide space in an intelligent manner for the compute nodes so that
space can be expanded.
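
As a rough illustration of the bookkeeping involved, the Python sketch
below assigns each compute node a slice of storage on whichever target
node currently has the most free space. The node names, the sizes, and
the allocate function are all hypothetical. Actually creating and
exporting the volumes (for example with LVM and an iSCSI target
package) is outside the scope of the sketch.

```python
def allocate(targets: dict, requests: dict) -> dict:
    """Assign each compute node a slice on the target with the most free space.

    targets  maps target node name -> free capacity in GB
    requests maps compute node name -> requested size in GB
    Returns a dict of compute node name -> (target node name, size in GB).
    """
    free = dict(targets)            # work on a copy so the input is untouched
    plan = {}
    # Place the largest requests first so they are not squeezed out later.
    for node, size in sorted(requests.items(), key=lambda kv: (-kv[1], kv[0])):
        # Ties between equally free targets go to the alphabetically first name.
        target = max(sorted(free), key=lambda name: free[name])
        if free[target] < size:
            raise ValueError(f"no target has {size} GB free for {node}")
        free[target] -= size
        plan[node] = (target, size)
    return plan


# Hypothetical cluster: two disk-full target nodes, four diskless compute nodes.
targets = {"store01": 500, "store02": 300}
requests = {"node01": 200, "node02": 200, "node03": 100, "node04": 100}
plan = allocate(targets, requests)
```

A greedy rule like this keeps the targets roughly balanced. A real
deployment would probably also weigh network load and leave headroom so
that volumes can be grown later.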
HyperSCSI
HyperSCSI is a
protocol related to iSCSI. It uses a different packet
encapsulation than the TCP encapsulation of iSCSI and sends its
packets over raw Ethernet. HyperSCSI
is being developed by researchers at the Data Storage Institute, which
is affiliated with the National University of Singapore, and has
been placed on SourceForge. The
researchers have developed HyperSCSI under the GNU GPL (GNU General
Public License) on Linux platforms. The developers say they have
focused on developing a fast, efficient, and secure protocol that can
be easily used on common, inexpensive Ethernet networks.
Similar to iSCSI, HyperSCSI wraps the SCSI commands in a packet and transmits it to
the target system over the network. However, in contrast to iSCSI,
HyperSCSI uses its own packet header rather than a TCP header. This
approach promises to be more efficient because the TCP overhead has
been eliminated. The target system then decodes and executes the SCSI
commands. Thus, any HyperSCSI equipped system, even one without a
disk or a SCSI controller, can access a HyperSCSI exported device as
though it were a local device. You can even run LVM (Logical Volume
Manager) and RAID (Redundant Array of Inexpensive Disks) tools on
these mounted devices.
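
To make the contrast with TCP encapsulation concrete, the Python sketch
below builds a raw Ethernet frame that carries a SCSI command directly
behind the 14-byte Ethernet header, with no IP or TCP headers in
between. It is an illustration only. The 6-byte command header is a
made-up toy layout, not the real HyperSCSI header, the EtherType is an
IEEE local experimental value used as a placeholder, and the MAC
addresses are arbitrary locally administered ones.

```python
import struct

ETH_HEADER = struct.Struct("!6s6sH")   # destination MAC, source MAC, EtherType
TOY_HEADER = struct.Struct("!IH")      # hypothetical: task tag, payload length
EXPERIMENTAL_ETHERTYPE = 0x88B5        # IEEE local experimental value (placeholder)


def build_frame(dst_mac: bytes, src_mac: bytes, task_tag: int, cdb: bytes) -> bytes:
    """Place the toy header and the SCSI command directly in an Ethernet frame."""
    eth = ETH_HEADER.pack(dst_mac, src_mac, EXPERIMENTAL_ETHERTYPE)
    return eth + TOY_HEADER.pack(task_tag, len(cdb)) + cdb


def parse_frame(frame: bytes):
    """Return (task_tag, cdb) from a frame produced by build_frame."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame)
    if ethertype != EXPERIMENTAL_ETHERTYPE:
        raise ValueError("unexpected EtherType")
    task_tag, length = TOY_HEADER.unpack_from(frame, ETH_HEADER.size)
    start = ETH_HEADER.size + TOY_HEADER.size
    return task_tag, frame[start:start + length]


initiator = bytes.fromhex("020000000001")   # arbitrary locally administered MACs
target = bytes.fromhex("020000000002")
test_unit_ready = bytes(6)                  # 6-byte SCSI TEST UNIT READY command

frame = build_frame(target, initiator, 1, test_unit_ready)
tag, cdb = parse_frame(frame)
```

Putting such a frame on the wire would need a raw socket and
administrator privileges, so the sketch stops at building and parsing
the bytes. Note also that without TCP, the protocol itself has to
handle lost or reordered frames.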
The performance of HyperSCSI is also quite good. Between two systems over
Gigabit Ethernet, the developers have achieved over 99% of the
performance of a local disk on several benchmarks. The developers of HyperSCSI
also claim that they can get better performance than iSCSI. For
instance, they claim that they can match Fibre Channel performance
with only a 21% increase in CPU utilization and 3.4 times more IRQs
(interrupt requests) per second than Fibre Channel. To match the same
Fibre Channel performance the HyperSCSI developers say that software
based iSCSI requires a 33% increase in CPU utilization and 6 times
more IRQs per second than Fibre Channel.
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.
Dr. Jeff Layton hopes to someday have a 20 TB file system in his home
computer. He lives in the Atlanta area
and can sometimes be found lounging at the nearby Fry's, dreaming of
hardware and drinking coffee (but never during working hours).