Survey of File Systems for Clusters (And Storage Options)


With the advent of high-speed networks and faster processors, the ability to centralize storage and allocate it to machines across the network has taken off. SAN systems take this approach, but they use expensive, proprietary Fibre Channel (FC) networks and, in some cases, proprietary storage media. To replace the FC network with common IP-based networks and commodity storage media, an open initiative called iSCSI (Internet SCSI) was developed by the Internet Engineering Task Force (IETF).


iSCSI encapsulates SCSI commands in TCP (Transmission Control Protocol) packets and sends them to the target computer over IP (Internet Protocol) on an Ethernet network. The target system unpacks the TCP/IP packets and executes the SCSI commands. Since SCSI is bi-directional, any results or data produced in response to the original request are passed back to the originating system. Thus a system can access storage over the network using standard SCSI commands. In fact, the client computer (called an initiator) does not even need a hard drive at all and can access storage space on the target computer using iSCSI. Using iSCSI, the storage space appears as though it were physically attached (via a block device), and a file system can be built on it.
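The encapsulation idea can be sketched in a few lines of Python. The layout below loosely follows the 48-byte basic header segment of an iSCSI SCSI Command PDU (RFC 3720), but it is a simplified illustration, not a wire-compatible implementation; the flag value and zeroed sequence fields are placeholders:

```python
import struct

def scsi_read10_cdb(lba, blocks):
    """Build a 10-byte SCSI READ(10) CDB (opcode 0x28, big-endian LBA)."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def encapsulate(cdb, task_tag, lun=0):
    """Wrap a SCSI CDB in a simplified 48-byte iSCSI-style header.
    This would then be carried as the payload of a TCP segment."""
    opcode = 0x01                    # SCSI Command PDU
    flags = 0x80                     # placeholder: "final PDU" flag
    header = struct.pack(">BB2xB3x", opcode, flags, 0)  # no extra headers/data
    header += struct.pack(">QI", lun, task_tag)
    header += b"\x00" * 12           # transfer length and sequence numbers, zeroed
    header += cdb.ljust(16, b"\x00") # CDB padded to 16 bytes
    return header

pdu = encapsulate(scsi_read10_cdb(lba=2048, blocks=8), task_tag=1)
```

The initiator would write `pdu` to a TCP socket connected to the target; the target strips the header, hands the CDB to its SCSI layer, and returns the result the same way.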

The basic iSCSI process is fairly simple. Assume that a user or an application on the initiator makes a request of the iSCSI storage space. The operating system creates the corresponding SCSI commands, encapsulates them (perhaps encrypting them), and puts a header on the packet. It then sends the packet over the IP network to the target. The target decrypts the packet (if encrypted) and separates out the SCSI commands. The SCSI commands are sent to the SCSI controller, and any results are returned to the initiator. Since IP networks are lossy (packets can be dropped, need to be resent, or arrive out of order), the iSCSI protocol includes techniques to handle these and similar situations.
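The request/response round trip can be modeled as a toy client/server exchange over a TCP loopback socket. This is not the iSCSI protocol itself, just a sketch of the flow: the "initiator" sends a length-prefixed command, the "target" acknowledges it with a status byte (0x00, i.e. GOOD, in SCSI terms):

```python
import socket
import struct
import threading

def toy_target(server_sock):
    """Toy 'target': receive one length-prefixed command, send a status back."""
    conn, _ = server_sock.accept()
    with conn:
        n = struct.unpack(">I", conn.recv(4))[0]
        cdb = conn.recv(n)
        # A real target hands the CDB to its SCSI layer and returns data;
        # here we just echo the opcode with a GOOD (0x00) status byte.
        conn.sendall(struct.pack(">BB", cdb[0], 0x00))

def toy_initiator(port, cdb):
    """Toy 'initiator': send a command over TCP, return the response."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(struct.pack(">I", len(cdb)) + cdb)
        return s.recv(2)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
t = threading.Thread(target=toy_target, args=(srv,))
t.start()
resp = toy_initiator(srv.getsockname()[1], bytes([0x28]) + b"\x00" * 9)  # READ(10)
t.join()
srv.close()
```

Because the transport here is TCP, retransmission and in-order delivery come for free; the extra machinery in real iSCSI (sequence numbers, task tags, recovery levels) exists to detect and recover from losses at the iSCSI layer as well.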

There are several desirable aspects to iSCSI. First, no new hardware is required by either the initiator (client) or the target (server). The same physical disks, network cards, and network can be used for an iSCSI network, so the startup costs are much lower than those of a SAN. Second, iSCSI can be used over wide area networks (WANs) that span multiple routers, while the reach of a SAN is limited by its configuration. Also, since iSCSI is a standard protocol, you can in theory mix and match initiators and targets across various operating systems.

There are several Linux iSCSI projects. The most prominent is an iSCSI initiator that was developed by Cisco and open-sourced. Patches exist for both 2.4 and 2.6 kernels, and many Linux distributions ship with the initiator already in the kernel. An iSCSI target package is also available, but only for 2.4 kernels (this package is sometimes called the Ardistech target package). It allows Linux machines to be used as targets for iSCSI initiators. There is also a project originally developed by Intel and open-sourced.

A fork of the Ardistech iSCSI target package was made with an eye towards porting it to the Linux 2.6 kernel and adding features (the original Ardistech iSCSI target package has not been developed for some time). This fork was then combined with the iSCSI initiator project to produce a combined initiator and target package for Linux. This package is under very active development and fully supports the Linux 2.6 kernel series.

There is a very good HOWTO on how to use the Cisco initiator and the Ardistech target package in Linux. There is also an article on how to use iSCSI as the root disk for nodes in a cluster. This could be used to boot diskless compute nodes and provide them with an operating system located on the network.

There are several ways to use iSCSI with a cluster. A simple approach is to use a few disk-full nodes within the cluster as targets for the rest of the compute nodes, which act as initiators. The compute nodes can even be made diskless. Part of the disk subsystem on each target node would be allocated to each compute node, and a separate storage network can be used to increase iSCSI throughput. Each compute node can then format and mount its disk as though it were local. This architecture concentrates the storage in a few nodes for easier management. Moreover, LVM can be used to provision space for the compute nodes intelligently so that space can be expanded later.
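The allocation described above amounts to spreading compute nodes across a few target nodes and carving out a logical volume per node. A minimal sketch of that mapping, with entirely hypothetical node names (this is planning logic, not an iSCSI tool):

```python
def assign_targets(compute_nodes, target_nodes):
    """Round-robin each diskless compute node onto one of a few
    disk-full target nodes, assigning a per-node LUN name."""
    plan = {}
    for i, node in enumerate(compute_nodes):
        target = target_nodes[i % len(target_nodes)]      # spread load evenly
        plan[node] = (target, f"lun{i // len(target_nodes)}")
    return plan

# Hypothetical cluster: six diskless compute nodes, two storage nodes.
plan = assign_targets([f"compute{n}" for n in range(6)],
                      ["storage0", "storage1"])
```

Each target node would export its LUNs (e.g. as LVM logical volumes) to the assigned initiators, which keeps growth simple: extend the logical volume on the target and grow the file system from the compute node.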


HyperSCSI is a protocol related to iSCSI. It uses a different packet encapsulation than the TCP encapsulation of iSCSI and sends its packets over raw Ethernet. HyperSCSI is being developed by researchers at the Data Storage Institute, which is affiliated with the National University of Singapore, and has been placed on SourceForge. The researchers have developed HyperSCSI under the GNU General Public License (GPL) on Linux platforms. The developers say they have focused on developing a fast, efficient, and secure protocol that can be easily used on common, inexpensive Ethernet networks.


Similar to iSCSI, HyperSCSI wraps SCSI commands and transmits the packets to the target system over the network. However, in contrast to iSCSI, HyperSCSI uses its own packet header rather than a TCP header. This approach promises to be more efficient because the TCP/IP overhead is eliminated. The target system then decodes and executes the SCSI commands. Thus, any HyperSCSI-equipped system, even one without a disk or a SCSI controller, can access a HyperSCSI-exported device as though it were a local device. You can even run LVM (Logical Volume Manager) and RAID (Redundant Array of Inexpensive Disks) tools on these mounted devices.
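The structural difference from iSCSI is visible if you sketch the framing: instead of a SCSI payload inside TCP inside IP inside Ethernet, the SCSI payload sits directly behind an Ethernet header with a protocol-specific EtherType. The sketch below builds such a raw Ethernet II frame; the EtherType value 0x889A is an assumption here, and actually transmitting the frame would require a raw socket and root privileges:

```python
import struct

def hyper_frame(dst_mac, src_mac, scsi_payload):
    """Build a raw Ethernet II frame carrying a SCSI payload directly,
    skipping TCP/IP entirely. dst_mac/src_mac are 6-byte addresses;
    0x889A is used as a placeholder EtherType."""
    header = dst_mac + src_mac + struct.pack(">H", 0x889A)   # 14-byte header
    return header + scsi_payload.ljust(46, b"\x00")          # pad to 60-byte minimum

frame = hyper_frame(b"\xff" * 6,                  # broadcast destination (illustrative)
                    b"\x02\x00\x00\x00\x00\x01",  # made-up source MAC
                    b"\x28" + b"\x00" * 9)        # READ(10) CDB as payload
```

Eliminating the TCP/IP layers removes per-packet protocol overhead, but it also means HyperSCSI cannot be routed across subnets the way iSCSI can; it is confined to the local Ethernet segment.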

The performance of HyperSCSI is also quite good. Between two systems over Gigabit Ethernet, the developers have achieved over 99% of local-disk performance on several benchmarks. The HyperSCSI developers also claim better performance than iSCSI. For instance, they claim they can match Fibre Channel performance with only a 21% increase in CPU utilization and 3.4 times more IRQs (interrupt requests) per second than Fibre Channel. To match the same Fibre Channel performance, they say, software-based iSCSI requires a 33% increase in CPU utilization and 6 times more IRQs per second than Fibre Channel.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Dr. Jeff Layton hopes to someday have a 20 TB file system in his home computer. He lives in the Atlanta area and can sometimes be found lounging at the nearby Fry's, dreaming of hardware and drinking coffee (but never during working hours).


