Scyld ClusterWare HPC: User's Guide
Scyld ClusterWare presents a more uniform system view of the entire cluster to both users and applications through extensions to the kernel. A guiding principle of these extensions is to add as little as possible to kernel size and complexity and, more importantly, to have negligible impact on individual processor performance.
In addition to its enhanced Linux kernel, Scyld ClusterWare includes libraries and utilities tuned specifically for high-performance computing applications. For information on the Scyld libraries, see the Reference Guide. Information on using the Scyld utilities to run and monitor jobs is provided in the chapters called Interacting With the System and Running Programs. If you need to use the Scyld utilities to configure and administer your cluster, see the Administrator's Guide.
The following list summarizes the top-level features of Scyld ClusterWare.
Security and Authentication. With Scyld ClusterWare, the master node is a single point of security administration and authentication. The authentication envelope is drawn around the entire cluster and its private network. This obviates the need to manage copies or caches of credentials on compute nodes or to add the overhead of networked authentication. Scyld ClusterWare provides simple permissions on compute nodes, similar to Unix file permissions, allowing their use to be administered without additional overhead.
Easy Installation. Scyld ClusterWare is designed to augment a full Linux distribution, such as Red Hat Enterprise Linux (RHEL) or CentOS. The installer used to initiate the installation on the master node is provided on an auto-run CD-ROM. You can install from scratch and have a running Linux HPC cluster in less than an hour. See the Installation Guide for full details.
Install Once, Execute Everywhere. A full installation of Scyld ClusterWare is required only on the master node. Compute nodes are provisioned from the master node during their boot process, and they dynamically cache any additional parts of the system during process migration or at first reference.
Single System Image. Scyld ClusterWare makes a cluster appear as a multi-processor parallel computer. The master node maintains (and presents to the user) a single process space for the entire cluster, known as the BProc Distributed Process Space. BProc is described briefly later in this chapter, and more details are provided in the Administrator's Guide.
Execution Time Process Migration. Scyld ClusterWare stores applications on the master node. At execution time, BProc migrates processes from the master to the compute nodes. This approach virtually eliminates both the risk of version skew and the need for hard disks on the compute nodes. More information is provided in the section on process space migration later in this chapter. Also refer to the BProc discussion in the Administrator's Guide.
Seamless Cluster Scalability. Scyld ClusterWare seamlessly supports the dynamic addition and deletion of compute nodes without modification to existing source code or configuration files. See the chapter on the BeoSetup utility in the Administrator's Guide.
Administration Tools. Scyld ClusterWare includes simplified tools for performing cluster administration and maintenance. Both graphical user interface (GUI) and command line interface (CLI) tools are supplied. See the Administrator's Guide for more information.
Web-Based Administration Tools. Scyld ClusterWare includes web-based tools for remote administration, job execution, and monitoring of the cluster. See the Administrator's Guide for more information.
Additional Features. Additional features of Scyld ClusterWare include support for cluster power management (IPMI and Wake-on-LAN, easily extensible to other out-of-band management protocols); runtime and development support for MPI and PVM; and support for the LFS and NFS3 file systems.
Fully Supported. Scyld ClusterWare is fully supported by Penguin Computing, Inc.
Scyld ClusterWare is able to provide a single system image through its use of the BProc Distributed Process Space, the Beowulf process space management kernel enhancement. BProc enables the processes running on compute nodes to be visible and managed on the master node. All processes appear in the master node's process table, from which they are migrated to the appropriate compute node by BProc. Both process parent-child relationships and Unix job-control information are maintained with the migrated jobs. The stdout and stderr streams are redirected to the user's ssh or terminal session on the master node across the network.
The BProc mechanism is one of the primary features that makes Scyld ClusterWare different from traditional Beowulf clusters. For more information, see the system design description in the Administrator's Guide.
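Because migrated processes remain in the master node's process table, a session on the master can launch and observe remote work with ordinary tools. The following is an illustrative sketch only, assuming a running cluster with node 0 in the "up" state; bpsh is one of the BProc utilities listed later in this chapter:

```shell
# Run a command on compute node 0; its stdout returns to this terminal.
bpsh 0 uname -r

# Start a longer-running job on node 0 in the background...
bpsh 0 sleep 60 &

# ...then list processes on the master node. The migrated job appears
# in the master's process table and can be signaled like a local process.
ps -ef | grep sleep
```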
Scyld ClusterWare uses lightweight provisioning of compute nodes from the master node's kernel and Linux distribution. For Scyld Series 30 and Scyld ClusterWare HPC, PXE is the supported method for booting nodes into the cluster; the 2-phase boot sequence of earlier Scyld distributions is no longer used.
The master node is the DHCP server serving the cluster private network. PXE booting across the private network ensures that the compute node boot package is version-synchronized for all nodes within the cluster. This boot package consists of the kernel, initrd, and rootfs. If desired, the boot package can be customized per node in the Beowulf configuration file /etc/beowulf/config, which also includes the kernel command line parameters for the boot package.
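As an illustration, a boot package customization in /etc/beowulf/config might look like the fragment below. The directive names shown are a hedged sketch, not authoritative syntax; consult the comments in your installed configuration file and the Administrator's Guide for the exact keywords:

```
# /etc/beowulf/config (fragment; directive names are illustrative)
interface eth0                      # private cluster network interface
kernelimage /boot/vmlinuz           # kernel included in the boot package
kernelcommandline console=ttyS0 rw  # kernel arguments for compute nodes
```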
For a detailed description of the compute node boot procedure, see the system design description in the Administrator's Guide. Also refer to the chapter on compute node boot options in that document.
The master node classifies the compute nodes it sees over the private network into one of three categories, as follows:
Unknown — A node not formally recognized by the cluster as being either a "configured" or "ignored" node. When bringing a new compute node online, or after replacing an existing node's network interface card, the node will be classified as "unknown".
Ignored — Nodes that, for one reason or another, you want the master node to ignore. These are not considered part of the cluster, and they will not receive a response from the master node during their boot process.
Configured — Those nodes listed in the cluster configuration file using the "node" tag. These are formally part of the cluster, recognized as such by the master node, and used as computational resources by the cluster.
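For example, configured nodes appear in the cluster configuration file /etc/beowulf/config as "node" entries keyed by the MAC address of each node's network interface. This is also why replacing a network interface card returns a node to the "unknown" category until its entry is updated. The fragment below is an illustrative sketch; the MAC addresses are placeholders:

```
# /etc/beowulf/config (fragment; MAC addresses are placeholders)
node 00:50:56:00:00:01    # compute node 0
node 00:50:56:00:00:02    # compute node 1
```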
For more information on compute node categories, see the system design description in the Administrator's Guide.
BProc maintains the current condition or "node state" of each configured compute node in the cluster. The compute node states are defined as follows:
down — Node is not communicating with the master and its previous state was either "down", "up", "error", "unavailable", or "boot".
unavailable — Node has been marked "unavailable" or "off-line" by the cluster administrator; typically used when performing maintenance activities.
error — Node encountered an error during its initialization; this state may also be set manually by the cluster administrator.
up — Node completed its initialization without error; node is online and operating normally. This is the only state in which end users may use the node.
reboot — Node has been commanded to reboot itself; node will remain in this state until it reaches the "boot" state, as described below.
halt — Node has been commanded to halt itself; node will remain in this state until it is reset (or powered back on) and reaches the "boot" state, as described below.
pwroff — Node has been commanded to power itself off; node will remain in this state until it is powered back on and reaches the "boot" state, as described below.
boot — Node has booted but is still initializing; after the node finishes initializing, its next state will be either "up" or "error".
For more information on compute node states, see the system design description in the Administrator's Guide.
The following is a list of the major software components included with Scyld ClusterWare HPC. For more information, see the relevant sections of the Scyld ClusterWare HPC documentation set, including the Installation Guide, Administrator's Guide, User's Guide, Reference Guide, and Programmer's Guide.
BProc — The process migration technology; an integral part of Scyld ClusterWare.
BeoSetup — A GUI for configuring the cluster.
BeoStatus — A GUI for monitoring cluster status.
beostat — A text-based tool for monitoring cluster status.
beoboot — A set of utilities for booting the compute nodes.
beofdisk — A utility for remote partitioning of hard disks on the compute nodes.
beoserv — The cluster's DHCP, PXE and dynamic provisioning server; it responds to compute nodes and serves the boot image.
BPmaster — The BProc master daemon; it runs on the master node.
BPslave — The BProc compute daemon; it runs on each of the compute nodes.
bpstat — A BProc utility that reports status information for all nodes in the cluster.
bpctl — A BProc command line interface for controlling the nodes.
bpsh — A BProc utility intended as a replacement for rsh (remote shell).
bpcp — A BProc utility for copying files between nodes, similar to rcp (remote copy).
MPI — The Message Passing Interface, optimized for use with Scyld ClusterWare.
PVM — The Parallel Virtual Machine, optimized for use with Scyld ClusterWare.
mpprun — A parallel job-creation package for Scyld ClusterWare.
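Since bpsh and bpcp are intended as replacements for rsh and rcp, everyday usage follows the same pattern. A short illustrative session follows; the node number, file names, and the node:path convention are assumptions for the sketch:

```shell
# Copy an input file to compute node 1, much as rcp would.
bpcp input.dat 1:/tmp/input.dat

# Run a command on node 1, much as rsh would; stdout and stderr
# are redirected back to this session on the master node.
bpsh 1 ls -l /tmp
```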