Running Programs That Aren't Parallelized

Starting and Migrating Programs to Compute Nodes (bpsh)

There are no executable programs (binaries) on the file system of the compute nodes. This means there is no getty, no login, and no shell on the compute nodes.

Instead of the remote shell (rsh) and secure shell (ssh) commands that are available on networked stand-alone computers (each of which has its own collection of binaries), Scyld ClusterWare has the bpsh command. The following example shows the standard ls command running on node 2 using bpsh:

[user@cluster user] $ bpsh 2 ls -FC /
  bin/   dev/  home/  lib64/  proc/  sys/  usr/
  bpfs/  etc/  lib/   opt/    sbin/  tmp/  var/

By default, at startup Scyld ClusterWare exports the /bin and /usr/bin directories on the master node, and the compute nodes NFS-mount those directories.
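You can verify that a compute node sees the master node's /bin through this NFS mount with a quick comparison such as the one below. This is a minimal sketch: it assumes node 2 is up, and the listing shown is illustrative.

[user@cluster user] $ ls /bin | head -3
  arch
  awk
  basename
[user@cluster user] $ bpsh 2 ls /bin | head -3
  arch
  awk
  basename

Note that the pipe to head runs on the master node; bpsh simply forwards the remote command's standard output back to your shell.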

However, an NFS-accessible /bin/ls is not a requirement for bpsh 2 ls to work. Note that the /sbin directory also exists on the compute node. It is not exported by the master node by default, so it exists locally on the compute node in the RAM-based filesystem, and bpsh 2 ls /sbin usually shows an empty directory. Nonetheless, bpsh 2 modprobe bproc executes successfully, even though which modprobe shows that the command resides in /sbin/modprobe on the master node, and bpsh 2 which modprobe fails to find the command because the compute node's /sbin does not contain modprobe.
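The behavior just described can be observed in a session like the following. The output is abbreviated and illustrative; the exact which message and the compute node's PATH may differ on your cluster.

[user@cluster user] $ which modprobe
  /sbin/modprobe
[user@cluster user] $ bpsh 2 ls /sbin
[user@cluster user] $ bpsh 2 which modprobe
  which: no modprobe in (/usr/bin:/bin)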

bpsh 2 modprobe bproc works because bpsh starts a modprobe process on the master node, then creates a process memory image that includes the command's binary and references to all of its dynamically linked libraries. This process memory image is then copied (migrated) to the compute node, where the references to dynamic libraries are remapped in the process address space. Only then does the modprobe command begin real execution.

bpsh is not a special version of sh, but rather a special way of handling execution; this process works with any program, as the sketch below illustrates.
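For example, a program that exists only on the master node, such as a freshly compiled binary, runs on a compute node without first being copied there. The following is a minimal sketch: it assumes node 2 is up, that gcc is installed on the master node, and the program name hello is arbitrary (the output shown is illustrative).

[user@cluster user] $ cat hello.c
  #include <stdio.h>
  int main(void)
  {
      printf("hello from a migrated process\n");
      return 0;
  }
[user@cluster user] $ gcc -o hello hello.c
[user@cluster user] $ bpsh 2 ./hello
  hello from a migrated process

The hello binary never needs to exist on node 2; bpsh migrates its process memory image from the master node exactly as described above.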

For additional information on the BProc Distributed Process Space and how processes are migrated to compute nodes, see the Administrator's Guide.

Copying Information to Compute Nodes (bpcp)

Just as traditional Unix has copy (cp), remote copy (rcp), and secure copy (scp) to move files to and from networked machines, Scyld ClusterWare has the bpcp command.

Although the default sharing of the master node's home directories via NFS is useful for sharing small files, it is not a good solution for large data files. Having the compute nodes read large data files served via NFS from the master node can cause major network congestion, or even overload and shut down the NFS server. In these cases, staging data files on the compute nodes with the bpcp command is an alternative. Other solutions include using dedicated NFS servers or NAS appliances, and using cluster file systems.

Following are some examples of using bpcp.

This example shows the use of bpcp to copy a data file named foo2.dat from the current directory to the /tmp directory on node 6:

[user@cluster user] $ bpcp foo2.dat 6:/tmp

The default directory on the compute node is the current directory on the master node. That directory may already be NFS-mounted from the master node, but it may not exist on the compute node at all. The example above works because /tmp exists on the compute node; the copy would fail if the destination directory did not exist. To avoid this problem, create the necessary destination directory on the compute node before copying the file, as shown in the next example.

In this example, we change to the /tmp/foo directory on the master node, use bpsh to create the same directory on node 6, then copy foo2.dat to the node:

[user@cluster user] $ cd /tmp/foo
[user@cluster user] $ bpsh 6 mkdir /tmp/foo
[user@cluster user] $ bpcp foo2.dat 6:

This example copies foo2.dat from node 2 to node 3 directly, without the data being stored on the master node. As in the first example, this works because /tmp exists on node 3:

[user@cluster user] $ bpcp 2:/tmp/foo2.dat 3:/tmp
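Putting these commands together, a typical staging workflow copies a large input file to local storage on a compute node and then operates on the local copy. The following is a minimal sketch; node 6 and the file name big_input.dat are placeholders for your own node and data file.

[user@cluster user] $ bpsh 6 mkdir -p /tmp/dataset
[user@cluster user] $ bpcp big_input.dat 6:/tmp/dataset
[user@cluster user] $ bpsh 6 wc -c /tmp/dataset/big_input.dat

Because the file now resides in the compute node's local /tmp, subsequent reads of it do not travel over NFS from the master node.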