Configuring RSH for Remote Job Execution

Many standard and shrink-wrapped applications are written to use the Linux remote shell (RSH) for launching jobs on compute nodes. In addition, some applications may not be able to use the MPICH implementation provided with Scyld ClusterWare. While RSH is not the optimal way to run jobs on a Scyld cluster, it is supported for just such applications.

In the Scyld implementation, jobs started using remote shell commands are run within the same unified process space as are jobs started using the preferred commands (bpsh, mpprun, and beorun). Scyld RSH support is installed by default, but it is not configured to run. This section describes how to configure RSH for remote job execution on a Scyld cluster.

Typically, configuring the cluster to use RSH requires that the directories on the master node containing shell commands and application binaries be mounted on or copied to the compute nodes.
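How the directories are shared is site-specific. The following is only a rough sketch, assuming the private network is 192.168.1.0/24, that compute nodes can reach the master by the name "master", and that your release reads compute node mounts from /etc/beowulf/fstab; substitute your own subnet, paths, and master hostname. An NFS export of /usr might be set up as follows:

[root@cluster ~] # echo "/usr 192.168.1.0/255.255.255.0(ro,no_root_squash)" >> /etc/exports
[root@cluster ~] # exportfs -ra
[root@cluster ~] # echo "master:/usr /usr nfs nolock 0 0" >> /etc/beowulf/fstab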

RSH from Master to Compute Node

There are two versions of the RSH client program, /usr/bin/rsh and a Kerberos version in /usr/kerberos/bin/rsh. You must use the non-Kerberos version, /usr/bin/rsh.
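A quick way to see which client a given shell will actually run is to ask the shell itself; if the Kerberos version is listed first in the PATH, invoke /usr/bin/rsh by its full path instead:

[root@cluster ~] # type -a rsh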

To enable RSH on the compute nodes, make the rcmdd cluster initialization script executable, as follows:

[root@cluster ~] # /sbin/beochkconfig 80rcmdd on

Any node booted after this step is completed will be running the rcmdd daemon, and will accept connections from the RSH client.
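As a quick sanity check once a node has come back up (node 0 here is just an example; bpsh runs the ps binary from the master, so ps need not exist on the node):

[root@cluster ~] # bpsh 0 ps -e | grep rcmdd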

Updating User Information

Current versions of Scyld ClusterWare do support cluster-wide dynamic changes to user information when RSH is used as the remote execution method, so updated password and group files do not need to be copied to the compute nodes by hand.

Copying Command-Line Utilities

Many standard command-line utilities used for system administration and other tasks do not exist on the compute nodes by default. For example, executing "rsh .1 ls" will not work, because /bin/ls is not present on the compute node.

Copying the binaries to the compute nodes solves this problem. The drawback is that the copies generally land on the RAM disk used as the root file system on the compute nodes, so they occupy system memory even when not in use.
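If your installation provides the bpcp utility, binaries can be staged one at a time; treat the following as a sketch (node 0 and /bin/ls are just examples, and the test assumes the shared libraries that ls needs are already cached on the node):

[root@cluster ~] # bpcp /bin/ls 0:/bin/ls
[root@cluster ~] # rsh .0 ls /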

RSH from Compute Node to Master

Some applications require communication back to the master via RSH, and thus require that an RSH server be running on the master. This could be either the standard rshd or the Scyld rcmdd; both have pros and cons on the master node. The Scyld rcmdd was designed to run in an isolated environment (on a compute node) and does nothing to restrict access from machines on the public interface. The standard rshd is more complex to set up, but provides better security.

In either case, it is prudent to add rules to the firewall to stop RSH access from anywhere outside the cluster. You can use RSH between the master node and the compute nodes by making the private network interface (typically eth1) a "trusted device" in the firewall configuration. See the sections titled "Network Security Settings" and "Selecting Trusted Devices" in Chapter 3 of the Installation Guide.
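If you prefer explicit firewall rules instead of (or in addition to) the trusted-device setting, rules along the following lines accept the rsh service (TCP port 514) only on the private interface. These are illustrative iptables commands only, to be adapted to your firewall tooling and existing rule set:

[root@cluster ~] # iptables -A INPUT -i eth1 -p tcp --dport 514 -j ACCEPT
[root@cluster ~] # iptables -A INPUT -p tcp --dport 514 -j REJECT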

If using the standard rsh server, edit the file /etc/xinetd.d/rsh and change the "yes" on the disable line to "no" to tell the xinetd service to enable incoming RSH access. Then restart xinetd with the rsh server enabled on the master, as follows:

[root@cluster ~] # killall -HUP xinetd
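For reference, after the edit the stanza in /etc/xinetd.d/rsh should look roughly like the following; entries other than the disable line are the distribution defaults and may differ on your system:

service shell
{
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.rshd
        disable         = no
}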

If you wish to allow the root account to RSH to the master, edit the file /etc/securetty and add a line that reads rsh.
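For example:

[root@cluster ~] # echo "rsh" >> /etc/securetty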

For any accounts (including root) that should be allowed to have RSH access to the master, you must list the names of the allowable hosts, one per line, in a file named .rhosts in the account's home directory. Permissions on the file should be set to 0700 after it is created, as follows:

[root@cluster ~] # chmod 0700 ~/.rhosts

The following example of /root/.rhosts would allow RSH access from nodes 0 through 5, 7, and 22:

.0
.1
.2
.3
.4
.5
.7
.22