Post-Installation Configuration Issues

Following a successful update or install of ClusterWare, you may wish to make one or more configuration changes, depending upon the local needs of your cluster.

Resolve *.rpmnew and *.rpmsave configuration file differences

As with every ClusterWare upgrade, after the upgrade you should locate any ClusterWare *.rpmsave and *.rpmnew files and merge them, as appropriate, to carry forward the local changes. Sometimes a ClusterWare upgrade saves the locally modified version as *.rpmsave and overwrites the original file with a new version. Other times the upgrade leaves the locally modified version untouched and installs the new version as *.rpmnew.

For example,

    cd /etc/beowulf
    find . -name \*rpmnew
    find . -name \*rpmsave
and examine each such file to understand how it differs from the configuration file that existed prior to the update. You may need to merge new lines from the newer *.rpmnew file into the existing file (common with config.rpmnew), or perhaps replace existing lines with new modifications. Or you may need to merge older local modifications in *.rpmsave into the newly installed pristine version of the file (common with fstab.rpmsave). Contact Scyld Customer Support if you are unsure about how to resolve particular differences, especially with /etc/beowulf/config.
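
The search-and-compare step above can be sketched as a small shell function. This is only an illustration: list_rpm_leftovers is a hypothetical helper name, not a ClusterWare tool, and it assumes each leftover sits next to the live file it was derived from.

```shell
# List each *.rpmnew / *.rpmsave file under a directory together with the
# live file it shadows, and show a unified diff between the two.
# The default directory is the one used in the text above.
list_rpm_leftovers() {
    dir=${1:-/etc/beowulf}
    find "$dir" \( -name '*.rpmnew' -o -name '*.rpmsave' \) -type f | sort |
    while IFS= read -r leftover; do
        base=${leftover%.rpm*}      # strip the .rpmnew or .rpmsave suffix
        if [ -f "$base" ]; then
            echo "$leftover -> $base"
            # diff exits non-zero when the files differ; that is expected.
            diff -u "$base" "$leftover" || true
        else
            echo "$leftover -> (no live file)"
        fi
    done
}
```

Running list_rpm_leftovers with no argument inspects /etc/beowulf. The function only reports differences; the merge itself remains a manual decision.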

Disable SELinux

ClusterWare execution currently requires that SELinux be disabled. You can manually edit /etc/sysconfig/selinux and ensure that:

    SELINUX=disabled
is set. If SELinux was not already set to disabled, then the master node must be rebooted for this change to take effect.
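
To check the setting without opening an editor, something like the following may help. It is a minimal sketch that assumes the usual KEY=value layout of the file; selinux_config_mode is a hypothetical helper name.

```shell
# Print the value of the last uncommented SELINUX= line in the config file.
# Lines such as SELINUXTYPE=... and comment lines are ignored.
selinux_config_mode() {
    file=${1:-/etc/sysconfig/selinux}
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$file" |
        tail -n 1
}
```

For example, [ "$(selinux_config_mode)" = disabled ] || echo "edit the file and reboot" flags a master node that still needs the change. This reports the configured value only, not the mode of the running kernel.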

Optionally enable TORQUE

If you wish to run TORQUE, confirm that it is enabled on the master node:

    /sbin/chkconfig --list torque
and verify the settings 3:on 4:on 5:on.
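
The runlevel check can also be scripted. The sketch below assumes the usual chkconfig --list output of a service name followed by level:state fields; runlevels_345_on is a hypothetical helper name.

```shell
# Read `chkconfig --list <service>` output on stdin and succeed only if
# runlevels 3, 4 and 5 are all marked "on" on the last line read.
runlevels_345_on() {
    awk '
        {
            on = 0
            for (i = 2; i <= NF; i++)
                if ($i == "3:on" || $i == "4:on" || $i == "5:on")
                    on++
            ok = (on == 3)
        }
        END { exit (ok ? 0 : 1) }
    '
}
```

Usage: /sbin/chkconfig --list torque | runlevels_345_on && echo "torque is enabled".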

After you successfully start the cluster compute nodes for the first time, enable the /etc/beowulf/init.d/torque script:

    beochkconfig 90torque on
then restart TORQUE and restart the compute nodes:

    service torque restart
    bpctl -S all -R
See the Administrator's Guide for more details about TORQUE configuration, and the User's Guide for details about how to use TORQUE.

Optionally enable Ganglia monitoring tool

To enable the Ganglia cluster monitoring tool, edit /etc/xinetd.d/beostat to change disable=yes to disable=no, followed by:

    /sbin/chkconfig xinetd on
    /sbin/chkconfig httpd on
    /sbin/chkconfig gmetad on
then either reboot the master node, which automatically starts these three system services, or, without rebooting, manually restart xinetd so that it re-reads the newly edited beostat config file, and start the remaining services if they are not already running:

    service xinetd restart
    service httpd start
    service gmetad start
See the Administrator's Guide for more details.
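
The edit to the beostat file can be made non-interactively. The following is a sketch, assuming the file contains a single disable line in the usual xinetd form (with or without spaces around the equals sign); enable_xinetd_service is a hypothetical helper name.

```shell
# Flip "disable = yes" to "disable = no" in an xinetd service file,
# keeping the original as <file>.bak. Other lines are left untouched.
enable_xinetd_service() {
    file=${1:-/etc/xinetd.d/beostat}
    sed -i.bak \
        's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
        "$file"
}
```

After running enable_xinetd_service, continue with the chkconfig and service commands shown above.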

Optionally enable IMF ClusterAdmin Web Interface

The Integrated Management Framework (IMF) ClusterAdmin Web Interface is used by a cluster administrator to monitor and administer the cluster using a Web browser. It requires Apache on the master node (service httpd) and is access-protected by a username and password specific to the Web application; the username is admin.

To enable the ClusterAdmin Web interface, perform the following steps on the master node:

  1. Enable the httpd service, if it is not already enabled:

        /sbin/chkconfig httpd on
        service httpd start

  2. Initialize the admin account by assigning it a unique password:

        /usr/bin/htpasswd /etc/httpd/imf/htpasswd-users admin

Then point your Web browser at your master node and log in to the IMF ClusterAdmin Interface via the Scyld ClusterWare IMF link. See the Administrator's Guide for more details.

Optionally enable NFS locking

If you wish to use cluster-wide NFS locking, then you must enable locking on the master node and on the compute nodes. First ensure that NFS locking is enabled and running on the master:

    /sbin/chkconfig nfslock on
    service nfslock start
Then for each NFS mount point for which you need the locking functionality, you must edit /etc/beowulf/fstab (or the appropriate node-specific /etc/beowulf/fstab.N file(s)) to remove the default option nolock. See the Administrator's Guide for more details.
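
To preview what removing nolock would look like before editing the file by hand, a filter along these lines may be useful. It is a sketch that assumes /etc/beowulf/fstab follows the standard fstab column layout with a comma-separated options field; strip_nolock is a hypothetical helper name, and substituting defaults when nolock was the only option is this sketch's choice, not documented ClusterWare behavior.

```shell
# Print an fstab-style file with the nolock option removed from each
# non-comment line. The file itself is not modified.
strip_nolock() {
    sed -e '/^[[:space:]]*#/b' \
        -e 's/\([[:space:],]\)nolock,/\1/' \
        -e 's/,nolock\([[:space:]]\)/\1/' \
        -e 's/,nolock$//' \
        -e 's/\([[:space:]]\)nolock\([[:space:]]\)/\1defaults\2/' \
        -e 's/\([[:space:]]\)nolock$/\1defaults/' "$1"
}
```

For example, strip_nolock /etc/beowulf/fstab | diff /etc/beowulf/fstab - shows exactly which lines would change. Apply the change only to the mount points that actually need locking.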

Optionally increase the number of nfsd threads

The default count of 8 nfsd NFS daemons may be insufficient for large clusters. One symptom of an insufficiency is a syslog message, most commonly seen when you boot all the cluster nodes:

    nfsd: too many open TCP sockets, consider increasing the number of nfsd threads
To increase the thread count (e.g., to 16):

    echo 16 > /proc/fs/nfsd/threads
Ideally, the chosen thread count should be sufficient to eliminate the syslog complaints, but not significantly higher, as that would unnecessarily consume system resources. One approach is to repeatedly double the thread count until the syslog error messages stop occurring, then make the satisfactory value N persistent across master node reboots by creating the file /etc/sysconfig/nfs, if it does not already exist, and adding to it an entry of the form:

    RPCNFSDCOUNT=N
where a value N of 1.5x to 2x the number of nodes is probably adequate, although perhaps excessive. See the Administrator's Guide for a more detailed discussion of NFS configuration.
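
Making the chosen value persistent can be scripted so that repeated runs do not pile up duplicate entries. The sketch below assumes a plain KEY=value file; set_rpcnfsdcount is a hypothetical helper name.

```shell
# Set RPCNFSDCOUNT=<count> in the given file, creating the file if needed.
# An existing uncommented entry is replaced (original kept as <file>.bak);
# otherwise a new entry is appended.
set_rpcnfsdcount() {
    count=$1
    file=${2:-/etc/sysconfig/nfs}
    case $count in
        ''|*[!0-9]*) echo "count must be a positive integer" >&2; return 1 ;;
    esac
    touch "$file"
    if grep -q '^[[:space:]]*RPCNFSDCOUNT=' "$file"; then
        sed -i.bak "s/^[[:space:]]*RPCNFSDCOUNT=.*/RPCNFSDCOUNT=$count/" "$file"
    else
        echo "RPCNFSDCOUNT=$count" >> "$file"
    fi
}
```

For example, set_rpcnfsdcount 32 records 32 threads for the next master node reboot. It does not change the running thread count, which is still adjusted through /proc/fs/nfsd/threads as shown above.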

Optionally increase the ip_conntrack table size for ip forwarding

Certain workloads doing ip forwarding (most commonly seen when you boot all the cluster nodes) may produce a syslog message of the form:

    ip_conntrack: table full, dropping packet.
Use the command:

    cat /proc/sys/net/ipv4/ip_conntrack_max
to see the current table size. To increase the size:

    echo N > /proc/sys/net/ipv4/ip_conntrack_max
where N is the new table size, e.g., double the current size.
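
Reading the current size and writing back double that value can be combined into one step. This is a minimal sketch; double_conntrack_max is a hypothetical helper name, and the path argument exists only so that a different location can be supplied if needed.

```shell
# Read the current table size from the control file, write back twice
# that value, and report the change. Must be run as root on a live system.
double_conntrack_max() {
    ctl=${1:-/proc/sys/net/ipv4/ip_conntrack_max}
    current=$(cat "$ctl") || return 1
    echo $((current * 2)) > "$ctl"
    echo "$ctl: $current -> $(cat "$ctl")"
}
```

As with the echo command above, the new size applies only to the running kernel.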

Optionally increase the max number of processID values

The kernel defaults to using a maximum of 32,768 processID values. Because BProc manages a common process space across the cluster, this default may be insufficient for very large clusters and/or workloads that create large numbers of concurrent processes. The sysadmin can increase this upper bound by using the sysctl command. For example,

    sysctl -w kernel.pid_max=262144
instructs the kernel to use a range of values up to 262,144. The maximum BProc-supported value is the same sysctl-managed upper bound supported by the kernel: 4,194,304 [= (4*1024*1024)].
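
Before applying a new value, a script can check that it falls between the default and the supported maximum quoted above. The sketch below uses a hypothetical helper name, valid_pid_max, and treats the default of 32,768 as the lower bound.

```shell
# Upper bound quoted in the text: 4*1024*1024 = 4,194,304.
PID_MAX_LIMIT=$((4 * 1024 * 1024))

# Succeed only if the argument is an integer between the kernel default
# (32768) and the supported maximum, inclusive.
valid_pid_max() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 32768 ] && [ "$1" -le "$PID_MAX_LIMIT" ]
}
```

Usage: valid_pid_max 262144 && sysctl -w kernel.pid_max=262144.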

Optionally adjust the size limit for locked memory

OpenIB and mvapich-0.9.9 require an override to the limit of how much memory can be locked.

ClusterWare adds a memlock override to /etc/security/limits.conf during a ClusterWare upgrade, if the override does not already exist in that file, regardless of whether or not Infiniband is present in the cluster. The new override line,

    *    -    memlock    unlimited
raises the limit to unlimited. The sysadmin may remove that override from limits.conf if Infiniband is not present, or in an Infiniband cluster may reduce unlimited to a discrete value. Note that mvapich-0.9.9 requires a minimum of 16 MBytes, which is a limits.conf value of 16384 because limits.conf expresses memlock in KBytes. If the value is too small, then MVAPICH reports a "CQ Creation" or "QP Creation" error.
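
A quick way to audit the current memlock entries against the 16384 minimum is sketched below. It assumes the standard four-column limits.conf layout (domain, type, item, value) and recognizes only the unlimited keyword used above; check_memlock is a hypothetical helper name.

```shell
# Print every uncommented memlock entry and flag values below the
# 16384 KByte minimum. Exits non-zero if any entry is too small.
check_memlock() {
    file=${1:-/etc/security/limits.conf}
    awk -v min=16384 '
        /^[ \t]*#/ { next }
        $3 == "memlock" {
            ok = ($4 == "unlimited" || $4 + 0 >= min)
            printf "%s %s memlock %s: %s\n", $1, $2, $4, (ok ? "ok" : "too small")
            if (!ok) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$file"
}
```

Running check_memlock with no argument inspects /etc/security/limits.conf. A file with no memlock entry at all produces no output and exits zero, so an empty result means the override is absent.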

Optionally enable automatic CPU frequency management

If you wish to enable automatic CPU frequency management, you must have the base distribution's kernel-utils package installed, and then enable the script:

    beochkconfig 30cpuspeed on
You may optionally create a configuration file /etc/beowulf/conf.d/cpuspeed.conf (or node-specific cpuspeed.conf.N), typically derived from the master node's /etc/cpuspeed.conf, to override the default behavior. See man cpuspeed for details.

Optionally enable SSHD on compute nodes

If you wish to allow users to /usr/bin/ssh or /usr/bin/scp from the master to a compute node, or from one compute node to another compute node, then you must enable sshd on compute nodes by enabling the script:

    beochkconfig 81sshd on
See the Administrator's Guide for details.

Optionally reconfigure node names

You may declare site-specific alternative node names for cluster nodes by adding entries to /etc/beowulf/config. The syntax for a node name entry is:

    nodename format-string [domain/netgroup] [IPv4addr]
For example,

    nodename n%N
allows the user to refer to node 4 using the traditional .4 name, or alternatively using names like n4 or n004. See man beowulf-config and the Administrator's Guide for details.