Known Issues And Workarounds

The following are significant known issues with the latest version of ClusterWare 5, along with suggested workarounds.

Issues with ptrace

Cluster-wide ptrace functionality is not yet supported in ClusterWare 5. For example, you cannot use a debugger running on the master node to observe or manipulate a process that is executing on a compute node, e.g., using gdb -p procID, where procID is the process ID of a compute node process. strace works in its basic form, but you cannot use the -f or -F options to trace forked children if those children move away from the parent's node.

Issues with xpvm

xpvm is not currently supported in ClusterWare 5.

Issues with ENV Modules and TORQUE

ENV Modules are currently unsupported for OpenMPI jobs submitted through TORQUE. As a workaround, mimic what ENV Modules does by explicitly setting the environment variables with pathnames appropriate to the compiler that was used to build the OpenMPI application. For example, for an OpenMPI application compiled with the GNU compilers, add the following to the TORQUE script:

    export PATH=/usr/openmpi/gnu/bin:$PATH
    export LD_LIBRARY_PATH=/usr/lib64/OMPI/gnu:$LD_LIBRARY_PATH
    export OPAL_PKGDATADIR=/usr/openmpi/gnu/share
then run /usr/bin/mpirun as normal.
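
As an illustration only, the three exports above might be wrapped in a small function inside the TORQUE script. The job name, the resource request, and the application name my_app are hypothetical placeholders. The `${VAR:+...}` form is an optional refinement that avoids a trailing colon when LD_LIBRARY_PATH is initially unset.

```shell
#!/bin/bash
# Hypothetical job name and resource request; adjust for the site.
#PBS -N ompi_gnu_job
#PBS -l nodes=2:ppn=4

# Set the environment that ENV Modules would otherwise provide for a
# GNU-built OpenMPI application (paths taken from the text above).
setup_openmpi_gnu_env() {
    export PATH="/usr/openmpi/gnu/bin:$PATH"
    # Append the old value only if it is non-empty, so no trailing ':' appears.
    export LD_LIBRARY_PATH="/usr/lib64/OMPI/gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export OPAL_PKGDATADIR=/usr/openmpi/gnu/share
}

# Typical use in the body of the job script (my_app is hypothetical):
#   setup_openmpi_gnu_env
#   cd "$PBS_O_WORKDIR"
#   /usr/bin/mpirun ./my_app
```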

Caution using beosetup

At this time, we do not recommend using beosetup for observing or altering the cluster state while new compute nodes are booting.

Issues with Gdk

If you access a cluster master node using ssh -X from a workstation, some graphical commands or programs may fail with:

    Gdk-ERROR **: BadMatch (invalid parameter attributes)
      serial 798 error_code 8 request_code 72 minor_code 0
    Gdk-ERROR **: BadMatch (invalid parameter attributes)
      serial 802 error_code 8 request_code 72 minor_code 0
Remedy this by doing:
    export XLIB_SKIP_ARGB_VISUALS=1
prior to running the failing program. If this workaround is successful, then consider adding this line to /etc/bashrc or to ~/.bashrc. See https://bugs.launchpad.net/ubuntu/+source/xmms/+bug/58192 for details.
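
If the setting is made permanent, one way to avoid adding the same line twice is a small helper such as the sketch below. The function name add_line_once is hypothetical and is not part of ClusterWare.

```shell
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    local line=$1 file=$2
    touch "$file"
    # -x matches whole lines, -F treats the text as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
#   add_line_once 'export XLIB_SKIP_ARGB_VISUALS=1' ~/.bashrc
```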

Issues with Ganglia

The Ganglia cluster monitoring tool may fail for large clusters. If /var/log/httpd/error_log shows a fatal error of the form:

    PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted
then edit the file /etc/php.ini to increase the memory_limit parameter. The default is:
    memory_limit = 16M
which can be safely doubled and re-doubled until the error goes away.
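
The doubling step can be scripted. The following is a minimal sketch, assuming GNU sed and a memory_limit value expressed in megabytes with an M suffix. The function name double_php_memory_limit is hypothetical. It keeps a backup copy of the file before editing.

```shell
# Double the memory_limit value (in megabytes) in a php.ini-style file.
double_php_memory_limit() {
    local ini=$1 current
    # Extract the current numeric value from the first "memory_limit = <N>M" line.
    current=$(sed -n 's/^[[:space:]]*memory_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)M.*/\1/p' "$ini" | head -n 1)
    [ -n "$current" ] || { echo "no memory_limit = <N>M line in $ini" >&2; return 1; }
    cp -p "$ini" "$ini.bak"
    # Rewrite only the number, preserving spacing and any trailing comment.
    sed -i "s/^\([[:space:]]*memory_limit[[:space:]]*=[[:space:]]*\)${current}M/\1$((current * 2))M/" "$ini"
}

# Example (the web server normally must be restarted to pick up the change):
#   double_php_memory_limit /etc/php.ini
```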

Caution when modifying ClusterWare scripts

ClusterWare installs various scripts in /etc/beowulf/init.d/ that node_up executes when booting each node in the cluster. Any site-local modification to one of these scripts will be lost when a subsequent ClusterWare update overwrites the file with a newer version. If a local sysadmin believes a local modification is necessary, we suggest:

  1. Copy the to-be-edited original script to a file with a unique name, e.g.:

        cd /etc/beowulf/init.d
        cp 20ipmi 20ipmi_local
  2. Remove the executable state of the original:

        beochkconfig 20ipmi off
  3. Edit 20ipmi_local as desired.

  4. Thereafter, subsequent ClusterWare updates may install a new 20ipmi, but the update will not make that script executable again, so it stays disabled. The locally modified 20ipmi_local remains untouched. However, keep in mind that the newer ClusterWare version of 20ipmi may contain fixes or other changes that need to be merged into 20ipmi_local, because that edited file was based on an older ClusterWare version.
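
To notice when an update has replaced the original script, one possible approach is to record a checksum of the original when the local copy is made and compare against it after each update. This is only a sketch: the function names and the STAMP_DIR location are hypothetical, and GNU md5sum is assumed.

```shell
INIT_DIR=${INIT_DIR:-/etc/beowulf/init.d}
# Hypothetical location for checksum records, kept outside init.d.
STAMP_DIR=${STAMP_DIR:-/root/cw-local-scripts}

# Record the checksum of the original script, e.g. record_baseline 20ipmi
record_baseline() {
    mkdir -p "$STAMP_DIR"
    ( cd "$INIT_DIR" && md5sum "$1" ) > "$STAMP_DIR/$1.md5"
}

# Succeeds (status 0) if the original script differs from the recorded baseline.
vendor_script_changed() {
    ! ( cd "$INIT_DIR" && md5sum --status -c "$STAMP_DIR/$1.md5" )
}

# Example, after a ClusterWare update:
#   vendor_script_changed 20ipmi && diff -u "$INIT_DIR/20ipmi" "$INIT_DIR/20ipmi_local"
```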

Caution using tools that modify config files touched by ClusterWare

Some software tools modify system configuration files that ClusterWare also modifies. These tools have no knowledge of the ClusterWare-specific changes and may therefore undo or damage them. Take care when using such tools. One example is /usr/sbin/authconfig, which manipulates /etc/nsswitch.conf.

ClusterWare modifies these system configuration files at install time:

    /etc/exports
    /etc/nsswitch.conf
    /etc/security/limits.conf
    /etc/sysconfig/syslog
Additionally, ClusterWare uses /sbin/chkconfig to enable nfs.
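
One cautious way to use such a tool is to snapshot the files listed above beforehand and compare them afterwards. The sketch below assumes nothing beyond cp and diff. The function names and the snapshot directory are hypothetical.

```shell
# Files that ClusterWare modifies at install time (from the list above).
CW_CONFIGS="/etc/exports /etc/nsswitch.conf /etc/security/limits.conf /etc/sysconfig/syslog"

# Copy each existing file into <snapshot-dir>, preserving its full path.
snapshot_configs() {
    local snap=$1 f
    for f in $CW_CONFIGS; do
        [ -e "$f" ] || continue
        mkdir -p "$snap$(dirname "$f")"
        cp -p "$f" "$snap$f"
    done
}

# Print a unified diff for each file that changed; return 1 if any differ.
report_config_changes() {
    local snap=$1 f rc=0
    for f in $CW_CONFIGS; do
        [ -e "$snap$f" ] || continue
        diff -u "$snap$f" "$f" || rc=1
    done
    return $rc
}

# Example:
#   snapshot_configs /root/cw-snapshot
#   ... run the configuration tool ...
#   report_config_changes /root/cw-snapshot
```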

Running nscd service on master node may cause kickbackdaemon to misbehave

The nscd (Name Service Cache Daemon) service executes by default on each compute node. However, if this service is also enabled on the master node, then it may cause the ClusterWare name service kickbackdaemon to misbehave.

Workaround: when Beowulf starts, if it detects that nscd is running on the master node, it automatically stops nscd and reports that it has done so. Beowulf does not invoke /sbin/chkconfig nscd off, so the service is not permanently turned off.

Note: even after stopping nscd on the master node,

    service nscd status
will report that nscd is running because the daemon continues to execute on each compute node, as controlled by /etc/beowulf/init.d/09nscd.

ClusterWare MVAPICH CPU affinity management

CW4.2.0 (and later releases) support InfiniBand via open source kernel drivers, OpenIB, OFED, and a ClusterWare-enhanced MVAPICH. The CW4.2.0 MVAPICH default behavior is to assign threads of each multithreaded job to specific CPUs in each node, starting with cpu0 and incrementing upward. While keeping threads pinned to a specific CPU may be an optimal NUMA and CPU cache strategy for nodes that are dedicated solely to a single job, it is usually suboptimal if multiple multithreaded jobs share a node, as each job's threads get permanently assigned to the same low-numbered CPUs. The CW4.2.1 (and later) default behavior is to not impose strict CPU affinity assignments, which allows the kernel CPU scheduler to migrate threads as it sees fit to load-balance the node's CPUs as workloads change over time.

However, the user may override this default and re-enable CPU affinity using:

    export VIADEV_ENABLE_AFFINITY=1

Conflicts with base distribution of openmpi

ClusterWare 5 includes MPI-related packages that conflict with certain packages in the Red Hat or CentOS base distribution.

If yum informs you that it cannot install or update ClusterWare because various mpich and mpiexec packages conflict with various openmpi packages from the base distribution, then run the command:

    yum remove 'openmpi*' 'mvapich*'
to remove the conflicting base distribution packages, then retry:

    yum groupupdate Scyld-ClusterWare

Beofdisk does not support local disks without partition tables

Currently, beofdisk only supports disks that already have partition tables, even if those tables are empty. Compute nodes with preconfigured hardware RAID, where partition tables have been created on the LUNs, should be configurable. Contact Customer Service for assistance with a disk without partition tables.

Issues with bproc and the getpid() syscall

BProc's interaction with getpid() may cause getpid() to return incorrect process ID values.

Details: Red Hat's glibc implements the getpid() syscall by asking the kernel once for the current process ID value, then caching that value for subsequent calls to getpid(). If a program calls getpid() before calling bproc_rfork() or bproc_vrfork(), then BProc silently changes the child's process ID, but a subsequent getpid() continues to return the former cached process ID value.

Workaround: do not call getpid() prior to calling bproc_rfork() or bproc_vrfork().

Issue with bpcp

Using the bpcp command, specifying master for both the source and destination, e.g.,

    bpcp master:/tmp/x master:/tmp/y
sometimes exits with good status before the copy actually completes. Thus, a check for the existence of the destination file made immediately after the bpcp, e.g., ls /tmp/y, may fail.

Workaround: Specify master for the source file or for the destination file, but not both. For example, do:

    bpcp /tmp/x master:/tmp/y
Alternatively, insert a short sleep or delay after the bpcp; a subsequent test for the existence of the destination file will then find it.
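
Where a script cannot avoid the master-to-master form, the delay can be written as a bounded polling loop rather than a fixed sleep. This is a minimal sketch. The function name wait_for_file and its default timeout of 10 seconds are hypothetical choices.

```shell
# Wait until <path> exists, polling once per second.
# Usage: wait_for_file <path> [timeout-seconds]
# Returns 0 as soon as the file exists, 1 if the timeout expires first.
wait_for_file() {
    local path=$1 timeout=${2:-10} waited=0
    until [ -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Example:
#   bpcp master:/tmp/x master:/tmp/y
#   wait_for_file /tmp/y 5 && ls -l /tmp/y
```

Note that this only waits for the destination file to appear; it does not verify that the copy has finished writing.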

ClusterWare 5 and Bladerunner Xeon servers

The RHEL5 base distribution includes a sky2 network driver that panics the kernel on Penguin Computing Bladerunner Xeon servers. Contact Penguin Computing Customer Service for a workaround.