Monitoring the Status of the Cluster

Scyld ClusterWare provides several ways to monitor cluster performance and health: through a Web browser, a GUI, the command line, and "C" language interfaces. In general, these tools provide easy access to the information available through the Linux /proc system, as well as BProc information, for each of the cluster nodes. The monitoring programs are available to both administrators and regular users, because they only report status and provide no cluster command capabilities.

Monitoring Utilities

Cluster Monitoring Interfaces

Scyld ClusterWare provides several cluster monitoring interfaces: Scyld IMF, BeoStatus, beostat, libbeostat, and bpstat. Scyld IMF is a separately licensed product and is installed from its own YUM repository. The other interfaces are installed during the standard installation procedure. Except for bpstat (which provides text-only output), each of these interfaces lets you customize the display format. Following is a brief summary of these interfaces; more detailed information is provided in the sections that follow:

  • Scyld IMF — The Scyld Integrated Management Framework (IMF) centralizes access to Beostat, Torque, Ganglia and the documentation. The Nodes component displays Beostat data: per-core CPU utilization, memory usage, swap usage, disk usage, and network utilization. See the Chapter called Scyld Integrated Management Framework (IMF) for a detailed discussion of this tool.

  • BeoStatus — The BeoStatus cluster monitoring utility displays CPU utilization, memory usage, swap usage, disk usage, and network utilization. It defaults to a bar graph X-window GUI, but can display the information in several text formats. For large clusters, a small footprint GUI can be selected, with colored dots depicting the overall status on each node.

  • beostat — For a detailed command line display, beostat can be used. With no options, beostat will list /proc/cpuinfo, /proc/meminfo, /proc/loadavg, /proc/net/dev, and /proc/stat. Alternatively, you can use the arguments to select any combination of those statistics.

  • libbeostat — For users who wish to create their own custom displays or create more sophisticated resource scheduling software, Scyld provides "C" language interfaces through the libbeostat library. These functions are the same as those used by BeoStatus and beostat.

  • bpstat — This displays a text-only snapshot of the current cluster state/configuration.

Scyld also installs the popular Ganglia monitoring package by default, but does not configure it to run by default. For information on configuring Ganglia to run, see the Section called Ganglia in the Chapter called Extra Tools of this document.
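To give a sense of the raw data behind these tools, the following sketch reads two of the /proc files that beostat reports, directly on a single Linux host. It does not use beostat or libbeostat and does not gather data from other nodes. The function names show_loadavg and show_meminfo are hypothetical, and each accepts an optional file path so that it can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# Illustrative only: reads local /proc files, not cluster-wide beostat data.
# show_loadavg and show_meminfo are hypothetical helper names.

# Print the 1-, 5-, and 15-minute load averages from a loadavg-format file.
show_loadavg() {
    file="${1:-/proc/loadavg}"
    # The first three fields are the load averages; the rest is ignored.
    read -r one five fifteen _rest < "$file"
    echo "load: $one $five $fifteen"
}

# Print total and free memory (in kB) from a meminfo-format file.
show_meminfo() {
    file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {total=$2} /^MemFree:/ {free=$2}
         END {print "mem_total_kb: " total " mem_free_kb: " free}' "$file"
}

# Report on the local host when the files are readable.
if [ -r /proc/loadavg ]; then show_loadavg; fi
if [ -r /proc/meminfo ]; then show_meminfo; fi
```

The monitoring interfaces described above collect this kind of information from every node and present it in one place, so a script like this is useful mainly for understanding what the numbers mean.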

Monitoring Daemons

Underlying the monitoring facility are two daemons, sendstats and recvstats. You can check that these are running with the following command:

[root@cluster ~] # ps -aux | grep stats

The sendstats daemon, as the name implies, sends the status information from each of the nodes and the master to the recvstats daemon, which runs only on the master.
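
As an alternative to filtering ps output, a check along the following lines tests for each daemon by exact process name, which avoids matching the grep command itself. This is a minimal sketch: it assumes the pgrep utility is installed and that the daemons appear in the process table as sendstats and recvstats. The helper name daemon_status is hypothetical.

```shell
#!/bin/sh
# Report whether a process with exactly the given name is running.
# Assumption: the daemons show up in the process table under the
# names "sendstats" and "recvstats". daemon_status is a hypothetical helper.
daemon_status() {
    # pgrep -x matches the process name exactly; output is discarded and
    # only the exit status is used. A missing pgrep also reports "not running".
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

for name in sendstats recvstats; do
    daemon_status "$name"
done
```

On the master, both daemons are expected to be reported as running.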

These daemons are started on the master by the /etc/rc.d/init.d/beowulf script. The daemons operate in one of two modes: command/response mode, in which the recvstats daemon sends out multicast packets on port 3040 to request updates from all of the nodes, or auto-transmit mode, in which updates are sent at 59-second intervals. For more information on the daemon options, see the Reference Guide.