Scyld ClusterWare HPC: Programmer's Guide
Scyld BeoStat is a layered state, status, and alert system. It draws on two sources of data: a snapshot of static information gathered when a compute node first joins the cluster, and dynamic information updated by a lightweight program on each cluster node that unicasts or multicasts real-time status. Both are collected into a shared memory region on the master node, which the library routines then read.
Programs may hook into the status information at multiple levels. Receiving the multicast data stream directly provides an immediate indication of changed data, but it puts an additional, constant load on the monitoring system, and the communication protocol is not documented as an end-user format. Reading the data gathered into the shared memory region suits programs such as schedulers that need very efficient access to all data, but the format may change unpredictably between release versions. Most programs should instead use the highest-level access, the BeoStat library.
The Scyld BeoStat library provides a collection of functions that allow a system program to retrieve a wide variety of performance and status data about any node in the cluster. These functions report on CPU, memory, swap area, file system usage, network statistics, current process load, and details of the system architecture. This data is useful for monitoring system usage and making scheduling decisions, so it is of most interest to programmers developing system monitoring and scheduling tools. Applications may also use it directly to modify their scaling behavior.
The functions in this set abstract details of the processor architecture and speed. They return general load information about CPUs, nodes, and the cluster as a whole. Performance data is meaningful relative to other nodes, but may not be useful as an absolute metric.
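
One way to use load figures relative to other nodes is to normalize each node's value against the cluster mean. The following is a minimal sketch of that idea, not part of the BeoStat library: the helper name `relative_cpu_load` is hypothetical, and it assumes the caller has already collected one CPU usage sample per node, with a negative value standing for "no data".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a BeoStat function): express each node's CPU
 * usage relative to the mean over all nodes that reported data.
 *
 *   usage[i]    one sample per node; a negative value means "no data"
 *               (an assumption made for this sketch)
 *   relative[i] usage[i] / mean, or -1.0f where there is no data
 *
 * Returns the number of nodes that contributed a valid sample.
 */
int relative_cpu_load(const float *usage, size_t num_nodes, float *relative)
{
    assert(usage != NULL && relative != NULL);

    double sum = 0.0;
    int valid = 0;
    for (size_t i = 0; i < num_nodes; i++) {
        if (usage[i] >= 0.0f) {
            sum += usage[i];
            valid++;
        }
    }

    double mean = (valid > 0) ? sum / valid : 0.0;
    for (size_t i = 0; i < num_nodes; i++) {
        if (usage[i] < 0.0f)
            relative[i] = -1.0f;                     /* no data for this node */
        else if (mean > 0.0)
            relative[i] = (float)(usage[i] / mean);  /* 1.0 means "average" */
        else
            relative[i] = 1.0f;                      /* every node is idle */
    }
    return valid;
}
```

A value above 1.0 then marks a node that is busier than the cluster average, whatever the underlying processor speed. In a real tool the `usage` array would presumably be filled from `beostat_get_cpu_percent`; the scale of its return value and its error convention are not stated here and should be checked against the library reference.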

    float beostat_get_cpu_percent (int node, int cpu);
    unsigned long beostat_get_net_rate (int node);
    int beostat_get_disk_usage (int node, int *max, int *curr);
    int beostat_count_idle_cpus (float threshold);
    int beostat_count_idle_cpus_on_node (int node, float cpu_idle_threshold);
    extern int beostat_get_avail_nodes_by_id (int **node_list, uid_t uid, gid_t *gid_list, int gid_size);
    extern int beostat_is_node_available (int node, uid_t uid, gid_t *gid_list, int gid_size);
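
As an illustration of how a scheduler might use these calls, the sketch below picks the node with the most idle CPUs. It is an assumption-laden example rather than the library's own method: `pick_most_idle_node`, `idle_counter_fn`, and `mock_count_idle` are hypothetical names, and the code assumes that a negative return value from the counter means the node could not be queried. The counter is passed as a function pointer with the same signature as `beostat_count_idle_cpus_on_node`, so the logic can be built and tried without the BeoStat library installed.

```c
#include <assert.h>
#include <stddef.h>

/* Same signature as beostat_count_idle_cpus_on_node() in the listing above. */
typedef int (*idle_counter_fn)(int node, float cpu_idle_threshold);

/*
 * Hypothetical helper: return the node in [0, num_nodes) that reports the
 * most idle CPUs, or -1 if no node reports any. Ties go to the lowest
 * node number. A negative count is treated as "node unavailable"
 * (an assumption made for this sketch).
 */
int pick_most_idle_node(idle_counter_fn count_idle, int num_nodes,
                        float cpu_idle_threshold)
{
    assert(count_idle != NULL);

    int best_node = -1;
    int best_idle = 0;
    for (int node = 0; node < num_nodes; node++) {
        int idle = count_idle(node, cpu_idle_threshold);
        if (idle > best_idle) {
            best_idle = idle;
            best_node = node;
        }
    }
    return best_node;
}

/* Hypothetical stand-in for the library call, valid for nodes 0..3 only. */
int mock_count_idle(int node, float cpu_idle_threshold)
{
    static const int idle_by_node[] = { 1, -1, 4, 2 };
    (void)cpu_idle_threshold;   /* the mock ignores the threshold */
    return idle_by_node[node];
}
```

On a cluster, passing `beostat_count_idle_cpus_on_node` in place of `mock_count_idle` should work if the installed header declares it with the signature listed above. The meaning and scale of the threshold argument are not given here, so confirm them in the function reference before choosing a value.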