Scyld ClusterWare HPC: Programmer's Guide

Getting Information About the Parallel Machine

To start a process on a compute node with BProc, the programmer needs to know that node's number. With a library like MPI, some kind of scheduler selects the node for each task and hides this detail from the programmer. With BProc, the program itself must select the compute node on which to start each process. If a scheduler is available, it can be called to obtain node numbers; otherwise, the programmer must select nodes directly. Similarly, a programmer who is writing a scheduler (as may be the case if the application needs special scheduling) must have a means of learning which nodes are available to the BProc system for running processes.

BProc includes a set of system calls that allow a program to learn what nodes are known to BProc, and what the status of those nodes is. Under BProc, all of the compute nodes that are known to the system are numbered from 0 to P-1, where P is the number of compute nodes known to BProc. In addition, there is the master node, which is numbered -1. Thus, the first call a program will make is often:

	int bproc_numnodes(void);

This call returns the number of compute nodes known to BProc. Sometimes it is important for a program to know which node it is currently executing on. It may choose to perform some actions only if it is on a special node or it may use the information to identify itself to other processes. BProc returns this information with the call:

	int bproc_currnode(void);

This call returns the number of the node the process is currently executing on. Its return values range from -1 to P-1.

A node known to BProc is not necessarily available for running user processes. Not all nodes may be operational at any given point in time. Nodes may be in various stages of booting (and thus might become available shortly), might have suffered an error (which might bear reporting), or might be shut down for maintenance. In any case, BProc provides a mechanism to learn the state of each compute node, as far as BProc is currently aware. The master node is always assumed to be up. The call provided for this is:

	int bproc_nodestatus(int node);

This function returns one of several values, which are defined in the header file bproc.h as follows:

	bproc_node_down, bproc_node_unavailable, bproc_node_error,
	bproc_node_up, bproc_node_reboot, bproc_node_halt,
	bproc_node_pwroff, bproc_node_boot
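
As a minimal sketch of how a program might use these status values to find usable nodes, the helper below scans an array of per-node status values and collects the numbers of the nodes that are up. The helper name collect_up_nodes and its array-based interface are hypothetical and not part of BProc; the comment at the end shows how the array might be filled from bproc_numnodes() and bproc_nodestatus().

```c
/*
 * Copy the numbers of all nodes whose status equals up_value into out[].
 * status[i] holds the status of compute node i, for i in 0..numnodes-1.
 * Returns the number of node numbers written (at most max_out).
 * This helper is hypothetical; it is not a BProc function.
 */
int collect_up_nodes(const int *status, int numnodes, int up_value,
                     int *out, int max_out)
{
    int found = 0;

    for (int node = 0; node < numnodes && found < max_out; node++) {
        if (status[node] == up_value)
            out[found++] = node;
    }
    return found;
}

/*
 * With BProc, the status array would be filled from the calls described
 * above, roughly as follows (not compiled here, since it needs bproc.h):
 *
 *     int n = bproc_numnodes();
 *     for (int i = 0; i < n; i++)
 *         status[i] = bproc_nodestatus(i);
 *     int count = collect_up_nodes(status, n, bproc_node_up, out, n);
 */
```

Keeping the scan separate from the BProc calls lets the selection logic be exercised without a running cluster. A node reported as up may still refuse a process if its permissions do not allow it.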

BProc is a low-level system utility and does not generally get involved in node usage policies. Thus, if a node is up and permissions are set correctly, BProc will allow any user process to start a process on that node. In many environments there is a need to manage how nodes are used. For example, it may be desirable to guarantee that only one user is using a particular node for a period of time. It may be important to assign a set of nodes to a user or group of users for exclusive use over a period of time. It may be important to require users to start processes via some intermediary like a batch queue system or accounting system and thus prevent processes from being started directly.

In order to accommodate these issues, BProc provides a permission and access mechanism that is based on the Unix file access model. Under this model, every node is owned by a user and by a group and has execute permission bits for user, group and all. In order for a user to start a process on a node, an execute permission bit must be set for a matching user, matching group, or all.
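
The check described above can be sketched as a small C function. This is an illustration only, not BProc's implementation: the function name, the macro names, and the octal bit layout (0100, 0010, 0001, as for Unix files) are assumptions. It also assumes that, as with Unix files, the first matching class (user, then group, then all) decides the outcome, and it ignores supplementary groups and any special treatment of root.

```c
/* Execute bits, assumed here to follow the Unix octal layout. */
#define NODE_X_USER  0100
#define NODE_X_GROUP 0010
#define NODE_X_OTHER 0001

/*
 * Return 1 if a process running as uid/gid may start on a node owned by
 * node_user/node_group with permission bits mode, otherwise return 0.
 * The first matching class decides, as for Unix file permissions.
 * This function is a hypothetical illustration, not a BProc call.
 */
int node_exec_allowed(int node_user, int node_group, int mode,
                      int uid, int gid)
{
    if (uid == node_user)
        return (mode & NODE_X_USER) != 0;
    if (gid == node_group)
        return (mode & NODE_X_GROUP) != 0;
    return (mode & NODE_X_OTHER) != 0;
}
```

For example, under these assumptions a node with mode 0100 owned by user 500 accepts processes from user 500 only, while mode 0111 admits any user.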

This mechanism need not be used at all in many systems, but it allows system programs to enforce policies controlling which users can run processes on which nodes. For example, a batch queue scheduler could decide to allocate 16 nodes to a job, and change the user of those nodes to match the user who submitted the job, thus preventing other users from inadvertently running processes on those nodes.

A program that is making scheduling decisions needs to know not only which nodes are known and which are actually operational, but also which nodes have permissions set appropriately to allow processes to be run. BProc provides a call to retrieve this information for each node known to BProc:

	int bproc_nodeinfo(int node, struct bproc_node_info_t *info);

This function fills in the following struct:

	struct bproc_node_info_t
	{
		int      num_nodes;         /* Info about your state of the world */
		int      curr_node;
		int      node;              /* Info about a particular node */
		int      status;
		uint32_t addr;
		int      user;
		int      group;
	};

Note that this call not only returns permission and node ownership information, but also includes the node status and network address. Thus this single call provides the information to decide if a process may be started on a node, along with the information needed to contact the node.
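
As a rough sketch of how the addr field might be used, the helper below converts a 32-bit address into dotted-quad text with the standard inet_ntop() call. It assumes that addr holds an IPv4 address in network byte order, which the text above does not state; the helper name node_addr_to_string is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the dotted-quad form of addr into buf.
 * Assumes addr is an IPv4 address in network byte order (an assumption
 * about the addr field, not something bproc.h is known to guarantee).
 * Returns buf on success, or NULL if buf is too small.
 */
const char *node_addr_to_string(uint32_t addr, char *buf, size_t len)
{
    struct in_addr in;

    in.s_addr = addr;
    return inet_ntop(AF_INET, &in, buf, (socklen_t)len);
}
```

A caller would pass the addr member of a filled-in struct bproc_node_info_t together with a buffer of at least INET_ADDRSTRLEN bytes.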

Permission to start a process on a node is distinct from the decision of whether that node has the resources, such as available memory and CPU cycles, to support the process. The scheduling and mapping policy is supported by resource information that may be retrieved through the Beostat API. See the "Programming with Beostat API" section in this manual.

In order to implement schedulers, system programs must be able to control the ownership and access permission of compute nodes. BProc provides functions that set those attributes. These functions can only be used by root or a user with appropriate permissions.

	int bproc_setnodestatus(int node, int status);

This function allows a node management program to force node status to a particular state.

	int bproc_chmod(int node, int mode);
	int bproc_chown(int node, int user);
	int bproc_chgrp(int node, int group);

These functions change the user and group that own a node, or the User/Group/Others execute permission for a node. They use the same semantics as Unix file ownership and execute permission.

	#define BPROC_X_OK 1
	int bproc_access(int node, int mode);

This function checks if the current user may start a process on the specified node. This is similar to the access() system call for files. The only useful value for mode is BPROC_X_OK.

The last set of node management functions is used to implement multiple master systems and redundant, robust, and available servers.

	int bproc_slave_chroot(int node, char *path);

This function sets a new root directory on a compute node, similar to the chroot() system call on the master node (local machine). This is used internally to implement isolated multiple master clusters, and allows a node to be installed with more than one configuration with the specific configuration set based on the process to be started.

	int bproc_detach(long code);

This function allows a process to detach itself from control by the master. The master sees the code as the apparent exit status of the process.