beowulf-config

Name

/etc/beowulf/config -- Configuration file for BProc and beoboot

Description

This manual page describes the format of the Beowulf config file /etc/beowulf/config. This file defines the structure of a Scyld cluster, and provides a central location for many of the operational parameters.

The Beowulf config file contains the settings for beoboot, node initialization, BProc communication parameters, and other aspects of cluster operation.

The syntax of the Scyld configuration files is standardized, and is intended for human editing with embedded comments. Tools are provided for reading and writing from common programming and scripting languages, with writing retaining comments and formatting.

Tip

While /etc/beowulf/config may be edited with any text editor, it is best changed using the beonetconf or beosetup programs. However, some parameters may not be editable using those programs, and must be set carefully when editing the file, as incorrect editing may leave the cluster unusable.

Scyld Config File Format

The config file format is a line-oriented sequence of configuration entries. Each configuration entry starts with a keyword followed by parameters. A line is terminated by a newline or '#'. The latter character starts a comment.

The keyword and following parameters have the same syntax rules: they may be preceded by whitespace and continue to the next whitespace or the end of the line.

Keywords and following parameters may include whitespace by quoting between a matching pair of '"' (double quote) or ''' (single quote) characters. A '\' (backslash) removes the special meaning of the following quote character.

Note that comments and newlines take precedence over any other processing, thus a '#' may not be used in a keyword or embedded in a parameter, and a backslash followed by a newline does not join lines.

Each configuration option is contained on a single line, with keyword and optional parameters. Blank lines are ignored. Comments begin with an unquoted '#' and continue to the end of the line.
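
The tokenizing rules above can be sketched in Python. This is a minimal illustration, not the beoconfig implementation: the function name parse_config_line is hypothetical, and the sketch assumes, following the note on precedence above, that a '#' ends the line even when it appears inside quotes.

```python
def parse_config_line(line):
    """Split one config line into [keyword, param, ...] (illustrative only)."""
    # Assumption: comments take precedence, so cut at the first '#'.
    line = line.split("#", 1)[0].rstrip("\n")
    tokens, current, quote, in_token = [], [], None, False
    i = 0
    while i < len(line):
        ch = line[i]
        # A backslash removes the special meaning of a following quote.
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "\"'":
            current.append(line[i + 1])
            in_token = True
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None          # closing quote
            else:
                current.append(ch)    # whitespace is literal inside quotes
        elif ch in "\"'":
            quote = ch                # opening quote
            in_token = True
        elif ch.isspace():
            if in_token:              # whitespace ends the current token
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_token:
        tokens.append("".join(current))
    return tokens
```

With this sketch, the line `kernelcommandline "apm=power-off console=ttyS0" # note` yields the keyword kernelcommandline and a single parameter containing a space, while a blank or comment-only line yields an empty list.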

The Beowulf-config file is intended to be parsed using the beoconfig utility.

Keywords

interface interfacename

The interface directive is used to specify the name of the interface that connects the master node to the compute nodes. This is used by the cluster services and management tools such as the BProc master daemon and the beoserv daemon. Common values are "eth1" (the second ethernet device) and "myri0" (for Myrinet). If present, entries after the interface name specify the IP address and netmask that the interface should be configured with.

iprange [nodenumber] IPaddress1 IPaddress2

The iprange directive specifies the range of IP addresses to be assigned to nodes. If the optional nodenumber is given, the first address in the range will be assigned to that node, the second address to the next node, etc. If no node number is given, the address assignment will begin with the node following the node that was last assigned. If no nodes have been assigned, the assignment will begin with node 0.
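
The assignment rule can be illustrated with a short Python sketch. It is an assumption-laden model rather than the actual beoserv logic: expand_ipranges is a hypothetical helper, and each entry is given as a (nodenumber or None, first address, last address) tuple in file order.

```python
import ipaddress

def expand_ipranges(entries):
    """Map node numbers to addresses for a list of iprange entries.

    entries: [(nodenumber_or_None, first_ip, last_ip), ...] in file order.
    Returns a dict {node_number: ip_string}.
    """
    assigned = {}
    next_node = 0  # with nothing assigned yet, assignment begins at node 0
    for start_node, first, last in entries:
        # No explicit node number: continue after the last assigned node.
        node = next_node if start_node is None else start_node
        lo = int(ipaddress.IPv4Address(first))
        hi = int(ipaddress.IPv4Address(last))
        for value in range(lo, hi + 1):
            assigned[node] = str(ipaddress.IPv4Address(value))
            node += 1
        next_node = node
    return assigned
```

For example, a single entry covering 192.168.1.100 through 192.168.1.107 gives node 0 the address 192.168.1.100 and node 7 the address 192.168.1.107.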

ip [nodenumber] IPaddress

The ip directive assigns IP addresses to individual nodes. The addresses must be in dotted notation (e.g., 192.168.1.100). If the optional nodenumber argument is given, the specified IP address is used for that node. If no node number is given, addresses are assigned beginning with the node number following the node that was assigned last. If no nodes have been assigned, IP address assignment starts with node 0.

bootfile [nodenumber] phase2-pathname

The bootfile directive is used to specify the name of the phase 2 image that should be used. If a node number is given, then the specified phase 2 image will be used for that node. If no node number is given, then the specified phase 2 image will be used for all compute nodes that are not named in any other bootfile entry. The nodenumber can also be a list of comma-separated node ranges (e.g., 1-5,10-20).
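
A comma-separated list of node ranges can be expanded as in the following Python sketch. The helper name parse_node_list is hypothetical, and the sketch assumes that ranges are inclusive and written as "low-high".

```python
def parse_node_list(spec):
    """Expand a node list such as "1-5,10-20" into a list of node numbers."""
    nodes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            # Assumption: both ends of a range are included.
            nodes.extend(range(int(low), int(high) + 1))
        else:
            nodes.append(int(part))
    return nodes
```

Under these assumptions, "0,5,10-12" expands to nodes 0, 5, 10, 11, and 12.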

bprocport port

The bprocport directive is used to specify the TCP port that the compute node connects to on the master for BProc.

fsck fsck-policy

The fsck directive is used to specify the file system checking policy to be used at boot time. The valid policies are "never", "safe" or "full".

never

The file system on the compute nodes will not be checked on boot.

safe

The file system on the compute nodes will go through a safe check every time the compute node boots.

full

The file system on the compute nodes will go through a full check every time the compute node boots. The full check may remove files from the filesystem if they cannot be repaired.

kernelcommandline options

The kernelcommandline directive is used to specify any options you wish to have passed to the kernel on the compute nodes. These are the same options that are normally passed with "append=" in lilo, or on the lilo prompt while the machine is booting (e.g., "kernelcommandline apm=power-off").

kernelimage imagename

The kernelimage directive is used to specify the full path to the kernel that should be used when creating the final boot images for the compute nodes.

ignore MACaddress

The ignore directive is used to specify a MAC address (e.g., 00:11:22:AA:BB:CC) whose DHCP and PXE requests beoserv should ignore. Multiple ignore directives are allowed.

insmod module-name [options]

The insmod directive is used to specify a kernel module to be loaded (usually a network driver). Options for the module may be specified as well.

libraries librarypath1 [, librarypath2, ...]

The libraries directive is used to specify a list of libraries that should be cached on the compute nodes when an application on the node references the library. The library path can be a directory or file. If a file name is specified, then that specific file may be cached, if needed. If a directory name is specified, then every file in that directory may be cached. If the directory name ends with "/", then subdirectories under the specified directory may be cached.

prestage pathname

The prestage directive is used to name an explicit file that each compute node pulls from the master at node boot time. Multiple instances of prestage can be used. Typically, each pathname is a file in one of the libraries directories, and the pathname gets pulled into the compute node's library cache.

logfacility facility

The logfacility directive specifies the log facility that the BProc master daemon should use. Some example log facility names are "daemon", "syslog", and "local0" (see the syslog documentation for more information). The default log facility is "daemon".

mcastbcast interface

The mcastbcast directive is used to tell beoserv to use broadcast instead of multicast when transmitting files over the interface. This is useful when network equipment has trouble with heavy multicast traffic.

mcastthrottle interface rate

The mcastthrottle directive is used to control the rate at which data is transmitted over the specified interface. The rate is given in megabits per second. This is useful when the compute node interfaces cannot keep up with the master interface when sending large files.

mkfs mkfs-policy

The mkfs directive specifies the policy to use when building a Linux file system on the compute nodes. The valid policies are "never", "if_needed", or "always".

never

The filesystem on the compute nodes will never be recreated on boot.

if_needed

The filesystem on the compute nodes will only be recreated if the filesystem check fails.

always

The filesystem on the compute nodes will be recreated on every boot. fsck will be assumed to be set to "never" when this is set.

modprobe module-name [options]

The modprobe directive is used to specify the name of the kernel module to be loaded with dependency checking, along with any specified module options.

modarg options

The modarg directive is used to specify options to be used for modules that are loaded during the boot process without options. This is useful for specifying options to modules that get loaded during the PCI scan.

moddep module-list

The moddep directive is used to specify module dependencies. The first module listed is dependent on the remaining modules in the space separated list. The first module will be loaded after all other listed modules. Module dependency information is normally automatically generated by the beoboot script.

node [nodenumber] MACaddress

The node directive is used to assign MAC addresses to node numbers. There should be one of these lines for each node in your cluster. Note the following:

  • If a value is not provided for the nodenumber argument, the first node entry is node 0, the second is node 1, the third is node 2, etc.

  • The value "off" can be used for the MACaddress argument to leave a place holder for that node number.

  • To skip a node number, use an entry of the form "node" (with no MACaddress argument) or "node off".

  • To skip a node number and ensure that it is never filled in automatically later, use the entry "node reserved".

nodename name-format [IPv4 Offset or Base] [netgroup]

The nodename directive is used to define the primary hostname, as well as additional hostname-aliases for compute nodes. It can also be used to define hostnames and hostname-aliases for non-compute node entities with a per compute node relationship (e.g., to define a hostname and IP address for the IPMI management interface on each compute node). The presence of the (optional) IPv4 parameter determines if the entry is for compute nodes or for non-compute node entities. If no 'nodename' keyword is defined for compute nodes, then compute nodes' primary hostname is of the 'dot-number' format (e.g., node 10's primary hostname is '.10').

name-format

Define a hostname or hostname-alias. The first instance of the nodename keyword with no IPv4 parameter defines the primary hostname format for compute nodes. While the user may define the primary hostname, the FIRST hostname alias shall always be of the 'dot-number' format. This allows compute nodes to always resolve their address from the 'dot-number' notation. Additional nodename entries without an IPv4 parameter define additional hostname aliases.

The name-format string must contain a conversion specification for node number substitution. The conversion specification is introduced by a percent sign (the '%' symbol). An optional following digit in the range 1..5 specifies a zero-padded minimum field width. The specification is completed with an 'N'. An unspecified or zero field width allows numeric interpretation to match compute node host names. For example, "n%N" will match "n23", "n+23", and "n0000023". By contrast, "n%3N" will only match "n001" or "n023", but not "n1" or "n23".
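
The conversion specification can be modeled with the Python sketch below. It is illustrative only: format_nodename and match_nodename are hypothetical names, the sketch handles a single %N specification per format, and for a nonzero width it assumes a hostname matches only if it is exactly what the format would generate for that node number.

```python
import re

# '%' + optional width digit + 'N'; a width of 0 is treated as unspecified.
_SPEC = re.compile(r"%([0-5]?)N")

def format_nodename(name_format, node):
    """Substitute the node number into the name-format string."""
    def substitute(match):
        width = int(match.group(1) or 0)
        return str(node).zfill(width)  # zero-padded minimum field width
    return _SPEC.sub(substitute, name_format)

def match_nodename(name_format, hostname):
    """Return the node number that hostname refers to, or None."""
    spec = _SPEC.search(name_format)
    if spec is None:
        return None
    width = int(spec.group(1) or 0)
    prefix = re.escape(name_format[:spec.start()])
    suffix = re.escape(name_format[spec.end():])
    # Unspecified width: accept any numeric form, including '+' and zeros.
    number = r"(\+?\d+)" if width == 0 else r"(\d+)"
    hit = re.fullmatch(prefix + number + suffix, hostname)
    if hit is None:
        return None
    node = int(hit.group(1))
    if width and format_nodename(name_format, node) != hostname:
        return None  # assumption: fixed-width names must match exactly
    return node
```

With this model, "n%N" accepts "n23", "n+23", and "n0000023" as node 23, while "n%3N" accepts "n023" but rejects "n23".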

IPv4 Offset or Base

The presence of the optional IPv4 argument defines if the entry is for "compute nodes" (i.e. the entry will resolve to the 'dot-number' name) or if the entry is for non-cluster entities that are loosely associated with the compute node. If the argument has a leading zero, then the parameter specifies an IPv4 Offset. If the argument does not lead with a zero, then the argument specifies a 'base' from which IP addresses are computed, by adding the 'node-number' associated with the non-compute node entity.
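
The two interpretations of the argument can be sketched in Python as follows. The function related_address is a hypothetical helper, and the sketch assumes that "a leading zero" means the argument text begins with the character '0'.

```python
import ipaddress

def related_address(param, node, node_ip):
    """Compute the address for a non-compute node entity.

    param:   the IPv4 Offset or Base argument from the nodename entry
    node:    the compute node number
    node_ip: the compute node's own IP address
    """
    value = int(ipaddress.IPv4Address(param))
    if param.startswith("0"):
        # Offset: add it to the compute node's own address.
        result = int(ipaddress.IPv4Address(node_ip)) + value
    else:
        # Base: add the node number to the base address.
        result = value + node
    return str(ipaddress.IPv4Address(result))
```

For instance, an offset of 0.0.1.0 applied to node 12 at 192.168.1.12 gives 192.168.2.12, while a base of 10.20.0.100 (an invented value) gives node 3 the address 10.20.0.103.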

Netgroup

The netgroup parameter specifies a netgroup that contains all the entries generated by the nodename entry.

nodes numnodes

The nodes directive is used to specify the total possible number of nodes in the cluster. This should normally be set to match the iprange. However, if multiple ipranges are specified, then this value should represent the total number of nodes in all the iprange entries.

nodeassign nodeassign-method

The nodeassign directive specifies the node assignment strategy used when the beoserv daemon receives a new, unknown MAC address from a computer that is not currently entered in the node database. The total number of entries in the node database is limited to the number specified with the nodes keyword (see above).

The valid node assignment methods are "append", "insert", "manual", or "locked". Note the following:

  • "Append" and "insert" are the only two choices that allow new nodes to be automatically given node numbers and welcomed into the cluster.

  • Any failures of automatic node assignment through "append" or "insert" (such as when the node table is full) will cause the node assignment to be treated as "manual".

append

This is the default setting. The system will append new MAC addresses to the end of the node list in the /etc/beowulf/config file. This is done by seeking out the highest already-assigned node number and attempting to go one number beyond it. If the highest node number in the cluster has already been assigned, the "append" method will fail and the "manual" method will take precedence.

insert

The system will insert new MAC addresses into the node list in the /etc/beowulf/config file, starting with the lowest vacant node number. If no spaces are available, the "append" method will be used instead. Typically, a user would choose "insert" when replacing a single node if they want the new node entry to appear in the same place as the old node entry. If the node table is full, the "insert" method will fail and the "manual" method will take precedence.

manual

The system will enter new MAC addresses in the /var/beowulf/unknown_addresses file, and require the user to manually assign the new nodes. The node entries will appear in the "Unknown" list in the BeoSetup GUI, which simplifies the node assignment process. An alternative to using the BeoSetup GUI is to manually edit the /etc/beowulf/config file and copy in the new MAC addresses from the /var/beowulf/unknown_addresses file.

locked

The system will ignore DHCP requests from any MAC addresses not already listed in the /etc/beowulf/config file. This prevents nodes from getting added to the cluster accidentally. This is particularly useful in a cluster with multiple masters, because it enables the Cluster Administrator to control which master responds to a new node request. When you are troubleshooting issues related to the cluster not "seeing" new nodes, one of the first things to check is whether nodeassign is set to "locked".

See the Administrator's Guide for additional information on configuring nodes with the BeoSetup GUI and on manual node configuration.
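
The four methods can be compared with a small Python model. This is a sketch under stated assumptions, not the beoserv implementation: assign_node is a hypothetical function, the node table is modeled as a fixed-length list sized by the nodes keyword with None marking a vacant slot, and "off" or "reserved" placeholders are not modeled.

```python
def assign_node(method, table, mac):
    """Try to give an unknown MAC address a node number.

    table: list of MAC strings or None (vacant), one slot per node.
    Returns the node number, or None when the address is not assigned
    automatically (manual, locked, or a failed append/insert).
    """
    if mac in table:
        return table.index(mac)  # already known
    if method == "insert":
        # Use the lowest vacant node number.
        for number, entry in enumerate(table):
            if entry is None:
                table[number] = mac
                return number
        return None  # table full: falls back to manual handling
    if method == "append":
        # Go one beyond the highest already-assigned node number.
        used = [n for n, entry in enumerate(table) if entry is not None]
        number = used[-1] + 1 if used else 0
        if number < len(table):
            table[number] = mac
            return number
        return None  # highest node already assigned: manual handling
    return None  # "manual" and "locked" never assign automatically
```

Given a four-node table in which nodes 0 and 2 are assigned, "append" places a new address at node 3, while "insert" places it at node 1.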

masterorder nodes IPaddress_primary IPaddress_secondary

The masterorder directive specifies the cluster IP addresses of the primary master node and the secondary master node(s) for a given set of nodes. This is used by the beoserv daemon for Master-Failover (cold reparenting). A compute node's PXE request broadcasts across the cluster network. The primary master node is given masterpxedelay seconds to respond, after which the first secondary master node will respond. If multiple secondary master nodes are specified, then each waits in turn for masterpxedelay seconds for a preferred master to respond. Similarly, the compute node's subsequent DHCP broadcast gets serviced in the same order, with each secondary master waiting masterdelay seconds for a preferred master to respond.

Example:

   masterorder 0,5,10-20 10.1.0.1 10.2.0.1
   masterorder 1-4,21-30 10.2.0.1 10.1.0.1

If master 10.1.0.1 is down or fails to respond to PXE/DHCP requests from compute node 10, then master 10.2.0.1 becomes the primary parent for compute node 10.

masterpxedelay SECS

The masterpxedelay directive specifies the timeout value in seconds for a non-primary master node to delay sending a response to an incoming PXE request. The default value is 5 seconds.

masterdelay SECS

The masterdelay directive specifies the timeout value in seconds for a non-primary master node to delay sending a response to an incoming DHCP request. The default value is 15 seconds.
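
The interaction between masterorder and the two delay values can be sketched in Python. The sketch rests on an assumption that the text above only implies: a master at position k in the list (counting the primary as 0) waits k times the delay before responding. The names expand and response_delays are hypothetical.

```python
def expand(spec):
    """Expand a node list such as "0,5,10-20" into a set of node numbers."""
    nodes = set()
    for part in spec.split(","):
        low, _, high = part.partition("-")
        nodes.update(range(int(low), int(high or low) + 1))
    return nodes

def response_delays(masterorder, node, delay):
    """Return {master_ip: seconds to wait} for a request from node.

    masterorder: [(node_spec, [primary_ip, secondary_ip, ...]), ...]
    delay:       masterpxedelay (PXE) or masterdelay (DHCP), in seconds
    """
    for spec, masters in masterorder:
        if node in expand(spec):
            # Assumption: each master waits one more delay period than
            # the master preferred just before it.
            return {ip: rank * delay for rank, ip in enumerate(masters)}
    return {}
```

Applied to the masterorder example above with a delay of 5 seconds, a request from compute node 10 is answered immediately by 10.1.0.1, and by 10.2.0.1 only after 5 seconds.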

pci vendorid deviceid drivername

The pci directive is used to specify what driver should be used in support of the specified PCI device. A device is identified by a unique vendor ID and device ID pair. The vendor and device IDs can be given either in decimal or in hexadecimal with the "0x" notation. You should have one of these lines for each PCI ID (a vendor ID combined with a device ID) for each device on your compute nodes that is not already recognized. Any module dependencies or arguments should be specified with moddep and modarg.

server transport-protocol port

The server directive is used to specify the port number that beoserv uses for the specified transport protocol (see the transportmode directive). A different port number must be used for each protocol type. The allowable values for the protocol are "mc" (multicast) or "tcp" (unicast). The default port numbers are 10001 for multicast and 1556 for unicast.

transportmode protocol

The transportmode directive is used to specify the transport protocol to be used by beoserv for serving IP addresses, boot files, and libraries to the compute nodes. The valid transport protocols are "mc" (multicast) or "tcp" (unicast). The default value is "tcp".

Examples

iprange 192.168.1.0 192.168.1.50
nodename ipmi-n%N 0.0.1.0

In the above example, the hostname "ipmi-n0" has an address of 192.168.2.0, that is, the compute node's address (192.168.1.0 for compute node 0) plus the IPv4 Offset of 0.0.1.0. The hostname "ipmi-n12" has an address of 192.168.2.12, which is compute node 12's address plus the IPv4 Offset of 0.0.1.0.

nodename ib0-n%N 0.1.0.0 infiniband

The above example defines a hostname for the infiniband interface of each compute node. Using the iprange values in the previous example, the infiniband interface for compute node 0 has a primary hostname of "ib0-n0" and resolves to the address 192.169.1.0: node 0's basic iprange IP address, plus the increment 0.1.0.0. The infiniband interface for compute node 10 has a primary hostname of "ib0-n10" and resolves to the address 192.169.1.10. Each of the "ib0-n%N" hostnames belongs to the "infiniband" netgroup.

nodename computenode%N
nodename cnode%3N

In the above example, the primary hostname for compute node 0 is "computenode0", and the primary hostname for compute node 12 is "computenode12". The second nodename entry defines additional hostname aliases. The FIRST hostname alias will always be the 'dot-number' notation, so compute node 12's first hostname alias is ".12", and the second hostname alias will be "cnode012". The "%3N" specification selects a three-digit, zero-padded field width for the entry.

The following is an example of a complete Beowulf configuration file:

# Beowulf Configuration file

# Network interface used for Beowulf
# Only first argument to interface is important
interface eth1 192.168.1.1 255.255.255.0

# These two should probably agree for most users
iprange 192.168.1.100 192.168.1.107
nodes 8

# Default location of boot images
bootfile /var/beowulf/boot.img
kernelimage /boot/vmlinuz-2.4.17-0.18.12.beo
kernelcommandline apm=power-off

# Default libraries
libraries /lib /usr/lib

# Default file system policies.
fsck full
mkfs if_needed

# beoserv settings
server mc 10001
server tcp 1556
transportmode mc

# Default Modules
bootmodule 3c59x 8139too dmfe eepro100 epic100 hp100 natsemi
bootmodule ne2k-pci pcnet32 sis900 starfire sundance tlan
bootmodule tulip via-rhine winbond-840 yellowfin

# Non-kernel integrated drivers
bootmodule e100 bcm5700 # gm

# Node assignment method
nodeassign append

# PCI Gigabit Ethernet.
#  * AceNIC and SysKonnect firmwares are very large.
#  * Some of these are distributed separate from the kernel
bootmodule dl2k hamachi e1000 ns83820 # acenic sk98lin

node 00:50:8B:D3:25:4D
node 00:50:8B:D3:07:8B
ignore 00:50:8B:D3:31:FB
node 00:50:8B:D3:62:A0
node 00:50:8B:D3:00:66
node 00:50:8B:D3:30:42
node 00:50:8B:D3:98:EA

See Also

beoconfig(1), beonetconf(1), and beosetup(1).

Administrator's Guide