Scyld ClusterWare HPC: User's Guide
Interacting With the System
One feature of Scyld ClusterWare that traditional Beowulf clusters do not provide is the BProc Distributed Process Space. BProc presents a single, unified process space for the entire cluster, run from the master node, in which you can see and control jobs running on the compute nodes. Because of this process space, you can use standard Unix tools such as top, ps, and kill to work with those jobs. See the Administrator's Guide for more details on BProc.
Scyld ClusterWare also includes a tool called bpstat that can be used to determine which node is running a process. The command bpstat -p lists all currently running processes by PID, along with the number of the node running each process. The following output is an example:

    [user@cluster user] $ bpstat -p
    PID     Node
    6301    0
    6302    1
    6303    0
    6304    2
    6305    1
    6313    2
    6314    3
    6321    3

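If you want to post-process this listing in a script, the two-column layout is easy to parse. The following is a minimal Python sketch, not part of Scyld ClusterWare: the function names `parse_bpstat_p` and `pids_by_node` are hypothetical, and the code assumes the output has exactly the "PID Node" shape shown above. It works on captured text, so it does not call bpstat itself.

```python
from collections import defaultdict

# Sample text copied from the `bpstat -p` example above.
SAMPLE = """\
PID     Node
6301    0
6302    1
6303    0
6304    2
6305    1
6313    2
6314    3
6321    3
"""


def parse_bpstat_p(text):
    """Return {pid: node} from text shaped like the `bpstat -p` listing.

    Hypothetical helper: assumes one "<pid> <node>" pair per line.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a pair.
        if len(fields) != 2:
            continue
        try:
            pid, node = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # e.g. the "PID Node" header line
        mapping[pid] = node
    return mapping


def pids_by_node(mapping):
    """Group PIDs by the node that runs them, with PIDs in ascending order."""
    grouped = defaultdict(list)
    for pid, node in sorted(mapping.items()):
        grouped[node].append(pid)
    return dict(grouped)


pid_to_node = parse_bpstat_p(SAMPLE)
by_node = pids_by_node(pid_to_node)
for node in sorted(by_node):
    print(f"node {node}: {by_node[node]}")
```

In practice you would feed the function the captured output of bpstat -p rather than the hard-coded sample.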
The command bpstat -P (with an uppercase "P" instead of a lowercase "p") tells bpstat to take the output of ps and reformat it, prepending a column that shows the node number. The following two examples show the difference between the output of ps and the output of bpstat -P.
Example output from ps:

    [user@cluster user] $ ps xf
      PID TTY      STAT   TIME COMMAND
     6503 pts/2    S      0:00 bash
     6665 pts/2    R      0:00 ps xf
     6471 pts/3    S      0:00 bash
     6538 pts/3    S      0:00 /bin/sh /usr/bin/linpack
     6553 pts/3    S      0:00  \_ /bin/sh /usr/bin/mpirun -np 5 /tmp/xhpl
     6654 pts/3    R      0:03      \_ /tmp/xhpl -p4pg /tmp/PI6553 -p4wd /tmp
     6655 pts/3    S      0:00          \_ /tmp/xhpl -p4pg /tmp/PI6553 -p4wd /tmp
     6656 pts/3    RW     0:01          \_ [xhpl]
     6658 pts/3    SW     0:00          |   \_ [xhpl]
     6657 pts/3    RW     0:01          \_ [xhpl]
     6660 pts/3    SW     0:00          |   \_ [xhpl]
     6659 pts/3    RW     0:01          \_ [xhpl]
     6662 pts/3    SW     0:00          |   \_ [xhpl]
     6661 pts/3    SW     0:00          \_ [xhpl]
     6663 pts/3    SW     0:00              \_ [xhpl]

Example of the same ps output when run through bpstat -P instead:

    [user@cluster user] $ ps xf | bpstat -P
    NODE   PID TTY      STAT   TIME COMMAND
          6503 pts/2    S      0:00 bash
          6666 pts/2    R      0:00 ps xf
          6667 pts/2    R      0:00 bpstat -P
          6471 pts/3    S      0:00 bash
          6538 pts/3    S      0:00 /bin/sh /usr/bin/linpack
          6553 pts/3    S      0:00  \_ /bin/sh /usr/bin/mpirun -np 5 /tmp/xhpl
          6654 pts/3    R      0:06      \_ /tmp/xhpl -p4pg /tmp/PI6553 -p4wd /tmp
          6655 pts/3    S      0:00          \_ /tmp/xhpl -p4pg /tmp/PI6553 -p4wd /tmp
    0     6656 pts/3    RW     0:06          \_ [xhpl]
    0     6658 pts/3    SW     0:00          |   \_ [xhpl]
    1     6657 pts/3    RW     0:06          \_ [xhpl]
    1     6660 pts/3    SW     0:00          |   \_ [xhpl]
    2     6659 pts/3    RW     0:06          \_ [xhpl]
    2     6662 pts/3    SW     0:00          |   \_ [xhpl]
    3     6661 pts/3    SW     0:00          \_ [xhpl]
    3     6663 pts/3    SW     0:00              \_ [xhpl]

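To make the reformatting step concrete, the following Python sketch imitates the visible effect of bpstat -P on a few lines of ps output. It is only an illustration under stated assumptions and is not how bpstat is implemented: the function `prepend_node_column`, the column width, and the hard-coded PID-to-node table are all hypothetical, and the sample lines are simplified from the example above. Processes with no entry in the table get a blank NODE column.

```python
# Simplified ps lines based on the example above (tree markers omitted).
PS_SAMPLE = """\
  PID TTY      STAT   TIME COMMAND
 6503 pts/2    S      0:00 bash
 6654 pts/3    R      0:06 /tmp/xhpl -p4pg /tmp/PI6553 -p4wd /tmp
 6656 pts/3    RW     0:06 [xhpl]
 6657 pts/3    RW     0:06 [xhpl]
"""

# Hypothetical lookup table; the values match the NODE column shown above.
NODE_OF = {6656: 0, 6657: 1}


def prepend_node_column(ps_text, pid_to_node, width=5):
    """Prefix each ps line with the node number of its PID, if known.

    Hypothetical helper: assumes the PID is the first field on each line
    and that the first line is the ps header.
    """
    out = []
    for index, line in enumerate(ps_text.splitlines()):
        fields = line.split()
        if index == 0:
            prefix = "NODE"  # label the new column on the header line
        elif fields and fields[0].isdigit() and int(fields[0]) in pid_to_node:
            prefix = str(pid_to_node[int(fields[0])])
        else:
            prefix = ""  # no known node: leave the column blank
        out.append(f"{prefix:<{width}}{line}".rstrip())
    return "\n".join(out)


annotated = prepend_node_column(PS_SAMPLE, NODE_OF)
print(annotated)
```

The sketch leaves the original ps columns untouched and only adds a fixed-width prefix, which mirrors how the NODE column appears in front of the unchanged ps fields in the example.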
For additional information on bpstat, see the section on monitoring node status earlier in this chapter. For information on the bpstat command line options, see the Reference Guide.