Scyld ClusterWare HPC
Administrator's Guide
Copyright
© 1999 - 2009 Penguin Computing, Inc.
Table of Contents
Preface
Feedback
Scyld ClusterWare Design Description
System Architecture
System Hardware Context
Network Topologies
System Data Flow
System Software Context
System Level Files
Technical Description
Compute Node Boot Procedure
BProc Distributed Process Space
Compute Node Categories
Compute Node States
Miscellaneous Components
Software Components
BeoBoot Tools
BProc Daemons
BProc Clients
ClusterWare Utilities
Configuring the Cluster with BeoSetup
BeoSetup Features
Starting BeoSetup
Full vs. Limited Privileges
Limiting Full Privileges
The BeoSetup Main Window
The File Menu
Configuration File...
Start Cluster
Service Reconfigure
Shutdown Cluster
Exit
The Settings Menu
Configuration
Preferences
Reread Cfg Files
The Help Menu
About...
The Toolbar
Apply
Revert
Node Floppy
Node CD
Config Boot
Configuration
Preferences
The Node List Boxes
The Configured Nodes List
The Unknown Addresses List
The Ignored Addresses List
Configuring the Cluster Manually
Configuration Files
/etc/beowulf/config
/etc/beowulf/fdisk
/etc/beowulf/fstab
Command Line Tools
bpstat
bpctl
node_down
The Kernel Command Line
Useful Command Line Options
Adding New Kernel Modules
Accessing External License Servers
Configuring RSH for Remote Job Execution
RSH from Master to Compute Node
RSH from Compute Node to Master
Configuring SSH for Remote Job Execution
Interconnects
Ethernet
Other Interconnects
Monitoring the Status of the Cluster
Monitoring Utilities
Cluster Monitoring Interfaces
Monitoring Daemons
Using the Data
BeoStatus
BeoStatus File Menu
BeoStatus Modes
beostat and libbeostat
bpstat
Managing Users on the Cluster
Managing User Accounts
Adding New Users
Removing Users
Managing User Groups
Creating a Group
Adding a User to a Group
Removing a Group
Controlling Access to Cluster Resources
What Node Ownership Means
Checking Node Ownership
Setting Node Ownership
Job Batching
Remote Administration and Monitoring
Command Line Tools
X Forwarding
Web-based Administration and Monitoring
Scyld Integrated Management Framework (IMF)
Working with the Web interface
Scyld IMF Components
Nodes
Torque Job and Queue Status
Ganglia
TaskMaster Portal
Settings
Help and Documentation
Advanced Customization for Developers
Architectural overview and data flow
Integrating existing Web interfaces
Adding diagnostic commands
Additional help
Compute Node Boot Options
Compute Node Boot Media
Floppy Disk
CD-ROM
PXE
Linux BIOS
Flash Disk
Changing Boot Settings
Adding Steps to the node_up Script
Per-Node Parameters
Other Per-Node Config Options
Error Logs
Disk Partitioning
Disk Partitioning Concepts
Disk Partitioning with ClusterWare
Master Node
Compute Nodes
Default Partitioning
Master Node
Compute Nodes
Partitioning Scenarios
Applying the Default Partitioning
Specifying Manual Partitioning
File Systems
File Systems on a Cluster
Local File Systems
Network/Cluster File Systems
NFS
NFS on Clusters
Configuration of NFS
ROMIO
Reasons to Use ROMIO
Installation and Configuration of ROMIO
Other Cluster File Systems
Supporting Multiple Master Nodes
Static Partitioning of Compute Nodes Among Multiple Masters
Cold Re-parenting of Compute Nodes
Node Migration Commands
Failover
When Compute Nodes Fail
Compute Node Data
When Master Nodes Fail
Protecting an Application from Node Failure
Preventing Node Failure
Load Balancing
Load Balancing in a Scyld Cluster
Mapping Policy
Queuing Policy
Implementing a Scheduling Policy
Extra Tools
TORQUE
Enabling TORQUE
IPMITool
Ganglia
Updating Software On Your Cluster
What Can't Be Updated
Special Directories, Configuration Files, and Scripts
What Resides on the Master Node
/etc/beowulf Directory
/usr/lib/beoboot Directory
/var/beowulf Directory
/var/log/beowulf Directory
What Gets Put on the Compute Nodes at Boot Time
Site-Local Startup Scripts