Heracles
Architecture - Multi-Core Cluster
Server name: heracles.ucdenver.pvt
The multi-core cluster consists of the following primary components:
- Total of 18 nodes, distributed as follows:
  - 1 master node (node 0)
  - 15 compute nodes: node 2 to node 16
  - 1 node (node 1) with 2 x NVIDIA Ada Lovelace GPUs
  - 1 node (node 18) with 4 x NVIDIA Tesla P100 GPUs
- 256GB total memory @ 3200 MHz (16 x 16GB DDR4 3200 MHz ECC/Registered Memory)
- 2 x 960GB Intel SSD D3-S4620 2.5" SATA 6Gbps
- 6 x 3.84TB Intel SSD D3-S4510 2.5" SATA 6Gbps
- /home - 15.36TB RAID6
Master Node
The master node is mainly used to manage all computing resources and operations on the Heracles cluster; it corresponds to node 0 in the cluster. It is also the machine that users log into to create, edit, and compile programs and to submit them for execution on the compute nodes.
Users do not run their programs on the master node. Repeat: user programs MUST NOT be run on the master node. Instead, they must be submitted to the compute nodes for execution, as sketched below.
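For illustration only: assuming the cluster uses a SLURM-style batch scheduler (the scheduler is not named in this section), a minimal submission from the master node could look like the following. The script name, resource requests, and program name are all hypothetical placeholders.

  #!/bin/bash
  #SBATCH --job-name=my_job          # hypothetical job name
  #SBATCH --nodes=1                  # request a single compute node
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=24         # one thread per physical core on a compute node
  ./my_program                       # hypothetical executable built on the master node

Save the script as my_job.sh, then submit it and check its status from the master node:

  sbatch my_job.sh
  squeue -u $USER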
The master node on the Heracles cluster features:
- 2 x Intel Xeon 5317 'Ice Lake-SP' CPUs, 3.0 GHz, 12-core
- 2 x AVX-512 units per core (clock speeds with AVX-512 instructions: 2.3 - 3.4 GHz)
- 18MB L3 Cache
- Hyper-Threading, and Turbo Boost up to 3.6 GHz
Compute Nodes - node 2 to node 16
Compute nodes execute the jobs submitted by users. From the master node, users may submit programs for execution on one or more compute nodes.
There are 15 compute nodes (nodes 2 to 16) on Heracles.
- Each node has 2 x Intel Xeon E5-2650v4 Broadwell-EP 2.20 GHz twelve-core CPUs
- Supports Hyper-Threading, i.e., each core can run two threads, giving a total of 48 threads per node (2 processors x 12 cores per processor x 2 threads per core); see the verification sketch after the totals below
- 30MB L3 Cache, DDR4-2400, 9.6 GT/sec QPI, 105W
- 128GB total memory per node @ 2400MHz
- 120GB Intel DC S3510 2.5" SATA 6Gbps MLC SSD (16nm) per node
- SATA 6Gb/s interface (supports 3Gb/s)
The fifteen compute nodes together provide:
- 360 cores (15 nodes x 2 processors per node x 12 cores per processor)
- 720 threads (360 cores x 2 hyper-threads per core)
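The per-node counts behind these totals can be checked with standard Linux tools; a minimal sketch, using node2 as an example hostname taken from the commands later on this page:

  ssh node2 lscpu | grep -E '^CPU\(s\)|Thread|Core|Socket'
  ssh node2 nproc --all

Consistent with the specifications above, this should report 2 sockets, 12 cores per socket, 2 threads per core, and 48 logical CPUs per compute node.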
Node 1 with 2 x GPUs - Nvidia Ada Lovelace
- 2 x Intel Xeon 5317 'Ice Lake-SP' CPUs, 3.0 GHz, 12-core
- Mellanox ConnectX-6 VPI Single-Port QSFP56 HCA, HDR InfiniBand (200Gb/s) and 200GigE
- 2 x NVIDIA Ada "Lovelace" L40S PCI-E 48GB ECC Passive GPU Accelerator / Graphics Card
  - CUDA Driver Version / Runtime Version: 12.5 / 12.4
  - CUDA Capability Major/Minor version number: 8.9
  - Total amount of global memory: 45495 MBytes (47704637440 bytes)
  - (142) Multiprocessors, (128) CUDA Cores/MP: 18176 CUDA Cores
  - GPU Max Clock rate: 2520 MHz (2.52 GHz)
- For detailed information about the GPUs on this node, run the following command:
  ssh node2 /usr/local/cuda/samples/Samples/1_Utilities/deviceQuery/deviceQuery
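The figures listed above (device name, memory size, and maximum clock) can also be cross-checked through nvidia-smi's query interface; a minimal sketch, reusing the hostname from the deviceQuery command above:

  ssh node2 nvidia-smi --query-gpu=index,name,memory.total,clocks.max.sm --format=csv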
Configuration of Node 18 with 4 x GPUs - Nvidia Tesla P100
- 2 x Intel Xeon E5-2650v4 Broadwell-EP 2.20 GHz 12-core CPUs
  - 30MB L3 Cache, DDR4-2400, 9.6 GT/sec QPI, 105W
  - Supports Hyper-Threading and Turbo Boost up to 2.9 GHz
- 4 x NVIDIA Tesla P100 16GB "Pascal" SXM2 GPU Accelerators
  - 3,584 CUDA cores per GPU: (56) Multiprocessors, (64) CUDA Cores/MP
  - Total of 14,336 CUDA cores across the four GPUs
  - 16GB high-bandwidth HBM2 memory per GPU (720 GB/sec peak bandwidth)
  - SXM2 form factor with NVLink interconnect support
  - GP100 GPU chip with NVIDIA-certified passive heatsink
- For detailed information about the GPUs on node18, run the following command:
  ssh node18 /usr/local/cuda/samples/Samples/1_Utilities/deviceQuery/deviceQuery
- Monitor the GPUs on node2 and node18 by using the following commands:
  ssh node2 nvidia-smi
  ssh node18 nvidia-smi
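For continuous monitoring rather than a one-off snapshot, nvidia-smi's built-in loop option can be used; a minimal sketch (the 5-second interval and the hostname are just examples):

  ssh node18 nvidia-smi -l 5

Press Ctrl-C to stop the loop.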