Heracles
Architecture - Multi-Core Cluster
Server name: heracles.ucdenver.pvt
The multi-core cluster consists of the following primary components:
- Total of 18 nodes, distributed as follows:
  - 1 master node (node 0)
  - 15 compute nodes: node 2 to node 16
  - 1 node (node 1) with 2 x NVIDIA Ada Lovelace GPUs
  - 1 node (node 18) with 4 x NVIDIA Tesla P100 GPUs
- 256GB total memory @ 3200 MHz (16 x 16GB DDR4 3200 MHz ECC/Registered Memory)
- 2 x 960GB Intel SSD D3-S4620 2.5" SATA 6Gbps
- 6 x 3.84TB Intel SSD D3-S4510 2.5" SATA 6Gbps
- /home - 15.36TB RAID6
Master Node
The master node is mainly used to manage all computing resources and operations on the Heracles cluster; it corresponds to node 0 in the cluster. It is also the machine that users log into to create, edit, and compile programs and to submit them for execution on the compute nodes.
Users do not run their programs on the master node. Repeat: user programs MUST NOT be run on the master node. Instead, they must be submitted to the compute nodes for execution, as sketched below.
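For illustration only: assuming the cluster uses a SLURM-style batch scheduler (the scheduler is not named in this section), a minimal submission from the master node could look like the following. The script name, resource requests, and program name are all hypothetical placeholders.

  #!/bin/bash
  #SBATCH --job-name=my_job          # hypothetical job name
  #SBATCH --nodes=1                  # request a single compute node
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=24         # one thread per physical core on a compute node
  ./my_program                       # hypothetical executable built on the master node

Save the script as my_job.sh, then submit it and check its status from the master node:

  sbatch my_job.sh
  squeue -u $USER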
The master node on the Heracles cluster features:
- 2 x Intel Xeon 5317 'Ice Lake-SP' CPUs, 3.0 GHz, 12-core
- 2 x AVX-512 units per core (clock speeds with AVX-512 instructions: 2.3 - 3.4 GHz)
- 18MB L3 Cache
- Hyper-Threading, and Turbo Boost up to 3.6 GHz
Compute Nodes - node 2 to node 16
Compute nodes execute the jobs submitted by users. From the master node, users may submit programs for execution on one or more compute nodes.
There are 15 compute nodes (nodes 2 to 16) on Heracles.
- Each node has 2 x Intel Xeon E5-2650v4 Broadwell-EP 2.20 GHz twelve-core CPUs
- Supports Hyper-Threading, i.e., each core can run two threads, giving a total of 48 threads per node (2 processors x 12 cores per processor x 2 threads per core); see the verification sketch after the totals below
- 30MB L3 Cache, DDR4-2400, 9.6 GT/sec QPI, 105W
- 128GB total memory per node @ 2400MHz
- 120GB Intel DC S3510 2.5" SATA 6Gbps MLC SSD (16nm) per node
- SATA 6Gb/s interface (supports 3Gb/s)
The fifteen compute nodes together provide:
- 360 cores (15 nodes x 2 processors per node x 12 cores per processor)
- 720 threads (360 cores x 2 hyper-threads per core)
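The per-node counts behind these totals can be checked with standard Linux tools; a minimal sketch, using node2 as an example hostname taken from the commands later on this page:

  ssh node2 lscpu | grep -E '^CPU\(s\)|Thread|Core|Socket'
  ssh node2 nproc --all

Consistent with the specifications above, this should report 2 sockets, 12 cores per socket, 2 threads per core, and 48 logical CPUs per compute node.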
Node 1 with 2 x GPUs - Nvidia Ada Lovelace
- 2 x Intel Xeon 5317 'Ice Lake-SP' CPUs, 3.0 GHz, 12-core
- Mellanox ConnectX-6 VPI Single-Port QSFP56 HCA, HDR InfiniBand (200Gb/s) and 200GigE
- 2 x NVIDIA Ada "Lovelace" L40S PCI-E 48GB ECC Passive GPU Accelerator / Graphics Card
  - CUDA Driver Version / Runtime Version: 12.5 / 12.4
  - CUDA Capability Major/Minor version number: 8.9
  - Total amount of global memory: 45495 MBytes (47704637440 bytes)
  - (142) Multiprocessors, (128) CUDA Cores/MP: 18176 CUDA Cores
  - GPU Max Clock rate: 2520 MHz (2.52 GHz)
- For detailed information about the GPUs on this node, run the following command:
  ssh node2 /usr/local/cuda/samples/Samples/1_Utilities/deviceQuery/deviceQuery
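The figures listed above (device name, memory size, and maximum clock) can also be cross-checked through nvidia-smi's query interface; a minimal sketch, reusing the hostname from the deviceQuery command above:

  ssh node2 nvidia-smi --query-gpu=index,name,memory.total,clocks.max.sm --format=csv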
Configuration of Node 18 with 4 x GPUs - Nvidia Tesla P100
- 2 x Intel Xeon E5-2650v4 Broadwell-EP 2.20 GHz 12-core CPUs
  - 30MB L3 Cache, DDR4-2400, 9.6 GT/sec QPI, 105W
  - Supports Hyper-Threading and Turbo Boost up to 2.9 GHz
- 4 x NVIDIA Tesla P100 16GB "Pascal" SXM2 GPU Accelerators
  - 3,584 CUDA cores per GPU: (56) Multiprocessors, (64) CUDA Cores/MP
  - Total of 14,336 CUDA cores across the four GPUs
  - 16GB high-bandwidth HBM2 memory per GPU (720 GB/sec peak bandwidth)
  - SXM2 form factor with NVLink interconnect support
  - GP100 GPU chip with NVIDIA-certified passive heatsink
- For detailed information about the GPUs on node18, run the following command:
  ssh node18 /usr/local/cuda/samples/Samples/1_Utilities/deviceQuery/deviceQuery
- Monitor the GPUs on node2 and node18 by using the following commands:
  ssh node2 nvidia-smi
  ssh node18 nvidia-smi
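For continuous monitoring rather than a one-off snapshot, nvidia-smi's built-in loop option can be used; a minimal sketch (the 5-second interval and the hostname are just examples):

  ssh node18 nvidia-smi -l 5

Press Ctrl-C to stop the loop.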