Heracles Architecture - Multi-Core Cluster

Server name: heracles.ucdenver.pvt


The Heracles multi-core cluster consists of the following primary components:

Figure 1. Heracles Architecture

Each node in the cluster has 2 x Intel Xeon E5-2650 v4 processors, for a total of 24 cores per node (12 cores per processor).

Figure 2. Intel Xeon Processor E5-2600 v4 Product Family

Cache Hierarchy

Master Node - node 1

The master node, node 1 in the cluster, manages all computing resources and operations on the Heracles cluster. It is also the machine that users log into to create, edit, and compile programs and to submit them for execution on the compute nodes.

Users do not run their programs on the master node. Repeat: user programs MUST NOT be run on the master node. Instead, they must be submitted to the compute nodes for execution.

The master node on the Heracles cluster features:

Compute Nodes - node 2 to node 17

Compute nodes execute the jobs submitted by users. From the master node, users may submit programs to execute them on one or more compute nodes.

There are 16 compute nodes (nodes 2 to 17) on Heracles.
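
The node numbering and core counts above can be tallied in a few lines. This is a minimal sketch in Python, not part of the cluster software: all variable names are illustrative, and the per-node core count is taken from the processor description earlier on this page.

```python
# Illustrative tally of the Heracles node layout described on this page.
# All names are hypothetical; only the numbers come from the text.

MASTER_NODE = 1
COMPUTE_NODES = list(range(2, 18))   # nodes 2 to 17, inclusive
GPU_NODE = 18

SOCKETS_PER_NODE = 2                 # 2 x Intel Xeon E5-2650 v4
CORES_PER_SOCKET = 12
CORES_PER_NODE = SOCKETS_PER_NODE * CORES_PER_SOCKET

num_compute_nodes = len(COMPUTE_NODES)
total_compute_cores = num_compute_nodes * CORES_PER_NODE

print(f"Cores per node:         {CORES_PER_NODE}")
print(f"Compute nodes:          {num_compute_nodes}")
print(f"Cores on compute nodes: {total_compute_cores}")
```

The total counts only the cores on the sixteen compute nodes; it does not include the master node or node 18.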

The compute nodes are housed in 4 x NumberSmasher-4X Intel Xeon Twin Servers, each containing four compute nodes, for a total of 16 compute nodes.
The sixteen compute nodes together have:

Node 18 with 4 x Nvidia Tesla P100

Each Nvidia Tesla P100-SXM2-16GB has the following features:

Each Nvidia Tesla P100-SXM2-16GB has the following capacity:

CUDA Driver Version / Runtime Version: 8.0 / 8.0
CUDA Capability Major/Minor version number:
Total amount of global memory: 16276 MBytes (17066885120 bytes)
(56) Multiprocessors, (64) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 405 MHz (0.41 GHz)
Memory Clock rate: 715 MHz
L2 Cache Size: 4194304 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block:
Warp size:
Maximum number of threads per multiprocessor:
Maximum number of threads per block:
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
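
Some figures in this listing are derived from others, which gives a quick way to sanity-check it. The sketch below, in Python with illustrative variable names, recomputes the CUDA core count and the global memory size in MBytes from the values listed above, then totals them over the four GPUs on node 18. It assumes that an MByte here means 2^20 bytes and that the reported figure is truncated to a whole number.

```python
# Sanity check of the Tesla P100 figures listed above.
# Variable names are illustrative; the numbers come from the listing.

MULTIPROCESSORS = 56
CUDA_CORES_PER_MP = 64
GLOBAL_MEMORY_BYTES = 17066885120
GPUS_ON_NODE_18 = 4

cuda_cores = MULTIPROCESSORS * CUDA_CORES_PER_MP

# Assumption: 1 MByte = 2**20 bytes, truncated to an integer.
global_memory_mbytes = GLOBAL_MEMORY_BYTES // 2**20

total_cuda_cores = GPUS_ON_NODE_18 * cuda_cores
total_memory_bytes = GPUS_ON_NODE_18 * GLOBAL_MEMORY_BYTES

print(f"CUDA cores per GPU:       {cuda_cores}")
print(f"Global memory per GPU:    {global_memory_mbytes} MBytes")
print(f"CUDA cores on node 18:    {total_cuda_cores}")
print(f"Global memory on node 18: {total_memory_bytes} bytes")
```

Both per-GPU results agree with the listing: 3584 CUDA cores and 16276 MBytes of global memory.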
You can monitor the GPUs on node 18 by using this command: