TRACE-HPC is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High Performance Data Analytics). An extremely flexible software environment, along with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.

Compute nodes

TRACE-HPC has three types of compute nodes: "Regular Memory", "Extreme Memory", and GPU.

Regular Memory nodes

Regular Memory (RM) nodes provide extremely powerful general-purpose computing, pre- and post-processing, AI inferencing, and machine learning and data analytics. Most RM nodes contain 256GB of RAM, but 16 of them have 512GB.

RM nodes

|                    | RM 256GB nodes | RM 512GB nodes | trace nodes |
|--------------------|----------------|----------------|-------------|
| Number             | 488 | 16 | 12 |
| CPU                | 2 AMD EPYC 7742 CPUs; 64 cores per CPU, 128 cores per node; 2.25-3.40 GHz | 2 AMD EPYC 7742 CPUs; 64 cores per CPU, 128 cores per node; 2.25-3.40 GHz | 2 AMD EPYC 7713 CPUs; 64 cores per CPU, 128 cores per node; 2.0 GHz |
| RAM                | 256GB | 512GB | |
| Cache              | 256MB L3, 8 memory channels | 256MB L3, 8 memory channels | |
| Node-local storage | 3.84TB NVMe SSD | 3.84TB NVMe SSD | |
| Network            | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter | |

 

Extreme Memory nodes

Extreme Memory (EM) nodes provide 4TB of shared memory for statistics, graph analytics, genome sequence assembly, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.

EM nodes

|                    | EM nodes |
|--------------------|----------|
| Number             | 4 |
| CPU                | 4 Intel Xeon Platinum 8260M "Cascade Lake" CPUs; 24 cores per CPU, 96 cores per node; 2.40-3.90 GHz |
| RAM                | 4TB, DDR4-2933 |
| Cache              | 37.75MB LLC, 6 memory channels |
| Node-local storage | 7.68TB NVMe SSD |
| Network            | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter |
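As a rough illustration of when an EM node applies, the following sketch checks whether an in-memory working set fits within a single node's 4TB of RAM. The 4TB figure comes from the table above; the headroom factor and example matrix sizes are illustrative assumptions, not site policy.

```python
# Back-of-envelope check: does an in-memory working set fit on one
# 4TB Extreme Memory (EM) node? Headroom fraction is an assumption.

EM_NODE_RAM_BYTES = 4 * 1000**4  # 4TB per EM node (from the spec table)

def fits_on_em_node(working_set_bytes, headroom=0.90):
    """Return True if the working set fits within an EM node's RAM,
    keeping (1 - headroom) of it free for the OS and buffers."""
    return working_set_bytes <= EM_NODE_RAM_BYTES * headroom

# A dense 200,000 x 200,000 double-precision matrix needs
# 200000**2 * 8 bytes = 0.32TB and fits; an 800,000 x 800,000 one
# needs 5.12TB and does not.
print(fits_on_em_node(200_000**2 * 8))  # -> True
print(fits_on_em_node(800_000**2 * 8))  # -> False
```

Workloads that fail this check need a distributed-memory implementation, or out-of-core processing against node-local NVMe, rather than an EM node.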

 

GPU nodes

TRACE-HPC's GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing; each 8-GPU V100 node provides a total of 40,960 CUDA cores and 5,120 tensor cores. TRACE's GPU-AI resources have been migrated to TRACE-HPC, adding the DGX-2 and nine more V100 GPU nodes to TRACE-HPC's GPU resources.
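The per-node totals follow directly from the V100's per-GPU core counts (5,120 CUDA cores and 640 tensor cores per GPU; these per-GPU figures come from NVIDIA's V100 specifications, not from this page):

```python
# Aggregate core counts for a V100 node. Per-GPU counts are NVIDIA
# V100 specifications (assumed here, not stated on this page).
CUDA_CORES_PER_V100 = 5_120
TENSOR_CORES_PER_V100 = 640

def node_core_totals(gpus_per_node):
    """Return (total CUDA cores, total tensor cores) for one node."""
    return (gpus_per_node * CUDA_CORES_PER_V100,
            gpus_per_node * TENSOR_CORES_PER_V100)

print(node_core_totals(8))   # 8-GPU V100 node  -> (40960, 5120)
print(node_core_totals(16))  # 16-GPU DGX-2     -> (81920, 10240)
```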

GPU nodes

|                    | V100-32GB nodes | V100-16GB nodes | DGX-2 |
|--------------------|-----------------|-----------------|-------|
| Number             | 24 | 9 | 1 |
| GPUs per node      | 8 NVIDIA Tesla V100-32GB SXM2 | 8 NVIDIA V100-16GB | 16 NVIDIA Volta V100-32GB |
| GPU memory         | 32GB per GPU, 256GB total per node | 16GB per GPU, 128GB total per node | 32GB per GPU, 512GB total per node |
| GPU performance    | 1 Pf/s tensor | | |
| CPUs               | 2 Intel Xeon Gold 6248 "Cascade Lake" CPUs; 20 cores per CPU, 40 cores per node; 2.50-3.90 GHz | 2 Intel Xeon Gold 6148 CPUs; 20 cores per CPU, 40 cores per node; 2.4-3.7 GHz | 2 Intel Xeon Platinum 8168 CPUs; 24 cores per CPU, 48 cores total; 2.7-3.7 GHz |
| RAM                | 512GB, DDR4-2933 | 192GB, DDR4-2666 | 1.5TB, DDR4-2666 |
| Interconnect       | NVLink | PCIe | NVLink |
| Cache              | 27.5MB LLC, 6 memory channels | | 33MB |
| Node-local storage | 7.68TB NVMe SSD | 4 NVMe SSDs, 2TB each (8TB total) | 8 NVMe SSDs, 3.84TB each (~30TB total) |
| Network            | 2 Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapters | | |

Data Management

...

| Memory           | 2TB (64GB RDIMM, 3200MT/s) |
| GPUs             | ~150k CUDA cores |
| Storage ($LOCAL) | 3.2TB NVMe (1.6TB U.2 Gen4, RAID0) |
| Network          | Mellanox ConnectX-6 Dual Port 100Gb InfiniBand |