TRACE-HPC is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High Performance Data Analytics). An extremely flexible software environment, along with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.

Compute nodes

TRACE-HPC has three types of compute nodes: "Regular Memory", "Extreme Memory", and GPU.

Regular Memory nodes

Regular Memory (RM) nodes provide extremely powerful general-purpose computing, pre- and post-processing, AI inferencing, and machine learning and data analytics. Most RM nodes contain 256GB of RAM, but 16 of them have 512GB.

RM nodes

|                    | RM 256GB nodes | RM 512GB nodes | trace nodes |
|--------------------|----------------|----------------|-------------|
| Number             | 488 | 16 | 12 |
| CPU                | 2 AMD EPYC 7742 CPUs; 64 cores per CPU, 128 cores per node; 2.25-3.40 GHz | 2 AMD EPYC 7742 CPUs; 64 cores per CPU, 128 cores per node; 2.25-3.40 GHz | 2 AMD EPYC 7713 CPUs; 64 cores per CPU, 128 cores per node; 2.0 GHz |
| RAM                | 256GB | 512GB | |
| Cache              | 256MB L3, 8 memory channels | 256MB L3, 8 memory channels | |
| Node-local storage | 3.84TB NVMe SSD | 3.84TB NVMe SSD | |
| Network            | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter | |

 

Extreme Memory nodes

Extreme Memory (EM) nodes provide 4TB of shared memory for statistics, graph analytics, genome sequence assembly, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.

EM nodes

|                    | EM nodes |
|--------------------|----------|
| Number             | 4 |
| CPU                | 4 Intel Xeon Platinum 8260M "Cascade Lake" CPUs; 24 cores per CPU, 96 cores per node; 2.40-3.90 GHz |
| RAM                | 4TB, DDR4-2933 |
| Cache              | 37.75MB LLC, 6 memory channels |
| Node-local storage | 7.68TB NVMe SSD |
| Network            | Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter |
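As a rough illustration of when an EM node applies, the following sketch checks whether an in-memory working set fits within a single node's 4TB of RAM. The 4TB figure comes from the table above; the headroom factor and example matrix sizes are illustrative assumptions, not site policy.

```python
# Back-of-envelope check: does an in-memory working set fit on one
# 4TB Extreme Memory (EM) node? Headroom fraction is an assumption.

EM_NODE_RAM_BYTES = 4 * 1000**4  # 4TB per EM node (from the spec table)

def fits_on_em_node(working_set_bytes, headroom=0.90):
    """Return True if the working set fits within an EM node's RAM,
    keeping (1 - headroom) of it free for the OS and buffers."""
    return working_set_bytes <= EM_NODE_RAM_BYTES * headroom

# A dense 200,000 x 200,000 double-precision matrix needs
# 200000**2 * 8 bytes = 0.32TB and fits; an 800,000 x 800,000 one
# needs 5.12TB and does not.
print(fits_on_em_node(200_000**2 * 8))  # -> True
print(fits_on_em_node(800_000**2 * 8))  # -> False
```

Workloads that fail this check need a distributed-memory implementation, or out-of-core processing against node-local NVMe, rather than an EM node.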

 

GPU nodes

TRACE-HPC's GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing; each 8-GPU V100 node provides a total of 40,960 CUDA cores and 5,120 tensor cores. TRACE's GPU-AI resources have been migrated to TRACE-HPC, adding the DGX-2 and nine more V100 GPU nodes to TRACE-HPC's GPU resources.
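The per-node totals follow directly from the V100's per-GPU core counts (5,120 CUDA cores and 640 tensor cores per GPU; these per-GPU figures come from NVIDIA's V100 specifications, not from this page):

```python
# Aggregate core counts for a V100 node. Per-GPU counts are NVIDIA
# V100 specifications (assumed here, not stated on this page).
CUDA_CORES_PER_V100 = 5_120
TENSOR_CORES_PER_V100 = 640

def node_core_totals(gpus_per_node):
    """Return (total CUDA cores, total tensor cores) for one node."""
    return (gpus_per_node * CUDA_CORES_PER_V100,
            gpus_per_node * TENSOR_CORES_PER_V100)

print(node_core_totals(8))   # 8-GPU V100 node  -> (40960, 5120)
print(node_core_totals(16))  # 16-GPU DGX-2     -> (81920, 10240)
```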

GPU nodes

|                    | V100-32GB nodes | V100-16GB nodes | DGX-2 |
|--------------------|-----------------|-----------------|-------|
| Number             | 24 | 9 | 1 |
| GPUs per node      | 8 NVIDIA Tesla V100-32GB SXM2 | 8 NVIDIA V100-16GB | 16 NVIDIA Volta V100-32GB |
| GPU memory         | 32GB per GPU, 256GB total per node | 16GB per GPU, 128GB total per node | 32GB per GPU, 512GB total per node |
| GPU performance    | 1 Pf/s tensor | | |
| CPUs               | 2 Intel Xeon Gold 6248 "Cascade Lake" CPUs; 20 cores per CPU, 40 cores per node; 2.50-3.90 GHz | 2 Intel Xeon Gold 6148 CPUs; 20 cores per CPU, 40 cores per node; 2.4-3.7 GHz | 2 Intel Xeon Platinum 8168 CPUs; 24 cores per CPU, 48 cores total; 2.7-3.7 GHz |
| RAM                | 512GB, DDR4-2933 | 192GB, DDR4-2666 | 1.5TB, DDR4-2666 |
| Interconnect       | NVLink | PCIe | NVLink |
| Cache              | 27.5MB LLC, 6 memory channels | | 33MB |
| Node-local storage | 7.68TB NVMe SSD | 4 NVMe SSDs, 2TB each (8TB total) | 8 NVMe SSDs, 3.84TB each (~30TB total) |
| Network            | 2 Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapters | | |

Data Management

...

| Memory           | 2TB (64GB RDIMM, 3200MT/s) |
| GPUs             | ~150k CUDA cores |
| Storage ($LOCAL) | 3.2TB NVMe (1.6TB U.2 Gen4, RAID0) |
| Network          | Mellanox ConnectX-6 Dual Port 100Gb InfiniBand |