TRACE-HPC is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High Performance Data Analytics). An extremely flexible software environment, along with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.

Compute nodes

TRACE-HPC has three types of compute nodes: "Regular Memory", "Extreme Memory", and GPU.

Regular Memory nodes

Regular Memory (RM) nodes provide extremely powerful general-purpose computing, pre- and post-processing, AI inferencing, and machine learning and data analytics. Most RM nodes contain 256GB of RAM, but 16 of them have 512GB.

RM nodes

Number: 488 nodes with 256GB RAM; 16 nodes with 512GB RAM
CPU: 2 AMD EPYC 7742 CPUs, 64 cores per CPU, 128 cores per node, 2.25-3.40 GHz
RAM: 256GB or 512GB
Cache: 256MB L3, 8 memory channels
Node-local storage: 3.84TB NVMe SSD
Network: Mellanox ConnectX-6-HDR Infiniband 200Gb/s Adapter
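
As a quick sanity check on the figures above, the aggregate RM capacity follows directly from the table. The Python sketch below is illustrative only; the node counts, core counts, and memory sizes are taken from the table, and it assumes the 16 large-memory nodes are counted separately from the 488 standard nodes (504 RM nodes in total).

    # Hedged sketch: aggregate RM capacity implied by the table above.
    # Assumes the 488 x 256GB and 16 x 512GB nodes are separate groups.
    CORES_PER_NODE = 128  # 2 CPUs x 64 cores
    rm_groups = [
        {"nodes": 488, "ram_gb": 256},
        {"nodes": 16,  "ram_gb": 512},
    ]

    total_nodes = sum(g["nodes"] for g in rm_groups)
    total_cores = total_nodes * CORES_PER_NODE
    total_ram_tb = sum(g["nodes"] * g["ram_gb"] for g in rm_groups) / 1024

    print(f"RM nodes: {total_nodes}, cores: {total_cores}, RAM: {total_ram_tb:.0f} TB")
    # -> RM nodes: 504, cores: 64512, RAM: 130 TB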

Extreme Memory nodes

Extreme Memory (EM) nodes provide 4TB of shared memory for statistics, graph analytics, genome sequence assembly, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.

EM nodes

Number: 4
CPU: 4 Intel Xeon Platinum 8260M "Cascade Lake" CPUs, 24 cores per CPU, 96 cores per node, 2.40-3.90 GHz
RAM: 4TB, DDR4-2933
Cache: 37.75MB LLC, 6 memory channels
Node-local storage: 7.68TB NVMe SSD; 100TB VAST storage array
Network: Mellanox ConnectX-6-HDR Infiniband 200Gb/s Adapter

 
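For a rough sense of how much memory each core can address on an EM node, the sketch below divides the 4TB of RAM by the 96 cores. This is purely illustrative arithmetic based on the table above, not a scheduling guarantee.

    # Hedged sketch: per-core memory on an EM node, from the table above.
    EM_RAM_TB = 4
    EM_CORES = 96

    ram_gb = EM_RAM_TB * 1024
    print(f"EM node: {ram_gb} GB across {EM_CORES} cores "
          f"= {ram_gb / EM_CORES:.1f} GB per core")
    # -> EM node: 4096 GB across 96 cores = 42.7 GB per core
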

GPU nodes

TRACE-HPC's GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing, with a total of 40,960 CUDA cores and 5,120 tensor cores. TRACE's GPU-AI resources have been migrated to TRACE-HPC, adding the DGX-2 and nine more V100 GPU nodes to TRACE-HPC's GPU resources.
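
The 40,960 CUDA cores and 5,120 tensor cores quoted above correspond to one eight-GPU node's worth of V100s, given NVIDIA's published per-GPU figures (5,120 CUDA cores and 640 tensor cores per V100, which are an assumption here rather than numbers stated on this page). A minimal sketch of the arithmetic:

    # Hedged sketch: per-node CUDA/tensor core totals for an 8 x V100 node.
    # Per-GPU figures are NVIDIA's published V100 specs, assumed here;
    # the page itself only quotes the per-node totals.
    GPUS_PER_NODE = 8
    CUDA_CORES_PER_V100 = 5120
    TENSOR_CORES_PER_V100 = 640

    print(f"CUDA cores per node:   {GPUS_PER_NODE * CUDA_CORES_PER_V100}")    # 40960
    print(f"Tensor cores per node: {GPUS_PER_NODE * TENSOR_CORES_PER_V100}")  # 5120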

V100-32GB SXM2 nodes
Number: 24
GPUs per node: 8 NVIDIA Tesla V100-32GB SXM2
GPU memory: 32GB per GPU, 256GB total per node
GPU performance: 1 Pf/s tensor
CPUs: 2 Intel Xeon Gold 6248 "Cascade Lake" CPUs, 20 cores per CPU, 40 cores per node, 2.50-3.90 GHz
RAM: 512GB, DDR4-2933
Interconnect: NVLink
Cache: 27.5MB LLC, 6 memory channels
Node-local storage: 7.68TB NVMe SSD
Network: 2 Mellanox ConnectX-6 HDR Infiniband 200Gb/s Adapters

V100-16GB nodes
Number: 9
GPUs per node: 8 NVIDIA V100-16GB
GPU memory: 16GB per GPU, 128GB total per node
CPUs: 2 Intel Xeon Gold 6148 CPUs, 20 cores per CPU, 40 cores per node, 2.4-3.7 GHz
RAM: 192GB, DDR4-2666
Interconnect: PCIe
Node-local storage: 4 NVMe SSDs, 2TB each (8TB total)

DGX-2 node
Number: 1
GPUs per node: 16 NVIDIA Volta V100-32GB
GPU memory: 32GB per GPU, 512GB total per node
CPUs: 2 Intel Xeon Platinum 8168 CPUs, 24 cores per CPU, 48 cores per node, 2.7-3.7 GHz
RAM: 1.5TB, DDR4-2666
Interconnect: NVLink
Cache: 33MB
Node-local storage: 8 NVMe SSDs, 3.84TB each (~30TB total)

 
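Putting the three node types together, the aggregate GPU pool implied by the table works out as follows. The sketch is illustrative only and simply multiplies the node counts, GPUs per node, and per-GPU memory listed above.

    # Hedged sketch: aggregate GPU count and GPU memory implied by the table above.
    gpu_node_types = [
        {"name": "V100-32GB SXM2", "nodes": 24, "gpus_per_node": 8,  "gpu_mem_gb": 32},
        {"name": "V100-16GB",      "nodes": 9,  "gpus_per_node": 8,  "gpu_mem_gb": 16},
        {"name": "DGX-2",          "nodes": 1,  "gpus_per_node": 16, "gpu_mem_gb": 32},
    ]

    total_gpus = sum(t["nodes"] * t["gpus_per_node"] for t in gpu_node_types)
    total_gpu_mem_tb = sum(
        t["nodes"] * t["gpus_per_node"] * t["gpu_mem_gb"] for t in gpu_node_types
    ) / 1024

    print(f"Total GPUs: {total_gpus}, total GPU memory: {total_gpu_mem_tb:.1f} TB")
    # -> Total GPUs: 280, total GPU memory: 7.6 TB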

Data Management

...

Infiniband