TRACE-HPC is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High Performance Data Analytics). An extremely flexible software environment, along with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.
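To illustrate what "two tiers transparently managed as a single namespace" means from a user's point of view, here is a toy model. It is a hypothetical sketch, not Ocean's actual implementation: the class name `TwoTierNamespace`, its methods, and the example path are all invented for illustration. The key idea is that a file keeps the same path whether it sits on disk or on tape, and reading a tape-resident file triggers a recall to disk without the caller doing anything different.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierNamespace:
    """Toy disk + tape store exposed through one set of paths.

    Hypothetical illustration only; not Ocean's real design or API.
    """
    disk: dict = field(default_factory=dict)  # path -> bytes on the fast tier
    tape: dict = field(default_factory=dict)  # path -> bytes on the archive tier

    def write(self, path: str, data: bytes) -> None:
        # New data lands on the disk tier first.
        self.disk[path] = data

    def migrate(self, path: str) -> None:
        # Move a file from disk to tape; its path stays the same.
        self.tape[path] = self.disk.pop(path)

    def read(self, path: str) -> bytes:
        # Callers use the same path regardless of which tier holds the file.
        if path not in self.disk:
            # Transparent recall: copy the file back from tape before serving it.
            self.disk[path] = self.tape[path]
        return self.disk[path]

    def tier_of(self, path: str) -> str:
        return "disk" if path in self.disk else "tape"


ns = TwoTierNamespace()
ns.write("/hypothetical/project/data.bin", b"abc")
ns.migrate("/hypothetical/project/data.bin")
print(ns.tier_of("/hypothetical/project/data.bin"))  # tape
print(ns.read("/hypothetical/project/data.bin"))     # b'abc', recalled to disk
```

A real hierarchical storage system also decides when to migrate files (for example by age or disk pressure) and handles recall latency; this sketch leaves all of that out.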
Compute nodes
TRACE-HPC has three types of compute nodes: "Regular Memory", "Extreme Memory", and GPU.
Regular Memory nodes
TRACE-HPC Regular Memory (RM) nodes provide extremely powerful general-purpose computing, pre- and post-processing, AI inferencing, machine learning, and data analytics. Most RM nodes contain 256GB of RAM, but 16 of them have 512GB.
RM nodes | 256GB RAM | 512GB RAM |
---|---|---|
Number | 488 | 16 |
CPU | 2 AMD EPYC 7742 CPUs, 64 cores per CPU, 128 cores per node, 2.25-3.40 GHz | 2 AMD EPYC 7742 CPUs, 64 cores per CPU, 128 cores per node, 2.25-3.40 GHz |
RAM | 256GB | 512GB |
Cache | 256MB L3, 8 memory channels | 256MB L3, 8 memory channels |
Node-local storage | 3.84TB NVMe SSD | 3.84TB NVMe SSD |
Network | Mellanox ConnectX-6-HDR Infiniband 200Gb/s Adapter | Mellanox ConnectX-6-HDR Infiniband 200Gb/s Adapter |
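When sizing a job for RM nodes, a useful derived figure is how much memory each core gets if every core runs one process. The sketch below computes it from the CPU figures above (2 AMD EPYC 7742 CPUs, 64 cores per CPU) and the two RAM sizes, 256GB and 512GB. It assumes memory is split evenly across all cores and ignores memory reserved by the operating system, so real usable figures are somewhat lower; the function names are hypothetical.

```python
# Per-node CPU figures from the RM table: 2 CPUs x 64 cores each.
CPUS_PER_NODE = 2
CORES_PER_CPU = 64


def cores_per_node(cpus: int = CPUS_PER_NODE, cores_per_cpu: int = CORES_PER_CPU) -> int:
    """Physical cores in one RM node."""
    return cpus * cores_per_cpu


def gb_per_core(ram_gb: float, cores: int) -> float:
    """Nominal RAM per core if memory is split evenly across all cores.

    Assumption: ignores memory reserved by the OS and system services.
    """
    return ram_gb / cores


cores = cores_per_node()
for ram_gb in (256, 512):  # the two RM memory sizes
    print(f"{ram_gb}GB RM node: {cores} cores, {gb_per_core(ram_gb, cores):.1f}GB per core")
```

This prints 2.0GB per core for the 256GB nodes and 4.0GB per core for the 512GB nodes, which is one way to judge whether a memory-hungry code should target the larger RM nodes.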
Extreme Memory nodes
Extreme Memory (EM) nodes provide 4TB of shared memory for statistics, graph analytics, genome sequence assembly, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.
EM nodes | |
---|---|
Number | 4 |
CPU | 4 Intel Xeon Platinum 8260M "Cascade Lake" CPUs, 24 cores per CPU, 96 cores per node, 2.40-3.90 GHz |
RAM | 4TB, DDR4-2933 |
Cache | 37.75MB LLC, 6 memory channels |
Node-local storage | 7.68TB NVMe SSD, 100TB VAST storage array |
Network | Mellanox ConnectX-6-HDR Infiniband 200Gb/s Adapter |
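Because EM nodes exist for shared-memory workloads that cannot be distributed, a natural first question is which node type is the smallest that can hold a job's working set in one node. The helper below is a hypothetical sketch (not a TRACE-HPC tool) that uses the nominal RAM sizes given in this section: 256GB and 512GB RM nodes and 4TB EM nodes. It assumes 4TB means 4 × 1024GB and ignores memory used by the operating system, and it does not consider GPU nodes.

```python
from typing import Optional

# (node type, nominal RAM in GB), ordered from smallest to largest.
# Nominal capacities only: the OS and system services use part of each
# node's memory, so the real usable amount is lower.
NODE_RAM_GB = [
    ("RM (256GB)", 256),
    ("RM (512GB)", 512),
    ("EM (4TB)", 4 * 1024),  # assumption: 4TB interpreted as 4096GB
]


def smallest_fitting_node(required_gb: float) -> Optional[str]:
    """Return the smallest node type whose nominal RAM covers required_gb,
    or None if no single node is large enough."""
    for name, ram_gb in NODE_RAM_GB:
        if required_gb <= ram_gb:
            return name
    return None


for need in (100, 300, 2000, 5000):
    print(need, "GB ->", smallest_fitting_node(need))
```

A job needing 300GB lands on a 512GB RM node, one needing 2000GB requires an EM node, and anything beyond 4TB returns `None`, meaning it needs a distributed-memory approach instead.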
GPU nodes
TRACE-HPC's GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing. Each 8-GPU V100 node provides 40,960 CUDA cores and 5,120 tensor cores. TRACE's GPU-AI resources have been migrated to TRACE-HPC, adding the DGX-2 and nine more V100 GPU nodes to TRACE-HPC's GPU resources.
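The CUDA and tensor core counts quoted above can be checked against NVIDIA's published per-GPU figures for the V100 (5,120 CUDA cores and 640 tensor cores per GPU). The short sketch below multiplies them out; the function name is hypothetical. Eight V100s give exactly 40,960 CUDA cores and 5,120 tensor cores, so those figures describe one 8-GPU node rather than the whole GPU partition.

```python
# Per-GPU core counts for the NVIDIA V100 (NVIDIA's published specifications).
V100_CUDA_CORES = 5_120
V100_TENSOR_CORES = 640


def core_totals(num_gpus: int) -> tuple[int, int]:
    """Return (CUDA cores, tensor cores) for num_gpus V100 GPUs."""
    return num_gpus * V100_CUDA_CORES, num_gpus * V100_TENSOR_CORES


print(core_totals(8))       # one 8-GPU node: (40960, 5120)
print(core_totals(24 * 8))  # all 24 V100-32GB nodes: (983040, 122880)
```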
GPU nodes | V100-32GB nodes | V100-16GB nodes | DGX-2 |
---|---|---|---|
Number | 24 | 9 | 1 |
GPUs per node | 8 NVIDIA Tesla V100-32GB SXM2 | 8 NVIDIA V100-16GB | 16 NVIDIA Volta V100-32GB |
GPU memory | 32GB per GPU, 256GB total/node | 16GB per GPU, 128GB total/node | 32GB per GPU, 512GB total |
GPU performance | 1 Pf/s tensor | | |
CPUs | 2 Intel Xeon Gold 6248 "Cascade Lake" CPUs, 20 cores per CPU, 40 cores per node, 2.50-3.90 GHz | 2 Intel Xeon Gold 6148 CPUs, 20 cores per CPU, 40 cores per node, 2.4-3.7 GHz | 2 Intel Xeon Platinum 8168 CPUs, 24 cores per CPU, 48 cores per node, 2.7-3.7 GHz |
RAM | 512GB, DDR4-2933 | 192GB, DDR4-2666 | 1.5TB, DDR4-2666 |
Interconnect | NVLink | PCIe | NVLink |
Cache | 27.5MB LLC, 6 memory channels | | 33MB |
Node-local storage | 7.68TB NVMe SSD | 4 NVMe SSDs, 2TB each (total 8TB) | 8 NVMe SSDs, 3.84TB each (total ~30TB) |
Network | 2 Mellanox ConnectX-6 HDR Infiniband 200Gb/s Adapters | | |
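Several per-node totals in the GPU node table follow from simple multiplication: GPU memory is GPUs per node times memory per GPU, and node-local storage is drive count times drive size. The sketch below reproduces them; the dictionary and function names are hypothetical. For example, eight drives of 3.84TB give about 30.7TB, which matches the DGX-2's quoted ~30TB total, and four 2TB drives give the 8TB listed for the V100-16GB nodes.

```python
# Per-node GPU figures from the GPU node table: (GPUs per node, GB per GPU).
GPU_CONFIGS = {
    "V100-32GB node": (8, 32),
    "V100-16GB node": (8, 16),
    "DGX-2": (16, 32),
}


def gpu_memory_per_node(gpus: int, gb_per_gpu: int) -> int:
    """Total GPU memory in one node, in GB."""
    return gpus * gb_per_gpu


def total_storage_tb(num_drives: int, tb_per_drive: float) -> float:
    """Total node-local storage in TB for identical drives."""
    return num_drives * tb_per_drive


for name, (gpus, gb) in GPU_CONFIGS.items():
    print(f"{name}: {gpu_memory_per_node(gpus, gb)}GB GPU memory")

print(f"V100-16GB node storage: {total_storage_tb(4, 2.0):.2f}TB")  # 4 x 2TB
print(f"DGX-2 storage: {total_storage_tb(8, 3.84):.2f}TB")          # 8 x 3.84TB
```

The GPU memory lines print 256GB, 128GB, and 512GB, matching the "total/node" column entries.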
Data Management
...
Infiniband |