AI Server Config
PCIe 5.0 AI Server with 8 GPUs
(:summary A technical overview and checklist for planning an 8-GPU PCIe Gen5 AI server. :)
Overview
A PCIe 5.0 AI server with *eight GPUs* is typically a 4U enterprise chassis designed for high-density accelerator workloads such as LLM training, inference, and HPC compute. These systems rely on PCIe Gen5 lanes, high-capacity redundant PSUs, and high-static-pressure airflow to cool passively cooled GPUs such as the NVIDIA H100 PCIe (350W TDP).
Supported Server Platforms
- ASUS ESC8000A-E12P — 4U chassis supporting 8 dual-slot PCIe Gen5 GPUs, AMD EPYC 9004 platform, redundant 3000W PSUs.
- Supermicro GPU Server Families — Multiple 4U chassis options with 8-GPU support for PCIe Gen5 accelerators; EPYC or Xeon; OCP networking.
- Lenovo ThinkSystem H100-qualified platforms — OEM-validated 8-GPU builds with vendor-tested PCIe topology.
Recommended Base Architecture
- Chassis: 4U rackmount, 8 × FHFL dual-slot GPUs
- CPU Platform:
  - AMD EPYC 9004/9005 (preferred for its high PCIe lane count; see the lane-budget sketch after this list)
  - Dual Intel 5th-Gen Xeon Scalable (vendor-qualified models)
- GPUs: 8 × NVIDIA H100 PCIe Gen5 (350W TDP), optional NVLink bridges per adjacent pair
- Memory: 512 GB to 2 TB DDR5 ECC RDIMMs
- Networking: 1–2 × 200/400 GbE or InfiniBand HDR/NDR via OCP 3.0 or PCIe NICs
- Storage: NVMe OS drives + multiple high-capacity NVMe for training data staging
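As a sanity check on the lane-count point above, here is a minimal Python sketch of a PCIe lane budget. The device counts mirror this build, but the lane widths and the 160-lane figure for a dual-socket EPYC 9004 system are assumptions (usable lanes depend on how many xGMI links the vendor dedicates to the socket interconnect), so treat the output as a first pass, not a substitute for the vendor's block diagram.

```python
# Rough PCIe Gen5 lane budget for the build above. Counts and widths
# are assumptions mirroring this page, not a vendor block diagram.

DEVICES = {
    "gpu_x16": (8, 16),   # 8 x H100 PCIe at x16
    "nic_x16": (2, 16),   # 2 x 400 GbE / NDR NICs at x16
    "nvme_x4": (6, 4),    # 2 OS + 4 dataset NVMe drives at x4
}

# Assumed usable lanes for dual-socket EPYC 9004; the real figure
# depends on how many xGMI links the vendor uses between sockets.
AVAILABLE_LANES = 160

def lanes_used(devices):
    """Total PCIe lanes consumed by the listed devices."""
    return sum(count * width for count, width in devices.values())

used = lanes_used(DEVICES)
print(f"Lanes required: {used} / {AVAILABLE_LANES} available")
if used > AVAILABLE_LANES:
    print("Over budget: expect PCIe switches or narrower links.")
```

Going over budget is common in these chassis; vendors resolve it with PCIe switches or by running some NVMe on narrower links, which is exactly what the topology diagram requested below should show.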
Power and Cooling
- H100 PCIe = 350W TDP → 8 GPUs = 2800W of GPU power alone
- Typical full-system requirement: 3.5–4.5 kW (rough budget sketched after this list)
- Use redundant 3000W (or higher) PSUs; the exact count and redundancy scheme depend on the vendor
- Chassis must provide high-pressure front-to-rear airflow for passive GPU cooling
- Optional liquid-assist modules depending on system design
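The numbers above as a back-of-the-envelope Python sketch. Only the 350W H100 PCIe TDP is a published figure; every other line item (CPUs, memory, fans) is an assumed placeholder, so replace them with figures from the vendor's power calculator.

```python
# Back-of-the-envelope power budget for the 8-GPU configuration.
# Only the H100 PCIe TDP is a published figure; the other line
# items are assumed placeholders.

GPU_TDP_W = 350
GPU_COUNT = 8

OTHER_W = {
    "cpus": 2 * 360,          # assumed TDP for two high-end EPYCs
    "dram_and_nvme": 300,     # assumed
    "nics_fans_misc": 400,    # assumed
}

total_w = GPU_TDP_W * GPU_COUNT + sum(OTHER_W.values())  # 2800 + 1420
print(f"Estimated peak draw: {total_w} W")               # 4220 W

# With N+N redundancy, the surviving half of the PSUs must carry
# the full load alone, e.g. 2+2 x 3000 W leaves 6000 W available.
surviving_psu_w = 2 * 3000
assert total_w <= surviving_psu_w, "PSU budget too small for N+N"
```

The 4.2 kW result lands inside the 3.5–4.5 kW band above, and the assertion makes the redundancy requirement explicit: two 3000W PSUs in a 1+1 scheme would fail it.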
PCIe Topology Considerations
- Verify how GPUs map to CPU sockets (a quick sysfs NUMA check is sketched after this list)
- Check for PCIe switch usage and bandwidth allocation
- Ensure NVLink-bridged GPU pairs reside within the same CPU domain
- Validate that NICs and NVMe storage do not bottleneck GPU lanes
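The socket mapping in the first item can be spot-checked on the delivered system. `nvidia-smi topo -m` prints the full topology matrix; the short Python sketch below reads the same affinity directly from Linux sysfs. The paths and NVIDIA's 0x10de vendor ID are standard, and the 0x0302 class filter selects 3D controllers so audio functions are skipped.

```python
# Map each NVIDIA GPU (PCI class 0x0302, "3D controller") to its
# NUMA node via standard Linux sysfs paths; cross-check the result
# against the vendor's topology diagram or `nvidia-smi topo -m`.

from pathlib import Path

NVIDIA_VENDOR = "0x10de"

def gpu_numa_map():
    """Return {PCI BDF: NUMA node} for every NVIDIA 3D controller."""
    mapping = {}
    for dev in Path("/sys/bus/pci/devices").iterdir():
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == NVIDIA_VENDOR and pci_class.startswith("0x0302"):
            # -1 means the kernel reports no NUMA affinity
            mapping[dev.name] = int((dev / "numa_node").read_text())
    return mapping

for bdf, node in sorted(gpu_numa_map().items()):
    print(f"{bdf}: NUMA node {node}")
```

For NVLink-bridged pairs, both GPUs in a pair should report the same NUMA node, matching the third item above.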
Procurement Notes
- NVIDIA H100 PCIe systems are enterprise-priced and typically require OEM quotes
- Lead times vary depending on GPU availability
- Check driver, firmware, and OS compatibility per vendor documentation (a minimal acceptance check is sketched below)
- Optional: NVIDIA AI Enterprise can be licensed through vendors
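For the compatibility item above, a minimal acceptance check can be scripted against `nvidia-smi`; the query flags used below are standard `nvidia-smi` options. The expected GPU count and driver floor are assumptions to be set from your order and NVIDIA's support matrix.

```python
# Minimal post-delivery sanity check: GPU count and driver version
# via standard nvidia-smi query flags. EXPECTED_GPUS and MIN_DRIVER
# are assumptions; set them per your order and NVIDIA's matrix.

import subprocess

EXPECTED_GPUS = 8
MIN_DRIVER = (535, 0)   # assumed floor, not an official requirement

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

gpus = [line.split(", ") for line in out.strip().splitlines()]
assert len(gpus) == EXPECTED_GPUS, f"found {len(gpus)} GPUs"
for name, driver in gpus:
    version = tuple(int(p) for p in driver.split(".")[:2])
    assert version >= MIN_DRIVER, f"{name}: driver {driver} too old"
print(f"OK: {len(gpus)} x {gpus[0][0]}, driver {gpus[0][1]}")
```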
Vendor Checklist
- Provide a complete PCIe topology diagram
- Confirm GPU model, power cables, and NVLink bridge support (an NVLink spot-check follows this list)
- List sustained thermal performance and airflow metrics
- Specify PSU configuration: wattage, redundancy scheme, and PDU requirements
- Document supported NIC and NVMe configurations
- Document supported OS, driver versions, and management firmware
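As a delivery-time companion to the NVLink item in this checklist, `nvidia-smi nvlink --status` (a standard `nvidia-smi` subcommand) reports per-GPU link state; a bridged H100 pair should show active links, while an unbridged card shows none. A trivial wrapper:

```python
# Print per-GPU NVLink link state; on H100 PCIe, a missing or loose
# bridge shows up as inactive links on a supposedly bridged pair.

import subprocess

status = subprocess.run(
    ["nvidia-smi", "nvlink", "--status"],
    capture_output=True, text=True, check=True,
).stdout
print(status or "No NVLink-capable GPUs reported.")
```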
Example Specification (Template)
- Chassis: 4U, 8 × PCIe Gen5 FHFL GPUs
- CPU: 2 × AMD EPYC 9004
- GPU: 8 × NVIDIA H100 PCIe Gen5
- RAM: 1 TB DDR5 ECC
- Storage: 2 × 3.84 TB NVMe (OS), 4 × 7.68 TB NVMe (datasets)
- Networking: 2 × 400 GbE
- PSU: 4 × 3000W Titanium (2+2 redundant; a 1+1 pair cannot carry the 3.5–4.5 kW system load after a PSU failure)
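The same template expressed as a machine-readable Python dict, so the lane- and power-budget sketches earlier on this page can be pointed at a concrete spec. The field names are ad hoc for this page, not any vendor's schema; the PSU entry mirrors the 2+2 sizing discussed under Power and Cooling.

```python
# The spec template above as a plain dict; field names are ad hoc
# for this page, not a vendor schema.

SPEC = {
    "chassis": "4U, 8 x PCIe Gen5 FHFL GPU slots",
    "cpu": {"model": "AMD EPYC 9004", "count": 2},
    "gpu": {"model": "NVIDIA H100 PCIe", "count": 8, "tdp_w": 350},
    "ram_gb": 1024,
    "storage_nvme": [
        {"role": "os", "count": 2, "size_tb": 3.84},
        {"role": "datasets", "count": 4, "size_tb": 7.68},
    ],
    "network": {"ports": 2, "speed_gbe": 400},
    "psu": {"watts": 3000, "count": 4, "redundancy": "2+2"},
}

# Example: recompute the GPU line of the power budget from the spec.
gpu_w = SPEC["gpu"]["count"] * SPEC["gpu"]["tdp_w"]
print(f"GPU power budget: {gpu_w} W")  # 2800 W
```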
See Also
- NVIDIA H100 PCIe Architecture
- Vendor reference designs (ASUS, Supermicro, Lenovo)
- HPC/AI Cluster Build Guides