AI Server Config
PCIe 5.0 AI Server with 8 GPUs
(:summary A technical overview and checklist for planning an 8-GPU PCIe Gen5 AI server. :)
Overview
A PCIe 5.0 AI server with *eight GPUs* is typically a 4U enterprise chassis designed for high-density accelerator workloads such as LLM training, inference, and HPC compute. These systems rely on PCIe Gen5 lanes, high-capacity redundant PSUs, and high-static-pressure airflow to cool passively cooled GPUs such as the NVIDIA H100 PCIe (350W TDP).
Supported Server Platforms
- ASUS ESC8000A-E12P — 4U chassis supporting 8 dual-slot PCIe Gen5 GPUs, AMD EPYC 9004 platform, redundant 3000W PSUs.
- Supermicro GPU Server Families — Multiple 4U chassis options with 8-GPU support for PCIe Gen5 accelerators; EPYC or Xeon; OCP networking.
- Lenovo ThinkSystem H100-qualified platforms — OEM-validated 8-GPU builds with vendor-tested PCIe topology.
Recommended Base Architecture
- Chassis: 4U rackmount, 8 × FHFL dual-slot GPUs
- CPU Platform:
  - AMD EPYC 9004/9005 (preferred for its high PCIe lane count; see the lane-budget sketch after this list)
  - Dual Intel 5th-Gen Xeon Scalable (vendor-qualified models)
- GPUs: 8 × NVIDIA H100 PCIe Gen5 (350W TDP), optional NVLink bridges per adjacent pair
- Memory: 512 GB to 2 TB DDR5 ECC RDIMMs
- Networking: 1–2 × 200/400 GbE or InfiniBand HDR/NDR via OCP 3.0 or PCIe NICs
- Storage: NVMe OS drives + multiple high-capacity NVMe for training data staging
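As a sanity check on the lane-count point above, here is a minimal Python sketch of a PCIe lane budget. The device counts mirror this build, but the lane widths and the 160-lane figure for a dual-socket EPYC 9004 system are assumptions (usable lanes depend on how many xGMI links the vendor dedicates to the socket interconnect), so treat the output as a first pass, not a substitute for the vendor's block diagram.

```python
# Rough PCIe Gen5 lane budget for the build above. Counts and widths
# are assumptions mirroring this page, not a vendor block diagram.

DEVICES = {
    "gpu_x16": (8, 16),   # 8 x H100 PCIe at x16
    "nic_x16": (2, 16),   # 2 x 400 GbE / NDR NICs at x16
    "nvme_x4": (6, 4),    # 2 OS + 4 dataset NVMe drives at x4
}

# Assumed usable lanes for dual-socket EPYC 9004; the real figure
# depends on how many xGMI links the vendor uses between sockets.
AVAILABLE_LANES = 160

def lanes_used(devices):
    """Total PCIe lanes consumed by the listed devices."""
    return sum(count * width for count, width in devices.values())

used = lanes_used(DEVICES)
print(f"Lanes required: {used} / {AVAILABLE_LANES} available")
if used > AVAILABLE_LANES:
    print("Over budget: expect PCIe switches or narrower links.")
```

Going over budget is common in these chassis; vendors resolve it with PCIe switches or by running some NVMe on narrower links, which is exactly what the topology diagram requested below should show.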
Power and Cooling
- H100 PCIe = 350W TDP → 8 GPUs = 2800W of GPU power alone
- Typical full-system requirement: 3.5–4.5 kW (rough budget sketched after this list)
- Use redundant 3000W (or higher) PSUs; the exact count and redundancy scheme depend on the vendor
- Chassis must provide high-pressure front-to-rear airflow for passive GPU cooling
- Optional liquid-assist modules depending on system design
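The numbers above as a back-of-the-envelope Python sketch. Only the 350W H100 PCIe TDP is a published figure; every other line item (CPUs, memory, fans) is an assumed placeholder, so replace them with figures from the vendor's power calculator.

```python
# Back-of-the-envelope power budget for the 8-GPU configuration.
# Only the H100 PCIe TDP is a published figure; the other line
# items are assumed placeholders.

GPU_TDP_W = 350
GPU_COUNT = 8

OTHER_W = {
    "cpus": 2 * 360,          # assumed TDP for two high-end EPYCs
    "dram_and_nvme": 300,     # assumed
    "nics_fans_misc": 400,    # assumed
}

total_w = GPU_TDP_W * GPU_COUNT + sum(OTHER_W.values())  # 2800 + 1420
print(f"Estimated peak draw: {total_w} W")               # 4220 W

# With N+N redundancy, the surviving half of the PSUs must carry
# the full load alone, e.g. 2+2 x 3000 W leaves 6000 W available.
surviving_psu_w = 2 * 3000
assert total_w <= surviving_psu_w, "PSU budget too small for N+N"
```

The 4.2 kW result lands inside the 3.5–4.5 kW band above, and the assertion makes the redundancy requirement explicit: two 3000W PSUs in a 1+1 scheme would fail it.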
PCIe Topology Considerations
- Verify how GPUs map to CPU sockets (a quick sysfs NUMA check is sketched after this list)
- Check for PCIe switch usage and bandwidth allocation
- Ensure NVLink-bridged GPU pairs reside within the same CPU domain
- Validate that NICs and NVMe storage do not bottleneck GPU lanes
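The socket mapping in the first item can be spot-checked on the delivered system. `nvidia-smi topo -m` prints the full topology matrix; the short Python sketch below reads the same affinity directly from Linux sysfs. The paths and NVIDIA's 0x10de vendor ID are standard, and the 0x0302 class filter selects 3D controllers so audio functions are skipped.

```python
# Map each NVIDIA GPU (PCI class 0x0302, "3D controller") to its
# NUMA node via standard Linux sysfs paths; cross-check the result
# against the vendor's topology diagram or `nvidia-smi topo -m`.

from pathlib import Path

NVIDIA_VENDOR = "0x10de"

def gpu_numa_map():
    """Return {PCI BDF: NUMA node} for every NVIDIA 3D controller."""
    mapping = {}
    for dev in Path("/sys/bus/pci/devices").iterdir():
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == NVIDIA_VENDOR and pci_class.startswith("0x0302"):
            # -1 means the kernel reports no NUMA affinity
            mapping[dev.name] = int((dev / "numa_node").read_text())
    return mapping

for bdf, node in sorted(gpu_numa_map().items()):
    print(f"{bdf}: NUMA node {node}")
```

For NVLink-bridged pairs, both GPUs in a pair should report the same NUMA node, matching the third item above.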
Procurement Notes
- NVIDIA H100 PCIe systems are enterprise-priced and typically require OEM quotes
- Lead times vary depending on GPU availability
- Check driver, firmware, and OS compatibility per vendor documentation (a minimal acceptance check is sketched below)
- Optional: NVIDIA AI Enterprise can be licensed through vendors
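For the compatibility item above, a minimal acceptance check can be scripted against `nvidia-smi`; the query flags used below are standard `nvidia-smi` options. The expected GPU count and driver floor are assumptions to be set from your order and NVIDIA's support matrix.

```python
# Minimal post-delivery sanity check: GPU count and driver version
# via standard nvidia-smi query flags. EXPECTED_GPUS and MIN_DRIVER
# are assumptions; set them per your order and NVIDIA's matrix.

import subprocess

EXPECTED_GPUS = 8
MIN_DRIVER = (535, 0)   # assumed floor, not an official requirement

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

gpus = [line.split(", ") for line in out.strip().splitlines()]
assert len(gpus) == EXPECTED_GPUS, f"found {len(gpus)} GPUs"
for name, driver in gpus:
    version = tuple(int(p) for p in driver.split(".")[:2])
    assert version >= MIN_DRIVER, f"{name}: driver {driver} too old"
print(f"OK: {len(gpus)} x {gpus[0][0]}, driver {gpus[0][1]}")
```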
Vendor Checklist
- Provide a complete PCIe topology diagram
- Confirm GPU model, power cables, and NVLink bridge support (an NVLink spot-check follows this list)
- List sustained thermal performance and airflow metrics
- Specify PSU configuration: wattage, redundancy scheme, and PDU requirements
- Document supported NIC and NVMe configurations
- Document supported OS, driver versions, and management firmware
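As a delivery-time companion to the NVLink item in this checklist, `nvidia-smi nvlink --status` (a standard `nvidia-smi` subcommand) reports per-GPU link state; a bridged H100 pair should show active links, while an unbridged card shows none. A trivial wrapper:

```python
# Print per-GPU NVLink link state; on H100 PCIe, a missing or loose
# bridge shows up as inactive links on a supposedly bridged pair.

import subprocess

status = subprocess.run(
    ["nvidia-smi", "nvlink", "--status"],
    capture_output=True, text=True, check=True,
).stdout
print(status or "No NVLink-capable GPUs reported.")
```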
Example Specification (Template)
- Chassis: 4U, 8 × PCIe Gen5 FHFL GPUs
- CPU: 2 × AMD EPYC 9004
- GPU: 8 × NVIDIA H100 PCIe Gen5
- RAM: 1 TB DDR5 ECC
- Storage: 2 × 3.84 TB NVMe (OS), 4 × 7.68 TB NVMe (datasets)
- Networking: 2 × 400 GbE
- PSU: 4 × 3000W Titanium (2+2 redundant; a 1+1 pair cannot carry the 3.5–4.5 kW system load after a PSU failure)
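The same template expressed as a machine-readable Python dict, so the lane- and power-budget sketches earlier on this page can be pointed at a concrete spec. The field names are ad hoc for this page, not any vendor's schema; the PSU entry mirrors the 2+2 sizing discussed under Power and Cooling.

```python
# The spec template above as a plain dict; field names are ad hoc
# for this page, not a vendor schema.

SPEC = {
    "chassis": "4U, 8 x PCIe Gen5 FHFL GPU slots",
    "cpu": {"model": "AMD EPYC 9004", "count": 2},
    "gpu": {"model": "NVIDIA H100 PCIe", "count": 8, "tdp_w": 350},
    "ram_gb": 1024,
    "storage_nvme": [
        {"role": "os", "count": 2, "size_tb": 3.84},
        {"role": "datasets", "count": 4, "size_tb": 7.68},
    ],
    "network": {"ports": 2, "speed_gbe": 400},
    "psu": {"watts": 3000, "count": 4, "redundancy": "2+2"},
}

# Example: recompute the GPU line of the power budget from the spec.
gpu_w = SPEC["gpu"]["count"] * SPEC["gpu"]["tdp_w"]
print(f"GPU power budget: {gpu_w} W")  # 2800 W
```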
See Also
- NVIDIA H100 PCIe Architecture
- Vendor reference designs (ASUS, Supermicro, Lenovo)
- HPC/AI Cluster Build Guides