SC24: Supermicro Delivers Direct-Liquid-Optimized Nvidia Blackwell Solutions
SuperClusters with Nvidia HGX B200 8-GPU, Nvidia GB200 NVL4, and GB200 NVL72 systems deliver unprecedented AI compute density
This is a Press Release edited by StorageNewsletter.com on November 26, 2024 at 2:01 pm
At SC24, Supermicro, Inc. is announcing the highest-performing SuperCluster, an end-to-end AI data center solution featuring the Nvidia Blackwell platform for the era of trillion-parameter-scale GenAI.
The SuperCluster will significantly increase the number of Nvidia HGX B200 8-GPU systems in a liquid-cooled rack, resulting in a large increase in GPU compute density compared to Supermicro’s current liquid-cooled Nvidia HGX H100 and H200-based SuperClusters. In addition, the company is enhancing the portfolio of its Nvidia Hopper systems to address the rapid adoption of accelerated computing for HPC applications and mainstream enterprise AI.
“Supermicro has the expertise, delivery speed, and capacity to deploy the largest liquid-cooled AI data center projects in the world, containing 100,000 GPUs, which Supermicro and Nvidia contributed to and recently deployed,” said Charles Liang, president and CEO, Supermicro. “These Supermicro SuperClusters reduce power needs due to DLC efficiencies. We now have solutions that use the Nvidia Blackwell platform. Using our Building Block approach allows us to quickly design servers with Nvidia HGX B200 8-GPU, which can be either liquid-cooled or air-cooled. Our SuperClusters provide unprecedented density, performance, and efficiency, and pave the way toward even more dense AI computing solutions in the future. The Supermicro clusters use direct liquid cooling, resulting in higher performance, lower power consumption for the entire data center, and reduced operational expenses.”
Proven AI performance at scale: Supermicro Nvidia HGX B200 systems
The upgraded SuperCluster scalable unit is based on a rack-scale design with innovative vertical coolant distribution manifolds (CDMs), which allow a greater number of compute nodes in a single rack. Newly developed, more efficient cold plates and an advanced hose design further improve the efficiency of the liquid cooling system. A new in-row coolant distribution unit (CDU) option is also available for large deployments. Traditional air-cooled data centers can also take advantage of the Nvidia HGX B200 8-GPU systems with a new air-cooled system chassis.
The Supermicro Nvidia HGX B200 8-GPU systems come with a range of upgrades compared to the previous generation. They include improvements to thermals and power delivery, and support either dual 500W Xeon 6 processors (with DDR5 MRDIMMs at 8,800MT/s) or AMD EPYC 9005 Series processors. A new air-cooled 10U form-factor Supermicro Nvidia HGX B200 system features a redesigned chassis with expanded thermal headroom to accommodate 8 Blackwell GPUs at 1,000W TDP each. These systems are designed with a 1:1 GPU-to-NIC ratio, supporting Nvidia BlueField-3 SuperNICs or Nvidia ConnectX-7 NICs for scaling across a high-performance compute fabric. In addition, 2 Nvidia BlueField-3 data processing units (DPUs) per system streamline data handling to and from attached high-performance AI storage.
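For a rough sense of scale, the processor figures quoted above can be added up. The following is a minimal sketch, assuming the 500W figure applies to each of the two CPUs; it counts only GPU and CPU TDP and ignores memory, NICs, DPUs, fans, and power-conversion losses, so it is not a system power specification. All variable names are illustrative.

```python
# Figures quoted in the text; the per-CPU reading of "dual 500W" is an assumption.
GPU_COUNT = 8
GPU_TDP_W = 1000
CPU_COUNT = 2
CPU_TDP_W = 500

gpu_power_w = GPU_COUNT * GPU_TDP_W   # GPU TDP only
cpu_power_w = CPU_COUNT * CPU_TDP_W   # CPU TDP only
processor_power_kw = (gpu_power_w + cpu_power_w) / 1000

# A 1:1 GPU-to-NIC ratio implies one compute-fabric NIC per GPU.
fabric_nics = GPU_COUNT

print(f"GPU + CPU TDP: {processor_power_kw} kW, fabric NICs: {fabric_nics}")
```

The result is a lower bound on what the chassis must cool, which is why the text emphasizes expanded thermal headroom for the air-cooled design.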
Supermicro solutions featuring Nvidia GB200 Grace Blackwell Superchips
The company also offers solutions for all Nvidia GB200 Grace Blackwell Superchips, including the newly announced Nvidia GB200 NVL4 Superchip and the Nvidia GB200 NVL72 single-rack exascale computer.
The firm’s lineup of Nvidia MGX designs will support the Nvidia GB200 Grace Blackwell NVL4 Superchip. This superchip unlocks the future of converged HPC and AI, delivering revolutionary performance through 4 Nvidia NVLink-connected Blackwell GPUs unified with 2 Nvidia Grace CPUs over NVLink-C2C. Compatible with Supermicro’s liquid-cooled Nvidia MGX modular systems, the Superchip provides up to 2x the performance of the prior generation for scientific computing, graph neural network (GNN) training, and inference applications.
The Nvidia GB200 NVL72 SuperCluster with Supermicro’s end-to-end liquid-cooling solution delivers an exascale supercomputer in a single rack. SuperCloud Composer (SCC) software provides monitoring and management capability for liquid-cooled data centers. The rack’s 72 Nvidia Blackwell GPUs and 36 Nvidia Grace CPUs are all connected via 5th-gen Nvidia NVLink and NVLink Switch, effectively operating as 1 powerful GPU with a massive pool of HBM3e memory and providing 130TB/s of total GPU communication bandwidth with low latency.
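As a back-of-the-envelope check on the rack figures above, the quoted total bandwidth can be divided across the GPUs. This is a sketch only: the per-GPU value is derived from the rounded 130TB/s total rather than taken from a specification, and the names are illustrative.

```python
# Figures quoted in the text for a single GB200 NVL72 rack.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
TOTAL_GPU_BANDWIDTH_TBPS = 130.0  # total GPU communication bandwidth, TB/s


def per_gpu_bandwidth_tbps(total_tbps: float, gpus: int) -> float:
    """Evenly divide the aggregate bandwidth across all GPUs."""
    return total_tbps / gpus


per_gpu = per_gpu_bandwidth_tbps(TOTAL_GPU_BANDWIDTH_TBPS, GPUS_PER_RACK)
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK

print(f"{per_gpu:.2f} TB/s per GPU, {gpus_per_cpu} GPUs per Grace CPU")
```

The 2:1 GPU-to-CPU ratio matches the Grace Blackwell pairing described for the rack, with each Grace CPU serving 2 Blackwell GPUs.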
Accelerated computing systems with Nvidia H200 NVL
Supermicro’s 5U PCIe accelerated computing systems are available with Nvidia H200 NVL for lower-power, air-cooled enterprise rack designs that require flexible configurations, delivering acceleration for many AI and HPC workloads regardless of size. With up to 4 GPUs connected by Nvidia NVLink, a 1.5x increase in memory capacity, and a 1.2x increase in bandwidth with HBM3e, Nvidia H200 NVL can fine-tune LLMs in a few hours and deliver up to 1.7x faster LLM inference performance than the previous generation. Nvidia H200 NVL also includes a 5-year subscription to Nvidia AI Enterprise, a cloud-native software platform for developing and deploying production AI.
Supermicro’s X14 and H14 5U PCIe accelerated computing systems support up to two 4-way Nvidia H200 NVL systems through NVLink technology, for a total of 8 GPUs in a system. Each 4-GPU NVLink domain provides up to 900GB/s of GPU-to-GPU interconnect bandwidth and a combined pool of 564GB of HBM3e memory. The PCIe accelerated computing system can support up to 10 PCIe GPUs and now also features the latest Xeon 6 or EPYC 9005 Series processors to deliver flexible and versatile options for HPC and AI applications.
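The memory figures above can be broken down per GPU and per system. The sketch below uses only the numbers quoted in the text; the variable names are illustrative, and the system-wide total is simple addition across the two NVLink domains, which remain separate memory pools.

```python
# Figures quoted in the text for an 8-GPU H200 NVL configuration.
POOL_GB_PER_DOMAIN = 564   # combined HBM3e per 4-GPU NVLink domain
GPUS_PER_DOMAIN = 4
DOMAINS_PER_SYSTEM = 2     # two 4-way NVLink domains

hbm_per_gpu_gb = POOL_GB_PER_DOMAIN / GPUS_PER_DOMAIN
gpus_per_system = GPUS_PER_DOMAIN * DOMAINS_PER_SYSTEM

# Sum across both domains; this is not one NVLink-coherent pool.
hbm_per_system_gb = POOL_GB_PER_DOMAIN * DOMAINS_PER_SYSTEM

print(f"{hbm_per_gpu_gb:.0f} GB per GPU, "
      f"{gpus_per_system} GPUs, {hbm_per_system_gb} GB HBM3e per system")
```

A model that must fit within a single NVLink-connected pool is therefore bounded by the 564GB domain figure rather than the system-wide sum.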
Supermicro at Supercomputing Conference 2024
The company will showcase a complete portfolio of AI and HPC infrastructure solutions at the Supercomputing Conference, including its liquid-cooled GPU servers for AI SuperClusters.