Habana Labs AI Training Solution Featuring Supermicro X12 Gaudi AI Training Server With DDN AI400X2 Storage System

Habana Labs Ltd., an Intel Corp. company and developer of AI processors, announced the availability of a turnkey, enterprise AI training solution featuring the Supermicro X12 Gaudi AI training server with the DDN AI400X2 storage system.

This solution is the product of a collaboration between Habana Labs, Super Micro Computer, Inc. (Supermicro) and DDN (DataDirect Networks, Inc.). With eight purpose-built Gaudi AI processors, the Supermicro X12 Gaudi AI server gives customers cost-efficient AI training, ease of use and system scalability. Integrating the Gaudi platform with the AI400X2 appliance eliminates the storage bottlenecks found in traditional NAS systems and optimizes utilization of AI compute capacity.

Supermicro X12 Gaudi AI training server

Gaudi HL-205 mezzanine card
As data sets become larger and AI models grow in complexity, demand for AI training capacity is increasing. According to IDC's Semiannual AI Tracker, published in January 2021, over half of AI/ML customers surveyed report rebuilding their AI models weekly or more often, and over a quarter rebuild models daily or even hourly. Along with this demand, 56% of AI/ML customers report that cost is the most significant challenge to implementing AI/ML solutions. Gaudi was designed to address this need by delivering better price performance. With the integrated solution of the Supermicro X12 Gaudi AI server and the AI400X2 storage appliance, customers who need cost-effective enterprise AI training systems with enhanced data management and storage can train more and spend less.

“The Habana team is committed to bringing Gaudi’s price performance, usability and scalability to enterprise AI customers who need more cost-effective AI training solutions,” said Eitan Medina, chief business officer, Habana Labs. “We are pleased to support our customers with this new turnkey solution that brings the efficiency of the Supermicro X12 Gaudi AI Server together with the data management and storage performance of the DDN AI400X2 system to augment utilization of AI compute capacity and enable us to address this growing need in training deep learning models.”

The turnkey AI training solution is available in pre-configured options with one, two or four X12 servers to match AI training capacity requirements. The scalable architectures of the X12 Gaudi server and the AI400X2 appliance make it easy to expand to larger clusters, so customers can scale their AI training infrastructure as their capacity requirements grow. Each Gaudi processor integrates ten 100GbE ports of RDMA over Converged Ethernet (RoCE), enabling straightforward, massive scale-out over industry-standard networking fabrics.

The solution is validated with Habana's SynapseAI software platform, with workloads running in Habana's optimized TensorFlow and PyTorch Docker container images from the Habana Vault. Habana's developer site and the reference models in its GitHub repositories make it easy for data scientists and developers to get started building new models or migrating existing models to Gaudi. The solution is delivered and supported globally by DDN and Supermicro through their partner networks for quick deployment.

DDN AI400X2 appliance

The AI400X2 appliance is an integrated, optimized platform that brings simplicity and cost-effective data management to AI workloads at any scale. Because it can be deployed as either an all-flash NVMe system or a hybrid NVMe and disk system, customers can choose how best to scale performance and capacity. Individual systems can be deployed with up to 720TB of NVMe flash and 6.4PB of HDDs, and deliver more than 90GB/s of throughput and 3 million IOPS. Through automation and data management features, even the most complex AI workflows can be streamlined onto a single storage solution. Reducing the number of systems and the data center footprint needed to deliver the required storage performance and capacity can significantly cut power, cooling and administrative overhead.

Supermicro X12 Gaudi AI training server

The X12 Gaudi AI training server features eight Gaudi HL-205 mezzanine cards, dual 3rd Gen Intel Xeon Scalable processors, two PCIe Gen 4 switches, four hot-swappable NVMe/SATA drives, redundant power supplies, and 24x100GbE RDMA ports (via six QSFP-DDs) for near-linear system scale-out. The system supports up to 8TB of DDR4-3200 memory to keep the Gaudi AI processors fully fed. The HL-205 complies with the OCP-OAM (Open Compute Project Accelerator Module) specification. Each Gaudi processor incorporates 32GB of integrated HBM2 memory.


“DDN’s customers are frequently seeking to gain a competitive edge by developing and deploying new AI applications,” said Kurt Kuckein, VP, marketing, DDN. “Developing a turnkey AI training solution with the Supermicro X12 Gaudi AI Server leveraging the Habana Gaudi processors gives our customers a cost-effective way to jump start their AI projects and allow them to scale seamlessly as their requirements grow.”

“Supermicro is first to market with data center systems featuring Habana Gaudi AI processors, enabling customers to easily support the most advanced AI training and ML workloads,” said Don Clegg, SVP, WW sales, Supermicro. “Supermicro’s X12 Gaudi AI Training server delivers state-of-the-art industry solutions including 3rd Gen Intel Xeon Scalable processors, PCI-E Gen 4, hot-swappable storage options, and resource-savings architecture for maximum power and cost savings.”

Resources:
Habana Gaudi v1.1 documentation
Webinar: Getting started with Habana Gaudi AI Processors, December 14, 2021, 9AM to 10AM PST (registration required)
Supermicro Habana Gaudi solution brief
DDN white paper: Supermicro X12 Gaudi AI server and DDN AI400X2 appliance (registration required)

About Habana Labs
Habana Labs, an Intel company, is an AI processor company founded in 2016 to develop purpose-built processor platforms optimized for training deep neural networks and for inference deployment in production environments. It is unlocking the potential of AI with platforms offering orders-of-magnitude improvements in processing performance, scalability, cost and power consumption. Acquired by Intel in 2019, Habana operates as an independent business unit within Intel's Data Platforms Group.