Vast Data Unveils Data Center Architecture for AI Factory with Nvidia Technologies

Being deployed at CoreWeave

Vast Data Ltd. unveiled an AI cloud architecture designed to deliver high levels of performance, QoS, zero-trust security and space/cost/power efficiency for the AI factory.

Building on Nvidia BlueField-3 data processing unit (DPU) technology, the company’s parallel system architecture makes it possible to disaggregate the entirety of its OS natively into AI computing machinery, transforming supercomputers into AI data engines.

The Nvidia BlueField networking platform combines robust compute power and integrated hardware accelerators to create secure and software-defined accelerated computing infrastructure for AI. By outfitting each GPU server with a dedicated BlueField DPU running a stateless container that powers the Vast parallel services OS, this architecture design embeds storage and database processing services directly into AI servers and delivers true linear data services designed to scale to hundreds of thousands of GPUs. Moreover, by removing multiple layers of x86 hardware and networking from the company’s network-attached Data Platform infrastructure, this new AI factory architecture reduces the cost, footprint, and power associated with AI data services.

Through its collaboration with Nvidia and this first-of-its-kind integration, Vast Data is: 

  • Maximizing data center efficiency: Its Disaggregated, Shared Everything (DASE) architecture leverages the processing power of BlueField-3 to require fewer independent compute and networking resources, reducing the power usage and data center footprint of Vast infrastructure by 70%. The combined end-to-end solution yields net energy savings of over 5% compared to deploying Nvidia-powered supercomputers with the previous Vast distributed data services infrastructure.

  • Enabling unprecedented QoS: By providing each GPU server with a dedicated and truly parallel storage and database container, this AI factory architecture eliminates contention for data services infrastructure. The firm’s DASE architecture features extreme parallelism such that each BlueField-3 can read and write into shared namespaces of the company’s Data Platform without coordinating IO across containers. In essence, this architecture eliminates infrastructure contention at the most fundamental level. This contention-less architecture is essential for multi-tenant service providers who need to meet the contractual Service Level Objectives of their clients while also maximizing the utilization of all GPU computing assets.

  • Enhancing zero-trust security: This AI factory architecture ensures that data and data management remain protected and isolated from host operating systems. Compared to AI computers that use parallel file system clients (which have an intimate understanding of the data services layer), the company is able to eliminate many attack vectors in a multi-tenant environment by hosting industry-standard network attached services, object services, and database services from BlueField-3 DPUs via standard client protocols that do not expose the underlying Data Platform system topology – such as NFS, SMB, S3 and Apache Arrow.

  • Delivering block storage services: The company’s systems, powered by the Nvidia DOCA software framework that enables the rapid development of containerized services, provide block storage services natively to host OSs, combining with Vast’s file, object, and database services to offer a full set of data presentations to high-performance applications.
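
The two efficiency figures in the first point (a 70% reduction for Vast infrastructure and a net saving of over 5% end to end) can be related with a back-of-envelope calculation. The sketch below is illustrative only: it assumes the net saving comes entirely from the infrastructure reduction and that GPU server power is otherwise unchanged, neither of which the announcement states. The function name `implied_infra_share` is hypothetical.

```python
def implied_infra_share(infra_reduction: float, net_saving: float) -> float:
    """Fraction of total power the data infrastructure would have to draw
    for the two published figures to be consistent.

    Assumption (not from the announcement): net_saving = share * infra_reduction,
    i.e. nothing else in the system changes its power draw.
    """
    if not 0 < infra_reduction <= 1:
        raise ValueError("infra_reduction must be in (0, 1]")
    if not 0 <= net_saving <= infra_reduction:
        raise ValueError("net_saving must be between 0 and infra_reduction")
    return net_saving / infra_reduction


# Figures quoted in the article: 70% infrastructure reduction, 5% net saving.
share = implied_infra_share(infra_reduction=0.70, net_saving=0.05)
print(f"{share:.1%}")  # prints 7.1%
```

Under these assumptions, the previous data services infrastructure would account for roughly 7% of total power. Because the article says "over 5%", this is a lower bound on that share within the simplified model.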

“We’re proud to partner with Nvidia to help industrialize AI computing,” said Jeff Denworth, co-founder, Vast Data. “This new architecture is the perfect showcase to express the parallelism of the Vast Data Platform. With Nvidia BlueField-3 DPUs, we can now realize the full potential of our vision for disaggregated data centers that we’ve been working toward since the company was founded.”

The firm’s architecture, which runs Vast software on BlueField DPUs in the AI servers, is being tested and deployed first at CoreWeave, a specialized GPU cloud provider. Vast and CoreWeave began partnering in 2023 to build scalable AI machinery and to help many of the world’s LLM builders and blue-chip enterprise customers build their own AI factories.

“With Vast’s OS, next-gen accelerated computing solutions are paired with next-generation accelerated network infrastructure, enabling enterprises and service providers to benefit from simpler, more secure experiences with high-performance systems,” said Rob Davis, VP, storage technology, Nvidia Corp.

“Vast’s revolutionary architecture is a game-changer for CoreWeave, enabling us to fully disaggregate our data centers. We’re seamlessly integrating Vast’s advanced software directly into our GPU clusters,” said Peter Salanki, VP, engineering, CoreWeave. “Leveraging BlueField DPUs, we’ve been at the forefront of creating sophisticated, software-defined data center abstractions. Now, by natively incorporating storage and database services onto BlueField, we’re not just streamlining our infrastructure but also elevating the user experience for our customers by removing bottlenecks in the AI data computing pipeline. CoreWeave is not just keeping pace with the future of cloud data management – we are defining it.”

The company will be at the Nvidia GTC 2024 event on March 18-21 in San Jose, CA. Tune in to the AI Showcase starting at 11:00am PT, featuring presentations from Vast, Supermicro, Nvidia and Run:ai.

Resources:
Vast + Nvidia    
Vast + Nvidia BlueField solution brief
Nvidia DGX SuperPOD: Vast
