SC24: Pliops Unveils and Demonstrates XDP LightningAI Solution, Delivering Over 5X Acceleration for LLM Inference
XDP LightningAI connects to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed key-value service.
This is a Press Release edited by StorageNewsletter.com on November 21, 2024 at 2:02 pm
Addressing the critical issue of constrained power budgets, Pliops Ltd. is enabling AI-powered businesses and hyperscalers to achieve higher performance while optimizing power usage, reducing costs, and shrinking their carbon footprint.
At SC24, Atlanta, GA, Pliops will spotlight its XDP LightningAI solution, which enables sustainable, high-efficiency AI operations when paired with GPU servers.
Organizations are increasingly concerned about constrained power budgets in data centers, particularly as AI infrastructure and emerging AI applications raise energy footprints and strain cooling systems. As they scale their AI operations and add GPU compute tiers, escalating power and cooling demands, coupled with significant capital investments in GPUs, are eroding margins. A monumental challenge looms as data centers struggle to secure the power they need, putting significant pressure on companies striving to expand their AI capabilities.
The company maintains that efficient infrastructure solutions are essential to addressing these issues. Its newest Extreme Data Processor (XDP), the XDP-PRO ASIC, together with an AI software stack and distributed XDP LightningAI nodes, addresses GenAI challenges by using a GPU-initiated key-value I/O interface as its foundation, creating a memory tier for GPUs below HBM. XDP LightningAI connects to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed key-value service. The firm has focused on LLM inference, a crucial and rapidly evolving area within GenAI that demands significant efficiency improvements, and its demo at SC24 centers on accelerating LLM inference applications. The same memory tier is applicable to other GenAI applications that Pliops plans to introduce over the next few months.
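Pliops has not published its programming interface, but the idea can be pictured as a key-value tier that GPU servers reach over NVMe-oF and use to offload and recall KV-cache blocks. The Python sketch below is purely illustrative: NvmeofKVClient, KVCacheTier, and their methods are hypothetical names, not Pliops APIs, and an in-process dictionary stands in for the remote distributed key-value service.

```python
# Hypothetical sketch of a key-value tier below HBM for GPU KV-cache
# offload. NvmeofKVClient and KVCacheTier are illustrative names only,
# NOT Pliops APIs; a dict stands in for the NVMe-oF key-value service.
from dataclasses import dataclass, field


@dataclass
class NvmeofKVClient:
    """Stand-in for a distributed key-value service reached over NVMe-oF."""
    store: dict = field(default_factory=dict)

    def put(self, key: bytes, value: bytes) -> None:
        self.store[key] = value

    def get(self, key: bytes) -> bytes | None:
        return self.store.get(key)


class KVCacheTier:
    """Memory tier below HBM: evicted KV-cache blocks are pushed to the
    key-value service and pulled back on a prefix hit, so prefill work
    for a repeated prompt prefix can be skipped."""

    def __init__(self, client: NvmeofKVClient):
        self.client = client

    @staticmethod
    def block_key(model: str, prompt_prefix: str, layer: int) -> bytes:
        # Key a KV-cache block by model, prompt prefix, and layer.
        return f"{model}:{hash(prompt_prefix)}:{layer}".encode()

    def offload(self, model: str, prefix: str, layer: int, kv_block: bytes) -> None:
        self.client.put(self.block_key(model, prefix, layer), kv_block)

    def fetch(self, model: str, prefix: str, layer: int) -> bytes | None:
        return self.client.get(self.block_key(model, prefix, layer))


if __name__ == "__main__":
    tier = KVCacheTier(NvmeofKVClient())
    tier.offload("llama-70b", "You are a helpful assistant.", 0, b"\x00" * 16)
    hit = tier.fetch("llama-70b", "You are a helpful assistant.", 0)
    print("prefix-cache hit:", hit is not None)  # -> True
```

Because the transport is standard NVMe-oF, such a tier would ride on an existing, mature storage fabric rather than a bespoke interconnect, which is the ecosystem point the release emphasizes.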
In LLM inference today, GPU prefill operations are heavily compute-bound and critically determine the batch size. While prefill can fully utilize GPU resources, increasing the batch size beyond a certain point only increases the Time to First Token (TTFT) without improving the prefill rate. GPU decode operations, by contrast, are HBM bandwidth-bound and mainly influenced by model and KV cache sizes, benefiting significantly from larger batch sizes through higher HBM bandwidth efficiency. The company’s solution improves prefill time, allowing larger batch sizes without violating the user SLA for prefill operations. This directly benefits decode performance as well, since decode gains greatly from the increased batch size. As a result, by improving prefill time, the system achieves nearly proportional improvements in end-to-end throughput.
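As a back-of-the-envelope illustration of that argument, the sketch below models prefill as compute-bound (so TTFT grows linearly with batch size) and decode throughput as scaling with batch size. All numbers, including the 5X prefill speedup chosen to mirror the headline figure, are invented placeholders, not measured Pliops results.

```python
# Back-of-the-envelope model of the prefill/decode trade-off described
# above. All numbers are invented for illustration; none come from Pliops.

TTFT_SLA_S = 2.0            # max acceptable Time to First Token
PREFILL_S_PER_REQ = 0.25    # baseline prefill cost per request (compute-bound)
SPEEDUP = 5.0               # assumed prefill acceleration from KV-cache reuse
DECODE_TOK_S_PER_REQ = 30   # per-request decode rate (HBM bandwidth permitting)


def max_batch(prefill_s_per_req: float) -> int:
    """Largest batch whose aggregate prefill still meets the TTFT SLA.
    Prefill is compute-bound, so its time grows linearly with batch size.
    The small epsilon guards against float rounding at exact multiples."""
    return int(TTFT_SLA_S / prefill_s_per_req + 1e-9)


baseline_batch = max_batch(PREFILL_S_PER_REQ)               # -> 8 requests
accelerated_batch = max_batch(PREFILL_S_PER_REQ / SPEEDUP)  # -> 40 requests

# Decode benefits from the larger batch: aggregate tokens/s scales with
# batch size while HBM bandwidth remains the binding resource.
print("baseline   :", baseline_batch, "reqs,",
      baseline_batch * DECODE_TOK_S_PER_REQ, "tok/s")    # 8 reqs, 240 tok/s
print("accelerated:", accelerated_batch, "reqs,",
      accelerated_batch * DECODE_TOK_S_PER_REQ, "tok/s")  # 40 reqs, 1200 tok/s
```

Under these assumed numbers, a 5X faster prefill admits a 5X larger batch within the same TTFT SLA, and end-to-end decode throughput rises almost proportionally, which is the mechanism behind the claimed acceleration.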
“By leveraging our state-of-the-art technology, we deliver advanced GenAI and AI solutions that empower organizations to achieve unprecedented performance and efficiency in their AI-driven operations,” said Ido Bukspan, CEO, Pliops. “As the industry’s leading HPC technical conference, SC24 is the ideal venue to showcase how our solutions redefine AI infrastructure, enabling faster, more sustainable innovation at scale.”
Highlights at the Pliops booth on the SC24 show floor of the Georgia World Congress Center, Atlanta, GA, include:
- XDP LightningAI running with Dell PowerEdge servers
- XDP enhancements for AI VectorDB
The company can also be found at the SC24 PetaFLOP reception at the College Football Hall of Fame on November 19.