AI DevWorld: Pliops Showcasing XDP LightningAI Solution

XDP LightningAI is a match for multi-turn conversation and task agent applications, and improves all models – including those by DeepSeek

With the growing demand for GenAI applications, optimizing large language model (LLM) inference efficiency and reducing costs have become essential.

[Image: Pliops vLLM use case]

Pliops Ltd. is empowering developers to tackle these challenges head-on. At AI DevWorld, Santa Clara, CA (Feb 11-13), the company will showcase its XDP LightningAI solution, which revolutionizes LLM performance by delivering end-to-end efficiency gains while significantly reducing cost, power, and computational requirements. By enabling vLLM to process each context only once, Pliops is setting a new standard for scalable and sustainable AI innovation.

As LLMs continue to grow in size and sophistication, their demands for computational power and energy also increase significantly. This growth introduces challenges such as a longer time to generate the first token of a response, due to the need to process extensive context. Notably, up to 99% of context data – such as conversation history, books, and domain-specific text – may be processed repeatedly during LLM inference. This repetition leads to inefficiencies, as models must continuously recompute key-value (KV) caches for unchanged information.
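The saving from reusing a precomputed context can be illustrated with a toy prefix cache. All names here are illustrative sketches, not Pliops or vLLM APIs; real KV caches hold per-layer key/value tensors in GPU memory, not token lists.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: stores a stand-in 'KV cache' per token prefix
    and counts how many tokens actually had to be (re)computed."""

    def __init__(self):
        self._store = {}          # prefix hash -> cached object
        self.recomputed_tokens = 0

    def _key(self, tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def prefill(self, tokens):
        """Return the KV cache for `tokens`, computing only the suffix
        that is not already cached."""
        # Find the longest cached prefix.
        for cut in range(len(tokens), 0, -1):
            if self._key(tokens[:cut]) in self._store:
                cached = cut
                break
        else:
            cached = 0
        # "Compute" KV entries only for the uncached tail.
        self.recomputed_tokens += len(tokens) - cached
        self._store[self._key(tokens)] = list(tokens)
        return self._store[self._key(tokens)]

cache = PrefixKVCache()
history = list(range(1000))            # 1,000-token conversation history
cache.prefill(history)                 # first turn: full prefill
cache.prefill(history + [1000, 1001])  # next turn: only the 2 new tokens
print(cache.recomputed_tokens)         # 1002 instead of 2002
```

Without the cache, the second turn would re-run prefill over all 1,002 tokens; with it, only the two new tokens are processed, which is the effect the "process each context only once" claim describes.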

[Image: Pliops XDP LightningAI scheme 1]

Pliops LightningAI: Boost for vLLM
XDP LightningAI, an accelerated distributed smart node for KV caching, introduces a new petabyte tier of memory below high-bandwidth memory (HBM) for GPU compute applications. It utilizes cost-effective, disaggregated smart storage to retain computed KV caches, allowing them to be retrieved if discarded from HBM. When serving a pre-processed context, the saved KV caches are efficiently loaded from storage, allowing vLLM to generate new content significantly faster.
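The tiering described above – a small, fast HBM tier backed by large disaggregated storage, with evicted caches retained rather than discarded – can be sketched as a two-tier store. The class, policies, and names below are assumptions for illustration, not the Pliops implementation.

```python
from collections import OrderedDict

class TieredKVStore:
    """Toy two-tier KV store: a capacity-limited, LRU 'HBM' tier backed
    by an effectively unbounded 'storage' tier. Evicted entries are
    retained in storage and reloaded on demand instead of recomputed."""

    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()   # LRU-ordered fast tier
        self.storage = {}          # slow but near-unlimited tier
        self.hbm_capacity = hbm_capacity
        self.hits = self.storage_hits = self.misses = 0

    def put(self, context_id, kv_blob):
        self.hbm[context_id] = kv_blob
        self.hbm.move_to_end(context_id)
        while len(self.hbm) > self.hbm_capacity:
            evicted_id, evicted_blob = self.hbm.popitem(last=False)
            self.storage[evicted_id] = evicted_blob  # retain, don't discard

    def get(self, context_id):
        if context_id in self.hbm:
            self.hits += 1
            self.hbm.move_to_end(context_id)
            return self.hbm[context_id]
        if context_id in self.storage:
            self.storage_hits += 1                 # loaded, not recomputed
            self.put(context_id, self.storage[context_id])
            return self.hbm[context_id]
        self.misses += 1                           # only here must we recompute
        return None

store = TieredKVStore(hbm_capacity=2)
store.put("chat-1", "kv-1")
store.put("chat-2", "kv-2")
store.put("chat-3", "kv-3")            # evicts chat-1 from HBM to storage
assert store.get("chat-1") == "kv-1"   # served from storage, no recompute
```

Only a `misses` path would force a full prefill; everything evicted from the fast tier remains one storage read away, which is the core of the cost and latency argument.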

[Image: Pliops XDP LightningAI scheme 2]

The company’s LLM inference solution is optimal for AI autonomous task agents, an emerging use case for LLMs. These agents function autonomously and are adept at addressing a diverse array of complex tasks through strategic planning, sophisticated reasoning, and dynamic interaction with external environments.

Pliops’ AI DevWorld demo, featuring multi-turn conversations, fundamentally supports autonomous task agents. At the show, Moshe Twitto, CTO and co-founder of Pliops, will deliver a presentation providing an overview and details of this groundbreaking capability. The session will take place on Thursday, February 13 at 10 a.m. PST, with a virtual session to follow on Thursday, February 20 at 10 a.m. PST.

XDP LightningAI fully saturates the fabric (including 400G and beyond), even when handling traffic with extremely small random I/O sizes for both read and write operations. It also facilitates sharing of KV caches across multiple GPUs, vLLM instances, and users. With virtually unlimited storage capacity, any portion of the cached context can be reused without re-computation, unlocking new levels of scalability and efficiency. 

XDP LightningAI connects to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed KV service. XDP LightningAI outperforms traditional Filesystem (FS) and DRAM-based solutions, addressing critical limitations in handling modern AI workloads. 

[Image: Pliops XDP LightningAI scheme 3]

The company’s technology is highly versatile and effective, supporting all advancements in LLMs. The recent announcement by DeepSeek and its innovations further reinforce Pliops’ competitive edge. Each of DeepSeek’s major architectural innovations either enhances or maintains the advantages of Pliops’ KV cache offloading solution.

  • MLA (KV compression) reduces KV cache size but does not lower compute, resulting in a net gain for Pliops. 
  • Speculative decoding reduces HBM bandwidth per token, making batching more efficient, which strengthens Pliops’ benefits. 
  • Prefill-decode disaggregation aligns with Pliops’ expected market direction, where its solution delivers up to 8x efficiency gains.
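To see why shrinking the KV cache matters even when compute is unchanged, a back-of-envelope size estimate helps. The model figures below are illustrative assumptions, not DeepSeek or Pliops specifications.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Approximate per-sequence KV cache size: keys + values (factor 2)
    for every layer and KV head, one entry per token, fp16/bf16 = 2 bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 70B-class model with grouped-query attention,
# serving a 32k-token context (all figures assumed):
full = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"{full / 2**30:.1f} GiB per sequence")   # 10.0 GiB per sequence
```

At roughly 10 GiB per long-context sequence, the cache is far too large to keep resident in HBM for many users at once; compression techniques like MLA shrink what must be stored and moved, while an offload tier avoids recomputing it.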

DeepSeek’s advancements underscore the robustness of the firm’s shared KV store solution. As new models emerge, the fundamental bottlenecks in memory bandwidth and I/O persist, ensuring that Pliops remains a critical enabler for high-performance AI inference. 

Live at AI DevWorld 
The company has focused on LLM inferencing, a crucial and rapidly evolving area within the GenAI world that demands significant efficiency improvements. The firm’s demo at AI DevWorld is centered around accelerating LLM inferencing applications. This same memory tier is seamlessly applicable to other GenAI applications that Pliops plans to introduce over the next few months.

“As the world’s largest artificial intelligence dev event, AI DevWorld provides the perfect platform to showcase how our solutions are transforming AI infrastructure, enabling developers to build faster, more sustainable, and scalable AI applications,” said Ido Bukspan, CEO of Pliops. “We’re excited to share how our technology is paving the way for faster, more efficient, and cost-effective AI innovation.”

Highlights at Pliops booth #912 on the AI DevWorld show floor at Santa Clara Convention Center, CA, include:

  • XDP LightningAI running with Dell PowerEdge servers 
  • XDP enhancements for AI VectorDB