San Diego HPC Center and Fungible Achieve 10 Million IO/s Storage Performance
Solution leverages NVMe over TCP standard and Storage Initiator card and storage target to unlock potential of rest of infrastructure.
This is a Press Release edited by StorageNewsletter.com on November 24, 2021 at 2:00 pm
Fungible Inc. and the San Diego HPC Center (SDSC) have shattered the NVMe over TCP storage initiator performance world record, achieving 10 million IO/s. (*)
Distributed AI, ML, and other data-centric workflows have long been constrained by the limitations of traditional RDMA, iSCSI, and FC-based storage protocols and products. The company’s solution leverages the NVMe over TCP standard and the firm’s Storage Initiator card and storage target to unlock the potential of the rest of the infrastructure. The performance announced exceeds the prior record by over 50%. That prior record was attained using the Fungible Storage Cluster without the benefit of the Storage Initiator cards. The cards delivered this increase in performance while simultaneously freeing up server resources to do other work.
FC200 HBA card
The experiment was performed under the auspices of SDSC’s Advanced Technology Lab. The ATL’s team of scientists and engineers surveys, evaluates, and assembles the computing and storage technologies needed for emerging scientific computing and data analysis systems.
Fungible Storage Cluster (FSC)
“While impressive from a performance perspective, the results of this testing are more about expanding the scope of what AI, ML, data analytics, and other data-centric environments can deliver,” said Eric Hayes, CEO, Fungible. “The Fungible Storage Initiator cards developed on our standards-based Fungible DPU free up tremendous amounts of server CPU resources to run application code, and the application now has faster access to data than it ever has before. Scale-out data centers, powered by Fungible, can now surpass their performance goals economically, reliably and securely.”
According to John Graham, UC San Diego senior development engineer working at SDSC and the Qualcomm Institute, “The Fungible solution has set a new bar for storage performance in our environment. The results are potentially transformational for large-scale scientific cyberinfrastructure such as the Pacific Research Platform (PRP) and its follow-on, the National Research Platform (NRP). With Fungible’s innovative DPU technology, we are able to deploy a high-performance storage solution that achieves our planned density and cost requirements,” he said. “The PRP and NRP are unique, multi-institutional distributed systems for conducting at-scale AI and data-intensive computing for scientific research in a wide area environment.”
“One of the challenges of doing distributed AI at scale is storage performance, both raw bandwidth and IO/s,” noted Frank Wuerthwein, interim director, SDSC, and principal investigator for the National Research Platform. “Fungible’s technology looks very promising in delivering the storage performance we need to achieve our future goals for a wide area, distributed AI and data science platform.”
“We are proud that AMD EPYC processors and their high-performance capabilities were able to help Fungible and SDSC showcase a new level of storage performance,” said Kumaran Siva, corporate VP, server software and systems, AMD (Advanced Micro Devices, Inc.). “Achievements like this have profound impacts on scale-out data centers around the world for scalability of storage technologies.”
Test methodology
The tests were administered from a Gigabyte R282-Z93 server with dual 64-core AMD EPYC 7763 processors and 2TB of memory. The 10 million IO/s benchmark was achieved using five Fungible Storage Initiator cards running on the PCIe bus of the server with the newly launched NVMe/TCP Storage Initiator (SI) software. The previous record of 6.55 million IO/s was achieved using Mellanox ConnectX-5 NICs, and it required almost completely saturating the CPU cores on the host AMD EPYC processor-powered server. The new record had the added advantage of using only 63% of the CPU cores to drive the higher performance, leaving more of the cores available for user applications.
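The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the per-card figure is a simple average (the release does not report per-card results), and the free-core count assumes that "63% of the CPU cores" refers to all 128 cores of the dual 64-core server.

```python
# Figures quoted in the press release.
NEW_RECORD_IOPS = 10_000_000       # new NVMe/TCP initiator record
PREVIOUS_RECORD_IOPS = 6_550_000   # prior record with ConnectX-5 NICs
INITIATOR_CARDS = 5                # Storage Initiator cards in the server
TOTAL_CORES = 2 * 64               # dual 64-core AMD EPYC 7763
CPU_FRACTION_USED = 0.63           # share of cores used for the new record


def improvement_percent(new_iops: float, old_iops: float) -> float:
    """Relative gain of the new result over the old one, in percent."""
    return (new_iops - old_iops) / old_iops * 100.0


def average_iops_per_card(total_iops: float, cards: int) -> float:
    """Average IO/s per card (assumes load was spread evenly)."""
    return total_iops / cards


def cores_left_free(total_cores: int, fraction_used: float) -> float:
    """Cores not consumed by storage I/O (assumes the 63% applies to all cores)."""
    return total_cores * (1.0 - fraction_used)


if __name__ == "__main__":
    gain = improvement_percent(NEW_RECORD_IOPS, PREVIOUS_RECORD_IOPS)
    per_card = average_iops_per_card(NEW_RECORD_IOPS, INITIATOR_CARDS)
    free = cores_left_free(TOTAL_CORES, CPU_FRACTION_USED)
    print(f"Improvement over prior record: {gain:.1f}%")
    print(f"Average IO/s per card: {per_card:,.0f}")
    print(f"Approximate cores left free: {free:.0f} of {TOTAL_CORES}")
```

Run as a script, this gives a gain of about 52.7%, consistent with the "over 50%" claim, and an average of 2 million IO/s per card under the even-spread assumption.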
(*) All information in this press release is based on publicly available data.