Cray to Deliver Exabyte Storage System for Frontier Exascale System at ORNL
Storage system portion of previously announced Frontier contract valued at more than $50 million, largest ClusterStor win to date
This is a Press Release edited by StorageNewsletter.com on June 28, 2019 at 2:20 pm
Cray Inc. announced that, as part of the Frontier CORAL-2 contract with the Department of Energy (DOE) and Oak Ridge National Laboratory (ORNL), Cray and ORNL have agreed to build a first-of-a-kind exabyte storage solution.
The next-generation Cray ClusterStor storage system will be integrated as part of ORNL’s Frontier exascale supercomputer, built on Cray’s Shasta architecture.
The storage solution is a new design for the data-intensive workloads of the exascale era and will be based on next-generation ClusterStor storage and the Cray Slingshot high-speed interconnect.
The storage system portion of the previously announced Frontier contract is valued at more than $50 million, making it the largest single ClusterStor win to date. The Frontier system is expected to be delivered in 2021.
The storage solution will be based on the next generation of the ClusterStor storage line and will comprise more than 1EB of hybrid flash and high-capacity storage running the Lustre parallel file system. It will be connected to ORNL’s Frontier system via the Slingshot system interconnect to enable scaling of diverse modeling, simulation, analytics and AI workloads running simultaneously on the system.
The Frontier system is anticipated to debut in 2021 as the world’s most powerful computer, with performance greater than 1.5 exaflops.
Compared to the storage for ORNL’s current Summit supercomputer, the next-generation solution offers more than four times the capacity (more than 1EB versus 250PB) and more than four times the throughput (up to 10TB/s versus 2.5TB/s) of the existing Spectrum Scale-based storage system.
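For readers checking the math, the quoted ratios follow directly from those figures, assuming decimal units (1EB = 1,000PB); the minimal sketch below simply recomputes them:

```python
# Back-of-the-envelope check of the quoted ratios, using the figures from
# this release and decimal units (1EB = 1,000PB). Illustrative only.
summit_capacity_pb = 250        # Summit's Spectrum Scale storage, in PB
frontier_capacity_pb = 1_000    # Frontier target: more than 1EB

summit_throughput_tbs = 2.5     # Summit throughput, in TB/s
frontier_throughput_tbs = 10.0  # Frontier target, in TB/s

print(frontier_capacity_pb / summit_capacity_pb)        # 4.0 -> "more than four times"
print(frontier_throughput_tbs / summit_throughput_tbs)  # 4.0
```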
The new Cray ClusterStor storage solution for ORNL will comprise more than 40 cabinets of storage and provide more than 1EB of total capacity across two tiers of storage to support both random and streaming access of data: a primary flash tier for high-performance scratch storage and a secondary HDD tier for high-capacity storage.
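As a rough illustration of how a two-tier design like this might route data, the sketch below shows a hypothetical placement policy; the tier names and size threshold are illustrative assumptions, not ORNL’s actual policy:

```python
# Hypothetical sketch of a two-tier placement policy of the kind described
# above: latency-sensitive or random-access data goes to the flash scratch
# tier, bulk streaming data to the HDD capacity tier. Tier names and the
# size threshold are illustrative assumptions, not ORNL's actual policy.
from dataclasses import dataclass

@dataclass
class FileProfile:
    size_bytes: int
    random_access: bool  # True for small/random I/O, e.g. AI training shards

def choose_tier(profile: FileProfile) -> str:
    # Random I/O gains the most from flash; large streaming files are
    # served economically from high-capacity HDDs.
    if profile.random_access or profile.size_bytes < 64 * 1024**2:
        return "flash-scratch"
    return "hdd-capacity"

print(choose_tier(FileProfile(size_bytes=4 * 1024**2, random_access=True)))     # flash-scratch
print(choose_tier(FileProfile(size_bytes=512 * 1024**3, random_access=False)))  # hdd-capacity
```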
The storage system will be a center-wide resource at ORNL in support of the Frontier exascale system. It will be accessed through the Lustre parallel file system, backed by ZFS local volumes, all within a single POSIX namespace, which will make it the largest single file system in the world.
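In practical terms, a single POSIX namespace means standard system calls apply to the whole file system. The sketch below queries capacity through Python’s os.statvfs; the mount point is a hypothetical example, not ORNL’s published path:

```python
# Because the center-wide system presents one POSIX namespace, ordinary
# system calls work against it unchanged. The mount point below is a
# hypothetical example, not ORNL's published path.
import os

mount_point = "/lustre/frontier"  # hypothetical Lustre mount point

st = os.statvfs(mount_point)
total_eb = st.f_blocks * st.f_frsize / 1e18
free_eb = st.f_bavail * st.f_frsize / 1e18
print(f"capacity: {total_eb:.3f} EB, free: {free_eb:.3f} EB")
```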
“We are excited to continue our partnership with ORNL to collaborate in developing a next generation storage solution that will deliver the capacity and throughput needed to support the dynamic new research that will be done on the Frontier exascale system for years to come,” said John Dinning, chief product officer, Cray. “By delivering a new hybrid storage solution that is directly connected to the Slingshot network, users will be able to drive data of any size, access pattern or scale to feed their converged modeling, simulation and AI workflows.”
HPC storage systems have traditionally utilized large arrays of HDDs accessed via large, predictable reads and writes of data. This stands in stark contrast to AI and machine learning (ML) workloads, which typically mix random and sequential access across both small and large data sizes. As a result, traditional storage systems are poorly suited to running these workloads together: the mixed access patterns demand both a blend of storage media and an intelligent high-speed system interconnect that can quickly move massive amounts of data on and off the supercomputer, so that diverse workloads can run simultaneously on exascale systems like Frontier.
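To make that contrast concrete, the short sketch below times one streaming pass over a file against many small reads at random offsets; the file name and sizes are illustrative, and results will vary by storage medium:

```python
# Minimal sketch contrasting the two access patterns described above: one
# streaming pass over a file (classic HPC checkpoint/restart I/O) versus
# many small reads at random offsets (closer to AI/ML training I/O).
# File name and sizes are illustrative.
import os, random, time

path = "sample.bin"
size = 64 * 1024**2  # 64 MiB test file
with open(path, "wb") as f:
    f.write(os.urandom(size))

# Sequential: large, predictable reads in one streaming pass.
t0 = time.perf_counter()
with open(path, "rb") as f:
    while f.read(8 * 1024**2):  # 8 MiB chunks
        pass
sequential_s = time.perf_counter() - t0

# Random: many 4 KiB reads at random offsets.
t0 = time.perf_counter()
with open(path, "rb") as f:
    for _ in range(10_000):
        f.seek(random.randrange(0, size - 4096))
        f.read(4096)
random_s = time.perf_counter() - t0

print(f"sequential: {sequential_s:.3f}s, random: {random_s:.3f}s")
os.remove(path)
```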
The next-generation ClusterStor-based storage solution addresses these challenges head-on by providing a blend of flash and capacity storage to support complex access patterns, a new software stack for improved manageability and tiering of data, and scaling across both compute and storage through a direct connection to the Slingshot high-speed network.
In addition to improved scaling, connecting storage directly to the Slingshot network eliminates the storage routers required in most traditional HPC networks. The result is lower cost, lower complexity and lower latency in the overall system, delivering higher application performance and ROI.
Additionally, since Slingshot is Ethernet-compatible, it can also interoperate with existing third-party network storage as well as with other data and compute sources.
Cray’s Shasta supercomputers, ClusterStor storage and the Slingshot interconnect are becoming the technology foundation for the exascale era, combining the performance and scale of HPC with the productivity of cloud computing and datacenter interoperability.
The new compute, software, storage and interconnect capabilities being pioneered for research labs like ORNL are being productized as standard offerings from Cray for research and enterprise customers alike, with expected availability starting at the end of 2019.