Lawrence Livermore HPC Cluster Catalyst With 281TB on SSDs, No HDDs

In partnership with Intel and Cray

The U.S. Department of Energy’s Lawrence Livermore National Laboratory (LLNL), in partnership with Intel Corp. and Cray, Inc., announced an HPC cluster that will serve research scientists at all three institutions and provide a proving ground for new HPC and big data technologies and architectures.

“As the name implies, Catalyst aims to accelerate HPC simulation and big data innovation, as well as collaborations between the three institutions,” said Matt Leininger, deputy for advanced technology projects, LLNL. “The partnership between Intel, Cray and LLNL allows us to explore different approaches for utilizing large amounts of high performance non-volatile memory in HPC simulation and big data analytics.”

The Catalyst resource, a Cray CS300 cluster supercomputer, will be shared among the three partners, with access rights based on level of investment. System access will be managed through LLNL’s HPC Innovation Center (HPCIC), whose mission is to work with industrial partners to develop computing solutions that help America compete in the 21st-century global economy.

Delivered to LLNL in late October, Catalyst is expected to be available for limited use this month and for general use by December. The cluster consists of two scalable units (SUs) and represents an upgrade from the Appro clusters acquired under the second Tri-Lab Linux Capacity Cluster (TLCC2) procurement a few years ago (Appro has since been acquired by Cray). TLCC aggregates the HPC capacity computing needs of the three weapons laboratories that serve the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) Program – Lawrence Livermore, Los Alamos and Sandia national laboratories – to procure commodity cluster systems more cost-effectively.

The 150 teraflop/s Catalyst cluster has 324 nodes and 7,776 cores, built on 12-core Intel Xeon E5-2695 v2 processors. It runs the NNSA-funded Tri-lab Open Source Software (TOSS) stack, which provides a common user environment across NNSA tri-lab clusters. Catalyst features include 128GB of DRAM per node, 800GB of NVRAM per compute node, 3.2TB of NVRAM per Lustre router node, and improved cluster networking with dual-rail Quad Data Rate (QDR-80) Intel TrueScale fabrics. The expanded node-local NVRAM storage tier, built on PCIe-attached Intel SSDs, allows the exploration of new approaches to application checkpointing, in-situ visualization, out-of-core algorithms and big data analytics.
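
As a rough sanity check, both the core count and the headline’s 281TB of SSD capacity can be reproduced from the per-node figures. Note that the split of the 324 nodes into 304 compute nodes and 12 Lustre router nodes is an assumption used here for illustration; the article itself gives only the node total and the per-node capacities.

# Back-of-the-envelope check (Python); the 304/12 node split is assumed
cores_per_node = 2 * 12            # dual-socket, 12-core Xeon E5-2695 v2
print(324 * cores_per_node)        # 7776 cores, as stated

compute_nodes = 304                # assumed compute-node count
router_nodes = 12                  # assumed Lustre router-node count
ssd_gb = compute_nodes * 800 + router_nodes * 3200
print(ssd_gb / 1000)               # 281.6 (TB), consistent with the ~281TB headline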

“Big Data unlocks an entirely new method of discovery by deriving the solution to a problem from the massive sets of data itself. To research new ways of translating big data into knowledge, we had to design a one-of-a-kind system,” said Raj Hazra, Intel VP and GM of the technical computing group. “Equipped with the most powerful Intel processors, fabrics and SSDs, the Catalyst cluster will become a critical tool, providing insights into the technologies required to fuel innovation for the next decade.”

The Catalyst architecture is expected to provide insights into the kinds of technologies the ASC program will require over the next 5-10 years to meet its high-performance simulation and big data computing mission needs. The increased storage capacity of the system (in both volatile and non-volatile memory) represents a departure from the classic simulation-based computing architectures common at DOE laboratories and opens new opportunities for exploring the potential of combining floating-point-focused capability with data analysis in one environment. Consequently, the insights provided by Catalyst could become a basis for future commodity technology procurements.

HPCIC at LLNL will offer access to Catalyst, and the big data innovations it is expected to enable, as new options for its ongoing collaborations with American companies and research institutions. The machine’s expanded DRAM and fast, persistent NVRAM are suited to big data problems (e.g., bioinformatics, business analytics, machine learning and natural language processing), as well as to meeting the increasingly demanding simulation requirements of ASC. Catalyst should extend the range of possibilities for the processing, analysis and management of the ever-larger and more complex datasets that many areas of business and science now confront.

“We expect this collaboration to serve as a model for the kind of R&D that promotes HPC innovation,” said Fred Streitz, director, HPCIC. “Such innovation is critical to maintaining U.S. leadership in HPC.”

“Cray firmly believes that collaboration is a vital element of introducing new innovations to the worlds of big data and supercomputing, and we are honored that a Cray CS300 cluster supercomputer is the foundation of this important project with our partners at Intel and Lawrence Livermore,” said Daniel Kim, SVP and GM of cluster solutions, Cray.
