R&D: I/O and Storage, ETP4HPC SRA 6 WP

White paper released as part of ETP4HPC’s Strategic Research Agenda 6.

Zenodo, the EU Open Research Repository, has published the white paper, written by Sarah M. Neuwirth (editor), Johannes Gutenberg University Mainz; Philippe Deniel (editor), CEA (French Alternative Energies and Atomic Energy Commission); Jean-Thomas Acquaviva, DDN; Martin Golasowski, VSB – Technical University of Ostrava; Michael Hennecke, Intel Deutschland GmbH; Adrian Jackson; Thomas Leibovici, CEA; Jakob Luettgau, Inria Rennes Bretagne Atlantique Research Centre; and Ramon Nou, Barcelona Supercomputing Center.

Abstract: Exascale represents a major leap in the evolution of supercomputing, not just another step after Terascale and Petascale. From a storage perspective, it marks a significant shift in system architecture. The immense compute power of Exascale introduces new challenges that render some current technologies insufficient, as they won’t scale effectively. This calls for exploring new approaches to interfacing storage systems with supercomputers. At the same time, existing technologies must adapt and evolve to meet the demands of the Exascale era.

Some upcoming challenges in HPC may appear to be familiar but take on new forms. System scalability remains a key issue, but now must be considered alongside data scalability, especially with the rise of AI frameworks that place increased pressure on storage systems from both data and metadata perspectives. The evolution of compute hardware, particularly the widespread adoption of GPUs, has significantly impacted HPC, forcing a reconsideration of several paradigms, many of which are tied to storage systems. Storage hardware is also evolving, with broader adoption of SSDs and the introduction of ultrafast NVMe buses. The storage hierarchy is shifting from the traditional two-tier model of tapes and rotating disks to a deeper, more complex structure with three or four levels. This new landscape complicates data placement, as data becomes increasingly heterogeneous, and previously unsolvable problems are now addressable in the Exascale era.
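To make the data-placement problem concrete, here is a minimal sketch of a tier-selection policy for such a deepened hierarchy. It is an illustrative assumption only: the tier names, thresholds, and the DatasetProfile/select_tier helpers are hypothetical and do not come from the white paper.

```python
# Hypothetical sketch of a tier-selection policy for a deepened storage
# hierarchy (node-local NVMe, burst buffer, parallel file system, archive).
# Tier names and thresholds are illustrative only, not taken from the paper.

from dataclasses import dataclass

@dataclass
class DatasetProfile:
    size_bytes: int           # total size of the dataset
    accesses_per_hour: float  # observed or predicted access frequency
    lifetime_hours: float     # how long the data must remain available

def select_tier(profile: DatasetProfile) -> str:
    """Return the name of a storage tier for the given access profile."""
    hot = profile.accesses_per_hour >= 10
    small = profile.size_bytes < 100 * 2**30           # under ~100 GiB
    short_lived = profile.lifetime_hours <= 24

    if hot and small:
        return "node-local NVMe"        # lowest latency, limited capacity
    if hot or short_lived:
        return "burst buffer"           # shared flash tier for staging
    if profile.lifetime_hours <= 24 * 90:
        return "parallel file system"   # capacity tier for active projects
    return "archive"                    # tape or cold object storage

if __name__ == "__main__":
    checkpoint = DatasetProfile(size_bytes=50 * 2**30,
                                accesses_per_hour=30,
                                lifetime_hours=2)
    print(select_tier(checkpoint))      # -> "node-local NVMe"
```

A production policy would also have to weigh tier occupancy, data heterogeneity, and energy cost, which is precisely what makes placement harder as the hierarchy deepens.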

The first essential software component is middleware. Beyond scalability, energy efficiency, and performance, the increased complexity demands a focus on standardization, interoperability, resilience, fault tolerance, and security. As data scales to tens of exabytes and metadata to hundreds of terabytes, minimizing data movement becomes crucial. This, however, is at odds with emerging paradigms such as deeper storage hierarchies and disaggregated storage, which require optimizing long-distance data transfers and placing applications close to storage. Another strategy is reducing the amount of data transferred, for example through computational storage, where some processing occurs directly on the storage system. Both approaches partially address the requirements of full scientific workflows at large scale, which go beyond a single large application running on a single system.

The sheer growth of data challenges traditional file systems, which struggle to scale sufficiently. Object stores offer an efficient alternative but introduce their own issues, particularly around interoperability and security. Heavily used in cloud systems, object stores foster a partial convergence between HPC and cloud computing. These Exascale-driven advancements necessitate new Key Performance Indicators (KPIs), shifting the focus beyond pure performance metrics. Energy consumption in particular becomes a critical factor, given the immense scale of Exascale systems, as do more varied I/O patterns.
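As an illustration of the transfer-reduction idea, the sketch below contrasts the traditional "read everything, then compute" pattern with a computational-storage style request that runs a reduction next to the data. The StorageService class and its get/reduce methods are purely hypothetical stand-ins, not an API described in the paper or offered by any specific product.

```python
# Hypothetical sketch contrasting "move data to compute" with a
# computational-storage style "move compute to data" request.

import numpy as np

class StorageService:
    """Toy stand-in for a storage system holding large numeric datasets."""
    def __init__(self):
        self._objects = {"sensor/run42": np.random.rand(10_000_000)}

    def get(self, key: str) -> np.ndarray:
        # Traditional path: ship the whole object over the interconnect.
        return self._objects[key]

    def reduce(self, key: str, op: str) -> float:
        # Computational-storage path: run the reduction next to the data
        # and ship back only the (tiny) result.
        data = self._objects[key]
        return {"sum": data.sum, "mean": data.mean, "max": data.max}[op]()

store = StorageService()

# ~80 MB crosses the interconnect, then is reduced on the compute node.
mean_on_compute = store.get("sensor/run42").mean()

# Only a single float crosses the interconnect.
mean_on_storage = store.reduce("sensor/run42", "mean")
```

The second call moves a single scalar instead of tens of megabytes, which is the data-movement reduction the abstract points to; the trade-off is that the storage side must expose and secure such processing interfaces, feeding directly into the interoperability and security concerns raised above.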
