
SciNet Consortium Relies on Excelero

For petascale storage at the University of Toronto’s HPC centre

Excelero, Inc., a provider of software-defined block storage, announced that customer SciNet HPC Consortium has deployed its NVMesh server SAN as the storage behind a supercomputer at the University of Toronto.

By using NVMesh as a burst buffer – a storage architecture that helps ensure high availability and strong ROI – SciNet created a unified pool of distributed NVMe flash that retains the speed and latency of directly attached storage media while meeting the demanding SLAs of the new supercomputer.

“For SciNet, NVMesh is an extremely cost-effective method of achieving unheard-of burst buffer bandwidth,” said Dr. Daniel Gruner, CTO, SciNet. “By adding commodity flash drives and NVMesh software to compute nodes, over a low-latency network fabric that was already provided for the supercomputer itself, NVMesh provides redundancy without impacting target CPUs. This enables standard servers to go beyond their usual role of acting as block targets – the servers can now also act as file servers.”

Based in Toronto, SciNet, Canada’s largest supercomputer centre, serves thousands of researchers in biomedical, aerospace, climate sciences, and more.

Their large-scale modelling, simulation, analysis and visualisation applications sometimes run for weeks, and an interruption can destroy the result of an entire job. To avoid such losses, SciNet implemented a burst buffer – a fast intermediate layer between the non-persistent memory of the compute nodes and the main storage – to enable fast check-pointing, so that interrupted jobs can be restarted from the last checkpoint rather than from the beginning.
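To make the check-pointing pattern concrete, here is a minimal sketch of how an application might write periodic checkpoints to a fast burst-buffer tier. The mount point, file naming and atomic-rename pattern are illustrative assumptions, not details from the announcement; SciNet’s jobs actually write through the parallel file system layered on the burst buffer.

```python
import os
import pickle

BURST_BUFFER = "/mnt/burst_buffer"   # hypothetical NVMe-backed mount point

def checkpoint(state, step):
    """Persist job state to the fast tier, then rename atomically.

    Writing to a temporary file first means a crash mid-write never
    corrupts the last good checkpoint.
    """
    tmp = os.path.join(BURST_BUFFER, f"ckpt_{step:06d}.pkl.tmp")
    final = os.path.join(BURST_BUFFER, f"ckpt_{step:06d}.pkl")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # ensure the bytes reach the device
    os.rename(tmp, final)      # atomic on POSIX file systems
    return final

def restore(path):
    """Reload the last good checkpoint after an interruption."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because the write completes at NVMe speeds, compute nodes return to useful work quickly; the checkpoint can then be drained to slower disk asynchronously.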

SciNet had deployed the IBM Spectrum Scale (GPFS) shared parallel file system on its spinning-disk storage, but at scale, as individual jobs grow larger, check-pointing to disk can take too long to complete, making the calculation difficult or even impossible to carry out.

Using NVMesh in a burst buffer implementation, SciNet created a petascale storage system that leverages the performance of NVMe SSDs at scale, over the network – meeting SLA requirements for completing checkpoints in 15 minutes, without needing costly proprietary arrays. NVMesh created a unified, distributed pool of NVMe flash storage comprising 80 NVMe devices in just 10 NSD protocol-supporting servers. This provided approximately 148GB/s of write burst (device limited) and 230GB/s of read throughput (network limited) – in addition to well over 20 million random 4K IO/s.
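For a rough sense of what those figures buy, here is a back-of-the-envelope calculation (our arithmetic, based only on the numbers quoted above) of the checkpoint budget implied by the 15-minute SLA:

```python
# Checkpoint budget implied by the quoted burst-buffer figures.
write_bw_gb_s = 148      # aggregate write burst in GB/s (device limited)
sla_s = 15 * 60          # 15-minute checkpoint SLA in seconds

budget_tb = write_bw_gb_s * sla_s / 1000
print(f"~{budget_tb:.0f} TB of job state per checkpoint window")
# -> ~133 TB can be flushed within the SLA, so even the full memory
#    image of a very large job fits comfortably in one window
```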

Emulating the ‘shared nothing’ architectures of the tech giants, the NVMesh deployment allows SciNet to use hardware from any storage, server and networking vendor, eliminating vendor lock-in. Integration with SciNet’s parallel file system is straightforward, and the system enables SciNet to scale both capacity and performance linearly as its research load grows.

“Mellanox interconnect solutions include smart and scalable NVMe accelerations that enable users to maximise their storage performance and efficiency,” said Gilad Shainer, VP marketing, Mellanox Technologies, Ltd. “Leveraging the advantages of IB, Excelero delivers world-leading NVMe platforms, accelerating the next generations of supercomputers.”

“In supercomputing, any unavailability wastes time, reduces the availability score of the system and impedes the progress of scientific exploration. We’re delighted to provide SciNet and its researchers with important storage functionality that achieves the highest performance available in the industry at a reduced price – while assuring vital scientific research can progress swiftly,” said Lior Gal, CEO and co-founder, Excelero.
