Australian Walter and Eliza Hall Institute of Medical Research Selecting Vast Data and Xenon
To accelerate scientific research and tackle data growth
This is a Press Release edited by StorageNewsletter.com on October 19, 2022 at 2:00 pm

WEHI (Walter and Eliza Hall Institute of Medical Research) is where the world’s brightest minds collaborate and innovate to make life-changing scientific discoveries that help people live healthier for longer.
Its medical researchers have been serving the community for more than 100 years, making transformative discoveries in cancers, infectious and immune diseases, developmental disorders and healthy ageing.
In 2017, WEHI worked with Xenon to implement a private cloud solution for its HPC environment. At that time, approximately 90 bioinformaticians and biologists were processing research data on the HPC cluster. The Xenon private cloud implementation allowed the researchers to containerise their workloads, making their experiments more repeatable. The private cloud approach also allowed for more efficient job scheduling and delivered better compute resource allocation.
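The release does not name the container tooling WEHI uses, but on HPC clusters a runtime such as Apptainer (formerly Singularity) is typical. As a minimal sketch of the idea, assuming Apptainer and purely illustrative image and data paths, pinning a workload to a fixed software environment might look like this:

```python
import subprocess

def run_containerised(image, command, bind_dirs=()):
    """Run a workload inside a fixed container image so every run
    sees an identical software environment (illustrative helper;
    WEHI's actual tooling is not described in the release)."""
    args = ["apptainer", "exec"]
    for d in bind_dirs:
        args += ["--bind", d]  # expose scratch/data paths inside the container
    args += [image, *command]
    subprocess.run(args, check=True)

# Hypothetical example: image path, tool and data paths are placeholders.
run_containerised(
    "/images/genomics-tools.sif",
    ["samtools", "sort", "-o", "sorted.bam", "input.bam"],
    bind_dirs=["/scratch"],
)
```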
The HPC cluster utilised a traditional hierarchical storage management (HSM) system: a tier of hybrid disks (flash and HDDs) used as scratch cache, and a tier of capacity HDDs with automated tiering to a large tape library, used for archive and long-term storage. The advantage of this tiered system was that researchers were allocated working storage space both within the scratch space and within the capacity tier – to the researcher this presented as unlimited storage capacity, with older data tiered to tape.
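For illustration only (this is not WEHI’s HSM software), the policy such a system automates can be sketched as “move files that have not been accessed for some time down to the archive tier”:

```python
import os
import shutil
import time

AGE_DAYS = 90               # illustrative policy threshold
ARCHIVE_ROOT = "/archive"   # placeholder path standing in for the tape tier

def tier_old_files(capacity_root):
    """Move files untouched for AGE_DAYS from the capacity tier to the
    archive tier. A real HSM leaves a stub behind and recalls the file
    transparently on access; this sketch shows only the policy."""
    cutoff = time.time() - AGE_DAYS * 86400
    for dirpath, _dirs, files in os.walk(capacity_root):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_atime < cutoff:  # last accessed before cutoff
                dst = os.path.join(ARCHIVE_ROOT,
                                   os.path.relpath(src, capacity_root))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
```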
Data Growth and New Research Tools
Since 2017, WEHI has introduced new research tools and instruments, such as the cryo-electron microscope (cryo-EM). Cryo-EM allows researchers to see biological actions as they happen, frozen in time at the molecular level. To achieve these results and this level of insight, the cryo-EM researchers process large, high-resolution images to extract micrographs and 3D interpretations of their data. This process alone requires storage with high streaming read/write performance as well as a high level of IO/s.
The amount of genomic sequencing data generated at WEHI has also increased massively over the last 5 years.
“It’s in the order of hundreds of millions of files. Lots of sequencing data; some researchers have directories with over 200,000 individual files,” stated Tim Martin, WEHI senior ITS research systems engineer.
He also noted that many file systems struggle at this scale.
Miguel Esteva, WEHI senior ITS research systems engineer, noted: “We have a little bit of everything. We have massive imaging files, and some divisions that create millions of tiny files.”
This variation in file sizes and counts is a key design issue for storage and data management.
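One reason directories of that size hurt is that naive listings materialise every entry at once, and per-file stat calls multiply the metadata load. A small sketch, with a hypothetical path and suffix, of streaming such a directory instead:

```python
import os

def iter_large_dir(path, suffix=".fastq.gz"):
    """Stream entries from a very large directory (e.g. 200,000+ files)
    without building the full listing in memory; scandir's cached stat
    results avoid an extra metadata call per file."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.name.endswith(suffix):
                yield entry.name, entry.stat(follow_symlinks=False).st_size

# Hypothetical usage: total bytes of sequencing files in one run directory.
# total = sum(size for _name, size in iter_large_dir("/data/seq/run42"))
```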
While the HSM storage environment provided large capacity, it had performance limitations: the system was originally intended to manage the researchers’ large data sets and to provide data protection through replication to tape, rather than to deliver high IO/s.
Hunt for Performance
The WEHI team needed scratch storage that would not only meet its streaming requirements, but also cope with an expanding IO/s load and provide for future growth. With Xenon, the team explored the range of offerings from storage vendors.
Martin said: “We were really attracted to the architecture of Vast. It was clearly fresh, simple and scalable. Other vendors had legacy architecture issues like east-west traffic, or scalability issues – we have a reputation for pushing storage and compute to its breaking point due to our ever-growing data sets. If there is a limit, we will find it.”
He added: “We also wanted to avoid drivers and go with a more flexible open standard protocol. We’re in a constant battle updating drivers and clients between current storage and filesystems.”
Future scalability was another key consideration.
“Vast also offered a seamless expansion path to add capacity and performance without disrupting users,” he said.
While Vast was a new storage company and new to Australia, the recommendation from Xenon helped.
“Xenon has always supplied quality that has been proven in the test of time,” explained Esteva.
Science Discoveries Are Now Vast
WEHI procured an initial 676TB of Vast storage from Xenon, implemented as new HPC scratch storage. This was presented to applications as a cache area, or as a cache and data area, depending on the application’s requirements. The Vast storage was initially provided to applications requiring high IO/s with random data access patterns, such as the cryo-EM team’s processing of their image files.
Initial site acceptance testing showed an acceleration of the new research tools that required high-IO/s scratch storage: CryoSPARC ran 5-8x faster when using Vast as a cache, Relion 2D classification was 10-17x faster when using Vast as a cache and data area, and Samtools sort was 2-3.5x faster.
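The release does not describe the benchmark procedure; a plausible way to reproduce a figure like the Samtools one is to time the same sort with its temporary files on each storage tier (the paths below are placeholders, and samtools is assumed to be on PATH):

```python
import subprocess
import time

def time_sort(bam_path, tmp_dir, out_path):
    """Time `samtools sort` with its temporary files on a given tier;
    running it once per tier gives a before/after speed-up ratio."""
    start = time.perf_counter()
    subprocess.run(
        ["samtools", "sort", "-T", f"{tmp_dir}/sort_tmp",
         "-o", out_path, bam_path],
        check=True,
    )
    return time.perf_counter() - start

# Hypothetical comparison run:
# old = time_sort("sample.bam", "/old_scratch", "sorted_old.bam")
# new = time_sort("sample.bam", "/vast_scratch", "sorted_new.bam")
# print(f"speed-up: {old / new:.1f}x")
```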
With Vast, IO/s from the storage is no longer a bottleneck for these research applications. The Vast storage is achieving close to network saturation at 50Gb/s between the compute nodes and the storage, and hitting over 300,000 IO/s. These results have encouraged wider adoption of Vast, with the cryo-EM users moving across to Vast to support their work.
Martin noted: “Feedback from users is that it has greatly improved the performance of their workloads.”
Importantly, from a research perspective, he stated: “Time to result has been reduced, and the scale of research we can run has massively increased. And from our team’s perspective we have more visibility into what the storage is doing, we have really good support from Vast and Xenon, and we’ve never had any kind of issues. We spend less time troubleshooting problems with the filesystem, and that’s time we can spend on other things that might help the researchers more.”
Further Enhancements
Martin and Esteva recently rolled out a snapshot feature for specific use cases. Researchers have to manage their data within their Vast allocation, and snapshots are being used to guard against accidental deletions.
Martin explained: “The system allows users to do this kind of data recovery themselves, using the snapshots.”
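Assuming snapshots are surfaced to users as a read-only .snapshot directory at the root of their allocation, as is common on NAS platforms (the release does not spell out the mechanism), self-service recovery reduces to a copy; all names below are placeholders:

```python
import os
import shutil

def restore_from_snapshot(share_root, snapshot_name, rel_path):
    """Copy an accidentally deleted file back from a read-only
    snapshot into the live file system."""
    src = os.path.join(share_root, ".snapshot", snapshot_name, rel_path)
    dst = os.path.join(share_root, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copy2 preserves timestamps and permissions
    return dst

# Hypothetical usage:
# restore_from_snapshot("/vast/projects/lab_a",
#                       "hourly_2022-10-18_0900",
#                       "results/counts.csv")
```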
The WEHI team has also expanded its Vast storage, adding another 676TB, and found this a painless experience – just adding the node and, according to Martin: “the system auto-balanced the metadata and it just worked for the end users instantly.”
With no clients to update and no manual data migration, the expansion made the full 1.3PB available without tuning or manual intervention.
Now that Vast has provided a performance storage platform, the next challenges for the WEHI team lie in data management: providing further innovation to the researchers who are creating, storing and protecting the data that is the foundation of modern medical research.