AWS Amazon FSx for Lustre Supports Elastic Fabric Adapter and Nvidia GPUDirect Storage
Provides fastest storage performance for GPU instances in the cloud, delivering up to 12x higher throughput per client instance (1,200Gb/s) compared to previous FSx for Lustre systems
This is a Press Release edited by StorageNewsletter.com on December 3, 2024 at 2:51 pm
Amazon FSx for Lustre, a service that provides high-performance, cost-effective, and scalable file storage for compute workloads, now supports Elastic Fabric Adapter (EFA) and Nvidia GPUDirect Storage (GDS).
With this launch, Amazon FSx for Lustre provides the fastest storage performance for GPU instances in the cloud, delivering up to 12x higher throughput per client instance (1,200Gb/s) compared to previous FSx for Lustre systems, so you can complete ML training jobs faster and reduce workload costs.
EFA improves workload performance by using the AWS Scalable Reliable Datagram (SRD) protocol to increase network throughput utilization and by bypassing the operating system during data transfer. For applications powered by high-performance computing instances such as Trn1 and Hpc7a, you can use EFA to achieve higher throughput per client instance. GDS support builds on EFA to further enhance performance by enabling direct data transfer between the file system and GPU memory. This direct path eliminates memory copies and CPU involvement in data transfer operations. With the combination of EFA and GDS support, applications using P5 GPU instances and Nvidia Compute Unified Device Architecture (CUDA) can achieve up to 12x higher throughput (up to 1,200Gb/s) per client instance.
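To put the headline numbers in perspective, the short Python sketch below works through the arithmetic. It assumes that the 1,200 figure is in gigabits per second and that "12x higher" means 12 times the previous per-client ceiling; the 10 TB dataset size and all function names are hypothetical, chosen only for illustration. The resulting times are idealized lower bounds that ignore protocol overhead and file system limits.

```python
def baseline_gbps(new_gbps: float, speedup: float) -> float:
    """Per-client throughput implied for earlier systems, in Gb/s."""
    return new_gbps / speedup


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def ideal_read_seconds(dataset_gigabytes: float, gbps: float) -> float:
    """Best-case time to stream a dataset once at the given line rate."""
    return dataset_gigabytes / gbps_to_gigabytes_per_s(gbps)


if __name__ == "__main__":
    new_rate = 1200.0                            # Gb/s, from the announcement
    old_rate = baseline_gbps(new_rate, 12.0)     # implied earlier ceiling
    dataset = 10_000.0                           # hypothetical 10 TB dataset, in GB

    print(f"implied previous ceiling: {old_rate:.0f} Gb/s")
    print(f"new ceiling in bytes:     {gbps_to_gigabytes_per_s(new_rate):.0f} GB/s")
    print(f"ideal read, previous:     {ideal_read_seconds(dataset, old_rate):.0f} s")
    print(f"ideal read, new:          {ideal_read_seconds(dataset, new_rate):.0f} s")
```

Under these assumptions, the implied earlier ceiling is 100 Gb/s per client, and 1,200 Gb/s corresponds to 150 GB/s.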
EFA and GDS support is available at no additional cost on new FSx for Lustre Persistent-2 file systems in all commercial AWS Regions where Persistent-2 file systems are available.
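As a rough illustration of what opting in on a new Persistent-2 file system might look like, the sketch below builds a request dictionary shaped for the FSx `CreateFileSystem` API as exposed by boto3. It is not taken from the announcement: the `EfaEnabled` field reflects our understanding of the Lustre configuration options, the helper name, capacity, and throughput values are hypothetical, and EFA-enabled file systems may carry additional requirements (for example around metadata configuration or minimum capacity), so check the FSx API reference before relying on it. The function only assembles the request; it does not call AWS.

```python
from typing import Any


def build_efa_lustre_request(
    subnet_id: str,
    security_group_id: str,
    storage_gib: int = 4800,              # hypothetical capacity, in GiB
    throughput_mbps_per_tib: int = 250,   # hypothetical per-unit throughput tier
) -> dict[str, Any]:
    """Assemble keyword arguments for an FSx create_file_system call.

    Assumption: LustreConfiguration accepts an EfaEnabled boolean that
    turns on EFA (and with it GDS) for a new PERSISTENT_2 file system.
    """
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": throughput_mbps_per_tib,
            "EfaEnabled": True,
        },
    }


if __name__ == "__main__":
    # Placeholder IDs; substitute real resources before use.
    request = build_efa_lustre_request("subnet-EXAMPLE", "sg-EXAMPLE")
    print(request["LustreConfiguration"])
    # With credentials configured, the request could be submitted as:
    #   import boto3
    #   boto3.client("fsx").create_file_system(**request)
```

Keeping request construction separate from the API call makes the configuration easy to inspect and review before any resources are created.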
Resources:
Amazon FSx for Lustre documentation
Blog: Amazon FSx for Lustre increases throughput to GPU instances by up to 12x.