FMS: Xilinx Alveo U50 Data Center Accelerator Low-Profile PCIe Gen 4 Card

Xilinx, Inc. expanded its Alveo data center accelerator card portfolio with the Alveo U50.

The U50 card is a low profile adaptable accelerator with PCIe Gen 4 support, designed to supercharge a range of critical compute, network and storage workloads, all on one reconfigurable platform.

The Alveo U50 provides customers with a programmable, low-profile, low-power accelerator platform built for scale-out architectures and domain-specific acceleration in any server deployment: on premises, in the cloud and at the edge. To meet the challenges of emerging dynamic workloads such as cloud microservices, it delivers 10x to 20x improvements in throughput, latency and power efficiency. For accelerated networking and storage workloads, the U50 card helps developers identify and eliminate latency and data movement bottlenecks by moving compute closer to the data.

Powered by the Xilinx UltraScale+ architecture, the Alveo U50 card is the first in the Alveo portfolio to be packaged in a half-height, half-length form factor with a low 75W power envelope. The card features high-bandwidth memory (HBM2), 100Gb/s networking connectivity, and support for the PCIe Gen 4 and CCIX interconnects.

By fitting into standard PCIe server slots and using one-third the power, the Alveo U50 expands the scope in which adaptable acceleration can be deployed to unlock dramatic throughput and latency improvements for demanding compute, network and storage workloads. The 8GB of HBM2 delivers over 400Gb/s data transfer speeds and the QSFP ports provide up to 100Gb/s network connectivity. The high-speed networking I/O also supports advanced applications like NVMe-oF (NVM Express over Fabrics) solutions, disaggregated computational storage and specialized financial services applications.

From ML inference, video transcoding and data analytics to computational storage, electronic trading and financial risk modeling, it brings programmability, flexibility, and high throughput and low latency performance advantages to any server deployment. Unlike fixed architecture alternatives, the software and hardware programmability of the Alveo U50 allows customers to meet changing demands and optimize application performance as workloads and algorithms continue to evolve.

Alveo U50 accelerated solutions deliver customer value across a range of applications, including:

Deep learning inference acceleration (speech translation): delivers up to 25x lower latency, 10x higher throughput and improved power per node compared to GPU-only for speech translation performance⁽¹⁾;
Data analytics acceleration (database query): running the TPC-H Query benchmark, Alveo U50 delivers 4x higher throughput per hour and reduces operational costs by 3x compared to in-memory CPU⁽²⁾;
Computational storage acceleration (compression): delivers 20x more compression/decompression throughput, faster Hadoop and big data analytics, and over 30% lower cost per node compared to CPU-only nodes⁽³⁾;
Network acceleration (electronic trading): delivers 20x lower latency and sub-500ns trading time compared to CPU-only latency of 10µs⁽⁴⁾;
Financial modeling (grid computing): running the Monte Carlo simulation, delivers 7x power efficiency compared to GPU-only performance⁽⁵⁾ for a faster time to insight, deterministic latency and reduced operational costs.

“Ever-growing demands on the data center are pushing existing infrastructure to its limit, driving the need for adaptable solutions that can optimize performance across a broad range of workloads and extend the lifecycle of existing infrastructure, ultimately reducing TCO,” said Salil Raje, EVP and GM, data center group, Xilinx. “The new Alveo U50 brings an optimized form factor and unprecedented performance and adaptability to data center workloads, and we continue to build out solution stacks with a growing ecosystem of application partners to deliver previously unthinkable capabilities to a range of industries.”

“The forthcoming 2nd Gen AMD EPYC processor is ideally suited for data center-first accelerators like the Alveo U50 that combine compute, network and storage acceleration all on the same platform,” said Raghu Nambiar, VP and CTO, application engineering, Advanced Micro Devices, Inc. “Taking advantage of AMD’s leadership, first x86 server-class PCIe 4.0 CPU, the Alveo U50 will be the industry’s first adaptable accelerator card with PCIe 4.0 support. We look forward to working with Xilinx to bring the combined benefits of AMD EPYC based solutions and Alveo acceleration to hyperscale and enterprise customers.”

“IBM is excited about the expansion of the Xilinx Alveo portfolio with the addition of the Alveo U50 adaptable accelerator card,” said Steve Fields, chief architect, IBM Power Systems, IBM Corp. “We believe the combination of low-profile form-factor, HBM2 memory performance, and PCIe Gen 4 speed to interface with IBM Power processors will enable the OpenPOWER ecosystem to provide cutting edge adaptable acceleration solutions.”

“With the smaller design and advanced features of the Alveo U50, Xilinx is well positioned to expand the markets for acceleration with configurable logic,” said Karl Freund, senior analyst, HPC and deep learning, Moor Insights and Strategy. “The new Alveo U50 should allow them to break through the market noise with demonstrated and dramatic performance advantages in high-growth use cases.”

“We are excited to be collaborating with Xilinx at FMS, showcasing the flexibility and performance of the Alveo U50 and our OpenFlex composable NVMe-oF platform,” said Scott Hamilton, senior director, product management, data center systems business unit, Western Digital Corp. “Xilinx is leading the charge in fabric-based computational storage using NVMe-oF to enable full disaggregation of server resources. We believe the new Alveo U50 will be an important part of the ecosystem as organizations take a truly disaggregated approach to SDS infrastructure.”

The Alveo U50 is sampling with OEM system qualifications in process. Availability is slated for fall 2019.

⁽¹⁾ Performance of Alveo U50, with both Alveo U50 and Nvidia Tesla T4 running (B=2, L=8), Tesla T4 (B=8, L=8) (estimated data)
⁽²⁾ Alveo U50=24ms, 150k query/hr / CPU Query time = 210ms, 34k query/hr. based on Intel Xeon Platinum 8260 Processor (35.75M Cache, 2.40 GHz) 24 core
⁽³⁾ Intel Skylake-SP 6152 @2.10GHz CPU (Ubuntu 16.04); Xilinx Alveo U50, SDAccel 2018.3 (estimate). Compression throughput per CPU core = 0.0229GB/s; Alveo U50 = 10GB/s (estimate)
⁽⁴⁾ Alveo U50 latency is <0.5us, CPU latency is 10us. Measured from start of packet in on the tick (market data) to start of packet out on the order (estimate)
⁽⁵⁾ Intel Xeon E5-2697 v4, GCC 5.4.0; Nvidia Tesla V100 16GB PCIe, CUDA 10.1 / GCC 5.4.0; Intel Skylake-SP 6152 @2.10GHz CPU (Ubuntu 16.04); Xilinx Alveo U50, SDAccel 2018.3 (estimated data).
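
The headline multipliers for database query, compression and electronic trading can be cross-checked against the figures in footnotes 2 to 4. The following is a minimal Python sketch of that arithmetic only; the constant and function names are illustrative, the inputs are the vendor's own (partly estimated) numbers as quoted in the footnotes, and nothing here is an independent measurement.

```python
# Figures copied from footnotes 2-4 of the announcement (vendor estimates).
U50_QUERIES_PER_HOUR = 150_000
CPU_QUERIES_PER_HOUR = 34_000
U50_QUERY_TIME_MS = 24
CPU_QUERY_TIME_MS = 210
U50_LATENCY_US = 0.5             # footnote says "<0.5us", so an upper bound
CPU_LATENCY_US = 10.0
U50_COMPRESSION_GBS = 10.0       # GB/s for the whole card
CPU_CORE_COMPRESSION_GBS = 0.0229  # GB/s for a single CPU core


def speedup(accelerated: float, baseline: float) -> float:
    """Return how many times larger `accelerated` is than `baseline`."""
    return accelerated / baseline


# Throughput: 150k vs 34k queries per hour -> about 4.4x ("4x" in the text).
throughput_gain = speedup(U50_QUERIES_PER_HOUR, CPU_QUERIES_PER_HOUR)

# Per-query time: 210 ms vs 24 ms -> 8.75x shorter on the U50.
query_time_gain = speedup(CPU_QUERY_TIME_MS, U50_QUERY_TIME_MS)

# One query every 24 ms, back to back, gives 150,000 queries per hour.
u50_serial_queries_per_hour = 3_600_000 / U50_QUERY_TIME_MS

# Trading latency: 10 us vs at most 0.5 us -> at least 20x lower.
latency_gain = speedup(CPU_LATENCY_US, U50_LATENCY_US)

# Compression: how many single CPU cores match one card (about 437).
cores_matched = speedup(U50_COMPRESSION_GBS, CPU_CORE_COMPRESSION_GBS)

if __name__ == "__main__":
    print(f"query throughput gain: {throughput_gain:.2f}x")
    print(f"query time gain:       {query_time_gain:.2f}x")
    print(f"trading latency gain:  {latency_gain:.0f}x or better")
    print(f"CPU cores matched:     {cores_matched:.0f}")
```

The U50 query rate follows directly from its 24ms query time. The CPU figures do not line up the same way (one query every 210ms would be about 17k queries per hour, not 34k), so the CPU baseline presumably runs more than one query stream; the footnote does not say.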