High-Performance Distributed File System YRCloudFile Used for AI at China's Iflytek

Iflytek is a provider of AI in China.

It develops technology in the fields of speech recognition, NLP, computer vision, ML, autonomous learning, and others. AI relies heavily on IT infrastructure, and one of the important factors behind the company's leadership is its support for massive data and computing.

Data platform architecture
Any discussion of AI has to mention deep learning. It is now generally accepted that AI is achieved through deep learning. The concept of deep learning was proposed in the 1980s, but only in recent years has it been truly valued and applied. The 2 major elements on which deep learning relies (large amounts of labeled data and computing power) have become reality, so AI has essentially become the science of data processing and computing. The firm's AI data processing also follows the industry's mainstream processing flow, which can be described by the following diagram:

Throughout data processing, different stages have different data IO requirements and use different technologies and tools. The stages of data processing and the IO characteristics of each stage are shown in the following figure:

During data preparation, the company uses Hadoop and other big data technologies for data cleaning. Model training is the core of the entire process: it uses deep learning algorithms to derive a deliverable model from the features of massive data, for use in AI products and solutions.

The firm's IT infrastructure team needs to provide stable, high-performance training storage platforms for various AI teams and business units. The team manages thousands of GPU servers. The performance of the storage platform directly affects the training efficiency of the business units. It is the priority of the entire data processing pipeline and also the breakthrough point for optimizing the training platform.

To satisfy the training requirements of the AI business units, the data platform for model training must have the following characteristics:

Ensure high bandwidth and low latency, in order to provide sufficient data input for GPU servers and keep GPU utilization high
Support billions of small files and some large files with a mixed read and write IO pattern, satisfying the needs of a large number of feature files or several aggregated big files
Use a standard file interface to access the data
Support concurrent access from thousands of HPC nodes
Provide data access capabilities for containerized training tasks
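
The bandwidth and concurrency requirements above can be checked from a client through the standard file interface. The following is a minimal sketch, not YRCloudFile tooling: it reads a set of files with a thread pool and reports aggregate throughput. The function names, the worker count, and the temporary directory standing in for a mounted training dataset are all assumptions made for illustration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_sample_files(directory, count=100, size=4096):
    """Create `count` files of `size` random bytes (stand-in for feature files)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def read_file(path):
    """Read one file through the ordinary file API and return its length."""
    with open(path, "rb") as f:
        return len(f.read())


def measure_read_throughput(paths, workers=8):
    """Read all paths concurrently; return (total bytes, bytes per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes, total_bytes / elapsed


if __name__ == "__main__":
    # A temporary directory is used here; on a real system this would be
    # the mount point of the shared file system (hypothetical).
    with tempfile.TemporaryDirectory() as d:
        sample_paths = make_sample_files(d)
        total, rate = measure_read_throughput(sample_paths)
        print(f"read {total} bytes at {rate / 1e6:.1f} MB/s")
```

On a local disk the numbers mostly reflect the page cache, so a sketch like this is only meaningful when pointed at the shared storage with a dataset larger than client memory.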

Why choose YRCloudFile as the training data platform?
Since 2019, YanRong Tech in Beijing and the company have held many discussions. The firm's technical team chose the training storage platform very carefully. The most important concern was performance, including random R/W of large files and R/W performance of small files. The other criteria were the performance of massive metadata operations (creation, stat, removal, etc.); support for massive numbers of files, along with data access consistency and operation performance in that scenario; stability of the storage platform in failure scenarios, especially metadata service failure; integration with the container orchestration platform; and data life cycle management and other capabilities. These evaluation standards were derived from the practical requirements of the business units and the IT infrastructure team's years of experience.
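
The metadata operations named above (creation, stat, removal) can be timed with a few lines of code. The sketch below is an illustration only and is not the test suite the teams used: the function name, file count, and temporary directory are assumptions, and a real evaluation would run against the mounted file system from many clients at once.

```python
import os
import tempfile
import time


def time_metadata_ops(directory, n_files=1000):
    """Return operations per second for create, stat and remove."""
    paths = [os.path.join(directory, f"f{i:06d}") for i in range(n_files)]
    elapsed = {}

    # Create empty files: exercises only the metadata path.
    start = time.perf_counter()
    for p in paths:
        with open(p, "wb"):
            pass
    elapsed["create"] = time.perf_counter() - start

    # Stat every file.
    start = time.perf_counter()
    for p in paths:
        os.stat(p)
    elapsed["stat"] = time.perf_counter() - start

    # Remove every file.
    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    elapsed["remove"] = time.perf_counter() - start

    return {op: n_files / max(t, 1e-9) for op, t in elapsed.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for op, rate in time_metadata_ops(d).items():
            print(f"{op}: {rate:,.0f} ops/s")
```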

YRCloudFile is decoupled from hardware and can exploit the performance of flash drives and high-speed networks. Compared with other storage systems, its performance is ahead. Its metadata performance satisfies the requirements of accessing massive numbers of small files. It also provides features for integration with container orchestration platforms (Kubernetes) and for smart tiering of hot and cold data. Considered from these aspects, YRCloudFile is a high-performance distributed file storage system that is suitable for Iflytek's training data platform.

After a deep dive into YRCloudFile and comprehensive testing, the company and YanRong Tech agreed to cooperate.

Through this cooperation, YRCloudFile has been deployed in one of the firm's large-scale deep learning training clusters. Thanks to YRCloudFile's flexible software deployment architecture and rapid deployment capabilities, the amount of data it stores grew rapidly within a few months, from the first cluster going online through subsequent cluster deployments. The amount of data has reached nearly 10PB, comprising nearly ten billion audio, video, and picture files used for training. The peak bandwidth of a single cluster has reached nearly 10GB/s, and training efficiency has improved greatly as a result.
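
The figures in this paragraph imply a workload dominated by small files. As a rough back-of-the-envelope check, the sketch below treats the rounded numbers (10PB, ten billion files, 10GB/s) as exact and assumes decimal units; since the text says "nearly" for each, the real values are somewhat lower. The function names are chosen for illustration.

```python
# Rounded figures from the text, treated as exact for this estimate.
TOTAL_BYTES = 10 * 10**15         # ~10 PB, decimal units assumed
FILE_COUNT = 10 * 10**9           # ~ten billion files
PEAK_BYTES_PER_SEC = 10 * 10**9   # ~10 GB/s, peak of a single cluster


def average_file_size(total_bytes, file_count):
    """Mean file size in bytes."""
    return total_bytes / file_count


def files_per_second(bandwidth_bytes_per_sec, avg_size_bytes):
    """Files that must be served per second to sustain the given bandwidth."""
    return bandwidth_bytes_per_sec / avg_size_bytes


avg = average_file_size(TOTAL_BYTES, FILE_COUNT)
rate = files_per_second(PEAK_BYTES_PER_SEC, avg)
print(f"average file size: {avg / 10**6:.1f} MB")             # 1.0 MB
print(f"files per second at peak bandwidth: {rate:,.0f}")     # 10,000
```

With an average of about 1MB per file, sustaining the peak bandwidth means opening on the order of ten thousand files per second, which is why metadata performance carries as much weight as raw bandwidth in this scenario.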

Land and expand at Iflytek

The company’s Voice Training Platform consists of nearly 1,000 high-performance servers at the firm’s AI research institute. A number of scientists and algorithm engineers use the data to continuously optimize models for the company’s various products.

YanRong Tech and the Iflytek infrastructure team have built a close working relationship. By analyzing the characteristics of the AI IO pattern, YanRong Tech's R&D team further optimized YRCloudFile for Iflytek’s AI scenario. This has formed a closed loop of product optimization: YRCloudFile deployment -> IO pattern analysis -> R/W optimization -> update and feedback to product. Using YRCloudFile, the technical teams of both sides ranked sixth in the world in the IO500 storage performance benchmark.

Through large-scale use at the company, YRCloudFile rapidly accumulated rich experience and capabilities in high-performance storage scenarios serving AI companies. The reliability and stability of YRCloudFile have also improved.

Plan for YRCloudFile at Iflytek
Currently, YRCloudFile provides large-scale, high-performance data services for the training cluster, and its usage will expand to the following scenarios.

YRCloudFile also provides cloud native storage capabilities, including support for PVC Quota, PVC resize, PVC QoS, and hotspot analysis. It is also the first Chinese cloud native storage product to enter the CNCF Landscape. Therefore, the company will use YRCloudFile as the storage for its cloud native applications running in Kubernetes in the future.
The R&D team of YanRong Tech will open up more SDKs to help Iflytek integrate YRCloudFile into its cloud platform, providing file services for more businesses.

Benefits to Iflytek
Massive data and computing capability are the 2 major factors for deep learning. The high-performance file system YRCloudFile is used in Iflytek’s learning platform, where it provides storage services for the AI infrastructure and is gradually showing increasing commercial value.

Training time is reduced. Compared with the other commercial storage products used previously, YRCloudFile’s high bandwidth and low latency keep computing servers such as GPUs fully utilized, reducing a single training cycle from 1 week to 2 days.
Training accuracy is improved. A model generated by deep learning will have deviations, so algorithm engineers need to continuously optimize the parameters of the cost function by adjusting the weights. As the single training cycle is shortened, algorithm engineers can perform more iterations on the model. The more iterations are done, supplemented by algorithm optimization, the better the training accuracy Iflytek can achieve.
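
The effect of the shorter cycle on iteration count is simple arithmetic. The sketch below uses the 1-week and 2-day figures from the text; the 28-day planning window is a hypothetical value chosen only to make the comparison concrete.

```python
OLD_CYCLE_DAYS = 7    # "1 week" per training cycle before
NEW_CYCLE_DAYS = 2    # "2 days" per training cycle after
WINDOW_DAYS = 28      # hypothetical planning window, not from the source


def completed_cycles(window_days, cycle_days):
    """Whole training cycles that fit into the window."""
    return window_days // cycle_days


speedup = OLD_CYCLE_DAYS / NEW_CYCLE_DAYS
before = completed_cycles(WINDOW_DAYS, OLD_CYCLE_DAYS)
after = completed_cycles(WINDOW_DAYS, NEW_CYCLE_DAYS)

print(f"speedup per cycle: {speedup}x")                           # 3.5x
print(f"cycles in {WINDOW_DAYS} days: {before} -> {after}")       # 4 -> 14
```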

Conclusion
In China, the AI industry is developing fast. AI applications in transportation, medical treatment, government affairs, education, autonomous driving, and other scenarios continue to penetrate our lives in different ways. YRCloudFile will help AI companies in areas such as voice recognition, computer vision, and autonomous driving to improve training efficiency and enhance the competitiveness of their products to serve more customers.

About Iflytek
Established in 1999, it is a national-level high-tech enterprise dedicated to the R&D of intelligent speech and language technologies and AI, and to the provision of professional services for governments, the education sector, financial organizations, and other fields. The company went public on the Shenzhen Stock Exchange in 2008. Iflytek’s AI technologies, such as speech recognition, speech evaluation, and natural language processing, represent the leading level in the world.

About YanRong Tech
It is a high-tech enterprise with SDS technology as its core competitiveness. With independent IP rights in key technologies such as distributed storage, it works in next-gen cloud storage solutions. Holding the technical concept of Drive Future Storage, it is committed to improving the processing capability of enterprise data centers through the next generation of cloud storage technology, providing enterprises with high-performance, agile, and low-cost innovative IT solutions.