Evolution History of NAND Flash Interfaces
In order to meet increasingly demanding performance requirements of business scenarios, developers have thought of many ways to improve performance of NAND flash-based systems.
This is a Press Release edited by StorageNewsletter.com on March 9, 2023 at 2:00 pmFrom Union Memory Information System Ltd. (Nanshan District, Shenzhen, China)
1. From Performance Improvement of flash system
For application scenarios from SD memory card, mobile phone, tablet and other consumer products to enterprise data center, NAND flash, with such characteristics as high performance, large capacity, low power consumption and low cost, is very popular among users and currently the most widely used semiconductor non-volatile storage medium. In order to meet the increasingly demanding performance requirements of business scenarios, developers have thought of many ways to improve the performance of NAND flash-based systems, which can be divided into the following categories:
Fig. 1 Overview of flash system performance improvement
Improve bus frequency and optimize AC timing: On the premise of satisfying reliable transmission, upgrade NAND flash bus frequency and use as few timing parameters as possible for operation. Especially, in the case of the increasing Page Sizes (2KBà4KBà8KBà16KB), it is particularly important to optimize the timing parameters to reduce the time of data transmission via the bus, which has been the direction of continuous efforts over the past 20 years, for example, when the bus interface speed is 200Mb/s (100MHz), it takes about 40μs to complete 4KB data transmission, but if the bus interface speed is increased to 1,600Mb/s (800MHz), it only takes about 5μs to complete 4KB data transmission.
Use cache read/program: In general, LUN (Logic Unit Number) is the smallest logical operation unit of NAND flash, in which read/ write operations are serially executed, i.e., the next read/write operation can be carried out only after a RW command is completed. Cache Read/Program allows the user to perform RW data transmission via the bus concurrently when the NAND flash Array is Busy, thus improving the efficiency of the pipeline.
Multiplexing and concurrency technology includes:
1) Inter-channel concurrency, allowing the user to execute independent commands and data operations concurrently via different channels;
2) Intra-channel concurrency, that is, Interleaving operation, allowing the user to execute interleaving operations between different CEs in the channel or different dies in the CE under certain constraints;
3) Multiple-plane operation, that is, Multi-Plane operation, allowing the user to concurrently read/write different planes within the die. It can be said that it is precisely because of the multiplexing and concurrency technology that enables NAND flash-based solid state storage products achieving gigabyte-level reading and writing performance.
It is obvious that the above optimization covers NAND flash basic timing/command optimization to the comprehensive optimization of system level, which can be used in combination in the light of the system requirements in practical applications. After years of development, Cache Read/Program and multiplexing and concurrency technology have been developed relatively mature, in contrast, their changes in recent years are relatively small, but NAND flash bus frequency upgrading technology has been booming, and in recent years there is a trend of accelerated evolution. Below, we will feel the evolution of NAND flash interface speeds with that of ONFI protocols.
2. Evolution history of ONFI interfaces
Founding in May 2006, Open NAND Flash Interface (ONFI) aims to simplify the integration and promotion of NAND flash in consumer electronics applications and computing platforms. Since the release of the first ONFI protocol in December 2006, it has released dozens of updated versions, and the maximum interface speed has also been developed from the initial 50Mb/s to the latest 3,600Mb/s. Fig. 2 shows evolution trend of interface speeds continuously optimized in line with ONFI protocols.
Fig. 2 Evolution of ONFI interface speeds
Click to enlarge
ONFI 1.0: With the successful application of NAND flash in CE products such as SD memory card and IPod, as well as the increasing demand for NAND flash-based MP3 and U disk, the application field of NAND flash has been greatly expanded. However, due to the different interface protocols of different NAND flash manufacturers, the downstream master chip manufacturers and product manufacturers encounter various compatibility problems. In order to change this situation, Intel led the establishment of ONFI, and soon after launched ONFI 1.0, aiming to unify NAND flash interface protocols.
ONFI 2.x: ONFI 2.x introduced NV-DDR technology to achieve Double Data Rate through double-edge sampling, with maximum interface speed evolved from 133Mb/s of ONFI 2.0 to 200Mb/s of ONFI 2.1/2.2. NV-DDR technology introduced an external reference voltage as the sampling reference of data I/O signals, and used a source synchronous clock to accurately latch data, commands, and address signals. However, because DQS and Clock are not differential signals, the edge is prone to interference. Currently, NV-DDR interfaces are rarely seen in mainstream consumer/enterprise NAND flash.
ONFI 3.x: ONFI 3.x introduced NV-DDR2 technology, with maximum interface speed evolved from 400Mb/s of ONFI 3.0/3.1 to 533Mb/s of ONFI 3.2. NV-DDR2 introduced Differential Signal and On-Die Termination (ODT) technology to improve the signal quality. In the differential mode, noise and interference can be effectively suppressed by enabling the RE_n/RE_c differential signal pairs and DQS_t/DQS_c signals differential signal pairs during the stage of data transmission. By integrating the ODT terminating resistor on the NAND flash chip, the impedance mismatch between the controller and the signal transmission channel of NAND flash can be minimized, which helps to reduce the signal reflection. In addition, the ODT resistance value can be controlled from software configuration register, thus simplifying the hardware circuit design.
ONFI 4.x: ONFI 4.x introduced NV-DDR3 technology, with maximum interface speed evolved from 800Mb/s of ONFI 4.0 to 1600Mb/s of ONFI 4.2. In order to ensure the quality of high-speed signal transmission, NV-DDR3 introduced several calibration techniques, including ZQ Calibration, Duty Cycle Correction (DCC) and RW DQ Calibration. As mentioned above, ONFI 3.0 introduced ODT technology. Due to the ODT resistor on NAND flash is prepared by CMOS process, resistance value is prone to drift when the temperature and voltage change, so it is necessary to calibrate the resistance value by external high-precision resistor through ZQ Calibration technology. DCC Calibration allows adjusting the signal duty cycle to solve the mismatch between rising edge and falling edge caused by the asymmetry of high-speed signal transmission path, while read/write DQ Calibration ensures that read/write sampling signal is aligned to the center of eye diagram.
ONFI 5.x: ONFI 5.x introduced NV-LPDDR4 technology, with maximum interface speed evolved from 2,400Mb/s of ONFI 5.0 to 3,600Mb/s of ONFI 5.1. To address the challenge of signal integrity posed by the high-speed interface, ONFI 5.x introduced an asymmetric DQS design and an adaptive equalizer design in addition to further strengthening read/write DQ Calibration and VrefQ Calibration. For example, DFE (Decision Feedback Equalizer) technology uses the output of the last channel to be weighted and fed back to the input after judgment, so as to eliminate the post-symbol interference. In addition, NV-DDR3 and NV-LPDDR4 support the same maximum interface speed, but the advantage of NV-LPDDR4 is that the power consumption of read operation can be greatly reduced after using LTT technology.
From ONFI 1.0 to the recently released ONFI 5.1, it can be seen that in order to match the increasingly demanding bandwidth requirements of the system front interfaces (e.g. eMMC/UFS/PCIe), NAND flash interface speed has been increased by 72x, and will quickly be moved to the next peak in the future.
Then, is the increase in interface speed doubling the benefit of the system? The answer is no.
3. Inadequate traditional protocols call for further improvement
The signals transmitted via the NAND flash bus can be divided into 3 types: Commands, Addresses, and Data. Through DQ[7:0] time division multiplexing, commands, addresses and data are respectively transmitted at different times. Among them, the data is synchronous transmitted and differentially sampled, the rate is high. However, commands and addresses are asynchronously transmitted and single-ended sampled, and the rate is low. From ONFI 1.0 to ONFI 5.1, the interface sppeeds have been developed rapidly, but the transmission form of commands, addresses and data is basically the same. In fact, with the increase of bus transmission speed, the main improvement is the data transmission delay, and the transmission delay of commands and addresses has not been improved, thus, for the system, the bus use efficiency is decreasing. Fig. 3 below is a comparison of bus efficiencies under optimistic write/read scenarios. Obviously, the bus efficiencies under both RW scenarios gradually decrease, especially when the bus efficiency under read scenario drops to about 50%, which further aggravates the challenge of system design.
Fig. 3 Analysis on NAND flash bus efficinecies
Taking a certain model of SSD in the industry as reference, Fig. 4 shows the comparative analysis of it with UH811a, an ESSD developed by Union Memory Information System Ltd.
UH811a and UH831a SSDs
It can be seen that the read bandwidths of the 2 SSDs are equivalent, about 7,000MB/s, but the 4K random read IOs of UH811a is significantly improved compared with the reference SSD. It’s found from further analysis that the 4K IO/s of UH811a after conversion is equivalent to the read bandwidth. However, after the same conversion, the bandwidth of the reference SSD is only 70% of the read bandwidth, the reason is that the expenditure of bus random read operations is higher than that of sequential read ones.
Fig. 4 Read bandwidth vs 4K read IO/s
It can be seen that in the case of unchanged transmission form of the traditional commands/addresses/data, with the improvement of interface speed and the increase of system complexity, the resultant benefits are attenuated. Therefore, the deficiencies of the traditional protocols facilitate the continuous improvement of the protocols. In the face of this problem, Joint Electron Device Engineering Council (JEDEC) is intensively discussing the next round of evolution of protocols, and the interface speed will be improved while optimizing the command/address transmission mode, which is bound to create a new test to the design of master chips.
4. Conclusion
Traditionally, improving NAND flash interface speed is the main means to improve system performance, and NAND flash manufacturers have come up with various ways to solve the problem of signal integrity caused by high-speed signals. Facing the future, the traditional input mode of commands/addresses leads to low bus utilization, the further evolution of protocols will also usher in new input mode of commands/addresses in addition to the consideration of continuously improving the interface speed. Union Memory will pay attention to the development trend of protocols and compatibly support the future media with a flexible architecture, so as to bring better storage products and solutions to customers.
Resources:
ONFI spec
2022, Phison, NAND Flash 101: Flash Device Interfaces
2021, Evolution of Mobile NAND Flash-based Storage
Mathematics in the Chips – Equalizer (EQ) and Its Applications in External High-Speed
About Union Memory
Founded in 2017 and based inNanshan District, Shenzhen, China, it focuses on ESSDs, DSSDs, CSSDs and embedded storages. It offers solutions with performance and reliability, which are used to servers, data centers, PCs, mobile terminals, smart wearables and other fields. It has established many laboratories and 2 modern production bases, with in-house controllers and the capacity of developing firmware, packaging and testing chips. Moreover, it offers customized solutions for enterprises, CE and other partners. It is awarded the National High-Tech Enterprise and the SRDI Enterprise by the MIIT. In addition, it participates in the national special project of VLSI manufacturing equipment and complete process, and is included as MIIT’s key products for building a robust foundation for industrial development and ‘one-stop’ application program demonstration project. As of February 2023, the company had obtained 254 invention patents of invention and 31 software copyrights.