Effects of High Temperatures on HDDs
No more than 40°C ideal
This is a Press Release edited by StorageNewsletter.com on November 10, 2023 at 2:02 pm. This article was written by Rainer W. Kaese, senior manager business development storage products, Toshiba Electronics Europe.
The effects of high temperatures on hard drives
Modern enterprise HDDs are designed for operating temperatures between 5 and 60°C. Manufacturers recommend that they should not be operated at the upper end of this range on a permanent basis because doing so will reduce the lifetime of the drives and pose a risk of higher failure rates. So what happens to HDDs at high temperatures? And can these effects be compensated for later by operating at lower temperatures?
Like most components in servers and storage systems, HDDs heat up in operation, especially under heavy load. So that administrators can monitor drive temperatures, modern units have an internal temperature sensor whose readings are reported via SMART (Self-Monitoring, Analysis and Reporting Technology). These readings can be retrieved with built-in operating system tools, system management tools, or the management tools for RAID controllers and host bus adapters. In addition, there is a whole host of specialist tools for this task, such as the open-source smartmontools (https://www.smartmontools.org/), which is available for both Windows and Linux.
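As an illustration of the SMART route, the following Python sketch reads a drive's current temperature through smartctl from smartmontools. It assumes smartmontools 7.0 or later, which can produce JSON output with `-j`, and that this output contains a top-level `temperature.current` field in °C, as recent versions report for most drives; verify this against the output of your own version. The device path is only a placeholder, and smartctl usually needs administrator rights.

```python
import json
import subprocess
from typing import Optional


def parse_smartctl_temperature(json_text: str) -> Optional[int]:
    """Return the current drive temperature in °C from smartctl JSON output.

    Assumes the top-level "temperature" object with a "current" field that
    smartctl emits in JSON mode; returns None if the field is missing.
    """
    data = json.loads(json_text)
    current = data.get("temperature", {}).get("current")
    return int(current) if current is not None else None


def read_drive_temperature(device: str) -> Optional[int]:
    """Query one drive, e.g. the placeholder path "/dev/sda"."""
    # -A prints the SMART attributes, -j selects JSON output (smartmontools 7.0+).
    # smartctl's exit status is a bit mask in which some bits describe the
    # drive's condition rather than a tool error, so it is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "-A", "-j", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_smartctl_temperature(result.stdout)


if __name__ == "__main__":
    # Parse a hand-written sample instead of calling smartctl.
    sample = '{"device": {"name": "/dev/sda"}, "temperature": {"current": 38}}'
    print(parse_smartctl_temperature(sample))
```

Keeping the parsing separate from the subprocess call makes it easy to test the logic without a real drive, and to feed it output collected by other means.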
If HDDs get too hot, they no longer work reliably, because their electronic and mechanical components only function correctly within a certain temperature range. On top of this, the mechanical components wear out more quickly, which reduces reliability and service life. The spindle bearing is particularly at risk: at high temperatures, the oil used as a lubricant becomes too thin and can leak out of the bearing. It is therefore essential to monitor hard drive temperatures, both to prevent overheating and to ensure that the drives provide long and reliable service.
What is the optimum temperature?
The manufacturers usually specify a temperature range in which their drives operate correctly. In the case of enterprise HDDs, they assume use in air-conditioned server rooms or data centres, which is why these types of drive are designed for operating temperatures between 5 and 60°C. The specifications for NAS HDDs are 5 to 65°C and surveillance HDDs are 0 to 70°C, because systems for video surveillance are not always set up in rooms with stable ambient conditions.
These specifications really only describe the range in which the drives can operate; durability is definitely impaired when drives run in the upper part of that range for extended periods. A brief temperature increase, for example when a fan in the system has failed and must be replaced, can usually be tolerated, but even permanent operation at 45°C can cost the hard drives a few months of lifetime. After all, the MTTF figures in the manufacturers' data sheets always refer to an average operating temperature of 40°C.
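The two thresholds from the text, the rated operating range per drive class and the 40°C reference for MTTF figures, can be combined into a simple check for a temperature value. This is a minimal sketch: the ranges are the typical values quoted above rather than figures for any specific model, and the labels are illustrative. A single reading above 40°C is not a problem in itself; the check is most meaningful when applied to sustained or average temperatures.

```python
from typing import Dict, Tuple

# Typical operating ranges quoted in the text (°C). Always check the data
# sheet of the actual drive model.
OPERATING_RANGES_C: Dict[str, Tuple[float, float]] = {
    "enterprise": (5.0, 60.0),
    "nas": (5.0, 65.0),
    "surveillance": (0.0, 70.0),
}

# Average operating temperature to which data-sheet MTTF figures refer.
MTTF_REFERENCE_C = 40.0


def classify_temperature(drive_class: str, temp_c: float) -> str:
    """Classify a temperature (ideally an average) for a given drive class."""
    low, high = OPERATING_RANGES_C[drive_class]
    if temp_c < low or temp_c > high:
        return "outside operating range"
    if temp_c > MTTF_REFERENCE_C:
        return "within range, above 40 °C MTTF reference"
    return "ok"


print(classify_temperature("enterprise", 45.0))
```

For example, 45°C is still within the enterprise range but above the 40°C reference, which according to the text can cost a few months of lifetime if it persists.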
An interesting point in this regard: because the reference is an average, periods of operation above 40°C can, in principle, be compensated for later by a correspondingly long period at lower temperatures. In practice, however, it is highly unlikely that HDDs first spend months or years at high temperatures and then the same amount of time at lower ones.
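To see what "average" means here, temperature samples can be weighted by how long each temperature was held. The sketch below assumes the average in question is a simple time-weighted arithmetic mean; the data sheets do not necessarily define it this way, so treat the result as an approximation.

```python
from typing import Iterable, Tuple


def time_weighted_average(samples: Iterable[Tuple[float, float]]) -> float:
    """Average temperature, weighted by how long each temperature was held.

    Each sample is a (duration_hours, temperature_c) pair.
    """
    total_hours = 0.0
    weighted_sum = 0.0
    for hours, temp_c in samples:
        total_hours += hours
        weighted_sum += hours * temp_c
    if total_hours == 0:
        raise ValueError("no operating time recorded")
    return weighted_sum / total_hours


# Half a year (4,380 h) at 45 °C followed by half a year at 35 °C.
print(time_weighted_average([(4380, 45.0), (4380, 35.0)]))
```

In this example, half a year at 45°C followed by half a year at 35°C averages out to exactly 40°C, which is the kind of compensation the paragraph describes and which, as it notes, rarely happens in practice.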
Temperature increases, reliability falls
A typical enterprise HDD has an MTTF (mean time to failure) of 2.5 million hours. In other words, with a population of 2.5 million drives, one failure per hour would be expected, or with 1,000 drives, one failure every 2,500 hours. Since this figure is not particularly intuitive for estimating the failure probability of hard disks in one's own infrastructure, the annual failure rate (AFR), which can be calculated from the MTTF, is usually used instead. The formula is:

$$\mathrm{AFR} = \left(1 - e^{-8760/\mathrm{MTTF}}\right) \times 100\,\%$$

where 8,760 is the number of operating hours per year in the 24/7 operation that is standard for enterprise HDDs.
This formula takes into account the drives that have already failed when calculating the AFR for the remaining drives. For failure rates as low as those of hard disks, however, this correction is negligible, so the formula can be simplified to:

$$\mathrm{AFR} \approx \frac{8760}{\mathrm{MTTF}} \times 100\,\%$$

For enterprise HDDs with an MTTF of 2.5 million hours, the resulting AFR is therefore 0.35%. Where 1,000 drives are used, 3 to 4 of them can be expected to fail each year.
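The arithmetic above can be reproduced with a few lines of Python. This sketch implements the exact AFR formula and its simplified form as given in the text, plus two small helpers for the population-level figures; the helper names are illustrative, not taken from any tool.

```python
import math

HOURS_PER_YEAR = 8760  # 24/7 operation: 365 days x 24 hours


def hours_between_failures(mttf_hours: float, drives: int) -> float:
    """Average time between failures across a population of drives."""
    return mttf_hours / drives


def afr_exact(mttf_hours: float) -> float:
    """AFR in percent: (1 - e^(-8760 / MTTF)) * 100."""
    return (1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)) * 100.0


def afr_approx(mttf_hours: float) -> float:
    """Simplified AFR in percent: 8760 / MTTF * 100 (fine for low failure rates)."""
    return HOURS_PER_YEAR / mttf_hours * 100.0


def expected_failures_per_year(afr_percent: float, drives: int) -> float:
    """Expected number of failed drives per year in an installed base."""
    return afr_percent / 100.0 * drives


mttf = 2.5e6
print(hours_between_failures(mttf, 1000))
print(round(afr_exact(mttf), 4), round(afr_approx(mttf), 4))
print(round(expected_failures_per_year(afr_approx(mttf), 1000), 2))
```

For an MTTF of 2.5 million hours, the exact formula gives about 0.3498% and the simplified one 0.3504%. The difference of roughly 0.0006 percentage points is why the simplification is acceptable for hard disks.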
If the average operating temperature of the hard drives is above 40°C, the failure rate increases. As a rule of thumb, the failure rate can increase by 30% for every 5°C above 40°C. At a permanent HDD temperature of 55°C, which is three such steps above 40°C, the AFR should roughly double, so an installed base of 1,000 drives would probably see 6 to 8 failures per year.
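The rule of thumb can be turned into a small calculator. The text does not say whether the 30% steps add up or compound; this sketch assumes they compound (1.3³ ≈ 2.2 at 55°C), whereas a linear reading would give a factor of 1.9. Both are consistent with "roughly double". It also assumes, purely for illustration, that partial steps are interpolated and that averages at or below 40°C leave the AFR unchanged.

```python
REFERENCE_TEMP_C = 40.0   # MTTF/AFR figures refer to this average temperature
STEP_C = 5.0              # rule-of-thumb step size from the text
INCREASE_PER_STEP = 0.30  # +30 % failure rate per step (rule of thumb)


def afr_at_temperature(base_afr_percent: float, avg_temp_c: float) -> float:
    """Scale an AFR (in percent) specified for 40 °C to a higher average temperature.

    Assumption: the 30 % increases compound per 5 °C step and partial steps
    are interpolated; averages at or below 40 °C leave the AFR unchanged.
    """
    steps = max(0.0, (avg_temp_c - REFERENCE_TEMP_C) / STEP_C)
    return base_afr_percent * (1 + INCREASE_PER_STEP) ** steps


def expected_failures(afr_percent: float, drives: int) -> float:
    """Expected number of failed drives per year in an installed base."""
    return afr_percent / 100 * drives


for temp in (40, 45, 50, 55):
    afr = afr_at_temperature(0.35, temp)
    print(f"{temp} °C: AFR {afr:.2f} %, "
          f"about {expected_failures(afr, 1000):.1f} failures per 1,000 drives")
```

Starting from the 0.35% AFR at 40°C, the compounding assumption gives an AFR of about 0.77% at 55°C, or roughly 7.7 failures per 1,000 drives per year, which falls within the 6 to 8 quoted above.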
Temperature is not the only factor
In addition to temperature, other factors affect HDD durability, including the annual workload (rated workload), the warranty period and, for drives not designed for 24/7 use, the operating time. Exceeding the specified values, or continuing to operate an HDD after its warranty has expired, does not mean there is an immediate risk of failure. However, the AFR increases, so over time more HDDs fail per year than expected.
Correct thermal design and cooling
In systems that are thermally well designed and installed in air-conditioned rooms, there should normally be no problem keeping the hard drive temperature at 40°C or lower. Without air conditioning, this can be difficult, because room temperatures often exceed 30°C in the summer months. This means that temperatures above 40°C are quickly reached inside servers and storage systems. In addition, without suitable ventilation the warm exhaust air from the systems is difficult to remove, which inevitably raises the room temperature and, in turn, heats the systems up even more.
It is therefore always better to operate server and storage systems in an air-conditioned environment – especially if top loaders with several dozen HDDs are used. For design reasons the rear drives become warmer than the front ones, because the air flow absorbs the heat from the front drives first and is therefore no longer capable of cooling the rear ones quite as effectively. In this case, air intake temperatures of less than 20°C are required to keep the HDDs in the rear rows below 40°C on a permanent basis.
If the hard drive temperature is permanently more than 15°C above the air intake or ambient temperature, something is wrong with the thermal design of the system. In this case, administrators need to check whether the fans are working correctly and whether the airflow reaches the drives unobstructed. In addition, the room as a whole needs to be designed so that cold and warm air do not mix, because mixing reduces cooling efficiency. This is why rows of racks are usually positioned facing each other. The cooling air is supplied into the aisle between them, where it meets the front of the units and is drawn in to cool the system components. It absorbs heat in the process and exits at the back of the units, where it is removed by fans. Covers on empty trays prevent the warm exhaust air from flowing back into the cold aisle.
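The 15°C rule lends itself to an automated check across all drives in a system. In this minimal sketch, the inputs are assumed to be average temperatures, because the rule concerns drives that are *permanently* too warm rather than individual readings. The drive names are hypothetical.

```python
from typing import Dict, List

MAX_RISE_OVER_INTAKE_C = 15.0  # threshold quoted in the text


def drives_with_thermal_issues(intake_c: float,
                               avg_drive_temps_c: Dict[str, float]) -> List[str]:
    """Return, sorted by name, the drives whose average temperature exceeds
    the air intake (or ambient) temperature by more than 15 °C."""
    return sorted(
        name
        for name, temp_c in avg_drive_temps_c.items()
        if temp_c - intake_c > MAX_RISE_OVER_INTAKE_C
    )


# Hypothetical top loader with 22 °C intake air.
print(drives_with_thermal_issues(22.0, {"front-1": 30.0, "rear-1": 38.0, "rear-2": 37.0}))
```

Here only "rear-1" is flagged, since it runs 16°C above the intake air. As the text notes, a pattern of flagged rear drives points to the airflow path, while flagged drives throughout the system suggest a fan or room-level problem.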
Summary
To ensure that HDDs function correctly and last as long as possible, administrators need to continuously monitor their operating temperatures. Even though drives are designed for up to 60°C, it is essential to avoid this maximum value. Operation at an average of no more than 40°C is ideal. Ensuring that this temperature is not exceeded depends primarily on the thermal design of the system and the cooling concept of the room in which the system is accommodated.