High-density data centers are evolving in response to the global consumption of zettabytes of data. A data center can house a very large number of servers, occupying almost 65% of the facility. This growth demands not only additional space but also more cooling capacity. Cooling is harder to optimize than space because airflow and temperature are invisible. By some estimates, twice the amount of cooling is needed to keep a high-density data center efficient and reliable. Yet many managers still face problems caused by temperature rise. Temperature monitoring, combined with computational fluid dynamics (CFD) for a more comprehensive analysis, helps address this.
Improper management of cooling systems can result in hotspots, mixing of hot and cold air, air recirculation, and wasted cooling capacity. These problems threaten the consistency of data center performance. Coupled with this is a substantial increase in energy cost. Temperature monitoring helps ensure sufficient cooling and energy efficiency.
Hot Aisle/Cold Aisle Layout
Administrators generally arrange racks in the hot aisle/cold aisle layout proposed by IBM. According to a survey by NYSERDA, around two-thirds of data centers have implemented this approach. The fronts of the racks in adjacent rows face each other across a cold aisle, and the backs face the backs of the racks in the next row across a hot aisle. The layout is designed to give cold air a clear path from the computer room air conditioner (CRAC) to the server inlets and, correspondingly, to give hot air a clear path from the server exhausts back to the CRACs. Raised floors are also installed so that air can flow efficiently from the CRAC to the server racks. The air rises through perforated tiles facing the servers.
If servers are instead arranged in rows that all face the same direction, each row draws in the exhaust of the row in front of it, so the server inlet temperature increases from row to row. The last row may therefore overheat. This arrangement also mixes hot and cold air in the aisles, which reduces cooling efficiency.
Temperature Hot Zones
When air recirculates, hot temperature zones develop. These can lead to reliability problems in computer components. Reliability is the ability of a piece of equipment to perform its functions for a specified period of time. Reliability factors into availability, as does recovery time after a failure occurs. A reliable system is critical for the data center: the longer the downtime, the higher the repair cost. This underlines the importance of temperature monitoring in implementing proper cooling methods.
A study by Central Queensland University identified the following potential hot zones in and around the rack. Consistent temperature monitoring and computational fluid dynamics (CFD) analysis can reveal these hot zones.
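The link between reliability, recovery time, and availability can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function names and the sample figures are assumptions, not values from the study discussed here.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mtbf_hours: mean time between failures (a measure of reliability).
    mttr_hours: mean time to repair (recovery time after a failure).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * hours_per_year


# Illustrative numbers only (assumed, not measured).
a = availability(mtbf_hours=2000.0, mttr_hours=4.0)
print(f"availability = {a:.4%}, downtime = {annual_downtime_hours(a):.1f} h/year")
```

Shortening the repair time raises availability just as lengthening the time between failures does, which is why early detection of thermal problems matters.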
Above the End Rack
Air from the vent tiles in the cold aisle flows to the servers’ inlets. It then exits into the hot aisle and travels back toward the CRAC. If the hot air recirculates into the cold aisle and mixes with the cold air, the inlet temperature rises. In that case, hot temperature zones form above the end racks of the data center.
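The effect of recirculation on inlet temperature can be approximated with a simple mixing model. This is a minimal sketch that assumes the two air streams mix completely and have the same specific heat, so the inlet temperature is a weighted average of the supply and exhaust temperatures. The function name and the sample temperatures are hypothetical.

```python
def inlet_temperature(supply_c: float, exhaust_c: float,
                      recirculation_fraction: float) -> float:
    """Estimate server inlet temperature when hot exhaust air leaks back.

    recirculation_fraction is the share of inlet air (0.0 to 1.0) that
    comes from the hot aisle instead of from the vent tiles.
    """
    if not 0.0 <= recirculation_fraction <= 1.0:
        raise ValueError("recirculation_fraction must be between 0 and 1")
    r = recirculation_fraction
    return (1.0 - r) * supply_c + r * exhaust_c


# Assumed values: 18 C supply air, 35 C exhaust air.
for r in (0.0, 0.1, 0.2, 0.4):
    print(f"recirculation {r:.0%}: inlet = {inlet_temperature(18.0, 35.0, r):.1f} C")
```

Even a modest recirculation fraction raises the inlet temperature by several degrees, which is consistent with hot zones forming where hot air wraps around the end of a row.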
Outside the Rack
This happens when hot air from the sides and top of the racks is forced into the cold aisle. A temperature gradient can also occur in this area. When cooling fails, the equipment continues to heat the air, so the temperature may rise substantially and exceed the temperature gradient values stated in the ASHRAE guidelines.
Top Corners of the Rack (width plane)
The server receives a mixture of ambient air and hot air from the server outlet. The hot zone develops on the width plane of the data center, specifically at the top corners of the racks.
Front of the Rack
Although high-density servers provide outstanding performance, they can cause heavy recirculation between the inlet air and the ambient air. On average, data centers can cool less than 2 kW per rack. In the study, a temperature rise occurs in front of the rack, reaching 180-200C.
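To put a per-rack heat load in perspective, the airflow needed to carry that heat away can be estimated from the sensible heat equation, airflow = P / (ρ · c_p · ΔT). The sketch below assumes typical air properties near room temperature and a hypothetical 10 K temperature rise across the servers; none of these values come from the study.

```python
AIR_DENSITY_KG_M3 = 1.2    # assumed: approximate air density near room temperature
AIR_CP_J_PER_KG_K = 1005.0  # assumed: approximate specific heat of air
M3S_TO_CFM = 2118.88        # cubic metres per second to cubic feet per minute


def required_airflow_m3s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove heat_load_w with a delta_t_k air temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)


# A 2 kW rack with an assumed 10 K rise between inlet and exhaust.
flow = required_airflow_m3s(2000.0, 10.0)
print(f"{flow:.3f} m^3/s (about {flow * M3S_TO_CFM:.0f} CFM)")
```

If the rack receives less air than this from the vent tiles, the servers draw the remainder from the surrounding room, which is one way recirculation arises in front of the rack.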
Air Temperature Limit
A data center is composed of heat-sensitive equipment that runs around the clock. Upgrading to a high-density data center requires recalibration of the cooling system. Exposure to high temperatures can cause a complete shutdown of servers, disrupting business operations and incurring high repair and downtime costs. Hence, the temperature limit is one of the most critical parameters in data center management.
ASHRAE provides thermal guidelines for data center equipment spaces. In 2004, it initially recommended an air temperature range of 20-25°C. That range, however, reflected established practice and the limited data available at the time. With energy efficiency in mind, ASHRAE later widened the recommended range to 18-27°C. In 2011, it introduced wider allowable ranges, as broad as 5-45°C depending on the equipment class, to support free cooling or economization.
The cooling system plays a vital role in maintaining data center efficiency. It removes the unwanted heat generated by the equipment and ensures that the equipment operates in a suitable environment. Keeping the equipment cool also allows it to perform at its best.
Inadequate cooling shortens the lifespan of the equipment. In a worst-case scenario, overheating can damage equipment beyond repair. One of the most devastating consequences a data center may then face is the loss of data. According to one frequently cited statistic, approximately 93 percent of companies whose data center was down for more than ten days filed for bankruptcy within a year.
Therefore, temperature monitoring is critical for tracking any changes in air temperature. It helps managers stay on top of equipment conditions. The goal is to keep the temperature within the range set by ASHRAE, which matters when a company is aiming for efficiency and trying to save costs.
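A monitoring script can check each inlet reading against these ranges. The following is a minimal sketch using the 18-27°C and 5-45°C figures quoted above. Treating 5-45°C as a single allowable band is a simplification, since the actual allowable range depends on the equipment class, and the function and label names are hypothetical.

```python
RECOMMENDED_C = (18.0, 27.0)       # recommended range quoted in the text
WIDEST_ALLOWABLE_C = (5.0, 45.0)   # widest allowable range quoted in the text (class dependent)


def classify_inlet(temp_c: float) -> str:
    """Label an inlet temperature as recommended, allowable, or out_of_range."""
    low, high = RECOMMENDED_C
    if low <= temp_c <= high:
        return "recommended"
    low, high = WIDEST_ALLOWABLE_C
    if low <= temp_c <= high:
        return "allowable"
    return "out_of_range"


for reading in (22.0, 30.5, 48.0):
    print(reading, classify_inlet(reading))
```

In practice the allowable limits would be set per equipment class rather than hard-coded as one band.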
With real-time data, the technical team becomes aware of issues that require corrective action and can act according to the standard. Corrective actions range from lowering the set point to installing an additional fan. The data can also feed predictive modeling such as computational fluid dynamics (CFD) analysis. Air temperature issues can thus be managed before further problems occur.
Environmental monitoring tools such as sensors are installed in multiple areas around the facility. Temperature sensors can be placed on walls, on racks, or in air paths, depending on the point you want to measure. With temperature sensors, the invisibility of air and temperature is no longer a problem. The sensors collect environmental data that can be analyzed thoroughly using different visualization methods. CFD can predict future air temperatures, but its predictions still depend on the data provided.
Companies that invest in monitoring technologies such as AKCP wireless sensors can improve efficiency and reduce cost. AKCP offers both temperature and airflow sensors for smarter air temperature and distribution management. With these sensors, managers can take a more proactive view of the data center.
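One way to turn real-time readings into a trigger for corrective action is to alert only when the temperature stays above the threshold for several consecutive samples, which filters out brief spikes. This is a hedged sketch: the function name, the 27°C threshold, and the three-sample rule are assumptions chosen for illustration.

```python
from typing import Iterable, Optional


def first_sustained_breach(readings_c: Iterable[float],
                           threshold_c: float,
                           min_consecutive: int = 3) -> Optional[int]:
    """Return the index of the reading that completes the first run of
    min_consecutive readings above threshold_c, or None if there is none."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for index, temp_c in enumerate(readings_c):
        run = run + 1 if temp_c > threshold_c else 0  # reset on any normal reading
        if run >= min_consecutive:
            return index
    return None


samples = [26.0, 28.0, 26.0, 28.0, 28.5, 29.0]  # made-up inlet readings
breach = first_sustained_breach(samples, threshold_c=27.0)
if breach is not None:
    print(f"sustained breach at sample {breach}: consider lowering the set point")
```

The single 28.0 reading early in the sample list is ignored because the next reading falls back below the threshold.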
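Readings from sensors in different locations can be summarized to show where hot zones are forming. The sketch below assumes readings arrive as (location, temperature) pairs. The location labels and the 27°C limit are hypothetical and do not reflect any particular vendor's data format.

```python
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]  # (sensor location, temperature in Celsius)


def max_by_location(readings: Iterable[Reading]) -> Dict[str, float]:
    """Return the highest temperature seen at each sensor location."""
    hottest: Dict[str, float] = {}
    for location, temp_c in readings:
        if location not in hottest or temp_c > hottest[location]:
            hottest[location] = temp_c
    return hottest


def locations_over(readings: Iterable[Reading], limit_c: float) -> List[str]:
    """Return the sorted locations whose peak temperature exceeded limit_c."""
    peaks = max_by_location(readings)
    return sorted(loc for loc, temp_c in peaks.items() if temp_c > limit_c)


sample = [
    ("rack1-top", 29.5),
    ("rack1-bottom", 21.0),
    ("rack1-top", 26.0),
    ("rack2-top", 27.5),
]
print(locations_over(sample, limit_c=27.0))
```

Comparing top-of-rack and bottom-of-rack sensors in this way is a simple check for the recirculation patterns described earlier.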
AKCP Airflow Sensor
The AKCP airflow sensor is designed for systems that generate heat in the course of their operation, including data centers. It helps ensure that the airflow needed to dissipate the heat is distributed properly. The airflow sensor is placed in the path of the air stream, where the user can monitor the status of the flowing air. Once installed, it senses the presence or absence of airflow. Managers can receive email notifications with the location and description of a fault.
AKCP Temperature Sensor
The AKCP wireless temperature sensor is designed to measure the ambient temperature of the data center. It has a built-in calibration checker: if it detects that the sensor is out of acceptable tolerance, it sends an alert warning that the sensor requires recalibration. It is manufactured using highly integrated, low-power surface-mount technology to ensure long-term reliability. It notifies managers once the ambient temperature reaches its threshold.
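The idea behind a calibration check can be illustrated by comparing a sensor's reading with a trusted reference and flagging it when the difference exceeds a tolerance. This is a generic sketch and does not represent AKCP's built-in mechanism. The function name and the 0.5°C tolerance are assumptions.

```python
def needs_recalibration(sensor_c: float, reference_c: float,
                        tolerance_c: float = 0.5) -> bool:
    """Return True when the sensor deviates from the reference by more than tolerance_c."""
    if tolerance_c < 0:
        raise ValueError("tolerance_c must be non-negative")
    return abs(sensor_c - reference_c) > tolerance_c


# Made-up comparison against a reference thermometer reading of 22.0 C.
for reading in (22.3, 22.9):
    status = "recalibrate" if needs_recalibration(reading, 22.0) else "ok"
    print(f"sensor {reading:.1f} C vs reference 22.0 C: {status}")
```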