Operating IT infrastructure in a hot climate is challenging, and heatwaves exacerbate the problem. Abnormally high temperatures present considerable challenges for your Data Center.
Protect Your Data Center During Power Outages and Heatwaves.
Heatwaves increase the demand placed on public power grids, which makes those grids more likely to fail. It is therefore important to protect your Data Center during power outages. Without proper backup power systems in place, outages cause downtime, and data can be lost if servers are not shut down gracefully. Such failures are becoming more common as global average temperatures rise. Your organization should therefore have proper environmental monitoring equipment and backup systems in place at your Data Center.
The most common power issues in the Data Center are:
- The main power line is physically damaged or overloaded
- Instability of the main power frequency
- Electrical noise (oscillation), usually caused by nearby equipment
- Voltage spikes, over-voltage
- Momentary or sustained under-voltage
- Distortion of the sine waveform on the power line
In this article, we will examine some common Data Center backup power systems, their monitoring, and power outage scenarios.
Planning your backup systems
When you design your Data Center, whether it is a small computer room with a few servers and a single air conditioner or a large multi-level facility with supercomputers, the necessary backup systems have to be in place. At a minimum, the backup power system should include a UPS to ride through short outages and power fluctuations and to provide sufficient time for a graceful shutdown. However, a UPS provides only short-term backup power: it carries the load during the transition between mainline power and the generators. Power outages usually last between 1 and 6 hours, which is far too long for servers to run on batteries alone.
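A rough runtime estimate shows why batteries alone cannot bridge a typical outage. The sketch below assumes a simple linear model (usable battery energy after inverter losses divided by the load). The battery size, 90% efficiency and load are hypothetical figures, and real runtime is shorter at high discharge rates and as batteries age, so the manufacturer's runtime charts should be used for actual sizing.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.9) -> float:
    """Rough UPS runtime estimate in minutes.

    Linear model (an assumption): ignores the reduced capacity of batteries
    at high discharge rates and capacity loss from battery ageing.
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    usable_wh = battery_wh * efficiency  # energy actually delivered after inverter losses
    return usable_wh / load_w * 60.0


# Hypothetical example: 2,000 Wh of battery feeding a 6 kW server room.
runtime = ups_runtime_minutes(battery_wh=2000, load_w=6000)
print(f"Estimated runtime: {runtime:.0f} minutes")

# Compare with the 1 to 6 hour outage range mentioned in the text.
for outage_hours in (1, 6):
    covered = runtime >= outage_hours * 60
    print(f"{outage_hours} h outage covered by batteries alone: {covered}")
```

With these assumed figures the batteries last about 18 minutes, enough to start generators or shut down cleanly, but nowhere near an hour.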
If you have a small server room and no backup generator, the UPS switching to battery mode should signal the servers to begin a controlled shutdown if main power does not return within a few minutes.
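This on-battery grace period can be sketched as a small state tracker: it records when the UPS switched to battery and requests a shutdown once the grace period elapses without mains power returning. In practice the decision is usually made by the UPS vendor's shutdown agent or a tool such as Network UPS Tools; the class name, the status strings and the five-minute grace period below are hypothetical.

```python
from typing import Optional


class OnBatteryShutdownTimer:
    """Requests a graceful shutdown after the UPS has been on battery too long.

    Hypothetical sketch: `status` is assumed to be "ONLINE" (mains power) or
    "ON_BATTERY", polled periodically with a timestamp in seconds.
    """

    def __init__(self, grace_period_s: float = 300.0) -> None:
        self.grace_period_s = grace_period_s
        self.on_battery_since: Optional[float] = None

    def update(self, status: str, now_s: float) -> bool:
        """Return True when servers should begin a controlled shutdown."""
        if status == "ONLINE":
            # Mains power is back: cancel any pending shutdown.
            self.on_battery_since = None
            return False
        # Anything other than ONLINE is treated as running on battery.
        if self.on_battery_since is None:
            # First sample on battery: start the grace-period clock.
            self.on_battery_since = now_s
        return now_s - self.on_battery_since >= self.grace_period_s


timer = OnBatteryShutdownTimer(grace_period_s=300)
samples = [(0, "ONLINE"), (60, "ON_BATTERY"), (200, "ON_BATTERY"), (360, "ON_BATTERY")]
for t, status in samples:
    print(t, status, timer.update(status, t))
```

Resetting the clock whenever mains power returns means a brief flicker does not accumulate towards a shutdown.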
A mainline power failure has serious implications for Data Center cooling. HVAC and chilled water plant systems cannot run on UPS battery power, so temperatures can quickly rise to levels that cause server failures and thermal shutdowns. Operating IT equipment at elevated temperatures also shortens its service life: tests by National Instruments found that a temperature increase of just 5 °C (9 °F) can reduce a hard drive's life by as much as two years.
Properly sized backup generators should be installed to keep Data Center cooling systems running during power outages. In addition, an environmental monitoring system (EMS) in the Data Center hall monitors conditions and raises an alert when temperatures fall outside the prescribed range. Power monitoring and remote generator monitoring systems ensure that backup power systems are ready when called upon and capable of carrying the loads placed on them.
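The threshold checking an EMS performs can be illustrated with a short sketch. The temperature limits use the ASHRAE recommended inlet envelope of 18 to 27 °C as an example; the humidity limits, sensor name and readings are hypothetical, and your own prescribed range should come from your equipment specifications and operating policy.

```python
from dataclasses import dataclass


@dataclass
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Example limits: temperature from the ASHRAE recommended inlet envelope;
# the humidity limits are hypothetical placeholders.
LIMITS = {
    "temperature_c": Range(18.0, 27.0),
    "humidity_pct": Range(20.0, 80.0),
}


def check_readings(sensor: str, readings: dict) -> list:
    """Return an alert message for every reading outside its prescribed range."""
    alerts = []
    for metric, value in readings.items():
        limit = LIMITS.get(metric)
        if limit is not None and not limit.contains(value):
            alerts.append(f"{sensor}: {metric}={value} outside {limit.low}-{limit.high}")
    return alerts


print(check_readings("rack-A1-inlet", {"temperature_c": 31.5, "humidity_pct": 45.0}))
```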
In larger Data Centers, generators are an important component of the backup system. When mainline power fails, the UPS devices take over supplying power while the load is transferred to the generators, which need time to start up and stabilize before they can carry it. Some larger Data Centers have a rotary flywheel-powered generator, which can fully power the infrastructure for a few seconds until the diesel generators start up.
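Whether the bridging power source can carry the load through this transfer comes down to simple arithmetic: its ride-through time must exceed the time the generator needs to start and stabilize, with some margin. The sketch below makes that comparison; the timings and the 2x safety factor are hypothetical values for illustration, not recommendations.

```python
def bridge_is_sufficient(
    ride_through_s: float,
    generator_start_s: float,
    stabilization_s: float,
    safety_factor: float = 2.0,
) -> bool:
    """True if the bridging source covers the generator transfer with margin.

    safety_factor is a hypothetical margin, for example to allow for a slow start.
    """
    transfer_s = generator_start_s + stabilization_s
    return ride_through_s >= transfer_s * safety_factor


# Hypothetical figures for two bridging configurations.
print(bridge_is_sufficient(ride_through_s=20, generator_start_s=10, stabilization_s=5))
print(bridge_is_sufficient(ride_through_s=600, generator_start_s=10, stabilization_s=5))
```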
During this power transfer period, it is important to monitor temperature and humidity in the Data Center. If a generator fails to start, the cooling system stays offline, and once the UPS runtime is exhausted the result is a total Data Center outage. Generators can also fail after starting and running for a few hours, for example because of a poor maintenance schedule or insufficient fuel. A suitable remote generator monitoring system can mitigate both of these scenarios. If generators do fail to start, Data Centers need automated server shutdown solutions to shut the servers down gracefully and avoid data loss.
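Remote generator monitoring can turn a fuel-level reading into a remaining-runtime estimate and raise an alert well before the tank runs dry. The sketch assumes a constant fuel consumption rate at the current load; the tank size, burn rate and refuelling lead time are hypothetical, and actual consumption should come from the generator manufacturer's fuel curves.

```python
def remaining_runtime_h(fuel_level_pct: float, tank_l: float, burn_rate_l_per_h: float) -> float:
    """Hours of generator runtime left, assuming a constant burn rate."""
    if burn_rate_l_per_h <= 0:
        raise ValueError("burn rate must be positive")
    fuel_l = tank_l * fuel_level_pct / 100.0
    return fuel_l / burn_rate_l_per_h


def fuel_alert(fuel_level_pct: float, tank_l: float, burn_rate_l_per_h: float,
               refuel_lead_time_h: float) -> bool:
    """True if the fuel will run out before a refuelling delivery could arrive."""
    return remaining_runtime_h(fuel_level_pct, tank_l, burn_rate_l_per_h) < refuel_lead_time_h


# Hypothetical: 1,000 L tank at 30%, burning 60 L/h, fuel delivery takes 8 hours.
hours = remaining_runtime_h(30, 1000, 60)
print(f"{hours:.1f} h of fuel left")
print("Alert:", fuel_alert(30, 1000, 60, refuel_lead_time_h=8))
```

Comparing remaining runtime with the refuelling lead time, rather than alerting at a fixed fuel percentage, ties the alarm to how long it actually takes to act on it.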
HVAC and chilled water cooling systems can also fail even without a power outage. If a cooling system was sized for the typical ambient temperature and the heat load of the Data Center equipment, the added stress of higher-than-normal ambient temperatures during a heatwave can cause it to break down. Large Data Centers therefore typically have failover backup systems: for example, redundant compressors that can take over if one fails, or be switched on during periods of increased demand.
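The effect of a heatwave on cooling redundancy can be shown with a quick N+1 calculation, where N is the number of compressors needed to carry the heat load and one more is added as a spare. The 15% capacity derating at high ambient temperature and the loads below are hypothetical; real derating figures come from the equipment manufacturer's performance data.

```python
import math


def compressors_needed(heat_load_kw: float, unit_capacity_kw: float,
                       derating: float = 0.0, spares: int = 1) -> int:
    """Units required to carry heat_load_kw plus `spares` redundant units (N+1 by default).

    derating is the fraction of capacity lost at high ambient temperature (hypothetical).
    """
    effective_kw = unit_capacity_kw * (1.0 - derating)
    n = math.ceil(heat_load_kw / effective_kw)
    return n + spares


# Hypothetical: 200 kW heat load, 50 kW compressors.
print(compressors_needed(200, 50))                 # normal ambient
print(compressors_needed(200, 50, derating=0.15))  # heatwave ambient
```

Under these assumptions, five units give N+1 in normal weather but only N during the heatwave, so a single compressor failure would leave the system short of capacity.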
Depending on the Tier classification of your Data Center, the backup power systems should be designed with redundancy in mind: if one piece of equipment fails, another should be available to take its place. Because this redundancy increases costs, it is found mostly in larger Data Centers. If you are aiming for Tier IV classification, only about 0.4 hours (roughly 26 minutes) of downtime per year is allowed. This requires all power and cooling components to be 2N redundant, meaning there are two independent systems, each capable of carrying the full Data Center load. Other Tier classes require varying levels of backup power and cooling.
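Tier targets are often expressed as availability percentages, and converting them to hours of downtime per year is a one-line calculation. The percentages below are the figures widely cited for each Tier; they are an assumption here, not taken from this article, so check the Uptime Institute's current Tier Standard for the authoritative definitions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; ignores leap years

# Widely cited availability figures per Tier (assumption, see lead-in).
TIER_AVAILABILITY_PCT = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def annual_downtime_hours(availability_pct: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


for tier, pct in TIER_AVAILABILITY_PCT.items():
    hours = annual_downtime_hours(pct)
    print(f"{tier}: {pct}% -> {hours:.1f} h ({hours * 60:.0f} min) per year")
```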
The Uptime Institute's Global Data Center Survey gathered responses from nearly 900 Data Center operators and IT practitioners at both major Data Center providers and private, company-owned Data Centers. The survey tracks average Power Usage Effectiveness (PUE) over time: 2.5 in 2007, 1.98 in 2011 and 1.65 in 2013, reaching an all-time low of 1.58 in 2018 before rising slightly to 1.67 in 2019.
PUE is a measure of Data Center energy efficiency, defined as total facility power divided by the power delivered to IT equipment. It is an indicator of both carbon footprint and operating cost, and it is also used to calculate the power needed to run and cool a Data Center. A PUE of 2 means that for every watt the IT equipment uses, another watt goes to cooling and other facility overhead; a PUE of 1.5 means half a watt of overhead for every IT watt. Lowering PUE therefore reduces the total power required, and PUE should be factored into the backup power requirements for the Data Center.
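Because PUE is the ratio of total facility power to IT equipment power, it gives a direct way to estimate how much backup power the whole site needs, cooling included. The sketch below simply applies that definition; the 100 kW IT load is hypothetical, and real generator sizing should also allow for motor starting currents and future growth.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by PUE = total facility power / IT power."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power for cooling, power distribution losses, lighting and other overhead."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


# Hypothetical 100 kW IT load at the survey PUE values quoted in the text.
for pue in (2.5, 1.67, 1.58):
    total = facility_power_kw(100, pue)
    print(f"PUE {pue}: {total:.0f} kW total, {overhead_kw(100, pue):.0f} kW overhead")
```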
Be alerted and take action
We have examined the links between heat, hardware failures and data loss in a previous article. It is therefore important to monitor not only environmental conditions but also the backup power systems. A monitoring system provides real-time data and alerts, giving advance notice of potentially critical situations.
Data centers can operate in two modes:
- Normal Operation: all equipment is operating from the main power source, all systems online
- Emergency Operation: necessary systems are operating from standby alternate power sources (UPS and generators). Data Centers should be equipped with redundant cooling systems for this mode.
Switching to emergency mode should trigger notifications that alert the necessary personnel, who can then take further action to ensure that all critical systems are online: not only backup power, but also cooling and networking.
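The two operating modes and the notification on switching between them can be modelled as a tiny state machine. A notification is produced only when the mode changes, so personnel are not flooded with a repeat on every poll. The power-source labels and the `notify` callback are hypothetical placeholders for whatever your monitoring system provides.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    NORMAL = "normal"        # running on main power, all systems online
    EMERGENCY = "emergency"  # running on UPS and/or generators


def mode_for(power_source: str) -> Mode:
    # Hypothetical labels: "mains", "ups", "generator".
    return Mode.NORMAL if power_source == "mains" else Mode.EMERGENCY


class ModeMonitor:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify
        self.mode: Optional[Mode] = None

    def update(self, power_source: str) -> Mode:
        new_mode = mode_for(power_source)
        if self.mode is not None and new_mode != self.mode:
            # Notify only on a change of mode, not on every poll.
            if new_mode is Mode.EMERGENCY:
                self.notify("EMERGENCY: check backup power, cooling and networking")
            else:
                self.notify("Power restored: resume normal operation")
        self.mode = new_mode
        return new_mode


messages = []
monitor = ModeMonitor(messages.append)
for source in ["mains", "ups", "generator", "mains"]:
    monitor.update(source)
print(messages)
```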
If you are operating a lower-tier Data Center with limited UPS battery or backup generator capacity, it is advisable to classify your servers and equipment as essential or non-essential, so that non-essential items can be shut down first during a power issue. This extends UPS battery runtime, reduces the heat load and the pressure on cooling systems, and lowers the power required from your backup generators. Similarly, when power returns and normal operation resumes, there should be a list specifying the order in which servers and equipment are turned back on.
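A shutdown and restart plan like this can be kept as a simple priority table. The sketch below sheds the least essential equipment first until the load drops to a target, and restarts in the reverse order. The equipment names, priorities, loads and target are hypothetical, and real restart plans may also need to respect dependencies, such as bringing up storage before the servers that use it.

```python
from typing import NamedTuple


class Equipment(NamedTuple):
    name: str
    priority: int  # 1 = most essential; higher numbers are shut down first
    load_w: float


# Hypothetical inventory for illustration.
INVENTORY = [
    Equipment("core-switch", 1, 400),
    Equipment("storage-array", 1, 1200),
    Equipment("database-server", 2, 800),
    Equipment("web-server", 3, 500),
    Equipment("test-server", 4, 600),
    Equipment("dev-workstation", 4, 300),
]


def shutdown_order(items):
    """Least essential first; ties broken by larger load first to shed more power sooner."""
    return sorted(items, key=lambda e: (-e.priority, -e.load_w))


def startup_order(items):
    """Most essential first (the reverse of the shutdown order)."""
    return list(reversed(shutdown_order(items)))


def shed_until(items, target_w):
    """Return the items to shut down, least essential first, until load <= target_w."""
    remaining = sum(e.load_w for e in items)
    to_stop = []
    for item in shutdown_order(items):
        if remaining <= target_w:
            break
        to_stop.append(item)
        remaining -= item.load_w
    return to_stop, remaining


stopped, remaining = shed_until(INVENTORY, target_w=2500)
print("shut down first:", [e.name for e in stopped])
print(f"remaining load: {remaining:.0f} W")
print("restart order:", [e.name for e in startup_order(INVENTORY)])
```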
Data Center Monitoring Systems?
AKCP's Data Center monitoring system can help you monitor your servers, UPS, generators and environmental conditions. Whether you need a few temperature and humidity sensors for your computer room or are rolling out multi-cabinet monitoring in a Tier IV Data Center, AKCP offers an end-to-end Data Center monitoring system that includes sensors and AKCPro Server DCIM software. Our Rack+ solution is an integrated intelligent rack or aisle containment system. Pressure differential sensors check for proper air pressure gradients between hot and cold aisles, and RFID cabinet locks physically secure your IT infrastructure.
AKCP provides both traditional wired and wireless Data Center monitoring solutions. Our Wireless Tunnel™ System builds on LoRa™ technology, with features designed specifically for Data Center monitoring. Wireless sensors offer rapid deployment, easy installation and a high level of security. Wireless Tunnel™ is the only LoRa-based radio solution designed specifically for critical infrastructure monitoring, with instant notifications and on-sensor threshold checking.
AKCPro Server also includes a built-in PUE calculation to help you measure the efficiency of your Data Center and see in real time how changes to CRAC setpoints and server load balancing can reduce your PUE.
For generator monitoring, AKCP has sensors for power, battery voltage and current, runtime hours and fuel level. For more detailed data, we can also interface with generator control panels via Modbus over RS485 or via SNMP.
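For readers curious about what a Modbus RS485 link carries, the sketch below builds a standard Modbus RTU "Read Holding Registers" (function code 0x03) request, including the CRC-16 checksum that Modbus RTU appends low byte first. Which registers hold fuel level, battery voltage or runtime hours differs between generator controllers, so the slave address and register number here are hypothetical; consult your controller's register map, and in practice prefer an established Modbus library over hand-built frames.

```python
import struct


def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_request(slave: int, start_register: int, count: int) -> bytes:
    """Build a Modbus RTU function 0x03 request frame."""
    if not 1 <= count <= 125:
        raise ValueError("Modbus allows 1 to 125 registers per read")
    # Address and function code are single bytes; register and count are big-endian 16-bit.
    pdu = struct.pack(">BBHH", slave, 0x03, start_register, count)
    # The CRC is transmitted low byte first (little-endian).
    return pdu + struct.pack("<H", modbus_crc16(pdu))


# Hypothetical: controller at slave address 1, read one register starting at 0.
frame = read_holding_registers_request(slave=1, start_register=0, count=1)
print(frame.hex(" "))  # 01 03 00 00 00 01 84 0a
```

A receiver can validate a frame by running the same CRC over the whole message including its checksum; a result of zero means the frame arrived intact.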
Each alert that is sent logs the exact date and time along with the environmental conditions, so you are alerted in time to take further action if necessary. Automated escalation actions can also be set up: for example, if a power loss lasts longer than X minutes and the temperature reaches a critical value, begin shutting down servers.
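The escalation rule described above combines two conditions. A minimal sketch, assuming the monitoring system supplies the outage start time and the current temperature: it builds an alert record stamped with the date and time and the conditions, and requests a server shutdown only when both thresholds are crossed. The X-minute limit and the critical temperature are left as parameters; the function, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    timestamp: datetime
    outage_minutes: float
    temperature_c: float
    action: str


def evaluate(outage_start: datetime, now: datetime, temperature_c: float,
             max_outage_min: float, critical_temp_c: float) -> Alert:
    """Escalate to server shutdown only if BOTH the outage and temperature limits are exceeded."""
    outage_min = (now - outage_start).total_seconds() / 60.0
    if outage_min > max_outage_min and temperature_c >= critical_temp_c:
        action = "begin server shutdown"
    else:
        action = "notify only"
    return Alert(now, outage_min, temperature_c, action)


# Hypothetical outage starting at 14:00, with X = 15 minutes and a 32 °C critical limit.
start = datetime(2024, 7, 1, 14, 0)
print(evaluate(start, start + timedelta(minutes=10), 29.0, max_outage_min=15, critical_temp_c=32))
print(evaluate(start, start + timedelta(minutes=20), 33.5, max_outage_min=15, critical_temp_c=32))
```

Requiring both conditions avoids shutting down servers during a long outage that the cooling system is handling, or during a brief temperature spike while mains power is still expected back.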