Protect Your Data Center During Power Outages and Heatwaves

AKCPBlog

Operating IT infrastructure in a hot climate is a challenge, and heatwaves exacerbate it. Abnormally high temperatures present considerable challenges to your Data Center. This article explains how to protect your Data Center during power outages and heatwaves.


Heatwaves increase the demand placed on public power grids, and the increased demand makes those grids more prone to failure. It is therefore important to protect your Data Center during power outages. Without proper backup power systems in place, downtime will occur, and data can be lost if servers are not shut down gracefully. Such failures are becoming more common as global average temperatures rise. It is therefore very important that your organization has proper environmental monitoring equipment and backup systems in place at your Data Center.

The most common power issues in the Data Center are:

  • The main power line is physically damaged or overloaded.
  • Instability of the main power frequency.
  • Electrical noise, usually caused by nearby equipment.
  • Voltage spikes, over-voltage.
  • Momentary or sustained under-voltage.
  • Distortion of the sine waveform on the power line.

In this article, we will examine some common Data Center backup power systems, their monitoring, and power outage scenarios. 

Planning Your Backup Systems

During the design of your Data Center – whether it is just a small computer room with a few servers and a single air conditioner or a large multi-level facility with supercomputers – the necessary backup systems have to be put in place. At a minimum, the backup power system should take the form of a UPS that covers temporary power outages and fluctuations and provides sufficient time for a graceful shutdown. A UPS is, however, only a short-term backup power source. It bridges the transition between mainline power and generators. Power outages usually last between 1 and 6 hours, which is too long for the servers to run from batteries alone.

If you have a small server room and no generator backup, then the UPS switching to battery mode should signal the servers to enter a controlled shutdown if the main power has not returned after a few minutes.
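The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `UpsStatus`, `should_shut_down`, and `grace_seconds` are hypothetical, the 5-minute default stands in for "a few minutes", and reading the actual UPS state (for example over SNMP or a vendor agent) is not shown.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of a UPS, as reported by whatever agent polls it."""
    on_battery: bool
    seconds_on_battery: float


def should_shut_down(status: UpsStatus, grace_seconds: float = 300.0) -> bool:
    """Return True once the UPS has run on battery for at least the grace period.

    grace_seconds is an assumed value; choose it from the measured UPS runtime
    minus the time your servers need to shut down cleanly.
    """
    return status.on_battery and status.seconds_on_battery >= grace_seconds


# Example: mains power present, a short outage, and a prolonged outage.
for snapshot in (UpsStatus(False, 0), UpsStatus(True, 60), UpsStatus(True, 420)):
    print(snapshot, "->", should_shut_down(snapshot))
```

In practice the grace period should leave enough battery runtime for every server to finish its shutdown sequence.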

A mainline power failure has serious implications for the cooling systems of the Data Center. HVAC and chilled water plant systems cannot function on UPS battery power. Temperatures can quickly rise to levels that result in server failures and thermal shutdown. Operating IT equipment at elevated temperatures also shortens its lifespan. Tests by National Instruments noted that just a 5°C increase in temperature can reduce a hard drive’s life by as much as two years.

Properly sized backup power generators should be installed to ensure the continuous operation of Data Center cooling systems during power outages. In addition, an environmental monitoring system (EMS) in the Data Center hall will monitor and alert should temperatures be outside of the prescribed range. Power monitoring and remote generator monitoring systems ensure that backup power systems are ready when called upon and capable of maintaining the loads put upon them.

 

In larger Data Centers, generators are an important component of the backup system. After a mainline power failure, the UPS devices take over and supply the power. There is then a period during which the load is transferred to the generator, which needs to start up and stabilize before it can provide power. Some larger Data Centers have a rotary flywheel-powered generator, which can by itself fully power the infrastructure for a few seconds until the diesel-powered generators start up.

During this power transfer period, it is important to monitor the temperature and humidity levels in the Data Center. There is a risk of a generator failing to start, which leaves the cooling system offline and, at the end of the UPS runtime, leads to a total Data Center outage. Generators can also fail after they have started and run for a few hours, for example because of a poor maintenance schedule or insufficient fuel. Both of these scenarios can be mitigated by a suitable remote generator monitoring system. If generators do fail to start, Data Centers need automated shutdown solutions that shut the servers down gracefully to avoid data loss.

HVAC and chilled water cooling systems can also fail even if there is no power outage. If the cooling system was sized for the typical ambient temperature and heat load of the Data Center equipment, the added stress of the higher-than-normal ambient temperatures of a heatwave can result in system breakdowns. It is therefore typical for large Data Centers to have failover backup systems. Redundant compressors, for example, can take over in the event of a failure or be turned on during times of increased demand.

Depending on the tier classification of your Data Center, the backup power systems should be designed with redundancy in mind; if one piece of equipment fails, there should be an alternative to take its place. Since redundancy increases costs, it is mostly found in larger Data Centers. If you are aiming for a Tier IV Data Center classification, the commonly cited allowance is only about 0.4 hours (roughly 26 minutes) of annual downtime, corresponding to 99.995% availability. This requires all power and cooling components to be 2N redundant, meaning there should be two independent systems, each capable of carrying the full Data Center load. Other Tier classes require varying levels of backup power and cooling systems.
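To relate an availability target to an annual downtime budget, the arithmetic can be sketched as follows. The function name `annual_downtime_hours` is hypothetical, and the availability percentages in the example are illustrative inputs rather than official Tier definitions; a year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into allowed downtime per year, in hours."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; check the figures that apply to your own classification.
for target in (99.9, 99.99, 99.995):
    hours = annual_downtime_hours(target)
    print(f"{target}% availability -> {hours:.2f} h/year ({hours * 60:.0f} min)")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why the higher tiers call for fully redundant power and cooling paths.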

Figure: Uptime Institute PUE line graph (diagram courtesy of Uptime Institute)

Uptime Institute’s Global Data Center Survey gathered responses from nearly 900 Data Center operators and IT practitioners, from both major Data Center providers and private, company-owned Data Centers. It found that the average Power Usage Effectiveness (PUE) of Data Centers hit an all-time low of 1.58 in 2018. By contrast, the average PUE was 2.5 in 2007, 1.98 in 2011, and 1.65 in the 2013 survey. After reaching 1.58 in 2018, it rose slightly to 1.67 in 2019.

PUE is a measure of energy efficiency in the Data Center: the ratio of the total power drawn by the facility to the power consumed by the IT equipment. It is not only an indicator of carbon footprint and operational cost but is also used to calculate the power needed to operate and cool a Data Center. A PUE of 2 means that for every Watt consumed by the IT equipment, another Watt is needed for cooling and other facility overhead. A PUE of 1.5 means that for every Watt the IT systems use, half a Watt is needed for overhead, most of it cooling. A lower PUE therefore means less total power is required for the same IT load, and it should be factored into the backup power requirements for the Data Center.
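As a rough illustration of how PUE feeds into backup power sizing, the following sketch uses the standard definition of PUE as total facility power divided by IT power. The function names and the 20% `headroom` margin are assumptions made for the example, not a sizing rule from this article.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("it_load_kw must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Power needed for cooling and other non-IT overhead at a given PUE."""
    return it_load_kw * (pue_value - 1)


def backup_capacity_kw(it_load_kw: float, pue_value: float, headroom: float = 0.2) -> float:
    """Estimate generator capacity: total facility load plus an assumed safety margin."""
    return it_load_kw * pue_value * (1 + headroom)


# Example: a 100 kW IT load at the PUE values quoted in the survey above.
for value in (2.5, 1.98, 1.58):
    print(
        f"PUE {value}: overhead {overhead_kw(100, value):.0f} kW, "
        f"backup estimate {backup_capacity_kw(100, value):.0f} kW"
    )
```

The same IT load needs noticeably less generator capacity at a lower PUE, which is why efficiency improvements also reduce backup power requirements.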

Be Alerted And Take Action

In this article we have examined heat, hardware failures, and data loss. It is therefore important to monitor not only the environmental conditions but also the backup power systems. A monitoring system provides real-time data and alerts, giving advance notice of potentially critical situations.

Data Centers can operate in two modes:

  • Normal Operation: all equipment is operating from the main power source, and all systems are online.
  • Emergency Operation: necessary systems are operating from standby alternate power sources such as UPS and generators. Data Centers should be equipped with redundant cooling systems for this mode.

The switch to emergency mode should trigger notifications that alert the necessary personnel, who can then take further action to ensure that all critical systems are online – not only the backup power but cooling and networking as well.

If you are operating a lower-tier Data Center and have limited UPS battery or backup generator capacity, it is advisable to classify your servers and equipment as essential or non-essential, so that the non-essential ones can be shut down first during a power issue. This will extend the UPS battery runtime, minimize the heat load, reduce the pressure on cooling systems, and lower the power required from your backup generators. Similarly, there should be a list defining the order in which servers and equipment are turned back on when power returns and normal operation resumes.
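Such a list can be kept as simple structured data. The sketch below is one possible way to do it, not a prescribed method: the `Device` class, the priority scheme (1 = most essential), the example equipment, and the choice to power up in the reverse of the shutdown order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    priority: int   # 1 = most essential; higher numbers are shed first
    load_kw: float


def shutdown_order(devices):
    """Least essential equipment first."""
    return sorted(devices, key=lambda d: d.priority, reverse=True)


def startup_order(devices):
    """Most essential equipment first (assumed to mirror the shutdown order)."""
    return sorted(devices, key=lambda d: d.priority)


def shed_until(devices, target_kw):
    """Shut down devices in shutdown order until the remaining load fits target_kw.

    Returns the list of devices to shut down and the load that remains.
    """
    remaining = sum(d.load_kw for d in devices)
    to_shed = []
    for device in shutdown_order(devices):
        if remaining <= target_kw:
            break
        to_shed.append(device)
        remaining -= device.load_kw
    return to_shed, remaining


# Hypothetical inventory for a small server room.
inventory = [
    Device("storage array", 1, 4.0),
    Device("database server", 2, 2.0),
    Device("test cluster", 3, 1.5),
    Device("developer workstations", 4, 0.5),
]
shed, remaining_kw = shed_until(inventory, target_kw=6.0)
print("Shut down:", [d.name for d in shed], "remaining load:", remaining_kw, "kW")
print("Power-up order:", [d.name for d in startup_order(inventory)])
```

Keeping the list in one place means the same data can drive both the shutdown sequence and the power-up sequence.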

Data Center Monitoring Systems

AKCP’s Data Center monitoring system can help you monitor your servers, UPS, generators, and environmental conditions. Whether you are looking for a few temperature and humidity sensors for your computer room or rolling out a multi-cabinet monitoring solution in a Tier IV Data Center, AKCP has an end-to-end Data Center monitoring system. The solution includes sensors and AKCPro Server DCIM software. Our Rack+ solution is an integrated intelligent rack or aisle containment system. Pressure differential sensors check proper air pressure gradients between hot and cold aisles. RFID Cabinet locks physically secure your IT infrastructure.


AKCP provides both traditional wired and wireless Data Center monitoring solutions. Our Wireless Tunnel™ System builds upon LoRa™ technology, with specific features designed to meet the needs of Data Center monitoring. Wireless sensors allow rapid deployment and easy installation and offer a high level of security. It is the only LoRa-based radio solution that has been designed specifically for critical infrastructure monitoring, with instant notifications and on-sensor threshold level checking.

AKCPro Server also contains a built-in PUE calculation to help you determine the efficiency of your Data Center and see in real time how changes to CRAC setpoints and server load balancing can reduce your PUE.

For monitoring generators, AKCP has sensors for monitoring power, battery voltage and current, runtime hours, and fuel level. In addition, we can interface to control panels with Modbus RS485 or SNMP for more detailed data.

The exact date and time, together with the environmental conditions, are logged in the alerts that are sent, so you are alerted in time to take further action if necessary. Automated escalation actions can also be set up. For example, if a power loss lasts longer than X minutes and the temperature reaches a critical value, the shutdown of servers can begin.
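The example rule can be expressed as a small decision function. This is a generic sketch of the logic only and does not represent the AKCPro Server configuration or API; the function name, the returned action labels, and the default thresholds (15 minutes, 35°C) are hypothetical values standing in for "X minutes" and "a critical value".

```python
def escalation_action(
    minutes_on_backup: float,
    temperature_c: float,
    max_outage_minutes: float = 15.0,   # assumed stand-in for "X minutes"
    critical_temp_c: float = 35.0,      # assumed critical temperature
) -> str:
    """Decide what to do based on outage duration and room temperature.

    Returns "shutdown" when both conditions are met, "alert" when only one is,
    and "none" otherwise.
    """
    outage_too_long = minutes_on_backup > max_outage_minutes
    too_hot = temperature_c >= critical_temp_c
    if outage_too_long and too_hot:
        return "shutdown"
    if outage_too_long or too_hot:
        return "alert"
    return "none"


# Example readings: (minutes on backup power, temperature in °C)
for reading in ((2, 24.0), (20, 27.0), (5, 36.0), (20, 36.0)):
    print(reading, "->", escalation_action(*reading))
```

Requiring both conditions before shutting down avoids taking servers offline for a long outage that the cooling system is still coping with, while either condition alone still raises an alert.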
