Disaster Recovery Planning for College IT Data Centers
Modern-day universities and colleges rely on stable IT infrastructure to host their ever-growing digital presence. A college's IT infrastructure must handle the data of thousands of students. With the advent of distance learning, online virtual classes and whiteboards have become commonplace, so implementing a data center disaster recovery plan is essential. Below is a list of what a typical college campus IT infrastructure must handle:
- Critical personal and work files shared on file servers
- Directory servers (LDAP)
- Domain Name System (DNS) servers
- Print servers
- Virtualization servers
- Research projects
Losing this digital infrastructure to a natural or man-made disaster can cause real and costly problems. Managing and monitoring these critical IT facilities is therefore vital for business continuity.
Most colleges are investing in improving their overall campus experience, but are they prepared when a real disaster strikes?
Does your college have a business continuity plan in the event of a disaster? Internet problems, power outages, cyber-crime, lightning damage, fire and water damage, earthquakes and tornadoes can all wreak havoc on your college campus IT services. Unless your school has a disaster management plan, these could lead to major data and property losses.
As an example, Indiana University invested $40 million in a new data center, which houses the Big Red II supercomputer. This is a huge investment in IT infrastructure and an asset that must be protected: the building was designed to withstand the impact of an EF5 tornado.
So what can be done to prevent and mitigate disasters that could affect a college's IT infrastructure?
Expect the unexpected – implement a data center disaster recovery plan
Make plans and be prepared for possible problems. Remember the 5 P’s: Prior Preparation Prevents Poor Performance. Preparation puts you in a better position to respond when disaster strikes, and a plan that has been practised allows immediate and precise action without confusion.
Because college IT infrastructure is so important, keep its disaster recovery plan up to date. Test the plan regularly and update it to keep pace with developments in your IT infrastructure.
Human error is the most common cause of IT outages, typically mistakes made during the installation and maintenance of IT systems. It is therefore important that trained staff perform these operations and follow the procedures put in place.
Policies should be structured for different levels of disaster. Some issues affect only a small portion of the infrastructure and can be isolated; for these, a partial recovery plan can be implemented. Other issues, such as large-area flooding or fire, cause campus-wide problems; for these, a full recovery plan should be in place.
Natural disasters often result in power loss. Without power, your whole IT infrastructure and communications network will be down, yet reliable communications during a calamity are vital. Redundant backup power systems such as a UPS (uninterruptible power supply) and standby generators keep the IT infrastructure up and running until utility power is restored. Plan and size them appropriately:
- The UPS should be able to deliver sufficient power for the time it takes to transition to generator power.
- Generators must be sized correctly for the power load and kept sufficiently fueled, with standby fuel tanks for replenishment.
- Generator starter batteries should be maintained and kept in good condition.
- Battery and power monitoring systems, along with remote generator monitoring, should be installed.
These backup power systems should be regularly maintained and tested. Proper maintenance is the key to a well-performing data center, and it matters even more for preventing critical issues when a disaster happens. A typical data center power outage can cost $9,000 per minute of downtime!
If the building ever did lose power, two towering 16-cylinder, 2,200-horse power Cummins diesel backup generators — complete with 10,000-gallon fuel storage tank — would kick on automatically.
Indiana State University Data Center Facilities
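The sizing rules above come down to simple arithmetic. The following Python sketch is a back-of-the-envelope aid only: the function names, the 25% UPS margin, the cooling overhead, the 20% generator headroom and the fuel burn rate are all hypothetical assumptions, not vendor figures, and real sizing should follow the UPS and generator manufacturers' data. Only the 10,000-gallon tank is taken from the example quoted above.

```python
"""Back-of-the-envelope backup power sizing (illustrative only).

Every default and example figure here is a hypothetical assumption,
not manufacturer data.
"""


def ups_energy_wh(load_kw: float, bridge_minutes: float, margin: float = 1.25) -> float:
    """UPS energy (Wh) needed to carry `load_kw` until the generators take over.

    `margin` is an assumed allowance for battery ageing and slow generator starts.
    """
    return load_kw * 1000 * (bridge_minutes / 60) * margin


def generator_kw_required(it_load_kw: float, cooling_overhead: float,
                          headroom: float = 1.2) -> float:
    """Generator capacity (kW) for the IT load plus cooling, with spare headroom.

    `cooling_overhead` is cooling power as a fraction of IT power (assumed).
    """
    return it_load_kw * (1 + cooling_overhead) * headroom


def generator_runtime_hours(tank_gallons: float, burn_gal_per_hour: float) -> float:
    """Hours of operation on a full tank at a constant fuel burn rate."""
    if burn_gal_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return tank_gallons / burn_gal_per_hour


if __name__ == "__main__":
    it_load = 200.0  # kW, hypothetical IT load
    print(f"UPS energy: {ups_energy_wh(it_load, bridge_minutes=5):,.0f} Wh")
    print(f"Generator size: {generator_kw_required(it_load, cooling_overhead=0.5):,.0f} kW")
    # 10,000-gallon tank as in the quoted example; the 100 gal/h burn rate is assumed.
    print(f"Fuel runtime: {generator_runtime_hours(10_000, 100):.0f} h")
```

With these placeholder numbers the sketch asks for roughly 20.8 kWh of UPS energy, a 360 kW generator and about 100 hours of fuel. The value lies in the method (size the UPS for the transfer window, size the generator for the total load, track fuel runtime), not in the specific numbers.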
The most common disasters affecting IT infrastructure include:
- Lightning damage, power surges
- Water damage (flooding)
- Fire damage
- Human error (for example, maintenance mistakes)
- Excessive temperature and humidity (HVAC issues)
- Cyber-crime and data theft
- Chemical hazards
- Radiological hazards
- Biological hazards
- Bomb threats
Detect and respond
It is important to recognize when a disaster happens or, better still, to be forewarned of critical situations as they develop. This is where a data center monitoring system plays a critical role. Such a system can include power, security and environmental monitoring: it constantly checks the vital statistics of your data center, sends alerts and sounds alarms. For major natural disasters, you will usually be notified through public media and can take the necessary precautions and mitigation steps. But when issues arise inside your data center, you will need to rely on local monitoring solutions.
When an alert is received, first assess the situation, then determine its severity to decide on the necessary steps, and finally act on the problem.
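The assess, determine severity, act sequence can be sketched as a simple threshold check. This Python example is illustrative only: the sensor names, the warning and critical limits, and the suggested actions are hypothetical placeholders, not AKCP defaults or any product's API, and the sketch checks upper limits only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Threshold:
    warning: float   # reading at or above this raises a warning
    critical: float  # reading at or above this is critical


# Hypothetical upper limits; set real values from equipment specs and site policy.
THRESHOLDS = {
    "temperature_c": Threshold(warning=27.0, critical=32.0),
    "humidity_pct": Threshold(warning=60.0, critical=70.0),
    "water_leak": Threshold(warning=1.0, critical=1.0),  # 1 = leak detected
}

# Hypothetical response for each severity level (the "act" step).
ACTIONS = {
    Severity.WARNING: "notify on-call staff and investigate",
    Severity.CRITICAL: "page the response team and start the recovery procedure",
}


def classify(sensor: str, value: float) -> Severity:
    """Determine the severity of one reading against its upper limits."""
    limits = THRESHOLDS[sensor]
    if value >= limits.critical:
        return Severity.CRITICAL
    if value >= limits.warning:
        return Severity.WARNING
    return Severity.NORMAL


def triage(readings: dict[str, float]) -> list[tuple[str, float, Severity]]:
    """Assess all readings and return the active alerts, most severe first."""
    alerts = [(name, value, classify(name, value)) for name, value in readings.items()]
    active = [alert for alert in alerts if alert[2] > Severity.NORMAL]
    return sorted(active, key=lambda alert: alert[2], reverse=True)


if __name__ == "__main__":
    snapshot = {"temperature_c": 29.5, "humidity_pct": 45.0, "water_leak": 1.0}
    for name, value, severity in triage(snapshot):
        print(f"{severity.name}: {name}={value} -> {ACTIONS[severity]}")
```

Ordering alerts by severity means a critical water leak is handled before a warm aisle. A production setup would also check lower limits (for example, very low humidity) and filter out brief sensor spikes before alerting.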
How can AKCP help?
AKCP has multiple solutions for monitoring your data center. Whether you are looking for a few temperature and humidity sensors for your computer room or rolling out a multi-cabinet monitoring solution, AKCP has an end-to-end data center monitoring solution, including sensors and AKCPro Server DCIM software. Our Rack+ solution is an integrated intelligent rack or aisle containment system. Pressure differential sensors check for proper air pressure gradients between hot and cold aisles, and RFID cabinet locks secure your physical IT infrastructure.
AKCP provides both traditional wired and wireless data center monitoring solutions. Our Wireless Tunnel™ System builds on LoRa™ technology, with features designed specifically for data center monitoring. Wireless sensors offer rapid deployment, easy installation and a high level of security. It is the only LoRa-based radio solution designed specifically for critical infrastructure monitoring, with instant notifications and on-sensor threshold checking.
Once the disaster situation has cleared and mitigation is complete, it is time to take recovery steps.
We recommend a two-step process, scaled to the severity of the situation (based on the Louisiana Delta Community College Disaster Recovery Plan).
- First is a partial recovery: ensuring the most critical hardware is in working condition and restoring the most important IT services (email, databases, websites, intranet, critical files). At this stage, some functionality of the university’s IT infrastructure may still be unavailable.
- Second is the full recovery of all services and hardware that were in use before the disaster. This is only possible once all hardware and backup sets are in good working order.
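The two-step process can be sketched as a small planning function. In this Python sketch the service names and the `critical`, `hardware_ok` and `backup_ok` flags are hypothetical. It assigns critical services whose hardware and backups are ready to the partial-recovery phase, assigns other ready services to the full-recovery phase, and lists services that stay blocked until their hardware or backups are fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    critical: bool     # restore during the partial-recovery phase
    hardware_ok: bool  # host hardware is in working condition
    backup_ok: bool    # a usable backup set is available


def plan_recovery(services: list[Service]) -> dict[str, list[str]]:
    """Group services into partial-recovery, full-recovery and blocked lists."""
    plan: dict[str, list[str]] = {"partial": [], "full": [], "blocked": []}
    for svc in services:
        if not (svc.hardware_ok and svc.backup_ok):
            plan["blocked"].append(svc.name)  # fix hardware or backups first
        elif svc.critical:
            plan["partial"].append(svc.name)
        else:
            plan["full"].append(svc.name)
    return plan


def full_recovery_possible(services: list[Service]) -> bool:
    """Full recovery needs every service's hardware and backups in good order."""
    return all(svc.hardware_ok and svc.backup_ok for svc in services)


if __name__ == "__main__":
    inventory = [
        Service("email", critical=True, hardware_ok=True, backup_ok=True),
        Service("student-database", critical=True, hardware_ok=True, backup_ok=True),
        Service("print-server", critical=False, hardware_ok=True, backup_ok=True),
        Service("research-cluster", critical=False, hardware_ok=False, backup_ok=True),
    ]
    print(plan_recovery(inventory))
    print("Full recovery possible:", full_recovery_possible(inventory))
```

Keeping the "blocked" list separate makes it explicit why full recovery cannot start yet, which matches the condition that all hardware and backup sets must be in good working order.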
After recovery, analyze the costs and damage, and evaluate the effectiveness of the response. This will enable a better response to future disaster events.
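The cost side of the post-incident review is simple arithmetic. This Python sketch is a hypothetical illustration: the per-minute cost defaults to the $9,000 figure cited earlier for a typical data center outage (a college should substitute its own estimate), and the repair line items and the target restoration time (often called a recovery time objective) are assumed values.

```python
from dataclasses import dataclass


@dataclass
class IncidentReview:
    downtime_minutes: float
    repair_costs: dict[str, float]    # hypothetical damage line items, in USD
    target_minutes: float             # assumed recovery time objective
    cost_per_minute: float = 9_000.0  # typical figure; replace with your own

    def downtime_cost(self) -> float:
        """Cost of lost service for the duration of the outage."""
        return self.downtime_minutes * self.cost_per_minute

    def total_cost(self) -> float:
        """Downtime cost plus all repair and replacement costs."""
        return self.downtime_cost() + sum(self.repair_costs.values())

    def met_target(self) -> bool:
        """Was service restored within the target time?"""
        return self.downtime_minutes <= self.target_minutes


if __name__ == "__main__":
    review = IncidentReview(
        downtime_minutes=90,
        repair_costs={"ups_batteries": 12_000, "flooded_switches": 8_000},
        target_minutes=60,
    )
    print(f"Downtime cost: ${review.downtime_cost():,.0f}")
    print(f"Total cost:    ${review.total_cost():,.0f}")
    print("Met recovery target:", review.met_target())
```

Recording whether the target was met alongside the cost gives the evaluation step a concrete measure that can be compared across incidents.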