Five steps to better data centre power chain management and risk assessment
It is a simple fact that data centre power outages cause major business disruption in our connected, web-delivered world. A severe outage can easily cost customers and damage the brand, as well as the stock price.
Everyone must face these facts, because there is no silver lining to a power loss incident. The ramifications may be irreversible: loss of trust, market share and reputation is hard to quantify, but it has a major and lasting impact.
It therefore makes good business sense to take a proactive approach to preventing power failure in the data centre. It even behoves business heads to take an interest in the foundational technologies that underpin the success of the whole business. If you are in charge of data centre facilities or management, and you are confident in your facilities and the services provided, I’d encourage you to be proud and tell the business how the data centre is secured, compliant and risk-proofed. It’s a great way of demonstrating that you’re on top of the job and polishing the personal halo. If no one knows what the data centre does and how it’s run, then they won’t appreciate the hard-working team that keep the ‘lights on’ and the whole business running.
So if you’re not proud of your power chain management at the moment, here are five steps to gain better control of the situation and manage the risk currently facing your facilities.
Make sure that your physical IT infrastructure is mapped to your power chain – understand the flow
The first step is to discover which devices actually make up your power chain, along with their locations, dependencies and lifecycle status. You should also know, at any moment, when each asset was last serviced and by whom.
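As a rough illustration of what such a map might hold, here is a minimal Python sketch. The `PowerAsset` record, its field names, the sample assets and the 365-day service interval are all assumptions made for the example, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PowerAsset:
    """One device in the power chain (hypothetical record layout)."""
    name: str
    location: str
    lifecycle_status: str            # e.g. "in service", "end of life"
    last_serviced: date
    serviced_by: str
    fed_by: List[str] = field(default_factory=list)  # names of upstream assets


def overdue_for_service(assets, today, max_age_days=365):
    """Return the names of assets not serviced within max_age_days."""
    return sorted(
        a.name for a in assets
        if (today - a.last_serviced).days > max_age_days
    )


# Illustrative data only.
assets = [
    PowerAsset("utility-feed-A", "substation", "in service",
               date(2018, 3, 1), "grid operator"),
    PowerAsset("ups-1", "room 1", "in service",
               date(2016, 11, 15), "vendor engineer", fed_by=["utility-feed-A"]),
    PowerAsset("pdu-1a", "room 1, row A", "in service",
               date(2018, 6, 20), "facilities team", fed_by=["ups-1"]),
]

print(overdue_for_service(assets, today=date(2018, 9, 1)))
```

The `fed_by` field is what turns a flat asset list into a chain: once every device records what feeds it, the flow of power can be followed from the utility feed down to the rack.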
Have a single pane-of-glass view across all data centres/data rooms – save time and effort on management
Most data centre managers should have some sort of monitoring system with a view into the building management system (BMS) and facility operations such as heating, ventilation and air conditioning (HVAC). Note, however, that these monitoring systems are often siloed and keep their data locked within their respective databases.
To do a better, more complete and more timely job, managers need access to all of that data in real time, via a consolidated portal that automatically gathers information from the following sources:
- All data centres
- Multiple BMS systems
- Mixed vendor/hardware (the IT)
- Facilities equipment
Gain the ability to run power failure simulations – test and understand your disaster processes
Power failure simulations are a great way to test the resilience of a power chain while identifying every downstream device that would be affected by a power loss. It’s the only way to demonstrate to your users and the business at large that the data centre management team is on top of its game, when incidents like BA’s outages show that not everyone else is.
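To make the idea concrete, here is a minimal Python sketch of a failure simulation over a power chain map. It assumes a simplified model in which a device loses power only when every one of its upstream feeds is down, so dual-corded equipment survives the loss of a single path. The `simulate_failure` function and the device names are hypothetical, and a real simulation would also need to account for capacity, batteries and generators.

```python
def simulate_failure(feeds, failed):
    """Return the downstream devices that lose power when `failed` go down.

    `feeds` maps each device to the list of upstream devices that feed it.
    Assumption: a device stays up as long as at least one feed is up.
    """
    down = set(failed)
    changed = True
    while changed:                      # repeat until the outage stops spreading
        changed = False
        for device, sources in feeds.items():
            if device in down or not sources:
                continue                # already down, or a root feed
            if all(s in down for s in sources):
                down.add(device)
                changed = True
    return down - set(failed)


# Illustrative topology only.
feeds = {
    "utility-A": [],
    "utility-B": [],
    "ups-1": ["utility-A"],
    "ups-2": ["utility-B"],
    "pdu-1": ["ups-1"],
    "pdu-2": ["ups-2"],
    "rack-1": ["pdu-1"],            # single-corded
    "rack-2": ["pdu-1", "pdu-2"],   # dual-corded
}

print(sorted(simulate_failure(feeds, {"ups-1"})))
print(sorted(simulate_failure(feeds, {"ups-1", "ups-2"})))
```

In this toy topology, losing `ups-1` takes out `pdu-1` and the single-corded `rack-1`, while the dual-corded `rack-2` stays up; losing both UPS units takes out everything downstream.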
Ensure that all incidents are captured within the ITSM service desk – then use trend analysis to identify further potential risks of failure before they happen
It’s important to keep track of the small and large issues that impact operations so you can identify problematic patterns and avoid future disruptions. This takes full integration between data centre operations, the IT service management (ITSM) service desk and facilities information, so that problems are documented and changes can be made where they will have the most impact.
Following this, trend analysis is about looking back in order to look forward. By monitoring and documenting how data centre capacity is used, you can detect trends and patterns that will help with future capacity planning.
That will help managers make the case for change before issues become critical.
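As a simple illustration of trend analysis on service desk data, the Python sketch below counts incidents per asset and per month. The incident records, their `asset` and `month` fields, and the threshold of three incidents are assumptions for the example; a real ITSM export will have its own format.

```python
from collections import Counter


def recurring_assets(incidents, threshold=3):
    """Return (asset, count) pairs for assets with at least `threshold` incidents."""
    counts = Counter(i["asset"] for i in incidents)
    return [(asset, n) for asset, n in counts.most_common() if n >= threshold]


def monthly_counts(incidents, asset):
    """Return (month, count) pairs for one asset, in chronological order."""
    counts = Counter(i["month"] for i in incidents if i["asset"] == asset)
    return sorted(counts.items())


# Illustrative records only; months use the sortable "YYYY-MM" form.
incidents = [
    {"asset": "ups-1", "month": "2018-05"},
    {"asset": "ups-1", "month": "2018-06"},
    {"asset": "ups-1", "month": "2018-06"},
    {"asset": "pdu-1a", "month": "2018-06"},
    {"asset": "ups-1", "month": "2018-07"},
    {"asset": "ups-1", "month": "2018-07"},
    {"asset": "ups-1", "month": "2018-07"},
    {"asset": "crac-2", "month": "2018-07"},
]

print(recurring_assets(incidents))
print(monthly_counts(incidents, "ups-1"))
```

Here `ups-1` stands out both for its total number of incidents and for a count that rises month on month, which is the kind of pattern worth raising before it becomes an outage.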
Ensure that the power chain is secure
Ask a key question: does your IT security encompass your power chain’s devices as well as network vulnerabilities? Find out whether the power chain is covered by your IT security protocols, and therefore who has access to your control points. These are critical questions: once the answers are known and entry and access are controlled, the entire operation is better insulated from possible breaches.
If you need to work out what the best method is to assess the probability of power loss and mitigate the associated risks, then ask yourself the following questions:
- Do I have full transparency into all interconnected devices and systems?
- Am I monitoring my operations in real time?
- Have I documented the data centre’s resiliency?
- Am I capable of running a stress test to determine the various risk levels associated with power loss?
- Can I identify the changing trends in my power system and respond accordingly?
- What is the overall vulnerability of my power chain?
If you do not have answers to all of these questions, and finding them seems daunting, consider implementing a data centre infrastructure management (DCIM) solution. DCIM solutions are a proven means of addressing these concerns, enabling both facilities and IT personnel to participate in improving overall operations while lowering capital expenses.
Of course, real life is real life. There is no panacea that ensures 100% uptime and efficiency. However, there are methods to identify areas for improvement and to prepare for service disruption. You owe it to your company and customers to be aware of the data centre management tools that help preserve services. And data centre managers, the unsung heroes of company success, owe it to themselves to have their value understood and respected appropriately.