Disaster recovery: How to reduce the business risk at distance
Geographic distance is a necessity because disaster recovery data centres have to be placed outside the circle of disruption.
The meaning of this term depends on the type of disaster. This could be a natural phenomenon like an earthquake, a volcanic eruption, a flood or a fire. Calamities are caused by human error too, so the definition of the circle of disruption varies. In the past, data centres were kept on average 30 miles apart, as this was the accepted wisdom at the time. Today, the circle’s radius can be 100 miles or more. In many people’s view, a radius of 20 or 30 miles is too close for auditors’ comfort and puts business continuity at risk.
With natural disasters and global warming in mind, David Trossell, CEO of Bridgeworks, a leading vendor of self-configuring infrastructure and optimised networks (SCIONs), considers what distance between data centres is adequate to ensure that business goes on, regardless of what happens in the vicinity of one of an organisation’s data centres:
“Many CIOs are faced with a dilemma of how to balance the need of having two data centres located within the same Metro area to ensure synchronisation for failover capability, yet in their hearts they know that both sites will probably be within the circle of disruption,” Trossell explains. He adds that, in order to ensure their survival, they should be thinking about the minimum distance from the edge of the circle at which to place a tertiary DR site.
“After all, Hurricane Sandy ripped through 24 US states, covering hundreds of miles of the East Coast of the USA, and caused approximately $75bn worth of damage. Earthquakes are a major issue throughout much of the world too, so much so that DR data centres need to be located on different tectonic plates”, he explains.
A lack of technology and resources is often the reason why data centres are placed close to each other within a circle of disruption. “There are, for example, green data centres in Scandinavia and Iceland which are extremely energy efficient, but people are put off because they don’t think there is technology available to transfer data fast enough – and yet these data centres are massively competitive”, says Claire Buchanan, chief commercial officer at Bridgeworks.
Customer risk matrix
Michael Winterson, EMEA managing director at Equinix, says that as a data centre provider, his company provides a physical location to its customers. “When we are talking with any one of our clients, we usually respond to their pre-defined risk matrix, and so they’ll ask that we need to be a minimum of ‘x’ and a maximum of ‘y’ kilometres away by fibre or line of sight”, he explains. His company then vets the ‘red flags’ to identify the data centres that fall within or outside the criteria set by each customer. Whenever Equinix goes outside the criteria, research is undertaken to justify why a particular data centre site will be adequate.
“We will operate within a circle of disruption that has been identified by a client, but a lot of our enterprise account clients opt for Cardiff because they are happy to operate at distances in excess of 100 miles from each data centre”, says Winterson. Referring back to Fukushima and to Hurricane Sandy, he claims that all of Equinix’s New York and Tokyo data centres were able to provide 100% uptime, but some customers still experienced many difficulties with transportation and with access to networks and to their primary data centres.
“If you operate your IT system in a standard office block that runs on potentially three or four hours of generator power, within that time, you’re now in the dark and so we saw a large number of customers who tried to physically displace themselves to our data centres to be able to operate the equipment directly over our Wi-Fi network in our data centres, but quite often they would have difficulty moving because of public transportation issues because the areas were blocked”, explains Winterson. Customers responded by moving their control system in Equinix’s data centres to another remote office in order to access the data centre system remotely.
Responding to disasters
Since Fukushima, Equinix has responded by building data centres in Osaka, because Tokyo presents a risk to business continuity at the information technology and network layers. Tokyo is not only an earthquake zone; power outages in Japan’s national grid could affect both its east and west coasts. In this case the idea is to get outside the circle of disruption, but Equinix’s New Jersey-based ‘New York’ and Washington DC data centres are “unfortunately” located within circles of disruption – within their epicentre, because people elect to put their co-location facilities there.
“In the City of London for instance, for active-active solutions our London data centres are in Slough, and they are adequately placed within 65 kilometres of each other by fibre optic cable, and it is generally considered that you can run an active-active solution across that distance with the right equipment”, he says. In Europe, customers are adopting a two-city solution, looking at the four hubs of telecommunications and technology – London, Frankfurt, Amsterdam and Paris – because they are really 20 milliseconds apart from each other over an Ethernet connection.
With regard to the time and latency created by distance, Clive Longbottom, client service director at analyst firm Quocirca, says: “The speed of light means that every circumnavigation of the planet creates latency of 133 milliseconds. However, the internet does not work at the speed of light and so there are bandwidth issues that cause jitter and collisions.”
He then explains that the actions taken on packets of data in transit increase the latency within a system, and says that it’s impossible to say “exactly what level of latency any data centre will encounter in all circumstances as there are far too many variables to deal with.”
Longbottom also thinks that live mirroring is now possible over hundreds of kilometres, so long as the latency is controlled by using packet shaping and other wide area network acceleration approaches. Longer distances, he says, may require a store-and-forward multi-link approach, which will need active boxes between the source and target data centres to “ensure that what is received is what was sent”.
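The 133-millisecond figure can be checked with simple arithmetic, and the same calculation gives a lower bound for the distances mentioned in this article. The sketch below is illustrative only: the refractive index of 1.47 for optical fibre and the function names are assumptions, not figures from the interviewees, and real links add routing, switching and queuing delay on top of pure propagation time.

```python
# Rough propagation-delay estimate. It ignores routing, switching and
# queuing, so real-world latency will always be higher than these figures.
C_VACUUM_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBRE_INDEX = 1.47           # ASSUMPTION: typical refractive index of optical fibre
KM_PER_MILE = 1.609344


def one_way_ms(distance_km, refractive_index=1.0):
    """One-way propagation delay in milliseconds over distance_km."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


def round_trip_ms(distance_km, refractive_index=FIBRE_INDEX):
    """Round-trip propagation delay in milliseconds, by default through fibre."""
    return 2.0 * one_way_ms(distance_km, refractive_index)


if __name__ == "__main__":
    # One lap of the planet (about 40,075 km) in a vacuum.
    print(f"Circumnavigation, vacuum: {one_way_ms(40_075):.1f} ms")
    for label, km in [
        ("65 km metro link", 65.0),
        ("100 miles", 100 * KM_PER_MILE),
        ("2,800 miles", 2_800 * KM_PER_MILE),
    ]:
        print(f"{label}: at least {round_trip_ms(km):.2f} ms round trip in fibre")
```

Under these assumptions a 65 km metro link costs well under a millisecond per round trip, while a 2,800-mile link cannot do better than a few tens of milliseconds, whatever equipment sits at either end.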
Trossell explains that jitter occurs when packets of data arrive slightly out of time. The issue is caused, he says, by data passing through different switches and connections, which can cause performance problems in the same way that packet loss does. “Packet loss occurs when the line is overloaded – this is more commonly known as congestion, and this causes considerable performance drop-offs which doesn’t necessarily reduce if the data centres are positioned closer together.”
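One simple way to put a number on jitter is to look at how much the transit delay changes from one packet to the next. The sketch below is a minimal illustration of that idea, not a description of how any vendor measures it; the function name and the sample delays are made up for the example.

```python
def mean_jitter_ms(transit_ms):
    """Mean absolute change in transit delay between consecutive packets."""
    if len(transit_ms) < 2:
        return 0.0
    diffs = [abs(later - earlier)
             for earlier, later in zip(transit_ms, transit_ms[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical per-packet transit delays in milliseconds.
steady = [43.0, 43.0, 43.0, 43.0]  # high latency, but no jitter
uneven = [43.0, 47.0, 42.0, 50.0]  # similar latency, noticeable jitter

if __name__ == "__main__":
    print(f"steady link: {mean_jitter_ms(steady):.2f} ms")
    print(f"uneven link: {mean_jitter_ms(uneven):.2f} ms")
```

The two sample links have similar average delay, which is why jitter has to be measured separately from latency.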
“The solution is to have the ability to mitigate latency, to handle jitter and packet loss”, says Buchanan who advises that this needs to be done intelligently, smartly and without human intervention to minimise the associated costs and risks. “This gives IT executives the freedom of choice as to where they place their data centres – protecting their businesses and the new currency of data”, she adds.
A SCION solution such as WANrockIT offers a way to mitigate the latency issues created when data centres are placed outside of a circle of disruption and at a distance from each other. “From a CIO’s perspective, by using machine intelligence the software learns and makes the right decision in a micro-second according to the state of the network and the flow of the data no matter whether it’s day or night”, Buchanan explains. She also claims that a properly architected SCION can remove the perception of distance as an inhibitor for DR planning.
“At this stage, be cautious, however it does have its place and making sure that there is a solid plan B behind SCION’s plan A, means that SCIONs can take away a lot of uncertainty in existing, more manual approaches”, suggests Longbottom.
One company that has explored the benefits of a SCION solution is CVS Healthcare. “The main thrust was that CVS could not move their data fast enough, so instead of being able to do a 430 GB back-up, they could just manage 50 GB in 12 hours because their data centres were 2,800 miles apart – creating latency of 86 milliseconds. This put their business at risk, due to the distance involved”, explains Buchanan.
Their intermediate solution was to send it offsite to Iron Mountain, but CVS wasn’t happy with this solution as it didn’t meet their recovery requirements. Using their existing 600Mb pipe and WANrockIT at each end of the network, CVS was able to reduce the 50 GB back-up from 12 hours to just 45 minutes irrespective of the data type. Had this been a 10 Gb pipe, the whole process would have taken just 27 seconds. This order-of-magnitude change in performance enabled the company to do full 430 GB back-ups on a nightly basis in just 4 hours. The issues associated with distance and latency were therefore mitigated.
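The figures above translate into effective throughput as follows. This is a back-of-the-envelope sketch under stated assumptions: GB is taken as 10^9 bytes, the 600Mb pipe as 600 megabits per second, and the 86 milliseconds as a round-trip time, none of which the article states explicitly. The bandwidth-delay product at the end is a general networking rule of thumb for how much data must be in flight to keep a link busy, not a description of how WANrockIT works.

```python
LINK_MBPS = 600.0  # ASSUMPTION: "600Mb pipe" means 600 megabits per second
RTT_MS = 86.0      # ASSUMPTION: the quoted 86 ms latency is a round-trip time


def throughput_mbps(gigabytes, seconds):
    """Effective throughput in megabits per second (1 GB taken as 10**9 bytes)."""
    return gigabytes * 1e9 * 8 / seconds / 1e6


def bytes_in_flight(link_mbps, rtt_ms):
    """Bandwidth-delay product: unacknowledged bytes needed to keep the link full."""
    return link_mbps * 1e6 / 8 * (rtt_ms / 1000.0)


if __name__ == "__main__":
    cases = [
        ("50 GB in 12 hours", 50, 12 * 3600),
        ("50 GB in 45 minutes", 50, 45 * 60),
        ("430 GB in 4 hours", 430, 4 * 3600),
    ]
    for label, gigabytes, seconds in cases:
        rate = throughput_mbps(gigabytes, seconds)
        print(f"{label}: {rate:.1f} Mb/s ({rate / LINK_MBPS:.1%} of the link)")

    in_flight_mb = bytes_in_flight(LINK_MBPS, RTT_MS) / 1e6
    print(f"Data in flight needed to fill the link: {in_flight_mb:.2f} MB")
```

On these assumptions the original back-up used under 2% of the available bandwidth, which suggests the bottleneck was latency rather than the capacity of the pipe.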
The technology used within SCION, namely machine intelligence, will have its doubters as does anything new. However, in our world of increasingly available large bandwidth, enormous data volumes and the need for velocity, it’s time to consider what technology can do to help businesses underpin a DR data centre strategy that is based upon the recommendations and best practice guidelines that we have learnt since disasters like Hurricane Sandy.
Despite all mankind’s achievements, Hurricane Sandy taught us many lessons about the extensive destructive and disruptive power of nature. Having wrought devastation across 24 states, it dramatically challenged the traditional perception of a typical circle of disruption in DR planning. Metro-connected sites for failover continuity have to stay because of the requirements of low delta synchronicity, but this is not a sufficient or suitable practice for DR. Sandy has taught us that DR sites must now be located hundreds of miles away if we are to survive.