Analysing cloud as the target for disaster recovery
If we think about the basic functionality that we take advantage of in the cloud, a whole set of possibilities opens up for how we plan for disaster recovery (DR).
Until recently, disaster recovery was an expensive, process-intensive service, reserved for only the most critical of corporate services. In most cases DR has been about large servers in two geographically distributed locations and large data sets being replicated on a timed basis. Many smaller or less critical services were relegated to a backup-and-restore DR solution, although in many cases, as these applications “grew up”, organisations realised that they too needed better protection. Unfortunately, the cost of standing up a legacy-style DR environment remained prohibitive for all but the largest and most critical services.
As the world has moved to virtual data centre (VDC) and cloud based services we have seen the paradigm shift. Many features of private, public and hybrid clouds provide a solid basis for developing and deploying highly resilient applications and services. Here are some of the features that make this possible.
- Lower infrastructure cost. Deployment of services into a cloud environment has been shown to be highly effective in reducing acquisition, upgrade and retirement costs. While an organisation’s “mileage may vary”, planning and choice of appropriate cloud options can provide an environment to protect a much larger range of services.
- Regionalisation. The ability to create a cloud environment based on multiple geographically distinct cloud infrastructure instances. This allows the “cloud” to operate as one while distributing load and risk to multiple infrastructures. These regions can be built in a number of ways to fit the organisation’s requirements.
- Storage virtualisation and cloud based replication. The biggest issue facing any DR solution hasn’t changed just because we live in the cloud; data consistency is and will remain the number one headache for DR planners whether utilising legacy technologies or the cloud.
Fortunately, storage virtualisation and cloud based replication technologies have matured over time in an attempt to keep up with the challenge. Once again, organisations need to understand their options: hypervisor based replication, such as Zerto, which replicates data on a hypervisor-by-hypervisor basis, or storage based replication built on storage virtualisation, such as VPlex and ViPR from EMC.
The key concept in DR planning in the cloud is the creation of an extended logical cloud regardless of the physical location. Let us look at three possible solutions utilising varying cloud deployment models.
Multi-region private cloud
In this option, both the primary and secondary sites, as well as the DevOps and management environments sit within the organisation’s internal environment. The primary and secondary are configured as regions within the organisation’s private cloud.
The biggest benefits of this option are that data is replicated at wide-area-network speed and corporate security policies can remain unmodified. The downside is the need to acquire multiple sets of infrastructure to support the services.
Multi-region hybrid cloud
In this option the services are developed, managed and primarily deployed to an in-house private cloud environment while the secondary site resides within a public cloud provider’s domain. This configuration reduces the need to purchase a secondary set of hardware but also increases the data replication load over the public Internet and the time required to move data to the public cloud.
Multi-region public cloud
In this option both primary and secondary sites reside in the public cloud and depend on the cloud provider’s internal networking and data network services. The service’s management and DevOps still reside within the organisation. This is the lowest cost and most rapid growth option due to low acquisition and update costs as well as providing the most flexibility. The possible downsides to this option are data movement to and from the cloud, and the possible need for adjustment to the organisation's security policies and procedures.
Many aspects of the above solutions need to be considered before beginning a project to use the cloud as a DR target.
There are plenty of items to think about - not least your disaster recovery operations mode, and whether it is active/active or active/passive. Just as with legacy DR solutions, a decision needs to be made about whether the DR resources are active or idle, weighing the cost of leaving a set of resources unused against the benefit of putting them to work. If it is a benefit to reach a wider geographic region, then active/active might be a consideration, although keep in mind that active/active will require two-way replication of data, while active/passive will not require this level of coordination. Networking is also key; DNS, user access and management connectivity to the environment need to be thoroughly planned out.
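The replication difference between the two operations modes can be sketched in a few lines. This is a minimal illustration only: the `DRMode` enum, the `replication_flows` function and the site names are hypothetical and do not correspond to any product mentioned here.

```python
from enum import Enum


class DRMode(Enum):
    ACTIVE_ACTIVE = "active/active"
    ACTIVE_PASSIVE = "active/passive"


def replication_flows(mode: DRMode, primary: str, secondary: str) -> list[tuple[str, str]]:
    """Return the (source, target) replication flows implied by a DR mode."""
    # Every mode needs data to flow from the primary to the secondary site.
    flows = [(primary, secondary)]
    if mode is DRMode.ACTIVE_ACTIVE:
        # Both sites accept writes, so changes must also flow back.
        flows.append((secondary, primary))
    return flows


if __name__ == "__main__":
    for mode in DRMode:
        print(mode.value, replication_flows(mode, "private-region", "public-region"))
```

Even in this toy form, the active/active case doubles the number of flows that have to be monitored and kept consistent, which is the coordination cost mentioned above.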
The biggest concern I have heard from customers is, “How do I enforce the same security standards on a public cloud environment as I have for my in-house environments?” This is an excellent question and one not answered lightly.
In many cases corporate security policies (including perimeter controls, IDAM, logging) can be translated to the public cloud by being a little flexible and a good deal innovative. For example, virtual perimeter firewalls can be implemented and controlled from the same SOC as their physical counterparts. Also, the same IDAM system that is utilised in-house can be modified and then accessed over the net in a public cloud based environment.
Keeping the applications that make up the service in sync across regions requires that when updates are made to the primary virtual machines, virtual machines in the secondary environments are also updated. The implementation of a cloud orchestration tool, such as CSC’s Agility suite, can help a great deal.
One decision point that an organisation needs to come to is between virtualisation of data and replication of data. This carefully considered decision depends on the chosen DR operations mode and the application architecture. Another viable option is for the application to maintain the consistency of the data. The best example of this is directory services (below). Directory services applications are built to maintain data consistency across the multiple controller groups.
It is still true that moving large amounts of data in the public cloud can be a slow and painful process. Unfortunately, most applications will need to have sizeable data sets deployed at some point. I have advised a number of customers to limit the number of large data moves to major version deployments and changes to the underlying structure.
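A rough back-of-the-envelope calculation shows why large data moves deserve this caution. The sketch below is only an estimate under stated assumptions: the function name `transfer_hours`, the default `efficiency` factor of 0.7 and the example figures are illustrative, not measurements from any real deployment.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("data_gb must be >= 0, link_mbps > 0, 0 < efficiency <= 1")
    bits = data_gb * 8e9  # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a 1 TB data set over a 100 Mbit/s Internet link.
    print(f"{transfer_hours(1000, 100):.1f} hours")
```

Under these assumed numbers a single 1 TB move takes well over a day, which is why restricting such moves to major releases is a sensible rule of thumb.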
Even if the number of large data moves is limited, proper data architecture and structure is critical. Data consistency based on the DR mode and data replication strategy – in other words, how soon the service needs to have data consistent across regions – is another aspect that needs to be understood.
The following is a high level diagram that shows a hybrid solution for directory services:
The easy part of this solution is that the domain controllers are built by the software provider to stay synchronised. This reduces the data replication problem to providing a large enough network connection for the transactions.
Fortunately, the “add, change and delete” transactions typical of directory services are very small and even in a high volume environment do not need a very large pipe between private and public clouds. Also, while a physical firewall controls access to the primary private cloud environment, an equivalent virtual firewall is used in the public cloud.
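To make the point about small transactions concrete, the required link size can be estimated with simple arithmetic. This is a hedged sketch: the function `required_kbps`, the `overhead` multiplier for protocol and encryption framing, and the transaction rate and size in the example are all assumptions chosen for illustration, not figures from a directory services vendor.

```python
def required_kbps(tx_per_second: float, avg_tx_bytes: float, overhead: float = 1.5) -> float:
    """Estimate sustained bandwidth, in kilobits per second, for replication traffic.

    overhead is an assumed multiplier covering protocol and encryption framing.
    """
    if tx_per_second < 0 or avg_tx_bytes < 0 or overhead < 1:
        raise ValueError("rates and sizes must be >= 0 and overhead >= 1")
    return tx_per_second * avg_tx_bytes * 8 * overhead / 1000


if __name__ == "__main__":
    # Hypothetical busy directory: 200 changes per second of about 2 KB each.
    print(f"{required_kbps(200, 2000) / 1000:.1f} Mbit/s")
```

With these assumed values the sustained load stays in the low single digits of megabits per second, consistent with the observation that the pipe between private and public clouds need not be large.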