Azure Site Recovery (ASR) is a BCDR (business continuity and disaster recovery) solution that keeps your Azure workloads running during outages. You can configure the service to replicate workloads to a secondary site, also referred to as the DR site, which is a region other than your primary site. For example, if your Azure workload is in the UK South region, you can configure UK West as your secondary region / DR site. You can also replicate your on-premises servers to Azure via ASR and use Azure as your DR site. During an outage, you fail over to the secondary site to keep your application services and servers active, minimizing downtime and disruption for end users.
You can replicate a single server or multiple servers to the secondary site, or create a recovery plan and add the servers protected by ASR to it. A recovery plan defines how machines fail over and the sequence in which they start. For example, you may want the domain controller up and running first, then the SQL server, and then the application servers. Up to 100 protected instances can be added to a recovery plan, and a server can belong to multiple recovery plans.
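To make the ordering idea concrete, here is a minimal Python sketch of a recovery plan as an ordered list of groups. It is only a conceptual model, not the Site Recovery API: the class RecoveryPlan, its methods, the plan name and the servers tp-sql1, tp-app1 and tp-app2 are made up for illustration. The 100-instance limit is the one mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_PROTECTED_INSTANCES = 100  # per-plan limit, as stated in this post


@dataclass
class RecoveryPlan:
    """Conceptual model only: groups fail over one after another, in order."""
    name: str
    groups: list[list[str]] = field(default_factory=list)

    def add_group(self, servers: list[str]) -> None:
        # Reject the group if the plan would exceed the per-plan limit.
        total = sum(len(g) for g in self.groups) + len(servers)
        if total > MAX_PROTECTED_INSTANCES:
            raise ValueError(
                f"plan would hold {total} instances, limit is {MAX_PROTECTED_INSTANCES}"
            )
        self.groups.append(list(servers))

    def failover_order(self) -> list[str]:
        # Flatten the groups: every server in group 1 comes before group 2, and so on.
        return [server for group in self.groups for server in group]


plan = RecoveryPlan("tp-dr-plan")       # hypothetical plan name
plan.add_group(["tp-dc1"])              # domain controller first
plan.add_group(["tp-sql1"])             # hypothetical SQL server second
plan.add_group(["tp-app1", "tp-app2"])  # hypothetical application servers last
print(plan.failover_order())
```

Grouping servers rather than keeping one flat list reflects the idea that servers in the same tier (for example, all application servers) have no ordering requirement among themselves.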
Once you have configured Azure Site Recovery (ASR) by replicating and protecting the servers at the secondary site, it is best practice to run a test failover from time to time (a disaster recovery drill). This confirms that failover works as expected, so that when an outage occurs you can be confident of recovering successfully to the DR site. A test failover has no impact on your existing environment because it is performed in an isolated network that has no communication with the production network. Test failovers are also a good time to record the Recovery Time Objective (RTO) and Recovery Point Objective (RPO), that is, how long the failover takes and how recent the recovery point is, and to check that both align with the business requirements.
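As a rough illustration of what to write down after a drill, the sketch below computes the achieved RPO and RTO from three timestamps. The function name drill_metrics and the example times are invented for illustration; in practice you would take the values from your own drill notes.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def drill_metrics(outage_at: datetime, recovery_point_at: datetime,
                  service_restored_at: datetime) -> tuple[timedelta, timedelta]:
    """Return (achieved RPO, achieved RTO) for one drill."""
    if not recovery_point_at <= outage_at <= service_restored_at:
        raise ValueError("expected recovery point <= outage <= service restored")
    rpo = outage_at - recovery_point_at    # data written in this window is lost
    rto = service_restored_at - outage_at  # how long the service was unavailable
    return rpo, rto


# Example values, invented for illustration only.
rpo, rto = drill_metrics(
    outage_at=datetime(2021, 1, 10, 9, 0, 0),
    recovery_point_at=datetime(2021, 1, 10, 8, 58, 30),
    service_restored_at=datetime(2021, 1, 10, 9, 12, 0),
)
print(f"RPO: {rpo}, RTO: {rto}")
```

Comparing these two numbers against the targets agreed with the business is what tells you whether the DR setup is good enough.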
In this blog post, I will configure ASR for the server tp-dc1 (Windows Server 2019) from one Azure region to another (UK South to UK West).
For demo purposes, we will protect the virtual machine tp-dc1 in UK South (primary site), first perform a test failover to UK West (DR site) in an isolated network, and then initiate a failover to UK West (DR site).
Primary Site (UK South)
Resource Group: tp-uksouth
Virtual Network: vnet-uksouth
Address Space: 10.10.0.0/16
Subnet: LAN 10.10.1.0/24
Virtual Machine: tp-dc1
DR Site (UK West)
Resource Group: tp-ukwest
Virtual Network: vnet-ukwest
Address Space: 10.20.0.0/16
Subnet: LAN 10.20.1.0/24
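Before building the environment, it can help to sanity-check the address plan. The sketch below uses Python's standard ipaddress module with the values listed above. It checks that each LAN subnet sits inside its virtual network and that the two address spaces do not overlap, which matters if the two networks are ever connected to each other. The variable names are my own.

```python
import ipaddress

primary_vnet = ipaddress.ip_network("10.10.0.0/16")  # vnet-uksouth
primary_lan = ipaddress.ip_network("10.10.1.0/24")   # LAN subnet in UK South
dr_vnet = ipaddress.ip_network("10.20.0.0/16")       # vnet-ukwest
dr_lan = ipaddress.ip_network("10.20.1.0/24")        # LAN subnet in UK West

# Each subnet must sit inside its own virtual network's address space.
print("primary LAN inside vnet-uksouth:", primary_lan.subnet_of(primary_vnet))
print("DR LAN inside vnet-ukwest:", dr_lan.subnet_of(dr_vnet))

# The two address spaces should not share any addresses.
print("address spaces overlap:", primary_vnet.overlaps(dr_vnet))
```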
Create Recovery Services Vault
A Recovery Services vault holds backup data for various Azure services such as virtual machines and SQL databases. The same vault can also be used for the Site Recovery service. First we need to create the Recovery Services vault. Make sure to create it in the secondary / DR region. Our DR site is UK West, so we will create the vault in the UK West region.
Search for Recovery Services vaults in the Azure portal and create a Recovery Services vault.
Select Secondary Site / DR Site while creating recovery services vault
Recovery Services Vault
Enable Replication by selecting the options as shown in the screenshot:
Provide the source location and source resource group where your virtual machine(s) exist.
Select the virtual machines you want to protect. In my case, I want to protect the tp-dc1 VM, so that in a disaster (for example, if the UK South region goes down completely) I can fail over tp-dc1 to UK West (secondary site / DR site) and keep my services running with very little downtime. You can enable replication for more servers by selecting them from the list now, or later by using the Replicated items option in the Recovery Services vault. Click Next once you have selected all the servers you want to protect.
Make sure the virtual machine is running before enabling replication. ASR installs an agent on the VM, which requires the VM to be in a running state. If the virtual machine is not running, you will get the error below:
The virtual machine(s) ‘tp-dc1’ is/are not in ‘Running’ state or not provisioned successfully. Ensure that the VM’s power status is ‘Running’ and provisioning state is ‘Succeeded’. You can check the VM status in ‘Virtual machine > Settings > Properties > Status’. Refer to this document (https://aka.ms/a2a-vm-state-issues) to troubleshoot VM provisioning state issues.
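If you script your checks before enabling replication, the two conditions named in this error can be expressed as a small pre-check. This is a sketch only: replication_blockers is a hypothetical helper, and how you obtain the two state strings (portal, CLI or SDK) is left out. The expected values 'Running' and 'Succeeded' come from the error message above.

```python
from __future__ import annotations


def replication_blockers(power_state: str, provisioning_state: str) -> list[str]:
    """Return the reasons a VM is not ready; an empty list means it is ready."""
    blockers = []
    if power_state.strip().lower() != "running":
        blockers.append(f"power state is '{power_state}', expected 'Running'")
    if provisioning_state.strip().lower() != "succeeded":
        blockers.append(
            f"provisioning state is '{provisioning_state}', expected 'Succeeded'"
        )
    return blockers


# Example input strings are assumptions about how the states might be reported.
for reason in replication_blockers("Stopped (deallocated)", "Succeeded"):
    print("tp-dc1 is not ready:", reason)
```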
The Replication settings tab shows the target location, target subscription, target resource group, target virtual network and so on. Verify the configuration on this tab; if you want to change anything, click Customize to update the settings.
Target Resource Group: As you can see in the screenshot below, the ASR service automatically creates the resource group for you. This is the resource group that will be used at the time of failover, and it is created in the secondary site / DR site. You can also create the resource group before enabling replication and provide its name here.
Target Virtual Network: ASR creates a vNet in the DR location that mirrors the source vNet of the virtual machine. You can also create an isolated vNet before enabling replication, then click Customize and provide its name. For simplicity I am using the default configuration. Make sure this virtual network is isolated from the production environment.
If you already have a virtual network in the UK West region that is used by resources in that region, you can instead create a subnet in that vNet to use as the test failover subnet. Make sure the subnet is completely isolated and cannot communicate with any other network. If you are using an NVA (e.g. Palo Alto or F5), you can use a UDR to block all inbound and outbound communication for this subnet. In our example, we are using a separate vNet, and by default a vNet is completely isolated from other vNets.
After you click on Enable Replication, you can see the update from the Notification center.
You can monitor the replication status in the Status column by refreshing the page, or click the replicated server item to see the current replication progress. The server is being replicated from UK South to UK West. The time it takes to complete the replication varies depending on the server, disks and data.
Synchronization in progress. Currently 0% synchronized.
After waiting around 10-15 minutes, I refreshed the page again. Synchronization is now 96% complete.
Waiting for First Recovery Point
Finally, the status shows as Protected. From clicking Enable Replication to the server showing as protected took me around 35 minutes. This was a lightweight Windows Server 2019 machine; it may take more or less time depending on the servers, data, number of disks and so on.
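Instead of refreshing the page by hand, the same wait can be written as a polling loop. The sketch below is generic Python, not an Azure API: wait_until_protected is a hypothetical helper, and get_status stands in for whatever lookup you use to read the replication status. Here it simply replays the states seen in this walkthrough.

```python
import time
from typing import Callable


def wait_until_protected(get_status: Callable[[], str],
                         interval_seconds: float = 60.0,
                         timeout_seconds: float = 3600.0) -> float:
    """Poll get_status until it returns 'Protected'; return elapsed seconds."""
    started = time.monotonic()
    while True:
        status = get_status()
        elapsed = time.monotonic() - started
        print(f"[{elapsed:6.0f}s] {status}")
        if status.strip().lower() == "protected":
            return elapsed
        # Give up if the next poll would land after the timeout.
        if elapsed + interval_seconds > timeout_seconds:
            raise TimeoutError(f"still '{status}' after {elapsed:.0f}s")
        time.sleep(interval_seconds)


# Stand-in for a real status lookup: replays the states seen in this post.
_states = iter([
    "Synchronization in progress. Currently 0% synchronized.",
    "Waiting for First Recovery Point",
    "Protected",
])
wait_until_protected(lambda: next(_states), interval_seconds=0.01)
```

Given that the initial replication took around 35 minutes for one small server, a polling interval of a minute or more and a generous timeout are reasonable starting points.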
The warning in the Failover Health column indicates that a test failover has never been performed or that the last test failover was not successful. If you open the replicated server item, you can check the information about the protected server.
While the last successful test failover state is "Never performed successfully", you will get a warning before performing a failover (invoking DR). It is therefore recommended to perform a test failover from time to time to validate your disaster recovery plan. Update your documentation each time a test failover is performed and keep a note of the RPO and RTO values.
Open the Recovery Services vault -> Replicated items -> click the protected server.
This opens the Overview page shown in the previous screenshot, where you can check the replication health, status, RPO, errors and so on. The three links on the left-hand side provide more information about the replication, let you configure compute and network settings, and show the protected disks. Click each link to explore.
I will focus on the Compute and Network page, where you can configure the failover-related settings such as the vNet and IP address.
Perform Test Failover (Disaster Recovery drill)
Please check the next part of Azure Site Recovery (ASR) series of blog posts to learn more about how to perform Test Failover.