The “easy” way to set up a Nutanix Disaster Recovery site

Nutanix is great for many reasons, and I won’t go into all of them here, but one of my favorite features is asynchronous replication. If your environment is configured correctly, setting up a disaster recovery environment can be super simple.

Let’s start with prerequisites:

  • At least 2 sites running Nutanix
  • Network infrastructure capable of configuring VRFs
  • Virtual IPAM solution, or duplicate IPAM hardware for test/dev
  • Asynchronous replication is already configured to the remote site

Now, let’s make some assumptions. Your corporate network is 10.0.0.0/16, and you have multiple subnets for various things. The only subnets we care about for this scenario are the ones added to networking within Nutanix. Let’s pretend it’s a single subnet, 10.0.1.0/24, on VLAN 101. Your second site can be any site, whether it is dedicated to disaster recovery or is a remote office/branch office (ROBO). Networking for the DR site is irrelevant for now.

The first thing we’re going to do is plan out the DR networking requirements. You have one or more protection domains (PDs) being replicated on a single VLAN. The remote site probably has its own networking. There are a whole bunch of things we could do (VXLAN, for instance), but we’re going to keep this simple. VRFs allow us to create duplicate networks without conflicts on interfaces or in routing tables. You’ll need a single VRF and 1 VLAN assigned to that VRF. I’m going to use Brocade VDX (NOS) in this example.
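
If the “duplicate networks without a conflict” part sounds like magic, here is a rough Python toy model of the idea: each VRF gets its own routing table, so the same prefix can live in two VRFs at once. This is only a sketch to illustrate the concept, not how the switch implements it. The `Router` class and the “WAN link to production” label are made up for illustration, and I’m assuming the production subnet is reachable from the DR site’s default VRF over the WAN.

```python
import ipaddress


class Router:
    """Toy model: one independent routing table per VRF (illustration only)."""

    def __init__(self):
        # VRF name -> {network: next hop or interface label}
        self.tables = {}

    def add_route(self, vrf, prefix, target):
        table = self.tables.setdefault(vrf, {})
        network = ipaddress.ip_network(prefix)
        # A prefix may only appear once per VRF, but other VRFs are unaffected.
        if network in table:
            raise ValueError(f"{prefix} already exists in VRF {vrf}")
        table[network] = target

    def lookup(self, vrf, address):
        """Longest-prefix match inside a single VRF's table."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.tables.get(vrf, {}) if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.tables[vrf][best]


router = Router()
# Hypothetical: production 10.0.1.0/24 as seen from the DR site's default VRF.
router.add_route("default-vrf", "10.0.1.0/24", "WAN link to production")
# The same prefix again, this time inside the DR VRF from this article.
router.add_route("dr-vrf", "10.0.1.0/24", "Ve 1101")
router.add_route("dr-vrf", "0.0.0.0/0", "10.255.255.1")

print(router.lookup("default-vrf", "10.0.1.25"))
print(router.lookup("dr-vrf", "10.0.1.25"))
print(router.lookup("dr-vrf", "8.8.8.8"))
```

The same destination address resolves differently depending on which VRF the packet arrived in, which is exactly why the replicated VMs can keep their production IPs at the DR site.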

First, the VLAN interfaces. Remember how I said we only needed 1? Yeah, well… you could probably get away with 1, but I like to use a /30 for firewalls, so we’ll add that now as well as the WAN VLAN. I’ll explain later. We’ll be making all of these changes to the switching/routing infrastructure at your disaster recovery site.

int vlan 1099
name DR_WAN
int vlan 1100
name DR_FWP2P
int vlan 1101
name DR_SUBNET10_1

Now I’m going to define the VRF. The VDX in my example is running in a VCS fabric. The default gateway will come into play later. Also, we’re enabling OSPF to make things easy. The default gateway for the new VRF will be whatever firewall you use. Virtual firewall, dedicated firewall… whatever you want.

rbr 1
vrf dr-vrf
address-family ipv4 unicast
ip route 0.0.0.0/0 10.255.255.1
router ospf vrf dr-vrf
area 0

Next up, we’re going to set up the routed interfaces. I’m going to assume you use DHCP and have 2 DHCP servers. I actually prefer to use DHCP and DHCP reservations for servers (cattle, not pets; see the DevOps mentality). The IPAM solution I use has great APIs that are leveraged during the automated build process to automatically reserve an IP in a pool of addresses. The WAN VLAN does not require a routed interface; we just need that layer 2 connection.


interface Ve 1100
vrf forwarding dr-vrf
ip ospf area 0
no ip proxy-arp
ip address 10.255.255.2/30
no shutdown
interface Ve 1101
vrf forwarding dr-vrf
ip ospf area 0
ip dhcp relay address 10.0.1.10
ip dhcp relay address 10.0.1.11
no ip proxy-arp
ip address 10.0.1.1/24
no shutdown
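
Before moving on, it can be worth sanity-checking the addressing. The following is a small Python sketch using only the standard `ipaddress` module and the addresses from the config above. It checks that the VRF’s default gateway and both DHCP relay targets sit on a subnet that is directly connected inside dr-vrf, which I’m assuming is what you want for an isolated DR network. The `connected_interface` helper is my own name, not anything from the switch.

```python
import ipaddress

# Interface addresses copied from the Ve configuration above.
ve_interfaces = {
    "Ve 1100": ipaddress.ip_interface("10.255.255.2/30"),
    "Ve 1101": ipaddress.ip_interface("10.0.1.1/24"),
}
default_gateway = ipaddress.ip_address("10.255.255.1")
dhcp_relays = [
    ipaddress.ip_address("10.0.1.10"),
    ipaddress.ip_address("10.0.1.11"),
]


def connected_interface(address):
    """Return the Ve interface whose subnet contains address, or None."""
    for name, iface in ve_interfaces.items():
        if address in iface.network:
            return name
    return None


# A /30 has exactly two usable host addresses: one for the switch, one for the firewall.
p2p_hosts = list(ve_interfaces["Ve 1100"].network.hosts())
print([str(host) for host in p2p_hosts])

print("default gateway via", connected_interface(default_gateway))
for relay in dhcp_relays:
    print("DHCP relay", relay, "via", connected_interface(relay))
```

If any of these came back as `None`, that address would only be reachable through the default route (the firewall), which is probably not what you intended.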

At this point, we have 3 VLANs, two of which have routed interfaces in the VRF. The next step is to add all 3 VLANs (1099, 1100, and 1101) to all of your Nutanix interfaces, and also into Prism networking. I typically use the VLAN name in the switch as the name in Prism for consistency. Once the VLANs are added, you will go into the Protection Domains at both sites and remap the production network to the DR VRF network. Let’s visualize it…
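
As a plain-text summary of the layout so far, here is a short Python sketch that lists the three VLANs and whether each one has a routed interface in the VRF. The VLAN IDs, names, and addresses come from the switch config above; the `Vlan` dataclass and `summarize` function are just illustrative names I’m assuming for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vlan:
    vlan_id: int
    name: str  # reuse the same name in Prism for consistency
    routed_ip: Optional[str]  # Ve address in dr-vrf, or None for layer 2 only


PLAN = [
    Vlan(1099, "DR_WAN", None),
    Vlan(1100, "DR_FWP2P", "10.255.255.2/30"),
    Vlan(1101, "DR_SUBNET10_1", "10.0.1.1/24"),
]


def summarize(plan):
    """Return one human-readable line per VLAN in the plan."""
    lines = []
    for vlan in plan:
        if vlan.routed_ip:
            role = f"Ve {vlan.vlan_id} in dr-vrf, {vlan.routed_ip}"
        else:
            role = "layer 2 only"
        lines.append(f"VLAN {vlan.vlan_id} ({vlan.name}): {role}")
    return lines


for line in summarize(PLAN):
    print(line)
```

All three VLANs get trunked to the Nutanix hosts and added in Prism; only 1100 and 1101 are routed by the switch.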

Now… why the firewall VLAN? To make things REALLY easy, I recommend using a permanent virtual firewall that is always running in your DR environment. Several vendors offer virtual instances now, and many of them will offer discounted rates for non-production environments. This applies to load balancers as well. If you use the same vendor as in production, you can likely back up and restore the config periodically so that the firewall and load balancers are always ready for a DR event. You will need a dedicated internet connection or, at the very least, a spare dedicated IP you can assign to the DR firewall (which would mean reusing a pre-existing WAN VLAN or moving WAN connections to a switched VLAN). You will likely not be able to use your ROBO firewall due to IP and routing conflicts (firewalls are typically not VRF aware), hence a separate virtual firewall. In this case, I’m using VLAN 1099.

WAN connection -> switch port on VLAN 1099 -> VLAN 1099 added to all Nutanix interfaces -> VLAN 1099 assigned to virtual firewall NIC 1 “WAN”
VLAN 1100 added to all Nutanix interfaces -> VLAN 1100 assigned to virtual firewall NIC 2 “LAN”

Configure your firewall appropriately. I assigned 10.255.255.2/30 to the switch, so assign 10.255.255.1/30 to the firewall LAN interface. Assign an appropriate IP to your WAN interface. You have a lot of remote access options here… SSL VPN, IPsec VPN, RemoteApp (if you are a Windows environment), Citrix, etc. Essentially, configure your DR firewall to match however your users typically access your production environment. You can use Amazon Route 53 or DNS Made Easy for DNS failover, or a specific DR DNS record. For example, if production users go to remote.whateveryourdomainis.com, then DR would be remote-dr.whateveryourdomainis.com. The rest is user education.
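
If you go the separate DR record route, the “user education” part can be backed up with a little client-side logic. Here is a minimal Python sketch, assuming the placeholder hostnames above and assuming remote access listens on TCP 443 (adjust for your VPN or gateway). It is not a replacement for a real DNS failover service; the function names are my own.

```python
import socket

# Placeholder hostnames from the example above.
PRODUCTION_HOST = "remote.whateveryourdomainis.com"
DR_HOST = "remote-dr.whateveryourdomainis.com"


def tcp_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 443 is an assumption; use whatever your remote access actually listens on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def pick_endpoint(is_reachable=tcp_reachable):
    """Prefer production; fall back to the DR hostname when it is unreachable."""
    if is_reachable(PRODUCTION_HOST):
        return PRODUCTION_HOST
    return DR_HOST
```

Calling `pick_endpoint()` with no arguments performs a live check. Passing your own function, for example `pick_endpoint(lambda host: False)`, lets you exercise the fallback path without touching the network.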

So, to recap, we have our PDs mapped to our new VRF network and a virtual firewall that mimics our production firewall, with its own dedicated IP. At this point you can activate the PD on the ROBO site. All of the VMs will get added to Prism… double-check the VLAN assignment if you wish. Power everything up. Your self-contained DR is now ready to go. If your team is compartmentalized (network admins, server admins, Nutanix admins, etc.), this may be more difficult to accomplish, as it requires a great deal of teamwork. However, I highly recommend this route, as it is extremely easy to set up, test, and run. When you’re done testing, shut everything down and deactivate the PD.

If you have a DMZ in addition to a production network, you can create a second VRF or add the DMZ network to the same VRF as production. The latter would obviously remove security constraints, but in a DR scenario… what do you want to be troubleshooting? ACLs and multiple VRFs? Or would you rather focus on restoring access to end users? Every environment is unique. Some environments will require mirrored security constraints. Others will not, and for those I suggest dumping ALL VLANs into a single DR VRF for simplicity.

Side note: Fortinet, in my opinion, has an amazing product line and a UI/UX similar to what I’ve come to appreciate about Nutanix. We use them pretty heavily at the office, so they provided us with a virtual FortiGate and virtual FortiADC (load balancer) for practically nothing. It took about 5 minutes to spin them both up. I highly recommend looking at their products. As an alternative, their larger hardware firewalls support virtual domains (think: virtualized firewalls, or VRF but for firewalls). If your company is budget-minded, you can place your DR and production firewalls on the same hardware. I’m sure other vendors are capable of this, but I’ve found that Fortinet makes it super easy.

