The “easy” way to set up a Nutanix Disaster Recovery site

Nutanix is great for many reasons, and I won’t go into all of them here, but one of my favorite features is asynchronous replication. If your environment is configured correctly, setting up a disaster recovery environment can be super simple.

Let’s start with prerequisites:

  • At least 2 sites running Nutanix
  • Network infrastructure capable of configuring VRFs
  • Virtual IPAM solution, or duplicate IPAM hardware for test/dev
  • Asynchronous replication is already configured to the remote site

Now, let’s make some assumptions. Your corporate network is 10.0.0.0/16, and you have multiple subnets for various things. The only subnets we care about for this scenario are the ones added to networking within Nutanix. Let’s pretend it’s a single subnet, 10.0.1.0/24, on VLAN 101. Your second site can be anything: dedicated to disaster recovery, or a ROBO site. Networking for the DR site is irrelevant for now.
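As a quick sanity check on that addressing assumption, here’s a short Python sketch (standard `ipaddress` module, addresses taken from this example) confirming the production subnet actually lives inside the corporate range:

```python
import ipaddress

corporate = ipaddress.ip_network("10.0.0.0/16")
production = ipaddress.ip_network("10.0.1.0/24")  # VLAN 101 in this example

# The production subnet must be carved out of the corporate range.
print(production.subnet_of(corporate))  # True
print(production.num_addresses - 2)     # 254 usable hosts
```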

The first thing we’re going to do is plan out the DR networking requirements. You have 1 or more PDs being replicated on a single VLAN. The remote site probably has its own networking. There are a whole bunch of things we could probably do (VXLAN, for instance), but we’re going to keep this simple. VRFs allow us to create duplicate networks without conflicts on interfaces or in routing tables. You’ll need a single VRF and 1 VLAN assigned to that VRF. I’m going to use a Brocade VDX (NOS) in this example.

First, the VLAN interfaces. Remember how I said we only needed one? Yeah, well… you could probably get away with one, but I like to use a /30 for firewall point-to-point links, so we’ll add that now, as well as the WAN VLAN. I’ll explain later. We’ll be making all of these changes to the switching/routing infrastructure at your disaster recovery site.

int vlan 1099
name DR_WAN
int vlan 1100
name DR_FWP2P
int vlan 1101
name DR_SUBNET10_1

Now I’m going to define the VRF. The VDX in my example is running in a VCS fabric. We’re also enabling OSPF to make things easy. The default gateway for the new VRF will be whatever firewall you use: a virtual firewall, a dedicated firewall, whatever you want. It will come into play later.

rbr 1
vrf dr-vrf
address-family ipv4 unicast
ip route 0.0.0.0/0 10.255.255.1
router ospf vrf dr-vrf
area 0

Next up, we’re going to set up the router interfaces. I’m going to assume you use DHCP and have two DHCP servers. I actually prefer to use DHCP and DHCP reservations for servers (cattle, not pets; see the DevOps mentality). The IPAM solution I use has great APIs that are leveraged during the automated build process to automatically reserve an IP in a pool of addresses. The WAN VLAN does not require a routed interface; we just need that layer 2 connection.
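My IPAM’s API is specific to that product, but the reservation logic boils down to “hand me the next free address in this pool.” Here’s a minimal Python sketch of that idea; the pool and already-taken addresses are assumptions matching this example, and a real build pipeline would make this call against your IPAM’s HTTP API instead:

```python
import ipaddress

def next_free_ip(pool, reserved):
    """Return the next unreserved host address in a pool.

    Mimics what an IPAM reservation API does server-side; in a real
    build pipeline this would be an HTTP call to your IPAM product.
    """
    for host in ipaddress.ip_network(pool).hosts():
        if str(host) not in reserved:
            return str(host)
    raise RuntimeError("pool %s exhausted" % pool)

# The gateway and both DHCP servers from this example are already taken.
taken = {"10.0.1.1", "10.0.1.10", "10.0.1.11"}
print(next_free_ip("10.0.1.0/24", taken))  # 10.0.1.2
```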


interface Ve 1100
vrf forwarding dr-vrf
ip ospf area 0
no ip proxy-arp
ip address 10.255.255.2/30
no shutdown
interface Ve 1101
vrf forwarding dr-vrf
ip ospf area 0
ip dhcp relay address 10.0.1.10
ip dhcp relay address 10.0.1.11
no ip proxy-arp
ip address 10.0.1.1/24
no shutdown

At this point, we have three VLANs in a VRF with two routed interfaces. The next step is to add all three VLANs (1099, 1100, and 1101) to all of your Nutanix interfaces, and also into Prism networking. I typically use the VLAN name from the switch as the name in Prism for consistency. Once the VLANs are added, go into the Protection Domains at both sites and remap the production network to the DR VRF network. Let’s visualize it…

Now… why the firewall VLAN? To make things REALLY easy, I recommend running a permanent virtual firewall in your DR environment. Several vendors offer virtual instances now, and many offer discounted rates for non-production environments. The same applies to load balancers. If you use the same vendor in production and DR, you can likely back up and restore the config periodically so that the firewall and load balancers are always ready for a DR event. You will need a dedicated internet connection, or at the very least a spare dedicated IP you can assign to the DR firewall (which means reusing a pre-existing WAN VLAN or moving the WAN connection to a switched VLAN). You likely cannot reuse your ROBO firewall due to IP and routing conflicts (most firewalls are not VRF aware), hence the separate virtual firewall. In this case, I’m using VLAN 1099.

WAN connection -> switch port on VLAN 1099 -> VLAN 1099 added to all Nutanix interfaces -> VLAN 1099 assigned to virtual firewall NIC 1 “WAN”
VLAN 1100 added to all Nutanix interfaces -> VLAN 1100 assigned to virtual firewall NIC 2 “LAN”

Configure your firewall appropriately. I assigned 10.255.255.2/30 to the switch, so assign 10.255.255.1/30 to the firewall LAN interface, and an appropriate IP to the WAN interface. You have a lot of remote access options here: SSL VPN, IPsec VPN, RemoteApp (if you run a Windows environment), Citrix, etc. Essentially, however your users typically access your production environment is how you want to configure your DR firewall. You can use Amazon’s Route 53 or DNS Made Easy for DNS failover, or a dedicated DR DNS record. For example, if production users go to remote.whateveryourdomainis.com, then DR would be remote-dr.whateveryourdomainis.com. The rest is user education.
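Since a /30 leaves exactly two usable host addresses, the switch SVI and the firewall LAN interface consume the whole subnet between them. A quick check with Python’s `ipaddress` module:

```python
import ipaddress

p2p = ipaddress.ip_network("10.255.255.0/30")
hosts = [str(h) for h in p2p.hosts()]

# Exactly two usable addresses: .1 for the firewall LAN, .2 for the switch SVI.
print(hosts)  # ['10.255.255.1', '10.255.255.2']
```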

So, to recap: we have our PDs mapped to our new VRF network, and a virtual firewall that mimics our production firewall, with its own dedicated IP. At this point you can activate the PD on the remote site. All of the VMs will get added to Prism; double-check the VLAN assignments if you wish. Power everything up. Your self-contained DR is now ready to go. If your team is compartmentalized (network admins, server admins, Nutanix admins, etc.), this may be more difficult to accomplish, as it requires a great deal of teamwork. However, I highly recommend this route because it is extremely easy to set up, test, and run. When you’re done testing, shut everything down and deactivate the PD.

If you have a DMZ in addition to a production network, you can create a second VRF or add the DMZ network to the same VRF as production. Collapsing them obviously removes security constraints, but in a DR scenario, what do you want to be troubleshooting: ACLs and multiple VRFs, or restoring access to end users? Every environment is unique; some will require mirrored security constraints. Others will not, and for those I suggest dumping ALL VLANs into a single DR VRF for simplicity.

Side note: Fortinet, in my opinion, has an amazing product line and a UI/UX similar to what I’ve come to appreciate about Nutanix. We use them pretty heavily at the office, so they provided us with a virtual FortiGate and virtual FortiADC (load balancer) for practically nothing. It took about five minutes to spin them both up. I highly recommend looking at their products. As an alternative, their larger hardware firewalls support virtual domains (think: virtualized firewalls, or VRF but for firewalls). If your company is budget-minded, you can place your DR and production firewalls on the same hardware. I’m sure other vendors are capable of this, but I’ve found that Fortinet makes it super easy.


Lab in a Box

If you are on a budget but have a Cisco PIX 515, a Cisco layer-3 switch (I’m using a 3550), and an HP DL/ML 3-series server, you can create an entire lab with just these three devices. Obviously, it doesn’t have to be Cisco or HP… as long as the firewall supports trunking and VLAN subinterfaces, the switch supports VRF routing, and the server supports trunking/VLANs, you should be able to adapt this to any setup.

Let’s start with the core switch. Here is the relevant config from the 3550 I’m using:

ip vrf INET
rd 2600:2
route-target export 2600:2
ip vrf NET1
rd 2600:3
route-target export 2600:3
ip vrf NET2
rd 2600:4
route-target export 2600:4

interface FastEthernet0/1
description Trunk to HP Server
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2-4
switchport mode trunk
no ip address
spanning-tree portfast

interface FastEthernet0/10
description Uplink to PIX Outside
switchport access vlan 2
switchport mode access
no ip address
spanning-tree portfast

interface FastEthernet0/11
description Trunk to PIX Inside
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 3-5
switchport mode trunk
no ip address
spanning-tree portfast

interface Vlan2
description LAB-INET
ip vrf forwarding INET
ip address 1.1.1.1 255.255.255.0

interface Vlan3
description LAB-NET1
ip vrf forwarding NET1
ip address 192.168.10.254 255.255.255.0

interface Vlan4
description LAB-NET2
ip vrf forwarding NET2
ip address 192.168.20.254 255.255.255.0

ip route vrf NET1 0.0.0.0 0.0.0.0 192.168.10.1
ip route vrf NET2 0.0.0.0 0.0.0.0 192.168.20.1

Here is the relevant config on the Cisco PIX:

interface Ethernet0
nameif outside
security-level 0
ip address 1.1.1.2 255.255.255.0

interface Ethernet1
no nameif
security-level 100
no ip address

interface Ethernet1.10
vlan 3
nameif inside-net1
security-level 100
ip address 192.168.10.1 255.255.255.0

interface Ethernet1.20
vlan 4
nameif inside-net2
security-level 100
ip address 192.168.20.1 255.255.255.0

access-list OUTSIDE_IN extended permit ip any any

global (outside) 6 1.1.1.4
global (outside) 7 1.1.1.5

nat (inside-net1) 6 192.168.10.0 255.255.255.0
nat (inside-net2) 7 192.168.20.0 255.255.255.0

access-group OUTSIDE_IN in interface outside

route outside 0.0.0.0 0.0.0.0 1.1.1.1 1

On the HP server, configure the trunk interface to carry VLAN 1, VLAN 2, VLAN 3, and VLAN 4 (name the interfaces appropriately and assign them IP addresses). I used the following IPs:

vlan 1 (n/a)
vlan 2 1.1.1.3
vlan 3 192.168.10.2
vlan 4 192.168.20.2
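Before wiring anything up, it’s worth sanity-checking the lab addressing plan. This Python sketch (values taken from the configs above) verifies that each router-side IP and server IP lands in the right subnet, and that the two inside networks don’t overlap, which is what lets NAT give each one its own global IP:

```python
import ipaddress

# Lab addressing plan from the configs above:
# (network, router-side IP, server IP on the HP box)
plan = {
    "vlan2": ("1.1.1.0/24", "1.1.1.1", "1.1.1.3"),
    "vlan3": ("192.168.10.0/24", "192.168.10.1", "192.168.10.2"),
    "vlan4": ("192.168.20.0/24", "192.168.20.1", "192.168.20.2"),
}

nets = []
for name, (net, router, server) in plan.items():
    network = ipaddress.ip_network(net)
    assert ipaddress.ip_address(router) in network, f"{name}: router off-subnet"
    assert ipaddress.ip_address(server) in network, f"{name}: server off-subnet"
    nets.append(network)

# The two inside networks must not overlap, or NAT/routing breaks.
assert not nets[1].overlaps(nets[2])
print("addressing plan checks out")
```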

Then, install VMware Server (free) on the HP server. Configure the VM networks to be bridged to VLANs 2, 3, and 4.

Provision a virtual server on each interface and assign a corresponding bridged network.

You now have an “internet” server and two “private” servers behind NAT. On the “internet” server, set up DNS and point the other servers at it for DNS.

To test that I had NAT and firewall working properly, I installed IIS on each server and configured a host header and the appropriate DNS A records on the “internet” server.

I set each website to use index.asp (enabled ASP first) and used the following code:

<html>
<head>
<title>Teh Interwebs</title>
</head>
<body>
Welcome to teh interwebs.
Your IP Address = <%=Request.ServerVariables("REMOTE_ADDR")%>
</body>
</html>

You should be able to hit each website and have the correct “WAN” IP address display on each website. If you can successfully hit the “internet” from each server, and each server from the “internet” then you have a working setup. You can now dcpromo, install Exchange… do whatever it is that you want to test. Modify my setup slightly, and you can test DMZ configurations, among other things.