VMware View Failover Automation – Solved!

Last week I spent some time talking to a partner, ByteLife, about a solution they’ve created for a customer. The customer needed an automated failover solution for their VMware View environment, and this is the story of how they solved it.

[Figure: VMware View diagram]

First, some background on what the customer saw as their problem. Customers using VMware View sometimes struggle to provide fault tolerance in case of a site disaster. Even though VMware has a solution, Site Recovery Manager (SRM), for failing over virtual machines to a secondary site (if you are lucky enough to have one), it does not support virtual desktop infrastructures (see here for the Compatibility Matrix: no SRM support for View).

As a result, VMware View is often installed either separately in both datacenters, guaranteeing that at least half of the desktops survive a site failure, or as a single environment that could go down entirely during a large outage.

Customers whose business and workers usually don’t like losing access to their applications and desktops during a site failure can choose a more complex setup and run specific manual failover tasks when a site fails. The good thing is that this is possible using solutions such as this from VMware or this from EMC. On the other hand, during a site failover IT personnel are already under tremendous load and pressure to bring the site or its services back online, and any additional service to worry about just adds to the complexity of the crisis. An automated failover that can be initiated with a few clicks in the remaining datacenter frees up the IT staff’s time when they need it the most.

So for this specific customer, ByteLife has developed a solution called “VMware View Failover Automation” with the following key features:

  • Fail over desktop pools and virtual machines in case of a site crash
  • Migrate desktop pools and virtual machines during maintenance, tests, and load rebalancing between sites, or as failback after a disaster
  • Restore storage synchronization between datacenters after the outage
  • Integrate with the vSphere Web Client

But wait, there’s more!

For this, all you need is vCenter Orchestrator, no SRM. Yes, you read that right, no SRM. What’s even cooler is that you can use this across several sites; you’re not limited to just two. Imagine that: being able to fail over any VMware View site, without SRM, within minutes.

Failover of the VMware View environment takes only minutes, depending on the number and nature of the desktops and the components that are failed over. It’s been proven that the first users can resume their work in the new site less than 5 minutes after the failover is initiated, which I find pretty amazing compared to the other solutions I’ve seen on this subject. So how does this work, and what makes it so fast?

View Pools

Looking at the picture above, let’s assume your current linked-clone pool is called *-A-Pool and the pool that’ll be used in a failover scenario is called *-A-Pool-Recovery. The pools are exactly the same and use the same VM as base image, and some VMs are already pre-provisioned. So when failing over, all that’s done is registering the users to the *-A-Pool-Recovery pool and removing them from *-A-Pool, and then they can reconnect. Same desktop Pool ID, same everything, so it’s fully transparent to the users. Some other settings are automated as well, like the maximum number of desktops per pool. All pools are enabled all the time, so that changes and operations like recompose can be made on all pools to keep a consistent image version across the entire environment. All automated, and seeing it live is really impressive.
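To make the linked-clone case a bit more concrete, here is a minimal Python sketch of the entitlement move. It is not ByteLife's code (their solution is built from vCO workflows); the `Pool` class and `fail_over_linked_clone_pool` function are hypothetical in-memory stand-ins that I made up for illustration, and no real View API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical in-memory stand-in for a View linked-clone pool."""
    name: str
    max_desktops: int
    entitled_users: set = field(default_factory=set)


def fail_over_linked_clone_pool(primary: Pool, recovery: Pool) -> None:
    """Move every user entitlement from the primary pool to its recovery twin."""
    moving = set(primary.entitled_users)

    # Raise the recovery pool's desktop limit if it is too small to hold
    # its existing users plus the ones arriving from the primary pool.
    needed = len(recovery.entitled_users | moving)
    recovery.max_desktops = max(recovery.max_desktops, needed)

    # Register users on the recovery pool first, then remove them from the
    # primary, so nobody is left without an entitlement part-way through.
    recovery.entitled_users |= moving
    primary.entitled_users -= moving


if __name__ == "__main__":
    a_pool = Pool("Sales-A-Pool", max_desktops=10, entitled_users={"alice", "bob"})
    a_recovery = Pool("Sales-A-Pool-Recovery", max_desktops=1)
    fail_over_linked_clone_pool(a_pool, a_recovery)
    print(sorted(a_recovery.entitled_users), a_recovery.max_desktops)
```

Because the recovery pool already has pre-provisioned desktops, the only work at failover time is this bookkeeping, which is presumably why the first users are back so quickly.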

But what about the manual pools? Well, they’re handled a bit differently. In case of a failure, the vCO workflow shuts down all manual VMs (if they are still reachable and running) and removes them from the vCenter inventory. The datastores are unmounted, the replication direction of the datastore is switched, and the now-primary datastore is attached to the secondary site. The VMs are then re-added to the inventory and powered on, and the manual pool is modified in the AD LDS database to move it from A to B. And of course, all user assignments are preserved. All automated, frickin awesome IMHO!
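The order of operations matters for manual pools, so here is a small Python sketch that only builds the ordered step list described above. Again, this is my own illustration, not the actual vCO workflow: `manual_pool_failover_plan` and its parameters are hypothetical, and the steps are plain strings rather than calls to vCenter, the storage array, or AD LDS.

```python
def manual_pool_failover_plan(vms, source_reachable, source="A", target="B"):
    """Return the ordered failover steps for a manual pool as plain strings."""
    plan = []

    # VMs can only be shut down cleanly if the failed site still responds.
    if source_reachable:
        plan += [f"shut down {vm}" for vm in vms]

    plan += [f"remove {vm} from vCenter inventory" for vm in vms]
    plan.append(f"unmount datastores at site {source}")
    plan.append("reverse datastore replication direction")
    plan.append(f"attach promoted datastore at site {target}")
    plan += [f"re-add {vm} to vCenter inventory" for vm in vms]
    plan += [f"power on {vm}" for vm in vms]

    # The pool definition moves last; user assignments are left untouched.
    plan.append(f"update manual pool in AD LDS: site {source} -> site {target}")
    return plan


if __name__ == "__main__":
    for number, step in enumerate(
        manual_pool_failover_plan(["desk-01", "desk-02"], source_reachable=False), 1
    ):
        print(number, step)
```

Keeping the plan as data makes it easy to review or log before anything is executed, which is handy when the same sequence is also used for planned migrations.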

[Figure: VMware View vCenter Orchestrator Failover Workflow]

As this is based on vCO workflows, there’s no hardcoded input for pools or available sites; everything is collected using the Status Report, Migration, Failover and Restore Synchronization workflows. The vCO workflows only list the pools and sites that actually have entitlements and are active. Everything else is hidden, meaning you can focus on getting your stuff up and running quickly instead of having to trawl through all the possible environments that *might* be used.
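The filtering idea is simple enough to sketch in a few lines of Python. The `PoolInfo` record and the two helper functions below are hypothetical names of mine, and the data is made up; the real workflows collect this information from the environment themselves.

```python
from typing import NamedTuple


class PoolInfo(NamedTuple):
    """Hypothetical summary of a pool as a workflow might see it."""
    name: str
    site: str
    enabled: bool
    entitlements: int


def selectable_pools(pools):
    """Keep only pools that are active and have at least one entitlement."""
    return [p for p in pools if p.enabled and p.entitlements > 0]


def selectable_sites(pools):
    """Sites that host at least one selectable pool, in sorted order."""
    return sorted({p.site for p in selectable_pools(pools)})


if __name__ == "__main__":
    inventory = [
        PoolInfo("Sales-A-Pool", "A", True, 40),
        PoolInfo("Test-A-Pool", "A", True, 0),   # no entitlements: hidden
        PoolInfo("Old-C-Pool", "C", False, 12),  # not active: hidden
        PoolInfo("Dev-B-Pool", "B", True, 5),
    ]
    print([p.name for p in selectable_pools(inventory)])
    print(selectable_sites(inventory))
```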

So, this can be used for failover, but also for planned migration of VMs from one site to another, for instance if you want to balance the workload between sites.

Another cool feature that came up during the discussion is that you could use this for recomposing large environments with very little downtime. Let’s say you’re currently using *-A-Pool as in our previous example: you could recompose the virtual desktops in *-A-Pool-Recovery and just migrate your users over there. Instead of recomposing all existing VMs, you’d move your users to already recomposed images with fresh patches and everything installed. How cool is that?!

I found it very refreshing to see a totally new take on the failover methods for VMware View environments, and I’m certain it would benefit your environment.

And lastly, some technical info:
The solution is based on VMware vCenter Orchestrator workflows. The current version of VMware View Failover Automation supports VMware View 5.1 and up, and EMC VNX with MirrorView. The network latency between the two sites must not exceed 5 ms.
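If you want to sanity-check an environment against these requirements, something like the following Python sketch would do. The function names are hypothetical and the values are passed in by hand; measuring the latency and reading the View version are left out, and the storage requirement (EMC VNX with MirrorView) is not checked here.

```python
MAX_LATENCY_MS = 5.0        # "must not exceed 5 ms"
MIN_VIEW_VERSION = (5, 1)   # "VMware View 5.1 and up"


def parse_version(text):
    """Turn a dotted version string such as '5.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_prerequisites(view_version, latency_ms):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if parse_version(view_version) < MIN_VIEW_VERSION:
        problems.append(f"View {view_version} is older than 5.1")
    if latency_ms > MAX_LATENCY_MS:
        problems.append(f"inter-site latency {latency_ms} ms exceeds 5 ms")
    return problems


if __name__ == "__main__":
    print(check_prerequisites("5.0", 7.5))
```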

Contact info for the solution:
Alar Kuuda (Project Manager) – alar.kuuda@bytelife.com
+372 5097873

About Jonas Rosland

Open Source Community Manager at {code} by Dell EMC

2 Responses to VMware View Failover Automation – Solved!

  1. Joerg Lew says:

    Cool stuff! (And a very nice burning datacenter in the first picture…) Keep Orchestrating!

    • Jonas Rosland says:

      Thanks! ByteLife did a great job with this solution, and I’ll let them know 🙂
      Happy Automating and Orchestrating!
