Ensuring high availability for cloud-based banking applications

It’s tempting to think that a cloud service provider will ensure the high availability of your critical cloud-based banking applications. The problem is that it doesn’t, at least not on its own.

Todd Doane, Solutions Architect, SIOS Technology

Your cloud provider may have helped you configure a cluster of virtual machines (VMs) running out of multiple data centers or availability zones (AZs). It may have implemented an automated failover system to ensure that a standby VM in the configuration can take over immediately if the primary VM suddenly goes offline. It all sounds like it should deliver high availability, right?

But look closely at the service level agreement (SLA) outlining high availability: The SLA guarantees that at least one of the VMs in your system will be accessible at least 99.9% or even 99.99% of the time. But that’s not a guarantee of application or data availability. If the remaining VM can’t access the storage infrastructure where your banking applications and data reside, your critical applications are effectively offline.
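To put those SLA percentages in perspective, the short Python sketch below converts an availability figure into the downtime it permits over a year. The function name and the 365-day year are assumptions made for illustration; your provider’s SLA defines its own measurement period and exclusions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the minutes per year a service may be down at this availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for sla in (0.999, 0.9999):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla:.2%} availability allows about {minutes:.1f} minutes of downtime per year")
```

That works out to roughly 8.8 hours a year at 99.9% and a little under an hour at 99.99%. Those figures cover only VM accessibility, not the availability of your applications or data.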

Ensuring cloud accessibility

How can you ensure that your critical banking applications and data remain highly accessible in the cloud or in a hybrid on-prem/cloud configuration, if configuring the underlying technology for automated failover across multiple AZs is insufficient?

Let’s start by saying that having clustered VMs spread among multiple AZs is critical to ensuring the high availability (HA) of your key applications and data. What you need in addition, though, is a strategy for ensuring that each of those VMs has access to the critical applications and data you want to keep running. That’s where traditional approaches to HA diverge when it comes to the cloud.

In a traditional (meaning on-premises) HA configuration, you might create a failover cluster consisting of multiple servers or VMs and a storage area network (SAN), where your applications and data reside. Any server or VM in the cluster could interact with the applications and data in the SAN. If the VM actively running a key application suddenly went offline, the cluster would automatically fail over to another VM, which could connect to the SAN, start running the application, and update the same database that the previous machine had been using.

Configuring for the cloud

In the cloud, though, there’s no real option to create a shared SAN. There are some shared storage options, but they’re not built to provide the performance or levels of HA your critical banking applications require. Instead, cloud-based HA configurations depend on high-performance storage attached to each of the VMs in the cluster. When a given VM is running an application, it is interacting with data stored in a database that resides in the storage attached to that VM.

The key to HA for cloud-based banking applications, then, is to ensure that each VM in your cluster always has the same applications and the same data. That way, if the primary VM in the cluster suddenly goes dark, the cluster can automatically fail over to one of the standby VMs, any of which can begin running the application and interacting with the data immediately because a copy of the application and data resides in its own attached storage.
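The idea can be sketched in a few lines of Python. This is a toy model, not any vendor’s failover logic: the `Node` class, the `fail_over` function, and the VM and zone names are all hypothetical, and a real cluster would also handle quorum, health checks, and client redirection.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node with its own locally attached storage."""
    name: str
    zone: str
    healthy: bool = True
    storage: dict = field(default_factory=dict)  # this node's local copy of the data


def fail_over(nodes, primary_name):
    """Return the first healthy standby, or None if no node can take over."""
    for node in nodes:
        if node.name != primary_name and node.healthy:
            return node
    return None


# Three nodes in three zones, each holding the same (replicated) data.
nodes = [
    Node("vm-a", "az-1", storage={"balance:1001": 250}),
    Node("vm-b", "az-2", storage={"balance:1001": 250}),
    Node("vm-c", "az-3", storage={"balance:1001": 250}),
]

nodes[0].healthy = False                 # the primary goes dark
new_primary = fail_over(nodes, "vm-a")   # a standby takes over
print(new_primary.name, new_primary.storage["balance:1001"])
```

The standby can serve the account balance immediately only because its own storage already holds an identical copy. Without that copy, the failover would succeed and the application would still be unavailable.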

Your cloud provider can easily configure the VMs that will provide the levels of performance and availability that your critical applications demand. It can also attach high-performance storage systems to those VMs, and it can configure your cluster for automatic failover across multiple AZs. What remains is for you to deploy a mechanism that automates the synchronous replication of data among all the storage systems attached to the VMs in your failover cluster.
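What makes replication synchronous is that a write is not acknowledged to the application until every copy has it. The Python sketch below illustrates that rule under simplifying assumptions: the `Replica` class and `synchronous_write` function are hypothetical names, storage is modeled as an in-memory dictionary, and an unreachable standby simply raises an error rather than triggering the recovery logic a real product would need.

```python
class Replica:
    """Hypothetical stand-in for the storage attached to one VM."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def apply(self, block_id, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[block_id] = data


def synchronous_write(primary, standbys, block_id, data):
    """Acknowledge the write only after the primary and every standby hold it."""
    primary.apply(block_id, data)
    for standby in standbys:
        standby.apply(block_id, data)  # raises if a standby cannot confirm
    return True  # the acknowledgement sent back to the application


primary = Replica("vm-a")
standbys = [Replica("vm-b"), Replica("vm-c")]
acknowledged = synchronous_write(primary, standbys, block_id=7, data=b"deposit:100")
print(acknowledged, all(s.blocks == primary.blocks for s in standbys))
```

Because the acknowledgement comes last, any write the application believes is complete already exists on every standby, which is what allows a failover without data loss.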

Data replication solutions

You have a number of choices when it comes to data replication solutions.

If your cluster is based on Windows and you’re using Microsoft SQL Server, you can use SQL Server’s built-in Availability Groups (AGs) feature, which will automatically replicate the SQL Server databases you specify to each of the nodes in your cluster. The downside of this approach is that it replicates only SQL Server databases, rather than every block of data in storage. Replicating multiple SQL Server databases to multiple standby VMs can also get very expensive, because you’ll have to use SQL Server Enterprise Edition to replicate more than one database or to replicate databases to multiple VMs, even if your applications run perfectly well on SQL Server Standard Edition.

Alternatively, you could use a SANless clustering solution, which provides automated block-level replication of data from the active primary VM to each of the secondary VMs in a cluster. The advantage of using a SANless clustering solution is that it is application and database agnostic; it simply replicates blocks of data from one storage system to another, ensuring that all the data in your primary storage system is replicated to each of the other VMs. The downside to a SANless clustering approach is that there’s yet another piece of software for your IT team to license and learn, which may feel onerous if you can use the AG functionality of SQL Server at no additional cost.
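To see why block-level replication is application agnostic, consider the Python sketch below. It treats storage as raw bytes and copies any block that differs, with no knowledge of what the bytes mean. The function names and the four-byte block size are assumptions chosen for readability, and comparing two volumes this way is closer to a resync pass than to the continuous write interception a real product performs.

```python
BLOCK_SIZE = 4  # assumption: a tiny block size so the example is easy to follow


def changed_blocks(source: bytes, target: bytes, block_size: int = BLOCK_SIZE):
    """Yield (offset, data) for each source block that differs from the target."""
    for offset in range(0, len(source), block_size):
        block = source[offset:offset + block_size]
        if block != target[offset:offset + block_size]:
            yield offset, block


def replicate(source: bytes, target: bytearray, block_size: int = BLOCK_SIZE) -> int:
    """Copy differing blocks into the target; assumes equal-sized volumes."""
    copied = 0
    for offset, block in changed_blocks(source, bytes(target), block_size):
        target[offset:offset + len(block)] = block
        copied += 1
    return copied


primary_volume = bytearray(b"ACCT1001BAL00250")
standby_volume = bytearray(b"ACCT1001BAL00100")

copied = replicate(bytes(primary_volume), standby_volume)
print(copied, standby_volume == primary_volume)
```

Nothing in the code knows whether the bytes belong to a database, a configuration file, or an application binary, which is the property that lets this approach cover everything in storage.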

Data replication is the key to ensuring HA for cloud-based banking systems, whether you use the functionality built into a solution like SQL Server or the functionality provided by an independent SANless clustering solution.

Your cloud provider can provide the high-performance infrastructure that your applications demand, but you must ensure that the data and applications available to each of the VMs in that cluster are up to date if your HA solution is going to perform as expected when you need it to do so.

Todd Doane is a Solutions Architect at SIOS Technology. He has spent more than 20 years, primarily in the financial services world, creating high availability reference architectures and application-specific design patterns and principles.