How to Improve App Reliability on a Budget

How do you optimize the reliability of your applications and infrastructure?

The conventional answer is to deploy a range of tools, teams, and roles, none of which are cheap. You can hire site reliability engineers (SREs) who specialize in optimizing availability and performance, but SREs are among the most expensive roles in the modern IT organization. You can refactor your applications to run as microservices, which can improve reliability and performance, but refactoring requires a lot of development resources that your company may not have. You can pay for more expensive cloud hosting, or mirror your workloads across multiple cloud availability zones or regions to increase availability, but doing so could significantly increase your cloud computing bill.

So what if you need to improve application reliability but don’t have unlimited financial resources? What if you can’t afford to refactor or pay for SREs to be available 24/7?

There are actually a number of things you can do. As this article explains, it’s possible to improve application reliability without breaking the budget, even if your IT budget is stretched to begin with.

1. Take advantage of load balancing

Load balancers distribute application traffic between multiple application instances or servers. They increase application reliability by ensuring that you use hosting resources as efficiently as possible. For example, a good load balancing configuration can redirect traffic from one application instance that is maxed out to one that is underutilized so that the application continues to operate without dropping requests.

Public clouds offer fully managed load balancing services, or you can set up your own load balancer.

Managed load balancers come at a cost, but it is typically low, and most load balancers are easy to set up. That makes load balancing a simple and inexpensive way to improve reliability.
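To make the idea of steering traffic away from a maxed-out instance concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular cloud load balancer works: the `Instance` and `LeastLoadedBalancer` names, and the notion of a fixed per-instance `capacity`, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One application instance (hypothetical model for this sketch)."""
    name: str
    capacity: int      # assumed maximum number of concurrent requests
    active: int = 0    # requests currently in flight


class LeastLoadedBalancer:
    """Sends each request to the instance with the lowest utilization."""

    def __init__(self, instances):
        self.instances = list(instances)

    def route(self):
        """Pick an instance for a new request, or None if all are full."""
        candidates = [i for i in self.instances if i.active < i.capacity]
        if not candidates:
            return None  # every instance is maxed out
        target = min(candidates, key=lambda i: i.active / i.capacity)
        target.active += 1
        return target

    def release(self, instance):
        """Mark one request on the given instance as finished."""
        instance.active = max(0, instance.active - 1)
```

With one instance already at capacity and another idle, `route()` keeps choosing the idle instance until it fills up too, so requests are dropped only when no instance has room left.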

2. Configure automatic scaling

Autoscaling services are another inexpensive (or, in some cases, free) and easy way to improve reliability. Autoscaling lets you configure rules that automatically increase or decrease the infrastructure resources allocated to your application. As a result, you can adapt to variations in demand and keep your application running smoothly even when load changes unexpectedly.

Not all applications and frameworks can scale automatically, but autoscaling is available for most basic IaaS services, such as AWS EC2 and Azure Virtual Machines.
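As a rough illustration of what a scaling rule computes, the sketch below sizes a fleet so that average CPU utilization moves toward a target value. The function name, the 60% target, and the instance limits are assumptions chosen for the example; managed autoscaling services expose their own settings and apply extra safeguards such as cooldown periods.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=60.0,
                      min_instances=1, max_instances=10):
    """Return how many instances to run so average CPU nears target_cpu.

    current  -- number of instances running now (must be at least 1)
    avg_cpu  -- average CPU utilization across those instances, in percent
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    # Total load is roughly current * avg_cpu; spread it at the target level.
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Never scale below the floor or above the (budget-driven) ceiling.
    return max(min_instances, min(max_instances, wanted))
```

The `max_instances` ceiling matters on a tight budget: it caps how far a traffic spike can push the bill, at the price of degraded performance once the cap is reached.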

3. Create reliability playbooks

Playbooks are predefined procedures that teams develop in advance to explain how they will react to various types of problems, such as a server or network failure. By speeding up incident response processes and removing some guesswork, playbooks help reduce the risk of downtime due to unexpected incidents.

There is no direct cost to develop playbooks and no special tools to pay for (although to get the most out of playbooks you may want to integrate them with your observability and incident response tools). And while you need some staff time to create playbooks, you can streamline the process by looking at issues your team has had in the past and how they’ve responded to them. This information can form the basis of your playbooks.

4. Containerize your app, even if it’s a monolith

Running applications in containers improves reliability because it provides a more consistent and predictable hosting environment. When your application is containerized, the configuration variables on the host server aren’t very important because the only configuration that really matters is what’s built into the container.

Containers are most often used to host microservices. But there’s no reason why you can’t run a monolithic app inside a container as well. You won’t get all the scalability benefits you’d get from a microservices architecture, but you’ll get a more consistent application hosting environment and, by extension, a lower risk of reliability issues.

You also won’t have to spend significant development resources to refactor your application. You may need to make changes to accommodate requirements such as application storage (because containers are ephemeral, they don’t provide persistent storage for a monolithic application by default the way a host server does), but you won’t need an entire development team and months of time to make these changes.

5. Use canary releases

A canary release is the deployment of a newer version of an application to a selected group of users, the so-called canaries. This way, any reliability issues introduced by the release will affect only a limited portion of your user base, and you can fix them before rolling the release out to everyone.

Canary releases add some complexity to the app deployment process, because they require you to host different versions of your app for different users. But you can usually do this quite easily, and at little cost, by setting up load balancers to direct traffic as needed to multiple application versions.
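One common way to decide which users become canaries is to hash a stable user identifier into a bucket, so that each user consistently sees the same version. The sketch below assumes a string `user_id` and two hypothetical version labels, "canary" and "stable"; in practice the split is usually configured in the load balancer rather than in application code.

```python
import hashlib


def assign_version(user_id: str, canary_percent: int) -> str:
    """Deterministically map a user to the "canary" or "stable" version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the ID so buckets are spread evenly and stay stable across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Because a user's bucket never changes, raising `canary_percent` from 5 to 25 keeps the original canaries on the new version and simply adds more users, which makes a gradual rollout straightforward.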

Application reliability doesn’t have to be expensive

In a perfect world, every team would have the resources to invest in software architectures and hosting models that maximize reliability. But in the real world, it’s not always financially possible to take advantage of the most sophisticated reliability tools and techniques.

Fortunately, there are less expensive ways to improve application reliability, and most of them don’t require a lot of complexity or setup either.

About the Author

Christopher Tozzi is a technology analyst with expertise in cloud computing, application development, open source software, virtualization, containers and more. He also teaches at a major university in the Albany, New York area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution”, was published by MIT Press.