You are reading our blog series: Pure Ansible Infrastructure
- Toward infrastructure simplicity (You are here)
- Drumkit and other plumbing
- Dynamic inventory
- Variables and Vault
- Playbooks and support infrastructure
- Building Ansible Collections
- make targets, Droplets, and Aegir, oh my!
Infrastructure as Code gets complicated quickly
In the first iteration of Consensus Enterprises’ internal infrastructure, we built a well-provisioned, multi-environment architecture leveraging the most Open services we could find (https://openstack.org). We built things as if we were one of the larger-scale clients we tend to work with, who have legitimate needs for more complex architectures.
We implemented a carefully crafted Infrastructure-as-Code (IaC) repository, capable of switching between environments based on shell scripts to set credentials and environment variables. We had shared Terraform state files and thoroughly modularized declarative configuration describing our infrastructure. We had Terraform handing off the configuration to Ansible playbooks so that a single command could build up or tear down the entire environment. We had a lot of (potential) power and a corresponding amount of complexity.
In our work with our clients, we often find ourselves advocating for a “keep it simple” approach, only introducing more complexity into a system when it solves a problem we’re currently experiencing.
For our purposes within Consensus, we really just need a handful of VMs and some minimal support infrastructure like a VPC and a firewall. Unnecessary complexity comes at a real cost, both in maintenance overhead and in cloud services bills! Recently we decided it was worth taking some time to rethink our IaC approach to building our own infrastructure, with an aim to simplify as much as possible.
Shedding complexity
There were a few areas of complexity we recognized we could shed. The key piece of technical complexity we identified was the use of two different tools to handle our IaC.
In principle, Terraform is the “correct” tool for provisioning cloud resources, maintaining their state, and so on. Ansible is intended as a “configuration management” tool that’s great for picking up a newly provisioned Ubuntu VM and turning it into a web server tuned for hosting your application.
This is great, but some complexity creeps in at the interface between the tools. To transition seamlessly from one to the other, you end up with “null_resource” blocks in your Terraform code whose provisioners fire off your ansible-playbook runs. Those playbooks aren’t quite reusable on their own, because they depend on context and variables passed in from Terraform. These are manageable considerations, but they lead to a lot of overhead and churn for what should be relatively simple operations.
Ultimately, for the small and simple infrastructure needs we currently have as an organization, using Terraform and Ansible together was simply more horsepower than we needed.
Simplicity wins the day
In an effort to reduce our costs overall, we recognized that moving to DigitalOcean would save money without any loss of functionality, in terms of what we were actually using with OpenStack. What’s more, it’s relatively easy to share access to the DigitalOcean Team, whereas logging into an OpenStack Project is... tedious.
We decided to simplify the IaC setup as well, and move to a pure Ansible approach, dropping Terraform entirely. It turns out that Ansible has matured quite a bit when it comes to provisioning infrastructure, and in particular the community.digitalocean collection of modules is very solid and easy to work with. By comparison, the OpenStack support in both Terraform and Ansible always felt a bit uncertain.
That said, in principle any cloud provider would work in our setup, especially if their API is well-supported by an existing Ansible collection.
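To give a flavour of what provisioning with Ansible alone looks like, here is a minimal sketch of a playbook that creates a single Droplet using the community.digitalocean collection. It is not taken from our repositories: the playbook filename, the Droplet name, the size, region and image slugs, and the DO_API_TOKEN environment variable are all illustrative assumptions.

```yaml
# provision.yml: a hypothetical playbook for illustration only.
- name: Provision a Droplet with the community.digitalocean collection
  hosts: localhost
  connection: local
  gather_facts: false

  vars:
    # Assumption: the API token is exported as DO_API_TOKEN in the shell.
    do_token: "{{ lookup('ansible.builtin.env', 'DO_API_TOKEN') }}"

  tasks:
    - name: Ensure the example Droplet exists
      community.digitalocean.digital_ocean_droplet:
        state: present
        oauth_token: "{{ do_token }}"
        name: example-web-1        # illustrative name
        unique_name: true          # match on name, so reruns do not create duplicates
        size: s-1vcpu-1gb          # illustrative size slug
        region: tor1               # illustrative region slug
        image: ubuntu-22-04-x64    # illustrative image slug
        wait_timeout: 500
      register: droplet_result

    - name: Show what the module returned
      ansible.builtin.debug:
        var: droplet_result
```

Assuming the collection has been installed with `ansible-galaxy collection install community.digitalocean`, this would be run with `ansible-playbook provision.yml`. Setting `unique_name: true` is what lets the task be rerun safely, which covers much of what we previously relied on Terraform state for at this scale.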
Beyond that, we’ve discovered the glory of Tailscale as a dead-simple way to set up a WireGuard-based VPN (and honestly, one of the simplest VPN setups I’ve encountered, period!). We used to provision a WireGuard server ourselves, manage the keys manually, and follow a complicated multi-step process to onboard each new team member. Now we create a user account, the user logs into Tailscale with it, and they’re done.
Finally, we dropped the notion that a single code repository needed to handle deploying the same (or similar) resources into multiple environments. This not only introduced a lot of complexity (conditional logic, variable indirection) in the codebase, it also represented a risk for the developer or operator using the repository, who could easily run destructive operations on the wrong (ahem, production) environment by accident. We’ve since moved to a “dev” IaC repository and a “prod” IaC repository. They mirror each other in structure and shared libraries (more to come on this), but each has config and credentials specific to the environment it targets and is built to focus only on that environment.
Tying it all together
Having built all this up and deployed it into production over the last few weeks, I wanted to share what we’ve done. I’ve published a couple of Ansible Galaxy collections that encapsulate much of the work, but I also want to show how it all fits together.
To that end, I’ve created an example project in GitLab called Example Aegir3 Infrastructure that uses these published collections in a way that closely resembles our production setup (simplified somewhat for illustrative purposes).
In the remainder of this series, I’ll review how we build a pure Ansible project that leverages our Ansible collections to provision a complete Aegir3 instance on DigitalOcean.
The article Toward infrastructure simplicity first appeared on the Consensus Enterprises blog.