Kubernetes Won’t Save You
A lot of potential clients come to us with straightforward and small projects and ask, “Well, can you do Kubernetes?” And we say, “Well, we can, but you don’t need it.”
But they’re afraid that they’ll be missing out on something if we don’t add Kubernetes to the stack. So this is a post to tell you why we probably won’t be recommending Kubernetes.
This post is going to look at three perspectives on this question… First, I’ll consider the technical aspects, specifically what problems Kubernetes is really good for, compared with what problems most smaller software projects actually have.
Then I’ll talk about the psychology of “shiny problems.” (Yes, I’m looking at you, Ace Developer. I promise there are other shiny problems in the project you’re working on.)
And last but not least, we’ll consider the business problem of over-engineering, and what gets lost in the process.
Kubernetes solves specific problems
First off (I’m sorry to draw your attention to this, but): you probably don’t have the problems that Kubernetes solves. Kubernetes lives at the container orchestration level. It shines in its ability to spin stateless servers up and down as needed for load balancing unpredictable or pulsed loads, especially from a large user base. Large, by the way, is not 10,000… it is millions or hundreds of millions.
Especially for the kind of internal custom services that a lot of our clients require, Kubernetes is overkill. Many purpose-built sites are unlikely to have more than dozens or hundreds of users at a time, and a traditional monolithic architecture will be responsive enough.
Kubernetes is designed to solve the problem of horizontal scalability, by making multiple copies of whichever services are most stressed, routing requests to minimize latency, and then being able to turn those machines back off when they are no longer needed. Even if you hope to someday have those problems, we suggest that you should hold off on adding Kubernetes to your stack until you get there, because the added technical overhead of container orchestration is expensive, in both time and dollars.
(It costs more to build, which delays your time to market, which delays your time to revenue, even if you aren’t paying yourself to build it.)
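To make the scaling calculus concrete: the rule that Kubernetes’ Horizontal Pod Autoscaler applies is roughly “scale the replica count by the ratio of observed load to target load.” Here is a minimal Python sketch of that rule; the numbers and the metric are invented for illustration:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Roughly the rule Kubernetes' Horizontal Pod Autoscaler applies:
    scale the replica count by the ratio of observed load to target load."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))

# A pulsed load: traffic spikes, and the replica count follows it up and down.
print(desired_replicas(3, 90.0, 50.0))  # → 6 (spike: scale out)
print(desired_replicas(6, 20.0, 50.0))  # → 3 (quiet: scale back in)
```

The function itself is trivial; the expense lies in everything around it — the images, the manifests, the networking, and the monitoring pipeline that feeds it a trustworthy metric.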
Which does lead to the question, “Why does everybody want to use this technology, anyway?” For that, we’ll have to take a step back and look at…
The Rise of the Twelve-Factor App
With the shift to the cloud and the desire for highly scalable applications, a new software architecture has arisen that has a strong separation between a system’s code and its data.
This approach treats processes as stateless and independent, and externalizes the database as a separate “backing service.” The stateless processes are isolated as microservices, which are each maintained, tested, and deployed as separate code bases.
This microservices approach decomposes the software into a group of related but separate apps, each of which is responsible for one particular part of the application.
Designing according to this architectural approach is non-trivial, and the overhead associated with maintaining the separate code bases, and particularly in coordinating among them is significant. Additionally, each app requires its own separate datastore, and maintaining synchronization in production introduces another level of complexity. Furthermore, extracting relevant queries from distributed systems of data is more challenging than simply writing a well-crafted SQL statement.
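For contrast, here is the kind of question a monolith answers with one well-crafted SQL statement. In a microservices split with per-service datastores, the same answer requires multiple service calls plus joining in application code. The schema is invented for illustration:

```python
import sqlite3

# In a monolith, one join-and-aggregate answers a cross-entity question
# that separate per-service datastores would turn into several API calls.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0);
""")

# Total spend per customer: a single well-crafted SQL statement.
rows = db.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 40.0)]
```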
Each of these layers of complexity adds to the cost of not only the initial development, but also the difficulty of maintenance. Even Chris Richardson, in Microservices Patterns, recommends starting with a monolithic architecture for new software to allow rapid iteration in the early stages. (https://livebook.manning.com/book/microservices-patterns/chapter-1/174)
For many of the same reasons, you probably don’t need complex layers of data handling either. Redis, for example, is for persisting rapidly changing data in a quickly accessible form. It’s not suitable for a long-standing database with well-established relations: it costs more to keep data in memory than to store it on disk, and it’s more difficult to build on.
When you are getting started, a SQL back end with a single codebase will probably solve most of your problems, without the overhead of Kubernetes (or any of the more exotic data stores). If you’re still not convinced, let’s take a brief detour and consider the lifecycle of a typical application.
Development, CI, and Upgrades (Oh My!)
Most applications have the following characteristics:
- Predictable load
- Few (fewer than millions of) users
- Predictable hours of use (open hours of the business, daily batch processing cron job at 3 AM, etc.)
- Clear options for maintenance windows
- Tight connection between the content layer and the presentation layer
Contrast this with the primary assumptions in the twelve-factor app approach.
The move to stateless servers focuses on different things:
- Zero downtime
- Rapid development
This approach arose from the needs of large consumer-facing applications like Flickr, Twitter, Netflix, and Instagram. These need to be always-on for hundreds of millions or billions of users, and have no option for things like maintenance mode.
Development and Operations have different goals
When we apply the DevOps calculus to smaller projects, though, there is an emphasis on Dev that comes at the expense of Ops.
Even though we may include continuous integration, automated testing, and continuous deployment (and we strongly recommend these be included!), the design and implementation of the codebase and its dependency management often focus on “getting new developers up and running” with a simple “bundle install” (or its equivalent in your toolchain).
This is explicitly stated as a goal in the twelve-factor list.
This brings in several tradeoffs and issues in the long-term stability of the system; in particular, the focus on rapid development comes at a cost for operations and upgrades. The goal is to ship things and get them standing up from a cold start quickly… which is the easy part. The more difficult part of operations – the part you probably can’t escape, because you probably aren’t Netflix or Flickr or Instagram – is the maintenance of long-standing systems with live data.
Upgrades of conventional implementations proceed thusly:
- Copy everything to a staging server
- Perform the upgrade on the staging server
- If everything works, port it over to the production environment
There are time delays in this process: for large sites it can take hours to replicate a production database to staging, and if you want a safe upgrade, you need to put the site into maintenance mode to prevent the databases from diverging. The staging environment, no matter how carefully you set it up, is rarely an exact mirror of production; the connections to external services, passwords, and private keys, for example, should not be shared. Generally, after testing is complete in the staging environment, the same sequence of scripts is deployed in production. Even after extensive testing, it may prove necessary to roll back the production environment, database and all. Without a maintenance freeze, this can result in data loss.
This sort of upgrade between versions is significantly easier in monolithic environments.
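Sketched as code, the conventional sequence looks something like this. It is a toy simulation in pure Python, with dictionaries standing in for environments and a hypothetical migration function standing in for your real upgrade scripts:

```python
import copy

def upgrade(production: dict, migrate) -> dict:
    """Conventional upgrade: freeze writes, rehearse on staging, then
    replay the same migration on production."""
    production["maintenance_mode"] = True   # freeze so staging and prod can't diverge
    staging = copy.deepcopy(production)     # replicate production to staging
    migrate(staging)                        # rehearse the upgrade on staging
    assert staging["schema_version"] > production["schema_version"]
    migrate(production)                     # same sequence of scripts, now on prod
    production["maintenance_mode"] = False
    return production

def add_email_column(env: dict) -> None:
    # Hypothetical migration: add a column, bump the schema version.
    for row in env["users"]:
        row.setdefault("email", None)
    env["schema_version"] += 1

prod = {"schema_version": 1, "maintenance_mode": False,
        "users": [{"name": "alice"}, {"name": "bob"}]}
upgrade(prod, add_email_column)
print(prod["schema_version"])  # → 2
```

The maintenance freeze is the safety net here: while it is on, nothing written to production can be lost to the rehearsal-then-replay cycle.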
But isn’t Kubernetes supposed to make that easier?
It’s tempting to point to Kubernetes’ rolling updates and the ability to connect multiple microservices to different database pods running different versions… but in content-focused environments, the trade-off for zero downtime is an additional layer of complexity required to protect against potential data loss.
Kubernetes and other twelve-factor systems resolve the issue of data protection by sharding and mirroring the data across multiple stores. The database is separate from the application, and upgrades and rollbacks proceed separately. This is a strength for continuous delivery, but it comes at a cost: data produced in the new environment during a blue-green deployment may simply be lost if it proves necessary to roll back schema changes. Additionally, if there are breaking schema changes and the microservices wind up attached to a non-backward-compatible version, they can throw errors to the end user (though this is probably preferable to data loss).
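That data-loss risk is easy to demonstrate with a toy simulation (pure Python, no real Kubernetes involved): the new environment takes writes during the deployment, and a rollback silently discards them.

```python
import copy

# Toy blue-green deployment: the new (green) environment gets a copy of
# the data, takes live writes, and a rollback to blue discards everything
# written in the meantime.
blue = {"schema": 1, "rows": ["order-1", "order-2"]}

green = copy.deepcopy(blue)
green["schema"] = 2                  # new release migrates the schema
green["rows"].append("order-3")      # live traffic writes to green

# The migration turns out to be broken: roll back to blue.
live = blue
print("order-3" in live["rows"])  # → False: the write made during green is gone
```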
For data persistence, the data needs to be stored in volumes external to the K8s cluster, and orchestrating multiple versions of the code base and database simultaneously requires significant knowledge and organization.
A deployment plan for such a system will need to include plans for having multiple versions of the code live on different servers at the same time, each of which connects to its associated database until the upgrade is complete and determined to be stable. It can be done, but even Kubernetes experts point out that this process is challenging to oversee.
When we are moving things into production, we need to have an operations team that knows how to respond when something fails. No matter how much testing you have done, sometimes a Big Hairy Bug gets into production, and you need to have enough control of your system to be able to fix it. Kubernetes, sad to say, makes this harder instead of easier for stateful applications.
So let’s consider what it means to have a stateful application.
When the Data is Intrinsic to the Application
A content management system by its nature is stateful. A stateful application has a lot of data that makes up a large fraction of “what it is.” State can also include cache information, which is volatile, but the data is part and parcel of what we are doing. Databases and the application layer are frequently tightly integrated, and it’s not meaningful to ship a new build without simultaneously applying schema updates. The data itself is the point of the application.
Drupal (for example) contains both content and configuration in the database, but there is additional information contained in the file structure. These, in combination, make up the state of the system… and the application is essentially meaningless without it. Also, as in most enterprise-focused applications, this data is not flat but is highly structured. The relationships are defined by both the database schema and the application code. It is not the kind of system that lends itself to scaling through the use of stateless containers.
In other words: by their very nature, Drupal applications lack the strict separation between database and code that makes Kubernetes an appropriate solution.
Shiny problems

One of the traps that we engineers fall into is the desire to solve interesting problems. Kubernetes, as one of the newest and most current technologies, is the “Shiny” technology towards which our minds bend.
But it is complex, has a steep learning curve, and is a poor fit for deploying stateful applications. This means that many of the problems you’re going to have to solve will be related to the containers and the Kubernetes/deployment layer of the application, which will reduce the time and energy you have left to solve problems at the data model and application layers. We’ve never built a piece of software that didn’t have some interesting challenges; we promise they are available where you are working.
Also, those problems are probably what your company’s revenues rely on, so you should solve them first.
To Sum Up: Over-engineering isn’t free
As I hope I’ve convinced you, the use of heavier technologies than you need burns through your resources and has the potential to jeopardize your project. The desire to architect for the application you hope to have (rather than the one you do) can get your business into trouble. You will need more specialized developers, more complex deployment plans, additional architectural meetings and more coordination among the components.
When you choose technologies that are overpowered (in case you need them at some undefined point in the future), you front-load your costs and increase the risk that you won’t make it to revenue/profitability.
We get it. We love good tech as much as the next person.
But wow! Superfast!
The fact is, though, most projects don’t need response times measured in the millisecond range. They just need to be fast enough to keep users from wandering away from the keyboard while their query loads. (Or they need a reasonable queuing system, batch processing, and notification options.)
And even if you do need millisecond response times but you don’t have millions of users, Kubernetes will still introduce more problems than it solves.
Performance challenges like these are tough, and generally need to be solved by painstaking, time-consuming, unpredictable trial and error. And the more subcomponents your application is distributed or sharded into, the harder (more time-consuming, more unpredictable, by orders of magnitude!) that trial and error gets.
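As for the queuing route mentioned above, the idea takes very little machinery. Here is a minimal in-process sketch using only Python’s standard library; a real system would use a job table or a worker framework (Celery is one example) rather than an in-memory queue:

```python
import queue
import threading

# A minimal work queue: enqueue slow jobs, process them in the background,
# and "notify" (here, just record) when each one finishes.
jobs = queue.Queue()
notifications = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down the worker
            break
        # ... do the slow report/export/import here ...
        notifications.append(f"{job} finished")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for name in ("monthly-report", "csv-export"):
    jobs.put(name)
jobs.put(None)
t.join()
print(notifications)  # → ['monthly-report finished', 'csv-export finished']
```

The user submits a job, goes back to work, and gets a notification when the result is ready; nobody is left staring at a spinner, and nobody needed a cluster.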
But what if we’re the Next Big Thing?
Most sites are relatively small and relatively stable and will do quite well on a properly-sized VM with a well-maintained code base and a standard SQL server. Minimizing your technological requirements to those that are necessary to solve the problems at hand allows you to focus on your business priorities, leaving the complexity associated with containerization and the maintenance of external stateful information to a future iteration.
Leave the “How are we going to scale?” problem for when you actually get there, and you increase the chances that this will eventually be the problem you have.
We've disabled blog comments to prevent spam, but if you have questions or comments about this post, get in touch!