LTS and Rolling Releases

Software support policies vary widely across the ecosystem, where "support" can mean bug fixes, security patches, and sometimes technical support.

However, they can be roughly categorized into two types:

Rolling release: only the latest release is supported. At the extreme, this is the HEAD git commit.
LTS (Long Term Support): the latest N versions are supported. Sometimes specific versions get extended support. "Long" is relative: Kubernetes supports a version for around a year, while RHEL does so for 10 years.

Enterprises love LTS

Historically, enterprise users have favored -- if not demanded -- LTS software.

And they pay good money for it: Red Hat makes $1B/year off this business model.

Why?

On the surface, it's fairly obvious. A company can pick a version of the software once, and not worry about it for years. This is especially appealing if a modern CI/CD workflow is not in place; the entire system can be manually qualified by QA on specific unchanging versions and shipped.

Overall, I think the desire can be broken down into wanting:

to have security issues patched.
to have technical support (also known as "someone to blame").
to have bugs patched.
to have "stability" (the behavior of the software not changing).

Many of these are often compliance requirements (most notably, security patching).

The upgrade paradox

LTS has a perception of "ship it and forget it". A huge part of the allure is that you don't need to constantly update it, as it is supported for years.

However, 2 of the 4 reasons for using LTS only pay off if the software is upgraded. Bug fixes and security patches don't just magically appear; users need to upgrade to the fixed versions.

So the common misconception that users use LTS software to avoid upgrades is wrong (in most cases). Users do want to upgrade; they just want only some upgrades.

So what upgrades do they want? Knowing most users, most likely "the exact minimal set of bug fixes that impact me and nothing else."

More realistically, we could say "just bug fixes and security patches", leaving out additional features, behavioral changes, performance improvements, refactorings, and other miscellaneous changes.

Bug free?

Note: this section mostly refers to large, actively developed software like Istio and Kubernetes. The same probably doesn't hold true for something like grep.

The amount of changes merged into active projects is staggering. The broader Kubernetes org merges nearly 50,000 PRs per year!

At this rate, it's simply infeasible to backport every bug fix. Instead, only a small subset is backported. Typically these are fixes for bugs that are obviously critical to the open source project, plus additional fixes for bugs that impact certain customers.

So while LTS software is often lauded as "stable", it could be better described as "static". By the time a version is near the end of its support cycle, it could have thousands of known bugs.

Backport risk

One of the main reasons for using LTS is to reduce risk. By only consuming a smaller portion of changes, the risk of upgrading a patch is perceived as smaller. In many cases, that is probably true.

However, the risk of a bug fix is not the same as the risk of backporting that bug fix. While the original bug fix may have been low risk, the same cannot be said for the backport. The further back you go, the more the codebase likely has changed, changing the context of the bug fix. Assumptions made during the original fix may no longer be sound.

This is especially true for vendors patching upstream software, who may be pretty far removed from development on the upstream software. Vendor forks additionally increase the drift from upstream with each custom patch, introducing further risk.

This is not a theoretical issue: it has caused panics in Ubuntu and CVEs in Istio.

Maintenance costs

Probably the last thing a customer paying a vendor $$$$ for support cares about is how hard the vendor needs to work for that money. But it does have a cost for everyone involved -- including the customer.

Maintaining old software is a lot of work. A bug fix that takes an hour to write the first time may take days to re-work to fit into older versions. This is time that could be better spent actually improving the software in the first place.

Actually supporting the software is almost worse. In my experience, most issues from customers running old software end up being bugs that have already been fixed in newer versions, or that have at least been made significantly more debuggable.

This is a terrible experience for everyone involved. It is a huge waste of time for the maintainers and users, and gives a bad perception of the software to the user.

It is also demoralizing for maintainers. Essentially, customers running older versions are saying "Those years of 40hr work-weeks spent improving the software we are running? I don't want any of that garbage. Now please, help me debug the issue you already fixed during that time." Do they pay good money for the privilege to demand that? Yes. Does any of that money actually go to the person maintaining the software? Unlikely.

Big bang upgrades

One of the benefits of LTS is (allegedly) less upgrade pain. However, often the opposite occurs. As a user stays on increasingly old software, mainline development drifts further and further from their old version. As this happens, the upgrade becomes harder and harder. It's not actually the "upgrade" itself that is challenging (in most software, anyway); it's the changes between the two versions that need to be adapted to. As the gap grows, so does the number of changes, and thus the difficulty of the upgrade.

In fact, this may perpetuate a vicious cycle. As the cost to upgrade increases, so does the time needed to execute it. As the time-to-upgrade increases, so does the need for longer LTS!

For example, a customer may complain they need at least 2 years of support since it takes them 12 months to fully test and roll out a new version, and they don't want to be constantly upgrading. But once they get their 2 years of support, the upgrade is even more challenging, and takes 18 months. Better change that to a 3 year support window.
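To make the feedback loop concrete, here is a toy model in Python. Everything in it is an assumption made up for illustration: the linear cost function, the 6-month coefficients, and the idea that a customer asks for "upgrade time plus a year of breathing room". The coefficients were chosen only so that the first rounds reproduce the 12-month and 18-month figures from the example.

```python
import math
from typing import List, Tuple


def upgrade_months(gap_years: float, base: float = 6.0, per_year: float = 6.0) -> float:
    """Assumed linear cost: fixed overhead plus work proportional to the version gap."""
    return base + per_year * gap_years


def requested_window_years(upgrade_time_months: float,
                           breathing_room_months: float = 12.0) -> int:
    """Assumed policy: the customer wants the upgrade time plus a year without upgrades."""
    return math.ceil((upgrade_time_months + breathing_room_months) / 12)


def simulate(initial_gap_years: int, rounds: int,
             per_year: float = 6.0) -> List[Tuple[int, float]]:
    """Return (gap in years, months needed to upgrade across that gap) for each round."""
    history = []
    gap = initial_gap_years
    for _ in range(rounds):
        months = upgrade_months(gap, per_year=per_year)
        history.append((gap, months))
        # The support window requested now becomes the gap to cross next time.
        gap = requested_window_years(months)
    return history


for gap, months in simulate(initial_gap_years=1, rounds=3):
    print(f"gap of {gap} year(s): {months:.0f} months to upgrade")
```

With these particular numbers the requested window goes to 2 years, then 3 years, and then levels off; with a steeper per-year cost (say `per_year=12.0`) it keeps growing every round. Either way, under this model a longer window makes the next upgrade more expensive, not less.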

This is basically why CI/CD is a thing. Google has a nice case study on this.

A better option?

If we look deeper, we can see there are two classes of "LTS" customers:

1. Those who actually want to upgrade, but want to have upgrades be low risk.
2. Those who refuse to upgrade, but need a support contract.

Treating these as one category is a bad idea.

For type (2) customers, why waste time backporting bug fixes and security patches when the customers won't upgrade anyway? Just provide support for whatever version they want and call it a day. The cost to support them will be higher and the experience for the customer will likely be worse, but if they pay enough... oh well.

For type (1) customers, clearly we need to provide new versions. Specifically, we need low-risk upgrades that fix bugs and security issues.

But I think LTS is a poor way to achieve this:

Backports are high risk, and increasingly so as the number or age of backports increases.
Users may get some bug fixes, but certainly not all of them, leaving them exposed to known bugs.
Similarly, plenty of reliability and performance improvements are missed.
It is a huge time sink for maintainers; this time could be better spent improving the software.

Instead of dealing with the issues of LTS, I think a better investment could be made into making rolling releases meet enterprise requirements.

Rolling releases for enterprises

If we look at the requirements for enterprise users, rolling releases actually meet most of them. Bugs and security issues are patched, and support can be provided for them.

Yet, if I pitched a rolling release to a Big Bank, I would probably get laughed at, and it all comes down to "stability".

As a thought experiment, imagine if I backported every change. I would have v1.0, v1.1, v1.2, ..., but they would all contain the exact same code (sans a version number).

In some ways we have created the best LTS release: not only do we have every bug fixed, with lower maintenance cost, but we also get some nice miscellaneous improvements.

But... we also have new features and behavioral changes introduced. Also, probably new bugs introduced - but let's pretend we are perfect for now, and do not write bugs.

So what if we just turn off the new features and behavioral changes? LTS has always been based on the idea of shipping multiple different codebases, but why not just multiple different configurations of the same code?

If every "behavioral" change is gated behind a flagging mechanism, the same codebase/binary can be run in different modes to "pin" behavior for a while. Instead of the risk/cost being the process of upgrading the binary, the risky aspect would be changing the flag.
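A minimal sketch of what this could look like, in Python. This is not how Istio, Go, or Rust implement it; the `Runtime` class, the behavior names, and the version numbers are all hypothetical. The idea is just that each behavioral change records the release that introduced it, and the operator pins a compatibility version independently of the binary version.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Version = Tuple[int, int]

# Hypothetical registry: behavioral change -> release that introduced it.
BEHAVIOR_INTRODUCED_IN: Dict[str, Version] = {
    "strict_header_validation": (1, 2),
    "new_default_timeout": (1, 3),
}


@dataclass(frozen=True)
class Runtime:
    binary_version: Version  # version of the code actually running
    compat_version: Version  # behavior the operator has pinned to

    def __post_init__(self) -> None:
        if self.compat_version > self.binary_version:
            raise ValueError("cannot pin to behavior newer than the binary")

    def enabled(self, behavior: str) -> bool:
        """A behavior is active only if the pinned compat version includes it."""
        return BEHAVIOR_INTRODUCED_IN[behavior] <= self.compat_version

    def default_timeout_seconds(self) -> int:
        # The old behavior stays reachable in the new binary until the pin moves.
        return 10 if self.enabled("new_default_timeout") else 30


# Same 1.3 binary, two configurations.
pinned = Runtime(binary_version=(1, 3), compat_version=(1, 1))
current = Runtime(binary_version=(1, 3), compat_version=(1, 3))
```

Here `pinned` and `current` run identical code; moving `compat_version` forward is the only step that changes behavior, so that is where the testing effort would go.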

This is not an entirely new idea. Rust has a concept of editions, and Go has started to gate new behavior behind the go.mod version. Both of these projects encourage users to upgrade immediately and prioritize making upgrades easy (although Go still supports 2 versions at a time).

This does rely on the assumption that new functionality is correctly flagged, and that new bugs are not introduced. Neither of these is trivial. However, I think these risks are ultimately smaller than the issues LTS introduces.
