Istio's installation has a long, winding, complex history, leading to an interesting current state.
In this post, I hope to explain some of the historical context of how we arrived at the current state, and where I think the project is going. This is all my personal perspective and memory of things that happened years ago, so there is likely some divergence from reality.
The Past
When I first started working on Istio in 2019, Istio 1.0 had just been released. The ecosystem was a pretty different place back then.
Istio's installation consisted of a mega-chart composed of 15 (!) different sub-components. These consisted of both core Istio components (at the time, Istio had a microservice architecture) and addons (Kiali, Prometheus, etc). Managing so many microservices with deep interdependencies caused quite a headache.
Helm was very "uncool", at least among the people I was talking with. The in-cluster tiller model wasn't well received, competitors like Kustomize showed a lot of promise, and people did not like templating YAML (this part hasn't changed!).
GitOps and associated projects (Flux, ArgoCD, etc) had started to come out, but hadn't become the de-facto standard. In parallel, operators were pretty hyped up.
All of this set the stage for the initial attempts at improving Istio installation.
Istio Installer
The first real attempt at changing Istio's install was the `istio/installer` repository.
The main goals of this were to make Istio installation more modular, support alternative installation methods, and have a clean set of APIs with some of the historical cruft removed.
This also supported running multiple versions of Istio components, an early form of the "revisions" concept in Istio.
This took the form of a bunch of independent Helm charts that could all run together.
These came with some pre-rendered manifests and a `kustomization.yaml` to support Kustomize.
This was a good start, but had some issues:
- While providing 15 independent charts instead of one monolithic chart is very flexible, it was quite tedious to install, especially for simple cases.
- Someone (I'll admit, it was me) pushed for full backwards compatibility with the old charts' `values.yaml` schema. This negated the benefits of a clean API, and actually made things worse: all fields were artificially nested under a key. For instance, to install the `pilot` chart you would do `helm template ./pilot --set pilot.foo=bar` -- the `pilot.` prefix is totally redundant.
These eventually became the current Istio Helm charts.
Istio Operator
In parallel, an effort to build out an Istio operator was started. This was seen as a way to manage the complexity of the vast number of components required to install Istio. A primary driver for this work was service providers, where the operator model makes more sense: the platform provides the operator transparently, and users can get Istio simply by applying a single resource.
This effort joined forces with the Istio Installer, using its Helm charts as the basis. The operator then provided automation and a higher-level abstraction over the Helm charts.
Part of this included a CLI mode, essentially running the operator reconciliation as a one-off execution.
This eventually became the `istioctl install` command that exists today.
Simplification
In parallel to the install and operator efforts, a number of changes happened in the surrounding areas:
- All addons (Kiali, Prometheus, etc) were removed from Istio's installation responsibility. Instead, Istio would integrate with deployments that users manage outside of Istio. This dropped 6 of the 15 Helm charts, dramatically reducing the scope of the problem.
- Istio's microservice control plane was consolidated into a single monolithic binary, istiod. This dropped another 6 Helm charts.
With these two changes, the installation problem became radically simplified; really, just three charts remain:
- The `base` chart, which deploys the CRDs.
- The `istiod` chart, which deploys the control plane.
- Optionally, the `gateways` chart, which deploys a gateway.
In the surrounding ecosystem, the Gateway API was developed, which moves deployment of the gateways to runtime, dropping us down to only 2 charts.
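To make this concrete, here is roughly what a minimal install looks like with just those two charts (a sketch following the commands in the Istio Helm install docs; the repo URL and chart names are as published, but your namespace and flags may differ):

```sh
# Add the published Istio chart repository
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

# Chart 1: the CRDs
helm install istio-base istio/base -n istio-system --create-namespace

# Chart 2: the control plane
helm install istiod istio/istiod -n istio-system --wait
```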
While all of this was going on, the installation landscape changed as well.
- Helm 3 was launched, eventually solidifying Helm as the de-facto standard package manager for Kubernetes.
- GitOps tools like ArgoCD and Flux became extremely common, typically orchestrating Helm charts as a packaging mechanism.
The Present
All of this brings us to the present time.
With really just CRDs plus one chart needed to install Istio, and most configuration moving out of installation time, installation should be a simple problem. There should be hardly any Istio-specific configuration, aside from a few remaining install-time options; the rest is the standard customization one might want to make to any deployment running in Kubernetes (resources, etc).
However, this doesn't really match reality: Istio has a complex installation, driven by decisions based on assumptions that were made obsolete by surrounding changes. This is explored in depth in my previous post.
While the surrounding ecosystem changed dramatically over the years, Istio's install largely stayed the same. I think this was largely an overcorrection to feedback Istio had been receiving that it churned too much, especially in installation. An easy way to not churn is to not improve anything!
So what can we do about it?
The Future
In my opinion, the goals for Istio's installation are:
- Meet users where they are, and make it easy to use whether they are running one-off demos, multi-cluster installs orchestrated by ArgoCD, etc.
- Simplify as much as possible, and push as much configuration from installation to runtime.
- Clean up past tech debt, as a means to simplify.
- Make install something people don't need to think about (or blog about) anymore.
To get there, we should:
Remove the operator
The Istio operator, today, offers little benefit to users. If it were orchestrating many components, providing sophisticated operational controls, automated upgrades, etc., perhaps it would. However, it does not do these things; it only installs Istio, and a single version at that. Operations with the operator are actually more complex than without, as now you also need to operate, manage, and upgrade the operator itself.
In theory, we could choose to instead revamp the operator to do all of these things really well. However, I don't think this is practical or an optimal outcome.
If we want to meet users where they are, that means integrating with their existing CI/CD and rollout processes. If they use ArgoCD to roll out their apps... great! They should use that to roll out Istio upgrades as well, and we should do the work to make that experience great.
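As a sketch of what that could look like, an ArgoCD Application can manage the istiod Helm chart directly (the version, names, and namespaces here are illustrative, not a recommendation):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: istiod
  namespace: argocd
spec:
  project: default
  source:
    # The published Istio Helm chart repository
    repoURL: https://istio-release.storage.googleapis.com/charts
    chart: istiod
    targetRevision: 1.22.0  # illustrative version; pin your own
  destination:
    server: https://kubernetes.default.svc
    namespace: istio-system
  syncPolicy:
    automated:
      prune: true
    syncOptions:
    - CreateNamespace=true
```

With this, Istio upgrades follow the exact same review and rollout flow as every other application in the cluster.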
If we instead build all of this into our own operator, we are building our own "island" of Istio behavior. Even if we manage to build something better than ArgoCD for Istio in isolation (which is a big if, given the years of development from thousands of contributors on the project!), it still may be worse for a user due to being inconsistent with the rest of their environment.
Based on all of this, I believe the best outcome is for Istio to remove the operator.
Clean up istioctl install
Currently, the Istio install codebase houses a bunch of common installation logic shared by the operator and `istioctl install`. If we remove the operator, that just leaves `istioctl install`.
Ultimately, this command is just orchestrating Helm charts and applying them to the cluster -- something Helm is very good at. However, in doing this, it re-invents a lot of ideas from Helm without adding much value. While there are some nice additions relative to Helm, over the years we have been able to approximate them in Helm directly. Additionally, as mentioned above, a command with decent functionality plus consistency and compatibility is likely preferable to one with slightly better functionality at the cost of consistency.
Ideally, the implementation of `istioctl install` is an increasingly small wrapper around `helm install`, delegating to `helm` for as much of the operation as possible.
Currently, we only use Helm code to render the Helm templates, but maintain a large amount of complex logic around merging configuration, validating it, and applying it to the cluster.
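To make the target end state concrete, here is a rough sketch of the intended equivalence (the flag mapping is illustrative, not an exact one-to-one translation):

```sh
# What istioctl does today, conceptually:
istioctl install --set profile=default

# What it would ideally be a thin wrapper around:
helm install istio-base istio/base -n istio-system --create-namespace
helm install istiod istio/istiod -n istio-system
```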
istioctl install off-ramp
If we consolidate our logic on `helm`, `istioctl install` isn't providing that much value -- a bit of convenience and polish at best. This is a good thing -- it means we moved the value into the Helm charts, so Helm users get all the benefits; it doesn't mean we are removing functionality from `istioctl install`.
At this point, users may be better off using the Helm charts directly, likely orchestrated by GitOps tooling.
We should provide an easy off-ramp, so users can migrate without having to re-build their configuration from scratch.
I don't think this means we should remove `istioctl install`; just make it simple for users to migrate to more powerful tooling when the time is right for them.
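One plausible shape for that off-ramp, sketched with commands that exist today (the file names are hypothetical placeholders): render the manifests both ways and verify they match before switching tools.

```sh
# Render what istioctl would install from an existing config
# (my-operator-config.yaml is a placeholder for your IstioOperator file)
istioctl manifest generate -f my-operator-config.yaml > istioctl-rendered.yaml

# Render the equivalent directly from the Helm chart
# (helm-values.yaml is a placeholder for the translated values)
helm template istiod istio/istiod -n istio-system -f helm-values.yaml > helm-rendered.yaml

# If the diff is clean, the Helm-based install preserves the existing setup
diff istioctl-rendered.yaml helm-rendered.yaml
```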
Consolidate on the Gateway API
One of the key simplifications to the installation is splitting gateway installation out via the Gateway API. While all of this already exists, adoption is still in its early phases. As adoption grows, the perceived complexity of Istio installation should correspondingly decrease.
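For reference, this is the runtime model the Gateway API enables (a minimal sketch; the name and namespace are illustrative): creating a Gateway resource with the `istio` gatewayClassName causes Istio to deploy and manage the gateway itself, with no gateway Helm chart involved.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ingress-gateway      # illustrative name
  namespace: istio-ingress   # illustrative namespace
spec:
  gatewayClassName: istio    # Istio watches this class and deploys the gateway at runtime
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
```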