Last Updated for: Istio 1.25.

Istio offers a huge variety of features, but few opinions on which ones you should use. In this post, I cover which features I think you should and should not use in Istio. This is based on my experience with users adopting each feature and the issues they do or do not run into, my knowledge of each feature's inner workings, and subjective gut feelings.

The hope is we don't end up with a "JavaScript: The Good Parts" situation.

JavaScript: The Good Parts; a short book

Rubric

  • ✅ Adopt if needed ✅: these features are generally reliable enough to use broadly. However, any feature brings some risk and complexity, so it's best to avoid anything that isn't needed; there is intentionally no "Always adopt" category.
  • ⚠️ Adopt with caution ⚠️: these features are reasonably reliable, but come with some caveats. In general, they are safe to use if these caveats are acceptable.
  • 🛑 Avoid if possible 🛑: these features are not very reliable, and should be avoided unless there is a very compelling need for the feature. When doing so, use caution.
  • ☠️ Avoid at all costs ☠️: do not use these features!

Ambient Mode

✅ Adopt ✅

Since its promotion to GA/stable in Istio 1.24, ambient mode is officially marked as ready for widespread production usage. I agree with this recommendation for most users and would recommend it as the default choice unless you have a reason not to use it. Ambient has proven itself in production usage to be extremely stable.

However, because it is relatively new, there is currently a slightly higher risk of issues in atypical environments. These are generally issues like "ambient won't install on my custom Linux kernel", not "ambient fell over during a load spike" (the latter kind of issue is not a real risk), and they usually have quick fixes once detected. As ambient is deployed into all of the obscure environments that sidecars already run in, these issues will fade away over time (and, for the most part, already have).

There are also some feature gaps relative to sidecars that you should consider: VirtualService, EnvoyFilter, multi-cluster, and zero-downtime migration from sidecars to ambient. However, all of these are supported in Gloo Mesh.

Lastly, if you are an existing, happy sidecar user who doesn't have budget for a migration, it's reasonable to stick with sidecars.

Gateway API

✅ Adopt ✅

The Kubernetes Gateway API is the next-generation traffic routing and management API in Istio, effectively replacing the older Ingress API and a number of third-party APIs, including some of Istio's own.

Currently, Istio Gateway and VirtualService have suitable replacements. In the future, parts of DestinationRule will as well.

While most Istio APIs and Gateway APIs can coexist, you will need to use either Istio Gateway + VirtualService or Kubernetes Gateway + HTTPRoute within a single scope. A scope is either a single gateway deployment or in-mesh routing. So, for instance, you could use VirtualService for ingress and HTTPRoute for the mesh, though this would likely be complicated. Either way, other APIs such as DestinationRule and AuthorizationPolicy can be used alongside either option.
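To make the scope rule concrete, here is a small Python sketch that flags scopes mixing the two routing families. The resource model, the `ROUTING_FAMILY` table, and the `mixed_scopes` helper are hypothetical illustrations and not part of Istio or its tooling. Only the routing kinds themselves are modeled.

```python
from collections import defaultdict

# Hypothetical table of routing kinds (as "group/Kind") and the API family
# they belong to. Kinds not listed here, such as DestinationRule or
# AuthorizationPolicy, can be combined with either family.
ROUTING_FAMILY = {
    "networking.istio.io/Gateway": "istio",
    "networking.istio.io/VirtualService": "istio",
    "gateway.networking.k8s.io/Gateway": "gateway-api",
    "gateway.networking.k8s.io/HTTPRoute": "gateway-api",
}


def mixed_scopes(resources):
    """Return the scopes that use both routing families.

    `resources` is an iterable of (scope, kind) pairs, where scope is a
    gateway deployment name or "mesh" for in-mesh routing.
    """
    families = defaultdict(set)
    for scope, kind in resources:
        family = ROUTING_FAMILY.get(kind)
        if family is not None:
            families[scope].add(family)
    return sorted(scope for scope, seen in families.items() if len(seen) > 1)


resources = [
    ("ingress", "networking.istio.io/Gateway"),
    ("ingress", "networking.istio.io/VirtualService"),
    ("mesh", "gateway.networking.k8s.io/HTTPRoute"),
    ("mesh", "networking.istio.io/DestinationRule"),
    ("egress", "networking.istio.io/VirtualService"),
    ("egress", "gateway.networking.k8s.io/HTTPRoute"),
]
print(mixed_scopes(resources))  # ['egress']
```

In this example, ingress using VirtualService and the mesh using HTTPRoute are each fine on their own. Only the egress gateway mixes the two families within one scope.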

The Gateway API is more flexible, easier to use, and has much more community investment, making it the clear future path. However, there are still some rough edges as it matures:

  • Istio gateway deployments do not expose full customization, though this is on the roadmap for 1.26.
  • Not all features of Istio have equivalent fields in the Kubernetes APIs. However, the majority do, and the list of gaps shrinks every day.
  • There is currently a smaller ecosystem (Helm charts, etc.), but this is likely to reverse in the future.

If any of these are blockers for you today, I would recommend holding off on adoption, but keep an eye out for when they are resolved and reconsider migrating then. For anyone else, I would recommend the Gateway API.

Installation

Istio offers a variety of installation options.

  • Istioctl install: ✅ Adopt ✅
  • Helm install: ✅ Adopt ✅

Istioctl and Helm are roughly equivalent in stability; use whichever fits best in your environment. Helm tends to integrate much better with other tooling like Terraform, ArgoCD, etc., so it is a reasonable first choice. The Istio Operator (not to be confused with the IstioOperator API passed to istioctl install) is an extremely bad option.

Prior to Istio 1.24, there was an in-cluster operator. This has since been removed. If you are interested in an operator, there are a variety of third-party solutions which I am not qualified to judge. More discussion on operators in general.

  • Multi-cluster: ⚠️ Adopt with caution ⚠️
  • Multi-network: 🛑 Avoid if possible 🛑

Multi-cluster and multi-network are two common features that draw users to Istio. Both are very powerful, mature, and widely adopted, but come with some risk. Multi-cluster changes the routing behavior of Kubernetes in risky ways: any Service with the same name and namespace in another cluster is merged, so you will get cross-cluster routing without an explicit opt-in. Multi-network carries that same risk, plus some additional ones. It sends all cross-network traffic through a TCP proxy (the east-west gateway), which negatively impacts load balancing and places a high-impact single point of failure in the network path.
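To illustrate the implicit merging, here is a toy Python model. It is not Istio's real code, and the `merge_services` helper, cluster names, and IPs are hypothetical. Services that share a namespace and name across clusters end up in a single endpoint pool.

```python
from collections import defaultdict


def merge_services(clusters):
    """Toy model of multi-cluster Service merging (not Istio's real code).

    `clusters` maps a cluster name to {(namespace, service): [endpoint IPs]}.
    Services sharing a namespace and name are merged into one endpoint pool,
    returned as {(namespace, service): [(cluster, ip), ...]}.
    """
    merged = defaultdict(list)
    for cluster in sorted(clusters):
        for key, ips in clusters[cluster].items():
            merged[key].extend((cluster, ip) for ip in ips)
    return dict(merged)


clusters = {
    "cluster-a": {("shop", "checkout"): ["10.0.0.5"]},
    "cluster-b": {
        ("shop", "checkout"): ["10.1.0.7"],  # same name: silently merged
        ("shop", "cart"): ["10.1.0.9"],
    },
}
print(merge_services(clusters)[("shop", "checkout")])
# [('cluster-a', '10.0.0.5'), ('cluster-b', '10.1.0.7')]
```

A client in cluster-a calling checkout can now be routed to 10.1.0.7 in cluster-b, even though nobody opted in. With multi-network, that hop would additionally pass through the TCP proxy.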

For many users, the benefits of these features are well worth the risks. However, it's best to make sure you actually need them before you buy in.

Note: Gloo Mesh offers an alternative multi-cluster implementation on top of Istio that addresses all of these issues with increased security, reliability, and scale.

  • External Istiod: 🛑 Avoid if possible 🛑

This is a fairly niche feature, but at large scale it can be a convenient way to manage Istio. Do not use it at small scale. It is really only worth it if you have a centralized infrastructure team that needs to host large numbers of distinct Istio installations and operationally benefits from running them in a single cluster. However, most clusters need infrastructure beyond Istio that likely must run within each cluster anyway, which makes this benefit even more niche.

  • Revisions and Tags: ✅ Adopt if needed ✅

In general, these are great features to reduce risk and increase ease of upgrades. If you plan to upgrade Istio, you should use these. If you don't plan to upgrade Istio... rethink your plans?
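Tags add a layer of indirection. Namespaces point at a tag, and the tag points at a control-plane revision, so an upgrade can move many namespaces by changing one mapping. Here is a toy Python model of that indirection, assuming namespaces are labeled with a tag via `istio.io/rev`. The tag and revision names are made up.

```python
def resolve_revisions(namespace_labels, tags):
    """Map each namespace to the control-plane revision that serves it.

    namespace_labels: {namespace: value of its istio.io/rev label (a tag)}
    tags: {tag: revision}; a simplified toy model of Istio revision tags.
    """
    return {ns: tags[tag] for ns, tag in namespace_labels.items()}


namespace_labels = {"shop": "prod-stable", "payments": "prod-stable", "sandbox": "prod-canary"}
tags = {"prod-stable": "1-24-3", "prod-canary": "1-25-0"}
print(resolve_revisions(namespace_labels, tags))
# {'shop': '1-24-3', 'payments': '1-24-3', 'sandbox': '1-25-0'}

# Promote the canary by moving a single tag; no namespace is relabeled.
tags["prod-stable"] = "1-25-0"
print(resolve_revisions(namespace_labels, tags))
# {'shop': '1-25-0', 'payments': '1-25-0', 'sandbox': '1-25-0'}
```

This only models which revision a namespace is bound to. With sidecars, existing pods keep their old proxies until they are restarted.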

  • CNI: ✅ Adopt if needed ✅

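Istio CNI removes the need for a privileged init container in each pod and is required for ambient mode. There are some operational risks to be aware of, such as the race where a pod starts on a node before the CNI agent there is ready, but they have mostly been resolved in later versions.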

Networking

  • DNS Proxy: ✅ Adopt if needed ✅

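Istio's DNS Proxy feature offers a variety of substantial benefits. For example, it can resolve hostnames that are defined only in a ServiceEntry (which the cluster DNS server knows nothing about), and it reduces load on the cluster DNS server.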

This is especially important for ambient mode (where it is enabled by default) due to an increased reliance on DNS Auto Allocation (next section) for egress.
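Conceptually, the proxy sits in front of the cluster DNS server. Names the mesh already knows about are answered locally, and everything else is forwarded upstream. The Python sketch below models only that decision. `mesh_records` and the `upstream` callable are hypothetical stand-ins, and caching, TTLs, and record types are ignored.

```python
def resolve(name, mesh_records, upstream):
    """Toy model of a mesh DNS proxy lookup.

    mesh_records: {hostname: [IPs]} known to the mesh, including hosts that
    only exist as ServiceEntries and are unknown to the cluster DNS server.
    upstream: callable that forwards the query to the real DNS server.
    """
    name = name.rstrip(".").lower()
    if name in mesh_records:
        return mesh_records[name]  # answered locally, no upstream round trip
    return upstream(name)


upstream_calls = []


def fake_upstream(name):
    # Hypothetical upstream resolver that records every forwarded query.
    upstream_calls.append(name)
    return ["203.0.113.10"]


mesh_records = {"payments.internal": ["10.2.0.4"]}
print(resolve("payments.internal.", mesh_records, fake_upstream))  # ['10.2.0.4']
print(resolve("example.com", mesh_records, fake_upstream))  # ['203.0.113.10']
print(upstream_calls)  # ['example.com']
```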

In the past, I marked this as discouraged, on the basis that "It's always DNS". In practice, though, I cannot recall a single user issue tied to DNS proxying across many releases since a few reliability improvements were added. That makes me comfortable recommending it more broadly.

  • DNS Auto Allocation: ✅ Adopt if needed ✅

Prior to Istio 1.24, I marked this as "🛑 Avoid if possible 🛑". However, Istio 1.24 comes with a new implementation that solves a number of issues with the original one, making me a lot more comfortable with this feature.

Fundamentally, though, it returns bogus IP addresses to the application, which the proxy then translates. In theory, applications might do strange things when they receive bogus IP addresses. In practice, I haven't seen any applications behave unexpectedly as a result of this.
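Here is a toy Python model of the "bogus IP, translated by the proxy" idea. A host without a real address gets a stable placeholder IP, and the proxy maps connections to that IP back to the hostname. This is not how Istio actually allocates addresses. The 240.240.0.0/16 default is an assumption chosen for illustration, and the `AutoAllocator` class is hypothetical.

```python
import ipaddress


class AutoAllocator:
    """Toy model of DNS auto allocation (not Istio's actual algorithm).

    Hosts without a real address get a placeholder IP from a reserved range
    (240.240.0.0/16 here is an assumed example). DNS answers with that IP,
    and the proxy maps connections to it back to the original hostname.
    """

    def __init__(self, cidr="240.240.0.0/16"):
        self._free = ipaddress.ip_network(cidr).hosts()  # sequential iterator
        self._by_host = {}
        self._by_ip = {}

    def resolve(self, host):
        # The same hostname always gets the same placeholder address.
        if host not in self._by_host:
            ip = next(self._free)
            self._by_host[host] = ip
            self._by_ip[ip] = host
        return str(self._by_host[host])

    def original_host(self, ip):
        # What the proxy does when the application connects to the bogus IP.
        return self._by_ip[ipaddress.ip_address(ip)]


alloc = AutoAllocator()
ip = alloc.resolve("api.example.com")
print(ip)  # 240.240.0.1
print(alloc.original_host(ip))  # api.example.com
```

The application only ever sees 240.240.0.1. Any application logic that inspects the address itself, rather than just connecting to it, is where the theoretical strangeness could come from.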

  • Locality Load Balancing: ⚠️ Adopt with caution ⚠️

This is a great feature, but it is surprisingly hard to operate safely and correctly. Generally, though, the pros (avoiding expensive cross-zone cloud egress) outweigh the cons.

Be sure to watch out for workloads that are unevenly spread across localities, and remember that outlierDetection must be configured for locality failover to take effect.

In Istio 1.25+, the new Service trafficDistribution field can be used as a lightweight alternative that offers similar benefits without as much complexity.
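Here is a toy Python model of why both caveats matter. In it, healthy local endpoints are always preferred. The `pick_endpoints` helper and data layout are hypothetical. The `healthy` flag stands in for what outlier detection reports, and Envoy actually spills traffic over gradually rather than all-or-nothing as this sketch does.

```python
def pick_endpoints(endpoints, client_locality):
    """Toy model of locality-prioritized endpoint selection.

    endpoints: list of dicts with "ip", "locality", and "healthy" keys, where
    "healthy" stands in for what outlier detection reports. Healthy local
    endpoints are preferred; if none remain, traffic fails over to every
    other healthy endpoint. (Envoy spills over gradually; this is all-or-nothing.)
    """
    healthy = [e for e in endpoints if e["healthy"]]
    local = [e for e in healthy if e["locality"] == client_locality]
    return local or healthy


endpoints = [
    {"ip": "10.0.0.1", "locality": "us-east1-a", "healthy": True},
    {"ip": "10.0.1.1", "locality": "us-east1-b", "healthy": True},
    {"ip": "10.0.1.2", "locality": "us-east1-b", "healthy": True},
]
# One local pod absorbs all traffic from us-east1-a clients (imbalance risk).
print([e["ip"] for e in pick_endpoints(endpoints, "us-east1-a")])  # ['10.0.0.1']

# Only once it is marked unhealthy does traffic fail over.
endpoints[0]["healthy"] = False
print([e["ip"] for e in pick_endpoints(endpoints, "us-east1-a")])  # ['10.0.1.1', '10.0.1.2']
```

Without outlier detection, nothing flips `healthy` for a pod that still passes its readiness checks but fails requests. Traffic would then never leave the local zone.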

  • Egress Gateways: ✅ Adopt if needed ✅ if using ambient mode

With sidecar mode, egress gateways are so hard to set up correctly that I cannot recommend them. However, ambient mode comes with a complete rework of the feature that is dead-simple to use.

If you need to apply policies to traffic leaving the cluster, this is a good option.

Security

  • Mutual TLS: ✅ Adopt if needed ✅

This is, for many, the bread and butter of Istio. While any feature adds some risk, this is one of the most battle tested aspects of Istio. Better yet, it doesn't require any configuration to get started (though it can be improved with Authorization Policies).

  • Authorization Policies: ✅ Adopt if needed ✅

Overall these go very well with Mutual TLS, and are critical to building a secure mesh. One caveat is that, especially in gateways, configuration generated by these rules can add up to become quite expensive.
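Istio documents the evaluation order as CUSTOM, then DENY, then ALLOW. If no ALLOW policy applies to a workload, requests are allowed. If any ALLOW policy applies, at least one must match. The Python sketch below is a simplified model of that order. It ignores AUDIT policies, workload selectors, and how CUSTOM checks actually reach the external server, and the `is_allowed` helper is hypothetical.

```python
def is_allowed(policies, request, ext_authz=lambda req: True):
    """Simplified model of Istio's AuthorizationPolicy evaluation order.

    policies: list of (action, matches) pairs, where action is "CUSTOM",
    "DENY", or "ALLOW" and matches(request) -> bool.
    ext_authz: stand-in for the external authorizer used by CUSTOM policies.
    """
    # 1. CUSTOM: matching requests are checked by the external authorizer.
    for action, matches in policies:
        if action == "CUSTOM" and matches(request) and not ext_authz(request):
            return False
    # 2. DENY: any matching DENY policy rejects the request.
    if any(action == "DENY" and matches(request) for action, matches in policies):
        return False
    # 3. ALLOW: if no ALLOW policies exist, the request is allowed...
    allows = [matches for action, matches in policies if action == "ALLOW"]
    if not allows:
        return True
    # ...otherwise at least one ALLOW policy must match.
    return any(matches(request) for matches in allows)


FRONTEND = "cluster.local/ns/shop/sa/frontend"
policies = [
    ("DENY", lambda r: r["path"].startswith("/admin")),
    ("ALLOW", lambda r: r["principal"] == FRONTEND),
]
print(is_allowed(policies, {"path": "/cart", "principal": FRONTEND}))   # True
print(is_allowed(policies, {"path": "/admin", "principal": FRONTEND}))  # False
print(is_allowed(policies, {"path": "/cart", "principal": "cluster.local/ns/shop/sa/other"}))  # False
```

The principal is derived from the peer's mTLS certificate, which is why these policies pair so naturally with Mutual TLS.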

  • JWT Request Authentication: 🛑 Avoid if possible 🛑

While this feature has a pretty compelling use case, it comes with quite a bit of risk and runtime dependencies that are not fully understood or documented. Where possible, I would avoid this feature, or adopt it cautiously.

I should caveat this: I don't have a ton of real-world experience with this feature, so this could be more of a fear-based reaction than an educated one.

  • External Authorization: ⚠️ Adopt with caution ⚠️

This feature is incredibly powerful. However, it can be complex to use (if you are building a custom authorization server), and it adds a per-request runtime dependency, which introduces risk.

If you are considering this, I recommend first evaluating whether local authorization mechanisms, such as AuthorizationPolicy, are sufficient.
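The core of the runtime risk is that every request now waits on a remote call and can be blocked by it. When the authorization server is down, you must choose between denying everything and skipping the check. Here is a minimal Python sketch of that choice. `authorize`, `call_authz`, and `fail_open` are hypothetical names. Envoy's ext_authz filter exposes a comparable `failure_mode_allow` setting, but this sketch does not reproduce its API.

```python
def authorize(request, call_authz, fail_open=False):
    """Sketch of a per-request external authorization check.

    call_authz is a hypothetical client returning True/False, or raising
    when the authorization server is slow or unreachable.
    """
    try:
        return call_authz(request)
    except (TimeoutError, ConnectionError):
        # The dependency is down: deny everything (fail closed) or
        # skip the check entirely (fail open).
        return fail_open


def unreachable(request):
    raise ConnectionError("authz server unavailable")


print(authorize({"path": "/cart"}, unreachable))                  # False
print(authorize({"path": "/cart"}, unreachable, fail_open=True))  # True
```

Failing closed turns an authorization server outage into a full application outage. Failing open turns it into a silent security gap. Neither is free, which is why local mechanisms are worth checking first.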

Extensibility

  • WebAssembly (WASM): 🛑 Avoid if possible 🛑

WASM in Istio has lingered as "alpha" for quite a while, and it introduces risks around binary distribution, performance, and instability in the WASM runtime. While it's probably the best extension mechanism in Istio today (see below for a worse option), it's still best to avoid unless it's critically required at this point.

  • EnvoyFilter: ☠️ Avoid at all costs ☠️

EnvoyFilter is, objectively, the worst feature in Istio for stability. Essentially, it allows arbitrary patching of the Envoy configuration that Istio generates. An analogy would be handing a fast-moving project a git diff that is applied dynamically and recompiled; EnvoyFilter is only slightly more stable than that. In addition to the risk of breakage, particularly around upgrades, safe usage requires a deep understanding of Envoy, which is surprisingly hard to acquire.

That being said, EnvoyFilter brings great power along with it. I would urge you to resist the temptation to use it as much as reasonably possible.
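Here is a toy Python model of why upgrades are the danger zone: a patch that matches against generated configuration can silently stop applying when the generated shape changes. The filter names are made up, and the `insert_before` helper is hypothetical. Real EnvoyFilter matching, by listener, filter chain, and filter name with operations like INSERT_BEFORE, is far richer than a list lookup.

```python
def insert_before(filter_chain, target, new_filter):
    """Toy EnvoyFilter-style patch: insert new_filter before target.

    filter_chain is a list of filter names generated by the control plane.
    If target is not found, the chain is returned unchanged, with no error.
    """
    if target not in filter_chain:
        return list(filter_chain)  # patch silently does nothing
    i = filter_chain.index(target)
    return filter_chain[:i] + [new_filter] + filter_chain[i:]


# Hypothetical generated chains before and after an upgrade.
chain_v1 = ["authn", "stats", "router"]
chain_v2 = ["authn", "telemetry", "router"]  # "stats" was renamed

print(insert_before(chain_v1, "stats", "my-filter"))
# ['authn', 'my-filter', 'stats', 'router']
print(insert_before(chain_v2, "stats", "my-filter"))
# ['authn', 'telemetry', 'router']
```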

  • Rate Limiting: ☠️ Avoid at all costs ☠️

This feature is not particularly mature (it depends on EnvoyFilter!); best to avoid for now.

Observability

  • Tracing: ✅ Adopt if needed ✅
  • Metrics: ✅ Adopt if needed ✅
  • Access Logs: ✅ Adopt if needed ✅

All of these are extremely useful, stable, and pretty easy to use. Just keep in mind they aren't free. They add overhead in the proxy (in terms of memory usage and latency) as well as in your storage solution, which will vary considerably depending on how you store the data, of course.
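Since the storage side scales directly with traffic, a quick back-of-the-envelope estimate helps before turning on access logs everywhere. The inputs below are illustrative assumptions, not measurements. The estimate also ignores compression, replication, and the case where both the client-side and server-side proxies log the same request.

```python
def daily_log_bytes(requests_per_second, bytes_per_entry):
    """Back-of-the-envelope access log volume for one day."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return requests_per_second * bytes_per_entry * seconds_per_day


# Hypothetical inputs: 1,000 requests/s at roughly 500 bytes per log line.
volume = daily_log_bytes(1_000, 500)
print(volume / 1e9, "GB/day")  # 43.2 GB/day
```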

If you don't have existing infrastructure for these, I would generally recommend the following providers:

  • Tracing: OpenTelemetry Protocol.
  • Metrics: Prometheus (note: the Prometheus endpoints can also be scraped by other tools).
  • Access Logs: Log to stdout.