One thing I have started to pay close attention to when developing Istio is the product adoption curve.
I visualize this as a simple tradeoff of complexity vs value: how much complexity do you need to take on to get some amount of value.
Often, this problem is over-reduced to simple statements like "X is too complex" or "Y is simple".
This looks something like below:
Here we can see two ways a product can be sub-optimal that are not captured by just "simple" or "complex":
- The red product is too simple: you can get a good amount of value without much complexity. However, there is no way to maximize value; we hit a cliff at some point. In a text editor parallel, this may be something like `nano`.
- The purple product is too complex: while we can get incredible value, there is a ton of complexity we need to adopt to get it. In many cases, we need to introduce substantial complexity to even get any value. In a text editor parallel, this may be something like `vim` (though I don't think this comes close to the levels of complexity in some enterprise infrastructure software!).
Really what we want is to "make the simple things easy, and the difficult things possible". The goal doesn't always need to be to make every feature simple: power users are willing to invest substantial time into configuring and understanding a product if it gives substantial value in return. However, there does need to be a smooth curve to get users to that point.
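To make the three shapes concrete, here is a toy model of the curves described above. Everything in it is hypothetical: the function names, the 0 to 10 complexity scale, and the constants are invented purely to illustrate the shapes, not measured from any real product.

```python
# Toy adoption curves: value gained as a function of complexity taken on.
# All shapes and constants are made up for illustration only.

def too_simple(complexity: float) -> float:
    """Value rises quickly, then hits a ceiling: extra effort buys nothing."""
    return min(complexity, 3.0)

def too_complex(complexity: float) -> float:
    """No value until a large up-front investment, then high value."""
    if complexity < 6.0:
        return 0.0
    return 2.5 * (complexity - 6.0)

def smooth(complexity: float) -> float:
    """Each extra unit of complexity buys a matching amount of value."""
    return complexity

if __name__ == "__main__":
    print(f"{'complexity':>10} {'too_simple':>10} {'too_complex':>11} {'smooth':>6}")
    for c in range(0, 11, 2):
        print(f"{c:>10} {too_simple(c):>10.1f} {too_complex(c):>11.1f} {smooth(c):>6.1f}")
```

In this sketch the smooth curve keeps pace with the "too simple" product at low complexity and with the "too complex" product at high complexity, which is the property we are after.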
Istio ambient
In designing Istio ambient mode, changing our adoption curve was one of the critical design goals. Service mesh sits in a difficult product position in this regard. On one hand, everyone can benefit from the things a service mesh offers - as long as the complexity is worth it. On the other hand, service mesh introduces the opportunity to inject ~arbitrary behavior into the network path for all service communication: the sky is the limit on what enterprises will want to inject here.
Catering to both of these requires a delicate balance of complexity and value.
In ambient mode, we tackled this in a few ways.
First, we made the zero-to-value scenario dead simple. Usually, the starter feature for a user adopting service mesh is either mTLS encryption or observability. Traditionally, both of these required injecting a sidecar into all applications. This huge hump is a serious limit to adoption; the curve doesn't really even look like any curves above, but rather a discrete jump after some non-trivial complexity is onboarded (pictured below).
Injecting a sidecar also gives you a lot more than just mTLS and observability: things like HTTP load balancing, for instance. While these are generally desirable features, they also change the behavior of applications, further pushing out the adoption curve.
With ambient mode, we aimed to make the complexity required to get mTLS and observability everywhere as minimal as possible: Istio can simply be installed and enabled cluster-wide, dynamically enrolling existing workloads. This operation does a lot less than the full-blown service mesh sidecar (notably, there is no HTTP proxying), but also comes at a substantially lower cost.
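As a rough sketch of how small that enrollment step is: to my understanding, workloads are enrolled by putting the `istio.io/dataplane-mode: ambient` label on their namespace. The helper name `ambient_namespace` and the namespace `default` below are placeholders, and the snippet only builds the manifest; check the label against the Istio version you run.

```python
import json

def ambient_namespace(name: str) -> dict:
    """Build a Namespace manifest labeled for ambient mode (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Assumed enrollment label; verify against your Istio version.
            "labels": {"istio.io/dataplane-mode": "ambient"},
        },
    }

if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(ambient_namespace("default"), indent=2))
```

Note what is absent: no per-workload configuration and no pod restarts to inject a proxy. The whole change is one label.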
Once we onboard, though, we still need to avoid becoming the "too simple" curve. And, we don't just want a discrete jump from "zero" to "simple" to "complex"; we want a smooth curve where users can incrementally bring on complexity and get corresponding value.
This is where waypoint proxies come in. With waypoint proxies, we give users the ability to pull in the full service-mesh feature set on a per-service basis.
This meaningfully differs from the previous sidecar-based architecture. With sidecars, about 75% of features are implemented on the client side; the rest generally require sidecars on both the client and server. This means that if I want to do a canary rollout, for instance, I need to first deploy Istio to all of my clients. This could be thousands of workloads owned by different teams!
With ambient, all workloads in the cluster can easily be enrolled for the basic functionality (mTLS and observability) without having to deal with individual workloads owned by other teams. When we want advanced functionality, that is implemented by waypoints created and managed by the service producers. This is critical to the adoption curve: as an app developer, I no longer need to go convince the rest of my company to adopt a feature; I can simply adopt it myself. In other words, the same individuals taking on the complexity are also the ones getting the value.
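Here is a minimal sketch of what that producer-side opt-in can look like, assuming the Kubernetes Gateway API shape that I understand Istio uses for waypoints (`gatewayClassName: istio-waypoint`, an HBONE listener on port 15008) and the `istio.io/use-waypoint` label. The helper names and the `team-a` namespace are hypothetical, and the fields should be verified against your Istio version.

```python
import json

def waypoint_gateway(namespace: str, name: str = "waypoint") -> dict:
    """Gateway resource asking Istio to run a waypoint proxy (assumed shape)."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"istio.io/waypoint-for": "service"},
        },
        "spec": {
            "gatewayClassName": "istio-waypoint",
            "listeners": [{"name": "mesh", "port": 15008, "protocol": "HBONE"}],
        },
    }

def use_waypoint_patch(waypoint_name: str = "waypoint") -> dict:
    """Merge patch that opts a single Service into the named waypoint."""
    return {"metadata": {"labels": {"istio.io/use-waypoint": waypoint_name}}}

if __name__ == "__main__":
    # The service owner applies both objects in their own namespace;
    # nothing changes for the client workloads owned by other teams.
    print(json.dumps(waypoint_gateway("team-a"), indent=2))
    print(json.dumps(use_waypoint_patch()))
```

The second object is meant to be applied as a merge patch to one Service, for example with `kubectl patch service <name> -n team-a --type merge -p '<patch>'`, so advanced features are adopted one service at a time.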
Putting all of this together, I feel we have offered a very smooth adoption curve for Istio users. Try it out and let me know if you disagree!