Infrastructure projects generally fall somewhere on a range from simple to complex. On one extreme, we have products that can be used off the shelf; these are typically fairly simple and opinionated. On the other extreme, we have "platforms for building platforms"; they may require days or months to set up, but offer immense customization - rather than being standalone products on their own, these are the building blocks for products.

In theory, you can do both well -- make the easy things easy, and the difficult things possible -- but in practice this is tricky, as the two often are directly in contention with each other.

I'll make up some terms and call these standalone products vs extensibility platforms.

Current landscape

Most projects are somewhere in between these two extremes:

  • Kubernetes on its own I would put pretty far into the extensibility platform category - and it does an amazing job at this. Few people are doing Kubernetes the hard way (and if they are... maybe they should rethink their strategy). Interestingly, though, the ecosystem around it is vast enough that you can really consume Kubernetes pretty much anywhere along the spectrum. With something like kubeadm or k3s you are pretty close to raw Kubernetes but with a feasible operational overhead. With higher level tools like GKE AutoPilot, Knative, etc you are getting closer to standalone product territory. On top of that, there are tons of increasingly high-level abstractions over the Kubernetes primitives.
  • Envoy is extremely on the extensibility platform side. It's mostly unusable without a control plane to drive it, and the configuration is extremely unfriendly to humans. In my mind, it's almost more of a declarative programming language for proxies.
  • Linkerd, on the other hand, is more of a standalone product. It is opinionated, ready to go out of the box, and has a fairly small configuration surface. That being said, it has slowly added features over the years while trying to maintain its standalone-product value.
  • Istio seems to straddle both. You can simply istioctl install and be on your way. On top of that, there is a pretty vast configuration surface allowing Istio to be used as a platform for platforms, and this is a common pattern. However, I feel that both aspects suffer from not having a clear vision for where it falls on this spectrum. The easy product usage is blurred with the complex use cases, and the complex use cases are hard to use and/or insufficiently flexible in an attempt to not overly complicate the easy product usage. In the end, both personas have a suboptimal experience.

In this post, I will focus on what it would look like to build a service mesh that focuses exclusively on extensibility and being a platform for platforms. The inverse, an extreme focus on simplicity, is much less interesting and has far more prior art.

Existing Extensibility

Before we build our own extensibility platform, it's good to look at the prior art.

First-class configuration is the most common form of extensibility, and typically the easiest to consume. For example, Kubernetes has a Service API. Envoy has a variety of configurations to set up Listeners, Routes, etc. Linux has routing tables.

While these are easy to use, they are inherently opinionated in some form. If you need to build something outside of these opinions, you usually either cannot do it, or need to resort to arcane abuses of the configuration.

For example, Envoy has a pretty opinionated request flow. The opinion is reasonable and covers the vast majority of proxy use cases, but not all of them. In some cases, we may actually want to traverse this flow multiple times. Envoy actually provides a first-class way to send a request back to itself to cover this use case. While it's great this is possible, it's a signal that we are pushing the boundaries of where opinionated configuration can be useful, and may benefit from deep extensibility.

Plugin models are the next level of extensibility. This could be WASM, Lua, linked-in libraries, eBPF, etc. All of these allow running user code (generally with some limits) as part of the product.

However, just because these allow executing (possibly arbitrary) user code, this generally doesn't mean there is complete extensibility. Like with configuration, there are still limits determined by the ABI between the application and the plugins. For instance, the plugin will be called at certain times in the application lifecycle with certain data, and be able to mutate certain things.
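For a sense of what that ABI shape might look like, here is a small sketch; the interface, hook names, and Action type are all made up for illustration, not any real plugin API:

// The plugin is invoked at fixed points in the request lifecycle, and can only
// see and mutate what the ABI chooses to expose -- here, just the headers.
// This is exactly where the built-in opinions live.
type HTTPFilter interface {
  OnRequestHeaders(headers http.Header) Action
  OnResponseHeaders(headers http.Header) Action
}

// Action tells the host how to proceed once a hook returns.
type Action int

const (
  Continue Action = iota
  Deny
)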

If the ABI's opinions don't meet the user's needs, we run into the same problems as with configuration. That being said, it's much easier to design an ABI that meets all use cases than a configuration API.

An often worse form of this is a service-callout mechanism. Rather than embedding code into the application, the application simply makes a call to another service. This has the same ABI limitations, but has a massive performance overhead, and introduces reliability issues. This is, however, required in some cases where local processing is unfeasible.
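From the data path's perspective, a callout looks roughly like the sketch below, in the spirit of mechanisms like Envoy's ext_authz; the URL, payload, and decision service are made up:

func allowed(req *http.Request) bool {
  // Every data-plane request now pays for an extra network round trip.
  // (A real callout would forward request metadata in the body.)
  resp, err := http.Post("http://authz.internal/check", "application/json", nil)
  if err != nil {
    // An unreachable callout service becomes either a data-plane outage or a
    // fail-open security hole, depending on which default you choose.
    return false
  }
  defer resp.Body.Close()
  return resp.StatusCode == http.StatusOK
}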

Service Mesh as a Library

The existing extensibility models are fundamentally limited as they revolve around plugging user behavior into an existing application. This is inherently opinionated and limited.

If we invert this model, we can achieve limitless extensibility. Instead of plugging the user behavior into the infrastructure, we can plug the infrastructure into the user application.

This is not exactly a new idea. Any programming language meets this model: Go gives you net.Listen and net.Dial, and you are free to build your own service mesh on top of these with whatever functionality you want. However, this basically means every user is implementing an entire service mesh, which is not feasible for the vast majority of users.
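For instance, a bare-bones TCP forwarder on top of the standard library might look like the following; everything a mesh normally provides (mTLS, service discovery, policy, retries, telemetry) would still have to be built by hand on top of it, and the addresses here are placeholders:

package main

import (
  "io"
  "log"
  "net"
)

func main() {
  l, err := net.Listen("tcp", ":15006")
  if err != nil {
    log.Fatal(err)
  }
  for {
    downstream, err := l.Accept()
    if err != nil {
      log.Fatal(err)
    }
    go func() {
      defer downstream.Close()
      // Where to dial, whether to use mTLS, which policies apply -- all of
      // this is now the application author's problem.
      upstream, err := net.Dial("tcp", "127.0.0.1:8080")
      if err != nil {
        return
      }
      defer upstream.Close()
      go io.Copy(upstream, downstream)
      io.Copy(downstream, upstream)
    }()
  }
}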

Our goal is to achieve this level of customization, but with dramatically reduced barriers to entry.

Library APIs

To build out a platform that can be used easily for simple cases, but also be infinitely customizable, we need a cohesive mix of high and low level APIs.

For example, at the highest level, an application could look like:

func main() {
  highlevelapi.RunSidecar()
}

A user can compile this code and run it wherever they like. It is a little more friction than a standard off-the-shelf product you can install with helm install, but really not dramatically so.

Building blocks

As building blocks, we need to provide basic primitives.

There are two unique things we want in these primitives to build an effective service mesh:

  • Easy access to dynamic data. We will want to have up-to-date access to core types like Services and Endpoints. We may also want access to first-class APIs defined by the project, such as an AuthorizationPolicy (if we use Istio APIs). We may additionally want user-defined configuration objects (more on this later).

  • Declarative networking APIs. We don't want to write code like:

    l, _ := net.Listen("tcp", ":15006")
    for {
      downstream, _ := l.Accept()
      upstream, _ := net.Dial("tcp", "127.0.0.1:8080")
      go io.Copy(downstream, upstream)
      go io.Copy(upstream, downstream)
    }
    

    This imperative model makes it difficult to handle dynamic data. For example, if we want to listen on a bunch of ports defined by Service objects, reconciling that in this model is extremely hard. The code would need to subscribe to these changes and manually synchronize them with the set of listeners -- quite challenging. While we should allow this style (remember: we want unlimited extensibility), it shouldn't be encouraged.

    Instead, we can provide a more declarative model. For example:

    listeners.RunForEach(func() (res []Listener) {
      for _, s := range Services.List() {
        res = append(res, Listener{Address: s.IP, Action: AcceptConnection})
      }
      return
    })
    

    This could dynamically watch changes to Services, and reconcile the set of listeners under the hood for us automatically.

Some example building blocks we would have:

  • Fetch a workload x509 certificate.
  • Fetch a collection of x509 certificates, based on some dynamic inputs.
  • Build a simple listener.
  • Build a collection of listeners from some dynamic inputs.
  • Build a collection of listeners from some dynamic inputs, terminating TLS with some certificates fetched based on some dynamic inputs.

The above obviously captures only a trivial amount of the surface area that would be required, but it should show how we can build up a set of increasingly high-level building blocks. Users can pick the layer they need to target. At the extremely low level we have the programming language standard library, while at the extremely high level we have something like RunSidecar(); most usage would fall in between those.
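For concreteness, here is one possible shape for those layers, written as an interface; every name here is made up for illustration, and Collection is the dynamic data primitive introduced in the next section:

type BuildingBlocks interface {
  // Low level: a single certificate or a single static listener.
  FetchWorkloadCert() (Cert, error)
  NewListener(addr string, h Handler) (Listener, error)

  // Mid level: collections derived from dynamic inputs.
  FetchCertsFor(svcs Collection[Service]) Collection[Cert]
  ListenersFor(svcs Collection[Service], h Handler) Collection[Listener]

  // High level: listeners from dynamic inputs, terminating TLS with
  // certificates that are themselves fetched from dynamic inputs.
  TLSListenersFor(svcs Collection[Service], certs Collection[Cert], h Handler) Collection[Listener]
}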

Dynamic data

One of the more powerful aspects of a service mesh is its dynamic nature. Aside from dynamic service discovery information, dynamic configuration is generally allowed as well. This makes it much easier to operate -- instead of deploying and rolling out a new application version, we simply tweak some Kubernetes object and it is instantly applied.

While this may seem counter to putting everything in code, it isn't. In fact, the two pair extremely well.

One of the biggest issues with configuration models is they are either too complex to understand, or too simple to meet use cases easily. For instance, Istio offers an API to configure a connection limit on a per-service basis. This is great if you want to configure one service's limits, but if you want to set the same default limit for all services, you suddenly have 1 DestinationRule per Service. Under the hood, this translates to 1 connection_pool in Envoy per cluster. The inefficiencies and complexity here can be staggering. If we already know we want the same connection limit for all services and never intend to change this, we ought to build this directly into the product and avoid duplicating the same configuration over and over again, which has high cost everywhere (storage, networking, CPU, human operational costs, etc).
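With the library approach, "building it into the product" can be as small as the sketch below, reusing the hypothetical listener API from earlier (MaxConnections is a made-up field): one constant in code replaces one DestinationRule per Service.

const defaultConnectionLimit = 1024

listeners.RunForEach(func() (res []Listener) {
  for _, s := range Services.List() {
    res = append(res, Listener{
      Address:        s.IP,
      MaxConnections: defaultConnectionLimit,
    })
  }
  return
})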

A real world example of this can be found in Istio. In Envoy, we need to configure the same TLS settings possibly 1000s of times, because it is generic. Envoy allows you to use arbitrary ciphers, TLS versions, verification methods, etc - so we need to configure it each time. In Istio, we always do the same thing in these cases. In ztunnel, we instead just replaced this with protocol: TLS. However, this optimization was only possible because the mainline project was opinionated in this regard - which directly goes against extensibility, in favor of a better product.

What we want is to let users make the decisions on where to be opinionated. This can be done by giving them dynamic data building blocks.

Dynamic data core

Fundamentally, we want the application to have access to some dynamic data. This could be exposed over some interface like:

type Collection[T any] interface {
  Get(key Key) *T
  List() []T
  Watch(Handler)
}

Under the hood, the library can handle dynamically reading this data. The source of this data could be from a control plane (XDS, Kubernetes, etc), local files, or anything else the user dreams up. This is similar to Kubernetes Informers. I will mostly focus on fetching data from a control plane over XDS, as this is closer to the current model for Istio.
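Usage might look something like the following; the collections package, its Subscribe constructor, and the XDS wiring are all hypothetical, mirroring the end-to-end example at the end of this post:

// Subscribe to Services, backed by an XDS connection to the control plane.
services := collections.Subscribe[Service](collections.FromXDS(controlPlaneAddr))

// React to changes as they arrive; the library keeps the local cache in sync.
services.Watch(func(s Service) {
  log.Println("service updated:", s.Name)
})

// Or read the current state at any point.
for _, s := range services.List() {
  log.Println("known service:", s.Name)
}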

With this model, the baseline would be to expose standard types. This could be core Kubernetes objects (Service), Kubernetes extensions (ServiceExport, HTTPRoute), or mesh-specific types (AuthorizationPolicy). However, these alone are not sufficient.

Perhaps our proxy only needs to know the number of AuthorizationPolicy in the cluster (seems weird, but if we are going to be really extensible, weird is normal). Sending the entire list of AuthorizationPolicy is incredibly wasteful.

Above we discussed a global default for connection limits. Simply exposing DestinationRule doesn't help here -- the user still needs to configure 1000s of them, and we send them all over the wire. But the user may also want the default to be dynamic. Ideally we could send just that single integer value, but dynamically.

Both of these cases can be done by allowing the users to create their own custom types.

Custom types

Users' custom types may be completely standalone types they define, or server-side aggregations. A custom type may be something like WAFConfig. An aggregation may be "number of AuthorizationPolicies".
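On the proxy side, these could look just like the built-in types; the type names and fields below are made-up examples, and they would be consumed through the same Collection/Subscribe machinery as the standard types:

// A standalone user-defined type.
type WAFConfig struct {
  RuleSet string
  Mode    string // e.g. "monitor" or "block"
}

// A server-side aggregation: the control plane sends one tiny object instead
// of streaming every AuthorizationPolicy to every proxy.
type PolicyCount struct {
  Count int
}

// The global default from earlier: a single integer, delivered dynamically.
type MeshDefaults struct {
  ConnectionLimit int
}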

At the basic level, we could just make the user manually define these as CRDs, and build their own control plane to handle these. However, like with the proxy libraries, I think we could build out similar control plane libraries to do the same on that end.

For some types, this is trivial. For example, if we just want to pipe all WAFConfig objects from Kubernetes to XDS, it's pretty simple. We may also want to layer on other things like authorization rules, filtering, etc.
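A control-plane-side sketch of that pipe, assuming a hypothetical control plane library (here called cpl); none of these functions exist today:

// Watch the WAFConfig CRD in Kubernetes...
wafConfigs := cpl.WatchCRD[WAFConfig](kubeClient)

// ...and serve it to connected proxies over XDS. Authorization and filtering
// hooks could slot in here before objects go out on the wire.
cpl.ServeXDS(wafConfigs)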

Aggregations, of course, entail some more logic.

If we wanted to go crazy, we could define this logic in the proxy library directly, like React Server Components, and use this to build the control plane as well -- but this is probably a bad idea.

Crossplane has a pretty extensive custom type strategy that could likely serve as inspiration.

Proxy types

In a previous post, I discussed the various types of proxy deployments and the tradeoffs each one makes. This "Service Mesh as a Library" pattern could be used to power any of these deployment modes -- possibly with very little overhead to move between them.

This would allow users to choose the optimal deployment method for their application. For instance, they may have a database that requires maximum performance; compiling the mesh directly into their application can help achieve that. They may also have a legacy application they cannot modify, where they would prefer a sidecar deployment. With some effort, it's plausible that the same (or mostly the same) code could be shared between these disparate use cases.

Prior Art

This idea is not entirely new. Caddy actually does most of what is discussed here. While it offers prebuilt binaries with an opinionated set of functionality, it encourages building from source to add arbitrary plugins and has tooling to make this simple. There is even a website to pick the plugins you want and get a single static binary with the appropriate plugins. Note these plugins are not simply restricted to middleware and can really extend to full applications.

A lot of what Caddy does here overlaps substantially with these ideas. It's plausible a real implementation of this idea could be built on top of Caddy, or at least reuse large portions of it.

End To End Example

Below is a possible end-to-end example of implementing a simple server-side sidecar mesh. Note the code is very much pseudo-code.

Our intended behavior is:

  • Redirect all traffic coming into the pod to our application.
  • Accept traffic only on ports where we have a Service, then apply authorization policies based on those Services.

func main() {
  // Information about where we are running is easily accessible. Here we get the pod labels.
  app := environment.Labels["app"]
  // Setup IPTables rules
  redirection.SetupRules({
    Inbound: redirection.RedirectTo({Port: 15006})
  })
  // Create a listener. Here we just need a static one.
  listener := NewListener({
    Address: "0.0.0.0:15006",
    Handler: HandleInbound(redirection.ExtractOriginalDestination())
  })
  // Find Services that select our app, and authorization policies that apply to any of those Services.
  // These subscriptions stay up to date automatically and are exposed to handlers via the connection context.
  services := collections.Subscribe[Service]({Selector: "app=" + app})
  authorizationPolicies := collections.Subscribe[AuthorizationPolicies]({Selector: MatchesAny(services)})
  application.Run()
}

func HandleInbound(ctx Connection) {
  svc, found := FindService(ctx.List[Service](), ctx.DestinationPort())
  if !found {
    return CloseConnection()
  }
  policies := ctx.List[AuthorizationPolicies]({Selector: Matches(svc)})
  if !Allowed(ctx, policies) {
    return CloseConnection()
  }
  destination := net.Dial(ctx.Destination())
  return BidirectionalCopy(ctx.Source(), destination)
}