The common messaging around Istio Ambient Mesh is that it is a "node proxy."
For example, from The New Stack:
... architecture that moves the proxy functionality from the pod-level to the node-level.
While this is technically accurate, it is misleading and misses the point, and the benefits, of Ambient.
A brief history of service mesh architectures
This account skips quite a bit of detail, but it is close enough.
One of the earlier service meshes on the market was Linkerd 1 - not to be confused with Linkerd 2, which most people just call "Linkerd" today. Linkerd 1 was a per-node proxy that did all the service mesh functionality we know and love, at the node level.
After this, Istio 0.1 was announced with a sidecar architecture. Shortly after, due to concerns with its original per-node architecture, Linkerd 2 (formerly named "Conduit") was introduced, using the sidecar architecture as well.
Many years later, Cilium Service Mesh was introduced, which moves back to the original Linkerd 1 architecture: all functionality is now handled by a per-node (Envoy) proxy.
The diagram below shows these two architectures.
Remote proxy, not node proxy
Unlike other "per-node" architectures, Istio ambient is not really about moving the service mesh proxy between the pod and the node. Rather, it is about removing the service mesh proxy from the node entirely.
This is the critical part of ambient, what makes it so powerful, and what sets it apart from previous service mesh architectures.
By running the proxy remotely, we unlock the ability to treat the proxy as just another workload. It can be upgraded without downtime and scaled horizontally and vertically (indefinitely) with ease. It can even run outside the cluster, if you want.
Isn't there still a proxy on the node, though?
It is true: there is still a per-node component in ambient mesh ("ztunnel"). However, it is not comparable to a traditional service mesh proxy; it is more similar to something like kube-proxy. The per-node component has a tightly scoped responsibility: to secure the transport between pods and remote proxies.
Slicing the layers discusses this more.
A more complete model of the architecture would be as follows:
The "Node Network Infra", which traditionally consists of some combination of the Linux networking stack, iptables, eBPF, etc., is expanded to also include the ztunnel component in ambient mesh. However, this is a minor implementation detail, and not the core focus of what ambient mesh is.
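As a rough illustration of this model, the sketch below writes down the request path in each architecture and reports where HTTP processing runs. The hop names and the `where_http_runs` helper are labels made up for this example, not Istio APIs, and the paths are simplified from the description in this post.

```python
# Each hop is (name, location, handles_http).
# Locations: "pod" (inside the application pod), "node" (shared per-node
# component), "remote" (a separate workload reached over the network).
# All names are illustrative, not Istio API objects.
PATHS = {
    "sidecar": [
        ("client app", "pod", False),
        ("client sidecar proxy", "pod", True),
        ("server sidecar proxy", "pod", True),
        ("server app", "pod", False),
    ],
    "node proxy": [
        ("client app", "pod", False),
        ("per-node proxy", "node", True),
        ("server app", "pod", False),
    ],
    "ambient": [
        ("client app", "pod", False),
        ("ztunnel (node network infra)", "node", False),
        ("remote proxy", "remote", True),
        ("ztunnel (node network infra)", "node", False),
        ("server app", "pod", False),
    ],
}


def where_http_runs(path):
    """Return the set of locations that do HTTP processing on this path."""
    return {location for _, location, handles_http in path if handles_http}


for name, path in PATHS.items():
    print(name, "->", sorted(where_http_runs(path)))
```

In this simplified view the node still appears on the ambient path, but only as transport; the HTTP work sits in a workload that can be placed and scaled independently of any node.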
Why not a node proxy?
The architecture of ambient was very explicitly chosen to mitigate the issues that have been found with running both "node proxy" and "sidecar proxy" architectures in real-world production environments.
In particular, a node proxy has a number of concerns. HTTP processing accounts for most of the resource costs, complexity, CVEs, and multi-tenancy concerns in service meshes. By stripping all HTTP processing from the node and keeping the node component bare bones, many of the concerns of a node proxy are mitigated.
Scaling a DaemonSet is also really hard. With some workloads, a per-node proxy could easily consume >50% of the CPU on a node.
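One way to see the scaling problem: a DaemonSet runs one pod per node from a single pod template, so every per-node proxy gets the same CPU request, and that request has to cover the busiest node. A pool of remote proxies only has to cover the total load. The sketch below compares the two with hypothetical numbers; the loads, the 1000m replica size, and the function names are assumptions for illustration, and real sizing would also involve headroom, memory, and failure domains.

```python
import math


def daemonset_cpu_millicores(per_node_load):
    """One proxy per node, all with the same request: size for the busiest node."""
    return len(per_node_load) * max(per_node_load)


def pooled_cpu_millicores(per_node_load, replica_size=1000):
    """Shared remote proxies: size for total load, rounded up to whole replicas."""
    replicas = math.ceil(sum(per_node_load) / replica_size)
    return replicas * replica_size


# Hypothetical proxy CPU demand per node, in millicores.
# Three quiet nodes and one node hosting a very chatty workload.
load = [100, 100, 200, 4000]

print("per-node proxies:", daemonset_cpu_millicores(load), "m reserved")
print("remote proxy pool:", pooled_cpu_millicores(load), "m reserved")
```

With these made-up loads, the per-node approach reserves 16000m to serve 4400m of demand, while the pooled approach reserves 5000m. The gap grows with the skew between nodes, which is the situation where a single per-node proxy ends up dominating a node's CPU.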
More details can be found in the introduction blog post.
- Running a full per-node HTTP service mesh proxy is problematic beyond "demo" scale. I would expect products taking this approach to run into issues if they start to see larger adoption.
- Sidecars mitigate a lot of these concerns, but bring a new set of issues.
- Istio ambient mode is not a per-node HTTP service mesh proxy, but rather a "remote" HTTP service mesh proxy.