When trying to understand the Istio ambient mode architecture, one common misunderstanding is how Ztunnel handles traffic.
Because it is deployed as a per-node pod (a DaemonSet), the assumption is often that it behaves like a traditional proxy running on a node would: traffic leaves the application pod, traverses the network stack into the Ztunnel pod, and is handled from there.
This usually comes with a set of concerns like "Is traffic between pods on the same node encrypted?" and "Is traffic traversing the node unencrypted before it reaches Ztunnel?".
Fortunately, the reality of the implementation avoids these issues, but it also makes the architecture a bit more complicated to understand. I like to look at the architecture through two distinct views: the compute view and the networking view.
Compute view
The compute level view is the simplest to understand.
If we look at a given node, we have 1 Ztunnel Pod, which runs 1 Ztunnel container, which has 1 Ztunnel process.
The node, of course, may have a variety of other Pods also running on it.
Networking view
Based on the compute view, it is natural to assume the networking paths look like this:
In this model, traffic flows across the node unencrypted to Ztunnel. From there it remains unencrypted when connecting to pods on the same node, or is mTLS encrypted for cross-node traffic.
However, this is not correct. The reality looks something more like this:
This looks a lot like the sidecar service mesh architecture. Mesh traffic is handled entirely within the application Pod, entering and leaving the pod as mTLS traffic regardless of whether the destination is on the same node or another node.
Under the hood
This split view gives us the best of both worlds: we get the resource savings and operational benefits of a per-node instance, but the desired semantics of a sidecar. The innovation that enables this split view lies in some networking shenanigans.
In Kubernetes, each pod gets its own network namespace. Typically, a process spawned within a pod simply runs in this network namespace throughout its lifetime, totally unaware that it is in a network namespace at all. However, Linux provides the ability (with sufficient privileges) to change namespaces.
Ztunnel utilizes this to temporarily enter the network namespace of each pod it manages and open listening sockets within it. So if there were two other pods on the node, Ztunnel would have 2 different listening sockets, one for each pod.
Ztunnel can then concurrently accept connections from both of these pods, despite it being only a single process.
Outbound connections from Ztunnel follow a similar pattern: Ztunnel first enters the network namespace of the pod the connection should originate from, and opens the socket there.
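To make the mechanism a bit more concrete, here is a minimal sketch of the same Linux trick in Python. It is not Ztunnel's code (Ztunnel is written in Rust), only an illustration of the underlying behavior: a socket belongs to whichever network namespace the thread was in when the socket was created, and it stays there after the thread switches back. The helper names (`in_netns`, `listen_in_netns`, `connect_from_netns`, `accept_ready`), the loopback bind address, and the `enter` parameter are all assumptions of mine for illustration. Entering another namespace with `setns` requires root or equivalent capabilities.

```python
import contextlib
import ctypes
import os
import selectors
import socket

CLONE_NEWNET = 0x40000000  # setns(2) flag selecting a network namespace


def _setns(fd):
    """Move the calling thread into the network namespace referred to by fd."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, CLONE_NEWNET) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


@contextlib.contextmanager
def in_netns(path):
    """Temporarily switch this thread into the namespace at `path` (Linux only)."""
    original = os.open("/proc/thread-self/ns/net", os.O_RDONLY)
    try:
        target = os.open(path, os.O_RDONLY)
        try:
            _setns(target)
        finally:
            os.close(target)
        try:
            yield
        finally:
            _setns(original)  # always return to the namespace we started in
    finally:
        os.close(original)


def listen_in_netns(path, port, enter=in_netns):
    """Open a listening socket that lives in the namespace at `path`."""
    with enter(path):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))  # illustrative bind address
        sock.listen()
    sock.setblocking(False)
    return sock


def connect_from_netns(path, address, enter=in_netns, timeout=5.0):
    """Open an outbound connection whose socket lives in the namespace at `path`."""
    with enter(path):
        # Only socket creation has to happen inside the namespace.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    sock.connect(address)
    return sock


def accept_ready(listeners, timeout=1.0):
    """Accept one pending connection from every listener that is ready."""
    accepted = []
    with selectors.DefaultSelector() as sel:
        for name, sock in listeners.items():
            sel.register(sock, selectors.EVENT_READ, data=name)
        for key, _events in sel.select(timeout):
            conn, _addr = key.fileobj.accept()
            accepted.append((name_of(key), conn))
    return accepted


def name_of(key):
    """Return the label that was registered alongside a listener."""
    return key.data
```

With enough privileges, a call might look like `listen_in_netns("/proc/<pid>/ns/net", port)`, where `<pid>` is any process running inside the target pod. Repeating that for every pod on the node gives one process a listener inside each pod, and a single `accept_ready` loop can then serve all of them. The `enter` parameter exists only so the socket logic can be exercised without root by swapping in a no-op context manager.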
See my talk for a more in-depth discussion.