"Zero Trust" is both a very misleading buzzword and a great idea (in that regard, it shares at least 1 attribute with "Serverless"). Despite the confusing name, the principle is sound - nothing should be implicitly trusted (without verifying it), and privileges should be reduced to a bare minimum.

This is in contrast with more traditional network security approaches, which enforce a strong perimeter at the edge of the network but allow unhindered access within it (this is the NSA's favorite security model).

The key to zero trust is not removing all access from everyone, but tightly controlling access and verifying, at every layer, before access is given. For instance, the Kubernetes Deployment controller must have access to write Pod objects; it is inherent to its functionality.

However, we should:

  • Verify, on each request, that it is actually the deployment controller trying to write to the Pod, not something else.
  • Limit this access only to the deployment controller, rather than giving anyone Pod write access.
  • Understand that, with this permission, this is a highly privileged component that needs to be tightly secured.
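On clusters where the controller manager runs each controller with its own service account (for example kubeadm's default --use-service-account-credentials=true), the first two points are already handled for the deployment controller: each request is authenticated as a dedicated ServiceAccount, and the bootstrap RBAC binds it to a ClusterRole scoped to what it needs. You can inspect this yourself (the exact rules vary by Kubernetes version):

# Bootstrap RBAC scopes the deployment controller to a dedicated ClusterRole/ClusterRoleBinding
$ kubectl describe clusterrole system:controller:deployment-controller
$ kubectl describe clusterrolebinding system:controller:deployment-controller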

In my experience, most of the above is fairly well understood. People are usually doing a pretty good job controlling which Pods can access which things with RBAC for Kubernetes resources. With this, they are able to restrict highly privileged access to only a small subset of applications (things like the certificate authority, GitOps control plane, service mesh control plane, etc).
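A quick way to spot-check that posture is to impersonate a workload's ServiceAccount and ask the API server what it is actually allowed to do (the namespaces and service account names here are illustrative):

# Impersonate a ServiceAccount and list its effective permissions (illustrative names)
$ kubectl auth can-i create pods --as=system:serviceaccount:apps:untrusted-user-workload
$ kubectl auth can-i --list --as=system:serviceaccount:gitops:gitops-controller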

The dark secret of many of these clusters is unintentionally implicitly trusting every node in the cluster as a highly privileged component. Subtle configuration gaps can easily turn a locked down environment into one where lateral movement can trivially escalate a compromised node into a total environment-wide compromise.

Goal: stop lateral movement

While it would be great to stop any security compromise, short of unplugging your machine, it's always a risk. As such, a key component of a secure environment is preventing lateral movement. Virtually every compromise these days involves chaining a series of attacks. For instance, an attacker may get remote code execution (RCE) in a pod and chain this with a container breakout to get privileges on the node. From there, they can escalate to a cluster-wide compromise (more on this soon!).

In my experience, this is the biggest gap in most Kubernetes users' security postures -- allowing a node compromise to escalate to a cluster compromise.

The Pod <--> Node boundary is fairly weak, and most attacks start by compromising a user's application. These may be poorly written, use outdated dependencies, etc; given a large cluster may run thousands of diverse applications, there is always a weak link somewhere. In some cases, these are even running arbitrary untrusted code.

Node Escalations

Once an attacker has control of a node, they generally have a lot of power. Container boundaries are meaningless -- from the host there is unmitigated access to any pods/containers running on the same node. This includes the pod filesystem, network access, environment variables, and even access to the process memory space.

Importantly, this includes any credentials any Pods on the node have access to. Essentially, compromising the node is equivalent to compromising every pod on that node: the attacker gains the union of all of those pods' access.
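To make that concrete: on a typical node the kubelet keeps every co-located Pod's projected service account token on the local filesystem, readable by root on the host (the path below is the standard kubelet layout, shown here on a kind node; other distributions may differ):

# From the host, list the mounted service account token of every Pod on this node
$ docker exec kind-worker2 sh -c 'ls /var/lib/kubelet/pods/*/volumes/kubernetes.io~projected/kube-api-access-*/token'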

This means if I ran untrusted-user-workload on the same node as gitops-controller (highly privileged), an attacker could easily move laterally from one to the other.

Based on this, to prevent lateral movement, we should identify our highly privileged workloads and isolate them on dedicated node pools. This is easier said than done, though. Subtle configuration choices can undermine our posture; below are some examples of how these issues can happen and be avoided.
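The basic mechanics of isolation are straightforward, sketched below with illustrative node, label, and taint names: dedicate a tainted, labeled pool to the privileged workloads and keep everything else off of it.

# Reserve a pool for privileged workloads (node, label, and taint names are illustrative)
$ kubectl label nodes privileged-pool-node-1 pool=privileged
$ kubectl taint nodes privileged-pool-node-1 dedicated=privileged:NoSchedule

The privileged workloads (the gitops-controller, CA, and so on) then opt in with a matching nodeSelector and toleration in their pod templates. The hard part is everything that silently bypasses this isolation.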

Node identity done right: kubelet

Among a node's permissions is the ability to connect to the control plane with kubelet's credentials. kubelet can do many things, including reading Secrets, modifying Pods, and creating service account tokens. Fortunately, Kubernetes recognized this and built a specialized authorization model specifically for kubelet: Node Authorization. This limits kubelet's access to only the objects it legitimately needs.

Essentially, this forms a tree of references bound to the specific node, and only allows access to these objects. For instance, if Pod foo is mounting Secret bar, kubelet will be able to read bar. However, it wouldn't be able to read any other Secrets.
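Note that this only applies when the API server actually runs with Node authorization and the NodeRestriction admission plugin enabled. On a kubeadm-based cluster (kind included) you can confirm this from the static manifest (the path is the kubeadm default):

# Check the API server flags for Node authorization and NodeRestriction
$ docker exec kind-control-plane grep -E 'authorization-mode|enable-admission-plugins' /etc/kubernetes/manifests/kube-apiserver.yaml
    - --authorization-mode=Node,RBAC
    - --enable-admission-plugins=NodeRestriction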

As a demonstration, here I attempt to request a service account token for a Pod, first from a random node, then from the node actually running the Pod:

$ docker exec kind-worker env KUBECONFIG=/etc/kubernetes/kubelet.conf kubectl create token default -n default --bound-object-kind Pod --bound-object-name shell-56bd5dbdbf-xhg6v --bound-object-uid 92d3b7d5-ff3c-4d14-8731-2c204b3eed13
error: failed to create token: serviceaccounts "default" is forbidden: User "system:node:kind-worker" cannot create resource "serviceaccounts/token" in API group "" in the namespace "default": no relationship found between node 'kind-worker' and this object

$ docker exec kind-worker2 env KUBECONFIG=/etc/kubernetes/kubelet.conf kubectl create token default -n default --bound-object-kind Pod --bound-object-name shell-56bd5dbdbf-xhg6v --bound-object-uid 92d3b7d5-ff3c-4d14-8731-2c204b3eed13
eyJhbGciOiJSUzI1NiIsImtpZCI...

This special authorization engine is critical to restricting lateral movement with kubelet's credentials. Unfortunately, this capability is only available to kubelet; the rest of us are out of luck.

Node identity (often) done wrong: DaemonSets

There is a big, huge, glaring hole in the "node isolation" idea: DaemonSets with Kubernetes RBAC privileges, especially write privileges. While DaemonSets are mostly just like other workloads, they inherently cannot be isolated -- they must run on every node. This makes them a prime attack vector: no matter which node is compromised, you are guaranteed access to anything the DaemonSet can access.
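A worthwhile audit is to enumerate every DaemonSet's ServiceAccount and then ask the API server what each one can do (the second command impersonates the account, so it requires impersonation rights; substitute the names from the first command's output):

# List each DaemonSet's ServiceAccount, then audit its effective permissions
$ kubectl get daemonsets -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,SERVICEACCOUNT:.spec.template.spec.serviceAccountName'
$ kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<daemonset-serviceaccount>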

RBAC write access in DaemonSets is pervasive across the ecosystem:

Worry not, I did not just expose 4 unknown security flaws; I specifically picked examples that I knew the projects were aware of and/or had publicly documented.

Past research has found similar flaws on every major cloud Kubernetes platform (fortunately, most are reported as fixed).

Node identity done right: Istio Ambient

In Istio ambient mode, a DaemonSet (ztunnel) runs on each node and is responsible for, among other things, provisioning a certificate for each Pod running on the same node. This is a similar model to the one kubelet uses, but, as discussed above, kubelet is the only one that can access the Node Authorization secret sauce.

Simply having the Certificate Authority (CA) trust ztunnel to request arbitrary identities would be a major hole in the security model: any compromised nodes could impersonate anyone.

To work around this, Istio does the same thing as Kubernetes! While we cannot rely on the API Server to natively implement Node Authorization for us, the equivalent checks are done in the CA.
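Conceptually (this is a simplification, not Istio's actual code), the CA authenticates the calling ztunnel's node identity and only issues a certificate if the Pod named in the request is genuinely scheduled on that node -- the same check you could make by hand:

# Is the Pod really on the node the requester claims to be?
$ kubectl get pod shell-56bd5dbdbf-xhg6v -n default -o jsonpath='{.spec.nodeName}'
kind-worker2

If the authenticated node and the Pod's nodeName don't match, the request is refused -- mirroring the "no relationship found" behavior the kubelet demo showed earlier.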

This ensures Istio can live up to its claim of providing a "zero trust security solution". Somewhat shockingly, other popular projects advertising similar taglines make this exact mistake!

Wrapping up

Based on the above, I hope I have shown that Nodes and workload co-location must be treated with extreme care when designing a secure Kubernetes environment. In particular, DaemonSets with high RBAC privileges are a prime attack vector and should be avoided where possible.