Access control policies play a central role in creating a secure environment, enabling the "Principle of Least Privilege" and "Zero Trust". At a very high level, these allow expressing policies of the form <thing> can do <thing> if <conditions>.

In Kubernetes, there are two separate access control policies: RBAC Authorization, typically used to control access to the API Server, and Network Policies, the topic of this post.

Network Policies allow expressing which <thing> can communicate (over the network) with which other <thing>s. A simple example restricts Pods with the app: database label to only accept traffic from Pods with the role: admin label:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: only-admin-can-call-me
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: admin

While this looks great, subtle issues with the API make it challenging to use in a secure and scalable manner, both for implementers of the API and for its users.

It's all about identity

The root of all these issues comes down to how we express the <thing>s we want to control in the API. NetworkPolicy allows referencing groups of Pods, and it does so through a combination of a Pod label selector and a Namespace label selector.

A policy is only as secure as the identity it applies to. As a real-world parallel, an office door likely has a security checkpoint with a security policy: "Only employees may enter". The strength of this checkpoint varies substantially based on how we identify "employees" - swiping a badge, wearing a company T-shirt, or passing a retina and fingerprint scan are all options here, with wildly different outcomes.

The same applies to NetworkPolicy.

Pod labels are not a great identity

The problem with labels is that they were never really meant to securely identify a group of Pods (or, at least, they are rarely used that way in practice). There are a few problems here.

Pod labels are generally not access controlled. Many organizations adopt some form of access control, whether by gating changes behind CI/CD and human review, blindly trusting a subset of developers, or applying runtime policy checks (Gatekeeper, Kyverno). While it's possible to control which labels can be set on which pods, this is rare in practice. Without any such checks, the policy is not very useful, as labels can be arbitrarily added to or removed from pods to match the policies.
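As a rough sketch of what such a check could look like, a Kyverno ClusterPolicy along these lines would reject Pods outside an allow-listed namespace that claim the role: admin label (the policy name and the admin-namespace namespace are illustrative assumptions, not a recommendation):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-admin-label   # hypothetical policy, for illustration only
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: only-admin-namespace-may-claim-role-admin
    match:
      any:
      - resources:
          kinds: ["Pod"]
    exclude:
      any:
      - resources:
          namespaces: ["admin-namespace"]   # assumed allow-listed namespace
    validate:
      message: "Only Pods in admin-namespace may carry the role=admin label."
      deny:
        conditions:
          any:
          - key: "{{ request.object.metadata.labels.role || '' }}"
            operator: Equals
            value: admin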

This may seem like a simple check to add, but it's fraught with edge cases. For example:

  • Did you account for Pod labels changing at runtime?
  • Did you know the pods/status permission can be used to modify labels, not just the status field (this is often surprisingly easy to get access to! See my other post, and the sketch after this list)?
  • Did you know every node (via kubelet) has access to modify the labels of all the Pods running on the same node?
  • Did you remember to add admission control for every label referenced by a NetworkPolicy, or only the few common ones that were in use at the time the policies were set up?
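To make the pods/status point above concrete, consider a Role like the following (a sketch with hypothetical names). It looks like it only lets a controller report status, yet, as noted above, the same permission is enough to rewrite the Pod labels that NetworkPolicy selectors match on:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: status-reporter   # hypothetical name
  namespace: prod         # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods/status"]
  verbs: ["get", "update", "patch"]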

Labels span namespaces. While you can constrain the namespace as well, the API allows you not to, and it's easy to fall into this trap. Without strict admission control, policies that don't pin down a namespace are particularly risky; any pod could potentially set the label. For example, role: admin in the dev namespace may have very different intended semantics from role: admin in the prod namespace, but this isn't considered in the policy.

This is particularly problematic when we consider that any node compromise could then be used to assign arbitrary labels to pods, allowing them to bypass any policy.

You might wonder: if pod labels are problematic, why are namespace labels okay? They largely aren't (though it is a bit more common to have tighter control over Namespace objects), but Kubernetes added a magic kubernetes.io/metadata.name label to every namespace specifically to address this gap in NetworkPolicy. It is always set to the name of the namespace, allowing policies to explicitly list namespaces by name.
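For example, the original policy can be pinned to a namespace by name with this label. The sketch below assumes the admin clients run in a namespace named prod; note that the namespaceSelector and podSelector sit in the same from entry, so they are ANDed together:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: only-admin-from-prod
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: prod   # namespace pinned by name
      podSelector:
        matchLabels:
          role: admin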

Pod labels are not part of the network request

While the user-facing API of NetworkPolicy deals in Pods (identified by labels), the runtime implementation actually operates on IP packets. An IP packet doesn't include the pod's labels; in many cases, the only useful identifying information is the source IP address (some exceptions are discussed later).

As a result, a typical implementation does a reverse lookup from the source IP address to the Pod currently running with that IP, then fetches that Pod's labels and evaluates them against the relevant policies. Example implementation.

This requires building a complete map of all possible clients, typically populated by an eventually consistent watch on the API server. This eventual consistency can cause incorrect policy results when the local state is stale (as shown previously), and is expensive to maintain.

This gets even worse when connecting multiple clusters (or other environments outside of Kubernetes): not only does the scale become substantially larger, but the consistency concerns are exacerbated.

Beyond this, reliance on IP addresses can cause issues depending on the environment. If traffic needs to traverse various gateways to reach its final destination, the original source IP may be lost. Additionally, in some environments there are concerns around IP spoofing.

What can we do then?

Fortunately, we don't need to get too crazy to come up with better options than NetworkPolicy - the building blocks are right in front of us.

Kubernetes already has a secure way to identify a group of pods - ServiceAccount. Unlike pod labels, this was designed from day 1 as a secure identity for a pod, and is the basis for Kubernetes RBAC.
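For reference, a Pod declares its identity through spec.serviceAccountName, and the API server issues the Pod's credentials for that identity rather than treating it as a free-form label. The sketch below uses the same (hypothetical) names as the Istio example further down:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-service-account   # hypothetical, matches the policy example below
  namespace: admin-namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: admin-client
  namespace: admin-namespace
spec:
  serviceAccountName: admin-service-account   # the Pod's authenticated identity
  containers:
  - name: app
    image: example/admin-client:latest        # placeholder image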

With this secure identity basis, all that is left is a way to securely and scalably tie this identity to our network traffic. TLS does a great job of this, and has formed the basis of internet security for decades.

In Istio, this is done by provisioning TLS certificates tied to a ServiceAccount. All traffic in the mesh is automatically encrypted using these TLS certificates, and policies can be written against these identities.

For example, our original example can be expressed in a similar way:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: only-admin-can-call-me
spec:
  selector:
    matchLabels:
      app: database
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/admin-namespace/sa/admin-service-account"]

While similar in appearance, this approach is far more secure and scalable. Because the identity we are asserting policy against is carried in the actual request, we don't need to know about the entire set of possible clients, or perform unsafe lookups. This also enables flexible networking topologies - requests can traverse multiple hops, bridging diverse environments, while maintaining the identity in the request.

While these two approaches appear similar on the surface, the subtle differences in the APIs have dramatic implications for both users and implementations.