Over the years working with Envoy (via Istio), I've come across quite a few quirks and gotchas. I thought it would be fun to share some of them, and how to work around them. Many of these surprise even Envoy experts!

To start things off, let's talk about the "Clear route cache" option present on a number of Envoy filters.

Policy based routing

The idea behind "clear route cache" is to allow filters to impact routing decisions. Most often, this is done with the External Authorization filter, which offloads the decision to an external server and gives a lot of flexibility.

clear_route_cache is the knob that, for most users, just means "make this actually work". And the API documentation is not much different:

Clears the route cache in order to allow the external authorization service to correctly affect routing decisions.

Without this, the external authorization filter will run, but it will not be able to impact routing decisions. With it, it can. As we will see, though, it can also cause some unexpected behavior!
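
For reference, enabling the option on the External Authorization filter looks roughly like the fragment below. This is a minimal sketch rather than a complete setup: the ext-authz cluster name is a placeholder I made up, and a real deployment would also need that cluster defined.

```yaml
# Sketch of an ext_authz HTTP filter with route cache clearing enabled.
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    transport_api_version: V3
    grpc_service:
      envoy_grpc:
        # Hypothetical cluster name; point this at your own authorization service.
        cluster_name: ext-authz
    # Let header changes made by the authorization response affect route selection.
    clear_route_cache: true
```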

The not-so-secure admin route

To demonstrate this, we will make a simple Envoy configuration with two routes: /admin and a catch-all public route (/). The /admin route will require authentication, while the public route will be accessible to anyone.

filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      stat_prefix: ingress_http
      route_config:
        name: local_route
        virtual_hosts:
        - name: app
          domains: ["*"]
          routes:
          # Protected route. This seems secure, right?
          - match:
              prefix: "/admin"
            route:
              cluster: admin
            typed_per_filter_config:
              envoy.filters.http.rbac:
                "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBACPerRoute
                rbac:
                  rules:
                    action: ALLOW
                    policies:
                      require-admin-token:
                        permissions:
                        - any: true
                        principals:
                        - header:
                            name: authorization
                            string_match:
                              exact: "Bearer admin-token"
          # Public route, anyone can access this.
          - match:
              prefix: "/"
            route:
              cluster: public

      http_filters:
      # Authorization filter; does nothing here as we will do a per-route authorization.
      - name: envoy.filters.http.rbac
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
      - name: envoy.filters.http.router
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Trying this out works as expected:

$ curl localhost:10000/admin -H "Authorization: Bearer admin-token"
Hello from the admin server!
$ curl localhost:10000/admin -H "Authorization: Bearer not-token"
RBAC: access denied

Adding policy based routing

Now, we will add a filter that will influence the routing decision. Here I use a trivial Lua filter, though this could be any filter that modifies the request in a way that impacts routing (e.g. header modification, path modification, etc).

# Set the path from the `target` header.
name: envoy.filters.http.lua
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
  inline_code: |
    function envoy_on_request(handle)
      if handle:headers():get("target") then
        handle:headers():replace(":path", handle:headers():get("target"))
        handle:clearRouteCache()
      end
    end

This filter looks for a target header, and if it exists, it replaces the path with the value of that header. A bit contrived for this simple example, but it demonstrates the point.

Now, let's try to access both pages un-authenticated:

$ curl localhost:10000/ -H "Authorization: Bearer not-token" -H "target: /public"
hello from the public server!
$ curl localhost:10000/ -H "Authorization: Bearer not-token" -H "target: /admin"
hello from the admin server!

Huh?? We are able to access the /admin server, which has an RBAC policy requiring the admin token, without providing a valid token!

What is going on?

A common misunderstanding of Envoy is that it follows a typical "middleware" filter execution behavior, where each filter runs entirely in order before handing off to the next step. For instance, I might expect from the above configuration:

  1. Apply the Lua filter.
  2. Apply the Router filter to pick a route.
  3. (If I hit the /admin route) Apply the RBAC filter to check if I have access to the route.

If this were the case, we wouldn't need clear_route_cache at all, and we wouldn't have this quirk. However, Envoy does not work like this!

Instead, Envoy decides the route very early in the request processing, and "caches" it. This is why clear_route_cache is needed at all; the "cache" is not a cross-request cache, but rather a cache within the processing of a single request. When clear_route_cache is called, it forces Envoy to re-evaluate the selected route.

However, this introduces our problem.

Our HTTP filters look like so:

http_filters:
# Authorization filter; does nothing here as we will do a per-route authorization.
- name: envoy.filters.http.rbac
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
# Set the path from the `target` header.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(handle)
        if handle:headers():get("target") then
          handle:headers():replace(":path", handle:headers():get("target"))
          handle:clearRouteCache()
        end
      end
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Even though the RBAC configuration is on the route level, there is no sequential "listener-level filters, then route-level filters" execution at all. Instead, the RBAC filter runs before the Lua filter, and selects its configuration based on the initial route selection. Our request was sent to /, so the initial route is the public one, which has no RBAC policy, and the request is allowed. In the Lua filter, we then change the path, clear the route cache, and a new route is selected. However, the RBAC filter does not re-run, so the RBAC decision from the initial route selection stands, which allows us to access the admin route without a token!

If you are using clear_route_cache, you need to be very careful about this behavior, as it can lead to unexpected security (or other) issues like this one!

This behavior is documented but easy to miss.
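
One way to avoid the bypass in this particular example, as far as I can tell, is to make sure that any filter that can change the route runs before any filter that reads per-route configuration. The sketch below simply swaps the two filters from the example. It assumes nothing else depends on the original order.

```yaml
http_filters:
# Rewrite the path (and clear the route cache) first,
# so that later filters see the final route.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(handle)
        if handle:headers():get("target") then
          handle:headers():replace(":path", handle:headers():get("target"))
          handle:clearRouteCache()
        end
      end
# RBAC now runs after the route has been re-selected,
# so it should pick up the per-route policy of the new route.
- name: envoy.filters.http.rbac
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

With this order, a request carrying target: /admin should be checked against the /admin policy rather than the public one. Ordering alone is easy to get wrong as a chain grows, so treat this as a starting point and test it against your own configuration.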

Full config
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              # Protected route. Surely, this cannot be accessed without the admin token, right?
              - match:
                  prefix: "/admin"
                route:
                  cluster: admin
                typed_per_filter_config:
                  envoy.filters.http.rbac:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBACPerRoute
                    rbac:
                      rules:
                        action: ALLOW
                        policies:
                          require-admin-token:
                            permissions:
                            - any: true
                            principals:
                            - header:
                                name: authorization
                                string_match:
                                  exact: "Bearer admin-token"
              # Public route, anyone can access this.
              - match:
                  prefix: "/"
                route:
                  cluster: public

          http_filters:
          # Authorization filter; does nothing here as we will do a per-route authorization.
          - name: envoy.filters.http.rbac
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
          # Set the path from the `target` header.
          - name: envoy.filters.http.lua
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
              inline_code: |
                function envoy_on_request(handle)
                  if handle:headers():get("target") then
                    handle:headers():replace(":path", handle:headers():get("target"))
                    handle:clearRouteCache()
                  end
                end
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: admin
    connect_timeout: 1s
    type: STATIC
    load_assignment:
      cluster_name: admin
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8082
  - name: public
    connect_timeout: 1s
    type: STATIC
    load_assignment:
      cluster_name: public
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8081