Over the years working with Envoy (via Istio), I've come across quite a few quirks and gotchas. I thought it would be fun to share some of them, and how to work around them. Many of these surprise even Envoy experts!
To start things off, let's talk about the "Clear route cache" option present on a number of Envoy filters.
Policy-based routing
The idea behind "clear route cache" is to allow filters to impact routing decisions. Most often, this is done from an External Authorization server, which allows offloading the decision to an external server, giving a lot of flexibility.
clear_route_cache is the knob that, for most users, just means "make this actually work".
And the API documentation is not much different:
Clears the route cache in order to allow the external authorization service to correctly affect routing decisions.
Without this, the external authorization filter will run, but it will not be able to impact routing decisions. With it, it can... but, as we will see, it can also cause some unexpected behavior!
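For reference, here is a minimal sketch of where this option lives on the External Authorization filter. The `clear_route_cache` field is part of the `ExtAuthz` filter configuration; the cluster name `ext-authz` is a hypothetical placeholder for whatever cluster points at your authorization server, and a real deployment will likely need more settings (timeouts, failure mode, and so on).

```yaml
# Sketch only: "ext-authz" is a hypothetical cluster name, not from the example below.
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    transport_api_version: V3
    grpc_service:
      envoy_grpc:
        cluster_name: ext-authz  # hypothetical cluster for the authorization server
    # Re-evaluate the route after the authorization response modifies request headers.
    clear_route_cache: true
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```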
The not-so-secure admin route
To demonstrate this, we will make a simple Envoy configuration with two routes: /admin and /public.
The /admin route will require authentication, while the /public route will be accessible to anyone.
filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      stat_prefix: ingress_http
      route_config:
        name: local_route
        virtual_hosts:
        - name: app
          domains: ["*"]
          routes:
          # Protected route. This seems secure, right?
          - match:
              prefix: "/admin"
            route:
              cluster: admin
            typed_per_filter_config:
              envoy.filters.http.rbac:
                "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBACPerRoute
                rbac:
                  rules:
                    action: ALLOW
                    policies:
                      require-admin-token:
                        permissions:
                        - any: true
                        principals:
                        - header:
                            name: authorization
                            string_match:
                              exact: "Bearer admin-token"
          # Public route, anyone can access this.
          - match:
              prefix: "/"
            route:
              cluster: public
      http_filters:
      # Authorization filter; does nothing here as we will do a per-route authorization.
      - name: envoy.filters.http.rbac
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
      - name: envoy.filters.http.router
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Trying this out works as expected:
$ curl localhost:10000/admin -H "Authorization: Bearer admin-token"
Hello from the admin server!
$ curl localhost:10000/admin -H "Authorization: Bearer not-token"
RBAC: access denied
Adding policy-based routing
Now, we will add a filter that will influence the routing decision. Here I use a trivial Lua filter, though this could be any filter that modifies the request in a way that impacts routing (e.g. header modification, path modification, etc).
# Set the path from the `target` header.
name: envoy.filters.http.lua
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
  inline_code: |
    function envoy_on_request(handle)
      if handle:headers():get("target") then
        handle:headers():replace(":path", handle:headers():get("target"))
        handle:clearRouteCache()
      end
    end
This filter looks for a target header, and if it exists, it replaces the path with the value of that header. A bit contrived for this simple example, but it demonstrates the point.
Now, let's try to access both pages unauthenticated:
$ curl localhost:10000/ -H "Authorization: Bearer not-token" -H "target: /public"
hello from the public server!
$ curl localhost:10000/ -H "Authorization: Bearer not-token" -H "target: /admin"
hello from the admin server!
Huh?? We are able to access the /admin server, which has an RBAC policy requiring the admin token, without providing a valid token!
What is going on?
A common misunderstanding of Envoy is that it follows a typical "middleware" filter execution behavior, where each filter runs entirely in order before handing off to the next step. For instance, I might expect from the above configuration:
- Apply the Lua filter.
- Apply the Router filter to pick a route.
- (If I hit the /admin route) Apply the RBAC filter to check if I have access to the route.
If this were the case, we wouldn't need clear_route_cache at all, and we wouldn't have this quirk.
However, Envoy does not work like this!
Instead, Envoy decides the route very early in the request processing, and "caches" it.
This is why clear_route_cache is needed at all; the "cache" is not a cross-request cache, but rather a cache within the processing of a single request.
When clear_route_cache is called, it forces Envoy to re-evaluate the selected route.
However, this introduces our problem.
Our HTTP filters look like so:
http_filters:
# Authorization filter; does nothing here as we will do a per-route authorization.
- name: envoy.filters.http.rbac
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
# Set the path from the `target` header.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(handle)
        if handle:headers():get("target") then
          handle:headers():replace(":path", handle:headers():get("target"))
          handle:clearRouteCache()
        end
      end
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Even though the RBAC policy is configured at the route level, Envoy does not execute "listener-level filters, then route-level filters" in sequence. Instead, the RBAC filter runs before the Lua filter and picks its configuration based on the initial route selection: the public / route, which has no RBAC policy attached. The Lua filter then rewrites the path and clears the route cache, so a new route (/admin) is selected. However, the RBAC filter does not re-run, so the request was only ever checked against the configuration of the initial route, and we reach the admin route without a token!
If you are using clear_route_cache, you need to be very careful about this behavior, as it can lead to unexpected security (or other) issues like this one!
This behavior is documented but easy to miss.
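One possible mitigation, following from the ordering described above, is to put any filter that clears the route cache before the filters that read per-route configuration. The sketch below only reorders the filters from this example so that the Lua filter runs ahead of RBAC; it assumes the RBAC filter resolves its per-route configuration when it runs, and I have not verified it beyond this simple setup, so treat it as a starting point rather than a guaranteed fix.

```yaml
http_filters:
# Rewrite the path and clear the route cache first...
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(handle)
        local target = handle:headers():get("target")
        if target then
          handle:headers():replace(":path", target)
          handle:clearRouteCache()
        end
      end
# ...so that RBAC looks up its per-route configuration on the re-selected route.
- name: envoy.filters.http.rbac
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

With this ordering, a request carrying `target: /admin` should be evaluated against the /admin route's RBAC policy, since the route has already been re-selected by the time the RBAC filter runs. The same reasoning applies to any other filter that depends on per-route configuration.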
Full config
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              # Protected route. Surely, this cannot be accessed without the admin token, right?
              - match:
                  prefix: "/admin"
                route:
                  cluster: admin
                typed_per_filter_config:
                  envoy.filters.http.rbac:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBACPerRoute
                    rbac:
                      rules:
                        action: ALLOW
                        policies:
                          require-admin-token:
                            permissions:
                            - any: true
                            principals:
                            - header:
                                name: authorization
                                string_match:
                                  exact: "Bearer admin-token"
              # Public route, anyone can access this.
              - match:
                  prefix: "/"
                route:
                  cluster: public
          http_filters:
          # Authorization filter; does nothing here as we will do a per-route authorization.
          - name: envoy.filters.http.rbac
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
          # Set the path from the `target` header.
          - name: envoy.filters.http.lua
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
              inline_code: |
                function envoy_on_request(handle)
                  if handle:headers():get("target") then
                    handle:headers():replace(":path", handle:headers():get("target"))
                    handle:clearRouteCache()
                  end
                end
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: admin
    connect_timeout: 1s
    type: STATIC
    load_assignment:
      cluster_name: admin
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8082
  - name: public
    connect_timeout: 1s
    type: STATIC
    load_assignment:
      cluster_name: public
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8081