The OSI model describes network communication as a stack of layers, where each higher-level layer is built on top of the ones below it. This is only somewhat useful in practice, as the real world is not so simple.

In the service mesh space, discussion is generally reduced to L4 and L7, or TCP and HTTP. This oversimplifies the problem and leads to some confusion.

Thinking in terms of termination

Simply saying "HTTP" doesn't make it clear what is actually going on. Instead, I think it's more useful to think about which layer we terminate.

L3/Packet processors operate at the IP packet level. At this level, IP addresses are available. This processing is typically done either in the kernel or in user space with TUN devices.
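To make the L3 view concrete, here is a minimal Python sketch of what a packet processor can read from a single IPv4 packet, for example one returned by a read() on a TUN device. It assumes `packet` holds one bare IPv4 packet (no TUN packet-information prefix), decodes only the fixed header fields, and skips checksum validation; `parse_ipv4` and `IPv4Header` are hypothetical names.

```python
import socket
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version: int
    header_len: int  # in bytes, derived from the IHL field
    total_len: int
    protocol: int    # 6 = TCP, 17 = UDP
    src: str
    dst: str


def parse_ipv4(packet: bytes) -> IPv4Header:
    """Decode the fixed 20-byte part of an IPv4 header (RFC 791)."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl,
     proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    version = ver_ihl >> 4
    if version != 4:
        raise ValueError(f"not an IPv4 packet (version={version})")
    return IPv4Header(
        version=version,
        header_len=(ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        total_len=total_len,
        protocol=proto,
        src=socket.inet_ntoa(src),
        dst=socket.inet_ntoa(dst),
    )
```

Anything above this (ports, TCP flags, payload) requires looking past `header_len` bytes, which is where the "layer inversions" discussed later come in.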

L4/TCP processors terminate the TCP connection. Even if the proxy does nothing but forward the data as-is, this has huge implications compared to packet processing, because we lose transparency. Destination IPs and ports will change, connect() may succeed because the proxy accepted the connection even though the intended destination would not have, TCP options and settings may not traverse the proxy, and so on. In my experience, many of these differences are accidental.
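The connect() point is easy to demonstrate. In this sketch (hypothetical helper names, loopback addresses, Python's standard `socket` and `threading` modules), a terminating "proxy" accepts a client and only then tries to reach an upstream port that has no listener. The client's connect() succeeds anyway, because the kernel completed the handshake with the proxy; the only signal the client gets is an immediate EOF.

```python
import socket
import threading


def find_closed_port() -> int:
    """Bind to an ephemeral port and release it, leaving no listener (best effort)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def terminating_proxy(listener: socket.socket, upstream_port: int, result: dict) -> None:
    """Accept one client, then try to reach the 'real' destination.

    By the time accept() returns, the client's handshake has already completed
    against the proxy, whatever happens upstream.
    """
    client, _ = listener.accept()
    with client:
        try:
            upstream = socket.create_connection(("127.0.0.1", upstream_port), timeout=2)
        except OSError:
            result["upstream_ok"] = False
            return  # closing `client` is all the client will ever see
        with upstream:
            result["upstream_ok"] = True


def demo() -> dict:
    result: dict = {}
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        closed_port = find_closed_port()  # chosen while the listener holds its own port
        t = threading.Thread(target=terminating_proxy, args=(listener, closed_port, result))
        t.start()
        with socket.create_connection(listener.getsockname(), timeout=2) as client:
            result["client_connect_ok"] = True         # connect() succeeded via the proxy
            result["client_read"] = client.recv(1024)  # b"" once the proxy gives up
        t.join()
    return result
```

Without the proxy in the path, the same connect() would have failed with ECONNREFUSED; with it, the failure turns into a connection that opens and then closes with no data.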

L7/HTTP processors terminate HTTP streams. This means the proxy is responsible for parsing incoming HTTP requests, doing something with them, and then sending an HTTP response back. Like TCP termination, this dramatically decreases the transparency of the proxy: HTTP has many hop-by-hop components (such as the Connection header), the proxy may behave differently than a direct connection would, etc. Unlike with TCP, these differences are often intentional. For example, we may load balance at the request level instead of the connection level. This is a large behavioral change, but often an intentional and helpful one.
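As a rough illustration of why request-level balancing matters, this sketch compares round-robin at each level for a hypothetical workload of one busy connection (think of a multiplexed HTTP/2 connection) plus a few light ones. Round-robin and the workload shape are assumptions for illustration, not how any particular proxy picks backends.

```python
from collections import Counter
from itertools import cycle
from typing import List


def connection_level(connections: List[int], backends: List[str]) -> Counter:
    """L4-style: pick a backend per connection; every request on it follows."""
    picker = cycle(backends)
    load: Counter = Counter()
    for num_requests in connections:
        load[next(picker)] += num_requests
    return load


def request_level(connections: List[int], backends: List[str]) -> Counter:
    """L7-style: pick a backend per request, regardless of its connection."""
    picker = cycle(backends)
    load: Counter = Counter()
    for num_requests in connections:
        for _ in range(num_requests):
            load[next(picker)] += 1
    return load
```

With `connections=[100, 1, 1, 1]` and two backends, connection-level balancing sends 101 requests to one backend and 2 to the other, while request-level balancing splits them 52/51.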

Changing layers

A proxy can operate at multiple layers, and can move between them.

Packet to TCP: if a proxy has a packet, it can "upgrade" it to TCP through a few mechanisms. A variety of user-space TCP implementations exist, and it's also possible to route the packets back into the kernel's TCP stack.

TCP to HTTP: this is how HTTP/1.1 and HTTP/2 work. The TCP byte stream is read and parsed into HTTP messages.
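Since TCP provides a byte stream with no message boundaries, an HTTP/1.1 processor has to buffer until it has a complete message. A minimal sketch of that step, assuming only Content-Length bodies (no chunked transfer encoding, no size limits, no error handling); `RequestParser` and `Request` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    method: str
    target: str
    version: str
    headers: Dict[str, str]
    body: bytes


class RequestParser:
    """Turns an arbitrary chunking of TCP bytes into HTTP/1.1 requests."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Request]:
        """Append newly received bytes and return any requests now complete."""
        self._buf += data
        out: List[Request] = []
        while (req := self._try_parse_one()) is not None:
            out.append(req)
        return out

    def _try_parse_one(self) -> Optional[Request]:
        end = self._buf.find(b"\r\n\r\n")
        if end < 0:
            return None  # header block incomplete; wait for more bytes
        lines = self._buf[:end].decode("latin-1").split("\r\n")
        method, target, version = lines[0].split(" ", 2)
        headers: Dict[str, str] = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        body_start = end + 4
        if len(self._buf) < body_start + length:
            return None  # body incomplete; keep everything buffered
        body = bytes(self._buf[body_start:body_start + length])
        del self._buf[:body_start + length]  # consume exactly one message
        return Request(method, target, version, headers, body)
```

Feeding the same bytes in 7-byte chunks or all at once yields the same requests, which is the point: the parser, not TCP, decides where one message ends.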

HTTP to TCP: generally you don't downgrade to "raw" TCP after starting with HTTP processing. However, this is sometimes done with HTTP/1.1, especially to handle things like CONNECT.
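For CONNECT, the switch happens right after the proxy replies with a 2xx status: from then on, the connection carries opaque bytes. A minimal sketch of that handoff, assuming the full request head is already buffered; `handle_connect` is hypothetical, and any bytes after the head are returned untouched for relaying rather than parsed as HTTP:

```python
from typing import Tuple


def handle_connect(buf: bytes) -> Tuple[str, int, bytes, bytes]:
    """Parse an HTTP/1.1 CONNECT request at the start of `buf`.

    Returns (host, port, response_to_send, leftover). After the response is
    sent, `leftover` and everything else on the connection is raw TCP payload.
    """
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        raise ValueError("incomplete request head")
    request_line = buf[:end].split(b"\r\n", 1)[0].decode("ascii")
    method, authority, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError(f"expected CONNECT, got {method}")
    host, _, port = authority.rpartition(":")  # authority-form: host:port
    response = b"HTTP/1.1 200 Connection Established\r\n\r\n"
    return host, int(port), response, buf[end + 4:]
```

After this point a real proxy would just copy bytes in both directions between the client and `host:port`; it no longer knows or cares whether they are TLS, HTTP, or anything else.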

TCP to Packet: once you have terminated the TCP connection, you cannot really "undo" it.

Layer inversions

Just because we are terminating Packets/TCP Connections/HTTP Requests doesn't mean we cannot do things based on the other layers!

One of the most obvious examples is reading TCP attributes while processing packets. This is routinely done and is pretty trivial. Even the kernel does this: tools like iptables operate on packets but can match on TCP attributes. Typically, this operates on one packet at a time, letting us see the TCP state, ports, etc. However, some more advanced systems reassemble the payload by joining packets together. This is sort of like a "passive" user-space TCP.
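A passive reassembler never sends ACKs or asks for retransmissions; it just orders what it captured. A simplified sketch for one direction of a connection, assuming the payloads and sequence numbers have already been extracted from the packets; it ignores 32-bit sequence wraparound and stops at the first missing segment. `reassemble` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def reassemble(segments: List[Tuple[int, bytes]], isn: int) -> bytes:
    """Rebuild one direction of a TCP byte stream from captured segments.

    `segments` are (sequence_number, payload) pairs in capture order; they may
    arrive out of order, overlap, or be retransmitted. `isn` is the initial
    sequence number from the SYN, so the first payload byte is at isn + 1.
    """
    pending: Dict[int, bytes] = {}
    for seq, payload in segments:
        # Keep the longest payload seen at each starting sequence number.
        if payload and len(payload) > len(pending.get(seq, b"")):
            pending[seq] = payload

    out = bytearray()
    next_seq = isn + 1
    for seq in sorted(pending):
        payload = pending[seq]
        end = seq + len(payload)
        if end <= next_seq:
            continue  # nothing new: a retransmission of delivered bytes
        if seq > next_seq:
            break  # gap: we never captured the segment in between
        out += payload[next_seq - seq:]  # skip any overlap with delivered bytes
        next_seq = end
    return bytes(out)
```

For example, capturing "world" at sequence 1007, "hello" twice at 1001, and an overlapping "lo wo" at 1004 (with ISN 1000) still yields `b"hello world"`.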

Similarly, "passive" HTTP processing can be done without actually terminating the HTTP requests. For example, we could look for all instances of ^GET .* as a way to count the number of requests for metrics. This isn't terribly hard, but it typically must be done from the beginning of the connection. HTTP/2 is stateful (its HPACK header compression builds up a table over the life of the connection), and HTTP/1.1 framing gives no way to tell headers from body without knowing where each message started. Without observing the entire connection, we may parse things incorrectly; even our ^GET example would miscount a request body that happens to contain a line starting with GET.
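The pitfall is easy to reproduce. This sketch (hypothetical function names, Content-Length framing only) compares the naive regex with a counter that walks the stream from the first byte and skips over bodies:

```python
import re
from typing import List

NAIVE = re.compile(rb"^GET .*", re.MULTILINE)


def naive_count(stream: bytes) -> int:
    """Count every line starting with 'GET ', including lines inside bodies."""
    return len(NAIVE.findall(stream))


def framed_methods(stream: bytes) -> List[str]:
    """Walk the stream from its first byte, using Content-Length to skip bodies.

    Returns the method of each request head found.
    """
    methods: List[str] = []
    pos = 0
    while True:
        end = stream.find(b"\r\n\r\n", pos)
        if end < 0:
            return methods
        lines = stream[pos:end].split(b"\r\n")
        methods.append(lines[0].split(b" ", 1)[0].decode("ascii"))
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        pos = end + 4 + length  # jump past the body without inspecting it
```

For a connection carrying a GET, a POST whose body contains a line starting with "GET ", and another GET, the naive count is 3 while the framed walk sees only two GETs. The framed version only works because it started at byte zero; dropped into the middle of the stream, it would be just as confused.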

Tunnels and TLS - not just TCP

Historically, TLS has mostly been seen as something for TCP, rather than for unreliable protocols like UDP/IP. Similarly, tunneling data over HTTP CONNECT has been seen as viable only for TCP.

Today, this isn't really the case. QUIC now has an unreliable datagram extension (RFC 9221), allowing us to send unreliable data over a TLS-secured transport.

MASQUE builds on this and specifies how to tunnel UDP (RFC 9298) and IP (RFC 9484) over HTTP/3.
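To give a feel for the wire format, this sketch builds the payload of a QUIC DATAGRAM frame carrying one UDP packet for a CONNECT-UDP tunnel: a Quarter Stream ID (the request stream ID divided by 4, per RFC 9297), then a Context ID of 0 (per RFC 9298), then the raw UDP payload, with both integers in QUIC's variable-length encoding (RFC 9000, Section 16). It deliberately omits the extended CONNECT request, the HTTP/3 settings negotiation, and QUIC itself; the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer: the top 2 bits of the first byte give the length."""
    if v < 0:
        raise ValueError("varints are unsigned")
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 1 << 30:
        return (v | 0x8000_0000).to_bytes(4, "big")
    if v < 1 << 62:
        return (v | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("value too large for a QUIC varint")


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    length = 1 << (buf[0] >> 6)
    if len(buf) < length:
        raise ValueError("truncated varint")
    value = buf[0] & 0x3F
    for b in buf[1:length]:
        value = (value << 8) | b
    return value, length


def udp_in_h3_datagram(request_stream_id: int, udp_payload: bytes) -> bytes:
    """DATAGRAM frame payload carrying one UDP packet on a CONNECT-UDP stream."""
    if request_stream_id % 4 != 0:
        raise ValueError("requests use client-initiated bidirectional streams")
    quarter_stream_id = request_stream_id // 4
    context_id = 0  # context ID 0 means "this is a UDP payload"
    return encode_varint(quarter_stream_id) + encode_varint(context_id) + udp_payload
```

The TLS protection here comes from QUIC, so the UDP payload gets the same encryption as any HTTP/3 traffic even though delivery stays unreliable.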

Dropping layers

In Istio, we operate exclusively at either the TCP or the HTTP level.

If we drop from terminating TCP to processing packets while reading TCP information, we can likely retain close to 100% of the functionality without the downsides of terminating TCP. Telemetry (connections opened and closed, bytes sent and received) can be handled by looking at packets. Byte counts are a bit trickier to get right (retransmitted segments, for example, must not be counted twice), but it can be done. TLS can still be handled, as noted above.
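A simplified sketch of packet-derived telemetry, assuming TCP headers have already been parsed into a direction-specific flow key, flags, sequence number, and payload length. Byte counting only credits data past the highest sequence number seen so far, so retransmissions aren't double-counted; sequence wraparound, SYN payloads, and idle timeouts are ignored. `Telemetry` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# TCP flag bits as they appear in the TCP header.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port), one direction


@dataclass
class Telemetry:
    opened: int = 0
    closed: int = 0
    bytes_by_flow: Dict[Flow, int] = field(default_factory=dict)
    _next_seq: Dict[Flow, int] = field(default_factory=dict)
    _closed_conns: Set[frozenset] = field(default_factory=set)

    def observe(self, flow: Flow, flags: int, seq: int, payload_len: int) -> None:
        conn = frozenset([flow[:2], flow[2:]])  # same key for both directions
        if flags & SYN:
            if not flags & ACK:
                self.opened += 1  # the client's initial SYN opens a connection
            self._next_seq[flow] = seq + 1  # SYN consumes one sequence number
            return
        if payload_len:
            expected = self._next_seq.get(flow, seq)
            end = seq + payload_len
            if end > expected:  # only credit bytes we haven't already seen
                new_bytes = end - max(seq, expected)
                self.bytes_by_flow[flow] = self.bytes_by_flow.get(flow, 0) + new_bytes
                self._next_seq[flow] = end
        if flags & (FIN | RST) and conn not in self._closed_conns:
            self._closed_conns.add(conn)  # count each connection's close once
            self.closed += 1
```

Feeding it a handshake, 10 bytes from the client (seen twice because of a retransmission), 20 bytes back from the server, and a FIN from each side yields one open, one close, and 10/20 bytes per direction, all without ever terminating the connection.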

If we drop HTTP termination, we definitely lose a lot of functionality. However, this may be worth it for use cases that don't need all of that functionality. Even without termination, passive HTTP inspection can still keep full fidelity in metrics and logs. There are even some attempts at propagating traces in this manner through (somewhat disturbing) eBPF shenanigans.