Recently I have been spending a lot of time analyzing and optimizing memory usage in our Rust reverse proxy, agentgateway. One thing that repeatedly came up was a surprisingly large amount of memory allocated to innocent-looking Tokio mpsc channels.

In my naive understanding, I would have assumed the following allocation pattern:

struct BigStruct {
    data: [u8; 1024],
}

fn main() {
    // Allocates ~1024 bytes
    let _ = tokio::sync::mpsc::channel::<BigStruct>(1);
    // Allocates ~1024*1024 bytes
    let _ = tokio::sync::mpsc::channel::<BigStruct>(1024);
}

However, in practice both assumptions are wrong: each of these channels allocates about 32KB!

In our application, there were two areas where this had a meaningful performance impact.

What's going on

To dig into this a bit more, I built a small playground to analyze allocations from mpsc channels.
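The playground itself isn't important, but a minimal sketch of the idea looks something like this. This is illustrative rather than the exact tool: it wraps the system allocator to count live heap bytes, and assumes tokio as a dependency.

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wrap the system allocator so we can track live heap bytes.
struct CountingAllocator;
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAllocator = CountingAllocator;

struct BigStruct {
    data: [u8; 1024],
}

fn main() {
    let capacity = 1024;
    let before = LIVE_BYTES.load(Ordering::Relaxed);
    let (tx, _rx) = tokio::sync::mpsc::channel::<BigStruct>(capacity);
    println!("heap_after_create: {} B", LIVE_BYTES.load(Ordering::Relaxed) - before);

    // Fill the channel to capacity; try_send avoids needing a runtime here.
    for _ in 0..capacity {
        tx.try_send(BigStruct { data: [0; 1024] }).unwrap();
    }
    println!("heap_after_fill: {} B", LIVE_BYTES.load(Ordering::Relaxed) - before);
}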

The results were quite interesting:

msg_size  capacity  heap_after_create  heap_after_fill
8         1         800 B              800 B
8         16        800 B              800 B
8         1024      800 B              9728 B
128       1         4640 B             4640 B
128       16        4640 B             4640 B
128       1024      4640 B             132608 B
1024      1         33312 B            33312 B
1024      16        33312 B            33312 B
1024      1024      33312 B            1050112 B

Here we have:

  • msg_size: the size (in bytes) of the message type T.
  • capacity: the capacity of the bounded mpsc channel.
  • heap_after_create: memory allocated immediately after creating the channel.
  • heap_after_fill: memory allocated after sending capacity items on the channel to fill it up.

A few things stand out here:

  • The capacity of the channel has no impact on the initial allocation size.
  • Even when we start sending messages, memory doesn't start to grow until a certain point (spoiler: it's after 32 messages).
  • With some quick math we can see the initial cost is (msg_size * 32) + 544 bytes. So there is a fixed cost plus a 32× multiplier on the message size (see the check below).
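As a quick sanity check, plugging the table's message sizes into that formula reproduces the heap_after_create column:

fn main() {
    // (msg_size * 32) + 544 matches heap_after_create for every row above.
    for (msg_size, observed) in [(8usize, 800usize), (128, 4640), (1024, 33312)] {
        let predicted = msg_size * 32 + 544;
        assert_eq!(predicted, observed);
        println!("msg_size={msg_size}: predicted {predicted} B");
    }
}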

Implementation details

This 32× multiplier is easy to find by looking at the Tokio source.

The channel is built from a linked list of Blocks. Each Block stores BLOCK_CAP values of T, where BLOCK_CAP is hardcoded to 32.

Because of this, creating the channel allocates, more or less, a [T; 32] up front.
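Conceptually, the layout looks something like the sketch below. This is a simplification rather than Tokio's actual definitions: the real Block also carries indices and atomics for slot bookkeeping, and the link to the next block is an atomic pointer rather than a Box.

const BLOCK_CAP: usize = 32;

// Simplified sketch of one block in the channel's linked list.
struct Block<T> {
    // All 32 slots are allocated as one chunk, even if never used.
    values: [std::mem::MaybeUninit<T>; BLOCK_CAP],
    // Link to the next block once this one fills up.
    next: Option<Box<Block<T>>>,
}

fn main() {
    // For a 1024-byte message type, a single block is already ~32KB.
    println!("{} B", std::mem::size_of::<Block<[u8; 1024]>>());
}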

The remaining 544 bytes of fixed cost come from the rest of the channel's internals, which I didn't analyze too closely.

Real world impact

Many tiny channels

The first real impact for us was a channel we created for each Kubernetes Service object in the cluster, used to send events about that service's health. In many environments, these services number in the thousands.

The message we send on each channel is small (only 24 bytes), and the throughput and latency requirements for these channels are very low.

As we learned above, however, each one was taking 1312 bytes! We moved this to a different channel implementation, futures-channel, which only allocates one T at a time (at the cost of some throughput, which was not relevant for us). The end result cut our overall memory usage in half in a representative test:

Agentgateway memory before and after optimization
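For illustration, the swap looks roughly like the sketch below. The HealthEvent type and its fields are made up for this example, and it assumes the futures-channel crate as a dependency.

use futures_channel::mpsc;

// Hypothetical stand-in for our ~24-byte health event.
#[derive(Debug)]
struct HealthEvent {
    endpoint_id: u64,
    generation: u64,
    healthy: bool,
}

fn main() {
    // Unlike Tokio's mpsc, no 32-slot block is allocated up front:
    // each message gets its own node only when it is actually sent.
    let (mut tx, mut rx) = mpsc::channel::<HealthEvent>(16);
    tx.try_send(HealthEvent { endpoint_id: 1, generation: 7, healthy: true })
        .unwrap();
    assert!(rx.try_next().unwrap().is_some());
}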

Hyper connections

When using Hyper, we see a 16KB allocation per connection. When serving thousands of connections, this adds up quickly.

Part of this is obvious: each connection has a hardcoded buffer, INIT_BUFFER_SIZE: usize = 8192.

However, the other 8KB comes from the same mpsc issue!

Each SendRequest, the handle that requests are sent on, uses a channel that dispatches http::Request values. Each request is typically around 250 bytes; multiplying by 32 gives us the remaining 8KB.

Unlike our previous use case, this code path is very latency- and throughput-sensitive. The upfront allocation has a meaningful impact on end-to-end benchmarks, yet we only ever need one or two items in the channel at a time, so the 32-slot block is almost pure overhead.

This is tracked in Hyper issue #4057.