Over 2 years ago, I started working on some ideas to build better Kubernetes controllers. In this post, I wanted to give a bit of a retrospective on how things have gone since then.
Over the years working on Istio and other projects, I observed a number of major issues with controllers:
- Most code was about error-prone event handling and state reconciliation, rather than business logic.
- Most tests, in turn, were about the same.
- This made the code extremely complex, brittle, and often incorrect.
- This complexity led to user-facing compromises: incorrectness and performance issues.
You might argue I should just write a better controller that is faster and without bugs. Maybe, but probably not.
While someone could probably write a simple controller correctly with the usual primitives, writing a complex one has proven (in many projects, including Kubernetes core) to be an elusive task.
For Istio, our controller has a ridiculous amount of state: there are over 45 input types and 20 output types!
To address this, I wanted to build a way to write controllers without these issues. Rather than dealing with Add and Delete events and trying to synchronize internal indexes, I wanted to just write code that contains business logic -- the rest should be figured out for me.
The beginning
After many months of prototyping and discussions during my free time, the initial idea had firmed up. Even after rewriting huge chunks of Istio with the new library (which found a variety of bugs in the old code!), though, I wasn't entirely convinced we should use it at all.
"We should rewrite this stable product in this new framework I made" is a pretty big red flag, even coming out of my own mouth.
However, after much discussion with other maintainers, it was clear this was going to be the best path forward.
A few months later, the initial library, krt ("Kubernetes Declarative Controller Runtime"), was merged!
Around the same time, I gave a talk about the library at KubeCon, which also goes into a lot more detail on the motivations.
We started by rewriting the code for the new ambient mode controller. This was brand new code that was in "alpha" at the time, making it a good fit for experimentation. Additionally, the code was terrible to work with due to how many objects were being joined together -- we were constantly squashing bugs and writing ridiculous tests to catch all the edge cases.
Takeaways
Since the original introduction, we have gained over a year of real world experience with the libraries. Here are my main takeaways.
Better testing
In my initial designs, I focused primarily on making the controller code itself easier to write. Surprisingly, I have found that writing tests for that code is an even bigger improvement.
Before, we might write a test like this:
// Imperatively mutate state, then poll until the controller converges.
s.addWaypoint(t, "1.2.3.4", "my-waypoint")
assert.EventuallyEqual(t, func() int { return len(s.waypoints.List()) }, 1)
s.addPods(t, "10.0.0.1", "my-waypoint-instance")
s.assertEvent(t, s.pod("my-waypoint-instance"))
// Finally, look up the generated output and check a single field.
appTunnel := s.lookup(s.podXdsName("my-waypoint-instance"))[0].GetWorkload().GetApplicationTunnel()
assert.Equal(t, appTunnel, &ApplicationTunnel{
    Port: 15088,
})
(Not shown are the copious helper functions needed to make the test somewhat sane.)
This is a pretty simple test, but it exposes some interesting aspects. The test is very imperative: we add something, then we wait for an event, then we perform some lookup.
Likely, the first few iterations of a test like this will be flaky due to not properly handling eventual consistency. Even when written correctly, it will inherently be slower than it needs to be due to the polling that eventual consistency requires.
A test with krt would look more like this (as a table-driven test):
{
    name: "pod with service",
    inputs: []any{
        model.ServiceInfo{
            Service: &workloadapi.Service{
                Name:      "svc",
                Namespace: "ns",
                Hostname:  "hostname",
            },
            LabelSelector: model.NewSelector(map[string]string{"app": "foo"}),
        },
    },
    pod: &v1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "name",
            Namespace: "ns",
            Labels:    map[string]string{"app": "foo"},
        },
        Status: v1.PodStatus{
            Phase:      v1.PodRunning,
            Conditions: podReady,
            PodIP:      "1.2.3.4",
        },
    },
    result: &workloadapi.Workload{
        Name:      "name",
        Namespace: "ns",
        Address:   "1.2.3.4",
        Status:    workloadapi.WorkloadStatus_HEALTHY,
        Services: map[string]PortList{
            "ns/hostname": {
                Ports: []Port{{ServicePort: 80, TargetPort: 8080}},
            },
        },
    },
}
Note here we don't have any event ordering, polling, etc -- we just declare the inputs and the expected outputs. These could even be written as "golden tests" which can automatically update the expected output, if desired.
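To make the mechanics concrete, here is a minimal sketch of how such a table might be driven. The cases slice and the buildWorkload helper are hypothetical stand-ins for the real harness, which feeds the declared inputs into the krt collections under test and returns the computed output:

// Sketch only: buildWorkload is a hypothetical helper that wires tc.inputs
// and tc.pod into the collections under test and returns the resulting Workload.
for _, tc := range cases {
    tc := tc
    t.Run(tc.name, func(t *testing.T) {
        got := buildWorkload(t, tc.inputs, tc.pod)
        assert.Equal(t, got, tc.result)
    })
}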
Development speed
This one wasn't a big surprise, but I have continually been impressed by how efficiently I am able to make complex changes. I recently had to introduce some new logic into one of our controllers, which was reading ~10 different types and creating a "Workload" representation.
I needed to extend this by joining the Pod, EndpointSlice, and Node resources to extract some additional information.
In a traditional controller this would be incredibly complex to do right; even if we were able to write the code correctly and quickly, getting robust test coverage would take a lot of work.
The change took me 30 minutes including testing!
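As a rough sketch (not the actual Istio code), the join looks something like this. The Workload output type and its fields are simplified stand-ins, and I'm assuming krt's FetchOne and FilterKey helpers here:

// Illustrative only: the Workload type and fields are simplified stand-ins.
Workloads := krt.NewCollection(Pods, func(ctx krt.HandlerContext, p *v1.Pod) *Workload {
    // krt records this Fetch as a dependency: if the referenced Node (or its
    // labels) changes, this function is re-run for the affected Pods.
    node := krt.FetchOne(ctx, Nodes, krt.FilterKey(p.Spec.NodeName))
    region := ""
    if node != nil {
        region = (*node).Labels["topology.kubernetes.io/region"]
    }
    // EndpointSlices can be joined the same way (e.g. via a label or index
    // filter) -- no hand-written event handlers or internal caches needed.
    return &Workload{
        Name:      p.Name,
        Namespace: p.Namespace,
        IP:        p.Status.PodIP,
        Region:    region,
    }
})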
Performance and correctness
After rewriting our controller, I found that under extreme loads (like 50x expected real world loads), we started to see some slower-than-expected behavior. After a lot of investigation, the reason why was pretty interesting -- the old code was fast, but technically wrong.
I believe the issue was something like the old code not detecting if a Node's topology.kubernetes.io/region label changed.
The new code did detect this, but it took some extra computation to do so.
Can a Node legitimately change regions... I don't know, maybe I flew my Raspberry Pi across the globe!
This tradeoff between correctness and performance will need a critical eye as we progress, to ensure we don't make poor choices between theoretical correctness and practical efficiency.
Broader usage
While krt was originally designed for Istio's purposes, we have seen some adoption beyond Istio, which is pretty exciting. Currently this requires importing the Istio monorepo, but it could plausibly be broken out to make usage even easier.
Rewrite It In Rust
As established above, krt is pretty cool (IMO).
You know what is cooler? Rewriting code in Rust.
While somewhat of a joke, krt is actually a really poor fit for Go's simplistic type system.
For instance, here is one of our simple test collections in Go:
type SimplePod struct {
    Name   types.NamespacedName
    Labels map[string]string
    IP     string
}

func (p SimplePod) ResourceName() string {
    return p.Name.String()
}

func (p SimplePod) Equals(other SimplePod) bool {
    return p.Name == other.Name && maps.Equal(p.Labels, other.Labels) && p.IP == other.IP
}

func SimplePodCollection(pods krt.Collection[*corev1.Pod]) krt.Collection[SimplePod] {
    return krt.NewCollection(pods, func(ctx krt.HandlerContext, i *corev1.Pod) *SimplePod {
        if i.Status.PodIP == "" {
            return nil
        }
        return &SimplePod{
            Name:   types.NamespacedName{Name: i.Name, Namespace: i.Namespace},
            Labels: i.Labels,
            IP:     i.Status.PodIP,
        }
    })
}
Here we have to manually define the Equals and ResourceName functions. Tedious and error-prone!
In Rust:
#[derive(Eq, Key)]
struct SimplePod {
    #[krt(key)]
    named: Named,
    labeled: Labeled,
    ip: IpAddr,
}
fn simple_pod_collection(pods: impl Collection<Pod>) -> impl Collection<SimplePod> {
    transform::Collection::new(pods, |pod: &Pod| {
        pod.status.pod_ip.and_then(|ip| {
            Some(SimplePod {
                named: Named::new(pod),
                labeled: Labeled::new(pod),
                ip: ip.parse().ok()?,
            })
        })
    })
}
We automatically get basics like Equals, but we can even add our own custom derivations! I added a krt(key) attribute which defines the key. In the above example it just handles a single value, but we can also make composite keys of multiple values! In the Go example, doing that would require manually defining a mapping of these multiple values to strings, as sketched below.
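For comparison, a hand-rolled composite key in Go looks roughly like this (the choice of fields is just illustrative):

// Hypothetical composite key for SimplePod: the mapping of multiple values
// down to a single string has to be maintained (and kept collision-free) by hand.
func (p SimplePod) ResourceName() string {
    return p.Name.Namespace + "/" + p.Name.Name + "/" + p.IP
}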
One other thing I am experimenting with is defining Collection better:
trait Collection<O: KeyFetcher>: Send + Sync {
    type Stream: futures::Stream<Item = Event<O>> + std::marker::Send + 'static;

    fn get(&self, key: &O::Key) -> Option<O>;
    fn list(&self) -> impl Iterator<Item = O>;
    fn stream(&self) -> Self::Stream;
}
Rather than the callback approach of registering a handler (roughly what the Go version does today, as sketched below), we can just request an asynchronous stream of events. list has a subtle improvement too, allowing it to return an iterator rather than a concrete list (which opens up some optimization opportunities).
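For contrast, this is roughly what the callback style looks like with the Go library today (simplified from memory, so treat the exact signature as an approximation):

// Approximate shape of the Go library's callback style: a handler is invoked
// for every add/update/delete event, and any state derived across collections
// has to be maintained by the handler itself.
pods.Register(func(o krt.Event[*corev1.Pod]) {
    log.Infof("pod event: %v", o)
})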
Right now this is just an experiment, but it has some exciting possibilities. I am also looking into how many of these improvements (and others I have been making) can be brought back into the Go version.