Arturo Borrero González: Kubecon and CloudNativeCon 2023 Europe summary
This post serves as a report on my attendance at Kubecon and CloudNativeCon 2023 Europe, which took place in
Amsterdam in April 2023. It was my second time physically attending this conference; the first was in
Austin, Texas (USA) in 2017. I also attended once virtually.
The content here is mostly generated for the sake of my own recollection and learnings, and is written from
the notes I took during the event.
The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and
share the excitement about the days ahead. Some astonishing numbers were announced: there were more than
10,000 people attending, and apparently it could confidently be said that it was the largest open source
technology conference taking place in Europe in recent times.
It was also communicated that the next couple of iterations of the event will be held in China in September
2023 and in Paris in March 2024.
More numbers: the CNCF was hosting about 159 projects, involving 1,300 maintainers and about 200,000
contributors. The cloud-native community is ever-growing, and there seems to be a strong trend in the
industry towards cloud-native technology adoption and all things related to PaaS and IaaS.
The event program had different tracks, and in each one there was an interesting mix of low-level and
higher-level talks for a variety of audiences. On many occasions I found that reading the talk title alone was
not enough to know in advance whether a talk was a 101 kind of thing or aimed at experienced engineers. But
unlike in previous editions, I didn't have the feeling that the purpose of the conference was to try to sell
me anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of
a given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me.
On a different note, I found the breakout rooms to often be small. I think there were only a couple of rooms
that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized
with frustration that the most interesting talks filled up immediately, with people waiting in line
some 45 minutes before the session time. Because of this, I missed a few important sessions that I'll
hopefully watch online later.
Finally, on a more technical note, I learned many things, which I'll group by topic rather than by session,
given how some subjects were mentioned in several talks.
On gitops and CI/CD pipelines
Most of the mentions went to FluxCD and ArgoCD. At
this point there was no doubt that gitops is a mature approach and that both Flux and ArgoCD can do an
excellent job. ArgoCD seemed a bit more over-engineered, aiming to be a more general-purpose CD pipeline, while
Flux felt a bit more tailored to simpler gitops setups. I discovered that both have nice web user interfaces
that I wasn't previously familiar with.
However, in two different talks I got the impression that the initial setup of either tool is simple, but that
migrating your current workflow to gitops could be a bumpy ride. That is, the challenge is not deploying
Flux/Argo itself, but moving everything into a state that both humans and Flux/Argo can understand. I also
saw some curious mentions of the config drift that can happen in some cases, even though the goal of gitops is
precisely for that to never happen. Such mentions were usually accompanied by hints on how to handle
the situation by hand.
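As a toy illustration of what checking for drift by hand can look like, here is a minimal sketch using the
Python kubernetes client; the deployment name, namespace and expected image are hypothetical placeholders,
not anything shown at the conference.

```python
# A toy drift check: compare the container image a live Deployment is running
# against the image declared in git. All names here ("myapp", "default",
# registry URL) are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
apps = client.AppsV1Api()

expected_image = "registry.example.org/myapp:1.2.3"  # what git says should be running

live = apps.read_namespaced_deployment(name="myapp", namespace="default")
live_image = live.spec.template.spec.containers[0].image

if live_image != expected_image:
    print(f"drift: cluster runs {live_image}, git declares {expected_image}")
else:
    print("no drift detected")
```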
Worth mentioning: I missed practical information about one of the key pieces of this whole gitops story:
building container images. Most of the showcased scenarios were using pre-built container images, so in that
sense they were simple. Building images and pushing them to an image registry is one of the two key points we
would need to solve in Toolforge Kubernetes if adopting gitops.
In general, even though gitops was already on our radar for
Toolforge Kubernetes,
I think it climbed a few steps up my priority list after the conference.
Another learning was this site: https://opengitops.dev/.
On etcd, performance and resource management
I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking
about the exact
same problems we
have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to disk
latency, IO pressure and network throughput. Even though
the Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very
interesting to see others' approaches to the same set of challenges.
I learned how most Kubernetes components and apps can overload the api-server, because even the api-server
talks to itself. Simple things like
kubectl
may have a completely different impact on the API depending on
usage, for example when listing the whole set of objects (very expensive) versus a single object.
The conclusion was to avoid hitting the api-server with LIST calls, and to use ResourceVersion, which
avoids full dumps from etcd (full dumps being, by the way, the default with bare kubectl get
calls). I
already knew some of this, and the jobs-framework-emailer, for example, was already making use of this
ResourceVersion functionality.
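To make the difference concrete, here is a minimal sketch with the Python kubernetes client, assuming a
hypothetical namespace name: passing resource_version="0" on a list lets the api-server answer from its
watch cache instead of doing a consistent read against etcd.

```python
# Two ways of listing pods. A plain LIST forces the api-server to do a
# consistent (quorum) read against etcd; resource_version="0" lets it answer
# from its own watch cache, which is much cheaper for etcd, at the cost of
# possibly slightly stale data. The namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Expensive: full, consistent dump backed by etcd.
pods = core.list_namespaced_pod(namespace="tool-example")

# Cheaper: served from the api-server cache ("any" consistency semantics).
cached = core.list_namespaced_pod(namespace="tool-example", resource_version="0")

print(len(pods.items), len(cached.items))
```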
There have been a lot of improvements on the performance side of Kubernetes in recent times, or more
specifically, in how resources are managed and used by the system. I saw a review of resource management from
the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware
scheduling decisions and dynamic resource claims (changing the pod resource claims without
re-defining/re-starting the pods).
On cluster management, bootstrapping and multi-tenancy
I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers
themselves. This was of interest to me because as of today we use it for
Toolforge. They shared all
the latest developments and improvements, and the plans and roadmap for the future, with a special mention of
something they called the "kubeadm operator", apparently capable of auto-upgrading the cluster, auto-renewing
certificates and such.
I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was
the best, from the point of view of being a well-established and well-known workflow, plus having a very
active contributor base. The kubeadm developers invited the audience to submit feature requests,
so I did.
The different talks confirmed that the basic unit for multi-tenancy in Kubernetes is the namespace. Any
serious multi-tenant usage should leverage this. There were some ongoing conversations, in official sessions
and in the hallway, about the right tool to implement K8s-within-K8s, and vcluster
was mentioned enough times for me to be convinced it was the right candidate. This was despite my impression
that multicluster / multicloud setups are regarded as hard topics in the general community. I would definitely
like to play with it sometime down the road.
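As a small illustration of the namespace-as-tenancy-unit idea, here is a sketch using the Python kubernetes
client that creates a tenant namespace with a ResourceQuota attached; the tenant name and the quota values
are hypothetical.

```python
# Namespace-per-tenant sketch: create a namespace for a tenant and attach a
# ResourceQuota so one tenant cannot starve the others. The tenant name and
# quota values are hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

tenant = "tenant-alice"

core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=tenant))
)

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="compute-quota", namespace=tenant),
    spec=client.V1ResourceQuotaSpec(
        hard={"requests.cpu": "2", "requests.memory": "4Gi", "pods": "20"}
    ),
)
core.create_namespaced_resource_quota(namespace=tenant, body=quota)
```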
On networking
I attended a couple of basic sessions that helped me really understand how Kubernetes instruments the
network to achieve its goals. The conference program had sessions covering topics ranging from network
debugging recommendations and CNI implementations to IPv6 support. Also, one of the keynote sessions had a
reference to how kube-proxy is not able to perform NAT for SIP connections, which is interesting because I
believe Netfilter Conntrack could do it if properly configured. One of the conclusions on the CNI front was
that Calico has massive community adoption (in Netfilter mode), which is reassuring, especially considering
it is the one we use for Toolforge Kubernetes.
On jobs
I attended a couple of talks related to HPC/grid-like usages of Kubernetes. I was truly impressed
by some folks out there who were using Kubernetes Jobs at massive scale, for example to train machine learning
models and run other fancy AI projects.
It is acknowledged in the community that the early implementations of things like Jobs and CronJobs had some
limitations that are now gone, or at least greatly improved. Some new functionality has been added as
well. Indexed Jobs, for example, give each completion of a Job a number (index) so it can process a chunk of a
larger batch of data based on that index. This would allow for full grid-like features such as sequential (or
again, indexed) processing, coordination between Jobs, and more graceful Job restarts. My first reaction was:
is that something we would like to enable in the Toolforge Jobs Framework?
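To give an idea of what this looks like in practice, here is a hedged sketch that creates an Indexed Job with
the Python kubernetes client; the Job name, image and chunk-processing command are hypothetical placeholders.

```python
# Indexed Job sketch: 10 completions, up to 3 running in parallel. Kubernetes
# injects the JOB_COMPLETION_INDEX environment variable into each pod of an
# Indexed Job, so each one can grab its own chunk of a larger batch. Job name
# and image are hypothetical.
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="indexed-batch-example"),
    spec=client.V1JobSpec(
        completion_mode="Indexed",
        completions=10,
        parallelism=3,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="registry.example.org/batch-worker:latest",
                        command=["sh", "-c",
                                 "echo processing chunk $JOB_COMPLETION_INDEX"],
                    )
                ],
            )
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)
```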
On policy and security
A surprisingly good amount of sessions covered interesting topics related to policy and security. It was nice
to learn two realities:
- kubernetes is capable of doing pretty much anything security-wise and create greatly secured environments.
- it does not by default. The defaults are not security-strict on purpose.
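As one example of the opt-in hardening this implies, here is a small sketch with the Python kubernetes client
that labels a namespace for Pod Security Admission; the namespace name is hypothetical, the label keys are the
standard PSA ones.

```python
# Opt-in hardening sketch: label a namespace so Pod Security Admission
# enforces the "restricted" profile on every new pod created in it. The
# namespace name is hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

psa_patch = {
    "metadata": {
        "labels": {
            "pod-security.kubernetes.io/enforce": "restricted",
            "pod-security.kubernetes.io/warn": "restricted",
            "pod-security.kubernetes.io/audit": "restricted",
        }
    }
}
core.patch_namespace(name="tenant-alice", body=psa_patch)
```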
- Keynote
- Node Resource Management: The Big Picture - Sascha Grunert & Swati Sehgal, Red Hat; Alexander Kanevskiy, Intel; Evan Lezar, NVIDIA; David Porter, Google.
- How We Securely Scaled Multi-Tenancy with VCluster, Crossplane, and Argo CD - Ilia Medvedev & Kostis Kapelonis, Codefresh. (Couldn't really attend, room full)
- Flux Beyond Git: Harnessing the Power of OCI - Stefan Prodan & Hidde Beydals, Weaveworks. (Couldn't really attend, room full)
- Tutorial: Measure Twice, Cut Once: Dive Into Network Foundations the Right Way! - Marino Wijay & Jason Skrzypek, Solo.io
- Argo CD Core - A Pure GitOps Agent for Kubernetes - Alexander Matyushentsev, Akuity & Leonardo Luz Almeida, Intuit
- Kubeadm Deep Dive - Rohit Anand, NEC & Paco Xu, DaoCloud
- Cloud Operate Multi-Tenancy Service Mesh with ArgoCD in Production - Lin Sun, Solo.io & Faseela K, Ericsson Software Technology
- Keynote
- Setting up Etcd with Kubernetes to Host Clusters with Thousands of Nodes - Marcel Zięba, Isovalent & Laurent Bernaille, Datadog
- Container Is the New VM: The Paradigm Change No One Explained to You - Marga Manterola, Isovalent & Rodrigo Campos Catelin, Microsoft
- Ephemeral Clusters as a Service with ClusterAPI and GitOps - Alessandro Vozza, Solo.io & Joaquin Rodriguez, Microsoft
- Automating Configuration and Permissions Testing for GitOps with OPA Conftest - Eve Ben Ezra & Michael Hume, The New York Times
- Across Kubernetes Namespace Boundaries: Your Volumes Can Be Shared Now! - Masaki Kimura & Takafumi Takahashi, Hitachi
- Keynote
- Prevent Embarrassing Cluster Takeovers with This One Simple Trick! - Daniele de Araujo dos Santos & Shane Lawrence, Shopify
- Hacking and Defending Kubernetes Clusters: We'll Do It LIVE!!! - Fabian Kammel & James Cleverley-Prance, ControlPlane
- Painless Multi-Cloud to the Edge Powered by NATS & Kubernetes - Tomasz Pietrek & David Gee, Synadia
- Demystifying IPv6 Kubernetes - Antonio Jose Ojea Garcia, Google & Fernando Gont, Yalo
- Open Policy Agent (OPA) Intro & Deep Dive - Charlie Egan, Styra, Inc. (Couldn't really attend, room full)
- Practical Challenges with Pod Security Admission - V Körbes & Christian Schlotter, VMware
- Enabling HPC and ML Workloads with the Latest Kubernetes Job Features - Michał Woźniak, Google & Vanessa Sochat, Lawrence Livermore National Laboratory
- Can You Keep a Secret? on Secret Management in Kubernetes - Liav Yona & Gal Cohen, Firefly