Dietmar Rabich,
Cape Town (ZA), Sea Point, Nachtansicht 2024 1867-70
2,
CC BY-SA 4.0
This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.
Wikimedia Cloud VPS is a service offered by the Wikimedia
Foundation, built using
OpenStack and managed by the Wikimedia Cloud Services
team. It provides cloud computing resources for projects related to the
Wikimedia movement, including virtual machines, databases, storage,
Kubernetes, and DNS.
A few weeks ago, in April 2025,
we were finally able to introduce IPv6 to
the cloud virtual network, enhancing the platform s scalability, security, and future-readiness. This is a major
milestone, many years in the making, and serves as an excellent point to take a moment to reflect on the road that got
us here.
There were definitely a number of challenges that needed to be addressed before we could get into IPv6. This post covers the journey to this
implementation.
The Wikimedia Foundation was an early adopter of the OpenStack technology, and the original OpenStack deployment in the
organization dates back to 2011. At that time, IPv6 support was still nascent and had limited implementation across
various OpenStack components.
In 2012, the Wikimedia cloud users formally requested IPv6 support.
When Cloud VPS was originally deployed, we had set up the network following some of the upstream-recommended patterns:
- nova-networks as the engine in charge of the software-defined virtual network
- using a flat network topology all virtual machines would share the same network
- using a physical VLAN in the datacenter
- using Linux bridges to make this physical datacenter VLAN available to virtual machines
- using a single virtual router as the edge network gateway, also executing a global egress NAT barring some
exceptions, using what was called dmz_cidr mechanism
In order for us to be able to implement IPv6 in a way that aligned with our architectural goals and operational
requirements, pretty much all the elements in this list would need to change. First of all, we needed to migrate from
nova-networks into Neutron,
a migration effort that started in 2017.
Neutron was the more modern component to implement software-defined networks in OpenStack. To facilitate this
transition, we made the strategic decision to backport certain functionalities from nova-networks into Neutron,
specifically
the dmz_cidr mechanism and some egress NAT capabilities.
Once in Neutron, we started to think about IPv6. In 2018 there was an initial attempt to decide on the network CIDR
allocations that Wikimedia Cloud Services would have. This initiative encountered unforeseen challenges
and was subsequently put on hold. We focused on removing the previously
backported nova-networks patches from Neutron.
Between 2020 and 2021, we initiated another
significant network refresh.
We were able to introduce the cloudgw project, as part of a larger effort to rework the Cloud VPS edge network. The new
edge routers allowed us to drop all the custom backported patches we had in Neutron from the nova-networks era,
unblocking further progress. Worth mentioning that the cloudgw router would use nftables as firewalling and NAT engine.
A pivotal decision in 2022 was to
expose the OpenStack APIs to the internet, which
crucially enabled infrastructure management via OpenTofu. This was key in the IPv6 rollout as will be explained later.
Before this, management was limited to Horizon the OpenStack graphical interface or the command-line interface
accessible only from internal control servers.
Later, in 2023, following the OpenStack project s announcement of the deprecation of the neutron-linuxbridge-agent, we
began to
seriously consider migrating to the neutron-openvswitch-agent.
This transition would, in turn, simplify the enablement of tenant networks a feature allowing each OpenStack project
to define its own isolated network, rather than all virtual machines sharing a single flat network.
Once we replaced neutron-linuxbridge-agent with neutron-openvswitch-agent, we were ready to migrate virtual machines to
VXLAN. Demonstrating perseverance, we decided to execute the VXLAN migration in conjunction with the IPv6 rollout.
We
prepared and tested several things, including the rework of the edge
routing to be based on BGP/OSPF instead of static routing. In 2024 we were ready for the initial attempt to deploy
IPv6,
which failed for unknown reasons. There was a full network outage and
we immediately reverted the changes. This quick rollback was feasible due to
our adoption of OpenTofu: deploying IPv6 had
been reduced to a single code change within our repository.
We started an investigation, corrected a few issues, and
increased our network functional testing coverage before trying again. One
of the problems we discovered was that Neutron would enable the enable_snat configuration flag for our main router
when adding the new external IPv6 address.
Finally, in April 2025,
after many years in the making,
IPv6 was successfully deployed.
Compared to the network from 2011, we would have:
- Neutron as the engine in charge of the software-defined virtual network
- Ready to use tenant-networks
- Using a VXLAN-based overlay network
- Using neutron-openvswitch-agent to provide networking to virtual machines
- A modern and robust edge network setup
Over time, the WMCS team has skillfully navigated numerous challenges to ensure our service offerings consistently meet
high standards of quality and operational efficiency. Often engaging in multi-year planning strategies, we have enabled
ourselves to set and achieve significant milestones.
The successful IPv6 deployment stands as further testament to the team s dedication and hard work over the years. I
believe we can confidently say that the 2025 Cloud VPS represents its most advanced and capable iteration to date.
This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.