The X.Org Foundation has successfully completed a migration from Google Kubernetes Engine (GKE) to Packet, which it reckoned “should save us around $30 per day.”
The X.Org Foundation manages a number of key open-source projects including the Wayland graphics protocol, the X.Org server, and the Mesa 3D graphics library. The migration was mentioned in the X.Org board minutes last week.
The brief note has brought closure to a problem that at one time threatened to disrupt the developers’ work. In January 2020, the monthly bill from Google Cloud Platform (GCP) was over $6,000, and the following month board member Daniel Vetter sent out an email explaining that if the costs were not reduced, CI (Continuous Integration) services would have to be cut “somewhere between May and June this year.”
“That would have been a pretty bad drawback for all the projects there,” Benjamin Tissoires, a senior software engineer at Red Hat, told us. His company allowed him the time to investigate.
How did cloud costs get out of control?
The story began back in 2018, when freedesktop.org migrated from a homegrown project hosting infrastructure to one based on the community edition of GitLab, a source control and DevOps system, hosted on GCP. “We politely declined the offer of a license to the pay-for GitLab Enterprise Edition; we wanted to be fully in control of our infrastructure, and on a level playing field with the rest of the open-source community,” said Collabora’s Daniel Stone from the freedesktop.org team.
GitLab offered to sponsor the GCP hosting for an initial period and everything looked good.
Initially the cloud costs were around $350 to $400 per month. The system was popular, new projects came on board, including Mesa, and there was more use of modern development practices like CI.
By March 2019 the bill had risen to over $3,000. A $30,000 grant from Google removed the issue for around eight months, but by the end of 2019, when the grant was spent, it was apparent that something was badly wrong.
“Every time you pull data from the cluster, from Google to anywhere else, you’re paying for it,” said Tissoires. “When we started to run stuff with GitLab we had more and more jobs running on the CI and it turns out that those jobs were drawing a lot more data from Google Kubernetes cluster.”
Tissoires presented the full figures for January 2020 in a presentation [PDF] at the X.Org developer conference. His breakdown showed networking (data transfer) costs of $3,699, much more than the compute cost of $2,258. The rest of the bill, a modest $360, was for logging and the cloud storage required by Kubernetes.
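To put those January 2020 figures in proportion, the arithmetic can be sketched in a few lines of Python. The three amounts are the ones quoted above; the dictionary keys and the `share` helper are names made up for this illustration.

```python
# January 2020 GCP bill components in US dollars, as quoted in the article.
# The key names are illustrative labels, not billing categories from GCP.
bill = {
    "networking": 3699,
    "compute": 2258,
    "logging_and_storage": 360,
}

# Sum of the three quoted components.
total = sum(bill.values())


def share(component: str) -> float:
    """Return a component's share of the total bill as a percentage."""
    return 100 * bill[component] / total


if __name__ == "__main__":
    print(f"total: ${total:,}")
    for name in bill:
        print(f"{name}: {share(name):.1f}%")
```

On these numbers the three components add up to $6,317, with data transfer accounting for a little under three fifths of it.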
Bills from GCP tripled between November 2018 and January 2020, though this was disguised by a $30K grant which covered costs for eight months (red bars). SPI stands for Software in the Public Interest. Chart from Benjamin Tissoires’ presentation to the X.Org conference
He set about analysing the usage, which turned out to be split roughly half and half between pulling down container images from the Docker registry and transferring artifacts, the output from project builds.
The setup was complicated because although GitLab was on GCP, the CI runners, which execute the CI jobs, were hosted elsewhere, on Packet (now called Equinix Metal).
Registry operations were reduced by increasing the space for downloaded images, so they could be cached for longer, and by taking into account which images were most likely to be reused.
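The effect of a larger, reuse-aware image cache can be illustrated with a toy model. This is a sketch only, not the freedesktop.org implementation: the `ImageCache` class, its eviction rule (drop the least reused image first, oldest first on a tie), the image names and the sizes are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedImage:
    size_gb: float
    hits: int = 0        # how often the cached copy has been reused
    last_used: int = 0   # logical time of the most recent request


class ImageCache:
    """Hypothetical size-bounded cache that evicts the least reused image first."""

    def __init__(self, capacity_gb: float) -> None:
        self.capacity_gb = capacity_gb
        self.images = {}   # image name -> CachedImage
        self.clock = 0     # logical time, incremented on every request
        self.pulls = 0     # registry downloads, i.e. cache misses

    def used_gb(self) -> float:
        return sum(image.size_gb for image in self.images.values())

    def request(self, name: str, size_gb: float) -> bool:
        """Return True on a cache hit, False if the image had to be pulled."""
        self.clock += 1
        image = self.images.get(name)
        if image is not None:
            image.hits += 1
            image.last_used = self.clock
            return True
        self.pulls += 1
        if size_gb > self.capacity_gb:
            return False  # too large to cache at all; leave the cache untouched
        # Evict the least reused images (oldest first on a tie) until it fits.
        while self.images and self.used_gb() + size_gb > self.capacity_gb:
            victim = min(
                self.images,
                key=lambda n: (self.images[n].hits, self.images[n].last_used),
            )
            del self.images[victim]
        self.images[name] = CachedImage(size_gb, last_used=self.clock)
        return False


def count_pulls(capacity_gb: float, jobs) -> int:
    """Replay a sequence of (image name, size in GB) jobs and count downloads."""
    cache = ImageCache(capacity_gb)
    for name, size_gb in jobs:
        cache.request(name, size_gb)
    return cache.pulls


# Made-up job sequence; names and sizes are purely illustrative.
JOBS = [
    ("mesa-build", 2.0), ("wayland-build", 2.0), ("mesa-build", 2.0),
    ("xserver-build", 2.0), ("mesa-build", 2.0), ("wayland-build", 2.0),
]

if __name__ == "__main__":
    for capacity in (2.0, 4.0, 6.0):
        print(f"{capacity} GB cache: {count_pulls(capacity, JOBS)} registry pulls")
```

In this toy run, growing the cache from 2 GB to 6 GB halves the number of registry pulls for the same six jobs, which is the kind of saving the larger cache was meant to deliver.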
Artifacts were being pushed into Google Cloud storage, often unnecessarily. “When you don’t know the costs, you just push everything. We started to be more careful,” said Tissoires. Costs fell by 50 per cent to $3,000 per month. Not much more could be done while on GCP since the fixed cost of the GKE infrastructure was around $2,500.
A painful move
That was a short-term fix. Packet, which sponsors X.Org usage on its own infrastructure, suggested that it would be better to host GitLab on Kubernetes there as well. “It was painful to migrate,” Tissoires said, “especially because they wanted to migrate without downtime. Which I managed to do… The first step was to deploy a Kubernetes cluster on Packet, that’s when we realised that GKE gives you a lot and if you want to reproduce that, that’s painful.” The Kubernetes distribution they used was the lightweight K3s.
Disregarding Packet’s sponsorship, the cost of VMs is similar to what it was on GCP. Network usage is much cheaper, mainly because it is now within one provider, but also because Packet’s $0.05/GB egress charge is less than Google’s, which defaults to around $0.12/GB.
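For a rough sense of what the difference in egress rates alone is worth, here is a back-of-the-envelope sketch. It makes two loud assumptions that the article does not confirm: that the whole $3,699 networking line for January 2020 was egress billed at a flat $0.12/GB, and that the same volume would be billed at a flat $0.05/GB on Packet. Real cloud pricing is tiered, and most of the actual saving came from keeping traffic inside one provider, so treat the result as an illustration only.

```python
# Per-GB egress rates quoted in the article, in US dollars.
PACKET_RATE = 0.05
GOOGLE_RATE = 0.12  # described in the article as an approximate default

# January 2020 networking charge quoted earlier in the article.
JANUARY_NETWORKING_BILL = 3699


def implied_volume_gb(cost: float, rate_per_gb: float) -> float:
    """Volume that a flat per-GB rate would imply for a given charge."""
    return cost / rate_per_gb


def egress_cost(volume_gb: float, rate_per_gb: float) -> float:
    """Charge for a given volume at a flat per-GB rate."""
    return volume_gb * rate_per_gb


if __name__ == "__main__":
    # ASSUMPTION: the whole networking line was flat-rate egress.
    volume = implied_volume_gb(JANUARY_NETWORKING_BILL, GOOGLE_RATE)
    at_packet = egress_cost(volume, PACKET_RATE)
    print(f"implied volume: {volume:,.0f} GB")
    print(f"same volume at Packet's rate: ${at_packet:,.2f}")
    print(f"difference: ${JANUARY_NETWORKING_BILL - at_packet:,.2f}")
```

Under those assumptions the month's traffic works out at roughly 30TB, and the same volume at Packet's rate would cost a little over $1,500.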
The numbers for X.Org’s usage may seem small from an enterprise perspective, but the exercise is revealing for any user of public cloud. The biggest point, perhaps, is a cost of hybrid or multi-cloud setups that is easy to overlook: network bandwidth. The cost of data transfer within a single cloud provider is generally small by comparison, especially if it is within the same region.
A second key point was the impact of skilled analysis and remediation on costs. Tissoires and his team were able to reduce the bill by half.
There was another, less obvious benefit to this migration. Packet hosting is a much more basic affair than GCP (or other public clouds like AWS and Azure), with fewer managed services, which means it requires more technical knowledge to operate.
Tissoires said he sees this as an advantage. If the sponsorship arrangement with Packet were to end and another migration were needed, X.Org would be better placed than with GCP. “The idea is that we migrated the data once so we can always do it another time, we’ve got the full infrastructure in Kubernetes that we baked, which means that we can [easily] migrate to new machines,” he said, whereas before there was a dependency on GKE specifically.
The resources include “a bunch of scripts to connect two clusters together so we were able to move the data while still keeping the service up,” he said. “We are not completely locked in Packet.”
Similar thinking applied when the option of using GitLab’s cloud platform was considered and rejected. “We like to be controlling the data,” said Tissoires, “and we want to run only open-source software. On GitLab cloud you are running the Enterprise edition.” ®