Search Results: "glandium"

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post; while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance.

From CVS to DVCS

From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to BitKeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community, for the same needs, and with the same shared experience with BitKeeper, they were both similar distributed version control systems. Interestingly enough, several other DVCSes existed as well. In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyway, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs.
Personal preferences

Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for). Other people had similarly created their own mirrors, with the same or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but now I was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a long-time Git user by then, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla.

Git incursion

In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of numbers, though, I don't think a lot of people were doing that, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves.
I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That allowed the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time passed and the size and depth of the Mercurial repository grew, these tools were showing their limits and were too slow for my taste, especially for the initial clone.

Boot to Git

In the same time frame, Mozilla ventured into the mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third-party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginnings. And its sustainability as a platform also wasn't a given, it being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured into setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server.

Git slowly creeping in Firefox build tooling

Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OS X back then), and that script would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to the current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, so bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016.
From gecko-dev to git-cinnabar

Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under the hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, it was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start from a local clone of the gecko-dev repository, rather than cloning from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need to sync in both directions between Git and Mercurial came up (we had only ever synced from Mercurial to Git), I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git; nothing of the sort has happened since the switch).

One Firefox repository to rule them all

For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, a new ESR repository is now added every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developers' work lands before being merged into mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical.
It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. Seven years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as its base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example: if you aren't careful and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g., one from beta, or some other branch, depending on which one was last updated.

Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, and it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent, because that wouldn't cover the user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want.
Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent. Fortunately, the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive, especially when pulling a large number of changesets, which, incidentally, is what cloning is. "But there is a solution for clones", you might say, which is true. That's clonebundles, which offloads the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We had actually laid the groundwork to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers (yes, plural), because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not even mentioning the Try repository with its tens of thousands of heads, which brings its own set of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016.

The growing difficulty of maintaining the status quo

Slowly but steadily, in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, the resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the latest version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by way of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard that the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories.
And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to get a little out of hand in the past few years, but this was mitigated in a similar manner as the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago; I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar-specific issues a Me problem, rather than a Mozilla problem.

Taking the leap

I can't speak for the people who made the proposal to move to Git, nor for the people who gave it a green light. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think this was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21-year-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial went their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now predominantly using Git.
I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor have the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems.

But... GitHub?

I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and Git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, and literal tens of millions of repositories on the platform overall, the pragmatist in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a, I think, self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security-related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves.
The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then.

So, what's next?

The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here and say this: the earlier you can switch to Git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, those who want to take the leap off Mercurial right now can already do so. As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
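If you want to double-check where a recreated branch points, the mapping can also be queried in the other direction (the branch name is a placeholder, as above):
$ git cinnabar git2hg $(git rev-parse <branch_name>)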
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it there; it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to run ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow switching from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far).

What about git-cinnabar?

With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this raises the question of whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script; it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So day-to-day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise for that very reason. So, no, I don't expect git-cinnabar to die along with Mercurial use at Mozilla, but I can't really promise anything either.

Final words

That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and how, had we waited a little longer, we would have picked some yet-to-come new horse.
But hey, that's the tech cycle for you.

1 April 2023

Mike Hommey: Announcing git-cinnabar 0.6.0

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.11? What's new since 0.6.0rc2?
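For those who have never used it, a minimal session looks something like the following (the repository URL is only an illustration; any Mercurial repository reachable over HTTP(S) or SSH should work the same way):
$ git clone hg::https://hg.mozilla.org/mozilla-unified firefox
$ cd firefox
$ git pull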

29 October 2022

Mike Hommey: Announcing git-cinnabar 0.5.11 and 0.6.0rc2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get version 0.5.11 on github. Or get version 0.6.0rc2 on github. What's new in 0.5.11? What's new in 0.6.0rc2?

3 October 2022

Mike Hommey: Announcing git-cinnabar 0.6.0rc1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.10?

30 July 2022

Mike Hommey: Announcing git-cinnabar 0.5.10

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.9?

15 July 2022

Mike Hommey: Announcing git-cinnabar 0.5.9

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.8?

19 November 2021

Mike Hommey: Announcing git-cinnabar 0.5.8

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.7?

31 March 2021

Mike Hommey: Announcing git-cinnabar 0.5.7

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.6?

10 March 2021

Mike Hommey: 5 years ago, Firefox (re)entered Debian

5 years ago today, I was declaring Iceweasel dead, and Firefox was making a comeback in Debian. I hadn't planned to make this post, and in fact, I thought it had been much longer. But coincidentally, I was binge-watching Mr. Robot recently, which prominently featured Iceweasel. [screenshot: the iceweasel command in a terminal] Mr. Robot is set in the year 2015, and I was surprised that Iceweasel was being used, which led me to search for that post where I announced Firefox was back, and to realize that we were close to the 5 years mark. Well, we are at the 5 years mark now. [screenshot: the iceweasel start page] I'd normally say time flies, but it turns out it hasn't flown as much as I thought it did. I wonder if the interminable pandemic is to blame for that.

12 November 2020

Mike Hommey: Announcing git-cinnabar 0.5.6

Please partake in the git-cinnabar survey. Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.5?

30 August 2020

Mike Hommey: [Linux] Disabling CPU turbo, cores and threads without rebooting

[Disclaimer: this has been sitting as a draft for close to three months; I forgot to publish it, and this is now finally done.] In my previous blog post, I built Firefox in a number of different configurations where I'd disable the CPU turbo, some of its cores, or some of its threads. That is something that was traditionally done via the BIOS, but rebooting between each attempt is not really a great experience. Fortunately, the Linux kernel provides a large number of knobs that allow this at runtime.

Turbo

This is the most straightforward:
$ echo 0 > /sys/devices/system/cpu/cpufreq/boost
Re-enable with
$ echo 1 > /sys/devices/system/cpu/cpufreq/boost
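To check which state you are currently in, reading the same knob back is enough:
$ cat /sys/devices/system/cpu/cpufreq/boost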
CPU frequency throttling

Even though I haven't mentioned it, I might as well add this briefly. There are many knobs to tweak frequency throttling, but assuming your goal is to disable throttling and set the CPU frequency to its fastest non-Turbo frequency, this is how you do it:
$ echo performance > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor
where $n is the id of the core you want to change, so if you want to do that for all the cores, you need to do it for cpu0, cpu1, etc. (a loop covering all cores at once is sketched below). Re-enable with:
$ echo ondemand > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor
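If you would rather apply this to every core in one go instead of one $n at a time, a small shell loop over the corresponding sysfs entries does it (a sketch using the same paths as above; swap ondemand for performance to go the other way):
$ for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo ondemand > "$f"; done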
(assuming this was the value before you changed it; ondemand is usually the default)

Cores and Threads

This one requires some attention, because you cannot assume anything about the CPU numbers. The first thing you want to do is to check those CPU numbers. You can do so by looking at the physical id and core id fields in /proc/cpuinfo, but the output from lscpu --extended is more convenient, and looks like the following:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    3700.0000 2200.0000
1   0    0      1    1:1:1:0       yes    3700.0000 2200.0000
2   0    0      2    2:2:2:0       yes    3700.0000 2200.0000
3   0    0      3    3:3:3:0       yes    3700.0000 2200.0000
4   0    0      4    4:4:4:1       yes    3700.0000 2200.0000
5   0    0      5    5:5:5:1       yes    3700.0000 2200.0000
6   0    0      6    6:6:6:1       yes    3700.0000 2200.0000
7   0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
32  0    0      0    0:0:0:0       yes    3700.0000 2200.0000
33  0    0      1    1:1:1:0       yes    3700.0000 2200.0000
34  0    0      2    2:2:2:0       yes    3700.0000 2200.0000
35  0    0      3    3:3:3:0       yes    3700.0000 2200.0000
36  0    0      4    4:4:4:1       yes    3700.0000 2200.0000
37  0    0      5    5:5:5:1       yes    3700.0000 2200.0000
38  0    0      6    6:6:6:1       yes    3700.0000 2200.0000
39  0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
Now, this output is actually the ideal case, where pairs of CPUs (virtual cores) on the same physical core are always n, n+32, but I've had them be pseudo-randomly spread in the past, so be careful. To turn off a core, you want to turn off all the CPUs with the same CORE identifier. To turn off a thread (virtual core), you want to turn off one CPU. On machines with multiple sockets, you can also look at the SOCKET column. Turning off one CPU is done with:
$ echo 0 > /sys/devices/system/cpu/cpu$n/online
Re-enable with:
$ echo 1 > /sys/devices/system/cpu/cpu$n/online
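For instance, with the example layout above, taking the whole of physical core 1 offline means offlining both of its sibling CPUs, 1 and 33 (your numbers will differ, so check the CORE column first):
$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 0 > /sys/devices/system/cpu/cpu33/online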
Extra: CPU sets

CPU sets are a feature of Linux's cgroups. They allow restricting groups of processes to a set of cores. The first step is to create a group like so:
$ mkdir /sys/fs/cgroup/cpuset/mygroup
Please note you may already have existing groups, and you may want to create subgroups. You can do so by creating subdirectories. Then you can configure which CPUs/cores/threads you want processes in this group to run on:
$ echo 0-7,16-23 > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
The value you write in this file is a comma-separated list of CPU/core/thread numbers or ranges. 0-3 is the range for CPU/core/thread 0 to 3 and is thus equivalent to 0,1,2,3. The numbers correspond to /proc/cpuinfo or the output from lscpu as mentioned above. There are also memory aspects to CPU sets that I won't detail here (because I don't have a machine with multiple memory nodes), but you can start with:
$ cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems
Now you're ready to assign processes to this group:
$ echo $pid >> /sys/fs/cgroup/cpuset/mygroup/tasks
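A convenient variant, rather than chasing individual PIDs, is to put your interactive shell itself in the group; processes started from that shell then inherit the restriction (the group name is the one created above):
$ echo $$ >> /sys/fs/cgroup/cpuset/mygroup/tasks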
There are a number of tweaks you can do to this setup; I invite you to check out the cpuset(7) manual page. Disabling a group is a little involved. First you need to move the processes to a different group:
$ while read pid; do echo $pid > /sys/fs/cgroup/cpuset/tasks; done < /sys/fs/cgroup/cpuset/mygroup/tasks
Then disassociate the CPU and memory nodes:
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems
And finally remove the group:
$ rmdir /sys/fs/cgroup/cpuset/mygroup

23 April 2020

Mike Hommey: Announcing git-cinnabar 0.5.5

Please partake in the git-cinnabar survey. Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.4?

15 June 2017

Mike Hommey: Announcing git-cinnabar 0.5.0 beta 2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.0 beta 1?

3 June 2017

Mike Hommey: Announcing git-cinnabar 0.5.0 beta 1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.4.0?

1 April 2017

Mike Hommey: git-cinnabar experimental features

Since version 0.4.0, git-cinnabar has a few hidden experimental features. Two of them are available in 0.4.0, and a third was recently added on the master branch. The basic mechanism to enable experimental features is to set a preference in the git configuration with a comma-separated list of features to enable, or all, for all of them. That preference is cinnabar.experiments, and any means to set a git configuration can be used. But what features are there?

wire

In order to talk to Mercurial repositories, git-cinnabar normally uses the mercurial python modules. This experimental feature allows accessing Mercurial repositories without using the mercurial python modules. It then relies on git-cinnabar-helper to connect to the repository through the mercurial wire protocol. As of version 0.4.0, the feature is automatically enabled when Mercurial is not installed.

merge

Git-cinnabar currently doesn't allow pushing merge commits. The main reason for this is that generating the correct mercurial data for those merges is tricky, and needs to be gotten right. In version 0.4.0, enabling this feature allows pushing merge commits as long as the parent commits are available on the mercurial repository. If they aren't, you need to first push them independently, and then push the merge. On current master, that limitation doesn't exist anymore; you can just push everything in one go. The main caveat with this experimental support for pushing merges is that it currently doesn't handle the case where a file was moved on one of the branches the same way mercurial would (i.e. the information would be lost to mercurial users).

clonebundles

As of mercurial 3.6, Mercurial servers can opt in to providing pre-generated bundles, which, when clients support it, takes CPU load off the server when a clone is performed. Good for servers, and usually good for clients too when they have a fast network connection, because downloading a pre-generated bundle is usually faster than waiting for the server to generate one. As of a few days ago, the master branch of git-cinnabar supports cloning using those pre-generated bundles, provided the server advertises them (mozilla-central does).
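As a concrete illustration of the mechanism described above, enabling, say, the clonebundles feature and then cloning a repository that advertises bundles could look like this (a sketch; the preference name is the one given earlier, and the URL is just an example):
$ git config --global cinnabar.experiments clonebundles
$ git clone hg::https://hg.mozilla.org/mozilla-central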

Mike Hommey: Progress on git-cinnabar memory usage

This all started when I figured out that git-cinnabar was using crazy amounts of memory when cloning mozilla-central. That pointed to memory allocation patterns that triggered a suboptimal behavior in the glibc memory allocator, and, while overall, git-cinnabar wasn't really abusing memory all things considered, it happened to be realloc()ating way too much. It also turned out that recent changes on the master branch had made most uses of fast-import synchronous, making the whole process significantly slower. This is where we started from on 0.4.0, and on the master branch as of be75326. An interesting thing to note here is that the glibc allocator runaway memory use was, this time, more pronounced on 0.4.0 than on master. It was the opposite originally, but as I mentioned in the past, ASLR is making it not happen exactly the same way each time. While I'm here, one thing I failed to mention in the previous posts is that all these measurements were done by cloning a local mercurial clone of mozilla-central, served from localhost via HTTP to eliminate the download time from hg.mozilla.org. And while mozilla-central itself has received new changesets since the first post, the local clone has not been updated, such that all subsequent clone tests I did were cloning the exact same repository under the exact same circumstances. After the last blog post, I focused on the low hanging fruit identified so far, and this is how things now look on the master branch as of 35c18e7. So where does that put us? On a more granular level, what this means is that there's still room for improvement. But at this point, I'd rather focus on other things. Logging all the memory allocations with the python allocator disabled still resulted in a 6.5GB compressed log file, containing 2.6 billion calls to malloc, calloc, free and realloc (down from 2.7 billion in be75326). The number of allocator calls done by the git-remote-hg process is down to 2.25 billion (from 2.34 billion in be75326). Surprisingly, while more things were moved to the helper, it still made fewer allocations than in be75326: 345 million, down from 363 million. Presumably, this is because the number of commands processed by the fast-import code was reduced. Let's now take a look at the various metrics we analyzed previously (the horizontal axis represents the number of allocator calls that happened before the measurement). A few observations to make here. [*] The upper bound is the sum of all sizes ever given to malloc, calloc, realloc etc., and the lower bound is the same, but removing the size of allocations passed as input to realloc (in practical terms, this pretends reallocs never happened and that the final size for a given reallocated pointer is the one that counts). So presumably, some of the changes led to more short-lived allocations. Considering python uses its own allocator for sizes smaller than 512 bytes, it's probably not so much of a problem. But let's look at the distribution of buffer sizes (including all sizes given to realloc). (Bucket size is 16 bytes) What is not completely obvious from the logarithmic scale is that, in fact, 98.4% of the allocations are less than 512 bytes with the current master (35c18e7), and they were 95.5% with be75326. Interestingly, though, in absolute numbers, there are fewer allocations smaller than 512 bytes in current master than in be75326 (1,194,268,071 vs 1,214,784,494). This suggests the extra allocations that happen during some phases are larger than that.
There are clearly fewer allocations across the board (apart from very few exceptions), and close to an order of magnitude fewer allocations larger than 1MiB. In fact, widening the bucket size to 32KiB shows an order of magnitude difference (or close) for most buckets. An interesting thing to note is how some sizes are largely overrepresented in the data with buckets of 16 bytes, like 768, 1104, 2048, 4128, with other smaller bumps for e.g. 2144, 2464, 2832, 3232, 3696, 4208, 4786, 5424, 6144, 6992, 7920. While some of those are powers of 2, most aren't, and some of them may actually represent objects sized with a power of 2, but that have an extra PyObject overhead. While looking at allocation stats, I got to wonder what the lifetimes of those allocations looked like. So I scanned the allocator logs and measured the distance between when an allocation is made and when it is freed, ignoring reallocs. To give a few examples of what I mean, the following allocation for p gets a lifetime of 0:
void *p = malloc(42);
free(p);
The following gets a lifetime of 1:
void *p = malloc(42);
void *other = malloc(42);
free(p);
And the following gets a lifetime of 1 as well:
void *p = malloc(42);
p = realloc(p, 84);
free(p);
(that is, it is not counted as two malloc/free pairs) The further away the free is from the corresponding malloc, the larger the lifetime. And the largest possible lifetime is the total number of allocator function calls minus two, in the hypothetical case where the very first allocation is freed as the very last (minus two because we defined the lifetime as the distance). All in all, what the data shows is that we're definitely in a better place now than we used to be a few days ago, and that there is still work to do on the memory front. So at this point, I won't look any deeper into the memory usage of the git-remote-hg python process, and will instead focus on the planned metadata storage changes. They will allow sharing the metadata more easily (allowing a faster and more straightforward gecko-dev graft), and will allow importing manifests earlier, which, as mentioned already, will help reduce memory use, but, more importantly, will allow doing more actual work while downloading the data. On slow networks, this is crucial to make clones and pulls faster.

23 March 2017

Mike Hommey: Why is the git-cinnabar master branch slower to clone?

Apart from the memory considerations, one thing shown by the data presented in the "When the memory allocator works against you" post, which I haven't touched on in the followup posts, is that there is a large difference in the time it takes to clone mozilla-central with git-cinnabar 0.4.0 vs. the master branch. One thing that was mentioned in the first followup is that reducing the amount of realloc and substring copies made the cloning more than 15 minutes faster on master. But the same code exists in 0.4.0, so this isn't part of the difference. So what's going on? Looking at the CPU usage during the clone, on 0.4.0 and on master, is enlightening. (Note: the data gathering is flawed in some ways, which explains why the git-remote-hg process goes above 100%, which is not possible for this python process. The data is however good enough for the high level analysis that follows, so I didn't bother to get something more accurate.) On 0.4.0, the git-cinnabar-helper process was saturating one CPU core during the File import phase, and the git-remote-hg process was saturating one CPU core during the Manifest import phase. Overall, the sum of both processes usually used more than one and a half cores. On master, however, the total of both processes barely uses more than one CPU core. What happened? This and that happened. Essentially, before those changes, git-remote-hg would send instructions to git-fast-import (technically, git-cinnabar-helper, but in this case it's only used as a wrapper for git-fast-import), and use marks to track the git objects that git-fast-import created. After those changes, git-remote-hg asks git-fast-import for the git object SHA1 of objects it just asked to be created. In other words, those changes replaced something asynchronous with something synchronous: while it used to be possible for git-remote-hg to work on the next file/manifest/changeset while git-fast-import was working on the previous one, it now waits. The changes helped simplify the python code, but made the overall clone process much slower. If I'm not mistaken, the only real use for that information is for the mapping of mercurial to git SHA1s, which is actually rarely used during the clone, except at the end, when storing it. So what I'm planning to do is to move that mapping to the git-cinnabar-helper process, which, incidentally, will kill not 2, but 3 birds with 1 stone. So expect git-cinnabar 0.5 to get moar faster, and to use moar less memory.

Mike Hommey: Analyzing git-cinnabar memory use

In the previous post, I was looking at the allocations git-cinnabar makes. While I had the data, I figured I'd also look at how the memory use correlates with expectations based on repository data, to put things in perspective. As a reminder, this is what the allocations look like (the horizontal axis being the number of allocator function calls). There are 7 different phases happening during a git clone using git-cinnabar, most of which can easily be identified on the graph above. Overall, except for the stupid spike during the final phase, the manifest DAG, and the glibc allocator runaway memory use described in previous posts, there is nothing terribly bad with the git-cinnabar memory usage, all things considered. Mozilla-central is just big. The spike is already half addressed, and work is under way for the glibc allocator runaway memory use. The manifest DAG, interestingly, is actually mostly useless. It's only used to track the heads of the DAG, and it's very much possible to track heads of a DAG without actually storing the entire DAG. In fact, that's what git-cinnabar already does for changeset heads, so we would only need to do the same for manifest heads. One could argue that the 1.4GB of raw RevChunk data we're keeping in memory for later use could be kept on disk instead. I haven't done this so far because I didn't want to have to handle temporary files (and answer questions like "where to put them?", "what if there isn't enough disk space there?", "what if disk access is slow?", etc.). But the majority of this data is from manifests. I'm already planning changes in how git-cinnabar stores manifest data that will actually allow importing them directly, instead of keeping them in memory until files are imported. This would instantly remove 1.18GB of memory usage. The downside, however, is that this would be more CPU intensive: importing changesets will require creating the corresponding git trees, and getting the stored manifest data. I think it's worth it, though. Finally, one thing that isn't obvious here, but that was found while analyzing why RSS would be going up despite memory usage going down, is that git-cinnabar is doing way too many reallocations and substring allocations. So let's look at two metrics that hopefully will highlight the problem: the total amount of memory ever requested from the allocator, and the same total excluding the sizes passed as input to realloc. Assuming all the requested memory is filled at some point, the former gives us an upper bound to the amount of memory that is ever filled or copied (the amount that would be filled if no realloc was ever in-place), while the latter gives us a lower bound (the amount that would be filled or copied if all reallocs were in-place). Ideally, we'd want the upper and lower bounds to be close to each other (indicating few realloc calls), and the total amount at the end of the process to be as close as possible to the amount of data we're handling (which we've seen is around 3TB). And this is clearly bad. Like, really bad. But we already knew that from the previous post, although it's nice to put numbers on it. The lower bound is about twice the amount of data we're handling, and the upper bound is more than 10 times that amount. Clearly, we can do better. We'll see how things evolve after the necessary code changes happen. Stay tuned.

22 March 2017

Mike Hommey: When the memory allocator works against you, part 2

This is a followup to the "When the memory allocator works against you" post from a few days ago. You may want to read that one first if you haven't, and come back. In case you don't or didn't read it, it was all about memory consumption during a git clone of the mozilla-central mercurial repository using git-cinnabar, and how the glibc memory allocator is using more than one would expect. This post is going to explore how/why it's happening. I happen to have written a basic memory allocation logger for Firefox, so I used it to log all the allocations happening during a git clone exhibiting the runaway memory increase behavior (using a python that doesn't use its own allocator for small allocations). The result was a 6.5 GB log file (compressed with zstd; 125 GB uncompressed!) with 2.7 billion calls to malloc, calloc, free, and realloc, recorded across (mostly) 2 processes (the python git-remote-hg process and the native git-cinnabar-helper process; there are other short-lived processes involved, but they do less than 5000 calls in total). The vast majority of those 2.7 billion calls is done by the python git-remote-hg process: 2.34 billion calls. We'll only focus on this process. Replaying those 2.34 billion calls with a program that reads the log made it possible to reproduce the runaway memory increase behavior to some extent. I went an extra mile and modified glibc's realloc code in memory so it doesn't call memcpy, to make things faster. I also ran under setarch x86_64 -R to disable ASLR for reproducible results (two consecutive runs return the exact same numbers, which doesn't happen with ASLR enabled). I also modified the program to report the number of live allocations (allocations that haven't been freed yet), and the cumulated size of the actually requested allocations (that is, the sum of all the sizes given to malloc, calloc, and realloc calls for live allocations, as opposed to what the memory allocator really allocated, which can be more, per malloc_usable_size). RSS was not tracked because the allocations are never filled, to make things faster, such that pages for large allocations are never dirty, and RSS doesn't grow as much because of that. Full disclosure: it turns out the "system bytes" and "in-use bytes" numbers I had been collecting in the previous post were smaller than what they should have been, and were excluding memory that the glibc memory allocator would have mmap()ed. That however doesn't affect the trends that had been witnessed. The data below is corrected. (Note that in the graph above and the graphs that follow, the horizontal axis represents the number of allocator function calls performed.) While I was here, I figured I'd check how mozjemalloc performs, and it has a better behavior (although it has more overhead). What doesn't appear on this graph, though, is that mozjemalloc also tells the OS to drop some pages even if it keeps them mapped (madvise(MADV_DONTNEED)), so in practice, it is possible the actual RSS decreases too. And jemalloc 4.5 looks like it has better memory usage than mozjemalloc for this use case, but its stats are being thrown off at some point; I'll have to investigate. Going back to the first graph, let's take a closer look at what the allocations look like when the "system bytes" number is increasing a lot. The highlights in the following graphs indicate the range the next graph will be showing.
So what we have here is a bunch of small allocations (small enough that they don't seem to move the "requested" line; most under 512 bytes, so under normal circumstances, they would be allocated by python, a few between 512 and 2048 bytes), and a few large allocations, one of which triggers a bump in memory use. What can appear weird at first glance is that we have a large allocation not requiring more system memory, later followed by a smaller one that does. What the allocations actually look like is the following:

void *ptr0 = malloc(4850928); // #1391340418
void *ptr1 = realloc(some_old_ptr, 8000835); // #1391340419
free(ptr0); // #1391340420
ptr1 = realloc(ptr1, 8000925); // #1391340421
/* ... */
void *ptrn = malloc(879931); // #1391340465
ptr1 = realloc(ptr1, 8880819); // #1391340466
free(ptrn); // #1391340467
As it turns out, inspecting all the live allocations at that point, while there was a hole large enough to do the first two reallocs (the second actually happens in place), at the point of the third one, there wasn't a large enough hole to fit 8.8MB. What inspecting the live allocations also reveals, is that there is a large number of large holes between all the allocated memory ranges, presumably coming from previous similar patterns. There are, in fact, 91 holes larger than 1MB, 24 of which are larger than 8MB. It's the accumulation of those holes that can't be used to fulfil larger allocations that makes the memory use explode. And there aren't enough small allocations happening to fill those holes. In fact, the global trend is for less and less memory to be allocated, so, smaller holes are also being poked all the time. Really, it's all a straightforward case of memory fragmentation. The reason it tends not to happen with jemalloc is that jemalloc groups allocations by sizes, which the glibc allocator doesn't seem to be doing. The following is how we got a hole that couldn't fit the 8.8MB allocation in the first place:

ptr1 = realloc(ptr1, 8880467); // #1391324989; ptr1 is 0x5555de145600
/* ... */
void *ptrx = malloc(232); // #1391325001; ptrx is 0x5555de9bd760; that is 13 bytes after the end of ptr1.
/* ... */
free(ptr1); // #1391325728; this leaves a hole of 8880480 bytes at 0x5555de145600.
All would go well if ptrx was free()d, but it looks like it's long-lived. At least, it's still allocated by the time we reach the allocator call #1391340466. And since the hole is 339 bytes too short for the requested allocation, the allocator has no other choice than to request more memory from the system. What's bothersome, though, is that the allocator chose to allocate ptrx in the space following ptr1, when it allocated similarly sized buffers after allocating ptr1 and before allocating ptrx in completely different places, and while there are plenty of holes in the allocated memory where it could fit. Interestingly enough, ptrx is a 232 bytes allocation, which means under normal circumstances, python itself would be allocating it. In all likelihood, when the python allocator is enabled, it's allocations larger than 512 bytes that become obstacles to the larger allocations. Another possibility is that the 256KB fragments that the python allocator itself allocates to hold its own allocations become the obstacles (my original hypothesis). I think the former is more likely, though, putting the blame back entirely on glibc's shoulders. Now, it looks like the allocation pattern we have here is suboptimal, so I re-ran a git clone under a debugger to catch when a realloc() for 8880819 bytes happens (the size is peculiar enough that it only happened once in the allocation log). But doing that with a conditional breakpoint is just too slow, so I injected a realloc wrapper with LD_PRELOAD that sends a SIGTRAP signal to the process, so that an attached debugger can catch it. Thanks to the support for python in gdb, it was then possible to pinpoint the exact python instructions that made the realloc() call (it didn't come as a surprise; in fact, that was one of the places I had in mind, but I wanted definite proof):

new = ''
end = 0
# ...
for diff in RevDiff(rev_patch):
    new += data[end:diff.start]
    new += diff.text_data
    end = diff.end
    # ...
new += data[end:]
What happens here is that we're creating a mercurial manifest we got from the server in patch form against a previous manifest. So data contains the original manifest, and rev_patch the patch. The patch essentially contains instructions of the form "replace n bytes at offset o with the content c". The code here just does that in the most straightforward way, implementation-wise, but also, it turns out, possibly the worst way. So let's unroll this loop over a couple iterations:
new = ''
This allocates an empty str object. In fact, this doesn't actually allocate anything, and only creates a pointer to an interned representation of an empty string.
new += data[0:diff.start]
This is where things start to get awful. data is a str, so data[0:diff.start] creates a new, separate, str for the substring. One allocation, one copy. Then it is appended to new. Fortunately, CPython is smart enough, and just assigns data[0:diff.start] to new. This can easily be verified with the CPython REPL:
>>> foo = ''
>>> bar = 'bar'
>>> foo += bar
>>> foo is bar
True
(and this is not happening because the example string is small here; it also happens with larger strings, like 'bar' * 42000) Back to our loop:
new += diff.text_data
Now, new is realloc()ated to have the right size to fit the appended text in it, and the contents of diff.text_data are copied. One realloc, one copy. Let's go for a second iteration:
new += data[diff.end:new_diff.start]
Here again, we're doing an allocation for the substring, and one copy. Then new is realloc()ated again to append the substring, which is an additional copy.
new += new_diff.text_data
new is realloc()ated yet again to append the contents of new_diff.text_data. We now finish with:
new += data[new_diff.end:]
which, again, creates a substring from the data, and then proceeds to realloc()ate new one more freaking time. That's a lot of malloc()s and realloc()s to be doing. After modifying the code to implement the first and fifth items, memory usage during a git clone of mozilla-central looks like the following (with the python allocator enabled): (Note this hasn't actually landed on the master branch yet.) Compared to what it looked like before, this is largely better. But that's not the only difference: the clone was also about 1000 seconds faster. That's more than 15 minutes! But that's not all so surprising when you know the volumes of data handled here. More insight about this is coming in an upcoming post. But while the changes outlined above make the glibc allocator behavior less likely to happen, they don't totally eliminate it. In fact, it seems it is still happening by the end of the manifest import phase. We're still allocating increasingly large temporary buffers because the size of the imported manifests grows larger and larger, and every one of them is the result of patching a previous one. The only way to avoid those large allocations creating holes would be to avoid doing them in the first place. My first attempt at doing that, keeping manifests as lists of lines instead of raw buffers, worked, but was terribly slow. So slow, in fact, that I had to stop a clone early and estimated the process would likely have taken a couple of days. Iterating over multiple generators at the same time, a lot, kills performance, apparently. I'll have to try with significantly less of that.
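For illustration, this is the general shape of a fix for the string building pattern unrolled above: accumulate the fragments in a list and join them once at the end, so the result is allocated and filled a single time. This is only a sketch of the idea, not necessarily the actual change referred to above; RevDiff and the diff attributes are the ones from the earlier snippet.

def apply_patch(data, rev_patch):
    fragments = []
    end = 0
    for diff in RevDiff(rev_patch):
        fragments.append(data[end:diff.start])   # unchanged part before the hunk
        fragments.append(diff.text_data)         # replacement content
        end = diff.end
    fragments.append(data[end:])                 # unchanged tail
    # One allocation for the result; each fragment is copied exactly once,
    # and no intermediate buffer is ever realloc()ated.
    return ''.join(fragments)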

12 March 2017

Mike Hommey: When the memory allocator works against you

Cloning mozilla-central with git-cinnabar requires a lot of memory. Actually too much memory to fit in a 32-bit address space. I hadn't optimized for memory use in the first place. For instance, git-cinnabar keeps sha-1s in memory as hex values (40 bytes) rather than raw values (20 bytes). When I wrote the initial prototype, it didn't matter that much, and while close(ish) to the tipping point, it didn't require more than 2GB of memory at the time. Time passed, and mozilla-central grew. I suspect the recent addition of several thousand commits and files has made things worse. In order to come up with a plan to make things better (short or longer term), I needed data. So I added some basic memory resource tracking, and collected data while cloning mozilla-central. I must admit, I was not ready for what I witnessed. Follow me for a tale of frustrations (plural). I was expecting things to have gotten worse on the master branch (which I used for the data collection) because I am in the middle of some refactoring and did many changes that I was suspecting might have affected memory usage. I wasn't, however, expecting to see the clone command using 10GB(!) memory at peak usage across all processes. (Note, those memory sizes are RSS, minus "shared".) It also was taking an unexpectedly long time, but then, I hadn't cloned a large repository like mozilla-central from scratch in a while, so I wasn't sure if it was just related to its recent growth in size or otherwise. So I collected data on 0.4.0 as well. Less time spent, less memory usage... ok. There's definitely something wrong on master. But wait a minute, that slope from ~2GB to ~4GB on the git-remote-hg process doesn't actually make any kind of sense. I mean, I'd understand it if it were starting and finishing with the Import manifest phase, but it starts in the middle of it, and ends long before it finishes. WTH? First things first, since RSS can be a variety of things, I checked /proc/$pid/smaps and confirmed that most of it was, indeed, the heap. That's the point where you reach for Google, type something like "python memory profile" and find various tools. One from the results that I remembered having used in the past is guppy's heapy. Armed with pdb, I broke execution in the middle of the slope, and tried to get memory stats with heapy. SIGSEGV. Ouch. Let's try something else. I reached out to objgraph and pympler. SIGSEGV. Ouch again. Tried working around the crashes for a while (too long a while, retrospectively, hindsight is 20/20), and was somehow successful at avoiding them by peeking at a smaller set of objects. But whatever I did, despite being attached to a process that had 2.6GB RSS, I wasn't able to find more than 1.3GB of data. This wasn't adding up. It surely didn't help that getting to that point took close to an hour each time. Retrospectively, I wish I had investigated using something like Checkpoint/Restore in Userspace. Anyways, after a while, I decided that I really wanted to try to see the whole picture, not smaller peaks here and there that might be missing something. So I resolved myself to look at the SIGSEGV I was getting when using pympler, collecting a core dump when it happened. Guess what? The Debian python-dbg package does not contain the debug symbols for the python package. The core dump was useless. Since I was expecting I'd have to fix something in python, I just downloaded its source and built it. Ran the command again, waited, and finally got a backtrace. First Google hit for the crashing function?
The exact (unfixed) crash reported on the python bug tracker. No patch. Crashing code is doing:
((f)->f_builtins != (f)->f_tstate->interp->builtins)
And (f)->f_tstate is NULL. Classic NULL deref. Added a guard (assessing it wouldn't break anything). Ran the command again. Waited. Again. SIGSEGV. Facedesk. Another crash on the same line. Did I really use the patched python? Yes. But this time (f)->f_tstate->interp is NULL. Sigh. Same player, shoot again. Finally, no crash but still stuck on only 1.3GB accounted for. Ok, I know not all python memory profiling tools are entirely reliable, let's try heapy again. SIGSEGV. Sigh. No debug info on the heapy module, where the crash happens. Sigh. Rebuild the module with debug info, try again. The backtrace looks like heapy is recursing a lot. Look at %rsp, compare with the address space from /proc/$pid/maps. Confirmed. A stack overflow. Let's do ugly things and increase the stack size in brutal ways. Woohoo! Now heapy tells me there's even less memory used than the 1.3GB I found so far. Like, half less. Yeah, right. I'm not clear on how I got there, but that's when I found gdb-heap, a tool from Red Hat's David Malcolm, and the associated talk "Dude, where's my RAM? A deep dive into how Python uses memory" (slides). With a gdb attached, I would finally be able to rip python's guts out and find where all the memory went. Or so I thought. The gdb-heap tool only found about 600MB. About as much as heapy did, for that matter, but it could be coincidental. Oh. Kay. I don't remember exactly what went through my mind then, but, since I was attached to a running process with gdb, I typed the following on the gdb prompt:
gdb> call malloc_stats()
And that's when the truth was finally unveiled: the memory allocator was just acting up the whole time. The output was something like:
Arena 0:
system bytes    =  some number above (but close to) 2GB
in use bytes    =  some number above (but close to) 600MB
Yes, the glibc allocator was just telling it had allocated 600MB of memory, but was holding onto 2GB. I must have found a really bad allocation pattern that causes massive fragmentation. One thing that David Malcolm's talk taught me, though, is that python uses its own allocator for small sizes, so the glibc allocator doesn't know about them. And, roughly, adding the difference between RSS and what glibc said it was holding onto to the "in use bytes" it reported somehow matches the 1.3GB I had found so far. So it was time to see how those things evolved in time, during the entire clone process. I grabbed some new data, tracking the evolution of "system bytes" and "in use bytes". There are two things of note on this data: In fact, the latter is exactly how git-cinnabar is supposed to work: it reads changesets and manifests chunks, and holds onto them while importing files. Then it throws away those manifests and changesets chunks one by one while it imports them. There is, however, some extra bookkeeping that requires some additional memory, but it's expected to be less memory consuming than keeping all the changesets and manifests chunks in memory. At this point, I thought a possible explanation was that since both python and glibc are mmap()ing their own arenas, they might be intertwined in a way that makes things not go well with the allocation pattern happening during the Import manifest phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history). To put the theory to the test, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas. Aha! I thought. That definitely looks much better. Less gap between what glibc says it requested from the system and the RSS size. And, more importantly, no runaway increase of memory usage in the middle of nowhere. I was preparing myself to write a post about how mixing allocators could have unintended consequences. As a comparison point, I went ahead and ran another test, with the python allocator entirely disabled, this time. Heh. It turns out glibc was acting up all alone. So much for my (plausible) theory. (I still think mixing allocators can have unintended consequences.) (Note, however, that the reason why the python allocator exists is valid: without it, the overall clone took almost 10 more minutes.) And since I had been getting all this data with 0.4.0, I gathered new data without the python allocator with the master branch. This paints a rather different picture than the original data on that branch, with much less memory use regression than one would think. In fact, there isn't much difference, except for the spike at the end, which got worse, and some of the noise during the Import manifests phase that got bigger, implying larger amounts of temporary memory used. The latter may contribute to the allocation patterns that throw glibc's memory allocator off. It turns out tracking memory usage in python 2.7 is rather painful, and not all the tools paint a complete picture of it. I hear python 3.x is somewhat better in that regard, and I hope it's true, but at the moment, I'm stuck with 2.7. The most reliable tool I've used here, it turns out, is pympler. Or rebuilding the python interpreter without its allocator, and asking the system allocator what is allocated. With all this data, I now have some defined problems to tackle, some easy (the spike at the end of the clone), and some less easy (working around the glibc allocator's behavior).
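As a side note, the same malloc_stats() output can also be obtained from inside a live Python process without attaching gdb, by calling into glibc through ctypes. This is just a possibly convenient alternative sketch, not what was done above; it is glibc-specific, and the numbers are printed to stderr.

import ctypes

# Load the symbols of the current process (which is linked against glibc);
# malloc_stats() prints per-arena "system bytes" / "in use bytes" to stderr.
libc = ctypes.CDLL(None)
libc.malloc_stats()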
I have a few hunches as to what kind of allocations are causing the runaway increase of RSS. Coincidentally, I'm halfway through a refactor of the code dealing with manifests, and it should help dealing with the issue. But that will be the subject of a subsequent post.
