Search Results: "glandium"

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post; while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance.

From CVS to DVCS

From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to BitKeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community, for the same needs, and with the same shared experience with BitKeeper, they were both similar distributed version control systems. Interestingly enough, several other DVCSes existed as well. In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyway, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs.
Personal preferences

Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for). Other people had similarly created their own mirrors, with the same or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but now I was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a long-time Git user by then, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla.

Git incursion

In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of numbers, though, I don't think a lot of people were doing that, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves.
I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That allowed the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time passed and the size and depth of the Mercurial repository grew, these tools were showing their limits and were too slow for my taste, especially for the initial clone.

Boot to Git

In the same time frame, Mozilla ventured into the mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third-party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginnings. And its sustainability as a platform also wasn't a given, it being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured into setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server.

Git slowly creeping in Firefox build tooling

Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OS X back then), and that script would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to the current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, so bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016.
From gecko-dev to git-cinnabar

Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under the hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, it was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start from a local clone of the gecko-dev repository, rather than cloning from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need to sync in both directions between Git and Mercurial came up (we had only ever synced from Mercurial to Git), I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git; nothing of the sort has happened since the switch).

One Firefox repository to rule them all

For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, a new ESR repository is now added every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developers' work lands before being merged into mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical.
It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. Seven years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as its base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example: if you aren't careful and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g., one from beta, or some other branch, depending on which one was last updated.

Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, and it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent, because that wouldn't cover the user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want.
Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent. Fortunately, the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive, especially when pulling a large number of changesets, which, incidentally, is what cloning is. "But there is a solution for clones", you might say, which is true. That's clonebundles, which offloads the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We had actually laid the groundwork to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers (yes, plural), because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not even mentioning the Try repository with its tens of thousands of heads, which brings its own set of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016.

The growing difficulty of maintaining the status quo

Slowly but steadily, in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, the resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the latest version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by way of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard that the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories.
And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to get a little out of hand in the past few years, but this was mitigated in a similar manner as the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago; I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar-specific issues a Me problem, rather than a Mozilla problem.

Taking the leap

I can't speak for the people who made the proposal to move to Git, nor for the people who gave it a green light. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think this was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21-year-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial went their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now predominantly using Git.
I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor have the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems.

But... GitHub?

I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and Git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, and literal tens of millions of repositories on the platform overall, the pragmatist in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a, I think, self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security-related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves.
The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then.

So, what's next?

The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here and say this: the earlier you can switch to Git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, those who want to take the leap off Mercurial right now can already do so. As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
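If you want to double-check where a recreated branch points, the mapping can also be queried in the other direction (the branch name is a placeholder, as above):
$ git cinnabar git2hg $(git rev-parse <branch_name>)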
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it there; it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to run ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow switching from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far).

What about git-cinnabar?

With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this raises the question of whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script; it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So day-to-day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise for that very reason. So, no, I don't expect git-cinnabar to die along with Mercurial use at Mozilla, but I can't really promise anything either.

Final words

That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and how, had we waited a little longer, we would have picked some yet-to-come new horse.
But hey, that's the tech cycle for you.

1 April 2023

Mike Hommey: Announcing git-cinnabar 0.6.0

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.11? What's new since 0.6.0rc2?
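For those who have never used it, a minimal session looks something like the following (the repository URL is only an illustration; any Mercurial repository reachable over HTTP(S) or SSH should work the same way):
$ git clone hg::https://hg.mozilla.org/mozilla-unified firefox
$ cd firefox
$ git pull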

29 October 2022

Mike Hommey: Announcing git-cinnabar 0.5.11 and 0.6.0rc2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get version 0.5.11 on github. Or get version 0.6.0rc2 on github. What's new in 0.5.11? What's new in 0.6.0rc2?

3 October 2022

Mike Hommey: Announcing git-cinnabar 0.6.0rc1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.10?

30 July 2022

Mike Hommey: Announcing git-cinnabar 0.5.10

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.9?

15 July 2022

Mike Hommey: Announcing git-cinnabar 0.5.9

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.8?

19 November 2021

Mike Hommey: Announcing git-cinnabar 0.5.8

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.7?

31 March 2021

Mike Hommey: Announcing git-cinnabar 0.5.7

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.6?

10 March 2021

Mike Hommey: 5 years ago, Firefox (re)entered Debian

5 years ago today, I was declaring Iceweasel dead, and Firefox was making a comeback in Debian. I hadn't planned to make this post, and in fact, I thought it had been much longer. But coincidentally, I was binge-watching Mr. Robot recently, which prominently featured Iceweasel. [screenshot: the iceweasel command in a terminal] Mr. Robot is set in the year 2015, and I was surprised that Iceweasel was being used, which led me to search for that post where I announced Firefox was back, and to realize that we were close to the 5 years mark. Well, we are at the 5 years mark now. [screenshot: the iceweasel start page] I'd normally say time flies, but it turns out it hasn't flown as much as I thought it did. I wonder if the interminable pandemic is to blame for that.

12 November 2020

Mike Hommey: Announcing git-cinnabar 0.5.6

Please partake in the git-cinnabar survey. Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.5?

30 August 2020

Mike Hommey: [Linux] Disabling CPU turbo, cores and threads without rebooting

[Disclaimer: this has been sitting as a draft for close to three months; I forgot to publish it, and this is now finally done.] In my previous blog post, I built Firefox in a number of different configurations where I'd disable the CPU turbo, some of its cores, or some of its threads. That is something that was traditionally done via the BIOS, but rebooting between each attempt is not really a great experience. Fortunately, the Linux kernel provides a large number of knobs that allow this at runtime.

Turbo

This is the most straightforward:
$ echo 0 > /sys/devices/system/cpu/cpufreq/boost
Re-enable with
$ echo 1 > /sys/devices/system/cpu/cpufreq/boost
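To check which state you are currently in, reading the same knob back is enough:
$ cat /sys/devices/system/cpu/cpufreq/boost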
CPU frequency throttling

Even though I haven't mentioned it, I might as well add this briefly. There are many knobs to tweak frequency throttling, but assuming your goal is to disable throttling and set the CPU frequency to its fastest non-Turbo frequency, this is how you do it:
$ echo performance > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor
where $n is the id of the core you want to change, so if you want to do that for all the cores, you need to do it for cpu0, cpu1, etc. (a loop covering all cores at once is sketched below). Re-enable with:
$ echo ondemand > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor
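If you would rather apply this to every core in one go instead of one $n at a time, a small shell loop over the corresponding sysfs entries does it (a sketch using the same paths as above; swap ondemand for performance to go the other way):
$ for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo ondemand > "$f"; done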
(assuming this was the value before you changed it; ondemand is usually the default)

Cores and Threads

This one requires some attention, because you cannot assume anything about the CPU numbers. The first thing you want to do is to check those CPU numbers. You can do so by looking at the physical id and core id fields in /proc/cpuinfo, but the output from lscpu --extended is more convenient, and looks like the following:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    3700.0000 2200.0000
1   0    0      1    1:1:1:0       yes    3700.0000 2200.0000
2   0    0      2    2:2:2:0       yes    3700.0000 2200.0000
3   0    0      3    3:3:3:0       yes    3700.0000 2200.0000
4   0    0      4    4:4:4:1       yes    3700.0000 2200.0000
5   0    0      5    5:5:5:1       yes    3700.0000 2200.0000
6   0    0      6    6:6:6:1       yes    3700.0000 2200.0000
7   0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
32  0    0      0    0:0:0:0       yes    3700.0000 2200.0000
33  0    0      1    1:1:1:0       yes    3700.0000 2200.0000
34  0    0      2    2:2:2:0       yes    3700.0000 2200.0000
35  0    0      3    3:3:3:0       yes    3700.0000 2200.0000
36  0    0      4    4:4:4:1       yes    3700.0000 2200.0000
37  0    0      5    5:5:5:1       yes    3700.0000 2200.0000
38  0    0      6    6:6:6:1       yes    3700.0000 2200.0000
39  0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
Now, this output is actually the ideal case, where pairs of CPUs (virtual cores) on the same physical core are always n, n+32, but I've had them be pseudo-randomly spread in the past, so be careful. To turn off a core, you want to turn off all the CPUs with the same CORE identifier. To turn off a thread (virtual core), you want to turn off one CPU. On machines with multiple sockets, you can also look at the SOCKET column. Turning off one CPU is done with:
$ echo 0 > /sys/devices/system/cpu/cpu$n/online
Re-enable with:
$ echo 1 > /sys/devices/system/cpu/cpu$n/online
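For instance, with the example layout above, taking the whole of physical core 1 offline means offlining both of its sibling CPUs, 1 and 33 (your numbers will differ, so check the CORE column first):
$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 0 > /sys/devices/system/cpu/cpu33/online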
Extra: CPU sets

CPU sets are a feature of Linux's cgroups. They allow restricting groups of processes to a set of cores. The first step is to create a group like so:
$ mkdir /sys/fs/cgroup/cpuset/mygroup
Please note you may already have existing groups, and you may want to create subgroups. You can do so by creating subdirectories. Then you can configure which CPUs/cores/threads you want processes in this group to run on:
$ echo 0-7,16-23 > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
The value you write in this file is a comma-separated list of CPU/core/thread numbers or ranges. 0-3 is the range for CPU/core/thread 0 to 3 and is thus equivalent to 0,1,2,3. The numbers correspond to /proc/cpuinfo or the output from lscpu as mentioned above. There are also memory aspects to CPU sets that I won't detail here (because I don't have a machine with multiple memory nodes), but you can start with:
$ cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems
Now you're ready to assign processes to this group:
$ echo $pid >> /sys/fs/cgroup/cpuset/mygroup/tasks
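A convenient variant, rather than chasing individual PIDs, is to put your interactive shell itself in the group; processes started from that shell then inherit the restriction (the group name is the one created above):
$ echo $$ >> /sys/fs/cgroup/cpuset/mygroup/tasks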
There are a number of tweaks you can do to this setup; I invite you to check out the cpuset(7) manual page. Disabling a group is a little involved. First you need to move the processes to a different group:
$ while read pid; do echo $pid > /sys/fs/cgroup/cpuset/tasks; done < /sys/fs/cgroup/cpuset/mygroup/tasks
Then disassociate the CPU and memory nodes:
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems
And finally remove the group:
$ rmdir /sys/fs/cgroup/cpuset/mygroup

23 April 2020

Mike Hommey: Announcing git-cinnabar 0.5.5

Please partake in the git-cinnabar survey. Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.4?

15 June 2017

Mike Hommey: Announcing git-cinnabar 0.5.0 beta 2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.5.0 beta 1?

3 June 2017

Mike Hommey: Announcing git-cinnabar 0.5.0 beta 1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git. Get it on github. These release notes are also available on the git-cinnabar wiki. What's new since 0.4.0?

1 April 2017

Mike Hommey: git-cinnabar experimental features

Since version 0.4.0, git-cinnabar has a few hidden experimental features. Two of them are available in 0.4.0, and a third was recently added on the master branch. The basic mechanism to enable experimental features is to set a preference in the git configuration with a comma-separated list of features to enable, or all, for all of them. That preference is cinnabar.experiments, and any means to set a git configuration can be used. But what features are there?

wire

In order to talk to Mercurial repositories, git-cinnabar normally uses the mercurial python modules. This experimental feature allows accessing Mercurial repositories without using the mercurial python modules. It then relies on git-cinnabar-helper to connect to the repository through the mercurial wire protocol. As of version 0.4.0, the feature is automatically enabled when Mercurial is not installed.

merge

Git-cinnabar currently doesn't allow pushing merge commits. The main reason for this is that generating the correct mercurial data for those merges is tricky, and needs to be gotten right. In version 0.4.0, enabling this feature allows pushing merge commits as long as the parent commits are available on the mercurial repository. If they aren't, you need to first push them independently, and then push the merge. On current master, that limitation doesn't exist anymore; you can just push everything in one go. The main caveat with this experimental support for pushing merges is that it currently doesn't handle the case where a file was moved on one of the branches the same way mercurial would (i.e. the information would be lost to mercurial users).

clonebundles

As of mercurial 3.6, Mercurial servers can opt in to providing pre-generated bundles, which, when clients support it, takes CPU load off the server when a clone is performed. Good for servers, and usually good for clients too when they have a fast network connection, because downloading a pre-generated bundle is usually faster than waiting for the server to generate one. As of a few days ago, the master branch of git-cinnabar supports cloning using those pre-generated bundles, provided the server advertises them (mozilla-central does).
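As a concrete illustration of the mechanism described above, enabling, say, the clonebundles feature and then cloning a repository that advertises bundles could look like this (a sketch; the preference name is the one given earlier, and the URL is just an example):
$ git config --global cinnabar.experiments clonebundles
$ git clone hg::https://hg.mozilla.org/mozilla-central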

Mike Hommey: Progress on git-cinnabar memory usage

This all started when I figured out that git-cinnabar was using crazy amounts of memory when cloning mozilla-central. That pointed to memory allocation patterns that triggered a suboptimal behavior in the glibc memory allocator, and, while overall, git-cinnabar wasn't really abusing memory all things considered, it happened to be realloc()ating way too much. It also turned out that recent changes on the master branch had made most uses of fast-import synchronous, making the whole process significantly slower. This is where we started from on 0.4.0, and on the master branch as of be75326. An interesting thing to note here is that the glibc allocator runaway memory use was, this time, more pronounced on 0.4.0 than on master. It was the opposite originally, but as I mentioned in the past, ASLR is making it not happen exactly the same way each time. While I'm here, one thing I failed to mention in the previous posts is that all these measurements were done by cloning a local mercurial clone of mozilla-central, served from localhost via HTTP to eliminate the download time from hg.mozilla.org. And while mozilla-central itself has received new changesets since the first post, the local clone has not been updated, such that all subsequent clone tests I did were cloning the exact same repository under the exact same circumstances. After the last blog post, I focused on the low hanging fruit identified so far, and this is how things now look on the master branch as of 35c18e7. So where does that put us? On a more granular level, what this means is that there's still room for improvement. But at this point, I'd rather focus on other things. Logging all the memory allocations with the python allocator disabled still resulted in a 6.5GB compressed log file, containing 2.6 billion calls to malloc, calloc, free and realloc (down from 2.7 billion in be75326). The number of allocator calls done by the git-remote-hg process is down to 2.25 billion (from 2.34 billion in be75326). Surprisingly, while more things were moved to the helper, it still made fewer allocations than in be75326: 345 million, down from 363 million. Presumably, this is because the number of commands processed by the fast-import code was reduced. Let's now take a look at the various metrics we analyzed previously (the horizontal axis represents the number of allocator calls that happened before the measurement). A few observations to make here. [*] The upper bound is the sum of all sizes ever given to malloc, calloc, realloc etc., and the lower bound is the same, but removing the size of allocations passed as input to realloc (in practical terms, this pretends reallocs never happened and that the final size for a given reallocated pointer is the one that counts). So presumably, some of the changes led to more short-lived allocations. Considering python uses its own allocator for sizes smaller than 512 bytes, it's probably not so much of a problem. But let's look at the distribution of buffer sizes (including all sizes given to realloc). (Bucket size is 16 bytes) What is not completely obvious from the logarithmic scale is that, in fact, 98.4% of the allocations are less than 512 bytes with the current master (35c18e7), and they were 95.5% with be75326. Interestingly, though, in absolute numbers, there are fewer allocations smaller than 512 bytes in current master than in be75326 (1,194,268,071 vs 1,214,784,494). This suggests the extra allocations that happen during some phases are larger than that.
There are clearly fewer allocations across the board (apart from very few exceptions), and close to an order of magnitude fewer allocations larger than 1MiB. In fact, widening the bucket size to 32KiB shows an order of magnitude difference (or close) for most buckets. An interesting thing to note is how some sizes are largely overrepresented in the data with buckets of 16 bytes, like 768, 1104, 2048, 4128, with other smaller bumps for e.g. 2144, 2464, 2832, 3232, 3696, 4208, 4786, 5424, 6144, 6992, 7920. While some of those are powers of 2, most aren't, and some of them may actually represent objects sized with a power of 2, but that have an extra PyObject overhead. While looking at allocation stats, I got to wonder what the lifetimes of those allocations looked like. So I scanned the allocator logs and measured the distance between when an allocation is made and when it is freed, ignoring reallocs. To give a few examples of what I mean, the following allocation for p gets a lifetime of 0:
void *p = malloc(42);
free(p);
The following gets a lifetime of 1:
void *p = malloc(42);
void *other = malloc(42);
free(p);
And the following gets a lifetime of 1 as well:
void *p = malloc(42);
p = realloc(p, 84);
free(p);
(that is, it is not counted as two malloc/free pairs) The further away the free is from the corresponding malloc, the larger the lifetime. And the largest possible lifetime is the total number of allocator function calls minus two, in the hypothetical case where the very first allocation is freed as the very last (minus two because we defined the lifetime as the distance). All in all, what the data shows is that we're definitely in a better place now than we used to be a few days ago, and that there is still work to do on the memory front. So at this point, I won't look any deeper into the memory usage of the git-remote-hg python process, and will instead focus on the planned metadata storage changes. They will allow sharing the metadata more easily (allowing a faster and more straightforward gecko-dev graft), and will allow importing manifests earlier, which, as mentioned already, will help reduce memory use, but, more importantly, will allow doing more actual work while downloading the data. On slow networks, this is crucial to make clones and pulls faster.

23 March 2017

Mike Hommey: Why is the git-cinnabar master branch slower to clone?

Apart from the memory considerations, one thing shown by the data presented in the "When the memory allocator works against you" post, which I haven't touched on in the followup posts, is that there is a large difference in the time it takes to clone mozilla-central with git-cinnabar 0.4.0 vs. the master branch. One thing that was mentioned in the first followup is that reducing the amount of realloc and substring copies made the cloning more than 15 minutes faster on master. But the same code exists in 0.4.0, so this isn't part of the difference. So what's going on? Looking at the CPU usage during the clone, on 0.4.0 and on master, is enlightening. (Note: the data gathering is flawed in some ways, which explains why the git-remote-hg process goes above 100%, which is not possible for this python process. The data is however good enough for the high level analysis that follows, so I didn't bother to get something more accurate.) On 0.4.0, the git-cinnabar-helper process was saturating one CPU core during the File import phase, and the git-remote-hg process was saturating one CPU core during the Manifest import phase. Overall, the sum of both processes usually used more than one and a half cores. On master, however, the total of both processes barely uses more than one CPU core. What happened? This and that happened. Essentially, before those changes, git-remote-hg would send instructions to git-fast-import (technically, git-cinnabar-helper, but in this case it's only used as a wrapper for git-fast-import), and use marks to track the git objects that git-fast-import created. After those changes, git-remote-hg asks git-fast-import for the git object SHA1 of objects it just asked to be created. In other words, those changes replaced something asynchronous with something synchronous: while it used to be possible for git-remote-hg to work on the next file/manifest/changeset while git-fast-import was working on the previous one, it now waits. The changes helped simplify the python code, but made the overall clone process much slower. If I'm not mistaken, the only real use for that information is for the mapping of mercurial to git SHA1s, which is actually rarely used during the clone, except at the end, when storing it. So what I'm planning to do is to move that mapping to the git-cinnabar-helper process, which, incidentally, will kill not 2, but 3 birds with 1 stone. So expect git-cinnabar 0.5 to get moar faster, and to use moar less memory.

Mike Hommey: Analyzing git-cinnabar memory use

In the previous post, I was looking at the allocations git-cinnabar makes. While I had the data, I figured I'd also look at how the memory use correlates with expectations based on repository data, to put things in perspective. As a reminder, this is what the allocations look like (the horizontal axis being the number of allocator function calls). There are 7 different phases happening during a git clone using git-cinnabar, most of which can easily be identified on the graph above. Overall, except for the stupid spike during the final phase, the manifest DAG, and the glibc allocator runaway memory use described in previous posts, there is nothing terribly bad with the git-cinnabar memory usage, all things considered. Mozilla-central is just big. The spike is already half addressed, and work is under way for the glibc allocator runaway memory use. The manifest DAG, interestingly, is actually mostly useless. It's only used to track the heads of the DAG, and it's very much possible to track heads of a DAG without actually storing the entire DAG. In fact, that's what git-cinnabar already does for changeset heads, so we would only need to do the same for manifest heads. One could argue that the 1.4GB of raw RevChunk data we're keeping in memory for later use could be kept on disk instead. I haven't done this so far because I didn't want to have to handle temporary files (and answer questions like "where to put them?", "what if there isn't enough disk space there?", "what if disk access is slow?", etc.). But the majority of this data is from manifests. I'm already planning changes in how git-cinnabar stores manifest data that will actually allow importing them directly, instead of keeping them in memory until files are imported. This would instantly remove 1.18GB of memory usage. The downside, however, is that this would be more CPU intensive: importing changesets will require creating the corresponding git trees, and getting the stored manifest data. I think it's worth it, though. Finally, one thing that isn't obvious here, but that was found while analyzing why RSS would be going up despite memory usage going down, is that git-cinnabar is doing way too many reallocations and substring allocations. So let's look at two metrics that hopefully will highlight the problem: the total amount of memory ever requested from the allocator, and the same total excluding the sizes passed as input to realloc. Assuming all the requested memory is filled at some point, the former gives us an upper bound to the amount of memory that is ever filled or copied (the amount that would be filled if no realloc was ever in-place), while the latter gives us a lower bound (the amount that would be filled or copied if all reallocs were in-place). Ideally, we'd want the upper and lower bounds to be close to each other (indicating few realloc calls), and the total amount at the end of the process to be as close as possible to the amount of data we're handling (which we've seen is around 3TB). And this is clearly bad. Like, really bad. But we already knew that from the previous post, although it's nice to put numbers on it. The lower bound is about twice the amount of data we're handling, and the upper bound is more than 10 times that amount. Clearly, we can do better. We'll see how things evolve after the necessary code changes happen. Stay tuned.

22 March 2017

Mike Hommey: When the memory allocator works against you, part 2

This is a followup to the "When the memory allocator works against you" post from a few days ago. You may want to read that one first if you haven't, and come back. In case you don't or didn't read it, it was all about memory consumption during a git clone of the mozilla-central mercurial repository using git-cinnabar, and how the glibc memory allocator is using more than one would expect. This post is going to explore how/why it's happening. I happen to have written a basic memory allocation logger for Firefox, so I used it to log all the allocations happening during a git clone exhibiting the runaway memory increase behavior (using a python that doesn't use its own allocator for small allocations). The result was a 6.5 GB log file (compressed with zstd; 125 GB uncompressed!) with 2.7 billion calls to malloc, calloc, free, and realloc, recorded across (mostly) 2 processes (the python git-remote-hg process and the native git-cinnabar-helper process; there are other short-lived processes involved, but they do less than 5000 calls in total). The vast majority of those 2.7 billion calls is done by the python git-remote-hg process: 2.34 billion calls. We'll only focus on this process. Replaying those 2.34 billion calls with a program that reads the log made it possible to reproduce the runaway memory increase behavior to some extent. I went an extra mile and modified glibc's realloc code in memory so it doesn't call memcpy, to make things faster. I also ran under setarch x86_64 -R to disable ASLR for reproducible results (two consecutive runs return the exact same numbers, which doesn't happen with ASLR enabled). I also modified the program to report the number of live allocations (allocations that haven't been freed yet), and the cumulated size of the actually requested allocations (that is, the sum of all the sizes given to malloc, calloc, and realloc calls for live allocations, as opposed to what the memory allocator really allocated, which can be more, per malloc_usable_size). RSS was not tracked because the allocations are never filled, to make things faster, such that pages for large allocations are never dirty, and RSS doesn't grow as much because of that. Full disclosure: it turns out the "system bytes" and "in-use bytes" numbers I had been collecting in the previous post were smaller than what they should have been, and were excluding memory that the glibc memory allocator would have mmap()ed. That however doesn't affect the trends that had been witnessed. The data below is corrected. (Note that in the graph above and the graphs that follow, the horizontal axis represents the number of allocator function calls performed.) While I was here, I figured I'd check how mozjemalloc performs, and it has a better behavior (although it has more overhead). What doesn't appear on this graph, though, is that mozjemalloc also tells the OS to drop some pages even if it keeps them mapped (madvise(MADV_DONTNEED)), so in practice, it is possible the actual RSS decreases too. And jemalloc 4.5 looks like it has better memory usage than mozjemalloc for this use case, but its stats are being thrown off at some point; I'll have to investigate. Going back to the first graph, let's take a closer look at what the allocations look like when the "system bytes" number is increasing a lot. The highlights in the following graphs indicate the range the next graph will be showing.
So what we have here is a bunch of small allocations (small enough that they don't seem to move the "requested" line; most under 512 bytes, so under normal circumstances, they would be allocated by python, a few between 512 and 2048 bytes), and a few large allocations, one of which triggers a bump in memory use. What can appear weird at first glance is that we have a large allocation not requiring more system memory, later followed by a smaller one that does. What the allocations actually look like is the following:

void *ptr0 = malloc(4850928); // #1391340418
void *ptr1 = realloc(some_old_ptr, 8000835); // #1391340419
free(ptr0); // #1391340420
ptr1 = realloc(ptr1, 8000925); // #1391340421
/* ... */
void *ptrn = malloc(879931); // #1391340465
ptr1 = realloc(ptr1, 8880819); // #1391340466
free(ptrn); // #1391340467
As it turns out, inspecting all the live allocations at that point, while there was a hole large enough to do the first two reallocs (the second actually happens in place), at the point of the third one, there wasn't a large enough hole to fit 8.8MB. What inspecting the live allocations also reveals, is that there is a large number of large holes between all the allocated memory ranges, presumably coming from previous similar patterns. There are, in fact, 91 holes larger than 1MB, 24 of which are larger than 8MB. It's the accumulation of those holes that can't be used to fulfil larger allocations that makes the memory use explode. And there aren't enough small allocations happening to fill those holes. In fact, the global trend is for less and less memory to be allocated, so, smaller holes are also being poked all the time. Really, it's all a straightforward case of memory fragmentation. The reason it tends not to happen with jemalloc is that jemalloc groups allocations by sizes, which the glibc allocator doesn't seem to be doing. The following is how we got a hole that couldn't fit the 8.8MB allocation in the first place:

ptr1 = realloc(ptr1, 8880467); // #1391324989; ptr1 is 0x5555de145600
/* ... */
void *ptrx = malloc(232); // #1391325001; ptrx is 0x5555de9bd760; that is 13 bytes after the end of ptr1.
/* ... */
free(ptr1); // #1391325728; this leaves a hole of 8880480 bytes at 0x5555de145600.
All would go well if ptrx was free()d, but it looks like it's long-lived. At least, it's still allocated by the time we reach the allocator call #1391340466. And since the hole is 339 bytes too short for the requested allocation, the allocator has no other choice than to request more memory from the system. What's bothersome, though, is that the allocator chose to allocate ptrx in the space following ptr1, when it allocated similarly sized buffers after allocating ptr1 and before allocating ptrx in completely different places, and while there are plenty of holes in the allocated memory where it could fit. Interestingly enough, ptrx is a 232 bytes allocation, which means under normal circumstances, python itself would be allocating it. In all likelihood, when the python allocator is enabled, it's allocations larger than 512 bytes that become obstacles to the larger allocations. Another possibility is that the 256KB fragments that the python allocator itself allocates to hold its own allocations become the obstacles (my original hypothesis). I think the former is more likely, though, putting the blame back entirely on glibc's shoulders. Now, it looks like the allocation pattern we have here is suboptimal, so I re-ran a git clone under a debugger to catch when a realloc() for 8880819 bytes happens (the size is peculiar enough that it only happened once in the allocation log). But doing that with a conditional breakpoint is just too slow, so I injected a realloc wrapper with LD_PRELOAD that sends a SIGTRAP signal to the process, so that an attached debugger can catch it. Thanks to the support for python in gdb, it was then possible to pinpoint the exact python instructions that made the realloc() call (it didn't come as a surprise; in fact, that was one of the places I had in mind, but I wanted definite proof):

new = ''
end = 0
# ...
for diff in RevDiff(rev_patch):
    new += data[end:diff.start]
    new += diff.text_data
    end = diff.end
    # ...
new += data[end:]
What happens here is that we're creating a mercurial manifest we got from the server in patch form against a previous manifest. So data contains the original manifest, and rev_patch the patch. The patch essentially contains instructions of the form "replace n bytes at offset o with the content c". The code here just does that in the most straightforward way, implementation-wise, but also, it turns out, possibly the worst way. So let's unroll this loop over a couple iterations:
new = ''
This allocates an empty str object. In fact, this doesn't actually allocate anything, and only creates a pointer to an interned representation of an empty string.
new += data[0:diff.start]
This is where things start to get awful. data is a str, so data[0:diff.start] creates a new, separate, str for the substring. One allocation, one copy. Then it is appended to new. Fortunately, CPython is smart enough, and just assigns data[0:diff.start] to new. This can easily be verified with the CPython REPL:
>>> foo = ''
>>> bar = 'bar'
>>> foo += bar
>>> foo is bar
True
(and this is not happening because the example string is small here; it also happens with larger strings, like 'bar' * 42000) Back to our loop:
new += diff.text_data
Now, new is realloc()ated to have the right size to fit the appended text in it, and the contents of diff.text_data are copied. One realloc, one copy. Let's go for a second iteration:
new += data[diff.end:new_diff.start]
Here again, we're doing an allocation for the substring, and one copy. Then new is realloc()ated again to append the substring, which is an additional copy.
new += new_diff.text_data
new is realloc()ated yet again to append the contents of new_diff.text_data. We now finish with:
new += data[new_diff.end:]
which, again, creates a substring from the data, and then proceeds to realloc()ate new one more freaking time. That's a lot of malloc()s and realloc()s to be doing. After modifying the code to implement the first and fifth items, memory usage during a git clone of mozilla-central looks like the following (with the python allocator enabled): (Note this hasn't actually landed on the master branch yet.) Compared to what it looked like before, this is largely better. But that's not the only difference: the clone was also about 1000 seconds faster. That's more than 15 minutes! But that's not all so surprising when you know the volumes of data handled here. More insight about this is coming in an upcoming post. But while the changes outlined above make the glibc allocator behavior less likely to happen, they don't totally eliminate it. In fact, it seems it is still happening by the end of the manifest import phase. We're still allocating increasingly large temporary buffers because the size of the imported manifests grows larger and larger, and every one of them is the result of patching a previous one. The only way to avoid those large allocations creating holes would be to avoid doing them in the first place. My first attempt at doing that, keeping manifests as lists of lines instead of raw buffers, worked, but was terribly slow. So slow, in fact, that I had to stop a clone early and estimated the process would likely have taken a couple of days. Iterating over multiple generators at the same time, a lot, kills performance, apparently. I'll have to try with significantly less of that.
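For illustration, this is the general shape of a fix for the string building pattern unrolled above: accumulate the fragments in a list and join them once at the end, so the result is allocated and filled a single time. This is only a sketch of the idea, not necessarily the actual change referred to above; RevDiff and the diff attributes are the ones from the earlier snippet.

def apply_patch(data, rev_patch):
    fragments = []
    end = 0
    for diff in RevDiff(rev_patch):
        fragments.append(data[end:diff.start])   # unchanged part before the hunk
        fragments.append(diff.text_data)         # replacement content
        end = diff.end
    fragments.append(data[end:])                 # unchanged tail
    # One allocation for the result; each fragment is copied exactly once,
    # and no intermediate buffer is ever realloc()ated.
    return ''.join(fragments)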

12 March 2017

Mike Hommey: When the memory allocator works against you

Cloning mozilla-central with git-cinnabar requires a lot of memory. Actually too much memory to fit in a 32-bit address space. I hadn't optimized for memory use in the first place. For instance, git-cinnabar keeps sha-1s in memory as hex values (40 bytes) rather than raw values (20 bytes). When I wrote the initial prototype, it didn't matter that much, and while close(ish) to the tipping point, it didn't require more than 2GB of memory at the time. Time passed, and mozilla-central grew. I suspect the recent addition of several thousand commits and files has made things worse. In order to come up with a plan to make things better (short or longer term), I needed data. So I added some basic memory resource tracking, and collected data while cloning mozilla-central. I must admit, I was not ready for what I witnessed. Follow me for a tale of frustrations (plural). I was expecting things to have gotten worse on the master branch (which I used for the data collection) because I am in the middle of some refactoring and did many changes that I was suspecting might have affected memory usage. I wasn't, however, expecting to see the clone command using 10GB(!) memory at peak usage across all processes. (Note, those memory sizes are RSS, minus "shared".) It also was taking an unexpectedly long time, but then, I hadn't cloned a large repository like mozilla-central from scratch in a while, so I wasn't sure if it was just related to its recent growth in size or otherwise. So I collected data on 0.4.0 as well. Less time spent, less memory usage... ok. There's definitely something wrong on master. But wait a minute, that slope from ~2GB to ~4GB on the git-remote-hg process doesn't actually make any kind of sense. I mean, I'd understand it if it were starting and finishing with the Import manifest phase, but it starts in the middle of it, and ends long before it finishes. WTH? First things first, since RSS can be a variety of things, I checked /proc/$pid/smaps and confirmed that most of it was, indeed, the heap. That's the point where you reach for Google, type something like "python memory profile" and find various tools. One from the results that I remembered having used in the past is guppy's heapy. Armed with pdb, I broke execution in the middle of the slope, and tried to get memory stats with heapy. SIGSEGV. Ouch. Let's try something else. I reached out to objgraph and pympler. SIGSEGV. Ouch again. Tried working around the crashes for a while (too long a while, retrospectively, hindsight is 20/20), and was somehow successful at avoiding them by peeking at a smaller set of objects. But whatever I did, despite being attached to a process that had 2.6GB RSS, I wasn't able to find more than 1.3GB of data. This wasn't adding up. It surely didn't help that getting to that point took close to an hour each time. Retrospectively, I wish I had investigated using something like Checkpoint/Restore in Userspace. Anyways, after a while, I decided that I really wanted to try to see the whole picture, not smaller peaks here and there that might be missing something. So I resolved myself to look at the SIGSEGV I was getting when using pympler, collecting a core dump when it happened. Guess what? The Debian python-dbg package does not contain the debug symbols for the python package. The core dump was useless. Since I was expecting I'd have to fix something in python, I just downloaded its source and built it. Ran the command again, waited, and finally got a backtrace. First Google hit for the crashing function?
The exact (unfixed) crash reported on the python bug tracker. No patch. Crashing code is doing:
((f)->f_builtins != (f)->f_tstate->interp->builtins)
And (f)->f_tstate is NULL. Classic NULL deref. Added a guard (assessing it wouldn't break anything). Ran the command again. Waited. Again. SIGSEGV. Facedesk. Another crash on the same line. Did I really use the patched python? Yes. But this time (f)->f_tstate->interp is NULL. Sigh. Same player, shoot again. Finally, no crash but still stuck on only 1.3GB accounted for. Ok, I know not all python memory profiling tools are entirely reliable, let's try heapy again. SIGSEGV. Sigh. No debug info on the heapy module, where the crash happens. Sigh. Rebuild the module with debug info, try again. The backtrace looks like heapy is recursing a lot. Look at %rsp, compare with the address space from /proc/$pid/maps. Confirmed. A stack overflow. Let's do ugly things and increase the stack size in brutal ways. Woohoo! Now heapy tells me there's even less memory used than the 1.3GB I found so far. Like, half less. Yeah, right. I'm not clear on how I got there, but that's when I found gdb-heap, a tool from Red Hat's David Malcolm, and the associated talk "Dude, where's my RAM? A deep dive into how Python uses memory" (slides). With a gdb attached, I would finally be able to rip python's guts out and find where all the memory went. Or so I thought. The gdb-heap tool only found about 600MB. About as much as heapy did, for that matter, but it could be coincidental. Oh. Kay. I don't remember exactly what went through my mind then, but, since I was attached to a running process with gdb, I typed the following on the gdb prompt:
gdb> call malloc_stats()
And that's when the truth was finally unveiled: the memory allocator was just acting up the whole time. The output was something like:
Arena 0:
system bytes    =  some number above (but close to) 2GB
in use bytes    =  some number above (but close to) 600MB
Yes, the glibc allocator was just telling it had allocated 600MB of memory, but was holding onto 2GB. I must have found a really bad allocation pattern that causes massive fragmentation. One thing that David Malcolm's talk taught me, though, is that python uses its own allocator for small sizes, so the glibc allocator doesn't know about them. And, roughly, adding the difference between RSS and what glibc said it was holding onto to the "in use bytes" it reported somehow matches the 1.3GB I had found so far. So it was time to see how those things evolved in time, during the entire clone process. I grabbed some new data, tracking the evolution of "system bytes" and "in use bytes". There are two things of note on this data: In fact, the latter is exactly how git-cinnabar is supposed to work: it reads changesets and manifests chunks, and holds onto them while importing files. Then it throws away those manifests and changesets chunks one by one while it imports them. There is, however, some extra bookkeeping that requires some additional memory, but it's expected to be less memory consuming than keeping all the changesets and manifests chunks in memory. At this point, I thought a possible explanation was that since both python and glibc are mmap()ing their own arenas, they might be intertwined in a way that makes things not go well with the allocation pattern happening during the Import manifest phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history). To put the theory to the test, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas. Aha! I thought. That definitely looks much better. Less gap between what glibc says it requested from the system and the RSS size. And, more importantly, no runaway increase of memory usage in the middle of nowhere. I was preparing myself to write a post about how mixing allocators could have unintended consequences. As a comparison point, I went ahead and ran another test, with the python allocator entirely disabled, this time. Heh. It turns out glibc was acting up all alone. So much for my (plausible) theory. (I still think mixing allocators can have unintended consequences.) (Note, however, that the reason why the python allocator exists is valid: without it, the overall clone took almost 10 more minutes.) And since I had been getting all this data with 0.4.0, I gathered new data without the python allocator with the master branch. This paints a rather different picture than the original data on that branch, with much less memory use regression than one would think. In fact, there isn't much difference, except for the spike at the end, which got worse, and some of the noise during the Import manifests phase that got bigger, implying larger amounts of temporary memory used. The latter may contribute to the allocation patterns that throw glibc's memory allocator off. It turns out tracking memory usage in python 2.7 is rather painful, and not all the tools paint a complete picture of it. I hear python 3.x is somewhat better in that regard, and I hope it's true, but at the moment, I'm stuck with 2.7. The most reliable tool I've used here, it turns out, is pympler. Or rebuilding the python interpreter without its allocator, and asking the system allocator what is allocated. With all this data, I now have some defined problems to tackle, some easy (the spike at the end of the clone), and some less easy (working around the glibc allocator's behavior).
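As a side note, the same malloc_stats() output can also be obtained from inside a live Python process without attaching gdb, by calling into glibc through ctypes. This is just a possibly convenient alternative sketch, not what was done above; it is glibc-specific, and the numbers are printed to stderr.

import ctypes

# Load the symbols of the current process (which is linked against glibc);
# malloc_stats() prints per-arena "system bytes" / "in use bytes" to stderr.
libc = ctypes.CDLL(None)
libc.malloc_stats()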
I have a few hunches as to what kind of allocations are causing the runaway increase of RSS. Coincidentally, I'm halfway through a refactor of the code dealing with manifests, and it should help dealing with the issue. But that will be the subject of a subsequent post.
