Search Results: "utsl"

3 December 2007

Michael Prokop: git[-svn] in 30 minutes

… or something like that… I planned to write a short note about how to get started with git-svn so I could point some of my colleagues to it. It turned out that git has too many nice features that you should be aware of. :) Hopefully my notes (now a reference for myself as well, thx to gebi for all the help and feedback) are useful anyway. If you think something is missing (something more or less essential, or at least something most of us should be aware of), please feel free to mention it in the comment section of my blog entry, thanks. Disclaimer: I’m still happy with mercurial for what I - and we at grml in general - use it for: linear, but nevertheless distributed development. git on the other hand provides some really nice features. Rebasing and branching with git is really great - so non-linear development just works. As usual: use the right tool for the right job. git is a bit complicated to use, especially (but not only) in the beginning. On the other hand I’m not such a big friend of subversion. If you have to use subversion (Graz University of Technology for example provides an svn service to its students and employees) but prefer to work with git instead, you should be aware of git-svn. git-svn gives you bidirectional operation between subversion and git. First of all make sure you have everything you might need when working with git. Make sure to use a current version of git (I’m referring to version >=1.5.3). Just execute the following command line on your Debian system to install all relevant packages:
aptitude install \
git-buildpackage git-core git-cvs git-daemon-run \
git-doc git-email git-gui git-load-dirs git-svn \
gitk gitweb qgit
Now let’s start with some general and basic configuration:
# Remove directories from the SVN tree if there
# are no files left behind, configure it globally:
git config --global svn.rmdir true
# Want some more global, personal git configuration?
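for line in \
  'user.name=Michael Prokop' \
  'user.email=foo@example.invalid' \
  'color.diff=auto' \
  'color.diff.new=cyan' \
  'color.diff.old=magenta' \
  'color.diff.frag=yellow' \
  'color.diff.meta=green' \
  'color.diff.commit=normal'
do
  # git config expects key and value as separate arguments,
  # so split each entry at the first '=':
  git config --global "${line%%=*}" "${line#*=}"
done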
Check out man git-config for many more details about configuration options. First tip: set ‘g’ as a shell alias for git so you don’t have to type that much. I’ll write the long version in the following examples so copy/paste works for everyone. Define short aliases inside git itself as well, so you can use ‘git co’ for example instead of ‘git checkout’. git does not ship such aliases by default; define your own either manually in ~/.gitconfig or by running something like the following (the ‘st’, ‘co’ and ‘ci’ aliases are used in some of the examples further down):
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.ci commit
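A minimal sketch for checking which aliases are defined. It writes to a scratch file via --file so your real ~/.gitconfig stays untouched; the scratch path and the choice of aliases are assumptions for illustration only.

```shell
set -e
# Scratch config file (arbitrary temporary path, not your real ~/.gitconfig).
cfg=$(mktemp)
git config --file "$cfg" alias.st status
git config --file "$cfg" alias.co checkout
git config --file "$cfg" alias.ci commit
# List every alias stored in that file, one "alias.<name> <command>" per line.
aliases=$(git config --file "$cfg" --get-regexp '^alias\.')
printf '%s\n' "$aliases"
```

Dropping the --file option and using --global instead performs the same query against your personal configuration.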
Enough pre-configuration for now. It’s time to checkout the SVN repository:
# Check out the SVN repository and set 'svn/' as
# prefix for the branches:
git svn clone -s --prefix=svn/ \
https://svn.tugraz.at/svn/$project foobar && 
cd foobar
# Adjust svn:ignore settings within git:
git svn show-ignore >> .git/info/exclude
# List all branches:
git branch -a
# List all remote branches:
git branch -r
# Rebase your local changes against the
# latest changes in SVN (kind of 'svn up'):
git svn rebase
# Checkout a specific branch:
git checkout $branch
Ok so far? But what do we have to do if we want to work on the upstream source and are allowed to commit/push directly to the repository? Let’s see how to work on that without using branches:
# Hack:
$EDITOR foobar
# Check status
git st[atus]
# List diff:
git diff [foobar]
# Commit it with a commit message using $EDITOR:
git commit -a
# Now commit your changes (that were committed
# previously using git) to SVN, as well as
# automatically updating your working HEAD:
git svn dcommit
But what should we do if we do not have commit rights? Let’s create our own branch and send a patch via mail to upstream:
# Make a new branch:
git checkout -b mikas_demo_patch
# and hack...
$EDITOR
# Commit all changes:
git ci -a -m 'Best patch but worst commit msg ever'
# ... and prepare patch[es]:
git format-patch -s -p -n master
# Now send mail(s) either use git-send-email:
git send-email --to foo@example.invalid *.patch
# ... or if you prefer mutt instead (short zsh syntax):
for f in *.patch ; mutt -H $f
You got a mail from someone else and would like to incorporate the changes from the attached patch into your repository? Just store the mail in a separate mailbox (use save-message in mutt for example, keybinding ’s’ by default), then execute:
# Apply a [series of] patch[es] from a mailbox
git am /path/to/mailbox
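The patch round trip can be tried without any mail setup. The following is a sketch under assumptions: two throwaway repositories in a temporary directory stand in for upstream and contributor, and the repository names, identities and file contents are made up. It relies on the fact that files written by git format-patch are already in mbox format, so git am accepts them directly.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "Upstream" repository with a single commit.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Up Stream"
git config user.email "up@example.invalid"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd ..

# The contributor clones it, hacks on a branch and exports the patch.
git clone -q upstream contributor
cd contributor
git config user.name "Con Tributor"
git config user.email "con@example.invalid"
git checkout -q -b demo_patch
echo "world" >> README
git commit -q -a -m "Extend README"
git format-patch -s -n -o "$work/outgoing" master > /dev/null
cd ..

# Upstream applies the patch file as if it had arrived by mail.
cd upstream
git am -q "$work"/outgoing/*.patch
subject=$(git log -1 --format=%s)
echo "applied: $subject"
```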
Want to work on a separate branch and rebase your work with upstream?
# First of all make sure to use recent sources...
# So pull when using plain git:
git pull -u
# .. or when using git-svn use:
git svn rebase
# Then create a new branch:
git checkout -b mika
# Hack:
$EDITOR
# Commit:
git ci -a -m 'Best patch but worst commit msg ever'
# Switch to master branch:
git checkout master
# Pull again when using plain git:
git pull -u
# .. or when using git-svn use:
git svn rebase
# Finally switch back
git checkout mika
# Now rebase it with plain git using:
git rebase origin/master
# ... or when using git-svn:
git svn rebase
# Now check out the last 5 commits:
git log -n5
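To see what the rebase does to the history, here is a small local sketch. It assumes a throwaway repository in which a plain commit on master stands in for the changes you would normally receive via git pull or git svn rebase; all file names are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Work on a topic branch.
git checkout -q -b mika
echo topic > topic.txt
git add topic.txt
git commit -q -m "topic work"

# Meanwhile master moves on (stand-in for pulled upstream changes).
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream change"

# Replay the topic commit on top of the new master.
git checkout -q mika
git rebase -q master
git log --oneline -n3
```

After the rebase, the tip of master is an ancestor of mika, so the topic branch contains both the upstream change and the topic work.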
Another branch-session might look like:
git co -b foo
$EDITOR
git ci -a -m 'foo changes'
git co master
git co -b bar
$EDITOR
git ci -a -m 'bar changes'
git co foo
git rebase bar
git log -n5
git st
git branch
Pfuhhh? Right. :) Now it’s time to check out another cool feature: git stash, which is just great when pulling into a dirty tree or when suffering from interrupted workflow. Demo:
git stash
git pull    # or: git fetch, followed by git rebase
$EDITOR # fix conflicts
git commit -a -m "Fix in a hurry"
git stash apply
git stash clear # unless you want to keep the stash
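A minimal sketch of what stash does to a dirty tree, assuming a throwaway repository and a hypothetical notes.txt; the pull and the conflict fixing from the demo above are left out.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "notes v1"

# Interrupted work: the tree is dirty.
echo "half-finished idea" >> notes.txt

# Put the changes aside; the working tree is clean again.
git stash > /dev/null
clean=$(git status --porcelain)

# ... pull, fix and commit something else here ...

# Bring the interrupted work back.
git stash apply > /dev/null
dirty=$(git status --porcelain)
```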
git reset rocks as well:
# List all recent actions:
git reflog
# Now undo the last action (HEAD@{0} is the current
# state, HEAD@{1} the state before the last action):
git reset --hard HEAD@{1}
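A sketch of recovering from an accidental reset with the reflog, assuming a throwaway repository with two made-up commits.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -a -m "second"
good=$(git rev-parse HEAD)

# Oops: throw away the last commit by accident.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before that reset ...
git reflog -n 2

# ... so HEAD@{1} takes us back to the "second" commit.
git reset -q --hard 'HEAD@{1}'
```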
How to get rid of branches?
# Delete a branch. The branch must be fully merged:
git branch -d remove_me_branch
# Delete a branch irrespective of its merge status:
git branch -D remove_me_branch
# Delete a remote branch:
git push reponame :branch
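The difference between -d and -D is easy to try out. This sketch assumes a throwaway repository in which remove_me_branch carries one commit that master lacks.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b remove_me_branch
echo extra >> file.txt
git commit -q -a -m "unmerged work"
git checkout -q master

# -d refuses, because the branch is not fully merged.
if git branch -d remove_me_branch 2> /dev/null; then
  safe_delete=allowed
else
  safe_delete=refused
fi

# -D deletes it regardless.
git branch -D remove_me_branch > /dev/null
remaining=$(git branch --list remove_me_branch)
echo "git branch -d was $safe_delete"
```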
Repack a git repository to minimize its disk usage:
git pack-refs --prune
git reflog expire --all
git repack -a -d -f -l
git prune
git rerere gc
Use git cherry to find commits not merged upstream. Another really cool feature is interactive rebasing: git rebase --interactive. Make sure you are aware of gitk: [Figure: screenshot of gitk] … and don’t forget to set readable fonts for gitk, like:
[ -r ~/.gitk ] || cat > ~/.gitk << EOF
set mainfont {Arial 10}
set textfont {Courier 10}
set uifont {Arial 10 bold}
EOF
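Coming back to git cherry for a moment: the sketch below assumes a throwaway repository in which master has picked up only one of two commits from a hypothetical topic branch. git cherry marks changes that already exist upstream with "-" and the ones still missing with "+".

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt
git add one.txt
git commit -q -m "add one"
echo two > two.txt
git add two.txt
git commit -q -m "add two"

# master gets a commit of its own, then picks up only "add one".
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream only"
git cherry-pick topic~1 > /dev/null

cherry=$(git cherry master topic)
printf '%s\n' "$cherry"
```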
If you prefer a Qt based interface check out qgit. Useful resources:

14 October 2007

Martin F. Krafft: Converting a package to Git

Previously, I demonstrated a Debian packaging workflow using Git and I mentioned the possibility of a follow-up post; well, here it is: you want to use my workflow (or one that's related) for a package that is currently maintained with Subversion on svn.debian.org and you'd like to keep the history during the conversion. Make sure to read the previous post before this one. I am again using the example of mdadm since its Git packaging repository is in a state of shambles and I want to restart to get it right and import the history from the previous Subversion repository. What better way than to write a blog post as I do so? Well, plenty actually. This kind of post isn't really made for a blog, and I have started work on setting up ikiwiki on madduck.net, but it's not yet ready, so I'll stick with the blog for now. I will make sure that links don't break as I move content over, so feel free to bookmark this
Importing the package into Git
Thanks to git-svn, the initial step of getting your package imported into Git is a breeze:
$ git-svn clone --stdlayout --no-metadata \
    svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
Sit back and enjoy. If that command exits prematurely with an error such as the following:
Malformed network data: Malformed network data at /usr/local/bin/git-svn line 1029
then you should upgrade to a newer Git version, or have a look here. If your Git does not know --stdlayout then upgrade as well (or use -T trunk -t tags -b branches instead). Sam Vilain notes that it is important to "get the attribution right with the final SVN import - getting the authors map right". I didn't do that. If you look at the repository resulting from the above command, you'll notice strange commit authors, such as madduck@some-unique-uuid-from-svn. git-svn allows you to map these to real names with real email addresses, which ensures that the attributions are good for the whole world to see. When done, switch to the repository and run git-branch -r. As you'll see, git-svn imported all SVN branches and tags as remote branches. You need those if you want to bidirectionally track the Subversion repository, but we are converting, as you may have guessed from the --no-metadata switch above. Therefore, we resort to the Dinosaur method of converting branches to tags, which I'll simplify for mdadm. We also just delete all remote branches after tagging, since mdadm never used branches in the SVN repository. Your mileage may vary.
git branch -r | sed -rne 's, *tags/([^@]+)$,\1,p' | while read tag; do
  echo "git tag debian/$tag tags/${tag}^; git branch -r -d tags/$tag"
done
git branch -r | while read tag; do
  echo "git branch -r -d $tag"
done
If that seems to work alright, then you can execute the commands. Sam Vilain (again) pointed me to git-pack-refs, after which you can edit .git/packed-refs with an editor. This certainly leaves more room for errors but might be significantly faster.
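To check what the sed expression in the tag conversion picks up before running it against a real repository, you can feed it canned input. This is only a sketch: the branch list is a made-up stand-in for the output of git branch -r, and it uses the more portable -E in place of -r.

```shell
# Hypothetical 'git branch -r' output after a git-svn import.
sample='  tags/2.5.6-8
  tags/2.5.6-8@1234
  tags/2.6.2-1
  trunk'

# Keep only clean tag names: entries with an @revision suffix and
# non-tag branches do not match and are dropped.
commands=$(printf '%s\n' "$sample" |
  sed -E -n 's, *tags/([^@]+)$,\1,p' |
  while read -r tag; do
    # ${tag}^ names the parent of the tag branch's tip commit.
    echo "git tag debian/$tag tags/${tag}^"
  done)
printf '%s\n' "$commands"
```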
Cleaning up the SVN references
Even though we passed --no-metadata to git-svn, it did leave some traces in .git/, which we can now safely remove:
$ git config --remove-section svn-remote.svn
$ rm -r .git/svn
Setting things straight
You can skip this section unless you want to know a bit about how to fix up stuff with Git. There were actually some nasty tagging errors leading up to the 2.5.6-9 release for etch and I could never be bothered to fix those in SVN, but now I can (I love Git!):
$ git tag -d debian/2.5.6-10            # never existed
$ git tag -f debian/2.5.6-8 2.5.6-8~2   # mistagged
$ git checkout -b maint/etch 2.5.6-8    # this is when we diverged
$ git apply < /tmp/mdadm-2.5.6-8..2.5.6-9.diff
$ git add debian/po/gl.po debian/po/pt.po debian/changelog
$ git commit -s
$ git tag debian/2.5.6-9
Now that that's fixed, there is one other thing to worry about, namely the very last commit to SVN, which obsoletes the repository and points to the Git repository. But that's not all of it. I was also silly enough to include a fix in the same commit. Let's see what Git can do. Since the process of obsoletion involves little more than adding a file, we can simply remove that file again, --amend the last commit and provide a new log message:
$ git checkout master
$ git rm OBSOLETE debian/OBSOLETE
$ git commit --amend
Now the repository is in an acceptable state.
Making ends meet
The pkg-mdadm effort on svn.debian.org only maintained the ./debian/ directory, separate from the upstream code, and boy was that a bad idea. Just to give one example: think about what's involved in preparing a Debian-specific patch against the upstream code. This has to end, and we can make it end right here; let's import upstream's code (again not using his ADSL line, but the upstream branch of the pkg-mdadm Git repository; see the previous post for details):
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
$ git config remote.upstream-repo.fetch \
    +refs/heads/upstream:refs/remotes/upstream-repo/upstream
$ git fetch upstream-repo
$ git checkout -b upstream upstream-repo/upstream
Now we have two unconnected ancestries in our repository, and it's time to join them together. The most logical way seems to be to use the last upstream tag for which we have a Debian tag: 2.6.2. For this, we branch off the corresponding Debian tag (2.6.2-1) and merge upstream's 2.6.2 tag into the new branch. This will be a temporary branch. Then, we rebase (remember, nothing has been published yet) the master branch on top of this temporary branch, before we end that branch's short life. The Debian tag stays where it is since it describes the state of the repository at the time of the release of 2.6.2-1.
$ git checkout -b tmp/join debian/2.6.2-1
$ git merge mdadm-2.6.2
$ git rebase tmp/join master
$ git branch -d tmp/join
It just so happens that the head of the SVN repository, which is identical to the tip of our master branch, corresponds to Debian release 2.6.2-2, so we tag it:
$ git tag debian/2.6.2-2
We are now also "born" in the sense that maintenance in Git has started. Let's mark that point in history. There is no real reason I can foresee for this yet, but nonetheless:
$ git tag -s git-birth
Turning dpatch files into feature branches
We want to turn dpatch files into feature branches and somehow make it "proper". We could branch, apply the patch, delete the patch file, checkout master and delete the patch file there as well, but that appears "improper" to me at least; so instead, we'll cherry-pick:
$ git checkout -b deb/conffile-location
$ debian/patches/01-mdadm.conf-location.dpatch -apply
$ git rm debian/patches/01-mdadm.conf-location.dpatch
$ git commit -s
$ git commit -s $(git ls-files --others --modified)
I should quickly intervene to make sure you are following. I am making use of Git's index here. Applying the patch makes the changes in the working tree, but we did not tell Git that we want those to be part of the commit just yet. Instead, we delete the dpatch with git-rm, which automatically registers the deletion with the index. Thus, the first git-commit creates a commit which deletes the dpatch, while the second git-commit creates a commit with all the changes from the dpatch, using git-ls-files to identify new and modified files. But for now, let's move on. We have two commits in the deb/conffile-location branch, and since one of those (the deletion) is relevant to the master branch, we cherry-pick it:
$ git cherry-pick deb/conffile-location^
If you're confused, let me explain: our goal is to have a number of feature branches, of which master is the one in which most of ./debian/ is maintained. All the branches later come together in the long-living build branch, so deb/conffile-location will never be merged back into master. However, once we applied the dpatch to the feature branch, we can delete it from there and the master branch. By cherry-picking, we "import" the deletion to the master branch. I repeat the same procedure for deb/docs, merging all the documentation-related dpatches, but I'll spare you the details.
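The index trick and the cherry-pick can be replayed in miniature. The sketch below rests on assumptions: a throwaway repository, a hypothetical config.txt standing in for the upstream code, a placeholder file standing in for the dpatch, and git commit -a for the second commit, which is sufficient here because the "patch" only modifies a tracked file.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
mkdir patches
echo "conf=/etc/mdadm.conf" > config.txt
echo "placeholder patch" > patches/01-location.patch
git add config.txt patches
git commit -q -m "import"

git checkout -q -b deb/conffile-location
# Stand-in for running the dpatch: only the working tree changes.
echo "conf=/etc/mdadm/mdadm.conf" > config.txt
# git rm stages the deletion; the edit to config.txt stays unstaged.
git rm -q patches/01-location.patch
git commit -q -m "remove patch file"        # commit 1: the deletion only
git commit -q -a -m "apply patch contents"  # commit 2: the actual change

# Import just the deletion (the parent of the branch tip) into master.
git checkout -q master
git cherry-pick 'deb/conffile-location^' > /dev/null
```

Afterwards master no longer carries the patch file but still has the unpatched config.txt, while the feature branch holds the patched version.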
and then Git let me down
In the next step, I found I had misunderstood Git merging: I thought Git was smart, but Linus had his reasons for calling Git the "stupid content tracker" (more on that later). Read on as I am obsoleting dpatch files that upstream had merged: 99-*-FIX.dpatch. For consistency, I wanted to cherry-pick each of the appropriate upstream commits into the master branch along with deleting the corresponding dpatch file. Here is one example: 99-monitor-6+10-FIX.dpatch was obsoleted by upstream's commit 66f8bbb; the -x records the original commit ID in the log:
$ git cherry-pick -x 66f8bbb
$ git rm debian/patches/99-monitor-6+10-FIX.dpatch
$ git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"
I repeated the procedure for the other dpatch files, removed the dpatch infrastructure, and then went on to merge it all into build to build the package. The build branch is a long-living branch off upstream, but which upstream? I'll fast-forward you past a segfault problem with mdadm, which upstream (thought to have) resolved with commit 23dc1ae after 2.6.3, but he had not yet released 2.6.4. Looking at the commits between 23dc1ae and upstream's HEAD at the time, I decided to include them all and snapshot 4450e59:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo/upstream
$ git tag mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout master
$ git merge --no-commit mdadm-2.6.3+200709292116+4450e59
$ dch -v 2.6.3+200709292116+4450e59-1
$ git add debian/changelog
$ git commit -s
And then I called poor-mans-gitbuild, which merges master and then deb/* into build. Here is when stuff blew up. I'll make a long story short (read my description of the problem and Linus' answer if you want to know more): I thought Git was smart enough to identify commits common to both branches and do the right thing, but it turns out that Git does not care at all about commits; it only worries about content and the end result. In our case, unfortunately (or fortunately), the outcome meant a conflict because the upstream branch introduced a simple change (last hunk) in the lines surrounding the patch we cherry-picked, and Git can't handle it. The solution is either not to cherry-pick, or to cherry-pick all commits touching the context of the dpatch, or to simply merge upstream into all our feature branches. In our case, the first is the easiest solution and since importing dpatch files is a one-time thing (thank $DEITY), I'll leave it at that. Almost. I have spent two days thinking about this, more than I should have. And it was this point Linus made which made me appreciate Git even more:
Conflicts aren't bad - they're good. Trying to aggressively resolve them automatically when two branches have done slightly different things in the same area is stupid and just results in more problems. Instead, git tries to do what I don't think anybody else has done: make the conflicts easy to resolve, by allowing you to work with them in your normal working tree, and still giving you a lot of tools to help you see what's going on.
The end
This concludes today's report. Importing the changes from the old Git repo, tagging and merging the branches is all covered in my previous post, or at least you'll find enough information there to complete the exercise. I would like to specifically thank Sam Vilain and Linus Torvalds for their help in preparing this post, as well as the #git/freenode inhabitants, as always. If you are interested in the topic of using version control for distro packaging, I invite you to join the vcs-pkg mailing list and/or the #vcs-pkg/irc.oftc.net IRC channel. Also, if you are interested in Git in general, you can find a list of blog posts on the Git wiki. NP: The Police: Zenyatta Mondatta

10 October 2007

Martin F. Krafft: Packaging with Git

Introduction
I gave a joint presentation with Manoj at Debconf7 about using distributed version control for Debian packaging, and I volunteered to do an on-line workshop about using Git for the task, so it's about time that I should know how to use Git for Debian packaging, but it turns out that I don't. Or well, didn't. After I made a pretty good mess out of the mdadm packaging repository (which is not a big problem as it's just ugly history up to the point when I start to get it right), I decided to get down with the topic and figure it out once and for all. I am writing this post as I put the pieces together. It's been cooking for a week, simply so I could gather enough feedback. I am aware that Git is not exactly a showcase of usability, so I took some extra care to not add to the confusion. It may be the first post in a series, because this time, I am just covering the case of mdadm, for which upstream also uses Git and where I am the only maintainer, and I shall pretend that I am importing mdadm to version control for the first time, so there won't be any history juggling. Future posts could well include tracking Subversion repositories with git-svn, and importing packages previously tracked therewith, but this ain't no promise! (well, that last post is already being drafted, but far from finished; you have been warned!) I realise that git-buildpackage exists, but imposes a rather strict branch layout and tagging scheme, which I don't want to adhere to. And gitpkg (Romain blogged about it recently), deserves another look since, according to its author, it does not impose anything on its user. But in any case, before using such tools (and possibly extending them to allow for other layouts), I'd really rather have done it by hand a couple of times to get the hang of it and find out where the culprits lie. Now, enough of the talking, just one last thing: I expect this blog post to change quite a bit as I get feedback. Changes shall be highlighted in bold typeface.
Setting up the infrastructure
First, we prepare a shared repository on git.debian.org for later use (using collab-maint for illustration purposes), download the Debian source package we want to import (version 2.6.3+200709292116+4450e59-3 at time of writing, but I pretend it's -2 because we shall create -3 further down), set up a local repository, and link it to the remote repository. Note that there are other ways to set up the infrastructure, but this happens to be the one I prefer, even though it's slightly more complicated:
$ ssh alioth
$ cd /git/collab-maint
$ ./setup-repository pkg-mdadm 'mdadm Debian packaging'
$ exit
$ apt-get source --download-only mdadm
$ mkdir mdadm && cd mdadm
$ git init
$ git remote add origin ssh://git.debian.org/git/collab-maint/pkg-mdadm
$ git config branch.master.merge refs/heads/master
Now we can use git-pull and git-push, except the remote repository is empty and we can't pull from there yet. We'll save that for later. Instead, we tell the repository about upstream's Git repository. I am giving you the git.debian.org URL though, simply because I don't want upstream's repository (which lives on an ADSL line) hammered in response to this blog post:
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
Since we're using the upstream branch of the pkg-mdadm repository as source (and don't want all the other mess I created in that repository), we'll first limit the set of branches to be fetched (I could have used the -t option in the above git-remote command, but I prefer to make it explicit that we're doing things slightly differently to protect upstream's ADSL line).
$ git config remote.upstream-repo.fetch \
    +refs/heads/upstream:refs/remotes/upstream-repo/upstream
And now we can pull down upstream's history and create a local branch off it. The "no common commits" warning can be safely ignored since we don't have any commits at all at that point (so there can't be any in common between the local and remote repository), but we know what we're doing, even to the point that we can forcefully give birth to a branch, which is because we do not have a HEAD commit yet (our repository is still empty):
$ git fetch upstream-repo
warning: no common commits
[...]
  # in the real world, we'd be branching off upstream-repo/master
$ git checkout -b upstream upstream-repo/upstream
warning: You appear to be on a branch yet to be born.
warning: Forcing checkout of upstream-repo/upstream.
Branch upstream set up to track remote branch
  refs/remotes/upstream-repo/upstream.
$ git branch
* upstream
$ ls | wc -l
77
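The effect of the restricted fetch refspec can be checked locally. In this sketch a throwaway repository with two branches plays the remote; the directory names (src, dst) are made up, while the remote name and refspec follow the commands above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote repository, with branches master and upstream.
git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo code > file.c
git add file.c
git commit -q -m "initial"
git branch upstream
cd ..

# Local repository that only wants the 'upstream' branch.
git init -q dst
cd dst
git remote add upstream-repo "$work/src"
git config remote.upstream-repo.fetch \
    '+refs/heads/upstream:refs/remotes/upstream-repo/upstream'
git fetch -q upstream-repo

# Check which remote-tracking refs actually arrived.
if git rev-parse -q --verify refs/remotes/upstream-repo/upstream > /dev/null; then
  got_upstream=yes
else
  got_upstream=no
fi
if git rev-parse -q --verify refs/remotes/upstream-repo/master > /dev/null; then
  got_master=yes
else
  got_master=no
fi
echo "upstream fetched: $got_upstream, master fetched: $got_master"
```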
Importing the Debian package
Now it's time to import Debian's diff.gz; remember that I pretend to use version control for package maintenance for the first time. Oh, and sorry about the messy file names, but I decided it's best to stick with real data in case you are playing along. Since we're applying the diff against version 2.6.3+200709292116+4450e59, we ought to make sure to have the repository at the same state. Upstream never "released" that version, but I encoded the commit ID of the tip when I snapshotted it: 4450e59, so we branch off there. Since we are actually tracking the git.debian.org pkg-mdadm repository instead of upstream, you can use the tag I made. Otherwise you could consider tagging yourself:
$ #git tag -s mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout -b master mdadm-2.6.3+200709292116+4450e59
$ zcat ../mdadm_2.6.3+200709292116+4450e59-2.diff.gz | git apply
The local tree is now "debianised", but Git does not know about the new and changed files, which you can verify with git-status. We will split the changes made by Debian's diff.gz across several branches.
The idea of feature branches
We could just create a debian branch, commit all changes made by the diff.gz there, and be done with it. However, we might want to keep certain aspects of Debianisation separate, and the way to do that is with feature branches (also known as "topic" branches). For the sake of this demonstration, let's create the following four branches in addition to the master branch, which holds the standard Debian files, such as debian/changelog, debian/control, and debian/rules:
  • upstream-patches will include patches against the upstream code, which I submit for upstream inclusion.
  • deb/conffile-location makes /etc/mdadm/mdadm.conf the default over /etc/mdadm.conf and is Debian-specific (thus the deb/ prefix).
  • deb/initramfs includes the initramfs hook and script, which I want to treat separately but not submit upstream.
  • deb/docs similarly includes Debian-only documentation I add to the package as a service to Debian users.
If you're importing a Debian package using dpatch, you might want to convert every dpatch into a single branch, or at least collect logical units into separate branches. Up to you. For now, our simple example suffices. Keep in mind that it's easy to merge two branches and less trivial to split one into two. Why? Well, good question. As you will see further down, the separation between master and deb/initramfs actually makes things more complicated when you are working on an issue spanning across both. However, feature branches also bring a whole lot of flexibility. For instance, with the above separation, I could easily create mdadm packages without initramfs integration (see #434934), a disk-space-conscious distribution like grml might prefer to leave out the extra documentation, and maybe another derivative doesn't like the fact that the configuration file is in a different place from upstream. With feature branches, all these issues could be easily addressed by leaving out unwanted branches from the merge into the integration/build branch (see further down). Whether you use feature branches, and how many, or whether you'd like to only separate upstream and Debian stuff is entirely up to you. For the purpose of demonstration, I'll go the more complicated way.
Setting up feature branches
So let's commit the individual files to the branches. The output of the git-checkout command shows modified files that have not been committed yet (which I trim after the first example); Git keeps these across checkouts/branch changes. Note that the ./debian/ directory does not show up as Git does not know about it yet (git-status will tell you that it's untracked, or rather: contains untracked files since Git does not track directories at all):
$ git checkout -b upstream-patches mdadm-2.6.3+200709292116+4450e59
M Makefile
M ReadMe.c
M mdadm.8
M mdadm.conf.5
M mdassemble.8
M super1.c
$ git add super1.c     #444682
$ git commit -s
  # i now branch off master, but that's the same as 4450e59 actually
  # i just do it so i can make this point 
$ git checkout -b deb/conffile-location master
$ git add Makefile ReadMe.c mdadm.8 mdadm.conf.5 mdassemble.8
$ git commit -s
$ git checkout -b deb/initramfs master
$ git add debian/initramfs/*
$ git commit -s
$ git checkout -b deb/docs master
$ git add RAID5_versus_RAID10.txt md.txt rootraiddoc.97.html
$ git commit -s
  # and finally, the ./debian/ directory:
$ git checkout master
$ chmod +x debian/rules
$ git add debian
$ git commit -s
$ git branch
  deb/conffile-location
  deb/docs
* master
  upstream
  upstream-patches
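The mechanics behind this, namely that uncommitted changes follow you across branch switches so they can be committed to different branches piece by piece, can be sketched on a small scale. The repository is a throwaway one, the "snapshot" tag is hypothetical and the file contents are made up; only the file and branch names are borrowed from above.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo "code" > super1.c
echo "docs" > mdadm.8
git add super1.c mdadm.8
git commit -q -m "upstream snapshot"
git tag snapshot

# Stand-in for applying the diff.gz: two unrelated edits, nothing staged.
echo "bugfix" >> super1.c
echo "debian path" >> mdadm.8

# First branch takes only the super1.c change.
git checkout -q -b upstream-patches snapshot
git add super1.c
git commit -q -m "fix for upstream"

# The mdadm.8 edit is still uncommitted and follows us to the next branch.
git checkout -q -b deb/conffile-location master
git add mdadm.8
git commit -q -m "Debian-specific path"

git diff --name-only master upstream-patches
git diff --name-only master deb/conffile-location
```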
At this time, we push our work so it won't get lost if, at this moment, aliens land on the house, or any other completely plausible event of apocalypse descends upon you. We'll push our work to git.debian.org (the origin, which is the default destination and thus need not be specified) by using git-push --all, which conveniently pushes all local branches, thus including the upstream code; you may not want to push the upstream code, but I prefer it since it makes it easier to work with the repository, and since most of the objects are needed for the other branches anyway; after all, we branched off the upstream branch. Specifying --tags instead of --all pushes tags instead of heads (branches); you couldn't have guessed that! See this thread if you (rightfully) think that one should be able to do this in a single command (which is not git push refs/heads/* refs/tags/*):
$ git push --all
$ git push --tags
Done. Well, almost.
Building the package (theory)
Let's build the package. There seem to be two (sensible) ways we could do this, considering that we have to integrate (merge) the branches we just created, before we fire off the building scripts:
  1. by using a temporary (or "throw-away") branch off upstream, where we integrate all the branches we have just created, build the package, tag our master branch (it contains debian/changelog), and remove the temporary branch. When a new package needs to be built, we repeat the process.
  2. by using a long-living integration branch off upstream, into which we merge all our branches, tag the branch, and build the package off the tag. When a new package comes around, we re-merge our branches, tag, and build.
Both approaches have a certain appeal to me, but I settled for the second, for two reasons, the first of which leads to the second:
  1. When I upload a package to the Debian archive, I want to create a tag which captures the exact state of the tree from which the package was built, for posterity (I will return to this point later). Since the throw-away branches are not designed to persist and are not uploaded to the archive, tagging the merging commit makes no sense. Thus, the only way to properly identify a source tree across all involved branches would be to run git-tag $branch/$tagname $branch for each branch, which is purely semantic and will get messy sooner or later.
  2. As a result of the above: when Debian makes a new stable release, I would like to create a branch corresponding to the package in the stable archive at the time, for security and other proposed updates. I could rename my throw-away branch, if it still existed, or I could create a new branch and merge all other branches, using the (semantic) tags, but that seems rather unfavourable.
So instead, I use a long-living integration branch, notoriously tag the merge commits which produced the tree from which I built the package I uploaded, and when a certain version ends up in a stable Debian release, I create a maintenance branch off the one, single tag which corresponds to the very version of the package distributed as part of the Debian release. So much for the theory. Let's build, already!
Building the package (practise)
So we need a long-living integration branch, and that's easier done than said:
$ git checkout -b build mdadm-2.6.3+200709292116+4450e59
Now we're ready to build, and the following procedure should really be automated. I thus write it like a script, called poor-mans-gitbuild, which takes as optional argument the name of the (upstream) tag to use, defaulting to upstream (the tip):
#!/bin/sh
set -eu
git checkout master
debver=$(dpkg-parsechangelog | sed -ne 's,Version: ,,p')
git checkout build
git merge ${1:-upstream}
git merge upstream-patches
git merge master
for b in $(git for-each-ref --format='%(refname)' refs/heads/deb/*); do
  git merge -- $b
done
git tag -s debian/$debver
debuild   # will ignore .git automatically
git checkout master
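Two small shell idioms carry the script above: the sed call that extracts the version, and the ${1:-upstream} default. The sketch below exercises both on canned input. The sample text merely imitates dpkg-parsechangelog output and the merge_target helper is hypothetical; neither is part of the script itself.

```shell
# Imitation of the relevant dpkg-parsechangelog output (version from the text).
sample='Source: mdadm
Version: 2.6.3+200709292116+4450e59-3
Distribution: unstable'

# Print only the line starting with "Version: ", with that prefix removed.
debver=$(printf '%s\n' "$sample" | sed -ne 's,Version: ,,p')

# ${1:-upstream} expands to the first argument, or to "upstream" if none is given.
merge_target() { echo "${1:-upstream}"; }
default_target=$(merge_target)
tagged_target=$(merge_target mdadm-2.6.3+200709292116+4450e59)

echo "would merge $default_target and tag debian/$debver"
```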
Note how we are merging each branch in turn, instead of using the octopus merge strategy (which would create a commit with more than two parents) for reasons outlined in this post. An octopus-merge would actually work in our situation, but it will not always work, so better safe than sorry (although you could still achieve the same result). If you discover during the build that you forgot something, or the build script failed to run, just remove the tag, undo the merges, checkout the branch to which you need to commit to fix the issue, and then repeat the above build process:
$ git tag -d debian/$debver
$ git checkout build
$ git reset --hard upstream
$ git checkout master
$ editor debian/rules    # or whatever
$ git add debian/rules
$ git commit -s
$ poor-mans-gitbuild
Before you upload, it's a good idea to invoke gitk --all and verify that all goes according to plan:
[Figure: screenshot of gitk after the above steps]
When you're done and the package has been uploaded, push your work to git.debian.org, as before. Instead of using --all and --tags, I now specify exactly which refs to push. This is probably a good habit to get into to prevent publishing unwanted refs:
$ git push origin build tag debian/2.6.3+200709292116+4450e59-3
Now take your dog for a walk, or play outside, or do something else not involving a computer or entertainment device.
Uploading a new Debian version
If you are as lucky as I am, the package you uploaded still has a bug in the upstream code, and someone else fixes it before upstream releases a new version; you might then be in the position to release a new Debian version. Or maybe you just need to make some Debian-specific changes against the same upstream version. I'll let the commands speak for themselves:
$ git checkout upstream-patches
$ git-apply < patch-from-lunar.diff   #444682 again
$ git commit --author 'Jérémy Bobbio <lunar@debian.org>' -s
  # this should also be automated, see below
$ git checkout master
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.6.3+200709292116+4450e59-3
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push
$ git push origin tag debian/2.6.3+200709292116+4450e59-3
That first git-push may require a short explanation: without any arguments, git-push updates only the intersection of local and remote branches, so it would never push a new local branch (such as build above), but it updates all existing ones; thus, you cannot inadvertently publish a local branch. Tags still need to be published explicitly.
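If you want to convince yourself of this behaviour, the following sketch replays it in throwaway repositories. The directory layout, the demo identity and the commit messages are made up for the example. It also sets push.default to matching explicitly, on the assumption that you run a recent Git, whose default differs from the one described here:

```shell
#!/bin/sh
# Throwaway demonstration of "matching" push semantics.
# All paths, names and messages are invented for this example.
set -eu
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master   # do not depend on init.defaultBranch
git config user.name "Demo User"
git config user.email demo@example.org
git config push.default matching          # the behaviour described in the text
git remote add origin "$demo/remote.git"

echo one > file
git add file
git commit -q -m "first commit"
git push -q origin master                 # publish master explicitly, once

git checkout -q -b build                  # local-only branch, never published
echo two >> file
git commit -q -a -m "build-only change"
git checkout -q master
echo three >> file
git commit -q -a -m "second commit on master"

git push -q origin                        # no refspec: only matching branches

remote_branches=$(git ls-remote --heads origin | sed 's,.*refs/heads/,,' | sort | tr '\n' ' ')
local_master=$(git rev-parse master)
remote_master=$(git ls-remote origin refs/heads/master | cut -f1)
echo "branches on the remote: $remote_branches"

cd / && rm -rf "$demo"
```

The remote ends up with an updated master and no build branch, which is the "intersection" rule at work.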
Hacking on the software
Imagine: on a rainy Saturday afternoon you get bored and decide to implement a better way to tell mdadm when to start which array. Since you're a genius, it'll take you only a day, but you do make mistakes here and there, so what could be better than to use version control? However, rather than having a branch that will live forever, you are just creating a local branch, which you will not publish. When you are done, you'll feed your work back into the existing branches. Git makes branching really easy and as you may have spotted, the poor-mans-gitbuild script reserves an entire branch namespace for people like you:
$ git checkout -b tmp/start-arrays-rework master
Unfortunately (or fortunately), fixing this issue will require work on two branches, since the initramfs script and hook are maintained in a separate branch. There are (again) two ways in which we can (sensibly) approach this:
  • create two separate, temporary branches, and switch between them as you work.
  • merge both into the temporary branch and later cherry-pick the commits into the appropriate branches.
I am undecided on this, but maybe the best would be a combination: merge both into a temporary branch and later cherry-pick the commits into two additional, temporary branches until you got it right, and then fast-forward the official branches to their tips:
$ git merge master deb/initramfs
$ editor debian/mdadm-raid                     #  
$ git commit -s debian/mdadm-raid
$ editor debian/initramfs/script.local-top     #  
$ git commit -s debian/initramfs/script.local-top
[many hours of iteration pass...]
[...until you are done]
$ git checkout -b tmp/start-arrays-rework-init master
  # for each commit $c in tmp/start-arrays-rework
  # applicable to the master branch:
$ git cherry-pick $c
$ git checkout -b tmp/start-arrays-rework-initramfs deb/initramfs
  # for each commit $c in tmp/start-arrays-rework
  # applicable to the deb/initramfs branch:
$ git cherry-pick $c
This is assuming that all your commits are logical units. If you find several commits which would better be bundled together into a single commit, this is the time to do it:
$ git cherry-pick --no-commit <commit7>
$ git cherry-pick --no-commit <commit4>
$ git cherry-pick --no-commit <commit5>
$ git commit -s
Before we now merge this into the official branches, let me briefly intervene and introduce the concept of a fast-forward. Git will "fast-forward" a branch to a new tip if it decides that no merge is needed. In the above example, we branched a temporary branch (T) off the tip of an official branch (O) and then worked on the temporary one. If we now merge the temporary one into the official one, Git determines that it can actually squash the ancestry into a single line and push the official branch tip to the same ref as the temporary branch tip. In cheap (poor man's), ASCII notation:
- - - O             >> merge T >>     - - - = - - OT
         - - T      >>  into O >>
This works because no new commits have been made on top of O (if there were any, we might be able to rebase, but let's not go there quite yet; rebasing is how you shoot yourself in the foot with Git). Thus we can simply do the following:
$ git checkout deb/initramfs
$ git merge tmp/start-arrays-rework-initramfs
$ git checkout master
$ git merge tmp/start-arrays-rework-init
and test/build/push the result. Or well, since you are not an mdadm maintainer (We^W I have open job positions! Applications welcome!), you'll want to submit your work as patches via email:
$ git format-patch -s -M origin/master
This will create a number of files in the current directory, one for each commit you made since origin/master. Assuming each commit is a logical unit, you can now submit these to an email address. The --compose option lets you write an introductory message, which is optional:
$ git send-email --compose --to your@email.address <file1> <file2> <...>
Once you've verified that everything is alright, swap your email address for the bug number (or the pkg-mdadm-devel list address). Thanks (in advance) for your contribution! Of course, you may also be working on a feature that you want to go upstream, in which case you'd probably branch off upstream-patches (if it depends on a patch not yet in upstream's repository), or upstream (if it does not):
$ git checkout -b tmp/cool-feature upstream
[...]
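Coming back to the fast-forward described earlier in this section, here is a minimal sketch that reproduces the O/T diagram in a throwaway repository. The branch names follow the diagram; everything else is invented. The --ff-only option is an assumption about your Git version: it is newer than the Git this post was written against, and it makes the merge fail rather than create a merge commit:

```shell
#!/bin/sh
# Fast-forward demonstration in a throwaway repository.
# O and T mirror the diagram; all other names are made up.
set -eu
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/O     # work on a branch literally called O
git config user.name "Demo User"
git config user.email demo@example.org

echo base > file
git add file
git commit -q -m "tip of O"

git checkout -q -b T                   # temporary branch off the tip of O
echo work >> file
git commit -q -a -m "work on T"
echo more >> file
git commit -q -a -m "more work on T"

git checkout -q O
git merge -q --ff-only T               # refuses unless a fast-forward is possible

tip_o=$(git rev-parse O)
tip_t=$(git rev-parse T)
merge_commits=$(git rev-list --merges O | wc -l | tr -d ' ')
echo "O=$tip_o T=$tip_t merge commits: $merge_commits"

cd / && rm -rf "$demo"
```

Both branch names resolve to the same commit afterwards and the history of O contains no merge commit, which is all a fast-forward is.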
When a new upstream version comes around
After a while, upstream may have integrated your patches, in addition to various other changes, to give birth to mdadm-2.6.4. We thus first fetch all the new refs and merge them into our upstream branch:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo/master
We could just as well have executed git-pull, which with the default configuration would have done the same; however, I prefer to separate the process into fetching and merging. Now comes the point when many Git people think about rebasing. And in fact, rebasing is exactly what you should be doing, iff you're still working on an unpublished branch, such as the previous tmp/cool-feature off upstream. By rebasing your branch onto the updated upstream branch, you are making sure that your patch will apply cleanly when upstream tries it, because potential merge conflicts would be handled by you as part of the rebase, rather than by upstream:
$ git checkout tmp/cool-feature
$ git rebase upstream
What rebasing does is quite simple actually: it takes every commit you made since you branched off the parent branch and records the diff and commit message. Then, for each diff/commit_message pair, it creates a new commit on top of the new parent branch tip, thus rewrites history, and orphans all your original commits. Thus, you should only do this if your branch has never been published or else you would leave people who cloned from your published branch with orphans.
If this still does not make sense, try it out: create a (source) repository, make a commit (with a meaningful commit message), branch B off the tip, make a commit on top of B (with a meaningful message), clone that repository and return to the source repository. There, check out master, make a commit (with a ...), check out B, rebase it onto the tip of master, make a commit (with a ...), and now git-pull from the clone; use gitk to figure out what's going on.
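The experiment can be scripted roughly as follows. This is a sketch under a few assumptions: the repository paths, file names and the demo identity are made up, and the clone runs git fetch instead of git-pull so that the outcome does not depend on how pulling is configured. The ancestry check uses git merge-base --is-ancestor, which needs a reasonably recent Git:

```shell
#!/bin/sh
# Scripted version of the rebase experiment; all names are invented.
set -eu
demo=$(mktemp -d)
git init -q "$demo/source"
cd "$demo/source"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

echo a > a.txt; git add a.txt; git commit -q -m "initial commit on master"
git checkout -q -b B
echo b > b.txt; git add b.txt; git commit -q -m "first commit on B"
old_b=$(git rev-parse B)

# "Publish" B: the clone now builds on the original commit.
git clone -q "$demo/source" "$demo/clone"

git checkout -q master
echo c > c.txt; git add c.txt; git commit -q -m "second commit on master"
git checkout -q B
git rebase -q master                    # rewrites the commit on B
echo d > d.txt; git add d.txt; git commit -q -m "second commit on B"
new_b=$(git rev-parse B)

# Back in the clone: is the commit we built on still part of B?
cd "$demo/clone"
git fetch -q origin
if git merge-base --is-ancestor "$old_b" origin/B; then
  orphaned=no
else
  orphaned=yes                          # our old commit has been left behind
fi
echo "old B: $old_b"
echo "new B: $new_b"
echo "old commit orphaned in the clone: $orphaned"

cd / && rm -rf "$demo"
```

Run gitk --all in the clone before the cleanup line if you want to see the two diverging lines of history.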
So you should almost never rebase a published branch, and since all your branches outside of the tmp/* namespace are published on git.debian.org, you should not rebase those. But then again, Pierre actually rebases a published branch in his workflow, and he does so with reason: his patches branch is just a collection of branches to go upstream, from which upstream cherry-picks or which upstream merges, but which no one tracks (or should be tracking). But we can't (or at least will not at this point) do this for our feature branches (though we could treat upstream-patches that way), so we have to merge. At first, it suffices to merge the new upstream into the long-living build branch, and to call poor-mans-gitbuild, but if you run into merge conflicts or find that upstream's changes affect the functionality contained in your feature branches, you need to actually fix those. For instance, let's say that upstream started providing md.txt (which I previously provided in the deb/docs branch), then I need to fix that branch:
$ git checkout deb/docs
$ git rm md.txt
$ git commit -s
That was easy, since I could evade the conflict. But what if upstream made a change to Makefile, which got in the way with my configuration file location change? Then I'd have to merge upstream into deb/conffile-location, resolve the conflicts, and commit the change:
$ git checkout deb/conffile-location
$ git merge upstream
CONFLICT!
$ git-mergetool
$ git commit -s
When all conflicts have been resolved, I can prepare a new release, as before:
$ git checkout master
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.6.3+200709292116+4450e59-3
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push
$ git push origin tag debian/2.6.3+200709292116+4450e59-3
Note that Git often appears smart about commits that percolated upstream: since upstream included the two commits in upstream-patches in his 2.6.4 release, my upstream-patches branch got effectively annihilated, and Git was smart enough to figure that out without a conflict. But before you rejoice, let it be told that this does not always work.
Creating and using a maintenance branch
Let's say Debian "lenny" is released with mdadm 2.7.6-1, then:
$ git checkout -b maint/lenny debian/2.7.6-1
You might do this to celebrate the release, or you may wait until the need arises. We've already left the domain of reality ("lenny" is not yet released), so the following is just theory. Now, assume that a security bug is found in mdadm 2.7.6 after "lenny" was released. Upstream is already on mdadm 2.7.8 and commits deadbeef and c0ffee fix the security issue, then you'd cherry-pick them into the maint/lenny branch:
$ git checkout upstream
$ git pull
$ git checkout maint/lenny
$ git cherry-pick deadbeef
$ git cherry-pick c0ffee
If there are no merge conflicts (which you'd resolve with git-mergetool), we can just go ahead to prepare the new package:
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.7.6-1lenny1
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push origin maint/lenny
$ git push origin tag debian/2.7.6-1lenny1
Future directions
It should be trivial to create the Debian source package directly from the repository, and in fact, in response to a recent blog post of mine on the dispensability of pristine upstream tarballs, two people showed me their scripts to do it. My post also caused Joey Hess to clarify his position on pristine tarballs, before he went out to implement dpkg-source v3. This looks very promising. Yet, as Romain argues, there are benefits with simple patch management systems. Exciting times ahead! In addition to creating source packages from version control, a couple of other ideas have been around for a while:
  • create debian/changelog from commit log summaries when you merge into the build branch.
  • integrate version control with the BTS, bidirectionally:
    • given a bug report, create a temporary branch and apply any patches found in the bug report.
    • upon merging the temporary branch back into the feature branch it modifies, generate a patch, send it to the BTS and tag the bug report + pending patch.
And I am sure there are more. If you have any, I'd be interested to hear about them!
Wrapping up
I hope this post was useful. Thank you for reading to the end; this was probably my longest blog post ever. I want to thank Pierre Habouzit, Johannes Schindelin, and all the others on the #git/freenode IRC channel for their tutelage. Thanks also to Manoj Srivastava, whose pioneering work on packaging with GNU arch got me started on most of the concepts I use in the above. And of course, the members of the vcs-pkg mailing list for the various discussions on this subject, especially those who participated in the thread leading up to this post. Finally, thanks to Linus and Junio for Git and the continuously outstanding high level of support they give. If you are interested in the topic of using version control for distro packaging, I invite you to join the vcs-pkg mailing list and/or the #vcs-pkg/irc.oftc.net IRC channel. NP: Aphex Twin: Selected Ambient Works, Volume 2 (at least when I started writing...)

24 June 2007

Mike Hommey: Playing more with LVM, LUKS and the device mapper

Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which is something of a challenge, considering that I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL later, I worked out how to do what I wanted, and realized it was even possible to do in a shell script. So here you are: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I’ll detail how it works in between. As with the previous script, use it at your own risk; it comes with no warranty.
Note that you can theoretically still access the filesystems underneath without problems. It’s an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-less and well-written solution.

set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{ print $1 + $2 }')
devsize=$(blockdev --getsz "$dev")
mchunk=8
The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.
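If you want at least a basic sanity check on that argument, something like the following could go at the top of the script. This is only a sketch: check_device is a hypothetical helper that is not part of the script, and it merely verifies that the argument is an existing block device. Whether the device really is an LVM physical volume would still have to be checked separately, for instance by looking at the exit status of pvs on it.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script: refuse anything
# that is not an existing block device before touching it.
check_device() {
  if [ $# -ne 1 ] || [ -z "$1" ]; then
    echo "usage: check_device <device>" >&2
    return 2
  fi
  if [ ! -b "$1" ]; then
    echo "$1 is not a block device" >&2
    return 1
  fi
  return 0
}
```

In the script, it could be called as check_device "$1" || exit 1 right before dev=$1.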

dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT
Next, we create a sparse luks file the same size as the device (in case luksFormat would use the size somehow, but I believe it doesn’t), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file when an error occurs later (though during the conversion itself, it will be pointless).
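The dd invocation may look odd, so here is the same trick in isolation. It is a sketch that assumes GNU coreutils (for stat -c and du -k) and uses a made-up size of 100 MiB:

```shell
#!/bin/sh
# Create a sparse file the same way the script does and look at its size.
set -eu
sectors=204800                          # made-up size: 100 MiB in 512-byte sectors
sparse=$(mktemp)
# Seek to the end without copying anything (count=0); dd truncates the
# output file at the seek position, which extends it to that length.
dd of="$sparse" seek="$sectors" count=0 bs=512 2> /dev/null
apparent=$(stat -c %s "$sparse")        # size in bytes, as ls -l would show it
allocated=$(du -k "$sparse" | cut -f1)  # KiB actually allocated on disk
echo "apparent: $apparent bytes, allocated: $allocated KiB"
rm -f "$sparse"
```

On a filesystem that supports sparse files, the allocated figure stays far below the apparent size, which is why the temporary file costs next to nothing until luksFormat writes the header into it.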

cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF
EOF
We create a LUKS device, so that we can get the encryption key ($key), and the size of the LUKS header ($offset). Note you need to add --showkeys to the dmsetup table command on sid.
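To make the field splitting more tangible, here is the same read construct applied to a canned line. The sample line is made up (dummy key, illustrative numbers); it only follows the general layout of a crypt target table, which is start, length, target name, cipher, key, IV offset, backing device and offset on that device:

```shell
#!/bin/sh
# Made-up example of a crypt target line as dmsetup table could print it;
# the key is a dummy value and the numbers are illustrative only.
sample='0 2093056 crypt aes-cbc-essiv:sha256 00112233445566778899aabbccddeeff 0 7:0 1032'

# Same here-document trick as in the script: split one line into fields.
read start length crypt format key IVoff cdev offset <<EOF
$sample
EOF

echo "payload: $length sectors, header: $offset sectors, cipher: $format"
```

The last field is the interesting one here: it tells us how many sectors at the start of the backing device are taken up by the LUKS header.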

if [ $(expr $devsize - $pvsize) -lt $offset ]; then
  echo Not enough free space after LVM physical volume
  cryptsetup luksClose ${cryptdev}_real
  exit
fi
Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.

if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
  echo Last
  cryptsetup luksClose ${cryptdev}_real
  exit
fi
This is another check to avoid surprises at the end, when dealing with the last chunk. As the script is written for the moment, it doesn’t support cases where this last chunk is not a multiple of $mchunk. So we need to abort in these.

read major minor <<EOF
$(stat -t "$dev" | awk '{ print $10,$11 }')
EOF
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)/ { print \$1 }")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
  dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
  dmsetup resume "$map"
done
Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device. Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list. Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.

cursor=$length
chunk=$offset
while [ $cursor -gt 0 ]; do
  cursor=$(expr $cursor - $chunk)
  if [ $cursor -lt 0 ]; then
    chunk=$(expr $chunk + $cursor)
    cursor=0
  fi
  (
  [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
  echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
  [ $cursor -lt $(expr $length - $chunk) ] && echo $(expr $cursor + $chunk) $(expr $length - $cursor - $chunk) crypt $format $key $(expr $IVoff + $cursor + $chunk) $dev $(expr $offset + $cursor + $chunk)
  ) | dmsetup reload "$cryptdev"
  dmsetup resume "$cryptdev"
  chunks=$(expr $chunk / $mchunk)
  while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
    true
  done
done
This is where the main work is done : moving the data around. We actually just let the device mapper deal with the data duplication, $offset blocks by $offset blocks ($offset being the LUKS header size), using a mirror target for the chunk being moved. So our disk looks like the following:

We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
I haven’t figured a better way to wait for the end of the mirroring than to do a loop checking with dmsetup status, dmsetup wait doesn’t seem to be very helpful here.
Anyways, this is the part of the script where you don’t want a crash to occur. Because if it does, all you can do is start on a rescue system, and try to find where the encrypted part of the disk starts in order to set up a device mapper by hand.
And you’d better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (/tmp in a default etch install is, for instance ; note the script works nevertheless fine in this case). Also note the trap will remove the luks temporary file if the script exits…
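To see what the loop feeds to the device mapper without touching any device, here is a dry-run sketch of the same arithmetic. plan_tables is a hypothetical helper, DEV, REAL, CIPHER and KEY are placeholders, the region size is hard-coded to 8 as in the script, $IVoff is assumed to be 0, and the sizes in the call are tiny made-up numbers:

```shell
#!/bin/sh
# Dry run of the conversion loop: prints the tables instead of loading them.
# DEV stands for the real device, REAL for the ..._crypt_real mapping.
plan_tables() {
  length=$1 offset=$2
  cursor=$length
  chunk=$offset
  step=0
  while [ "$cursor" -gt 0 ]; do
    cursor=$((cursor - chunk))
    if [ "$cursor" -lt 0 ]; then
      chunk=$((chunk + cursor))         # last chunk is shorter than the others
      cursor=0
    fi
    step=$((step + 1))
    echo "# step $step"
    # not yet converted: still mapped straight to the device
    [ "$cursor" -ne 0 ] && echo "0 $cursor linear DEV 0"
    # chunk being moved: mirrored from the device to the crypt mapping
    echo "$cursor $chunk mirror core 1 8 2 DEV $cursor REAL $cursor"
    # already converted: goes through crypt, shifted by the header size
    [ "$cursor" -lt $((length - chunk)) ] &&
      echo "$((cursor + chunk)) $((length - cursor - chunk)) crypt CIPHER KEY $((cursor + chunk)) DEV $((offset + cursor + chunk))"
  done
  return 0
}

plan_tables 40 16
```

Each step prints one complete table: a linear part in front that shrinks, one mirror segment that walks from the end of the volume towards the start, and a crypt part behind it that grows.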

dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null
Final steps of the conversion: our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and add the LUKS headers at the top of the device we converted. At this point, the LUKS volume is set up just as if it had been set up by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM. Now, you may want to add the following to your /etc/crypttab file:

$cryptdev $dev none luks
i.e. hda5_crypt /dev/hda5 none luks if the device was /dev/hda5. And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):

update-initramfs -u
I tested this successfully under qemu and will give it a shot on my laptop some time soon. Now, because I had a hard time finding much about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:

<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination> <start_sector> ] ...
logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type. num_mirrors is the number of mirrors and, for each mirror, we have a pair of destination and start_sector. [ Update: after a quick look at the device mapper source code in Linus’s git tree, updated the mirror target description ]
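As a small illustration of that syntax, the sketch below assembles the kind of line the conversion script loads: a two-way mirror with an in-memory (core) log and a single log parameter, which the script calls $mchunk. mirror_line is a hypothetical helper and the device names in the call are only examples:

```shell
#!/bin/sh
# Hypothetical helper that assembles a two-way mirror line with a core log.
# $1 start  $2 length  $3 log parameter (the script's $mchunk)
# $4 first device  $5 its start sector  $6 second device  $7 its start sector
mirror_line() {
  echo "$1 $2 mirror core 1 $3 2 $4 $5 $6 $7"
}

mirror_line 0 2048 8 /dev/hda5 0 /dev/mapper/hda5_crypt_real 0
```

Reading the result against the syntax: "core 1 8" is the log type with its one parameter, and "2" announces the two destination/start_sector pairs that follow.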

27 June 2006

Michael Janssen: The Continuing Saga of the ML-2010

I upgraded my CUPS to 1.2 today, and had a bit of trouble with getting the ML-2010 to work with it. Given my other issues with this printer, I thought I would expound on how I fixed yet another problem with this semi-supported printer.
The Samsung printing uses the linuxprint system, which uses a configuration file in an XML format which isn't specified. In the default install for the Samsung linux tool, it is installed in /usr/local/linuxprinter/linuxprint.cfg with a link from /etc/linuxprint.cfg to it. My file, after being set up, looks like this:

<?xml version="1.0"?>
<linux root="/usr/local/linuxprinter" system="cups">
  <option name="ghostscript" value="/usr/bin/gs-esp"/>
  <option name="address" value="localhost"/>
  <option name="port" value="631"/>
  <option name="lpr" value="/usr/bin/lp"/>
  <option name="llpr-default-printer" value="lp"/>
  <printer ppd="ppd/C/ML-2010spl2.ppd" queue="lp">
    <option name="Resolution" value="600"/>
    <option name="PageSize" value="Letter"/>
    <option name="InputSlot" value="AUTO"/>
    <option name="MediaType" value="PRINTER"/>
    <option name="JCLJamrecovery" value="RWJOff"/>
    <option name="JCLEconomode" value="PRINTERDEFAULT"/>
  </printer>
</linux>
 
I discovered the hard way that if this file isn't there, the filter which is installed (ppmtospl2) doesn't work that well. In this case, the printer queue is lp. If this file is set up correctly, you can set up CUPS yourself, using the ppd file which is referenced in the linuxprint.cfg file. If you have lost your linuxprint.cfg, I suspect you can just modify the above with the correct ppd - the Samsung package has many of them for different printers. If you don't want to go that way, you can rerun the /usr/local/linuxprinter/bin/linux-config as before, but you will have to open your CUPS 1.2 server wide open while you are configuring it so that it can add the printer. Also, I had to have a printer existing in the CUPS 1.2 server in order to have the linux-config program work at all. I solved this by just adding a virtual pdf printer (using the package cups-pdf). Even when you get linux-config to add the printer, it will not add it correctly for CUPS 1.2 - the device is incorrect - so you will have to reconfigure it via the web interface and give it the ppd which is in the directory anyway.
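For those who prefer the command line over the web interface, setting up the queue by hand could look roughly like the sketch below. It only prints the lpadmin call rather than running it. The device URI is a pure assumption (lpinfo -v should list the real one), and the full ppd path assumes that the ppd attribute in linuxprint.cfg is relative to the root directory given there:

```shell
#!/bin/sh
# Dry run: prints the lpadmin call instead of executing it.
queue=lp
ppd=/usr/local/linuxprinter/ppd/C/ML-2010spl2.ppd   # assumed: root + ppd from linuxprint.cfg
uri='usb://Samsung/ML-2010'                         # assumption: check "lpinfo -v"
cmd="lpadmin -p $queue -E -v $uri -P $ppd"
echo "$cmd"
```

Here -p names the queue, -E enables it, -v sets the device and -P points CUPS at the ppd file.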
At least I have my printer back again.

21 May 2006

Andreas Metzler: kicking gnutls packaging

Over the last days I have taken a shot at getting the packaging of the gnutls dependency chain in shape. I have now got packages of gnutls13 1.4.0, libtasn1-3 0.3.4 and opencdk8 0.5.8 which at least when reading the debian diff seem to be ok. Once libtasn1-3 is in the archive gnutls12 could be updated to 1.2.11 and we could try to get rid of libtasn1-2 and gnutls11. The packages currently in the archive have an unreadable diff, as they are not based on upstream's tarballs but on sources pulled from cvs. I hope the package maintainer Matthias Urlichs will reappear soon and can make use of my patches.