30 April 2011

Thomas Girard: ACE+TAO Debian packaging moved to git

We recently converted the Debian ACE+TAO package repository from Subversion to git. This was a long and interesting process; I learned a lot about git along the way. I had been using git for a while for other packages: BOUML, dwarves and GNU Smalltalk. But I did not really get it. A preliminary study led by Pau[1] compared three tools and showed that the last one[2] gave the best-looking results.

The conversion

svn-all-fast-export requires physical access to the repo, so the Alioth SVN repo was copied to my machine as svn-pkg-ace/ before running the tool:
svn-all-fast-export --identity-map authors.txt --rules pkg-ace.rules svn-pkg-ace
Here's the content of the pkg-ace.rules configuration file that was used:
create repository pkg-ace
end repository
match /trunk/
  repository pkg-ace
  branch master
end match
match /(branches|tags)/([^/]+)/
  repository pkg-ace
  branch \2
end match
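To make the effect of these two rules concrete, here is a minimal Ruby sketch of how SVN paths end up on git branches. It only mimics the rules file and is not how svn-all-fast-export is implemented; the `RULES` table and `branch_for` helper are hypothetical names, and I assume patterns are matched from the start of the path, with the second rule matching either branches or tags.

```ruby
# Hypothetical sketch of how the two match rules route SVN paths.
# It mirrors the rules file above; it is not svn-all-fast-export's own code.
RULES = [
  [%r{\A/trunk/},                   ->(_m) { "master" }],
  [%r{\A/(branches|tags)/([^/]+)/}, ->(m) { m[2] }] # \2 in the rules file
].freeze

# Returns the git branch an SVN path would land on, or nil if no rule matches.
def branch_for(svn_path)
  RULES.each do |pattern, namer|
    match = pattern.match(svn_path)
    return namer.call(match) if match
  end
  nil
end

puts branch_for("/trunk/debian/changelog")         # master
puts branch_for("/tags/5.4.7-5/debian/control")    # 5.4.7-5
puts branch_for("/branches/5.4.7-12/debian/rules") # 5.4.7-12
```

Note that tags and branches with the same name would land on the same git branch under this mapping.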
The author mapping file authors.txt being:
markos = Konstantinos Margaritis <email-hidden>
mbrudka-guest = Marek Brudka <email-hidden>
pgquiles-guest = Pau Garcia i Quiles <email-hidden>
tgg = Thomas Girard <email-hidden>
tgg-guest = Thomas Girard <email-hidden>
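The mapping file has one `svnuser = Full Name <email>` entry per line. The following Ruby sketch parses that shape, mostly to show that several SVN accounts can map to the same git identity. The `parse_identity_map` helper is hypothetical and only handles the simple form shown above; the real tool may accept more than this.

```ruby
# Parses identity-map lines of the form "svnuser = Full Name <email>".
# Sketch only: it assumes exactly the simple layout used in authors.txt.
IDENTITY_LINE = /\A(\S+)\s*=\s*(.+?)\s*<([^>]*)>\s*\z/

def parse_identity_map(text)
  text.each_line.each_with_object({}) do |line, map|
    line = line.strip
    next if line.empty?
    m = IDENTITY_LINE.match(line)
    raise ArgumentError, "unrecognized line: #{line}" unless m
    map[m[1]] = { name: m[2], email: m[3] }
  end
end

sample = <<~MAP
  tgg = Thomas Girard <email-hidden>
  tgg-guest = Thomas Girard <email-hidden>
MAP

authors = parse_identity_map(sample)
puts authors["tgg"] == authors["tgg-guest"] # true: both accounts are one person
```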
The tool's sample configuration file merged-branches-tags.rules recommends post-processing tags, which are just branches in SVN. That's why the configuration file above treats tags as branches. The conversion was indeed fast: less than 1 minute.

Post-conversion observations

Invoking gitk --all in the converted repo revealed different kinds of issues:
  • svn tags as branches: http://thomas.g.girard.free.fr/ACE/tags-as-branches.png Branches are marked with green rectangles, and tags with yellow arrows. What we have here (expected given our configuration of the tool) are branches (e.g. 5.4.7-5) corresponding to tags, and tags matching the SVN tagging commit (e.g. backups/5.4.7-5@224). We'll review and fix this.

  • merged code that did not appear as such: http://thomas.g.girard.free.fr/ACE/missing-merge-metadata.png Branches that were not merged using svn merge look like they were not merged at all.

  • commits with wrong author: http://thomas.g.girard.free.fr/ACE/wrong-author.png Before being in SVN, the repository was stored in CVS. When it was imported into SVN, no special attention was given to the commit author. Hence I got credited for changes I did not write.

  • obsolete branches: http://thomas.g.girard.free.fr/ACE/obsolete-branches.png The tool keeps all branches, including removed ones (with a tag at their end), so that you can decide what to do with them.

  • missing merges: http://thomas.g.girard.free.fr/ACE/missing-merge.png The branch 5.4.7-12 was never merged into the trunk!

Learning git

Based on the observations above, I realized my limited knowledge would not be enough to complete the conversion and clean the repository. There is a lot of documentation on git out there, and you can find many links from the git documentation page. Here are the ones I've used:

The Git Object Model

It's described with pictures here. You really need to understand this if you haven't already. Once you do, you understand that git is built bottom-up: the plumbing, then the porcelain. If you can't find the tool you need, it's easy to write it.

git fast-import

The Migrating to Git chapter explains how you can use the git fast-import tool to manually import anything into git. I've used it to create tags with dates in the past, slightly changing the Custom Importer example in the book:
#!/usr/bin/env ruby
#
# retag.rb
#
# Small script to create an annotated tag, specifying committer as well as
# date, and tag comment.
#
# Based on Scott Chacon "Custom Importer" example.
#
# Arguments:
#  $1 -- tag name
#  $2 -- sha-1 revision to tag
#  $3 -- committer in the form First Last <email>
#  $4 -- date to use in the form YYYY/MM/DD_HH:MM:SS
#  $5 -- tag comment

def help
  puts "Usage: retag <tag> <sha1sum> <committer> <date> <comment>"
  puts "Creates an annotated tag with name <tag> for commit <sha1sum>, using "
  puts "given <committer>, <date> and <comment>"
  puts "The output should be piped to git fast-import"
end
def to_date(datetime)
  (date, time) = datetime.split('_')
  (year, month, day) = date.split('/')
  (hour, minute, second) = time.split(':')
  return Time.local(year, month, day, hour, minute, second).to_i
end
def generate_tag(tag, sha1hash, committer, date, message)
  puts "tag #{tag}"
  puts "from #{sha1hash}"
  puts "tagger #{committer} #{date} +0000"
  # fast-import expects the exact byte count of the message after "data"
  print "data #{message.bytesize}\n#{message}"
end
if ARGV.length != 5
  help
  exit 1
else
  (tag, sha1sum, committer, date, message) = ARGV
  generate_tag(tag, sha1sum, committer, to_date(date), message)
end
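To see what the script above feeds to git fast-import, the sketch below builds the same kind of tag stanza as a string so it can be inspected before piping anything into a repository. The `utc_epoch` and `tag_stanza` helpers are hypothetical, the tag name, date and message are placeholder values, and I use Time.utc here (an assumption that the date is given in UTC) so the result does not depend on the local time zone, whereas the script uses Time.local.

```ruby
# Converts "YYYY/MM/DD_HH:MM:SS" to seconds since the epoch, read as UTC.
def utc_epoch(datetime)
  date, time = datetime.split("_")
  year, month, day = date.split("/").map(&:to_i)
  hour, minute, second = time.split(":").map(&:to_i)
  Time.utc(year, month, day, hour, minute, second).to_i
end

# Builds the text of a fast-import "tag" command for an annotated tag.
def tag_stanza(tag, sha1, tagger, datetime, message)
  [
    "tag #{tag}",
    "from #{sha1}",
    "tagger #{tagger} #{utc_epoch(datetime)} +0000",
    "data #{message.bytesize}", # byte count of the tag message
    message
  ].join("\n")
end

# Placeholder values, for illustration only.
puts tag_stanza("5.4.7-5",
                "898ad49b61d4d8d5dc4072351037e2c8ade1ab68",
                "Thomas Girard <email-hidden>",
                "2011/04/30_00:00:00",
                "Tag release 5.4.7-5")
```

With the real script, the equivalent output would be piped straight into git fast-import from inside the repository, as its help text suggests.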

graft points (graft means greffe in French)

Because of missing svn:mergeinfo, some changes appear unmerged. Graft points fix this: they override git's idea of the parents of a commit. To create a graft point, assuming 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 is the commit you want to change, currently with a single parent 898ad49b61d4d8d5dc4072351037e2c8ade1ab68, but containing changes from commit 11cf74d4aa996ffed7c07157fe0780ec2224c73e:
me@mymachine$ echo 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 11cf74d4aa996ffed7c07157fe0780ec2224c73e 898ad49b61d4d8d5dc4072351037e2c8ade1ab68 >> .git/info/grafts
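Each line of .git/info/grafts is a commit ID followed by the IDs of the parents it should appear to have, separated by spaces. As a small sketch, the hypothetical `graft_line` helper below builds such a line and rejects anything that is not a full 40-character SHA-1, which is an easy mistake to make when copying abbreviated hashes from gitk.

```ruby
SHA1 = /\A[0-9a-f]{40}\z/

# Returns one grafts-file line: the commit followed by its desired parents.
def graft_line(commit, parents)
  ids = [commit, *parents]
  bad = ids.reject { |id| SHA1.match?(id) }
  raise ArgumentError, "not a full SHA-1: #{bad.join(', ')}" unless bad.empty?
  ids.join(" ")
end

# Same commits as in the echo command above.
puts graft_line("6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242",
                %w[11cf74d4aa996ffed7c07157fe0780ec2224c73e
                   898ad49b61d4d8d5dc4072351037e2c8ade1ab68])
```

The resulting line can then be appended to .git/info/grafts exactly as the echo command does.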

git filter-branch

git filter-branch allows you to completely rewrite the history of a git branch, changing or dropping commits while traversing the branch. As an additional benefit, this tool uses graft points and makes them permanent. In other words: after running git filter-branch you can remove the .git/info/grafts file. I've used it to rewrite the author of a given set of commits, using a hack on top of Chris Johnsen's script:
#!/bin/sh

br="HEAD"
TARG_NAME="Raphael Bossek"
TARG_EMAIL="hidden"
export TARG_NAME TARG_EMAIL
filt='
    if test "$GIT_COMMIT" = 546db1966133737930350a098057c4d563b1acdf -o \
            "$GIT_COMMIT" = 23419dde50662852cfbd2edde9468beb29a9ddcc; then
        if test -n "$TARG_EMAIL"; then
            GIT_AUTHOR_EMAIL="$TARG_EMAIL"
            export GIT_AUTHOR_EMAIL
        else
            unset GIT_AUTHOR_EMAIL
        fi
        if test -n "$TARG_NAME"; then
            GIT_AUTHOR_NAME="$TARG_NAME"
            export GIT_AUTHOR_NAME
        else
            unset GIT_AUTHOR_NAME
        fi
    fi
'
git filter-branch $force --tag-name-filter cat --env-filter "$filt" -- $br
(Script edited here; there were many more commits written by Raphael.)
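With many commits to fix, the chain of test ... -o ... conditions gets long. One possible variation, sketched here in Ruby, is to generate the --env-filter snippet from a list of commit IDs using a shell case statement. The `author_env_filter` helper is hypothetical, and unlike the script above this simplified version always sets the author fields rather than unsetting them when TARG_NAME or TARG_EMAIL is empty.

```ruby
# Hypothetical helper: builds an --env-filter snippet for any number of commits.
# The generated shell code relies on TARG_NAME and TARG_EMAIL being exported.
def author_env_filter(commits)
  raise ArgumentError, "no commits given" if commits.empty?
  <<~SH
    case "$GIT_COMMIT" in
      #{commits.join('|')})
        GIT_AUTHOR_NAME="$TARG_NAME"
        GIT_AUTHOR_EMAIL="$TARG_EMAIL"
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
        ;;
    esac
  SH
end

puts author_env_filter(%w[
  546db1966133737930350a098057c4d563b1acdf
  23419dde50662852cfbd2edde9468beb29a9ddcc
])
```

The printed text could be stored in the filt variable of the shell script in place of the hand-written condition.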

Important

It's important to realize that the whole history of the selected branch is rewritten, so all object IDs will change. You should not do this if you have already published your repository.

The --tag-name-filter cat argument ensures our tags are copied during the traversal; otherwise they would be left pointing at the old commits, and hence would not be available in the new history.

Hint

Once git filter-branch completes you get a new history, as well as a backup of the old refs under refs/original/ to ease comparison. It is highly recommended to check the result of the rewrite before removing the original refs. To shrink the repo after this, git clone the rewritten repo with the file:// syntax -- the git-filter-branch manual page says it all.

Cleaning up the repo

To recap, here's how the ACE+TAO git repo was changed after conversion:
  1. Add graft points where needed.

  2. Clean tags and branches. Using git tag -d, git branch -d and the Ruby script above, it was possible to recreate the tags. During this I was also able to add missing tags and fix some SVN mistakes I had made -- like committing in a branch created under tags/.

  3. Remove obsolete branches.

  4. Merge missing pieces. There were just two missing debian/changelog entries. I did this before git filter-branch because I did not find a way to use the tool correctly with multiple heads.

  5. Fix commit author where needed. Using the shell script above, Raphael is now correctly credited for his work.

That's it. The ACE+TAO git repository for Debian packages is live at http://git.debian.org/?p=pkg-ace/pkg-ace.git;a=summary.

[1] http://lists.alioth.debian.org/pipermail/pkg-ace-devel/2011-March/002421.html
[2] Available in Debian as svn-all-fast-export