Search Results: "Tomasz Buchert"

9 October 2017

Antonio Terceiro: pristine-tar updates

Introduction pristine-tar is a tool that is present in the workflow of a lot of Debian people. I adopted it last year after it has been orphaned by its creator Joey Hess. A little after that Tomasz Buchert joined me and we are now a functional two-person team. pristine-tar goals are to import the content of a pristine upstream tarball into a VCS repository, and being able to later reconstruct that exact same tarball, bit by bit, based on the contents in the VCS, so we don t have to store a full copy of that tarball. This is done by storing a binary delta files which can be used to reconstruct the original tarball from a tarball produced with the contents of the VCS. Ultimately, we want to make sure that the tarball that is uploaded to Debian is exactly the same as the one that has been downloaded from upstream, without having to keep a full copy of it around if all of its contents is already extracted in the VCS anyway. The current state of the art, and perspectives for the future pristine-tar solves a wicked problem, because our ability to reconstruct the original tarball is affected by changes in the behavior of tar and of all of the compression tools (gzip, bzip2, xz) and by what exact options were used when creating the original tarballs. Because of this, pristine-tar currently has a few embedded copies of old versions of compressors to be able to reconstruct tarballs produced by them, and also rely on a ever-evolving patch to tar that is been carried in Debian for a while. So basically keeping pristine-tar working is a game of Whac-A-Mole. Joey provided a good summary of the situation when he orphaned pristine-tar. Going forward, we may need to rely on other ways of ensuring integrity of upstream source code. That could take the form of signed git tags, signed uncompressed tarballs (so that the compression doesn t matter), or maybe even a different system for storing actual tarballs. Debian bug #871806 contains an interesting discussion on this topic. Recent improvements Even if keeping pristine-tar useful in the long term will be hard, too much of Debian work currently relies on it, so we can t just abandon it. Instead, we keep figuring out ways to improve. And I have good news: pristine-tar has recently received updates that improve the situation quite a bit. In order to be able to understand how better we are getting at it, I created a "visualization of the regression test suite results. With the help of data from there, let s look at the improvements made since pristine-tar 1.38, which was the version included in stretch. pristine-tar 1.39: xdelta3 by default. This was the first release made after the stretch release, and made xdelta3 the default delta generator for newly-imported tarballs. Existing tarballs with deltas produced by xdelta are still supported, this only affects new imports. The support for having multiple delta generator was written by Tomasz, and was already there since 1.35, but we decided to only flip the switch after using xdelta3 was supported in a stable release. pristine-tar 1.40: improved compression heuristics pristine-tar uses a few heuristics to produce the smaller delta possible, and this includes trying different compression options. In the release Tomasz included a contribution by Lennart Sorensen to also try the --gnu, which gretly improved the support for rsyncable gzip compressed files. We can see an example of the type of improvement we got in the regression test suite data for delta sizes for faad2_2.6.1.orig.tar.gz: In 1.40, the delta produced from the test tarball faad2_2.6.1.orig.tar.gz went down from 800KB, almost the same size of tarball itself, to 6.8KB pristine-tar 1.41: support for signatures This release saw the addition of support for storage and retrieval of upstream signatures, contributed by Chris Lamb. pristine-tar 1.42: optionally recompressing tarballs I had this idea and wanted to try it out: most of our problems reproducing tarballs come from tarballs produced with old compressors, or from changes in compressor behavior, or from uncommon compression options being used. What if we could just recompress the tarballs before importing then? Yes, this kind of breaks the pristine bit of the whole business, but on the other hand, 1) the contents of the tarball are not affected, and 2) even if the initial tarball is not bit by bit the same that upstream release, at least future uploads of that same upstream version with Debian revisions can be regenerated just fine. In some cases, as the case for the test tarball util-linux_2.30.1.orig.tar.xz, recompressing is what makes it possible to reproduce the tarball (and thus import it with pristine-tar) possible at all: util-linux_2.30.1.orig.tar.xz can only be imported after being recompressed In other cases, if the current heuristics can t produce a reasonably small delta, recompressing makes a huge difference. It s the case for mumble_1.1.8.orig.tar.gz: with recompression, the delta produced from mumble_1.1.8.orig.tar.gz goes from 1.2MB, or 99% of the size to the original tarball, to 14.6KB, 1% of the size of original tarball Recompressing is not enabled by default, and can be enabled by passing the --recompress option. If you are using pristine-tar via a wrapper tool like gbp-buildpackage, you can use the $PRISTINE_TAR environment variable to set options that will affect any pristine-tar invocations. Also, even if you enable recompression, pristine-tar will only try it if the delta generations fails completely, of if the delta produced from the original tarball is too large. You can control what too large means by using the --recompress-threshold-bytes and --recompress-threshold-percent options. See the pristine-tar(1) manual page for details.

21 August 2016

Tomasz Buchert: A new version of pristine-tar

Some of you may have noticed that a new version of pristine-tar was released in Debian unstable. Well, actually, 3 versions were released recently, since very fast a regression was found (#835586) and, sure enough, I found another one after uploading a fix for the first one. The most important new feature is the ability to use xdelta3 instead of xdelta for delta generation. This doesn t concern 99.9% of users and is disabled by default for backwards compatibility, but is going to be default in the future. This new format supports bigger files (see #737499) and xdelta3 seems better maintained. Enjoy the new version and please report any problems you find.

29 June 2015

Lunar: Reproducible builds: week 9 in Stretch cycle

What happened about the reproducible builds effort this week: Toolchain fixes Norbert Preining uploaded texinfo/6.0.0.dfsg.1-2 which makes texinfo indices reproducible. Original patch by Chris Lamb. Lunar submitted recently rebased patches to make the file order of files inside .deb stable. akira filled #789843 to make tex4ht stop printing timestamps in its HTML output by default. Dhole wrote a patch for xutils-dev to prevent timestamps when creating gzip compresed files. Reiner Herrmann sent a follow-up patch for wheel to use UTC as timezone when outputing timestamps. Mattia Rizzolo started a discussion regarding the failure to build from source of subversion when -Wdate-time is added to CPPFLAGS which happens when asking dpkg-buildflags to use the reproducible profile. SWIG errors out because it doesn't recognize the aforementioned flag. Trying to get the .buildinfo specification to more definitive state, Lunar started a discussion on storing the checksums of the binary package used in dpkg status database. akira discovered while proposing a fix for simgrid that CMake internal command to create tarballs would record a timestamp in the gzip header. A way to prevent it is to use the GZIP environment variable to ask gzip not to store timestamps, but this will soon become unsupported. It's up for discussion if the best place to fix the problem would be to fix it for all CMake users at once. Infrastructure-related work Andreas Henriksson did a delayed NMU upload of pbuilder which adds minimal support for build profiles and includes several fixes from Mattia Rizzolo affecting reproducibility tests. Neils Thykier uploaded lintian which both raises the severity of package-contains-timestamped-gzip and avoids false positives for this tag (thanks to Tomasz Buchert). Petter Reinholdtsen filled #789761 suggesting that how-can-i-help should prompt its users about fixing reproducibility issues. Packages fixed The following packages became reproducible due to changes in their build dependencies: autorun4linuxcd, libwildmagic, lifelines, plexus-i18n, texlive-base, texlive-extra, texlive-lang. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Untested uploaded as they are not in main: Patches submitted which have not made their way to the archive yet: debbindiff development debbindiff/23 includes a few bugfixes by Helmut Grohne that result in a significant speedup (especially on larger files). It used to exhibit the quadratic time string concatenation antipattern. Version 24 was released on June 23rd in a hurry to fix an undefined variable introduced in the previous version. (Reiner Herrmann) debbindiff now has a test suite! It is written using the PyTest framework (thanks Isis Lovecruft for the suggestion). The current focus has been on the comparators, and we are now at 93% of code coverage for these modules. Several problems were identified and fixed in the process: paths appearing in output of javap, readelf, objdump, zipinfo, unsqusahfs; useless MD5 checksum and last modified date in javap output; bad handling of charsets in PO files; the destination path for gzip compressed files not ending in .gz; only metadata of cpio archives were actually compared. stat output was further trimmed to make directory comparison more useful. Having the test suite enabled a refactoring of how comparators were written, switching from a forest of differences to a single tree. This helped removing dust from the oldest parts of the code. Together with some other small changes, version 25 was released on June 27th. A follow up release was made the next day to fix a hole in the test suite and the resulting unidentified leftover from the comparator refactoring. (Lunar) Documentation update Ximin Luo improved code examples for some proposed environment variables for reference timestamps. Dhole added an example on how to fix timestamps C pre-processor macros by adding a way to set the build date externally. akira documented her fix for tex4ht timestamps. Package reviews 94 obsolete reviews have been removed, 330 added and 153 updated this week. Hats off for Chris West (Faux) who investigated many fail to build from source issues and reported the relevant bugs. Slight improvements were made to the scripts for editing the review database, edit-notes and clean-notes. (Mattia Rizzolo) Meetings A meeting was held on June 23rd. Minutes are available. The next meeting will happen on Tuesday 2015-07-07 at 17:00 UTC. Misc. The Linux Foundation announced that it was funding the work of Lunar and h01ger on reproducible builds in Debian and other distributions. This was further relayed in a Bits from Debian blog post.

22 June 2015

Lunar: Reproducible builds: week 8 in Stretch cycle

What happened about the reproducible builds effort this week: Toolchain fixes Andreas Henriksson has improved Johannes Schauer initial patch for pbuilder adding support for build profiles. Packages fixed The following 12 packages became reproducible due to changes in their build dependencies: collabtive, eric, file-rc, form-history-control, freehep-chartableconverter-plugin , jenkins-winstone, junit, librelaxng-datatype-java, libwildmagic, lightbeam, puppet-lint, tabble. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: reproducible.debian.net Bugs with the ftbfs usertag are now visible on the bug graphs. This explain the recent spike. (h01ger) Andreas Beckmann suggested a way to test building packages using the funny paths that one can get when they contain the full Debian package version string. debbindiff development Lunar started an important refactoring introducing abstactions for containers and files in order to make file type identification more flexible, enabling fuzzy matching, and allowing parallel processing. Documentation update Ximin Luo detailed the proposal to standardize environment variables to pass a reference source date to tools that needs one (e.g. documentation generator). Package reviews 41 obsolete reviews have been removed, 168 added and 36 updated this week. Some more issues affecting packages failing to build from source have been identified. Meetings Minutes have been posted for Tuesday June 16th meeting. The next meeting is scheduled Tuesday June 23rd at 17:00 UTC. Presentations Lunar presented the project in French during Pas Sage en Seine in Paris. Video and slides are available.

20 June 2015

Lunar: Reproducible builds: week 5 in Stretch cycle

What happened about the reproducible builds effort for this week: Toolchain fixes Uploads that should help other packages: Patch submitted for toolchain issues: Some discussions have been started in Debian and with upstream: Packages fixed The following 8 packages became reproducible due to changes in their build dependencies: access-modifier-checker, apache-log4j2, jenkins-xstream, libsdl-perl, maven-shared-incremental, ruby-pygments.rb, ruby-wikicloth, uimaj. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which did not make their way to the archive yet: Discussions that have been started: reproducible.debian.net Holger Levsen added two new package sets: pkg-javascript-devel and pkg-php-pear. The list of packages with and without notes are now sorted by age of the latest build. Mattia Rizzolo added support for email notifications so that maintainers can be warned when a package becomes unreproducible. Please ask Mattia or Holger or in the #debian-reproducible IRC channel if you want to be notified for your packages! strip-nondeterminism development Andrew Ayer fixed the gzip handler so that it skip adding a predetermined timestamp when there was none. Documentation update Lunar added documentation about mtimes of file extracted using unzip being timezone dependent. He also wrote a short example on how to test reproducibility. Stephen Kitt updated the documentation about timestamps in PE binaries. Documentation and scripts to perform weekly reports were published by Lunar. Package reviews 50 obsolete reviews have been removed, 51 added and 29 updated this week. Thanks Chris West and Mathieu Bridon amongst others. New identified issues: Misc. Lunar will be talking (in French) about reproducible builds at Pas Sage en Seine on June 19th, at 15:00 in Paris. Meeting will happen this Wednesday, 19:00 UTC.

13 June 2015

Tomasz Buchert: Tagging unreplied messages with notmuch

Some people are very bad at responding to e-mails. Or they don t check their mailbox as often as I do, who knows. Anyway, sometimes I want to ping somebody about an e-mail that I sent some time ago. Till now, I did it by going through a list of my sent e-mails and resending messages that were unreplied. However, that was somewhat inefficient. As a solution, I coded a post-new hook for notmuch that tags all unreplied messages. The implementation is rather short and straightforward (see GitHub repo). It marks all replied messages with response and everything else with noresponse. The precise definition of replied message is: a message whose ID is mentioned in at least one In-Reply-To header in your mailbox. To solve my initial problem, I also tag my sent, but unreplied messages with noack so that I can easily obtain the list of people to ping eventually. I also have the backlog tag which groups e-mails sent to me and which I haven t replied yet. Feel free to use it if you find it useful.

17 May 2015

Lunar: Reproducible builds: week 3 in Stretch cycle

What happened about the reproducible builds effort for this week: Toolchain fixes Tomasz Buchert submitted a patch to fix the currently overzealous package-contains-timestamped-gzip warning. Daniel Kahn Gillmor identified #588746 as a source of unreproducibility for packages using python-support. Packages fixed The following 57 packages became reproducible due to changes in their build dependencies: antlr-maven-plugin, aspectj-maven-plugin, build-helper-maven-plugin, clirr-maven-plugin, clojure-maven-plugin, cobertura-maven-plugin, coinor-ipopt, disruptor, doxia-maven-plugin, exec-maven-plugin, gcc-arm-none-eabi, greekocr4gamera, haskell-swish, jarjar-maven-plugin, javacc-maven-plugin, jetty8, latexml, libcgi-application-perl, libnet-ssleay-perl, libtest-yaml-valid-perl, libwiki-toolkit-perl, libwww-csrf-perl, mate-menu, maven-antrun-extended-plugin, maven-antrun-plugin, maven-archiver, maven-bundle-plugin, maven-clean-plugin, maven-compiler-plugin, maven-ear-plugin, maven-install-plugin, maven-invoker-plugin, maven-jar-plugin, maven-javadoc-plugin, maven-processor-plugin, maven-project-info-reports-plugin, maven-replacer-plugin, maven-resources-plugin, maven-shade-plugin, maven-site-plugin, maven-source-plugin, maven-stapler-plugin, modello-maven-plugin1.4, modello-maven-plugin, munge-maven-plugin, ocaml-bitstring, ocr4gamera, plexus-maven-plugin, properties-maven-plugin, ruby-magic, ruby-mocha, sisu-maven-plugin, syncache, vdk2, wvstreams, xml-maven-plugin, xmlbeans-maven-plugin. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Ben Hutchings also improved and merged several changes submitted by Lunar to linux. Currently untested because in contrib: reproducible.debian.net
Thanks to the reproducible-build team for running a buildd from hell. gregor herrmann
Mattia Rizzolo modified the script added last week to reschedule a package from Alioth, a reason can now be optionally specified. Holger Levsen splitted the package sets page so each set now has its own page. He also added new sets for Java packages, Haskell packages, Ruby packages, debian-installer packages, Go packages, and OCaml packages. Reiner Herrmann added locales-all to the set of packages installed in the build environment as its needed to properly identify variations due to the current locale. Holger Levsen improved the scheduling so new uploads get tested sooner. He also changed the .json output that is used by tracker.debian.org to lists FTBFS issues again but only for issues unrelated to the toolchain or our test setup. Amongst many other small fixes and additions, the graph colors should now be more friendly to red-colorblind people. The fix for pbuilder given in #677666 by Tim Landscheidt is now used. This fixed several FTBFS for OCaml packages. Work on rebuilding with different CPU has continued, a kvm-on-kvm build host has been set been set up for this purpose. debbindiff development Version 19 of debbindiff included a fix for a regression when handling info files. Version 20 fixes a bug when diffing files with many differences toward a last line with no newlines. It also now uses the proper encoding when writing the text output to a pipe, and detects info files better. Documentation update Thanks to Santiago Vila, the unneeded -depth option used with find when fixing mtimes has been removed from the examples. Package reviews 113 obsolete reviews have been removed this week while 77 has been added.

21 January 2015

Tomasz Buchert: Expired keys in Debian keyring

A new version of Stellarium was recently released (0.13.2), so I wanted to upload it to Debian unstable as I usually do. And so I did, but it was rejected without me even knowing, since I got no e-mail response from ftp-masters. It turns out that my GPG key in the Debian keyring expired recently and so my upload was rightfully rejected. Not a big deal, actually, since you can easily move the expiration date (even after its expiration!). I did it already and the updated key is already propagated, but be aware that Debian keyring does not synchronize with other keyservers! To update your key in Debian (if you are a Debian Developer or Mantainer) you must send your updated keys to keyring.debian.org like that (you should replace my ID with your own):
$ gpg --keyserver keyring.debian.org --send-keys 24B17D29
Debian keyring is distributed as a standard DEB package and apparently it may take up to a month to have your updated key in Debian. It seems that I may be unable to upload packages for some time. But the whole story made me thinking: am I the only one who forgot to update his key in Debian keyring? To verify it I wrote the following snippet (works in Python 2 and 3!) which shows keys expired in the Debian keyring (well, two of them). As a bonus, it also shows keys that have non-UTF8 characters in UIDs see #738483 for more information.
#
# be sure to do "apt-get install python-gnupg"
#
import gnupg
import datetime
def check_keys(keyring, tab = ""):
    gpg = gnupg.GPG(keyring = keyring)
    gpg.decode_errors = 'replace' # see: https://bugs.debian.org/738483
    keys = gpg.list_keys()
    now = datetime.datetime.now()
    for key in keys:
        uids = key['uids']
        uid = uids[0]
        if key['expires'] != '':
            expire = datetime.datetime.fromtimestamp(int(key['expires']))
            diff = expire - now
            if diff.days < 0:
                print(u' EXPIRED: Key of   expired   days ago.'.format(tab, uid, -diff.days))
        mangled_uids = [ u for u in uids if u'\ufffd' in u ]
        if len(mangled_uids) > 0:
            print(u' MANGLED: Key of   has some mangled uids:  '.format(tab, uid, mangled_uids))
keyrings = [
    "/usr/share/keyrings/debian-keyring.gpg",
    "/usr/share/keyrings/debian-maintainers.gpg"
]
for keyring in keyrings:
    print(u"CHECKING  ".format(keyring))
    check_keys(keyring, tab = "    ")
I m not going to show the output of this code, because it contains names and e-mail adresses which I really shouldn t post. But you can run it yourself. You will see that there is a small group of people with expired keys (including me!). Interestingly, some keys have expired a long time ago: there is one that expired more than 7 years ago! The outcome of the story is: yes, you should have an expiration date on your key for safety reasons, but be careful - it can surprise you at the worst moment.

4 December 2014

Tomasz Buchert: BSP in Munich

My first plan was to participate in the Bug Squashing Party (BSP) in Paris, but finally I didn t manage to organize the travel and proper stay in the capital of France. Finally I settled on Munich and below you will find a summary of my stay there. While organizing my trip, the Munich natives offered me a couch to surf during the stay, which was very kind. In the end, however, I stayed at my friend s place - thanks a lot Mehdi! The BSP gathered Debian people interested in preparing Jessie release, but also KDE, Kolab and LibreOffice people. The host for the meeting was LiMux (on of the most known large-scale Linux deployments) which provided a venue, but also food (free!) and other attractions. Organizers, if you read this, I thank you for the amazing work that you have done. I arrived on late Friday s evening and so I didn t participate that day in bug squashing and took a good sleep instead. On Saturday and Sunday we were, well, squashing bugs. I haven t done NMUs before and it was a new experience to me, even though theoretically I know how it works. The DDs present at the BSP were more than helpful to help me when I was stuck. Here is a short description of what I have done: I also got my GPG keys signed and vice-versa, so I m much better connected to the web of trust now (7 signatures of DDs so far). All in all, it was a great experience which I recommend to everyone. The atmosphere is very motivating and there are many people who will gladly help you if you don t know or understand something. You don t have to be DD to participate even, but a minimal packaging experience is definitely very useful. I m quite sure that people would be glad to help even a complete newcomer (I certainly would do), but being able to build a package using apt-get source and debuild won t hurt anybody. At the very least you can always triage bugs: reproduce them, find the cause of the bug, propose a way to fix it, etc. As always you should be open-minded and willing to get your hands dirty. I want also thank Debian for sponsoring me which finally convinced me to participate (Munich is around 500 km from where I live and the travel is fairly expensive). Some links of interest: Many bugs still wait to be squashed (120 at the time of writing this), so let s get back to work!