Search Results: "Andrew Ayer"

30 January 2024

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:
/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020
Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Lieutenant Commander Data from Star Trek: The Next Generation Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:
IssuerCompromised Count
Sectigo170
ISRG (Let's Encrypt)161
GoDaddy141
DigiCert81
GlobalSign46
Entrust3
SSL.com1
If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.
IssuerIssuance VolumeCompromised CountCompromise Rate
Sectigo88,323,0681701 in 519,547
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
GoDaddy56,121,4291411 in 398,024
DigiCert144,713,475811 in 1,786,586
GlobalSign1,438,485461 in 31,271
Entrust23,16631 in 7,722
SSL.com171,81611 in 171,816
If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
Sectigo88,323,0681701 in 519,547
DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
By grouping by order-of-magnitude in the compromise rate, we can identify three bands :
  • The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.
  • The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.
  • The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!
Now we have some useful insights we can think about.

Why Is It So?
Professor Julius Sumner Miller If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:
SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';
The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
"Regular" DigiCert40,397,363811 in 498,732
Sectigo88,323,0681701 in 519,547
All DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.

The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
Bender, the robot from Futurama, asking if we'd like to kill all humans No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.
  • Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.
  • Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.
  • Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

20 September 2016

Reproducible builds folks: Reproducible Builds: week 73 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday September 11 and Saturday September 17 2016: Toolchain developments Ximin Luo started a new series of tools called (for now) debrepatch, to make it easier to automate checks that our old patches to Debian packages still apply to newer versions of those packages, and still make these reproducible. Ximin Luo updated one of our few remaining patches for dpkg in #787980 to make it cleaner and more minimal. The following tools were fixed to produce reproducible output: Packages reviewed and fixed, and bugs filed The following updated packages have become reproducible - in our current test setup - after being fixed: The following updated packages appear to be reproducible now, for reasons we were not able to figure out. (Relevant changelogs did not mention reproducible builds.) The following 3 packages were not changed, but have become reproducible due to changes in their build-dependencies: jaxrs-api python-lua zope-mysqlda. Some uploads have addressed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Reviews of unreproducible packages 462 package reviews have been added, 524 have been updated and 166 have been removed in this week, adding to our knowledge about identified issues. 25 issue types have been updated: Weekly QA work FTBFS bugs have been reported by: diffoscope development A new version of diffoscope 60 was uploaded to unstable by Mattia Rizzolo. It included contributions from: It also included from changes previous weeks; see either the changes or commits linked above, or previous blog posts 72 71 70. strip-nondeterminism development New versions of strip-nondeterminism 0.027-1 and 0.028-1 were uploaded to unstable by Chris Lamb. It included contributions from: disorderfs development A new version of disorderfs 0.5.1 was uploaded to unstable by Chris Lamb. It included contributions from: It also included from changes previous weeks; see either the changes or commits linked above, or previous blog posts 70. Misc. This week's edition was written by Ximin Luo and reviewed by a bunch of Reproducible Builds folks on IRC.

15 June 2016

Reproducible builds folks: Reproducible builds: week 59 in Stretch cycle

What happened in the Reproducible Builds effort between June 5th and June 11th 2016: Media coverage Ed Maste gave a talk at BSDCan 2016 on reproducible builds (slides, video). GSoC and Outreachy updates Weekly reports by our participants: Documentation update - Ximin Luo proposed a modification to our SOURCE_DATE_EPOCH spec explaining FORCE_SOURCE_DATE. Some upstream build tools (e.g. TeX, see below) have expressed a desire to control which cases of embedded timestamps should obey SOURCE_DATE_EPOCH. They were not convinced by our arguments on why this is a bad idea, so we agreed on an environment variable FORCE_SOURCE_DATE for them to implement their desired behaviour - named generically, so that at least we can set it centrally. For more details, see the text just linked. However, we strongly urge most build tools not to use this, and instead obey SOURCE_DATE_EPOCH unconditionally in all cases. Toolchain fixes Packages fixed The following 16 packages have become reproducible due to changes in their build-dependencies: apertium-dan-nor apertium-swe-nor asterisk-prompt-fr-armelle blktrace canl-c code-saturne coinor-symphony dsc-statistics frobby libphp-jpgraph paje.app proxycheck pybit spip tircd xbs The following 5 packages are new in Debian and appear to be reproducible so far: golang-github-bowery-prompt golang-github-pkg-errors golang-gopkg-dancannon-gorethink.v2 libtask-kensho-perl sspace The following packages had older versions which were reproducible, and their latest versions are now reproducible again after being fixed: The following packages have become reproducible after being fixed: Some uploads have fixed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Package reviews 68 reviews have been added, 19 have been updated and 28 have been removed in this week. New and updated issues: 26 FTBFS bugs have been reported by Chris Lamb, 1 by Santiago Vila and 1 by Sascha Steinbiss. diffoscope development strip-nondeterminism development disorderfs development tests.reproducible-builds.org Misc. Steven Chamberlain submitted a patch to FreeBSD's makefs to allow reproducible builds of the kfreebsd installer. Ed Maste committed a patch to FreeBSD's binutils to enable determinstic archives by default in GNU ar. Helmut Grohne experimented with cross+native reproductions of dash with some success, using rebootstrap. This week's edition was written by Ximin Luo, Chris Lamb, Holger Levsen, Mattia Rizzolo and reviewed by a bunch of Reproducible builds folks on IRC.

12 January 2016

Bits from Debian: New Debian Developers and Maintainers (November and December 2015)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

3 January 2016

Lunar: Reproducible builds: week 35 in Stretch cycle

What happened in the reproducible builds effort between December 20th to December 26th: Toolchain fixes Mattia Rizzolo rebased our experimental versions of debhelper (twice!) and dpkg on top of the latest releases. Reiner Herrmann submited a patch for mozilla-devscripts to sort the file list in generated preferences.js files. To be able to lift the restriction that packages must be built in the same path, translation support for the __FILE__ C pre-processor macro would also be required. Joerg Sonnenberger submitted a patch back in 2010 that would still be useful today. Chris Lamb started work on providing a deterministic mode for debootstrap. Packages fixed The following packages have become reproducible due to changes in their build dependencies: bouncycastle, cairo-dock-plug-ins, darktable, gshare, libgpod, pafy, ruby-redis-namespace, ruby-rouge, sparkleshare. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: reproducible.debian.net Statistics for package sets are now visible for the armhf architecture. (h01ger) The second build now has a longer timeout (18 hours) than the first build (12 hours). This should prevent wasting resources when a machine is loaded. (h01ger) Builds of Arch Linux packages are now done using a tmpfs. (h01ger) 200 GiB have been added to jenkins.debian.net (thanks to ProfitBricks!) to make room for new jobs. The current count is at 962 and growing! diffoscope development Aside from some minor bugs that have been fixed, a one-line change made huge memory (and time) savings as the output of transformation tool is now streamed line by line instead of loaded entirely in memory at once. disorderfs development Andrew Ayer released disorderfs version 0.4.2-1 on December 22th. It fixes a memory corruption error when processing command line arguments that could cause command line options to be ignored. Documentation update Many small improvements for the documentation on reproducible-builds.org sent by Georg Koppen were merged. Package reviews 666 (!) reviews have been removed, 189 added and 162 updated in the previous week. 151 new fail to build from source reports have been made by Chris West, Chris Lamb, Mattia Rizzolo, and Niko Tyni. New issues identified: unsorted_filelist_in_xul_ext_preferences, nondeterminstic_output_generated_by_moarvm. Misc. Steven Chamberlain drew our attention to one analysis of the Juniper ScreenOS Authentication Backdoor: Whilst this may have been added in source code, it was well-disguised in the disassembly and just 7 instructions long. I thought this was a good example of the current state-of-the-art, and why we'd like our binaries and eventually, installer and VM images reproducible IMHO. Joanna Rutkowska has mentioned possible ways for Qubes to become reproducible on their development mailing-list.

11 November 2015

Bits from Debian: New Debian Developers and Maintainers (September and October 2015)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

14 October 2015

Lunar: Reproducible builds: week 24 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Scott Kitterman fixed an issue with non-deterministic Depends generated by dh-python identified by Santiago Vila and Chris Lamb. Lunar updated the patch against dpkg which makes the order of files in control.tar.gz deterministic using the new --sort=name option available in GNU Tar 1.28. josch released sbuild version 0.66.0-1 with several fixes and improvements. The most notable one for reproducible builds is the new --build-path option and $build_path configuration variable added by akira which allows to explicitly chose a given build path. Reiner Herrmann wrote a new patch for dh-systemd to sort the list of unit files in the generated maintainer scripts. Packages fixed The following packages became reproducible due to changes in their build dependencies: aoeui, apron, camlmix, cudf, findlib, glpk-java, hawtjni, haxe, java-atk-wrapper, llvm-py, misery, mtasc, ocamldsort, optcomp, spamoracle. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Untested Patches submitted which have not made their way to the archive yet: reproducible.debian.net ProfitBricks once again increased their support for reproducible builds in Debian and in other free software projects by adding 58 new cores and 138 GiB of RAM to the already existing setup. Two new amd64 build nodes and 16 new amd64 build jobs have been added which doubles the build capacity per day and allows us to spot many kind of problems earlier. The size of the tmpfs where builds are performed has also been increased from 70 to 200 GiB on all amd64 build nodes. Huge thanks! When examining a package, a link now points to a table listing all previous recorded tests for the same package. (Mattia) The menu on the package pages has also been improved. (h01ger) Packages in the depwait state are now rescheduled automatically after five days. (h01ger) Links to documentation and other projects being tested have been made more visible on the landing page. (h01ger) To reduce noise on the team IRC channel five different types of notifications have been turned into mail notifications. The remaining ones have been shortened and the status changes have been limited to unstable and experimental. (h01ger) Maintainer notifications about status changes in a package will only be sent out once per day, and not on each status change. (h01ger) diffoscope development Some more experiments of concurrent processing have been made. None were good and reliable enough to be shared, though. Package reviews 48 reviews have been removed, 189 added and 23 updated this week. 9 FTBFS bugs were reported by Chris Lamb. Misc. h01ger met with Levente Polyak to discuss testing Arch Linux on Debian continuous test system with an easily extensible framework. The idea is to also allow testing of other distributions, and provide a nice package based view like the one for Debian.

1 September 2015

Lunar: Reproducible builds: week 18 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Aur lien Jarno uploaded glibc/2.21-0experimental1 which will fix the issue were locales-all did not behave exactly like locales despite having it in the Provides field. Lunar rebased the pu/reproducible_builds branch for dpkg on top of the released 1.18.2. This made visible an issue with udebs and automatically generated debug packages. The summary from the meeting at DebConf15 between ftpmasters, dpkg mainatainers and reproducible builds folks has been posted to the revelant mailing lists. Packages fixed The following 70 packages became reproducible due to changes in their build dependencies: activemq-activeio, async-http-client, classworlds, clirr, compress-lzf, dbus-c++, felix-bundlerepository, felix-framework, felix-gogo-command, felix-gogo-runtime, felix-gogo-shell, felix-main, felix-shell-tui, felix-shell, findbugs-bcel, gco, gdebi, gecode, geronimo-ejb-3.2-spec, git-repair, gmetric4j, gs-collections, hawtbuf, hawtdispatch, jack-tools, jackson-dataformat-cbor, jackson-dataformat-yaml, jackson-module-jaxb-annotations, jmxetric, json-simple, kryo-serializers, lhapdf, libccrtp, libclaw, libcommoncpp2, libftdi1, libjboss-marshalling-java, libmimic, libphysfs, libxstream-java, limereg, maven-debian-helper, maven-filtering, maven-invoker, mochiweb, mongo-java-driver, mqtt-client, netty-3.9, openhft-chronicle-queue, openhft-compiler, openhft-lang, pavucontrol, plexus-ant-factory, plexus-archiver, plexus-bsh-factory, plexus-cdc, plexus-classworlds2, plexus-component-metadata, plexus-container-default, plexus-io, pytone, scolasync, sisu-ioc, snappy-java, spatial4j-0.4, tika, treeline, wss4j, xtalk, zshdb. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: Chris Lamb also noticed that binaries shipped with libsilo-bin did not work. Documentation update Chris Lamb and Ximin Luo assembled a proper specification for SOURCE_DATE_EPOCH in the hope to convince more upstreams to adopt it. Thanks to Holger it is published under a non-Debian domain name. Lunar documented easiest way to solve issues with file ordering and timestamps in tarballs that came with tar/1.28-1. Some examples on how to use SOURCE_DATE_EPOCH have been improved to support systems without GNU date. reproducible.debian.net armhf is finally being tested, which also means the remote building of Debian packages finally works! This paves the way to perform the tests on even more architectures and doing variations on CPU and date. Some packages even produce the same binary Arch:all packages on different architectures (1, 2). (h01ger) Tests for FreeBSD are finally running. (h01ger) As it seems the gcc5 transition has cooled off, we schedule sid more often than testing again on amd64. (h01ger) disorderfs has been built and installed on all build nodes (amd64 and armhf). One issue related to permissions for root and unpriviliged users needs to be solved before disorderfs can be used on reproducible.debian.net. (h01ger) strip-nondeterminism Version 0.011-1 has been released on August 29th. The new version updates dh_strip_nondeterminism to match recent changes in debhelper. (Andrew Ayer) disorderfs disorderfs, the new FUSE filesystem to ease testing of filesystem-related variations, is now almost ready to be used. Version 0.2.0 adds support for extended attributes. Since then Andrew Ayer also added support to reverse directory entries instead of shuffling them, and arbitrary padding to the number of blocks used by files. Package reviews 142 reviews have been removed, 48 added and 259 updated this week. Santiago Vila renamed the not_using_dh_builddeb issue into varying_mtimes_in_data_tar_gz_or_control_tar_gz to align better with other tag names. New issue identified this week: random_order_in_python_doit_completion. 37 FTBFS issues have been reported by Chris West (Faux) and Chris Lamb. Misc. h01ger gave a talk at FrOSCon on August 23rd. Recordings are already online. These reports are being reviewed and enhanced every week by many people hanging out on #debian-reproducible. Huge thanks!

25 August 2015

Lunar: Reproducible builds: week 17 in Stretch cycle

A good amount of the Debian reproducible builds team had the chance to enjoy face-to-face interactions during DebConf15.
Names in red and blue were all present at DebConf15
Picture of the  reproducible builds  talk during DebConf15
Hugging people with whom one has been working tirelessly for months gives a lot of warm-fuzzy feelings. Several recorded and hallway discussions paved the way to solve the remaining issues to get reproducible builds part of Debian proper. Both talks from the Debian Project Leader and the release team mentioned the effort as important for the future of Debian. A forty-five minutes talk presented the state of the reproducible builds effort. It was then followed by an hour long roundtable to discuss current blockers regarding dpkg, .buildinfo and their integration in the archive. Picture of the  reproducible builds  roundtable during DebConf15 Toolchain fixes Reiner Herrmann submitted a patch to make rdfind sort the processed files before doing any operation. Chris Lamb proposed a new patch for wheel implementing support for SOURCE_DATE_EPOCH instead of the custom WHEEL_FORCE_TIMESTAMP. akira sent one making man2html SOURCE_DATE_EPOCH aware. St phane Glondu reported that dpkg-source would not respect tarball permissions when unpacking under a umask of 002. After hours of iterative testing during the DebConf workshop, Sandro Knau created a test case showing how pdflatex output can be non-deterministic with some PNG files. Packages fixed The following 65 packages became reproducible due to changes in their build dependencies: alacarte, arbtt, bullet, ccfits, commons-daemon, crack-attack, d-conf, ejabberd-contrib, erlang-bear, erlang-cherly, erlang-cowlib, erlang-folsom, erlang-goldrush, erlang-ibrowse, erlang-jiffy, erlang-lager, erlang-lhttpc, erlang-meck, erlang-p1-cache-tab, erlang-p1-iconv, erlang-p1-logger, erlang-p1-mysql, erlang-p1-pam, erlang-p1-pgsql, erlang-p1-sip, erlang-p1-stringprep, erlang-p1-stun, erlang-p1-tls, erlang-p1-utils, erlang-p1-xml, erlang-p1-yaml, erlang-p1-zlib, erlang-ranch, erlang-redis-client, erlang-uuid, freecontact, givaro, glade, gnome-shell, gupnp, gvfs, htseq, jags, jana, knot, libconfig, libkolab, libmatio, libvsqlitepp, mpmath, octave-zenity, openigtlink, paman, pisa, pynifti, qof, ruby-blankslate, ruby-xml-simple, timingframework, trace-cmd, tsung, wings3d, xdg-user-dirs, xz-utils, zpspell. The following packages became reproducible after getting fixed: Uploads that might have fixed reproducibility issues: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: St phane Glondu reported two issues regarding embedded build date in omake and cduce. Aur lien Jarno submitted a fix for the breakage of make-dfsg test suite. As binutils now creates deterministic libraries by default, Aur lien's patch makes use of a wrapper to give the U flag to ar. Reiner Herrmann reported an issue with pound which embeds random dhparams in its code during the build. Better solutions are yet to be found. reproducible.debian.net Package pages on reproducible.debian.net now have a new layout improving readability designed by Mattia Rizzolo, h01ger, and Ulrike. The navigation is now on the left as vertical space is more valuable nowadays. armhf is now enabled on all pages except the dashboard. Actual tests on armhf are expected to start shortly. (Mattia Rizzolo, h01ger) The limit on how many packages people can schedule using the reschedule script on Alioth has been bumped to 200. (h01ger) mod_rewrite is now used instead of JavaScript for the form in the dashboard. (h01ger) Following the rename of the software, debbindiff has mostly been replaced by either diffoscope or differences in generated HTML and IRC notification output. Connections to UDD have been made more robust. (Mattia Rizzolo) diffoscope development diffoscope version 31 was released on August 21st. This version improves fuzzy-matching by using the tlsh algorithm instead of ssdeep. New command line options are available: --max-diff-input-lines and --max-diff-block-lines to override limits on diff input and output (Reiner Herrmann), --debugger to dump the user into pdb in case of crashes (Mattia Rizzolo). jar archives should now be detected properly (Reiner Herrman). Several general code cleanups were also done by Chris Lamb. strip-nondeterminism development Andrew Ayer released strip-nondeterminism version 0.010-1. Java properties file in jar should now be detected more accurately. A missing dependency spotted by St phane Glondu has been added. Testing directory ordering issues: disorderfs During the reproducible builds workshop at DebConf, participants identified that we were still short of a good way to test variations on filesystem behaviors (e.g. file ordering or disk usage). Andrew Ayer took a couple of hours to create disorderfs. Based on FUSE, disorderfs in an overlay filesystem that will mount the content of a directory at another location. For this first version, it will make the order in which files appear in a directory random. Documentation update Dhole documented how to implement support for SOURCE_DATE_EPOCH in Python, bash, Makefiles, CMake, and C. Chris Lamb started to convert the wiki page describing SOURCE_DATE_EPOCH into a Freedesktop-like specification in the hope that it will convince more upstream to adopt it. Package reviews 44 reviews have been removed, 192 added and 77 updated this week. New issues identified this week: locale_dependent_order_in_devlibs_depends, randomness_in_ocaml_startup_files, randomness_in_ocaml_packed_libraries, randomness_in_ocaml_custom_executables, undeterministic_symlinking_by_rdfind, random_build_path_by_golang_compiler, and images_in_pdf_generated_by_latex. 117 new FTBFS bugs have been reported by Chris Lamb, Chris West (Faux), and Niko Tyni. Misc. Some reproducibility issues might face us very late. Chris Lamb noticed that the test suite for python-pykmip was now failing because its test certificates have expired. Let's hope no packages are hiding a certificate valid for 10 years somewhere in their source! Pictures courtesy and copyright of Debian's own paparazzi: Aigars Mahinovs.

26 July 2015

Lunar: Reproducible builds: week 12 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Eric Dorlan uploaded automake-1.15/1:1.15-2 which makes the output of mdate-sh deterministic. Original patch by Reiner Herrmann. Kenneth J. Pronovici uploaded epydoc/3.0.1+dfsg-8 which now honors SOURCE_DATE_EPOCH. Original patch by Reiner Herrmann. Chris Lamb submitted a patch to dh-python to make the order of the generated maintainer scripts deterministic. Chris also offered a fix for a source of non-determinism in dpkg-shlibdeps when packages have alternative dependencies. Dhole provided a patch to add support for SOURCE_DATE_EPOCH to gettext. Packages fixed The following 78 packages became reproducible in our setup due to changes in their build dependencies: chemical-mime-data, clojure-contrib, cobertura-maven-plugin, cpm, davical, debian-security-support, dfc, diction, dvdwizard, galternatives, gentlyweb-utils, gifticlib, gmtkbabel, gnuplot-mode, gplanarity, gpodder, gtg-trace, gyoto, highlight.js, htp, ibus-table, impressive, jags, jansi-native, jnr-constants, jthread, jwm, khronos-api, latex-coffee-stains, latex-make, latex2rtf, latexdiff, libcrcutil, libdc0, libdc1394-22, libidn2-0, libint, libjava-jdbc-clojure, libkryo-java, libphone-ui-shr, libpicocontainer-java, libraw1394, librostlab-blast, librostlab, libshevek, libstxxl, libtools-logging-clojure, libtools-macro-clojure, litl, londonlaw, ltsp, macsyfinder, mapnik, maven-compiler-plugin, mc, microdc2, miniupnpd, monajat, navit, pdmenu, pirl, plm, scikit-learn, snp-sites, sra-sdk, sunpinyin, tilda, vdr-plugin-dvd, vdr-plugin-epgsearch, vdr-plugin-remote, vdr-plugin-spider, vdr-plugin-streamdev, vdr-plugin-sudoku, vdr-plugin-xineliboutput, veromix, voxbo, xaos, xbae. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: reproducible.debian.net The statistics on the main page of reproducible.debian.net are now updated every five minutes. A random unreviewed package is suggested in the look at a package form on every build. (h01ger) A new package set based new on the Core Internet Infrastructure census has been added. (h01ger) Testing of FreeBSD has started, though no results yet. More details have been posted to the freebsd-hackers mailing list. The build is run on a new virtual machine running FreeBSD 10.1 with 3 cores and 6 GB of RAM, also sponsored by Profitbricks. strip-nondeterminism development Andrew Ayer released version 0.009 of strip-nondeterminism. The new version will strip locales from Javadoc, include the name of files causing errors, and ignore unhandled (but rare) zip64 archives. debbindiff development Lunar continued its major refactoring to enhance code reuse and pave the way to fuzzy-matching and parallel processing. Most file comparators have now been converted to the new class hierarchy. In order to support for archive formats, work has started on packaging Python bindings for libarchive. While getting support for more archive formats with a common interface is very nice, libarchive is a stream oriented library and might have bad performance with how debbindiff currently works. Time will tell if better solutions need to be found. Documentation update Lunar started a Reproducible builds HOWTO intended to explain the different aspects of making software build reproducibly to the different audiences that might have to get involved like software authors, producers of binary packages, and distributors. Package reviews 17 obsolete reviews have been removed, 212 added and 46 updated this week. 15 new bugs for packages failing to build from sources have been reported by Chris West (Faux), and Mattia Rizzolo. Presentations Lunar presented Debian efforts and some recipes on making software build reproducibly at Libre Software Meeting 2015. Slides and a video recording are available. Misc. h01ger, dkg, and Lunar attended a Core Infrastructure Initiative meeting. The progress and tools mode for the Debian efforts were shown. Several discussions also helped getting a better understanding of the needs of other free software projects regarding reproducible builds. The idea of a global append only log, similar to the logs used for Certificate Transparency, came up on multiple occasions. Using such append only logs for keeping records of sources and build results has gotten the name Binary Transparency Logs . They would at least help identifying a compromised software signing key. Whether the benefits in using such logs justify the costs need more research.

7 July 2015

Lunar: Reproducible builds: week 10 in Stretch cycle

What happened about the reproducible builds effort this week: Media coverage Daniel Stender published an English translation of the article which originally appeared in Linux Magazin in Admin Magazine. Toolchain fixes Fixes landed in the Debian archive: Lunar submitted to Debian the patch already sent upstream adding a --clamp-mtime option to tar. Patches have been submitted to add support for SOURCE_DATE_EPOCH to txt2man (Reiner Herrmann), epydoc (Reiner Herrmann), GCC (Dhole), and Doxygen (akira). Dhole uploaded a new experimental debhelper to the reproducible repository which exports SOURCE_DATE_EPOCH. As part of the experiment, the patch also sets TZ to UTC which should help with most timezone issues. It might still be problematic for some packages which would change their settings based on this. Mattia Rizzolo sent upstream a patch originally written by Lunar to make the generate-id() function be deterministic in libxslt. While that patch was quickly rejected by upstream, Andrew Ayer came up with a much better one which sadly could have some performance impact. Daniel Veillard replied with another patch that should be deterministic in most cases without needing extra data structures. It's impact is currently being investigated by retesting packages on reproducible.debian.net. akira added a new option to sbuild for configuring the path in which packages are built. This will be needed for the srebuild script. Niko Tyni asked Perl upstream about it using the __DATE__ and __TIME__ C processor macros. Packages fixed The following 143 packages became reproducible due to changes in their build dependencies: alot, argvalidate, astroquery, blender, bpython, brian, calibre, cfourcc, chaussette, checkbox-ng, cloc, configshell, daisy-player, dipy, dnsruby, dput-ng, dsc-statistics, eliom, emacspeak, freeipmi, geant321, gpick, grapefruit, heat-cfntools, imagetooth, jansson, jmapviewer, lava-tool, libhtml-lint-perl, libtime-y2038-perl, lift, lua-ldoc, luarocks, mailman-api, matroxset, maven-hpi-plugin, mknbi, mpi4py, mpmath, msnlib, munkres, musicbrainzngs, nova, pecomato, pgrouting, pngcheck, powerline, profitbricks-client, pyepr, pylibssh2, pylogsparser, pystemmer, pytest, python-amqp, python-apt, python-carrot, python-crypto, python-darts.lib.utils.lru, python-demgengeo, python-graph, python-mock, python-musicbrainz2, python-pathtools, python-pskc, python-psutil, python-pypump, python-repoze.sphinx.autointerface, python-repoze.tm2, python-repoze.what-plugins, python-repoze.what, python-repoze.who-plugins, python-xstatic-term.js, reclass, resource-agents, rgain, rttool, ruby-aggregate, ruby-archive-tar-minitar, ruby-bcat, ruby-blankslate, ruby-coffee-script, ruby-colored, ruby-dbd-mysql, ruby-dbd-odbc, ruby-dbd-pg, ruby-dbd-sqlite3, ruby-dbi, ruby-dirty-memoize, ruby-encryptor, ruby-erubis, ruby-fast-xs, ruby-fusefs, ruby-gd, ruby-git, ruby-globalhotkeys, ruby-god, ruby-hike, ruby-hmac, ruby-integration, ruby-ipaddress, ruby-jnunemaker-matchy, ruby-memoize, ruby-merb-core, ruby-merb-haml, ruby-merb-helpers, ruby-metaid, ruby-mina, ruby-net-irc, ruby-net-netrc, ruby-odbc, ruby-packet, ruby-parseconfig, ruby-platform, ruby-plist, ruby-popen4, ruby-rchardet, ruby-romkan, ruby-rubyforge, ruby-rubytorrent, ruby-samuel, ruby-shoulda-matchers, ruby-sourcify, ruby-test-spec, ruby-validatable, ruby-wirble, ruby-xml-simple, ruby-zoom, ryu, simplejson, spamassassin-heatu, speaklater, stompserver, syncevolution, syncmaildir, thin, ticgit, tox, transmissionrpc, vdr-plugin-xine, waitress, whereami, xlsx2csv, zathura. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: reproducible.debian.net A new package set for the X Strike Force has been added. (h01ger) Bugs tagged with locale are now visible in the statistics. (h01ger) Some work has been done add tests for NetBSD. (h01ger) Many changes by Mattia Rizzolo have been merged on the whole infrastructure: debbindiff development Version 26 has been released on June 28th fixing the comparison of files of unknown format. (Lunar) A missing dependency identified in python-rpm affecting debbindiff installation without recommended packages was promptly fixed by Michal iha . Lunar also started a massive code rearchitecture to enhance code reuse and enable new features. Nothing visible yet, though. Documentation update josch and Mattia Rizzolo documented how to reschedule packages from Alioth. Package reviews 142 obsolete reviews have been removed, 344 added and 107 updated this week. Chris West (Faux) filled 13 new bugs for packages failing to build from sources. The following new issues have been added: snapshot_placeholder_replaced_with_timestamp_in_pom_properties, different_encoding, timestamps_in_documentation_generated_by_org_mode and timestamps_in_pdf_generated_by_matplotlib.

20 June 2015

Lunar: Reproducible builds: week 5 in Stretch cycle

What happened about the reproducible builds effort for this week: Toolchain fixes Uploads that should help other packages: Patch submitted for toolchain issues: Some discussions have been started in Debian and with upstream: Packages fixed The following 8 packages became reproducible due to changes in their build dependencies: access-modifier-checker, apache-log4j2, jenkins-xstream, libsdl-perl, maven-shared-incremental, ruby-pygments.rb, ruby-wikicloth, uimaj. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which did not make their way to the archive yet: Discussions that have been started: reproducible.debian.net Holger Levsen added two new package sets: pkg-javascript-devel and pkg-php-pear. The list of packages with and without notes are now sorted by age of the latest build. Mattia Rizzolo added support for email notifications so that maintainers can be warned when a package becomes unreproducible. Please ask Mattia or Holger or in the #debian-reproducible IRC channel if you want to be notified for your packages! strip-nondeterminism development Andrew Ayer fixed the gzip handler so that it skip adding a predetermined timestamp when there was none. Documentation update Lunar added documentation about mtimes of file extracted using unzip being timezone dependent. He also wrote a short example on how to test reproducibility. Stephen Kitt updated the documentation about timestamps in PE binaries. Documentation and scripts to perform weekly reports were published by Lunar. Package reviews 50 obsolete reviews have been removed, 51 added and 29 updated this week. Thanks Chris West and Mathieu Bridon amongst others. New identified issues: Misc. Lunar will be talking (in French) about reproducible builds at Pas Sage en Seine on June 19th, at 15:00 in Paris. Meeting will happen this Wednesday, 19:00 UTC.

8 June 2015

Lunar: Reproducible builds: week 6 in Stretch cycle

What happened about the reproducible builds effort for this week: Presentations On May 26th,Holger Levsen presented reproducible builds in Debian at CCC Berlin for the Datengarten 52. The presentation was in German and the slides in English. Audio and video recordings are available. Toolchain fixes Niels Thykier fixed the experimental support for the automatic creation of debug packages in debhelper that being tested as part of the reproducible toolchain. Lunar added to the reproducible build version of dpkg the normalization of permissions for files in control.tar. The patch has also been submitted based on the main branch. Daniel Kahn Gillmor proposed a patch to add support for externally-supplying build date to help2man. This sparkled a discussion about agreeing on a common name for an environment variable to hold the date that should be used. It seems opinions are converging on using SOURCE_DATE_UTC which would hold a ISO-8601 formatted date in UTC) (e.g. 2015-06-05T01:08:20Z). Kudos to Daniel, Brendan O'Dea, Ximin Luo for pushing this forward. Lunar proposed a patch to Tar upstream adding a --clamp-mtime option as a generic solution for timestamp variations in tarballs which might also be useful for dpkg. The option changes the behavior of --mtime to only use the time specified if the file mtime is newer than the given time. So far, upstream is not convinced that it would make a worthwhile addition to Tar, though. Daniel Kahn Gillmor reached out to the libburnia project to ask for help on how to make ISO created with xorriso reproducible. We should reward Thomas Schmitt with a model upstream trophy as he went through a thorough analysis of possible sources of variations and ways to improve the situation. Most of what is missing with the current version in Debian is available in the latest upstream version, but libisoburn in Debian needs help. Daniel backported the missing option for version 1.3.2-1.1. akira submitted a new issue to Doxygen upstream regarding the timestamps added to the generated manpages. Packages fixed The following 49 packages became reproducible due to changes in their build dependencies: activemq-protobuf, bnfc, bridge-method-injector, commons-exec, console-data, djinn, github-backup, haskell-authenticate-oauth, haskell-authenticate, haskell-blaze-builder, haskell-blaze-textual, haskell-bloomfilter, haskell-brainfuck, haskell-hspec-discover, haskell-pretty-show, haskell-unlambda, haskell-x509-util, haskelldb-hdbc-odbc, haskelldb-hdbc-postgresql, haskelldb-hdbc-sqlite3, hasktags, hedgewars, hscolour, https-everywhere, java-comment-preprocessor, jffi, jgit, jnr-ffi, jnr-netdb, jsoup, lhs2tex, libcolor-calc-perl, libfile-changenotify-perl, libpdl-io-hdf5-perl, libsvn-notify-mirror-perl, localizer, maven-enforcer, pyotherside, python-xlrd, python-xstatic-angular-bootstrap, rt-extension-calendar, ruby-builder, ruby-em-hiredis, ruby-redcloth, shellcheck, sisu-plexus, tomcat-maven-plugin, v4l2loopback, vim-latexsuite. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which did not make their way to the archive yet: Daniel Kahn Gilmor also started discussions for emacs24 and the unsorted lists in generated .el files, the recording of a PID number in lush, and the reproducibility of ISO images in grub2. reproducible.debian.net Notifications are now sent when the build environment for a package has changed between two builds. This is a first step before automatically building the package once more. (Holger Levsen) jenkins.debian.net was upgraded to Debian Jessie. (Holger Levsen) A new variation is now being tested: $PATH. The second build will be done with a /i/capture/the/path added. (Holger Levsen) Holger Levsen with the help of Alexander Couzens wrote extra job to test the reproducibility of coreboot. Thanks James McCoy for helping with certificate issues. Mattia Rizollo made some more internal improvements. strip-nondeterminism development Andrew Ayer released strip-nondeterminism/0.008-1. This new version fixes the gzip handler so that it now skip adding a predetermined timestamp when there was none. Holger Levsen sponsored the upload. Documentation update The pages about timestamps in manpages generated by Doxygen, GHC .hi files, and Jar files have been updated to reflect their status in upstream. Markus Koschany documented an easy way to prevent Doxygen to write timestamps in HTML output. Package reviews 83 obsolete reviews have been removed, 71 added and 48 updated this week. Meetings A meeting was held on 2015-06-03. Minutes and full logs are available. It was agreed to hold such a meeting every two weeks for the time being. The time of the next meeting should be announced soon.