Search Results: "Daniel Kahn Gillmor"

31 December 2020

Daniel Kahn Gillmor: New OpenPGP certificate for dkg, 2021

dkg's 2021 OpenPGP transition As 2021 begins, I'm changing to a new OpenPGP certificate. I did a similar transition two years ago, and a fair amount has changed since then. You might know my old OpenPGP certificate as:
pub   ed25519 2019-01-19 [C] [expires: 2021-01-18]
      C4BC2DDB38CCE96485EBE9C2F20691179038E5C6
uid          Daniel Kahn Gillmor <dkg@fifthhorseman.net>
uid          Daniel Kahn Gillmor <dkg@debian.org>
My new OpenPGP certificate is:
pub   ed25519 2020-12-27 [C] [expires: 2023-12-24]
      C29F8A0C01F35E34D816AA5CE092EB3A5CA10DBA
uid           [ unknown] Daniel Kahn Gillmor
uid           [ unknown] <dkg@debian.org>
uid           [ unknown] <dkg@fifthhorseman.net>
You can find a signed transition statement if you're into that sort of thing. If you're interested in the rationale for why I'm making this transition, read on.

Dangers of Offline Primary Secret Keys There are several reasons for transitioning, but one I simply couldn't argue with was my own technical failure. I put the primary secret key into offline storage some time ago for "safety", and used ext4's filesystem-level encryption layered on top of dm-crypt for additional security. But either the tools changed out from under me, or there were failures on the storage medium, or I've failed to remember my passphrase correctly, because I am unable to regain access to the cleartext of the secret key. In particular, I find myself unable to use e4crypt add_key with the passphrase I know to get a usable working directory. I confess I still find e4crypt pretty difficult to use and I don't use it often, so the problem may entirely be user error (either now, or two years ago when I did the initial setup). Anyway, lesson learned: don't use cryptosystems that you're not comfortable with to encrypt data that you care about recovering. This is a lesson I'm pretty sure I've learned before, sigh, but it's a good reminder.
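For readers who haven't used ext4's built-in encryption, the step that failed looks roughly like this (a hedged sketch; the directory name is hypothetical and exact prompts vary across e2fsprogs versions):
    $ # derive a key from the passphrase, add it to the kernel keyring,
    $ # and bind it to the directory holding the offline secret key material
    $ e4crypt add_key ~/offline-secrets    # prompts for the passphrase; prints a key descriptor on success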

Split User IDs I'm trying to split out my User IDs again -- this way if you know me by e-mail address, you don't have to think/worry about certifying my name, and if you know me by name, you don't have to think/worry about certifying my e-mail address. I think that's simpler and more sensible. It's also nice because e-mail address-only User IDs can be used effectively in contexts like Autocrypt, which I think are increasingly important if we want to have usable encrypted e-mail. Last time around I initially tried split User IDs but rolled them back; I think most of the bugs I discovered then have since been fixed.

Certificate Flooding Another reason for making a transition to a new certificate is that my older certificate is one of the ones that was "flooded" on the SKS keyserver network last year, which was one of the final straws for that teetering project. Transitioning to a new certificate lets that old flooded cert expire and people can just simply move on from it, ideally deleting it from their local keyrings. Hopefully as a community we can move on from SKS to key distribution mechanisms like WKD, Autocrypt, DANE, and keys.openpgp.org, all of which address some of the known problems with keyserver abuse.
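For example, if you want to fetch the new certificate from one of those services with stock GnuPG, something like the following should work (a hedged sketch; use whatever tooling you prefer):
    $ # fetch by fingerprint from keys.openpgp.org
    $ gpg --keyserver hkps://keys.openpgp.org --recv-keys C29F8A0C01F35E34D816AA5CE092EB3A5CA10DBA
    $ # or discover it by e-mail address (WKD is among the lookup methods gpg tries)
    $ gpg --locate-keys dkg@fifthhorseman.net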

Trying New Tools Finally, I'm also interested in thinking about how key and certificate management might be handled in different ways. While I'm reasonably competent in handling GnuPG, the larger OpenPGP community (which I'm a part of) has done a lot of thinking and a lot of work about how people can use OpenPGP. I'm particularly happy with the collaborative work that has gone into the Stateless OpenPGP CLI (aka sop), which helps to generate a powerful interoperability test suite. While sop doesn't offer the level of certificate management I'd need to use it to manage this new certificate in full, I wish something like it would! Starting from a fresh certificate and actually using it helps me to think through what I might actually need from a tool that is roughly as straightforward and opinionated as sop is. If you're a software developer who might use or implement OpenPGP, or a protocol designer, and you haven't played around with any of the various implementations of sop yet, I recommend taking a look. And feedback on the specification is always welcome, too, including ideas for new functionality (maybe even like certificate management).
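If you haven't seen it, the interface is small enough to show in a few lines; here is a hedged sketch of the core verbs from the Stateless OpenPGP CLI specification (any conforming implementation should accept them):
    $ sop generate-key 'Alice <alice@example.org>' > alice.sec   # secret key to stdout
    $ sop extract-cert < alice.sec > alice.cert                  # corresponding public certificate
    $ sop encrypt alice.cert < message.txt > message.asc         # encrypt to a certificate
    $ sop decrypt alice.sec < message.asc                        # decrypt with the secret key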

Next Steps If you're the kind of person who's into making OpenPGP certifications, feel free to check in with me via whatever channels you're used to using to verify that this transition is legit. If you think it is, and you're comfortable, please send me (e-mail is probably best) your certifications over the new certificate. I'll keep on working to make OpenPGP more usable and acceptable. Hopefully, 2021 will be a better year ahead for all of us.
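For GnuPG users, the mechanics of such a certification are just a couple of commands (a hedged sketch; adapt to your own workflow, and the output filename is hypothetical):
    $ # certify the new key after verifying the fingerprint out of band
    $ gpg --sign-key C29F8A0C01F35E34D816AA5CE092EB3A5CA10DBA
    $ # export the freshly-certified certificate so it can be mailed back
    $ gpg --export --armor C29F8A0C01F35E34D816AA5CE092EB3A5CA10DBA > dkg-certified.asc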

17 April 2020

Daniel Kahn Gillmor: Tech-assisted Contact-Tracing against the COVID-19 pandemic

Today at the ACLU, we released a whitepaper discussing how to evaluate some novel cryptographic schemes that are being considered to provide technology-assisted contact-tracing in the face of the COVID-19 pandemic. The document offers guidelines for thinking about potential schemes like this, and what kinds of safeguards we need to expect and demand from these systems so that we might try to address the (hopefully temporary) crisis of the pandemic without also creating a permanent crisis for civil liberties. The proposals that we're seeing (including PACT, DP^3T, TCN, and the Apple/Google proposal) work in pretty similar ways, and the challenges and tradeoffs there are remarkably similar to Internet protocol design decisions. Only now in addition to bytes and packets and questions of efficiency, privacy, and control, we're also dealing directly with risk of physical harm (who gets sick?), society-wide allocation of scarce and critical resources (who gets tested? who gets treatment?), and potentially serious means of exercising powerful social control (who gets forced into quarantine?). My ACLU colleague Jon Callas and I will be doing a Reddit AMA in r/Coronavirus tomorrow starting at 2020-04-17T19:00:00Z (that's 3pm Friday in TZ=America/New_York) about this very topic, if that's the kind of thing you're into.

20 November 2017

Reproducible builds folks: Reproducible Builds: Weekly report #133

Here's what happened in the Reproducible Builds effort between Sunday November 5 and Saturday November 11 2017: Upcoming events On November 17th Chris Lamb will present at Open Compliance Summit, Yokohama, Japan on how reproducible builds ensures the long-term sustainability of technology infrastructure. We plan to hold an assembly at 34C3 - hope to see you there! LEDE CI tests Thanks to the work of lynxis, Mattia and h01ger, we're now testing all LEDE packages in our setup. This is our first result for the ar71xx target: "502 (100.0%) out of 502 built images and 4932 (94.8%) out of 5200 built packages were reproducible in our test setup." - see below for details how this was achieved. Bootstrapping and Diverse Double Compilation As a follow-up of a discussion on bootstrapping compilers we had on the Berlin summit, Bernhard and Ximin worked on a Proof of Concept for Diverse Double Compilation of tinycc (aka tcc). Ximin Luo did a successful diverse-double compilation of tinycc git HEAD using gcc-7.2.0, clang-4.0.1, icc-18.0.0 and pgcc-17.10-0 (pgcc needs to triple-compile it). More variations are planned for the future, with the eventual aim to reproduce the same binaries cross-distro, and extend it to test GCC itself. Packages reviewed and fixed, and bugs filed Patches filed upstream: Patches filed in Debian: Patches filed in OpenSUSE: Reviews of unreproducible packages 73 package reviews have been added, 88 have been updated and 40 have been removed in this week, adding to our knowledge about identified issues. 4 issue types have been updated: Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development Mattia Rizzolo uploaded version 88~bpo9+1 to stretch-backports. reprotest development reproducible-website development theunreproduciblepackage development tests.reproducible-builds.org in detail Misc. This week's edition was written by Ximin Luo, Bernhard M. Wiedemann, Chris Lamb and Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.
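If you have never looked at an unreproducible package yourself, the comparison step in this kind of testing boils down to building twice under varying environments and diffing the artifacts, for example (a hedged sketch with hypothetical file names):
    $ # compare two builds of the same package; differences are described in detail
    $ diffoscope --text differences.txt package_1.0-1_amd64.deb package_1.0-1_amd64.rebuild.deb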

6 June 2017

Reproducible builds folks: Reproducible Builds: week 110 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday May 28 and Saturday June 3 2017: Past and upcoming events Documentation updates Toolchain development and fixes Patches and bugs filed 4 package reviews have been added, 6 have been updated and 25 have been removed in this week, adding to our knowledge about identified issues. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development tests.reproducible-builds.org Mattia Rizzolo: Daniel Kahn Gillmor: Vagrant Cascadian: Holger Levsen: Misc. This week's edition was written by Chris Lamb, Bernhard M. Wiedemann and Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

22 February 2017

Antoine Beaupré: The case against password hashers

In previous articles, we have looked at how to generate passwords and did a review of various password managers. There is, however, a third way of managing passwords other than remembering them or encrypting them in a "vault", which is what I call "password hashing". A password hasher generates site-specific passwords from a single master password using a cryptographic hash function. It thus allows a user to have a unique and secure password for every site they use while requiring no storage; they need only to remember a single password. You may know these as "deterministic or stateless password managers" but I find the "password manager" phrase to be confusing because a hasher doesn't actually store any passwords. I do not think password hashers represent a good security tradeoff so I generally do not recommend their use, unless you really do not have access to reliable storage that you can access readily. In this article, I use the word "password" for a random string used to unlock things, but "token" to represent a generated random string that the user doesn't need to remember. The input to a password hasher is a password with some site-specific context and the output from a password hasher is a token.
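To make the idea concrete, here is a deliberately naive, single-round sketch of the scheme in shell, for illustration only (as discussed below, real hashers should use a proper key derivation function rather than a bare hash):
    $ master="correct horse battery staple"   # the single password the user remembers
    $ site="example.com"                      # the site-specific label
    $ printf '%s' "$master$site" | openssl dgst -sha256 -binary | openssl base64 | cut -c1-16
    $ # prints a 16-character site-specific token; change $site and the token changes completely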

What is a password hasher? A password hasher uses the master password and a label (generally the host name) to generate the site-specific password. To change the generated password, the user can modify the label, for example by appending a number. Some password hashers also have different settings to generate tokens of different lengths or compositions (symbols or not, etc.) to accommodate different site-specific password policies. The whole concept of password hashers relies on one-way cryptographic hash functions or key derivation functions that take an arbitrary input string (say a password) and generate a unique token, from which it is impossible to guess the original input string. Password hashers are generally written as JavaScript bookmarklets or browser plugins and have been around for over a decade. The biggest advantage of password hashers is that you only need to remember a single password. You do not need to carry around a password manager vault: there's no "state" (other than site-specific settings, which can be easily guessed). A password hasher named Master Password makes a compelling case against traditional password managers in its documentation:
It's as though the implicit assumptions are that everybody backs all of their stuff up to at least two different devices and backups in the cloud in at least two separate countries. Well, people don't always have perfect backups. In fact, they usually don't have any.
It goes on to argue that, when you lose your password: "You lose everything. You lose your own identity." The stateless nature of password hashers also means you do not need to use cloud services to synchronize your passwords, as there is (generally, more on that later) no state to carry around. This means, for example, that the list of accounts that you have access to is only stored in your head, and not in some online database that could be hacked without your knowledge. The downside of this is, of course, that attackers do not actually need to have access to your password hasher to start cracking it: they can try to guess your master key without ever stealing anything from you other than a single token you used to log into some random web site. Password hashers also necessarily generate unique passwords for every site you use them on. While you can also do this with password managers, it is not an enforced decision. With hashers, you get distinct and strong passwords for every site with no effort.

The problem with password hashers If hashers are so great, why would you use a password manager? Programs like LessPass and Master Password seem to have strong crypto that is well implemented, so why isn't everyone using those tools? Password hashing, as a general concept, actually has serious problems: since the hashing outputs are constantly compromised (they are sent in password forms to various possibly hostile sites), it's theoretically possible to derive the master password and then break all the generated tokens in one shot. The use of stronger key derivation functions (like PBKDF2, scrypt, or HMAC) or seeds (like a profile-specific secret) makes those attacks much harder, especially if the seed is long enough to make brute-force attacks infeasible. (Unfortunately, in the case of Password Hasher Plus, the seed is derived from Math.random() calls, which are not considered cryptographically secure.) Basically, as stated by Julian Morrison in this discussion:
A password is now ciphertext, not a block of line noise. Every time you transmit it, you are giving away potential clues of use to an attacker. [...] You only have one password for all the sites, really, underneath, and it's your secret key. If it's broken, it's now a skeleton-key [...]
Newer implementations like LessPass and Master Password fix this by using reasonable key derivation algorithms (PBKDF2 and scrypt, respectively) that are more resistant to offline cracking attacks, but who knows how long those will hold? To give a concrete example, if you would like to use the new winner of the password hashing competition (Argon2) in your password manager, you can patch the program (or wait for an update) and re-encrypt your database. With a password hasher, it's not so easy: changing the algorithm means logging in to every site you visited and changing the password. As someone who used a password hasher for a few years, I can tell you this is really impractical: you quickly end up with hundreds of passwords. The LessPass developers tried to facilitate this, but they ended up mostly giving up. Which brings us to the question of state. A lot of those tools claim to work "without a server" or as being "stateless" and while those claims are partly true, hashers are way more usable (and more secure, with profile secrets) when they do keep some sort of state. For example, Password Hasher Plus records, in your browser profile, which site you visited and which settings were used on each site, which makes it easier to comply with weird password policies. But then that state needs to be backed up and synchronized across multiple devices, which led LessPass to offer a service (which you can also self-host) to keep those settings online. At this point, a key benefit of the password hasher approach (not keeping state) just disappears and you might as well use a password manager. Another issue with password hashers is choosing the right one from the start, because changing software generally means changing the algorithm, and therefore changing passwords everywhere. If there were a well-established program that was recognized as a solid cryptographic solution by the community, I would feel more confident. But what I have seen is that there are a lot of different implementations each with its own warts and flaws; because changing is so painful, I can't actually use any of those alternatives. All of the password hashers I have reviewed have severe security versus usability tradeoffs. For example, LessPass has what seems to be a sound cryptographic implementation, but using it requires you to click on the icon, fill in the fields, click generate, and then copy the password into the field, which means at least four or five actions per password. The venerable Password Hasher is much easier to use, but it makes you type the master password directly in the site's password form, so hostile sites can simply use JavaScript to sniff the master password while it is typed. While there are workarounds implemented in Password Hasher Plus (the profile-specific secret), both tools are more or less abandoned now. The Password Hasher homepage, linked from the extension page, is now a 404. Password Hasher Plus hasn't seen a release in over a year and there is no space for collaborating on the software: the homepage is simply the author's Google+ page with no information on the project. I couldn't actually find the source online and had to download the Chrome extension by hand to review the source code. Software abandonment is a serious issue for every project out there, but I would argue that it is especially severe for password hashers. Furthermore, I have had difficulty using password hashers in unified login environments like Wikipedia's or StackExchange's single-sign-on systems.
Because they allow you to log in with the same password on multiple sites, you need to choose (and remember) what label you used when signing in. Did I sign in on stackoverflow.com? Or was it stackexchange.com? Also, as mentioned in the previous article about password managers, web-based password managers have serious security flaws. Since more than a few password hashers are implemented using bookmarklets, they bring all of those serious vulnerabilities with them, which can range from account name to master password disclosures. Finally, some of the password hashers use dubious crypto primitives that were valid and interesting a decade ago, but are really showing their age now. Stanford's pwdhash uses MD5, which is considered "cryptographically broken and unsuitable for further use". We have seen partial key recovery attacks against MD5 already and while those do not allow an attacker to recover the full master password yet (especially not with HMAC-MD5), I would not recommend anyone use MD5 in anything at this point, especially if changing that algorithm later is hard. Some hashers (like Password Hasher and Password Plus) use a single round of SHA-1 to derive a token from a password; WPA2 (standardized in 2004) uses 4096 iterations of HMAC-SHA1. A recent US National Institute of Standards and Technology (NIST) report also recommends "at least 10,000 iterations of the hash function".
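To make the iteration-count point concrete, compare a bare hash with an explicitly iterated derivation; this is a hedged sketch using OpenSSL (1.1.1 or later for the -pbkdf2 option), purely to illustrate the difference being discussed:
    $ # one round of SHA-1, roughly what the older hashers do
    $ printf '%s' 'master-password:example.com' | openssl dgst -sha1
    $ # PBKDF2 with 100,000 iterations of HMAC-SHA-256; -P prints the derived key material and exits
    $ openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -md sha256 -pass pass:'master-password' -P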

Conclusion Forced to suggest a password hasher, I would probably point to LessPass or Master Password, depending on the platform of the person asking. But, for now, I have determined that the security drawbacks of password hashers are not acceptable and I do not recommend them. It makes my password management recommendation shorter anyway: "remember a few carefully generated passwords and shove everything else in a password manager". [Many thanks to Daniel Kahn Gillmor for the thorough reviews provided for the password articles.]
Note: this article first appeared in the Linux Weekly News. Also, details of my research into password hashers are available in the password hashers history article.

15 February 2017

Antoine Beaupré: A look at password managers

As we noted in an earlier article, passwords are a liability and we'd prefer to get rid of them, but the current reality is that we do use a plethora of passwords in our daily lives. This problem is especially acute for technology professionals, particularly system administrators, who have to manage a lot of different machines. But it also affects regular users who still use a large number of passwords, from their online bank to their favorite social-networking site. Despite the remarkable memory capacity of the human brain, humans are actually terrible at recalling even short sets of arbitrary characters with the precision needed for passwords. Therefore humans reuse passwords, make them trivial or guessable, write them down on little paper notes and stick them on their screens, or just reset them by email every time. Our memory is undeniably failing us and we need help, which is where password managers come in. Password managers allow users to store an arbitrary number of passwords and just remember a single password to unlock them all. But there is a large variety of password managers out there, so which one should we be using? At my previous job, an inventory was done of about 40 different free-software password managers in different stages of development and of varying quality. So, obviously, this article will not be exhaustive, but instead focus on a smaller set of some well-known options that may be interesting to readers.

KeePass: the popular alternative The most commonly used password-manager design pattern is to store passwords in a file that is encrypted and password-protected. The most popular free-software password manager of this kind is probably KeePass. An important feature of KeePass is the ability to auto-type passwords in forms, most notably in web browsers. This feature makes KeePass really easy to use, especially considering it also supports global key bindings to access passwords. KeePass databases are designed for simultaneous access by multiple users, for example, using a shared network drive. KeePass has a graphical interface written in C#, so it uses the Mono framework on Linux. A separate project, called KeePassX, is a clean-room implementation written in C++ using the Qt framework. Both support the AES and Twofish encryption algorithms, although KeePass recently added support for the ChaCha20 cipher. AES key derivation is used to generate the actual encryption key for the database, but the latest release of KeePass also added support for Argon2, which was the winner of the July 2015 password-hashing competition. Both programs are more or less equivalent, although the original KeePass seems to have more features in general. The KeePassX project has recently been forked into another project now called KeePassXC that implements a set of new features that are present in KeePass but missing from KeePassX, like:
  • auto-type on Linux, Mac OS, and Windows
  • database merging which allows multi-user support
  • using the web site's favicon in the interface
So far, the maintainers of KeePassXC seem to be open to re-merging the project "if the original maintainer of KeePassX in the future will be more active and will accept our merge and changes". I can confirm that, at the time of writing, the original KeePassX project now has 79 pending pull requests and only one pull request was merged since the last release, which was 2.0.3 in September 2016. While KeePass and derivatives allow multiple users to access the same database through the merging process, they do not support multi-party access to a single database. This may be a limiting factor for larger organizations, where you may need, for example, a different password set for different technical support team levels. The solution in this case is to use separate databases for each team, with each team using a different shared secret.

Pass: the standard password manager? I am currently using password-store, or pass, as a password manager. It aims to be "the standard Unix password manager". Pass is a GnuPG-based password manager that offers a surprising number of features given its small size:
  • copy-paste support
  • Git integration
  • multi-user/group support
  • pluggable extensions (in the upcoming 1.7 release)
The command-line interface is simple to use and intuitive. The following will, for example, create a pass repository, generate a 20-character password for your LWN account, and copy it to the clipboard:
    $ pass init
    $ pass generate -c lwn 20
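Day-to-day retrieval is just as terse; a hedged sketch of the companion commands documented in the pass manual:
    $ pass lwn        # print the stored password (and any extra lines) to the terminal
    $ pass -c lwn     # copy the first line to the clipboard instead
    $ pass edit lwn   # decrypt into $EDITOR and re-encrypt on save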
The main issue with pass is that it doesn't encrypt the name of those entries: if someone were to compromise my machine, they could easily see which sites I have access to simply by listing the passwords stored in ~/.password-store. This is a deliberate design decision by the upstream project, as stated by a mailing list participant, Allan Odgaard:
Using a single file per item has the advantage of shell completion, using version control, browse, move and rename the items in a file browser, edit them in a regular editor (that does GPG, or manually run GPG first), etc.
Odgaard goes on to point out that there are alternatives that do encrypt the entire database (including the site names) if users really need that feature. Furthermore, there is a tomb plugin for pass that encrypts the password store in a LUKS container (called a "tomb"), although it requires explicitly opening and closing the container, which makes it only marginally better than using full disk encryption system-wide. One could also argue that password file names do not hold secret information, only the site name and username, perhaps, and that doesn't require secrecy. I do believe those should be kept secret, however, as they could be used to discover (or prove) which sites you have access to and then used to perform other attacks. One could draw a parallel with the SSH known_hosts file, which used to be plain text but is now hashed so that hosts are more difficult to discover. Also, sharing a database for multi-user support will require some sort of file-sharing mechanism. Given the integrated Git support, this will likely involve setting up a private Git repository for your team, something which may not be accessible to the average Linux user. Nothing keeps you, however, from sharing the ~/.password-store directory through another file sharing mechanism like (say) Syncthing or Dropbox. You can use multiple distinct databases easily using the PASSWORD_STORE_DIR environment variable. For example, you could have a shell alias to use a different repository for your work passwords with:
    alias work-pass="PASSWORD_STORE_DIR=~/work-passwords pass"
Group support comes from a clever use of the GnuPG multiple-recipient encryption support. You simply have to specify multiple OpenPGP identities when initializing the repository, which also works in subdirectories:
    $ pass init -p Ateam me@example.com joelle@example.com
    mkdir: created directory '/home/me/.password-store/Ateam'
    Password store initialized for me@example.com, joelle@example.com
    [master 0e3dbe7] Set GPG id to me@example.com, joelle@example.com.
     1 file changed, 2 insertions(+)
     create mode 100644 Ateam/.gpg-id
The above will configure pass to encrypt the passwords in the Ateam directory for me@example.com and joelle@example.com. Pass depends on GnuPG to do the right thing when encrypting files and how those identities are treated is entirely delegated to GnuPG's default configuration. This could lead to problems if arbitrary keys can be injected into your key ring, which could confuse GnuPG. I would therefore recommend using full key fingerprints instead of user identifiers. Regarding the actual encryption algorithms used, in my tests, GnuPG 1.4.18 and 2.1.18 seemed to default to 256-bit AES for encryption, but that has not always been the case. The chosen encryption algorithm actually depends on the recipient's key preferences, which may vary wildly: older keys and versions may use anything from 128-bit AES to CAST5 or Triple DES. To figure out which algorithm GnuPG chose, you may want to try this pipeline:
    $ echo test | gpg -e -r you@example.com | gpg -d -v
    [...]
    gpg: encrypted with 2048-bit RSA key, ID XXXXXXX, created XXXXX
      "You Person You <you@example.com>"
    gpg: AES256 encrypted data
    gpg: original file name=''
    test
As you can see, pass is primarily a command-line application, which may make it less accessible to regular users. The community has produced different graphical interfaces that either use pass directly or operate on the storage with their own GnuPG integration. I personally use pass in combination with Rofi to get quick access to my passwords, but less savvy users may want to try the QtPass interface, which should be more user-friendly. QtPass doesn't actually depend on pass and can use GnuPG directly to interact with the pass database; it is available for Linux, BSD, OS X, and Windows.

Browser password managers Most users are probably already using a password manager through their web browser's "remember password" functionality. For example, Chromium will ask if you want it to remember passwords and encrypt them with your operating system's facilities. For Windows, this encrypts the passwords with your login password and, for GNOME, it will store the passwords in the gnome-keyring storage. If you synchronize your Chromium settings with your Google account, Chromium will store those passwords on Google's servers, encrypted with a key that is stored in the Google Account itself. So your passwords are then only as safe as your Google account. Note that this was covered here in 2010, although back then Chromium didn't synchronize with the Google cloud or encrypt with the system-level key rings. That facility was only added in 2013. In Firefox, there's an optional, profile-specific master password that unlocks all passwords. In this case, the issue is that browsers are generally always open, so the vault is always unlocked. And this is for users that actually do pick a master password; users are often completely unaware that they should set one. The unlocking mechanism is a typical convenience-security trade-off: either users need to constantly input their master passwords to login or they don't, and the passwords are available in the clear. In this case, Chromium's approach of actually asking users to unlock their vault seems preferable, even though the developers actually refused to implement the feature for years. Overall, I would recommend against using a browser-based password manager. Even if it is not used for critical sites, you will end up with hundreds of such passwords that are vulnerable while the browser is running (in the case of Firefox) or at the whim of Google (in the case of Chromium). Furthermore, the "auto-fill" feature that is often coupled with browser-based password managers is often vulnerable to serious attacks, which is mentioned below. Finally, because browser-based managers generally lack a proper password generator, users may fail to use properly generated passwords, so they can then be easily broken. A password generator has been requested for Firefox, according to this feature request opened in 2007, and there is a password generator in Chrome, but it is disabled by default and hidden in the mysterious chrome://flags URL.

Other notable password managers Another alternative password manager, briefly mentioned in the previous article, is the minimalistic Assword password manager that, despite its questionable name, is also interesting. Its main advantage over pass is that it uses a single encrypted JSON file for storage, and therefore doesn't leak the name of the entries by default. In addition to copy/paste, Assword also supports automatically entering passphrases in fields using the xdo library. Like pass, it uses GnuPG to encrypt passphrases. According to Assword maintainer Daniel Kahn Gillmor in email, the main issue with Assword is "interaction between generated passwords and insane password policies". He gave the example of the Time-Warner Cable registration form that requires, among other things, "letters and numbers, between 8 and 16 characters and not repeat the same characters 3 times in a row". Another well-known password manager is the commercial LastPass service which released a free-software command-line client called lastpass-cli about three years ago. Unfortunately, the server software of the lastpass.com service is still proprietary. And given that LastPass has had at least two serious security breaches since that release, one could legitimately question whether this is a viable solution for storing important secrets. In general, web-based password managers expose a whole new attack surface that is not present in regular password managers. A 2014 study by University of California researchers showed that, out of five password managers studied, every one of them was vulnerable to at least one of the vulnerabilities studied. LastPass was, in particular, vulnerable to a cross-site request forgery (CSRF) attack that allowed an attacker to bypass account authentication and access the encrypted database.

Problems with password managers When you share a password database within a team, how do you remove access to a member of the team? While you can, for example, re-encrypt a pass database with new keys (thereby removing or adding certain accesses) or change the password on a KeePass database, a hostile party could have made a backup of the database before the revocation. Indeed, in the case of pass, older entries are still in the Git history. So access revocation is a problematic issue found with all shared password managers, as it may actually mean going through every password and changing them online. This fundamental problem with shared secrets can be better addressed with a tool like Vault or SFLvault. Those tools aim to provide teams with easy ways to store dynamic tokens like API keys or service passwords and share them not only with other humans, but also make them accessible to machines. The general idea of those projects is to store secrets in a central server and send them directly to relevant services without human intervention. This way, passwords are not actually shared anymore, which is similar in spirit to the approach taken by centralized authentication systems like Kerberos. If you are looking at password management for teams, those projects may be worth a look. Furthermore, some password managers that support auto-typing were found to be vulnerable to HTML injection attacks: if some third-party ad or content is able to successfully hijack the parent DOM content, it masquerades as a form that could fool auto-typing software as demonstrated by this paper that was submitted at USENIX 2014. Fortunately, KeePass was not vulnerable according to the security researchers, but LastPass was, again, vulnerable.
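As a concrete illustration of the re-encryption step mentioned above, pass re-encrypts a subtree when you re-run init with a new recipient list (a hedged sketch; as noted, this does nothing about copies or Git history an attacker may already hold):
    $ # drop joelle@example.com from the Ateam recipients and re-encrypt that subtree
    $ pass init -p Ateam me@example.com
    $ # the underlying passwords still need to be rotated on each site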

Future of password managers? All of the solutions discussed here assume you have a trusted computer you regularly have access to, which is a usage pattern that seems to be disappearing with a majority of the population. You could consider your phone to be that trusted device, yet a phone can be lost or stolen more easily than a traditional workstation or even a laptop. And while KeePass has Android and iOS ports, those do not resolve the question of how to share the password storage among those devices or how to back them up. Password managers are fundamentally file-based, and the "file" concept seems to be quickly disappearing, faster than we technologists sometimes like to admit. Looking at some relatives' use of computers, I notice it is less about "files" than images, videos, recipes, and various abstract objects that are stored in the "cloud". They do not use local storage so much anymore. In that environment, password managers lose their primary advantage, which is a local, somewhat offline file storage that is not directly accessible to attackers. Therefore certain password managers are specifically designed for the cloud, like LastPass or web browser profile synchronization features, without necessarily addressing the inherent issues with cloud storage and opening up huge privacy and security issues that we absolutely need to address. This is where the "password hasher" design comes in. Also known as "stateless" or "deterministic" password managers, password hashers are emerging as a convenient solution that could possibly replace traditional password managers as users switch from generic computing platforms to cloud-based infrastructure. We will cover password hashers and the major security challenges they pose in a future article.
Note: this article first appeared in the Linux Weekly News.

8 February 2017

Antoine Beaupré: Reliably generating good passwords

Passwords are used everywhere in our modern life. Between your email account and your bank card, a lot of critical security infrastructure relies on "something you know", a password. Yet there is little standard documentation on how to generate good passwords. There are some interesting possibilities for doing so; this article will look at what makes a good password and some tools that can be used to generate them. There is growing concern that our dependence on passwords poses a fundamental security flaw. For example, passwords rely on humans, who can be coerced to reveal secret information. Furthermore, passwords are "replayable": if your password is revealed or stolen, anyone can impersonate you to get access to your most critical assets. Therefore, major organizations are trying to move away from single password authentication. Google, for example, is enforcing two factor authentication for its employees and is considering abandoning passwords on phones as well, although we have yet to see that controversial change implemented. Yet passwords are still here and are likely to stick around for a long time until we figure out a better alternative. Note that in this article I use the word "password" instead of "PIN" or "passphrase", which all roughly mean the same thing: a small piece of text that users provide to prove their identity.

What makes a good password? A "good password" may mean different things to different people. I will assert that a good password has the following properties:
  • high entropy: hard to guess for machines
  • transferable: easy to communicate for humans or transfer across various protocols for computers
  • memorable: easy to remember for humans
High entropy means that the password should be unpredictable to an attacker, for all practical purposes. It is tempting (and not uncommon) to choose a password based on something else that you know, but unfortunately those choices are likely to be guessable, no matter how "secret" you believe it is. Yes, with enough effort, an attacker can figure out your birthday, the name of your first lover, your mother's maiden name, where you were last summer, or other secrets people think they have. The only solution here is to use a password randomly generated with enough randomness or "entropy" that brute-forcing the password will be practically infeasible. Considering that a modern off-the-shelf graphics card can guess millions of passwords per second using freely available software like hashcat, the typical requirement of "8 characters" is not considered enough anymore. With proper hardware, a powerful rig can crack such passwords offline within about a day. Even though a recent US National Institute of Standards and Technology (NIST) draft still recommends a minimum of eight characters, we now more often hear recommendations of twelve characters or fourteen characters. A password should also be easily "transferable". Some characters, like & or !, have special meaning on the web or the shell and can wreak havoc when transferred. Certain software also has policies of refusing (or requiring!) some special characters exactly for that reason. Weird characters also make it harder for humans to communicate passwords across voice channels or different cultural backgrounds. In a more extreme example, the popular Signal software even resorted to using only digits to transfer key fingerprints. They outlined that numbers are "easy to localize" (as opposed to words, which are language-specific) and "visually distinct". But the critical piece is the "memorable" part: it is trivial to generate a random string of characters, but those passwords are hard for humans to remember. As xkcd noted, "through 20 years of effort, we've successfully trained everyone to use passwords that are hard for human to remember but easy for computers to guess". It explains how a series of words is a better password than a single word with some characters replaced. Obviously, you should not need to remember all passwords. Indeed, you may store some in password managers (which we'll look at in another article) or write them down in your wallet. In those cases, what you need is not a password, but something I would rather call a "token", or, as Debian Developer Daniel Kahn Gillmor (dkg) said in a private email, a "high entropy, compact, and transferable string". Certain APIs are specifically crafted to use tokens. OAuth, for example, generates "access tokens" that are random strings that give access to services. But in our discussion, we'll use the term "token" in a broader sense. Notice how we removed the "memorable" property and added the "compact" one: we want to efficiently convert the most entropy into the shortest password possible, to work around possibly limiting password policies. For example, some bank cards only allow 5-digit security PINs and most web sites have an upper limit in the password length. The "compact" property applies less to "passwords" than tokens, because I assume that you will only use a password in select places: your password manager, SSH and OpenPGP keys, your computer login, and encryption keys. Everything else should be in a password manager. 
Those tools are generally under your control and should allow large enough passwords that the compact property is not particularly important.
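To get a feel for the scale involved, here is a rough back-of-the-envelope calculation; the rate of one billion guesses per second is an illustrative assumption (real rates vary enormously with the hash being attacked and the hardware used), not a measurement:
    $ # days needed to try every password of a given length over a 72-character set
    $ echo '72^8  / 10^9 / 86400' | bc -l     # 8 characters: about 8 days
    $ echo '72^12 / 10^9 / 86400' | bc -l     # 12 characters: roughly 600,000 years
On average an attacker only needs to search half the space, but the point stands: a few extra characters move the work factor from days to geological time.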

Generating secure passwords We'll now look at how to generate a strong, transferable, and memorable password. These are most likely the passwords you will deal with most of the time, as security tokens used in other settings should actually never show up on screen: they should be copy-pasted or automatically typed in forms. The password generators described here are all operated from the command line. Password managers often have embedded password generators, but usually don't provide an easy way to generate a password for the vault itself. The previously mentioned xkcd cartoon is probably a common cultural reference in the security crowd and I often use it to explain how to choose a good passphrase. It turns out that someone actually implemented xkcd author Randall Munroe's suggestion in a program called xkcdpass:
    $ xkcdpass
    estop mixing edelweiss conduct rejoin flexitime
In verbose mode, it will show the actual entropy of the generated passphrase:
    $ xkcdpass -V
    The supplied word list is located at /usr/lib/python3/dist-packages/xkcdpass/static/default.txt.
    Your word list contains 38271 words, or 2^15.22 words.
    A 6 word password from this list will have roughly 91 (15.22 * 6) bits of entropy,
    assuming truly random word selection.
    estop mixing edelweiss conduct rejoin flexitime
Note that the above password has 91 bits of entropy, which is about what a fifteen-character password would have, if chosen at random from uppercase, lowercase, digits, and ten symbols:
    log2((26 + 26 + 10 + 10)^15) = approx. 92.548875
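If you want to check that arithmetic yourself, bc can serve as the calculator (with -l, l() is the natural logarithm, so l(x)/l(2) gives the base-2 logarithm):
    $ echo 'l(26 + 26 + 10 + 10) / l(2) * 15' | bc -l    # prints approximately 92.5488750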
It's also interesting to note that this is closer to the entropy of a fifteen-letter base64 encoded password: since each character is six bits, you end up with 90 bits of entropy. xkcdpass is scriptable and easy to use. You can also customize the word list, separators, and so on with different command-line options. By default, xkcdpass uses the 2 of 12 word list from 12 dicts, which is not specifically geared toward password generation but has been curated for "common words" and words of different sizes. Another option is the diceware system. Diceware works by having a word list in which you look up words based on dice rolls. For example, rolling the five dice "1 4 2 1 4" would give the word "bilge". By rolling those dice five times, you generate a five word password that is both memorable and random. Since paper and dice do not seem to be popular anymore, someone wrote that as an actual program, aptly called diceware. It works in a similar fashion, except that passwords are not space separated by default:
    $ diceware
    AbateStripDummy16thThanBrock
Diceware can obviously change the output to look similar to xkcdpass, but can also accept actual dice rolls for those who do not trust their computer's entropy source:
    $ diceware -d ' ' -r realdice -w en_orig
    Please roll 5 dice (or a single dice 5 times).
    What number shows dice number 1? 4
    What number shows dice number 2? 2
    What number shows dice number 3? 6
    [...]
    Aspire O's Ester Court Born Pk
The diceware software ships with a few word lists, and the default list has been deliberately created for generating passwords. It is derived from the standard diceware list with additions from the SecureDrop project. Diceware also ships with the EFF word list, which has words chosen for better recognition, but it is not enabled by default, even though diceware recommends using it when generating passwords with dice. That is because the EFF list was added later on. The project is currently considering making the EFF list the default. One disadvantage of diceware is that it doesn't actually show how much entropy the generated password has; those interested need to compute it for themselves. The actual number depends on the word list: the default word list has 13 bits of entropy per word (since it is exactly 8192 words long), which means the default six-word passwords have 78 bits of entropy:
    log2(8192) * 6 = 78
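The same computation works for any word list; a small sketch, where mywordlist.txt is just a placeholder name:
    $ words=$(wc -l < mywordlist.txt)       # count the words in the list
    $ echo "l($words) / l(2) * 6" | bc -l   # for the 8192-word default list, approximately 78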
Both of these programs are rather new, having, for example, entered Debian only after the last stable release, so they may not be directly available for your distribution. The manual diceware method, of course, only needs a set of dice and a word list, so that is much more portable, and both the diceware and xkcdpass programs can be installed through pip. However, if this is all too complicated, you can take a look at Openwall's passwdqc, which is older and more widely available. It generates more memorable passphrases while at the same time allowing for better control over the level of entropy:
    $ pwqgen
    vest5Lyric8wake
    $ pwqgen random=78
    Theme9accord=milan8ninety9few
For some reason, passwdqc restricts the entropy of passwords to between 24 and 85 bits. That tool is also much less customizable than the other two: what you see here is pretty much what you get. The 4096-word list is also hardcoded in the C source code; it comes from a Usenet sci.crypt posting from 1997. A key feature of xkcdpass and diceware is that you can craft your own word list, which can make dictionary-based attacks harder. Indeed, with such word-based password generators, the only viable way to crack those passwords is to use dictionary attacks, because the password is so long that character-based exhaustive searches are not workable; they would take centuries to complete. Changing from the default dictionary therefore brings some advantage against attackers. This may be yet another "security through obscurity" procedure, however: a naive approach may be to use a dictionary localized to your native language (for example, in my case, French), but that would deter only an attacker who doesn't do basic research about you, so that advantage is quickly lost to determined attackers. One should also note that the entropy of the password doesn't depend on which word list is chosen, only on its length. Furthermore, a larger dictionary only expands the search space logarithmically; in other words, doubling the word-list length only adds a single bit of entropy. It is actually much better to add a word to your password than words to the word list that generates it.
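You do not even need a dedicated tool to experiment with a custom list; here is a minimal sketch using standard utilities (the file name is a placeholder, and note that shuf draws words without repetition, so the entropy is marginally lower than with true diceware-style selection):
    $ # draw six words from a custom word list and join them with spaces
    $ shuf --random-source=/dev/urandom -n 6 mywordlist.txt | paste -sd ' ' -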

Generating security tokens As mentioned before, most password managers feature a way to generate strong security tokens, with different policies (symbols or not, length, etc). In general, you should use your password manager's password-generation functionality to generate tokens for sites you visit. But how are those functionalities implemented and what can you do if your password manager (for example, Firefox's master password feature) does not actually generate passwords for you? pass, the standard UNIX password manager, delegates this task to the widely known pwgen program. It turns out that pwgen has a pretty bad track record for security issues, especially in the default "phoneme" mode, which generates non-uniformly distributed passwords. While pass uses the more "secure" -s mode, I figured it was worth removing that option to discourage the use of pwgen in the default mode. I made a trivial patch to pass so that it generates passwords correctly on its own. The gory details are in this email. It turns out that there are lots of ways to skin this particular cat. I was suggesting the following pipeline to generate the password:
    head -c $entropy /dev/random | base64 | tr -d '\n='
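For instance, asking for 16 bytes (128 bits) of kernel randomness, a purely illustrative amount:
    $ entropy=16    # bytes to read; 16 bytes = 128 bits
    $ head -c "$entropy" /dev/random | base64 | tr -d '\n='
The result is a 22-character token once the padding is stripped.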
The above command reads a certain number of bytes from the kernel (head -c $entropy /dev/random), encodes that using the base64 algorithm, and strips out the trailing equal signs and newlines (for large passwords). This is what Gillmor described as a "high-entropy compact printable/transferable string". The priority, in this case, is to have a token that is as compact as possible with the given entropy, while at the same time using a character set that should cause as little trouble as possible on sites that restrict the characters you can use. Gillmor is a co-maintainer of the Assword password manager, which chose base64 because it is widely available and understood and only takes up 33% more space than the original 8-bit binary encoding. After a lengthy discussion, the pass maintainer, Jason A. Donenfeld, chose the following pipeline:
    read -r -n $length pass < <(LC_ALL=C tr -dc "$characters" < /dev/urandom)
The above is similar, except it uses tr directly to read characters from the kernel, and selects a certain set of characters ($characters) that is defined earlier as consisting of [:alnum:] for letters and digits and [:graph:] for symbols, depending on the user's configuration. Then the read command extracts the chosen number of characters from the output and stores the result in the pass variable. A participant on the mailing list, Brian Candler, has argued that this wastes entropy as the use of tr discards bits from /dev/urandom with little gain in entropy when compared to base64. But in the end, the maintainer argued that "reading from /dev/urandom has no [effect] on /proc/sys/kernel/random/entropy_avail on Linux" and dismissed the objection. Another password manager, KeePass, uses its own routines to generate tokens, but the procedure is the same: read from the kernel's entropy source (and user-generated sources in the case of KeePass) and transform that data into a transferable string.
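Put together as a runnable snippet, the pass-style approach looks like this (bash is needed for the process substitution; the length and the restriction to letters and digits are arbitrary choices for the example):
    $ characters='[:alnum:]'    # use '[:graph:]' instead to also allow symbols
    $ length=25
    $ read -r -n "$length" pass < <(LC_ALL=C tr -dc "$characters" < /dev/urandom)
    $ echo "$pass"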

Conclusion While there are many aspects to password management, we have focused on different techniques for users and developers to generate secure but also usable passwords. Generating a strong yet memorable password is not a trivial problem, as the security vulnerabilities of the pwgen software showed. Furthermore, left to their own devices, users will generate passwords that can be easily guessed by a skilled attacker, especially if they can profile the user. It is therefore essential that we provide easy tools for users to generate strong passwords and encourage them to store secure tokens in password managers.
Note: this article first appeared in the Linux Weekly News.

19 October 2016

Reproducible builds folks: Reproducible Builds: week 77 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday October 9 and Saturday October 15 2016: Media coverage Documentation update After discussions with HW42, Steven Chamberlain, Vagrant Cascadian, Daniel Shahaf, Christopher Berg, Daniel Kahn Gillmor and others, Ximin Luo has started writing up more concrete and detailed design plans for setting SOURCE_ROOT_DIR for reproducible debugging symbols, buildinfo security semantics and buildinfo security infrastructure. Toolchain development and fixes Dmitry Shachnev noted that our patch for #831779 has been temporarily rejected by docutils upstream; we are trying to persuade them again. Tony Mancill uploaded javatools/0.59 to unstable containing an original patch by Chris Lamb. This fixed an issue where documentation Recommends: substvars would not be reproducible. Ximin Luo filed bug 77985 against GCC as a pre-requisite for future patches to make debugging symbols reproducible. Packages reviewed and fixed, and bugs filed The following updated packages have become reproducible - in our current test setup - after being fixed: The following updated packages appear to be reproducible now, for reasons we were not able to figure out. (Relevant changelogs did not mention reproducible builds.) Some uploads have addressed some reproducibility issues, but not all of them: Some uploads have addressed nearly all reproducibility issues, except for build path issues: Patches submitted that have not made their way to the archive yet: Reviews of unreproducible packages 101 package reviews have been added, 49 have been updated and 4 have been removed this week, adding to our knowledge about identified issues. 3 issue types have been updated: Weekly QA work During reproducibility testing, some FTBFS bugs have been detected and reported by: tests.reproducible-builds.org Debian: Openwrt/LEDE/NetBSD/coreboot/Fedora/archlinux: Misc. We are running a poll to find a good time for an IRC meeting. This week's edition was written by Ximin Luo, Holger Levsen & Chris Lamb and reviewed by a bunch of Reproducible Builds folks on IRC.

6 October 2016

Reproducible builds folks: Reproducible Builds: week 75 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday September 25 and Saturday October 1 2016: Statistics For the first time, we reached 91% reproducible packages in our testing framework on testing/amd64 using a deterministic build path. (This is what we recommend to make packages in Stretch reproducible.) For unstable/amd64, where we additionally test for reproducibility across different build paths, we are at almost 76% again. IRC meetings We have a poll to set a time for a new regular IRC meeting. If you would like to attend, please input your available times and we will try to accommodate you. There was a trial IRC meeting on Friday, 2016-09-31 1800 UTC. Unfortunately, we did not activate meetbot. Despite this, participants considered the meeting a success, as several topics were discussed (e.g. changes to IRC notifications of tests.r-b.o) and the meeting stayed within one hour. Upcoming events Reproduce and Verify Filesystems - Vincent Batts, Red Hat - Berlin (Germany), 5th October, 14:30 - 15:20 @ LinuxCon + ContainerCon Europe 2016. From Reproducible Debian builds to Reproducible OpenWrt, LEDE & coreboot - Holger "h01ger" Levsen and Alexander "lynxis" Couzens - Berlin (Germany), 13th October, 11:00 - 11:25 @ OpenWrt Summit 2016. Introduction to Reproducible Builds - Vagrant Cascadian will be presenting at the SeaGL.org Conference in Seattle (USA), November 11th-12th, 2016. Previous events GHC Determinism - Bartosz Nitka, Facebook - Nara (Japan), 24th September, ICFP 2016. Toolchain development and fixes Michael Meskes uploaded bsdmainutils/9.0.11 to unstable with a fix for #830259 based on Reiner Herrmann's patch. This fixed the locale_dependent_symbol_order_by_lorder issue in the affected packages (freebsd-libs, mmh). devscripts/2.16.8 was uploaded to unstable. It includes a debrepro script by Antonio Terceiro which is similar in purpose to reprotest but more lightweight; specific to Debian packages and without support for virtual servers or configurable variations. Packages reviewed and fixed, and bugs filed The following updated packages have become reproducible in our testing framework after being fixed: The following updated packages appear to be reproducible now for reasons we were not able to figure out. (Relevant changelogs did not mention reproducible builds.) Some uploads have addressed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Reviews of unreproducible packages 77 package reviews have been added, 178 have been updated and 80 have been removed this week, adding to our knowledge about identified issues. 6 issue types have been updated: Weekly QA work As part of reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development A new version of diffoscope 61 was uploaded to unstable by Chris Lamb. It included contributions from: Post-release there were further contributions from: reprotest development A new version of reprotest 0.3.2 was uploaded to unstable by Ximin Luo. It included contributions from: Post-release there were further contributions from: tests.reproducible-builds.org Misc. This week's edition was written by Ximin Luo, Holger Levsen & Chris Lamb and reviewed by a bunch of Reproducible Builds folks on IRC.

20 September 2016

Reproducible builds folks: Reproducible Builds: week 73 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday September 11 and Saturday September 17 2016: Toolchain developments Ximin Luo started a new series of tools called (for now) debrepatch, to make it easier to automate checks that our old patches to Debian packages still apply to newer versions of those packages, and still make these reproducible. Ximin Luo updated one of our few remaining patches for dpkg in #787980 to make it cleaner and more minimal. The following tools were fixed to produce reproducible output: Packages reviewed and fixed, and bugs filed The following updated packages have become reproducible - in our current test setup - after being fixed: The following updated packages appear to be reproducible now, for reasons we were not able to figure out. (Relevant changelogs did not mention reproducible builds.) The following 3 packages were not changed, but have become reproducible due to changes in their build-dependencies: jaxrs-api python-lua zope-mysqlda. Some uploads have addressed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Reviews of unreproducible packages 462 package reviews have been added, 524 have been updated and 166 have been removed this week, adding to our knowledge about identified issues. 25 issue types have been updated: Weekly QA work FTBFS bugs have been reported by: diffoscope development A new version of diffoscope 60 was uploaded to unstable by Mattia Rizzolo. It included contributions from: It also included changes from previous weeks; see either the changes or commits linked above, or previous blog posts 72 71 70. strip-nondeterminism development New versions of strip-nondeterminism 0.027-1 and 0.028-1 were uploaded to unstable by Chris Lamb. They included contributions from: disorderfs development A new version of disorderfs 0.5.1 was uploaded to unstable by Chris Lamb. It included contributions from: It also included changes from previous weeks; see either the changes or commits linked above, or previous blog posts 70. Misc. This week's edition was written by Ximin Luo and reviewed by a bunch of Reproducible Builds folks on IRC.

7 September 2016

Reproducible builds folks: Reproducible Builds: week 71 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday August 28 and Saturday September 3 2016: Media coverage Antonio Terceiro blogged about testing build reproducibility with debrepro. GSoC and Outreachy updates The next round is being planned now: see their page with a timeline and a list of participating organizations. Maybe you want to participate this time? Then please reach out to us as soon as possible! Packages reviewed and fixed, and bugs filed The following packages have addressed reproducibility issues in other packages: The following updated packages have become reproducible in our current test setup after being fixed: The following updated packages appear to be reproducible now, for reasons we were not able to figure out yet. (Relevant changelogs did not mention reproducible builds.) The following 4 packages were not changed, but have become reproducible due to changes in their build-dependencies: Some uploads have addressed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Reviews of unreproducible packages 706 package reviews have been added, 22 have been updated and 16 have been removed this week, adding to our knowledge about identified issues. 5 issue types have been added: 1 issue type has been updated: Weekly QA work FTBFS bugs have been reported by: diffoscope development diffoscope development on the next version (60) continued in git, taking in contributions from: strip-nondeterminism development Mattia Rizzolo uploaded strip-nondeterminism 0.023-2~bpo8+1 to jessie-backports. A new version of strip-nondeterminism 0.024-1 was uploaded to unstable by Chris Lamb. It included contributions from: Holger added jobs on jenkins.debian.net to run testsuites on every commit. There is one job for the master branch and one for the other branches. disorderfs development Holger added jobs on jenkins.debian.net to run testsuites on every commit. There is one job for the master branch and one for the other branches. tests.reproducible-builds.org Debian: We now vary the GECOS records of the two build users. Thanks to Paul Wise for providing the patch. Misc. This week's edition was written by Ximin Luo, Holger Levsen & Chris Lamb and reviewed by a bunch of Reproducible Builds folks on IRC.

3 August 2016

Daniel Kahn Gillmor: Changes for GnuPG in Debian

The GNU Privacy Guard (GnuPG) upstream team maintains three branches of development: 1.4 ("classic"), 2.0 ("stable"), and 2.1 ("modern"). They differ in various ways: software architecture, supported algorithms, network transport mechanisms, protocol versions, development activity, co-installability, etc. Debian currently ships two versions of GnuPG in every maintained suite -- in particular, /usr/bin/gpg has historically always been provided by the "classic" branch. That's going to change! Debian unstable will soon be moving to the "modern" branch for providing /usr/bin/gpg. This will give several advantages for Debian and its users in the future, but it will require a transition. Hopefully we can make it a smooth one.

What are the benefits? Compared to "classic", the "modern" branch has:
  • updated crypto (including elliptic curves)
  • componentized architecture (e.g. libraries, some daemonized processes)
  • improved key storage
  • better network access (including talking to keyservers over tor)
  • stronger defaults
  • more active upstream development
  • safer info representation (e.g. no more key IDs, fingerprints easier to copy-and-paste)
If you want to try this out, the changes are already made in experimental. Please experiment!

What does this mean for end users? If you're an end user and you don't use GnuPG directly, you shouldn't notice much of a change once the packages start to move through the rest of the archive. Even if you do use GnuPG regularly, you shouldn't notice too much of a difference. One of the main differences is that all access to your secret key will be handled through gpg-agent, which should be automatically launched as needed. This means that operations like signing and decryption will cause gpg-agent to prompt the user to unlock any locked keys directly, rather than gpg itself prompting the user. If you have an existing keyring, you may also notice a difference based on a change of how your public keys are managed, though again this transition should ideally be smooth enough that you won't notice unless you care to investigate more deeply. If you use GnuPG regularly, you might want to read the NEWS file that ships with GnuPG and related packages for updates that should help you through the transition. If you use GnuPG in a language other than English, please install the gnupg-l10n package, which contains the localization/translation files. For versions where those files are split out of the main package, gnupg explicitly Recommends: gnupg-l10n already, so it should be brought in for new installations by default. If you have an archive of old data that depends on known-broken algorithms, PGP3 keys, or other deprecated material, you'll need to have "classic" GnuPG around to access it. That will be provided in the gnupg1 package.

What does this mean for package maintainers? If you maintain a package that depends on gnupg: be aware that the gnupg package in Debian is going through this transition. A few general thoughts:
  • If your package Depends: gnupg for signature verification only, you might prefer to have it Depends: gpgv instead. gpgv is a much simpler tool than the full-blown GnuPG suite, and should be easier to manage. I'm happy to help with such a transition (we've made it recently with apt already).
  • If your package Depends: gnupg and expects ~/.gnupg/ to be laid out in a certain way, that's almost certainly going to break at some point. ~/.gnupg/ is GnuPG's internal storage, and it's not recommended to rely on any specific data structures there, as they may change. gpg offers commands like --export, --import, and --delete for manipulating its persistent storage. Please use them instead! (A short sketch follows this list.)
  • If your package depends on parsing or displaying gpg's output for the user, please make sure you use its special machine-readable form (--with-colons). Parsing the human-readable text is not advised, as it may change from version to version.
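As a small sketch of both points (the fingerprint variable and file name here are placeholders, not part of any particular package):
    $ # move keys around through the supported interfaces instead of copying ~/.gnupg/
    $ gpg --armor --export "$FINGERPRINT" > exported-key.asc
    $ gpg --import exported-key.asc
    $ # machine-readable listing, suitable for parsing in scripts
    $ gpg --with-colons --list-keys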
If you maintain a package that depends on gnupg2 and tries to use gpg2 instead of gpg, that should stay ok. However, at some point it'd be nice to get rid of /usr/bin/gpg2 and just have one expected binary (gpg). So you can help with that:
  • Look for places where your package expects gpg2 and make it try gpg instead. If you can make your code fall back cleanly between the two, even better (see the sketch after this list).
  • Change your dependencies to indicate gnupg (>= 2)
  • Patch lintian to encourage other people to make this switch ;)
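A minimal sketch of such a fallback in shell (the lookup order and error handling are illustrative, not an official recommendation):
    # prefer the single expected binary, fall back to gpg2 if that is all that is installed
    GPG=$(command -v gpg || command -v gpg2) || { echo "GnuPG not found" >&2; exit 1; }
    "$GPG" --version | head -n 1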

What specifically needs to happen? The last major step for this transition was renaming the source package for "classic" GnuPG to be gnupg1. This transition is currently in the ftp-master's NEW queue. Once it makes it through that queue, and both gnupg1 and gnupg2 have been in experimental for a few days without reports of dangerous breakage, we'll upload both gnupg1 and gnupg2 to unstable. We'll also need to do some triage on the BTS, reassigning some reports which are really only relevant for the "classic" branch. Please report bugs via the BTS as usual! You're also welcome to ask questions and make suggestions on #debian-gnupg on irc.oftc.net, or to mail the Debian GnuPG packaging team at pkg-gnupg-maint@lists.alioth.debian.org. Happy hacking!

15 June 2016

Reproducible builds folks: Reproducible builds: week 59 in Stretch cycle

What happened in the Reproducible Builds effort between June 5th and June 11th 2016: Media coverage Ed Maste gave a talk at BSDCan 2016 on reproducible builds (slides, video). GSoC and Outreachy updates Weekly reports by our participants: Documentation update - Ximin Luo proposed a modification to our SOURCE_DATE_EPOCH spec explaining FORCE_SOURCE_DATE. Some upstream build tools (e.g. TeX, see below) have expressed a desire to control which cases of embedded timestamps should obey SOURCE_DATE_EPOCH. They were not convinced by our arguments on why this is a bad idea, so we agreed on an environment variable FORCE_SOURCE_DATE for them to implement their desired behaviour - named generically, so that at least we can set it centrally. For more details, see the text just linked. However, we strongly urge most build tools not to use this, and instead obey SOURCE_DATE_EPOCH unconditionally in all cases. Toolchain fixes Packages fixed The following 16 packages have become reproducible due to changes in their build-dependencies: apertium-dan-nor apertium-swe-nor asterisk-prompt-fr-armelle blktrace canl-c code-saturne coinor-symphony dsc-statistics frobby libphp-jpgraph paje.app proxycheck pybit spip tircd xbs The following 5 packages are new in Debian and appear to be reproducible so far: golang-github-bowery-prompt golang-github-pkg-errors golang-gopkg-dancannon-gorethink.v2 libtask-kensho-perl sspace The following packages had older versions which were reproducible, and their latest versions are now reproducible again after being fixed: The following packages have become reproducible after being fixed: Some uploads have fixed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Package reviews 68 reviews have been added, 19 have been updated and 28 have been removed this week. New and updated issues: 26 FTBFS bugs have been reported by Chris Lamb, 1 by Santiago Vila and 1 by Sascha Steinbiss. diffoscope development strip-nondeterminism development disorderfs development tests.reproducible-builds.org Misc. Steven Chamberlain submitted a patch to FreeBSD's makefs to allow reproducible builds of the kfreebsd installer. Ed Maste committed a patch to FreeBSD's binutils to enable deterministic archives by default in GNU ar. Helmut Grohne experimented with cross+native reproductions of dash with some success, using rebootstrap. This week's edition was written by Ximin Luo, Chris Lamb, Holger Levsen, Mattia Rizzolo and reviewed by a bunch of Reproducible builds folks on IRC.

3 June 2016

Gunnar Wolf: Stop it with those short PGP key IDs!

Debian is quite probably the project that most uses an OpenPGP implementation (that is, GnuPG, or gpg) for many of its internal operations, and that places most trust in it. PGP is also very widely used, of course, in many other projects and between individuals. It is regarded as a secure way to do all sorts of crypto (mainly, encrypting/decrypting private stuff, signing public stuff, certifying other people's identities). PGP's lineage traces back to Phil Zimmermann's program, first published in 1991; by far, not a newcomer. PGP is secure, as it was 25 years ago. However, some uses of it might not be so. We went through several migrations related to algorithmic weaknesses (i.e. v3 keys using MD5; SHA1 is strongly discouraged, although not yet completely broken, and it should be avoided as well) or to computational complexity (as the migration away from keys smaller than 2048 bits, strongly preferring 4096 bits). But some vulnerabilities are related to human usage (that is, configuration). Today, Enrico Zini gave us a heads-up in the #debian-keyring IRC channel, and started a thread on the debian-private mailing list; I understand the mail to a private list was partly meant to get our collective attention, and to allow for potentially security-relevant information to be shared. I won't go into details about what is, is not, should be or should not be private, but I'll post here only what's public information already. What are short and long key IDs? I'll start by quoting Enrico's mail:
there are currently at least 3 ways to refer to a gpg key: short key ID (last 8 hex digits of fingerprint), long key ID (last 16 hex digits) and full fingerprint. The short key ID used to be popular, and since 5 years it is known that it is computationally easy to generate a gnupg key with an arbitrary short key id. A mitigation to this is using "keyid-format long" in gpg.conf, and a better thing to do, especially in scripts, is to use the full fingerprint to refer to a key, or just ship the public key for verification and skip the key servers. Note that in case of keyid collision, gpg will download and import all the matching keys, and will use all the matching keys for verifying signatures.
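In configuration terms, the mitigation Enrico describes looks roughly like this (with-fingerprint is an extra, optional line that also prints full fingerprints in key listings):
    # ~/.gnupg/gpg.conf
    keyid-format long
    with-fingerprint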
So... What is this about? We humans are quite bad at recognizing and remembering randomly-generated strings with no inherent patterns in them. Every GPG key can be uniquely identified by its fingerprint, a 160-bit string, usually encoded as ten blocks of four hexadecimal characters. That is, my (full) key's fingerprint is:
AB41 C1C6 8AFD 668C A045  EBF8 673A 03E4 C1DB 921F
However, it's quite hard to recognize such a long string, let alone memorize it! So, we often do what humans do: Given that strong cryptography implies a homogeneous probability distribution, people compromised on using just a portion of the key: the last portion. The short key ID. Mine is then the last two blocks: C1DB921F. We can also use what's known as the long key ID, that's twice as long: 64 bits. However, while I can speak my short key ID on a single breath (and maybe even expect you to remember and note it down), try doing so with the long one: 673A03E4C1DB921F. Nah. Too much for our little, analog brains. This short and almost-rememberable number then has 32 bits of entropy: I have less than a one in 4,000,000,000 chance of generating a new key with this same short key ID. Besides, key generation is a CPU-intensive operation, so it's quite unlikely we will have a collision, right? Well, wrong. Previous successful attacks on short key IDs Already five years ago, Asheesh Laroia migrated his 1024D key to a 4096R. And, as he describes in his always-entertaining fashion, he made his computer sweat until he was able to create a new key for which the short key ID collided with the old one. It might not seem like a big deal, as he did this non-maliciously, but this easily should have spelt game over for the usage of short key IDs. After all, being able to generate a collision is usually the end for cryptographic systems. Asheesh specifically mentioned in his posting how this could be abused. But we didn't listen. Short key IDs are just too convenient! Besides, they allow us to have fun and can be a means of expression! I know of at least two keys that would qualify as vanity: Obey Arthur Liu's 0x29C0FFEE (created in 2009) and Keith Packard's 0x00000011 (created in 2012). Then we got the Evil 32 project. They developed Scallion, started (AFAICT) in 2012. Scallion automates the search for a 32-bit collision using GPUs; they claim that it takes only four seconds to find a collision. So, they went through the strong set of the public PGP Web of Trust, and created a (32-bit-)colliding key for each of the existing keys. And what happened now? What happened today? We still don't really know, but it seems we found a first potentially malicious collision, that is, the first "nonacademic" case. Enrico found two keys sharing the 9F6C6333 short ID, apparently belonging to the same person (as would be the case of Asheesh, mentioned above). After contacting Gustavo, though, it turns out he does not know about the second one. That is, it can be clearly regarded as an impersonation attempt. Besides, what gave away this attempt are the signatures it has: Both keys are signed by what appears to be the same three keys: B29B232A, F2C850CA and 789038F2. Those three keys are not (yet?) uploaded to the keyservers, though... But we can expect them to appear at any point in the future. We don't know who is behind this, or what his purpose is. We just know this looks very evil. Now, don't panic: Gustavo's key is safe. Same for his certifiers, Marga, Agustín and Maxy. It's just a 32-bit collision. So, in principle, the only parties that could be cheated into trusting the attacker are humans, right? Nope. Enrico tested on the PGP pathfinder & key statistics service, a service that finds trust paths between any two arbitrary keys in the strong set. Surprise: The pathfinder works on the short key IDs, even when supplied full fingerprints.
So, it turns out I have three faked trust paths into our impostor. What next? There are several things this should urge us to do. And there are surely many other important recommendations. But this is a good set of points to start with. [update] I was pointed at Daniel Kahn Gillmor's 2013 blog post, OpenPGP Key IDs are not useful. Daniel argues, in short, that cutting a fingerprint in order to get a (32- or 64-bit) short key ID is the worst of all worlds, and we should rather target either always showing full fingerprints, or not showing them at all (and leaving all the crypto-checking bits to be done by the software, as comparing 160-bit strings is not natural for us humans). [update] This post was picked up by LWN.net. A very interesting discussion continues in their comments.
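For scripts, the practical consequence is to compare full fingerprints rather than truncated IDs; a minimal sketch (the address is only a placeholder for whatever key is being looked up):
    $ gpg --with-colons --fingerprint someone@example.org | awk -F: '/^fpr:/ { print $10 }'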

12 April 2016

Reproducible builds folks: Reproducible builds: week 48 in Stretch cycle

What happened in the reproducible builds effort between March 20th and March 26th: Toolchain fixes Daniel Kahn Gillmor worked on removing the build path from build symbols by submitting a patch adding -fdebug-prefix-map to clang to match GCC, another patch against gcc-5 to backport the removal of -fdebug-prefix-map from DW_AT_producer, and finally by proposing the addition of a normalizedebugpath to the reproducible feature set of dpkg-buildflags that would use -fdebug-prefix-map to replace the current directory with ".". Sergey Poznyakoff merged the --clamp-mtime option so that it will be featured in the next Tar release. This option is likely to be used by dpkg-deb to implement deterministic mtimes for packaged files. Packages fixed The following packages have become reproducible due to changes in their build dependencies: augeas, gmtkbabel, ktikz, octave-control, octave-general, octave-image, octave-ltfat, octave-miscellaneous, octave-mpi, octave-nurbs, octave-octcdf, octave-sockets, octave-strings, openlayers, python-structlog, signond. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: tests.reproducible-builds.org i386 build nodes have been set up by converting 2 of the 4 amd64 nodes to i386. (h01ger) Package reviews 92 reviews have been removed, 66 added and 31 updated in the previous week. New issues: timestamps_generated_by_xbean_spring, timestamps_generated_by_mangosdk_spiprocessor. Chris Lamb filed 7 FTBFS bugs. Misc. On March 20th, Chris Lamb gave a talk at FOSSASIA 2016 in Singapore. The very same day, but a few timezones apart, h01ger did a presentation at LibrePlanet 2016 in Cambridge, Massachusetts. Seven GSoC/Outreachy applications were made by potential interns to work on various aspects of the reproducible builds effort. On top of interacting with several applicants, prospective mentors gathered to review the applications.

27 March 2016

Lunar: Reproducible builds: week 48 in Stretch cycle

What happened in the reproducible builds effort between March 20th and March 26th:

Toolchain fixes
  • Sebastian Ramacher uploaded breathe/4.2.0-1 which makes its output deterministic. Original patch by Chris Lamb, merged upstream.
  • Rafael Laboissiere uploaded octave/4.0.1-1 which allows packages to be built in place and avoid unreproducible builds due to temporary build directories appearing in the .oct files.
Daniel Kahn Gillmor worked on removing the build path from build symbols by submitting a patch adding -fdebug-prefix-map to clang to match GCC, another patch against gcc-5 to backport the removal of -fdebug-prefix-map from DW_AT_producer, and finally by proposing the addition of a normalizedebugpath to the reproducible feature set of dpkg-buildflags that would use -fdebug-prefix-map to replace the current directory with ".". As a successful result of lobbying at LibrePlanet 2016, the --clamp-mtime option will be featured in the next Tar release. This option is likely to be used by dpkg-deb to implement deterministic mtimes for packaged files.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: augeas, gmtkbabel, ktikz, octave-control, octave-general, octave-image, octave-ltfat, octave-miscellaneous, octave-mpi, octave-nurbs, octave-octcdf, octave-sockets, octave-strings, openlayers, python-structlog, signond. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #818742 on milkytracker by Reiner Herrmann: sorts the list of source files.
  • #818752 on tcl8.4 by Reiner Herrmann: sort source files using C locale.
  • #818753 on tk8.6 by Reiner Herrmann: sort source files using C locale.
  • #818754 on tk8.5 by Reiner Herrmann: sort source files using C locale.
  • #818755 on tk8.4 by Reiner Herrmann: sort source files using C locale.
  • #818952 on marionnet by ceridwen: dummy out build date and uname to make build reproducible.
  • #819334 on avahi by Reiner Herrmann: ship upstream changelog instead of the one generated by gettextize (although duplicate of #804141 by Santiago Vila).

tests.reproducible-builds.org i386 build nodes have been set up by converting 2 of the 4 amd64 nodes to i386. (h01ger)

Package reviews 92 reviews have been removed, 66 added and 31 updated in the previous week. New issues: timestamps_generated_by_xbean_spring, timestamps_generated_by_mangosdk_spiprocessor. Chris Lamb filed 7 FTBFS bugs.

Misc. On March 20th, Chris Lamb gave a talk at FOSSASIA 2016 in Singapore. The very same day, but a few timezones apart, h01ger did a presentation at LibrePlanet 2016 in Cambridge, Massachusetts. Seven GSoC/Outreachy applications were made by potential interns to work on various aspects of the reproducible builds effort. On top of interacting with several applicants, prospective mentors gathered to review the applications. Huge thanks to Linda Naeun Lee for the new hackergotchi visible on Planet Debian.

24 January 2016

Lunar: Reproducible builds: week 39 in Stretch cycle

What happened in the reproducible builds effort between January 17th and January 23rd:

Toolchain fixes James McCoy uploaded subversion/1.9.3-2 which removes -Wdate-time from CPPFLAGS passed to swig enabling several packages to build again. The switch made in binutils/2.25-6 to use deterministic archives by default had the unfortunate effect of breaking a seldom used feature of make. Manoj Srivastava asked on debian-devel the best way to communicate the changes to Debian users. Lunar quickly came up with a patch that displays a warning when Make encounters deterministic archives. Manoj made it available in make/4.1-2 together with a NEWS file advertising the change. Following Guillem Jover's comment on the latest patch to make mtimes of packaged files deterministic, Daniel Kahn Gillmor updated and extended the patch adding the --clamp-mtime option to GNU Tar. Mattia Rizzolo updated texlive-bin in the reproducible experimental repository.

Packages fixed The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:

reproducible.debian.net Transition from reproducible.debian.net to the more general tests.reproducible-builds.org has started. More visual changes are coming. (h01ger) A plan on how to run tests for F-Droid has been worked out. (hc, mvdan, h01ger) A first step has been made by adding a Jenkins job to set up an F-Droid build environment. (h01ger)

diffoscope development diffoscope 46 has been released on January 19th, followed up by version 47 made available on January 23rd. Try it online at try.diffoscope.org! The biggest visible change is the improvement to ELF file handling. Comparisons are now done section by section, using the most appropriate tool and options to get meaningful results, thanks to Dhole's work and Mike Hommey's suggestions. Also suggested by Mike, symbols for IP-relative ops are now filtered out to remove clutter. Understanding differences in ELF files belonging to Debian packages should also be much easier as diffoscope will now try to extract debug information from the matching dbgsym package. This means the objdump disassembler should output line numbers for packages built with recent debhelper as long as the associated debug package is in the same directory. As diff tends to consume huge amounts of memory on large inputs, diffoscope has a limit in place to prevent crashes. diffoscope used to display a difference every time the limit was hit. Because this was confusing in case there were actually no differences, a hash is now internally computed to only report a difference when one exists. Files in archives and other container members are now compared in the original order. This should not matter in most cases but overall gives more predictable results. Debian .buildinfo files are now supported. Amongst other minor fixes and improvements, diffoscope will now properly compare symlinks in directories. Thanks to Tuomas Tynkkynen for reporting the problem.

Package reviews 70 reviews have been removed, 125 added and 33 updated in the previous week, gcc-5 amongst others. 25 FTBFS issues have been filed by Chris Lamb, Daniel Stender, and Martin Michlmayr.

Misc. The 16th FOSDEM will happen in Brussels, Belgium on January 30-31st. Several talks will be about reproducible builds: h01ger about the general ecosystem, Fabian Keil about the security-oriented ElectroBSD, Baptiste Daroussin about FreeBSD packages, Ludovic Courtès about Guix.

14 January 2016

Lunar: Reproducible builds: week 37 in Stretch cycle

What happened in the reproducible builds effort between January 3rd and January 9th 2016:

Toolchain fixes David Bremner uploaded dh-elpa/0.0.18 which adds a --fix-autoload-date option (on by default) to take autoload dates from changelog. Lunar updated and sent the patch adding the generation of .buildinfo to dpkg.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: aggressive-indent-mode, circe, company-mode, db4o, dh-elpa, editorconfig-emacs, expand-region-el, f-el, geiser, hyena, js2-mode, markdown-mode, mono-fuse, mysql-connector-net, openbve, regina-normal, sml-mode, vala-mode-el. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #809780 on flask-restful by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810259 on avfs by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810509 on apt by Mattia Rizzolo: ensure a stable file order is given to the linker.

reproducible.debian.net Add 2 more armhf build nodes provided by Vagrant Cascadian. This added 7 more armhf builder jobs. We now run around 900 tests of armhf packages each day. (h01ger) The footer of each page now indicates which Jenkins jobs built it. (h01ger)

diffoscope development diffoscope 45 has been released on January 4th. It features huge memory improvements when comparing large files, several fixes of squashfs-related issues that prevented comparing two Tails images, and improves the file lists of tar and cpio archives to be more precise and consistent over time. It also fixes a typo that prevented Mach-O comparison from working (Rainer Müller), improves comparisons of ELF files when specified on the command line, and solves a few more encoding issues.

Package reviews 134 reviews have been removed, 30 added and 37 updated in the previous week. 20 new fail-to-build-from-source issues were reported by Chris Lamb and Chris West. prebuilder will now skip installing diffoscope to save time if the build results are identical. (Reiner Herrmann)

2 January 2016

Lunar: Reproducible builds: week 33 in Stretch cycle

What happened in the reproducible builds effort between December 6th and December 12th: Toolchain fixes Reiner Herrmann rebased our experimental version of doxygen on version 1.8.9.1-6. Chris Lamb submitted a patch to make the manpages generated by ruby-ronn reproducible by using the locale-agnostic %Y-%m-%d for the dates. Daniel Kahn Gillmor took another shot at the issue of source path captured in DWARF symbols. A patch has been sent for review by GCC upstream to add the ability to read an environment variable with -fdebug-prefix-map. Packages fixed The following 24 packages have become reproducible due to changes in their build dependencies: gkeyfile-sharp, gprbuild, graphmonkey, gthumb, haskell-yi-language, ion, jackson-databind, jackson-dataformat-smile, jackson-dataformat-xml, jnr-ffi, libcommons-net-java, libproxy, maven-shared-utils, monodevelop-database, mydumper, ndesk-dbus, nini, notify-sharp, pixz, protozero, python-rtslib-fb, slurm-llnl, taglib-sharp, tomboy-latex. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: These uploads might have fixed reproducibility issues but could not be tested yet: Patches submitted which have not made their way to the archive yet: reproducible.debian.net Files created with diffoscope now have diffoscope in their name instead of debbindiff. (h01ger) Hostnames of the first and second build nodes are now recorded and shown in the build history. (Mattia Rizzolo) Exchanges have started with F-Droid developers to better understand what would be required to test F-Droid applications. (h01ger) A first small set of Fedora 23 packages is now also being tested while development on a new framework for testing RPMs in general has begun. A new Jenkins job has been added to set up mock, the build system used by Fedora. Another new job takes care of testing RPMs from Fedora 23 on x86_64. So far only 151 packages from the buildsys-build group are tested (currently all unreproducible), but the plan is to build all 17,000 source packages in Fedora 23 and rawhide. The page presenting the results should also soon be improved. (h01ger, Dhiru Kholia) For Arch Linux, all 2223 packages from the extra repository will also be tested from now on. Packages in "extra" are tested every four weeks, while those from core every week. Statistics are now displayed alongside the results. (h01ger) jenkins.debian.net has been updated to jenkins-job-builder version 1.3.0. Many job configurations have been simplified and refactored using features of the new version. This was another milestone for the jenkins.debian.org migration. (Phil Hands, h01ger) diffoscope development Chris Lamb announced try.diffoscope.org: an online service that runs diffoscope on user-provided files. Improvements are welcome. The application is licensed under the AGPLv3. On diffoscope itself, most pending patches have now been merged. Expect a release soon! Most of the code implementing parallel processing has been polished. Sadly, unpacking archives is CPU-bound in most cases, so the current thread-only implementation does not offer much gain on big packages. More work is still required to also add concurrent processes. Documentation update Ximin Luo has started to write a specification for buildinfo files that could become a larger platform than the limited set of features thought of so far for Debian .buildinfo.
Package reviews 113 reviews have been removed, 111 added and 56 updated in the previous week. 42 new FTBFS bugs were opened by Chris Lamb and Niko Tyni. New issues identified this week: timestamps_in_documentation_generated_by_docbook_dbtimestamp, timestamps_in_sym_l_files_generated_by_malaga, timestamps_in_edj_files_generated_by_edje_cc. Misc. Chris Lamb presented reproducible builds at skroutz.gr.
