Search Results: "Daniel Stender"

25 July 2017

Reproducible builds folks: Reproducible Builds: week 117 in Buster cycle

Here's what happened in the Reproducible Builds effort between Sunday July 16 and Saturday July 22 2017: Toolchain development Bernhard M. Wiedemann wrote a tool to automatically run through different sources of non-determinism, and report which of these caused irreproducibility. Dan Kegel's patches to fpm were merged. Bugs filed Patches submitted upstream: Patches filed in Debian: Reviews of unreproducible packages 73 package reviews have been added, 44 have been updated and 50 have been removed in this week, adding to our knowledge about identified issues. No issue types were updated. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development reprotest development Ximin also restarted the discussion with autopkgtest-devel about code reuse for reprotest. Santiago Torres began a series of patches to make reprotest more distro-agnostic, with the aim of making it usable on Arch Linux. Ximin reviewed these patches. Misc. This week's edition was written by Ximin Luo, Bernhard M. Wiedemann and Chris Lamb & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

7 March 2017

Daniel Stender: Remotely deploy a WSGI application (as a Debian package) with Ansible

This is a mini workshop as an introduction to using Ansible for the administration of Debian systems. As an example it is shown how this configuration management tool can be used to remotely set up a simple WSGI application running on an Apache web server on a Debian installation to make it available on the net. The application which is used as an example is httpbin by Runscope. This is a useful HTTP request service for the development of web software and other purposes, featuring a number of specific endpoints that can be used for different testing scenarios. For example, the address http://<address>/user-agent of httpbin returns the user agent identification of the client program which has been used to query it (that's taken from the header of the request). There are official instances of this request server running on the net, like the one at http://httpbin.org/. WSGI is a widespread standard for programming web applications in Python, and httpbin is implemented in Python using the Flask web framework. The basis of the workshop is a simple base installation of an up-to-date Debian 8 "Jessie" on a demonstration host; the latest official release of that is 8.7. As a first step, the installation has to be switched over to the testing branch of Debian, because the Debian packages of httpbin are comparatively new and are going to be introduced into the stable branch of the archive for the first time with the upcoming major release 9 "Stretch". After that, the Apache packages which are needed to make it available (apache2 and libapache2-mod-wsgi; other web servers could of course be used instead), and which are not part of a base installation, are installed from the archive. The web server then gets launched remotely, the httpbin package is also pulled, and the service is integrated into Apache to run on it. To achieve that, two configuration files must be deployed on the target system, and a few additional operations are needed to get everything working together. Every step is preconfigured within Ansible so that the whole process can be launched by a single command on the control node, and can be run on a single or a number of comparable target machines automatically and reproducibly. If a server is needed for trying this workshop out, straightforward cloud server instances are available on the net, for example at DigitalOcean but, let me underline this, there are other cloud providers which offer the same things, too! If a machine is needed only for a limited time, for experiments or other purposes, low-priced droplets are available there which are billed by the hour. After registering, the wanted machine(s) can be set up easily over the web interface (choose Debian 8.7 as OS), but there are also command line clients available like doctl (which is not yet available as a Debian package). For the convenient use of a droplet the user should first generate an SSH key pair on the local machine:
$ ssh-keygen -t rsa -b 4096 -C "john@doe.com" -f ~/.ssh/mykey
The public part of the key, ~/.ssh/mykey.pub, can then be uploaded into the user account before the droplet is created; it is then integrated automatically. There is a good introduction to the whole process available in the excellent tutorial series serversforhackers.com, here. Ansible can then use the SSH key pair to log into a droplet without the need to type in the password every time. On a cloud server like this carrying a Debian base system, the examples in this workshop can be tried out well. Ansible works client-less and doesn't need to be installed on the remote system, only on the control node; however, a Python 2.7 interpreter is needed on the remote side (the base system of DigitalOcean includes that). Before Ansible can do anything on them, remote servers which are going to be controlled must be added to /etc/ansible/hosts. This is a configuration file in the INI format for DNS names and IP addresses. For a flexible organisation of the server inventory it's possible to group hosts here, IP ranges can be given, and optional variables can be used, among other useful things (the default file contains a couple of examples). One or several servers (in Ansible they are called "hosts") on which something particular is going to happen (like httpbin being installed) can be added like this (the group name is arbitrary):
[httpbin]
192.0.2.0
Whether Ansible can communicate with the hosts in the group and actually operate on them can be verified by just pinging them like this:
$ ansible httpbin -m ping -u root --private-key=~/.ssh/mykey
192.0.2.0 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
The command succeeded, so it appears there is no significant problem regarding this machine. The return value changed:false indicates that there haven't been any changes on that host as a result of the execution of this command. Besides ping there are several other modules which can be used with the command line tool ansible in the same way, and these modules are something like the core components of Ansible. The module shell, for example, can be used to execute shell commands on the remote machine, like uname to get some system information returned from the server:
$ ansible httpbin -m shell -a "uname -a" -u root --private-key=~/.ssh/mykey
192.0.2.0 | SUCCESS | rc=0 >>
Linux debian-512mb-fra1-01 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux
In the same way, the module apt can be used to remotely install packages. But with that there's no major advantage over other software products that offer similar functionality, and using those modules on the command line is just the basics of Ansible usage. Playbooks in Ansible are YAML scripts for the manipulation of the registered hosts in /etc/ansible/hosts. Different tasks can be defined here for successive processing; for example, a simple playbook for changing the package source from stable to testing goes like this:
---
 - hosts: httpbin
   tasks:
   - name: remove "jessie" package source
     apt_repository: repo='deb http://mirrors.digitalocean.com/debian jessie main' state=absent
   - name: add "testing" package source
     apt_repository: repo='deb http://httpredir.debian.org/debian testing main contrib non-free' state=present
   - name: upgrade packages
     apt: update_cache=yes upgrade=dist
First, as with the CLI tool ansible above, the targeted host group httpbin is chosen. The default user root and the SSH key could be fixed here, too, to spare the need to give them on the command line. Then three tasks are defined which are worked through consecutively: With the module apt_repository the preset package source jessie is removed from /etc/apt/sources.list. Then, a new package source for the testing archive gets added to /etc/apt/sources.list.d/ by using the same module (by the way, mirrors.digitalocean.com also provides testing, and that might be faster). After that, the apt module is used to upgrade the package inventory (it performs apt-get dist-upgrade), after an update of the package cache has taken place (by running apt-get update). A playbook like this (the filename is arbitrary, but commonly carries the suffix .yml) can be run by the CLI tool ansible-playbook, like this:
$ ansible-playbook httpbin.yml -u root --private-key=~/.ssh/mykey
Ansible then works through the individual plays of the tasks on the remote server(s) top-down, and thanks to a high-speed network connection and SSD block device hardware, the change of the system into a Debian testing base installation only takes around a minute to complete in the cloud. While working, Ansible puts out status reports for the individual operations. If certain changes to the base system have already taken place, like when a playbook is run a second time, the modules of course sense that and just report that the system hasn't been changed because it is already in the desired state. Beyond the basic playbook which is shown here there are more advanced features like register and when available to bind the execution of a play to the error-free result of a previous one. The apt module then can be used in the playbook to install the three needed binary packages one after another:
   - name: install apache2
     apt: pkg=apache2 state=present
   - name: install mod_wsgi
     apt: pkg=libapache2-mod-wsgi state=present
   - name: install httpbin
     apt: pkg=python-httpbin state=present
The Debian packages are configured in a way that the Apache web server is running immediately after installation, and the Apache module mod_wsgi is automatically integrated. If that behaviour is not desired, there are Ansible modules available for operating on Apache which can reverse it. By the way, after the package has been installed the httpbin server can be launched with python -m httpbin.core, but this runs only a mini web server which is not suitable for production use. To get httpbin running on the Apache web server, two configuration files are needed. They can be set up in the project directory on the control node and then uploaded onto the remote machine with another Ansible module. The file httpbin.wsgi (the name is again arbitrary) contains only a single line, which is the starter for the WSGI application to run:
from httpbin import app as application
The module copy can be used to deploy that script on the host, while the target folder /var/www/httpbin must be set up before that by the module file. In addition to that, a separate user account like httpbin (the name is also arbitrary but picked up in the other config file) is needed to run it, and the module user can set this up. The demonstration playbook continues, and the plays which perform these three operations go like this:
   - name: mkdir /var/www/httpbin
     file: path=/var/www/httpbin state=directory
   - name: set up user "httpbin"
     user: name=httpbin
   - name: copy WSGI starter
     copy: src=httpbin.wsgi dest=/var/www/httpbin/httpbin.wsgi owner=httpbin group=httpbin mode=0644 
Another configuration file, httpbin.conf, is needed for Apache on the remote server to include the WSGI application httpbin running as a virtual host. It goes like this:
<VirtualHost *>
 WSGIDaemonProcess httpbin user=httpbin group=httpbin threads=5
 WSGIScriptAlias / /var/www/httpbin/httpbin.wsgi
 <Directory /var/www/httpbin>
  WSGIProcessGroup httpbin
  WSGIApplicationGroup %{GLOBAL}
  Order allow,deny
  Allow from all
 </Directory>
</VirtualHost>
This file needs to be copied into the folder /etc/apache2/sites-available on the host, which already exists when the apache2 package is installed. The remaining operations which are needed to get everything running together are: The default welcome screen of Apache blocks anything else and should be disabled with Apache's CLI tool a2dissite. After that, the new virtual host needs to be activated with the complementary tool a2ensite; both can be run remotely with the module command. Then the Apache server on the remote machine must be restarted to read in the new configuration. You've guessed it already: that's all easy to perform with Ansible:
   - name: deploy configuration script
     copy: src=httpbin.conf dest=/etc/apache2/sites-available owner=root group=root mode=0644
   - name: deactivate default welcome screen
     command: a2dissite 000-default.conf
     
   - name: activate httpbin virtual host
     command: a2ensite httpbin.conf
   - name: restart Apache
     service: name=apache2 state=restarted 
That's it. After this playbook has been run completely by Ansible against one (or several) freshly set up remote Debian base installations, the httpbin request server is available running on the Apache web server and can be queried from anywhere with a web browser, or for example with curl:
$ curl http://192.0.2.0/user-agent
{
  "user-agent": "curl/7.50.1"
}
With the broad set of Ansible modules and the playbooks a lot of tasks can be accomplished, like the example problem which has been explained here. But the range of functions of Ansible is even more comprehensive; to discuss that would have gone beyond the scope of this blog post. For example, playbooks offer more advanced features like event handlers, which can be used for recurring operations like the restart of Apache in more extensive projects. And beyond playbooks, templates can be set up in roles which behave differently on selected machine groups; Ansible uses Jinja2 as its template engine for that. And the scope of functions of the basic modules can be expanded by employing external tools. To drop a word on why it could be useful in certain situations to run one's own instances of the httpbin request server instead of using the official ones which are provided on the net by Runscope: some people would prefer to run a private instance, for example in the local network, instead of querying the one on the internet. Or, for development reasons, a couple or even a large number of identical instances might be needed; Ansible is ideal for setting them up automatically. Anyway, the Javascript bindings to tracking services like Google Analytics in httpbin/templates/trackingscripts.html are patched out in the Debian package. That could be another reason to prefer setting up one's own instance on a Debian server.

15 February 2017

Daniel Stender: APT programming snippets for Debian system maintenance

The Python API for the Debian package manager APT is useful for writing practical system maintenance scripts which go beyond shell scripting capabilities. There are Python 2 and Python 3 libraries for that available as packages, as well as documentation in the package python-apt-doc. If that's also installed, the documentation can be found in /usr/share/doc/python-apt-doc/html/index.html, and there are also a couple of example scripts shipped in /usr/share/doc/python-apt-doc/examples. The libraries mainly consist of Python bindings for the libapt-inst and libapt-pkg C++ core libraries of the APT package manager, which makes processing very fast. Debugging symbols are also available as packages (python{,3}-apt-dbg). The module apt_inst provides features like reading from binary packages, while apt_pkg resembles the functions of the package manager. There is also the apt abstraction layer which provides more convenient access to the library; for example, apt.cache.Cache() can be used to behave like apt-get:
from apt.cache import Cache
mycache = Cache()
mycache.update()                   # apt-get update
mycache.open()                     # re-open
mycache.upgrade(dist_upgrade=True) # apt-get dist-upgrade
mycache.commit()                   # apply
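The same abstraction layer also exposes per-package operations; a small sketch (of mine, using htop as a hypothetical example package) of marking a single package for installation could look like this:
from apt.cache import Cache

mycache = Cache()
pkg = mycache['htop']       # hypothetical example package
if not pkg.is_installed:
    pkg.mark_install()      # queue it, like apt-get install would
    mycache.commit()        # apply the change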

boil out selections As widely known, there is a feature of dpkg which helps to move a package inventory from one installation to another by just using a text file with a list of installed packages. A selections list containing all installed packages can easily be generated with $ dpkg --get-selections > selections.txt. The resulting file then looks something like this:
$ cat selections.txt
0ad                                 install
0ad-data                            install
0ad-data-common                     install
a2ps                                install
abi-compliance-checker              install
abi-dumper                          install
abigail-tools                       install
accountsservice                     install
acl                                 install
acpi                                install
The counterpart of this operation (--set-selections) can be used to reinstall (add) the complete package inventory on another installation or computer (that needs superuser rights), as explained in the manpage dpkg(1). No problem so far. The problem is: if that list contains a package which can't be found in any of the package inventories set up in /etc/apt/sources.list(.d/) on the target system, dpkg stops the whole process:
# dpkg --set-selections < selections.txt
dpkg: warning: package not in database at line 524: google-chrome-beta
dpkg: warning: found unknown packages; this might mean the available database
is outdated, and needs to be updated through a frontend method
Thus, manually downloaded and installed "wild" packages from unofficial package sources are problematic for this approach, because the package installer simply doesn't know where to get them. Luckily, dpkg puts out the relevant package names, but instead of removing them manually with an editor, this little Python script for python3-apt automatically deletes any of these packages from a selections file:
#!/usr/bin/env python3
import sys
import apt_pkg
apt_pkg.init()
cache = apt_pkg.Cache()
infile = open(sys.argv[1])
outfile_name = sys.argv[1] + '.boiled'
outfile = open(outfile_name, "w")
for line in infile:
    package = line.split()[0]
    if package in cache:
        outfile.write(line)
infile.close()
outfile.close()
sys.exit(0)
The script takes one argument, which is the name of the selections file which has been generated by dpkg. The low-level module apt_pkg first has to be initialized with apt_pkg.init(). Then apt_pkg.Cache() can be used to instantiate a cache object (here: cache). That object is iterable, so it's easy to skip a package from that list if it can't be found in the database, by not copying the corresponding line into the outfile (.boiled), while the others are copied. The result then looks something like this:
$ diff selections.txt selections.txt.boiled 
3780d3779
< python-timemachine   install
4438d4436
< wlan-supercracker    install
That script might also be useful for moving from one distribution or derivative to another (like from Ubuntu to Debian). For production use, open() should of course be secured against FileNotFoundError and IOError to prevent program crashes on such events.
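A hardened variant of the filter along those lines could look like this (just a sketch of mine, not taken from the original post):
#!/usr/bin/env python3
import sys
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()

try:
    # the context managers close both files even if something goes wrong in between
    with open(sys.argv[1]) as infile, \
         open(sys.argv[1] + '.boiled', 'w') as outfile:
        for line in infile:
            package = line.split()[0]
            if package in cache:
                outfile.write(line)
except (FileNotFoundError, IOError) as error:
    sys.exit("could not process {}: {}".format(sys.argv[1], error))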

purge rc-s As is also widely known, removed packages leave stuff like configuration files, maintainer scripts and logs on the computer, to preserve them in case the package gets reinstalled at some point in the future. That happens if dpkg has been used with -r/--remove instead of -P/--purge, which also removes these files which are otherwise left behind. Such packages are then marked as rc in the package database, like:
$ dpkg -l | grep ^rc
rc  firebird2.5-common          2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers and clients
rc  firebird2.5-server-common   2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers
rc  firebird3.0-common          3.0.1.32609.ds4-8   all     common files for firebird 3.0 server, client and utilities
rc  imagemagick-common          8:6.9.6.2+dfsg-2    all     image manipulation programs -- infrastructure dummy package
They can be purged afterwards to completely remove them from the system. There are several shell snippets to be found on the net for completing this job automatically, like this one here:
dpkg -l | grep "^rc" | sed -e "s/^rc //" -e "s/ .*$//" | \
xargs dpkg --purge
The first thing needed to handle this with a Python script is the information that in apt_pkg the package state rc is by default represented by the code 5:
>>> testpackage = cache['firebird2.5-common']
>>> testpackage.current_state
5
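As an aside (my addition, not from the original post): apt_pkg also seems to ship named constants for these states, so the magic number can presumably be avoided:
>>> apt_pkg.CURSTATE_CONFIG_FILES   # presumably the named constant behind state "rc"
5
>>> testpackage.current_state == apt_pkg.CURSTATE_CONFIG_FILES
True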
For changing things in the database, apt_pkg.DepCache() can be docked onto a cache object to manipulate the installation state of a package within it, like marking it to be removed or purged:
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> mydepcache.mark_delete(testpackage, True) # True = purge
>>> mydepcache.marked_delete(testpackage)
True
That's basically all that is needed for a maintenance script in Python 3 which purges old packages; add another iterator as package filter and there you go:
#!/usr/bin/env python3
import sys
import apt_pkg
from apt.progress.text import AcquireProgress
from apt.progress.base import InstallProgress
acquire = AcquireProgress()
install = InstallProgress()
apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)
for package in cache.packages:
    if package.current_state == 5:
        depcache.mark_delete(package, True)
depcache.commit(acquire, install)
The method DepCache.commit() applies the changes to the package database at the end, and it needs the progress objects from apt.progress to run. Of course this script needs superuser rights to run. It then returns something like this:
$ sudo ./rc-purge 
Reading package lists... Done
Building dependency tree
Reading state information... Done
Fetched 0 B in 0s (0 B/s)
custom fork found
got pid: 17984
got pid: 0
got fd: 4
(Reading database ... 701434 files and directories currently installed.)
Purging configuration files for libmimic0:amd64 (1.0.4-2.3) ...
Purging configuration files for libadns1 (1.5.0~rc1-1) ...
Purging configuration files for libreoffice-sdbc-firebird (1:5.2.2~rc2-2) ...
Purging configuration files for vlc-nox (2.2.4-7) ...
Purging configuration files for librlog5v5 (1.4-4) ...
Purging configuration files for firebird3.0-common (3.0.1.32609.ds4-8) ...
Purging configuration files for imagemagick-common (8:6.9.6.2+dfsg-2) ...
Purging configuration files for firebird2.5-server-common (2.5.6.27020.ds4-3)
It's not yet production ready (for example, there's an infinite loop if dpkg returns error code 1, like from "can't remove non-empty folder"). But generally, ATTENTION: be very careful with typos and other mistakes if you want to use that code snippet; a faulty script performing changes on the package database might destroy the integrity of your system, and you don't want that to happen.

detect wild packages As said above, installed Debian packages might be called "wild" if they have been downloaded from somewhere on the net and installed manually, as is done from time to time on many systems. If you want to remove that whole class of packages again for any reason, the question is how to detect them. A characteristic element is that there is no package source connected to such a package, and that can be detected with Python scripting, again using the bindings for the APT libraries. The package object doesn't have an associated method to query its source, because the origin is always connected to a specific package version, as some specific version might have come from security updates, for example. The candidate version of a package can be queried with DepCache.get_candidate_ver(), which returns a complex apt_pkg.Version object:
>>> import apt_pkg
>>> apt_pkg.init()
>>> mycache = apt_pkg.Cache()
Reading package lists... Done
Building dependency tree
Reading state information... Done
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> testpackage = mydepcache.get_candidate_ver(mycache['nano'])
>>> testpackage
<apt_pkg.Version object: Pkg:'nano' Ver:'2.7.4-1' Section:'editors'  Arch:'amd64' Size:484790 ISize:2092032 Hash:33578 ID:31706 Priority:2>
For version objects the attribute file_list is available, which contains a list of tuples with PackageFile() objects:
>>> testpackage.file_list
[(<apt_pkg.PackageFile object: filename:'/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages'  a=testing,c=main,v=,o=Debian,l=Debian arch='amd64' site='httpredir.debian.org' IndexType='Debian Package Index' Size=38943764 ID:0>, 669901L)]
These file objects point to the index files which are associated with a specific package source (a downloaded package index); they can be read out easily (using a for loop because there could be multiple file objects):
>>> for files in testpackage.file_list:
...     print(files[0].filename)
/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages
That explains itself: the nano binary package on this amd64 computer comes from httpredir.debian.org/debian testing main. If a package is wild, meaning it was installed manually, there is no associated index file to be found, but only /var/lib/dpkg/status (libcudnn5 is not in the official package archives but distributed by Nvidia as a .deb package):
>>> testpackage2 = mydepcache.get_candidate_ver(mycache['libcudnn5'])
>>> for files in testpackage2.file_list:
...     print(files[0].filename)
/var/lib/dpkg/status
The simple trick now is to find all packages which have only /var/lib/dpkg/status as associated system file (that doesn't refer to what packages contain), and not an index file representing a package source. There's a little pitfall: that's also true for virtual packages. But virtual packages commonly don't have an associated version (python-apt docs: "to check whether a package is virtual; that is, it has no versions and is provided at least once"), and that can be queried with Package.has_versions. A filter to find any packages that aren't virtual packages, are solely associated with one system file, and where that file is /var/lib/dpkg/status, then goes like this:
for package in mycache.packages:
    if package.has_versions:
        version = mydepcache.get_candidate_ver(package)
        if len(version.file_list) == 1:
            if 'dpkg/status' in version.file_list[0][0].filename:
                print(package.name)
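Put together as a stand-alone script (a sketch of mine along the lines of the snippets above), this filter could look like this:
#!/usr/bin/env python3
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)

for package in cache.packages:
    if not package.has_versions:
        continue                      # skip virtual packages
    version = depcache.get_candidate_ver(package)
    if len(version.file_list) == 1 and \
            'dpkg/status' in version.file_list[0][0].filename:
        print(package.name)           # candidate for a "wild" package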
On my Debian testing system this puts out a quite interesting list. It lists all the wild packages like libcudnn5, but also packages which are currently not in testing because they have been temporarily removed by AUTORM due to release critical bugs. Then there's all the obsolete stuff which has been installed from the package archives once and then forgotten, like old kernel header packages ("obsolete packages" in dselect). So this snippet brings up other stuff, too; thus, it's rather experimental so far.

6 February 2017

Daniel Stender: Howto create a Debian 9 preview as Vagrant box with Packer

I've got some little scripts and a template here to automatically create Vagrant boxes from cutting-edge Debian testing daily snapshots (netinstall ISO image) using HashiCorp's Packer. To create Vagrant boxes with these, you first need a running binary of Packer. There is a Debian package available if that's also your working environment, but Packer is only going to be introduced into the stable branch with the upcoming Stretch release. However, Ubuntu already has it, and some other derivatives, too. And there are prebuilt binaries available from the developer's site which run fine out of the box (you just have to put the single binary somewhere into your $PATH, or expand that to find it). The JSON template should run with any Packer which is available for any of the different systems. Vagrant itself isn't needed to build the box with Packer, but Virtualbox is of course needed to pre-bake the machine image within a virtual machine. In Debian the base binaries of Virtualbox are in the contrib archive section, so that source might have to be added to /etc/apt/sources.list, if it hasn't been already. The scripts have been tested to run with 5.1.10, and I haven't seen that any later versions are demanded in particular, but of course heavily outdated versions might not work properly. Packer installs the guest additions ISO file for Virtualbox into the virtual machine (and the shipped provisioning script then builds them inside). For that, the Debian package which ships that ISO (which is in non-free) is recognized if it is installed, and can then be used by Packer. When the ISO isn't available in the places which Packer checks on the host, the builder automatically downloads the corresponding ISO from http://download.virtualbox.org/virtualbox. When the tarball with the scripts is unpacked, just do make create and the process should run through, if Packer and Virtualbox are available. If your environment has neither GNU Make nor wget you might want to copy the relevant lines from the Makefile and run them manually. But if it does, just do it like this:
/tmp/debian-testing-vagrantbox$ make create
virtualbox-iso output will be in this color.
==> virtualbox-iso: Downloading or copying Guest additions
    virtualbox-iso: Downloading or copying: file:///usr/share/virtualbox/VBoxGuestAdditions.iso
==> virtualbox-iso: Downloading or copying ISO
    virtualbox-iso: Downloading or copying: http://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/amd64/iso-cd/debian-testing-amd64-netinst.iso
    virtualbox-iso: Download progress: 10%
 ... 
    virtualbox-iso: Download progress: 96%
==> virtualbox-iso: Starting HTTP server on port 8219
==> virtualbox-iso: Creating virtual machine...
==> virtualbox-iso: Creating hard drive...
==> virtualbox-iso: Creating forwarded port mapping for communicator (SSH, WinRM, etc) (host port 2885)
==> virtualbox-iso: Starting the virtual machine...
==> virtualbox-iso: Waiting 10s for boot...
==> virtualbox-iso: Typing the boot command...
==> virtualbox-iso: Waiting for SSH to become available...
The Virtualbox window then pops up and the build process continues within the virtual machine for a while. Please file a Github issue if there's a problem on your machine (and include the tail of your packer.log)! The Packer template (debian-testing-vagrant.json) is described in the documentation of the virtualbox-iso builder. A preseeding configuration for the Debian Installer (preseed.cfg) is also included, which gets injected into the virtual build environment during the build process. The creation progress of the Debian base installation can easily be monitored since the Virtualbox window is fully shown during the Packer run (if you lose your mouse pointer by clicking inside that window, press the right Control key to escape). For good performance, a fast internet connection is needed since a whole base system must be downloaded; if that's available, the whole automated process only takes a couple of minutes to complete, even on a non-SSD machine. When Packer has finished and a fresh box has been created (the size is about 690 MB), it can then be used with Vagrant. Just add the new box with:
/tmp/debian-testing-vagrantbox$ vagrant box add stretch-preview debian-testing-vagrant.box
==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'stretch-preview' (v0) for provider: 
    box: Unpacking necessary files from: file:///tmp/debian-testing-vagrantbox/debian-testing-vagrant.box
==> box: Successfully added box 'stretch-preview' (v0) for 'virtualbox'!
It can then be initialized within any working directory with:
/tmp/myproject$ vagrant init stretch-preview
A  Vagrantfile  has been placed in this directory. You are now
ready to  vagrant up  your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
 vagrantup.com  for more information on using Vagrant.
After that, you can launch the virtual machine with:
/tmp/myproject$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'stretch-preview'...
==> default: Matching MAC address for NAT networking...
==> default: Setting the name of the VM: myproject_default_1486321215067_75270
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: 
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default: 
    default: Inserting generated public key within guest...
    default: Removing insecure key from the guest if it's present...
    default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /tmp/myproject
/tmp/myproject$
Then you can SSH into it (touch is used here only to demonstrate the shared folder):
/tmp/myproject$ touch hello!
/tmp/myproject$ vagrant ssh
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
/usr/bin/xauth:  file /home/vagrant/.Xauthority does not exist
vagrant@packer-virtualbox-iso-1486319595:~$ cat /etc/debian_version
9.0
vagrant@packer-virtualbox-iso-1486319595:~$ ls /vagrant/
hello!  Vagrantfile
vagrant@packer-virtualbox-iso-1486319595:~$ exit
logout
Connection to 127.0.0.1 closed.
/tmp/myproject$ vagrant halt
==> default: Attempting graceful shutdown of VM...
/tmp/myproject$
If you haven't worked with Vagrant before, maybe this is appealing. The user experience is somewhat different from working with a chroot. Packer makes it very convenient to keep freshly created boxes coming. Have fun!

18 December 2016

Daniel Stender: How to cheat setuptools-scm (Debian diary)

[2016-12-19: some additions] This is another little issue from Python packaging for Debian which I came across lately while packaging the compressed NumPy-based data container Bcolz. Upstream uses setuptools-scm to determine the software's version at build time from the source code management environment the code is in. This method is convenient for upstream development because the version number then doesn't need to be hard-coded, and often people just forget to update it (and other version-carrying files like doc/conf.py) when a new version of a project is released. setuptools-scm just needs to be hooked into setup.py to do its job, and in Bcolz the code goes like this:
setup(
    name="bcolz",
    use_scm_version={
        'version_scheme': 'guess-next-dev',
        'local_scheme': 'dirty-tag',
        'write_to': 'bcolz/version.py'
    },
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '1.1.0+ds1'
But that's not wanted, because this version has never been released, yet it appears everywhere:
$ pip list | grep bcolz
bcolz (1.1.0+ds1)
$ python3 -c 'import bcolz; bcolz.print_versions()'
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
bcolz version:     1.1.0+ds1
There are several different ways to fix this. The crowbar method (as we say in German) is to patch use_scm_version out of setup.py, but if you don't provide any version in exchange, the version number which Setuptools then uses is 0.0.0. The upstream version could be hard-coded into the patch, but then the maintainer must not forget to update it manually, which is not very convenient. Plus, setup.py could change and the patch then might need to be unfuzzed, thus more work. Bad. A patch can be spared by setting and exporting the SETUPTOOLS_SCM_PRETEND_VERSION environment variable for setuptools-scm in debian/rules, which is apparently done now and then, judging by the hits for that string on Debian Code Search. But how to avoid hard-coding the version number here? The dpkg-dev package (pulled in by build-essential) ships a Makefile snippet /usr/share/dpkg/pkg-info.mk which can be included in debian/rules. It defines several variables which are useful for packaging, like DEB_SOURCE, which contains the source package name; they are extracted from debian/changelog. DEB_VERSION_UPSTREAM, which is available through that, puts out the upstream version without epoch and Debian revision, but it doesn't get any finer grained out of the box. For a custom fix, a regular expression which removes the +... extensions (if present) from the bare upstream version string would be s/\+[^+]*//:
$ echo "1.1.0+ds1" | sed -e 's/\+[^+]*//'
1.1.0
$ echo "1.1.0" | sed -e 's/\+[^+]*//'
1.1.0
$ echo "1.1.0+dfsg12" | sed -e 's/\+[^+]*//'
1.1.0
With that, a custom variable VERSION_UPSTREAM can be set on top of DEB_VERSION_UPSTREAM (from pkg-info.mk) in debian/rules:
include /usr/share/dpkg/pkg-info.mk
VERSION_UPSTREAM = $(shell echo '$(DEB_VERSION_UPSTREAM)' | sed -e 's/\+[^+]*//')
export SETUPTOOLS_SCM_PRETEND_VERSION = $(VERSION_UPSTREAM)
Bam, that works (see the commit here):
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '1.1.0'
As an addition, I've seen that dh-python has also taken care of SETUPTOOLS_SCM_PRETEND_VERSION since 2.20160609. The environment variable is set by the Debhelper build system if python{3,}-setuptools-scm is among the build-dependencies in debian/control. The Perl code for that is in dh/pybuild.pm. I think the version number string above comes from dh-python's pretended version, and not from any of the Git tags (which are currently debian/1.1.0+ds1-1 and upstream/1.1.0+ds1).

  1. For Git in Debian packaging, e.g. see the DEP-14 proposal (Recommended layout for Git packaging repositories): http://dep.debian.net/deps/dep14/ [return]
  2. Following the scheme for package versions [epoch:]upstream_version[-debian-revision] [return]

21 November 2016

Reproducible builds folks: Reproducible Builds: week 82 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday November 13 and Saturday November 19 2016: Media coverage Elsewhere in Debian Documentation update Packages reviewed and fixed, and bugs filed Reviews of unreproducible packages 43 package reviews have been added, 4 have been updated and 12 have been removed in this week, adding to our knowledge about identified issues. 2 issue types have been updated: 4 issue types have been added: Weekly QA work During our reproducibility testing, some FTBFS bugs have been detected and reported by: strip-nondeterminism development disorderfs development debrebuild development debrebuild is a new tool proposed by HW42 and josch (see #774415: "From srebuild sbuild-wrapper to debrebuild"). debrepatch development debrepatch is a set of scripts that we're currently developing to make it easier to track unapplied patches. We have a lot of those and we're not always sure if they still work. The plan is to set up jobs to automatically apply old reproducibility patches to newer versions of packages and notify the right people if they don't apply and/or no longer make the package reproducible. debpatch is a component of debrepatch that applies debdiffs to Debian source packages. In other words, it is to debdiff(1) what patch(1) is to diff(1). It is a general tool that is not specific to Reproducible Builds. This week, Ximin Luo worked on making it more "production-ready" and will soon submit it for inclusion in devscripts. reprotest development Ximin Luo significantly improved reprotest, adding presets and auto-detection of which preset to use. One can now run e.g. reprotest auto . or reprotest auto $pkg_$ver.dsc instead of the long command lines that were needed before. He also made it easier to set up build dependencies inside the virtual server and made it possible to specify pre-build dependencies that reprotest itself needs to set up the variations. Previously one had to manually edit the virtual server to do that, which was not very usable to humans without an in-depth knowledge of the building process. These changes will be tested some more and then released in the near future as reprotest 0.4. tests.reproducible-builds.org Misc. This week's edition was written by Chris Lamb, Holger Levsen, Ximin Luo and reviewed by a bunch of Reproducible Builds folks on IRC.

30 August 2016

Daniel Stender: My work for Debian in August

Here's again a little list of my humble off-time contributions I'm happy to add to the large amount of work we're completing all together each month. Then there is one more "new in Debian" (meaning: "new in unstable") announcement. First, the uploads (a few of them are from July): New packages: Sponsored uploads: Requested resp. suggested for packaging: New in Debian: Lasagne (deep learning framework) Now that the mathematical expression compiler Theano is available in Debian, deep learning frameworks and toolkits which have been built on top of it can become available within Debian, too (like Blocks, mentioned before). Theano is a general computing engine in its own right which has been developed with a focus on machine learning and neural networks, and which features its own declarative tensor language. The toolkits which have been built upon it vary in how much they abstract the bare features of Theano, whether they are "thick" or "thin", so to say. When the abstraction gets higher you gain more end-user convenience, up to the level that the architectural components of neural networks are available for combination like in a Lego box, while the more complicated things which are going on "under the hood" (like how the networks are actually implemented) are hidden. The downside is that thick abstraction layers usually make it difficult to implement novel features like custom layers or loss functions. So more experienced users and specialists might seek out the lower-abstraction toolkits, where you have to think more in terms of Theano. I've got an initial package of Keras in experimental (1.0.7-1); it runs (only a Python 3 package is available so far) but needs some more work (e.g. building the documentation with mkdocs). Keras is a minimalistic, highly modular DNN library inspired by Torch1. It has a clean, rather easy API for experimenting and fast prototyping. It can also run on top of Google's TensorFlow, and we're going to have it ready for that, too. Lasagne follows a different approach. It's, like Keras and Blocks, a Python library to create and train multi-layered artificial neural networks in/on Theano for applications like image recognition or classification, speech recognition, image caption generation or other purposes like style transfers from paintings to pictures2. It abstracts Theano as little as possible, and could be seen more as an extension or an add-on than as an abstraction3. Therefore, knowledge of how things work in Theano is needed to make full use of this piece of software. With the new Debian package (0.1+git20160728.8b66737-1)4, the whole required software stack (the corresponding Theano package, NumPy, SciPy, a BLAS implementation, and the nvidia-cuda-toolkit and NVIDIA kernel driver to carry out computations on the GPU5) can be installed most conveniently by a single apt-get install python{,3}-lasagne command6, if wanted with the documentation package lasagne-doc for offline use (no running around at remote airports looking for a WiFi spot), either in the Python 2 or the Python 3 flavour, or both altogether7. While others have to spend a whole weekend gathering, compiling and installing the needed libraries, you can grab yourself a fresh cup of coffee. These are the advantages of a fully integrated system (subliminal message, as always: desktop users, switch to Linux!). When the installation of packages has completed, the MNIST example of Lasagne can be used for a quick check whether the whole library stack works properly8:
$ THEANO_FLAGS=device=gpu,floatX=float32 python /usr/share/doc/python-lasagne/examples/mnist.py mlp 5
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN 5005)
Loading data...
Downloading train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz
Downloading t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz
Building model and compiling functions...
Starting training...
Epoch 1 of 5 took 2.488s
  training loss:        1.217167
  validation loss:      0.407390
  validation accuracy:      88.79 %
Epoch 2 of 5 took 2.460s
  training loss:        0.568058
  validation loss:      0.306875
  validation accuracy:      91.31 %
The example of how to train a neural network on the MNIST database of handwritten digits is refined (it also provides --help) and explained in detail in the Tutorial section of the documentation in /usr/share/doc/lasagne-doc/html/. Very good starting points are also the IPython notebooks that are available from the tutorials by Eben Olson9 and Geoffrey French at PyData London 201610. There you have Theano basics, examples of employing convolutional neural networks (CNN) and recurrent neural networks (RNN) for a range of different purposes, how to use pre-trained networks for image recognition, etc.
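To give a rough impression of Lasagne's thin abstraction level, a minimal network definition might look like this (just a sketch of mine, not taken from the package's examples):
import theano.tensor as T
import lasagne

# symbolic input and a tiny MLP: input -> dense ReLU layer -> softmax output
input_var = T.matrix('inputs')
l_in = lasagne.layers.InputLayer(shape=(None, 784), input_var=input_var)
l_hidden = lasagne.layers.DenseLayer(l_in, num_units=500,
                                     nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hidden, num_units=10,
                                  nonlinearity=lasagne.nonlinearities.softmax)
prediction = lasagne.layers.get_output(l_out)   # a Theano expression, still to be compiled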

  1. For a quick comparison of Keras and Lasagne with other toolkits, see Alex Rubinsteyn's PyData NYC 2015 presentation on using LSTM (long short term memory) networks on varying length sequence data like Grimm's fairy tales (https://www.youtube.com/watch?v=E92jDCmJNek 27:30 sq.)
  2. https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer
  3. Great introduction to Theano and Lasagne by Eben Olson on the PyData NYC 2015: https://www.youtube.com/watch?v=dtGhSE1PFh0
  4. The package is currently "freelancing" in collab-maint; setting up a deep learning packaging team within Debian is at the stage of discussion.
  5. Only available for amd64 and ppc64el.
  6. You would need "testing" as a package source in /etc/apt/sources.list to install it from the archive at the present time (I have had that for years, but whether Debian testing can be recommended as a production system is to be discussed elsewhere); it's coming up for Debian 9, though. The cuda-toolkit and pycuda are in the non-free section of the archive, thus non-free (mostly used in combination with contrib) must be added alongside main. Plus, it's a mere suggestion of the Theano packages to keep Theano in main, so --install-suggests is needed to pull it automatically with the same command, or it must be given explicitly.
  7. For dealing with Theano in Debian, see this previous blog posting
  8. As suggested in the guideline From Zero to Lasagne on Ubuntu 14.04. cuDNN isn't available as an official Debian package yet, but can be downloaded as a .deb package after registration at https://developer.nvidia.com/cudnn. It integrates well out of the box.
  9. https://github.com/ebenolson/pydata2015
  10. https://github.com/Britefury/deep-learning-tutorial-pydata2016, video: https://www.youtube.com/watch?v=DlNR1MrK4qE

28 August 2016

Reproducible builds folks: Reproducible builds: week 70 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday August 21 and Saturday August 27 2016: GSoC and Outreachy updates Packages reviewed and fixed, and bugs filed Reviews of unreproducible packages 10 package reviews have been added and 6 have been updated this week, adding to our knowledge about identified issues. A large number of issue types have been updated: Weekly QA work 29 FTBFS bugs have been reported by: diffoscope development Holger also created another test job for diffoscope on jenkins.debian.net, so that now also all commits to branches other than master are being tested. strip-nondeterminism development strip-nondeterminism 0.023-1 was uploaded by Chris Lamb:
 * Support Android .apk files with the JAR normalizer.
 * handlers/png.pm: Drop unused Archive::Zip import
 * Remove hyphen from non-determinism and non-deterministic.
 * javaproperties.pm: Match more styles of .properties and loosen filename matching.
 * Improve tests:
   - Make fixture runner generic to all normalizer types.
   - Replace (single) pearregistry test with a fixture.
   - Set a canonical time for fixture tests.
   - Add gzip testcase fixture.
   - Replace t/javadoc.t with fixture
   - Replace t/ar.t with a fixture.
   - t/javaproperties: move pom.properties and version.properties tests to fixtures
   - t/fixtures.t: move to using subtests
   - t/fixtures.t: Explicitly test that we can find a normalizer
   - t/fixtures.t: Don't run normalizer if we didn't find one.
strip-nondeterminism 0.023-2 uploaded by Mattia Rizzolo to allow stderr in autopkgtest. disorderfs development tests.reproducible-builds.org Debian: Somewhat related to reproducible builds there has been a first Debian jenkins team maintenance meeting on the #debian-qa IRC channel, to discuss current issues with the setup and to start the work of migrating jenkins.debian.net to jenkins.debian.org. The next meeting will take place on September 28th 2016 at 19 UTC. Misc. This week's edition was written by Chris Lamb and Holger Levsen and reviewed by a bunch of Reproducible Builds folks on IRC.

23 August 2016

Reproducible builds folks: Reproducible Builds: week 69 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday August 14 and Saturday August 20 2016: Fasten your seatbelts Important note: we enabled build path variation for unstable now, so your package(s) might become unreproducible; while previously they were said to be reproducible given a specific build path (they probably still are reproducible), read on for the details below in the tests.reproducible-builds.org section! As said many times: this is still research and we are working to make it reality. Media coverage Daniel Stender blogged about Python packaging and explained some caveats regarding reproducible builds. Toolchain developments Thomas Schmitt uploaded xorriso which now obeys SOURCE_DATE_EPOCH. As stated in its man pages:
ENVIRONMENT
[...]
SOURCE_DATE_EPOCH  belongs to the specs of reproducible-builds.org.  It
is supposed to be either undefined or to contain a decimal number which
tells the seconds since january 1st 1970. If it contains a number, then
it is used as time value to set the  default  of  --modification-date=,
--gpt_disk_guid,  and  --set_all_file_dates.  Startup files and program
options can override the effect of SOURCE_DATE_EPOCH.
Packages reviewed and fixed, and bugs filed The following packages have become reproducible after being fixed: The following updated packages appear to be reproducible now, for reasons we were not able to figure out. (Relevant changelogs did not mention reproducible builds.) The following 2 packages were not changed, but have become reproducible due to changes in their build-dependencies: tagsoup and tclx8.4. Some uploads have addressed some reproducibility issues, but not all of them: Patches submitted that have not made their way to the archive yet: Bug tracker housekeeping: Reviews of unreproducible packages 55 package reviews have been added, 161 have been updated and 136 have been removed in this week, adding to our knowledge about identified issues. 2 issue types have been updated: Weekly QA work FTBFS bugs have been reported by: diffoscope development Chris Lamb, Holger Levsen and Mattia Rizzolo worked on diffoscope this week. Improvements were made to SquashFS and JSON comparison, the https://try.diffoscope.org/ web service, documentation, packaging, and general code quality. diffoscope 57, 58, and 59 were uploaded to unstable by Chris Lamb. Versions 57 and 58 were both broken, so Holger set up a job on jenkins.debian.net to test diffoscope on each git commit. He also wrote a CONTRIBUTING document to help prevent this from happening in future. From these efforts, we were also able to learn that diffoscope is now reproducible even when built across multiple architectures:
< h01ger>   https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope.html shows these packages were built on amd64:
< h01ger>    bd21db708fe91c01ba1c9cb35b9d41a7c9b0db2b 62288 diffoscope_59_all.deb
< h01ger>    366200bf2841136a4c8f8c30bdc87057d59a4cdd 20146 trydiffoscope_59_all.deb
< h01ger>   and on i386:
< h01ger>    bd21db708fe91c01ba1c9cb35b9d41a7c9b0db2b 62288 diffoscope_59_all.deb
< h01ger>    366200bf2841136a4c8f8c30bdc87057d59a4cdd 20146 trydiffoscope_59_all.deb
< h01ger>   and on armhf:
< h01ger>    bd21db708fe91c01ba1c9cb35b9d41a7c9b0db2b 62288 diffoscope_59_all.deb
< h01ger>    366200bf2841136a4c8f8c30bdc87057d59a4cdd 20146 trydiffoscope_59_all.deb
And those also match the binaries uploaded by Chris in his diffoscope 59 binary upload to ftp.debian.org, yay! Eating our own dogfood and enjoying it! tests.reproducible-builds.org Debian related: The last change probably will have an impact you will see: your package might become unreproducible in unstable and this will be shown on tracker.debian.org, while it will still be reproducible in testing. We've done this because we think reproducible builds are possible with arbitrary build paths. But: we don't think those are a realistic goal for stretch, where we still recommend to use .buildinfo to record the build path and then do rebuilds using that path. We are doing this because besides doing theoretical groundwork we also have a practical goal: enable users to independently verify builds. And if they only can do this with a fixed path, so be it. For now :) To be clear: for Stretch we recommend that reproducible builds are done in the same build path as the "original" build. Finally, and just for our future reference, when we enabled build path variation on Saturday, August 20th 2016, the numbers for unstable were:
suite all reproducible unreproducible ftbfs depwait not for this arch blacklisted
unstable/amd64 24693 21794 (88.2%) 1753 (7.1%) 972 (3.9%) 65 (0.2%) 95 (0.3%) 10 (0.0%)
unstable/i386 24693 21182 (85.7%) 2349 (9.5%) 972 (3.9%) 76 (0.3%) 103 (0.4%) 10 (0.0%)
unstable/armhf 24693 20889 (84.6%) 2050 (8.3%) 1126 (4.5%) 199 (0.8%) 296 (1.1%) 129 (0.5%)
Misc. Ximin Luo updated our git setup scripts to make it easier for people to write proper descriptions for our repositories. This week's edition was written by Ximin Luo and Holger Levsen and reviewed by a bunch of Reproducible Builds folks on IRC.

20 August 2016

Daniel Stender: Collected notes from Python packaging

Here are some collected notes on some particular problems from packaging Python stuff for Debian, and more like this are coming up in the future. Some of the issues discussed here might be rather simple and even benign for the experienced packager, but maybe this is helpful for people coming across the same issues for the first time, wondering what's going wrong. But some of the things discussed aren't easy. Here are the notes for this posting, in no particular order. UnicodeDecodeError on open() in Python 3 running in non-UTF-8 environments I've come across this problem again recently while packaging httpbin 0.5.0. The build breaks in the following way, e.g. when trying to build with sbuild in a chroot; that's the first run of setup.py with the default Python 3 interpreter:
I: pybuild base:184: python3.5 setup.py clean 
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2386: ordinal not in range(128)
E: pybuild pybuild:274: clean: plugin distutils failed with: exit code=1: python3.5 setup.py clean 
One comes across UnicodeDecodeErrors quite often on different occasions, mostly related to Python 2 packaging (like here). Here the case is that in setup.py the long_description is read from the opened UTF-8 encoded file README.rst:
long_description = open(
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()
This is a problem for Python 3.5 (and other Python 3 versions) when setup.py is executed by an interpreter run in a non-UTF-8 environment1:
$ LANG=C.UTF-8 python3.5 setup.py clean
running clean
$ LANG=C python3.5 setup.py clean
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2386: ordinal not in range(128)
$ LANG=C python2.7 setup.py clean
running clean
The reason for this error is that the default encoding of the file object returned by open(), e.g. in Python 3.5, is platform dependent, so that read() fails on it when there's a mismatch:
>>> readme = open('README.rst')
>>> readme
<_io.TextIOWrapper name='README.rst' mode='r' encoding='ANSI_X3.4-1968'>
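What that default comes from can be checked with the locale module; the following little sketch (not from the original posting) shows that open() without an explicit encoding follows the locale's preferred encoding:
import locale

# open() without an explicit encoding uses the locale's preferred encoding;
# under LANG=C this typically reports ANSI_X3.4-1968 (ASCII), under a UTF-8
# locale it reports UTF-8.
print(locale.getpreferredencoding(False))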
Non-UTF-8 build environments, in which $LANG isn't set at all or is set to C, are common or even the default in Debian packaging; for example, the primary environment of the continuous integration resp. test building for reproducible builds features that (see here). That's also true for the base system of the sbuild environment:
$ schroot -d / -c unstable-amd64-sbuild -u root
(unstable-amd64-sbuild)root@varuna:/# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
A problem like this is most elegantly solved by installing some workaround in debian/rules. A quick and easy fix is to add export LC_ALL=C.UTF-8 there, which supersedes the locale settings of the build environment. $LC_ALL is commonly used to change the existing locale settings; it overrides all other locale variables with the same value (see here). C.UTF-8 is a UTF-8 locale which is available by default in a base system; it can be used without installing the locales package (or worse, the huge locales-all package):
(unstable-amd64-sbuild)root@varuna:/# locale -a
C
C.UTF-8
POSIX
This problem ideally should be taken care of upstream. In Python 3, the built-in open() is io.open(), for which a specific encoding can be given, so that the UnicodeDecodeError disappears. The built-in open() in Python 2 doesn't support an encoding argument, but Python 2 has io.open(), too. A fix which works for both Python branches goes like this:
import io
long_description = io.open(
    os.path.join(os.path.dirname(__file__), 'README.rst'), encoding='utf-8').read()
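If only Python 3 had to be supported, the encoding could also be passed straight to the built-in open(); this variant (not in the original setup.py, just for illustration) is equivalent to the io.open() fix above:
# Python 3 only: the built-in open() accepts an encoding argument.
long_description = open(
    os.path.join(os.path.dirname(__file__), 'README.rst'),
    encoding='utf-8').read()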
non-deterministic order of requirements in egg-info/requires.txt This problem appeared in prospector/0.11.7-5 in the reproducible builds test builds, which was the first package revision of Prospector running on Python 32. It was revealed that there were differences in the sorting order of the [with_everything] dependencies resp. requirements in prospector-0.11.7.egg-info/requires.txt if the package was built on varying systems:
$ debbindiff b1/prospector_0.11.7-5_amd64.changes b2/prospector_0.11.7-5_amd64.changes 
 ... 
  prospector_0.11.7-5_all.deb
      file list
      @@ -1,3 +1,3 @@
       -rw-r--r--   0        0        0        4 2016-04-01 20:01:56.000000 debian-binary
      --rw-r--r--   0        0        0     4343 2016-04-01 20:01:56.000000 control.tar.gz
      +-rw-r--r--   0        0        0     4344 2016-04-01 20:01:56.000000 control.tar.gz
       -rw-r--r--   0        0        0    74512 2016-04-01 20:01:56.000000 data.tar.xz
      control.tar.gz
          control.tar
              ./md5sums
                  md5sums
                  Files in package differ
      data.tar.xz
          data.tar
              ./usr/share/prospector/prospector-0.11.7.egg-info/requires.txt
              @@ -1,12 +1,12 @@
               
               [with_everything]
              +pyroma>=1.6,<2.0
               frosted>=1.4.1
               vulture>=0.6
              -pyroma>=1.6,<2.0
The Reproducible Builds folks recognized this to be a toolchain problem and set up the issue randomness_in_python_setuptools_requires.txt to cover it. Plus, a wishlist bug against python-setuptools was filed to fix this. The patch which was provided by Chris Lamb adds sorting of the dependencies in requires.txt for Setuptools by applying sorted() (stdlib) in _write_requirements() in command/egg_info.py:
--- a/setuptools/command/egg_info.py
+++ b/setuptools/command/egg_info.py
@@ -406,7 +406,7 @@ def warn_depends_obsolete(cmd, basename, filename):
 def _write_requirements(stream, reqs):
     lines = yield_lines(reqs or ())
     append_cr = lambda line: line + '\n'
-    lines = map(append_cr, lines)
+    lines = map(append_cr, sorted(lines))
     stream.writelines(lines)
OK. In the toolchain, nothing sorts these requirements alphanumerically to make the differences disappear, but what is the reason for these differences in the Prospector packages in the first place? The problem is somewhat subtle. In setup.py, [with_everything] is a dictionary entry of _OPTIONAL (which is used for extras_require) that is created by a list comprehension out of the other values in that dictionary. The code goes like this:
_OPTIONAL = {
    'with_frosted': ('frosted>=1.4.1',),
    'with_vulture': ('vulture>=0.6',),
    'with_pyroma': ('pyroma>=1.6,<2.0',),
    'with_pep257': (),  # note: this is no longer optional, so this option will be removed in a future release
}
_OPTIONAL['with_everything'] = [req for req_list in _OPTIONAL.values() for req in req_list]
The result, the new _OPTIONAL dictionary including the key with_everything (which w/o further sorting is the source of the list of requirements requires.txt) e.g. looks like this (code snipped run through in my IPython):
In [3]: _OPTIONAL
Out[3]: 
{'with_everything': ['vulture>=0.6', 'pyroma>=1.6,<2.0', 'frosted>=1.4.1'],
 'with_vulture': ('vulture>=0.6',),
 'with_pyroma': ('pyroma>=1.6,<2.0',),
 'with_frosted': ('frosted>=1.4.1',),
 'with_pep257': ()}
That list comprehension iterates over the other dictionary entries to gather the value of with_everything, and Python programmers know that dictionaries are unordered: the order of the key-value pairs isn't fixed, but depends on the hashing of the keys (which on the interpreters in question is even randomized between runs), so .values() can be traversed in a different order on different systems. That's the source of the non-determinism in this Debian package revision of Prospector. This problem has been fixed by a patch, which just presorts the list of requirements before it gets added to _OPTIONAL:
@@ -76,8 +76,8 @@
     'with_pyroma': ('pyroma>=1.6,<2.0',),
     'with_pep257': (),  # note: this is no longer optional, so this option will be removed in a future release
 }
-_OPTIONAL['with_everything'] = [req for req_list in _OPTIONAL.values() for req in req_list]
-
+with_everything = [req for req_list in _OPTIONAL.values() for req in req_list]
+_OPTIONAL['with_everything'] = sorted(with_everything)
In comparison to the list method sort(), the function sorted() does not change the iterable in place, but returns a new sorted list; either could have been used here. As a side note, egg-info/requires.txt isn't even needed, but that's another issue.
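To see both halves of the problem in one place, here is a small demonstration of my own (not from the original posting; the dictionary is just the example from above): iterating over the unsorted .values() can yield a different order between interpreter runs, while presorting makes the result stable:
_optional = {
    'with_frosted': ('frosted>=1.4.1',),
    'with_vulture': ('vulture>=0.6',),
    'with_pyroma': ('pyroma>=1.6,<2.0',),
}

# The order of .values() follows the dict's internal layout, which can differ
# between runs (compare PYTHONHASHSEED=1 and PYTHONHASHSEED=2 on Python 3.5).
unsorted_everything = [req for reqs in _optional.values() for req in reqs]

# sorted() returns a new list and leaves the input untouched, so the
# requirements end up in a reproducible, alphanumeric order.
print(unsorted_everything)
print(sorted(unsorted_everything))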

  1. As an alternative to prefixing LC_ALL, env -i could be used to get an empty environment.
  2. 0.11.7-4 was already running on Python 3, but that package revision was in experimental due to the switch to Python 3 and therefore not tested by the reproducible builds continuous integration.

21 July 2016

Reproducible builds folks: Reproducible builds: week 62 in Stretch cycle

What happened in the Reproducible Builds effort between June 26th and July 2nd 2016: Read on to find out why we're lagging some weeks behind! GSoC and Outreachy updates Toolchain fixes With the doxygen upload we are now down to only 2 modified packages in our repository: dpkg and rdfind. Weekly reports delay and the future of statistics To catch up with our backlog of weekly reports we have decided to skip some of the statistics for this week. We might publish them in a future report, or we might switch to a format where we summarize them more (and which we can create (even) more automatically), we'll see. We are doing these weekly statistics because we believe it's appropriate and useful to credit people's work and make it more visible. What do you think? We would love to hear your thoughts on this matter! Do you read these statistics? Somewhat? Actually, thanks to the power of notmuch, Holger came up with what you can see below, so what's missing for this week are the uploads fixing irreproducibilities. Which we really would like to show for the reasons stated above and because we really really need these uploads to happen ;-) But then we also like to confirm the bugs are really gone, which (atm) requires manual checking, and to look for the words "reproducible" and "deterministic" (and spelling variations) in debian/changelogs of all uploads, to spot reproducible work not tracked via the BTS. And we still need to catch up on the backlog of weekly reports. Bugs submitted with reproducible usertags It seems DebCamp in Cape Town was hugely successful and made some people get a lot of work done: 61 bugs have been filed with reproducible builds usertags and 60 of them had patches: Package reviews 437 new reviews have been added (though most of them were just linking the bug, "only" 56 new issues in packages were found), an unknown number has been updated and 60 have been removed in this week, adding to our knowledge about identified issues. 4 new issue types have been found: Weekly QA work 98 FTBFS bugs have been reported by Chris Lamb and Santiago Vila. diffoscope development strip-nondeterminism development tests.reproducible-builds.org Misc. This week's edition was written by Mattia Rizzolo, Reiner Herrmann, Ceridwen and Holger Levsen and reviewed by a bunch of Reproducible builds folks on IRC.

20 July 2016

Daniel Stender: Theano in Debian: maintenance, BLAS and CUDA

I'm glad to announce that we have the current release of Theano (0.8.2) in Debian unstable now; it's on its way into the testing branch and the Debian derivatives, heading for Debian 9. The Debian package is maintained on behalf of the Debian Science Team. We have a binary package with the modules in the Python 2.7 import path (python-theano), if you want or need to stick to that branch a little longer (as a matter of fact, in the current popcon stats it's the most installed package), and a package running on the default Python 3 version (python3-theano). The comprehensive documentation is available for offline usage in another binary package (theano-doc). Although Theano builds its extensions at run time and therefore all binary packages contain the same code, the source package generates arch specific packages1 so that the exhaustive test suite can run on all the architectures to detect whether there are problems somewhere (#824116). what's this? In a nutshell, Theano is a computer algebra system (CAS) and expression compiler, which is implemented in Python as a library. It is named after a Classical Greek female mathematician and it's developed at the LISA lab (located at MILA, the Montreal Institute for Learning Algorithms) at the Université de Montréal. Theano tightly integrates multi-dimensional arrays (N-dimensional, ND-array) from NumPy (numpy.ndarray), which are broadly used in Scientific Python for the representation of numeric data. It features a declarative Python based language with symbolic operations for the functional definition of mathematical expressions, which allows one to create functions that compute values for them. Internally the expressions are represented as directed graphs with nodes for variables and operations. The internal compiler then optimizes those graphs for stability and speed and then generates high-performance native machine code to evaluate resp. compute these mathematical expressions2. One of the main features of Theano is that it's also capable of computing on GPUs (graphics processing units), like on common graphics cards (e.g. the developers are using a GeForce GTX Titan X for benchmarks). Today's GPUs have become very powerful parallel floating point devices which can also be employed for scientific computations instead of 3D video games3. The acronym "GPGPU" (general purpose graphics processing unit) refers to special cards like NVIDIA's Tesla4, which could be used alike (more on that below). Thus, Theano is a high-performance number cruncher with its own computing engine which can be used for large-scale scientific computations. If you haven't come across Theano as a Pythonistic professional mathematician, it's also one of the most prevalent frameworks for implementing deep learning applications (training multi-layered, "deep" artificial neural networks, DNN) around5, and has been developed with a focus on machine learning from the ground up. There are several higher level user interfaces built on top of Theano (for DNN: Keras, Lasagne, Blocks, and others; or for probabilistic programming in Python: PyMC3). I'll seek to get some of them into Debian, too. helper scripts Both binary packages ship three convenience scripts, theano-cache, theano-test, and theano-nose. 
Instead of them being copied into /usr/bin, which would result in a binaries-have-conflict violation, the scripts are to be found in /usr/share/python-theano (python3-theano respectively), so that both module packages of Theano can be installed at the same time. The scripts could be run directly from these folders, e.g. do $ python /usr/share/python-theano/theano-nose to achieve that. If you're going to use them heavily, you could add the directory of the flavour you prefer (Python 2 or Python 3) to the $PATH environment variable manually, by either typing e.g. $ export PATH=/usr/share/python-theano:$PATH on the prompt, or by saving that line into ~/.bashrc. Manpages aren't available for these little helper scripts6, but you could always get info on what they do and which arguments they accept by invoking them with the -h (for theano-nose) resp. help flag (for theano-cache). running the tests On some occasions you might want to run the testsuite of the installed library, e.g. to check whether everything runs fine on your GPU hardware. There are two different ways to run the tests (either way you need to have python-nose resp. python3-nose installed). One is, you could launch the test suite by doing $ python -c 'import theano; theano.test()' (or the same with python3 to test the other flavour); that's the same as what the helper script theano-test does. However, by doing it that way some particular tests might fail by raising errors also for the group of known failures. Known failures are excluded from being errors if you run the tests by theano-nose, which is a wrapper around nosetests, so this might always be the better choice. You can run this convenience script with the option --theano on the installed library, or from the source package root, which you could pull by $ sudo apt-get source theano (there you also have the option to use bin/theano-nose). The script accepts options for nosetests, so you might run it with -v to increase verbosity. For the tests the configuration switch config.device must be set to cpu. This will also include the GPU tests when a properly accessible device is detected, so that's a little misleading in the sense that it doesn't mean "run everything on the CPU". You're on the safe side if you always run it like this: $ THEANO_FLAGS=device=cpu theano-nose, if you've set config.device to gpu in your ~/.theanorc. Depending on the available hardware and the used BLAS implementation (see below) it could take quite a long time to run the whole test suite through; on the Core i5 in my laptop that takes around an hour, even excluding the GPU related tests (which perform pretty fast, though). Theano features a couple of switches to manipulate the default configuration for optimization and compilation. There is a trade-off between optimization and compilation costs and the performance of the test suite, and it turned out the test suite runs quicker with less graph optimization. There are two different values available to control config.optimizer: fast_run toggles maximal optimization, while fast_compile runs only a minimal set of graph optimization features. These settings are used by the general mode switches for config.mode, which is either FAST_RUN by default, or FAST_COMPILE. The default mode FAST_RUN (optimizer=fast_run, linker=cvm) needs around 72 minutes on my lower mid-level machine (on un-optimized BLAS). Setting mode=FAST_COMPILE (optimizer=fast_compile, linker=py) brings some boost for the performance of the test suite: it runs the whole suite in 46 minutes. 
The downside of that is that C code compilation is disabled in this mode by using the linker py, and also the GPU related tests are not included. I've played around with using the optimizer fast_compile with some of the other linkers (c|py and cvm, and their versions without garbage collection) as an alternative to FAST_COMPILE, with minimal optimization but also machine code compilation incl. GPU testing. But in my experience, fast_compile with another linker than py results in some new errors and failures of some tests on amd64, and this might also be the case on other architectures. By the way, another useful feature is DebugMode for config.mode, which verifies the correctness of all optimizations and compares the C to Python results. If you want to have detailed info on the configuration settings of Theano, do $ python -c 'import theano; print theano.config' | less, and check out the chapter config in the library documentation. cache maintenance Theano isn't a JIT (just-in-time) compiler like Numba, which generates native machine code in the memory and executes it immediately, but it saves the generated native machine code into compiledirs. The reason for doing it that way is quite practical, like the docs explain: the persistent cache on disk makes it possible to avoid generating code for the same operation, and to avoid compiling again when different operations generate the same code. The compiledirs by default are located within $(HOME)/.theano/. After some time the folder becomes quite large, and might look something like this:
$ ls ~/.theano
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.11+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12rc1-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.1+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2rc1-64
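The compiledir which the current interpreter uses can also be queried from Python itself; a quick sketch of my own (not from the original posting):
import theano

# theano.config.base_compiledir is the ~/.theano base folder,
# theano.config.compiledir the per-platform/per-interpreter directory below it.
print(theano.config.base_compiledir)
print(theano.config.compiledir)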
If the used Python version changed like in this example you might want to purge the obsolete cache. For working with the cache resp. the compiledirs, the helper theano-cache comes in handy. If you invoke it without any arguments the current cache location is put out, like ~/.theano/compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64 (the script is run from /usr/share/python-theano). So, the compiledirs for the old Python versions in this example (11+ and 12rc1) can be removed to free the space they occupy. All compiledirs resp. cache directories, meaning the whole cache, could be erased by $ theano-cache basecompiledir purge; the effect is the same as performing $ rm -rf ~/.theano. You might want to do that e.g. if you're using different hardware, like when you got yourself another graphics card. Or habitually from time to time when the compiledirs fill up so much that it slows down processing, with the harddisk being very busy all the time, if you don't have an SSD drive available. For example, the disk space of build chroots carrying (mainly) the tests completely compiled through on default Python 2 and Python 3 consumes around 1.3 GB (see here). BLAS implementations Theano needs a level 3 implementation of BLAS (Basic Linear Algebra Subprograms) for operations between vectors (one-dimensional mathematical objects) and matrices (two-dimensional objects) carried out on the CPU. NumPy is already built on BLAS and pulls the standard implementation (libblas3, source package: lapack), but Theano links directly to it instead of using NumPy as an intermediate layer to reduce the computational overhead. For this, Theano needs development headers and the binary packages pull libblas-dev by default, if any other development package of another BLAS implementation (like OpenBLAS or ATLAS) isn't already installed, or pulled with them (providing the virtual package libblas.so). The linker flags could be manipulated directly through the configuration switch config.blas.ldflags, which is by default set to -L/usr/lib -lblas -lblas. By the way, if you set it to an empty value, Theano falls back to using BLAS through NumPy, if you want to have that for some reason. On Debian, there is a very convenient way to switch between BLAS implementations by the alternatives mechanism. If you have several alternative implementations installed at the same time, you can switch from one to another easily by just doing:
$ sudo update-alternatives --config libblas.so
There are 3 choices for the alternative libblas.so (providing /usr/lib/libblas.so).
  Selection    Path                                  Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libblas.so      40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so   35        manual mode
  2            /usr/lib/libblas/libblas.so            10        manual mode
  3            /usr/lib/openblas-base/libblas.so      40        manual mode
Press <enter> to keep the current choice[*], or type selection number:
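Independently of the alternatives setting, you can also peek at the BLAS/LAPACK configuration NumPy itself was built with (my own little sketch, not from the original posting; the library actually loaded at run time is whatever /usr/lib/libblas.so currently points to):
import numpy as np

# Prints the build-time BLAS/LAPACK information NumPy was compiled against;
# the run-time library is the one selected via update-alternatives.
np.__config__.show()
print(np.__version__)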
The implementations are performing differently on different hardware, so you might want to take the time to compare which one does it best on your processor (the other packages are libatlas-base-dev and libopenblas-dev), and choose that to optimize your system. If you want to squeeze out everything that is in there for carrying out Theano's computations on the CPU, another option is to compile an optimized version of a BLAS library especially for your processor. I'm going to write another blog posting on this issue. The binary packages of Theano ship the script check_blas.py to check how well a BLAS implementation performs with it, and whether everything works right. That script is located in the misc subfolder of the library; you could locate it by doing $ dpkg -L python-theano | grep check_blas (or for the package python3-theano accordingly), and run it with the Python interpreter. By default the script puts out a lot of info, like a huge performance comparison reference table, the current setting of blas.ldflags, the compiledir, the setting of floatX, OS information, the GCC version, the current NumPy config towards BLAS, NumPy location and version, whether Theano linked directly or has used the NumPy binding, and finally and most importantly, the execution time. If just the execution time for quick performance comparisons is needed, this script could be invoked with -q. Theano on CUDA The function compiler of Theano works with alternative backends to carry out the computations, like the ones for graphics cards. Currently, there are two different backends for GPU processing available, one docks onto NVIDIA's CUDA (Compute Unified Device Architecture) technology7, and another one for libgpuarray, which is also developed by the Theano developers in parallel. The libgpuarray library is an interesting alternative for Theano; it's a GPU tensor (multi-dimensional mathematical object) array written in C with Python bindings based on Cython, which has the advantage of also running on OpenCL8. OpenCL, unlike CUDA9, is fully free software, vendor neutral and overcomes the limitation of the CUDA toolkit being only available for amd64 and the ppc64el port (see here). I've opened an ITP on libgpuarray and we'll see if and how this works out. Another reason why it would be great to have it available is that it looks like CUDA currently runs into problems with GCC 610. More on that, soon. Here's a little checklist for setting up your CUDA device so that you don't have to experience something like this:
$ THEANO_FLAGS=device=gpu,floatX=float32 python ./cat_dog_classifier.py 
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
hardware check For running Theano on CUDA you need an NVIDIA graphics card which is capable of doing that. You can recheck if your device is supported by CUDA here. When the hardware isn't too old (CUDA support started with the GeForce 8 and Quadro X series) or too exotic, it should work in all but exceptional cases. You can check your model and whether the device is present in the system on the bare hardware level by doing this:
$ lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
If a line like this doesn't get returned, your device most probably is broken, or not properly connected (ouch). If rev ff appears at the end of the line that means the device is off, meaning powered down. This might be happening if you have a laptop with Optimus graphics hardware, and the related drivers have switched off the unoccupied device to save energy11. kernel module Running CUDA applications requires the proprietary NVIDIA driver kernel module to be loaded into the kernel and working. If you haven't already installed it for another purpose, the NVIDIA driver and the CUDA toolkit are both in the non-free section of the Debian archive, which is not enabled by default. To get non-free packages you have to add non-free (and better also contrib) to your package source in /etc/apt/sources.list, which might then look like this:
deb http://httpredir.debian.org/debian/ testing main contrib non-free
After doing that, perform $ apt-get update to update the package lists, and there you go with the non-free packages. The headers of the running kernel are needed to compile modules; you can get them together with the NVIDIA kernel module package by running:
$ sudo apt-get install linux-headers-$(uname -r) nvidia-kernel-dkms build-essential
DKMS will then build the NVIDIA module for the kernel and do some other things on the system. When the installation has finished, it's generally advised to reboot the system completely. troubleshooting If you have problems with the CUDA device, it's advised to verify whether the following things concerning the NVIDIA driver resp. kernel module are in order: blacklist nouveau Check if the default Nouveau kernel module driver (which blocks the NVIDIA module) for some reason still gets loaded by doing $ lsmod | grep nouveau. If nothing gets returned, that's right. If it's still in the kernel, just add blacklist nouveau to /etc/modprobe.d/blacklist.conf, and update the booting ramdisk with sudo update-initramfs -u afterwards. Then reboot once more; after that it shouldn't be loaded anymore. rebuild kernel module To fix it when the module hasn't been properly compiled for some reason you could trigger a rebuild of the NVIDIA kernel module with $ sudo dpkg-reconfigure nvidia-kernel-dkms. When you're about to send your hardware in for repair because everything looks all right but the device just isn't working, that really could help (own experience). After the rebuild of the module or modules (if you have a few kernel packages installed) has completed, you could recheck if the module really is available by running:
$ sudo modinfo nvidia-current
filename:       /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        352.79
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
vermagic:       4.4.0-1-amd64 SMP mod_unload modversions 
parm:           NVreg_Mobile:int
It should look something similar to this when everything is all right. reload kernel module When there are problems with the GPU, maybe the kernel module isn't properly loaded. You could recheck if the module has been loaded properly by doing
$ lsmod | grep nvidia
nvidia_uvm             73728  0
nvidia               8540160  1 nvidia_uvm
drm                   356352  7 i915,drm_kms_helper,nvidia
The kernel module could be loaded resp. reloaded with $ sudo nvidia-modprobe (that tool is from the package nvidia-modprobe). unsupported graphics card Be sure that your graphics card is supported by the current driver kernel module. If you have bought new hardware, that can quite possibly turn out to be a problem. You can get the version of the current NVIDIA driver with:
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 5.3.1 20160528 (Debian 5.3.1-21)
Then google the version number, like nvidia 352.79; this should get you onto an official driver download page like this. There, check what's to be found under "Supported Products". If you're stuck with that there are two options: to wait until the driver in Debian gets updated, or to replace it with the latest driver package from NVIDIA. That's possible to do, but something more for experienced users. occupied graphics card The CUDA driver cannot work while the graphical interface is busy, e.g. by processing the graphical display of your X.Org server. Which kernel driver actually is used to process the desktop could be examined by this command:12
$ grep '(II).*([0-9]):' /var/log/Xorg.0.log
[    37.700] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150522
[    37.700] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <vcheng@debian.org>)
 ... 
[    39.808] (II) intel(0): switch to mode 1920x1080@60.0 on eDP1 using pipe 0, position (0, 0), rotation normal, reflection none
[    39.810] (II) intel(0): Setting screen physical size to 508 x 285
[    67.576] (II) intel(0): EDID vendor "CMN", prod id 5941
[    67.576] (II) intel(0): Printing DDC gathered Modelines:
[    67.576] (II) intel(0): Modeline "1920x1080"x0.0  152.84  1920 1968 2000 2250  1080 1083 1088 1132 -hsync -vsync (67.9 kHz eP)
This example shows that the rendering of the desktop is performed by the graphical device of the Intel CPU, which is just like it's needed for running CUDA applications on your NVIDIA graphics card, if you don't have another one. nvidia-cuda-toolkit With the Debian package of the CUDA toolkit everything pretty much runs out of the box for Theano. Just install it with apt-get, and you're ready to go; the CUDA backend is the default one. Pycuda is also a suggested dependency of the binary packages; it could be pulled together with the CUDA toolkit. The up-to-date CUDA release 7.5 is of course available; with that you have Maxwell architecture support, so that you can run Theano on e.g. a GeForce GTX Titan X with 6.2 TFLOPS on single precision13 at an affordable price. CUDA 814 is around the corner with support for the new Pascal architecture15. Like the GeForce GTX 1080 high-end gaming graphics card, which already has 8.23 TFLOPS16. When it comes to professional GPGPU hardware like the Tesla P100 there is much more computational power available, scalable by multiplication of cores resp. cards up to genuine little supercomputers which fit on a desk, like the DGX-117. Theano can use multiple GPUs for calculations to work with highly scaled hardware; I'll write another blog post on this issue. Theano on the GPU It's not difficult to run Theano on the GPU. Only single precision floating point numbers (float32) are supported on the GPU, but that is sufficient for deep learning applications. Theano uses double precision floats (float64) by default, so you have to set the configuration variable config.floatX to float32, as written above, either with the THEANO_FLAGS environment variable or better in your .theanorc file, if you're going to use the GPU a lot. Switching to the GPU actually happens with the config.device configuration variable, which must be set to either gpu, or gpu0, gpu1 etc. to choose a particular one if multiple devices are available. Here is a little test script check1.py; it's taken from the docs and slightly altered. You can run that script either with python or python3 (there was a single test failure on the Python 3 package, so the Python 2 library might be a little more stable currently). For comparison, here's an example of how it performs on my hardware, once on the CPU and once on the GPU:
$ THEANO_FLAGS=floatX=float32 python ./check1.py 
[Elemwise exp,no_inplace (<TensorType(float32, vector)>)]
Looping 1000 times took 4.481719 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
$ THEANO_FLAGS=floatX=float32,device=gpu python ./check1.py 
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN not available)
[GpuElemwise exp,no_inplace (<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise exp,no_inplace .0)]
Looping 1000 times took 1.164906 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
If you got a result like this you're ready to go with Theano on Debian, training computer vision classifiers or whatever you want to do with it. I'll write more on what Theano can be used for, soon.
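The check1.py script itself isn't reproduced here, but the gist of such a test is only a few lines of Theano; here is a minimal sketch of my own (not the script from the docs) which builds and evaluates a compiled function:
import numpy
import theano
import theano.tensor as T

# Symbolic input vector in the configured floatX (float32 when targeting the GPU).
x = T.vector('x')

# Compile a function for the symbolic expression exp(x); with device=gpu the
# graph is moved to the GPU backend automatically.
f = theano.function([x], T.exp(x))

data = numpy.random.rand(1000).astype(theano.config.floatX)
print(f(data)[:5])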

  1. Some ports are disabled because they are currently not supported by Theano. There are NotImplementedErrors and other errors in the tests about the numpy.ndarray object not being aligned. The developers commented on that, see here. And on some ports the build flags -m32 resp. -m64 of Theano aren't supported by g++; the build flags can't be manipulated easily.
  2. Theano Development Team: "Theano: a Python framework for fast computation of mathematical expressions"
  3. Marc Couture: "Today's high-powered GPUs: strong for graphics and for maths". In: RTC magazine June 2015, pp. 22-25
  4. Ogier Maitre: "Understanding NVIDIA GPGPU hardware". In: Tsutsui/Collet (eds.): Massively parallel evolutionary computation on GPGPUs. Berlin, Heidelberg: Springer 2013, pp. 15-34
  5. Geoffrey French: "Deep learning tutorial: advanced techniques". PyData London 2016 presentation
  6. Like the description of the Lintian tag binary-without-manpage says, that's not needed for them since they are in /usr/share.
  7. Tom. R. Halfhill: "Parallel processing with CUDA: Nvidia's high-performance computing platform uses massive multithreading". In: Microprocessor Report January 28, 2008
  8. Faber et.al: "Parallelwelten: GPU-Programmierung mit OpenCL". In: C't 26/2014, pp. 160-165
  9. For comparison, see: Valentine Sinitsyn: "Feel the taste of GPU programming". In: Linux Voice February 2015, pp. 106-109
  10. https://lists.debian.org/debian-devel/2016/07/msg00004.html
  11. If Optimus (hybrid) graphics hardware is present (like commonly today on PC laptops), Debian launches the X-server on the graphics processing unit of the CPU, which is ideal for CUDA. The problem with Optimus actually is the graphics processing on the dedicated GPU. If you are using Bumblebee, the Python interpreter which you want to run Theano on has to be started with the launcher primusrun, because Bumblebee powers the GPU down with the tool bbswitch every time it isn't used, and I think also the kernel module of the driver is dynamically loaded.
  12. Thorsten Leemhuis: "Treiberreviere. Probleme mit Grafiktreibern für Linux lösen". In: C't Nr.2/2013, pp. 156-161
  13. Martin Fischer: "4K-Rakete: Die schnellste Single-GPU-Grafikkarte der Welt". In C't 13/2015, pp. 60-61
  14. http://www.heise.de/developer/meldung/Nvidia-CUDA-8-bringt-Optimierungen-fuer-die-Pascal-Architektur-3164254.html
  15. Martin Fischer: "All In: Nvidia enthüllt die GPU-Architektur 'Pascal'". In: C't 9/2016, pp. 30-31
  16. Martin Fischer: "Turbo-Pascal: High-End-Grafikkarte für Spieler: GeForce GTX 1080". In: C't 13/2016, pp. 100-103
  17. http://www.golem.de/news/dgx-1-nvidias-supercomputerchen-mit-8x-tesla-p100-1604-120155.html

3 July 2016

Daniel Stender: My work for Debian in June

At least a little more time last month for helping to improve the operating system we're working on. I've worked on a couple of package updates; mostly they are for PAPT, DPMT, Debian Science, or pkg-go (in order of completion): New packages: Sponsored Uploads: Again some steps further ahead with these ones. My "new package of the month" is going to be Theano. I'm still working on the introduction; it'll come hereafter very soon.

8 June 2016

Reproducible builds folks: Reproducible builds: week 58 in Stretch cycle

What happened in the Reproducible Builds effort between May 29th and June 4th 2016: Media coverage Ed Maste will present Reproducible Builds in FreeBSD at BSDCan 2016 in Ottawa, Canada on June 11th. GSoC and Outreachy updates Toolchain fixes Other upstream fixes Packages fixed The following 53 packages have become reproducible due to changes in their build-dependencies: angband blktrace code-saturne coinor-symphony device-tree-compiler mpich rtslib ruby-bcrypt ruby-bson-ext ruby-byebug ruby-cairo ruby-charlock-holmes ruby-curb ruby-dataobjects-sqlite3 ruby-escape-utils ruby-ferret ruby-ffi ruby-fusefs ruby-github-markdown ruby-god ruby-gsl ruby-hdfeos5 ruby-hiredis ruby-hitimes ruby-hpricot ruby-kgio ruby-lapack ruby-ldap ruby-libvirt ruby-libxml ruby-msgpack ruby-ncurses ruby-nfc ruby-nio4r ruby-nokogiri ruby-odbc ruby-oj ruby-ox ruby-raindrops ruby-rdiscount ruby-redcarpet ruby-redcloth ruby-rinku ruby-rjb ruby-rmagick ruby-rugged ruby-sdl ruby-serialport ruby-sqlite3 ruby-unicode ruby-yajl ruby-zoom thin The following packages have become reproducible after being fixed: Some uploads have addressed some reproducibility issues, but not all of them: Uploads with an unknown result because they fail to build: Patches submitted that have not made their way to the archive yet: Package reviews 45 reviews have been added, 25 have been updated and 25 have been removed in this week. 12 FTBFS bugs have been reported by Chris Lamb and Niko Tyni. diffoscope development strip-nondeterminism development Mattia uploaded strip-nondeterminism 0.018-1 which improved support for *.epub files. tests.reproducible-builds.org Misc. Last week we also learned about progress of reproducible builds in FreeBSD. Ed Maste announced a change to record the build timestamp during ports building, which is required for later reproduction. This week's edition was written by Reiner Herrmann, Holger Levsen and Chris Lamb and reviewed by a bunch of Reproducible builds folks on IRC.

30 May 2016

Daniel Stender: My work for Debian in May

No double posting this time ;-) I've not got so much spare time this month to spend on Debian, but I could work on the following packages: This series of blog postings also includes little introductions of and into new packages in the archive. This month there is: Pyinfra Pyinfra is a new project which is currently still in development state. It has already been pointed out in an interesting German article1, and is now available as a package maintained within the Python Applications Team. It's currently a one man production by Nick Barrett, and eagerly developed in the past weeks (we're currently at 0.1~dev24). Pyinfra is a remote server configuration/provisioning/service deployment tool which belongs in the same software category as Puppet or Ansible2. It's for provisioning one or an array of remote servers with software packages and for configuring them. Pyinfra runs agentless like Ansible; that means for using it nothing special (like a daemon) has to run on the targeted servers. It's written to be used for provisioning POSIX compatible Linux systems and has alternatives when it comes to special features like package managers (e.g. it supports apt as well as yum). The documentation could be found in /usr/share/doc/pyinfra/html/. Here's a little crash course on how to use Pyinfra: The pyinfra CLI tool is used on the command line like this; deploy scripts, single operations or facts (see below) can be used on a single server or a multitude of remote servers:
$ pyinfra -i <inventory script/single host> <deploy script>
$ pyinfra -i <inventory script/single host> --run <operation>
$ pyinfra -i <inventory script/single host> --facts <fact>
Remote servers which are operated on must provide a working shell and they must be reachable by SSH. For connecting, --port, --user, --password, --key/--key-password and --sudo flags are available, --sudo to gain superuser rights. Root access or sudo rights of course have to be already set up. By the way, localhost could be operated on the same way. Single operations are organized in modules like "apt", "files", "init", "server" etc. With the --run option they could be used individually on servers like follows, e.g. server.user adds a new user on a single targeted system (-v adds verbosity to the pyinfra run):
$ pyinfra -i 192.0.2.10 --run server.user sam --user root --key ~/.ssh/sshkey --key-password 123456 -v
Multiple servers can be grouped in inventories, which hold the targeted hosts and data associated with them, like e.g. an inventory file farm1.py would contain lists like this:
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
Group designators must be all caps. A higher level of grouping is the file names of inventory scripts, thus COMPUTE_SERVERS and DATABASE_SERVERS can be referenced at the same time by the group designator farm1. Plus, all servers are automatically added to the group all. And, inventory scripts should be stored in the subfolder inventory/ in the project directory. Inventory files then could be used instead of specific IP addresses like this; the single operation then gets performed on all given machines in farm1.py:
$ pyinfra -i inventory/farm1.py  --run server.user sam --user root --key ~/.ssh/sshkey --key-password=123456 -v
Deployment scripts could be used together with group data files in the subfolder group_data/ in the project directory. For example, a group_data/farm1.py designates all servers given in inventory/farm1.py (by the way, all.py designates all servers), and contains the arbitrary attribute user_name (attributes must be lowercase), next to authentication data for the whole inventory group:
user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'
That arbitrary attribute can be picked up by a deployment script using host.data as follows; user_name could be used again e.g. for server.user(), like this:
from pyinfra import host
from pyinfra.modules import server
server.user(host.data.user_name)
This deploy, the ensemble of inventory file, group data file and deployment script (usually placed top level in the project folder) then could be run that way:
$ pyinfra -i inventory/farm1.py deploy.py
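Since a deploy script is plain Python, several operations can be combined in one file. Here is a small sketch of my own (not from the original posting; the package name is only an example, and the calls follow the ones shown elsewhere in this introduction):
from pyinfra import host
from pyinfra.modules import apt, server

# Reuse the user_name attribute from group_data/farm1.py.
server.user(host.data.user_name)

# Install a package with the apt module, updating the package lists first.
apt.packages(packages='htop', present=True, update=True)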
You have guessed it: since deployment scripts are Python scripts they are fully programmable (please note that Pyinfra is built for and runs on Python 3 on Debian), and that's the main advantage of this piece of software. Quite handy for that are Pyinfra facts, functions which check different things on remote systems and return the information as Python data. For example, deb_packages returns a dictionary of installed packages from a remote apt based server:
$ pyinfra -i 192.0.2.10 --fact deb_packages --user root --key ~/.ssh/sshkey --key-password=123456
{
    "192.0.2.10": {
        "libdebconfclient0": "0.192",
        "python-debian": "0.1.27",
        "libavahi-client3": "0.6.31-5",
        "dbus": "1.8.20-0+deb8u1",
        "libustr-1.0-1": "1.0.4-3+b2",
        "sed": "4.2.2-4+b1",
Using facts, Pyinfra reveals its full potential. For example, a deployment script could go like this; the linux_distribution fact returns a dict containing the installed distribution:
from pyinfra import host
from pyinfra.modules import apt
if host.fact.linux_distribution['name'] == 'Debian':
   apt.packages(packages='gummi', present=True, update=True)
elif host.fact.linux_distribution['name'] == 'CentOS':
   pass
I'll spare more sophisticated examples to keep this introduction simple. Beyond fancy deployment scripts, Pyinfra features its own API by which it can be programmed from the outside, and much more. But maybe that's enough to introduce Pyinfra; those are the usage basics. Pyinfra is a brand new project and it remains to be seen whether the developer can keep on further developing the tool like he does these days. For a private project it's insane to attempt to become a contender for the established "big" free configuration management tools and frameworks, but, whether Puppet has become too complex in the meanwhile or not3, I really don't think that's the point here. Pyinfra follows its own approach in being programmable the way it is. And it definitely does no harm to have it in the toolbox already; it isn't trying to replace anything. Brainstorm After a first package has been in experimental, the Brainstorm library from the Swiss AI research institute IDSIA4 is now available as python3-brainstorm in unstable. Brainstorm is a lean, easy-to-use library for setting up deep learning networks (multiple layered artificial neural networks) for machine learning applications like image and speech recognition or natural language processing. To set up a working training network for a classifier of handwritten digits like the MNIST dataset (a usual "hello world") just takes a couple of lines, like an example demonstrates. The package is maintained within the Debian Python Modules Team. The Debian package ships a couple of examples in /usr/share/python3-brainstorm/examples (the data/ and examples/ folders of the upstream tarball are combined here). Among them there are5: The current documentation in /usr/share/doc/python3-brainstorm/html/ isn't complete yet (several chapters are under construction), but there's a walkthrough on the CIFAR-10 example. The MNIST example has been extended by Github user pinae, and has been explained in German C't recently6. What are the perspectives for further development? Like Zhou Mo confirmed, there are a couple of deep learning frameworks around which have a rather poor outlook since they have been abandoned after being completed as PhD projects. There's really no point in striving to have them all in Debian; the ITP of Minerva has been given up partly for this reason, as there weren't any commits since 08/2015 (and because cuDNN isn't available and most likely won't be). Brainstorm, whose 0.5 has been released 05/2015, also was a PhD project at IDSIA. It's stated on Github that the project is "under active development", but the rather sparse project page on the other side expresses the "hope the community will help us to further improve Brainstorm". Such a sentence quite often implies that the developers are not actively working on the project. But there are recent commits and it looks like upstream is active and can be reached when there are problems. So I don't think we're riding a dead horse here. The downside for Brainstorm in Debian is that, it seems, the libraries which are needed for GPU accelerated processing can't be fully provided. Pycuda is available, but scikit-cuda (an additional library which provides wrappers for CUDA features like CUBLAS, CUFFT and CUSOLVER) is not and won't be, because the CULA Dense Toolkit (scikit-cuda contains wrappers for that, too) is not available freely as source. Because of that, a dependency against pycuda, not even as Suggests (it's non-free), has been spared. 
Without GPU acceleration, Brainstorm computes the matrices on openBLAS using a Cython wrapper on the NumpyHandler, and the PyCudaHandler couldn't be used. openBLAS makes pretty good use of the available hardware (it distributes the work over all available CPU cores), but it's not yet possible to run Brainstorm full throttle using available floating point devices to reduce training times, which becomes crucial when the projects are getting bigger. Brainstorm belongs to the number of deep learning frameworks already being or becoming available in Debian. Currently there is: I've checked over Microsoft's CNTK, but although it has also been freed recently I have my doubts whether it could be included. Apparently there are dependencies against non-free software and most likely other issues. So much for a little update on the state of deep learning in Debian; please excuse it if my radar misses something.
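To see the multi-core BLAS at work (a sketch of my own, not from the posting, and going through NumPy rather than Brainstorm's own handler), a plain matrix multiplication is enough; with OpenBLAS selected as the libblas.so alternative, all CPU cores should light up while it runs:
import time
import numpy as np

# A reasonably large matrix product is dispatched to the BLAS library behind
# NumPy; with OpenBLAS it is spread over all available CPU cores.
a = np.random.rand(3000, 3000)
b = np.random.rand(3000, 3000)

start = time.time()
c = np.dot(a, b)
print("dot product took %.2f seconds" % (time.time() - start))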

  1. Tim Schürmann: "Schlangenöl: Automatisiertes Service-Deployment mit Pyinfra". In: IT-Administrator 05/2016, pp. 90-95.
  2. For a comparison of configuration management software like this, see B wetter/Johannsen/Steig: "Baukastensysteme: Konfigurationsmanagement mit Open-Source-Software". In: iX 04/2016, pp. 94-99 (please excuse the prevalence of German articles in the pointers, I just have them at hand).
  3. On the points of critique on Puppet, see Martin Loschwitz: "David gegen Goliath Zwei Welten treffen aufeinander: Puppet und Ansible". In Linux-Magazin 01/2016, 50-54.
  4. See the interview with IDSIA's deep learning guru Jürgen Schmidhuber in German C't 2014/09, p. 148
  5. The example scripts need some more finetuning. To run the data creation scripts in place, the environment variable BRAINSTORM_DATA_DIR could be set, but the trained networks currently try to write in place. So please copy the scripts into some workspace if you want to try them out. I'll patch the example scripts to run out of the box, soon.
  6. Johannes Merkert: "Ziffernlerner. Ein künstliches neuronales Netz selber gebaut". In: C't 2016/06, p. 142-147. Web: http://www.heise.de/ct/ausgabe/2016-6-Ein-kuenstliches-neuronales-Netz-selbst-gebaut-3118857.html.
  7. See Ramon Wartala: "Tiefenschärfe: Deep learning mit NVIDIAs Jetson-TX1-Board und dem Caffe-Framework". In: iX 06/2016, pp. 100-103
  8. https://lists.debian.org/debian-science/2016/03/msg00016.html

30 April 2016

Daniel Stender: What I've worked on for Debian this month

This month I've worked on the following things for Debian: To begin with that, I've set up a Debhelper sequencer script for dh-buildinfo1; this add-on can now be used with dh $@ --with buildinfo in deb/rules instead of having to explicitly call it somewhere in an override. Debops I've set up initial Debian packages of Debops2, a collection of finely crafted Ansible roles and playbooks especially for Debian servers, shipped with a couple of convenience and wrapper scripts in Python3. There are two binary packages, one for the toolset (debops), and the other for the playbooks and roles of the project (debops-playbooks). The application is easy to use: just initialize a new project with debops-init foo and add your server(s) to foo/ansible/inventory/hosts belonging to groups representing services and things you want to employ on them. For example, the group [debops_gitlab] automatically installs a complete running Gitlab setup on one or a multitude of servers in the same run with the debops command4. Use other groups like [debops_mariadb_server] accordingly in the same host inventory. Ansible runs agentless, so you don't have to prepare freshly set up servers with anything special to use that tool on them (like on localhost). The list of things you could deploy with Debops is quite amazing and you've got dozens of services at hand. The new packages are currently in experimental because they need some more fine tuning, e.g. there are a couple of minor error messages which recently occur using it, but it works well. The (early staged) documentation unfortunately couldn't be packaged because of the scattered resp. collective nature of the project (all parts have their own Github repositories)5, and also how to generate the upstream tarball remains a bit of a challenge (currently, it's the outcome of debops-init)6. I'll have this package in unstable soon. More info on Debops is coming up, then. HashiCorp's Packer I'm very glad to announce that Packer7 is now available in unstable, and the two year old RFP bug could finally be closed8. It's another great and very convenient devops tool which does a lot of different things in an automated fashion using only a single "one-argument" CLI tool in combination with a couple of lines in a configuration script (thanks to Yaroslav Halchenko for the tip). Packer helps creating machine images for different platforms. This is like when you use e.g. Debian installations in a Qemu box for testing or development purposes. Instead of setting up a new virtual machine manually, like installing Debian on another computer, this process can be automated with Packer, like I've written about in this blog entry here9. You just need a template containing instructions for the included Qemu builder and a preseeding script for the Debian installer, and there you go drinking your coffee while Packer does all the work for you: downloading the installation ISO image, creating the new virtual harddrive, booting the emulator, running the whole installation process automatically like answering questions, selecting things, rebooting without ISO image to complete the installation etc. A couple of minutes and you have a new pre-baked virtual machine image like from a vending machine, a fresh one every time you need it. 
Packer10 supports a number of builders for different target platforms (desktop virtualization solutions as much as public cloud providers and private cloud software), can build in parallel, and also the full range of common provisioners can be employed in the process to equip the newly installed OSs. Vagrant boxes could be generated by one of the included postprocessors. I'll write more on Packer here on this blog, soon. There were more than two dozen packages missing to complete Packer11, which is the achievement of combined forces within the pkg-go group. Many thanks esp. to Alexandre Viau, who has worked on most of the needed new packages. Thanks also to the FTP masters, who were always very quick in reviewing the Go packages, so that the sub-dependent new ones could always be built and packaged consecutively. Squirrel3 I haven't done most of the work on it and just sponsored this for Fabian Wolff, but want to highlight here that there's a new package of Squirrel12 now available in Debian13. Squirrel is a lightweight scripting language, somewhat comparable to Lua. It's fully object-oriented and highly embeddable; it's used in a lot of commercial computer games under the hood for implementing intelligence for bots next to other things14, but also for the Internet of Things (it's embedded in hardware from Electric Imp). Squirrel functions could be called from C++15. I've filed an ITP bug for Squirrel already in 2011 (#651195), but always something else got in the way, and it ended up being an RFP. I'm really glad that it got picked up and completed. misc There were a couple of uploads on updated upstream tarballs and for fixing bugs, namely afl/2.10b-1 and 2.11b-1, python-afl/0.5.3-1, pyutilib/5.3.2-1, pyomo/4.3.11327-1, libvigraimpex/1.10.0+git20160211.167be93dfsg-2 (fix of #820429, thanks to Tobias Frost), and gamera/3.4.2+svn1454-1. For the pkg-go group, I've set up a new package of github-mitchellh-ioprogress (which is needed by the official DigitalOcean CLI tool doctl, now RFP #807956 instead of ITP due to the lack of time - again facing a lot of missing packages), and provided a little patch for dh-make-golang updating some standards16. For Packer I've also updated azure-go-autorest and azure-sdk as team upload (#821938, #821832), but it came out that the project which is currently under heavy development towards a new official release broke a lot in the past weeks (and no Git branching has been used), so that Packer as a matter of fact needed a vendored snapshot, although there have been only a couple of commits in between. Docker-registry had the same problem with the new package of azure-sdk/2.1.1~beta1, so that it needed to be fixed, too (#822146). By the way, the tool ratt17 comes in very handy for automatically test building all reverse dependencies, not only for Go packages (thanks to Tianon Gravi for the tip). Finally, I've posted the needed reverse dependencies as RFP bugs for Terraform18 (again quite a lot), Vuls19, and cve-dictionary20, which is needed for Vuls. I'll let them rest a while waiting to get picked up before working anything down.

Daniel Stender: My work for Debian in April

This month I've worked on the following things for Debian. To begin with, I've set up a Debhelper sequencer script for dh-buildinfo1, so this add-on can now be used with dh $@ --with buildinfo in debian/rules instead of having to call it explicitly somewhere in an override.

Debops
I've set up initial Debian packages of Debops2, a collection of finely crafted Ansible roles and playbooks especially for servers running Debian, shipped with a couple of helper and wrapper scripts in Python3. There are two binary packages, one for the toolset (debops) and the other for the playbooks and roles of the project (debops-playbooks). The application is easy to use: just initialize a new project with debops-init foo and add your server(s) to foo/ansible/inventory/hosts, assigned to groups which represent the services and things you want to deploy on them. For example, the group [debops_gitlab] automatically installs a complete running Gitlab setup on one or several servers in a single run of the debops command4; other groups like [debops_mariadb_server] can be used accordingly in the same host inventory (a minimal sketch of this workflow is included at the end of this post). Ansible works agentless, so freshly set up servers don't need any special preparation before the tool can be run against them ad hoc (the same goes for localhost). The list of things you can deploy with Debops is quite amazing, and dozens of services are at hand. The new Debian packages are currently in experimental because they need some more fine-tuning; for example, there are a couple of minor error messages which currently show up when using it, but it works well. The (early-stage) documentation unfortunately couldn't be packaged because of the scattered, or rather collective, nature of the project (all parts have their own Github repositories)5, and how to generate the upstream tarball also remains a bit of a challenge (currently, it's the outcome of debops-init)6. I'll have this package in unstable soon; more info on Debops is coming up then.

HashiCorp's Packer
I'm very glad to announce that Packer7 is now available in unstable, and the RFP bug could finally be closed after I had taken it over8. It's another great and very convenient devops tool which does a lot of different things in an automated fashion, using only a single "one-argument" CLI tool in combination with a couple of lines in a configuration script (thanks to Yaroslav Halchenko for the tip). Packer helps to create machine images for different platforms, for example Debian installations in a Qemu box for testing or development purposes. Instead of setting up a new virtual machine manually, the same way as installing Debian on another computer, this process can be completely automated with Packer, as I've written about in this blog entry here9. You just need a template which contains instructions for the included Qemu builder, plus a preseeding script for the Debian installer, and there you go drinking your coffee while Packer does all the work: downloading the installation ISO image, creating the new virtual hard drive, booting the emulator, and running the whole installation process automatically (answering questions, selecting things, rebooting without the ISO image to complete the installation, and so on). A couple of minutes and you have a new pre-baked virtual machine image as if from a vending machine, and another fresh one can be created anytime.
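To give a rough idea of such a template (this is only a sketch under assumptions: the ISO URL, the checksum, the credentials and the preseed.cfg served from the http directory are placeholders, and option names can differ between Packer releases), it could look something like this:

  # debian-jessie.json -- heavily trimmed example template for the Qemu builder
  {
    "builders": [{
      "type": "qemu",
      "iso_url": "http://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-8.4.0-amd64-netinst.iso",
      "iso_checksum_type": "sha256",
      "iso_checksum": "replace-with-the-real-sha256",
      "http_directory": "preseed",
      "boot_command": ["<esc> auto url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/preseed.cfg <enter>"],
      "ssh_username": "root",
      "ssh_password": "not-a-real-password",
      "ssh_wait_timeout": "30m",
      "shutdown_command": "poweroff"
    }]
  }

  # check the template and kick off the build
  packer validate debian-jessie.json
  packer build debian-jessie.json

The preseed directory would contain the preseed.cfg with the answers for the Debian installer, which Packer serves over a temporary HTTP server during the installation.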
Packer10 supports a number of builders for different target platforms (desktop virtualization solutions as well as public cloud providers and private cloud software), can build in parallel, and the full range of common provisioners can be employed in the process to equip the newly installed systems with services and programs. Vagrant boxes can be generated by one of the included postprocessors. I'll write more on Packer here on this blog soon. More than two dozen packages were missing to complete Packer11; getting them all in is the achievement of combined forces within the pkg-go group. Many thanks especially to Alexandre Viau, who has worked on most of the needed new packages. Thanks also to the FTP masters, who were always very quick in reviewing the Go packages, so that the new packages depending on them could be built and packaged consecutively.

Squirrel3
I didn't do the major part of the work on this one and just sponsored it for Fabian Wolff, but I want to highlight that there's a new package of Squirrel12 now available in Debian13. Squirrel is a lightweight scripting language, somewhat comparable to Lua. It's fully object-oriented and highly embeddable; it's used under the hood in a lot of commercial computer games for implementing intelligence for bots, next to other things14, but also for the Internet of Things (it's embedded in hardware from Electric Imp). Squirrel functions can be called from C++15. I had filed an ITP bug for Squirrel back in 2011 (#651195), but something else always had a higher priority, and it ended up being an RFP. I'm really glad that it got picked up and completed quickly afterwards.

misc
There were a couple of uploads of updated upstream tarballs and bug fixes, namely afl/2.10b-1 and 2.11b-1, python-afl/0.5.3-1, pyutilib/5.3.2-1, pyomo/4.3.11327-1, libvigraimpex/1.10.0+git20160211.167be93dfsg-2 (fix of #820429, thanks to Tobias Frost), and gamera/3.4.2+svn1454-1. For the pkg-go group, I've set up a new package of github-mitchellh-ioprogress (which is needed by the official DigitalOcean CLI tool doctl, now RFP #807956 instead of ITP due to lack of time; again, a lot of packages are still missing for that), and provided a little patch for dh-make-golang updating some standards16. For Packer I've also updated azure-go-autorest and azure-sdk as team uploads (#821938, #821832), but it turned out that the project, which is currently under heavy development towards a new official release, has broken a lot in the past weeks (no Git branching has been used), so that Packer in fact needed a vendored snapshot, although there have been only a couple of commits in between. Docker-registry has the same problem with the new package of azure-sdk/2.1.1~beta1, so that needed to be fixed, too (#822146). By the way, the tool ratt17 comes in very handy for automatically test-building all reverse dependencies, not only for Go packages (thanks to Tianon Gravi for the tip). Finally, I've posted the needed reverse dependencies as RFP bugs for Terraform18 (again quite a lot), Vuls19, and cve-dictionary20, which is needed for Vuls. I'll let them rest a while, waiting for them to get picked up, before working any of them down myself.
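As announced above, here is a minimal sketch of the Debops workflow described in this post. The host names are placeholders, the group names are the ones mentioned above, and the exact inventory layout may differ between Debops versions:

  # create a new project
  debops-init foo

  # add your servers to foo/ansible/inventory/hosts, assigned to service groups, e.g.:
  [debops_gitlab]
  gitlab.example.org

  [debops_mariadb_server]
  db1.example.org

  # then run the playbooks against the inventory from inside the project directory
  cd foo
  debops

Since Ansible works agentless, SSH access to the listed hosts is essentially all that is needed, which is why this can be used against freshly installed servers right away.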

12 April 2016

Reproducible builds folks: Reproducible builds: week 49 in Stretch cycle

What happened in the reproducible builds effort between March 27th and April 2nd: Toolchain fixes Packages fixed The following packages have become reproducible due to changes in their build dependencies: ctioga2, erlang-bitcask, libcommons-collections3-java, libjgoodies-animation-java, libjide-oss-java, octave-gsl, octave-interval, octave-io, octave-quaternion, octave-signal, octave-stk, segment, service-wrapper-java, sqlline, svnkit, uddi4j, velocity-tools. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: tests.reproducible-builds.org The i386 builders are now testing packages on i386 for reproducibility. It will probably take 4 weeks until everything has been built twice on this arch. (h01ger) Package reviews 52 reviews have been removed, 24 added and 4 updated in the previous week. Chris Lamb reported 13 new FTBFS bugs. New issue: copyright_year_in_comments_generated_by_ckbuilder. Misc. This week's edition was mostly written by Lunar, with some help by Reiner Herrmann and h01ger.

Reproducible builds folks: Reproducible builds: week 48 in Stretch cycle

What happened in the reproducible builds effort between March 20th and March 26th: Toolchain fixes Daniel Kahn Gillmor worked on removing build paths from debug symbols by submitting a patch adding -fdebug-prefix-map to clang to match GCC, another patch against gcc-5 to backport the removal of -fdebug-prefix-map from DW_AT_producer, and finally by proposing the addition of a normalizedebugpath feature to the reproducible feature set of dpkg-buildflags that would use -fdebug-prefix-map to replace the current directory with "." in the recorded debug paths. Sergey Poznyakoff merged the --clamp-mtime option so that it will be featured in the next Tar release. This option is likely to be used by dpkg-deb to implement deterministic mtimes for packaged files. Packages fixed The following packages have become reproducible due to changes in their build dependencies: augeas, gmtkbabel, ktikz, octave-control, octave-general, octave-image, octave-ltfat, octave-miscellaneous, octave-mpi, octave-nurbs, octave-octcdf, octave-sockets, octave-strings, openlayers, python-structlog, signond. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: tests.reproducible-builds.org i386 build nodes have been set up by converting 2 of the 4 amd64 nodes to i386. (h01ger) Package reviews 92 reviews have been removed, 66 added and 31 updated in the previous week. New issues: timestamps_generated_by_xbean_spring, timestamps_generated_by_mangosdk_spiprocessor. Chris Lamb filed 7 FTBFS bugs. Misc. On March 20th, Chris Lamb gave a talk at FOSSASIA 2016 in Singapore. The very same day, but a few timezones apart, h01ger did a presentation at LibrePlanet 2016 in Cambridge, Massachusetts. Seven GSoC/Outreachy applications were made by potential interns to work on various aspects of the reproducible builds effort. On top of interacting with several applicants, prospective mentors gathered to review the applications.

27 March 2016

Lunar: Reproducible builds: week 48 in Stretch cycle

What happened in the reproducible builds effort between March 20th and March 26th:

Toolchain fixes
  • Sebastian Ramacher uploaded breathe/4.2.0-1, which makes its output deterministic. Original patch by Chris Lamb, merged upstream.
  • Rafael Laboissiere uploaded octave/4.0.1-1, which allows packages to be built in place and avoids unreproducible builds due to temporary build directories appearing in the .oct files.
Daniel Kahn Gillmor worked on removing build paths from debug symbols by submitting a patch adding -fdebug-prefix-map to clang to match GCC, another patch against gcc-5 to backport the removal of -fdebug-prefix-map from DW_AT_producer, and finally by proposing the addition of a normalizedebugpath feature to the reproducible feature set of dpkg-buildflags that would use -fdebug-prefix-map to replace the current directory with "." in the recorded debug paths. As a successful result of lobbying at LibrePlanet 2016, the --clamp-mtime option will be featured in the next Tar release. This option is likely to be used by dpkg-deb to implement deterministic mtimes for packaged files.
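Just to illustrate the two mechanisms mentioned here (a generic sketch, not taken from the actual patches): -fdebug-prefix-map rewrites the build path that would otherwise be recorded in the DWARF debug information, and tar's --clamp-mtime caps the timestamps of newer files at the value given with --mtime:

  # record "." instead of the absolute build directory in the debug information
  gcc -g -fdebug-prefix-map=$(pwd)=. -c hello.c -o hello.o

  # files newer than the given date get clamped to it, older files keep their timestamps
  tar --mtime='2016-03-26' --clamp-mtime -cf output.tar ./build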

Packages fixed The following packages have become reproducible due to changes in their build dependencies: augeas, gmtkbabel, ktikz, octave-control, octave-general, octave-image, octave-ltfat, octave-miscellaneous, octave-mpi, octave-nurbs, octave-octcdf, octave-sockets, octave-strings, openlayers, python-structlog, signond. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #818742 on milkytracker by Reiner Herrmann: sorts the list of source files.
  • #818752 on tcl8.4 by Reiner Herrmann: sort source files using C locale.
  • #818753 on tk8.6 by Reiner Herrmann: sort source files using C locale.
  • #818754 on tk8.5 by Reiner Herrmann: sort source files using C locale.
  • #818755 on tk8.4 by Reiner Herrmann: sort source files using C locale (see the short generic sketch after this list).
  • #818952 on marionnet by ceridwen: dummy out build date and uname to make build reproducible.
  • #819334 on avahi by Reiner Herrmann: ship upstream changelog instead of the one generated by gettextize (although duplicate of #804141 by Santiago Vila).
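Several of the patches listed above boil down to the same idiom: file lists which end up in the build output have to be sorted with a fixed locale, otherwise their order depends on the build machine. A generic sketch of such a fix (not the actual patches) looks like this:

  # without LC_ALL=C the sort order depends on the build machine's locale settings
  SOURCES=$(find src -name '*.c' | LC_ALL=C sort)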

tests.reproducible-builds.org i386 build nodes have been set up by converting 2 of the 4 amd64 nodes to i386. (h01ger)

Package reviews 92 reviews have been removed, 66 added and 31 updated in the previous week. New issues: timestamps_generated_by_xbean_spring, timestamps_generated_by_mangosdk_spiprocessor. Chris Lamb filed 7 FTBFS bugs.

Misc. On March 20th, Chris Lamb gave a talk at FOSSASIA 2016 in Singapore. The very same day, but a few timezones apart, h01ger did a presentation at LibrePlanet 2016 in Cambridge, Massachusetts. Seven GSoC/Outreachy applications were made by potential interns to work on various aspects of the reproducible builds effort. On top of interacting with several applicants, prospective mentors gathered to review the applications. Huge thanks to Linda Naeun Lee for the new hackergotchi visible on Planet Debian.
