Search Results: "dnn"

2 September 2020

Norbert Preining: Multiple GPUs for graphics and deep learning

For a long time I had been using a good old nvidia GeForce GTX 1050 for my display and deep learning needs. I reported a few times how to get Tensorflow running on Debian/Sid, see here and here. Later on I switched to an AMD GPU in the hope that an open source approach to both the GPU driver and deep learning (ROCm) would improve the general experience. Unfortunately it turned out that AMD GPUs are generally not ready for deep learning usage. The problems with AMD and ROCm are far and wide. First of all, it seems that for anything more complicated than simple stuff, AMD's flagship RX 5700(XT) and all GFX10 (Navi) based cards are not(!!!) supported in ROCm. Yes, you read that correctly: AMD does not support 5700(XT) cards in the ROCm stack. Some simple stuff works, but nothing for real computations. Then, even IF they were supported, ROCm as distributed is currently a huge pain in the butt. The source code is a huge mess, and building usable packages from it is probably possible, but quite painful (I am a member of the ROCm packaging team in Debian, and have tried for many hours). And the packages provided by AMD are not installable on Debian/sid due to library incompatibilities. That left me with a bit of a problem: for work I need to train quite a few neural networks, do model selection, etc. Doing this on a CPU is a bit of a burden. So in the end I decided to put the nVidia card back into the computer (well, after moving it to a bigger case, but that is a different story to tell). Here are the steps I took to get both cards working for their respective targets: the AMD GPU for driving the console and X (and games!), and the nVidia card doing the deep learning stuff (tensorflow using the GPU).

Starting point

The starting point was a working AMD GPU installation. The AMD GPU is also the first GPU card (top slot) and thus the one that is used by the BIOS and the Linux console. If you want the video output on the second card you need tricks, and probably don't have console output, etc. So not a solution for me.

Installing libcuda1 and the nvidia kernel drivers

The next step was installing the libcuda1 package:
apt install libcuda1
This installs a lot of stuff, including the nvidia drivers, GLX libraries, the alternatives setup, and the update-glx tool and package. The kernel module should be built and installed automatically for your kernel.

Installing CUDA

Follow more or less the instructions here and do
wget -O- https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | sudo tee /etc/apt/trusted.gpg.d/nvidia-cuda.asc
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /" | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list
sudo apt-get update
sudo apt-get install cuda-libraries-10-1
Warning! At the moment Tensorflow packages require CUDA 10.1, so don't install the 10.0 version. This might change in the future! This will install lots of libraries into /usr/local/cuda-10.1 and add the respective directory to the ld.so path by creating a file /etc/ld.so.conf.d/cuda-10-1.conf.

Install CUDA CuDNN

One dependency that is difficult to satisfy are the CuDNN libraries. In our case we need the version 7 library for CUDA 10.1. To download these files one needs an NVIDIA developer account, which is quick and painless to set up. After that go to the CuDNN page, select Archived releases, then Download cuDNN v7.N.N (xxxx NN, YYYY), for CUDA 10.1, and then cuDNN Runtime Library for Ubuntu18.04 (Deb). At the moment (as of today) this will download a file libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb which needs to be installed with dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb.
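A quick way to confirm that the runtime library is then found on the loader path is to try loading it from Python (a minimal sketch; the soname libcudnn.so.7 is an assumption derived from the libcudnn7 package above):

import ctypes

# try to dlopen the cuDNN runtime library installed by the libcudnn7 package
try:
    ctypes.CDLL("libcudnn.so.7")
    print("cuDNN runtime library found")
except OSError as error:
    print("cuDNN not found:", error)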
Updating the GLX setting

Now comes the very interesting part: one needs to set up the GLX libraries. Reading the output of update-glx --help and then the output of update-glx --list glx:
$ update-glx --help
update-glx is a wrapper around update-alternatives supporting only configuration
of the 'glx' and 'nvidia' alternatives. After updating the alternatives, it
takes care to trigger any follow-up actions that may be required to complete
the switch.
 
It can be used to switch between the main NVIDIA driver version and the legacy
drivers (eg: the 304 series, the 340 series, etc).
 
For users with Optimus-type laptops it can be used to enable running the discrete
GPU via bumblebee.
 
Usage: update-glx <command>
 
Commands:
  --auto <name>            switch the master link <name> to automatic mode.
  --display <name>         display information about the <name> group.
  --query <name>           machine parseable version of --display <name>.
  --list <name>            display all targets of the <name> group.
  --config <name>          show alternatives for the <name> group and ask the
                           user to select which one to use.
  --set <name> <path>      set <path> as alternative for <name>.
 
<name> is the master name for this link group.
  Only 'nvidia' and 'glx' are supported.
<path> is the location of one of the alternative target files.
  (e.g. /usr/lib/nvidia)
 
$ update-glx --list glx
/usr/lib/mesa-diverted
/usr/lib/nvidia
I was tempted to use
update-glx --config glx /usr/lib/mesa-diverted
because in the end the Mesa GLX libraries should be the ones driving the display via the AMD GPU. Unfortunately, with this setting the nvidia kernel module was not loaded, nvidia-persistenced couldn't run because the library libnvidia-cfg1 wasn't found (not sure it was needed at all), and with that there was also no way to run tensorflow on the GPU. So instead I tried
update-glx --auto glx
(which is the same as update-glx --config glx /usr/lib/nvidia), rebooted, and decided to check afterwards what was broken. To my big surprise, the AMD GPU still worked out of the box, including direct rendering, and the games I tried (Overload, Supraland via Wine) all worked without a hitch. Not that I really understand why the GLX libraries that are seemingly now in use are from nvidia but work the same (if anyone has an explanation, that would be great!), but since I haven't had any problems so far, I am content.

Checking GPU usage in tensorflow

Make sure that you remove tensorflow-rocm and reinstall tensorflow with GPU support:
pip3 uninstall tensorflow-rocm
pip3 install --upgrade tensorflow-gpu
After that a simple
$ python3 -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
....(lots of output)
2020-09-02 11:57:04.673096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3581 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
tf.Tensor(1093.4915, shape=(), dtype=float32)
$
should indicate that the GPU is used by tensorflow! The R Keras package should also work out of the box and pick up the system-wide tensorflow, which in turn picks up the GPU; see this post for example code to run for tests.

Conclusion

All in all it was easier than expected, despite the dances one has to do for nvidia to get the correct libraries. What still puzzles me is the selection option in update-glx, which might need better support for secondary nvidia GPU cards.

15 February 2017

Daniel Stender: APT programming snippets for Debian system maintenance

The Python API for the Debian package manager APT is useful for writing practical system maintenance scripts that go beyond shell scripting capabilities. There are Python 2 and Python 3 libraries for it available as packages, as well as documentation in the package python-apt-doc. If that is also installed, the documentation can be found in /usr/share/doc/python-apt-doc/html/index.html, and there are also a couple of example scripts shipped in /usr/share/doc/python-apt-doc/examples. The libraries mainly consist of Python bindings for the libapt-inst and libapt-pkg C++ core libraries of the APT package manager, which makes processing very fast. Debugging symbols are also available as packages (python{,3}-apt-dbg). The module apt_inst provides features like reading from binary packages, while apt_pkg covers the functionality of the package manager itself. There is also the apt abstraction layer which provides more convenient access to the library; for example, apt.cache.Cache() can be used to behave like apt-get:
from apt.cache import Cache
mycache = Cache()
mycache.update()                   # apt-get update
mycache.open()                     # re-open
mycache.upgrade(dist_upgrade=True) # apt-get dist-upgrade
mycache.commit()                   # apply
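The same abstraction layer is also handy for read-only queries; a minimal sketch (the package name 'nano' is just an example):

import apt

cache = apt.Cache()
pkg = cache['nano']

# installed state and candidate version, as apt sees them
print(pkg.is_installed)
print(pkg.candidate.version if pkg.candidate else None)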

boil out selections

As is widely known, there is a feature of dpkg which helps to move a package inventory from one installation to another by just using a text file with a list of installed packages. A selections list containing all installed packages can easily be generated with $ dpkg --get-selections > selections.txt. The resulting file looks something like this:
$ cat selections.txt
0ad                                 install
0ad-data                            install
0ad-data-common                     install
a2ps                                install
abi-compliance-checker              install
abi-dumper                          install
abigail-tools                       install
accountsservice                     install
acl                                 install
acpi                                install
The counterpart to this operation (--set-selections) can be used to reinstall (add) the complete package inventory on another installation or computer (that needs superuser rights), as explained in the manpage dpkg(1). No problem so far. The problem is that if the list contains a package which cannot be found in any of the package sources set up in /etc/apt/sources.list(.d/) on the target system, dpkg stops the whole process:
# dpkg --set-selections < selections.txt
dpkg: warning: package not in database at line 524: google-chrome-beta
dpkg: warning: found unknown packages; this might mean the available database
is outdated, and needs to be updated through a frontend method
Thus, manually downloaded and installed "wild" packages from unofficial package sources are problematic for this approach, because the package installer simply doesn't know where to get them. Luckily, dpkg prints out the relevant package names, but instead of removing them manually with an editor, this little Python script for python3-apt automatically deletes any such packages from a selections file:
#!/usr/bin/env python3
import sys
import apt_pkg
apt_pkg.init()
cache = apt_pkg.Cache()
infile = open(sys.argv[1])
outfile_name = sys.argv[1] + '.boiled'
outfile = open(outfile_name, "w")
for line in infile:
    package = line.split()[0]
    if package in cache:
        outfile.write(line)
infile.close()
outfile.close()
sys.exit(0)
The script takes one argument, the name of the selections file which has been generated by dpkg. The low-level module apt_pkg first has to be initialized with apt_pkg.init(). Then apt_pkg.Cache() can be used to instantiate a cache object (here: cache). That object is iterable, so it's easy to skip a line when the package named in it cannot be found in the database, i.e. not copy it into the outfile (.boiled), while all other lines are copied. The result then looks something like this:
$ diff selections.txt selections.txt.boiled 
3780d3779
< python-timemachine   install
4438d4436
< wlan-supercracker    install
That script might also be useful for moving from one distribution or derivative to another (like from Ubuntu to Debian). For productive use, the open() calls should of course be guarded against FileNotFoundError and IOError to prevent the program from crashing on such events, as sketched below.
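A minimal sketch of that hardening (hypothetical error handling around the script's file handling, not part of the original script):

import sys

try:
    infile = open(sys.argv[1])
    outfile = open(sys.argv[1] + '.boiled', "w")
except (FileNotFoundError, IOError) as error:
    # bail out with a readable message instead of a traceback
    sys.exit("cannot open file: {}".format(error))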

purge rc-s

As is also widely known, removed packages leave stuff like configuration files, maintainer scripts and logs behind on the computer, so it can be reused if the package gets reinstalled at some point in the future. That happens when dpkg has been used with -r/--remove instead of -P/--purge, which also removes those otherwise left-over files. Such packages are then marked as rc in the package database, like:
$ dpkg -l | grep ^rc
rc  firebird2.5-common          2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers and clients
rc  firebird2.5-server-common   2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers
rc  firebird3.0-common          3.0.1.32609.ds4-8   all     common files for firebird 3.0 server, client and utilities
rc  imagemagick-common          8:6.9.6.2+dfsg-2    all     image manipulation programs -- infrastructure dummy package
They can be purged afterwards to completely remove them from the system. There are several shell snippets to be found on the net which complete this job automatically, like this one:
dpkg -l   grep "^rc"   sed  e "s/^rc //"  e "s/ .*$//"   \
xargs dpkg  purge
The first thing needed to handle this in a Python script is the information that in apt_pkg the package state rc is by default represented by the code 5:
>>> testpackage = cache['firebird2.5-common']
>>> testpackage.current_state
5
For changing things in the database, an apt_pkg.DepCache() can be attached to a cache object to manipulate the installation state of a package within it, like marking it to be removed or purged:
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> mydepcache.mark_delete(testpackage, True) # True = purge
>>> mydepcache.marked_delete(testpackage)
True
That's basically all that is needed for an old-package purging maintenance script in Python 3; add another iterator as a package filter and there you go:
#!/usr/bin/env python3
import sys
import apt_pkg
from apt.progress.text import AcquireProgress
from apt.progress.base import InstallProgress
acquire = AcquireProgress()
install = InstallProgress()
apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)
for paket in cache.packages:
    if paket.current_state == 5:  # 5 = 'rc': removed, only config files remain
        depcache.mark_delete(paket, True)
depcache.commit(acquire, install)
The method DepCache.commit() applies the changes to the package database at the end, and it needs the acquire and install progress objects from apt.progress as arguments. Of course this script needs superuser rights to run. It then outputs something like this:
$ sudo ./rc-purge 
Reading package lists... Done
Building dependency tree
Reading state information... Done
Fetched 0 B in 0s (0 B/s)
custom fork found
got pid: 17984
got pid: 0
got fd: 4
(Reading database ... 701434 files and directories currently installed.)
Purging configuration files for libmimic0:amd64 (1.0.4-2.3) ...
Purging configuration files for libadns1 (1.5.0~rc1-1) ...
Purging configuration files for libreoffice-sdbc-firebird (1:5.2.2~rc2-2) ...
Purging configuration files for vlc-nox (2.2.4-7) ...
Purging configuration files for librlog5v5 (1.4-4) ...
Purging configuration files for firebird3.0-common (3.0.1.32609.ds4-8) ...
Purging configuration files for imagemagick-common (8:6.9.6.2+dfsg-2) ...
Purging configuration files for firebird2.5-server-common (2.5.6.27020.ds4-3)
It's not yet production ready (for example, there is an infinite loop if dpkg returns error code 1, like from "can't remove non-empty folder"). But generally, ATTENTION: be very careful with typos and other mistakes if you want to use that code snippet; a faulty script performing changes on the package database might destroy the integrity of your system, and you don't want that to happen.
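A safer first step is a dry run that only lists the rc packages without committing anything (a sketch using the same bindings; apt_pkg.CURSTATE_CONFIG_FILES is the named constant for state code 5):

#!/usr/bin/env python3
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()

# print what would be purged, but change nothing
for package in cache.packages:
    if package.current_state == apt_pkg.CURSTATE_CONFIG_FILES:
        print(package.name)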

detect wild packages

As said above, installed Debian packages might be called "wild" if they have been downloaded from somewhere on the net and installed manually, as is done from time to time on many systems. If you want to remove that whole class of packages again for any reason, the question is how to detect them. A characteristic is that there is no package source connected to such a package, and that can be detected by Python scripting, again using the bindings for the APT libraries. The package object doesn't have a method to query its source, because the origin is always connected to a specific package version (some specific version might have come from security updates, for example). The candidate version of a package can be queried with DepCache.get_candidate_ver(), which returns a complex apt_pkg.Version object:
>>> import apt_pkg
>>> apt_pkg.init()
>>> mycache = apt_pkg.Cache()
Reading package lists... Done
Building dependency tree
Reading state information... Done
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> testpackage = mydepcache.get_candidate_ver(mycache['nano'])
>>> testpackage
<apt_pkg.Version object: Pkg:'nano' Ver:'2.7.4-1' Section:'editors'  Arch:'amd64' Size:484790 ISize:2092032 Hash:33578 ID:31706 Priority:2>
Version objects provide the attribute file_list, which holds a list of tuples containing PackageFile objects:
>>> testpackage.file_list
[(<apt_pkg.PackageFile object: filename:'/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages'  a=testing,c=main,v=,o=Debian,l=Debian arch='amd64' site='httpredir.debian.org' IndexType='Debian Package Index' Size=38943764 ID:0>, 669901L)]
These file objects point to the index files associated with a specific package source (a downloaded package index), and they can be read out easily (using a for loop, because there could be multiple file objects):
>>> for files in testpackage.file_list:
...     print(files[0].filename)
/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages
That is self-explanatory: the nano binary package on this amd64 computer comes from httpredir.debian.org/debian testing main. If a package is wild, meaning it was installed manually, there is no associated index file to be found, only /var/lib/dpkg/status (libcudnn5 is not in the official package archives but distributed by NVIDIA as a .deb package):
>>> testpackage2 = mydepcache.get_candidate_ver(mycache['libcudnn5'])
>>> for files in testpackage2.file_list:
...     print(files[0].filename)
/var/lib/dpkg/status
The simple trick now is to find all packages which have only /var/lib/dpkg/status as associated file (this doesn't refer to what the packages contain), and not an index file representing a package source. There is a little pitfall: that is also true for virtual packages. But virtual packages commonly don't have an associated version (python-apt docs: "to check whether a package is virtual; that is, it has no versions and is provided at least once"), and that can be queried with Package.has_versions. A filter for packages that aren't virtual, are associated with exactly one system file, and where that file is /var/lib/dpkg/status, then goes like this:
for package in mycache.packages:
    if package.has_versions:
        version = mydepcache.get_candidate_ver(package)
        if len(version.file_list) == 1:
            if 'dpkg/status' in version.file_list[0][0].filename:
                print(package.name)
On my Debian testing system this prints out quite an interesting list. It lists all the wild packages like libcudnn5, but also packages which are currently not in testing because they have been temporarily removed by AUTORM due to release-critical bugs. Then there is all the obsolete stuff which was installed from the package archives once and then forgotten, like old kernel header packages ("obsolete packages" in dselect). So this snippet brings up other stuff, too, and should be considered somewhat experimental so far. A stand-alone version of the filter is sketched below.
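Put together as a self-contained script (same logic as the interactive session above, just with the initialization added):

#!/usr/bin/env python3
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)

# report non-virtual packages whose only associated file is /var/lib/dpkg/status
for package in cache.packages:
    if package.has_versions:
        version = depcache.get_candidate_ver(package)
        if len(version.file_list) == 1 and 'dpkg/status' in version.file_list[0][0].filename:
            print(package.name)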

29 October 2016

Jaldhar Vyas: Dawkins Weasel

Happy Dhanteras from Bappy Lahiri
It's already Dhanteras, so I had better pick up the pace if I want to meet my blogging challenge before Diwali. In this post I'll discuss a program I wrote earlier this year.
I dread to look up anything on Wikipedia because I always end up going down a rabbit hole and surfacing hours later on a totally unrelated topic. Case in point, some months ago, I ended up on the page of the title. This is an interesting little experiment illustrating how random selection can result in the evolution of a specific form. The algorithm is:

  1. Start with a random string of 28 characters.
  2. Make 100 copies of this string, with a 5% chance per character of that character being replaced with a random character.
  3. Compare each new string with "METHINKS IT IS LIKE A WEASEL", and give each a score (the number of letters in the string that are correct and in the correct position).
  4. If any of the new strings has a perfect score (== 28), halt.
  5. Otherwise, take the highest scoring string, and go to step 2.
I had to try this myself so I wrote a little implementation in C++. A sample run looks like this:
  
$ ./weasel
0000 DNCFICBLUZVC JF KKNVJJASCJRW (0)
0001 DNIFICOLUZVC JFLIKNVAJASCJEW (6)
0002 DNNWICKSUZVCRSFLIKNVA ASCJEL (11)
0003 DNNWICKSUZVCRSFLIKNVA ASCJEL (11)
0004 MNNVICKSQZVCRSFLIKNVA WSCJEL (13)
0005 MENVICKSQZVCRSFLIKNVA WSCJEL (14)
0006 MENVISKS ZTCRSFLIKNVA WLCJEL (16)
0007 MENVISKS ZTCRSFLIKNVA WLCJEL (16)
0008 MEDHISKS ZTCISFLIKNVA WLCJEL (18)
0009 MEDHISKS ZTCISFLIKNVA WLCJEL (18)
0010 MEDHISKS ZTCISFLIKNVA WLCJEL (18)
0011 MEDHISKS ZTCIS LIKTKA WLCZEL (19)
0012 MEDHISKS ZTCIS LIKTKA WLCZEL (19)
0013 MEDHISKS ZTCIS LIKT A WLCZEL (20)
0014 MEDHISKS ZTCIS LIKT A WLCZEL (20)
0015 MEDHISKS ZTCIS LIKE A WLAZEL (22)
0016 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0017 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0018 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0019 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0020 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0021 MEDHIGKS ITCIS LIKE A WLAZEL (23)
0022 METHINKS ITCIS LIKE A WLASEL (26)
0023 METHINKS ITCIS LIKE A WLASEL (26)
0024 METHINKS ITCIS LIKE A WLASEL (26)
0025 METHINKS ITCIS LIKE A WEASEL (27)
0026 METHINKS ITCIS LIKE A WEASEL (27)
0027 METHINKS ITCIS LIKE A WEASEL (27)
0028 METHINKS ITCIS LIKE A WEASEL (27)
0029 METHINKS ITCIS LIKE A WEASEL (27)
0030 METHINKS ITCIS LIKE A WEASEL (27)
0031 METHINKS ITCIS LIKE A WEASEL (27)
0032 METHINKS ITCIS LIKE A WEASEL (27)
0033 METHINKS ITCIS LIKE A WEASEL (27)
0034 METHINKS ITCIS LIKE A WEASEL (27)
0035 METHINKS ITCIS LIKE A WEASEL (27)
0036 METHINKS ITCIS LIKE A WEASEL (27)
0037 METHINKS ITCIS LIKE A WEASEL (27)
0038 METHINKS ITCIS LIKE A WEASEL (27)
0039 METHINKS ITCIS LIKE A WEASEL (27)
0040 METHINKS ITCIS LIKE A WEASEL (27)
0041 METHINKS ITCIS LIKE A WEASEL (27)
0042 METHINKS ITCIS LIKE A WEASEL (27)
0043 METHINKS ITCIS LIKE A WEASEL (27)
0044 METHINKS ITCIS LIKE A WEASEL (27)
0045 METHINKS ITCIS LIKE A WEASEL (27)
0046 METHINKS ITCIS LIKE A WEASEL (27)
0047 METHINKS ITCIS LIKE A WEASEL (27)
0048 METHINKS ITCIS LIKE A WEASEL (27)
0049 METHINKS ITCIS LIKE A WEASEL (27)
0050 METHINKS ITCIS LIKE A WEASEL (27)
0051 METHINKS ITCIS LIKE A WEASEL (27)
0052 METHINKS ITCIS LIKE A WEASEL (27)
0053 METHINKS ITCIS LIKE A WEASEL (27)
0054 METHINKS IT IS LIKE A WEASEL (28)

My program lets you adjust the input string, the number of copies, and the mutation threshold. I also thought it might be interesting to implement the Generator design pattern. In C++ this is done by making a class which implements begin() and end() methods and at least a forward iterator. You can find the source code on GitHub.
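For comparison, the algorithm described above is compact enough to sketch in Python as well (this is not the author's C++ implementation; the parameters follow the numbered list above):

import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "
COPIES = 100   # copies per generation
RATE = 0.05    # per-character mutation chance

def mutate(parent):
    return "".join(random.choice(ALPHABET) if random.random() < RATE else c
                   for c in parent)

def score(candidate):
    return sum(a == b for a, b in zip(candidate, TARGET))

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while score(parent) < len(TARGET):
    parent = max((mutate(parent) for _ in range(COPIES)), key=score)
    generation += 1
    print("%04d %s (%d)" % (generation, parent, score(parent)))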

30 August 2016

Daniel Stender: My work for Debian in August

Here's again a little list of my humble off-time contributions which I'm happy to add to the large amount of work we're completing all together each month. Then there is one more "new in Debian" (meaning: "new in unstable") announcement. First, the uploads (a few of them are from July): New packages: Sponsored uploads: Requested or suggested for packaging:

New in Debian: Lasagne (deep learning framework)

Now that the mathematical expression compiler Theano is available in Debian, deep learning frameworks and toolkits built on top of it can become available within Debian, too (like Blocks, mentioned before). Theano is a general computing engine of its own which has been developed with a focus on machine learning and neural networks, and which features its own declarative tensor language. The toolkits built upon it vary in how much they abstract the bare features of Theano, whether they are "thick" or "thin", so to say. When the abstraction gets higher you gain more end-user convenience, up to the level where the architectural components of neural networks are available for combination like in a Lego box, while the more complicated things going on "under the hood" (like how the networks are actually implemented) are hidden. The downside is that thick abstraction layers usually make it difficult to implement novel features like custom layers or loss functions. So more experienced users and specialists might seek out the lower-abstraction toolkits, where you have to think more in terms of Theano. I've got an initial package of Keras in experimental (1.0.7-1); it runs (only a Python 3 package is available so far) but needs some more work (e.g. building the documentation with mkdocs). Keras is a minimalistic, highly modular DNN library inspired by Torch1. It has a clean, rather easy API for experimenting and fast prototyping. It can also run on top of Google's TensorFlow, and we're going to have it ready for that, too.

Lasagne follows a different approach. Like Keras and Blocks, it is a Python library to create and train multi-layered artificial neural networks in/on Theano for applications like image recognition and classification, speech recognition, image caption generation, or other purposes like style transfers from paintings to pictures2. It abstracts Theano as little as possible, and could be seen rather as an extension or an add-on than an abstraction3. Therefore, knowledge of how things work in Theano is needed to make full use of this piece of software. With the new Debian package (0.1+git20160728.8b66737-1)4, the whole required software stack (the corresponding Theano package, NumPy, SciPy, a BLAS implementation, and the nvidia-cuda-toolkit and NVIDIA kernel driver to carry out computations on the GPU5) can be installed most conveniently by a single apt-get install python{,3}-lasagne command6, if wanted together with the documentation package lasagne-doc for offline use (no running around remote airports looking for a WiFi spot), either in the Python 2 or the Python 3 flavour, or both altogether7. While others have to spend a whole weekend gathering, compiling and installing the needed libraries, you can grab yourself a fresh cup of coffee. These are the advantages of a fully integrated system (sublime message, as always: desktop users, switch to Linux!). When the installation of the packages has completed, the MNIST example of Lasagne can be used for a quick check that the whole library stack works properly8:
$ THEANO_FLAGS=device=gpu,floatX=float32 python /usr/share/doc/python-lasagne/examples/mnist.py mlp 5
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN 5005)
Loading data...
Downloading train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz
Downloading t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz
Building model and compiling functions...
Starting training...
Epoch 1 of 5 took 2.488s
  training loss:        1.217167
  validation loss:      0.407390
  validation accuracy:      88.79 %
Epoch 2 of 5 took 2.460s
  training loss:        0.568058
  validation loss:      0.306875
  validation accuracy:      91.31 %
This example of how to train a neural network on the MNIST database of handwritten digits is quite refined (it also provides --help) and is explained in detail in the Tutorial section of the documentation in /usr/share/doc/lasagne-doc/html/. Very good starting points are also the IPython notebooks available from the tutorials by Eben Olson9 and by Geoffrey French at PyData London 201610. There you find Theano basics, examples of employing convolutional neural networks (CNN) and recurrent neural networks (RNN) for a range of different purposes, how to use pre-trained networks for image recognition, and more.
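To give an impression of how thin Lasagne's abstraction is, here is a minimal sketch of a model definition (hypothetical layer sizes; everything around the layer stack, loss and updates, is plain Theano):

import theano
import theano.tensor as T
import lasagne

# symbolic inputs: a batch of flattened 28x28 images and integer class labels
X = T.matrix('X')
y = T.ivector('y')

# a small multi-layer perceptron, stacked layer by layer
l_in = lasagne.layers.InputLayer(shape=(None, 784), input_var=X)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=100,
                                  nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hid, num_units=10,
                                  nonlinearity=lasagne.nonlinearities.softmax)

# the rest is ordinary Theano: loss expression, update rule, compiled function
prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)
train_fn = theano.function([X, y], loss, updates=updates)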

  1. For a quick comparison of Keras and Lasagne with other toolkits, see Alex Rubinsteyn's PyData NYC 2015 presentation on using LSTM (long short term memory) networks on varying length sequence data like Grimm's fairy tales (https://www.youtube.com/watch?v=E92jDCmJNek 27:30 sq.)
  2. https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer
  3. Great introduction to Theano and Lasagne by Eben Olson on the PyData NYC 2015: https://www.youtube.com/watch?v=dtGhSE1PFh0
  4. The package is currently "freelancing" in collab-maint; setting up a deep learning packaging team within Debian is at the stage of discussion.
  5. Only available for amd64 and ppc64el.
  6. You would need "testing" as a package source in /etc/apt/sources.list to install it from the archive at the present time (I have had that for years, but whether Debian testing can be recommended as a production system is to be discussed elsewhere); it's coming up for Debian 9, though. The cuda-toolkit and pycuda are in the non-free section of the archive, thus non-free (mostly used in combination with contrib) must be added to main. Plus, the CUDA toolkit is a mere suggestion of the Theano packages (to keep Theano in main), so --install-suggests is needed to pull it automatically with the same command, or it must be given explicitly.
  7. For dealing with Theano in Debian, see this previous blog posting
  8. As suggested in the guide From Zero to Lasagne on Ubuntu 14.04. cuDNN isn't available as an official Debian package yet, but it can be downloaded as a .deb package after registration at https://developer.nvidia.com/cudnn. It integrates well out of the box.
  9. https://github.com/ebenolson/pydata2015
  10. https://github.com/Britefury/deep-learning-tutorial-pydata2016, video: https://www.youtube.com/watch?v=DlNR1MrK4qE

20 July 2016

Daniel Stender: Theano in Debian: maintenance, BLAS and CUDA

I'm glad to announce that we have the current release of Theano (0.8.2) in Debian unstable now; it's on its way into the testing branch and the Debian derivatives, heading for Debian 9. The Debian package is maintained on behalf of the Debian Science Team. We have a binary package with the modules in the Python 2.7 import path (python-theano), if you want or need to stick to that branch a little longer (as a matter of fact, in the current popcon stats it's the most installed of the packages), and a package for the default Python 3 version (python3-theano). The comprehensive documentation is available for offline usage in another binary package (theano-doc). Although Theano builds its extensions at run time and therefore all binary packages contain the same code, the source package generates architecture-specific packages1 so that the exhaustive test suite can run on all architectures and detect whether there are problems somewhere (#824116).

what's this?

In a nutshell, Theano is a computer algebra system (CAS) and expression compiler, implemented in Python as a library. It is named after a classical Greek female mathematician, and it is developed at the LISA lab (located at MILA, the Montreal Institute for Learning Algorithms) at the Université de Montréal. Theano tightly integrates multi-dimensional arrays (N-dimensional, ND-array) from NumPy (numpy.ndarray), which are broadly used in Scientific Python for the representation of numeric data. It features a declarative Python-based language with symbolic operations for the functional definition of mathematical expressions, which allows creating functions that compute values for them. Internally the expressions are represented as directed graphs with nodes for variables and operations. The internal compiler then optimizes those graphs for stability and speed, and generates high-performance native machine code to evaluate and compute these mathematical expressions2.

One of the main features of Theano is that it is also capable of computing on GPUs (graphics processing units), i.e. on ordinary graphics cards (e.g. the developers are using a GeForce GTX Titan X for benchmarks). Today's GPUs have become very powerful parallel floating point devices which can be employed for scientific computations instead of 3D video games3. The acronym "GPGPU" (general-purpose graphics processing unit) refers to special cards like NVIDIA's Tesla4, which can be used alike (more on that below). Thus, Theano is a high-performance number cruncher with a computing engine of its own which can be used for large-scale scientific computations. If you haven't come across Theano as a Pythonistic professional mathematician, it is also one of the most prevalent frameworks for implementing deep learning applications (training multi-layered, "deep" artificial neural networks, DNN) around5, and it has been developed with a focus on machine learning from the ground up. There are several higher-level user interfaces built on top of Theano (for DNNs there are Keras, Lasagne, Blocks, and others; for probabilistic programming in Python there is PyMC3). I'll seek for some of them to become available in Debian, too.
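A minimal example of that workflow (a sketch, not taken from the package documentation): declare symbolic variables, build an expression over them, and let Theano compile it into a callable function.

import theano
import theano.tensor as T

# declare a symbolic vector and an expression over it
x = T.dvector('x')
y = (x ** 2).sum()

# compile the expression graph into a callable function
f = theano.function([x], y)
print(f([1.0, 2.0, 3.0]))  # -> 14.0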
helper scripts

Both binary packages ship three convenience scripts: theano-cache, theano-test, and theano-nose. Instead of them being copied into /usr/bin, which would result in a binaries-have-conflict violation, the scripts are to be found in /usr/share/python-theano (python3-theano respectively), so that both module packages of Theano can be installed at the same time. The scripts can be run directly from these folders, e.g. with $ python /usr/share/python-theano/theano-nose. If you're going to use them heavily, you could add the directory of the flavour you prefer (Python 2 or Python 3) to the $PATH environment variable manually, either by typing e.g. $ export PATH=/usr/share/python-theano:$PATH at the prompt, or by saving that line into ~/.bashrc. Manpages aren't available for these little helper scripts6, but you can always get info on what they do and which arguments they accept by invoking them with the -h flag (for theano-nose) or the help argument (for theano-cache).

running the tests

On some occasions you might want to run the test suite of the installed library, for example to check whether everything runs fine on your GPU hardware. There are two different ways to run the tests (either way you need python{,3}-nose installed). One is to launch the test suite by doing $ python -c 'import theano; theano.test()' (or the same with python3 to test the other flavour); that's what the helper script theano-test does as well. However, done that way some particular tests might fail by raising errors also for the group of known failures. Known failures are excluded from being errors if you run the tests with theano-nose, which is a wrapper around nosetests, so this might always be the better choice. You can run this convenience script with the option --theano on the installed library, or from the source package root, which you can pull with $ sudo apt-get source theano (there you also have the option to use bin/theano-nose). The script accepts options for nosetests, so you might run it with -v to increase verbosity.

For the tests the configuration switch config.device must be set to cpu. This will nevertheless include the GPU tests when a properly accessible device is detected, so it's a little misleading in the sense that it doesn't mean "run everything on the CPU". You're on the safe side if you always run it like this: $ THEANO_FLAGS=device=cpu theano-nose, in case you've set config.device to gpu in your ~/.theanorc. Depending on the available hardware and the BLAS implementation used (see below) it can take quite a long time to run the whole test suite; on the Core i5 in my laptop it takes around an hour, even with the GPU-related tests excluded (those perform pretty fast, though).

Theano features a couple of switches to manipulate the default configuration for optimization and compilation. There is a trade-off between optimization and compilation costs on the one hand and performance of the test suite on the other, and it turned out the test suite performs quicker with less graph optimization. There are two different switches available to control config.optimizer: fast_run toggles maximal optimization, while fast_compile runs only a minimal set of graph optimization features. These settings are used by the general mode switch config.mode, which is either FAST_RUN by default, or FAST_COMPILE. The default mode FAST_RUN (optimizer=fast_run, linker=cvm) needs around 72 minutes on my lower mid-level machine (on un-optimized BLAS). Setting mode=FAST_COMPILE (optimizer=fast_compile, linker=py) brings some boost for the test suite, which then runs through in 46 minutes.
The downside of that is that C code compilation is disabled in this mode by using the linker py, and the GPU-related tests are not included either. I've played around with using the optimizer fast_compile with some of the other linkers (c|py and cvm, and their versions without garbage collection) as an alternative to FAST_COMPILE, i.e. with minimal optimization but with machine code compilation and GPU testing included. But in my experience, fast_compile with any linker other than py results in some new errors and failures of some tests on amd64, and this might also be the case on other architectures. By the way, another useful feature is DebugMode for config.mode, which verifies the correctness of all optimizations and compares the C results to the Python results. If you want detailed info on the configuration settings of Theano, do $ python -c 'import theano; print theano.config' | less and check out the chapter on config in the library documentation.

cache maintenance

Theano isn't a JIT (just-in-time) compiler like Numba, which generates native machine code in memory and executes it immediately; instead it saves the generated native machine code into compiledirs. The reason for doing it that way is quite practical, as the docs explain: the persistent cache on disk makes it possible to avoid generating code for the same operation, and to avoid compiling again when different operations generate the same code. The compiledirs by default are located within $(HOME)/.theano/. After some time the folder becomes quite large, and might look something like this:
$ ls ~/.theano
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.11+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12rc1-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.1+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2rc1-64
If the Python version in use changed, like in this example, you might want to purge the obsolete cache directories. For working with the cache and the compiledirs, the helper theano-cache comes in handy. If you invoke it without any arguments, the current cache location is printed, like ~/.theano/compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64 (the script is run from /usr/share/python-theano). So the compiledirs for the old Python versions in this example (11+ and 12rc1) can be removed to free the space they occupy. All compiledirs, meaning the whole cache, can be erased with $ theano-cache basecompiledir purge; the effect is the same as performing $ rm -rf ~/.theano. You might want to do that e.g. if you're using different hardware, like when you got yourself another graphics card, or habitually from time to time when the compiledirs fill up so much that it slows down processing, with the hard disk being busy all the time, if you don't have an SSD drive available. For example, the disk space of build chroots carrying (mainly) the tests completely compiled through on default Python 2 and Python 3 amounts to around 1.3 GB (see here).

BLAS implementations

Theano needs a level 3 implementation of BLAS (Basic Linear Algebra Subprograms) for operations between vectors (one-dimensional mathematical objects) and matrices (two-dimensional objects) carried out on the CPU. NumPy is already built on BLAS and pulls in the standard implementation (libblas3, source package: lapack), but Theano links directly against it instead of using NumPy as an intermediate layer, to reduce the computational overhead. For this, Theano needs development headers, and the binary packages pull in libblas-dev by default, unless a development package of another BLAS implementation (like OpenBLAS or ATLAS) is already installed, or pulled in with them (providing the virtual package libblas.so). The linker flags can be manipulated directly through the configuration switch config.blas.ldflags, which is by default set to -L/usr/lib -lblas -lblas. By the way, if you set it to an empty value, Theano falls back to using BLAS through NumPy, if you want that for some reason. On Debian, there is a very convenient way to switch between BLAS implementations via the alternatives mechanism. If you have several alternative implementations installed at the same time, you can switch from one to another easily by just doing:
$ sudo update-alternatives --config libblas.so
There are 3 choices for the alternative libblas.so (providing /usr/lib/libblas.so).
  Selection    Path                                  Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libblas.so      40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so   35        manual mode
  2            /usr/lib/libblas/libblas.so            10        manual mode
  3            /usr/lib/openblas-base/libblas.so      40        manual mode
Press <enter> to keep the current choice[*], or type selection number:
The implementations perform differently on different hardware, so you might want to take the time to compare which one does best on your processor (the other packages are libatlas-base-dev and libopenblas-dev), and choose that one to optimize your system. If you want to squeeze out everything there is for carrying out Theano's computations on the CPU, another option is to compile an optimized version of a BLAS library especially for your processor; I'm going to write another blog posting on this issue. The binary packages of Theano ship the script check_blas.py to check how well a BLAS implementation performs with it, and whether everything works right. The script is located in the misc subfolder of the library; you can locate it by doing $ dpkg -L python-theano | grep check_blas (or for the package python3-theano accordingly), and run it with the Python interpreter. By default the script puts out a lot of info, like a huge performance comparison reference table, the current setting of blas.ldflags, the compiledir, the setting of floatX, OS information, the GCC version, the current NumPy configuration towards BLAS, the NumPy location and version, whether Theano linked directly or used the NumPy binding, and finally and most importantly, the execution time. If just the execution time for quick performance comparisons is needed, the script can be invoked with -q.

Theano on CUDA

The function compiler of Theano works with alternative backends to carry out the computations, like the ones for graphics cards. Currently, there are two different backends for GPU processing available: one docks onto NVIDIA's CUDA (Compute Unified Device Architecture) technology7, the other onto libgpuarray, which is developed by the Theano developers in parallel. The libgpuarray library is an interesting alternative for Theano; it's a GPU tensor (multi-dimensional mathematical object) array written in C with Python bindings based on Cython, which has the advantage of also running on OpenCL8. OpenCL, unlike CUDA9, is fully free software, vendor neutral, and overcomes the limitation of the CUDA toolkit being only available for amd64 and the ppc64el port (see here). I've opened an ITP on libgpuarray and we'll see if and how this works out. Another reason why it would be great to have it available is that CUDA currently looks like it runs into problems with GCC 610. More on that soon. Here's a little checklist for setting up your CUDA device, so that you don't have to experience something like this:
$ THEANO_FLAGS=device=gpu,floatX=float32 python ./cat_dog_classifier.py 
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
hardware check

For running Theano on CUDA you need an NVIDIA graphics card which is capable of doing that. You can recheck whether your device is supported by CUDA here. If the hardware isn't too old (CUDA support started with the GeForce 8 and Quadro X series) or too exotic, I think it fails to work only in exceptional cases. You can check your model, and whether the device is present in the system at the bare hardware level, by doing this:
$ lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
If a line like this doesn't get returned, your device most probably is broken or not properly connected (ouch). If rev ff appears at the end of the line, the device is off, meaning powered down. This might happen if you have a laptop with Optimus graphics hardware and the related drivers have switched off the unoccupied device to save energy11.

kernel module

Running CUDA applications requires the proprietary NVIDIA driver kernel module to be loaded into the kernel and working. If you haven't already installed it for another purpose, note that the NVIDIA driver and the CUDA toolkit are both in the non-free section of the Debian archive, which is not enabled by default. To get non-free packages you have to add non-free (and better also contrib) to your package sources in /etc/apt/sources.list, which might then look like this:
deb http://httpredir.debian.org/debian/ testing main contrib non-free
After doing that, perform $ apt-get update to update the package lists, and there you go with the non-free packages. The headers of the running kernel are needed to compile modules; you can get them together with the NVIDIA kernel module package by running:
$ sudo apt-get install linux-headers-$(uname -r) nvidia-kernel-dkms build-essential
DKMS will then build the NVIDIA module for the kernel and do some other things on the system. When the installation has finished, it's generally advised to reboot the system completely.

troubleshooting

If you have problems with the CUDA device, it's advised to verify whether the following things concerning the NVIDIA driver and kernel module are in order:

blacklist nouveau

Check whether the default Nouveau kernel module driver (which blocks the NVIDIA module) for some reason still gets loaded, by doing $ lsmod | grep nouveau. If nothing gets returned, that's right. If it's still in the kernel, just add blacklist nouveau to /etc/modprobe.d/blacklist.conf, and update the boot ramdisk with sudo update-initramfs -u afterwards. Then reboot once more; after that it shouldn't be loaded anymore.

rebuild kernel module

To fix things when the module hasn't been properly compiled for some reason, you can trigger a rebuild of the NVIDIA kernel module with $ sudo dpkg-reconfigure nvidia-kernel-dkms. When you're about to send your hardware in for repair because everything looks all right but the device just isn't working, that really could help (own experience). After the rebuild of the module or modules (if you have a few kernel packages installed) has completed, you can recheck whether the module really is available by running:
$ sudo modinfo nvidia-current
filename:       /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        352.79
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
vermagic:       4.4.0-1-amd64 SMP mod_unload modversions 
parm:           NVreg_Mobile:int
It should look something similar to this when everything is all right.

reload kernel module

When there are problems with the GPU, maybe the kernel module isn't properly loaded. You can recheck whether the module has been loaded by doing
$ lsmod | grep nvidia
nvidia_uvm             73728  0
nvidia               8540160  1 nvidia_uvm
drm                   356352  7 i915,drm_kms_helper,nvidia
The kernel module can be loaded or reloaded with $ sudo nvidia-modprobe (that tool is from the package nvidia-modprobe).

unsupported graphics card

Be sure that your graphics card is supported by the current driver kernel module. If you have bought new hardware, it's quite possible for this to turn out to be a problem. You can get the version of the current NVIDIA driver with:
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 5.3.1 20160528 (Debian 5.3.1-21)
Then google the version number, like nvidia 352.79; this should get you onto an official driver download page like this. There, check what is listed under "Supported Products". If you're stuck with that, there are two options: wait until the driver in Debian gets updated, or replace it with the latest driver package from NVIDIA. That's possible to do, but it is something more for experienced users.

occupied graphics card

The CUDA driver cannot work while the graphical interface is busy, like when it is processing the graphical display of your X.Org server. Which kernel driver is actually used to process the desktop can be examined with this command:12
$ grep '(II).*([0-9]):' /var/log/Xorg.0.log
[    37.700] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150522
[    37.700] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <vcheng@debian.org>)
 ... 
[    39.808] (II) intel(0): switch to mode 1920x1080@60.0 on eDP1 using pipe 0, position (0, 0), rotation normal, reflection none
[    39.810] (II) intel(0): Setting screen physical size to 508 x 285
[    67.576] (II) intel(0): EDID vendor "CMN", prod id 5941
[    67.576] (II) intel(0): Printing DDC gathered Modelines:
[    67.576] (II) intel(0): Modeline "1920x1080"x0.0  152.84  1920 1968 2000 2250  1080 1083 1088 1132 -hsync -vsync (67.9 kHz eP)
This example shows that the rendering of the desktop is performed by the graphics device of the Intel CPU, which is just what is needed for running CUDA applications on your NVIDIA graphics card, if you don't have another one.

nvidia-cuda-toolkit

With the Debian package of the CUDA toolkit everything pretty much runs out of the box for Theano. Just install it with apt-get, and you're ready to go; the CUDA backend is the default one. PyCUDA is also a suggested dependency of the binary packages; it can be pulled in together with the CUDA toolkit. The up-to-date CUDA release 7.5 is of course available; with that you have Maxwell architecture support, so you can run Theano on e.g. a GeForce GTX Titan X with 6.2 TFLOPS of single precision13 at an affordable price. CUDA 814 is around the corner, with support for the new Pascal architecture15; the GeForce GTX 1080 high-end gaming graphics card already delivers 8.23 TFLOPS16. When it comes to professional GPGPU hardware like the Tesla P100 there is much more computational power available, scalable by multiplying cores and cards up to genuine little supercomputers which fit on a desk, like the DGX-117. Theano can use multiple GPUs for calculations to work with highly scaled hardware; I'll write another blog post on this issue.

Theano on the GPU

It's not difficult to run Theano on the GPU. Only single precision floating point numbers (float32) are supported on the GPU, but that is sufficient for deep learning applications. Theano uses double precision floats (float64) by default, so you have to set the configuration variable config.floatX to float32, as written above, either with the THEANO_FLAGS environment variable or better in your .theanorc file, if you're going to use the GPU a lot. Switching to the GPU actually happens with the config.device configuration variable, which must be set to either gpu, or gpu0, gpu1 etc. to choose a particular one if multiple devices are available. Here's a little test script, check1.py; it's taken from the docs and slightly altered. You can run the script either with python or python3 (there was a single test failure in the Python 3 package, so the Python 2 library might be a little more stable currently). For comparison, here's an example of how it performs on my hardware, once on the CPU and once on the GPU:
$ THEANO_FLAGS=floatX=float32 python ./check1.py 
[Elemwise exp,no_inplace (<TensorType(float32, vector)>)]
Looping 1000 times took 4.481719 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
$ THEANO_FLAGS=floatX=float32,device=gpu python ./check1.py 
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN not available)
[GpuElemwise exp,no_inplace (<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise exp,no_inplace .0)]
Looping 1000 times took 1.164906 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
If you got a result like this, you're ready to go with Theano on Debian, training computer vision classifiers or whatever else you want to do with it. I'll write more on what Theano can be used for soon.
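For reference, check1.py is based on the GPU test example from the Theano documentation; a rough sketch of that script (the author notes his copy is slightly altered, so this is an approximation, not his exact version):

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768
iters = 1000

# a shared vector in the configured float type, and a compiled exp() over it
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())

t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))

# if any op in the graph is a plain (CPU) Elemwise, the GPU was not used
if numpy.any([isinstance(node.op, tensor.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')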

  1. Some ports are disabled because they are currently not supported by Theano. There are NotImplementedErrors and other errors in the tests about the numpy.ndarray object not being aligned. The developers commented on that, see here. And on some ports the build flags -m32 and -m64 of Theano aren't supported by g++; the build flags can't be manipulated easily.
  2. Theano Development Team: "Theano: a Python framework for fast computation of mathematical expressions"
  3. Marc Couture: "Today's high-powered GPUs: strong for graphics and for maths". In: RTC magazine June 2015, pp. 22-25
  4. Ogier Maitre: "Understanding NVIDIA GPGPU hardware". In: Tsutsui/Collet (eds.): Massively parallel evolutionary computation on GPGPUs. Berlin, Heidelberg: Springer 2013, pp. 15-34
  5. Geoffrey French: "Deep learning tutorial: advanced techniques". PyData London 2016 presentation
  6. Like the description of the Lintian tag binary-without-manpage says, that's not needed for them being in /usr/share.
  7. Tom. R. Halfhill: "Parallel processing with CUDA: Nvidia's high-performance computing platform uses massive multithreading". In: Microprocessor Report January 28, 2008
  8. Faber et al.: "Parallelwelten: GPU-Programmierung mit OpenCL". In: C't 26/2014, pp. 160-165
  9. For comparison, see: Valentine Sinitsyn: "Feel the taste of GPU programming". In: Linux Voice February 2015, pp. 106-109
  10. https://lists.debian.org/debian-devel/2016/07/msg00004.html
  11. If Optimus (hybrid) graphics hardware is present (as is common today on PC laptops), Debian launches the X server on the graphics processing unit of the CPU, which is ideal for CUDA. The problem with Optimus is actually the graphics processing on the dedicated GPU. If you are using Bumblebee, the Python interpreter which you want to run Theano on has to be started with the launcher primusrun, because Bumblebee powers the GPU down with the tool bbswitch every time it isn't used, and I think the kernel module of the driver is also dynamically loaded.
  12. Thorsten Leemhuis: "Treiberreviere. Probleme mit Grafiktreibern für Linux lösen". In: C't Nr. 2/2013, pp. 156-161
  13. Martin Fischer: "4K-Rakete: Die schnellste Single-GPU-Grafikkarte der Welt". In C't 13/2015, pp. 60-61
  14. http://www.heise.de/developer/meldung/Nvidia-CUDA-8-bringt-Optimierungen-fuer-die-Pascal-Architektur-3164254.html
  15. Martin Fischer: "All In: Nvidia enth llt die GPU-Architektur 'Pascal'". In: C't 9/2016, pp. 30-31
  16. Martin Fischer: "Turbo-Pascal: High-End-Grafikkarte f r Spieler: GeForce GTX 1080". In: C't 13/2016, pp. 100-103
  17. http://www.golem.de/news/dgx-1-nvidias-supercomputerchen-mit-8x-tesla-p100-1604-120155.html

30 May 2016

Daniel Stender: My work for Debian in May

No double posting this time ;-) I haven't had so much spare time this month to spend on Debian, but I could work on the following packages: This series of blog postings also includes little introductions of and to new packages in the archive. This month there is:

Pyinfra

Pyinfra is a new project which is currently still in development. It has already been pointed out in an interesting German article1, and is now available as a package maintained within the Python Applications Team. It's currently a one-man production by Nick Barrett, and has been eagerly developed in the past weeks (we're currently at 0.1~dev24). Pyinfra is a remote server configuration/provisioning/service deployment tool which belongs in the same software category as Puppet or Ansible2. It's for provisioning one or an array of remote servers with software packages and for configuring them. Pyinfra runs agentless like Ansible; that means nothing special (like a daemon) has to run on the targeted servers in order to use it. It's written to be used for provisioning POSIX-compatible Linux systems and has alternatives when it comes to special features like package managers (e.g. it supports apt as well as yum). The documentation can be found in /usr/share/doc/pyinfra/html/. Here's a little crash course on how to use Pyinfra: the pyinfra CLI tool is used on the command line like this; deploy scripts, single operations or facts (see below) can be used on a single server or a multitude of remote servers:
$ pyinfra -i <inventory script/single host> <deploy script>
$ pyinfra -i <inventory script/single host> --run <operation>
$ pyinfra -i <inventory script/single host> --facts <fact>
Remote servers which are operated on must provide a working shell and must be reachable via SSH. For connecting, the --port, --user, --password, --key/--key-password and --sudo flags are available, --sudo being used to gain superuser rights. Root access or sudo rights of course have to be set up already. By the way, localhost can be operated on in the same way. Single operations are organized in modules like "apt", "files", "init", "server" etc. With the --run option they can be used individually on servers as follows; e.g. server.user adds a new user on a single targeted system (-v adds verbosity to the pyinfra run):
$ pyinfra -i 192.0.2.10 --run server.user sam --user root --key ~/.ssh/sshkey --key-password 123456 -v
Multiple servers can be grouped in inventories, which hold the targeted hosts and the data associated with them. For example, an inventory file farm1.py could contain lists like this:
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
Group designators must be all caps. A higher level of grouping is provided by the file names of the inventory scripts, so COMPUTE_SERVERS and DATABASE_SERVERS can be referenced at the same time through the group designator farm1. In addition, all servers are automatically added to the group all, and inventory scripts should be stored in the subfolder inventory/ of the project directory. Inventory files can then be used instead of specific IP addresses like this; the single operation gets performed on all machines given in farm1.py:
$ pyinfra -i inventory/farm1.py  --run server.user sam --user root --key ~/.ssh/sshkey --key-password=123456 -v
Deployment scripts can be used together with group data files in the subfolder group_data/ of the project directory. For example, group_data/farm1.py applies to all servers given in inventory/farm1.py (by the way, all.py applies to all servers), and contains the arbitrary attribute user_name (attributes must be lowercase), next to the authentication data for the whole inventory group:
user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'
The attribute can then be picked up by a deployment script via host.data, for example to feed user_name into server.user() again, like this:
from pyinfra import host
from pyinfra.modules import server
server.user(host.data.user_name)
This deploy, the ensemble of inventory file, group data file and deployment script (the latter usually placed top level in the project folder), can then be run this way:
$ pyinfra -i inventory/farm1.py deploy.py
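For orientation, the project layout implied by the conventions above would look roughly like this; the top-level folder name myproject/ is arbitrary and only used here for illustration:
myproject/
    deploy.py
    inventory/
        farm1.py
    group_data/
        all.py
        farm1.py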
You have guessed it: since deployment scripts are Python scripts, they are fully programmable (note that Pyinfra is built for and runs on Python 3 on Debian), and that's the main advantage of this piece of software. Quite handy for that are Pyinfra facts, functions which check different things on remote systems and return the information as Python data. For example, deb_packages returns a dictionary of the packages installed on a remote apt-based server:
$ pyinfra -i 192.0.2.10 --fact deb_packages --user root --key ~/.ssh/sshkey --key-password=123456
 
    "192.0.2.10":  
        "libdebconfclient0": "0.192",
        "python-debian": "0.1.27",
        "libavahi-client3": "0.6.31-5",
        "dbus": "1.8.20-0+deb8u1",
        "libustr-1.0-1": "1.0.4-3+b2",
        "sed": "4.2.2-4+b1",
Using facts, Pyinfra reveals its full potential. For example, a deployment script could go like this; the linux_distribution fact returns a dict describing the installed distribution:
from pyinfra import host
from pyinfra.modules import apt
if host.fact.linux_distribution['name'] == 'Debian':
    apt.packages(packages='gummi', present=True, update=True)
elif host.fact.linux_distribution['name'] == 'CentOS':
    pass
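Putting the pieces together, a slightly bigger deployment script could combine group data and facts, e.g. to create the user from group_data/farm1.py everywhere but install a package only on the Debian hosts of the inventory. This is just a sketch that reuses the operations shown above (gummi again only serves as an example package):
from pyinfra import host
from pyinfra.modules import apt, server

# create the user defined as user_name in group_data/farm1.py
server.user(host.data.user_name)

# install an example package, but only on the Debian hosts
if host.fact.linux_distribution['name'] == 'Debian':
    apt.packages(packages='gummi', present=True, update=True)
Such a script is run exactly like the deploy above, e.g. pyinfra -i inventory/farm1.py deploy.py.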
I'll spare you more sophisticated examples to keep this introduction simple. Beyond fancy deployment scripts, Pyinfra features its own API through which it can be programmed from the outside, and much more. But maybe that's enough to introduce Pyinfra; those are the usage basics. Pyinfra is a brand new project and it remains to be seen whether the developer can keep developing the tool the way he does these days. For a private project it would be insane to attempt to become a contender for the established "big" free configuration management tools and frameworks, but whether or not Puppet has become too complex in the meanwhile3, I really don't think that's the point here. Pyinfra follows its own approach in being programmable the way it is. And it definitely does no harm to have it in the toolbox already; it isn't trying to replace anything.

Brainstorm

After a first package had been in experimental, the Brainstorm library from the Swiss AI research institute IDSIA4 is now available as python3-brainstorm in unstable. Brainstorm is a lean, easy-to-use library for setting up deep learning networks (multiple-layered artificial neural networks) for machine learning applications like image and speech recognition or natural language processing. Setting up a working training network for a classifier of handwritten digits like the MNIST dataset (the usual "hello world") takes just a couple of lines, as one of the examples demonstrates. The package is maintained within the Debian Python Modules Team. The Debian package ships a couple of examples in /usr/share/python3-brainstorm/examples (the data/ and examples/ folders of the upstream tarball are combined here). Among them there are5: The current documentation in /usr/share/doc/python3-brainstorm/html/ isn't complete yet (several chapters are under construction), but there's a walkthrough on the CIFAR-10 example. The MNIST example has been extended by GitHub user pinae and has recently been explained in the German C't6.

What are the perspectives for further development? As Zhou Mo confirmed, there are a couple of deep learning frameworks around with a rather poor outlook, since they have been abandoned after being completed as PhD projects. There's really no point in striving to have them all in Debian; the ITP of Minerva, for instance, has been given up partly for this reason (there haven't been any commits since 08/2015), and because cuDNN isn't available and most likely won't be. Brainstorm, whose version 0.5 was released in 05/2015, also started as a PhD project at IDSIA. It's stated on GitHub that the project is "under active development", but the rather sparse project page on the other hand expresses the "hope the community will help us to further improve Brainstorm", a sentence which quite often implies that the developers themselves aren't actively working on a project anymore. But there are recent commits, and it looks like upstream is active and can be reached when there are problems. So I don't think we're riding a dead horse here.

The downside for Brainstorm in Debian is that, as it seems, the libraries needed for GPU-accelerated processing can't be fully provided. Pycuda is available, but scikit-cuda (an additional library which provides wrappers for CUDA features like CUBLAS, CUFFT and CUSOLVER) is not and won't be, because the CULA Dense Toolkit (for which scikit-cuda also contains wrappers) is not freely available as source. Because of that, a dependency on pycuda has been spared, not even as a Suggests (it's non-free anyway).
Without GPU acceleration, Brainstorm computes the matrices on openBLAS through a Cython wrapper in the NumpyHandler; the PyCudaHandler can't be used. openBLAS makes pretty good use of the available hardware (it distributes the work over all available CPU cores), but it's not yet possible to run Brainstorm at full throttle using the available floating-point devices to reduce training times, which becomes crucial when the projects get bigger. Brainstorm belongs to the number of deep learning frameworks already available or becoming available in Debian. Currently there is: I've had a look at Microsoft's CNTK, but although it has also been set free recently, I have my doubts whether it could be included. Apparently there are dependencies on non-free software and most likely other issues. So much for a little update on the state of deep learning in Debian; please excuse it if my radar has missed something.
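A small practical aside on the CPU-only path: since openBLAS by default spreads its work over all available cores, its standard environment variable can be used to cap the thread count if a training run shouldn't occupy the whole machine. A minimal sketch, in which train.py is merely a placeholder for whichever Brainstorm script is being run:
$ OPENBLAS_NUM_THREADS=4 python3 train.py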

  1. Tim Schürmann: "Schlangenöl: Automatisiertes Service-Deployment mit Pyinfra". In: IT-Administrator 05/2016, pp. 90-95.
  2. For a comparison of configuration management software like this, see B wetter/Johannsen/Steig: "Baukastensysteme: Konfigurationsmanagement mit Open-Source-Software". In: iX 04/2016, pp. 94-99 (please excuse the prevalence of German articles among the pointers, I just happen to have them at hand).
  3. On the points of critique of Puppet, see Martin Loschwitz: "David gegen Goliath: Zwei Welten treffen aufeinander: Puppet und Ansible". In: Linux-Magazin 01/2016, pp. 50-54.
  4. See the interview with IDSIA's deep learning guru Jürgen Schmidhuber in the German C't 2014/09, p. 148.
  5. The example scripts need some more fine-tuning. To run the data creation scripts in place, the environment variable BRAINSTORM_DATA_DIR can be set, but the trained networks currently try to write their output in place. So please copy the scripts into some workspace if you want to try them out. I'll patch the example scripts to run out of the box soon.
  6. Johannes Merkert: "Ziffernlerner. Ein künstliches neuronales Netz selber gebaut". In: C't 2016/06, pp. 142-147. Web: http://www.heise.de/ct/ausgabe/2016-6-Ein-kuenstliches-neuronales-Netz-selbst-gebaut-3118857.html.
  7. See Ramon Wartala: "Tiefenschärfe: Deep learning mit NVIDIAs Jetson-TX1-Board und dem Caffe-Framework". In: iX 06/2016, pp. 100-103
  8. https://lists.debian.org/debian-science/2016/03/msg00016.html

14 October 2014

Julian Andres Klode: Key transition

I started transitioning from 1024D to 4096R. The new key is available at: https://people.debian.org/~jak/pubkey.gpg and the keys.gnupg.net key server. A very short transition statement is available at: https://people.debian.org/~jak/transition-statement.txt and included below (the http version might get extended over time if needed). The key consists of one master key and 3 sub keys (signing, encryption, authentication). The sub keys are stored on an OpenPGP v2 Smartcard. That's really cool, isn't it? Somehow it seems that GnuPG 1.4.18 also works with 4096R keys on this smartcard (I accidentally used it instead of gpg2 and it worked fine), although only GPG 2.0.13 and newer is supposed to work.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA512
Because 1024D keys are not deemed secure enough anymore, I switched to
a 4096R one.
The old key will continue to be valid for some time, but i prefer all
future correspondence to come to the new one.  I would also like this
new key to be re-integrated into the web of trust.  This message is
signed by both keys to certify the transition.
the old key was:
pub   1024D/00823EC2 2007-04-12
      Key fingerprint = D9D9 754A 4BBA 2E7D 0A0A  C024 AC2A 5FFE 0082 3EC2
And the new key is:
pub   4096R/6B031B00 2014-10-14 [expires: 2017-10-13]
      Key fingerprint = AEE1 C8AA AAF0 B768 4019  C546 021B 361B 6B03 1B00
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEARECAAYFAlQ9j+oACgkQrCpf/gCCPsKskgCgiRn7DoP5RASkaZZjpop9P8aG
zhgAnjHeE8BXvTSkr7hccNb2tZsnqlTaiQIcBAEBCgAGBQJUPY/qAAoJENc8OeVl
gLOGZiMP/1MHubKmA8aGDj8Ow5Uo4lkzp+A89vJqgbm9bjVrfjDHZQIdebYfWrjr
RQzXdbIHnILYnUfYaOHUzMxpBHya3rFu6xbfKesR+jzQf8gxFXoBY7OQVL4Ycyss
4Y++g9m4Lqm+IDyIhhDNY6mtFU9e3CkljI52p/CIqM7eUyBfyRJDRfeh6c40Pfx2
AlNyFe+9JzYG1i3YG96Z8bKiVK5GpvyKWiggo08r3oqGvWyROYY9E4nLM9OJu8EL
GuSNDCRJOhfnegWqKq+BRZUXA2wbTG0f8AxAuetdo6MKmVmHGcHxpIGFHqxO1QhV
VM7VpMj+bxcevJ50BO5kylRrptlUugTaJ6il/o5sfgy1FdXGlgWCsIwmja2Z/fQr
ycnqrtMVVYfln9IwDODItHx3hSwRoHnUxLWq8yY8gyx+//geZ0BROonXVy1YEo9a
PDplOF1HKlaFAHv+Zq8wDWT8Lt1H2EecRFN+hov3+lU74ylnogZLS+bA7tqrjig0
bZfCo7i9Z7ag4GvLWY5PvN4fbws/5Yz9L8I4CnrqCUtzJg4vyA44Kpo8iuQsIrhz
CKDnsoehxS95YjiJcbL0Y63Ed4mkSaibUKfoYObv/k61XmBCNkmNAAuRwzV7d5q2
/w3bSTB0O7FHcCxFDnn+tiLwgiTEQDYAP9nN97uibSUCbf98wl3/
=VRZJ
-----END PGP SIGNATURE-----

Filed under: Uncategorized


11 November 2009

Yves-Alexis Perez: Key transition (this is _not_ a meme)

Ok, so following the trend, I created a new GPG key some time ago, which I'm now transitioning to. I've set up a transition document, available at http://molly.corsac.net/~corsac/key-transition.txt. It's signed by both the old and the new keys and is reproduced below:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160,SHA512
Wed, 11 Nov 2009 13:44:05 +0100
I've recently set up a new RSA-based GPG key, and will be transitioning away
from my old DSA-based one.  The old key will be revoked soon, so I prefer all
future correspondence to use the new one.  I would also like to ensure that
this new key is well-integrated into the web of trust.  This message is signed
by both keys to certify the transition.
The old DSA key was:
pub   1024D/C5C05BAE 2004-11-11
      Key fingerprint = DE26 2FC4 7097 FFC6 DE2C  D8C0 4D44 C020 C5C0 5BAE
The new RSA key is:
pub   4096R/71EF0BA8 2009-05-06
      Key fingerprint = 4510 DCB5 7ED4 7040 60C6  6476 3055 0F78 71EF 0BA8
If you already know my old key, you can verify that the new key is
signed by the old one:
  gpg --check-sigs 71EF0BA8
If you don't already know my old key, or if you're extra-paranoid, you
can check the fingerprint against the one given above:
  gpg --fingerprint 71EF0BA8
If you have previously signed my old DSA key, and if you're satisfied
that you've got the correct new RSA key, then I'd appreciate it if you
would sign my new key as well:
  caff 71EF0BA8
The caff program is in the signing-party package in Debian.  Please be careful
to generate signatures that don't rely on the weakening SHA-1 hash algorithm,
which requires some careful configuration even if you've already configured
gpg correctly.  See http://www.gag.com/bdale/blog/posts/Strong_Keys.html for
the gory details.
Thanks,
- --
Yves-Alexis Perez
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEAREDAAYFAkr6sqQACgkQTUTAIMXAW64HiACeIyabQueDHAeiAX8EkIeApiDj
++UAn2z7YkjHx0lQh0+s5WdhikG0YztiiQIcBAEBCgAGBQJK+rKkAAoJEDBVD3hx
7wuodUcQAKMbG9Rehxz+uZ6fST99cHt5Fjnv9TorY4hQaQK+85ZgiwPaHMHfYM1G
5hcrXI+JFUpz8j40deZuaWuspOdHBHwnHNQril8MqT0CJgtB6HFTo+w/7Lmmui5M
DDMMed39UJl7bF73hV9ywGecxPpeh+dtoVnh0VT16uK2xTvW6ICEZgaPw1xfPUHS
+jxQ7I05X1OWQkPpmhxXJqGclDyO+qx4CJZsOxUAvt2LphHxhZxB3QE5OUdudGKQ
AH6KhC4rpNQdJVMX20SG8PybL/AipN3Y8N/63VkoqVC2heRlaQ69HjsuqIAkIyan
hHnqmJH8Q+TDTbdKZvOQv6jcd4o3VSibz0T9MwnOfqQ0uRYyTpaXC0vLUH6lXaC4
eK+VVWbY8vCAFHR3h80Q61i2me2HU5ly7a/W22dz19zzDNNC5q9MO78uIYkUK78N
Z0wzJrmOxRyhvs5DOSOpNVlXZhffNQM1f42xxG8cUDaIf7pR5jK+xqHV7tIBQE1D
CrD0mt+YQCnngK0i4wQTO7VT/vjypf4A9W+VSsoJJpRhBbngU4pHu9JWqO84/7AA
j5FN8ug15MWysaS+FQ/EqzHmT7BGBWaTPv3yGlHKUjx0w4bPEpbH7y3fwHAcmOFf
xFRzvZFQ03zeer06yAqTVNuwr77HZgrCzgyQVgIkegAg6iUPiZcs
=CBT+
-----END PGP SIGNATURE-----

25 October 2009

Biella Coleman: Branding Politics, Branding Change

Brands are most often associated with the world of crass consumerism, but they can play a key role in fomenting political change. Or so claim some pretty clever thinkers and activists, and they will be giving a talk about the importance of branding for democratic politics this Monday at 7 PM at the Change You Want to See.
Please join us this Monday, October 26th as we continue our series on Symbols, Branding and Persuasion with an exploration of branding in the context of electoral and legislative politics. We'll start with a presentation by media theorist Stephen Duncombe, author of Dream: Reimagining Progressive Politics in an Age of Fantasy and the forthcoming Branding the New Deal. Afterward Jessica Teal, design manager for the Obama 2008 presidential campaign, will join Duncombe for a conversation via video skype. Like it or not, propaganda and mass persuasion are part of modern democratic politics. Many progressives today have an adverse reaction to propaganda: ours is a politics based in reason and rationality, not symbols and fantasy. Given our last administration's fondness for selling fantasies as reality, this aversion to branding, marketing and propaganda is understandable. But it is also naïve. Mass persuasion is a necessary part of democratic politics; the real issue is what ethics it embodies and which values it expresses. Looking critically at how the Roosevelt Administration tried to brand the New Deal and how the Obama campaign leveraged principles of marketing and advertising gives us an opportunity to think about different models of political persuasion.

3 November 2005

Philipp Kern: Today we've got April Fool's Day, right?

IBM is now able to slow down light with “fairly standard materials”, and it's only a question of time until photons replace electrons in computers. Thanks, ZDNet, for making me smile...