Daniel Stender: My work for Debian in May
No double posting this time ;-)
I didn't have much spare time for Debian this month, but I managed to work on the following packages:
Remote servers which are operated on must provide a working shell and they must be reachable by SSH. For connecting,
Multiple servers can be grouped in inventories, which hold the targeted hosts and the data associated with them. E.g. an inventory file
Group designators must be all caps.
The file names of inventory scripts provide a higher level of grouping, thus
Deployment scripts can be used together with group data files in the subfolder
The arbitrary attribute can be picked up by a deployment script using
This deploy, the ensemble of inventory file, group data file and deployment script (usually placed at the top level of the project folder), can then be run like this:
You have guessed it: since deployment scripts are Python scripts, they are fully programmable (please note that on Debian, Pyinfra is built for and runs on Python 3), and that's the main advantage of this piece of software.
Pyinfra facts come in quite handy for that. They are functions which check different things on remote systems and return the information as Python data.
Like e.g.
Using facts, Pyinfra reveals its full potential. For example, a deployment script could go like this,
I'll spare more sophisticated examples to keep this introduction simple.
Beyond fancy deployment scripts, Pyinfra features its own API through which it can be driven from the outside, and much more.
But maybe that's enough to introduce Pyinfra.
Those are the usage basics.
Pyinfra is a brand new project, and it remains to be seen whether the developer can keep developing the tool at the pace he does these days.
For a private project it's insane to attempt to become a contender for the established "big" free configuration management tools and frameworks, but whether Puppet has become too complex in the meantime or not [3], I really don't think that's the point here.
Pyinfra follows its own approach by being programmable the way it is.
And it definitely does no harm to have it in the toolbox already; it isn't trying to replace anything.
Brainstorm
After the first package had been in experimental, the Brainstorm library from the Swiss AI research institute IDSIA [4] is now available as python3-brainstorm in unstable.
Brainstorm is a lean, easy-to-use library for setting up deep learning networks (multi-layered artificial neural networks) for machine learning applications such as image and speech recognition or natural language processing.
Setting up a working training network for a classifier of handwritten digits on the MNIST dataset (the usual "hello world") takes just a couple of lines, as one of the examples demonstrates.
The package is maintained within the Debian Python Modules Team.
The Debian package ships a couple of examples in
- golang-github-hpcloud-tail/1.0.0+git20160415.b294095-3: put versioned dependency & rebuild against golang-fsnotify/1.3.0-3 to fix FTBFS on ppc64el.
- updates: packer/0.10.1-1, pybtex/0.20.1-1, afl/2.12b-1, afl/2.13b-1, pyutilib/5.3.5-1.
- new packages: golang-github-azure-go-ntlmssp/0.0~git20160412.e0b63eb-1 (needed by Packer 0.10.1), and python-latexcodec/1.0.3-1 (needed by Pybtex 0.20).
- prospector/0.11.7-7 fixed for reproducible builds: there were variations in the sorting order of dependencies in prospector.egg-info/requires.txt. I've prepared a patch to make the package reproducible again (that problem began with 0.11.7-5) before the proposed toolchain patch for setuptools (#804249) gets accepted.
- python-latexcodec/1.0.3-3 also fixed for reproducible builds (#824454).
usr/share/doc/pyinfra/html/
.
Here's a little crash course on how to use Pyinfra:
The pyinfra CLI tool is used on the command line as shown here. Deploy scripts, single operations or facts (see below) can be applied to a single server or to a multitude of remote servers:
$ pyinfra -i <inventory script/single host> <deploy script>
$ pyinfra -i <inventory script/single host> --run <operation>
$ pyinfra -i <inventory script/single host> --facts <fact>
--port, --user, --password, --key/--key-password and --sudo flags are available, --sudo to gain superuser rights.
Root access or sudo rights of course have to be already set up.
By the way, localhost can be operated on in the same way.
Single operations are organized in modules like "apt", "files", "init", "server" etc.
With the --run option they can be applied individually to servers as follows. E.g. server.user adds a new user on a single targeted system (-v adds verbosity to the pyinfra run):
$ pyinfra -i 192.0.2.10 --run server.user sam --user root --key ~/.ssh/sshkey --key-password 123456 -v
farm1.py would contain lists like this:
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
COMPUTE_SERVERS and DATABASE_SERVERS can be referenced at the same time by the group designator farm1. Plus, all servers are automatically added to the group all.
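To make these grouping rules concrete, here is a minimal sketch in plain Python of how such groups could be derived from an inventory file. This is not Pyinfra's actual implementation; the helper name groups_from_inventory is made up for this illustration, and it only mirrors the rules described here (all-caps names are groups, the file name groups them together, and every host lands in all).

```python
import os


def groups_from_inventory(path, source):
    """Derive host groups from inventory source code (illustration only)."""
    namespace = {}
    exec(source, namespace)  # inventory files are plain Python
    # Every all-caps name that holds a list of hosts is treated as a group.
    groups = {
        name: list(hosts)
        for name, hosts in namespace.items()
        if name.isupper() and isinstance(hosts, (list, tuple))
    }
    every_host = [host for hosts in groups.values() for host in hosts]
    # The file name (without ".py") designates all hosts of this inventory.
    file_group = os.path.splitext(os.path.basename(path))[0]
    groups[file_group] = list(every_host)
    # All servers are additionally collected in the group "all".
    groups['all'] = list(every_host)
    return groups


INVENTORY = """
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
"""

groups = groups_from_inventory('inventory/farm1.py', INVENTORY)
for name in sorted(groups):
    print(name, groups[name])
```

With the two lists from farm1.py, the groups farm1 and all both end up containing all four addresses.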
Inventory scripts should be stored in the subfolder inventory/ in the project directory.
Inventory files can then be used instead of specific IP addresses as shown here. The single operation then gets performed on all machines given in farm1.py:
$ pyinfra -i inventory/farm1.py --run server.user sam --user root --key ~/.ssh/sshkey --key-password=123456 -v
group_data/ in the project directory.
For example, a group_data/farm1.py applies to all servers given in inventory/farm1.py (by the way, all.py applies to all servers), and contains the arbitrary attribute user_name (attributes must be lowercase), next to authentication data for the whole inventory group:
user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'
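Since group data files are ordinary Python files too, their attributes are easy to inspect. The following is a minimal sketch, not Pyinfra's own loader: the helper load_group_data is hypothetical, and it simply collects the lowercase names from a piece of group data source.

```python
def load_group_data(source):
    """Collect lowercase attributes from group data source (illustration only)."""
    namespace = {}
    exec(source, namespace)  # group data files are plain Python
    # Keep lowercase names only and skip internals like __builtins__.
    return {
        name: value
        for name, value in namespace.items()
        if name.islower() and not name.startswith('_')
    }


GROUP_DATA = """
user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'
"""

data = load_group_data(GROUP_DATA)
print(data['user_name'])
```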
host.data() as follows. user_name can then be used again, e.g. for server.user(), like this:
from pyinfra import host
from pyinfra.modules import server

server.user(host.data.user_name)
$ pyinfra -i inventory/farm1.py deploy.py
deb_packages returns a dictionary of installed packages from a remote apt based server:
$ pyinfra -i 192.0.2.10 --fact deb_packages --user root --key ~/.ssh/sshkey --key-password=123456
"192.0.2.10":
    "libdebconfclient0": "0.192",
    "python-debian": "0.1.27",
    "libavahi-client3": "0.6.31-5",
    "dbus": "1.8.20-0+deb8u1",
    "libustr-1.0-1": "1.0.4-3+b2",
    "sed": "4.2.2-4+b1",
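Because facts come back as ordinary Python data, they can be processed with plain Python. A minimal sketch, using a few entries of the output above as a hard-coded dictionary (no remote call is made here; the nesting of host and packages is assumed from that output, and the helper installed_version is made up for this illustration):

```python
# A few entries of the deb_packages output, keyed by host (assumed structure).
deb_packages = {
    "192.0.2.10": {
        "libdebconfclient0": "0.192",
        "python-debian": "0.1.27",
        "dbus": "1.8.20-0+deb8u1",
        "sed": "4.2.2-4+b1",
    }
}


def installed_version(facts, host, package):
    """Return the installed version of a package on a host, or None."""
    return facts.get(host, {}).get(package)


print(installed_version(deb_packages, "192.0.2.10", "sed"))
print(installed_version(deb_packages, "192.0.2.10", "gummi"))
```

A lookup for a package that is not installed, or for an unknown host, returns None instead of raising an error.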
linux.distribution() returns a dict containing the installed distribution:
from pyinfra import host
from pyinfra.modules import apt

if host.fact.linux_distribution['name'] == 'Debian':
    apt.packages(packages='gummi', present=True, update=True)
elif host.fact.linux_distribution['name'] == 'CentOS':
    pass
/usr/share/python3-brainstorm/examples (the data/ and examples/ folders of the upstream tarball are combined here).
Among them there are [5]:
- scripts for creating proper HDF5 training data of the MNIST database of handwritten digits and for training a simple neural network on it (create_mnist.py, mnist_pi.py),
- examples for setting up data and training a convolutional neural network (CNN) on the CIFAR-10 dataset of pictures (create_cifar10.py, cifar10_cnn.py),
- as well as example scripts for setting up training data and creating an LSTM (long short-term memory) recurrent neural network (RNN) on test data used in the Hutter Prize competition (create_hutter.py, hutter_lstm.py).
- And there's also another example script for creating training data of the CIFAR-100 dataset (create_cifar100.py).
/usr/share/doc/python3-brainstorm/html/ isn't complete yet (several chapters are under construction), but there's a walkthrough on the CIFAR-10 example.
The MNIST example has been extended by Github user pinae, and was recently explained in the German C't [6].
What are the perspectives for further development? As Zhou Mo confirmed, there are a couple of deep learning frameworks around with a rather poor outlook, since they have been abandoned after being completed as PhD projects.
There's really no point in striving to have them all in Debian. The ITP of Minerva, for example, has been given up partly for this reason, as there haven't been any commits since 08/2015 (and because cuDNN isn't available and most likely won't be).
Brainstorm, whose version 0.5 was released in 05/2015, was also a PhD project at IDSIA.
It's stated on Github that the project is "under active development", but the rather sparse project page, on the other hand, expresses the "hope the community will help us to further improve Brainstorm".
Such a sentence quite often implies that the developers are no longer actively working on the project.
But there are recent commits, and it looks like upstream is active and can be reached when there are problems.
So I don't think we're riding a dead horse here.
The downside for Brainstorm in Debian is that, as it seems, the libraries needed for GPU accelerated processing can't be fully provided.
Pycuda is available, but scikit-cuda (an additional library which provides wrappers for CUDA features like CUBLAS, CUFFT and CUSOLVER) is not and won't be, because the CULA Dense Toolkit (scikit-cuda contains wrappers for that, too) is not freely available as source.
Because of that, the package declares no dependency on pycuda, not even as Suggests (it's non-free).
Without GPU acceleration, Brainstorm computes the matrices on openBLAS using a Cython wrapper on the NumpyHandler, and the PyCudaHandler can't be used.
openBLAS makes pretty good use of the available hardware (it distributes over all available CPU cores), but it's not yet possible to run Brainstorm full throttle using available floating point devices to reduce training times, which becomes crucial when the projects are getting bigger.
Brainstorm is one of a number of deep learning frameworks which are already available or are becoming available in Debian.
Currently there are:
- Caffe for image recognition and classification [7] is just around the corner (#823140).
- Theano is currently in experimental, and will be ready together with libgpuarray (OpenCL based GPU accelerated processing) and Keras (abstraction layer) for Stretch. It can already run on NVIDIA graphics cards via CUDA [8] (limited to amd64 and ppc64el, though).
- Lasagne, another abstraction layer for Theano is RFP (#818641).
- Google's Tensorflow, the free successor of Dist-Belief, is currently on ITP (#804612). It's waiting for Google's build system Bazel to become available.
- Torch is also ITP (#794634). It's blocked until a wishlist bug on dh-lua gets closed.
- Amazon's own machine learning workhorse dsstne ("destiny") has now also been put under a free license and will also become available in Debian (contrib) in the near future (#824692). It's not yet suited for image recognition applications, though (it lacks CNN).
- Mxnet is RFP (#808235).
- Tim Schürmann: "Schlangenöl: Automatisiertes Service-Deployment mit Pyinfra". In: IT-Administrator 05/2016, pp. 90-95.
- For a comparison of configuration management software like this, see B wetter/Johannsen/Steig: "Baukastensysteme: Konfigurationsmanagement mit Open-Source-Software". In: iX 04/2016, pp. 94-99 (please excuse the prevalence of German articles in the pointers, I just have them at hand).
- On the points of critique on Puppet, see Martin Loschwitz: "David gegen Goliath Zwei Welten treffen aufeinander: Puppet und Ansible". In: Linux-Magazin 01/2016, pp. 50-54.
- See the interview with IDSIA's deep learning guru Jürgen Schmidhuber in German C't 2014/09, p. 148.
- The example scripts need some more fine-tuning. To run the data creation scripts in place, the environment variable BRAINSTORM_DATA_DIR can be set, but the scripts currently try to write the trained networks in place. So please copy the scripts into some workspace if you want to try them out. I'll patch the example scripts to run out of the box soon.
- Johannes Merkert: "Ziffernlerner. Ein künstliches neuronales Netz selber gebaut". In: C't 2016/06, pp. 142-147. Web: http://www.heise.de/ct/ausgabe/2016-6-Ein-kuenstliches-neuronales-Netz-selbst-gebaut-3118857.html.
- See Ramon Wartala: "Tiefenschärfe: Deep learning mit NVIDIAs Jetson-TX1-Board und dem Caffe-Framework". In: iX 06/2016, pp. 100-103.
- https://lists.debian.org/debian-science/2016/03/msg00016.html