No double posting this time ;-)
I didn't have much spare time this month to spend on Debian, but I was able to work on the following packages:
-
golang-github-hpcloud-tail/1.0.0+git20160415.b294095-3: added a versioned dependency and rebuilt against golang-fsnotify/1.3.0-3 to fix an FTBFS on ppc64el.
-
updates: packer/0.10.1-1, pybtex/0.20.1-1, afl/2.12b-1, afl/2.13b-1, pyutilib/5.3.5-1.
-
new packages: golang-github-azure-go-ntlmssp/0.0~git20160412.e0b63eb-1 (needed by Packer 0.10.1), and python-latexcodec/1.0.3-1 (needed by Pybtex 0.20).
-
prospector/0.11.7-7 fixed for reproducible builds: there were variations in the sorting order of dependencies in prospector.egg-info/requires.txt.
I've prepared a patch to make the package reproducible again (the problem began with 0.11.7-5) until the proposed toolchain patch for setuptools (#804249) gets accepted.
-
python-latexcodec/1.0.3-3 also fixed for reproducible builds (#824454).
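For illustration, the idea behind such a fix can be sketched in a few lines of Python: if the entries of requires.txt are always written in a fixed (sorted) order, the file no longer depends on the order in which a particular build happened to collect them. This is a minimal sketch, not the actual prospector patch and not the setuptools change proposed in #804249; the helper `normalize_requires` is hypothetical and assumes the usual layout of plain requirement lines plus optional `[extra]` section headers.

```python
def normalize_requires(text):
    """Return requires.txt content in a deterministic order (hypothetical helper).

    Assumes plain requirement lines, optionally grouped under "[extra]"
    section headers. Entries are sorted within each section, and sections
    are sorted by header with the unnamed section first.
    """
    sections = []                    # list of (header or None, [requirement lines])
    header, entries = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                 # blank lines only separate sections
        if line.startswith("[") and line.endswith("]"):
            # A new section header closes the previous section.
            if header is not None or entries:
                sections.append((header, entries))
            header, entries = line, []
        else:
            entries.append(line)
    if header is not None or entries:
        sections.append((header, entries))

    sections.sort(key=lambda s: (s[0] is not None, s[0] or ""))
    blocks = ["\n".join(([h] if h else []) + sorted(e)) for h, e in sections]
    return "\n\n".join(blocks) + "\n"
```

Two builds that list the same dependencies in a different order then produce byte-identical files.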
This series of blog posts also includes short introductions to new packages in the archive.
This month there are two of them:
Pyinfra
Pyinfra is a new project which is currently still under development.
It has already been covered in an interesting German article, and is now available as a package maintained within the Python Applications Team.
It's currently a one-man production by Nick Barrett, and has been developed eagerly in the past weeks (we're currently at 0.1~dev24).
Pyinfra is a remote server configuration/provisioning/service deployment tool which belongs in the same software category as Puppet or Ansible.
It's for provisioning one or many remote servers with software packages and for configuring them.
Like Ansible, Pyinfra runs agentless, which means that nothing special (like a daemon) has to run on the target servers.
It's written for provisioning POSIX-compatible Linux systems and offers alternatives where special features differ, like package managers (e.g. it supports apt as well as yum).
The documentation can be found in /usr/share/doc/pyinfra/html/.
Here's a little crash course on how to use Pyinfra:
The pyinfra CLI tool is used on the command line like this; deploy scripts, single operations or facts (see below) can be run against a single server or a multitude of remote servers:
$ pyinfra -i <inventory script/single host> <deploy script>
$ pyinfra -i <inventory script/single host> --run <operation>
$ pyinfra -i <inventory script/single host> --fact <fact>
Remote servers being operated on must provide a working shell and must be reachable via SSH. For connecting, the flags --port, --user, --password, --key/--key-password and --sudo are available, the latter to gain superuser rights.
Root access or sudo rights of course have to be set up already.
By the way, localhost can be operated on the same way.
Single operations are organized in modules like "apt", "files", "init", "server" etc.
With the --run option they can be used individually on servers as follows; e.g. server.user adds a new user on a single target system (-v makes the pyinfra run more verbose):
$ pyinfra -i 192.0.2.10 --run server.user sam --user root --key ~/.ssh/sshkey --key-password 123456 -v
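When pyinfra runs are triggered from other Python code, the flags above can be assembled into an argument list for subprocess. A hedged sketch: `pyinfra_cmd` is a hypothetical helper, only the flags named in this post are used, and the order (operation arguments directly after the operation name, connection flags afterwards) simply mirrors the example command above.

```python
import os


def pyinfra_cmd(target, *, run=None, args=(), fact=None, deploy=None,
                user=None, port=None, key=None, key_password=None,
                sudo=False, verbose=False):
    """Build an argv list for the pyinfra CLI (hypothetical helper).

    Only flags mentioned in this post are used; nothing is executed here.
    Exactly one of run, fact or deploy has to be given.
    """
    modes = [m for m in (run, fact, deploy) if m]
    if len(modes) != 1:
        raise ValueError("give exactly one of run, fact or deploy")
    cmd = ["pyinfra", "-i", target]
    if run:
        cmd += ["--run", run, *args]     # operation plus its arguments
    elif fact:
        cmd += ["--fact", fact]
    else:
        cmd.append(deploy)               # path to a deploy script
    if user:
        cmd += ["--user", user]
    if port:
        cmd += ["--port", str(port)]
    if key:
        # Expand "~" here, since no shell is involved when the list is
        # passed straight to subprocess.run().
        cmd += ["--key", os.path.expanduser(key)]
    if key_password:
        cmd += ["--key-password", key_password]
    if sudo:
        cmd.append("--sudo")
    if verbose:
        cmd.append("-v")
    return cmd


# Equivalent of the example above; subprocess.run(example, check=True)
# would then execute it.
example = pyinfra_cmd("192.0.2.10", run="server.user", args=["sam"],
                      user="root", key="~/.ssh/sshkey",
                      key_password="123456", verbose=True)
```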
Multiple servers can be grouped in inventories, which hold the target hosts and the data associated with them; e.g. an inventory file farm1.py would contain lists like this:
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
Group designators must be all caps.
A higher level of grouping is given by the file names of inventory scripts, thus COMPUTE_SERVERS and DATABASE_SERVERS can both be referenced at the same time by the group designator farm1. Plus, all servers are automatically added to the group all.
Inventory scripts should be stored in the subfolder inventory/ of the project directory.
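The grouping rules just described can be modelled in plain Python to make them explicit. This is a rough sketch under the assumption that an inventory file is ordinary Python whose all-caps list variables are the groups; `load_inventory` is a hypothetical helper, not pyinfra's own loader, and pyinfra's internal naming of groups may differ.

```python
def load_inventory(file_name, source):
    """Return a mapping of group name -> list of hosts (hypothetical model)."""
    namespace = {}
    exec(source, namespace)  # inventory files are plain Python modules

    groups = {}
    for name, value in namespace.items():
        # Only all-caps names holding lists count as group designators.
        if name.isupper() and isinstance(value, list):
            groups[name] = list(value)

    # The file name groups every host of the file; duplicates are dropped
    # while keeping the original order.
    file_hosts = list(dict.fromkeys(h for hosts in groups.values() for h in hosts))
    groups[file_name] = file_hosts
    # Every server also ends up in the implicit group "all".
    groups["all"] = list(file_hosts)
    return groups


farm1 = """
COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']
"""
groups = load_inventory("farm1", farm1)
```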
Inventory files can then be used instead of specific IP addresses like this; the single operation then gets performed on all machines given in farm1.py:
$ pyinfra -i inventory/farm1.py --run server.user sam --user root --key ~/.ssh/sshkey --key-password=123456 -v
Deployment scripts can be used together with group data files in the subfolder group_data/ of the project directory.
For example, group_data/farm1.py applies to all servers given in inventory/farm1.py (by the way, all.py applies to all servers) and contains the arbitrary attribute user_name (attributes must be lowercase), next to authentication data for the whole inventory group:
user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'
This arbitrary attribute can be picked up by a deployment script via host.data; here user_name is used again for server.user():
from pyinfra import host
from pyinfra.modules import server
server.user(host.data.user_name)
This deploy, i.e. the ensemble of inventory file, group data file and deployment script (the latter usually placed at the top level of the project folder), can then be run like this:
$ pyinfra -i inventory/farm1.py deploy.py
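How the attributes from group_data/ end up in host.data can be pictured with a small model. It assumes, without having checked pyinfra's sources, that more specific group files override all.py; `load_group_data` and `merge_host_data` are hypothetical helpers, the contents of `all_py` are made up for illustration, and `types.SimpleNamespace` merely imitates the attribute-style access of `host.data.user_name`.

```python
from types import SimpleNamespace


def load_group_data(source):
    """Collect the lowercase attributes defined in a group data file."""
    namespace = {}
    exec(source, namespace)
    # Skip dunder names such as __builtins__ that exec() adds.
    return {k: v for k, v in namespace.items()
            if k.islower() and not k.startswith("_")}


def merge_host_data(*layers):
    """Merge group data; later (more specific) layers win (assumption)."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return SimpleNamespace(**merged)


all_py = "ssh_user = 'root'\n"                        # hypothetical all.py
farm1_py = "user_name = 'sam'\nssh_key = '~/.ssh/sshkey'\n"
data = merge_host_data(load_group_data(all_py), load_group_data(farm1_py))
```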
As you may have guessed, since deployment scripts are Python scripts, they are fully programmable (note that on Debian, Pyinfra is built for and runs on Python 3), and that's the main advantage of this piece of software.
Quite handy for that are Pyinfra facts: functions which check various things on remote systems and return the information as Python data.
For example, deb_packages returns a dictionary of the packages installed on a remote apt-based server:
$ pyinfra -i 192.0.2.10 --fact deb_packages --user root --key ~/.ssh/sshkey --key-password=123456
"192.0.2.10":
"libdebconfclient0": "0.192",
"python-debian": "0.1.27",
"libavahi-client3": "0.6.31-5",
"dbus": "1.8.20-0+deb8u1",
"libustr-1.0-1": "1.0.4-3+b2",
"sed": "4.2.2-4+b1",
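Because facts arrive as ordinary Python data, they can be post-processed with plain Python, e.g. to find out which hosts still lack a package. A small sketch using an excerpt of the output above as input; the helper names `hosts_missing` and `installed_versions` are hypothetical.

```python
# Excerpt of the deb_packages output shown above: host -> {package: version}
facts = {
    "192.0.2.10": {
        "libdebconfclient0": "0.192",
        "python-debian": "0.1.27",
        "libavahi-client3": "0.6.31-5",
        "dbus": "1.8.20-0+deb8u1",
        "libustr-1.0-1": "1.0.4-3+b2",
        "sed": "4.2.2-4+b1",
    },
}


def hosts_missing(facts, package):
    """Hosts on which the given package is not installed."""
    return sorted(host for host, packages in facts.items()
                  if package not in packages)


def installed_versions(facts, package):
    """Installed version of a package per host (None if absent)."""
    return {host: packages.get(package) for host, packages in facts.items()}
```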
Using facts, Pyinfra reveals its full potential. For example, a deployment script could go like this, where the fact linux_distribution returns a dict describing the installed distribution:
from pyinfra import host
from pyinfra.modules import apt

if host.fact.linux_distribution['name'] == 'Debian':
    apt.packages(packages='gummi', present=True, update=True)
elif host.fact.linux_distribution['name'] == 'CentOS':
    pass
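As the number of supported distributions grows, the if/elif chain in the script above can be replaced by a lookup table. A minimal, hypothetical sketch in plain Python: it only assumes that the fact is a dict with a 'name' key, as used in the script, and the mapping covers nothing beyond the two distributions and package managers mentioned in this post.

```python
# Hypothetical mapping from distribution name to package manager;
# extend it for further distributions as needed.
PACKAGE_MANAGERS = {
    "Debian": "apt",
    "CentOS": "yum",
}


def package_manager_for(distribution):
    """Pick a package manager for a linux_distribution-style dict."""
    name = distribution.get("name")
    try:
        return PACKAGE_MANAGERS[name]
    except KeyError:
        raise ValueError(f"no package manager known for {name!r}") from None
```

Keeping the decision in one table means adding a distribution touches a single line instead of another branch.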
I'll spare more sophisticated examples to keep this introduction simple.
Beyond fancy deployment scripts, Pyinfra features its own API through which it can be driven from the outside, and much more.
But those are the usage basics, and maybe that's enough to introduce Pyinfra.
Pyinfra is a brand new project, and it remains to be seen whether the developer can keep up the pace of development he shows these days.
For a private project it would be insane to try to become a contender to the established "big" free configuration management tools and frameworks, but whether or not Puppet has become too complex in the meantime, I really don't think that's the point here.
Pyinfra follows its own approach by being programmable the way it is.
And it definitely doesn't hurt to have it in the toolbox already, without trying to replace anything.
Brainstorm
After the package had first been in experimental, the Brainstorm library from the Swiss AI research institute IDSIA is now available as python3-brainstorm in unstable.
Brainstorm is a lean, easy-to-use library for setting up deep learning networks (multi-layered artificial neural networks) for machine learning applications such as image and speech recognition or natural language processing.
Setting up a working training network for a classifier of handwritten digits like the MNIST dataset (the usual "hello world") takes just a couple of lines, as an example demonstrates.
The package is maintained within the Debian Python Modules Team.
The Debian package ships a couple of examples in /usr/share/python3-brainstorm/examples (the data/ and examples/ folders of the upstream tarball are combined here).
Among them are:
-
scripts for creating proper HDF5 training data from the MNIST database of handwritten digits and for training a simple neural network on it (create_mnist.py, mnist_pi.py),
-
examples for setting up data and training a convolutional neural network (CNN) on the CIFAR-10 dataset of pictures (create_cifar10.py, cifar10_cnn.py),
-
as well as example scripts for setting up training data and creating an LSTM (long short-term memory) recurrent neural network (RNN) on test data used in the Hutter Prize competition (create_hutter.py, hutter_lstm.py).
-
And there's also another example script for creating training data from the CIFAR-100 dataset (create_cifar100.py).
The current documentation in /usr/share/doc/python3-brainstorm/html/ isn't complete yet (several chapters are under construction), but there's a walkthrough of the CIFAR-10 example.
The MNIST example has been extended by GitHub user pinae, and was recently explained in the German computer magazine c't.
What are the prospects for further development? As Zhou Mo confirmed, there are a couple of deep learning frameworks around with a rather poor outlook, since they have been abandoned after being completed as PhD projects.
There's really no point in striving to have them all in Debian; the ITP of Minerva, for instance, has been given up partly for this reason, as there haven't been any commits since 08/2015 (and partly because cuDNN isn't available and most likely won't be).
Brainstorm, whose version 0.5 was released in 05/2015, also was a PhD project at IDSIA.
It's stated on GitHub that the project is "under active development", but the rather sparse project page, on the other hand, expresses the "hope the community will help us to further improve Brainstorm".
Such a sentence often implies that the developers are no longer actively working on the project.
But there are recent commits, and it looks like upstream is active and can be reached when there are problems.
So I don't think we're riding a dead horse here.
The downside for Brainstorm in Debian is that the libraries needed for GPU-accelerated processing apparently can't be fully provided.
Pycuda is available, but scikit-cuda (an additional library which provides wrappers for CUDA features like CUBLAS, CUFFT and CUSOLVER) is not and won't be, because the CULA Dense Toolkit (scikit-cuda also contains wrappers for that) is not freely available as source.
Because of that, a dependency on pycuda has been left out entirely, not even added as Suggests (it's non-free).
Without GPU acceleration, Brainstorm computes the matrices on OpenBLAS via the NumpyHandler (using a Cython wrapper), and the PyCudaHandler can't be used.
OpenBLAS makes pretty good use of the available hardware (it distributes the work over all available CPU cores), but it's not yet possible to run Brainstorm at full throttle using available floating point devices to reduce training times, which becomes crucial as projects get bigger.
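To illustrate why the BLAS library matters here: the core of a fully connected network layer is a matrix product, which NumPy hands over to whatever BLAS implementation it is linked against (on Debian, OpenBLAS can be selected through the BLAS alternatives). The following is a generic NumPy sketch of one forward pass with random weights, not Brainstorm's NumpyHandler code; the layer sizes are arbitrary assumptions (784 inputs as for 28×28 MNIST images, 10 output classes).

```python
import numpy as np


def dense(x, weights, bias):
    """Fully connected layer: the x @ weights product is computed by BLAS."""
    return x @ weights + bias


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 784)).astype(np.float32)  # 64 fake images
w1 = (rng.standard_normal((784, 128)) * 0.01).astype(np.float32)
b1 = np.zeros(128, dtype=np.float32)
w2 = (rng.standard_normal((128, 10)) * 0.01).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)

# One forward pass: input -> hidden layer (ReLU) -> class probabilities.
probs = softmax(dense(relu(dense(batch, w1, b1)), w2, b2))
```

Training repeats such products (plus their gradients) many times per batch, which is why spreading them over all CPU cores, or offloading them to a GPU, dominates training time.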
Brainstorm is one of a number of deep learning frameworks which are already available or becoming available in Debian.
Currently there are:
-
Caffe, for image recognition and classification, is just around the corner (#823140).
-
Theano is currently in experimental, and will be ready for Stretch together with libgpuarray (OpenCL-based GPU-accelerated processing) and Keras (an abstraction layer).
It can already run on NVIDIA graphics cards via CUDA (limited to amd64 and ppc64el, though).
-
Lasagne, another abstraction layer for Theano, is RFP (#818641).
-
Google's TensorFlow, the free successor of DistBelief, is currently ITP (#804612).
It's waiting for Google's build system Bazel to become available.
-
Torch is also ITP (#794634).
It's blocked until a wishlist bug on dh-lua gets closed.
-
Amazon's own machine learning workhorse dsstne ("destiny") has now also been put under a free license and will become available for Debian (contrib) in the near future (#824692).
It's not yet suited for image recognition applications, though (it lacks CNNs).
-
Mxnet is RFP (#808235).
I've checked out Microsoft's CNTK, but although it has also been released under a free license recently, I have my doubts whether it could be included.
Apparently there are dependencies on non-free software, and most likely other issues.
So much for a little update on the state of deep learning in Debian; please excuse me if my radar has missed something.