Search Results: "mlang"

7 August 2017

Mario Lang: If your software should be cross platform and accessible, forget about Qt

A few years ago, I started to write software whose primary audience is going to be blind musicians. I did a small presentation of the UI at DebConf15. Most of the functionality is in a compiler-like backend. But eventually, I wanted to create a user interface to improve the interactive experience. So, the problem again: which toolkit to choose that would be accessible on most platforms? Last time I needed to solve a similar problem, I used Java/Swing. That has its problems, but it actually works on Windows, Linux and (supposedly) Mac. This time around, my implementation language is C++, so Swing didn't look that interesting. It appears there is not much that fulfils these requirements. Qt looked like it could. But since I had already had bad experiences with Qt claiming accessibility it never really implemented, I was at least a bit cautious. Around 10 years ago, when Qt 4 was released, I found that the documentation claimed that Qt 4 was accessible on Linux, but it really never was until a very late 4.x release. This information was a blatant lie, trying to lure uninformed programmers into using Qt, much to the disservice of their disabled users. If you ask a random blind Windows user who knows a bit about toolkits, they will readily tell you that they hate every app written in Qt. With this knowledge, and in the spirit of "We can change the world", I wrote a private mail to the person responsible for maintaining Qt accessibility. I explained to them that I was about to choose Qt as the UI platform for my program, and that my primary audience is users who rely on Accessibility. I also explained that cross-platform support (esp. good support on Windows) is a necessary requirement for my project. I basically got a nice marketing-speak answer back, though when I read it back then, I didn't fully realize that just yet. The tone was basically: "No problem. Qt works on Linux, Mac and Windows, and if you find any problems, just report them to us and we are going to fix them."
Well, I was aware that I am not a paying customer of the Qt Company, so the promise above is probably a bit vague (I thought), but still, it sounded quite encouraging. So off I went, and started to learn enough Qt to implement the simple user interface I wanted. First tests on Linux seemed to work, which was nice. After a while, I started to test on Windows. And BANG, of course, there is a "hidden" problem. The most widespread (commercial) screen reader used by most blind people somehow does not see the content of text entry widgets. This was and still is a major problem for my project. I have a number of text entry fields in my UI. Actually, the main part of the UI is a simple editor, so you might see the problem already. So some more testing was done, just to realize that yes, text entry fields indeed do not work with the most widely used screen reader on Windows. While other screen readers seemed to work (NVDA), it is simply not feasible to ask my future users to switch to a different screen reader just for a single program. So I either needed to get JAWS fixed, or drop Qt. Well, after a lot of testing, I ended up submitting a bug to the Qt tracker. That was a little over a year ago. The turnaround time of private mail was definitely faster. And now I get a reply to my bug explaining that JAWS was never a priority, still is not, and that my problem will probably go away after a rewrite which has no deadline yet. Why did I expect this already? At least now I know why no blind user wants to have any Qt on their machine. If you want to write cross-platform accessible software: you definitely should not use Qt. And no other Free Software toolkit for that matter, because they basically all don't give a shit about accessibility on non-Linux platforms. Yes, GTK has a Windows port, but that isn't accessible at all. Yes, wxWidgets has a Windows port, but that has problems with, guess what, text entry fields (at least last time I checked).
Free Software is NOT about Accessibility or equality. I have seen evidence for that claim for more than 15 years now. It is about coolness, self-staging, scratch-your-own-itchness and things like that. When Debian released Jessie, I was told that something like Accessibility is not important enough to delay the release. If GNOME just broke the whole help system by switching to a not-yet-accessible WebKit, that is just bad luck, I was told. And it is apparently outside of the abilities of package maintainers to ensure that what we ship is accessible. I hereby officially give up. And I admit my own stupidity. Sorry for claiming Free Software would be a good thing for the world. It is definitely not for my kin. If Free Software ever takes over, the blind will be unable to use their computers. Don't get me wrong. I love my command line. But as the well-known saying goes: "Free Software will be ready for the desktop user, perhaps, next year?" The scratch-your-own-itch philosophy simply doesn't work together with a broad list of user requirements. If you want to support users with disabilities, you probably should not rely on hippie coders right now. I repeat: if you want to write compliant software that would also be usable by people with disabilities, you cannot use Qt. For now, you will need to write a native UI for every platform you want to support. Oh, and do not believe Qt Company marketing texts; your users will suffer if you do.

20 December 2016

Mario Lang: Squarepusher's Shobaleader One

I recently was lucky enough to see one of my long-time favourite drum and bass artists live: Squarepusher! I have known and loved his music since the late 90s. My girlfriend got us tickets for the Shobaleader One performance at Porgy & Bess in Vienna. It was fantastic! 90 minutes of high-energy jazz. As a personal memory, I captured one of my favourite Squarepusher tracks, Cooper's World. This is another case of #unseenphotography. While I am usually not very much into jazz, I like this fusion of dnb and jazz very much.

Mario Lang: Upgrading GlusterFS from Wheezy to Stretch

We are about to upgrade one of our GlusterFS-based storage systems at work. Fortunately, I was already worried that the upgrade procedure for the Debian packages had not been tested by the maintainers. It turns out I was right: simply upgrading the packages without manual intervention will apparently render your GlusterFS server unusable.
Basic setup I have only tested the most basic distributed GlusterFS setup. No replication whatsoever. We have two GlusterFS servers, storage1 and storage2. A peering between both has been established, and a very basic volume has been configured:
storage1:~# gluster
gluster> peer status
Number of Peers: 1
Hostname: storage2
Uuid: 2d22cc13-2252-4cf1-bfe9-3d27fa2fbc29
State: Peer in Cluster (Connected)
gluster> volume create data storage1:/srv/data storage2:/srv/data
...
gluster> volume start data
...
gluster> volume info
Volume Name: data
Type: Distribute
Volume ID: e2bd5767-4b33-4e57-9320-91ca76f52d56
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage1:/srv/data
Brick2: storage2:/srv/data
For the test setup, I populated the volume with a number of files.
Upgrading from Wheezy to Jessie To be safe, stop the volume before you begin with the package upgrade:
gluster> volume stop data
And now perform your dist-upgrade. After the upgrade, you will have to perform two manual clean ups. Both actions have to be performed on all storage servers.
/etc/glusterd is now /var/lib/glusterd The package maintainers have apparently neglected to take care of this one: you need to manually copy the old configuration files over.
storage1:~# cd /var/lib/glusterd && cp -r /etc/glusterd/* .
Put volume-id in extended attribute GlusterFS 3.5 requires the volume-id to be stored in an extended attribute on the brick directory. This is also not handled automatically during the package upgrade.
storage1:~# vol=data
storage1:~# volid=$(grep volume-id /var/lib/glusterd/vols/$vol/info | cut -d= -f2 | sed 's/-//g')
storage1:~# setfattr -n trusted.glusterfs.volume-id -v 0x$volid /srv/data
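As a quick sanity check before touching a real brick, the id-mangling part of the pipeline can be tried against a throwaway info file. All paths and the UUID below are made up for illustration; setfattr expects the value as plain hex, which is why the dashes are stripped:

```shell
# Demo only: build a fake info file and run the same pipeline on it.
demo=$(mktemp -d)
mkdir -p "$demo/vols/data"
printf 'type=0\nvolume-id=e2bd5767-4b33-4e57-9320-91ca76f52d56\n' > "$demo/vols/data/info"
vol=data
volid=$(grep volume-id "$demo/vols/$vol/info" | cut -d= -f2 | sed 's/-//g')
echo "trusted.glusterfs.volume-id=0x$volid"
# prints: trusted.glusterfs.volume-id=0xe2bd57674b334e57932091ca76f52d56
```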
With these two steps performed on all GlusterFS servers, you should now be able to start and mount your volume again in Debian Jessie. Do not forget to explicitly stop the volume again before continuing with the next upgrade step.
Upgrading from Jessie to Stretch After you have dist-upgraded to Stretch, there is yet another manual step you have to take to convert the volume metadata to the new layout in GlusterFS 3.8. Make sure you have stopped your volumes and the GlusterFS server.
storage1:~# service glusterfs-server stop
Now run the following command:
storage1:~# glusterd --xlator-option *.upgrade=on -N
Now you should be ready to start your volume again:
storage1:~# service glusterfs-server start
storage1:~# gluster
gluster> volume start data
And mount it:
client:~# mount -t glusterfs storage1:/data /mnt
You should now be running GlusterFS 3.8 and your files should still all be there.

12 June 2016

Mario Lang: A Raspberry Pi Zero in a Handy Tech Active Star 40 Braille Display

TL;DR: I put a $5 Raspberry Pi Zero, a Bluetooth USB dongle, and the required adapter cable into my new Handy Tech Active Star 40 braille display. An internal USB port provides the power. This has transformed my braille display into an ARM-based, monitorless, Linux laptop that has a keyboard and a braille display. It can be charged/powered via USB so it can also be run from a power bank or a solar charger, thus potentially being able to run for days, rather than just hours, without needing a standard wall-jack. [picture: a Raspberry Pi Zero embedded within an Active Star 40] [picture: a braille display with a keyboard on top and a Raspberry Pi Zero inside]
Some Background on Braille Display Form Factors Braille displays come in various sizes. There are models tailored for desktop use (with 60 cells or more), models tailored for portable use with a laptop (usually with 40 cells), and, nowadays, there are even models tailored for on-the-go use with a smartphone or similar (with something like 14 or 18 cells). Back in the old days, braille displays were rather massive. A 40-cell braille display was typically about the size of a 13" laptop. In modern times, manufacturers have managed to reduce the size of the internals such that a 40-cell display can be placed in front of a laptop or keyboard instead of placing the laptop on top of the braille display. While this is a nice achievement, I personally haven't found it to be very convenient because you now have to place two physically separate devices on your lap. It's OK if you have a real desk, but, at least in my opinion, if you try to use your laptop as its name suggests, it's actually inconvenient to use a small form factor, 40-cell display. For this reason, I've been waiting for a long-promised new model in the Handy Tech Star series. In 2002, they released the Handy Tech Braille Star 40, which is a 40-cell braille display with enough space to put a laptop directly on top of it. To accommodate larger laptop models, they even built in a little platform at the back that can be pulled out to effectively enlarge the top surface. Handy Tech has now released a new model, the Active Star 40, that has essentially the same layout but modernized internals. [picture: a plain Active Star 40] You can still pull out the little platform to increase the space that can be used to put something on top. [picture: an Active Star 40 with extended platform and a laptop on top] But, most conveniently, they've designed in an empty compartment, roughly the size of a modern smartphone, beneath the platform. 
The original idea was to actually put a smartphone inside, but this has turned out (at least to me) to not be very feasible. Fortunately, they thought about the need for electricity and added a Micro USB cable terminating within the newly created, empty compartment. My first idea was to put a conventional Raspberry Pi inside. When I received the braille display, however, we immediately noticed that a standard-sized rpi is roughly 3mm too high to fit into the empty compartment. Fortunately, though, a co-worker noticed that the Raspberry Pi Zero was available for order. The Raspberry Pi Zero is a lot thinner, and fits perfectly inside (actually, I think there's enough space for two, or even three, of them). So we ordered one, along with some accessories like a 64GB SDHC card, a Bluetooth dongle, and a Micro USB adapter cable. The hardware arrived a few days later, and was immediately bootstrapped with the assistance of very helpful friends. It works like a charm!
Technical Details The backside of the Handy Tech Active Star 40 features two USB host ports that can be used to connect devices such as a keyboard. A small form-factor, USB keyboard with a magnetic clip-on is included. When a USB keyboard is connected, and when the display is used via Bluetooth, the braille display firmware additionally offers the Bluetooth HID profile, and key press/release events received via the USB port are passed through to it. I use the Bluetooth dongle for all my communication needs. Most importantly, BRLTTY is used as a console screen reader. It talks to the braille display via Bluetooth (more precisely, via an RFCOMM channel). The keyboard connects through to Linux via the Bluetooth HID profile. Now, all that is left is network connectivity. To keep the energy consumption as low as possible, I decided to go for Bluetooth PAN. It appears that the tethering mode of my mobile phone works (albeit with a quirk), so I can actually access the internet as long as I have cell phone reception. Additionally, I configured a Bluetooth PAN access point on my desktop machines at home and at work, so I can easily (and somewhat more reliably) get IP connectivity for the rpi when I'm near one of these machines. I plan to configure a classic Raspberry Pi as a mobile Bluetooth access point. It would essentially function as a Bluetooth to ethernet adapter, and should allow me to have network connectivity in places where I don't want to use my phone.
BlueZ 5 and PAN It was a bit challenging to figure out how to actually configure Bluetooth PAN with BlueZ 5. I found the bt-pan python script (see below) to be the only way so far to configure PAN without a GUI. It handles both ends of a PAN network, configuring a server and a client. Once instructed to do so (via D-Bus) in client mode, BlueZ will create a new network device - bnep0 - once a connection to a server has been established. Typically, DHCP is used to assign IP addresses for these interfaces. In server mode, BlueZ needs to know the name of a bridge device to which it can add a slave device for each incoming client connection. Configuring an address for the bridge device, as well as running a DHCP server + IP Masquerading on the bridge, is usually all you need to do.
A Bluetooth PAN Access Point with Systemd I'm using systemd-networkd to configure the bridge device. /etc/systemd/network/pan.netdev:
[NetDev]
Name=pan
Kind=bridge
ForwardDelaySec=0
/etc/systemd/network/pan.network:
[Match]
Name=pan
[Network]
Address=0.0.0.0/24
DHCPServer=yes
IPMasquerade=yes
Now, BlueZ needs to be told to configure a NAP profile. To my surprise, there seems to be no way to do this with stock BlueZ 5.36 utilities. Please correct me if I'm wrong. Luckily, I found a very nice blog post, as well as an accompanying Python script, that performs the required D-Bus calls. For convenience, I use a systemd service to invoke the script and to ensure that its dependencies are met. /etc/systemd/system/pan.service:
[Unit]
Description=Bluetooth Personal Area Network
After=bluetooth.service systemd-networkd.service
Requires=systemd-networkd.service
PartOf=bluetooth.service
[Service]
Type=notify
ExecStart=/usr/local/sbin/pan
[Install]
WantedBy=bluetooth.target
/usr/local/sbin/pan:
#!/bin/sh
# Ugly hack to work around #787480
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
exec /usr/local/sbin/bt-pan --systemd --debug server pan
This last file wouldn't be necessary if IPMasquerade= were supported in Debian right now (see #787480). After the obligatory systemctl daemon-reload and systemctl restart systemd-networkd, you can start your Bluetooth Personal Area Network with systemctl start pan.
Bluetooth PAN Client with Systemd Configuring the client is also quite easy to do with Systemd. /etc/systemd/network/pan-client.network:
[Match]
Name=bnep*
[Network]
DHCP=yes
/etc/systemd/system/pan@.service:
[Unit]
Description=Bluetooth Personal Area Network client
[Service]
Type=notify
ExecStart=/usr/local/sbin/bt-pan --debug --systemd client %I --wait
Now, after the usual configuration reloading, you should be able to connect to a specific Bluetooth access point with:
systemctl start pan@00:11:22:33:44:55
Pairing via the Command Line Of course, the server and client-side service configuration require a pre-existing pairing between the server and each of its clients. On the server, start bluetoothctl and issue the following commands:
power on
agent on
default-agent
scan on
scan off
pair XX:XX:XX:XX:XX:XX
trust XX:XX:XX:XX:XX:XX
Once you've set scan mode to on, wait a few seconds until you see the device you're looking for scroll by. Note its device address, and use it for the pair and (optional) trust commands. On the client, the sequence is essentially the same except that you don't need to issue the trust command. The server needs to trust a client in order to accept NAP profile connections from it without waiting for manual confirmation by the user. I'm actually not sure if this is the optimal sequence of commands. It might be enough to just pair the client with the server and issue the trust command on the server, but I haven't tried this yet.
Enabling Use of the Bluetooth HID Profile Essentially the same as above also needs to be done in order to use the Bluetooth HID profile of the Active Star 40 on Linux. However, instead of agent on, you need to issue the command agent KeyboardOnly. This explicitly tells bluetoothctl that you're specifically looking for a HID profile.
Configuring Bluetooth via the Command Line Feels Vague While I'm very happy that I actually managed to set all of this up, I must admit that the command-line interface to BlueZ feels a bit incomplete and confusing. I initially thought that agents were only for PIN code entry. Now that I've discovered that "agent KeyboardOnly" is used to enable the HID profile, I'm not sure anymore. I'm surprised that I needed to grab a script from a random git repository in order to be able to set up PAN. I remember, with earlier versions of BlueZ, that there was a tool called pand that you could use to do all of this from the command-line. I don't seem to see anything like that for BlueZ 5 anymore. Maybe I'm missing something obvious?
Performance The data rate is roughly 120kB/s, which I consider acceptable for such a low power solution. The 1GHz ARM CPU actually feels sufficiently fast for a console/text-mode person like me. I'll rarely be using much more than ssh and emacs on it anyway.
Console fonts and screen dimensions The default dimensions of the framebuffer on the Raspberry Pi Zero are a bit strange. fbset reports that the screen dimensions are 656x416 pixels (with, of course, no monitor connected). With a typical console font of 8x16, I got 82 columns and 26 lines. With a 40-cell braille display, the 82 columns are very inconvenient. Additionally, as a braille user, I would like to be able to view Unicode braille characters in addition to the normal charset on the console. Fortunately, Linux supports 512 glyphs, while most console fonts only provide 256. console-setup can load and combine two 256-glyph fonts at once. So I added the following to /etc/default/console-setup to make the text console a lot more friendly to braille users:
SCREEN_WIDTH=80
SCREEN_HEIGHT=25
FONT="Lat15-Terminus16.psf.gz brl-16x8.psf"

Note

You need console-braille installed for brl-16x8.psf to be available.

Further Projects There's a 3.5mm audio jack inside the braille display as well. Unfortunately, there are no converters from Mini-HDMI to 3.5mm audio that I know of. It would be very nice to be able to use the sound card that is already built into the Raspberry Pi Zero, but, unfortunately, this doesn't seem possible at the moment. Alternatively, I'm looking at using a Micro USB OTG hub and an additional USB audio adapter to get sound from the Raspberry Pi Zero to the braille display's speakers. Unfortunately, the two USB audio adapters I've tried so far have run hot for some unknown reason. So I have to find some other chipset to see if the problem goes away. A little nuisance, currently, is that you need to manually power off the Raspberry, wait a few seconds, and then power down the braille display. Turning the braille display off cuts power delivery via the internal USB port. If this is accidentally done too soon then the Raspberry Pi Zero is shut down ungracefully (which is probably not the best way to do it). We're looking into connecting a small, buffering battery to the GPIO pins of the rpi, and into notifying the rpi when external power has dropped. A graceful, software-initiated shutdown can then be performed. You can think of it as being like a mini UPS for Micro USB.
The image If you are a happy owner of a Handy Tech Active Star 40 and would like to do something similar, I am happy to share my current (Raspbian Stretch based) image. In fact, if there is enough interest by other blind users, we might even consider putting a kit together that makes it as easy as possible for you to get started. Let me know if this could be of interest to you.
Thanks Thanks to Dave Mielke for reviewing the text of this posting. Thanks to Simon Kainz for making the photos for this article. And I owe a big thank you to my co-workers at Graz University of Technology who have helped me a lot to bootstrap really quickly into the rpi world.
P.S. My first tweet about this topic was just five days ago, and, apart from the soundcard not working yet, I feel like the project is already almost complete! By the way, I am editing the final version of this blog posting from my newly created monitorless ARM-based Linux laptop via an ssh connection to my home machine.

21 April 2016

Mario Lang: Scraping the web with Python and XQuery

During a JAWS for Windows training, I was introduced to the Research It feature of that screen reader. Research It is a quick way to utilize web scraping to make working with complex web pages easier. It is about extracting specific information from a website that does not offer an API. For instance, look up a word in an online dictionary, or quickly check the status of a delivery. Strictly speaking, this feature does not belong in a screen reader, but it is a very helpful tool to have at your fingertips. Research It uses XQuery (actually, XQilla) to do all the heavy lifting. This also means that the Research It Rulesets are theoretically usable on other platforms. I was immediately hooked, because I have always had a love for XPath. Looking at XQuery code is totally self-explanatory for me. I just like the syntax and semantics. So I immediately checked out XQilla on Debian, and found #821329 and #821330, which were promptly fixed by Tommi Vainikainen; thanks to him for the really quick response! Unfortunately, making xqilla:parse-html available and upgrading to the latest upstream version is not enough to use XQilla on Linux with the typical webpages out there. Xerces-C++, which is what XQilla uses to fetch web resources, does not support HTTPS URLs at the moment. I filed #821380 to ask for HTTPS support in Xerces-C to be enabled by default. And even with HTTPS support enabled in Xerces-C, the xqilla:parse-html function (which is based on HTML Tidy) fails for a lot of real-world webpages I tried. Manually upgrading the six-year-old version of HTML Tidy in Debian to the latest from GitHub (tidy-html5, #810951) did not help a lot either.
Python to the rescue XQuery is still a very nice language for extracting information from markup documents. XQilla just has a bit of a hard time dealing with the typical HTML documents out there. After all, it was designed to deal with well-formed XML documents. So I decided to build myself a little wrapper around XQilla which fetches the web resources with the Python Requests package, and cleans the HTML document with BeautifulSoup (which uses lxml to do HTML parsing). The output of BeautifulSoup can apparently be passed to XQilla as the context document. This is a fairly crazy hack, but it works quite reliably so far. Here is what one of my web scraping rules looks like:
from click import argument, group
@group()
def xq():
  """Web scraping for command-line users."""
  pass
@xq.group('github.com')
def github():
  """Quick access to github.com."""
  pass
@github.command('code_search')
@argument('language')
@argument('query')
def github_code_search(language, query):
  """Search for source code."""
  scrape(get='https://github.com/search',
         params={'l': language, 'q': query, 'type': 'code'})
The scrape function automatically determines the XQuery filename from the caller's function name. Here is what github_code_search.xq looks like:
declare function local:source-lines($table as node()*) as xs:string*
{
  for $tr in $table/tr return normalize-space(data($tr))
};
let $results := html//div[@id="code_search_results"]/div[@class="code-list"]
for $div in $results/div
let $repo := data($div/p/a[1])
let $file := data($div/p/a[2])
let $link := resolve-uri(data($div/p/a[2]/@href))
return (concat($repo, ": ", $file), $link, local:source-lines($div//table),
        "---------------------------------------------------------------")
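The caller-name trick that lets scrape find its XQuery file can be seen in isolation with the standard inspect module. This is a minimal sketch, not the code from the post; the "rules" directory name is invented (the real script looks in its own directory):

```python
from inspect import currentframe
from os import path

def xquery_name_for_caller():
    # Look one frame up the call stack and use that function's
    # name as the basename of the XQuery file to execute.
    name = currentframe().f_back.f_code.co_name
    return path.join("rules", name + ".xq")  # "rules" dir is hypothetical

def github_code_search():
    # Resolves to rules/github_code_search.xq without repeating the name.
    return xquery_name_for_caller()

print(github_code_search())
```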
That is all I need to implement a custom web scraping rule. A few lines of Python to specify how and where to fetch the website from. And an XQuery file that specifies how to mangle the document content. And thanks to the Python click package, the various entry points of my web scraping script can easily be called from the command-line. Here is a sample invocation:
fx:~/xq% ./xq.py github.com
Usage: xq.py github.com [OPTIONS] COMMAND [ARGS]...
  Quick access to github.com.
Options:
  --help  Show this message and exit.
Commands:
  code_search  Search for source code.
fx:~/xq% ./xq.py github.com code_search Pascal '"debian/rules"'
prof7bit/LazPackager: frmlazpackageroptionsdeb.pas
https://github.com/prof7bit/LazPackager/blob/cc3e35e9bae0c5a582b0b301dcbb38047fba2ad9/frmlazpackageroptionsdeb.pas
230 procedure TFDebianOptions.BtnPreviewRulesClick(Sender: TObject);
231 begin
232 ShowPreview('debian/rules', EdRules.Text);
233 end;
234
235 procedure TFDebianOptions.BtnPreviewChangelogClick(Sender: TObject);
---------------------------------------------------------------
prof7bit/LazPackager: lazpackagerdebian.pas
https://github.com/prof7bit/LazPackager/blob/cc3e35e9bae0c5a582b0b301dcbb38047fba2ad9/lazpackagerdebian.pas
205 + 'mv ../rules debian/' + LF
206 + 'chmod +x debian/rules' + LF
207 + 'mv ../changelog debian/' + LF
208 + 'mv ../copyright debian/' + LF
---------------------------------------------------------------
For the impatient, here is the implementation of scrape:
from bs4 import BeautifulSoup
from bs4.element import Doctype, ResultSet
from inspect import currentframe
from itertools import chain
from os import path
from os.path import abspath, dirname
from subprocess import PIPE, run
from tempfile import NamedTemporaryFile
import requests
def scrape(get=None, post=None, find_all=None,
           xquery_name=None, xquery_vars={}, **kwargs):
  """Execute an XQuery file.
  When either get or post is specified, fetch the resource and run it through
  BeautifulSoup, passing it as context to the XQuery.
  If find_all is given, wrap the result of executing find_all on
  the BeautifulSoup in an artificial HTML body.
  If xquery_name is not specified, the caller's function name is used.
  xquery_name combined with extension ".xq" is searched in the directory
  where this Python script resides and executed with XQilla.
  kwargs are passed to get or post calls.  Typical extra keywords would be:
  params -- To pass extra parameters to the URL.
  data -- For HTTP POST.
  """
  response = None
  url = None
  context = None
  if get is not None:
    response = requests.get(get, **kwargs)
  elif post is not None:
    response = requests.post(post, **kwargs)
  if response is not None:
    response.raise_for_status()
    context = BeautifulSoup(response.text, 'lxml')
    dtd = next(context.descendants)
    if type(dtd) is Doctype:
      dtd.extract()
    if find_all is not None:
      context = context.find_all(find_all)
    url = response.url
  if xquery_name is None:
    xquery_name = currentframe().f_back.f_code.co_name
  cmd = ['xqilla']
  if context is not None:
    if type(context) is BeautifulSoup:
      soup = context
      context = NamedTemporaryFile(mode='w')
      print(soup, file=context)
      context.flush()
      cmd.extend(['-i', context.name])
    elif isinstance(context, list) or isinstance(context, ResultSet):
      tags = context
      context = NamedTemporaryFile(mode='w')
      print('<html><body>', file=context)
      for item in tags: print(item, file=context)
      print('</body></html>', file=context)
      context.flush()
      cmd.extend(['-i', context.name])
  cmd.extend(chain.from_iterable(['-v', k, v] for k, v in xquery_vars.items()))
  if url is not None:
    cmd.extend(['-b', url])
  cmd.append(abspath(path.join(dirname(__file__), xquery_name + ".xq")))
  output = run(cmd, stdout=PIPE).stdout.decode('utf-8')
  if type(context) is NamedTemporaryFile: context.close()
  print(output, end='')
The full source for xq can be found on GitHub. The project is just two days old, so I have only implemented three scraping rules as of now. However, adding new rules has been made deliberately easy, so that I can just write up a few lines of code whenever I find something on the web which I'd like to scrape on the command-line. If you find this "framework" useful, make sure to share your insights with me. And if you implement your own scraping rules for a public service, consider sharing them as well. If you have any comments or questions, send me mail. Oh, and by the way, I am now also on Twitter as @blindbird23.

21 February 2016

Mario Lang: Generating C++ from a DTD with Jinja2 and lxml

I recently stumbled across an XML format specified in a DTD that I wanted to work with from within C++. The XML format is document centric, which is a bit of a pain with existing data binding compilers according to my limited experience. So to learn something new, and to keep control over generated code, I started to investigate what it would take to write my own little custom data binding compiler.
Writing a program that writes a program It turns out that there are two very helpful libraries in Python which can really make your life a lot easier: lxml, which can parse a DTD and let you iterate over its element and attribute declarations, and Jinja2, a powerful template engine.
To keep my life simple, I am focusing on generating accessors for XML attributes only for now. I leave it up to the library client to figure out how to deal with child elements.
A highly simplified DOM Inspired by the hybrid example from libstudxml, we define a simple base class that can store raw XML elements.
class element {
public:
  using attributes_type = std::map<xml::qname, std::string>;
  using elements_type = std::vector<std::shared_ptr<element>>;
  element(const xml::qname& name) : tag_name_(name) {}
  virtual ~element() = default;
  xml::qname const& tag_name() const { return tag_name_; }
  attributes_type const& attributes() const { return attributes_; }
  attributes_type&       attributes()       { return attributes_; }
  std::string const& text() const { return text_; }
  void text(std::string const& text) { text_ = text; }
  elements_type const& elements() const { return elements_; }
  elements_type&       elements()       { return elements_; }
  element(xml::parser&, bool start_end = true);
  void serialize(xml::serializer&, bool start_end = true) const;
  template<typename T> static std::shared_ptr<element> create(xml::parser& p) {
    return std::make_shared<T>(p, false);
  }
private:
  xml::qname tag_name_;
  attributes_type attributes_;
  std::string text_;           // Simple content only.
  elements_type elements_;     // Complex content only.
};
For each element name in the DTD, we're going to define a class that inherits from the element class, implementing special methods to make attribute access easier. The element(xml::parser&) constructor is going to create the corresponding class whenever it sees a certain element name. This calls for some sort of factory:
class factory {
public:
  static std::shared_ptr<element> make(xml::parser& p);

protected:
  struct element_info {
    xml::content content_type;
    std::shared_ptr<element> (*construct)(xml::parser&);
  };
  using map_type = std::map<xml::qname, element_info>;

  static map_type *get_map() {
    if (!map) map = new map_type;
    return map;
  }

private:
  static map_type *map;
};
template<typename T>
struct register_element : factory {
  register_element(xml::qname const& name, xml::content const& content) {
    get_map()->insert({name, element_info{content, &element::create<T>}});
  }
};
std::shared_ptr<element> factory::make(xml::parser& p) {
  auto name = p.qname();
  auto iter = get_map()->find(name);
  if (iter == get_map()->end()) {
    // No subclass found, so store plain data so we do not lose it on roundtrip.
    return std::make_shared<element>(p, false);
  }
  auto const& element = iter->second;
  p.content(element.content_type);
  return element.construct(p);
}
The header template

Now that we have our required infrastructure, we can finally start writing Jinja2 templates to generate classes for all elements in our DTD:
{%- for elem in dtd.iterelements() %}
  {%- if elem.name in forwards_for %}
    {%- for forward in forwards_for[elem.name] %}
class {{forward}};
    {%- endfor %}
  {%- endif %}
class {{elem.name}} : public dom::element {
  static register_element<{{elem.name}}> factory_registration;
public:
  {{elem.name}}(xml::parser& p, bool start_end = true) : dom::element(p, start_end) {
  }
  {%- for attr in elem.iterattributes() %}
    {%- if attr is required_string_attribute %}
  std::string {{attr.name}}() const;
  void {{attr.name}}(std::string const&);
    {%- elif attr is implied_string_attribute %}
  optional<std::string> {{attr.name}}() const;
  void {{attr.name}}(optional<std::string>);
    {# more branches to go here #}
    {%- endif %}
  {%- endfor %}
};
{%- endfor %}
required_string_attribute and implied_string_attribute are so-called Jinja2 tests. They are a nice way to isolate predicates such that the Jinja2 templates can stay relatively free of complicated expressions:
templates.tests['required_string_attribute'] = lambda a: \
  a.type in ['id', 'cdata', 'idref'] and a.default == 'required'
templates.tests['implied_string_attribute'] = lambda a: \
  a.type in ['id', 'cdata', 'idref'] and a.default == 'implied'
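As a minimal, self-contained demonstration of the mechanism (the Attr class below is a stand-in I invented for lxml's DTD attribute objects, which carry the same type/default fields):

```python
from jinja2 import Environment

class Attr:
    """Stand-in for an lxml DTD attribute declaration (illustrative only)."""
    def __init__(self, type, default):
        self.type, self.default = type, default

env = Environment()
# Register the predicate under a name usable with Jinja2's "is" operator.
env.tests['required_string_attribute'] = \
    lambda a: a.type in ['id', 'cdata', 'idref'] and a.default == 'required'

tmpl = env.from_string('{{ "req" if a is required_string_attribute else "opt" }}')
print(tmpl.render(a=Attr('cdata', 'required')))  # -> req
print(tmpl.render(a=Attr('cdata', 'implied')))   # -> opt
```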
That is nice, but we have only seen C++ header declarations so far. Let's have a look into the implementation of some of our attribute accessors.
Enum conversion

One interesting aspect of DTD-based code generation is the fact that attributes can have enumerations specified. Assume that we have some extra data structure in Python which helps us to define a nice name for each individual enumeration attribute. Then, the part of the Jinja2 template that generates the implementation for an enumeration attribute looks like:
    {%- elif attr is known_enumeration_attribute %}
      {%- set enum = enumerations[tuple(attr.values())]['name'] %}
      {%- if attr.default == 'required' %}
{{enum}} {{elem.name}}::{{attr.name}}() const {
  auto iter = attributes().find(qname{"{{attr.name}}"});
  if (iter != attributes().end()) {
        {%- for value in attr.values() %}
    {% if not loop.first %}else {% else %}     {% endif -%}
    if (iter->second == "{{value}}") return {{enum}}::{{value | mangle}};
        {%- endfor %}
    throw illegal_enumeration{};
  }
  throw missing_attribute{};
}
void {{elem.name}}::{{attr.name}}({{enum}} value) {
  static qname const attr{"{{attr.name}}"};
  switch (value) {
        {%- for value in attr.values() %}
  case {{enum}}::{{value | mangle}}:
    attributes()[attr] = "{{value}}";
    break;
        {%- endfor %}
  default:
    throw illegal_enumeration{};
  }
}
      {%- elif attr.default == 'implied' %}
{# similar implementation using boost::optional #}
      {%- endif %}
    {%- endif %}
Putting it all together

The header for the library is generated like this:
from jinja2 import DictLoader, Environment
from lxml.etree import DTD

LIBRARY_HEADER = """
{# Our template code #}
"""

bmml = DTD('bmml.dtd')
templates = Environment(loader=DictLoader(globals()))
templates.filters['mangle'] = lambda ident: \
  {'8th_or_128th': 'eighth_or_128th',
   '256th': 'twohundredfiftysixth',
   'continue': 'continue_'
  }.get(ident, ident)

def template(name):
  return templates.get_template(name)

def hpp():
  print(template('LIBRARY_HEADER').render(
    {'dtd': bmml,
     'enumerations': enumerations,
     'forwards_for': {'ornament': ['ornament_type'],
                      'score': ['score_data', 'score_header']}
    }))
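The mangle filter exists because some identifiers from the DTD ('8th_or_128th', '256th', 'continue') are not valid C++ identifiers. The same mapping can be checked standalone as a plain function:

```python
# Identical dictionary to the Jinja2 'mangle' filter registered above;
# unmapped identifiers pass through unchanged.
mangle = lambda ident: {
    '8th_or_128th': 'eighth_or_128th',
    '256th': 'twohundredfiftysixth',
    'continue': 'continue_',
}.get(ident, ident)

print(mangle('256th'))     # -> twohundredfiftysixth
print(mangle('continue'))  # -> continue_
print(mangle('pitch'))     # -> pitch
```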
With all of this in place, we can have a look at a small use case for our library.
Printing document content

I haven't really explained anything about the document format we're working with until now. Braille Music Markup Language is an XML-based plain text markup language. Its purpose is to enhance plain braille music scores with usually hard-to-calculate meta information. Almost all element text content is supposed to be printed as-is to reconstruct the original plain text. So we could at least define one very basic operation in our library: printing the plain text content of an element. I found an XML stylesheet that is supposed to convert BMML documents to HTML. This stylesheet apparently has a bug, insofar as it forgets to treat the rest_data element in the same way as it already treats the note_data element. Note to self: I wish I had done a code review before the EU project that developed BMML was finished. It looks like resurrecting maintenance is one of the things I might be able to look into in a meeting in Pisa in the first three days of March this year. If we keep this in mind, we can easily reimplement what the stylesheet does in idiomatic C++:
template<typename T>
typename std::enable_if<std::is_base_of<element, T>::value, std::ostream&>::type
operator<<(std::ostream &out, std::shared_ptr<T> elem) {
  if (!std::dynamic_pointer_cast<note_data>(elem) &&
      !std::dynamic_pointer_cast<rest_data>(elem) &&
      !std::dynamic_pointer_cast<score_header>(elem))
  {
    auto const& text = elem->text();
    if (text.empty()) for (auto child : *elem) out << child; else out << text;
  }
  return out;
}
The use of std::enable_if is necessary here so that operator<< is defined on the element class and all of its subclasses. Without the std::enable_if magic, client code would be forced to manually make sure it is passing std::shared_ptr<element> each time it wants to use operator<< on any of our specially defined subclasses. Now we can easily print BMML documents and get their actual plain text representation.
#include <fstream>
#include <iostream>
#include <xml/parser>
#include <xml/serializer>
#include "bmml.hxx"
using namespace std;
using namespace xml;
int main(int argc, char *argv[]) {
  if (argc < 2) {
    cerr << "usage: " << argv[0] << " [<filename.bmml>...]" << endl;
    return EXIT_FAILURE;
  }
  try {
    for (int i = 1; i < argc; ++i) {
      ifstream ifs{argv[i]};
      if (ifs.good()) {
        parser p{ifs, argv[i]};
        p.next_expect(parser::start_element, "score", content::complex);
        cout << make_shared<bmml::score>(p, false) << endl;
        p.next_expect(parser::end_element, "score");
      } else {
        cerr << "Unable to open '" << argv[i] << "'." << endl;
        return EXIT_FAILURE;
      }
    }
  } catch (xml::exception const& e) {
    cerr << e.what() << endl;
    return EXIT_FAILURE;
  }
}
That's it for now. The full source for the actual library which inspired this posting can be found on GitHub in my bmmlcxx project. If you have any comments or questions, send me mail. If you like bmmlcxx, don't forget to star it :-).


4 November 2015

Mario Lang: Blind through the night

"Graz nightlife as seen from the perspective of a blind couple" has just been uploaded to YouTube. This is a collection of many short clips I made while going to tekno and DnB parties. In case you don't know already, me and my girlfriend are both legally blind.

14 October 2015

Mario Lang: Accidentals in Haskell

I've had quite some fun recently (re)learning Haskell. My learning project is to implement braille music notation parsing in Haskell. Given that I've already implemented most of this stuff in C++, it gives me a great opportunity to rethink my algorithms. Not everything I've had to implement so far was actually pretty. I spent yesterday evening implementing accidentals handling, which turned out to be quite a mess. However, I wanted to share my definition of the circle of fifths, because I find it rather concise.
The problem

Given a key signature (often expressed as the number of sharp or flat accidentals), tell which pitch classes are actually raised/lowered. While reading through music notation software, I have seen several implementations of this basic concept. However, I have never seen one which was so concise.
module Accidental where
import           Data.Map (Map)
import qualified Data.Map as Map (fromList)
import qualified Haskore.Basic.Pitch as Pitch
fifths n | n > 0     = let [a,b,c,d,e,f,g] = fifths (n-1)
                       in  [d,e,f,g+1,a,b,c]
         | n < 0     = let [a,b,c,d,e,f,g] = fifths (n+1)
                       in  [e,f,g,a,b,c,d-1]
         | otherwise = replicate 7 0
Given this, we can easily define a Map of pitches to currently active accidentals/alterations. List comprehension to the rescue!
accidentals :: Int -> Map Pitch.T Pitch.Relative
accidentals k = Map.fromList [ ((o, c), a)
                             | o <- [0..maxOctave]
                             , (c, a) <- zip diatonicSteps $ fifths k
                             , a /= 0
                             ] where
  maxOctave = 9
  diatonicSteps = [Pitch.C, Pitch.D, Pitch.E, Pitch.F, Pitch.G,
                   Pitch.A, Pitch.B]
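For readers more at home in Python, the fifths recursion above transcribes directly (my own port, for illustration; indices 0..6 of the returned list correspond to the diatonic steps C through B):

```python
def fifths(n):
    """Alteration (+1 sharp / -1 flat) of the seven diatonic steps C..B."""
    if n > 0:
        a, b, c, d, e, f, g = fifths(n - 1)
        return [d, e, f, g + 1, a, b, c]
    if n < 0:
        a, b, c, d, e, f, g = fifths(n + 1)
        return [e, f, g, a, b, c, d - 1]
    return [0] * 7

print(fifths(1))   # -> [0, 0, 0, 1, 0, 0, 0]  (G major: F raised)
print(fifths(-1))  # -> [0, 0, 0, 0, 0, 0, -1] (F major: B lowered)
```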
The full source code for the haskore-braille (WIP) package can be found on GitHub. If you have any comments regarding the implementation, please drop me a mail.

14 April 2015

Mario Lang: Bjarne Stroustrup talking about organisations that can raise expectations

At time index 22:35, Bjarne Stroustrup explains in this video what he thinks is very special about organisations like Cambridge or Bell Labs. When I heard him explain this, I couldn't help but think of Debian. This is exactly how I felt (and actually still do) when I joined Debian as a Developer in 2002. This is, amongst other things, what makes Debian very special to me. If you don't want to watch the video, here is the excerpt I am talking about:
One of the things that Cambridge could do, and later Bell Labs could do, is somehow raise peoples expectations of themselves. Raise the level that is considered acceptable. You walk in and you see what people are doing, you see how people are doing, you see how apparently easily they do it, and you see how nice they are while doing it, and you realize, I better sharpen up my game. This is something where you have to, you just have to get better. Because, what is acceptable has changed. And some organisations can do that, and well, most can't, to that extent. And I am very very lucky to be in a couple places that actually can increase your level of ambition, in some sense, level of what is a good standard.

9 April 2015

Mario Lang: A C++ sample collection

I am one of those people that best learns from looking at examples. No matter if I am trying to learn a programming pattern/idiom, or a completely new library or framework. Documentation is good (if it is good!) for diving into the details, but to get me started, I always want to look at a self contained example so that I can get a picture of the thing in my head. So I was very excited when a few days ago, CppSamples was announced on the ISO C++ Blog. While it is a very young site, it already contains some very useful gems. It is maintained over at GitHub, so it is also rather easy to suggest new additions, or improve the existing examples by submitting a pull request. Give it a try, it is really quite nice. In my book, the best resource I have found so far in 2015. BTW, Debian has a standard location for finding examples provided by a package. It is /usr/share/doc/<package>/examples/. I consider that very useful.

7 April 2015

Mario Lang: I am sorry, but this looks insane

I am a console user. I really just started to use X11 again about two weeks ago, to occasionally test a Qt application I am developing. I am not using Firefox or anything similar; all my daily work happens in shells and inside of emacs, in a console, not in X11. BRLTTY runs all the time, translating the screen content to something that my braille display can understand, sent out via USB. So the most important programs to me are really emacs and brltty. This is my desktop, which has been up for 179 days.
PID   USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1227 message+  20   0    7140   2860    672 S   0,0  0,1 153:33.10 dbus-daemon
21457 root      20   0   44456   1116    788 S   0,0  0,1 146:42.47 packagekitd
    1 root      20   0   24348   2808   1328 S   0,0  0,1 109:16.99 systemd
 7897 mlang     20   0  585776 121656   4188 S   0,0  6,0 105:22.40 emacs
13332 root      20   0   10744    432    220 S   0,0  0,0  91:55.96 ssh
19581 root      20   0    4924   1632   1076 S   0,0  0,1  53:33.56 systemd
19596 root      20   0   20312   9764   9660 S   0,0  0,5  48:10.76 systemd-journal
10172 root      20   0   85308   2472   1672 S   0,0  0,1  20:30.18 NetworkManager
   29 root      20   0       0      0      0 S   0,0  0,0  18:40.24 kswapd0
13334 root      20   0  120564   5748    304 S   0,0  0,3  16:20.89 sshfs
    7 root      20   0       0      0      0 S   0,0  0,0  15:21.15 rcu_sched
14245 root      20   0    7620    316    152 S   0,0  0,0  15:08.64 ssh
  438 root      20   0       0      0      0 S   0,0  0,0  12:14.80 jbd2/dm-1-8
11952 root      10 -10   42968   2028   1420 S   0,0  0,1  10:36.20 brltty
I am sorry, but this doesn't look right, not at all. I am not even beginning to talk about dbus-daemon and systemd. Why the HECK does packagekitd (which I definitely don't use actively) use up more than two hours of plain CPU time? What did it do, talk to the NSA via an asymmetric cipher, or what?! I play music via sshfs, sometimes FLAC files. That barely consumed more CPU time than brltty, which is probably the most active daemon on my system (or at least it should be). I don't want to chime into any flamewars. I have accepted that we have systemd. But this does not look right! I remember, back in the good old days, emacs and brltty were my top CPU users.

23 March 2015

Mario Lang: Why is Qt5 not displaying Braille?

While evaluating the cross-platform accessibility of Qt5, I stumbled across this deficiency:
#include <QApplication>
#include <QTextEdit>

int main(int argv, char **args)
{
  QApplication app(argv, args);
  QTextEdit textEdit;
  textEdit.setText(u8"\u28FF");
  textEdit.show();
  return app.exec();
}
(compile with -std=c++11). On my system, this "application" does not always show the correct glyph. Sometimes, it renders a white square with a black border, i.e., the symbol for an unknown glyph. However, if I invoke the same executable several times, sometimes it renders the glyph correctly. In other words: the glyph-choosing mechanism is apparently non-deterministic! UPDATE: Sune Vuorela figured out that I need to set QT_HARFBUZZ=old in the environment for this bug to go away. Apparently, harfbuzz-ng from Qt 5.3 is buggy.

18 March 2015

Mario Lang: Call for Help: BMC -- Braille Music Compiler

Since 2009, I have been pursuing a personal programming project. As I am not a professional programmer, I have spent quite a lot of that time exploring options. I have thrown out about three or four prototype implementations already. My last implementation seems to contain enough accumulated wisdom to be actually useful. I am far from finished, but the path I am walking now seems relatively sound. So, what is this project about? I have set myself a rather ambitious goal: I am trying to implement a two-way bridge between visual music notation and braille music code. It is called BMC (Braille Music Compiler). My problem: I am, as some of you might remember, 100% blind. So I am trying to write a translator between something I will never see directly, and its counterpart representation in a tactile encoding I had to learn from scratch to be able to work on this project. Braille music code is probably the most cryptic thing I have ever tried to learn. It basically is a method to represent a 2-dimensional structure like staff notation as a stream of characters encoded in 6-dot braille. As the goal above states, I am ultimately trying to implement a converter that works both ways. One of my prototypes already implemented reading digital staff notation (MusicXML) and transcribing it to braille. However, to be able to actually understand all the concepts involved, I ended up starting from the other end of the spectrum with my new implementation: parsing braille music code and emitting digital staff notation (LilyPond and MusicXML). This is a rather unique feature: while there is commercial (and very expensive) software out there to convert MusicXML to braille music code, there is, as far as I know, no system that allows one to input un-annotated braille music code and have it automatically converted to sighted music notation.
So the current state of things is that we are able to read certain braille music code formats, and output either reformatted (to a new line width) braille music code, LilyPond or MusicXML. The ultimate goal is to also implement a MusicXML reader, and convert the data to something that can be output as braille music code. While the initial description might not sound very hard, there are a lot of complications arising from how braille music code works, which make this quite a programming challenge. For one, braille music note and rest values are ambiguous. A braille music note or rest that looks like a whole can mean a whole or a 16th. A braille music note or rest that looks like a half can mean a half or a 32nd. And so on. So each braille music code value can have two meanings. The actual value can be calculated with a recursive algorithm that I have worked out from scratch over the years. The original implementation was inspired by Samuel Thibault (thanks!) and has since then evolved into something that does what we need, while trying to do that very fast. Most input documents can be processed in almost no time; however, time signatures with a value > 1 (such as 12/8) tend to make the number of possible choices explode quite heavily. I have found so far one piece from J.S. Bach (BWV988 Variation 3) which takes about 1.5s on my 3GHz AMD (and the code is already using several CPU cores). Additionally, braille music code supports a form of "micro"-repetitions which are not present in visual staff notation, which effectively allow certain musical patterns to be compressed if represented in braille. Another algorithmically interesting part of BMC that I have started to tackle just recently is the linebreaking problem. Braille music code has some peculiar rules when it comes to breaking a measure of musical material into several lines. I ended up adapting Donald E. Knuth's algorithm from Breaking Paragraphs into Lines for fixed-width text.
In other words, I am ignoring the stretch/shrink factors, while making use of different penalty values to find the perfect solution for the problem of breaking a paragraph of braille music code into several lines. One thing that I have learnt from my previous prototype (which was apparently useful enough to already acquire some users) is that it is not enough to just transcribe one format to another. I ultimately want to store meta information about the braille that is presented to the user, such that I can implement interactive querying and editing features. Braille music code is complicated, and one of the original motivations to work on software to deal with it was to ease the learning curve. A user of BMC should be able to ask the system for a description of the character at a certain position. The user interface (not implemented yet) should allow playing a certain note interactively, playing the measure under the cursor, or playing the whole document, and if possible, have the cursor scroll along during playback. These features are not implemented in BMC yet, but they were implemented in the previous prototype and their usefulness is apparent. Also, when viewing a MusicXML document in braille music code, certain non-structural changes like adding/removing fingering annotations should be possible while preserving unhandled features of the original MusicXML document. This has also been implemented in the previous prototype, and is a goal for BMC.
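The note-value ambiguity described above can be illustrated with a simplified backtracking sketch (my own illustration, not BMC's actual algorithm, which also handles dots, tuplets and further complications): each value sign is ambiguous between a large and a small duration, and the reading that exactly fills the measure wins.

```python
from fractions import Fraction

# Each braille value sign can mean one of two durations:
# whole-or-16th, half-or-32nd, and so on.
AMBIGUOUS = {
    'whole':   (Fraction(1, 1), Fraction(1, 16)),
    'half':    (Fraction(1, 2), Fraction(1, 32)),
    'quarter': (Fraction(1, 4), Fraction(1, 64)),
    'eighth':  (Fraction(1, 8), Fraction(1, 128)),
}

def disambiguate(signs, remaining):
    """Return durations for `signs` summing exactly to `remaining`, or None."""
    if not signs:
        return [] if remaining == 0 else None
    for candidate in AMBIGUOUS[signs[0]]:
        if candidate <= remaining:
            rest = disambiguate(signs[1:], remaining - candidate)
            if rest is not None:
                return [candidate] + rest
    return None

# In a 4/4 measure, two half-shaped signs can only be real halves,
# while sixteen whole-shaped signs can only be sixteenths.
print(disambiguate(['half', 'half'], Fraction(1)))
print(disambiguate(['whole'] * 16, Fraction(1)) == [Fraction(1, 16)] * 16)
```

Large time signatures blow up the search space because many partial sums stay feasible for longer, which is why BWV988 Variation 3 in 12/8 is so much slower than typical input.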
I need your help

The reason why I am explaining all of this here is that I need your help for this project to succeed. Helping the blind to more easily work with traditional music notation is a worthwhile goal to pursue. There is no free system around that really tries to adhere to the braille music code standard and aims to convert both ways. I have reached a level of conformance that surpasses every implementation of the same problem that I have seen so far on the net. However, the primary audience of this software is going to be using Windows. We desperately need a port to that OS, and a user interface resembling NotePad with a lot fewer menu entries. We also need a GTK interface that does the same thing on Linux. wxWindows is unfortunately out of the question, since it does not provide the same level of accessibility on all the platforms it supports. Ideally, we'd also have a Cocoa interface for OS X. I am afraid there is no platform-independent GUI framework that offers the same level of accessibility on all supported platforms. And since much of our audience is going to rely on working accessibility, it looks like we need to implement three user interfaces to achieve this goal :-(. I also desperately need code reviews and inspiration from fellow programmers. BMC is a C++11 project heavily making use of Boost. If you are into one of these things, please give it a whirl, and send pull requests, no matter how small they are. While I have learnt a lot in the last years, I am sure there are many places that could use some fresh winds of thought from people that are not me. I am suffering from what I call "the lone coder syndrome". I also need (technical) writers to help me complete the pieces of documentation that are already lying around. I have started to write a braille music tutorial based on the underlying capabilities of BMC.
In other words, the tutorial includes examples which are typeset in both braille and staff notation, using LilyPond as a rendering engine. However, something like a user manual is missing, basically because the user interface is missing. BMC is currently "just" a command-line tool (good enough for me) that transcribes input files to STDOUT. This is very good for testing the backend, which is all that has been important to me in the last few years. However, BMC has now reached a stage where its functionality is likely useful enough to be exposed to users. While I try to improve things as steadily as I can, I realize that I really need to put out this call for help to make any useful progress in the foreseeable future. If you think it is a worthwhile goal to help the blind work more easily with music notation, and also to enable communication between blind and sighted musicians in both directions, please take the time and consider how you could help this project advance. My email address can be found on my GitHub page. Oh, and while you are over at GitHub, make sure to star BMC if you think it is a nice project. It would be nice if we could produce an end-user oriented release before the end of this year.

18 December 2014

Mario Lang: deluXbreed #2 is out!

The third installment of my crossbreed digital mix podcast is out! This time, I am featuring Harder & Louder and tracks from Behind the Machine and the recently released Remixes.
  1. Apolloud - Nagazaki
  2. Apolloud - Hiroshima
  3. SA+AN - Darksiders
  4. Im Colapsed - Cleaning 8
  5. Micromakine & Switch Technique - Ascension
  6. Micromakine - Cyberman (Dither Remix)
  7. Micromakine - So Good! (Synapse Remix)
How was DarkCast born and how is it done?

I have always loved 175BPM music. It is an old thing that is not going away soon :-). I recently found that there is quite an active scene going on, at least on BandCamp. But single tracks are just that, and not really fun to listen to in my opinion. This sort of music needs to be mixed to be fun. In the past, I used to have most tracks I like/love on vinyl, so I did some real-world vinyl mixing myself. But these days, most fun music is only easily available digitally. Some people still do vinyl releases, but they are actually rare. So for my personal enjoyment, I started to digitally mix tracks I really love, such that I can listen to them without "interruption". And since I have been an iOS user for three years now, using the podcast format to get stuff onto my devices was quite a natural choice. I use SoX and a very small shell script to create these mixes. Here is a pseudo-template:
sox --combine mix-power \
"|sox \"|sox 1.flac -p\" \"|sox 3.flac -p speed 0.987 delay 2:28.31 2:28.31\" -p" \
"|sox \"|sox 2.flac -p delay 2:34.1 2:34.1\" -p" \
mix.flac
As you can imagine, it is quite a bit of fiddling to get these scripts to do what you want. But it is a non-graphical method of getting things done. If you know of a better tool, possibly with a bit of real-time control, to get the same job done without having to resort to a damn GUI, let me know.

14 December 2014

Mario Lang: Data-binding MusicXML

My long-term free software project (Braille Music Compiler) just produced some offspring! xsdcxx-musicxml is now available on GitHub. I used CodeSynthesis XSD to generate a rather complete object model for MusicXML 3.0 documents. Some of the classes needed a bit of manual adjustment to make the client API really nice and tidy. During the process, I have learnt (as is almost always the case when programming) quite a lot. I have to say, once you get the hang of it, CodeSynthesis XSD is really a very powerful tool. I definitely prefer having these 100k lines of code auto-generated from an XML Schema, instead of having to implement small parts of it by hand. If you are into MusicXML for any reason, and you like C++, give this library a whirl. At least to me, it is what I was always looking for: rather type-safe, with a quite self-explanatory API. For added ease of integration, xsdcxx-musicxml is sub-project friendly. In other words, if your project uses CMake and Git, adding xsdcxx-musicxml as a subproject is as easy as using git submodule add and putting add_subdirectory(xsdcxx-musicxml) into your CMakeLists.txt. Finally, if you want to see how this library can be put to use: the MusicXML export functionality of BMC is all in one C++ source file: musicxml.cpp.

12 October 2014

Mario Lang: soundCLI works again

I recently ranted about my frustration with GStreamer in a SoundCloud command-line client written in Ruby. Well, it turns out that there was quite a bit of confusion going on. I still haven't figured out why my initial tries resulted in an error regarding $DISPLAY not being set. But now that I have played a bit with gst-launch-1.0, I can positively confirm that this was very likely not the fault of GStreamer. The actual issue is that ruby-gstreamer assumes gstreamer-1.0, while soundCLI was still written against the gstreamer-0.10 API. Since the Ruby gst module doesn't have the GStreamer API version in its name, and since Ruby is a dynamic language that only detects most errors at runtime, this led to all sorts of cascading errors. It turns out I only had to correct the use of query_position, query_duration, and get_state, as well as switching from playbin2 to playbin. soundCLI is now running in the background and playing my SoundCloud stream. A pull request against soundCLI has also been opened. On a somewhat related note, I found a GCC bug (ICE SIGSEGV) this weekend. My first one. It is related to C++11 bracketed initializers. Given that GCC 5.0 reportedly aims to remove the experimental status of C++11 (and maybe also C++14) support, this seems like a good time to hit this one. I guess that means I should finally isolate the C++ regex (runtime) segfault I recently stumbled across.

10 October 2014

Mario Lang: GStreamer and the command-line?

I was recently looking for a command-line client for SoundCloud. soundCLI on GitHub appeared to be what I wanted. But wait, there is a problem with its implementation. soundCLI uses GStreamer's playbin2 to play audio data. But that apparently requires $DISPLAY to be set. So no, soundCLI is not a command-line client. It is a command-line client for X11 users. Ahem. A bit of research on Stack Overflow and related sites did not tell me how to modify the playbin2 usage such that it does not require X11 while it is only playing audio data. What the HECK is going on here? Are the graphical people trying to silently take over the world? Is Linux becoming the new Windows? The distinction between CLI and GUI has become more and more blurry in recent years. I fear for my beloved platform. If you know how to patch soundCLI to not require X11, please let me know. My current work-around is to replace all GStreamer usage with a simple "system" call to VLC. That works, but it does not give me comment display (since soundCLI no longer knows the playback position), and it hangs after every track, requiring me to enter "quit" manually at the VLC prompt. I really would have liked to use mplayer2 for this, but alas, mplayer2 does not support https. Oh well, why would it need to, in this day and age where everyone seems to switch to https by default. Oh well.

30 September 2014

Mario Lang: A simple C++11 concurrent workqueue

For a little toy project of mine (a Wikipedia XML dump word counter) I wrote a little C++11 helper class to distribute work to all available CPU cores. It took me many years to overcome my fear of threading: in the past, whenever I toyed with threaded code, I ended up with a lot of deadlocks, and generally being confused. It appears that I have finally understood enough of this craziness to be able to come up with the small helper class below.
The problem

We want to spread work amongst all available CPU cores. There are no dependencies between items in our work queue. So every thread can just pick up and process an item as soon as it is ready.
The solution

This simple implementation makes use of C++11 threading primitives, lambda functions and move semantics. The idea is simple: you provide a function at construction time which defines how to process one item of work. To pass work to the queue, simply call the function operator of the object, repeatedly. When the destructor is called (once the object reaches the end of its scope), all remaining items are processed and all background threads are joined. The number of threads defaults to the value of std::thread::hardware_concurrency(). This appears to work at least since GCC 4.9; earlier tests had shown that std::thread::hardware_concurrency() always returned 1. I don't know when exactly GCC (or libstdc++, actually) started to support this, but at least since GCC 4.9 it is usable. A prerequisite on Linux is a mounted /proc. The maximum number of items per thread in the queue defaults to 1. If the queue is full, calls to the function operator will block. So the most basic usage example is probably something like:
int main()
{
  typedef std::string item_type;
  distributor<item_type> process([](item_type &item) {
    // do work
  });
  while (/* input */) process(std::move(/* item */));
  return 0;
}
That is about as simple as it can get, IMHO. The code can be found in the GitHub project mentioned above. However, since the class template is relatively short, here it is.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <thread>
#include <vector>
template <typename Type, typename Queue = std::queue<Type>>
class distributor: Queue, std::mutex, std::condition_variable {
  typename Queue::size_type capacity;
  bool done = false;
  std::vector<std::thread> threads;

public:
  template <typename Function>
  distributor( Function function
             , unsigned int concurrency = std::thread::hardware_concurrency()
             , typename Queue::size_type max_items_per_thread = 1
             )
  : capacity{concurrency * max_items_per_thread}
  {
    if (not concurrency)
      throw std::invalid_argument("Concurrency must be non-zero");
    if (not max_items_per_thread)
      throw std::invalid_argument("Max items per thread must be non-zero");
    for (unsigned int count = 0; count < concurrency; count += 1)
      threads.emplace_back(static_cast<void (distributor::*)(Function)>
                           (&distributor::consume), this, function);
  }
  distributor(distributor &&) = default;
  distributor &operator=(distributor &&) = delete;

  ~distributor()
  {
    {
      std::lock_guard<std::mutex> guard(*this);
      done = true;
      notify_all();
    }
    for (auto &&thread: threads) thread.join();
  }

  void operator()(Type &&value)
  {
    std::unique_lock<std::mutex> lock(*this);
    while (Queue::size() == capacity) wait(lock);
    Queue::push(std::forward<Type>(value));
    notify_one();
  }

private:
  template <typename Function>
  void consume(Function process)
  {
    std::unique_lock<std::mutex> lock(*this);
    while (true) {
      if (not Queue::empty()) {
        Type item{std::move(Queue::front())};
        Queue::pop();
        notify_one();
        lock.unlock();
        process(item);
        lock.lock();
      } else if (done) {
        break;
      } else {
        wait(lock);
      }
    }
  }
};
If you have any comments regarding the implementation, please drop me a mail.
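To see the pattern in action outside the class template, here is a deliberately simplified, self-contained variant of the same idea: a bounded queue, one condition variable shared by producer and consumers, and workers that drain remaining items before exiting. The function name and the trivial summing workload are my own illustration, not part of the original code.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Simplified sketch of the distributor pattern: sums 0..n-1 across
// a pool of worker threads fed through a bounded queue.
long parallel_sum(int n)
{
  std::queue<int> queue;
  std::mutex mutex;
  std::condition_variable ready;
  bool done = false;
  const std::size_t capacity = 4;  // queued items before the producer blocks
  const unsigned workers = std::max(2u, std::thread::hardware_concurrency());
  std::atomic<long> total{0};

  auto worker = [&] {
    std::unique_lock<std::mutex> lock(mutex);
    while (true) {
      if (!queue.empty()) {
        int item = queue.front();
        queue.pop();
        ready.notify_all();  // a slot became free: wake the producer
        lock.unlock();
        total += item;       // "process" the item outside the lock
        lock.lock();
      } else if (done) {
        break;               // queue drained and no more work coming
      } else {
        ready.wait(lock);
      }
    }
  };

  std::vector<std::thread> threads;
  for (unsigned i = 0; i < workers; ++i) threads.emplace_back(worker);

  for (int i = 0; i < n; ++i) {
    std::unique_lock<std::mutex> lock(mutex);
    while (queue.size() == capacity) ready.wait(lock);  // back-pressure
    queue.push(i);
    ready.notify_all();
  }
  {
    std::lock_guard<std::mutex> guard(mutex);
    done = true;
    ready.notify_all();
  }
  for (auto &t : threads) t.join();
  return total.load();
}
```

Note the same two design points as in the class above: items are processed with the lock released, and shutdown is signalled under the lock so no worker can miss the final wakeup.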

2 September 2014

Mario Lang: exercism.io C++ track

exercism.io is a crowd-sourced mentorship platform for learning to program. In my opinion, they do a lot of things right. In particular, an exercise on exercism.io consists of a descriptive README file and a set of test cases implemented in the target programming language. The tests have two positive sides: you learn to do test-driven development, which is good, and you also have an automated validation suite. Of course, a test cannot give you feedback on your actual implementation, but at least it can give you an idea if you have managed to implement what was required of you. But that is not the end of it. Once you have submitted a solution to a particular exercise, other users of exercism.io can comment on your implementation. And you can, as soon as you have submitted your first implementation, look at the solutions that other people have submitted for that particular problem. So knowledge transfer can happen both ways from there on: you can learn new things from how other people have solved the same problem, and you can also tell other people about things they might have done in a different way. These comments are, somewhat appropriately, called nitpicks on exercism.io. Now, exercism has recently gained a C++ track. That track is particularly fun, because it is based on C++11, Boost, and CMake, all things that are quite standard in C++ development these days. And the use of C++11 and Boost makes some solutions really shine.
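To give a flavour of the test-first workflow described above, here is a hypothetical miniature in the same spirit (the exercise, the function name and the test values are made up for illustration and are not taken from the actual track): the assertions would be handed to you as the test suite, and the short function is the kind of solution you submit for nitpicking.

```cpp
// Hypothetical exercism-style exercise: decide whether a year is a
// leap year. The test suite (assertions) comes with the exercise;
// the implementation below is what a participant would write.
bool is_leap_year(int year)
{
  // Divisible by 4, except centuries, unless divisible by 400.
  return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```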
