Search Results: "Thorsten Glaser"

8 June 2013

Thorsten Glaser: Waypoint Statistics

mirabilos  found waypoints I ve finally gotten around to listing all Waypoints (Geocaches, Opencaches, Closedcaches, Earthcaches, Terracaches including Locationless, Navicaches, etc.) I ve found a box, enjoyful, educating, a good place to hide one myself, etc. and putting up a list and, of course, generate my own statpic.

I ll put them up for the other project members, too (already made a picture for gecko2@ but bsiegert@ still needs one; we also need to collect offline lists of found, owned and attended waypoints) A bit of background story: I decided, years ago, to have an offline list of cache finds in case something would happen. Just, I had found way too many already, so this was a huge bit of work. Oh well I of course procrastinated, and then something did happen (Opencaching wanting to force a Restricted Commons licence; me disagreeing and suggesting a change; some trigger-happy person immediately deleting my account without waiting for the discussion or the decision period to end; weeks of forum discussions; Opencaching allowing dual-licencing; them telling me they can t restore my data probably never heard of databa sorry, MySQL backups). And I still didn t have the list. Now I do; recreated even the OC information from what was still accessible and with help from one OC supporter ( mic@ , thanks); merged caches that are co-listed on several platforms, etc. (still need to put in the FTF/STF/TTF/4TF/LTF and voting/favourites information) and a statpic, all in Open Source and Open Data, in cvs(1) with mksh(1) and a frontend for libgd2 I admit, but we had been using that for the MirWebsite for a while already. I suggest every geocacher keep an offline or local record of all their finds (and hides and attended logs) for things like this, in case some platform decides to let s say, put your data into the cloud where it is? I don t know .

20 May 2013

Thorsten Glaser: DynDNS

Apparently (hi Zhenech, found on Pl net Debian), a Man does not only need to fork a child, plant a tree, etc. in their life but also write a DynDNS service. Perfect for opening a new tag in the wlog called arch ology ( Some Assembly Required is also a nice example for these). Once upon a time, I used SixXS heartbeat protocol client for updating the Legacy IP (known as IPv4 earlier) endpoint address of my tunnel at home (My ISP offers static v4 for some payment now, luckily). Their client sucked, so I wrote on in ksh, naturally. And because mksh(1) is such nice a language to program in (although, I only really begun becoming proficient in Korn Shell in 2005-2006 or so, thus please take those scripts with a grain of salt, I d do them much differently nowadays) I also wrote a heartbeat server implementation. In Shell. The heartbeat server supports different backends (per client), and to date I ve run backends providing DynDNS (automatically disabling the RR if the client goes offline), an IP (IPv6) tunnel of my own (basically the same setup SixXS has, without knowing theirs), rdate(8) based time offset monitoring for ntpd(8), and an eMail forwarding service (as one must not run an MTA on dynamic IP) with it; some of these even in parallel. Not all of it is documented, but I ve written up most things in CVS. There also were some issues (mostly to do with killing sleep(1)ing subprocesses not working right), so it occasionally hung, but very rarely. Running it under the supervise of DJB d montools was nice, as I was already using djbdns, since I do not understand the BIND zone file format and do not consider MySQL a database (and did not even like databases at all, back then). For DynDNS, the heartbeat server s backend simply updated the zone file (by either adding or updating or deleting the line for the client) then running tinydns-data, then rsync ing it to the djbdns server primary and secondaries, then running zonenotify so the BIND secondaries get a NOTIFY to update their zones (so I never had to bother much with the SOA values, only allow AXFR). That s a really KISS setup Anyway. This is arch ology. The scripts are there, feel free to use them, hack on them, take them as examples even submit back patches if you want. I ll even answer questions, to some degree, in IRC. But that s it. I urge people to go use a decent ISP, even if the bandwidth is smaller. To paraphrase a coworker after he cancelled his cable based internet access (I think at Un*tym*dia) before the 2-week trial period was even over: rather have slow but reliable internet at Netc*logne than that . People, vote with your purse!

26 April 2013

Thorsten Glaser: mksh R45 released

The MirBSD Korn Shell R45 has been released today, and R44 has been named the new stable/bugfix-only series. (That s version 45.1, not 0.45, dear Homebrew/MacOSX packagers.) Packagers rejoice: the -DMKSH_GCC55009 dance is no longer needed, and even the run-time check for integer division is gone. Why? Because I realised one cannot use signed integers in C, at all, and rewrote the mksh(1) arithmetics code to use unsigned integers only. Special thanks to the people from musl libc and, to some lesser amount, Natureshadow for providing me with ideas what algorithms to replace some functionality with (signed shell arithmetic is, of course, still usable, it is just emulated using unsigned C integers now).

The following entertainment

	tg@blau:~ $ echo foo >/bar\ baz
	/bin/mksh: can't create /bar baz: Permission denied
	1 tg@blau:~ $ doch
	tg@blau:~ $ cat /bar\ baz

was provided by Tonnerre Lombard; like Swedish, German has got a number of words that cannot be expressed in English so I feel not up to the task of explaining this to people who don t know the German word doch , just rest assured it calls the last input line (be careful, this is literally a line, so don t use backslash-newline sequences) using sudo(8).

24 March 2013

Thorsten Glaser: Earthcache Master (Bronze)

Since a while
I am a proud
EarthCache Master
On the other hand I should probably put up my own, local, list of found caches, considering what happened to me on Open caching. And maybe write intros for people new to geocaching, since it d be virtually no work now had I done it initially. (And for fanfiction readers! I wish I d kept a list of read fics, not just of these I currently read and/or are currently unfinished.)

8 March 2013

Hideki Yamane: Re: checking package with clang

As I wrote before, I've posted a patch to BTS that add an option use clang compiler to pbuilder. However, pbuilder maintainer Junichi wasn't satisfied with it because he thought it seems to be overhead, introduce complexity and also hook script is enough.

I didn't agree with his opinion since I thought every time we need to add and remove its script, but Thorsten Glaser shows just specifying environment variables can solve it, and moreover, he points other compiler can be supported via hook, too. Sounds good, right? :)

Then I post a new hook script, it works with gcc-4. 4-8 and clang (gcc-4.8 needs experimental apt line for pbuilder). Please test it if you'd be interested.

6 March 2013

Raphael Geissert: A bashism a week: returning

Inspired by Thorsten Glaser's comment about where you can break from, this "bashism a week" is about a behaviour not implemented by bash.

return is a special built-in utility, and it should only be used on functions and scripts executed by the dot utility. That's what the POSIX:2001 specification requires.

If you return from any other scope, for example by accidentally calling it from a script that was not sourced but executed directly, the bash shell won't forgive you: it does not abort the execution of commands. This can lead to undesired behaviour.

A wide variety of shell interpreters silently handle such calls to return as if exit had been called.

An easy way to avoid such undesired behaviours is to follow the best practice of setting the e option, i.e.
set -e
. With that option set at the moment of calling return outside of the allowed scopes, bash will abort the execution, as desired.

The POSIX specification does not guarantee the above behaviour either as the result in such cases is "unspecified", however.

20 February 2013

Thorsten Glaser: GNU autotools generated files

On Planet Debian, Vincent Bernat wrote:

The drawback of this approach is that if you rebuild configure from the released tarball, you don t have the git tree and the version will be a date. Just don t do that.

Excuse me This is totally inacceptable. Regenerating files like aclocal.m4 and (for automake), configure (for autoconf), and the likes is one of the absolute duties of a software package. Things will break sooner or later if people do not do that. Additionally, generated files must be remakable from the distfile, so do not break this! May I suggest, constructively, an alternative? (People rightfully, I must admit complain I m just ranting too much.)
When making a release from git, write the git describe output into a file. Then, use that file instead of trying to run the git executable if .git/. is not a directory ( test -d .git/. ). Do not call git, because, in packages, it s either not installed or/and also undesired. Couldn t comment on your blog, but felt strongly enough about this I took the effort of writing a full post of my own. (But thanks for the book recommendation.)

Vincent Bernat: lldpd 0.7.1

A few weeks ago, a new version of lldpd, a 802.1AB (aka LLDP) implementation for various Unices, has been released. LLDP is an industry standard protocol designed to supplant proprietary Link-Layer protocols such as EDP or CDP. The goal of LLDP is to provide an inter-vendor compatible mechanism to deliver Link-Layer notifications to adjacent network devices. In short, LLDP allows you to know exactly on which port is a server (and reciprocally). To illustrate its use, I have made a xkcd-like strip: xkcd-like strip for the use of LLDP If you would like more information about lldpd, please have a look at its new dedicated website. This blog post is an insight of various technical changes that have affected lldpd since its latest major release one year ago. Lots of C stuff ahead!

Version & changelog UPDATED: Guillem Jover told me how he met the same goals for libbsd :
  1. Save the version from git into .dist-version and use this file if it exists. This allows one to rebuild ./configure from the published tarball without losing the version. This also handles Thorsten Glaser s critic.
  2. Include CHANGELOG in DISTCLEANFILES variable.
Since this is a better solution, I have adopted the appropriate line of codes from libbsd. The two following sections are partly technically outdated.

Automated version In, I was previously using a static version number that I had to increase when releasing:
AC_INIT([lldpd], [0.5.7], [])
Since the information is present in the git tree, this seems a bit redundant (and easy to forget). Taking the version from the git tree is easy:
        [m4_esyscmd_s([git describe --tags --always --match [0-9]* 2> /dev/null   date +%F])],
If the head of the git tree is tagged, you get the exact tag (0.7.1 for example). If it is not, you get the nearest one, the number of commits since it and part of the current hash (0.7.1-29-g2909519 for example). The drawback of this approach is that if you rebuild configure from the released tarball, you don t have the git tree and the version will be a date. Just don t do that.

Automated changelog Generating the changelog from git is a common practice. I had some difficulties to make it right. Here is my attempt (I am using automake):
dist_doc_DATA = NEWS ChangeLog
.PHONY: $(distdir)/ChangeLog
dist-hook: $(distdir)/ChangeLog
        $(AM_V_GEN)if test -d $(top_srcdir)/.git; then \
          prev=$$(git describe --tags --always --match [0-9]* 2> /dev/null) ; \
          for tag in $$(git tag   grep -E '^[0-9]+(\.[0-9]+) 1, $$'   sort -rn); do \
            if [ x"$$prev" = x ]; then prev=$$tag ; fi ; \
            if [ x"$$prev" = x"$$tag" ]; then continue; fi ; \
            echo "$$prev [$$(git log $$prev -1 --pretty=format:'%ai')]:" ; \
            echo "" ; \
            git log --pretty=' - [%h] %s (%an)' $$tag..$$prev ; \
            echo "" ; \
            prev=$$tag ; \
          done > $@ ; \
        else \
          touch $@ ; \
        touch $@
Changelog entries are grouped by version. Since it is a bit verbose, I still maintain a NEWS file with important changes.


C99 I have recently read 21st Century C which has some good bits and also handles the ecosystem around C. I have definitively adopted designated initializers in my coding style. Being a GCC extension since a long time, this is not a major compatibility problem. Without designated initializers:
struct netlink_req req;
struct iovec iov;
struct sockaddr_nl peer;
struct msghdr rtnl_msg;
memset(&req, 0, sizeof(req));
memset(&iov, 0, sizeof(iov));
memset(&peer, 0, sizeof(peer));
memset(&rtnl_msg, 0, sizeof(rtnl_msg));
req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtgenmsg));
req.hdr.nlmsg_type = RTM_GETLINK;
req.hdr.nlmsg_flags = NLM_F_REQUEST   NLM_F_DUMP;
req.hdr.nlmsg_seq = 1;
req.hdr.nlmsg_pid = getpid();
req.gen.rtgen_family = AF_PACKET;
iov.iov_base = &req;
iov.iov_len = req.hdr.nlmsg_len;
peer.nl_family = AF_NETLINK;
rtnl_msg.msg_iov = &iov;
rtnl_msg.msg_iovlen = 1;
rtnl_msg.msg_name = &peer;
rtnl_msg.msg_namelen = sizeof(struct sockaddr_nl);
With designated initializers:
struct netlink_req req =  
    .hdr =  
        .nlmsg_len = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
        .nlmsg_type = RTM_GETLINK,
        .nlmsg_flags = NLM_F_REQUEST   NLM_F_DUMP,
        .nlmsg_seq = 1,
        .nlmsg_pid = getpid()  ,
    .gen =   .rtgen_family = AF_PACKET  
struct iovec iov =  
    .iov_base = &req,
    .iov_len = req.hdr.nlmsg_len
struct sockaddr_nl peer =   .nl_family = AF_NETLINK  ;
struct msghdr rtnl_msg =  
    .msg_iov = &iov,
    .msg_iovlen = 1,
    .msg_name = &peer,
    .msg_namelen = sizeof(struct sockaddr_nl)

Logging Logging in lldpd was not extensive. Usually, when receiving a bug report, I asked the reporter to add some additional printf() calls to determine where the problem was. This was clearly suboptimal. Therefore, I have added many log_debug() calls with the ability to filter out some of them. For example, to debug interface discovery, one can run lldpd with lldpd -ddd -D interface. Moreover, I have added colors when logging to a terminal. This may seem pointless but it is now far easier to spot warning messages from debug ones. logging output of lldpd

libevent In lldpd 0.5.7, I was using my own select()-based event loop. It worked but I didn t want to grow a full-featured event loop inside lldpd. Therefore, I switched to libevent. The minimal required version of libevent is 2.0.5. A convenient way to check the changes in API is to use Upstream Tracker, a website tracking API and ABI changes for various libraries. This version of libevent is not available in many stable distributions. For example, Debian Squeeze or Ubuntu Lucid only have 1.4.13. I am also trying to keep compatibility with very old distributions, like RHEL 2, which does not have a packaged libevent at all. For some users, it may be a burden to compile additional libraries. Therefore, I have included libevent source code in lldpd source tree (as a git submodule) and I am only using it if no suitable system libevent is available. Have a look at m4/libevent.m4 and src/daemon/ to see how this is done.


Serialization lldpctl is a client querying lldpd to display discovered neighbors. The communication is done through an Unix socket. Each structure to be serialized over this socket should be described with a string. For example:
#define STRUCT_LLDPD_DOT3_MACPHY "(bbww)"
struct lldpd_dot3_macphy  
        u_int8_t                 autoneg_support;
        u_int8_t                 autoneg_enabled;
        u_int16_t                autoneg_advertised;
        u_int16_t                mau_type;
I did not want to use stuff like Protocol Buffers because I didn t want to copy the existing structures to other structures before serialization (and the other way after deserialization). However, the serializer in lldpd did not allow to handle reference to other structures, lists or circular references. I have written another one which works by annotating a structure with some macros:
struct lldpd_chassis  
    TAILQ_ENTRY(lldpd_chassis) c_entries;
    u_int16_t        c_index;
    u_int8_t         c_protocol;
    u_int8_t         c_id_subtype;
    char            *c_id;
    int              c_id_len;
    char            *c_name;
    char            *c_descr;
    u_int16_t        c_cap_available;
    u_int16_t        c_cap_enabled;
    u_int16_t        c_ttl;
    TAILQ_HEAD(, lldpd_mgmt) c_mgmt;
MARSHAL_TQE  (lldpd_chassis, c_entries)
MARSHAL_FSTR (lldpd_chassis, c_id, c_id_len)
MARSHAL_STR  (lldpd_chassis, c_name)
MARSHAL_STR  (lldpd_chassis, c_descr)
MARSHAL_SUBTQ(lldpd_chassis, lldpd_mgmt, c_mgmt)
Only pointers need to be annotated. The remaining of the structure can be serialized with just memcpy()1. I think there is still room for improvement. It should be possible to add annotations inside the structure and avoid some duplication. Or maybe, using a C parser? Or using the AST output from LLVM?

Library In lldpd 0.5.7, there are two possible entry points to interact with the daemon:
  1. Through SNMP support. Only information available in LLDP-MIB are exported. Therefore, implementation-specific values are not available. Moreover, SNMP support is currently read-only.
  2. Through lldpctl. Thanks to a contribution from Andreas Hofmeister, the output can be requested to be formatted as an XML document.
Integration of lldpd into a network stack was therefore limited to one of those two channels. As an exemple, you can have a look at how Vyatta made the integration using the second solution. To provide a more robust solution, I have added a shared library, liblldpctl, with a stable and well-defined API. lldpctl is now using it. I have followed those directions2:
  • Consistent naming (all exported symbols are prefixed by lldpctl_). No pollution of the global namespace.
  • Consistent return codes (on errors, all functions returning pointers are returning NULL, all functions returning integers are returning -1).
  • Reentrant and thread-safe. No global variables.
  • One well-documented include file.
  • Reduce the use of boilerplate code. Don t segfault on NULL, accept integer input as string, provide easy iterators,
  • Asynchronous API for input/output. The library delegates reading and writing by calling user-provided functions. Those functions can yield their effects. In this case, the user has to callback the library when data is available for reading or writing. It is therefore possible to integrate the library with any existing event-loop. A thin synchronous layer is provided on top of this API.
  • Opaque types with accessor functions.
Accessing bits of information is done through atoms which are opaque containers of type lldpctl_atom_t. From an atom, you can extract some properties as integers, strings, buffers or other atoms. The list of ports is an atom. A port in this list is also an atom. The list of VLAN present on this port is an atom, as well as each VLAN in this list. The VLAN name is a NULL-terminated string living in the scope of an atom. Accessing a property is done by a handful of functions, like lldpctl_atom_get_str(), using a specific key. For example, here is how to display the list of VLAN assuming you have one port as an atom:
vlans = lldpctl_atom_get(port, lldpctl_k_port_vlans);
lldpctl_atom_foreach(vlans, vlan)  
    vid = lldpctl_atom_get_int(vlan,
    name = lldpctl_atom_get_str(vlan,
    if (vid && name)
        printf("VLAN %d: %s\n", vid, name);
Internally, an atom is typed and reference counted. The size of the API is greatly limited thanks to this concept. There are currently more than one hundred pieces of information that can be retrieved from lldpd. Ultimately, the library will also enable the full configuration of lldpd. Currently, many aspects can only be configured through command-line flags. The use of the library does not replace lldpctl which will still be available and be the primary client of the library.

CLI Having a configuration file was requested since a long time. I didn t want to include a parser in lldpd: I am trying to keep it small. It was already possible to configure lldpd through lldpctl. Locations, network policies and power policies were the three items that could be configured this way. So, the next step was to enable lldpctl to read a configuration file, parse it and send the result to lldpd. As a bonus, why not provide a full CLI accepting the same statements with inline help and completion?

Parsing & completion Because of completion, it is difficult to use a YACC generated parser. Instead, I define a tree where each node accepts a word. A node is defined with this function:
struct cmd_node *commands_new(
    struct cmd_node *,
    const char *,
    const char *,
    int(*validate)(struct cmd_env*, void *),
    int(*execute)(struct lldpctl_conn_t*, struct writer*,
        struct cmd_env*, void *),
    void *);
A node is defined by:
  • its parent,
  • an optional accepted static token,
  • an help string,
  • an optional validation function and
  • an optional function to execute if the current token is accepted.
When walking the tree, we maintain an environment which is both a key-value store and a stack of positions in the tree. The validation function can check the environment to see if we are in the right context (we want to accept the keyword foo only once, for example). The execution function can add the current token as a value in the environment but it can also pop the current position in the tree to resume walk from a previous node. As an example, see how nodes for configuration of a coordinate-based location are registered:
/* Our root node */
struct cmd_node *configure_medloc_coord = commands_new(
    "coordinate", "MED location coordinate configuration",
/* The exit node.
   The validate function will check if we have both
   latitude and longitude. */
    NEWLINE, "Configure MED location coordinates",
    cmd_check_env, cmd_medlocation_coordinate,
/* Store latitude. Once stored, we pop two positions
   to go back to the "root" node. The user can only
   enter latitude once. */
        "latitude", "Specify latitude",
        cmd_check_no_env, NULL, "latitude"),
    NULL, "Latitude as xx.yyyyN or xx.yyyyS",
    NULL, cmd_store_env_value_and_pop2, "latitude");
/* Same thing for longitude */
        "longitude", "Specify longitude",
        cmd_check_no_env, NULL, "longitude"),
    NULL, "Longitude as xx.yyyyE or xx.yyyyW",
    NULL, cmd_store_env_value_and_pop2, "longitude");
The definition of all commands is still a bit verbose but the system is simple enough yet powerful enough to cover all needed cases.

Readline When faced with a CLI, we usually expect some perks like completion, history handling and help. The most used library to provide such features is the GNU Readline Library. Because this is a GPL library, I have first searched an alternative. There are several of them: From an API point of view, the first three libraries support the GNU Readline API. They also have a common native API. Moreover, this native API also handles tokenization. Therefore, I have developed the first version of the CLI with this API3. Unfortunately, I noticed later this library is not very common in the Linux world and is not available in RHEL. Since I have used the native API, it was not possible to fallback to the GNU Readline library. So, let s switch! Thanks to the appropriate macro from the Autoconf Archive (with small modifications), the compilation and linking differences between the libraries are taken care of. Because GNU Readline library does not come with a tokenizer, I had to write one myself. The API is also badly documented and it is difficult to know which symbol is available in which version. I have limited myself to:
  • readline(), addhistory(),
  • rl_insert_text(),
  • rl_forced_update_display(),
  • rl_bind_key()
  • rl_line_buffer and rl_point.
Unfortunately, the various libedit libraries have a noop for rl_bind_key(). Therefore, completion and online help is not available with them. I have noticed that most BSD come with GNU Readline library preinstalled, so it could be considered as a system library. Nonetheless, linking with libedit to avoid licensing issues is possible and help can be obtained by prefixing the command with help.

OS specific support

BSD support Until version 0.7, lldpd was Linux-only. The rewrite to use Netlink was the occasion to abstract interfaces and to port to other OS. The first port was for Debian GNU/kFreeBSD, then for FreeBSD, OpenBSD and NetBSD. They all share the same source code:
  • getifaddrs() to get the list of interfaces,
  • bpf(4) to attach to an interface to receive and send packets,
  • PF_ROUTE socket to be notified when a change happens.
Each BSD has its own ioctl() to retrieve VLAN, bridging and bonding bits but they are quite similar. The code was usually adapted from ifconfig.c. The BSD ports have the same functionalities than the Linux port, except for NetBSD which lacks support for LLDP-MED inventory since I didn t find a simple way to retrieve DMI related information. They also offer greater security by filtering packets sent. Moreover, OpenBSD allows to lock the filters set on the socket:
/* Install write filter (optional) */
if (ioctl(fd, BIOCSETWF, (caddr_t)&fprog) < 0)  
    rc = errno;
    log_info("privsep", "unable to setup write BPF filter for %s",
    goto end;
/* Lock interface */
if (ioctl(fd, BIOCLOCK, (caddr_t)&enable) < 0)  
    rc = errno;
    log_info("privsep", "unable to lock BPF interface %s",
    goto end;
This is a very nice feature. lldpd is using a privileged process to open the raw socket. The socket is then transmitted to an unprivileged process. Without this feature, the unprivileged process can remove the BPF filters. I have ported the ability to lock a socket filter program to Linux. However, I still have to add a write filter.

OS X support Once FreeBSD was supported, supporting OS X seemed easy. I got sponsored by which provided a virtual Mac server. Making lldpd work with OS X took only two days, including a full hour to guess how to get Apple Xcode without providing a credit card. To help people installing lldpd on OS X, I have also written a lldpd formula for Homebrew which seems to be the most popular package manager for OS X.

Upstart and systemd support Many distributions propose upstart and systemd as a replacement or an alternative for the classic SysV init. Like most daemons, lldpd detaches itself from the terminal and run in the background, by forking twice, once it is ready (for lldpd, this just means we have setup the control socket). While both upstart and systemd can accommodate daemons that behave like this, it is recommended to not fork. How to advertise readiness in this case? With upstart, lldpd will send itself the SIGSTOP signal. upstart will detect this, resume lldpd with SIGCONT and assume it is ready. The code to support upstart is therefore quite simple. Instead of calling daemon(), do this:
const char *upstartjob = getenv("UPSTART_JOB");
if (!(upstartjob && !strcmp(upstartjob, "lldpd")))
    return 0;
log_debug("main", "running with upstart, don't fork but stop");
The job configuration file looks like this:
# lldpd - LLDP daemon
description "LLDP daemon"
start on net-device-up IFACE=lo
stop on runlevel [06]
expect stop
  . /etc/default/lldpd
  exec lldpd $DAEMON_ARGS
end script
systemd provides a socket to achieve the same goal. An application is expected to write READY=1 to the socket when it is ready. With the provided library, this is just a matter of calling sd_notify("READY=1\n"). Since sd_notify() has less than 30 lines of code, I have rewritten it to avoid an external dependency. The appropriate unit file is:
Description=LLDP daemon
ExecStart=/usr/sbin/lldpd $DAEMON_ARGS

OS include files Linux-specific include files were a major pain in previous versions of lldpd. The problems range from missing header files (like linux/if_bonding.h) to the use of kernel-only types. Those headers have a difficult history. They were first shipped with the C library but were rarely synced and almost always outdated. They were then extracted from kernel version with almost no change and lagged behind the kernel version used by the released distribution4. Today, the problem is acknowledged and is being solved by both the distributions which extract the headers from the packaged kernel and by kernel developers with a separation of kernel-only headers from user-space API headers. However, we still need to handle legacy. A good case is linux/ethtool.h:
  • It can just be absent.
  • It can use u8, u16 types which are kernel-only types. To work around this issue, type munging can be setup.
  • It can miss some definition, like SPEED_10000. In this case, you either define the missing bits and find yourself with a long copy of the original header interleaved with #ifdef or conditionally use each symbol. The latest solution is a burden by itself but it also hinders some functionalities that can be available in the running kernel.
The easy solution to all this mess is to just include the appropriate kernel headers into the source tree of the project. Thanks to Google ripping them for its Bionic C library, we know that copying kernel headers into a program does not create a derivative work.

  1. Therefore, the use of u_int16_t and u_int8_t types is a left-over of the previous serializer where the size of all members was important.
  2. For more comprehensive guidelines, be sure to check Writing a C library.
  3. Tokenization is not the only advantage of libedit native API. The API is also cleaner, does not have a global state and has a better documentation. All the implementations are also BSD licensed.
  4. For example, in Debian Sarge, the Linux kernel was a 2.6.8 (2004) while the kernel headers were extracted from some pre-2.6 kernel.

25 January 2013

Johannes Schauer: Bootstrappable Debian - New Milestone

This post is about the port bootstrap build ordering tool (naming suggestions welcome) which was started as a Debian Google Summer of Code project in 2012 and continued to be developed afterwards. Sources are available through gitorious. In the end of November 2012, I managed to put down an approximation algorithm to the feedback arc set problem which allowed to break the dependency graph into a directed acyclic graph with only few removed build dependencies. I wrote about this effort on our mailinglist but didnt mention it here because it was still too much of a proof-of-concept. Later, in January 2013, I mentioned the result of this algorithm in an email wookey and me wrote to debian-devel mailinglist. Many things happened since November 2012:

Processing pipeline instead of monolithic tools The tools I developed so far tried to accomplish everything by themselves, reusing functionality implemented in a central library. Therefor, if one wanted to try out even trivial new things, it mostly meant to hack some OCaml code. Pietro Abate suggested to instead develop smaller tools which could work independently of each other, would only execute one algorithm each and could easily be connected together in different ways to achieve different effects. This switch is now done and all functionality from the old tools is moved into a new toolset. The exchange format between the tools is either plain text files in deb822 control format (Packages/Sources files) or a dependency graph. The dependency graph is currently marshalled by OCaml but future versions should work with just passing a GraphML (an XML graph format) representation around. This new way of doing things seems close to the UNIX philosophy (each program does one thing well, data is stored as text, every program is a filter). For example the deb822 control output can easily be manipulated using grep-dctrl(1) and there exist many tools which can read, analyze and manipulate GraphML. It is a big improvement over the old, monolithic tools which did not allow to manipulate any intermediate result by external, existing tools. Currently, a shell script ( will execute all tools in a meaningful order, connecting them together correctly. The same tools will be used for a future but they will be connected differently. I wrote about a first proposal of what the individual tools should do and how they should be connected in this email from which I also linked a confusing overview of the pipeline. This overview has recently been improved to be even more confusing and the current version of the lower half (the native part) now looks like this: native reduced Solid arrows represent a flow of binary packages, dottend arrows represent a flow of source packages, ovals represent a set of packages, boxes with rounded corners represent set operations, rectangular boxes represent filters. There is only one input to a filter, which is the arrow connected to the top of the box. Outgoing arrows from the bottom represent the filtered input. Ingoing arrows to either side are arguments to the filter and control how the filter behaves depending on the algorithm. I will explain this better once the pipeline proved to be less in flux. The pipeline is currently executed like this by

Two new ways to break dependency cycles have been discovered So far, we knew of three ways to formally break dependency cycles:
  • remove build dependencies through build profiles
  • find out that the build dependency is only used to build arch:all packages and therefor put it into Build-Depends-Indep
  • cross compile some source packages
Two new methods can be added to the above:

Dependency graph definition changed Back in September when I was visiting irill, Pietro found a flaw in how the dependency graph used to be generated. He supplied a new definition of the dependency graph which does away with the problem he found. After fixing some small issues with his code, I changed all the existing algorithms to use the new graph definition. The old graph is as of today removed from the repository. Thanks to Pietro for supplying the new graph definition - I must still admit that my OCaml foo is not strong enough to have come up with his code.

Added complexity for profile built source packages As mentioned in the introduction, wookey and me addressed the debian-devel list with a proposal on necessary changes for an automated bootstrapping of Debian. During the discussiong, two important things came up which are to be considered by the dependency graph algorithms:
  • profile built source packages may not create all binary packages
  • profile built source packages may need additional build dependencies
Both things make it necessary to alter the dependency graph during the generation of a feedback arc set and feedback vertex set beyond the simple removal of edges. Luckily, the developed approximation algorithms can be extended to support such changes in the graph.

Different feedback arc set algorithms The initially developed feedback arc set algorithm is well suited to discover build dependency edges which should be dropped. It performs far worse when creating the final build order because it only considers edges by itself and not how many edges of a source package can be dropped by profile building it. The adjusted algorithm for generating a build order is more of a feedback vertex set algorithm because instead of greedily finding the edge with most cycles through it, it greedily finds the source package which would break most cycles if it was profile built.

Generating a build order with less profile built source packages After implementing all the features above I now feel more confident to publish the current status of the tools to a wider audience. The following test shows a run of the aforementioned ./ shell script. Its final output is a list of source packages which have to be profile built and a build order. Using the resulting build order, starting from a minimal build system (essential:yes, build-essential and debhelper), all source packages will be compiled which are needed to compile all binary packages in the system. The result will therefor be a list of source and binary packages which fulfill the following property:
  • all binary packages can be built from the available source packages
  • all source packages can be built with the available binary packages
I called this a "reduceded distribution" in earlier posts. The interesting property of this specific selection is, that it contains the biggest problem set of Debian when bootstrapping it: a 900 to 1000 nodes big strongly connected component. Here is a visualization of the problem: hideous mess Source packages do not yet come with build profiles and the cross build situation can not yet be analyzed, so the following assumptions were made: The last point is about 14 build dependencies which were decided to be broken by the feedback arc set algorithm but for which other data sources did not indicate that they are actually breakable. Those 14 are dynamically generated by If above assumptions should not be too far from the actual situation, then not more than 73 source packages have to be modified to bootstrap a reduced distribution. This reduced distribution even includes dependency-wise "big" packages like webkit, metacity, iceweasel, network-manager, tracker, gnome-panel, evolution-data-server, kde-runtime, libav and nautilus. By changing one line in one could easily develop a build order which generates gnome-desktop or really any given (meta-)package selection. All of takes only 80 seconds to execute on my system (Core i5, 2.5 GHz, singlethreaded). Here is the final build order which creates 2044 binary packages from 613 source packages.
  1. nspr, libio-pty-perl, libmcrypt, unzip, libdbi-perl, cdparanoia, libelf, c-ares, liblocale-gettext-perl, libibverbs, numactl, ilmbase, tbb, check, libogg, libatomic-ops, libnl($), orc($), libaio, tcl8.4, kmod, libgsm, lame, opencore-amr, tcl8.5, exuberant-ctags, mhash, libtext-iconv-perl, libutempter, pciutils, gperf, hspell, recode, tcp-wrappers, fdupes, chrpath, libbsd, zip, procps, wireless-tools, cpufrequtils, ed, libjpeg8, hesiod, pax, less, dietlibc, netkit-telnet, psmisc, docbook-to-man, libhtml-parser-perl, libonig, opensp($), libterm-size-perl, linux86, libxmltok, db-defaults, java-common, sharutils, libgpg-error, hardening-wrapper, cvsps, p11-kit, libyaml, diffstat, m4
  2. openexr, enca, help2man, speex, libvorbis, libid3tag, patch, openjade1.3, openjade, expat, fakeroot, libgcrypt11($), ustr, sysvinit, netcat, libirman, html2text, libmad, pth, clucene-core, libdaemon($), texinfo, popt, net-tools, tar, libsigsegv, gmp, patchutils($), dirac, cunit, bridge-utils, expect, libgc, nettle, elfutils, jade, bison
  3. sed, indent, findutils, fastjar, cpio, chicken, bzip2, aspell, realpath, dctrl-tools, rsync, ctdb, pkg-config($), libarchive, gpgme1.0, exempi, pump, re2c, klibc, gzip, gawk, flex-old, original-awk, mawk, libtasn1-3($), flex($), libtool
  4. libcap2, mksh, readline6, libcdio, libpipeline, libcroco, schroedinger, desktop-file-utils, eina, fribidi, libusb, binfmt-support, silgraphite2.0, atk1.0($), perl, gnutls26($), netcat-openbsd, ossp-uuid, gsl, libnfnetlink, sg3-utils, jbigkit, lua5.1, unixodbc, sqlite, wayland, radvd, open-iscsi, libpcap, linux-atm, gdbm, id3lib3.8.3, vo-aacenc, vo-amrwbenc, fam, faad2, hunspell, dpkg, tslib, libart-lgpl, libidl, dh-exec, giflib, openslp-dfsg, ppl($), xutils-dev, blcr, bc, time, libdatrie, libpthread-stubs, guile-1.8, libev, attr, libsigc++-2.0, pixman, libpng, libssh2, sqlite3, acpica-unix, acl, a52dec
  5. json-c, cloog-ppl, libverto, glibmm2.4, rtmpdump, libdbd-sqlite3-perl, nss, freetds, slang2, libpciaccess, iptables, e2fsprogs, libnetfilter-conntrack, bash, libthai, python2.6($), libice($), libpaper, libfontenc($), libxau, libxdmcp($), openldap($), cyrus-sasl2($), openssl, python2.7($)
  6. psutils, libevent, stunnel4, libnet-ssleay-perl, libffi, readline5, file, libvoikko, gamin, libieee1284, build-essential, libcap-ng($), lcms($), keyutils, libxml2, libxml++2.6, gcc-4.6($), binutils, dbus($), libsm($), libxslt, doxygen($)
  7. liblqr, rarian, xmlto, policykit-1($), libxml-parser-perl, tdb, devscripts, eet, libasyncns, libusbx, icu, linux, libmng, shadow, xmlstarlet, tidy, gavl, flac, dbus-glib($), libxcb, apr, krb5, alsa-lib
  8. usbutils, enchant, neon27, uw-imap, libsndfile, yasm, xcb-util, gconf($), shared-mime-info($), audiofile, ijs($), jbig2dec, libx11($)
  9. esound, freetype, libxkbfile, xvidcore, xcb-util-image, udev($), libxfixes, libxext($), libxt, libxrender
  10. tk8.4, startup-notification, libatasmart, fontconfig, libxp, libdmx, libvdpau, libdrm, libxres, directfb, libxv, libxxf86dga, libxxf86vm, pcsc-lite, libxss, libxcomposite, libxcursor, libxdamage, libxi($), libxinerama, libxrandr, libxmu($), libxpm, libxfont($)
  11. xfonts-utils, libxvmc, xauth, libxtst($), libxaw($)
  12. pmake, corosync, x11-xserver-utils, x11-xkb-utils, coreutils, xft, nas, cairo
  13. libedit, cairomm, tk8.5, openais, pango1.0($)
  14. ocaml, blt, ruby1.8, firebird2.5, heimdal, lvm2($)
  15. cvs($), python-stdlib-extensions, parted, llvm-2.9, ruby1.9.1, findlib, qt4-x11($)
  16. xen, mesa, audit($), avahi($)
  17. x11-utils, xorg-server, freeglut, libva
  18. jasper, tiff3, python3.2($)
  19. openjpeg, qt-assistant-compat, v4l-utils, qca2, jinja2, markupsafe, lcms2, sip4, imlib2, netpbm-free, cracklib2, cups($), postgresql-9.1
  20. py3cairo, pycairo, pam, libgnomecups, gobject-introspection($)
  21. gdk-pixbuf, libgnomeprint, gnome-menus, gsettings-desktop-schemas, pangomm, consolekit, vala-0.16($), colord($), atkmm1.6
  22. libgee, gtk+3.0($), gtk+2.0($)
  23. gtkmm2.4, poppler($), openssh, libglade2, libiodbc2, gcr($), libwmf, systemd($), gcj-4.7, java-atk-wrapper($)
  24. torque, ecj($), vala-0.14, gnome-keyring($), gcc-defaults, gnome-vfs($)
  25. libidn, libgnome-keyring($), openmpi
  26. mpi-defaults, dnsmasq, wget, lynx-cur, ghostscript, curl
  27. fftw3, gnupg, libquvi, xmlrpc-c, raptor, liboauth, groff, fftw, boost1.49, apt
  28. boost-defaults, libsamplerate, cmake, python-apt
  29. qjson, qtzeitgeist, libssh, qimageblitz, pkg-kde-tools, libical, dwarves-dfsg, automoc, attica, yajl, source-highlight, pygobject, pygobject-2, mysql-5.5($)
  30. libdbd-mysql-perl, polkit-qt-1, libdbusmenu-qt, raptor2, dbus-python, apr-util
  31. rasqal, serf, subversion($), apache2
  32. git, redland
  33. xz-utils, util-linux, rpm, man-db, make-dfsg, libvisual, cryptsetup, libgd2, gstreamer0.10
  34. mscgen, texlive-bin
  35. dvipng, luatex
  36. libconfig, transfig, augeas, blas, libcaca, autogen, libdbi, linuxdoc-tools, gdb, gpm
  37. ncurses, python-numpy($), rrdtool, w3m, iproute, gcc-4.7, libtheora($), gcc-4.4
  38. libraw1394, base-passwd, lm-sensors, netcf, eglibc, gst-plugins-base0.10
  39. libiec61883, qtwebkit, libvirt, libdc1394-22, net-snmp, jack-audio-connection-kit($), bluez
  40. redhat-cluster, gvfs($), pulseaudio($)
  41. phonon, libsdl1.2, openjdk-6($)
  42. phonon-backend-gstreamer($), gettext, libbluray, db, swi-prolog, qdbm, swig2.0($)
  43. highlight, libselinux, talloc, libhdate, libftdi, libplist, python-qt4, libprelude, libsemanage, php5
  44. samba, usbmuxd, libvpx, lirc, bsdmainutils, libiptcdata, libgtop2, libgsf, telepathy-glib, libwnck3, libnotify, libunique3, gnome-desktop3, gmime, glib2.0, json-glib, libgnomecanvas, libcanberra, orbit2, udisks, d-conf, libgusb
  45. libgnomeprintui, libimobiledevice, nautilus($), libbonobo, librsvg
  46. evas, wxwidgets2.8, upower, gnome-disk-utility, libgnome
  47. ecore, libbonoboui
  48. libgnomeui
  49. graphviz
  50. exiv2, libexif, lapack, soprano, libnl3, dbus-c++
  51. atlas, libffado, graphicsmagick, libgphoto2, network-manager
  52. pygtk, jackd2, sane-backends, djvulibre
  53. libav($), gpac($), ntrack, python-imaging, imagemagick, dia
  54. x264($), matplotlib, iceweasel
  55. strigi, opencv, libproxy($), ffms2
  56. kde4libs, frei0r, glib-networking
  57. kde-baseapps, kate, libsoup2.4
  58. geoclue, kde-runtime, totem-pl-parser, libgweather, librest, libgdata
  59. webkit
  60. zenity, gnome-online-accounts
  61. metacity, evolution-data-server
  62. gnome-panel
  63. tracker
The final recompilation of profile built source packages is omitted. Source packages marked with a ($) are selected to be profile built. All source packages listed in the same line can be built in parallel as they do not depend upon each other. This order looks convincing as it first compiles a multitude of source packages which have no or only few build dependencies lacking. Later steps allow fewer source packages to be compiled in parallel. The amount of needed build dependencies is highest in the source packages that are built last.

18 January 2013

Ingo Juergensmann: Progress of m68k port

A few weeks ago on Christmas Wouter and I blogged about the successful reinstallation of m68k buildds after a very long period of years of inactivity. This even got us mentioned on Slashdot. It's been now roughly 3 weeks since then and we made some sort of progress: shows now that we're from 20% keeping up to about 60% keeping up. The installed packages went from ~1900 to about 3800 and we even triggered 200 packages from BD-uninstallable to Needs-Build:

  wanna-build statistics - Fri Jan 18 06:52:36 CET 2013
Distribution unstable:
Installed       :  3868 (buildd_m68k-ara5: 488, buildd_m68k-arrakis: 20,
                         buildd_m68k-elgar: 106, buildd_m68k-vivaldi: 80,
                         tg: 1412, unknown: 1761, wouter: 1)
Needs-Build     :  3500
Building        :    26 (buildd_m68k-ara5: 1, buildd_m68k-elgar: 1,
                         buildd_m68k-vivaldi: 1, tg: 23)
Built           :     0
Uploaded        :     1 (tg: 1)
Failed          :    34 (buildd_m68k-ara5: 17, tg: 17)
Dep-Wait        :     4 (tg: 4)
Reupload-Wait   :     0
Install-Wait    :     0
Failed-Removed  :     0
Dep-Wait-Removed:     0
BD-Uninstallable:  2320
Auto-Not-For-Us :   188
Not-For-Us      :     9
total           :  9975
 38.78% (3868) up-to-date,  38.79% (3869) including uploaded
 35.09% (3500) need building
  0.26% ( 26) currently building
  0.00% (  0) already built, but not uploaded
  0.38% ( 38) failed/dep-wait
  0.00% (  0) old failed/dep-wait
 23.26% (2320) not installable, because of missing build-dep
  1.88% (188) need porting or cause the buildd serious grief (Auto)
  0.09% (  9) need porting or cause the buildd serious grief
So, overall we're performing fine. The mention on Slashdot even brought up new donors of hardware. Someone offered SCSI/SCA disks up to 73 GB in size and another person even offered several Amigas, from which we'll using a Amiga 2000 with Blizzard 2060 accellerator card as a new buildd. This leads me to a medium-sized drawback: we actually have several Amigas with a Blizzard 2060 as buildds. Unfortunately there's no SCSI driver in current kernels for that kind of hardware. This results in the effect that we can't use as many machines as we could. Currently we are using 3 active buildds and some Aranym VMs running on Thorsten Glasers hosts. We could add 4 more buildds when there would be a working SCSI driver. So, if anyone likes to contribute to the m68k port and loves kernel hacking, this would be a great way to help us. :-)

13 January 2013

Thorsten Glaser: PSA: Unicode codepoints, referring to

PSA: Referring to Unicode codepoints. If your Unicode codepoint is, numerically, between 0 and 65533, inclusive, convert it to hexadecimal and zero-pad it to four nibbles. For example, the Euro sign is Unicode codepoint #8364 which is 20AC hex; the Eszett is 223 which is DF hex, padded 00DF.
Then write an uppercase U , a plus sign + , and the four nibbles: U+20AC U+00DF
In mksh, JSON, etc. it s a backslash \ , a lower-case u and four nibbles. Otherwise, your Unicode codepoint will be, numerically, between 65536 and 1114111, inclusive, that is hex 10000 to 10FFFF. (There s nothing on 65534 and 65535, nor above these figures.) In this case, convert it to hex, zero-pad it to eight nibbles and write it as an uppercase U , a hyphen-minus - and the eight nibbles. In C-like escapes for environments supporting the Unicode SMP, that s a backslash \ , an upper-case U and eight nibbles. Do not, in either case, use less (or more) hex digits than specified here. For example, there s a famous Unicode codepoint U-0001F4A9 PILE OF POO . That s not the same as U+1F4A9. The latter reads as U+1F4A GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA and a digit 9 ( 9). Be educated.

4 January 2013

Thorsten Glaser: on the linguistic gender of hosts, with an xz reminder

Let s start a convention: bare-metal machines have the linguistic male gender ( der Computer , he needs to be rebooted), whereas VMs have the linguistic female gender ( die virtuelle Maschine , she runs better since the last upgrade of Linux-KVM), and neutral linguistic gender is used when you cannot or do not want or need to make such distinction.
This is, of course, entirely unrelated to human gender, but not unrelated to #debian-68k (on OFTC) discussions ;-) ObRant: DO NOT USE xz COMPRESSION LEVELS ABOVE 6! (For -7 we can make exceptions, for example in Debian *-dbg or *-source packages.) You may use -e if you absolutely need the better compression, but please think of the poor sods who have to create the archives. You must not use the highest compression levels -8 or -9 since they have absolutely insane memory requirements on compression and will still hinder machines with less RAM on decompression. (Using -e only affects CPU usage at compression time; decompression is exactly as fast and memory-consuming as without.) Furthermore, DO NOT CHOOSE A COMPRESSION LEVEL WITH A DICTIONARY SIZE MUCH LARGER THAN THE DATA TO COMPRESS, as that makes absolutely no sense and will rather worsen than improve compression. As a reminder, xz uses the following dictionary sizes: Decompression uses less than 1 MiB more than the dictionary size, but the dictionary must always be allocated wholly. (You re fine to use custom presets, but mind the RAM usage!) As a general rule, if you have something of up to 20 MiB to compress, -4 is fine, and -5 will only be better if you have similar data spread across the whole of the file instead of close to each other. When I make mksh distfiles, I instead put files close to each other that have related content, which improves compression much more nicely without penalising low-memory systems; for example, you could put documentation, Makefiles, scripts, m4(1) files, and C source code into groups before archiving, instead of doing it alphabetically. Another note on bzip2: its decompression is slow. I see no reason to use it any more, at all. Use gzip(1) if you care for compatibility or have an issue with xz not having a free copyright licence, and xz otherwise.

31 December 2012

Thorsten Glaser: Not just Amigas, editors and errnos

mksh made quite some waves (machine translation of the third article) recently. Let s state it s not just Amigas ara5 is a buildd running the Atari kernel, an emulated though. On the other hand, the bare-metal Ataris used to be the fastest buildds, so I expect we get them back online soonish. I m currently fighting with some buildd software bugfixes, but once they re in, we will make more of them. Oh, and porterboxen! Does anyone want to host a VM with a porterbox? Requirements: wheezy host system (can be emulated), 1 GiB RAM, one CPU core with about 6500 BogoMIPS or more (so the emulated system has decent speed; an AMD Phenom II X4 3.2 GHz does just fine). Oh, and mksh is ported to more and more platforms, like 386BSD 0.0 with GCC 1.39, and QNX 4 with Watcom and more bugfixes are also being worked on. And let s not forget features! jupp got refreshed: it s got a bracketed paste mode, which is even auto-enabled on xterm-xfree86 (though the xterm(1) in MirBSD s a tad too old to know it; will update that later, just imported sendmail(8) 8.14.6 and lynx(1) 2.8.8dev.15 into base, more to come) and will be enhanced later (should disable auto-indent, wordwrap, status line updates, and possibly more), lots of new functions and bindings, now uses mkstemp(3) to create backup files race-free, and more (read the NEWS file). In MirBSD, Benny and I just added a number of errnos, mostly for SUSv4 compliance and being able to compile more software from pkgsrc without needing to patch. This is being tested right now (although I should probably go out and watch fireworks in less than a half-hour), together with the new imports and the bunch of small fixes we accumulate (even though most development in MirBSD is currently in mksh(1) and similar doesn t mean that all is, or worse, we were dead, which we aren t). I ll publish a new snapshot some time in January. The Grml 2012.12 also contains a pretty up-to-date MirBSD, with a boot(8/i386)loader that now ignores GUID partition table entries when deciding what to use for the a slice. If you haven t already done so, read Benjamin Mako Hill s writings!

25 December 2012

Ingo Juergensmann: Resurrecting m68k - We're on track again!

Mid of November I already wrote about "Resurrecting m68k" - and went on holidays right after that writing. So, nothing really happened until December. But then things happened rather quickly one after one. First, I got Elgar up and running. Then I upgraded Arrakis and Vivaldi again. And then it was a lucky coincedence that my parents made a short trip to Nuremberg. Back then there were another buildd located in that city: Akire, which was operated by Matthias "smurf" Urlichs. So I mailed him and asked, if Akire still do exists and he answered surprising quickly that it is - but he wanted to take it to the garbage soon. I asked Smurf if my parents could pick it up and we managed to exchange contact addresses/phone numbers. To all of our surprise the Hotel, where my parents were staying, was just 180m away from Smurfs home! So it was really easy for my parents to pick up the machine, until they continued their trip to visit me in Rostock. That way I had just another machine to upgrade! Whoohoo! I used most of the time in December to upgrade the machines, migrating to larger disks, setting up everything as someone on debian-68k list popped up to offer a hosting facility in Berlin. That was really perfect timing! I took Elgar from NMMN in Hamburg, where it was hosted until August, and had now a second machine, Akire, where I didn't know where to host. So the offer made it easy to decide: Elgar & Akire will go to Berlin whereas Kullervo & Crest will move back to NMMN, when those two boxes got upgraded. That way we have some kind of redundancy. Perfect! Except that we would still need a running Buildd on those machines. During the last few years, I think 4-5 years, the sbuild/buildd suite did change in a great way. Nothing worked any longer as it did. So I concentrated on getting sbuild ready to pick a source and build it. But I got faced with some segfaults of various stuff. After all, it happened to be a somewhat broken kernel that caused all the problems. After upgrading the kernel, schroot suddenly did work and I could continue in setting up sbuild. After some days things got clearer and finally it worked: 6tunnel was the first newly build package by sbuild on m68k on 20. December 2012! During the next days I tried to get a larger disk (18G) for Spice, another machine, working, so I could use the big disk (36G) for Akire, instead of the old 2 & 4G disks and tried to deploy the sbuild config to Arrakis and Vivaldi. That was about two days ago. The missing part was an updated buildd config. This was addressed by Wouter today (well, actually yesterday in the meantime) and now we have a working buildd again since years! Hooray! :-)) Now we are back on track with the m68k port and will add more buildds, as well native as emulated ones, to come down from that "Needs-Build : 5261" number. So, very big thanks to all that made this possible:
  • Wouter for configuring the buildd setup on Arrakis
  • Aurelien for adding the m68k buildd back to
  • John Paul Adrian Glaubitz for offering the hosting
  • Matthias "smurf" Urlichs for keeping care of Akire all of these years
  • NMMN in Hamburg for willing to continue the hosting for Kullervo & Crest
  • adb@#debian-68k for donating 4x 32MB PS/2 RAM
and finally, last but not least, a very, very BIG THANKS to Thorsten Glaser who acted all these years as a human buildd and for solving the TLS problem on m68k and keeping the port alive in some kind of one-man-show!

24 December 2012

Wouter Verhelst: m68k: back from the dead

It's been a few years coming, but the day has finally arrived. Contrary to some rumours which I've had to debunk over the years, the m68k port did not go into limbo because it was kicked out of the archive; instead, it did because recent versions of glibc require support for thread-local storage, a feature that wasn't available on m68k, and nobody with the required time, willingness, and skill set could be found to implement it. This changed a few years back, when some people wrote the required support, because they were paid to do so in order to make recent Linux run on ColdFire processors again. Since ColdFire and m68k processors are sufficiently similar, that meant the technical problem was solved. However, by that time we'd fallen so far behind that essentially, we needed to rebootstrap the port all over again. Doing that is nontrivial, and most of the m68k porters team just didn't have the time or willingness anymore to work on this; and for a while, it seemed like the m68k port was well and truly dead. But then, two things happened. The first thing that happened was called "ARAnyM", short for "Atari Running on Any Machine". An atari emulator, it started supporting running Linux at some point, allowing us to work on the port while on the road, using a laptop something not possible beforehand. While there were some bugs in the emulation of early versions of aranym, these were all fairly quickly fixed, and before long we could run Debian/m68k in the emulator. However, that was still the etch-m68k port on, some time after Debian etch wasn't even supported anymore; this still required work. The second "thing" that happened was called "Thorsten Glaser". A fresh DD and maintainer of the "mksh" shell, he wanted to make sure his shell would work on all Debian architectures, including the ones on When he found out that the m68k port was in a pretty bad shape, he did not, like many before him, shrug and move on; instead, he took it upon himself to start compiling things, just so he could compile his shell. How's that for dedication. Anyway, fast-forward two years, and we're now, finally, at a point where I spent most of today getting buildd up and running on arrakis, one of our oldest buildd hosts. There were some issues while trying that, but it looks like everything is in working order now; and the first package to start building on the first functioning buildd host since a long time (the build of which is still ongoing at the time of this writing) is "babl". Of course, with only one buildd, we'll have no hope whatsoever of ever catching up. But that can be remedied easily by using more than one buildd host... Update: It built! Party!

15 December 2012

Thorsten Glaser: Der heilige Frieden?

(Apologies for putting this on Planet Debian, but it says the one or other non-English post is okay as long as it s an exception. I feel I need to reach more people with this, but don t feel like translating this into English right now.)
Update: Tanguy asked for a short English summary: it s me ranting against the rioting against muslims and the call for more CCTV surveillance after a possible bomb was found at the train station. In Bonn herrscht immer noch Bombenstimmung , wenn man z.B. auf die Webseite der Lokalzeitung schaut von dem Amoklauf in Connecticut, ber den sich im IRC gewunder wird, ist immer noch nichts zu sehen, daf r wird flei ig wider Islamisten gehetzt. Ich finde das besorgniserregend, mu doch jetzt jeder Angeh rige des Islams f rchten, verfolgt oder benachteiligt zu werden. Das reizt doch erst recht zum Gegenschlag, bei dem dann auch Menschen, die absolut nicht mit der hier vorherrschenden Meinung und Politik bereinstimmen, getroffen werden k nnen. Ich pers nlich habe kein Problem mit Menschen anderen Glaubens oder anderer Weltanschauung, solange wir friedlich miteinander leben k nnen. Ich teile eure Unzufriedenheit mit dem herrschenden Staat, der immer weitergehenden berwachung, Unterdr ckung von Leuten, die nicht dem vorherrschenden Menschenbild entsprechen (egal an welchen Kategori n), und bitte die, die dies lesen, nochmal nachzudenken, bevor sie etwas tun, was hinterher Unschuldige trifft oder gar in friendly fire ausartet. Hat eigentlich wer die in Bad Godesberg ausgegebenen Koran-B cher sich mal angeschaut? Als ich davon las, war ich ja zugegebenerma en neugierig, weil ich vom Koran leider eher wenig kenne, wei aber nicht, wie neutral oder eben nicht die bersetzung gehalten ist. Anhand dessen, was ich bereits mitbekam, sollte das eher friedlicher sein als was durch sp tere Theologen festgelegt wurde wie ja auch zum Beispiel im Christentum, aber ber die Horrorepisoden der christlichen Kirche will ich jetzt auch nicht mich auslassen, in der Hoffnung, da auch diese sich mit den Jahren gebessert hat. (Ist nur halt das Problem mit den Leuten, die die alten Hetzparolen jetzt noch verbreiten. Ist wie im Netz mit den Groupies von Theo de Raadt, die noch asiger zu Leuten sind als er selber.) (Au erdem mu man ja bef rchten, durch Besitz eines Korans schon vorverurteilt zu werden heutzutage *seufz* ich finde das nicht gut!) Update (ich verga ): auch der Ruf nach mehr Video berwachung ist nur Panikmache. Das geht nur zu Lasten des Normalb rgers. Vielleicht lassen sich noch Kleinstdelikte wie Taschendiebstahl damit abschrecken, aber gerade diese Bomben und dergleichen sind doch oft von Leuten, die vor Konsequenzen keine Angst haben, organisiert. Die werden dann maximal M rtyrer. Ich wiederhole nochmal f r die Politiker und die ganz langsamen unter den Lesern: berwachung verhindert keine Straftat.

7 December 2012

Thorsten Glaser: Collision resolution in open addressing hashtables

Before we begin, everyone should read up on hashtables and what open addressing / closed hashing is. The context is lines 111 190 of Python s Objects/dictobject.c as of today (so we get the line numbers straight). (I ve reworded this wlog entry a bit; I originally wrote it too late at night for it to read coherent.) Basically, I ve got an application where I d like to use a hashtable for a number of things not as generic as Python, and with focus on small footprint. I d like to offer associative arrays in a scripting language, where the keys are always arbitrary byte strings excluding NUL. Also, I d like to use the hashtable as backend for indexed arrays, where the keys are uint32_t and the usual use case is sequential. Finally, I m using it for several internal tables, such as a list of keywords, one of builtins, one of special variables, etc. which is a reason for me to not use a self-balancing binary tree as data structure (reading further below might suggest that, but getting a sorted list of hashtable keys is not the focus, though not unimportant).
My questions on this are:

Why is the shift on perturb done after its first use? In my experiments (using 32-bit width everywhere), for the pathological case of an 8-element (i = 3) table with three entries 0, 0x40000000 and 0x800000000, the second round yields 1 for all three, so it cannot have to do with the upper bits. My lookup looks like:

	mask = 2  - 1;
	j = perturb = hash(key);
	goto find_first_slot;
	j = (j << 2) + j + perturb + 1;
	perturb >>= PERTURB_SHIFT;
	entry = table[j & mask];
	if (!match(entry)) goto find_next_empty_slot;

This means that my first check is always the bare hash (so only do it if needed is no reason) and, since I m using gotos, I could just move the perturb >>= PERTURB_SHIFT; line before the line recalculating the next j to use. This seems to make more sense, even in the face of Python. (I actually looked at the Python file s comments again today because I thought to use a different resolution, but they have a good rationale for using the multiplication by 5.) Why can t we just use i as the PERTURB_SHIFT? Sure, this changes a shift-right by a constant, which can possibly be encoded as immediate value in assembly (unless you re on a pre-80186, which can only do SHR AX,1 and SHR AX,CL but not SHR AX,4, but that s outside of mksh s scope) into a right-shift by a variable, but i is already known, and I think the behaviour is better (it wouldn t eat any bits; assume the same 8-entry hashtable and pathologic keys 0, 8 and 16). Again: who do I think I am to go against the wisdom of the Python people, who seem to have shed more thought on this than everyone else I saw, asked, read about (including Spammipedia). That s why I m asking here. On that reference: I don t support spammers or people nagging for donations or premium accounts, like Xing and Groundspeak/Geocaching.COM, at all. In fact, I urge others to do the same, so it really hurts them; it may be their business model, but not if they spam me. Besides, OpenCaching.DE exists. Another thing is: to avoid CVE-2011-4815, I m randomising the hash used, with one seed value per hashtable, changed before a resize operation. I originally thought to seed it with nonzero, but then I have to rehash on hashtable resize, so I ll be XORing the final hash value instead (thanks ciruZ for the idea). I m thinking of omitting that for indexed arrays, as an attacker almost certainly cannot determine the keys there. (To directly use the indexed array keys, which are already uint32_t, as hashes makes using i from even more important.) The hash I m using is a modified Jenkins one-at-a-time called NZAAT: it s my new generic standard n n-cryptographic hash, and the changes are thus: while adding a byte, another increment of the hash is done (so NUL counts), and the finaliser got prefixed with the shift-left-add+shift-right-xor sequence of the adder (but not adding any value or the +1), to get best avalanche for all bytes. I actually compiled several versions of Hash.cs on a Windows VM at work to analyse the original one-at-a-time and all of my modifications; these turned out to be the simplest ones (I originally had added 0x100 instead of 1, but the effect was the same, and +1 is usually cheaper on most CPUs). Also, to avoid people being able to get to the seed, a user will always get only a sorted list of hashtable keys (numeric for indexed arrays, ASCIIbetically otherwise; see also my thoughts on JSON from the previous wlog entry). What algorithm do I use? For strings, comparisons are much more expensive, so I d like to keep them low. Memory use is also a factor; allocating one large(r) block is better than many small ones due to the pool allocator overhead and due to portability to ancient Unic s (which is another reason for me to use a hashtable which is a small struct plus an array of pointers, and then pass the list of keys as array of string pointers, instead of a tree). For both reasons, I m thinking a relatively simple MergeSort: I need to allocate the result array anyway, so I can just get two and free the one that isn t the end result, and it s AFAICT the cheapest on comparisons other than Tree Sort (which nobody really seems to use, and which would effect to using a balanced binary tree again). Since keys are unique, stability and duplicate handling is never an issue. I d like to use only one algorithm and one data structure, not a combination, as compactness is a design goal. Please drop your thoughts on Freenode, e.g. by /msg MemoServ send mirabilos your text here or per eMail to the domains debian, freewrt or mirbsd, which are organisations, with the localpart tg. Or just contact me as usual, if you re already acquainted. Or lookup 0xE99007E0. Thanks in advance! (Especially, Python Developers thoughts are welcome.)

1 December 2012

Thorsten Glaser: Proposed extensions to the JSON specification

The following proposal extends the JSON specification, with the idea of using JSON as an information interchange format, rather than just a way of writing certain ECMAscript values. They do not add anything but only restrict valid JSON content and encoders with some rationale. First of, I d like to remind everyone, including JSON s author, that JSON is case-sensitive, except in the four hexdigits after a backslash-u sequence in a String. Second, I d like to remind everyone that JSON is not binary-safe. No way around that, it implements Unicode (actually, 16-bit UCS-2, and it doesn t guarantee that UTF-16 surrogates are correctly paired) text. I also consider only UTF- 8, 16,32 B,L E valid encodings for JSON. (No PDP endian, either. Sorry, guys.) For my first proposal, I d like to point out CVE-2011-4815 which was about overflowing hashtables. The obvious fix is to randomise the hash per hashtable; to ensure this doesn t leak, we sort ASCIIbetically the keys of an Object in the encoder. (Using Unicode is good here we can just sort the keys as UTF-8 strings by their uint8_t value or as Unicode (UCS-2 or even UCS-4 or UTF-16) strings by the codepoints.) JSON was never preserving the order of elements in an Object anyway so we make it standardised (we still accept any order, and, when parsing, in collision cases, the later value wins). This also helps diffs. For my second proposal, I d like to forbid \u0000, \uFFFE, \uFFFF in strings. The first because many implementations use C strings, and for an information interchange format this is better; it also has security implications to allow NUL in a string. The other two, but not unpaired UTF-16 surrogates (as ECMAscript uses UCS-2 and got UTF-16 only later) because they re not valid Unicode; JSON was not binary-safe already so why bother. Among other benefits, this also helps implementations. For my third proposal, I d like to agree that implementations should impose a nesting depth limit that may be user-defined, and in the face of which, cyclic checking may be ignored by an encoder. I emit nesting depth overflows as literalnull; might also throw an error. Since I was asked, the common standard value is to restrict nesting depth is 32, unless the user specified one. (I also saw 8, but 32 WFM pretty well.) Most seem to use it even if it may seem low at first. Only specialised applications probably need more, and they can always pass a value. For my forth proposal, backslash-escape U+007F U+009F always. It may upset humans, editors, databases, etc. (This paragraph is newly added, after some IRC discussion.) All these do not permit anything that wasn t accepted to be accepted afterwards. I ve got a fifth proposal which changes acceptance rules but only for a subset of parsers: formally JSON is defined in ECMA-262 as industry standard that, in contrast to RFC 4627, always allowed any Value as top-level element of a JSON text. I d like to make it so, and ignore the RFC s requirement for it to be an Object or Array. Even so, the first two characters (after the BOM, if any) of a JSON text always are in the non-NUL 7-bit ASCII range, allowing for encoding detection. (This is done by the NUL octet pattern in the first four octets.) JSON has only taken off because it s a tightly defined simple format that can be used everywhere and isn t too awful for humans (escaping not needed for U+0020 U+D7FF and U+E000 U+FFFD after all, although I d also take the C1 control characters out, see my forth proposal above). I ve started to use a trailing comma in indexed and associative arrays in code I write at work, when the array values are one a line, to help version control systems to do their diffs, but refrain from asking for a JSON extension to permit that in order to not endanger compatibility any (no comment needed, it s just not worth it), but I d like my above proposals to be followed by implementators (and I m one of them). Some more discussion with Jonathan pointed out that JSON5 allows for trailing commata in Object and Array; IMHO the only feature of it that is not bad or outright harmful. I ll probably keep from accepting them because, on their own, they re not that useful, and I usually would run JSON texts, even configs, through a parser/encoder roundtrip to pretty-print them which would lose them anyway. As for binary-safeness: probably best to just use base64 and let the outer layers worry about compression. The data is usually unrelated to the JSON-encoded structure, and even if it s related to other data the base64 representation is usually similar (unless misaligned). Update 02.12.2012 Wrong I was about the first two characters: " " is a valid JSON text. Still possible to peek at four octets and determine the encoding by ordering the tests; updated my notes.

10 October 2012

Johannes Schauer: Using Gentoo to find reduced build dependencies for Debian source packages

Automatically devising a build order that allows to bootstrap Debian, currently fails (amongst other reasons) because of the lack of metadata information about which build dependencies can potentially be dropped from source packages. If that information was available, an algorithm could decide which build dependencies to drop so that dependency cycles can be broken. Finding droppable build dependencies of a source package is something only humans can do. This is because it involves to manually analyze and test the build system of a source package. Build systems are neither uniform nor do they encode their dependencies in a way that can directly be mapped to Debian packages. Therefor they are not machine readable. One idea to solve the dilemma, is to find a Linux distribution that provides the following: If such a distribution can be found then the information from it can be used to find dependencies that can also be dropped from Debian source packages. Gentoo is a distribution that fulfills above requirements through so called USE flags that allow to enable or disable features during compilation. Dependencies of Gentoo source packages are stored in .ebuild files that control the build process. Since .ebuild files are bash scripts, parsing them is not trivial. I therefor used the emerge software package to extract that information. Thanks to the well written emerge code and to quick help in the Gentoo IRC channel, it didnt take long to make the code run on Debian. My sourcecode is downloadable here: Before I list the results of using Gentoo USE flags to determine dependencies that can potentially be dropped from Debian source packages, let me list the problems that this method entails.

Only package name matching, no version matching When writing the mapping from Debian to Gentoo packages and back I discard version information. There are just too many versions that either Debian or Gentoo have and are not present in the other. So the assumption is, that Debian Sid and Gentoo have both the most recent major versions of upstream software which has roughly the same requirements in terms of build dependencies.

Gentoo packages are matched to Debian source packages In Gentoo there are only source packages and no binary packages. So I map Gentoo packages to Debian source packages. But Gentoo source packages build depend on other source packages while Debian source packages depend on binary packages. So at some point I have to translate Gentoo packages to Debian source packages and those source packages to Debian binary packages. I do this by analyzing the original binary package build dependencies of a Debian source package and then filter out those binary packages as being droppable that are built by the Debian source packages that were found to be droppable.

Not the exact same package set There is some software that is only in Gentoo and some that is only in Debian. Debian and Gentoo also split some source packages differently.

Gentoo has more direct dependencies Many build dependencies in Debian are indirectly pulled in through dependencies of direct build dependencies. In Gentoo source packages directly depend on most things they need to build successfully. This leads to the list of dependencies in Gentoo to be much larger than the list of dependencies in the corresponding Debian source package. It also means that lots of dependencies that can be dropped in Gentoo are not found to be droppable in Debian because they are not direct dependencies of that source package.

There are no implicit dependencies Gentoo will often drop dependencies that are essential or build-essential packages in Debian and are therefor implicit build dependencies that cannot be dropped.

Result Despite the many problems, the result doesnt look too wrong. I got some Debian source packages that were found to have droppable build dependencies from Thorsten Glaser and all dependencies that Gentoo found to be droppable were also dropped by him. To put everything into numbers: the current 912 nodes big SCC in Debian Sid can be reduced to 6 individual SCC with 422, 5, 5, 3, 2 and 2 nodes each. So using Gentoo cuts the size of the central component to more than half. Surely, there will be a number of dependencies that were found to be droppable in Gentoo but are actually not droppable in Debian. The point is, that it is better to have "some" data even if it contains false positives than no data at all. It is easier for a human to verify if some suggested droppable build dependencies are actually correct than going through hundreds of source packages with thousands of dependencies manually.

16 July 2012

Thorsten Glaser: Apache 2, https clients linked against GnuTLS, connection errors

I ve been debugging a weird problem at work after upgrading a complex system from lenny to wheezy, some https clients failed to connect: GNU wget and Debian s version of lynx(1) which is linked against libgnutls26 fail. NSS applications continue to work, as does cURL; wget and lynx on MirBSD (linked with OpenSSL of course) work. Even Debian s gnutls-cli tools from both gnutls26 and gnutls28 work. Huh. The error_log shows renegotiation problems, yet setting the new Apache 2 configuration option to use insecure renegotiation doesn t help either. (The option is a total #FAIL: its only other value is use secure TLSv1.x renegotiation , but I don t want/need SSL renegortiation at all, anyway.) Natureshadow told me this was a hot issue on Debianforum at the moment, yet, nobody had a clue or enough information to file a formal bugreport against (initially) apache2, as that s what changed. I tracked it down on a new VM with no configuration otherwise, and here are my findings so others don t run into it.

Tracking down the problem, this can be reduced to the following configuration (minimised, to show the problem) in /etc/apache2/sites-enabled/1one:

	<VirtualHost *:443>
		RedirectMatch permanent .
		SSLEngine on
		SSLCertificateFile /etc/ssl/W_lan_tarent_de.cer
		SSLCertificateKeyFile /etc/ssl/private/W_lan_tarent_de.key
		SSLCertificateChainFile /etc/ssl/

Do not mind the actual content, this is a very stripped-down demo on a not-actually-set-up-yet box.

Same is valid for the companion configuration file /etc/apache2/sites-enabled/2two:

	NameVirtualHost *:443
	<VirtualHost *:443>
		SSLEngine on
		# workaround for BEAST (CVE-2011-3389), short-term
		SSLCipherSuite RC4-SHA
		SSLCertificateFile /etc/ssl/W_lan_tarent_de.cer
		SSLCertificateKeyFile /etc/ssl/private/W_lan_tarent_de.key
		SSLCertificateChainFile /etc/ssl/
		SSLProtocol TLSv1

Turns out the BEAST workaround was at fault here: the differing SSLCipherSuites between the vhosts (on the same Legacy IP / TCP Port tuple, as we use Wildcard SSL Certificates) made Apache 2 want to renegotiate, so either commenting it on 2two or, better, adding it to 1one helped. Interestingly enough, the SSLProtocol directive did not matter (in my tests). So, keep SSL settings synchronised between vhosts. In fact, those were already from include files, but 2two was from the Evolvis 5 generation, whereas we added to 1one an Include of the file generated by the previous releases of EvolvisForge and had not switched those legacy vhosts to the new configuration, as everything worked on lenny. This wlog entry brought to you by the system administrators of tarent solutions GmbH and the Evolvis Project, based on FusionForge.