Search Results: "Bastian Blank"

8 April 2024

Bastian Blank: Python dataclasses for Deb822 format

Python includes some helpful support for classes that are designed to just hold some data and not much more: Data Classes. It uses plain Python type definitions to specify what each field can hold, plus some further information for every field. From that it generates some useful methods for you, like __init__ and __repr__, and more on request. But given that those type definitions are available to other code, a lot more can be done. Several separate packages exist that work on data classes; for example, you can have data validation from JSON with dacite. But Debian likes a pretty strange format usually called Deb822, which is in fact derived from the RFC 822 format of e-mail messages. Such files contain single messages in a well-known format. So I'd like to introduce some Deb822 format support for Python Data Classes. For now the code resides in the Debian Cloud tool.
Usage
Setup It uses the standard data classes support and several helper functions. You also need to enable support for postponed evaluation of annotations.
from __future__ import annotations
from dataclasses import dataclass
from dataclasses_deb822 import read_deb822, field_deb822
from typing import Optional
Class definition start Data classes are just normal classes, only with a decorator on top.
@dataclass
class Package:
Field definitions You need to specify the exact key to be used for this field.
    package: str = field_deb822('Package')
    version: str = field_deb822('Version')
    arch: str = field_deb822('Architecture')
Default values are also supported.
    multi_arch: Optional[str] = field_deb822(
        'Multi-Arch',
        default=None,
    )
Reading files
for p in read_deb822(Package, sys.stdin, ignore_unknown=True):
    print(p)
Full example
from __future__ import annotations
from dataclasses import dataclass
from debian_cloud_images.utils.dataclasses_deb822 import read_deb822, field_deb822
from typing import Optional
import sys
@dataclass
class Package:
    package: str = field_deb822('Package')
    version: str = field_deb822('Version')
    arch: str = field_deb822('Architecture')
    multi_arch: Optional[str] = field_deb822(
        'Multi-Arch',
        default=None,
    )
for p in read_deb822(Package, sys.stdin, ignore_unknown=True):
    print(p)
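For the curious: the helpers themselves are not shown here. Below is a rough sketch of how such helpers can be built on top of plain dataclasses, stashing the Deb822 key in the standard field metadata. This is only my illustration of the idea, not the actual implementation from debian_cloud_images.
import dataclasses
from typing import Iterator, TextIO, Type, TypeVar

_T = TypeVar('_T')
_MISSING = dataclasses.MISSING

def field_deb822(key: str, *, default=_MISSING):
    # Remember the Deb822 key in the field metadata so the reader can find it later.
    if default is _MISSING:
        return dataclasses.field(metadata={'deb822_key': key})
    return dataclasses.field(default=default, metadata={'deb822_key': key})

def read_deb822(cls: Type[_T], f: TextIO, ignore_unknown: bool = False) -> Iterator[_T]:
    # Map Deb822 keys back to dataclass field names via the metadata set above.
    key_to_field = {
        fld.metadata['deb822_key']: fld.name
        for fld in dataclasses.fields(cls)
        if 'deb822_key' in fld.metadata
    }
    stanza = {}
    for line in list(f) + ['\n']:        # a trailing blank line flushes the last stanza
        line = line.rstrip('\n')
        if not line:
            if stanza:
                yield cls(**stanza)
                stanza = {}
            continue
        if line[0].isspace():            # continuation lines are ignored in this sketch
            continue
        key, _, value = line.partition(':')
        name = key_to_field.get(key)
        if name is None:
            if ignore_unknown:
                continue
            raise ValueError(f'unknown field: {key}')
        stanza[name] = value.strip()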
Known limitations

3 December 2023

Ben Hutchings: FOSS activity in September 2023

12 October 2023

Reproducible Builds: Reproducible Builds in September 2023

Welcome to the September 2023 report from the Reproducible Builds project. In these reports, we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.
Andreas Herrmann gave a talk at All Systems Go 2023 titled Fast, correct, reproducible builds with Nix and Bazel. Quoting from the talk description:

You will be introduced to Google's open source build system Bazel, and will learn how it provides fast builds, how correctness and reproducibility is relevant, and how Bazel tries to ensure correctness. But, we will also see where Bazel falls short in ensuring correctness and reproducibility. You will [also] learn about the purely functional package manager Nix and how it approaches correctness and build isolation. And we will see where Bazel has an advantage over Nix when it comes to providing fast feedback during development.
Andreas also shows how you can get the best of both worlds and combine Nix and Bazel, too. A video of the talk is available.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb fixed compatibility with file(1) version 5.45 [ ] and updated some documentation [ ]. In addition, Vagrant Cascadian extended support for GNU Guix [ ][ ] and updated the version in that distribution as well. [ ].
Yet another reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st to November 2nd 2023 in Hamburg, Germany. If you haven't been before, our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you're interested in joining us this year, please make sure to read the event page, the news item, or the invitation email that Mattia Rizzolo sent out recently, all of which have more details about the event and location. We are also still looking for sponsors to support the event, so please reach out to the organising team if you are able to help. Also note that PackagingCon 2023 is taking place in Berlin just before our summit.
On the Reproducible Builds website, Greg Chabala updated the JVM-related documentation to update a link to the BUILDSPEC.md file. [ ] And Fay Stegerman fixed the builds failing because of a YAML syntax error.

Distribution work In Debian, this month: September saw F-Droid add ten new reproducible apps, and one existing app switched to reproducible builds. In addition, two reproducible apps were archived and one was disabled for a current total of 199 apps published with Reproducible Builds and using the upstream developer's signature. [ ] In addition, an extensive blog post was posted on f-droid.org titled Reproducible builds, signing keys, and binary repos.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen:
  • Disable armhf and i386 builds due to Debian bug #1052257. [ ][ ][ ][ ]
  • Run diffoscope with a lower ionice priority. [ ]
  • Log every build in a simple text file [ ] and create persistent stamp files when running diffoscope to ease debugging [ ].
  • Run schedulers one hour after dinstall again. [ ]
  • Temporarily use diffoscope from the host, and not from a schroot running the tested suite. [ ][ ]
  • Fail the diffoscope distribution test if the diffoscope version cannot be determined. [ ]
  • Fix a spelling error in the email to IRC gateway. [ ]
  • Force (and document) the reconfiguration of all jobs, due to the recent rise of zombies. [ ][ ][ ][ ]
  • Deal with a rare condition when killing processes which should not be there. [ ]
  • Install the Debian backports kernel in an attempt to address Debian bug #1052257. [ ][ ]
In addition, Mattia Rizzolo fixed a call to diffoscope --version (as suggested by Fay Stegerman on our mailing list) [ ], worked on an openQA credential issue [ ] and also made some changes to the machine-readable reproducible metadata, reproducible-tracker.json [ ]. Lastly, Roland Clobus added instructions for manual configuration of the openQA secrets [ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 October 2023

Bastian Blank: Introducing uploads to Debian by git tag

Several years ago, several people proposed a mechanism to upload packages to Debian just by doing "git tag" and "git push". Two not too long discussions on debian-devel (first and second) did not end with an agreement on how this mechanism could work.1 The central problem was the ability to properly trace uploads back to the people who authorised them. Now, several years later and after re-reading those e-mail threads, I was again stopped at the question: can we do this? It would not be just "git tag", but could we get close enough? I have some rudimentary code ready to actually do uploads from the CI. However, the list of caveats is currently pretty long. Yes, it works in principle. It is still disabled here, because in practice it does not yet work.
Problems with this setup So what are the problems? It requires the git tags to include both signed files for a successful source upload. This is solved by a new tool that could be a git sub-command. It just creates the source package, signs it and adds the signed files describing the upload (the .dsc and .changes file) to the tag to be pushed. The CI then extracts the signed files from the tag message and does its work as normal (see the sketch below). It requires a sufficiently reproducible build for source packages. Right now it is only known to work with the special 3.0 (gitarchive) source format, and even that requires the latest version of this format. No idea if it is possible to use others, like 3.0 (quilt), for this purpose. The shared GitLab runners provided by Salsa do not allow ftp access to the outside. But Debian still uses ftp to do uploads. At least if you don't want to share your ssh key, which can't be restricted to uploads only; but ssh would not work either. And as the current host for those builds, the Google Cloud Platform, does not provide connection tracking support for ftp, there is no easy way to allow that without just allowing everything. So we currently have no way to actually perform uploads from this platform.
Further work As this is code running in a CI under the control of the developer, we can easily support other workflows. Some teams do workflows that create tags after acceptance into the Debian archive. Or they don't use tags at all. With some other markers, like variables or branch names, this support can be expanded easily. Unrelated to this task, we might want to think about tying the .changes files for uploads to the target archive. As this code makes all of them readily available in the form of tag messages, replaying them into other archives might be more of a concern now.
Conclusion So to come back to the question: yes, we can. We can prepare uploads using our CI in a way that they would be accepted into the Debian archive. It just needs some more work on infrastructure.
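As a rough illustration of the extraction step described above, here is a sketch of what a CI job could do. The embedding format is made up for this example (each signed file between BEGIN/END FILE marker lines in the tag message); the real tooling is not shown here.
import re
import subprocess
import sys

def signed_files_from_tag(tag: str) -> dict:
    # "git tag -l --format=%(contents)" prints the message of an annotated tag.
    message = subprocess.run(
        ['git', 'tag', '-l', '--format=%(contents)', tag],
        check=True, capture_output=True, text=True,
    ).stdout
    # Hypothetical markers around each embedded .dsc/.changes file.
    pattern = re.compile(
        r'-----BEGIN FILE (?P<name>\S+)-----\n(?P<body>.*?)-----END FILE (?P=name)-----\n',
        re.S,
    )
    return {m['name']: m['body'] for m in pattern.finditer(message)}

if __name__ == '__main__':
    for name, body in signed_files_from_tag(sys.argv[1]).items():
        with open(name, 'w') as f:       # recreate the .dsc and .changes for the upload step
            f.write(body)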

  1. Here I have to admit that, after reading it again, I'm not really proud of my own behaviour and have to apologise.

28 December 2020

Bastian Blank: Salsa updated to GitLab 13.7

Yesterday, Debian Salsa was updated to the new GitLab 13.7 upstream release. As always, this new release comes with a bunch of new features. GitLab 13.7 GitLab 13.7 includes a bunch of new features. See the upstream release posting for a full list.

22 October 2020

Bastian Blank: Salsa updated to GitLab 13.5

Today, GitLab released version 13.5 with several new features. Salsa also got some changes applied to it. GitLab 13.5 GitLab 13.5 includes several new features. See the upstream release posting for a full list. Shared runner builds on larger instances It's been way over two years since we started to use Google Compute Engine (GCE) for Salsa. Since then, all the jobs running on the shared runners have run within an n1-standard-1 instance, providing a fresh set of one vCPU and 3.75GB of RAM for each and every build. GCE supports several newer instance types, featuring better and faster CPUs, including current AMD EPYCs. However, as it turns out, GCE does not offer any single-vCPU instances for any of those types. So jobs will use n2d-standard-2 for the time being, providing two vCPUs and 8GB of RAM. Builds run with IPv6 enabled All builds run with IPv6 enabled in the Docker environment. This means the lo network device gets the IPv6 loopback address ::1 assigned. So tests that need minimal IPv6 support can succeed. It does not, however, include any external IPv6 connectivity.
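What "minimal IPv6 support" means in practice is simply that a test can bind to or connect to ::1. A tiny sketch (not part of Salsa's setup, just an illustration):
import socket

# Bind an IPv6 TCP socket to the loopback address. With ::1 assigned to lo this
# succeeds even without any external IPv6 connectivity.
with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
    s.bind(("::1", 0))
    print("IPv6 loopback usable, bound to port", s.getsockname()[1])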

30 September 2020

Bastian Blank: Booting Debian on ThinkPad X13 AMD

Running new hardware is always fun. The problems are endless. The solutions, not so much. So I've got a brand new ThinkPad X13 AMD. It features an AMD Ryzen 5 PRO 4650U, 16GB of RAM and a 256GB NVME SSD. The internal type identifier is 20UF. It runs the latest firmware as of today with version 1.09. So far I have found two problems with it: Disable Secure Boot The system silently fails to boot a signed shim and grub from a USB thumb drive. I used one of the Debian Cloud images, which should properly work in this setup and does on my other systems. The only fix I found was to disable Secure Boot altogether. Select Linux in firmware Running Linux 5.8 with the default firmware settings produces ACPI errors on each key press.
ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GPP3], AE_NOT_FOUND (20200528/psargs-330)
ACPI Error: Aborting method \_SB.GPIO._EVT due to previous error (AE_NOT_FOUND) (20200528/psparse-529)
This can be "fixed" by setting a strategic setting inside the firmware:
Config > Power > Sleep State to Linux

16 September 2020

Bastian Blank: Salsa hosted 1e6 CI jobs

Today, Salsa hosted its 1,000,000th CI job. The prize for hitting the target goes to the Cloud team. The job itself was not that interesting, but it was successful.

12 August 2017

Bastian Blank: Network caps in cloud environments

Providing a working network is not easy. All the cloud providers seem to know how to do that most of the time. Providing enough throughput is not easy either. Here it gets interesting, as the cloud providers tackle that problem with completely different results. There are essentially three large cloud providers. The oldest and best known cloud provider is Amazon Web Services (AWS). Behind that follow Microsoft with Azure and the Google Cloud Platform (GCP). Some public instances of OpenStack exist, but they simply don't count anyway. So we remain with three, and they tackle this problem with widely different results. Now, what network throughput is necessary for real world systems anyway? An old friend gives the advice: 1Gbps per core of uncongested throughput within the complete infrastructure is the minimum. A generalization of this rule estimates around 1bps per clock cycle and core, so a 2GHz core would need 2Gbps. Do you even get a high enough network cap at your selected cloud provider to fill any of these estimates? Our first provider, AWS, publishes a nice list of network caps for some of their instance types. The common theme in this list is: for two cores (all the *.large types) you get 500Mbps, for four cores (*.xlarge) you get 750Mbps and for eight cores (*.2xlarge) you get 1000Mbps. This is way below our estimate shown above and does not even rise linearly with the number of cores. But all of this does not really matter anyway, as the performance of AWS is the worst of the three providers. Our second provider, Azure, seems to not publish any real information about network caps at all. From my own knowledge it is 50MBps (500Mbps) per core for at least the smaller instances. At least it scales linearly with instance size, but it is still way below our estimates. Our third provider, GCP, documents a simple rule for network caps: 2Gbps per core. This matches what we estimated. Now the most important question: does this estimate really work and can we actually fill it? The answer is not easy. A slightly synthetic test of an HTTP server with cached static content showed that it can easily reach 7Gbps on a 2GHz Intel Skylake core. So yes, it gives a good estimate of what network throughput is needed for real world applications. However, we could still easily fill a pipe that is larger by a factor of three.
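The rule of thumb is easy to play with yourself; a small sketch using the per-core estimate and the AWS caps quoted above:
# "1 bps per clock cycle and core": a 2 GHz core needs roughly 2 Gbps.
def estimate_gbps(cores, ghz=2.0):
    return cores * ghz

aws_caps_gbps = {2: 0.5, 4: 0.75, 8: 1.0}   # *.large, *.xlarge, *.2xlarge as quoted above
for cores, cap in aws_caps_gbps.items():
    need = estimate_gbps(cores)
    print(f"{cores} cores: estimate {need:.0f} Gbps, cap {cap} Gbps ({cap / need:.0%} of it)")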

11 June 2017

Bastian Blank: New blog featuring Pelican

For years I used a working blog setup using Zope, Plone and Quills. Quills was neglected for a long time and left me stuck on the aging Plone 4.2. There seems to be some work done in the last year, but I did not really care. Also, this whole setup was just a bit too heavy for what I actually use it for. Well, static page generators are the new shit, so there we are. So here it is, a new blog, nice and shiny, which I hope to stop neglecting. The blog is managed in a Git repository. This repository is hosted on a private GitLab instance. It uses Pelican to generate shiny pages and is run via the GitLab CI. The finished pages are served by a, currently not highly available, instance of GitLab Pages (yes, this is one of the parts they copied from GitHub first). So let's see if this setup makes me more comfortable.

9 December 2016

Guido Günther: Debian Fun in November 2016

Debian LTS November marked the nineteenth month I contributed to Debian LTS under the Freexian umbrella. I had 7 hours allocated which I used completely by: Other Debian stuff Some other Free Software activities

3 August 2016

Guido Günther: Debian Fun in July 2016

Debian LTS July marked the fifteenth month I contributed to Debian LTS under the Freexian umbrella. As usual I spent the 8 hours working on these LTS things: Other Debian stuff

14 March 2016

Lunar: Reproducible builds: week 46 in Stretch cycle

What happened in the reproducible builds effort between March 6th and March 12th:

Packages fixed The following packages have become reproducible due to changes in their build dependencies: dfc, gap-openmath, gnubik, gplanarity, iirish, iitalian, monajat, openimageio, plexus-digest, ruby-fssm, vdr-plugin-dvd, vdr-plugin-spider. The following packages became reproducible after getting fixed:
  • adduser/3.114 by Niels Thykier.
  • bsdmainutils/9.0.7 by Michael Meskes.
  • criu/2.0-1 by Salvatore Bonaccorso.
  • genometools/1.5.8+ds-2 by Sascha Steinbiss.
  • gfs2-utils/3.1.8-1 uploaded by Bastian Blank, fix by Christoph Berg.
  • gmerlin/1.2.0~dfsg+1-5 by IOhannes m zmölnig.
  • heroes/0.21-14 by Stephen Kitt.
  • kmc/2.3+dfsg-3 by Sascha Steinbiss.
  • polyml/5.6-3 by James Clarke.
  • sed/4.2.2-7.1 by Niels Thykier.
  • snpomatic/1.0-3 by Sascha Steinbiss.
  • tantan/13-4 by Sascha Steinbiss.
Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #817979 on modernizr by Sascha Steinbiss: sort list of files included in feature-detects.js.
  • #818027 on snapper by Sascha Steinbiss: always use /bin/sh as shell.

tests.reproducible-builds.org Always use all cores on armhf builders. (h01ger) Improve the look of Debian dashboard. (h01ger)

Package reviews 118 reviews have been removed, 114 added and 15 updated in the previous week. 15 FTBFS bugs have been filed by Chris Lamb. New issues: xmlto_txt_output_locale_specific.

Misc. Lunar seeks new maintainers for diffoscope, several mailing lists, and these very weekly reports.

21 March 2014

Petter Reinholdtsen: Video DVD reader library / python-dvdvideo - nice free software

Keeping your DVD collection safe from scratches and curious children's fingers while still having it available when you want to see a movie is not straightforward. My preferred method at the moment is to store a full copy of the ISO on a hard drive, and use VLC, Popcorn Hour or other useful players to view the resulting file. This way the subtitles and bonus material are still available, and using the ISO is just like inserting the original DVD record in the DVD player. Earlier I used dd for taking security copies, but it does not handle DVDs giving read errors (of which there are quite a few). I've also tried using dvdbackup and genisoimage, but these days I use the marvellous python library and program python-dvdvideo written by Bastian Blank. It is in Debian already and the binary package name is python3-dvdvideo. Instead of trying to read every block from the DVD, it parses the file structure and figures out which blocks on the DVD are actually in use, and only reads those blocks from the DVD. This works surprisingly well, and I have been able to back up almost my entire DVD collection using this method. So far, python-dvdvideo has failed on between 10 and 20 DVDs, which is a small fraction of my collection. The most common problem is DVDs using UTF-16 instead of UTF-8 characters, which according to Bastian is against the DVD specification (and seems to cause some players to fail too). A rarer problem is what seems to be inconsistent DVD structures, where the python library claims there is an overlap between objects. An equally rare problem claims some value is out of range. No idea what is going on there. I wish I knew enough about the DVD format to fix these, to ensure my movie collection will stay with me in the future. So, if you need to keep your DVDs safe, back them up using python-dvdvideo. :)

9 September 2013

Bastian Blank: Setting up Ceph the hard way

Components Ceph consists of two main daemons. One is the monitoring daemon, which monitors the health of the cluster and provides location information. The second is the storage daemon, which maintains the actual storage. Both are needed in a minimal setup.
Monitor The monitor daemons are the heart of the cluster. They maintain quorum within the cluster to check if everything can be used. They provide referrals to clients, to allow them to find the data they seek. Without a majority of monitors nothing will work within the cluster.
Storage The storage daemons maintain the actual storage. One daemon maintains one backend storage device.
Configuration The default config is understandable, but several things will just not work with it.
Monitor on localhost By default the monitor daemon will not work on localhost. There is an (undocumented) override to force it to work on localhost:
[mon.noname-admin]
 mon addr = [::1]:6789
The monitor will be renamed to mon.admin internally.
IPv6 Ceph supports IP (IPv6) or legacy-IP (IPv4), but never both. I don't really use legacy-IP any longer, so I have to configure Ceph accordingly:
[global]
  ms bind ipv6 = true
One-OSD clusters For testing purposes I wanted to create a cluster with exactly one OSD. It never got into a clean state. So I asked and found the answer in #ceph:
[global]
 osd crush chooseleaf type = 0
Disable authentication While deprecated, the following seems to work:
[global]
 auth supported = none
Complete configuration
[global]
 auth supported = none
 log file = $name.log
 run dir =  
 osd pool default size = 1
 osd crush chooseleaf type = 0
 ms bind ipv6 = true
[mon]
 mon data =  /$name
[mon.noname-admin]
 mon addr = [::1]:6789
[osd]
 osd data =  /$name
 osd journal =  /$name/journal
 osd journal size = 100
[osd.0]
 host = devel
Installation This is currently based on my updated packages. And they are still pretty unclean from my point of view.
Setup All the documentation only talks about ceph-deploy and ceph-disk. These tools are abstractions that need root to mount stuff and do all the work. Here I show how to do a minimal setup without needing root.
Keyring setup For some reason even with no authentication the monitor setup wants a keyring. So just set one up:
$ ceph-authtool --create-keyring keyring --gen-key -n mon.
$ ceph-authtool keyring --gen-key -n client.admin
Monitor setup Monitor setup by hand is easy:
$ mkdir $mon_data
$ ceph-mon -c ceph.conf --mkfs --fsid $(uuidgen) --keyring keyring
After that just start it:
$ ceph-mon -c ceph.conf
$
OSD setup First properly add the new OSD to the internal state:
$ ceph -c ceph.conf osd create
$ ceph -c ceph.conf osd crush set osd.0 1.0 root=default
Then setup the OSD itself:
$ mkdir $osd_data
$ ceph-osd -c ceph.conf -i 0 --mkfs --mkkey --keyring keyring
And start it:
$ ceph-osd -c ceph.conf -i 0
starting osd.0 at :/0 osd_data $osd_data $osd_data/journal
$
Health check The health check should return ok after some time:
$ ceph -c ceph.conf health
HEALTH_OK
$
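Because the cluster needs a moment to settle, a small wrapper that polls the same ceph health command can be handy. This is just a sketch, not part of Ceph itself:
import subprocess
import time

def wait_for_health_ok(conf="ceph.conf", timeout=300):
    # Poll "ceph health" until the cluster reports HEALTH_OK or we give up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["ceph", "-c", conf, "health"],
            capture_output=True, text=True,
        ).stdout.strip()
        print(out)
        if out.startswith("HEALTH_OK"):
            return True
        time.sleep(5)
    return False

if __name__ == "__main__":
    raise SystemExit(0 if wait_for_health_ok() else 1)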

4 March 2013

Bastian Blank: Using SECCOMP to filter sync operations

Linux has included a syscall filter for a long time. It was restricted to a pre-defined set of syscalls. More recent versions of Linux have a more generic filter: Linux can use a BPF filter to define actions for syscalls. This allows a fine-grained specification of which syscalls to act on. It also supports different outcomes assigned by the filter. This filter can be used to filter out sync operations. Debian already has a tool to do this called eatmydata. It is pretty limited, as it uses a shared library to override the library functions. It needs to be available at all times, or it will not do anything. I wrote a small tool that asks the kernel to filter out sync operations for all children. It sets a filter with all currently supported sync-like operations and makes them return success. However, it can't filter the O_SYNC flag from the open syscall, so it just makes such opens return an error. It then executes the command given on the command line. This is just a proof of concept, but let's see.
/*
 * Copyright (C) 2013 Bastian Blank <waldi@debian.org>
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice, this
 *    list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 *    this list of conditions and the following disclaimer in the documentation
 *    and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
#define _GNU_SOURCE 1
#include <errno.h>
#include <fcntl.h>
#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define filter_rule_add(action, syscall, count, ...) \
  if (seccomp_rule_add(filter, action, syscall, count, ##__VA_ARGS__)) abort();
static int filter_init(void)
{
  scmp_filter_ctx filter;
  /* Allow everything by default; the rules below only touch the sync-like calls. */
  if (!(filter = seccomp_init(SCMP_ACT_ALLOW))) abort();
  /* Have the kernel set no-new-privs when the filter is loaded. */
  if (seccomp_attr_set(filter, SCMP_FLTATR_CTL_NNP, 1)) abort();
  /* open() with O_SYNC cannot be stripped of the flag, so fail it with EINVAL. */
  filter_rule_add(SCMP_ACT_ERRNO(EINVAL), SCMP_SYS(open), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, O_SYNC, O_SYNC));
  /* The sync-like syscalls return success without doing anything. */
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(fsync), 0);
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(fdatasync), 0);
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(msync), 0);
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(sync), 0);
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(syncfs), 0);
  filter_rule_add(SCMP_ACT_ERRNO(0), SCMP_SYS(sync_file_range), 0);
  return seccomp_load(filter);
}

int main(__attribute__((unused)) int argc, char *argv[])
{
  if (argc <= 1)
  {
    fprintf(stderr, "usage: %s COMMAND [ARG]...\n", argv[0]);
    return 2;
  }
  /* Install the filter; it is inherited by all children across execve. */
  if (filter_init())
  {
    fprintf(stderr, "%s: can't initialize seccomp filter\n", argv[0]);
    return 1;
  }
  /* Replace ourselves with the requested command. */
  execvp(argv[1], &argv[1]);
  if (errno == ENOENT)
  {
    fprintf(stderr, "%s: command not found: %s\n", argv[0], argv[1]);
    return 127;
  }
  fprintf(stderr, "%s: failed to execute: %s: %s\n", argv[0], argv[1], strerror(errno));
  return 1;
}

2 March 2013

Bastian Blank: LDAP, Insignificant Space and Postfix

For some, LDAP, just like X.500, is black magic. I won't argue against that. Sometimes it really shows surprising behavior. It always makes sense, if you think about what LDAP is built for. One surprising behavior is the handling of the "Insignificant Space".
LDAP supports syntaxes and comparator methods. The syntax specifies how an entry should look. Usually this is some form of text, but numbers or things like telephone numbers are supported as well. The comparator specifies how values are compared. Most of the text comparators are defined to apply the handling of insignificant spaces. Insignificant space handling normalizes the use of spaces: first all leading and trailing spaces are removed, all internal runs of spaces are normalized to at most two spaces, and at the end all strings start and end with one space to allow proper sub-string matches. The resulting strings are used for comparisons. This behavior makes sense most of the time: if a user wants to find something in the directory, they usually don't care about spaces, but about content. But I found one occurrence where this produces some grief. Postfix has supported LDAP for some time. And let's say, it does not care about spaces in its queries. This is no problem, as e-mail addresses do not contain spaces. Or do they?
Yes, e-mail addresses can contain spaces. This is not widely known, but still allowed. Such addresses are quoted, and the command looks like RCPT TO:<" test"@example.com>. The local part is quoted and contains a space at the beginning. And this is where the problem starts. Postfix sanitizes the address and uses a simplified internal representation for all lookups. So the address becomes test@example.com with a leading space: the quotes are dropped but the space remains. This form is used in all table lookups. The LDAP table uses the internal form of the address. This address is copied verbatim into the query, which may look like this: (mail= test@example.com). It is sent to the server this way. The LDAP server applies the insignificant space modifications: the query is interpreted and the modifications specified by the comparator are applied. The query effectively gets changed to (mail=test@example.com). And this is where the fun starts. Postfix accepts undeliverable mails. Depending on the setup, such LDAP queries may be used to check for valid addresses. Because of the space handling, a sender can add spaces to the beginning of an address and it will still be considered valid. In later steps these addresses are no longer valid: addresses starting with spaces are considered invalid in some locations of Postfix. What surprised me a bit is that virtual alias handling did not map them. The unmodified addresses showed up on the LMTP backend server. That's how they showed up on my radar. I would say Postfix is wrong in this case. The LDAP server applies the spec correctly and defines spaces in e-mail addresses as insignificant; Postfix on the other side considers them significant. The easiest fix would be to not allow any spaces in the LDAP table.
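To make the normalization tangible, here is a rough sketch of the insignificant space handling as described above (my own approximation, not code from OpenLDAP or Postfix):
import re

def ldap_normalize_spaces(value: str) -> str:
    # Strip the ends, reduce internal runs of spaces to at most two, then pad
    # with one space on each side so sub-string matches line up cleanly.
    collapsed = re.sub(r"  +", "  ", value.strip())
    return f" {collapsed} "

# Both spellings normalize to the same string, which is why the query
# (mail= test@example.com) matches the same entries as (mail=test@example.com).
print(ldap_normalize_spaces(" test@example.com") == ldap_normalize_spaces("test@example.com"))  # True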

25 December 2012

Bastian Blank: New software: LMTP to UUCP gateway

I use UUCP to get my mails. It works fine but lacks support for modern SMTP features like DSN. While it may be possible to bolt support into the rmail part, both the sendmail interface used to submit mails and the Postfix pipe daemon used to extract mail are not able to do so. So I started a small project to get around this problem. This software uses LMTP to retrieve and SMTP to send all mails. LMTP (an SMTP derivative with support for all extensions) is used to inject mail via a small daemon. The mails are transported to the remote system using a format similar to batched SMTP, and then injected via SMTP into the local MTA.
Sender LMTP is used to supply mail. As an SMTP derivative, LMTP inherits support for all the available SMTP extensions. The only difference between LMTP and SMTP is the support for one result per recipient after end-of-data. This allows proper handling of mails with multiple recipients without a queue. Mails are supplied to a special LMTP server. This server may currently run from inetd or in the foreground by itself. A real daemon mode is not yet implemented. Each mail is submitted to the UUCP queue in its own format. We need to store a lot of meta-data along with the real mail. This data is stored in a custom format.
Protocol All data is transferred using a custom protocol. It is an SMTP derivative, but it is only used in uni-directional communication, so no responses exist. It uses its own Hello command and supports the SMTP commands MAIL, RCPT and DATA. This format allows exactly one mail in each file. An EOF ends the mail transaction. Also, all data must be in dot-escaped form, like in SMTP.
Hello (UHLO) A sender must start the transaction with this command. It specifies the name of the sender and all requested SMTP extensions. Syntax:
uhlo = "UHLO" SP ( Domain / address-literal ) *( SP ehlo-keyword ) CRLF
The receiver must check if all requested SMTP extensions are available.
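To make the format concrete, here is a rough sketch of what writing one such batch file could look like. This is only my illustration following the description above, not the project's actual code; the envelope command syntax is assumed to follow SMTP.
import sys
from typing import Iterable, TextIO

def write_batch(out: TextIO, hostname: str, extensions: Iterable[str],
                sender: str, recipients: Iterable[str], message: str) -> None:
    # One mail per file: a UHLO line, the envelope, then the dot-escaped message.
    out.write(f"UHLO {hostname} {' '.join(extensions)}\r\n")
    out.write(f"MAIL FROM:<{sender}>\r\n")
    for rcpt in recipients:
        out.write(f"RCPT TO:<{rcpt}>\r\n")
    out.write("DATA\r\n")
    for line in message.splitlines():
        if line.startswith("."):
            line = "." + line             # SMTP dot-escaping
        out.write(line + "\r\n")
    # No trailing dot line: as described above, EOF ends the mail transaction.

write_batch(sys.stdout, "client.example", ["8BITMIME", "DSN"],
            "alice@example.com", ["bob@example.org"],
            "Subject: hi\n\n.a line starting with a dot\nbye\n")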
Receiver Each mail is submitted by the UUCP system. It calls the supplied receiver tool called rumtp. This tool reads the protocol stream and submits the mail to a local SMTP server. There is no error handling in this tool right now. All errors will result in a mail to the local UUCP admin, sent by the UUCP system itself.
License and distribution This package is licensed GPL 3. It is for now distributed via Alioth.

24 December 2012

Bastian Blank: Relaying e-mail over unreliable connections

Introduction I still prefer to handle all my mails at home. I have backups set up on my own and everything in direct access. This network is connected to the internet via an end-user DSL connection without a static IP address and without any useful SLA. However, relaying mails over such an unreliable connection is still an unsolved problem. A lot of solutions exist for this problem. I'm currently building a new e-mail setup, so I tried to collect all the used and possible solutions for the relay problem. I will show some of them.
Don't do it The easiest solution is to just not try to collect mails at home. This solution is somewhat against the rules, but I know people who prefer it. Access to the mails is usually done with IMAP and a copy is made if needed. This is not really a solution, but it works for many people.
SMTP SMTP is the mail protocol used on the internet. It is used for all public mail exchanges. By itself it can't be used to submit mails to remote destinations. With some preparations it can be used to relay mail over unreliable connections. There are three different turn commands in SMTP that can be used to start the mail flow. Two of them are described below.
VPN or dynamic DNS with TLS, with or without extended turn (ETRN) Using SMTP itself is an easy solution. It can relay mails without much hassle, but it needs either a fixed or an authenticated connection. SMTP can use a fixed connection. This is usually provided by some sort of VPN. The VPN can be encrypted, but does not need to be. This allows the MTA to connect to the downstream server if it is available. The other solution is to authenticate the downstream server. Authentication is available via TLS and X.509 certificates. The MTA still needs some way to find the downstream server, but with dynamic DNS this is no real problem. Both variants can be combined with the extended turn command. Extended turn allows requesting a queue flush for a domain. It can be used to have mails delivered only when the downstream server is available at all. This reduces the load on the MTA.
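For illustration, requesting such a queue flush is a one-liner with Python's smtplib (just a sketch; the host and domain names are placeholders):
import smtplib

# Ask the upstream relay to flush the queue for our domain once we are reachable.
# ETRN (RFC 1985) only triggers delivery; the mail then arrives over normal SMTP.
with smtplib.SMTP("upstream-relay.example.org") as smtp:
    smtp.starttls()                       # upgrade to TLS (the server must offer STARTTLS)
    code, reply = smtp.docmd("ETRN", "example.com")
    print(code, reply.decode())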
Authenticated turn (ATRN) On-demand mail relay is a rarely supported ESMTP feature. The Authenticated turn command effectively reverses the SMTP connection and allows, after authentication, the flow of mails from the server to the client via standard SMTP. There exist some stand-alone implementations, but no widely used MTA includes one.
POP3/IMAP and fetchmail/getmail All e-mails can be delivered to mailboxes, retrieved by the end-user's system and re-injected into the delivery system. Both fetchmail and getmail are able to retrieve mail from POP3 and IMAP servers. The mails are either delivered directly via an MDA like procmail or maildrop, or they are submitted via the sendmail interface or SMTP and delivered by the MTA. Neither POP3 nor IMAP has support for meta-data like the real recipient or sender.
Mailbox per address The mails for each address are delivered to their own mailbox. This allows proper identification of the recipient address, but it still gives no real idea of the sender address. Because one mailbox per address needs to be polled, this raises the resources needed on both sides dramatically.
Multi-drop mailbox The mails for all addresses are delivered into one mailbox. The original recipient must be saved in a custom header to allow this information to be restored. Only one mailbox needs to be polled for all mail addresses.
UUCP One of the oldest transports used for mail is UUCP. It effectively copies a file to a different system and pipes it into a program. UUCP can be used to transport mails in various ways.
rmail Each mail is copied verbatim to the client. It saves the sender address in the form of a "From" pseudo-header in the mail itself and supplies the recipient on the command line. So it has access to both the sender and the recipient address.
Batched SMTP Batched SMTP transfers old-style SMTP transactions over UUCP. The MTA (Exim supports this) or a special tool (in the bsmtpd package) writes these files, and after a given time or size they are dispatched via UUCP to the remote system. The bsmtpd package was removed from Debian some years ago.
Dovecot synchronization Dovecot supports bi-directional synchronization of mailbox contents. It holds all data on both sides. The internal log is used to merge changes done on both sides, so it should not lose any data. This synchronization can be used to work with the data on both sides (via Dovecot of course) or to create backups. It needs shell access to the user owning the data on both sides.
Conclusions There is no one-size-fits-all solution for this problem. If you admin the remote system, you can implement any of these solutions; if it is managed by someone else, you need good luck. Almost all solutions do not support current SMTP features. The one I'm really missing is DSN, aka configurable delivery notices. POP3 and IMAP handle already delivered mail and have no need for this. None of the UUCP variants handle it, because they are much older anyway. Only SMTP itself supports all of its features. I still use rmail over UUCP for my mails at home and it works flawlessly. UUCP itself runs over SSH; it can compress data on the fly and authenticate using private keys.

1 July 2012

Niels Thykier: Performance bottlenecks in Lintian

Thanks to a heads up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux binaries. Bastian had already identified the issue and 2.5.9 fixed the performance regression. But in light of that, I decided to have a look at a couple of other bottlenecks. First, I added simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of a given collection. As an example, when running lintian -dd on lintian 2.5.10, you can see something like:
N: Collecting info: unpacked for source:lintian/2.5.10 ...
[...]
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)
When done on linux-image, the slowest 3 things with 2.5.10 are (in order of appearance):
[...]
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
[...]
N: Finished check: binaries (5.911s)
[...]
(The mileage (and order) will probably vary a bit.) These 3 things make up about 22 seconds of a total running time of approximately 28-30s on my machine. Now, if you are wondering how 12, 16 and 6 become 22, the answer is parallelization: strings and objdump-info are run in parallel, so only the more expensive of the two counts in practice (with multiple processing units). The version of linux-image I have been testing (3.2.20-1, amd64) has over 2800 ELF binaries (kernel modules). That makes the runtime of strings and objdump-info much more dominating than in your average package. For the fun of it I have done a small informal benchmark of various Lintian versions on the binary. I have used the command line:
# time is the bash shell built-in and not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# This was used with only versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
With older versions of Lintian (<= 2.5.3) Perl starts to emit warnings; these have been manually filtered out. I used lintian from the git repository (i.e. I didn't install the packages, but checked out the relevant git tags). I had libperlio-gzip-perl installed (affects the 2.5.10 run). Most results are only from a single run, though I ran it twice on the first version (hoping my kernel would cache the deb for the next run). The results are:
2.5.10
real  0m28.836s
user  0m36.982s
sys  0m3.280s
2.5.9
real  1m9.378s
user  0m33.702s
sys  0m11.177s
2.5.8
real  4m54.492s
user  4m0.631s
sys  0m30.466s
2.5.7 (not tested, but probably about the same as 2.5.8)
2.5.{0..6}
real  1m20s  - 1m22s
user  0m19.0s - 0m20.7s
sys  0m5.1s  - 0m5.6s
I think Bastian's complaint was warranted for 2.5.{7,8}. :) While it would have been easy to attribute the performance gain in 2.5.10 to the new parallelization improvements, that is simply not the case. These improvements only apply to running collections when checking multiple packages. On my machine, the parallelization limit for a single package is effectively determined by the dependencies between the collections. Instead the improvements come from reducing the number of system(3) (or fork+exec) calls Lintian does, mostly through using xargs more, even if it meant slightly more complex code. But also, libperlio-gzip-perl shaved off a couple of seconds on the binaries check. But as I said, linux-image is not your average package. Most of the improvements mentioned here are hardly visible on other packages. So let's have a look at some other bottlenecks. In my experience the following are the worst offenders: But enough Lintian for now. Time to fix some RC bugs!
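The fork+exec point is easy to picture with a toy example outside of Lintian (a sketch in Python rather than Lintian's Perl): one external call per file versus one call per large batch, which is essentially what xargs does.
import subprocess
from itertools import islice

def run_per_file(files):
    # One fork+exec per file: ~2800 processes for a linux-image's kernel modules.
    return [subprocess.run(["file", "-b", f], capture_output=True, text=True).stdout
            for f in files]

def run_batched(files, batch=512):
    # xargs-style batching: a handful of processes for the same work.
    out = []
    it = iter(files)
    while chunk := list(islice(it, batch)):
        out.append(subprocess.run(["file", "-b", *chunk],
                                  capture_output=True, text=True).stdout)
    return out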
