Search Results: "Martin"

1 October 2017

Iain R. Learmonth: Free Software Efforts (2017W39)

Here's my weekly report for week 39 of 2017. In this week I have travelled to Berlin and caught up on some podcasts in doing so. I've also had some trouble with the RSS feeds on my blog, but hopefully this is all fixed now. Thanks to Martin Milbret I now have a replacement for my dead workstation, an HP Z600, and there will be a blog post about this new set up to come next week. Thanks also to S lvan and a number of others that made donations towards getting me up and running again. A breakdown of the donations and expenses can be found at the end of this post.

Debian Two of my packages, measurement-kit from OONI and python-azure-devtools (used to build the Azure Python SDK, packaged as python-azure), have been accepted by ftp-master into Debian's unstable suite. I have also sponsored uploads for comptext, comptty, fllog, flnet and gnustep-make. I had previously encouraged Eric Heintzmann to become a DM and I have given him DM upload privileges for the gnustep-make package as he has shown that he cares for the GNUstep packages well. Bugs closed (fixed/wontfix): #875125[1], #875126[1], #861753, #873083

Tor Project My Tor Project contributions this week were primarily attending the Tor Metrics meeting which I have reported on in a separate blog post.

Sustainability I believe it is important to be clear not only about the work I have already completed but also about the sustainability of this work into the future. I plan to include a short report on the current sustainability of my work in each weekly report. The replacement workstation arrived on Friday and is now up and running. In total I received 308.73 in donations and spent 36.89 on video adapters and 141.94 on replacement hard drives for my NAS (which hosts my local Debian mirror and backups). For the Tor Metrics meeting in Berlin, Tor Project paid for my flights and accommodation and I paid only for ground transport and food myself. The total cost for ground transport during the trip was 45.92 (taxi to the airport, 1 Tageskarte) and the total cost for food was 23.46. The funds I currently have available for equipment, travel and other free software expenses now total 60.52. I do not believe that any hardware I rely on is facing imminent failure.

  1. Fixed by a sponsored upload, not by my changes [return]

6 September 2017

Mike Gabriel: MATE 1.18 landed in Debian testing

This is to announce that finally all MATE Desktop 1.18 components have landed in Debian testing (aka buster). Credits Again a big thanks to the packaging team (esp. Vangelis Mouhtsis and Martin Wimpress, but also to Jeremy Bicha for constant advice and Aron Xu for joining the Debian+Ubuntu MATE Packaging Team and merging all the Ubuntu zesty and artful branches back to master). Fully Available on all Debian-supported Architectures The very special thing about this MATE 1.18 release for Debian is that MATE is now available on all Debian hardware architectures. See "Buildd" column on our DDPO overview page [1]. Thanks to all the people from the Debian porters realm for providing feedback to my porting questions. References

31 August 2017

Martin-Éric Racine: Firefox slow as HEL even after boosting RAM from 4GB to 16GB

The title says it all: Firefox is the only piece of software that shows zero sign of improvement in its performance after quadrupling the amount of RAM installed on my 64-bit desktop computer. All other applications show a significant boost in performance. Yet, among all the applications I use, Firefox is the one that most needed this speed boost. To say that this is a major disappointment is an understatement. FFS, Mozilla devs!

20 August 2017

Vincent Bernat: IPv6 route lookup on Linux

TL;DR: With its implementation of IPv6 routing tables using radix trees, Linux offers subpar performance (450 ns for a full view of 40,000 routes) compared to IPv4 (50 ns for a full view of 500,000 routes) but fair memory usage (20 MiB for a full view).
In a previous article, we had a look at IPv4 route lookup on Linux. Let's see how different IPv6 is.

Lookup trie implementation Looking up a prefix in a routing table comes down to finding the most specific entry matching the requested destination. A common structure for this task is the trie, a tree structure where each node has its parent as prefix. With IPv4, Linux uses a level-compressed trie (or LPC-trie), providing good performance with low memory usage. For IPv6, Linux uses a more classic radix tree (or Patricia trie). There are three reasons for not sharing:
  • The IPv6 implementation (introduced in Linux 2.1.8, 1996) predates the IPv4 implementation based on LPC-tries (in Linux 2.6.13, commit 19baf839ff4a).
  • The feature set is different. Notably, IPv6 supports source-specific routing[1] (since Linux 2.1.120, 1998).
  • The IPv4 address space is denser than the IPv6 address space. Level-compression is therefore quite efficient with IPv4. This may not be the case with IPv6.
The trie in the illustration below encodes 6 prefixes (figure: "Radix tree"). For a more in-depth explanation of the different ways to encode a routing table into a trie, and for a better understanding of radix trees, see the explanations for IPv4. The following figure ("Memory representation of a routing table") shows the in-memory representation of the previous radix tree. Each node corresponds to a struct fib6_node. When a node has the RTN_RTINFO flag set, it embeds a pointer to a struct rt6_info containing information about the next hop. The fib6_lookup_1() function walks the radix tree in two steps:
  1. walking down the tree to locate the potential candidate, and
  2. checking the candidate and, if needed, backtracking until a match is found.
Here is a slightly simplified version without source-specific routing:
static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
                                       struct in6_addr  *addr)
{
    struct fib6_node *fn;
    __be32 dir;

    /* Step 1: locate potential candidate */
    fn = root;
    for (;;) {
        struct fib6_node *next;
        dir = addr_bit_set(addr, fn->fn_bit);
        next = dir ? fn->right : fn->left;
        if (next) {
            fn = next;
            continue;
        }
        break;
    }

    /* Step 2: check prefix and backtrack if needed */
    while (fn) {
        if (fn->fn_flags & RTN_RTINFO) {
            struct rt6key *key;
            key = &fn->leaf->rt6i_dst;
            if (ipv6_prefix_equal(&key->addr, addr, key->plen)) {
                if (fn->fn_flags & RTN_RTINFO)
                    return fn;
            }
        }

        if (fn->fn_flags & RTN_ROOT)
            break;

        fn = fn->parent;
    }

    return NULL;
}

Caching While IPv4 lost its route cache in Linux 3.6 (commit 5e9965c15ba8), IPv6 still has a caching mechanism. However, cache entries are put directly in the radix tree instead of in a distinct structure. Since Linux 2.1.30 (1997) and until Linux 4.2 (commit 45e4fd26683c), almost any successful route lookup inserted a cache entry in the radix tree. For example, a router forwarding a ping between 2001:db8:1::1 and 2001:db8:3::1 would get those two cache entries:
$ ip -6 route show cache
2001:db8:1::1 dev r2-r1  metric 0
    cache
2001:db8:3::1 via 2001:db8:2::2 dev r2-r3  metric 0
    cache
These entries are cleaned up by the ip6_dst_gc() function controlled by the following parameters:
$ sysctl -a | grep -F net.ipv6.route
net.ipv6.route.gc_elasticity = 9
net.ipv6.route.gc_interval = 30
net.ipv6.route.gc_min_interval = 0
net.ipv6.route.gc_min_interval_ms = 500
net.ipv6.route.gc_thresh = 1024
net.ipv6.route.gc_timeout = 60
net.ipv6.route.max_size = 4096
net.ipv6.route.mtu_expires = 600
The garbage collector is triggered at most every 500 ms when there are more than 1024 entries or at least every 30 seconds. The garbage collection won't run for more than 60 seconds, except if there are more than 4096 routes. When running, it will first delete entries older than 30 seconds. If the number of cache entries is still greater than 4096, it will continue to delete more recent entries (but no more recent than 512 jiffies, which is the value of gc_elasticity) after a 500 ms pause. Starting from Linux 4.2 (commit 45e4fd26683c), only a PMTU exception creates a cache entry. A router doesn't have to handle those exceptions, so only hosts would get cache entries. And they should be pretty rare. Martin KaFai Lau explains:
Out of all IPv6 RTF_CACHE routes that are created, the percentage that has a different MTU is very small. In one of our end-user facing proxy server, only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC traffic, there is no MTU exception.
Here is what a cache entry with a PMTU exception looks like:
$ ip -6 route show cache
2001:db8:1::50 via 2001:db8:1::13 dev out6  metric 0
    cache  expires 573sec mtu 1400 pref medium
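On kernels older than 4.2, where almost every lookup may create a cache entry, the cache can also be flushed by hand and the garbage collector tuned through the sysctls listed above. A minimal sketch, with purely illustrative threshold values:
# ip -6 route flush cache
# sysctl -w net.ipv6.route.gc_thresh=512
# sysctl -w net.ipv6.route.gc_timeout=30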

Performance We consider three distinct scenarios:
Excerpt of an Internet full view
In this scenario, Linux acts as an edge router attached to the default-free zone. Currently, the size of such a routing table is a little bit above 40,000 routes.
/48 prefixes spread linearly with different densities
Linux acts as a core router inside a datacenter. Each customer or rack gets one or several /48 networks, which need to be routed around. With a density of 1, /48 subnets are contiguous.
/128 prefixes spread randomly in a fixed /108 subnet
Linux acts as a leaf router for a /64 subnet with hosts getting their IP using autoconfiguration. It is assumed all hosts share the same OUI and that, therefore, the first 40 bits are fixed. In this scenario, neighbor reachability information for the /64 subnet is converted into routes by some external process and redistributed among the other routers sharing the same subnet[2].
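To get a feel for these scenarios, synthetic routes can be injected with plain iproute2. A rough sketch of something like the /48 scenario, assuming a hypothetical dummy0 interface and a link-local next hop (both placeholders), with the prefixes taken from the documentation range:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# for i in $(seq 0 4095); do ip -6 route add 2001:db8:$(printf '%x' "$i")::/48 via fe80::1 dev dummy0; done
# ip -6 route show | wc -l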

Route lookup performance With the help of a small kernel module, we can accurately benchmark[3] the ip6_route_output_flags() function and correlate the results with the radix tree size (figure: "Maximum depth and lookup time"). Getting meaningful results is challenging due to the size of the address space. None of the scenarios have a fallback route and we only measure time for successful hits[4]. For the full view scenario, only the range from 2400::/16 to 2a06::/16 is scanned (it contains more than half of the routes). For the /128 scenario, the whole /108 subnet is scanned. For the /48 scenario, the range from the first /48 to the last one is scanned. For each range, 5000 addresses are picked semi-randomly. This operation is repeated until we get 5000 hits or until 1 million tests have been executed. The relation between the maximum depth and the lookup time is incomplete, and I can't explain the difference in performance between the different densities of the /48 scenario. We can extract two important performance points:
  • With a full view, the lookup time is 450 ns. This is almost ten times the budget for forwarding at 10 Gbps, which is about 50 ns.
  • With an almost empty routing table, the lookup time is 150 ns. This is still over the time budget for forwarding at 10 Gbps.
With IPv4, the lookup time for an almost empty table was 20 ns while the lookup time for a full view (500,000 routes) was a bit above 50 ns. How can such a difference be explained? First, the maximum depth of the IPv4 LPC-trie with 500,000 routes was 6, while the maximum depth of the IPv6 radix tree for 40,000 routes is 40. Second, while both IPv4's fib_lookup() and IPv6's ip6_route_output_flags() functions have a fixed cost implied by the evaluation of routing rules, IPv4 has several optimizations when the rules are left unmodified[5]. Those optimizations are removed on the first modification. If we cancel those optimizations, the lookup time for IPv4 is impacted by about 30 ns. This still leaves a 100 ns difference with IPv6 to be explained. Let's compare how time is spent in each lookup function. Here is a CPU flamegraph for IPv4's fib_lookup() (figure: "IPv4 route lookup flamegraph"). Only 50% of the time is spent in the actual route lookup. The remaining time is spent evaluating the routing rules (about 30 ns). This ratio is dependent on the number of routes we inserted (only 1000 in this example). It should be noted that the fib_table_lookup() function is executed twice: once with the local routing table and once with the main routing table. The equivalent flamegraph for IPv6's ip6_route_output_flags() is depicted in the "IPv6 route lookup flamegraph" figure. Here is an approximate breakdown of the time spent:
  • 50% is spent in the route lookup in the main table,
  • 15% is spent in handling locking (IPv4 is using the more efficient RCU mechanism),
  • 5% is spent in the route lookup of the local table,
  • most of the remaining is spent in routing rule evaluation (about 100 ns)[6].
Why is the evaluation of routing rules less efficient with IPv6? Again, I don't have a definitive answer.
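The rules being evaluated can be listed with iproute2; on a stock system only the two default IPv6 rules are present, yet with CONFIG_IPV6_MULTIPLE_TABLES enabled their evaluation is still paid on every lookup. The output should look something like:
$ ip -6 rule show
0:      from all lookup local
32766:  from all lookup main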

History The following graph shows the performance progression of route lookups through Linux history (figure: "IPv6 route lookup performance progression"). All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels as well as current ones. The kernel configuration is the default one with the CONFIG_SMP, CONFIG_IPV6, CONFIG_IPV6_MULTIPLE_TABLES and CONFIG_IPV6_SUBTREES options enabled. Some other unrelated options are enabled to be able to boot the kernels in a virtual machine and run the benchmark. There are three notable performance changes:
  • In Linux 3.1, Eric Dumazet slightly delays the copy of route metrics to fix the undesirable sharing of route-specific metrics by all cache entries (commit 21efcfa0ff27). Each cache entry now gets its own metrics, which explains the performance hit for the non-/128 scenarios.
  • In Linux 3.9, Yoshifuji Hideaki removes the reference to the neighbor entry in struct rt6_info (commit 887c95cc1da5). This should have led to a performance increase. The small regression may be due to cache-related issues.
  • In Linux 4.2, Martin KaFai Lau prevents the creation of cache entries for most route lookups. The most noticeable performance improvement comes from commit 4b32b5ad31a6. The second improvement comes from commit 45e4fd26683c, which effectively removes the creation of cache entries, except for PMTU exceptions.

Insertion performance Another interesting performance-related metric is the insertion time. Linux is able to insert a full view in less than two seconds. For some reason, the insertion time is not linear above 50,000 routes and climbs very quickly to 60 seconds for 500,000 routes (figure: "Insertion time"). Despite its more complex insertion logic, the IPv4 subsystem is able to insert 2 million routes in less than 10 seconds.
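Insertion timings of that order can be reproduced with ip -batch, which parses many commands in a single process instead of forking ip once per route. A rough sketch, again assuming the hypothetical dummy0 interface from above; the prefixes are arbitrary:
# seq 0 39999 | awk '{ printf "route add 2400:%x::/48 via fe80::1 dev dummy0\n", $1 }' > /tmp/v6-routes
# time ip -batch /tmp/v6-routes
# ip link del dummy0   # removing the interface also removes the test routes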

Memory usage Radix tree nodes (struct fib6_node) and routing information (struct rt6_info) are allocated with the slab allocator[7]. It is therefore possible to extract the information from /proc/slabinfo when the kernel is booted with the slab_nomerge flag:
# sed -ne 2p -e '/^ip6_dst/p' -e '/^fib6_nodes/p' /proc/slabinfo | cut -f1 -d:
   name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
fib6_nodes         76101  76104     64   63    1
ip6_dst_cache      40090  40090    384   10    1
In the above example, the memory used is 76104×64 + 40090×384 bytes (about 20 MiB). The number of struct rt6_info matches the number of routes, while the number of nodes is roughly twice the number of routes (figure: "Nodes"). The memory usage is therefore quite predictable and reasonable, as even a small single-board computer can support several full views (20 MiB for each; figure: "Memory usage"). The LPC-trie used for IPv4 is more efficient: when 512 MiB of memory is needed for IPv6 to store 1 million routes, only 128 MiB are needed for IPv4. The difference is mainly due to the size of struct rt6_info (336 bytes) compared to the size of IPv4's struct fib_alias (48 bytes): IPv4 puts most information about next-hops in struct fib_info structures that are shared among many entries.
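The same arithmetic can be automated directly from /proc/slabinfo; a small sketch, still assuming the kernel was booted with slab_nomerge so that both caches appear under their own names:
# awk '$1 ~ /^(fib6_nodes|ip6_dst_cache)$/ { sum += $3 * $4 } END { printf "%.1f MiB\n", sum / (1024 * 1024) }' /proc/slabinfo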

Conclusion The takeaways from this article are:
  • upgrade to Linux 4.2 or more recent to avoid excessive caching,
  • route lookups are noticeably slower compared to IPv4 (by an order of magnitude),
  • the CONFIG_IPV6_MULTIPLE_TABLES option incurs a fixed penalty of 100 ns per lookup,
  • memory usage is fair (20 MiB for 40,000 routes).
Compared to IPv4, IPv6 in Linux doesn't attract the same level of interest, notably in terms of optimizations. Hopefully, things are changing as its adoption and use at scale are increasing.

  1. For a given destination prefix, it's possible to attach source-specific prefixes:
    ip -6 route add 2001:db8:1::/64 \
      from 2001:db8:3::/64 \
      via fe80::1 \
      dev eth0
    
    Lookup is first done on the destination address, then on the source address.
  2. This is quite different from the classic scenario where Linux acts as a gateway for a /64 subnet. In this case, the neighbor subsystem stores the reachability information for each host and the routing table only contains a single /64 prefix.
  3. The measurements are done in a virtual machine with one vCPU and no neighbors. The host is an Intel Core i5-4670K running at 3.7 GHz during the experiment (CPU governor set to performance). The benchmark is single-threaded. Many lookups are performed and the result reported is the median value. Timings of individual runs are computed from the TSC.
  4. Most of the packets in the network are expected to be routed to a destination. However, this also means the backtracking code path is not used in the /128 and /48 scenarios. Having a fallback route gives far different results and makes it difficult to ensure we explore the address space correctly.
  5. The exact same optimizations could be applied to IPv6. Nobody has done it yet.
  6. Compiling out support for multiple routing tables (the CONFIG_IPV6_MULTIPLE_TABLES option) effectively removes those last 100 ns.
  7. There are also per-CPU pointers allocated directly (4 bytes per entry per CPU on a 64-bit architecture). We ignore this detail.

18 August 2017

Mike Gabriel: @DebConf17: Ad-hoc BoF: Bits from the Debian+Ubuntu MATE Packaging Team

On Tuesday, late afternoon, at DebConf17, I offered an ad-hoc BoF about the current status of the MATE Desktop packaging efforts in Debian and Ubuntu. I need to get this written down before DebConf17 feels too far away... Unfortunately, I scheduled that BoF at the same time as Joey Hess's talk about his post-Debian life, which attracted many people. So, only a small group of people came together to share and discuss the current status of MATE in Debian and Ubuntu. Ongoing efforts around MATE in Debian and Ubuntu A quick summary of ongoing efforts was provided, and a collection of URLs for reporting bugs, looking up packaging status, etc. was listed. Cross-Distro Packaging Workflow The workflow of Debian and Ubuntu packaging in the MATE Packaging Team was described in detail (basically, all packages go through Debian, the only exception being freeze states of this or that distro) and the benefit of the close cooperation between the two projects was underlined. We reduce the packaging effort tremendously by working very closely together. In the past, we (Martin Wimpress and myself) also invited the people from Linux Mint to join our cross-distro packaging efforts, but so far to no avail. The packaging for the Parrot Security OS has also recently been integrated into our packaging Git repository. Requested / Upcoming Improvements From the people attending the BoF, I got several main inputs, which I hope to accomplish with the help of all for Debian buster. Another project that I will also work on for Debian buster is proper Ayatana Indicator support for Debian's MATE Desktop. Credits Thanks to all the people attending the BoF for your input and sharing. Feel invited to contribute to our team's effort in improving the user experience of the MATE Desktop Environment in Debian (and Ubuntu, and ...). Also a big thanks to the people already on the team for bringing MATE in Debian to where it is now. Good work, folks!

2 August 2017

Markus Koschany: My Free Software Activities in July 2017

Welcome to gambaru.de. Here is my monthly report that covers what I have been doing for Debian. If you're interested in Java, Games and LTS topics, this might be interesting for you. Debian Games Debian Java Debian LTS This was my seventeenth month as a paid contributor and I have been paid to work 23.5 hours on Debian LTS, a project started by Raphaël Hertzog. In that time I did the following: Non-maintainer upload Thanks for reading and see you next time.

31 July 2017

Norbert Preining: Calibre in Debian

Some news about Calibre in Debian: I have been added to the list of maintainers, thanks Martin, and the recent release of Calibre 3.4 into Debian/unstable brought some fixes concerning the desktop integration. Now I am working on Calibre 3.5. Calibre 3.5 separates out one module, html5-parser, into a separate package which needs to be included into Debian first. I have prepared and uploaded a version, but NEW processing will keep this package from entering Debian for a while. Another thing I am currently doing is going over the list of bugs and trying to fix or close as many as possible. Finally, the endless rar support story still continues. I still don't have any response from the maintainer of unrar-nonfree in Debian, so I am contemplating packaging my own version of libunrar. As I wrote in the previous post, Calibre now checks whether the Python module unrardll is available and, if so, uses it to decode rar-packed ebooks. I have a package for unrardll ready to be uploaded, but it needs the shared library version, and thus I am stuck waiting for unrar-nonfree. Anyway, to help all those wanting to play with the latest Calibre, my archive nowadays contains the necessary packages. Together these packages provide the newest Calibre with rar support.
deb http://www.preining.info/debian/ calibre main
deb-src http://www.preining.info/debian/ calibre main
The releases are signed with my Debian key 0x6CACA448860CDC13. Enjoy!

18 July 2017

Norbert Preining: Calibre and rar support

Thanks to the cooperation with the upstream authors and the maintainer Martin Pitt, the Calibre package in Debian is now up-to-date at version 3.4.0, and has adopted more standard packaging following upstream. In particular, all the desktop files and man pages have been replaced by what is shipped by Calibre. What remains to be done is work on RAR support.
Rar support is necessary when an eBook uses rar compression, which happens quite often with comic books (cbr extension). Calibre 3 has split out rar support into a dynamically loaded module, so what needs to be done is packaging it. I have prepared a package for the Python library unrardll which allows Calibre to read rar-compressed ebooks, but it depends on the unrar shared library, which unfortunately is not built in Debian. I have sent a patch to fix this to the maintainer (see bug 720051), but there has been no reaction from the maintainer. Thus, I am publishing updated packages for unrar (also shipping libunrar5) and the unrardll Python package in my calibre repository. After installing python-unrardll, Calibre will happily import meta-data from rar-compressed eBooks, as well as display them.
deb http://www.preining.info/debian/ calibre main
deb-src http://www.preining.info/debian/ calibre main
The releases are signed with my Debian key 0x6CACA448860CDC13. Enjoy!

2 July 2017

Bits from Debian: New Debian Developers and Maintainers (May and June 2017)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

21 June 2017

Reproducible builds folks: Reproducible Builds: week 112 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday June 11 and Saturday June 17 2017: Upcoming events Upstream patches and bugs filed Reviews of unreproducible packages 1 package review has been added, 19 have been updated and 2 have been removed in this week, adding to our knowledge about identified issues. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development tests.reproducible-builds.org As you might have noticed, Debian stretch was released last week. Since then, Mattia and Holger renamed our testing suite to stretch and added a buster suite so that we keep our historic results for stretch visible and can continue our development work as usual. In this sense, happy hacking on buster; may it become the best Debian release ever and hopefully the first reproducible one! Axel Beckert is currently in the process of setting up eight LeMaker HiKey960 boards. These boards were sponsored by Hewlett Packard Enterprise and will be hosted by the SOSETH students association at ETH Zurich. Thanks to everyone involved here and also thanks to Martin Michlmayr and Steve Geary who initiated getting these boards to us. Misc. This week's edition was written by Chris Lamb, Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

9 May 2017

Martin Pitt: Cockpit is now just an apt install away

Cockpit has now been in Debian unstable and Ubuntu 17.04 and devel, which means it's now a simple
$ sudo apt install cockpit
away for you to try and use. This metapackage pulls in the most common plugins, which are currently NetworkManager and udisks/storaged. If you want/need, you can also install cockpit-docker (if you grab docker.io from jessie-backports or use Ubuntu) or cockpit-machines to administer VMs through libvirt. Cockpit upstream also has a rather comprehensive Kubernetes/Openstack plugin, but this isn't currently packaged for Debian/Ubuntu as kubernetes itself is not yet in Debian testing or Ubuntu. After that, point your browser to https://localhost:9090 (or the host name/IP where you installed it) and off you go.
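For example, a host that should also manage containers and libvirt VMs could pull in those optional pieces in one go; a sketch, assuming the packages mentioned above are available for your release:
$ sudo apt install cockpit cockpit-docker cockpit-machines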

What is Cockpit? Think of it as an equivalent of a desktop (like GNOME or KDE) for configuring, maintaining, and interacting with servers. It is a web service that lets you log into your local or a remote (through ssh) machine using normal credentials (PAM user/password or SSH keys) and then starts a normal login session just as gdm, ssh, or the classic VT logins would (figures: "Login screen", "System page"). The left side bar is the equivalent of a "task switcher", and the applications (i. e. modules for administering various aspects of your server) are run in parallel. The main idea of Cockpit is that it should not behave specially in any way - it does not have any specific configuration files or state keeping and uses the same Operating System APIs and privileges as you would use on the command line (such as lvmconfig, the org.freedesktop.UDisks2 D-Bus interface, reading/writing the native config files, and using sudo when necessary). You can simultaneously change stuff in Cockpit and in a shell, and Cockpit will instantly react to changes in the OS, e. g. if you create a new LVM PV or a network device gets added. This makes it fundamentally different from projects like webmin or ebox, which basically own your computer once you use them the first time. It is an interface for your operating system, which is even reflected in the branding: as you see above, this is Debian (or Ubuntu, or Fedora, or wherever you run it on), not "Cockpit".

Remote machines In your home or small office you often have more than one machine to maintain. You can install cockpit-bridge and cockpit-system on those for the most basic functionality, configure SSH on them, and then add them on the Dashboard (I add a Fedora 26 machine here); from then on you can switch between them at the top left, and everything works and feels exactly the same, including the terminal widget (figures: "Add remote", "Remote terminal"). The Fedora 26 machine has some more Cockpit modules installed, including a lot of playground ones, thus you see a lot more menu entries there.
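Preparing such a remote machine boils down to installing the two packages mentioned above and making sure SSH works; a rough sketch for a Debian-based remote, where remote-host is a placeholder:
$ ssh -t remote-host sudo apt install cockpit-bridge cockpit-system
After that, the machine can be added from the Dashboard using the same SSH credentials.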

Under the hood Beneath the fancy Patternfly/React/JavaScript user interface is the Cockpit API and protocol, which particularly fascinates me as a developer as that is what makes Cockpit so generic, reactive, and extensible. This API connects the worlds of the web, which speaks IPs and host names, ports, and JSON, to the "local host only" world of operating systems which speak D-Bus, command line programs, configuration files, and even use fancy techniques like passing file descriptors through Unix sockets. In an ideal world, all Operating System APIs would be remotable by themselves, but they aren't. This is where the cockpit bridge comes into play. It is a JSON (i. e. ASCII text) stream protocol that can control arbitrarily many channels to the target machine for reading, writing, and getting notifications. There are channel types for running programs, making D-Bus calls, reading/writing files, getting notified about file changes, and so on. Of course every channel can also act on a remote machine. One can play with this protocol directly. E. g. this opens a (local) D-Bus channel named d1 and gets a property from systemd's hostnamed:
$ cockpit-bridge --interact=---
{ "command": "open", "channel": "d1", "payload": "dbus-json3", "name": "org.freedesktop.hostname1" }
---
d1
{ "call": [ "/org/freedesktop/hostname1", "org.freedesktop.DBus.Properties", "Get",
          [ "org.freedesktop.hostname1", "StaticHostname" ] ],
  "id": "hostname-prop" }
---
and it will reply with something like
d1
{ "reply": [ [ { "t": "s", "v": "donald" } ] ], "id": "hostname-prop" }
---
("donald" is my laptop's name). By adding additional parameters like host and passing credentials, these can also be run remotely, by logging in via ssh and running cockpit-bridge on the remote host. Stef Walter explains this in detail in a blog post about Web Access to System APIs. Of course Cockpit plugins (both internal and third-party) don't directly speak this, but use a nice JavaScript API. As a simple example of how to create your own Cockpit plugin that uses this API you can look at my schroot plugin proof of concept which I hacked together at DevConf.cz in about an hour during the Cockpit workshop. Note that I never before wrote any JavaScript and I didn't put any effort into design whatsoever, but it does work.

Next steps Cockpit aims at servers and getting third-party plugins for talking to your favourite part of the system, which means we really want it to be available in Debian testing and stable, and Ubuntu LTS. Our CI runs integration tests on all of these, so each and every change that goes in is certified to work on Debian 8 (jessie) and Ubuntu 16.04 LTS, for example. But I'd like to replace the external PPA/repository in the install instructions with just "it's readily available in -backports"! Unfortunately there are some procedural blockers there: the Ubuntu backport request suffers from understaffing, and the Debian stable backport is blocked on getting it into testing first, which in turn is blocked by the freeze. I will soon ask for a freeze exception into testing; after all, it's just about zero risk: it's a new leaf package in testing. Have fun playing around with it, and please report bugs! Feel free to discuss and ask questions on the Google+ post.

30 April 2017

Chris Lamb: Free software activities in April 2017

Here is my monthly update covering what I have been doing in the free software world (previous month):
Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to permit verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area. This month I:
I also made the following changes to diffoscope, our recursive and content-aware diff utility used to locate and diagnose reproducibility issues:
  • New features:
    • Add support for comparing Ogg Vorbis files. (0436f9b)
  • Bug fixes:
    • Prevent a traceback when using --new-file with containers. (#861286)
    • Don't crash on invalid archives; print a useful error instead. (#833697).
    • Don't print error output from bzip2 call. (21180c4)
  • Cleanups:
    • Prevent abstraction-level violations by defining visual diff support on Presenter classes. (7b68309)
    • Show Debian packages installed in test output. (c86a9e1)


Debian
Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Issued DLA 882-1 for the tryton-server general application platform to fix a path suffix injection attack.
  • Issued DLA 883-1 for curl preventing a buffer read overrun vulnerability.
  • Issued DLA 884-1 for collectd (a statistics collection daemon) to close a potential infinite loop vulnerability.
  • Issued DLA 885-1 for the python-django web development framework patching two open redirect & XSS attack issues.
  • Issued DLA 890-1 for ming, a library to create Flash files, closing multiple heap-based buffer overflows.
  • Issued DLA 892-1 and DLA 891-1 for the libnl3/libnl Netlink protocol libraries, fixing integer overflow issues which could have allowed arbitrary code execution.

Uploads
  • redis (4:4.0-rc3-1) New upstream RC release.
  • adminer:
    • 4.3.0-2 Fix debian/watch file.
    • 4.3.1-1 New upstream release.
  • bfs:
    • 1.0-1 Initial release.
    • 1.0-2 Drop fstype tests as they rely on /etc/mtab being available. (#861471)
  • python-django:
    • 1:1.10.7-1 New upstream security release.
    • 1:1.11-1 New upstream stable release to experimental.

I sponsored the following uploads: I also performed the following QA uploads:
  • gtkglext (1.2.0-7) Correct installation location of gdkglext-config.h after "Multi-Archification" in 1.2.0-5. (#860007)
Finally, I made the following non-maintainer uploads (NMUs):
  • python-formencode (1.3.0-2) Don't ship files in /usr/lib/python{2.7,3}/dist-packages/docs. (#860146)
  • django-assets (0.12-2) Patch pytest plugin to check whether we are running in a Django context, otherwise we can break unrelated testsuites. (#859916)


FTP Team

As a Debian FTP assistant I ACCEPTed 155 packages: aiohttp-cors, bear, colorize, erlang-p1-xmpp, fenrir, firejail, fizmo-console, flask-ldapconn, flask-socketio, fontmanager.app, fonts-blankenburg, fortune-zh, fw4spl, fzy, gajim-antispam, gdal, getdns, gfal2, gmime, golang-github-go-macaron-captcha, golang-github-go-macaron-i18n, golang-github-gogits-chardet, golang-github-gopherjs-gopherjs, golang-github-jroimartin-gocui, golang-github-lunny-nodb, golang-github-markbates-goth, golang-github-neowaylabs-wabbit, golang-github-pkg-xattr, golang-github-siddontang-goredis, golang-github-unknwon-cae, golang-github-unknwon-i18n, golang-github-unknwon-paginater, grpc, grr-client-templates, gst-omx, hddemux, highwayhash, icedove, indexed-gzip, jawn, khal, kytos-utils, libbloom, libdrilbo, libhtml-gumbo-perl, libmonospaceif, libpsortb, libundead, llvm-toolchain-4.0, minetest-mod-homedecor, mini-buildd, mrboom, mumps, nnn, node-anymatch, node-asn1.js, node-assert-plus, node-binary-extensions, node-bn.js, node-boom, node-brfs, node-browser-resolve, node-browserify-des, node-browserify-zlib, node-cipher-base, node-console-browserify, node-constants-browserify, node-delegates, node-diffie-hellman, node-errno, node-falafel, node-hash-base, node-hash-test-vectors, node-hash.js, node-hmac-drbg, node-https-browserify, node-jsbn, node-json-loader, node-json-schema, node-loader-runner, node-miller-rabin, node-minimalistic-crypto-utils, node-p-limit, node-prr, node-sha.js, node-sntp, node-static-module, node-tapable, node-tough-cookie, node-tunein, node-umd, open-infrastructure-storage-tools, opensvc, openvas, pgaudit, php-cassandra, protracker, pygame, pypng, python-ase, python-bip32utils, python-ltfatpy, python-pyqrcode, python-rpaths, python-statistics, python-xarray, qtcharts-opensource-src, r-cran-cellranger, r-cran-lexrankr, r-cran-pwt9, r-cran-rematch, r-cran-shinyjs, r-cran-snowballc, ruby-ddplugin, ruby-google-protobuf, ruby-rack-proxy, ruby-rails-assets-underscore, rustc, sbt, sbt-launcher-interface, sbt-serialization, sbt-template-resolver, scopt, seqsero, shim-signed, sniproxy, sortedcollections, starjava-array, starjava-connect, starjava-datanode, starjava-fits, starjava-registry, starjava-table, starjava-task, starjava-topcat, starjava-ttools, starjava-util, starjava-vo, starjava-votable, switcheroo-control, systemd, tilix, tslib, tt-rss-notifier-chrome, u-boot, unittest++, vc, vim-ledger, vis, wesnoth-1.13, wolfssl, wuzz, xandikos, xtensor-python & xwallpaper. I additionally filed 14 RC bugs against packages that had incomplete debian/copyright files against getdns, gfal2, grpc, mrboom, mumps, opensvc, python-ase, sniproxy, starjava-topcat, starjava-ttools, unittest++, wolfssl, xandikos & xtensor-python.

13 April 2017

Antoine Beaupré: New approaches to network fast paths

With the speed of network hardware now reaching 100 Gbps and distributed denial-of-service (DDoS) attacks going in the Tbps range, Linux kernel developers are scrambling to optimize key network paths in the kernel to keep up. Many efforts are actually geared toward getting traffic out of the costly Linux TCP stack. We have already covered the XDP (eXpress Data Path) patch set, but two new ideas surfaced during the Netconf and Netdev conferences held in Toronto and Montreal in early April 2017. One is a patch set called af_packet, which aims at extracting raw packets from the kernel as fast as possible; the other is the idea of implementing in-kernel layer-7 proxying. There are also user-space network stacks like Netmap, DPDK, or Snabb (which we previously covered). This article aims at clarifying what all those components do and at providing a short status update for the tools we have already covered. We will focus on in-kernel solutions for now. Indeed, user-space tools have a fundamental limitation: if they need to re-inject packets onto the network, they must again pay the expensive cost of crossing the kernel barrier. User-space performance is effectively bounded by that fundamental design. So we'll focus on kernel solutions here. We will start from the lowest part of the stack, the af_packet patch set, and work our way up the stack all the way up to layer-7 and in-kernel proxying.

af_packet v4 John Fastabend presented a new version of a patch set that was first published in January regarding the af_packet protocol family, which is currently used by tcpdump to extract packets from network interfaces. The goal of this change is to allow zero-copy transfers between user-space applications and the NIC (network interface card) transmit and receive ring buffers. Such optimizations are useful for telecommunications companies, which may use it for deep packet inspection or running exotic protocols in user space. Another use case is running a high-performance intrusion detection system that needs to watch large traffic streams in realtime to catch certain types of attacks. Fastabend presented his work during the Netdev network-performance workshop, but also brought the patch set up for discussion during Netconf. There, he said he could achieve line-rate extraction (and injection) of packets, with packet rates as high as 30 Mpps. This performance gain is possible because user-space pages are directly DMA-mapped to the NIC, which is also a security concern. The other downside of this approach is that a complete pair of ring buffers needs to be dedicated for this purpose; whereas before packets were copied to user space, now they are memory-mapped, so the user-space side needs to process those packets quickly, otherwise they are simply dropped. Furthermore, it's an "all or nothing" approach; while NIC-level classifiers could be used to steer part of the traffic to a specific queue, once traffic hits that queue, it is only accessible through the af_packet interface and not the rest of the regular stack. If done correctly, however, this could actually improve the way user-space stacks access those packets, providing projects like DPDK a safer way to share pages with the NIC, because it is well defined and kernel-controlled. According to Jesper Dangaard Brouer (during review of this article):
This proposal will be a safer way to share raw packet data between user space and kernel space than what DPDK is doing, [by providing] a cleaner separation as we keep driver code in the kernel where it belongs.
During the Netdev network-performance workshop, Fastabend asked if there was a better data structure to use for such a purpose. The goal here is to provide a consistent interface to user space regardless of the driver or hardware used to extract packets from the wire. af_packet currently defines its own packet format that abstracts away the NIC-specific details, but there are other possible formats. For example, someone in the audience proposed the virtio packet format. Alexei Starovoitov rejected this idea because af_packet is a kernel-specific facility while virtio has its own separate specification with its own requirements. The next step for af_packet is the posting of the new "v4" patch set, although Miller warned that this wouldn't get merged until proper XDP support lands in the Intel drivers. The concern, of course, is that the kernel would have multiple incomplete bypass solutions available at once. Hopefully, Fastabend will present the (by then) merged patch set at the next Netdev conference in November.

XDP updates Higher up in the networking stack sits XDP. The af_packet feature differs from XDP in that it does not perform any sort of analysis or mangling of packets; its objective is purely to get the data into and out of the kernel as fast as possible, completely bypassing the regular kernel networking stack. XDP also sits before the networking stack except that, according to Brouer, it is "focused on cooperating with the existing network stack infrastructure, and on use-cases where the packet doesn't necessarily need to leave kernel space (like routing and bridging, or skipping complex code-paths)." XDP has evolved quite a bit since we last covered it in LWN. It seems that most of the controversy surrounding the introduction of XDP in the Linux kernel has died down in public discussions, under the leadership of David Miller, who heralded XDP as the right solution for a long-term architecture in the kernel. He presented XDP as a fast, flexible, and safe solution. Indeed, one of the controversies surrounding XDP was the question of the inherent security challenges with introducing user-provided programs directly into the Linux kernel to mangle packets at such a low level. Miller argued that whatever protections are expected for user-space programs also apply to XDP programs, comparing the virtual memory protections to the eBPF (extended BPF) verifier applied to XDP programs. Those programs are actually eBPF programs that have an interesting set of restrictions:
  • they have a limited size
  • they cannot jump backward (and thus cannot loop), so they execute in predictable time
  • they do only static allocation, so they are also limited in memory
XDP is not a one-size-fits-all solution: netfilter, the TC traffic shaper, and other normal Linux utilities still have their place. There is, however, a clear use case for a solution like XDP in the kernel. For example, Facebook and Cloudflare have both started testing XDP and, in Facebook's case, deploying XDP in production. Martin Kafai Lau, from Facebook, presented the tool set the company is using to construct a DDoS-resilience solution and a level-4 load balancer (L4LB), which got a ten-times performance improvement over the previous IPVS-based solution. Facebook rolled out its own user-space solution called "Droplet" to detect hostile traffic and deploy blocking rules in the form of eBPF programs loaded in XDP. Lau demonstrated the way Facebook deploys a three-part chained eBPF program: the first part allows debugging and dumping of packets, the second is Droplet itself, which drops undesirable traffic, and the last segment is the load balancer, which mangles the packets to tweak their destination according to internal rules. Droplet can drop DDoS attacks at line rate while keeping the architecture flexible, which were two key design requirements. Gilberto Bertin, from Cloudflare, presented a similar approach: Cloudflare has a tool that processes sFlow data generated from iptables in order to generate cBPF (classic BPF) mitigation rules that are then deployed on edge routers. Those rules are created with a tool called bpfgen, part of Cloudflare's BSD-licensed bpftools suite. For example, it could create a cBPF bytecode blob that would match DNS queries to any example.com domain with something like:
    bpfgen dns *.example.com
Originally, Cloudflare would deploy those rules to plain iptables firewalls with the xt_bpf module, but this led to performance issues. It then deployed a proprietary user-space solution based on Solarflare hardware, but this has the performance limitations of user-space applications: getting packets back onto the wire involves the cost of re-injecting packets back into the kernel. This is why Cloudflare is experimenting with XDP, which was partly developed in response to the company's problems, to deploy those BPF programs. A concern that Bertin identified was the lack of visibility into dropped packets. Cloudflare currently samples some of the dropped traffic to analyze attacks; this is not currently possible with XDP unless you pass the packets down the stack, which is expensive. Miller agreed that the lack of monitoring for XDP programs is a large issue that needs to be resolved, and suggested creating a way to mark packets for extraction to allow analysis. Cloudflare is currently in a testing phase with XDP and it is unclear if its whole XDP tool chain will be publicly available. While those two companies are starting to use XDP as-is, there is more work needed to complete the XDP project. As mentioned above and in our previous coverage, massive statistics extraction is still limited in the Linux kernel and introspection is difficult. Furthermore, while the existing actions (XDP_DROP and XDP_TX, see the documentation for more information) are well implemented and used, another action may be introduced, called XDP_REDIRECT, which would allow redirecting packets to different network interfaces. Such an action could also be used to accelerate bridges as packets could be "switched" based on the MAC address table. XDP also requires network driver support, which is currently limited. For example, the Intel drivers still do not support XDP, although that should come pretty soon. Miller, in his Netdev keynote, focused on XDP and presented it as the standard solution that is safe, fast, and usable. He identified the next steps of XDP development to be the addition of debugging mechanisms, better sampling tools for statistics and analysis, and user-space consistency. Miller foresees a future for XDP similar to the popularization of the Arduino chips: a simple set of tools that anyone, not just developers, can use. He gave the example of an Arduino tutorial that he followed where he could just look up a part number and get easy-to-use instructions on how to program it. Similar components should be available for XDP. For this purpose, the conference saw the creation of a new mailing list called xdp-newbies where people can learn how to create XDP build environments and how to write XDP programs.
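For readers who want to try XDP on a driver that already supports it, recent iproute2 releases can attach and detach a pre-compiled eBPF object directly; a rough sketch, where xdp_drop.o is a hypothetical object file built with clang and eth0 is a placeholder interface:
# ip link set dev eth0 xdp obj xdp_drop.o
# ip link show dev eth0      # the attached program shows up as an xdp attribute
# ip link set dev eth0 xdp off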

In-kernel layer-7 proxying The third approach that struck me as innovative is the idea of doing layer-7 (application) proxying directly in the kernel. This comes from the idea that, traditionally, we build firewalls to segregate traffic and apply controls, but as most services move to HTTP, those policies become ineffective. Thomas Graf presented this idea during Netconf using a Star Wars allegory: what if the Death Star were a server with an API? You would have endpoints like /dock or /comms that would allow you to dock a ship or communicate with the Death Star. Those API endpoints should obviously be public, but then there is this /exhaust-port endpoint that should never be publicly available. In order for a firewall to protect such a system, it must be able to inspect traffic at a higher level than the traditional address-port pairs. Graf presented a design where the kernel would create an in-kernel socket that would negotiate TCP connections on behalf of user space and then be able to apply arbitrary eBPF rules in the kernel (figure: "Graf's design of in-kernel proxying"). In this scenario, instead of doing the traditional transfer from Netfilter's TPROXY to user space, the kernel directly decapsulates the HTTP traffic and passes it to BPF rules that can make decisions without doing expensive context switches or memory copies in the case of simply wanting to refuse traffic (e.g. issue an HTTP 403 error). This, of course, requires the inclusion of kTLS to process HTTPS connections. HTTP2 support may also prove problematic, as it multiplexes connections and is harder to decapsulate. This design was described as a "pure pre-accept() hook". Starovoitov also compared the design to the kernel connection multiplexer (KCM). Tom Herbert, KCM's author, agreed that it could be extended to support this, but would require some extensions in user space to provide an interface between regular socket-based applications and the KCM layer. In any case, if the application does TLS (and lots of them do), kTLS gets tricky because it breaks the end-to-end nature of TLS, in effect becoming a man in the middle between the client and the application. Eric Dumazet argued that HA-Proxy already does things like this: it uses splice() to avoid copying too much data around, but it still does a context switch to hand over processing to user space, something that could be fixed in the general case. Another similar project that was presented at Netdev is the Tempesta firewall and reverse-proxy. The speaker, Alex Krizhanovsky, explained that the Tempesta developers have taken one person-month to port the mbed TLS stack to the Linux kernel to allow an in-kernel TLS handshake. Tempesta also implements rate limiting, cookies, and JavaScript challenges to mitigate DDoS attacks. The argument behind the project is that "it's easier to move TLS to the kernel than it is to move the TCP/IP stack to user space". Graf explained that he is familiar with Krizhanovsky's work and he is hoping to collaborate. In effect, the design Graf is working on would serve as a foundation for Krizhanovsky's in-kernel HTTP server (kHTTP). In a private email, Graf explained that:
The main differences in the implementation are currently that we foresee to use BPF for protocol parsing to avoid having to implement every single application protocol natively in the kernel. Tempesta likely sees this less of an issue as they are probably only targeting HTTP/1.1 and HTTP/2 and to some [extent] JavaScript.
Neither project is really ready for production yet. There didn't seem to be any significant pushback from key network developers against the idea, which surprised some people, so it is likely we will see more and more layer-7 intelligence move into the kernel sooner rather than later.

Conclusion All of this work aims at replacing a rag-tag bunch of proprietary solutions that recently came up to bypass the Linux kernel TCP/IP stack and improve performance for firewalls, proxies, and other key edge network elements. The idea is that, unless the kernel improves its performance, or at least provides a way to bypass its more complex code paths, people will work around it. With this set of solutions in place, engineers will now be able to use standard APIs to hook high-performance systems into the Linux kernel.
The author would like to thank the Netdev and Netconf organizers for travel assistance, Thomas Graf for a review of the in-kernel proxying section of this article, and Jesper Dangaard Brouer for review of the af_packet and XDP sections. Note: this article first appeared in the Linux Weekly News.

11 April 2017

Antoine Beaupré: A report from Netconf: Day 2

This article covers the second day of the informal Netconf discussions, held on April 4, 2017. Topics discussed this day included the binding of sockets in VRF, identification of eBPF programs, inconsistencies between IPv4 and IPv6, changes to data-center hardware, and more. (See this article for coverage from the first day of discussions).

How to bind to specific sockets in VRF One of the first presentations was from David Ahern of Cumulus, who presented a few interesting questions for the audience. His first was the problem of binding sockets to a given interface. Right now, there are four different ways this can be done:
  • the old SO_BINDTODEVICE generic socket option (see socket(7))
  • the IP_PKTINFO, IP-specific socket option (see ip(7)), introduced in Linux 2.2
  • the IP_UNICAST_IF flag, introduced in Linux 3.3 for WINE
  • the IPv6 scope ID suffix, part of the IPv6 addressing standard
So there's a problem of having too many ways of doing the same thing, something that cannot really be fixed without breaking ABI compatibility. But even worse, conflicts between those options are not reported by the kernel so it's possible for a user to set up socket flags in a way that certain flags override others and there are no checks made or errors reported. It was agreed that the user should get some notification of conflicting changes here, at least. Furthermore, binding sockets to a specific VRF (Virtual Routing and Forwarding) device is not currently possible, so Ahern asked what the best way to do this would be, considering the many options available. A use case example is a UDP multicast socket that could be bound to a specific interface within a VRF. This is an old problem: Tom Herbert explained that there were previous discussions about making the bind() system call more programmable so that, for example, you could bind() a UDP socket to a discrete list of IP addresses or a subnet. So he identified this issue as a broader problem that should be addressed by making the interfaces more generic. Ahern explained that it is currently possible to bind sockets to the slave device of a VRF even though that should not be allowed. He also raised the question of how the kernel should tell which socket should be selected for incoming packets. Right now, there is a scoring mechanism for UDP sockets, but that cannot be used directly in this more general case. David Miller said that there are already different ways of specifying scope: there is the VRF layer and the namespace ("netns") layer. A long time ago, Miller reluctantly accepted the addition of netns keys everywhere, swallowing the performance cost to gain flexibility. He argued that a new key should not be added and instead existing infrastructure should be reused. Herbert argued this was exactly the reason why this should be simplified: "if we don't answer the question, people will keep on trying this". For example, one can use a VRF to limit listening addresses, but it gets complicated if we need a device for every address. It seems the consensus evolved towards using IP_UNICAST_IF, added back in 2012, which is accessible for non-root users. It is currently limited to UDP and RAW sockets, but it could be extended for TCP.

XDP and eBPF program identification Ahern then turned to the problem of extracting BPF programs from the kernel. He gave the example of a simple cBPF (classic BPF) filter that checks for ARP packets. If the filter is read back from the kernel, the user gets a blob of binary data, which is hard to interpret. There is a kernel verifier that can show C-like output, but that is also difficult to interpret. Ahern then added annotations to his slide that showed what the original program actually does, which was a good demonstration of why such a feature is needed. Ahern explained that, at least for cBPF, it should be possible to recover the original plaintext, or at least something close to the original program. A first step would be to replace known constants (like 0x806 for ARP). Even with eBPF, it should be possible to improve the output. Alexei Starovoitov, the BPF maintainer, explained that it might make sense to start by returning information about the maps used by an eBPF program. Then more complex data structures could be inspected once we know their type. The first priority is to get simple debugging tools working but, in the long term, the goal is a full decompiler that can reconstruct instructions into a human-readable program. The question that remains is how to return this data. Ahern explained that right now the bpf() system call copies the data to a different file descriptor, but it could just fill in a buffer. Starovoitov argued for a file descriptor; that would allow the kernel to stream everything through the same descriptor instead of having many attach points. Netlink cannot be used for this because of its asynchronous nature. A similar issue regarding the way we identify express data path (XDP) programs (which are also written in BPF) was raised by Daniel Borkmann from Covalent. Miller explained that users will want ways to figure out which XDP program was installed, so XDP needs an introspection mechanism. We currently have SHA-1 identifiers that can be internally used to tell which binary is currently loaded but those are not exposed to user space. Starovoitov mentioned it is now just a boolean that shows if a program is loaded or not. A use case for this, on top of just trying to figure out which BPF program is loaded, is to actually fetch the source code of a BPF program that was deployed in the field for which the source was lost. It is still uncertain whether it will be possible to extract an exact copy that could then be recompiled into the same program. Starovoitov added that he needed this in production to do proper reporting.

IPv4/IPv6 equivalency

The last set of issues that Ahern brought up was the question of inconsistencies between IPv4 and IPv6. It turns out that, because both protocols were (naturally) implemented separately, there are inconsistencies in how they are handled in the Linux kernel, which affect, among other things, the VRF framework. The first example he gave was the fact that IPv6 addresses added on the loopback interface generate unreachable routes in the main routing table, yet this doesn't happen with IPv4 addresses. Hannes Frederic Sowa explained this was part of the IPv6 specification: there are stronger restrictions on loopback interfaces in IPv6 than in IPv4. Ahern explained that VRF loopback interfaces do not implement these restrictions and wanted to know if this was a problem.

Another issue is that anycast routes are added to the wrong interface. This is apparently not specific to VRF: it was done "just because Java", and has been there from day one. It seems that the Java virtual machine builds its own routing table and assumes this behavior, so changing it would break every JVM out there, which is obviously not acceptable.

Finally, Martin KaFai Lau asked whether work should be done to merge the IPv4 and IPv6 FIB (forwarding information base) trees. The FIB tree is the data structure that represents routing tables in the Linux kernel. Miller explained that the two trees are not semantically equivalent: while IPv6 does source-address lookup and routing, IPv4 does not. We can't remove the source lookups from IPv6, because "people probably use that". According to Alexander Duyck, adding source tables to IPv4 would degrade performance to the level of IPv6 performance, which was jokingly referred to as an incentive to switch to IPv6. More seriously, Sowa argued that using in IPv6 the same compressed tree that IPv4 uses could make sense, and people may want source routing in IPv4 as well. Miller argued that the kernel is optimized for 32-bit addresses in IPv4, and conceded that it could be scaled to 64-bit subnets, but 128-bit addresses would be much harder. Sowa suggested that they could be limited to 64 bits, as global routes announced over BGP usually have such a limit, and more specific routes are usually at discrete prefixes like /65, /127 (for interconnect links) or /128 (for point-to-point links). He expressed concerns over the reliability of such an implementation, so at this point it is unlikely that the data structures will be merged. What is more likely is that the code paths could be merged and simplified, while keeping the data structures separate.

Module options substitutions

The next issue was raised by Jiří Pírko, who asked how to pass configuration options to a driver before the driver is initialized. Some chips require that certain settings be sent before the firmware is loaded, which leads to a weird situation where there is a need to address a device before it is actually recognized by the kernel. The question can then be summarized as: how do you pass information to a device that doesn't exist yet? The answer seems to be that devlink could do this, as it has access to the full device tree and, therefore, to devices that can be addressed by (say) PCI identifiers. A possible devlink command could then look something like:
    devlink dev pci/0000:03:00.0 option set foo bar
This idea raised a bunch of extra questions: some devices don't have a one-to-one mapping with the PCI bridge identifiers, for example, meaning that those identifiers cannot be used to access such devices. Another issue is that you may want to send multiple settings in a single transaction, which doesn't fit well in the devlink model. Miller then proposed letting the driver initialize itself to some state and wait for configuration to be sent when necessary. Another way would be to unregister the driver and re-register it with the given configuration. Shrijeet Mukherjee explained that, right now, Cumulus is doing this with horrible startup script magic, retrying and re-registering, but it would be nice to have a more standard way to do this.

Control over UAPI patches

Another issue that came up was the problem of changes to the user-space API (UAPI) that break backward compatibility. Pírko said that "we have to be more careful about those changes". The problem is that reviewers are not always available to make detailed reviews of such changes and may not notice API-breaking changes. Pírko proposed creating a bot to check whether a given patch introduces UAPI changes, changes in structs, or changes in netlink enums. Miller said he could block merges until discussions happen, and that patchwork, which he uses to process patches from the mailing list, already does some of this. He also pointed out that there aren't enough test cases in the first place. Starovoitov argued that UAPI isn't special; there are other ways of breaking backward compatibility. He expressed concern that such a bot could create a false sense that everything is fine while a patch could still break compatibility without being detected. Miller countered that UAPI is special in that "we're stuck with it forever". He then went on to propose that, since there is a maintainer (or more) for each module, he could make sure that each maintainer explicitly approves changes to those modules.

Data-center hardware changes

Starovoitov brought up the issue of a new type of hardware that is currently being deployed in data centers called a "multi-host NIC" (network interface card). It is a single NIC that is connected to multiple servers. Facebook, for example, uses this in its Yosemite platform, which shoves twelve servers into a 2U rack mount, in three modules. Each module is made of four servers connected to the traditional switch fabric with a single NIC through PCI-Express. Mellanox and Broadcom also have similar devices.

One question is how to manage those devices. Since they are connected through a PCI-Express bus, Linux will see them as a NIC, yet they are also a little like switches, in that they interconnect multiple servers. Furthermore, the kernel security model assumes that a NIC is trusted, and gladly opens its own memory to NICs through DMA; this can become a huge security issue when the NIC is under the control of another server. It could become especially problematic if TLS hardware offloading arrives in the future with the introduction of in-kernel TLS stacks. The other problem is reliability: since those devices are currently "dumb", they need to be managed just like a regular NIC. If the host managing the card crashes, it could disable a whole set of servers that rely on the same NIC. There could be an election process among the servers, but that significantly complicates what used to be a simple PCI connection.

Mukherjee pointed out that the model Cisco uses for this is that the "smart NIC" is a "slave" of the main switch fabric. It is a daughter card, which makes it easier to manage from a network perspective. It is clear that Linux will need a way to represent those devices, probably through the newly introduced switchdev or DSA (distributed switch architecture), but it will be something to keep an eye on as density increases in the data center.

There were many more discussions during Netconf, too many to cover here, but in the end, Miller thanked everyone for all the interesting topics as the participants dispersed for a day off to travel to Montreal to attend the following Netdev conference.
The author would like to thank the Netconf and Netdev organizers for travel to, and hosting assistance in, Toronto. Many thanks to Alexei Starovoitov for taking the time to do a technical review of this article. Note: this article first appeared in the Linux Weekly News.

18 March 2017

Vincent Sanders: A rose by any other name would smell as sweet

Often I end up dealing with code that works but might not be of the highest quality. While quality is subjective, I like to use the idea of "code smell" to convey what I mean: a list of indicators that, taken together, help to identify code that might benefit from some improvement.

Such smells may include:
I am most certainly not alone in using this approach, and Fowler et al. have covered this subject in the literature much better than I can here. One point I will raise, though, is that some programmers dismiss code that exhibits these traits as "legacy" and immediately suggest a fresh implementation. Opinions vary on when a rewrite is the appropriate solution, from never to always, but in my experience making the old working code smell nice is almost always less effort and risk than a rewrite.
Tests

When I come across smelly code, and I decide it is worthwhile improving it, I often discover that the biggest smell is lack of test coverage. Do remember this is just one code smell and on its own might not be indicative; my experience is that smelly code seldom has effective test coverage, while fresh code often does.

Test coverage is generally understood to be the percentage of source code lines and decision paths exercised when instrumented code is run against a set of tests. Like many metrics developer tools produce, "coverage percentage" is often misused by managers as a proxy for code quality. Both Fowler and Marick have written about this, but suffice it to say that for a developer test coverage is a useful tool that should not be misapplied.

Although refactoring without tests is possible, the chances of unintended consequences are proportionally higher. I often approach such a refactor by enumerating all the callers and constructing a description of the interface as used beforehand, then checking that the interface is not broken by the refactor. At that point it is probably worth writing a unit test to automate the checks.

Because of this, I have changed my approach to such refactoring to start by ensuring there is at least basic API code coverage. This may not yield the fashionable 85% coverage target, but it is useful and may be extended later if desired.

It is widely known, and equally widely ignored, that for maximum effectiveness unit tests must be run frequently and that developers must act promptly to rectify failures. A test that is not being run or acted upon is a waste of resources, both to implement and to maintain, which might be better spent elsewhere.

For projects I contribute to frequently I try to ensure that the CI system is running the coverage target, and hence the unit tests, which automatically ensures that any test-breaking changes will be highlighted promptly. I believe the slight extra overhead of executing the instrumented tests is repaid by having the coverage metrics available to the developers to aid in spotting areas with inadequate tests.
Example

A short example will help illustrate my point. When a web browser receives an object over HTTP, the server can supply a MIME type in a Content-Type header that helps the browser interpret the resource. However, this metadata is often problematic (sorry, that should read "a misleading lie"), so the actual content must be examined to get a better answer for the user. This is known as MIME sniffing and of course there is a living specification.
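As a toy illustration of the idea (nothing like what the specification actually requires, nor how NetSurf implements it), sniffing often boils down to checking a few well-known magic-byte signatures before trusting the declared type:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy content sniffer: check a couple of well-known signatures and fall
     * back to the server-declared type otherwise.  Illustration only. */
    static const char *sniff_type(const uint8_t *data, size_t len,
                                  const char *declared)
    {
        static const uint8_t png[] = { 0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a };

        if (len >= sizeof(png) && memcmp(data, png, sizeof(png)) == 0)
            return "image/png";
        if (len >= 6 && (memcmp(data, "GIF87a", 6) == 0 ||
                         memcmp(data, "GIF89a", 6) == 0))
            return "image/gif";
        return declared; /* trust the Content-Type header as a last resort */
    }

    int main(void)
    {
        /* A PNG body served with a text/plain header: the "misleading lie". */
        static const uint8_t body[] = { 0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a };

        printf("%s\n", sniff_type(body, sizeof(body), "text/plain"));
        return 0;
    }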

The source code that provides this API (linked to rather than included, for brevity) has a few smells:
While some of these are obvious, spotting the non-use of the global string table and the API complexity required detailed knowledge of the codebase, which just highlights how subjective the sniff test can be. There is also one huge air freshener in all of this, which definitely comes from experience, and that is the module's author. Their name at the top would ordinarily be cause for me to move on, but I needed an example!

The first thing to check is the API use:

$ git grep -i -e mimesniff_compute_effective_type --or -e mimesniff_init --or -e mimesniff_fini
content/hlcache.c: error = mimesniff_compute_effective_type(handle, NULL, 0,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/mimesniff.c:nserror mimesniff_init(void)
content/mimesniff.c:void mimesniff_fini(void)
content/mimesniff.c:nserror mimesniff_compute_effective_type(llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_compute_effective_type(struct llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_init(void);
content/mimesniff.h:void mimesniff_fini(void);
desktop/netsurf.c: ret = mimesniff_init();
desktop/netsurf.c: mimesniff_fini();

This immediately shows me that this API is used in only a very small area. That is often not the case, but the general approach still applies.

After a little investigation, the usage is effectively that the mimesniff_init API must be called before the mimesniff_compute_effective_type API, and mimesniff_fini releases the initialised resources.

A simple test case was added to cover the API; it exercised the behaviour both when init was called before the computation and when it was not, along with some simple tests for a limited number of well-behaved inputs.
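A minimal sketch of what such a test might look like, written against the Check framework (the actual test in the NetSurf tree may differ; NSERROR_OK is the usual success code from utils/errors.h, and the suite boilerplate here is illustrative):

    #include <stdlib.h>
    #include <check.h>

    #include "utils/errors.h"
    #include "content/mimesniff.h"

    /* API-ordering test: init must succeed before any sniffing is attempted,
     * and fini releases the initialised resources afterwards. */
    START_TEST(mimesniff_init_fini_test)
    {
        ck_assert_int_eq(mimesniff_init(), NSERROR_OK);

        /* ...calls to mimesniff_compute_effective_type() on a handful of
         * well-behaved inputs would go here... */

        mimesniff_fini();
    }
    END_TEST

    int main(void)
    {
        Suite *s = suite_create("mimesniff");
        TCase *tc = tcase_create("api");
        SRunner *sr;
        int failed;

        tcase_add_test(tc, mimesniff_init_fini_test);
        suite_add_tcase(s, tc);

        sr = srunner_create(s);
        srunner_run_all(sr, CK_NORMAL);
        failed = srunner_ntests_failed(sr);
        srunner_free(sr);

        return failed == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }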

By switching to the global string table, the initialisation and finalisation APIs can be removed altogether, along with a large amount of global context and pre-processor macros. This single change removes a lot of smell from the module and raises test coverage, both because the global string table already has good coverage and because there are now many fewer lines and conditionals to check in the mimesniff module.

I stopped the refactor at this point, but were this more than an example I probably would have:
Conclusion

The approach examined here reduces the smell of code in an incremental, testable way and improves the codebase going forward. It is mainly necessary on larger, complex codebases where technical debt and bit-rot are real issues that can quickly overwhelm a codebase if not kept in check.

This technique is subjective, but it helps a programmer to quantify and examine a piece of code in a structured fashion. However, it is only a tool; it should not be over-applied, nor used as a proxy metric for code quality.

15 March 2017

Dirk Eddelbuettel: RcppEigen 0.3.2.9.1

A new maintenance release, 0.3.2.9.1, of RcppEigen, still based on Eigen 3.2.9, is now on CRAN and will be going into Debian soon. This update ensures that RcppEigen and the Matrix package agree on their #define statements for the CholMod / SuiteSparse library. Thanks to Martin Maechler for the pull request. I also added a file src/init.c, as now suggested (soon: requested) by the R CMD check package validation. The complete NEWS file entry follows.

Changes in RcppEigen version 0.3.2.9.1 (2017-03-14)
  • Synchronize CholMod header file with Matrix package to ensure binary compatibility on all platforms (Martin Maechler in #42)
  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE
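For readers unfamiliar with native-symbol registration, such an init.c follows a standard pattern roughly like the minimal sketch below; a real registration file typically also lists the package's .Call entry points in an R_CallMethodDef table and passes it to R_registerRoutines():

    #include <R.h>
    #include <Rinternals.h>
    #include <R_ext/Rdynload.h>

    /* Minimal sketch of a registration file; a real init.c would pass a
     * table of .Call entry points instead of NULLs. */
    void R_init_RcppEigen(DllInfo *dll)
    {
        R_registerRoutines(dll, NULL, NULL, NULL, NULL);
        R_useDynamicSymbols(dll, FALSE);
    }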

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

5 March 2017

Shirish Agarwal: To say or not to say

Voltaire

For people who are visually differently-abled, the above reads: "To learn who rules over you, simply find out who you are not allowed to criticize." Voltaire is said to have written this, and those words were as apt in his times as they are in these turbulent times as well.

Update 05/03: According to @bla, these words are actually attributable to a neo-Nazi and apparently a child abuser. While I don't know the context in which the quote was shared, it describes the environment we are in perfectly. Please see his comment for a link and better understanding.

The topic below requires a bit of maturity, so if you are easily offended, feel free not to read further. This week-end I was supposed to share about the recent Science Day celebrations that we did last week (Science Day celebrations at GMRT); I will probably explore that next week. This week the attempt is to share thoughts which have been simmering at the back of my mind for two weeks or more and whose answers are not clear to me. My buttons were pressed when Martin F. Krafft shared about a CoC violation and the steps taken therein. While it is easy to say with 20:20 hindsight that the gentleman acted foolishly, I don't really know the circumstances well enough to pass judgement so quickly. In reality, while I didn't understand the joke itself, I have to share some background by way of anecdotes as to why it isn't so easy for me to make a judgement call.

a. I don't know the topics chosen by stand-up comedians in other countries; in India, most of the stand-up acts are either about dating or sex or somewhere in-between, which is lovingly given the name Leela (dance of life) in Indian mythology. I have been to several such acts over the years at different events and occasions, and 99.99% of the time I would see them dealing with pedophilia, necrophilia and all sorts of deviant sexuality, with people laughing wildly; but a couple of times, when the comedian shared the term sex with people, educated, probably more than a few world-travelled, middle to higher-middle class people were shocked into silence. I saw this not once but 2-3 times in different environments, and was left wondering just a couple of years back: is sex such a bad word that people get so easily shocked? Then how is it that we have 1.25 billion+ people in India? There had to be some people having sex. I don't think that all 1.25 billion people are test-tube babies.

b. This is actually what led to my quandary last year with my sharing of "My Experience with Debian", which I had carefully prepared for newbies. Seeing seasoned Debian people, I knew my lame observations wouldn't cut ice with them and hence had to share my actual story, which involved a bit of porn. I was in two minds whether or not to say it, till my eyes caught a t-shirt which said "We make porn" or something to that effect. That helped me share my point.

c. Which brings me to another point: it seems it is becoming increasingly difficult to talk about anything without first apologizing to everyone, not really knowing who will take offence at what and what the repercussions might be. In local sharings, I always start with a blanket apology: if I say something that offends you, please let me know afterwards so I can work on it. As the saying goes, "you can't please everyone", and that is what happens. Sooner or later somebody will take offence at something and re-interpret it in ways which I had not thought of.
Charlie Chaplin - King of self-deprecating humor

From the little sharings and interactions I have been part of, I find people take offence at the most innocuous things. For instance, one of the easy routes to not offending anyone is to use self-deprecating humour (or so I thought), either about my race, caste, class or even my issues with weight, and each of the above would offend somebody. Charlie Chaplin didn't have those problems. If somebody is from my caste, I'm portraying the caste in a certain light, a certain slant. If I'm talking about weight issues, then anybody who is like me (fat) feels that the world is laughing at them rather than at me, or that they will be discriminated against. While I find the last point a bit valid, it leaves me with no tools and no humour. I have neither the observational powers nor the skills that Kapil Sharma has, and have to be me. While I have no clue what to do next, I feel the need to also share why humour is important in any sharing.

a. Break: When any speaker uses humour, the idea is to take a break from a serious topic. It helps to break the monotony of the talk, especially if the topic is full of jargon and new concepts. A small comedic relief brings the attendees' attention back to the topic, as it tends to wander in a long monotonous talk.

b. Bridge: Some of the better speakers use one or more humorous anecdotes to explain and/or bridge the chasm between two different concepts. Some are able to produce humour on the fly, while others like me have to rely on tried and tested methods.

There is one other thing as well: humour seems to be a mixture of social, cultural and political context, and it is very easy to have it backfire on you. For instance, I attempted humour about refugees, probably not the best topic to try humour on in the current political climate, and predictably it didn't go down well. I had to share and explain Robin Williams' slightly dark yet humorous tale in Moscow on the Hudson. The film provides comedy and pathos in equal measure. You are left identifying with Vladimir Ivanoff (Robin Williams' character), especially in the last scene where he learns of his grandmother dying and remembers her and his motherland, Russia, and plays a piece on his saxophone as a tribute both to his grandmother and the motherland. Apparently, at the height of the Cold War, if a Russian defected to the United States ("land of Satan" and other such terms were used) you couldn't return to Russia. The movie, seen some years back, left a deep impact on me. For all the shortcomings and ills that India has, even if I could leave, would I be happy anywhere else? The answers are not so easy.

Most NRIs (Non-Resident Indians) who emigrated for good did it not so much for themselves but for their children, so the children would hopefully have a better upbringing, better facilities and better opportunities than they would have got here. I talked to more than a few NRIs and, while most of them give standardized answers, a while and a couple of beers (or their favourite alcohol) later, you come across deeply conflicted human beings whose heart is in India while their job, profession and money interests compel them to stay in the country where they are serving. And Indian movies don't make it any easier for the Indian populace when trying to integrate into a new place: some of the biggest hits of yesteryear were about keeping a distinct Indian culture in the new country, while the message of most countries is integration.
I know of friends living in Germany who have to struggle through their German in order to be counted as citizens; the same, I guess, is true of other countries as well, and not just the language but the customs too. They also probably struggle with learning more than one language and with an amalgamation of values which somehow they and their children have to make sense of. I was mildly shocked last week to learn that Mishi Choudary had to train people in the U.S. to differentiate between the Afghan and Punjabi styles of wearing a turban. A simple search on Afghani turban and Punjabi turban reveals that there are a lot of differences between the two cultures. In fact, in the way they talk and the way they walk, there is a lot that differentiates the two cultures. The second shocking video was of an African-American man racially abusing an Indian-American girl. At first I didn't believe it, till I saw the video on Facebook. My point through all this is that humour, that clean, simple exercise which brings a smile and uplifts the spirit, doesn't seem to be as easy as it once was. Comments, suggestions and criticisms are all welcome.
Filed under: Miscellaneous Tagged: #Elusive, #Fear, #hind-sight, #Humour, #immigrant, #integration, #Mishi Choudary, #refugee, #Robin Williams, #self-deprecating, #SFLC, #two-minds

25 February 2017

Martin Pitt: systemd 233 about to be released, please help testing

systemd 233 is scheduled to be released next week, and there are only a handful of small issues left. As usual there are tons of improvements and fixes, but the most intrusive one is probably another attempt to move from the legacy cgroup v1 setup to a hybrid setup where the new unified (cgroup v2) hierarchy is mounted at /sys/fs/cgroup/unified/ and the legacy one stays at /sys/fs/cgroup/ as usual. This should provide an easier path for software like Docker or LXC to migrate to the unified hierarchy, but even that hybrid mode broke some bits.

While systemd 233 will not make it into Debian stretch or Ubuntu zesty, as both are in feature freeze, it will soon be available in Debian experimental and in the next Ubuntu release after 17.04. Thus now is another good time to give this some thorough testing!

To help with this, please give the PPA with builds from upstream master a spin. In addition to the usual packages for Ubuntu 16.10 I also uploaded a build for Ubuntu zesty, and a build for Debian stretch (aka testing) which also works on Debian sid. You can use that URL as an apt source:
deb [trusted=yes] https://people.debian.org/~mpitt/tmp/systemd-master-20170225/ /
These packages pass our autopkgtests and I tested them manually too. LXC and LXD work fine; docker.io/runc needs a fix which I uploaded to Ubuntu zesty. (It's not yet available in Debian, sorry.) Please file reports about regressions on GitHub, but please also let me know about successes on my Google+ page so that we can get some idea of how many people tested this. Thank you, and happy booting!

6 February 2017

Martin Pitt: Migrated blog from WordPress to Hugo

My WordPress blog got hacked two days ago and now twice today. This morning I purged MySQL and restored a good backup from three days ago, changed all DB and WordPress passwords (both the old and new ones were long and autogenerated), but not even an hour after the redeploy the hack was back. (It can still be seen on Planet Debian and Planet Ubuntu.) Neither the Apache logs nor the Journal had anything obvious, nor were there any new files in the global or user www directories, so I'm a bit stumped how this happened. It was certainly not due to brute-forcing a password; that would both have shown up in the logs and have triggered fail2ban, so this looks like an actual vulnerability. I upgraded to WordPress 4.7.1 a few days ago, and apparently 4.7.2 fixes a few vulnerabilities, although none of them sound like they would match my situation. jessie-backports is still at 4.7.1, so I missed that update. Either way, all WordPress blogs hosted on my server are down for the time being.

I took this as motivation to finally migrate to something more robust. WordPress has tons of features that I never need, and also a lot of overhead (dynamic generation, MySQL, its own users/passwords, etc.). I had a look around, and it seems Hugo and Blogofile are nice contenders: no privileges, no database, static file output, Markdown input (so much nicer to type than HTML!), and maintaining your blog in git and previewing the changes on my local laptop are straightforward.

I happened to try Hugo first, and I like it enough to give it an extended try: there are plenty of themes to choose from and they are straightforward to customize, so I don't need to spend a lot of time learning and crafting CSS. I ran the WordPress to Hugo Exporter, and it produced remarkable results: fairly usable HTML-to-Markdown and metadata conversion, it keeps all the original URLs, and it's painless to use. Nicely done!

So here it is, on to a much more secure server now! \o/

Jaldhar Vyas: Don't Believe Everything You Read on Debian Planet

Martin Pitt won the popular vote.
