
19 March 2017

Petter Reinholdtsen: Free software archive system Nikita now able to store documents

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard documents the requirements for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification documents a REST web service for storing, searching and retrieving documents and metadata in such archives. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced it supported the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such an archive myself, my first use case is to store and analyse public mail journal metadata published by the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches. If you would like to help make sure there is a free software alternative for the archives, please join our IRC channel (#nikita) and the project mailing list.
When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool archive-pdf to upload a PDF file to the archive using this API. The tool is very simple at the moment, and finds existing fonds, series and files while asking the user to select which one to use if more than one exists. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:
~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446
 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
You can see here how the fonds (arkiv) and series (arkivdel) only had one option, while the user needs to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.
In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently uses the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.
The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collects defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, for lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).
The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those who want to test the current code base. The API tester is implemented in Python.
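To illustrate what traversing a service through its HATEOAS links can look like, here is a minimal sketch. It is not the code of the actual API tester: the `crawl_links` function, the `fetch` callable, the assumption that each response carries a `_links` list of objects with an `href` key, and the in-memory `FAKE_SERVICE` are all made up for this example.

```python
from collections import deque


def crawl_links(start_url, fetch):
    """Breadth-first walk over every URL reachable through `_links`.

    `fetch` is any callable that takes a URL and returns the decoded
    JSON document as a dict (hypothetical; a real client would do HTTP).
    """
    seen = {start_url}
    queue = deque([start_url])
    visited_order = []
    while queue:
        url = queue.popleft()
        visited_order.append(url)
        document = fetch(url)
        # Assumed layout: {"_links": [{"rel": ..., "href": ...}, ...]}
        for link in document.get("_links", []):
            href = link.get("href")
            if href and href not in seen:
                seen.add(href)
                queue.append(href)
    return visited_order


# In-memory stand-in for a web service, so the sketch runs without a server.
FAKE_SERVICE = {
    "/": {"_links": [{"rel": "self", "href": "/"},
                     {"rel": "arkiv", "href": "/arkiv"}]},
    "/arkiv": {"_links": [{"rel": "self", "href": "/arkiv"},
                          {"rel": "arkivdel", "href": "/arkivdel"}]},
    "/arkivdel": {"_links": []},
}

visited = crawl_links("/", FAKE_SERVICE.__getitem__)
print(visited)
```

Tracking the set of seen URLs matters because link relations such as "self" point back to pages already visited, and without it the walk would never terminate.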

17 February 2017

Joey Hess: Presenting at LibrePlanet 2017

I've gotten in the habit of going to the FSF's LibrePlanet conference in Boston. It's a very special conference, much wider ranging than a typical technology conference, solidly grounded in software freedom, and full of extraordinary people. (And the only conference I've ever taken my Mom to!) After attending for four years, I finally thought it was time to perhaps speak at it.
Four keynote speakers will anchor the event. Kade Crockford, director of the Technology for Liberty program of the American Civil Liberties Union of Massachusetts, will kick things off on Saturday morning by sharing how technologists can enlist in the growing fight for civil liberties. On Saturday night, Free Software Foundation president Richard Stallman will present the Free Software Awards and discuss pressing threats and important opportunities for software freedom. Day two will begin with Cory Doctorow, science fiction author and special consultant to the Electronic Frontier Foundation, revealing how to eradicate all Digital Restrictions Management (DRM) in a decade. The conference will draw to a close with Sumana Harihareswara, leader, speaker, and advocate for free software and communities, giving a talk entitled "Lessons, Myths, and Lenses: What I Wish I'd Known in 1998." That's not all. We'll hear about the GNU philosophy from Marianne Corvellec of the French free software organization April, Joey Hess will touch on encryption with a talk about backing up your GPG keys, and Denver Gingerich will update us on a crucial free software need: the mobile phone. Others will look at ways to grow the free software movement: through cross-pollination with other activist movements, removal of barriers to free software use and contribution, and new ideas for free software as paid work.
-- Here's a sneak peek at LibrePlanet 2017: Register today!
I'll be giving some variant of the keysafe talk from Linux.Conf.Au. By the way, videos of my keysafe and propellor talks at Linux.Conf.Au are now available, see the talks page.

26 December 2016

Lucas Nussbaum: The Linux 2.5, Ruby 1.9 and Python 3 release management anti-pattern

There's a pattern that comes up from time to time in the release management of free software projects. To allow for big, disruptive changes, a new development branch is created. Most of the developers' focus moves to the development branch. However, at the same time, the users' focus stays on the stable branch. As a result: This situation can grow up to a quasi-deadlock, with people questioning whether it was a good idea to do such a massive fork in the first place, and if it is a good idea to even spend time switching to the new branch. To make things more unclear, the development branch is often declared stable by its developers, before most of the libraries or applications have been ported to it. This has happened at least three times. First, in the Linux 2.4 / 2.5 era. Wikipedia describes the situation like this:

Before the 2.6 series, there was a stable branch (2.4) where only relatively minor and safe changes were merged, and an unstable branch (2.5), where bigger changes and cleanups were allowed. Both of these branches had been maintained by the same set of people, led by Torvalds. This meant that users would always have a well-tested 2.4 version with the latest security and bug fixes to use, though they would have to wait for the features which went into the 2.5 branch. The downside of this was that the stable kernel ended up so far behind that it no longer supported recent hardware and lacked needed features. In the late 2.5 kernel series, some maintainers elected to try backporting of their changes to the stable kernel series, which resulted in bugs being introduced into the 2.4 kernel series. The 2.5 branch was then eventually declared stable and renamed to 2.6. But instead of opening an unstable 2.7 branch, the kernel developers decided to continue putting major changes into the 2.6 branch, which would then be released at a pace faster than 2.4.x but slower than 2.5.x. This had the desirable effect of making new features more quickly available and getting more testing of the new code, which was added in smaller batches and easier to test.

Then, in the Ruby community. In 2007, Ruby 1.8.6 was the stable version of Ruby. Ruby 1.9.0 was released on 2007-12-26, without being declared stable, as a snapshot from Ruby's trunk branch, and most of the development's attention moved to 1.9.x. On 2009-01-31, Ruby 1.9.1 was the first release of the 1.9 branch to be declared stable. But at the same time, the disruptive changes introduced in Ruby 1.9 made users stay with Ruby 1.8, as many libraries (gems) remained incompatible with Ruby 1.9.x. Debian provided packages for both branches of Ruby in Squeeze (2011) but only changed the default to 1.9 in 2012 (in a stable release with Wheezy 2013).
Finally, in the Python community. Similarly to what happened with Ruby 1.9, Python 3.0 was released in December 2008. Releases from the 3.x branch have been shipped in Debian Squeeze (3.1), Wheezy (3.2), Jessie (3.4). But the python command still points to 2.7 (I don't think that there are plans to make it point to 3.x, making python 3.x essentially a different language), and there are talks about really getting rid of Python 2.7 in Buster (Stretch+1, Jessie+2).
In retrospect, and looking at what those projects have been doing in recent years, it is probably a better idea to break early, break often, and fix a constant stream of breakages, on a regular basis, even if that means temporarily exposing breakage to users, and spending more time seeking strategies to limit the damage caused by introducing breakage. What also changed since the time those branches were introduced is the increased popularity of automated testing and continuous integration, which makes it easier to measure breakage caused by disruptive changes. Distributions are in a good position to help here, by being able to provide early feedback to upstream projects about potentially disruptive changes. And distributions also have good motivations to help here, because it is usually not a great solution to ship two incompatible branches of the same project. (I wonder if there are other occurrences of the same pattern?)
Update: There's a discussion about this post on HN

12 December 2016

Kees Cook: security things in Linux v4.9

Previously: v4.8. Here are a bunch of security things I'm excited about in the newly released Linux v4.9:

Latent Entropy GCC plugin
Building on her earlier work to bring GCC plugin support to the Linux kernel, Emese Revfy ported PaX's Latent Entropy GCC plugin to upstream. This plugin is significantly more complex than the others that have already been ported, and performs extensive instrumentation of functions marked with __latent_entropy. These functions have their branches and loops adjusted to mix random values (selected at build time) into a global entropy gathering variable. Since the branch and loop ordering is very specific to boot conditions, CPU quirks, memory layout, etc, this provides some additional uncertainty to the kernel's entropy pool. Since the entropy actually gathered is hard to measure, no entropy is "credited", but rather used to mix the existing pool further. Probably the best place to enable this plugin is on small devices without other strong sources of entropy.

vmapped kernel stack and thread_info relocation on x86
Normally, kernel stacks are mapped together in memory. This meant that attackers could use forms of stack exhaustion (or stack buffer overflows) to reach past the end of a stack and start writing over another process's stack. This is bad, and one way to stop it is to provide guard pages between stacks, which is provided by vmalloced memory. Andy Lutomirski did a bunch of work to move to vmapped kernel stack via CONFIG_VMAP_STACK on x86_64. Now when writing past the end of the stack, the kernel will immediately fault instead of just continuing to blindly write. Related to this, the kernel was storing thread_info (which contained sensitive values like addr_limit) at the bottom of the kernel stack, which was an easy target for attackers to hit. Between a combination of explicitly moving targets out of thread_info, removing needless fields, and entirely moving thread_info off the stack, Andy Lutomirski and Linus Torvalds created CONFIG_THREAD_INFO_IN_TASK for x86.

CONFIG_DEBUG_RODATA mandatory on arm64
As recently done for x86, Mark Rutland made CONFIG_DEBUG_RODATA mandatory on arm64. This feature controls whether the kernel enforces proper memory protections on its own memory regions (code memory is executable and read-only, read-only data is actually read-only and non-executable, and writable data is non-executable). This protection is a fundamental security primitive for kernel self-protection, so there's no reason to make the protection optional.

random_page() cleanup
Cleaning up the code around the userspace ASLR implementations makes them easier to reason about. This has been happening for things like the recent consolidation on arch_mmap_rnd() for ET_DYN and during the addition of the entropy sysctl. Both uncovered some awkward uses of get_random_int() (or similar) in and around arch_mmap_rnd() (which is used for mmap (and therefore shared library) and PIE ASLR), as well as in randomize_stack_top() (which is used for stack ASLR). Jason Cooper cleaned things up further by doing away with randomize_range() entirely and replacing it with the saner random_page(), making the per-architecture arch_randomize_brk() (responsible for brk ASLR) much easier to understand.

That's it for now! Let me know if there are other fun things to call attention to in v4.9.

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

15 November 2016

Antoine Beaupré: The Turris Omnia router: help for the IoT mess?

The Turris Omnia router is not the first FLOSS router out there, but it could well be one of the first open hardware routers to be available. As the crowdfunding campaign is coming to a close, it is worth reflecting on the place of the project in the ecosystem. Beyond that, I got my hardware recently, so I was able to give it a try.

A short introduction to the Omnia project
[Image: The Turris Omnia Router]
The Omnia router is a followup project on CZ.NIC's original research project, the Turris. The goal of the project was to identify hostile traffic on end-user networks and develop global responses to those attacks across every monitored device. The Omnia is an extension of the original project: more features were added and data collection is now opt-in. Whereas the original Turris was simply a home router, the new Omnia router includes:
  • 1.6GHz ARM CPU
  • 1-2GB RAM
  • 8GB flash storage
  • 6 Gbit Ethernet ports
  • SFP fiber port
  • 2 Mini-PCI express ports
  • mSATA port
  • 3 MIMO 802.11ac and 2 MIMO 802.11bgn radios and antennas
  • SIM card support for backup connectivity
Some models sold had a larger case to accommodate extra hard drives, turning the Omnia router into a NAS device that could actually serve as a multi-purpose home server. Indeed, it is one of the objectives of the project to make "more than just a router". The NAS model is not currently on sale anymore, but there are plans to bring it back along with LTE modem options and new accessories "to expand Omnia towards home automation". Omnia runs a fork of the OpenWRT distribution called TurrisOS that has been customized to support automated live updates, a simpler web interface, and other extra features. The fork also has patches to the Linux kernel, which is based on Linux 4.4.13 (according to uname -a). It is unclear why those patches are necessary since the ARMv7 Armada 385 CPU has been supported in Linux since at least 4.2-rc1, but it is common for OpenWRT ports to ship patches to the kernel, either to backport missing functionality or perform some optimization. There has been some pressure from backers to petition Turris to "speedup the process of upstreaming Omnia support to OpenWrt". It could be that the team is too busy with delivering the devices already ordered to complete that process at this point. The software is available on the CZ-NIC GitHub repository and the actual Linux patches can be found here and here. CZ.NIC also operates a private GitLab instance where more software is available. There is technically no reason why you wouldn't be able to run your own distribution on the Omnia router: OpenWRT development snapshots should be able to run on the Omnia hardware and some people have installed Debian on Omnia. It may require some customization (e.g. the kernel) to make sure the Omnia hardware is correctly supported. Most people seem to prefer to run TurrisOS because of the extra features. The hardware itself is also free and open for the most part. 
There is a binary blob needed for the 5GHz wireless card, which seems to be the only proprietary component on the board. The schematics of the device are available through the Omnia wiki, but oddly not in the GitHub repository like the rest of the software.

Hands on
I received my own router last week, which is about six months late from the original April 2016 delivery date; it allowed me to do some hands-on testing of the device. The first thing I noticed was a known problem with the antenna connectors: I had to open up the case to screw the fittings tight, otherwise the antennas wouldn't screw in correctly. Once that was done, I simply had to go through the usual process of setting up the router, which consisted of connecting the Omnia to my laptop with an Ethernet cable, connecting the Omnia to an uplink (I hooked it into my existing network), and going through a web wizard. I was pleasantly surprised with the interface: it was smooth and easy to use, but at the same time imposed good security practices on the user.
[Image: Install wizard performing automatic updates]
For example, the wizard, once connected to the network, goes through a full system upgrade and will, by default, automatically upgrade itself (including reboots) when new updates become available. Users have to opt-in to the automatic updates, and can choose to automate only the downloading and installation of the updates without having the device reboot on its own. Reboots are also performed during user-specified time frames (by default, Omnia applies kernel updates during the night). I also liked the "skip" button that allowed me to completely bypass the wizard and configure the device myself, through the regular OpenWRT systems (like LuCI or SSH) if I needed to.
[Image: The Omnia router about to rollback to latest snapshot]
Notwithstanding the antenna connectors themselves, the hardware is nice. I ordered the black metal case, and I must admit I love the many LED lights in the front. It is especially useful to have color changes in the reset procedure: no more guessing what state the device is in or if I pressed the reset button long enough. The LEDs can also be dimmed to reduce the glare that our electronic devices produce.
All this comes at a price, however: at $250 USD, it is a much higher price tag than common home routers, which typically go for around $50. Furthermore, it may be difficult to actually get the device, because no orders are being accepted on the Indiegogo site after October 31. The Turris team doesn't actually want to deal with retail sales and has now delegated retail sales to other stores, which are currently limited to European deliveries.

A nice device to help fight off the IoT apocalypse
It seems there isn't a week that goes by these days without a record-breaking distributed denial-of-service (DDoS) attack. Those attacks are more and more caused by home routers, webcams, and "Internet of Things" (IoT) devices. In that context, the Omnia sets a high bar for how devices should be built but also how they should be operated. Omnia routers are automatically upgraded on a nightly basis and, by default, do not provide telnet or SSH ports to run arbitrary code. There is the password-less wizard that starts up on install, but it forces the user to choose a password in order to complete the configuration. Both the hardware and software of the Omnia are free and open. The automatic update's EULA explicitly states that the software provided by CZ.NIC "will be released under a free software licence" (and it has been, as mentioned earlier). This makes the machine much easier to audit by someone looking for possible flaws, say for example a customs official looking to approve the import in the eventual case where IoT devices end up being regulated. But it also makes the device itself more secure. One of the problems with these kinds of devices is "bit rot": they have known vulnerabilities that are not fixed in a timely manner, if at all. While it would be trivial for an attacker to disable the Omnia's auto-update mechanisms, the point is not to counterattack, but to prevent attacks on known vulnerabilities. The CZ.NIC folks take it a step further and encourage users to actively participate in a monitoring effort to document such attacks. For example, the Omnia can run a honeypot to lure attackers into divulging their presence. The Omnia also runs an elaborate data collection program, where routers report malicious activity to a central server that collects information about traffic flows, blocked packets, bandwidth usage, and activity from a predefined list of malicious addresses. 
The exact data collected is specified in another EULA that is currently only available to users logged in at the Turris web site. That data can then be turned into tweaked firewall rules to protect the overall network, which the Turris project calls a distributed adaptive firewall. Users need to explicitly opt-in to the monitoring system by registering on a portal using their email address. Turris devices also feature the Majordomo software (not to be confused with the venerable mailing list software) that can also monitor devices in your home and identify hostile traffic, potentially leading users to take responsibility over the actions of their own devices. This, in turn, could lead users to trickle complaints back up to the manufacturers that could change their behavior. It turns out that some companies do care about their reputations and will issue recalls if their devices have significant enough issues. It remains to be seen how effective the latter approach will be, however. In the meantime, the Omnia seems to be an excellent all-around server and router for even the most demanding home or small-office environments that is a great example for future competitors.
Note: this article first appeared in the Linux Weekly News.

13 November 2016

Andrew Cater: Debian MiniConf, ARM, Cambridge 11/11/16 - Day 2 post 2

It's raining cats and dogs in Cambridge.

Just listening to Lars Wirzenius - who shared an office with Linus Torvalds, owned the computer that first ran Linux, founded the Linux Documentation Project. Living history in more than one sense :)

Live streaming is also happening.

Building work is also happening - so there may be random noise happening occasionally.

23 October 2016

Jaldhar Vyas: What I Did During My Summer Vacation

That's So Raven
If I could sum up the past year in one word, that word would be distraction. There have been so many strange, confusing or simply unforeseen things going on I have had trouble focusing like never before. For instance, on the opposite side of the street from me is one of Jersey City's old reservoirs. It's not used for drinking water anymore and the city eventually plans on merging it into the park on the other side. In the meantime it has become something of a wildlife refuge. Which is nice except one of the newly settled critters was a bird of prey -- the consensus is possibly some kind of hawk or raven. Starting your morning commute under the eyes of a harbinger of death is very goth and I even learned to deal with the occasional piece of deconstructed rodent on my doorstep but nighttime was a big problem. For contrary to popular belief, ravens do not quoth "nevermore" but "KRRAAAA". Very loudly. Just as soon as you have drifted off to sleep. Eventually my sleep-deprived neighbors and I appealed to the NJ division of environmental protection to get it removed but by the time they were ready to swing into action the bird had left for somewhere more congenial like Transylvania or Newark. Or here are some more complete wastes of time: I go to the doctor for my annual physical. The insurance company codes it as Adult Onset Diabetes by accident. One day I opened the lid of my laptop and there's a "ping" sound and a piece of the hinge flies off. Apparently that also severed the connection to the screen and naturally the warranty had just expired so I had to spend the next month tethered to an external monitor until I could afford to buy a new one. Mix in all the usual social, political, family and work drama and you can see that it has been a very trying time for me.

Dovecot
I have managed to get some Debian work done. On Dovecot, my principal package, I have gotten tremendous support from Apollon Oikonomopolous who I belatedly welcome as a member of the Dovecot maintainer team. He has been particularly helpful in fixing our systemd support and cleaning out a lot of the old and invalid bugs. We're in pretty good shape for the freeze. Upstream has released an RC of 2.2.26 and hopefully the final version will be out in the next couple of days so we can include it in Stretch. We can always use more help with the package so let me know if you're interested.

Debian-IN
Most of the action has been going on without me but I've been lending support and sponsoring whenever I can. We have several new DDs and DMs but still no one north of the Vindhyas I'm afraid.

Debian Perl Group
gregoa did a ping of inactive maintainers and I regretfully had to admit to myself that I wasn't going to be of use anytime soon so I resigned. Perl remains my favorite language and I've actually been more involved in the meetings of my local Perlmongers group so hopefully I will be back again one day. And I still maintain the Perl modules I wrote myself.

Debian-Axe-Murderers*
May have gained a recruit.
*Strictly speaking it should be called Debian-People-Who-Dont-Think-Faults-in-One-Moral-Domain-Such-As-For-Example-Axe-Murdering-Should-Leak-Into-Another-Moral-Domain-Such-As-For-Example-Debian but come on, that's just silly.

17 September 2016

Norbert Preining: Android 7.0 Nougat Root PokemonGo

Since my switch to Android, my Nexus 6p has been rooted, and I have happily fixed the Android (<7) font errors with Japanese fonts in an English environment (see this post). The recently released Android 7 Nougat finally fixes this problem, so it was high time to update. In addition, a recent update to Pokemon Go excluded rooted devices, so I was searching for a solution that allows me to: update to Nougat, keep root, and run PokemonGo (as well as some bank security apps etc). After some playing around, here are the steps I took:

Installation of necessary components
Warning: The following is for the Nexus 6p device; you need different image files and TWRP recovery for other devices.

Flash Nougat firmware images
Get it from the Google Android Nexus images web site. After unpacking the zip and the zip included in it, one gets a lot of img files.
cd angler-nrd90u/
As I don't want my user partition to get flashed, I did not use the included flash script, but did it manually:
fastboot flash bootloader bootloader-angler-angler-03.58.img
fastboot reboot-bootloader
sleep 5
fastboot flash radio radio-angler-angler-03.72.img
fastboot reboot-bootloader
sleep 5
fastboot erase system
fastboot flash system system.img
fastboot erase boot
fastboot flash boot boot.img
fastboot erase cache
fastboot flash cache cache.img
fastboot erase vendor
fastboot flash vendor vendor.img
fastboot erase recovery
fastboot flash recovery recovery.img
fastboot reboot
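The manual sequence above follows a regular pattern: flash the bootloader and radio with a bootloader reboot after each, then erase and flash every remaining partition, then reboot. As a rough sketch, the same list of commands could be generated by a small Python helper. The function name `fastboot_commands` is made up for this example, and the sketch only builds the command list without running anything, so it can be inspected before being passed to something like `subprocess.run`.

```python
def fastboot_commands(bootloader_img, radio_img, partitions):
    """Build the command list for a manual flash that skips userdata.

    Hypothetical helper: it assumes each partition image is named
    "<partition>.img", as in the unpacked factory image above.
    """
    commands = []
    # The bootloader and radio each need a bootloader reboot and a short pause.
    for name, image in (("bootloader", bootloader_img), ("radio", radio_img)):
        commands.append(["fastboot", "flash", name, image])
        commands.append(["fastboot", "reboot-bootloader"])
        commands.append(["sleep", "5"])
    # The remaining partitions are erased first, then flashed.
    for partition in partitions:
        commands.append(["fastboot", "erase", partition])
        commands.append(["fastboot", "flash", partition, partition + ".img"])
    commands.append(["fastboot", "reboot"])
    return commands


commands = fastboot_commands(
    "bootloader-angler-angler-03.58.img",
    "radio-angler-angler-03.72.img",
    ["system", "boot", "cache", "vendor", "recovery"],
)
for command in commands:
    print(" ".join(command))
```

Leaving userdata out of the partition list is what keeps the user partition untouched.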
After that boot into the normal system and let it do all the necessary upgrades. Once this is done, let us prepare for systemless root and possible hiding of it.

Get the necessary files
Get Magisk, SuperSU-magisk, as well as the Magisk-Manager.apk from this forum thread (direct links as of 2016/9:,, Magisk-Manager.apk). Transfer these two files to your device. I am using an external USB stick that can be plugged into the device, but you can also copy them via your computer or via a cloud service. Also we need to get a custom recovery image; I am using TWRP. I used the version 3.0.2-0 of TWRP I had already available, but that version didn't manage to decrypt the file system and hangs. One needs to get at least version 3.0.2-2 from the TWRP web site.

Install latest TWRP recovery
Reboot into the bootloader, then use fastboot to flash TWRP:
fastboot erase recovery
fastboot flash recovery twrp-3.0.2-2-angler.img
fastboot reboot-bootloader
After that select Recovery with the up-down buttons and start TWRP. You will be asked for your PIN if you have one set.

Install
Select Install in TWRP, select the file, and see your device being prepared for systemless root.

Install SuperSU, Magisk version
Again, boot into TWRP and use the install tool to install it. After reboot you should have a SuperSU binary running.

Install the Magisk Manager
From your device browse to the .apk and install it.

How to run safety net programs
Those programs that check for safety functions (Pokemon Go, Android Pay, several bank apps) need root disabled. Open the Magisk Manager and switch the root switch to the left (off). After this, starting the program should bring you past the special check.

16 August 2016

Lars Wirzenius: 20 years ago I became a Debian developer

Today it is 23 years since Ian Murdock published his intention to develop a new Linux distribution, Debian. It is also about 20 years since I became a Debian developer and made my first package upload. In the time since: It's been a good twenty years. And the fun ain't over yet.

1 August 2016

Petter Reinholdtsen: Techno TV broadcasting live across Norway and the Internet (#debconf16, #nuug) on @frikanalen

Did you know there is a TV channel broadcasting talks from DebConf 16 across an entire country? Or that there is a TV channel broadcasting talks by or about Linus Torvalds, Tor, OpenID, Common Lisp, Civic Tech, EFF founder John Barlow, how to make 3D printer electronics and many more fascinating topics? It works using only free software (all of it available from GitHub), and is administered using a web browser and a web API. The TV channel is the Norwegian open channel Frikanalen, and I am involved via the NUUG member association in running and developing the software for the channel. The channel is organised as a member organisation where its members can upload and broadcast what they want (think of it as YouTube for national broadcasting television). Individuals can broadcast too. The time slots are handled on a first come, first served basis. Because the channel has almost no viewers and very few active members, we can experiment with TV technology without too much flak when we make mistakes. And thanks to the few active members, most of the slots on the schedule are free. I see this as an opportunity to spread knowledge about technology and free software, and have a script I run regularly to fill up all the open slots the next few days with technology related video. The end result is a channel I like to describe as Techno TV - filled with interesting talks and presentations. It is available on channel 50 on the Norwegian national digital TV network (RiksTV). It is also available as a multicast stream on Uninett. And finally, it is available as a WebM unicast stream from Frikanalen and NUUG. Check it out. :)

8 July 2016

Mike Hommey: Are all integer overflows equal?

Background: I've been relearning Rust (more about that in a separate post, some time later), and in doing so, I chose to implement the low-level parts of git (I'll touch the why in that separate post I just promised). Disclaimer: It's Friday. This is not entirely(?) a serious post. So, I was looking at Documentation/technical/index-format.txt, and saw:
32-bit number of index entries.
What? The index/staging area can't handle more than ~4.3 billion files? There I was, writing Rust code to write out the index.
(For people familiar with the byteorder crate and wondering what NetworkOrder is, I have a use byteorder::BigEndian as NetworkOrder) And the Rust compiler rightfully barfed:
error: mismatched types:
 expected `u32`,
    found `usize` [E0308]
And there I was, wondering: "mmmm, should I just add as u32 and silently truncate or... hey, what does git do?" And it turns out, git uses an unsigned int to track the number of entries in the first place, so there is no truncation happening. Then I thought "but what happens when cache_nr reaches the max?" Well, it turns out there's only one obvious place where the field is incremented. What? Holy coffin nails, Batman! No overflow check? Wait a second, look 3 lines above that:
ALLOC_GROW(istate->cache, istate->cache_nr + 1, istate->cache_alloc);
Yeah, obviously, if you're incrementing cache_nr, you already have that many entries in memory. So, how big would that array be?
        struct cache_entry **cache;
So it's an array of pointers; assuming 64-bit pointers, that's ~34.3 GB. But, all those cache_nr entries are in memory too. How big is a cache entry?
struct cache_entry {
        struct hashmap_entry ent;
        struct stat_data ce_stat_data;
        unsigned int ce_mode;
        unsigned int ce_flags;
        unsigned int ce_namelen;
        unsigned int index;     /* for link extension */
        unsigned char sha1[20];
        char name[FLEX_ARRAY]; /* more */
};
So, 4 ints, 20 bytes, and as many bytes as necessary to hold a path. And two inline structs. How big are they?

struct hashmap_entry {
        struct hashmap_entry *next;
        unsigned int hash;
};
struct stat_data {
        struct cache_time sd_ctime;
        struct cache_time sd_mtime;
        unsigned int sd_dev;
        unsigned int sd_ino;
        unsigned int sd_uid;
        unsigned int sd_gid;
        unsigned int sd_size;
};
Woohoo, nested structs.
struct cache_time {
        uint32_t sec;
        uint32_t nsec;
};
So all in all, we're looking at 1 + 2 + 2 + 5 + 4 32-bit integers, 1 64-bit pointer, 2 32-bit paddings, 20 bytes of sha1, for a total of 92 bytes, not counting the variable size for file paths. The average path length in mozilla-central, which only has slightly over 140 thousand of them, is 59 (including the terminal NUL character). Let's conservatively assume our crazy repository would have the same average, making the average cache entry 151 bytes. But memory allocators usually allocate more than requested. In this particular case, with the default allocator on GNU/Linux, it's 156 (weirdly enough, it's 152 on my machine). 156 times 4.3 billion ≈ 670 GB. Plus the 34.3 from the array of pointers: 704.3 GB. Of RAM. Not counting the memory allocator overhead of handling that. Or all the other things git might have in memory as well (which apparently involves a hashmap, too, but I won't look at that, I promise). I think one would have run out of memory before hitting that integer overflow.

Interestingly, looking at Documentation/technical/index-format.txt again, the on-disk format appears smaller, with 62 bytes per file instead of 92, so the corresponding index file would be smaller. (And in version 4, paths are prefix-compressed, so paths would be smaller too.) But having an index that large supposes those files are checked out. So let's say I have an empty ext4 file system as large as possible (which I'm told is 2^60 bytes (1.15 billion gigabytes)). Creating a small empty ext4 tells me at least 10 inodes are allocated by default. I seem to remember there's at least one reserved for the journal, there's the top-level directory, and there's lost+found; there apparently are more. Obviously, on that very large file system, we'd have a git repository. git init with an empty template creates 9 files and directories, so that's 19 inodes taken so far. But git init doesn't create an index, and doesn't have any objects.
We'd thus have at least one file for our hundreds-of-gigabytes index, and at least 2 who-knows-how-big files for the objects (a pack and its index). How many inodes does that leave us with? The Linux kernel source tells us the number of inodes in an ext4 file system is stored in a 32-bit integer. So all in all, if we had an empty very large file system, we'd only be able to store, at best, 2^32 - 22 files... And we wouldn't even be able to get cache_nr to overflow... while following the rules. Because the index can keep files that have been removed, it is actually possible to fill the index without filling the file system. After hours (days? months? years? decades?*) of running
seq 0 4294967296 | while read i; do touch $i; git update-index --add $i; rm $i; done
One should be able to reach the integer overflow. But that'd still require hundreds of gigabytes of disk space and even more RAM. Ok, it's actually much faster to do it hundreds of thousands of files at a time, with something like:
seq 0 100000 4294967296 | while read i; do j=$(seq $i $(($i + 99999))); touch $j; git update-index --add $j; rm $j; done
At the rate the first million files were added, still assuming a constant rate, it would take about a month on my machine. Considering reading/writing a list of a million files is a thousand times faster than reading a list of a billion files, assuming linear increase, we're still talking about decades, and plentiful RAM. Fun fact: after leaving it to run for 5 times as long as it had run for the first million files, it hasn't even done half more...

One could generate the necessary hundreds-of-gigabytes index manually, that wouldn't be too hard, and assuming it could be done at about 1 GB/s on a good machine with a good SSD, we'd be able to craft a close-to-explosion index within a few minutes. But we'd still lack the RAM to load it. So, here is the open question: should I report that integer overflow? Wow, that was some serious procrastination.

Edit: Epilogue: Actually, oops, there is a separate integer overflow on the reading side that can trigger a buffer overflow, that doesn't actually require a large index, just a crafted header, demonstrating that yes, not all integer overflows are equal.
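
For readers who want to check the back-of-the-envelope memory estimate in this post, here is a small Python sketch that simply replays the arithmetic. The 59-byte average path and the 156-byte allocation size are the figures quoted in the post; the variable names are made up for illustration, and the result is only as good as those assumptions.

```python
# Replays the memory estimate from the post; all names are illustrative.
MAX_ENTRIES = 2**32          # cache_nr is a 32-bit unsigned int
POINTER_SIZE = 8             # assuming 64-bit pointers
GB = 10**9                   # decimal gigabytes, as used in the post

# Fixed part of struct cache_entry, as counted in the post:
# 1 + 2 + 2 + 5 + 4 32-bit integers, one pointer, two 32-bit paddings, sha1.
fixed_size = (1 + 2 + 2 + 5 + 4) * 4 + POINTER_SIZE + 2 * 4 + 20

AVG_PATH_LEN = 59            # mozilla-central average, including the NUL
requested_size = fixed_size + AVG_PATH_LEN
ALLOCATED_SIZE = 156         # what the post says the allocator hands out

pointer_array_bytes = MAX_ENTRIES * POINTER_SIZE
entries_bytes = MAX_ENTRIES * ALLOCATED_SIZE
total_bytes = pointer_array_bytes + entries_bytes

print(f"fixed part of an entry : {fixed_size} bytes")
print(f"average requested size : {requested_size} bytes")
print(f"array of pointers      : {pointer_array_bytes / GB:.1f} GB")
print(f"cache entries          : {entries_bytes / GB:.1f} GB")
print(f"total                  : {total_bytes / GB:.1f} GB")
```

Using the exact 2^32 rather than the rounded "4.3 billion" shifts the last digit slightly, but the conclusion is the same: roughly 700 GB of RAM before cache_nr could wrap.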

19 May 2016

Petter Reinholdtsen: I want the courts to be involved before the police can hijack a news site DNS domain (#domstolkontroll)

I just donated to the NUUG defence "fond" to fund the effort in Norway to get the seizure of the news site tested in court. I hope everyone that agrees with me will do the same. Would you be worried if you knew the police in your country could hijack DNS domains of news sites covering a free software system without talking to a judge first? I am. What if the free software system combined search engine lookups, bittorrent downloads and video playout and was called Popcorn Time? Would that affect your view? It still makes me worried. In March 2016, the Norwegian police seized the DNS domain (as in, forced NORID to change the IP address it pointed to into one controlled by the police), without any supervision from the courts. I did not know about the web site back then, and assumed the courts had been involved, and was very surprised when I discovered that the police had hijacked the DNS domain without asking a judge for permission first. I was even more surprised when I had a look at the web site content on the Internet Archive, and only found news coverage about Popcorn Time, not any material published without the right holders' permission. The seizure was widely covered in the Norwegian press (see for example Hegnar Online and ITavisen and NRK), at first due to the press release sent out by Økokrim, but then based on protests from the law professor Olav Torvund and lawyer Jon Wessel-Aas. It even got some coverage on TorrentFreak. I wrote about the case a month ago, when the Norwegian Unix User Group (NUUG), where I am an active member, decided to ask the courts to test this seizure. The request was denied, but NUUG and its co-requestor EFN have not given up, and now they are rallying for support to get the seizure legally challenged. They accept both bank and Bitcoin transfer for those that want to support the request.
If you, like me, believe news sites about free software should not be censored, even if the free software has both legal and illegal applications, and that DNS hijacking should be tested by the courts, I suggest you show your support by donating to NUUG.

7 May 2016

Craig Small: Displaying Linux Memory

Memory management is hard, but RAM management may be even harder. Most people know the vague overall concept of how memory usage is displayed within Linux. You have your total memory which is everything inside the box; then there is used and free which is what the system is or is not using respectively. Some people might know that not all used is used and some of it actually is free. It can be very confusing to understand, even for someone who maintains procps (the package that contains top and free, two programs that display memory usage). So, how does the memory display work?

What free shows

The free program is part of the procps package. Its central goal is to give a quick overview of how much memory is used where. A typical output (e.g. what I saw when I typed "free -h") could look like this:
      total   used    free   shared  buff/cache  available
Mem:    15G   3.7G    641M     222M         11G        11G
Swap:   15G   194M     15G
I've used the -h option for human-readable output here for the sake of brevity and because I hate typing long lists of long numbers. People who have good memories (or old computers) may notice there is a missing "-/+ buffers/cache" line. This was intentionally removed in mid-2014 because as the memory management of Linux got more and more complicated, these lines became less relevant. These used to help with the "not used" used memory problem mentioned in the introduction but progress caught up with it. To explain what free is showing, you need to understand some of the underlying statistics that it works with. This isn't a lesson on how Linux manages its memory (the honest short answer is, I don't fully know) but just enough hopefully to understand what free is doing. Let's start with the two simple columns first; total and free.

Total Memory

This is what memory you have available to Linux. It is almost, but not quite, the amount of memory you put into a physical host or the amount of memory you allocate for a virtual one. Some memory you just can't have; either due to early reservations or devices shadowing the memory area. Unless you start mucking around with those settings or the virtual host, this number stays the same.

Free Memory

Memory that nobody at all is using. They haven't reserved it, haven't stashed it away for future use or even just, you know, actually using it. People often obsess about this statistic but it's probably the most useless one to use for anything directly. I have even considered removing this column, or replacing it with available (see later what that is) because of the confusion this statistic causes. The reason for its uselessness is that Linux has memory management where it allocates memory it doesn't use. This decrements the free counter but it is not truly "used". If your application needs that memory, it can be given back.
A very important statistic to know for running a system is how much memory have I got left before I either run out or I start to seriously swap stuff to swap drives. Despite its name, this statistic will not tell you that and will probably mislead you. My advice is unless you really understand the Linux memory statistics, ignore this one.

Who's Using What

Now we come to the components that are using (if that is the right word) the memory within a system.

Shared Memory

Shared memory is often thought of only in the context of processes (and makes working out how much memory a process uses tricky but that's another story) but the kernel has this as well. The shared column lists this, which is a direct report from the Shmem field in the meminfo file.

Slabs

For things used a lot within the kernel, it is inefficient to keep going to get small bits of memory here and there all the time. The kernel has this concept of slabs where it creates small caches for objects or in-kernel data structures, which slabinfo(5) describes as "[such as] buffer heads, inodes and dentries". So basically kernel stuff for the kernel to do kernelly things with. Slab memory comes in two flavours. There is reclaimable and unreclaimable. This is important because unreclaimable cannot be handed back if your system starts to run out of memory. Funny enough, not all reclaimable is, well, reclaimable. A good estimate is you'll only get 50% back; top and free ignore this inconvenient truth and assume it can be 100%. All of the reclaimable slab memory is considered part of the Cached statistic. Unreclaimable is memory that is part of Used.

Page Cache and Cached

Page caches are used to read and write to storage, such as a disk drive. These are the things that get written out when you use sync and make the second read of the same file much faster. An interesting quirk is that tmpfs is part of the page cache. So the Cached column may increase if you have a few of these.
The Cached column may seem like it should only have Page Cache, but the Reclaimable part of the Slab is added to this value. Some older versions of some programs count either none or all of the Slab in Cached. Both of these versions are incorrect. Cached makes up part of the buff/cache column with the standard options for free or has a column to itself for the wide option.

Buffers

The second component of the buff/cache column (or a separate column with the wide option) is kernel buffers. These are the low-level I/O buffers inside the kernel. Generally they are small compared to the other components and can basically be ignored or just considered part of the Cached, which is the default for free.

Used

Unlike most of the previous statistics that are either directly pulled out of the meminfo file or have some simple addition, the Used column is calculated and completely dependent on the other values. As such it is not telling the whole story here but it is a reasonably OK estimate of used memory. The Used component is what you have left of your Total memory once you have removed free memory, buffers and Cached (page cache plus reclaimable slab). Notice that the unreclaimable part of slab is not in this calculation, which means it is part of the used memory. Also note this seems a bit of a hack because as the memory management gets more complicated, the estimates used become less and less real.

Available

In early 2014, the kernel developers took pity on us toolset developers and gave us a much cleaner, simpler way to work out some of these values (or at least I'd like to think that's why they did it). The available statistic is the right way to work out how much memory you have left. The commit message explains the gory details about it, but the great thing is that if they change their mind or add some new memory feature the available value should be changed as well. We don't have to worry about whether all of slab should be in Cached and whether it is part of Used or not; we have just a number directly out of meminfo.

What does this mean for free?
Poor old free is now at least 24 years old and it is based upon BSD and SunOS predecessors that go back way before then. People expect that their system tools don't change by default and show the same thing over and over. On the other side, Linux memory management has changed dramatically over those years. Maybe we're all just sheep (see, I had to mention sheep or RAMs somewhere in this) and like things to remain the same always. Probably if free was written now, it would only need the total, available and used columns with used merely being total minus available. Possibly with some other columns for the wide option. The code itself (found in libprocps) is not very hard to maintain so it's not like this change will save some time, but for me I'm unsure if free is giving the right and useful result for people that use it.
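
To make the column arithmetic described in this post concrete, here is a minimal Python sketch. It is not the procps code: it only follows the description above (Cached is page cache plus reclaimable slab, Used is Total minus Free, Buffers and Cached, Available comes straight from the kernel). The field names are the ones found in /proc/meminfo; the sample numbers are invented for illustration.

```python
# A sketch of the "free" column arithmetic described above; not the procps source.
# The values in SAMPLE_MEMINFO are made up for illustration.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:          640000 kB
MemAvailable:   11500000 kB
Buffers:          200000 kB
Cached:         10500000 kB
Shmem:            222000 kB
SReclaimable:     800000 kB
SUnreclaim:       300000 kB
"""


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Compute free-style columns (in kB) from parsed meminfo values."""
    # Cached as displayed: page cache plus the reclaimable part of the slab.
    cached = info["Cached"] + info.get("SReclaimable", 0)
    buff_cache = info["Buffers"] + cached
    return {
        "total": info["MemTotal"],
        # Unreclaimable slab is not subtracted, so it ends up inside "used".
        "used": info["MemTotal"] - info["MemFree"] - buff_cache,
        "free": info["MemFree"],
        "shared": info.get("Shmem", 0),
        "buff/cache": buff_cache,
        # Taken as-is from the kernel; absent on kernels without MemAvailable.
        "available": info.get("MemAvailable"),
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE_MEMINFO  # fall back to the illustrative sample
    for name, value in summarize(parse_meminfo(text)).items():
        print(f"{name:>10}: {value}")
```

Note how "used" is derived from the other fields while "available" is simply read from the kernel, which is the contrast this post draws.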

13 February 2016

Mark Brown: Performance problems

Just over a year ago I implemented an optimization to the SPI core code in Linux that avoids some needless context switches to a worker thread in the main data path that most clients use. This was really nice, it was simple to do but saved a bunch of work for most drivers using SPI and made things noticeably faster. The code got merged in v4.0 and that was that; I kept on kicking a few more ideas for optimizations in this area around but that was that until the past month. What happened then was that for whatever reason people started picking up v4.0 and using it in production more. On some systems people started seeing problems when there was heavy SPI flash usage, often during things like distribution installation. In some cases the lockup detector fired, but the most entertaining error was that on Marvell Orion systems (which are single core) when the flash was being heavily used the SATA controller started having trouble handling interrupts. These problems all bisected down to the key commit in that series, 0461a4149836c79 (spi: Pump transfers inside calling context for spi_sync()). The problem is that there are a number of widely deployed SPI controllers out there that don't support DMA and instead require the CPU to explicitly read and write everything sent to and from registers in the controller. To make matters worse these accesses to the controller will usually take many CPU cycles to complete, each one stalling the CPU while they happen. This is fine for short transfers or if the CPU has nothing else to do but on a busy multitasking system it's an issue. Before the optimization the switches between the worker thread interacting with the hardware and the thread initiating the SPI operations provided breaks in this activity which allowed other things to switch in. Unfortunately when we optimize those away then if there's a lot of work for the controller being done from one thread then that thread can run for a long time without pause.
The fix for affected drivers, if there are no less CPU intensive ways of driving the hardware, is to add some explicit sleeps into the driver itself, either at the end of the transfer_one() or perhaps in an unprepare_message() function. In a way I was quite pleased to see this, it was a clear demonstration that the optimization was having the intended effect, though obviously users of affected systems will not find that so comforting. It's not the first time that making things faster or fixing a bug has revealed an underlying problem, I'm sure it won't be the last.

7 January 2016

Francois Marier: Streamzap remotes and evdev in MythTV

Modern versions of Linux and MythTV enable infrared remote controls without the need for lirc. Here's how I migrated my Streamzap remote to evdev.

Installing packages

In order to avoid conflicts between evdev and lirc, I started by removing lirc and its config:
apt purge lirc
and then I installed this tool:
apt install ir-keytable

Remapping keys

While my Streamzap remote works out of the box with kernel 3.16, the keycodes that it sends to Xorg are not the ones that MythTV expects. I therefore copied the existing mapping:
cp /lib/udev/rc_keymaps/streamzap /home/mythtv/
and changed it to this:
0x28c0 KEY_0
0x28c1 KEY_1
0x28c2 KEY_2
0x28c3 KEY_3
0x28c4 KEY_4
0x28c5 KEY_5
0x28c6 KEY_6
0x28c7 KEY_7
0x28c8 KEY_8
0x28c9 KEY_9
0x28ca KEY_ESC
0x28cb KEY_MUTE
0x28cc KEY_UP
0x28ce KEY_DOWN
0x28d0 KEY_UP
0x28d1 KEY_LEFT
0x28d2 KEY_ENTER
0x28d3 KEY_RIGHT
0x28d4 KEY_DOWN
0x28d5 KEY_M
0x28d6 KEY_ESC
0x28d7 KEY_L
0x28d8 KEY_P
0x28d9 KEY_ESC
0x28da KEY_BACK # <
0x28db KEY_FORWARD # >
0x28dc KEY_R
0x28e0 KEY_D
0x28e1 KEY_I
0x28e2 KEY_END
0x28e3 KEY_A
The complete list of all EV_KEY keycodes can be found in the kernel. The following command will write this mapping to the driver:
/usr/bin/ir-keytable -w /home/mythtv/streamzap -d /dev/input/by-id/usb-Streamzap__Inc._Streamzap_Remote_Control-event-if00
and they should take effect once MythTV is restarted.
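
Before writing a hand-edited keymap to the driver, it can help to sanity-check it. The following Python sketch is only an illustration and not part of ir-keytable: it assumes the simple "scancode keycode" line format shown above, treats anything after a # as a comment, and flags scancodes that are listed twice. The function name and the sample data are made up.

```python
# Illustrative sanity check for a "scancode keycode" keymap file.
# parse_keymap() is a hypothetical helper, not part of ir-keytable.
SAMPLE_KEYMAP = """\
0x28c0 KEY_0
0x28cb KEY_MUTE
0x28cc KEY_UP
0x28d0 KEY_UP
0x28da KEY_BACK # <
"""


def parse_keymap(text):
    """Return {scancode: keycode}, raising ValueError on malformed input."""
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'scancode keycode'")
        scancode = int(parts[0], 16)
        if scancode in mapping:
            raise ValueError(f"line {lineno}: duplicate scancode {parts[0]}")
        mapping[scancode] = parts[1]
    return mapping


if __name__ == "__main__":
    for scancode, keycode in sorted(parse_keymap(SAMPLE_KEYMAP).items()):
        print(f"{scancode:#06x} -> {keycode}")
```

Several scancodes mapping to the same keycode (as with KEY_UP above) is fine; the same scancode appearing twice is what the check rejects.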

Applying the mapping at boot

While the naïve solution is to apply the mapping at boot (for example, by sticking it in /etc/rc.local), that only works if the right modules are loaded before rc.local runs. A much better solution is to write a udev rule so that the mapping is written after the driver is loaded. I created /etc/udev/rules.d/streamzap.rules with the following:
# Configure remote control for MythTV
ACTION=="add", ATTRS{idVendor}=="0e9c", ATTRS{idProduct}=="0000", RUN+="/usr/bin/ir-keytable -c -w /home/mythtv/streamzap -D 1000 -P 250 -d /dev/input/by-id/usb-Streamzap__Inc._Streamzap_Remote_Control-event-if00"
and got the vendor and product IDs using:
grep '^[IN]:' /proc/bus/input/devices
The -D and -P parameters control what happens when a button on the remote is held down and the keypress must be repeated. These delays are in milliseconds.

8 December 2015

Mark Brown: Maximising the efficiency of chained regulators

Linux v4.4 will include a cool new feature contributed by Sascha Hauer of Pengutronix which propagates voltages set on a regulator to the regulators that supply it (taking into account the minimum headroom that the child regulator needs). The original reason for implementing it was to allow us to set voltages through simple unregulated power switches but the cool bit is that we can also use this to save power in some systems. There are two standard types of voltage regulator, DCDCs which are very efficient but produce noisy output and LDOs which are much less efficient but a lot cheaper and simpler and produce much cleaner output. What a lot of systems do to avoid a lot of the inefficiency of LDOs is to use a DCDC to reduce the voltage from the main system power supply (e.g., the battery) to something close to the minimum power supply for the LDOs in the system. This means that most of the voltage reduction (which is what generates inefficiency) comes from the DCDC rather than the LDO but you still get the clean power supply from the LDO and can have several different output voltages from a single expensive DCDC. By managing the voltage we set on the DCDC at runtime depending on the LDO configurations we can maximise the power savings from this setup, putting as much of the work onto the DCDC as we can at any given moment. This has been at the back of my mind for a long time, I'm really pleased to see it implemented. It's a pretty small change code wise and probably not worth implementing for any one system but when we do it in the core like this hopefully many systems will be able to use it and the effects will add up.
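
The idea can be illustrated with a toy model. The Python sketch below is not the kernel regulator API; the class, the function names and the voltages are all invented for illustration. It assumes the supplying DCDC only has to provide the highest "output plus headroom" among the LDOs it feeds, and that an LDO's efficiency is roughly its output voltage divided by its input voltage (ignoring quiescent current).

```python
# Toy model of voltage propagation to a supplying DCDC.
# Not the kernel regulator API; all names and numbers are hypothetical.
from dataclasses import dataclass


@dataclass
class Ldo:
    name: str
    vout_mv: int       # requested output voltage, in millivolts
    headroom_mv: int   # minimum input-output difference the LDO needs


def required_supply_mv(ldos):
    """Lowest DCDC output that still satisfies every LDO in the list."""
    return max(ldo.vout_mv + ldo.headroom_mv for ldo in ldos)


def ldo_efficiency(vout_mv, vin_mv):
    """Approximate linear-regulator efficiency, ignoring quiescent current."""
    return vout_mv / vin_mv


if __name__ == "__main__":
    ldos = [Ldo("vdd_io", 1800, 300), Ldo("vdd_core", 1200, 300)]

    # With both LDOs active the DCDC must satisfy the more demanding one.
    supply = required_supply_mv(ldos)
    for ldo in ldos:
        eff = ldo_efficiency(ldo.vout_mv, supply)
        print(f"{ldo.name}: supply {supply} mV, efficiency {eff:.0%}")

    # If only vdd_core is needed, the supply can drop and efficiency rises.
    supply = required_supply_mv([ldos[1]])
    eff = ldo_efficiency(ldos[1].vout_mv, supply)
    print(f"vdd_core alone: supply {supply} mV, efficiency {eff:.0%}")
```

Recomputing the supply whenever an LDO's configuration changes is what shifts as much of the voltage drop as possible onto the efficient DCDC.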

8 October 2015

Norbert Preining: Looking at the facts: Sarah Sharp's crusade

Much has been written around the internet about this geeky kernel maintainer Sarah Sharp who left kernel development. I have now spent two hours reading through lkml posts, and want to summarize a few mails from the long thread, since most of the usual news sites just rewrap the original blog of hers without adding any background.

[Image: darth-cookie]

The whole thread evolved out of a call for stable kernel review by Greg Kroah-Hartman, where he complained about too many patches that are not actually in rc1 before going into stable:
  I'm sitting on top of over 170 more patches that have been marked for
  the stable releases right now that are not included in this set of
  releases.  The fact that there are this many patches for stable stuff
  that are waiting to be merged through the main -rc1 merge window cycle
  is worrying to me.
from where it developed into a typical Linus rant on people flagging crap for stable, followed by some jokes:
On Fri, Jul 12, 2013 at 8:47 AM, Steven Rostedt <> wrote:
> I tend to hold things off after -rc4 because you scare me more than Greg
> does ;-)
Have you guys *seen* Greg? The guy is a freakish giant. He *should*
scare you. He might squish you without ever even noticing.
and Ingo Molnar giving advice to Greg KH:
So Greg, if you want it all to change, create some _real_ threat: be frank 
with contributors and sometimes swear a bit. That will cut your mailqueue 
in half, promise!
with Greg KH taking a funny position in answering:
Ok, I'll channel my "inner Linus" and take a cue from my kids and start
swearing more.
Up to now a pretty decent and normal thread with some jokes and poking, nobody minded, and reading through it I had a good time. The thread continues with a discussion on the requirements for what to submit to stable, and some side threads on particular commits. And then, out of the blue, Social Justice Warrior (SJW) Sarah Sharp pops in with a very important contribution:
Seriously, guys?  Is this what we need in order to get improve -stable?
Linus Torvalds is advocating for physical intimidation and violence.
Ingo Molnar and Linus are advocating for verbal abuse.
Not *fucking* cool.  Violence, whether it be physical intimidation,
verbal threats or verbal abuse is not acceptable.  Keep it professional
on the mailing lists.
Let's discuss this at Kernel Summit where we can at least yell at each
other in person.  Yeah, just try yelling at me about this.  I'll roar
right back, louder, for all the people who lose their voice when they
get yelled at by top maintainers.  I won't be the nice girl anymore.
Onto which Linus answers in a great way:
That's the spirit.
Greg has taught you well. You have controlled your fear. Now, release
your anger. Only your hatred can destroy me.
Come to the dark side, Sarah. We have cookies.
On goes Sarah, gearing up in her SJW mode and starting to rant:
However, I am serious about this.  Linus, you're one of the worst
offenders when it comes to verbally abusing people and publicly tearing
their emotions apart.
I'm not going to put up with that shit any more.
Linus himself made clear what he thinks of her:
Trust me, there's a really easy way for me to curse at people: if you
are a maintainer, and you make excuses for your bugs rather than
trying to fix them, I will curse at *YOU*.
Because then the problem really is you.
It is easy to verify what Linus said, by reading the above two links and the answers of the maintainers; both agreed that it was their failure and were sorry (Mauro's answer, Rafael's answer). It is just the geeky SJW that was not even attacked (who would dare to attack a woman nowadays?). The overall reaction to her by the maintainers can be exemplified by Thomas Gleixner's post:
Just for the record. I got grilled by Linus several times over the
last years and I can't remember a single instance where it was
What follows is a nearly endless discussion with Sarah meandering around, constantly changing her opinion on what is acceptable. Linus tried to explain to her in simple words, without success; she continues to rant around. Her arguments are so weak I had nothing but a good laugh:
> Sarah, that's a pretty potent argument by Linus, that "acting 
> professionally" risks replacing a raw but honest culture with a
> polished but dishonest culture - which is harmful to developing
> good technology.
> That's a valid concern. What's your reply to that argument?
I don't feel the need to comment, because I feel it's a straw man
argument.  I feel that way because I disagree with the definition of
professionalism that people have been pushing.
To me, being "professional" means treating each other with respect.  I
can show emotion, express displeasure, be direct, and still show respect
for my fellow developers.
For example, I find the following statement to be both direct and
respectful, because it's criticizing code, not the person:
"This code is SHIT!  It adds new warnings and it's marked for stable
when it's clearly *crap code* that's not a bug fix.  I'm going to revert
this merge, and I expect a fix from you IMMEDIATELY."
The following statement is not respectful, because it targets the
"Seriously, Maintainer.  Why are you pushing this kind of *crap* code to
me again?  Why the hell did you mark it for stable when it's clearly
not a bug fix?  Did you even try to f*cking compile this?"
Fortunately, she was immediately corrected and Ingo Molnar wrote an excellent refutation (starting another funny thread) of all her emails, statements, accusations (all of the email is a good read):
_That_ is why it might look to you as if the person was
attacked, because indeed the actions of the top level maintainer were
wrong and are criticised.
... and now you want to 'shut down' the discussion. With all due respect,
you started it, you have put out various heavy accusations here and elsewhere,
so you might as well take responsibility for it and let the discussion be
brought to a conclusion, wherever that may take us, compared to your initial view?
(He retracted that last statement, though I don't see a reason for it.) Last but not least, let us return to her blog post, where she states herself that:
FYI, comments will be moderated by someone other than me. As this is my blog, not a
government entity, I have the right to replace any comment I feel like with 
"fart fart fart fart".
and she made lots of use of it, I counted at least 10 instances. She seems to remove or "fart fart fart" any comment that is not in line with her opinion. Further evidence is provided by this post on lkml. Everyone is free to have his own opinion (sorry, his/her), and I am free to form my own opinion on Sarah Sharp by just simply reading the facts. I am more than happy that one more SJW has left Linux development, as the proliferation of cleaning of speech from any personality has taken too far a grip. Coming to my home-base in Debian, unfortunately there is no one in the position and the state of mind of Linus, so we are suffering the same stupidities imposed by social justice worriers and some brainless feminists (no, don't get me wrong, these are two independent attributes. I do NOT state that feminism is brainless) that Linus and the maintainer crew were able to fend off this time. I finish with my favorite post from that thread, by Steven Rostedt (from whom I also stole the above image!):
On Tue, 2013-07-16 at 18:37 -0700, Linus Torvalds wrote:
> Emotions aren't bad. Quite the reverse. 
Spock and Dr. Sheldon Cooper strongly disagree.
Post Scriptum (after a bike ride)

The last point by Linus is what I criticize most on Debian nowadays, it has become a sterilized over-governed entity, where most fun is gone. Making fun is practically forbidden, since there is the slight chance that some minority somewhere on this planet might feel hurt, and by this we are breaking the CoC. Emotions are restricted to the "Happy happy how nice we are and how good we are" level of US and also Japanese self-reinforcement society.

Post Post Scriptum

I just read Sarah Sharp's post on "What makes a good community?", and without giving a full account or review, I am just pi**ed by the usage of the word microaggressions. I can only recommend everyone to read this article and this article to get a feeling how bs the idea of microaggressions has taken over academia, and obviously not only academia.

Post³ Scriptum

I am happy to see Lars Wirzenius, Gunnar Wolf, and Martín Ferrari opposing my view. I agree with them that my comments concerning Debian are not mainstream in Debian, something that is not very surprising, though, and I think it is great that they have fun in Debian, like many other contributors.

Post⁴ Scriptum

Although nobody will read this, here is a great response from a female developer:
[...] To Linus: You're a hero to many of us. Don't change. Please. You DO
NOT need to take time away from doing code to grow a pair of breasts
and judge people's emotional states: [...]
Nothing to add here!

6 October 2015

Matthew Garrett: Going my own way

Reaction to Sarah's post about leaving the kernel community was a mixture of terrible and touching, but it's still one of those things that almost certainly won't end up making any kind of significant difference. Linus has made it pretty clear that he's fine with the way he behaves, and nobody's going to depose him. That's unfortunate, because earlier today I was sitting in a presentation at Linuxcon and remembering how much I love the technical side of kernel development. "Remembering" is a deliberate choice of word - it's been increasingly difficult to remember that, because instead I remember having to deal with interminable arguments over the naming of an interface because Linus has an undying hatred of BSD securelevel, or having my name forever associated with the deepthroating of Microsoft because Linus couldn't be bothered asking questions about the reasoning behind a design before trashing it.

In the end it's a mixture of just being tired of dealing with the crap associated with Linux development and realising that by continuing to put up with it I'm tacitly encouraging its continuation, but I can't be bothered any more. And, thanks to the magic of free software, it turns out that I can avoid putting up with the bullshit in the kernel community and get to work on the things I'm interested in doing. So here's a kernel tree with patches that implement a BSD-style securelevel interface. Over time it'll pick up some of the power management code I'm still working on, and we'll see where it goes from there. But, until there's a significant shift in community norms on LKML, I'll only be there when I'm being paid to be there. And that's improved my mood immeasurably.

(Edited to add a context link for the "deepthroating of Microsoft" reference)


30 August 2015

Dirk Eddelbuettel: RcppGSL 0.3.0

A new version of RcppGSL just arrived on CRAN. The RcppGSL package provides an interface from R to the GNU GSL using our Rcpp package. Following on the heels of an update last month we updated the package (and its vignette) further. One of the key additions concerns memory management: given that our proxy classes around the GSL vector and matrix types are real C++ objects, we can monitor their scope and automagically call free() on them rather than insisting on the user doing it. This renders code much simpler as illustrated below. Dan Dillon added const correctness over a series of pull requests, which allows us to write more standard (and simply nicer) function interfaces. Lastly, a few new typedef declarations further simplify the use of the (most common) double and int vectors and matrices. Maybe a code example will help. RcppGSL contains a full and complete example package illustrating how to write a package using the RcppGSL facilities. It contains an example of computing a column norm -- which we blogged about before when announcing a much earlier version. In its full glory, it looks like this:
#include <RcppGSL.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
extern "C" SEXP colNorm(SEXP sM) {
  try {
        RcppGSL::matrix<double> M = sM;     // create gsl data structures from SEXP
        int k = M.ncol();
        Rcpp::NumericVector n(k);           // to store results
        for (int j = 0; j < k; j++) {
            RcppGSL::vector_view<double> colview = gsl_matrix_column (M, j);
            n[j] = gsl_blas_dnrm2(colview);
        }
        M.free();                           // manually release the GSL matrix
        return n;                           // return vector
  } catch( std::exception &ex ) {
        forward_exception_to_r( ex );
  } catch(...) {
        ::Rf_error( "c++ exception (unknown reason)" );
  }
  return R_NilValue; // -Wall
}
We manually translate the SEXP coming from R, manually cover the try and catch exception handling, manually free the memory, and so on. Well, in the current version, the example is written as follows:
#include <RcppGSL.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
// [[Rcpp::export]]
Rcpp::NumericVector colNorm(const RcppGSL::Matrix & G) {
    int k = G.ncol();
    Rcpp::NumericVector n(k);           // to store results
    for (int j = 0; j < k; j++) {
        RcppGSL::VectorView colview = gsl_matrix_const_column (G, j);
        n[j] = gsl_blas_dnrm2(colview);
    }
    return n;                           // return vector
}
This takes full advantage of Rcpp Attributes automagically creating the interface and exception handler (as per the previous release), adds a const & interface, does away with the tedious and error-prone free() and uses the shorter typedef forms RcppGSL::Matrix and RcppGSL::VectorView for double variables. Now the function is short and concise and hence easier to read and maintain. The package vignette has more details on using RcppGSL. The NEWS file entries follow below:
Changes in version 0.3.0 (2015-08-30)
  • The RcppGSL matrix and vector classes now keep track of object allocation and can therefore automatically free allocated objects in the destructor. Explicit use is still supported.
  • The matrix and vector classes now support const reference semantics in the interfaces (thanks to PR #7 by Dan Dillon)
  • The matrix_view and vector_view classes are reorganized to better support const arguments (thanks to PR #8 and #9 by Dan Dillon)
  • Shorthand forms such as RcppGSL::Matrix have been added for double and int vectors and matrices including views.
  • Examples such as fastLm can now be written in a much cleaner and shorter way as GSL objects can appear in the function signature and without requiring explicit .free() calls at the end.
  • The included examples, as well as the introductory vignette, have been updated accordingly.
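To make the first NEWS entry a little more concrete, here is a minimal sketch in plain C++ of the general idea: a proxy class owns a C-allocated object and releases it in its destructor, so a function taking it by const reference never needs an explicit free(). This is not the RcppGSL implementation and does not use the GSL; the names c_matrix, c_matrix_alloc, c_matrix_free, ScopedMatrix and live_matrices are hypothetical stand-ins invented for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for a C library's matrix type (NOT the GSL API).
struct c_matrix {
    std::size_t rows, cols;
    double *data;                         // row-major storage
};

static int live_matrices = 0;             // allocations not yet freed

c_matrix *c_matrix_alloc(std::size_t rows, std::size_t cols) {
    c_matrix *m = static_cast<c_matrix *>(std::malloc(sizeof(c_matrix)));
    if (!m) throw std::bad_alloc();
    m->rows = rows;
    m->cols = cols;
    m->data = static_cast<double *>(std::calloc(rows * cols, sizeof(double)));
    if (!m->data) { std::free(m); throw std::bad_alloc(); }
    ++live_matrices;
    return m;
}

void c_matrix_free(c_matrix *m) {
    std::free(m->data);
    std::free(m);
    --live_matrices;
}

// Scoped proxy: owns the C object and frees it when it goes out of scope.
class ScopedMatrix {
public:
    ScopedMatrix(std::size_t rows, std::size_t cols)
        : m_(c_matrix_alloc(rows, cols)) {}
    ~ScopedMatrix() { c_matrix_free(m_); }
    ScopedMatrix(const ScopedMatrix &) = delete;             // single owner
    ScopedMatrix &operator=(const ScopedMatrix &) = delete;

    std::size_t nrow() const { return m_->rows; }
    std::size_t ncol() const { return m_->cols; }
    double &operator()(std::size_t i, std::size_t j) {
        assert(i < m_->rows && j < m_->cols);
        return m_->data[i * m_->cols + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < m_->rows && j < m_->cols);
        return m_->data[i * m_->cols + j];
    }
private:
    c_matrix *m_;
};

// Euclidean norm of each column; the const reference means no copy is made
// and the caller's object is not modified or freed here.
std::vector<double> colNorm(const ScopedMatrix &G) {
    std::vector<double> n(G.ncol());
    for (std::size_t j = 0; j < G.ncol(); j++) {
        double sumsq = 0.0;
        for (std::size_t i = 0; i < G.nrow(); i++) sumsq += G(i, j) * G(i, j);
        n[j] = std::sqrt(sumsq);
    }
    return n;
}
```

Creating a ScopedMatrix inside a block, filling it and passing it to colNorm() leaves nothing to clean up by hand: once the block ends, the destructor has run and the live_matrices counter is back to zero. Copying is disabled in this sketch simply to keep ownership unambiguous.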
Courtesy of CRANberries, a summary of changes to the most recent release is available. More information is on the RcppGSL page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

23 July 2015

Daniel Pocock: Unpaid work training Google's spam filters

This week, there has been increased discussion about the pain of spam filtering by large companies, especially Google. It started with Google's announcement that they are offering a service for email senders to know if their messages are wrongly classified as spam. Two particular things caught my attention: the statement that less than 0.05% of genuine email goes to the spam folder by mistake and the statement that this new tool to understand misclassification is only available to "help qualified high-volume senders". From there, discussion has proceeded with Linus Torvalds blogging about his own experience of Google misclassifying patches from Linux contributors as spam and that has been widely reported in places like Slashdot and The Register. Personally, I've observed much the same thing from the other perspective. While Torvalds complains that he isn't receiving email, I've observed that my own emails are not always received when the recipient is a Gmail address. It seems that Google expects their users to work a little bit every day going through every message in the spam folder and explicitly clicking the "Not Spam" button so that Google can improve their proprietary algorithms for classifying mail. If you just read or reply to a message in the folder without clicking the button, or if you don't do this for every message, including mailing list posts and other trivial notifications that are not actually spam, more important messages from the same senders will also continue to be misclassified. If you are not willing to volunteer your time to do this, or if you are simply one of those people who has better things to do, Google's Gmail service is going to have a corrosive effect on your relationships. A few months ago, we visited Australia and I sent emails to many people who I wanted to catch up with, including invitations to a family event. 
Some people received the emails in their inboxes yet other people didn't see them because the systems at Google (and other companies, notably Hotmail) put them in a spam folder. The rate at which this appeared to happen was definitely higher than the 0.05% quoted in the Google article above. Maybe the Google spam filters noticed that I haven't sent email to some members of the extended family for a long time and this triggered the spam algorithm? Yet it is precisely when we are visiting Australia that email needs to work reliably with that type of contact, as we don't fly out there every year. A little bit earlier in the year, I was corresponding with a few students who were applying for Google Summer of Code. Some of them observed the same thing: they sent me an email and didn't see my response until they looked in their spam folder a few days later. Last year, a GSoC mentor I know lost track of a student for over a week because Google was silently discarding chat messages, so it appears Google has not just shot themselves in the foot, they managed to shoot their foot twice. What is remarkable is that in both cases, the email problems and the XMPP problems, Google doesn't send any error back to the sender so that they know their message didn't get through. Instead, it is silently discarded or left in a spam folder. This is the most corrosive form of communication problem as more time can pass before anybody realizes that something went wrong. After it happens a few times, people lose a lot of confidence in the technology itself and try other means of communication which may be more expensive, more synchronous and time intensive or less private. When I discussed these issues with friends, some people replied by telling me I should send them things through Facebook or WhatsApp, but each of those services has a higher privacy cost and there are also many other people who don't use either of those services. 
This tends to fragment communications even more as people who use Facebook end up communicating with other people who use Facebook and excluding all the people who don't have time for Facebook. On top of that, it creates more tedious effort going to three or four different places to check for messages. Despite all of this, the suggestion that Google's only response is to build a service to "help qualified high-volume senders" get their messages through leaves me feeling that things will get worse before they start to get better. There is no mention in the Google announcement about what they will offer to help the average person eliminate these problems, other than to stop using Gmail or spend unpaid time meticulously training the Google spam filter and hoping everybody else does the same thing.

Some more observations on the issue
  • Many spam filtering programs used in corporate networks, such as SpamAssassin, add headers to each email to suggest why it was classified as spam. Google's systems don't appear to give any such feedback to their users or message senders though, just a very basic set of recommendations for running a mail server.
  • Many chat protocols work with an explicit opt-in. Before you can exchange messages with somebody, you must add each other to your buddy lists. Once you do this, virtually all messages get through without filtering. Could this concept be adapted to email, maybe giving users a summary of messages from people they don't have in their contact list and asking them to explicitly accept or reject each contact?
  • If a message spends more than a week in the spam folder and Google detects that the user isn't ever looking in the spam folder, should Google send a bounce message back to the sender to indicate that Google refused to deliver it to the inbox?
  • I've personally heard that misclassification occurs with mailing list posts as well as private messages.