Search Results: "wijnen"

14 December 2016

Antoine Beaupré: Django debates privacy concern

In recent years, privacy issues have become a growing concern among free-software projects and users. As more and more software tasks become web-based, surveillance and tracking of users is also on the rise. While some software may use advertising as a source of revenue, which has the side effect of monitoring users, the Django community recently got into an interesting debate surrounding a proposal to add user tracking (actually, developer tracking) to the popular Python web framework.

Tracking for funding

A novel aspect of this debate is that the initiative comes from concerns of the Django Software Foundation (DSF) about funding. The proposal suggests that "relying on the free labor of volunteers is ineffective, unfair, and risky" and states that "the future of Django depends on our ability to fund its development". In fact, the DSF recently hired an engineer to help oversee Django's development, which has been quite successful in helping the project make timely releases with fewer bugs. Various fundraising efforts have resulted in major new Django features, but it is difficult to attract sponsors without some hard data on the usage of Django.

The proposed feature tries to count the number of "unique developers" and gather some metrics of their environments by using Google Analytics (GA) in Django. The actual proposal (DEP 8) is done as a pull request, which is part of the Django Enhancement Proposal (DEP) process that is similar in spirit to the Python Enhancement Proposal (PEP) process. DEP 8 was brought forward by a longtime Django developer, Jacob Kaplan-Moss. The rationale is that "if we had clear data on the extent of Django's usage, it would be much easier to approach organizations for funding".

The proposal is essentially about adding code in Django to send a certain set of metrics when "developer" commands are run. The system would be "opt-out", enabled by default unless turned off, although the developer would be warned the first time the phone-home system is used. The proposal notes that an opt-in system "severely undercounts" and is therefore not considered "substantially better than a community survey" that the DSF is already doing.
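The opt-out behaviour described above can be pictured as a small decision function. This is only a sketch under stated assumptions, not code from DEP 8: the environment variable name, the marker file, and the function itself are hypothetical and chosen for illustration.

```python
from pathlib import Path

# Hypothetical names: neither the variable nor the marker file comes from DEP 8.
OPT_OUT_ENV = "DJANGO_NO_ANALYTICS"
DEVELOPER_COMMANDS = {"startproject", "startapp", "runserver"}


def should_report(command, environ, marker):
    """Return (send, warn) for one command invocation.

    send: whether metrics would be sent for this command.
    warn: whether the first-run notice should be shown beforehand.
    """
    if command not in DEVELOPER_COMMANDS:
        return False, False  # only developer commands are counted
    if environ.get(OPT_OUT_ENV):
        return False, False  # the developer has opted out
    first_run = not marker.exists()
    if first_run:
        # Remember that the notice has been shown once.
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()
    return True, first_run
```

Passing the environment and the marker path as arguments keeps the sketch easy to test; a real implementation would presumably read `os.environ` and a per-user configuration directory.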

Information gathered

The reporting is specifically designed to run only in a developer's environment and not in production. The metrics identified are, at the time of writing:
  • an event category (the developer commands: startproject, startapp, runserver)
  • the HTTP User-Agent string identifying the Django, Python, and OS versions
  • a user-specific unique identifier (a UUID generated on first run)
The proposal mentions the use of the GA aip flag which, according to GA documentation, makes "the IP address of the sender 'anonymized'". It is not quite clear how that is done at Google and, given that it is a proprietary platform, there is no way to verify that claim. The proposal says it means that "we can't see, and Google Analytics doesn't store, your actual IP". But that is not actually what Google does: GA stores IP addresses, the documentation just says they are anonymized, without explaining how. GA is presented as a trade-off, since "Google's track record indicates that they don't value privacy nearly as high" as the DSF does. The alternative, deploying its own analytics software, was presented as making sustainability problems worse. According to the proposal, Google "can't track Django users. [...] The only thing Google could do would be to lie about anonymizing IP addresses, and attempt to match users based on their IPs". The truth is that we don't actually know what Google means when it "anonymizes" data: Jannis Leidel, a Django team member, commented that "Google has previously been subjected to secret US court orders and was required to collaborate in mass surveillance conducted by US intelligence services" that limit even Google's capacity of ensuring its users' anonymity. Leidel also argued that the legal framework of the US may not apply elsewhere in the world: "for example the strict German (and by extension EU) privacy laws would exclude the automatic opt-in as a lawful option". Furthermore, the proposal claims that "if we discovered Google was lying about this, we'd obviously stop using them immediately", but it is unclear exactly how this could be implemented if the software was already deployed. There are also concerns that an implementation could block normal operation, especially in countries (like China) where Google itself may be blocked. 
Finally, some expressed concerns that the information could constitute a security problem, since it would unduly expose the version number of Django that is running.
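To make the three metrics and the aip flag concrete, here is a minimal sketch of how such a hit might be assembled, assuming the form-encoded parameters of GA's Measurement Protocol as it existed for Universal Analytics (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `aip`, `ua`). The proposal does not spell out an exact payload, so the tracking ID, the `ea` value, the User-Agent layout, and the function name are all assumptions. Nothing is sent over the network.

```python
import platform
import uuid
from urllib.parse import urlencode

# Placeholder property ID; a real deployment would use the project's own.
TRACKING_ID = "UA-XXXXXXX-Y"


def build_payload(command, client_id, django_version):
    """Build a form-encoded event hit without sending it anywhere."""
    # Assumed layout for the User-Agent string; the proposal only says it
    # identifies the Django, Python, and OS versions.
    user_agent = "Django/{} Python/{} {}".format(
        django_version, platform.python_version(), platform.system()
    )
    params = {
        "v": "1",            # protocol version
        "tid": TRACKING_ID,  # property receiving the hit
        "cid": client_id,    # the per-developer UUID
        "t": "event",        # hit type
        "ec": command,       # event category: the developer command
        "ea": "run",         # event action; placeholder value for this sketch
        "aip": "1",          # ask GA to anonymize the sender's IP
        "ua": user_agent,    # User-Agent override
    }
    return urlencode(params)


client_id = str(uuid.uuid4())  # generated once on first run, then reused
print(build_payload("runserver", client_id, "1.10"))
```

The sketch also illustrates the security concern raised above: the Django version travels with every hit.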

In other projects

Django is certainly not the first project to consider implementing analytics to get more information about its users. The proposal is largely inspired by a similar system implemented by the OS X Homebrew package manager, which has its own opt-out analytics. Other projects embed GA code directly in their web pages. This is apparently the option chosen by the Oscar Django-based ecommerce solution, but the DSF saw that as less useful, since it would count Django administrators rather than developers. Wagtail, a Django-based content-management system, was incorrectly identified as using GA directly, as well. It actually uses referrer information to identify installed domains through its version update checks, with opt-out. Wagtail didn't use GA because the project wanted only minimal data and it was worried about users' reactions.

NPM, the JavaScript package manager, also considered similar tracking extensions. Laurie Voss, the co-founder of NPM, said it decided to completely avoid phoning home, because "users would absolutely hate it". But NPM users are constantly downloading packages to rebuild applications from scratch, so it has more complete usage metrics, which are aggregated and available via a public API. NPM users seem to find this is a "reasonable utility/privacy trade". Some NPM packages do phone home and have seen "very mixed" feedback from users, Voss said.

Eric Holscher, co-founder of Read the Docs, said the project is considering using Sentry for centralized reporting, which is a different idea, but interesting considering Sentry is fully open source. So even though it is a commercial service (as opposed to the closed-source Google Analytics), it may be possible to verify any anonymity claims.

Debian's response

Since Django is shipped with Debian, one concern was the reaction of the distribution to the change. Indeed, "major distros' positions would be very important for public reception" to the feature, another developer stated. One of the current maintainers of Django in Debian, Raphaël Hertzog, explicitly stated from the start that such a system would "likely be disabled by default in Debian". There were two short discussions on Debian mailing lists where the overall consensus seemed to be that any opt-out tracking code was undesirable in Debian, especially if it was aimed at Google servers.

I have done some research to see what, exactly, was acceptable as a phone-home system in the Debian community. My research has revealed ten distinct bug reports against packages that would unexpectedly connect to the network, most of which were not directly about collecting statistics but more often about checking for new versions. In most cases I found, the feature was disabled. In the case of version checks, it seems right for Debian to disable the feature, because the package cannot upgrade itself: that task is delegated to the package manager. One of those issues was the infamous "OK Google" voice activation binary blob controversy that was previously reported here and has since then been fixed (although other issues remain in Chromium).

I have also found out that there is no clearly defined policy in Debian regarding tracking software. What I have found, however, is that there seems to be a strong consensus in Debian that any tracking is unacceptable. This is, for example, an extract of a policy that was drafted (but never formally adopted) by Ian Jackson, a longtime Debian developer:
Software in Debian should not communicate over the network except: in order to, and as necessary to, perform their function[...]; or for other purposes with explicit permission from the user.
In other words, opt-in only, period. Jackson explained that "when we originally wrote the core of the policy documents, the DFSG [Debian Free Software Guidelines], the SC [Social Contract], and so on, no-one would have considered this behaviour acceptable", which explains why no explicit formal policy has been adopted yet in the Debian project. One of the concerns with opt-out systems (or even prompts that default to opt-in) was well explained back then by Debian developer Bas Wijnen:
It very much resembles having to click through a license for every package you install. One of the nice things about Debian is that the user doesn't need to worry about such things: Debian makes sure things are fine.
One could argue that Debian has its own tracking systems. For example, by default, Debian will "phone home" through the APT update system (though it only reports the packages requested). However, this is currently not automated by default, although there are plans to do so soon. Furthermore, Debian members do not consider APT as tracking, because it needs to connect to the network to accomplish its primary function. Since there are multiple distributed mirrors (which the user gets to choose when installing), the risk of surveillance and tracking is also greatly reduced. A better parallel could be drawn with Debian's popcon system, which actually tracks Debian installations, including package lists. But as Barry Warsaw pointed out in that discussion, "popcon is 'opt-in' and [...] the overwhelming majority in Debian is in favour of it in contrast to 'opt-out'". It should be noted that popcon, while opt-in, defaults to "yes" if users click through the install process. [Update: As pointed out in the comments, popcon actually defaults to "no" in Debian.] There are around 200,000 submissions at this time, which are tracked with machine-specific unique identifiers that are submitted daily. Ubuntu, which also uses the popcon software, gets around 2.8 million daily submissions, while Canonical estimates there are 40 million desktop users of Ubuntu. This would mean there is about an order of magnitude more installations than what is reported by popcon. Policy aside, Warsaw explained that "Debian has a reputation for taking privacy issues very serious and likes to keep it".

Next steps

There are obviously disagreements within the Django project about how to handle this problem. It looks like the phone-home system may end up being implemented as a proxy system "which would allow us to strip IP addresses instead of relying on Google to anonymize them, or to anonymize them ourselves", another Django developer, Aymeric Augustin, said. Augustin also stated that the feature wouldn't "land before Django drops support for Python 2", which is currently estimated to be around 2020. It is unclear, then, how the proposal would resolve the funding issues, considering how long it would take to deploy the change and then collect the information so that it can be used to spur the funding efforts.

It also seems the system may explicitly prompt the user, with an opt-out default, instead of just splashing a warning or privacy agreement without a prompt. As Shai Berger, another Django contributor, stated, "you do not get [those] kind of numbers in community surveys". Berger also made the argument that "we trust the community to give back without being forced to do so"; furthermore:
I don't believe the increase we might get in the number of reports by making it harder to opt-out, can be worth the ill-will generated for people who might feel the reporting was "sneaked" upon them, or even those who feel they were nagged into participation rather than choosing to participate.
Other options may also include gathering metrics in pip or PyPI, which was proposed by Donald Stufft. Leidel also proposed that the system could ask to opt-in only after a few times the commands are called. It is encouraging to see that a community can discuss such issues without heating up too much and shows great maturity for the Django project. Every free-software project may be confronted with funding and sustainability issues. Django seems to be trying to address this in a transparent way. The project is willing to engage with the whole spectrum of the community, from the top leaders to downstream distributors, including individual developers. This practice should serve as a model, if not of how to do funding or tracking, at least of how to discuss those issues productively. Everyone seems to agree the point is not to surveil users, but improve the software. As Lars Wirzenius, a Debian developer, commented: "it's a very sad situation if free software projects have to compromise on privacy to get funded". Hopefully, Django will be able to improve its funding without compromising its principles.
Note: this article first appeared in the Linux Weekly News.

25 April 2008

Lucas Nussbaum: Various stuff

New QA website

I modified qa.debian.org’s stylesheet/template, using the PTS’s stylesheet as a basis. It looks a bit better. The content was also updated, so we should stop receiving totally outdated answers to the “What does the QA team do?” question in NM. Now, who is going to do the same thing with www.debian.org? :-)

Closing bugs in removed packages

When packages are removed from unstable and testing, their bugs are not necessarily marked as closed, so they can’t be archived. A few days ago, there were about 3300 open bugs filed against removed packages. Thanks to the work of Barry deFreese, Marco Rodrigues and Raphael Geissert, we are now down to ~2500 bugs. If you want to help, just drop in #debian-qa and ask about our scripts/process. (There are some tricky details.)

DEP #1: NMUs

With Bas Wijnen, we finally announced the DEP about NMUs we have been working on. Please join the (currently very quiet) discussion!

29 August 2007

Miriam Ruiz: Internationalized hex-a-hop

The first serious i18n effort for the Games Team is now bearing fruit. The newer version of the game hex-a-hop is now entering Debian. All the merit goes to Jens Seidel, who has developed the patches for making it work with SDLPango and to support the full spectrum of Unicode characters, instead of the limited ASCII set included in the game, as well as to all the people of Debian i18n who have done the translations (Helge Kreutzmann, Damyan Ivanov, Enrique Matías Sánchez, Bas Wijnen, Piotr Engelking, Yuri Kozlov and Clytie Siddall). The game has already been translated into Bulgarian, German, Spanish, Dutch, Polish, Russian and Vietnamese. Thanks Jens, both for taking care of the changes in the code needed to achieve this, and also for coordinating the whole l10n and i18n process, as I don’t really have much experience in those areas.

4 March 2006

Michael Banck: 3 Mar 2006

FOSDEM 2006

This year, the days before FOSDEM were the stressful ones, as I got to organize accommodation. Initially, we wanted to have apartments similar to last year's, but by the time I was less busy at uni to actually look into it, most of them were already booked, so we had to put up with a youth hostel instead. The positive sides of this were the much lower expenses and a location in the city centre, making us actually look at Bruxelles a bit in detail this time. "Us" were the Hurd people, including Martin "earliest Hurd adopter present" Michlmayr.

I got to FOSDEM by car again, picking up Marcus Brinkmann, Neal Walfield and Olaf Buddenhagen on the way in Cologne. Finding the youth hostel seemed to be pretty hard as we just had a street address and a map without street names, but we managed to find it pretty quickly to my great surprise (driving around in Bruxelles usually ended up being a complete disaster over the last years). After a strange encounter with a Guillem Jover lookalike in front of the hostel, we met the other guys (Thomas Schwinge, Marco Gerards, Stefan Siegl and Ognyan Kulev) and had a discussion about Neal's and Marcus' plan to move to a persistent system.

After dinner, I met the other Debian people in the Roi d'Espagne and had some longer chats with Jeroen van Wolffelaar, Rob Bradford, Martin Michlmayr and Jordi Mallach, who I finally met for the first time and who did not cop out of FOSDEM this year as usual... The pub is getting more and more crowded each year; all the hackers barely fit even though they opened the balustrade this time as well. It was great to see everybody again and have a few beers. Martin and I then managed to find the way back to the hostel on foot.

We had no developer room, and no talks in the Debian room either, so FOSDEM was a pretty relaxed event this year. I met some more familiar faces like Noel Koethe and Andreas Mueller and listened to a couple of talks, most notably Richard Stallman's and Jeff Waugh's keynotes and Hanna Wallach's talk about FLOSSPOLS. Stefan Siegl also managed to get GNU Mach working for both my 3Com PCMCIA NIC and my Orinoco PCMCIA WLAN card, confirming his title as Hurd "hacker of the month".

On Saturday evening, we (at this time, Guillem Jover, Gianluca Guida, Bas Wijnen and Jeroen Dekkers had joined) had dinner with the French Hurd guys (Manuel Menal, Marc Dequenes, Richard Braun, Arnaud Fontaine and others) in an Italian restaurant. At 10:40 PM, the waiter told us in a rather unfriendly tone that they would close at 11 and presented us with the bill, along with handing out the menu again so that we could look up our share. By the time the bill arrived at the French part of the table (at 10:55 PM), the guys were pretty surprised by this whole business, complained loudly that they did not have a dessert yet and insisted on having one. After some more minutes of discussion, the waiter gave in and served their desserts, after which each of them paid his share with his carte bleue. I believe we left the restaurant around 11:30.

On Sunday evening, we had dinner again (the French guys had left Bruxelles already) and then drove back to Germany after having desserts and coffee in a bar. We left Bruxelles at around midnight and arrived in Duesseldorf at 2:30 AM, so we were glad that Neal offered to let us stay at his place. We had breakfast the next morning with him and Isabel and then I proceeded to drive back to Frankfurt in the early afternoon.

FOSDEM rocked, as usual. After being with the Debian crowd for the first three years or so, and mostly sticking with the Hurd crowd last year, I think I managed a pretty good balance between the two this year. This will not have been my last FOSDEM.