It's Sunday and I'm now sitting on the train from Brest to Paris, where I will change for Germany, on the way back from the annual Debian conference. A full week of presentations, discussions, talks and socializing lies behind me and my head is still spinning from the intensity.
Pollito and the gang of DebConf mascots wearing their conference badges (photo: Christoph Berg)
Sunday, July 13th
It started last Sunday with traveling to the conference. I got on the Eurostar in Duisburg and we left on time, but even before reaching Cologne, the train was already one hour delayed for external reasons, collecting yet another hour between Aachen and Liège due to its own technical problems. "The train driver is working on trying to fix the problem." My original schedule had well over two hours for changing train stations in Paris, but being that late, I missed the connection to Brest in Montparnasse. At least in the end, the total delay was only one hour when finally arriving at the destination. Due to the French quatorze juillet fireworks approaching, buses in Brest were rerouted, but I managed to catch the right bus to the conference venue, already meeting a few Debian people on the way.
The conference was hosted at the IMT Atlantique Brest campus, giving the event a nice university touch. I arrived shortly after 10 in the evening and after settling down a bit, got on one of the "magic" buses for transportation to the camping site where half of the attendees were stationed. I shared a mobile home with three other Debianites, where I got a small room for myself.
Monday, July 14th
The next morning, we took the bus back to the venue for a small breakfast and the opening session, where Enrico Zini invited me to come to his and Nicolas Dandrimont's session about Debian community governance and curation, which I gladly did. Many ideas about conflict moderation and community steering were floated around. I hope some of that can be put into effect to make flamewars on the mailing lists less heated and more directed. After that, I attended Olly Betts' "Stemming with Snowball" session; Snowball is the stemmer that is also used in PostgreSQL. Text search is one of the areas in PostgreSQL that I never really looked closely at, including the integration into the postgresql-common package, so it was nice to get more information about that.
In preparation for the conference, a few of us Ham radio operators in Debian had decided to bring some radio gear to DebConf this year in order to perhaps spark more interest in our hobby among the fellow geeks. In the afternoon after the talks, I found a quieter spot just outside of the main hall and set up a shortwave antenna by attaching a 10m mast to one of the park benches there. The 40m band was still pretty much closed, but I could work a few stations from England, just across the channel from Bretagne, answering questions from interested Debian people passing by between the contacts. Over time, the band opened and more European stations got into the log.
F/DF7CB in Brest (photo: Evangelos Ribeiro Tzaras)
Tuesday, July 15th
Tuesday started with Helmut Grohne's session about "Reviving (un)schroot". The schroot program has been Debian's standard way of managing build chroots for a long time, but it is more and more being regarded as obsolete with all kinds of newer containerization and virtualization technologies taking over. Since many bits of Debian infrastructure depend on schroot, and its user interface is still very useful, Helmut reimplemented it using Linux namespaces and the "unshare" system call. I had already worked with him at the Hamburg Minidebconf to replace the apt.postgresql.org buildd machinery with the new system, but we were not quite there yet (network isolation is nice, but we still sometimes need proper networking), so it was nice to see the effort is still progressing and I will give his new scripts a try when I'm back home.
Next, Stefano Rivera and Colin Watson presented Debusine, a new package repository and workflow management system. It looks very promising for anyone running their own repository, so perhaps yet another bit of apt.postgresql.org infrastructure to replace in the future. After that, I went to the Debian LTS BoF session by Santiago Ruano Rincón and Bastien Roucariès - Debian releases plus LTS is what we are covering with apt.postgresql.org. Then there were bits from the DPL (Debian Project Leader), and a session moderated by Stefano Rivera, interesting to me as a member of the Debian Technical Committee, on the future structure of the packages required for cross-building in Debian, a topic which had been brought to the TC a while ago. I am happy that we could resolve the issue without having to issue a formal TC ruling as the involved parties (kernel, glibc, gcc and the cross-build people) found a promising way forward themselves. DebConf is really a good way to get such issues unstuck.
Ten years ago at the 2015 Heidelberg DebConf, Enrico had given a seminal "Semi-serious stand-up comedy" talk, drawing parallels between the Debian Open Source community and the BDSM community - "People doing things consensually together". (Back then, the talk was announced as "probably unsuitable for people of all ages".) With his unique presentation style and witty insights, the session made a lasting impression on everyone attending. Now, ten years later (and he and many in the audience being ten years older), he gave an updated version of it. We are now looking forward to the sequel in 2035. The evening closed with the famous DebConf tradition of the Cheese & Wine party in an old fort next to the coast, just below the conference venue. Even though he's a fellow Debian Developer, Ham and also TC member, I had never met Paul Tagliamonte in person before, but we spent most of the evening together geeking out on all things Debian and Ham radio.
The northern coast of Ushant (photo: Christoph Berg)
Wednesday, July 16th
Wednesday already marked the end of the first half of the week, the day of the day trips. I had chosen to go to Ouessant island (Ushant in English), which marks the western end of the French mainland and hosts one of the lighthouses guiding the way into the English Channel. The ferry trip included surprisingly big waves which left some participants seasick, but everyone recovered fast. After around one and a half hours we arrived, picked up the bicycles, and spent the rest of the day roaming the island. The weather forecast was originally very cloudy and 18 °C, but around noon this turned into sunny and warm, so many got an unplanned sunburn. I enjoyed the trip very much - it made up for not having time to visit the city during the week. After returning, we spent the rest of the evening playing DebConf's standard game, Mao (spoiler alert: don't follow the link if you ever intend to play).
Having a nice day (photo: Christoph Berg)
Thursday, July 17th
The next day started with the traditional "Meet the Technical Committee" session. This year, we trimmed the usual slide deck down to remove the boring boilerplate parts, so after a very short introduction to the work of the committee by our chairman Matthew Vernon, we opened up the discussion with the audience, with seven (out of eight) TC members on stage. I think the format worked very well, with good input from attendees. Next up was "Don't fear the TPM" by Jonathan McDowell. A common misconception in the Free Software community is that the TPM is evil DRM hardware working against the user, but while it could in theory be used that way, the necessary TPM attestations seem to be impossible to attain in practice, so that wouldn't happen anyway. Instead, it is a crypto coprocessor present in almost all modern computers that can be used to hold keys, for example to be used for SSH. It will also be interesting to research if we can make use of it for holding the Transparent Data Encryption keys for CYBERTEC's PostgreSQL Enterprise Edition.
Aigars Mahinovs then directed everyone in place for the DebConf group picture, and Lucas Nussbaum started a discussion about archive-wide QA tasks in Debian, an area where I did a lot of work in the past and that still interests me. Antonio Terceiro and Paul Gevers followed up with techniques to track archive-wide rebuilding and testing of packages and in turn filing a lot of bugs to track the problems. The evening ended with the conference dinner, again in the fort close by the coast. DebConf is good for meeting new people, and I incidentally ran into another Chris, who happened to be one of the original maintainers of pgaccess, the pre-predecessor of today's pgadmin. I admit to still missing this PostgreSQL frontend for its simplicity and ability to easily edit table data, but it disappeared around 2004.
Friday, July 18th
On Friday, I participated in discussion sessions around contributors.debian.org (PostgreSQL is planning to set up something similar) and the New Member process which I had helped to run and reform a decade or two ago. Agathe Porte (also a Ham radio operator, like so many others at the conference I had no idea of) then shared her work on rewriting the slower parts of Lintian, the Debian package linter, in Rust. Craig Small talked about "Free as in Bytes", the evolution of the Linux procps free command. Over time and many kernel versions, the summary numbers printed became better and better, but there will probably never be a version that suits all use cases alike. Later over dinner, Craig (who is also a TC member) and I shared our experiences with these numbers and customers (not) understanding them. He pointed out that for PostgreSQL and looking at used memory in the presence of large shared memory buffers, USS (unique set size) and PSS (proportional set size) should be more realistic numbers than the standard RSS (resident set size) that the top utility is showing by default.
Antonio Terceiro and Paul Gevers again joined to lead a session, now on ci.debian.net and autopkgtest, the test driver used for running tests on packages after they have been installed on a system. The PostgreSQL packages are heavily using this to make sure no regressions creep in even after builds have successfully completed and test re-runs are rescheduled periodically. The day ended with Bdale Garbee's electronics team BoF and Paul Tagliamonte and me setting up the radio station in the courtyard, again answering countless questions about ionospheric conditions and operating practice.
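To make the difference between RSS, PSS and USS a bit more concrete, here is a minimal Python sketch. It assumes the field names that Linux prints in /proc/<pid>/smaps_rollup (Rss, Pss, Private_Clean, Private_Dirty) and approximates USS as the sum of the private pages. The sample figures are invented to resemble a process that is mostly attached to a large shared buffer; the function names are made up for this illustration.

```python
import re

# Invented sample in the "Name:   <number> kB" layout of smaps_rollup.
SAMPLE = """\
Rss:              204800 kB
Pss:               51200 kB
Shared_Clean:       4096 kB
Shared_Dirty:     180224 kB
Private_Clean:      8192 kB
Private_Dirty:     12288 kB
"""


def parse_smaps(text):
    """Sum all 'Name: <n> kB' fields, so per-mapping smaps output works too."""
    totals = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+) kB", line)
        if match:
            name, value = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + value
    return totals


def memory_summary(text):
    """Return RSS, PSS and an approximate USS in kB."""
    totals = parse_smaps(text)
    return {
        "rss_kb": totals.get("Rss", 0),
        "pss_kb": totals.get("Pss", 0),
        # USS approximated as the pages only this process has mapped.
        "uss_kb": totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0),
    }


summary = memory_summary(SAMPLE)
```

In this made-up sample, RSS counts the whole shared buffer for every process that touches it, while PSS and USS come out far lower. On a real system the text would be read from /proc/<pid>/smaps_rollup instead of SAMPLE.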
Saturday, July 19th
Saturday was the last conference day. In the first session, Nikos Tsipinakis and Federico Vaga from CERN announced that the LHC will be moving to Debian for the accelerator's frontend computers in their next "long shutdown" maintenance period in the next year. CentOS broke compatibility too often, and Debian trixie together with the extended LTS support will cover the time until the next long shutdown window in 2035, by which time the computers should have all been replaced with newer processors covering higher x86_64 baseline versions. The audience was very delighted to hear that Debian is now also being used in this prestigious project.
Ben Hutchings then presented new Linux kernel features. Particularly interesting for me was the support for atomic writes spanning more than one filesystem block. When configured correctly, this would mean PostgreSQL would no longer have to record full-page images in the WAL, increasing throughput and performance. After that, the Debian ftp team discussed ways to improve review of new packages in the archive, and which of their processes could be relaxed with new US laws around Open Source and cryptography algorithms export. Emmanuel Arias led a session on Salsa CI, Debian's GitLab instance and standard CI pipeline. (I think it's too slow, but the runners are not under their control.) Julian Klode then presented new features in APT, Debian's package manager. I like the new display format (and a tiny bit of that is also from me sending in wishlist bugs).
In the last round of sessions this week, I then led the Ham radio BoF with an introduction into the hobby and how Debian can be used. Bdale mentioned that the sBitx family of SDR radios is natively running Debian, so stock packages can be used from the radio's touch display. We also briefly discussed his involvement in ARDC and the possibility to get grants from them for Ham radio projects. Finally, DebConf wrapped up with everyone gathering in the main auditorium and cheering the organizers for making the conference possible and passing Pollito, the DebConf mascot, to the next organizer team.
Pollito on stage (photo: Christoph Berg)
Sunday, July 20th
Zoom back to the train: I made it through the Paris metro and I'm now on the Eurostar back to Germany. It has been an intense week with all the conference sessions and meeting all the people I had not seen for so long. There are a lot of new ideas to follow up on both for my Debian and PostgreSQL work. Next year's DebConf will take place in Santa Fe, Argentina. I haven't yet decided if I will be going, but I can recommend the experience to everyone!
The post The Debian Conference 2025 in Brest appeared first on CYBERTEC PostgreSQL Services & Support.
What happened in the Reproducible
Builds effort between Sunday September 25 and Saturday October 1 2016:
Statistics
For the first time, we reached 91% reproducible packages in our testing framework on
testing/amd64 using a deterministic build path. (This is what we recommend to make packages in Stretch reproducible.)
For unstable/amd64, where we additionally test for reproducibility across
different build paths we are at almost 76% again.
IRC meetings
We have a poll to set a time for a new regular IRC meeting.
If you would like to attend, please input your available times and we will try
to accommodate for you.
There was a trial IRC meeting on Friday, 2016-09-30 at 18:00 UTC. Unfortunately, we
did not activate meetbot.
Despite this, participants consider the meeting a success, as several topics were
discussed (e.g. changes to IRC notifications of tests.r-b.o) and the meeting stayed
within one hour.
Upcoming events
Reproduce and Verify Filesystems
- Vincent Batts, Red Hat - Berlin (Germany), 5th October, 14:30 - 15:20 @
LinuxCon + ContainerCon Europe 2016.
From Reproducible Debian builds to Reproducible OpenWrt, LEDE &
coreboot - Holger "h01ger" Levsen and
Alexander "lynxis" Couzens - Berlin (Germany), 13th October, 11:00 - 11:25 @
OpenWrt Summit 2016.
Introduction to Reproducible
Builds
- Vagrant Cascadian will be presenting at the SeaGL.org conference in
Seattle (USA), November 11th-12th, 2016.
Previous events
GHC Determinism
- Bartosz Nitka, Facebook - Nara (Japan), 24th September, ICFP 2016.
Toolchain development and fixes
Michael Meskes uploaded bsdmainutils/9.0.11 to unstable with a fix
for #830259 based on Reiner Herrmann's patch. This fixed the locale_dependent_symbol_order_by_lorder issue in the affected packages (freebsd-libs, mmh).
devscripts/2.16.8 was uploaded to unstable. It includes a debrepro
script by Antonio Terceiro which is similar in purpose to reprotest but more
lightweight; specific to Debian packages and without support for virtual servers
or configurable variations.
Packages reviewed and fixed, and bugs filed
The following updated packages have become reproducible in our testing framework
after being fixed:
The following updated packages appear to be reproducible now for reasons we
were not able to figure out. (Relevant changelogs did not mention reproducible
builds.)
Reviews of unreproducible packages
77 package reviews have been added, 178 have been updated and 80 have been
removed in this week, adding to our knowledge about identified
issues.
6 issue types have been updated:
Weekly QA work
As part of reproducibility testing, FTBFS bugs have been detected and reported
by:
Adrian Bunk (3)
Chris Lamb (12)
Lucas Nussbaum (3)
Sebastian Reichel (1)
diffoscope development
A new version of diffoscope, 61, was uploaded to unstable by Chris Lamb. It included contributions from:
Ximin Luo:
Improve the CLI --help text and add an --output-empty option.
Chris Lamb:
Add a progress bar and show it if stdout is a TTY. You can read more about
it here. It can
also be read by higher-level programs via the --status-fd CLI option.
Maria Glukhova:
Behaviour improvements in the case of OS-level errors.
Mattia Rizzolo:
Testing and packaging improvements.
Post-release there were further contributions from:
Chris Lamb:
Code architecture improvements.
Maria Glukhova:
Testing improvements.
reprotest development
A new version of reprotest, 0.3.2, was uploaded to unstable by Ximin Luo. It included contributions from:
Ximin Luo:
Add a --diffoscope-arg CLI option to pass extra args to diffoscope.
Post-release there were further contributions from:
Chris Lamb:
Code quality improvements.
tests.reproducible-builds.org
Hans-Christoph Steiner continued work on setting up reproducible tests for F-Droid.
Holger cleaned up the script creating the page showing breakages, so that it now also cleans up some of the breakage it finds.
IRC notifications about diffoscope crashes and artifacts available for investigations have been dropped; instead the breakages page has a permanent pointer. (h01ger)
IRC notifications from the automatic package scheduler and status changes for packages have been moved -- as a temporary trial -- to #debian-reproducible-changes on irc.oftc.net (Mattia).
Misc.
This week's edition was written by Ximin Luo, Holger Levsen & Chris Lamb and reviewed by a bunch of Reproducible Builds folks on IRC.
What happened about the reproducible
builds effort this week:
Media coverage
Daniel Stender published, in Admin Magazine, an English translation of the article which
originally appeared in Linux Magazin.
Toolchain fixes
Fixes landed in the Debian archive:
Lunar uploaded docbook-to-man/1:2.0.0-34 which removes a timestamp in generated manpages. Original patch by Chris Lamb.
Stefano Rivera uploaded dh-python/1.20150628-1 which now sorts namespace files. Original patch by Chris Lamb.
Christian Hofstaedtler uploaded ruby2.2/2.2.2-2 which now uses UTC for the dates in gemspec files. Original patch by Chris Lamb.
reproducible.debian.net
A new package set for the X Strike Force has been added. (h01ger)
Bugs tagged with locale are now visible in the statistics. (h01ger)
Some work has been done to add tests for NetBSD. (h01ger)
Many changes by Mattia Rizzolo have been merged on the whole infrastructure:
IRC notifications when known reproducible packages stop building successfully.
What happened about the reproducible
builds effort this week:
Toolchain fixes
Norbert Preining uploaded texinfo/6.0.0.dfsg.1-2 which makes texinfo indices reproducible. Original patch by Chris Lamb.
Lunar submitted recently rebased patches to make the file order of files inside .deb stable.
akira filed #789843 to make tex4ht stop printing timestamps in its HTML output by default.
Dhole wrote a patch for xutils-dev to prevent timestamps when creating gzip compressed files.
Reiner Herrmann sent a follow-up patch for wheel to use UTC as timezone when outputting timestamps.
Mattia Rizzolo started a discussion regarding the failure to build from source of subversion when -Wdate-time is added to CPPFLAGS which happens when asking dpkg-buildflags to use the reproducible profile. SWIG errors out because it doesn't recognize the aforementioned flag.
Trying to get the .buildinfo specification to a more definitive state, Lunar started a discussion on storing the checksums of the binary packages used in the dpkg status database.
akira discovered while proposing a fix for simgrid that CMake's internal command to create tarballs would record a timestamp in the gzip header. A way to prevent it is to use the GZIP environment variable to ask gzip not to store timestamps, but this will soon become unsupported. It's up for discussion if the best place to fix the problem would be to fix it for all CMake users at once.
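For readers wondering what "a timestamp in the gzip header" means in practice, here is a small Python sketch (not the CMake or simgrid fix itself). It uses the standard library's gzip.GzipFile with mtime=0 and an empty file name, which is roughly what gzip -n does on the command line. The helper name gzip_reproducible is made up for this illustration.

```python
import gzip
import io


def gzip_reproducible(data: bytes) -> bytes:
    """Compress data without storing a modification time or file name."""
    buf = io.BytesIO()
    # mtime=0 zeroes the 4-byte MTIME header field; filename="" omits FNAME.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


first = gzip_reproducible(b"hello reproducible world\n")
second = gzip_reproducible(b"hello reproducible world\n")
```

Bytes 4 to 7 of a gzip stream hold the modification time, so with mtime=0 two runs at different times produce identical output for the same input and the same Python/zlib version.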
Infrastructure-related work
Andreas Henriksson did a delayed NMU upload of pbuilder which adds minimal support for build profiles and includes several fixes from Mattia Rizzolo affecting reproducibility tests.
Niels Thykier uploaded lintian which both raises the severity of package-contains-timestamped-gzip and avoids false positives for this tag (thanks to Tomasz Buchert).
Petter Reinholdtsen filed #789761 suggesting that how-can-i-help should prompt its users about fixing reproducibility issues.
Packages fixed
The following packages became reproducible due to changes in their
build dependencies:
autorun4linuxcd,
libwildmagic,
lifelines,
plexus-i18n,
texlive-base,
texlive-extra,
texlive-lang.
The following packages became reproducible after getting fixed:
Patches submitted which have not made their way to the archive yet:
#789648 on apt-dater by Dhole: allow the build date to be set externally and set it to the time of the latest debian/changelog entry.
#789715 on simgrid by akira: fix doxygen and patch CMakeLists.txt to give GZIP=-n for tar.
#789728 on aegisub by Juan Picca: get rid of __DATE__ and __TIME__ macros.
#789747 on dipy by Juan Picca: set documentation date for Sphinx.
#789748 on jansson by Juan Picca: set documentation date for Sphinx.
#789799 on tmexpand by Chris Lamb: remove timestamps, hostname and username from the build output.
#789804 on libevocosm by Chris Lamb: removes generated files which include extra information about the build environment.
#789963 on qrfcview by Dhole: removes the timestamps from the generated PNG icon.
#789965 on xtel by Dhole: removes extra timestamps from compressed files by gzip and from the PNG icon.
#790010 on simbody by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790023 on stx-btree by akira: pass HTML_TIMESTAMP=NO to Doxygen.
#790034 on siscone by akira: removes $datetime from footer.html used by Doxygen.
#790035 on thepeg by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790072 on libxray-spacegroup-perl by Chris Lamb: set $Storable::canonical = 1 to make space_groups.db.PL output deterministic.
#790074 on visp by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790081 on wfmath by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790082 on wreport by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790088 on yudit by Chris Lamb: removes timestamps from the build system by passing a static comment.
#790122 on clblas by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790133 on dcmtk by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790139 on glfw3 by akira: patch for Doxygen timestamps further improved by James Cowgill by removing $datetime from the footer.
#790228 on gtkspellmm by akira: set HTML_TIMESTAMP=NO in Doxygen configuration.
#790232 on ucblogo by Reiner Herrmann: set LC_ALL to C before sorting.
#790235 on basemap by Juan Picca: set documentation date for Sphinx.
#790258 on guymager by Reiner Herrmann: use the date from the latest debian/changelog as build date.
#790309 on pelican by Chris Lamb: removes useless (and unreproducible) tests.
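Many of the patches listed above come down to the same two ideas: pin every timestamp to a fixed value and process inputs in a sorted, locale-independent order. The following Python sketch illustrates both on an in-memory tar archive. It is not taken from any of the patches; the function name deterministic_tar and the fixed metadata values are assumptions chosen for the example.

```python
import io
import tarfile


def deterministic_tar(files, mtime=0):
    """Build a tar archive from a dict {member name: bytes content}.

    Members are added in sorted order with fixed metadata, so the result
    does not depend on dict order, the current time or the build user.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Python sorts str by code point, independent of the locale.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime          # fixed timestamp instead of "now"
            info.mode = 0o644
            info.uid = info.gid = 0     # no trace of the build user
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive_a = deterministic_tar({"b.txt": b"two\n", "a.txt": b"one\n"})
archive_b = deterministic_tar({"a.txt": b"one\n", "b.txt": b"two\n"})
```

Both calls return byte-identical archives even though the inputs were given in a different order.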
debbindiff development
debbindiff/23 includes a few bugfixes by Helmut Grohne that result in a significant speedup (especially on larger files). It used to exhibit the quadratic time string concatenation antipattern.
Version 24 was released on June 23rd in a hurry to fix an undefined variable introduced in the previous version. (Reiner Herrmann)
debbindiff now has a test suite! It is written using the PyTest framework (thanks Isis Lovecruft for the suggestion). The current focus has been on the comparators, and we are now at 93% of code coverage for these modules.
Several problems were identified and fixed in the process: paths appearing in output of javap, readelf, objdump, zipinfo, unsquashfs; useless MD5 checksum and last modified date in javap output; bad handling of charsets in PO files; the destination path for gzip compressed files not ending in .gz; only metadata of cpio archives were actually compared. stat output was further trimmed to make directory comparison more useful.
Having the test suite enabled a refactoring of how comparators were written, switching from a forest of differences to a single tree. This helped removing dust from the oldest parts of the code.
Together with some other small changes, version 25 was released on June 27th. A follow up release was made the next day to fix a hole in the test suite and the resulting unidentified leftover from the comparator refactoring. (Lunar)
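As a side note on the "quadratic time string concatenation antipattern" mentioned for debbindiff/23: the sketch below shows the general pattern in Python, not the actual debbindiff code. Both function names are invented for the illustration. Building a result with repeated concatenation may copy the growing string on every step, while collecting the pieces and joining them once stays linear. (CPython can sometimes optimise the first form in place, so the slowdown is not guaranteed to show up in every case.)

```python
def render_quadratic(lines):
    """Antipattern: each concatenation may copy everything built so far."""
    out = ""
    for line in lines:
        out = out + line + "\n"
    return out


def render_linear(lines):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for line in lines:
        parts.append(line)
        parts.append("\n")
    return "".join(parts)


sample_lines = ["--- a", "+++ b", "@@ -1 +1 @@"]
```

Both functions return the same text; only the amount of copying differs as the input grows.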
Documentation update
Ximin Luo improved code examples for some proposed environment variables for reference timestamps. Dhole added an example on how to fix timestamps from C pre-processor macros by adding a way to set the build date externally. akira documented her fix for tex4ht timestamps.
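To illustrate the idea of setting the build date externally, here is a minimal Python sketch. The text above does not name the environment variable, so SOURCE_DATE_EPOCH (the name the Reproducible Builds project later standardised on) is an assumption here, and build_date is a hypothetical helper.

```python
import os
from datetime import datetime, timezone


def build_date(environ=None):
    """Return the build date, preferring an externally supplied timestamp.

    SOURCE_DATE_EPOCH is assumed to hold seconds since the Unix epoch,
    e.g. the time of the latest debian/changelog entry.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # Fallback for ordinary, non-reproducible builds.
    return datetime.now(tz=timezone.utc)


stamp = build_date({"SOURCE_DATE_EPOCH": "1435363200"})
```

Using UTC throughout also avoids the timezone-dependent output that several of the patches in this report address.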
Package reviews
94 obsolete reviews have been removed, 330 added and 153 updated this week.
Hats off to Chris West (Faux) who investigated many fail to build from source issues and reported the relevant bugs.
Slight improvements were made to the scripts for editing the review database, edit-notes and clean-notes. (Mattia Rizzolo)
Meetings
A meeting was held on June 23rd. Minutes are available.
The next meeting will happen on Tuesday 2015-07-07 at 17:00 UTC.
Misc.
The Linux Foundation announced that it was funding the work of Lunar and h01ger on reproducible builds in Debian and other distributions. This was further relayed in a Bits from Debian blog post.
What happened about the reproducible
builds effort for this week:
Toolchain fixes
Uploads that should help other packages:
Stephen Kitt uploaded mingw-w64/4.0.2-2 which avoids inserting timestamps in PE binaries, and specifies dlltool's temp prefix so it generates reproducible files.
Stephen Kitt uploaded binutils-mingw-w64/6.1 which fixed dlltool to initialize its output's .idata$6 section, avoiding random data ending up there.
Patches submitted for toolchain issues:
#787159 on openjdk-7 by Emmanuel Bourg: sort the annotations and enums in package-tree.html produced by javadoc.
#787250 on python-qt4 by Reiner Herrmann: sort imported modules to get reproducible output.
#787251 on pyqt5 by Reiner Herrmann: sort imported modules to get reproducible output.
Some discussions have been started in Debian and with upstream:
#786927 on flowscan by Dhole: remove timestamps from gzip files and fix mtimes of packaged files.
#786959 on python3.5 by Lunar: set build date of binary and documentation to the time of latest debian/changelog entry, prevent gzip from storing a timestamp.
reproducible.debian.net
Holger Levsen added two new package sets: pkg-javascript-devel and pkg-php-pear. The list of packages with and without notes are now sorted by age of the latest build.
Mattia Rizzolo added support for email notifications so that maintainers can be warned when a package becomes unreproducible. Please ask Mattia or Holger or in the #debian-reproducible IRC channel if you want to be notified for your packages!
strip-nondeterminism development
Andrew Ayer fixed the gzip handler so that it skips adding a predetermined timestamp when there was none.
Documentation update
Lunar added documentation about mtimes of files extracted using unzip being timezone dependent. He also wrote a short example on how to test reproducibility.
Stephen Kitt updated the documentation about timestamps in PE binaries.
Documentation and scripts to perform weekly reports were published by Lunar.
Package reviews
50 obsolete reviews have been removed, 51 added and 29 updated this week. Thanks Chris West and Mathieu Bridon amongst others.
New identified issues:
Misc.
Lunar will be talking (in French) about reproducible builds at Pas Sage en Seine on June 19th, at 15:00 in Paris.
A meeting will happen this Wednesday, 19:00 UTC.
The GSoC student application period is over, and the last two days were pretty interesting.
For a few years now, Olly Betts has provided us with a spreadsheet to graph the number of applicants to an organization over time.
Here's the graph for Debian this year:
(Historical graphs: 2013, 2012. Spreadsheet available from Olly's blog)
On Wednesday, I was thinking "hmm, 30 applicants, this is a slow year". Well, the number of proposals more than doubled in the last two days, to conclude on a whopping 68 applications! The last one was submitted just three seconds before the deadline.
If you want to take a look at the proposals, head over to the Debian wiki.
Time to get on reviewing! The final student acceptances will be published in just less than a month, on April 21st.
Slide 5 from my nostalgia-fest "The Art of Writing Small Programs"
from just under two years ago:
XKCD 1275 from last week:
(Of course that should be INT SQR EXP PI/INT PI * PI * R ** INT PI).
I've produced a graph of the 61 student applications which Debian
received for GSoC this year:
Ana blogged a similar graph last year
if you want to compare. It looks like the total is down a little (though
I'm not sure if the figure of 81 from the text, or ~68 read from the graph
is correct for last year) - this is likely at least partly due to the number of
proposals each student can send having been reduced from 20 last year to 5
this year, which should have reduced the number of low quality proposals.
The timeline this year is later, which may have also had an effect.
If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this OpenDocument spreadsheet and follow the
instructions inside.
Gaurav (one of Xapian's GSoC students this year) queried my advice that it's
better not to explicitly initialise a std::string in C++ like this:
std::string s = "";
but instead to write:
std::string s;
It could be argued that the former makes it clearer that the string is
initialised (since C++ does inherit C's behaviour of not implicitly
initialising variables of fundamental types, such as int and double). But
objects aren't left uninitialised - the default constructor gets called
(or if there isn't one, the code won't compile).
The downside is that you get quite a lot more code from the first version
than the second. Perhaps compilers will grow smarter and in future both
the above will compile to the same code, but that's not true today.
Here's a simple bit of test code:
(As an aside, I wouldn't recommend spending a lot of time digging into the
assembler your compiler produces, but if there's more than one equivalent way
to write code for a common case (as here) and you want to know which is most
efficient, it can be informative to see what code is produced for each).
You don't really need to know x86-64 assembler to grasp what's happening
above, which is lucky, since I don't really know x86-64 assembler. In the
first hunk, we get an empty literal string being added to the rodata (read-only
data) section; and in the second, instead of two instructions which copy a
standard empty string representation, we get much more elaborate code, including
a function call to _ZNSsC1EPKcRKSaIcE - the C++ mangled name for the
std::string from const char * constructor:
I also tested the example above with clang (Debian clang version 3.0-6
(tags/RELEASE_30/final) (based on LLVM 3.0)) and the resultant code is
fairly similar for each case.
The overhead for doing this once isn't going to matter, but if it happens
every time you declare a std::string variable, the cumulative effects may
be measurable. And as well as taking longer to execute, the larger code will
cause greater CPU cache pressure.
I was staying in room 2112 in
the hotel for the GSoC mentor summit at the weekend, which whetted my appetite
for some Rush, so I stuck a CD in
my laptop today and got this dialog:
Looks like there's already a report (#631760) for this issue.
At the end of the previous episode,
you may remember our gallant heroes had a pile of 30 proposals to review.
We soon spotted one more to mark as invalid (just a paste with our ideas list
plus some biographical details), and another got withdrawn by the student
without explanation (but was low quality anyway), so that left us with 28.
We had six volunteers for mentoring, and in the initial allocation we received
five student slots from Google, but we asked nicely if we could have an extra
one, and were lucky enough to get it. Last year we had four students, so that's
a 50% increase.
Here are those 28, broken down by project idea:
8 - Weighting Schemes
6 - Learning to Rank
3 - Dynamic Snippets
2 - Lucene Backend
2 - QueryParser improvements
1 - Erlang Bindings
1 - Improve C# and Java bindings
1 - Improve PHP Bindings
1 - Improve Python Bindings
1 - Improving Japanese Support
1 - Node.js Bindings
1 - Postlist encodings
I find it interesting that the most popular three ideas have closer connections
to Information Retrieval theory than most - probably these appeal to
students who have taken IR courses and already have an interest and some
knowledge of the project area. I think we should aim to get more ideas like
these on the list in future years.
It's worth noting that in several cases students had taken an idea in
sufficiently different directions that there wasn't much overlap, so we didn't
just pick the best proposal for each project idea to narrow things down. Also,
the proposal isn't the only factor - we like to see applicants work on a patch,
and to interact with us on IRC and/or email. But in the end, as it happens, we
ended up with proposals which were all for different ideas - here are those we
selected:
My congratulations to the lucky six, and my commiserations to those we weren't
able to select. It wasn't an easy selection to make, and we truly appreciate
the time you spent writing your proposal, working on patches, and on the rest
of the application process. We'd encourage you to remain involved with Xapian,
and to apply to us again next year if you're still eligible for GSoC.
Student applications for GSoC closed a day
or so ago, and we've done an initial pass through Xapian's applications, so I thought I should post another
overview, similar to last year's.
We received a total of 41 applications this year (very close to last year's
total of 42). Here's a graph of applications against time:
If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this OpenDocument spreadsheet and follow the
instructions inside.
That total of 41 includes one duplicate and one application withdrawn by the
student (we had one of each last year too). I've also gone through and marked
nine spam proposals as invalid (similar to the seven we had last year). Spam
proposals are things like proposals with no connection at all to Xapian, and
proposals which are just a title and/or paste from our ideas list with a
generic biography.
So that leaves us with 30 proposals (compared to 33 last year). It's hard
to really measure, but my feeling is that the average quality is higher
than last year (and it was already pretty impressive last year).
I finally got my talk slides from last year's Kiwi PyCon and OSDC uploaded, and
got inspired to overhaul my list of talks,
adding a few that were missing entirely, and adding links to video or
audio of more of the talks.
If you've half an hour to kill and want some 8-bit computing nostalgia, then
the video of "The Art of Writing Small Programs"
from OSDC 2011 is well worth a watch.
I spent a bit of time last month working towards being able to remove
wxwidgets2.6 from the Debian archive - inspired by the
RCBW initiative, I did a
quick (and perhaps not totally thorough) tally of the Debian RC bugs I fixed in
October 2011:
Probably the biggest win will be the removal of wxwidgets2.6 though - it's long
dead upstream and removing it will pave the way for including wxwidgets3.0
once it is released.
(Actually, we branched six weeks ago, but I've not got around to
writing about it until now.)
The development branch approach we used for 1.1.x development releases
leading to a stable 1.2.0 release seemed to work pretty well, so we're
adopting that again.
The main problem last time was that it took a long time to actually stabilise
1.1.x because we kept slipping more changes in. For 1.3.x, we need to be more
disciplined and changes should be developed on a branch and not merged
prematurely. We now have solid git mirroring,
so developing on a branch is a more pleasant experience than before. We also
need to be brutal sooner. It's better for everyone to (say) achieve two
release series in two years than have one release series take two years.
When I was in the UK back in May, Richard and I sat down and hashed out
a list of goals for a 1.4 release series. This is what we came up with
(the order is just how they came to mind, so isn't really significant):
There seems to be a lot of confusion amongst students as to what
"Status: pending" on their proposal means. Well, a state diagram
explains this clearly (brief summary: pending is good):
Note: Now that student applications have closed, moving a "withdrawn"
application back to "pending" counts as a modification to the proposal
so you'll need to get the org to click "Allow proposal modifications"
on your proposal before you can do this.
Student applications for GSoC closed a few
hours ago. This is Xapian's first year as a mentoring
organisation (though
I've been involved in previous years with SWIG
and Debian) and we've been blown away by the
response from students.
If you'd asked me when we got accepted, I'd have guessed we might
get 20 applications and feel we'd done well, but counting up now we have 42.
Ignoring two which were withdrawn (one duplicate, one a spam which surprisingly
got withdrawn when I politely suggested such applications weren't useful), here
is a graph of applications against time:
If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this OpenDocument spreadsheet and follow the
instructions inside.
Now the task of selection starts in earnest. I've gone through and marked
the seven spam proposals as ineligible (that's one-line proposals,
proposals with no connection at all to Xapian, and proposals which are just a
title and/or paste from our ideas list with a generic biography).
That leaves 33, but before our student applicants start to despair, not all
of those are really in the running! I don't have a good picture yet, but it
looks like there are something like 10-15 we'll be seriously considering.
From left to right: Obey Arthur Liu, Olly Betts, Stefano Zacchiroli, Dirk Eddelbuettel, Sylvestre Ledru, Jelmer Vernooij.
Dear Planet,
We arrived at the Google Summer of Code 2009 Mentor Summit and are having a blast here. The weather is awesome, the candies are plenty and the conference rooms are comfy at the Googleplex. We will write to you again soon.
Cheers
The Debian people Arthur, Olly, Zack, Dirk, Sylvestre, Jelmer