Search Results: "Olivier Berger"

6 December 2020

Olivier Berger: Offsite backups from an Asustor NAS

I have an Asustor (Nimbustor 4 / AS5304T) NAS at home, which allows me to perform regular backups. Unfortunately, I couldn't find a way to configure remote backups from the NAS to a remote host using the "Remote Sync" option of the "Backup & Restore" app provided in the ADM OS of the Asustor NAS. I'm used to using rsync -e ssh commands and would have expected some such configuration to be possible... it may just be some UI issue, but I couldn't find how to add one.

I've then used the following workaround: the NAS can run a "Linux VM" (LXC?) with the provided "Linux Center" app. It allows me to run a regular Debian installation inside this Linux guest, on which I installed backupninja, which performs backups over rsync, from the NAS filesystem, to the remote. Along the way I had to overcome a few issues, whose documentation may be useful to others.
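For readers unfamiliar with the kind of command meant by "rsync -e ssh", here is a minimal sketch that only assembles such a command line, without running it. The source path, remote host, destination path and port are hypothetical placeholders, and the -a and -z options are an assumption about a typical invocation, not the configuration that backupninja actually generates.

```python
import shlex


def rsync_over_ssh(source, remote_host, remote_path, ssh_port=22):
    """Build (but do not run) an rsync command that goes through ssh."""
    return [
        "rsync",
        "-az",                       # archive mode + compression (assumed options)
        "-e", f"ssh -p {ssh_port}",  # use ssh as the remote shell
        source,
        f"{remote_host}:{remote_path}",
    ]


# Hypothetical paths and host, for illustration only.
command = rsync_over_ssh("/srv/nas-share/", "backup@remote.example.org", "/backups/nas/")
print(shlex.join(command))
```

Returning the argument list rather than a single string keeps quoting explicit, should the list later be handed to subprocess.run.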

3 June 2020

Olivier Berger: Automate the capture of a full BigBlueButton conference replay, with bbb-downloader

BigBlueButton, aka BBB, is a WebRTC conferencing solution that, among many features, allows recording a conference for later replay. Together with my colleague François Trahay, we have been working on a set of scripts (bbb-downloader) that make it easy (on Linux) to download recordings of BBB conferences, for local backup, video editing, upload on video sharing platforms, etc. This is particularly useful in our distance learning contexts, where students may have to catch up on a live session that was recorded. We have integrated a hackish solution to capture, as a single video, presentations that contained slide decks. Let me explain why this was necessary.

A nice feature of BBB is that, to present a slide deck, you don't need to share your screen (as a video stream): you just upload your file, which is then auto-converted to images that are sent to participants, in sync with your next/previous browsing of the slides. This is great for participants with low bandwidth, who can see the slides (static images) instead of receiving a full-screen video stream. But a side effect is that the recording of a class/conference made by BBB replays the slides just as it was done live: displaying images one after the other. While it is easy to retrieve the audio, the webcams of participants, or screen sharings as video streams, directly available from the recordings replay app, it is not the same for the slides, which don't come as a video.

Our script performs a replay, using a Docker container which drives Selenium under the hood, to capture the full replay as a single video, which then includes the slides and everything. You can see my demo of this process in the following video: bbb-downloader full capture demo. Replaying the recordings in real time to perform this capture takes long, but it works. Kudos to elgalu/docker-selenium for the Docker env.
Feel free to test it and profit, or to report issues in the GitHub issues of the repo.

2 June 2020

Olivier Berger: Mixing NRELab's Antidote and Eclipse Che on the same k8s cluster

You may have heard of my search for Cloud solutions to run labs in an academic context, with a focus on free and open source solutions. You may read previous installments of this blog, or for a shorter version, check the presentation I've recorded last week. Over the last month, I've become quite interested in 2 projects: NRELab's Antidote and Eclipse Che.

Antidote is the software that powers NRELabs, a labs platform for learning network automation, which runs on top of Kubernetes (k8s). The interesting thing is that for each learner, there can be a dedicated k8s namespace with multiple virtual nodes running on a separate network. This can be used in the context of virtual classes/labs where our students will perform network labs in parallel on the same cluster. Eclipse Che powers Eclipse "on the Cloud", making software development environments available to developers on a Kubernetes Cloud. Developers typically work from a Web page instead of installing local development tools.

Both projects seem quite complementary. For one, we teach both networks and software development, so that would naturally appeal to many professors. Furthermore, Eclipse Che provides a few features that Antidote is lacking: authenticating users (with keycloak), and persisting their work in workspaces, between work sessions. That is typically what we need in our academic context, where students will work on the same labs week after week, during scheduled classes or off-hours. Thus it would be great to have more integration between the 2 environments. I intend to work on that front, but that takes time, as running stuff on Kubernetes isn't exactly trivial, at least when you're like me and want to use a vanilla kubernetes. I've mainly relied on running k8s inside VMs using Vagrant and/or minikube so far.

A first milestone I've achieved is making sure that Antidote and Eclipse Che aren't incompatible. Antidote's selfmedicate script was actually running inside a Vagrant VM, where I had difficulties installing Eclipse Che (probably because of old software, or particular networking setup details). I've overcome this hurdle, as I'm now able to install both environments on a single Kubernetes VM (using my own Vagrant setup). Running Eclipse Che (alongside Antidote) on a k8s Vagrant VM. This only proves that there's no show stopper there, but a lot of work remains. Stay tuned. Update: I've finally managed to get it to work on the antidote-selfmedicate base too. See my branch for details.

Olivier Berger: Experimenting on distant labs and labs on the Cloud

I delivered a talk last week about some ideas and experiments I have about the use of remote access and Cloud technologies for labs. I have collected the talk recording and related material, in French, in another post. The presentation was originally in French, so I've quickly translated my slides and recorded an English version. I mention tools like Guacamole, MeshCentral, NRELab's Antidote, Eclipse Che and Labtainers, as well as k8s and Docker, as interesting tools that may allow us to continue teaching in labs while allowing more flexibility, distant learning, and hopefully improved quality. The slides are available online, and so is the recording: Experimenting on distant labs and labs on the Cloud.

9 May 2017

Olivier Berger: Installing a Docker Swarm cluster inside VirtualBox with Docker Machine

I've documented the process of installing a Docker Swarm cluster inside VirtualBox with Docker Machine. This allows experimenting with Docker Swarm, the simple docker container orchestrator, over VirtualBox: you can play with orchestration scenarios without having to install docker on real machines. Such an environment may also be handy for teaching if you don't want to install docker on the lab's host. Installing the docker engine on Linux hosts for unprivileged users requires some care (refer to docs about securing Docker), as the default configuration may allow learners to easily gain root privileges (which may or may not be desired).

7 February 2017

Olivier Berger: Making Debian stable/jessie images for OpenStack with bootstrap-vz and cloud-init

I'm investigating the creation of VM images for different virtualisation solutions. Among the target platforms is a desktop-as-a-service platform based on an OpenStack public cloud. We've been working with bootstrap-vz for creating VMs for Vagrant+VirtualBox, so I wanted to test its use for OpenStack. There are already pre-made images available, including official Debian ones, but I like to be able to re-create things instead of depending on some external magic (which also means being able to optimize, customize and avoid potential MitM, of course). It appears that bootstrap-vz can be used with cloud-init, provided that some bits of config are specified. In particular, the cloud_init plugin of bootstrap-vz requires metadata_sources set to "NoCloud, ConfigDrive, OpenStack, Ec2". Note we explicitly spell it OpenStack and not Openstack, as was mistakenly done in the default Debian cloud images. The following snippet of manifest provides the necessary bits:
name: debian-{system.release}-{system.architecture}-{%Y}{%m}{%d}
provider:
  name: kvm
  virtio_modules:
  - virtio_pci
  - virtio_blk
bootstrapper:
  workspace: /target
  # create or reuse a tarball of packages
  tarball: true
system:
  release: jessie
  architecture: amd64
  bootloader: grub
  charmap: UTF-8
  locale: en_US
  timezone: UTC
volume:
  backing: raw
  partitions:
    #type: gpt
    type: msdos
    root:
      filesystem: ext4
      size: 4GiB
    swap:
      size: 512MiB
packages:
  # change if another mirror is closer
plugins:
  root_password:
    password: whatever
  cloud_init:
    username: debian
    # Note we explicitly spell it 'OpenStack' and not 'Openstack' as done in the default Debian cloud images
    metadata_sources: NoCloud, ConfigDrive, OpenStack, Ec2
  # admin_user:
  #   username: Administrator
  #   password: Whatever
  minimize_size:
    # reduce the size by around 250 Mb
    zerofree: true
I've tested this with the bootstrap-vz version in stretch/testing (0.9.10+20170110git-1) for creating jessie/stable images, which were booted on the OVH OpenStack public cloud. YMMV. Hope this helps.
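Since the spelling of the data source names is the pitfall pointed out above, a small check can catch it before building an image. This is only a sketch: the function name is hypothetical, the list of expected names is simply copied from the manifest snippet, and the check knows nothing about what cloud-init itself accepts.

```python
# Spellings copied from the manifest snippet in the post.
EXPECTED_SOURCES = ("NoCloud", "ConfigDrive", "OpenStack", "Ec2")


def misspelled_sources(metadata_sources):
    """Return (found, expected) pairs for entries that differ only by letter case."""
    by_lowercase = {name.lower(): name for name in EXPECTED_SOURCES}
    problems = []
    for entry in metadata_sources.split(","):
        entry = entry.strip()
        expected = by_lowercase.get(entry.lower())
        if expected is not None and expected != entry:
            problems.append((entry, expected))
    return problems


print(misspelled_sources("NoCloud, ConfigDrive, Openstack, Ec2"))
# prints [('Openstack', 'OpenStack')]
```

Entries that do not match any expected name, even ignoring case, are deliberately left alone, since this sketch cannot tell whether they are valid.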

26 November 2015

Olivier Berger: Handling video files produced for a MOOC on Windows with git and git-annex

This post is intended to document some elements of workflow that I've set up to manage videos produced for a MOOC, where different colleagues work collaboratively and remotely on a set of video sequences. We are a team from several schools working on the same course, and we have an incremental process, so we need collaboration among many remote authors, over quite a long period, on a set of video sequences. We're probably going to review some of the videos and make changes, so we need to monitor changes, and submit versions to colleagues on remote sites so they can criticize and get later edits. We may have more than one site doing video production. Thus we need to share videos along the flow of production, editing and revision of the course contents, in a way that is manageable by power users (we're all computer scientists, used to SVN or Git).

I've decided to start an experiment with Git and Git-Annex, to try and manage the videos as we usually do for slides sources in LaTeX. Obviously the main issue is that videos are big files, demanding in storage space and in bandwidth for transfers. We want to keep track of everything which is done during the production of the videos, so that we can later re-do some of the video editing, for instance if we change the graphic design elements (logos, subtitles, frame dimensions, additional effects, etc.) when improving the classes over the seasons. On the other hand, not all colleagues want to have to download a full copy of all rushes on their laptop if they just want to review one particular sequence of the course. They will only need to download the final edit MP4, even if they're interested in being able to fetch all the rushes, should they want to try and improve the videos. Git-Annex brings us the ability to decouple the presence of files in directories, managed by regular Git commands, from the presence of the file contents (the big stuff), which is managed by Git-Annex.

Here's a quick description of our setup. Why didn't we use git-annex directly on the Windows host which is the source of the files? We tried, but that didn't work out. The Git-Annex assistant somehow crashed on us, leaving the Git history in a strange state, so that became unmanageable. More importantly, we need robust backups, so we can't afford to rely on something we don't fully trust: shooting a video again is really costly (setting up a shooting set again, with lighting, cameras, and a professor who has to repeat the speech!). The rsync (with delete on destination) from Windows to Linux is robust. Git-Annex on Linux seems robust so far. That's enough for now :-)

The drawback is that we need manual intervention for starting the rsync, and also that we must make sure that the rsync target is ready to get a backup. The target of the rsync on Linux is a git-annex clone using the default indirect mode, which handles the files as symlinks to the actual copies managed by git-annex inside the .git/ directory. But that isn't suitable for comparison with the origin of the rsync mirror, which consists of plain files on the Windows computer. We must then do a git-annex edit on the whole target of the rsync mirror before the rsync, so that the files are there as regular video files. This is costly, in terms of storage and also copying time (our repo contains around 50 GB, and the Linux host is a rather tiny laptop).

After the rsync, all the files need to be compared to the SHA256 known to git-annex, so that only modified files are taken into account in the commit. We perform a git-annex add on all the files (for new files having appeared at rsync time), and then a git-annex sync. That takes a lot of time, since all SHA256 computations are quite long for such a set of big files (the video rushes and edited videos are in HD). So the process needs to be the following, on the target Linux host:
  1. git annex add .
  2. git annex sync
  3. git annex copy . --to server1
  4. git annex copy . --to server2
  5. git annex edit .
  6. only then : rsync
Iterate ad lib. I would have preferred to have a working git-annex on Windows, but this is a much more manageable process for me for now, and until we have more videos to handle in our repo than our Linux laptop can hold, we're quite safe. Next steps will probably involve gardening the contents of the repo on the Linux host so we're only keeping copies of current files, and older copies are only kept on the 2 servers, in case of later need. I hope this can be useful to others, and I'd welcome suggestions on how to improve our process.
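The six steps listed above can be sketched as a dry run that only prints the commands of one iteration, in order. The rsync source and target paths are hypothetical, as are the -a option and the exclusion of .git (added here on the assumption that the mirror must not delete the repository metadata); the remote names server1 and server2 are the ones from the list. The git annex commands are meant to be run from inside the annex directory.

```python
import shlex


def backup_cycle(windows_source, annex_dir, remotes=("server1", "server2")):
    """Return the commands of one iteration, in the order given in the post."""
    commands = [
        ["git", "annex", "add", "."],  # 1. register new or changed files
        ["git", "annex", "sync"],      # 2. commit and synchronise the Git branches
    ]
    # 3-4. push the file contents to each backup server
    commands += [["git", "annex", "copy", ".", "--to", remote] for remote in remotes]
    # 5. turn the symlinks back into plain, editable files
    commands.append(["git", "annex", "edit", "."])
    # 6. only then mirror the Windows files (options other than --delete are assumptions)
    commands.append(["rsync", "-a", "--delete", "--exclude=.git", windows_source, annex_dir])
    return commands


# Hypothetical source and target, for illustration only.
for command in backup_cycle("winhost:/videos/", "/srv/mooc-videos/"):
    print(shlex.join(command))
```

Keeping the sequence as data makes the ordering constraint (edit before rsync, add and sync after it on the next iteration) easy to review before anything touches the repository.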

30 April 2015

Olivier Berger: A howto for recording a screencast on Linux with a tablet

I've documented how I'm trying to perform DIY screencast recording, for the needs of a MOOC. I'm working on my Debian or Ubuntu desktop, using an external graphic tablet with integrated display for annotating slides. The main tools used in the process are xournal for annotating PDFs and vokoscreen for the screen and video recording. The documentation comes with a companion video.
I hope this is useful to some.

21 April 2015

Olivier Berger: How to publish an HTML5+RDFa Web site from org-mode

I'm a big fan of org-mode (see previous posts), and I've started maintaining (sic) my professional webpage(s) with it. But I've also recently tried to publish some more Semantic/Linked Data aware documents too (again, previous posts). Ideally, I think my preferred workflow for publishing articles or documents of some importance would be to author them in org-mode, and then publish them as HTML5 including RDFa meta-data and annotations. Instead, I've more frequently been converting org-mode to LaTeX, in order to submit a printable version, and later on decided to convert the LaTeX to HTML5+RDFa.

One of the issues is how to properly embed the RDF meta-data inside the org-mode documents, so that the syntax is both compact and expressive enough. I doubt there's a universal solution, given that RDF tends to be complex, and graphs may not project easily along the mainly linear structure of an org-mode document, but anyway, there seem to be possible middle grounds that are practically good enough. I've tried to implement a solution which reuses the principles set by John Kitchin in Extending the org-mode link syntax with attributes, i.e. implementing an HTML exporter for a particular custom link type, which will convert the plist-like syntax to some RDFa constructs.

The nice thing about org-mode, and its literate programming babel environment, is that it allows embedding the code of the links exporter inside the org document, which avoids dissociating the converter from the document's source and makes it self-contained. The next step will probably be to author a paper (or convert back a preprint of mine) with org-mode, in order to provide Linked Research meta-data. Stay tuned for more details, and in the meantime, I welcome any improvement to the org/babel/elisp setup. Edit: I've recorded a webcast to provide a bit more details, available on YouTube.

7 April 2015

Olivier Berger: Publishing my papers as Linked Research

I intend to make the extra effort of republishing my own research papers as Linked Research, i.e. in a form readable by humans (HTML5), but also embedding meta-data (as RDF) for machine processing. I've started with Authoritative Linked Data descriptions of Debian source packages using ADMS.SW (a good candidate, as it deals with Linked Data ;). You'll notice the menu which helps select different style sheets for preparing clean printable versions, not far from the LaTeX output usually converted to PDF. I hope this will pave the way to more Linked Research, and less opaque publications. The only hassle at the moment is the conversion from LaTeX to HTML5, which I'm doing manually, in Emacs + nxml-mode. Update: Check the preprint links in my publications page, for more papers.

27 March 2015

Olivier Berger: New short paper : Designing a virtual laboratory for a relational database MOOC with Vagrant, Debian, etc.

Here's a short preview of our latest accepted paper (to appear at CSEDU 2015), about the construction of VMs for the Relational Database MOOC using Vagrant, Debian, PostgreSQL (previous post), etc.:

Designing a virtual laboratory for a relational database MOOC
Olivier Berger, J Paul Gibson, Claire Lecocq and Christian Bac
Keywords: Remote Learning, Virtualization, Open Education Resources, MOOC, Vagrant
Abstract: Technical advances in machine and system virtualization are creating opportunities for remote learning to provide significantly better support for active education approaches. Students now, in general, have personal computers that are powerful enough to support virtualization of operating systems and networks. As a consequence, it is now possible to provide remote learners with a common, standard, virtual laboratory and learning environment, independent of the different types of physical machines on which they work. This greatly enhances the opportunity for producing re-usable teaching materials that are actually re-used. However, configuring and installing such virtual laboratories is technically challenging for teachers and students. We report on our experience of building a virtual machine (VM) laboratory for a MOOC on relational databases. The architecture of our virtual machine is described in detail, and we evaluate the benefits of using the Vagrant tool for building and delivering the VM.
TOC:
  • Introduction
    • A brief history of distance learning
    • Virtualization : the challenges
    • The design problem
  • The virtualization requirements
    • Scenario-based requirements
    • Related work on requirements
    • Scalability of existing approaches
  • The MOOC laboratory
    • Exercises and lab tools
    • From requirements to design
  • Making the VM as a Vagrant box
    • Portability issues
    • Delivery through Internet
    • Security
    • Availability of the box sources
  • Validation
    • Reliability Issues with VirtualBox
    • Student feedback and evaluation
  • Future work
    • Laboratory monitoring
    • More modular VMs
  • Conclusions

13 February 2015

Olivier Berger: Testing the RuneStone interactive Python courses server in docker

I've been working on setting up a Docker container environment for testing the RuneStone Interactive server. RuneStone Interactive allows the publication of courses containing interactive Python examples. While most of the content is static (the Python examples are run inside a Python interpreter implemented in JavaScript, hence locally in the JS VM of the Web browser), the tool also offers an environment for monitoring the progress of learners in a course, which is dynamic and is queried by the browser over AJAX APIs. That's the part which I wanted to be able to operate for test purposes. As it is a web2py application, it's not exactly obvious to gather all dependencies and run it locally. Well, in fact it is, but I want to understand the architecture of the tool in order to understand the deployment constraints, so making a docker image helps for this purpose. With the resulting image, it's now easier to test the writing of a new course (yet another container above the latter one), and directly test for real.

6 February 2015

Olivier Berger: Configuring the start of multiple docker containers with Vagrant in a portable manner

I've mentioned earlier the work that our students did on migrating part of the elements of the Database MOOC lab VM to docker. While docker seems quite cool, let's face it, participants in the MOOCs aren't all using Linux, where docker is available directly. Hence the need to use boot2docker, for instance on Windows. Then we're back quite close to the architecture of the Vagrant VM, which also relies on a VirtualBox VM to run a Linux machine (boot2docker does exactly that, with a minimal Linux which runs docker). If VirtualBox is to be kept around, then why not stick to Vagrant too, as it offers a docker provider? This docker provider for Vagrant helps configure basic parameters of docker containers in a Vagrantfile, and basically uses the vagrant up command instead of docker build + docker run. On Linux, it only triggers docker; otherwise, it'll start boot2docker (or any other Linux box) in between. This offers a unified invocation command, which makes the documentation a bit more portable.

Now, there are some tricks when using this docker provider, in particular for debugging what's happening inside the VM. One nice feature is that you can debug on Linux what is to be executed on Windows, by explicitly requiring the start of the intermediary boot2docker VM even if it's not really needed. By using a custom secondary Vagrantfile for that VM, it is possible to tune some of its parameters (like its graphic memory, so that it can be started with a GUI to connect to; another alternative is to ssh -p 2222 docker@localhost, once you know that its password is tcuser). I've committed an example of such a setup in the moocbdvm project's Git, which duplicates the docker provisioning files that our students had already published in the dedicated GitHub repo. Here's an interesting reference post about Vagrant + docker and multiple containers, btw.

4 December 2014

Olivier Berger: Shell script for connecting to a Shibboleth protected web app with curl

Here's a shell script I've created (reusing one meant for CAS protected resources), which allows connecting to a Web application protected by the Shibboleth SSO mechanism.
It uses cURL to navigate through the various jumps required by the protocol, perform the necessary posts, etc.
I haven't read the Shibboleth specs, so it may not be the best way, and may not work in all cases, but that was enough for my case, at least. Feel free to improve it on GitHub Gists. After the connection is successful, one may reuse the .cookieJar file to perform further cURL connections, or even some automated content mirroring with httrack, for instance (see a previous experiment of mine with httrack for Moodle).
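The saved cookies can also be reused outside of cURL. As a minimal sketch, assuming the script wrote .cookieJar in the Netscape cookie file format that curl's cookie-jar option produces, Python's standard library can load it; the function name is hypothetical and this is not part of the original script.

```python
import http.cookiejar
import urllib.request


def opener_from_curl_jar(path):
    """Load a Netscape-format cookie file and build a URL opener that sends its cookies."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies too: they are typically stored with an expiry of 0.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener
```

A call such as opener_from_curl_jar(".cookieJar") returns the loaded jar and an opener whose open() method can then fetch pages of the protected application. Depending on the Python version, cookies that curl marks as HttpOnly may be skipped when loading.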

1 December 2014

Olivier Berger: Offline backup/mirror of a Moodle course, using httrack

I haven't found many details online on how to perform, using httrack, a Moodle course mirror that can be browsed offline. This could be useful both for backup purposes and for distant learners with connectivity issues. In my case, there's a login/password dialog that grants access to the moodle platform, which can be processed by httrack by capturing the POST form results using the catchurl option. The strategy I've used is to add filters so that everything is excluded, and only explicitly mentioned filters are then allowed to be mirrored. This allows performing the backup while connected as a user that may have high privileges, while avoiding getting lost in loops or in the complex link following needed for UI rendering variants of Moodle's interface.
Here's an example command line:
httrack -v -z -%F "Mirrored [from host %s [file %s [at %s]]]" -N "%h%p/%n%[id].%t" ">postfile:/home/myself/websites/mycourse/hts-post0" "" -O "/home/myself/websites/mycourse" "-*/*" "+/login/index.php*" "+*/course/view.php*" "+*/mod/resource/view.php*" "+*/mod/page/view.php*" "+*/mod/forum/view.php*" "+*/mod/forum/discuss.php?d=*[0-9]" "+*/mod/url/view.php*" "+*/pluginfile.php/*" "+*/mod/feedback/view.php*" "+*/mod/feedback/analysis.php*" "+*/theme/*" "-*/course/view.php?id=43"
The filters at the end of the command first exclude everything ("-*/*") and then allow only the explicitly listed URL patterns. I hope this will be useful.
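To reason about which pages the filters in the command line above would keep, the exclude-everything-then-allow strategy can be emulated roughly. This sketch assumes that a later matching filter takes priority over earlier ones, and it uses Python's fnmatch wildcards as a stand-in: they only approximate HTTrack's own pattern language. The host name and the default decision are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the filters from the command line, in the same order.
FILTERS = [
    "-*/*",
    "+*/course/view.php*",
    "+*/pluginfile.php/*",
    "-*/course/view.php?id=43",
]


def is_allowed(url, filters=FILTERS, default=True):
    """Apply +/- filters in order; the last one matching the URL decides."""
    decision = default
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            decision = (sign == "+")
    return decision


for url in (
    "moodle.example.org/course/view.php?id=42",
    "moodle.example.org/course/view.php?id=43",
    "moodle.example.org/user/profile.php",
):
    print(url, is_allowed(url))
```

With these rules, the first URL is kept, the second is rejected by the final exclusion, and the third is rejected because only the catch-all exclusion matches it.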

20 August 2014

Olivier Berger: Building a lab VM based on Debian for a MOOC, using Vagrant + VirtualBox

We've been busy setting up a Virtual Machine (VM) image to be used by participants of a MOOC that's opening in early September on Relational Databases at Telecom SudParis. We've chosen to use Vagrant and VirtualBox, which are used to build, distribute and run the box, providing scriptability (reproducibility) and making it portable on most operating systems. The VM itself contains a Debian (jessie) minimal system which runs (in the background) PostgreSQL, Apache + mod_php, phpPgAdmin, and a few applications of our own to play with example databases already populated in PostgreSQL.
As the MOOC's language will be French, we expect the box to be used mostly on machines with azerty keyboards. This and other context elements led us to add some customizations (locale, APT mirror) in provisioning scripts run during the box creation. At the moment, we generate 2 variants of the box, one for 32 bits kernel (i686) and one for 64 bits kernel (amd64), which (once compressed) represent between 300 and 350 MB. The resulting boxes are uploaded to a self-hosting site, and distributed through vagrantcloud. Once the VMs are created in VirtualBox, the typical VMDK drives file is around 1.3 GB. We use our own Debian base boxes containing a minimal Debian jessie/testing, instead of relying on someone else's, and recreate them using (the development branch version of) bootstrap-vz. This ensures we can put more trust in the content, as it's a native Debian package installation without MITM intervention. The VMs are meant to be run headless for the moment, keeping their size to the minimum, even though we also provide a script to install and configure a desktop environment based on XFCE4. The applications are used either through vagrant ssh, for instance for the SQL command line in psql, or in the Web browser, for our own Web based SQL exerciser or phpPgAdmin (see a demo screencast (in French, w/ English subtitles)). They can then be used even off-line by the participants, which also means that no server availability is required from our IT staff.
The MOOC includes a section on PHP + SQL programming, whose exercises can be performed using a shared sub-folder of /vagrant/, which allows editing on the host with the favourite native editor/IDE, while running PHP inside the VM's Apache + mod_php. The sources of our environment are available as free software, if you're interested in replicating a similar environment for another project. As we're still polishing the environment before the MOOC opening (on September 10th), I'm not mentioning the box URLs, but they shouldn't be too hard to find if you're investigating (referring to the fusionforge project's web site). We don't know yet how suitable this environment will be for learning SQL and database design and programming, and if Vagrant will bring more difficulties than benefits. Still, we hope that the participants will find this practical, allowing them to work on the lab / exercises whenever and wherever they choose, removing the pain of installing and configuring a RDBMS on their machines, or the need to be connected to a cloud or to our overloaded servers. Of course, one limitation will be the requirements on the host machines, which will need to be reasonably modern in order to run a virtualized Linux system. Another is access to high bandwidth for downloading the boxes, but this is kind of a requirement already for downloading/watching the videos of the MOOC classes ;-) Big thanks go to our intern Stéphane Germain, who joined us this summer to work on this virtualized environment.

14 May 2014

Olivier Berger: Using RDFAlchemy together with RDFLib's SPARQLStore to query DBPedia and process resources in an OO way

I've been searching for interesting ways to manipulate RDF graphs in Python, to create an application that would handle Linked Data Resources in an OO way, i.e. using Python classes and not tables/sets/lists of triples. The data will be persisted in graphs in a triple store, accessed through a SPARQL endpoint. In this post, I'll illustrate how I managed to tie RDFLib's SPARQLStore plugin and RDFAlchemy together to reach a rather nice-looking result.
RDFLib provides tools to manipulate graphs, but most of the examples I found didn't load Graph instances from SPARQL, and generally used SPARQLWrapper results (tables) manually. Here comes a first tool to the rescue: SPARQLStore, which allows one to dynamically query a remote SPARQL endpoint when navigating an RDFLib graph. I couldn't find a lot of documentation about it, but I found some hints in slide 51 of the excellent presentation by Tatiana Al-Chueyr, Linking the world with Python and Semantics.
Now that I know how to manipulate RDFLib Graphs queried from a remote SPARQL endpoint with the usual methods, I'm still not satisfied, because I want OO stuff, not lists of (predicate, object) tuples. Here comes the second tool, RDFAlchemy, which allows one to create descriptor classes mapping RDF Resources to Python classes. Here again, there aren't many docs available, but I found the excellent tutorial by Ted Lawless, Reading and writing RDF for VIVO with RDFAlchemy.
Now we can put the 2 pieces together, modulo a few precautions: the SPARQLStore plugin of RDFLib will generate SPARQL queries every time a graph is navigated, whereas RDFAlchemy expects the graph to be in memory (at least AFAIU). So we'll have to manually pre-load the contents of the graphs that we need for all the attributes of the descriptor classes.
I've written some example code that illustrates this by trying to query French films in Wikipedia. Here's a copy of the gist. Attention, it will query DBPedia a lot, so pay attention to the bandwidth and memory if you change parameters. It isn't perfect, and I still need to investigate the benefits and limitations of the approach. One clear limitation is the number of SPARQL queries made on the endpoint, instead of a smarter pre-loading. Also, on a side note, during tests I could spot a few issues with SPARQLStore, which seems to be lagging behind, probably because it's not used by so many people.
The RDFAlchemy project doesn't seem to be in great health, mostly unmaintained from what I can see (and notice that I linked to the GitHub clone/fork that seemed to be the latest maintained, while the original author's seems dead), but nevertheless, the code works with a more recent RDFLib, so that's not so bad. Stay tuned for more adventures in Linked Data land in Python.
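To make the pre-loading precaution more concrete, here is a minimal, standard-library-only sketch of the idea: fetch the triples for a resource once, keep them in memory, and let descriptor classes read attributes from that cache. It is not the gist mentioned above, and it uses neither RDFLib's nor RDFAlchemy's actual API; `TripleCache`, `Single`, `Multiple`, `Film`, `fake_fetch` and the `example.org` URIs are all hypothetical names chosen for illustration, and `fake_fetch` merely stands in for a remote SPARQL query.

```python
from collections import defaultdict

# Hypothetical namespace and sample resource, for illustration only.
EX = "http://example.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FILM_URI = EX + "film/1"


class TripleCache:
    """In-memory index of (subject, predicate) -> list of objects."""

    def __init__(self):
        self._index = defaultdict(list)
        self.fetch_count = 0  # how many times the "endpoint" was queried

    def preload(self, fetch, subject):
        """Call fetch(subject) once and store the (predicate, object) pairs."""
        self.fetch_count += 1
        for predicate, obj in fetch(subject):
            self._index[(subject, predicate)].append(obj)

    def objects(self, subject, predicate):
        """Return the cached objects, without touching the endpoint."""
        return list(self._index.get((subject, predicate), []))


class Single:
    """Descriptor returning the first cached object for a predicate, or None."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:
            return self
        values = instance.cache.objects(instance.uri, self.predicate)
        return values[0] if values else None


class Multiple(Single):
    """Descriptor returning all cached objects for a predicate, as a list."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.cache.objects(instance.uri, self.predicate)


class Film:
    """Python view of a resource: attributes are looked up in the cache."""

    label = Single(RDFS_LABEL)
    directors = Multiple(EX + "director")

    def __init__(self, cache, uri):
        self.cache = cache
        self.uri = uri


def fake_fetch(subject):
    """Stand-in for a remote query; returns made-up (predicate, object) pairs."""
    sample = {
        FILM_URI: [
            (RDFS_LABEL, "Example film"),
            (EX + "director", EX + "person/1"),
            (EX + "director", EX + "person/2"),
        ]
    }
    return sample.get(subject, [])


cache = TripleCache()
cache.preload(fake_fetch, FILM_URI)  # one fetch per resource
film = Film(cache, FILM_URI)
print(film.label, film.directors, cache.fetch_count)
```

However many times `film.label` or `film.directors` is read afterwards, the counter stays at one fetch per pre-loaded resource, which is the behaviour the descriptor classes rely on here.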

2 May 2014

Olivier Berger: Debian docker containers using a modified baseimage-docker

I have been testing Docker for a few weeks now, and investigated the use of baseimage-docker, which provides support for supervising services with runit, and includes OpenSSH, among other things, based on an Ubuntu base system. Of course, I'm interested in a Debian counterpart. I had initially followed instructions provided by Steve Kemp, who also prepared a Debian image including OpenSSH and runit, but it appears that baseimage-docker provides more tiny bits that avoid reinventing the wheel. I've then forked baseimage-docker to do a quick and dirty adaptation for Debian. There's a sid variant (my debian branch) and a wheezy one (my wheezy branch, unsurprisingly). I haven't used all the features of baseimage-docker, so some things may well break. For the record, I'm playing with it as a base image to construct a docker-based container running the FusionForge test suite. Did I warn you it's quick and dirty and without any warranty? Hoping that this is useful anyway.

7 April 2014

Olivier Berger: Tagged a first version of the TWiki to FusionForge's MediaWiki converter

As announced previously, I've been hacking on a migration tool that converts the contents of a TWiki wiki and imports them into the MediaWiki of a FusionForge project. I've successfully imported a first project (from PicoForge to FusionForge) using the tool, so I've decided to tag a first release and make the Git repo accessible. More details at : Feel free to ask here in the comments or by email, in case of need. And, yes, my Python is most likely awful, but at least this works, and it is much more featureful than the existing tools I could test.

30 March 2014

Olivier Berger: Working on a TWiki to MediaWiki converter (targeting FusionForge wikis)

I'm currently working on a wiki converter allowing me to transfer old TWiki wikis (hosted on picoforge) to MediaWikis hosted on FusionForge. Unlike the existing tools I've found that more or less target the same needs, mine will address two peculiarities: The tool is written in Python, and will include my own crappy wiki syntax converter in Python, instead of spawning existing Perl scripts, as others did. It may happen to work for FosWiki too, but I don't intend to use it beyond our old TWiki installations, for a start. Stay tuned for more progress updates. Edit: I've now released a first version.
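For readers wondering what a wiki syntax converter in Python boils down to, here is a minimal sketch covering a handful of TWiki constructs (headings, bullets, bold, italic and external links) and their MediaWiki equivalents. It is not the code of the tool described above; the function names and regular expressions are hypothetical, and a real converter has many more cases to handle (tables, macros, attachments, WikiWords, etc.).

```python
import re

# TWiki headings look like "---+ Title", "---++ Title", and so on.
HEADING = re.compile(r"^---(\++)\s*(.*?)\s*$")
# TWiki bullets are indented by multiples of three spaces before the "*".
BULLET = re.compile(r"^((?: {3})+)\*\s+(.*)$")
# *bold* and _italic_, only when the markers are not glued to a word.
BOLD = re.compile(r"(?<![\w*])\*(\S(?:.*?\S)?)\*(?![\w*])")
ITALIC = re.compile(r"(?<![\w_])_(\S(?:.*?\S)?)_(?![\w_])")
# [[http://url][label]] external links.
EXT_LINK = re.compile(r"\[\[(https?://[^\]\s]+)\]\[([^\]]+)\]\]")


def convert_inline(text):
    """Convert inline markup: links first, then bold, then italic."""
    text = EXT_LINK.sub(r"[\1 \2]", text)
    text = BOLD.sub(r"'''\1'''", text)
    text = ITALIC.sub(r"''\1''", text)
    return text


def convert_line(line):
    """Convert one line of TWiki markup to MediaWiki markup."""
    match = HEADING.match(line)
    if match:
        marks = "=" * len(match.group(1))
        return f"{marks} {convert_inline(match.group(2))} {marks}"
    match = BULLET.match(line)
    if match:
        depth = len(match.group(1)) // 3  # three spaces per nesting level
        return "*" * depth + " " + convert_inline(match.group(2))
    return convert_inline(line)


def convert(text):
    """Convert a whole page, line by line."""
    return "\n".join(convert_line(line) for line in text.splitlines())


page = "---++ Setup\n   * a *bold* word\n      * nested _item_"
print(convert(page))
```

Working line by line keeps the sketch short, at the cost of ignoring constructs that span several lines; that trade-off is a choice made for this example only.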