Search Results: "Kunal Mehta"

2 September 2022

Kunal Mehta: Kiwix in Debian, 2022 update

Previous updates: 2018, 2021.

Kiwix is an offline content reader, best known for distributing copies of Wikipedia. I have been maintaining it in Debian since 2017.

This year most of the work has been keeping all the packages up to date in anticipation of next year's Debian 12 Bookworm release, including several transitions for new libzim and libkiwix versions. The Debian Package Tracker makes it really easy to keep an eye on all Kiwix-related packages.

All of the "user-facing" packages (zim-tools, kiwix-tools, kiwix) now have very basic autopkgtests that can provide a bit of confidence that the package isn't totally broken. I recommend reading the "FAQ for package maintainers" to learn about all the benefits you get from having autopkgtests.

Finally, back in March I wrote a blog post, How to mirror the Russian Wikipedia with Debian and Kiwix, which got significant readership (compared to most posts on this blog), including being quoted by LWN!

We are always looking for more contributors; please reach out if you're interested. The Kiwix team is one of my favorite groups of people to work with and they love Debian too.

15 March 2022

Kunal Mehta: How to mirror the Russian Wikipedia with Debian and Kiwix

It has been reported that the Russian government has threatened to block access to Wikipedia for documenting narratives that do not agree with the official position of the Russian government.

One of the anti-censorship strategies I've been working on is Kiwix, an offline Wikipedia reader (and plenty of other content too). Kiwix is free and open source software developed by a great community of people that I really enjoy working with. With threats of censorship, traffic to Kiwix has increased fifty-fold, with users from Russia accounting for 40% of new downloads!

You can download copies of every language of Wikipedia for offline reading and distribution, as well as host your own read-only mirror, which I'm going to explain today.

Disclaimer: depending on where you live, it may be illegal or get you in trouble with the authorities to rehost Wikipedia content; please be aware of your digital and physical safety before proceeding.

With that out of the way, let's get started. You'll need a Debian (or Ubuntu) server with at least 30GB of free disk space. You'll also want to have a webserver like Apache or nginx installed (I'll share the Apache config here).

First, we need to download the latest copy of the Russian Wikipedia.
$ wget 'https://download.kiwix.org/zim/wikipedia/wikipedia_ru_all_maxi_2022-03.zim'
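A file this large may not arrive in one go on a flaky connection. As a minimal sketch (the function name fetch_with_retry and its default try count and delay are my own assumptions, not something Kiwix or wget ships), the download can be wrapped in a retry loop around wget -c, which resumes a partial file rather than starting over:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original post): keep calling
# `wget -c` until the download completes or we run out of attempts.
# Usage: fetch_with_retry URL [MAX_TRIES] [DELAY_SECONDS]
fetch_with_retry() {
    url=$1
    max_tries=${2:-5}
    delay=${3:-30}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # -c resumes a partially downloaded file instead of starting over.
        if wget -c "$url"; then
            return 0
        fi
        echo "attempt $try of $max_tries failed" >&2
        # Pause before the next attempt, but not after the final one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}

# Example call (commented out so that sourcing this file downloads nothing):
# fetch_with_retry 'https://download.kiwix.org/zim/wikipedia/wikipedia_ru_all_maxi_2022-03.zim'
```

The function returns non-zero if every attempt fails, so it can be chained with && in a larger script.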
If the download is interrupted or fails, you can use wget -c $url to resume it. Next let's install kiwix-serve and try it out. If you're using Ubuntu, I strongly recommend enabling our Kiwix PPA first.
$ sudo apt update
$ sudo apt install kiwix-tools
$ kiwix-serve -p 3004 wikipedia_ru_all_maxi_2022-03.zim
At this point you should be able to visit http://yourserver.com:3004/ and see the Russian Wikipedia. Awesome! You can use any available port; I just picked 3004. Now let's use systemd to daemonize it so it runs in the background. Create /etc/systemd/system/kiwix-ru-wp.service with the following:
[Unit]
Description=Kiwix Russian Wikipedia
[Service]
Type=simple
User=www-data
ExecStart=/usr/bin/kiwix-serve -p 3004 /path/to/wikipedia_ru_all_maxi_2022-03.zim
Restart=always
[Install]
WantedBy=multi-user.target
Now let's start it and enable it at boot:
$ sudo systemctl start kiwix-ru-wp
$ sudo systemctl enable kiwix-ru-wp
Since we want to expose this on the public internet, we should put it behind a more established webserver and configure HTTPS. Here's the Apache httpd configuration I used:
<VirtualHost *:80>
        ServerName ru-wp.yourserver.com
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
        <Proxy *>
                Require all granted
        </Proxy>
        ProxyPass / http://127.0.0.1:3004/
        ProxyPassReverse / http://127.0.0.1:3004/
</VirtualHost>
Put that in /etc/apache2/sites-available/kiwix-ru-wp.conf and run:
$ sudo a2ensite kiwix-ru-wp
$ sudo systemctl reload apache2
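The ProxyPass and ProxyPassReverse directives in that configuration come from Apache's mod_proxy, and proxying to an http:// backend also needs mod_proxy_http. Neither is necessarily enabled on a fresh install. The following is a rough sketch (the function name missing_proxy_modules is hypothetical) that parses the output of apache2ctl -M and prints whichever of the two modules still needs enabling:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the a2enmod names
# of the proxy modules the vhost needs that Apache has not loaded.
missing_proxy_modules() {
    # `apache2ctl -M` lists loaded modules, one per line, e.g.
    # " proxy_module (shared)".
    loaded=$(apache2ctl -M 2>/dev/null)
    for mod in proxy proxy_http; do
        case "$loaded" in
            *" ${mod}_module "*) ;;   # already loaded, nothing to print
            *) echo "$mod" ;;
        esac
    done
}

# Example use (commented out because it needs root and changes the system):
# for m in $(missing_proxy_modules); do sudo a2enmod "$m"; done
# sudo systemctl restart apache2
```

If the function prints nothing, both modules are already loaded and no further action is needed.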
Finally, I used certbot to enable HTTPS on that subdomain and redirect all HTTP traffic over to HTTPS. This is an interactive process that is well documented so I'm not going to go into it in detail.

You can see my mirror of the Russian Wikipedia, following these instructions, at https://ru-wp.legoktm.com/. Anyone is welcome to use it or distribute the link, though I am not committing to running it long-term.

This is certainly not a perfect anti-censorship solution: the copy of Wikipedia that Kiwix provides became out of date the moment it was created, and the setup described here will require you to manually update the service when the new copy is available next month.

Finally, if you have some extra bandwidth, you can also help seed this as a torrent.
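The manual monthly update mentioned above could be scripted along these lines. This is only a sketch under several assumptions that are not in the steps above: the directory /srv/kiwix is made up, the ExecStart line of the unit would need to point at a stable symlink (here current.zim) instead of a dated file name, and future dumps are assumed to keep the same wikipedia_LANG_all_maxi_YYYY-MM.zim naming as the 2022-03 file.

```shell
#!/bin/sh
# Sketch of a monthly update (assumptions, not from the original post):
# - ZIM files live in $ZIM_DIR (hypothetical path)
# - the systemd unit serves $ZIM_DIR/current.zim, a symlink
# - new dumps follow the same naming scheme as the 2022-03 file
ZIM_DIR=${ZIM_DIR:-/srv/kiwix}

# Build the download URL. $1 = language code, $2 = dump month as YYYY-MM.
zim_url() {
    echo "https://download.kiwix.org/zim/wikipedia/wikipedia_${1}_all_maxi_${2}.zim"
}

# Download the dump, repoint the symlink, restart the service (needs root).
update_mirror() {
    url=$(zim_url "$1" "$2")
    file="$ZIM_DIR/$(basename "$url")"
    # -c resumes a partial download; -P chooses the target directory.
    wget -c -P "$ZIM_DIR" "$url" || return 1
    # -n replaces the existing symlink itself rather than following it.
    ln -sfn "$file" "$ZIM_DIR/current.zim"
    systemctl restart kiwix-ru-wp
}

# Example call (commented out; run as root once a new dump is published):
# update_mirror ru 2022-04
```

Because the symlink is only swapped after wget succeeds, a failed download leaves the old copy in service.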

19 August 2021

Kunal Mehta: Kiwix returns in Debian Bullseye

(This is my belated #newindebianbullseye post.)

The latest version of the Debian distro, 11.0 aka Bullseye, was released last week and after a long absence, includes Kiwix! Previously in Debian 10/Buster, we only had the underlying C/C++ libraries available.

If you're not familiar with it, Kiwix is an offline content reader, providing Wikipedia, Gutenberg, TED talks, and more in ZIM (.zim) files that can be downloaded and viewed entirely offline. You can get the entire text of the English Wikipedia in less than 100GB.

apt install kiwix will get you a graphical desktop application that allows you to download and read ZIMs. apt install kiwix-tools installs kiwix-serve (among others), which serves ZIM files over an HTTP server. Additionally, there are now tools in Debian that allow you to create your own ZIM files: zimwriterfs and the python3-libzim library.

All of this would not have been possible without the support of the Kiwix developers, who made it a priority to properly support Debian. All of the Kiwix and repositories have a CI process that builds Debian packages for each pull request and needs to pass before it'll be accepted.

Ubuntu users can take advantage of our primary PPA or the bleeding-edge PPA. For Debian users, my goal is that unstable/sid will have the latest version within a few days of a release, and once it moves into testing, it'll be available in Debian Backports.

It is always a pleasure working with the Kiwix team, who make a point to send stickers and chocolate every year :)

14 December 2020

Kunal Mehta: Starting a new job

Last week I officially joined the Site Reliability Engineering team at the Wikimedia Foundation. I'll be working with the Service Operations team, which "...takes care of public and user-visible services." I'm glad to be back at the WMF; I had originally started working there in 2013 but recently took a break to finish school. SRE will be my ninth distinct team at the WMF, and I'm looking forward to even more adventures. As part of transitioning into my new role, I have unsubscribed myself from most MediaWiki bug mail and Gerrit notifications. Once I get more situated I'll put out a more detailed request for new maintainers for the components that need them. I'll continue taking care of maintenance as needed until then. P.S.: I created a new userbox about Rust on mediawiki.org.

12 May 2020

Kunal Mehta: MediaWiki packages for Ubuntu 20.04 Focal available

Packages for the MediaWiki 1.31 LTS release are now available for the new Ubuntu 20.04 LTS "Focal Fossa" release in my PPA. Please let me know if you run into any errors or issues. In the future these packages will be upgraded to the MediaWiki 1.35 LTS release, whenever that's ready. It's currently delayed because of the pandemic, but I expect that it'll be ready for the next Debian release.

7 March 2017

Bits from Debian: New Debian Developers and Maintainers (January and February 2017)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!