Search Results: "jwilk"

1 October 2023

Paul Wise: FLOSS Activities September 2023

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Spam: reported 2 Debian bug reports
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:
    • approved fzf lame lsd termshark vifm
    • rejected orthanc (private data), gpr/orthanc (Windows), qrencode (random QR codes), weboob-qt (chess website)

Administration
  • Debian IRC: fix #debian-pkg-security topic/metadata
  • Debian wiki: unblock IP addresses, approve accounts

Communication

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

22 October 2021

Enrico Zini: Scanning for imports in Python scripts

I had to package a nontrivial Python codebase, and I needed to put dependencies in setup.py. I could do git grep -h import sort -u, then review the output by hand, but I lacked the motivation for it. Much better to take a stab at solving the general problem The result is at https://github.com/spanezz/python-devel-tools. One fun part is scanning a directory tree, using ast to find import statements scattered around the code:
class Scanner:
    def __init__(self):
        self.names: Set[str] = set()
    def scan_dir(self, root: str):
        for dirpath, dirnames, filenames, dir_fd in os.fwalk(root):
            for fn in filenames:
                if fn.endswith(".py"):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        self.scan_file(fd, os.path.join(dirpath, fn))
                st = os.stat(fn, dir_fd=dir_fd)
                if st.st_mode & (stat.S_IXUSR   stat.S_IXGRP   stat.S_IXOTH):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        try:
                            lead = fd.readline()
                        except UnicodeDecodeError:
                            continue
                        if re_python_shebang.match(lead):
                            fd.seek(0)
                            self.scan_file(fd, os.path.join(dirpath, fn))
    def scan_file(self, fd: TextIO, pathname: str):
        log.info("Reading file %s", pathname)
        try:
            tree = ast.parse(fd.read(), pathname)
        except SyntaxError as e:
            log.warning("%s: file cannot be parsed", pathname, exc_info=e)
            return
        self.scan_tree(tree)
    def scan_tree(self, tree: ast.AST):
        for stm in tree.body:
            if isinstance(stm, ast.Import):
                for alias in stm.names:
                    if not isinstance(alias.name, str):
                        print("NAME", repr(alias.name), stm)
                    self.names.add(alias.name)
            elif isinstance(stm, ast.ImportFrom):
                if stm.module is not None:
                    self.names.add(stm.module)
            elif hasattr(stm, "body"):
                self.scan_tree(stm)
Another fun part is grouping the imported module names by where in sys.path they have been found:
    scanner = Scanner()
    scanner.scan_dir(args.dir)
    sys.path.append(args.dir)
    by_sys_path: Dict[str, List[str]] = collections.defaultdict(list)
    for name in sorted(scanner.names):
        spec = importlib.util.find_spec(name)
        if spec is None or spec.origin is None:
            by_sys_path[""].append(name)
        else:
            for sp in sys.path:
                if spec.origin.startswith(sp):
                    by_sys_path[sp].append(name)
                    break
            else:
                by_sys_path[spec.origin].append(name)
    for sys_path, names in sorted(by_sys_path.items()):
        print(f" sys_path or 'unidentified' :")
        for name in names:
            print(f"   name ")
An example. It's kind of nice how it can at least tell apart stdlib modules so one doesn't need to read through those:
$ ./scan-imports  /himblick
unidentified:
  changemonitor
  chroot
  cmdline
  mediadir
  player
  server
  settings
  static
  syncer
  utils
 /himblick:
  himblib.cmdline
  himblib.host_setup
  himblib.player
  himblib.sd
/usr/lib/python3.9:
  __future__
  argparse
  asyncio
  collections
  configparser
  contextlib
  datetime
  io
  json
  logging
  mimetypes
  os
  pathlib
  re
  secrets
  shlex
  shutil
  signal
  subprocess
  tempfile
  textwrap
  typing
/usr/lib/python3/dist-packages:
  asyncssh
  parted
  progressbar
  pyinotify
  setuptools
  tornado
  tornado.escape
  tornado.httpserver
  tornado.ioloop
  tornado.netutil
  tornado.web
  tornado.websocket
  yaml
built-in:
  sys
  time
Maybe such a tool already exists and works much better than this? From a quick search I didn't find it, and it was fun to (re)invent it. Updates: Jakub Wilk pointed out to an old python-modules script that finds Debian dependencies. The AST scanning code should be refactored to use ast.NodeVisitor.

30 September 2017

Chris Lamb: Free software activities in September 2017

Here is my monthly update covering what I have been doing in the free software world in September 2017 (previous month):
Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to allow verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area. This month I:
  • Published a short blog post about how to determine which packages on your system are reproducible. [...]
  • Submitted a pull request for Numpy to make the generated config.py files reproducible. [...]
  • Provided a patch to GTK upstream to ensure the immodules.cache files are reproducible. [...]
  • Within Debian:
    • Updated isdebianreproducibleyet.com, moving it to HTTPS, adding cachebusting as well as keeping the number up-to-date.
    • Submitted the following patches to fix reproducibility-related toolchain issues:
      • gdk-pixbuf: Make the output of gdk-pixbuf-query-loaders reproducible. (#875704)
      • texlive-bin: Make PDF IDs reproducible. (#874102)
    • Submitted a patch to fix a reproducibility issue in doit.
  • Categorised a large number of packages and issues in the Reproducible Builds "notes" repository.
  • Chaired our monthly IRC meeting. [...]
  • Worked on publishing our weekly reports. (#123, #124, #125, #126 & #127)


I also made the following changes to our tooling:
reproducible-check

reproducible-check is our script to determine which packages actually installed on your system are reproducible or not.

  • Handle multi-architecture systems correctly. (#875887)
  • Use the "restricted" data file to mask transient issues. (#875861)
  • Expire the cache file after one day and base the local cache filename on the remote name. [...] [...]
I also blogged about this utility. [...]
diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

  • Filed an issue attempting to identify the causes behind an increased number of timeouts visible in our CI infrastructure, including running a number of benchmarks of recent versions. (#875324)
  • New features:
    • Add "binwalking" support to analyse concatenated CPIO archives such as initramfs images. (#820631).
    • Print a message if we are reading data from standard input. [...]
  • Bug fixes:
    • Loosen matching of file(1)'s output to ensure we correctly also match TTF files under file version 5.32. [...]
    • Correct references to path_apparent_size in comparators.utils.file and self.buf in diffoscope.diff. [...] [...]
  • Testing:
    • Make failing some critical flake8 tests result in a failed build. [...]
    • Check we identify all CPIO fixtures. [...]
  • Misc:
    • No need for try-assert-except block in setup.py. [...]
    • Compare types with identity not equality. [...] [...]
    • Use logging.py's lazy argument interpolation. [...]
    • Remove unused imports. [...]
    • Numerous PEP8, flake8, whitespace, other cosmetic tidy-ups.

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build.

  • Log which handler processed a file. (#876140). [...]

disorderfs

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues.



Debian My activities as the current Debian Project Leader are covered in my monthly "Bits from the DPL" email to the debian-devel-announce mailing list.
Lintian I made a large number of changes to Lintian, the static analysis tool for Debian packages. It reports on various errors, omissions and general quality-assurance issues to maintainers: I also blogged specifically about the Lintian 2.5.54 release.

Patches contributed
  • debconf: Please add a context manager to debconf.py. (#877096)
  • nm.debian.org: Add pronouns to ALL_STATUS_DESC. (#875128)
  • user-setup: Please drop set_special_users hack added for "the convenience of heavy testers". (#875909)
  • postgresql-common: Please update README.Debian for PostgreSQL 10. (#876438)
  • django-sitetree: Should not mask test failures. (#877321)
  • charmtimetracker:
    • Missing binary dependency on libqt5sql5-sqlite. (#873918)
    • Please drop "Cross-Platform" from package description. (#873917)
I also submitted 5 patches for packages with incorrect calls to find(1) in debian/rules against hamster-applet, libkml, pyferret, python-gssapi & roundcube.

Debian LTS

This month I have been paid to work 15 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Documented an example usage of autopkgtests to test security changes.
  • Issued DLA 1084-1 and DLA 1085-1 for libidn and libidn2-0 to fix an integer overflow vulnerabilities in Punycode handling.
  • Issued DLA 1091-1 for unrar-free to prevent a directory traversal vulnerability from a specially-crafted .rar archive. This update introduces an regression test.
  • Issued DLA 1092-1 for libarchive to prevent malicious .xar archives causing a denial of service via a heap-based buffer over-read.
  • Issued DLA 1096-1 for wordpress-shibboleth, correcting an cross-site scripting vulnerability in the Shibboleth identity provider module.

Uploads
  • python-django:
    • 1.11.5-1 New upstream security release. (#874415)
    • 1.11.5-2 Apply upstream patch to fix QuerySet.defer() with "super" and "subclass" fields. (#876816)
    • 2.0~alpha1-2 New upstream alpha release of Django 2.0, dropping support for Python 2.x.
  • redis:
    • 4.0.2-1 New upstream release.
    • 4.0.2-2 Update 0004-redis-check-rdb autopkgtest test to ensure that the redis.rdb file exists before testing against it.
    • 4.0.2-2~bpo9+1 Upload to stretch-backports.
  • aptfs (0.11.0-1) New upstream release, moving away from using /var/lib/apt/lists internals. Thanks to Julian Andres Klode for a helpful bug report. (#874765)
  • lintian (2.5.53, 2.5.54) New upstream releases. (Documented in more detail above.)
  • bfs (1.1.2-1) New upstream release.
  • docbook-to-man (1:2.0.0-39) Tighten autopkgtests and enable testing via travis.debian.net.
  • python-daiquiri (1.3.0-1) New upstream release.

I also made the following non-maintainer uploads (NMUs):

Debian bugs filed
  • clipit: Please choose a sensible startup default in "live" mode. (#875903)
  • git-buildpackage: Please add a --reset option to gbp pull. (#875852)
  • bluez: Please default Device "friendly name" to hostname without domain. (#874094)
  • bugs.debian.org: Please explicitly link to packages,tracker .debian.org. (#876746)
  • Requests for packaging:
    • selfspy log everything you do on the computer. (#873955)
    • shoogle use the Google API from the shell. (#873916)

FTP Team

As a Debian FTP assistant I ACCEPTed 86 packages: bgw-replstatus, build-essential, caja-admin, caja-rename, calamares, cdiff, cockpit, colorized-logs, comptext, comptty, copyq, django-allauth, django-paintstore, django-q, django-test-without-migrations, docker-runc, emacs-db, emacs-uuid, esxml, fast5, flake8-docstrings, gcc-6-doc, gcc-7-doc, gcc-8, golang-github-go-logfmt-logfmt, golang-github-google-go-cmp, golang-github-nightlyone-lockfile, golang-github-oklog-ulid, golang-pault-go-macchanger, h2o, inhomog, ip4r, ldc, libayatana-appindicator, libbson-perl, libencoding-fixlatin-perl, libfile-monitor-lite-perl, libhtml-restrict-perl, libmojo-rabbitmq-client-perl, libmoosex-types-laxnum-perl, libparse-mime-perl, libplack-test-agent-perl, libpod-projectdocs-perl, libregexp-pattern-license-perl, libstring-trim-perl, libtext-simpletable-autowidth-perl, libvirt, linux, mac-fdisk, myspell-sq, node-coveralls, node-module-deps, nov-el, owncloud-client, pantomime-clojure, pg-dirtyread, pgfincore, pgpool2, pgsql-asn1oid, phpliteadmin, powerlevel9k, pyjokes, python-evdev, python-oslo.db, python-pygal, python-wsaccel, python3.7, r-cran-bindrcpp, r-cran-dotcall64, r-cran-glue, r-cran-gtable, r-cran-pkgconfig, r-cran-rlang, r-cran-spatstat.utils, resolvconf-admin, retro-gtk, ring-ssl-clojure, robot-detection, rpy2-2.8, ruby-hocon, sass-stylesheets-compass, selinux-dbus, selinux-python, statsmodels, webkit2-sharp & weston. I additionally filed 4 RC bugs against packages that had incomplete debian/copyright files against: comptext, comptext, ldc & python-oslo.concurrency.

31 January 2017

Chris Lamb: Free software activities in January 2017

Here is my monthly update covering what I have been doing in the free software world (previous month):
Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to permit verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. (I have previously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area.) This month I:
I also made the following changes to our tooling:
diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

  • Comparators:
    • Display magic file type when we know the file format but can't find file-specific details. (Closes: #850850).
    • Ensure our "APK metadata" file appears first, fixing non-deterministic tests. (998b288)
    • Fix APK extration with absolute filenames. (Closes: #850485).
    • Don't error if directory containing ELF debug symbols already exists. (Closes: #850807).
    • Support comparing .ico files (Closes: #850730).
    • If we don't have a tool (eg. apktool), don't blow up when attempting to unpack it.
  • Output formats:
    • Add Markdown output format. (Closes: #848141).
    • Add RestructuredText output format.
    • Use an optimised indentation routine throughout all presenters.
    • Move text presenter to use the Visitor pattern.
    • Correctly escape value of href="" elements (re. #849411).
  • Tests:
    • Prevent FTBFS by loading fixtures as UTF-8 in case surrounding terminal is not Unicode-aware. (Closes: #852926).
    • Skip tests if binutils can't handle the object file format. (Closes: #851588).
    • Actually compare the output of text/ReST/Markdown formats to fixtures.
    • Add tests for: Comparing two empty directories, HTML output, image.ICOImageFile, --html-dir, --text-color & no arguments (beyond the filenames) emits the text output.
  • Profiling:
    • Count the number of calls, not just the total time.
    • Skip as much profiling overhead when not enabled for a ~2% speedup.
  • Misc:
    • Alias an expensive Config() lookup for a 10% optimisation.
    • Avoid expensive regex creation until we actually need it, speeding up diff parsing by 2X.
    • Use Pythonic logging functions based on __name__, etc.
    • Drop milliseconds from logging output.

buildinfo.debian.net

buildinfo.debian.net is my experiment into how to process, store and distribute .buildinfo files after the Debian archive software has processed them.

  • Store files directly onto S3.
  • Drop big unique_together index to save disk space.
  • Show SHA256 checksums where space permits.


Debian LTS

This month I have been paid to work 12.75 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Issued DLA 773-1 for python-crypto fixing a vulnerability where calling AES.new with an invalid parameter could crash the Python interpreter.
  • Issued DLA 777-1 for libvncserver addressing two heap-based buffer overflow attacks based on invalid FramebufferUpdate data.
  • Issued DLA 778-1 for pcsc-lite correcting a use-after-free vulnerability.
  • Issued DLA 795-1 for hesiod which fixed a weak SUID check as well as removed the hard-coding of a fallback domain if the configuration file could not be found.
  • Issued DLA 810-1 for libarchive fixing a heap buffer overflow.

Uploads
  • python-django:
    • 1:1.10.5-1 New upstream stable release.
    • 1:1.11~alpha1-1 New upstream experimental release.
  • gunicorn (19.6.0-10) Moved debian/README.Debian to debian/NEWS so that the recent important changes will be displayed to users when upgrading to stretch.
  • redis:
    • 3:3.2.6-2 & 4:4.0-rc2-2 Tidy patches and rename RunTimeDirectory to RuntimeDirectory in .service files. (Closes: #850534)
    • 3:3.2.6-3 Remove a duplicate redis-server binary by symlinking /usr/bin/redis-check-rdb. This was found by the dedup service.
    • 3:3.2.6-4 Expand the documentation in redis-server.service and redis-sentinel.service regarding the default hardening options and how, in most installations, they can be increased.
    • 3:3.2.6-5, 3:3.2.6-6, 4:4.0-rc2-3 & 4:4.0-rc2-4 Add taskset calls to try and avoid build failures due to parallelism in upstream test suite.

I also made the following non-maintainer uploads:
  • cpio:
    • 2.12+dfsg-1 New upstream release (to experimental), refreshing all patches, etc.
    • 2.12+dfsg-2 Add missing autoconf to Build-Depends.
  • xjump (2.7.5-6.2) Make the build reproducible by passing -n to gzip calls in debian/rules. (Closes: #777354)
  • magicfilter (1.2-64.1) Make the build reproducible by passing -n to gzip calls in debian/rules. (Closes: #777478)



4 July 2016

Paul Wise: Check All The thingS!

check-all-the-things (aka cats, Meow!) is a tool that aims to make it easy to know which tools can be used to check a directory tree and to make it easy to run those tools on the directory tree. The tree could either be a source tree or a build tree or both. It aims to check as much of the tree as possible so the output can be very verbose and have many false positives. It is not for the busy, lazy or noise intolerant. It runs the checks by matching file names and MIME types against those registered for a list of checks. Each check has a set of dependencies, flags, filename wildcards, MIME type wildcards, comments and prerequisite commands. By default it: It runs all checks for the current distro/release except: There are command-line options to customise the behaviour and automatic bash shell completion via argcomplete. There are 177 checks (including TODO ones) in 73 different categories. There are an additional 224 not-well-specified TODO items for new checks in comments. It is exceptionally easy to add new checks once one knows how to use the tool one wants to add. At this point in time it is probably not a good idea to run it in an untrusted directory tree for several reasons: The project initially started as really hacky wiki page full of commands to run. At some point I figured it was time to make this actually be maintainable and started on a project to do that. At around the same time Jakub Wilk was working on maquack to replace the wiki page. Somehow I found out about it and talked to him about it. It was vastly less hacky than my version so I ended up taking it over and continuing it under the check-all-the-things name. I polished it for the last two years and finally released it into Debian unstable during DebCamp16.

29 December 2015

David Bremner: Converting PDFs to DJVU

Today I was wondering about converting a pdf made from scan of a book into djvu, hopefully to reduce the size, without too much loss of quality. My initial experiments with pdf2djvu were a bit discouraging, so I invested some time building gsdjvu in order to be able to run djvudigital. Watching the messages from djvudigital I realized that the reason it was achieving so much better compression was that it was using black and white for the foreground layer by default. I also figured out that the default 300dpi looks crappy since my source document is apparently 600dpi. I then went back an compared djvudigital to pdf2djvu a bit more carefully. My not-very-scientific conclusions: Perhaps most compellingly, the output from pdf2djvu has sensible metadata and is searchable in evince. Even with the --words option, the output from djvudigital is not. This is possibly related to the error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing at corrections for several errors. Some of these issues have to do with building software from 2009 (the instructions suggestion building with ghostscript 8.64) in a modern toolchain; others I'm not sure. There was an upload of gsdjvu in February of 2015, somewhat to my surprise. AT&T has more or less crippled the project by licensing it under the CPL, which means binaries are not distributable, hence motivation to fix all the rough edges is minimal.
Version kilobytes per page position in figure
Original PDF 80.9 top
pdf2djvu --dpi=450 92.0 not shown
pdf2djvu --monochrome --dpi=450 27.5 second from top
pdf2djvu --monochrome --dpi=600 --loss-level=50 21.3 second from bottom
djvudigital --dpi=450 29.4 bottom
djvu-compare.png

19 September 2015

Jakub Wilk: Executable URLs

or how to write shell scripts without using whitespace.

Jakub Wilk: Executable URLs

or how to write shell scripts without using whitespace.

22 July 2015

James McCoy: porterbox-logins

Some time ago, pabs documented his setup for easily connecting to one of Debian's porterboxes based on the desired architecture. Similarly, he submitted a wishlist bug against devscripts specifying some requirements for a script to make this functionality generally accessible to the developer community. I have yet to follow up on that request mainly due to ENOTIME for developing new scripts outright. I also have my own script I had been using to get information on available Debian machines. Recently, this came up again on IRC and jwilk decided to actually implement pabs' DNS alias idea. Now, one can use $arch-porterbox.debian.net to connect to a porterbox of the specified architecture. Preference is given to debian.org domains when there are both debian.org and debian.net porterboxes, and it's a simple use first listed machine if there are multiple available porterboxes. This is all well and good, but if you have SSH's StrictHostKeyChecking enabled, SSH will rightly refuse to connect. However, OpenSSH 6.5 added a feature called hostname canonicalization which can help. The below ssh_config snippet allows one to run ssh $arch-porterbox or ssh $arch-porterbox.debian.net and connect to one of the porterboxes, verifying the host key against the canonical host name.
Host *-porterbox
  HostName %h.debian.net
Match host *-porterbox.debian.net
  CanonicalizeHostname yes
  CanonicalizePermittedCNAMEs *-porterbox.debian.net:*.debian.org

26 September 2014

Jakub Wilk: Pet peeves: debhelper build-dependencies (redux)

$ zcat Sources.gz   grep -o -E 'debhelper [(]>= 9[.][0-9] ,7 ([^0-9)][^)]*)?[)]'   sort   uniq -c   sort -rn
    338 debhelper (>= 9.0.0)
     70 debhelper (>= 9.0)
     18 debhelper (>= 9.0.0~)
     10 debhelper (>= 9.0~)
      2 debhelper (>= 9.2)
      1 debhelper (>= 9.2~)
      1 debhelper (>= 9.0.50~)
Is it a way to protest against the current debhelper's version scheme?

4 September 2014

Jakub Wilk: Joys of East Asian encodings

In i18nspector I try to support all the encodings that were blessed by gettext, but it turns out to be more difficult than I anticipated:
$ roundtrip()   c=$(echo $1   iconv -t $2); printf '%s -> %s -> %s\n' $1 $c $(echo $c   iconv -f "$2");  
$ roundtrip   EUC-JP
  -> \ -> \
$ roundtrip   SHIFT_JIS
  -> \ ->  
$ roundtrip   JOHAB
  -> \ ->  
Now let's do the same in Python:
$ python3 -q
>>> roundtrip = lambda s, e: print('%s -> %s -> %s' % (s, s.encode(e).decode('ASCII', 'replace'), s.encode(e).decode(e)))
>>> roundtrip(' ', 'EUC-JP')
  -> \ -> \
>>> roundtrip(' ', 'SHIFT_JIS')
  -> \ -> \
>>> roundtrip(' ', 'JOHAB')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
UnicodeEncodeError: 'johab' codec can't encode character '\u20a9' in position 0: illegal multibyte sequence
So is 0x5C a backslash or a yen/won sign? Or both? And what if 0x5C could be a second byte of a two-byte character? What could possibly go wrong?

29 August 2014

Jakub Wilk: More spell-checking

Have you ever wanted to use Lintian's spell-checker against arbitrary files? Now you can do it with spellintian:
$ zrun spellintian --picky /usr/share/doc/RFC/best-current-practice/rfc*
/tmp/0qgJD1Xa1Y-rfc1917.txt: amoung -> among
/tmp/kvZtN435CE-rfc3155.txt: transfered -> transferred
/tmp/o093khYE09-rfc3481.txt: unecessary -> unnecessary
/tmp/4P0ux2cZWK-rfc6365.txt: charater -> character
mwic (Misspelled Words In Context) takes a different approach. It uses classic spell-checking libraries (via Enchant), but it groups misspellings and shows them in their contexts. That way you can quickly filter out false-positives, which are very common in technical texts, using visual grep:
$ zrun mwic /usr/share/doc/debian/social-contract.txt.gz
DFSG:
   an Free Software Guidelines (DFSG)
   an Free Software Guidelines (DFSG) part of the
                                ^^^^
Perens:
     Bruce Perens later removed the Debian-spe 
  by Bruce Perens, refined by the other Debian 
           ^^^^^^
Ean, Schuessler:
  community" was suggested by Ean Schuessler. This document was drafted
                              ^^^ ^^^^^^^^^^
GPL:
  The "GPL", "BSD", and "Artistic" lice 
       ^^^
contrib:
  created "contrib" and "non-free" areas in our 
           ^^^^^^^
CDs:
  their CDs. Thus, although non-free wor 
        ^^^

21 February 2014

Jakub Wilk: For those who care about snowclones

Instances of the for those who care about X snowclone on Debian mailing lists:

25 December 2013

Jakub Wilk: dput

My Internet connection is too flaky to use dput(-ng) reliably, so I use this tiny replacement instead:
#!/bin/sh
dcmd rsync --chmod=0644 -P "$@" ssh.upload.debian.org:/srv/upload.debian.org/UploadQueue/

5 December 2013

Jakub Wilk: A-za-z a-zA-z

Releasing the shift key is hard.

4 November 2013

Jakub Wilk: ~/.netrc security

TL;DR: don't put valuable passwords in ~/.netrc In the olden days, the ~/.netrc file was used for storing FTP usernames and passwords. These days we have clients of other protocols that use said file. Perhaps your IMAP or SMTP client use it. So you put your e-mail accounts password into ~/.netrc, and then meticulously configured the clients to always connect via TLS and to verify server certificates. You feel secure. But you shouldn't. Here's how an attacker capable of MiTM can exploit wget to steal ~/.netrc passwords: 1) Alice tries to download a file over HTTP:
$ wget http://xkcd.com/538/
2) Eve takes over the HTTP connection, sending a redirection response:
HTTP/1.1 303 See Other
Location: http://supersecuremail.example.net/
3) Alice's wget follows the redirection. 4) Eve takes over the connection to supersecuremail.example.net, requesting password authentication:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="moo"
5) Alice's wget sends the supersecuremail.example.net password straight to Eve.

18 August 2013

Ben Armstrong: Taskwarrior new blog, getting involved

The Taskwarrior Team has just started a blog, kicking it off with a series of articles about development of Taskwarrior itself. Go, team! Almost from the moment I started using Taskwarrior (thanks to Jakub WIlk for an excellent job maintaining this) I knew I had finally found the todo system that I could love. Right away, I started hanging out with the Taskwarrior community at #taskwarrior @ irc.freenode.net and found out what an awesome bunch of people they are, both developers and users alike. I have plunged in with bug reports and feature requests, and am helping get more Taskwarrior-related things into Debian. In NEW right now I have uploaded several of the dependencies needed by my ITP of taskwarrior-web. It s looking like I ll have that finished later this month or early next month. Also, I have uploaded a wheezy backport of Taskwarrior itself (aka task ) which, if all goes well, enters the archive next week.

10 April 2013

Paul Wise: Inadequate software

Just 168 of the 4961 packages (3%) I have installed are inadequate. Unfortunately those packages collectively have 3440 inadequacies. How much of the software on your system has these inadequacies? You can find out today by installing Jakub Wilk's software, which is appropriately named adequate. It is now available in Debian unstable. I recommend enabling the apt hook which notifies you when software you are installing is inadequate. Other ways of being notified when you are installing inadequate software include apt-listbugs and debsecan. If you are interested in software quality, Debian's QA activities wiki page provides a good overview of the quality assurance activities that are being worked on within the context of Debian. If you want to provide better quality software for Debian, please keep an eye on the PTS pages for software you maintain. You can also run various automated checks on your software before you make new releases or upload them to the Debian archive. More people are needed to improve and expand upon Debian's existing quality assurance activities and infrastructure. Come join us today!

12 January 2013

Jakub Wilk: Internationalization environment variables

Setting internationalization environment variables is a bit tricky. For example, this:
$ LANG=sv_SE.UTF-8 stat /nonexistent
may look like a way to make stat(1) print the error message in Swedish. Yet there are many ways it could go wrong:
  1. LC_MESSAGES could be set in the environment, overriding LANG.
  2. LC_ALL could be set, overriding both LC_MESSAGES and LANG.
  3. LANGUAGE could be set, overriding LC_ALL, LC_MESSAGES, and LANG. For extra complexity, LANGUAGE has no effect if LC_MESSAGES is effectively set to C. (Also, this variable is a GNUism.)
  4. The locale could be simply missing from the system.
To make these things a little less intricate, I wrote localehelper. It's a bit like env(1), but it takes care of: This does the right thing:
$ localehelper LANG=sv_SE.UTF-8 stat /nonexistent
stat: kan inte ta status p   /nonexistent : Filen eller katalogen finns inte

11 November 2012

Jakub Wilk: All those translations, how did you ever manage it?

I've just released initial version of gettext-inspector, a tool for checking gettext PO/POT/MO files. While it's in an early stage of development, it's already able to detect wide rage of problems. For example, this is what it emits on my system:
$ gettext-inspector /usr/share/locale/*/LC_MESSAGES/*.mo   cut -d ' ' -f 1,3   sort   uniq -c   sort -rn
   1902 P: no-language-header-field
   1601 P: no-version-in-project-id-version
   1372 W: no-report-msgid-bugs-to-header-field
    273 P: invalid-content-transfer-encoding
    201 W: invalid-date
     78 W: syntax-error-in-plural-forms
     77 I: no-package-name-in-project-id-version
     63 W: boilerplate-in-report-msgid-bugs-to
     50 W: language-disparity
     47 I: unknown-header-field
     38 W: invalid-language
     25 W: boilerplate-in-project-id-version
     10 I: unknown-poedit-language
      8 W: no-date-header-field
      5 W: no-project-id-version-header-field
      5 W: c1-control-characters
      3 P: no-mime-version-header-field
      3 P: no-content-transfer-encoding-header-field
      2 W: non-portable-encoding
      1 W: invalid-report-msgid-bugs-to
      1 W: ancient-date
      1 I: unable-to-determine-language

Next.