Search Results: "dkogan"

5 May 2023

Dima Kogan: mrcal 2.3 released!

Today I released mrcal 2.3 (the release notes are available here). Once again, there are lots of useful improvements in the code, but nothing major. The big update in this release is the documentation. Much of it was improved and extended, especially the practical guides in the how-to-calibrate page and the recipes. Major updates are imminent: I'm about to merge the cross-projection uncertainty branch and the triangulated-points-in-the-solver branch, to study chessboard-less calibrations and structure from motion. Neither of these is novel, but mrcal's improved lens models and uncertainty propagation will hopefully produce better results.

20 April 2023

Dima Kogan: numpy.percentile API update

The numpy devs did a bad thing. Don't be like the numpy devs. The current (version 1.24) docs for numpy.percentile say this about the method keyword argument:
Changed in version 1.22.0: This argument was previously called "interpolation" ...
They renamed a keyword argument. So if you had working code that did
np.percentile( ...., interpolation=xxx, ....)
then running it in the most recent numpy would throw lots of deprecation warnings at you, and presumably it will eventually stop working completely. This isn't great. The obvious answer is to change the code to
np.percentile( ...., method=xxx, ....)
But then if you run it on a machine with an older numpy install, it won't work at all! There's no trivial way for users of numpy to conform to this change without breaking stuff. In other words, the numpy devs gave their users pointless homework. I just did this homework with this commit to mrcal: it adds a percentile_compat() function that figures out which flavor of argument we should use, and uses it. Here it is:
import numpy as np

def percentile_compat(*args, **kwargs):
    r'''Wrapper for np.percentile() to handle their API change

In numpy 1.22 the "interpolation" kwarg was renamed to "method". I need to pass
the right thing to work with both old and new numpy. This function tries the
newer argument, and if that fails, uses the old one. The test is only done the
first time.

It is assumed that this is called with the old 'interpolation' key.
    '''
    # No 'interpolation' requested, or we already know this numpy wants the old
    # kwarg: pass everything through unmodified
    if 'interpolation' not in kwargs or \
       percentile_compat.which == 'interpolation':
        return np.percentile(*args, **kwargs)

    kwargs_no_interpolation = dict(kwargs)
    del kwargs_no_interpolation['interpolation']

    # We already know this numpy wants the new kwarg: translate
    if percentile_compat.which == 'method':
        return np.percentile(*args, **kwargs_no_interpolation,
                             method = kwargs['interpolation'])

    # Don't know yet which flavor we have. Try the new one first
    try:
        result = np.percentile(*args, **kwargs_no_interpolation,
                               method = kwargs['interpolation'])
        percentile_compat.which = 'method'
        return result
    except TypeError:
        # An old numpy doesn't know the 'method' kwarg
        percentile_compat.which = 'interpolation'
        return np.percentile(*args, **kwargs)

percentile_compat.which = None
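For illustration, a hypothetical call (the data here is made up); it behaves identically on either flavor of numpy:
samples = np.random.random(1000)
q = percentile_compat(samples, 90, interpolation='lower')  # always pass the old-style key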
Please take it and use it. I give up all copyright.

17 October 2022

Dima Kogan: gnuplot output in an FLTK widget

Overview
I make a lot of plots, and the fragmentation of tools in this space really bugs me. People writing Python code mostly use matplotlib, R people use ggplot2. MS people use the internal Excel thing. I've seen people use gtkdatabox for GTK widgets, rrdtool for logging, qcustomplot for qt. And so on. This is really unhelpful, and it would benefit everybody if there was a single solid plotting backend with lots of bindings to different languages and tools. For my own usage, I've been fighting this quixotic battle, using gnuplot as the plotting backend for all my use cases. gnuplot is
  • very mature
  • stable
  • fast
  • powerful
  • supported on every platform (within reason)
  • supports lots and lots of output backends
There are some things it can't do, but those can be added, and I haven't felt it to be limiting in over 20 years of using it. I rarely use it directly, and usually interact with it through one of the front ends I wrote: gnuplotlib (Python), PDL::Graphics::Gnuplot (Perl) or feedgnuplot (the shell). I wrote all of these, although the Perl library was taken over by others long ago. Recently I needed a plotting widget for an FLTK program written in Python. It would be great if there was a C++ class deriving from Fl_Widget that was wrapped by pyfltk, but there isn't. But it turns out that I already had all the tools to quickly hack together something that mostly works. This is a not-ready-for-primetime hack, but it works so well that I'd like to write it up. Hopefully this will be done "properly" someday.
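For context, a minimal gnuplotlib invocation looks something like this (a sketch; gp.plot() and the _with option are the same pattern used in the demo below):
import numpy as np
import gnuplotlib as gp

x = np.arange(100)
gp.plot(x, x*x, _with='lines', title='parabola')  # pops up an interactive gnuplot window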

Approach
Alright. So here I'm trying to tie together a Python program, gnuplot output and an FLTK widget. Since this is a Python program, I can use gnuplotlib to talk to the gnuplot backend. In a perfect world, gnuplot would ship a backend interfacing to FLTK. But it doesn't. What it does ship is an x11 backend that makes plots with X11 commands, and it allows these commands to be directed to an arbitrary X11 window. So we
  1. Make an FLTK widget that simply creates an X11 window, and never actually draws into it
  2. Tell gnuplot to plot into this window

Demo
This is really simple, and works shockingly well. Here's my Fl_gnuplotlib widget:
#!/usr/bin/python3
import sys
import gnuplotlib as gp
import fltk
class Fl_Gnuplotlib_Window(fltk.Fl_Window):
    def __init__(self, x,y,w,h, **plot_options):
        super().__init__(x,y,w,h)
        self.end()
        self._plot                 = None
        self._delayed_plot_options = None
        self.init_plot(**plot_options)
    def init_plot(self, **plot_options):
        if 'terminal' in plot_options:
            raise Exception("Fl_Gnuplotlib_Window needs control of the terminal, but the user asked for a specific 'terminal'")
        if self._plot is not None:
            self._plot = None
        self._delayed_plot_options = None
        xid = fltk.fl_xid(self)
        if xid == 0:
            # I don't have an xid (yet?), so I delay the init
            self._delayed_plot_options = plot_options
            return
        # will barf if we already have a terminal
        gp.add_plot_option(plot_options,
                           terminal = f'x11 window "0x{xid:x}"')
        self._plot = gp.gnuplotlib(**plot_options)
    def plot(self, *args, **kwargs):
        if self._plot is None:
            if self._delayed_plot_options is None:
                raise Exception("plot has not been initialized")
            self.init_plot(**self._delayed_plot_options)
            if self._plot is None:
                raise Exception("plot has not been initialized. Delayed initialization failed")
        self._plot.plot(*args, **kwargs)
Clearly it's simply making an Fl_Window, and pointing gnuplotlib at it. And a sample application that uses this widget:
#!/usr/bin/python3
import sys
import numpy as np
import numpysane as nps
from fltk import *
from Fl_gnuplotlib import *
window = Fl_Window(800, 600, "plot")
plot   = Fl_Gnuplotlib_Window(0, 0, 800,600)
iplot = 0
plotx = np.arange(1000)
ploty = nps.cat(plotx*plotx,
                np.sin(plotx/100),
                plotx)
def timer_callback(*args):
    global iplot, plotx, ploty, plot
    plot.plot(plotx,
              ploty[iplot],
              _with = 'lines')
    iplot += 1
    if iplot == len(ploty):
        iplot = 0
    Fl.repeat_timeout(1.0, timer_callback)
window.resizable(window)
window.end()
window.show()
Fl.add_timeout(1.0, timer_callback)
Fl.run()
This is nice and simple. Exactly what a program using a widget to make a plot (while being oblivious to the details) should look like. It creates a window, places the one plotting widget into it, and cycles the plot inside it at 1Hz (cycling between a parabola, a sinusoid and a line). Clearly we could place other UI elements around it, or add more plots, or whatever. The output looks like this:
[animation: Fl_gnuplotlib_demo.gif]
To run this you need to apt install python3-numpysane python3-gnuplotlib python3-fltk. If running an older distro or a non-Debian-based distro, you should grab those from source.

Discussion
This works. But it's a hack. Some issues:
  • This plotting widget currently can output only. It can make whatever plot we like, but it cannot accept UI input from the container program in any way
  • More than that, when focused it completely replaces the FLTK event logic for that window. So all keyboard input is swallowed, including the keys to access FLTK menus, to exit the application, etc, etc.
  • This approach requires us to use the x11 gnuplot terminal. This works, but it's no longer the terminal preferred by the gnuplot devs, and it isn't maintained as vigilantly as the others.
  • And it has bugs. For instance, asking to plot into a window that doesn't yet exist causes it to create a new window. This breaks FLTK applications that start up and create a plot immediately. Here's a mailing list thread discussing these issues.
So this is a very functional hack, but it's still a hack. And it feels like making this solid will take a lot of work. Maybe. I'll push more on this as I need it. Stay tuned!

28 June 2022

Dima Kogan: vnlog 1.33 released

This is a minor release to the vnlog toolkit that adds a few convenience options to the vnl-filter tool. The new options are

vnl-filter -l
Prints out the existing columns, and exits. I've been low-level wanting this for years, but never acutely-enough to actually write it. Today I finally did it.

vnl-filter --sub-abs
Defines an absolute-value abs() function in the default awk mode. I've been low-level wanting this for years as well. Previously I'd use --perl just to get abs(), or I'd explicitly define it: --sub 'abs(x) { return x>0 ? x : -x; }'. Typing all that out was becoming tiresome, and now I don't need to anymore.

vnl-filter --begin ... and vnl-filter --end ...
These add BEGIN and END clauses. They're useful, for instance, to use a Perl module in BEGIN, or to print out some final output in END. Previously you'd add these inside the --eval block, but that was awkward because BEGIN and END would then appear inside the while(<>) loop. And there was no clear way to do it in the normal -p mode (no --eval). Clearly these are all minor, since the toolkit is now mature. It does everything I want it to that doesn't require lots of work to implement. The big missing features that I want would patch the underlying GNU coreutils instead of vnlog:
  • The sort tool can select different sorting modes, but join works only with alphanumeric sorting. join should have similarly selectable sorting modes. In the vnlog wrapper I can currently do something like vnl-join --vnl-sort n. This pre-sorts the input alphanumerically, and then post-sorts it numerically, which is slow for big datasets. If join could handle numerically-sorted data directly, neither the pre-sort nor the post-sort would be needed
  • When joining on a numerical field, join should be able to do some sort of interpolation when given fields that don't match exactly.
Both of these probably wouldn't take a ton of work to implement, and I'll look into it someday.
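To make the interpolation idea concrete, here's a toy numpy sketch of the semantics I'd want. Nothing here is existing vnlog functionality; the arrays and names are made up:
# Toy sketch of an "interpolating join": join B against A on a numeric key,
# linearly interpolating A's values where B's keys fall between A's keys
import numpy as np

key_a = np.array([0.0, 10.0, 20.0])  # numeric join field in table A (sorted)
val_a = np.array([1.0,  2.0,  3.0])  # a data column in table A
key_b = np.array([5.0, 15.0])        # keys in table B; no exact match in A

val_a_at_b = np.interp(key_b, key_a, val_a)
print(val_a_at_b)                    # [1.5 2.5]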

16 June 2022

Dima Kogan: Ricoh GR IIIx 802.11 reverse engineering

I just got a fancy new camera: a Ricoh GR IIIx. It's pretty great, and I strongly recommend it to anyone who wants a truly pocketable camera with fantastic image quality and full manual controls. One annoyance is the connectivity. It does have both Bluetooth and 802.11, but the only official way to use them is some dinky closed phone app. This is silly. I just did some reverse-engineering, and I now have a functional shell script to download the last few images via 802.11. This is more convenient than plugging in a wire or pulling out the memory card. Fortunately, Ricoh didn't bend over backwards to make the reversing difficult: to figure it out I didn't even need to download the phone app and sniff its traffic. When you turn on the 802.11 on the camera, it says stuff about an essid and password, so clearly the camera runs its own access point. Not ideal, but it's good-enough. I connected, and ran nmap to find hosts and open ports: only port 80 on 192.168.0.1 is open. Pointing curl at it yields some error, so I needed to figure out the valid endpoints. I downloaded the firmware binary, and tried to figure out what's in it:
dima@shorty:/tmp$ binwalk fwdc243b.bin
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
3036150       0x2E53F6        Cisco IOS microcode, for "8"
3164652       0x3049EC        Certificate in DER format (x509 v3), header length: 4, sequence length: 5412
5472143       0x537F8F        Copyright string: "Copyright ("
6128763       0x5D847B        PARity archive data - file number 90
10711634      0xA37252        gzip compressed data, maximum compression, from Unix, last modified: 2022-02-15 05:47:23
13959724      0xD5022C        MySQL ISAM compressed data file Version 11
24829873      0x17ADFB1       MySQL MISAM compressed data file Version 4
24917663      0x17C369F       MySQL MISAM compressed data file Version 4
24918526      0x17C39FE       MySQL MISAM compressed data file Version 4
24921612      0x17C460C       MySQL MISAM compressed data file Version 4
24948153      0x17CADB9       MySQL MISAM compressed data file Version 4
25221672      0x180DA28       MySQL MISAM compressed data file Version 4
25784158      0x1896F5E       Cisco IOS microcode, for "\"
26173589      0x18F6095       MySQL MISAM compressed data file Version 4
28297588      0x1AFC974       MySQL ISAM compressed data file Version 6
28988307      0x1BA5393       MySQL ISAM compressed data file Version 3
28990184      0x1BA5AE8       MySQL MISAM index file Version 3
29118867      0x1BC5193       MySQL MISAM index file Version 3
29449193      0x1C15BE9       JPEG image data, JFIF standard 1.01
29522133      0x1C278D5       JPEG image data, JFIF standard 1.08
29522412      0x1C279EC       Copyright string: "Copyright ("
29632931      0x1C429A3       JPEG image data, JFIF standard 1.01
29724094      0x1C58DBE       JPEG image data, JFIF standard 1.01
The gzip chunk looks like what I want:
dima@shorty:/tmp$ tail -c+10711635 fwdc243b.bin > /tmp/tst.gz
dima@shorty:/tmp$ < /tmp/tst.gz gunzip | file -
/dev/stdin: ASCII cpio archive (SVR4 with no CRC)
dima@shorty:/tmp$ < /tmp/tst.gz gunzip > tst.cpio
OK, we have some .cpio thing. It's plain text. I grep around in it, looking for GET and POST and such, and I see various URI-looking things at /v1/..... Grepping for that I see
dima@shorty:/tmp$ strings tst.cpio | grep /v1/
GET /v1/debug/revisions
GET /v1/ping
GET /v1/photos
GET /v1/props
PUT /v1/params/device
PUT /v1/params/lens
PUT /v1/params/camera
GET /v1/liveview
GET /v1/transfers
POST /v1/device/finish
POST /v1/device/wlan/finish
POST /v1/lens/focus
POST /v1/camera/shoot
POST /v1/camera/shoot/compose
POST /v1/camera/shoot/cancel
GET /v1/photos/ / 
GET /v1/photos/ / /info
PUT /v1/photos/ / /transfer
/v1/photos/<string>/<string>
/v1/photos/<string>/<string>/info
/v1/photos/<string>/<string>/transfer
/v1/device/finish
/v1/device/wlan/finish
/v1/lens/focus
/v1/camera/shoot
/v1/camera/shoot/compose
/v1/camera/shoot/cancel
/v1/changes
/v1/changes message received.
/v1/changes issue event.
/v1/changes new websocket connection.
/v1/changes websocket connection closed. reason( )
/v1/transfers, transferState( ), afterIndex( ), limit( )
Jackpot. I pointed curl at most of these, and they do interesting things. Generally they all spit out JSON. /v1/liveview sends out a sequence of JPEG images. The thing I care about is /v1/photos/DIRECTORY/FILE and /v1/photos/DIRECTORY/FILE/info. The result is a script I just wrote to connect to the camera, download N images, and connect back to the original access point: https://github.com/dkogan/ricoh-download Kinda crude, but works for now. I'll improve it with time. After I did this I found an old thread from 2015 where somebody was using an apparently-compatible camera, and wrote a fancier tool: https://www.pentaxforums.com/forums/184-pentax-k-s1-k-s2/295501-k-s2-wifi-laptop-2.html
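To give a flavor of the protocol, here's a Python sketch of the core download logic. The endpoint paths come from the strings above, but the JSON field names ('dirs', 'name', 'files') are my guesses, not verified against the camera; the real handling lives in the ricoh-download script:
# Minimal sketch: list photos and download the newest one over the camera's AP
import requests

base = 'http://192.168.0.1'

photos = requests.get(f'{base}/v1/photos').json()
# guessed response shape: {'dirs': [{'name': ..., 'files': [...]}, ...]}
d = photos['dirs'][-1]   # most recent directory
f = d['files'][-1]       # most recent photo in it

r = requests.get(f"{base}/v1/photos/{d['name']}/{f}")
with open(f, 'wb') as fp:
    fp.write(r.content)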

29 November 2021

Dima Kogan: GL_image_display

I just spent an unspeakable number of days typing to produce something that sounds very un-impressive: an FLTK widget that can display an image. The docs and code live here. The big difference from the usual image-drawing widget is that this one uses OpenGL internally, so after the initial image load, the common operations (drawing, redrawing, panning and zooming) are very fast. I have high-resolution images in my projects, and this will make my tools much nicer. Three separate interfaces are available: a C library, a C++ FLTK widget, and a Python flavor of that widget. The FLTK widgets have built-in interactive panning/zooming, and the library can draw line overlays. So nice applications can be built quickly. I already added some early disabled-by-default support into the mrcal-stereo tool to visualize the rectification and report sensitivities:
[image: widget.png]
Nice!

28 February 2021

Dima Kogan: mrcal: principled camera calibrations

This is a big deal. In my day job I work with images captured by cameras, using those images to infer something about the geometry of the scene being observed. Naturally, to get good results you need to have a good estimate of the behavior of the lens (the "intrinsics"), and of the relative geometry of the cameras (the "extrinsics"; if there's more than one camera). The usual way to do this is to perform a "calibration" procedure to compute the intrinsics and extrinsics, and then to use the resulting "camera model" to process the subsequent images. Wikipedia has an article. And from experience, the most common current toolkit to do this appears to be OpenCV. People have been doing this for a while, but for whatever reason the existing tools all suck. They make basic questions like "how much data should I gather for a calibration?" and "how good is this calibration I just computed?" and "how different are these two models?" unanswerable. This is clearly seen from the links above. The wikipedia article talks about fitting a pinhole model to lenses, even though no real lenses follow this model (telephoto lenses do somewhat; wider lenses don't at all). And the OpenCV tutorial cheerfully says that
Re-projection error gives a good estimation of just how exact the found
parameters are. The closer the re-projection error is to zero, the more accurate
the parameters we found are.
This statement is trivially proven false: throw away most of your calibration data, and your reprojection error becomes very low. But we can all agree that a calibration computed from less data is actually worse. Right? (A toy demonstration of this appears at the end of this post.) All the various assumptions and hacks in the existing tooling are fine as long as you don't need a whole lot of accuracy out of your results. I need a lot of accuracy, however, so the existing tools don't work for my applications.

So I built a new set of tools, and have been using them with great delight. I just got the blessing to do a public release, so I'm announcing it here. The toolkit is called mrcal. It does a whole lot to produce calibrations that are as good as possible, it will tell you just how good they are, and it includes visualization capabilities for extensive user feedback. An overview of the capabilities of the toolkit (with lots of pretty pictures!) is at the tour of mrcal. There's a lot of documentation and examples, but up to now I have been the primary user of the tools, so I expect this to be somewhat rough when others look at it. Bug reports and patches are welcome. mrcal is an excellent base, but it's nowhere near "done". The documentation has some notes about the planned features and improvements, and I'm always reachable by email. Let me know if you try it out!
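The promised toy demonstration of the overfitting point, with nothing mrcal-specific in it: fit a cubic to noisy data, and watch the residual collapse as data is thrown away, even though the fit is clearly getting worse:
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2*np.pi*x) + 0.1*rng.standard_normal(100)

for n in (100, 10, 4):
    coeffs   = np.polyfit(x[:n], y[:n], 3)  # cubic fit to the first n points
    residual = np.sqrt(np.mean( (np.polyval(coeffs, x[:n]) - y[:n])**2 ))
    print(f"{n:3d} points: RMS fit residual = {residual:.4f}")
With 4 points the cubic passes through the data exactly, so the residual is ~0; nobody would call that the best fit of the bunch.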

27 February 2021

Dima Kogan: horizonator: terrain renderer based on SRTM DEMs

Check this out:
[image: example-interactive.png]
I just resurrected and cleaned up an old tool I had lying around. It's now nice and usable by others. This tool loads terrain data, and renders it from the ground, simulating what a human or a camera would see. This is useful for armchair exploring or for identifying peaks. This was relatively novel when I wrote it >10 years ago, but a number of similar tools exist now. This implementation is still useful in that it's freely licensed and provides APIs, so fancier processing can be performed on its output. Sources and (barely-complete-enough) documentation live here: https://github.com/dkogan/horizonator

22 February 2021

Dima Kogan: feedgnuplot: labelled bar charts and a guide

I just released feedgnuplot 1.57, which includes two new pieces that I've long thought about adding:

Labelled bar charts
I've thought about adding these for a while, but had no specific need for them. Finally, somebody asked for it, and I wrote the code. Now that I can, I will probably use these all the time. The new capability can override the usual numerical tic labels on the x axis, and instead use text from a column in the data stream. The most obvious use case is labelled bar graphs:
echo "# label value
      aaa     2
      bbb     3
      ccc     5
      ddd     2" |
feedgnuplot --vnl \
            --xticlabels \
            --with 'boxes fill solid border lt -1' \
            --ymin 0 --unset grid
[image: xticlabels-basic.svg]
But the usage is completely generic. All --xticlabels does is accept a data column as labels for the x-axis tics. Everything else that's supported by feedgnuplot and gnuplot works as before. For instance, I can give a domain, and use a style that takes y values and a color:
echo "# x label y color
        5 aaa   2 1
        6 bbb   3 2
       10 ccc   5 4
       11 ddd   2 1" |
feedgnuplot --vnl --domain \
            --xticlabels \
            --tuplesizeall 3 \
            --with 'points pt 7 ps 2 palette' \
            --xmin 4 --xmax 12 \
            --ymin 0 --ymax 6 \
            --unset grid
[image: xticlabels-points-palette.svg]
And we can use gnuplot's support for clustered histograms:
echo "# x label a b
        5 aaa   2 1
        6 bbb   3 2
       10 ccc   5 4
       11 ddd   2 1" |
vnl-filter -p label,a,b |
feedgnuplot --vnl \
            --xticlabels \
            --set 'style data histogram' \
            --set 'style histogram cluster gap 2' \
            --set 'style fill solid border lt -1' \
            --autolegend \
            --ymin 0 --unset grid
[image: xticlabels-clustered.svg]
Or we can stack the bars on top of one another:
echo "# x label a b
        5 aaa   2 1
        6 bbb   3 2
       10 ccc   5 4
       11 ddd   2 1" |
vnl-filter -p label,a,b |
feedgnuplot --vnl \
            --xticlabels \
            --set 'style data histogram' \
            --set 'style histogram rowstacked' \
            --set 'boxwidth 0.8' \
            --set 'style fill solid border lt -1' \
            --autolegend \
            --ymin 0 --unset grid
[image: xticlabels-stacked.svg]
This is gnuplot's "row stacking". It also supports "column stacking", which effectively transposes the data; it's not obvious to me that that makes sense in the context of feedgnuplot. Similarly, gnuplot can label the y and/or z axes, but I can't think of a specific use case, so I don't support that yet. If anybody can think of one, email me. Notes and limitations:
  • Since with --domain you can pass in both an x value and a tic label, it is possible to give it conflicting tic labels for the same x value. gnuplot itself has this problem too, and it just takes the last label it has for a given x. This is probably good-enough.
  • feedgnuplot uses whitespace-separated columns with no escape mechanism, so the tic labels cannot have whitespace in them. Fixing this is probably not worth the effort.
  • These tic labels do not count towards the tuplesize
  • I really need to add a similar feature to gnuplotlib. This will happen when I need it or when somebody asks for it, whichever comes first.

A feedgnuplot guide
This fills in a sorely needed missing part of the documentation: the main feedgnuplot website now has a page containing examples and corresponding graphical output. This serves as a tutorial and a gallery demonstrating some usages. It's somewhat incomplete, since it can't show streaming plots, or real-world interfacing with stuff that produces data: some of those usages remain in the README. It's a million times better than what I had before though, which was nothing. Internally this is done just like the gnuplotlib guide: the thing is an org-mode document with org-babel snippets that are evaluated by emacs to make the images. There's some fancy emacs lisp to tie it all together. Works great!

23 July 2020

Dima Kogan: Finding long runs of "notable" data in a log

Here's yet another instance where the data processing I needed could be accomplished entirely in the shell, with vnlog tools. I have some time-series data in a text table. Via some join and filter operations, I have boiled down this table to a sequence of time indices where something interesting happened. For instance, let's say it looks like this: t.vnl
# time
1976
1977
1978
1979
1980
1986
1987
1988
1989
2011
2012
2013
2014
2015
4679
4680
4681
4682
4683
4684
4685
4686
4687
7281
7282
7283
7291
7292
7293
I'd like to find the longest contiguous chunk of time where the interesting thing kept happening. How? Like this!
$ < t.vnl vnl-filter -p 'time,d=diff(time)' |
          vnl-uniq -c -f -1                 |
          vnl-filter 'd==1' -p 'count=count+1,time=time-1' |
          vnl-sort -nrk count |
          vnl-align
# count time
9       4679
5       2011
5       1976
4       1986
3       7291
3       7281
Bam! So the longest run was 9-frames-long, starting at time = 4679. To unpack the pipeline: diff(time) is 1 within a contiguous run; vnl-uniq -c -f -1 counts consecutive rows with the same diff (ignoring everything but the diff column); a run of N consecutive timestamps yields N-1 rows with d==1, so count+1 is the run length and time-1 is its start.
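As a cross-check, the same computation takes a few lines of numpy (assuming the t.vnl file from above; this is not part of the vnlog toolkit):
import numpy as np

t = np.loadtxt('t.vnl', dtype=int)   # the '# time' header is a comment to loadtxt
# a run breaks wherever the gap to the previous value isn't 1
runs = np.split(t, np.nonzero(np.diff(t) != 1)[0] + 1)
longest = max(runs, key=len)
print(len(longest), longest[0])      # 9 4679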

16 July 2020

Dima Kogan: Visualizing a broken internet connection

We've all seen our ISPs crap out periodically. Things feel slow, we then go ping some server, and see lost packets or slow response times. When bitching to the ISP it's useful to have evidence so you can clearly explain exactly how they are fucking up. This happened to me again last week, so I wrote a quick oneliner to do the visualization. Here it is:
( echo '# i dt';
  ping 8.8.8.8 |
    perl -ne 'BEGIN { $| = 1; }
              next if /PING/;
              s/.*seq=([0-9]+).*time=([0-9]+).*/\1 \2/ && print' ) |
  tee /tmp/ping.vnl |
  vnl-filter --stream -p i,dt,di='diff(i)' |
  vnl-filter --stream --has di |
  feedgnuplot --domain --stream --with 'linespoints pt 7 palette' \
              --tuplesizeall 3 --ymin 0 --set 'cbrange [1:]'      \
              --xlabel "ping index" --ylabel "Response time (ms)" \
              --title "ping 8.8.8.8 response times. Consecutive indices shown as colors"
You run that, and you get a realtime-updating plot of ping times and missed packets. The one-liner also logs the data to a file, so the saved data can be re-visualized. Showing what happens on my currently-working internet connection:
< /tmp/ping.vnl vnl-filter -p i,dt,di='diff(i)' |
  vnl-filter --has di |
  feedgnuplot --domain --with 'linespoints pt 7 palette'          \
              --tuplesizeall 3 --ymin 0 --set 'cbrange [1:]'      \
              --xlabel "ping index" --ylabel "Response time (ms)" \
              --title "ping 8.8.8.8 response times. Consecutive indices shown as colors" \
              --hardcopy /tmp/isp-good.svg
[image: good.svg]
On a broken network it looks like this (I edited the log to show the kind of broken-ness I was seeing earlier):
[image: bad.svg]
The x axis is the ping index. The y axis is the response time. Missed responses are shown as gaps in the points and also as colors, for legibility. As usual, this uses the vnlog and feedgnuplot tools, so
sudo apt install feedgnuplot vnlog
if you want to try this.

3 June 2020

Dima Kogan: vnlog now functional on *BSD and OSX

So somebody finally bugged me about supporting the vnlog tools on OSX. I was pretty sure that between all the redirection, process communication, and file descriptor management something was Linux-specific, but apparently not: everything just works. There were a few uninteresting issues with tool paths, compiler and linker flags and so on, but it was all really minor. I have a report that the test suite passes on OSX, and I verified it on FreeBSD. I made a new 1.28 release tag, but it exists mostly for the benefit of any OSX or *BSD people who'd want to make a package for their system. Progress!

23 March 2020

Dima Kogan: org-babel for documentation

So I just gave a talk at SCaLE 18x about numpysane and gnuplotlib, two libraries I wrote to make using numpy bearable. With these two, it's actually quite nice! Prior to the talk I overhauled the documentation for both these projects. The gnuplotlib docs now have a tutorial/gallery page, which is interesting enough to write about. Check it out! Mostly it is a sequence of code snippets and the plots they produce.

Clearly you want the plots in the documentation to correspond to the code, so you want something to actually run each code snippet to produce each plot. Automatically. I don't want to maintain these manually, and periodically discover that the code doesn't make the plot I claim it does, or worse: that the code barfs. This is vaguely what Jupyter notebooks do, but they're ridiculous, so I'm doing something better: the guide is an org-mode document, each code snippet is an org-babel block, and emacs evaluates those blocks to produce the .svg plots. That's it.

The git repo is hosted by github, which has a rudimentary renderer for .org documents. I'm committing the .svg files, so that's enough to get rendered documentation that looks nice. Note that the usual workflow is to use org to export to html, but here I'm outsourcing that job to github; I just make the .svg files, and that's enough. Look at the link again: gnuplotlib tutorial/gallery. This is just a .org file committed to the git repo. github is doing its normal org->html thing to display this file. This has drawbacks too: github ignores the :noexport: tag on the init section at the end of the file, so it's actually showing all the emacs lisp goop that makes this work (described below!). It's at the end, so I guess this is good-enough.

Those of us that use org-mode would be completely unsurprised to hear that the talk is also written as an .org document. And the slides that show gnuplotlib plots use the same org-babel system to render the plots. It's all oh-so-nice. As with anything as flexible as org-babel, it's easy to get into a situation where you're bending it to serve a not-quite-intended purpose. But since this all lives in emacs, you can make it do whatever you want with a bit of emacs lisp. I ended up advising a few things (mailing list post here). And I stumbled on an (arguable) bug in emacs that needed working around (mailing list post here). I'll summarize both here.

Handling large Local Variables blocks
The advice I ended up with was longer than emacs expected, which made emacs not evaluate it when loading the buffer. As I discovered (see the mailing list post), the loading code looks for the string Local Variables in the last 3000 bytes of the buffer only, and I exceeded that. Stefan Monnier suggested a workaround in this post. Instead of the normal Local Variables block at the end:
Local Variables:
eval: (progn ... ...
             ... ...
             LONG chunk of emacs-lisp
      )
End:
I do this:
(progn ;;local-config
   lisp lisp lisp
   as long as I want
)
Local Variables:
eval: (progn (re-search-backward "^(progn ;;local-config") (eval (read (current-buffer))))
End:
So emacs sees a small chunk of code that searches backwards through the buffer (as far back as needed) for the real lisp to evaluate. As an aside, this blog is also an .org document, and the lisp snippets above are org-babel blocks that I'm not evaluating. The exporter knows to respect the emacs-lisp syntax highlighting, however.

Advises
OK, so what was all the stuff I needed to tell org-babel to do specially here? First off, org needed to be able to communicate to the Python session the name of the file to write the plot to. I do this by making the whole plist for this org-babel snippet available to python:
;; THIS advice makes all the org-babel parameters available to python in the
;; _org_babel_params dict. I care about _org_babel_params['_file'] specifically,
;; but everything is available
(defun dima-org-babel-python-var-to-python (var)
  "Convert an elisp value to a python variable.
  Like the original, but supports (a . b) cells and symbols
"
  (if (listp var)
      (if (listp (cdr var))
          (concat "[" (mapconcat #'org-babel-python-var-to-python var ", ") "]")
        (format "\"\"\"%s\"\"\"" var))
    (if (symbolp var)
        (format "\"\"\"%s\"\"\"" var)
      (if (eq var 'hline)
          org-babel-python-hline-to
        (format
         (if (and (stringp var) (string-match "[\n\r]" var)) "\"\"%S\"\"" "%S")
         (if (stringp var) (substring-no-properties var) var))))))
(defun dima-alist-to-python-dict (alist)
  "Generates a string defining a python dict from the given alist"
  (let ((keyvalue-list
         (mapcar (lambda (x)
                   (format "%s = %s, "
                           (replace-regexp-in-string
                            "[^a-zA-Z0-9_]" "_"
                            (symbol-name (car x)))
                           (dima-org-babel-python-var-to-python (cdr x))))
                 alist)))
    (concat
     "dict( "
     (apply 'concat keyvalue-list)
     ")")))
(defun dima-org-babel-python-pass-all-params (f params)
  (cons
   (concat
    "_org_babel_params = "
    (dima-alist-to-python-dict params))
   (funcall f params)))
(unless
    (advice-member-p
     #'dima-org-babel-python-pass-all-params
     #'org-babel-variable-assignments:python)
  (advice-add
   #'org-babel-variable-assignments:python
   :around #'dima-org-babel-python-pass-all-params))
So if there's a :file plist key, the python code can grab that, and write the plot to that filename. But I don't really want to specify an output file for every single org-babel snippet. All I really care about is that each plot gets a unique filename. So I omit the :file key entirely, and use this advice to generate one for me:
;; This sets a default :file tag, set to a unique filename. I want each demo to
;; produce an image, but I don't care what it is called. I omit the :file tag
;; completely, and this advice takes care of it
(defun dima-org-babel-python-unique-plot-filename
    (f &optional arg info params)
  (funcall f arg info
           (cons (cons ':file
                       (format "guide-%d.svg"
                               (condition-case nil
                                   (setq dima-unique-plot-number (1+ dima-unique-plot-number))
                                 (error (setq dima-unique-plot-number 0)))))
                 params)))
(unless
    (advice-member-p
     #'dima-org-babel-python-unique-plot-filename
     #'org-babel-execute-src-block)
  (advice-add
   #'org-babel-execute-src-block
   :around #'dima-org-babel-python-unique-plot-filename))
This uses the dima-unique-plot-number integer to keep track of each plot. I increment this with each plot. Getting closer. It isn't strictly required, but it'd be nice if each plot had the same output filename each time I generated it. So I want to reset the plot number to 0 each time all the plots are regenerated:
;; If I'm regenerating ALL the plots, I start counting the plots from 0
(defun dima-reset-unique-plot-number
    (&rest args)
    (setq dima-unique-plot-number 0))
(unless
    (advice-member-p
     #'dima-reset-unique-plot-number
     #'org-babel-execute-buffer)
  (advice-add
   #'org-babel-execute-buffer
   :after #'dima-reset-unique-plot-number))
Finally, I want to lie to the user a little bit. The code I'm actually executing writes each plot to an .svg. But the code I'd like the user to see should use the default output: an interactive, graphical window. I do that by tweaking the python session to tell the gnuplotlib object to write to .svg files from org by default, instead of using the graphical terminal:
;; I'm using github to display guide.org, so I'm not using the "normal" org
;; exporter. I want the demo text to not contain the hardcopy= tags, but clearly
;; I need the hardcopy tag when generating the plots. I add some python to
;; override gnuplotlib.plot() to add the hardcopy tag somewhere where the reader
;; won't see it. But where to put this python override code? If I put it into an
;; org-babel block, it will be rendered, and the :export tags will be ignored,
;; since github doesn't respect those (probably). So I put the extra stuff into
;; an advice. Whew.
(defun dima-org-babel-python-set-demo-output (f body params)
  (with-temp-buffer
    (insert body)
    (beginning-of-buffer)
    (when (search-forward "import gnuplotlib as gp" nil t)
      (end-of-line)
      (insert
       "\n"
       "if not hasattr(gp.gnuplotlib, 'orig_init'):\n"
       "    gp.gnuplotlib.orig_init = gp.gnuplotlib.__init__\n"
       "gp.gnuplotlib.__init__ = lambda self, *args, **kwargs: gp.gnuplotlib.orig_init(self, *args, hardcopy=_org_babel_params['_file'] if 'file' in _org_babel_params['_result_params'] else None, **kwargs)\n"))
    (setq body (buffer-substring-no-properties (point-min) (point-max))))
  (funcall f body params))
(unless
    (advice-member-p
     #'dima-org-babel-python-set-demo-output
     #'org-babel-execute:python)
  (advice-add
   #'org-babel-execute:python
   :around #'dima-org-babel-python-set-demo-output))
)
And that's it. The advice in the talk is slightly different, in uninteresting ways. Some of this should be upstreamed to org-babel somehow. Not entirely clear which part, but I'll cross that bridge when I get to it.

15 March 2020

Dima Kogan: numpysane and broadcasting in C

Since the beginning, the numpysane library provided a broadcast_define() function to decorate existing Python routines to give them broadcasting awareness. This was very useful, but slow. I just did lots of typing, and now I have a flavor of this in C (the numpysane_pywrap module; new in numpysane 0.22). As expected, you get fast C loops! And similar to the rest of this library, this is a port of something in PDL: PDL::PP. Full documentation lives here: https://github.com/dkogan/numpysane/blob/master/README-pywrap.org

After writing this I realized that there was something similar available in numpy this whole time: https://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html I haven't looked too deeply into this yet, but two things are clear. First, there's a design difference: the numpy implementation uses function callbacks, while I generate C code. Code generation is what PDL::PP does, and when I thought about it earlier, doing this with function pointers seemed too painful. I guess it's doable, though. Second, at least in one case, the gufuncs aren't doing the right broadcasting thing:
>>> import numpy as np
>>> a = np.arange(5).reshape(5,1)
>>> b = np.arange(3)
>>> np.matmul(a,b)
ValueError: matmul: Input operand 1 has a mismatch in
   its core dimension 0, with gufunc signature
   (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1)
This should work. And if you do this with numpysane.broadcast_define() or with numpysane_pywrap, it does work. I'll look at it later to figure out what it's doing.
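For reference, a sketch of what the Python-level flavor looks like, based on the inner-product example in the numpysane docs (treat the details as approximate):
import numpy as np
import numpysane as nps

# The prototype says: both arguments are 1-dimensional arrays of the same
# length 'n'. Any extra leading dimensions are broadcast, PDL-style.
@nps.broadcast_define( (('n',), ('n',)) )
def inner_product(a, b):
    return a.dot(b)

a = np.arange(6).reshape(2,3)  # two length-3 vectors
b = np.arange(3)               # one length-3 vector, broadcast against both rows
print(inner_product(a, b))     # [ 5 14 ]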

10 July 2016

Bits from Debian: New Debian Developers and Maintainers (May and June 2016)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!