Antti-Juhani Kaijanaho: About to retire from Debian

A Licentiate Thesis is assessed by two examiners, usually drawn from outside of the home university; they write (either jointly or separately) a substantiated statement about the thesis, in which they suggest a grade. The final grade is almost always the one suggested by the examiners. I was very fortunate to have such prominent scientists as Dr. Stefan Hanenberg and Prof. Stein Krogdahl as the examiners of my thesis. They recommended, and I received, the grade very good (4 on a scale of 1–5). The thesis has been

Kaijanaho, Antti-Juhani
The extent of empirical evidence that could inform evidence-based design of programming languages. A systematic mapping study.
Jyväskylä: University of Jyväskylä, 2014, 243 p.
(Jyväskylä Licentiate Theses in Computing,
ISSN 1795-9713; 18)
ISBN 978-951-39-5790-2 (nid.)
ISBN 978-951-39-5791-9 (PDF)
Finnish summary

Background: Programming language design is not usually informed by empirical studies. In other fields similar problems have inspired an evidence-based paradigm of practice. Central to it are secondary studies summarizing and consolidating the research literature.

Aims: This systematic mapping study looks for empirical research that could inform evidence-based design of programming languages.

Method: Manual and keyword-based searches were performed, as was a single round of snowballing. There were 2056 potentially relevant publications, of which 180 were selected for inclusion, because they reported empirical evidence on the efficacy of potential design decisions and were published on or before 2012. A thematic synthesis was created.

Results: Included studies span four decades, but activity has been sparse until the last five years or so. The form of conditional statements and loops, as well as the choice between static and dynamic typing have all been studied empirically for efficacy in at least five studies each. Error proneness, programming comprehension, and human effort are the most common forms of efficacy studied. Experimenting with programmer participants is the most popular method.

Conclusions: There clearly are language design decisions for which empirical evidence regarding efficacy exists; they may be of some use to language designers, and several of them may be ripe for systematic reviewing. There is concern that the lack of interest generated by studies in this topic area until the recent surge of activity may indicate serious issues in their research approach.

Keywords: programming languages, programming language design, evidence-based paradigm, efficacy, research methods, systematic mapping study, thematic synthesis
In fact, the whole of 20th Century philosophy of science is a big pile of failed attempts to explain science; not one explanation is fully satisfactory. [...] Most scientists enjoy not pondering it, for it's a bit like being a cartoon character: so long as you don't look down, you can walk on air.

I wrote my Master's Thesis (PDF) in 2002. It was about the formal method called "B"; but I took a lot of time and pages to examine the history and content of formal logic. My supervisor was, understandably, exasperated, but I did receive the highest possible grade for it (which I never have fully accepted I deserved). The main reason for that digression: I looked down, and I just had to go poke the bridge I was standing on to make sure I was not, in fact, walking on air. In the many years since, I've taken a lot of time to study foundations, first of mathematics, and more recently of science. It is one reason it took me about eight years to come up with a doable doctoral project (and I am still amazed that my department kept employing me; but I suppose they like my teaching, as do I). The other reason was, it took me that long to realize how to study the design of programming languages without going where everyone has gone before.

Debian people, if any are still reading, may find it interesting that I found significant use for the dctrl-tools toolset I have been writing for Debian for about fifteen years: I stored my data collection as a big pile of dctrl-format files. I ended up making some changes to the existing tools (I should upload the new version soon, I suppose), and I wrote another toolset (unfortunately one that is not general purpose, like the dctrl-tools are) in the process.

For the Haskell people, I mainly have an apology for not attending to Planet Haskell duties in the summer; but I am back in business now. I also note, somewhat to my regret, that I found very few studies dealing with Haskell. I just checked; I mention Haskell several times in the background chapter, but it is not mentioned in the results chapter (because there were no studies worthy of special notice).

I am already working on extending this work into a doctoral thesis. I expect, and hope, to complete that one faster.
gnutls_handshake returns once the handshake is finished. So how does one adapt this to asynchronous transput? Fortunately, there are (badly documented) hooks for this purpose.
An application can tell gnutls to call application-supplied functions instead of the read(2) and write(2) system calls. Thus, when setting up a TLS session but before the handshake, I do the following:
    gnutls_transport_set_ptr(gs, this);
    gnutls_transport_set_push_function(gs, push_static);
    gnutls_transport_set_pull_function(gs, pull_static);
    gnutls_transport_set_lowat(gs, 0);

Here, gs is my private copy of the gnutls session structure, and push_static and pull_static are static member functions in my session wrapper class. The first line tells gnutls to give the current this pointer (a pointer to the current session wrapper) as the first argument to them. The last line tells gnutls not to try treating the this pointer as a Berkeley socket.
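The trick of handing a this pointer through a void pointer and recovering it in a static member function can be tried out without gnutls at all. What follows is only a minimal sketch: fake_transport, set_ptr, set_pull, library_wants_data and demo_session are hypothetical names invented for the illustration, standing in for the gnutls session and its gnutls_transport_set_* calls.

```cpp
#include <cstddef>
#include <sys/types.h>   // ssize_t (POSIX)

// Hypothetical stand-in for a C library that stores an opaque pointer and a
// callback, and hands the pointer back as the callback's first argument.
struct fake_transport {
    void *ptr = nullptr;
    ssize_t (*pull)(void *, void *, size_t) = nullptr;
};

inline void set_ptr(fake_transport &t, void *p) { t.ptr = p; }
inline void set_pull(fake_transport &t, ssize_t (*f)(void *, void *, size_t)) {
    t.pull = f;
}

// What such a library would do internally whenever it needs input.
inline ssize_t library_wants_data(fake_transport &t, void *buf, size_t n) {
    return t.pull(t.ptr, buf, n);
}

class demo_session {
public:
    explicit demo_session(unsigned char fill) : fill_(fill) {
        set_ptr(transport, this);          // opaque pointer = this object
        set_pull(transport, pull_static);  // plain function pointer
    }
    fake_transport transport;
    size_t calls = 0;

private:
    // A static member function has an ordinary function-pointer type, so a
    // C API can call it; it only forwards to the real member function.
    static ssize_t pull_static(void *th, void *b, size_t n) {
        return static_cast<demo_session *>(th)->pull(b, n);
    }
    // Toy behaviour: fill the caller's buffer with this object's byte.
    ssize_t pull(void *b, size_t n) {
        unsigned char *cs = static_cast<unsigned char *>(b);
        for (size_t i = 0; i < n; i++) cs[i] = fill_;
        ++calls;
        return static_cast<ssize_t>(n);
    }
    unsigned char fill_;
};
```

Two demo_session objects registered this way each receive their own calls, because the opaque pointer, not the function pointer, selects the object. The object must not be copied or moved after registration, since the stored pointer would then refer to the wrong place.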
The pull_static static member function just passes control on to a non-static member, for convenience:

    ssize_t session::pull_static(void *th, void *b, size_t n)
    {
        return static_cast<session *>(th)->pull(b, n);
    }

The basic idea of the pull function is to try to return immediately with data from a buffer, and if the buffer is empty, to fail with an error code signalling the absence of data with the possibility that data may become available later (the POSIX EAGAIN code):

    class session
    {
        [...]
        std::vector<unsigned char> ins;
        size_t ins_low, ins_high;
        [...]
    };

    ssize_t session::pull(void *b, size_t n_wanted)
    {
        unsigned char *cs = static_cast<unsigned char *>(b);
        // Buffer empty: report "no data yet, try again later".
        if (ins_high - ins_low == 0)
        {
            errno = EAGAIN;
            return -1;
        }
        size_t n = ins_high - ins_low < n_wanted ? ins_high - ins_low : n_wanted;
        for (size_t i = 0; i < n; i++) cs[i] = ins[ins_low+i];
        ins_low += n;
        return n;
    }

Here, ins_low is an index to the ins vector specifying the first byte which has not already been passed on to gnutls, while ins_high is an index to the ins vector specifying the first byte that does not contain data read from the network. The assertions 0 <= ins_low, ins_low <= ins_high and ins_high <= ins.size() are obvious invariants in this buffering scheme.
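To see the buffering scheme and its invariants in action without any TLS or network code, here is a small stand-alone sketch. The in_buffer struct, its check method and its feed method (which plays the role of a completed network read) are hypothetical names made up for this illustration; only the pull logic mirrors the description above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>
#include <sys/types.h>   // ssize_t (POSIX)

// Hypothetical stand-alone version of the input side of the buffer.
struct in_buffer {
    std::vector<unsigned char> ins;
    size_t ins_low = 0, ins_high = 0;

    // The invariants named in the text. 0 <= ins_low holds trivially
    // because size_t is unsigned.
    void check() const {
        assert(ins_low <= ins_high);
        assert(ins_high <= ins.size());
    }

    // Stand-in for a completed network read: store n more bytes.
    void feed(const unsigned char *p, size_t n) {
        if (ins.size() < ins_high + n) ins.resize(ins_high + n);
        for (size_t i = 0; i < n; i++) ins[ins_high + i] = p[i];
        ins_high += n;
        check();
    }

    // Same contract as the pull callback: hand out buffered data if there
    // is any, otherwise fail with EAGAIN.
    ssize_t pull(void *b, size_t n_wanted) {
        unsigned char *cs = static_cast<unsigned char *>(b);
        if (ins_high == ins_low) { errno = EAGAIN; return -1; }
        size_t avail = ins_high - ins_low;
        size_t n = avail < n_wanted ? avail : n_wanted;
        for (size_t i = 0; i < n; i++) cs[i] = ins[ins_low + i];
        ins_low += n;
        check();
        return static_cast<ssize_t>(n);
    }
};
```

Pulling from a fresh in_buffer fails with EAGAIN; after feeding three bytes, a request for two returns two, a further request returns the remaining one, and the next request fails with EAGAIN again.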
The push case is simpler: all one needs to do is buffer the data that gnutls wants to send, for later transmission:

    class session
    {
        [...]
        std::vector<unsigned char> outs;
        size_t outs_low;
        [...]
    };

    ssize_t session::push(const void *b, size_t n)
    {
        const unsigned char *cs = static_cast<const unsigned char *>(b);
        for (size_t i = 0; i < n; i++) outs.push_back(cs[i]);
        return n;
    }

The low water mark outs_low (indicating the first byte that has not yet been sent to the network) is not needed in the push function. It would be possible for the push callback to signal EAGAIN, but it is not necessary in this scheme (assuming that one does not need to establish hard buffer limits).
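For completeness, this is roughly what a push with a hard buffer limit might look like. It is a sketch under assumptions, not part of my code: bounded_out_buffer and max_pending are hypothetical names, and the choice to accept a short count when only part of the data fits (in the manner of write(2) on a non-blocking descriptor) is an assumption about what the caller tolerates.

```cpp
#include <cerrno>
#include <cstddef>
#include <vector>
#include <sys/types.h>   // ssize_t (POSIX)

// Hypothetical output buffer that refuses to hold more than max_pending
// unsent bytes.
struct bounded_out_buffer {
    std::vector<unsigned char> outs;
    size_t outs_low = 0;   // first byte not yet sent to the network
    size_t max_pending;    // hypothetical hard limit on unsent bytes

    explicit bounded_out_buffer(size_t limit) : max_pending(limit) {}

    ssize_t push(const void *b, size_t n) {
        const unsigned char *cs = static_cast<const unsigned char *>(b);
        size_t pending = outs.size() - outs_low;
        // Buffer full: ask the caller to come back later.
        if (pending >= max_pending) { errno = EAGAIN; return -1; }
        // Otherwise accept as much as fits (possibly a short count).
        size_t room = max_pending - pending;
        size_t accepted = n < room ? n : room;
        outs.insert(outs.end(), cs, cs + accepted);
        return static_cast<ssize_t>(accepted);
    }
};
```

With a limit in place, the EAGAIN from push would have to be handled just like the one from pull: the suspended operation can only make progress once an asynchronous write has advanced outs_low.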
Once gnutls receives an EAGAIN condition from the pull callback, it suspends the current operation and returns to its caller with the gnutls condition GNUTLS_E_AGAIN. The caller must arrange for more data to become available to the pull callback (in this case by scheduling an asynchronous write of the data in the outs buffer scheme and scheduling an asynchronous read to the ins buffer scheme) and then call the operation again, allowing the operation to resume.
The code so far does not actually perform any network transput. For this, I have written two auxiliary methods:

    class session
    {
        [...]
        bool read_active, write_active;
        [...]
    };

    void session::post_write()
    {
        if (write_active) return;
        if (outs_low > 0 && outs_low == outs.size())
        {
            outs.clear();
            outs_low = 0;
        }
        else if (outs_low > 4096)
        {
            outs.erase(outs.begin(), outs.begin() + outs_low);
            outs_low = 0;
        }
        if (outs_low < outs.size())
        {
            stream.async_write_some
                (boost::asio::buffer(outs.data()+outs_low,
                                     outs.size()-outs_low),
                 boost::bind(&session::sent_some, this, _1, _2));
            write_active = true;
        }
    }

    void session::post_read()
    {
        if (read_active) return;
        if (ins_low > 0 && ins_low == ins.size())
        {
            ins.clear();
            ins_low = 0;
            ins_high = 0;
        }
        else if (ins_low > 4096)
        {
            ins.erase(ins.begin(), ins.begin() + ins_low);
            ins_high -= ins_low;
            ins_low = 0;
        }
        if (ins_high + 4096 >= ins.size()) ins.resize(ins_high + 4096);
        stream.async_read_some(boost::asio::buffer(ins.data()+ins_high,
                                                   ins.size()-ins_high),
                               boost::bind(&session::received_some, this, _1, _2));
        read_active = true;
    }

Both helpers prune the buffers when necessary. (I should really remove those magic 4096s and make them a symbolic constant.) The data members read_active and write_active ensure that at most one asynchronous read and at most one asynchronous write is pending at any given time. My first version did not have this safeguard (instead trying to rely on the ASIO stream reset method to cancel any outstanding asynchronous transput at need), and the code sent some TLS records twice, which is not good: sending the ServerHello twice is guaranteed to confuse the client.
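The pruning step, with the magic number turned into a symbolic constant, could be factored out along these lines. This is only a sketch of one possible refactoring: prune_threshold and prune are hypothetical names, and the function deliberately knows nothing about ASIO or gnutls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical name for the magic 4096 in the snippets above.
constexpr std::size_t prune_threshold = 4096;

// Drops the consumed prefix [0, low) of buf when that is cheap (everything
// has been consumed) or worthwhile (the prefix is longer than the threshold).
// Returns the number of bytes dropped, so that the caller can adjust any
// other index into the same buffer.
inline std::size_t prune(std::vector<unsigned char> &buf, std::size_t &low,
                         std::size_t threshold = prune_threshold) {
    std::size_t dropped = 0;
    if (low > 0 && low == buf.size()) {
        dropped = low;
        buf.clear();
    } else if (low > threshold) {
        dropped = low;
        buf.erase(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(low));
    }
    low -= dropped;
    return dropped;
}
```

For the output buffer the return value can be ignored; for the input buffer it would be subtracted from ins_high, which is what the two branches of post_read do by hand.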
Once ASIO completes an asynchronous transput request, it calls the corresponding handler:

    void session::received_some(boost::system::error_code ec, size_t n)
    {
        read_active = false;
        if (ec) { pending_error = ec; return; }
        ins_high += n;
        post_pending_actions();
    }

    void session::sent_some(boost::system::error_code ec, size_t n)
    {
        write_active = false;
        if (ec) { pending_error = ec; return; }
        outs_low += n;
        post_pending_actions();
    }

Their job is to update the bookkeeping and to trigger the resumption of suspended gnutls operations (which is done by post_pending_actions).
Now we have all the main pieces of the puzzle. The remaining pieces are obvious but rather messy, and I'd rather not repeat them here (not even in a cleaned-up form). But their essential idea goes as follows:
When called by the application code or when resumed by post_pending_actions, an asynchronous wrapper of a gnutls operation first examines the session state for a saved error code. If one is found, it is propagated to the application using the usual ASIO techniques, and the operation is cancelled. Otherwise, the wrapper calls the actual gnutls operation. When it returns, the wrapper examines the return value. If successful completion is indicated, the handler given by the application is posted in the ASIO io_service for later execution. If GNUTLS_E_AGAIN is indicated, post_read and post_write are called to schedule actual network transput, and the wrapper is suspended (by pushing it into a queue of pending actions). If any other kind of failure is indicated, it is propagated to the application using the usual ASIO techniques.
The post_pending_actions function merely empties the queue of pending actions and schedules the actions that it found in the queue for resumption.
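The suspend-and-resume logic just described can be sketched in isolation roughly as follows. None of this is my actual code: action_queue, op_result, run and suspended are hypothetical names, op_result stands in for the gnutls return values, and for brevity the handler is called directly where the real wrapper would post it to the io_service.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical result codes standing in for gnutls return values.
enum class op_result { done, again, failed };

class action_queue {
public:
    using operation = std::function<op_result()>;
    using handler = std::function<void(bool ok)>;

    // The wrapper: check for a saved error, try the operation, then
    // complete, suspend, or fail depending on what it returned.
    void run(operation op, handler h) {
        if (saved_error) { h(false); return; }
        switch (op()) {
        case op_result::done:
            h(true);
            break;
        case op_result::again:
            // Real code would call post_read() and post_write() here.
            pending.emplace_back(std::move(op), std::move(h));
            break;
        case op_result::failed:
            h(false);
            break;
        }
    }

    // Empties the queue and resumes whatever was found in it. Swapping
    // first means that actions which suspend again land in a fresh queue.
    void post_pending_actions() {
        std::deque<std::pair<operation, handler>> todo;
        todo.swap(pending);
        for (auto &a : todo) run(std::move(a.first), std::move(a.second));
    }

    bool saved_error = false;
    std::size_t suspended() const { return pending.size(); }

private:
    std::deque<std::pair<operation, handler>> pending;
};
```

An operation that reports again twice before reporting done stays in the queue across two calls of post_pending_actions and has its handler invoked on the third attempt.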
The code snippets above are not my actual working code. I have mainly removed from them some irrelevant details (mostly certain template parameters, debug logging and mutex handling). I don't expect the snippets to compile. I expect I will be able to post my actual git repository to the web in a couple of days.
Please note that my (actual) code has received only rudimentary testing. I believe it is correct, but I won't be surprised to find it contains bugs in the edge cases. I hope this is, still, of some use to somebody.

    #!/bin/sh
    update-grub

I call the script zzz-grub-local, to ensure that it runs last.