Search Results: "nomeata"

25 January 2024

Joachim Breitner: GHC Steering Committee Retrospective

After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee. In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari s and Richard Eisenberg s achievement: Already in Summer of 2016, Ben Gamari set up the ghc-proposals Github repository with a sketch of a process and sent out a call for nominations on the GHC user s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee s by-laws and procedures. As so often, Richard was an influential shaping force here.

Three ingredients For example, it was him that suggested that for each proposal we have one committee member be the Shepherd , overseeing the discussion. I believe this was one ingredient for the process effectiveness: There is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers have to do the next step (and everyone hopes someone else does it). The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back on Richard, who suggested that once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote. At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but publicly archived committee discussions, and began accepting proposals, starting with Adam Gundry s OverloadedRecordFields. At that point, there was no secretary role yet, so how I did become one? It seems that in February 2017 I started to clean-up and refine the process documentation, fixing bugs in the process (like requiring authors to set Github labels when they don t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in change, the role of a secretary followed naturally. In the email in which I described that role I wrote:
Simon already shoved me towards picking up the secretary hat, to reduce load on Ben.
So when I merged the updated process documentation, I already listed myself secretary . It wasn t just Simon s shoving that put my into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:
I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.
So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here. I am rather content with the result: These three ingredients single secretary, per-proposal shepherds, silence-is-consent helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.

Ulterior motivation I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I did want the committee to succeed, and I did want that authors receive timely, good and decisive feedback on their proposals but I did not really want to have to do that part. I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely mentally fill gaps in a specification rather than spotting them. Always optimistically assuming that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features. And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need and I do believe that Haskell should stay an evolving and innovating language but how does this help me decide about this or that particular feature. I even, during the formation of the committee, explicitly asked that we write down some guidance on Vision and Guideline ; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal. So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while myself took care of a proposal, when it seemed to be very straight-forward. Sneaky, ain t it?

7 Years later For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down they would ping me on GitHub, I would pick a suitable shepherd among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I d merge the pull request and update the metadata thereon. Every few months I d summarize the current state of affairs to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gathered the data for Simon Peyton Jones annually GHC Status Report. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I d sent around a call for nominations and when the nominations came in, distributed them off-list among the committee and tallied the votes. Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn t quite care so much anymore about some of the discussion, and there was a decent amount of naval-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn t particularly fun. I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don t. The whole silence-is-assent approach is good for throughput, but not necessary great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down. Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that! And although I am for my day job now enjoying a language that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, BlockArguments, do notation with ( foo) expressions and Unicode), I m still around, hosting the Haskell Interlude Podcast, writing on this blog and hanging out at ZuriHac etc.

22 December 2023

Joachim Breitner: The Haskell Interlude Podcast

It was pointed out to me that I have not blogged about this, so better now than never: Since 2021 I am together with four other hosts producing a regular podcast about Haskell, the Haskell Interlude. Roughly every two weeks two of us interview someone from the Haskell Community, and we chat for approximately an hour about how they came to Haskell, what they are doing with it, why they are doing it and what else is on their mind. Sometimes we talk to very famous people, like Simon Peyton Jones, and sometimes to people who maybe should be famous, but aren t quite yet. For most episodes we also have a transcript, so you can read the interviews instead, if you prefer, and you should find the podcast on most podcast apps as well. I do not know how reliable these statistics are, but supposedly we regularly have around 1300 listeners. We don t get much feedback, however, so if you like the show, or dislike it, or have feedback, let us know (for example on the Haskell Disourse, which has a thread for each episode). At the time of writing, we released 40 episodes. For the benefit of my (likely hypothetical) fans, or those who want to train an AI voice model for nefarious purposes, here is the list of episodes co-hosted by me: Can t decide where to start? The one with Ryan Trinkle might be my favorite. Thanks to the Haskell Foundation and its sponsors for supporting this podcast (hosting, editing, transscription).

1 November 2023

Joachim Breitner: Joining the Lean FRO

Tomorrow is going to be a new first day in a new job for me: I am joining the Lean FRO, and I m excited.

What is Lean? Lean is the new kid on the block of theorem provers. It s a pure functional programming language (like Haskell, with and on which I have worked a lot), but it s dependently typed (which Haskell may be evolving to be as well, but rather slowly and carefully). It has a refreshing syntax, built on top of a rather good (I have been told, not an expert here) macro system. As a dependently typed programming language, it is also a theorem prover, or proof assistant, and there exists already a lively community of mathematicians who started to formalize mathematics in a coherent library, creatively called mathlib.

What is a FRO? A Focused Research Organization has the organizational form of a small start up (small team, little overhead, a few years of runway), but its goals and measure for success are not commercial, as funding is provided by donors (in the case of the Lean FRO, the Simons Foundation International, the Alfred P. Sloan Foundation, and Richard Merkin). This allows us to build something that we believe is a contribution for the greater good, even though it s not (or not yet) commercially interesting enough and does not fit other forms of funding (such as research grants) well. This is a very comfortable situation to be in.

Why am I excited? To me, working on Lean seems to be the perfect mix: I have been working on language implementation for about a decade now, and always with a preference for functional languages. Add to that my interest in theorem proving, where I have used Isabelle and Coq so far, and played with Agda and others. So technically, clearly up my alley. Furthermore, the language isn t too old, and plenty of interesting things are simply still to do, rather than tried before. The ecosystem is still evolving, so there is a good chance to have some impact. On the other hand, the language isn t too young either. It is no longer an open question whether we will have users: we have them already, they hang out on zulip, so if I improve something, there is likely someone going to be happy about it, which is great. And the community seems to be welcoming and full of nice people. Finally, this library of mathematics that these users are building is itself an amazing artifact: Lots of math in a consistent, machine-readable, maintained, documented, checked form! With a little bit of optimism I can imagine this changing how math research and education will be done in the future. It could be for math what Wikipedia is for encyclopedic knowledge and OpenStreetMap for maps and the thought of facilitating that excites me. With this new job I find that when I am telling friends and colleagues about it, I do not hesitate or hedge when asked why I am doing this. This is a good sign.

What will I be doing? We ll see what main tasks I ll get to tackle initially, but knowing myself, I expect I ll get broadly involved. To get up to speed I started playing around with a few things already, and for example created Loogle, a Mathlib search engine inspired by Haskell s Hoogle, including a Zulip bot integration. This seems to be useful and quite well received, so I ll continue maintaining that. Expect more about this and other contributions here in the future.

29 October 2023

Joachim Breitner: Squash your Github PRs with one click

TL;DR: Squash your PRs with one click at https://squasher.nomeata.de/. Very recently I got this response from the project maintainer at a pull request I contributed: Thanks, approved, please squash so that I can merge. It s nice that my contribution can go it, but why did the maintainer not just press the Squash and merge button , and instead adds the this unnecessary roundtrip to the process? Anyways, maintainers make the rules, so I play by them. But unlike the maintainer, who can squash-and-merge with just one click, squashing the PR s branch is surprisingly laberous: Github does not allow you to do that via the Web UI (and hence on mobile), and it seems you are expected to go to your computer and juggle with git rebase --interactive. I found this rather annoying, so I created Squasher, a simple service that will squash your branch for you. There is no configuration, just paste the PR url. It will use the PR title and body as the commit message (which is obviously the right way ), and create the commit in your name:
Squasher in actionSquasher in action
If you find this useful, or found it to be buggy, let me know. The code is at https://github.com/nomeata/squasher if you are curious about it.

20 September 2023

Joey Hess: Haskell webassembly in the browser


live demo As far as I know this is the first Haskell program compiled to Webassembly (WASM) with mainline ghc and using the browser DOM. ghc's WASM backend is solid, but it only provides very low-level FFI bindings when used in the browser. Ints and pointers to WASM memory. (See here for details and for instructions on getting the ghc WASM toolchain I used.) I imagine that in the future, WASM code will interface with the DOM by using a WASI "world" that defines a complete API (and browsers won't include Javascript engines anymore). But currently, WASM can't do anything in a browser without calling back to Javascript. For this project, I needed 63 lines of (reusable) javascript (here). Plus another 18 to bootstrap running the WASM program (here). (Also browser_wasi_shim) But let's start with the Haskell code. A simple program to pop up an alert in the browser looks like this:
 -# LANGUAGE OverloadedStrings #- 
import Wasmjsbridge
foreign export ccall hello :: IO ()
hello :: IO ()
hello = do
    alert <- get_js_object_method "window" "alert"
    call_js_function_ByteString_Void alert "hello, world!"
A larger program that draws on the canvas and generated the image above is here. The Haskell side of the FFI interface is a bunch of fairly mechanical functions like this:
foreign import ccall unsafe "call_js_function_string_void"
    _call_js_function_string_void :: Int -> CString -> Int -> IO ()
call_js_function_ByteString_Void :: JSFunction -> B.ByteString -> IO ()
call_js_function_ByteString_Void (JSFunction n) b =
      BU.unsafeUseAsCStringLen b $ \(buf, len) ->
                _call_js_function_string_void n buf len
Many more would need to be added, or generated, to continue down this path to complete coverage of all data types. All in all it's 64 lines of code so far (here). Also a C shim is needed, that imports from WASI modules and provides C functions that are used by the Haskell FFI. It looks like this:
void _call_js_function_string_void(uint32_t fn, uint8_t *buf, uint32_t len) __attribute__((
        __import_module__("wasmjsbridge"),
        __import_name__("call_js_function_string_void")
));
void call_js_function_string_void(uint32_t fn, uint8_t *buf, uint32_t len)  
        _call_js_function_string_void(fn, buf, len);
 
Another 64 lines of code for that (here). I found this pattern in Joachim Breitner's haskell-on-fastly and copied it rather blindly. Finally, the Javascript that gets run for that is:
call_js_function_string_void(n, b, sz)  
    const fn = globalThis.wasmjsbridge_functionmap.get(n);
    const buffer = globalThis.wasmjsbridge_exports.memory.buffer;
    fn(decoder.decode(new Uint8Array(buffer, b, sz)));
 ,
Notice that this gets an identifier representing the javascript function to run, which might be any method of any object. It looks it up in a map and runs it. And the ByteString that got passed from Haskell has to be decoded to a javascript string. In the Haskell program above, the function is document.alert. Why not pass a ByteString with that through the FFI? Well, you could. But then it would have to eval it. That would make running WASM in the browser be evaling Javascript every time it calls a function. That does not seem like a good idea if the goal is speed. GHC's javascript backend does use Javascript FFI snippets like that, but there they get pasted into the generated Javascript hairball, so no eval is needed. So my code has things like get_js_object_method that look up things like Javascript functions and generate identifiers. It also has this:
call_js_function_ByteString_Object :: JSFunction -> B.ByteString -> IO JSObject
Which can be used to call things like document.getElementById that return a javascript object:
getElementById <- get_js_object_method (JSObjectName "document") "getElementById"
canvas <- call_js_function_ByteString_Object getElementById "myCanvas"
Here's the Javascript called by get_js_object_method. It generates a Javascript function that will be used to call the desired method of the object, and allocates an identifier for it, and returns that to the caller.
get_js_objectname_method(ob, osz, nb, nsz)  
    const buffer = globalThis.wasmjsbridge_exports.memory.buffer;
    const objname = decoder.decode(new Uint8Array(buffer, ob, osz));
    const funcname = decoder.decode(new Uint8Array(buffer, nb, nsz));
    const func = function (...args)   return globalThis[objname][funcname](...args)  ;
    const n = globalThis.wasmjsbridge_counter + 1;
    globalThis.wasmjsbridge_counter = n;
    globalThis.wasmjsbridge_functionmap.set(n, func);
    return n;
 ,
This does mean that every time a Javascript function id is looked up, some more memory is used on the Javascript side. For more serious uses of this, something would need to be done about that. Lots of other stuff like object value getting and setting is also not implemented, there's no support yet for callbacks, and so on. Still, I'm happy where this has gotten to after 12 hours of work on it. I might release the reusable parts of this as a Haskell library, although it seems likely that ongoing development of ghc will make it obsolete. In the meantime, clone the git repo to have a play with it.
This blog post was sponsored by unqueued on Patreon.

10 October 2022

Joachim Breitner: rec-def: Minesweeper case study

I m on the train back from MuniHac, where I gave a talk about the rec-def library that I have excessively blogged about recently (here, here, here and here). I got quite flattering comments about that talk, so if you want to see if they were sincere, I suggest you watch the recording of Getting recursive definitions off their bottoms (but it s not necessary for the following). After the talk, Franz Thoma approached me and told me a story of how we was once implementing the game Minesweeper in Haskell, and in particular the part of the logic where, after the user has uncovered a field, the game would automatically uncover all fields that are next to a neutral field, i.e. one with zero adjacent bombs. He was using a comonadic data structure, which makes a context-dependent parallel computation such as uncovering one field quite natural, and was hoping that using a suitable fix-point operator, he can elegantly obtain not just the next step, but directly the result of recursively uncovering all these fields. But, much to his disappointment, that did not work out: Due to the recursion inherent in that definition, a knot-tying fixed-point operator will lead to a cyclic definition.
Microsoft Minesweeper Microsoft Minesweeper
He was wondering if the rec-def library could have helped him, and we sat down to find out, and this is the tale of this blog post. I will avoid the comonadic abstractions and program it more naively, though, to not lose too many readers along the way. Maybe read Chris Penner s blog post and Finch s functional pearl Getting a Quick Fix on Comonads if you are curious about that angle.

Minesweeper setup Let s start with defining a suitable data type for the grid of the minesweeper board. I ll use the Array data type, it s Ix-based indexing is quite useful for grids: The library lacks a function to generate an array from a generating function, but it is easy to add: Let s also fix the size of the board, as a pair of lower and upper bounds (this is the format that the Ix type class needs): Now board is simply a grid of boolean values, with True indicating that a bomb is there: It would be nice to be able to see these board in a nicer way. So let us write A function that prints a grid, including a frame, given a function that prints something for each coordinate. Together with a function that prints a bomb (as *), we can print the board: The expression b ! c looks up a the coordinate in the array, and is True when there is a bomb at that coordinate. So here is our board, with two bombs:
ghci> putStrLn $ pBombs board1
######
#    #
#*   #
#*   #
#    #
######
But that s not what we want to show to the user: Every field should have have a number that indicates the number of bombs in the surrounding fields. To that end, we first define a function that takes a coordinate, and returns all adjacent coordinates. This also takes care of the border, using inRange: With that, we can calculate what to display in each cell a bomb, or a number: With a suitable printing function, we can now see the full board: And here it is:
ghci> putStrLn $ pBoard board1
######
#11  #
#*2  #
#*2  #
#11  #
######
Next we have to add masks: We need to keep track of which fields the user already sees. We again use a grid of booleans, and define a function to print a board with the masked fields hidden behind ?:
So this is what the user would see
ghci> putStrLn $ pMasked board1 mask1
######
#11 ?#
#????#
#????#
#????#
######

Uncovering some fields With that setup in place, we now implement the piece of logic we care about: Uncovering all fields that are next to a neutral field. Here is the first attempt: The idea is that we calculate the new mask m1 from the old one m0 by the following logic: A field is visible if it was visible before (m0 ! c), or if any of its neighboring, neutral fields are visible. This works so far: I uncovered the three fields next to the one neutral visible field:
ghci> putStrLn $ pMasked board1 $ solve0 board1 mask1
######
#11  #
#?2  #
#????#
#????#
######
But that s not quite what we want: We want to keep doing that to uncover all fields.

Uncovering all fields So what happens if we change the logic to: A field is visible if it was visible before (m0 ! c), or if any of its neighboring, neutral fields will be visible. In the code, this is just a single character change: Instead of looking at m0 to see if a neighbor is visible, we look at m1: (This is roughly what happened when Franz started to use the kfix comonadic fixed-point operator in his code, I believe.) Does it work? It seems so:
ghci> putStrLn $ pMasked board1 $ solve1 board1 mask1
######
#11  #
#?2  #
#?2  #
#?1  #
######
Amazing, isn t it! Unfortunately, it seems to work by accident. If I start with a different mask: which looks as follows:
ghci> putStrLn $ pMasked board1 mask2
######
#11??#
#????#
#????#
#??? #
######
Then our solve1 function does not work, and just sits there:
ghci> putStrLn $ pMasked board1 $ solve1 board1 mask2
######
#11^CInterrupted.
Why did it work before, but now now? It fails to work because as the code tries to figure out if a field, it needs to know if the next field will be uncovered. But to figure that out, it needs to know if the present field will be uncovered. With the normal boolean connectives ( and or), this does not make progress. It worked with mask1 more or less by accident: None of the fields on in the first column don t have neutral neighbors, so nothing happens there. And for all the fields in the third and forth column, the code will know for sure that they will be uncovered based on their upper neighbors, which come first in the neighbors list, and due to the short-circuting properties of , it doesn t have to look at the later cells, and the vicious cycle is avoided.

rec-def to the rescue This is where rec-def comes in: By using the RBool type in m1 instead of plain Bool, the recursive self-reference is not a problem, and it simply works: Note that I did not change the algorithm, or the self-reference through m1; I just replaced Bool with RBool, with RB. and or with RB.or. And used RB.get at the end to get a normal boolean out. And , here we go:
ghci> putStrLn $ pMasked board1 $ solve2 board1 mask2
######
#11  #
#?2  #
#?2  #
#?1  #
######
That s the end of this repetition of let s look at a tying-the-knot-problem and see how rec-def helps , which always end up a bit anti-climatic because it just works , at least in these cases. Hope you enjoyed it nevertheless.

15 September 2022

Joachim Breitner: rec-def: Dominators case study

More ICFP-inspired experiments using the rec-def library: In Norman Ramsey s very nice talk about his Functional Pearl Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow , he had the following slide showing the equation for the dominators of a node in a graph:
Norman Ramsey shows a formula Norman Ramsey shows a formula
He said it s ICFP and I wanted to say the dominance relation has a beautiful set of equations you can read all these algorithms how to compute this, but the concept is simple . This made me wonder: If the concept is simple and this formula is beautiful shouldn t this be sufficient for the Haskell programmer to obtain the dominator relation, without reading all those algorithms? Before we start, we have to clarify the formula a bit: If a node is an entry node (no predecessors) then the big intersection is over the empty set, and that is not a well-defined concept. For these nodes, we need that big intersection to return the empty set, as entry nodes are not dominated by any other node. (Let s assume that the entry nodes are exactly those with no predecessors.) Let s try, first using plain Haskell data structures. We begin by implementing this big intersection operator on Data.Set, and also a function to find the predecessors of a node in a graph: Now we can write down the formula that Norman gave, quite elegantly: Does this work? It seems it does: But not surprising if you have read my previous blog posts it falls over once we have recursion: So let us reimplement it with Data.Recursive.Set. The hope is that we can simply replace the operations, and that now it can suddenly handle cyclic graphs as well. Let s see: It does! Well, it does return a result but it looks strange. Clearly node 3 and 4 are also dominated by 1, but the result does not reflect that. But the result is a solution to Norman s equation. Was the equation wrong? No, but we failed to notice that the desired solution is the largest, not the smallest. And Data.Recursive.Set calculates, as documented, the least fixed point. What now? Until the library has code for RDualSet a, we can work around this by using the dual formula to calculate the non-dominators. To do this, we
  • use union instead of intersection
  • delete instead of insert,
  • S.empty, use the set of all nodes (which requires some extra plumbing)
  • subtract the result from the set of all nodes to get the dominators
and thus the code turns into:
And with this, now we do get the correct result:
ghci> domintors3 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,[1,3]),(4,[1,4])]
We worked a little bit on how to express the beautiful formula to Haskell, but at no point did we have to think about how to solve it. To me, this is the essence of declarative programming.

14 September 2022

Joachim Breitner: rec-def: Program analysis case study

At this week s International Conference on Functional Programming I showed my rec-def Haskell library to a few people. As this crowd appreciates writing compilers, and example from the realm of program analysis is quite compelling.

To Throw or not to throw Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions: Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the env which of the variables may throw an exception: The most interesting case is the one for Let, where we extend the environment env with the information about the additional variable env_bind, which is calculated from analyzing the right-hand side e1. So far so good:
ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True

Let it rec To spice things up, let us add a recursive let to the language: How can we support this new constructor in canThrow1? Let use naively follow the pattern used for Let: Calculate the analysis information for the variables in env_bind, extend the environment with that, and pass it down: Note that, crucially, we use env', and not just env, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides. In a strict language, such a mutually recursive definition, where env_bind uses env' which uses env_bind is basically unthinkable. But in a lazy language like Haskell, it might just work. Unfortunately, it works only as long as the recursive bindings are not actually recursive, or if they are recursive, they are not used:
ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True
But with genuine recursion, it does not work, and simply goes into a recursive cycle:
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.
That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?

rec-def to the rescue Not with rec-def! Using RBool from Data.Recursive.Bool instead of Bool, we can write the exact same code, as follows: And it works!
ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True
I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each LetRec independently: Not only is all that extra work hidden from the programmer, but now also a single traversal of the syntax tree creates, thanks to the laziness, a graph of RBool values, which are then solved under the hood .

The issue with x=x There is one downside worth mentioning: canThrow2 fails to produce a result in case we hit x=x:
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.
This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to
ghci> let x = x in RB.get (x :: RBool)
^CInterrupted.
which also does not work. The rec-def machinery can only kick in if at least one of its function is used on any such cycle, even if it is just a form of identity (which I ~ought to add to the library~ since have added to the library):
ghci> idR x = RB.false   x
ghci> let x = idR x in getR (x :: R Bool)
False
And indeed, if I insert a call to idR in the line then our analyzer will no longer stumble over these nasty recursive equations:
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False
It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the def-rec library expects the programmer to have a similar level of sophistication as other tie-the-know tricks with laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).

10 September 2022

Joachim Breitner: rec-def: Behind the scenes

A week ago I wrote about the rec-def Haskell library, which allows you to write more recursive definitions, such as in this small example:
let s1 = RS.insert 23 s2
    s2 = RS.insert 42 s1
in RS.get s1
This will not loop (as it would if you d just used Data.Set), but rather correctly return the set S.fromList [23,42]. See the previous blog post for more examples and discussion of the user-facing side of this. For quick reference, these are the types of the functions involved here: The type of s1 and s2 above is not Set Int, but rather RSet Int, and in this post I ll explain how RSet works internally.

Propagators, in general The conceptual model behind an recursive equation like above is
  • There are a multiple cells that can hold values of an underlying type (here Set)
  • These cells have relations that explain how the values in the cells should relate to each other
  • After registering all the relations, some form of solving happens.
  • If the solving succeeds, we can read off the values from the cells, and they should satisfy the registered relation.
This is sometimes called a propagator network, and is a quite general model that can support different kind of relations (e.g. equalities, inequalities, functions), there can be various solving strategies (iterative fixed-points, algebraic solution, unification, etc.) and information can flow on along the edges (and hyper-edges) possibly in multiple directions. For our purposes, we only care about propagator networks where all relations are functional, so they have a single output cell that is declared to be a function of multiple (possibly zero) input cells, without affecting these input cells. Furthermore, every cell is the output of exactly one such relation.

IO-infested propagator interfaces This suggests that an implementation of such a propagator network could provide an interface with the following three operations:
  • Functions to declare cells
  • Functions to declare relations
  • Functions to read values off cells
This is clearly an imperative interface, so we ll see monads, and we ll simply use IO. So concretely for our small example above, we might expect
There is no need for an explicit solve function: solving can happen when declareInsert or getCell is called as a User I do not care about that. You might be curious about the implementation of newCell, declareInsert and getCell, but I have to disappoint you: This is not the topic of this article. Instead, I want to discuss how to turn this IO-infested interface into the pure interface seen above?

Pure, but too strict Obviously, we have to get rid of the IO somehow, and have to use unsafePerformIO :: IO a -> a somehow. This dangerous function creates a pure-looking value that, when used the first time, will run the IO-action and turn into that action s result. So maybe we can simply write the following: Indeed, the types line up, but if we try to use that code, nothing will happen. Our insert is too strict to be used recursively: It requires the value of c2 (as it is passed to declareInsert, which we assume to be strict in its arguments) before it can return c1, so the recursive example at the top of this post will not make any progress.

Pure, lazy, but forgetful To work around this, maybe it suffices if we do not run declareInsert right away, but just remember that we have to do it eventually? So let s introduce a new data type for RSet a that contains not just the cell (Cell a), but also an action that we still have to run before getting a value: This is better: insert is now lazy in its arguments (for this it is crucial to pattern-match on RSet only inside the todo code, not in the pattern of insert!) This means that our recursive code above does not get stuck right away.

Pure, lazy, but runs in circles But it is still pretty bad: Note that we do not run get s2 in the example above, so that cell s todo, which would declareInsert 42, will never run. This cannot work! We have to (eventually) run the declaration code from all involved cells before we can use getCell! We can try to run the todo action of all the dependencies as part of a cell s todo action: Now we certainly won t forget to run the second cell s todo action, so that is good. But that cell s todo action will run the first cell s todo action, and that again the second cell s, and so on.

Pure, lazy, terminating, but not thread safe This is silly: We only need (and should!) run that code once! So let s keep track of whether we ran it already: Ah, much better: It works! Our call to get c1 will trigger the first cell s todo action, which will mark it as done before calling the second cell s todo action. When that now invokes the first cell s todo action, it is already marked done and we break the cycle, and by the time we reach getCell, all relations have been correctly registered. In a single-threaded world, this would be all good and fine, but we have to worry about multiple threads running get concurrently, on the same or on different cells. In fact, because we use unsafePerformIO, we have to worry about this even when the program is not using threads. And the above code has problems. Imagine a second call to get c1 while the first one has already marked it as done, but has not finished processing all the dependencies yet: It will call getCell before all relations are registered, which is bad.

Recursive do-once IO actions Making this thread-safe seems to be possible, but would clutter both the code and this blog post. So let s hide that problem behind a nice and clean interface. Maybe there will be a separate blog post about its implementation (let me know if you are curious), or you can inspect the code in System.IO.RecThunk module yourself). The interface is simply
data Thunk
thunk :: IO [Thunk] -> IO Thunk
force :: Thunk -> IO ()
and the idea is that thunk act will defer the action act until the thunk is passed to force for the first time, and force will not return until the action has been performed (possibly waiting if another thread is doing that at the moment), and also until the actions of all the thunks returned by act have performed, recursively, without running into cycles. We can use this in our definition of RSet and get to the final, working solution: This snippet captures the essential ideas behind rec-def:
  • Use laziness to allow recursive definition to describe the propagator graph naturally
  • Use a form of explicit thunk to register the propagator graph relations at the right time (not too early/strict, not too late)

And that s all? The actual implementation in rec-def has a few more moving parts. In particular, it tries to support different value types (not just sets), possibly with different implementations, and even mixing them (e.g. in member :: Ord a => a -> RSet a -> RBool), so the generic code is in Data.Propagator.Purify, and supports various propagators underneath. The type RSet is then just a newtype around that, defined in Data.Recursive.Internal to maintain the safety of the abstraction, I went back and forth on a few variants of the design here, including one where there was a generic R type constructor (R (Set a), R Bool etc.), but then monomorphic interface seems simpler.

Does it really work? The big remaining question is certainly: Is this really safe and pure? Does it still behave like Haskell? The answer to these questions certainly depends on the underlying propagator implementation. But it also depends on what we actually mean by safe and pure ? For example, do we expect the Static Argument Transformation be semantics preserving? Or is it allowed to turn undefined values into defined ones (as it does here)? I am unsure myself yet, so I ll defer this discussion to a separate blog post, after I hopefully had good discussions about this here at ICFP 2022 in Ljubljana. If you are around and want to discuss, please hit me up!

3 September 2022

Joachim Breitner: More recursive definitions

Haskell is a pure and lazy programming language, and the laziness allows us to write some algorithms very elegantly, by recursively referring to already calculated values. A typical example is the following definition of the Fibonacci numbers, as an infinite stream:

Elegant graph traversals A maybe more practical example is the following calculation of the transitive closure of a graph: We represent graphs as maps from vertex to their successors vertex, and define the resulting map sets recursively: The set of reachable vertices from a vertex v is v itself, plus those reachable by its successors vs, for which we query sets. And, despite this apparent self-referential recursion, it works!

Cyclic graphs ruin it all These tricks can be very impressive until someone tries to use it on a cyclic graph and the program just hangs until we abort it: At this point we are thrown back to implement a more pedestrian graph traversal, typically keeping explicit track of vertices that we have seen already: I have written that seen/todo recursion idiom so often in the past, I can almost write it blindly And indeed, this code handles cyclic graphs just fine:
ghci> transitive2 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]
But this is a bit anticlimactic Haskell is supposed to be a declarative language, and transitive1 declares my intent just fine!

We can have it all It seems there actually is a way to write essentially the code in transitive1, and still get the right result in all cases, and I have just published a possible implementation as rec-def. In the module Data.Recursive.Set we find an API that resembles that of Set, with a type RSet a, and in addition to conversion functions from and to sets, we find the two operations that we needed in transitive1: Let s try that: And indeed it works! Magic!
ghci> transitive2 $ M.fromList [(1,[3]),(2,[1,3]),(3,[])]
fromList [(1,[1,3]),(2,[1,2,3]),(3,[3])]
ghci> transitive2 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]
To show off some more, here are small examples:
ghci> let s = RS.insert 42 s in RS.get s
fromList [42]
ghci> : 
  let s1 = RS.insert 23 s2
      s2 = RS.insert 42 s1
  in RS.get s1
 : 
fromList [23,42]

How is that possible? Is it still Haskell? The internal workings of the RSet a type will be the topic of a future blog post; let me just briefly mention that it uses unsafe features under the hood, and just keeps applying the equations you gave until a fixed-point is reached. Because it starts with the empty set and all operations provided by Data.Recursive.Set are monotonous (e.g. no difference) it will eventually find the least fixed point. Despite the unsafe machinery under the hood, I claim that Data.Recursive.Set is itself nicely safe, and does not destroy Haskell s nice properties like purity, referential transparency and equational reasoning. If you disagree, I d like to hear about it (here, on Twitter, Reddit or Discourse)! There is a brief discussion at the end of the tutorial in Data.Recursive.Example.

More than sets The library also provides Data.Recursive.Bool for recursive equations with booleans (preferring False) and Data.Recursive.DualBool (preferring True), and some operations like member :: Ord a => a -> RSet a -> RBool can actually connect different types. I plan to add other data types (natural numbers, maps, Maybe, with suitable orders) as demand arises and as I come across nice small example use-cases for the documentation (e.g. finding shortest paths in a graph). I believe this idiom is practically useful in a wide range of applications (which of course all have some underlying graph structure but then almost everything in Computer Science is a graph). My original motivation was a program analysis. Imagine you want to find out from where in your program you can run into a division by zero. As long as your program does not have recursion, you can simply keep track of a boolean flag while you traverse the program, keeping track a mapping from function names to whether they can divide by zero all nice and elegant. But once you allow mutually recursive functions, things become tricky. Unless you use RBool! Simply use laziness, pass the analysis result down when analyzing the function s right-hand sides, and it just works!

31 July 2022

Joachim Breitner: The Via Alpina red trail through Slovenia

This July my girlfriend and I hiked the Slovenian part of the Red Trail of the Via Alpina, from the edge of the Julian Alps to Trieste, and I d like to share some observations and tips that we might have found useful before our trip.
Our most favorite camp spot Our most favorite camp spot

Getting there As we traveled with complete camping gear and wanted to stay in our tent, we avoided the high alpine parts of the trail and started just where the trail came down from the Alps and entered the Karst. A great way to get there is to take the night train from Zurich or Munich towards Ljubljana, get off at Jesenice, have breakfast, take the local train to Podbrdo and you can start your tour at 9:15am. From there you can reach the trail at Pedrovo Brdo within 1 h.

Finding the way We did not use any paper maps, and instead relied on the OpenStreetMap data, which is very good, as well as the official(?) GPX tracks on Komoot, which are linked from the official route descriptions. We used OsmAnd. In general, trails are generally very well marked (red circle with white center, and frequent signs), but the signs rarely tell you which way the Via Alpina goes, so the GPS was needed. Sometimes the OpenStreetMap trail and the Komoot trail disagreed on short segments. We sometimes followed one and other times the other.

Variants We diverged from the trail in a few places:
  • We did not care too much about the horses in Lipica and at least on the map it looked like a longish boringish and sun-exposed detour, so we cut the loop and hiked from Prelo e pri Lokvi up onto the peak of the Veliko Gradi e (which unfortunately is too overgrown to provide a good view).
  • When we finally reached the top of Mali Kras and had a view across the bay of Trieste, it seemed silly to walk to down to Dolina, and instead we followed the ridge through Socerb, essentially the Alpe Adria Trail.
  • Not really a variant, but after arriving in Muggia, if one has to go to Trieste, the ferry is a probably nicer way to finish a trek than the bus.

Pitching a tent We used our tent almost every night, only in Idrija we got a room (and a shower ). It was not trivial to find good camp spots, because most of the trail is on hills with slopes, and the flat spots tend to have housed built on them, but certainly possible. Sometimes we hid in the forest, other times we found nice small and freshly mowed meadows within the forest.

Water Since this is Karst land, there is very little in terms of streams or lakes along the way, which is a pity. The Idrijca river right south of Idrija was very tempting to take a plunge. Unfortunately we passed there early in the day and we wanted to cover some ground first, so we refrained. As for drinking water, we used the taps at the bathrooms of the various touristic sites, a few (but rare) public fountains, and finally resorted to just ringing random doorbells and asking for water, which always worked.

Paths A few stages lead you through very pleasant narrow forest paths with a sight, but not all. On some days you find yourself plodding along wide graveled or even paved forest roads, though.

Landscape and sights The view from Nanos is amazing and, with this high peak jutting out over a wide plain, rather unique. It may seem odd that the trail goes up and down that mountain on the same day when it could go around, but it is certainly worth it. The Karst is mostly a cultivated landscape, with lots of forestry. It is very hilly and green, which is pretty, but some might miss some craggedness. It s not the high alps, after all, but at least they are in sight half the time. But the upside is that there are few sights along the way that are worth visiting, in particular the the Franja Partisan Hospital hidden in a very narrow gorge, the Predjama Castle and the kocjan Caves

3 January 2022

Joachim Breitner: Telegram bots in Python made easy

A while ago I set out to get some teenagers interested in programming, and thought about a good way to achieve that. A way that allows them to get started with very little friction, build something that s relevant in their currently live quickly, and avoids frustration. They were old enough to have their own smartphone, and they were already happily chatting with their friends, using the Telegram messenger. I have already experimented a bit with writing bots for Telegram (e.g. @Umklappbot or @Kaleidogen), and it occurred to me that this might be a good starting point: Chat bot interactions have a very simple data model: message in, response out, all simple text. Much simpler than anything graphical or even web programming. In a way it combines the simplicity of the typical initial programming exercises on the command-line with the impact and relevance of web programming. But of course real bot programming is still too hard installing a programming environment, setting up a server, deploying, dealing with access tokens, understanding the Telegram Bot API and mapping it to your programming language.
The IDEThe IDE
So I built a browser-based Python programming environments for Telegram bots that takes care of all of that. You simply write a single Python function, click the Deploy button, and the bot is live. That s it! This environment provides a much simpler API for the bots: Define a function like the following:
  def private_message(sender, text):
     return "Hello!"
This gets called upon a message, and if it returns a String, that s the response. That s it! Not enough to build any kind of Telegram bot, but sufficient for many fun applications.
A chatbotA chatbot
In fact, my nephew and niece use this to build a simple interactive fiction game, where the player says where they are going ( house , forest , lake ) and thus explore the story, and in the end kill the dragon. And my girlfriend created a shopping list bot that we are using productively . If you are curious, you can follow the instructions to create your own bot. There you can also find the source code and instructions for hosting your own instance (on Amazon Web Services). Help with the project (e.g. improving the sandbox for running untrustworthy python code; making the front-end work better) is of course highly appreciated, too. The frontend is written in PureScript, and the backend in Python, building on Amazon lambda and Amazon DynamoDB.

28 November 2021

Joachim Breitner: Zero-downtime upgrades of Internet Computer canisters

TL;DR: Zero-downtime upgrades are possible if you stick to the basic actor model.

Background DFINITY s Internet Computer provides a kind of serverless compute platform, where the services are WebAssemmbly programs called canisters . These services run without stopping (or at least that s what it feels like from the service s perspective; this is called orthogonal persistence ), and process one message after another. Messages not only come from the outside ( ingress calls), but are also exchanged between canisters. On top of these uni-directional messages, the system provides the concept of inter-canister calls , which associates a respondse message with the outgoing message, and guarantees that a response will come. This RPC-like interface allows canister developers to program in the popular async/await model, where these inter-canister calls look almost like normal function calls, and the subsequent code is suspended until the response comes back.

The problem This is all very well, until you try to upgrade your canister, i.e. install new code to fix a bug or add a feature. Because if you used the await pattern, there may still be suspended computations waiting for the response. If you swap out the program now, the code of that suspended computation will no longer be present, and the response cannot be handled! Worse, because of an infelicity with the current system s API, when the response comes back, it may actually corrupt your service s state. That is why upgrading a canister requires stopping it first, which means waiting for all outstanding calls to come back. During this time, your canister is not available for new calls (so there is downtime), and worse, the length of the downtime is at the whims of the canisters you called they could withhold the response ad infinitum, rendering your canister unupgradeable. Clearly, this is not acceptable for any serious application. In this post, I ll explore some of the ways to mitigate this problem, and how to create canisters that are safely instantanously (no downtime) upgradeable.

It s a spectrum Some canisters are trivially upgradeable, for others all hope is lost; it depends on what the canister does and how. As an overview, here is the spectrum:
  1. A canister that never performs inter-canister calls can always be upgraded without stopping.
  2. A canister that only does one-way calls, and does them in a particular way (see below), can always be upgraded without stopping.
  3. A canister that performs calls, and where it is acceptable to simply drop outstanding repsonses, can always be upgraded without stopping, once the System API has been improved and your Canister Development Kit (CDK; Motoko or Rust) has adapted.
  4. A canister that performs calls, but uses explicit continuations to handle, responses instead of the await-convenience, based on an eventually fixed System API, can be upgradeded without stopping, and will even handle responses afterwards.
  5. A canister that uses await to do inter-canister call cannot be upgraded without stopping.
In this post I will explain 2, which is possible now, in more detail. Variant 3 and 4 only become reality if and when the System API has improved.

One-way calls A one-way call is a call where you don t care about the response; neither the replied data, nor possible failure conditions. Since you don t care about the response, you can pass an invalid continuation to the system (technical detail: a Wasm table index of -1). Because it is invalid for any (realistic) Wasm module, it will stay invalid even after an upgrade, and the problem of silent corruption mentioned above is avoided. And otherwise it s fine for this to be invalid: it means the canister traps once the response comes back, which is harmeless (and possibly even cheaper than a do-nothing computation). This requires your CDK to support this kind of call. Mostly incidential, Motoko (and Candid) actually have the concept of one-way call in their type system, namely shared functions with return type () instead of async ... (Motoko is actually older than the system, and not every prediction about what the system will provide has proven successful). So, pending this PR to be released, Motoko will implement one-way calls in this way. On Rust, you have to use the System API directly or wait for cdk-rs to provide this ability (patches welcome, happy to advise). You might wonder: How are calls useful if I don t get to look at the response? Of course, this is a set-back calls with responses are useful, and await is convenient. And if you have to integrate with an existing service that only provides normal calls, you are out of luck. But if you get to design the canister and all called canisters together, it may be possible to use only one-way messages. You d be programming in the plain actor model now, with all its advantages (simple concurrency, easy to upgrade, general robustness). Consider for example a token ledger canister, not unlike the ICP ledger canister. For the most part, it doesn t have to do any outgoing calls (and thus be trivially upgradeble). But say we need to add notify functionality, where the ledger canister tells other canisters about a transaction. This is a good example for a one-way call: Maybe the ledger canister doesn t care if that notification was received? The ICP leder does care (once it comes back successful, this particular notification cannot be sent again), but maybe your ledger can do it differently: let the other canister confirm the receip via another one-way call, instead of via the reply; or simply charge for each notification and do not worry about repeated notifications. Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly. Yes, it changes the programming model a bit, and all involved parties have to play together, but the gain (zero-downtime upgrades) is quite valuable, and removes a fair number of other sources of issues.

And in the future? The above is possible with today s Internet Computer. If the System API gets improves the way I hope it will be, you have a possible middle ground: You still don t get to use await and instead have to write your response handler as separate functions, but this way you can call any canister again, and you get the system s assistance in mapping responses to calls. With this in place, any canister can be rewritten to a form that supports zero-downtime upgrades, without affecting its interface or what the canister can do.

9 November 2021

Joachim Breitner: How to audit an Internet Computer canister

I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters ( canisters are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me. In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.

Inter-canister calls The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:
  • Reading global state before issuing an inter-canister call, and assuming it to still hold when the call comes back.
  • Changing global state before issuing an inter-canister call, changing it again in the response handler, but assuming nothing else changes the state in between (reentrancy).
  • Changing global state before issuing an inter-canister call, and not handling failures correctly, e.g. when the code handling the callback rolls backs.
If you find such pattern in your code, you should analyze if a malicious party can trigger them, and assess the severity that effect These issues apply to all canisters, and are not Motoko-specific.

Rollbacks Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not rollback state changes done before, while trapping (e.g. Debug.trap, assert , out of cycle conditions) does. Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw. This issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which don t roll back).

Talking to malicious canisters Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:
  • The other canister can withhold a response. Although the bidirectional messaging paradigm of the Internet Computer was designed to guarantee a response eventually, the other party can busy-loop for as long as they are willing to pay for before responding. Worse, there are ways to deadlock a canister.
  • The other canister can respond with invalidly encoded Candid. This will cause a Motoko-implemented canister to trap in the reply handler, with no easy way to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap.
Many canisters do not even do inter-canister calls, or only call other trustwothy canisters. For the others, the impact of this needs to be carefully assessed.

Canister upgrade: overview For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:
  1. Can the canister be upgraded at all?
  2. Will the canister upgrade retain all data?
  3. Can the canister be upgraded promptly?
  4. Is three a recovery plan for when upgrading is not possible?

Canister upgradeability A canister that traps, for whatever reason, in its canister_preupgrade system method is no longer upgradeable. This is a major risk. The canister_preupgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory. Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely the first limit: The system imposes an instruction limit on upgrading a canister (spanning both canister_preupgrade and canister_postupgrade). This limit is a subnet configuration value, and sepearate (and likely higher) than the normal per-message limit, and not easily determined. If the canister s live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable. This risk cannot be eliminated completely, as long as Motoko and Stable Variables are used. It can be mitigated by appropriate load testing: Install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister s live data to increase above a certain size. If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet. An alternative mitigation is to avoid canister_pre_upgrade as much as possible. This means no use of stable var (or restricted to small, fixed-size configuration data). All other data could be
  • mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
  • stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, This requires manual binary encoding of the data, and is marked experimental, so I cannot recommend this at the moment.
  • not put into a Motoko canister until Motoko has a scalable solution for stable variable (for example keeping them in stable memory permanently, with smart caching in main memory, and thus obliterating the need for pre-upgrade code.)

Data retention on upgrades Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow following idiom:
stable var fooStable =  ;
var foo = fooFromStable(fooStable);
system func preupgrade()   fooStable := fooToStable(foo);  )
system func postupgrade()   fooStable := (empty);  )
In this case, it is important to check that
  • All non-stable global vars, or global lets with mutable values, have a stable companion.
  • The assignments to foo and fooStable are not forgotten.
  • The fooToStable and fooFromStable form bijections.
An example would be HashMaps stored as arrays via Iter.toArray( .entries()) and HashMap.fromIter( .vals()). It is worth pointiong out that a code view will only look at a single version of the code, but cannot check whether code changes will preserve data on upgrade. This can easily go wrong if the names and types of stable variables are changed in incompatible way. The upgrade may fail loudly in this cases, but in bad cases, the upgrade may even succeed, losing data along the way. This risk needs to be mitigated by thorough testing, and possibly backups (see below).

Prompt upgrades Motoko and Rust canisters cannot be safely upgraded when they are still waiting for responses to inter-canister calls (the callback would eventually reach the new instance, and because of infelicities of the IC s System API, could possibly call arbitrary internal functions). Therefore, the canister needs to be stopped before upgrading, and started again. If the inter-canister calls take a long time, this mean that upgrading may take a long time, which may be undesirable. Again, this risk is reduced if all calls are made to trustworthy canisters, and elevated when possibly untrustworthy canisters are called, directly or indirectly.

Backup and recovery Because of the above risk around upgrades it is advisable to have a disaster recovery strategy. This could involve off-chain backups of all relevant data, so that it is possible to reinstall (not upgrade) the canister and re-upload all data. Note that reinstall has the same issue as upgrade described above in prompt upgrades : It ought to be stopped first to be safe. Note that the instruction limit for messages, as well as the message size limit, limit the amount of data returned. If the canister needs to hold more data than that, the backup query method might have to return chunks or deltas, with all the extra complexity that entails, e.g. state changes between downloading chunks. If large data load testing is performed (as Irecommend anyways to test upgradeability), one can test whether the backup query method works within the resource limits.

Time is not strictly monotonic The timestamps for current time that the Internet Computer provides to its canisters is guaranteed to be monotonic, but not strictly monotonic. It can return the same values, even in the same messages, as long as they are processed in the same block. It should therefore not be used to detect happens-before relations. Instead of using and comparing time stamps to check whether Y has been performed after X happened last, introduce an explicit var y_done : Bool state, which is set to False by X and then to True by Y. When things become more complex, it will be easier to model that state via an enumeration with speaking tag names, and update this state machine along the way. Another solution to this problem is to introduce a var v : Nat counter that you bump in every update method, and after each await. Now v is your canister s state counter, and can be used like a timestamp in many ways. While we are talking about time: The system time (typically) changes across an await. So if you do let now = Time.now() and then await, the value in now may no longer be what you want.

Wrapping arithmetic The Nat64 data type, and the other fixed-width numeric types provide opt-in wrapping arithmetic (e.g. +%, fromIntWrap). Unless explicitly required by the current application, this should be avoided, as usually a too large or negatie value is a serious, unrecoverable logic error, and trapping is the best one can do.

Cycle balance drain attacks Because of the IC s canister pays model, all canisters are prone to DoS attacks by draining their cycle balance, and this risk needs to be taken into account. The most elementary mitigation strategy is to monitor the cycle balance of canisters and keep it far from the (configurable) freezing threshold. On the raw IC-level, further mitigation strategies are possible:
  • If all update calls are authenticated, perform this authentication as quickly as possible, possibly before decoding the caller s argument. This way, a cycle drain attack by an unauthenticated attacker is less effective (but still possible).
  • Additionally, implementing the canister_inspect_message system method allows the above checks to be performed before the message even is accepted by the Internet Computer. But it does not defend against inter-canister messages and is therefore not a complete solution.
  • If an attack from an authenticated user (e.g. a stakeholder) is to be expected, the above methods are not effective, and an effective defense might require relatively involved additional program logic (e.g. per-caller statistics) to detect such an attack, and react (e.g. rate-limiting).
  • Such defenses are pointless if there is only a single method where they do not apply (e.g. an unauthenticated user registration method). If the application is inherently attackable this way, it is not worth the bother to raise defenses for other methods.
Related: A justification why the Internet Identity does not use canister_inspect_message) A motoko-implemented canister currently cannot perform most of these defenses: Argument decoding happens unconditionally before any user code that may reject a message based on the caller, and canister_inspect_message is not supported. Furthermore, Candid decoding is not very cycle defensive, and one should assume that it is possible to construct Candid messages that require many instructions to decode, even for simple argument type signatures. The conclusion for the audited canisters is to rely on monitoring to keep the cycle blance up, even during an attack, if the expense can be born, and maybe pray for IC-level DoS protections to kick in.

Large data attacks Another DoS attack vector exists if public methods allow untrustworthy users to send data of unlimited size that is persisted in the canister memory. Because of the translation of async-await code into multiple message handlers, this applies not only to data that is obviously stored in global state, but also local data that is live across an await point. The effectiveness of such attacks is limited by the Internet Computer s message size limit, which is in the order of a few megabytes, but many of those also add up. The problem becomes much worse if a method has an argument type that allows a Candid space bomb: It is possible to encode very large vectors with all values null in Candid, so if any method has an argument of type [Null] or [?t], a small message will expand to a large value in the Motoko heap. Other types to watch out:
  • Nat and Int: This is an unbounded natural number, and thus can be arbitrarily large. The Motoko representation will however not be much larger than the Candid encoding (so this does not qualify as a space bomb).
It is still advisable to check if the number is reasonable in size before storing it or doing an await. For example, when it denotes an index in an array, throw early if it exceeds the size of the array; if it denotes a token amount to transfer, check it against the available balance, if it denotes time, check it against reasonable bounds.
  • Principal: A Principal is effectively a Blob. The Interface specification says that principals are at most 29 bytes in length, but the Motoko Candid decoder does not check that currently (fixed in the next version of Motoko). Until then, a Principal passed as an argument can be large (the principal in msg.caller is system-provided and thus safe). If you cannot wait for the fix to reach you, manually check the size of the princial (via Principal.toBlob) before doing the await.

Shadowing of msg or caller Don t use the same name for the message context of the enclosing actor and the methods of the canister: It is dangerous to write shared(msg) actor, because now msg is in scope across all public methods. As long as these also use public shared(msg) func , and thus shadow the outer msg, it is correct, but it if one accidentially omits or mis-types the msg, no compiler error would occur, but suddenly msg.caller would now be the original controller, likely defeating an important authorization step. Instead, write shared(init_msg) actor or shared( caller = controller ) actor to avoid using msg.

Conclusion If you write a serious canister, whether in Motoko or not, it is worth to go through the code and watch out for these patterns. Or better, have someone else review your code, as it may be hard to spot issues in your own code. Unfortunately, such a list is never complete, and there are surely more ways to screw up your code in addition to all the non-IC-specific ways in which code can be wrong. Still, things get done out there, so best of luck!

31 October 2021

Joachim Breitner: A mostly allocation-free optional type

The Motoko programming language has a built-in data type for optional values, named ?t with values null and ?v (for v : t); this is the equivalent of Haskell s Maybe, Ocaml s option or Rust s Option. In this post, I explain how Motoko represents such optional values (almost) without allocation. I neither claim nor expect that any of this is novel; I just hope it s interesting.

Uniform representation The Motoko programming language, designed by Andreas Rossberg and implemented by a pretty cool team at DFINITY is a high-level language with strict semantics and a strong, structural, equi-recursive type system that compiles down to WebAssembly. Because the type system supports polymorphism, it represents all values uniformly. Simplified for the purpose of this blog post, they are always pointers into a heap object where the first word of the heap object, the heap tag, contains information about the value:
 
  tag    
 
The tag is something like array, int64, blob, variant, record, , and it has two purposes:
  • The garbage collector uses it to understand what kind of object it is looking at, so that it can move it around and follow pointers therein. Variable size objects such as arrays include the object size in a subsequent word of the heap object.
  • Some types have values that may have different shapes on the heap. For example, the ropes used in our text representation can either be a plain blob, or a concatenation node of two blobs. For these types, the tag of the heap object is inspected.

The optional type, naively The optional type (?t) is another example of such a type: Its values can either be null, or ?v for some value v of type t, and the primitive operations on this type are the two introduction forms, an analysis function, and a projection for non-null values:
null : () -> ?t
some : t -> ?t
is_some : ?t -> bool
project : ?t -> t     // must only be used if is_some holds
It is natural to use the heap tag to distinguish the two kind of values:
  • The null value is a simple one-word heap object with just a tag that says that this is null:
     
      null  
     
  • The other values are represented by a two-word object: The tag some, indicating that it is a ?v, and then the payload, which is the pointer that represents the value v:
     
      some   payload  
     
With this, the four operations can be implemented as follows:
def null():
  ptr <- alloc(1)
  ptr[0] = NULL_TAG
  return ptr
def some(v):
  ptr <- alloc(2)
  ptr[0] = SOME_TAG
  ptr[1] = v
  return ptr
def is_some(p):
  return p[0] == SOME_TAG
def project(p):
  return p[1]
The problem with this implementation is that null() and some(v) both allocate memory. This is bad if they are used very often, and very liberally, and this is the case in Motoko: For example the iterators used for the for (x in e) construct have type
type Iter<T> =   next : () -> ?T  
and would unavoidably allocate a few words on each iteration. Can we avoid this?

Static values It is quite simple to avoid this for for null: Just statically create a single null value and use it every time:
static_null = [NULL_TAG]
def null():
  return static_null
This way, at least null() doesn t allocate. But we gain more: Now every null value is represented by the same pointer, and since the pointer points to static memory, it does not change even with a moving garbage collector, so we can speed up is_some:
def is_some(p):
  return p != static_null
This is not a very surprising change so far, and most compilers out there can and will do at least the static allocation of such singleton constructors. For example, in Haskell, there is only a single empty list ([]) and a single Nothing value in your program, as you can see in my videos exploring the Haskell heap. But can we get rid of the allocation in some(v) too?

Unwrapped optional values If we don t want to allocate in some(v), what can we do? How about simply
def some(v):
  return v
That does not allocate! But it is also broken. At type ??Int, the values null, ?null and ??null are distinct values, but here this breaks. Or, more formally, the following laws should hold for our four primitive operations:
  1. is_some(null()) = false
  2. v. is_some(some(v)) = true
  3. p. project(some(p)) = p
But with the new definition of some, we d get is_some(some(null())) = false. Not good! But note that we only have a problem if we are wrapping a value that is null or some(v). So maybe take the shortcut only then, and write the following:
def some(v):
  if v == static_null   v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v
The definition of is_some can stay as it is: It is still the case that all null values are represented by static_null. But the some values are now no longer all of the same shape, so we have to change project():
def project(p):
  if p[0] == SOME_TAG:
    return p[1]
  else:
    return p
Now things fall into place: A value ?v can, in many cases, be represented the same way as v, and no allocation is needed. Only when v is null or ?null or ??null or ???null etc. we need to use the some heap object, and thus have to allocate. In fact, it doesn t cost much to avoid allocation even for ?null:
static_some_null = [SOME_TAG, static_null]
def some(v):
  if v == static_null:
    return static_some_null
  else if v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v
So unless one nests the ? type two levels deep, there is no allocation whatsoever, and the only cost is a bit more branching in some and project. That wasn t hard, but quite rewarding, as one can now use idioms like the iterator shown above with greater ease.

Examples The following tables shows the representation of various values before and after. Here [ ] is a pointed-to dynamically allocated heap object, a statically allocated heap object, N = NULL_TAG and S = SOME_TAG.
type value before after
Null null N N
?Int null N N
?Int ?23 [S,23] 23
??Int null N N
??Int ?null [S, N ] S, N
??Int ??23 [S,[S,23]] 23
???Int null N N
???Int ?null [S, N ] S, N
???Int ??null [S,[S, N ]] [S, S, N ]
???Int ???23 [S,[S,[S,23]]] 23

Concluding remarks
  • If you know what parametric polymorphism is, and wonder how this encoding can work in a language that has that, note that this representation is on the low-level of the untyped run-time value representation: We don t need to know the type of v in some(v), merely its heap representation.
  • The uniform representation in Motoko is a bit more involved: The pointers are tagged (by subtracting 1) and some scalar values are represented directly in place (shifted left by 1 bit). But this is luckily orthogonal to what I showed here.
  • Could Haskell do this for its Maybe type? Not so easily:
    • The Maybe type is not built-in, but rather a standard library-defined algebraic data type. But the compiler could feasible detect that this is option-like?
    • Haskell is lazy, so at runtime, the type Maybe could be Nothing, or Just v, or, and this is crucial, a yet to be evaluated expression, also called a thunk. And one definitely needs to distinguish between a thunk t :: Maybe a that may evaluate to Nothing, and a value Just t :: Maybe a that definitely is Just, but contains a value, which may be a thunk.
    Something like this may work for a strict Maybe type or unlifted sums like (# (# #) a #), but may clash with other ticks in GHC, such as pointer tagging.
  • As I said above, I don t expect this to be novel, and I am happy to add references to prior art here.
  • Given that a heap object with tag SOME_TAG now always encodes a tower ? null for n>0, one could try to optimize that even more by just storing the n:
     
      some    n   
     
    But that seems unadvisable: It is only a win if you have deep towers, which is probably rare. Worse, now the project function would have to return such a heap object with n decremented, so now projection might have to allocate, which goes against the cost model expected by the programmer.
  • If you rather want to see code than blog posts, feel free to check out Motoko PR #2115.
  • Does this kind of stuff excite you? Motoko is open source, so your contributions may be welcome!

14 September 2021

Joachim Breitner: A Candid explainer: Quirks

This is the fifth and final post in a series about the interface description language Candid. If you made it this far, you now have a good understanding of what Candid is, what it is for and how it is used. For this final post, I ll put the spotlight on specific aspects of Candid that are maybe surprising, or odd, or quirky. This section will be quite opinionated, and could maybe be called what I d do differently if I d re-do the whole thing . Note that these quirks are not serious problems, and they don t invalidate the overall design. I am writing this up not to discourage the use of Candid, but merely help interested parties to understand it better.

References in the wire format When the work on Candid began at DFINITY, the Internet Computer was still far away from being a thing, and many fundamental aspects about it were still in the air. I particular, there was still talk about embracing capabilities as a core feature of the application model, which would be implemented as opaque references on the system level, likely building on WebAssembly s host reference type proposal (which only landed recently), and could be used to model access permissions, custom tokens and many other things. So Candid is designed with that in mind, and you ll find that its wire format is not just a type table and a value table, but actually
a triple (T, M, R), where T ( type ) and M ( memory ) are sequences of bytes and R ( references ) is a sequence of references.
Also the wire format for values of function service tyeps have an extra byte to distinguish between public references (represented by a principal and possible a method name in the data part), and these opaque references. Alas, references never made it into the Internet Computer, so all Candid implementations simply ignore that part of the specification. But it s still in the spec, and if it confused you before, now you know why.

Hashed field names Candid record and variant types look like they have textual field names:
type T = record   syndactyle : nat; trustbuster: bool  
But that is actually only true superficially. The wire format for Candid only stores hashes of field names. So the above is actually equivalent to
type T = record   4260381820 : nat; 3504418361 : bool  
or, for that matter, to
type T = record   destroys : bool; rectum : nat  
(Yes, I used an english word list to find these hash collisions. There aren t that many actually.) The benefit of such hashing is that the messages are a bit smaller in most (not all) cases, but it is a big annoyance for dynamic uses of Candid. It s the reason why tools like dfx, if they don t know the Candid interface of a service, will print the result with just the numerical hash, letting you guess which field is which. It also complicates host languages that derive Candid types from the host language, like Motoko, as some records (e.g. record trustbuster: bool; destroys : int ) with field name hash collisions can not be represented in Candid, and either the host language s type system needs to be Candid aware now (as is the case of Motoko), or serialization/deserialization will fail at runtime, or odd bugs can happen. (More discussion of this issue).

Tuples Many languages have a built-in notion of a tuple type (e.g. (Int, Bool)), but Candid does not have such a type. The only first class product type is records. This means that tuples have to encoded as records somehow. Conveniently(?) record fields are just numbers after all, so the type (Int, Bool) would be mapped to the type
record   0 : int; 1 : bool  
So tuples can be expressed. But from my experience implementing the type mappings for Motoko and Haskell this is causing headaches. To get a good experience when importing from Candid, the tools have to try to reconstruct which records may have been tuples originally, and turn them into tuples. The main argument for the status quo is that Candid types should be canonical, and there should not be more than one product type, and records are fine, and there needs to be no anonymous product type. But that has never quite resonated with me, given the practical reality of tuple types in many host languages.

Argument sequences Did I say that Candid does not have tuple types? Turns out it does, sort of. There is no first class anonymous product, but since functions take sequences of arguments and results, there is a tuple type right there:
func foo : (bool, int) -> (int, bool)
Again, I found that ergonomic interaction with host languages becomes relatively unwieldy by requiring functions to take and return sequences of values. This is especially true for languages where functions take one argument value or return one result type (the latter being very common). Here, return sequences of length one are turned into that type directly, longer argument sequences turn into the host language s tuple type, and nullary argument sequences turn into the idiomatic unit type. But this means that the types (int, bool) -> () and (record 0: int, 1: bool ) -> () may be mapped to the same host language type, which causes problems when you hope to encode all necessary Candid type information in the host language. Another oddity with argument and result sequences is that you can give names to the entries, e.g. write
func hello : (last_name : text; first_name : text) -> ()
but these names are completely ignored! So while this looks like you can, for example, add new optional arguments in the middle, such as
func hello : (last_name : text; middle_name: opt text, first_name : text) -> ()
without breaking clients, this does not have the effect you think it has and will likely break. My suggestion is to never put names on function arguments and result values in Candid interfaces, and for anything that might be extended with new fields or where you want to name the arguments, use a single record type as the only argument:
func hello : (record   last_name : text; first_name : text ) -> ()
This allows you to add and remove arguments more easily and reliably.

Type shorthands The Candid specification defines a system of types, and then adds a number of syntactic short-hands . For example, if you write blob in a Candid type description, it ought to means the same as vec nat8. My qualm with that is that it doesn t always mean the same. A Candid type description is interpreted by a number of, say, consumers . Two such consumers are part of the Candid specification:
  • The specification that defines the wire format for that type
  • The upgrading (subtyping) rules
But there are more! For every host language, there is some mapping from Candid types to host language types, and also generic tools like Candid UI are consumers of the type algebra. If these were to take the Candid specification as gospel, they would be forced to treat blob and vec nat8 the same, but that would be quite unergonomic and might cause performance regressions (most language try to map blob to some compact binary data type, while vec t tends to turn into some form of array structure). So they need to be pragmatic and treat blob and vec nat8 differently. But then, for all practical purposes, blob is not just a short-hand of vec nat8. They are different types that just happens to have the same wire representations and subtyping relations. This affects not just blob, but also tuples (record blob; int; bool ) and field names , as discussed above.

The value text format For a while after defining Candid, the only implementation was in Motoko, and all the plumbing was automatic, so there was never a need for users to to explicitly handle Candid values, as all values were Motoko values. Still, for debugging and testing and such things, we eventually needed a way to print out Candid values, so the text format was defined ( To enable convenient debugging, the following grammar specifies a text format for values ). But soon the dfx tool learned to talk to canisters, and now users needed to enter Candid value on the command line, possibly even when talking to canisters for which the interface was not known to dfx. And, sure enough, the textual interface defined in the Candid spec was used. Unfortunately, since it was not designed for that use case, it is rather unwieldy:
  • It is quite verbose. You have to write record , not just . Vectors are written vec ; instead of some conventional syntax like [ , ]. Variants are written as variant error = " " with braces that don t any value here, and something like #error " " might have worked as well. With a bit more care, a more concise and ergonomic syntax might have been possible.
  • It wasn t designed to be sufficient to create a Candid value from it. If you write 5 it s unclear whether that s a nat or an int16 or what (and all of these have different wire representations). Type annotations were later added, but are relatively unwieldy, and don t cover all cases (e.g. a service reference with a recursive type cannot be represented in the textual format at the moment).
  • Not really the fault of the textual format, but some useful information about the types is not reflected in the type description that s part of the wire format. In particular not the field names, and whether a value was intended to be binary data (blob) or a list of small numbers (vec nat8), so pretty-printing such values requires guesswork. The Haskell library even tries to brute-force the hash to guess the field name, if it is short or in a english word list!
In hindsight I think it was too optimistic to assume that correct static type information is always available, and instead of actively trying to discourage dynamic use, Candid might be better if we had taken these (unavoidable?) use cases into account.

Custom wire format At the beginning of this post, I have a Candid is list. The list is relatively long, and the actual wire format is just one bullet point. Yes, defining a wire format that works isn t rocket science, and it was easiest to just make one up. But since most of the interesting meat of Candid is in other aspects (subtyping rules, host language integration), I wonder if it would have been better to use an existing, generic wire format, such as CBOR, and build Candid as a layer on top. This would give us plenty of tools and libraries to begin with. And maybe it would have reduced barrier of entry for developers, which now led to the very strange situation that DFINITY advocates for the universal use of Candid on the Internet Computer, so that all services can smoothly interact, but two of the most important services on the Internet Computer (the registry and the ledger) use Protobuf as their primary interface format, with Candid interfaces missing or an afterthought.

Sideways Interface Evolution This is not a quirk of Candid itself, but rather an idiom of how you can use Candid that emerged from our solution for record extensions and upgrades. Consider our example from before, a service with interface
service   add_user : (record   login : text; name : text  ) -> ()  
where you want to add an age field, which should be a number. The official way of doing that is to add that field with an optional type:
service   add_user : (record   login : text; name : text; age : opt nat  ) -> ()  
As explained above, this will not break old clients, as the decoder will treat a missing argument as null. So far so good. But often when adding such a field you don t want to bother new clients with the fact that this age was, at some point in the past, not there yet. And you can do that! The trick is to distinguish between the interface you publish and the interface you implement. You can (for example in your documentation) state that the interface is
service   add_user : (record   login : text; name : text; age : nat  ) -> ()  
which is not a subtype of the old type, but it is the interface you want new clients to work with. And then your implementation uses the type with opt nat. Calls from old clients will come through as null, and calls from new clients will come through as opt 42. We can see this idiom used in the Management Canister of the Internet Computer. The current documented interface only mentions a controllers : vec principal field in the settings, but the implementation still can handle both the old controller : principal and the new controllers field. It s probably advisable to let your CI system check that new versions of your service continue to implement all published interfaces, including past interfaces. But as long as the actual implementation s interface is a subtype of all interfaces ever published, this works fine. This pattern is related to when your service implements, say, http_request (so its implemented interface is a subtype of that common interface), but does not include that method in the published documentation (because clients of your service don t need to call it).

Self-describing Services As you have noticed, Candid was very much designed assuming that all parties always have the service type of services they want to interact with. But the Candid specification does not define how one can obtain the interface of a given service, and there isn t really a an official way to do that on the Internet Computer. That is unfortunate, because many interesting features depend on that: Such as writing import C "ic:7e6iv-biaaa-aaaaf-aaada-cai" in your Motoko program, and having it s type right there. Or tools like ic.rocks, that allow you to interact with any canister right there. One reason why we don t really have that feature yet is because of disagreements about how dynamic that feature should be. Should you be able to just ask the canister for its interface (and allow the canister to vary the response, for example if it can change its functionality over time, even without changing the actual wasm code)? Or is the interface a static property of the code, and one should be able to query the system for that data, without the canister s active involvement. Or, completely different, should interfaces be distributed out of band, maybe together with API documentation, or in some canister registry somewhere else? I always leaned towards the first of these options, but not convincingly enough. The second options requires system assistance, so more components to change, more teams to be involved that maybe intrinsically don t care a lot about this feature. And the third might have emerged as the community matures and builds such infrastructure, but that did not happen yet. In the end I sneaked in an implementation of the first into Motoko, arguing that even if we don t know yet how this feature will be implemented eventually, we all want the feature to exist somehow, and we really really want to unblock all the interesting applications it enables (e.g. Candid UI). That s why every Motoko canister, and some rust canisters too, implements a method
__get_candid_interface_tmp_hack : () -> (text)
that one can use to get the Candid interface file. The name was chosen to signal that this may not be the final interface, but like all good provisional solutions, it may last longer than intended. If that s the case, I m not sorry. This concludes my blog post series about Candid, for now. If you want to know more, feel free to post your question on the DFINTY developer forum, and I ll probably answer.

13 September 2021

Joachim Breitner: A Candid explainer: Language integration

This is the forth post in a series about the interface description language Candid. Now for something completely different: How does Candid interact with the various host languages, i.e. the actual programming languages that you write your services and clients in? There are two facets to that question:
  1. How is the Candid interface represented inside the language? Some languages with rich type systems can express all relevant information about a Candid method or service within its own type system, and then the concrete serialization/deserialization code can be derived from that type (e.g. using type classes in Haskell, Traits in Rust, or built into the compiler in Motoko). Other languages have a less rich type system (e.g. C), no type-driven generic programming or are simply dynamically typed. In these cases, the Candid type has to be either transformed into specific code for that service by some external tool (e.g. JavaScript and TypeScript) or the Candid description has to be parsed and interpreted at runtime. Either approach will give rise to a type mapping between the Candid types and the host language types. Developers will likely have to know which types correspond to which, which is why the Candid manual s section on types explicitly lists that.
  2. How is the Candid interface description produced and consumed? This is maybe the even more important question; what comes first: The code written in the host language, or the Candid description. There are multiple ways to tackle this, all of which have their merits, so I let s look at some typical approaches.

Generating candid from the host language In many case you don t care too much about the interface of your service, and you just want to write the functionality, and get the interface definition for free. This is what you get when you write Motoko services, where the compiler calculates the Candid interface based on the Motoko types of your actor methods, and the build tool (dfx) puts that Candid file where it needs to go. You can thus develop services without ever writing or even looking at Candid type definitions. The Candid library for Rust supports that mode as well, although you have to add some macros to your program to use it. A downside of this model is that you have only indirect control over the generated Candid. Since it is type-driven, whenever there is more than one sensible Candid type for a given host language type, the translation tool has to make a choice, and if that does not suit you, that can be a problem. In the case of Motoko we were able to co-design it with Candid, and their type systems are similar enough that this works well in practice. We have a specification of Candid-Motoko-type-mappings, and the type export from from Motoko to Candid is almost surjective. (Almost, because of Candid s float32 type, which Motoko simply does not have, and because of service types with methods names that are not valid Motoko identifiers.)

Checking host language against Candid The above works well when you (as the service developer) get to define the service s interface as you go. But sometimes you want to develop a service that adheres to a given Candid interface. For example, in order to respond to HTTP requests in an Internet Computer service, you should provide a method http_request that implements this interface (simplified):
type HeaderField = record   text; text;  ;
type HttpRequest = record  
  method: text;
  url: text;
  headers: vec HeaderField;
  body: blob;
 ;
type HttpResponse = record  
  status_code: nat16;
  headers: vec HeaderField;
  body: blob;
 ;
service :  
  http_request: (request: HttpRequest) -> (HttpResponse) query;
 
Here, a suitable mode of operation is to generate the Candid description of the service that you built, and then compare it against this expected interface with a tool that implements the Candid subtyping relation. This would then complain if what you wrote was not compatible with the above interface. The didc check tool that comes with the Rust library can do that. If your service has to implement multiple such pre-defined interfaces, its actual interface will end up being a subtype of each of these interfaces.

Importing Candid If you already have a Candid file, in particular if you are writing a client that wants to talk to an existing service, you can also import that Candid file into your language. This is a very common mode of operation for such interface descriptions languages (e.g. Protobuf). The details depend a lot on the host language, though:
  • In Motoko, you can import typed actor references: Just write import C "canister:foo" where foo is the name of another canister in your project, and your build tool (dfx) will pass the Candid interface of foo to the Motoko compiler, which then translates that Candid service type into a Motoko actor type for you to use. The mapping from Candid types to Motoko types is specified in IDL-Motoko.md as well, via the function i( ). The vision was always that you can import a reference to any canister on the Internet Computer this way (import C "ic:7e6iv-biaaa-aaaaf-aaada-cai"), and the build tool would fetch the interface automatically, but that has not been implemented yet, partly because of disagreement about how canisters expose their interface (see below). The Motoko compiler is ready, though, as described in its build tool interface specification. What is still missing in Motoko is a way to import a Candid type alone (without a concrete canister reference), to use the types somewhere (in function arguments, or to assert the type of the full canister).
  • The Rust library does not support importing Candid, as far as I know. If you write a service client in Rust, you have to know which Rust types map to the right Candid types, and manually get that right.
  • For JavaScript and TypeScript, the didc tool that comes with the Rust library can take a Candid file and produce JS resp. TS code that gives you an object representing the service with properly typed methods that you can easily interact with.
  • With the Haskell Candid library you can import Candid inline or from a file, and it uses metaprogramming (Template Haskell) to generate suitable Haskell types for you, which you can encode/decode at. This is for example used in the test suite of the Internet Identity, which executes that service in a simulated Internet Computer environment.

Generic and dynamic use Finally, it s worth mentioning that Candid can be used generically and dynamically:
  • Since Canisters can indicate their interface, a website like ic.rocks that enumerates Canisters can read that interface, and provide a fully automatically generated UI for them. For example, you can not only see the Candid interface of the Internet Identity canister at https://ic.rocks/principal/rdmx6-jaaaa-aaaaa-aaadq-cai, but actually interact with it!
  • Similarly, during development of a service, it is very useful to be able to interact with it before you have written your actual front-end. The Candid UI tool provides that functionality, and can be used during local development or on-chain. It is also integrated into the Internet Computer playground.
  • Command line tools like dfx allow you to make calls to services, even without having their actual Candid interface description around. However, such dynamic use was never part of the original design of Candid, so it is a bit rough around the edges the textual format is a bit unwieldy, you need to get the types right, and field name in the responses may be missing.
Do you want to know why field names are missing then? Then check out the next and final post of this series, where I discuss a number of Candid s quirks.

7 September 2021

Joachim Breitner: A Candid explainer: Opt is special

This is the third post in a series about the interface description language Candid.

The record extension problem Initially, the upgrade rules of Candid were a straight-forward application of the canonical subtyping rules. This worked and was sound, but it forbid one very commonly requested use case: Extending records in argument position. The subtyping rule for records say that
   record   old_field : nat; new_field : int  
<: record   old_field : nat  
or, in words, that subtypes can have more field than supertypes. This makes intuitive sense: A user of a such a record value that only expects old_field to be there is not bothered if suddenly a new_field appears. But a user who expects new_field to be there is thrown off if it suddenly isn t anymore. Therefore it is ok to extend the records returned by a service with new fields, but not to extend the record in your methods s argument types. But users want to extend their record types over time, also in argument position! In fact, they often want to extend them in both positions. Consider a service with the following interface, where the CUser record appears both in argument and result position:
type CUser = record   login : text; name : text  ;
service C :  
  register_user : (CUser) -> ();
  get_user_data : (text) -> (CUser);
 
It seems quite natural to want to extend the record with a new field (say, the age of the user). But simply changing the definition of CUser to
type CUser = record   login : text; name : text; age : nat  
is not ok, because now register_user requires the age field, but old clients don t provide it. So how can we allow developers to make changes like this (because they really really want that), while keeping the soundness guarantees made by Candid (because we really really want that)? This question has bothered us for over two years, and we even created a meta issue that records the dozen approached we considered, discussed and eventually ditched.

Dynamic subtyping checks in opt I will spare you the history lesson, though, and explain the solution we eventually came up with. In the example above it seems unreasonable to let the developer add a field age of type nat. Since there may be old clients around, the service unavoidably has to deal with records that don t have an age field. If the code expects an age value, what should it do in that case? But one can argue that changing the record type to
type CUser = record   login : text; name : text; age : opt nat  
could work: If the age field is present, use the value. If the field is absent, inject a null value during decoding. In the first-order case, this rather obvious solution would work just fine, and we d be done. But Candid aims to solve the higher order case, and I said things get tricky here, didn t I? Consider another, independent service D with the following interface:
type DUser = record   login : text; name : text  ;
service D :  
  on_user_added : (func (DUser) -> ()) -> ()
 
This service has a method that takes a method reference, presumably with the intention of calling it whenever a new user was added to service D. And because the types line up nicely, we can compose these two services, by once calling D.on_user_added(C.register_user), so that from now on D calls C.register_user( ). These kind of service mesh-ups are central to the vision of the Internet Computer! But what happens if the services now evolve their types in different ways, e.g. towards
type CUser = record   login : text; name : text; age : opt nat  
and
type DUser = record   login : text; name : text; age : opt variant   under_age; adult  
Individually, these are type changes that we want to allow. But now the call from D to C may transmit a value of record ; age = opt variant under_age when C expects a natural number! This is precisely what we want to prevent by introducing types, and it seems we have lost soundness. The best solution we could find is to make the opt type somewhat special, and apply these extra rules when decoding at an expected type of opt t.
  • If this was a record field, and the provided record value doesn t even have a field of that name, don t fail but instead treat this as null. This handles the first-order example above.
  • If there is a value, look at its type (which, in Candid, is part of the message).
    • If it is opt t' and t' <: t, then decode as normal.
    • If it is opt t' but t' <: t does not hold, then treat that as null. This should only happen in these relatively obscure higher-order cases where services evolved differently and incompatibly, and makes sure that the calls that worked before the upgrades will continue to work afterwards. It is not advisable to actually use that rule when changing your service s interface. Tools that assist the developer with an upgrade check should prevent or warn about the use of this rule.
  • Not strictly required here, but since we are making opt special anyways: If its type t' is not of the form opt , pretend it was opt t', and also pretend that the given value was wrapped in opt. This allows services to make record field in arguments that were required in the old version optional in a new version, e.g. as a way to deprecated them. So it is mildly useful, although I can report that it makes the meta-theory and implementation rather complex, in particular together with equirecursion and beasts like type O = opt O. See this discussion for a glimpse of that.
In the above I stress that we look at the type of the provided value, and not just the value. For example, if the sender sends the value opt vec at type opt vec nat, and the recipient expects opt vec bool, then this will decode as null, even though one could argue that the value opt vec could easily be understood at type opt vec bool. We initially had that, but then noticed that this still breaks soundness when there are references inside, and we have to do a full subtyping check in the decoder. This is very unfortunate, because it makes writing a Candid decoder a noticeably harder task that requires complicated graph algorithms with memoization (which I must admit Motoko has not implemented yet), but it s the least bad solution we could find so far.

Isn t that just dynamic typing? You might have noticed that with these rules, t <: opt t' holds for all types t and t'. In other words, every opt is a top type (like reserved), thus all optional types are equivalent. One could argue that we are throwing all of the benefits of static typing over board this way. But it s not quite as bad: It s true that decoding Candid values now involves a dynamic check inserting null values in certain situations, but as long as everyone plays by the rules (i.e. upgrades their services according to the Candid safe upgrade rules, and heeds the warning alluded to above), these certain situations will only happen in the obscurest of cases. It is, by the way, not material to this solution that the special subtyping behavior is applied to the opt type, and a neighboring point in the design space would be a dedicated upgraded t type constructor. That would allow developers to use the canonical opt type with the usual subtyping rules and communicate their intent ( clients should consider this field optional vs. old clients don t know about this field, but new clients should use it ) more cleanly at the expense of having more non-canonical types in the type system. To see why it is convenient if Candid has mostly just the normal types, read the next post, which will describe how Candid can be integrated into a host language.

30 August 2021

Joachim Breitner: A Candid explainer: Safe higher-order upgrades

This is the second post in a series about the interface description language Candid.

Safe upgrades A central idea behind Candid is that services evolve over time, and so also their interfaces evolve. As they do, it is desirable to keep the interface usable by clients who have not been updated. In particular on a blockchainy platform like the Internet Computer, where some programs are immutable and cannot be changed to accommodate changes in the interface of the services they use, this is of importance. Therefore, Candid defines which changes to an interface are guaranteed to be backward compatible. Of course it s compatible to add new methods to a service, but some changes to a method signature can also be ok. For example, changing
service A1 :  
  get_value : (variant   current; previous : nat  )
    -> (record   value : int; last_change : nat  )
 
to
service A2 :  
  get_value : (variant   current; previous : nat; default  )
    -> (record   value : int; last_change : nat; committed : bool  )
 
is fine: It doesn t matter that clients that still use the old interface don t know about the new constructor of the argument variant. And the extra field in the result record will silently be ignored by old clients. In the Candid spec, this relation is written as A2 <: A1 (note the order!), and the formal footing this builds on is subtyping. We thus say that it is safe to upgrade a service to a subtype , and that A2 s interface is a subtype of A1 s. In small examples, I often use nat and int, because every nat is also a int, but some int values are not not nat values, namely the negative ones. We say nat is a subtype of int, nat <: int. So a function that in the first returns a int can in the new version return a nat without breaking old clients. But the other direction doesn t work: If the old version s method had a return type of nat, then changing that to int would be a breaking change, as old clients would not be prepared to handle negative numbers. Note that arguments of function types follow different rules. In fact, the rules are simply turned around, and in argument position (also called negative position ), we can evolve the interface to accept supertypes. Concretely, a function that originally expected an nat can be changed to expect an int, but the other direction doesn t work, because there might still be old clients around that send negative numbers. This reversing-of-the-rules is called contravariance. The vision is that the developer s tooling will warn the developer before they upgrade the code of a running service to a type that is not a subtype of the old interface. Let me stress, though, that all this is about not breaking existing clients that use the old interface. It does not mean that a client developer who fetches the new interface for your service won t have to change their code to make his programs compile again.

Higher order composition So far, we considered the simple case of one service with a bunch of clients, i.e. the first order situation. Things get much more interesting if we have multiple services that are composed, and that pass around references to each other, and we still want everything to be nicely typed and never fail even as we upgrade services. Therefore, Candid s type system can express that a service s method expects or a returns a reference to another service or method, and the type that this service or method should have. For example
service B :   add_listener : (text, func (int) -> ()) -> ()  
says that the service with interface B has a method called add_listener which takes two arguments, a plain value of type text and a reference to a function that itself expects a int-typed argument. The contravariance of the subtyping rules explained above also applies to the types of these function reference. And because the int in the above type is on the left of two function arrows, the subtyping rule direction flips twice, and it is therefore safe to upgrade the service to accept the following interface:
service B :   add_listener : (text, func (nat) -> ()) -> ()  

Soundness theorem and Coq proof The combination of these higher order features and the safe upgrade mechanism is maybe the unique selling point of Candid, but also what made its design quite tricky sometimes. And although the conventional subtyping rules work well, we wanted to do some less conventional things (more on that below), and more than once thought we had a good solution, only to notice that it did not hold water in complicated higher-order cases. But what does it mean to hold water? I felt the need to precisely define a soundness criteria for higher order interface description languages, which you can find in the document An IDL Soundness Proposition. This is a general framework which you can instantiate with your concrete IDL language and rules, and then it tells you what you have to prove to consider your language to be sound. Intuitively, it simulates all possible ways in which services can be defined and upgraded, and in which they can pass around references to each other, and the soundness property is that then that for all messages sent between services, they can always be understood. The document also shows, in general, that any system that builds on canonical subtyping in sound, as expected. That is still a generic theorem that you can instantiate with a concrete system, but now there is less to do. Because these proofs can get tricky in corner cases, it is valuable to mechanize them in an interactive theorem prover and have the computer check the proofs. So I have created a Coq formalization that defines the soundness criteria, including the reduction to canonical subtyping. It also defines MiniCandid, a greatly simplified variant of Candid, proves various properties about it (transitivity etc.) and finally instantiates the soundness theorem. I am particularly fond of the use of coinduction to model the equirecursive types, and the use of named cases, as we know them from Isabelle, to make the proofs a bit more readable and maintainable. I am less fond of how much work it seems to be to extend MiniCandid with more of Candid s types. Already adding vec was more work than it s worth it, and I defensively blame Coq s not-so-great support for nested recursion. The soundness relies on subtyping, and that is all fine and well as long as we stick to canonical rules. Unfortunately, practical considerations force us to deviate from these a bit, as we see in the next post of this series.

29 August 2021

Joachim Breitner: A Candid explainer: The rough idea

One of the technologies that was created by DFINITY as we built the Internet Computer is Candid. Candid is In this in-depth blog post series I want to shed some light onto these aspects of Candid. The target audience is mainly anyone who wants to deeply understand Candid, e.g. researchers, implementors of Candid tools, anyone who wants to builds a better alternative, but no particular prior Candid knowledge is expected. Also, much of what is discussed here is independent of the Internet Computer. Some posts may be more theoretical or technical than others; if you are lost in one I hope you ll rejoin for the subsequent post Much in these posts is not relevant for developers who just want to use Candid in their projects, and also much that they want to know is missing. The Internet Computer dev docs will hopefully cater to that audience.

Blog post series overview
  1. The rough idea (this post) Announcing the blog post series, and very briefly outlining what an Interface Description Language even is.
  2. Safe higher-order upgrades Every can do first-order interface description languages. What makes Candid special is its support for higher-order service composition. This post also contains some general meta-theory and a Coq proof!
  3. Opt is special Extending records in argument position is surprisingly tricky. Lean why, and how we solved it.
  4. Language integration The point of Candid is inter-op, so let s look the various patterns of connecting Candid to your favorite programming language.
  5. Quirks The maybe most interesting post in this series: Candid has a bunch of quirks (hashed field names, no tuples but sequnces, references that nobody used, and more). I ll explain some of them, why they are there, and what else we could have done. Beware: This post will be opinionated.

The rough idea The idea of an interface description language is easy to begin with. Something like
service A :  
  add : (int) -> ();
  get : () -> (int);
 
clearly communicates that a service called A provides two methods add and get, that add takes an integer as an argument, and that get returns such a number. Of course, this only captures the outer shape of the service, but not what it actually does we might assume it implements a counter of sorts, but that is not part of the Candid interface description. In order to now use the service, we also need a (low-level) transport mechanism. Candid itself does not specify that, but merely assumes that it exists, and is able to transport the name of the invoked method and a raw sequence of bytes to the service, and a response (again a raw sequence of bytes) back. Already on the Internet Computer we have two distinct transport mechanisms used are external calls via an HTTP-based RPC interface and on-chain inter-canister calls. Both are handled by the service in the same way. Here we can see that Candid succeeds in abstracting over differences in the low-level transport mechanism. Because of this abstraction, it is possible to use Candid over other transportation mechanisms as well (conventional HTTPS, E-Mail, avian carrier). I think Candid has potential there as well, so even if you are not interested in the Internet Computer, this post may be interesting to you. The translation of argument and result values (e.g. numbers of type int) to the raw sequence of bytes is specified by Candid, defining a wire format. For example, the type int denotes whole numbers of arbitrary size, and they are encoded using the LEB128 scheme. The Candid wire format also contains a magic number (DIDL, which is 0x4449444c in hex) and a type description (more on that later). So when passing the value 2342 to the add, the raw bytes transported will be 0x4449444c00017da612. Of course we want to transfer more data than just integers, and thus Candid supports a fairly complete set of common basic types (nat, int, nat8, nat16, nat32, nat64, int8, int16, int32, int64, float32, float64, bool, text, blob) and composite types (vectors, optional values , records and variants). The design goal is to provide a canonical set of types enough to express most data that you might want to pass, but no more than needed, so that different host languages can support these types easily. The Candid type system is structural. This means that two types are the same when they are defined the same way. Imagine two different services defining the same
type User = record   name : text; user_id : nat  
then although User is defined in two places, it s still the same type. In other words, these name type definitions are always just simple aliases, and what matters is their right-hand side. Because Candid types can be recursive (e.g. type Peano = opt Peano), this means we have an equirecursive type system, which makes some things relatively hard. So far, nothing too special about Candid compared to other interface definition languages (although a structural equirecursive type system is already something). In the next post we look at reference types, and how such higher order features make Candid an interesting technology.

Next.