Search Results: "Keith Packard"

01 July 2017

Keith Packard: DRM-lease-4

DRM leasing part four (vblank)

The last couple of weeks have been consumed by getting frame sequence numbers and events handled within the leasing environment (and Vulkan) correctly.

Vulkan EXT_display_control extension

This little extension provides the bits necessary for applications to track the display of frames to the user.
VkResult
vkGetSwapchainCounterEXT(VkDevice           device,
             VkSwapchainKHR         swapchain,
             VkSurfaceCounterFlagBitsEXT    counter,
             uint64_t           *pCounterValue);
This function just retrieves the current frame count from the display associated with swapchain.
VkResult
vkRegisterDisplayEventEXT(VkDevice          device,
              VkDisplayKHR          display,
              const VkDisplayEventInfoEXT   *pDisplayEventInfo,
              const VkAllocationCallbacks   *pAllocator,
              VkFence           *pFence);
This function creates a fence that will be signaled when the specified event happens. Right now, the only event supported is when the first pixel of the next display refresh cycle leaves the display engine for the display. If you want something fancier (like two frames from now), you get to do that on your own using this basic function.

drmWaitVBlank

drmWaitVBlank is the existing interface for all things sequence related and has three modes (always nice to have one function do three things, I think). It can:
  1. Query the current vblank number
  2. Block until a specified vblank number
  3. Queue an event to be delivered at a specific vblank number
This interface has a few issues: For leases, figuring out the index into the kernel list of crtcs is pretty tricky -- our lease has a subset of those crtcs, so we can't actually compute the global crtc index.

drmCrtcGetSequence
int drmCrtcGetSequence(int fd, uint32_t crtcId,
               uint64_t *sequence, uint64_t *ns);
Here's a simple new function: hand it a CRTC ID and it provides the current frame sequence number and the time when that frame started (in nanoseconds).

drmCrtcQueueSequence
int drmCrtcQueueSequence(int fd, uint32_t crtcId,
                 uint32_t flags, uint64_t sequence,
             uint64_t user_data);
struct drm_event_crtc_sequence {
    struct drm_event    base;
    __u64           user_data;
    __u64           time_ns;
    __u64           sequence;
};
This will cause a CRTC_SEQUENCE event to be delivered at the start of the specified frame sequence. That event will include the frame when the event was actually generated (in case it's late), along with the time (in nanoseconds) when that frame was started. The event also includes a 64-bit user_data value, which can be used to hold a pointer to whatever data the application wants to see in the event handler. The 'flags' argument contains a combination of:
#define DRM_CRTC_SEQUENCE_RELATIVE      0x00000001  /* sequence is relative to current */
#define DRM_CRTC_SEQUENCE_NEXT_ON_MISS      0x00000002  /* Use next sequence if we've missed */
#define DRM_CRTC_SEQUENCE_FIRST_PIXEL_OUT   0x00000004  /* Signal when first pixel is displayed */
These are similar to the values provided for the drmWaitVBlank function, except I've added a selector for when the event should be delivered to align with potential future additions to Vulkan. Right now, the only time you can ask for is first-pixel-out, which says that the event should correspond to the display of the first pixel on the screen.

DRM events → Vulkan fences

With the kernel able to deliver a suitable event at the next frame, all the Vulkan code needed was a way to create a fence and hook it up to such an event. The existing fence code only deals with rendering fences, so I added window system interface (WSI) fencing infrastructure and extended the radv driver to be able to handle both kinds of fences within that code.

Multiple waiting threads

I've now got three places which can be waiting for a DRM event to appear:
  1. Frame sequence fences.
  2. Wait for an idle image. Necessary when you want an image to draw the next frame to.
  3. Wait for the previous flip to complete. The kernel can only queue one flip at a time, so we have to make sure the previous flip is complete before queuing another one.
Vulkan allows these to be run from separate threads, so I needed to deal with multiple threads waiting for a specific DRM event at the same time. XCB has the same problem and goes to great lengths to manage this with a set of locking and signaling primitives so that only one thread is ever doing poll or read from the socket at a time. If another thread wants to read at the same time, it will block on a condition variable which is then signaled by the original reader thread at the appropriate time. It's all very complicated, and it didn't work reliably for a number of years.

I decided to punt and just create a separate thread for processing all DRM events. It blocks using poll(2) until some events are readable, processes those and then broadcasts on a condition variable to notify any waiting threads that 'something' has happened. Each waiting thread simply checks for the desired condition and, if not satisfied, blocks on that condition variable. It's all very simple looking, and seems to work just fine.

Code Complete, Validation Remains

At this point, all of the necessary pieces are in place for the VR application to take advantage of an HMD using only existing Vulkan extensions. Those will be automatically mapped into DRM leases and DRM events as appropriate. The VR compositor application is working pretty well; tests with Dota 2 show occasional jerky behavior in complex scenes, so there's clearly more work to be done somewhere. I need to go write a pile of tests to independently verify that my code is working. I wonder if I'll need to wire up some kind of light sensor so I can actually tell when frames get displayed, as it's pretty easy to get consistent-but-wrong answers in this environment.

Source Code

30 May 2017

Keith Packard: DRM-lease-3

DRM leasing part three (Vulkan)

With the kernel APIs off for review, and the X RandR bits looking like they're in reasonable shape, I finally found some time to sit down and figure out how I wanted to integrate this into Vulkan.

Avoiding two DRM file descriptors

Given that a DRM lease is represented by a DRM master file descriptor, we want to use that for all of the operations in the driver, including rendering and mode setting. Using the Vulkan driver render node and the lease master node together would require passing buffer objects between the kernel contexts using even more file descriptors.

The Mesa Vulkan drivers open the device nodes while enumerating devices, not when they are created. This seems a bit early to me, but it makes sure that the devices being enumerated are actually available for use, and not just present in the system. To replace the render node fd with the lease master fd means hooking into the system early enough that the enumeration code can see the lease fd. And that means creating an instance extension, as the instance gets created before devices are enumerated.

The VK_KEITHP_kms_display instance extension

This simple instance extension provides the necessary hooks to get the lease information from the application down into the driver before the DRM node is opened. In the first implementation, I added a function that could be called before the devices were enumerated to save the information in the Vulkan loader. That worked, but required quite a bit of study of the Vulkan loader and its XML description of the full Vulkan API. Mark Young suggested that a simpler plan would be to chain the information into the VkInstanceCreateInfo pNext field; with no new APIs added to Vulkan, there shouldn't be any need to change the Vulkan loader -- the device driver would advertise the new instance extension and the application could find it.
That would have worked great, except the Vulkan loader 'helpfully' elides all instance extensions it doesn't know about before returning the list to the application. I'd say this was a bug and should be fixed, but for now, I've gone ahead and added the few necessary definitions to the loader to make it work. In the application, it's a simple matter of searching for this extension, constructing the VkKmsDisplayInfoKEITHP structure, chaining that into the VkInstanceCreateInfo pNext list and passing that in to the vkCreateInstance call.
typedef struct VkKmsDisplayInfoKEITHP {
    VkStructureType         sType;  /* VK_STRUCTURE_TYPE_KMS_DISPLAY_INFO_KEITHP */
    const void*             pNext;
    int                     fd;
    uint32_t                crtc_id;
    uint32_t                *connector_ids;
    int                     connector_count;
    drmModeModeInfoPtr      mode;
} VkKmsDisplayInfoKEITHP;
As you can see, this includes the master file descriptor along with all of the information necessary to set the desired video mode using the specified resources. The driver just walks the pNext list from the VkInstanceCreateInfo structure looking for any provided VkKmsDisplayInfoKEITHP structure and pulls the data out. To avoid questions about file descriptor lifetimes, the driver dup's the provided fd. The application is expected to close its copy at a suitable time.

The VK_KHR_display extension

Vulkan already has an API for directly accessing the raw device, including code for exposing video modes and everything. As tempting as it may be to just go do something simpler, there's a lot to be said for using existing APIs. This extension doesn't provide any direct APIs for acquiring display resources, relying on the VK_EXT_acquire_xlib_display extension for that part. And that takes a VkPhysicalDisplay parameter, which is only available after the device is opened, which is why I created the VK_KEITHP_kms_display extension instead of just using the VK_EXT_acquire_xlib_display extension -- we can't increase the capabilities of the render node opened by the driver, and we don't want to keep two file descriptors around. With the information provided by the VK_KEITHP_kms_display extension, we can implement all of the VK_KHR_display extension APIs, including enumerating planes and modes and creating the necessary display surface. Of course, there's only one plane and one mode, so some of the implementation is pretty simplistic. The big piece of work was to create the swap chain structure and associated frame buffers.

A working example

I've taken the 'cube' example from the Vulkan loader and hacked it up to use XCB to construct a DRM lease and the VK_KEITHP_kms_display extension to pass that lease into the Vulkan driver. The existing support for the VK_KHR_display extension "just worked", which was pretty satisfying.
It's a bit of a mess

I'm not satisfied with the mesa code at this point; there's a bunch of code in the radeon driver which should be in the vulkan WSI bits, and the vulkan WSI bits should probably not have the KMS interfaces wired in. I'll ask around and see what other Mesa developers think I should do before restructuring it further; I'll probably have to rewrite it all at least one more time before it's ready to upstream.

Seeing the code

I'll be cleaning the code up a bit before sending it out for review, but it's already visible in my own repositories:

25 April 2017

Keith Packard: TeleMini3

TeleMini V3.0 Dual-deploy altimeter with telemetry now available

TeleMini v3.0 is an update to our original TeleMini v1.0 flight computer. It is a miniature (1/2 inch by 1.7 inch) dual-deploy flight computer with data logging and radio telemetry. Small enough to fit comfortably in an 18mm tube, this powerful package does everything you need on a single board:

I don't have anything in these images to show just how tiny this board is, but the spacing between the screw terminals is 2.54mm (0.1in), and the whole board is only 13mm wide (1/2in).

This was a fun board to design. As you might guess from the version number, we made a couple of prototypes of a version 2 using the same CC1111 SoC/radio part as version 1, but in the EasyMini form factor (0.8 by 1.5 inches). Feedback from existing users indicated that bigger wasn't better in this case, so we shelved that design. With the availability of the STM32F042 ARM Cortex-M0 part in a 4mm square package, I was able to pack that, the higher power CC1200 radio part, a 512kB memory part and a beeper into the same space as the original TeleMini version 1 board.

There is USB on the board, but it's only on some tiny holes, along with the Cortex SWD debugging connection. I may make some kind of jig to gain access to that for configuration, data download and reprogramming. For those interested in an even smaller option, you could remove the screw terminals and battery connector and directly wire to the board, and replace the beeper with a shorter version. You could even cut the rear mounting holes off to make the board shorter; there are no components in that part of the board.

01 April 2017

Keith Packard: DRM-lease-2

DRM leasing part deux (kernel side)

I've stabilized the kernel lease implementation so that leases now work reliably, and don't require a reboot before running the demo app a second time. Here's what's changed:
  1. Reference counting is hard. I'm creating a new drm_master, which means creating a new file. There were a bunch of reference counting errors in my first pass; I've cleaned those up while also reducing the changes needed to the rest of the DRM code.
  2. Added a 'mask_lease' value to the leases -- this controls whether resources in the lease are hidden from the lessor, allowing the lessor to continue to do operations on the leased resources if it likes.
  3. Hacked the mutex locking assertions to not crash in normal circumstances. I'm now doing: BUG_ON(__mutex_owner(&master->dev->mode_config.idr_mutex) != current); to make sure the mutex is held by the current thread, instead of just making sure some thread holds the mutex. I have this memory of a better way to do this, but now I can't dig it up. Suggestions welcome, of course.
I'm reasonably pleased with the current state of the code, although I want to squash the patches together so that only the final state of the design is represented, rather than the series of hacks present now.

Comments on #dri-devel

I spent a bit of time on the #dri-devel IRC channel answering questions about the DRM-lease design, and the changes above reflect that. One concern was about mode setting from two masters at the same time. Mode setting depends on a number of shared 'hidden' resources; things like memory fifos and the like. If either lessee or lessor wants to change the displayed modes, they may fail due to conflicts over these resources and not be able to recover easily. A related concern was that the current TEST/render/commit mechanism used by compositors may no longer be reliable, as another master could change hidden resource usage between the TEST and commit operations. Daniel Vetter suggested allowing the lessor to 'exclude' lessee mode sets during this operation.

One solution would be to have the lessor set the mode on the leased resources before ceding control of the objects; once set, the lessee shouldn't perform additional mode setting operations. This would limit the flexibility of the lessee quite a bit as it wouldn't be able to pop up overlay planes or change resolution.

Questions

Let's review the questions from my last post, DRM-lease:

Remaining Kernel Work

The code is running, and appears stable. However, it's not quite done yet. Here's a list of remaining items that I know about:
  1. Changing leases should update sub-leases. When you reduce the resources in one lease, the kernel should walk any sub-leases and clear out resources which the lessor no longer has access to.
  2. Sending events when leases are created/destroyed. When a lease is created, if the mask_lease value is set, then the lessor should get regular events describing the effective change. Similarly, both lessor and lessee should get events when a lease is changed.
  3. Refactoring the patch series to squash intermediate versions of the new code.
Remaining Other Work

Outside of the kernel, I'll be adding X support for this operation. Here's my current thinking: With that, I think this phase of the project will be wrapped up and it will be time to move on to actually hooking up a real HMD and hacking the VR code to use this new stuff.

Seeing the code

For those interested in seeing the state of the code so far, there are kernel, drm and kmscube repositories here:

28 March 2017

Keith Packard: DRM-lease

DRM display resource leasing (kernel side)

So, you've got a fine head-mounted display and want to explore the delights of virtual reality. Right now, on Linux, that means getting the window system to cooperate because the window system is the DRM master and holds sole access to all display resources. So, you plug in your device, play with RandR to get it displaying bits from the window system and then carefully configure your VR application to use the whole monitor area and hope that the desktop will actually grant you the boon of page flipping so that you will get reasonable performance and maybe not even experience tearing. Results so far have been mixed, and depend on a lot of pieces working in ways that aren't exactly how they were designed to work.

We could just hack up the window system(s) and try to let applications reserve the HMD monitors, somehow removing them from the normal display area so that other applications don't randomly pop up in the middle of the screen. That would probably work, and would take advantage of much of the existing window system infrastructure for setting video modes and performing page flips. However, we've got a pretty spiffy standard API in the kernel for both of those, and getting the window system entirely out of the way seems like something worth trying. I spent a few hours in Hobart chatting with Dave Airlie during LCA and discussed how this might actually work.

Goals
  1. Use KMS interfaces directly from the VR application to drive presentation to the HMD.
  2. Make sure the window system clients never see the HMD as a connected monitor.
  3. Maybe let logind (or other service) manage the KMS resources and hand them out to the window system and VR applications.
Limitations
  1. Don't make KMS resources appear and disappear. It turns out applications get confused when the set of available CRTCs, connectors and encoders changes at runtime.
An Outline for Multiple DRM masters

By the end of our meeting in Hobart, Dave had sketched out a fairly simple set of ideas with me. We'd add support in the kernel to create additional DRM masters. Then, we'd make it possible to 'hide' enough state about the various DRM resources so that each DRM master would automagically use disjoint subsets of resources. In particular, we would:
  1. Pretend that connectors were always disconnected
  2. Mask off crtc and encoder bits so that some of them just didn't seem very useful.
  3. Block access to resources controlled by other DRM masters, just in case someone tried to do the wrong thing.
Refinement with Eric over Swedish Pancakes

A couple of weeks ago, Eric Anholt and I had breakfast at the original pancake house and chatted a bit about this stuff. He suggested that the right interface for controlling these new DRM masters was through the existing DRM master interface, and that we could add new ioctls that the current DRM master could invoke to create and manage them.

Leasing as a Model

I spent some time just thinking about how this might work and came up with a pretty simple metaphor for these new DRM masters. The original DRM master on each VT "owns" the output resources and has final say over their use. However, a DRM master can create another DRM master and "lease" resources it has control over to the new DRM master. Once leased, resources cannot be controlled by the owner unless the owner cancels the lease, or the new DRM master is closed. Here's some terminology:
DRM Master
Any DRM file which can perform mode setting.
Owner
The original DRM Master, created by opening /dev/dri/card*
Lessor
A DRM master which has leased out resources to one or more other DRM masters.
Lessee
A DRM master which controls resources leased from another DRM master. Each Lessee leases resources from a single Lessor.
Lessee ID
An integer which uniquely identifies a lessee within the tree of DRM masters descending from a single Owner.
Lease
The contract between the Lessor and Lessee which identifies the resources that may be controlled by the Lessee. All of the resources must be owned by or leased to the Lessor.
With Eric's input, the interface to create a lease was pretty simple to write down:
int drmModeCreateLease(int fd,
               const uint32_t *objects,
               int num_objects,
               int flags,
               uint32_t *lessee_id);
Given an FD to a DRM master, and a list of objects to lease, a new DRM master FD is returned that holds a lease to those objects. 'flags' can be any combination of O_CLOEXEC and O_NONBLOCK for the newly minted file descriptor. Of course, the owner might want to take some resources back, or even grant new resources to the lessee. So, I added an interface that rewrites the terms of the lease with a new set of objects:
int drmModeChangeLease(int fd,
               uint32_t lessee_id,
               const uint32_t *objects,
               int num_objects);
Note that nothing here makes any promises about the state of the objects across changes in the lease status; the lessor and lessee are expected to perform whatever modesetting is required for the objects to be useful to them.

Window System Integration

There are two ways to integrate DRM leases into the window system environment:
  1. Have logind "lease" most resources to the window system. When a HMD is connected, it would lease out suitable resources to the VR environment.
  2. Have the window system "own" all of the resources and then add window system interfaces to create new DRM masters leased from its DRM master.
I'll probably go ahead and do option 2 in X and see what that looks like.

One trick with any of this will be to hide HMDs from any RandR clients listening in on the window system. You probably don't want the window system to tell the desktop that a new monitor has been connected, have it start reconfiguring things, and then have your VR application create a new DRM master, making the HMD appear to the window system to have disconnected and triggering reconfiguration all over again. I'm not sure how this might work, but perhaps having the VR application register something like a passive grab on hot plug events might make sense? Essentially, you want it to hear about monitor connect events, go look to see if the new monitor is one it wants, and if not, release that to other X clients for their use. This can be done in stages, with the ability to create a new DRM master over X done first, and then cleaning up the hotplug stuff later on.

Current Status

I hacked up the kernel to support the drmModeCreateLease API, and then hacked up kmscube to run two threads with different sets of KMS resources. That ran for nearly a minute before crashing and requiring a reboot. I think there may be some locking issues with page flips from two threads to the same device.

I think I also made the wrong decision about how to handle lessors closing down. I tried to let the lessors get deleted and then 'orphan' the lessees. I've rewritten that so that lessees hold a reference on their lessor, keeping the lessor in place until the lessee shuts down. I've also written the kernel parts of the drmModeChangeLease support.

Questions

14 March 2017

Keith Packard: Valve

Consulting for Valve in my spare time

Valve Software has asked me to help work on a couple of Linux graphics issues, so I'll be doing a bit of consulting for them in my spare time. It should be an interesting diversion from my day job working for Hewlett Packard Enterprise on Memory Driven Computing and other fun things.

First thing on my plate is helping support head-mounted displays better by getting the window system out of the way. I spent some time talking with Dave Airlie and Eric Anholt about how this might work and have started on the kernel side of that. A brief synopsis is that we'll split off some of the output resources from the window system and hand them to the HMD compositor to perform mode setting and page flips.

After that, I'll be working out how to improve frame timing reporting back to games from a composited desktop under X. Right now, a game running on X with a compositing manager can't tell when each frame was shown, nor accurately predict when a new frame will be shown. This makes smooth animation rather difficult.

31 January 2017

Keith Packard: FOSDEM-2017

FOSDEM 2017 -- The Machine and ChaosKeys

Yay! I get to go to FOSDEM this year. I'll be speaking about Free Software for The Machine on Sunday afternoon at 14:00 in K.1.105 (La Fontaine). I'll also be bringing along a number of ChaosKeys and will have them available for either $40 US or €40. You can either bring cash or pre-pay at the Altus Metrum shop. Hope to see you there.

28 January 2017

Bits from Debian: Debian at FOSDEM 2017

On February 4th and 5th, Debian will be attending FOSDEM 2017 in Brussels, Belgium; a yearly gratis event (no registration needed) run by volunteers from the Open Source and Free Software community. It's free, and it's big: more than 600 speakers, over 600 events, in 29 rooms. This year more than 45 current or past Debian contributors will speak at FOSDEM: Alexandre Viau, Bradley M. Kuhn, Daniel Pocock, Guus Sliepen, Johan Van de Wauw, John Sullivan, Josh Triplett, Julien Danjou, Keith Packard, Martin Pitt, Peter Van Eynde, Richard Hartmann, Sebastian Dröge, Stefano Zacchiroli and Wouter Verhelst, among others. Similar to previous years, the event will be hosted at Université libre de Bruxelles. Debian contributors and enthusiasts will be taking shifts at the Debian stand with gadgets, T-Shirts and swag. You can find us at stand number 4 in building K, 1 B; CoreOS Linux and PostgreSQL will be our neighbours. See https://wiki.debian.org/DebianEvents/be/2017/FOSDEM for more details. We are looking forward to meeting you all!

08 January 2017

Keith Packard: embedded-arm-libc

Finding a Libc for tiny embedded ARM systems

You'd think this problem would have been solved a long time ago. All I wanted was a C library to use in small embedded systems -- those with a few kB of flash and even fewer kB of RAM.

Small system requirements

A small embedded system has a different balance of needs:

Available small C libraries

I've looked at:

Current AltOS C library

We've been using pdclib for a couple of years. It was easy to get running, but it really doesn't match what we need. In particular, it uses a lot of stack space in the stdio implementation, as there's an additional layer of abstraction that isn't necessary. In addition, pdclib doesn't include a math library, so I've had to 'borrow' code from other places where necessary. I've wanted to switch for a while, but there didn't seem to be a great alternative.

What's wrong with newlib?

The "obvious" embedded C library is newlib. Designed for embedded systems with a nice way to avoid needing a 'real' kernel underneath, newlib has a lot going for it. Most of the functions have a good balance between speed and size, and many of them even offer two implementations depending on what trade-off you need. Plus, the build system 'just works' on multi-lib targets like the family of Cortex-M parts. The big problem with newlib is the stdio code. It absolutely requires dynamic memory allocation, and the amount of code necessary for 'printf' is larger than the flash space on many of our devices. I was able to get a Cortex-M3 application compiled in 41kB of code, and that used a smattering of string/memory functions and printf.

How about avr libc?

The Atmel world has it pretty good -- avr-libc is small and highly optimized for Atmel's 8-bit AVR processors. I've used this library with success in a number of projects, although nothing we've ever sold through Altus Metrum. In particular, the stdio implementation is quite nice -- a 'FILE' is effectively a struct containing pointers to putc/getc functions. The library does no buffering at all. And it's tiny -- the printf code lacks a lot of the fancy new stuff, which saves a pile of space. However, many of the places where performance is critical are written in assembly language, making it pretty darn hard to port to another processor.

Mixing code together for fun and profit!

Today, I decided to try an experiment to see what would happen if I used the avr-libc stdio bits within the newlib environment. There were only three functions written in assembly language; two of them were just stubs, while the third was a simple ultoa function with a weird interface. With those coded up in C, I managed to get them wedged into newlib. Figuring out the newlib build system was the only real challenge; it's pretty awful having generated files in the repository and a mix of autoconf 2.64 and 2.68 version dependencies. The result is pretty usable though; my STM32L Discovery board demo application is only 14kB of flash while the original newlib stdio bits needed 42kB, and that was still missing all of the 'syscalls', like read, write and sbrk.

Here's gitweb pointing at the top of the tiny-stdio tree: gitweb

And, of course, you can check out the whole thing:
git clone git://keithp.com/git/newlib
'master' remains a plain upstream tree, although I do have a fix on that branch. The new code is all on the tiny-stdio branch. I'll post a note on the newlib mailing list once I've managed to subscribe and see if there is interest in making this option available in the upstream newlib releases. If so, I'll see what might make sense for the Debian libnewlib-arm-none-eabi packages.

19 November 2016

Keith Packard: AltOS-Lisp-2

Updates to AltOS Lisp

I wrote a few days ago about a tiny lisp interpreter I wrote for AltOS. Really, it's almost "done" now; I just wanted to make a few improvements.

Incremental Collection

I was on a walk on Wednesday when I figured out that I didn't need to do a full collection every time; a partial collection that only scanned the upper portion of memory would often find plenty of free space to keep working for a while. To recap, the heap is in two pieces: the ROM piece and the RAM piece. The ROM piece is generated during the build process and never changes afterwards (hence the name), so the only piece which is collected is the RAM piece. Collection works like:
chunk_low = heap base
new_top = heap base
For all of the heap
    Find the first 64 live objects above chunk_low
    Compact them all to new_top
    Rewrite references in the whole heap for them
    Set new_top above the new locations
    Set chunk_low above the old locations
top = new_top
The trick is to realize that there's really no need to start at the bottom of the heap; you can start anywhere you like and compact stuff, possibly leaving holes below that location in the heap. As the heap tends to have long-lived objects slowly sift down to the beginning, it's useful to compact objects higher than that, skipping the compaction process for the more stable area in memory. Each time the whole heap is scanned, the top location is recorded. After that, incremental collects happen starting at that location, and when that doesn't produce enough free space, a full collect is done. The collector now runs a bunch faster on average.

Binary Searches

I stuck some linear searches in a few places in the code; the first was in the collector when looking to see where an object had moved to. As there are 64 entries, the search is reduced from 32 to 6 compares on average. The second place was in the frame objects, which hold the list of atom/value bindings for each lexical scope (including the global scope). These aren't terribly large, but a binary search is still a fine plan. I wanted to write down here the basic pattern I'm using for binary searches these days, which avoids some of the boundary conditions I've managed to generate in the past:
int find (needle) {
    int l = 0;
    int r = count - 1;
    while (l <= r) {
        int m = (l + r) >> 1;
        if (haystack[m] < needle)
            l = m + 1;
        else
            r = m - 1;
    }
    return l;
}
With this version, the caller can then check to see if there's an exact match, and if not, then the returned value is the location in the array where the value should be inserted. If the needle is known to not be in the haystack, and if the haystack is large enough to accept the new value:
void insert(needle) {
    int l = find(needle);
    memmove(&haystack[l+1],
        &haystack[l],
        (num - l) * sizeof (haystack[0]));
    haystack[l] = needle;
}
Similarly, if the caller just wants to know if the value is in the array:
bool exists(needle) {
    int l = find(needle);
    return (l < count && haystack[l] == needle);
}
Call with Current Continuation Because the execution stack is managed on the heap, it's completely trivial to provide the scheme-like call with current continuation, which constructs an object which can be 'called' to transfer control to a saved location:
> (+ "hello " (call/cc (lambda (return) (setq boo return) (return "foo "))) "world")
"hello foo world"
> (boo "bar ")
"hello bar world"
> (boo "yikes ")
"hello yikes world"
One thing I'd done previously is dump the entire state of the interpreter on any error, and that included a full stack trace. I adapted that code for printing of these continuation objects:
boo
[
    expr:   (call/cc (lambda (return) (set (quote boo) return) (return "foo ")))
    state:  val
    values: (call/cc
             [recurse...]
             )
    sexprs: ()
    frame:   
]
[
    expr:   (+ "hello " (call/cc (lambda (return) (set (quote boo) return) (return "foo "))) "world")
    state:  formal
    values: (+
             "hello "
             )
    sexprs: ((call/cc (lambda (return) (set (quote boo) return) (return "foo ")))
             "world"
             )
    frame:   
]
The top stack frame is about to return from the call/cc spot with a value; supply a value to 'boo' and that's where you start. The next frame is in the middle of computing formals for the + s-expression. It's found the + function, and the "hello " string and has yet to get the value from call/cc or the value of the "world" string. Once the call/cc "returns", that value will get moved to the values list and the sexpr list will move forward one spot to compute the "world" value. Implementing this whole mechanism took only a few dozen lines of code as the existing stack contexts were already a continuation in effect. The hardest piece was figuring out that I needed to copy the entire stack each time the continuation was created or executed as it is effectively destroyed in the process of evaluation. I haven't implemented dynamic-wind yet; when I did that for nickle, it was a bit of a pain threading execution through the unwind paths. Re-using Frames I decided to try and re-use frames (those objects which hold atom/value bindings for each lexical scope). It wasn't that hard; the only trick was to mark frames which have been referenced from elsewhere as not-for-reuse and then avoid sticking those in the re-use queue. This reduced allocations even further so that for simple looping or tail-calling code, the allocator may never end up being called. How Big Is It? I've managed to squeeze the interpreter and all of the rest of the AltOS system into 25kB of Cortex-M0 code. That leaves space for the 4kB boot loader and 3kB of flash to save/restore the 3kB heap across resets. Adding builtins to control timers and GPIOs would make this a reasonable software load for an Arduino; offering a rather different programming model for those with a taste for adventure. Modern ARM-based Arduino boards have plenty of flash and ram for this. 
It might be interesting to get this running on the Arduino Zero; there's no real reason to replace the OS either; porting the lisp interpreter into the bare Arduino environment wouldn't take long.

15 November 2016

Keith Packard: AltOS-Lisp

A Tiny Lisp for AltOS I took a bit of a diversion over the last week or so when I wondered how small a lisp interpreter I could write, and whether I could fit that into one of the processors that AltOS runs on. It turns out, you can write a tiny lisp interpreter that fits in about 25kB of code with a 3kB heap for dynamic data. I decided to target our ChaosKey boards; they're tiny, and I've got a lot of them. That processor offers 28kB of usable flash space (after the 4kB boot loader) and 6kB of ram with the processor running at a screaming 48MHz. I'm not at all sure this is useful, but I always enjoy doing language implementations, and this one presented some 'interesting' challenges: Iterative Compacting Allocator I'm betting someone has built one of these before, but I couldn't find one, so I wrote my own. The basic strategy is to walk the heap to find a subset of the active objects which are allocated sequentially in memory with only unused storage between them. These objects are then compacted in-place, and then the heap is walked again to update all references to the moved objects. Then, the process is restarted to find another subset and move them. By looking for these subsets starting at the bottom of the heap, and working upwards towards the top, the whole heap can be compacted into a contiguous chunk at the bottom of memory. Allocation involves moving a pointer along at the top of active memory; when it gets to the top of the heap, collect and see if there's space now. As always, the hardest part was to make sure all active memory was tied down. The second hardest part was to make sure that all active pointers were updated after any allocation, in case a collect moved the underlying object. That was just bookkeeping, but did consume much of the development time.
One additional trick was to terminate the recursion during heap walking by flagging active cons cell locations in a global bitmap and then walking that separately, iterating until that bitmap is empty. Nested lambdas form another recursion which should probably get the same approach, but I haven't done that yet. An unexpected "benefit" of the tiny heap is that the collector gets called a lot, so any referencing bugs will have a good chance of being uncovered in even a short program execution. ROM-able Lisp Instead of implementing all of the language in C, I wanted to be able to implement various pieces in Lisp itself. Because of the complex nature of the evaluation process, adding things like 'let' or even 'defun' turns out to be dramatically simpler in Lisp. However, I didn't want to consume bunches of precious RAM to hold these basic functions. What I did was to create two heaps, one in ROM and the other in RAM. References are tagged as to which heap they're in. 16-bit Values Lisp programs use a pile of references. Using a full 32 bits for each one would mean having a lot less effective storage. So, instead, I use an offset from the base of the heap. The top bit of the offset is used to distinguish between the ROM heap and the RAM heap. I needed a place to store type information, so I settled on using the bottom two bits of the references. This allows for four direct type values. One of these values is used to indicate an indirect type, where the type is stored in the first byte of the object. The direct types are:
Value   Type
0       Cons cell
1       14-bit int
2       String
3       Other
With 2 tag bits, the allocator needs to work in 32-bit units as the references can't point to individual bytes. Finally, I wanted 0 to be nil, so I add four to the offsets within the heaps. The result is that the ROM and RAM heaps can each cover up to 32k - 4 bytes. Note that ints are not stored in the heap; instead they are immediate values stored in 14 bits, providing a range of -8192 to 8191. One can imagine wanting more range in ints at some point. Heap-based Evaluator A simple lisp implementation uses the fact that eval is re-entrant and does the operation on the C stack:
val eval(val exprs) {
    val vals;
    while (exprs) {
        vals = append(vals, eval(car(exprs)));
        exprs = exprs->cdr;
    }
    return execute (car(vals), cdr(vals));
}
This makes things really simple and provides for a clean framework for implementing various bits of lisp, including control flow and macros. However, it rapidly consumes all of the available memory for a stack, while also requiring separate bookkeeping for the in-use memory in each frame. I replaced this design with one which keeps the lisp stack on the heap, and then performs eval with a state machine with all state stored in global variables so that the memory manager can reference them directly. Each eval operation is performed in a separate 'stack' context, which holds the entire eval state except for the current value, which lives in a separate global variable and is used to pass values out of one stack frame and into another. When the last stack context is finished, the evaluation terminates and the value is returned to the caller. There are nine states in the state machine, each of which is implemented in a separate function, making the state machine a simple matter of pulling the current state from the top of the stack and invoking the associated function:
while (ao_lisp_stack) {
    if (!(*evals[ao_lisp_stack->state])() || ao_lisp_exception) {
        ao_lisp_stack_clear();
        return AO_LISP_NIL;
    }
}
return ao_lisp_v;
Because there's no C recursion involved, catching exceptions is a simple matter of one test at this level. Primitives like progn, while, cond and eval all take special magic in the state machine to handle; getting all of that working took several attempts before I found the simple loop shown above. Lexical Scoping The last time I did a lisp interpreter, I implemented dynamic scoping. Atoms were all global and had values associated directly with them. Evaluating a lambda started by saving all of the existing global values for the parameter atoms and then binding the new values. When finished, the previous values would be restored. This is almost correct, but provides surprising results for things like:
> (setq baz 1)
> (def foo (lambda (bar) (+ baz bar)))
> (def bletch (lambda (baz) (foo baz)))
> (bletch 2)
4
The value that foo gets for 'baz' is 2 instead of 1 under dynamic scoping, which most people find surprising. This time, I was determined to use lexical scoping, and it turned out to be surprisingly easy. The first trick was to separate the atoms from their 'value'; each atom can have a different value in different lexical scopes. So, each lexical scope gets a 'frame' object, those contain the value for each atom defined in that scope. There's a global scope which holds all of the globally defined values (like baz, foo and bletch above). Each frame points to its enclosing scope, so you can search upwards to find the right value. The second trick was to realize that the lexical scope of a lambda is the scope in which the lambda itself is evaluated, and that the evaluation of a lambda expression results in a 'function' object, which contains the lambda and its enclosing scope:
> (def foo (lambda (bar bletch)
       ((lambda (baz)
          (+ baz bar))
        bletch)))
> (foo 2 3)
5
In this case, the inner lambda in foo can 'see' the value of bar from the enclosing lambda. More subtly, even if the inner lambda were executed multiple times, it would see the same baz, and could even change it. This can be used to implement all kinds of craziness, including generators:
> (defun make-inc (add)
  ((lambda (base)
     (lambda ()
       (progn
     (setq base (+ base add))
     base)))
   0)
  )
> (setq plus2 (make-inc 2))
> (plus2)
2
> (plus2)
4
The current implementation of each frame is a simple array of atom/value pairs, with a reference to the parent frame to form the full scope. There are dramatically faster implementations of this same concept, but the goal here was small and simple. A Tiny Allocator Optimization With eval consuming heap space for stacks, frames and argument lists, the interpreter was spending a lot of time in the collector. As a simple optimization, I added some free lists for stack frames and cons cells. Stack frames are never referenced when they're finished, so they can always go on the free list. Cons cells used to construct argument lists for functions can usually be freed. Builtin functions have a bit which indicates whether they might hold on to a reference to the argument list. Interpreted lambdas can't get at the list, while nlambdas, lexprs and macros do. Each lambda execution creates a new frame, and while it would be possible to discover if that frame 'escapes' the lambda, I decided to not attempt to cache free ones yet. Save and Restore To make the lisp interpreter more useful in tiny computers, I added the ability to save and restore the entire heap to flash. This requires leaving enough space in the flash to preserve the heap, further constraining the amount of flash available for the application. Code All of this code is in the 'lisp' branch of my AltOS repository: AltOS. The lisp interpreter is independent from the rest of AltOS and could be re-purposed for another embedded operating system. It runs fine on ChaosKey hardware, and also on the STM32F042 Nucleo-32 board. There's also a test framework which runs on Linux, and is how I developed almost all of the code. That's in the src/test directory in the above repository, and is called 'ao_lisp_test'. Towers of Hanoi Here's an implementation of the classic recursive Towers of Hanoi game; it shows most of the current features of the language.
;
; Towers of Hanoi
;
; Copyright   2016 Keith Packard <keithp@keithp.com>
;
; This program is free software; you can redistribute it and/or modify
; it under the terms of the GNU General Public License as published by
; the Free Software Foundation, either version 2 of the License, or
; (at your option) any later version.
;
; This program is distributed in the hope that it will be useful, but
; WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
; General Public License for more details.
;
; ANSI control sequences
(defun move-to (col row)
  (patom "\033[" row ";" col "H" nil)
  )
(defun clear ()
  (patom "\033[2J" nil)
  )
(defun display-string (x y str)
  (move-to x y)
  (patom str)
  )
; Here's the pieces to display
(setq stack '("*" "**" "***" "****" "*****" "******" "*******"))
(setq top (+ (length stack) 3))
;
; Here's all of the stacks of pieces
; This is generated when the program is run
;
(setq stacks nil)
; Display one stack, clearing any
; space above it
(defun display-stack (x y clear stack)
  (cond ((= 0 clear)
     (cond (stack (progn
            (display-string x y (car stack))
            (display-stack x (1+ y) 0 (cdr stack))
            )
              )
           )
     )
    (t (progn
         (display-string x y "          ")
         (display-stack x (1+ y) (1- clear) stack)
         )
       )
    )
  )
; This should probably be included in the rom image...
(defun length (list)
  (cond (list (1+ (length (cdr list))))
    (0)
    )
  )
; Position of the top of the stack on the screen
; Shorter stacks start further down the screen
(defun stack-pos (y stack)
  (- y (length stack))
  )
; Display all of the stacks, spaced 20 columns apart
(defun display-stacks (x y stacks)
  (cond (stacks (progn
          (display-stack x 0 (stack-pos y (car stacks)) (car stacks))
          (display-stacks (+ x 20) y (cdr stacks)))
        )
    )
  )
; Display all of the stacks, then move the cursor
; out of the way and flush the output
(defun display ()
  (display-stacks 0 top stacks)
  (move-to 1 21)
  (flush)
  )
; Reset stacks to the starting state, with
; all of the pieces in the first stack and the
; other two empty
(defun reset-stacks ()
  (setq stacks (list stack nil nil))
  (length stack)
  )
; more functions which could usefully
; be in the rom image
(defun min (a b)
  (cond ((< a b) a)
    (b)
    )
  )
(defun nth (list n)
  (cond ((= n 0) (car list))
    ((nth (cdr list) (1- n)))
    )
  )
; Replace a stack in the list of stacks
; with a new value
(defun replace (list pos member)
  (cond ((= pos 0) (cons member (cdr list)))
    ((cons (car list) (replace (cdr list) (1- pos) member)))
    )
  )
; Move a piece from the top of one stack
; to the top of another
(defun move-piece (from to)
  (let ((from-stack (nth stacks from))
    (to-stack (nth stacks to))
    (piece (car from-stack)))
    (setq from-stack (cdr from-stack))
    (setq to-stack (cons piece to-stack))
    (setq stacks (replace stacks from from-stack))
    (setq stacks (replace stacks to to-stack))
    (display)
    (delay 100)
    )
  )
; The implementation of the game
(defun _hanoi (n from to use)
  (cond ((= 1 n)
     (progn
      (move-piece from to)
      nil)
     )
    (t
     (progn
      (_hanoi (1- n) from use to)
      (_hanoi 1 from to use)
      (_hanoi (1- n) use to from)
      )
     )
    )
  )
; A pretty interface which
; resets the state of the game,
; clears the screen and runs
; the program
(defun hanoi ()
  (setq len (reset-stacks))
  (clear)
  (_hanoi len 0 1 2)
  )

12 September 2016

Keith Packard: hopkins

Hopkins Trailer Brake Controller in Subaru Outback My minivan transmission gave up the ghost last year, so I bought a Subaru Outback to pull my t@b travel trailer. There isn't a huge amount of space under the dash, so I didn't want to mount a trailer brake controller in the 'usual' spot, right above my right knee. Instead, I bought a Hopkins InSIGHT brake controller, 47297. That comes in three separate pieces, which allows for very flexible mounting options. I stuck the 'main' box way up under the dash on the left side of the car. There was a nice flat spot with plenty of space that was facing the right direction: The next trick was to mount the display and control boxes around the storage compartment in the center console: Routing the cables from the controls over to the main unit took a piece of 14ga solid copper wire to use as a fishing line. The display wire was routed above the compartment lid, the control wire was routed below the lid. I'm not entirely happy with the wire routing; I may drill some small holes and then cut the wires to feed them through.

16 August 2016

Keith Packard: udevwrap

Wrapping libudev using LD_PRELOAD Peter Hutterer and I were chasing down an X server bug which was exposed when running the libinput test suite against the X server with a separate thread for input. This was crashing deep inside libudev, which led us to suspect that libudev was getting run from multiple threads at the same time. I figured I'd be able to tell by wrapping all of the libudev calls from the server and checking to make sure we weren't ever calling it from both threads at the same time. My first attempt was a simple set of cpp macros, but that failed when I discovered that libwacom was calling libgudev, which was calling libudev. Instead of recompiling the world with my magic macros, I created a new library which exposes all of the (public) symbols in libudev. Each of these functions does a bit of checking and then simply calls down to the 'real' function. Finding the real symbols Here's the snippet which finds the real symbols:
static void *udev_symbol(const char *symbol)
{
    static void *libudev;
    static pthread_mutex_t  find_lock = PTHREAD_MUTEX_INITIALIZER;
    void *sym;

    pthread_mutex_lock(&find_lock);
    if (!libudev)
        libudev = dlopen("libudev.so.1.6.4", RTLD_LOCAL | RTLD_NOW);
    sym = dlsym(libudev, symbol);
    pthread_mutex_unlock(&find_lock);
    return sym;
}
Yeah, the libudev version is hard-coded into the source; I didn't want to accidentally load the wrong one. This could probably be improved... Checking for re-entrancy As mentioned above, we suspected that the bug was caused when libudev got called from two threads at the same time. So, our checks are pretty simple; we just count the number of calls into any udev function (to handle udev calling itself). If there are other calls in process, we make sure the thread ID for those is the same as the current thread.
static void udev_enter(const char *func)
{
    pthread_mutex_lock(&check_lock);
    assert(udev_running == 0 || udev_thread == pthread_self());
    udev_thread = pthread_self();
    udev_func[udev_running] = func;
    udev_running++;
    pthread_mutex_unlock(&check_lock);
}

static void udev_exit(void)
{
    pthread_mutex_lock(&check_lock);
    udev_running--;
    if (udev_running == 0)
        udev_thread = 0;
    udev_func[udev_running] = 0;
    pthread_mutex_unlock(&check_lock);
}
Wrapping functions Now, the ugly part -- libudev exposes 93 different functions, with a wide variety of parameters and return types. I constructed a hacky macro, calls for which could be constructed pretty easily from the prototypes found in libudev.h, and which would construct our stub function:
#define make_func(type, name, formals, actuals)         \
    type name formals                                   \
    {                                                   \
        type ret;                                       \
        static void *f;                                 \
        if (!f)                                         \
            f = udev_symbol(__func__);                  \
        udev_enter(__func__);                           \
        ret = ((typeof (&name)) f) actuals;             \
        udev_exit();                                    \
        return ret;                                     \
    }
There are 93 invocations of this macro (or a variant for void functions) which look much like:
make_func(struct udev *,
      udev_ref,
      (struct udev *udev),
      (udev))
Using udevwrap To use udevwrap, simply stick the filename of the .so in LD_PRELOAD and run your program normally:
# LD_PRELOAD=/usr/local/lib/libudevwrap.so Xorg 
Source code I stuck udevwrap in my git repository: http://keithp.com/cgi-bin/gitweb.cgi?p=udevwrap;a=summary You can clone it using
$ git clone git://keithp.com/git/udevwrap

03 August 2016

Keith Packard: chaoskey

ChaosKey v1.0 Released USB Attached True Random Number Generator ChaosKey, our random number generator that attaches via USB, is now available for sale from the altusmetrum store. We talked about this device at Debconf 16 last month Support for this device is included in Linux starting with version 4.1. Plug ChaosKey into your system and the driver will automatically add entropy into the kernel pool, providing a constant supply of true random numbers to help keep the system secure. ChaosKey is free hardware running free software, built with free software on a free operating system.

03 June 2016

Gunnar Wolf: Stop it with those short PGP key IDs!

Debian is quite probably the project that most uses an OpenPGP implementation (that is, GnuPG, or gpg) for many of its internal operations, and that places most trust in it. PGP is also very widely used, of course, in many other projects and between individuals. It is regarded as a secure way to do all sorts of crypto (mainly, encrypting/decrypting private stuff, signing public stuff, certifying other people's identities). PGP's lineage traces back to Phil Zimmermann's program, first published in 1991. By far not a newcomer, PGP is as secure as it was 25 years ago. However, some uses of it might not be so. We went through several migrations related to algorithmic weaknesses (i.e. v3 keys using MD5; SHA1 is strongly discouraged, although not yet completely broken, and it should be avoided as well) or to computational complexity (as the migration away from keys smaller than 2048 bits, strongly preferring 4096 bits). But some vulnerabilities are human usage (that is, configuration-) related. Today, Enrico Zini gave us a heads-up in the #debian-keyring IRC channel, and started a thread in the debian-private mailing list; I understand the mail to a private list was partly meant to get our collective attention, and to allow for potentially security-relevant information to be shared. I won't go into details about what is, is not, should be or should not be private, but I'll post here only what's public information already. What are short and long key IDs? I'll start by quoting Enrico's mail:
there are currently at least 3 ways to refer to a gpg key: short key ID (last 8 hex digits of fingerprint), long key ID (last 16 hex digits) and full fingerprint. The short key ID used to be popular, and since 5 years it is known that it is computationally easy to generate a gnupg key with an arbitrary short key id. A mitigation to this is using "keyid-format long" in gpg.conf, and a better thing to do, especially in scripts, is to use the full fingerprint to refer to a key, or just ship the public key for verification and skip the key servers. Note that in case of keyid collision, gpg will download and import all the matching keys, and will use all the matching keys for verifying signatures.
So... What is this about? We humans are quite bad at recognizing and remembering randomly-generated strings with no inherent patterns in them. Every GPG key can be uniquely identified by its fingerprint, a 160-bit string, usually encoded as ten blocks of four hexadecimal characters. That is, my (full) key's fingerprint is:
AB41 C1C6 8AFD 668C A045  EBF8 673A 03E4 C1DB 921F
However, it's quite hard to recognize such a long string, let alone memorize it! So, we often do what humans do: Given that strong cryptography implies a homogeneous probability distribution, people compromised on using just a portion of the fingerprint: the last portion. The short key ID. Mine is then the last two blocks of the fingerprint above: C1DB921F. We can also use what's known as the long key ID, that's twice as long: 64 bits. However, while I can speak my short key ID on a single breath (and maybe even expect you to remember and note it down), try doing so with the long one: 673A03E4C1DB921F. Nah. Too much for our little, analog brains. This short and almost-rememberable number has then 32 bits of entropy: I have less than a one in 4,000,000,000 chance of generating a new key with this same short key ID. Besides, key generation is a CPU-intensive operation, so it's quite unlikely we will have a collision, right? Well, wrong. Previous successful attacks on short key IDs Already five years ago, Asheesh Laroia migrated his 1024D key to a 4096R. And, as he describes in his always-entertaining fashion, he made his computer sweat until he was able to create a new key for which the short key ID collided with the old one. It might not seem like a big deal, as he did this non-maliciously, but this easily should have spelt game over for the usage of short key IDs. After all, being able to generate a collision is usually the end for cryptographic systems. Asheesh specifically mentioned in his posting how this could be abused. But we didn't listen. Short key IDs are just too convenient! Besides, they allow us to have fun, can be a means of expression! I know of at least two keys that would qualify as vanity: Obey Arthur Liu's 0x29C0FFEE (created in 2009) and Keith Packard's 0x00000011 (created in 2012). Then we got the Evil 32 project. They developed Scallion, started (AFAICT) in 2012.
Scallion automates the search for a 32-bit collision using GPUs; they claim that it takes only four seconds to find a collision. So, they went through the strong set of the public PGP Web of Trust, and created a (32-bit-)colliding key for each of the existing keys. And what happened now? What happened today? We still don't really know, but it seems we found a first potentially malicious collision; that is, the first "nonacademic" case. Enrico found two keys sharing the 9F6C6333 short ID, apparently belonging to the same person (as would be the case of Asheesh, mentioned above). After contacting Gustavo, though, he does not know about the second key. That is, it can clearly be regarded as an impersonation attempt. Besides, what gave away this attempt are the signatures it has: Both keys are signed by what appears to be the same three keys: B29B232A, F2C850CA and 789038F2. Those three keys are not (yet?) uploaded to the keyservers, though... But we can expect them to appear at any point in the future. We don't know who is behind this, or what his purpose is. We just know this looks very evil. Now, don't panic: Gustavo's key is safe. Same for his certifiers, Marga, Agustín and Maxy. It's just a 32-bit collision. So, in principle, the only parties that could be cheated into trusting the attacker are humans, right? Nope. Enrico tested on the PGP pathfinder & key statistics service, a keyserver that finds trust paths between any two arbitrary keys in the strong set. Surprise: The pathfinder works on the short key IDs, even when supplied full fingerprints. So, it turns out I have three faked trust paths into our impostor. What next? There are several things this should urge us to do. And there are surely many other important recommendations. But this is a good set of points to start with. [update] I was pointed at Daniel Kahn Gillmor's 2013 blog post, OpenPGP Key IDs are not useful.
Daniel argues, in short, that cutting a fingerprint in order to get a (32- or 64-bit) short key ID is the worst of all worlds, and we should rather target either always showing full fingerprints, or not showing it at all (and leaving all the crypto-checking bits to be done by the software, as comparing 160-bit strings is not natural for us humans). [update] This post was picked up by LWN.net. A very interesting discussion continues in their comments.

07 May 2016

Keith Packard: Altos1.6.3

AltOS 1.6.3 Bdale and I are pleased to announce the release of AltOS version 1.6.3. AltOS is the core of the software for all of the Altus Metrum products. It consists of firmware for our cc1111, STM32L151, STMF042, LPC11U14 and ATtiny85 based electronics and Java-based ground station software. Version 1.6.3 adds idle mode to AltosDroid and has bug fixes for our host software on desktops, laptops and Android devices, along with Bluetooth support for Windows. 1.6.3 is in beta test for Android; if you want to use the beta version, join the AltosDroid beta program. AltOS AltOS fixes: AltosUI and TeleGPS Applications AltosUI and TeleGPS New Features: AltosUI and TeleGPS Fixes: AltosDroid AltosDroid new features: AltosDroid bug fixes: Documentation

13 April 2016

Keith Packard: x.org-election

X.org Election Time Vote Now It's more important than usual to actually get your vote in: we're asking the membership to vote on changes to the X.org bylaws that are necessary for X.org to become an SPI affiliate project, instead of continuing on as a separate organization. While I'm in favor of this transition as I think it will provide much-needed legal and financial help, the real reason we need everyone to vote is that a quorum of the membership must cast ballots for the vote to be valid. Last time, we didn't reach that value, so even though we had a majority voting in favor of the change, it didn't take effect. If you aren't in favor of this change, I'd still encourage you to vote as I'd like to get a valid result, no matter the outcome. Of course, we're also electing four members to the board. I'm happy to note that there are five candidates running for the four available seats, which shows that there are enough people willing to help serve the X.org community in this fashion.

13 January 2016

Keith Packard: auto-calendar

Automatic Calendar Management Notmuch + Calypso One of the big features of outlook/exchange in my world is the automatic merging of incoming calendar updates from email. This makes my calendar actually useful in knowing what meetings people have asked me to attend. As I'm not willing to otherwise tolerate outlook, I decided to try and provide that in my preferred environment: notmuch and calypso. Identifying calendar updates The first trick is how to identify incoming messages with calendar updates. I'd love to be able to search for specific MIME content types, but I haven't been able to construct such a search. Failing that, I'm just looking for messages containing the string 'text/calendar':
notmuch search --output=messages tag:inbox AND text/calendar
Next, I want to skip over previously scanned calendar updates, so I'll plan on tagging messages that have been scanned with the 'calendar' tag and skip those:
notmuch search --output=messages tag:inbox AND text/calendar AND not tag:calendar
jq: sed for JSON With the messages containing likely calendar entries identified, the remaining task is to extract the MIME section containing the actual calendar data. Notmuch can generate JSON for the message, leaving us only needing to parse the JSON and extract the relevant section. I found the 'jq' tool in the archive, which looks like a rather complicated parsing and reformatting tool for JSON data. It took a while to understand, but I managed to generate a command string that takes a notmuch message and pulls out the content for all text/calendar elements:
jq -r '.. | select(."content-type"? == "text/calendar") | .content'
This is a recursive walk over the data structure. It looks for structures with "content-type": "text/calendar" and dumps their "content" elements in raw text form. Putting it all together Here's the complete script:
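As a quick sanity check, here's that filter run against a hypothetical, hand-written JSON snippet shaped roughly like notmuch's output (real notmuch JSON nests message parts more deeply, but the recursive descent operator '..' finds the matching object at any depth):

```shell
# Hand-crafted example resembling a notmuch message with a calendar part;
# the structure and file path are illustrative, not real notmuch output.
cat > /tmp/msg.json <<'EOF'
[{"body": [{"id": 1, "content-type": "multipart/mixed", "content": [
  {"id": 2, "content-type": "text/plain", "content": "See attached invite."},
  {"id": 3, "content-type": "text/calendar", "content": "BEGIN:VCALENDAR\nEND:VCALENDAR\n"}
]}]}]
EOF
# Walk every value, keep objects whose content-type is text/calendar,
# and print their content as raw text.
jq -r '.. | select(."content-type"? == "text/calendar") | .content' /tmp/msg.json
```

The trailing '?' matters: '..' visits strings and arrays too, and the '?' suppresses the type errors that indexing them with ."content-type" would otherwise raise.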
#!/bin/bash
SEARCH="tag:inbox AND not tag:calendar AND text/calendar"
TMP=$(mktemp)
trap "rm -r $TMP" 0 1 15
notmuch search --output=messages $SEARCH | while read message; do
    notmuch show --format=json $message | \
        jq -r '.. | select(."content-type"? == "text/calendar") | .content' > $TMP
    if [ -s $TMP ]; then
        calypso --import private/calendar $TMP && notmuch tag +calendar $message
    else
        notmuch tag +calendar $message
    fi
done
I'd love to fix calypso's --import operation to talk to the local calypso daemon with the database loaded; the current mechanism actually loads the entire database into a new process and then applies the new data to that. With my calendar often containing hundreds of entries, that takes a while.
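One way to run a script like this automatically (not part of the post; the install path and script name here are assumptions) is notmuch's post-new hook, which notmuch invokes after every 'notmuch new' finishes indexing mail:

```shell
#!/bin/sh
# Hypothetical deployment sketch: notmuch runs the executable
# <maildir>/.notmuch/hooks/post-new after each 'notmuch new',
# so installing the calendar-import script there picks up fresh
# invitations as mail arrives. The script path is an assumption.
exec "$HOME/bin/import-calendar"
```

With that in place, calendar merging happens whenever mail is fetched, with no separate cron job needed.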

11 January 2016

Keith Packard: Altos1.6.2

AltOS 1.6.2 TeleMega v2.0 support, bug fixes and documentation updates Bdale and I are pleased to announce the release of AltOS version 1.6.2. AltOS is the core of the software for all of the Altus Metrum products. It consists of firmware for our cc1111, STM32L151, STMF042, LPC11U14 and ATtiny85 based electronics and Java-based ground station software. This is a minor release of AltOS, including support for our new TeleMega v2.0 board, a small selection of bug fixes and a major update of the documentation. AltOS Firmware The updated six-channel flight computer, TeleMega v2.0, has a few changes from the v1.0 design. None of these change the basic functionality of the device, but they do change the firmware a bit, so there's a new package. We also worked around a ground station limitation in the firmware. AltosUI and TeleGPS applications A few minor new features are in this release. Documentation I spent a good number of hours completely reformatting and restructuring the Altus Metrum documentation.

21 December 2015

Keith Packard: TeleLaunchTwo

TeleLaunchTwo A Smaller Wireless Launch Controller I've built a wireless launch control system for NAR and OROC. Those are both complex systems with a single controller capable of running hundreds of pads. And, it's also complicated to build, with each board hand-made by elves in our Portland facility (aka, my office). A bunch of people have asked for something simpler, but using the same AES-secured two-way wireless communications link, so I decided to just build something and see if we couldn't eventually come up with something useful. I think if there's enough interest, I can get some boards built for reasonable money. Here's a picture of the system; you can see the LCO end in a box behind the pad end sitting on the bench. Radio Link Each end has a 35mW 70cm digital transceiver (so, they run in the 440MHz amateur band). These run at 19200 baud with fancy forward error correction and AES security to keep the link from accidentally (or maliciously) firing a rocket at the wrong time. Using a bi-directional link, we also get igniter continuity and remote arming information at the LCO end. The LCO Box In the LCO box, there's a lipo battery to run the device, so it can be completely stand-alone. It has three switches and a button -- an arming switch for each of two channels, a power switch and a firing button. The lipo can be charged by opening up the box and plugging it into a USB port. The Pad Box The pad box will have some cable glands for the battery and each firing circuit. On top, it will have two switches, a power switch and an arming switch. The board has two high-power FETs to drive the igniters. That should be more reliable than using a relay, while also allowing the board to tolerate a wider range of voltages -- the pad box can run on anything from 12V to 24V. The Box Unlike the OROC and NAR systems, these boards are both designed to fit inside a specific box, the Hammond 1554E, and use the mounting standoffs provided. 
This box is rated at NEMA 4X, which means it's fairly weatherproof. Of course, I have to cut holes in the box, but I found some NEMA 4X switches, will use cable glands for the pad box wiring and can use silicone around the BNC connector. The result should be pretty robust. I also found a pretty solid-seeming BNC connector, which hooks around the edge of the board and also clips on to the board. Safety Features There's an arming switch on both ends of the link, and you can't fire a rocket without having both ends armed. That provides an extra measure of safety while working near the pad. The pad switch is a physical interlock between the power supply and the igniters, so even if the software is hacked or broken, disarming the box means the igniters won't fire. The LCO box beeps constantly when either arming switch is selected, giving you feedback that the system is ready to fire. And you can see on an LED whether the pad box is also armed.
