Search Results: "hmc"

This post should have marked the beginning of my yearly roundups of the favourite books and movies I read and watched in 2023. However, due to coming down with a nasty bout of flu recently and other sundry commitments, I wasn't able to undertake writing the necessary four or five blog posts. In lieu of this, however, I will simply present my (unordered and unadorned) highlights for now. Do get in touch if this (or any of my previous posts) has spurred you into picking something up yourself.

Books

Films

Recent releases

The Blue Caftan (Maryam Touzani, 2022)
The Eight Mountains (Felix van Groeningen & Charlotte Vandermeersch, 2022)
Evil Does Not Exist (Ryusuke Hamaguchi, 2023)
Killers of the Flower Moon (Martin Scorsese, 2023)
Monster (Hirokazu Kore-eda, 2023)
Passages (Ira Sachs, 2023)
Poor Things (Yorgos Lanthimos, 2023)
The Tuba Thieves (Alison O'Daniel, 2023)
Theater Camp (Molly Gordon and Nick Lieberman, 2023)
TÁR (Todd Field, 2022)

Unenjoyable experiences included Alejandro Gómez Monteverde's Sound of Freedom (2023), Alex Garland's Men (2022) and Steven Spielberg's The Fabelmans (2022).
Older releases

(Films released before 2022, and not including rewatches from previous years.)

Brief Encounter (David Lean, 1945)
Clouds of Sils Maria (Olivier Assayas, 2014)
Daisy Miller (Peter Bogdanovich, 1974)
First Reformed (Paul Schrader, 2017)
Forbidden Games (René Clément, 1952)
La Noire de... (Ousmane Sembène, 1966)
The Queen of Spades (Thorold Dickinson, 1949)
The River (Jean Renoir, 1951)
Topsy-Turvy (Mike Leigh, 1999)
Le Trou (Jacques Becker, 1960)

Distinctly unenjoyable watches included Ocean's Eleven (1960), El Topo (1970), Léolo (1992), Hotel Mumbai (2018), Bulworth (1998) and The Big Red One (1980).

The following contributors got their Debian Developer accounts in the last two months:

James Lu (jlu)
Hugh McMaster (hmc)
Agathe Porte (gagath)

The following contributors were added as Debian Maintainers in the last two months:

Soren Stoutner
Matthijs Kooijman
Vinay Keshava
Jarrah Gosbell
Carlos Henrique Lima Melara
Cordell Bloor

Congratulations!

Now while I am sure Arthur Miller was referring to writing a play when he said those words they have an oddly appropriate resonance for my topic.

When Lou Montulli applied the idea of magic cookies to HTTP in the early nineties to make the web stateful, I imagine he had no idea of the issues he was going to introduce for the future. Like most web technology, it was a solution to an immediate problem which it has never been possible to improve subsequently.

Chocolate chip cookies are much tastier than HTTP cookies

The HTTP cookie is simply a way for a website to identify a connecting browser session so that state can be kept between retrieving pages. Shortcomings in the design of cookies and implementation details in browsers have led to a selection of unwanted side effects. The specific issue that I am talking about here is the supercookie, where the super prefix has similar connotations to those it has when applied to the word villain.

Whenever the browser requests a resource (web page, image, etc.) the server may return a cookie along with the resource, which your browser remembers. The cookie has a domain name associated with it, and when your browser requests additional resources the cookie is sent along with the request if the cookie domain matches the requested resource's domain name.

As an example, the first time you visit a page on www.example.foo.invalid you might receive a cookie with the domain example.foo.invalid, so the next time you visit a page on www.example.foo.invalid your browser will send the cookie along. Indeed, it will also send it along for any page on another.example.foo.invalid.
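The matching rule described above can be sketched in a few lines of Python. This is a simplified illustration only: the function name `domain_matches` is made up for this example, and real browsers apply further checks (for instance around IP addresses) that are ignored here.

```python
def domain_matches(request_host, cookie_domain):
    """Return True if a cookie set for cookie_domain would be sent to request_host.

    Simplified sketch: exact match, or request_host is a sub-domain of cookie_domain.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


cookie_domain = "example.foo.invalid"
for host in ("www.example.foo.invalid",
             "another.example.foo.invalid",
             "other.foo.invalid"):
    print(host, domain_matches(host, cookie_domain))
```

The first two hosts match and the third does not, which is the behaviour the example in the text describes.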

A supercookie is simply one where, instead of being limited to one sub-domain (example.foo.invalid), the cookie is set for a top level domain (foo.invalid), so when visiting any such domain (I used the invalid name in my examples but one could substitute com or co.uk) your web browser gives out the cookie. Hackers would love to be able to set up such cookies and potentially control and hijack many sites at a time.

This problem was noted early on and browsers were not allowed to set cookie domains with fewer than two parts so example.invalid or example.com were allowed but invalid or com on their own were not. This works fine for top level domains like .com, .org and .mil but not for countries where the domain registrar had rules about second levels like the uk domain (uk domains must have a second level like .co.uk).
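To see why the "at least two parts" rule falls short, here is a small Python sketch of that heuristic. The function name `naive_domain_allowed` is invented for illustration and is not taken from any browser.

```python
def naive_domain_allowed(cookie_domain):
    """Early heuristic: accept a cookie domain only if it has two or more labels."""
    labels = [label for label in cookie_domain.strip(".").split(".") if label]
    return len(labels) >= 2


for domain in ("com", "example.com", "co.uk", "example.co.uk"):
    print(domain, naive_domain_allowed(domain))
```

The heuristic correctly rejects com, but it accepts co.uk just as readily as example.com, so a cookie set for co.uk would be handed to every site registered beneath it.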

NetSurf cookie manager showing a supercookie

There is no way to generate the correct set of top level domains with an algorithm so a database is required and is called the Public Suffix List (PSL). This database is a simple text formatted list with wildcard and inversion syntax and is at time of writing around 180Kb of text including comments which compresses down to 60Kb or so with deflate.
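As a rough illustration of how the wildcard and inversion (exception) syntax is applied, the following Python sketch looks up the public suffix of a domain against a tiny hand-written rule set. The rules below are a sample chosen for this example, not the real list, and the helper names are hypothetical. The general idea is that the longest matching rule wins, an exception rule (prefixed with `!`) overrides everything, and a domain matching no rule falls back to its last label.

```python
# Tiny hand-written sample in PSL syntax; NOT the real Public Suffix List.
RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"]


def public_suffix(domain, rules=RULES):
    """Return the public suffix of domain under the given rules."""
    labels = domain.lower().strip(".").split(".")
    best = labels[-1:]  # implicit default rule "*": the last label alone
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if len(rule_labels) > len(labels):
            continue
        tail = labels[-len(rule_labels):]
        if all(r == "*" or r == l for r, l in zip(rule_labels, tail)):
            if is_exception:
                # Exception rules win outright: drop the leftmost matched label.
                return ".".join(tail[1:])
            if len(tail) > len(best):
                best = tail
    return ".".join(best)


def is_supercookie_domain(cookie_domain):
    """A cookie domain that is itself a public suffix should be rejected."""
    return public_suffix(cookie_domain) == cookie_domain.lower().strip(".")


print(public_suffix("bbc.co.uk"))        # co.uk
print(is_supercookie_domain("co.uk"))    # True
```

A linear scan like this is only meant to show the matching rules; it says nothing about how the data should be stored compactly.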

A few years ago with ICANN allowing the great expansion of top level domains the existing NetSurf supercookie handling was found to be wanting and I decided to implement a solution using the PSL. At this point in time the database was only 100Kb source or 40Kb compressed.

I started by looking at the limited set of existing libraries. In fact only the regdom library was adequate, but it used 150Kb of heap to load the pre-processed list. This would have had the drawback of increasing NetSurf heap usage significantly (we still have users on 8Mb systems). Because of this, and the need to run a PHP script to generate the pre-processed input, it was decided the library was not suitable.

Lacking other choices I came up with my own implementation which used a perl script to construct a tree of domains from the PSL in a static array with the label strings in a separate table. At the time my implementation added 70Kb of read only data which I thought reasonable and allowed for direct lookup of answers from the database.

This solution still required a pre-processing step to generate the C source code but perl is much more readily available, is a language already used by our tooling and we could always simply ship the generated file. As long as the generated file was updated at release time as we already do for our fallback SSL certificate root set this would be acceptable.

Wireshark session showing NetSurf sending a co.uk supercookie to bbc.co.uk

I put the solution into NetSurf, was pleased no-one seemed to notice and moved on to other issues. Recently, while fixing a completely unrelated issue in the display of session cookies in the management interface, I realised I had some test supercookies present in the display. After the initial "that's odd" I realised with horror there might be a deeper issue.

It quickly became evident the PSL generation was broken and had been for a long time. Even worse, somewhere along the line the "redundant" empty generated source file had been removed and the ancient fallback code path was all that had been used.

This issue had escalated somewhat from a trivial display problem. I took a moment to assess the situation a bit more broadly and came to the conclusion there were a number of interconnected causes, centered around the lack of automated testing, which could be solved by extracting the PSL handling into a "support" library.

NetSurf has several of these support libraries which could be used separately from the main browser project but are principally oriented towards it. These libraries are shipped and built in releases alongside the main browser codebase and mainly serve to make the API more obvious and modular. In this case my main aim was to have the functionality segregated into a separate module which could be tested, updated and monitored directly by our CI system, meaning the embarrassing failure I had found can never occur again.

Before creating my own library I did consider a library called libpsl, which had been created since I wrote my original implementation. Initially I was very interested in using this library given it managed a data representation within a mere 32Kb.

Unfortunately the library integrates a great deal of IDN and punycode handling which was not required in this use case. NetSurf already has to handle IDN and punycode translations and uses punycode encoded domain names internally only translating to unicode representations for display so duplicating this functionality using other libraries requires a great deal of resource above the raw data representation.

I put the library together based on the existing code generator Perl program and integrated the test set that comes along with the PSL. I was a little alarmed to discover that the PSL had almost doubled in size since the implementation was originally written and now the trivial test program of the library was weighing in at a hefty 120Kb.

This stemmed from two main causes:

there were now many more domain label strings to be stored
there now being many, many more nodes in the tree.

To address the first cause, the length of the domain label strings was moved into the unused padding space within each tree node, removing a byte from each domain label and saving 6Kb. Next it occurred to me that, while building the domain label string table, if the label to be added already existed as a substring within the table it could be elided.

The domain labels were sorted from longest to shortest and added in order, searching for substring matches as the table was built; this saved another 6Kb. I am sure there are ways to reduce this further that I have missed (if you see them let me know!) but a 25% saving (47Kb to 35Kb) was a good start.
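The substring trick is easy to try out in isolation. The sketch below is in Python rather than the Perl used by the actual generator, and the function name and sample labels are made up for illustration. Because each label's length is stored elsewhere, labels need no terminator and can overlap freely inside the table.

```python
def build_label_table(labels):
    """Pack labels into one string, reusing existing substrings where possible.

    Returns the table and a mapping from each label to its offset in the table.
    """
    table = ""
    offsets = {}
    # Longest first, so that shorter labels can land inside longer ones.
    for label in sorted(set(labels), key=lambda s: (-len(s), s)):
        pos = table.find(label)
        if pos < 0:
            pos = len(table)
            table += label
        offsets[label] = pos
    return table, offsets


labels = ["blogspot", "blog", "co", "com", "uk", "spot"]
table, offsets = build_label_table(labels)
print(table)    # blogspotcomuk
print(offsets)
```

Here 23 characters of labels fit into a 13 character table: "blog" and "spot" are found inside "blogspot", and "co" is found inside "com".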

The second cause was a little harder to address. The structure representing nodes in the tree I started with was at first look reasonable.

struct pnode {
    uint16_t label_index; /* index into string table of label */
    uint16_t label_length; /* length of label */
    uint16_t child_node_index; /* index of first child node */
    uint16_t child_node_count; /* number of child nodes */
};

I examined the generated table and observed that the majority of nodes were leaf nodes (had no children), which makes sense given the type of data being represented. Allowing two types of node, one for labels and a second for the child node information, would halve the node size in most cases and require only a modest change to the tree traversal code.

The only issue with this would be finding a way to indicate that a node has child information. I realised that domain labels can have a maximum length of 63 characters, meaning their length can be represented in six bits, so a uint16_t was excessive. The space was split into two uint8_t parts, one for the length and one for a flag to indicate that a child data node follows.

union pnode {
    struct {
        uint16_t index; /* index into string table of label */
        uint8_t length; /* length of label */
        uint8_t has_children; /* the next table entry is a child node */
    } label;
    struct {
        uint16_t node_index; /* index of first child node */
        uint16_t node_count; /* number of child nodes */
    } child;
};

static const union pnode pnodes[8580] = {
    /* root entry */
    { .label = { 0, 0, 1 } }, { .child = { 2, 1553 } },
    /* entries 2 to 1794 */
    { .label = { 37, 2, 1 } }, { .child = { 1795, 6 } },

...

    /* entries 8577 to 8578 */
    { .label = { 31820, 6, 1 } }, { .child = { 8579, 1 } },
    /* entry 8579 */
    { .label = { 0, 1, 0 } },

};

This change reduced the node array size from 63Kb to 33Kb, almost a 50% saving. I considered using bitfields to try and reduce the label length and has_children flag into a single byte, but such packing will not reduce the length of a node below 32 bits because it is unioned with the child structure.
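The size figures can be sanity-checked with Python's `struct` module. This is only a back-of-the-envelope sketch: the `<` format prefix packs fields without padding, and it is an assumption that the C compiler lays the structures out the same way (which is typical for fields of these sizes).

```python
import struct

# Original node: label_index, label_length, child_node_index, child_node_count
OLD_NODE = struct.calcsize("<HHHH")
# Split representation: label entry (index, length, has_children)
LABEL_ENTRY = struct.calcsize("<HBB")
# ...and child entry (node_index, node_count)
CHILD_ENTRY = struct.calcsize("<HH")

ENTRIES = 8580  # size of the pnodes array in the generated table

new_table_bytes = ENTRIES * max(LABEL_ENTRY, CHILD_ENTRY)
print(OLD_NODE, LABEL_ENTRY, CHILD_ENTRY)    # 8 4 4
print(round(new_table_bytes / 1024.0, 1))    # 33.5
```

A leaf now costs one 4 byte entry and a node with children costs two, so the table is 8580 entries of 4 bytes each, which agrees with the 33Kb figure quoted above.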

A possibility of using the spare uint8_t derived by bitfield packing to store an additional label node in three other nodes was considered but added a great deal of complexity to node lookup and table construction for saving around 4Kb so was not incorporated.

With the changes incorporated the test program was a much more acceptable 75Kb, reasonably close to the size of the compressed source but with the benefits of direct lookup. Integrating the library's single API call into NetSurf was straightforward and resulted in correct operation when tested.

This episode just reminded me of the dangers of code that can fail silently. It exposed our users to a security problem that we thought had been addressed almost six years ago and squandered the limited resources of the project. Hopefully it is a lesson we will not have to learn again any time soon. If there is a positive to take away it is that the new implementation is more space efficient, automatically built and, importantly, tested.

My recent rack design turned out to simply not be practical. It did not hold all the SBC I needed it to and, most troublingly, accessing connectors was impractical. I was forced to remove the enclosure from the rack and go back to piles of SBC on a shelf.

View of the acrylic being laser cut through the heavily tinted window

This sent me back to the beginning of the design process. The requirement for easy access to connectors had been compromised on in my first solution because I wanted a compact 1U size. This time I returned to my initial toast rack layout but retained the SBC inside their clip cases.

By facing the connectors downwards and providing basic cable management the design should be much more practical.

My design process is to use the QCAD package to create layered 2D outlines which are then converted from DXF into toolpaths with Lasercut CAM software. The toolpaths are then uploaded to the laser cutter directly from the PC running Lasercut.

Despite the laser cutters being professional grade systems, the Lasercut software is a continuous cause of issues for many users. It is the only closed source piece of software in the production process and it has a pretty poor user interface. On this occasion my main issue with it was that my design was quite large at 700mm by 400mm, which caused the software to crash repeatedly. I broke the design down into two halves and this allowed me to continue.

Once I defeated the software the design was laser cut from 3mm clear extruded acrylic. The assembly is secured with 72 off M3 nuts and bolts. The resulting construction is very strong and probably contains much more material than necessary.

One interesting thing I discovered is that in going from a 1U enclosure holding 5 units to a 2U design holding 11 units I had increased the final weight from 320g to 980g and when all 11 SBC are installed that goes up to a whopping 2300g. Fortunately this is within the mechanical capabilities of the material but it is the heaviest thing I have ever constructed from 3mm acrylic.

Once installed in the rack with all SBC inserted and connected this finally actually works and provides a practical solution. The shelf is finally clear of SBC and has enough space for all the other systems I need to accommodate for various projects.

As usual the design files are all freely available though I really cannot see anyone else needing to replicate this.

Instant classic

Trusted:
NO, there were errors:
The certificate does not apply to the given host
The certificate authority's certificate is invalid
The root certificate authority's certificate is not trusted for this purpose
The certificate cannot be verified for internal reasons
Signature Algorithm: md5WithRSAEncryption
    Issuer: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Certificate Authority, CN=Snake Oil CA/emailAddress=ca@snakeoil.dom
    Validity
        Not Before: Oct 21 18:21:51 1999 GMT
        Not After : Oct 20 18:21:51 2001 GMT
    Subject: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Webserver Team, CN=www.snakeoil.dom/emailAddress=www@snakeoil.dom
...
            X509v3 Subject Alternative Name: 
            email:www@snakeoil.dom
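For the curious, the validity window in the dump above can be checked with a couple of lines of Python. This is just a sketch: the two date strings are copied from the output, and `%b` parsing assumes an English or C locale.

```python
from datetime import datetime

FMT = "%b %d %H:%M:%S %Y GMT"  # format used by openssl's text output
not_before = datetime.strptime("Oct 21 18:21:51 1999 GMT", FMT)
not_after = datetime.strptime("Oct 20 18:21:51 2001 GMT", FMT)

lifetime = not_after - not_before
print(lifetime.days)     # 730
print(not_after.year)    # 2001
```

So the certificate was issued for two years and its validity ended in 2001.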

For your own pleasure:

openssl s_client -connect www.walton.com.tw:443 -showcerts

or just run

echo '
-----BEGIN CERTIFICATE-----
MIIDNjCCAp+gAwIBAgIBATANBgkqhkiG9w0BAQQFADCBqTELMAkGA1UEBhMCWFkx
FTATBgNVBAgTDFNuYWtlIERlc2VydDETMBEGA1UEBxMKU25ha2UgVG93bjEXMBUG
A1UEChMOU25ha2UgT2lsLCBMdGQxHjAcBgNVBAsTFUNlcnRpZmljYXRlIEF1dGhv
cml0eTEVMBMGA1UEAxMMU25ha2UgT2lsIENBMR4wHAYJKoZIhvcNAQkBFg9jYUBz
bmFrZW9pbC5kb20wHhcNOTkxMDIxMTgyMTUxWhcNMDExMDIwMTgyMTUxWjCBpzEL
MAkGA1UEBhMCWFkxFTATBgNVBAgTDFNuYWtlIERlc2VydDETMBEGA1UEBxMKU25h
a2UgVG93bjEXMBUGA1UEChMOU25ha2UgT2lsLCBMdGQxFzAVBgNVBAsTDldlYnNl
cnZlciBUZWFtMRkwFwYDVQQDExB3d3cuc25ha2VvaWwuZG9tMR8wHQYJKoZIhvcN
AQkBFhB3d3dAc25ha2VvaWwuZG9tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
gQC554Ro+VH0dJONqljPBW+C72MDNGNy9eXnzejXrczsHs3Pc92Vaat6CpIEEGue
yG29xagb1o7Gj2KRgpVYcmdx6tHd2JkFW5BcFVfWXL42PV4rf9ziYon8jWsbK2aE
+L6hCtcbxdbHOGZdSIWZJwc/1Vs70S/7ImW+Zds8YEFiAwIDAQABo24wbDAbBgNV
HREEFDASgRB3d3dAc25ha2VvaWwuZG9tMDoGCWCGSAGG+EIBDQQtFittb2Rfc3Ns
IGdlbmVyYXRlZCBjdXN0b20gc2VydmVyIGNlcnRpZmljYXRlMBEGCWCGSAGG+EIB
AQQEAwIGQDANBgkqhkiG9w0BAQQFAAOBgQB6MRsYGTXUR53/nTkRDQlBdgCcnhy3
hErfmPNl/Or5jWOmuufeIXqCvM6dK7kW/KBboui4pffIKUVafLUMdARVV6BpIGMI
5LmVFK3sgwuJ01v/90hCt4kTWoT8YHbBLtQh7PzWgJoBAY7MJmjSguYCRt91sU4K
s0dfWsdItkw4uQ==
-----END CERTIFICATE-----
' | openssl x509 -noout -text

At least they're secure against heartbleed.

We have this PostgreSQL server with plenty of RAM that is still using some of its swap over the day (up to 600MB). Then suddenly everything is swapped in again.

It turned out the reason is there are two clusters running, and the second one isn't used as heavily as the first one. Disk I/O activity of the first cluster slowly evicts pages from the second shared buffers cache to swap, and then the daily pg_dump run reads them back every evening. I was not aware of an easy way to get numbers for "amount of SysV shared memory swapped to disk", but some googling led to shmctl(2):

#define _GNU_SOURCE 1
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <unistd.h>
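int main ()
{
        struct shm_info info;
        int max;
        long page_size = sysconf(_SC_PAGESIZE);
        /* SHM_INFO is Linux-specific; shmctl returns -1 on error */
        max = shmctl(0, SHM_INFO, (struct shmid_ds *) &info);
        if (max == -1) {
                perror("shmctl");
                return 1;
        }
        /* the shm_info counters are in pages, so convert them to bytes */
        printf ("max: %d\nshm_tot: %lu\nshm_rss: %lu\nshm_swp: %lu\n",
                        max,
                        info.shm_tot * page_size,
                        info.shm_rss * page_size,
                        info.shm_swp * page_size);
        return 0;
}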

The output looks like this:

max: 13
shm_tot: 13232308224
shm_rss: 12626661376
shm_swp: 601616384

Update: Mark points out that ipcs -mu shows the same information. Thanks for the hint!

# ipcs -mu
------ Shared Memory Status --------
segments allocated 2
pages allocated 3230544
pages resident  3177975
pages swapped   51585
Swap performance: 0 attempts     0 successes
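Note that ipcs reports pages while the C program above prints bytes. A small Python sketch of the conversion follows; it assumes 4096-byte pages (check `getconf PAGESIZE` on the actual host), and the helper name is made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; verify with `getconf PAGESIZE`

IPCS_OUTPUT = """\
segments allocated 2
pages allocated 3230544
pages resident  3177975
pages swapped   51585
"""


def pages_to_bytes(text, page_size=PAGE_SIZE):
    """Convert the 'pages ...' lines of `ipcs -mu` output into byte counts."""
    result = {}
    for line in text.splitlines():
        name, _, value = line.rpartition(" ")
        name = " ".join(name.split())  # collapse runs of spaces
        if name.startswith("pages"):
            result[name] = int(value) * page_size
    return result


sizes = pages_to_bytes(IPCS_OUTPUT)
for name, size in sorted(sizes.items()):
    print(name, size)
```

With that page size, "pages allocated" comes out as 13232308224 bytes, the same as shm_tot above. The resident and swapped figures differ from the earlier output, presumably because the two snapshots were taken at different times.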

With a current daily build of debian-installer you should now be able to install a VM with Debian s390x. I'm told that even installation within hercules works again. d-i's beta 1 for wheezy has a broken busybox and is hence unusable. But there were more issues:

zipl-installer was blacklisted from building on s390x by Packages-arch-specific which caused nobootloader to be selected during installation.
base-installer was unable to pick the correct kernel image due to it not having any rules for this architecture.
A netcfg fix was needed that removes a 50s timeout while d-i tries to arping the configured gateway. This check seems to be pretty new and if it fails the link is still considered up and the installation continues. The network driver commonly used on s390(x) (qeth) uses layer 3 configuration by default, which means that ARP is meaningless. The IPv4 and IPv6 addresses are configured in the network adapter which then does all the ARP if necessary (i.e. not within the same machine).

The resulting installation even boots. But it shares some problems with squeeze's s390 port:

udev's persistent network device naming is broken. It includes a volatile device ID that is incremented with each reboot. This is now tracked in Debian bug #684766.
The TERM variable is set to linux during boot-up instead of dumb (the s390(x) terminals are either line- or form-based, except for one HMC ASCII terminal). Together with the new fancy coloured LSB init output in wheezy the screen gets even more noisy with all those escape sequences. It's yet unclear where this should be fixed. The kernel provides the init system with a TERM=linux default and it's not yet fixed up afterwards (either in initramfs-tools, or busybox, or sysvinit).

Tip of the day: If you see funny characters in x3270, you may have selected the wrong EBCDIC code page. If you use Linux you want to select CP1047 instead of the default CP037 in Options → Character Set (→ Euro). That's the target of the in-kernel ASCII to EBCDIC converter.

Three things I learned about Debian s390 today:

The kernel expects hexadecimal channel numbers in lower case. Trying uppercase digits is futile.
To get a getty in the HMC's Operating System Messages window of an LPAR, just uncomment the dumb console entry in /etc/inittab.
For the integrated ASCII console to work on LPAR, you need to put up a getty onto ttysclp0. You can reuse the same parameters as for ttyS0 (the device that just works with z/VM). Unlike the 3270 interface of z/VM the integrated ASCII console is actually pretty nice and usable. You can even run vim in it without getting completely crazy.

Award winning code

Me and Yuwei had a fun day at hhhmcr (#hhhmcr) and even managed to put together a prototype that won the first prize \o/

We played with the gmp24 dataset kindly extracted from Twitter by Michael Brunton-Spall of the Guardian into a convenient JSON dataset. The idea was to find ways of making it easier to look at the data and make sense of it. This is the story of what we did, including the code we wrote.

The original dataset has several JSON files, so the first task was to put them all together:

#!/usr/bin/python
# Merge the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import os
res = []
for f in os.listdir("."):
    if not f.startswith("gmp24"): continue
    data = open(f).read().strip()
    if data == "[]": continue
    parsed = simplejson.loads(data)
    res.extend(parsed)
print simplejson.dumps(res)

The results however were not ordered by date, as GMP had to use several accounts to tweet because Twitter was putting Greater Manchester Police into jail for generating too much traffic. There would be quite a bit to write about that, but let's stick to our work. Here is code to sort the JSON data by time:

#!/usr/bin/python
# Sort the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import datetime as dt
all_recs = simplejson.load(sys.stdin)
all_recs.sort(key=lambda x: dt.datetime.strptime(x["created_at"], "%a %b %d %H:%M:%S +0000 %Y"))
simplejson.dump(all_recs, sys.stdout)

I then wanted to play with Tf-idf for extracting the most important words of every tweet:

#!/usr/bin/python
# tfidf - Annotate JSON elements with Tf-idf extracted keywords
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Read all the twits
records = simplejson.load(sys.stdin)
# All the twits by ID
byid = dict(((x["id"], x) for x in records))
# Stopwords we ignore
stopwords = set(["by", "it", "and", "of", "in", "a", "to"])
# Tokenising engine
re_num = re.compile(r"^\d+$")
re_word = re.compile(r"(\w+)")
def tokenise(tweet):
    "Extract tokens from a tweet"
    for tok in tweet["text"].split():
        tok = tok.strip().lower()
        if re_num.match(tok): continue
        mo = re_word.match(tok)
        if not mo: continue
        if mo.group(1) in stopwords: continue
        yield mo.group(1)
# Extract tokens from tweets
tokenised = dict(((x["id"], list(tokenise(x))) for x in records))
# Aggregate token counts
aggregated = {}
for d in byid.iterkeys():
    for t in tokenised[d]:
        if t in aggregated:
            aggregated[t] += 1
        else:
            aggregated[t] = 1
def tfidf(doc, tok):
    "Compute TFIDF score of a token in a document"
    return doc.count(tok) * math.log(float(len(byid)) / aggregated[tok])
# Annotate tweets with keywords
res = []
for name, tweet in byid.iteritems():
    doc = tokenised[name]
    keywords = sorted(set(doc), key=lambda tok: tfidf(doc, tok), reverse=True)[:5]
    tweet["keywords"] = keywords
    res.append(tweet)
simplejson.dump(res, sys.stdout)
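To make the scoring concrete, here is a tiny worked example in Python 3 using three made-up "tweets" that are already tokenised. It is an illustrative sketch rather than part of the original scripts. In this toy data each token appears at most once per document, so the total occurrence count used in the script above and the per-document frequency used in textbook idf coincide.

```python
import math
from collections import Counter

# Made-up, pre-tokenised documents keyed by id
docs = {
    1: ["stolen", "car", "stockport"],
    2: ["noisy", "party", "stockport"],
    3: ["stolen", "bike", "bolton"],
}

# In how many documents does each token appear?
df = Counter(tok for toks in docs.values() for tok in set(toks))


def tfidf(doc, tok):
    """Term frequency in doc times log of (number of docs / document frequency)."""
    return doc.count(tok) * math.log(len(docs) / df[tok])


def keywords(doc, n=2):
    """Top n tokens by score; ties are broken alphabetically for determinism."""
    return sorted(set(doc), key=lambda t: (-tfidf(doc, t), t))[:n]


for doc_id, doc in docs.items():
    print(doc_id, keywords(doc))
```

In the first document "car" scores log(3), about 1.10, while "stolen" and "stockport" each appear in two of the three documents and score only log(1.5), about 0.41, so "car" comes out as the top keyword.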

I thought this was producing a nice summary of every tweet but nobody was particularly interested, so we moved on to adding categories to tweets. Thanks to Yuwei who put together some useful keyword sets, we managed to annotate each tweet with a place name (e.g. "Stockport"), a social place name (e.g. "pub", "bank") and a social category (e.g. "man", "woman", "landlord"...). The code is simple; the biggest work in it was the dictionary of keywords:

#!/usr/bin/python
# categorise - Annotate JSON elements with categories
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
# Copyright (C) 2010  Yuwei Lin <yuwei@ylin.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Electoral wards from http://en.wikipedia.org/wiki/List_of_electoral_wards_in_Greater_Manchester
placenames = ["Altrincham", "Sale West",
"Altrincham", "Ashton upon Mersey", "Bowdon", "Broadheath", "Hale Barns", "Hale Central", "St Mary", "Timperley", "Village",
"Ashton-under-Lyne",
"Ashton Hurst", "Ashton St Michael", "Ashton Waterloo", "Droylsden East", "Droylsden West", "Failsworth East", "Failsworth West", "St Peter",
"Blackley", "Broughton",
"Broughton", "Charlestown", "Cheetham", "Crumpsall", "Harpurhey", "Higher Blackley", "Kersal",
"Bolton North East",
"Astley Bridge", "Bradshaw", "Breightmet", "Bromley Cross", "Crompton", "Halliwell", "Tonge with the Haulgh",
"Bolton South East",
"Farnworth", "Great Lever", "Harper Green", "Hulton", "Kearsley", "Little Lever", "Darcy Lever", "Rumworth",
"Bolton West",
"Atherton", "Heaton", "Lostock", "Horwich", "Blackrod", "Horwich North East", "Smithills", "Westhoughton North", "Chew Moor", "Westhoughton South",
"Bury North",
"Church", "East", "Elton", "Moorside", "North Manor", "Ramsbottom", "Redvales", "Tottington",
"Bury South",
"Besses", "Holyrood", "Pilkington Park", "Radcliffe East", "Radcliffe North", "Radcliffe West", "St Mary", "Sedgley", "Unsworth",
"Cheadle",
"Bramhall North", "Bramhall South", "Cheadle", "Gatley", "Cheadle Hulme North", "Cheadle Hulme South", "Heald Green", "Stepping Hill",
"Denton", "Reddish",
"Audenshaw", "Denton North East", "Denton South", "Denton West", "Dukinfield", "Reddish North", "Reddish South",
"Hazel Grove",
"Bredbury", "Woodley", "Bredbury Green", "Romiley", "Hazel Grove", "Marple North", "Marple South", "Offerton",
"Heywood", "Middleton",
"Bamford", "Castleton", "East Middleton", "Hopwood Hall", "Norden", "North Heywood", "North Middleton", "South Middleton", "West Heywood", "West Middleton",
"Leigh",
"Astley Mosley Common", "Atherleigh", "Golborne", "Lowton West", "Leigh East", "Leigh South", "Leigh West", "Lowton East", "Tyldesley",
"Makerfield",
"Abram", "Ashton", "Bryn", "Hindley", "Hindley Green", "Orrell", "Winstanley", "Worsley Mesnes",
"Manchester Central",
"Ancoats", "Clayton", "Ardwick", "Bradford", "City Centre", "Hulme", "Miles Platting", "Newton Heath", "Moss Side", "Moston",
"Manchester", "Gorton",
"Fallowfield", "Gorton North", "Gorton South", "Levenshulme", "Longsight", "Rusholme", "Whalley Range",
"Manchester", "Withington",
"Burnage", "Chorlton", "Chorlton Park", "Didsbury East", "Didsbury West", "Old Moat", "Withington",
"Oldham East", "Saddleworth",
"Alexandra", "Crompton", "Saddleworth North", "Saddleworth South", "Saddleworth West", "Lees", "St James", "St Mary", "Shaw", "Waterhead",
"Oldham West", "Royton",
"Chadderton Central", "Chadderton North", "Chadderton South", "Coldhurst", "Hollinwood", "Medlock Vale", "Royton North", "Royton South", "Werneth",
"Rochdale",
"Balderstone", "Kirkholt", "Central Rochdale", "Healey", "Kingsway", "Littleborough Lakeside", "Milkstone", "Deeplish", "Milnrow", "Newhey", "Smallbridge", "Firgrove", "Spotland", "Falinge", "Wardle", "West Littleborough",
"Salford", "Eccles",
"Claremont", "Eccles", "Irwell Riverside", "Langworthy", "Ordsall", "Pendlebury", "Swinton North", "Swinton South", "Weaste", "Seedley",
"Stalybridge", "Hyde",
"Dukinfield Stalybridge", "Hyde Godley", "Hyde Newton", "Hyde Werneth", "Longdendale", "Mossley", "Stalybridge North", "Stalybridge South",
"Stockport",
"Brinnington", "Central", "Davenport", "Cale Green", "Edgeley", "Cheadle Heath", "Heatons North", "Heatons South", "Manor",
"Stretford", "Urmston",
"Bucklow-St Martins", "Clifford", "Davyhulme East", "Davyhulme West", "Flixton", "Gorse Hill", "Longford", "Stretford", "Urmston",
"Wigan",
"Aspull New Springs Whelley", "Douglas", "Ince", "Pemberton", "Shevington with Lower Ground", "Standish with Langtree", "Wigan Central", "Wigan West",
"Worsley", "Eccles South",
"Barton", "Boothstown", "Ellenbrook", "Cadishead", "Irlam", "Little Hulton", "Walkden North", "Walkden South", "Winton", "Worsley",
"Wythenshawe", "Sale East",
"Baguley", "Brooklands", "Northenden", "Priory", "Sale Moor", "Sharston", "Woodhouse Park"]
# Manual coding from Yuwei
placenames.extend(["City centre", "Tameside", "Oldham", "Bury", "Bolton",
"Trafford", "Pendleton", "New Moston", "Denton", "Eccles", "Leigh", "Benchill",
"Prestwich", "Sale", "Kearsley", ])
placenames.extend(["Trafford", "Bolton", "Stockport", "Levenshulme", "Gorton",
"Tameside", "Blackley", "City centre", "Airport", "South Manchester",
"Rochdale", "Chorlton", "Uppermill", "Castleton", "Stalybridge", "Ashton",
"Chadderton", "Bury", "Ancoats", "Whalley Range", "West Yorkshire",
"Fallowfield", "New Moston", "Denton", "Stretford", "Eccles", "Pendleton",
"Leigh", "Altrincham", "Sale", "Prestwich", "Kearsley", "Hulme", "Withington",
"Moss Side", "Milnrow", "outskirt of Manchester City Centre", "Newton Heath",
"Wythenshawe", "Mancunian Way", "M60", "A6", "Droylesden", "M56", "Timperley",
"Higher Ince", "Clayton", "Higher Blackley", "Lowton", "Droylsden",
"Partington", "Cheetham Hill", "Benchill", "Longsight", "Didsbury",
"Westhoughton"])
# Social categories from Yuwei
soccat = ["man", "woman", "men", "women", "youth", "teenager", "elderly",
"patient", "taxi driver", "neighbour", "male", "tenant", "landlord", "child",
"children", "immigrant", "female", "workmen", "boy", "girl", "foster parents",
"next of kin"]
for i in range(100):
    soccat.append("%d-year-old" % i)
    soccat.append("%d-years-old" % i)
# Types of social locations from Yuwei
socloc = ["car park", "park", "pub", "club", "shop", "premises", "bus stop",
"property", "credit card", "supermarket", "garden", "phone box", "theatre",
"toilet", "building site", "Crown court", "hard shoulder", "telephone kiosk",
"hotel", "restaurant", "cafe", "petrol station", "bank", "school",
"university"]
extras = {"placename": placenames, "soccat": soccat, "socloc": socloc}
# Normalise keyword lists
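for k, v in extras.items():
    # Remove duplicates and sort by length, longest first, so that the most
    # specific name matches first; store the result back in the dict
    # (rebinding v alone would leave extras unchanged)
    extras[k] = sorted(set(v), key=len, reverse=True)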
# Add keywords
def add_categories(tweet):
    text = tweet["text"].lower()
    for field, categories in extras.iteritems():
        for cat in categories:
            if cat.lower() in text:
                tweet[field] = cat
                break
    return tweet
# Read all the twits
records = (add_categories(x) for x in simplejson.load(sys.stdin))
simplejson.dump(list(records), sys.stdout)
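
The reason for sorting the keyword lists by length can be shown with a minimal sketch. It is written for Python 3 rather than the Python 2 of the script above, and `first_match` and the sample `places` list are hypothetical names used only for illustration:

```python
def first_match(text, keywords):
    """Return the longest keyword found in text, or None.

    Keywords are deduplicated and tried longest first, so that a specific
    name such as "Higher Blackley" wins over its substring "Blackley".
    """
    lowered = text.lower()
    for kw in sorted(set(keywords), key=len, reverse=True):
        if kw.lower() in lowered:
            return kw
    return None


places = ["Blackley", "Higher Blackley", "Wigan", "Blackley"]
print(first_match("Report of a theft in Higher Blackley", places))
```

Without the longest-first ordering, whichever of "Blackley" and "Higher Blackley" happened to come first in the list would be the one recorded.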

All these scripts form a nice processing chain: each script takes a list of JSON records, adds some information to each of them, and passes the list on. To see what we have so far, here is a simple script to convert the JSON twits to CSV so they can be viewed in a spreadsheet:

#!/usr/bin/python
# Convert the JSON twits to CSV
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import csv
rows = ["id", "created_at", "text", "keywords", "placename"]
writer = csv.writer(sys.stdout)
for rec in simplejson.load(sys.stdin):
    rec["keywords"] = " ".join(rec["keywords"])
    rec["placename"] = rec.get("placename", "")
    writer.writerow([rec[row] for row in rows])
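
The shape shared by every script in the chain (read a JSON list, enrich each record, write the list out again) can be sketched roughly as follows. This is only an illustration: it uses Python 3 and the standard `json` module instead of `simplejson`, and `run_stage` and `add_length` are made-up names, not part of the original scripts.

```python
import io
import json


def run_stage(transform, infile, outfile):
    """Read a JSON list of records, apply transform to each, write the result."""
    records = json.load(infile)
    json.dump([transform(rec) for rec in records], outfile)


def add_length(rec):
    # Hypothetical enrichment step: record the length of the text
    rec["length"] = len(rec["text"])
    return rec


# In a real chain infile/outfile would be sys.stdin and sys.stdout
src = io.StringIO('[{"id": 1, "text": "hello"}]')
dst = io.StringIO()
run_stage(add_length, src, dst)
print(dst.getvalue())
```

Because every stage reads and writes the same format, stages can be joined with shell pipes in any order.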

At this point we were coming up with lots of questions: "were there more reports on women or men?", "which place had most incidents?", "what were the incidents involving animals?"... Time to bring Xapian into play. This script reads all the JSON tweets and builds a Xapian index with them:

#!/usr/bin/python
# toxapian - Index JSON tweets in Xapian
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.WritableDatabase(DBNAME, xapian.DB_CREATE_OR_OPEN)
stemmer = xapian.Stem("english")
indexer = xapian.TermGenerator()
indexer.set_stemmer(stemmer)
indexer.set_database(db)
data = simplejson.load(sys.stdin)
for rec in data:
    doc = xapian.Document()
    doc.set_data(str(rec["id"]))
    indexer.set_document(doc)
    indexer.index_text_without_positions(rec["text"])
    # Index categories as categories
    if "placename" in rec:
        doc.add_boolean_term("XP" + rec["placename"].lower())
    if "soccat" in rec:
        doc.add_boolean_term("XS" + rec["soccat"].lower())
    if "socloc" in rec:
        doc.add_boolean_term("XL" + rec["socloc"].lower())
    db.add_document(doc)
db.flush()
# Also save the whole dataset so we know where to find it later if we want to
# show the details of an entry
simplejson.dump(data, open(os.path.join(DBNAME, "all.json"), "w"))

And this is a simple command line tool to query the database:

#!/usr/bin/python
# xgrep - Command line tool to query the GMP24 tweet Xapian database
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
qp.add_boolean_prefix("place", "XP")
qp.add_boolean_prefix("soc", "XS")
qp.add_boolean_prefix("loc", "XL")
query = qp.parse_query(sys.argv[2],
    xapian.QueryParser.FLAG_BOOLEAN |
    xapian.QueryParser.FLAG_LOVEHATE |
    xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
    xapian.QueryParser.FLAG_WILDCARD |
    xapian.QueryParser.FLAG_PURE_NOT |
    xapian.QueryParser.FLAG_SPELLING_CORRECTION |
    xapian.QueryParser.FLAG_AUTO_SYNONYMS)
enquire = xapian.Enquire(db)
enquire.set_query(query)
count = 40
matches = enquire.get_mset(0, count)
estimated = matches.get_matches_estimated()
print "%d/%d results" % (matches.size(), estimated)
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
for m in matches:
    rec = data[m.document.get_data()]
    print rec["text"]
print "%d/%d results" % (matches.size(), matches.get_matches_estimated())
total = db.get_doccount()
estimated = matches.get_matches_estimated()
print "%d results over %d documents, %d%%" % (estimated, total, estimated * 100 / total)
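
To make the link between the `XP`/`XS`/`XL` terms written by the indexer and the `place:`/`soc:`/`loc:` prefixes used in queries more concrete, here is a purely conceptual sketch in plain Python 3. It is not how Xapian implements `add_boolean_prefix`; `to_term` is a hypothetical helper, and lowercasing the value simply mirrors what the indexing script does when it stores the terms.

```python
PREFIXES = {"place": "XP", "soc": "XS", "loc": "XL"}


def to_term(token, prefixes=PREFIXES):
    """Map a 'field:value' token to its prefixed boolean term.

    Tokens without a known field are returned unchanged (lowercased), which
    is a simplification: the real query parser would also stem them.
    """
    field, sep, value = token.partition(":")
    if sep and field in prefixes:
        return prefixes[field] + value.lower()
    return token.lower()


print([to_term(t) for t in "car place:Wigan".split()])
```

So a query such as "car place:wigan" combines a normal free-text term with a filter on the exact term that was attached to the document at indexing time.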

Neat! Now that we have a proper index that supports all sorts of cool things, like stemming, tag clouds, full text search with complex queries, lookup of similar documents, keyword suggestions and so on, it was only fair to put together a web service to share it with other people at the event. It helped that I had already written similar code for apt-xapian-index and dde before. Here is the server, quickly built on bottle. The very last line starts the server, and it is where you can configure the listening interface and port.

#!/usr/bin/python
# xserve - Make the GMP24 tweet Xapian database available on the web
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import bottle
from bottle import route, post
from cStringIO import StringIO
import cPickle as pickle
import simplejson
import sys
import os, os.path
import xapian
import urllib
import math
bottle.debug(True)
DBNAME = sys.argv[1]
QUERYLOG = os.path.join(DBNAME, "queries.txt")
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
prefixes = {"place": "XP", "soc": "XS", "loc": "XL"}
prefix_desc = {"place": "Place name", "soc": "Social category", "loc": "Social location"}
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
for k, v in prefixes.iteritems():
    qp.add_boolean_prefix(k, v)
def make_query(qstring):
    return qp.parse_query(qstring,
        xapian.QueryParser.FLAG_BOOLEAN |
        xapian.QueryParser.FLAG_LOVEHATE |
        xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
        xapian.QueryParser.FLAG_WILDCARD |
        xapian.QueryParser.FLAG_PURE_NOT |
        xapian.QueryParser.FLAG_SPELLING_CORRECTION |
        xapian.QueryParser.FLAG_AUTO_SYNONYMS)
@route("/")
def index():
    query = urllib.unquote_plus(bottle.request.GET.get("q", ""))
    out = StringIO()
    print >>out, '''
<html>
<head>
<title>Query</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript">
$(function() {
    $("#queryfield")[0].focus()
})
</script>
</head>
<body>
<h1>Search</h1>
<form method="POST" action="/query">
Keywords: <input type="text" name="query" value="%s" id="queryfield">
<input type="submit">
<a href="http://xapian.org/docs/queryparser.html">Help</a>
</form>''' % query
    print >>out, '''
<p>Example: "car place:wigan"</p>

<p>Available prefixes:</p>

<ul>
'''
    for pfx in prefixes.keys():
        print >>out, "<li><a href='/catinfo/%s'>%s - %s</a></li>" % (pfx, pfx, prefix_desc[pfx])
    print >>out, '''
</ul>
'''
    oldqueries = []
    if os.path.exists(QUERYLOG):
        total = db.get_doccount()
        fd = open(QUERYLOG, "r")
        while True:
            try:
                q = pickle.load(fd)
            except EOFError:
                break
            oldqueries.append(q)
        fd.close()
        def print_query(q):
            count = q["count"]
            print >>out, "<li><a href='/query?query=%s'>%s (%d/%d %.2f%%)</a></li>" % (urllib.quote_plus(q["q"]), q["q"], count, total, count * 100.0 / total)
        print >>out, "<p>Last 10 queries:</p><ul>"
        for q in oldqueries[:-11:-1]:
            print_query(q)
        print >>out, "</ul>"
        # Remove duplicates
        oldqueries = dict(((x["q"], x) for x in oldqueries)).values()
        print >>out, "<table>"
        print >>out, "<tr><th>10 queries with most results</th><th>10 queries with least results</th></tr>"
        print >>out, "<tr><td>"
        print >>out, "<ul>"
        oldqueries.sort(key=lambda x:x["count"], reverse=True)
        for q in oldqueries[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td><td>"
        print >>out, "<ul>"
        nonempty = [x for x in oldqueries if x["count"] > 0]
        nonempty.sort(key=lambda x:x["count"])
        for q in nonempty[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td></tr>"
        print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
@route("/query")
@route("/query/")
@post("/query")
@post("/query/")
def query():
    query = bottle.request.POST.get("query", bottle.request.GET.get("query", ""))
    enquire = xapian.Enquire(db)
    enquire.set_query(make_query(query))
    count = 40
    matches = enquire.get_mset(0, count)
    estimated = matches.get_matches_estimated()
    total = db.get_doccount()
    out = StringIO()
    print >>out, '''
<html>
<head><title>Results</title></head>
<body>
<h1>Results for "<b>%s</b>"</h1>
''' % query
    if estimated == 0:
        print >>out, "No results found."
    else:
        # Give as results the first 30 documents; also use them as the key
        # ones to use to compute relevant terms
        rset = xapian.RSet()
        for m in enquire.get_mset(0, 30):
            rset.add_document(m.document.get_docid())
        # Compute the tag cloud
        class NonTagFilter(xapian.ExpandDecider):
            def __call__(self, term):
                return not term[0].isupper() and not term[0].isdigit()
        cloud = []
        maxscore = None
        for res in enquire.get_eset(40, rset, NonTagFilter()):
            # Normalise the score in the interval [0, 1]
            weight = math.log(res.weight)
            if maxscore == None: maxscore = weight
            tag = res.term
            cloud.append([tag, float(weight) / maxscore])
        max_weight = cloud[0][1]
        min_weight = cloud[-1][1]
        cloud.sort(key=lambda x:x[0])
        def mklink(query, term):
            return "/query?query=%s" % urllib.quote_plus(query + " and " + term)
        print >>out, "<h2>Tag cloud</h2>"
        print >>out, "<blockquote>"
        for term, weight in cloud:
            size = 100 + 100.0 * (weight - min_weight) / (max_weight - min_weight)
            print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(query, term), size, term)
        print >>out, "</blockquote>"
        print >>out, "<h2>Results</h2>"
        print >>out, "<p><a href='/'>Search again</a></p>"
        print >>out, "<p>%d results over %d documents, %.2f%%</p>" % (estimated, total, estimated * 100.0 / total)
        print >>out, "<p>%d/%d results</p>" % (matches.size(), estimated)
        print >>out, "<ul>"
        for m in matches:
            rec = data[m.document.get_data()]
            print >>out, "<li><a href='/item/%s'>%s</a></li>" % (rec["id"], rec["text"])
        print >>out, "</ul>"
        fd = open(QUERYLOG, "a")
        qinfo = dict(q=query, count=estimated)
        pickle.dump(qinfo, fd)
        fd.close()
    print >>out, '''
<a href="/">Search again</a>

</body>
</html>'''
    return out.getvalue()
@route("/item/:id")
@route("/item/:id/")
def show(id):
    rec = data[id]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Result %s</title></head>
<body>
<h1>Raw JSON record for twit %s</h1>
<pre>''' % (rec["id"], rec["id"])
    print >>out, simplejson.dumps(rec, indent=" ")
    print >>out, '''
</pre>
</body>
</html>'''
    return out.getvalue()
@route("/catinfo/:name")
@route("/catinfo/:name/")
def catinfo(name):
    prefix = prefixes[name]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Values for %s</title></head>
<body>
''' % name
    terms = [(x.term[len(prefix):], db.get_termfreq(x.term)) for x in db.allterms(prefix)]
    terms.sort(key=lambda x:x[1], reverse=True)
    # terms is sorted by decreasing frequency: the first entry is the maximum
    freq_max = terms[0][1]
    freq_min = terms[-1][1]
    def mklink(name, term):
        return "/query?query=%s" % urllib.quote_plus(name + ":" + term)
    # Build tag cloud
    print >>out, "<h1>Tag cloud</h1>"
    print >>out, "<blockquote>"
    for term, freq in sorted(terms[:20], key=lambda x:x[0]):
        size = 100 + 100.0 * (freq - freq_min) / (freq_max - freq_min)
        print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(name, term), size, term)
    print >>out, "</blockquote>"
    print >>out, "<h1>All terms</h1>"
    print >>out, "<table>"
    print >>out, "<tr><th>Occurrences</th><th>Name</th></tr>"
    for term, freq in terms:
        print >>out, "<tr><td>%d</td><td><a href='/query?query=%s'>%s</a></td></tr>" % (freq, urllib.quote_plus(name + ":" + term), term)
    print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
# Change here for bind host and port
bottle.run(host="0.0.0.0", port=8024)
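
The font sizes in both tag clouds come from a linear rescaling of weights into the 100%-200% range. A minimal standalone sketch of that calculation, in Python 3, is below; `font_sizes` is a hypothetical helper, and the guard for the case where all weights are equal is an addition that the server code above does not have.

```python
def font_sizes(weights, smallest=100, largest=200):
    """Scale weights linearly to font-size percentages."""
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    sizes = {}
    for term, w in weights.items():
        # If every weight is equal there is nothing to scale; use the smallest size
        frac = (w - lo) / span if span else 0.0
        sizes[term] = int(smallest + (largest - smallest) * frac)
    return sizes


print(font_sizes({"car": 30, "theft": 10, "dog": 20}))
```

The heaviest term gets 200%, the lightest 100%, and everything else lands in between in proportion to its weight.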

...and then we presented our work and ended up winning the contest. This was the story of how we wrote this set of award-winning code.

Bastien mentioned the Chromium OS xorg.conf file, which includes an irritating wart - namely, Option "SHMConfig" "on". This tells the Synaptics touchpad driver to export its configuration data to a shared memory region which is accessible to any user on the system. The reason for this is that in the past, there was no good way for configuration information to be passed to input drivers through the X server at runtime. This got fixed with the advent of X input properties, and synaptics can now be configured sensibly over the X protocol.

But why was it off by default? Because, as I said, the configuration data is exported to a shared memory region which is accessible to any user on the system. And while it contains a bunch of information that's not terribly interesting (an attacker being able to disable my touchpad or turn on two finger emulation may be a DoS of sorts, but...), it also contains some values that are used to scale the input coordinates. Which means that anyone with access to the SHM region can effectively take control of your mouse. The current position is exported too, so they can also track all of your mouse input.

Now, this isn't stunningly bad. The attacker can only do this while you're touching the pad. You'll see everything that happens as a result. There's no way to fake keyboard input. They need to be running code as another user on the system - if they're running as the logged in user then they can already do all of this. And for a device as single-user as Google seem to be looking at, it's obviously not a concern at all.

But there are still plenty of places on the web suggesting that you enable SHMConfig, and various distributions that ship with it turned on (Ubuntu on the Dell mini used to, but it got turned off after I contacted them about it). It's absolutely fine to do this as long as you're aware of the security implications of it, but otherwise please use X input properties instead.

I got tagged in the bunnymeme. I even got nagged about it. I have to admit, drawing a bunny is kind of a relief from a busy week. Here are the rules:

Draw a Bunny (or more)
Post it to your blog with the rules
Name three other bloggers that should draw a bunny

Here is my bunny:

And here are the three blogger friends I can think of who aren't too uptight and full of themselves to post a picture of a hand-drawn bunny:

Update: there's a nice graph tracking the meme's progress. tags: bunnymeme bunny meme splitbrain

Montreal Startup invests in Control Yourself, Inc. On a more serious note, I'm glad that GigaOM has published a story about venture investment in my company, Control Yourself, Inc. ("Identi.ca Gets Funding to Make Open-source Twitter Variant"). I'm psyched to be working with Montreal Start Up to build CYI into a viable Open Source business. They've been really supportive of the Open Everything strategy, including building the OpenMicroBlogging standard. Thanks to everyone who's sent private and public congratulations. The fun part starts now! tags: identica controlyourself investment montrealstartup openmicroblogging

Open Source Jaiku The news came out on the same day that Google announced they're going to release the Jaiku code as Open Source software. I think this is great news. Hopefully, we can work together to build a federated network of microblogging sites running Open Source software connected with open standards. It's so important to have multiple implementations of any open standard, and I think Identica and Jaiku can be a good team for the microblogging world. tags: jaiku opensource openstandard google

A common way to debug a network server is to use 'telnet' or 'nc' to connect to the server and issue some commands in the protocol to verify whether everything is working correctly. That obviously only works for ASCII protocols (as opposed to binary protocols), and it obviously also only works if you're not using any encryption. But that doesn't mean you can't test an encrypted protocol in a similar way, thanks to openssl's s_client:

wouter@country:~$ openssl s_client -host samba.grep.be -port 443
CONNECTED(00000003)
depth=0 /C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
verify error:num=18:self signed certificate
verify return:1
depth=0 /C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
verify return:1
---
Certificate chain
 0 s:/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
   i:/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDXDCCAsWgAwIBAgIJAITRhiXp+37JMA0GCSqGSIb3DQEBBQUAMH0xCzAJBgNV
BAYTAkJFMRAwDgYDVQQIEwdBbnR3ZXJwMREwDwYDVQQHEwhNZWNoZWxlbjEUMBIG
A1UEChMLTml4U3lzIEJWQkExFDASBgNVBAMTC3N2bi5ncmVwLmJlMR0wGwYJKoZI
hvcNAQkBFg53b3V0ZXJAZ3JlcC5iZTAeFw0wNTA1MjEwOTMwMDFaFw0xNTA1MTkw
OTMwMDFaMH0xCzAJBgNVBAYTAkJFMRAwDgYDVQQIEwdBbnR3ZXJwMREwDwYDVQQH
EwhNZWNoZWxlbjEUMBIGA1UEChMLTml4U3lzIEJWQkExFDASBgNVBAMTC3N2bi5n
cmVwLmJlMR0wGwYJKoZIhvcNAQkBFg53b3V0ZXJAZ3JlcC5iZTCBnzANBgkqhkiG
9w0BAQEFAAOBjQAwgYkCgYEAsGTECq0VXyw09Zcg/OBijP1LALMh9InyU0Ebe2HH
NEQ605mfyjAENG8rKxrjOQyZzD25K5Oh56/F+clMNtKAfs6OuA2NygD1/y4w7Gcq
1kXhsM1MOIOBdtXAFi9s9i5ZATAgmDRIzuKZ6c2YJxJfyVbU+Pthr6L1SFftEdfb
L7MCAwEAAaOB4zCB4DAdBgNVHQ4EFgQUtUK7aapBDaCoSFRWTf1wRauCmdowgbAG
A1UdIwSBqDCBpYAUtUK7aapBDaCoSFRWTf1wRauCmdqhgYGkfzB9MQswCQYDVQQG
EwJCRTEQMA4GA1UECBMHQW50d2VycDERMA8GA1UEBxMITWVjaGVsZW4xFDASBgNV
BAoTC05peFN5cyBCVkJBMRQwEgYDVQQDEwtzdm4uZ3JlcC5iZTEdMBsGCSqGSIb3
DQEJARYOd291dGVyQGdyZXAuYmWCCQCE0YYl6ft+yTAMBgNVHRMEBTADAQH/MA0G
CSqGSIb3DQEBBQUAA4GBADGkLc+CWWbfpBpY2+Pmknsz01CK8P5qCX3XBt4OtZLZ
NYKdrqleYq7r7H8PHJbTTiGOv9L56B84QPGwAzGxw/GzblrqR67iIo8e5reGbvXl
s1TFqKyvoXy9LJoGecMwjznAEulw9cYcFz+VuV5xnYPyJMLWk4Bo9WCVKGuAqVdw
-----END CERTIFICATE-----
subject=/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
issuer=/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=svn.grep.be/emailAddress=wouter@grep.be
---
No client certificate CA names sent
---
SSL handshake has read 1428 bytes and written 316 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 1024 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: 65E69139622D06B9D284AEDFBFC1969FE14E826FAD01FB45E51F1020B4CEA42C
    Session-ID-ctx: 
    Master-Key: 606553D558AF15491FEF6FD1A523E16D2E40A8A005A358DF9A756A21FC05DFAF2C9985ABE109DCD29DD5D77BE6BC5C4F
    Key-Arg   : None
    Start Time: 1222001082
    Timeout   : 300 (sec)
    Verify return code: 18 (self signed certificate)
---
HEAD / HTTP/1.1
Host: svn.grep.be
User-Agent: openssl s_client
Connection: close
HTTP/1.1 404 Not Found
Date: Sun, 21 Sep 2008 12:44:55 GMT
Server: Apache/2.2.3 (Debian) mod_auth_kerb/5.3 DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch11 mod_ssl/2.2.3 OpenSSL/0.9.8c
Connection: close
Content-Type: text/html; charset=iso-8859-1
closed
wouter@country:~$

As you can see, we connect to an HTTPS server, get to see what the server's certificate looks like, issue some commands, and the server responds properly. It also works for (some) protocols that work in a STARTTLS kind of way:

wouter@country:~$ openssl s_client -host samba.grep.be -port 587 -starttls smtp
CONNECTED(00000003)
depth=0 /C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
verify error:num=18:self signed certificate
verify return:1
depth=0 /C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
verify return:1
---
Certificate chain
 0 s:/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
   i:/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDBDCCAm2gAwIBAgIJAK53w+1YhWocMA0GCSqGSIb3DQEBBQUAMGAxCzAJBgNV
BAYTAkJFMRAwDgYDVQQIEwdBbnR3ZXJwMREwDwYDVQQHEwhNZWNoZWxlbjEUMBIG
A1UEChMLTml4U3lzIEJWQkExFjAUBgNVBAMTDXNhbWJhLmdyZXAuYmUwHhcNMDgw
OTIwMTYyMjI3WhcNMDkwOTIwMTYyMjI3WjBgMQswCQYDVQQGEwJCRTEQMA4GA1UE
CBMHQW50d2VycDERMA8GA1UEBxMITWVjaGVsZW4xFDASBgNVBAoTC05peFN5cyBC
VkJBMRYwFAYDVQQDEw1zYW1iYS5ncmVwLmJlMIGfMA0GCSqGSIb3DQEBAQUAA4GN
ADCBiQKBgQCee+Ibci3atTgoJqUU7cK13oD/E1IV2lKcvdviJBtr4rd1aRWfxcvD
PS00jRXGJ9AAM+EO2iuZv0Z5NFQkcF3Yia0yj6hvjQvlev1OWxaWuvWhRRLV/013
JL8cIrKYrlHqgHow60cgUt7kfSxq9kjkMTWLsGdqlE+Q7eelMN94tQIDAQABo4HF
MIHCMB0GA1UdDgQWBBT9N54b/zoiUNl2GnWYbDf6YeixgTCBkgYDVR0jBIGKMIGH
gBT9N54b/zoiUNl2GnWYbDf6YeixgaFkpGIwYDELMAkGA1UEBhMCQkUxEDAOBgNV
BAgTB0FudHdlcnAxETAPBgNVBAcTCE1lY2hlbGVuMRQwEgYDVQQKEwtOaXhTeXMg
QlZCQTEWMBQGA1UEAxMNc2FtYmEuZ3JlcC5iZYIJAK53w+1YhWocMAwGA1UdEwQF
MAMBAf8wDQYJKoZIhvcNAQEFBQADgYEAAnMdbAgLRJ3xWOBlqNjLDzGWAEzOJUHo
5R9ljMFPwt1WdjRy7L96ETdc0AquQsW31AJsDJDf+Ls4zka+++DrVWk4kCOC0FOO
40ar0WUfdOtuusdIFLDfHJgbzp0mBu125VBZ651Db99IX+0BuJLdtb8fz2LOOe8b
eN7obSZTguM=
-----END CERTIFICATE-----
subject=/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
issuer=/C=BE/ST=Antwerp/L=Mechelen/O=NixSys BVBA/CN=samba.grep.be
---
No client certificate CA names sent
---
SSL handshake has read 1707 bytes and written 351 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 1024 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: 6D28368494A3879054143C7C6B926C9BDCDBA20F1E099BF4BA7E76FCF357FD55
    Session-ID-ctx: 
    Master-Key: B246EA50357EAA6C335B50B67AE8CE41635EBCA6EFF7EFCE082225C4EFF5CFBB2E50C07D8320E0EFCBFABDCDF8A9A851
    Key-Arg   : None
    Start Time: 1222000892
    Timeout   : 300 (sec)
    Verify return code: 18 (self signed certificate)
---
250 HELP
quit
221 samba.grep.be closing connection
closed
wouter@country:~$

OpenSSL here connects to the server, issues a proper EHLO command, does STARTTLS, and then gives me the same data as it did for the HTTPS connection. Isn't that nice.
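
The same kind of check can be scripted. What follows is a rough sketch using only Python 3's standard `socket` and `ssl` modules; `build_head_request` and `fetch_head` are names made up for this example. Note one difference from the sessions above: the default context verifies certificates, so against a self-signed server like these it would raise a verification error instead of just reporting code 18.

```python
import socket
import ssl


def build_head_request(host):
    """Build a HEAD request like the one typed by hand above."""
    lines = [
        "HEAD / HTTP/1.1",
        "Host: %s" % host,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_head(host, port=443, timeout=10):
    """Open a TLS connection, send a HEAD request, return the raw response."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(build_head_request(host))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch_head("example.org")` needs network access and returns the raw response bytes, headers included.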

My friend Hugh McGuire asked me if there were any privacy concerns with OpenID. Since this is kind of a complex question, I decided to answer it with an essay, OpenID Privacy Concerns, which I think is pretty useful to anyone thinking about using OpenID. tags: openid privacy

Weight weight don't tell me So, I'm in Los Altos at my parents' house for the holidays, and I've been working out at the Page Mill YMCA. Today I went in for my last weight-lifting session of the year. I also weigh myself on weight-lifting days (about once every 4 days), so this was my last weigh-in of the year, too. I'm glad to say that I've now got a body mass index of 29.4 -- down from a measured value of 38 in September. That was a little bit under my goal weight for the year-end, which was nice to hit. It's been a really precipitous drop in weight and increase in fitness this year, which I've been pretty proud of. There's still more ahead -- a "normal" weight is about 25. But I'm feeling healthier, and I look better, than I have in years and years. I think 2008 will see a slower weight loss, more stabilization, and exploration of some different sports and activities to keep the exercise fresh. tags: weightloss bmi exercise workout

My friend Hugh McGuire just launched a new Web site: earideas. It's a curated collection of the best podcasts on the Web -- kind of an interesting choice. Hugh's best known for the Librivox project -- a collaborative effort of thousands to make public-domain audio books freely available on the Web. He's also the leader of datalibre.ca and the co-founder of Collectik, the social sharing site for podcast enthusiasts. All of which is to say that he's a massive-collaboration kind of guy. So why the curated collection? Says Hugh, "It's built specifically so that you don't have to do anything much more than you do when you turn on a radio." In other words, they're finding the best stuff for you. I think the site makes a nice complement to Collectik. In any event: the gang at earideas has asked for lists of people's top 5 favorite podcasts. I'm still not very much into the podcasting thing; I don't have a portable digital music player, and I haven't found listening to music on my laptop or desktop really useful. But I do listen some of the time, so I figure I could give my suggestions.

Linuxcast by the personable and telegenic Don Marti. Great coverage of software and society in the Free world.
FLOSS Weekly. The "weekly" part is a bit of a stretch, but this is a very good interview show about Free/Libre/Open Source Software. I miss Chris DiBona as host, but Randal Schwartz is no slouch, either.
Destinyland. Destiny is perhaps one of the greatest pop culture researchers alive today. This very occasional podcast is just like having Destiny corner you at a party and talk for 90 minutes about the suspicious circumstances of Alfalfa's death. It's fascinating.
CBC Radio 3 Podcast. Yeah, it may make me terribly unhip, but I like Radio 3.
Radio Open Source. I want to hate this show for capitalizing on the name "open source" without actually providing any open content or talking about Free Software. But it's one of the best shows on the Web.

All right, that's my five. Good luck, Hugh and the rest of the gang at earideas.com. tags: earideas hughmcguire podcasts publicradio

Million-dollar idea So, I use a Logitech Trackman Marble for my desktop computer -- sometimes called the one true pointing device. I think it's awesome, and I've had it for maybe 10 years. The only downside is that every 6-8 weeks I have to pop off the cover and clean out all the finger oils, dust, hand hair and cookie crumbs that have amalgamated to a greasy, fuzzy crud inside. It may be one of the most disgusting things I do regularly -- and remember, I have a 2-year-old, so that's saying something. Anyways, here's my idea for a device, free for you to use. It's a self-cleaning trackball (or mouse). Just like a self-cleaning oven, every few weeks you'd click a special button on the device and it would heat itself to 800°F for about 90 minutes until the dang thing is just glowing white-hot and it vaporizes any crud that may be inside. Then, you can go about your business with a smooth, clean mouse (or trackball) again. Viola! I think the idea would work for keyboards, too, but I really rarely clean my keyboard. Filthy keyboards seem to work fine most of the time, and when they don't work well you can just throw them away and get a new one. But there's no accounting for taste and anyways people may want to buy a matching set of self-cleaning keyboard and mouse rather than just one or the other on its own. You, reader, can use this awesome million-dollar idea on your own, free of charge. Enjoy! tags: keyboard mouse trackball selfcleaning productrecall $1midea

I recently bought a Fujitsu-Siemens LIFEBOOK S-4572 (sub)notebook on eBay for less than 150 Euros, a really great little machine. It's a Pentium III, 750 MHz, 256 MB RAM (the chipset cannot handle more than that unfortunately), 12.1" screen, ethernet, 2x USB 1.1, CD-ROM/DVD reader + CD-ROM writer, PCMCIA, IrDA, modem, and a 15 GB hard drive. No floppy, no serial ports, no parallel port. There's no wireless builtin, but I use a cheapo PCMCIA adapter. The greatest feature compared to all other laptops I previously owned is that the battery life is really great, it lasts almost 3 hours (compare that to 45 minutes on my current "main" laptop). I'm running Debian unstable on the box (of course). Here's the Linux support status as far as I have tested things:

Networking: Works out of the box using the e100 driver.
Sound: Works out of the box using the snd_intel8x0 driver.
X11: Works out of the box, using either the vesa or the ati driver (at a max. resolution of 1024x768).
Touchpad: Works out of the box. Using the Option "SHMConfig" "on" line in /etc/X11/xorg.conf's InputDevice section (using the synaptics driver) also works fine and allows you to scroll using the touchpad, e.g. in a browser. More info in the SynapticsTouchpad page on the Debian wiki.
CDROM, DVD: Reading CD-ROMs and DVDs as well as burning CD-ROMs works fine. I don't think the drive is capable of writing DVDs.
External VGA: Displaying the screen contents on an external VGA monitor (or beamer) works just fine, switching is done using Fn+F10.
PCMCIA: Works fine, tested using the Sitecom WL-112 wireless card. The driver installation for that is straight-forward, too:

$ apt-get install rt2500-source
$ m-a a-i rt2500-source
$ dpkg -i /usr/src/rt2500*deb

Special keys: All the Fn-keys work fine (brightness, volume, etc.). There are five other special keys (for starting a browser or something) which I haven't tested, but I don't really care...
USB: Works fine, but it's only USB 1.1, so some higher-speed devices will not work (DVB-T USB devices for example; PCMCIA DVB-T adapters might work).
IrDA, Modem: Untested, I don't care.
Powersaving, Suspend to RAM: It seems this CPU (Pentium III, Coppermine) doesn't support frequency scaling, so cpufreq-set doesn't work. I'm using laptop-mode-tools to improve battery life a bit more, though. Also, Suspend-to-RAM works fine out of the box:

$ apt-get install hibernate
$ hibernate-ram

I haven't tested Suspend-to-Disk yet, but I'm not sure it'll work anyway, as I'm using a dm-crypt'ed disk (+ LVM), as with all my boxes.

lspci

00:00.0 Host bridge [0600]: Intel Corporation 82440MX Host Bridge [8086:7194] (rev 01)
00:00.1 Multimedia audio controller [0401]: Intel Corporation 82440MX AC'97 Audio Controller [8086:7195]
00:00.2 Modem [0703]: Intel Corporation 82440MX AC'97 Modem Controller [8086:7196]
00:07.0 Bridge [0680]: Intel Corporation 82440MX ISA Bridge [8086:7198] (rev 01)
00:07.1 IDE interface [0101]: Intel Corporation 82440MX EIDE Controller [8086:7199]
00:07.2 USB Controller [0c03]: Intel Corporation 82440MX USB Universal Host Controller [8086:719a]
00:07.3 Bridge [0680]: Intel Corporation 82440MX Power Management Controller [8086:719b]
00:12.0 Ethernet controller [0200]: Intel Corporation 82557/8/9 [Ethernet Pro 100] [8086:1229] (rev 09)
00:13.0 CardBus bridge [0607]: O2 Micro, Inc. OZ6933/711E1 CardBus/SmartCardBus Controller [1217:6933] (rev 02)
00:13.1 CardBus bridge [0607]: O2 Micro, Inc. OZ6933/711E1 CardBus/SmartCardBus Controller [1217:6933] (rev 02)
00:14.0 VGA compatible controller [0300]: ATI Technologies Inc Rage Mobility P/M [1002:4c52] (rev 64)
01:00.0 Network controller [0280]: RaLink RT2500 802.11g Cardbus/mini-PCI [1814:0201] (rev 01)

Other resources

I'm really considering making this my "main" box even though it's a bit older/slower, as my current laptop with 45 minutes battery life is a major pain when travelling...

We've had a lot of great blog traffic about Vinismo. I thought I'd try to pull together a few of the better ones.

vinismo! wiki wine compendium from Hugh McGuire.
Vinismo Opens Its Doors to Wine Lovers Everywhere from the Creative Commons Weblog. Thanks, guys!
Vinismo: Wiki + Wine from Zach Copley
Vinismo from Wendy Copley.
DemoCampMontreal3 report from Montreal Tech Watch
DemoCampMontreal3 Report from Josh Nursing
Vinismo Presentation at DemoCampMontreal3 on the SearchAnyway Blog. Probably one of my favourite pieces, although it makes Niko real mad since it cuts off just before he starts talking. Make mental note: don't say "Uh" so much.
Bienvenue sur Vinismo, le guide des vins libre by Niko. Niko also bemoaned the difficulty of keeping our project under wraps in Mon lourd secret bientôt dévoilé. What a drama queen.
Vinismo : un wiki sur le vin by Olivier Niquet. He's suspicious that we are sponsored by J. Lohr, because there are so many J. Lohr wines on our front page.

All in all it's been pretty good. I hope we'll get some more, though! tags: blogs vinismo

DemoCampMontreal3 report

So, it's been a couple of days and I should probably get around to posting my own DemoCampMontreal3 report.
Niko and I started off with our own demo for Vinismo. It was a lot of fun: we talked about the reasons for starting the site, the technical, information architecture and graphics/UI design challenges, and what our future extensions are going to be. At the end of it, we took some questions, which was fun. The most interesting for me was from Roberto Rocha, whose Technocité is one of my favourite tech columns in Montreal. He asked, "Your typical contributor will be much older. What will you do to make your wiki more accessible to them?" It was a good question I don't have an answer to yet, but I want to think about it more.
The second demo was by Heri Rakotomalala, who showed off his social-networking GTD tool, WorkCruncher. It's a TODO list with a twist: items that you don't get done age off the list. You have to re-commit to doing a task on an almost daily basis. I think it's a great and refreshing design; my TODO list gets depressingly long and filled with unfinishable tasks, and I get too intimidated to work on ones that really matter. I think Heri might have to make some concessions to people's expectations for TODO lists -- maybe a way to automatically archive tasks, rather than deleting them entirely...?
The third demo was by the gang from Defensio, who are providing a great anti-spam Web service similar to Akismet. They had a few examples of where they're different, but I'm not well-versed enough in comment spam issues to understand them. My guess is that since they're getting into the market after Akismet, though, they have the opportunity to make a smarter technology. Their one downside? They used slides -- which is against the rules of DemoCamp. They did demo the service, though.
Fourth up was the indefatigable Simon Law. Simon's project? To turn back time. Talk about ambitious! His effort consisted of making a typical kitchen clock turn backwards. He disassembled the clock and explained how it worked to the audience. It was great, except for two things: the clock didn't work at the end of the demo (although he got it working by the end of DemoCamp), and he took a few minutes to draw a diagram of the clock; in my mind, that's just a low-tech PowerPoint slide.
Fifth, and quite fascinating, was a tool that Jerome Paradis showed off. It was a Google-Maps mashup that filtered special emails for an informal private jet sharing network. Apparently, companies who charter private jets often have space in the jets, so they'll make that space available. People who need a last-minute charter jet can send email from their Blackberries and such, and if there's availability they get contacted by the charter companies. The interesting part? These people use a highly structured lingo ("O/W" = "one way"), and Jerome's tool scrapes these emails to make the data into a mapping app for his customers. Very interesting!
All in all, it was a good night -- probably made better by the Argentina Cabernet Sauvignon I had. There's a DemoCampMontreal4 scheduled for August 17th, but I won't be in town for it. Too bad for me! tags: democampmontreal3 defensio workcruncher simonlaw jeromeparadis vinismo

Long fun weekend here. Last Thursday, Maj's brother Brian appeared on our doorstep unexpectedly. He'd been driving across the country from Santa Barbara for a few weeks, stopping to see friends and go camping and rock climbing along the way. Every few days he'd call our home number to tell us when he'd be coming in... but our home phone has had mixed-up wiring since our move on June 1. So his arrival was a really pleasant surprise.

That night we went out to dinner at Ouzeri, the upscale Greek restaurant around the corner. We had been trying to connect with Matt Biddulph and Alexandra Deschamps-Sonsino, who are visiting Montreal for a month. Matt and I were introduced at South by Southwest by connector Ben Cerveny, and Maj and Alex met separately and serendipitously at Montreal Pecha Kucha. So we did a big dinner at Ouzeri to talk and have fun. Boris Anthony came along, which made for a really great time.

We had a lot of good wines at Ouzeri -- they have the best-known Greek wine cellar in town. We started off with an Ayiotiriki white (in a cool bottle), then a Naoussa, then a Rapsani. All very tasty. Dinner was a mix of appetizers, moussaka, and a big grilled shrimp platter. Boris got a nice-looking slice of lamb, too. Oh, and we got Ouzeri's trademark saganaki: a big raw cheese, doused with ouzo and cooked flambé at the table.

Amita June doesn't like sitting at the table too much, so we took a walk outside so she could burn off some energy. A little bored, I twittered about where we were. Within 20 minutes, both Hugh and Niko cruised in to join us. Power of the Internet, dot dot dot.

We finished dinner and decided to take the talk on the road. Everyone came over to our house at 4690 rue Pontiac and we... uh... opened another bottle of wine and hung out and talked for a couple of hours. By the time the bars were open and active on Mont-Royal, everyone streamed out and Maj and I put Amita to sleep. Good time!
tags: montreal ouzeri mattbiddulph alexandredeschampssonsino borisanthony bencerveny greekwine hughmcguire niko

Incredible day here in Montreal. The temperature got over 31C today -- about 88 F -- which made for a steamy, jungly day. Remember how I said we had predictions of snow flurries two weeks ago? Things change quickly. Of course, hot weather and high humidity are a recipe for smog. Add on top of that the fact that Montreal is in the middle of a public transit strike, and you've got a serious air quality problem. Fortunately we should have some rain this weekend to shake that out. tags: weather montreal hot smog

EC2

I spent a big part of my day twiddling around getting a nice Ubuntu server running on Amazon Web Services. Amazon's EC2 is an innovative server-provisioning API; beta testers for EC2 can build or tear down servers for any purpose in a few minutes using EC2 and Amazon S3. I wrote a few years ago, in a widely-reproduced email, a reply making fun of Jeff Bezos and Amazon's supposed innovations. But let me be frank: Amazon Web Services are a shithouse crazy idea. I think that Jeff Bezos must have been a complete nutjob to bet the company on these zany technologies; I also think it's brilliant, and it's going to change the way we think about using computers. I included EC2 in Ten Web 2.0 APIs you can really use. I think that decision was really justified. My EC2 instances now run Ubuntu Feisty Fawn; lighty, MySQL and PHP. It seems to be a winning combination; I'm looking forward to using EC2 for a production Web or database server.
tags: ec2 amazonwebservices amazon jeffbezos oneeyedman

rel-edit

I mentioned already the great work that AboutUs.org is doing to organize a Universal Wiki Edit Button. I decided to kick in on the machine-readable side and proposed a rel-edit microformat. So far the microformats-discuss mailing list has been pretty positive on the idea, but I'm going to wait a few days before posting a draft on the microformats wiki.
tags: rel-edit microformats uweb mailinglist microformats-discuss

Salt

I just finished reading Salt: A World History, a nice non-fiction book by researcher extraordinaire Mark Kurlansky. The book covers this important mineral, its importance to human life, and the many ways to extract it to make it available for us. The book covers mummification in ancient Egypt, salt taxes in China, fish sauce in Vietnam, and Mahatma Gandhi's great salt march in India. It's so comprehensive that it can really make your head spin; but it's also exciting to see world history refracted through these whitish crystals. I think it's a great book, and I'm looking forward to reading Cod: A Biography of the Fish That Changed the World, by the same author. But we just got Everything is Miscellaneous from Amazon this week, so I think I'll be digging through that before I get to Cod.
tags: salt markkurlansky nonfiction nacl cod

reCAPTCHA

I heard about recaptcha via Hugh's article about same. Brilliant idea; why didn't someone think of this before? (Update: it was about 5 minutes after I posted this that I realized, "Hey! That can't work!" So I went back and re-read the docs on reCAPTCHA again. Now I'm even more impressed.)
tags: recaptcha ingenious ocr

I want to ride my

I've been thinking of picking up a bicycle for a few weeks now, as the weather has cleared here in town. So today I went up to Garantie Bicycle on rue Marie-Anne and bought the cheapest damn bike they had that wasn't made specifically for pre-pubescent children. Woohoo!

The last time I had a bike, I lived in San Francisco. It was a beautiful cherry-red Cannondale hybrid -- with that great fat Cannondale tube, but light enough that I could lift it with one finger. I rode it around SF a lot, and took it out for trail biking on weekends in Marin County and the Peninsula and even the Sierra Nevada. It was a tough bike to ride, but once you got used to it it was a dream to take up hills.

But that bike was stolen at Burning Man 2001, during the actual Burn. It's a classic mistake: professional bike thieves go to Burning Man each year to snag bikes left unlocked by tired and idealistic Burners, especially at times when camps are left empty, like during the Burn. To be honest, it was kind of a relief: I'd already moved out of my apartment and was planning a trip around North America, and I didn't have room in my Citroën DS for a bike. Nor in my storage locker at the weird and wonderful Sunshine Storage in Oakland. But it was too nice a bike to throw away or give to one of my no-good friends, who were mostly too short for it anyways. So bicycle theft was the best solution. Sam Phillips had his bike stolen at the same time. The bike thieves left a lot of cheapo bikes around our camp. They liked mine and Sam's, though.

Anyways, my new bike was quite inexpensive, and it weighs a metric ton. It's built like you're supposed to drive trains over it. I could never, ever carry it up a steep and muddy hill, and I wouldn't bother. Fortunately for me Montreal is really, really flat, so I don't really need to worry about riding this thing up hills. It will look pretty good with a baby seat on the back, though.
tags: montreal garantiebicycle burningman bicycle theft samphillips sunshinestorage oakland cannondale

I just finished reading Ben Yoskovitz's Top 10 reasons to join a startup, and I feel like I have to say something. As an experienced programmer, I think of a startup as an abusive relationship. Most technology startups I've been in have been run by charismatic sociopaths with no actual management or technology experience. They make wild, ignorant, unfounded promises to employees, investors and potential customers, then get increasingly anxious and unpleasant as their unsupportable predictions of schedule and market share don't come through. Employees end up working 80-hour weeks, typically for low or no pay and worthless stock options that never pay out. Guilty that living the "startup lifestyle" (work from home, come in late, have fun) has caused schedule or feature slips, they push themselves too hard at work, to the detriment of their health and home life.

It doesn't help you much in getting another job later. Most startups don't succeed, which means that you end up with 1, 2 or 3 years of space on your résumé consumed by a made-up company name that no one has ever heard of and whose phone line has been disconnected. If people have heard of the name, they'll associate it with failure ("Oh, yeah, wasn't that the company that spent all that money and went nowhere?").

On top of all that, the tech experience of working in a startup is usually pretty bad. The engineers are typically inexperienced -- just out of school, or close to it -- and they waste a lot of time and effort trying to reinvent the wheel. The idea that you're changing the world encourages this kind of bad engineering. You end up cutting a lot of corners and developing a lot of bad habits that you have to unlearn later on. Schedule pressures make you drop important engineering steps like analysis, design, unit tests, documentation. That is, if you even knew you were supposed to do them in the first place.
It takes a lot of experience in this kind of pressure-cooker environment to come out of a startup sane, healthy, and wealthy. The vast majority of people end up with broken relationships, poor mental and physical health, and not much in the way of money to show for their efforts. tags: startup benyoskovitz work

Wikiclock in the news

So, Niko, Hugh, Mike and Leoben Richardson all blogged about the Wikiclock. As far as I know, no MSM coverage yet, but it should be coming down the pipe any day. I talked to Niko tonight, and he said that he saw Hugh at Laïka, and Hugh was updating the wikiclock. They traded off updating all afternoon. Sigh. It's actually quite addictive.
tags: wikiclock niko hughmacguire leonardrichardson mikelinksvayer

Transit strike

We've got a transit strike on in Montreal (see the CBC for deetz). It's a pretty heinous problem -- the maintenance workers are asking for X% pay increases over Y years with blah blah blah pension and fie fo fum and... Whatever. The city is refusing to budge, which makes our mayor, Gérald Tremblay, officially a hypocritical idiot. Tremblay has been wagging fingers at provincial, federal and international figures for years about sticking to the Kyoto protocol, but when the hard decisions had to be made, he was willing to shut down Montréal's transit system... in summer... after one of the warmest winters in history. You can blame both sides on this issue, but when it comes down to it the union's job is to get the best deal it can for its workers, and Tremblay's job is to keep the Montreal Métro running. Someone's not doing his job.

The city spends millions trying to get commuters out of their cars and into the public transit system. And all that work's out the window, now. Despite the public call for more people to rollerblade to work (riiiiiiiight), traffic is way up and auto use is way up. That means air quality is going to go down, and our city can go from being an environmental leader to an environmental mistake. I'm glad to hear that the Quebec government is stepping in to break up the fight, but I hope it doesn't take too long.
tags: montreal quebec metro strike

Calling graphic designers: Universal edit button

One of the interesting movements to come out of RoCoCoCamp this weekend is the idea of a universal wiki edit button. This would be an icon similar to the RSS radio-wave icon that's become ubiquitous for indicating an RSS feed. The UWEB would symbolize that the current page can be edited. Folks at AboutUs.org have taken the lead on this effort, which could be really interesting. Getting a universally-understood sign for "edit this page" would greatly help with acceptance of the wiki way by the general public. I don't think it'd have to be limited to wikis; any page that could be edited (e.g., on a CMS) could also have the UWEB. I highly encourage skilled graphic designers who want to take a crack at designing the Next Big Icon to look at the UWEB effort and give it a try.
tags: graphicdesign universalwikieditbutton wiki rocococamp aboutus.org icon

It's been a busy week here and I've been lax in updating my blog. It's probably a good thing to put on the chopping block -- of all the possible things going on in my life -- but I'm regretting not keeping up with it.

We had a good weekend last weekend. Maj's Mother's Day present request was to clean out all our closets in the house and take extra clothes to the Petits Frères up the street. It took most of the day Saturday, and we finished with twelve garbage bags full of clothes for the Frères. Check out Maj's photo of the pile of clothes. Sunday morning Amita June and I made Mama a nice breakfast of waffles and (tofu) sausage. Amita's job was to stir the batter (which she did a good job at) and not to touch anything hot (not so good). That afternoon, Maj went to get a haircut at the salon down the street, and I took the opportunity to give Amita June her first haircut. I think she looks like Twiggy.

This week has been a maelstrom of activity. We have RoCoCoCamp coming up on Friday, and we're still tying all the loose ends together. I had most of my tasks done yesterday, so at the organizational meeting last night I saddled myself with more. D'oh! I think my main work this week is going to be encouraging local Montreal tech scenesters to come. I especially want to make sure that important women in the local Internet scene are present, so I'm going to copy Hugh McGuire's invite for BarCampMontreal2 and do some link-baiting:

Martine Page, filmmaker and journalist who's an anchor of the local blogging community
Véro B., vodcaster and blogger and general communicator
m-c turgeon, podcaster and blogger and founder of Stars of the Web
Marie-Joe, who provides a crucial link between the local blogging and game communities and is obsessed with Wikipedia

tags: mothersday amitajune rocococamp barcampmontreal montreal women

Interview

There's a really good interview with me and Maj on the Internet Brands Developer Blog. It's gotten into the short list of interviews I point press to when they want background about Wikitravel -- along with the Creative Commons interview. I'm really glad that IB has started this blog to give the outside world a peek at what's happening out there in El Segundo. And it's also a great way for IB to highlight some of the Open Source work they're doing, like the neat-o new AJAX tool kit IBDOM.
tags: interview ib ibdom ibbydev
