Search Results: "James Morrison"

20 September 2013

James Morrison: Penances

I wrote up some potential penances a long time ago. Here is what I thought of (or what I've had to do :) ):

First merge conflict from whitespace: Use a code review tool like Gerrit that shows trailing whitespace.

First merge conflict from whitespace: Add pre-commit hooks that remove whitespace

First time a feature is broken: Add a test for the feature, including the infrastructure for the test.

First time the client and server releases get out of sync: Run the client tests against both the released server and the server code at top of tree.
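The "pre-commit hooks that remove whitespace" penance could be as small as the following. This is a hypothetical helper, not part of git or Gerrit: a .git/hooks/pre-commit script would pass it the staged file names (for example from `git diff --cached --name-only`) and exit with `sys.exit(main(paths))`. It fixes the files in place and returns 1 when it changed something, so the commit stops and the fixes can be reviewed and re-staged.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of each line, preserving line endings."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # "\n", "\r\n" or "" on the last line
        out.append(body.rstrip(" \t") + ending)
    return "".join(out)


def main(paths):
    """Fix each file in place; return 1 if anything changed so the commit stops."""
    changed = []
    for path in paths:
        # newline="" keeps \n vs \r\n exactly as they are on disk.
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        fixed = strip_trailing_whitespace(original)
        if fixed != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(fixed)
            changed.append(path)
    for path in changed:
        print("stripped trailing whitespace:", path)
    return 1 if changed else 0
```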


James Morrison: Legacy code

I once attended a talk where the main point was that legacy code is any code that wasn't written with tests. Now I think legacy code is any code written without both tests and code review. If you have only one of the two, the code is still legacy: it has a bus factor of one.

4 June 2013

James Morrison: Compiling namecoin

To compile namecoind, remember to fetch the following dependencies (at least on Debian):

28 May 2013

Russell Coker: Links May 2013

Cameron Russell (who works as an underwear model) gave an interesting TED talk about beauty [1].

Ben Goldacre gave an interesting and energetic TED talk about bad science in medicine [2]. A lot of the material is aimed at non-experts, so this is a good talk to forward to your less scientific friends.

Lev wrote a useful description of how to disable JavaScript for one site without disabling it for all sites, which was inspired by Snopes [3]. This may be useful some time.

Russ Allbery wrote an interesting post about work and success titled "The Why? of Work" [4]. Russ makes lots of good points and I'm not going to summarise them (read the article, it's worth it). There is one point I disagree with: he says "You are probably not going to change the world". The fact is that I've observed Russ changing the world. He doesn't appear to have done anything that will get him an entry in a history book, but he's done a lot of good work in Debian (a project that IS changing the world) and his insightful blog posts and comments on mailing lists influence many people. I believe that most people should think of changing the world as a group project where they are likely to be one of thousands or millions who are involved; then you can be part of changing the world every day.

James Morrison wrote an insightful blog post about what he calls "Penance driven development" [5]. The basic concept of doing something good to make up for something you did which had a bad result (even if the bad result was inadvertent) is probably something that most people do to some extent, but formalising it in the context of software development work is a concept I haven't seen described before.

A 9yo boy named Caine created his own games arcade out of cardboard. When the filmmaker Nirvan Mullick saw it, he created a short movie about it and promoted a flash mob event to play games at the arcade [6]. They also created the Imagination Foundation to encourage kids to create things from cardboard [7].

Tanguy Ortolo describes how to use the UDF filesystem instead of FAT for USB devices [8]. This allows you to create files larger than 2G while still allowing the device to be used on Windows systems. I'll keep using BTRFS for most of my USB sticks though.

Bruce Schneier gave an informative TED talk about security models [9]. Probably most people who read my blog already have a good knowledge of most of the topics he covers. I think that the best use of this video is to educate less technical people you know.

Blaine Harden gave an informative and disturbing TED talk about the concentration camps in North Korea [10]. At the end he points out the difficult task that will follow the fall of North Korea: helping people recover from their totalitarian government.

Bruce Schneier has an interesting blog post about the use of a motherboard BMC (IPMI and similar) to compromise a server [11]. Some business class desktop systems and laptops also have similar functionality.

Russ Allbery wrote an insightful article about the failures of consensus decision-making [12]. He compares the Wikipedia and Debian methods, so his article is also informative for people who are interested in learning about those projects.

The TED blog has a useful reference article with 10 places anyone can learn to code [13].

Racialicious has an interesting article about the people who take offense when it's pointed out that they have offended someone else [14].

Nick Selby wrote an interesting article criticising the Symantec response to the NYT getting hacked, which also criticises anti-virus software in general [15]. He raises the point that most of us already know: anti-virus software doesn't do much good. Securing Windows networks is a losing game.

Joshua Brindle wrote an interesting blog post about security on mobile phones and the attempts to use hypervisors for separating data of different levels [16]. He gives lots of useful background information about how to design and implement phone based systems.

14 May 2013

James Morrison: Google IO Predictions: Appengine

Well, I put out crappy predictions for Google IO related to Android. So here are my crappy predictions for Appengine:
It should be obvious, but I really have no clue what Appengine-related things could be announced. If PHP is not there, then the Appengine team deserves some applause for their sleight of hand. The rest is mostly my wishlist :)

11 May 2013

James Morrison: Google IO predictions

I don't work at Google, so I can play the Google IO prediction game. Here are my Android predictions:
  1. New Android version -- 99%
  2. New Nexus 7 -- 90%
  3. Upgraded storage on the Nexus 4 -- 70%
  4. T-Mobile LTE for Nexus 4 -- 40%
  5. AT&T LTE for Nexus 4 -- 30%
  6. Verizon Nexus 4 -- 15%
So, treating them as independent, I think that means there is roughly a 1% chance (0.99 * 0.9 * 0.7 * 0.4 * 0.3 * 0.15 ≈ 0.011) of all of these things happening.
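The arithmetic behind that 1% figure, as a quick sketch. Multiplying the probabilities is only valid if the six predictions are independent, which is an assumption; the carrier LTE items in particular are probably correlated, so the product is a rough estimate rather than an exact answer.

```python
from math import prod

# Estimated probabilities copied from the prediction list.
predictions = {
    "New Android version": 0.99,
    "New Nexus 7": 0.90,
    "Upgraded storage on the Nexus 4": 0.70,
    "T-Mobile LTE for Nexus 4": 0.40,
    "AT&T LTE for Nexus 4": 0.30,
    "Verizon Nexus 4": 0.15,
}

# Joint probability under the (assumed) independence of the events.
p_all = prod(predictions.values())
print(f"P(all six) = {p_all:.4f}")  # prints P(all six) = 0.0112
```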

30 January 2013

James Morrison: Married

In response to my own post from 7 years ago, "Bachelorhood", I would like to announce that I am a much better married man than Simon. My bachelorhood days are truly over: my laundry goes in a laundry basket, and my clothes are always put away (whether I like it or not). My fridge is plugged in and has real food in it; I have a cell phone (actually numerous cell phones); I have an awesome custom made table; lastly, I have a TV in my house. Bachelorhood officially ended last year; 30 years was a good run :)

James Morrison: ndb OR query with cursors

I have two annoyances with Appengine's NDB OR queries. One is that the order must be by key; the other is that OR queries are done in series, so ndb.OR(a == 1, a == 2) is run as query(a == 1) then query(a == 2). To work around this, I created a new cursor class that allows sorting by any attribute.

Using this cursor comes with its own caveats: first, the order between the queries is not maintained (fixable), and second, the limit applies to each query rather than to the combined results of all queries (fixable).

# Python 2.7 (App Engine standard runtime).
import base64
import json
import logging

from google.appengine.ext import ndb


class MultiCursor(object):
  """Tracks one ndb.Cursor plus a '<key>_done' flag per sub-query of an OR query.

  Keys must be strings, since they are stored as JSON object keys.
  """

  @staticmethod
  def from_websafe_string(value):
    values = None
    try:
      value = base64.urlsafe_b64decode(str(value))
      values = json.loads(value)
    except Exception as e:
      logging.error('Invalid Cursor %s - %s', e, value)
    if not values:
      return MultiCursor()

    cursors = MultiCursor()
    for key, cursor in values.iteritems():
      try:
        if isinstance(cursor, basestring):
          # A serialized ndb.Cursor for one sub-query.
          cursors.values[key] = ndb.Cursor.from_websafe_string(cursor)
        else:
          # A plain JSON value: a '<key>_done' flag or None.
          cursors.values[key] = cursor
      except Exception:
        logging.error('MultiCursor.from_websafe_string - bad cursor (%s) for %s in %s',
                      cursor, key, values)
    return cursors

  def to_websafe_string(self):
    values = {}
    for key, value in self.values.iteritems():
      if isinstance(value, ndb.Cursor):
        values[key] = value.to_websafe_string()
      else:
        values[key] = value
    return base64.urlsafe_b64encode(json.dumps(values))

  def __init__(self, values=None):
    self.values = values or {}

  def get(self, key):
    """Return (start_cursor, done) for the sub-query identified by key."""
    return self.values.get(key), self.values.get(key + '_done')

  def set(self, key, cursor, more):
    self.values[key] = cursor
    self.values[key + '_done'] = not more

  def __len__(self):
    return len(self.values)

To use the cursor, the code should look something like this (Foo is the model being queried):
@ndb.tasklet
def _query_A_by_value_async(value, cursor, limit=100):
  start_cursor, done = cursor.get(value)
  if not done:
    query = Foo.query(Foo.A == value)
    # order() returns a new query, so chain it onto the filtered query.
    query = query.order(-Foo.last_access_time)
    foos, end_cursor, more = yield query.fetch_page_async(
        limit, start_cursor=start_cursor)
    cursor.set(value, end_cursor, more)
    raise ndb.Return(foos, end_cursor, more)
  # This sub-query was already exhausted on an earlier page.
  raise ndb.Return([], None, False)
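Both caveats above are marked fixable. One way to address them, sketched here in plain Python without ndb (merge_pages and its arguments are hypothetical names), is to merge the per-value pages, each already sorted by the same key, into a single ordered page and apply the limit to the combined result. The consumed counts show how far each sub-query actually got; a complete fix would also need to move each sub-query's cursor to just after its last consumed item rather than to the end of its fetched page, which is not shown.

```python
import heapq
import itertools


def merge_pages(pages, sort_key, limit, reverse=False):
    """Merge per-sub-query result pages into one ordered page of at most `limit`.

    pages: dict mapping a sub-query label (e.g. the value in Foo.A == value)
      to a list of results already sorted by sort_key (descending if reverse).
    Returns (merged_results, consumed) where consumed[label] counts how many
    items from that sub-query made it into the merged page.
    """
    # Tag each result with its label so we can count per-query consumption.
    tagged = [[(label, item) for item in items] for label, items in pages.items()]
    merged = heapq.merge(*tagged, key=lambda pair: sort_key(pair[1]), reverse=reverse)
    top = list(itertools.islice(merged, limit))
    consumed = {label: 0 for label in pages}
    for label, _ in top:
        consumed[label] += 1
    return [item for _, item in top], consumed
```

For a descending sort such as -Foo.last_access_time, pass reverse=True; each sub-query should fetch up to `limit` items, since in the worst case the whole merged page comes from one of them.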

15 May 2012

James Morrison: Testing email receive for appengine

It's not obvious how to test email receive handlers in Appengine. The important observation is that the handlers take HTTP POSTs whose body is the MIME-encoded email itself. In Python you can build an email to be handled with the following code:
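from email.message import Message


def test_email(self):
  # Build a multipart/alternative message with a single text/plain part.
  # self.boundary is a fixed MIME boundary string defined on the test case.
  body = Message()
  body.add_header('to', 'test-unknown@other-app.com')
  body.add_header('from', 'test@app.com')
  body.add_header('Content-Type', 'multipart/alternative', boundary=self.boundary)
  text = Message()
  text['content-type'] = 'text/plain'
  text.set_payload('I am I! Don Quixote! The man of La Mancha!')
  body.attach(text)

  # post() is the test harness's helper that sends the raw MIME string as the
  # body of a POST to the mail handler URL (/_ah/mail/<address>).
  post(payload=body.as_string())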
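To sanity-check the message a test posts without App Engine at all, the same email can be built with the standard library's MIME classes and parsed back, roughly the way a handler sees its POST body. This is a stdlib-only sketch; build_test_email and plain_text_parts are hypothetical helper names, and the addresses come from the test above.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_test_email(to_addr, from_addr, text):
    """Build a multipart/alternative message with one text/plain part."""
    msg = MIMEMultipart("alternative")
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg.attach(MIMEText(text, "plain"))
    # as_string() picks a MIME boundary automatically.
    return msg.as_string()


def plain_text_parts(raw):
    """Parse a raw MIME string (e.g. a POST body) and return its text/plain bodies."""
    parsed = message_from_string(raw)
    parts = []
    for part in parsed.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            parts.append(payload.decode(part.get_content_charset() or "utf-8"))
    return parts
```

Letting the library choose the boundary avoids having to keep a fixed boundary string on the test case.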

19 April 2012

James Morrison: Issues with Appengine

I like App Engine: my wedding website runs on it, as do a couple of recent commercial projects. However, there are some annoyances that make a lot of the useful things I learned at Google not applicable.

  1. Chunked responses
    App Engine doesn't have a way of doing chunked responses.
    1.5. Processing after sending the response: my annoyance here is that I can't send an early response and keep waiting for some resource. Say I want to do 5 URL fetches whose results I cache for a short period of time. If I want to reply in 250ms, I want to use whatever results I've gotten by then and reply to the user. Then, once the rest of the fetches come in, I want to put them in the cache so they come back quickly the next time.
  2. Counters
    App Engine has crappy support for counters. You can shard your counters into the datastore: https://developers.google.com/appengine/articles/sharding_counters#counter_python or you can use task queues: http://blog.appenginefan.com/2009/10/non-sharded-counters-part-2-using-task.html . Now, how do you graph these counters over time? That is left as an exercise for the reader. Google has special datastores for storing and retrieving counters.
  3. Backends
    Backends in App Engine are versioned separately from frontends. This sounds like a good idea, but to run the devserver with backends, the backends need to be in the same source tree, so backends share the same URL space. Backends also share the same app.yaml, so again the URL space is shared. Thus, if you make a change that you want to coordinate between the frontend and the backend, you need to change the application version in app.yaml, then rename all the backends in backends.yaml. If backends are supposed to be in the same source tree as frontends, then they should be prefixed with the application version. If backends should be versioned separately from frontends, then they shouldn't share URL handlers from app.yaml.
    3.5. Uploading backends is slow
    Each backend is uploaded through appcfg.py separately. So if you are crazy and have one backend per backend level, you need to upload 4 copies of the source code each time the backends are updated.
  4. Logging
    I want structured logs. Instead, you have to use the Python logging module to write out human-readable logs, then parse them into the structure you want.
    http://code.google.com/p/google-app-engine-samples/source/browse/trunk/logparser/logparser.py
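What item 1.5 asks for (reply with whatever arrived by the deadline, then let the stragglers fill the cache) looks roughly like this in a runtime that allows background threads. It is a plain-Python sketch using concurrent.futures, not something App Engine offered at the time; fetch_with_deadline, the dict standing in for memcache, and the fake fetchers are all hypothetical.

```python
import concurrent.futures as futures
import time


def fetch_with_deadline(fetchers, deadline_s, cache, executor):
    """Start every fetch, return what finished within deadline_s, and let the
    slower fetches populate `cache` after the caller has replied.

    fetchers: dict mapping a cache key to a zero-argument callable.
    cache: a dict standing in for memcache.
    """
    pending = {executor.submit(fn): key for key, fn in fetchers.items()}
    done, not_done = futures.wait(pending, timeout=deadline_s)
    results = {}
    for fut in done:
        key = pending[fut]
        # A failed fetch re-raises here; real code would catch and skip it.
        results[key] = cache[key] = fut.result()
    for fut in not_done:
        key = pending[fut]
        # Runs on the worker thread once the slow fetch completes.
        fut.add_done_callback(lambda f, key=key: cache.__setitem__(key, f.result()))
    return results


def demo():
    """Two fake fetches: one fast, one slower than a 250ms deadline."""
    def fast():
        return "fast result"

    def slow():
        time.sleep(0.5)
        return "slow result"

    cache = {}
    with futures.ThreadPoolExecutor(max_workers=2) as executor:
        reply = fetch_with_deadline({"a": fast, "b": slow}, 0.25, cache, executor)
        # `reply` is what the user would get at roughly the 250ms mark.
    # Leaving the `with` block waits for the slow fetch, whose callback fills the cache.
    return reply, cache
```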

As a preemptive counterargument: 1.5 could be dealt with using backends and a background thread, except backends don't scale with the number of queries, so my use case above still isn't solved.
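For the counter-graphing "exercise for the reader" in item 2, one approach is to key shards by time bucket as well as by counter name, so the shard totals double as a time series. The sketch below keeps everything in an in-memory dict standing in for datastore entities; ShardedCounter, NUM_SHARDS and the one-minute bucket are hypothetical choices, not App Engine's sharding-counter sample.

```python
import random
from collections import defaultdict

NUM_SHARDS = 20  # hypothetical; more shards means less write contention per entity


class ShardedCounter:
    """In-memory stand-in for datastore-backed sharded counters.

    Each increment touches one randomly chosen shard for the current time
    bucket, so concurrent writers rarely hit the same entity. Keying shards by
    bucket as well as name keeps the history needed to graph the counter.
    """

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self.shards = defaultdict(int)  # (name, bucket, shard) -> count

    def _bucket(self, timestamp):
        return int(timestamp // self.bucket_seconds)

    def increment(self, name, timestamp, delta=1):
        shard = random.randrange(NUM_SHARDS)
        self.shards[(name, self._bucket(timestamp), shard)] += delta

    def series(self, name):
        """Return [(bucket_start_seconds, total)] sorted by time, ready to plot."""
        totals = defaultdict(int)
        for (counter, bucket, _shard), count in self.shards.items():
            if counter == name:
                totals[bucket] += count
        return [(b * self.bucket_seconds, totals[b]) for b in sorted(totals)]
```

Reading a series means summing all shards for each bucket, which is the usual trade of cheap writes for more expensive reads.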

16 March 2012

James Morrison: Penance driven development

Recently, I got to try working in a couple of different environments. Working in them helped me formalize some of the things I like and some of the things I don't. Through these new environments I've come to recognize my style of development. It's called penance driven development, because there are two important things:
1) It's better to ask forgiveness than permission.
2) You need to be deserving of forgiveness.

The concept is simple: whenever you break something, you first fix the symptom, then you improve the infrastructure so that similar breaks are harder to make. The penance doesn't have to be costly, but it has to be done; otherwise technical debt accumulates and people believe there is no downside to breaking things.

What needs to be improved depends on what is missing. For example, if an engineer broke a feature that didn't have a test, the penance should be to write a test, even if the person who broke the feature isn't its owner or author. If someone breaks the build and no one notices because the continuous build doesn't send email, the fix could be setting up the continuous build to email out build failures or, even better, running each change on a build bot before allowing commits.

I think I'll have to start a penance of the day blog/twitter stream.

I've been practicing this style of development for most of my career, and it's similar to, if less precise than, the 5 Whys. As mentioned in The Lean Startup, the cost of the penance should be proportional to the cost of the break.

10 February 2012

James Morrison: Daffodils

Dear Lazyweb,

I'd like a video of the 90s commercial "That's what daffodils do". If you do that, I'll release a basic library for the iPhone to use SPDY[1].

[1] I said basic!

24 January 2012

James Morrison: Appengine

I've been working with Appengine for a few months now. I've found out the hard way that appcfg.py rollback is useless. I've learned to create a git branch for each refactoring I start on, and the branch also gets a new app version. When I push my changes to the live site, I upload one last time to the new app version, merge my branch into master, then switch the live version from the old one to the new one.

I haven't learned how this workflow translates to having multiple developers. Hopefully it is simple enough that other people can follow it.
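Giving each branch its own app version is easier if the version name is derived from the branch name mechanically. A minimal sketch with hypothetical helper names: the allowed character set (lowercase letters, digits, hyphens) is an assumption about App Engine version ids, so check the current rules, and check appcfg.py's help for the option that overrides the version on upload.

```python
import re
import subprocess


def current_branch():
    """Name of the checked-out git branch."""
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()


def branch_to_version(branch):
    """Map a branch name to a version id of lowercase letters, digits and hyphens.

    The allowed character set is an assumption, not a documented guarantee.
    """
    version = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    return version or "default"
```

Deriving the name instead of picking it by hand means anyone checking out the branch uploads to the same version, which may help with the multiple-developer question.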

I also use make since it's an easy way to automate tasks. I hear redo is good, but for Python I'm not really compiling anything; I'm simply writing shell scripts, and make is a great shell script dispatcher.

James Morrison: Appengine backends and task queues

To get Appengine to send queued tasks to a backend, you need to set the Host header when queuing the task. E.g.
from google.appengine.api import backends
from google.appengine.ext import deferred

deferred.defer(
    batch.DoStuff, arg1, arg2, arg3,
    _headers={'Host': backends.get_hostname(backend='backend_name')})

James Morrison: Make is still my friend


The following is a makefile fragment I seem to start each of my appengine projects with.

GAEPATH = $(HOME)/bin/google_appengine
PORT=8081

# One lint stamp file per source file: dir/foo.py -> dir/.foo.lint
PYLINTS = $(wildcard *.py */*.py */*/*.py)
PYLINTFILES = $(patsubst %.py,.%.lint,$(notdir $(PYLINTS)))
PYLINT = $(join $(dir $(PYLINTS)),$(PYLINTFILES))

PYTHONPATH=$(GAEPATH):$(GAEPATH)/lib/yaml/lib:$(GAEPATH)/lib/webob:$(GAEPATH)/lib/django_0_96:.

APP=new-app

# Start the local dev server with a per-app datastore file.
run:
	$(GAEPATH)/dev_appserver.py ./ --port=$(PORT) --datastore_path=/tmp/$(APP).dev_appserver.datastore

# Re-lint a file only when it is newer than its stamp.
.%.lint: %.py
	@PYTHONPATH=$(PYTHONPATH) pychecker --only --no-miximport $<
	@touch $@

lint: $(PYLINT)

clean:
	-rm -f $(PYLINT)

.PHONY: run lint clean


James Morrison: CMYK images in PDFs

For the few of you out there who are parsing PDFs manually in Python, JPEG images (including CMYK images) can be extracted with the following code fragment.
# reader is a PdfFileReader object from pyPdf; value is the operand to a Do operator.
# filename and _CreateFile() are this script's own output helpers.
import io

from PIL import Image, ImageChops

xobject = reader.getObject(value)
if xobject['/Filter'] == '/DCTDecode':
  # DCTDecode streams hold JPEG data, so RGB images can be written as-is.
  raw_data = xobject.getRawData()
  if xobject['/ColorSpace'] == '/DeviceRGB':
    _CreateFile('image/jpeg', filename, raw_data)
  else:
    # Re-encode other color spaces (e.g. CMYK) as RGB JPEGs.
    i = Image.open(io.BytesIO(raw_data))
    if xobject['/ColorSpace'] == '/DeviceCMYK':
      i = ImageChops.invert(i)
    of = io.BytesIO()
    i.convert('RGB').save(of, 'JPEG')
    _CreateFile('image/jpeg', filename, of.getvalue())

The CMYK images I found in the PDFs needed to be inverted. PIL versions before 1.1.7 would do that for you, but version 1.1.7 removed that, hence the ImageChops.invert() call.

10 July 2011

James Morrison: Code reviews

I'm reading through http://scientopia.org/blogs/goodmath/2011/07/06/things-everyone-should-do-code-review/ and I'll make my comments here, since I expect they will be long. As a caveat, I still work at Google and have done a lot of code reviews. I've made a lot of mistakes while code reviewing and I've had difficult code reviews, but my biggest problems were non-code reviews. The other caveat is that for most of the code reviews, I know the code being changed better than the person changing it.

I'll reply to sections as I come across them. "Given a problem, there are usually a dozen different ways to solve it. And given a solution, there's a million ways to render it as code. As a reviewer, your job isn't to make sure that the code is what you would have written - because it won't be." I disagree with this. Code is often first written in (1) an ignorant way or (2) a sloppy way (to try to save trivial amounts of work). I agree that the code probably won't be the way the reviewer would have written it, but it is good to comment on how it could have been written. Many times the reviewee isn't familiar with (or ignores) the idiomatic style of the larger body of code.

"The second major pitfall of review is that people feel obligated to say something." It's fine to say nothing, but if that's what you say most of the time, then you are wasting the reviewee's time. If you do a first pass and don't find anything, you should go back, read the code again and be sure that you understand it. Most of the times I've seen people say nothing, it's because they didn't want to learn what the new (or changed) code does.

The last section is about speed. For speed, there are two types of code reviews. The first is quick, small and fixes something important; those should take only one round and typically only need the reviewer to ask for tests. Then there is everything else, which includes changes that are small and unimportant. For those, it's OK to take more time to start the review, but each iteration should get smaller and faster.

For anyone who's gotten this far, I guess I can say my comments about Mark's blog post are purely bike shedding, but that's the thing with code reviews: it's OK to say you think the bike shed should be neon pink, as long as you don't force the reviewee to paint it neon pink.
