Search Results: "Robert Collins"

24 September 2014

Robert Collins: what-poles-for-the-tent

So Monty and Sean have recently blogged about the structures (1, 2) they think may work better for OpenStack. I like the thrust of their thinking but had some mumblings of my own to add. Firstly, I very much like the focus on social structure and needs - what our users and deployers need from us. That seems entirely right. And I very much like the getting away from the TC picking winners and losers. That was never an enjoyable thing when I was on the TC, and I don't think it has made OpenStack better.

However, the thing that picking winners and losers did do was allow users to pick an API and depend on it, because it was "the X API for OpenStack". If we don't pick winners, then there is no way to say that something is "the X API for OpenStack", and that means there is no forcing function for consistency between different deployer clouds. And so this appears to be why Ring 0 is needed: we think our users want consistency in being able to deploy their application to Rackspace or HP Helion. They want vendor neutrality, and by giving up winners-and-losers we give up vendor neutrality for our users. That's the only explanation I can come up with for needing a Ring 0 - because it's still winners and losers (picking an arbitrary project, e.g. keystone) and grandfathering it in, if you will. If we really want to get out of the role of selecting projects, I think we need to avoid this. And we need to avoid it without losing vendor neutrality (or we need to give up the idea of vendor neutrality). One might say that we must pick winners for the very core just by its nature, but I don't think that's true. If the core is small, many people will still want vendor neutrality higher up the stack. If the core is large, then we'll have a larger percentage of APIs covered and stable, granting vendor neutrality. So a core with fixed APIs will be under constant pressure to expand: not just from developers of projects, but from users that want API X to be fixed and guaranteed available and working a particular way at [most] OpenStack clouds. Ring 0 also fulfils a quality aspect - we can check that it all works together well, in a realistic timeframe, with our existing tooling. We are essentially proposing to pick functionality that we guarantee to users, an API for that which they have everywhere, and the matching implementation we've tested.

To pull from Monty's post: "What does a basic end user need to get a compute resource that works and seems like a computer?" (end user facet). "What does Nova need to count on existing so that it can provide that?" He then goes on to list a bunch of things, but most of them are not needed for that: We need Nova (it's the only compute API in the project today). We don't need keystone (Nova can run in noauth mode, and deployers could just have e.g. Apache auth on top). We don't need Neutron (Nova can do that itself). We don't need cinder (use local volumes). We need Glance. We don't need Designate. We don't need a tonne of stuff that Nova has in it (e.g. quotas) - end users kicking off a simple machine have -very- basic needs. Consider the things that used to be in Nova: deploying containers, Neutron, Cinder, Glance, Ironic. We've been slowly decomposing Nova (yay!!!) and if we keep doing so we can imagine getting to a point where there truly is a tightly focused code base that just does one thing well. I worry that we won't get there unless we can ensure there is no pressure to be "inside Nova" to win.
So there's a choice between a relatively large set of APIs that make the guaranteed-available APIs comprehensive, or a small set that will give users what they need just at the beginning but might not be broadly available, leaving us depending on some unspecified process for the deployers to agree on and consolidate around which ones they make available consistently. In short, one of the big reasons we were picking winners and losers in the TC was to consolidate effort around a single API - not implementation (keystone is already on its second implementation). All the angst about defcore and compatibility testing is going to be multiplied when there is lots of ecosystem choice around APIs above Ring 0, and the only reason that won't be a problem for Ring 0 is that we'll still be picking winners.

How might we do this? One way would be to keep picking winners at the API definition level but not the implementation level, and make it possible for the competition to replace something entirely if they implement the existing API [and win the hearts and minds of deployers]. That would open the door to everything being flexible, and it has happened before with Keystone. Another way would be to not even have a Ring 0. Instead, have a project/program that is aimed at delivering the reference API feature-set, built out of a single, flat Big Tent, and allow that project/program to make localised decisions about which components to use (or not). Testing that all those things work together is not much different from the current approach, but we'd have separated out, as a single cohesive entity, the building of a product (Ring 0 is clearly a product) from the projects that might go into it. Projects that have unstable APIs would clearly be rejected by this team; projects with stable APIs would be considered, etc. This team wouldn't be "the TC": they too would be subject to the TC's rulings. We could even run multiple such teams, as hinted at by Dean Troyer in one of the email thread posts. Running with that, I'd then be suggesting: OpenStack/NaaS would have an API or set of APIs, and they'd be responsible for considering maturity, feature set, and so on, but wouldn't own Neutron, or Neutron incubator, or any other component - they would be a *cross project* team, focused at the product layer rather than the component layer, which nearly all of our folk end up locked into today. Lastly, Sean has also pointed out that we have large N (N^2) communication issues. I think I'm proposing to drive the scope of any one project down to a minimum, which gives us more N but shrinks the size within any project, so folk don't burn out as easily, *and* so that it is easier to predict the impact of changes - clear contracts and APIs help a huge amount there.

29 August 2014

Robert Collins: Test processes as servers

Since its very early days subunit has had a single model: you run a process, it outputs test results. This works great, except when it doesn't. On the up side, you have a one-way pipeline - there's no interactivity needed, which makes it very, very easy to write a subunit backend that e.g. testr can use. On the downside, there's no interactivity, which means that any time you want to do something with those tests, a new process is needed - and that's sometimes quite expensive, particularly in test suites with tens of thousands of tests. Now, for use in the development edit-execute loop, this is arguably ok, because one needs to load the new tests into memory anyway; but wouldn't it be nice if tools like testr that run tests for you didn't have to decide upfront exactly how they were going to run? If instead they could get things running straight away and then give progressively larger and larger units of work to be run, without forcing a new process (and thus new discovery, directory walking and importing)? Secondly, testr has an inconsistent interface - if testr is letting a user debug things through to child workers in a chain, it needs to use something structured (e.g. subunit) and route stdin to the actual worker, but the final testr needs to unwrap everything - this is needlessly complex. Lastly, for some languages at least, it's possible to dynamically pick up new code at runtime - so with a simple inotify loop we could avoid new-process (and more importantly complete-enumeration) costs *entirely*, leading to very fast edit-test cycles. So, in this blog post I'm really running this idea up the flagpole, and trying to sketch out the interface and hopefully get feedback on it. Taking subunit.run as an example process to do this to:
  1. There should be an option to change from one-shot to server mode
  2. In server mode, it will listen for commands somewhere (let's say stdin)
  3. On startup it might eager load the available tests
  4. One command would be list-tests, which would enumerate all the tests to its output channel (which is stdout today, so let's stay with that for now)
  5. Another would be run-tests, which would take a set of test ids, and then filter-and-run just those ids from the available tests, output, as it does today, going to stdout. Passing somewhat large sets of test ids in may be desirable, because some test runners perform fixture optimisations (e.g. bringing up DB servers or web servers) and test-at-a-time is pretty much worst case for that sort of environment.
  6. Another would be std-in: a command providing a packet of stdin, used for interacting with debuggers
So that seems pretty approachable to me - we don't even need an async loop in there, as long as we're willing to patch select etc. (for the stdin handling in some environments like Twisted). If we don't want to monkey-patch like that, we'll need to make stdin a socketpair, and have an event loop running to shepherd bytes from the real stdin to the one we let the rest of Python have. What about that nirvana above? If we assume inotify support, then list_tests (and run_tests) can just consult a changed-file list and reload those modules before continuing. Reloading them just-in-time would be likely to create havoc - I think reloading only when synchronised with test completion makes a great deal of sense. Would such a test server make sense in other languages? What about e.g. testtools.run vs subunit.run - such a server wouldn't want to use subunit, but perhaps a regular CLI UI would be nice.
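To make the idea concrete, here is a minimal sketch of such a command loop - not the real subunit.run, and the list-tests / run-tests / quit command framing is assumed purely for illustration (a real server would emit subunit rather than use a plain TestResult):

import sys
import unittest


def _flatten(suite):
    """Yield the individual test cases inside a (possibly nested) suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            for sub in _flatten(item):
                yield sub
        else:
            yield item


def serve(stream=sys.stdout):
    """Tiny server mode: read commands from stdin, answer on stdout."""
    loader = unittest.TestLoader()
    # Eagerly load the available tests once, on startup.
    tests = {t.id(): t for t in _flatten(loader.discover('.'))}
    for line in sys.stdin:
        command, _, argument = line.strip().partition(' ')
        if command == 'list-tests':
            for test_id in sorted(tests):
                stream.write(test_id + '\n')
            stream.flush()
        elif command == 'run-tests':
            wanted = [i for i in argument.split() if i in tests]
            suite = unittest.TestSuite(tests[i] for i in wanted)
            suite.run(unittest.TestResult())  # a real server would stream subunit here
        elif command == 'quit':
            return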

5 May 2014

Robert Collins: Distributed bugtracking quick thoughts

Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new - the earliest I personally encountered was Aaron Bentley's Bugs Everywhere, coming up on its 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn't taken over the world, and I think this is in large part due to the propagation nature of bugs being very different to code - different solutions are needed. With distributed code versioning we often see people going to some effort to avoid conflicts - semantic conflicts are common, and representation conflicts extremely common. Take for example https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/805661. Here we can look at the nature of the content:
  1. Concurrent cannot-conflict content, e.g. the discussion about the bug. In general everyone should have this in their local bug database as soon as possible, and anyone can write to it.
  2. Observations of fact, e.g. "the code change that should fix the bug has landed in Ubuntu" or "Commit C should fix the bug".
  3. Reports of symptoms, e.g. "Foo does not work for me in Ubuntu with package versions X, Y and Z".
  4. Collaboratively edited metadata: tags, title, description, and arguably even fields like package, open/closed, importance.
Note that only one of these things - the commit to fix the bug - happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually, conflicts can happen in any of those fields except 1). Anyhow, my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline: e.g. adding a tag "foo" is an event to add "foo", rather than an event setting the tags list to "bar,foo"; then multiple editors adding "foo" do not conflict (or need special handling). Collaboratively edited fields would likely be unsatisfying with this approach though - last-writer-wins isn't a great story. OTOH the number of people that edit the collaborative fields on any given bug tends to be quite low, so one could defer that to manual fixups. Further, as a developer wanting local access to my bug database, syncing all of these things is appealing - but if I'm dealing with a million-bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space). Lastly, I don't think distributed bug tracking is needed - it doesn't solve a deeply burning use case - offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence-based assessment of quality - something I think we badly need.
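A tiny sketch of the "changes as events" idea for tags, purely illustrative: each editor records add/remove events, and the current tag set is derived by replaying the merged event log, so two people adding "foo" concurrently never conflict.

def apply_tag_events(events):
    """events: iterable of (timestamp, action, tag), action in {'add', 'remove'}."""
    tags = set()
    for _, action, tag in sorted(events):
        if action == 'add':
            tags.add(tag)
        elif action == 'remove':
            tags.discard(tag)
    return tags

# Two replicas both add "foo"; merging their logs is a simple union of events.
replica_a = [(1, 'add', 'foo'), (2, 'add', 'regression')]
replica_b = [(1, 'add', 'foo'), (3, 'remove', 'regression')]
print(apply_tag_events(replica_a + replica_b))  # {'foo'}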

2 November 2013

Robert Collins: Learning is hard

I feel like I'm taking a big personal risk writing this, even though I know the internet is large and probably no-one will read this :-). So, dear reader, please be gentle. As we grow as people, as developers, as professionals, some lessons are hard to learn (e.g. you have to keep trying and trying to learn the task), and some are hard to experience (they might still be hard to learn, but just being there is hard itself). I want to talk about a particular lesson I started learning in late 2008/early 2009 while I was at Canonical - sadly one of those that was hard to experience.

At the time I was one of the core developers on Bazaar, and I was feeling pretty happy about our progress, how bzr was developing, features, community etc. There was a bunch of pressure on to succeed in the marketplace, but that was ok, challenges bring out the stubborn in me :). There was one glitch though - we'd been having a bunch of contentious code reviews, and my manager (Martin Pool) was chatting to me about them. I was, as far as I could tell, doing precisely the right thing from a peer review perspective: I was safeguarding the project, preventing changes that didn't fit properly, or that reduced key aspects (performance, usability), from landing until they were fixed. However, the folk on the other side of the review were feeling frustrated, that nothing they could do would fix it, and generally very unhappy. Reviews and design discussions would grind to a halt, and they felt I was the cause. [They were right]. And here was the thing - I simply couldn't understand the issue. I was doing my job; I wasn't angry at the people submitting code; I wasn't hostile; I wasn't attacking them (but I was being, shall we say, frank about the work being submitted). I remember saying to Martin one day "look, I just don't get it - can you show me what I said wrong?" and he couldn't.

Canonical has a 360 review system - every 6 months / year (it changed over time) you review your peers, subordinate(s) and manager(s), and they review you. Imagine my surprise - I was used to getting very positive reports with some constructive suggestions - when I scored low on a bunch of the inter-personal metrics in the review. Martin explained that it was the reviews thing - folk were genuinely unhappy, even as they commended me on my technical merits. Further to that, he said that I really needed to stop worrying about technical improvement and focus on this inter-personal stuff. Two really important things happened around this time. Firstly, Steve Alexander, who was one of my managers-once-removed at the time, reached out to me and suggested I read a book - "Getting out of the box" - and that we might have a chat about the issue after I had read it. I did so, and we chatted. That book gave me a language and viewpoint for thinking about the problem. It didn't solve it, but it meant that I "got it", which I hadn't before. So then the second thing happened: we had a company all-hands and I got to chat with Claire Davis (head of HR at Canonical at the time) about what was going on. To this day I remember the sheer embarrassment I felt when she told me that the broad perception of me amongst other teams' managers was - and I paraphrase a longer, more nuanced conversation here - "technically fantastic but very scary to have on the team; will disrupt and cause trouble". So, at this point about 6 months had passed, and I knew what I wanted: I wanted folk to want to work with me, to find my presence beneficial and positive on both technical and team aspects.
I already knew then that what I seek is technical challenges: I crave novelty, new challenges, new problems. Once things become easy, it can all too easily slip into tedium. So at that time my reasoning was somewhat selfish: how was I to get challenges if no-one wanted to work with me except in extremis? I spent the next year working on myself as much as on specific projects: learning more and more about how to play well with others. In June 2010 I got a performance review I could be proud of again - I was in no way perfect, but I'd made massive strides. This journey had also made huge improvements to my personal life - a lot of stress between Lynne and me had gone away. Shortly after that I was invited to apply for a new role within Canonical as Technical Architect for Launchpad, and Francis Lacoste told me that it was only due to my improved ability to play well with others that I was even considered. I literally could not have done the job 18 months before. I got the job, and I think I did pretty well - in fact I was awarded an internal Spotlight on Success award for what we (it was a whole-Launchpad-team effort) achieved while I was in that role.

So, what did I change/learn? There are just a couple of key changes I needed to make in myself, but a) they aren't sticky: if I get overly tired, ye old terrible Robert can leak out, and b) there's actually a /lot/ of learnable skills in this area, much of which is derived from practice - lots of practice and critical self-review is a good thing. The main thing I learnt was that I was Selfish. Yes, capital S. For instance, in a discussion about adding working tree filters to bzr, I would focus on the impact/risk on me-and-things-I-directly-care-about: would it make my life harder, would it make bzr slower, was there anything that could go wrong? And I would spend only a little time thinking about what the proposer needed: they needed support and assistance making their idea reach the standards the bzr community had agreed on. The net effect of my behaviours was that I was a class A asshole when it came to getting proposals into a code base. The key things I had to change were:
  1. I need to think about the needs of the person I'm speaking to *and not my own*. [That's not to say you should ignore your needs, but you shouldn't dwell on them: if they are critical, your brain will prompt you].
  2. There's always a reason people do things: if it doesn't make sense, ask them! [The Crucial Conversations books have some useful modelling here on how and why people do things, and on how-and-why conversations and confrontations go bad and how to fix them.]
Ok, so this is all interesting and so forth, but why the blog post? Firstly, I want to thank four folk who were particularly instrumental in helping me learn this lesson: Martin, Steve, Claire and of course my wife Lynne - I owe you all an unmeasurable debt for your support and assistance. Secondly, I realised today that while I've apologised one-on-one to particular folk who I knew I'd made life hard for, I'd never really made a widespread apology. So here it is: I spent many years as an ass, and while I didn't mean to be one, intent doesn't actually count here - actions do. I'm sorry for making your life hell in the past, and I hope I'm doing better now. Lastly, if I'm an ass to you now, I'm sorry - I'm probably regressing to old habits because I'm too tired, something I try to avoid, but it's not always possible. Please tell me, and I will go get some sleep, then come and apologise to you, and try to do better in future. Less-assily-yrs,
Rob

13 October 2013

Robert Collins: key transition time

I've transitioned to a new key - announcement here or below. If you've signed my key in the past please consider signing my new key to get it integrated into the web of trust. Thanks!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA256
Sun, 2013-10-13
Time for me to migrate to a new key (shockingly late - sorry!).
My old key is set to expire early next year. Please use my new key effective
immediately. If you have signed my old key then please sign my key - this
message is signed by both keys (and the new key is signed by my old key).
old key:
pub 1024D/FBD3EB8E 2002-07-20
Key fingerprint = 9222 8732 859D 25CC 2560 B617 867B F9A9 FBD3 EB8E
new key:
pub 4096R/AAC0E286 2013-10-13
Key fingerprint = 8244 0CEA B440 83C7 9431 D2CC 298E 9A19 AAC0 E286
The new key is up on the keyservers, so you can just pull it from there.
- -Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEARECAAYFAlJZ8FEACgkQhnv5qfvT644WxACfWBoKdVW+YDrMR1H9IY6iJUk8
ZC8AoIMRc55CTXsyn3S7GWCfOR1QONVhiQEcBAEBCAAGBQJSWfBRAAoJEInv1Yjp
ddbfbvgIAKDsvPLQil/94l7A3Y4h4CME95qVT+m9C+/mR642u8gERJ1NhpqGzR8z
fNo8X3TChWyFOaH/rYV+bOyaytC95k13omjR9HmLJPi/l4lnDiy/vopMuJaDrqF4
4IS7DTQsb8dAkCVMb7vgSaAbh+tGmnHphLNnuJngJ2McOs6gCrg3Rb89DzVywFtC
Hu9t6Sv9b0UAgfc66ftqpK71FSo9bLQ4vGrDPsAhJpXb83kOQHLXuwUuWs9vtJ62
Mikb0kzAjlQYPwNx6UNpQaILZ1MYLa3JXjataAsTqcKtbxcyKgLQOrZy55ZYoZO5
+qdZ1+wiD3+usr/GFDUX9KiM/f6N+Xo=
=EVi2
-----END PGP SIGNATURE-----

1 September 2013

Robert Collins: Subunit and subtests

Python 3 recently introduced a nice feature: subtests. When I was putting subunit version 2 together I tried to cater for this via a heuristic approach, permitting the already-known requirement that some tests which are reported are not runnable to be combined with substring matching to identify subtests. However that has panned out poorly; when I went to integrate this with testr the code started to get fugly. So, I'm going to extend the StreamResult API to know about subtests, and issue a subunit protocol bump to 2.1 to add a new field for labelling subtest events. My plan is to make this build a recursive tree structure: given a test test_foo with a subtest i=3, which the Python subtest code would identify as "test_foo (i=3)", it would be identified in StreamResult with test_id "test_foo (i=3)" and parent_test_id "test_foo". This can then nest arbitrarily deep if test runners decide to do that, and the individual runnability becomes up to the test runner, not testrepository / subunit / StreamResult.
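Roughly, the events for such a subtest might look like the sketch below. The parent_test_id keyword follows the proposal in this post; the status() signature shown is an assumption for illustration, not the shipped StreamResult API.

class RecordingStreamResult(object):
    """Stand-in result that just records the events it is given."""
    def __init__(self):
        self.events = []

    def status(self, test_id=None, test_status=None, parent_test_id=None, **kwargs):
        self.events.append((test_id, test_status, parent_test_id))

result = RecordingStreamResult()
# The Python subtest "test_foo (i=3)" is reported nested under "test_foo":
result.status(test_id='test_foo', test_status='inprogress')
result.status(test_id='test_foo (i=3)', test_status='success',
              parent_test_id='test_foo')
result.status(test_id='test_foo', test_status='success')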

27 August 2013

Robert Collins: Using vanilla novaclient with Rackspace cloud

The Rackspace docs describe how to use Rackspace's custom extensions, but not how to use plain ol' nova. Using plain nova is important if you want cloud portability in your scripts. So for future reference, these are the settings:

export OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
export OS_REGION_NAME=DFW
export OS_USERNAME=<username>
export OS_TENANT_NAME=<clientid>
export OS_PASSWORD=<password>
export OS_PROJECT_ID=<clientid>
export OS_NO_CACHE=1
unset NOVA_RAX_AUTH
unset OS_AUTH_SYSTEM

4 March 2013

Robert Collins: subunit version 2 progress

Subunit V2 is coming along very well. Current status: Remaining things to do:

25 February 2013

Robert Collins: El cheapo 10Gbps networking

I've been hitting the limits of gigabit ethernet at home for quite a while now, and as I spend more time working with cloud technologies this started to frustrate me. I'd heard of other folk getting good results with second-hand Infiniband cards and decided to give it a go myself. I bought two Voltaire dual-port Infiniband adapters (a 4X SDR PCI-E x4 card). Add in a 2 metre 8470 cable, and we're in business. There are other, more comprehensive guides around to setting this up, e.g. http://davidhunt.ie/wp/?p=2291 or http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html. On Ubuntu the hardware was autodetected; all I needed to do was:
modprobe ib_ipoib
sudo apt-get install opensm # on one machine
And configure /etc/network/interfaces e.g.:
iface ib1 inet static
address 192.168.2.3
netmask 255.255.255.0
network 192.168.2.0
up echo connected > `find /sys -name mode | grep ib1`
up echo 65520 > `find /sys -name mtu | grep ib1`
With no further tuning I was able to get 2Gbps doing linear file copies via Samba, which I suspect is rather pushing the limits of my circa-2007 home server. I'll investigate further to identify where the bottlenecks are, but the networking itself I suspect is ok - netperf got me 6.7Gbps in a trivial test.

23 February 2013

Robert Collins: Simpler is better a single event type for StreamResult

StreamResult, covered in my last few blog posts, has panned out pretty well. Until, that is, I sat down to do a serialised version of it. It became fairly clear that the wire protocol can be very simple - just one event type that has a bunch of optional fields: test ids, routing code, file data, mime-type etc. It is up to the recipient at the far end of a stream to derive semantic meaning, which means that encoding a lot of rules (such as "a data packet can have either a test status or file data") into the wire protocol isn't called for. If the wire protocol doesn't have those rules, Python parsers that convert a bytestream into StreamResult API calls will have to manually split packets that have both status() and file() data in them - this means it would be impossible to create many legitimate bytestreams via the normal StreamResult API. That seems to be an unnecessary restriction, and thinking about it, having a very simple "here is an event about a test run" API that carries any information we have and maps down to a very simple wire protocol should be about as easy to work with as the current file or status API. Most combinations of file+status parameters are trivially interpretable, but there is one that had no prior definition - a test_status with no test id specified. Files with no test id are easily considered as global scope for their source, so perhaps test_status should be treated the same way? [Feedback in comments or email please]. For now I'm going to leave the meaning undefined and unconstrained. So I'm preparing a change to my patchset for StreamResult to: This will make the API trivially serialisable (both to JSON or protobufs or whatever, or to the custom binary format I'm considering for subunit), and equally trivially parsable, which I think is a good thing.
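As a purely illustrative sketch of the "one event type" idea: a single call carries any mix of optional fields (test id, status, routing code, file data, mime type), and interpretation is left entirely to the receiver. The field names here are assumptions, not the actual StreamResult parameter list.

def event(test_id=None, test_status=None, route_code=None, file_name=None,
          file_bytes=None, mime_type=None, timestamp=None):
    """Return a single event dict; fields that are None are simply absent."""
    fields = dict(test_id=test_id, test_status=test_status,
                  route_code=route_code, file_name=file_name,
                  file_bytes=file_bytes, mime_type=mime_type,
                  timestamp=timestamp)
    return {k: v for k, v in fields.items() if v is not None}

# One packet carrying both a status and attachment data:
event(test_id='test_foo', test_status='fail',
      file_name='traceback', file_bytes=b'...', mime_type='text/plain')
# An event with file data but no test id ("global scope for its source"):
event(file_name='stdout', file_bytes=b'build chatter\n')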

19 February 2013

Robert Collins: First experience implementing StreamResult

My last two blog posts were largely about the needs of subunit, but a key result of any protocol is how easy working with it in a high-level language is. Over the weekend and evenings I've done an implementation of a new set of classes - StreamResult and friends - that provides: So far the code has been uniformly simple to write. I started with an API that included an estimate function, which I've since removed - I don't believe the complexity is justified; enumeration is not significantly more expensive than counting, and runners that want to be efficient can either not enumerate or remember the enumeration from prior runs. The documentation in the linked pull request is a good place to start to get a handle on the API; I'd love feedback. Next steps for me are to do a subunit protocol revision that maps to the Python API - both parser and generator - and see how it feels. One wrinkle there is that the reason for doing this is to fix intrinsic limits in the existing protocol, so doing forward and backward wire protocol compatibility would defeat the point. However we can make the output side explicitly choose a protocol version, and if we can autodetect the protocol version in the parser, even if we cannot handle mixed streams, we can get the benefits of the new protocol once data has been detected. That said, I think we can start without autodetection during prototyping and add it later. Without autodetection, programs like TestRepository will need configuration options to control what protocol variant to expect. This could be done by requiring this new protocol and providing a stream filter that can be deployed when needed.
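The autodetection idea could be as small as peeking at the first byte of the stream and dispatching to a line-based (v1) or packetised (v2) parser. The signature byte below is an assumed placeholder, not a constant taken from the subunit source.

V2_SIGNATURE = 0xB3  # assumed marker byte for a binary packet

def detect_parser(stream):
    """Return 'v2' if the stream appears to start with a binary packet, else 'v1'."""
    first = stream.peek(1)[:1] if hasattr(stream, 'peek') else b''
    if first and first[0] == V2_SIGNATURE:
        return 'v2'
    return 'v1'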

15 February 2013

Robert Collins: More subunit needs

Of course, as happens sadly often, the scope creeps...

Additional pain points

Zope's test runner runs things that are not tests, but which users want to know about - "layers". At the moment these are reported as individual tests, but this is problematic in a couple of ways. Firstly, the same test runs on multiple backend runners, so timing and stats get more complex. Secondly, if a layer fails to set up or tear down, tools like testrepository that have watched the stream will think a test failed, and on the next run try to explicitly run that test - but that test doesn't really exist, so it won't run [unless an actual test that needs the layer is being run].

OpenStack uses python-coverage to gather coverage statistics during test runs. Each worker running tests needs to gather and return such statistics. The current subunit protocol has no space to hand this around without it pretending to be a test [see a pattern here?]. And that has the same negative side effect - test runners like testrepository will try to run that "test". While testrepository doesn't want to know about coverage itself, it would be nice to be able to pass everything around and have a local hook handle the aggregation of that data.

The way TAP is reflected into subunit today is to mangle each TAP "test" into a subunit "test", but for full benefits subunit tests have a higher bar - they are individually addressable and runnable. So a TAP test script is much more equivalent to a subunit test. A similar concept is landing in Python's unittest soon - "subtests" - which will give very lightweight additional assertions within a larger test concept. Many C test runners that emit individual tests as simple assertions have this property as well - there may be 5 or 10 executables each with dozens of assertions, but only the executables are individually addressable; there is no way to run just one assertion from an executable as a "test". It would be nice to avoid the friction that currently exists when dealing with that situation.

Minimum requirements to support these

Layers can be supported via timestamped stdout output, or fake tests. Neither is compelling, as the former requires special casing in subunit processors to data-mine it, and the latter confuses test runners. A way to record something that is structured like a test (has an id - the layer, an outcome - in progress / ok / failed, and attachment data for showing failure details) but isn't a test would allow the data to flow around without causing confusion in the system.

TAP support could change to just show the entire output as progress on one test and then fail or not at the end. This would result in a cognitive mismatch for folk from the TAP world, as TAP runners report each assertion as a "test", and this would be hidden from subunit. Having a way to record something that is associated with an actual test, and has a name, status, and attachment content (for the TAP comments field), would let subunit processors report both the addressable tests (each TAP script) and the individual items, but know that only the overall scripts are runnable.

Python subtests could use a unique test for each subtest, but that has the same issue as layers. Python will ensure a top-level test errors if a subtest errors, so strictly speaking we probably don't need an associated-with concept, but we do need to be able to say that a test-like thing happened that isn't actually addressable.
Coverage information could be about a single test, or even a subtest, or it could be about the entire work undertaken by the test process. I don't think we need a single standardised format for coverage data (though that might be an excellent project for someone to undertake). It is also possible to overthink things :). We have the idea of arbitrary attachments for tests. Perhaps arbitrary attachments outside of test scope would be better than specifying stdout/stderr as specific things. On the other hand, stdout and stderr are well known things.

Proposal version 2

A packetised, length-prefixed binary protocol, with each packet containing a small signature, length, routing code, a binary timestamp in UTC, a set of UTF-8 tags (active only, no negative tags), a content tag - one of (estimate + number, stdin, stdout, stderr, file, test) - a test id, a runnable flag, a test status (one of exists/inprogress/xfail/xsuccess/success/fail/skip), an attachment name, mime type, a last-block marker and a block of bytes. The stdin/stdout/stderr content tags are gone, replaced with file. The names stdin, stdout and stderr can be placed in the attachment name field to signal those well-known files, and any other files that the test process wants to hand over can be simply embedded. Processors that don't expect them can just pass them on.

Runnable is a boolean, indicating whether this packet is describing a test that can be executed deliberately (vs an individual TAP assertion, Python sub-test etc). This permits describing things like zope layers, which are top-level test-like things (they start, stop and can error), though they cannot be run - and it doesn't explicitly model the setup/teardown aspect that they have. Should we do that?

Test id is for identifying tests. With the runnable flag to indicate whether a test really is a test, subtests can just be namespaced by the generator - reporters can choose whether to be naive and report every "test", or whether to use a simple string prefix-with-non-character-separator to infer child elements.

Impact on Python API

If we change the API to:
class TestInfo(object):
    id = unicode
    status = ('exists', 'inprogress', 'xfail', 'xsuccess', 'success', 'fail', 'error', 'skip')
    runnable = boolean
class StreamingResult(object):
    def startTestRun(self):
        pass
    def stopTestRun(self):
        pass
    def estimate(self, count, route_code=None, timestamp=None):
        pass
    def file(self, name, bytes, eof=False, mime=None, test_info=None, route_code=None, timestamp=None):
        """Inform the result about the contents of an attachment."""
    def status(self, test_info, route_code=None, timestamp=None):
        """Inform the result about a test status with no attached data."""
This would permit the full semantics of a subunit stream to be represented, I think, while being a narrow interface that should be easy to implement. Please provide feedback! I'll probably start implementing this soon.
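A hedged usage sketch of how the proposed API might be driven, with minimal concrete stand-ins for the TestInfo and StreamingResult classes sketched above so it runs on its own. The scenario - a zope-style "layer" reported as a non-runnable test-like thing, plus coverage data outside any test's scope - is illustrative only.

class TestInfo(object):
    def __init__(self, id, status, runnable=True):
        self.id, self.status, self.runnable = id, status, runnable

class RecordingResult(object):
    """Minimal StreamingResult-alike that just records the calls it receives."""
    def __init__(self):
        self.events = []
    def startTestRun(self):
        pass
    def stopTestRun(self):
        pass
    def file(self, name, bytes, eof=False, mime=None, test_info=None,
             route_code=None, timestamp=None):
        self.events.append(('file', name, mime, test_info, route_code))
    def status(self, test_info, route_code=None, timestamp=None):
        self.events.append(('status', test_info.id, test_info.status, route_code))

result = RecordingResult()
result.startTestRun()
layer = TestInfo('ZopeLayer.setUp', 'inprogress', runnable=False)
result.status(layer, route_code='1')
layer.status = 'success'
result.status(layer, route_code='1')
# Coverage data for the whole worker, outside any test's scope:
result.file('coverage.dat', b'<coverage bytes>', eof=True,
            mime='application/octet-stream', test_info=None, route_code='1')
result.stopTestRun()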

14 February 2013

Robert Collins: Time to revise the subunit protocol

Subunit is seven and a half years old now - Conrad Parker and I first sketched it up at a CodeCon (camping and coding, a brilliant combination) in mid 2005.
revno: 1
committer: Robert Collins <robertc@robertcollins.net>
timestamp: Sat 2005-08-27 15:01:20 +1000
message:  design up a protocol with kfish
It has proved remarkably resilient as a protocol - the basic nature hasn't changed at all, even though we've added tags, timestamps, and support for attachments of arbitrary size. However a growing number of irritations have been building up with it. I think it is time to design another iteration of the protocol, one that will retain the positive qualities of the current protocol while helping it become suitable for the next 7 years. Ideally we can keep compatibility and make it possible for a single stream to be represented in any format.

Existing design

The existing design is a mostly human-readable, line-orientated protocol that can be sniffed out from the regular output of make or other build systems. Binary attachments are done using HTTP chunking, and the parser has to maintain state about the current test, tags, timing data and test progression [a simple stack of progress counters]. How to arrange subunit output is undefined, and how to select tests to run is undefined. This makes writing a parser quite easy, and the tagging and timestamp facility allow multiplexing streams from two or more concurrent test runs into one with good fidelity - but it also requires that state be buffered until the end of a test, as two tests cannot be executing at once.

Dealing with debuggers

The initial protocol was intended to support dropping into a debugger - just pass each line read through to stdout, and connect stdin to the test process, and voila, you have a working debugger connection. This works, but the current line-based parsers make using it tedious - the line-buffered nature of it makes feedback on what has been typed fiddly, and stdout tends to be buffered, leading to an inability to see print statements and the like. All in-principle fixable, right? When running two or more test processes, which test process should stdin be connected to? What if two or more drop into a debugger at once? What is being typed to which process is more luck than anything else. We've added some idioms in testrepository that control test execution by a similar-but-different format - one test per line to list tests, and runners permit listing and selecting by a list. This works well, but the inconsistency with subunit itself is a little annoying - you need two parsers, and two output formats.

Good points

The current protocol is extremely easy to implement for emitters, and the arbitrary attachments and tagging features have worked extremely well. There is a comprehensive Python parser which maps everything into Python unittest API calls (an extended version of the standard, with good backwards compatibility).

Pain points

The debugging support was a total failure, and the way the parser depraminates its toys when a test process corrupts an outcome line is extremely frustrating (other tests execute, but the parser sees them as non-subunit chatter and passes the lines on through stdout).

Dealing with concurrency

The original design didn't cater for concurrency. There are three concurrency issues - the corruption issue (see below for more detail) and multiplexing. Consider two levels of nested concurrency: a supervisor process such as testrepository starts 2 (or more, but 2 is sufficient to reason about the issue) subsidiary worker processes (I1 and I2), each of which starts 2 subsidiary processes of their own (W1, W2, W3, W4). Each of the 4 leaf processes is outputting subunit, which gets multiplexed in the 2 intermediary processes, and then again in the supervisor. Why would there be two layers?
A concrete example is using testrepository to coordinate test runs on multiple machines at once, with each machine running a local testrepository to broker tests amongst the local CPUs. This could be done with 4 separate ssh sessions and no intermediaries, but that only removes a fraction of the issues. What issues? Well, consider some stdout chatter that W1 outputs. That will get passed to I1 and from there to the supervisor and captured. But there is nothing marking the chatter as belonging to W1: there is no way to tell where it came from. If W1 happened to fail, and there was a diagnostic message printed, we've lost information - or at best muddled it all up.

Secondly, buffering - imagine that a test on W1 hangs. I1 will know that W1 is running a test, but has no way to tell the supervisor (and thus the user) that this is the case, without writing to stdout [and causing a *lot* of noise if that happens a lot]. We could have I1 write to stdout only if W1's test is taking more than 5 seconds or something - but this is a workaround for a limitation of the protocol. Adding to the confusion, the clocks on W1 and W3 may be very skewed, so timestamps for everything have to be carefully synchronised by the multiplexer.

Thirdly, scheduling - if W1/W2 are on a faster machine than W3/W4, then a partition of equal-timed tests onto each machine will leave one idle before the other finishes. It would be nice to be able to pass tests to run to the faster machine when it goes idle, rather than having to start a new runner each time.

Lastly, what to do when W1 and W2 both wait for user input on stdin (e.g. passphrases, debugger input, $other)? Naively connecting stdin to all processes doesn't work well. A GUI supervisor could connect a separate fd to each of I1 and I2, but that doesn't help when it is W1 and W2 reading from stdin. So, additional requirements over baseline subunit:
  1. make it possible for stdout and stderr output to be captured from W1 and routed through I1 to the supervisor without losing its origin. It might be chatter from a noisy test, or it might be build output. Either way, the user probably will benefit if we can capture it and show it to them later when they review the test run. The supervisor should probably show it immediately as well - the protocol doesn't need to care about that, just make it possible.
  2. make it possible to pass information about tests that have not completed through one subunit stream while they are still incomplete.
  3. make it possible (but optional) to pass tests to run to a running process that accepts subunit.
  4. make it possible to route stdin to a specific currently-running process like W1. This and point 3 suggest that we need a bidirectional protocol rather than the solely unidirectional protocol we have today. I don't know of a reliable, portable way to tell when some process is seeking such input, so that will be up to the user I think. (e.g. printing (pdb) to stdout might be a sufficiently good indicator.)
Dealing with corruption

Consider the following subunit fragment:
test: foo
starting serversuccess:foo
This is a classic example of corruption: the test foo started a server and helpfully wrote to stdout explaining that it did that, but missed the newline. As a result the success message for the test wasn't printed on a line of its own, and the subunit parser will believe that foo never completed. Every subsequent test is then ignored. This is usually easy to identify and fix, but it's a head-scratcher when it happens. Another way it can happen is when a build tool like make runs tests in parallel, and they output subunit onto the same stdout file handle. A third way is when a build tool like make runs two separate test scripts serially, and the first one starts a test but errors hard and doesn't finish it. That looks like:
test: foo
test: bar
success: bar
One way that this sort of corruption can be mitigated is to put subunit on its own file descriptor, but this has several caveats: it is harder to tunnel through things like ssh, and it doesn't solve the failing-test-script case. I think it is unreasonable to require a protocol where arbitrary interleaving of bytes between different test runner streams will work - so the make -j2 case can be ignored at the wire level, though we should create a simple way to safely mux the output from such tests when they execute. The root of the issue is that a dropped update leaves bad state in the parser and it never recovers. So some way to recover, or less state to carry in the parser, would neatly solve things. I favour reducing parser state, as that should shift stateful complexity onto end nodes / complex processors, rather than being carried by every node in the transmission path.

Dependencies

Various suggestions have been made - JSON, Protobufs, etc. A key design goal of the first subunit was a low barrier to entry. We keep that by being backward compatible, but being easy to work with for the new revision is also a worthy goal.

High level proposal

A packetised, length-prefixed binary protocol, with each packet containing a small signature, length, routing code, a binary timestamp in UTC, a set of UTF-8 tags (active only, no negative tags), a content tag - one of (estimate + number, stdin, stdout, stderr, test- + test id) - a test status (one of exists/inprogress/xfail/xsuccess/success/fail/skip), an attachment name, mime type, a last-block marker and a block of bytes. The content tags: Test status values are pretty ordinary. Exists is used to indicate a test that can be run when listing tests, and inprogress is used to report a test that has started but not necessarily completed. Attachment names must be unique per routing code + test id. So how does this line up?

Interleaving and recovery

We could dispense with interleaving and say the streams are wholly binary, or we can say that packets can start either after a \n or directly after another packet. If we say that binary-only is the approach to take, it would be possible to write a filter that would apply the newline heuristic (or even look for packet headers at every byte offset). I think mandating "adjacent to a packet or after \n" is a small concession to make, and will avoid tools like testrepository forcing users to always configure a heuristic filter. Non-subunit content can be wrapped in subunit for forwarding (the I1 in the W1->I1->Supervisor chain would do the wrapping). This won't eliminate corruption, but it will localise it and permit the stream to recover: the test that was corrupted will show up as incomplete, or with incomplete attachment data.

Listing

Test listing would emit many small non-timestamped packets. It may be useful to have a wrapper packet for bulk amounts of fine-grained data like listing, or for multiplexers with many input streams that will often have multiple data packets available to write at once.

Selecting tests to run

Same as for listing - while passing regexes down to the test runner to select groups of tests is a valid use case, that's not something subunit needs to worry about: if the selection is not the result of the supervisor selecting by test id, then it is known at the start of the test run and can just be a command line parameter to the backend. Subunit is relevant for passing instructions to a runner mid-execution.
Because the supervisor cannot just hand out some tests and wait for the thing it ran to report that it can accept incremental tests on stdin, supervisor processes will need to be informed about that out of band.

Debugging

Debugging is straightforward. The parser can read the first 4 or so bytes of a packet one at a time to determine if it is a packet or a line of stdout/stderr, and then either read to end of line, or read the binary length of the packet. So, we combine a few things: non-subunit output should be wrapped and presented to the user. Subunit that is being multiplexed and forwarded should prepend a routing code to the packet (e.g. I1 would append 1 or 2 to select which of W1/W2 the content came from, and then forward the packet; S would append 1 or 2 to indicate I1/I2 - the routing code is a path through the tree of forwarding processes). The UI the user is using needs to supply some means to switch where stdin is attached, and stdin input should be routed via stdin packets. When there is no routing code left, the packet should be entirely unwrapped and presented as raw bytes to the process in question.

Multiplexing

Very straightforward - unwrap the outer layer of the packet, add or adjust the routing code, serialise a header + adjusted length + the rest of the packet as-is. No buffering is needed, so the supervisor can show in-progress tests (and how long they have been running for).

Parsing / representation in Python or other languages

The parser should be very simple to write. Once parsed, this will be fundamentally different to the existing Python TestCase -> TestResult API that is in use today. However it should be easy to write two adapters: old-style <-> new-style. old-style -> new-style is useful for running existing test suites and generating subunit, because that way the subunit generation is transparent. new-style -> old-style is useful for using existing test reporting facilities (such as junitxml or html TestResult objects) with subunit streams. Importantly though, a new TestResult style that supports the features of this protocol would enable some useful features for regular Python test suites. The API might be something like:
class StreamingResult(object):
    def startTestRun(self):
        pass
    def stopTestRun(self):
        pass
    def estimate(self, count, route_code=None):
        pass
    def stdin(self, bytes, route_code=None):
        pass
    def stdout(self, bytes, route_code=None):
        pass
    def test(self, test_id, status, attachment_name=None, attachment_mime=None, attachment_eof=None, attachment_bytes=None):
        pass
This would support just-in-time debugging by wiring up pdb to the stdin/stdout handlers of the result object, rather than the actual stdin/stdout of the process - a simple matter once written. Alternatively, the test runner could replace sys.stdin/stdout etc with thunk file-like objects, which might be a good idea anyway to capture spurious output happening during a test run. That would permit pdb to Just Work (even if the test process is internally running concurrent tests... until it has two pdb objects running concurrently :)).

Generating new streams

Should be very easy in anything except shell. For shell, we can have a command line tool that, when invoked, outputs a subunit stream for one instruction - e.g. "test foo completed + some attachments" or "test foo starting".
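To make the packet layout from the high level proposal above a little more concrete, here is a purely illustrative encoder. The field order, field widths and the signature value are all assumptions for the sake of the sketch, not the wire format subunit v2 actually adopted.

import struct

SIGNATURE = 0xB3  # assumed marker byte

def encode_packet(route_code=b'', timestamp=0.0, tags=(), content_tag=b'test',
                  test_status=b'success', payload=b''):
    """Pack one event into a signature + length-prefixed binary packet."""
    def field(value):
        # Each variable-length field is prefixed with a 2-byte big-endian length.
        return struct.pack('>H', len(value)) + value
    body = (field(route_code)
            + struct.pack('>d', timestamp)                 # binary UTC timestamp
            + field(' '.join(tags).encode('utf-8'))        # active tags only
            + field(content_tag)
            + field(test_status)
            + payload)
    # Signature byte, then a 4-byte total length covering header + body.
    return struct.pack('>BI', SIGNATURE, len(body) + 5) + body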

24 January 2013

Robert Collins: Launchpadlib without gnome-keyring

Recently I've been doing my personal development SSH'd into my personal laptop. I found that launchpadlib (which various projects use for release automation) was failing - the gnome keyring API threw an error because the keyring was locked, and python-keyring didn't try to unlock it. I needed a workaround to be able to release stuff, and with a bit of digging and help from #launchpad, came up with this:
mkdir ~/.cache/keyring
mkdir ~/.local/share/python_keyring
cat > ~/.local/share/python_keyring/keyringrc.cfg << EOF
[backend]
default-keyring=keyring.backend.UncryptedFileKeyring
keyring-path=/home/robertc/.cache/keyring/
EOF
(There is already encryption in place, so I chose an uncrypted store - read the keyring source to find other alternatives.) With this done, I can now use lp-shell etc. over SSH, for when I'm not physically at my machine.
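The same workaround can be expressed from Python rather than via keyringrc.cfg by selecting the file backend directly. The module path matches the config above for the python-keyring of that era; current releases have since moved the backend classes around, so treat this as illustrative only.

import keyring
import keyring.backend

# Point python-keyring at the uncrypted file store instead of gnome-keyring.
keyring.set_keyring(keyring.backend.UncryptedFileKeyring())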

14 January 2013

Robert Collins: Multi-machine parallel testing of nova with testrepository

I recently added a formal interface to testrepository to enable cross-machine scaling of test runs. As testrepository is still a static scheduler this isn't perfect, but it's quite a minimal interface, which makes it easy to implement. I will likely evolve it in reaction to feedback and experience. In the long term I'd love to have a super generic tool that matches that interface, so the project VCS copy of .testr.conf can just call out to it. However I don't yet have that, but I do have a simple by-hand implementation that I use to run nova's tests across my personal laptop, desktop and work laptop. Testr models this by assuming each test running process can be mapped to a single instance id (which could be a chroot, vm, cloud instance, ...) and then running one or more commands in the instance, before disposing of it. This by-hand implementation consists of 4 things:
  1. A tiny script to rsync my source directory to the relevant places before I run tests. (This takes <2 seconds on my home wifi).
  2. A script to allocate instance ids (I just use ints)
  3. A script to discard them
  4. And a script to copy tempfiles onto the target machine and run a given command.
I do my testing in lxc containers, because I like my primary environment to be free of project-specific quirks and workarounds. lxc is not needed though, if you don t want it. So, to set this up for yourself:
  1. on each host, make an lxc container (e.g. following http://wiki.openstack.org/DependsOnUbuntu)
  2. start them all (lxc-start -n nova -d)
  3. Make SSH config entries for the lxc containers, so you can get at them remotely. (Make sure your Host * rules are at the end of the file, otherwise the master overrides won't work [and you might not notice for some time...]):
    Host desktop-nova.lxc
    # lxc addresses may be present on localhost too, so namespace the control
    # path to avoid connecting to the wrong container.
      ControlPath ~/.ssh/master-lxc-%r@%h:%p
      hostname 10.0.3.19
      ProxyCommand ssh 192.168.1.106 nc -q0 %h %p
    Host hplaptop-nova.lxc
    # lxc addresses may be present on localhost too, so namespace the control
    # path to avoid connecting to the wrong container.
      ControlPath ~/.ssh/master-lxc-%r@%h:%p
      hostname 10.0.3.244
      ProxyCommand ssh 192.168.1.116 nc -q0 %h %p
  4. make a script to copy your nova source tree to each test location. I called mine sync
    #!/bin/bash           
    cd $(dirname $0)
    echo syncing in $(pwd) 
    (rsync -a . desktop-nova.lxc:source/openstack/nova --delete-after && echo dell done) &
    (rsync -a . hplaptop-nova.lxc:source/openstack/nova --delete-after && echo hp done)
  5. Make sure you have the base directory on each location
    ssh desktop-nova.lxc mkdir -p source/openstack
    ssh hplaptop-nova.lxc mkdir -p source/openstack
  6. Sync your code over.
    ./sync
  7. And check tests run by running a few.
    ssh desktop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
    ssh hplaptop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
    This will check the test environment: we're not going to be running tests on each node via run-tests or even testr (because it gets immediately meta), but if this fails, later attempts won't work. Your test virtualenv is inside the source tree, so it is copied implicitly by the sync.
  8. Decide what concurrency you want. For me, I picked 12: I have a desktop i7 with 4 cores, and two laptops with 2 cores each, and hyperthreads are on on all of them. I'm going to set a concurrency figure of 12 - between the core count (8) and the thread count (16) - and possibly balance it more in future. A higher number assumes less contention between ALUs and other elements of the core pipeline, and I expect quite some contention because most of nova's unittests are CPU bound, not I/O. If the test servers are not busy, I can always raise it later.
  9. Create scripts to create / dispose / execute logical worker threads.
  10. Creation. I call this instance-provision and all it does is find the lowest ints not currently allocated and return them.
    #!/usr/bin/env python
    import os.path
    import sys
    if not os.path.isdir('.instances'):
        os.mkdir('.instances')
    running_ids = os.listdir('.instances')
    count = int(sys.argv[1])
    top = count + len(running_ids)
    ids = [str(i) for i in range(top)]
    new = set(ids) - set(running_ids)
    for id in new:
        file('.instances/%s' % id, 'w').close()
    print(' '.join(new))
  11. Disposal is easy: remove the file marking the instance as in-use.
    #!/bin/bash
    echo freeing $@
    cd .instances
    rm $@
  12. Execution is a little trickier. We need to run some commands locally, and other ones by copying in the temp files that testr has set up onto the target machine - sshing to the remote machine, cd'ing to the right directory, sourcing the virtualenv, and finally running the command.
    #!/bin/bash
    instance="$(($1 % 4))"
    case $instance in
    [0]) node=
         local="true"
         ;;
    [1]) node=hplaptop-nova.lxc
         local=""
         ;;
    [2-3]) node=desktop-nova.lxc
         local=""
         ;;
    *)   echo "Unknown instance $instance" >&2
         exit 1
         ;;
    esac
    shift
    files=
    # accumulate files to copy
    while [ "--" != "$1" ]; do 
    files="$files $1"
    shift ; done 
    shift   
    if [ -n "$files" -a -z "$local" ]; then
        echo copying $files to node.
        for f in $files; do
            rsync $f $node:$(dirname $f) ;
        done
    fi  
    if [ -n "$local" ]; then
        eval $@
    else
        echo ssh to $node
        ssh $node "cd source/openstack/nova && . .venv/bin/activate && $@"
    fi
  13. Finally, tell testr how to use this. (Don't commit this change to nova, as it would break other people.) Add this to your .testr.conf:
    test_run_concurrency=echo 12
    instance_provision=./instance-provision $INSTANCE_COUNT
    instance_execute=./instance-execute $INSTANCE_ID $FILES -- $COMMAND
    instance_dispose=./instance-dispose $INSTANCE_IDS
Now, when you run testr run --parallel, it will run across your machines. Just do a ./sync before running tests to get the code out there. It is possible to wrap all of this up via automation (or to include just-in-time provisioned cloud instances), but I like the results of still-rough scripts here - it strikes a good balance between effort, reliability and performance.

8 September 2012

Robert Collins: Launchpads page performance report now reusable

Thanks to Corey Goldberg, one of my colleagues @ Canonical, the page performance report can now be used on regular Apache log files, rather than just the zserver trace log files that Launchpad's middle tier generates. We use this report to identify poorly performing pages and get insight into the timing patterns of bad pages. The code lives in the Launchpad dev-utils project; instructions for checking it out and configuring it are on the wiki. If you don't have aggregate data for your web application, I highly recommend grabbing PPR and checking it out: it's very lightweight, and the data is extremely useful.

27 August 2012

Robert Collins: Why platform specific package systems exist and won't go away

A while back mdz blogged about challenges facing Ubuntu and other Linux distributions. He raises the point that runtime libraries for Python / Ruby etc have a unique set of issues because they tend to have their own packaging systems. Merely a month later he attended Debconf 2010, where a presentation was given on the issues that Java packages have on dpkg-based systems. Since then the conversation seems to have dried up. I've been reminded of it recently in discussions within Canonical looking at how we deploy web services. Matt suggested some ways forward; I think it's time we revisit and expand on those points. Nothing much has changed in how Ubuntu or other distributions approach integration with other packaging systems, but the world has kept evolving. Internet access is growing ever more ubiquitous, more platforms are building packaging systems (Clojure, Scala and node.js, to name but three), and a substantial and ever-growing number of products expect to operate in a hybrid fashion, with an evolving web service plus a local client which is kept up to date via package updates. Twitter, Facebook and Google Plus are three such products. Android has demonstrated a large-scale app store on top of Linux, with its own custom packaging format.

In order to expand on those points, we need some background context on the use cases that these different packaging systems need to support. Platforms such as antivirus scanners, node.js, Python, Clojure and so forth care a great deal about getting their software out to their users. They care about making it extremely easy to get the latest and greatest versions of their libraries. I say this because the evidence is all around us: every successful development community / product has built a targeted package management system which layers on top of Windows, and Mac OSX, and *nux. The only rational explanation I can come up with for this behaviour is that the lower-level operating system package management tools don't deliver what they need. And this isn't as shallow as wanting a packaging system written in their own language, which would be easy to write off as parochialism rather than a thoughtful solution to their problems.

In general, packaging systems provide a language for shipping software in source or binary form, from one or more repositories, to users' machines. They may support replication, and they may support multiple operating systems. They generally end up as graph traversal engines, pulling in dependencies of various sorts; you can see the DOAP specification for an attempt at generic modelling of this. One problem that turns up rapidly when dealing with Linux distribution package managers is that the versions upstream packages have, and the versions a package has in e.g. Debian, differ. They differ because at some stage, someone will need to do a new package for the distribution when no upstream change has been made. This might be to apply a local patch, or it might be to correct a defect caused by a broken build server. Whatever the cause, there is a many-to-one relationship between the package versions that end users see via dpkg / rpm etc, and those that upstream ship. It is a near certainty that once this happens to a library package, comparing package versions across different distribution packages becomes hard. You cannot reliably infer whether a given package version is sufficient as a dependency or not, when comparing binary packages between Red Hat and Debian. Or Debian and Ubuntu. The result of this is that even when the software (e.g. rpm) is available on multiple distributions (say Ubuntu and RHEL), or even on multiple operating systems (say Ubuntu and Windows), many packages will have to be targeted specifically to build and execute properly. (Obviously, compilation has to proceed separately for different architectures; it's more the dependency metadata that says "build with version X of dependency Y" that has to be customised.) The result of this is that there is, to the best of my knowledge, no distribution of binary packages that targets Debian/Ubuntu and RHEL and Suse and Windows and Mac OS X, although there are vibrant communities building distributions of and for each in isolation. Some of the ports systems come close, but they are still focused on delivering to a small number of platforms. There's nothing that gives 99% coverage of users. And that means that to reach all their users, they have to write or adopt a new system.

For any platform X, there is a strong pressure to have the platform be maintainable by folk that primarily work with X itself, or with the language that X is written in. Consider Python: there is strong pressure to use C, or Python, and nothing else, for any tools. That is somewhat parochial, but also just good engineering: reducing variables and making the system more likely to be well maintained. The closest system I know of, Steam, is just now porting to Ubuntu (and perhaps Linux in general), and has reached its massive popularity by focusing entirely on applications for Windows, with Mac OSX a recent addition. Systems like pypi which have multi-platform eggs do target the wide range of platforms I listed above, but they do so both narrowly and haphazardly: whether a binary or source package is available for a given platform is up to the maintainer of the package, and the packages themselves are dealing with a very narrow subset of the platform's complexity. Python provides compilation logic, but they don't create generic C libraries with stable ABIs for use by other programs, and they don't have Turing-complete scripts for dealing with configuration file management and so forth. Anti-virus updaters similarly narrow the problem they deal with, and add constraints on latency: updates of anti-virus signatures are time sensitive when a new rapidly spreading threat is detected.

A minor point, but one that adds to the friction of considering a single packaging tool for all needs, is the difference between the use cases of low-level package management tools like dpkg or rpm and the use cases that e.g. pypi has. A primary use case for packages on pypi is for them to be used by people that are not machine administrators. They don't have root, and don't want it. Contrast that with dpkg or rpm, where the primary use case (to date) is the installation of system-wide libraries and tools. Things like man page installation don't make any sense for non-system-wide package systems, whereas they are a primary feature for e.g. dpkg. In short, the per-platform/language tools are (generally):
  1. Written in languages that are familiar to the consumers of the tools.
  2. Targeted at use on top of existing platforms, by non-privileged users, and where temporary breakage is fine.
  3. Intended to get the software packaged in them onto widely disparate operating systems.
  4. Very narrow: they make huge assumptions about how things can fit together (assumptions which their specific language/toolchain permits), and don't generalise beyond that.
  5. Don't provide for security updates in any specific form: that is left up to the folk that ship individual things within the manager.
Operating system package managers:
  1. Are written in languages which are very easy to bootstrap onto an architecture, and to deploy onto bare metal (as part of installation).
  2. Designed for delivering system components, and to be able to upgrade the toolchain itself safely.
  3. Originally built to install onto one operating system; ports to other operating systems are usually fragile and only adopted in niches.
  4. Are hugely broad: they install data, scripts and binaries, and need to know about late binding, system caches etc. for every binary and runtime format the operating system supports.
  5. Make special provision to allow security updates to be installed in a low latency fashion, without requiring anything consuming the package that is updated to change [but usually force-uninstalling anything that is super-tightly coupled to a library version].
Anti-virus package managers:
  1. Exist to update daemons that run with system wide escalated privileges, or even file system layer drivers.
  2. Update datasets in realtime.
  3. Don't permit updates that are produced by third parties.
Given that, let's look at the routes Matt suggested.

Decoupling applications from the core as a strategy makes an assumption that the core and applications are partitionable. If they are not, then applications and the core will share common elements that need to be updated together. Consider, for instance, a Python application. If you run with a system-installed Python, and it is built without zlib for some reason, but the Python application requires zlib, you have a problem. A classic example of this problem is facing Ubuntu today, with all the system-provided tools moving to Python 3, but vast swathes of Python applications still being unported to Python 3 at all. Currently, the Python packaging system (virtualenv/buildout + distribute) doesn't provide a way to install the Python runtime itself, but will happily install its own components for everything up the stack from the runtime. Ubuntu makes extensive use of Python for its own tools, so the system Python has a lot of packages installed which buildout etc cannot ignore; this often leads to issues with e.g. buildout, when the bootstrap environment has (say) zope.interfaces, but it's then not accessible from the built-out environment, which disables the standard sys.path (to achieve more robust separation). If we want to pursue decoupling, whether we build a new package manager or use e.g. virtualenv (or gem or npm or ...), we'll need to be aware of this issue and perhaps offer, for an extended time, a dedicated no-frills, no-distro-packages install to avoid it, and to allow an extended support period for application authors without committing to a massive, distro-sponsored porting effort. While it's tempting to say we should install pip/npm/lein/maven and other external package systems, this is actually risky: they often evolve sufficiently fast that Ubuntu will be delivering an old, incompatible version of the tool to users well before Ubuntu goes out of support, or even before the next release of Ubuntu.

Treating data as a service. All the cases I've seen so far of applications grabbing datasets from the web have depended on web infrastructure for validating the dataset, e.g. SSL certificates, or SSL + content checksums. Basically, small self-rolled distribution systems. I'm likely ignorant of details here, and I depend on you, dear reader, to edumacate me. There is potential value in having data repackaged, when our packaging system has behind-firewall support and the ad hoc system that (for instance) a virus scanner system has does not. In this case, I specifically mean the problem of updating a machine which has no internet access, not even via a proxy. The challenge as I see it is again the cross-platform issue: the vendor will be supporting Ubuntu + Debian + RHEL + Suse, and from their perspective it's probably cheaper to roll their own solution than to directly support dpkg + rpm + whatever Apple offer + Windows; the skills to roll an ad hoc distribution tool are more common than the skills to integrate closely with dpkg or rpm.

What about creating a set of interfaces for talking to dpkg / rpm / the system packagers on Windows and Mac OSX? Here I think there is some promise, but it needs, as Matt said, careful thought. PackageKit isn't sufficient, at least today. There are, I think, two specific cases to cater to:
  1. The anti-virus / fresh data set case.
  2. The egg/gem/npm/ specific case.
For the egg/gem/npm case, we would need to support a pretty large set of common functionality, on Windows / Mac OSX / *nux (because otherwise upstream won't adopt what we create: losing 90% of their users (Windows) or 5% (Mac) isn't going to be well accepted :) ). We'd need to support multiple installations (because of mutually incompatible dependencies between applications), and we'd need to support multiple language bindings in some fashion: some approachable fashion where the upstream will feel capable of fixing and tweaking what we offer. We're going to need to support offline updates, replication, local builds, local repositories, and various signing strategies to match the various tradeoffs made by the upstream tools. For the anti-virus / fresh data case, we'd need to support a similar set of operating systems, though I strongly suspect that there would be more tolerance for limited support: most things in that space either have very platform-specific code, or they are just a large-scale form of the egg/gem/npm problem, which also wants easy updates.

What next? We should validate this discussion with at least two or three upstreams. Find out what's missing (I suspect a lot) and what's wrong (I hope not much :) ). Then we'll be in a position to decide if there is a tractable, widespread solution *possible*. Separately, we should stop fighting with upstreams that have their own packaging systems. They are satisfying different use cases than our core distro packaging systems are designed to solve. We should stop mindlessly repackaging things from e.g. eggs to debs, unless we need that specific thing as part of the transitive runtime or buildtime dependencies for the distribution itself. In particular, if those of us who build system packaging tools adopt and use the upstream application packaging tools, we can learn in a deep way the (real) advantages they have, and become more able to reason about how to unify the various engineering efforts going into them, and perhaps even eventually satisfy them using dpkg/rpm on our machines.

13 August 2012

Robert Collins: minimising downtime for schema changes with postgresql

Two years ago Launchpad did schema changes once a month. Everyone would cross their fingers and hope while the system administrators took all the application servers offline, patched the database with a month's worth of work and brought the servers up again running the new, QA'd codebase. This had two problems:
  1. Due to the complexity of the system, something like 300 processes have to be stopped or inhibited to take everything offline, so the downtime duration was often about 90 minutes long irrespective of the schema patch duration. [Some of the processes don't like being interrupted at all.]
  2. We simply could not deliver any change in less than 1 week, with the average latency for something that jumped all the queues still being 2 weeks.
About a year ago we wanted to increase the rate at which schema changes could be carried out: the efforts to speed Launchpad up had consumed most of the low-hanging fruit, and more and more schema patches were required. We didn't want to introduce additional 90 minute downtime windows though. Adopting incremental migrations, the sort of change process described in various places on the internet, seemed like a good way to make it possible to apply the schema changes without this slow shutdown-and-restart step, which was required because the pre-patch codebase couldn't speak to the new schema. We could optimise each patch to be very fast by avoiding anything that causes a full table scan or table rewrite (such as adding indices, or adding columns with a non-NULL default value). That would let us avoid the 90 minutes of downtime caused by stopping and restarting everything. However, that wasn't sufficient: the reason Launchpad ended up doing monthly downtime is that previous attempts to do more frequent schema changes had too high a failure rate. A key reason for patch deployment time blowing out when everything wasn't shut down was that Launchpad is a very busy system and, with the use of Slony, schema changes require an exclusive lock on all tables. [More recent versions of Slony only lock some tables, but it still requires very widespread locks for most DDL operations.] We're doing nearly 10 thousand transactions per minute, so at any point in time there are always locks open on some table in the system: it was highly improbable, effectively impossible, for slonik to get an exclusive lock on all tables in a reasonable timeframe. Background tasks that take many minutes to complete exacerbate this; we can't just block new transactions long enough to deliver all the in-flight web pages and let locks clear that way. PGBouncer turns out to be an ideal tool here. If you route all your connections through PGBouncer, you have a single point you can deliberately interrupt to clear all database locks in a second or so (it takes time for backends to all notice that their clients have gone). So we combined these things to get what we called Fast Down Time, or FDT. We set the following rules for developers:
  1. Any schema patch had to complete in <= 15 seconds in our schema staging environment (which has a full copy of the production DB), or we'd roll it back and redesign.
  2. Any patch could change either code or schema, never both. Schema patches were to land on a separate branch and would be promoted to trunk only after deployment. That branch also receives automated merges from trunk after every commit to trunk, so it's running the latest code.
This meant that we could be confident in QA: we would QA the new schema and the application process with the current live code (we deploy trunk multiple times a day). We published some documentation about how to write fast schema patches to help socialise the approach. Then we wrote an automated tool (sketched roughly just after the list below) that would:
  1. Check for known fragile processes and abort if any were found.
  2. Check for very long transactions and abort if any were found.
  3. Shut down pgbouncer, disconnecting all clients instantly.
  4. Use slonik to apply one or more schema patches.
  5. Start pgbouncer back up again.
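To make the shape of that tool concrete, here is a minimal sketch of the same flow. It is not the Launchpad tool itself; the one-minute threshold, the psql invocations, the pgbouncer service name and the apply-schema-patch.sh helper are all illustrative assumptions:

    #!/bin/bash
    # Rough FDT-style wrapper: refuse to run if the database looks busy,
    # then drop all client connections via pgbouncer, patch, and restart.
    set -e
    # Abort if any transaction has been open for more than a minute
    # (a stand-in for the fragile-process and long-transaction checks above).
    long=$(sudo -u postgres psql -At -c \
        "SELECT count(*) FROM pg_stat_activity
         WHERE xact_start < now() - interval '1 minute'")
    if [ "$long" -gt 0 ]; then
        echo "long-running transactions present, aborting" >&2
        exit 1
    fi
    # Stopping pgbouncer disconnects every client at once, clearing locks.
    sudo service pgbouncer stop
    # Apply the schema patch(es) - slonik in the original setup.
    ./apply-schema-patch.sh "$@"
    # Bring pgbouncer back so the application servers can reconnect.
    sudo service pgbouncer start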
The code for this (call it FDTv1) is in the Launchpad source code history; it's pretty entangled, but it's there for grabbing if you need it. Read on to see why it's only available in the history :). The result was wonderful: we immediately were able to deploy schema changes with <= 90 seconds of downtime, which was significantly less than the 5 minutes our stakeholders had agreed to as a benchmark; if we were under 5 minutes, we could schedule downtime once a day rather than once a month. We had to fix some API client code to retry more reliably, and likewise fix a few minor bugs in the database connection handling logic in the appservers, but all in all it was a pretty smooth project. Along the way we spun off a small Python helper to run and control pgbouncer, which let us write effective tests for the connection handling code paths. This gave us the following workflow for making schema changes:
  1. Land and deploy an incremental schema change.
  2. Land and deploy any indices that need to be added; these are deployed live using CREATE INDEX CONCURRENTLY.
  3. Land and deploy code changes to populate any additional fields/tables from both application servers and from cron; we do a bulk backfill that does many small transactions while walking over the entire dataset that needs to be updated / populated.
  4. Land and deploy code changes to drop references to the old schema, whatever it was.
  5. Land and deploy an incremental schema change to finalise the change, such as making a new column NOT NULL once the backfill is complete.
This looks long and unwieldy, but it's worth noting that it's actually just repeated applications of a smaller primitive (a rough worked example follows the two steps below):
  1. Make a schema change that is fast and compatible with existing code.
  2. Change code to take advantage of the changed schema.
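As a purely illustrative example of that primitive applied repeatedly, here is roughly what the stages could look like for adding a hypothetical NOT NULL column. The person table, the display_name and name columns, the launchpad_dev database and the batch size are all made up, and the ad hoc psql calls stand in for proper schema patches and application code changes:

    # Stage 1: fast, compatible schema change. A nullable column needs no
    # table rewrite, so existing code keeps working.
    psql launchpad_dev -c "ALTER TABLE person ADD COLUMN display_name text"
    # Any index is added live, outside the downtime window.
    psql launchpad_dev -c "CREATE INDEX CONCURRENTLY person__display_name__idx
                           ON person (display_name)"
    # Stage 2: code change. New code writes display_name, and a cron-driven
    # backfill walks the table in many small transactions.
    while true; do
        updated=$(psql launchpad_dev -At -c \
            "UPDATE person SET display_name = name
             WHERE id IN (SELECT id FROM person
                          WHERE display_name IS NULL LIMIT 1000)
             RETURNING id" | wc -l)
        [ "$updated" -eq 0 ] && break
    done
    # Final fast schema change once the backfill is complete.
    psql launchpad_dev -c "ALTER TABLE person
                           ALTER COLUMN display_name SET NOT NULL"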
Pretty much any change that is desired can be done using this single primitive. We wanted to go further though: the multiple stages required for complex migrations became a burden with one change a day. Fortunately PostgreSQL now includes its own replication engine, which replicates the WAL logs rather than installing triggers on all tables like Slony. Stuart, our intrepid DBA, migrated Launchpad to PostgreSQL 9.1, updated the FDT tool to work with native replication, and migrated Launchpad off of Slony. The result is again wonderful: the overhead in doing a schema patch, with all the protection I described above, is now ~5 seconds. We can do incremental changes in less time than it takes your browser to figure out that a given server is offline. We're now negotiating with the Launchpad stakeholders to get multiple downtime windows each day, with this almost unnoticeable, super reliable process in place. Reliability-wise, FDT has been superb. We've had 2 failures: one where we believe we encountered a bug in Slony when dropping two tables at once, and one where we landed a patch that worked on staging but led to lock contention in production, so the patch applied, but the system was very unhealthy after that until we fixed it. That's after doing approximately 60 patches over a 1 year period. We're partway through extracting the patching logic from Launchpad's code base into a reusable tool, but the basic principles will apply to any PostgreSQL environment.

7 July 2012

Robert Collins: Reprap driver pinouts

This is largely a memo-to-my-future-self, but it may save some time for someone else facing what I was last weekend. I've been putting together a Reprap recently, seeded by the purchase of a partially assembled one from someone local who was leaving town and didn't want to take it with them. One of the issues it had was that 2 of the stepstick driver boards it uses were burnt out, and in NZ there are no local suppliers that I could find. There is however a supplier of Easydriver driver boards, which are apparently compatible. (The Reprap electronics is a Sanguinololu, which has a fitted strip that exactly matches stepstick (or pololu) driver boards.) The Easydrivers are not physically compatible, but they should be pin compatible... no? I mapped across all the pins carefully, and the only issues were: there are three GNDs on the Easydriver vs 2 on the stepstick, and the PFD pin isn't exposed on the stepstick board so it can't be mapped across. I ended up with this mapping (I'm not sure where pin 1 is *meant* to be on the stepstick, so I'm starting with VMOT, the anti-clockwise corner pin on the same side as the 2B/2A/1A/1B pins when looking down on an installed board, as pin 1, and going clockwise from there).

Stepstick  Easydriver
VMOT       M+
GND        GND
2B         B2
2A         A2
1A         A1
1B         B1
VDD        +5V
GND        GND
Dir        Dir
Step       Step
Slp        Slp
Rst        Rst
Ms3        Nothing
Ms2        Ms2
Ms1        Ms1
En         Enable

But, when I tried to use this, the motor just jammed up solid. A bit of debugging and trial and error later and I figured it out. The right mapping for the motor pins:

Stepstick  Easydriver
2B         B2
2A         B1
1A         A1
1B         A2

That's right: the two boards have chosen opposed elements for labelling the motor coils' pins. On the stepstick, 1/2 refers to the coil and A/B to the two ends that need to have voltage put across them; on the Easydriver, A/B refer to the coil and 1/2 to the two ends. Super confusing, especially as I haven't been doing much electronics for, oh, a decade or so. I'm reminded very strongly of Rusty's scale of interface usability here.

25 June 2012

Robert Collins: Running juju against a private openstack instance.

My laptop has somewhat less than 1/2 the grunt of my desktop at home, but I prefer to work on it as I can go sit in the sun etc; very hard to do that with a mini tower case :). However, running everything through ssh to another machine makes editing and iterating more clumsy: I need to do agent forwarding etc, which is not terribly hard but not free either, and particularly when I travel, I need to remember to sync my source trees back to my laptop. So I prefer to live on my laptop and use my desktop for compute power. I had a couple of Juju charms I wanted to investigate, but I needed enough compute power to make my laptop really quite warm, so I thought: it's time to update my local cloud provider from Eucalyptus to Openstack. This was easy enough, until I came to run Juju. Turns out that Juju's commands really want to talk to the public DNS name of the instance (in order to SSH tunnel a connection to Zookeeper). But! Openstack returns DNS names like "Server-3", and if you think about a home network, it's fairly rare to have a local DNS server *anyway*, so putting a suffix on names like that won't help at all: you either need to use a DNS naming provider (openstack ships with an LDAP provider, which adds even more complexity) and configure your clients to know how to find it, or you need to use the public IP addresses (which default to the FlatNetwork, which is routable within a home LAN by simply adding a route to 10.0.0.0/8 to your wifi interface). Adding to the confusion, some wifi routers fail to forward avahi messages, which is a) terrible and b) breaks the only obvious way of doing no-config local DNS :(. So, I did some yak shaving this morning. Turns out other folk have already run into this and filed a Juju bug and a supporting txaws bug. The txaws bug was fixed, but just missed the release of Precise. Clint Byrum is going to SRU it this week though, so we'll have it soon. I've put a patch up to address the Juju side, which is now pending review. Running the two together works very happily for me. \o/
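As a footnote, the route addition mentioned above is a single command on the laptop; wlan0 is an assumption about the wifi interface name, so adjust to taste:

    # Make the FlatNetwork range reachable from the laptop over the home LAN.
    sudo ip route add 10.0.0.0/8 dev wlan0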
