Search Results: "Ian Wienand"

27 December 2022

Ian Wienand: Redirecting webfinger requests with Apache

If you have a personal domain, it is nice if you can redirect webfinger requests so you can be easily found via your email. This is hardly a new idea, but the growth of Mastodon recently has made this more prominent. I wanted to redirect webfinger endpoints to a Mastodon host I am using, but only for my email, and using only standard Apache rewrites. Below, replace xxx@yyy\.com with your email and zzz.social with the account to be redirected to. There are a couple of tricks in inspecting the query-string and in the quoting, but the end result that works for me is:
RewriteEngine On
RewriteMap lc int:tolower
RewriteMap unescape int:unescape
RewriteCond %{REQUEST_URI} ^/\.well-known/webfinger$
RewriteCond ${lc:${unescape:%{QUERY_STRING}}} (?:^|&)resource=acct:xxx@yyy\.com(?:$|&)
RewriteRule ^(.*)$ https://zzz.social/.well-known/webfinger?resource=acct:xxx@zzz.social [L,R=302]
RewriteCond %{REQUEST_URI} ^/\.well-known/host-meta$
RewriteCond ${lc:${unescape:%{QUERY_STRING}}} (?:^|&)resource=acct:xxx@yyy\.com(?:$|&)
RewriteRule ^(.*)$ https://zzz.social/.well-known/host-meta?resource=acct:xxx@zzz.social [L,R=302]
RewriteCond %{REQUEST_URI} ^/\.well-known/nodeinfo$
RewriteCond ${lc:${unescape:%{QUERY_STRING}}} (?:^|&)resource=acct:xxx@yyy\.com(?:$|&)
RewriteRule ^(.*)$ https://zzz.social/.well-known/nodeinfo?resource=acct:xxx@zzz.social [L,R=302]
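Once the rules are active, a quick sanity check from the command line (an illustrative test, not part of the original post; substitute your real domain and account) should show a 302 with a Location header pointing at the zzz.social endpoint:
curl -si 'https://yyy.com/.well-known/webfinger?resource=acct:xxx@yyy.com' | grep -iE '^(HTTP|Location)'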
cf. https://blog.bofh.it/debian/id_464

9 August 2021

Ian Wienand: nutdrv_qx setup for Synology DSM7

I have a cheap no-name UPS acquired from Jaycar and was wondering if I could get it to connect to my Synology DS918+. It rather unhelpfully identifies itself as MEC0003 and comes with some blob of non-working software on a CD; however some investigation found it could maybe work on my Synology NAS using the Network UPS Tools (NUT) nutdrv_qx driver with the hunnox subdriver type. Unfortunately this is a fairly recent addition to the NUT source, requiring rebuilding the driver for DSM7. I don't fully understand the Synology environment but I did get this working. Firstly I downloaded the toolchain from https://archive.synology.com/download/ToolChain/toolchain/ and extracted it. I then used the script from https://github.com/SynologyOpenSource/pkgscripts-ng to download some sort of build environment. This appears to want root access and possibly sets up some sort of chroot. Anyway, for DSM7 on the DS918+ I ran EnvDeploy -v 7.0 -p apollolake and it downloaded some tarballs into toolkit_tarballs that I simply extracted into the same directory as the toolchain. I then grabbed the NUT source from https://github.com/networkupstools/nut. I then built NUT similar to the following
./autogen.sh
PATH_TO_TC=/home/your/path
export CC=${PATH_TO_TC}/x86_64-pc-linux-gnu/bin/x86_64-pc-linux-gnu-gcc
export LD=${PATH_TO_TC}/x86_64-pc-linux-gnu/bin/x86_64-pc-linux-gnu-ld
./configure \
  --prefix= \
  --with-statepath=/var/run/ups_state \
  --sysconfdir=/etc/ups \
  --with-sysroot=${PATH_TO_TC}/usr/local/sysroot \
  --with-usb=yes \
  --with-usb-libs="-L${PATH_TO_TC}/usr/local/x86_64-pc-linux-gnu/x86_64-pc-linux-gnu/sys-root/usr/lib/ -lusb" \
  --with-usb-includes="-I${PATH_TO_TC}/usr/local/sysroot/usr/include/"
make
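Before copying anything over, it is worth confirming the cross-compile actually produced an x86-64 binary; a quick check along these lines (the path of the built driver inside the source tree may differ between NUT versions):
file drivers/nutdrv_qx
# expect something like: ELF 64-bit LSB executable, x86-64, dynamically linked ...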
The tricks to be aware of are setting the locations where DSM wants status/config files, and overriding the USB detection done by configure, which doesn't seem to obey sysroot. If you would prefer to avoid this you can try this prebuilt nutdrv_qx (sha256 ebb184505abd1ca1750e13bb9c5f991eaa999cbea95da94b20f66ae4bd02db41). SSH to the DSM7 machine; as root, move /usr/bin/nutdrv_qx out of the way to save it, then scp the new version over and move it into place. Looking at cat /dev/bus/usb/devices, I found this device has Vendor 0001 and ProdID 0000:
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  3 Spd=1.5  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0001 ProdID=0000 Rev= 1.00
S:  Product=MEC0003
S:  SerialNumber=ffffff87ffffffb7ffffff87ffffffb7
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbfs
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=10ms
E:  Ad=02(O) Atr=03(Int.) MxPS=   8 Ivl=10ms
DSM does a bunch of magic to autodetect and configure NUT when a UPS is plugged in. The first thing you'll need to do is edit /etc/nutscan-usb.sh and override where it tries to use the blazer_usb driver for this obviously incorrect vendor/product id. The entry should now look like:
static usb_device_id_t usb_device_table[] = {
    { 0x0001, 0x0000, "nutdrv_qx" },
    { 0x03f0, 0x0001, "usbhid-ups" },
    ... and so on ...
Then you want to edit the file /usr/syno/lib/systemd/scripts/ups-usb.sh to start the nutdrv_qx driver; find the DRV_LIST in that file and update it like so:
local DRV_LIST="nutdrv_qx usbhid-ups blazer_usb bcmxcp_usb richcomm_usb tripplite_usb"
This is triggered by /usr/lib/systemd/system/ups-usb.service and is ultimately what tries to set up the UPS configuration. Lastly, you will need to edit the /etc/ups/ups.conf file. This will probably vary depending on your UPS. One important thing is to add user=root above the driver section; it seems recent NUT has become more secure and drops permissions, but the result is that it will not find USB devices in this environment (if you're getting something like no appropriate HID device found this is likely the cause). So the configuration should look something like:
user=root
[ups]
driver = nutdrv_qx
port = auto
subdriver = hunnox
vendorid = "0001"
productid = "0000"
langid_fix = 0x0409
novendor
noscanlangid
#pollonly
#community =
#snmp_version = v2c
#mibs =
#secName =
#secLevel =
#authProtocol =
#authPassword =
#privProtocol =
#privPassword =
I then restarted the UPS daemon by enabling/disabling UPS support in the UI. This should tell you that your UPS is connected. You can also check /var/log/ups.log, which for me shows:
2021-08-09T18:14:51+10:00 synology synoups[11994]: =====log UPS status start=====
2021-08-09T18:14:51+10:00 synology synoups[11996]: device.mfr=
2021-08-09T18:14:51+10:00 synology synoups[11998]: device.model=
2021-08-09T18:14:51+10:00 synology synoups[12000]: battery.charge=
2021-08-09T18:14:51+10:00 synology synoups[12002]: battery.runtime=
2021-08-09T18:14:51+10:00 synology synoups[12004]: battery.voltage=13.80
2021-08-09T18:14:51+10:00 synology synoups[12006]: input.voltage=232.0
2021-08-09T18:14:51+10:00 synology synoups[12008]: output.voltage=232.0
2021-08-09T18:14:51+10:00 synology synoups[12010]: ups.load=31
2021-08-09T18:14:51+10:00 synology synoups[12012]: ups.status=OL
2021-08-09T18:14:51+10:00 synology synoups[12013]: =====log UPS status end=====
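If the standard NUT client tools are available on the DSM image (an assumption; they are part of the normal NUT distribution), you can also dump the driver's variables directly, where the section name must match the one used in ups.conf:
upsc ups@localhost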
This corresponds to the correct input/output voltage and state. Of course this is all unsupported and likely to break, although I don't imagine many of these bits are updated very frequently. It will probably be OK until the UPS battery dies, at which point I would recommend buying a better UPS that is on the Synology support list.

5 August 2021

Ian Wienand: Lyte Portable Projector Investigation

I recently picked up this portable projector for a reasonable price. It might also be called an "M5" projector, but I can not find one canonical source. In terms of projection, it performs as well as a 5cm cube could be expected to. They made a poor choice to eschew adding an external video input, which severely limits the device's usefulness. The design is nice and getting into it is quite an effort. There is no wasted space! After pulling off the rubber top covering and base, you have to pry the decorative metal shielding off all sides to access the screws to open it. This almost unavoidably bends it so it will never quite be the same. To avoid you having to bother, some photos: [Lyte projector internals]. It is fairly locked down. I found a couple of ways in; installing the Disney+ app from the "Aptoide TV" store it ships with does not work, but the app prompts you to update it, which sends you to an action where you can then choose to open the Google Play store. From there, you can install things that work on its Android 7 OS. This allowed me to install a system-viewer app which revealed its specs. Another weird thing I found was that if you go into the custom launcher "About" page under settings and keep clicking the "OK" button on the version number, it will open the standard Android settings page. From there you can enable developer options. I could not get it to connect to ADB, although perhaps you need a USB OTG cable, which I didn't have. It has some sort of built-in Miracast app that I could not get anything to detect. It doesn't have the native Google app store, and most of the apps in the provided system don't work. Somehow it runs Netflix via a webview or similar, which is hard to use. If it had HDMI input it would still be a useful little thing to plug things into. You could perhaps sideload some sort of app to get screen sharing working, and it plays media files off a USB stick or network shares. I don't believe there is any practical way to get a more recent Android on this, leaving it on an accelerated path to e-waste for all but the most boutique users.

16 October 2016

Thomas Goirand: Released OpenStack Newton, Moving OpenStack packages to upstream Gerrit CI/CD

OpenStack Newton is released, and uploaded to Sid OpenStack Newton was released on Thursday the 6th of October. I was able to upload nearly all of it before the week-end, though there were still a few hiccups, as I forgot to upload python-fixtures 3.0.0 to unstable, and only realized it thanks to some bug reports. As this is a build time dependency, it didn't disrupt Sid users too much, but 38 packages wouldn't build without it. Thanks to Santiago Vila for pointing at the issue here. As of writing, a lot of the Newton packages didn't migrate to Testing yet. It's been migrating in a very messy way. I'd love to improve this process, but I'm not sure how, other than filing RC bugs against 250 packages (which would be painful to do) so they would migrate at once. Suggestions welcome. Bye bye Jenkins For a few years, I was using Jenkins, together with a post-receive hook, to build Debian Stable backports of OpenStack packages. Though nearly a year and a half ago, we had that project to build the packages within the OpenStack infrastructure, and use the CI/CD like OpenStack upstream was doing. This is done, and Jenkins is gone, as of OpenStack Newton. Current status As of August, almost all of the packages' Git repositories were uploaded to OpenStack Gerrit, and the build now happens in OpenStack infrastructure. We've been able to build all of the OpenStack Newton Debian packages using this system. This non-official jessie backports repository has also been validated using Tempest. Goodies from Gerrit and upstream CI/CD It is very nice to have it built this way, so we will be able to maintain a full CI/CD in upstream infrastructure using Newton for the life of Stretch, which means we will have the tools to test security patches virtually forever. Another thing is that now, anyone can propose packaging patches without the need for an Alioth account, by sending a patch for review through Gerrit. It is our hope that this will increase the likelihood of external contributions, for example from 3rd party plugin vendors (i.e. networking driver vendors, for example), or upstream contributors themselves. They are already used to Gerrit, and they all expected the packaging to work this way. They are all very much welcome. The upstream infra: nodepool, zuul and friends
The OpenStack infrastructure has been described already in planet.debian.org, by Ian Wienand. So I won't describe it again; he did a better job than I ever would. How it works All source packages are stored in Gerrit with the deb- prefix. This is in order to avoid conflict with upstream code, and to easily locate packaging repositories. For example, you'll find the Nova packaging under https://git.openstack.org/cgit/openstack/deb-nova. Two Debian repositories are stored in the infrastructure AFS (Andrew File System, which means a copy of each repository exists on each cloud where we have compute resources): one for the actual deb-* builds, under jessie-newton, and one for the automatic backports, maintained in the deb-auto-backports gerrit repository. We're using a git-tag-based workflow. Every Gerrit repository contains all of the upstream branches, plus a debian/newton branch, which contains the same content as a tag of upstream, plus the debian folder. The orig tarball is generated using git archive, then used by sbuild to produce binaries. To package a new upstream release, one simply needs to git merge -X theirs FOO (where FOO is the tag you want to merge), then edit debian/changelog so that the Debian package version matches the tag, then do git commit -a --amend, and simply git review. At this point, the OpenStack CI will build the package. If it builds correctly, then a core reviewer can approve the merge commit; the patch is merged, then the package is built and the binary package published on the OpenStack Debian package repository. Maintaining backports automatically The automatic backports are maintained through a Gerrit repository called deb-auto-backports containing a packages-list file that simply lists the source packages we need to backport. On each new CR (change request) in Gerrit, thanks to some madison-lite and dpkg --compare-versions magic, the packages-list is used to compare what's in the Debian archive and what we have in the jessie-newton-backports repository. If the version is lower in our repository, or if the package doesn't exist, then a build is triggered. There is the possibility to backport from any Debian release (using the -d flag in the packages-list file), and we can even use jessie-backports to just rebuild the package. I also had to write a hack to just download from jessie-backports without rebuilding, because rebuilding the webkit2gtk package (needed by sphinx) was taking too many resources (though we'll try to never use it, and rebuild packages when possible). The nice thing with this system is that we don't need to care much about keeping packages up-to-date: the script does that for us. Upstream Debian repositories are NOT for production The produced package repositories are there because we have interconnected build dependencies, needed to run unit tests at build time. It is the only reason why such Debian repositories exist. They are not for production use. If you wish to deploy OpenStack, we very much recommend using packages from distributions (like Debian or Ubuntu). Indeed, the infrastructure Debian repositories are updated multiple times daily. As a result, it is very likely that you will experience failures to download (hash or file size mismatch and such). Also, the functional tests aren't yet wired in to the CI/CD in OpenStack infra, and therefore, we cannot guarantee yet that the packages are usable. Improving the build infrastructure There's a bunch of things which we could do to improve the build process. Let me give a list of things we want to do.
Generalizing to Debian During Debconf 16, I had very interesting talks with the DSA (Debian System Administrators) about deploying such a CI/CD for the whole of the Debian archive, interfacing Gerrit with something like dgit and a build CI. I was told that I should provide a proof of concept first, which I very much agreed with. Such a PoC is there now, within OpenStack infra. I very much welcome any Debian contributor to try it, through a packaging patch. If you wish to do so, you should read how to contribute to OpenStack here: https://wiki.openstack.org/wiki/How_To_Contribute#If_you.27re_a_developer and then simply send your patch with git review. This system, however, currently only fits the git-tag-based packaging workflow. We'd have to do a little bit more work to make it possible to use pristine-tar (basically, allow pushing to the upstream and pristine-tar branches without any CI job connected to the push). Dear DSA team, as we now have a nice PoC that is working well, on which the OpenStack PKG team is maintaining hundreds of packages, shall we try to generalize it and provide such infrastructure for every packaging team and DDs?
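For concreteness, a hedged sketch of the git-tag-based packaging workflow described above; the repository, tag and Debian version are illustrative only, and dch is just one way to edit debian/changelog:
git clone https://git.openstack.org/openstack/deb-nova && cd deb-nova
git checkout debian/newton
git merge -X theirs 14.0.0                 # merge the upstream tag to be packaged
dch -v 14.0.0-1 "New upstream release."    # make the Debian version match the tag
git commit -a --amend                      # fold the changelog edit into the merge commit
git review                                 # send the change to Gerrit; the CI builds it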

21 June 2016

Ian Wienand: Zuul and Ansible in OpenStack CI

In a prior post, I gave an overview of the OpenStack CI system and how jobs were started. In that I said
(It is a gross oversimplification, but for the purposes of OpenStack CI, Jenkins is pretty much used as a glorified ssh/scp wrapper. Zuul Version 3, under development, is working to remove the need for Jenkins to be involved at all).
Well, some recent security issues with Jenkins and other changes have led to a roll-out of what is being called Zuul 2.5, which has indeed removed Jenkins and makes extensive use of Ansible as the basis for running CI tests in OpenStack. Since I already had the diagram, it seems worth updating it for the new reality.
OpenStack CI Overview While the previous post was really focused on the image-building components of the OpenStack CI system, the overview is the same but more focused on the launchers that run the tests. Overview of OpenStack CI with Zuul and Ansible
  1. The process starts when a developer uploads their code to gerrit via the git-review tool. There is no further action required on their behalf and the developer simply waits for results of their jobs.

  2. Gerrit provides a JSON-encoded "fire-hose" output of everything happening to it. New reviews, votes, updates and more all get sent out over this pipe. Zuul is the overall scheduler that subscribes itself to this information and is responsible for managing the CI jobs appropriate for each change.

  3. Zuul has a configuration that tells it what jobs to run for what projects. Zuul can do lots of interesting things, but for the purposes of this discussion we just consider that it puts the jobs it wants run into gearman for a launcher to consume. gearman is a job-server; as they explain it "[gearman] provides a generic application framework to farm out work to other machines or processes that are better suited to do the work". Zuul puts into gearman basically a tuple (job-name, node-type) for each job it wants run, specifying the unique job name to run and what type of node it should be run on.

  4. A group of Zuul launchers are subscribed to gearman as workers. It is these Zuul launchers that will consume the job requests from the queue and actually get the tests running. However, a launcher needs two things to be able to run a job: a job definition (what to actually do) and a worker node (somewhere to do it). The first part, what to do, is provided by job-definitions stored in external YAML files. The Zuul launcher knows how to process these files (with some help from Jenkins Job Builder, which despite the name is not outputting XML files for Jenkins to consume, but is being used to help parse templates and macros within the generically defined job definitions). Each Zuul launcher gets these definitions pushed to it constantly by Puppet, thus each launcher knows about all the jobs it can run automatically. Of course Zuul also knows about these same job definitions; this is the job-name part of the tuple we said it put into gearman. The second part, somewhere to run the test, takes some more explaining. To the next point...

  5. Several cloud companies donate capacity in their clouds for OpenStack to run CI tests. Overall, this capacity is managed by a customized management tool called nodepool (you can see the details of this capacity at any given time by checking the nodepool configuration). Nodepool watches the gearman queue and sees what requests are coming out of Zuul. It looks at node-type of jobs in the queue (i.e. what platform the job has requested to run on) and decides what types of nodes need to start and which cloud providers have capacity to satisfy demand. Nodepool will start fresh virtual machines (from images built daily as described in the prior post), monitor their start-up and, when they're ready, put a new "assignment job" back into gearman with the details of the fresh node. One of the active Zuul launchers will pick up this assignment job and register the new node to itself.

  6. At this point, the Zuul launcher has what it needs to actually get jobs started. With a fresh node registered to it and waiting for something to do, the Zuul launcher can advertise its ability to consume one of the waiting jobs from the gearman queue. For example, if a ubuntu-trusty node is provided to the Zuul launcher, the launcher can now consume from gearman any job it knows about that is intended to run on an ubuntu-trusty node type. If you're looking at the launcher code, this is driven by the NodeWorker class; you can see this being created in response to an assignment via LaunchServer.assignNode. To actually run the job, where the "job hits the metal" as it were, the Zuul launcher will dynamically construct an Ansible playbook to run. This playbook is a concatenation of common setup and teardown operations along with the actual test scripts the job wants to run. Using Ansible to run the job means all the flexibility an orchestration tool provides is now available to the launcher. For example, there is a custom console streamer library that allows us to live-stream the console output for the job over a plain TCP connection, and there is the possibility to use projects like ARA for visualisation of CI runs. In the future, Ansible will allow for better coordination when running multiple-node testing jobs; after all, this is what orchestration tools such as Ansible are made for! While the Ansible run can be fairly heavyweight (especially when you're talking about launching thousands of jobs an hour), the system scales horizontally with more launchers able to consume more work easily. When checking your job results on logs.openstack.org you will see a _zuul_ansible directory now which contains copies of the inventory, playbooks and other related files that the launcher used to do the test run.

  7. Eventually, the test will finish. The Zuul launcher will put the result back into gearman, which Zuul will consume (log copying is interesting but a topic for another day). The testing node will be released back to nodepool, which destroys it and starts all over again; nodes are not reused and also have no sensitive details on them, as they are essentially publicly accessible. Zuul will wait for the results of all jobs for the change and post the result back to Gerrit; it either gives a positive vote or the dreaded negative vote if required jobs failed (it also handles merges to git, but that is also a topic for another day).

Work will continue within OpenStack Infrastructure to further enhance Zuul; including better support for multi-node jobs and "in-project" job definitions (similar to the https://travis-ci.org/ model); for full details see the spec.

5 April 2016

Ian Wienand: Image building in OpenStack CI

Also titled minimal images - maximal effort! The OpenStack Infrastructure Team manages a large continuous-integration system that provides the broad range of testing the OpenStack project requires. Tests are run thousands of times a day across every project, on multiple platforms and on multiple cloud-providers. There are essentially no manual steps in any part of the process, with every component being automated via scripting, a few home-grown tools and liberal doses of Puppet and Ansible. More importantly, every component resides in the public git trees right alongside every other OpenStack project, with contributions actively encouraged. As with any large system, technical debt can build up and start to affect stability and long-term maintainability. OpenStack Infrastructure can see some of this debt accumulating as more testing environments across more cloud-providers are being added to support ever-growing testing demands. Thus a strong focus of recent work has been consolidating testing platforms to be smaller, better defined and more maintainable. This post illustrates some of the background to the issues and describes how these new platforms are more reliable and maintainable.
OpenStack CI Overview Before getting into details, it's a good idea to get a basic big-picture conceptual model of how OpenStack CI testing works. If you look at the following diagram and follow the numbers with the explanation below, hopefully you'll have all the context you need. Overview of OpenStack CI
  1. The developer uploads their code to gerrit via the git-review tool. There is no further action required on their behalf and the developer simply waits for results.

  2. Gerrit provides a JSON-encoded "firehose" output of everything happening to it. New reviews, votes, updates and more all get sent out over this pipe. Zuul is the overall scheduler that subscribes itself to this information and is responsible for managing the CI jobs appropriate for each change.

  3. Zuul has a configuration that tells it what jobs to run for what projects. Zuul can do lots of interesting things, but for the purposes of this discussion we just consider that it puts the jobs it wants run into gearman for a Jenkins master to consume. gearman is a job-server; as they explain it "[gearman] provides a generic application framework to farm out work to other machines or processes that are better suited to do the work". Zuul puts into gearman basically a tuple (job-name, node-type) for each job it wants run, specifying the unique job name to run and what type of node it should be run on.

  4. A group of Jenkins masters are subscribed to gearman as workers. It is these Jenkins masters that will consume the job requests from the queue and actually get the tests running. However, Jenkins needs two things to be able to run a job: a job definition (what to actually do) and a slave node (somewhere to do it). The first part, what to do, is provided by job-definitions stored in external YAML files and processed by Jenkins Job Builder (jjb) into job configurations for Jenkins. Each Jenkins master gets these definitions pushed to it constantly by Puppet, thus each Jenkins master instance knows about all the jobs it can run automatically. Zuul also knows about these job definitions; this is the job-name part of the tuple we said it put into gearman. The second part, somewhere to run the test, takes some more explaining. To the next point...

  5. Several cloud companies donate capacity in their clouds for OpenStack to run CI tests. Overall, this capacity is managed by a customised orchestration tool called nodepool. Nodepool watches the gearman queue and sees what requests are coming out of Zuul. It looks at node-type of jobs in the queue and decides what types of nodes need to start and which cloud providers have capacity to satisfy demand. Nodepool will monitor the start-up of the virtual-machines and register the new nodes to the Jenkins master instances.

  6. At this point, the Jenkins master has what it needs to actually get jobs started. When nodepool registers a host to a Jenkins master as a slave, the Jenkins master can now advertise its ability to consume jobs. For example, if a ubuntu-trusty node is provided to the Jenkins master instance by nodepool, Jenkins can now consume from gearman any job it knows about that is intended to run on an ubuntu-trusty slave. Jenkins will run the job as defined in the job-definition on that host: ssh-ing in, running scripts, copying the logs and waiting for the result. (It is a gross oversimplification, but for the purposes of OpenStack CI, Jenkins is pretty much used as a glorified ssh/scp wrapper. Zuul Version 3, under development, is working to remove the need for Jenkins to be involved at all).

  7. Eventually, the test will finish. The Jenkins master will put the result back into gearman, which Zuul will consume. The slave will be released back to nodepool, which destroys it and starts all over again (slaves are not reused and also have no sensitive details on them, as they are essentially publicly accessible). Zuul will wait for the results of all jobs for the change and post the result back to Gerrit; it either gives a positive vote or the dreaded negative vote if required jobs failed (it also handles merges to git, but we'll ignore that bit for now).

In a nutshell, that is the CI work-flow that happens thousands-upon-thousands of times a day keeping OpenStack humming along.
Image builds So far we have glossed over how nodepool actually creates the images that it hands out for testing. Image creation, illustrated in step 8 above, contains a lot of important details. Firstly, what are these images and why build them at all? These images are where the "rubber hits the road": they are instantiated into the virtual-machines that will run DevStack, unit-testing or whatever else someone might want to test. The main goal is to provide a stable and consistent environment in which to run a wide-range of tests. A full OpenStack deployment results in hundreds of libraries and millions of lines of code all being exercised at once. The testing-images are right at the bottom of all this, so any instability or inconsistency affects everyone; leading to constant fire-fighting and major inconvenience as all forward-progress stops when CI fails. We want to support a wide number of platforms interesting to developers such as Ubuntu, Debian, CentOS and Fedora, and we also want to make it easy to handle new releases and add other platforms. We want to ensure this can be maintained without too much day-to-day hands-on work. Caching is a big part of the role of these images. With thousands of jobs going on every day, an occasional network blip is not a minor annoyance, but creates constant and difficult to debug failures. We want jobs to rely on as few external resources as possible so tests are consistent and stable. This means caching things like the git trees tests might use (OpenStack just broke the 1000 repository mark), VM images, packages and other common bits and pieces. Obviously a cache is only as useful as the data in it, so we build these images up every day to keep them fresh.
Snapshot images If you log into almost any cloud-provider's interface, they almost certainly have a range of pre-canned images of common distributions for you to use. At first, the base images for OpenStack CI testing came from what the cloud-providers had as their public image types. However, over time, there are a number of issues that emerge:
  1. No two images, even for the same distribution or platform, are the same. Every provider seems to do something "helpful" to the images which requires some sort of workaround.
  2. Providers rarely leave these images alone. One day you would boot the image to find a bunch of Python libraries pip-installed, or a mount-point moved, or base packages removed (all happened).
  3. Even if the changes are helpful, it does not make for consistent and reproducible testing if every time you run, you're on a slightly different base system.
  4. Providers don't have some images you want (like a latest Fedora), or have different versions, or different point releases. All update asynchronously whenever they get around to it.
So the original incarnations of OpenStack CI images were based on these public images. Nodepool would start one of these provider images and then run a series of scripts on it; these scripts would firstly try to work around any quirks to make the images look as similar as possible across providers, and then do the caching, set up things like authorized keys and finish other configuration tasks. Nodepool would then snapshot this prepared image and start instantiating VM's from these snapshots into the pool for testing. If you hear someone talking about a "snapshot image" in an OpenStack CI context, that's likely what they are referring to. Apart from the stability of the underlying images, the other issue you hit with this approach is that the number of images being built starts to explode when you take into account multiple providers and multiple regions. Even with just Rackspace and the (now defunct) HP Cloud we would end up creating snapshot images for 4 or 5 platforms across a total of about 8 regions, meaning anywhere up to 40 separate image builds happening daily (you can see how ridiculous it was getting in the logging configuration used at the time). It was almost a fait accompli that some of these would fail every day; nodepool can deal with this by reusing old snapshots, but this leads to an inconsistent and heterogeneous testing environment. Naturally there was a desire for something more consistent: a single image that could run across multiple providers in a much more tightly controlled manner.
Upstream-based builds Upstream distributions do provide "cloud-images", which are usually pre-canned .qcow2 format files suitable for uploading to your average cloud. So the diskimage-builder tool was put into use creating images for nodepool, based on these upstream-provided images. In essence, diskimage-builder uses a series of elements (each, as the name suggests, designed to do one thing) that allow you to build a completely customised image. It handles all the messy bits of laying out the image file, tries to be smart about caching large downloads and final things like conversion to qcow2 or vhd. nodepool has used diskimage-builder to create customised images based upon the upstream releases for some time. These are better, but still have some issues for the CI environment:
  1. You still really have no control over what does or does not go into the upstream base images. You don't notice a change until you deploy a new image based on an updated version and things break.
  2. The images still start with a fair amount of "stuff" on them. For example, cloud-init is a rather large Python program and has a fair few dependencies. These dependencies can both conflict with parts of OpenStack or end up tacitly hiding real test requirements (the test doesn't specify it, but the package is there as part of another base dependency; things then break when the base dependencies change). The whole idea of the CI is that (as much as possible) you're not making any assumptions about what is required to run your tests; you want everything explicitly included.
  3. An image that "works everywhere" across multiple cloud-providers is quite a chore. cloud-init hasn't always had support for config-drive and Rackspace's DHCP-less environment, for example. Providers all have their various different networking schemes or configuration methods which needs to be handled consistently.
If you were starting this whole thing again, things like LXC/Docker to keep "systems within systems" might come into play and help alleviate some of the packaging conflicts. Indeed they may play a role in the future. But don't forget that DevStack, the major CI deployment mechanism, was started before Docker existed. And there's tricky stuff with networking and Neutron going on. And things like iSCSI kernel drivers that containers don't support well. And you need to support Ubuntu, Debian, CentOS and Fedora. And you have hundreds of developers already relying on what's there. So change happens incrementally, and in the mean time, there is a clear need for a stable, consistent environment.
Minimal builds To this end, diskimage-builder now has a series of "minimal" builds that are really that: systems with essentially nothing on them. For Debian and Ubuntu this is achieved via debootstrap, for Fedora and CentOS we replicate this with manual installs of base packages into a clean chroot environment. We add on a range of important elements that make the image useful; for example, for networking, we have simple-init which brings up the network consistently across all our providers but has no dependencies to mess with the base system. If you check the elements provided by project-config you can see a range of specific elements that OpenStack Infra runs at each image build (these are actually specified in arguments to nodepool, see the config file, particularly the diskimages section). These custom elements do things like caching, using puppet to install the right authorized_keys files and setting up a few needed things to connect to the host. In general, you can see the logs of an image build provided by nodepool for each daily build. So now, each day at 14:14 UTC nodepool builds the daily images that will be used for CI testing. We have one image of each type that (theoretically) works across all our providers. After it finishes building, nodepool uploads the image to all providers (p.s. the process of doing this is so insanely terrible it spawned shade; this deserves many posts of its own) at which point it will start being used for CI jobs. If you wish to replicate this entire process, the build-image.sh script, run on an Ubuntu Trusty host in a virtualenv with diskimage-builder, will get you pretty close (let us know of any issues!).
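As a rough illustration of what such a minimal build looks like, a hedged diskimage-builder invocation is something along these lines (the element list is a simplified example, not the exact set OpenStack Infra uses; see project-config for that):
disk-image-create -o ubuntu-trusty-minimal ubuntu-minimal vm simple-init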
DevStack and bare nodes There are two major ways OpenStack projects test their changes:
  1. Running with DevStack, which brings up a small, but fully-functional, OpenStack cloud with the change-under-test applied. Generally tempest is then used to ensure the big-picture things like creating VM's, networks and storage are all working.
  2. Unit-testing within the project; i.e. what you do when you type tox -e py27 in basically any OpenStack project.
To support this testing, OpenStack CI ended up with the concept of bare nodes and devstack nodes.
  • A bare node was made for unit-testing. While tox has plenty of information about installing required Python packages into the virtualenv for testing, it doesn't know anything about the system packages required to build those Python packages. This means things like gcc and library -devel packages which many Python packages use to build bindings. Thus the bare nodes had an ever-growing and not well-defined list of packages that were pre-installed during the image-build to support unit-testing. Worse still, projects didn't really know their dependencies but just relied on their testing working with this global list that was pre-installed on the image.
  • In contrast to this, DevStack has always been able to bootstrap itself from a blank system to a working OpenStack deployment by ensuring it has the right dependencies installed. We don't want any packages pre-installed here because it hides actual dependencies that we want explicitly defined within DevStack; otherwise, when a user goes to deploy DevStack for their development work, things break because their environment differs slightly from the CI one. If you look at all the job definitions in OpenStack, by convention any job running DevStack has a dsvm in the job name; this referred to running on a "DevStack Virtual Machine" or a devstack node. As the CI environment has grown, we have more and more testing that isn't DevStack based (puppet apply tests, for example) that rather confusingly want to run on a devstack node because they do not want dependencies installed. While it's just a name, it can be difficult to explain!
Thus we ended up maintaining two node-types, where the difference between them is what was pre-installed on the host; and yes, the bare node had more installed than a devstack node, so it wasn't that bare at all!
Specifying Dependencies Clearly it is useful to unify these node types, but we still need to provide a way for the unit-test environments to have their dependencies installed. This is where a tool called bindep comes in. This tool gives project authors a way to specify their system requirements in a similar manner to the way their Python requirements are kept. For example, OpenStack has the concept of global requirements: those Python dependencies that are common across all projects so version skew becomes somewhat manageable. This project now has some extra information in the other-requirements.txt file, which lists the system packages required to build the Python packages in the global-requirements list. bindep knows how to look at these lists provided by projects and get the right packages for the platform it is running on. As part of the image-build, we have a cache-bindep element that can go through every project and build a list of the packages it requires. We can thus pre-cache all of these packages onto the images, knowing that they are required by jobs. This both reduces the dependency on external mirrors and improves job performance (as the packages are locally cached) but doesn't pollute the system by having everything pre-installed. Package installation can now happen the way we really should be doing it: as part of the CI job. There is a job-macro called install-distro-packages which a test can use to call bindep to install the packages specified by the project before the run. You might notice the script has a "fallback" list of packages if the project does not specify its own dependencies; this essentially replicates the environment of a bare node as we transition to projects more strictly specifying their system requirements. We can now start with a blank image, and all the dependencies to run the job can be expressed by and within the project, leading to a consistent and reproducible environment without any hidden dependencies. Several things have broken as part of removing bare nodes; this is actually a good thing because it means we have revealed areas where we were making assumptions in jobs about what the underlying platform provides. There's a few other job-macros that can do things like provide MySQL/Postgres instances for testing or set up other common job requirements. By splitting these types of things out from base-images we also improve the performance of jobs which don't waste time doing things like setting up databases for jobs that don't need it. As of this writing, the bindep work is new and still a work-in-progress. But the end result is that we have no more need for a separate bare node type to run unit-tests. This essentially halves the number of image-builds required and brings us to the goal of a single image for each platform running all CI.
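To give a flavour of what such a system-requirements list looks like, here is a hedged sketch of a few bindep-style entries (illustrative packages only; the real list lives in the other-requirements.txt file of the global-requirements project):
# build dependencies for common Python bindings, per packaging platform
gcc
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
libxml2-dev [platform:dpkg]
libxml2-devel [platform:rpm]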
Conclusion While dealing with multiple providers, image-types and dependency chains has been a great effort for the infra team, to everyone's credit I don't think the project has really noticed much going on underneath. OpenStack CI has transitioned to a situation where there is a single image type for each platform we test that deploys unmodified across all our providers and runs all testing environments equally. We have better insight into our dependencies and better tools to manage them. This leads to greatly decreased maintenance burden, better consistency and better performance; all great things to bring to OpenStack CI!

4 April 2016

Ian Wienand: Image building for OpenStack CI -- Minimal images, big effort

A large part of the OpenStack Infrastructure team's recent efforts has been focused on moving towards more stable and maintainable CI environments for testing.
OpenStack CI Overview Before getting into details, it's a good idea to get a basic big-picture conceptual model of how OpenStack CI testing works. If you look at the following diagram and follow the numbers with the explanation below, hopefully you'll have all the context you need. Overview of OpenStack CI
  1. The developer uploads their code to gerrit via the git-review tool. They wait.

  2. Gerrit provides a JSON-encoded "firehose" output of everything happening to it. New reviews, votes, updates and more all get sent out over this pipe. Zuul is the overall scheduler that subscribes itself to this information and is responsible for managing the CI jobs appropriate for each change.

  3. Zuul has a configuration that tells it what jobs to run for what projects. Zuul can do lots of interesting things, but for the purposes of this discussion we just consider that it puts the jobs it wants run into gearman for a Jenkins host to consume. gearman is a job-server; as they explain it "[gearman] provides a generic application framework to farm out work to other machines or processes that are better suited to do the work".

  4. A group of Jenkins hosts are subscribed to gearman as workers. It is these Jenkins hosts that will consume the job requests from the queue and actually get the tests running. Jenkins needs two things to be able to run a job -- a job definition (what to actually do) and a slave node (somewhere to do it). The first part -- what to do -- is provided by job-definitions stored in external YAML files and processed by Jenkins Job Builder (jjb) in to job configurations for Jenkins. Thus each Jenkins instance knows about all the jobs it might need to run. Zuul also knows about these job definitions, so you can see how we now have a mapping where Zuul can put a job into gearman saying "run test foo-bar-baz" and a Jenkins host can consume that request and know what to do. The second part -- somewhere to run the test -- takes some more explaining. To the next point...

  5. Several cloud companies donate capacity in their clouds for OpenStack to run CI tests. Overall, this capacity is managed by nodepool -- a customised orchestration tool. Nodepool watches the gearman queue and sees what requests are coming out of Zuul, and decides what type of capacity to provide and in what clouds to satisfy the outstanding job queue. Nodepool will start-up virtual-machines as required, and register those nodes to the Jenkins instances.

  6. At this point, Jenkins has what it needs to actually get jobs started. When nodepool registers a host to Jenkins as a slave, the Jenkins host can now advertise its ability to consume jobs. For example, if a ubuntu-trusty node is provided to the Jenkins instance by nodepool, Jenkins can now consume a job from the queue intended to run on an ubuntu-trusty host. It will run the job as defined in the job-definition -- ssh-ing into the host, running scripts, copying the logs and waiting for the result. (It is a gross oversimplification, but Jenkins is pretty much a glorified ssh/scp wrapper for OpenStack CI. Zuul Version 3, under development, is working to remove the need for Jenkins to be involved at all).

  7. Eventually, the test will finish. Jenkins will put the result back into gearman, which Zuul will consume. The slave will be released back to nodepool, which destroys it and starts all over again (slaves are not reused and also have no sensitive details on them, as they are essentially publicly accessible). Zuul will wait for the results of all jobs and post the result back to Gerrit and give either a positive vote or the dreaded negative vote if required jobs failed (it also handles merges to git, but we'll ignore that bit for now).

In a nutshell, that is the CI work-flow that happens thousands-upon-thousands of times a day keeping OpenStack humming along.
Image builds There is, however, another more asynchronous part of the process that hides a lot of details the rest of the system relies on. Illustrated in step 8 above, this is the management of the images that tests are being run upon. Above we said that a test runs on a ubuntu-trusty, centos, fedora or some other type of node, but glossed over where these images come from. Firstly, what are these images, and why build them at all? These images are where the "rubber hits the road" -- where DevStack, functional testing or whatever else someone might want to test is actually run for real. Caching is a big part of the role of these images. With thousands of jobs going on every day, an occasional network blip is not a minor annoyance, but creates constant and difficult to debug CI failures. We want the images that CI runs on to rely on as few external resources as possible so test runs are as stable as possible. This means caching all the git trees tests might use, VM images consumed by various tests and other common bits and pieces. Obviously a cache is only as useful as the data in it, so we build these images up every day to keep them fresh.
Provider images If you log into almost any cloud-provider's interface, they almost certainly have a range of pre-canned images of common distributions for you to use. At first, the base images for OpenStack CI testing came from what the cloud-providers had as their public image types. However, over time, there are a number of issues that emerge:
  1. No two images, even for the same distribution or platform, are the same. Every provider seems to do something "helpful" to the images which requires some sort of workaround.
  2. Providers rarely leave these images alone. One day you would boot the image to find a bunch of Python libraries pip-installed, or a mount-point moved, or base packages removed (all happened).
  3. Even if the changes are helpful, it does not make for consistent and reproducible testing if every time you run, you're on a slightly different base system.
  4. Providers don't have some images you want (like a latest Fedora), or have different versions, or different point releases. All update asynchronously whenever they get around to it.
So the original incarnation of building images was that nodepool would start one of these provider images, run a bunch of scripts on it to make a base-image (do the caching, set up keys, etc), snapshot it and then start putting VM's based on these images into the pool for testing (if you hear someone talking about a "snapshot image" in an OpenStack CI context, that's what it means). The first problem you hit here is that the number of images being built starts to explode when you take into account multiple providers and multiple regions. Even with just Rackspace and the (now defunct) HP Cloud we would end up creating snapshot images for 4 or 5 platforms across a total of about 8 regions -- meaning anywhere up to 40 separate image builds happening. It was almost a fait accompli that some images would fail every day -- nodepool can deal with this by reusing old snapshots; but this leads to an inconsistent and heterogeneous testing environment. OpenStack is like a gigantic Jenga tower, with a full DevStack deployment resulting in hundreds of libraries and millions of lines of code all being exercised at once. The testing images are right at the bottom of all this, and it doesn't take much to make the whole thing fall over (see points about providers not leaving images alone). This leads to constant fire-fighting and everyone annoyed as all CI stops. Naturally there was a desire for something much more consistent -- a single image that could run across multiple providers in a much more tightly controlled manner.
Upstream-based builds Upstream distributions do provide their "cloud-images", which are usually pre-canned .qcow2 format files suitable for uploading to your average cloud. So the diskimage-builder tool was put into use creating images for nodepool, based on these upstream-provided images. In essence, diskimage-builder uses a series of elements (each, as the name suggests, designed to do one thing) that allow you to build a completely customised image. It handles all the messy bits of laying out the image file, tries to be smart about caching large downloads and final things like conversion to qcow2 or vhd or whatever your cloud requires. nodepool has used diskimage-builder to create customised images based upon the upstream releases for some time. These are better, but still have some issues for the CI environment:
  1. You still really have no control over what does or does not go into the upstream base images. You don't notice a change until you deploy a new image based on an updated version and things break.
  2. The images still have a fair amount of "stuff" on them. For example cloud-init is a rather large Python program and has a fair few dependencies. These dependencies can both conflict with parts of OpenStack or end up tacitly hiding real test requirements (the test doesn't specify it, but the package is there as part of another base dependency -- things break when the base dependencies change). The whole idea of the CI is that (as much as possible) you're not making any assumptions about what is required to run your tests -- you want everything explicitly included. (If you were starting this whole thing again, things like Docker to keep "systems within systems" might come into play. Indeed they may be in the future. But don't forget that DevStack, the major CI deployment mechanism, was started before Docker existed. And there's tricky stuff with networking and Neutron etc going on).
  3. An image that "works everywhere" across multiple cloud-providers is quite a chore. cloud-init hasn't always had support for config-drive and Rackspace's DHCP-less environment, for example. Providers all have their various different networking schemes or configuration methods which needs to be handled consistently.
Minimal builds To this end, diskimage-builder now has a series of "minimal" builds that are really that -- systems with essentially nothing on them. For Debian and Ubuntu, this is achieved via debootstrap, for Fedora and CentOS we replicate this with manual installs of base packages into a clean chroot environment. We add on a range of important elements that make the image useful; for example, for networking, we have simple-init which brings up the network consistently across all our providers but has no dependencies to mess with the base system. If you check the elements provided by project-config you can see a range of specific elements that OpenStack Infra runs at each image build (these are actually specified in arguments to nodepool, see the config file, particularly the diskimages section). These custom elements do things like caching, using puppet to install the right authorized_keys files and setting up a few needed things to connect to the host. In general, you can see the logs of an image build provided by nodepool for each daily build. So now, each day at 14:00 UTC nodepool builds the daily images that will be used for CI testing. We have one image of each type that (theoretically) works across all our providers. After it finishes building, nodepool uploads the image to all providers (p.s. the process of doing this is so insanely terrible it spawned shade; this deserves many posts of its own) at which point it will start being used for CI jobs. If you wish to replicate this entire process, the build-image.sh script, run on an Ubuntu Trusty host in a virtualenv with diskimage-builder, will get you pretty close (let us know of any issues!).
Dependencies But guess what, there's more! Along the way, OpenStack CI ended up with the concept of bare nodes and devstack nodes. A bare node was one that was used for functional testing; i.e. what you do when you type tox -e py27 in basically any OpenStack project. The problem here is that tox has plenty of information about installing required Python packages into the virtualenv for testing; but it doesn't know anything about the system packages required to build the Python libraries. This means things like gcc and -devel packages which many Python libraries use to build library bindings. In contrast to this, DevStack has always been able to bootstrap itself from a blank system to a working OpenStack deployment, ensuring it has the right libraries, etc, installed to get everything working. If you look at all the job definitions, anything running DevStack has a dsvm in the job name; which referred to "DevStack Virtual Machine"; or basically specifying what was installed on the host (and yes, the bare node had more installed, so it wasn't that bare at all!). Thus we don't want packages pre-installed for DevStack, because it hides actual devstack dependencies that we want explicitly defined. But the bare nodes, used for functional testing, were different -- there was an ever-growing and not well-defined list of packages that were pre-installed on those nodes to make sure functional testing worked. In general, you don't want jobs relying on something like this; we want to be sure if jobs have a dependency, they require it explicitly. This is where a tool called bindep comes in. OpenStack has the concept of global requirements -- those Python dependencies that are common across all projects so version skew becomes somewhat manageable. This now has some extra information in the other-requirements.txt file, which lists the system packages required to build the Python packages in the requirements list. bindep knows how to look at this and get the right packages for the platform it is running on. Indeed -- remember how it was previously mentioned we want to minimise dependencies on external resources at runtime? Well, we can pre-cache all of these packages onto the images, knowing that they are likely to be required by jobs. How do we get the packages installed? The way we really should be doing it -- as part of the CI job. There is a macro called install-distro-packages which uses bindep to install those packages as required by the global-requirements list. The result -- no more need for the bare node type to run functional tests! In all cases we can start with essentially a blank image and all the dependencies to run the job are expressed by and within the job -- leading to a consistent and reproducible environment. Several things have broken as part of removing bare nodes -- this is actually a good thing, because it means we have revealed areas where we were making assumptions in jobs about what the underlying platform provides; issues that get fixed by thinking about and ensuring we have the correct dependencies for bringing up jobs. There's a few other macros there that do things like provide MySQL/Postgres instances or set up other common job requirements. By splitting these out we also improve the performance of jobs which now only bring in the dependencies they need -- we don't waste time doing things like setting up databases for jobs that don't need it.
Conclusion

While dealing with multiple providers, image-types and dependency chains has been a great effort for the infra team, to everyone's credit I don't think the project has really noticed much going on underneath. OpenStack CI has transitioned to a situation where there is a single image type for each platform we test that deploys unmodified across all our providers. We have better insight into our dependencies and better tools to manage them. This leads to greatly decreased maintenance burden, better consistency and better performance; all great things to bring to OpenStack CI!

30 March 2016

Ian Wienand: Durable photo workflow

Ever since my kids were born I have accumulated thousands of digital happy-snaps and I have finally gotten to a point where I'm quite happy with my work-flow. I have always been extremely dubious of using any sort of external all-in-one solution to managing my photos; so many things seem to shut down, cease development or disappear, all leaving you to figure out how to migrate to the next latest thing (e.g. Picasa shutting down). So while there is nothing complicated or even generic about them, there are a few things in my photo-scripts repo that might help others who like to keep a self-contained archive. Firstly I have a simple script to copy the latest photos from the SD card (i.e. those new since the last copy -- this is obviously very camera specific). I then split by date so I have a simple flat directory layout with each week's photos in it. With the price of SD cards and my rate of filling them up, I don't even bother wiping them at this point, but just keep them in the safe as a backup. For some reason I have a bit of a thing about geotagging all the photos so I know where I took them. Certainly some cameras do this today, but mine does not. I have a two-pronged approach; I have a geotag script and then a small website easygeotag.info which quickly lets me translate a point on Google Maps to exiv2 command-line syntax. Since I take a lot of photos in the same place, the script can store points by name in a small file sourced by the script. Adding comments to the photos is done with perhaps the lesser-known cousin of EXIF -- IPTC. Some time ago I wrote Python bindings for libiptcdata and they have been working just fine ever since. Debian's python-iptcdata comes with an inbuilt script to set title and caption, which is easily wrapped. What I like about this is that my photos are in a simple directory layout, with all metadata embedded within the actual image files in very standardised formats that should be readable anywhere I choose to host them. For sharing, I then upload to Flickr. I used to have a command-line script for this, but have found the web uploader works even better these days. It reads the IPTC data for titles and comments, and gets the geotag info for nice map displays. I manually corral them into albums, and the Flickr "guest pass" is perfect for then sharing albums to friends and family without making them jump through hoops to register on a site to get access to the photos, or worse, hosting them myself. I consider Flickr a cache, because (even though I pay) I expect it to shut down or turn evil at any time. Interestingly, their AI tagging is often quite accurate, and I imagine it will only get better. This is nice extra metadata that you don't have to spend time on yourself. The last piece has always been the "hit by a bus" component of all this. Can anyone figure out access to all these photos if I suddenly disappear? I've tried many things here -- at one point I was using rdiff-backup to sync encrypted bundles up to AWS, for example; but I very clearly found the problem in that when I forgot to keep the key safe I couldn't decrypt any of my backups (let alone anyone else figuring all this out). Finally, Google Nearline seems to be just what I want. It's off-site, redundant and the price is right; but more importantly I can very easily give access to the backup bucket to anyone with a Google address, who can then just hit a website to download the originals from the bucket (I left the link with my other "hit by a bus" bits and pieces).
Of course what they then do with this data is their problem, but at least I feel like they have a chance. This even has an rsync-like interface in the client, so I can quickly upload the new stuff from my home NAS (where I keep the photos in a RAID0). I've been doing this now for 350 weeks and have worked through some 25,000 photos. I used to get an album up every week, but as the kids get older and we're closer to family I now do it in batches about once a month. I do wonder if my kids will ever be interested in tagged and commented photos with pretty much their exact location from their childhood ... I doubt it, but it's nice to feel like I have a good chance of still having them if they do.
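For reference, the exiv2 geotagging mentioned above boils down to modify commands along these lines; this is only a sketch, with made-up coordinates and file name, and the exact set of GPS tags my script writes may differ:
# Embed a (made-up) GPS position into a photo using exiv2 modify commands
exiv2 \
  -M"set Exif.GPSInfo.GPSLatitudeRef S" \
  -M"set Exif.GPSInfo.GPSLatitude 35/1 17/1 0/1" \
  -M"set Exif.GPSInfo.GPSLongitudeRef E" \
  -M"set Exif.GPSInfo.GPSLongitude 149/1 8/1 0/1" \
  photo.jpg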

15 January 2016

Ian Wienand: Australia, ipv6 and dd-wrt

It seems that other than Internode, no Australian ISP has any details at all about native IPv6 deployment. Locally I am on Optus HFC, which I believe has been sold to the NBN, who I believe have since discovered that it is not quite what they thought it was. i.e. I think they have more problems than rolling out IPv6 and I won't hold my breath. So the only other option is to use a tunnel of some sort, and it seems there is really only one option with local presence via SixXS. There are other options, notably He.net, but they do not have Australian tunnel-servers. SixXS is the only one I could find with a tunnel in Sydney. So first sign up for an account there. The process was rather painless and my tunnel was provided quickly. After getting this, I got dd-wrt configured and working on my Netgear WNDR3700 V4. Here's my terse guide, cobbled together from other bits and pieces I found. I'm presuming you have a recent dd-wrt build that includes the aiccu tool to create the tunnel, and are pretty familiar with logging into it, etc. Firstly, on dd-wrt make sure you have JFFS2 turned on for somewhere to install scripts. Go Administration, JFFS2 Support, Internal Flash Storage, Enabled. Next, add the aiccu config file to /jffs/etc/aiccu.conf
# AICCU Configuration
# Login information
username USERNAME
password PASSWORD
# Protocol and server listed on your tunnel
protocol tic
server tic.sixxs.net
# Interface names to use
ipv6_interface sixxs
# The tunnel_id to use
# (only required when there are multiple tunnels in the list)
#tunnel_id <your tunnel id>
# Be verbose?
verbose false
# Daemonize?
daemonize true
# Require TLS?
requiretls true
# Set default route?
defaultroute true
Now you can add a script to bring up the tunnel and interface to /jffs/config/sixxs.ipup (make sure you make it executable) where you replace your tunnel address in the ip commands.
# wait until time is synced
while [ "$(date +%Y)" -eq 1970 ]; do
sleep 5
done
# check if aiccu is already running
if [ -n "$(ps | grep etc/aiccu | grep -v grep)" ]; then
aiccu stop
sleep 1
killall aiccu
fi
# start aiccu
sleep 3
aiccu start /jffs/etc/aiccu.conf
sleep 3
ip -6 addr add 2001:....:....:....::/64 dev br0
ip -6 route add 2001:....:....:....::/64 dev br0
sleep 5
#### BEGIN FIREWALL RULES ####
WAN_IF=sixxs
LAN_IF=br0
#flush tables
ip6tables -F
#define policy
ip6tables -P INPUT DROP
ip6tables -P FORWARD DROP
ip6tables -P OUTPUT ACCEPT
# Input to the router
# Allow all loopback traffic
ip6tables -A INPUT -i lo -j ACCEPT
#Allow unrestricted access on internal network
ip6tables -A INPUT -i $LAN_IF -j ACCEPT
#Allow traffic related to outgoing connections
ip6tables -A INPUT -i $WAN_IF -m state --state RELATED,ESTABLISHED -j ACCEPT
# for multicast ping replies from link-local addresses (these don't have an
# associated connection and would otherwise be marked INVALID)
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-reply -s fe80::/10 -j ACCEPT
# Allow some useful ICMPv6 messages
ip6tables -A INPUT -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-reply -j ACCEPT
# Forwarding through from the internal network
# Allow unrestricted access out from the internal network
ip6tables -A FORWARD -i $LAN_IF -j ACCEPT
# Allow some useful ICMPv6 messages
ip6tables -A FORWARD -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type echo-reply -j ACCEPT
#Allow traffic related to outgoing connections
ip6tables -A FORWARD -i $WAN_IF -m state --state RELATED,ESTABLISHED -j ACCEPT
Now you can reboot, or run the script, and it should bring the tunnel up; you should be correctly firewalled such that packets get out, but nobody can get in. Back in the web interface, you can now enable IPv6 with Setup, IPV6, Enable. You leave "IPv6 Type" as Native IPv6 from ISP. Then I enabled Radvd and added a custom config in the text-box to get DNS working with Google DNS on hosts with:
interface br0
{
  AdvSendAdvert on;
  prefix 2001:....:....:....::/64
  {
  };
  RDNSS 2001:4860:4860::8888 2001:4860:4860::8844
  {
  };
};
(again, replace the prefix with your own) That is pretty much it; at this point, you should have an IPv6 network and it's most likely that all your network devices will "just work" with it. I got full scores on the IPv6 test sites on a range of devices. Unfortunately, even a geographically close tunnel still really kills latency; compare these two traceroutes:
$ mtr -r -c 1 google.com
Start: Fri Jan 15 14:51:18 2016
HOST: jj                          Loss%   Snt   Last   Avg  Best  Wrst StDev
1.  -- 2001:....:....:....::      0.0%     1    1.4   1.4   1.4   1.4   0.0
2.  -- gw-163.syd-01.au.sixxs.ne  0.0%     1   12.0  12.0  12.0  12.0   0.0
3.  -- ausyd01.sixxs.net          0.0%     1   13.5  13.5  13.5  13.5   0.0
4.  -- sixxs.sydn01.occaid.net    0.0%     1   13.7  13.7  13.7  13.7   0.0
5.  -- 15169.syd.equinix.com      0.0%     1   11.5  11.5  11.5  11.5   0.0
6.  -- 2001:4860::1:0:8613        0.0%     1   14.1  14.1  14.1  14.1   0.0
7.  -- 2001:4860::8:0:79a0        0.0%     1  115.1 115.1 115.1 115.1   0.0
8.  -- 2001:4860::8:0:8877        0.0%     1  183.6 183.6 183.6 183.6   0.0
9.  -- 2001:4860::1:0:66d6        0.0%     1  196.6 196.6 196.6 196.6   0.0
10. -- 2001:4860:0:1::72d         0.0%     1  189.7 189.7 189.7 189.7   0.0
11. -- kul01s07-in-x09.1e100.net  0.0%     1  194.9 194.9 194.9 194.9   0.0
$ mtr -4 -r -c 1 google.com
Start: Fri Jan 15 14:51:46 2016
HOST: jj                          Loss%   Snt   Last   Avg  Best  Wrst StDev
1. -- gateway                    0.0%     1    1.3   1.3   1.3   1.3   0.0
2. -- 10.50.0.1                  0.0%     1   11.0  11.0  11.0  11.0   0.0
3. -- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
4. -- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
5. -- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
6. -- riv4-ge4-1.gw.optusnet.co  0.0%     1   12.1  12.1  12.1  12.1   0.0
7. -- 198.142.187.20             0.0%     1   10.4  10.4  10.4  10.4   0.0
When you watch what is actually using IPv6 (the ipvfoo plugin for Chrome is pretty cool; it shows you which requests are going where), it's mostly just traffic to really big sites (Google/Google Analytics, Facebook, YouTube, etc.) who have figured out IPv6. Since these are exactly the type of places that have made efforts to get caching as close as possible to you (Google's mirror servers are within Optus' network, for example), you're really shooting yourself in the foot going around them using an external tunnel. The other thing is that I'm often hitting IPv6 mirrors and downloading larger things for work (distro updates, git clones, image downloads, etc.), which is slower and wastes someone else's bandwidth for really no benefit. So while it's pretty cool to have an IPv6 address (and a fun experiment) I think I'm going to turn it off. One positive was that after running with it for about a month, nothing has broken -- which suggests that most consumer-level gear in a typical house (phones, laptops, TVs, smart-watches, etc.) is either ready or ignores it gracefully. Bring on native IPv6!

22 December 2015

Ian Wienand: Platform bootstrap for OpenStack Infrastructure

This is a terse guide on bootstrapping virtual-machine images for OpenStack infrastructure, with the goal of adding continuous-integration support for new platforms. It might also be handy if you are trying to replicate the upstream CI environment. It covers deployment to Rackspace for testing; Rackspace is one of the major providers of capacity for the OpenStack Infrastructure project, so it makes a good place to start when building up your support story. Firstly, get an Ubuntu Trusty environment to build the image in (other environments, like CentOS or Fedora, probably work -- but take this step to minimise differences from what the upstream automated machinery does). You want a fair bit of memory, and plenty of disk-space. The tool used for building virtual-machine images is diskimage-builder. In short, it takes a series of elements, which are really just scripts run in different phases of the build process. I'll describe building a Fedora image, since that's been my focus lately. We will use a fedora-minimal element -- this means the system is bootstrapped from nothing inside a chroot environment, before eventually being turned into a virtual-machine image (contrast this to the fedora element, which bases itself off the qcow2 images provided by the official Fedora cloud project). Thus you'll need a few things installed on the Ubuntu host to deal with bootstrapping a Fedora chroot
apt-get install yum yum-utils python-lzma
You will hit stuff like that python-lzma dependency on this road-less-travelled -- technically it is a bug that yum packages on Ubuntu don't depend on it; without it you will get strange yum errors about unsupported compression. At this point, you can bootstrap your diskimage-builder environment. You probably want diskimage-builder from git, and then build up a virtualenv for your support bits and pieces.
git clone git://git.openstack.org/openstack/diskimage-builder
virtualenv dib_env
. dib_env/bin/activate
pip install dib-utils
dib-utils is a small part of diskimage-builder that is split-out; don't think too much about it. While diskimage-builder is responsible for the creation of the basic image, there are a number of elements provided by the OpenStack project-config repository that bootstrap the OpenStack environment. This does a lot of stuff, including caching all git trees (so CI jobs aren't cloning everything constantly) and running puppet setup.
git clone git://git.openstack.org/openstack-infra/project-config
There's one more trick required for building the VHD images that Rackspace requires; make sure you install the patched vhd-util as described in the script help. At this point, you can probably build an image. Here's something approximating what you will want to do
break=after-error \
TMP_DIR=~/tmp \
ELEMENTS_PATH=~/project-config/nodepool/elements \
DIB_DEV_USER_PASSWORD="password"  DIB_DEV_USER_PWDLESS_SUDO=1 \
DISTRO=23 \
./bin/disk-image-create -x --no-tmpfs -t vhd \
   fedora-minimal vm devuser simple-init \
   openstack-repos puppet nodepool-base node-devstack
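To break some of this down a little, here is my reading of the knobs used above (check the diskimage-builder and project-config documentation for the authoritative descriptions):
# break=after-error   - drop into a shell inside the build chroot if a step
#                       fails, so you can poke around
# ELEMENTS_PATH       - where to find the extra elements from project-config,
#                       in addition to diskimage-builder's built-in ones
# DIB_DEV_USER_*      - configure the devuser element to create a login user
#                       with a password and passwordless sudo
# DISTRO=23           - appears to select the Fedora release to bootstrap
# -x                  - trace the build commands for debugging
# --no-tmpfs          - build on disk rather than in a tmpfs
# -t vhd              - also emit the VHD output that Rackspace needs
# fedora-minimal ...  - the list of elements to include in the image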
This goes and does its thing; it will take about 20 minutes. Depending on how far your platform diverges from the existing support, it may require a lot of work to get everything working so you can get an image out the other side. To see a rough example of what should be happening, see the logs of the official image builds that happen for a variety of platforms. At some point, you should get a file image.vhd which is now ready to be deployed. The only reasonable way to do this is with shade. You can quickly grab this into the virtualenv we created before
pip install shade
Now you'll need to setup a clouds.yaml file to give yourself the permissions to upload the image. It should look something like
clouds:
  rax:
    profile: rackspace
    auth:
       username: your_rax_username
       password: your_rax_password
       project_id: your_rax_accountnum
    regions:
    - IAD
You should know your user-name and password (whatever you log into the website with), and when you log in to Rackspace your project_id value is listed in the drop-down box labelled with your username as Account #. shade has no UI as such, so a simple script will do the upload.
import shade
shade.simple_logging(debug=True)
cloud = shade.openstack_cloud(cloud='rax')
image = cloud.create_image('image-name', filename='image.vhd', wait=True)
Now wait -- this will also take a while. Even after upload it takes a fair while to process (you will see the shade debug output looping around some glance calls seeing if it is ready). If everything works, the script should return and you should be able to see the new image in the output of nova image-list. At this point, you're ready to try booting! One caveat is that the Rackspace web interface does not seem to give you the option to boot with a configuration drive available to the host, essential for simple-init to bring up the network. So boot via the API with something like
nova boot --flavor=2 --image=image-uuid --config-drive 1 test-image
This should build and boot your image! This will allow you to open a whole new door of debugging to get your image booting correctly. You can now iterate by rebuilding/uploading/booting as required. Note these are pretty big images, uploaded as broken-up swift files, so I find swift delete --all helpful to reset between builds (obviously, I have nothing else in swift that I want to keep). The Rackspace Java-based console UI is rather annoying; it cuts itself off every time the host reboots. This makes it quite difficult to catch the early bootup, or modify grub options via the boot-loader, etc. You might need to fiddle timeout options, etc. in the image build. If you've managed to get your image booting and listening on the network, you're a good deal of the way towards having your platform supported in upstream OpenStack CI. At this point, you likely want to peruse the nodepool configuration and get an official build happening. Once that is up, you can start the process of adding jobs that use your platform to test! Don't worry, there's plenty more that can go wrong from here -- but you're on the way! OpenStack infra is a very dynamic environment with many changes in progress; so in general, #openstack-infra on freenode is going to be a great place to start looking for help.

5 March 2015

Ian Wienand: Acurite 02032CAUDI Weather Station

I found an Acurite Weather Center 02032CAUDI at Costco for $99, which seemed like a pretty good deal. It includes the "colour" display panel and a 5-in-1 remote sensor that includes temperature, wind-speed and direction, humidity and rain gauge. The colour in the display is really just a fancy background sticker with the usual calculator-style liquid-crystal display in front. It does seem that for whatever reason the viewing angle is extremely limited; even a little off centre and it becomes very dim. It has an inbuilt backlight that is quite bright; it is either off, on (3 levels) or in "auto" mode, which dims it to the lowest level at certain hours. Hacking in a proximity sensor might be a fun project. The UI is OK; it shows indoor and outdoor temperature/humidity, wind-speed/rain and is able to show you highs and lows with a bit of scrolling. I was mostly interested in its USB output features. After a bit of fiddling I can confirm I've got it connected up to Meteobridge running on a Dlink DIR-505 and reporting to Weather Underground. One caveat is that you do need to plug the weather-station into a powered USB hub, rather than directly into the DIR-505; I believe because the DIR-505 can only talk directly to USB 2.0 devices and not older 1.5 devices like the weather station. Another small issue is that the Meteobridge license is 65, which is not insignificant. Of course with some effort you can roll your own, such as described in this series, which is fun if you're looking for a project. Luckily I had a mounting place that backed onto my small server cupboard, so I could easily run the cables through the wall to power and the DIR-505. Without this the cables might end up a bit of a mess. Combined with the fairly limited viewing angle afforded, finding somewhere practical to put the indoor unit might be one of the hardest problems. Mounting the outdoor unit was fine, but mine is a little close to the roof-line so I'm not sure the wind-speed and direction are as accurate as if it were completely free-standing (I think official directions for wind-speed are something like free-standing 10m in the air). It needs to face north, both for the wind-direction and so the included solar-panel that draws air into the temp/humidity sensor is running as much as possible (it works without this, but it's more accurate with the fan). One thing is that it needs to be mounted fairly level for the rain-gauge; it includes a small bubble-level on the top to confirm this. Firstly you'll probably find that most mount points you thought were straight actually aren't! Since the bubble is on the top, if you want to actually see it you need to be above it (obviously), which may not be possible if you're standing on a ladder and mounting it over your head. This may be a situation that inspires a very legitimate use of a selfie-stick. It's a fun little device and fairly hackable for an overall reasonable price; I recommend it.

Ian Wienand: On VMware and GPL

I do not believe any of the current reporting around the announced case has accurately described the issue, which I see as a much more subtle question of GPL use across API layers. Of course I don't know what the real issue is, because the case is sealed and I have no inside knowledge. I do have some knowledge of the vmkernel, however, and what I read does not match with what I know. An overview of ESXi is shown below (figure: overview of vmkernel and vmkapi). There is no question that ESXi uses a lot of Linux kernel code and drivers. The question as I see it is more around the interface. The vmkernel provides a well-described API known as vmkapi. You can write drivers directly to this API; indeed some do. You can download an SDK. A lot of Linux code has been extracted into vmkLinux; this is a shim between Linux drivers and the vmkapi interface. The intent here is to provide an environment where almost unmodified Linux drivers can interface to the proprietary vmkernel. This means vendors don't have to write two drivers, they can re-use their Linux ones. Of course, large parts of various Linux sub-systems' APIs are embedded in here. But the intent is that this code is modified to communicate to the vmkernel via the exposed vmkapi layer. It is conceivable that you could write a vmkWindows or vmkOpenBSD and essentially provide a shim-wrapper for drivers from other operating systems too. vmkLinux and all the drivers are GPL, and released as such. I do not think there could be any argument there. But they interface to vmkapi which, as stated, is an available API but part of the proprietary kernel. So, as I see it, this is a much more subtle question than "did VMware copy-paste a bunch of Linux code into their kernel". It goes to where the GPL crosses API boundaries and what is considered a derived work. If nothing else, increased clarity around that point as a result of this enforcement would be good for everyone, I think.

8 February 2015

Ian Wienand: Netgear CG3100D-2 investigation

The Netgear CG3100D-2 is the default cable-modem you get for Telstra Cable, at least it was at one time. Having retired it after changing service providers, I wanted to see if it was somewhat able to be re-purposed. In short, its hackability is low. The first thing was to check out the Netgear Open Source page to see if the source had anything interesting. There is some source, but honestly when you dig into the platform code and see things like kernel/linux/arch/mips/bcm963xx/setup.c:
/***************************************************************************
 * C++ New and delete operator functions
 ***************************************************************************/
/* void *operator new(unsigned int sz) */
void *_Znwj(unsigned int sz)
{
    return( kmalloc(sz, GFP_KERNEL) );
}
/* void *operator new[](unsigned int sz) */
void *_Znaj(unsigned int sz)
{
    return( kmalloc(sz, GFP_KERNEL) );
}
...
there's a bit of a red-flag that this is not the cleanest code in the world (I guess it interfaces with some sort of cross-platform SDK written in some sort of C++). So next we can open it up, where it turns out there are two separate UARTs as shown in the following image.
UART connections on Netgear CG3100D 2BPAUS
One of these is for the bootloader and eCos environment, and the other seems to be connected to the Linux side. A copy of the boot-logs for the bootloader, eCos and Linux doesn't show anything particularly interesting. The Linux boot does identify itself as Linux version 2.6.30-V2.06.05u while the available source lists its version as 2.6.30-1.0.5.83.mp2, so it's questionable whether the source matches whatever firmware has made it onto the modem. We do see that this identifies as a BCM338332 which seems to be one of the many sub-models of the BCM3383 SoC cable-modem solution. There is an OpenWrt wiki page that indicates support is limited. Both Linux and eCos boot to a login prompt where all the usual default combinations of login/passwords fail. So my next thought was to try and get to the firmware via the bootloader, which has a simple interface
BCM338332 TP0 346890
Reset Switch - Low GPIO-18 50ms
MemSize:            128 M
Chip ID:     BCM3383G-B0
BootLoader Version: 2.4.0alpha14R6T Pre-release Gnu spiboot dual-flash reduced DDR drive linux
Build Date: Mar 24 2012
Build Time: 14:04:50
SPI flash ID 0x012018, size 16MB, block size 64KB, write buffer 256, flags 0x0
Dual flash detected.  Size is 32MB.
parameter offset is 49944
Signature/PID: a0e8
Image 1 Program Header:
   Signature: a0e8
     Control: 0005
   Major Rev: 0003
   Minor Rev: 0000
  Build Time: 2013/4/18 04:01:11 Z
 File Length: 3098751 bytes
Load Address: 80004000
    Filename: CG3100D_2BPAUS_V2.06.02u_130418.bin
         HCS: 1e83
         CRC: b95f4172
Found image 1 at offset 20000
Image 2 Program Header:
   Signature: a0e8
     Control: 0005
   Major Rev: 0003
   Minor Rev: 0000
  Build Time: 2013/10/17 02:33:29 Z
 File Length: 3098198 bytes
Load Address: 80004000
    Filename: CG3100D_2BPAUS_V2.06.05u_131017.bin
         HCS: 2277
         CRC: a6c0fd23
Found image 2 at offset 800000
Image 3 Program Header:
   Signature: a0e8
     Control: 0105
   Major Rev: 0002
   Minor Rev: 0017
  Build Time: 2013/10/17 02:22:30 Z
 File Length: 8277924 bytes
Load Address: 84010000
    Filename: CG3100D_2BPAUS_K2630V2.06.05u_131017.bin
         HCS: 157e
         CRC: 57bb0175
Found image 3 at offset 1000000
Enter '1', '2', or 'p' within 2 seconds or take default...
. .
Board IP Address  [0.0.0.0]:           192.168.2.10
Board IP Mask     [255.255.255.0]:
Board IP Gateway  [0.0.0.0]:
Board MAC Address [00:10:18:ff:ff:ff]:
Internal/External phy? (e/i/a)[a]
Switch detected: 53125
ProbePhy: Found PHY 0, MDIO on MAC 0, data on MAC 0
Using GMAC0, phy 0
Enet link up: 1G full
Main Menu:
==========
  b) Boot from flash
  g) Download and run from RAM
  d) Download and save to flash
  e) Erase flash sector
  m) Set mode
  s) Store bootloader parameters to flash
  i) Re-init ethernet
  p) Print flash partition map
  r) Read memory
  w) Write memory
  j) Jump to arbitrary address
  X) Erase all of flash except the bootloader
  z) Reset
Flash Partition information:
Name           Size           Offset
=====================================
bootloader   0x00010000     0x00000000
image1       0x007d0000     0x00020000
image2       0x007c0000     0x00800000
linux        0x00800000     0x01000000
linuxapps    0x00600000     0x01800000
permnv       0x00010000     0x00010000
dhtml        0x00200000     0x01e00000
dynnv        0x00040000     0x00fc0000
vennv        0x00010000     0x007f0000
The "read memory" seems to give you one byte at a time and I'm not certain it actually works. So I think the next step is solder some leads to dump out the firmware from the flash-chip directly, which is on the underside of the board. At that point, I imagine the passwords would be easily found in the image and you might then be able to leverage this into some sort of further hackability. If you want a challenge and have a lot of time on your hands, this might be your platform but practically I think the best place for this is the recycling bin.

28 November 2014

Ian Wienand: rstdiary

I find it very useful to spend 5 minutes a day to keep a small log of what was worked on, major bugs or reviews and a general small status report. It makes rolling up into a bigger status report easier when required, or handy as a reference before you go into meetings etc. I was happily using an etherpad page until I couldn't save any more revisions and the page got too long and started giving javascript timeouts. For a replacement I wanted a single file as input with no boilerplate to aid in back-referencing and adding entries quickly. It should be formatted to be future-proof, as well as being emacs, makefile and git friendly. Output should be web-based so I can refer to it easily and point people at it when required, but it just has to be rsynced to public_html with zero setup. rstdiary will take a flat RST based input file and chunk it into some reasonable looking static-HTML that looks something like this. It's split by month with some minimal navigation. Copy the output directory somewhere and it is done. It might also serve as a small example of parsing and converting RST nodes where it does the chunking; unfortunately the official documentation on that is "to be completed" and I couldn't find anything like a canonical example, so I gathered what I could from looking at the source of the transformation stuff. As the license says, the software is provided "as is" without warranty! So if you've been thinking "I should keep a short daily journal in a flat-file and publish it to a web-server but I can't find any software to do just that" you now have one less excuse.

10 August 2014

Ian Wienand: Finding out if you're a Rackspace instance

Different hosting providers do things slightly differently, so it's sometimes handy to be able to figure out where you are. Rackspace is based on Xen, and their provided images should have the xenstore-ls command available. xenstore-ls vm-data will give you handy provider and even region fields to let you know where you are.
function is_rackspace {
  if [ ! -f /usr/bin/xenstore-ls ]; then
      return 1
  fi
  /usr/bin/xenstore-ls vm-data | grep -q "Rackspace"
}
if is_rackspace; then
  echo "I am on Rackspace"
fi
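If you want the actual values rather than a yes/no answer, the same tree can be read directly; note that the vm-data/provider_data paths below are an assumption based on the xenstore-ls output described above, so verify them on your own instance:
# Read the provider and region directly from xenstore (paths assumed, see above)
xenstore-read vm-data/provider_data/provider
xenstore-read vm-data/provider_data/region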
Other reading about how this works:

8 August 2014

Ian Wienand: Bash arithmetic evaluation and errexit trap

In the "traps for new players" category:
count=0
things="0 1 0 0 1"
for i in $things;
do
   if [ $i == "1" ]; then
       (( count++ ))
   fi
done
echo "Count is $ count "
Looks fine? I've probably written this many times. There's a small gotcha:
((expression))
The expression is evaluated according to the rules described below under ARITHMETIC EVALUATION. If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1. This is exactly equivalent to let "expression".
When you run this script with -e or errexit enabled -- probably because the script has become too big to be reliable without it -- count++ evaluates to 0 (post-increment returns the old value), so per the above the (( )) returns a failing status and the script stops. A definite trap to watch out for!
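A couple of ways to keep the increment without tripping errexit; a minimal sketch, assuming you otherwise want set -e on for the whole script:
set -e
count=0
(( ++count ))           # pre-increment: expression evaluates to 1, status 0
count=$(( count + 1 ))  # plain assignment: exit status is always 0
(( count++ )) || true   # or explicitly ignore the arithmetic status
echo "Count is ${count}"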

19 June 2014

Ian Wienand: ip link up versus status UP

ip link show has an up filter which the man page describes as "display running interfaces". The output also shows the state, e.g.
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether e8:03:9a:b6:46:b3 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
  link/ether c4:85:08:2a:6e:3a brd ff:ff:ff:ff:ff:ff
It is not described what the difference is between these two. The state output is derived from IFLA_OPERSTATE as per this little bit in ip/ipaddress.c in iproute2
if (tb[IFLA_OPERSTATE])
  print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
As per operstates.txt, an operational state of UP means that the interface is up and can be used to send packets. In contrast, ip link show up runs a different filter
if (filter.up && !(ifi->ifi_flags&IFF_UP))
  return 0;
Thus only interfaces that have the IFF_UP flag set are shown. This means the interface is administratively up, but that does not mean it is available and ready to send packets. This may make a difference if you're trying to use the output of ip link in any sort of automation where you have to probe the available network interfaces and know whether packets are actually going to get out or not.
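So if your automation really cares about interfaces that can actually pass traffic, checking the operational state directly is safer than relying on the up filter; a minimal sketch reading the kernel's operstate files:
# List interfaces whose operational state (not just the IFF_UP flag) is "up"
for iface in /sys/class/net/*; do
    if [ "$(cat "$iface/operstate" 2>/dev/null)" = "up" ]; then
        echo "$(basename "$iface") is operationally up"
    fi
done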

Ian Wienand: Non-stick tortilla technique

I have never had much success with tortilla making, generally finding the fragile masa would stick to everything and tear easily. However I was listening to the Cook's Illustrated podcast where they mentioned that cooking a pizza on a stone with silicone-paper makes no difference over doing it on the stone directly. For some reason I had just never thought of cooking it with the silicone-paper on; I'd always frustrated myself trying to separate the pressed tortilla in various ways. So here's my technique:
  1. Mix the masa as on the packet; 2 cups to 2-half cups water
  2. Leave for 20 minutes or so
  3. Use plastic-wrap on the top of the tortilla press
  4. Cut a silicone-paper square for the bottom
  5. Ball the masa and press
  6. Now the plastic-wrap will peel easily off the top of the uncooked tortilla
  7. Take the silicone-paper and put the tortilla down on the pan/hotplate
  8. Give a slight press with the spatula around the edges
  9. After about 20 seconds, the silicone-paper should peel off the tortilla fairly easily.
  10. repeat!
I've tried mixing in suet, oil and copha and none of them are worth the extra calories over just plain masa in my opinion and don't help with sticking anyway. Enjoy!

11 March 2014

Ian Wienand: Tenvis IP391W meta-page

Recently I purchased a Tenvis IP391W-HD camera. I would be unlikely to recommend it. The price is certainly right and the picture quality is quite good. The Android and iPhone apps do work for watching the stream live. However, the interface is terrible and almost useless without Internet Explorer. There is an RTSP stream (rtsp://admin:password@ip) which VLC seems able to handle, but mplayer does not. The recording format (.h264) is not viewable by VLC or mplayer, and all I could find is a Windows .exe to convert the files to an .avi. The motion detection gets troubled by the dark. It would really only be useful for something permanently well-lit. It did send me emails via gmail. I have got it recording to an NFS server, but I don't have a lot of confidence in its reliability. I think I have it configured to record in 3600-second blocks (given the interface, it's hard to tell if I've set it up to record to the network, or to the internal flash, etc.), but it seems to intersperse 60-minute recordings with random small recordings. Given the whole idea of a security camera is to record the unexpected, you want a lot of confidence you're actually recording, which you don't get with this. You can see below it recorded three hour-long blocks, then started going a little crazy...
-rw-r--r-- 1 nobody nogroup  69M Mar 11 01:25 0-003035.v264
-rw-r--r-- 1 nobody nogroup  69M Mar 11 02:25 0-013049.v264
-rw-r--r-- 1 nobody nogroup  69M Mar 11 03:26 0-023103.v264
-rw-r--r-- 1 nobody nogroup 5.9M Mar 11 03:31 0-033117.v264
-rw-r--r-- 1 nobody nogroup 1.5M Mar 11 03:40 0-034350.v264
-rw-r--r-- 1 nobody nogroup  17M Mar 11 04:02 0-035259.v264
-rw-r--r-- 1 nobody nogroup 306K Mar 11 04:10 0-041548.v264
-rw-r--r-- 1 nobody nogroup 4.9M Mar 11 04:23 0-042457.v264
There is a support forum, where I found the following files scattered in various posts. From what I can tell, they are the latest as of this writing. I can confirm they work with my IP391W-HD, which the system tells me is GM8126 hardware and came with firmware 1.2.8.3.
  • 1.3.3.3.pk2 - firmware (b56f211a569fb03a37d13b706c660dcb)
  • web.pk2 - a UI update that includes dropbox support. This is really for the model that has pan and tilt, so those buttons don't work. (0e42e42bd6f8034e87dcd443dcc3594d)
  • V264ToAVIen.exe - converts the output to an AVI file that mplayer will play (with some complaints) (9c5a858aa454fed4a0186cf244c0d234)
www.modern.ie offers free limited-time Windows VMs which will work to upload this firmware. Just make sure you use a bridged network in the VM; I'm guessing the firmware ActiveX control tells the camera to TFTP the data from it, which doesn't work via NAT. Somewhat worryingly, you can telnet to it and get a login prompt (TASTECH login). So it has a built-in backdoor you can't disable. There have been some efforts to hack the device. leecher@dose.0wnz.at did an excellent job reverse engineering the .pk2 format and writing tenvis_pack.c (no license, I'm generously assuming public domain). I used this to recreate the firmware above with a telnet daemon listening with a shell on port 2525 (no password, just telnet to it).
It's interesting to poke around, but it seems like the whole thing is really driven by a binary called ipc8126
/ # ipc8126 --help
*** TAS-Tech IPCAM/DVS
*** Version: 1.3.3.3
*** Release date: 2013-08-05 15:48:32
In general, I'd say hackability is quite low. Warning: any of the above might turn your camera into a paperweight. It worked for me, but that's all I can say...

6 March 2014

Ian Wienand: Skipping pages with django.core.paginator

Here's a little snippet for compressing the length of long Django pagination by just showing context around the currently selected page. What I wanted was the first few and last few pages always selectable, with some context pages around the currently selected page (image: example of skip markers in paginator). If that's what you're looking for, some variation on the below may be of use. In this approach, you build up a list of pages similar to the paginator object's page_range, but with only the relevant pages and the skip-markers identified.
from django.core.paginator import Paginator
import unittest
class Pages:
    def __init__(self, objects, count):
        self.pages = Paginator(objects, count)
    def pages_to_show(self, page):
        # pages_wanted stores the pages we want to see, e.g.
        #  - first and second page always
        #  - two pages before selected page
        #  - the selected page
        #  - two pages after selected page
        #  - last two pages always
        #
        # Turning the pages into a set removes duplicates for edge
        # cases where the "context pages" (before and after the
        # selected) overlap with the "always show" pages.
        pages_wanted = set([1,2,
                            page-2, page-1,
                            page,
                            page+1, page+2,
                            self.pages.num_pages-1, self.pages.num_pages])
        # The intersection with the page_range trims off the invalid
        # pages outside the total number of pages we actually have.
        # Note that includes invalid negative and >page_range "context
        # pages" which we added above.
        pages_to_show = set(self.pages.page_range).intersection(pages_wanted)
        pages_to_show = sorted(pages_to_show)
        # skip_pages will keep a list of page numbers from
        # pages_to_show that should have a skip-marker inserted
        # after them.  For flexibility this is done by looking for
        # anywhere in the list that doesn't increment by 1 over the
        # last entry.
        skip_pages = [ x[1] for x in zip(pages_to_show[:-1],
                                         pages_to_show[1:])
                       if (x[1] - x[0] != 1) ]
        # Each page in skip_pages should be followed by a skip-marker
        # sentinel (e.g. -1).
        for i in skip_pages:
            pages_to_show.insert(pages_to_show.index(i), -1)
        return pages_to_show
class TestPages(unittest.TestCase):
    def runTest(self):
        objects = [x for x in range(0,1000)]
        p = Pages(objects, 10)
        self.assertEqual(p.pages_to_show(0),
                         [1, 2, -1, 99, 100])
        self.assertEqual(p.pages_to_show(1),
                         [1,2,3,-1,99,100])
        self.assertEqual(p.pages_to_show(2),
                         [1,2,3,4,-1,99,100])
        self.assertEqual(p.pages_to_show(3),
                         [1,2,3,4,5,-1,99,100])
        self.assertEqual(p.pages_to_show(4),
                         [1,2,3,4,5,6,-1,99,100])
        self.assertEqual(p.pages_to_show(5),
                         [1,2,3,4,5,6,7,-1,99,100])
        self.assertEqual(p.pages_to_show(6),
                         [1,2,-1,4,5,6,7,8,-1,99,100])
        self.assertEqual(p.pages_to_show(7),
                         [1,2,-1,5,6,7,8,9,-1,99,100])
        self.assertEqual(p.pages_to_show(50),
                         [1,2,-1,48,49,50,51,52,-1,99,100])
        self.assertEqual(p.pages_to_show(93),
                         [1,2,-1,91,92,93,94,95,-1,99,100])
        self.assertEqual(p.pages_to_show(94),
                         [1,2,-1,92,93,94,95,96,-1,99,100])
        self.assertEqual(p.pages_to_show(95),
                         [1,2,-1,93,94,95,96,97,-1,99,100])
        self.assertEqual(p.pages_to_show(96),
                         [1,2,-1,94,95,96,97,98,99,100])
        self.assertEqual(p.pages_to_show(97),
                         [1,2,-1,95,96,97,98,99,100])
        self.assertEqual(p.pages_to_show(98),
                         [1,2,-1,96,97,98,99,100])
        self.assertEqual(p.pages_to_show(99),
                         [1,2,-1,97,98,99,100])
        self.assertEqual(p.pages_to_show(100),
                         [1,2,-1,98,99,100])
if __name__ == '__main__':
    unittest.main()
Then somehow pass pages_to_show through to your view (below I added it to the paginator object passed) and use a template along the lines of
<ul class="pagination">
 {% if pages.has_previous %}
  <li><a href="foo.html?page={{ pages.previous_page_number }}">&laquo;</a></li>
 {% else %}
  <li class="disabled"><a href="#">&laquo;</a></li>
 {% endif %}
 {% for page in pages.pages_to_show %}
   {% if page == -1 %}
  <li class="disabled"><a href="#">&hellip;</a></li>
   {% elif page == pages.number %}
  <li class="active"><a href="#">{{ page }}</a></li>
   {% else %}
  <li><a href="foo.html?page={{ page }}">{{ page }}</a></li>
   {% endif %}
 {% endfor %}
 {% if pages.has_next %}
  <li><a href="foo.html?page={{ pages.next_page_number }}">&raquo;</a></li>
 {% else %}
  <li class="disabled"><a href="#">&raquo;</a></li>
 {% endif %}
</ul>
