Search Results: "geoff"

9 September 2023

Dirk Eddelbuettel: RcppFarmHash 0.0.3 on CRAN: Small Update

A minor maintenance release of the RcppFarmHash package is now on CRAN as version 0.0.3. RcppFarmHash wraps the Google FarmHash family of hash functions (written by Geoff Pike and contributors) that are used for example by Google BigQuery for the FARM_FINGERPRINT digest. This release farms out the conversion to the integer64 add-on type in R to the new package RcppInt64 released a few days ago, and adds some minor maintenance to continuous integration and the like. The brief NEWS entry follows:

Changes in version 0.0.3 (2023-09-09)
  • Rely on new RcppInt64 package and its header for conversion
  • Minor updates to continuous integration and README.md
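A minimal usage sketch (the input strings are arbitrary, and it assumes the package's exported farmhash64() function, which returns the integer64 values mentioned above):

$ Rscript -e 'RcppFarmHash::farmhash64(c("tempus", "fugit"))'
# one 64-bit integer64 fingerprint per input string, as used by BigQuery's FARM_FINGERPRINT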

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

2 June 2023

Matt Brown: Calling time on DNSSEC: The costs exceed the benefits

I'm calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon, and now I feel compelled to more completely explain my thinking: For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don't bother signing your zones. The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn, so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list. It's a clear pattern. DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC these incidents will tend to be prolonged and out of your control to remediate in a timely fashion. The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS, which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not. In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability. In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it's the prudent choice that will result in better overall security outcomes for your users. Ignore DNSSEC and invest the time and resources you would have spent deploying it in improving your TLS key and certificate management. Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with.
We can't avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.
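For reference, a quick way to check whether a given zone is currently signed, using standard dig queries (the zone name here is purely illustrative):

$ dig +short example.nz DS        # a DS record in the parent zone means the delegation is signed
$ dig +dnssec example.nz SOA      # RRSIG records, and the 'ad' flag from a validating resolver, show validation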

But DNSSEC will save us from the evil CA ecosystem! Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which as explained above are minimal compared to what TLS provides), but the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications. Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features. Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled. Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner. While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:
  • Free/cheap issuance from a number of different certificate authorities.
  • Regular, automated issuance/renewal via the ACME protocol.
  • Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.
  • Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.
  • The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.
DNSSEC with DANE can match the TLS ecosystem on the first benefit (up front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today's TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chances of detection. Finally, in terms of overall trust in the roots of the system, the CA/Browser forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability for any single actor (say a nefarious government) to subvert the system. The DNS root has a well-established, transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability. We've moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today. That's not to say that TLS is perfect, but given where we're at, we'll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.
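For concreteness, a DANE TLSA record is published and queried like any other DNS record, so its integrity rests entirely on DNSSEC validation (the name below is illustrative):

$ dig +short _443._tcp.www.example.com TLSA
# a typical answer has the form "3 1 1 <sha-256 digest>": usage, selector and matching type, then the certificate association data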

But TLS is not ubiquitous for non-HTTP applications The arguments above are most compelling when applied to the web-based HTTP-oriented ecosystem which has driven most of the TLS improvements we've seen to date. Non-HTTP protocols are lagging in adoption of many of the improvements and best practices TLS has on the web. Some claim this need to provide a solution for non-HTTP, non-web applications provides a motivation to continue pushing DNSSEC deployment. I disagree, I think it provides a motivation to instead double-down on moving those applications to TLS. TLS as the new TCP. The problem is that the costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you are intending to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don't want or need DNSSEC's protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability of your websites. I don't see how that equation can ever balance, particularly when you evaluate it against the much lower costs of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved. If you're still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I'm not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist. DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it, but in a world where DNS itself is relying on TLS and therefore has eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment, the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.

OK, but better software or more careful operations can reduce DNSSEC's cost Some see the current DNSSEC costs simply as teething problems that will reduce as the software and tooling matures to provide more automation of the risky processes, and operational teams learn from their mistakes or opt to simply transfer the risk by outsourcing the management and complexity to larger providers to take care of. I don't find these arguments compelling. We've already had 15+ years to develop improved software for DNSSEC without success. What's changed that we should expect a better outcome this year or next? Nothing. Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short-term, but eventually when the time comes to upgrade the software, migrate between providers or change registrars the debt will come due and incidents will occur. The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that. After 15+ years of trying, I think it's worth considering that combining cryptography, caching and distributed consensus, some of the most fundamental and complex computer science problems, into a slow-moving and hard to evolve low-level infrastructure protocol while appropriately balancing security, performance and reliability appears to be beyond our collective ability. That doesn't have to be the end of the world; the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter-example - perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack. Ideally secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I'm out. I would rather opt to remain with the simpler yet insecure DNS protocol and compensate for its shortcomings at higher transport or application layers, where experience shows we are able to more rapidly improve and develop our security capabilities.

Summing up For the vast majority of domains and use-cases there is simply no net benefit to deploying DNSSEC in 2023. I'd even go so far as to say that if you've already signed your zones, you should (carefully) move them back to being unsigned - you'll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you. The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements here carries far more return than deployment of DNSSEC. For TLDs, like .nz, whose outage triggered this post, DNSSEC is not going anywhere and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review so that the learnings and improvements this incident can provide are fully realised by .nz and other TLD operators stuck with the unenviable task of trying to safely operate DNSSEC.

Postscript After taking a few days to draft and edit this post, I've just stumbled across a presentation from the well respected Geoff Huston at last week's RIPE86 meeting. I've only had time to skim the slides (video here) - they don't seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition. Probably worth a read/watch!

1 November 2022

Jonathan Dowland: Halloween playlist 2022

I hope you had a nice Halloween! I've collected together some songs that I've enjoyed over the last couple of years that loosely fit a theme: ambient, instrumental, experimental, industrial, dark, disconcerting, etc. I've prepared a Spotify playlist of most of them, but not all. The list is inline below as well, with many (but not all) tracks linking to Bandcamp, if I could find them there. This is a bit late, sorry. If anyone listens to something here and has any feedback I'd love to hear it. (If you are reading this on an aggregation site, it's possible the embeds won't work. If so, click through to my main site.) Spotify playlist: https://open.spotify.com/playlist/3bEvEguRnf9U1RFrNbv5fk?si=9084cbf78c364ac8 Some sources:
  1. Via Stuart Maconie's Freak Zone
  2. Via Mary Anne Hobbs
  3. Via Lose yourself with
  4. Soma FM - Doomed (Halloween Special)

29 July 2022

Bits from Debian: New Debian Developers and Maintainers (May and June 2022)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

2 August 2021

Dirk Eddelbuettel: RcppFarmHash 0.0.2: Maintenance

A minor maintenance release of the new package RcppFarmHash, first released in version 0.0.1 a week ago, is now on CRAN as version 0.0.2. RcppFarmHash wraps the Google FarmHash family of hash functions (written by Geoff Pike and contributors) that are used for example by Google BigQuery for the FARM_FINGERPRINT digest. This release adds a #define which was needed on everybody's favourite CRAN platform to not attempt to include a missing header endian.h. With this added #define all is well, as we can already tell from looking at the CRAN status where the three machines maintained by you-may-know-who have already built the package. The others will follow over the next few days. I also tweeted about the upload with a screenshot demonstrating an eight-minute passage from upload to acceptance with the added #ThankYouCRAN tag to say thanks for very smooth and fully automated processing at their end. The very brief NEWS entry follows:

Changes in version 0.0.2 (2021-08-02)
  • On SunOS, set endianness to not error on #include <endian.h>
  • Add badges and installation notes to README as package is on CRAN

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

27 July 2021

Dirk Eddelbuettel: RcppFarmHash 0.0.1: New CRAN Package

A new package RcppFarmHash is now on CRAN in an inaugural version 0.0.1. RcppFarmHash wraps the Google FarmHash family of hash functions (written by Geoff Pike and contributors) that are used for example by Google BigQuery for the FARM_FINGERPRINT digest. The package was prepared and uploaded yesterday afternoon, and to my surprise already on CRAN this (early) morning when I got up. So here is another #ThankYouCRAN for very smooth operations. The very brief NEWS entry follows:

Changes in version 0.0.1 (2021-07-25)
  • Initial version and CRAN upload

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

13 September 2020

Russell Coker: Setting Up a Dell T710 Server with DRAC6

I've just got a Dell T710 server for LUV and I've been playing with the configuration options. Here's a list of what I've done and recommendations on how to do things. I decided not to try to write a step by step guide to doing stuff as the situation doesn't work for that. I think that a list of how things are broken and what to avoid is more useful. BIOS Firstly with a Dell server you can upgrade the BIOS from a Linux root shell. Generally when a server is deployed you won't want to upgrade the BIOS (why risk breaking something when it's working), so before deployment you probably should install the latest version. Dell provides a shell script with encoded binaries in it that you can just download and run; it's ugly but it works. The process of loading the firmware takes a long time (a few minutes) with no progress reports, so it's best to plug both PSUs in and leave it alone. At the end of loading the firmware a hard reboot will be performed, so upgrading your distribution while doing the install is a bad idea (Debian is designed to not lose data in this situation so it's not too bad). IDRAC IDRAC is the Integrated Dell Remote Access Controller. By default it will listen on all Ethernet ports, get an IP address via DHCP (using a different Ethernet hardware address to the interface the OS sees), and allow connections. Configuring it to be restrictive as to which ports it listens on may be a good idea (the T710 has 4 ports built in so having one reserved for management is usually OK). You need to configure a username and password for IDRAC that has administrative access in the BIOS configuration. Web Interface By default IDRAC will run a web server on the IP address it gets from DHCP; you can connect to that from any web browser that allows ignoring invalid SSL keys. Then you can use the username and password configured in the BIOS to login. IDRAC 6 on the PowerEdge T710 recommends IE 6. To get an SSL cert that matches the name you want to use (and doesn't give browser errors) you have to generate a CSR (Certificate Signing Request) on the DRAC; the only field that matters is the CN (Common Name), the rest have to be something that Letsencrypt will accept. Certbot has the option --config-dir /etc/letsencrypt-drac to specify an alternate config directory; the SSL key for DRAC should be entirely separate from the SSL key for other stuff. Then use the --csr option to specify the path of the CSR file. When you run letsencrypt the file name of the output file you want will be in the format *_chain.pem. You then have to upload that to IDRAC to have it used. This is a major pain for the lifetime of letsencrypt certificates. Hopefully a more recent version of IDRAC has Certbot built in. When you access the RAC via ssh (see below) you can run the command racadm sslcsrgen to generate a CSR that can be used by certbot. So it's probably possible to write expect scripts to get that CSR, run certbot, and then put the SSL certificate back. I don't expect to use IDRAC often enough to make it worth the effort (I can tell my browser to ignore an outdated certificate), but if I had dozens of Dells I'd script it. SSH The web interface allows configuring ssh access which I strongly recommend doing. You can configure ssh access via password or via ssh public key. For ssh access set TERM=vt100 on the host to enable backspace as ^H. Something like TERM=vt100 ssh root@drac. Note that changing certain other settings in IDRAC such as enabling Smartcard support will disable ssh access.
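A hedged sketch of the certbot side of that certificate workflow (the CSR file name and the authenticator are illustrative and will depend on your setup):

$ certbot certonly --standalone --config-dir /etc/letsencrypt-drac --csr drac.csr
# certbot writes the chained certificate as a *_chain.pem file in the current directory;
# upload that file to IDRAC via the web interface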
There is a limit to the number of open sessions for managing IDRAC; when you ssh to the IDRAC you can run racadm getssninfo to get a list of sessions and racadm closessn -i NUM to close a session. The closessn command takes a -a option to close all sessions but that just gives an error saying that you can't close your own session because of programmer stupidity. The IDRAC web interface also has an option to close sessions. If you get to the limits of both ssh and web sessions due to network issues then you presumably have a problem. I couldn't find any documentation on how the ssh host key is generated. I Googled for the key fingerprint and didn't get a match, so there's a reasonable chance that it's unique to the server (please comment if you know more about this). Don't Use Graphical Console The T710 is an older server and runs IDRAC6 (IDRAC9 is the current version). The Java based graphical console access doesn't work with recent versions of Java. The Debian package icedtea-netx has the javaws command for running the .jnlp file for the console. By default the web browser won't run this; you download the .jnlp file and pass that as the first parameter to the javaws program, which then downloads a bunch of Java classes from the IDRAC to run. One error I got with Java 11 was "Exception in thread Timer-0 java.util.ServiceConfigurationError: java.nio.charset.spi.CharsetProvider: Provider sun.nio.cs.ext.ExtendedCharsets could not be instantiated"; Google didn't turn up any solutions to this. Java 8 didn't have that problem but had a connection failed error that some people reported as being related to the SSL key, but replacing the SSL key for the web server didn't help. The suggestion of running a VM with an old web browser to access IDRAC didn't appeal. So I gave up on this. Presumably a Windows VM running IE6 would work OK for this. Serial Console Fortunately IDRAC supports a serial console. Here's a page summarising serial console setup for DRAC [1]. Once you have done that, put console=tty0 console=ttyS1,115200n8 on the kernel command line and Linux will send the console output to the virtual serial port. To access the serial console from remote you can ssh in and run the RAC command console com2 (there is no option for using a different com port). The serial port seems to be unavailable through the web interface. If I was supporting many Dell systems I'd probably set up an ssh to JavaScript gateway to provide web based console access. It's disappointing that Dell didn't include this. If you disconnect from an active ssh serial console then the RAC might keep the port locked, then any future attempts to connect to it will give the following error:
/admin1-> console com2
console: Serial Device 2 is currently in use
So far the only way I've discovered to get console access again after that is the command racadm racreset. If anyone knows of a better way please let me know. As an aside, having racreset be so close to racresetcfg (which resets the configuration to default and requires a hard reboot to configure it again) seems like a really bad idea. Host Based Management
deb http://linux.dell.com/repo/community/ubuntu xenial openmanage
The above apt sources.list line allows installing Dell management utilities (Xenial is old but they work on Debian/Buster). Probably the packages srvadmin-storageservices-cli and srvadmin-omacore will drag in enough dependencies to get it going. Here are some useful commands:
# show hardware event log
omreport system esmlog
# show hardware alert log
omreport system alertlog
# give summary of system information
omreport system summary
# show versions of firmware that can be updated
omreport system version
# get chassis temp
omreport chassis temps
# show physical disk details on controller 0
omreport storage pdisk controller=0
RAID Control The RAID controller is known as PERC (PowerEdge Raid Controller); the Dell web site has an rpm package of the perccli tool to manage the RAID from Linux. This is statically linked and appears to have different versions of libraries used to make it. The command perccli show gives an overview of the configuration, but the command perccli /c0 show (to give detailed information on controller 0) SEGVs and the kernel logs a "vsyscall attempted with vsyscall=none" message. Here's an overview of the vsyscall emulation issue [2]. Basically I could add vsyscall=emulate to the kernel command line and slightly reduce security for the system to allow system calls from perccli that are called from old libc code to work, or I could run code from a dubious source as root. Some versions of IDRAC have a racadm raid command that can be run from a ssh session to perform RAID administration remotely; mine doesn't have that. As an aside, the help for the RAC system doesn't list all commands and the Dell documentation is difficult to find, so blog posts from other sysadmins are the best source of information. I have configured IDRAC to have all of the BIOS output go to the virtual serial console over ssh so I can see the BIOS prompt me for PERC administration, but it didn't accept my key presses when I tried to do so. In any case this requires downtime and I'd like to be able to change hard drives without rebooting. I found vsyscall_trace on Github [3], it uses the ptrace interface to emulate vsyscall on a per process basis. You just run vsyscall_trace perccli and it works! Thanks to Geoffrey Thomas for writing this! Here are some perccli commands:
# show overview
perccli show
# help on adding a vd (RAID)
perccli /c0 add help
# show controller 0 details
perccli /c0 show
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0
When a disk is added to a PERC controller about 525MB of space is used at the end for RAID metadata. So when you create a RAID-0 with a single device as in the above example all disk data is preserved by default except for the last 525MB. I have tested this by running a BTRFS scrub on a disk from another system after making it into a RAID-0 on the PERC.

10 September 2016

Sylvain Le Gall: Release of OASIS 0.4.7

I am happy to announce the release of OASIS v0.4.7. OASIS is a tool to help OCaml developers to integrate configure, build and install systems in their projects. It should help to create standard entry points in the source code build system, allowing external tools to analyse projects easily. This tool is freely inspired by Cabal which is the same kind of tool for Haskell. You can find the new release here and the changelog here. More information about OASIS in general on the OASIS website. Pull request for inclusion in OPAM is pending. Here is a quick summary of the important changes: Features: This version contains a lot of changes and is the achievement of a huge amount of work. The addition of OMake as a plugin is a huge progress. The overall work has been targeted at making OASIS more library like. This is still a work in progress but we made some clear improvement by getting rid of various side effects (like the requirement of using "chdir" to handle the "-C", which leads to propagating ~ctxt everywhere and designing OASISFileSystem). I would like to thank again the contributors for this release: Spiros Eliopoulos, Paul Snively, Jeremie Dimino, Christopher Zimmermann, Christophe Troestler, Max Mouratov, Jacques-Pascal Deplaix, Geoff Shannon, Simon Cruanes, Vladimir Brankov, Gabriel Radanne, Evgenii Lepikhin, Petter Urkedal, Gerd Stolpmann and Anton Bachin.
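As a rough illustration of those standard entry points, a project carrying an _oasis file is typically driven like this (a sketch of the conventional OASIS workflow, not specific to this release):

$ oasis setup                 # generate setup.ml, configure and Makefile from the _oasis file
$ ocaml setup.ml -configure
$ ocaml setup.ml -build
$ ocaml setup.ml -install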

30 August 2016

Daniel Stender: My work for Debian in August

Here's again a little list of my humble off-time contributions I'm happy to add to the large amount of work we're completing all together each month. Then there is one more "new in Debian" (meaning: "new in unstable") announcement. First, the uploads (a few of them are from July): New packages: Sponsored uploads: Requested resp. suggested for packaging: New in Debian: Lasagne (deep learning framework) Now that the mathematical expression compiler Theano is available in Debian, deep learning frameworks resp. toolkits which have been built on top of it can become available within Debian, too (like Blocks, mentioned before). Theano is a general computing engine of its own which has been developed with a focus on machine learning resp. neural networks, and features its own declarative tensor language. The toolkits which have been built upon it vary in how much they abstract the bare features of Theano, whether they are "thick" or "thin" so to speak. When the abstraction gets higher you gain more end user convenience, up to the level that you have the architectural components of neural networks available for combination like in a lego box, while the more complicated things which are going on "under the hood" (like how the networks are actually implemented) are hidden. The downside is, thick abstraction layers usually make it difficult to implement novel features like custom layers or loss functions. So more experienced users and specialists might seek out the lower abstraction toolkits, where you have to think more in terms of Theano. I've got an initial package of Keras in experimental (1.0.7-1), it runs (only a Python 3 package is available so far) but needs some more work (e.g. building the documentation with mkdocs). Keras is a minimalistic, highly modular DNN library inspired by Torch1. It has a clean, rather easy API for experimenting and fast prototyping. It can also run on top of Google's TensorFlow, and we're going to have it ready for that, too. Lasagne follows a different approach. It's, like Keras and Blocks, a Python library to create and train multi-layered artificial neural networks in/on Theano for applications like image recognition resp. classification, speech recognition, image caption generation or other purposes like style transfers from paintings to pictures2. It abstracts Theano as little as possible, and could be seen rather as an extension or an add-on than an abstraction3. Therefore, knowledge of how things work in Theano is needed to make full use of this piece of software. With the new Debian package (0.1+git20160728.8b66737-1)4, the whole required software stack (the corresponding Theano package, NumPy, SciPy, a BLAS implementation, and the nvidia-cuda-toolkit and NVIDIA kernel driver to carry out computations on the GPU5) could be installed most conveniently just by a single apt-get install python{,3}-lasagne command6. If wanted, with the documentation package lasagne-doc for offline use (no running around in remote airports seeking for a WiFi spot), either in the Python 2 or the Python 3 branch, or both flavours altogether7. While others have to spend a whole weekend gathering, compiling and installing the needed libraries, you can grab yourself a fresh cup of coffee. These are the advantages of a fully integrated system (subliminal message, as always: desktop users, switch to Linux!). When the installation of packages has completed, the MNIST example of Lasagne could be used for a quick check if the whole library stack works properly8:
$ THEANO_FLAGS=device=gpu,floatX=float32 python /usr/share/doc/python-lasagne/examples/mnist.py mlp 5
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN 5005)
Loading data...
Downloading train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz
Downloading t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz
Building model and compiling functions...
Starting training...
Epoch 1 of 5 took 2.488s
  training loss:        1.217167
  validation loss:      0.407390
  validation accuracy:      88.79 %
Epoch 2 of 5 took 2.460s
  training loss:        0.568058
  validation loss:      0.306875
  validation accuracy:      91.31 %
The example on how to train a neural network on the MNIST database of handwritten digits is refined (it also provides --help) and explained in detail in the Tutorial section of the documentation in /usr/share/doc/lasagne-doc/html/. Very good starting points are also the IPython notebooks that are available from the tutorials by Eben Olson9 and Geoffrey French on the PyData London 201610. There you have Theano basics, examples for employing convolutional neural networks (CNN) and recurrent neural networks (RNN) for a range of different purposes, how to use pre-trained networks for image recognition, etc.

  1. For a quick comparison of Keras and Lasagne with other toolkits, see Alex Rubinsteyn's PyData NYC 2015 presentation on using LSTM (long short term memory) networks on varying length sequence data like Grimm's fairy tales (https://www.youtube.com/watch?v=E92jDCmJNek 27:30 sq.)
  2. https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer
  3. Great introduction to Theano and Lasagne by Eben Olson on the PyData NYC 2015: https://www.youtube.com/watch?v=dtGhSE1PFh0
  4. The package is currently "freelancing" in collab-maint; setting up a deep learning packaging team within Debian is at the stage of discussion.
  5. Only available for amd64 and ppc64el.
  6. You would need "testing" as package source in /etc/apt/sources.list to install it from the archive at the present time (I have had that for years, but whether Debian testing can be advised as a productive system is going to be discussed elsewhere), but it's coming up for Debian 9. The cuda-toolkit and pycuda are in the non-free section of the archive, thus non-free (mostly used in combination with contrib) must be added to main. Plus, the Theano packages only suggest them, to keep Theano in main, so --install-suggests is needed to pull them automatically with the same command, or they must be given explicitly.
  7. For dealing with Theano in Debian, see this previous blog posting
  8. Like suggested in the guideline From Zero to Lasagne on Ubuntu 14.04. cuDNN isn't available as official Debian package yet, but could be downloaded as a .deb package after registration at https://developer.nvidia.com/cudnn. It integrates well out of the box.
  9. https://github.com/ebenolson/pydata2015
  10. https://github.com/Britefury/deep-learning-tutorial-pydata2016, video: https://www.youtube.com/watch?v=DlNR1MrK4qE

26 July 2016

Norbert Preining: TUG 2016 Day 1 Routers and Reading

The first day of the real conference started with an excellent overview of what one can do with TeX, spanning from traditional scientific journal styles to generating router configurations for cruise ships.
All this was crowned with an invited talk by Kevin Larson from Microsoft's typography department on how to support reading comprehension. Pavneet Aurora Opening: Passport to the TeX canvas Pavneet, our never-sleeping host and master of organization, opened the conference with a very philosophical introduction, touching upon a wide range of topics ranging from Microsoft, Twitter to the beauty of books, pages, and type. I think at some point he even mentioned TeX, but I can't remember for sure. His words put up a very nice and all-inclusive stage, a community that is open to all kinds of influences without any disregard or prejudice. Let us hope that it reflects reality. Thanks Pavneet. Geoffrey Poore Advances in PythonTeX Our first regular talk was by Geoffrey reporting on recent advances in PythonTeX, a package that allows including python code in your TeX document. Starting with an introduction to PythonTeX, Geoffrey reported on an improved verbatim environment, fvextra, which patches fancyvrb, and improved interaction between tikz and PythonTeX. As I am a heavy user of listings for my teaching on algebraic specification languages, I will surely take a look at this package and see how it compares to listings. Stefan Kottwitz TeX in industry I: Programming Cisco network switches using TeX Next was Stefan from Lufthansa Industry Solutions, who reported first about his working environment, cruise ships with a very demanding IT infrastructure he has to design and implement. Then he introduced us to his way of generating IP configurations for all the devices using TeX. The reason he chose this method is that it allows him to generate proper documentation at the same time. It was surprising for me to hear that by using TeX he could produce well designed and easily accessible documentation far more efficiently and quickly, which helped both the company as well as made the clients happy! Stefan Kottwitz TeX in industry II: Designing converged network solutions After a coffee break, Stefan continued his exploration into industrial usage of TeX, this time about using tikz to generate graphics representing the network topology on the ships. Boris Veytsman Making ACM LaTeX styles Next up was Boris, who brought us back to traditional realms of TeX when he guided us into the abyss of ACM LaTeX styles he tried to maintain for some time, until he plunged into a complete rewrite of the styles. Frank Mittelbach Alice goes floating global optimized pagination including picture placements The last talk before lunch (probably a strategic placement, otherwise Frank would continue for hours and hours) was Frank on global optimization of page breaks. Frank showed us what can and cannot be done with current LaTeX, and how to play around with global optimization of pagination, using Alice in Wonderland as running example. We can only hope that his package is soon available in an easily consumable version to play around with. Thai lunch Pavneet has organized three different lunch styles for the three days of the conference; today's was Thai with spring rolls, fried noodles, one kind of very interesting orange noodles, and chicken something. Michael Doob baseball rules summary After lunch Michael gave us an accessible explanation of the most arcane rules a game can have, the baseball rules, by using pseudo code. I think the total number of loc needed to explain the overall rules would fill more pages than the New York phonebook, so I am deeply impressed by all those who can understand these rules.
Some of us even wandered off in the late afternoon to see a match with live explanations from Michael. Amartyo Banerjee, S.K. Venkatesan A Telegram bot for printing LaTeX files Next up was Amartyo who showed a Telegram (as in messenger application) bot running on a Raspberry Pi, that receives (La)TeX files and sends back compiled PDF files. While it is not ready for consumption (if you sneeze the bot will crash!), it looks like a promising application. Furthermore, it is nice to see how open APIs (like Telegram) can spur development of useful tools, while closed APIs (including threatening users, like WhatsApp) hinder it. Norbert Preining Security improvements in the TeX Live Manager and installer Next up was my own talk about beefing up the security of TeX Live by providing integrity and authenticity checks via GnuPG, a feature that has been introduced with the recent release of TeX Live 2016. The following discussion gave me several good ideas on how to further improve security and usability. Arthur Reutenauer The TeX Live M sub-project (and open discussion) Arthur presented the TeX Live M (where the M stands supposedly for Mojca, who couldn't attend unfortunately) project: Their aim is to provide a curated and quality verified sub-part of TeX Live that is sufficiently complete for many applications, and easier for distributors and packagers. We had a lively discussion after Arthur's short presentation, mostly about why TeX Live does not have an on-the-fly installation like MikTeX. I insisted that this is already possible, using the tex-on-the-fly package which uses the mktextex infrastructure, but also cautioned against using it by default due to delays induced by repeatedly reading the TeX Live database. I think this is a worthwhile project for someone interested in learning the internals of TeX Live, but I am not sure whether I want to invest time into this feature. Another discussion point was about a testing infrastructure, which I am currently working on. This is in fact high on my list, to have some automatic minimal functionality testing: a LaTeX package should at least load! Kevin Larson Reading between the lines: Improving comprehension for students Having a guest from Microsoft is rather rare in our quite Unix-centered environment, so big thanks to Pavneet again for setting up this contact, and big thanks to Kevin for coming. Kevin gave us a profound introduction to reading disabilities and how to improve reading comprehension. Starting with an excursion into what makes a font readable and how Microsoft develops optimally readable fonts, he then turned to reading disabilities like dyslexia, and how markup of text can increase students' comprehension rate. He also toppled my long-held belief that dyslexia is connected to the similar shape of letters which are somehow visually malprocessed: this was the scientific status from the 1920s till the 70s, but since then all researchers have abandoned this interpretation, and dyslexia is now linked to problems linking shape to phonemes. Kevin did an excellent job with a slightly difficult audience: some people picking on grammar differences between British and US English and permanently derailing the discussion, and even more the high percentage of participants with rather particular typographic tastes. After the talk I had a lengthy discussion with Kevin about if/how this research can be carried over to non-Roman writing systems, in particular Kanji/Hanzi based writing systems, where dyslexia probably shows itself in a different context.
Kevin also mentioned that they want to add interword space to Chinese to help learners of Chinese (children, foreigners) to parse text better, and studies showed that this helps a lot in comprehension. On a meta-level, this talk bracketed with the morning introduction by Pavneet, describing an open environment with stimulus back and forth in all directions. I am very happy that Kevin took the pains to come despite his tight schedule, and I hope that the future will bring better cooperation; in the end we are all working somehow on the same front, only the tools differ.
After the closing of the session, one part of our group went off to the baseball match, while another group dived into a Japanese-style Izakaya where we managed to kill huge amounts of sake and quite an amount of food. The photo shows me after the first bottle of sake, while just sipping on an intermediate small amount of genshu (a kind of strong undiluted sake) before continuing to the next bottle. An interesting and stimulating first day of TUG, and I am sure that everyone was looking forward to day 2.

20 July 2016

Daniel Stender: Theano in Debian: maintenance, BLAS and CUDA

I'm glad to announce that we have the current release of Theano (0.8.2) in Debian unstable now; it's on its way into the testing branch and the Debian derivatives, heading for Debian 9. The Debian package is maintained on behalf of the Debian Science Team. We have a binary package with the modules in the Python 2.7 import path (python-theano), if you want or need to stick to that branch a little longer (as a matter of fact, in the current popcon stats it's the most installed package), and a package running on the default Python 3 version (python3-theano). The comprehensive documentation is available for offline usage in another binary package (theano-doc). Although Theano builds its extensions on run time and therefore all binary packages contain the same code, the source package generates arch specific packages1 for the reason that the exhaustive test suite could run over all the architectures to detect if there are problems somewhere (#824116). what's this? In a nutshell, Theano is a computer algebra system (CAS) and expression compiler, which is implemented in Python as a library. It is named after a Classical Greek female mathematician and it's developed at the LISA lab (located at MILA, the Montreal Institute for Learning Algorithms) at the Université de Montréal. Theano tightly integrates multi-dimensional arrays (N-dimensional, ND-array) from NumPy (numpy.ndarray), which are broadly used in Scientific Python for the representation of numeric data. It features a declarative Python based language with symbolic operations for the functional definition of mathematical expressions, which allows one to create functions that compute values for them. Internally the expressions are represented as directed graphs with nodes for variables and operations. The internal compiler then optimizes those graphs for stability and speed and then generates high-performance native machine code to evaluate resp. compute these mathematical expressions2. One of the main features of Theano is that it's capable of computing also on GPU processors (graphical processor units), like on custom graphic cards (e.g. the developers are using a GeForce GTX Titan X for benchmarks). Today's GPUs became very powerful parallel floating point devices which can be employed also for scientific computations instead of 3D video games3. The acronym "GPGPU" (general purpose graphical processor unit) refers to special cards like NVIDIA's Tesla4, which could be used alike (more on that below). Thus, Theano is a high-performance number cruncher with a computing engine of its own which could be used for large-scale scientific computations. If you haven't come across Theano as a Pythonistic professional mathematician, it's also one of the most prevalent frameworks for implementing deep learning applications (training multi-layered, "deep" artificial neural networks, DNN) around5, and has been developed with a focus on machine learning from the ground up. There are several higher level user interfaces built on top of Theano (for DNN, Keras, Lasagne, Blocks, and others, or for Python probabilistic programming, PyMC3). I'll see to some of them also becoming available in Debian, too. helper scripts Both binary packages ship three convenience scripts, theano-cache, theano-test, and theano-nose.
Instead of them being copied into /usr/bin, which would result in a binaries-have-conflict violation, the scripts are to be found in /usr/share/python-theano (python3-theano respectively), so that both module packages of Theano can be installed at the same time. The scripts could be run directly from these folders, e.g. do $ python /usr/share/python-theano/theano-nose to achieve that. If you're going to use them heavily, you could add the directory of the flavour you prefer (Python 2 or Python 3) to the $PATH environment variable manually by either typing e.g. $ export PATH=/usr/share/python-theano:$PATH on the prompt, or saving that line into ~/.bashrc. Manpages aren't available for these little helper scripts6, but you could always get info on what they do and which arguments they accept by invoking them with the -h flag (for theano-nose) resp. the help flag (for theano-cache). running the tests On some occasions you might want to run the testsuite of the installed library, like to check over if everything runs fine on your GPU hardware. There are two different ways to run the tests (anyway you need to have python{,3}-nose installed). One is, you could launch the test suite by doing $ python -c 'import theano; theano.test()' (or the same with python3 to test the other flavour), which is the same as what the helper script theano-test does. However, by doing it that way some particular tests might fail by raising errors also for the group of known failures. Known failures are excluded from being errors if you run the tests by theano-nose, which is a wrapper around nosetests, so this might be always the better choice. You can run this convenience script with the option --theano on the installed library, or from the source package root, which you could pull by $ sudo apt-get source theano (there you have also the option to use bin/theano-nose). The script accepts options for nosetests, so you might run it with -v to increase verbosity. For the tests the configuration switch config.device must be set to cpu. This will also include the GPU tests when a properly accessible device is detected, so that's a little misleading in the sense that it doesn't mean "run everything on the CPU". You're on the safe side if you always run it like this: $ THEANO_FLAGS=device=cpu theano-nose, if you've set config.device to gpu in your ~/.theanorc. Depending on the available hardware and the used BLAS implementation (see below) it could take quite a long time to run the whole test suite through; on the Core-i5 in my laptop that takes around an hour even excluding the GPU related tests (which perform pretty fast, though). Theano features a couple of switches to manipulate the default configuration for optimization and compilation. There is a rivalry between optimization and compilation costs against performance of the test suite, and it turned out the test suite performs quicker with less graph optimization. There are two different switches available to control config.optimizer: the fast_run toggles maximal optimization, while fast_compile runs only a minimal set of graph optimization features. These settings are used by the general mode switches for config.mode, which is either FAST_RUN by default, or FAST_COMPILE. The default mode FAST_RUN (optimizer=fast_run, linker=cvm) needs around 72 minutes on my lower mid-level machine (on un-optimized BLAS). Setting mode=FAST_COMPILE (optimizer=fast_compile, linker=py) brings some boost for the performance of the test suite because it runs the whole suite in 46 minutes.
The downside of that is that C code compilation is disabled in this mode by using the linker py, and also the GPU related tests are not included. I've played around with using the optimizer fast_compile with some of the other linkers (c|py and cvm, and their versions without garbage collection) as an alternative to FAST_COMPILE with minimal optimization but also machine code compilation incl. GPU testing. But in my experience, fast_compile with another linker than py results in some new errors and failures of some tests on amd64, and this might be the case also on other architectures, too. By the way, another useful feature is DebugMode for config.mode, which verifies the correctness of all optimizations and compares the C to Python results. If you want to have detailed info on the configuration settings of Theano, do $ python -c 'import theano; print theano.config' | less, and check out the chapter config in the library documentation. cache maintenance Theano isn't a JIT (just-in-time) compiler like Numba, which generates native machine code in the memory and executes it immediately, but it saves the generated native machine code into compiledirs. The reason for doing it that way is quite practical, like the docs explain: the persistent cache on disk makes it possible to avoid generating code for the same operation, and to avoid compiling again when different operations generate the same code. The compiledirs by default are located within $(HOME)/.theano/. After some time the folder becomes quite large, and might look something like this:
$ ls ~/.theano
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.11+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12rc1-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.1+-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2-64
compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--3.5.2rc1-64
If the used Python version changed like in this example you might want to purge obsolete cache. For working with the cache resp. the compiledirs, the helper theano-cache comes in handy. If you invoke it without any arguments the current cache location is put out, like ~/.theano/compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64 (the script is run from /usr/share/python-theano). So, the compiledirs for the old Python versions in this example (11+ and 12rc1) can be removed to free the space they occupy. All compiledirs resp. cache directories, meaning the whole cache, could be erased by $ theano-cache basecompiledir purge; the effect is the same as performing $ rm -rf ~/.theano. You might want to do that e.g. if you're using different hardware, like when you got yourself another graphics card. Or habitually from time to time when the compiledirs fill up so much that it slows down processing with the harddisk being very busy all the time, if you don't have an SSD drive available. For example, the disk space of build chroots carrying (mainly) the tests completely compiled through on default Python 2 and Python 3 consumes around 1.3 GB (see here). BLAS implementations Theano needs a level 3 implementation of BLAS (Basic Linear Algebra Subprograms) for operations between vectors (one-dimensional mathematical objects) and matrices (two-dimensional objects) carried out on the CPU. NumPy is already built on BLAS and pulls the standard implementation (libblas3, source package: lapack), but Theano links directly to it instead of using NumPy as intermediate layer to reduce the computational overhead. For this, Theano needs development headers, and the binary packages pull libblas-dev by default, if any other development package of another BLAS implementation (like OpenBLAS or ATLAS) isn't already installed, or pulled with them (providing the virtual package libblas.so). The linker flags could be manipulated directly through the configuration switch config.blas.ldflags, which is by default set to -L/usr/lib -lblas -lblas. By the way, if you set it to an empty value, Theano falls back to using BLAS through NumPy, if you want to have that for some reason. On Debian, there is a very convenient way to switch between BLAS implementations by the alternatives mechanism. If you have several alternative implementations installed at the same time, you can switch from one to another easily by just doing:
$ sudo update-alternatives --config libblas.so
There are 3 choices for the alternative libblas.so (providing /usr/lib/libblas.so).
  Selection    Path                                  Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libblas.so      40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so   35        manual mode
  2            /usr/lib/libblas/libblas.so            10        manual mode
  3            /usr/lib/openblas-base/libblas.so      40        manual mode
Press <enter> to keep the current choice[*], or type selection number:
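The same switch can also be made non-interactively; the path is one of those listed above:
$ sudo update-alternatives --set libblas.so /usr/lib/openblas-base/libblas.so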
The implementations perform differently on different hardware, so you might want to take the time to compare which one does it best on your processor (the other packages are libatlas-base-dev and libopenblas-dev), and choose that to optimize your system. If you want to squeeze out everything there is for carrying out Theano's computations on the CPU, another option is to compile an optimized version of a BLAS library especially for your processor. I'm going to write another blog posting on this issue. The binary packages of Theano ship the script check_blas.py to check over how well a BLAS implementation performs with it, and if everything works right. That script is located in the misc subfolder of the library; you could locate it by doing $ dpkg -L python-theano | grep check_blas (or for the package python3-theano accordingly), and run it with the Python interpreter. By default the script puts out a lot of info, like a huge performance comparison reference table, the current setting of blas.ldflags, the compiledir, the setting of floatX, OS information, the GCC version, the current NumPy config towards BLAS, NumPy location and version, whether Theano linked directly or has used the NumPy binding, and finally and most importantly, the execution time. If just the execution time for quick performance comparisons is needed, this script could be invoked with -q. Theano on CUDA The function compiler of Theano works with alternative backends to carry out the computations, like the ones for graphics cards. Currently, there are two different backends for GPU processing available, one docks onto NVIDIA's CUDA (Compute Unified Device Architecture) technology7, and another one for libgpuarray, which is also developed by the Theano developers in parallel. The libgpuarray library is an interesting alternative for Theano; it's a GPU tensor (multi-dimensional mathematical object) array written in C with Python bindings based on Cython, which has the advantage of running also on OpenCL8. OpenCL, unlike CUDA9, is fully free software, vendor neutral and overcomes the limitation of the CUDA toolkit being only available for amd64 and the ppc64el port (see here). I've opened an ITP on libgpuarray and we'll see if and how this works out. Another reason why it would be great to have it available is that it looks like CUDA currently runs into problems with GCC 610. More on that, soon. Here's a little checklist for setting up your CUDA device so that you don't have to experience something like this:
$ THEANO_FLAGS=device=gpu,floatX=float32 python ./cat_dog_classifier.py 
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
hardware check
For running Theano on CUDA you need an NVIDIA graphics card that is capable of it. You can check whether your device is supported by CUDA here. Unless the hardware is too old (CUDA support started with the GeForce 8 and Quadro X series) or too exotic, it should work in all but exceptional cases. You can check your model, and whether the device is present in the system at the bare hardware level, like this:
$ lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
If a line like this is not returned, your device is most probably broken or not properly connected (ouch). If rev ff appears at the end of the line, the device is off, meaning powered down. This can happen if you have a laptop with Optimus graphics hardware and the related drivers have switched off the unused device to save energy11.
kernel module
Running CUDA applications requires the proprietary NVIDIA driver kernel module to be loaded into the kernel and working. If you haven't already installed it for another purpose: the NVIDIA driver and the CUDA toolkit are both in the non-free section of the Debian archive, which is not enabled by default. To get non-free packages you have to add non-free (and, better, also contrib) to your package sources in /etc/apt/sources.list, which might then look like this:
deb http://httpredir.debian.org/debian/ testing main contrib non-free
After doing that, run $ apt-get update to refresh the package lists, and the non-free packages become available. The headers of the running kernel are needed to compile modules; you can get them together with the NVIDIA kernel module package by running:
$ sudo apt-get install linux-headers-$(uname -r) nvidia-kernel-dkms build-essential
DKMS will then build the NVIDIA module for the kernel and set up a few other things on the system. When the installation has finished, it's generally advisable to reboot the system completely.
troubleshooting
If you have problems with the CUDA device, verify that the following things concerning the NVIDIA driver and kernel module are in order:
blacklist nouveau
Check whether the default Nouveau kernel module driver (which blocks the NVIDIA module) still gets loaded for some reason by doing $ lsmod | grep nouveau. If nothing is returned, all is well. If it's still in the kernel, add blacklist nouveau to /etc/modprobe.d/blacklist.conf, update the boot ramdisk with sudo update-initramfs -u, and reboot once more; after that it should no longer be loaded.
rebuild kernel module
If the module hasn't been properly compiled for some reason, you can trigger a rebuild of the NVIDIA kernel module with $ sudo dpkg-reconfigure nvidia-kernel-dkms. When you're about to send your hardware in for repair because everything looks all right but the device just isn't working, this really can help (own experience). After the rebuild of the module, or modules (if you have several kernel packages installed), has completed, you can check whether the module really is available by running:
$ sudo modinfo nvidia-current
filename:       /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        352.79
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
vermagic:       4.4.0-1-amd64 SMP mod_unload modversions 
parm:           NVreg_Mobile:int
The output should look something similar to this when everything is all right.
reload kernel module
When there are problems with the GPU, the kernel module might not be properly loaded. You can check whether the module is loaded by doing
$ lsmod | grep nvidia
nvidia_uvm             73728  0
nvidia               8540160  1 nvidia_uvm
drm                   356352  7 i915,drm_kms_helper,nvidia
The kernel module can be loaded or reloaded with $ sudo nvidia-modprobe (the tool comes from the package nvidia-modprobe).
unsupported graphics card
Make sure that your graphics card is supported by the current driver kernel module. If you have bought new hardware, it's quite possible for this to turn out to be the problem. You can get the version of the current NVIDIA driver with:
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 5.3.1 20160528 (Debian 5.3.1-21)
Then google the version number, e.g. nvidia 352.79; this should get you to an official driver download page like this one. There, check what is listed under "Supported Products". If you're stuck at that point there are two options: wait until the driver in Debian gets updated, or replace it with the latest driver package from NVIDIA. The latter is possible, but something more for experienced users.
occupied graphics card
The CUDA driver cannot work while the graphical interface keeps the device busy, for example for processing the graphical display of your X.Org server. Which kernel driver is actually used to render the desktop can be examined with this command:12
$ grep '(II).*([0-9]):' /var/log/Xorg.0.log
[    37.700] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150522
[    37.700] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <vcheng@debian.org>)
 ... 
[    39.808] (II) intel(0): switch to mode 1920x1080@60.0 on eDP1 using pipe 0, position (0, 0), rotation normal, reflection none
[    39.810] (II) intel(0): Setting screen physical size to 508 x 285
[    67.576] (II) intel(0): EDID vendor "CMN", prod id 5941
[    67.576] (II) intel(0): Printing DDC gathered Modelines:
[    67.576] (II) intel(0): Modeline "1920x1080"x0.0  152.84  1920 1968 2000 2250  1080 1083 1088 1132 -hsync -vsync (67.9 kHz eP)
This example shows that the rendering of the desktop is performed by the graphical device of the Intel CPU, which is exactly what you want for running CUDA applications on your NVIDIA graphics card, if you don't have another one for display.
nvidia-cuda-toolkit
With the Debian package of the CUDA toolkit, everything pretty much runs out of the box for Theano. Just install it with apt-get and you're ready to go; the CUDA backend is the default one. PyCUDA is also a suggested dependency of the binary packages and can be pulled in together with the CUDA toolkit. The up-to-date CUDA release 7.5 is of course available; with it you have Maxwell architecture support, so that you can run Theano on e.g. a GeForce GTX Titan X with 6.2 TFLOPS of single precision13 at an affordable price. CUDA 814 is around the corner with support for the new Pascal architecture15; the GeForce GTX 1080 high-end gaming graphics card, for example, already delivers 8.23 TFLOPS16. When it comes to professional GPGPU hardware like the Tesla P100, much more computational power is available, scalable by multiplying cores and cards up to genuine little supercomputers that fit on a desk, like the DGX-117. Theano can use multiple GPUs for its calculations to work with highly scaled hardware; I'll write another blog post on this issue.
Theano on the GPU
It's not difficult to run Theano on the GPU. Only single precision floating point numbers (float32) are supported on the GPU, but that is sufficient for deep learning applications. Theano uses double precision floats (float64) by default, so you have to set the configuration variable config.floatX to float32, as written above, either with the THEANO_FLAGS environment variable or, better, in your .theanorc file if you're going to use the GPU a lot. Switching to the GPU actually happens through the config.device configuration variable, which must be set to gpu, or to gpu0, gpu1 etc. to choose a particular device if multiple ones are available. Here's a little test script, check1.py; it's taken from the docs and slightly altered. You can run the script either with python or python3 (there was a single test failure on the Python 3 package, so the Python 2 library might currently be a little more stable).
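The altered check1.py itself isn't reproduced in this aggregation; the script from the Theano documentation that it is based on looks roughly like the following sketch (an assumption on my part; the author's slightly altered version may differ in details):
# GPU test script, adapted from the Theano documentation ("Testing Theano with GPU").
# Prints the optimized graph, times 1000 evaluations of exp() on a large vector,
# and reports whether the CPU or the GPU backend was used.
from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768   # roughly 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
For comparison, here's how it performs on my hardware, once on the CPU and once on the GPU: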
$ THEANO_FLAGS=floatX=float32 python ./check1.py 
[Elemwise exp,no_inplace (<TensorType(float32, vector)>)]
Looping 1000 times took 4.481719 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
$ THEANO_FLAGS=floatX=float32,device=gpu python ./check1.py 
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN not available)
[GpuElemwise exp,no_inplace (<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise exp,no_inplace .0)]
Looping 1000 times took 1.164906 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
If you get a result like this, you're ready to go with Theano on Debian, whether for training computer vision classifiers or whatever else you want to do with it. I'll write more soon on what Theano can be used for.
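To give a flavour of what a minimal Theano program looks like, here is a small sketch using only the standard Theano API (the variable names are purely illustrative and not taken from this post):
# Minimal Theano usage sketch: compile a symbolic expression and its gradient.
# Runs with python or python3; uses only standard Theano API.
import numpy
import theano
import theano.tensor as T

x = T.vector('x')                       # symbolic input vector (float32 if floatX=float32)
loss = T.sum(T.sqr(x))                  # a toy scalar expression: sum of squares
grad = T.grad(loss, x)                  # symbolic gradient d(loss)/dx

f = theano.function([x], [loss, grad])  # compiled by the backend selected via config.device

value, gradient = f(numpy.arange(4, dtype=theano.config.floatX))
print(value)     # 14.0
print(gradient)  # [0. 2. 4. 6.]
With config.device set to gpu and config.floatX to float32, the same compiled function runs on the graphics card without changes to the code.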

  1. Some ports are disabled because they are currently not supported by Theano. There are NotImplementedErrors and other test errors about the numpy.ndarray object not being aligned. The developers have commented on that, see here. And on some ports the build flags -m32 and -m64 used by Theano aren't supported by g++; these build flags can't be manipulated easily.
  2. Theano Development Team: "Theano: a Python framework for fast computation of mathematical expressions"
  3. Marc Couture: "Today's high-powered GPUs: strong for graphics and for maths". In: RTC magazine June 2015, pp. 22-25
  4. Ogier Maitre: "Understanding NVIDIA GPGPU hardware". In: Tsutsui/Collet (eds.): Massively parallel evolutionary computation on GPGPUs. Berlin, Heidelberg: Springer 2013, pp. 15-34
  5. Geoffrey French: "Deep learning tutorial: advanced techniques". PyData London 2016 presentation
  6. As the description of the Lintian tag binary-without-manpage says, a manpage isn't needed for these scripts since they are shipped in /usr/share.
  7. Tom. R. Halfhill: "Parallel processing with CUDA: Nvidia's high-performance computing platform uses massive multithreading". In: Microprocessor Report January 28, 2008
  8. Faber et al.: "Parallelwelten: GPU-Programmierung mit OpenCL". In: C't 26/2014, pp. 160-165
  9. For comparison, see: Valentine Sinitsyn: "Feel the taste of GPU programming". In: Linux Voice February 2015, pp. 106-109
  10. https://lists.debian.org/debian-devel/2016/07/msg00004.html
  11. If Optimus (hybrid) graphics hardware is present (like commonly today on PC laptops), Debian launches the X-server on the graphics processing unit of the CPU, which is ideal for CUDA. The problem with Optimus is actually the graphics processing on the dedicated GPU. If you are using Bumblebee, the Python interpreter on which you want to run Theano has to be started with the launcher primusrun, because Bumblebee powers the GPU down with the tool bbswitch every time it isn't used, and I think the kernel module of the driver is also loaded dynamically.
  12. Thorsten Leemhuis: "Treiberreviere. Probleme mit Grafiktreibern für Linux lösen". In: C't Nr. 2/2013, pp. 156-161
  13. Martin Fischer: "4K-Rakete: Die schnellste Single-GPU-Grafikkarte der Welt". In: C't 13/2015, pp. 60-61
  14. http://www.heise.de/developer/meldung/Nvidia-CUDA-8-bringt-Optimierungen-fuer-die-Pascal-Architektur-3164254.html
  15. Martin Fischer: "All In: Nvidia enthüllt die GPU-Architektur 'Pascal'". In: C't 9/2016, pp. 30-31
  16. Martin Fischer: "Turbo-Pascal: High-End-Grafikkarte für Spieler: GeForce GTX 1080". In: C't 13/2016, pp. 100-103
  17. http://www.golem.de/news/dgx-1-nvidias-supercomputerchen-mit-8x-tesla-p100-1604-120155.html

7 September 2015

Ben Armstrong: Hike at Blomidon Park: Late Summer, 2015

I had the wonderful privilege to go camping and hiking with my kids' scouting group, the Pathfinders of Tantallon SDA church. The day started with a quick trip to Pugwash with one of the leaders to bring back some chairs to their school, and then we headed back out to Blomidon to meet up with the group. Click the photo below to start the slideshow.
[Photo slideshow: the early road trip to fetch chairs, scenes along the way to Pugwash and Blomidon, the group campsite, breakfast around the fire, and the lookoffs along the Jodrey Trail; click the first photo to start.]

30 June 2014

Russell Coker: Links June 2014

Russ Albery wrote an insightful blog post about trust, computer security, and training programmers [1]. He makes a good case that social problems in our community decrease the availability of skilled people to write and audit security code. The Lawfare blog has an insightful article by Dan Geer about Heartbleed as a Metaphor [2]. He makes some good points about security and design, ways of potentially solving some flaws, and problems with the various solutions. Eben Moglen wrote an insightful article for The Guardian about the way that the NSA spying is a direct threat to democracy [3]. The TED blog has an interesting interview with Kitra Cahana about her work living with and photographing nomads in the US [4]. I was surprised to learn that there's an active nomad community in the US based on the culture that started in the Great Depression. Apparently people are using Youtube to learn about nomad culture before joining. Dave Johnson wrote an interesting Salon article about why CEOs make 300 times as much money as workers [5]. Note that actually contributing to the financial success of the company is not one of the reasons. Maia Szalavitz wrote an interesting Slate article about Autism and Anorexia [6]. Apparently some people on the Autism Spectrum are mis-diagnosed with Anorexia due to food intolerance. Groups of four professors have applied for the job of president and vice-chancellor of the University of Alberta [7]. While it was a joke to apply in that way, 1/4 of the university president's salary is greater than the salary of a professor and the university would get a team of 4 people to do the job, so it would really make sense to hire them. Of course the university could just pay a more reasonable salary for the president and hire an extra 3 professors. But the same argument applies for lots of highly paid jobs. Is a CEO who gets paid $10M per annum really going to do a better job than a team of 100 people who are paid $100K? Joel on Software wrote an insightful article explaining why hiring 1/200 applicants doesn't mean you hire the top 0.5% of workers [8]. He suggests that the best employees almost never apply through regular channels, so an intern program is the only way to get a chance of hiring the best people. Chaotic Idealism has an interesting article on some of the bogus claims about autism and violence [9]. Salon has an interesting article by Lindsay Abrams about the way the food industry in the US lobbies for laws to prevent employees from reporting animal cruelty or contamination of the food supply, and how drones will now be used for investigative journalism [10]. Jacobin Mag has an interesting article by Geoff Shullenberger about the "Voluntariat", the people who volunteer their time to help commercial organisations [11]. I don't object to people voluntarily helping companies, but when they are exploited, or when the company also requires voluntary help from the government, it becomes a problem. We need some legislation about this. Laura Hudson wrote an insightful article about how Riot Games solved their online abuse problem [12]. There are ideas in this that can apply to all online communities. Matt LeMay wrote an interesting article for Medium titled "What (Else) Can Men Do? Grow The Fuck Up" [13]. It's a general commentary on the treatment of women in geek communities (and most other places). Foz Meadows wrote an insightful analysis of the attempts of bigots to influence science-fiction [14].
If I had more spare time I'd read some of the books by bigoted authors on the Sad Puppy Slate (from a library of course) and see if they lack talent in the same way that Orson Scott Card does. Racialicious has an interesting article by Phenderson Djeli Clark about the horrible racism and bigotry of H.P. Lovecraft [15]. I have only read two H.P. Lovecraft stories; one was mediocre and the other (The Horror at Red Hook) was quite poor, largely due to his inability to write about non-white people. Grace Wyler wrote an insightful article for Vice magazine about the right-wing terrorists in the US killing cops [16]. Paul Rosenberg wrote an interesting and amusing (for people outside the US) article about the gun crazies in the US [17]. Maybe the gun crazies should have a "loaded assault rifles and tequila" party to follow up on their "gun appreciation day". A US TV show made a 4 minute clip of some of the stupid things that Tony Abbott has done [18]. Tony is almost as stupid as Dubya.

26 May 2014

Joachim Breitner: Does list fusion work?

I'm writing this in the lunch break of TFP in Soesterberg. The invited talk this morning was by Geoffrey Mainland, and he talked about the difficulties of (informal) reasoning about performance in high-level languages like Haskell, especially with fancy stuff in the compiler like fusion. So I couldn't help but think about a (very small) help here. Consider the two very similar expressions foldl (+) 0 [0..1000] and foldr (+) 0 [0..1000]. Which of these fuses away the list? Hopefully both, but it is hard to predict. So with my list-fusion-probe library, you can write
import Data.List.Fusion.Probe (fuseThis)
main = print $ foldr (+) 0 (fuseThis [0..1001])
and find out. If you compile this (with -O!), it will print
500500
If you change the foldr to foldl, it will print
Test: fuseThis: List did not fuse
So you can see that the function fuseThis :: [a] -> [a] does nothing if the list gets fused away, but causes a run-time error if not. It allows you to annotate your code with your assumptions about list fusion, and get shouted at if your assumptions are wrong. It wouldn't be hard to have the compiler give a warning or error message at compile time; we'd just need to introduce a special function abortCompilation that, when found in the code during compilation, does just that. Note that you'll have trouble reproducing the above in GHC HEAD, where foldl does fuse (which is what I'm going to talk about tomorrow here).

7 April 2014

Russ Allbery: Review: Fantasy & Science Fiction, September/October 2011

Review: Fantasy & Science Fiction, September/October 2011
Editor: Gordon van Gelder
Issue: Volume 121, No. 3 & 4
ISSN: 1095-8258
Pages: 258
Another review of a magazine that I finished quite some time ago. Apologies for any inaccuracies or lack of depth in the reviews. There wasn't much in Charles de Lint's reviews in this issue that interested me, but Michelle West covers a great selection of books. Two of them (The Wise Man's Fear and The Quantum Thief) are already on my to-read list; the third, The Postmortal, sounded interesting and would go on my list to purchase if I didn't already have so many good books I've not read. Otherwise, this issue is short on non-fiction. The only other essay entry is a film review from Kathi Maio, which is the typical whining about all things film that F&SF publishes. "Rutger and Baby Do Jotenheim" by Esther M. Friesner: Baby is a former pole dancer with a toy poodle named Mister Snickers, which warns you right away that this story is going to involve a few over-the-top caricatures and more use of the word "piddle" than one might ideally want. Rutger is a mythology professor who tolerates her for the standard reasons in this sort of pairing. They're travelling across country to Baby's sister's wedding when their car breaks down in Minnesota, prompting an encounter with frost giants. As you might expect, this is a sort of fractured fairy tale, except based on Norse mythology instead of the more typical Grimm fare. The fun is in watching these two apparent incompetents (but with enough knowledge of mythology to clue in the reader) reproduce the confrontation between Thor and Utgard-Loki. The fight with old age is particularly entertaining. If you've read any of Friesner's other stories, you know what to expect: not much in the way of deeper meaning, but lots of fun playing with stereotypes and an optimistic, funny outcome. Good stuff. (7) "The Man Inside Black Betty" by Sarah Langan: This story comes with a mouthful of a subtitle: "Is Nicholas Wellington the World's Best Hope?" It's also a story that purports to be written by a fictional character, in this case one Saurub Ramesh (with Langan credited as having done "research"). It's told in the style of first-person journalism, relating the thoughts and impressions of Ramesh as he interviews Nicholas Wellington. The topic is Black Betty: a black hole above Long Island Sound. Wellington is a scientific genius and iconoclast with radical theories of black holes that contradict how the government has been attempting to deal with Black Betty, unsuccessfully. The structure here was well-handled, reminding me a lot of a Michael Lewis article during the financial collapse. Langan has a good feel for how journalism of this type mixes personalities, politics, and facts. But it's all setup and no story. We get some world building, and then it's over, with no resolution except pessimism. Meh. (4) "A Borrowed Heart" by Deborah J. Ross: Ross starts with the trappings of urban fantasy transplanted into a Victorian world: supernatural creatures about, a protagonist who is a high-class prostitute, and sex and a sucubus by the second page. It evolves from there into a family drama and an investigation, always giving the reader the impression that a vampire will jump out at any moment. But the ending caught me entirely by surprise and was far more effective due to its departure from the expected path. Well done. (7) "Bright Moment" by Daniel Marcus: The conflict between terraforming and appreciation for the universe as we find it is an old story pattern in science fiction, and Marcus doesn't add much here. 
I think the story would have been stronger if he'd found a way to write the same plot with a pure appeal to environmental beauty without the typical stakes-raising. But he does sprinkle the story with a few interesting bits, including a pod marriage and a futuristic version of extreme sports as a way of communing with nature. (6) "The Corpse Painter's Masterpiece" by M. Rickert: This is typical of my reaction to a Rickert story: shading a bit too much towards horror for me, a bit too cryptic, well-written but not really my thing. It's about a corpse painter who does the work of an informal mortician, improving the appearance of bodies for their funerals, and the sheriff who brings him all the dead bodies. It takes an odd macabre twist, and I have no idea what to make of the ending. (4) "Aisle 1047" by Jon Armstrong: Armstrong is best known for a couple of novels, Grey and Yarn, which entangle their stories in the future of marketing and commerce. One may be unsurprised, then, that this short story is on similar themes, with the intensity turned up to the parody point. Tiffan3 is a department-store saleswoman, spouting corporate slogans and advertising copy while trying to push customers towards particular products. The story follows the escalation into an all-out brand war, fought with the bubbly short-cut propaganda of a thirty-second commercial. For me, it fell awkwardly between two stools: it's a little too over-the-top and in love with its own bizarre alternate world to be effective satire, but the world is more depressing than funny and the advertising copy is grating. More of a curiosity than a successful story, I think. (5) "Anise" by Chris DeVito: Stories that undermine body integrity and focus on the fascinated horror of violation of physical boundaries aren't generally my thing, so take that into account in this review. Anise's husband died, but that's not as much of a problem as it used to be. Medical science can resurrect people via a sort of permanent, full-body life support system, making them more cyborg than human. "Anise" is about the social consequences of this technology in a world where a growing number of people have a much different relationship with their body than the typical living person today. It's a disturbing story that is deeply concerned with the physical: sex, blood, physical intimacy in various different forms, and a twisted type of psychological abuse. I think fans of horror will like this more than I did, although it's not precisely horror. It looks at the way one's perception of self and others can change by passing through a profound physical transformation. (5) "Spider Hill" by Donald Mead: I liked this story a lot better. It's about witchcraft and farm magic, about family secrets, and a sort of coming-of-age story (for a girl rather than a boy, for once). The main character is resourceful, determined, but also empathetic and aware of the impact of her actions, which made her more fun to read about. I doubt I'll remember this for too long, but when skimming through it again for a review, I had fond memories of it. (6) "Where Have All the Young Men Gone?" by Albert E. Cowdrey: Cowdrey in his paranormal investigation mode, which I like better than his horror mode. For once, the protagonist isn't even a lower-class or backwoods character. Instead, he's a military historian travelling in Austria who runs across a local ghost story. 
This is a fairly straightforward ghost investigation that follows a familiar path (albeit to an unusual final destination), but Cowdrey is a good story-teller and I liked the protagonist. (7) "Overtaken" by Karl Bunker: This is the sort of story that delivers its moral with the force of a hammer. It's not subtle. But if you're in the right mood for that, it's one of the better stories of its type. It's about a long-journey starship, crew in hibernation, that's overtaken by a far newer and faster mechanized ship from Earth that's attempting to re-establish contact with the old ships. The story is a conversation between the ship AIs. Save this one until you're in the mood for an old-fashioned defense of humanity. (8) "Time and Tide" by Alan Peter Ryan: Another pseudo-horror story, although I think it's better classified as a haunting. A wardrobe recalls a traumatic drowning in the childhood of the protagonist. As these things tend to do in stories like this, reality and memory start blurring and the wardrobe takes on a malevolent role. Not my sort of thing. (3) "What We Found" by Geoff Ryman: Any new Geoff Ryman story is something to celebrate. This is a haunting story on the boundaries between the scientific method and tribal superstition, deeply entangled with the question of how one recovers from national and familial trauma. How can we avoid passing the evils and madness of one generation down to the next? Much of the story is about family trauma, told with Ryman's exceptional grasp of character, but the science is entangled in an ingenious way that I won't spoil. As with Air, this is in no way science fiction. The science here would have fascinating and rather scary implications for our world, but clearly is not how science actually works. But as an insight into politics, and into healing, I found it a startlingly effective metaphor. I loved every bit of this. By far the best story of the issue. (9) Rating: 7 out of 10

29 January 2014

Petter Reinholdtsen: A fist full of non-anonymous Bitcoins

Bitcoin is an incredible use of peer-to-peer communication and encryption, allowing direct and immediate money transfer without any central control. It is sometimes claimed to be ideal for illegal activity, which I believe is quite a long way from the truth. At least I would not conduct illegal money transfers using a system where the details of every transaction are kept forever. This point is investigated in USENIX ;login: from December 2013, in the article "A Fistful of Bitcoins - Characterizing Payments Among Men with No Names" by Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. They analyse the transaction log in the Bitcoin system, using it to find addresses belonging to individuals and organisations and to follow the flow of money from both Bitcoin theft and trades on Silk Road to where the money ends up. This is how they wrap up their article:
"To demonstrate the usefulness of this type of analysis, we turned our attention to criminal activity. In the Bitcoin economy, criminal activity can appear in a number of forms, such as dealing drugs on Silk Road or simply stealing someone else s bitcoins. We followed the flow of bitcoins out of Silk Road (in particular, from one notorious address) and from a number of highly publicized thefts to see whether we could track the bitcoins to known services. Although some of the thieves attempted to use sophisticated mixing techniques (or possibly mix services) to obscure the flow of bitcoins, for the most part tracking the bitcoins was quite straightforward, and we ultimately saw large quantities of bitcoins flow to a variety of exchanges directly from the point of theft (or the withdrawal from Silk Road). As acknowledged above, following stolen bitcoins to the point at which they are deposited into an exchange does not in itself identify the thief; however, it does enable further de-anonymization in the case in which certain agencies can determine (through, for example, subpoena power) the real-world owner of the account into which the stolen bitcoins were deposited. Because such exchanges seem to serve as chokepoints into and out of the Bitcoin economy (i.e., there are few alternative ways to cash out), we conclude that using Bitcoin for money laundering or other illicit purposes does not (at least at present) seem to be particularly attractive."
These researchers are not the first to analyse the Bitcoin transaction log. The 2011 paper "An Analysis of Anonymity in the Bitcoin System" by Fergal Reid and Martin Harrigan is summarized like this:
"Anonymity in Bitcoin, a peer-to-peer electronic currency system, is a complicated issue. Within the system, users are identified by public-keys only. An attacker wishing to de-anonymize its users will attempt to construct the one-to-many mapping between users and public-keys and associate information external to the system with the users. Bitcoin tries to prevent this attack by storing the mapping of a user to his or her public-keys on that user's node only and by allowing each user to generate as many public-keys as required. In this chapter we consider the topological structure of two networks derived from Bitcoin's public transaction history. We show that the two networks have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for anonymity. We combine these structures with external information and techniques such as context discovery and flow analysis to investigate an alleged theft of Bitcoins, which, at the time of the theft, had a market value of approximately half a million U.S. dollars."
I hope these references can help kill the urban myth that Bitcoin is anonymous. It isn't really a good fit for illegal activities. Use cash if you need to stay anonymous, at least until regular DNA sampling of notes and coins becomes the norm. :) As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

16 November 2013

Russ Allbery: Review: Blue Remembered Earth

Review: Blue Remembered Earth, by Alastair Reynolds
Series: Poseidon's Children #1
Publisher: Ace
Copyright: January 2012
Printing: June 2012
ISBN: 0-441-02071-2
Format: Hardcover
Pages: 505
Geoffrey Akinya is one heir to the vast Akinya business empire, which straddles the solar system and has created vast riches from exploration and mining. He wants nothing to do with it. His passion is elephants: long and methodical study of wild herds in Africa, including a slow and careful investigation into their thought processes. His obnoxious and superior cousins have a passion for business and finance and can have the running of the company for all he cares. But Geoffrey's grandmother Eunice, the family matriarch and driving force behind much of their business expansion, has died. Her ashes are brought back to Africa from the Winter Palace in orbit around the Moon where she lived out her final days and scattered in a family ceremony. And, shortly thereafter, Geoffrey's cousins convince him to investigate a safe-deposit box left on the Moon by Eunice to ensure that it doesn't contain anything damaging to the family. What it contains is the first step in a puzzle. Despite himself, Geoffrey and his sister Sunday an artist and family black sheep who lives in a region of the Moon that is one of the sole holdouts against the ubiquitous monitoring common in the rest of the inhabited solar system are slowly pulled into unraveling that puzzle. This leads them into an uneasy alliance with one of the major political forces in the solar system and some startling revelations about their grandmother's actual plans. Blue Remembered Earth is another entry in the currently-popular sub-genre of solar system SF. Interstellar travel is still a dream, but a combination of improved technology, suspended animation, and heavy use of robots and robotic factories has let humans slowly expand into more of the solar system. On Earth, powers have risen and fallen, and the leap into the solar system coincided with a surging and powerful Africa. Extensive undersea colonization has established a new international consortium of sea-floor countries, and Earth politics have realigned into a diplomatic and business struggle between surface and oceanic nations. Unusually for this sort of fiction, most characters of significance in this story are African. The United States and Europe have quietly disappeared in the way that all non-US countries tend to in most other SF. Reynolds doesn't do very much with this setup (and the characters don't seem distinctly African to me). It's just there as the unremarked normal. I liked that. I kind of wish more of African culture had come through in the story, but there are enough SF novels with an assumed and unremarked white (and usually US) future that an assumed and unremarked black African culture is a nice change of pace. I picked up this book in part because it was described as a better 2312. That's a fairly accurate summary (although only helpful to those who have read Kim Stanley Robinson's novel). The scope and setting are similar, although Reynolds's future cuts back on the scope of off-world human colonization, relies more heavily on robotics, and keeps the power base on Earth. Both books are structured as a mystery, although in Blue Remembered Earth that mystery is deliberately planted for the characters. Reynolds is nowhere near as good at set pieces as Robinson, sadly. There's nothing here at the level of some of the scenes on Mercury in 2312. But he is much better at plotting and characterization, and Blue Remembered Earth tells a coherent story with increasing tension and a satisfying ending. 
Reynolds here uses an interesting strategy for spinning a mystery out of pieces of knowledge and discovery: set the discovery process prior to the novel, use protagonists who were unaware of it, and turn the details into a mystery that they have to investigate. This is an interesting way to address the problem that most discovery processes are slow and not particularly dramatic. It is artificial Eunice creates the plot for the novel but that worked for me. By the end of the book, I think I understand her reasons for doing so, and both Geoffrey and Sunday have good reasons to not be aware of any of the details before they start following the clues that Eunice leaves behind. The mystery of course sends Geoffrey and Sunday on a bit of a tour of the solar system as well as a tour of the past of their family business and the politics of the present. Here, Reynolds draws on an old SF idea: a conflict between groups that want to mechanize space exploration and groups that want to adapt humans for space. Long-time SF readers will be familiar with this approach, and I don't think Reynolds does anything startlingly new with it, but I thought it was a well-executed side plot. And, for once, an SF author realizes that Earth's oceans are as compelling of a place for that conflict to play out as outer space. There are, of course, significant mysteries to be uncovered, but it takes much of the book before those revelations start (even if one of them is almost spoiled in the dust jacket summary). The story up until that point is driven by character, particularly Geoffrey's stubborn annoyance at the rest of his family and his complete lack of interest in being the protagonist of a novel. I should probably warn here that many of the Amazon reviews disliked this book because of the characters, particularly Geoffrey. Apparently he comes across to some as whiny, resentful, and annoying. I, however, didn't have this reaction at all, possibly in part because I wholeheartedly share his opinion of both big business and his cousins. I thoroughly enjoyed reading an SF novel whose main characters are not driven either typical capitalist motives or a desire for personal exploration, but instead are pulled into the plot through a sense of loyalty and an inability to get out of the way of the problem. I also liked the way Reynolds manages to broaden and deepen the reader's appreciation for the characters as the story unfolds, and even salvages some characters I thought were unsalvageable. The last third of the book features some excellent plot twists and some very nicely-handled revelations. This is apparently the first book of a trilogy, and I had a suspicion at the end of the book that sequels were coming, but I think it stands on its own quite well. Most of the questions raised by the plot are resolved by the end of the book, and those that remain are of a size that other books have left unanswered intentionally. Despite not reaching the same moments of awe-inspiring beauty, and despite more workman-like descriptions, the stronger plot and more interesting characters do make Blue Remembered Earth a better 2312 for me. Given that 2312 won a Nebula, that's saying something (although I don't agree with that award). It's a bit deliberate in its pacing, and the plot built around a constructed puzzle can feel a bit artificial, but I think it's a solid and satisfying near-space SF novel. Followed by On the Steel Breeze. Rating: 7 out of 10

31 October 2011

Russell Coker: Links October 2011

Ron has written an interesting blog post about the US as a lottery economy [1]. Most people won't win the lottery (literally or metaphorically) so they remain destined for poverty. Tim Connors wrote an informative summary of the issues relating to traffic light timing and pedestrians/cyclists [2]. I have walked between Southgate and the Crown Casino area many times and have experienced the problem he describes many times. Scientific American has an interesting article about a new global marketplace for scientific research [3]. The concept is that instead of buying a wide range of research equipment (and hiring people to run it) you can outsource non-core research for a lower cost. Svante Pääbo gave an interesting TED talk about his work analysing human DNA to determine prehistoric human migration patterns [4]. Among other things he determined that 2.5% of the DNA from modern people outside Africa came from the Neandertals. Lisa wrote an informative article about Emotional Support Animals (as opposed to Service Animals such as guide dogs) for disabled people [5]. It seems that the US law is quite similar to Australian law in that reasonable accommodations have to be made for disabled people, which includes allowing pets in rental properties even if such pets aren't officially ESAs. Beyond Zero Emissions has an interesting article about electricity prices which explains how wind power forces prices down [6]. This should offset the new carbon tax. Problogger has an article listing some of the ways that infographics can be used on the web [7]. This can be for blog posts or just for your personal understanding. Petter Reinholdtsen wrote a handy post about ripping DVDs which also explains how to do it when the DVD has errors [8]. I haven't yet ripped a DVD but this one is worth noting for when I do. Miriam has written about the Fantastic Park ICT training for 8-12yo kids [9]. It's run in Spain (and all the links are in Spanish but Google Translation works well) and is a camp to teach children about computers and robotics using Lego WeDo among other things. We need to have more of these things in other countries. The Atlantic Cities has an interesting article comparing grid and cul-de-sac based urban designs [10]. Apparently the cul-de-sac design forces an increase in car use and therefore an increase in fatal accidents while also decreasing the health benefits of walking. Having lived in both grid and cul-de-sac based urban areas I have personally experienced the benefits of the grid based layout. Sarah Chayes wrote an interesting LA Times article about governments being taken over by corruption [11]. She argues that arbitrary criminal government leads to an increase in religious fundamentalism. Michael Lewis has an insightful article in Vanity Fair about the bankruptcy of US states and cities [12]. Ben Goldacre gave an interesting TED talk about bad medical science [13]. He starts with the quackery that is published in tabloid newspapers and then moves on to deliberate scientific fraud by medical companies. Geoff Mulgan gave an interesting TED talk about the Studio Schools in the UK which are based around group project work [14]. The main thing I took from this is that the best method of teaching varies by subject and by student. So instead of having a monolithic education department controlling everything we should have schools aimed at particular career paths and learning methods.
Sophos has an interesting article about the motion sensors of smart phones being used to transcribe keyboard input based on vibration [15]. This attack could be launched by convincing a target to install a trojan application on their phone. It's probably best to regard your phone with suspicion nowadays. Simon Josefsson wrote a good article explaining how to use a GPG smart-card to authenticate ssh sessions, with particular reference to running backups over ssh [16]. C ran wrote a good article explaining how to use all the screen space when playing DVDs on a wide screen display with mplayer [17]. Charles Stross has an informative blog post about Wall St Journal circulation fraud [18]. Apparently the WSJ was faking readership numbers to get more money from advertisers; this should lead to law suits and more problems for Rupert Murdoch. Is everything associated with Wall St corrupt?

2 September 2011

Asheesh Laroia: Debian bug squashing party at SIPB, MIT


(Photo credit: Obey Arthur Liu; originally on Picasa, license.) Three weekends ago, I participated in a Debian bug squashing party. It was more fun than I had guessed! The event worked: we squashed bugs. Geoffrey Thomas (geofft) organized it as an event for MIT's student computing group, SIPB. In this post, I'll review the good parts and the bad. I'll conclude with beaming photos of my two mentees and talk about the bugs they fixed. So, the good:

The event was a success, but as always, there are some things that could have gone more smoothly. Here's that list: Still, it turned out well! I did three NMUs, corresponding to three patches submitted for release-critical bugs by my two mentees. Those mentees were: [Photo: Jessica enjoying herself] Jessica McKellar is a software engineer at Ksplice Oracle and a recent graduate of MIT's EECS program. She solved three release-critical bugs. This was her first direct contribution to Debian. In particular: Jessica has since gotten involved in the Twisted project's personal package archive. Toward the end of the sprint, she explained, "I like fixing bugs. I will totally come to the next bug squashing party." [Photo: Noah grinning] Noah Swartz is a recent graduate of Case Western Reserve University, where he studied Mathematics and played Magic. He is an intern at the MIT Media Lab, where he contributes to DoppelLab in Joe Paradiso's Responsive Environments group. This was definitely his first direct contribution to Debian. It was also one of the most intense command-line experiences he has had so far. Noah wasn't originally planning to come, but we were having lunch together before the hackathon and I convinced him to join us. Noah fixed #625177, a fails-to-build-from-source (FTBFS) bug in nslint. The problem was that "-Wl" had been written in all lowercase in the debian/rules file, as "-wl". Noah fixed that, made sure the package built properly in pbuilder, and then spent some quality time with lintian figuring out the right way to write a debian/changelog. That's a wrap! We'll hopefully have another one in a few months, and before that, I hope to write up a guide so that we run things even more smoothly next time.

31 August 2011

Russell Coker: Links August 2011

Alex Steffen gave an interesting TED talk summarising the ways that greater urban density can reduce energy use while increasing our quality of life [1]. Geoffrey West gave an interesting TED talk about the way animals, corporations, and cities scale [2]. The main factor is the way that various variables scale in proportion to size. On a logarithmic graph the growth of a city shows a steady increase in both positive factors such as wages and inventions and in negative factors such as crime as it grows larger. So it seems that we need to decrease the crime rate significantly to permit the growth of larger cities and therefore gain more efficiency. The Mankind Project (MKP) has a mission of redefining mature masculinity for the 21st Century [3]. They have some interesting ideas. Phillip Zimbardo gave a provocative TED talk about the demise of men [4]. He provided little evidence to support his claims though.
