Search Results: "certik"

4 August 2013

Ondřej Čertík: How to support both Python 2 and 3

I'll start with the conclusion: making a backwards incompatible version of a language is a terrible idea, and it was a bad mistake. This mistake was somewhat corrected over the years by eventually adding features to both Python 2.7 and 3.3 that actually allow running a single code base on both Python versions --- an approach which, as I show below, was discouraged by both Guido and the official Python documents (though the latest docs mention it)... Nevertheless, a single code base fixes pretty much all the problems and it actually is fun to use Python again.

The rest of this post explains my conclusion in great detail. My hope is that it will be useful to other Python projects, as a set of tips and examples for supporting both Python 2 and 3, as well as to future language designers, as a reason to keep languages backwards compatible.

When Python 3.x got released, it was pretty much a new language, backwards incompatible with Python 2.x, as it was not possible to run the same source code in both versions. I was extremely unhappy about this situation, because I simply didn't have time to port all my Python code to a new language. I read the official documentation about how the transition should be done. Quoting:
You should have excellent unit tests with close to full coverage.
  1. Port your project to Python 2.6.
  2. Turn on the Py3k warnings mode.
  3. Test and edit until no warnings remain.
  4. Use the 2to3 tool to convert this source code to 3.0 syntax. Do not manually edit the output!
  5. Test the converted source code under 3.0.
  6. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.
  7. When it's time to release, release separate 2.6 and 3.0 tarballs (or whatever archive form you use for releases).
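To make step 4 concrete, here is the kind of mechanical rewriting 2to3 performs. The fragment is made up for illustration (it is not taken from any of the projects discussed below); the Python 2 original is shown in the comments, with the Python 3 output below it:

# Python 2 original that 2to3 would be run on:
#
#     d = {"a": 1}
#     print "processing", len(d)
#     for k, v in d.iteritems():
#         print k, v
#
# The Python 3 code 2to3 generates: the print statement becomes a
# function call and iteritems() becomes items().
d = {"a": 1}
print("processing", len(d))
for k, v in d.items():
    print(k, v)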
I've also read Guido's blog post, which repeats the above list and adds an encouraging comment:
Python 3.0 will break backwards compatibility. Totally. We're not even aiming for a specific common subset.
In other words, one has to maintain a Python 2.x code base, then run the 2to3 tool to get it converted. If you want to develop using Python 3.x, you can't, because all code must be developed in 2.x. As to the actual porting, Guido says in the above post:
If the conversion tool and the forward compatibility features in Python 2.6 work out as expected, steps (2) through (6) should not take much more effort than the typical transition from Python 2.x to 2.(x+1).
So sometime in 2010 or 2011 I started porting SymPy, which is now a pretty large code base (sloccount says over 230,000 lines of code, and in January 2010 it said almost 170,000 lines). I remember spending a few full days on it, and I just gave up, because it wasn't just a matter of changing a few things, but of pretty fundamental things inside the code base, and one cannot do it half-way; one has to get all the way through and then polish it up. We ended up using one full Google Summer of Code project for it, you can read the final report. I should mention that we use metaclasses and other things that make such porting harder. Conclusion: this was definitely not "the typical transition from Python 2.x to 2.(x+1)".

Ok, after months of hard work by a lot of people, we finally had a Python 2.x code base that can be translated using the 2to3 tool, and it works and tests pass in Python 3.x. The next problem is that Python 3.x is pretty much like a ghetto -- you can use it as a user, but you can't develop in it. The 2to3 translation takes over 5 minutes on my laptop, so any interactivity is gone. It is true that the tool can cache results, so the next pass is somewhat faster, but in practice this still turns out to be much, much worse than any compilation of C or Fortran programs (done for example with cmake), both in terms of time and in terms of robustness. And I am not even talking about pip issues or setup.py issues regarding calling 2to3. What a big mess... Programming should be fun, but this is not fun.

I'll be honest, this situation killed a lot of my enthusiasm for Python as a platform. I learned modern Fortran in the meantime and noticed with admiration that it still compiles old F77 programs without modification, and I even managed to compile a 40 year old pre-F77 code with just minimal modifications (I had to port the code to F77). Yet modern Fortran is pretty much a completely different language, with all the fancy features that one would want. Together with my colleagues I created the fortran90.org website, where you can compare Python/NumPy side by side with modern Fortran: it's pretty much a 1:1 translation with similar syntax (for numerical code), except that you need to add types, of course. Yet Fortran is fully backwards compatible. What a pleasure to work with!

Fast forward to last week. A heroic effort by Sean Vig to port SymPy to a single code base (#2318) was merged. Earlier this year similar pull requests by other people converted NumPy (#3178, #3191, #3201, #3202, #3203, #3205, #3208, #3216, #3223, #3226, #3227, #3231, #3232, #3235, #3236, #3237, #3238, #3241, #3242, #3244, #3245, #3248, #3249, #3257, #3266, #3281, ...) and SciPy (#397) as well. Now all these projects have just one code base and it works in all Python versions (2.x and 3.x) without the need to call the 2to3 tool.

Having a single code base, programming in Python is fun again. You can choose any Python version, be it 2.x or 3.x, and simply submit a patch. The patch is then tested using Travis-CI, so that it works in all Python versions. Installation has been simplified (no need to call any 2to3 tools and no more hacks to get setup.py working). In other words, this is how it should be: you write your code once, and you can use any supported language version to run it, compile it, or develop in. But for some reason, this obvious solution has been discouraged by Guido and other Python documents, as seen above.
I just looked up the latest official Python docs, and they are no longer upfront negative about a single code base. But they still do not recommend this approach as the one to use. So let me fix that: I do recommend a single code base as the solution. The newest Python documentation (linked in the last paragraph) also mentions:
Regardless of which approach you choose, porting is not as hard or time-consuming as you might initially think.
Well, I encourage you to browse through the pull requests that I linked to above for SymPy, NumPy and SciPy. I think it is very time consuming, and that's just converting from 2to3 to a single code base, which is the easy part. The hard part was to actually get SymPy to work with Python 3 at all (as I discussed above, that took a couple of months of hard work), and I am pretty sure it was pretty hard to port NumPy and SciPy as well. The docs also say:
It [a single code base] does lead to code that is not entirely idiomatic Python
That is true, but our experience has been that with every Python version we drop, we also get to delete lots of ugly hacks from our code base. This has been true for dropping support for 2.3, 2.4 and 2.5, and I expect it will also be true for dropping 2.6 and especially 2.7, when we can simply use the Python 3.x syntax. So it's not a big deal overall.

To sum this blog post up: as far as I am concerned, pretty much all the problems with supporting Python 2.x and 3.x are fixed by having a single code base. You can read the pull requests above to see how we implemented things (like metaclasses, and other fancy stuff...). Python is still quite the same language; you write your code, you use a Python version of your choice and things will just work. The official documentation should be fixed to recommend this approach and deprecate the other approaches. I think that Python is great and I hope it will be used more in the future.
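For the record, here is a minimal sketch of the single-code-base style that those pull requests use. The names are illustrative (this is not SymPy's actual compatibility module), but each trick is of the kind we rely on:

from __future__ import print_function, division

import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
else:
    string_types = (basestring,)  # basestring exists only on Python 2

def with_metaclass(meta, *bases):
    # Create a throwaway base class whose metaclass is `meta`. This
    # sidesteps the incompatible spellings: Python 2's __metaclass__
    # attribute versus Python 3's class Foo(metaclass=...) syntax.
    return meta("NewBase", bases, {})

class Meta(type):
    pass

class Base(with_metaclass(Meta, object)):
    pass

print(type(Base) is Meta)             # True on both 2.x and 3.x
print(isinstance("x", string_types))  # True on both 2.x and 3.x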
Written with StackEdit.

2 July 2013

Ondřej Čertík: My impressions from the SciPy 2013 conference

I have attended the SciPy 2013 conference in Austin, Texas. Here are my impressions.

Number one is the fact that the IPython notebook was used by pretty much everyone. I use it a lot myself, but I didn't realize how ubiquitous it has become. It is quickly becoming the standard now. The IPython notebook uses Markdown, and in fact it is better than ReST. The way to remember the "[]()" syntax for links is that in regular text you put links into () parentheses, so you do the same in Markdown, and prepend [] containing the text of the link --- for example [SymPy](http://sympy.org). The other way to remember is that [] feels more serious and thus is used for the text of the link. I stressed several times to +Fernando Perez and +Brian Granger how awesome it would be to have interactive widgets in the notebook. Fortunately that was pretty much preaching to the choir, as that's one of the first things they plan to implement good foundations for, and I just can't wait to use it.

It is now clear that the IPython notebook is the way to store computations that I want to share with other people, or to use as a "lab notebook" for myself, so that I can remember what exactly I did to obtain the results (for example how exactly I obtained some figures from raw data). In other words --- instead of having sets of scripts and manual bash commands that have to be executed in a particular order to do what I want, just use an IPython notebook and put everything in there.

Number two is how big the conference has become since I last attended (a couple of years ago), yet it still has the friendly feeling. Unfortunately, I had to miss a lot of talks due to scheduling conflicts (there were three parallel sessions), so I look forward to seeing them on video.

+Aaron Meurer and I gave the SymPy tutorial (see the link for videos and other tutorial materials). It was nice to finally meet +Matthew Rocklin (a very active SymPy contributor) in person. He also had an interesting presentation about symbolic matrices + LAPACK code generation. +Jason Moore presented PyDy.
It's been a great pleasure for us to invite +David Li (still a high school student) to attend the conference and give a presentation about his work on sympygamma.com and live.sympy.org.

It was nice to meet the Julia guys, +Jeff Bezanson and +Stefan Karpinski. I contributed the Fortran benchmarks on Julia's website some time ago, but I had the feeling that a lot of them are quite artificial and not very meaningful, and I think Jeff and Stefan confirmed that feeling. Julia seems to have quite an interesting type system and multiple dispatch, which SymPy should learn from.

I met the VTK guys +Matthew McCormick and +Pat Marion. One of the keynotes was given by +Will Schroeder from Kitware, about publishing. I remember him stressing the need to manage dependencies well and to use a BSD-like license (as opposed to viral licenses like the GPL or LGPL), and noting that opensource has pretty much won (i.e. it is now clear that that is the way to go).

I had great discussions with +Francesc Alted, +Andy Terrel, +Brett Murphy, +Jonathan Rocher, +Eric Jones, +Travis Oliphant, +Mark Wiebe, +Ilan Schnell, +Stéfan van der Walt, +David Cournapeau, +Anthony Scopatz, +Paul Ivanov, +Michael Droettboom, +Wes McKinney, +Jake Vanderplas, +Kurt Smith, +Aron Ahmadia, +Kyle Mandli, +Benjamin Root and others.


It's also been nice to have a chat with +Jason Vertrees and other guys from Schrödinger.

One other thing that I realized last week at the conference is that pretty much everyone agreed that NumPy should act as the default way to represent memory (no matter if the array was created in Fortran or other code) and allow manipulations on it. Faster libraries like Blaze or ODIN should then hook themselves into NumPy using multiple dispatch. SymPy would then also hook itself up so that it can be used with array operations natively. Currently SymPy does work with NumPy (see our tests for some examples of what works), but the solution is a bit fragile (it is not possible to override NumPy behavior, but because NumPy supports general objects, we simply give it SymPy objects and things mostly work).
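To illustrate the "things mostly work" part, this is the kind of interaction I mean (a small example added here for clarity, not taken from our test suite):

import numpy as np
import sympy

x, y = sympy.symbols("x y")

# NumPy stores the SymPy expressions as generic Python objects and
# broadcasts operations over them element by element.
a = np.array([x, y, x + y], dtype=object)

print(a * 2)    # [2*x 2*y 2*x + 2*y]
print(a.sum())  # 2*x + 2*y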

Similarly, I would like to create multiple dispatch in the SymPy core itself, so that other (faster) libraries for symbolic manipulation can hook themselves up and their own (faster) multiplication, expansion or series expansion gets called instead of the SymPy default one implemented in pure Python.
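To be clear about what I mean by multiple dispatch, here is a hypothetical sketch of the idea (not actual SymPy code): a registry keyed on the argument types, which a faster library could fill with its own implementations so that they get picked instead of the pure Python defaults:

# Registry mapping (type, type) pairs to implementations.
_mul_registry = {}

def register_mul(type_a, type_b, func):
    _mul_registry[(type_a, type_b)] = func

def dispatch_mul(a, b):
    # Walk both MROs so that subclasses fall back to handlers
    # registered for their base classes.
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            func = _mul_registry.get((ta, tb))
            if func is not None:
                return func(a, b)
    raise TypeError("no multiplication registered for %s * %s"
                    % (type(a).__name__, type(b).__name__))

# The default, slow, pure Python implementation; a faster library
# would call register_mul() with its own types and routines.
register_mul(object, object, lambda a, b: a * b)

print(dispatch_mul(3, 4))  # falls back to the default: 12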

Other blog posts from the conference:

5 June 2012

Ondřej Čertík: How to convert scanned images to pdf

From time to time I need to convert scanned documents to a pdf format.


Usage scenario 1: I scan part of a book (e.g. an article) on a school's scanner that sends me 10 big separate color pdf files (one pdf per page). I want to get one nice, small (black and white) pdf file with all the pages.


Usage scenario 2: I download a web form, print it, fill it in, sign it, scan it on my own scanner using Gimp and now I want to convert the image into a nice pdf file (either color or black & white) to send back over email.

Solution: I save the original files (be they pdf or png) into a folder and use git to track it. Then I create a simple reduce script to convert them to the final format (view it as a pipeline). Often I need to tweak one or two parameters in the pipeline.

Here is a script for scenario 1:
https://gist.github.com/2871625 (file reduce2.sh)
And here for scenario 2:
https://gist.github.com/2871625 (file reduce1.sh)
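Both gists are small shell scripts; as a rough Python sketch of the scenario 1 pipeline, the following shows the idea (the file names, density and threshold values are illustrative, and it assumes ImageMagick's convert is installed):

import glob
import subprocess

# Rasterize each single-page color pdf at 300 dpi and threshold it
# to black and white.
for i, pdf in enumerate(sorted(glob.glob("scan-*.pdf"))):
    subprocess.check_call([
        "convert", "-density", "300", pdf,
        "-colorspace", "Gray", "-threshold", "60%",
        "page-%02d.png" % i,
    ])

# Join the pages into one small bitonal pdf.
subprocess.check_call(
    ["convert"] + sorted(glob.glob("page-*.png"))
    + ["-compress", "Group4", "book.pdf"])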
There can be several unexpected surprises along the way. From my experience:

In general, I am happy with my solution. So far I was always able to get what I needed using this "pipeline" method.

26 January 2012

Ondřej Čertík: When double precision is not enough

I was doing some finite element (FE) calculations and I needed the sum of the lowest 7 eigenvalues of a symmetric matrix (which comes from the FE assembly) to converge to at least 1e-8 accuracy (so that I could check a calculation done by another solver of mine, which calculates the same thing but doesn't use FE). In reality I wanted the value rounded to 8 decimal digits to be correct, so I really needed 1e-9 accuracy (it's ok if it is, let's say, 2e-9, but not ok if it is 9e-9). With my FE solver, I couldn't get it to converge to better than roughly 5e-7, no matter how hard I tried. Now what?

When doing the convergence study, I take a good mesh and keep increasing "p" (the polynomial order) until it converges. For my particular problem, it is fully converged at about p=25 (the solver supports orders up to 64). Increasing "p" further does not increase the accuracy anymore, and the accuracy stays at the level of 5e-7 for the sum of the lowest 7 eigenvalues. For optimal meshes it converges at p=25; for non-optimal meshes it converges at higher "p", but in all cases it doesn't get below 5e-7.

I know from experience that for simpler problems, the FE solver can easily converge to 1e-10 or better using double precision. So I know it is doable; now the question is what the problem is. There are a few possible reasons:

When using the same solver for a simpler potential, it converged nicely to 1e-10. This suggests there is no bug in the assembly or the solver itself. It is possible that the quadrature is not accurate enough, but again, if it converges for the simple problem, that's probably not it. So it seems it is the ill-conditioned matrix that causes this. So I printed the residuals (which I simply calculated in Fortran using the matrix and the eigenvectors returned by LAPACK), and they only showed 1e-9. For simpler problems, they go to 1e-14 easily. So that must be it. How do we fix it?

Obviously by making the matrix less ill-conditioned. The ill-conditioning is caused by the mesh for the problem (the ratio of the longest to shortest elements is 1e9), but for my problem I really needed such a mesh. So the other option is to increase the accuracy of the real numbers.

In Fortran, all real variables are defined as real(dp), where dp is an integer defined at a single place in the project. There are several ways to define it, but its value is 8 for gfortran and it means double precision. So I increased it to 16 (quadruple precision) and recompiled. Now the whole program calculates in quadruple precision (more than 30 significant digits). I had to recompile LAPACK using the "-fdefault-real-8" gfortran option, which promotes all double precision numbers to quadruple precision, and I used the "d" versions (double precision, now promoted to quadruple) of the LAPACK routines.
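For readers who do not write Fortran, the same define-the-precision-in-one-place idea can be sketched in Python/NumPy (an analogy added here for illustration, not code from this project; note that np.longdouble is typically 80-bit extended precision on x86, not true IEEE quadruple):

import numpy as np

# The working precision is defined in exactly one place, like real(dp):
dp = np.float64  # switch to np.longdouble to rerun in higher precision

def assemble(n):
    # Every array in the program is created with dtype=dp.
    return np.eye(n, dtype=dp)

print(assemble(3).dtype, "eps =", np.finfo(dp).eps)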

I reran the calculation --- and suddenly the LAPACK residuals are around 1e-13, and the solver converges to 1e-10 easily (for the sum of the lowest 7 eigenvalues). Problem solved.

Turning my Fortran program to quadruple precision is as easy as changing one variable and recompiling. Turning LAPACK to quadruple precision is easy with a single gfortran flag (LAPACK uses the old F77 syntax for double precision; if it used real(dp), I would simply change it as for my program). The whole calculation got at least 10x slower with quadruple precision. The reason is that the gfortran runtime uses the libquadmath library, which simulates quadruple precision (current CPUs only support double precision natively).

I actually discovered a few bugs in my program (typically some constants in older code didn't use the "dp" syntax, but had double precision hardwired). Fortran warns about all such cases where real variables have incompatible precision.

It is amazing how easy it is to work with different precisions in Fortran (literally just one change and a recompile). How could this be done in C++? This wikipedia page suggests that "long double" is only 80-bit in most cases (quadruple is 128-bit), but gcc offers __float128, so it seems I would have to manually change all "double" to "__float128" in the whole C++ program (this could be done with a single "sed" command).

19 November 2010

Ondřej Čertík: Google Code vs GitHub for hosting opensource projects

Cython is now considering options for where to move its main (mercurial) repository, and Robert Bradshaw (one of the main Cython developers) has asked me about my experience with Google Code and GitHub, since we use both with SymPy.

Google Code is older, and it was the first service that provided a free, virtually unlimited number of projects that you could easily and immediately set up. At that time (4 years ago?) that was something unheard of. However, the GitHub guys have in the meantime not only made this available too, but also implemented features that (as far as I know) no one else offers at all: hosting your own pages at your own domain (but on GitHub's servers; some examples are sympy.org and docs.sympy.org), commenting on git branches and pull requests before the code gets merged in (I am 100% convinced that this is the right approach, as opposed to commenting on the code after it gets in), making it easy to fork a repository, and simply more social features that Google Code doesn't have.

I believe that managing an opensource project is mainly a social activity, and GitHub's social features really make so many things easier. From this point of view, GitHub is clearly the best choice today.

I think there is only one (but potentially big) problem with GitHub: its issue tracker is very bad compared to Google Code's. For that reason (and also because we already use it), we keep SymPy's issues at Google Code.

The above are the main things to consider. There are also some smaller things to keep in mind, which I will briefly touch on below. Google Code doesn't support git, and it blocks access from Cuba and other countries. With Google Code, when you want to change the front page, you need to be an admin, while at GitHub I simply give push access to all sympy developers, so anyone just pushes a patch to this repository: https://github.com/sympy/sympy.github.com, and it automatically appears on our front page (sympy.org). With Google Code we had to write long pages (in our docs) about how to send patches; with GitHub we just say "send us a pull request" and point to http://help.github.com/pull-requests/. In other words, GitHub takes care of teaching people how to use git and how to send patches, and we can concentrate on reviewing the patches and pushing them in.

Wiki pages at GitHub are maintained in git, and they provide the web frontend to them as opensource, so there is no vendor lock-in. Anyone with a GitHub account can modify our wiki pages, while the Google Code pages can only be modified by people that I add to the Google Code project. That forced us to install mediawiki on my linode server (hosted at linode.com, which by the way is an excellent VPS hosting service that I have been using for a couple of years already and can fully recommend), and I had to manage it all the time; we are now moving our pages to the GitHub wiki, so that I have one less thing to worry about.

So as you can see, I, as an admin, have fewer things to worry about, as GitHub now manages everything for me, while with Google Code I had to manage lots of things on my linodes myself.

One other thing to consider is that GitHub's native format is git, but they also provide svn and hg access (both push and pull; they translate the repository automatically between git and svn/hg). I never really used it much, so I don't know how stable this is. As I wrote before, I think that git is the best tool now for maintaining a project, and I think that GitHub is now the best place to host it (except for the issue tracker, where Google Code is better).

31 October 2010

Ondřej Čertík: git has won

I switched to git from mercurial about two years ago. See here why I switched and here my experience after 4 months. Back then I was unsure whether git would win, but I thought it had the bigger momentum. Well, I think that now it's quite clear that git has already won. Pretty much everybody that I collaborate with is using git now.

I use github every day, and now, thanks to github pull requests, I think it's the best collaboration platform out there (compared to Google Code, Sourceforge, Bitbucket or Launchpad).

I think it's partly because the github guys have a clear vision of what has to be done to make collaboration easier, and they do it, but more importantly because git branches are the way to go, as well as other git features that were "right" from the beginning (branches, interactive rebase, and so on), while other VCSes like bzr and mercurial simply either don't have them or are getting them, but it's hard to get used to them (for example mercurial uses "mercurial queues", and I think that is the totally wrong approach to things).

Anyway, this is just my own personal opinion. I'll be happy to discuss it in the comments, if you disagree.

20 July 2010

Ondřej Čertík: Theoretical Physics Reference Book

Today I fulfilled my old dream --- I just created my first book! Here is how it looks:

More images here.

Here is the source code of the book: http://github.com/certik/theoretical-physics; the repository contains a branch 'master' with the code and 'gh-pages' with the generated html pages, which are hosted at github, at the url theoretical-physics.net.

Then I published the book at Lulu: http://www.lulu.com/product/hardcover/theoretical-physics-reference/11612144. I wanted a hardcover book, so I set up a project at Lulu, used some Lulu templates for the cover and that was it. Lulu's price for the book is $19.20 (166 black & white pages, hardcover); I can then set my own price on top, and the rest of the money presumably goes to me. I set the price to $20, because Lulu has free shipping for orders of $20 or more. You can also download the pdf (for free) at the above link (or just use my git repository). So far this didn't cost me anything.

I then ordered the book myself (just like anybody else would, at the above address) and it arrived today. It's a regular hardcover book. Beautiful, you can browse the pictures above. It smells delicious (that you have to believe me). And all it cost me was $19.20.

As for the contents, you can browse it online at theoretical-physics.net; essentially it's most of my physics notes, which I have collected over the years. I'd like to treat books like software --- release early, release often. This is my first release and I would call it beta. The main purpose of it was to see if everything goes through, how long it takes (the date inside the book is July 4, 2010; I created and ordered it on July 5 and got the physical book on July 19) and what the quality is (excellent). I also wanted to see how the printed pages correspond to the pdf (you can see for yourself on the photos; click on the picasa link above).

Now I need to improve the first and last pages a bit, improve the index, write a foreword and so on. I also need to think about how to better organize the contents and generally improve it. I also need to figure out some versioning scheme; so far this is version 0.1. I think I'll do edition 1, edition 2, edition 3, and so on, and whenever I feel that I have added enough new content, I'll just publish it as a new edition. So if you want to buy it, I suggest waiting for my 1.0 version, which will have the mentioned improvements.

It would also be cool to have all the editions online somehow and create nice webpages for it (currently theoretical-physics.net points directly to the book html itself).

So far the book is just text. I still need to figure out how to handle pictures, and also whether or not to use program examples (in Python, using sympy, scipy, etc.). So far I am inclined not to put any program code in, as then I don't need to maintain it.

Overall I am very pleased with the quality; up to the minor issues that I mentioned above, everything else ended up just fine. I think we have come a long way since the discovery of the printing press. Anybody can now create a book for free, and if you want to hold the hardcopy in your hands, it costs around $20. You don't need to order certain quantities of books, nor partner with some publisher, etc. I think that's just awesome.

14 December 2009

Ondřej Čertík: ESCO 2010 conference

An interesting conference, the 2nd European Seminar on Coupled Problems (ESCO), will be held June 28 - July 2, 2010 in Pilsen, Czech Republic. Among the topics are solving PDEs and applications, and using Python for scientific computing. In particular, Gaël Varoquaux is the keynote speaker.

Unfortunately, it was later announced that the SciPy 2010 conference is going to be held at the same time. But here are some reasons why you should consider going to ESCO 2010 instead:


If you have time, I can fully recommend going to ESCO 2010.

10 May 2009

Ondřej Čertík: My experience with running an opensource project

Nir Aides, the author of the excellent winpdb debugger, sent me the following email on September 21, 2008. I asked him if I could copy his email and reply in the form of a blog post (so that other people can comment and join the discussion) and he agreed. It took me almost a year to reply, but I made it. :)

Hi Ondrej,

How are you?

I am about to publish a new free software project, a new simple PHP framework, and I am interested in your advice.

You started SymPy and were able to make other people join you and develop it with you.
How did you do it?
How did it happen?
Did you actively call for other people or they spontaneously showed interest and joined you?
Are the other major contributor people who were your friends before you started the project?
Did you need to create or manage the project in a particular way to make it attractive to other people?
Are there things you are aware of that promote collaboration or demote it?

I was never successful in doing the same with Winpdb, which, while it became reasonably popular, no one has ever joined me in developing, except for a notable tutorial contribution by Chris Lasher, which was developed independently.

Now with the new project, I am wondering what are my chances of making other people try it and take it on. On the one hand it is a new and fresh code base in an interesting field, on the other hand, why would anyone bother to spend their energy on this new project when they have Symfony or Drupal?

What do you think?

BTW, Ohloh believes you have a median of 19,000 lines of changed code per month since the start of their log. Can this be true? Is this humanly possible? According to it SymPy has over 1,000,000 lines of code? I can't understand these numbers. Winpdb has about 25,000 lines after 3 years of development. And from my experience 1,000,000 lines of code projects need about 20-50 full time developers to work on for 2-5 years which is about 40-250 man years. And as if this is not enough you are listed as owner in a dozen other projects in Google code and have enough time to become an awarded scientist. How is this possible?

http://www.ohloh.net/p/sympy/contributors/

BTW2, do you still use Winpdb? If you find yourself using it less, can you say what are the reasons, or what it would take to make it more useful?

BTW3, How is SymPy doing?

Cheers,
Nir



So my most honest answer to how to run a successful opensource project is: I don't know.

But nevertheless I have tried to summarize some of my ideas and experience and some guidelines that I try to follow; maybe it will be useful to you, Nir, or to anyone else.

First of all, there has to be a public mailinglist (easily accessible), a public bug tracker, a nice webpage, easy to find downloads, frequent releases (once a month is good, but in the worst case at least 4 times a year) and a set of guidelines to follow in order to contribute. That's a must; if the project doesn't have the above, it's almost impossible for it to become successful. However, that is just a start, just a playground. There are still many projects that have all of the above and yet totally fail to attract developers.

So I think the most important principle is that I always think about how to involve other people in what I do. If I have some plan in my head for how to do something, e.g. how to move some things forward, I always write down the exact steps and put them into issues, or onto our mailinglist, so that each step can be done by someone who is completely new to sympy. I try to look at things from other people's perspective and think: ok, I quite like this SymPy project and I'd like to get this done (for example a new release, or something fixed or implemented), but I have no idea how to start and what exactly needs to be done.

So what I try to do if someone comes to our list and asks for something is to create a new issue for it and think about how I would fix it if I had time. Then I write the necessary steps in the issue and invite the submitter to fix it, offering to explain anything and to guide them. Now there are two things that can happen. Either the submitter has the time and will to go forward, in which case he starts wrestling with it, and whenever he has some code or a question, I need to find time, review it and offer some way out. Or the submitter is too busy, in which case the instructions simply rest in the issue, and the next time someone asks for the feature, the instructions are already there. I don't have estimates of how frequent either case is.

When I am working on something myself, I try not to code privately, but to put up issues first and write the needed steps in them, so that it's easy for other people to join in.

In general, the most precious thing for me is the fact that someone else sat down at their computer and wrote a patch. So I do everything possible to get new (or more) people interested in the development. Some people think that only super programmers can do a decent job and that it's useless to invest time in people who may have just started with Python. They are wrong. Among the SymPy developers (around 65 people in total have contributed patches at the time of writing this post), we have all kinds of people: people from high school, a retired US army engineer, physicists, mathematicians, biologists, engineers, teachers, and hobbyists who do it for fun. Unfortunately, we do not have many women (I think no patch that made it into sympy was contributed by a woman, but I may be wrong), so if anyone has any ideas how to get more women involved, let me know (I know we have several women fans, so that's a good start :). We have people for whom sympy was the first open source project they ever contributed to, and people who are new to Python.

Many times the first patch that a new potential developer submits is not perfect; usually it would be faster for me to write it myself than to help with the first patch. However, my rule is to always help the submitter finish it. Sometimes he sends a second patch, or a third, and usually it needs less and less work on my side, and it already pays off, because he is then able to fix things himself if he discovers a bug, and sympy has just won one more contributor.

So I came to the conclusion that all that is needed is enthusiasm. You don't even have to know Python (you can learn all these things on the way) and you can still do useful things for us and really save us time.

To answer another question from Nir's email, SymPy has about 130,000 lines of code plus about 20,000 lines of tests, so I think those stats are wrong. The changed-lines-per-month figure is in my opinion wrong too; we usually have about 250 new patches per release (this depends on how often we release, among other things).

Yes, I am involved in a couple of other projects, e.g. Debian, Sage, ipython, scipy, hpfem.org (and a couple more), basically everything that has to do with numerical simulation and Python, but my activity there varies. The most time consuming thing in the last couple of years was definitely school: I was finishing my master's in Theoretical Physics in Prague and then moved to Reno, Nevada, where I just finished the first semester of my PhD in Chemical Physics, and sometimes it was just crazy. E.g. I finished teaching at 7pm and instead of going home to sleep, I stayed in my office, fixed 10 sympy issues that were holding off a release, finished at 1am, went home (by bike, since I don't have a car yet), slept a couple of hours and then did just school again for a week; other people reviewed the issues in the meantime, and then I made the release (instead of sleeping again).

In the last semester it was not unusual that I got home at 1am every week day, then slept most of Saturday to catch up; on Sunday I did some laundry and shopping, and the rest of the time I did grading and homework for all my classes and teaching --- no time for anything else (no friends, no girls, no rest, no hobby, no opensource stuff, nothing). So sometimes one has to work pretty hard to get through it, but fortunately it's finally behind me; if all goes well, I should be just doing research from now on and have a real life too. Also, I am sorry I didn't manage to reply sooner. :)

To answer the other questions:
Are the other major contributor people who were your friends before you started the project?
No, not a single major contributor was my friend before I started the project. Every single one of them became a developer through the procedure I described above, i.e. they first showed up on the list or in the issues, and maybe even their very first patch was not a high quality one (and if I had been stupid and arrogant, or hadn't seen the big potential, I would have just ignored them). But when given a chance, they became extremely good developers, and sympy would simply not be here without them.

Did you actively call for other people or they spontaneously showed interest and joined you?
I very much encourage everyone to contribute, but the initial interest must come from them, i.e. they at least have to show up on the mailinglist or in the issues, so that I know about them. But once I know they are interested in some issue, yes, I try to invite them to fix it, with my help.

One observation I have made is that I always have to think in the spirit of "how to earn new money", not "how to hold on to the money I already have"; applied to sympy, that means how to get new developers, how to develop the new great things, etc. Even when I am as super busy as I was, I still have to think this way. Once I start thinking about how to conserve and preserve what we already have, I am done, finished --- that's the road to hell.

If I am open, positive and full of energy, I can see people joining me and we can do great things together. It probably sounds obvious, but it was not obvious to me when, for example, some people I worked with started their own projects when I got busy, and started to compete instead of helping sympy out. I felt betrayed after all the work I had invested, and started to become protective. And then I realised that's wrong. I can never stop other people from doing what they want to do. If they want to have their own project, they will have it. If they don't want to help sympy out, they won't (and, what is more important, there is nothing wrong with either of those). It's that simple, and being protective only makes things worse.

There is also the question of which license to use for the project; basically one should only choose between BSD (maybe also MIT or Apache), LGPL and GPL (there are also several versions of the GPL licenses). The unfortunate fact is that there are people who will never contribute code under a permissive BSD license (because it does not protect their work enough), and there are other people who really want the code to be BSD (or under another permissive license), so they can sell it without needing to consult lawyers about what they are or aren't allowed to do, and so they can combine it with any other code (opensource or not). It also depends on whether one wants to combine (and distribute) it together with other code. So choosing a license is also important. I believe that for sympy BSD is the best and for other projects (like Sage) GPL is the best; one has to decide on a case by case basis. For Winpdb, I would make it BSD too, since you can get more people using it.

To conclude, SymPy is a little more than 2 years old, it has been a great ride so far, and more things are coming: this summer we have 5 Google Summer of Code students, people are starting to use it in their research, and we plan to use it in our codes in our group here in Reno too, so things look promising. I am really glad we managed to build such a community, so that when I am busy, as I was last semester, other people help out with patches, reviews and other things and the project doesn't stall; and now that I have got rid of my school duties, we can move things forward a lot.

So maybe you can get inspired by some of the ideas above. I am also interested in any discussion about this (feel free to post a comment below, send me an email, or just write to the sympy list about what you think).

17 March 2009

Ondřej Čertík: Newtonian Mechanics with SymPy

Luke Peterson from UC Davis came to visit me in Reno and we spent the last weekend hacking on the Python Dynamics package that uses SymPy to calculate equations of motion for basically any rigid body system.

On Friday we did some preliminary work, mostly on the paper, and Luke showed me his rolling torus demo that he did with the proprietary autolev package. We set ourselves the goal of getting this implemented in SymPy by the time Luke leaves, and then we went to the Atlantis casino together with my boss Pavel and other guys from the Desert Research Institute, where I had my favourite meal here: a big burger, fries and a beer.

On Saturday we started to code and got a couple of lines of the autolev torus script working. Then we went on a bike ride from Reno to California. I took some pictures with Luke's iPhone:


Those mountains are in California and we went roughly to the snow line level and back:

This is Nevada side:


That was fun. Then we worked hard, and by the evening we had the dot product and cross product working, so we went to an Irish pub to have a couple of beers and I had my burger as usual.

On Sunday we spent the whole day and evening coding and we got the equations of motion working. On Monday we worked very hard again:



and fixed some remaining nasty bugs. I taught Luke to use git, so our code is at http://github.com/hazelnusse/pydy; for the time being we call it pydy, and after we polish everything, we'll probably put it into sympy/physics/pydy.py. If you run rollingtorus.py, you get this plot of the trajectory of the torus in a plane:

It's basically what happens if you throw a coin on the table: the model takes into account moments of inertia, yaw (heading), lean, spin and the x-y motion in the plane. Depending on the initial conditions, you can get many different trajectories, for example:

or:


This is very exciting, as the code is very short, and most of the things that Luke needs are needed for all the other applications of sympy, e.g. good printing of equations and vectors (both in the terminal and in latex), C code generation, fast handling of expressions, a nice ipython terminal for experimentation, plotting, etc.
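To give a flavour of the vector algebra that we got working over the weekend, something like the following is the goal (a sketch only; the names follow what this code later evolved into, sympy.physics.vector, rather than the exact pydy API described in this post):

from sympy.physics.vector import ReferenceFrame, cross, dot

N = ReferenceFrame("N")  # a frame with orthonormal vectors N.x, N.y, N.z
v = 3 * N.x + 4 * N.y

print(dot(v, v))         # 25
print(cross(N.x, N.y))   # N.z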

Together with the atomic physics package that we started to develop with Brian, sympy will soon be able to cover some basic areas of physics. Other areas are general relativity (there is some preliminary code in examples/advanced/relativity.py) and quantum field theory and Feynman diagrams --- for those we need someone enthusiastic who needs them for his/her research --- if you are interested, drop me an email; you can come to Reno (or work remotely) and we can get it done.

My vision is that sympy should be able to handle all areas of physics. E.g. it needs good assumptions (if you want to help out, please help us test Fabian's patches here), then a faster core; we have a pretty good optional Cython core here, and we'll be merging it after the new assumptions are in place. Then sympy should have basic modules for most areas of physics so that one can get started really quickly. From our experience so far in sympy/physics, those modules will not be big, as most of the functionality is not module specific.

23 December 2008

Ondřej Čertík: Experience with git after 4 months

I switched to git from mercurial about 4 months ago. As Matt Mackall (the main author of mercurial) has pointed out, my arguments were partially just excuses for the switch, because everything could be fixed in mercurial as well (eventually). On the other hand, I really wasn't expecting to switch after having invested time and energy in learning mercurial, also because I thought that git didn't have such nice documentation, had a steep learning curve, and had a hostile community, since it is developed by the kernel people, who are famous for sometimes being not so nice on the mailinglists (as I have now found out, none of these are true). I was getting annoyed by small things, one by one, until finally I said to myself: enough is enough, let's just switch.

As to the steep learning curve --- it took me a day or two to be able to do what I wanted, so it seems it's really easy to learn once you know some distributed VCS. I found out the documentation is much better than mercurial's --- in git one just types "git help command" and a full man page fires up with lots of information and examples. In mercurial, one just gets a couple of lines (compared to git) of help in the terminal, without examples. Also, the first (actually second) thing I noticed is that "git log" (and other commands like "git diff") doesn't scroll my terminal away, but uses "less" automatically --- that's just great. The first thing I noticed is that git is superfast for almost every operation; it's especially noticeable with "git diff". As to the community, I have only had the chance to send a few emails to the git list, so I cannot really say, but so far it has been very responsive and friendly.

However, those are just little things that can (and I am sure will) be fixed in mercurial as well. What is a big thing is branches, especially remote branches. One just fetches a remote repository and then works with it like with any other local branch, e.g. one can switch to it, or just "git log some_remote_branch" to see what is in there, and one can easily compare branches, etc. With mercurial, I was using the "hg out" and "hg in" commands to see what changes I would pull or push, but those commands require an internet connection, so it really sucks and it's slow. In mercurial, I was using different directories for different branches, but that's just extremely inconvenient, slow (creating a new branch in git is just a matter of adding one file, not copying the whole repository --- for example with the sympy hg repository, I often had to wait several seconds to clone a repo, while with the git repository it's just instant) and big (in terms of megabytes).

The bible of Mercurial, the hgbook says:
In most instances, isolating branches in repositories is the right approach.

But I think it's the wrong approach. The only argument for doing this that I accept is that sometimes you are not sure which branch you are in with git --- well, I use this trick (actually, just read the documentation of __git_ps1 in /etc/bash_completion.d/git), so my bash prompt always says which branch I am in, in red. I never make mistakes with this.

Anyway, I could continue like this (e.g. check out "git svn", "git rebase -i" and many other things), but you can read why git is better for example here; no need to repeat it here. I am very happy now and I can only recommend learning git.

Let me also mention some things that are worse in git --- one thing that could be improved is the gitweb URLs (hgweb urls are neat, gitweb urls use "?id=..." stuff). Another thing is that the debian package git is not git! You need to install git-core to get git (but this seems to be getting fixed).

There is also an interesting discussion happening right now on the debian-python mailinglist about switching the DPMT repositories to git. I think almost all opinions (for and against) have already been stated, but if you have some opinion that wasn't yet said, please do add it.

Emilio Pozuelo Monfort: Collaborative maintenance

The Debian Python Modules Team is discussing which DVCS to switch to from SVN. Ondrej Certik asked how to generate a list of committers to the team's repository, so I looked at it and got this:
emilio@saturno:~/deb/python-modules$ svn log | egrep "^r[0-9]+ " | cut -f2 -d'|' | sed 's/-guest//' | sort | uniq -c | sort -n -r
865 piotr
609 morph
598 kov
532 bzed
388 pox
302 arnau
253 certik
216 shlomme
212 malex
175 hertzog
140 nslater
130 kobold
123 nijel
121 kitterma
106 bernat
99 kibi
87 varun
83 stratus
81 nobse
81 netzwurm
78 azatoth
76 mca
73 dottedmag
70 jluebbe
68 zack
68 cgalisteo
61 speijnik
61 odd_bloke
60 rganesan
55 kumanna
52 werner
50 haas
48 mejo
45 ucko
43 pabs
42 stew
42 luciano
41 mithrandi
40 wardi
36 gudjon
35 jandd
34 smcv
34 brettp
32 jenner
31 davidvilla
31 aurel32
30 rousseau
30 mtaylor
28 thomasbl
26 lool
25 gaspa
25 ffm
24 adn
22 jmalonzo
21 santiago
21 appaji
18 goedson
17 toadstool
17 sto
17 awen
16 mlizaur
16 akumar
15 nacho
14 smr
14 hanska
13 tviehmann
13 norsetto
13 mbaldessari
12 stone
12 sharky
11 rainct
11 fabrizio
10 lash
9 rodrigogc
9 pcc
9 miriam
9 madduck
9 ftlerror
8 pere
8 crschmidt
7 ncommander
7 myon
7 abuss
6 jwilk
6 bdrung
6 atehwa
5 kcoyner
5 catlee
5 andyp
4 vt
4 ross
4 osrevolution
4 lamby
4 baby
3 sez
3 joss
3 geole
2 rustybear
2 edmonds
2 astraw
2 ana
1 twerner
1 tincho
1 pochu
1 danderson
As it's likely that the Python Applications Packaging Team will also switch to the same DVCS at the same time, here are the numbers for its repo:

emilio@saturno:~/deb/python-apps$ svn log | egrep "^r[0-9]+ " | cut -f2 -d'|' | sed 's/-guest//' | sort | uniq -c | sort -n -r
401 nijel
288 piotr
235 gothicx
159 pochu
76 nslater
69 kumanna
68 rainct
66 gilir
63 certik
52 vdanjean
52 bzed
46 dottedmag
41 stani
39 varun
37 kitterma
36 morph
35 odd_bloke
29 pcc
29 gudjon
28 appaji
25 thomasbl
24 arnau
20 sc
20 andyp
18 jalet
15 gerardo
14 eike
14 ana
13 dfiloni
11 tklauser
10 ryanakca
10 nxvl
10 akumar
8 sez
8 baby
6 catlee
4 osrevolution
4 cody-somerville
2 mithrandi
2 cjsmo
1 nenolod
1 ffm
Here I'm the 4th biggest committer :D And while I was at it, I thought I could do the same for the GNOME and GStreamer teams:
emilio@saturno:~/deb/pkg-gnome$ svn log | egrep "^r[0-9]+ " | cut -f2 -d'|' | sed 's/-guest//' | sort | uniq -c | sort -n -r
5357 lool
2701 joss
1633 slomo
1164 kov
825 seb128
622 jordi
621 jdassen
574 manphiz
335 sjoerd
298 mlang
296 netsnipe
291 grm
255 ross
236 ari
203 pochu
198 ondrej
190 he
180 kilian
176 alanbach
170 ftlerror
148 nobse
112 marco
87 jak
84 samm
78 rfrancoise
75 oysteigi
73 jsogo
65 svena
65 otavio
55 duck
54 jcurbo
53 zorglub
53 rtp
49 wasabi
49 giskard
42 tagoh
42 kartikm
40 gpastore
34 brad
32 robtaylor
31 xaiki
30 stratus
30 daf
26 johannes
24 sander-m
21 kk
19 bubulle
16 arnau
15 dodji
12 mbanck
11 ruoso
11 fpeters
11 dedu
11 christine
10 cpm
7 ember
7 drew
7 debotux
6 tico
6 emil
6 bradsmith
5 robster
5 carlosliu
4 rotty
4 diegoe
3 biebl
2 thibaut
2 ejad
1 naoliv
1 huats
1 gilir

emilio@saturno:~/deb/pkg-gstreamer$ svn log | egrep "^r[0-9]+ " | cut -f2 -d'|' | sed 's/-guest//' | sort | uniq -c | sort -n -r
891 lool
840 slomo
99 pnormand
69 sjoerd
27 seb128
21 manphiz
8 he
7 aquette
4 elmarco
1 fabian
Conclusions:
- Why do I have the full python-modules and pkg-gstreamer trees, if I have just one commit to DPMT, and don't even have commit access to the GStreamer team?
- If you don't want to seem like you have done fewer commits than you actually have, don't change your alioth name when you become a DD ;) (hint: pox-guest and piotr in python-modules are the same person)
- If the switch to a new VCS were based on a vote where you have one vote per commit, the top 3 committers in pkg-gnome could win the vote if they chose the same! For python-apps it's the top 4 committers, and the top 7 for python-modules. pkg-gstreamer is a bit special :)

28 October 2008

Ondřej Čertík: Google Mentor Summit III

On Sunday I checked out from the hotel and enjoyed the breakfast, basically the same as in the first picture here. Which is cool; the best way to start a day is with a good breakfast.



At Google I finally met with Bart Massey, who works at Portland State University, where I was an intern (in 2005), and who also sponsored one SymPy GSoC student in 2007.


Then we started to port SymPy to Jython with Philip Jenvey. I first installed openjdk-6, which is in Debian main, then checked out the Jython svn, typed "ant" and it just worked. It is so awesome that we finally have a truly opensource and fully working Java implementation in Debian. Philip has already fixed some bugs, so SymPy imports just fine. Most of the tests run, but quite a lot still fail.

We managed to isolate two bugs: issues 1158 and 1159. Then there was an annoying recursion bug that took us several hours to dig into, and I had to leave at 6pm without a clue. But Philip kept saying he just needed to look more at the Java stack traces that Jython was generating and he'd figure it out. And he did! The next morning he filed issue 412 in the PyPy tracker, because they seem to have the same problem. There are subtle differences in how to handle the __mul__ and __rmul__ methods that are not well documented, so the code works in CPython but fails in Jython/PyPy. I think we can fix that in SymPy too in the meantime, so I am very excited; it seems SymPy will work on top of Jython soon.
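For the curious, the corner case is easy to demonstrate with a toy example (this is not our actual code). CPython tries the right operand's reflected method first when its type is a proper subclass of the left operand's type and overrides that method; an implementation that misses this rule (or the NotImplemented fallback) behaves differently:

class Base(object):
    def __mul__(self, other):
        return "Base.__mul__"

class Special(Base):
    def __rmul__(self, other):
        return "Special.__rmul__"

# The subclass's reflected method wins on CPython:
print(Base() * Special())   # Special.__rmul__
# Here type(b) is not a subclass of type(a), so __mul__ runs as usual
# (Special inherits it from Base):
print(Special() * Base())   # Base.__mul__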

There was also a git vs hg session with both hg and git developers. Since I recently switched from hg to git for all the stuff where I can choose the VCS, it was very interesting for me. And I must say I am glad I switched. Mercurial works well enough, and with the recent rebase feature I think it is usable, but git just has bigger momentum at the moment and it just doesn't get in my way.

During the day Steve sent an email to the GSoC list saying that Dirk forgot his jacket in the hotel room, so I said to myself, haha, that's a pity. And then I realized: oh shit, I left my jacket there as well. They didn't have it in the evening when I stopped by, but fortunately they sent me an email yesterday that they found it, so I'll stop there on my way back.

We also took a couple of group pictures; here is just my lame attempt (I am sure people will soon post better ones):


On Monday morning I met with Fernando Perez, the author of ipython; he now works with Jarrod. Then I rented a car and went to visit Brian Granger and his family in San Luis Obispo.

We went to the beach in the evening:





And we had dinner together. Brian showed me the options pricing example he did with SymPy, so I made him create issue 1175 with the code.
Then we played with the parallel stuff that Brian contributed to ipython, and we tried sympy with it. To our disappointment, we found pickling bugs in sympy: it works with protocol 0, but not with protocol 2; see issues 1176 and 1177. This almost smells like a bug in Python itself, but I need to investigate it more deeply later.
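The failing round-trip is essentially this (a minimal reproduction sketch; the real test cases are in the issues above):

import pickle
import sympy

x = sympy.Symbol("x")
expr = x**2 + 1

# Protocol 0 worked; protocol 2 is what triggered the bugs reported
# in issues 1176 and 1177.
s = pickle.dumps(expr, protocol=2)
print(pickle.loads(s) == expr)  # True once the bugs are fixed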

Today is Tuesday and we'll work on doing atomic physics calculations with sympy. See also Brian's email to the sympy list about it.

26 October 2008

Ondřej Čertík: Google Mentor Summit II

On Saturday morning I met Steve McIntyre and Dirk Eddelbuettel, and Steve took us by car to Google. Well, it was overwhelming.

For example, we met two other Debian developers there, Jeff Bailey and Daniel Burrows. There are developers from all the famous projects, like git, Wine, Turbogears, Jython, ... There are still people I wanted to meet but didn't manage to yesterday; I need to fix that today. My plan is to get SymPy running on top of Jython, or at least make some more progress, to take advantage of the fact that Philip Jenvey from Jython is here. Also I wanted to fix some RC bugs in Debian while Steve is here, to finally make some progress on my NM. We'll see how it goes.

There was also an excellent presentation from the people behind Android. One thing I was curious about is whether it's going to run native C applications, like Python, and it seems that will happen eventually; all the sources are out there, so someone just needs to push it forward. Another very cool thing they did, which I was always looking for, is Gerrit for reviewing patches (see review.source.android.com for an example of how it looks) --- basically a fork of Guido's codereview, but with many nice features, like automatically applying the patch to the git repository as a new topic branch, so one can easily pull it as well as review it over the web interface. When I get back to Prague I am going to install this for SymPy.

25 October 2008

Ondřej Čertík: Google Mentor Summit I

I flew via Atlanta and had only about an hour until my next flight to San Francisco. After my last experience, when I went to immigration, got stuck in the line for more than an hour and then had to run to catch my flight, I decided to try a different strategy this time: first run to immigration and then walk to my gate. Unfortunately I was sitting near the back of the plane and got out among the last. Fortunately, it was several hundred meters to immigration, so I ran as fast as I could and managed to get there first, and everything took about 5 minutes. That was just awesome; I finally figured this out.

Jarrod was waiting for me at the airport, and we went to his place. Here's his cat:



On Friday we did some work and then went to the Golden Gate, the traffic was quite dense:




Alcatraz:


San Francisco:




Then we had a coffee in San Francisco and went to Silicon Valley; Jarrod drove me around a little bit to see SLAC, the Stanford campus and other things. In the evening we went to the pub with the other mentors and Google guys, where for example we met Robert Bradshaw.

Today I am looking forward to meeting all the people I know from mailinglists, Debian and other places.

22 September 2008

Ondřej Čertík: master studies

I did it! :) I defended my thesis and passed the master's finals in theoretical physics at Charles University in Prague a couple of hours ago.

After almost dropping out of school exactly a year ago for not having enough credits to go on to the next year, I gave myself an obligation to finish my school on time. I worked very hard over the last year: I had to do 8 exams, some of them very hard, requiring more than a week of thorough studying, 9 seminars requiring a lot of work too, and also a master thesis, for which I had to have working finite element solvers; together with SymPy it took all my time and energy. I even had to cancel my trip to Austin and Caltech for the SciPy conference. But I finished my school after all, and I am very happy about it.

Last year, two of my friends bet $100 between themselves that I would not finish on time. Jarda, who believed in me, is now at Princeton doing his Ph.D. Matouš, who didn't believe in me, will now pay $100 to Jarda. I think that life is fair. :)

Now I'll be visiting pubs quite often, and then I'll fix some long-standing issues in SymPy, hopefully finish my Debian Tasks & Skills (to finally become a Developer a couple of months after that), and at last do useful stuff for my research with a fresh head.

16 August 2008

Ondřej Čertík: I am switching from Mercurial to git

After a long night of debugging and while preparing, reviewing and pushing final patches before a SymPy release, I got this:

$ hg qpush
applying 1645.diff
Unable to read 1645.diff
** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 1.0.1)
Traceback (most recent call last):
  File "/usr/bin/hg", line 20, in <module>
    mercurial.dispatch.run()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 20, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 29, in dispatch
    return _runcatch(u, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 45, in _runcatch
    return _dispatch(ui, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 364, in _dispatch
    ret = _runcommand(ui, options, cmd, d)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 417, in _runcommand
    return checkargs()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 373, in checkargs
    return cmdfunc()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 356, in <lambda>
    d = lambda: func(ui, repo, *args, **cmdoptions)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 1942, in push
    mergeq=mergeq)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 833, in push
    top = self.applied[-1].name
IndexError: list index out of range

I am using:

$ hg --version
Mercurial Distributed SCM (version 1.0.1)

Copyright (C) 2005-2008 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


And I just got pissed off, and I am switching to git for good. I've been using Mercurial every day since about November 2007, and I think I am quite proficient in it. But we were missing some features, like recording a patch by hunks (git add -p), so Kirill Smelkov (another SymPy developer) implemented this for Mercurial queues; it is now included in hg 1.0. But I am constantly bitten by stupid bugs; Mercurial has broken into pieces several times already (last time with hg bisect, before that with Mercurial queues and file renaming), and that just should not happen in a production version. Another problem is that hgweb constantly uses 100% of CPU on my server whenever someone clicks to see the contents of the README file. Again, I could spend time debugging it, but I will just use git, and if it works, I'll stay with it.

I started learning git recently and it is just better in every aspect: a bigger community, and it has all the features I was always missing in Mercurial (like rebase, recording by hunks, git diff --color-words, a better gitk). And it's superfast.
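For a flavour of the features I mean, here is a tiny illustrative session (the commit message is made up):

$ git add -p                          # stage the working-tree diff hunk by hunk
$ git commit -m "simplify caching"    # commit only the approved hunks
$ git diff --color-words              # show the remaining changes word by word
$ git rebase master                   # replay the current branch on top of master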

Unless I find some showstopper bugs in git too, I don't think I am coming back.

We'll have to create a live Mercurial mirror of SymPy, so that people can continue using Mercurial, but I myself will just be using git. We wrote a simple translation table to help anyone coming from Mercurial get started:

http://wiki.sympy.org/wiki/Git_hg_rosetta_stone
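A few entries of the kind that table contains (illustrative approximations, not copied from the wiki):

hg pull -u   ->  git pull
hg ci        ->  git commit -a
hg glog      ->  git log --graph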

I love Python, and I like that Mercurial is just a small C core with the rest in Python, but for some reason it still has baby bugs that should have been fixed years ago. Why should we spend our time fixing and improving Mercurial if other people have already done the job for us (and usually a better job than I could do) in git?

Another reason for switching is that in Debian, Mercurial is not used for packaging, while git is used a lot.

24 May 2008

Ondřej Čertík: Ubuntu Developer Summit in Prague

Last weekend I was at FOSSCamp. Since I live in Prague I wanted to go to the Ubuntu Developer Summit (UDS) every day, but unfortunately I had some exams, so I only went on Wednesday and Friday.

On Wednesday I first met Lars Wirzenius:

[photo]

We agreed to go to a pub in the evening. Then I did a little work; there was quite a nice view from the window (Prague Castle on the horizon):

[photo]

Then I went to the #ubuntu-devel-summit IRC channel and pinged Scott Kitterman, whom I knew from the Debian Python Modules Team (DPMT) but didn't know what he looked like. We met, and once I knew Scott it was easy to get around; he introduced me to Steve Langasek (pronounced Langášek). We agreed to go to the pub as well. Steve lives in Portland, OR, where I spent the summer of 2005, and Scott is from Baltimore, where I spent the summer of 2006.
Then I also met Riku Voipio, Martin Böhm, Christian Reis (whom I asked whether it's possible to support Debian unstable in Ubuntu Personal Package Archives; he said it will probably happen, which is really cool -- I also offered my help with this) and others. In the end there were 14 of us going to the pub, so I again chose the same pub as with the FOSSCamp people, and it seems the food tasted good again:

[photo]

Notice the svíčková na smetaně above, my favourite meal. Good choice, Scott. :)

Ok, that was Wednesday. On Friday I arrived at around 3pm, looked at the schedule table, and noticed that Matthias Klose should be at UDS too, so I started IRC and pinged him. Fortunately, Emilio Pozuelo Monfort, whom I also know from DPMT, replied first, so we met, which was cool, and he showed me where Matthias was. I am very glad I met him; we discussed the python-central and python-support packages and why we have both, also with Scott later on.

While I was waiting for Matthias, I sat next to Nicolas Valcárcel, started my laptop and began looking at some SymPy bugs. Nicolas noticed that and said, "You are developing SymPy?" I said "Yes.", flattered. And he showed me a bug with plotting and Compiz, so we immediately reported it to pyglet.

In the evening people continued on to some kind of party, but unfortunately I was already going to another pub.

Overall, even though I was there for only two afternoons, it was just awesome, and I thoroughly enjoyed meeting all the people I knew from mailing lists and IRC.

23 May 2008

Lucas Nussbaum: FOSSCAMP

Last weekend, I was in Prague for FOSSCAMP, a Canonical-sponsored event aiming at bringing together people from various Free Software projects. Such an event is a very good idea, especially after the OpenSSL debacle, where we saw how difficult it is to build good relationships with other Free Software projects.

The event was organized in a rather interesting way: the attendees make up the schedule as the event happens, by adding sessions to the timetable using marker pens on a whiteboard (see picture, hi Jorge!). If we have a spare BOF room at Debconf, it would be great to do the same thing: I always find it frustrating that many important discussions at Debconf happen outside BOFs/lectures and are so easy to miss, simply because you submit talk proposals months before Debconf, so you can't know yet what will be the hot stuff when Debconf finally arrives :-) It was also interesting to see all the different ways people organized BOFs. It might be interesting to write a list of DOs and DON'Ts about BOFs (not that the sessions weren't of good quality!). Does anyone know if such a list already exists?

And finally, as usual, it was great to meet all those people from this nice community again. If you still believe that some projects are fighting with each other, you need to attend such an event. And special thanks go to Ondřej Čertík for guiding me through a visit of Prague!

20 May 2008

Ondřej Čertík: Installing a printer in Debian

I finally bought a new toner for my old Minolta 1250e, which I hadn't used for a few years (because I was too lazy to fix it), and I was surprised how easy it is to install these days:

1. plugged the USB cable from the printer into my laptop
2. wajig install cupsys foomatic-db-engine
3. sudo ln -s /usr/bin/foomatic-ppdfile /usr/lib/cups/driver
4. on http://localhost:631/ I added the new printer in cupsys; it showed my printer over USB, I selected it and then chose the correct PPD

And everything just works. The only nontrivial part is step 3, which was suggested in /usr/share/doc/foomatic-db-engine/USAGE.gz:

If the printers of the Foomatic database do not appear, check whether the
link to foomatic-ppdfile is in /usr/lib/cups/driver:

lrwxrwxrwx 1 root root 25 Apr 19 18:13 foomatic -> /usr/bin/foomatic-ppdfile

If not, create it manually.

Without it, I didn't see the right PPD in step 4. Maybe it would be a nice idea to do this automatically (either when installing foomatic-db-engine or cupsys), or am I missing something?

As for the PPDs, one can get all the printer IDs with:

$ foomatic-ppdfile -A

and the PPD for my printer with:

$ foomatic-ppdfile -p Minolta-PagePro_1250E

However, this is not necessary in the howto above.
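For completeness, the whole setup can presumably also be scripted without the web interface using the standard CUPS command line tools; this is an untested sketch, the queue name is made up, and the device URI must come from lpinfo:

$ foomatic-ppdfile -p Minolta-PagePro_1250E > pagepro.ppd   # generate the PPD
$ lpinfo -v                                                 # list the device URIs CUPS sees
$ sudo lpadmin -p pagepro -E -v <device-uri-from-lpinfo> -P pagepro.ppd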
