Niels Thykier: Partial rewrite of lintian's reporting setup
I had the mixed pleasure of doing a partial rewrite of lintian's reporting framework. It started as a problem with generating the graphs, which turned out to be caused by "not enough memory". On the plus side, I am actually quite pleased with the end result. I managed to scope-creep myself quite a bit, and I ended up getting rid of a lot of old issues.
The major changes in summary:
Filed under: Debian, Lintian
- A lot of logic was moved out of harness, meaning it is now closer to becoming a simple dumb task scheduler. With the logic moved out into separate processes, harness now hogs vastly less memory (memory that I cannot convince Perl to release to the OS). On lilburn.debian.org, "vastly less" is on the order of going from 700ish MB down to 32 MB.
- All important metadata was moved into the harness "state-cache", which is a simple YAML file. This means that the Lintian laboratory is no longer a data store. This change has a lot of very positive side effects.
- With all metadata now stored in a single file, we can now do atomic updates of the data store. That said, this change by itself does not enable us to run multiple lintian instances in parallel.
- As the lintian laboratory is no longer a data store, we can now do our processing in throw-away laboratories, just like a regular lintian user does. As the permanent laboratory was the primary source of failures, this removes an entire class of possible problems.
- Packages can now be up to date in the generated reports. Previously, they would always be listed as out of date, even if they were up to date. This is the only change in all of this that is visible to end users and website visitors (besides the graphs working again \o/).
- The size of the harness work list is no longer based on the number of changes to the archive.
- The size of the harness work list can now be changed with a command-line option and is no longer hard-coded to 1024. However, the time limit remains hard-coded for now.
- The full run (and "clean run") now simply marks everything out of date and processes its (new) backlog over the next (many) harness runs. Accordingly, a full run no longer causes lintian to run for 5-7 days on lilburn.d.o before the website gets an update. Instead, we now get incremental updates.
- The harness.log now features status updates from lintian as they happen, such as "processed X successfully" or "error processing Y", plus a little wall-time benchmark. Thanks to this little feature, I filed no less than 3 bugs against lintian, 2 of which are fixed in git. The last one remains unfixed but can only be triggered in Debian stable.
- With throw-away labs, it is now possible to terminate the lintian part of a reporting run early with minimal loss of processing. Since lintian-harness is regularly fed status updates from lintian, we can mark successfully completed entries as done even if lintian does not complete its work list. Caveat: there may be minor inaccuracies in the generated report for the particular package lintian was processing when it was interrupted. This will fix itself when the package is reprocessed.
- It is now vastly easier to collect new metadata for use in the reports. Previously, it had to be included in the laboratory and extracted from there. Now, we just have to fit it into a YAML file. In fact, I have been considering adding the wall time and making a "top X slowest" page.
- It is now possible to generate the html pages with only a state-cache and the lintian.log file. Previously, it also required a populated lintian laboratory.
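The atomic updates mentioned above are possible because the whole data store is a single file. Below is a minimal sketch of how such a file can be replaced atomically, written in Python rather than lintian's Perl: write to a temporary file in the same directory, fsync it, then rename it over the old one. The function names and the state layout are hypothetical, and JSON is used as a stand-in for a YAML library, since YAML 1.2 is designed as a superset of JSON and the resulting file is still loadable as YAML.

```python
import json
import os
import tempfile


def save_state_cache(path, state):
    """Atomically replace the (hypothetical) state cache at `path`.

    The data goes to a temporary file in the same directory, which is
    then renamed over the old file, so a reader sees either the complete
    old file or the complete new one, never a half-written mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-cache-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            # JSON output doubles as YAML 1.2, so a YAML reader can load it.
            json.dump(state, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace() is an atomic rename on POSIX within one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        os.unlink(tmp_path)
        raise


def load_state_cache(path):
    """Read the state cache back into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

This only makes each individual write atomic. As noted above, it does not by itself make parallel runs safe, since two writers could still overwrite each other's changes.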
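The early-termination behaviour can be illustrated by a harness loop that consumes lintian's status lines as they arrive and marks each package as done the moment it is reported. This is only a sketch: the exact line format matched below is a hypothetical rendering of the "processed X successfully" / "error processing Y" messages, and the dictionary layout of the state (package name mapped to an `out-of-date` flag) is assumed.

```python
import re

# Hypothetical status-line formats modelled on the messages described
# in the post; lintian's real output may differ.
STATUS_RE = re.compile(
    r"^(?:processed (?P<ok>\S+) successfully|error processing (?P<err>\S+))"
)


def apply_status_updates(lines, state):
    """Update `state` from lintian status lines as they are read.

    Packages reported as processed are marked up to date immediately, so
    if lintian is interrupted, everything reported so far stays done and
    only the package in progress (plus any not yet reached) is redone.
    """
    for line in lines:
        m = STATUS_RE.match(line)
        if m is None:
            continue  # ordinary tag output or other noise
        if m.group("ok"):
            state.setdefault(m.group("ok"), {})["out-of-date"] = False
        else:
            # Keep failed packages out of date so a later run retries them.
            entry = state.setdefault(m.group("err"), {})
            entry["out-of-date"] = True
            entry["failed"] = True
    return state
```

In a real harness, `lines` could be the `stdout` of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` process, so each update is applied while lintian is still running.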
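The work-list and full-run changes boil down to a simple rule: a full run only flags entries as out of date, and each harness run picks at most a fixed number of flagged entries. A hypothetical Python sketch, where `limit` stands in for the new command-line option and 1024 is the formerly hard-coded size; the alphabetical ordering is an assumption made to keep the example deterministic.

```python
# The size that used to be hard-coded in harness.
DEFAULT_WORK_LIST_SIZE = 1024


def mark_full_run(state):
    """A full run does no processing itself; it just flags every entry."""
    for entry in state.values():
        entry["out-of-date"] = True


def select_work_list(state, limit=DEFAULT_WORK_LIST_SIZE):
    """Pick at most `limit` out-of-date entries for one harness run.

    The size depends only on `limit`, not on how many changes the archive
    saw, so a full run becomes a backlog worked off over several runs.
    """
    pending = sorted(name for name, entry in state.items() if entry.get("out-of-date"))
    return pending[:limit]
```

Because each run handles a bounded batch and then publishes, the website is refreshed after every run instead of only at the end of a multi-day full run.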