Search Results: "danielsh"

11 January 2017

Reproducible builds folks: Reproducible Builds: week 89 in Stretch cycle

What happened in the Reproducible Builds effort between Sunday January 1 and Saturday January 7 2017: GSoC and Outreachy updates Toolchain development Packages reviewed and fixed, and bugs filed Chris Lamb: Dhole: Reviews of unreproducible packages 13 package reviews have been added, 4 have been updated and 6 have been removed in this week, adding to our knowledge about identified issues. 2 issue types have been added/updated: Upstreaming of reproducibility fixes Merged: Opened: Weekly QA work During our reproducibility testing, the following FTBFS bugs have been detected and reported by: diffoscope development diffoscope 67 was uploaded to unstable by Chris Lamb. It included contributions from :
[ Chris Lamb ]
* Optimisations:
  - Avoid multiple iterations over archive by unpacking once for an ~8X
    runtime optimisation.
  - Avoid unnecessary splitting and interpolating for a ~20X optimisation
    when writing --text output.
  - Avoid expensive diff regex parsing until we need it, speeding up diff
    parsing by 2X.
  - Alias expensive Config() in diff parsing lookup for a 10% optimisation.
* Progress bar:
  - Show filenames, ELF sections, etc. in progress bar.
  - Emit JSON on the the status file descriptor output instead of a custom
* Logging:
  - Use more-Pythonic logging functions and output based on __name__, etc.
  - Use Debian-style "I:", "D:" log level format modifier.
  - Only print milliseconds in output, not microseconds.
  - Print version in debug output so that saved debug outputs can standalone
    as bug reports.
* Profiling:
  - Also report the total number of method calls, not just the total time.
  - Report on the total wall clock taken to execute diffoscope, including
* Tidying:
  - Rename "NonExisting" -> "Missing".
  - Entirely rework diffoscope.comparators module, splitting as many separate
    concerns into a different utility package, tidying imports, etc.
  - Split diffoscope.difference into diffoscope.diff, etc.
  - Update file references in debian/copyright post module reorganisation.
  - Many other cleanups, etc.
* Misc:
  - Clarify comment regarding why we call python3(1) directly. Thanks to J r my
    Bobbio <>.
  - Raise a clearer error if trying to use --html-dir on a file.
  - Fix --output-empty when files are identical and no outputs specified.
[ Reiner Herrmann ]
* Extend .apk recognition regex to also match zip archives (Closes: #849638)
[ Mattia Rizzolo ]
* Follow the rename of the Debian package "python-jsbeautifier" to
[ siamezzze ]
* Fixed no newline being classified as order-like difference.
reprotest development reprotest 0.5 was uploaded to unstable by Chris Lamb. It included contributions from:
[ Ximin Luo ]
* Stop advertising variations that we're not actually varying.
  That is: domain_host, shell, user_group.
* Fix auto-presets in the case of a file in the current directory.
* Allow disabling build-path variations. (Closes: #833284)
* Add a faketime variation, with NO_FAKE_STAT=1 to avoid messing with
  various buildsystems. This is on by default; if it causes your builds
  to mess up please do file a bug report.
* Add a --store-dir option to save artifacts.
Other contributions (not yet uploaded): website development Misc. This week's edition was written by Chris Lamb, Holger Levsen and Vagrant Cascadian, reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

3 January 2017

Maria Glukhova: Getting to know diffoscope better

I apologize to all potential readers of this blog for not writing a comprehensive Introduction post with details of the project I am taking part in during my internship, as well as some story about how I ended up there. Let me just say that I was a Debian user for years when I discovered it is taking part in Outreachy as one of organisations. Their Reproducible Builds effort has a noble goal and a bunch of great people behind it - I had no chances not to get excited by it. Looking for a place where my skills could be of any use, I discovered diffoscope - the tool for in-depth comparassion of files, archives etc. My mentor, Mattia Rizzolo, supported my decision to work on it, so now I am concentrating my efforts on improving diffoscope. As my first steps, I am doing small (but hopefully still somewhat important) job of fixing existing bugs. It helps me to better understand how diffoscope works, as well as introduces me to the workflow of opensource development. During December, I have done several small contributions, mostly fixing bugs.

Test data and jessie-backports First of them could be somewhat called cleaning up after my own mistake, although that mistake wasn t trivial. During the application period, I have fixed a bug with diffoscope failing while comparing symlinks to directory. That was a small change, but I included some tests for that case anyway. And that actually caused problems. With these tests, I included test data: two folders with symlinks. All was good in unstable version of Debian, but in jessie-backports, that commit caused build to fail. After some digging, I discovered the problem was caused by build process including copying that data. That was done using shutils Python module, and older version of that module, included in jessie, could not handle copying symlinks to directory properly. Thanks to my mentor for giving me a hint on how to resolve this: using temporary folders and creating these symlinks at runtime. That way, we ensured tests run without problems during build process on jessie. What have I learned: A great deal, actually. I spent too much time on that one, but I learned how to build packages, what happens during dpkg-buildpackage run and what debhelper tools are for. I also learned a bit about what chroot is and how to use it for testing.

ICC profile files and file type recognizing regexp Another one was also about failing tests and, therefore, failing build. Failing tests were all due to ICC files were not recognized by diffoscope. Turned out libmagic got an update which changed the description of ICC profile files. Diffoscope was relying on regexp applied to file type description to recognize the file, so I changed regexp to reflect the changes in libmagic. What have I learned: How diffoscope recognizes file types. Got me thinking: maybe there is a better way? That regexp-based approach is doomed to cause problems with every file type description change. I have this question still lingering in my mind - maybe I will come up with an idea later.

Order-like difference in text files Next, I decided to do something a bit bigger and fullfilled a feature request. That request was for detecting order-like difference in text files (when files has the same lines, but in different order). I did it by collecting added and removed lines in diff output in lists, sorting and then comparing them. Sadly, I forgot about one particular case - when one of the files is missing the newline at the end of file. I was kindly reminded of that quite soon in comments on the bug-tracker (thanks danielsh!) and have already fixed that. I also recieved feedback on how better implement it deeper in the diffoscope - not using the results of diff, but rather comparing sum of hashes of the lines directly in the difference module. I am yet to try that. What have I learned: That a call to diff is actually the slowest part of the diffoscope run when done on two big text files. Could it help somehow in speeding it up? I don t know yet. I also learned to comment on bugs in Debian bugtracker and was surprised by how much feedback I got. Thanks to my mentor for pushing me to do that - I definetely need to overcome my fear of communications to be more effective!

Random FTBFS There was also a very nasty bug that caused diffoscope to fail to be built from source randomly, failing with non-informative Fatal Python error: deallocated None. It already seemed strange when it was first reported; It got only more strange when suddenly that bug ceased to be reproducible. We hoped that would mean that bug was caused by some external tool, and was fixed there. Turns out it was not that easy. I tested this on two separate computers and on virtual machine; I used different versions of diffoscope. Well. Seems like that bug is still somehow tied to diffoscope version and not some external tool version - I still can do git checkout 64 and be able to reproduce the bug (still randomly, though). Although I spent quite a lot of time on that one, the only result was the information about connection between bug apperances and diffoscope version. I still wasn t able to get to the root of the problem - hopefully, someone else will be able to, given the information I found. What have I learned: git-bisect! Thanks to my friend for pointing me to it, that tool came handy in that situation. Also, got some experience in catching nasty bugs like that (pity that no experience in squashing them). I had some extra time commitements in December, one of them (Reproducible Builds Summit II) connected to my internship and one (my exam session in university) not. In January, I should be able to allocate more time to that work - I hope it will help me achieve more significant results. Many thanks to Mattia Rizzolo, Chris Lamb, Holger Levsen and all other folks of Reproducible Builds project - I cannot stress enough how important your support is to me. Wish you all a great 2017!