Search Results: "tytso"

Chris Lamb: Free software activities in August 2017

Here is my monthly update covering what I have been doing in the free software world in August 2017 (previous month):

Created ZeroCoolOS, a live operating system that plays the film Hackers (1995) on a continuous loop.
Sent a patch for pristine-tar to allow storage of detached upstream signatures. (#871809)
Worked more on Lintian, a static analysis tool for Debian packages, reporting on various errors, omissions and quality-assurance issues to the maintainer (previous changes):
- Fix an apache2-unparsable-dependency false positive by allowing periods in dependency names. (#873701)
- Ignore "repacked" packages when checking for upstream source tarball signatures as they will never match.
- Downgrade the severity of orig-tarball-missing-upstream-signature. (#870722)
- From a suggestion by Theodore Ts'o, expand the explanation of orig-tarball-missing-upstream-signature to include the location of where dpkg-source looks.
- Address a number of issues in the copyright-year-in-future tag including preventing false positives in port numbers, email addresses, ISO standard numbers and street addresses (#869788), as well as "meta" or testing statements (#873323). In addition, report all violating years in a line and expand the testsuite.
- Don't match quoted "FIXME" variants of file-contains-fixme-placeholder (#870199), avoid checking copyright_hints files (#872843) and downgrade the tag's severity.
- Apply a patch from Alex Muntada to recommend "substr" over of "substring" in mentions-deprecated-usr-lib-perl5-directory. (#871767)
- Prevent missing-build-dependency-for-dh_-command false positives exposed by following the advice in useless-autoreconf-build-depends. (#869541)
- Ensure readme-debian-contains-debmake-template also checks for files containing "Automatically generated by debmake".
- Check python3-foo packages have a Section: python, not just python2-foo. (#870272)
- Check for packages shipping compiled Java class files. (#873211)
- Additionally consider .cljc files to avoid codeless-jar warnings. (#870649)
- Prevent desktop-entry-lacks-keywords-entry false positives for Link and Directory-style .desktop files. (#873702)
- Split out Python checks from checks/scripts.pm check to a new, source check of type source.
- Check for python-foo without a corresponding python3-foo package. (#870681)
- Complain about packages that Build-Depend on python-sphinx only. (#870730)
- Warn about packages that alternatively Build-Depend on the Python 2 and Python 3 versions of Sphinx. (#870758)
- Check for packages that depend on Python 2.x. (#870822)
- Correct false positives in unconditional-use-of-dpkg-statoverride by detecting "if !" as a shell prefix. (#869587)
- Alert on for missing calls to dpkg-maintscript-helper(1) in maintainer scripts. (#872042)
- Check for packages using sensible-utils without declaring a dependency after splitting from debianutils. (#872611)
- Warn about scripts using nodejs as an interpreter now that the nodejs script provides /usr/bin/node. (#873096)
- Remove recommendations to add a Testsuite: autopkgtest field to debian/control and emit a new tag the package if it does so. (#865531)
- Recognise autopkgtest-pkg-elpa as a valid test suite. (#873458)
- Add note to /etc/bash_completion.d's obsolete path warning output regarding stricter filename requirements. (#814599)
- Add 4.0.1 and 4.1.0 as known Policy standards versions.
- Apply a patch from Maia Everett to avoid British spellings under the en_US locale. (#868897)
- Stop emitting maintainer,uploader -address-causes-mail-loops for @packages.debian.org addresses. (#871575)
- Modify Lintian::Data's all subroutine to always return keys in insertion order.
- Apply a patch from Steve Langasek to accomodate binutils outputting symbols in a different format on the ppc64el architecture. (#869750)
- Add an explicit test for packages including external fonts via the Google Font and TypeKit APIs. (#873434)
- Add missing entries in internal Test-For fields to make development/testing workflow less error-prone.
Sent three pull requests to git-buildpackage, a tool to assist in Debian packaging from Git repositories:
- Make pq --abbrev= configurable. (#872351)
- Use build profiles to avoid installation of test dependencies. (#31)
- Correct "allow to" grammar. (#30)
Updated travis.debian.net (my hosted service for projects that host their Debian packaging on GitHub to use the Travis CI continuous integration platform for testing):
- Move away from deb.debian.org; Travis appears to be using a HTTP proxy that strips SRV records. (commit)
- Highlight double quotes are required for TRAVIS_DEBIAN_EXTRA_REPOSITORY. (commit)
- Use force-unsafe-io. (commit)
- Clarify docs when upstream already has a travis.yml file. (#46)
- Make documentation easier to copy-paste. (commit)
Merged a pull request in django-slack, my library to easily post messages to the Slack group-messaging utility, where instantiation of a SlackException was failing. (#71)
Assigned two pull requests to the Redis key-value database store to correct "did not received" and "faield" typos. (#4216 & #4215).

Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to allow verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area. This month I:

Presented a status update at Debconf17 in Montr al, Canada alongside Holger Levsen, Maria Glukhova, Steven Chamberlain, Vagrant Cascadian, Valerie Young and Ximin Luo.
I worked on the following issues upstream:
- glib2.0: Please make the output of gio-querymodules reproducible. (...)
- gcab: Please make the output reproducible. (...)
- gtk+2.0: Please make the immodules.cache files reproducible. (...)
- desktop-file-utils: Please make the output reproducible. (...)
Within Debian:
- My work as Debian Project Leader (DPL) is covered in my monthly Bits from the DPL email to debian-devel-announce.
- Created isdebianreproducibleyet.com.
- Sent a patch to dpkg to sort the "unused substitution" warnings. (#870221)
- Added a script to devscripts to report on reproducibility status of installed packages. (#872514)
- Modified the Debian archive tools (dak) to automatically reject packages which do not bump their date in debian/changelog. (debian-devel post)
- Fixed an QA issue in snappy that was caught by the reproducible builds continuous integration framework. (#872226)
- I submitted three patches to fix specific reproducibility issues in grap, isa-support & python-numpy.
- Finally, I also performed two non-maintainer uploads (NMUs) for jsmath-fonts (#792319) and xvier (#777330) to make their builds reproducible.
Categorised a large number of packages and issues in the Reproducible Builds "notes" repository.
Worked on publishing our weekly reports. (#118, #119, #120, #121 & #122)

I also made the following changes to our tooling:

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

Use name attribute over path to avoid leaking comparison full path in output. (commit)
Add missing skip_unless_module_exists import. (commit)
Tidy diffoscope.progress and the XML comparator (commit, commit)

disorderfs

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues.

Add a simple autopkgtest smoke test. (commit)

Debian

Patches contributed

openssh: Quote the IP address in ssh-keygen -f suggestions. (#872643)
libgfshare:
- SIGSEGV if /dev/urandom is not accessible. (#873047)
- Add bindnow hardening. (#872740)
- Support nodoc build profile. (#872739)
devscripts:
- Enable hardening buildflags for /usr/bin/debpkg. (#873379)
- Add missing scripts/debc to .gitignore. (#873381)
memcached: Add hardening to systemd .service file. (#871610)
googler: Tidy long and short package descriptions. (#872461)
gnome-split: Homepage points to domain-parked website. (#873037)

Uploads

python-django 1:1.11.4-1 New upstream release.
redis:
- 4:4.0.1-3 Drop yet more non-deterministic tests.
- 4:4.0.1-4 Tighten systemd/seccomp hardening.
- 4:4.0.1-5 Drop even more tests with timing issues.
- 4:4.0.1-6 Don't install completions to /usr/share/bash-completion/completions/debian/bash_completion/.
- 4:4.0.1-7 Don't let sentinel integration tests fail the build as they use too many timers to be meaningful. (#872075)
python-gflags 1.5.1-3 If SOURCE_DATE_EPOCH is set, either use that as a source of current dates or the UTC-version of the file's modification time (#836004), don't call update-alternatives --remove in postrm. update debian/watch/Homepage & refresh/tidy the packaging.
bfs 1.1.1-1 New upstream release, tidy autopkgtest & patches, organising the latter with Pq-Topic.
python-daiquiri 1.2.2-1 New upstream release, tidy autopkgtests & update travis.yml from travis.debian.net.
aptfs 2:0.10-2 Add upstream signing key, refer to /usr/share/common-licenses/GPL-3 in debian/copyright & tidy autopkgtests.
adminer 4.3.1-2 Add a simple autopkgtest & don't install the Selenium-based tests in the binary package.
zoneminder (1.30.4+dfsg-2) Prevent build failures with GCC 7 (#853717) & correct example /etc/fstab entries in README.Debian (#858673).

Finally, I reviewed and sponsored uploads of astral, inflection, more-itertools, trollius-redis & wolfssl.

Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:

"Frontdesk" duties, triaging CVEs, etc.
Issued DLA 1049-1 for libsndfile preventing a remote denial of service attack.
Issued DLA 1052-1 against subversion to correct an arbitrary code execution vulnerability.
Issued DLA 1054-1 for the libgxps XML Paper Specification library to prevent a remote denial of service attack.
Issued DLA 1056-1 for cvs to prevent a command injection vulnerability.
Issued DLA 1059-1 for the strongswan VPN software to close a denial of service attack.

Debian bugs filed

wget: Please hash the hostname in ~/.wget-hsts files. (#870813)
debian-policy: Clarify whether mailing lists in Maintainers/Uploaders may be moderated. (#871534)
git-buildpackage: "pq export" discards text within square brackets. (#872354)
qa.debian.org: Escape HTML in debcheck before outputting. (#872646)
pristine-tar: Enable multithreaded compression in pristine-xz. (#873229)
tryton-meta: Please combine tryton-modules-* into a single source package with multiple binaries. (#873042)
azure-cli:
- New upstream release. (#871418)
- debian/watch file broken. (#871417)
fwupd-tests: Don't ship test files to generic /usr/share/installed-tests dir. (#872458)
libvorbis: Maintainer fields points to a moderated mailing list. (#871258)
rmlint-gui: Ship a rmlint-gui binary. (#872162)
template-glib: debian/copyright references online source without quotation. (#873619)

FTP Team

As a Debian FTP assistant I ACCEPTed 147 packages: abiword, adacgi, adasockets, ahven, animal-sniffer, astral, astroidmail, at-at-clojure, audacious, backdoor-factory, bdfproxy, binutils, blag-fortune, bluez-qt, cheshire-clojure, core-match-clojure, core-memoize-clojure, cypari2, data-priority-map-clojure, debian-edu, debian-multimedia, deepin-gettext-tools, dehydrated-hook-ddns-tsig, diceware, dtksettings, emacs-ivy, farbfeld, gcc-7-cross-ports, git-lfs, glewlwyd, gnome-recipes, gnome-shell-extension-tilix-dropdown, gnupg2, golang-github-aliyun-aliyun-oss-go-sdk, golang-github-approvals-go-approval-tests, golang-github-cheekybits-is, golang-github-chzyer-readline, golang-github-denverdino-aliyungo, golang-github-glendc-gopher-json, golang-github-gophercloud-gophercloud, golang-github-hashicorp-go-rootcerts, golang-github-matryer-try, golang-github-opentracing-contrib-go-stdlib, golang-github-opentracing-opentracing-go, golang-github-tdewolff-buffer, golang-github-tdewolff-minify, golang-github-tdewolff-parse, golang-github-tdewolff-strconv, golang-github-tdewolff-test, golang-gopkg-go-playground-validator.v8, gprbuild, gsl, gtts, hunspell-dz, hyperlink, importmagic, inflection, insighttoolkit4, isa-support, jaraco.itertools, java-classpath-clojure, java-jmx-clojure, jellyfish1, lazymap-clojure, libblockdev, libbytesize, libconfig-zomg-perl, libdazzle, libglvnd, libjs-emojify, libjwt, libmysofa, libundead, linux, lua-mode, math-combinatorics-clojure, math-numeric-tower-clojure, mediagoblin, medley-clojure, more-itertools, mozjs52, openssh-ssh1, org-mode, oysttyer, pcscada, pgsphere, poppler, puppetdb, py3status, pycryptodome, pysha3, python-cliapp, python-coloredlogs, python-consul, python-deprecation, python-django-celery-results, python-dropbox, python-fswrap, python-hbmqtt, python-intbitset, python-meshio, python-parameterized, python-pgpy, python-py-zipkin, python-pymeasure, python-thriftpy, python-tinyrpc, python-udatetime, python-wither, python-xapp, pythonqt, r-cran-bit, r-cran-bit64, r-cran-blob, r-cran-lmertest, r-cran-quantmod, r-cran-ttr, racket-mode, restorecond, rss-bridge, ruby-declarative, ruby-declarative-option, ruby-errbase, ruby-google-api-client, ruby-rash-alt, ruby-representable, ruby-test-xml, ruby-uber, sambamba, semodule-utils, shimdandy, sjacket-clojure, soapysdr, stencil-clojure, swath, template-glib, tools-analyzer-jvm-clojure, tools-namespace-clojure, uim, util-linux, vim-airline, vim-airline-themes, volume-key, wget2, xchat, xfce4-eyes-plugin & xorg-gtest. I additionally filed 6 RC bugs against packages that had incomplete debian/copyright files against: gnome-recipes, golang-1.9, libdazzle, poppler, python-py-zipkin & template-glib.

Theodore Ts'o: Is Nokia Doomed?

There s been a lot of discussion regarding whether or not Nokia is Doomed or not. The people who say Nokia are doomed basically point out that Nokia doesn t have any attractive products at the high end, and at the low end the margins are extremely thin. The high end products suffer from the Symbian being essentially dead (even Nokia is recommending that developers not develop native applications for Symbian, but to use Qt instead), and Nokia doesn t have much of a development community following it, and it certainly does have much in the way of 3rd party applications, either targetting Symbian or Qt at the moment. So what do I think of the whole debate between Tomi and Scoble? First of all, I think there is a huge difference in American and European assumptions and perspectives, and a big question is whether the rest of the world will end up looking more like Europe or America vis-a-vis two key areas: cost of data plans, and whether phones become much more application centric. Tomi took Apple to task in the comments section of his 2nd article for not having an SD card slot (how else would people share photos with their friends?) and for not supporting MMS in its earlier phones. My first reaction to that was: Um, isn t that what photo-sharing sites are for? Is it really that hard to attach a photo to an e-mail? And then it hit me. In Europe, data is still like MMS a few years ago a place for rapacious carriers to make way too much money. Many European telco s don t have unlimited data plans, and charge by the megabyte and even if you re lucky enough to live in a country which does have an American-like data plan, the cost of data roaming is still incredibly expensive. In contrast, in the US, I can pay $30/month for an unlimited data plan, and I can travel 2000 miles south or west and it will still be valid. Try doing that in Europe! The US had consumer-friendly data plans much earlier than Europe did, and so perhaps it s not surprising that Nokia has engineered phones that were far more optimized for the limitations caused by the Europe s Wireless carriers. The second area of debate where I think Scoble and Tomi are far apart is whether phones of the future are fundamentally about applications or well, making phone calls. Here I don t have proof that this is a fundamentally European vs. US difference, but I have my suspicions that it might be. Tomi spent a lot of time dwelling on how Nokia was much better at making phone calls (i.e., better microphones, better radios, etc). And my reaction to that was, Who cares? I rarely use my phone for making phone calls these days! And that was certainly one of the reasons why I gave up on Nokia after the E70 its contacts database was garbage! It was OK as a phone directory, but as a place for storing multiple addresses and e-mail addresses, it didn t hold a candle to the Palm PDA. And that s perhaps the key question how much is a smart phone and about being a phone , versus being a PDA (and these days I want a cloud-synchronized PDA, for my calendar, contacts, and todo lists), and how much is it about applications? This is getting long, so I think I ll save my comments about whether I think Meego will be an adequate savior for Nokia for another post. But it s worthwhile to talk here about Tomi s comments about most smartphones being much cheaper than the luxury iPhone, and so it doesn t matter that Nokia s attempt in the higher end smart phones has been a continuous history of fail. First of all, it s worth noting that there are much cheaper Android phones available on the market today, which are price-competitive with Nokia s low-end smartphones (i.e., available for free from T-Mobile in the States with a two year commitment). Secondly, the history in the computer market over the last twenty years is that features inevitably waterfall into the cheaper models, and prices will tend to drop over time as well. Apple started only with the iPod, but over time they added the iPod Nano and the iPod Shuffle. And it would not surprise me if they introduce a lower-end iPhone as well in time as well. It would shock me if they aren t experimenting with such models even as we speak, and have simply chosen not to push one out to the market yet. So even if you buy Tomi s argument that the high-end smartphones don t matter, and you only care about volume, and not about profit margins (talk to the people at Nokia that will need to be laid off to make their expenses match with their lowered revenue run rates; I bet they will care), the question is really about whether Nokia has time to execute on the Meego vision before it s too late and the current application-centric smartphone ecosystems (Android and iPhone) start eating into the lower-end smartphone segment. More on that in my next post. No related posts.

Theodore Ts'o: Android will be using ext4 starting with Gingerbread

I received a trackback from Tim Bray s Saving Data Safely post on the Android Developer s blog to my Don t fear the fsync! blog entry, so I guess the cat s out of the bag. Starting with Gingerbread, newer Android phones (starting with the Nexus S) will be using the ext4 file system. Very cool! So just as IBM used to promote Linux by saying that it was scalable enough to run on everything between watches and mainframes, I can now talk about ext4 as running in production on cell phones to Google data centers. How much am I worried about Tim Bray s caution to Android programmers that they use fsync() correctly? Not a lot. Sure, they should make sure they use fsync(), or if they want to be clever, sync_file_range(), to make sure files are appropriately written to disk (or, in Android s case, to flash). But unlike Ubuntu s running on random PC s, with users downloading the latest (possibly buggy) Nvidia drivers, handset manufacturers test their systems very carefully before they let them ship. So if there are any problems, they tend to be found before the phone ships to end-users. So at least in my experience, my Nexus One has been very reliable; it s never crashed on me. So the chances of random crashes when you exit the 3D game (as one Ubuntu user reported and considered acceptable?!? I d be roasting Ubuntu and/or Nvidia under a slow fire if that was a reliably reproducible bug, not considering it par for the course) are very remote. And fsync() is important if systems crash or aren t shut down cleanly. Still, if users are randomly ripping out their batteries to turn off their cell phone in a hurry, because they re too impatient to wait for a controlled shutdown, then sure, we might have problems, and it s a good reason for application writers to use fsync() correctly. By the way, I had nothing to do with the choice to use ext4 on Android. So if you re curious about why ext4 was chosen, I can t say anything authoritatively, since I wasn t consulted before the decision was made (although obviously I m delighted). As far as I can tell after the fact, one of the reasons for choosing ext4 was better performance, especially in the light of dual-core ARM CPU s which are becoming available in large quantities in the near future; YAFFS is single threaded, so it would have been a bottleneck on dual-core systems. Why not btrfs? Well, for all of btrfs s features, it s not out of beta yet, and Gingerbread is shipping today. (Well, in less than a week to Best Buy, if we want to be precise.) This is another reason why I m glad to see ext4 being used on Android is that it validates my decision to keep working on ext4 2-3 years ago, even though newer filesystems like btrfs were promised to be coming down the pike. As I ve said many times before, file systems are like fine wine, and they take many years to mature. So having ext4 ready today is a great way of giving more choices to developers and system administrators about what file system they want to use. No related posts.

Theodore Ts'o: Android will be using ext4 starting with Gingerbread

I received a trackback from Tim Bray s Saving Data Safely post on the Android Developer s blog to my Don t fear the fsync! blog entry, so I guess the cat s out of the bag. Starting with Gingerbread, newer Android phones (starting with the Nexus S) will be using the ext4 file system. Very cool! So just as IBM used to promote Linux by saying that it was scalable enough to run on everything between watches and mainframes, I can now talk about ext4 as running in production on cell phones to Google data centers. How much am I worried about Tim Bray s caution to Android programmers that they use fsync() correctly? Not a lot. Sure, they should make sure they use fsync(), or if they want to be clever, sync_file_range(), to make sure files are appropriately written to disk (or, in Android s case, to flash). But unlike Ubuntu s running on random PC s, with users downloading the latest (possibly buggy) Nvidia drivers, handset manufacturers test their systems very carefully before they let them ship. So if there are any problems, they tend to be found before the phone ships to end-users. So at least in my experience, my Nexus One has been very reliable; it s never crashed on me. So the chances of random crashes when you exit the 3D game (as one Ubuntu user reported and considered acceptable?!? I d be roasting Ubuntu and/or Nvidia under a slow fire if that was a reliably reproducible bug, not considering it par for the course) are very remote. And fsync() is important if systems crash or aren t shut down cleanly. Still, if users are randomly ripping out their batteries to turn off their cell phone in a hurry, because they re too impatient to wait for a controlled shutdown, then sure, we might have problems, and it s a good reason for application writers to use fsync() correctly. By the way, I had nothing to do with the choice to use ext4 on Android. So if you re curious about why ext4 was chosen, I can t say anything authoritatively, since I wasn t consulted before the decision was made (although obviously I m delighted). As far as I can tell after the fact, one of the reasons for choosing ext4 was better performance, especially in the light of dual-core ARM CPU s which are becoming available in large quantities in the near future; YAFFS is single threaded, so it would have been a bottleneck on dual-core systems. Why not btrfs? Well, for all of btrfs s features, it s not out of beta yet, and Gingerbread is shipping today. (Well, in less than a week to Best Buy, if we want to be precise.) This is another reason why I m glad to see ext4 being used on Android is that it validates my decision to keep working on ext4 2-3 years ago, even though newer filesystems like btrfs were promised to be coming down the pike. As I ve said many times before, file systems are like fine wine, and they take many years to mature. So having ext4 ready today is a great way of giving more choices to developers and system administrators about what file system they want to use. No related posts.

Theodore Ts'o: Working on Technology at Startups

Richard Tibbetts has called me out for conflating Web 2.0 startups with all startups in my recent blog posting, Google has a problem retaining great engineers? Bullcrap . His complaint was that I was over generalizing from Web 2.0 startups to all startups. He s right, of course. The traditional technology startup by definition does have a large amount technology work that needs to be done, in addition to the business development work. However, things have changed a lot even for technology startups. Consider a company like Sequent Computer Systems, which started in 1983. At the time the founders had a key idea, which was to use multiple commodity intel CPU s to create first SMP, and then later, NUMA minicomputers. But in order to do that, they had to design, build and manufacture a huge mount of hardware, as well as develop a whole new Unix-derived operating system, just to bring that core idea to market. These days, the founder or founders will have a core idea, which they will hopefully patent, to prevent competitors from replicating their work, just as before. However, these days there is a huge selection of open source software so there is much less technology that needs to be re-developed / re-invented in order to bring that idea to market. From Linux and BSD at the operating system level, to databases like MySQL, Apache web servers, etc., there is an awful lot for the startup to chose from. This is all goodness, of course. But it means that most of the technology developed in a typical startup will tend to be focused on supporting the core idea that was developed by the founder. If a company is focused on databases, they probably won t be interested in supporting me to do my file system work. Why should they? There s lots of open source file systems they can use; one of them will probably meet their needs. So while it s obvious that you can do technology at large variety of companies, of different sizes, I don t think it s arrogance to say that there are certain types of technology that can only be done at Google, or maybe at a very small subset of other companies. I m pretty confident, for example, that Google has the world s largest number of computers in its various data centers around the world. That means there are certain things that don t make sense at other companies, but which absolutely makes sense at our scale. In addition, Google s business model means allows us to pursue open source projects such as Chrome and Android that wouldn t make sense at other companies. With Android in particular, it means that multiple firmware updates for any given handset model, if it causes people to use the web more and drive more advertising revenue, makes sense and so it s something we can pursue in comparison to Nokia, which gets its revenue from hardware sales, so a firmware update that extends the life of a handset is actually a bad thing for them; better to force you to upgrade an handset every year or two. So I think Richard misunderstood me if he thought I was trying to make the argument that Google is the only place where you can do interesting technical work. That s obviously not true, of course. But I do think there are many examples of technical work which don t make business sense to do at smaller companies, and startups in particular have to be very much focused on bringing the founder s idea to market, and all else has to be subordinated to that goal. And one of the most interesting developments is how the combination of commoditized and standardized hardware, and comoditized software in the form of open source, has changed the game for startups. For most startups, though, open source software is something that they will use, but not necessarily develop except in fairly small ways. Many of the economic arguments in favor of releasing code as open source, and dedicating a significant fraction of an engineer s time to serve as a OSS project maintainer or kernel subsystem maintainer, are ones that make much more sense at a very large company like Google or IBM. That s not because startups are evil, or deficient in any way; just the economic realities that at a successful startup, everything has to be subordinated to the central goal of proving that they have a sustainable, scalable business model and that they have a good product/market fit. Everything else, and that includes participating in an open source community, is very likely a distraction from that central goal. No related posts.

Theodore Ts'o: Working on Technology at Startups

Richard Tibbetts has called me out for conflating Web 2.0 startups with all startups in my recent blog posting, Google has a problem retaining great engineers? Bullcrap . His complaint was that I was over generalizing from Web 2.0 startups to all startups. He s right, of course. The traditional technology startup by definition does have a large amount technology work that needs to be done, in addition to the business development work. However, things have changed a lot even for technology startups. Consider a company like Sequent Computer Systems, which started in 1983. At the time the founders had a key idea, which was to use multiple commodity intel CPU s to create first SMP, and then later, NUMA minicomputers. But in order to do that, they had to design, build and manufacture a huge mount of hardware, as well as develop a whole new Unix-derived operating system, just to bring that core idea to market. These days, the founder or founders will have a core idea, which they will hopefully patent, to prevent competitors from replicating their work, just as before. However, these days there is a huge selection of open source software so there is much less technology that needs to be re-developed / re-invented in order to bring that idea to market. From Linux and BSD at the operating system level, to databases like MySQL, Apache web servers, etc., there is an awful lot for the startup to chose from. This is all goodness, of course. But it means that most of the technology developed in a typical startup will tend to be focused on supporting the core idea that was developed by the founder. If a company is focused on databases, they probably won t be interested in supporting me to do my file system work. Why should they? There s lots of open source file systems they can use; one of them will probably meet their needs. So while it s obvious that you can do technology at large variety of companies, of different sizes, I don t think it s arrogance to say that there are certain types of technology that can only be done at Google, or maybe at a very small subset of other companies. I m pretty confident, for example, that Google has the world s largest number of computers in its various data centers around the world. That means there are certain things that don t make sense at other companies, but which absolutely makes sense at our scale. In addition, Google s business model means allows us to pursue open source projects such as Chrome and Android that wouldn t make sense at other companies. With Android in particular, it means that multiple firmware updates for any given handset model, if it causes people to use the web more and drive more advertising revenue, makes sense and so it s something we can pursue in comparison to Nokia, which gets its revenue from hardware sales, so a firmware update that extends the life of a handset is actually a bad thing for them; better to force you to upgrade an handset every year or two. So I think Richard misunderstood me if he thought I was trying to make the argument that Google is the only place where you can do interesting technical work. That s obviously not true, of course. But I do think there are many examples of technical work which don t make business sense to do at smaller companies, and startups in particular have to be very much focused on bringing the founder s idea to market, and all else has to be subordinated to that goal. And one of the most interesting developments is how the combination of commoditized and standardized hardware, and comoditized software in the form of open source, has changed the game for startups. For most startups, though, open source software is something that they will use, but not necessarily develop except in fairly small ways. Many of the economic arguments in favor of releasing code as open source, and dedicating a significant fraction of an engineer s time to serve as a OSS project maintainer or kernel subsystem maintainer, are ones that make much more sense at a very large company like Google or IBM. That s not because startups are evil, or deficient in any way; just the economic realities that at a successful startup, everything has to be subordinated to the central goal of proving that they have a sustainable, scalable business model and that they have a good product/market fit. Everything else, and that includes participating in an open source community, is very likely a distraction from that central goal. No related posts.

Theodore Ts'o: Close the Washington Monument

Bruce Schneier has written an absolutely powerful essay in his blog, with the modest proposal that in response to the security worries at the Washington Monument, we should close it. If you haven t read it yet, run, don t walk, to his blog and read it. Then if you live in the States, write to your congresscritters, and ask them to reinsert the backbone which they have placed in a blind trust when they got elected, and tell the TSA that they have a new mandate; to provide as much security as possible without compromising our freedom, privacy, and American Ideals. Right now, they have an impossible job, because they have been asked to provide an absolute degree of security. And in trying to provide the impossible, the terrorists have already won No related posts.

Theodore Ts'o: Google has a problem retaining great engineers? Bullcrap.

Once again, there s been another story about how Google is having trouble retaining talent. Despite all Eric Schmidt s attempts to tell folks that Google s regretted attrition rate has not changed in seven years, this story just doesn t want to seem to die. (And those stories about Google paying $3.5 million and $7 million to keep an engineer from defecting to Facebook? As far as I know, total bull. I bet it s something made up by some Facebook recruiter who needed to explain how she let a live prospect get away. :-) At least for me, the complete opposite is true. There are very few companies where I can do the work that I want to do, and Google is one of them. A startup is totally the wrong place for me. Why? Because if you talk to any venture capitalist, a startup has one and only one reason to exist: to prove that it has a scalable, viable business model. Take diapers.com for example. As Business Week described, while they were proving that they had a business model that worked, they purchased their diapers at the local BJ s and shipped them via Fedex. Another startup, Chegg, proved its business model by using Amazon.com to drop ship text books to their first customers. (The venture capitalist Mark Maples talked about this in a brilliant talk at the Founders Showcase; the Chegg example starts around 20:50 minutes in, but I d recommend listening to the whole thing, since it s such a great talk.) You don t negotiate volume discounts with textbook publishers, or build huge warehouses to hold all of the diapers that you re going to buy until you prove that you have a business model that works. Similarly, you don t work on great technology at a startup. Startups, by and large, aren t about technology at least, not the Web 2.0 startups like Facebook, Foursquare, Twitter, Groupon, etc. They are about business model discovery. So if you are fundamentally a technologist at heart, whose heart sings when you re making a better file system, or fixing a kernel bug, you re not going to be happy at a startup. At least, not if the startup is run competently. If you have the heart of an entrepreneur, and you are willing to roll the dice (since 9 out of 10 startups go belly up; those are the ones that failed to find a viable business model), then sure, go for a startup. And understand that your job will be to make something that works well at a small scale, quick, dirty, and cheap. If that means using some proprietary software, then that s what you should do. Hopefully you ll get lucky and win the IPO lottery. But if your primary interest is to doing great engineering work, then you want go to company that has a proven business model. Google has enough scale that I can sleep well knowing that improvements I am making to Linux and the ext4 file system are saving enough money for the company that Google is making a profit on me. That is, the value Google is getting out of my work is worth many multiples of my total compensation. And that s a position that every engineer should strive for, since that s how you can be confident that you will remain gainfully employed. The fact that I can do what I love, and go to conferences, and all of my work is open source that s just total icing on the cake. :-) No related posts.

Francois Marier: RAID1 alternative for SSD drives

I recently added a solid-state drive to my desktop computer to take advantage of the performance boost rumored to come with these drives. For reliability reasons, I've always tried to use software RAID1 to avoid having to reinstall my machine from backups should a hard drive fail. While this strategy is fairly cheap with regular hard drives, it's not really workable with SSD drives which are still an order of magnitude more expensive.

The strategy I settled on is this one:

continue to have all partitions (/, /home and /data) on my RAID1 hard drives,
put another copy of the root partition (/) on the SSD drive, and
leave my /tmp and swap partitions in RAID0 arrays on my rotational hard drives to reduce the number of writes on the SSD.

This setup has the benefit of using a very small SSD to speed up the main partition while keeping all important data on the larger mirrored drives.

Resetting the SSDThe first thing I did, given that I purchased a second-hand drive, was to completely erase the drive and mark all sectors as empty using an ATA secure erase. Because SSDs have a tendency to get slower as data is added to them, it is necessary to clear the drive in a way that will let the controller know that every byte is now free to be used again.

There is a lot of advice on the web on how to do this and many tutorials refer to an old piece of software called Secure Erase. There is a much better solution on Linux: issuing the commands directly using hdparm.

Partitioning the SSDOnce the drive is empty, it's time to create partitions on it. I'm not sure how important it is to align the partitions to the SSD erase block size on newer drives, but I decided to follow Ted Ts'o's instructions anyways.

Another thing I did is leave 20% of the drive unpartitioned. I've often read that SSDs are faster the more free space they have so I figured that limiting myself to 80% of the drive should help the drive maintain its peak performance over time. In fact, I've heard that extra unused unpartitionable space is one of the main differences between the value and extreme series of Intel SSDs. I'd love to see an official confirmation of this from Intel of course!

Keeping the RAID1 array in sync with the SSDOnce I added the solid-state drive to my computer and copied my root partition on it, I adjusted my fstab and grub settings to boot from that drive. I also setup the following cron job (running twice daily) to keep a copy of my root partition on the old RAID1 drives (mounted on /mnt):

nice ionice -c3 rsync -aHx --delete --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/home/* --exclude=/mnt/* --exclude=/lost+found/* --exclude=/data/* /* /mnt/

Tuning the SSDFinally, after reading this excellent LWN article, I decided to tune the SSD drive (/dev/sda) by adjusting three things:

Add the discard mount option (also know as ATA TRIM and introduced in the 2.6.33 Linux kernel) to the root partition in /etc/fstab:

/dev/sda1  /  ext4  discard,errors=remount-ro,noatime  0  1

Use the noop IO scheduler by adding these lines to /etc/rc.local:

echo noop > /sys/block/sda/queue/scheduler
echo 1 > /sys/block/sda/queue/iosched/fifo_batch

Turn off entropy gathering (for kernels 2.6.36 or later) by adding this line to /etc/rc.local:

echo 0 > /sys/block/sda/queue/add_random

Is there anything else I should be doing to make sure I get the most out of my SSD?

Theodore Ts'o: I have the money shot for my LCA presentation

Thanks to Eric Whitney s benchmarking results, I have my money shot for my upcoming 2011 LCA talk in Brisbane, which will be about how to improve scalability in the Linux kernel, using the case study of the work that I did to improve scalability via a series of scalability patches that were developed during 2.6.34, 2.6.35, and 2.6.36 (and went into the the kernel during subsequent merge window). These benchmarks were done on a 48-core AMD system (8 sockets, 6 cores/socket) using a 24 SAS-disk hardware RAID array. Which is the sort of system which XFS has traditionally shined on, and for which ext3 has traditionally not scaled very well at. We re now within striking distance of XFS, and there s more improvements to ext4 which I have planned that should help its performance even further. This is the kind of performance improvement that I m totally psyched to see! No related posts.

Theodore Ts'o: I have the money shot for my LCA presentation

Thanks to Eric Whitney s benchmarking results, I have my money shot for my upcoming 2011 LCA talk in Brisbane, which will be about how to improve scalability in the Linux kernel, using the case study of the work that I did to improve scalability via a series of scalability patches that were developed during 2.6.34, 2.6.35, and 2.6.36 (and went into the kernel during subsequent merge window). These benchmarks were done on a 48-core AMD system (8 sockets, 6 cores/socket) using a 24 SAS-disk hardware RAID array. Which is the sort of system which XFS has traditionally shined on, and for which ext3 has traditionally not scaled very well at. We re now within striking distance of XFS, and there s more improvements to ext4 which I have planned that should help its performance even further. This is the kind of performance improvement that I m totally psyched to see! No related posts.

Theodore Ts'o: Moderate Muslims need a better PR Agency

There has been much made of recent reports that roughly half of Americans have an unfavorable view of Islam. And as usual, there are those who will try to claim that Muslims really aren t all that bad, and that Sharia is just set of nice, abstract principles which are all about the protection of life, family, education, religion, property, and human dignity. And on the other side, we have people pointing out that using Sharias as justification, there are countries which are stoning women and chopping off poor people s hands and then forbidding Muslims from arguing about whether such things are just. Worst yet, some of the more public moderate Muslims , such as Iman Rauf, refuse to criticize organizations such as Hamas, on the ground that he is a bridge builder , and it wouldn t help to criticize Hamas s terrorist activities as being anti-Isalmic. OK, so let s grant for the sake of argument the claims that Shariah really is far more than just criminal sanctions, but mainly about exhorting people to live a moral life, and that of the 1,081 pages of the two-volume Arabic text which Sherman A. Jackson (the Arthur F. Thurnau Prfessor of Arabic and Isalamic Studies at the University of Michigan) used to study Shariah, only 60 pages were devoted to criminal sanctions (i.e., the stoning and the cutting off of hands), and only 19 were devoted to Jihad. Let s even further grant the claims made by Humaira Awais Shahid that most Muslims reject Political Islam and are not even arabs. OK. But that case, wouldn t there be more Imans publically disavowing the people who advocate terrorism and suicide bombs as not being Islamic? Not all of them are bridge-builders, are they? And if so, some of them might be better deployed towards saying what Islam is not, and saying that perhaps people who espouse those beliefs are being profoundly unIslamic and saying this loudly outside of their Mosques. Oh, I understand that many Muslims feel that they shouldn t be asked to repudiate the sins of a few crazy terrorists ; just as all Christians shouldn t be held accountable for disturbed crazy rants of a small-time pastor from Gainsville, Florida. But at the same time, it seems to me that Islamic leaders should be eager to say, loudly, that what is being done in the name of their religion in Iran and Nigeria is wrong, and to denounce it. Maybe, some would say, that they are doing that and the media isn t paying attention to them. Well, the Media is surely paying attention to people like Iman Rauf, and he refuses to denounce Hamas! I would gently suggest to those Islamic leaders who feel that they and their faith haven t been given a fair shake, to hire a better PR agency; and make sure that active denunciations of that which they claim does not represent true Islam is shouted from the rooftops; published in press releases; made in press conferences. And actively denounce your fellow Muslims that you feel are shaming your religion, instead of complaining about American Islamophobia. Trying to pretend that there is absolutely no truth in why Americans might be afraid of terrorists who have been hiding under the mantle of your religion is not going to help your cause. There may be truth in the fact that many Americans don t know as much as they should about Islam and Shariah. And there may be truth that American unswerving support of Israel, despite the fact that they acted in profound and unjust ways against the Palestinians, is not only morally and ethically wrong, but has hurt American interests. I certainly believe that to be to true; I am no friend to fundamentalists of any stripe, whether they are Christian, Jewish, or Islamic; and I think the Jewish fundamentalists have almost completely taken over the Israeli political discourse. But all of that is irrelevant if the goal is reduce people e negative views towards Islam. And in any case, if you really believe that attacking innocents, and coercing religion by threatening Muslims who have fallen away from their faith with the death penalty is wrong, then it s wrong regardless of whether those innocents happened to vote for politicians who have been influenced by way too much money from AIPAC. So instead of trying to lecture Americans about their uncritical support of Israel, why not just stand witness to the fact that killing innocents is wrong, and that people who do that are not Islamic, no matter what they claim or how impressive their turban might happen to be? And maybe it might be a good idea to speak out against those who would lend support, whether moral, or financial or logistical, to people who do these unIslamic things in the name of Islam? And it may not be enough to say it once; it needs to be said again and again. Which is why it s important to hire a good PR agency. No related posts.

Theodore Ts'o: sshkeygen.com: A web-based ssh key generator

This is so very, very, wrong enough so that my first thought was, this web site brought to you by China and the letters M , S , and S . I m curious how many people were stupid enough to use this to generate keys that they actually use in production, but I m afraid the answer would seriously depress me. No related posts.

Theodore Ts'o: The history of General Tso s Chicken

I just came across this story (http://goo.gl/EbqP) today, and given my name, and given that I fancy myself a bit of a foodie, who could resist? (Not that I considered the deep-fried, dunked-in-sugar-syrup mess that passes for General Tso s chicken in most fast food Chinese restaurants to be gourmet food, mind you!)

Here s the first thing you should know: The general had nothing to do with his chicken. You can banish any stories of him stir-frying over the flames of the cities he burned, or heartbreaking tales of a last supper, prepared with blind courage, under attack from overwhelming hordes. Unlike the amoeba-like mythologies that follow so many traditional dishes, the story of General Tso s chicken is compellingly simple. One man, Peng Chang-kuei very old but still alive invented it. But what s it ? Because while chef Peng is universally credited with inventing a dish called General Tso s chicken, he probably wouldn t recognize the crisp, sweet, red nuggets you get with pork fried rice for $4.95 with a choice of soda or soup. All that happened under his nose. It all got away from him

No related posts.

Theodore Ts'o: The history of General Tso s Chicken

I just came across this story (http://goo.gl/EbqP) today, and given my name, and given that I fancy myself a bit of a foodie, who could resist? (Not that I considered the deep-fried, dunked-in-sugar-syrup mess that passes for General Tso s chicken in most fast food Chinese restaurants to be gourmet food, mind you!)

Here s the first thing you should know: The general had nothing to do with his chicken. You can banish any stories of him stir-frying over the flames of the cities he burned, or heartbreaking tales of a last supper, prepared with blind courage, under attack from overwhelming hordes. Unlike the amoeba-like mythologies that follow so many traditional dishes, the story of General Tso s chicken is compellingly simple. One man, Peng Chang-kuei very old but still alive invented it. But what s it ? Because while chef Peng is universally credited with inventing a dish called General Tso s chicken, he probably wouldn t recognize the crisp, sweet, red nuggets you get with pork fried rice for $4.95 with a choice of soda or soup. All that happened under his nose. It all got away from him

No related posts.

Theodore Ts'o: The Transitive Grace Period Public Licence: good ideas come around

I recently came across the Transitive Grace Period Public License (alternate link) by Zooko Wilcox-O Hearn. I fonud it interesting because it s very similar almost identical to something I had first starting floating about ten years ago. I called it the Temporary Proprietary License (TPL). I m sure this is a case of great minds think alike . One things that I like about my write up is that I gave some of the rationale behind why this approach is a fruitful one:

A while ago, I was talking to Jim Gettys at the IETF meeting in Orlando, and the subject of software licensing issues came up, and he had a very interesting perspective to share about the X Consortium License, and what he viewed as bugs in the GPL. His basic observation was this: Many companies made various improvements to the X code, which they would keep as proprietary and give them a temporary edge in the marketplace. However, since the X code base was continually evolving, over time it became less attractive to maintain these code forks, since it would mean that they would have to be continually merging their enhancements into the evolving code base. Also, in general, the advantage in having the proprietary new feature or speed enhancement typically degraded over time. After all, most companies are quite happy if it takes 18-24 months for their competitor to match a feature in their release. So sometime later, the companies would very often donate their previously proprietary enhancement to the X consortium, which would then fold it into the public release of X. Jim Gettys complaint about the GPL was that by removing this ability for companies to recoup the investment needed to make major developmental improvements to Open Source code bases, companies might not have the incentive do this type of infrastructural improvements to GPL ed projects. Upon reflection, I think this is a very valid model. When Open Vision distributed the Kerberos Administration daemon to MIT, they wanted an 18 month sunset clause in the license which would prevent commercial competitors from turning around and bidding their work against them. My contract with Powerquest for doing the ext2 partition resizer had a similar clause which kept the resizing code proprietary until a suitable timeout period had occurred, at which point it would be released under the GPL. (Note that by releasing under the GPL, it prevents any of Partition Magic s commercial competitors from including it in their proprietary products!)

For more, read the full proposal. No related posts.

Theodore Ts'o: Proud to be a Googler

Although I obviously had nothing to do with Google s decision vis-a-vis China, having only started working there for a week, I was definitely glad to see it and it made me proud to be able to say that I work there. Kudos to Google s management team for having made (IMHO) the right decision. Hopefully Yahoo and Microsoft will consider carefully what the ethical implications of their collusion and collaboration with the Chinese government s attempt to control free speech and the human rights implications of the same. I have my own opinion regarding the IETF s decision to meet in Beijing, since as we ve seen with the Search Engine industry s attempt to accommodate the Chinese, engagement doesn t necessarily always lead to openness and goodness. All I can suggest is that those people who do decide to travel to that meeting be very careful about what sort of message they want to send with respect to China, as well as being very careful about protecting themselves against targetted information security attacks. No related posts.

Wouter Verhelst: NBD, TRIM, and sparse files

Ever since I read about the ATA TRIM command, I've been thinking that something like that would make a lot of sense for things like NBD. Currently, it's possible to serve a sparse file using NBD. I often do this, for example, testing things like the fix for Bug #513568 (nbd-server choking on a >3 TB RAID device) which would otherwise require me to spend significant amounts of money so that I get enough disks to test. Obviously that's not going to happen. If we're using sparse files, that means we claim to export more than we actually do. This is obviously useful for testing, but there are other cases where this kind of behaviour could be useful; for instance, when exporting swapdevices to several diskless clients, it would be a rare case indeed where all those clients need all of their available swapspace; it might be safe to slightly overcommit the disk space on your hard disk, on the assumption that none of your clients is going to use all that space at the same time. Similarly, when a file is removed on a filesystem on an NBD device, the kernel might use the generic TRIM layer described by Ted in the article above to relay that removal to the nbd-server process, which could then make sure the diskspace is deallocated. However, unfortunately, there currently is no way for an nbd-server to tell the kernel to create a new hole in a sparse file. Creating sparse files is done at file creation time; if you later want to add more holes than you did when creating the file, you need to recreate the whole file from scratch in order to do so. Additionally, when we're talking about a large amount of data that needs to be removed, it's usually much faster to say 'remove this block of data' rather than 'overwrite several megabytes with zeroes here' not just in API, also in kernel-side implementation, provided the filesystem supports extents. I think it might be nice if there were a public (userspace) API to tell the kernel that particular bits of a file or a block device are no longer required, and that they may thus be removed. But then, I'm not very experienced with kernel code, and it might be that I'm just plainly missing something here. Anyone?

Jose Carlos Garcia Sogo: Is ext4 unsafe?

There has been a lot of hype about ext4 lately, following an Ubuntu bug which rose a lot of concerns about ext4 security. I know these are no news, and I know also that I am not a kernel guru, but let s resume some important concepts.
Ext4 implements something called allocate-on-flush. This means that kernel will decide how to write to disk in an efficient way by batching allocations together in larger runs. It increases performance and reduces fragmentation. But if a system crash happens between metadata and data is actually written to disk a 0-size length file can be found in the filesystem. With ext3 journal had a default timeout of 5 seconds and then was flushed to disk. With ext4 this time is unknown, so appplications that want to ensure everything is on disk have to call fsync() on file and directory after every operation. To avoid this, a series of patches have been queued for 2.6.30 so when a file is going to be replaced, it is written to disk with no delay.
Another concern raised is the need for atomicity and durability of files. For achieving this applications have first to write a temporary file and then rename() it over the old file. This will ensure that new or old file will be found in the filesystem after a crash, instead of a corrupted file that can be found when it is opened with O_TRUNC.
Anyway, my opinion is that when choosing a filesystem, one have to know what the computer is going to be used for. Using binary propietary drivers is an option, which can lead your Linux installation to behave most like Vista and filesystem cannot be blamed for losing files if it is making its job as designed. Perhaps, it s better to use ext3 in this situation, but at the cost of missing performance.
If you want to learn more about this issues I recommend you to read both articles by Theodore Tso, Delayed allocation and the zero-length file problem and Don t fear the fsync! and also ~~Alessander~~Alexander Larsson s one ext4 vs fsync, my take as well as comment in all of them.

Theodore Ts'o: Don t fear the fsync!

After reading the comments on my earlier post, Delayed allocation and the zero-length file problem as well as some of the comments on the Slashdot story as well as the Ubuntu bug, it s become very clear to me that there are a lot of myths and misplaced concerns about fsync() and how best to use it. I thought it would be appropriate to correct as many of these misunderstandings about fsync() in one comprehensive blog posting. As the Eat My Data presentation points out very clearly, the only safe way according that POSIX allows for requesting data written to a particular file descriptor be safely stored on stable storage is via the fsync() call. Linux s close(2) man page makes this point very clearly:

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2).

Why don t application programmers follow these sage words? These three reasons are most often given as excuses:

(Perceived) performance problems with fsync()
The application only needs atomicity, but not durability
The fsync() causing the hard drive to spin up unnecessarily in laptop_mode

Let s examine each of these excuses one at a time, to see how valid they really are. (Perceived) performance problems with fsync() Most of the bad publicity with fsync() originated with the now infamous problem with Firefox 3.0 that showed up about a year ago in May, 2008. What happened with Firefox 3.0 was that the primary user interface thread called the sqllite library each time the user clicked on a link to go to a new page. The sqllite library called fsync(), which in ext3 s data=ordered mode, caused a large, visible latency which was visible to the user if there was a large file copy happening by another process. Nearly all of the reported delays was a few seconds, which would be expected; normally there isn t that much dirty data that needs to be flushed out on a Linux system, even if it is even very busy. For example, consier the example of a laptop downloading an .iso image from a local file server; if the laptop has the exclusive link of a 100 megabit/second ethernet link, and the server has the .iso file in cache, or has a nice fast RAID array so it is not the bottleneck, then in the best case, the laptop will be able to download data at the rate of 10-12 MB/second. Assuming the default 5 second commit interval, that means that in the worst case, there will be at most 60 megabytes which must be written out before the commit can proceed. A reasonably modern 7200 rpm laptop drive can write between 60 and 70 MB/second. (The Seagate Momentus 7200.4 laptop drive is reported to be able to deliver 85-104 MB/second, but I can t find it for sale anywhere for love or money.) In this example, an fsync() will trigger a commit and might need to take a second while the download is going on; perhaps half a second if you have a really fast 7200 rpm drive, and maybe 2-3 seconds if you have a slow 5400 rpm drive. (Jump to Sidebar: What about those 30 second fsync reports?) Obviously, you can create workloads that aren t bottlenecked on the maximum ethernet download speed, or the speed of reading from a local disk drive; for example, dd if=/dev/zero of=big-zero-file will create a very large number of dirty pages that must be written to the hard drive at the next commit or fsync() call. It s important to remember though, fsync() doesn t create any extra I/O (although it may remove some optimization opportunities to avoid double writes); fsync() just pushes around when the I/O gets done, and whether it gets done synchronously or asynchronously. If you create a large number of pages that need to be flushed to disk, sooner or later it will have a significant and unfortunate effect on your system s performance. Fsync() might make things more visible, but if the fsync() is done off the main UI thread, the fact that fsync() triggers a commit won t actually disturb other processes doing normal I/O; in ext3 and ext4, we start a new transaction to take care of new file system operations while the committing transction completes. The final observation I ll make is that part of the problem is that Firefox as an application wants to make a huge number of updates to state files and was concerned about not losing that information even in the face of a crash. Every application writer should be asking themselves whether this sort of thing is really necessary. For example, doing some quick measurements using ext4, I determined that Firefox was responsible for 2.54 megabytes written to the disk for each web page visited by the user (and this doesn t include writes to the Firefox cache; I symlinked the cache directory to a tmpfs directory mounted on /tmp to reduce the write load to my SSD). So these 2.54 megabytes is just for Firefox s cookie cache and Places database to maintain its Awesome bar . Is that really worth it? If you visit 400 web pages in a day, that s 1GB of writes to your SSD, and if you write more than 20GB/day, the Intel SSD will enable its write endurance management feature which slows down the performance of the drive. In light of that, exactly how important is it to update those darned sqllite databases after every web click? What if Firefox saved a list of URL s that has been visited, and only updated every 30 or 60 minutes, instead? Is it really that every last web page that you browse be saved if the system crashes? An fsync() call every 15, 30, or 60 minutes, done by a thread which doesn t block the application s UI, would have never been noticed and would have not started the firestorm on Firefox s bugzilla #421482. Very often, after a little thinking, a small change in the application is all that s necessary for to really optimize the application s fsync() usage. (Skip over the sidebar if you ve already read it). Sidebar: What about those 30 second fsync reports? If you read through the Firefox s bugzilla entry, you ll find reports of fsync delays of 30 seconds or more. That tale has grown in the retelling, and I ve seen some hyperbolic claims of five minute delays. Where did that come from? Well, if you look that those claims, you ll find they were using a very read-heavy workload, and/or they were using the ionice command to set a real-time I/O priority. For example, something like ionice -c 1 -n 0 tar cvf /dev/null big-directory . This will cause some significant delays, first of all because ionice -c 1 causes the process to have a real-time I/O priority, such that any I/O requests issued by that process will be serviced before all others. Secondly, even without the real-time I/O priority, the I/O scheduler naturally prioritizes reads as higher priority than writes because normally processes are waiting for reads to complete, but writes are normally asynchronous. This is not at all realistic workload, and it is even more laughable that some people thought this might be an accurate representation of the I/O workload of a kernel compile. These folks had never tried the experiment, or measured how much I/O goes on during a kernel compile. If you try it, you ll find that a kernel compile sucks up a lot of CPU, and doesn t actually do that much I/O. (In fact, that s why an SSD only speeds up a kernel compile by about 20% or so, and that s in a completely cold cache case. If the commonly used include files are already in the system s page cache, the performance improvement of the SSD is much less.) Jump back to reading Performance problems with fsync. The atomicity not durability argument One argument that has commonly been made on the various comment streams is that when replacing a file by writing a new file and the renaming file.new to file , most applications don t need a guarantee that new contents of the file are committed to stable store at a certain point in time; only that either the new or the old contents of the file will be present on the disk. So the argument is essentially that the sequence:

fd = open( foo.new , O_WRONLY);
write(fd, buf, bufsize);
fsync(fd);
close(fd);
rename( foo.new , foo );

is too expensive, since it provides atomicity and durability , when in fact all the application needed was atomicity (i.e., either the new or the old contents of foo should be present after a crash), but not durability (i.e., the application doesn t need to need the new version of foo now, but rather at some intermediate time in the future when it s convenient for the OS). This argument is flawed for two reasons. First of all, the squence above exactly provides desired atomicity without durability . It doesn t guarantee which version of the file will appear in the event of an unexpected crash; if the application needs a guarantee that the new version of the file will be present after a crash, it s necessary to fsync the containing directory. Secondly, as we discussed above, fsync() really isn t that expensive, even in the case of ext3 and data=ordered; remember, fsync() doesn t create extra I/O s, although it may introduce latency as the application waits for some of the pending I/O s to complete. If the application doesn t care about exactly when the new contents of the file will be committed to stable store, the simplest thing to do is to execute the above sequence (open-write-fsync-close-rename) in a separate, asynchronous thread. And if the complaint is that this is too complicated, it s not hard to put this in a library. For example, there is currently discussion on the gtk-devel-list on adding the fsync() call to g_file_set_contents(). Maybe if someone asks nicely, the glib developers will add an asynchronous version of this function which runs g_file_set_contents() in a separate thread. Voila! Avoiding hard drive spin-ups with laptop_mode Finally, as Nathaniel Smith said in Comment #111 of of my previous post:

The problem is that I don t, really, want to turn off fsync s, because I like my data. What I want to do is to spin up the drive as little as possible while maintaining data consistency. Really what I want is a knob that says I m willing to lose up to minutes of work, but no more . We even have that knob (laptop mode and all that), but it only works in simple cases.

This is a reasonable concern, and the way to fix this is to enhance laptop_mode in the Linux kernel. Bart Samwel, the author and maintainer of laptop_mode, actually discussed this idea with me last month at FOSDEM. Laptop_mode already adjusts /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_writeback_centisecs based on the configuration parameter MAX_LOST_WORK_SECONDS, and it also adjusts the file system commit time (for ext3; it needs to be taught to do the same thing for ext4, which is a simple patch) to MAX_LOST_WORK_SECONDS as well. All that is necessary is a kernel patch to allow laptop_mode to disable fsync() calls, since the kernel knows that it is in laptop_mode, and it notices that the disk has spun up, it will sync out everything to disk, since once the energy has been spent to spin up the hard drive, we might as well write everything in memory that needs to be written out right away. Hence, a patch which allows fsync() calls to be disabled while in laptop_mode should do pretty much everything Nate has asked. I need to check to see if laptop_mode does this already, but if it doesn t force a file system commit when it detects that the hard drive has been spun up, it should obviously do this as well. (In addition to having a way to globally disable fsync() s, it may also be useful to have a way to selectively disable fsync() s on a per-process basis, or on the flip side, exempt some process from a global fsync-disable flag. This may be useful if there are some system daemons that really do want to wake up the hard drive and once the hard drive is spinning, naturally everything else that needs to pushed out to stable store should be immediately written.) With this relatively minor change to the kernel s support of laptop_mode, it should be possible to achieve the result that Nate desires, without needing force applications to worry about this issue; applications should be able to just simply use fsync() without fear. Summary As we ve seen, the reasons most people think fsync() should be avoided really don t hold water. The fsync() call really is your friend, and it s really not the villain that some have made it out to be. If used intelligently, it can provide your application with a portable way of assuring that your data has been safely written to stable store, without causing a user-visible latency in your application. The problem is getting people to not fear fsync(), understand fsync(), and then learning the techniques to use fsync() optimally. So just as there has been a Don t fear the penguin campaign, maybe we also need to have a Don t fear the fsync() campaign. All we need is a friendly mascot and logo for a Don t fear the fsync() campaign. Anybody want to propose an image? We can make some T-shirts, mugs, bumper stickers Related posts (automatically generated):

Delayed allocation and the zero-length file problem A recent Ubuntu bug has gotten slashdotted, and has started...