## Hard Drive Brands

When people ask for advice about what storage to use they often get answers like "use brand X, it works well for me" and "brand Y had a heap of returns a few years ago". I'm not convinced there is any real difference between the small number of manufacturers that are still in business.

One problem we face with the reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix, and even a fault that causes unreliable operation usually won't spread its damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk).

Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can't test the new storage technology for even 3 years before releasing it, so their ability to prove its reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology, then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don't have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn't mean the next series of disks from the same manufacturer will have problems.

The issues with SSDs are similar to the issues with hard drives, but a little different. I'm not sure how much of the recent improvement in SSDs has been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don't expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

## Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in: presumably an SSHD or "hybrid drive" gets all the potential failures of hard drives and SSDs.

One theoretical issue with SSDs is that filesystems have been designed (in theory at least) to cope with hard drive failure modes, not SSD failure modes. The problem with that theory is that most filesystems don't cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn't production ready last time I tested it). A minimal sketch of the checksum-and-repair idea those filesystems use is below.
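As an illustration, here is a minimal Python sketch of how a checksumming filesystem with two mirrored copies can detect a device returning bad data and repair it from the good copy. The block layout, the CRC choice, and the function names are all invented for this example; real filesystems like BTRFS and ZFS do this per block or extent with stronger checksums and full metadata.

```python
import zlib

BLOCK_SIZE = 4096

def write_block(mirrors, index, data):
    """Store the data on every mirror along with a checksum (CRC32 here;
    real filesystems use stronger checksums such as crc32c or sha256)."""
    checksum = zlib.crc32(data)
    for mirror in mirrors:
        mirror[index] = (checksum, data)

def read_block(mirrors, index):
    """Return the first copy whose checksum still matches, and rewrite
    any corrupt copies from it -- roughly what a BTRFS RAID-1 read does."""
    good = None
    for mirror in mirrors:
        checksum, data = mirror[index]
        if zlib.crc32(data) == checksum:
            good = (checksum, data)
            break
    if good is None:
        raise IOError("all copies corrupt: restore from backup")
    for mirror in mirrors:
        if mirror[index] != good:
            mirror[index] = good   # self-heal the bad copy
    return good[1]

# Demo: one device silently corrupts a block but still "claims it is good".
mirrors = [{}, {}]
write_block(mirrors, 0, b"important data".ljust(BLOCK_SIZE, b"\0"))
checksum, _ = mirrors[0][0]
mirrors[0][0] = (checksum, b"garbage".ljust(BLOCK_SIZE, b"\0"))  # silent corruption
assert read_block(mirrors, 0).rstrip(b"\0") == b"important data"
```

A filesystem without checksums would have returned the garbage block without complaint, which is exactly the failure mode described above.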
Some people are concerned that their filesystem won't support wear levelling for SSD use. When a flash storage device is exposed to the OS via a block interface like SATA there isn't much possibility of wear levelling. If flash storage exposed that level of hardware detail to the OS then you would need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device.

Another common concern about SSDs is that they will wear out from too many writes. Lots of people are using SSDs for the ZIL (ZFS Intent Log) on the ZFS filesystem; that means the SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a problem with SSDs wearing out I would expect ZFS users to be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD (conclusion: it won't). Apart from the nameless brand SSD I mentioned previously, all of my SSDs are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1); I have yet to determine whether this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about Wear_Leveling_Count (old age).

Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs; they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays. My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren't issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

## NVMe

I think that NVMe isn't very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

## Future Technology

The latest thing for high end servers is Optane Persistent Memory, also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket and gives performance somewhere between NVMe and RAM, with capacity similar to NVMe. One of the ways of using this is "Memory Mode", where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level); this could make multiple terabytes of RAM not ridiculously expensive. Another way of using it is "App Direct Mode", where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is "Mixed Memory Mode", which has some DCPMM in Memory Mode and some in App Direct Mode. This has a lot of potential for backups, and to make things extra exciting App Direct Mode supports RAID-0 but no other form of RAID.
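To make the byte addressable option concrete, here is a hedged Python sketch of how an application might use App Direct Mode. It assumes a DAX-capable filesystem mounted at /mnt/pmem (a hypothetical path), and open_pmem is an invented helper; production code would use a persistent memory library such as libpmem rather than plain mmap, and would need to worry about CPU cache flushing.

```python
import mmap
import os

# Hypothetical DAX mount point: in App Direct Mode the DCPMM can be exposed
# as a block device holding a filesystem mounted with -o dax.
PMEM_FILE = "/mnt/pmem/counter"
SIZE = 4096

def open_pmem(path, size):
    """Map a file on the DAX filesystem; loads and stores then go
    (roughly) straight to the persistent memory, byte by byte."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, size)
    buf = mmap.mmap(fd, size)
    os.close(fd)          # the mapping keeps its own reference
    return buf

buf = open_pmem(PMEM_FILE, SIZE)
# Increment a counter that survives reboots, without any filesystem write calls.
buf[0:8] = (int.from_bytes(buf[0:8], "little") + 1).to_bytes(8, "little")
buf.flush()               # msync(); a real pmem library would flush cache lines
print("counter:", int.from_bytes(buf[0:8], "little"))
```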
## Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware, and the new software needed to talk to new hardware interfaces, will have bugs, and sometimes those bugs will lose data. Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good; this is a very common failure mode. Backups are a good thing.
- Reproducible Builds Summit update
- NMU campaign for buster
- Press release: Debian is doing Reproducible Builds for Buster
- Reproducible Builds Branding & Logo
- should we become an SPI member
- Next meeting
- Any other business
There was also work on building packages deterministically for Homebrew, a package manager for macOS. Dan Kegel worked on using SOURCE_DATE_EPOCH and other reproducibility fixes in fpm, a multi-platform package builder (a minimal sketch of the SOURCE_DATE_EPOCH technique follows the list below). The Fedora Haskell team disabled parallel builds to achieve reproducible builds. Bernhard M. Wiedemann submitted many patches upstream:
- Sorting file lists:
- Use date from
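The two patterns above (sorted file lists and honouring SOURCE_DATE_EPOCH) are the bread and butter of reproducibility fixes. Here is a minimal Python sketch of both, following the fall-back behaviour described in the SOURCE_DATE_EPOCH specification on reproducible-builds.org; build_archive and the flat-directory assumption are invented for illustration, and this is not the actual fpm change.

```python
import os
import tarfile
import time

def build_archive(output, source_dir):
    """Create a tar archive of a flat directory with reproducible contents."""
    # Honour SOURCE_DATE_EPOCH if set, falling back to the current time,
    # as the specification recommends.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))

    with tarfile.open(output, "w") as tar:
        # Never rely on readdir() order: sort the file list so that two
        # builds on different filesystems add members in the same order.
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            info = tar.gettarinfo(path, arcname=name)
            info.mtime = min(info.mtime, epoch)   # clamp timestamps
            info.uid = info.gid = 0               # normalise ownership
            info.uname = info.gname = "root"
            with open(path, "rb") as f:
                tar.addfile(info, f)

# Two runs with e.g. SOURCE_DATE_EPOCH=1500000000 set should now produce
# bit-for-bit identical archives.
```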
- Bernhard M. Wiedemann
- Adrian Bunk:
- Chris Lamb:
- random_order_of_pdf_ids_generated_by_latex: added the name of the function that possibly causes this issue.
- Adrian Bunk (98)
## diffoscope development

Work continued on the html-dir output for very large diffs, such as those for GCC. So far, this includes unreleased work on a PartialString data structure which will form a core part of a new and more intelligent recursive display algorithm.
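The PartialString work is unreleased, so the following Python sketch is only a guess at the general idea as described, not diffoscope's actual code: a string builder with a size budget, so that rendering an enormous diff can be cut off once the output limit is reached instead of materialising the whole thing. The class and method names are invented.

```python
class PartialString:
    """Accumulate output up to a byte budget, then stop.

    An illustrative guess at the concept, not diffoscope's implementation.
    """
    def __init__(self, budget):
        self.budget = budget     # bytes remaining for the whole output
        self.parts = []
        self.truncated = False

    def append(self, text):
        """Add text if the budget allows; return False once exhausted."""
        if self.truncated:
            return False
        if len(text) > self.budget:
            self.parts.append(text[: self.budget])
            self.budget = 0
            self.truncated = True
            return False
        self.parts.append(text)
        self.budget -= len(text)
        return True

    def result(self):
        return "".join(self.parts) + ("[...]" if self.truncated else "")

# A recursive renderer can stop descending as soon as append() returns
# False, so a GCC-sized diff no longer costs time proportional to its size.
out = PartialString(budget=20)
for chunk in ["many ", "lines ", "of ", "diff ", "output ..."]:
    if not out.append(chunk):
        break
print(out.result())
```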
## strip-nondeterminism development

Version 0.035-1 was uploaded to unstable from experimental by Chris Lamb. It included contributions from:
- Bernhard M. Wiedemann:
  - Add CPIO handler and test case.
- Chris Lamb:
  - Packaging improvements.
## Website development

- Chris Lamb:
  - Add OpenEmbedded to the projects page after a discussion at LinuxCon China.
  - Update some metadata for existing talks.
  - Add 13 missing talks.
## tests.reproducible-builds.org

- Alexander 'lynxis' Couzens:
  - LEDE: do a quick sha256sum before calling diffoscope; if the checksums match the two packages are identical and diffoscope can be skipped (see the sketch after this list). The LEDE build consists of 1000 packages. Using diffoscope to detect whether two packages are identical takes 3 seconds on average, while calling sha256sum on those small packages takes less than a second, so this reduces the runtime from 3h to 2h (roughly). For Debian package builds this is negligible, as each build takes several minutes anyway, so adding 3 seconds to each build doesn't matter much.
  - LEDE/OpenWrt: move toolchain.html creation to the remote node, as this is where the toolchain is built.
  - LEDE: remove debugging output for images.
  - LEDE: fix up HTML creation for toolchain, build path, downloaded software and Git commit used.
- Mattia 'mapreri' Rizzolo:
  - Debian: introduce Buster.
  - Debian: explain how to migrate from squid3 (in jessie) to squid (in stretch).
- Holger 'h01ger' Levsen:
  - Add jenkins jobs to create schroots and configure pbuilder for Buster.
  - Add Buster to README/about jenkins.d.n.
  - Teach jessie and Ubuntu 16.04 systems how to debootstrap Buster.
  - Only update indexes and pkg_sets every 30 minutes, as these jobs now run for almost 15 minutes since we test four suites (compared to three before).
  - Create HTML dashboard, live status and dd-list pages less often.
  - (Almost) stop scheduling old packages in stretch; new versions will still be scheduled and tested as usual.
  - Increase scheduling limits, especially for untested, new and depwait packages.
  - Replace Stretch with Buster in the repository comparison page.
  - Only keep build_service logs for a day, not three.
  - Add check for hanging mounts to node health checks.
  - Add check for haveged to node health checks.
  - Disable ntp.service on hosts running in the future; this is needed on stretch.
  - Install amd64 kernels on all i386 systems. There is a performance issue with i386 kernels, for which a bug should be filed. Installing the amd64 kernel is a sufficient workaround, but it breaks our 32/64 bit kernel variation on i386.
  - LEDE, OpenWrt: fix up links and split the TODO list.
  - Upgrade the i386 systems (used for Debian) and pb3+4-amd64 (used for coreboot, LEDE, OpenWrt, NetBSD, Fedora and Arch Linux tests) to Stretch.
  - jenkins: use Java 8, as required by Jenkins >= 2.60.1.
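As referenced in the list above, here is a minimal Python sketch of the hash-first comparison idea: only invoke diffoscope when a cheap checksum shows the two build results actually differ. The file paths and the way diffoscope is invoked are illustrative; the real change lives in the jenkins.debian.net build scripts.

```python
import hashlib
import subprocess

def sha256(path):
    """Cheap identity check: hash the file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def compare(first_build, second_build):
    """Skip the expensive diffoscope run when the two packages are identical."""
    if sha256(first_build) == sha256(second_build):
        return True  # reproducible: under a second instead of ~3 per package
    # Only unreproducible packages pay for a full diffoscope report.
    subprocess.run(["diffoscope", "--html", "report.html",
                    first_build, second_build])
    return False

# Hypothetical usage for one package built twice:
# compare("b1/openwrt-foo_1.0.ipk", "b2/openwrt-foo_1.0.ipk")
```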