Search Results: "Russell Coker"

30 November 2021

Russell Coker: Links November 2021

The Guardian has an amusing article by Sophie Elmhirst about Libertarians buying a cruise ship to make a seasteading project off the coast of Panama [1]. It turns out that you need permits etc to do this and maintaining a ship is expensive. Also you wouldn't want to mine cryptocurrency in a ship cabin as most cabins are small and don't have enough air conditioning to remain pleasant if you dump 1kW or more into the air. NPR has an interesting article about the reaction of the NRA to the Columbine shootings [2]. Seems that some NRA person isn't a total asshole and is sharing their private information, maybe they are dying and are worried about going to hell. David Brin wrote an insightful blog post about the singleton hypothesis where he covers some of the evidence of autocratic societies failing [3]. I think he makes a convincing point about a single centralised government for human society not being viable. But something like the EU on a world wide scale could work well. Ken Shirriff wrote an interesting blog post about reverse engineering the Yamaha DX7 synthesiser [4]. The New York Times has an interesting article about a baboon troop that became less aggressive after the alpha males all died at once from tuberculosis [5]. They established a new more peaceful culture that has outlived the beta males who avoided tuberculosis. The Guardian has an interesting article about how sequencing the genomes of the entire population can save healthcare costs while improving the health of the population [6]. This is something wealthy countries should offer for free to the world population. At a bit under $1000 per test that's only about $7 trillion to test everyone, and of course the price should drop significantly if there were billions of tests being done. The Strategy Bridge has an interesting article about SciFi books that have useful portrayals of military strategy [7]. The co-author is Major General Mick Ryan of the Australian Army, which is noteworthy as Major General is the second highest rank in use by the Australian Army at this time. Vice has an interesting article about the co-evolution of penises and vaginas and how a lot of that evolution is based on avoiding impregnation from rape [8]. Cory Doctorow wrote an insightful Medium article about the way that governments could force interoperability through purchasing power [9]. Cory Doctorow wrote an insightful article for Locus Magazine about imagining life after capitalism and how capitalism might be replaced [10]. We need a Star Trek future! Ars Technica has an informative article about new developments in the rowhammer category of security attacks on DRAM [11]. It seems that DDR4 with ECC is the best current mitigation technique and that DDR3 with ECC is harder to attack than non-ECC RAM. So the thing to do is use ECC on all workstations and avoid doing security critical things on laptops because they can't use ECC RAM.

Russell Coker: Your Device Has Been Improved

I've just started a Samsung tablet downloading a 770MB update, the description says:
Technically I have no doubt that both those claims are true and accurate. But according to common understanding of the English language I think they are both misleading. By "stability improved" they mean "fixed some bugs that made it unstable", and no technical person would imagine that after a certain number of such updates the number of bugs will ever reach zero and the tablet will be perfectly reliable. In fact you should consider yourself lucky if they fix more bugs than they add. It's not THAT uncommon for phones and tablets to be bricked (rendered unusable by software) by an update. In the past I got a Huawei Mate9 as a warranty replacement for a Nexus 6P because an update caused so many Nexus 6P phones to fail that they couldn't be replaced with an identical phone [1]. By "security improved" they usually mean "fixed some security flaws that were recently discovered to make it almost as secure as it was designed to be". Note that I deliberately say "almost as secure" because it's sometimes impossible to fix a security flaw without making significant changes to interfaces, which requires more work than desired for an old product and also gives a higher probability of things going wrong. So it's sometimes better to aim for "almost as secure", or alternatively "just as secure but with some features disabled". Device manufacturers (and most companies in the Android space make the same claims while having the exact same bugs to deal with, Samsung is no different from the others in this regard) are not making devices more secure or more reliable than when they were initially released. They are aiming to make them almost as secure and reliable as when they were released. They don't have much incentive to try too hard in this regard, Samsung won't suffer if I decide my old tablet isn't reliable enough and buy a new one, which will almost certainly be from Samsung because they make nice tablets. As a thought experiment, consider if car repairers did the same thing: "Getting us to service your car will improve fuel efficiency", great, but how much more efficient will it be than when I purchased it? As another thought experiment, consider if car companies stopped providing parts for car repair a few years after releasing a new model. This is effectively what phone and tablet manufacturers have been doing all along, software updates for stability and security are to devices what changing oil etc is for cars.

8 November 2021

Russell Coker: Installing NextCloud

NextCloud and OwnCloud History

Some time ago I tried OwnCloud, it wasn't a positive experience for me. Since that time I've got a server with a much faster CPU, a faster Internet connection, and the NextCloud code is newer and running on a newer version of PHP; I didn't make good notes so I'm not sure which factors were most responsible for having a better experience this time. According to the NextCloud Wikipedia page [1] the fork of NextCloud from the OwnCloud base happened in 2016, so it's obviously been a while since I tried it, it was probably long before 2016. Recently the BBC published an interesting article on "turnover contagion", which is when one resignation can trigger many more [2], which is interesting to read in the context of OwnCloud failing after losing critical staff when one key developer resigned. I mentioned OwnCloud in a 2012 blog post about Liberty and Mobile Phones [3], since then I haven't done well at achieving those goals. A few days ago I decided to try NextCloud and found it a much better experience than I recall OwnCloud being in the past.

Installation

I installed NextCloud on an Oracle Cloud ARM VM (see my previous blog post about the Oracle Cloud Free Tier [4]). This CloudCone article on installing NextCloud on Debian 10 (Buster) covers the basics well [5]. Here is the NextCloud URL for downloading the PHP files (a large ZIP archive) [6]. You have to extract it to where Apache is configured to have its webroot and then run chown -R www-data nextcloud/lib/private/Log nextcloud/config nextcloud/apps (or if you use php-fpm then chown it to the user for that). NextCloud recommend having all of the NextCloud files owned by www-data, but that's just a bad idea, allowing it to rewrite some of its program files is bad, allowing it to rewrite all of them is worse. For my installation I used the Apache modules macro, rewrite, ssl, php7.4, and headers (this is more about how I configure Apache than about NextCloud). Also I edited /etc/php/7.4/apache2/php.ini and changed memory_limit to 512M (the default of 128M is not enough). I'm currently only testing it, for production use I would use php-fpm and run it under its own UID so that it can't interact with other PHP apps. After that it was just a matter of visiting the configuration URL and giving it the details of the database etc. After setting it up, the command php -d memory_limit=512M occ app:install richdocumentscode_arm64, when run from the root of the NextCloud installation, installs the Collabora components for editing LibreOffice documents in NextCloud. This is the command for the ARM64 architecture, I presume the command for other architectures is similar.

Conclusion

NextCloud is very usable, it has a decent feature set built in and the option to download modules such as the components for editing LibreOffice files on the web is useful. But I am hesitant to install things that require the sort of access it requires. I think it would be better if there was a documented and supported way of installing things and then locking them down so that at runtime it can only write to data files, not any program files or configuration files. It would also be better if it was packaged for Debian and had the Debian update process for security fixes. I can imagine many people installing it, forgetting to update it, and ending up with insecure systems.
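For reference, here is a minimal sketch of the installation steps described above, assuming Apache with its webroot at /var/www and a NextCloud ZIP downloaded to /tmp/nextcloud.zip (both paths are placeholders, adjust for your system):

# extract the NextCloud ZIP archive (from the download URL in [6]) into the Apache webroot
cd /var/www
unzip /tmp/nextcloud.zip
# let the web server write only to the directories that need it
chown -R www-data nextcloud/lib/private/Log nextcloud/config nextcloud/apps
# enable the Apache modules mentioned above
a2enmod macro rewrite ssl php7.4 headers
# raise the PHP memory limit from the 128M default to 512M
sed -i 's/^memory_limit = .*/memory_limit = 512M/' /etc/php/7.4/apache2/php.ini
systemctl restart apache2
# after the web based setup, install the LibreOffice editing components (ARM64 build)
cd /var/www/nextcloud
php -d memory_limit=512M occ app:install richdocumentscode_arm64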

4 November 2021

Russell Coker: USB Microphones

The Situation

I bought myself some USB microphones on eBay, I couldn't see any with USB type A connectors (the original USB connectors) and bought ones with USB-C connectors. I thought it would be good to have microphones that could work with recent mobile phones and with PCs, because surely it wouldn't be difficult to get an adaptor. I tested one of the microphones, it worked well on a phone. I bought a pair of adaptors for USB A ports on a PC or laptop to USB-C (here's the link to where I bought them). I used one of the adaptors with a USB-C HDMI device which gave the following line from lsusb, I didn't try using a HDMI monitor on my laptop, having the device recognised was enough.
Bus 003 Device 002: ID 2109:0100 VIA Labs, Inc. USB 2.0 BILLBOARD
I tried connecting a USB-C microphone and Linux didn't recognise the existence of a USB device, I tried that on a PC and a laptop on multiple ports. I wondered whether the description of the VIA BILLBOARD device as USB 2.0 was relevant to my problem. According to Big Mess o' Wires, USB-C has separate wires for USB 3.1 and USB 2 [1]. So someone could make a device that converts USB-A to USB-C with only the USB 2 wires in place. I tested the USB-A to USB-C adaptor with the HDMI device in a USB SuperSpeed (i.e. 3.x) port and it still identified as USB 2.0. I suspect that the USB-C HDMI device is using all the high speed wires for DisplayPort data (with a conversion to HDMI) and therefore looks like a USB 2.0 device.

The Problem

I want to install a microphone in my workstation for long Zoom training sessions (7 hours in a day) that otherwise require me to use multiple Android devices as I don't have a device that will do 7 hours of Zoom without running out of battery. A new workstation with USB-C is unreasonably expensive. A PCIe USB-C card would give me the port at the back of the machine, but I can't have the back of the machine near the microphone because it's too noisy. If I could have a USB-C hub with reasonable length cables (the 1m cables typical for USB 2.0 hubs would be fine) connected to a USB-C port at the back of my workstation that would work. But there seems to be a great lack of USB-C hubs. NewBeDev has an informative post about the lack of USB-C hubs that have multiple USB-C ports [2]. There also seems to be a lack of USB-C hubs with cables longer than 20cm.

The Solution

I ended up ordering a Sades Wand gaming headset [3], that has over-ear headphones and an attached microphone which connects to the computer via USB 2.0. I gave the URL for the sades.com.au web site for reference but you will get a significantly better price by buying on eBay ($39+postage vs about $30 including postage). I guess I won't be using my new USB-C microphones for a while.
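As a minimal sketch of the kind of check described in The Situation section above (not from the original post), lsusb -t shows which bus speed each connected device negotiated, which helps tell a USB 2.0-only adaptor path from a SuperSpeed one:

# show the USB topology with the negotiated speed of each device
# (480M means USB 2.0, 5000M or more means USB 3.x SuperSpeed)
lsusb -t
# watch kernel messages while plugging the device in, to see whether it enumerates at all
dmesg --follow | grep -i usb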

1 November 2021

Russell Coker: Talking to Criminals

I think most people, and everyone who reads my blog, are familiar with the phone support scams that are common nowadays. There's the "we are Microsoft support and have found a problem with your PC", the "we are from your ISP and want to warn you that your Internet access will be cut off", and the "here's the bill for something expensive and we need you to confirm whether you want to pay". Most people hang up when scammers call them and don't call them back. But I like to talk to them. I review the quality of their criminal enterprise and tell them that I expect better quality criminals to call me. I ask them if they are proud to be criminals and if their parents would be proud of them. I ask them if they are paid well to be a criminal. Usually they just hang up and on one occasion the criminal told me to get lost before hanging up. Today I got a spam message telling me to phone +61-2-8006-7237 about an invoice for "Norton Software Enhancer and Firewall Defender" if I wanted to dispute it. It was interesting that they had an invoice number in the email which they asked me for when I called, at the time I didn't think to make up an invoice number with the same format to determine if they were actually looking it up, in retrospect I should have used a random 9 digit number to determine if they had a database for this. On the first call they just hung up on me. The second call they told me "you won't save anyone" before hanging up. The third call I got on to a friendly and talkative guy who told me that he was making good money being a criminal. I asked if he was in India or Australia (both guys had accents from the Indian subcontinent), he said he was in Pakistan. He said that he made good money by Pakistani standards as $1 Australian is over 100 Pakistani Rupees. He asked me if I'd like to work for him, I said that I make good money doing legal things, he said that if I have so much money I could send him some. ;) He also offered to take me on a tour of Islamabad if I visited, this could have been a genuine offer to have a friendly meeting with someone from the opposite side of computer security or an attempt at kidnap for ransom. He didn't address my question about whether the local authorities would be interested in his work, presumably he thinks that a combination of local authorities not caring much and the difficulty of tracking international crime makes him safe. It was an interesting conversation, I encourage everyone to chat to such criminals. They are right that you won't save anyone. But you can have some fun and occasionally learn some interesting things.

26 October 2021

Russell Coker: Links October 2021

Bloomberg has an insightful article about Juniper, the NSA, and the compromise of Netscreen [1]. It was worse than we previously thought and the Chinese government was involved. Haaretz has an amusing story about security issues at a credit card company based on a series of major WTFs [2]. They used WhatsApp for communicating with customers (despite the lack of support from Facebook for issues like account compromise), stored it on a phone (they should have used a desktop PC), didn't lock the phone down (it should have been in a locked case and bolted down like any other financial security device), and allowed it to get stolen. Fortunately the thief was only after a free phone, not the financial data stored on it. David Brin wrote an insightful blog post "Should facts and successes matter in economics? Or politics?" [3] which is part of his series about challenging conservatives to bet on their policies. Vice has an interesting article about a normal-looking USB-C to Lightning cable that intercepts data transfer and sends it out via an embedded Wifi AP [4]. Getting that into such a small space is an impressive engineering feat. The vendor already has a USB-A to Lightning cable with such features for $120 [5]. That's too expensive to just leave them lying around and hope that someone with interesting data finds them, but it's also quite cheap for a targeted attack. Interesting article about tracking people via Bluetooth MAC address or device name [6]. Most of the research is based on a man riding a bike around Norway and passively sniffing Bluetooth transmissions. You can buy commercial devices that can receive Bluetooth from 1km away. A recent version of Bluetooth has random MAC addresses but that still allows tracking by device name, which for many people is their own name. Cory Doctorow has a good summary of the ways that Facebook is rotten [7]. It's worse than you think. In 2019 almost all of Facebook's top Christian pages were run by foreign troll farms [8]. This is partly due to Christians being gullible, but Facebook is also to blame for this. Cornell has an interesting article about using CRISPR to identify the gender of chicken eggs before they hatch [9]. This means that instead of killing roosters hatched from eggs for egg production they can just use those eggs for eating and save some money. Another option would be to genetically engineer more sexual dimorphism into chickens, as the real problem is that hens for laying eggs are too thin to be good for eating, so if you could have a breed of chicken with thin hens and fat cocks then all eggs could be hatched and the chickens used. The article claims that this is an ethical benefit of not killing baby roosters, but really it's about saving 50 cents per egg. Umair Haque wrote an insightful article about why everything will get more expensive as the externalities dating back to the industrial revolution have to be paid for [9]. Alexei Navalny (the jailed Russian opposition politician who Putin tried to murder) wrote an insightful article about why corruption is at the root of most world problems and how to solve it [10]. Cory Doctorow wrote an insightful article about breaking in to the writing industry which can apply to starting in most careers [11]. The main point is that people who have established careers have knowledge about starting a career that's at best outdated and at worst totally irrelevant. Learning from people who are at most one step ahead of you is probably best. 
Peter Wehner wrote an insightful article for The Atlantic about the way churches in the US are breaking apart due to political issues [12]. Similar things appear to be happening in Australia for the same reason, conservative fear based politics which directly opposes everything in the Bible about Jesus is taking over churches. On the positive side this should destroy churches, and the way churches are currently going they should be destroyed. The Guardian has an article about the incidence of reinfection with Covid19 [13]. The current expectation is that people who aren't vaccinated will probably get it about every 16 months if it becomes endemic (as it has in the US and will do in Australia if conservatives have their way). If the mortality rate is 2% each time then an unvaccinated person could expect a 15% chance of dying over the course of 10 years if there is no cumulative damage. However if damage to the heart and lungs accumulates over multiple courses of the disease then the probability of death over 10 years could be a lot higher. Psyche has an interesting article by Professor Jan-Willem van Prooijen about the way that conspiracy theories bypass rationality [14]. The way that entertaining stories bypass rationality is particularly concerning given the way Facebook and other social media are driven by clickbait.

20 October 2021

Russell Coker: Strange Apache Reload Issue

I recently had to renew the SSL certificate for my web server, nothing exciting about that, but Certbot created a new directory for the key because I had removed some domains (moved to a different web server). This normally isn't a big deal, change the Apache configuration to the new file names and run the reload command. My monitoring system initially said that the SSL certificate wasn't going to expire in the near future so it looked fine. Then an hour later my monitoring system told me that the certificate was about to expire, apparently the old certificate came back! I viewed my site with my web browser and the new certificate was being used, it seemed strange. Then I did more tests with gnutls-cli which revealed that exactly half the connections got the new certificate and half got the old one. Because my web server isn't doing anything particularly demanding the mpm_event configuration only starts 2 server processes, and even that may be excessive for what it does. So it seems that the Apache reload command had reloaded the configuration on one mpm_event server process but not the other! Fortunately this was something that was easy to test and was something that was automatically tested. If the change that didn't get accepted was something small it would be a particularly insidious bug. I haven't yet tried to reproduce this. But if I get the time I'll do so and file a bug report.
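A minimal sketch of the kind of repeated check described above (the hostname is a placeholder and the exact output format varies between gnutls-cli versions); running it several times shows whether different connections are served different certificates:

# print the expiry of the certificate served on each of several connections
for i in 1 2 3 4 5 6 7 8; do
    echo | gnutls-cli -p 443 www.example.com 2>&1 | grep -i expire
done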

1 October 2021

Russell Coker: Getting Started With Kali

Kali is a Debian based distribution aimed at penetration testing. I haven't felt a need to use it in the past because Debian has packages for all the scanning tools I regularly use, and all the rest are free software that can be obtained separately. But I recently decided to try it. Here's the URL to get Kali [1]. For a VM you can get VMWare or VirtualBox images, I chose VMWare as it's the most popular image format and also a much smaller download (2.7G vs 4G). For unknown reasons the torrent for it didn't work (might be a problem with my torrent client). The download link for it was extremely slow in Australia, so I downloaded it to a system in Germany and then copied it from there. I don't want to use either VMWare or VirtualBox because I find KVM/QEMU sufficient to do everything I want and they are in the main section of Debian, so I needed to convert the image files. Some of the documentation on converting image formats to use with QEMU/KVM says to use a program called kvm-img which doesn't seem to exist, I used qemu-img from the qemu-utils package in Debian/Bullseye. The man page qemu-img(1) doesn't list the types of output format supported by the -O option and the examples returned by a web search show using "-O qcow2". It turns out that the following command will convert the image to raw format, which is the format I prefer. I use BTRFS for storing all my VM images and that does all the copy-on-write I need.
qemu-img convert Kali-Linux-2021.3-vmware-amd64.vmdk ../kali
After converting it the file was 500M smaller than the VMWare files (10.2 vs 10.7G). Probably the Kali distribution file could be reduced in size by converting it to raw and then back to VMWare format. The Kali VMWare image is compressed with 7zip which has a good compression ratio, I waited almost 90 minutes for zstd to compress it with -19 and the result was 12% larger than the 7zip file. VMWare apparently likes to use an emulated SCSI controller, I spent some time trying to get that going in KVM. Apparently recent versions of QEMU changed the way this works and therefore older web pages aren't helpful. Also allegedly the SCSI emulation is buggy and unreliable (but I didn't manage to get it going so can't be sure). It turns out that the VM is configured to work with the virtio interface, and the initramfs.conf has the configuration option MODULES=most which makes it boot on all common configurations (good work by the initramfs-tools maintainers). The image works well with the Spice display interface, so it doesn't capture my mouse, the window for the VM works the same way as other windows on my desktop and doesn't capture the mouse cursor. I don't know if this level of Spice integration is in Debian now, last time I tested it didn't work that way. I also downloaded Metasploitable [2] which is a VM image designed to be full of security flaws for testing the tools that are in Kali. Again it worked nicely after converting from VMWare to raw format. One thing to note about Metasploitable is that you must not make it available on the public Internet. My home network has NAT for IPv4 but all systems get public IPv6 addresses. It's usually nice that those things just work on VMs, but not for this. So I added an ip6tables command to block IPv6 to /etc/rc.local.

Conclusion

Installing VMs for both these distributions was quite easy. Most of my time was spent downloading from a slow server, trying to get SCSI emulation working, working out how to convert image files, and testing different compression options. The time spent doing stuff once I knew what to do was very small. Kali has zsh as the default shell, it's quite nice. I've been happy with bash for decades, but I might end up trying zsh out on other machines.
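Here is a minimal sketch of the conversion and the IPv6 block described above. The file names and the tap interface name are placeholders, and the rules assume a routed tap setup; a bridged setup would need the physdev match instead:

# convert the VMWare disk to raw (raw is the default output format, shown explicitly here)
qemu-img convert -O raw Kali-Linux-2021.3-vmware-amd64.vmdk kali.raw
# or convert to qcow2 if sparse copy-on-write images are preferred
qemu-img convert -O qcow2 Kali-Linux-2021.3-vmware-amd64.vmdk kali.qcow2

# one possible /etc/rc.local addition to keep IPv6 away from a deliberately vulnerable VM,
# assuming it is attached via a tap device named tap-metasploit (a placeholder)
ip6tables -A FORWARD -i tap-metasploit -j DROP
ip6tables -A FORWARD -o tap-metasploit -j DROP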

21 September 2021

Russell Coker: Links September 2021

Matthew Garrett wrote an interesting and insightful blog post about the license of software developed or co-developed by machine-learning systems [1]. One of his main points is that people in the FOSS community should aim for less copyright protection. The USENIX ATC 21/OSDI 21 Joint Keynote Address titled "It's Time for Operating Systems to Rediscover Hardware" has some insightful points to make [2]. Timothy Roscoe makes some incendiary points but backs them up with evidence. Is Linux really an OS? I recommend that everyone who's interested in OS design watch this lecture. Cory Doctorow wrote an interesting set of 6 articles about Disneyland, ride pricing, and crowd control [3]. He proposes some interesting ideas for reforming Disneyland. Benjamin Bratton wrote an insightful article about how philosophy failed in the pandemic [4]. He focuses on the Italian philosopher Giorgio Agamben who has a history of writing stupid articles that match QAnon talking points but with better language skills. Ars Technica has an interesting article about penetration testers extracting an encryption key from the bus used by the TPM on a laptop [5]. It's not a likely attack in the real world as most networks can be broken more easily by other methods. But it's still interesting to learn about how the technology works. The Portalist has an article about David Brin's Startide Rising series of novels and his thoughts on the concept of Uplift (which he denies inventing) [6]. Jacobin has an insightful article titled "You're Not Lazy But Your Boss Wants You to Think You Are" [7]. Making people identify as lazy is bad for them and bad for getting them to do work. But this is the first time I've seen it described as a facet of abusive capitalism. Jacobin has an insightful article about free public transport [8]. Apparently there are already many regions that have free public transport (Tallinn, the capital of Estonia, being one example). Fare-free public transport allows bus drivers to concentrate on driving not taking fares, removes the need for ticket inspectors, and generally provides a better service. It allows passengers to board buses and trams faster, thus reducing traffic congestion, and encourages more people to use public transport instead of driving, which reduces road maintenance costs. Interesting research from Israel about bypassing facial ID [9]. Apparently they can make a set of 9 images that can pass for over 40% of the population. I didn't expect facial recognition to be an effective form of authentication, but I didn't expect it to be that bad. Edward Snowden wrote an insightful blog post about types of conspiracies [10]. Kevin Rudd wrote an informative article about Sky News in Australia [11]. We need to have a Royal Commission now before we have our own January 6th event. Steve from Big Mess o' Wires wrote an informative blog post about USB-C and 4K 60Hz video [12]. Basically you can't have a single USB-C hub do 4K 60Hz video and be a USB 3.x hub unless you have compression software running on your PC (slow and only works on Windows), or have DisplayPort 1.4 or Thunderbolt (both not well supported). All of the options are not well documented on online store pages so lots of people will get unpleasant surprises when their deliveries arrive. Computers suck. Steinar H. Gunderson wrote an informative blog post about GaN technology for smaller power supplies [13]. A 65W USB-C PSU that fits the usual wall wart form factor is an interesting development.

7 September 2021

Russell Coker: Oracle Cloud Free Tier

It seems that every cloud service of note has a free tier nowadays and the Oracle Cloud is the latest that I've discovered (thanks to r/homelab which I highly recommend reading). Here's Oracle's summary of what they offer for free [1]. Oracle's "always free" tier (where presumably "always" is defined as "until we change our contract") currently offers ARM64 VMs to a total capacity of 4 CPU cores, 24G of RAM, and 200G of storage, with a default VM size of 1/4 that (1 CPU core and 6G of RAM). It also includes 2 AMD64 VMs that each have 1G of RAM, but a 64bit VM with 1G of RAM isn't that useful nowadays.

Web Interface

The first thing to note is that the management interface is a massive pain to use. When a login times out for security reasons it redirects to a web page that gives a 404 error, maybe the redirection works OK if you are using it when it times out, but if you go off and spend an hour doing something else you will return to a 404 page. A web interface should never refer you to a page with a 404. There doesn't seem to be a way of bookmarking the commonly used links (as AWS does) and the set of links on the left depends on the section you are in, with no obvious way of going between sections. Sometimes I got stuck in a set of pages about authentication controls (the "identity cloud") and there seemed to be no link I could click on to get back to cloud computing, I had to go to a bookmarked link for the main cloud login page. A web interface should never force the user to type in the main URL or go to a bookmark, you should be able to navigate from every page to every other page in a logical manner. An advanced user might have their own bookmarks in their browser to suit their workflow. But a beginner should be able to go anywhere without breaking the session. Some parts of the interface appear to be copied from AWS, but unfortunately not the good parts. The way AWS manages IP access control is not easy to manage and it's not clear why packets are dropped, Oracle copies all this. On the upside Oracle has some good Datadog style analytics, so for a new deployment you can debug IP access control by seeing records of rejected packets. Just to make it extra annoying, when you create a rule with multiple ports specified the web interface will expand it out to multiple rules of one port each, having ports 80 and 443 on separate lines doesn't make things easier. Also it forces you to have IPv4 and IPv6 as separate rules, so if you want HTTP and HTTPS on both IPv4 and IPv6 (a common requirement) then you need 4 separate rules. One final annoying thing is that the web interface doesn't make your previous settings a default. As I've created many ARM images and haven't created a single AMD image it should know that the probability that I want to create an AMD image is very low and stop defaulting to that.

Recovery

When trying a new system you will inevitably break things and have to recover things. The way to recover from a configuration error that prevents your VM from booting and getting to a state of allowing a login is to stop the VM, then go to the "Boot volume" section under "Resources" and use the settings button to detach the boot volume. Then you go to another VM (which must be running), go to the "Attached block volumes" menu and attach it as Paravirtualised (not iSCSI and not "default" which will probably be iSCSI). After some time the block device will appear and you can mount it and do stuff to it. 
Then after umounting it you detach it from the recovery VM and attach it again to the original VM (where it will still have an entry in the "Boot volume" section) and boot the original VM. As an aside it's really annoying that you can't attach a volume to a VM that isn't running. My first attempt at image recovery started with making a snapshot of the Boot volume, this didn't work well because the image uses EFI and therefore GPT and because the snapshot was larger than the original block device (which incidentally was the default size). I admit that I might have made a mistake making the snapshot, but if so it shouldn't be so easy to do. With GPT, if you have a larger block device then partitioning tools complain about the backup partition table not being found, and they complain even more if you try to go back to the smaller size later on. Generally GPT partition tables are a bad idea for VMs, when I run the host I don't use partition tables, I have a separate block device for each filesystem or swap space. Snapshots aren't needed for recovery, they don't seem to work very well, and if it's possible to attach a snapshot to a VM in place of its original Boot volume I haven't figured out how to do it.

Console Connection

If you boot Oracle Linux (a derivative of RHEL that has SE Linux enabled in enforcing mode, yay) then you can go to the "Console connection". The console is a Javascript console which allows you to login on a virtual serial console on device /dev/ttyAMA0. It tells you to type "help" but that isn't accepted, you have a straight Linux console login prompt. If you boot Ubuntu then you don't get a working serial console, it tells you to type "help" for help but doesn't respond to that. It seems that the Oracle Linux kernel 5.4.17-2102.204.4.4.el7uek.aarch64 is compiled with support for /dev/ttyAMA0 (the default ARM serial device) while the kernel 5.11.0-1016-oracle compiled by Oracle for their Ubuntu VMs doesn't have it.

Performance

I haven't done any detailed tests of VM performance. As a quick test I used zstd to compress a 154MB file, on my home workstation (E5-2620 v4 @ 2.10GHz) it took 11.3 seconds of CPU time to compress with zstd -9 and 7.2s to decompress. On the Oracle cloud it took 7.2s and 5.4s. So it seems that for some single core operations the ARM CPU used by the Oracle cloud is about 30% to 50% faster than an E5-2620 v4 (a slightly out of date server processor that uses DDR4 RAM). If you ran all the free resources in a single VM that would make a respectable build server. If you want to contribute to free software development and only have a laptop with 4G of RAM then an ARM build/test server with 24G of RAM and 4 cores would be very useful.

Ubuntu Configuration

The advantage of using EFI is that you can manage the kernel from within the VM. The default Oracle kernel for Ubuntu has a lot of modules included and is compiled with a lot of security options including SE Linux.

Competitors

https://aws.amazon.com/free
AWS offers 750 hours (just over 31 days) per month of free usage of a t2.micro or t3.micro EC2 instance (which means 1GB of RAM). But that only lasts for 12 months and it's still only 1GB of RAM. AWS has some other things that could be useful, like 1 million free Lambda requests per month. If you want to run your personal web site on Lambda you shouldn't hit that limit. They also apparently have some good offers for students.
https://cloud.google.com/free
The Google Cloud Platform (GCP) offers $300 of credit. 
https://cloud.google.com/free/docs/gcp-free-tier#free-tier-usage-limits
GCP also has ongoing free tier usage for some services. Some of them are pretty much unlimited use (50GB of storage for Cloud Source Repositories is a heap of source code). But for VMs you get the equivalent of one e2-micro instance running 24*7. An e2-micro has 1G of RAM. You also only get 30G of storage and 1GB of outbound data. It's clearly not as generous an offer as Oracle, but Oracle is the underdog so they have to try harder.
https://azure.microsoft.com/en-us/free/
Azure appears to be much the same as AWS, a free Linux VM for a year and then other less popular services free forever (or until they change the contract).
https://www.ibm.com/cloud/free
The IBM cloud free tier is the least generous offer, a VM is only free for 30 days. But what they offer for 30 days is pretty decent. If you want to try the IBM cloud and see if it can do what your company needs then this will do well. If you want to have free hosting for your hobby stuff then it's no good. Oracle seems like the most generous offer if you want to do stuff, but also one of the least valuable if you want to learn things that will help you at a job interview. For job interviews AWS seems the most useful, with GCP and Azure vying for second place.
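As a rough sketch of the single core comparison described in the Performance section above (the file name is a placeholder, and the user+sys lines of the output approximate the CPU time quoted there):

# time single-threaded compression and decompression of a test file
time zstd -9 -T1 testfile -o testfile.zst
time zstd -d -T1 testfile.zst -o testfile.out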

30 August 2021

Russell Coker: Links August 2021

Sciencealert has an interesting article on a game to combat misinformation by microdosing people with it [1]. The game seemed overly simplistic to me, but I guess I'm not the target demographic. Research shows it to work. Vice has an interesting and amusing article about mass walkouts of underpaid staff in the US [2]. The way that corporations are fighting an increase in the minimum wage doesn't seem financially beneficial for them. An increase in the minimum wage means small companies have to increase salaries too and the ratio of revenue to payroll is probably worse for small companies. It seems that companies like McDonalds make oppressing their workers a higher priority than making a profit. Interesting article in Vice about how the company ShotSpotter (which determines the locations of gunshots by sound) forges evidence for US police [3]. All convictions based on ShotSpotter evidence should be declared mistrials. BitsNBites has an interesting article on the fundamental flaws of SIMD (Single Instruction Multiple Data) [4]. The Daily Dot has a disturbing article about the possible future of the QAnon movement [5]. Let's hope they become too busy fighting each other to hurt many innocent people. Ben Taylor wrote an interesting blog post suggesting that Web Assembly should be a default binary target [6]. I don't support that idea but I think that considering it is useful. Web Assembly could be used more for non-web things and it would be a better option than Node.js for some things. There are also some interesting corner cases like games, Minecraft was written in Java and there's no reason that Web Assembly couldn't do the same things. Vice has an interesting article about the Phantom encrypted phone service that ran on Blackberry handsets [7]. Australia really needs legislation based on the US RICO law! Vice has an interesting article about an encrypted phone company run by drug dealers [8]. Apparently after making an encrypted phone system for their own use they decided to sell it to others and made millions of dollars. They could have run a successful legal business. Salon has an insightful interview with Michael Petersen about his research on fake news and people who share it because they need chaos [9]. Apparently low status people who are status seeking are a main contributor to this, they share fake news knowingly to spread chaos. A society with less inequality would have fewer problems with fake news. Salon has another insightful interview with Michael Petersen, about his later research on fake news as an evolutionary strategy [10]. People knowingly share fake news to mobilise their supporters and to signal allegiance to their group. The more bizarre the beliefs are, the more strongly they signal allegiance. If an opposing group has a belief then they can show support for their group by having the opposite belief (e.g. by opposing vaccination if the other political side supports doctors). He also suggests that lying can be a way of establishing dominance, the more honest people are opposed by a lie the more dominant the liar may seem. Vice has an amusing article about how police took over the Encrochat encrypted phone network that was mostly used by criminals [11]. It's amusing to read of criminals getting taken down like this. It's also interesting to note that the authorities messed up by breaking the wipe facility, which alerted the criminals that their security was compromised. 
The investigation could have continued for longer if they hadn't changed the functionality of compromised phones. A later Vice article mentioned that the malware installed on Encrochat devices recorded MAC addresses of Wifi access points, which was used to locate the phones even though they had the GPS hardware removed. Cory Doctorow wrote an insightful article for Locus about the insufficient necessity of interoperability [12]. The problem with monopolies is not just an inability to interoperate with other services or to leave, it's losing control over your life. A few cartel participants interoperating will still be able to do all the bad things to us that a single monopolist could do.

31 July 2021

Russell Coker: Links July 2021

The News Tribune published an article in 2004 about the "Dove of Oneness", a mentally ill woman who got thousands of people to believe her crazy ideas about NESARA [1]. In recent times the QAnon conspiracy theory has drawn on the NESARA cult and encouraged its believers to borrow money and spend it in the belief that all debts will be forgiven (something which was not part of NESARA). The Wikipedia page about NESARA (proposed US legislation that was never considered by the US congress) notes that the second edition of the book about it was titled "Draining the Swamp: The NESARA Story Monetary and Fiscal Policy Reform". It seems like the Trump cult has been following that for a long time. David Brin (best-selling SciFi author and NASA consultant) wrote an insightful blog post about the Tytler Calumny [2], which is the false claim that democracy inevitably fails because poor people vote themselves money. Really the failure is of corrupt rich people subverting the government processes to enrich themselves at the expense of their country. It's worth reading, and his entire blog is also worth reading. Cory Doctorow has an insightful article about his own battle with tobacco addiction and the methods that tobacco companies and other horrible organisations use to prevent honest discussion about legislation [3]. Cory Doctorow has an insightful article about "consent theater" which describes how consent in most agreements between corporations and people is a fraud [4]. The new GDPR sounds good. The forum for the War Thunder game had a discussion on the accuracy of the Challenger 2 tank which ended up with a man who claims to be a UK tank commander posting part of a classified repair manual [5]. That's pretty amusing, and also good advertising for War Thunder. After reading about this I discovered that it's free on Steam and runs on Linux! Unfortunately it whinged about my video drivers and refused to run. Cory Doctorow has an insightful and well researched article about the way the housing market works in the US [6]. For house prices to increase, conditions for renters need to be worse, that may work for home owners in the short term but in the long term their children and grandchildren will end up renting.

16 July 2021

Russell Coker: Thoughts about RAM and Storage Changes

My first Linux system in 1992 was a 386 with 4MB of RAM and a 120MB hard drive which (for some reason I have forgotten) was only supported by Linux up to about 90MB. My first hard drive was 70MB and could do 500KB/s for contiguous IO, my first Linux hard drive was probably a bit faster, maybe 1MB/s. My current Linux workstation has 64G of RAM and 2*1TB NVMe devices that can sustain about 1.1GB/s. The laptop I'm using right now has 8GB of RAM and a 180GB SSD that can do 380MB/s. My laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. Currently I don't even run a VM with less than 4GB of RAM, NB I'm not saying that smaller VMs aren't useful, merely that I don't happen to be using them now. Modern AMD64 CPUs support 2MB "huge pages". As a proportion of system RAM, if I used 2MB pages everywhere they would be a smaller portion of system RAM than the 4KB pages on my first Linux system! I am not suggesting using 2MB pages for general systems. For my workstations the majority of processes are using less than 10MB of resident memory, and given the different uses for memory mapped shared objects, memory mapped file IO, malloc(), stack, heap, etc there would be a lot of inefficiency if 2MB were the granularity for all allocations. But as systems worked with 4MB of RAM or less and 4K pages, it would surely work to have only 2MB pages with 64GB or more of RAM. Back in the 90s it seemed ridiculous to me to have 256 byte pages on a 68030 CPU, but 4K pages on a modern AMD64 system is even more ridiculous. Apparently AMD64 supports 1GB pages on some CPUs, that seems ridiculously large but when run on a system with 1TB of RAM that's comparable to 4K pages on my first Linux system. Currently AWS offers 24TB EC2 instances and the Google Cloud Platform offers 12TB virtual machines. It might even make sense to have the entire OS using 1GB pages for some usage scenarios on such systems, wasting tens of GB of RAM to save TLB thrashing might be a good trade-off. My personal laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. An employer recently assigned me a Thinkpad Carbon X1 Gen6 with an NVMe device that could sustain 5GB/s until the CPU overheated, that's 5000* the contiguous IO speed of my first Linux hard drive. My first hard drive had a 28ms average access time and my first Linux hard drive was probably a little better, let's call it 20ms for the sake of discussion. It's generally quoted that access times for NVMe are at best 10us, that's 2000* better than my first Linux hard drive. As seek times are the main factor for swap performance, a laptop with 8GB of RAM and a fast NVMe device could be expected to give adequate performance with 2000* the swap of my first Linux system. For the work laptop in question I had 8G of swap and my personal laptop has 6G of swap, which is somewhat comparable to the 4MB of swap on my first Linux system in that swap is about equal to RAM size, so I guess my personal laptop is performing better than it can be expected to. These are just some idle thoughts about hardware changes over the years. Don't take it as advice for purchasing hardware and don't take it too seriously in general. Also when writing comments don't restrict yourself to being overly serious, feel free to run the numbers on what systems with petabytes of Optane might be like, speculate on what NUMA systems in laptops might be like, etc. Go wild.
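As a small illustration of the huge page support discussed above (not something the original post did), on a current Linux system you can check the page sizes in play with something like:

# the base page size (4096 on AMD64)
getconf PAGESIZE
# whether transparent 2MB huge pages are enabled, and how much memory uses them
cat /sys/kernel/mm/transparent_hugepage/enabled
grep -i huge /proc/meminfo
# CPU flags: pse indicates 2MB page support, pdpe1gb indicates 1GB page support
grep -o -w -e pse -e pdpe1gb /proc/cpuinfo | sort -u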

5 July 2021

Russell Coker: Servers and Lockdown

OS security features and server class systems are things that surely belong together. If a program is important enough to buy expensive servers to run it then it's important enough that you want to have all the OS security features enabled. For such an important program you will also want to have all possible monitoring systems running so you can predict hardware failures etc. Therefore you would expect that you could buy a server, setup the vendor's management software, configure your Linux kernel with security features such as "lockdown" (an LSM that restricts access to /dev/mem, the iopl() system call, and other dangerous things [1]), and have it run nicely! You will be disappointed if you try doing that on a HP or Dell server though.

HP Problems
[370742.622525] Lockdown: hpasmlited: raw io port access is restricted; see man kernel_lockdown.7
The above message is logged when trying to INSTALL (not even run) the hp-health package from the official HP repository (as documented in my previous blog post about the HP ML-110 Gen9 [2]) with lockdown=integrity (the less restrictive lockdown option). Now the HP package in question is in their repository for Debian/Stretch (released in 2017) and the Lockdown LSM was documented by LWN as being released in 2019, so not supporting a Debian/Bullseye feature in Debian/Stretch packages isn't inherently a bad thing, apart from the fact that they haven't released a new version of that package since. The Stretch package that I am testing now was released in 2019. Also it's been regarded as best practice to have device drivers for this sort of thing since long before 2017.
# hplog -v
ERROR: Could not open /dev/cpqhealth/cdt.
Please make sure the Health Monitor is started.
Attempting to run the hplog -v command (to view the HP hardware log) gives the above error. Strace reveals that it could and did open /dev/cpqhealth/cdt but had problems talking to something (presumably the Health Monitor daemon) over a Unix domain socket. It would be nice if they could at least get the error message right!

Dell Problems
[   13.811165] Lockdown: smbios-sys-info: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
[   13.820935] Lockdown: smbios-sys-info: raw io port access is restricted; see man kernel_lockdown.7
[   18.118118] Lockdown: dchcfg: raw io port access is restricted; see man kernel_lockdown.7
[   18.127621] Lockdown: dchcfg: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
[   19.371391] Lockdown: dsm_sa_datamgrd: raw io port access is restricted; see man kernel_lockdown.7
[   19.382147] Lockdown: dsm_sa_datamgrd: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
Above is a sample of the messages when booting a Dell PowerEdge R710 with lockdown=integrity with the srvadmin-omacore package installed from the official Dell repository I describe in my blog post about the Dell PowerEdge R710 [3]. Now that repository is for Ubuntu/Xenial which was released in 2016, but again it was best practice to have device drivers for this many years ago. Also the newest Debian based releases that Dell apparently supports are Ubuntu/Xenial and Debian/Jessie, released in 2016 and 2015 respectively.
# omreport system esmlog
Error! No Embedded System Management (ESM) log found on this system.
Above is the result when I try to view the ESM log (the Dell hardware log).

How Long Should Server Support Last?

The Wikipedia List of Dell PowerEdge Servers shows that the R710 is a Generation 11 system. Generation 11 was first released in 2010 and Generation 12 was first released in 2012. Generation 13 was the latest hardware Dell sold in 2015 when they apparently ceased providing newer OS support for Generation 11. Dell currently sells Generation 15 systems and provides more recent support for Generation 14 and Generation 15. I think it's reasonable to debate whether Dell should support servers for 4 generations. But given that a major selling point of server class systems is that they have long term support, I think it would make sense to give better support for this and not drop support when it's only 2 versions from the latest release! The support for Dell Generation 11 hardware only seems to have lasted for 3 years after Generation 12 was first released. Also it appears that software support for Dell Generation 13 ceased before Generation 14 was released, that sucks for the people who bought Generation 13 when it was new! HP is currently selling Gen 10 servers which were first released at the end of 2017. So it appears that HP stopped properly supporting Gen 9 servers as soon as Gen 10 servers were released! One thing to note about these support times: when the new generation of hardware was officially released the previous generation was still on sale. So while HP Gen 10 servers officially came out in 2017 that doesn't necessarily mean that someone who wanted to buy a ML-110 Gen10 could actually have done so. For comparison, Red Hat Enterprise Linux has been supported for 4-6 years for every release they made since 2005 and Ubuntu has always had 5 year LTS support for servers.

How To Do It Properly

The correct way of interfacing with hardware is via a device driver that is supported in the kernel.org tree. That means it goes through the usual kernel source code quality checks, which are really good at finding bugs, and gives users an assurance that the code won't cause security problems. Generally nothing about the code from Dell or HP gives me confidence that it should be directly accessing /dev/kmem or raw IO ports without risk of problems. Once a driver is in the kernel.org tree it will usually stay there forever and not require further effort from the people who submit it. Then it just works for everyone and tends to work with any other kernel features that people use, like LSMs. If they released the source code to the management programs then it would save them even more effort as they could be maintained by the community.
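For reference, a minimal sketch of how to see and set the lockdown mode discussed above, assuming a kernel built with the lockdown LSM:

# show the available lockdown modes, the active one is shown in brackets
cat /sys/kernel/security/lockdown
# example output: none [integrity] confidentiality
# lockdown is normally selected at boot via the kernel command line, e.g. in /etc/default/grub:
# GRUB_CMDLINE_LINUX="lockdown=integrity"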

30 June 2021

Russell Coker: Links June 2021

MIT Technology Review has an interesting article about Google Project Zero shutting down a western intelligence operation [1]. There's an Internet trend of people eating rotten meat which they call "high meat" [2]. This is up there with people setting themselves on fire and nut shot videos. A young woman who was making popular Twitter posts about motorbikes turned out to be a 50yo man using deep fake technology [3]. He has long hair IRL and just needed to replace his face. After coming out of the closet he has continued making such videos and remains popular. FYHTECH has an informative blog post about using sgdisk to backup and restore GPT partition tables [4]. This is in the Debian package gdisk along with several other tools for managing partition tables. One interesting thing to note is that you can backup a partition table and restore it to a smaller device (with a bunch of warnings that you can ignore if you know what you are doing). This is the only way I've discovered to cleanly truncate a GPT partitioned disk, which is sometimes necessary when running VMs. Insightful blog post about PCIe bifurcation and how PCIe lanes are assigned to sockets [5]. This explains why many motherboards have sockets with unused PCIe lanes, e.g. *8 sockets that are wired for *4. The PCIe slots all go back to the CPU which has a limited number of *16 PCIe connections that are bifurcated to make the larger number of PCIe slots on the motherboard. New Republic has an interesting article on the infamous transphobe Jordan Peterson's battle with tranquiliser dependency [6]. Wired has an interesting article about the hack of RSA infrastructure related to the SecurID keys 10 years ago [7]. Apparently some 10 year NDAs had delayed it. There are many posts about the situation with Freenode, I think that this one best captures the problems in the shortest amount of text [8]. You could spend a few hours reading about it as I have just done, but just reading this gives you the basics that you need to know to avoid Freenode. That blog post has links to articles about Andrew Lee's involvement with Mt Gox and his claim to be the heir to the throne of Korea (which is not a monarchy). Nicholas Wade wrote an insightful and informative article about the origin of Covid19, which leads to the conclusion that it was made in a Chinese laboratory [9]. I first saw this in David Brin's Facebook feed. I would be hesitant to share this sort of thing if it wasn't reviewed by a reliable source, I think David Brin has the skill to analyse this sort of article and the contacts to allow him to seek verification of any scientific issues that are outside his field. I believe that this article is reliable and its conclusion is most likely to be correct. Interesting Wired article about an art project using display computers at Apple stores to photograph people [10]. It ends with a visit from the FBI.
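A minimal sketch of the sgdisk backup and restore mentioned above (device and file names are placeholders; restoring to a device of a different size prints warnings you have to judge for yourself):

# save the GPT data (protective MBR, header and partition table) to a file
sgdisk --backup=sda-gpt.bak /dev/sda
# restore it onto another, possibly smaller, device
sgdisk --load-backup=sda-gpt.bak /dev/sdb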

7 June 2021

Russell Coker: Dell PowerEdge T320 and Linux

I recently bought a couple of PowerEdge T320 servers, so now to learn about setting them up. They are a little newer than the R710 I recently setup (which had iDRAC version 6), they have iDRAC version 7.

RAM Speed

One system has an E5-2440 CPU with 2*16G DDR3 DIMMs and a Memtest86+ speed of 13,043MB/s, the other is essentially identical but with an E5-2430 CPU and 4*16G DDR3 DIMMs and a Memtest86+ speed of 8,270MB/s. I had expected that more DIMMs means better RAM performance, but this isn't what happened. I firstly upgraded the BIOS, as I expected it didn't make a difference but it's a good thing to try first. On the E5-2430 I tried removing a DIMM after it was pointed out on Facebook that the CPU has 3 memory channels (here's a link to a great site with information on that CPU and many others [1]). When I did that I was prompted to disable advanced ECC (which treats pairs of DIMMs as a single unit for ECC, allowing correction of more than 1 bit errors) and I had to move the 3 remaining DIMMs to different slots. That improved the performance to 13,497MB/s. I then put the spare DIMM into the E5-2440 system and the performance increased to 13,793MB/s, when I installed 4 DIMMs in the E5-2440 system the performance remained at 13,793MB/s and the E5-2430 went down to 12,643MB/s. This is a good result for me, I now have the most RAM and the fastest RAM configuration in the system with the fastest CPU. I'll sell the other one to someone who doesn't need so much RAM or performance (it will be really good for a small office mail server and NAS).

Firmware Update BIOS

The first issue is updating the BIOS, unfortunately the first link I found to the Dell web site didn't have a link to download the Linux installer. It offered a Windows binary, an EFI program, and a DOS binary. I'm not about to install Windows if there is any other option and EFI is somewhat annoying, so that leaves DOS. The first Google result for installing FreeDOS advised using "unetbootin", that didn't work at all for me (it created a USB image that the Dell BIOS didn't recognise as bootable) and even if it did it wouldn't have been a good solution. I went to the FreeDOS download page [2] and got the Lite USB zip file. That contained FD12LITE.img which I could just dd to a USB stick. I then used fdisk to create a second 32MB partition, used mkfs.fat to format it, and then copied the BIOS image file to it. I booted the USB stick and then ran the BIOS update program from drive D:. After the BIOS update this became the first system I've seen get a totally green result from spectre-meltdown-checker! I found the link to the Linux installer for the new Dell BIOS afterwards, but it was still good to play with FreeDOS.

PERC Driver

I probably didn't really need to update the PERC (PowerEdge RAID Controller) firmware as I'm just going to run it in JBOD mode. But it was easy to do, a simple bash shell script to update it. Here are the perccli commands needed to access disks, it's all hot-plug so you can insert disks and do all this without a reboot:
# show overview
perccli show
# show controller 0 details
perccli /c0 show all
# show controller 0 info with less detail
perccli /c0 show
# clear all "foreign" RAID members
perccli /c0 /fall delete
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0
The perccli /c0 show command gives the following summary of disk ("PD" in perccli terminology) information amongst other information. The EID is the enclosure, Slt is the slot (i.e. the bay you plug the disk into) and the DID is the disk identifier (I'm not sure what happens if you have multiple enclosures). The allocation of device names (sda, sdb, etc) will be in order of EID:Slt or DID at boot time, and any drives added at run time will get the next letters available.
----------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                     Sp 
----------------------------------------------------------------------------------
32:0      0 Onln   0  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:1      1 Onln   1  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:3      3 Onln   2   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
32:4      4 Onln   3   3.637 TB SATA HDD N   N  512B WDC WD40EURX-64WRWY0      U  
32:5      5 Onln   5 278.875 GB SAS  HDD Y   N  512B ST300MM0026               U  
32:6      6 Onln   6 558.375 GB SAS  HDD N   N  512B AL13SXL600N               U  
32:7      7 Onln   4   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
----------------------------------------------------------------------------------
The PERC controller is a MegaRAID with possibly some minor changes, and there are reports of Linux MegaRAID management utilities working on it for similar functionality to perccli. The version of the MegaRAID utilities I tried didn't work on my PERC hardware. The smartctl utility works on those disks if you tell it you have a MegaRAID controller (so obviously there's enough similarity that some MegaRAID utilities will work). Here are example smartctl commands for the first and last disks on my system. Note that the disk device node doesn't matter as all device nodes associated with the PERC/MegaRAID are equal for smartctl.
# get model number etc on DID 0 (Samsung SSD)
smartctl -d megaraid,0 -i /dev/sda
# get all the basic information on DID 0
smartctl -d megaraid,0 -a /dev/sda
# get model number etc on DID 7 (Seagate 4TB disk)
smartctl -d megaraid,7 -i /dev/sda
# exactly the same output as the previous command
smartctl -d megaraid,7 -i /dev/sdc
I have uploaded etbemon version 1.3.5-6 to Debian which has support for monitoring the smartctl status of MegaRAID devices and NVMe devices.

iDRAC

To update the iDRAC on Linux there's a bash script with the firmware in the same file (binary data at the end of a shell script). To make things a little more exciting the script insists that rpm be available (running apt install rpm fixes that for a Debian system). It also creates and runs other shell scripts which start with #!/bin/sh but depend on bash syntax, so I had to make /bin/sh a symlink to /bin/bash (a sketch of that workaround appears after the idracadm7 commands below). You know you need this if you see errors like "typeset: not found" and "[: -eq: unexpected operator" and then the system reboots. Dell people, please test your scripts on dash (the Debian /bin/sh) or just specify #!/bin/bash. If the iDRAC update works it will take about 8 minutes.

Lifecycle Controller

The Lifecycle Controller is apparently for installing OS and firmware updates. I use Linux tools to update Linux and I generally don't plan to update the firmware after deployment (although I could do so from Linux if needed). So it doesn't seem to offer anything useful to me.

Setting Up iDRAC

For extra excitement I decided to try to set up the iDRAC from the Linux command-line. To install the RAC setup tool you run apt install srvadmin-idracadm7 libargtable2-0 (because srvadmin-idracadm7 doesn't have the right dependencies):
# srvadmin-idracadm7 is missing a dependency
apt install srvadmin-idracadm7 libargtable2-0
# set the IP address, netmask, and gateway for iDRAC
idracadm7 setniccfg -s 192.168.0.2 255.255.255.0 192.168.0.1
# put my name on the front panel LCD
idracadm7 set System.LCD.UserDefinedString "Russell Coker"
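As a hedged sketch of the /bin/sh workaround mentioned in the iDRAC section above (the firmware file name is a placeholder, and pointing /bin/sh at bash system-wide is a blunt instrument that should be reverted immediately afterwards):
# the Dell updater's helper scripts use bash syntax under #!/bin/sh,
# so temporarily make /bin/sh a symlink to bash before running it
apt install rpm
ln -sf bash /bin/sh
# run the combined script+firmware file downloaded from Dell (placeholder name)
./iDRAC-with-Lifecycle-Controller_Firmware_XXXXX_LN_A00.BIN
# restore the Debian default shell afterwards
ln -sf dash /bin/sh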
Conclusion

This is a very nice deskside workstation/server. It's extremely quiet with hardly any fan noise and the case is strong enough to contain the noise of hard drives. When running with 3*3.5" SATA disks and 2*10k 2.5" SAS disks on a wooden floor it wasn't annoyingly loud. Without the SAS disks it was as quiet as you can expect any PC to be, definitely not the volume you expect from a serious server! I bought the T320 systems loaded with SAS disks which made them quite loud; I immediately put those disks on eBay and installed SATA SSDs and hard drives, which gives me more performance and more space than the SAS disks at less cost and with almost no noise. 8*3.5" drive bays give room for expansion. I currently have 2*SATA SSDs and 3*SATA disks; the SSDs are for the root filesystem (including /home) and the disks are for a separate filesystem for large files.

6 June 2021

Russell Coker: Netflix and IPv6

It seems that Netflix has an ongoing issue of not working well with IPv6; apparently they have some sort of region checking code that doesn't correctly identify IPv6 prefixes. To fix this I wrote the following script to make a small zone file with only A records for Netflix and no AAAA records. The $OUT.header file just has the SOA record for my fake netflix.com domain.
#!/bin/bash
OUT=/etc/bind/data/netflix.com
HEAD=$OUT.header
cp $HEAD $OUT
dig -t a www.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/www IN/p" | grep "[0-9]$" >> $OUT
dig -t a android.prod.cloud.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/android.prod.cloud IN/p" | grep "[0-9]$" >> $OUT
/usr/sbin/rndc reload > /dev/null
Update: I updated this post to add a line for android.prod.cloud.netflix.com, which is the address used by Android devices.
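For reference, here is a minimal sketch of what the $OUT.header file and the matching BIND zone declaration might contain; the nameserver name and SOA values here are placeholders I made up, not the author's actual configuration:
# create a minimal SOA/NS header for the fake netflix.com zone (values are placeholders)
cat > /etc/bind/data/netflix.com.header << 'EOF'
$TTL 300
@ IN SOA ns.example.com. hostmaster.example.com. ( 1 3600 900 604800 300 )
@ IN NS ns.example.com.
EOF
# then serve the generated file as an authoritative zone, e.g. in named.conf.local:
# zone "netflix.com" { type master; file "/etc/bind/data/netflix.com"; };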

1 June 2021

Russell Coker: Internode NBN with Arris CM8200 on Debian

I've recently signed up for Internode NBN while using the Arris CM8200 device supplied by Optus (previously used for a regular phone service). I took the configuration mostly from Dean's great blog post on the topic [1]. One thing I changed was the /etc/network/interfaces configuration; I used the following:
# VLAN ID 2 for Internode's NBN HFC.
auto eth1.2
iface eth1.2 inet manual
  vlan-raw-device eth1
auto nbn
iface nbn inet ppp
    pre-up /bin/ip link set eth1.2 up
    provider nbn
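The nbn stanza above refers to a PPP provider file, /etc/ppp/peers/nbn. The post doesn't show that file, but a minimal sketch using standard pppd PPPoE options (the username is a placeholder, and the credentials themselves go in /etc/ppp/chap-secrets) might look like this:
# /etc/ppp/peers/nbn -- hypothetical sketch, not the author's actual file
plugin rp-pppoe.so eth1.2
user "username@internode.on.net"
noauth
defaultroute
persist
maxfail 0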
There is no need to have a section for eth1 when you have a section for eth1.2.

IPv6

IPv6 for only one system

With a line in /etc/ppp/options containing only "ipv6 ,", you get an IPv6 address automatically on the ppp0 interface after starting pppd.

IPv6 for your LAN

Internode has documented how to configure the WIDE DHCPv6 client to get an IPv6 prefix (subnet) [2]. Just install the wide-dhcpv6-client package, put your interface names in a copy of the Internode example config, and that works. That gets you a /64 assigned to your local Ethernet. Here's an example /etc/wide-dhcpv6/dhcp6c.conf:
interface ppp0 {
    send ia-pd 0;
    script "/etc/wide-dhcpv6/dhcp6c-script";
};
id-assoc pd {
    prefix-interface br0 {
        sla-id 0;
        sla-len 8;
    };
};
For providing addresses to other systems on your LAN they recommend radvd version 1.1 or greater; Debian/Bullseye will ship with version 2.18. Here is an example /etc/radvd.conf that will work with it. It seems that you have to manually (or with a script, see the sketch after this config) set the value to use in place of xxxx:xxxx:xxxx:xxxx to the prefix that is assigned to eth0 (or whichever interface you are using) by the wide-dhcpv6-client.
interface eth0 {
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 10;
        prefix xxxx:xxxx:xxxx:xxxx::/64 {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr on;
        };
};
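A minimal sketch of such a script, under my own assumptions that the delegated address on eth0 is written without "::" abbreviation in its first four groups and that /etc/radvd.conf.template is a copy of the config above (neither of which comes from the post):
#!/bin/bash
# take the global address that wide-dhcpv6-client assigned to eth0,
# keep the first 4 groups as the /64 prefix, and substitute it into the template
ADDR=$(ip -6 addr show dev eth0 scope global | awk '/inet6/ {print $2; exit}')
PREFIX=$(echo "$ADDR" | cut -d: -f1-4)
[ -n "$PREFIX" ] || exit 1
sed "s/xxxx:xxxx:xxxx:xxxx/$PREFIX/" /etc/radvd.conf.template > /etc/radvd.conf
systemctl restart radvd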
Either the configuration of the wide-dhcpv6 client or radvd removes the default route from ppp0, so you need to run a command like
ip -6 route add default dev ppp0
to put it back. Probably having "ipv6 ," in /etc/ppp/options is the wrong thing to do when using wide-dhcpv6-client and radvd. On a client machine with bridging I needed net.ipv6.conf.br0.accept_ra=2 in /etc/sysctl.conf to allow it to accept router advertisement messages on the interface (in this case eth0); for machines without bridging I didn't need that.

Firewalling

The default model for firewalling nowadays seems to be using NAT and only configuring specific ports to be forwarded to machines on the LAN. With IPv6 on the LAN every system can directly communicate with the rest of the world, which may be a bad thing. The following lines in a firewall script will drop all inbound packets that aren't in response to packets that are sent out. This will give an equivalent result to the NAT firewall people are used to, and you can always add more rules to allow specific ports in (a hedged example of such a rule follows the two lines below).
ip6tables -A FORWARD -i ppp+ -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A FORWARD -i ppp+ -j DROP
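As a hedged example of allowing a specific inbound service, a rule like the following (with a made-up LAN address) could be added before the final DROP rule to permit ssh to one machine:
# hypothetical example: allow inbound ssh to one LAN machine,
# appended before the DROP rule above
ip6tables -A FORWARD -i ppp+ -p tcp -d 2001:db8:1234:5678::10 --dport 22 -j ACCEPT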

31 May 2021

Russell Coker: Some Ideas About Storage Reliability

Hard Drive Brands

When people ask for advice about what storage to use they often get answers like "use brand X, it works well for me" and "brand Y had a heap of returns a few years ago". I'm not convinced there is any difference between the small number of manufacturers that are still in business. One problem we face with reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix, and even a fault that causes unreliable operation usually won't spread its damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk). Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can't test the new storage technology for even 3 years before releasing it, so their ability to prove the reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don't have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn't mean the next series of disks from the same manufacturer will have problems. The issues with SSDs are similar to the issues with hard drives but a little different. I'm not sure how much of the improvement in SSDs recently has been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don't expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in; presumably a SSHD or Hybrid Drive gets all the potential failures of hard drives and SSDs. One theoretical issue with SSDs is that filesystems have been (in theory at least) designed to cope with hard drive failure modes, not SSD failure modes. The problem with that theory is that most filesystems don't cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn't production ready last time I tested it). Some people are concerned that their filesystem won't support wear levelling for SSD use.
When a flash storage device is exposed to the OS via a block interface like SATA there isn't much possibility of wear levelling. If flash storage exposes that level of hardware detail to the OS then you need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device. Another common concern about SSD is that it will wear out from too many writes. Lots of people are using SSD for the ZIL (ZFS Intent Log) on the ZFS filesystem, which means that the SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a problem with SSDs wearing out I expect that ZFS users would be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD [1] (conclusion: it won't). Apart from the nameless brand SSD I mentioned previously, all of the SSDs in question are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1); I have yet to determine if this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about Wear_Leveling_Count (old age). Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs, and they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays. My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren't issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

NVMe

I think that NVMe isn't very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

Future Technology

The latest thing for high end servers is Optane Persistent Memory [2], also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket, gives performance somewhere between NVMe and RAM, and has capacity similar to NVMe. One of the ways of using this is "Memory Mode" where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level); this could make multiple terabytes of RAM not ridiculously expensive. Another way of using it is "App Direct Mode" where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is "Mixed Memory Mode" which has some DCPMM in Memory Mode and some in App Direct Mode. This has much potential for use for backups, and to make things extra exciting App Direct Mode has RAID-0 but no other form of RAID.

Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware and new software needed to talk to new hardware interfaces will have bugs and sometimes those bugs will lose data.
Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good; this is a very common failure mode. Backups are a good thing.
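As a small illustration of catching that failure mode in practice, a periodic scrub on a BTRFS filesystem reads the data and verifies the checksums, repairing from the good copy in a RAID-1; a minimal example (assuming the filesystem is mounted at /) is:
# read and verify all data and metadata checksums, waiting for completion
btrfs scrub start -B /
# show per-device counters of read, write, and corruption errors
btrfs device stats /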

30 May 2021

Russell Coker: Wifi Performance on Linux

Wifi usually just works. In the past I haven't had to worry much about performance, as for home use things have always been bearable and at work it's never been my job so I just file a bug report with the relevant people when things go wrong. But a few years ago I had some problems. For my home network I got a free Wifi AP which wasn't performing well. My AP supported 802.11 modes b/g or g/n (b, g, and n are the slow, medium, and fast speeds). I initially had the AP running in b/g mode because I had an 802.11b USB wifi device that I used. When I replaced that with one that did 802.11g I tried changing the AP to g/n mode, but performance was even worse on my laptop (although quite good on phones) so I switched back. For phones it appeared to work well, giving 54Mb/s, while on my laptop (a second hand Thinkpad X1 Carbon) it was giving 11Mb/s at best and often much less than that. The best demonstration of the problems was to start transferring a large file while pinging a system on the LAN the AP was connected to. Usually it would give ping times of 1s or more, sometimes 5s+ ping times. While this was happening the "Invalid misc" count increased rapidly, often by more than 100 per second. The results of Google searches suggest that "Invalid misc" is due to interference and recommend changing the channel. My AP had been on channel 1, which had performed poorly; channels 2-8 were ok, and channel 9 seemed reasonably good. As an aside, trying all channels manually is not a good idea, it takes a lot of time and gives little useful data. After changing to channel 9 it still only gave about 500KB/s when transferring large files with ping times of about 100ms, but that's a big improvement. I tried running "iwlist scanning" to scan the Wifi network for other APs; that showed that channel 1 was used a lot but didn't make it clear what I should do other than that. The next thing I tried was the Wifi Analyser app on Android [1] (which doesn't work on my latest phone, I don't know if it's still being actively maintained, it will definitely work on older phones). That has a nice graph mode that shows which channels are used and how the frequencies spread and interfere with other channels. One thing I hadn't realised before I looked at the graphs is that 802.11n uses 4 channels and interferes past that. If you have two 802.11n devices you don't have much space left out of the 14 channels available. To make more space I configured the Wifi AP in my ADSL modem to 802.11b/g mode and assigned it a channel away from the others, making 4 channels available with no interference. After that iwconfig reported between 60 and 120Mb/s and I got consistent transfer rates over 1.5MB/s while ping times remained below 100ms. The 5GHz frequency range is less congested, but at the time I didn't feel like buying 5GHz equipment. Since that time I have signed up with an ISP that had a good deal on a Wifi AP with 5GHz. Now I have all my devices configured to use 5GHz or 2.4GHz depending on which they think is best. So there are fewer devices on 2.4GHz and the AP is configured for 20MHz channel width in the 2.4GHz range (which means 802.11b/g).

Conclusion

802.11n seems to be a bad idea unless you run the only AP in an area. In a suburban area you will have 3 other houses broadcasting in your area and 802.11n is bad for everyone. The worst case scenario would be one person using 802.11n and interfering with everyone else's 802.11g and then having everyone else turn on 802.11n to try and make things faster.
5GHz is less congested as most people run old hardware. It also has a shorter range, which has the upside of getting less interference from other people. I'm considering installing 5GHz APs at both ends of my house and configuring all my new devices to not use 2.4GHz. Wifi spectrum analysis software is much better than manual testing of channels or trying to deduce things from the output of "iwlist scanning".
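That said, if a spectrum analyser isn't handy, a rough picture of which channels nearby APs occupy can be pulled from the scan output; a minimal sketch (the interface name wlan0 is an assumption):
# list nearby APs with their channel, frequency, and signal quality (usually needs root)
iwlist wlan0 scanning | grep -E "ESSID|Channel|Quality"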
