Search Results: "pascal"

30 August 2022

John Goerzen: The PC & Internet Revolution in Rural America

Inspired by several others (such as Alex Schroeder s post and Szcze uja s prompt), as well as a desire to get this down for my kids, I figure it s time to write a bit about living through the PC and Internet revolution where I did: outside a tiny town in rural Kansas. And, as I ve been back in that same area for the past 15 years, I reflect some on the challenges that continue to play out. Although the stories from the others were primarily about getting online, I want to start by setting some background. Those of you that didn t grow up in the same era as I did probably never realized that a typical business PC setup might cost $10,000 in today s dollars, for instance. So let me start with the background.

Nothing was easy This story begins in the 1980s. Somewhere around my Kindergarten year of school, around 1985, my parents bought a TRS-80 Color Computer 2 (aka CoCo II). It had 64K of RAM and used a TV for display and sound. This got you the computer. It didn t get you any disk drive or anything, no joysticks (required by a number of games). So whenever the system powered down, or it hung and you had to power cycle it a frequent event you d lose whatever you were doing and would have to re-enter the program, literally by typing it in. The floppy drive for the CoCo II cost more than the computer, and it was quite common for people to buy the computer first and then the floppy drive later when they d saved up the money for that. I particularly want to mention that computers then didn t come with a modem. What would be like buying a laptop or a tablet without wifi today. A modem, which I ll talk about in a bit, was another expensive accessory. To cobble together a system in the 80s that was capable of talking to others with persistent storage (floppy, or hard drive), screen, keyboard, and modem would be quite expensive. Adjusted for inflation, if you re talking a PC-style device (a clone of the IBM PC that ran DOS), this would easily be more expensive than the Macbook Pros of today. Few people back in the 80s had a computer at home. And the portion of those that had even the capability to get online in a meaningful way was even smaller. Eventually my parents bought a PC clone with 640K RAM and dual floppy drives. This was primarily used for my mom s work, but I did my best to take it over whenever possible. It ran DOS and, despite its monochrome screen, was generally a more capable machine than the CoCo II. For instance, it supported lowercase. (I m not even kidding; the CoCo II pretty much didn t.) A while later, they purchased a 32MB hard drive for it what luxury! Just getting a machine to work wasn t easy. Say you d bought a PC, and then bought a hard drive, and a modem. You didn t just plug in the hard drive and it would work. You would have to fight it every step of the way. The BIOS and DOS partition tables of the day used a cylinder/head/sector method of addressing the drive, and various parts of that those addresses had too few bits to work with the big drives of the day above 20MB. So you would have to lie to the BIOS and fdisk in various ways, and sort of work out how to do it for each drive. For each peripheral serial port, sound card (in later years), etc., you d have to set jumpers for DMA and IRQs, hoping not to conflict with anything already in the system. Perhaps you can now start to see why USB and PCI were so welcomed.

Sharing and finding resources Despite the two computers in our home, it wasn t as if software written on one machine just ran on another. A lot of software for PC clones assumed a CGA color display. The monochrome HGC in our PC wasn t particularly compatible. You could find a TSR program to emulate the CGA on the HGC, but it wasn t particularly stable, and there s only so much you can do when a program that assumes color displays on a monitor that can only show black, dark amber, or light amber. So I d periodically get to use other computers most commonly at an office in the evening when it wasn t being used. There were some local computer clubs that my dad took me to periodically. Software was swapped back then; disks copied, shareware exchanged, and so forth. For me, at least, there was no online to download software from, and selling software over the Internet wasn t a thing at all.

Three Different Worlds There were sort of three different worlds of computing experience in the 80s:
  1. Home users. Initially using a wide variety of software from Apple, Commodore, Tandy/RadioShack, etc., but eventually coming to be mostly dominated by IBM PC clones
  2. Small and mid-sized business users. Some of them had larger minicomputers or small mainframes, but most that I had contact with by the early 90s were standardized on DOS-based PCs. More advanced ones had a network running Netware, most commonly. Networking hardware and software was generally too expensive for home users to use in the early days.
  3. Universities and large institutions. These are the places that had the mainframes, the earliest implementations of TCP/IP, the earliest users of UUCP, and so forth.
The difference between the home computing experience and the large institution experience were vast. Not only in terms of dollars the large institution hardware could easily cost anywhere from tens of thousands to millions of dollars but also in terms of sheer resources required (large rooms, enormous power circuits, support staff, etc). Nothing was in common between them; not operating systems, not software, not experience. I was never much aware of the third category until the differences started to collapse in the mid-90s, and even then I only was exposed to it once the collapse was well underway. You might say to me, Well, Google certainly isn t running what I m running at home! And, yes of course, it s different. But fundamentally, most large datacenters are running on x86_64 hardware, with Linux as the operating system, and a TCP/IP network. It s a different scale, obviously, but at a fundamental level, the hardware and operating system stack are pretty similar to what you can readily run at home. Back in the 80s and 90s, this wasn t the case. TCP/IP wasn t even available for DOS or Windows until much later, and when it was, it was a clunky beast that was difficult. One of the things Kevin Driscoll highlights in his book called Modem World see my short post about it is that the history of the Internet we usually receive is focused on case 3: the large institutions. In reality, the Internet was and is literally a network of networks. Gateways to and from Internet existed from all three kinds of users for years, and while TCP/IP ultimately won the battle of the internetworking protocol, the other two streams of users also shaped the Internet as we now know it. Like many, I had no access to the large institution networks, but as I ve been reflecting on my experiences, I ve found a new appreciation for the way that those of us that grew up with primarily home PCs shaped the evolution of today s online world also.

An Era of Scarcity I should take a moment to comment about the cost of software back then. A newspaper article from 1985 comments that WordPerfect, then the most powerful word processing program, sold for $495 (or $219 if you could score a mail order discount). That s $1360/$600 in 2022 money. Other popular software, such as Lotus 1-2-3, was up there as well. If you were to buy a new PC clone in the mid to late 80s, it would often cost $2000 in 1980s dollars. Now add a printer a low-end dot matrix for $300 or a laser for $1500 or even more. A modem: another $300. So the basic system would be $3600, or $9900 in 2022 dollars. If you wanted a nice printer, you re now pushing well over $10,000 in 2022 dollars. You start to see one barrier here, and also why things like shareware and piracy if it was indeed even recognized as such were common in those days. So you can see, from a home computer setup (TRS-80, Commodore C64, Apple ][, etc) to a business-class PC setup was an order of magnitude increase in cost. From there to the high-end minis/mainframes was another order of magnitude (at least!) increase. Eventually there was price pressure on the higher end and things all got better, which is probably why the non-DOS PCs lasted until the early 90s.

Increasing Capabilities My first exposure to computers in school was in the 4th grade, when I would have been about 9. There was a single Apple ][ machine in that room. I primarily remember playing Oregon Trail on it. The next year, the school added a computer lab. Remember, this is a small rural area, so each graduating class might have about 25 people in it; this lab was shared by everyone in the K-8 building. It was full of some flavor of IBM PS/2 machines running DOS and Netware. There was a dedicated computer teacher too, though I think she was a regular teacher that was given somewhat minimal training on computers. We were going to learn typing that year, but I did so well on the very first typing program that we soon worked out that I could do programming instead. I started going to school early these machines were far more powerful than the XT at home and worked on programming projects there. Eventually my parents bought me a Gateway 486SX/25 with a VGA monitor and hard drive. Wow! This was a whole different world. It may have come with Windows 3.0 or 3.1 on it, but I mainly remember running OS/2 on that machine. More on that below.

Programming That CoCo II came with a BASIC interpreter in ROM. It came with a large manual, which served as a BASIC tutorial as well. The BASIC interpreter was also the shell, so literally you could not use the computer without at least a bit of BASIC. Once I had access to a DOS machine, it also had a basic interpreter: GW-BASIC. There was a fair bit of software written in BASIC at the time, but most of the more advanced software wasn t. I wondered how these .EXE and .COM programs were written. I could find vague references to DEBUG.EXE, assemblers, and such. But it wasn t until I got a copy of Turbo Pascal that I was able to do that sort of thing myself. Eventually I got Borland C++ and taught myself C as well. A few years later, I wanted to try writing GUI programs for Windows, and bought Watcom C++ much cheaper than the competition, and it could target Windows, DOS (and I think even OS/2). Notice that, aside from BASIC, none of this was free, and none of it was bundled. You couldn t just download a C compiler, or Python interpreter, or whatnot back then. You had to pay for the ability to write any kind of serious code on the computer you already owned.

The Microsoft Domination Microsoft came to dominate the PC landscape, and then even the computing landscape as a whole. IBM very quickly lost control over the hardware side of PCs as Compaq and others made clones, but Microsoft has managed in varying degrees even to this day to keep a stranglehold on the software, and especially the operating system, side. Yes, there was occasional talk of things like DR-DOS, but by and large the dominant platform came to be the PC, and if you had a PC, you ran DOS (and later Windows) from Microsoft. For awhile, it looked like IBM was going to challenge Microsoft on the operating system front; they had OS/2, and when I switched to it sometime around the version 2.1 era in 1993, it was unquestionably more advanced technically than the consumer-grade Windows from Microsoft at the time. It had Internet support baked in, could run most DOS and Windows programs, and had introduced a replacement for the by-then terrible FAT filesystem: HPFS, in 1988. Microsoft wouldn t introduce a better filesystem for its consumer operating systems until Windows XP in 2001, 13 years later. But more on that story later.

Free Software, Shareware, and Commercial Software I ve covered the high cost of software already. Obviously $500 software wasn t going to sell in the home market. So what did we have? Mainly, these things:
  1. Public domain software. It was free to use, and if implemented in BASIC, probably had source code with it too.
  2. Shareware
  3. Commercial software (some of it from small publishers was a lot cheaper than $500)
Let s talk about shareware. The idea with shareware was that a company would release a useful program, sometimes limited. You were encouraged to register , or pay for, it if you liked it and used it. And, regardless of whether you registered it or not, were told please copy! Sometimes shareware was fully functional, and registering it got you nothing more than printed manuals and an easy conscience (guilt trips for not registering weren t necessarily very subtle). Sometimes unregistered shareware would have a nag screen a delay of a few seconds while they told you to register. Sometimes they d be limited in some way; you d get more features if you registered. With games, it was popular to have a trilogy, and release the first episode inevitably ending with a cliffhanger as shareware, and the subsequent episodes would require registration. In any event, a lot of software people used in the 80s and 90s was shareware. Also pirated commercial software, though in the earlier days of computing, I think some people didn t even know the difference. Notice what s missing: Free Software / FLOSS in the Richard Stallman sense of the word. Stallman lived in the big institution world after all, he worked at MIT and what he was doing with the Free Software Foundation and GNU project beginning in 1983 never really filtered into the DOS/Windows world at the time. I had no awareness of it even existing until into the 90s, when I first started getting some hints of it as a port of gcc became available for OS/2. The Internet was what really brought this home, but I m getting ahead of myself. I want to say again: FLOSS never really entered the DOS and Windows 3.x ecosystems. You d see it make a few inroads here and there in later versions of Windows, and moreso now that Microsoft has been sort of forced to accept it, but still, reflect on its legacy. What is the software market like in Windows compared to Linux, even today? Now it is, finally, time to talk about connectivity!

Getting On-Line What does it even mean to get on line? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.

The telephone system By the 80s, there was one communication network that already reached into nearly every home in America: the phone system. Virtually every household (note I don t say every person) was uniquely identified by a 10-digit phone number. You could, at least in theory, call up virtually any other phone in the country and be connected in less than a minute. But I ve got to talk about cost. The way things worked in the USA, you paid a monthly fee for a phone line. Included in that monthly fee was unlimited local calling. What is a local call? That was an extremely complex question. Generally it meant, roughly, calling within your city. But of course, as you deal with things like suburbs and cities growing into each other (eg, the Dallas-Ft. Worth metroplex), things got complicated fast. But let s just say for simplicity you could call others in your city. What about calling people not in your city? That was long distance , and you paid often hugely by the minute for it. Long distance rates were difficult to figure out, but were generally most expensive during business hours and cheapest at night or on weekends. Prices eventually started to come down when competition was introduced for long distance carriers, but even then you often were stuck with a single carrier for long distance calls outside your city but within your state. Anyhow, let s just leave it at this: local calls were virtually free, and long distance calls were extremely expensive.

Getting a modem I remember getting a modem that ran at either 1200bps or 2400bps. Either way, quite slow; you could often read even plain text faster than the modem could display it. But what was a modem? A modem hooked up to a computer with a serial cable, and to the phone system. By the time I got one, modems could automatically dial and answer. You would send a command like ATDT5551212 and it would dial 555-1212. Modems had speakers, because often things wouldn t work right, and the telephone system was oriented around speech, so you could hear what was happening. You d hear it wait for dial tone, then dial, then hopefully the remote end would ring, a modem there would answer, you d hear the screeching of a handshake, and eventually your terminal would say CONNECT 2400. Now your computer was bridged to the other; anything going out your serial port was encoded as sound by your modem and decoded at the other end, and vice-versa. But what, exactly, was the other end? It might have been another person at their computer. Turn on local echo, and you can see what they did. Maybe you d send files to each other. But in my case, the answer was different: PC Magazine.

PC Magazine and CompuServe Starting around 1986 (so I would have been about 6 years old), I got to read PC Magazine. My dad would bring copies that were being discarded at his office home for me to read, and I think eventually bought me a subscription directly. This was not just a standard magazine; it ran something like 350-400 pages an issue, and came out every other week. This thing was a monster. It had reviews of hardware and software, descriptions of upcoming technologies, pages and pages of ads (that often had some degree of being informative to them). And they had sections on programming. Many issues would talk about BASIC or Pascal programming, and there d be a utility in most issues. What do I mean by a utility in most issues ? Did they include a floppy disk with software? No, of course not. There was a literal program listing printed in the magazine. If you wanted the utility, you had to type it in. And a lot of them were written in assembler, so you had to have an assembler. An assembler, of course, was not free and I didn t have one. Or maybe they wrote it in Microsoft C, and I had Borland C, and (of course) they weren t compatible. Sometimes they would list the program sort of in binary: line after line of a BASIC program, with lines like 64, 193, 253, 0, 53, 0, 87 that you would type in for hours, hopefully correctly. Running the BASIC program would, if you got it correct, emit a .COM file that you could then run. They did have a rudimentary checksum system built in, but it wasn t even a CRC, so something like swapping two numbers you d never notice except when the program would mysteriously hang. Eventually they teamed up with CompuServe to offer a limited slice of CompuServe for the purpose of downloading PC Magazine utilities. This was called PC MagNet. I am foggy on the details, but I believe that for a time you could connect to the limited PC MagNet part of CompuServe for free (after the cost of the long-distance call, that is) rather than paying for CompuServe itself (because, OF COURSE, that also charged you per the minute.) So in the early days, I would get special permission from my parents to place a long distance call, and after some nerve-wracking minutes in which we were aware every minute was racking up charges, I could navigate the menus, download what I wanted, and log off immediately. I still, incidentally, mourn what PC Magazine became. As with computing generally, it followed the mass market. It lost its deep technical chops, cut its programming columns, stopped talking about things like how SCSI worked, and so forth. By the time it stopped printing in 2009, it was no longer a square-bound 400-page beheamoth, but rather looked more like a copy of Newsweek, but with less depth.

Continuing with CompuServe CompuServe was a much larger service than just PC MagNet. Eventually, our family got a subscription. It was still an expensive and scarce resource; I d call it only after hours when the long-distance rates were cheapest. Everyone had a numerical username separated by commas; mine was 71510,1421. CompuServe had forums, and files. Eventually I would use TapCIS to queue up things I wanted to do offline, to minimize phone usage online. CompuServe eventually added a gateway to the Internet. For the sum of somewhere around $1 a message, you could send or receive an email from someone with an Internet email address! I remember the thrill of one time, as a kid of probably 11 years, sending a message to one of the editors of PC Magazine and getting a kind, if brief, reply back! But inevitably I had

The Godzilla Phone Bill Yes, one month I became lax in tracking my time online. I ran up my parents phone bill. I don t remember how high, but I remember it was hundreds of dollars, a hefty sum at the time. As I watched Jason Scott s BBS Documentary, I realized how common an experience this was. I think this was the end of CompuServe for me for awhile.

Toll-Free Numbers I lived near a town with a population of 500. Not even IN town, but near town. The calling area included another town with a population of maybe 1500, so all told, there were maybe 2000 people total I could talk to with a local call though far fewer numbers, because remember, telephones were allocated by the household. There was, as far as I know, zero modems that were a local call (aside from one that belonged to a friend I met in around 1992). So basically everything was long-distance. But there was a special feature of the telephone network: toll-free numbers. Normally when calling long-distance, you, the caller, paid the bill. But with a toll-free number, beginning with 1-800, the recipient paid the bill. These numbers almost inevitably belonged to corporations that wanted to make it easy for people to call. Sales and ordering lines, for instance. Some of these companies started to set up modems on toll-free numbers. There were few of these, but they existed, so of course I had to try them! One of them was a company called PennyWise that sold office supplies. They had a toll-free line you could call with a modem to order stuff. Yes, online ordering before the web! I loved office supplies. And, because I lived far from a big city, if the local K-Mart didn t have it, I probably couldn t get it. Of course, the interface was entirely text, but you could search for products and place orders with the modem. I had loads of fun exploring the system, and actually ordered things from them and probably actually saved money doing so. With the first order they shipped a monster full-color catalog. That thing must have been 500 pages, like the Sears catalogs of the day. Every item had a part number, which streamlined ordering through the modem.

Inbound FAXes By the 90s, a number of modems became able to send and receive FAXes as well. For those that don t know, a FAX machine was essentially a special modem. It would scan a page and digitally transmit it over the phone system, where it would at least in the early days be printed out in real time (because the machines didn t have the memory to store an entire page as an image). Eventually, PC modems integrated FAX capabilities. There still wasn t anything useful I could do locally, but there were ways I could get other companies to FAX something to me. I remember two of them. One was for US Robotics. They had an on demand FAX system. You d call up a toll-free number, which was an automated IVR system. You could navigate through it and select various documents of interest to you: spec sheets and the like. You d key in your FAX number, hang up, and US Robotics would call YOU and FAX you the documents you wanted. Yes! I was talking to a computer (of a sorts) at no cost to me! The New York Times also ran a service for awhile called TimesFax. Every day, they would FAX out a page or two of summaries of the day s top stories. This was pretty cool in an era in which I had no other way to access anything from the New York Times. I managed to sign up for TimesFax I have no idea how, anymore and for awhile I would get a daily FAX of their top stories. When my family got its first laser printer, I could them even print these FAXes complete with the gothic New York Times masthead. Wow! (OK, so technically I could print it on a dot-matrix printer also, but graphics on a 9-pin dot matrix is a kind of pain that is a whole other article.)

My own phone line Remember how I discussed that phone lines were allocated per household? This was a problem for a lot of reasons:
  1. Anybody that tried to call my family while I was using my modem would get a busy signal (unable to complete the call)
  2. If anybody in the house picked up the phone while I was using it, that would degrade the quality of the ongoing call and either mess up or disconnect the call in progress. In many cases, that could cancel a file transfer (which wasn t necessarily easy or possible to resume), prompting howls of annoyance from me.
  3. Generally we all had to work around each other
So eventually I found various small jobs and used the money I made to pay for my own phone line and my own long distance costs. Eventually I upgraded to a 28.8Kbps US Robotics Courier modem even! Yes, you heard it right: I got a job and a bank account so I could have a phone line and a faster modem. Uh, isn t that why every teenager gets a job? Now my local friend and I could call each other freely at least on my end (I can t remember if he had his own phone line too). We could exchange files using HS/Link, which had the added benefit of allowing split-screen chat even while a file transfer is in progress. I m sure we spent hours chatting to each other keyboard-to-keyboard while sharing files with each other.

Technology in Schools By this point in the story, we re in the late 80s and early 90s. I m still using PC-style OSs at home; OS/2 in the later years of this period, DOS or maybe a bit of Windows in the earlier years. I mentioned that they let me work on programming at school starting in 5th grade. It was soon apparent that I knew more about computers than anybody on staff, and I started getting pulled out of class to help teachers or administrators with vexing school problems. This continued until I graduated from high school, incidentally often to my enjoyment, and the annoyance of one particular teacher who, I must say, I was fine with annoying in this way. That s not to say that there was institutional support for what I was doing. It was, after all, a small school. Larger schools might have introduced BASIC or maybe Logo in high school. But I had already taught myself BASIC, Pascal, and C by the time I was somewhere around 12 years old. So I wouldn t have had any use for that anyhow. There were programming contests occasionally held in the area. Schools would send teams. My school didn t really send anybody, but I went as an individual. One of them was run by a local college (but for jr. high or high school students. Years later, I met one of the professors that ran it. He remembered me, and that day, better than I did. The programming contest had problems one could solve in BASIC or Logo. I knew nothing about what to expect going into it, but I had lugged my computer and screen along, and asked him, Can I write my solutions in C? He was, apparently, stunned, but said sure, go for it. I took first place that day, leading to some rather confused teams from much larger schools. The Netware network that the school had was, as these generally were, itself isolated. There was no link to the Internet or anything like it. Several schools across three local counties eventually invested in a fiber-optic network linking them together. This built a larger, but still closed, network. Its primary purpose was to allow students to be exposed to a wider variety of classes at high schools. Participating schools had an ITV room , outfitted with cameras and mics. So students at any school could take classes offered over ITV at other schools. For instance, only my school taught German classes, so people at any of those participating schools could take German. It was an early Zoom room. But alongside the TV signal, there was enough bandwidth to run some Netware frames. By about 1995 or so, this let one of the schools purchase some CD-ROM software that was made available on a file server and could be accessed by any participating school. Nice! But Netware was mainly about file and printer sharing; there wasn t even a facility like email, at least not on our deployment.

BBSs My last hop before the Internet was the BBS. A BBS was a computer program, usually ran by a hobbyist like me, on a computer with a modem connected. Callers would call it up, and they d interact with the BBS. Most BBSs had discussion groups like forums and file areas. Some also had games. I, of course, continued to have that most vexing of problems: they were all long-distance. There were some ways to help with that, chiefly QWK and BlueWave. These, somewhat like TapCIS in the CompuServe days, let me download new message posts for reading offline, and queue up my own messages to send later. QWK and BlueWave didn t help with file downloading, though.

BBSs get networked BBSs were an interesting thing. You d call up one, and inevitably somewhere in the file area would be a BBS list. Download the BBS list and you ve suddenly got a list of phone numbers to try calling. All of them were long distance, of course. You d try calling them at random and have a success rate of maybe 20%. The other 80% would be defunct; you might get the dreaded this number is no longer in service or the even more dreaded angry human answering the phone (and of course a modem can t talk to a human, so they d just get silence for probably the nth time that week). The phone company cared nothing about BBSs and recycled their numbers just as fast as any others. To talk to various people, or participate in certain discussion groups, you d have to call specific BBSs. That s annoying enough in the general case, but even more so for someone paying long distance for it all, because it takes a few minutes to establish a connection to a BBS: handshaking, logging in, menu navigation, etc. But BBSs started talking to each other. The earliest successful such effort was FidoNet, and for the duration of the BBS era, it remained by far the largest. FidoNet was analogous to the UUCP that the institutional users had, but ran on the much cheaper PC hardware. Basically, BBSs that participated in FidoNet would relay email, forum posts, and files between themselves overnight. Eventually, as with UUCP, by hopping through this network, messages could reach around the globe, and forums could have worldwide participation asynchronously, long before they could link to each other directly via the Internet. It was almost entirely volunteer-run.

Running my own BBS At age 13, I eventually chose to set up my own BBS. It ran on my single phone line, so of course when I was dialing up something else, nobody could dial up me. Not that this was a huge problem; in my town of 500, I probably had a good 1 or 2 regular callers in the beginning. In the PC era, there was a big difference between a server and a client. Server-class software was expensive and rare. Maybe in later years you had an email client, but an email server would be completely unavailable to you as a home user. But with a BBS, I could effectively run a server. I even ran serial lines in our house so that the BBS could be connected from other rooms! Since I was running OS/2, the BBS didn t tie up the computer; I could continue using it for other things. FidoNet had an Internet email gateway. This one, unlike CompuServe s, was free. Once I had a BBS on FidoNet, you could reach me from the Internet using the FidoNet address. This didn t support attachments, but then email of the day didn t really, either. Various others outside Kansas ran FidoNet distribution points. I believe one of them was mgmtsys; my memory is quite vague, but I think they offered a direct gateway and I would call them to pick up Internet mail via FidoNet protocols, but I m not at all certain of this.

Pros and Cons of the Non-Microsoft World As mentioned, Microsoft was and is the dominant operating system vendor for PCs. But I left that world in 1993, and here, nearly 30 years later, have never really returned. I got an operating system with more technical capabilities than the DOS and Windows of the day, but the tradeoff was a much smaller software ecosystem. OS/2 could run DOS programs, but it ran OS/2 programs a lot better. So if I were to run a BBS, I wanted one that had a native OS/2 version limiting me to a small fraction of available BBS server software. On the other hand, as a fully 32-bit operating system, there started to be OS/2 ports of certain software with a Unix heritage; most notably for me at the time, gcc. At some point, I eventually came across the RMS essays and started to be hooked.

Internet: The Hunt Begins I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?

Computer labs There was one place that tended to have Internet access: colleges and universities. In 7th grade, I participated in a program that resulted in me being invited to visit Duke University, and in 8th grade, I participated in National History Day, resulting in a trip to visit the University of Maryland. I probably sought out computer labs at both of those. My most distinct memory was finding my way into a computer lab at one of those universities, and it was full of NeXT workstations. I had never seen or used NeXT before, and had no idea how to operate it. I had brought a box of floppy disks, unaware that the DOS disks probably weren t compatible with NeXT. Closer to home, a small college had a computer lab that I could also visit. I would go there in summer or when it wasn t used with my stack of floppies. I remember downloading disk images of FLOSS operating systems: FreeBSD, Slackware, or Debian, at the time. The hash marks from the DOS-based FTP client would creep across the screen as the 1.44MB disk images would slowly download. telnet was also available on those machines, so I could telnet to things like public-access Archie servers and libraries though not Gopher. Still, FTP and telnet access opened up a lot, and I learned quite a bit in those years.

Continuing the Journey At some point, I got a copy of the Whole Internet User s Guide and Catalog, published in 1994. I still have it. If it hadn t already figured it out by then, I certainly became aware from it that Unix was the dominant operating system on the Internet. The examples in Whole Internet covered FTP, telnet, gopher all assuming the user somehow got to a Unix prompt. The web was introduced about 300 pages in; clearly viewed as something that wasn t page 1 material. And it covered the command-line www client before introducing the graphical Mosaic. Even then, though, the book highlighted Mosaic s utility as a front-end for Gopher and FTP, and even the ability to launch telnet sessions by clicking on links. But having a copy of the book didn t equate to having any way to run Mosaic. The machines in the computer lab I mentioned above all ran DOS and were incapable of running a graphical browser. I had no SLIP or PPP (both ways to run Internet traffic over a modem) connectivity at home. In short, the Web was something for the large institutional users at the time.

CD-ROMs As CD-ROMs came out, with their huge (for the day) 650MB capacity, various companies started collecting software that could be downloaded on the Internet and selling it on CD-ROM. The two most popular ones were Walnut Creek CD-ROM and Infomagic. One could buy extensive Shareware and gaming collections, and then even entire Linux and BSD distributions. Although not exactly an Internet service per se, it was a way of bringing what may ordinarily only be accessible to institutional users into the home computer realm.

Free Software Jumps In As I mentioned, by the mid 90s, I had come across RMS s writings about free software most probably his 1992 essay Why Software Should Be Free. (Please note, this is not a commentary on the more recently-revealed issues surrounding RMS, but rather his writings and work as I encountered them in the 90s.) The notion of a Free operating system not just in cost but in openness was incredibly appealing. Not only could I tinker with it to a much greater extent due to having source for everything, but it included so much software that I d otherwise have to pay for. Compilers! Interpreters! Editors! Terminal emulators! And, especially, server software of all sorts. There d be no way I could afford or run Netware, but with a Free Unixy operating system, I could do all that. My interest was obviously piqued. Add to that the fact that I could actually participate and contribute I was about to become hooked on something that I ve stayed hooked on for decades. But then the question was: which Free operating system? Eventually I chose FreeBSD to begin with; that would have been sometime in 1995. I don t recall the exact reasons for that. I remember downloading Slackware install floppies, and probably the fact that Debian wasn t yet at 1.0 scared me off for a time. FreeBSD s fantastic Handbook far better than anything I could find for Linux at the time was no doubt also a factor.

The de Raadt Factor Why not NetBSD or OpenBSD? The short answer is Theo de Raadt. Somewhere in this time, when I was somewhere between 14 and 16 years old, I asked some questions comparing NetBSD to the other two free BSDs. This was on a NetBSD mailing list, but for some reason Theo saw it and got a flame war going, which CC d me. Now keep in mind that even if NetBSD had a web presence at the time, it would have been minimal, and I would have not all that unusually for the time had no way to access it. I was certainly not aware of the, shall we say, acrimony between Theo and NetBSD. While I had certainly seen an online flamewar before, this took on a different and more disturbing tone; months later, Theo randomly emailed me under the subject SLIME saying that I was, well, SLIME . I seem to recall periodic emails from him thereafter reminding me that he hates me and that he had blocked me. (Disclaimer: I have poor email archives from this period, so the full details are lost to me, but I believe I am accurately conveying these events from over 25 years ago) This was a surprise, and an unpleasant one. I was trying to learn, and while it is possible I didn t understand some aspect or other of netiquette (or Theo s personal hatred of NetBSD) at the time, still that is not a reason to flame a 16-year-old (though he would have had no way to know my age). This didn t leave any kind of scar, but did leave a lasting impression; to this day, I am particularly concerned with how FLOSS projects handle poisonous people. Debian, for instance, has come a long way in this over the years, and even Linus Torvalds has turned over a new leaf. I don t know if Theo has. In any case, I didn t use NetBSD then. I did try it periodically in the years since, but never found it compelling enough to justify a large switch from Debian. I never tried OpenBSD for various reasons, but one of them was that I didn t want to join a community that tolerates behavior such as Theo s from its leader.

Moving to FreeBSD Moving from OS/2 to FreeBSD was final. That is, I didn t have enough hard drive space to keep both. I also didn t have the backup capacity to back up OS/2 completely. My BBS, which ran Virtual BBS (and at some point also AdeptXBBS) was deleted and reincarnated in a different form. My BBS was a member of both FidoNet and VirtualNet; the latter was specific to VBBS, and had to be dropped. I believe I may have also had to drop the FidoNet link for a time. This was the biggest change of computing in my life to that point. The earlier experiences hadn t literally destroyed what came before. OS/2 could still run my DOS programs. Its command shell was quite DOS-like. It ran Windows programs. I was going to throw all that away and leap into the unknown. I wish I had saved a copy of my BBS; I would love to see the messages I exchanged back then, or see its menu screens again. I have little memory of what it looked like. But other than that, I have no regrets. Pursuing Free, Unixy operating systems brought me a lot of enjoyment and a good career. That s not to say it was easy. All the problems of not being in the Microsoft ecosystem were magnified under FreeBSD and Linux. In a day before EDID, monitor timings had to be calculated manually and you risked destroying your monitor if you got them wrong. Word processing and spreadsheet software was pretty much not there for FreeBSD or Linux at the time; I was therefore forced to learn LaTeX and actually appreciated that. Software like PageMaker or CorelDraw was certainly nowhere to be found for those free operating systems either. But I got a ton of new capabilities. I mentioned the BBS didn t shut down, and indeed it didn t. I ran what was surely a supremely unique oddity: a free, dialin Unix shell server in the middle of a small town in Kansas. I m sure I provided things such as pine for email and some help text and maybe even printouts for how to use it. The set of callers slowly grew over the time period, in fact. And then I got UUCP.

Enter UUCP Even throughout all this, there was no local Internet provider and things were still long distance. I had Internet Email access via assorted strange routes, but they were all strange. And, I wanted access to Usenet. In 1995, it happened. The local ISP I mentioned offered UUCP access. Though I couldn t afford the dialup shell (or later, SLIP/PPP) that they offered due to long-distance costs, UUCP s very efficient batched processes looked doable. I believe I established that link when I was 15, so in 1995. I worked to register my domain,, as well. At the time, the process was a bit lengthy and involved downloading a text file form, filling it out in a precise way, sending it to InterNIC, and probably mailing them a check. Well I did that, and in September of 1995, became mine. I set up sendmail on my local system, as well as INN to handle the limited Usenet newsfeed I requested from the ISP. I even ran Majordomo to host some mailing lists, including some that were surprisingly high-traffic for a few-times-a-day long-distance modem UUCP link! The modem client programs for FreeBSD were somewhat less advanced than for OS/2, but I believe I wound up using Minicom or Seyon to continue to dial out to BBSs and, I believe, continue to use Learning Link. So all the while I was setting up my local BBS, I continued to have access to the text Internet, consisting of chiefly Gopher for me.

Switching to Debian I switched to Debian sometime in 1995 or 1996, and have been using Debian as my primary OS ever since. I continued to offer shell access, but added the WorldVU Atlantis menuing BBS system. This provided a return of a more BBS-like interface (by default; shell was still an uption) as well as some BBS door games such as LoRD and TradeWars 2002, running under DOS emulation. I also continued to run INN, and ran ifgate to allow FidoNet echomail to be presented into INN Usenet-like newsgroups, and netmail to be gated to Unix email. This worked pretty well. The BBS continued to grow in these days, peaking at about two dozen total user accounts, and maybe a dozen regular users.

Dial-up access availability I believe it was in 1996 that dial up PPP access finally became available in my small town. What a thrill! FINALLY! I could now FTP, use Gopher, telnet, and the web all from home. Of course, it was at modem speeds, but still. (Strangely, I have a memory of accessing the Web using WebExplorer from OS/2. I don t know exactly why; it s possible that by this time, I had upgraded to a 486 DX2/66 and was able to reinstall OS/2 on the old 25MHz 486, or maybe something was wrong with the timeline from my memories from 25 years ago above. Or perhaps I made the occasional long-distance call somewhere before I ditched OS/2.) Gopher sites still existed at this point, and I could access them using Netscape Navigator which likely became my standard Gopher client at that point. I don t recall using UMN text-mode gopher client locally at that time, though it s certainly possible I did.

The city Starting when I was 15, I took computer science classes at Wichita State University. The first one was a class in the summer of 1995 on C++. I remember being worried about being good enough for it I was, after all, just after my HS freshman year and had never taken the prerequisite C class. I loved it and got an A! By 1996, I was taking more classes. In 1996 or 1997 I stayed in Wichita during the day due to having more than one class. So, what would I do then but enjoy the computer lab? The CS dept. had two of them: one that had NCD X terminals connected to a pair of SunOS servers, and another one running Windows. I spent most of the time in the Unix lab with the NCDs; I d use Netscape or pine, write code, enjoy the University s fast Internet connection, and so forth. In 1997 I had graduated high school and that summer I moved to Wichita to attend college. As was so often the case, I shut down the BBS at that time. It would be 5 years until I again dealt with Internet at home in a rural community. By the time I moved to my apartment in Wichita, I had stopped using OS/2 entirely. I have no memory of ever having OS/2 there. Along the way, I had bought a Pentium 166, and then the most expensive piece of computing equipment I have ever owned: a DEC Alpha, which, of course, ran Linux.

ISDN I must have used dialup PPP for a time, but I eventually got a job working for the ISP I had used for UUCP, and then PPP. While there, I got a 128Kbps ISDN line installed in my apartment, and they gave me a discount on the service for it. That was around 3x the speed of a modem, and crucially was always on and gave me a public IP. No longer did I have to use UUCP; now I got to host my own things! By at least 1998, I was running a web server on, and I had an FTP server going as well.

Even Bigger Cities In 1999 I moved to Dallas, and there got my first broadband connection: an ADSL link at, I think, 1.5Mbps! Now that was something! But it had some reliability problems. I eventually put together a server and had it hosted at an acquantaince s place who had SDSL in his apartment. Within a couple of years, I had switched to various kinds of proper hosting for it, but that is a whole other article. In Indianapolis, I got a cable modem for the first time, with even tighter speeds but prohibitions on running servers on it. Yuck.

Challenges Being non-Microsoft continued to have challenges. Until the advent of Firefox, a web browser was one of the biggest. While Netscape supported Linux on i386, it didn t support Linux on Alpha. I hobbled along with various attempts at emulators, old versions of Mosaic, and so forth. And, until StarOffice was open-sourced as Open Office, reading Microsoft file formats was also a challenge, though WordPerfect was briefly available for Linux. Over the years, I have become used to the Linux ecosystem. Perhaps I use Gimp instead of Photoshop and digikam instead of well, whatever somebody would use on Windows. But I get ZFS, and containers, and so much that isn t available there. Yes, I know Apple never went away and is a thing, but for most of the time period I discuss in this article, at least after the rise of DOS, it was niche compared to the PC market.

Back to Kansas In 2002, I moved back to Kansas, to a rural home near a different small town in the county next to where I grew up. Over there, it was back to dialup at home, but I had faster access at work. I didn t much care for this, and thus began a 20+-year effort to get broadband in the country. At first, I got a wireless link, which worked well enough in the winter, but had serious problems in the summer when the trees leafed out. Eventually DSL became available locally highly unreliable, but still, it was something. Then I moved back to the community I grew up in, a few miles from where I grew up. Again I got DSL a bit better. But after some years, being at the end of the run of DSL meant I had poor speeds and reliability problems. I eventually switched to various wireless ISPs, which continues to the present day; while people in cities can get Gbps service, I can get, at best, about 50Mbps. Long-distance fees are gone, but the speed disparity remains.

Concluding Reflections I am glad I grew up where I did; the strong community has a lot of advantages I don t have room to discuss here. In a number of very real senses, having no local services made things a lot more difficult than they otherwise would have been. However, perhaps I could say that I also learned a lot through the need to come up with inventive solutions to those challenges. To this day, I think a lot about computing in remote environments: partially because I live in one, and partially because I enjoy visiting places that are remote enough that they have no Internet, phone, or cell service whatsoever. I have written articles like Tools for Communicating Offline and in Difficult Circumstances based on my own personal experience. I instinctively think about making protocols robust in the face of various kinds of connectivity failures because I experience various kinds of connectivity failures myself.

(Almost) Everything Lives On In 2002, Gopher turned 10 years old. It had probably been about 9 or 10 years since I had first used Gopher, which was the first way I got on live Internet from my house. It was hard to believe. By that point, I had an always-on Internet link at home and at work. I had my Alpha, and probably also at least PCMCIA Ethernet for a laptop (many laptops had modems by the 90s also). Despite its popularity in the early 90s, less than 10 years after it came on the scene and started to unify the Internet, it was mostly forgotten. And it was at that moment that I decided to try to resurrect it. The University of Minnesota finally released it under an Open Source license. I wrote the first new gopher server in years, pygopherd, and introduced gopher to Debian. Gopher lives on; there are now quite a few Gopher clients and servers out there, newly started post-2002. The Gemini protocol can be thought of as something akin to Gopher 2.0, and it too has a small but blossoming ecosystem. Archie, the old FTP search tool, is dead though. Same for WAIS and a number of the other pre-web search tools. But still, even FTP lives on today. And BBSs? Well, they didn t go away either. Jason Scott s fabulous BBS documentary looks back at the history of the BBS, while Back to the BBS from last year talks about the modern BBS scene. FidoNet somehow is still alive and kicking. UUCP still has its place and has inspired a whole string of successors. Some, like NNCP, are clearly direct descendents of UUCP. Filespooler lives in that ecosystem, and you can even see UUCP concepts in projects as far afield as Syncthing and Meshtastic. Usenet still exists, and you can now run Usenet over NNCP just as I ran Usenet over UUCP back in the day (which you can still do as well). Telnet, of course, has been largely supplanted by ssh, but the concept is more popular now than ever, as Linux has made ssh be available on everything from Raspberry Pi to Android. And I still run a Gopher server, looking pretty much like it did in 2002. This post also has a permanent home on my website, where it may be periodically updated.

17 April 2021

Steve Kemp: Having fun with CP/M on a Z80 single-board computer.

In the past, I've talked about building a Z80-based computer. I made some progress towards that goal, in the sense that I took the initial (trivial steps) towards making something: But then I stalled, repeatedly, at designing an interface to RAM and ROM, so that it could actually do something useful. Over the lockdown I've been in two minds about getting sucked back down the rabbit-hole, so I compromised. I did a bit of searching on tindie, and similar places, and figured I'd buy a Z80-based single board computer. My requirements were minimal: With those goals there were a bunch of boards to choose from, rc2014 is the standard choice - a well engineered system which uses a common backplane and lets you build mini-boards to add functionality. So first you build the CPU-card, then the RAM card, then the flash-disk card, etc. Over-engineered in one sense, extensible in another. (There are some single-board variants to cut down on soldering overhead, at a cost of less flexibility.) After a while I came across, which describes a simple board called the the Z80 playground. The advantage of this design is that it loads code from a USB stick, making it easy to transfer files to/from it, without the need for a compact flash card, or similar. The downside is that the system has only 64K RAM, meaning it cannot run CP/M 3, only 2.2. (CP/M 3.x requires more RAM, and a banking/paging system setup to swap between pages.) When the system boots it loads code from an EEPROM, which then fetches the CP/M files from the USB-stick, copies them into RAM and executes them. The memory map can be split so you either have ROM & RAM, or you have just RAM (after the boot the ROM will be switched off). To change the initial stuff you need to reprogram the EEPROM, after that it's just a matter of adding binaries to the stick or transferring them over the serial port. In only a couple of hours I got the basic stuff working as well as I needed: I had some fun with a CP/M emulator to get my hand back in things before the board arrived, and using that I tested my first "real" assembly language program (cls to clear the screen), as well as got the hang of using the wordstar keyboard shortcuts as used within the turbo pascal environment. I have some plans for development: Nothing major, but fun changes that won't be too difficult to implement. Since CP/M 2.x has no concept of sub-directories you end up using drives for everything, I implemented a "search-path" so that when you type "FOO" it will attempt to run "A:FOO.COM" if there is no file matching on the current-drive. That's a nicer user-experience at all. I also wrote some Z80-assembly code to search all drives for an executable, if not found in current drive and not already qualified. Remember CP/M doesn't have a concept of sub-directories) that's actually pretty useful:
I've also written some other trivial assembly language tools, which was surprisingly relaxing. Especially once I got back into the zen mode of optimizing for size. I forked the upstream repository, mostly to tidy up the contents, rather than because I want to go into my own direction. I'll keep the contents in sync, because there's no point splitting a community even further - I guess there are fewer than 100 of these boards in the wild, probably far far fewer!

9 September 2020

Reproducible Builds: Reproducible Builds in August 2020

Welcome to the August 2020 report from the Reproducible Builds project. In our monthly reports, we summarise the things that we have been up to over the past month. The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced from the original free software source code to the pre-compiled binaries we install on our systems. If you re interested in contributing to the project, please visit our main website.

This month, Jennifer Helsby launched a new website to address the lack of reproducibility of Python wheels. To quote Jennifer s accompanying explanatory blog post:
One hiccup we ve encountered in SecureDrop development is that not all Python wheels can be built reproducibly. We ship multiple (Python) projects in Debian packages, with Python dependencies included in those packages as wheels. In order for our Debian packages to be reproducible, we need that wheel build process to also be reproducible
Parallel to this, was also launched, a service that verifies the contents of URLs against a publicly recorded cryptographic log. It keeps an append-only log of the cryptographic digests of all URLs it has seen. (GitHub repo) On 18th September, Bernhard M. Wiedemann will give a presentation in German, titled Wie reproducible builds Software sicherer machen ( How reproducible builds make software more secure ) at the Internet Security Digital Days 2020 conference.

Reproducible builds at DebConf20 There were a number of talks at the recent online-only DebConf20 conference on the topic of reproducible builds. Holger gave a talk titled Reproducing Bullseye in practice , focusing on independently verifying that the binaries distributed from are made from their claimed sources. It also served as a general update on the status of reproducible builds within Debian. The video (145 MB) and slides are available. There were also a number of other talks that involved Reproducible Builds too. For example, the Malayalam language mini-conference had a talk titled , ? ( I want to join Debian, what should I do? ) presented by Praveen Arimbrathodiyil, the Clojure Packaging Team BoF session led by Elana Hashman, as well as Where is Salsa CI right now? that was on the topic of Salsa, the collaborative development server that Debian uses to provide the necessary tools for package maintainers, packaging teams and so on. Jonathan Bustillos (Jathan) also gave a talk in Spanish titled Un camino verificable desde el origen hasta el binario ( A verifiable path from source to binary ). (Video, 88MB)

Development work After many years of development work, the compiler for the Rust programming language now generates reproducible binary code. This generated some general discussion on Reddit on the topic of reproducibility in general. Paul Spooren posted a request for comments to OpenWrt s openwrt-devel mailing list asking for clarification on when to raise the PKG_RELEASE identifier of a package. This is needed in order to successfully perform rebuilds in a reproducible builds context. In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update. Chris Lamb provided some comments and pointers on an upstream issue regarding the reproducibility of a Snap / SquashFS archive file. [ ]

Debian Holger Levsen identified that a large number of Debian .buildinfo build certificates have been tainted on the official Debian build servers, as these environments have files underneath the /usr/local/sbin directory [ ]. He also filed against bug for debrebuild after spotting that it can fail to download packages from [ ]. This month, several issues were uncovered (or assisted) due to the efforts of reproducible builds. For instance, Debian bug #968710 was filed by Simon McVittie, which describes a problem with detached debug symbol files (required to generate a traceback) that is unlikely to have been discovered without reproducible builds. In addition, Jelmer Vernooij called attention that the new Debian Janitor tool is using the property of reproducibility (as well as diffoscope when applying archive-wide changes to Debian:
New merge proposals also include a link to the diffoscope diff between a vanilla build and the build with changes. Unfortunately these can be a bit noisy for packages that are not reproducible yet, due to the difference in build environment between the two builds. [ ]
56 reviews of Debian packages were added, 38 were updated and 24 were removed this month adding to our knowledge about identified issues. Specifically, Chris Lamb added and categorised the nondeterministic_version_generated_by_python_param and the lessc_nondeterministic_keys toolchain issues. [ ][ ] Holger Levsen sponsored Lukas Puehringer s upload of the python-securesystemslib pacage, which is a dependency of in-toto, a framework to secure the integrity of software supply chains. [ ] Lastly, Chris Lamb further refined his merge request against the debian-installer component to allow all arguments from sources.list files (such as [check-valid-until=no]) in order that we can test the reproducibility of the installer images on the Reproducible Builds own testing infrastructure and sent a ping to the team that maintains that code.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds. In August, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 155, 156, 157 and 158 to Debian:
  • New features:
    • Support extracting data of PGP signed data. (#214)
    • Try files named .pgp against pgpdump(1) to determine whether they are Pretty Good Privacy (PGP) files. (#211)
    • Support multiple options for all file extension matching. [ ]
  • Bug fixes:
    • Don t raise an exception when we encounter XML files with <!ENTITY> declarations inside the Document Type Definition (DTD), or when a DTD or entity references an external resource. (#212)
    • pgpdump(1) can successfully parse some binary files, so check that the parsed output contains something sensible before accepting it. [ ]
    • Temporarily drop gnumeric from the Debian build-dependencies as it has been removed from the testing distribution. (#968742)
    • Correctly use fallback_recognises to prevent matching .xsb binary XML files.
    • Correct identify signed PGP files as file(1) returns data . (#211)
  • Logging improvements:
    • Emit a message when ppudump version does not match our file header. [ ]
    • Don t use Python s repr(object) output in Calling external command messages. [ ]
    • Include the filename in the not identified by any comparator message. [ ]
  • Codebase improvements:
    • Bump Python requirement from 3.6 to 3.7. Most distributions are either shipping with Python 3.5 or 3.7, so supporting 3.6 is not only somewhat unnecessary but also cumbersome to test locally. [ ]
    • Drop some unused imports [ ], drop an unnecessary dictionary comprehensions [ ] and some unnecessary control flow [ ].
    • Correct typo of output in a comment. [ ]
  • Release process:
    • Move generation of debian/tests/control to an external script. [ ]
    • Add some URLs for the site that will appear on [ ]
    • Update author and author email in for and similar. [ ]
  • Testsuite improvements:
    • Update PPU tests for compatibility with Free Pascal versions 3.2.0 or greater. (#968124)
    • Mark that our identification test for .ppu files requires ppudump version 3.2.0 or higher. [ ]
    • Add an assert_diff helper that loads and compares a fixture output. [ ][ ][ ][ ]
  • Misc:
In addition, Mattia Rizzolo documented in that diffoscope works with Python version 3.8 [ ] and Frazer Clews applied some Pylint suggestions [ ] and removed some deprecated methods [ ].

Website This month, Chris Lamb updated the main Reproducible Builds website and documentation to:
  • Clarify & fix a few entries on the who page [ ][ ] and ensure that images do not get to large on some viewports [ ].
  • Clarify use of a pronoun re. Conservancy. [ ]
  • Use View all our monthly reports over View all monthly reports . [ ]
  • Move a is a suffix out of the link target on the SOURCE_DATE_EPOCH age. [ ]
In addition, Javier Jard n added the freedesktop-sdk project [ ] and Kushal Das added SecureDrop project [ ] to our projects page. Lastly, Michael P hn added internationalisation and translation support with help from Hans-Christoph Steiner [ ].

Testing framework The Reproducible Builds project operate a Jenkins-based testing framework to power This month, Holger Levsen made the following changes:
  • System health checks:
    • Improve explanation how the status and scores are calculated. [ ][ ]
    • Update and condense view of detected issues. [ ][ ]
    • Query the canonical configuration file to determine whether a job is disabled instead of duplicating/hardcoding this. [ ]
    • Detect several problems when updating the status of reporting-oriented metapackage sets. [ ]
    • Detect when diffoscope is not installable [ ] and failures in DNS resolution [ ].
  • Debian:
    • Update the URL to the Debian security team bug tracker s Git repository. [ ]
    • Reschedule the unstable and bullseye distributions often for the arm64 architecture. [ ]
    • Schedule buster less often for armhf. [ ][ ][ ]
    • Force the build of certain packages in the work-in-progress package rebuilder. [ ][ ]
    • Only update the stretch and buster base build images when necessary. [ ]
  • Other distributions:
    • For F-Droid, trigger jobs by commits, not by a timer. [ ]
    • Disable the Archlinux HTML page generation job as it has never worked. [ ]
    • Disable the alternative OpenWrt rebuilder jobs. [ ]
  • Misc;
Many other changes were made too, including:
  • Chris Lamb:
    • Use <pre> HTML tags when dumping fixed-width debugging data in the self-serve package scheduler. [ ]
  • Mattia Rizzolo:
  • Vagrant Cascadian:
    • Mark that the u-boot Universal Boot Loader should not build architecture independent packages on the arm64 architecture anymore. [ ]
Finally, build node maintenance was performed by Holger Levsen [ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ]

Mailing list On our mailing list this month, Leo Wandersleb sent a message to the list after he was wondering how to expand his project (which aims to improve the security of Bitcoin wallets) from Android wallets to also monitor Linux wallets as well:
If you think you know how to spread the word about reproducibility in the context of Bitcoin wallets through WalletScrutiny, your contributions are highly welcome on this PR [ ]
Julien Lepiller posted to the list linking to a blog post by Tavis Ormandy titled You don t need reproducible builds. Morten Linderud (foxboron) responded with a clear rebuttal that Tavis was only considering the narrow use-case of proprietary vendors and closed-source software. He additionally noted that the criticism that reproducible builds cannot prevent against backdoors being deliberately introduced into the upstream source ( bugdoors ) are decidedly (and deliberately) outside the scope of reproducible builds to begin with. Chris Lamb included the Reproducible Builds mailing list in a wider discussion regarding a tentative proposal to include .buildinfo files in .deb packages, adding his remarks regarding requiring a custom tool in order to determine whether generated build artifacts are identical in a reproducible context. [ ] Jonathan Bustillos (Jathan) posted a quick email to the list requesting whether there was a list of To do tasks in Reproducible Builds. Lastly, Chris Lamb responded at length to a query regarding the status of reproducible builds for Debian ISO or installation images. He noted that most of the technical work has been performed but there are at least four issues until they can be generally advertised as such . He pointed that the privacy-oriented Tails operation system, which is based directly on Debian, has had reproducible builds for a number of years now. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

31 August 2020

Chris Lamb: Free software activities in August 2020

Here is another monthly update covering what I have been doing in the free software world during August 2020 (previous month): I uploaded Lintian versions 2.86.0, 2.87.0, 2.88.0, 2.89.0, 2.90.0, 2.91.0 and 2.92.0, as well as made the following changes:

Reproducible Builds One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. The project is proud to be a member project of the Software Freedom Conservancy. Conservancy acts as a corporate umbrella allowing projects to operate as non-profit initiatives without managing their own corporate structure. If you like the work of the Conservancy or the Reproducible Builds project, please consider becoming an official supporter. This month, I:

diffoscope I made the following changes to diffoscope, including preparing and uploading versions 155, 156, 157 and 158 to Debian:

Debian Debian LTS This month I have worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project. You can find out more about the project via the following video:

Uploads to Debian

14 August 2020

Reproducible Builds (diffoscope): diffoscope 156 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 156. This version includes the following changes:
[ Chris Lamb ]
* Update PPU tests for compatibility with Free Pascal versions 3.2.0 or
  greater. (Closes: #968124)
* Emit a debug-level logging message when our ppudump(1) version does not
  match file header.
* Add and use an assert_diff helper that loads and compares a fixture output
  to avoid a bunch of test boilerplate.
[ Frazer Clews ]
* Apply some pylint suggestions to the codebase.
You find out more by visiting the project homepage.

1 April 2020

Joachim Breitner: 30 years of Haskell

Vitaly Bragilevsky, in a mail to the GHC Steering Committee, reminded me that the first version of the Haskell programming language was released exactly 30 years ago. On April 1st. So that raises the question: Was Haskell just an April fool's joke that was never retracted?
The cover of the 1.0 Haskell report

The cover of the 1.0 Haskell report

My own first exposure to Haskell was in April 2005; the oldest piece of Haskell I could find on my machine is this part of a university assignment from April:
> pascal 1 = [1]
> pascal (n+1) = zipWith (+) (x ++ [0]) (0 : x) where x = pascal n
This means that I now have witnessed half of Haskell's existence. I have never regretted getting into Haskell, and every time I come back from having worked in other languages (which all have their merits too), I greatly enjoy the beauty and elegance of expressing my ideas in a lazy and strictly typed language with a concise syntax. I am looking forward to witnessing (and, to a very small degree, shaping) the next 15 years of Haskell.

31 July 2017

Chris Lamb: Free software activities in July 2017

Here is my monthly update covering what I have been doing in the free software world during July 2017 (previous month): I also blogged about my recent lintian hacking and installation-birthday package.
Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to permit verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. (I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area.) This month I:
  • Assisted Mattia with a draft of an extensive status update to the debian-devel-announce mailing list. There were interesting follow-up discussions on Hacker News and Reddit.
  • Submitted the following patches to fix reproducibility-related toolchain issues within Debian:
  • I also submitted 5 patches to fix specific reproducibility issues in autopep8, castle-game-engine, grep, libcdio & tinymux.
  • Categorised a large number of packages and issues in the Reproducible Builds "notes" repository.
  • Worked on publishing our weekly reports. (#114 #115, #116 & #117)

I also made the following changes to our tooling:

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

  • comparators.xml:
    • Fix EPUB "missing file" tests; they ship a META-INF/container.xml file. [ ]
    • Misc style fixups. [ ]
  • APK files can also be identified as "DOS/MBR boot sector". (#868486)
  • comparators.sqlite: Simplify file detection by rewriting manual recognizes call with a Sqlite3Database.RE_FILE_TYPE definition. [ ]
    • Revert the removal of a try-except. (#868534)
    • Tidy module. [ ]


strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build.

  • Add missing File::Temp imports in the JAR and PNG handlers. This appears to have been exposed by lazily-loading handlers in #867982. (#868077) is my experiment into how to process, store and distribute .buildinfo files after the Debian archive software has processed them.

  • Avoid a race condition between check-and-creation of Buildinfo instances. [ ]

Debian My activities as the current Debian Project Leader are covered in my "Bits from the DPL emails to the debian-devel-announce mailing list.
Patches contributed
  • obs-studio: Remove annoying "click wrapper" on first startup. (#867756)
  • vim: Syntax highlighting for debian/copyright files. (#869965)
  • moin: Incorrect timezone offset applied due to "84600" typo. (#868463)
  • ssss: Add a simple autopkgtest. (#869645)
  • dch: Please bump $latest_bpo_dist to current stable release. (#867662)
  • python-kaitaistruct: Remove Markdown and homepage references from package long descriptions. (#869265)
  • album-data: Correct invalid Vcs-Git URI. (#869822)
  • pytest-sourceorder: Update Homepage field. (#869125)
I also made a very large number of contributions to the Lintian static analysis tool. To avoid duplication here, I have outlined them in a separate post.

Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Issued DLA 1014-1 for libclamunrar, a library to add unrar support to the Clam anti-virus software to fix an arbitrary code execution vulnerability.
  • Issued DLA 1015-1 for the libgcrypt11 crypto library to fix a "sliding windows" information leak.
  • Issued DLA 1016-1 for radare2 (a reverse-engineering framework) to prevent a remote denial-of-service attack.
  • Issued DLA 1017-1 to fix a heap-based buffer over-read in the mpg123 audio library.
  • Issued DLA 1018-1 for the sqlite3 database engine to prevent a vulnerability that could be exploited via a specially-crafted database file.
  • Issued DLA 1019-1 to patch a cross-site scripting (XSS) exploit in phpldapadmin, a web-based interface for administering LDAP servers.
  • Issued DLA 1024-1 to prevent an information leak in nginx via a specially-crafted HTTP range.
  • Issued DLA 1028-1 for apache2 to prevent the leakage of potentially confidential information via providing Authorization Digest headers.
  • Issued DLA 1033-1 for the memcached in-memory object caching server to prevent a remote denial-of-service attack.

  • redis:
    • 4:4.0.0-1 Upload new major upstream release to unstable.
    • 4:4.0.0-2 Make /usr/bin/redis-server in the primary package a symlink to /usr/bin/redis-check-rdb in the redis-tools package to prevent duplicate debug symbols that result in a package file collision. (#868551)
    • 4:4.0.0-3 Add -latomic to LDFLAGS to avoid a FTBFS on the mips & mipsel architectures.
    • 4:4.0.1-1 New upstream version. Install 00-RELEASENOTES as the upstream changelog.
    • 4:4.0.1-2 Skip non-deterministic tests that rely on timing. (#857855)
  • python-django:
    • 1:1.11.3-1 New upstream bugfix release. Check DEB_BUILD_PROFILES consistently, not DEB_BUILD_OPTIONS.
  • bfs:
    • 1.0.2-2 & 1.0.2-3 Use help2man to generate a manpage.
    • 1.0.2-4 Set hardening=+all for bindnow, etc.
    • 1.0.2-5 & 1.0.2-6 Don't use upstream's release target as it overrides our CFLAGS & install as the upstream changelog.
    • 1.1-1 New upstream release.
  • libfiu:
    • 0.95-4 Apply patch from Steve Langasek to fix autopkgtests. (#869709)
  • python-daiquiri:
    • 1.0.1-1 Initial upload. (ITP)
    • 1.1.0-1 New upstream release.
    • 1.1.0-2 Tidy package long description.
    • 1.2.1-1 New upstream release.

I also reviewed and sponsored the uploads of gtts-token 1.1.1-1 and nlopt 2.4.2+dfsg-3.

Debian bugs filed
  • ITP: python-daiquiri Python library to easily setup basic logging functionality. (#867322)
  • twittering-mode: Correct incorrect time formatting due to "84600" typo. (#868479)

10 September 2016

Sylvain Le Gall: Release of OASIS 0.4.7

I am happy to announce the release of OASIS v0.4.7. Logo OASIS small OASIS is a tool to help OCaml developers to integrate configure, build and install systems in their projects. It should help to create standard entry points in the source code build system, allowing external tools to analyse projects easily. This tool is freely inspired by Cabal which is the same kind of tool for Haskell. You can find the new release here and the changelog here. More information about OASIS in general on the OASIS website. Pull request for inclusion in OPAM is pending. Here is a quick summary of the important changes: Features: This version contains a lot of changes and is the achievement of a huge amount of work. The addition of OMake as a plugin is a huge progress. The overall work has been targeted at making OASIS more library like. This is still a work in progress but we made some clear improvement by getting rid of various side effect (like the requirement of using "chdir" to handle the "-C", which leads to propage ~ctxt everywhere and design OASISFileSystem). I would like to thanks again the contributor for this release: Spiros Eliopoulos, Paul Snively, Jeremie Dimino, Christopher Zimmermann, Christophe Troestler, Max Mouratov, Jacques-Pascal Deplaix, Geoff Shannon, Simon Cruanes, Vladimir Brankov, Gabriel Radanne, Evgenii Lepikhin, Petter Urkedal, Gerd Stolpmann and Anton Bachin.

6 August 2016

Mirco Bauer: Ethereum GPU Mining on Linux How-To

TL;DR Install/use Debian 8 or Ubuntu 16.0.4 then execute:
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:ethereum/ethereum
sudo sed 's/jessie/vivid/' -i /etc/apt/sources.list.d/ethereum-ethereum-*.list
sudo apt-get update
sudo apt-get install ethereum ethminer
geth account new
# copy long character sequence within  , that is your <YOUR_WALLET_ADDRESS>
# if you lose the passphrase, you lose your coins!
sudo apt-get install linux-headers-amd64 build-essential
chmod +x
ethminer -G -F<YOUR_WALLET_ADDRESS> --farm-recheck 200
echo done
My Attention Span is > 60 seconds Ethereum is a crypto currency similar to Bitcoin as it is based on the blockchain technology. Ethereum is not yet another Bitcoin clone though, since it has an additional feature called Smart Contracts that makes it unique and very promising. I am not going into details how Ethereum works, you can get that into great detail on the Internet. This post is about Ethereum mining. Mining is how crypto coins are created. You need to spent computing time to get coins out. At the beginning CPU mining was sufficient, but as the Ethereum network difficulty has increased you need to use GPUs as they can calculate at a much higher hashrate than a general purpose CPU can do. About 2 months ago I bought a new gaming rig, with a Nvidia GTX 1070 so I can experience virtual-reality gaming with a HTC Vive at a great framerate. As it turns out modern graphics cards are very good at hashing so I gave it a spin. Initially I did this mining setup with Windows 10, as that is the operating system on my gaming rig. If you want to do Ethereum mining using your GPU, then you really want to use Linux. On Windows the GTX 1070 produced a hashrate of 6 MH/s (megahashes per second) while the same hardware does 25 MH/s on Linux. The hashrate multiplied by 4 by using Linux instead of Windows. Sounds good? Keep reading and follow this guide. You have to pick a Linux distro to use for mining. As I am a Debian developer, all my systems run Debian, which is what I am also using for this guide. The same procedure can be done for Ubuntu as it is similar enough. For other distros you have to substitute the steps yourself. So I assume you already have Debian 8 or Ubuntu 16.04 installed on your system.

Install Ethereum Software First we need the geth tool which is the main Ethereum "client". Ethereum is really a peer-to-peer network, that means each node is a server and client at the same time. A node that contains the complete blockchain history in a database is called a full node. For this guide you don't need to run a full node, as mining pools do this for you. We still need geth to create the private key of your Ethereum wallet. Somewhere we have to receive the coins we are mining ;) Add the Ethereum APT repository using these commands:
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:ethereum/ethereum
sudo apt-get update
On Debian 8 (on Ubuntu you can skip this) you need to replace the repository name with this command:
sudo sed 's/jessie/vivid/' -i /etc/apt/sources.list.d/ethereum-ethereum-*.list
sudo apt-get update
Install ethereum, ethminer and geth:
sudo apt-get install ethereum ethminer geth

Create Ethereum Wallet A wallet is where coins are "stored". They are not really stored in the wallet because the wallet is just a private key that nobody has. The balance of that wallet is visible to everyone using the blockchain database. And this is what full nodes do, they contain and distribute the database to all other peers. So this this command to create your first private key for your wallet:
geth account new
Be aware, that this passphrase protects the private key of your wallet. Anyone who has access to that file and knows your passphrase will have full control over your coins. And also do not forget the passphrase, as if you do, you lost all your coins! The output of "geth account new" shows a long character/number sequence quoted in . This is your wallet address and you should write that number down, as if someone wants to send you money, then it is to that address. We will use that for the mining pool later.

Install (proprietary) nvidia driver For OpenCL to work with nvidia graphics cards, like my GTX 1070, you need to install this proprietary driver from nvidia. If you have an older card maybe the opensource drivers will work for you. For the nvidia pascal cards numbers 10xx you will need this driver package. After you have agreed the terms, download the file. But before we can use that installer we need to install some dependencies that installer needs as it will have to compile a Linux kernel module for you. Install the dependencies using this command:
sudo apt-get install linux-headers-amd64 build-essential
Now we can make the installer executable and run it like this:
chmod +x
If that step completed without error, then we should be able to run the mining benchmark!
ethminer -M -G
The -M means "run benchmark" and the -G is for GPU mining. The first time you run it it will create a DAG file and that will takes a while. For me it took about 12 minutes on my GTX 1070. After that is should show a inner mean hashrate. If it says H/s that is hashes per second and KH is kilo (H/1000) and MH is megahashes per second (KH/1000). I had numbers around 25-30 MH/s, but for real mining you will see an average that is a balanced number and not a min/max range.

Pick Ethereum Network Now it gets serious, you need to decide 2 things. First which Ethereum network you want to mine for and the second is using which pool. Ethereum has 2 networks, one is called Ethereum One or Core, while the other is called Ethereum Classic. Ethereum has made a hardfork to undo the consequences of a software bug in the DAO. The DAO is a smart contract for a decentralized organization. Because of that bug, a blackhat could use that bug to obtain money from that DAO. The Ethereum developers made a poll and decided that the consequences will be undone. Not everyone agreed and the old network stayed alive and is now called Ethereum Classic short ETC. The hardfork kept its short name ETH. This is important to understand for mining, because the hashing difficulty has a huge difference between ETH and ETC. As of writing, the hashrate of ETC is at 20% compared to ETH. Thus you need less computing time to get ETC coins and more time to get ETH coins. Differently said, ETC mining is currently more profitable.

Pick a Pool Hmmmm, I want a swimming pool, thanks! Just kidding... You can mine without a pool, that is called solo mining, but you will get less reward. A mining pool are multiple computers that work on the same block to find a solution quicker than others. The pool has an aggregated hashrate that is higher than other solo miners. Each found block by anyone in this pool will be rewarded to everyone in the pool. The reward of 5 ether currently per block gets split in the same ratio of hashrate each member provides (minus the pool fee). So while you get less for a found block, you still have a steady lower income rate instead of higher with less chance of finding one (in time). Simply said: you have to find a new block faster than the others to receive the reward. If you want to mine Ethereum Classic (ETC) use one of the pools listed here (at the bottom of the page). If you want to mine Ethereum One / Core (ETH) use one of the pools listed here.

Run ethminer The instruction page of the pool website usually says how to start the miner program, but here is an example of the pool that I use (because pony!):
ethminer -G -F<YOUR_WALLET_ADDRESS> --farm-recheck 200

Profit If this guide was helpful for you, you can tip me at ethereum:0x9ec1220d2f2fadd3f0c96e3007daa827bc83fbd6 or simply run the ethminer using my wallet address for a day or two:
ethminer -G -F --farm-recheck 200
Happy mining!

20 July 2016

Daniel Stender: Theano in Debian: maintenance, BLAS and CUDA

I'm glad to announce that we have the current release of Theano (0.8.2) in Debian unstable now, it's on its way into the testing branch and the Debian derivatives, heading for Debian 9. The Debian package is maintained in behalf of the Debian Science Team. We have a binary package with the modules in the Python 2.7 import path (python-theano), if you want or need to stick to that branch a little longer (as a matter of fact, in the current popcon stats it's the most installed package), and a package running on the default Python 3 version (python3-theano). The comprehensive documentation is available for offline usage in another binary package (theano-doc). Although Theano builds its extensions on run time and therefore all binary packages contain the same code, the source package generates arch specific packages1 for the reason that the exhaustive test suite could run over all the architectures to detect if there are problems somewhere (#824116). what's this? In a nutshell, Theano is a computer algebra system (CAS) and expression compiler, which is implemented in Python as a library. It is named after a Classical Greek female mathematician and it's developed at the LISA lab (located at MILA, the Montreal Institute for Learning Algorithms) at the Universit de Montr al. Theano tightly integrates multi-dimensional arrays (N-dimensional, ND-array) from NumPy (numpy.ndarray), which are broadly used in Scientific Python for the representation of numeric data. It features a declarative Python based language with symbolic operations for the functional definition of mathematical expressions, which allows to create functions that compute values for them. Internally the expressions are represented as directed graphs with nodes for variables and operations. The internal compiler then optimizes those graphs for stability and speed and then generates high-performance native machine code to evaluate resp. compute these mathematical expressions2. One of the main features of Theano is that it's capable to compute also on GPU processors (graphical processor unit), like on custom graphic cards (e.g. the developers are using a GeForce GTX Titan X for benchmarks). Today's GPUs became very powerful parallel floating point devices which can be employed also for scientific computations instead of 3D video games3. The acronym "GPGPU" (general purpose graphical processor unit) refers to special cards like NVIDIA's Tesla4, which could be used alike (more on that below). Thus, Theano is a high-performance number cruncher with an own computing engine which could be used for large-scale scientific computations. If you haven't came across Theano as a Pythonistic professional mathematician, it's also one of the most prevalent frameworks for implementing deep learning applications (training multi-layered, "deep" artificial neural networks, DNN) around5, and has been developed with a focus on machine learning from the ground up. There are several higher level user interfaces build in the top of Theano (for DNN, Keras, Lasagne, Blocks, and others, or for Python probalistic programming, PyMC3). I'll seek for some of them also becoming available in Debian, too. helper scripts Both binary packages ship three convenience scripts, theano-cache, theano-test, and theano-nose. Instead of them being copied into /usr/bin, which would result into a binaries-have-conflict violation, the scripts are to be found in /usr/share/python-theano (python3-theano respectively), so that both module packages of Theano can be installed at the same time. The scripts could be run directly from these folders, e.g. do $ python /usr/share/python-theano/theano-nose to achieve that. If you're going to heavy use them, you could add the directory of the flavour you prefer (Python 2 or Python 3) to the $PATH environment variable manually by either typing e.g. $ export PATH=/usr/share/python-theano:$PATH on the prompt, or save that line into ~/.bashrc. Manpages aren't available for these little helper scripts6, but you could always get info on what they do and which arguments they accept by invoking them with the -h (for theano-nose) resp. help flag (for theano-cache). running the tests On some occasions you might want to run the testsuite of the installed library, like to check over if everything runs fine on your GPU hardware. There are two different ways to run the tests (anyway you need to have python ,3 -nose installed). One is, you could launch the test suite by doing $ python -c 'import theano; theano.test() (or the same with python3 to test the other flavour), that's the same what the helper script theano-test does. However, by doing it that way some particular tests might fail by raising errors also for the group of known failures. Known failures are excluded from being errors if you run the tests by theano-nose, which is a wrapper around nosetests, so this might be always the better choice. You can run this convenience script with the option --theano on the installed library, or from the source package root, which you could pull by $ sudo apt-get source theano (there you have also the option to use bin/theano-nose). The script accept options for nosetests, so you might run it with -v to increase verbosity. For the tests the configuration switch config.device must be set to cpu. This will also include the GPU tests when a proper accessible device is detected, so that's a little misleading in the sense of it doesn't mean "run everything on the CPU". You're on the safe side if you run it always like this: $ THEANO_FLAGS=device=cpu theano-nose, if you've set config.device to gpu in your ~/.theanorc. Depending on the available hardware and the used BLAS implementation (see below) it could take quite a long time to run the whole test suite through, on the Core-i5 in my laptop that takes around an hour even excluded the GPU related tests (which perform pretty fast, though). Theano features a couple of switches to manipulate the default configuration for optimization and compilation. There is a rivalry between optimization and compilation costs against performance of the test suite, and it turned out the test suite performs a quicker with lesser graph optimization. There are two different switches available to control config.optimizer, the fast_run toggles maximal optimization, while fast_compile runs only a minimal set of graph optimization features. These settings are used by the general mode switches for config.mode, which is either FAST_RUN by default, or FAST_COMPILE. The default mode FAST_RUN (optimizer=fast_run, linker=cvm) needs around 72 minutes on my lower mid-level machine (on un-optimized BLAS). To set mode=FAST_COMPILE (optimizer=fast_compile, linker=py) brings some boost for the performance of the test suite because it runs the whole suite in 46 minutes. The downside of that is that C code compilation is disabled in this mode by using the linker py, and also the GPU related tests are not included. I've played around with using the optimizer fast_compile with some of the other linkers (c py and cvm, and their versions without garbage collection) as alternative to FAST_COMPILE with minimal optimization but also machine code compilation incl. GPU testing. But to my experience, fast_compile without another than the linker py results in some new errors and failures of some tests on amd64, and this might the case also on other architectures, too. By the way, another useful feature is DebugMode for config.mode, which verifies the correctness of all optimizations and compares the C to Python results. If you want to have detailed info on the configuration settings of Theano, do $ python -c 'import theano; print theano.config' less, and check out the chapter config in the library documentation in the documentation. cache maintenance Theano isn't a JIT (just-in-time) compiler like Numba, which generates native machine code in the memory and executes it immediately, but it saves the generated native machine code into compiledirs. The reason for doing it that way is quite practical like the docs explain, the persistent cache on disk makes it possible to avoid generating code for the same operation, and to avoid compiling again when different operations generate the same code. The compiledirs by default are located within $(HOME)/.theano/. After some time the folder becomes quite large, and might look something like this:
$ ls ~/.theano
If the used Python version changed like in this example you might to want to purge obsolete cache. For working with the cache resp. the compiledirs, the helper theano-cache comes in handy. If you invoke it without any arguments the current cache location is put out like ~/.theano/compiledir_Linux-4.5--amd64-x86_64-with-debian-stretch-sid--2.7.12-64 (the script is run from /usr/share/python-theano). So, the compiledirs for the old Python versions in this example (11+ and 12rc1) can be removed to free the space they occupy. All compiledirs resp. cache directories meaning the whole cache could be erased by $ theano-cache basecompiledir purge, the effect is the same as by performing $ rm -rf ~/.theano. You might want to do that e.g. if you're using different hardware, like when you got yourself another graphics card. Or habitual from time to time when the compiledirs fill up so much that it slows down processing with the harddisk being very busy all the time, if you don't have an SSD drive available. For example, the disk space of build chroots carrying (mainly) the tests completely compiled through on default Python 2 and Python 3 consumes around 1.3 GB (see here). BLAS implementations Theano needs a level 3 implementation of BLAS (Basic Linear Algebra Subprograms) for operations between vectors (one-dimensional mathematical objects) and matrices (two-dimensional objects) carried out on the CPU. NumPy is already build on BLAS and pulls the standard implementation (libblas3, soure package: lapack), but Theano links directly to it instead of using NumPy as intermediate layer to reduce the computational overhead. For this, Theano needs development headers and the binary packages pull libblas-dev by default, if any other development package of another BLAS implementation (like OpenBLAS or ATLAS) isn't already installed, or pulled with them (providing the virtual package The linker flags could be manipulated directly through the configuration switch config.blas.ldflags, which is by default set to -L/usr/lib -lblas -lblas. By the way, if you set it to an empty value, Theano falls back to using BLAS through NumPy, if you want to have that for some reason. On Debian, there is a very convenient way to switch between BLAS implementations by the alternatives mechanism. If you have several alternative implementations installed at the same time, you can switch from one to another easily by just doing:
$ sudo update-alternatives --config
There are 3 choices for the alternative (providing /usr/lib/
  Selection    Path                                  Priority   Status
* 0            /usr/lib/openblas-base/      40        auto mode
  1            /usr/lib/atlas-base/atlas/   35        manual mode
  2            /usr/lib/libblas/            10        manual mode
  3            /usr/lib/openblas-base/      40        manual mode
Press <enter> to keep the current choice[*], or type selection number:
The implementations are performing differently on different hardware, so you might want to take the time to compare which one does it best on your processor (the other packages are libatlas-base-dev and libopenblas-dev), and choose that to optimize your system. If you want to squeeze out all which is in there for carrying out Theano's computations on the CPU, another option is to compile an optimized version of a BLAS library especially for your processor. I'm going to write another blog posting on this issue. The binary packages of Theano ship the script to check over how well a BLAS implementation performs with it, and if everything works right. That script is located in the misc subfolder of the library, you could locate it by doing $ dpkg -L python-theano grep check_blas (or for the package python3-theano accordingly), and run it with the Python interpreter. By default the scripts puts out a lot of info like a huge perfomance comparison reference table, the current setting of blas.ldflags, the compiledir, the setting of floatX, OS information, the GCC version, the current NumPy config towards BLAS, NumPy location and version, if Theano linked directly or has used the NumPy binding, and finally and most important, the execution time. If just the execution time for quick perfomance comparisons is needed this script could be invoked with -q. Theano on CUDA The function compiler of Theano works with alternative backends to carry out the computations, like the ones for graphics cards. Currently, there are two different backends for GPU processing available, one docks onto NVIDIA's CUDA (Compute Unified Device Architecture) technology7, and another one for libgpuarray, which is also developed by the Theano developers in parallel. The libgpuarray library is an interesting alternative for Theano, it's a GPU tensor (multi-dimensional mathematical object) array written in C with Python bindings based on Cython, which has the advantage of running also on OpenCL8. OpenCL, unlike CUDA9, is full free software, vendor neutral and overcomes the limitation of the CUDA toolkit being only available for amd64 and the ppc64el port (see here). I've opened an ITP on libgpuarray and we'll see if and how this works out. Another reason for it would be great to have it available is that it looks like CUDA currently runs into problems with GCC 610. More on that, soon. Here's a litle checklist for setting up your CUDA device so that you don't have to experience something like this:
$ THEANO_FLAGS=device=gpu,floatX=float32 python ./ 
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
hardware check For running Theano on CUDA you need an NVIDIA graphics card which is capable of doing that. You can recheck if your device is supported by CUDA here. When the hardware isn't too old (CUDA support started with GeForce 8 and Quadro X series) or too strange I think it isn't working only in exceptional cases. You can check your model and if the device is present in the system on the bare hardware level by doing this:
$ lspci   grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
If a line like this doesn't get returned, your device most probably is broken, or not properly connected (ouch). If rev ff appears at the end of the line that means the device is off meaning powered down. This might be happening if you have a laptop with Optimus graphics hardware, and the related drivers have switched off the unoccupied device to safe energy11. kernel module Running CUDA applications requires the proprietary NVIDIA driver kernel module to be loaded into the kernel and working. If you haven't already installed it for another purpose, the NVIDIA driver and the CUDA toolkit are both in the non-free section of the Debian archive, which is not enabled by default. To get non-free packages you have to add non-free (and it's better to do so, also contrib) to your package source in /etc/apt/sources.list, which might then look like this:
deb testing main contrib non-free
After doing that, perform $ apt-cache update to update the package lists, and there you go with the non-free packages. The headers of the running kernel are needed to compile modules, you can get them together with the NVIDIA kernel module package by running:
$ sudo apt-get install linux-headers-$(uname -r) nvidia-kernel-dkms build-essential
DKMS will then build the NVIDIA module for the kernel and does some other things on the system. When the installation has finished, it's generally advised to reboot the system completely. troubleshooting If you have problems with the CUDA device, it's advised to verify if the following things concerning the NVIDIA driver resp. kernel module are in order: blacklist nouveau Check if the default Nouveau kernel module driver (which blocks the NVIDIA module) for some reason still gets loaded by doing $ lsmod grep nouveau. If nothing gets returned, that's right. If it's still in the kernel, just add blacklist nouveau to /etc/modprobe.d/blacklist.conf, and update the booting ramdisk with sudo update-initramfs -u afterwards. Then reboot once more, this shouldn't be the case then anymore. rebuild kernel module To fix it when the module haven't been properly compiled for some reason you could trigger a rebuild of the NVIDIA kernel module with $ sudo dpkg-reconfigure nvidia-kernel-dkms. When you're about to send your hardware in to repair because everything looks all right but the device just isn't working, that really could help (own experience). After the rebuild of the module or modules (if you have a few kernel packages installed) has completed, you could recheck if the module really is available by running:
$ sudo modinfo nvidia-current
filename:       /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        352.79
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
vermagic:       4.4.0-1-amd64 SMP mod_unload modversions 
parm:           NVreg_Mobile:int
It should be something similiar to this when everything is all right. reload kernel module When there are problems with the GPU, maybe the kernel module isn't properly loaded. You could recheck if the module has been properly loaded by doing
$ lsmod   grep nvidia
nvidia_uvm             73728  0
nvidia               8540160  1 nvidia_uvm
drm                   356352  7 i915,drm_kms_helper,nvidia
The kernel module could be loaded resp. reloaded with $ sudo nvidia-modprobe (that tool is from the package nvidia-modprobe). unsupported graphics card Be sure that you graphics cards is supported by the current driver kernel module. If you have bought new hardware, that's quite possible to come out being a problem. You can get the version of the current NVIDIA driver with:
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 5.3.1 20160528 (Debian 5.3.1-21)
Then, google the version number like nvidia 352.79, this should get you onto an official driver download page like this. There, check for what's to be found under "Supported Products". I you're stuck with that there are two options, to wait until the driver in Debian got updated, or replace it with the latest driver package from NVIDIA. That's possible to do, but something more for experienced users. occupied graphics card The CUDA driver cannot work while the graphical interface is busy like by processing the graphical display of your X.Org server. Which kernel driver actually is used to process the desktop could be examined by this command:12
$ grep '(II).*([0-9]):' /var/log/Xorg.0.log
[    37.700] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150522
[    37.700] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <>)
[    39.808] (II) intel(0): switch to mode 1920x1080@60.0 on eDP1 using pipe 0, position (0, 0), rotation normal, reflection none
[    39.810] (II) intel(0): Setting screen physical size to 508 x 285
[    67.576] (II) intel(0): EDID vendor "CMN", prod id 5941
[    67.576] (II) intel(0): Printing DDC gathered Modelines:
[    67.576] (II) intel(0): Modeline "1920x1080"x0.0  152.84  1920 1968 2000 2250  1080 1083 1088 1132 -hsync -vsync (67.9 kHz eP)
This example shows that the rendering of the desktop is performed by the graphical device of the Intel CPU, which is just like it's needed for running CUDA applications on your NVIDIA graphics card, if you don't have another one. nvidia-cuda-toolkit With the Debian package of the CUDA toolkit everything pretty much runs out of the box for Theano. Just install it with apt-get, and you're ready to go, the CUDA backend is the default one. Pycuda is also a suggested dependency of the binary packages, it could be pulled together with the CUDA toolkit. The up-to-date CUDA release 7.5 is of course available, with that you have Maxwell architecture support so that you can run Theano on e.g. a GeForce GTX Titan X with 6,2 TFLOPS on single precision13 at an affordable price. CUDA 814 is around the corner with support for the new Pascal architecture15. Like the GeForce GTX 1080 high-end gaming graphics card already has 8,23 TFLOPS16. When it comes to professional GPGPU hardware like the Tesla P100 there is much more computational power available, scalable by multiplication of cores resp. cards up to genuine little supercomputers which fit on a desk, like the DGX-117. Theano can use multiple GPUs for calculations to work with highly scaled hardware, I'll write another blog post on this issue. Theano on the GPU It's not difficult to run Theano on the GPU. Only single precision floating point numbers (float32) are supported on the GPU, but that is sufficient for deep learning applications. Theano uses double precision floats (float64) by default, so you have to set the configuration variable config.floatX to float32, like written on above, either with the THEANO_FLAGS environment variable or better in your .theanorc file, if you're going to use the GPU a lot. Switching to the GPU actually happens with the config.device configuration variable, which must be set to either gpu or gpu0, gpu1 etc., to choose a particular one if multiple devices are available. Here's is a little test script, it's taken from the docs and slightly altered. You can run that script either with python or python3 (there was a single test failure on the Python 3 package, so the Python 2 library might be a little more stable currently). For comparison, here's an example on how it perfoms on my hardware, one time on the CPU, one more time on the GPU:
$ THEANO_FLAGS=floatX=float32 python ./ 
[Elemwise exp,no_inplace (<TensorType(float32, vector)>)]
Looping 1000 times took 4.481719 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
Used the cpu
$ THEANO_FLAGS=floatX=float32,device=gpu python ./ 
Using gpu device 0: GeForce 940M (CNMeM is disabled, cuDNN not available)
[GpuElemwise exp,no_inplace (<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise exp,no_inplace .0)]
Looping 1000 times took 1.164906 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
Used the gpu
If you got a result like this you're ready to go with Theano on Debian, training computer vision classifiers or whatever you want to do with it. I'll write more on for what Theano could be used, soon.

  1. Some ports are disabled because they are currently not supported by Theano. There are NotImplementedErrors and other errors in the tests on the numpy.ndarray object being not aligned. The developers commented on that, see here. And on some ports the build flags -m32 resp. -m64 of Theano aren't supported by g++, the build flags can't be manipulated easily.
  2. Theano Development Team: "Theano: a Python framework for fast computation of mathematical expressions"
  3. Marc Couture: "Today's high-powered GPUs: strong for graphics and for maths". In: RTC magazine June 2015, pp. 22 25
  4. Ogier Maitre: "Understanding NVIDIA GPGPU hardware". In: Tsutsui/Collet (eds.): Massively parallel evolutionary computation on GPGPUs. Berlin, Heidelberg: Springer 2013, pp. 15-34
  5. Geoffrey French: "Deep learing tutorial: advanved techniques". PyData London 2016 presentation
  6. Like the description of the Lintian tag binary-without-manpage says, that's not needed for them being in /usr/share.
  7. Tom. R. Halfhill: "Parallel processing with CUDA: Nvidia's high-performance computing platform uses massive multithreading". In: Microprocessor Report January 28, 2008
  8. Faber "Parallelwelten: GPU-Programmierung mit OpenCL". In: C't 26/2014, pp. 160-165
  9. For comparison, see: Valentine Sinitsyn: "Feel the taste of GPU programming". In: Linux Voice February 2015, pp. 106-109
  11. If Optimus (hybrid) graphics hardware is present (like commonly today on PC laptops), Debian launches the X-server on the graphics processing unit of the CPU, which is ideal for CUDA. The problem with Optimus actually is the graphics processing on the dedicated GPU. If you are using Bumblebee, the Python interpreter which you want to run Theano on has be to be started with the launcher primusrun, because Bumblebee powers the GPU down with the tool bbswitch every time it isn't used, and I think also the kernel module of the driver is dynamically loaded.
  12. Thorsten Leemhuis: "Treiberreviere. Probleme mit Grafiktreibern f r Linux l sen": In: C't Nr.2/2013, pp. 156-161
  13. Martin Fischer: "4K-Rakete: Die schnellste Single-GPU-Grafikkarte der Welt". In C't 13/2015, pp. 60-61
  15. Martin Fischer: "All In: Nvidia enth llt die GPU-Architektur 'Pascal'". In: C't 9/2016, pp. 30-31
  16. Martin Fischer: "Turbo-Pascal: High-End-Grafikkarte f r Spieler: GeForce GTX 1080". In: C't 13/2016, pp. 100-103

8 June 2016

Reproducible builds folks: Reproducible builds: week 58 in Stretch cycle

What happened in the Reproducible Builds effort between May 29th and June 4th 2016: Media coverage Ed Maste will present Reproducible Builds in FreeBSD at BDSCan 2016 in Ottawa, Canada on June 11th. GSoC and Outreachy updates Toolchain fixes Other upstream fixes Packages fixed The following 53 packages have become reproducible due to changes in their build-dependencies: angband blktrace code-saturne coinor-symphony device-tree-compiler mpich rtslib ruby-bcrypt ruby-bson-ext ruby-byebug ruby-cairo ruby-charlock-holmes ruby-curb ruby-dataobjects-sqlite3 ruby-escape-utils ruby-ferret ruby-ffi ruby-fusefs ruby-github-markdown ruby-god ruby-gsl ruby-hdfeos5 ruby-hiredis ruby-hitimes ruby-hpricot ruby-kgio ruby-lapack ruby-ldap ruby-libvirt ruby-libxml ruby-msgpack ruby-ncurses ruby-nfc ruby-nio4r ruby-nokogiri ruby-odbc ruby-oj ruby-ox ruby-raindrops ruby-rdiscount ruby-redcarpet ruby-redcloth ruby-rinku ruby-rjb ruby-rmagick ruby-rugged ruby-sdl ruby-serialport ruby-sqlite3 ruby-unicode ruby-yajl ruby-zoom thin The following packages have become reproducible after being fixed: Some uploads have addressed some reproducibility issues, but not all of them: Uploads with an unknown result because they fail to build: Patches submitted that have not made their way to the archive yet: Package reviews 45 reviews have been added, 25 have been updated and 25 have been removed in this week. 12 FTBFS bugs have been reported by Chris Lamb and Niko Tyni. diffoscope development strip-nondeterminism development Mattia uploaded strip-nondeterminism 0.018-1 which improved support for *.epub files. Misc. Last week we also learned about progress of reproducible builds in FreeBSD. Ed Maste announced a change to record the build timestamp during ports building, which is required for later reproduction. This week's edition was written by Reiner Herrman, Holger Levsen and Chris Lamb and reviewed by a bunch of Reproducible builds folks on IRC.

21 April 2016

Mario Lang: Scraping the web with Python and XQuery

During a JAWS for Windows training, I was introduced to the Research It feature of that screen reader. Research It is a quick way to utilize web scraping to make working with complex web pages easier. It is about extracting specific information from a website that does not offer an API. For instance, look up a word in an online dictionary, or quickly check the status of a delivery. Strictly speaking, this feature does not belong in a screen reader, but it is a very helpful tool to have at your fingertips. Research It uses XQuery (actually, XQilla) to do all the heavy lifting. This also means that the Research It Rulesets are theoretically also useable on other platforms. I was immediately hooked, because I always had a love for XPath. Looking at XQuery code is totally self-explanatory for me. I just like the syntax and semantics. So I immediately checked out XQilla on Debian, and found #821329 and #821330, which were promptly fixed by Tommi Vainikainen, thanks to him for the really quick response! Unfortunately, making xqilla:parse-html available and upgrading to the latest upstream version is not enough to use XQilla on Linux with the typical webpages out there. Xerces-C++, which is what XQilla uses to fetch web resources, does not support HTTPS URLs at the moment. I filed #821380 to ask for HTTPS support in Xerces-C to be enabled by default. And even with HTTPS support enabled in Xerces-C, the xqilla:parse-html function (which is based on HTML Tidy) fails for a lot of real-world webpages I tried. Manually upgrading the six year old version of HTML Tidy in Debian to the latest from GitHub (tidy-html5, #810951) did not help a lot either.
Python to the rescue XQuery is still a very nice language for extracting information from markup documents. XQilla just has a bit of a hard time dealing with the typical HTML documents out there. After all, it was designed to deal with well-formed XML documents. So I decided to build myself a little wrapper around XQilla which fetches the web resources with the Python Requests package, and cleans the HTML document with BeautifulSoup (which uses lxml to do HTML parsing). The output of BeautifulSoup can apparently be passed to XQilla as the context document. This is a fairly crazy hack, but it works quite reliably so far. Here is how one of my web scraping rules looks like:
from click import argument, group
def xq():
  """Web scraping for command-line users."""
def github():
  """Quick access to"""
def github_code_search(language, query):
  """Search for source code."""
         params= 'l': language, 'q': query, 'type': 'code' )
The function scrape automatically determines the XQuery filename according to the callers function name. Here is how github_code_search.xq looks like:
declare function local:source-lines($table as node()*) as xs:string*
  for $tr in $table/tr return normalize-space(data($tr))
let $results := html//div[@id="code_search_results"]/div[@class="code-list"]
for $div in $results/div
let $repo := data($div/p/a[1])
let $file := data($div/p/a[2])
let $link := resolve-uri(data($div/p/a[2]/@href))
return (concat($repo, ": ", $file), $link, local:source-lines($div//table),
That is all I need to implement a custom web scraping rule. A few lines of Python to specify how and where to fetch the website from. And a XQuery file that specifies how to mangle the document content. And thanks to the Python click package, the various entry points of my web scraping script can easily be called from the command-line. Here is a sample invokation:
fx:~/xq% ./
  Quick access to
  --help  Show this message and exit.
  code_search  Search for source code.
fx:~/xq% ./ code_search Pascal '"debian/rules"'
prof7bit/LazPackager: frmlazpackageroptionsdeb.pas
230 procedure TFDebianOptions.BtnPreviewRulesClick(Sender: TObject);
231 begin
232 ShowPreview('debian/rules', EdRules.Text);
233 end;
235 procedure TFDebianOptions.BtnPreviewChangelogClick(Sender: TObject);
prof7bit/LazPackager: lazpackagerdebian.pas
205 + 'mv ../rules debian/' + LF
206 + 'chmod +x debian/rules' + LF
207 + 'mv ../changelog debian/' + LF
208 + 'mv ../copyright debian/' + LF
For the impatient, here is the implementation of scrape:
from bs4 import BeautifulSoup
from bs4.element import Doctype, ResultSet
from inspect import currentframe
from itertools import chain
from os import path
from os.path import abspath, dirname
from subprocess import PIPE, run
from tempfile import NamedTemporaryFile
import requests
def scrape(get=None, post=None, find_all=None,
           xquery_name=None, xquery_vars= , **kwargs):
  """Execute a XQuery file.
  When either get or post is specified, fetch the resource and run it through
  BeautifulSoup, passing it as context to the XQuery.
  If find_all is given, wrap the result of executing find_all on
  the BeautifulSoup in an artificial HTML body.
  If xquery_name is not specified, the callers function name is used.
  xquery_name combined with extension ".xq" is searched in the directory
  where this Python script resides and executed with XQilla.
  kwargs are passed to get or post calls.  Typical extra keywords would be:
  params -- To pass extra parameters to the URL.
  data -- For HTTP POST.
  response = None
  url = None
  context = None
  if get is not None:
    response = requests.get(get, **kwargs)
  elif post is not None:
    response =, **kwargs)
  if response is not None:
    context = BeautifulSoup(response.text, 'lxml')
    dtd = next(context.descendants)
    if type(dtd) is Doctype:
    if find_all is not None:
      context = context.find_all(find_all)
    url = response.url
  if xquery_name is None:
    xquery_name = currentframe().f_back.f_code.co_name
  cmd = ['xqilla']
  if context is not None:
    if type(context) is BeautifulSoup:
      soup = context
      context = NamedTemporaryFile(mode='w')
      print(soup, file=context)
    elif isinstance(context, list) or isinstance(context, ResultSet):
      tags = context
      context = NamedTemporaryFile(mode='w')
      print('<html><body>', file=context)
      for item in tags: print(item, file=context)
      print('</body></html>', file=context)
  cmd.extend(chain.from_iterable(['-v', k, v] for k, v in xquery_vars.items()))
  if url is not None:
    cmd.extend(['-b', url])
  cmd.append(abspath(path.join(dirname(__file__), xquery_name + ".xq")))
  output = run(cmd, stdout=PIPE).stdout.decode('utf-8')
  if type(context) is NamedTemporaryFile: context.close()
  print(output, end='')
The full source for xq can be found on GitHub. The project is just two days old, so I have only implemented three scraping rules as of now. However, adding new rules has been made deliberately easy, so that I can just write up a few lines of code whenever I find something on the web which I'd like to scrape on the command-line. If you find this "framework" useful, make sure to share your insights with me. And if you impelement your own scraping rules for a public service, consider sharing that as well. If you have an comments or questions, send me mail. Oh, and by the way, I am now also on Twitter as @blindbird23.

20 December 2015

Lunar: Reproducible builds: week 34 in Stretch cycle

What happened in the reproducible builds effort between December 13th to December 19th: Infrastructure Niels Thykier started implementing support for .buildinfo files in dak. A very preliminary commit was made by Ansgar Burchardt to prevent .buildinfo files from being removed from the upload queue. Toolchain fixes Mattia Rizzolo rebased our experimental debhelper with the changes from the latest upload. New fixes have been merged by OCaml upstream. Packages fixed The following 39 packages have become reproducible due to changes in their build dependencies: apache-mime4j, avahi-sharp, blam, bless, cecil-flowanalysis, cecil, coco-cs, cowbell, cppformat, dbus-sharp-glib, dbus-sharp, gdcm, gnome-keyring-sharp, gudev-sharp-1.0, jackson-annotations, jackson-core, jboss-classfilewriter, jboss-jdeparser2, jetty8, json-spirit, lat, leveldb-sharp, libdecentxml-java, libjavaewah-java, libkarma, mono.reflection, monobristol, nuget, pinta, snakeyaml, taglib-sharp, tangerine, themonospot, tomboy-latex, widemargin, wordpress, xsddiagram, xsp, zeitgeist-sharp. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: Packages in experimental are now tested on armhf. (h01ger) Arch Linux packages in the multilib and community repositories (4,000 more source packages) are also being tested. All of these test results are better analyzed and nicely displayed together with each package. (h01ger) For Fedora, build jobs can now run in parallel. Two are currently running, now testing reproducibility of 785 source packages from Fedora 23. mock/1.2.3-1.1 has been uploaded to experimental to better build RPMs. (h01ger) Work has started on having automatic build node pools to maximize use of armhf build nodes. (Vagrant Cascadian) diffoscope development Version 43 has been released on December 15th. It has been dubbed as epic! as it contains many contributions that were written around the summit in Athens. Baptiste Daroussin found that running diffoscope on some Tar archives could overwrite arbitrary files. This has been fixed by using libarchive instead of Python internal Tar library and adding a sanity check for destination paths. In any cases, until proper sandboxing is implemented, don't run diffosope on unstrusted inputs outside an isolated, throw-away system. Mike Hommey identified that the CBFS comparator would needlessly waste time scanning big files. It will now not consider any files bigger than 24 MiB 8 MiB more than the largest ROM created by coreboot at this time. An encoding issue related to Zip files has also been fixed. (Lunar) New comparators have been added: Android dex files (Reiner Herrmann), filesystem images using libguestfs (Reiner Herrmann), icons and JPEG images using libcaca (Chris Lamb), and OS X binaries (Clemens Lang). The comparator for Free Pascal Compilation Unit will now only be used when the unit version matches the compiler one. (Levente Polyak) A new multi-file HTML output with on-demand loading of long diffs is available through the --html-dir option. On-demand loading requires jQuery which path can be specified through the --jquery option. The diffs can also be simply browsed for non-JavaScript users or when jQuery is not available. (Joachim Breitner) Example of on-demand loading in diffosope Portability toward other systems has been improved: old versions of GNU diff are now supported (Mike McQuaid), suggestion of the appropriate locale is now the more generic en_US.UTF-8 (Ed Maste), the --list-tools option can now support multiple systems (Mattia Rizzolo, Levente Polyak, Lunar). Many internal changes and code clean-ups have been made, paving the way for parallel processing. (Lunar) Version 44 was released on December 18th fixing an issue affecting .deb lacking a md5sums file introduced in a previous refactoring (Lunar). Support has been added for Mozilla optimized Zip files. (Mike Hommey). The HTML output has been optimized in size (Mike Hommey, Esa Peuha, Lunar), speed (Lunar), and will now properly number lines (Mike Hommey). A message will always be displayed when lines are ignored at the end of a diff (Lunar). For portability and consistency, Python os.walk() function is now used instead of find to perform directory listing. (Lunar) Documentation update Package reviews 143 reviews have been removed, 69 added and 22 updated in the previous week. Chris Lamb reported 12 new FTBFS issues. News issues identified this week: random_order_in_init_py_generated_by_python-genpy, timestamps_in_copyright_added_by_perl_dist_zilla, random_contents_in_dat_files_generated_by_chasen-dictutils_makemat, timestamps_in_documentation_generated_by_pandoc. Chris West did some improvements on the scripts used to manage notes in the misc repository. Misc. Accounts of the reproducible builds summit in Athens were written by Thomas Klausner from NetBSD and Hans-Christoph Steiner from The Guardian Project. Some openSUSE developers are working on a hackweek on reproducible builds which was discussed on the opensuse-packaging mailing-list.

15 November 2015

Lunar: Reproducible builds: week 29 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Emmanuel Bourg uploaded eigenbase-resgen/ which uses of the scm-safe comment style by default to make them deterministic. Mattia Rizzolo started a new thread on debian-devel to ask a wider audience for issues about the -Wdate-time compile time flag. When enabled, GCC and clang print warnings when __DATE__, __TIME__, or __TIMESTAMP__ are used. Having the flag set by default would prompt maintainers to remove these source of unreproducibility from the sources. Packages fixed The following packages have become reproducible due to changes in their build dependencies: bmake, cyrus-imapd-2.4, drobo-utils, eigenbase-farrago, fhist, fstrcmp, git-dpm, intercal, libexplain, libtemplates-parser, mcl, openimageio, pcal, powstatd, ruby-aggregate, ruby-archive-tar-minitar, ruby-bert, ruby-dbd-odbc, ruby-dbd-pg, ruby-extendmatrix, ruby-rack-mobile-detect, ruby-remcached, ruby-stomp, ruby-test-declarative, ruby-wirble, vtprint. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: The fifth and sixth armhf build nodes have been set up, resulting in five more builder jobs for armhf. More than 10,000 packages have now been identified as reproducible with the reproducible toolchain on armhf. (Vagrant Cascadian, h01ger) Helmut Grohne and Mattia Rizzolo now have root access on all 12 build nodes used by and (h01ger) is now linked from all package pages and the dashboard. (h01ger) profitbricks-build5-amd64 and profitbricks-build6-amd64, responsible for running amd64 tests now run 398.26 days in the future. This means that one of the two builds that are being compared will be run on a different minute, hour, day, month, and year. This is not yet the case for armhf. FreeBSD tests are also done with 398.26 days difference. (h01ger) The design of the Arch Linux test page has been greatly improved. (Levente Polyak) diffoscope development Three releases of diffoscope happened this week numbered 39 to 41. It includes support for EPUB files (Reiner Herrmann) and Free Pascal unit files, usually having .ppu as extension (Paul Gevers). The rest of the changes were mostly targetting at making it easier to run diffoscope on other systems. The tlsh, rpm, and debian modules are now all optional. The test suite will properly skip tests that need optional tools or modules when they are not available. As a result, diffosope is now available on PyPI and thanks to the work of Levente Polyak in Arch Linux. Getting these versions in Debian was a bit cumbersome. Version 39 was uploaded with an expired key (according to the keyring on which will hopefully be updated soon) which is currently handled by keeping the files in the queue without REJECTing them. This prevented any other Debian Developpers to upload the same version. Version 40 was uploaded as a source-only upload but failed to build from source which had the undesirable side effect of removing the previous version from unstable. The package faild to build from source because it was built passing -I to debbuild. This excluded the ELF object files and static archives used by the test suite from the archive, preventing the test suite to work correctly. Hopefully, in a nearby future it will be possible to implement a sanity check to prevent such mistakes in the future. It has also been identified that ppudump outputs time in the system timezone without considering the TZ environment variable. Zachary Vance and Paul Gevers raised the issue on the appropriate channels. strip-nondeterminism development Chris Lamb released strip-nondeterminism version 0.014-1 which disables stripping Mono binaries as it is too aggressive and the source of the problem is being worked on by Mono upstream. Package reviews 133 reviews have been removed, 115 added and 103 updated this week. Chris West and Chris Lamb reported 57 new FTBFS bugs. Misc. The video of h01ger and Chris Lamb's talk at MiniDebConf Cambridge is now available. h01ger gave a talk at CCC Hamburg on November 13th, which was well received and sparked some interest among Gentoo folks. Slides and video should be available shortly. Frederick Kautz has started to revive Dhiru Kholia's work on testing Fedora packages. Your editor wish to once again thank #debian-reproducible regulars for reviewing these reports weeks after weeks.

9 November 2015

Lunar: Reproducible builds: week 28 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Chris Lamb filled a bug on python-setuptools with a patch to make the generated requires.txt files reproducible. The patch has been forwarded upstream. Chris also understood why the she-bang in some Python scripts kept being undeterministic: setuptools as called by dh-python could skip re-installing the scripts if the build had been too fast (under one second). #804339 offers a patch fixing the issue by passing --force to install. #804141 reported on gettext asks for support of SOURCE_DATE_EPOCH in gettextize. Santiago Vila pointed out that it doesn't felt appropriate as gettextize is supposed to be an interactive tool. The problem reported seems to be in avahi build system instead. Packages fixed The following packages became reproducible due to changes in their build dependencies: celestia, dsdo, fonts-taml-tscu, fte, hkgerman, ifrench-gut, ispell-czech, maven-assembly-plugin, maven-project-info-reports-plugin, python-avro, ruby-compass, signond, thepeg, wagon2, xjdic. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: Chris Lamb closed a wrongly reopened bug against haskell-devscripts that was actually a problem in haddock. FreeBSD tests are now run for three branches: master, stable/10, release/10.2.0. (h01ger) diffoscope development Support has been added for Free Pascal unit files (.ppc). (Paul Gevers) The homepage is now available using HTTPS, thanks to Let's Encrypt!. Work has been done to be able to publish diffoscope on the Python Package Index (also known as PyPI): the tlsh module is now optional, compatibility with python-magic has been added, and the fallback code to handle RPM has been fixed. Documentation update Reiner Herrmann, Paul Gevers, Niko Tyni, opi, and Dhole offered various fixes and wording improvements to the A mailing-list is now available to receive change notifications. NixOS, Guix, and Baserock are featured as projects working on reproducible builds. Package reviews 70 reviews have been removed, 74 added and 17 updated this week. Chris Lamb opened 22 new fail to build from source bugs. New issues this week: randomness_in_ocaml_provides, randomness_in_qdoc_page_id, randomness_in_python_setuptools_requires_txt, gettext_creates_ChangeLog_files_and_entries_with_current_date. Misc. h01ger and Chris Lamb presented Beyond reproducible builds at the MiniDebConf in Cambridge on November 8th. They gave an overview of where we stand and the changes in user tools, infrastructure, and development practices that we might want to see happening. Feedback on these thoughts are welcome. Slides are already available, and the video should be online soon. At the same event, a meeting happened with some members of the release team to discuss the best strategy regarding releases and reproducibility. Minutes have been posted on the Debian reproducible-builds mailing-list.

2 November 2015

Lunar: Reproducible builds: week 27 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Packages fixed The following packages became reproducible due to changes in their build dependencies: maven-plugin-tools, norwegian, ocaml-melt, python-biom-format, rivet. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: The following package is currently failing to build from source but should now be reproducible: Patches submitted which have not made their way to the archive yet: A quick update on current statistics: testing is at 85% of packages tested reproducible with our modified packages, unstable on armhf caught up with amd64 with 80%. The schroot name used for running diffoscope when testing OpenWrt, NetBSD, Coreboot, and Arch Linux has been fixed. (h01ger, Mattia Rizzolo) Documentation update Paul Gevers documented timestamps in unit files created by the Free Pascal Compiler. is now live. It contains a comprehensive documentation on all aspects that have been identified so far of what we call reproducible builds . It makes room for pointers to projects working on reproducible builds, news, dedicated tools, and community events. Package reviews 206 reviews have been removed, 171 added and 196 updated this week. Chris Lamb reported 28 failing to build from source issues. New issues identified this week: timestamps_in_pdf_content, different_encoding_in_html_by_docbook_xsl, timestamps_in_ppu_generated_by_fpc, method_may_never_be_called_in_documentation_generated_by_javadoc. Misc. Andrei Borzenkov has proposed a fix for uninitialized memory in GRUB's mkimage. Uninitialized memory is one source of hard to track down reproducibility errors. Holger Levsen presented the efforts on reproduible builds at Festival de Software Libre in Puerto Vallarta, Mexico.

22 October 2014

Sylvain Le Gall: Release of OASIS 0.4.5

On behalf of Jacques-Pascal Deplaix I am happy to announce the release of OASIS v0.4.5. Logo OASIS small OASIS is a tool to help OCaml developers to integrate configure, build and install systems in their projects. It should help to create standard entry points in the source code build system, allowing external tools to analyse projects easily. This tool is freely inspired by Cabal which is the same kind of tool for Haskell. You can find the new release here and the changelog here. More information about OASIS in general on the OASIS website. Here is a quick summary of the important changes: Features: This new version is a small release to catch up with all the fixes/pull requests present in the VCS that have not yet been published. This should made the life of my dear contributors easier -- thanks again for being patient. I would like to thanks again the contributor for this release: Christopher Zimmermann, Jerome Vouillon, Tomohiro Matsuyama and Christoph H ger. Their help is greatly appreciated.

5 October 2014

Stefano Zacchiroli: je code

je.code(); promoting programming (in French) is a nice initiative by, among others, my fellow Debian developer and university professor Martin Quinson. The goal of is to raise awareness about the importance of learning the basics of programming, for everyone in modern societies. targets specifically francophone children (hence the name, for "I code"). I've been happy to contribute to the initiative with my thoughts on why learning to program is so important today, joining the happy bunch of "codeurs" on the web site. If you read French, you can find them reposted below. If you also write French, you might want to contribute your thoughts on the matter. How? By forking the project of course!
Pourquoi codes-tu ? Tout d'abord, je code parce que c'est une activit passionnante, dr le, et qui permet de prouver le plaisir de cr er. Deuxi mement, je code pour automatiser les taches r p titives qui peuvent rendre p nibles nos vies num riques. Un ordinateur est con u exactement pour cela: lib rer les tres humains des taches stupides, pour leur permettre de se concentrer sur les taches qui ont besoin de l'intelligence humaine pour tre r solues. Mais je code aussi pour le pur plaisir du hacking, i.e., trouver des utilisations originelles et inattendues pour des logiciels existants. Comment as-tu appris ? Compl tement au hasard, quand j' tais gamin. 7 ou 8 ans, je suis tomb dans la biblioth que municipale de mon petit village, sur un livre qui enseignait programmer en BASIC travers la m taphore du jeu de l'oie. partir de ce jour j'ai utilis le Commodore 64 de mon p re beaucoup plus pour programmer que pour les jeux vid o: coder est tellement plus dr le! Plus tard, au lyc e, j'ai pu appr cier la programmation structur e et les avantages normes qu'elle apporte par rapport aux GO TO du BASIC et je suis devenu un accro du Pascal. Le reste est venu avec l'universit et la d couverte du Logiciel Libre: la caverne d'Ali Baba du codeur curieux. Quel est ton langage pr f r ? J'ai plusieurs langages pr f r s. J'aime Python pour son minimalisme syntactique, sa communaut vaste et bien organis e, et pour l'abondance des outils et ressources dont il dispose. J'utilise Python pour le d veloppement d'infrastructures (souvent quip es d'interfaces Web) de taille moyenne/grande, surtout si j'ai envie des cr er une communaut de contributeurs autour du logiciel. J'aime OCaml pour son syst me de types et sa capacit de capturer les bonnes propri t s des applications complexes. Cela permet au compilateur d'aider norm ment les d veloppeur viter des erreurs de codage comme de conception. J'utilise aussi beaucoup Perl et le shell script (principalement Bash) pour l'automatisation des taches: la capacit de ces langages de connecter d'autres applications est encore in gal e. Pourquoi chacun devrait-il apprendre programmer ou tre initi ? On est de plus en plus d pendants des logiciels. Quand on utilise une lave-vaisselle, on conduit une voiture, on est soign dans un h pital, quand on communique sur un r seau social, ou on surfe le Web, nos activit s sont constamment ex cut es par des logiciels. Celui qui contr le ces logiciels contr le nos vies. Comme citoyens d'un monde qui est de plus en plus num rique, pour ne pas devenir des esclaves 2.0, nous devons pr tendre le contr le sur le logiciel qui nous entoure. Pour y parvenir, le Logiciel Libre---qui nous permet d'utiliser, tudier, modifier, reproduire le logiciel sans restrictions---est un ingr dient indispensable. Aussi bien qu'une vaste diffusion des comp tences en programmation: chaque bit de connaissance dans ce domaine nous rende tous plus libres.

12 September 2014

John Goerzen: The Thrill and Stress of Too Many Hobbies

Today, 4PM. Jacob and Oliver excitedly peer at the box in our kitchen a really big box, taller than them. Inside is is the first model airplane I d ever purchased. The three of us hunkered down on the kitchen floor, opened the box, unpacked the parts, examined the controller, and found the manual with cryptic assembly directions. Oliver turned some screws while Jacob checked out the levers on the controllers. Then they both left for a bit to play with their toy buses. A little while later, the three of us went outside. It was too windy to fly. I had never flied an RC plane before only RC quadcopters (much easier to fly), and some practice time on an RC simulator. But the excitement was too much. So out we went, and the plane took off perfectly, climbed, flew over the trees, and circled above our heads at my command. I even managed a good landing in the wind, despite about 5 aborted attempts due to coming in too high, wrong angle, too fast, or last-minute gusts of wind throwing everything off. I am not sure how I pulled that all off on my first flight, but somehow I did! It was thrilling! I ve had a lot of hobbies in my life. Computers have run through many of them; I learned Pascal (a programming language) at about the same time I learned cursive handwriting and started with C at around age 10. It was all fun. I ve been a Debian developer for some 18 years now, and have written a lot of code, and even books about code, over the years. Photography, music, literature, history, philosophy, and theology have been interests for quite some time as well. In the last few years, I ve picked up amateur radio, model aircraft, etc. And last month, Laura led me into Ada s Technical Books during our visit to Seattle, resulting in me getting interested in Arduino. (The boys and I have already built a light-activated crossing gate for their HO-gauge model trains, and Jacob can now say he s edited a few characters of C!) Sometimes I find ways to merge hobbies; I ve set up all sorts of amateur radio systems on Linux, take aerial photographs, and set up systems to stream music in my house. But I also have a lot less time for hobbies overall than I once did; other things in life, such as my children, are more important. Some of the code I once worked on actively I no longer use or maintain, and I feel guilty about that when people send bug reports that I have no interest in fixing anymore. Sometimes I feel a need to cut down, and perhaps have; and then, I get an interest in RC aircraft and find an airplane that is great for a beginner and fairly inexpensive. Perhaps it is the curse of being a curious person living in an interesting world. Do any of the rest of you have a large number of hobbies? How do you feel about that?

25 March 2014

Sylvain Le Gall: Release of OASIS 0.4.3

I am happy to announce the release of OASIS v0.4.3. Logo OASIS small OASIS is a tool to help OCaml developers to integrate configure, build and install systems in their projects. It should help to create standard entry points in the source code build system, allowing external tools to analyse projects easily. This tool is freely inspired by Cabal which is the same kind of tool for Haskell. You can find the new release here and the changelog here. More information about OASIS in general on the OASIS website. Here is a quick summary of the important changes: Features: This new version closes 4 bugs, mostly related to parsing of _oasis. It also includes a lot of refactoring to improve the overall quality of OASIS code base. The big project for the next release will be to setup a Windows host for regular build and test on this platform. I plan to use WODI for this setup. I would like to thanks again the contributor for this release: David Allsopp, Martin Keegan and Jacques-Pascal Deplaix. Their help is greatly appreciated.