Search Results: "ghe"

30 November 2021

Russell Coker: Links November 2021

The Guardian has an amusing article by Sophie Elmhirst about Libertarians buying a cruise ship to make a seasteading project off the coast of Panama [1]. It turns out that you need permits etc to do this and maintaining a ship is expensive. Also you wouldn't want to mine cryptocurrency in a ship cabin as most cabins are small and don't have enough air conditioning to remain pleasant if you dump 1kW or more into the air.

NPR has an interesting article about the reaction of the NRA to the Columbine shootings [2]. Seems that some NRA person isn't a total asshole and is sharing their private information, maybe they are dying and are worried about going to hell.

David Brin wrote an insightful blog post about the singleton hypothesis where he covers some of the evidence of autocratic societies failing [3]. I think he makes a convincing point about a single centralised government for human society not being viable. But something like the EU on a world wide scale could work well.

Ken Shirriff wrote an interesting blog post about reverse engineering the Yamaha DX7 synthesiser [4].

The New York Times has an interesting article about a Baboon troop that became less aggressive after the alpha males all died at once from tuberculosis [5]. They established a new more peaceful culture that has outlived the beta males who avoided tuberculosis.

The Guardian has an interesting article about how sequencing the genomes of the entire population can save healthcare costs while improving the health of the population [6]. This is something wealthy countries should offer for free to the world population. At a bit under $1000 per test that's only about $7 trillion to test everyone, and of course the price should drop significantly if there were billions of tests being done.

The Strategy Bridge has an interesting article about SciFi books that have useful portrayals of military strategy [7]. The co-author is Major General Mick Ryan of the Australian Army, which is noteworthy as Major General is the second highest rank in use by the Australian Army at this time.

Vice has an interesting article about the co-evolution of penises and vaginas and how a lot of that evolution is based on avoiding impregnation from rape [8].

Cory Doctorow wrote an insightful Medium article about the way that governments could force interoperability through purchasing power [9].

Cory Doctorow wrote an insightful article for Locus Magazine about imagining life after capitalism and how capitalism might be replaced [10]. We need a Star Trek future!

Arstechnica has an informative article about new developments in the rowhammer category of security attacks on DRAM [11]. It seems that DDR4 with ECC is the best current mitigation technique and that DDR3 with ECC is harder to attack than non-ECC RAM. So the thing to do is use ECC on all workstations and avoid doing security critical things on laptops because they can't use ECC RAM.

Russell Coker: Your Device Has Been Improved

I've just started a Samsung tablet downloading a 770MB update; the description says:

Technically I have no doubt that both those claims are true and accurate. But according to the common understanding of the English language I think they are both misleading. By "stability improved" they mean "fixed some bugs that made it unstable", and no technical person would imagine that after a certain number of such updates the number of bugs will ever reach zero and the tablet will be perfectly reliable. In fact you should consider yourself lucky if they fix more bugs than they add. It's not THAT uncommon for phones and tablets to be bricked (rendered unusable by software) by an update. In the past I got a Huawei Mate9 as a warranty replacement for a Nexus 6P because an update caused so many Nexus 6P phones to fail that they couldn't be replaced with an identical phone [1].

By "security improved" they usually mean "fixed some security flaws that were recently discovered to make it almost as secure as it was designed to be". Note that I deliberately say "almost as secure" because it's sometimes impossible to fix a security flaw without making significant changes to interfaces, which requires more work than desired for an old product and also gives a higher probability of things going wrong. So it's sometimes better to aim for "almost as secure", or alternatively "just as secure but with some features disabled".

Device manufacturers (and most companies in the Android space make the same claims while having the exact same bugs to deal with; Samsung is no different from the others in this regard) are not making devices more secure or more reliable than when they were initially released. They are aiming to make them almost as secure and reliable as when they were released. They don't have much incentive to try too hard in this regard: Samsung won't suffer if I decide my old tablet isn't reliable enough and buy a new one, which will almost certainly be from Samsung because they make nice tablets.

As a thought experiment, consider if car repairers did the same thing: "Getting us to service your car will improve fuel efficiency". Great, but how much more efficient will it be than when I purchased it? As another thought experiment, consider if car companies stopped providing parts for car repair a few years after releasing a new model. This is effectively what phone and tablet manufacturers have been doing all along; software updates for stability and security are to devices what changing oil etc is for cars.

21 November 2021

Julian Andres Klode: APT Z3 Solver Basics

Z3 is a theorem prover developed at Microsoft Research and available as a dynamically linked C++ library in Debian-based distributions. While the library is a whopping 16 MB, and the solver is a tad slow, its permissive licensing and the number of tactics offered give it huge potential for use in solving dependencies in a wide variety of applications. Z3 does not need normalized formulas, but offers higher-level abstractions like atmost, atleast and implies, which we will make use of together with boolean variables to translate the dependency problem to a form Z3 understands. In this post, we'll see how we can apply Z3 to the dependency resolution in APT. We'll only discuss the basics here; a future post will explore optimization criteria and recommends.
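To make the rest of the post concrete, here is a minimal taste of those three primitives. The post targets the C++ library; this sketch of mine uses Z3's Python bindings (the z3-solver package) instead, where the same abstractions are spelled Implies, AtLeast and AtMost:

# Minimal demo of the Z3 primitives used throughout this post.
from z3 import Bool, Implies, AtLeast, AtMost, Solver

p, q, r = Bool("p"), Bool("q"), Bool("r")
s = Solver()
s.add(Implies(p, AtLeast(q, r, 1)))  # p implies at least one of q, r
s.add(AtMost(q, r, 1))               # at most one of q, r may be true
s.add(p)
print(s.check(), s.model())          # sat, e.g. p and q true, r false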

Translating the universe APT's package universe consists of 3 relevant things: packages (the tuple of name and architecture), versions (basically a .deb), and dependencies between versions. While we could translate our entire universe to Z3 problems, we instead will construct a root set from packages that were manually installed and versions marked for installation, and then build the transitive root set from it by translating all versions reachable from the root set. For each package P in the transitive root set, we create a boolean literal P. We then translate each version P1, P2, and so on. Translating a version means building a boolean literal for it, e.g. P1, and then translating the dependencies as shown below. We now need to create two more clauses to satisfy the basic requirements for debs:
  1. If a version is installed, the package is installed; and vice versa. We can encode this requirement for P above as P == atleast({P1,P2}, 1).
  2. There can only be one version installed. We add an additional constraint of the form atmost({P1,P2}, 1).
We also encode the requirements of the operation.
  1. For each package P that is manually installed, add a constraint P.
  2. For each version V that is marked for install, add a constraint V.
  3. For each package P that is marked for removal, add a constraint !P.

Dependencies Packages in APT have dependencies of two basic forms: Depends and Conflicts, as well as variations like Breaks (identical to Conflicts in solving terms) and Recommends (soft Depends) - we'll ignore those for now. We'll discuss Conflicts in the next section. Let's take a basic dependency list: A Depends: X | Y, Z. To represent that dependency, we expand each name to a list of versions that can satisfy the dependency, for example X1 | X2 | Y1, Z1. Translating this dependency list to our Z3 solver, we create boolean variables X1, X2, Y1, Z1 and define two rules:
  1. A implies atleast({X1,X2,Y1}, 1)
  2. A implies atleast({Z1}, 1)
If there actually was nothing that satisfied the Z requirement, we'd have added a rule not A. It would be possible to simply not tell Z3 about the version at all as an optimization, but that adds more complexity, and the not A constraint should not cause too many problems.

Conflicts Conflicts cannot have 'or' in them. A dependency B Conflicts: X, Y means that only one of B, X, and Y can be installed. We can directly encode this in Z3 by using the constraint atmost({B,X,Y}, 1). This is an optimized encoding of the constraint: we could have encoded each conflict pairwise, in the form !B or !X, !B or !Y, and so on. Usually this leads to worse performance as it introduces additional clauses.

Complete example Let's assume we start with an empty install and want to install the package a below.
Package: a
Version: 1
Depends: c | b
Package: b
Version: 1
Package: b
Version: 2
Conflicts: x
Package: d
Version: 1
Package: x
Version: 1
The translation in Z3 rules looks like this:
  1. Package rules for a:
    1. a == atleast({a1}, 1) - package is installed iff one version is
    2. atmost({a1}, 1) - only one version may be installed
    3. a - a must be installed
  2. Dependency rules for a
    1. implies(a1, atleast({b2, b1}, 1)) - the translated dependency above. Note that c is gone; it's not reachable.
  3. Package rules for b:
    1. b == atleast({b1,b2}, 1) - package is installed iff one version is
    2. atmost({b1,b2}, 1) - only one version may be installed
  4. Dependencies for b (= 2):
    1. atmost({b2,x1}, 1) - the conflict between x and b (= 2) above
  5. Package rules for x:
    1. x == atleast({x1}, 1) - package is installed iff one version is
    2. atmost({x1}, 1) - only one version may be installed
The package d is not translated, as it is not reachable from the root set {a1}; the transitive root set is {a1,b1,b2,x1}.
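To see the whole encoding in one place, here is a sketch of this complete example using Z3's Python bindings rather than the C++ API that APT itself would use; the translation into AtLeast/AtMost/Implies calls is mine:

# The complete example above, encoded with the z3-solver Python package.
from z3 import Bool, Implies, AtLeast, AtMost, Solver, sat

a, a1 = Bool("a"), Bool("a1")
b, b1, b2 = Bool("b"), Bool("b1"), Bool("b2")
x, x1 = Bool("x"), Bool("x1")

s = Solver()
# Package rules: a package is installed iff one of its versions is,
# and at most one version may be installed.
s.add(a == AtLeast(a1, 1), AtMost(a1, 1))
s.add(b == AtLeast(b1, b2, 1), AtMost(b1, b2, 1))
s.add(x == AtLeast(x1, 1), AtMost(x1, 1))
# Dependency of a1 on "c | b" (c is unreachable, so only b's versions remain).
s.add(Implies(a1, AtLeast(b2, b1, 1)))
# Conflict between b (= 2) and x.
s.add(AtMost(b2, x1, 1))
# The operation: a is manually installed.
s.add(a)

assert s.check() == sat
print(s.model())  # e.g. a, a1, b and one of b1/b2 true; d never appears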

Next iteration: Optimization We have now constructed the basic set of rules that allows us to solve our dependency problems (equivalent to SAT), however it might lead to suboptimal solutions where it removes automatically installed packages, or installs more packages than necessary, to name a few examples. In our next iteration, we have to look at introducing optimization; for example, having the minimum number of removals, the minimal number of changed packages, or satisfying as many recommends as possible. We will also look at the upgrade problem (upgrade as many packages as possible) and the autoremove problem (remove as many automatically installed packages as possible).

16 November 2021

Paul Tagliamonte: Measuring the Power Output of my SDRs

Over the last few years, I've often wondered what the true power output of my SDRs is. It's a question with a shocking amount of complexity in the response, due to a number of factors (mostly frequency). The ranges given in spec sheets are often extremely vague, and if I'm being honest with myself, not incredibly helpful for determining what specific filters and amplifiers I'll need to get a clean signal transmitted.
Hey, heads up! - This post contains extremely unvalidated and back-of-the-napkin quality work to understand how my equipment works. Hopefully this work can be of help to others, but please double check any information you need for your own work!
I was specifically interested in what gain output (in dBm) looks like across the frequency range - in particular, how variable the output dBm is when I change frequencies. The second question I had was understanding how linear the output gain is when adjusting the requested gain from the radio. Does a 2 dB increase on a HackRF API mean 2 dB of gain in dBm, no matter what the absolute value of the gain stage is? I've finally bit the bullet and undertaken work to characterize the hardware I do have, with some outdated laboratory equipment I found on eBay. Of course, if it's worth doing, it's worth overdoing, so I spent a bit of time automating a handful of components in order to collect the data that I need from my SDRs. I bought an HP 437B, which is the cutting edge of 30 years ago, but still accurate to within 0.01 dBm. I paired this power meter with an Agilent 8481A power sensor (-30 dBm to +20 dBm from 10 MHz to 18 GHz). For some of my radios, I was worried about exceeding the +20 dBm mark, so I used a 20 dB attenuator while I waited for a higher-power power sensor. Finally, I was able to find a GPIB to USB interface, and get that interface working with the GPIB kernel driver on my system. With all that out of the way, I was able to write Go bindings to my HP 437B to allow for totally headless and automated control in sync with my SDR's RF output. This allowed me to script the transmission of a sine wave at a controlled amplitude across a defined gain range and frequency range, and read the power sensor's measured dBm output, to characterize the gain across frequency and configured gain.
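The sweep itself is conceptually simple. As a rough illustration only (the post used custom Go bindings, not this code), a hypothetical Python sketch using pyvisa for the GPIB side might look like the following; the GPIB address and the transmit_sine() stub are placeholders, and a real HP 437B may need setup commands before a plain read returns a dBm value:

# Hypothetical sketch of the gain/frequency sweep, using pyvisa.
# Placeholders throughout: GPIB address, meter protocol, SDR control.
import pyvisa

def transmit_sine(freq_hz: int, gain_db: int) -> None:
    # Placeholder: drive the SDR (e.g. via SoapySDR) to transmit a
    # sine wave at freq_hz with the requested gain.
    pass

rm = pyvisa.ResourceManager()
meter = rm.open_resource("GPIB0::13::INSTR")  # assumed meter address

for freq_hz in range(int(100e6), int(6e9), int(100e6)):
    for gain_db in range(0, 48, 2):
        transmit_sine(freq_hz, gain_db)
        dbm = float(meter.read())  # assumes the meter replies in plain dBm
        print(freq_hz, gain_db, dbm)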

HackRF Looking at configured gain against output power, the requested gain appears to have a fairly linear relation to the output signal power. The measured dBm ranged from the sensor noise floor to approx +13 dBm. The average standard deviation of all tested gain values over the frequency range swept was +/-2 dBm, with a minimum standard deviation of +/-0.8 dBm and a maximum of +/-3 dBm. When looking at output power over the frequency range swept, the HackRF exhibits a distinctive (and frankly jarring) ripple across the frequency range, with a clearly visible jump in gain somewhere around 2.1 GHz. I have no idea what is causing this massive jump in output gain, nor what is causing these distinctive ripples. I'd love to know more if anyone's familiar with the HackRF's RF internals!

PlutoSDR The power output is very linear when operating above -20 dB TX channel gain, but can get quite erratic the lower the output power is configured. The PlutoSDR's output power is directly related to the configured power level, and is generally predictable once a minimum power level is reached. The measured dBm ranged from the noise floor to +3.39 dBm, with an average standard deviation of +/-1.98 dBm, a minimum standard deviation of +/-0.91 dBm and a maximum standard deviation of +/-3.37 dBm. Generally, the power output is quite stable, and looks to have very even and wideband gain control. There are a few artifacts, which I have not confidently isolated to the SDR TX gain, noise (transmit artifacts such as intermodulation) or to my test setup. They appear fairly narrowband, so I'm not overly worried about them yet. If anyone has any ideas what this could be, I'd very much appreciate understanding why they exist!

Ettus B210 The power output on the Ettus B210 is higher (in dBm) than any of my other radios, but it has a very odd quirk where the power becomes nonlinear somewhere around -55 dB TX channel gain. After that point, adding gain has no effect on the measured signal output in dBm up to 0 dB gain. The measured dBm ranged from the noise floor to +18.31 dBm, with an average standard deviation of +/-2.60 dBm, a minimum of +/-1.39 dBm and a maximum of +/-5.82 dBm. When the gain is somewhere around the noise floor, the measured gain is incredibly erratic, which throws the maximum standard deviation off significantly. I haven't isolated that to my test setup or the radio itself; I'm inclined to believe it's my test setup. The radio has a fairly even and wideband gain, and so long as you're operating between -70 dB and -55 dB, it is fairly linear as well.

Summary Of all my radios, the Ettus B210 has the highest output (in dBm) over the widest frequency range, but the HackRF is a close second, especially after the gain bump kicks in around 2.1 GHz. The PlutoSDR feels the most predictable and consistent, but also has a comparatively very low output - right around 0 dBm.
Name     | Max dBm | stdev dBm | stdev min dBm | stdev max dBm
HackRF   | +12.6   | +/-2.0    | +/-0.8        | +/-3.0
PlutoSDR | +3.3    | +/-2.0    | +/-0.9        | +/-3.7
B210     | +18.3   | +/-2.6    | +/-1.4        | +/-6.0

9 November 2021

Joachim Breitner: How to audit an Internet Computer canister

I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters ("canisters" are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me. In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.

Inter-canister calls The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:
  • Reading global state before issuing an inter-canister call, and assuming it to still hold when the call comes back.
  • Changing global state before issuing an inter-canister call, changing it again in the response handler, but assuming nothing else changes the state in between (reentrancy).
  • Changing global state before issuing an inter-canister call, and not handling failures correctly, e.g. when the code handling the callback rolls back.
If you find such patterns in your code, you should analyze whether a malicious party can trigger them, and assess the severity of the effect. These issues apply to all canisters, and are not Motoko-specific.
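The reentrancy pitfall in the second bullet is easiest to see outside the IC. Here is an analogous sketch of mine in Python asyncio (illustrative only, not canister code): two calls both pass a balance check before either deducts, because the state was read before a suspension point:

# Reentrancy across a suspension point, analogous to an inter-canister call.
import asyncio

balance = 100

async def withdraw(amount: int) -> bool:
    global balance
    if balance < amount:      # read global state...
        return False
    await asyncio.sleep(0)    # ...suspend, like awaiting another canister
    balance -= amount         # the check above may no longer hold!
    return True

async def main():
    print(await asyncio.gather(withdraw(80), withdraw(80)))  # [True, True]
    print(balance)                                           # -60

asyncio.run(main())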

Rollbacks Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not roll back state changes done before, while trapping (e.g. Debug.trap, a failed assert, or out-of-cycles conditions) does. Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw. These issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which don't roll back).

Talking to malicious canisters Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:
  • The other canister can withhold a response. Although the bidirectional messaging paradigm of the Internet Computer was designed to guarantee a response eventually, the other party can busy-loop for as long as they are willing to pay before responding. Worse, there are ways to deadlock a canister.
  • The other canister can respond with invalidly encoded Candid. This will cause a Motoko-implemented canister to trap in the reply handler, with no easy way to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap.
Many canisters do not even make inter-canister calls, or only call other trustworthy canisters. For the others, the impact of this needs to be carefully assessed.

Canister upgrade: overview For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:
  1. Can the canister be upgraded at all?
  2. Will the canister upgrade retain all data?
  3. Can the canister be upgraded promptly?
  4. Is there a recovery plan for when upgrading is not possible?

Canister upgradeability A canister that traps, for whatever reason, in its canister_preupgrade system method is no longer upgradeable. This is a major risk. The canister_preupgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory. Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely to be the first limit hit: the system imposes an instruction limit on upgrading a canister (spanning both canister_preupgrade and canister_postupgrade). This limit is a subnet configuration value, separate from (and likely higher than) the normal per-message limit, and not easily determined. If the canister's live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable. This risk cannot be eliminated completely, as long as Motoko and stable variables are used. It can be mitigated by appropriate load testing: install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister's live data from increasing above a certain size. If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet. An alternative mitigation is to avoid canister_preupgrade as much as possible. This means no use of stable var (or restricting it to small, fixed-size configuration data). All other data could be
  • mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
  • stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, it requires manual binary encoding of the data, and the API is marked experimental, so I cannot recommend this at the moment.
  • not put into a Motoko canister until Motoko has a scalable solution for stable variables (for example keeping them in stable memory permanently, with smart caching in main memory, thus obliterating the need for pre-upgrade code).

Data retention on upgrades Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow the following idiom:
stable var fooStable = …;
var foo = fooFromStable(fooStable);
system func preupgrade() { fooStable := fooToStable(foo); };
system func postupgrade() { fooStable := (empty); };
In this case, it is important to check that
  • All non-stable global vars, or global lets with mutable values, have a stable companion.
  • The assignments to foo and fooStable are not forgotten.
  • The functions fooToStable and fooFromStable form a bijection.
An example would be HashMaps stored as arrays via Iter.toArray(….entries()) and HashMap.fromIter(….vals()). It is worth pointing out that a code review will only look at a single version of the code, but cannot check whether code changes will preserve data on upgrade. This can easily go wrong if the names and types of stable variables are changed in an incompatible way. The upgrade may fail loudly in these cases, but in bad cases the upgrade may even succeed, losing data along the way. This risk needs to be mitigated by thorough testing, and possibly backups (see below).

Prompt upgrades Motoko and Rust canisters cannot be safely upgraded when they are still waiting for responses to inter-canister calls (the callback would eventually reach the new instance, and because of infelicities of the IC's System API, could possibly call arbitrary internal functions). Therefore, the canister needs to be stopped before upgrading, and started again afterwards. If the inter-canister calls take a long time, this means that upgrading may take a long time, which may be undesirable. Again, this risk is reduced if all calls are made to trustworthy canisters, and elevated when possibly untrustworthy canisters are called, directly or indirectly.

Backup and recovery Because of the above risks around upgrades it is advisable to have a disaster recovery strategy. This could involve off-chain backups of all relevant data, so that it is possible to reinstall (not upgrade) the canister and re-upload all data. Note that reinstall has the same issue as upgrade described above in "Prompt upgrades": the canister ought to be stopped first to be safe. Note that the instruction limit for messages, as well as the message size limit, limit the amount of data returned. If the canister needs to hold more data than that, the backup query method might have to return chunks or deltas, with all the extra complexity that entails, e.g. state changes between downloading chunks. If large data load testing is performed (as I recommend anyway, to test upgradeability), one can test whether the backup query method works within the resource limits.

Time is not strictly monotonic The timestamps for current time that the Internet Computer provides to its canisters are guaranteed to be monotonic, but not strictly monotonic. It can return the same values, even in different messages, as long as they are processed in the same block. It should therefore not be used to detect happens-before relations. Instead of using and comparing timestamps to check whether Y has been performed after X happened last, introduce an explicit var y_done : Bool state, which is set to false by X and then to true by Y. When things become more complex, it will be easier to model that state via an enumeration with descriptive tag names, and update this state machine along the way. Another solution to this problem is to introduce a var v : Nat counter that you bump in every update method, and after each await. Now v is your canister's state counter, and can be used like a timestamp in many ways. While we are talking about time: the system time (typically) changes across an await. So if you do let now = Time.now() and then await, the value in now may no longer be what you want.

Wrapping arithmetic The Nat64 data type, and the other fixed-width numeric types, provide opt-in wrapping arithmetic (e.g. +%, fromIntWrap). Unless explicitly required by the current application, this should be avoided, as usually a too-large or negative value is a serious, unrecoverable logic error, and trapping is the best one can do.

Cycle balance drain attacks Because of the IC's canister-pays model, all canisters are prone to DoS attacks by draining their cycle balance, and this risk needs to be taken into account. The most elementary mitigation strategy is to monitor the cycle balance of canisters and keep it far from the (configurable) freezing threshold. On the raw IC level, further mitigation strategies are possible:
  • If all update calls are authenticated, perform this authentication as quickly as possible, possibly before decoding the caller's argument. This way, a cycle drain attack by an unauthenticated attacker is less effective (but still possible).
  • Additionally, implementing the canister_inspect_message system method allows the above checks to be performed before the message is even accepted by the Internet Computer. But it does not defend against inter-canister messages and is therefore not a complete solution.
  • If an attack from an authenticated user (e.g. a stakeholder) is to be expected, the above methods are not effective, and an effective defense might require relatively involved additional program logic (e.g. per-caller statistics) to detect such an attack, and react (e.g. rate-limiting).
  • Such defenses are pointless if there is only a single method where they do not apply (e.g. an unauthenticated user registration method). If the application is inherently attackable this way, it is not worth the bother to raise defenses for other methods.
Related: a justification of why the Internet Identity does not use canister_inspect_message. A Motoko-implemented canister currently cannot perform most of these defenses: argument decoding happens unconditionally before any user code that may reject a message based on the caller, and canister_inspect_message is not supported. Furthermore, Candid decoding is not very cycle-defensive, and one should assume that it is possible to construct Candid messages that require many instructions to decode, even for simple argument type signatures. The conclusion for the audited canisters is to rely on monitoring to keep the cycle balance up, even during an attack, if the expense can be borne, and maybe pray for IC-level DoS protections to kick in.

Large data attacks Another DoS attack vector exists if public methods allow untrustworthy users to send data of unlimited size that is persisted in the canister memory. Because of the translation of async-await code into multiple message handlers, this applies not only to data that is obviously stored in global state, but also to local data that is live across an await point. The effectiveness of such attacks is limited by the Internet Computer's message size limit, which is on the order of a few megabytes, but many of those can add up. The problem becomes much worse if a method has an argument type that allows a Candid space bomb: it is possible to encode very large vectors with all values null in Candid, so if any method has an argument of type [Null] or [?t], a small message will expand to a large value in the Motoko heap. Other types to watch out for:
  • Nat and Int: These are unbounded numbers, and thus can be arbitrarily large. The Motoko representation will however not be much larger than the Candid encoding (so this does not qualify as a space bomb).
It is still advisable to check that the number is reasonable in size before storing it or doing an await. For example, when it denotes an index in an array, throw early if it exceeds the size of the array; if it denotes a token amount to transfer, check it against the available balance; if it denotes time, check it against reasonable bounds.
  • Principal: A Principal is effectively a Blob. The interface specification says that principals are at most 29 bytes in length, but the Motoko Candid decoder does not currently check that (this is fixed in the next version of Motoko). Until then, a Principal passed as an argument can be large (the principal in msg.caller is system-provided and thus safe). If you cannot wait for the fix to reach you, manually check the size of the principal (via Principal.toBlob) before doing the await.

Shadowing of msg or caller Don't use the same name for the message context of the enclosing actor and the methods of the canister: it is dangerous to write shared(msg) actor, because now msg is in scope across all public methods. As long as these also use public shared(msg) func …, and thus shadow the outer msg, it is correct, but if one accidentally omits or mistypes the msg, no compiler error would occur, and suddenly msg.caller would be the original controller, likely defeating an important authorization step. Instead, write shared(init_msg) actor or shared({caller = controller}) actor to avoid using msg.

Conclusion If you write a serious canister, whether in Motoko or not, it is worth going through the code and watching out for these patterns. Or better, have someone else review your code, as it may be hard to spot issues in your own code. Unfortunately, such a list is never complete, and there are surely more ways to screw up your code, in addition to all the non-IC-specific ways in which code can be wrong. Still, things get done out there, so best of luck!

6 November 2021

Reproducible Builds: Reproducible Builds in October 2021

Welcome to the October 2021 report from the Reproducible Builds project!
This month Samanta Navarro posted to the oss-security mailing list about a novel category of exploit in the .tar archive format, where a single .tar file contains different contents depending on the tar utility being used. Naturally, this has consequences for reproducible builds, as Samanta goes on to explain:

Arch Linux uses libarchive (bsdtar) in its build environment. The default tar program installed is GNU tar. It is possible to create a source distribution which leads to different files seen by the build environment than compared to a careful reviewer and other Linux distributions.
Samanta notes that addressing the tar utilities themselves will not be a sufficient fix:
I have submitted bug reports and patches to some projects but eventually I had to conclude that the problem itself cannot be fixed by these implementations alone. The best choice for these tools would be to only allow archives which are fully compatible to standards but this in turn would render a lot of archives broken.
Reproducible builds, with its twin ideas of reaching consensus on the build outputs as well as precisely recording and describing the build environment, would help address this problem at a higher level.
Codethink announced that they had achieved ISO-26262 ASIL D Tool Certification, a way of determining specific safety standards for software. Codethink used open source tooling to achieve this, but they also leverage:
Reproducibility, repeatability and traceability of builds, drawing heavily on best-practices championed by the Reproducible Builds project.

Elsewhere on the internet, according to a comment on Hacker News, Microsoft are now comparing NPM Javascript packages with their original source repositories:
I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source.

Lastly, Martin Monperrus started an interesting thread on our mailing list about GitHub, specifically that their autogenerated release tarballs are not deterministic. The thread generated a significant number of replies that are worth reading.

Events and presentations

Community news On our mailing list this month:
There were quite a few changes to the Reproducible Builds website and documentation this month as well, including Feng Chai updating some links on our publications page [ ] and marco updating our project metadata around the Bitcoin Core building guide [ ].
Lastly, we ran another productive meeting on IRC during October. A full set of notes from the meeting is available to view.

Distribution work Qubes was heavily featured in the latest edition of Linux Weekly News, and a significant section was dedicated to discussing reproducibility. For example, it was mentioned that the Qubes project "has been working on incorporating reproducible builds into its continuous integration (CI) infrastructure". But the LWN article goes on to describe that:
The current goal is to be able to build the Qubes OS Debian templates solely from packages that can be built reproducibly. Templates in Qubes OS are VM images that can be used to start an application qube quickly based on the template. The qube will have read-only access to the root filesystem of the template, so that the same root filesystem can be shared with multiple application qubes. There are official templates for several variants of both Fedora and Debian, as well as community maintained templates for several other distributions.
You can view the whole article on LWN, and Frédéric also published a lengthy summary about their work on reproducible builds in Qubes as well for those wishing to learn more.
In Debian this month, 133 reviews of Debian packages were added, 81 were updated and 24 were removed, adding to Debian's ever-growing knowledge about identified issues. A number of issues were categorised and added by Chris Lamb and Vagrant Cascadian too [ ][ ][ ]. In addition, work on an alternative snapshot service made progress thanks to Frédéric Pierret and Holger Levsen this month, including moving from the existing host (snapshot.notset.fr) to snapshot.reproducible-builds.org (more info); thanks to OSUOSL for the machine and hosting, and to Debian for the disks.
Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 186, 187, 188 and 189 to Debian:
  • New features:
    • Add support for Python Sphinx inventory files (usually named objects.inv on-disk). [ ]
    • Add support for comparing .pyc files. Thanks to Sergei Trofimovich for the inspiration. [ ]
    • Try some alternative suffixes (e.g. .py) to support distributions that strip or retain them. [ ][ ]
  • Bug fixes:
    • Fix Python decompilation tests under Python 3.10+ [ ] and for Python 3.7 [ ].
    • Don't raise a traceback if we cannot unmarshal Python bytecode. This is in order to support Python 3.7 failing to load .pyc files generated with newer versions of Python. [ ]
    • Skip Python bytecode testing where we do not have an expected diff. [ ]
  • Codebase improvements:
    • Use our file_version_is_lt utility instead of accepting both versions of uImage expected diff. [ ]
    • Split out a custom call to assert_diff for a .startswith equivalent. [ ]
    • Use skipif instead of manual conditionals in some tests. [ ]
In addition, Jelle van der Waa added external tool references for Arch Linux for ocamlobjinfo, openssl and ffmpeg [ ][ ][ ] and added Arch Linux as a Continuous Integration (CI) test target [ ], and Vagrant Cascadian updated the testsuite to skip Python bytecode comparisons when file(1) is older than 5.39 [ ] as well as added external tool references for the Guix distribution for dumppdf and ppudump [ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ]. Lastly, Guangyuan Yang updated the FreeBSD package name on the website [ ], Mattia Rizzolo made a change to override a new Lintian warning due to the new test files [ ], Roland Clobus added support to detect and log whether the GNU_BUILD_ID field in an ELF binary has been modified [ ], Sandro Jäckel updated a number of helpful links on the website [ ] and Sergei Trofimovich made the uImage test output support file(1) version 5.41 [ ].

reprotest reprotest is the Reproducible Builds project's end-user tool to build the same source code twice in widely-differing environments, checking the binaries produced by the builds for any differences. This month, reprotest version 0.7.18 was uploaded to Debian unstable by Holger Levsen. It included a change by Holger to clarify that Python 3.9 is used nowadays [ ], as well as two changes by Vasyl Gello to implement realistic CPU architecture shuffling [ ] and to log the selected variations when the verbosity is configured at a sufficiently high level [ ]. Finally, Vagrant Cascadian updated reprotest to version 0.7.18 in GNU Guix.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix unreproducible packages. We try to send all of our patches upstream where appropriate. We authored a large number of such patches this month, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Incorporate a fix from bremner into builtin-pho related to binary-NMUs. [ ]
      • Keep bullseye environments around longer, in an attempt to fix a Jenkins issue. [ ]
      • Improve the documentation of buildinfos.debian.net. [ ]
      • Improve documentation for the builtin-pho setup. [ ][ ]
    • OpenWrt-related changes:
      • Also use -j1 for better debugging. [ ]
      • Document that Python 3.x is now used. [ ]
      • Enable further debugging for the toolchain build. [ ]
    • New snapshot.reproducible-builds.org service:
      • Actually add new node. [ ][ ]
      • Install xfsprogs on snapshot.reproducible-builds.org. [ ]
      • Create account for fpierret on new node. [ ]
      • Run node_health_check job on new node too. [ ]
  • Mattia Rizzolo:
    • Debian-related changes:
      • Handle schroot errors when invoking diffoscope instead of masking them. [ ][ ]
      • Declare and define some variables separately to avoid masking the subshell return code. [ ]
      • Fix variable name. [ ]
      • Improve log reporting. [ ]
      • Execute apt-get update with the -q argument to get more decent logs. [ ]
      • Set the Debian HTTP mirror and proxy for snapshot.reproducible-builds.org. [ ]
      • Install the libarchive-tools package (instead of bsdtar) when updating Jenkins nodes. [ ]
    • Be stricter about errors when starting the node agent [ ] and don't overwrite NODE_NAME so that we can expect Jenkins to properly set it for us [ ].
    • Explicitly warn if the NODE_NAME is not a fully-qualified domain name (FQDN). [ ]
    • Document whether a node runs in the future. [ ]
    • Disable postgresql_autodoc as it is not available in bullseye. [ ]
    • Don't be so eager when deleting schroot internals; call schroot -e to terminate the schroots instead. [ ]
    • Only consider schroot underlays for deletion that are over a month old. [ ][ ]
    • Only try to unmount /proc if it's actually mounted. [ ]
    • Move the db_backup task to its own Jenkins job. [ ]
Lastly, Vasyl Gello added usage information to the reproducible_build.sh script [ ].

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

4 November 2021

Russell Coker: USB Microphones

The Situation I bought myself some USB microphones on ebay. I couldn't see any with USB type A connectors (the original USB connectors) so I bought ones with USB-C connectors. I thought it would be good to have microphones that could work with recent mobile phones and with PCs, because surely it wouldn't be difficult to get an adaptor. I tested one of the microphones, and it worked well on a phone. I bought a pair of adaptors from USB A ports on a PC or laptop to USB-C (here's the link to where I bought them). I used one of the adaptors with a USB-C HDMI device, which gave the following line from lsusb. I didn't try using a HDMI monitor on my laptop; having the device recognised was enough.
Bus 003 Device 002: ID 2109:0100 VIA Labs, Inc. USB 2.0 BILLBOARD
I tried connecting a USB-C microphone and Linux didn't recognise the existence of a USB device; I tried that on a PC and a laptop, on multiple ports. I wondered whether the description of the VIA BILLBOARD device as USB 2.0 was relevant to my problem. According to Big Mess O' Wires, USB-C has separate wires for USB 3.1 and USB 2 [1]. So someone could make a device that converts USB-A to USB-C with only the USB 2 wires in place. I tested the USB-A to USB-C adaptor with the HDMI device in a USB SuperSpeed (i.e. 3.x) port and it still identified as USB 2.0. I suspect that the USB-C HDMI device is using all the high speed wires for DisplayPort data (with a conversion to HDMI) and therefore looks like a USB 2.0 device.

The Problem I want to install a microphone in my workstation for long Zoom training sessions (7 hours in a day) that otherwise require me to use multiple Android devices, as I don't have a device that will do 7 hours of Zoom without running out of battery. A new workstation with USB-C is unreasonably expensive. A PCIe USB-C card would give me the port at the back of the machine, but I can't have the back of the machine near the microphone because it's too noisy. If I could have a USB-C hub with reasonable length cables (the 1M cables typical for USB 2.0 hubs would be fine) connected to a USB-C port at the back of my workstation, that would work. But there seems to be a great lack of USB-C hubs. NewBeDev has an informative post about the lack of USB-C hubs that have multiple USB-C ports [2]. There also seems to be a lack of USB-C hubs with cables longer than 20cm.

The Solution I ended up ordering a Sades Wand gaming headset [3], which has over-ear headphones and an attached microphone that connects to the computer via USB 2.0. I gave the URL for the sades.com.au web site for reference, but you will get a significantly better price by buying on ebay ($39+postage vs about $30 including postage). I guess I won't be using my new USB-C microphones for a while.

1 November 2021

Andreas Rönnquist: Debian packages, version numbers and pre-release versions

Getting the latest version of a package into Debian involves checking when there are new versions available - fortunately (and not surprisingly) Debian has tools to make this simpler. I have recently run into the problem where an upstream beta version sorts higher than a newer non-beta version. Which is problematic, of course. This is due to Debian sorting something like 1.0b as later than a pure 1.0 version.
gusnan@debian-i7:~ > dpkg --compare-versions 1.0b lt 1.0 && echo true
gusnan@debian-i7:~ > dpkg --compare-versions 1.0b gt 1.0 && echo true
true
But there's a solution: name the beta versions something like 1.0~beta. And you don't need to force upstream to make any changes either. You can use uscan and the watch file to make it interpret an upstream 1.0b version as 1.0~beta in Debian. This is done by using a line like
uversionmangle=s/(\d)[\_\.\-\+]?((RC|rc|pre|dev|beta|alpha|b|a)\d*)$/$1~$2/;s/\~b/\~beta/,\
in uversionmangle in your debian/watch file. In this case I have added something on the end to turn the ending ~b into ~beta instead. The full version of the watch file is available here.
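If you want to check such orderings yourself, dpkg --compare-versions (as above) works, and so does python-apt. A small sketch of mine, assuming the python3-apt package is installed:

# Verify the tilde trick: ~beta sorts before the plain version.
import apt_pkg

apt_pkg.init()

print(apt_pkg.version_compare("1.0b", "1.0"))      # positive: 1.0b is later
print(apt_pkg.version_compare("1.0~beta", "1.0"))  # negative: 1.0~beta is earlier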

26 October 2021

Russell Coker: Links October 2021

Bloomberg has an insightful article about Juniper, the NSA, and the compromise of Netscreen [1]. It was worse than we previously thought and the Chinese government was involved.

Haaretz has an amusing story about security issues at a credit card company based on a series of major WTFs [2]. They used WhatsApp for communicating with customers (despite the lack of support from Facebook for issues like account compromise), stored it on a phone (they should have used a desktop PC), didn't lock the phone down (it should have been in a locked case and bolted down like any other financial security device), and allowed it to get stolen. Fortunately the thief was only after a free phone, not the financial data stored on it.

David Brin wrote an insightful blog post Should facts and successes matter in economics? Or politics? [3] which is part of his series about challenging conservatives to bet on their policies.

Vice has an interesting article about a normal-looking USB-C to Lightning cable that intercepts data transfer and sends it out via an embedded Wifi AP [4]. Getting that into such a small space is an impressive engineering feat. The vendor already has a USB-A to Lightning cable with such features for $120 [5]. That's too expensive to just leave them lying around and hope that someone with interesting data finds them, but it's also quite cheap for a targeted attack.

Interesting article about tracking people via Bluetooth MAC address or device name [6]. Most of the research is based on a man riding a bike around Norway and passively sniffing Bluetooth transmissions. You can buy commercial devices that can receive Bluetooth from 1km away. A recent version of Bluetooth has random MAC addresses, but that still allows tracking by device name, which for many people is their own name.

Cory Doctorow has a good summary of the ways that Facebook is rotten [7]. It's worse than you think. In 2019 almost all of Facebook's top Christian pages were run by foreign troll farms [8]. This is partly due to Christians being gullible, but Facebook is also to blame for this.

Cornell has an interesting article about using CRISPR to identify the gender of chicken eggs before they hatch [9]. This means that instead of killing roosters hatched from eggs for egg production they can just put those eggs up for eating and save some money. Another option would be to genetically engineer more sexual dimorphism into chickens, as the real problem is that hens for laying eggs are too thin to be good for eating, so if you could have a breed of chicken with thin hens and fat cocks then all eggs could be hatched and the chickens used. The article claims that this is an ethical benefit of not killing baby roosters, but really it's about saving 50 cents per egg.

Umair Haque wrote an insightful article about why everything will get more expensive as the externalities dating back to the industrial revolution have to be paid for [9].

Alexei Navalny (the jailed Russian opposition politician who Putin tried to murder) wrote an insightful article about why corruption is at the root of most world problems and how to solve it [10].

Cory Doctorow wrote an insightful article about breaking in to the writing industry which can apply to starting in most careers [11]. The main point is that people who have established careers have knowledge about starting a career that's at best outdated and at worst totally irrelevant. Learning from people who are at most one step ahead of you is probably best.

Peter Wehner wrote an insightful article for The Atlantic about the way churches in the US are breaking apart due to political issues [12]. Similar things appear to be happening in Australia for the same reason: conservative fear-based politics which directly opposes everything in the Bible about Jesus is taking over churches. On the positive side this should destroy churches, and the way churches are currently going they should be destroyed.

The Guardian has an article about the incidence of reinfection with Covid19 [13]. The current expectation is that people who aren't vaccinated will probably get it about every 16 months if it becomes endemic (as it has in the US and will do in Australia if conservatives have their way). If the mortality rate is 2% each time then an unvaccinated person could expect a 15% chance of dying over the course of 10 years if there is no cumulative damage. However if damage to the heart and lungs accumulates over multiple courses of the disease then the probability of death over 10 years could be a lot higher.

Psyche has an interesting article by Professor Jan-Willem van Prooijen about the way that conspiracy theories bypass rationality [14]. The way that entertaining stories bypass rationality is particularly concerning given the way Facebook and other social media are driven by clickbait.

8 September 2021

Ian Jackson: Wanted: Rust sync web framework

tl;dr: Please recommend me a high-level Rust server-side web framework which is sync and does not plan to move to an async API.

Why Async Rust gives somewhat higher performance. But it is considerably less convenient and ergonomic than using threads for concurrency. Much work is ongoing to improve matters, but I doubt async Rust will ever be as easy to write as sync Rust. "Even" sync multithreaded Rust is very fast (and light on memory use) compared to many things people write web apps in. The vast majority of web applications do not need the additional performance (which is typically a percentage, not a factor). So it is rather disappointing to find that all the review articles I read, and all the web framework authors, seem to have assumed that async is the inevitable future. There should be room for both sync and async. Please let universal use of async not be the inevitable future!

What I would like A web framework that provides a sync API (something like Rocket 0.4's API would be ideal) and will remain sync. It should probably use (async) hyper underneath. So far I have not found a single web framework on crates.io that neither is already async nor suggests that its authors intend to move to an async API. Some review articles I found even excluded sync frameworks entirely! Answers in the comments please :-).


7 September 2021

Joachim Breitner: A Candid explainer: Opt is special

This is the third post in a series about the interface description language Candid.

The record extension problem Initially, the upgrade rules of Candid were a straight-forward application of the canonical subtyping rules. This worked and was sound, but it forbade one very commonly requested use case: extending records in argument position. The subtyping rule for records says that
   record { old_field : nat; new_field : int }
<: record { old_field : nat }
or, in words, that subtypes can have more fields than supertypes. This makes intuitive sense: a user of such a record value that only expects old_field to be there is not bothered if suddenly a new_field appears. But a user who expects new_field to be there is thrown off if it suddenly isn't anymore. Therefore it is ok to extend the records returned by a service with new fields, but not to extend the records in your methods' argument types. But users want to extend their record types over time, also in argument position! In fact, they often want to extend them in both positions. Consider a service with the following interface, where the CUser record appears both in argument and result position:
type CUser = record { login : text; name : text };
service C : {
  register_user : (CUser) -> ();
  get_user_data : (text) -> (CUser);
}
It seems quite natural to want to extend the record with a new field (say, the age of the user). But simply changing the definition of CUser to
type CUser = record { login : text; name : text; age : nat }
is not ok, because now register_user requires the age field, but old clients don't provide it. So how can we allow developers to make changes like this (because they really really want that), while keeping the soundness guarantees made by Candid (because we really really want that)? This question has bothered us for over two years, and we even created a meta issue that records the dozen approaches we considered, discussed and eventually ditched.

Dynamic subtyping checks in opt I will spare you the history lesson, though, and explain the solution we eventually came up with. In the example above it seems unreasonable to let the developer add a field age of type nat. Since there may be old clients around, the service unavoidably has to deal with records that don't have an age field. If the code expects an age value, what should it do in that case? But one can argue that changing the record type to
type CUser = record { login : text; name : text; age : opt nat }
could work: if the age field is present, use the value. If the field is absent, inject a null value during decoding. In the first-order case, this rather obvious solution would work just fine, and we'd be done. But Candid aims to solve the higher-order case, and I said things get tricky there, didn't I? Consider another, independent service D with the following interface:
type DUser = record { login : text; name : text };
service D : {
  on_user_added : (func (DUser) -> ()) -> ()
}
This service has a method that takes a method reference, presumably with the intention of calling it whenever a new user was added to service D. And because the types line up nicely, we can compose these two services, by once calling D.on_user_added(C.register_user), so that from now on D calls C.register_user(…). These kinds of service mesh-ups are central to the vision of the Internet Computer! But what happens if the services now evolve their types in different ways, e.g. towards
type CUser = record { login : text; name : text; age : opt nat }
and
type DUser = record { login : text; name : text; age : opt variant { under_age; adult } }
Individually, these are type changes that we want to allow. But now the call from D to C may transmit a value of record { …; age = opt variant { under_age } } when C expects a natural number! This is precisely what we want to prevent by introducing types, and it seems we have lost soundness. The best solution we could find is to make the opt type somewhat special, and apply these extra rules when decoding at an expected type of opt t.
  • If this was a record field, and the provided record value doesn't even have a field of that name, don't fail but instead treat this as null. This handles the first-order example above.
  • If there is a value, look at its type (which, in Candid, is part of the message).
    • If it is opt t' and t' <: t, then decode as normal.
    • If it is opt t' but t' <: t does not hold, then treat that as null. This should only happen in those relatively obscure higher-order cases where services evolved differently and incompatibly, and it makes sure that the calls that worked before the upgrades will continue to work afterwards. It is not advisable to actually rely on that rule when changing your service's interface. Tools that assist the developer with an upgrade check should prevent or warn about the use of this rule.
  • Not strictly required here, but since we are making opt special anyways: If its type t' is not of the form opt …, pretend it was opt t', and also pretend that the given value was wrapped in opt. This allows services to make record fields in arguments that were required in the old version optional in a new version, e.g. as a way to deprecate them. So it is mildly useful, although I can report that it makes the meta-theory and implementation rather complex, in particular together with equirecursion and beasts like type O = opt O. See this discussion for a glimpse of that. (A code sketch of these decoding rules follows below.)
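To make these rules a bit more tangible, here is a minimal sketch of a decoder obeying them, written in Haskell. This is emphatically not the actual Candid implementation: it assumes a hypothetical, drastically simplified representation of types and values, with no references, no recursion and no wire format:

-- Hypothetical stand-ins for Candid types and values.
data Type = TNat | TBool | TOpt Type | TRecord [(String, Type)]
  deriving (Eq, Show)

data Value = VNat Integer | VBool Bool | VNull | VOpt Value
           | VRecord [(String, Value)]
  deriving (Eq, Show)

-- Structural subtyping. Note the first clause: under the rules above,
-- every type is a subtype of every opt type.
sub :: Type -> Type -> Bool
sub _ (TOpt _)  = True
sub TNat TNat   = True
sub TBool TBool = True
sub (TRecord have) (TRecord want) = all ok want
  where ok (n, t) = maybe False (`sub` t) (lookup n have)
sub _ _ = False

-- Decode a value, received at type got, at the expected type want.
-- Nothing signals failure; the opt rules never fail, they degrade to null.
decode :: Type -> Type -> Value -> Maybe Value
decode (TOpt t) got v = Just $ case (got, v) of
  (TOpt t', VOpt w) | sub t' t -> maybe VNull VOpt (decode t t' w)
  (TOpt _,  _)                 -> VNull  -- null, or incompatible payload
  (t',      w) | sub t' t     -> maybe VNull VOpt (decode t t' w)  -- wrap
               | otherwise    -> VNull
decode (TRecord want) (TRecord got) (VRecord vs) =
  VRecord <$> traverse field want
  where
    field (n, t) = case (lookup n got, lookup n vs) of
      (Just t', Just v) -> (,) n <$> decode t t' v
      _ -> case t of
             TOpt _ -> Just (n, VNull)  -- absent field at opt type: null
             _      -> Nothing          -- genuinely missing field
decode want got v = if sub got want then Just v else Nothing

With these toy definitions, decode (TOpt TBool) (TOpt TNat) (VOpt (VNat 42)) yields Just VNull, mirroring the incompatibly-evolved services above, and decoding an empty record at a record type with an opt field produces a record whose field is null. Note also that sub t (TOpt t') holds for any t and t', a point the next section returns to.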
In the above I stress that we look at the type of the provided value, and not just the value. For example, if the sender sends the value opt vec { } at type opt vec nat, and the recipient expects opt vec bool, then this will decode as null, even though one could argue that the empty vector could easily be understood at type opt vec bool. We initially had that, but then noticed that this still breaks soundness when there are references inside, and so we have to do a full subtyping check in the decoder. This is very unfortunate, because it makes writing a Candid decoder a noticeably harder task that requires complicated graph algorithms with memoization (which, I must admit, Motoko has not implemented yet), but it's the least bad solution we could find so far.

Isn't that just dynamic typing?

You might have noticed that with these rules, t <: opt t' holds for all types t and t'. In other words, every opt is a top type (like reserved), thus all optional types are equivalent. One could argue that we are throwing all the benefits of static typing overboard this way. But it's not quite as bad: It's true that decoding Candid values now involves a dynamic check inserting null values in certain situations, but as long as everyone plays by the rules (i.e. upgrades their services according to the Candid safe upgrade rules, and heeds the warning alluded to above), these situations will only happen in the obscurest of cases. It is, by the way, not material to this solution that the special subtyping behavior is applied to the opt type, and a neighboring point in the design space would be a dedicated "upgraded t" type constructor. That would allow developers to use the canonical opt type with the usual subtyping rules and communicate their intent ("clients should consider this field optional" vs. "old clients don't know about this field, but new clients should use it") more cleanly, at the expense of having more non-canonical types in the type system. To see why it is convenient if Candid has mostly just the normal types, read the next post, which will describe how Candid can be integrated into a host language.

5 September 2021

Reproducible Builds: Reproducible Builds in August 2021

Welcome to the latest report from the Reproducible Builds project. In this post, we round up the important things that happened in the world of reproducible builds in August 2021. As always, if you are interested in contributing to the project, please visit the Contribute page on our website.
There were a large number of talks related to reproducible builds at DebConf21 this year, the 21st annual conference of the Debian Linux distribution (full schedule).
PackagingCon (@PackagingCon) is a new conference for developers of package management software as well as their related communities and stakeholders. The virtual event, which is scheduled to take place on the 9th and 10th November 2021, has a mission to "bring different ecosystems together: from Python's pip to Rust's cargo to Julia's Pkg, from Debian apt over Nix to conda and mamba, and from vcpkg to Spack we hope to have many different approaches to package management at the conference". A number of people from the reproducible builds community are planning on attending this new conference, and some may even present. Tickets start at $20 USD.
As reported in our May report, the president of the United States signed an executive order outlining policies aimed at improving cybersecurity in the US. The executive order comes after a number of highly-publicised security problems such as a ransomware attack that affected an oil pipeline between Texas and New York and the SolarWinds hack that affected a large number of US federal agencies. As a followup this month, a detailed fact sheet was released announcing a number of large-scale initiatives that will undoubtedly be related to software supply chain security and, as a result, reproducible builds.
Lastly, we ran another productive meeting on IRC in August (original announcement), which ran for just short of two hours. A full set of notes from the meeting is available.

Software development

kpcyrd announced an interesting new project this month called "I probably didn't backdoor this", which is an attempt to be:
a practical attempt at shipping a program and having reasonably solid evidence there's probably no backdoor. All source code is annotated and there are instructions explaining how to use reproducible builds to rebuild the artifacts distributed in this repository from source. The idea is shifting the burden of proof from "you need to prove there's a backdoor" to "we need to prove there's probably no backdoor". This repository is less about code (we're going to try to keep code at a minimum actually) and instead contains technical writing that explains why these controls are effective and how to verify them. You are very welcome to adopt the techniques used here in your projects.
As the project's README goes on to mention: "the techniques used to rebuild the binary artifacts are only possible because the builds for this project are reproducible". This was also announced on our mailing list this month in a thread titled "i-probably-didnt-backdoor-this: Reproducible Builds for upstreams". kpcyrd also wrote a detailed blog post about the problems surrounding Linux distributions (such as Alpine and Arch Linux) that distribute compiled Python bytecode in the form of .pyc files generated during the build process.

diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a number of changes, including releasing versions 180, 181 and 182, as well as the following changes:
  • New features:
    • Add support for extracting the signing block from Android APKs. [ ]
    • If we specify a suffix for a temporary file or directory within the code, ensure it starts with an underscore (i.e. "_") to make the generated filenames more human-readable. [ ]
    • Don't include short GCC lines that differ on a single prefix byte either. These are distracting, not very useful and are simply the strings(1) command's idea of the build ID, which is displayed elsewhere in the diff. [ ][ ]
    • Don t include specific .debug-like lines in the ELF-related output, as it is invariably a duplicate of the debug ID that exists better in the readelf(1) differences for this file. [ ]
  • Bug fixes:
    • Add a special case to SquashFS image extraction to not fail if we aren't the superuser. [ ]
    • Only use java -jar /path/to/apksigner.jar if we have an apksigner.jar as newer versions of apksigner in Debian use a shell wrapper script which will be rejected if passed directly to the JVM. [ ]
    • Reduce the maximum line length for calculating Wagner-Fischer, improving the speed of output generation a lot. [ ]
    • Don t require apksigner in order to compare .apk files using apktool. [ ]
    • Update calls (and tests) for the new version of odt2txt. [ ]
  • Output improvements:
    • Mention in the output if the apksigner tool is missing. [ ]
    • Profile diffoscope.diff.linediff and specialize. [ ][ ]
  • Logging improvements:
    • Format debug-level messages related to ELF sections using the diffoscope.utils.format_class. [ ]
    • Print the size of generated reports in the logs (if possible). [ ]
    • Include profiling information in --debug output if --profile is not set. [ ]
  • Codebase improvements:
    • Clarify a comment about the HUGE_TOOLS Python dictionary. [ ]
    • We can pass -f to apktool to avoid creating a strangely-named subdirectory. [ ]
    • Drop an unused File import. [ ]
    • Update the supported & minimum version of Black. [ ]
    • We don't use the logging variable in a specific place, so alias it to an underscore (i.e. "_") instead. [ ]
    • Update various copyright years. [ ]
    • Clarify a comment. [ ]
  • Test improvements:
    • Update a test to check specific contents of SquashFS listings, otherwise it fails depending on the test system's user ID to username passwd(5) mapping. [ ]
    • Assign seen and expected values to local variables to improve contextual information in failed tests. [ ]
    • Don t print an orphan newline when the source code formatting test passes. [ ]

In addition, Santiago Torres Arias added support for Squashfs version 4.5 [ ] and Felix C. Stegerman suggested a number of small improvements to the output of the new APK signing block [ ]. Lastly, Chris Lamb uploaded python-libarchive-c version 3.1-1 to Debian experimental for the new 3.x branch; python-libarchive-c is used by diffoscope.

Distribution work

In Debian, 68 reviews of packages were added, 33 were updated and 10 were removed this month, adding to our knowledge about identified issues. Two new issue types have been identified too: nondeterministic_ordering_in_todo_items_collected_by_doxygen and kodi_package_captures_build_path_in_source_filename_hash. kpcyrd published another monthly report on their work on reproducible builds within the Alpine and Arch Linux distributions, specifically mentioning rebuilderd, one of the components powering reproducible.archlinux.org. The report also touches on binary transparency, an important component for supply chain security. The @GuixHPC account on Twitter posted an infographic on what fraction of GNU Guix packages are bit-for-bit reproducible. Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE.

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches. Elsewhere, it was discovered that when supporting various new language features and APIs for Android apps, the resulting APK files that are generated now vary wildly from build to build (example diffoscope output). Happily, it appears that a patch has been committed to the relevant source tree. This was also discussed on our mailing list this month in a thread titled "Android desugaring and reproducible builds" started by Marcus Hoffmann.

Website and documentation

There were quite a few changes to the Reproducible Builds website and documentation this month, including:
  • Felix C. Stegerman:
    • Update the website self-build process to not use the buster-backports suite now that Debian Bullseye is the stable release. [ ]
  • Holger Levsen:
    • Add a new page documenting various package rebuilder solutions. [ ]
    • Add some historical talks and slides from DebConf20. [ ][ ]
    • Various improvements to the history page. [ ][ ][ ]
    • Rename the "Comparison protocol" documentation category to "Verification". [ ]
    • Update links to F-Droid documentation. [ ]
  • Ian Muchina:
    • Increase the font size of titles and de-emphasize event details on the talk page. [ ]
    • Rename the README file to README.md to improve the user experience when browsing the Git repository in a web browser. [ ]
  • Mattia Rizzolo:
    • Drop a position:fixed CSS statement that was interacting badly with some width settings. [ ]
    • Fix the sizing of the elements inside the side navigation bar. [ ]
    • Show gold level sponsors and above in the sidebar. [ ]
    • Updated the documentation within reprotest to mention how ldconfig conflicts with the kernel variation. [ ]
  • Roland Clobus:
    • Added a ticket number for the issue with the live Cinnamon image and diffoscope. [ ]

Testing framework

The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Make a large number of changes to support the new Debian bookworm release, including adding it to the dashboard [ ], starting to schedule tests [ ], adding suitable Apache redirects [ ] etc. [ ][ ][ ][ ][ ]
      • Make the first build use LANG=C.UTF-8 to match the official Debian build servers. [ ]
      • Only test Debian Live images once a week. [ ]
      • Upgrade all nodes to use Debian Bullseye [ ] [ ]
      • Update README documentation for the Debian Bullseye release. [ ]
    • Other changes:
      • Only include rsync output if the $DEBUG variable is enabled. [ ]
      • Don't try to install mock, a tool that was used to build Fedora packages some time ago. [ ]
      • Drop an unused function. [ ]
      • Various documentation improvements. [ ][ ]
      • Improve the node health check to detect zombie jobs. [ ]
  • Jessica Clarke (FreeBSD-related changes):
    • Update the location and branch name for the main FreeBSD Git repository. [ ]
    • Correctly ignore the source tarball when comparing build results. [ ]
    • Drop an outdated version number from the documentation. [ ]
  • Mattia Rizzolo:
    • Block F-Droid jobs from running whilst the setup is running. [ ]
    • Enable debugging for the rsync job related to Debian Live images. [ ]
    • Pass the BUILD_TAG and BUILD_URL environment variables for the Debian Live jobs. [ ]
    • Refactor the master_wrapper script to use a Bash array for the parameters. [ ]
    • Prefer YAML's safe_load() function over the unsafe variant. [ ]
    • Use the correct variable in the Apache config to match possible existing files on disk. [ ]
    • Stop issuing HTTP 301 redirects for things that are not actually permanent. [ ]
  • Roland Clobus (Debian live image generation):
    • Increase the diffoscope timeout from 120 to 240 minutes; the Cinnamon image should now be able to finish. [ ]
    • Use the new snapshot service. [ ]
    • Make a number of improvements to artifact handling, such as moving the artifacts to the Jenkins host [ ] and correctly cleaning them up at the right time. [ ][ ][ ]
    • Where possible, link to the Jenkins build URL that created the artifacts. [ ][ ]
    • Allow only one job to run at the same time. [ ]
  • Vagrant Cascadian:
    • Temporarily disable armhf nodes for DebConf21. [ ][ ]

Lastly, if you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

30 August 2021

Joachim Breitner: A Candid explainer: Safe higher-order upgrades

This is the second post in a series about the interface description language Candid.

Safe upgrades

A central idea behind Candid is that services evolve over time, and so their interfaces evolve, too. As they do, it is desirable to keep the interface usable by clients who have not been updated. This is of particular importance on a blockchainy platform like the Internet Computer, where some programs are immutable and cannot be changed to accommodate changes in the interfaces of the services they use. Therefore, Candid defines which changes to an interface are guaranteed to be backward compatible. Of course it's compatible to add new methods to a service, but some changes to a method signature can also be ok. For example, changing
service A1 : {
  get_value : (variant { current; previous : nat })
    -> (record { value : int; last_change : nat })
}
to
service A2 : {
  get_value : (variant { current; previous : nat; default })
    -> (record { value : int; last_change : nat; committed : bool })
}
is fine: It doesn't matter that clients that still use the old interface don't know about the new constructor of the argument variant. And the extra field in the result record will silently be ignored by old clients. In the Candid spec, this relation is written as A2 <: A1 (note the order!), and the formal footing this builds on is subtyping. We thus say that "it is safe to upgrade a service to a subtype", and that A2's interface is a subtype of A1's. In small examples, I often use nat and int, because every nat is also an int, but some int values are not nat values, namely the negative ones. We say nat is a subtype of int, nat <: int. So a function that in the first version returns an int can in the new version return a nat without breaking old clients. But the other direction doesn't work: If the old version's method had a return type of nat, then changing that to int would be a breaking change, as old clients would not be prepared to handle negative numbers. Note that arguments of function types follow different rules. In fact, the rules are simply turned around, and in argument position (also called "negative position"), we can evolve the interface to accept supertypes. Concretely, a function that originally expected a nat can be changed to expect an int, but the other direction doesn't work, because there might still be old clients around that send negative numbers. This reversing-of-the-rules is called contravariance. The vision is that the developer's tooling will warn the developer before they upgrade the code of a running service to a type that is not a subtype of the old interface. Let me stress, though, that all this is about not breaking existing clients that use the old interface. It does not mean that a client developer who fetches the new interface for your service won't have to change their code to make their programs compile again.

Higher order composition

So far, we considered the simple case of one service with a bunch of clients, i.e. the first-order situation. Things get much more interesting if we have multiple services that are composed, that pass around references to each other, and we still want everything to be nicely typed and to never fail even as we upgrade services. Therefore, Candid's type system can express that a service's method expects or returns a reference to another service or method, and the type that this service or method should have. For example
service B : { add_listener : (text, func (int) -> ()) -> () }
says that the service with interface B has a method called add_listener which takes two arguments, a plain value of type text and a reference to a function that itself expects an int-typed argument. The contravariance of the subtyping rules explained above also applies to the types of these function references. And because the int in the above type is on the left of two function arrows, the subtyping rule direction flips twice, and it is therefore safe to upgrade the service to accept the following interface:
service B : { add_listener : (text, func (nat) -> ()) -> () }
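To see the double flip in action, here is a hedged sketch in Haskell, using a hypothetical, drastically reduced type language (just nat, int, a unit type and single-argument functions; real Candid methods take sequences of arguments, and the text argument is omitted here):

-- Miniature stand-in for Candid types; TFunc a r is func (a) -> (r).
data Ty = TNat | TInt | TUnit | TFunc Ty Ty
  deriving (Eq, Show)

-- sub t1 t2 checks t1 <: t2.
sub :: Ty -> Ty -> Bool
sub TNat  TNat  = True
sub TInt  TInt  = True
sub TUnit TUnit = True
sub TNat  TInt  = True                  -- every nat is an int
sub (TFunc a1 r1) (TFunc a2 r2) =
  sub a2 a1 && sub r1 r2                -- arguments flip: contravariance
sub _ _ = False

-- add_listener, reduced to its callback argument, before and after:
oldB, newB :: Ty
oldB = TFunc (TFunc TInt TUnit) TUnit   -- (func (int) -> ()) -> ()
newB = TFunc (TFunc TNat TUnit) TUnit   -- (func (nat) -> ()) -> ()

Evaluating sub newB oldB yields True: the comparison flips once at the outer arrow and once more at the inner one, bottoming out at sub TNat TInt, which is exactly why the int argument may become a nat.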

Soundness theorem and Coq proof

The combination of these higher-order features and the safe upgrade mechanism is maybe the unique selling point of Candid, but also what made its design quite tricky at times. And although the conventional subtyping rules work well, we wanted to do some less conventional things (more on that below), and more than once thought we had a good solution, only to notice that it did not hold water in complicated higher-order cases. But what does it mean to hold water? I felt the need to precisely define a soundness criterion for higher-order interface description languages, which you can find in the document An IDL Soundness Proposition. This is a general framework which you can instantiate with your concrete IDL language and rules, and then it tells you what you have to prove to consider your language to be sound. Intuitively, it simulates all possible ways in which services can be defined and upgraded, and in which they can pass around references to each other, and the soundness property is then that all messages sent between services can always be understood. The document also shows, in general, that any system that builds on canonical subtyping is sound, as expected. That is still a generic theorem that you can instantiate with a concrete system, but now there is less to do. Because these proofs can get tricky in corner cases, it is valuable to mechanize them in an interactive theorem prover and have the computer check the proofs. So I have created a Coq formalization that defines the soundness criteria, including the reduction to canonical subtyping. It also defines MiniCandid, a greatly simplified variant of Candid, proves various properties about it (transitivity etc.) and finally instantiates the soundness theorem. I am particularly fond of the use of coinduction to model the equirecursive types, and of the use of named cases, as we know them from Isabelle, to make the proofs a bit more readable and maintainable. I am less fond of how much work it seems to be to extend MiniCandid with more of Candid's types. Already adding vec was more work than it was worth, and I defensively blame Coq's not-so-great support for nested recursion. The soundness relies on subtyping, and that is all fine and well as long as we stick to canonical rules. Unfortunately, practical considerations force us to deviate from these a bit, as we will see in the next post of this series.

Russell Coker: Links August 2021

Sciencealert has an interesting article on a game to combat misinformation by microdosing people [1]. The game seemed overly simplistic to me, but I guess I'm not the target demographic. Research shows it to work. Vice has an interesting and amusing article about mass walkouts of underpaid staff in the US [2]. The way that corporations are fighting an increase in the minimum wage doesn't seem financially beneficial for them. An increase in the minimum wage means small companies have to increase salaries too, and the ratio of revenue to payroll is probably worse for small companies. It seems that companies like McDonalds make oppressing their workers a higher priority than making a profit. Interesting article in Vice about how the company ShotSpotter (which determines the locations of gunshots by sound) forges evidence for US police [3]. All convictions based on ShotSpotter evidence should be declared mistrials. BitsNBites has an interesting article on the fundamental flaws of SIMD (Single Instruction Multiple Data) [4]. The Daily Dot has a disturbing article about the possible future of the QAnon movement [5]. Let's hope they become too busy fighting each other to hurt many innocent people. Ben Taylor wrote an interesting blog post suggesting that Web Assembly should be a default binary target [6]. I don't support that idea but I think that considering it is useful. Web Assembly could be used more for non-web things and it would be a better option than Node.js for some things. There are also some interesting corner cases like games; Minecraft was written in Java and there's no reason that Web Assembly couldn't do the same things. Vice has an interesting article about the Phantom encrypted phone service that ran on Blackberry handsets [7]. Australia really needs legislation based on the US RICO law! Vice has an interesting article about an encrypted phone company run by drug dealers [8]. Apparently after making an encrypted phone system for their own use they decided to sell it to others and made millions of dollars. They could have run a successful legal business. Salon has an insightful interview with Michael Petersen about his research on fake news and people who share it because they need chaos [9]. Apparently low-status people who are status-seeking are a main contributor to this; they share fake news knowingly to spread chaos. A society with less inequality would have fewer problems with fake news. Salon has another insightful interview with Michael Petersen, about his later research on fake news as an evolutionary strategy [10]. People knowingly share fake news to mobilise their supporters and to signal allegiance to their group. The more bizarre the beliefs are, the more strongly they signal allegiance. If an opposing group has a belief then they can show support for their group by having the opposite belief (e.g. by opposing vaccination if the other political side supports doctors). He also suggests that lying can be a way of establishing dominance; the more honest people are opposed by a lie, the more dominant the liar may seem. Vice has an amusing article about how police took over the Encrochat encrypted phone network that was mostly used by criminals [11]. It's amusing to read of criminals getting taken down like this. It's also interesting to note that the authorities messed up by breaking the wipe facility, which alerted the criminals that their security was compromised.
The investigation could have continued for longer if they hadn't changed the functionality of compromised phones. A later Vice article mentioned that the malware installed on Encrochat devices recorded MAC addresses of WiFi access points, which was used to locate the phones even though they had the GPS hardware removed. Cory Doctorow wrote an insightful article for Locus about the insufficient necessity of interoperability [12]. The problem of monopolies is not just an inability to interoperate with other services or to leave; it's losing control over your life. A few cartel participants interoperating will be able to do all the bad things to us that a single monopolist could do.

29 August 2021

Joachim Breitner: A Candid explainer: The rough idea

One of the technologies that was created by DFINITY as we built the Internet Computer is Candid, our interface description language. In this in-depth blog post series I want to shed some light onto various aspects of Candid. The target audience is mainly anyone who wants to deeply understand Candid, e.g. researchers, implementors of Candid tools, or anyone who wants to build a better alternative; no particular prior Candid knowledge is expected. Also, much of what is discussed here is independent of the Internet Computer. Some posts may be more theoretical or technical than others; if you are lost in one, I hope you'll rejoin for the subsequent post. Much in these posts is not relevant for developers who just want to use Candid in their projects, and also much that they want to know is missing. The Internet Computer dev docs will hopefully cater to that audience.

Blog post series overview
  1. The rough idea (this post) Announcing the blog post series, and very briefly outlining what an Interface Description Language even is.
  2. Safe higher-order upgrades Everyone can do first-order interface description languages. What makes Candid special is its support for higher-order service composition. This post also contains some general meta-theory and a Coq proof!
  3. Opt is special Extending records in argument position is surprisingly tricky. Learn why, and how we solved it.
  4. Language integration The point of Candid is inter-op, so let's look at the various patterns of connecting Candid to your favorite programming language.
  5. Quirks Maybe the most interesting post in this series: Candid has a bunch of quirks (hashed field names, no tuples but sequences, references that nobody used, and more). I'll explain some of them, why they are there, and what else we could have done. Beware: this post will be opinionated.

The rough idea

The idea of an interface description language is easy to begin with. Something like
service A : {
  add : (int) -> ();
  get : () -> (int);
}
clearly communicates that a service called A provides two methods, add and get, that add takes an integer as an argument, and that get returns such a number. Of course, this only captures the outer shape of the service, not what it actually does: we might assume it implements a counter of sorts, but that is not part of the Candid interface description. In order to now use the service, we also need a (low-level) transport mechanism. Candid itself does not specify that, but merely assumes that it exists, and is able to transport the name of the invoked method and a raw sequence of bytes to the service, and a response (again a raw sequence of bytes) back. On the Internet Computer alone, two distinct transport mechanisms are used: external calls via an HTTP-based RPC interface, and on-chain inter-canister calls. Both are handled by the service in the same way. Here we can see that Candid succeeds in abstracting over differences in the low-level transport mechanism. Because of this abstraction, it is possible to use Candid over other transport mechanisms as well (conventional HTTPS, e-mail, avian carrier). I think Candid has potential there as well, so even if you are not interested in the Internet Computer, this post may be interesting to you. The translation of argument and result values (e.g. numbers of type int) to the raw sequence of bytes is specified by Candid, defining a wire format. For example, the type int denotes whole numbers of arbitrary size, and they are encoded using the LEB128 scheme (a small encoding sketch follows at the end of this section). The Candid wire format also contains a magic number (DIDL, which is 0x4449444c in hex) and a type description (more on that later). So when passing the value 2342 to the add method, the raw bytes transported will be 0x4449444c00017da612. Of course we want to transfer more data than just integers, and thus Candid supports a fairly complete set of common basic types (nat, int, nat8, nat16, nat32, nat64, int8, int16, int32, int64, float32, float64, bool, text, blob) and composite types (vectors, optional values, records and variants). The design goal is to provide a canonical set of types: enough to express most data that you might want to pass, but no more than needed, so that different host languages can support these types easily. The Candid type system is structural. This means that two types are the same when they are defined the same way. Imagine two different services defining the same
type User = record { name : text; user_id : nat }
then although User is defined in two places, it's still the same type. In other words, these named type definitions are always just simple aliases, and what matters is their right-hand side. Because Candid types can be recursive (e.g. type Peano = opt Peano), this means we have an equirecursive type system, which makes some things relatively hard. So far, nothing too special about Candid compared to other interface definition languages (although a structural equirecursive type system is already something). In the next post we look at reference types, and how such higher-order features make Candid an interesting technology.
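Coming back to the wire format example above, here is a toy sketch, in Haskell, of the unsigned LEB128 encoding; this is purely illustrative and not any official implementation (Candid uses a signed variant for int values, but for 2342 both variants happen to produce the same bytes):

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Unsigned LEB128: seven bits per byte, least significant group first,
-- with the high bit set on every byte except the last.
leb128 :: Integer -> [Word8]
leb128 n
  | n < 0x80  = [fromIntegral n]
  | otherwise = fromIntegral ((n .&. 0x7f) .|. 0x80) : leb128 (n `shiftR` 7)

Here leb128 2342 evaluates to [0xa6, 0x12], which is exactly the trailing a612 of the example message above; the preceding bytes are the DIDL magic number and the type description.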

25 August 2021

Jelmer Vernooij: Thousands of Debian packages updated from their upstream Git repository

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. Linux distributions like Debian fulfill an important function in the FOSS ecosystem - they are system integrators that take existing free and open source software projects and adapt them where necessary to work well together. They also make it possible for users to install more software in an easy and consistent way and with some degree of quality control and review. One of the consequences of this model is that the distribution package often lags behind upstream releases. This is especially true for distributions that have tighter integration and standardization (such as Debian), and often new upstream code is only imported irregularly because it is a manual process - both updating the package, but also making sure that it still works together well with the rest of the system. The process of importing a new upstream used to be (well, back when I started working on Debian packages) fairly manual and something like this:

Ecosystem Improvements

However, there have been developments over the last decade that make it easier to import new upstream releases into Debian packages.
Uscan and Debian QA watch

Uscan and debian/watch have been around for a while and make it possible to find upstream tarballs. A debian/watch file usually looks something like this:
version=4
http://somesite.com/dir/filenamewithversion.tar.gz
The QA watch service regularly polls all watch locations in the archive and makes the information available, so it's possible to know which packages have changed without downloading each one of them.
Git

Git is fairly ubiquitous nowadays, and most upstream projects and packages in Debian use it. There are still exceptions that do not use any version control system or that use a different version control system, but they are becoming increasingly rare. [1]
debian/upstream/metadata

DEP-12 specifies a file format with metadata about the upstream project that a package was based on. Particularly relevant for our case is the fact that it has fields for the location of the upstream version control repository. debian/upstream/metadata files look something like this:
---
Repository: https://www.dulwich.io/code/dulwich/
Repository-Browse: https://www.dulwich.io/code/dulwich/
While DEP-12 is still a draft, it has already been widely adopted - there are about 10000 packages in Debian that ship a debian/upstream/metadata file with Repository information.
Autopkgtest

The Autopkgtest standard and associated tooling provide a way to run a defined set of tests against an installed package. This makes it possible to verify that a package is working correctly as part of the system as a whole. ci.debian.net regularly runs these tests against Debian packages to detect regressions.
Vcs-Git headers

The Vcs-Git headers in debian/control are the equivalent of the Repository field in debian/upstream/metadata, but for the packaging repositories (as opposed to the upstream ones). They've been around for a while and are widely adopted, as can be seen from zack's stats. The vcswatch service that regularly polls packaging repositories to see whether they have changed makes it a lot easier to consume this information in a usable way.
Debhelper adoption

Over the last couple of years, Debian has slowly been converging on a single build tool: debhelper's dh interface. Being able to rely on a single build tool makes it easier to write code to update packaging when upstream changes require it.
Debhelper DWIM

Debhelper (and its helpers) increasingly can figure out how to do the Right Thing in many cases without being explicitly configured. This makes packaging less effort, but also means that it's less likely that importing a new upstream version will require updates to the packaging. With all of these improvements in place, it actually becomes feasible in a lot of situations to update a Debian package to a new upstream version automatically. Of course, this requires that all of this information is available, so it won't work for all packages. In some cases, the packaging for the older upstream version might not apply to the newer upstream version. The Janitor has attempted to import a new upstream Git snapshot and a new upstream release for every package in the archive where a debian/watch file or debian/upstream/metadata file is present. These are the steps it uses:
  • Find new upstream version
    • If release, use debian/watch - or maybe a tag in the upstream repository
    • If snapshot, use debian/upstream/metadata's Repository field
    • If neither is available, use guess-upstream-metadata from upstream-ontologist to guess the upstream Repository
  • Merge upstream version into packaging repository, possibly importing tarballs using pristine-tar
  • Update the changelog file to mention the new upstream version
  • Run some checks to ensure there are no unintentional changes, e.g.:
    • Scan diff between old and new for surprising license changes
      • Today, abort if there are any - in the future, maybe update debian/copyright
    • Check for obvious compatibility breaks - e.g. sonames changing
  • Attempt to update the packaging to reflect upstream changes
    • Refresh patches
  • Attempt to build the package with deb-fix-build, to deal with any missing dependencies
  • Run the autopkgtests with deb-fix-build to deal with missing dependencies, and abort if any tests fail
Results

When run over all packages in unstable (sid), this process works for a surprising number of them.
Fresh Releases

For fresh-releases (aka imports of upstream releases), processing all packages maintained in Git for which QA watch reports new releases (about 11,000): that means about 2300 packages updated, and about 4000 unchanged.
Fresh Snapshots

For fresh-snapshots (aka imports of the latest Git commit from upstream), processing all packages maintained in Git (about 26,000): 5100 packages updated and 2100 for which there was nothing to do, i.e. no upstream commits since the last Debian upload. As can be seen, this works for a surprising fraction of packages. It's possible to get the numbers up even higher, by improving the tooling, the autopkgtests and the metadata that is provided by packages.
Using these packages

All the packages that have been built can be accessed from the Janitor APT repository. More information can be found at https://janitor.debian.net/fresh, but in short, run:
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-snapshots main | sudo tee /etc/apt/sources.list.d/fresh-snapshots.list
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-releases main | sudo tee /etc/apt/sources.list.d/fresh-releases.list
sudo curl -o /usr/share/keyrings/debian-janitor-archive-keyring.gpg https://janitor.debian.net/pgp_keys
apt update
And then you can install packages from the fresh-snapshots (upstream git snapshots) or fresh-releases suites on a case-by-case basis by running something like:
apt install -t fresh-snapshots r-cran-roxygen2
Most packages are updated based on information provided by vcswatch and QA watch, but it's also possible for upstream repositories to call a webhook to trigger a refresh of a package. These packages were built against unstable, but should in almost all cases also work for testing.
Caveats

Of course, since these packages are built automatically without human supervision, it's likely that some of them will have bugs in them that would otherwise have been caught by the maintainer.
[1] I'm not saying that a monoculture is great here, but it does help distributions.

Bits from Debian: DebConf21 welcomes its sponsors!

DebConf21 is taking place online, from 24 August to 28 August 2021. It is the 22nd Debian conference, and organizers and participants are working hard together at creating interesting and fruitful events. We would like to warmly welcome the 19 sponsors of DebConf21, and introduce them to you. We have five Platinum sponsors. Our first Platinum sponsor is Lenovo. As a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office and data center solutions, Lenovo understands how critical open systems and platforms are to a connected world. Our next Platinum sponsor is Infomaniak. Infomaniak is Switzerland's largest web-hosting company, also offering backup and storage services, solutions for event organizers, live-streaming and video on demand services. It wholly owns its datacenters and all elements critical to the functioning of the services and products provided by the company (both software and hardware). Roche is our third Platinum sponsor. Roche is a major international pharmaceutical provider and research company dedicated to personalized healthcare. More than 100,000 employees worldwide work towards solving some of the greatest challenges for humanity using science and technology. Roche is strongly involved in publicly funded collaborative research projects with other industrial and academic partners and has supported DebConf since 2017. Amazon Web Services (AWS) is our fourth Platinum sponsor. Amazon Web Services is one of the world's most comprehensive and broadly adopted cloud platforms, offering over 175 fully featured services from data centers globally (in 77 Availability Zones within 24 geographic regions). AWS customers include the fastest-growing startups, the largest enterprises and leading government agencies. Google is our fifth Platinum sponsor. Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products such as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf for more than ten years, and is also a Debian partner sponsoring parts of Salsa's continuous integration infrastructure within Google Cloud Platform. Our Gold sponsor is the Matanel Foundation. The Matanel Foundation operates in Israel, as its first concern is to preserve the cohesion of a society and a nation plagued by divisions. The Matanel Foundation also works in Europe, in Africa and in South America. Our Silver sponsors are: Arm (the World's Best SoC Design Portfolio; Arm-powered solutions have been supporting innovation for more than 30 years and are deployed in over 160 billion chips to date), Hudson Trading (a company researching and developing automated trading algorithms using advanced mathematical techniques), Ubuntu (the Operating System delivered by Canonical), Globo (the largest media conglomerate in Brazil, founded in Rio de Janeiro in 1925 and distributing high-quality content across multiple platforms), Two Sigma (rigorous inquiry, data analysis, and invention to help solve the toughest challenges across financial services) and GitLab (an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more). Bronze sponsors: Univention, Gandi.net, daskeyboard, InterFace AG and credativ. And finally, our Supporter level sponsor, ISG.EE.
Thanks to all our sponsors for their support! Their contributions make it possible for a large number of Debian contributors from all over the globe to work together, help and learn from each other in DebConf21.

Participating in DebConf21 online

The 22nd Debian Conference is being held online, due to COVID-19, from August 24 to 28, 2021. Talks, discussions, panels and other activities run from 12:00 to 02:00 UTC. Visit the DebConf21 website at https://debconf21.debconf.org to learn about the complete schedule, watch the live streaming and join the different communication channels for participating in the conference.

15 August 2021

Dirk Eddelbuettel: RcppBDT 0.2.4 on CRAN: Updates

After a seven-year break (!!), the RcppBDT package has been updated on CRAN. The RcppBDT package is an early adopter of Rcpp and was one of the first packages utilizing Boost and its Date_Time library. The now more widely-used package anytime is a direct descendant of RcppBDT. In fact, the last time RcppBDT was released, anytime did not yet exist. And some of the changes now finally released here in this version are some of the first steps made towards what became anytime. RcppBDT is broader in scope and provides a wider range of functionality, but in a somewhat rougher form, as we never sat down to write higher-end wrappers in R for all the potential use cases. When we wrote the first RcppBDT versions, many other popular date/time packages were all in R code and not compiled, and this package showed how things could be done at the compiled level. Now other packages, including anytime, have filled the void, so fully polishing RcppBDT may never happen. In any event, this release refreshes the package and brings it to full R CMD check --as-cran compliance. The NEWS entry follows:

Changes in version 0.2.4 (2021-08-15)
  • New utility function toPOSIXct which can take multiple input format (integer, floating point or character) vectors and can convert by relying on a wide variety of standard formats. This predates what has long been split off into the new package anytime, which is more functional and featureful.
  • New demo 'toPOSIXct' illustrating the feature.
  • New demo 'toPOSIXctTiming' benchmarking it.
  • Documentation for new functions was added as well.
  • CI now uses run.sh from r-ci.
  • Functions getLastDayOfWeekInMonth and getFirstDayOfWeekInMonth now use dow argument.
  • The shared library is now registered when loaded from NAMESPACE.
  • C level entry points are now registered as R now recommends.
  • Several badges were added to the README.md file.
  • Several fields were added to the DESCRIPTION file, and/or updated.
  • Documentation URLs were both updated as needed and converted to https.

Courtesy of my CRANberries, there is also a diffstat report for this release. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

4 August 2021

Dirk Eddelbuettel: x13binary 1.1.57-1 on CRAN: New Upstream, New M1 Binary

Christoph and I are pleased to share that a new release 1.1.57-1 of x13binary, providing the X-13ARIMA-SEATS program by the US Census Bureau (with updated upstream release 1.1.57), is now on CRAN. The x13binary package takes the pain out of installing X-13ARIMA-SEATS by making it a fully resolved CRAN dependency. For example, when installing the excellent seasonal package by Christoph, X-13ARIMA-SEATS will get pulled in via the x13binary package and things just work. Just depend on x13binary, and on all major OSs supported by R you should have an X-13ARIMA-SEATS binary installed which will be called seamlessly by the higher-level packages such as seasonal or gunsales. With this, the full power of what is likely the world's most sophisticated deseasonalization and forecasting package is at your fingertips and the R prompt, just like any other of the 17960+ CRAN packages. You can read more about this (and the seasonal package) in the Journal of Statistical Software paper by Christoph and myself. This release brings a new upstream release as well as new binaries. We continue to support two Linux flavours (the standard x86_64 as well as armv7l), Windows, and for the first time two macOS flavours: in addition to the existing Intel binary we now have a native build for the arm64 M1 chip (with thanks to Kirill for the assist). We still lack a genuine binary for Solaris, so if any of the esteemed readers of this post happens to have access to R on Solaris along with a basic Fortran compiler, we would love to hear from you. Building X-13ARIMA-SEATS from source on Solaris should be as straightforward as it is on the other OSs. Or if someone with a bit of time wants to help by following Gabor's tutorial, we would greatly appreciate it. Courtesy of my CRANberries, there is also a diffstat report for this release showing changes to the previous release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

16 July 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.6.0.0 on CRAN: A New Upstream

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 882 other packages on CRAN. This new release gets us Armadillo 10.6.0, which was released yesterday. We did the usual reverse dependency checks (which came out spotless and clean), and had also just done even fuller checks for Rcpp 1.0.7. Since the previous RcppArmadillo 0.10.5.0.0 release we made a few interim releases to the drat repo. In general, Conrad is a little more active than we want to be with (monthly or less frequent) CRAN updates, so keep an eye on the drat repo (or follow the GitHub repo) for a higher-frequency cadence. To use the drat repo, use install.packages("RcppArmadillo", repos="https://RcppCore.github.io/drat") or update.packages() with a similar repos argument. The full set of changes follows. We include the last interim release as well.

Changes in RcppArmadillo version 0.10.6.0.0 (2021-07-16)
  • Upgraded to Armadillo release 10.6.0 (Keep Calm)
    • expanded chol() to optionally use pivoted decomposition
    • expanded vector, matrix and cube constructors to allow element initialisation via fill::value(scalar), e.g. mat X(4,5,fill::value(123))
    • faster loading of CSV files when using OpenMP
    • added csv_opts::semicolon option to allow saving/loading of CSV files with semicolon (;) instead of comma (,) as the separator

Changes in RcppArmadillo version 0.10.5.3.0 (2021-07-01)
  • Upgraded to Armadillo release 10.5.3 (Antipodean Fortress)
  • GitHub-only release
  • Extended test coverage with several new tests, added a coverage badge.

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
