Search Results: "paultag"

2 February 2026

Paul Tagliamonte: Paging all Radio Curious Hackers

After years of thinking about and learning about how radios work, I figured it was high-time to start to more aggressively share the things i ve been learning. I had a ton of fun at DistrictCon year 0, so it was a pretty natural place to pitch an RF-focused introductory talk. I was selected for Year 1, and able to give my first ever RF related talk about how to set off restaurant pagers (including one on stage!) by reading and writing IQ directly using a little bit of stdlib only Python. This talk is based around the work I ve written about previously (here, here and here), but the all-in-one form factor was something I was hoping would help encourage folks out there to take a look under the hood of some of the gear around them. (In case the iframe above isn t working, direct link to the YouTube video recording is here) I ve posted my slides from the talk at PARCH.pdf to hopefully give folks some time to flip through them directly. All in all, the session was great It was truely humbling to see so many folks interested in hearing me talk about radios. I had a bit of an own-goal in picking a 20 minute form-factor, so the talk is paced wrong (it feels like it went way too fast). Hopefully being able to see the slides and pause the video is helpful. We had a short ad-hoc session after where I brought two sets of pagers and my power switch; but unfortunately we didn t have anyone who was able to trigger any of the devices on their own (due to a mix of time between sessions and computer set-up). Hopefully it was enough to get folks interested in trying this on their own!

27 October 2025

Paul Tagliamonte: It's NOT always DNS.

I ve written down a new rule (no name, sorry) that I ll be repeating to myself and those around me. If you can replace DNS with key value store mapping a name to an ip and it still makes sense, it was not, in fact, DNS. Feel free to repeat it along with me. Sure, the It s always DNS meme is funny the first few hundred times you see it but what s less funny is when critical thinking ends because a DNS query is involved. DNS failures are often the first observable problem because it s one of the first things that needs to be done. DNS is fairly complicated, implementation-dependent, and at times frustrating to debug but it is not the operational hazard it s made out to be. It s at best a shallow take, and at worst actively holding teams back from understanding their true operational risks. IP connectivity failures between a host and the rest of the network is not a reason to blame DNS. This would happen no matter how you distribute the updated name to IP mappings. Wiping out all the records during the course of operations due to an automation bug is not a reason to blame DNS. This, too, would happen no matter how you distribute the name to IP mappings. Something made the choice to delete all the mappings, and it did what you asked it to do There s plenty of annoying DNS specific sharp edges to blame when things do go wrong (like 8.8.8.8 and 1.1.1.1 disagreeing about resolving a domain because of DNSSEC, or since we re on the topic, a DNSSEC rollout bricking prod for hours) for us to be cracking jokes anytime a program makes a DNS request. We can do better.

10 July 2025

Tianon Gravi: Yubi Whati? (YubiKeys, ECDSA, and X.509)

Off-and-on over the last several weeks, I've been spending time trying to learn/understand YubiKeys better, especially from the perspective of ECDSA and signing. I had a good mental model for how "slots" work (canonically referenced by their hexadecimal names such as 9C), but found that it had a gap related to "objects"; while closing that, I was annoyed that the main reference table for this gap lives primarily in either a PDF or inside several implementations, so I figured I should create the reference I want to see in the world, but that it would also be useful to write down some of my understanding for my own (and maybe others') future reference. So, to that end, I'm going to start with a bit ( ) of background information, with the heavy caveat that this only applies to "PIV" ("FIPS 201") usage of YubiKeys, and that I only actually care about ECDSA, although I've been reassured that it's the same for at least RSA (anything outside this is firmly Here Be Not Tianon; "gl hf dd"). (Incidentally, learning all this helped me actually appreciate the simplicity of cloud-based KMS solutions, which was an unexpected side effect. ) At a really high level, ECDSA is like many other (asymmetric) cryptographic solutions you've got a public key and a private key, the private key can be used to "sign" data (tiny amounts of data, in fact, like P-256 can only reasonably sign 256 bits of data, which is where cryptographic hashes like SHA256 come in as secure analogues for larger data in small bit sizes), and the public key can then be used to verify that the data was indeed signed by the private key, and only someone with the private key could've done so. There's some complex math and RNGs involved, but none of that's actually relevant to this post, so find that information elsewhere. Unfortunately, this is where things go off the rails: PIV is X.509 ("x509") heavy, and there's no X.509 in the na ve view of my use case. In a YubiKey (or any other PIV-signing-supporting smart card? do they actually have competitors in this specific niche? ), a given "slot" can hold one single private key. There are ~24 slots which can hold a private key and be used for signing, although "Slot 9c" is officially designated as the "Digital Signature" slot and is encouraged for signing purposes. One of the biggest gotchas is that with pure-PIV (and older YubiKey firmware ) the public key for a given slot is only available at the time the key is generated, and the whole point of the device in the first place is that the private key is never, ever available from it (all cryptographic operations happen inside the device), so if you don't save that public key when you first ask the device to generate a private key in a particular slot, the public key is lost forever (asterisk).
$ # generate a new ECDSA P-256 key in "slot 9c" ("Digital Signature")
$ # WARNING: THIS WILL GLEEFULLY WIPE SLOT 9C WITHOUT PROMPTING
$ yubico-piv-tool --slot 9c --algorithm ECCP256 --action generate
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEtGoWRGyjjUlJFXpu8BL6Rnx8jjKR
5+Mzl2Vepgor+k7N9q7ppOtSMWefjFVR0SEPmXqXINNsCi6LpLtNEigIRg==
-----END PUBLIC KEY-----
Successfully generated a new private key.
$ # this is the only time/place we (officially) get this public key
With that background, now let's get to the second aspect of "slots" and how X.509 fits. For every aforementioned slot, there is a corresponding "object" (read: place to store arbitrary data) which is corresponding only by convention. For all these "key" slots the (again, by convention) corresponding "object" is explicitly supposed to be an X.509 certificate (see also the PDF reference linked above). It turns out this is a useful and topical place to store that public key we need to keep handy! It's also an interesting place to shove additional details about what the key in a given slot is being used for, if that's your thing. Converting the raw public key into a (likely self-signed) X.509 certificate is an exercise for the reader, but if you want to follow the conventions, you need some way to convert a given "slot" to the corresponding "object", and that is the lookup table I wish existed in more forms. So, without further ado, here is the anti-climax:
Slot Object Description
0x9A 0x5FC105 X.509 Certificate for PIV Authentication
0x9E 0x5FC101 X.509 Certificate for Card Authentication
0x9C 0x5FC10A X.509 Certificate for Digital Signature
0x9D 0x5FC10B X.509 Certificate for Key Management
0x82 0x5FC10D Retired X.509 Certificate for Key Management 1
0x83 0x5FC10E Retired X.509 Certificate for Key Management 2
0x84 0x5FC10F Retired X.509 Certificate for Key Management 3
0x85 0x5FC110 Retired X.509 Certificate for Key Management 4
0x86 0x5FC111 Retired X.509 Certificate for Key Management 5
0x87 0x5FC112 Retired X.509 Certificate for Key Management 6
0x88 0x5FC113 Retired X.509 Certificate for Key Management 7
0x89 0x5FC114 Retired X.509 Certificate for Key Management 8
0x8A 0x5FC115 Retired X.509 Certificate for Key Management 9
0x8B 0x5FC116 Retired X.509 Certificate for Key Management 10
0x8C 0x5FC117 Retired X.509 Certificate for Key Management 11
0x8D 0x5FC118 Retired X.509 Certificate for Key Management 12
0x8E 0x5FC119 Retired X.509 Certificate for Key Management 13
0x8F 0x5FC11A Retired X.509 Certificate for Key Management 14
0x90 0x5FC11B Retired X.509 Certificate for Key Management 15
0x91 0x5FC11C Retired X.509 Certificate for Key Management 16
0x92 0x5FC11D Retired X.509 Certificate for Key Management 17
0x93 0x5FC11E Retired X.509 Certificate for Key Management 18
0x94 0x5FC11F Retired X.509 Certificate for Key Management 19
0x95 0x5FC120 Retired X.509 Certificate for Key Management 20
See also "piv-objects.json" for a machine-readable copy of this data. (Major thanks to paultag and jon gzip johnson for helping me learn and generally putting up with me, but especially dealing with my live-stream-of-thoughts while I stumble through the dark. )

16 June 2025

Paul Tagliamonte: The Promised LAN

The Internet has changed a lot in the last 40+ years. Fads have come and gone. Network protocols have been designed, deployed, adopted, and abandoned. Industries have come and gone. The types of people on the internet have changed a lot. The number of people on the internet has changed a lot, creating an information medium unlike anything ever seen before in human history. There s a lot of good things about the Internet as of 2025, but there s also an inescapable hole in what it used to be, for me. I miss being able to throw a site up to send around to friends to play with without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my website, costing me thousands in network transfer for the privilege. I miss being able to put a lightly authenticated game server up and not worry too much at night wondering if that process is now mining bitcoin. I miss being able to run a server in my home closet. Decades of cat and mouse games have rendered running a mail server nearly impossible. Those who are brave enough to try are met with weekslong stretches of delivery failures and countless hours yelling ineffectually into a pipe that leads from the cheerful lobby of some disinterested corporation directly into a void somewhere 4 layers below ground level. I miss the spirit of curiosity, exploration, and trying new things. I miss building things for fun without having to worry about being too successful, after which security offices start demanding my supplier paperwork in triplicate as heartfelt thanks from their engineering teams. I miss communities that are run because it is important to them, not for ad revenue. I miss community operated spaces and having more than four websites that are all full of nothing except screenshots of each other. Every other page I find myself on now has an AI generated click-bait title, shared for rage-clicks all brought-to-you-by-our-sponsors completely covered wall-to-wall with popup modals, telling me how much they respect my privacy, with the real content hidden at the bottom bracketed by deceptive ads served by companies that definitely know which new coffee shop I went to last month. This is wrong, and those who have seen what was know it. I can t keep doing it. I m not doing it any more. I reject the notion that this is as it needs to be. It is wrong. The hole left in what the Internet used to be must be filled. I will fill it.

What comes before part b? Throughout the 2000s, some of my favorite memories were from LAN parties at my friends places. Dragging your setup somewhere, long nights playing games, goofing off, even building software all night to get something working being able to do something fiercely technical in the context of a uniquely social activity. It wasn t really much about the games or the projects it was an excuse to spend time together, just hanging out. A huge reason I learned so much in college was that campus was a non-stop LAN party we could freely stand up servers, talk between dorms on the LAN, and hit my dorm room computer from the lab. Things could go from individual to social in the matter of seconds. The Internet used to work this way my dorm had public IPs handed out by DHCP, and my workstation could serve traffic from anywhere on the internet. I haven t been back to campus in a few years, but I d be surprised if this were still the case. In December of 2021, three of us got together and connected our houses together in what we now call The Promised LAN. The idea is simple fill the hole we feel is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a space that is intrinsically social, even though we re doing technical things. We can freely host insecure game servers or one-off side projects without worrying about what someone will do with it. Over the years, it s evolved very slowly we haven t pulled any all-nighters. Our mantra has become old growth , building each layer carefully. As of May 2025, the LAN is now 19 friends running around 25 network segments. Those 25 networks are connected to 3 backbone nodes, exchanging routes and IP traffic for the LAN. We refer to the set of backbone operators as The Bureau of LAN Management . Combined decades of operating critical infrastructure has driven The Bureau to make a set of well-understood, boring, predictable, interoperable and easily debuggable decisions to make this all happen. Nothing here is exotic or even technically interesting.

Applications of trusting trust The hardest part, however, is rejecting the idea that anything outside our own LAN is untrustworthy nearly irreversible damage inflicted on us by the Internet. We have solved this by not solving it. We strictly control membership the absolute hard minimum for joining the LAN requires 10 years of friendship with at least one member of the Bureau, with another 10 years of friendship planned. Members of the LAN can veto new members even if all other criteria is met. Even with those strict rules, there s no shortage of friends that meet the qualifications but we are not equipped to take that many folks on. It s hard to join -both socially and technically. Doing something malicious on the LAN requires a lot of highly technical effort upfront, and it would endanger a decade of friendship. We have relied on those human, social, interpersonal bonds to bring us all together. It s worked for the last 4 years, and it should continue working until we think of something better. We assume roommates, partners, kids, and visitors all have access to The Promised LAN. If they re let into our friends' network, there is a level of trust that works transitively for us I trust them to be on mine. This LAN is not for security , rather, the network border is a social one. Benign hacking in the original sense of misusing systems to do fun and interesting things is encouraged. Robust ACLs and firewalls on the LAN are, by definition, an interpersonal not technical failure. We all trust every other network operator to run their segment in a way that aligns with our collective values and norms. Over the last 4 years, we ve grown our own culture and fads around half of the people on the LAN have thermal receipt printers with open access, for printing out quips or jokes on each other s counters. It s incredible how much network transport and a trusting culture gets you there s a 3-node IRC network, exotic hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and even a SIP phone network of redphones .

DIY We do not wish to, nor will we, rebuild the internet. We do not wish to, nor will we, scale this. We will never be friends with enough people, as hard as we may try. Participation hinges on us all having fun. As a result, membership will never be open, and we will never have enough connected LANs to deal with the technical and social problems that start to happen with scale. This is a feature, not a bug. This is a call for you to do the same. Build your own LAN. Connect it with friends homes. Remember what is missing from your life, and fill it in. Use software you know how to operate and get it running. Build slowly. Build your community. Do it with joy. Remember how we got here. Rebuild a community space that doesn t need to be mediated by faceless corporations and ad revenue. Build something sustainable that brings you joy. Rebuild something you use daily. Bring back what we re missing.

4 March 2025

Paul Tagliamonte: Reverse Engineering (another) Restaurant Pager system

Some of you may remember that I recently felt a bit underwhelmed by the last pager I reverse engineered the Retekess TD-158, mostly due to how intuitive their design decions were. It was pretty easy to jump to conclusions because they had made some pretty good decisions on how to do things. I figured I d spin the wheel again and try a new pager system this time I went for a SU-68G-10 pager, since I recognized the form factor as another fairly common unit I ve seen around town. Off to Amazon I went, bought a set, and got to work trying to track down the FCC filings on this model. I eventually found what seemed to be the right make/model, and it, once again, indicated that this system should be operating in the 433 MHz ISM band likely using OOK modulation. So, figured I d start with the center of the band (again) at 433.92 MHz, take a capture, test my luck, and was greeted with a now very familiar sight. Same as the last goarounds, except the premable here is a 0 symbol followed by 6-ish symbol durations of no data, followed by 25 bits of a packet. Careful readers will observe 26 symbols above after the preamble I did too! The last 0 in the screenshot above is not actually a part of the packet rather, it s part of the next packet s preamble. Each packet is packed in pretty tight.

By Hand Demodulation Going off the same premise as last time, I figured i d give it a manual demod and see what shakes out (again). This is now the third time i ve run this play, so check out either of my prior two posts for a better written description of what s going on here I ll skip all the details since i d just be copy-pasting from those posts into here. Long story short, I demodulated a call for pager 1, call for pager 10, and a power off command.
What Bits
Call 1 1101111111100100100000000
Call 101101111111100100010100000
Off 1101111111100111101101110
A few things jump out at me here the first 14 bits are fixed (in my case, 11011111111001), which means some mix of preamble, system id, or other system-wide constant. Additionally, The last 9 bits also look like they are our pager the 1 and 10 pager numbers (LSB bit order) jump right out (100000000 and 010100000, respectively). That just leaves the two remaining bits which look to be the action 00 for a Call , and 11 for a Power off . I don t super love this since command has two bits rather than one, the base station ID seems really long, and a 9-bit Pager ID is just weird. Also, what is up with that power-off pager id? Weird. So, let s go and see what we can do to narrow down and confirm things by hand.

Testing bit flips Rather than call it a day at that, I figure it s worth a bit of diligence to make sure it s all correct so I figured we should try sending packets to my pagers and see how they react to different messages after flipping bits in parts of the packet. I implemented a simple base station for the pagers using my Ettus B210mini, and threw together a simple OOK modulator and transmitter program which allows me to send specifically crafted test packets on frequency. Implementing the base station is pretty straightforward, because of the modulation of the signal (OOK), it s mostly a matter of setting a buffer to 1 and 0 for where the carrier signal is on or off timed to the sample rate, and sending that off to the radio. If you re interested in a more detailed writeup on the steps involved, there s a bit more in my christmas tree post. First off, I d like to check the base id. I want to know if all the bits in what I m calling the base id are truly part of the base station ID, or perhaps they have some other purpose (version, preamble?). I wound up following a three-step process for every base station id:
  • Starting with an unmodified call packet for the pager under test:
    • Flip the Nth bit, and transmit the call. See if the pager reacts.
    • Hold SET , and pair the pager with the new packet.
    • Transmit the call. See if the pager reacts.
    • After re-setting the ID, transmit the call with the physical base station, see if the pager reacts.
  • Starting with an unmodified off packet for the pager system
  • Flip the Nth bit, transmit the off, see if the pager reacts.
What wound up happening is that changing any bit in the first 14 bits meant that the packet no longer worked with any pager until it was re-paired, at which point it begun to work again. This likely means the first 14 bits are part of the base station ID and not static between base stations, or some constant like a version or something. All bits appear to be used. I repeated the same process with the command bits, and found that only 11 and 00 caused the pagers to react for the pager ids i ve tried. I repeated this process one last time with the pager id bits this time, and found the last bit in the packet isn t part of the pager ID, and can be either a 1 or a 0 and still cause the pager to react as if it were a 0. This means that the last bit is unknown but it has no impact on either a power off or call, and all messages sent by my base station always have a 0 set. It s not clear if this is used by anything likely not since setting a bit there doesn t result in any change of behavior I can see yet.

Final Packet Structure After playing around with flipping bits and testing, the final structure I was able to come up with based on behavior I was able to observe from transmitting hand-crafted packets and watching pagers buzz:
base id
command
pager id
???

Commands The command section bit comes in two flavors either a call or an off command.
Type Id (2 bits) Description
Call00Call the pager identified by the id in pager id
Off11Request pagers power off, pager id is always 10110111
As for the actual RF PHY characteristics, here s my best guesses at what s going on with them:
What Description
Center Frequency 433.92 MHz
Modulation OOK
Symbol Duration 1300us
Bits 25
Preamble 325us of carrier, followed by 8800us of no carrier
I m not 100% on the timings, but they appear to be close enough to work reliabily. Same with the center frequency, it s roughly right but there may be a slight difference i m missing.

Lingering Questions This was all generally pretty understandable another system that had some good decisions, and wasn t too bad to reverse engineer. This was a bit more fun to do, since there was a bit more ambiguity here, but still not crazy. At least this one was a bit more ambiguous that needed a bit of followup to confirm things, which made it a bit more fun. I am left with a few questions, though which I m kinda interested in understanding, but I ll likely need a lot more data and/or original source: Why is the command two bits here? This was a bit tough to understand because of the number of bits they have at their disposal given the one last bit at the end of the packet that doesn t seem to do anything, there s no reason this couldn t have been a 16 bit base station id, and an 8 bit pager id along with a single bit command (call or off). When sending an off why is power off that bit pattern? Other pager IDs don t seem to work with off , so it has some meaning, but I m not sure what that is. You press and hold 9 on the physical base station, but the code winds up coming out to 0xED, 237 or maybe -19 if it s signed. I can t quite figure out why it s this value. Are there other codes? Finally what s up with the last bit? Why is it 25 bits and not 24? It must take more work to process something that isn t 8 bit aligned and all for something that s not being used!

20 February 2025

Paul Tagliamonte: boot2kier

I can t remember exactly the joke I was making at the time in my work s slack instance (I m sure it wasn t particularly funny, though; and not even worth re-reading the thread to work out), but it wound up with me writing a UEFI binary for the punchline. Not to spoil the ending but it worked - no pesky kernel, no messing around with userland . I guess the only part of this you really need to know for the setup here is that it was a Severance joke, which is some fantastic TV. If you haven t seen it, this post will seem perhaps weirder than it actually is. I promise I haven t joined any new cults. For those who have seen it, the payoff to my joke is that I wanted my machine to boot directly to an image of Kier Eagan. As for how to do it I figured I d give the uefi crate a shot, and see how it is to use, since this is a low stakes way of trying it out. In general, this isn t the sort of thing I d usually post about except this wound up being easier and way cleaner than I thought it would be. That alone is worth sharing, in the hopes someome comes across this in the future and feels like they, too, can write something fun targeting the UEFI. First thing s first gotta create a rust project (I ll leave that part to you depending on your life choices), and to add the uefi crate to your Cargo.toml. You can either use cargo add or add a line like this by hand:
uefi =   version = "0.33", features = ["panic_handler", "alloc", "global_allocator"]  
We also need to teach cargo about how to go about building for the UEFI target, so we need to create a rust-toolchain.toml with one (or both) of the UEFI targets we re interested in:
[toolchain]
targets = ["aarch64-unknown-uefi", "x86_64-unknown-uefi"]
Unfortunately, I wasn t able to use the image crate, since it won t build against the uefi target. This looks like it s because rustc had no way to compile the required floating point operations within the image crate without hardware floating point instructions specifically. Rust tends to punt a lot of that to libm usually, so this isnt entirely shocking given we re nostd for a non-hardfloat target. So-called softening requires a software floating point implementation that the compiler can use to polyfill (feels weird to use the term polyfill here, but I guess it s spiritually right?) the lack of hardware floating point operations, which rust hasn t implemented for this target yet. As a result, I changed tactics, and figured I d use ImageMagick to pre-compute the pixels from a jpg, rather than doing it at runtime. A bit of a bummer, since I need to do more out of band pre-processing and hardcoding, and updating the image kinda sucks as a result but it s entirely manageable.
$ convert -resize 1280x900 kier.jpg kier.full.jpg
$ convert -depth 8 kier.full.jpg rgba:kier.bin
This will take our input file (kier.jpg), resize it to get as close to the desired resolution as possible while maintaining aspect ration, then convert it from a jpg to a flat array of 4 byte RGBA pixels. Critically, it s also important to remember that the size of the kier.full.jpg file may not actually be the requested size it will not change the aspect ratio, so be sure to make a careful note of the resulting size of the kier.full.jpg file. Last step with the image is to compile it into our Rust bianary, since we don t want to struggle with trying to read this off disk, which is thankfully real easy to do.
const KIER: &[u8] = include_bytes!("../kier.bin");
const KIER_WIDTH: usize = 1280;
const KIER_HEIGHT: usize = 641;
const KIER_PIXEL_SIZE: usize = 4;
Remember to use the width and height from the final kier.full.jpg file as the values for KIER_WIDTH and KIER_HEIGHT. KIER_PIXEL_SIZE is 4, since we have 4 byte wide values for each pixel as a result of our conversion step into RGBA. We ll only use RGB, and if we ever drop the alpha channel, we can drop that down to 3. I don t entirely know why I kept alpha around, but I figured it was fine. My kier.full.jpg image winds up shorter than the requested height (which is also qemu s default resolution for me) which means we ll get a semi-annoying black band under the image when we go to run it but it ll work. Anyway, now that we have our image as bytes, we can get down to work, and write the rest of the code to handle moving bytes around from in-memory as a flat block if pixels, and request that they be displayed using the UEFI GOP. We ll just need to hack up a container for the image pixels and teach it how to blit to the display.
/// RGB Image to move around. This isn't the same as an
///  image::RgbImage , but we can associate the size of
/// the image along with the flat buffer of pixels.
struct RgbImage  
/// Size of the image as a tuple, as the
 /// (width, height)
 size: (usize, usize),
/// raw pixels we'll send to the display.
 inner: Vec<BltPixel>,
 
impl RgbImage  
/// Create a new  RgbImage .
 fn new(width: usize, height: usize) -> Self  
RgbImage  
size: (width, height),
inner: vec![BltPixel::new(0, 0, 0); width * height],
 
 
/// Take our pixels and request that the UEFI GOP
 /// display them for us.
 fn write(&self, gop: &mut GraphicsOutput) -> Result  
gop.blt(BltOp::BufferToVideo  
buffer: &self.inner,
src: BltRegion::Full,
dest: (0, 0),
dims: self.size,
 )
 
 
impl Index<(usize, usize)> for RgbImage  
type Output = BltPixel;
fn index(&self, idx: (usize, usize)) -> &BltPixel  
let (x, y) = idx;
&self.inner[y * self.size.0 + x]
 
 
impl IndexMut<(usize, usize)> for RgbImage  
fn index_mut(&mut self, idx: (usize, usize)) -> &mut BltPixel  
let (x, y) = idx;
&mut self.inner[y * self.size.0 + x]
 
 
We also need to do some basic setup to get a handle to the UEFI GOP via the UEFI crate (using uefi::boot::get_handle_for_protocol and uefi::boot::open_protocol_exclusive for the GraphicsOutput protocol), so that we have the object we need to pass to RgbImage in order for it to write the pixels to the display. The only trick here is that the display on the booted system can really be any resolution so we need to do some capping to ensure that we don t write more pixels than the display can handle. Writing fewer than the display s maximum seems fine, though.
fn praise() -> Result  
let gop_handle = boot::get_handle_for_protocol::<GraphicsOutput>()?;
let mut gop = boot::open_protocol_exclusive::<GraphicsOutput>(gop_handle)?;
// Get the (width, height) that is the minimum of
 // our image and the display we're using.
 let (width, height) = gop.current_mode_info().resolution();
let (width, height) = (width.min(KIER_WIDTH), height.min(KIER_HEIGHT));
let mut buffer = RgbImage::new(width, height);
for y in 0..height  
for x in 0..width  
let idx_r = ((y * KIER_WIDTH) + x) * KIER_PIXEL_SIZE;
let pixel = &mut buffer[(x, y)];
pixel.red = KIER[idx_r];
pixel.green = KIER[idx_r + 1];
pixel.blue = KIER[idx_r + 2];
 
 
buffer.write(&mut gop)?;
Ok(())
 
Not so bad! A bit tedious we could solve some of this by turning KIER into an RgbImage at compile-time using some clever Cow and const tricks and implement blitting a sub-image of the image but this will do for now. This is a joke, after all, let s not go nuts. All that s left with our code is for us to write our main function and try and boot the thing!
#[entry]
fn main() -> Status  
uefi::helpers::init().unwrap();
praise().unwrap();
boot::stall(100_000_000);
Status::SUCCESS
 
If you re following along at home and so interested, the final source is over at gist.github.com. We can go ahead and build it using cargo (as is our tradition) by targeting the UEFI platform.
$ cargo build --release --target x86_64-unknown-uefi

Testing the UEFI Blob While I can definitely get my machine to boot these blobs to test, I figured I d save myself some time by using QEMU to test without a full boot. If you ve not done this sort of thing before, we ll need two packages, qemu and ovmf. It s a bit different than most invocations of qemu you may see out there so I figured it d be worth writing this down, too.
$ doas apt install qemu-system-x86 ovmf
qemu has a nice feature where it ll create us an EFI partition as a drive and attach it to the VM off a local directory so let s construct an EFI partition file structure, and drop our binary into the conventional location. If you haven t done this before, and are only interested in running this in a VM, don t worry too much about it, a lot of it is convention and this layout should work for you.
$ mkdir -p esp/efi/boot
$ cp target/x86_64-unknown-uefi/release/*.efi \
 esp/efi/boot/bootx64.efi
With all this in place, we can kick off qemu, booting it in UEFI mode using the ovmf firmware, attaching our EFI partition directory as a drive to our VM to boot off of.
$ qemu-system-x86_64 \
 -enable-kvm \
 -m 2048 \
 -smbios type=0,uefi=on \
 -bios /usr/share/ovmf/OVMF.fd \
 -drive format=raw,file=fat:rw:esp
If all goes well, soon you ll be met with the all knowing gaze of Chosen One, Kier Eagan. The thing that really impressed me about all this is this program worked first try it all went so boringly normal. Truly, kudos to the uefi crate maintainers, it s incredibly well done.

Booting a live system Sure, we could stop here, but anyone can open up an app window and see a picture of Kier Eagan, so I knew I needed to finish the job and boot a real machine up with this. In order to do that, we need to format a USB stick. BE SURE /dev/sda IS CORRECT IF YOU RE COPY AND PASTING. All my drives are NVMe, so BE CAREFUL if you use SATA, it may very well be your hard drive! Please do not destroy your computer over this.
$ doas fdisk /dev/sda
Welcome to fdisk (util-linux 2.40.4).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-4014079, default 2048):
Last sector, +/-sectors or +/-size K,M,G,T,P  (2048-4014079, default 4014079):
Created a new partition 1 of type 'Linux' and of size 1.9 GiB.
Command (m for help): t
Selected partition 1
Hex code or alias (type L to list all): ef
Changed type of partition 'Linux' to 'EFI (FAT-12/16/32)'.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Once that looks good (depending on your flavor of udev you may or may not need to unplug and replug your USB stick), we can go ahead and format our new EFI partition (BE CAREFUL THAT /dev/sda IS YOUR USB STICK) and write our EFI directory to it.
$ doas mkfs.fat /dev/sda1
$ doas mount /dev/sda1 /mnt
$ cp -r esp/efi /mnt
$ find /mnt
/mnt
/mnt/efi
/mnt/efi/boot
/mnt/efi/boot/bootx64.efi
Of course, naturally, devotion to Kier shouldn t mean backdooring your system. Disabling Secure Boot runs counter to the Core Principals, such as Probity, and not doing this would surely run counter to Verve, Wit and Vision. This bit does require that you ve taken the step to enroll a MOK and know how to use it, right about now is when we can use sbsign to sign our UEFI binary we want to boot from to continue enforcing Secure Boot. The details for how this command should be run specifically is likely something you ll need to work out depending on how you ve decided to manage your MOK.
$ doas sbsign \
 --cert /path/to/mok.crt \
 --key /path/to/mok.key \
 target/x86_64-unknown-uefi/release/*.efi \
 --output esp/efi/boot/bootx64.efi
I figured I d leave a signed copy of boot2kier at /boot/efi/EFI/BOOT/KIER.efi on my Dell XPS 13, with Secure Boot enabled and enforcing, just took a matter of going into my BIOS to add the right boot option, which was no sweat. I m sure there is a way to do it using efibootmgr, but I wasn t smart enough to do that quickly. I let er rip, and it booted up and worked great! It was a bit hard to get a video of my laptop, though but lucky for me, I have a Minisforum Z83-F sitting around (which, until a few weeks ago was running the annual http server to control my christmas tree ) so I grabbed it out of the christmas bin, wired it up to a video capture card I have sitting around, and figured I d grab a video of me booting a physical device off the boot2kier USB stick.
Attentive readers will notice the image of Kier is smaller then the qemu booted system which just means our real machine has a larger GOP display resolution than qemu, which makes sense! We could write some fancy resize code (sounds annoying), center the image (can t be assed but should be the easy way out here) or resize the original image (pretty hardware specific workaround). Additionally, you can make out the image being written to the display before us (the Minisforum logo) behind Kier, which is really cool stuff. If we were real fancy we could write blank pixels to the display before blitting Kier, but, again, I don t think I care to do that much work.

But now I must away If I wanted to keep this joke going, I d likely try and find a copy of the original video when Helly 100%s her file and boot into that or maybe play a terrible midi PC speaker rendition of Kier, Chosen One, Kier after rendering the image. I, unfortunately, don t have any friends involved with production (yet?), so I reckon all that s out for now. I ll likely stop playing with this the joke was done and I m only writing this post because of how great everything was along the way. All in all, this reminds me so much of building a homebrew kernel to boot a system into but like, good, though, and it s a nice reminder of both how fun this stuff can be, and how far we ve come. UEFI protocols are light-years better than how we did it in the dark ages, and the tooling for this is SO much more mature. Booting a custom UEFI binary is miles ahead of trying to boot your own kernel, and I can t believe how good the uefi crate is specifically. Praise Kier! Kudos, to everyone involved in making this so delightful .

12 November 2024

Paul Tagliamonte: Complex for Whom?

In basically every engineering organization I ve ever regarded as particularly high functioning, I ve sat through one specific recurring conversation which is not a conversation about complexity . Things are good or bad because they are or aren t complex, architectures needs to be redone because it s too complex some refactor of whatever it is won t work because it s too complex. You may have even been a part of some of these conversations or even been the one advocating for simple light-weight solutions. I ve done it. Many times. Rarely, if ever, do we talk about complexity within its rightful context complexity for whom. Is a solution complex because it s complex for the end user? Is it complex if it s complex for an API consumer? Is it complex if it s complex for the person maintaining the API service? Is it complex if it s complex for someone outside the team maintaining it to understand? Complexity within a problem domain I ve come to believe, is fairly zero-sum there s a fixed amount of complexity in the problem to be solved, and you can choose to either solve it, or leave it for those downstream of you to solve that problem on their own. That being said, while I believe there is a lower bound in complexity to contend with for a problem, I do not believe there is an upper bound to the complexity of solutions possible. It is always possible, and in fact, very likely that teams create problems for themselves while trying to solve a problem. The rest of this post is talking to the lower bound. When getting feedback on an early draft of this blog post, I ve been informed that Fred Brooks coined a term for what I call lower bound complexity Essential Complexity , in the paper No Silver Bullet Essence and Accident in Software Engineering , which is a better term and can be used interchangeably.

Complexity Culture In a large enough organization, where the team is high functioning enough to have and maintain trust amongst peers, members of the team will specialize. People will begin to engage with subsets of the work to be done, and begin to have their efficacy measured against that part of the organization s problems. Incentives shift, and over time it becomes increasingly likely that two engineers may have two very different priorities when working on the same system together. Someone accountable for uptime and tasked with responding to outages will begin to resist changes. Someone accountable for rapidly delivering features will resist gates between them and their users. Companies (either wittingly or unwittingly) will deal with this by tasking engineers with both production (feature development) and operational tasks (maintenance), so the difference in incentives isn t usually as bad as it could be. When we get a bunch of folks from far-flung corners of an organization in a room, fire up a slide deck and throw up some aspirational to-be architecture diagram in order to get a sign-off to solve some problem (be it someone needs a credible promotion packet, new feature needs to get delivered, or the system has begun to fail and needs fixing), the initial reaction will, more often than I d like, start to devolve into a discussion of how this is going to introduce a bunch of complexity, going to be hard to maintain, why can t you make it less complex? Right around here is when I start to try and contextualize the conversation happening around me understand what complexity is that being discussed, and understand who is taking on that burden. Think about who should be owning that problem, and work through the tradeoffs involved. Is it best solved here, or left to consumers (be them other systems, developers, or users). Should something become an API call s optional param, taking on all the edge-cases and on, or should users have to implement the logic using the data you return (leaving everyone else to take on all the edge-cases and maintenance)? Should you process the data, or require the user to preprocess it for you? Frequently it s right to make an active and explicit decision to simplify and leave problems to be solved downstream, since they may not actually need to be solved or perhaps you expect consumers will want to own the specifics of how the problem is solved, in which case you leave lots of documentation and examples. Many other times, especially when it s something downstream consumers are likely to hit, it s best solved internal to the system, since the only thing that can come of leaving it unsolved are bugs, frustration and half-correct solutions. This is a grey-space of tradeoffs, not a clear decision tree. No one wants the software manifestation of a katamari ball or a junk drawer, nor does anyone want a half-baked service unable to handle the simplest use-case.

Head-in-sand as a Service Popoffs about how complex something is, are, to a first approximation, best understood as meaning complicated for the person making comments . A lot of the #thoughtleadership believe that an AWS hosted EKS k8s cluster running images built by CI talking to an AWS hosted PostgreSQL RDS is not complex. They re right. Mostly right. This is less complex less complex for them. It s not, however, without complexity and its own tradeoffs it s just complexity that they do not have to deal with. Now they don t have to maintain machines that have pesky operating systems or hard drive failures. They don t have to deal with updating the version of k8s, nor ensuring the backups work. No one has to push some artifact to prod manually. Deployments happen unattended. You click a button and get a cluster. On the other hand, developers outside the ops function need to deal with troubleshooting CI, debugging access control rules encoded in turing complete YAML, permissions issues inside the cluster due to whatever the fuck a service mesh is, everyone needs to learn how to use some k8s tools they only actually use during a bad day, likely while doing some x.509 troubleshooting to connect to the cluster (an internal only endpoint; just port forward it) not to mention all sorts of rules to route packets to their project (a single repo s binary being run in 3 containers on a single vm host). Beyond that, there s the invisible complexity complexity on the interior of a service you depend on. I think about the dozens of teams maintaining the EKS service (which is either run on EC2 instances, or alternately, EC2 instances in a trench coat, moustache and even more shell scripts), the RDS service (also EC2 and shell scripts, but this time accounting for redundancy, backups, availability zones), scores of hypervisors pulled off the shelf (xen, kvm) smashed together with the ones built in-house (firecracker, nitro, etc) running on hardware that has to be refreshed and maintained continuously. Every request processed by network ACL rules, AWS IAM rules, security group rules, using IP space announced to the internet wired through IXPs directly into ISPs. I don t even want to begin to think about the complexity inherent in how those switches are designed. Shitloads of complexity to solve problems you may or may not have, or even know you had. What s more complex? An app running in an in-house 4u server racked in the office s telco closet in the back running off the office Verizon line, or an app running four hypervisors deep in an AWS datacenter? Which is more complex to you? What about to your organization? In total? Which is more prone to failure? Which is more secure? Is the complexity good or bad? What type of Complexity can you manage effectively? Which threaten the system? Which threaten your users?

COMPLEXIVIBES This extends beyond Engineering. Decisions regarding what tools are we able to use be them existing contracts with cloud providers, CIO mandated SaaS products, a list of the only permissible open source projects will incur costs in terms of expressed complexity . Pinning open source projects to a fixed set makes SBOM production less complex . Using only one SaaS provider s product suite (even if its terrible, because it has all the types of tools you need) makes accreditation less complex . If all you have is a contract with Pauly T s lowest price technically acceptable artisinal cloudary and haberdashery, the way you pay for your compute is less complex for the CIO shop, though you will find yourself building your own hosted database template, mechanism to spin up a k8s cluster, and all the operational and technical burden that comes with it. Or you won t and make it everyone else s problem in the organization. Nothing you can do will solve for the fact that you must now deal with this problem somewhere because it was less complicated for the business to put the workloads on the existing contract with a cut-rate vendor. Suddenly, the decision to reduce complexity because of an existing contract vehicle has resulted in a huge amount of technical risk and maintenance burden being onboarded. Complexity you would otherwise externalize has now been taken on internally. With a large enough organizations (specifically, in this case, i m talking about you, bureaucracies), this is largely ignored or accepted as normal since the personnel cost is understood to be free to everyone involved. Doing it this way is more expensive, more work, less reliable and less maintainable, and yet, somehow, is, in a lot of ways, less complex to the organization. It s particularly bad with bureaucracies, since screwing up a contract will get you into much more trouble than delivering a broken product, leaving basically no reason for anyone to care to fix this. I can t shake the feeling that for every story of technical mandates gone awry, somewhere just out of sight there s a decisionmaker optimizing for what they believe to be the least amount of complexity least hassle, fewest unique cases, most consistency as they can. They freely offload complexity from their accreditation and risk acceptance functions through mandates. They will never have to deal with it. That does not change the fact that someone does.

TC;DR (TOO COMPLEX; DIDN T REVIEW) We wish to rid ourselves of systemic Complexity after all, complexity is bad, simplicity is good. Removing upper-bound own-goal complexity ( accidental complexity in Brooks s terms) is important, but once you hit the lower bound complexity, the tradeoffs become zero-sum. Removing complexity from one part of the system means that somewhere else maybe outside your organization or in a non-engineering function must grow it back. Sometimes, the opposite is the case, such as when a previously manual business processes is automated. Maybe that s a good idea. Maybe it s not. All I know is that what doesn t help the situation is conflating complexity with everything we don t like legacy code, maintenance burden or toil, cost, delivery velocity.
  • Complexity is not the same as proclivity to failure. The most reliable systems I ve interacted with are unimaginably complex, with layers of internal protection to prevent complete failure. This has its own set of costs which other people have written about extensively.
  • Complexity is not cost. Sometimes the cost of taking all the complexity in-house is less, for whatever value of cost you choose to use.
  • Complexity is not absolute. Something simple from one perspective may be wildly complex from another. The impulse to burn down complex sections of code is helpful to have generally, but sometimes things are complicated for a reason, even if that reason exists outside your codebase or organization.
  • Complexity is not something you can remove without introducing complexity elsewhere. Just as not making a decision is a decision itself; choosing to require someone else to deal with a problem rather than dealing with it internally is a choice that needs to be considered in its full context.
Next time you re sitting through a discussion and someone starts to talk about all the complexity about to be introduced, I want to pop up in the back of your head, politely asking what does complex mean in this context? Is it lower bound complexity? Is this complexity desirable? Is what they re saying mean something along the lines of I don t understand the problems being solved, or does it mean something along the lines of this problem should be solved elsewhere? Do they believe this will result in more work for them in a way that you don t see? Should this not solved at all by changing the bounds of what we should accept or redefine the understood limits of this system? Is the perceived complexity a result of a decision elsewhere? Who s taking this complexity on, or more to the point, is failing to address complexity required by the problem leaving it to others? Does it impact others? How specifically? What are you not seeing? What can change? What should change?

14 June 2024

Paul Tagliamonte: Reverse Engineering a Restaurant Pager system

It s been a while since I played with something new been stuck in a bit of a rut with radios recently - working on refining and debugging stuff I mostly understand for the time being. The other day, I was out getting some food and I idly wondered how the restaurant pager system worked. Idle curiosity gave way to the realization that I, in fact, likely had the means and ability to answer this question, so I bought the first set of the most popular looking restaurant pagers I could find on eBay, figuring it d be a fun multi-week adventure.

Order up! I wound up buying a Retekess brand TD-158 Restaurant Pager System (they looked like ones I d seen before and seemed to be low-cost and popular), and quickly after, had a pack of 10 pagers and a base station in-hand. The manual stated that the radios operated at 433 MHz (cool! can do! Love a good ISM band device), and after taking an inital read through the manual for tips on the PHY, I picked out a few interesting things. First is that the base station ID was limited to 0-999, which is weird because it means the limiting factor is likely the base-10 display on the base station, not the protocol we need enough bits to store 999 at least 10 bits. Nothing else seemed to catch my eye, so I figured may as well jump right to it. Not being the type to mess with success, I did exactly the same thing as I did in my christmas tree post, and took a capture at 433.92MHz since it was in the middle of the band, and immediately got deja-vu. Not only was the signal at 433.92MHz, but throwing the packet into inspectrum gave me the identical plot of the OOK encoding scheme. Not just similar identical. The only major difference was the baud rate and bit structure of the packets, and the only minor difference was the existence of what I think is a wakeup preamble packet (of all zeros), rather than a preamble symbol that lasted longer than usual PHY symbol (which makes this pager system a bit easier to work with than my tree, IMHO). Getting down to work, I took some measurements to determine what the symbol duration was over the course of a few packets, I was able to determine the symbol rate was somewhere around 858 microseconds (0.000858 seconds per symbol), which is a weird number, but maybe I m slightly off or there s some larger math I m missing that makes this number satisfyingly round (internal low cost crystal clock or something? I assume this is some hardware constrait with the pager?) Anyway, good enough. Moving along, let s try our hand at a demod let s just assume it s all the same as the chrismas tree post and demod ones and zeros the same way here. That gives us 26 bits:
00001101110000001010001000
Now, I know we need at least 10 bits for the base station ID, some number of bits for the pager ID, and some bits for the command. This was a capture of me hitting call from a base station ID of 55 to a pager with the ID of 10, so let s blindly look for 10 bit chunks with the numbers we re looking for:
0000110111 0000001010 001000
Jeez. First try. 10 bits for the base station ID (55 in binary is 0000110111), 10 bits for the pager ID (10 in binary is 0000001010), which leaves us with 6 bits for a command (and maybe something else too?) which is 8 here. Great, cool, let s work off that being the case and revisit it if we hit bugs. Besides our data packet, there s also a preamble packet that I ll add in, in case it s used for signal detection or wakeup or something which is fairly easy to do since it s the same packet structure as the above, just all zeros. Very kind of them to leave it with the same number of bits and encoding scheme it s nice that it can live outside the PHY. Once I got here, I wrote a quick and dirty modulator, and was able to ring up pagers! Unmitigated success and good news only downside was that it took me a single night, and not the multi-week adventure I was looking for. Well, let s finish the job and document what we ve found for the sake of completeness.

Boxing everything up My best guess on the packet structure is as follows:
base id
argument
command
For a call or F2 operation, the argument is the Pager s ID code, but for other commands it s a value or an enum, depending. Here s a table of my by-hand demodulation of all the packet types the base station produces:
Type Cmd Id Description
Call8Call the pager identified by the id in argument
Off60Request any pagers on the charger power off when power is removed, argument is all zero
F240Program a pager to the specified Pager ID (in argument) and base station
F344Set the reminder duration in seconds specified in argument
F448Set the pager's beep mode to the one in argument (0 is disabled, 1 is slow, 2 is medium, 3 is fast)
F552Set the pager's vibration mode to the one in argument (0 is disabled, 1 is enabled)

Kitchen s closed for the night I m not going to be publishing this code since I can t think of a good use anyone would have for this besides folks using a low cost SDR and annoying local resturants; but there s enough here for folks who find this interesting to try modulating this protocol on their own hardware if they want to buy their own pack of pagers and give it a shot, which I do encourage! It s fun! Radios are great, and this is a good protocol to hack with it s really nice. All in all, this wasn t the multi-week adventure I was looking for, this was still a great exercise and a fun reminder that I ve come a far way from when I ve started. It felt a lot like cheating since I was able to infer a lot about the PHY because I d seen it before, but it was still a great time. I may grab a few more restaurant pagers and see if I can find one with a more exotic PHY to emulate next. I mean why not, I ve already got the thermal printer libraries working

13 April 2024

Paul Tagliamonte: Domo Arigato, Mr. debugfs

Years ago, at what I think I remember was DebConf 15, I hacked for a while on debhelper to write build-ids to debian binary control files, so that the build-id (more specifically, the ELF note .note.gnu.build-id) wound up in the Debian apt archive metadata. I ve always thought this was super cool, and seeing as how Michael Stapelberg blogged some great pointers around the ecosystem, including the fancy new debuginfod service, and the find-dbgsym-packages helper, which uses these same headers, I don t think I m the only one. At work I ve been using a lot of rust, specifically, async rust using tokio. To try and work on my style, and to dig deeper into the how and why of the decisions made in these frameworks, I ve decided to hack up a project that I ve wanted to do ever since 2015 write a debug filesystem. Let s get to it.

Back to the Future Time to admit something. I really love Plan 9. It s just so good. So many ideas from Plan 9 are just so prescient, and everything just feels right. Not just right like, feels good like, correct. The bit that I ve always liked the most is 9p, the network protocol for serving a filesystem over a network. This leads to all sorts of fun programs, like the Plan 9 ftp client being a 9p server you mount the ftp server and access files like any other files. It s kinda like if fuse were more fully a part of how the operating system worked, but fuse is all running client-side. With 9p there s a single client, and different servers that you can connect to, which may be backed by a hard drive, remote resources over something like SFTP, FTP, HTTP or even purely synthetic. The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9 in terms of adoption 9p is in all sorts of places folks don t usually expect. For instance, the Windows Subsystem for Linux uses the 9p protocol to share files between Windows and Linux. ChromeOS uses it to share files with Crostini, and qemu uses 9p (virtio-p9) to share files between guest and host. If you re noticing a pattern here, you d be right; for some reason 9p is the go-to protocol to exchange files between hypervisor and guest. Why? I have no idea, except maybe due to being designed well, simple to implement, and it s a lot easier to validate the data being shared and validate security boundaries. Simplicity has its value. As a result, there s a lot of lingering 9p support kicking around. Turns out Linux can even handle mounting 9p filesystems out of the box. This means that I can deploy a filesystem to my LAN or my localhost by running a process on top of a computer that needs nothing special, and mount it over the network on an unmodified machine unlike fuse, where you d need client-specific software to run in order to mount the directory. For instance, let s mount a 9p filesystem running on my localhost machine, serving requests on 127.0.0.1:564 (tcp) that goes by the name mountpointname to /mnt.
$ mount -t 9p \
-o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \
127.0.0.1 \
/mnt
Linux will mount away, and attach to the filesystem as the root user, and by default, attach to that mountpoint again for each local user that attempts to use it. Nifty, right? I think so. The server is able to keep track of per-user access and authorization along with the host OS.

WHEREIN I STYX WITH IT Since I wanted to push myself a bit more with rust and tokio specifically, I opted to implement the whole stack myself, without third party libraries on the critical path where I could avoid it. The 9p protocol (sometimes called Styx, the original name for it) is incredibly simple. It s a series of client to server requests, which receive a server to client response. These are, respectively, T messages, which transmit a request to the server, which trigger an R message in response (Reply messages). These messages are TLV payload with a very straight forward structure so straight forward, in fact, that I was able to implement a working server off nothing more than a handful of man pages. Later on after the basics worked, I found a more complete spec page that contains more information about the unix specific variant that I opted to use (9P2000.u rather than 9P2000) due to the level of Linux specific support for the 9P2000.u variant over the 9P2000 protocol.

MR ROBOTO The backend stack over at zoo is rust and tokio running i/o for an HTTP and WebRTC server. I figured I d pick something fairly similar to write my filesystem with, since 9P can be implemented on basically anything with I/O. That means tokio tcp server bits, which construct and use a 9p server, which has an idiomatic Rusty API that partially abstracts the raw R and T messages, but not so much as to cause issues with hiding implementation possibilities. At each abstraction level, there s an escape hatch allowing someone to implement any of the layers if required. I called this framework arigato which can be found over on docs.rs and crates.io.
/// Simplified version of the arigato File trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
trait File  
/// OpenFile is the type returned by this File via an Open call.
 type OpenFile: OpenFile;
/// Return the 9p Qid for this file. A file is the same if the Qid is
 /// the same. A Qid contains information about the mode of the file,
 /// version of the file, and a unique 64 bit identifier.
 fn qid(&self) -> Qid;
/// Construct the 9p Stat struct with metadata about a file.
 async fn stat(&self) -> FileResult<Stat>;
/// Attempt to update the file metadata.
 async fn wstat(&mut self, s: &Stat) -> FileResult<()>;
/// Traverse the filesystem tree.
 async fn walk(&self, path: &[&str]) -> FileResult<(Option<Self>, Vec<Self>)>;
/// Request that a file's reference be removed from the file tree.
 async fn unlink(&mut self) -> FileResult<()>;
/// Create a file at a specific location in the file tree.
 async fn create(
&mut self,
name: &str,
perm: u16,
ty: FileType,
mode: OpenMode,
extension: &str,
) -> FileResult<Self>;
/// Open the File, returning a handle to the open file, which handles
 /// file i/o. This is split into a second type since it is genuinely
 /// unrelated -- and the fact that a file is Open or Closed can be
 /// handled by the  arigato  server for us.
 async fn open(&mut self, mode: OpenMode) -> FileResult<Self::OpenFile>;
 
/// Simplified version of the arigato OpenFile trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
trait OpenFile  
/// iounit to report for this file. The iounit reported is used for Read
 /// or Write operations to signal, if non-zero, the maximum size that is
 /// guaranteed to be transferred atomically.
 fn iounit(&self) -> u32;
/// Read some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes read is
 /// returned.
 async fn read_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
/// Write some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes written
 /// is returned.
 fn write_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
 

Thanks, decade ago paultag! Let s do it! Let s use arigato to implement a 9p filesystem we ll call debugfs that will serve all the debug files shipped according to the Packages metadata from the apt archive. We ll fetch the Packages file and construct a filesystem based on the reported Build-Id entries. For those who don t know much about how an apt repo works, here s the 2-second crash course on what we re doing. The first is to fetch the Packages file, which is specific to a binary architecture (such as amd64, arm64 or riscv64). That architecture is specific to a component (such as main, contrib or non-free). That component is specific to a suite, such as stable, unstable or any of its aliases (bullseye, bookworm, etc). Let s take a look at the Packages.xz file for the unstable-debug suite, main component, for all amd64 binaries.
$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
  unxz
This will return the Debian-style rfc2822-like headers, which is an export of the metadata contained inside each .deb file which apt (or other tools that can use the apt repo format) use to fetch information about debs. Let s take a look at the debug headers for the netlabel-tools package in unstable which is a package named netlabel-tools-dbgsym in unstable-debug.
Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
So here, we can parse the package headers in the Packages.xz file, and store, for each Build-Id, the Filename where we can fetch the .deb at. Each .deb contains a number of files but we re only really interested in the files inside the .deb located at or under /usr/lib/debug/.build-id/, which you can find in debugfs under rfc822.rs. It s crude, and very single-purpose, but I m feeling a bit lazy.

Who needs dpkg?! For folks who haven t seen it yet, a .deb file is a special type of .ar file, that contains (usually) three files inside debian-binary, control.tar.xz and data.tar.xz. The core of an .ar file is a fixed size (60 byte) entry header, followed by the specified size number of bytes.
[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
First up was to implement a basic ar parser in ar.rs. Before we get into using it to parse a deb, as a quick diversion, let s break apart a .deb file by hand something that is a bit of a rite of passage (or at least it used to be? I m getting old) during the Debian nm (new member) process, to take a look at where exactly the .debug file lives inside the .deb file.
$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz   grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
Since we know quite a bit about the structure of a .deb file, and I had to implement support from scratch anyway, I opted to implement a (very!) basic debfile parser using HTTP Range requests. HTTP Range requests, if supported by the server (denoted by a accept-ranges: bytes HTTP header in response to an HTTP HEAD request to that file) means that we can add a header such as range: bytes=8-68 to specifically request that the returned GET body be the byte range provided (in the above case, the bytes starting from byte offset 8 until byte offset 68). This means we can fetch just the ar file entry from the .deb file until we get to the file inside the .deb we are interested in (in our case, the data.tar.xz file) at which point we can request the body of that file with a final range request. I wound up writing a struct to handle a read_at-style API surface in hrange.rs, which we can pair with ar.rs above and start to find our data in the .deb remotely without downloading and unpacking the .deb at all. After we have the body of the data.tar.xz coming back through the HTTP response, we get to pipe it through an xz decompressor (this kinda sucked in Rust, since a tokio AsyncRead is not the same as an http Body response is not the same as std::io::Read, is not the same as an async (or sync) Iterator is not the same as what the xz2 crate expects; leading me to read blocks of data to a buffer and stuff them through the decoder by looping over the buffer for each lzma2 packet in a loop), and tarfile parser (similarly troublesome). From there we get to iterate over all entries in the tarfile, stopping when we reach our file of interest. Since we can t seek, but gdb needs to, we ll pull it out of the stream into a Cursor<Vec<u8>> in-memory and pass a handle to it back to the user. From here on out its a matter of gluing together a File traited struct in debugfs, and serving the filesystem over TCP using arigato. Done deal!

A quick diversion about compression I was originally hoping to avoid transferring the whole tar file over the network (and therefore also reading the whole debug file into ram, which objectively sucks), but quickly hit issues with figuring out a way around seeking around an xz file. What s interesting is xz has a great primitive to solve this specific problem (specifically, use a block size that allows you to seek to the block as close to your desired seek position just before it, only discarding at most block size - 1 bytes), but data.tar.xz files generated by dpkg appear to have a single mega-huge block for the whole file. I don t know why I would have expected any different, in retrospect. That means that this now devolves into the base case of How do I seek around an lzma2 compressed data stream ; which is a lot more complex of a question. Thankfully, notoriously brilliant tianon was nice enough to introduce me to Jon Johnson who did something super similar adapted a technique to seek inside a compressed gzip file, which lets his service oci.dag.dev seek through Docker container images super fast based on some prior work such as soci-snapshotter, gztool, and zran.c. He also pulled this party trick off for apk based distros over at apk.dag.dev, which seems apropos. Jon was nice enough to publish a lot of his work on this specifically in a central place under the name targz on his GitHub, which has been a ton of fun to read through. The gist is that, by dumping the decompressor s state (window of previous bytes, in-memory data derived from the last N-1 bytes) at specific checkpoints along with the compressed data stream offset in bytes and decompressed offset in bytes, one can seek to that checkpoint in the compressed stream and pick up where you left off creating a similar block mechanism against the wishes of gzip. It means you d need to do an O(n) run over the file, but every request after that will be sped up according to the number of checkpoints you ve taken. Given the complexity of xz and lzma2, I don t think this is possible for me at the moment especially given most of the files I ll be requesting will not be loaded from again especially when I can just cache the debug header by Build-Id. I want to implement this (because I m generally curious and Jon has a way of getting someone excited about compression schemes, which is not a sentence I thought I d ever say out loud), but for now I m going to move on without this optimization. Such a shame, since it kills a lot of the work that went into seeking around the .deb file in the first place, given the debian-binary and control.tar.gz members are so small.

The Good First, the good news right? It works! That s pretty cool. I m positive my younger self would be amused and happy to see this working; as is current day paultag. Let s take debugfs out for a spin! First, we need to mount the filesystem. It even works on an entirely unmodified, stock Debian box on my LAN, which is huge. Let s take it for a spin:
$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
And, let s prove to ourselves that this actually mounted before we go trying to use it:
$ mount   grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
Slick. We ve got an open connection to the server, where our host will keep a connection alive as root, attached to the filesystem provided in aname. Let s take a look at it.
$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
Outstanding. Let s try using gdb to debug a binary that was provided by the Debian archive, and see if it ll load the ELF by build-id from the right .deb in the unstable-debug suite:
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Yes! Yes it will!
$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped

The Bad Linux s support for 9p is mainline, which is great, but it s not robust. Network issues or server restarts will wedge the mountpoint (Linux can t reconnect when the tcp connection breaks), and things that work fine on local filesystems get translated in a way that causes a lot of network chatter for instance, just due to the way the syscalls are translated, doing an ls, will result in a stat call for each file in the directory, even though linux had just got a stat entry for every file while it was resolving directory names. On top of that, Linux will serialize all I/O with the server, so there s no concurrent requests for file information, writes, or reads pending at the same time to the server; and read and write throughput will degrade as latency increases due to increasing round-trip time, even though there are offsets included in the read and write calls. It works well enough, but is frustrating to run up against, since there s not a lot you can do server-side to help with this beyond implementing the 9P2000.L variant (which, maybe is worth it).

The Ugly Unfortunately, we don t know the file size(s) until we ve actually opened the underlying tar file and found the correct member, so for most files, we don t know the real size to report when getting a stat. We can t parse the tarfiles for every stat call, since that d make ls even slower (bummer). Only hiccup is that when I report a filesize of zero, gdb throws a bit of a fit; let s try with a size of 0 to start:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
This obviously won t work since gdb will throw away all our hard work because of stat s output, and neither will loading the real size of the underlying file. That only leaves us with hardcoding a file size and hope nothing else breaks significantly as a result. Let s try it again:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Much better. I mean, terrible but better. Better for now, anyway.

Kilroy was here Do I think this is a particularly good idea? I mean; kinda. I m probably going to make some fun 9p arigato-based filesystems for use around my LAN, but I don t think I ll be moving to use debugfs until I can figure out how to ensure the connection is more resilient to changing networks, server restarts and fixes on i/o performance. I think it was a useful exercise and is a pretty great hack, but I don t think this ll be shipping anywhere anytime soon. Along with me publishing this post, I ve pushed up all my repos; so you should be able to play along at home! There s a lot more work to be done on arigato; but it does handshake and successfully export a working 9P2000.u filesystem. Check it out on on my github at arigato, debugfs and also on crates.io and docs.rs. At least I can say I was here and I got it working after all these years.

22 January 2024

Paul Tagliamonte: Writing a simulator to check phased array beamforming

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

If you're on the Fediverse, I'd very much appreciate boosts on my toot!
While working on hz.tools, I started to move my beamforming code from 2-D (meaning, beamforming to some specific angle on the X-Y plane for waves on the X-Y plane) to 3-D. I ll have more to say about that once I get around to publishing the code as soon as I m sure it s not completely wrong, but in the meantime I decided to write a simple simulator to visually check the beamformer against the textbooks. The results were pretty rad, so I figured I d throw together a post since it s interesting all on its own outside of beamforming as a general topic. I figured I d write this in Rust, since I ve been using Rust as my primary language over at zoo, and it s a good chance to learn the language better.
This post has some large GIFs

It make take a little bit to load depending on your internet connection. Sorry about that, I'm not clever enough to do better without doing tons of complex engineering work. They may be choppy while they load or something. I tried to compress an ensmall them, so if they're loaded but fuzzy, click on them to load a slightly larger version.
This post won t cover the basics of how phased arrays work or the specifics of calculating the phase offsets for each antenna, but I ll dig into how I wrote a simple simulator and how I wound up checking my phase offsets to generate the renders below.

Assumptions I didn t want to build a general purpose RF simulator, anything particularly generic, or something that would solve for any more than the things right in front of me. To do this as simply (and quickly all this code took about a day to write, including the beamforming math) I had to reduce the amount of work in front of me. Given that I was concerend with visualizing what the antenna pattern would look like in 3-D given some antenna geometry, operating frequency and configured beam, I made the following assumptions: All anetnnas are perfectly isotropic they receive a signal that is exactly the same strength no matter what direction the signal originates from. There s a single point-source isotropic emitter in the far-field (I modeled this as being 1 million meters away 1000 kilometers) of the antenna system. There is no noise, multipath, loss or distortion in the signal as it travels through space. Antennas will never interfere with each other.

2-D Polar Plots The last time I wrote something like this, I generated 2-D GIFs which show a radiation pattern, not unlike the polar plots you d see on a microphone. These are handy because it lets you visualize what the directionality of the antenna looks like, as well as in what direction emissions are captured, and in what directions emissions are nulled out. You can see these plots on spec sheets for antennas in both 2-D and 3-D form. Now, let s port the 2-D approach to 3-D and see how well it works out.

Writing the 3-D simulator As an EM wave travels through free space, the place at which you sample the wave controls that phase you observe at each time-step. This means, assuming perfectly synchronized clocks, a transmitter and receiver exactly one RF wavelength apart will observe a signal in-phase, but a transmitter and receiver a half wavelength apart will observe a signal 180 degrees out of phase. This means that if we take the distance between our point-source and antenna element, divide it by the wavelength, we can use the fractional part of the resulting number to determine the phase observed. If we multiply that number (in the range of 0 to just under 1) by tau, we can generate a complex number by taking the cos and sin of the multiplied phase (in the range of 0 to tau), assuming the transmitter is emitting a carrier wave at a static amplitude and all clocks are in perfect sync.
 let observed_phases: Vec<Complex> = antennas
.iter()
.map( antenna   
let distance = (antenna - tx).magnitude();
let distance = distance - (distance as i64 as f64);
((distance / wavelength) * TAU)
 )
.map( phase  Complex(phase.cos(), phase.sin()))
.collect();
At this point, given some synthetic transmission point and each antenna, we know what the expected complex sample would be at each antenna. At this point, we can adjust the phase of each antenna according to the beamforming phase offset configuration, and add up every sample in order to determine what the entire system would collectively produce a sample as.
 let beamformed_phases: Vec<Complex> = ...;
let magnitude = beamformed_phases
.iter()
.zip(observed_phases.iter())
.map( (beamformed, observed)  observed * beamformed)
.reduce( acc, el  acc + el)
.unwrap()
.abs();
Armed with this information, it s straight forward to generate some number of (Azimuth, Elevation) points to sample, generate a transmission point far away in that direction, resolve what the resulting Complex sample would be, take its magnitude, and use that to create an (x, y, z) point at (azimuth, elevation, magnitude). The color attached two that point is based on its distance from (0, 0, 0). I opted to use the Life Aquatic table for this one. After this process is complete, I have a point cloud of ((x, y, z), (r, g, b)) points. I wrote a small program using kiss3d to render point cloud using tons of small spheres, and write out the frames to a set of PNGs, which get compiled into a GIF. Now for the fun part, let s take a look at some radiation patterns!

1x4 Phased Array The first configuration is a phased array where all the elements are in perfect alignment on the y and z axis, and separated by some offset in the x axis. This configuration can sweep 180 degrees (not the full 360), but can t be steared in elevation at all. Let s take a look at what this looks like for a well constructed 1x4 phased array: And now let s take a look at the renders as we play with the configuration of this array and make sure things look right. Our initial quarter-wavelength spacing is very effective and has some outstanding performance characteristics. Let s check to see that everything looks right as a first test. Nice. Looks perfect. When pointing forward at (0, 0), we d expect to see a torus, which we do. As we sweep between 0 and 360, astute observers will notice the pattern is mirrored along the axis of the antennas, when the beam is facing forward to 0 degrees, it ll also receive at 180 degrees just as strong. There s a small sidelobe that forms when it s configured along the array, but it also becomes the most directional, and the sidelobes remain fairly small.

Long compared to the wavelength (1 ) Let s try again, but rather than spacing each antenna of a wavelength apart, let s see about spacing each antenna 1 of a wavelength apart instead. The main lobe is a lot more narrow (not a bad thing!), but some significant sidelobes have formed (not ideal). This can cause a lot of confusion when doing things that require a lot of directional resolution unless they re compensated for.

Going from ( to 5 ) The last model begs the question - what do things look like when you separate the antennas from each other but without moving the beam? Let s simulate moving our antennas but not adjusting the configured beam or operating frequency. Very cool. As the spacing becomes longer in relation to the operating frequency, we can see the sidelobes start to form out of the end of the antenna system.

2x2 Phased Array The second configuration I want to try is a phased array where the elements are in perfect alignment on the z axis, and separated by a fixed offset in either the x or y axis by their neighbor, forming a square when viewed along the x/y axis. Let s take a look at what this looks like for a well constructed 2x2 phased array: Let s do the same as above and take a look at the renders as we play with the configuration of this array and see what things look like. This configuration should suppress the sidelobes and give us good performance, and even give us some amount of control in elevation while we re at it. Sweet. Heck yeah. The array is quite directional in the configured direction, and can even sweep a little bit in elevation, a definite improvement from the 1x4 above.

Long compared to the wavelength (1 ) Let s do the same thing as the 1x4 and take a look at what happens when the distance between elements is long compared to the frequency of operation say, 1 of a wavelength apart? What happens to the sidelobes given this spacing when the frequency of operation is much different than the physical geometry? Mesmerising. This is my favorate render. The sidelobes are very fun to watch come in and out of existence. It looks absolutely other-worldly.

Going from ( to 5 ) Finally, for completeness' sake, what do things look like when you separate the antennas from each other just as we did with the 1x4? Let s simulate moving our antennas but not adjusting the configured beam or operating frequency. Very very cool. The sidelobes wind up turning the very blobby cardioid into an electromagnetic dog toy. I think we ve proven to ourselves that using a phased array much outside its designed frequency of operation seems like a real bad idea.

Future Work Now that I have a system to test things out, I m a bit more confident that my beamforming code is close to right! I d love to push that code over the line and blog about it, since it s a really interesting topic on its own. Once I m sure the code involved isn t full of lies, I ll put it up on the hztools org, and post about it here and on mastodon.

8 May 2023

Paul Tagliamonte: Open to work!

I decided to leave my job (Principal Software Engineer) after 4 years. I have no idea what I want to do next, so I ve been having loads of chats to try and work that out. I like working in mission focused organizations, working to fix problems across the stack, from interpersonal down to the operating system. I enjoy going where I m rare , places that don t always get the most attention. At my last job, I most enjoyed working to drive engineering standards for all products across the company, mentoring engineers across all teams and seniority levels, and serving as an advisor for senior leadership as we grew the engineering team from 3 to 150 people. If you have a role that you think I d like to hear about, I d love to hear about it at jobs pault.ag (where the is an @ sign).

23 February 2023

Paul Tagliamonte: Announcing hz.tools

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

If you're on the Fediverse, I'd very much appreciate boosts on my announcement toot!
Ever since 2019, I ve been learning about how radios work, and trying to learn about using them the hard way by writing as much of the stack as is practical (for some value of practical) myself. I wrote my first Hello World in 2018, which was a simple FM radio player, which used librtlsdr to read in an IQ stream, did some filtering, and played the real valued audio stream via pulseaudio. Over 4 years this has slowly grown through persistence, lots of questions to too many friends to thank (although I will try), and the eternal patience of my wife hearing about radios nonstop for years into a number of Go repos that can do quite a bit, and support a handful of radios. I ve resisted making the repos public not out of embarrassment or a desire to keep secrets, but rather, an attempt to keep myself free of any maintenance obligations to users so that I could freely break my own API, add and remove API surface as I saw fit. The worst case was to have this project feel like work, and I can t imagine that will happen if I feel frustrated by PRs that are getting ahead of me solving problems I didn t yet know about, or bugs I didn t understand the fix for. As my rate of changes to the most central dependencies has slowed, i ve begun to entertain the idea of publishing them. After a bit of back and forth, I ve decided it s time to make a number of them public, and to start working on them in the open, as I ve built up a bit of knowledge in the space, and I and feel confident that the repo doesn t contain overt lies. That s not to say it doesn t contain lies, but those lies are likely hidden and lurking in the dark. Beware. That being said, it shouldn t be a surprise to say I ve not published everything yet for the same reasons as above. I plan to open repos as the rate of changes slows and I understand the problems the library solves well enough or if the project dead ends and I ve stopped learning.

Intention behind hz.tools It s my sincere hope that my repos help to make Software Defined Radio (SDR) code a bit easier to understand, and serves as an understandable framework to learn with. It s a large codebase, but one that is possible to sit down and understand because, well, it was written by a single person. Frankly, I m also not productive enough in my free time in the middle of the night and on weekends and holidays to create a codebase that s too large to understand, I hope! I remain wary of this project turning into work, so my goal is to be very upfront about my boundaries, and the limits of what classes of contributions i m interested in seeing. Here s some goals of open sourcing these repos:
  • I do want this library to be used to learn with. Please go through it all and use it to learn about radios and how software can control them!
  • I am interested in bugs if there s a problem you discover. Such bugs are likely a great chance for me to fix something I ve misunderstood or typoed.
  • I am interested in PRs fixing bugs you find. I may need a bit of a back and forth to fully understand the problem if I do not understand the bug and fix yet. I hope you may have some grace if it s taking a long time.
Here s a list of some anti-goals of open sourcing these repos.
  • I do not want this library to become a critical dependency of an important project, since I do not have the time to deal with the maintenance burden. Putting me in that position is going to make me very uncomfortable.
  • I am not interested in feature requests, the features have grown as I ve hit problems, I m not interested in building or maintaining features for features sake. The API surface should be exposed enough to allow others to experiment with such things out-of-tree.
  • I m not interested in clever code replacing clear code without a very compelling reason.
  • I use GNU/Linux (specifically Debian ), and from time-to-time I ve made sure that my code runs on OpenBSD too. Platforms beyond that will likely not be supported at the expense of either of those two. I ll take fixes for bugs that fix a problem on another platform, but not damage the code to work around issues / lack of features on other platforms (like Windows).
I m not saying all this to be a jerk, I do it to make sure I can continue on my journey to learn about how radios work without my full time job becoming maintaining a radio framework single-handedly for other people to use even if it means I need to close PRs or bugs without merging it or fixing the issue. With all that out of the way, I m very happy to announce that the repos are now public under github.com/hztools.

Should you use this? Probably not. The intent here is not to provide a general purpose Go SDR framework for everyone to build on, although I am keenly aware it looks and feels like it, since that what it is to me. This is a learning project, so for any use beyond joining me in learning should use something like GNU Radio or a similar framework that has a community behind it. In fact, I suspect most contributors ought to be contributing to GNU Radio, and not this project. If I can encourage people to do so, contribute to GNU Radio! Nothing makes me happier than seeing GNU Radio continue to be the go-to, and well supported. Consider donating to GNU Radio!

hz.tools/rf - Frequency types The hz.tools/rf library contains the abstract concept of frequency, and some very basic helpers to interact with frequency ranges (such as helpers to deal with frequency ranges, or frequency range math) as well as frequencies and some very basic conversions (to meters, etc) and parsers (to parse values like 10MHz). This ensures that all the hz.tools libraries have a shared understanding of Frequencies, a standard way of representing ranges of Frequencies, and the ability to handle the IO boundary with things like CLI arguments, JSON or YAML. The git repo can be found at github.com/hztools/go-rf, and is importable as hz.tools/rf.
 // Parse a frequency using hz.tools/rf.ParseHz, and print it to stdout.
 freq := rf.MustParseHz("-10kHz")
fmt.Printf("Frequency: %s\n", freq+rf.MHz)
// Prints: 'Frequency: 990kHz'

// Return the Intersection between two RF ranges, and print
 // it to stdout.
 r1 := rf.Range rf.KHz, rf.MHz 
r2 := rf.Range rf.Hz(10), rf.KHz * 100 
fmt.Printf("Range: %s\n", r1.Intersection(r2))
// Prints: Range: 1000Hz->100kHz
These can be used to represent tons of things - ranges can be used for things like the tunable range of an SDR, the bandpass of a filter or the frequencies that correspond to a bin of an FFT, while frequencies can be used for things such as frequency offsets or the tuned center frequency.

hz.tools/sdr - SDR I/O and IQ Types This is the big one. This library represents the majority of the shared types and bindings, and is likely the most useful place to look at when learning about the IO boundary between a program and an SDR. The git repo can be found at github.com/hztools/go-sdr, and is importable as hz.tools/sdr. This library is designed to look (and in some cases, mirror) the Go io idioms so that this library feels as idiomatic as it can, so that Go builtins interact with IQ in a way that s possible to reason about, and to avoid reinventing the wheel by designing new API surface. While some of the API looks (and is even called) the same thing as a similar function in io, the implementation is usually a lot more naive, and may have unexpected sharp edges such as concurrency issues or performance problems. The following IQ types are implemented using the sdr.Samples interface. The hz.tools/sdr package contains helpers for conversion between types, and some basic manipulation of IQ streams.
IQ Format hz.tools Name Underlying Go Type
Interleaved uint8 (rtl-sdr) sdr.SamplesU8 [][2]uint8
Interleaved int8 (hackrf, uhd) sdr.SamplesI8 [][2]int8
Interleaved int16 (pluto, uhd) sdr.SamplesI16 [][2]int16
Interleaved float32 (airspy, uhd) sdr.SamplesC64 []complex64
The following SDRs have implemented drivers in-tree.
SDR Format RX/TX State
rtl u8 RX Good
HackRF i8 RX/TX Good
PlutoSDR i16 RX/TX Good
rtl kerberos u8 RX Old
uhd i16/c64/i8 RX/TX Good
airspyhf c64 RX Exp
The following major packages and subpackages exist at the time of writing:
Import What is it?
hz.tools/sdr Core IQ types, supporting types and implementations that interact with the byte boundary
hz.tools/sdr/rtl sdr.Receiver implementation using librtlsdr.
hz.tools/sdr/rtl/kerberos Helpers to enable coherent RX using the Kerberos SDR.
hz.tools/sdr/rtl/e4k Helpers to interact with the E4000 RTL-SDR dongle.
hz.tools/sdr/fft Interfaces for performing an FFT, which are implemented by other packages.
hz.tools/sdr/rtltcp sdr.Receiver implementation for rtl_tcp servers.
hz.tools/sdr/pluto sdr.Transceiver implementation for the PlutoSDR using libiio.
hz.tools/sdr/uhd sdr.Transceiver implementation for UHD radios, specifically the B210 and B200mini
hz.tools/sdr/hackrf sdr.Transceiver implementation for the HackRF using libhackrf.
hz.tools/sdr/mock Mock SDR for testing purposes.
hz.tools/sdr/airspyhf sdr.Receiver implementation for the AirspyHF+ Discovery with libairspyhf.
hz.tools/sdr/internal/simd SIMD helpers for IQ operations, written in Go ASM. This isn t the best to learn from, and it contains pure go implemtnations alongside.
hz.tools/sdr/stream Common Reader/Writer helpers that operate on IQ streams.

hz.tools/fftw - hz.tools/sdr/fft implementation The hz.tools/fftw package contains bindings to libfftw3 to implement the hz.tools/sdr/fft.Planner type to transform between the time and frequency domain. The git repo can be found at github.com/hztools/go-fftw, and is importable as hz.tools/fftw. This is the default throughout most of my codebase, although that default is only expressed at the leaf package libraries should not be hardcoding the use of this library in favor of taking an fft.Planner, unless it s used as part of testing. There are a bunch of ways to do an FFT out there, things like clFFT or a pure-go FFT implementation could be plugged in depending on what s being solved for.

hz.tools/ fm,am - analog audio demodulation and modulation The hz.tools/fm and hz.tools/am packages contain demodulators for AM analog radio, and FM analog radio. This code is a bit old, so it has a lot of room for cleanup, but it ll do a very basic demodulation of IQ to audio. The git repos can be found at github.com/hztools/go-fm and github.com/hztools/go-am, and are importable as hz.tools/fm and hz.tools/am. As a bonus, the hz.tools/fm package also contains a modulator, which has been tested on the air and with some of my handheld radios. This code is a bit old, since the hz.tools/fm code is effectively the first IQ processing code I d ever written, but it still runs and I run it from time to time.
 // Basic sketch for playing FM radio using a reader stream from
 // an SDR or other IQ stream.

bandwidth := 150*rf.KHz
reader, err = stream.ConvertReader(reader, sdr.SampleFormatC64)
if err != nil  
...
 
demod, err := fm.Demodulate(reader, fm.DemodulatorConfig 
Deviation: bandwidth / 2,
Downsample: 8, // some value here depending on sample rate
 Planner: fftw.Plan,
 )
if err != nil  
...
 
speaker, err := pulseaudio.NewWriter(pulseaudio.Config 
Format: pulseaudio.SampleFormatFloat32NE,
Rate: demod.SampleRate(),
AppName: "rf",
StreamName: "fm",
Channels: 1,
SinkName: "",
 )
if err != nil  
...
 
buf := make([]float32, 1024*64)
for  
i, err := demod.Read(buf)
if err != nil  
...
 
if i == 0  
panic("...")
 
if err := speaker.Write(buf[:i]); err != nil  
...
 
 

hz.tools/rfcap - byte serialization for IQ data The hz.tools/rfcap package is the reference implementation of the rfcap spec , and is how I store IQ captures locally, and how I send them across a byte boundary. The git repo can be found at github.com/hztools/go-rfcap, and is importable as hz.tools/rfcap. If you re interested in storing IQ in a way others can use, the better approach is to use SigMF rfcap exists for cases like using UNIX pipes to move IQ around, through APIs, or when I send IQ data through an OS socket, to ensure the sample format (and other metadata) is communicated with it. rfcap has a number of limitations, for instance, it can not express a change in frequency or sample rate during the capture, since the header is fixed at the beginning of the file.

1 November 2022

Paul Tagliamonte: Decoding LDPC: k-Bit Brute Forcing

Before you go on: I've been warned off implementing this in practice on a few counts; namely, the space tradeoff isn't worth it, and it's unlikely to correct meaningful errors. I'm going to leave this post up, but please do take the content with a very large grain of salt!
My initial efforts to build a PHY and Data Link layer from scratch using my own code have been progressing nicely since the initial BPSK based protocol I ve documented under the PACKRAT series. As part of that, I ve been diving deep into FEC, and in particular, LDPC. I won t be able to do an overview of LDPC justice in this post with any luck that ll come in a later post to come as part of the RATPACK series, so some knowledge is assumed. As such this post is less useful for those looking to learn about LDPC, and a bit more targeted to those who enjoy talking and thinking about FEC.
Hey, heads up! - This post contains extremely unvalidated and back of the napkin quality work without any effort to prove this out generally. Hopefully this work can be of help to others, but please double check anything below if you need it for your own work!
While implementing LDPC, I ve gotten an encoder and checker working, enough to use LDPC like a checksum. The next big step is to write a Decoder, which can do error correction. The two popular approaches for the actual correction that I ve seen while reading about LDPC are Belief Propagation, and some class of linear programming that I haven t dug into yet. I m not thrilled at how expensive this all is in software, so while implementing the stack I ve been exploring every shady side ally to try and learn more about how encoders and decoders work, both in theory - and in practice.

Processing an LDPC Message Checking if a message is correct is fairly straightforward with LDPC (as with encoding, I ll note). As a quick refresher given the LDPC H (check) matrix of width N, you can check your message vector (msg) of length N by multipling H and msg, and checking if the output vector is all zero.
 // scheme contains our G (generator) and
 // H (check) matrices.
 scheme :=  G: Matrix ... , H: Matrix ... 
// msg contains our LDPC message (data and
 // check bits).
 msg := Vector ... 
// N is also the length of the encoded
 // msg vector after check bits have been
 // added.
 N := scheme.G.Width
// Now, let's generate our 'check' vector.
 ch := Multiply(scheme.H, msg)
We can now see if the message is correct or not:
 // if the ch vector is all zeros, we know
 // that the message is valid, and we don't
 // need to do anything.
 if ch.IsZero()  
// handle the case where the message
 // is fine as-is.
 return ...
 
// Expensive decode here
This is great for getting a thumbs up / thumbs down on the message being correct, but correcting errors still requires pulling the LDPC matrix values from the g (generator) matrix out, building a bipartite graph, and iteratively reprocessing the bit values, until constraints are satisfied and the message has been corrected. This got me thinking - what is the output vector when it s not all zeros? Since 1 values in the output vector indicates consistency problems in the message bits as they relate to the check bits, I wondered if this could be used to speed up my LDPC decoder. It appears to work, so this post is half an attempt to document this technique before I put it in my hot path, and half a plea for those who do like to talk about FEC to tell me what name this technique actually is.

k-Bit Brute Forcing Given that the output Vector s non-zero bit pattern is set due to the position of errors in the message vector, let s use that fact to build up a table of k-Bit errors that we can index into.
 // for clarity's sake, the Vector
 // type is being used as the lookup
 // key here, even though it may
 // need to be a hash or string in
 // some cases.
 idx := map[Vector]int 
for i := 0; i < N; i++  
// Create a vector of length N
 v := Vector 
v.FlipBit(i)
// Now, let's use the generator matrix to encode
 // the data with checksums, and then use the
 // check matrix on the message to figure out what
 // bit pattern results
 ev := Multiply(scheme.H, Multiply(v, scheme.G))
idx[ev] = i
 
This can be extended to multiple bits (hence: k-Bits), but I ve only done one here for illustration. Now that we have our idx mapping, we can now go back to the hot path on Checking the incoming message data:
 // if the ch vector is all zeros, we know
 // that the message is valid, and we don't
 // need to do anything.
 if ch.IsZero()  
// handle the case where the message
 // is fine as-is.
 return ...
 
errIdx, ok := idx[ch]
if ok  
msg.FlipBit(errIdx)
// Verify the LDPC message using
 // H again here.
 return ...
 
// Expensive decode here
Since map lookups wind up a heck of a lot faster than message-passing bit state, the hope here is this will short-circuit easy to solve errors for k-Bits, for some value of k that the system memory can tolerate.

Does this work? Frankly I have no idea. I ve written a small program and brute forced single-bit errors in all bit positions using random data to start with, and I ve not been able to find any collisions in the 1-bit error set, using the LDPC matrix from 802.3an-2006. Even if I was to find a collision for a higher-order k-Bit value, I m tempted to continue with this approach, and treat each set of bits in the Vector s bin (like a hash-table), checking the LDPC validity after each bit set in the bin. As long as the collision rate is small enough, it should be possible to correct k-Bits of error faster than the more expensive Belief Propagation approach. That being said, I m not entirely convinced collisions will be very common, but it ll take a bit more time working through the math to say that with any confidence. Have you seen this approach called something official in publications? See an obvious flaw in the system? Send me a tip, please!

11 April 2022

Paul Tagliamonte: k3xec.com/patty: Go bindings to patty

AX.25 is a tough protocol to use on UNIX systems. A lot of the support in Linux, specifically, is pretty hard to use, and tends to be built into the reptilian brain of the kernel. xan built a userland AX.25 stack called patty, for which I have now built some Go bindings on top of. Code needed to create AX.25 Sockets via Go can be found at github.com/k3xec/go-patty, and imported by Go source as k3xec.com/patty.

Overview Clint patty programs (including consumers of this Go library) work by communicating with a userland daemon (pattyd) via a UNIX named socket. That daemon will communicate with a particular radio using a KISS TNC serial device. The Go bindings implement as many standard Go library interfaces as is practical, allowing for the plug and play use of patty (and AX.25) in places where you would expect a network socket (such as TCP) to work, such as Go s http library.

Example
package main
import (
"fmt"
"log"
"net"
"os"
"time"
"k3xec.com/patty"
)
func main()  
callsign := "N0CALL-10"
client, err := patty.Open("patty.sock")
if err != nil  
panic(err)
 
l, err := client.Listen("ax25", callsign)
if err != nil  
panic(err)
 
for  
log.Printf("Listening for requests to %s", l.Addr())
conn, err := l.Accept()
if err != nil  
log.Printf("Error accepting: %s", err)
continue
 
go handle(conn)
 
 
func handle(c net.Conn) error  
defer c.Close()
log.Printf("New connection from %s (local: %s)", c.RemoteAddr(), c.LocalAddr())
fmt.Fprintf(c,  

Hello! This is Paul's experimental %s node. Feel free
to poke around. Let me know if you spot anything funny.

Five pings are to follow!

 , c.LocalAddr())
for i := 0; i < 5; i++  
time.Sleep(time.Second * 5)
fmt.Fprintf(c, "Ping!\n")
 
return nil
 

6 December 2021

Paul Tagliamonte: Proxying Ethernet Frames to PACKRAT (Part 5/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we left off at being able to send and recieve PACKRAT frames to and from devices. Since we can transport IPv4 packets over the network, let s go ahead and see if we can read/write Ethernet frames from a Linux network interface, and on the backend, read and write PACKRAT frames over the air. This has the benifit of continuing to allow Linux userspace tools to work (like cURL, as we ll try!), which means we don t have to do a lot of work to implement higher level protocols or tactics to get a connection established over the link. Given that this post is less RF and more Linuxy, I m going to include more code snippits than in prior posts, and those snippits are closer to runable Go, but still not complete examples. There s also a lot of different ways to do this, I ve just picked the easiest one for me to implement and debug given my existing tooling for you, you may find another approach easier to implement! Again, deviation here is very welcome, and since this segment is the least RF centric post in the series, the pace and tone is going to feel different. If you feel lost here, that s OK. This isn t the most important part of the series, and is mostly here to give a concrete ending to the story arc. Any way you want to finish your own journy is the best way for you to finish it!

Implement Ethernet conversion code This assumes an importable package with a Frame struct, which we can use to convert a Frame to/from Ethernet. Given that the PACKRAT frame has a field that Ethernet doesn t (namely, Callsign), that will need to be explicitly passed in when turning an Ethernet frame into a PACKRAT Frame.
...
// ToPackrat will create a packrat frame from an Ethernet frame.
func ToPackrat(callsign [8]byte, frame *ethernet.Frame) (*packrat.Frame, error)  
var frameType packrat.FrameType
switch frame.EtherType  
case ethernet.EtherTypeIPv4:
frameType = packrat.FrameTypeIPv4
default:
return nil, fmt.Errorf("ethernet: unsupported ethernet type %x", frame.EtherType)
 
return &packrat.Frame 
Destination: frame.Destination,
Source: frame.Source,
Type: frameType,
Callsign: callsign,
Payload: frame.Payload,
 , nil
 
// FromPackrat will create an Ethernet frame from a Packrat frame.
func FromPackrat(frame *packrat.Frame) (*ethernet.Frame, error)  
var etherType ethernet.EtherType
switch frame.Type  
case packrat.FrameTypeRaw:
return nil, fmt.Errorf("ethernet: unsupported packrat type 'raw'")
case packrat.FrameTypeIPv4:
etherType = ethernet.EtherTypeIPv4
default:
return nil, fmt.Errorf("ethernet: unknown packrat type %x", frame.Type)
 
// We lose the Callsign here, which is sad.
 return &ethernet.Frame 
Destination: frame.Destination,
Source: frame.Source,
EtherType: etherType,
Payload: frame.Payload,
 , nil
 
Our helpers, ToPackrat and FromPackrat can now be used to transmorgify PACKRAT into Ethernet, or Ethernet into PACKRAT. Let s put them into use!

Implement a TAP interface On Linux, the networking stack can be exposed to userland using TUN or TAP interfaces. TUN devices allow a userspace program to read and write data at the Layer 3 / IP layer. TAP devices allow a userspace program to read and write data at the Layer 2 Data Link / Ethernet layer. Writing data at Layer 2 is what we want to do, since we re looking to transform our Layer 2 into Ethernet s Layer 2 Frames. Our first job here is to create the actual TAP interface, set the MAC address, and set the IP range to our pre-coordinated IP range.
...
import (
"net"
"github.com/mdlayher/ethernet"
"github.com/songgao/water"
"github.com/vishvananda/netlink"
)
...
config := water.Config DeviceType: water.TAP 
config.Name = "rat0"
iface, err := water.New(config)
...
netIface, err := netlink.LinkByName("rat0")
...
// Pick a range here that works for you!
 //
 // For my local network, I'm using some IPs
 // that AMPR (ampr.org) was nice enough to
 // allocate to me for ham radio use. Thanks,
 // AMPR!
 //
 // Let's just use 10.* here, though.
 //
 ip, cidr, err := net.ParseCIDR("10.0.0.1/24")
...
cidr.IP = ip
err = netlink.AddrAdd(netIface, &netlink.Addr 
IPNet: cidr,
Peer: cidr,
 )
...
// Add all our neighbors to the ARP table
 for _, neighbor := range neighbors  
netlink.NeighAdd(&netlink.Neigh 
LinkIndex: netIface.Attrs().Index,
Type: netlink.FAMILY_V4,
State: netlink.NUD_PERMANENT,
IP: neighbor.IP,
HardwareAddr: neighbor.MAC,
 )
 
// Pick a MAC that is globally unique here, this is
 // just used as an example!
 addr, err := net.ParseMAC("FA:DE:DC:AB:LE:01")
...
netlink.LinkSetHardwareAddr(netIface, addr)
...
err = netlink.LinkSetUp(netIface)
var frame = &ethernet.Frame 
var buf = make([]byte, 1500)
for  
n, err := iface.Read(buf)
...
err = frame.UnmarshalBinary(buf[:n])
...
// process frame here (to come)
  
...
Now that our network stack can resolve an IP to a MAC Address (via ip neigh according to our pre-defined neighbors), and send that IP packet to our daemon, it s now on us to send IPv4 data over the airwaves. Here, we re going to take packets coming in from our TAP interface, and marshal the Ethernet frame into a PACKRAT Frame and transmit it. As with the rest of the RF code, we ll leave that up to the implementer, of course, using what was built during Part 2: Transmitting BPSK symbols and Part 4: Framing data.
...
for  
// continued from above

n, err := iface.Read(buf)
...
err = frame.UnmarshalBinary(buf[:n])
...
switch frame.EtherType  
case 0x0800:
// ipv4 packet
 pack, err := ToPackrat(
// Add my callsign to all Frames, for now
 [8]byte 'K', '3', 'X', 'E', 'C' ,
frame,
)
...
err = transmitPacket(pack)
...
 
 
...
Now that we have transmitting covered, let s go ahead and handle the recieve path here. We re going to listen on frequency using the code built in Part 3: Receiving BPSK symbols and Part 4: Framing data. The Frames we decode from the airwaves are expected to come back from the call packratReader.Next in the code below, and the exact way that works is up to the implementer.
...
for  
// pull the next packrat frame from
 // the symbol stream as we did in the
 // last post
 packet, err := packratReader.Next()
...
// check for CRC errors and drop invalid
 // packets
 err = packet.Check()
...
if bytes.Equal(packet.Source, addr)  
// if we've heard ourself transmitting
 // let's avoid looping back
 continue
 
// create an ethernet frame
 frame, err := FromPackrat(packet)
...
buf, err := frame.MarshalBinary()
...
// and inject it into the tap
 err = iface.Write(buf)
...
 
...
Phew. Right. Now we should be able to listen for PACKRAT frames on the air and inject them into our TAP interface.

Putting it all Together After all this work weeks of work! we can finally get around to putting some real packets over the air. For me, this was an incredibly satisfying milestone, and tied together months of learning! I was able to start up a UDP server on a remote machine with an RTL-SDR dongle attached to it, listening on the TAP interface s host IP with my defined MAC address, and send UDP packets to that server via PACKRAT using my laptop, /dev/udp and an Ettus B210, sending packets into the TAP interface. Now that UDP was working, I was able to get TCP to work using two PlutoSDRs, which allowed me to run the cURL command I pasted in the first post (both simultaneously listen and transmit on behalf of my TAP interface). It s my hope that someone out there will be inspired to implement their own Layer 1 and Layer 2 as a learning exercise, and gets the same sense of gratification that I did! If you re reading this, and at a point where you ve been able to send IP traffic over your own Layer 1 / Layer 2, please get in touch! I d be thrilled to hear all about it. I d love to link to any posts or examples you publish here!

5 December 2021

Paul Tagliamonte: Framing data (Part 4/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end, we re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a Layer 2 scheme in the OSI model, which is otherwise known as the Data Link Layer. You re using one to view this website right now I m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE. Given that this entire exercise is hard enough without designing a complex Layer 2 scheme, I opted for simplicity in the hopes this would free me from the complexity and research that has gone into this field for the last 50 years. I settled on stealing a few ideas from Ethernet Frames namely, the use of MAC addresses to identify parties, and the EtherType field to indicate the Payload type. I also stole the idea of using a CRC at the end of the Frame to check for corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the polynomial). Lastly, I added a callsign field to make life easier on ham radio frequencies if I was ever to seriously attempt to use a variant of this protocol over the air with multiple users. However, given this scheme is not a commonly used scheme, it s best practice to use a nearby radio to identify your transmissions on the same frequency while testing or use a Faraday box to test without transmitting over the airwaves. I added the callsign field in an effort to lean into the spirit of the Part 97 regulations, even if I relied on a phone emission to identify the Frames. As an aside, I asked the ARRL for input here, and their stance to me over email was I d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users). Right, back to the Frame:
sync
dest
source
callsign
type
payload
crc
With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.
type FrameType [2]byte
type Frame struct  
Destination net.HardwareAddr
Source net.HardwareAddr
Callsign [8]byte
Type FrameType
Payload []byte
CRC uint32
 

Time to pick some consts I picked a unique and distinctive sync sequence, which the sender will transmit before the Frame, while the receiver listens for that sequence to know when it s in byte alignment with the symbol stream. My sync sequence is [3]byte 'U', 'f', '~' which works out to be a very pleasant bit sequence of 01010101 01100110 01111110. It s important to have soothing preambles for your Frames. We need all the good energy we can get at this point.
var (
FrameStart = [3]byte 'U', 'f', '~' 
FrameMaxPayloadSize = 1500
)
Next, I defined some FrameType values for the type field, which I can use to determine what is done with that data next, something Ethernet was originally missing, but has since grown to depend on (who needs Length anyway? Not me. See below!)
FrameType Description Bytes
Raw Bytes in the Payload field are opaque and not to be parsed. [2]byte 0x00, 0x01
IPv4 Bytes in the Payload field are an IPv4 packet. [2]byte 0x00, 0x02
And finally, I decided on a maximum length of the Payload, and decided on limiting it to 1500 bytes to align with the MTU of Ethernet.
var (
FrameTypeRaw = FrameType 0, 1 
FrameTypeIPv4 = FrameType 0, 2 
)
Given we know how we re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.

Why is there no Length field? I was initially a bit surprised that Ethernet Frames didn t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and used bits on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrer? I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.

Finding the Frame start in a Symbol Stream First thing we need to do is find our sync bit pattern in the symbols we re receiving from our BPSK demodulator. There s some smart ways to do this, but given that I m not much of a smart man, I again decided to go for simple instead. Given our incoming vector of symbols (which are still float values) prepend one at a time to a vector of floats that is the same length as the sync phrase, and compare against the sync phrase, to determine if we re in sync with the byte boundary within the symbol stream. The only trick here is that because we re using BPSK to modulate and demodulate the data, post phaselock we can be 180 degrees out of alignment (such that a +1 is demodulated as -1, or vice versa). To deal with that, I check against both the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as well as [-1, 1, -1]) where if the inverse sync is matched, all symbols to follow will be inverted as well. This effectively turns our symbols back into bits, even if we re flipped out of phase. Other techniques like NRZI will represent a 0 or 1 by a change in phase state which is great, but can often cascade into long runs of bit errors, and is generally more complex to implement. That representation isn t ambiguous, given you look for a phase change, not the absolute phase value, which is incredibly compelling. Here s a notional example of how I ve been thinking about the phrase sliding window and how I ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.
 var (
sync = []float  ...  
buf = make([]float, len(sync))
incomingSymbols = []float  ...  
)
for _, el := range incomingSymbols  
copy(buf, buf[1:])
buf[len(buf)-1] = el
if compare(sync, buf)  
// we're synced!
 break
 
 
Given the pseudocode above, let s step through what the checks would be doing at each step:
Buffer Sync Inverse Sync
[ ]float 0, ,0 [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 0, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
[more bits in] [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 1, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we re also 180 degrees out of phase, so we need to flip the symbol s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don t know what this technique is called or even if this is used in real grown-up implementations, but it s been working for my toy implementation.

Next Steps Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!

4 December 2021

Paul Tagliamonte: Receiving BPSK symbols (Part 3/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we worked through how to generate a BPSK signal, and hopefully transmit it using one of our SDRs. Let s take that and move on to Receiving BPSK and turning that back into symbols! Demodulating BPSK data is a bit more tricky than transmitting BPSK data, mostly due to tedious facts of life such as space, time, and hardware built with compromises because not doing that makes the problem impossible. Unfortunately, it s now our job to work within our imperfect world to recover perfect data. We need to handle the addition of noise, differences in frequency, clock synchronization and interference in order to recover our information. This makes life a lot harder than when we transmit information, and as a result, a lot more complex.

Coarse Sync Our starting point for this section will be working from a capture of a number of generated PACKRAT packets as heard by a PlutoSDR at (xz compressed interleaved int16, 2,621,440 samples per second) Every SDR has its own oscillator, which eventually controls a number of different components of an SDR, such as the IF (if it s a superheterodyne architecture) and the sampling rate. Drift in oscillators lead to drifts in frequency such that what one SDR may think is 100MHz may be 100.01MHz for another radio. Even if the radios were perfectly in sync, other artifacts such as doppler time dilation due to motion can cause the frequency to appear higher or lower in frequency than it was transmitted. All this is a long way of saying, we need to determine when we see a strong signal that s close-ish to our tuned frequency, and take steps to roughly correct it to our center frequency (in the order of 100s of Hz to kHz) in order to acquire a phase lock on the signal to attempt to decode information contained within. The easiest way of detecting the loudest signal of interest is to use an FFT. Getting into how FFTs work is out of scope of this post, so if this is the first time you re seeing mention of an FFT, it may be a good place to take a quick break to learn a bit about the time domain (which is what the IQ data we ve been working with so far is), frequency domain, and how the FFT and iFFT operations can convert between them. Lastly, because FFTs average power over the window, swapping phases such that the transmitted wave has the same number of in-phase and inverted-phase symbols the power would wind up averaging to zero. This is not helpful, so I took a tip from Dr. Marc Lichtman s PySDR project and used complex squaring to drive our BPSK signal into a single detectable carrier by squaring the IQ data. Because points are on the unit circle and at tau/2 (specifically, tau/(2^1) for BPSK, 2^2 for QPSK) angles, and given that squaring has the effect of doubling the angle, and angles are all mod tau, this will drive our wave comprised of two opposite phases back into a continuous wave effectively removing our BPSK modulation, making it much easier to detect in the frequency domain. Thanks to Tom Bereknyei for helping me with that!
...
var iq []complex 
var freq []complex 
for i := range iq  
iq[i] = iq[i] * iq[i]
 
// perform an fft, computing the frequency
 // domain vector in  freq  given the iq data
 // contained in  iq .
 fft(iq, freq)
// get the array index of the max value in the
 // freq array given the magnitude value of the
 // complex numbers.
 var binIdx = max(abs(freq))
...
Now, most FFT operations will lay the frequency domain data out a bit differently than you may expect (as a human), which is that the 0th element of the FFT is 0Hz, not the most negative number (like in a waterfall). Generally speaking, zero first is the most common frequency domain layout (and generally speaking the most safe assumption if there s no other documentation on fft layout). Negative first is usually used when the FFT is being rendered for human consumption such as a waterfall plot. Given that we now know which FFT bin (which is to say, which index into the FFT array) contains the strongest signal, we ll go ahead and figure out what frequency that bin relates to. In the time domain, each complex number is the next time instant. In the frequency domain, each bin is a discrete frequency or more specifically a frequency range. The bandwidth of the bin is a function of the sampling rate and number of time domain samples used to do the FFT operation. As you increase the amount of time used to preform the FFT, the more precise the FFT measurement of frequency can be, but it will cover the same bandwidth, as defined by the sampling rate.
...
var sampleRate = 2,621,440
// bandwidth is the range of frequencies
 // contained inside a single FFT bin,
 // measured in Hz.
 var bandwidth = sampleRate/len(freq)
...
Now that we know we have a zero-first layout and the bin bandwidth, we can compute what our frequency offset is in Hz.
...
// binIdx is the index into the freq slice
 // containing the frequency domain data.
 var binIdx = 0
// binFreq is the frequency of the bin
 // denoted by binIdx
 var binFreq = 0
if binIdx > len(freq)/2  
// This branch covers the case where the bin
 // is past the middle point - which is to say,
 // if this is a negative frequency.
 binFreq = bandwidth * (binIdx - len(freq))
  else  
// This branch covers the case where the bin
 // is in the first half of the frequency array,
 // which is to say - if this frequency is
 // a positive frequency.
 binFreq = bandwidth * binIdx
 
...
However, sice we squared the IQ data, we re off in frequency by twice the actual frequency if we are reading 12kHz, the bin is actually 6kHz. We need to adjust for that before continuing with processing.
...
var binFreq = 0
...
// [compute the binFreq as above]
 ...
// Adjust for the squaring of our IQ data
 binFreq = binFreq / 2
...
Finally, we need to shift the frequency by the inverse of the binFreq by generating a carrier wave at a specific frequency and rotating every sample by our carrier wave so that a wave at the same frequency will slow down (or stand still!) as it approaches 0Hz relative to the carrier wave.
 var tau = pi * 2
// ts tracks where in time we are (basically: phase)
 var ts float
// inc is the amount we step forward in time (seconds)
 // each sample.
 var inc float = (1 / sampleRate)
// amount to shift frequencies, in Hz,
 // in this case, shift +12 kHz to 0Hz
 var shift = -12,000
for i := range iq  
ts += inc
if ts > tau  
// not actually needed, but keeps ts within
 // 0 to 2*pi (since it is modulus 2*pi anyway)
 ts -= tau
 
// Here, we're going to create a carrier wave
 // at the provided frequency (in this case,
 // -12kHz)
 cwIq = complex(cos(tau*shift*ts), sin(tau*shift*ts))
iq[i] = iq[i] * cwIq
 
Now we ve got the strong signal we ve observed (which may or may not be our BPSK modulated signal!) close enough to 0Hz that we ought to be able to Phase Lock the signal in order to begin demodulating the signal.

Filter After we re roughly in the neighborhood of a few kHz, we can now take some steps to cut out any high frequency components (both positive high frequencies and negative high frequencies). The normal way to do this would be to do an FFT, apply the filter in the frequency domain, and then do an iFFT to turn it back into time series data. This will work in loads of cases, but I ve found it to be incredibly tricky to get right when doing PSK. As such, I ve opted to do this the old fashioned way in the time domain. I ve again opted to go simple rather than correct, and haven t used nearly any of the advanced level trickery I ve come across for fear of using it wrong. As a result, our process here is going to be generating a sinc filter by computing a number of taps, and applying that in the time domain directly on the IQ stream.
// Generate sinc taps

func sinc(x float) float  
if x == 0  
return 1
 
var v = pi * x
return sin(v) / v
 
...
var dst []float
var length = float(len(dst))
if int(length)%2 == 0  
length++
 
for j := range dst  
i := float(j)
dst[j] = sinc(2 * cutoff * (i - (length-1)/2))
 
...
then we apply it in the time domain
...
// Apply sinc taps to an IQ stream

var iq []complex
// taps as created in  dst  above
 var taps []float
var delay = make([]complex, len(taps))
for i := range iq  
// let's shift the next sample into
 // the delay buffer
 copy(delay[1:], delay)
delay[0] = iq[i]
var phasor complex
for j := range delay  
// for each sample in the buffer, let's
 // weight them by the tap values, and
 // create a new complex number based on
 // filtering the real and imag values.
 phasor += complex(
taps[j] * real(delay[j]),
taps[j] * imag(delay[j]),
)
 
// now that we've run this sample
 // through the filter, we can go ahead
 // and scale it back (since we multiply
 // above) and drop it back into the iq
 // buffer.
 iq[i] = complex(
real(phasor) / len(taps),
imag(phasor) / len(taps),
)
 
...
After running IQ samples through the taps and back out, we ll have a signal that s been filtered to the shape of our designed Sinc filter which will cut out captured high frequency components (both positive and negative). Astute observers will note that we re using the real (float) valued taps on both the real and imaginary values independently. I m sure there s a way to apply taps using complex numbers, but it was a bit confusing to work through without being positive of the outcome. I may revisit this in the future!

Downsample Now, post-filter, we ve got a lot of extra RF bandwidth being represented in our IQ stream at our high sample rate All the high frequency values are now filtered out, which means we can reduce our sampling rate without losing much information at all. We can either do nothing about it and process at the fairly high sample rate we re capturing at, or we can drop the sample rate down and help reduce the volume of numbers coming our way. There s two big ways of doing this; either you can take every Nth sample (e.g., take every other sample to half the sample rate, or take every 10th to decimate the sample stream to a 10th of what it originally was) which is the easiest to implement (and easy on the CPU too), or to average a number of samples to create a new sample. A nice bonus to averaging samples is that you can trade-off some CPU time for a higher effective number of bits (ENOB) in your IQ stream, which helps reduce noise, among other things. Some hardware does exactly this (called Oversampling ), and like many things, it has some pros and some cons. I ve opted to treat our IQ stream like an oversampled IQ stream and average samples to get a marginal bump in ENOB. Taking a group of 4 samples and averaging them results in a bit of added precision. That means that a stream of IQ data at 8 ENOB can be bumped to 9 ENOB of precision after the process of oversampling and averaging. That resulting stream will be at 1/4 of the sample rate, and this process can be repeated 4 samples can again be taken for a bit of added precision; which is going to be 1/4 of the sample rate (again), or 1/16 of the original sample rate. If we again take a group of 4 samples, we ll wind up with another bit and a sample rate that s 1/64 of the original sample rate.

Phase Lock Our starting point for this section is the same capture as above, but post-coarse sync, filtering downsampling (xz compressed interleaved float32, 163,840 samples per second) The PLL in PACKRAT was one of the parts I spent the most time stuck on. There s no shortage of discussions of how hardware PLLs work, or even a few software PLLs, but very little by way of how to apply them and/or troubleshoot them. After getting frustrated trying to follow the well worn path, I decided to cut my own way through the bush using what I had learned about the concept, and hope that it works well enough to continue on. PLLs, in concept are fairly simple you generate a carrier wave at a frequency, compare the real-world SDR IQ sample to where your carrier wave is in phase, and use the difference between the local wave and the observed wave to adjust the frequency and phase of your carrier wave. Eventually, if all goes well, that delta is driven as small as possible, and your carrier wave can be used as a reference clock to determine if the observed signal changes in frequency or phase. In reality, tuning PLLs is a total pain, and basically no one outlines how to apply them to BPSK signals in a descriptive way. I ve had to steal an approach I ve seen in hardware to implement my software PLL, with any hope it s close enough that this isn t a hazard to learners. The concept is to generate the carrier wave (as above) and store some rolling averages to tune the carrier wave over time. I use two constants, alpha and beta (which appear to be traditional PLL variable names for this function) which control how quickly the frequency and phase is changed according to observed mismatches. Alpha is set fairly high, which means discrepancies between our carrier and observed data are quickly applied to the phase, and a lower constant for Beta, which will take long-term errors and attempt to use that to match frequency. This is all well and good. Getting to this point isn t all that obscure, but the trouble comes when processing a BPSK signal. Phase changes kick the PLL out of alignment and it tends to require some time to get back into phase lock, when we really shouldn t even be loosing it in the first place. My attempt is to generate two predicted samples, one for each phase of our BPSK signal. The delta is compared, and the lower error of the two is used to adjust the PLL, but the carrier wave itself is used to rotate the sample.
 var alpha = 0.1
var beta = (alpha * alpha) / 2
var phase = 0.0
var frequency = 0.0
...
for i := range iq  
predicted = complex(cos(phase), sin(phase))
sample = iq[i] * conj(predicted)
delta = phase(sample)
predicted2 = complex(cos(phase+pi), sin(phase+pi))
sample2 = iq[i] * conj(predicted2)
delta2 = phase(sample2)
if abs(delta2) < abs(delta)  
// note that we do not update 'sample'.
 delta = delta2
 
phase += alpha * delta
frequency += beta * delta
// adjust the iq sample to the PLL rotated
 // sample.
 iq[i] = sample
 
...
If all goes well, this loop has the effect of driving a BPSK signal s imaginary values to 0, and the real value between +1 and -1.

Average Idle / Carrier Detect Our starting point for this section is the same capture as above, but post-PLL (xz compressed interleaved float32, 163,840 samples per second) When we start out, we have IQ samples that have been mostly driven to an imaginary component of 0 and real value range between +1 and -1 for each symbol period. Our goal now is to determine if we re receiving a signal, and if so, determine if it s +1 or -1. This is a deceptively hard problem given it spans a lot of other similarly entertaining hard problems. I ve opted to not solve the hard problems involved and hope that in practice my very haphazard implementation works well enough. This turns out to be both good (not solving a problem is a great way to not spend time on it) and bad (turns out it does materially impact performance). This segment is the one I plan on revisiting, first. Expect more here at some point! Given that I want to be able to encapsulate three states in the output from this section (our Symbols are no carrier detected ( 0 ), real value 1 ( 1 ) or real value -1 ("-1")), which means spending cycles to determine what the baseline noise is to try and identify when a signal breaks through the noise becomes incredibly important.
var idleThreshold
var thresholdFactor = 10
...
// sigThreshold is used to determine if the symbol
 // is -1, +1 or 0. It's 1.3 times the idle signal
 // threshold.
 var sigThreshold = (idleThreshold * 0.3) + idleThreshold
// iq contains a single symbol's worth of IQ samples.
 // clock alignment isn't really considered; so we'll
 // get a bad packet if we have a symbol transition
 // in the middle of this buffer. No attempt is made
 // to correct for this yet.
 var iq []complex
// avg is used to average a chunk of samples in the
 // symbol buffer.
 var avg float
var mid = len(iq) / 2
// midNum is used to determine how many symbols to
 // average at the middle of the symbol.
 var midNum = len(iq) / 50
for j := mid; j < mid+midNum; j++  
avg += real(iq[j])
 
avg /= midNum
var symbol float
switch  
case avg > sigThreshold:
symbol = 1
case avg < -sigThreshold:
symbol = -1
default:
symbol = 0
// update the idleThreshold using the thresholdFactor
 // to average the idleThreshold over more samples to
 // get a better idea of average noise.
 idleThreshold = (
(idleThreshold*(thresholdFactor-1) + symbol) \
/ thresholdFactor
)
 
// write symbol to output somewhere
...

Next Steps Now that we have a stream of values that are either +1, -1 or 0, we can frame / unframe the data contained in the stream, and decode Packets contained inside, coming next in Part 4!

3 December 2021

Paul Tagliamonte: Transmitting BPSK symbols (Part 2/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we worked through what IQ is, and different formats that it may be sent or received in. Let s take that and move on to Transmitting BPSK using IQ data! When we transmit and receive information through RF using an SDR, data is traditionally encoded into a stream of symbols which are then used by a program to modulate the IQ stream, and sent over the airwaves. PACKRAT uses BPSK to encode Symbols through RF. BPSK is the act of modulating the phase of a sine wave to carry information. The transmitted wave swaps between two states in order to convey a 0 or a 1. Our symbols modulate the transmitted sine wave s phase, so that it moves between in-phase with the SDR s transmitter and 180 degrees (or radians) out of phase with the SDR s transmitter. The difference between a Bit and a Symbol in PACKRAT is not incredibly meaningful, and I ll often find myself slipping up when talking about them. I ve done my best to try and use the right word at the right stage, but it s not as obvious where the line between bit and symbol is at least not as obvious as it would be with QPSK or QAM. The biggest difference is that there are three meaningful states for PACKRAT over BPSK - a 1 (for In phase ), -1 (for 180 degrees out of phase ) and 0 (for no carrier ). For my implementation, a stream of all zeros will not transmit data over the airwaves, a stream of all 1s will transmit all 1 bits over the airwaves, and a stream of all -1s will transmit all 0 bits over the airwaves. We re not going to cover turning a byte (or bit) into a symbol yet I m going to write more about that in a later section. So for now, let s just worry about symbols in, and symbols out.

Transmitting a Sine wave at 0Hz If we go back to thinking about IQ data as a precisely timed measurements of energy over time at some particular specific frequency, we can consider what a sine wave will look like in IQ. Before we dive into antennas and RF, let s go to something a bit more visual. For the first example, you can see an example of a camera who s frame rate (or Sampling Rate!) matches the exact number of rotations per second (or Frequency!) of the propeller and it appears to stand exactly still. Every time the Camera takes a frame, it s catching the propeller in the exact same place in space, even though it s made a complete rotation. The second example is very similar, it s a light strobing (in this case, our sampling rate, since the darkness is ignored by our brains) at the same rate (frequency) as water dropping from a faucet and the video creator is even nice enough to change the sampling frequency to have the droplets move both forward and backward (positive and negative frequency) in comparison to the faucet. IQ works the same way. If we catch something in perfect frequency alignment with our radio, we ll wind up with readings that are the same for the entire stream of data. This means we can transmit a sine wave by setting all of the IQ samples in our buffer to 1+0i, which will transmit a pure sine wave at exactly the center frequency of the radio.
 var sine []complex 
for i := range sine  
sine[i] = complex(1.0, 0.0)
 
Alternatively, we can transmit a Sine wave (but with the opposite phase) by flipping the real value from 1 to -1. The same Sine wave is transmitted on the same Frequency, except when the wave goes high in the example above, the wave will go low in the example below.
 var sine []complex 
for i := range sine  
sine[i] = complex(-1.0, 0.0)
 
In fact, we can make a carrier wave at any phase angle and amplitude by using a bit of trig.
 // angle is in radians - here we have
 // 1.5 Pi (3 Tau) or 270 degrees.
 var angle = pi * 1.5
// amplitude controls the transmitted
 // strength of the carrier wave.
 var amplitude = 1.0
// output buffer as above
 var sine []complex 
for i := range sine  
sine[i] = complex(
amplitude*cos(angle),
amplitude*sin(angle),
)
 
The amplitude of the transmitted wave is the absolute value of the IQ sample (sometimes called magnitude), and the phase can be computed as the angle (or argument). The amplitude remains constant (at 1) in both cases. Remember back to the airplane propeller or water droplets we re controlling where we re observing the sine wave. It looks like a consistent value to us, but in reality it s being transmitted as a pure carrier wave at the provided frequency. Changing the angle of the number we re transmitting will control where in the sine wave cycle we re observing it at.

Generating BPSK modulated IQ data Modulating our carrier wave with our symbols is fairly straightforward to do we can multiply the symbol by 1 to get the real value to be used in the IQ stream. Or, more simply - we can just use the symbol directly in the constructed IQ data.
 var sampleRate = 2,621,440
var baudRate = 1024
// This represents the number of IQ samples
 // required to send a single symbol at the
 // provided baud and sample rate. I picked
 // two numbers in order to avoid half samples.
 // We will transmit each symbol in blocks of
 // this size.
 var samplesPerSymbol = sampleRate / baudRate
var samples = make([]complex, samplesPerSymbol)
// symbol is one of 1, -1 or 0.
 for each symbol in symbols  
for i := range samples  
samples[i] = complex(symbol, 0)
 
// write the samples out to an output file
 // or radio.
 write(samples)
 
If you want to check against a baseline capture, here s 10 example packets at 204800 samples per second.

Next Steps Now that we can transmit data, we ll start working on a receive path in Part 3, in order to check our work when transmitting the packets, as well as being able to hear packets we transmit from afar, coming up next in Part 3!!

2 December 2021

Paul Tagliamonte: Processing IQ data formats (Part 1/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
When working with SDRs, information about the signals your radio is receiving are communicated by streams of IQ data. IQ is short for In-phase and Quadrature , which means 90 degrees out of phase. Values in the IQ stream are complex numbers, so converting them to a native complex type in your language helps greatly when processing the IQ data for meaning. I won t get too deep into what IQ is or why complex numbers (mostly since I don t think I fully understand it well enough to explain it yet), but here s some basics in case this is your first interaction with IQ data before going off and reading more.
Before we get started at any point, if you feel lost in this post, it's OK to take a break to do a bit of learning elsewhere in the internet. I'm still new to this, so I'm sure my overview in one paragraph here won't help clarify things too much. This took me months to sort out on my own. It's not you, really! I particularly enjoyed reading visual-dsp.switchb.org when it came to learning about how IQ represents signals, and Software-Defined Radio for Engineers for a more general reference.
Each value in the stream is taken at a precisely spaced sampling interval (called the sampling rate of the radio). Jitter in that sampling interval, or a drift in the requested and actual sampling rate (usually represented in PPM, or parts per million how many samples out of one million are missing) can cause errors in frequency. In the case of a PPM error, one radio may think it s 100.1MHz and the other may think it s 100.2MHz, and jitter will result in added noise in the resulting stream. A single IQ sample is both the real and imaginary values, together. The complex number (both parts) is the sample. The number of samples per second is the number of real and imaginary value pairs per second. Each sample is reading the electrical energy coming off the antenna at that exact time instant. We re looking to see how that goes up and down over time to determine what frequencies we re observing around us. If the IQ stream is only real-valued measures (e.g., float values rather than complex values reading voltage from a wire), you can still send and receive signals, but those signals will be mirrored across your 0Hz boundary. That means if you re tuned to 100MHz, and you have a nearby transmitter at 99.9MHz, you d see it at 100.1MHz. If you want to get an intuitive understanding of this concept before getting into the heavy math, a good place to start is looking at how Quadrature encoders work. Using complex numbers means we can see up in frequency as well as down in frequency, and understand that those are different signals. The reason why we need negative frequencies is that our 0Hz is the center of our SDR s tuned frequency, not actually at 0Hz in nature. Generally speaking, it s doing loads in hardware (and firmware!) to mix the raw RF signals with a local oscillator to a frequency that can be sampled at the requested rate (fundamentally the same concept as a superheterodyne receiver), so a frequency of -10MHz means that signal is 10 MHz below the center of our SDR s tuned frequency. The sampling rate dictates the amount of frequency representable in the data stream. You ll sometimes see this called the Nyquist frequency. The Nyquist Frequency is one half of the sampling rate. Intuitively, if you think about the amount of bandwidth observable as being 1:1 with the sampling rate of the stream, and the middle of your bandwidth is 0 Hz, you would only have enough space to go up in frequency for half of your bandwidth or half of your sampling rate. Same for going down in frequency.

Float 32 / Complex 64 IQ samples that are being processed by software are commonly processed as an interleaved pair of 32 bit floating point numbers, or a 64 bit complex number. The first float32 is the real value, and the second is the imaginary value.
I#0
Q#0
I#1
Q#1
I#2
Q#2
The complex number 1+1i is represented as 1.0 1.0 and the complex number -1-1i is represented as -1.0 -1.0. Unless otherwise specified, all the IQ samples and pseudocode to follow assumes interleaved float32 IQ data streams. Example interleaved float32 file (10Hz Wave at 1024 Samples per Second)

RTL-SDR IQ samples from the RTL-SDR are encoded as a stream of interleaved unsigned 8 bit integers (uint8 or u8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.
I#0
Q#0
I#1
Q#1
I#2
Q#2
The complex number 1+1i is represented as 0xFF 0xFF and the complex number -1-1i is represented as 0x00 0x00. The complex number 0+0i is not easily representable since half of 0xFF is 127.5.
Complex Number Representation
1+1i []uint8 0xFF, 0xFF
-1+1i []uint8 0x00, 0xFF
-1-1i []uint8 0x00, 0x00
0+0i []uint8 0x80, 0x80 or []uint8 0x7F, 0x7F
And finally, here s some pseudocode to convert an rtl-sdr style IQ sample to a floating point complex number:
...
in = []uint8 0x7F, 0x7F 
real = (float(iq[0])-127.5)/127.5
imag = (float(iq[1])-127.5)/127.5
out = complex(real, imag)
....
Example interleaved uint8 file (10Hz Wave at 1024 Samples per Second)

HackRF IQ samples from the HackRF are encoded as a stream of interleaved signed 8 bit integers (int8 or i8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.
I#0
Q#0
I#1
Q#1
I#2
Q#2
Formats that use signed integers do have one quirk due to two s complement, which is that the smallest negative number representable s absolute value is one more than the largest positive number. int8 values can range between -128 to 127, which means there s bit of ambiguity in how +1, 0 and -1 are represented. Either you can create perfectly symmetric ranges of values between +1 and -1, but 0 is not representable, have more possible values in the negative range, or allow values above (or just below) the maximum in the range to be allowed. Within my implementation, my approach has been to scale based on the max integer value of the type, so the lowest possible signed value is actually slightly smaller than -1. Generally, if your code is seeing values that low the difference in step between -1 and slightly less than -1 isn t very significant, even with only 8 bits. Just a curiosity to be aware of.
Complex Number Representation
1+1i []int8 127, 127
-1+1i []int8 -128, 127
-1-1i []int8 -128, -128
0+0i []int8 0, 0
And finally, here s some pseudocode to convert a hackrf style IQ sample to a floating point complex number:
...
in = []int8 -5, 112 
real = (float(in[0]))/127
imag = (float(in[1]))/127
out = complex(real, imag)
....
Example interleaved int8 file (10Hz Wave at 1024 Samples per Second)

PlutoSDR IQ samples from the PlutoSDR are encoded as a stream of interleaved signed 16 bit integers (int16 or i16). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant. Almost no SDRs capture at a 16 bit depth natively, often you ll see 12 bit integers (as is the case with the PlutoSDR) being sent around as 16 bit integers. This leads to the next possible question, which is are values LSB or MSB aligned? The PlutoSDR sends data LSB aligned (which is to say, the largest real or imaginary value in the stream will not exceed 4095), but expects data being transmitted to be MSB aligned (which is to say the lowest set bit possible is the 5th bit in the number, or values can only be set in increments of 16). As a result, the quirk observed with the HackRF (that the range of values between 0 and -1 is different than the range of values between 0 and +1) does not impact us so long as we do not use the whole 16 bit range.
Complex Number Representation
1+1i []int16 32767, 32767
-1+1i []int16 -32768, 32767
-1-1i []int16 -32768, -32768
0+0i []int16 0, 0
And finally, here s some pseudocode to convert a PlutoSDR style IQ sample to a floating point complex number, including moving the sample from LSB to MSB aligned:
...
in = []int16 -15072, 496 
// shift left 4 bits (16 bits - 12 bits = 4 bits)
 // to move from LSB aligned to MSB aligned.
 in[0] = in[0] << 4
in[1] = in[1] << 4
real = (float(in[0]))/32767
imag = (float(in[1]))/32767
out = complex(real, imag)
....
Example interleaved i16 file (10Hz Wave at 1024 Samples per Second)

Next Steps Now that we can read (and write!) IQ data, we can get started first on the transmitter, which we can (in turn) use to test receiving our own BPSK signal, coming next in Part 2!

Paul Tagliamonte: Intro to PACKRAT (Part 0/5)

Hello! Welcome. I m so thrilled you re here. Some of you may know this (as I ve written about in the past), but if you re new to my RF travels, I ve spent nights and weekends over the last two years doing some self directed learning on how radios work. I ve gone from a very basic understanding of wireless communications, all the way through the process of learning about and implementing a set of libraries to modulate and demodulate data using my now formidable stash of SDRs. I ve been implementing all of the RF processing code from first principals and purely based on other primitives I ve written myself to prove to myself that I understand each concept before moving on. I ve just finished a large personal milestone I was able to successfully send a cURL HTTP request through a network interface into my stack of libraries, through my own BPSK implementation, framed in my own artisanal hand crafted Layer 2 framing scheme, demodulated by my code on the other end, and sent into a Linux network interface. The combination of the Layer 1 PHY and Layer 2 Data Link is something that I ve been calling PACKRAT .
$ curl http://44.127.0.8:8000/
* Connected to 44.127.0.8 (44.127.0.8) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:1313
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP/1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Length: 236
<
____ _ ____ _ ______ _ _____
  _ \ / \ / ___   / / _ \ / \ _ _ 
   _) / _ \      ' /   _)   / _ \    
  __/ ___ \  ___  . \  _ < / ___ \   
 _  /_/ \_\____ _ \_\_  \_\/_/ \_\_ 
* Closing connection 0
In an effort to pay it forward to thank my friends for their time walking me through huge chunks of this, and those who publish their work, I m now spending some time documenting how I was able to implement this protocol. I would never have gotten as far as I did without the incredible patience and kindness of friends spending time working with me, and educators publishing their hard work for the world to learn from. Please accept my deepest thanks and appreciation. The PACKRAT posts are written from the perspective of a novice radio engineer, but experienced software engineer. I ll be leaving out a lot of the technical details on the software end and specific software implementation, focusing on the general gist of the implementation in the radio critical components exclusively. The idea here is this is intended to be a framework a jumping off point for those who are interested in doing this themselves. I hope that this series of blog posts will come to be useful to those who embark on this incredibly rewarding journey after me. This is the first post in the series, and it will contain links to all the posts to follow. This is going to be the landing page I link others to as I publish additional posts, I ll be updating the links on this page. The posts will also grow a tag, which you can check back on, or follow along with here.

Tau Tau ( ) is a much more natural expression of the mathematical constant used for circles which I use rather than Pi ( ). You may see me use Tau in code or text Tau is the same as 2 , so if you see a Tau and don t know what to do, feel free to mentally or textually replace it with 2 . I just hate always writing 2 everywhere and only using (or worse yet 2 /2) .when I mean 1/2 of a circle (or, /2).

Psuedo-code Basicaly none of the code contained in this series is valid on its own. It s very lightly basically Go, and only meant to express concepts in term of software. The examples in the post shouldn t be taken on their own as working snippits to process IQ data, but rather, be used to guide implementations to process the data in question. I d love to invite all readers to try to play at home with the examples, and try and work through the example data captures!

Captures Speaking of captures, I ve included live on-the-air captures of PACKRAT packets, as transmitted from my implementation, in different parts of these posts. This means you can go through the process of building code to parse and receive PACKRAT packets, and then build a transmitter that is validated by your receiver. It s my hope folks will follow along at home and experiment with software to process RF data on their own!

Posts in this series

Next.