Search Results: "sesse"

18 April 2017

Steinar H. Gunderson: Chinese HDMI-to-SDI converters

I often need to convert signals from HDMI to SDI (and occasionally back). This requires a box of some sort, and eBay obliges; there's a bunch of different sellers of the same devices, selling them at around $20–25. They don't seem to have a brand name, but they are invariably sold as 3G-SDI converters (meaning they should go up to 1080p60), and they all look the same. There are also corresponding SDI-to-HDMI converters that look pretty much the same except they convert the other way. (They're easy to confuse, but that's not a problem unique to them.) I've used them for a while now, and there are pros and cons. They seem reliable enough, and they're a quarter of the price of e.g. Blackmagic's Micro converters, which is a real bargain. However, there are also some issues: The last issue is by far the worst, but it only affects 3G-SDI resolutions; 720p60, 1080p30 and 1080i60 all work fine. And to be fair, not even Blackmagic's own converters actually send 352M correctly most of the time. I wish there were a way I could publish this somewhere people would actually read it before buying these things, but without a brand name, it's hard for people to find it. They're great value for money, and I wouldn't hesitate to recommend them for almost all uses. But then, there's that almost. :-)

5 April 2017

Steinar H. Gunderson: Nageru 1.5.0 released

I just released version 1.5.0 of Nageru, my live video mixer. The biggest feature is obviously the HDMI/SDI live output, but there are lots of small nuggets everywhere; it's been four months in the making. I'll simply paste the NEWS entry here:
Nageru 1.5.0, April 5th, 2017
  - Support for low-latency HDMI/SDI output in addition to (or instead of) the
    stream. This currently only works with DeckLink cards, not bmusb. See the
    manual for more information.
  - Support changing the resolution from the command line, instead of locking
    everything to 1280x720.
  - The A/V sync code has been rewritten to be more in line with Fons
    Adriaensen's original paper. It handles several cases much better,
    in particular when trying to match 59.94 and 60 Hz sources to each other.
    However, it might occasionally need a few extra seconds on startup to
    lock properly if startup is slow.
  - Add support for using x264 for the disk recording. This makes it possible,
    among other things, to run Nageru on a machine entirely without VA-API
    support.
  - Support for 10-bit Y'CbCr, both on input and output. (Output requires
    x264 disk recording, as Quick Sync Video does not support 10-bit H.264.)
    This requires compute shader support, and is in general a little bit
    slower on input and output, due to the extra amount of data being shuffled
    around. Intermediate precision is 16-bit floating-point or better,
    as before.
  - Enable input mode autodetection for DeckLink cards that support it.
    (bmusb mode has always been autodetected.)
  - Add functionality to add a time code to the stream; useful for debugging
    latency.
  - The live display is now both more performant and of higher image quality.
  - Fix a long-standing issue where the preview displays would be too bright
    when using an NVIDIA GPU. (This did not affect the finished stream.)
  - Many other bugfixes and small improvements.
1.5.0 is on its way into Debian experimental (it's too late for the stretch release, especially as it also depends on Movit and bmusb from experimental), or you can get it from the home page as always.

4 April 2017

Matthias Klumpp: On Tanglu

It's time for a long-overdue blog post about the status of Tanglu. Tanglu is a Debian derivative, started in early 2013 when the systemd debate at Debian was still hot. It was formed by a few people wanting to create a Debian derivative for workstations with a time-based release schedule, using and showcasing new technologies (which include systemd, but also bundling systems and other things), built in the open with a community and using infrastructure similar to Debian's. Tanglu is designed explicitly to complement Debian, not to compete with it on all devices.

Tanglu has achieved a lot of great things. We were the first Debian derivative to adopt systemd, and with the help of our contributors we could kill a few nasty issues affecting it and Debian before it ended up becoming the default in Debian Jessie. We also started to use the Calamares installer relatively early, bringing a modern installation experience in addition to the traditional debian-installer. We performed the usrmerge early, uncovering a few more issues which were fed back into Debian to be resolved (while workarounds were added to Tanglu). We also briefly explored switching from initramfs-tools to Dracut, but this release goal was dropped due to issues (it might be revived later). A lot of other less impactful changes happened as well, borrowing a lot of useful ideas and code from Ubuntu (kudos to them!). On the infrastructure side, we set up the Debian Archive Kit (dak), managing to find a couple of issues (mostly hardcoded assumptions about Debian) and reporting them back to make using dak easier for distributions which aren't Debian. We explored using fedmsg for our infrastructure, went through a long and painful iteration of build systems (buildbot -> Jenkins -> Debile) before finally ending up with Debile, and added a set of our own custom tools to collect archive QA information and present it to our developers in an easy-to-digest way. Except for wanna-build, Tanglu is hosting an almost-complete clone of basic Debian archive management tools.

During the past year, however, the project's progress slowed down significantly. For this, mostly I am to blame. One of the biggest challenges for a young project is to attract new developers and members and keep them engaged. A lot of the people coming to Tanglu and being interested in contributing were unfortunately not packagers and sometimes not developers at all, and we didn't have the manpower to individually mentor these people and teach them the necessary skills. People asking for tasks were usually asked where their interests were and what they would like to do, to give them a useful task. This sounds great in principle, but in practice it is actually not very helpful. A curated list of junior jobs is a much better starting point. We also invested almost zero time in making our project known and creating the necessary buzz and excitement that's actually needed to sustain a project like this. Doing more in the advertisement and helping-newcomers areas is a high-priority issue in the Tanglu bugtracker, which to this day is still open. Doing good alone isn't enough; talking about it is of crucial importance, and that is something I knew about, but didn't realize the impact of for quite a while. As strange as it sounds, investing in the tech only isn't enough; community building is of equal importance.
Regardless of that, Tanglu has members working on the project, but way too few to manage a project of this magnitude (getting package transitions migrated alone is a large task requiring quite some time, while at the same time being incredibly boring :P). A lot of our current developers can only invest small amounts of time into the project, because they have a lot of other projects as well. The other reason why Tanglu has problems is too much stuff being centralized on myself. That is a problem I have wanted to rectify for a long time, but as soon as a task wasn't done in Tanglu because no people were available to do it, I completed it. This essentially increased the project's dependency on me as a single person, giving it a really low bus factor. It not only centralizes power in one person (which actually isn't a problem as long as that person is available enough to perform tasks when asked), it also centralizes knowledge on how to run services and how to do things. And if you want to give up power, people will need the knowledge on how to perform the specific task first (which they will never gain if there's always that one guy doing it). I still haven't found a great way to solve this; it's a problem that essentially kills itself as soon as the project is big enough, but until then the only way to counter it slightly is to write lots of documentation.

Last year I had way less time to work on Tanglu than the project deserves. I also started to work for Purism on their PureOS Debian derivative (which is heavily influenced by some of the choices we made for Tanglu, but with a different focus; that's probably something for another blog post). A lot of the stuff I do for Purism duplicates the work I do on Tanglu, and also takes away time I have for the project. Additionally, I need to invest a lot more time into other projects such as AppStream, and a lot of random other stuff that just needs continuous maintenance and discussion (especially AppStream eats up a lot of time, since it became really popular in a lot of places). There is also my MSc thesis in neuroscience that requires attention (and is actually in focus most of the time). All in all, I can't split myself, and KDE's cloning machine remains broken, so I can't even use that ;-). In terms of projects there is also a personal hard limit on how much stuff I can handle, and exceeding it long-term is not very healthy; in these cases I try to satisfy all projects and in the end do not focus enough on any of them, which makes me end up with a lot of half-baked stuff (which helps nobody, and most importantly makes me lose the fun, energy and interest to work on it).

Good news, everyone! (sort of) So, this sounded overly negative; where does this leave Tanglu? Fact is, I cannot commit the crazy amounts of time to it that I did in 2013. But I love the project, and I actually do have some time I can put into it. My work on Purism has an overlap with Tanglu, so Tanglu can actually benefit from the software I develop for them, maybe creating a synergy effect between PureOS and Tanglu. Tanglu is also important to me as a testing environment for future ideas (be it in infrastructure or in the "make bundling nice!" department). So, what actually is the way forward? First, maybe I have the chance to find a few people willing to work on tasks in Tanglu. It's a fun project, and I learned a lot while working on it.
Tanglu also possesses some unique properties few other Debian derivatives have, like being built from source completely (allowing us things like swapping out core components or compiling with more hardening flags, switching to newer KDE Plasma and GNOME faster, etc.). Second, if we do not have enough manpower, I think converting Tanglu into a rolling-release distribution might be the only viable way to keep the project running. A rolling-release scheme creates much less effort for us than making releases (especially time-based ones!). That way, users will have a constantly updated and secure Tanglu system, with machines doing most of the background work. If it turns out that absolutely nothing works and we can't attract new people to help with Tanglu, it would mean that there generally isn't much interest from the developer or user side in a project like this, so shutting it down or scaling it down dramatically would be the only option. But I do not think that this is the case, and I believe that having Tanglu around is important. I also have some interesting plans for it which will be fun to implement, if only for testing. The only thing that had to stop was leaving our users in the dark about what is happening. Sorry for the long post, but there are some subjects which are worth writing more than 140 characters about. If you are interested in contributing to Tanglu, get in touch with us! We have an IRC channel #tanglu-devel on Freenode (go there for quicker responses!), forums and mailing lists. It looks like I will be at DebConf this year as well, so you can also catch me there! I might even talk about PureOS/Tanglu infrastructure at the conference.

21 March 2017

Steinar H. Gunderson: 10-bit H.264 support

Following my previous tests about 10-bit H.264, I did some more practical tests; since media.xiph.org is up again, I did some tests with actual 10-bit input. The results were pretty similar, although of course 4K 60 fps organic content is going to be different at times from the partially rendered 1080p 24 fps clip I used. But I also tested browser support, with good help from people on IRC. It was every bit as bad as I feared: Chrome on desktop (Windows, Linux, macOS) supports 10-bit H.264, although of course without hardware acceleration. Chrome on Android does not. Firefox does not (it tries on macOS, but playback is buggy). iOS does not. VLC does; I didn't try a lot of media players, but obviously ffmpeg-based players should do quite well. I haven't tried Chromecast, but I doubt it works. So I guess that yes, it really is 8-bit H.264 or 10-bit HEVC; but I haven't tested the latter yet either. :-)

9 March 2017

Steinar H. Gunderson: Tired

To be honest, at this stage I'd actually prefer ads in Wikipedia to having ever more intrusive begging for donations. Please go away soon.

27 February 2017

Steinar H. Gunderson: 10-bit H.264 tests

Following the post about 10-bit Y'CbCr earlier this week, I thought I'd make an actual test of 10-bit H.264 compression for live streaming. The basic question is: sure, it's better per bit, but it's also slower, so is it better per MHz? This is largely inspired by Ronald Bultje's post about streaming performance, where he largely showed that HEVC is currently useless for live streaming from software; unless you can encode at x264's veryslow preset (which, at 720p60, means basically rather simple content and 20 cores or so), the best x265 presets you can afford will give you worse quality than the best x264 presets you can afford. My results will maybe not be as scientific, but hopefully still enlightening.

I used the same test clip as Ronald, namely a two-minute clip of Tears of Steel. Note that this is an 8-bit input, so we're not testing the effects of 10-bit input; it's just testing the increased internal precision in the codec. Since my focus is practical streaming, I ran the latest version of x264 at four threads (a typical desktop machine), using one-pass encoding at 4000 kbit/sec. Nageru's speed control has 26 presets to choose from, which gives pretty smooth steps between neighboring ones, but I've been sticking to the ten standard x264 presets (ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, placebo). Here's the graph: The x-axis is seconds used for the encode (note the logarithmic scale; placebo takes 200–250 times as long as ultrafast). The y-axis is SSIM dB (see the small helper at the end of this post for how that relates to linear SSIM), so up and to the left is better. The blue line is 8-bit, and the red line is 10-bit. (I ran most encodes five times and averaged the results, but it doesn't really matter, due to the logarithmic scale.) The results are actually much stronger than I assumed; if you run on (8-bit) ultrafast or superfast, you should stay with 8-bit, but from there on, 10-bit is on the Pareto frontier. Actually, 10-bit veryfast (18.187 dB) is better than 8-bit medium (18.111 dB), while being four times as fast!

But not all of us have a relation to dB quality, so I chose to also do a test that maybe is a bit more intuitive, centered around the bitrate needed for constant quality. I locked quality to 18 dB, i.e., for each preset, I adjusted the bitrate until the SSIM showed 18.000 dB plus/minus 0.001 dB. (Note that this means faster presets get less of a speed advantage, because they need higher bitrate, which means more time spent entropy coding.) Then I measured the encoding time (again five times) and graphed the results: the x-axis is again seconds, and the y-axis is the bitrate needed in kbit/sec, so lower and to the left is better. Blue is again 8-bit and red is again 10-bit. If the previous graph was enough to make me intrigued, this is enough to make me excited. In general, 10-bit gives 20–30% lower bitrate for the same quality and CPU usage! (Compare this with the supposed up-to-50% benefits of HEVC over H.264, given infinite CPU usage.) The most dramatic example is when comparing the medium presets directly, where 10-bit runs at 2648 kbit/sec versus 3715 kbit/sec (29% lower bitrate!) and is only 5% slower. As one progresses towards the slower presets, the gap narrows somewhat (placebo is 27% slower and only 24% lower bitrate), but in the realistic middle range, the difference is quite marked. If you run 3 Mbit/sec at 10-bit, you get the quality of 4 Mbit/sec at 8-bit. So is 10-bit H.264 a no-brainer? Unfortunately, no; the client hardware support is nearly nil.
Not even Skylake, which can do 10-bit HEVC encoding in hardware (and 10-bit VP9 decoding), can do 10-bit H.264 decoding in hardware. Worse still, mobile chipsets generally don't support it. There are rumors that the iPhone 6s supports it, but these are unconfirmed; some Android chips support it, but most don't. I guess this explains a lot of the limited uptake; since it's in some ways a new codec, implementers are more keen to get the full benefits of HEVC instead (even though the licensing situation is really icky). The only community I know of that has really picked it up as a distribution format is the anime scene, and they're feeling quite specific pains due to unique content (large gradients giving pronounced banding in undithered 8-bit). So, 10-bit H.264: It's awesome, but you can't have it. Sorry :-)
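For reference, the SSIM dB figures above are, as far as I know, just the linear SSIM value mapped through -10 * log10(1 - SSIM), which is how x264 reports it; here's a tiny helper of my own (a sketch, not code from x264 itself) to convert back and forth:

#include <cmath>
#include <cstdio>

// Convert between linear SSIM and the dB-style scale used in the graphs above.
// Assumption: the dB figure is -10 * log10(1 - SSIM), as printed by x264.
double ssim_to_db(double ssim) { return -10.0 * std::log10(1.0 - ssim); }
double db_to_ssim(double db)   { return 1.0 - std::pow(10.0, -db / 10.0); }

int main() {
    // The two values compared in the text: 10-bit veryfast vs. 8-bit medium.
    printf("18.187 dB ~ SSIM %.5f\n", db_to_ssim(18.187));
    printf("18.111 dB ~ SSIM %.5f\n", db_to_ssim(18.111));
    return 0;
}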

23 February 2017

Steinar H. Gunderson: Fyrrom recording released

The recording of yesterday's Fyrrom (Samfundet's unofficial take on Boiler Room) is now available on YouTube. Five video inputs, four hours, two DJs, no dropped frames. Good times. Soundcloud coming soon!

21 February 2017

Steinar H. Gunderson: 8-bit Y'CbCr ought to be enough for anyone?

If you take a random computer today, it's pretty much a given that it runs a 24-bit mode (8 bits of each of R, G and B); as we moved away from palettized displays at some point during the 90s, we quickly went past 15- and 16-bit and settled on 24-bit. The reasons are simple; 8 bits per channel is easy to work with on CPUs, and it's on the verge of what human vision can distinguish, at least if you add some dither. As we've been slowly taking the CPU off the pixel path and replacing it with GPUs (which have specialized hardware for more kinds of pixel formats), changing formats has become easier, and there's some push to 10-bit (30-bit) deep color for photo pros, but largely, 8 bits per channel is where we are.

Yet, I'm now spending time adding 10-bit input (and eventually also 10-bit output) to Nageru. Why? The reason is simple: Y'CbCr. Video traditionally isn't done in RGB, but in Y'CbCr; that is, a black-and-white signal (Y) and then two color-difference signals (Cb and Cr; roughly additional blueness and additional redness, respectively). We started doing this because it was convenient in analog TV (if you separate the two, black-and-white TVs can just ignore the color signal), but we kept doing it because it's very nice for reducing bandwidth: Human vision is much less sensitive to color than to brightness, so we can transfer the color channels in lower resolution and get away with it. (Also, a typical Bayer sensor can't deliver full color resolution anyway.) So most cameras and video codecs work in Y'CbCr, not RGB.

Let's look at the implications of using 8-bit Y'CbCr, using a highly simplified model for, well, simplicity. Let's define Y = 1/3 (R + G + B), Cr = R - Y and Cb = B - Y. (The reverse transformation becomes R = Y + Cr, B = Y + Cb and G = 3Y - R - B.) This means that an RGB color such as pure gray ([127, 127, 127]) becomes [127, 0, 0]. All is good, and Y can go from 0 to 255, just like R, G and B can. A pure red ([255, 0, 0]) becomes [85, 170, -85], and a pure blue ([0, 0, 255]) becomes correspondingly [85, -85, 170]. So Cr and Cb can also go negative; a pure cyan ([0, 255, 255]) becomes [170, -170, 85], for instance. This means we need to squeeze values from -170 to +170 into an 8-bit range, losing accuracy. Even worse, there are valid Y'CbCr triplets that don't correspond to meaningful RGB colors at all. For instance, Y'CbCr [255, 170, 0] would be RGB [425, 85, 255]; R is out of range! And Y'CbCr [85, 0, 170] would be RGB [85, -85, 255], that is, negative green. This isn't a problem for compression, as we can just avoid using those illegal colors with no loss of efficiency. But it means that the conversion in itself causes a loss; actually, if you do the maths on the real formulas (using the BT.601 standard), it turns out only 17% of the 24-bit Y'CbCr code words are valid! (There's a small brute-force checker at the end of this post if you want to verify the ballpark.) In other words, we lose about two and a half bits of data, and our 24 bits of accuracy have been reduced to 21.5. Or, to put it another way: 8-bit Y'CbCr is roughly equivalent to 7-bit RGB.

Thus, pretty much all professional video uses 10-bit Y'CbCr. It's much more annoying to deal with (especially when you've got subsampling!), but if you're using SDI, there's not even any 8-bit version defined, so if you insist on 8-bit, you're taking data you're getting on the wire (whether you want it or not) and throwing 20% of it away. UHDTV standards (using HEVC) are also simply not defined for 8-bit; it's 10- and 12-bit only, even on the codec level.
Part of this is because UHDTV also supports HDR, so you have a wider RGB range than usual to begin with, and 8-bit would cause excessive banding. Using it on the codec level makes a lot of sense for another reason, namely that you reduce internal roundoff errors during processing by a lot; errors equal noise, and noise is bad for compression. I've seen numbers of 15% lower bitrate for H.264 at the same quality, although you also have to take into account that the encoder then needs more CPU power that you could have used for a higher preset in 8-bit. I don't know how the tradeoff here works out, and you also have to take into account decoder support for 10-bit, especially when it comes to hardware. (When it comes to HEVC, Intel didn't get full fixed-function 10-bit support before Kaby Lake!) So indeed, 10-bit Y'CbCr makes sense even for quite normal video. It isn't a no-brainer to turn it on, though; even though Nageru uses a compute shader to convert the 4:2:2 10-bit Y'CbCr to something the GPU can sample from quickly (i.e., the CPU doesn't need to touch it), and all internal processing is in 16-bit floating point anyway, it still takes a nonzero amount of time to convert compared to just blasting through 8-bit, so my ultraportable probably won't make it anymore. (A discrete GPU has no issues at all, of course. My laptop converts a 720p frame in about 1.4 ms, FWIW.) But it's worth considering when you want to squeeze even more quality out of the system. And of course, there's still 10-bit output support to be written...
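If you want to sanity-check the 17% claim above, here's a small brute-force sketch of my own (not Nageru code); it uses the standard BT.601 studio-range conversion, so the exact percentage it prints depends a bit on range conventions and rounding, but it should land in the same ballpark:

#include <cstdio>

// Count how many 8-bit Y'CbCr code words (BT.601, studio/limited range)
// map back inside the RGB cube. Everything else is a code word that can
// never occur in a legal conversion from RGB.
int main() {
    long long valid = 0;
    for (int y = 0; y < 256; ++y) {
        for (int cb = 0; cb < 256; ++cb) {
            for (int cr = 0; cr < 256; ++cr) {
                double ey = (y - 16) / 219.0;     // luma, nominally 0..1
                double pb = (cb - 128) / 224.0;   // blue difference, nominally -0.5..0.5
                double pr = (cr - 128) / 224.0;   // red difference, nominally -0.5..0.5
                double r = ey + 1.402 * pr;
                double b = ey + 1.772 * pb;
                double g = (ey - 0.299 * r - 0.114 * b) / 0.587;
                if (r >= 0.0 && r <= 1.0 && g >= 0.0 && g <= 1.0 && b >= 0.0 && b <= 1.0)
                    ++valid;
            }
        }
    }
    const double total = 256.0 * 256.0 * 256.0;
    printf("%lld of %.0f code words valid (%.1f%%)\n", valid, total, 100.0 * valid / total);
    return 0;
}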

2 February 2017

Steinar H. Gunderson: Not going to FOSDEM but a year of Nageru

It's that time of the year :-) And FOSDEM is fun. But this year I won't be going; there was a scheduling conflict, and I didn't really have anything new to present (although I probably could have shifted around priorities to get something). But FOSDEM 2017 also means it's been a year since FOSDEM 2016, where I presented Nageru, my live video mixer. And that's been a pretty busy year, so I thought I'd do a recap from high up above.

First of all, Nageru has actually been used in production; we did Solskogen and Fyrrom, and both gave invaluable input. Then there have been some non-public events, which have also been useful. The Nageru that's in git right now has evolved considerably from the 1.0.0 that was released last year. diffstat shows 19660 insertions and 3543 deletions; that's counting about 2500 lines of vendored headers, though. Even though I like deleting code much more than adding it, the doubling (from ~10k to ~20k lines) represents a significant amount of new features: 1.1.x added support for non-Intel GPUs. 1.2.x added support for DeckLink input cards (through Blackmagic's proprietary drivers), greatly increasing hardware support, and did a bunch of small UI changes. 1.3.x added x264 support that's strong enough that Nageru has really displaced VLC as my go-to tool for just video-signal-to-H.264 conversion (even though it feels like overkill), and also added hotplug support. 1.4.x added multichannel audio support, including support for MIDI controllers, and also a disk space indicator (because when you run out of disk during production without understanding that's what happens, it really sucks), and brought extensive end-user documentation. And 1.5.x, in development right now, will add HDMI/SDI output, which, like all the previous changes, requires various rearchitecting and fixing.

Of course, there are lots of things that haven't changed as well; the basic UI remains the same, including the way the theme (governing the look-and-feel of the finished video stream) works. The basic design has proved sound, and I don't think I would change a lot if I were to design something like 1.0.0 again. As a small free software project, you have to pick your battles, and I'm certainly glad I didn't start out doing something like network support (or a distributed architecture in general, really). So what's in store for the next year of Nageru? It's hard to say, and it will definitely depend on the concrete needs of events. A hot candidate (since I might happen to need it) is chroma keying, although good keying is hard to get right and this needs some research. There's also been some discussion around other concrete features, but I won't name them until a firm commitment has been made; priorities can shift around, and it's important to stay flexible. So, enjoy FOSDEM! Perhaps I'll return with a talk in 2018. In the meantime, I'll be preparing the stream for the 2017 edition of Fyrrom, and I know for sure there will be more events, more features and more experiences to be had. And, inevitably, more bugs. :-)

22 January 2017

Steinar H. Gunderson: Nageru loopback test

Nageru, my live video mixer, is in the process of gaining HDMI/SDI output for bigscreen use, and in that process, I ran some loopback tests (connecting the output of one card into the input of another) to verify that I had all the colorspace parameters right. (This is of course trivial if you are only passing one input through bit by bit, but Nageru is much more flexible, so it really needs to understand what each pixel means.) It turns out that if you mess up any of these parameters ever so slightly, you end up with something like this, this or this. But thankfully, I got this instead on the very first try, so it really seems it's been right all along. :-) (There's a minor first-generation loss in that the SDI chain is 8-bit Y'CbCr instead of 10-bit, but I really can't spot it with the naked eye, and it doesn't compound through generations. I plan to fix that for those with spare GPU power at some point, possibly before the 1.5.0 release.)

11 January 2017

Steinar H. Gunderson: 3G-SDI signal support

I had to figure out what kinds of signals you can run over 3G-SDI today, and it's pretty confusing, so I thought I'd share it. For reference, 3G-SDI is the same as 3G HD-SDI, an extension of HD-SDI, which is an extension of the venerable SDI standard (well, duh). They're all used for running uncompressed audio/video data over regular BNC coaxial cable, possibly hundreds of meters, and are in wide use in professional and semiprofessional setups. So here's the rundown on 3G-SDI capabilities: And then there's dual-link 3G-SDI, which uses two cables instead of one, and there's also Blackmagic's proprietary 6G-SDI, which supports basically everything dual-link 3G-SDI does. But in 2015, seemingly there was also a real 6G-SDI and 12G-SDI, and it's unclear to me whether they're in any way compatible with Blackmagic's offering. It's all confusing. But at least, these are the differences from single-link to dual-link 3G-SDI: 4K? I don't know. 120fps? I believe that's also a proprietary extension of some sort. And of course, a device supporting 3G-SDI isn't at all required to support all of this; in particular, I believe Blackmagic's systems don't support alpha at all except on their single 12G-SDI card, and I'd also not be surprised if RGB support is rather limited in practice.

8 January 2017

Steinar H. Gunderson: SpeedHQ decoder

I reverse-engineered a video codec. (And then the CTO of the company making it became really enthusiastic, and offered help. Life is strange sometimes.) I'd talk about this and some related stuff at FOSDEM, but there's a scheduling conflict, so I will be elsewhere that weekend, not Brussels.

25 December 2016

Steinar H. Gunderson: Cracking a DataEase password

I recently needed to get access to a DataEase database; the person I helped was the legitimate owner of the data, but had forgotten the password, as the database was largely from 1996. There are various companies around the world that seem to do this, or something similar (like give you an API), for a usually unspecified fee; they all have very 90s homepages and in general seem like they have gone out of business a long time ago. And I wasn't prepared to wait. For those of you who don't know DataEase, it's a sort-of relational database for DOS that had its heyday in the late 80s and early 90s (being sort of the cheap cousin of dBase); this is before SQL gained traction as the standard query language, before real multiuser database access, and before variable-width field storage. It is also before reasonable encryption. Let's see what we can do. DataEase has a system where tables are mapped through the data dictionary, which is a table on its own. (Sidenote: MySQL pre-8.0 still does not have this.) This is the file RDRRTAAA.DBM; I don't really know what RDRR stands for, but T is the database letter (in case you wanted more than one database in the same directory), and AAA, AAB, AAC etc. is a counter in case a table grows too big for one file. (There are also .DBA files for the structure of non-system tables, and then some extra stuff for indexes.) DBM files are pretty much the classical, fixed-length 80s-style database files; each row has some flags (I believe these are for e.g. "row is deleted") and then just the rows in fixed format right after each other. For instance, here's one I created as part of testing (just the first few lines of the hexdump are shown):
00000000: 0e 00 01 74 65 73 74 62 61 73 65 00 00 00 00 00  ...testbase.....
00000010: 00 00 00 00 00 00 00 73 46 cc 29 37 00 09 00 00  .......sF.)7....
00000020: 00 00 00 00 00 43 3a 52 44 52 52 54 41 41 41 2e  .....C:RDRRTAAA.
00000030: 44 42 4d 00 00 01 00 0e 00 52 45 50 4f 52 54 20  DBM......REPORT 
00000040: 44 49 52 45 43 54 4f 52 59 00 00 00 00 00 1c bd  DIRECTORY.......
00000050: d4 1a 27 00 00 00 00 00 00 00 00 00 43 3a 52 45  ..'.........C:RE
00000060: 50 4f 54 41 41 41 2e 44 42 4d 00 00 01 00 0e 00  POTAAA.DBM......
00000070: 52 65 6c 61 74 69 6f 6e 73 68 69 70 73 00 00 00  Relationships...
Even without going in-depth, we can see the structure here; there's testbase, which maps to C:RDRRTAAA.DBM (the RDRR itself), there's a table called REPORT DIRECTORY that maps to C:REPOTAAA.DBM, and then more stuff after that, and so on. However, other tables are not so easily read, because you can ask DataEase to encrypt a table. Let's look at such an encrypted table, like the Users table (containing usernames, passwords (not password hashes!) and some extra information like access level), which is always encrypted:
00000000: 0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34  .......4...x!. 4
00000010: ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34  .......4...x!. 4
00000020: e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34  ...x!. 4...x!. 4
00000030: ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b     ...x!. 4...x!. 
Clearly, this isn't very good encryption; it uses a very short, repetitive key of eight bytes (64 bits). (The data is mostly zero padding, which makes it much easier to spot this.) In fact, in actual data tables, only five of these bytes are set to a non-zero value, which means we have a 40-bit key; export controls? My first assumption here was of course XOR, but through some experimentation, it turned out what you need is actually 8-bit subtraction (with wraparound). The key used is derived from both a database key and a per-table key, both stored in the RDRR; again, if you disassemble, I'm sure you can find the key derivation function, but that's annoying, too. Note, by the way, that this precludes making an attack by just copying tables between databases, since the database key is different. So let's do a plaintext attack. If you assume the plaintext of the bottom row is all padding, that's your key and here's what you end up with:
00000000: 52 79 00 75 73 65 72 00 00 00 00 00 00 00 00 00  Ry.user.........
00000010: 00 00 70 61 73 73 a3 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  (...............
00000030: 00 00 00 00 00 00 00 00                          ........ 
Not bad, eh? Actually, the first byte of the key here is wrong as far as I know, but it didn't interfere with the fields, so we have what we need to log in. (At that point, we've won, because DataEase will helpfully decrypt everything transparently for us.) However, there's a twist; if the password is longer than four characters, the entire decryption of the Users table changes. Of course, we could run our plaintext attack against every data table and pick out the information by decoding the structure, but again: annoying. So let's see what it looks like if we choose "passs" instead:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: 70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d  p..6~ ...3.=sq.-
00000020: 7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd   .?...O.....CA..
00000030: 10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87     .k.u...ex.w....
Distinctly more confusing. At this point, of course, we know at which byte positions the username and password start, so if we wanted to, we could just try setting the start byte of the password to every possible byte in turn until we hit 0x00 (DataEase truncates fields at the first zero byte), which would allow us to get in with an empty password. However, I didn't know the username either, and trying two bytes would mean 65536 tries, and I wasn't up for automating macros through DOSBox. So an active attack wasn't too tempting. However, we can look at the last hex byte (where we know the plaintext is 0); it goes 0x5d, 0x2d, 0xfd... and some other bytes go 0x08, 0xd8, 0xa8, 0x78, and so on. So clearly there's an obfuscation here where we have a per-line offset that decreases by 0x30 per line. (Actually, the increase/decrease per line seems to be derived from the key somehow, too.) If we remove that, we end up with:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: a0 fb 0f 66 ae ac c1 f5 08 63 08 6d a3 a1 17 5d  ...f.....c.m...]
00000020: db fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 5d  ....;9...c.m...]
00000030: a0 fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17     ....;9...c.m...
Well, OK, this wasn't much more complicated; our fixed key is now 16 bytes long instead of 8 bytes long, but apart from that, we can do exactly the same plaintext attack. (Also, it seems to change per-record now, but we don't see it here, since we've only added one user.) Again, assume the last line is supposed to be all 0x00 and thus use that as a key (plus the last byte from the previous line), and we get:
00000000: 6e 06 00 75 73 65 72 00 00 00 00 00 00 00 00 00  n..user.........
00000010: 00 00 70 61 73 73 12 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ;...............
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     ...............
Well, OK, it wasn't perfect; we got "pass\x12" instead of "passs", so we messed up somehow. I don't know exactly why the fifth character gets messed up like this; actually, it cost me half an hour of trying, because the password looked very real but the database wouldn't let me in, but eventually, we just guessed at what the missing letter was supposed to be. So there you have it; practical small-scale cryptanalysis of DOS-era homegrown encryption (the decryption step is sketched in code below). Nothing advanced, but the user was happy about getting the data back after a few hours of work. :-)
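For the record, here's roughly what the plaintext attack from the first (short-password) case boils down to, as a small standalone sketch; the record bytes are copied from the encrypted Users dump above, and the key is simply read off a stretch we assume decrypts to zero padding:

#include <cstdint>
#include <cstdio>

// Sketch of the plaintext attack described above (short-password case).
// The bytes are the encrypted Users record from the hexdump; the key is
// taken from a run of bytes we assume decrypts to all-zero padding, and
// decryption is byte-wise subtraction with wraparound.
int main() {
    const uint8_t ct[] = {
        0x0c, 0x01, 0x9f, 0xed, 0x94, 0xf7, 0xed, 0x34,
        0xba, 0x88, 0x9f, 0x78, 0x21, 0x92, 0x7b, 0x34,
        0xba, 0x88, 0x0f, 0xd9, 0x94, 0x05, 0x1e, 0x34,
        0xba, 0x88, 0xa0, 0x78, 0x21, 0x92, 0x7b, 0x34,
        0xe2, 0x88, 0x9f, 0x78, 0x21, 0x92, 0x7b, 0x34,
        0xba, 0x88, 0x9f, 0x78, 0x21, 0x92, 0x7b, 0x34,
        0xba, 0x88, 0x9f, 0x78, 0x21, 0x92, 0x7b, 0x34,
        0xba, 0x88, 0x9f, 0x78, 0x21, 0x92, 0x7b,
    };
    // Assume bytes 0x28..0x2f are padding (plaintext 0x00), so their
    // ciphertext *is* the repeating 8-byte key.
    uint8_t key[8];
    for (int i = 0; i < 8; ++i) key[i] = ct[0x28 + i];

    for (size_t i = 0; i < sizeof(ct); ++i) {
        uint8_t p = uint8_t(ct[i] - key[i % 8]);   // 8-bit subtraction, wraps around
        putchar(p >= 0x20 && p < 0x7f ? p : '.');  // print like a hexdump's ASCII column
    }
    putchar('\n');
    return 0;
}

It should print the same "Ry.user" / "pass" fields that the decrypted dump above shows.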

20 November 2016

Steinar H. Gunderson: Nageru documentation

Even though the World Chess Championship takes up a lot of time these days, I've still found some time for Nageru, my live video mixer. But this time it doesn't come in the form of code; rather, I've spent my time writing documentation. I spent some time fretting over what technical solution I wanted. I explicitly wanted end-user documentation, not developer documentation; I rarely find HTML-rendered versions of every member function in a class the best way to understand a program anyway. Actually, on the contrary: Having all sorts of syntax interwoven in class comments tends to be more distracting than anything else. Eventually I settled on Sphinx, not because I found it fantastic (in particular, reST is a pain with its bizarre, variable, punctuation-based syntax), but because I'm convinced it has all the momentum right now. Just like git did back in the day, the fact that the Linux kernel has chosen it means it will inevitably grow a quite large ecosystem, and I won't end up having to maintain it anytime soon. I tried finding a balance between spending time on installation/setup (only really useful for first-time users, and even then, only a subset of them), concept documentation (how to deal with live video in general, and how Nageru fits into a larger ecosystem of software and equipment) and more concrete documentation of all the various features and quirks of Nageru itself. Hopefully, most people will find at least something that's not already obvious to them, without drowning in detail. You can read the documentation at https://nageru.sesse.net/doc/, or if you want to send patches, the right place to patch is the git repository.

5 November 2016

Steinar H. Gunderson: Multithreaded OpenGL

Multithreading continues to be hard (although the alternatives are not really a lot better). While debugging a user issue in Nageru, I found and fixed a few races (mostly harmless in practice, though) in my own code, but also two issues that I filed patches for in Mesa. But that's not enough, it seems; there are still issues that are too subtle for me to figure out on-the-fly. But at least with those patches, I can use interlaced video sources in Nageru on Intel GPUs without segfaulting pretty much immediately. My laptop's GPU isn't fast enough to actually run the YADIF deinterlacer in realtime at 1080p60, though, but it's nice that it at least doesn't take the program down. (These things are super-sensitive to timing, of course, which is probably why I didn't see them when developing the feature a year or so ago.) As usual, NVIDIA's proprietary drivers seem to be near-flawless in this regard. I'm starting to think maybe it's about massive amounts of QA resources.

26 October 2016

Steinar H. Gunderson: Why does software development take so long?

Nageru 1.4.0 is out (and on its way through the Debian upload process right now), so now you can do live video mixing with multichannel audio to your heart's content. I've already blogged about most of the interesting new features, so instead, I'm trying to answer a question: What took so long? To be clear, I'm not saying 1.4.0 took more time than I really anticipated (on the contrary, I pretty much understood the scope from the beginning, and there was a reason why I didn't go for building this stuff into 1.0.0); but if you just look at the changelog from the outside, it's not immediately obvious why multichannel audio support should take the better part of three months of development. What I'm going to say is of course going to be obvious to most software developers, but not everyone is one, and perhaps my experiences will be illuminating.

Let's first look at some obvious things that aren't the case: First of all, development is not primarily limited by typing speed. There are about 9,000 lines of new code in 1.4.0 (depending a bit on how you count), and if it were just about typing them in, I would be done in a day or two. On a good keyboard, I can type plain text at more than 800 characters per minute, but you hardly ever write code for even a single minute at that speed. Just as when writing a novel, most time is spent thinking, not typing. I also didn't spend a lot of time backtracking; most code I wrote actually ended up in the finished product, as opposed to being thrown away. (I'm not as lucky in all of my projects.) It's pretty common to do so if you're in an exploratory phase, but in this case, I had a pretty good idea of what I wanted to do right from the start, and that plan seemed to work. This wasn't a difficult project per se; it just needed to be done (which, in a sense, just increases the mystery).

However, even if this isn't at the forefront of science in any way (most code in the world is pretty pedestrian, after all), there are still a lot of decisions to make, on several levels of abstraction. And a lot of those decisions depend on information gathering beforehand. Let's take a look at an example from late in the development cycle, namely support for using MIDI controllers instead of the mouse to control the various widgets. I've kept a pretty meticulous TODO list; it's just a text file on my laptop, but it serves the purpose of a ghetto bugtracker. For 1.4.0, it contains 83 work items (a single-digit number is not ticked off, mostly because I decided not to do those things), which corresponds roughly 1:2 to the number of commits. So let's have a look at what the ~20 MIDI controller items went into. First of all, to allow MIDI controllers to influence the UI, we need a way of getting to them. Since Nageru is single-platform on Linux, ALSA is the obvious choice (if not, I'd probably have to look for a library to put in-between), but seemingly, ALSA has two interfaces (raw MIDI and sequencer). Which one do you want? It sounds like raw MIDI is what we want, but actually, it's the sequencer interface (it does more of the MIDI parsing for you, and generally is friendlier). The first question is where to start picking events from. I went the simplest path and just said I wanted all events; anything else would necessitate a UI, a command-line flag, figuring out if we wanted to distinguish between different devices with the same name (and not all devices potentially even have names), and so on. But how do you enumerate devices? (Relatively simple, thankfully.)
What do you do if the user inserts a new one while Nageru is running? (Turns out there's a special device you can subscribe to that will tell you about new devices.) What if you get an error on subscription? (Just print a warning and ignore it; it's legitimate not to have access to all devices on the system. By the way, for PCM devices, all of these answers are different.) So now we have a sequencer device; how do we get events from it? Can we do it in the main loop? Turns out it probably doesn't integrate too well with Qt, but it's easy enough to put it in a thread. The class dealing with the MIDI handling now needs locking; what mutex granularity do we want? (Experience will tell you that you nearly always just want one mutex. Two mutexes give you all sorts of headaches with ordering them, and nearly never give any gain.) ALSA expects us to poll() a given set of descriptors for data, but on shutdown, how do you break out of that poll to tell the thread to go away? (The simplest way on Linux is using an eventfd.) There's a quirk where if you get two or more MIDI messages right after each other and only read one, poll() won't trigger to alert you there are more left. Did you know that? (I didn't. I also can't find it documented. Perhaps it's a bug?) It took me some looking into sample code to find it. (A minimal sketch of the resulting reader loop is at the end of this post.) Oh, and ALSA uses POSIX error codes to signal errors (like "nothing more is available"), but it doesn't use errno.

OK, so you have events (like "controller 3 was set to value 47"); what do you do about them? The meaning of the controller numbers is different from device to device, and there's no open format for describing them. So I had to make a format describing the mapping; I used protobuf (I have lots of experience with it) to make a simple text-based format, but it's obviously a nightmare to set up 50+ controllers by hand in a text file, so I had to make a UI for this. My initial thought was making a grid of spinners (similar to how the input mapping dialog already worked), but then I realized that there isn't an easy way to make headlines in Qt's grid. (You can substitute a label widget for a single cell, but not for an entire row. Who knew?) So after some searching, I found out that it would be better to have a tree view (Qt Creator does this), and then you can treat that more-or-less as a table for the rows that should be editable. Of course, guessing controller numbers is impossible even in an editor, so I wanted it to respond to MIDI events. This means the editor needs to take over the role as MIDI receiver from the main UI. How do you do that in a thread-safe way? (Reuse the existing mutex; you don't generally want to use atomics for complicated things.) Thinking about it, shouldn't the MIDI mapper just support multiple receivers at a time? (Doubtful; you don't want your random controller fiddling during setup to actually influence the audio on a running stream. And would you use the old or the new mapping?) And do you really need to set up every single controller for each bus, given that the mapping is pretty much guaranteed to be similar for them? Making a "guess bus" button doesn't seem too difficult, where if you have one correctly set up controller on the bus, it can guess from a neighboring bus (assuming a static offset). But what if there's conflicting information? OK; then you should disable the button. So now the enable/disable status of that button depends on which cell in your grid has the focus; how do you get at those events?
(Install an event filter, or subclass the spinner.) And so on, and so on, and so on. You could argue that most of these questions go away with experience; if you're an expert in a given API, you can answer most of these questions in a minute or two even if you haven't heard the exact question before. But you can't expect even experienced developers to be experts in all possible libraries; if you know everything there is to know about Qt, ALSA, x264, ffmpeg, OpenGL, VA-API, libusb, microhttpd and Lua (in addition to C++11, of course), I'm sure you'd be a great fit for Nageru, but I'd wager that pretty few developers fit that bill. I've written C++ for almost 20 years now (almost ten of them professionally), and that experience certainly helps boost productivity, but I can't say I expect a 10x reduction in my own development time at any point.

You could also argue, of course, that spending so much time on the editor is wasted, since most users will only ever see it once. But here's the point; it's not actually a lot of time. The only reason why it seems like so much is that I bothered to write two paragraphs about it; it's not a particular pain point, it just adds to the total. Also, the first impression matters a lot; if the user can't get the editor to work, they also can't get the MIDI controller to work, and are likely to just go do something else. A common misconception is that just switching languages or using libraries will help you a lot. (Witness the never-ending stream of software that advertises "written in Foo" or "uses Bar" as if it were a feature.) For the former, note that nothing I've said so far is specific to my choice of language (C++), and I've certainly avoided a bunch of battles by making that specific choice over, say, Python. For the latter, note that most of these problems are actually related to library use; libraries are great, and they solve a bunch of problems I'm really glad I didn't have to worry about (how should each button look?), but they still give their own interaction problems. And even when you're a master of your chosen programming environment, things still take time, because you have all those decisions to make on top of your libraries. Of course, there are cases where libraries really solve your entire problem and your code gets reduced to 100 trivial lines, but that's really only when you're solving a problem that's been solved a million times before. Congrats on making that blog in Rails; I'm sure you're advancing the world. (To make things worse, usually this breaks down when you want to stray ever so slightly from what was intended by the library or framework author. What seems like a perfect match can suddenly become a development trap where you spend more of your time trying to become an expert in working around the given library than actually doing any development.) The entire thing reminds me of the famous essay No Silver Bullet by Fred Brooks, but perhaps even more so, this quote from John Carmack's .plan has stuck with me (incidentally about mobile game development in 2006, but the basic story still rings true):
To some degree this is already the case on high end BREW phones today. I have a pretty clear idea what a maxed out software renderer would look like for that class of phones, and it wouldn't be the PlayStation-esq 3D graphics that seems to be the standard direction. When I was doing the graphics engine upgrades for BREW, I started along those lines, but after putting in a couple days at it I realized that I just couldn't afford to spend the time to finish the work. "A clear vision" doesn't mean I can necessarily implement it in a very small integral number of days.
In a sense, programming is all about what your program should do in the first place. The "how" question is just the "what", moved down the chain of abstractions until it ends up where a computer can understand it, and at that point, the three words "multichannel audio support" have become those 9,000 lines that describe in perfect detail what's going on.
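For the curious, here's roughly what the ALSA part described above boils down to; this is a stripped-down sketch of my own (not the actual Nageru code), and it leaves out the hotplug subscription, the mapping and all error handling beyond the bare minimum:

#include <alsa/asoundlib.h>
#include <sys/eventfd.h>
#include <poll.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    // Open the sequencer (not the raw MIDI) interface for input.
    snd_seq_t *seq;
    if (snd_seq_open(&seq, "default", SND_SEQ_OPEN_INPUT, 0) < 0) return 1;
    snd_seq_set_client_name(seq, "midi-sketch");
    snd_seq_create_simple_port(seq, "input",
        SND_SEQ_PORT_CAP_WRITE | SND_SEQ_PORT_CAP_SUBS_WRITE,
        SND_SEQ_PORT_TYPE_MIDI_GENERIC | SND_SEQ_PORT_TYPE_APPLICATION);
    // (Connecting source devices, and subscribing to the announce port to
    // learn about hotplugged ones, is left out here.)

    // An eventfd lets another thread tell us to shut down the poll loop.
    int shutdown_fd = eventfd(0, EFD_CLOEXEC);

    // poll() on ALSA's descriptors plus our shutdown descriptor.
    int nfds = snd_seq_poll_descriptors_count(seq, POLLIN);
    std::vector<pollfd> fds(nfds + 1);
    snd_seq_poll_descriptors(seq, fds.data(), nfds, POLLIN);
    fds[nfds] = pollfd{shutdown_fd, POLLIN, 0};

    for ( ;; ) {
        if (poll(fds.data(), fds.size(), -1) <= 0) continue;
        if (fds[nfds].revents & POLLIN) break;  // someone asked us to quit

        // Drain *everything* that is pending; poll() will not wake us up
        // again for events that were already queued (the quirk mentioned above).
        snd_seq_event_t *ev;
        while (snd_seq_event_input_pending(seq, /*fetch_sequencer=*/1) > 0) {
            if (snd_seq_event_input(seq, &ev) < 0) break;
            if (ev->type == SND_SEQ_EVENT_CONTROLLER) {
                printf("controller %d = %d\n",
                       ev->data.control.param, ev->data.control.value);
            }
        }
    }
    close(shutdown_fd);
    snd_seq_close(seq);
    return 0;
}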

16 October 2016

Steinar H. Gunderson: backup.sh opensourced

It's been said that backup is a bit like flossing; everybody knows you should do it, but nobody does it. If you want to start flossing, an immediate question is what kind of dental floss to get; and conversely, for backup, which backup software do you want to rely on? I had some criteria: I looked at basically everything that existed in Debian and then some, and all of them failed. But Samfundet had its own script that's basically just a simple wrapper around tar and ssh, which has worked for 15+ years without a hitch (including several restores), so why not use it? All the authors agreed to GPLv2+ licensing, so now it's time for backup.sh to meet the world. It does about the simplest thing you can imagine: ssh to the server and use GNU tar to tar down every filesystem that has the dump bit set in fstab. Every 30 days, it does a full backup; otherwise, it does an incremental backup using GNU tar's incremental mode (which makes sure you will also get information about file deletes). It doesn't do inter-file diffs (so if you have huge files that change only a little bit every day, you'll get blowup), and you can't do single-file restores without basically scanning through all the files; tar isn't random-access. So it doesn't do much fancy, but it works, and it sends you a nice little email every day so you can know your backup went well. (There's also a less frequently used mode where the backed-up server encrypts the backup using GnuPG, so you don't even need to trust the backup server.) It really takes fifteen minutes to set up, so now there's no excuse. :-) Oh, and the only good dental floss is this one. :-)

2 October 2016

Steinar H. Gunderson: SNMP MIB setup

If you just install the snmp package out of the box, you won't get the MIBs, so it's pretty much useless for anything vendor-specific without some setup. I'm sure this is documented somewhere, but I have to figure it out afresh every single time, so this time I'm writing it down; I can't possibly be the only one getting confused. First, install snmp-mibs-downloader from non-free. You'll need to work around bug #839574 to get the Cisco MIBs right:
# cp /usr/share/doc/snmp-mibs-downloader/examples/cisco.conf /etc/snmp-mibs-downloader/
# gzip -cd /usr/share/doc/snmp-mibs-downloader/examples/ciscolist.gz > /etc/snmp-mibs-downloader/ciscolist
Now you can download the Cisco MIBs:
# download-mibs cisco
However, this only downloads them; you will need to modify snmp.conf to actually use them. Comment out the line that says "mibs :", and then add:
mibdirs +/var/lib/snmp/mibs/cisco/
Voila! Now you can use snmpwalk with e.g. -m AIRESPACE-WIRELESS-MIB to get the full range of Cisco WLC objects (and the first time you do so as root or the Debian-snmp user, the MIBs will be indexed in /var/lib/snmp/mib_indexes/.)

25 September 2016

Steinar H. Gunderson: Nageru @ Fyrrom

When Samfundet wanted to make their own Boiler Room spinoff (called Fyrrom, more or less a direct translation), it was a great opportunity to try out the new multitrack code in Nageru. After all, what can go wrong with a pretty much untested and unfinished git branch, right? So we cobbled together a bunch of random equipment from here and there, hooked it up to Nageru, and together with some great work from the people actually pulling together the event, this was the result. Lots of fun. And yes, some bugs were discovered; of course, field testing without followup patches is meaningless (that would either mean you're not actually taking your test experience into account, or that your testing gave no actionable feedback and thus was useless), so they will be fixed in due time for the 1.4.0 release. Edit: Fixed a screenshot link.

16 September 2016

Steinar H. Gunderson: BBR opensourced

This is pretty big stuff for anyone who cares about TCP. Huge congrats to the team at Google.
