Search Results: "sesse"

9 July 2017

Steinar H. Gunderson: Nageru 1.6.1 released

I've released version 1.6.1 of Nageru, my live video mixer. Now that Solskogen is coming up, there's been a lot of activity on the Nageru front, but hopefully everything is actually coming together now. Testing has been good, but we'll see whether it stands up to the battle-hardening of the real world or not. Hopefully I won't be needing any last-minute patches. :-)
Besides the previously promised Prometheus metrics (1.6.1 ships with a rather extensive set, as well as an example Grafana dashboard) and frame queue management improvements, a surprising late addition was that of a new transcoder called Kaeru (following the naming style of Nageru itself, from the Japanese verb kaeru, which means roughly "to replace" or "to exchange"; iKnow! claims it can also mean "convert", but I haven't seen support for this anywhere else).
Normally, when I do streams, I just let Nageru do its thing and send out a single 720p60 stream (occasionally 1080p), usually around 5 Mbit/sec; less than that doesn't really give good enough quality for the high-movement scenarios I'm after. But Solskogen is different in that there's a pretty diverse audience when it comes to networking conditions; even though I have a few mirrors spread around the world (and some JavaScript to automatically pick the fastest one; DNS round-robin is really quite useless here!), not all viewers can sustain such a bitrate. Thus, there's also a 480p variant at around 1 Mbit/sec, and it needs to come from somewhere.
Traditionally, I've been using VLC for this, but streaming is really a niche thing for VLC. I've been told it will be an increased focus for 4.0 now that 3.0 is getting out the door, but over the last few years, there's been a constant trickle of little issues that have been breaking my transcoding pipeline.
My solution for this was to simply never update VLC, but now that I'm up to stretch, this didn't really work anymore, and I'd been toying around with the idea of making a standalone transcoder for a while. (You might ask "why not the ffmpeg(1) command-line client?", but it's a bit too centered around files and not streams; I use it for converting to HLS for iOS devices, but it has a nasty habit of I/O blocking real work, and its HTTP server really isn't meant for production work. I could survive the latter if it supported Metacube and I could feed it into Cubemap, but it doesn't.)
It turned out Nageru had already grown most of the pieces I needed; it had video decoding through FFmpeg, x264 encoding with speed control (so that it automatically picks the best preset the machine can sustain at any given time) and muxing, audio encoding, proper threading everywhere, and a usable HTTP server that could output Metacube. All that was required was to add audio decoding to the FFmpeg input, and then replace the GPU-based mixer and GUI with a very simple driver that just connects the decoders to the encoders. (This means it runs fine on a headless server with no GPU, but it also means you'll get FFmpeg's scaling, which isn't as pretty or fast as Nageru's. I think it's an okay tradeoff.)
All in all, this was only about 250 lines of delta, which pales compared to the ~28000 lines of delta that are between 1.3.1 (used for last Solskogen) and 1.6.1. It only supports a rather limited set of Prometheus metrics, and it has some limitations, but it seems to be stable and deliver pretty good quality. I've denoted it experimental for now, but overall, I'm quite happy with how it turned out, and I'll be using it for Solskogen.
Nageru 1.6.1 is on its way into Debian, but it depends on a new version of Movit which needs to go through the NEW queue (a soname bump), so it might be a few days. In the meantime, I'll be busy preparing for Solskogen. :-)

25 June 2017

Steinar H. Gunderson: Frame queue management in Nageru 1.6.1

Nageru 1.6.1 is on its way, and what was intended to only be a release centered around monitoring improvements (more specifically a full set of native Prometheus metrics) actually ended up getting a fairly substantial change to how Nageru manages its frame queues. To understand what's changing and why, it's useful to first understand the history of Nageru's queue management.
Nageru 1.0.0 started out with a fairly simple scheme, but with some basics that are still relevant today: One of the input cards was deemed the master card, and whenever it delivers a frame, the master clock ticks and an output frame is produced. (There are some subtleties about dropped frames and/or the master card changing frame rates, but I'm going to ignore them, since they're not important to the discussion.) To this end, every card keeps a preallocated frame queue; when a card delivers a frame, it's put into the queue, and when the master clock ticks, it tries picking out one frame from each of the other cards' queues to mix together. Note that "mix" here could be as simple as picking one input and throwing all the other ones away; the queueing algorithm doesn't care, it just feeds all of them to the theme and lets that run whatever GPU code it needs to match the user's preferences.
The only thing that really keeps the queues bounded is that the frames in them are preallocated (in GPU memory), so if one queue gets longer than 16 frames, Nageru starts dropping it. But is 16 the right number? There are two conflicting demands here, ignoring memory usage: avoiding dropped frames and keeping latency low. The 1.0.0 scheme does about as well as one could possibly hope in never dropping frames, but unfortunately, it can be pretty poor at latency. For instance, if your master card runs at 50 Hz and you have a 60 Hz card, the latter will eventually build up a delay of 16 * 16.7 ms = 266.7 ms, which is clearly unacceptable, and rather unneeded.
You could ask the user to specify a queue length, but the user probably doesn't know, and also shouldn't really have to care; more knobs to twiddle are a bad thing, and even more so knobs the user is expected to twiddle. Thus, Nageru 1.2.0 introduced queue autotuning; it keeps a running estimate of how big the queue needs to be to avoid underruns, simply based on experience. If we've been dropping frames on a queue and then there's an underrun, the "safe queue length" is increased by one, and if the queue has been having excess frames for more than a thousand successive master clock ticks, we reduce it by one again. Whenever the queue has more than this safe number, we drop frames.
This was simple, effective and largely fixed the problem. However, when adding metrics, I noticed a peculiar effect: Not all of my devices have equally good clocks. In particular, when setting up for 1080p50, my output card's internal clock (which assumes the role of the master clock when using HDMI/SDI output) seems to tick at about 49.9998 Hz, and my simple home camcorder delivers frames at about 49.9995 Hz. Over the course of an hour, this means it produces one more frame than you should have, which should of course be dropped. Having an SDI setup with synchronized clocks (blackburst/tri-level) would of course fix this problem, but most people are not so lucky with their cameras, not to mention the price of PC graphics cards with SDI outputs!
However, this happens very slowly, which means that for a significant amount of time, the two clocks will very nearly be in sync, and thus racing. Who ticks first is determined largely by luck in the jitter (normal is maybe 1 ms, but occasionally, you'll see delayed delivery of as much as 10 ms), and this means that the "1000 frames" estimate is likely to be thrown off, and the result is hundreds of dropped frames and underruns in that period. Once the clocks have diverged enough again, you're off the hook, but again, this isn't a good place to be.
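The 1.2.0 autotuning rule described above can be sketched roughly as follows. This is only an illustration and not Nageru's actual code: the class name, the choice to treat "excess frames" as "more than one frame queued at a master tick", and the lower bound of one frame are all assumptions made for the example.

```python
class QueueLengthTuner:
    """Tracks a 'safe queue length' for one input card (illustrative only)."""

    def __init__(self, initial_safe_length=1, shrink_after_ticks=1000, max_length=16):
        self.safe_length = initial_safe_length
        self.shrink_after_ticks = shrink_after_ticks
        self.max_length = max_length
        self.dropped_since_underrun = False
        self.excess_ticks = 0

    def on_master_tick(self, queue_length):
        """Call once per master clock tick; returns how many frames to drop."""
        if queue_length == 0:
            # Underrun. If we had been dropping frames, we were too aggressive,
            # so allow the queue to be one frame longer from now on.
            if self.dropped_since_underrun:
                self.safe_length = min(self.safe_length + 1, self.max_length)
            self.dropped_since_underrun = False
            self.excess_ticks = 0
            return 0

        if queue_length > 1:
            # Assumption: more than one queued frame counts as "excess".
            self.excess_ticks += 1
            if self.excess_ticks > self.shrink_after_ticks:
                self.safe_length = max(self.safe_length - 1, 1)
                self.excess_ticks = 0
        else:
            self.excess_ticks = 0

        # Anything beyond the safe length is dropped.
        to_drop = max(queue_length - self.safe_length, 0)
        if to_drop > 0:
            self.dropped_since_underrun = True
        return to_drop
```

A fixed tick counter like this has no notion of how late frames actually arrive, so jitter between two nearly synchronized clocks can push it back and forth between growing and shrinking the queue.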
Thus, Nageru 1.6.1 changes the algorithm around yet again, by incorporating more data to build an explicit jitter model. 1.5.0 was already timestamping each frame to be able to measure end-to-end latency precisely (now also exposed in Prometheus metrics), but from 1.6.1, the timestamps are actually used in the queueing algorithm. I ran several eight- to twelve-hour tests and simply stored all the event arrivals to a file, and then simulated a few different algorithms (including the old algorithm) to see how they fared in measures such as latency and number of drops/underruns.
I won't go into the full details of the new queueing algorithm (see the commit if you're interested), but the gist is: Based on the last 5000 frames, it tries to estimate the maximum possible jitter for each input (i.e., how late the frame could possibly be). Based on this as well as clock offsets, it determines whether it's really sure that there will be an input frame available on the next master tick even if it drops the queue, and then trims the queue to fit.
The result is pretty satisfying; here's the end-to-end latency of my camera being sent through to the SDI output:
[Graph: end-to-end latency of the camera through to the SDI output]
As you can see, the latency goes up, up, up until Nageru figures it's now safe to drop a frame, and then does it in one clean drop event; no more hundreds of drops involved. There are very late frame arrivals involved in this run (two extra frame drops, to be precise), but the algorithm simply determines immediately that they are outliers, and drops them without letting them linger in the queue. (Immediate dropping is usually preferred to sticking around for a bit and then dropping it later, as it means you only get one disturbance event in your stream as opposed to two. Of course, you can only do it if you're reasonably sure it won't lead to more underruns later.)
Nageru 1.6.1 will ship before Solskogen, as I intend to run it there :-) And there will probably be lovely premade Grafana dashboards from the Prometheus data.
Although it would have been a lot nicer if Grafana were more packaging-friendly, so I could pick it up from stock Debian and run it on armhf. Hrmf. :-)
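To make the gist of the jitter model a bit more concrete, here is a heavily simplified sketch. It is not the algorithm from the commit: the names (JitterHistory, frames_safe_to_drop), the way lateness is measured against an expected arrival time, and the reduction of "clock offsets" to two precomputed timestamps are all assumptions made for illustration.

```python
from collections import deque


class JitterHistory:
    """Remembers how late recent frames were, over a sliding window."""

    def __init__(self, window=5000):
        # Old entries fall out automatically once the window is full.
        self.lateness = deque(maxlen=window)

    def record(self, expected_time, arrival_time):
        # Early arrivals count as zero lateness.
        self.lateness.append(max(arrival_time - expected_time, 0.0))

    def max_jitter(self):
        # Worst lateness seen in the window, in seconds.
        return max(self.lateness, default=0.0)


def frames_safe_to_drop(queue_length, next_frame_expected, next_tick_earliest, max_jitter):
    """How many queued frames can go without risking an underrun.

    If even the latest plausible arrival of the next input frame comes before
    the earliest possible master tick, the whole queue can be dropped;
    otherwise one frame is kept in reserve.
    """
    if queue_length == 0:
        return 0
    latest_arrival = next_frame_expected + max_jitter
    if latest_arrival < next_tick_earliest:
        return queue_length
    return queue_length - 1
```

The point of the sketch is the decision rule: frames are only dropped when the worst case seen recently still leaves a fresh frame for the next tick.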

18 June 2017

Eriberto Mota: How to migrate from Debian Jessie to Stretch

Welcome to Debian Stretch! Yesterday, 17 June 2017, Debian 9 (Stretch) was released. I would like to talk about some basic procedures and rules for migrating from Debian 8 (Jessie).
Initial steps
# apt-get update
# apt-get dist-upgrade
Migrating
deb http://ftp.br.debian.org/debian/ stretch main
deb-src http://ftp.br.debian.org/debian/ stretch main
   
deb http://security.debian.org/ stretch/updates main
deb-src http://security.debian.org/ stretch/updates main
# apt-get update
# apt-get dist-upgrade
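The listing above shows the sources already pointing at stretch. Assuming the edit amounts to swapping the suite name in /etc/apt/sources.list (the post's own wording for this step is not preserved here), the change could be sketched like this. The function name retarget_sources is hypothetical, and the sketch works on text only, so nothing is written to disk.

```python
import re


def retarget_sources(text, old="jessie", new="stretch"):
    """Return sources.list text with the suite name swapped on deb/deb-src lines.

    Comment lines and anything else are left untouched.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    out = []
    for line in text.splitlines():
        if line.lstrip().startswith(("deb ", "deb-src ")):
            line = pattern.sub(new, line)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "deb http://ftp.br.debian.org/debian/ jessie main\n"
        "deb http://security.debian.org/ jessie/updates main\n"
    )
    print(retarget_sources(sample), end="")
```

Reviewing the result by hand before running apt-get update is still a good idea, especially if third-party repositories are listed.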
If there is a problem, read the error messages and try to resolve it. Whether or not you manage to resolve it, run the command again:
# apt-get dist-upgrade
If new problems appear, try to resolve them. Search for solutions on Google if necessary. Usually, though, everything will work out and you should not have any problems.
Changes to configuration files
During the migration, some messages about changes to configuration files may be shown. This can leave some users lost, not knowing what to do. Don't panic. These messages are presented in one of two ways: as plain text in the shell or in a blue message window. The following text is an example of a shell message:
Configuration file '/etc/rsyslog.conf'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
 What would you like to do about it? Your options are:
 Y or I : install the package maintainer's version
 N or O : keep your currently-installed version
 D : show the differences between the versions
 Z : start a shell to examine the situation
 The default action is to keep your current version.
*** rsyslog.conf (Y/I/N/O/D/Z) [default=N] ?
The following screen is an example of a message shown in a window:
[Screenshot: configuration file prompt for /etc/samba/smb.conf]
In both cases, it is advisable to choose to install the new version of the configuration file. This is because the new configuration file will be fully adapted to the newly installed services and may have many new or different options. But don't worry, your settings will not be lost; a backup of them will be kept. So, in the shell, choose option "Y" and, in the window, choose the option "install the package maintainer's version". It is very important to note down the name of each modified file. In the case of the window above, the file is /etc/samba/smb.conf. In the case of the shell, the file was /etc/rsyslog.conf.
After completing the migration, you can look at both the new configuration file and the original. If the new file was installed after a choice made in the shell, the original file (the one you had before) will have the same name with the extension .dpkg-old. If the choice was made in a window, the file will be kept with the extension .ucf-old. In both cases, you can see the changes made and reconfigure your new file as needed.
If you need help seeing the differences between the files, you can use the diff command to compare them. Always run diff from the new file to the original. It is as if you wanted to see what to do to the new file to make it match the original. Example:
# diff -Naur /etc/rsyslog.conf /etc/rsyslog.conf.dpkg-old
At first glance, the lines marked with "+" would have to be added to the new file to make it look like the previous one, and the lines marked with "-" would have to be removed. But be careful: it is normal for some lines to differ, because the configuration file was written for a new version of the service or application it belongs to. So change only the lines that are really necessary and that you had changed in the previous file. See the example:
+daemon.*;mail.*;\
+ news.err;\
+ *.=debug;*.=info;\
+ *.=notice;*.=warn  /dev/xconsole
+*.* @sam
In my case, I had originally changed only the last line. So, in the new configuration file, I am only interested in adding that line. Well, if you were the one who made the previous configuration, you will know how to do the right thing. Usually, there will not be many differences between the files. Another option for viewing the differences between files is the mcdiff command, which is provided by the mc package. Example:
# mcdiff /etc/rsyslog.conf /etc/rsyslog.conf.dpkg-old
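If many configuration files were replaced, going through them one at a time gets tedious. As a rough sketch (the helper names are made up for this example, and scanning /etc is an assumption about where the leftovers live), the .dpkg-old and .ucf-old backups can be collected and diffed with Python's standard difflib module, always from the new file to the original as recommended above:

```python
import difflib
import os

BACKUP_SUFFIXES = (".dpkg-old", ".ucf-old")


def find_config_backups(root="/etc"):
    """Yield (new_path, old_path) pairs for leftover backups under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            for suffix in BACKUP_SUFFIXES:
                if name.endswith(suffix):
                    old_path = os.path.join(dirpath, name)
                    new_path = old_path[: -len(suffix)]
                    # Only report pairs where the new file is still present.
                    if os.path.isfile(new_path):
                        yield new_path, old_path


def diff_new_to_old(new_path, old_path):
    """Unified diff from the new file to the original, as one string."""
    with open(new_path, errors="replace") as f:
        new_lines = f.readlines()
    with open(old_path, errors="replace") as f:
        old_lines = f.readlines()
    return "".join(
        difflib.unified_diff(new_lines, old_lines, fromfile=new_path, tofile=old_path)
    )


if __name__ == "__main__":
    for new_path, old_path in find_config_backups():
        print(diff_new_to_old(new_path, old_path))
```

As with the diff command, lines marked "+" are those present only in the original file. Reading everything under /etc may require root.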
Problems with graphical environments and applications
You may have some problems with graphical environments such as GNOME, KDE etc., or with applications such as Mozilla Firefox. In these cases, the problem is probably the configuration files of those programs in the user's home directory. To check, create a new user in Debian and test with it. If everything works, back up the previous configuration (or rename it) and let the application create a new one. For example, for Mozilla Firefox, go to the user's home directory and, with Firefox closed, rename the .mozilla directory to .mozilla.bak, then start Firefox and test.
Feeling unsure?
If you feel very unsure, install Debian 8 with a graphical environment and other things in a virtual machine and migrate it to Debian 9 to test and learn. I suggest VirtualBox as the virtualizer. Have fun!

30 May 2017

Steinar H. Gunderson: Nageru 1.6.0 released

I've released version 1.6.0 of Nageru, my live video mixer, together with dependent libraries Movit 1.5.1 and bmusb 0.7.0. The primary new feature this time is integration with CasparCG, the dominating open-source broadcast graphics system, which opens up a whole new world of possibilities for intelligent overlay graphics. (Actually, the feature is a bit more generic than that; any FFmpeg file or stream will do as input. Audio isn't supported yet, though.) You can see a simple HTML5 CasparCG setup in the ultimate tournament stream test we did in April, in preparation for a larger event in September; CasparCG generates a stream with alpha, which is then fed into Nageru and used on top of the three camera sources. Apart from that, there's a new frame analyzer that helps with calibrating your signal chain; there are lots of devices that will happily mess with your signal, and measuring is the first step in counteracting that. (There are also a few input interpretation tweaks that will help with the most common issues.) 1.6.0 is on its way to Debian experimental, along with its dependencies (stretch will release with Nageru 1.4.2); there are likely to be backports when stretch releases and the backport queue opens up.

26 May 2017

Steinar H. Gunderson: Last minute stretch bugs

Over the last week, I found no fewer than three pet bugs that I hope will be allowed to go in before the stretch release. I promise, none of these were found late because I upgraded to stretch too late; it was just a perfect storm. :-)

27 April 2017

Steinar H. Gunderson: Chinese HDMI-to-SDI converters, part II

Following up on my previous post, I have only a few small nuggets of extra information:
The power draw appears to be 230-240 mA at 5 V, or about 1.15 W. (It doesn't matter whether they have a signal or not.) This means you can power them off of a regular USB power bank; in a recent test, we used Deltaco 20000 mAh (at 3.7 V) power banks, which are supposed to power a GoPro plus such a converter for somewhere between 8 and 12 hours (we haven't measured exactly). It worked brilliantly, and solved a headache of how to get AC to the camera and converter; just slap on a power bank instead, and all you need to run is SDI. Curiously enough, 230 mA is so little that the power bank thinks it doesn't count as load, and just turns itself off after ten seconds or so. However, with the GoPro also active, it stays on all the time. At least it ran for the two hours that we needed without a hitch.
The SDI converters don't accept USB directly, but you can purchase dumb USB-to-5.5x2.1mm converters cheap from wherever, which work fine even though USB is supposed to give you only 100 mA without handshaking. Some eBay sellers seem to include them with the converters, even. I guess the power bank just doesn't care; it's spec-ed to 2.1 A on two ports and 1 A on the last one. Update: Sebastian Reichel pointed out that USB 2.0 ups the limit to 500 mA, so you should be within spec in any case. :-)
And here's a picture of the entire contraption as a bonus:
[Picture: Power bank and HDMI-to-SDI converter]
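For a rough feel of the numbers, the runtime of a power bank can be estimated from its capacity and the load. This is a back-of-the-envelope sketch only: the function name is made up, and the 85% conversion efficiency is an assumed figure, not something measured in the post. The GoPro's draw is not given, so only the converter alone is computed.

```python
def runtime_hours(capacity_mah, cell_voltage, load_watts, efficiency=0.85):
    """Estimated runtime in hours for a given load.

    capacity_mah is rated at cell_voltage (3.7 V for the power banks above);
    efficiency is an assumed loss factor for the step-up conversion to 5 V.
    """
    energy_wh = capacity_mah / 1000.0 * cell_voltage
    return energy_wh * efficiency / load_watts


if __name__ == "__main__":
    converter_watts = 0.230 * 5.0  # 230 mA at 5 V, i.e. 1.15 W
    hours = runtime_hours(20000, 3.7, converter_watts)
    print(f"Converter alone: about {hours:.0f} hours")
```

Since the converter alone would last for days, the quoted 8 to 12 hours for a GoPro plus converter would be set almost entirely by the camera's draw.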

18 April 2017

Steinar H. Gunderson: Chinese HDMI-to-SDI converters

I often need to convert signals from HDMI to SDI (and occasionally back). This requires a box of some sort, and eBay obliges; there's a bunch of different sellers of the same devices, selling them for around $20-25. They don't seem to have a brand name, but they are invariably sold as 3G-SDI converters (meaning they should go up to 1080p60) and look like this:
[Picture: HDMI-to-SDI converter]
There are also corresponding SDI-to-HDMI converters that look pretty much the same except they convert the other way. (They're easy to confuse, but that's not a problem unique to them.) I've used them for a while now, and there are pros and cons. They seem reliable enough, and they're 1/4th the price of e.g. Blackmagic's Micro converters, which is a real bargain. However, there are also some issues. The last issue is by far the worst, but it only affects 3G-SDI resolutions. 720p60, 1080p30 and 1080i60 all work fine. And to be fair, not even Blackmagic's own converters actually send 352M correctly most of the time.
I wish there were a way I could publish this somewhere people would actually read it before buying these things, but without a name, it's hard for people to find it. They're great value for money, and I wouldn't hesitate to recommend them for almost all use; but then, there's that "almost". :-)

5 April 2017

Steinar H. Gunderson: Nageru 1.5.0 released

I just released version 1.5.0 of Nageru, my live video mixer. The biggest feature is obviously the HDMI/SDI live output, but there are lots of small nuggets everywhere; it's been four months in the making. I'll simply paste the NEWS entry here:
Nageru 1.5.0, April 5th, 2017
  - Support for low-latency HDMI/SDI output in addition to (or instead of) the
    stream. This currently only works with DeckLink cards, not bmusb. See the
    manual for more information.
  - Support changing the resolution from the command line, instead of locking
    everything to 1280x720.
  - The A/V sync code has been rewritten to be more in line with Fons
    Adriaensen's original paper. It handles several cases much better,
    in particular when trying to match 59.94 and 60 Hz sources to each other.
    However, it might occasionally need a few extra seconds on startup to
    lock properly if startup is slow.
  - Add support for using x264 for the disk recording. This makes it possible,
    among other things, to run Nageru on a machine entirely without VA-API
    support.
  - Support for 10-bit Y'CbCr, both on input and output. (Output requires
    x264 disk recording, as Quick Sync Video does not support 10-bit H.264.)
    This requires compute shader support, and is in general a little bit
    slower on input and output, due to the extra amount of data being shuffled
    around. Intermediate precision is 16-bit floating-point or better,
    as before.
  - Enable input mode autodetection for DeckLink cards that support it.
    (bmusb mode has always been autodetected.)
  - Add functionality to add a time code to the stream; useful for debugging
    latency.
  - The live display is now both more performant and of higher image quality.
  - Fix a long-standing issue where the preview displays would be too bright
    when using an NVIDIA GPU. (This did not affect the finished stream.)
  - Many other bugfixes and small improvements.
1.5.0 is on its way into Debian experimental (it's too late for the stretch release, especially as it also depends on Movit and bmusb from experimental), or you can get it from the home page as always.

4 April 2017

Matthias Klumpp: On Tanglu

It's time for a long-overdue blogpost about the status of Tanglu. Tanglu is a Debian derivative, started in early 2013 when the systemd debate at Debian was still hot. It was formed by a few people wanting to create a Debian derivative for workstations with a time-based release schedule, using and showcasing new technologies (which include systemd, but also bundling systems and other things), and built in the open with a community, using infrastructure similar to Debian's. Tanglu is designed explicitly to complement Debian and not to compete with it on all devices.
Tanglu has achieved a lot of great things. We were the first Debian derivative to adopt systemd, and with the help of our contributors we could kill a few nasty issues affecting it and Debian before it ended up becoming the default in Debian Jessie. We also started to use the Calamares installer relatively early, bringing a modern installation experience in addition to the traditional debian-installer. We performed the usrmerge early, uncovering a few more issues which were fed back into Debian to be resolved (while workarounds were added to Tanglu). We also briefly explored switching from initramfs-tools to Dracut, but this release goal was dropped due to issues (but might be revived later). A lot of other less-impactful changes happened as well, borrowing a lot of useful ideas and code from Ubuntu (kudos to them!).
On the infrastructure side, we set up the Debian Archive Kit (dak), managing to find a couple of issues (mostly hardcoded assumptions about Debian) and reporting them back to make using dak easier for distributions which aren't Debian. We explored using fedmsg for our infrastructure, went through a long and painful iteration of build systems (buildbot -> Jenkins -> Debile) before finally ending up with Debile, and added a set of our own custom tools to collect archive QA information and present it to our developers in an easy-to-digest way.
Except for wanna-build, Tanglu is hosting an almost-complete clone of the basic Debian archive management tools.
During the past year, however, the project's progress slowed down significantly. For this, mostly I am to blame. One of the biggest challenges for a young project is to attract new developers and members and keep them engaged. A lot of the people coming to Tanglu and being interested in contributing were unfortunately not packagers, and sometimes not developers, and we didn't have the manpower to individually mentor these people and teach them the necessary skills. People asking for tasks were usually asked where their interests were and what they would like to do, in order to give them a useful task. This sounds great in principle, but in practice it is actually not very helpful. A curated list of "junior jobs" is a much better starting point. We also invested almost zero time in making our project known and creating the necessary buzz and excitement that's actually needed to sustain a project like this. Doing more in the "advertisement" domain and "help newcomers" area is a high-priority issue in the Tanglu bugtracker, which to this day is still open. Doing good alone isn't enough; talking about it is of crucial importance, and that is something I knew about, but didn't realize the impact of for quite a while. As strange as it sounds, investing in the tech only isn't enough; community building is of equal importance.
Regardless of that, Tanglu has members working on the project, but way too few to manage a project of this magnitude (getting package transitions migrated alone is a large task requiring quite some time while at the same time being incredibly boring :P). A lot of our current developers can only invest small amounts of time into the project because they have a lot of other projects as well. The other reason why Tanglu has problems is too much stuff being centralized on myself.
That is a problem I wanted to rectify for a long time, but as soon as a task wasn't done in Tanglu because no people were available to do it, I completed it. This essentially increased the project's dependency on me as a single person, giving it a really low bus factor. It not only centralizes power in one person (which actually isn't a problem as long as that person is available enough to perform tasks if asked for), it also centralizes knowledge on how to run services and how to do things. And if you want to give up power, people will need the knowledge on how to perform the specific task first (which they will never gain if there's always that one guy doing it). I still haven't found a great way to solve this; it's a problem that essentially kills itself as soon as the project is big enough, but until then the only way to counter it slightly is to write lots of documentation.
Last year I had way less time to work on Tanglu than the project deserves. I also started to work for Purism on their PureOS Debian derivative (which is heavily influenced by some of the choices we made for Tanglu, but with a different focus; that's probably something for another blogpost). A lot of the stuff I do for Purism duplicates the work I do on Tanglu, and also takes away time I have for the project. Additionally, I need to invest a lot more time into other projects such as AppStream and a lot of random other stuff that just needs continuous maintenance and discussion (especially AppStream eats up a lot of time since it became really popular in a lot of places). There is also my MSc thesis in neuroscience that requires attention (and is actually in focus most of the time). All in all, I can't split myself, and KDE's cloning machine remains broken, so I can't even use that ;-).
In terms of projects there is also a personal hard limit on how much stuff I can handle, and exceeding it long-term is not very healthy, as in these cases I try to satisfy all projects and in the end do not focus enough on any of them, which makes me end up with a lot of half-baked stuff (which helps nobody, and most importantly makes me lose the fun, energy and interest to work on it).
Good news everyone! (sort of)
So, this sounded overly negative, so where does this leave Tanglu? Fact is, I cannot commit the crazy amounts of time to it that I did in 2013. But I love the project, and I actually do have some time I can put into it. My work at Purism has an overlap with Tanglu, so Tanglu can actually benefit from the software I develop for them, maybe creating a synergy effect between PureOS and Tanglu. Tanglu is also important to me as a testing environment for future ideas (be it in infrastructure or in the "make bundling nice!" department).
So, what actually is the way forward? First, maybe I have the chance to find a few people willing to work on tasks in Tanglu. It's a fun project, and I learned a lot while working on it. Tanglu also possesses some unique properties few other Debian derivatives have, like being built from source completely (allowing us things like swapping core components or compiling with more hardening flags, switching to newer KDE Plasma and GNOME faster, etc.). Second, if we do not have enough manpower, I think converting Tanglu into a rolling-release distribution might be the only viable way to keep the project running. A rolling-release scheme creates much less effort for us than making releases (especially time-based ones!). That way, users will have a constantly updated and secure Tanglu system, with machines doing most of the background work.
If it turns out that absolutely nothing works and we can't attract new people to help with Tanglu, it would mean that there generally isn't much interest from the developer or user side in a project like this, so shutting it down or scaling it down dramatically would be the only option. But I do not think that this is the case, and I believe that having Tanglu around is important. I also have some interesting plans for it which will be fun to implement for testing. The only thing that had to stop is leaving our users in the dark about what is happening.
Sorry for the long post, but there are some subjects which are worth writing more than 140 characters about. If you are interested in contributing to Tanglu, get in touch with us! We have an IRC channel, #tanglu-devel on Freenode (go there for quicker responses!), as well as forums and mailing lists. It looks like I will be at DebConf this year as well, so you can also catch me there! I might even talk about PureOS/Tanglu infrastructure at the conference.

21 March 2017

Steinar H. Gunderson: 10-bit H.264 support

Following my previous tests about 10-bit H.264, I did some more practical tests; since media.xiph.org is up again, I did some tests with actual 10-bit input. The results were pretty similar, although of course 4K 60 fps organic content is going to be different at times from the partially rendered 1080p 24 fps clip I used. But I also tested browser support, with good help from people on IRC. It was every bit as bad as I feared:
  - Chrome on desktop (Windows, Linux, macOS) supports 10-bit H.264, although of course without hardware acceleration. Chrome on Android does not.
  - Firefox does not (it tries on macOS, but plays back buggy).
  - iOS does not.
  - VLC does; I didn't try a lot of media players, but obviously ffmpeg-based players should do quite well.
  - I haven't tried Chromecast, but I doubt it works.
So I guess that yes, it really is 8-bit H.264 or 10-bit HEVC, but I haven't tested the latter yet either :-)

9 March 2017

Steinar H. Gunderson: Tired

To be honest, at this stage I'd actually prefer ads in Wikipedia to having ever more intrusive begging for donations. Please go away soon.

27 February 2017

Steinar H. Gunderson: 10-bit H.264 tests

Following the post about 10-bit Y'CbCr earlier this week, I thought I'd make an actual test of 10-bit H.264 compression for live streaming. The basic question is: sure, it's better-per-bit, but it's also slower, so is it better-per-MHz? This is largely inspired by Ronald Bultje's post about streaming performance, where he largely showed that HEVC is currently useless for live streaming from software; unless you can encode at x264's "veryslow" preset (which, at 720p60, means basically rather simple content and 20 cores or so), the best x265 presets you can afford will give you worse quality than the best x264 presets you can afford. My results will maybe not be as scientific, but hopefully still enlightening.
I used the same test clip as Ronald, namely a two-minute clip of Tears of Steel. Note that this is an 8-bit input, so we're not testing the effects of 10-bit input; it's just testing the increased internal precision in the codec. Since my focus is practical streaming, I ran the latest version of x264 at four threads (a typical desktop machine), using one-pass encoding at 4000 kbit/sec. Nageru's speed control has 26 presets to choose from, which gives pretty smooth steps between neighboring ones, but I've been sticking to the ten standard x264 presets (ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, placebo). Here's the graph:
[Graph: SSIM (dB) against encoding time in seconds, 8-bit versus 10-bit]
The x-axis is seconds used for the encode (note the logarithmic scale; placebo takes 200-250 times as long as ultrafast). The y-axis is SSIM dB, so up and to the left is better. The blue line is 8-bit, and the red line is 10-bit. (I ran most encodes five times and averaged the results, but it doesn't really matter, due to the logarithmic scale.) The results are actually much stronger than I assumed; if you run on (8-bit) ultrafast or superfast, you should stay with 8-bit, but from there on, 10-bit is on the Pareto frontier.
Actually, 10-bit veryfast (18.187 dB) is better than 8-bit medium (18.111 dB), while being four times as fast! But not all of us have a relation to dB quality, so I chose to also do a test that maybe is a bit more intuitive, centered around bitrate needed for constant quality. I locked quality to 18 dB, i.e., for each preset, I adjusted the bitrate until the SSIM showed 18.000 dB plus/minus 0.001 dB. (Note that this means faster presets get less of a speed advantage, because they need higher bitrate, which means more time spent entropy coding.) Then I measured the encoding time (again five times) and graphed the results: x-axis is again seconds, and y-axis is bitrate needed in kbit/sec, so lower and to the left is better. Blue is again 8-bit and red is again 10-bit. If the previous graph was enough to make me intrigued, this is enough to make me excited. In general, 10-bit gives 20-30% lower bitrate for the same quality and CPU usage! (Compare this with the supposed "up to 50%" benefits of HEVC over H.264, given infinite CPU usage.) The most dramatic example is when comparing the medium presets directly, where 10-bit runs at 2648 kbit/sec versus 3715 kbit/sec (29% lower bitrate!) and is only 5% slower. As one progresses towards the slower presets, the gap is somewhat narrowed (placebo is 27% slower and only 24% lower bitrate), but in the realistic middle range, the difference is quite marked. If you run 3 Mbit/sec at 10-bit, you get the quality of 4 Mbit/sec at 8-bit. So is 10-bit H.264 a no-brainer? Unfortunately, no; the client hardware support is nearly nil. Not even Skylake, which can do 10-bit HEVC encoding in hardware (and 10-bit VP9 decoding), can do 10-bit H.264 decoding in hardware. Worse still, mobile chipsets generally don't support it. There are rumors that iPhone 6s supports it, but these are unconfirmed; some Android chips support it, but most don't. 
I guess this explains a lot of the limited uptake; since it's in some ways a new codec, implementers are more keen to get the full benefits of HEVC instead (even though the licensing situation is really icky). The only ones I know that have really picked it up as a distribution format is the anime scene, and they're feeling quite specific pains due to unique content (large gradients giving pronounced banding in undithered 8-bit). So, 10-bit H.264: It's awesome, but you can't have it. Sorry :-)
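The constant-quality test above (adjust the bitrate until SSIM hits 18.000 dB plus/minus 0.001 dB) can be sketched as a simple bisection. This is only an illustration, not the script used for the graphs: `measure_db` is a hypothetical callback that would run one encode at the given bitrate and return its SSIM in dB, and the sketch assumes quality rises monotonically with bitrate. The dB conversion is, to my knowledge, the one x264 uses when reporting SSIM.

```python
import math
from typing import Callable


def ssim_to_db(ssim: float) -> float:
    """Convert a raw SSIM score in [0, 1) to decibels: -10 * log10(1 - SSIM)."""
    return -10.0 * math.log10(1.0 - ssim)


def bitrate_for_quality(
    measure_db: Callable[[float], float],  # hypothetical: encode at kbit/sec, return SSIM dB
    target_db: float,
    lo_kbps: float,
    hi_kbps: float,
    tolerance_db: float = 0.001,
    max_steps: int = 60,
) -> float:
    """Bisect on bitrate until the measured quality is within tolerance of the target.

    Assumes measure_db is monotonically increasing in bitrate and that the
    target lies between the qualities at lo_kbps and hi_kbps.
    """
    for _ in range(max_steps):
        mid = (lo_kbps + hi_kbps) / 2.0
        db = measure_db(mid)
        if abs(db - target_db) <= tolerance_db:
            return mid
        if db < target_db:
            lo_kbps = mid  # quality too low: need more bits
        else:
            hi_kbps = mid  # quality too high: can spend fewer bits
    raise RuntimeError("did not converge; check the bitrate bracket")


def bitrate_saving(new_kbps: float, old_kbps: float) -> float:
    """Fractional bitrate reduction of new_kbps relative to old_kbps."""
    return 1.0 - new_kbps / old_kbps


def fake_encoder(kbps: float) -> float:
    """Made-up monotonic quality curve, standing in for a real encode."""
    return 6.0 * math.log10(kbps) - 3.0


found_kbps = bitrate_for_quality(fake_encoder, 18.0, 500.0, 8000.0)
medium_saving = bitrate_saving(2648.0, 3715.0)  # the medium-preset numbers from the text
```

With a real encoder behind `measure_db`, every bisection step costs one full encode, so in practice you would want a tight initial bracket.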

23 February 2017

Steinar H. Gunderson: Fyrrom recording released

The recording of yesterday's Fyrrom (Samfundet's unofficial take on Boiler Room) is now available on YouTube. Five video inputs, four hours, two DJs, no dropped frames. Good times. Soundcloud coming soon!

21 February 2017

Steinar H. Gunderson: 8-bit Y'CbCr ought to be enough for anyone?

If you take a random computer today, it's pretty much a given that it runs a 24-bit mode (8 bits of each of R, G and B); as we moved from palettized displays at some point during the 90s, we quickly went past 15- and 16-bit and settled on 24-bit. The reasons are simple; 8 bits per channel is easy to work with on CPUs, and it's on the verge of what human vision can distinguish, at least if you add some dither. As we've been slowly taking the CPU off the pixel path and replacing it with GPUs (which have specialized hardware for more kinds of pixel formats), changing formats has become easier, and there's some push to 10-bit (30-bit) "deep color" for photo pros, but largely, 8-bit per channel is where we are. Yet, I'm now spending time adding 10-bit input (and eventually also 10-bit output) to Nageru. Why? The reason is simple: Y'CbCr. Video traditionally isn't done in RGB, but in Y'CbCr; that is, a black-and-white signal (Y') and then two color-difference signals (Cb and Cr, roughly "additional blueness" and "additional redness", respectively). We started doing this because it was convenient in analog TV (if you separate the two, black-and-white TVs can just ignore the color signal), but we kept doing it because it's very nice for reducing bandwidth: Human vision is much less sensitive to color than to brightness, so we can transfer the color channels in lower resolution and get away with it. (Also, a typical Bayer sensor can't deliver full color resolution anyway.) So most cameras and video codecs work in Y'CbCr, not RGB. Let's look at the implications of using 8-bit Y'CbCr, using a highly simplified model for, well, simplicity. Let's define Y = 1/3 (R + G + B), Cr = R - Y and Cb = B - Y. (The reverse transformation becomes R = Y + Cr, B = Y + Cb and G = 3Y - R - B.) This means that an RGB color such as pure gray ([127, 127, 127]) becomes [127, 0, 0]. All is good, and Y can go from 0 to 255, just like R, G and B can. 
A pure red ([255, 0, 0]) becomes [85, 170, -85] (writing the triplets in the order Y, Cr, Cb), and a pure blue ([0, 0, 255]) correspondingly becomes [85, -85, 170]. So Cr and Cb can be negative, and even more so for other colors; a pure cyan ([0, 255, 255]) becomes [170, -170, 85], for instance. So we need to squeeze values from -170 to +170 into an 8-bit range, losing accuracy. Even worse, there are valid Y'CbCr triplets that don't correspond to meaningful RGB colors at all. For instance, Y'CbCr [255, 170, 0] would be RGB [425, 85, 255]; R is out of range! And Y'CbCr [255, -170, 0] would be RGB [85, 425, 255]; now G is out of range instead. This isn't a problem for compression, as we can just avoid using those illegal colors with no loss of efficiency. But it means that the conversion in itself causes a loss; actually, if you do the maths on the real formulas (using the BT.601 standard), it turns out only 17% of the 24-bit Y'CbCr code words are valid! In other words, we lose about two and a half bits of data, and our 24 bits of accuracy have been reduced to 21.5. Or, to put it another way; 8-bit Y'CbCr is roughly equivalent to 7-bit RGB. Thus, pretty much all professional video uses 10-bit Y'CbCr. It's much more annoying to deal with (especially when you've got subsampling!), but if you're using SDI, there's not even any 8-bit version defined, so if you insist on 8-bit, you're taking data you're getting on the wire (whether you want it or not) and throwing 20% of it away. UHDTV standards (using HEVC) are also simply not defined for 8-bit; it's 10- and 12-bit only, even on the codec level. Part of this is because UHDTV also supports HDR, so you have a wider RGB range than usual to begin with, and 8-bit would cause excessive banding. Using it on the codec level makes a lot of sense for another reason, namely that you reduce internal roundoff errors during processing by a lot; errors equal noise, and noise is bad for compression. 
I've seen numbers of 15% lower bitrate for H.264 at the same quality, although you also have to take into account that the encoder also needs more CPU power that you could have used for a higher preset in 8-bit. I don't know how the tradeoff here works out, and you also have to take into account decoder support for 10-bit, especially when it comes to hardware. (When it comes to HEVC, Intel didn't get full fixed-function 10-bit support before Kaby Lake!) So indeed, 10-bit Y'CbCr makes sense even for quite normal video. It isn't a no-brainer to turn it on, though; even though Nageru uses a compute shader to convert the 4:2:2 10-bit Y'CbCr to something the GPU can sample from quickly (i.e., the CPU doesn't need to touch it), and all internal processing is in 16-bit floating point anyway, it still takes a nonzero amount of time to convert compared to just blasting through 8-bit, so my ultraportable probably won't make it anymore. (A discrete GPU has no issues at all, of course. My laptop converts a 720p frame in about 1.4 ms, FWIW.) But it's worth considering when you want to squeeze even more quality out of the system. And of course, there's still 10-bit output support to be written...
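For those who want to play with the simplified model from this post, here is a minimal Python sketch. It implements only the toy formulas given above (Y = (R + G + B)/3, Cr = R - Y, Cb = B - Y), not the real BT.601 matrix; the function names are made up for illustration, and triplets are ordered (Y, Cr, Cb). The last helper just turns the quoted "17% of code words are valid" figure into bits.

```python
import math


def rgb_to_ycc(r: float, g: float, b: float) -> tuple:
    """Toy forward transform from the text; returns (Y, Cr, Cb)."""
    y = (r + g + b) / 3
    return (y, r - y, b - y)


def ycc_to_rgb(y: float, cr: float, cb: float) -> tuple:
    """Toy reverse transform from the text; returns (R, G, B)."""
    r = y + cr
    b = y + cb
    g = 3 * y - r - b
    return (r, g, b)


def is_valid_rgb(rgb: tuple) -> bool:
    """True if every channel fits in the 8-bit range 0..255."""
    return all(0 <= c <= 255 for c in rgb)


def bits_lost(valid_fraction: float) -> float:
    """Bits of accuracy lost when only this fraction of code words is usable."""
    return -math.log2(valid_fraction)


gray = rgb_to_ycc(127, 127, 127)   # no color difference at all
cyan = rgb_to_ycc(0, 255, 255)     # strongly negative Cr
bogus = ycc_to_rgb(255, 170, 0)    # a triplet with no meaningful RGB equivalent
lost = bits_lost(0.17)             # about two and a half bits
```

Running `is_valid_rgb` over every possible triplet would give the fraction of usable code words for the toy model; the 17% figure in the text is for the real BT.601 formulas, which this sketch does not implement.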

2 February 2017

Steinar H. Gunderson: Not going to FOSDEM but a year of Nageru

It's that time of the year :-) And FOSDEM is fun. But this year I won't be going; there was a scheduling conflict, and I didn't really have anything new to present (although I probably could have shifted around priorities to get something). But FOSDEM 2017 also means there's a year since FOSDEM 2016, where I presented Nageru, my live video mixer. And that's been a pretty busy year, so I thought I'd do a live cap from high up above. First of all, Nageru has actually been used in production; we did Solskogen and Fyrrom, and both gave invaluable input. Then there have been some non-public events, which have also been useful. The Nageru that's in git right now has evolved considerably from the 1.0.0 that was released last year. diffstat shows 19660 insertions and 3543 deletions; that's counting about 2500 lines of vendored headers, though. Even though I like deleting code much more than adding it, the doubling (from ~10k to ~20k lines) represents a significant amount of new features: 1.1.x added support for non-Intel GPUs. 1.2.x added support for DeckLink input cards (through Blackmagic's proprietary drivers), greatly increasing hardware support, and did a bunch of small UI changes. 1.3.x added x264 support that's strong enough that Nageru has really displaced VLC as my go-to tool for just video-signal-to-H.264-conversion (even though it feels overkill), and also added hotplug support. 1.4.x added multichannel audio support including support for MIDI controllers, and also a disk space indicator (because when you run out of disk during production without understanding that's what happens, it really sucks), and brought extensive end-user documentation. And 1.5.x, in development right now, will add HDMI/SDI output, which, like all the previous changes, requires various rearchitecting and fixing. 
Of course, there are lots of things that haven't changed as well; the basic UI remains the same, including the way the theme (governing the look-and-feel of the finished video stream) works. The basic design has proved sound, and I don't think I would change a lot if I were to design something like 1.0.0 again. As a small free software project, you have to pick your battles, and I'm certainly glad I didn't start out doing something like network support (or a distributed architecture in general, really). So what's for the next year of Nageru? It's hard to say, and it will definitely depend on the concrete needs of events. A hot candidate (since I might happen to need it) is chroma keying, although good keying is hard to get right and this needs some research. There's also been some discussion around other concrete features, but I won't name them until a firm commitment has been made; priorities can shift around, and it's important to stay flexible. So, enjoy FOSDEM! Perhaps I'll return with a talk in 2018. In the meantime, I'll be preparing the stream for the 2017 edition of Fyrrom, and I know for sure there will be more events, more features and more experiences to be had. And, inevitably, more bugs. :-)

22 January 2017

Steinar H. Gunderson: Nageru loopback test

Nageru, my live video mixer, is in the process of gaining HDMI/SDI output for bigscreen use, and in that process, I ran some loopback tests (connecting the output of one card into the input of another) to verify that I had all the colorspace parameters right. (This is of course trivial if you are only sending one input on bit-by-bit, but Nageru is much more flexible, so it really needs to understand what each pixel means.) It turns out that if you mess up any of these parameters ever so slightly, you end up with something like this, this or this. But thankfully, I got this instead on the very first try, so it really seems it's been right all along. :-) (There's a minor first-generation loss in that the SDI chain is 8-bit Y'CbCr instead of 10-bit, but I really can't spot it with the naked eye, and it doesn't compound through generations. I plan to fix that for those with spare GPU power at some point, possibly before 1.5.0 release.)

11 January 2017

Steinar H. Gunderson: 3G-SDI signal support

I had to figure out what kinds of signal you can run over 3G-SDI today, and it's pretty confusing, so I thought I'd share it. For reference, 3G-SDI is the same as 3G HD-SDI, an extension of HD-SDI, which is an extension of the venerable SDI standard (well, duh). They're all used for running uncompressed audio/video data over regular BNC coaxial cable, possibly hundreds of meters, and are in wide use in professional and semiprofessional setups. So here's the rundown on 3G-SDI capabilities: And then there's dual-link 3G-SDI, which uses two cables instead of one, and there's also Blackmagic's proprietary "6G-SDI", which supports basically everything dual-link 3G-SDI does. But in 2015, seemingly there was also a real 6G-SDI and 12G-SDI, and it's unclear to me whether it's in any way compatible with Blackmagic's offering. It's all confusing. But at least, these are the differences from single-link to dual-link 3G-SDI: 4K? I don't know. 120fps? I believe that's also a proprietary extension of some sort. And of course, having a device support 3G-SDI doesn't mean at all it's required to support all of this; in particular, I believe Blackmagic's systems don't support alpha at all except on their single 12G-SDI card, and I'd also not be surprised if RGB support is rather limited in practice.

8 January 2017

Steinar H. Gunderson: SpeedHQ decoder

I reverse-engineered a video codec. (And then the CTO of the company making it became really enthusiastic, and offered help. Life is strange sometimes.) I'd talk about this and some related stuff at FOSDEM, but there's a scheduling conflict, so I will be in Ås that weekend, not Brussels.

25 December 2016

Steinar H. Gunderson: Cracking a DataEase password

I recently needed to get access to a DataEase database; the person I helped was the legitimate owner of the data, but had forgotten the password, as the database was largely from 1996. There are various companies around the world that seem to do this, or something similar (like give you an API), for a usually unspecified fee; they all have very 90s homepages and in general seem like they have gone out of business a long time ago. And I wasn't prepared to wait. For those of you who don't know DataEase, it's a sort-of relational database for DOS that had its heyday in the late 80s and early 90s (being sort of the cheap cousin of dBase); this is before SQL gained traction as the standard query language, before real multiuser database access, and before variable-width field storage. It is also before reasonable encryption. Let's see what we can do. DataEase has a system where tables are mapped through the data dictionary, which is a table on its own. (Sidenote: MySQL pre-8.0 still does not have this.) This is the file RDRRTAAA.DBM; I don't really know what RDRR stands for, but T is the database letter in case you wanted more than one database in the same directory, and AAA, AAB, AAC etc. is a counter in case a table grows too big for one file. (There's also .DBA files for structure of non-system tables, and then some extra stuff for indexes.) DBM files are pretty much the classical, fixed-length 80s-style database files; each row has some flags (I believe these are for e.g. "row is deleted") and then just the rows in fixed format right after each other. For instance, here's one I created as part of testing (just the first few lines of the hexdump are shown):
00000000: 0e 00 01 74 65 73 74 62 61 73 65 00 00 00 00 00  ...testbase.....
00000010: 00 00 00 00 00 00 00 73 46 cc 29 37 00 09 00 00  .......sF.)7....
00000020: 00 00 00 00 00 43 3a 52 44 52 52 54 41 41 41 2e  .....C:RDRRTAAA.
00000030: 44 42 4d 00 00 01 00 0e 00 52 45 50 4f 52 54 20  DBM......REPORT 
00000040: 44 49 52 45 43 54 4f 52 59 00 00 00 00 00 1c bd  DIRECTORY.......
00000050: d4 1a 27 00 00 00 00 00 00 00 00 00 43 3a 52 45  ..'.........C:RE
00000060: 50 4f 54 41 41 41 2e 44 42 4d 00 00 01 00 0e 00  POTAAA.DBM......
00000070: 52 65 6c 61 74 69 6f 6e 73 68 69 70 73 00 00 00  Relationships...
Even without going in-depth, we can see the structure here; there's testbase which maps to C:RDRRTAAA.DBM (the RDRR itself), there's a table called REPORT DIRECTORY that maps to C:REPOTAAA.DBM, and then more stuff after that, and so on. However, other tables are not so easily read, because you can ask DataEase to encrypt a table. Let's look at such an encrypted table, like the Users table (containing usernames, passwords (not password hashes) and some extra information like access level), which is always encrypted:
00000000: 0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34  .......4...x!.{4
00000010: ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34  .......4...x!.{4
00000020: e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34  ...x!.{4...x!.{4
00000030: ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b     ...x!.{4...x!.{
Clearly, this isn't very good encryption; it uses a very short, repetitive key of eight bytes (64 bits). (The data is mostly zero padding, which makes it much easier to spot this.) In fact, in actual data tables, only five of these bytes are set to a non-zero value, which means we have a 40-bit key; export controls? My first assumption here was of course XOR, but through some experimentation, it turned out what you need is actually 8-bit subtraction (with wraparound). The key used is derived from both a database key and a per-table key, both stored in the RDRR; again, if you disassemble, I'm sure you can find the key derivation function, but that's annoying, too. Note, by the way, that this precludes making an attack by just copying tables between databases, since the database key is different. So let's do a plaintext attack. If you assume the plaintext of the bottom row is all padding, that's your key and here's what you end up with:
00000000: 52 79 00 75 73 65 72 00 00 00 00 00 00 00 00 00  Ry.user.........
00000010: 00 00 70 61 73 73 a3 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  (...............
00000030: 00 00 00 00 00 00 00 00                          ........ 
Not bad, eh? Actually the first byte of the key here is wrong as far as I know, but it didn't interfere with the fields, so we have what we need to log in. (At that point, we've won, because DataEase will helpfully decrypt everything transparently for us.) However, there's a twist; if the password is longer than four characters, the entire decryption of the Users table changes. Of course, we could run our plaintext attack against every data table and pick out the information by decoding the structure, but again; annoying. So let's see what it looks like if we choose "passs" instead:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: 70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d  p..6~|...3.=sq.-
00000020: 7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd  {.?...O.....CA..
00000030: 10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87     .k.u...ex.w....
Distinctly more confusing. At this point, of course, we know at which byte positions the username and password start, so if we wanted to, we could just try setting the start byte of the password to every possible byte in turn until we hit 0x00 (DataEase truncates fields at the first zero byte), which would allow us to get in with an empty password. However, I didn't know the username either, and trying two bytes would mean 65536 tries, and I wasn't up for automating macros through DOSBox. So an active attack wasn't too tempting. However, we can look at the last hex byte (where we know the plaintext is 0); it goes 0x5d, 0x2d, 0xfd... and some other bytes go 0x08, 0xd8, 0xa8, 0x78, and so on. So clearly there's an obfuscation here where we have a per-line offset that decreases by 0x30 per line. (Actually, the increase/decrease per line seems to be derived from the key somehow, too.) If we remove that, we end up with:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: a0 fb 0f 66 ae ac c1 f5 08 63 08 6d a3 a1 17 5d  ...f.....c.m...]
00000020: db fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 5d  ....;9...c.m...]
00000030: a0 fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17     ....;9...c.m...
Well, OK, this wasn't much more complicated; our fixed key is now 16 bytes long instead of 8 bytes long, but apart from that, we can do exactly the same plaintext attack. (Also, it seems to change per-record now, but we don't see it here, since we've only added one user.) Again, assume the last line is supposed to be all 0x00 and thus use that as a key (plus the last byte from the previous line), and we get:
00000000: 6e 06 00 75 73 65 72 00 00 00 00 00 00 00 00 00  n..user.........
00000010: 00 00 70 61 73 73 12 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ;...............
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     ...............
Well, OK, it wasn't perfect; we got "pass\x12" instead of "passs", so we messed up somehow. I don't know exactly why the fifth character gets messed up like this; actually, it cost me half an hour of trying because the password looked very real but the database wouldn't let me in, but eventually, we just guessed at what the missing letter was supposed to be. So there you have it; practical small-scale cryptanalysis of DOS-era homegrown encryption. Nothing advanced, but the user was happy about getting the data back after a few hours of work. :-)
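Both plaintext attacks above can be reproduced in a few lines of Python. This is only a sketch working on the hexdumps shown in this post, not a general DataEase tool: the function names are made up, the 16-byte row length is simply the hexdump width, and the 0x30 step is the one observed for this particular table (as noted, it seems to depend on the key).

```python
def subtract_key(data: bytes, key: bytes) -> bytes:
    """Decrypt by 8-bit subtraction (with wraparound) of a repeating key."""
    return bytes((c - key[i % len(key)]) & 0xFF for i, c in enumerate(data))


def remove_row_offset(data: bytes, row_len: int = 16, step: int = 0x30) -> bytes:
    """Undo a per-row offset that decreases by `step` for every row."""
    return bytes((c + (i // row_len) * step) & 0xFF for i, c in enumerate(data))


# Users table with the password "pass" (first encrypted hexdump above).
short_table = bytes.fromhex(
    "0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34 "
    "ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34 "
    "e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34 "
    "ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b"
)
# Assume the bottom row is all zero padding, so its first 8 bytes are the key.
short_key = short_table[48:56]
short_plain = subtract_key(short_table, short_key)

# Users table with the password "passs" (second encrypted hexdump above).
long_table = bytes.fromhex(
    "0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d "
    "70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d "
    "7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd "
    "10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87"
)
flattened = remove_row_offset(long_table)
# Key: the 15 bytes of the last row, plus the last byte of the row before it.
long_key = flattened[48:63] + flattened[47:48]
long_plain = subtract_key(flattened, long_key)

print(short_plain[:8], short_plain[18:23])
print(long_plain[:8], long_plain[18:23])
```

The username and the first four password characters come out in the clear in both cases; the fifth byte of the longer password is the one that still needs guessing.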

20 November 2016

Steinar H. Gunderson: Nageru documentation

Even though the World Chess Championship takes up a lot of time these days, I've still found some time for Nageru, my live video mixer. But this time it doesn't come in form of code; rather, I've spent my time writing documentation. I spent some time fretting over what technical solution I wanted. I explicitly wanted end-user documentation, not developer documentation; I rarely find HTML-rendered versions of every member function in a class the best way to understand a program anyway. Actually, on the contrary: Having all sorts of syntax interwoven in class comments tends to be more distracting than anything else. Eventually I settled on Sphinx, not because I found it fantastic (in particular, ReST is a pain with its bizarre variable punctuation-based syntax), but because I'm convinced it has all the momentum right now. Just like git did back in the day, the fact that the Linux kernel has chosen it means it will inevitably grow a quite large ecosystem, and I won't be ending up having to maintain it anytime soon. I tried finding a balance between spending time on installation/setup (only really useful for first-time users, and even then, only a subset of them), concept documentation (how to deal with live video in general, and how Nageru fits into a larger ecosystem of software and equipment) and more concrete documentation of all the various features and quirks of Nageru itself. Hopefully, most people will find at least something that's not already obvious to them, without drowning in detail. You can read the documentation at https://nageru.sesse.net/doc/, or if you want to send patches, the right place to patch is the git repository.
