Nageru 1.4.0 is out (and on its way through
the Debian upload process right now), so now you can do live video mixing with multichannel
audio to your heart's content. I've already blogged about most of the
interesting new features, so instead, I'm trying to answer a question:
What took so long?
To be clear, I'm not saying 1.4.0 took more time than I really anticipated
(on the contrary, I pretty much understood the scope from the beginning,
and there was a reason why I didn't go for building this stuff into 1.0.0);
but if you just look at the changelog from the outside, it's not immediately
obvious why multichannel audio support should take the better part of three
months of development. What I'm going to say is of course going to be obvious
to most software developers, but not everyone is one, and perhaps my
experiences will be illuminating.
Let's first look at some obvious things that aren't the case: First of all,
development is not primarily limited by typing speed. There are about 9,000
lines of new code in 1.4.0 (depending a bit on how you count), and if it was
just about typing them in, I would be done in a day or two. On a good
keyboard, I can type plain text at more than 800 characters per minute, but
you hardly ever write code for even a single minute at that speed. Just as
when writing a novel, most time is spent thinking, not typing.
I also didn't spend a lot of time backtracking; most code I wrote actually
ended up in the finished product as opposed to being thrown away. (I'm not
as lucky in all of my projects.) It's pretty
common to do so if you're in an exploratory phase, but in this case, I had a
pretty good idea of what I wanted to do right from the start, and that plan
seemed to work. This wasn't a
difficult project per se; it just needed
to be
done (which, in a sense, just increases the mystery).
However, even if this isn't at the forefront of science in any way (most code
in the world is pretty pedestrian, after all), there's still a lot of
decisions to make, on several levels of abstraction. And a lot of those
decisions depend on information gathering beforehand. Let's take a look at
an example from late in the development cycle, namely support for using MIDI
controllers instead of the mouse to control the various widgets.
I've kept a pretty meticulous TODO list; it's just a text file on my laptop,
but it serves the purpose of a ghetto bugtracker. For 1.4.0, it contains 83
work items (a single-digit number is not ticked off, mostly because I decided
not to do those things), which corresponds roughly 1:2 to the number of
commits. So let's have a look at what went into the ~20 MIDI controller items.
First of all, to allow MIDI controllers to influence the UI, we need a way
of getting at their events. Since Nageru is single-platform on Linux, ALSA is the
obvious choice (if not, I'd probably have to look for a library to put
in-between), but as it turns out, ALSA has two interfaces (raw MIDI and sequencer).
Which one do you want? It sounds like raw MIDI is what we want, but actually,
it's the sequencer interface (it does more of the MIDI parsing for you,
and generally is friendlier).
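To make that concrete, here's roughly what opening the sequencer interface looks like. This is a simplified sketch, not Nageru's actual code; the client and port names are made up, and error handling is minimal:

```cpp
// Sketch: open the ALSA sequencer interface and create an input port.
#include <alsa/asoundlib.h>
#include <cstdio>

int main() {
    snd_seq_t *seq;
    // The sequencer interface hands us parsed, structured MIDI events,
    // unlike the raw MIDI interface, which gives us the raw byte stream.
    if (snd_seq_open(&seq, "default", SND_SEQ_OPEN_DUPLEX, 0) < 0) {
        fprintf(stderr, "snd_seq_open() failed\n");
        return 1;
    }
    snd_seq_set_client_name(seq, "example-client");

    // A writable port that controller devices can be subscribed to.
    int port = snd_seq_create_simple_port(
        seq, "input",
        SND_SEQ_PORT_CAP_WRITE | SND_SEQ_PORT_CAP_SUBS_WRITE,
        SND_SEQ_PORT_TYPE_MIDI_GENERIC | SND_SEQ_PORT_TYPE_APPLICATION);
    if (port < 0) {
        fprintf(stderr, "could not create input port\n");
        return 1;
    }
    // ... subscribe to devices and read events; see the later sketches.
    snd_seq_close(seq);
    return 0;
}
```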
The first question is where to start picking events from. I went the simplest
path and just said I wanted all events; anything else would necessitate a UI,
a command-line flag, figuring out whether we wanted to distinguish between
different devices with the same name (and some devices might not even
have names), and so on. But how do you enumerate devices? (Relatively simple,
thankfully.) What do you do if the user inserts a new one while Nageru is
running? (Turns out there's a special device you can subscribe to that will
tell you about new devices.) What if you get an error on subscription?
(Just print a warning and ignore it; it's legitimate not to have access to
all devices on the system. By the way, for PCM devices, all of these answers
are different.)
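The enumeration and subscription end up looking roughly like this (again a simplified sketch with made-up names, not the actual code):

```cpp
// Sketch: subscribe our input port to every readable sequencer port,
// plus the system announce port so we hear about hotplugged devices.
#include <alsa/asoundlib.h>
#include <cstdio>

void subscribe_to_all(snd_seq_t *seq, int my_port) {
    snd_seq_client_info_t *cinfo;
    snd_seq_port_info_t *pinfo;
    snd_seq_client_info_alloca(&cinfo);
    snd_seq_port_info_alloca(&pinfo);

    snd_seq_client_info_set_client(cinfo, -1);
    while (snd_seq_query_next_client(seq, cinfo) >= 0) {
        int client = snd_seq_client_info_get_client(cinfo);
        if (client == snd_seq_client_id(seq)) continue;  // Skip ourselves.

        snd_seq_port_info_set_client(pinfo, client);
        snd_seq_port_info_set_port(pinfo, -1);
        while (snd_seq_query_next_port(seq, pinfo) >= 0) {
            unsigned caps = snd_seq_port_info_get_capability(pinfo);
            if (!(caps & SND_SEQ_PORT_CAP_READ) ||
                !(caps & SND_SEQ_PORT_CAP_SUBS_READ)) continue;
            int port = snd_seq_port_info_get_port(pinfo);
            // Subscription can legitimately fail (e.g. no access);
            // just warn and move on.
            if (snd_seq_connect_from(seq, my_port, client, port) < 0)
                fprintf(stderr, "warning: could not subscribe to %d:%d\n",
                        client, port);
        }
    }

    // The special announce port tells us about ports coming and going.
    snd_seq_connect_from(seq, my_port,
                         SND_SEQ_CLIENT_SYSTEM, SND_SEQ_PORT_SYSTEM_ANNOUNCE);
}
```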
So now we have a sequencer device; how do we get events from it? Can we do it in the main loop? Turns out
it probably doesn't integrate too well with Qt, but it's easy enough to put
it in a thread. The class dealing with the MIDI handling now needs locking;
what mutex granularity do we want? (Experience will tell you that you nearly
always just want one mutex. Two mutexes give you all sorts of headaches with
ordering them, and nearly never give any gain.) ALSA expects us to poll()
a given set of descriptors for data, but on shutdown, how do you break out
of that poll to tell the thread to go away? (The simplest way on Linux is
using an eventfd.)
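The shape of that thread's loop is roughly as follows; this is a reconstruction with made-up names, not the code verbatim:

```cpp
// Sketch: poll the ALSA descriptors plus an eventfd; writing to the eventfd
// from the shutdown path wakes the thread so it can exit cleanly.
#include <alsa/asoundlib.h>
#include <sys/eventfd.h>
#include <poll.h>
#include <vector>

void midi_thread_func(snd_seq_t *seq, int shutdown_fd) {
    for ( ;; ) {
        int nfds = snd_seq_poll_descriptors_count(seq, POLLIN);
        std::vector<pollfd> fds(nfds + 1);
        snd_seq_poll_descriptors(seq, fds.data(), nfds, POLLIN);
        fds[nfds] = { shutdown_fd, POLLIN, 0 };

        if (poll(fds.data(), fds.size(), /*timeout=*/-1) <= 0) continue;
        if (fds[nfds].revents & POLLIN) return;  // Shutdown requested.

        // ... read and handle the pending events (see the next sketch).
    }
}

// Shutdown path (e.g. in the destructor):
//   uint64_t one = 1;
//   write(shutdown_fd, &one, sizeof(one));
//   midi_thread.join();
```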
There's a quirk where if you get two or more MIDI messages right after each
other and only read one, poll() won't trigger to alert you there are more
left. Did you know that? (I didn't. I also can't find it documented. Perhaps
it's a bug?) It took me some looking into sample code to find it. Oh, and
ALSA uses POSIX error codes to signal errors (like "nothing more is
available"), but it doesn't use errno.
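In practice, the fix is to drain everything that's queued whenever poll() fires, along these lines (handle_controller() is a made-up stand-in for the real handling):

```cpp
// Sketch: drain *all* pending events after poll() fires, since poll() will
// not trigger again for events already sitting in the input buffer.
#include <alsa/asoundlib.h>
#include <cstdio>

void handle_controller(unsigned controller, int value) {  // Stand-in.
    printf("controller %u was set to value %d\n", controller, value);
}

void drain_events(snd_seq_t *seq) {
    // Note: ALSA returns negative POSIX error codes directly
    // (e.g. -EAGAIN for "nothing more available"); it does not set errno.
    while (snd_seq_event_input_pending(seq, /*fetch_sequencer=*/1) > 0) {
        snd_seq_event_t *ev;
        if (snd_seq_event_input(seq, &ev) < 0) break;
        if (ev->type == SND_SEQ_EVENT_CONTROLLER) {
            handle_controller(ev->data.control.param, ev->data.control.value);
        }
    }
}
```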
OK, so you have events (like "controller 3 was set to value 47"); what do you do
about them? The meaning of the controller numbers is different from
device to device, and there's no open format for describing them. So I had to
make a format describing the mapping; I used protobuf (I have lots of
experience with it) to make a simple text-based format, but it's obviously
a nightmare to set up 50+ controllers by hand in a text file, so I had to
make a UI for this. My initial thought was making a grid of spinners
(similar to how the input mapping dialog already worked), but then I realized
that there isn't an easy way to make headlines in Qt's grid. (You can
substitute a label widget for a single cell, but not for an entire row.
Who knew?) So after some searching, I found out that it would be better
to have a tree view (Qt Creator does this), and then you can treat that
more-or-less as a table for the rows that should be editable.
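For the curious, reading such a text-based protobuf file is only a few lines. Here's a generic sketch; the MIDIMapping message name and header are hypothetical, not necessarily what Nageru's schema uses:

```cpp
// Sketch: load a text-format protobuf mapping file.
#include <google/protobuf/text_format.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <fcntl.h>
#include "midi_mapping.pb.h"  // Hypothetical generated header.

bool load_mapping(const char *filename, MIDIMapping *mapping) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) return false;
    google::protobuf::io::FileInputStream input(fd);
    input.SetCloseOnDelete(true);
    // Text format keeps the file human-readable and hand-editable,
    // even if the editor UI is the intended way to create it.
    return google::protobuf::TextFormat::Parse(&input, mapping);
}
```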
Of course, guessing controller numbers is impossible even in an editor,
so I wanted it to respond to MIDI events. This means the editor needs
to take over the role as MIDI receiver from the main UI. How do you do
that in a thread-safe way? (Reuse the existing mutex; you don't generally
want to use atomics for complicated things.) Thinking about it, shouldn't the
MIDI mapper just support multiple receivers at a time? (Doubtful; you don't
want your random controller fiddling during setup to actually influence
the audio on a running stream. And would you use the old or the new mapping?)
And do you really need to set up every single controller for each bus,
given that the mapping is pretty much guaranteed to be similar for them?
Making a "guess bus" button doesn't seem too difficult: if you have one
correctly set up controller on the bus, it can guess the rest from
a neighboring bus (assuming a static offset). But what if there's
conflicting information? OK; then you should disable the button.
So now the enable/disable status of that button depends on which cell
in your grid has the focus; how do you get at those events? (Install an event
filter, or subclass the spinner.) And so on, and so on, and so on.
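The event-filter variant can be as small as this; class and callback names are made up for illustration, not Nageru's actual classes:

```cpp
// Sketch: an event filter installed on each spinner, so the dialog can
// re-evaluate the "guess bus" button whenever focus moves to another cell.
#include <QObject>
#include <QEvent>
#include <functional>
#include <utility>

class FocusWatcher : public QObject {
public:
    FocusWatcher(std::function<void()> on_focus_change, QObject *parent = nullptr)
        : QObject(parent), on_focus_change(std::move(on_focus_change)) {}

protected:
    bool eventFilter(QObject *, QEvent *event) override {
        if (event->type() == QEvent::FocusIn || event->type() == QEvent::FocusOut)
            on_focus_change();
        return false;  // Observe only; don't swallow the event.
    }

private:
    std::function<void()> on_focus_change;
};

// Usage: spinner->installEventFilter(new FocusWatcher(
//     [this] { update_guess_button(); }, this));
```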
You could argue that most of these questions go away with experience;
if you're an expert in a given API, you can answer most of these questions
in a minute or two even if you haven't heard the exact question before.
But you can't expect even experienced developers to be an expert in all
possible libraries; if you know everything there is to know about Qt,
ALSA, x264, ffmpeg, OpenGL, VA-API, libusb, microhttpd
and Lua
(in addition to C++11, of course), I'm sure you'd be a great fit for
Nageru, but I'd wager that precious few developers fit that bill.
I've written C++ for almost 20 years now (almost ten of them professionally),
and that experience certainly
helps boost productivity, but I can't
say I expect a 10x reduction in my own development time at any point.
You could also argue, of course, that spending so much time on the editor
is wasted, since most users will only ever see it once. But here's the
point: it's not actually a lot of time. The only reason why it seems
like so much is that I bothered to write two paragraphs about it;
it's not a particular pain point, it just adds to the total. Also,
the first impression matters a lot: if the user can't get the editor
to work, they also can't get the MIDI controller to work, and are likely
to just go do something else.
A common misconception is that just switching languages or using libraries
will help you a lot. (Witness the never-ending stream of software that
advertises "written in Foo" or "uses Bar"
as if it were a feature.)
For the former, note that nothing I've said so far is specific to my choice
of language (C++), and I've certainly avoided a bunch of battles by making
that specific choice over, say, Python. For the latter, note that most of these problems are actually related
to library use; libraries are great, and they solve a bunch of problems
I'm really glad I didn't have to worry about (how should each button look?),
but they still bring their own interaction problems. And even when you're a
master of your chosen programming environment, things
still take time,
because you have all those decisions to make on
top of your libraries.
Of course, there are cases where libraries really solve your
entire problem
and your code gets reduced to 100 trivial lines, but that's really only when
you're solving a problem that's been solved a million times before. Congrats
on making that blog in Rails; I'm sure you're advancing the world. (To make
things worse, usually this breaks down when you want to stray ever so
slightly from what was intended by the library or framework author. What
seems like a perfect match can suddenly become a development trap where you
spend more of your time trying to become an expert in working around the
given library than actually doing any development.)
The entire thing reminds me of the famous essay
No Silver Bullet by Fred
Brooks, but perhaps even more so, this quote from
John Carmack's .plan has
stuck with me (incidentally about mobile game development in 2006,
but the basic story still rings true):
To some degree this is already the case on high end BREW phones today. I have
a pretty clear idea what a maxed out software renderer would look like for
that class of phones, and it wouldn't be the PlayStation-esq 3D graphics that
seems to be the standard direction. When I was doing the graphics engine
upgrades for BREW, I started along those lines, but after putting in a couple
days at it I realized that I just couldn't afford to spend the time to finish
the work. "A clear vision" doesn't mean I can necessarily implement it in a
very small integral number of days.
In a sense, programming is all about what your program should do in the first
place. The "how" question is just the "what", moved down the chain of
abstractions until it ends up where a computer can understand it, and at that
point, the three words "multichannel audio support" have become those 9,000
lines that describe in perfect detail what's going on.