I started with an idea of what I'd like to hear. One of my motivations was to explore some automated updates we run at work. So, I was hoping to capture the initial DNS and ARP traffic as the update discovered the systems it would contact. Then I was hoping to capture the ssh and other traffic of the actual update.
To Packet or Stream
One of the simplest things to do would simply be to model network packets. For DNS I chose that approach. I was dubious that a packet-based model would capture the aspects of TCP streams I typically care about. I care about the source and destination (both address and port) of course. However I also care about how much traffic is being carried over the stream and the condition of the stream. Are there retransmits? Are there a bunch of unanswered SYNs? But I don't care about the actual distribution of packets. Also, a busy TCP stream can generate thousands of packets a second. I doubted my ability to distinguish thousands of sounds a second at all, especially while trying to convey enough information to carry stream characteristics like overall traffic volume.
So, for TCP, I decided to model some characteristics of streams rather than individual packets.
For DNS, I decided to represent individual requests/replies.
I came up with something clever for ARPP. There, I model the request/reply as an outstanding request. A lot of unanswered ARPs can be a sign of a scan or a significant problem. The mornful sound of a TCP stream trailing off into an unanswered ARP as the cache times out on a broken network is certainly something I'd like to capture. So, I track when an ARP request is sent and when/if it is answered.
Sound or Music
I saw two approaches. First, I could use some sound to represent streams. As an example, a running diesel engine could make a great representation of a stream. The engine speed could represent overall traffic flow. There are many opportunities for detuning the engine to represent various problems that can happen with a stream. Perhaps using stereo separation and slightly different fundamental frequencies I could even represent a couple of streams and still be able to track them.
However, at least with me as a listener, that's not going to scale to a busy network. The other option I saw was to try and create melodic music with various musical phrases modified as conditions within the stream or network changed. That seemed a lot harder to do, but humans are good at listening to complicated music.
I ended up deciding that at least for the TCP streams, I was going to try and produce something more musical than sound. I was nervous: I kept having visions of a performance of "Peter and the Wolf" with different instruments representing all the characters that somehow went dreadfully wrong.
As an aside, the decision to approach music rather than sound depended heavily on what I was trying to capture. If I'm modeling more holistic properties of a system--for example, total network traffic without splitting into streams--I think parameterized sounds would be a better approach.
The decision to approach things musically affected the rest of the modeling. Somehow I was going to need to figure out notes to play. I'd already rejected the idea of modeling packets, so I wouldn't simply be able to play notes when a packet arrived.
As I played with various options, I realized that the critical challenge would be figuring out how to focus the listener's attention on the important aspects of what was going on. Clutter was the great enemy. My job would be figuring out how to spend sound wisely. When something interesting happened, that part of the model should get more focus--more of the listener's energy.
Soon I found myself thinking a lot about managing the energy of network streams. I imagined streams getting energy when something happened, and spending that energy to convey that interesting event to the listener. Energy needed to accumulate fast enough that even low-traffic streams could be noticed. Energy needed to be spent fast enough that old events were not taking listener focus from new, interesting things going on. However, if the energy were spent slow enough, then network events could be smoothed out to give a better picture of the stream rather than individual packets.
This concept of managing some decaying quantity and managing the rate of decay proved useful at multiple levels of the model.
Two Layer Model
I started with a python script that parses tcpdump output. It associates a packet with a stream and batches packets together to avoid overloading other parts of the system.
The output of this script are stream events. Events include a source and destination address, a stream ID, traffic in each direction, and any special events on the stream.
For DNS, the script just outputs packet events. For ARP, the script outputs request start, reply, and timeout events. There's some initial support for UDP, but so far that doesn't make sound.
Right now, FINs are modeled, but SYNs and the interesting TCP conditions aren't directly modeled. If you get retransmissions you'll notice because packet flow will decrease. However, I'd love to explicitly sound retransmissions. I also think a window filling as an application fails to read is important. I imagine either narrowing a band-pass filter to clamp the audio bandwidth available to a stream with a full window. Or perhaps taking it the other direction and adding an echo.
The next layer down tracks the energy of each stream. But that, and how I map energy into music, is the topic of the next post.