I have been spending time with
Jorge Aparicio's
RTFM for Cortex M3 framework for writing
Rust to target Cortex-M3
devices from Arm (and particularly the STM32F103 from ST Microelectronics).
Jorge's work in this area has been of interest to me ever since I discovered
him working on this stuff a while ago. I am very tempted by the idea of being
able to implement code for the STM32 with the guarantees of Rust and the
language features which I have come to love such as the trait system.
I have been thinking to myself that, while I admire and appreciate the work
done on the
GNUK, I would like to, personally, have a go at implementing
some kind of security token on an STM32 as a USB device. And with the advent
of the RTFM for M3 work, and Jorge's magical tooling to make it easier to
access and control the registers on an M3 microcontroller, I figured it'd be
super-nice to do this in Rust, with all the advantages that entails in terms
of isolating unsafe behaviour and generally having the potential to be more
easily verified as not misbehaving.
To do this, though, I need a USB device stack which will work in the
RTFM framework. Sadly it seems that, thus far, only Jorge has been working on
drivers for any of the M3 devices his framework supports. And one person can
only do so much. So, in my infinite madness, I decided I should investigate
the complexity of writing a USB device stack in Rust for the RTFM/M3 framework.
(Why I thought this was a good idea is lost to the mists of late night
Googling, but hey, it might make a good talk at the next conference I go to).
As such, this blog post, and further ones along these lines, will serve as a
partial tour of what I'm up to, and a partial aide-mémoire for me about learning
USB. If I get something horribly wrong, please DO contact me to correct
me, otherwise I'll just continue to be wrong. If I've simplified something but
it's still strictly correct, just let me know if it's an
oversimplification
since in a lot of cases there's no point in me putting the full details into a
blog posting. I will mostly be considering USB2.0 protocol details but only
really for low and full speed devices. (The hardware I'm targeting does
low-speed and full-speed, but not high-speed. Though some similar HW does
high-speed too, I don't have any to hand right now.)
A brief introduction to USB
In order to go much further, I needed a grounding in USB. It's a multi-layer
protocol as you might expect, though we can probably ignore the actual
electrical layer since any device we might hope to support will have to have a
hardware block to deal with that. We will however need to consider the packet
layer (since that will inform how the hardware block is implemented and thus
its interface) and then the higher level protocols on top.
USB is a deliberately asymmetric protocol. Devices are meant to be
significantly easier to implement, both in terms of hardware and software, as
compared with hosts. As such, despite some STM32s having OTG ports, I have no
intention of supporting host mode at this time.
USB is arranged into a set of busses which are, at least in the USB1.1 case,
broadcast domains. As such, each device has an address assigned to it by the
host during an early phase called 'enumeration'. Once the address is
assigned, the device is expected to only ever respond to messages addressed to
it. Note that since everything is asymmetric in USB, the device can't send
messages on its own, but has to be asked for them by the host, and as such
the addressing is always from host toward device.
USB devices then expose a number of endpoints through which communication can
flow IN to the host or OUT to the device. Endpoints are not bidirectional,
but the in and out endpoints do overlap in numbering. There is a special pair
of endpoints, IN0 and OUT0 which, between them, form what I will call the
device control endpoints. The device control endpoints are important since
every USB device MUST implement them, and there are a number of well
defined messages which pass over them to control the USB device. In theory a
bare minimum USB device would implement only the device control endpoints.
Configurations, and Classes, and Interfaces, Oh My!
In order for the host to understand what the USB device is, and what it is
capable of, part of the device control endpoints' responsibility is to provide
a set of descriptors which describe the device. These descriptors form a
hierarchy and are then glommed together into a big lump of data which the host
can download from the device in order to decide what it is and how to use it.
For various historical reasons, multi-byte values are defined to be
little-endian, though there are some BCD fields.
Descriptors always start with a length byte and a type byte because that way
the host can parse/skip as necessary, with ease.
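To convince myself that the length/type header really does make skipping easy, here's a minimal sketch of walking a lump of concatenated descriptors. The function name `descriptors` is my own invention for illustration, and the sketch simply stops at the first malformed entry rather than reporting an error.

```rust
/// Walks a blob of concatenated descriptors, yielding
/// (bDescriptorType, whole descriptor including its two header bytes).
/// Iteration stops at the first descriptor whose bLength is less than 2
/// or which would run past the end of the blob.
fn descriptors(mut blob: &[u8]) -> impl Iterator<Item = (u8, &[u8])> {
    std::iter::from_fn(move || {
        // Byte 0 of every descriptor is its total length in bytes.
        let len = *blob.first()? as usize;
        if len < 2 || len > blob.len() {
            return None;
        }
        let (desc, rest) = blob.split_at(len);
        blob = rest;
        // Byte 1 is the descriptor type.
        Some((desc[1], desc))
    })
}
```

A caller which only understands some descriptor types can simply ignore the rest, since the length byte tells it how far to hop.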
The first descriptor is the device descriptor. It's a big one, and looks like
this:
Device Descriptor
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (18) |
bDescriptorType | 1 | 1 | Constant | Device Descriptor (0x01) |
bcdUSB | 2 | 2 | BCD | USB spec version complied with |
bDeviceClass | 4 | 1 | Class | Code, assigned by USB org (0 means "Look at interface descriptors", common value is 2 for CDC) |
bDeviceSubClass | 5 | 1 | SubClass | Code, assigned by USB org (usually 0) |
bDeviceProtocol | 6 | 1 | Protocol | Code, assigned by USB org (usually 0) |
bMaxPacketSize0 | 7 | 1 | Number | Max packet size for IN0/OUT0 (Valid are 8, 16, 32, 64) |
idVendor | 8 | 2 | ID | 16bit Vendor ID (Assigned by USB org) |
idProduct | 10 | 2 | ID | 16bit Product ID (Assigned by manufacturer) |
bcdDevice | 12 | 2 | BCD | Device version number (same encoding as bcdUSB) |
iManufacturer | 14 | 1 | Index | String index of manufacturer name (0 if unavailable) |
iProduct | 15 | 1 | Index | String index of product name (0 if unavailable) |
iSerialNumber | 16 | 1 | Index | String index of device serial number (0 if unavailable) |
bNumConfigurations | 17 | 1 | Number | Count of configurations the device has. |
This looks quite complex, but breaks down into two relatively simple halves.
The first eight bytes carry everything necessary for the host to be able to
configure itself and the device control endpoints properly in order to
communicate effectively. Since eight bytes is the bare minimum a device must
be able to transmit in one go, the host can guarantee to get those, and they
tell it what kind of device it is, what USB protocol it supports, and what the
maximum transfer size is for its device control endpoints.
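As a rough sketch of how the table might map onto Rust, here is a struct which serialises itself into the 18 byte wire format, with the multi-byte fields little-endian. The struct, its snake_case field names and `to_bytes` are all my own invention for illustration, not anything from the RTFM framework.

```rust
/// One field per row of the device descriptor table (bLength and
/// bDescriptorType are constants, so they are not stored).
struct DeviceDescriptor {
    bcd_usb: u16,
    device_class: u8,
    device_sub_class: u8,
    device_protocol: u8,
    max_packet_size0: u8,
    id_vendor: u16,
    id_product: u16,
    bcd_device: u16,
    i_manufacturer: u8,
    i_product: u8,
    i_serial_number: u8,
    num_configurations: u8,
}

impl DeviceDescriptor {
    const LENGTH: u8 = 18;
    const TYPE: u8 = 0x01;

    /// Lays the fields out in table order, low byte first for 16 bit values.
    fn to_bytes(&self) -> [u8; 18] {
        let usb = self.bcd_usb.to_le_bytes();
        let vid = self.id_vendor.to_le_bytes();
        let pid = self.id_product.to_le_bytes();
        let dev = self.bcd_device.to_le_bytes();
        [
            Self::LENGTH,
            Self::TYPE,
            usb[0],
            usb[1],
            self.device_class,
            self.device_sub_class,
            self.device_protocol,
            self.max_packet_size0,
            vid[0],
            vid[1],
            pid[0],
            pid[1],
            dev[0],
            dev[1],
            self.i_manufacturer,
            self.i_product,
            self.i_serial_number,
            self.num_configurations,
        ]
    }
}
```

The first eight elements of the returned array are exactly the "first half" described above, so a device with an 8 byte control endpoint could send `&bytes[..8]` as its first packet.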
The encoding of the bcdUSB and bcdDevice fields is interesting too. It is
of the form 0xMMmm where MM is the major number, mm the minor. So USB2.0
is encoded as 0x0200, USB1.1 as 0x0110 etc. If the device version is 17.36
then that'd be 0x1736.
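Here's a small sketch of that encoding, assuming each half is treated as a two digit decimal number. The helper names `to_bcd` and `bcd_version` are mine. Note that under this reading USB1.1 is major 1, minor 10, which is how it ends up as 0x0110.

```rust
/// Packs a decimal number in 0..=99 into one BCD byte (e.g. 36 -> 0x36).
fn to_bcd(n: u8) -> Option<u8> {
    if n > 99 {
        None
    } else {
        Some(((n / 10) << 4) | (n % 10))
    }
}

/// Builds a 0xMMmm value; returns None if either half exceeds two digits.
fn bcd_version(major: u8, minor: u8) -> Option<u16> {
    let hi = to_bcd(major)? as u16;
    let lo = to_bcd(minor)? as u16;
    Some((hi << 8) | lo)
}
```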
Other fields of note are bDeviceClass which can be 0 meaning that
interfaces will specify their classes, and idVendor/idProduct which between
them form the primary way for the specific USB device to be identified. The
Index fields are indices into a string table which we'll look at later. For
now it's enough to know that wherever a string index is needed, 0 can be
provided to mean "no string here".
The last field is bNumConfigurations and this indicates the number of ways in
which this device might function. A USB device can provide any number of these
configurations, though typically only one is provided. If the host wishes to
switch between configurations then it will have to effectively entirely quiesce
and reset the device.
The next kind of descriptor is the configuration descriptor. This one is much
shorter, but starts with the same two fields:
Configuration Descriptor
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (9) |
bDescriptorType | 1 | 1 | Constant | Configuration Descriptor (0x02) |
wTotalLength | 2 | 2 | Number | Size of the configuration in bytes, in total |
bNumInterfaces | 4 | 1 | Number | The number of interfaces in this configuration |
bConfigurationValue | 5 | 1 | Number | The value to use to select this configuration |
iConfiguration | 6 | 1 | Index | The name of this configuration (0 for unavailable) |
bmAttributes | 7 | 1 | Bitmap | Attributes field (see below) |
bMaxPower | 8 | 1 | Number | Maximum bus power this configuration will draw (in 2mA increments) |
An important field to consider here is the bmAttributes field which tells the
host some useful information. Bit 7 must be set, bit 6 is set if the device
would be self-powered in this configuration, bit 5 indicates that the device
would like to be able to wake the host from sleep mode, and bits 4 to 0 must be
unset.
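Those bit positions are easy to get wrong by hand, so a tiny helper seems worthwhile. This is only a sketch and `config_attributes` is a name I made up.

```rust
/// Builds the configuration descriptor's bmAttributes byte.
fn config_attributes(self_powered: bool, remote_wakeup: bool) -> u8 {
    let mut attrs = 0x80; // bit 7 must always be set
    if self_powered {
        attrs |= 1 << 6; // bit 6: self-powered in this configuration
    }
    if remote_wakeup {
        attrs |= 1 << 5; // bit 5: device may wake the host
    }
    attrs // bits 4..0 stay clear
}
```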
The bMaxPower field is interesting because it encodes the power draw of the
device (when set to this configuration). USB allows for up to 100mA of draw
per device when it isn't yet configured, and up to 500mA when configured. The
value may be used to decide if it's sensible to configure a device if the host
is in a low power situation. Typically this field will be set to 50 to
indicate the nominal 100mA is fine, or 250 to request the full 500mA.
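Since the field is in 2mA units, I'd rather write milliamps in my code and convert. A minimal sketch, with the made-up name `max_power_field`: it rounds odd values up (my own choice, so the device never under-declares its draw) and refuses anything over 500mA.

```rust
/// Converts a draw in milliamps to the bMaxPower field (2mA units).
/// Returns None if the request exceeds the 500mA a configured device may draw.
fn max_power_field(milliamps: u16) -> Option<u8> {
    if milliamps > 500 {
        return None;
    }
    // Round up so that e.g. 99mA is declared as 50 units (100mA).
    Some(((milliamps + 1) / 2) as u8)
}
```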
Finally, the wTotalLength field is interesting because it tells the host the
total length of this configuration, including all the interface and endpoint
descriptors which make it up. With this field, the host can allocate enough
RAM to fetch the entire configuration descriptor block at once, simplifying
matters dramatically for it.
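On the device side that total has to be computed from the layout. As a sketch, assuming the configuration contains only the standard descriptors covered in this post (9 bytes for the configuration, 9 per interface, 7 per endpoint) and no class-specific extras, it could look like this. The name `total_length` and its slice-of-counts argument are my own invention.

```rust
const CONFIG_DESC_LEN: u16 = 9;
const INTERFACE_DESC_LEN: u16 = 9;
const ENDPOINT_DESC_LEN: u16 = 7;

/// Computes wTotalLength given the number of endpoints in each interface.
/// Assumes only standard descriptors are present and that the layout is
/// small enough not to overflow 16 bits.
fn total_length(endpoints_per_interface: &[u8]) -> u16 {
    endpoints_per_interface
        .iter()
        .fold(CONFIG_DESC_LEN, |total, &n| {
            total + INTERFACE_DESC_LEN + ENDPOINT_DESC_LEN * u16::from(n)
        })
}
```

For example, a configuration with a single interface holding two endpoints comes to 9 + 9 + 2 * 7 = 32 bytes.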
Each configuration has one or more interfaces. The interfaces group some
endpoints together into a logical function. For example a configuration for
a multifunction scanner/fax/printer might have an interface for the scanner
function, one for the fax, and one for the printer. Endpoints are not shared
among interfaces, so when building this table, be careful.
Next, logically, come the interface descriptors:
Interface Descriptor
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (9) |
bDescriptorType | 1 | 1 | Constant | Interface Descriptor (0x04) |
bInterfaceNumber | 2 | 1 | Number | The number of the interface |
bAlternateSetting | 3 | 1 | Number | The interface alternate index |
bNumEndpoints | 4 | 1 | Number | The number of endpoints in this interface |
bInterfaceClass | 5 | 1 | Class | The interface class (USB Org defined) |
bInterfaceSubClass | 6 | 1 | SubClass | The interface subclass (USB Org defined) |
bInterfaceProtocol | 7 | 1 | Protocol | The interface protocol (USB Org defined) |
iInterface | 8 | 1 | Index | The name of the interface (or 0 if not provided) |
The important values here are the class/subclass/protocol fields which provide
a lot of information to the host about what the interface is. If the class is
a USB Org defined one (e.g. 0x02 for Communications Device Class) then the host
may already have drivers designed to work with the interface meaning that the
device manufacturer doesn't have to provide host drivers.
The bInterfaceNumber is used by the host to indicate this interface when
sending messages, and the bAlternateSetting is a way to vary interfaces. Two
interfaces with the same bInterfaceNumber but different bAlternateSetting
values can be switched between (like configurations, but) without resetting
the device.
Hopefully the rest of this descriptor is self-evident by now.
The next descriptor kind is endpoint descriptors:
Endpoint Descriptor
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (7) |
bDescriptorType | 1 | 1 | Constant | Endpoint Descriptor (0x05) |
bEndpointAddress | 2 | 1 | Endpoint | Endpoint address (see below) |
bmAttributes | 3 | 1 | Bitmap | Endpoint attributes (see below) |
wMaxPacketSize | 4 | 2 | Number | Maximum packet size this endpoint can send/receive |
bInterval | 6 | 1 | Number | Interval for polling endpoint (in frames) |
The bEndpointAddress is a 4 bit endpoint number (so there're 16 endpoint
indices) and a bit to indicate IN vs. OUT. Bit 7 is the direction marker
and bits 3 to 0 are the endpoint number. This means there are 32 endpoints in
total, 16 in each direction, 2 of which are reserved (IN0 and OUT0) giving
30 endpoints available for interfaces to use in any given configuration. The
bmAttributes bitmap covers the transfer type of the endpoint (more below), and
the bInterval is an interval measured in frames (1ms for low or full speed,
125µs in high speed). bInterval is only valid for some endpoint types.
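The address byte is a natural fit for a little enum and a pair of conversion functions. This is a sketch with names of my own choosing (`Direction`, `endpoint_address`, `split_endpoint_address`). A set direction bit means IN, i.e. toward the host.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Out, // host to device, bit 7 clear
    In,  // device to host, bit 7 set
}

/// Builds a bEndpointAddress; returns None if the number needs more than 4 bits.
fn endpoint_address(number: u8, dir: Direction) -> Option<u8> {
    if number > 0x0F {
        return None;
    }
    Some(match dir {
        Direction::In => 0x80 | number,
        Direction::Out => number,
    })
}

/// Splits a bEndpointAddress into (endpoint number, direction).
/// Bits 6 to 4 are ignored.
fn split_endpoint_address(addr: u8) -> (u8, Direction) {
    let dir = if addr & 0x80 != 0 {
        Direction::In
    } else {
        Direction::Out
    };
    (addr & 0x0F, dir)
}
```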
The final descriptor kind is for the strings which we've seen indices for
throughout the above. String descriptors have two forms:
String Descriptor (index zero)
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (variable) |
bDescriptorType | 1 | 1 | Constant | String Descriptor (0x03) |
wLangID[0] | 2 | 2 | Number | Language code zero (e.g. 0x0409 for en_US) |
wLangID[n] | 4.. | 2 | Number | Language code n ... |
This form (for descriptor 0) is that of a series of language IDs supported by
the device. The device may support any number of languages. When the host
requests a string descriptor, it will supply both the index of the string and
also the language id it desires (from the list available in string descriptor
zero). The host can tell how many language IDs are available simply by
dividing bLength by 2 and subtracting 1 for the two header bytes.
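From the host's point of view, pulling the language IDs out of string descriptor zero might look something like the following sketch. `language_ids` is a hypothetical helper of mine, and it uses a `Vec` for convenience, which a real embedded device stack would probably avoid.

```rust
/// Parses string descriptor zero into its list of 16 bit language IDs.
/// Returns None if the header is inconsistent with the data supplied.
fn language_ids(desc: &[u8]) -> Option<Vec<u16>> {
    let len = *desc.first()? as usize;
    // bLength must cover the two header bytes, be a whole number of 16 bit
    // values, fit within the data we were given, and the type must be 0x03.
    if len < 2 || len % 2 != 0 || len > desc.len() || desc[1] != 0x03 {
        return None;
    }
    Some(
        desc[2..len]
            .chunks(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}
```

The number of IDs returned is bLength / 2 - 1, matching the arithmetic above.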
And for string descriptors of an index greater than zero:
String Descriptor (index greater than zero)
Field Name | Byte start | Byte length | Encoding | Meaning |
bLength | 0 | 1 | Number | Size of the descriptor in bytes (variable) |
bDescriptorType | 1 | 1 | Constant | String Descriptor (0x03) |
bString | 2.. | .. | Unicode | The string, in "unicode" format |
This second form of the string descriptor simply carries the string in what
the USB spec calls 'Unicode' format which is, as of 2005, defined to be
UTF-16LE without a BOM or terminator.
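Rust strings are UTF-8, so a device needs to transcode somewhere. Here's a minimal sketch which writes a string descriptor into a caller-supplied buffer using the standard `str::encode_utf16`. The function name `write_string_descriptor` is my own, and doing this at runtime (rather than at build time) is just an assumption for the sake of the example.

```rust
/// Writes a string descriptor for `s` into `buf`, returning the number of
/// bytes written (which is also the bLength value). Returns None if the
/// descriptor would not fit in `buf` or in the one byte length field.
fn write_string_descriptor(s: &str, buf: &mut [u8]) -> Option<usize> {
    // Two header bytes plus two bytes per UTF-16 code unit.
    let len = 2 + 2 * s.encode_utf16().count();
    if len > 255 || len > buf.len() {
        return None;
    }
    buf[0] = len as u8; // bLength
    buf[1] = 0x03; // bDescriptorType: string
    for (i, unit) in s.encode_utf16().enumerate() {
        let bytes = unit.to_le_bytes(); // little-endian, no BOM, no terminator
        buf[2 + 2 * i] = bytes[0];
        buf[3 + 2 * i] = bytes[1];
    }
    Some(len)
}
```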
Since string descriptors are of a variable length, the host must request
strings in two transactions. First a request for 2 bytes is sent, retrieving
the bLength and bDescriptorType fields which can be checked and memory
allocated. Then a request for bLength bytes can be sent to retrieve the
entire string descriptor.
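To pin down that two-step dance, here's a host-side sketch. The `request` closure is entirely hypothetical: it stands in for "ask the device for up to this many bytes of the string descriptor" and returns whatever came back. Nothing here reflects a real host API.

```rust
/// Fetches a whole string descriptor in two steps: first the two header
/// bytes, then the full bLength bytes. Returns None if the header does not
/// look like a string descriptor or the second read comes back the wrong size.
fn fetch_string_descriptor<F>(mut request: F) -> Option<Vec<u8>>
where
    F: FnMut(usize) -> Vec<u8>,
{
    // Step one: just bLength and bDescriptorType.
    let header = request(2);
    if header.len() < 2 || header[1] != 0x03 {
        return None;
    }
    // Step two: now we know how much to ask for.
    let full_len = header[0] as usize;
    let full = request(full_len);
    if full.len() == full_len {
        Some(full)
    } else {
        None
    }
}
```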
Putting that all together
Phew, this is getting to be quite a long posting, so I'm going to leave this
here and in my next post I'll talk about how the host and device pass packets
to get all that information to the host, and how it gets used.