
I recently needed to get access to a DataEase database; the person I helped
was the legitimate owner of the data, but had forgotten the password,
as the database was largely from 1996. There are various companies
around the world that seem to do this, or something similar (like give
you an API), for a usually unspecified fee; they all have very 90s homepages
and in general seem like they have gone out of business a long time ago.
And I wasn't prepared to wait.
For those of you who don't know DataEase, it's a sort-of relational database
for DOS that had its heyday in the late 80s and early 90s (being sort of
the cheap cousin of dBase); this is before SQL gained traction as the
standard query language, before real multiuser database access, and before
variable-width field storage.
It is also before reasonable encryption. Let's see what we can do.
DataEase has a system where tables are mapped through the data dictionary,
which is a table on its own. (Sidenote: MySQL pre-8.0 still does not have
this.) This is the file RDRRTAAA.DBM; I don't really know what RDRR stands
for, but T is the database letter in case you wanted more than one database
in the same directory, and AAA, AAB, AAC etc. is a counter in case a table
grows too big for one file. (There are also .DBA files for the structure of
non-system tables, and then some extra stuff for indexes.)
DBM files are pretty much the classical, fixed-length 80s-style database
files; each row has some flags (I believe these mark things like "row is
deleted") and then just the fields in fixed format right after each other.
For instance, here's one I created as part of testing (just the first few
lines of the hexdump are shown):
00000000: 0e 00 01 74 65 73 74 62 61 73 65 00 00 00 00 00 ...testbase.....
00000010: 00 00 00 00 00 00 00 73 46 cc 29 37 00 09 00 00 .......sF.)7....
00000020: 00 00 00 00 00 43 3a 52 44 52 52 54 41 41 41 2e .....C:RDRRTAAA.
00000030: 44 42 4d 00 00 01 00 0e 00 52 45 50 4f 52 54 20 DBM......REPORT
00000040: 44 49 52 45 43 54 4f 52 59 00 00 00 00 00 1c bd DIRECTORY.......
00000050: d4 1a 27 00 00 00 00 00 00 00 00 00 43 3a 52 45 ..'.........C:RE
00000060: 50 4f 54 41 41 41 2e 44 42 4d 00 00 01 00 0e 00 POTAAA.DBM......
00000070: 52 65 6c 61 74 69 6f 6e 73 68 69 70 73 00 00 00 Relationships...
Even without going in-depth, we can see the structure here; there's
testbase which maps to C:RDRRTAAA.DBM (the RDRR itself), there's a table
called REPORT DIRECTORY that maps to C:REPOTAAA.DBM, and then more stuff
after that, and so on.
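If all you want is the list of files the data dictionary points at, you don't need to decode the record layout at all. Here is a minimal Python sketch that scans the bytes from the dump above for anything that looks like a .DBM file name. The pattern (drive C:, up to eight uppercase letters or digits, then .DBM) and the helper name list_dbm_files are assumptions made for illustration, not anything taken from DataEase itself.

```python
import re

# First 0x80 bytes of the test data dictionary, copied from the hexdump above.
RDRR_HEAD = bytes.fromhex(
    "0e 00 01 74 65 73 74 62 61 73 65 00 00 00 00 00"
    "00 00 00 00 00 00 00 73 46 cc 29 37 00 09 00 00"
    "00 00 00 00 00 43 3a 52 44 52 52 54 41 41 41 2e"
    "44 42 4d 00 00 01 00 0e 00 52 45 50 4f 52 54 20"
    "44 49 52 45 43 54 4f 52 59 00 00 00 00 00 1c bd"
    "d4 1a 27 00 00 00 00 00 00 00 00 00 43 3a 52 45"
    "50 4f 54 41 41 41 2e 44 42 4d 00 00 01 00 0e 00"
    "52 65 6c 61 74 69 6f 6e 73 68 69 70 73 00 00 00"
)

# Assumption: file names look like "C:" + 1 to 8 name characters + ".DBM".
DBM_NAME = re.compile(rb"C:[A-Z0-9]{1,8}\.DBM")


def list_dbm_files(data):
    """Return every byte run in `data` that matches the assumed name pattern."""
    return [m.group().decode("ascii") for m in DBM_NAME.finditer(data)]


print(list_dbm_files(RDRR_HEAD))
# ['C:RDRRTAAA.DBM', 'C:REPOTAAA.DBM']
```

On a real RDRR file you would read the bytes with open(path, "rb").read() instead of pasting a dump.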
However, other tables are not so easily read, because you can ask DataEase
to encrypt a table. Let's look at such an encrypted table, like the Users
table (containing usernames, passwords, not password hashes, and some extra
information like access level), which is always encrypted:
00000000: 0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34 .......4...x!.{4
00000010: ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34 .......4...x!.{4
00000020: e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34 ...x!.{4...x!.{4
00000030: ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b ...x!.{4...x!.{
Clearly, this isn't very good encryption; it uses a very short, repetitive
key of eight bytes (64 bits). (The data is mostly zero padding, which makes
it much easier to spot this.) In fact, in actual data tables, only five of
these bytes are set to a non-zero value, which means we have a 40-bit key;
export controls?
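Spotting the repetition by eye works here, but it is easy to check mechanically too. The sketch below counts how often a byte equals the byte a given distance later and reports the smallest distance where that happens at least half the time. The function names and the 0.5 threshold are assumptions picked for this dump; the approach only works because most of the plaintext is padding.

```python
# Encrypted Users table (four-character password), copied from the dump above.
USERS_4 = bytes.fromhex(
    "0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34"
    "ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34"
    "e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34"
    "ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b"
)


def repeat_score(data, period):
    """Fraction of bytes that equal the byte `period` positions later."""
    pairs = len(data) - period
    hits = sum(data[i] == data[i + period] for i in range(pairs))
    return hits / pairs


def guess_period(data, max_period=32, threshold=0.5):
    """Smallest period whose repeat score reaches the threshold, or None."""
    for period in range(1, max_period + 1):
        if repeat_score(data, period) >= threshold:
            return period
    return None


print(guess_period(USERS_4))
# 8
```

Taking the smallest qualifying period rather than the best-scoring one matters: multiples of the true period (16, 24, ...) score just as well or better.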
My first assumption here was of course XOR, but through some experimentation,
it turned out what you need is actually 8-bit subtraction (with wraparound).
The key used is derived from both a database key and a per-table key,
both stored in the RDRR; if you disassemble DataEase, I'm sure you can find
the key derivation function, but that's annoying. Note, by the way, that
this precludes making an attack by just copying tables between databases,
since the database key is different.
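The XOR-versus-subtraction experiment is quick to reproduce. This sketch takes the first eight ciphertext bytes of the Users dump and the eight bytes that keep repeating in its padded regions (assumed to be the key applied to zero bytes), tries both operations, and counts printable ASCII in the result. The function names are made up for illustration.

```python
# First eight ciphertext bytes of the Users dump, and the eight bytes that
# repeat in its padded regions (assumed: key applied to zero plaintext).
CIPHER = bytes.fromhex("0c 01 9f ed 94 f7 ed 34")
KEY = bytes.fromhex("ba 88 9f 78 21 92 7b 34")


def decrypt_xor(cipher, key):
    """Candidate 1: plaintext = ciphertext XOR key."""
    return bytes(c ^ k for c, k in zip(cipher, key))


def decrypt_sub(cipher, key):
    """Candidate 2: plaintext = ciphertext - key, wrapping around at 8 bits."""
    return bytes((c - k) % 256 for c, k in zip(cipher, key))


def printable_count(data):
    """Number of bytes in the printable ASCII range."""
    return sum(0x20 <= b < 0x7F for b in data)


for name, func in (("xor", decrypt_xor), ("subtract", decrypt_sub)):
    plain = func(CIPHER, KEY)
    print(f"{name:9}{plain.hex(' ')}  printable={printable_count(plain)}")
```

Only the subtraction variant produces the bytes 75 73 65 72 ("user"); XOR gives mostly high-bit garbage.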
So let's do a known-plaintext attack. If you assume the plaintext of the
bottom row is all zero padding, the ciphertext of that row is your key, and
here's what you end up with:
00000000: 52 79 00 75 73 65 72 00 00 00 00 00 00 00 00 00 Ry.user.........
00000010: 00 00 70 61 73 73 a3 00 00 00 01 00 00 00 00 00 ..pass..........
00000020: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 (...............
00000030: 00 00 00 00 00 00 00 00 ........
Not bad, eh? Actually the first byte of the key here is wrong as far as I
know, but it didn't interfere with the fields, so we have what we need to
log in. (At that point, we've won, because DataEase will helpfully decrypt
everything transparently for us.)
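The whole attack on the short-password table fits in a few lines. This is a sketch under the same assumptions as above: the key is eight bytes long and aligned to the start of the record, the last full eight-byte block is all zero padding, and decryption is bytewise subtraction. The helper names (recover_key, decrypt, printable_runs) are hypothetical.

```python
import re

# Encrypted Users table (four-character password), copied from the dump above.
USERS_4 = bytes.fromhex(
    "0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34"
    "ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34"
    "e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34"
    "ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b"
)


def recover_key(cipher, key_len=8):
    """Use the last full key-aligned block as the key (assumed zero padding)."""
    start = (len(cipher) // key_len - 1) * key_len
    return cipher[start:start + key_len]


def decrypt(cipher, key):
    """Subtract the repeating key bytewise, wrapping around at 8 bits."""
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(cipher))


def printable_runs(plain, min_len=3):
    """Runs of printable ASCII, a crude way to pick out text fields."""
    return re.findall(rb"[ -~]{%d,}" % min_len, plain)


key = recover_key(USERS_4)
plain = decrypt(USERS_4, key)
print(key.hex(" "))
print(printable_runs(plain))
# ba 88 9f 78 21 92 7b 34
# [b'user', b'pass']
```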
However, there's a twist; if the password is longer than four characters,
the entire decryption of the Users table changes. Of course, we could run
our plaintext attack against every data table and pick out the information
by decoding the structure, but again; annoying. So let's see what it looks
like if we choose passs instead:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d ...z..!..c.m...]
00000010: 70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d p..6~|...3.=sq.-
00000020: 7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd {.?...O.....CA..
00000030: 10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87 .k.u...ex.w....
Distinctly more confusing. At this point, of course, we know at which byte
positions the username and password start, so if we wanted to, we could just try
setting the start byte of the password to every possible byte in turn until
we hit 0x00 (DataEase truncates fields at the first zero byte), which would
allow us to get in with an empty password. However, I didn't know the
username either, and trying two bytes would mean 65536 tries, and I wasn't
up for automating macros through DOSBox. So an active attack wasn't
too tempting.
However, we can look at the last hex byte of each line (where we know the
plaintext is 0); it goes 0x5d, 0x2d, 0xfd... and some other bytes go 0x08,
0xd8, 0xa8, 0x78, and so on. So clearly there's an obfuscation here where we
have a per-line offset that decreases by 0x30 per line. (Actually, the
increase/decrease per line seems to be derived from the key somehow, too.)
If we remove that, we end up with:
00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d ...z..!..c.m...]
00000010: a0 fb 0f 66 ae ac c1 f5 08 63 08 6d a3 a1 17 5d ...f.....c.m...]
00000020: db fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 5d ....;9...c.m...]
00000030: a0 fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 ....;9...c.m...
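Undoing the per-line offset is just adding back a multiple of the step for each 16-byte line. The sketch below estimates the step from the last byte of the first two lines (both assumed to be zero plaintext) and then flattens the table. The value 0x30 is specific to this test table, and the helper names are assumptions.

```python
# Encrypted Users table (five-character password), copied from the dump above.
USERS_5 = bytes.fromhex(
    "0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d"
    "70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d"
    "7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd"
    "10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87"
)
ROW = 16  # bytes per line, matching the hexdump layout


def guess_step(data, row=ROW):
    """How much the last byte of a line drops from line 0 to line 1."""
    return (data[row - 1] - data[2 * row - 1]) % 256


def remove_line_offset(data, step, row=ROW):
    """Add line_number * step back to every byte, wrapping around at 8 bits."""
    return bytes((b + (i // row) * step) % 256 for i, b in enumerate(data))


def hexdump(data, row=ROW):
    for off in range(0, len(data), row):
        print(f"{off:08x}: {data[off:off + row].hex(' ')}")


step = guess_step(USERS_5)
print(hex(step))
# 0x30
hexdump(remove_line_offset(USERS_5, step))
```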
Well, OK, this wasn't much more complicated; our fixed key is now 16 bytes
long instead of 8 bytes long, but apart from that, we can do exactly the same
plaintext attack. (Also, it seems to change per-record now, but we don't see
it here, since we've only added one user.) Again, assume the last line is
supposed to be all 0x00 and thus use that as a key (plus the last byte from
the previous line), and we get:
00000000: 6e 06 00 75 73 65 72 00 00 00 00 00 00 00 00 00 n..user.........
00000010: 00 00 70 61 73 73 12 00 00 00 01 00 00 00 00 00 ..pass..........
00000020: 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ;...............
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...............
Well, OK, it wasn't perfect; we got pass\x12 instead of passs, so we
messed up somehow. I don't know exactly why the fifth character gets messed
up like this; actually, it cost me half an hour of trying because the
password looked very real but the database wouldn't let me in, but
eventually, we just guessed at what the missing letter was supposed to be.
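For completeness, here is the 16-byte variant of the attack as a sketch, starting from the table with the per-line offset already removed. The key is the last line plus the last byte of the line before it, as described above; the field offsets 3 and 18 are simply read off the decrypted dump, and the helper names are hypothetical. It reproduces the pass\x12 result, so the fifth character still has to be guessed.

```python
# Users table (five-character password) with the per-line offset removed,
# copied from the dump above.
FLAT = bytes.fromhex(
    "0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d"
    "a0 fb 0f 66 ae ac c1 f5 08 63 08 6d a3 a1 17 5d"
    "db fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 5d"
    "a0 fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17"
)

# Assumed: the last line is all zero plaintext; it is one byte short of a
# full 16-byte key, so borrow the final byte from the previous line.
KEY = FLAT[48:63] + FLAT[47:48]

USERNAME_AT = 3   # field offsets as seen in the decrypted dump
PASSWORD_AT = 18


def decrypt(cipher, key):
    """Subtract the repeating key bytewise, wrapping around at 8 bits."""
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(cipher))


def field(plain, start):
    """Bytes from `start` up to (not including) the first zero byte."""
    return plain[start:plain.index(0, start)]


plain = decrypt(FLAT, KEY)
print(field(plain, USERNAME_AT))
print(field(plain, PASSWORD_AT))
# b'user'
# b'pass\x12'
```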
So there you have it; practical small-scale cryptanalysis of DOS-era
homegrown encryption. Nothing advanced, but the user was happy about getting
the data back after a few hours of work. :-)