Sylvain Beucler: dot-zed archive file format
TL;DR: I reverse-engineered the .zed encrypted archive format.
Following a clean-room design, I'm providing a description that can be implemented by a third party.
Interested? (reference version at: https://www.beuc.net/zed/)
.zed archive file format
Introduction
Archives with the .zed extension are conceptually similar to an encrypted .zip file.
In addition to a specific format, .zed files support multiple users: files are encrypted using the archive master key, which itself is encrypted for each user and/or authentication method (password, RSA key through certificate or PKCS#11 token). Metadata such as filenames is partially encrypted.
.zed archives are used stand-alone or attached to e-mails with the help of an MS Outlook plugin. A variant, which is not covered here, can encrypt/decrypt MS Windows folders on the fly like ecryptfs.
In the spirit of academic and independent research this document provides a description of the file format and encryption algorithms for this encrypted file archive.
See the conventions section for conventions and acronyms used in this document.
Structure overview
The .zed file format is composed of several layers.
- The main container uses the Compound File Binary format (MS-CFB), which is notably used by MS
  Office 97-2003 .doc files. It contains several streams:
  - Metadata stream: in OLE Property Set format
    (MS-OLEPS), contains 2 blobs in a specific Type-Length-Value
    (TLV) format:
    - _ctlfile: global archive properties and access list
      It is obfuscated by means of static-key AES encryption.
      The properties include the archive's initial filename and a global IV.
      A global encryption key is itself encrypted in each user entry.
    - _catalog: file list
      Contains each file's metadata, indexed with a 15-byte identifier.
      Directories are supported.
      Full filename is encrypted using AES.
      File extension is (redundantly) stored in clear, and so are file metadata such as modification time.
  - Each file in the archive is compressed with zlib and encrypted with
    the standard AES algorithm, in a separate stream.
    Several encryption schemes and key sizes are supported.
    The file stream is split in chunks of 512 bytes, individually encrypted.
  - Optional streams contain additional metadata as well as pictures to display in the application background ("watermarks"). They are not discussed here.
+----------------------------------------------------------------------------------------------------+
| .zed archive (MS-CFB)                                                                              |
|  stream #1                         stream #2                      stream #3...                     |
| +------------------------------+  +---------------------------+  +---------------------------+     |
| | metadata (MS-OLEPS)          |  | encryption (AES)          |  | encryption (AES)          |     |
| |                              |  | 512-byte chunks           |  | 512-byte chunks           |     |
| | +--------------------------+ |  |                           |  |                           |     |
| | | obfuscation (static key) | |  | +-----------------------+ |  | +-----------------------+ |     |
| | | +----------------------+ | |  |-| compression (zlib)    |-|  |-| compression (zlib)    |-|     |
| | | | _ctlfile (TLV)       | | |  | |                       | |  | |                       | | ... |
| | | +----------------------+ | |  | | +---------------+     | |  | | +---------------+     | |     |
| | +--------------------------+ |  | | | file contents |     | |  | | | file contents |     | |     |
| | +--------------------------+ |  |-| +---------------+     |-|  |-| +---------------+     |-|     |
| | | _catalog (TLV)           | |  | |                       | |  | |                       | |     |
| | +--------------------------+ |  | +-----------------------+ |  | +-----------------------+ |     |
| +------------------------------+  +---------------------------+  +---------------------------+     |
+----------------------------------------------------------------------------------------------------+
Encryption schemes
Several AES key sizes are supported, such as 128 and 256 bits.
The Cipher Block Chaining (CBC) block cipher mode of operation is used
to decrypt multiple AES 16-byte blocks, which means an initialisation
vector (IV) is stored in clear along with the ciphertext.
All filenames and file contents are encrypted using the same
encryption mode, key and IV (e.g. if you remove and re-add a file in
the archive, the resulting stream will be identical).
No cleartext padding is used during encryption; instead, several
end-of-stream handlers are available, so the ciphertext has exactly
the size of the cleartext (e.g. the size of the compressed file).
The following variants were identified in the 'encryption_mode'
field.
STREAM
This is the end-of-stream handler for:
- obfuscated metadata encrypted with static AES key
- filenames and files in archives with 'encryption_mode' set to "AES-CBC-STREAM"
- any AES ciphertext of size < 16 bytes, regardless of encryption mode
When the data does not end on a 16-byte boundary, the last partial block is processed as follows:
- the second-to-last block of the ciphertext is encrypted in AES-ECB mode (i.e. block cipher encryption only, without XORing with the IV)
- the result is then XOR-ed with the last partial block (hence truncated to the length of the partial block)
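To make the "STREAM" handling concrete, here is a minimal Python sketch. It is not taken from any reference implementation. The block cipher is passed in as two callables (`encrypt_block`, `decrypt_block`, both hypothetical names) so that any AES library can be plugged in. The use of the IV when the data is shorter than one block is an assumption of this sketch, since no second-to-last block exists in that case.

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes

# One raw block operation (AES-ECB on a single 16-byte block, key already set).
BlockFn = Callable[[bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings; the result is truncated to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_tail(prev_block: bytes, tail: bytes, encrypt_block: BlockFn) -> bytes:
    """Process the last partial block.

    prev_block is the second-to-last ciphertext block.
    ASSUMPTION: for data shorter than one block, the caller passes the IV.
    """
    if not 0 < len(tail) < BLOCK:
        raise ValueError("tail must be a non-empty partial block")
    keystream = encrypt_block(prev_block)  # block encryption only, no IV
    return xor(keystream, tail)            # truncated to len(tail)


def decrypt_cbc_stream(ciphertext: bytes, iv: bytes,
                       decrypt_block: BlockFn, encrypt_block: BlockFn) -> bytes:
    """CBC-decrypt the full blocks, then handle the partial block, if any."""
    full = len(ciphertext) - len(ciphertext) % BLOCK
    out = bytearray()
    prev = iv
    for i in range(0, full, BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(decrypt_block(block), prev)  # standard CBC decryption
        prev = block
    if full < len(ciphertext):
        out += stream_tail(prev, ciphertext[full:], encrypt_block)
    return bytes(out)
```

Since the last step is a plain XOR with a keystream derived from the preceding ciphertext block, applying `stream_tail` a second time with the same preceding block returns the original bytes.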
CTS
CTS (CipherText Stealing) is the end-of-stream handler for:
- filenames and files in archives with 'encryption_mode' set to "AES-CBC-CTS".
- exception: if the size of the ciphertext is < 16 bytes, then "STREAM" is used instead.
- static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by 0100 (hexadecimal) (18 bytes total)
- 16-byte IV
- ciphertext
- 1 uint32be representing the length of all the above
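The four-part layout above (delimiter, IV, ciphertext, trailing length) can be split with a few lines of Python. This is only a sketch: the function name is hypothetical, and I am assuming that "the length of all the above" counts the 18-byte delimiter, the IV and the ciphertext, but not the 4 length bytes themselves.

```python
import struct
from typing import Tuple

DELIMITER = bytes.fromhex("0765921A2A0774534752073361719300")
HEADER = DELIMITER + bytes.fromhex("0100")  # 18 bytes in total
IV_SIZE = 16


def split_obfuscated_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (iv, ciphertext) from a delimiter/IV/ciphertext/length blob."""
    if len(blob) < len(HEADER) + IV_SIZE + 4:
        raise ValueError("blob too short")
    if not blob.startswith(HEADER):
        raise ValueError("unexpected delimiter")
    # Trailing uint32be. ASSUMPTION: it excludes its own 4 bytes.
    (declared,) = struct.unpack(">I", blob[-4:])
    if declared != len(blob) - 4:
        raise ValueError("trailing length does not match")
    iv = blob[len(HEADER):len(HEADER) + IV_SIZE]
    ciphertext = blob[len(HEADER) + IV_SIZE:-4]
    return iv, ciphertext
```

If the length check turns out to be too strict for real archives, it can be relaxed to a warning without affecting the split itself.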
- static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by "ZoneCentral (R)" (ASCII) and a NUL byte (32 bytes total)
- global archive properties as a 'fileprops' structure
- extra archive properties as an 'archive_extraprops' structure
- users access list as a series of 'passworduser' and 'rsauser' entries
- 4 bytes for Type (specified as a 4-bytes hexadecimal below)
- 4 bytes for value Length (uint32be)
- Value
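As an illustration of this Type-Length-Value layout, here is a small Python iterator (the function name is mine). It yields the Type as an 8-character hexadecimal string, matching the notation used in this document, together with the raw Value bytes.

```python
import struct
from typing import Iterator, Tuple


def iter_tlv(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type_hex, value) for each TLV entry found in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated TLV header")
        type_hex = data[offset:offset + 4].hex()  # 4 bytes for Type
        (length,) = struct.unpack(">I", data[offset + 4:offset + 8])  # uint32be
        offset += 8
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        yield type_hex, value
        offset += length
```

Presumably the Value of an entry described as a "TLV structure" can be passed to `iter_tlv` again to walk its nested fields, but I have not shown that here.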
- 80110600: fileprops, used for the file list as well as for the global archive properties
- 001b0600: archive_extraprops
- 80140600: accesslist
- 80110600: fileprops (TLV structure): global archive properties
  - 00230400: archive_pathname (UTF-16LE string): initial archive filename (past versions also leaked the full pathname of the initial archive)
  - 80270200: encryption_mode (uint32be): 103 for "AES-CBC-STREAM", 104 for "AES-CBC-CTS"
  - 80260200: encryption_strength (uint32be): AES key size, in bytes (e.g. 32 means AES with a 256-bit key)
  - 80280500: files_iv (sequence of bytes): global IV for all filenames and file contents
- 001b0600: archive_extraprops (TLV structure): additional archive properties (optional)
  - 00c40500: archive_creationtime (FILETIME): date and time when archive was initially created (optional)
  - 00c00400: archive_createdwith (UTF-16LE string): uuid-like structure describing the application that initialized the archive (optional)
    e.g. 00000188-1000-3CA8-8868-36F59DEFD14D is Zed! Free 1.0.188.
- 80140600: accesslist (TLV structure): describes the users, their key encryption and their permissions
  - 80610600: passworduser (TLV structure): user identified by password (0 or more)
  - 80620600: rsauser (TLV structure): user identified by RSA key (via file or PKCS#11 token) (0 or more)
  - Fields common to passworduser and rsauser:
    - 80710400: login (UTF-16LE string): user name
    - 80720300: login_md5 (sequence of bytes): used by the application to search for a user name
    - 807e0100: priv1 (uchar): user privileges; present and set to 1 when user is admin (optional)
    - 00830200: priv2 (uint32be): user privileges; present and set to 2 when user is admin, present and set to 5 when user is marked as mandatory, e.g. for recovery keys (optional)
    - 80740500: files_key_ciphertext (sequence of bytes): the archive encryption key, itself encrypted
    - 00840500: user_creationtime (FILETIME): date and time when the user was added to the archive
  - passworduser-specific fields:
    - 80760500: pbe_salt (sequence of bytes): salt for PBE
    - 80770200: pbe_iter (uint32be): number of iterations for PBE
    - 80780200: pkcs12_hashfunc (uint32be): hash function used for PBE and PBA key derivation
    - 80790500: pba_checksum (sequence of bytes): password derived with PBA to check for password validity
    - 807a0500: pba_salt (sequence of bytes): salt for PBA
    - 807b0200: pba_iter (uint32be): number of iterations for PBA
  - rsauser-specific fields:
    - 807d0500: certificate (sequence of bytes): user X509 certificate in DER format
- 80110600: fileprops (TLV structure): describes the archive files (0 or more)
  - 80300500: file_id (sequence of bytes): a 16-byte unique identifier
  - 80310400: filename_halfanon (UTF-16LE string): half-anonymized filename, e.g. File1.txt (leaking filename extension)
  - 00380500: filename_ciphertext (sequence of bytes): encrypted filename; may have a trailing NUL byte once decrypted
  - 80330500: file_size (uint64le): decompressed file size in bytes
  - 80340500: file_creationtime (FILETIME): file creation date and time
  - 80350500: file_lastwritetime (FILETIME): file last modification date and time
  - 80360500: file_lastaccesstime (FILETIME): file last access date and time
  - 00370500: parent_directory_id (sequence of bytes): file_id of the parent directory, 0 is top-level
  - 80320100: is_dir (uint32be): 1 if entry is directory (optional)
- 21: SHA-1
- 22: SHA-256
- ID: 3
- 'pba_salt': the salt, typically an 8-byte random sequence
- 'pba_iter': the iteration count, typically 200000
- 'pbe_salt': the salt, typically an 8-byte random sequence
- 'pbe_iter': the iteration count, typically 100000
- ID: 1
- size: 32
- ID: 2
- size: 16
Initial offset:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Shuffled offset: 3 2 1 0 5 4 7 6 8 9 10 11 12 13 14 15
The 16th byte is usually a NUL byte, hence the stream identifier is a
30-character-long string.
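The byte shuffle from the offset table can be written as a lookup. In this sketch `shuffle_16` implements only the table. `stream_identifier` is a hypothetical helper resting on two assumptions of mine: that the shuffled bytes are the file's 16-byte identifier, and that the identifier is their hexadecimal form with a trailing NUL dropped (which is what gives 30 characters for 15 bytes). The letter case of the hexadecimal digits is not addressed here.

```python
# Position i of the output takes the input byte at SHUFFLE[i].
SHUFFLE = (3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15)


def shuffle_16(data: bytes) -> bytes:
    """Reorder 16 bytes according to the offset table."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    return bytes(data[i] for i in SHUFFLE)


def stream_identifier(file_id: bytes) -> str:
    """ASSUMPTION: hex form of the shuffled bytes, trailing NUL dropped."""
    shuffled = shuffle_16(file_id)
    if shuffled[-1] == 0:
        shuffled = shuffled[:-1]
    return shuffled.hex()
```

The table only swaps bytes within the first 4-byte group and within the next two 2-byte groups, so applying `shuffle_16` twice gives back the input.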
Decrypting files
The compressed stream is split in chunks of 512 bytes, each of them
encrypted separately using AES-CBC and the global archive encryption
scheme. Decryption uses the global AES key (retrieved using the user
credentials), and the global IV (retrieved from the deobfuscated
archive metadata).
The IV for each chunk is computed by:
- expressing the current chunk number as little endian on 16 bytes
- XORing it with the global IV
- encrypting with the global AES key in ECB mode (without IV).
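The three steps above can be sketched in Python as follows. As before, the AES operation is passed in as a callable (`encrypt_block`, a hypothetical name for a single-block ECB encryption under the global key), so that the example does not depend on a particular crypto library. Whether chunk numbering starts at 0 or 1 is not stated in this section, so the chunk number is simply a parameter.

```python
from typing import Callable, List

CHUNK_SIZE = 512  # bytes per encrypted chunk


def chunk_iv(chunk_number: int, global_iv: bytes,
             encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Derive the CBC IV of one chunk from its number and the global IV."""
    if len(global_iv) != 16:
        raise ValueError("global IV must be 16 bytes")
    counter = chunk_number.to_bytes(16, "little")             # step 1
    mixed = bytes(a ^ b for a, b in zip(counter, global_iv))  # step 2
    return encrypt_block(mixed)                               # step 3 (ECB)


def split_chunks(stream: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a file stream into chunks; the last one may be shorter."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]
```

Each chunk would then be decrypted on its own with AES-CBC, using the IV returned by `chunk_iv` and the archive's end-of-stream handler for a final chunk that is not a multiple of 16 bytes.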
Conventions
- uint32le: unsigned int on 4 bytes, little endian.
- uint32be: unsigned int on 4 bytes, big endian.
- uint64le: unsigned int on 8 bytes, little endian.
- uchar: unsigned char on 1 byte.
- (hexadecimal): represents 1 byte with 2 hexadecimal digits.
  e.g. 4142 (hexadecimal) means decimal values 65 and then 66.
- (ascii): represents 1 byte with the matching 1-byte ASCII value.
  e.g. 'AB\0' means decimal values 65 then 66 and then 0.
- UTF-16LE: text encoded using 2-byte Unicode, little-endian, without Byte Order Mark (BOM), as commonly used under MS Windows.
- IV: initialization vector for AES encryption and decryption.
- CBC: Cipher Block Chaining, see Recommendation for Block Cipher Modes of Operation and Wikipedia.
- CTS: CipherText Stealing, one type of end-of-stream handling, see Recommendation for Block Cipher Modes of Operation: Three Variants of Ciphertext Stealing for CBC Mode and Wikipedia.
- PBA, PBE: password-based authentication, password-based encryption; see PKCS #11 Cryptographic Token Interface Current Mechanisms Specification.
- MS-CFB: Compound File Binary Format, documented at MSDN, libolecf, Wikipedia.
  Libraries that can manipulate this format include olefile, libolecf and libforensics.
- MS-OLEPS: OLE Property Set, documented at MSDN and libfole.
  The 'oleps.py' demo from libforensics displays detailed information on MS-OLEPS streams; olefile and libolecf have basic support.
- TLV: Type-Length-Value encoding; 4 bytes for type, 4 bytes for length (uint32be), and a variable-sized value; see Type-length-value at Wikipedia for the general principle.
- FILETIME: timestamp as a count of 1/10000000 s (100-nanosecond) intervals since January 1, 1601 (UTC), see FILETIME structure at MSDN.
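For reference, a FILETIME value can be converted to a Python datetime as sketched below. The function takes the timestamp as an integer, because the byte order used to store it in the archive is not stated in this list. Python datetimes have microsecond resolution, so the last decimal digit of the count is dropped.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a count of 100 ns intervals since 1601-01-01 UTC."""
    # 10 ticks make one microsecond; integer division avoids float rounding.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

For example, `filetime_to_datetime(116444736000000000)` gives 1970-01-01 00:00:00 UTC, the Unix epoch.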