Internal Compression of DBPF Files
A reference written by Fogity
The internal compression format is a custom compression format available for package files. It is used by Maxis for [string table](../stbl-format/) resources, as it is optimised for text.
The format consists of a series of commands mixed with raw data.
Format Specification
- Binary
- Big Endian
- Most Significant Bit First
== FILE ==
HEADER      compression metadata
BLOCK[...]  a number of instruction and data blocks
END BLOCK   final instruction and data block

== HEADER ==
bit         extended size
bit[15]     unknown
IF 'extended size' set
    uint32  uncompressed size
ELSE
    uint24  uncompressed size

== BLOCK ==
ANY OF
    BLOCK A
    BLOCK B
    BLOCK C
    BLOCK D

== BLOCK A ==
bit = 0b0           identifying bit
bit[2]              highest bits of DO
bit[3]              DN
bit[2]              SN
byte                lowest byte of DO
byte[source count]  data
# source count = SN
# destination count = DN + 3
# destination offset = DO + 1

== BLOCK B ==
bit[2] = 0b10       identifying bits
bit[6]              DN
bit[2]              SN
bit[6]              highest bits of DO
byte                lowest byte of DO
byte[source count]  data
# source count = SN
# destination count = DN + 4
# destination offset = DO + 1

== BLOCK C ==
bit[3] = 0b110      identifying bits
bit                 highest bit of DO
bit[2]              highest bits of DN
bit[2]              SN
byte[2]             lowest bytes of DO
byte                lowest byte of DN
byte[source count]  data
# source count = SN
# destination count = DN + 5
# destination offset = DO + 1

== BLOCK D ==
bit[3] = 0b111      identifying bits
bit[5]              SN
byte[source count]  data
# source count = 4 * (SN + 1)
# destination count = 0
# destination offset = 0

== END BLOCK ==
bit[6] = 0b111111   identifying bits
bit[2]              SN
byte[source count]  data
# source count = SN
# destination count = 0
# destination offset = 0
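As an illustration of the header layout, the following is a minimal Python sketch of reading it. It assumes the flag bit and the 15 unknown bits together occupy two bytes, and that the size field is byte aligned: three bytes normally, four bytes when the extended size bit is set. The function name `parse_header` is hypothetical and not part of any existing library.

```python
def parse_header(data: bytes) -> tuple[int, int]:
    """Return (uncompressed_size, offset_of_first_block).

    Assumption: 2 flag bytes, then a 3-byte size, or a 4-byte size
    when the first (most significant) bit of the header is set.
    """
    if len(data) < 2:
        raise ValueError("header needs at least two bytes")

    # Big endian, most significant bit first: the 'extended size'
    # flag is the top bit of the first byte.
    flags = int.from_bytes(data[0:2], "big")
    extended = bool(flags & 0x8000)

    size_length = 4 if extended else 3
    end = 2 + size_length
    if len(data) < end:
        raise ValueError("truncated header")

    uncompressed_size = int.from_bytes(data[2:end], "big")
    return uncompressed_size, end
```

The returned offset is where the first block starts, so a decoder can continue reading from there. The uncompressed size is useful for preallocating the destination array or for checking the result afterwards.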
Decompression
Each block variant can be identified by its starting bits (MSBs). There is no overlap between variants, except that the end block overlaps with block D; the end block should take precedence when parsing blocks.
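The identification rule can be sketched as a small Python function that looks only at the first byte of a block. This is a sketch under one assumption: the end block fills exactly one byte, with six set identifying bits followed by the two-bit SN, so it corresponds to first bytes 0xFC through 0xFF. The name `block_kind` is hypothetical.

```python
def block_kind(first_byte: int) -> str:
    """Classify a block by the most significant bits of its first byte."""
    if first_byte & 0x80 == 0x00:   # 0b0.......
        return "A"
    if first_byte & 0xC0 == 0x80:   # 0b10......
        return "B"
    if first_byte & 0xE0 == 0xC0:   # 0b110.....
        return "C"
    # Everything left starts with 0b111. The end block is checked
    # before block D because it takes precedence.
    if first_byte & 0xFC == 0xFC:   # 0b111111..
        return "END"
    return "D"
```

Under this assumption, block D can only carry SN values from 0 to 27, because the four highest values are claimed by the end block.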
During decompression the source array will be read once in order, while the destination array may be accessed from anywhere up to (and including) the last written byte.
After parsing a block, the block data (given by the source count) should be appended to the destination array. Then a number of bytes (given by the destination count) should be copied from the destination array and appended to the array (the offset to copy from is given by destination offset). It is safest to do this byte by byte as the bytes to copy might overlap with the bytes being appended (for repeating data).
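The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes that the destination offset is counted backwards from the current end of the destination array (an offset of 1 meaning the last written byte), that the end block uses six identifying bits, and that `src` starts at the first block, after the header. The name `decompress_blocks` is hypothetical.

```python
def decompress_blocks(src: bytes, pos: int = 0) -> bytes:
    """Decode blocks starting at src[pos] until the end block is read."""
    out = bytearray()
    while True:
        b0 = src[pos]
        if b0 < 0x80:                      # block A: 2-byte command
            b1 = src[pos + 1]
            sn = b0 & 0x03
            dn = ((b0 >> 2) & 0x07) + 3
            do = (((b0 >> 5) & 0x03) << 8 | b1) + 1
            pos += 2
        elif b0 < 0xC0:                    # block B: 3-byte command
            b1, b2 = src[pos + 1], src[pos + 2]
            dn = (b0 & 0x3F) + 4
            sn = b1 >> 6
            do = ((b1 & 0x3F) << 8 | b2) + 1
            pos += 3
        elif b0 < 0xE0:                    # block C: 4-byte command
            b1, b2, b3 = src[pos + 1], src[pos + 2], src[pos + 3]
            sn = b0 & 0x03
            dn = (((b0 >> 2) & 0x03) << 8 | b3) + 5
            do = (((b0 >> 4) & 0x01) << 16 | b1 << 8 | b2) + 1
            pos += 4
        elif b0 < 0xFC:                    # block D: raw data only
            sn = ((b0 & 0x1F) + 1) * 4
            dn = do = 0
            pos += 1
        else:                              # end block (takes precedence)
            sn = b0 & 0x03
            dn = do = 0
            pos += 1

        # Append the raw data that follows the command.
        if pos + sn > len(src):
            raise ValueError("truncated block data")
        out += src[pos:pos + sn]
        pos += sn

        # Copy byte by byte so that overlapping (repeating) runs work:
        # out[-do] always refers to the byte 'do' positions behind the
        # current end, including bytes appended earlier in this loop.
        if dn:
            if do > len(out):
                raise ValueError("destination offset out of range")
            for _ in range(dn):
                out.append(out[-do])

        if b0 >= 0xFC:
            return bytes(out)
```

For example, a block A with one raw byte `x`, an offset of 1, and a count of 3 produces `xxxx`: each copied byte is read from the position just written, which is why a bulk slice copy would not be safe here.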
Originally written by Fogity on GitLab.