General Purpose Compression Techniques
There are many popular general purpose lossless compression techniques, that can be applied to any type of data.
Run Length Encoding
Run Length Encoding
is a compression technique that replaces consecutive occurrences of a symbol with the symbol followed by the number of times it is repeated. For example, the string 111110000003355 could be represented by 15063252. Clearly this compression technique is most useful where symbols appear in long runs, and thus can sometimes be useful for images that have areas where the pixels all have the same value, cartoons for example.
is a transmission technique that attempts to improve efficiency by transmitting the difference between each value and its predecessor, in place of the value itself. Thus the values 15106433003 would be transmitted as 1+4-4-1+6-2-1+0-3+0+3. In effect the transmitter is predicting that each value is the same as its predecessor and the data transmitted is the difference between the predicted and actual values. Differential Pulse Code Modulation (DPCM) is an example of relative encoding.
The signal above can have one of 7 possible values (-3 to +3) and so would require 3 bits per sample. Each sample can also be described by the difference between it and the previous sample. Each sample is either the same, one more, or one less than the previous sample. Only two bits are required to express the relationship between the samples. Coding the signal in this way results in a reduction of one third in the number of bits.
is a popular compression technique that assigns variable length codes (VLC) to symbols, so that the most frequently occurring symbols have the shortest codes [Huff52
]. On decompression the symbols are reassigned their original fixed length codes. When used to compress text, for example, variable length codes are used in place of ASCII codes, and the most common characters, usually space
, and t
are assigned the shortest codes. In this way the total number of bits required to transmit the data can be considerably less than the number required if the fixed length representation is used. Huffman coding is particularly effective where the data are dominated by a small number of symbols.
Although Huffman coding is very efficient, it is only optimal when the symbol probabilities are integral powers of two. Arithmetic coding [Witt87] does not have this restriction and is usually more efficient than the more popular Huffman technique. Although more efficient than Huffman coding, arithmetic coding is more complex.
Lempel-Ziv compressors use a dictionary of symbol sequences. When an occurrence of the sequence is repeated it is replaced by a reference to its position in the dictionary. There are several variations of this coding technique and they differ primarily in the manner in which they manage the dictionary. The most well known of these techniques is the Lempel-Ziv-Welch variation.
© Colin E. Manning 1996