Huffman Coding

Huffman coding offers a way to compress data. The average length of a Huffman code depends on the statistical frequency with which the source produces each symbol from its alphabet. A Huffman code dictionary, which associates each data symbol with a codeword, has the property that no codeword in the dictionary is a prefix of any other codeword in the dictionary.

The huffmandict, huffmanenco, and huffmandeco functions support Huffman coding and decoding.


For long sequences from sources having skewed distributions and small alphabets, arithmetic coding compresses better than Huffman coding. To learn how to use arithmetic coding, see Arithmetic Coding.

Huffman coding requires statistical information about the source of the data being encoded. In particular, the p input argument in the huffmandict function lists the probability with which the source produces each symbol in its alphabet.

For example, consider a data source that produces 1s with probability 0.1, 2s with probability 0.1, and 3s with probability 0.8. The main computational step in encoding data from this source using a Huffman code is to create a dictionary that associates each data symbol with a codeword. The example below creates such a dictionary and then show the codeword vector associated with a particular value from the data source.

Create a Huffman Code Dictionary

This example shows how to create a Huffman code dictionary using the huffmandict function.

Create a vector of data symbols and assign a probability to each.

symbols = [1 2 3];
prob = [0.1 0.1 0.8];

Create a Huffman code dictionary. The most probable data symbol, 3, is associated with a one-digit codeword, while less probable data symbols are associated with two-digit codewords.

dict = huffmandict(symbols,prob)
dict=3×2 cell array
    {[1]}    {1x2 double}
    {[2]}    {1x2 double}
    {[3]}    {[       0]}

Display the second row of the dictionary. The output also shows that a Huffman encoder receiving the data symbol 2 substitutes the sequence 1 0.

ans = 2
ans = 1×2

     1     0

Create and Decode a Huffman Code

The example performs Huffman encoding and decoding using a source whose alphabet has three symbols. Notice that the huffmanenco and huffmandeco functions use the dictionary created by huffmandict.

Generate a data sequence to encode.

sig = repmat([3 3 1 3 3 3 3 3 2 3],1,50);

Define the set of data symbols and the probability associated with each element.

symbols = [1 2 3];
p = [0.1 0.1 0.8];

Create the Huffman code dictionary.

dict = huffmandict(symbols,p);

Encode and decode the data. Verify that the original data, sig, and the decoded data, dhsig, are identical.

hcode = huffmanenco(sig,dict);
dhsig = huffmandeco(hcode,dict);
ans = logical