Data Structure & Algorithm in Java

⌘K
  1. Home
  2. Docs
  3. Data Structure & Alg...
  4. Trees
  5. Huffman Algorithm

Huffman Algorithm

The Huffman algorithm is a popular method used for lossless data compression, primarily for constructing a variable-length prefix code.

Here is the detailed explanation of Huffman Algorithm:

  • The Huffman algorithm is a widely-used method for constructing prefix codes, which are used in data compression.
  • It was developed by David A. Huffman in 1952. The algorithm builds a binary tree called the Huffman tree, where the characters to be encoded are represented by the leaves of the tree.
  • The construction of the tree is based on the frequency of occurrence of each character in the input data.

1.) Frequency Calculation:

First, the algorithm calculates the frequency of each character in the input data.

2.) Construction of Huffman Tree:

Then, it constructs a binary tree using a greedy approach. It repeatedly merges the two least frequent characters into a single node until all characters are merged and the tree is complete.

3.) Assigning Codewords:

Finally, the algorithm assigns codewords to each character based on their position in the Huffman tree. Characters closer to the root of the tree have shorter codewords, while characters farther away have longer codewords. This property ensures that the Huffman code is prefix-free, meaning no codeword is a prefix of another codeword.

Let us understand the algorithm with an example:

character   Frequency
    a            5
    b           9
    c           12
    d           13
    e           16
    f           45

Step 1. Build a min heap that contains 6 nodes where each node represents root of a tree with single node.
Step 2 Extract two minimum frequency nodes from min heap. Add a new internal node with frequency 5 + 9 = 14. 

Screenshot 2024 03 30 193648

Now min heap contains 5 nodes where 4 nodes are roots of trees with single element each, and one heap node is root of tree with 3 elements.

character           Frequency
       c               12
       d               13
 Internal Node         14
       e               16
       f                45

Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 12 + 13 = 25.

Screenshot 2024 03 30 193957

Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

character           Frequency
Internal Node          14
       e               16
Internal Node          25
       f               45

Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 16 = 30.

Screenshot 2024 03 30 194325

Now min heap contains 3 nodes.

character          Frequency
Internal Node         25
Internal Node         30
      f               45 

Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 25 + 30 = 55.

Screenshot 2024 03 30 194603

Now min heap contains 2 nodes.

character     Frequency
       f         45
Internal Node    55

Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 45 + 55 = 100.

Screenshot 2024 03 30 194746

Now min heap contains only one node.

character      Frequency
Internal Node    100

Since the heap contains only one node, the algorithm stops here.

Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a leaf node is encountered.

Screenshot 2024 03 30 195051

The codes are as follows:

character   code-word
    f          0
    c          100
    d          101
    a          1100
    b          1101
    e          111

Below is the implementation of above approach: 

import java.util.Comparator; 
import java.util.PriorityQueue; 
import java.util.Scanner; 
  
class Huffman { 
  
    // recursive function to print the 
    // huffman-code through the tree traversal. 
    // Here s is the huffman - code generated. 
    public static void printCode(HuffmanNode root, String s) 
    { 
  
        // base case; if the left and right are null 
        // then its a leaf node and we print 
        // the code s generated by traversing the tree. 
        if (root.left == null && root.right == null
            && Character.isLetter(root.c)) { 
  
            // c is the character in the node 
            System.out.println(root.c + ":" + s); 
  
            return; 
        } 
  
        // if we go to left then add "0" to the code. 
        // if we go to the right add"1" to the code. 
  
        // recursive calls for left and 
        // right sub-tree of the generated tree. 
        printCode(root.left, s + "0"); 
        printCode(root.right, s + "1"); 
    } 
  
    // main function 
    public static void main(String[] args) 
    { 
  
        Scanner s = new Scanner(System.in); 
  
        // number of characters. 
        int n = 6; 
        char[] charArray = { 'a', 'b', 'c', 'd', 'e', 'f' }; 
        int[] charfreq = { 5, 9, 12, 13, 16, 45 }; 
  
        // creating a priority queue q. 
        // makes a min-priority queue(min-heap). 
        PriorityQueue<HuffmanNode> q 
            = new PriorityQueue<HuffmanNode>( 
                n, new MyComparator()); 
  
        for (int i = 0; i < n; i++) { 
  
            // creating a Huffman node object 
            // and add it to the priority queue. 
            HuffmanNode hn = new HuffmanNode(); 
  
            hn.c = charArray[i]; 
            hn.data = charfreq[i]; 
  
            hn.left = null; 
            hn.right = null; 
  
            // add functions adds 
            // the huffman node to the queue. 
            q.add(hn); 
        } 
  
        // create a root node 
        HuffmanNode root = null; 
  
        // Here we will extract the two minimum value 
        // from the heap each time until 
        // its size reduces to 1, extract until 
        // all the nodes are extracted. 
        while (q.size() > 1) { 
  
            // first min extract. 
            HuffmanNode x = q.peek(); 
            q.poll(); 
  
            // second min extract. 
            HuffmanNode y = q.peek(); 
            q.poll(); 
  
            // new node f which is equal 
            HuffmanNode f = new HuffmanNode(); 
  
            // to the sum of the frequency of the two nodes 
            // assigning values to the f node. 
            f.data = x.data + y.data; 
            f.c = '-'; 
  
            // first extracted node as left child. 
            f.left = x; 
  
            // second extracted node as the right child. 
            f.right = y; 
  
            // marking the f node as the root node. 
            root = f; 
  
            // add this node to the priority-queue. 
            q.add(f); 
        } 
  
        // print the codes by traversing the tree 
        printCode(root, ""); 
    } 
} 
  
// node class is the basic structure 
// of each node present in the Huffman - tree. 
class HuffmanNode { 
  
    int data; 
    char c; 
  
    HuffmanNode left; 
    HuffmanNode right; 
} 
  
// comparator class helps to compare the node 
// on the basis of one of its attribute. 
// Here we will be compared 
// on the basis of data values of the nodes. 
class MyComparator implements Comparator<HuffmanNode> { 
    public int compare(HuffmanNode x, HuffmanNode y) 
    { 
  
        return x.data - y.data; 
    } 
} 

Output:

f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111

1.) Data Compression:

The primary application of the Huffman algorithm is in data compression, where it is used to generate efficient variable-length codes. These codes are used to represent data more compactly, reducing storage requirements and transmission time.

2.) File Compression:

Huffman coding is used in various file compression formats such as ZIP, gzip, and JPEG.

3.) Network Communication:

It is used in network communication protocols to reduce the amount of data transmitted over the network.

1.) Efficiency:

The Huffman algorithm produces optimal prefix codes, meaning it achieves the minimum possible average codeword length for a given set of symbol frequencies.

2.) Simplicity:

The algorithm is relatively simple to understand and implement, making it accessible for various applications.

3.) Lossless Compression:

Huffman coding is a lossless compression technique, meaning the original data can be perfectly reconstructed from the compressed data.

1.) Variable-Length Codes:

While variable-length codes generated by Huffman coding are more efficient than fixed-length codes for compressing data with varying symbol frequencies, they can make decoding slightly more complex and slower.

2.) Preprocessing Overhead:

The algorithm requires preprocessing to calculate symbol frequencies and construct the Huffman tree, which can add overhead, especially for large datasets.

3.) Context Dependence:

Huffman coding operates on individual symbols and does not consider correlations or dependencies between symbols, which may limit its effectiveness in certain types of data.

How can we help?

Leave a Reply

Your email address will not be published. Required fields are marked *