The Huffman algorithm is a popular method used for lossless data compression, primarily for constructing a variable-length prefix code.
Here is the detailed explanation of Huffman Algorithm:
- The Huffman algorithm is a widely-used method for constructing prefix codes, which are used in data compression.
- It was developed by David A. Huffman in 1952. The algorithm builds a binary tree called the Huffman tree, where the characters to be encoded are represented by the leaves of the tree.
- The construction of the tree is based on the frequency of occurrence of each character in the input data.
How the Huffman Algorithm Works:
1.) Frequency Calculation:
First, the algorithm calculates the frequency of each character in the input data.
2.) Construction of Huffman Tree:
Then, it constructs a binary tree using a greedy approach. It repeatedly merges the two least frequent characters into a single node until all characters are merged and the tree is complete.
3.) Assigning Codewords:
Finally, the algorithm assigns codewords to each character based on their position in the Huffman tree. Characters closer to the root of the tree have shorter codewords, while characters farther away have longer codewords. This property ensures that the Huffman code is prefix-free, meaning no codeword is a prefix of another codeword.
Let us understand the algorithm with an example:
character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Step 1. Build a min heap that contains 6 nodes where each node represents root of a tree with single node.
Step 2 Extract two minimum frequency nodes from min heap. Add a new internal node with frequency 5 + 9 = 14.
Now min heap contains 5 nodes where 4 nodes are roots of trees with single element each, and one heap node is root of tree with 3 elements.
character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 12 + 13 = 25.
Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.
character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 16 = 30.
Now min heap contains 3 nodes.
character Frequency
Internal Node 25
Internal Node 30
f 45
Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 25 + 30 = 55.
Now min heap contains 2 nodes.
character Frequency
f 45
Internal Node 55
Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 45 + 55 = 100.
Now min heap contains only one node.
character Frequency
Internal Node 100
Since the heap contains only one node, the algorithm stops here.
Steps to Print Codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a leaf node is encountered.
The codes are as follows:
character code-word
f 0
c 100
d 101
a 1100
b 1101
e 111
Below is the implementation of above approach:Â
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Scanner;
class Huffman {
// recursive function to print the
// huffman-code through the tree traversal.
// Here s is the huffman - code generated.
public static void printCode(HuffmanNode root, String s)
{
// base case; if the left and right are null
// then its a leaf node and we print
// the code s generated by traversing the tree.
if (root.left == null && root.right == null
&& Character.isLetter(root.c)) {
// c is the character in the node
System.out.println(root.c + ":" + s);
return;
}
// if we go to left then add "0" to the code.
// if we go to the right add"1" to the code.
// recursive calls for left and
// right sub-tree of the generated tree.
printCode(root.left, s + "0");
printCode(root.right, s + "1");
}
// main function
public static void main(String[] args)
{
Scanner s = new Scanner(System.in);
// number of characters.
int n = 6;
char[] charArray = { 'a', 'b', 'c', 'd', 'e', 'f' };
int[] charfreq = { 5, 9, 12, 13, 16, 45 };
// creating a priority queue q.
// makes a min-priority queue(min-heap).
PriorityQueue<HuffmanNode> q
= new PriorityQueue<HuffmanNode>(
n, new MyComparator());
for (int i = 0; i < n; i++) {
// creating a Huffman node object
// and add it to the priority queue.
HuffmanNode hn = new HuffmanNode();
hn.c = charArray[i];
hn.data = charfreq[i];
hn.left = null;
hn.right = null;
// add functions adds
// the huffman node to the queue.
q.add(hn);
}
// create a root node
HuffmanNode root = null;
// Here we will extract the two minimum value
// from the heap each time until
// its size reduces to 1, extract until
// all the nodes are extracted.
while (q.size() > 1) {
// first min extract.
HuffmanNode x = q.peek();
q.poll();
// second min extract.
HuffmanNode y = q.peek();
q.poll();
// new node f which is equal
HuffmanNode f = new HuffmanNode();
// to the sum of the frequency of the two nodes
// assigning values to the f node.
f.data = x.data + y.data;
f.c = '-';
// first extracted node as left child.
f.left = x;
// second extracted node as the right child.
f.right = y;
// marking the f node as the root node.
root = f;
// add this node to the priority-queue.
q.add(f);
}
// print the codes by traversing the tree
printCode(root, "");
}
}
// node class is the basic structure
// of each node present in the Huffman - tree.
class HuffmanNode {
int data;
char c;
HuffmanNode left;
HuffmanNode right;
}
// comparator class helps to compare the node
// on the basis of one of its attribute.
// Here we will be compared
// on the basis of data values of the nodes.
class MyComparator implements Comparator<HuffmanNode> {
public int compare(HuffmanNode x, HuffmanNode y)
{
return x.data - y.data;
}
}
Output:
f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111
Applications of Huffman Algorithm:
1.) Data Compression:
The primary application of the Huffman algorithm is in data compression, where it is used to generate efficient variable-length codes. These codes are used to represent data more compactly, reducing storage requirements and transmission time.
2.) File Compression:
Huffman coding is used in various file compression formats such as ZIP, gzip, and JPEG.
3.) Network Communication:
It is used in network communication protocols to reduce the amount of data transmitted over the network.
Advantages of Huffman Algorithm:
1.) Efficiency:
The Huffman algorithm produces optimal prefix codes, meaning it achieves the minimum possible average codeword length for a given set of symbol frequencies.
2.) Simplicity:
The algorithm is relatively simple to understand and implement, making it accessible for various applications.
3.) Lossless Compression:
Huffman coding is a lossless compression technique, meaning the original data can be perfectly reconstructed from the compressed data.
Disadvantages of Huffman Algorithm:
1.) Variable-Length Codes:
While variable-length codes generated by Huffman coding are more efficient than fixed-length codes for compressing data with varying symbol frequencies, they can make decoding slightly more complex and slower.
2.) Preprocessing Overhead:
The algorithm requires preprocessing to calculate symbol frequencies and construct the Huffman tree, which can add overhead, especially for large datasets.
3.) Context Dependence:
Huffman coding operates on individual symbols and does not consider correlations or dependencies between symbols, which may limit its effectiveness in certain types of data.