Here is a detailed explanation of Hashing and Hash Functions:
Hashing in data structures is a technique of mapping a large amount of data into a small table using a hash function.
- It uniquely identifies a specific item from a collection of similar items. In cryptographic contexts, the output of a hash function is also known as a message digest.
- It is the process of mapping data of arbitrary size to fixed-size values called hash codes or hash values, and it is used to quickly locate a data record, for example in a database.
- A hash function is a mathematical function that converts input data into a fixed-size value. It is designed to be fast and deterministic: the same input always produces the same output.
- The output of a hash function is known as a hash code or hash value.
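As a minimal sketch of this idea (the helper name simple_hash and the table size of 10 are assumptions for illustration, not from the text), the snippet below maps string keys of arbitrary length to a small fixed range of indices:

```python
def simple_hash(key, table_size=10):
    """Map a string key of arbitrary length to a fixed index in 0..table_size-1."""
    total = sum(ord(ch) for ch in key)  # combine the characters of the key
    return total % table_size           # compress the result into the fixed range

# Deterministic: the same key always yields the same index.
print(simple_hash("apple"))
print(simple_hash("banana"))
```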
Importance of Hashing:
- Efficient Data Storage: Hash tables provide efficient storage of key-value pairs by distributing data across buckets based on hash codes. This allows for optimized memory usage and faster access to elements.
- Collision Handling: Hashing techniques offer collision resolution strategies to manage scenarios where different keys hash to the same value. Effective collision resolution methods ensure data integrity and maintain performance in hash-based data structures.
- Data Security: Hashing is widely used in data security applications like password hashing. By converting sensitive information into hash values, it enhances data protection and confidentiality.
- Indexing and Searching: Hashing plays a crucial role in indexing and searching applications such as databases and search engines.
Types of Hashing
Static Hashing:
Static hashing is a technique used in data structures and algorithms where the number of buckets or slots in the hash table remains fixed throughout the operation.
- In static hashing, a fixed number of buckets are allocated in advance, and the hash function determines the bucket in which each key-value pair will be stored.
- Static hashing is efficient for scenarios where the number of elements to be stored is known in advance and remains relatively stable.
- However, it may become inefficient if the number of elements grows significantly, since more keys per bucket means more collisions and degraded performance.
- Despite this limitation, static hashing offers simplicity and predictability in managing data storage and retrieval operations, making it a suitable choice for applications with a consistent workload and data size (a short sketch follows this list).
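Below is a minimal sketch of static hashing, assuming a fixed bucket count chosen up front and separate chaining for collision handling (the class name StaticHashTable and the bucket count of 8 are illustrative assumptions):

```python
class StaticHashTable:
    """Hash table with a fixed number of buckets (static hashing)."""

    def __init__(self, num_buckets=8):
        self.num_buckets = num_buckets                   # fixed for the table's lifetime
        self.buckets = [[] for _ in range(num_buckets)]  # separate chaining

    def _index(self, key):
        return hash(key) % self.num_buckets              # hash function picks the bucket

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                                 # key already stored: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))                      # otherwise chain a new entry

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                                      # key not present

table = StaticHashTable()
table.put("apple", 3)
print(table.get("apple"))  # 3
```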
Complexity Analysis of Static Hashing:
- Insertion:
- Time Complexity: O(1) on average; O(N) in the worst case when many keys collide in the same bucket, where N is the number of stored elements.
- Space Complexity: O(1) per insertion, as the number of buckets remains fixed in static hashing.
- Search:
- Time Complexity: O(1) on average; O(N) in the worst case.
- Deletion:
- Time Complexity: O(1) on average; O(N) in the worst case.
- Overall Space: O(M + N), where M is the fixed number of buckets and N is the number of stored elements.
Dynamic Hashing:
Dynamic hashing is a method of hashing in which the data structure grows and shrinks dynamically as records are added or removed.
- Some dynamic hashing schemes grow the table incrementally, so that not all existing keys need to be rehashed at once, which can be a costly operation.
- More commonly, when the number of elements in the hash table exceeds a certain threshold, dynamic hashing triggers a rehashing process in which the hash table is resized and the elements are redistributed to new locations.
- This helps avoid clustering and maintains a more uniform distribution of elements across the hash table (a sketch of threshold-based resizing follows this list).
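As a rough sketch of threshold-based resizing (the load-factor limit of 0.75 and the class name DynamicHashTable are assumptions for illustration), the table below doubles its bucket count and redistributes the existing keys once it becomes too full:

```python
class DynamicHashTable:
    """Hash table that resizes and rehashes when the load factor crosses a threshold."""

    def __init__(self, num_buckets=4, max_load=0.75):
        self.num_buckets = num_buckets
        self.max_load = max_load          # load-factor threshold (an illustrative choice)
        self.size = 0
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key, num_buckets=None):
        return hash(key) % (num_buckets or self.num_buckets)

    def _rehash(self):
        """Double the bucket count and redistribute every existing key."""
        new_count = self.num_buckets * 2
        new_buckets = [[] for _ in range(new_count)]
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[self._index(key, new_count)].append((key, value))
        self.num_buckets, self.buckets = new_count, new_buckets

    def put(self, key, value):
        if (self.size + 1) / self.num_buckets > self.max_load:
            self._rehash()                               # grow before the table gets crowded
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)                 # update existing key
                return
        bucket.append((key, value))
        self.size += 1

table = DynamicHashTable()
for i in range(10):
    table.put(f"key{i}", i)      # triggers rehashing as the table fills
print(table.num_buckets)         # larger than the initial 4
```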
Complexity Analysis of Dynamic Hashing:
- Insertion:
- Inserting an element takes O(1) on average. When the load factor crosses the threshold, that insertion triggers a rehash costing O(n), where n is the number of elements, but the amortized cost over many insertions remains O(1).
- Deletion:
- Deleting an element takes O(1) on average and O(n) in the worst case when many keys collide. If the table also shrinks on deletions, an occasional O(n) rehash may occur, again amortized to O(1).
- Search:
- Looking up a key takes O(1) on average and O(n) in the worst case.
- Rehashing (resizing the table):
- Redistributing all n elements into the resized table takes O(n) time and requires O(n) additional space for the new bucket array.
Components of Hashing:
- Key: A key can be a string or an integer that is fed as input to the hash function, the technique that determines an index or location for storing an item in a data structure.
- Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. This index is known as the hash index.
- Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash function.
Hash Functions:
Hash functions are algorithms that take an input (or ‘key’) and produce a fixed-size value known as a hash code. Common hash functions include the division, mid-square, folding, multiplication, and extraction methods.
Here are some reasons why hash functions are essential in DSA:
- Efficient Searching: Hash functions enable constant-time average-case search complexity (O(1)) in hash tables. They map keys to indices, allowing for quick access to values stored at specific locations.
- Data Organization: Hash functions help organize and store data more efficiently by distributing elements across a data structure based on key-value pairs. This distribution can lead to reduced search times compared to linear search algorithms.
- Collision Resolution: Hash functions are used to handle collisions that occur when two different keys map to the same index in a hash table. Techniques like chaining or open addressing can be employed to resolve collisions efficiently.
- Security and Cryptography: Hash functions are essential in security applications for hashing passwords, digital signatures, and data integrity verification. They provide a one-way function that converts data into a fixed-size hash, making it difficult to reverse engineer the original input.
- Performance Optimization: By using hash functions, DSA can achieve faster operations such as insertion, deletion, and retrieval of elements in hash-based data structures. This performance improvement is crucial for handling large datasets efficiently.
Types of Hash Functions:
There are various types of hash functions:
Division Method:
The division method is a hashing technique used to convert keys into hash indices for hash tables.
- It involves dividing the key by a fixed integer value (usually the hash table size) and using the remainder as the index.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
It is best for M to be a prime number, as this helps distribute the keys more uniformly. The hash value depends only on the remainder of the division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
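A short sketch of the division method in code, reusing the values from the examples above:

```python
def division_hash(key, table_size):
    """Division method: the remainder of key divided by the table size is the index."""
    return key % table_size

print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```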
Mid Square Method:
The mid-square method computes the hash value in two steps:
- Square the value of the key k, i.e. compute k².
- Extract the middle r digits of the result as the hash value.
Formula:
h(k) = the middle r digits of k²
Here,
k is the key value.
The value of r is decided based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required to map a key to a memory location.
k = 60
k x k = 60 x 60
= 3600
The middle 2 digits of 3600 are 60, so h(60) = 60.
The hash value obtained is 60
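A rough sketch of the mid-square method (how the middle digits are chosen when the squared value has an odd number of digits is a design choice; r = 2 matches the example above):

```python
def mid_square_hash(key, r=2):
    """Mid-square method: square the key and take the middle r digits."""
    squared = str(key * key)
    start = max((len(squared) - r) // 2, 0)  # starting position of the middle r digits
    return int(squared[start:start + r])

print(mid_square_hash(60))  # 60 x 60 = 3600 -> middle two digits -> 60
```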
Folding Method:
This method involves two steps:
- Divide the key value k into a number of parts, i.e. k1, k2, k3, …, kn, where each part has the same number of digits except for the last part, which can have fewer digits than the other parts.
- Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
Note:
The number of digits in each part varies depending upon the size of the hash table. For example, if the size of the hash table is 100, then each part must have two digits, except for the last part, which can have fewer digits.
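A minimal sketch of the folding method, assuming two-digit parts as in the example (the "ignore the carry" step is modelled here by reducing the sum modulo the table size):

```python
def folding_hash(key, part_digits=2, table_size=100):
    """Folding method: split the key's digits into parts, add the parts,
    and fold the sum back into the table's range."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])        # e.g. 12345 -> 12, 34, 5
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % table_size                 # drop any overflow beyond the range

print(folding_hash(12345))  # 12 + 34 + 5 = 51
```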
Multiplication Method:
This method involves the following steps:
- Choose a constant value A such that 0 < A < 1.
- Multiply the key value with A.
- Extract the fractional part of kA.
- Multiply the result of the above step by the size of the hash table i.e. M.
- The resulting hash value is obtained by taking the floor of the result obtained in step 4.
Formula:
h(k) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
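A brief sketch of the multiplication method in code, using the constant A and the values from the example above:

```python
import math

def multiplication_hash(key, table_size=100, A=0.357840):
    """Multiplication method: h(k) = floor(M * (k*A mod 1))."""
    fractional = (key * A) % 1              # keep only the fractional part of k*A
    return math.floor(table_size * fractional)

print(multiplication_hash(12345))  # 53
```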
Extraction Method:
In the extraction (digit extraction) method, only part of the key is used to compute the hash value. Digits are selected from fixed positions in the key, and the number they form is used as the hash address. This method involves the following steps:
- Decide which digit positions of the key will be used, for example the 1st, 3rd, and 5th digits.
- Extract the digits at those positions from the key.
- The number formed by the extracted digits is the hash value; if necessary, it can be further reduced modulo the table size.
Formula:
h(k) = the digits of k taken from the chosen positions
Example:
k = 123456
Extracting the 1st, 3rd, and 5th digits gives 1, 3, and 5.
h(123456) = 135
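A minimal sketch of digit extraction, assuming the 1st, 3rd, and 5th digit positions used in the example (the positions are zero-indexed in the code):

```python
def extraction_hash(key, positions=(0, 2, 4)):
    """Digit extraction: build the hash value from digits at selected positions."""
    digits = str(key)
    selected = "".join(digits[p] for p in positions if p < len(digits))
    return int(selected)

print(extraction_hash(123456))  # digits 1, 3 and 5 -> 135
```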