Java HashMap Implementation in a Nutshell
A HashMap (or hash table) is a data structure that maps keys to values for highly efficient lookup. There are a number of ways to implement this data structure. This post is about a simple implementation of a HashMap in Java using an array of linked lists.
So, let’s first define a class representing a node of a linked list as:
class Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next;

    public Entry(K key, V value, Entry<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }

    // getters, equals, hashCode and toString
}
Inserting Element
To insert an element, a key and value, we do the following:
- First, compute the key’s hash code, which will usually be an int. Two different objects can have the same hash code, because there may be an infinite number of possible keys but only a finite number of ints.
- Then, calculate the index in the array from the hash code using modulo: hashCode(key) % array_length. Here, two different hash codes could map to the same index.
- Get the linked list at the index calculated above and store the element in that list. The linked list is important because of collisions: you could have two different keys with the same hash code, or two different hash codes that map to the same index.
The picture below illustrates this.
This can be implemented as:
public class MyMap<K, V> {
    private Entry<K, V>[] buckets;
    private static final int INITIAL_CAPACITY = 1 << 4; // 16 (assumed: the source is cut off after "1")
    private int size = 0; // number of stored entries, updated by put

    public MyMap() {
        this(INITIAL_CAPACITY);
    }

    public MyMap(int capacity) {
        this.buckets = new Entry[capacity];
    }

    public void put(K key, V value) {
        Entry<K, V> entry = new Entry<>(key, value, null);
        int bucket = getHash(key) % getBucketSize();
        Entry<K, V> existing = buckets[bucket];
        if (existing == null) {
            // empty bucket: the new entry becomes the head of the list
            buckets[bucket] = entry;
            size++;
        } else {
            // compare the keys to see if the key already exists
            while (existing.next != null) {
                if (existing.key.equals(key)) {
                    existing.value = value;
                    return;
                }
                existing = existing.next;
            }
            // existing is now the last node of the list
            if (existing.key.equals(key)) {
                existing.value = value;
            } else {
                existing.next = entry;
                size++;
            }
        }
    }

    // . . .
}
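The index calculation from the steps above can also be looked at in isolation. The following is a minimal sketch, not the article's own code: the class `BucketIndexSketch` and the method `indexFor` are hypothetical names, and `Math.floorMod` is used instead of `%` on the assumption that the index must stay non-negative even when `hashCode()` returns a negative number.

```java
import java.util.Objects;

class BucketIndexSketch {
    // Hypothetical helper: maps a key to a bucket index in [0, bucketCount).
    static int indexFor(Object key, int bucketCount) {
        // Objects.hashCode returns 0 for null, so a null key lands in bucket 0.
        int hash = Objects.hashCode(key);
        // Unlike %, floorMod never returns a negative result for a positive divisor.
        return Math.floorMod(hash, bucketCount);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hash code (2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        // Different hash codes can share a bucket: 1 and 17 both map to index 1 of 16.
        System.out.println(indexFor(1, 16) + " " + indexFor(17, 16)); // 1 1
        // A negative hash code still yields a valid index.
        System.out.println(indexFor(-7, 16)); // 9
    }
}
```

Both kinds of collision end up in the same linked list, which is why the key comparison in put is still needed after the bucket has been found.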
Retrieving Element
The retrieval of the element from HashMap can be done with the following steps:
- Compute the hash code from the key, and then compute the index from the hash code with the modulo operation.
- Then, get the linked list at the index computed above and search through it for the entry with this key; that entry holds the value to return.
The implementation can be as simple as the following:
public V get(K key) {
    // head of the linked list for this key's bucket
    Entry<K, V> bucket = buckets[getHash(key) % getBucketSize()];
    while (bucket != null) {
        if (bucket.key.equals(key)) {
            return bucket.value;
        }
        bucket = bucket.next;
    }
    // no entry with this key
    return null;
}
Testing
The custom HashMap implemented above can be tested easily as:
@Test
public void testMyMap() {
    MyMap<String, String> myMap = new MyMap<>();
    myMap.put("USA", "Washington DC");
    myMap.put("Nepal", "Kathmandu");
    myMap.put("India", "New Delhi");
    myMap.put("Australia", "Canberra");

    assertNotNull(myMap);
    assertEquals(4, myMap.size());
    assertEquals("Kathmandu", myMap.get("Nepal"));
    assertEquals("Canberra", myMap.get("Australia"));
}
Time Complexity
Since different keys can be mapped to the same index, there is a chance of collision. If the number of collisions is very high, the worst case runtime is O(n), where n is the number of keys.
However, we generally assume a good implementation that keeps collisions to a minimum, in which case the lookup time is O(1).
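The gap between the two cases can be seen by counting how keys spread over the buckets. The sketch below is illustrative only: `ChainLengthSketch`, `longestChain`, and the deliberately bad `ConstantHashKey` are hypothetical names, and the bucket count of 16 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ChainLengthSketch {
    // Hypothetical key type whose hashCode sends every instance to the same bucket.
    static final class ConstantHashKey {
        final int id;

        ConstantHashKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ConstantHashKey && ((ConstantHashKey) other).id == id;
        }
    }

    // Length of the longest chain when the keys are spread over bucketCount buckets.
    static int longestChain(List<?> keys, int bucketCount) {
        int[] chainLengths = new int[bucketCount];
        int longest = 0;
        for (Object key : keys) {
            int index = Math.floorMod(key.hashCode(), bucketCount);
            longest = Math.max(longest, ++chainLengths[index]);
        }
        return longest;
    }

    public static void main(String[] args) {
        List<Object> spread = new ArrayList<>();
        List<Object> clustered = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            spread.add(i);                          // Integer hash codes 0..63
            clustered.add(new ConstantHashKey(i));  // every hash code is 42
        }
        System.out.println(longestChain(spread, 16));    // 4
        System.out.println(longestChain(clustered, 16)); // 64
    }
}
```

A lookup walks a single chain, so its cost follows the chain length: at most 4 comparisons in the first case, and up to all 64 keys in the second.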
Conclusion
This post illustrated how a HashMap (or hash table) can be implemented with an array of linked lists. You can find more examples in Cracking the Coding Interview by Gayle Laakmann McDowell.
The source code for all examples presented above is available on GitHub.
Published at DZone with permission of Yogen Rai, DZone MVB. See the original article here.
Class HashMap
Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
All Implemented Interfaces: Serializable, Cloneable, Map<K,V>
Direct Known Subclasses: LinkedHashMap, PrinterStateReasons
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
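The difference in null handling between the two classes can be checked directly. This is a small sketch; the class `NullKeySketch` and the helper `acceptsNullKey` are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeySketch {
    // Returns true if the map accepts a null key, false if put throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value for null key");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        System.out.println(acceptsNullKey(hashMap));     // true
        System.out.println(hashMap.get(null));           // value for null key

        hashMap.put("key", null);                        // null values are allowed too
        System.out.println(hashMap.containsKey("key"));  // true

        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.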
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
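The rehash rule is simple arithmetic, worked through below. This is a sketch of the rule as stated, not of HashMap's internals: `ResizeThresholdSketch` and `threshold` are hypothetical names, and the starting capacity of 16 is the default given in the HashMap constructor documentation rather than in the text above.

```java
class ResizeThresholdSketch {
    // Entry count above which a table of the given capacity is rehashed,
    // following the rule "number of entries > load factor * capacity".
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;        // default initial capacity
        float loadFactor = 0.75f; // default load factor
        for (int step = 0; step < 3; step++) {
            System.out.println(capacity + " buckets -> rehash after "
                    + threshold(capacity, loadFactor) + " entries");
            capacity *= 2;        // a rehash roughly doubles the bucket count
        }
        // prints:
        // 16 buckets -> rehash after 12 entries
        // 32 buckets -> rehash after 24 entries
        // 64 buckets -> rehash after 48 entries
    }
}
```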
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
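The sizing rule above (initial capacity greater than the maximum number of entries divided by the load factor) can be applied when the entry count is known in advance. The sketch below is one way to do it under that assumption; `PresizeSketch` and `initialCapacityFor` are hypothetical names, while the two-argument `HashMap(int, float)` constructor is part of the standard API.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeSketch {
    // Smallest int strictly greater than expectedEntries / loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;
        int capacity = initialCapacityFor(expectedEntries, loadFactor);
        System.out.println(capacity); // 1334

        // Sized up front, so filling the map should not trigger a rehash.
        Map<Integer, String> map = new HashMap<>(capacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value " + i);
        }
        System.out.println(map.size()); // 1000
    }
}
```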
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
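A typed version of the wrapper in use might look like the following sketch. The class `SynchronizedMapSketch`, the helper `fillFromTwoThreads`, and the entry counts are hypothetical. The explicit `synchronized (m)` block around the iteration follows the Collections.synchronizedMap documentation, which requires manual synchronization on the returned map while traversing its collection views.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapSketch {
    // Two threads insert disjoint key ranges into the same map; returns the final size.
    static int fillFromTwoThreads(Map<Integer, Integer> map, int perThread) {
        Thread first = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                map.put(i, i);
            }
        });
        Thread second = new Thread(() -> {
            for (int i = perThread; i < 2 * perThread; i++) {
                map.put(i, i);
            }
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Wrapped at creation time, so no unsynchronized reference to the HashMap escapes.
        Map<Integer, Integer> m = Collections.synchronizedMap(new HashMap<>());
        System.out.println(fillFromTwoThreads(m, 10_000)); // 20000

        // Single calls such as put are synchronized by the wrapper; iteration is not.
        long sum = 0;
        synchronized (m) {
            for (int value : m.values()) {
                sum += value;
            }
        }
        System.out.println(sum);
    }
}
```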
The iterators returned by all of this class’s "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
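The fail-fast behavior is easiest to observe on a single thread, where the modification happens inside the loop itself. The sketch below uses hypothetical names (`FailFastSketch`, `sample`, `modifyDuringIteration`, `removeThroughIterator`) and only shows the bug-detection use described above; it is not something to build program logic on.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastSketch {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a mapping directly on the map while iterating over its key set.
    // Returns true if the iterator reported the modification.
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0); // new key: a structural modification
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes odd values through the iterator's own remove method, which is allowed.
    static int removeThroughIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIteration()); // true
        System.out.println(removeThroughIterator()); // 1
    }
}
```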
This class is a member of the Java Collections Framework.