Equals and Hashcode Part II: Handling Collisions

In Part I, we looked a first pass at implementing a HashMap to see how hashing works. However, that implementation had some flaws which we need to address. In addressing these flaws, we will see why the rules related to overriding the equals() and hashCode() methods are what they are.

Initial Implementation

Recall that the first implementation is susceptible to hash collisions, and we wrote a unit test to expose that failure:


public class ConstantHash {
    @Override
    public int hashCode() {
        return 42; // this is a horrible hashcode method, do not use!
    }
}

@Test
public void testTwoObjectsSameHash() throws Exception {
   Object k1 = new ConstantHash();
   Object k2 = new ConstantHash();

   map.put(k1, "apples");
   map.put(k2, "oranges");

   // FAILS when there is a hash collision!
   assertEquals("apples",  map.get(k1));
   assertEquals("oranges", map.get(k2)); 
}

Two keys with different values but the same hashcode are being placed into the same index of the array, and the “oranges” entry is overwriting “apples”. To fix this requires that we introduce a data structure so that we can store entries whose keys have different values but the same hashcode.

Collision Insurance

If we modify our map so that each element of our original array is no longer a simple container for a single entry but instead is itself a list of entries, we can map a hashcode to a list of entries instead of to a single entry.

/**
 * This implements a hash map, not necessarily Java Map.
 * This class is not thread safe, 
 * not optimized for performance, 
 * doesn't do null checking, and is generally not 
 * production-ready. But it shows how hashing works!
 * 
 * @param <K> type of the key
 * @param <V> type of the value
 */
public class BetterHashMap<K, V> implements SimpleMap<K,V> {
    
    private final LinkedList<Entry<K,V>>[] elements;
    private final Entry<K,V> NULL_ENTRY = new Entry(null, null);
    
    public BetterHashMap() {
        elements = new LinkedList[16];
        Arrays.fill(elements, null);
    }

With such a structure in place, when two keys have the same hashcode, they can be stored in the same list. To retrieve an object with a given key, with this list we can do a simple iteration through the entries, comparing keys using the equals() method to determine when we’ve found the right entry.

    @Override
    public V get(K inKey) {
        return (V) getElementList(inKey).stream()
                                .filter(e -> e.getKey().equals(inKey))
                                .findFirst()
                                .orElse(NULL_ENTRY)
                                .getValue();
    }

    @Override
    public V put(K inKey, V putValue) {
        remove(inKey);
        getElementList(inKey).add(new Entry<>(inKey, putValue));
        return putValue;
    }

    @Override
    public V remove(K inKey) {
        LinkedList<Entry<K,V>> list = getElementList(inKey);
        Entry<K,V> entry = list.stream()
                               .filter(e -> e.getKey().equals(inKey))
                               .findFirst()
                               .orElse(NULL_ENTRY);
        list.remove(entry);
        return entry.getValue();
    }

Finally we have a convenience method to ensure that we always have a linked list available at any given index. And the Entry inner class is the same as before.

    
    private LinkedList<Entry<K,V>> getElementList(K searchKey) {
        int index = searchKey.hashCode() % elements.length;
        elements[index] = (elements[index] == null) 
                                 ? new LinkedList<>() 
                                 : elements[index];
        return elements[index];
    }
    
    public static class Entry<K,V> {
        private final K key;
        private final V value;
        public Entry(K k, V v) {
            key = k;
            value = v;
        }
        public K getKey() {
            return key;
        }
        public V getValue() {
            return value;
        }
    }
}

The code is good for detail, but as usual an image is worth a thousand words. This is an array of linked lists where each list has unequal objects with the same hashcode. Use the diagram to visualize what happens when you calculate a hashcode, how you find the appropriate list, and how you use the equals method to traverse the list to find the correct entry. It should make sense why unequal objects can have the same hashcode but equal objects must have the same hashcode.

Contracts

There are contracts to both equals and hashCode. We are focusing on the hashcode contract here but the point is the same. Understanding how the collections framework uses these methods helps you to understand the meaning behind the contract.

Let’s review some elements of the contract of hashcode:

  • “The hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.” Why is that? Well otherwise you would not be able to retrieve your object from a HashMap because you would not be able to reliably calculate its index.
  • “If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.” Why is that? If two objects were equal and had different hashcodes, it would be impossible to determine which of the two objects you want to retrieve from the map.
  • “It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.” Why is that? Think about what happens with our “ConstantKey” class whose hashcode always returns 42. While this is legal, all objects will get placed into the same linked list, and you will lose all advantages of using a hashmap to store your objects. Finding any given object would require a linear search. That’s why the contract states that such behavior can impact the performance of hash tables.

Conclusion

If you can remember that picture of how a hashmap works, you will have an intuition of why equals and hashcode should be overridden together and what their relationship will be. Memorizing rules is no fun, but have an understanding of how things work IS. Hopefully this understanding will serve you well next time you need to work with equals and hashcode.

Advertisements

Leave a comment

Filed under Software Engineering

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s