Equals And Hashcode

Over the years I’ve found (somewhat surprisingly) that many Java developers aren’t familiar with the equals and hashCode methods or the rules related to overriding them. The rest of this post is for these developers, as we are going to investigate equals and hashcode in detail.

The Problem

The problem is with remembering things like the contract of hashCode(), which includes rules like “if two objects are equal then their hashCode must be equal” Some people take these as commandments which are handed down and must be memorized. You could memorize rules like this, but that is not very much fun, and besides memorized rules are easy to forget.

The Solution

The solution is to have understanding instead of memorization. If you understand code that uses these methods, it makes sense how they work and you won’t have to memorize (or forget) rules like this. Let’s take a deep dive into code that relies on equals and hashcode: The Trusty HashMap! We’ll do this in two parts: first covering the basics of what hashing is and how it works, and then looking at how equals is used in conjunction with it to provide a complete hashmap solution. With this understanding in hand, you will never be left wondering why you should override hashCode().

Hash It Up

For starters, let’s look at what the hashCode() method is used for in the first place.

A Map is a data structure that allows you to quickly retrieve an object when you have its key (which is another object). How can we code something like this? Let’s start simple and progress from there… Imagine we have a simple array of Entry objects, where each Entry contains a key object and a value object. To retrieve a value object given its key, we could simply loop over the elements of the array, comparing each entry’s key with the key we have in hand, and returning the entry’s value when we find the right entry. While this works, it’s a horrible idea and you should never write the code I just described! (For the interested reader, this algorithm has O(N) performance). I just wanted to set up the idea of keeping entries in an array, and give you a context for why hashing is so useful.

A much better solution is to use the key to compute the index of the array where the corresponding entry is stored. Then an entry and can be retrieved in constant time no matter where in the array it is, because you already have the index. This is what the hashCode method is! All it does is compute a number – it can be any number at all – and that tells us where in the array the entry is stored. If the number is greater than the number of elements in the array, we can simply use the modulo operator to keep the number within bounds.


public class BuggyHashMap<K, V> implements SimpleMap<K,V> {
    
    private Entry<K,V>[] elements;
    
    public BuggyHashMap() {
        elements = new Entry[16];
    }

    @Override
    public V get(K inKey) {
        int index = inKey.hashCode() % elements.length;
        Entry<K,V> entry = elements[index];
        return (entry == null) ? null : entry.value;
    }

    @Override
    public V put(K inKey, V putValue) {
        int index = inKey.hashCode() % elements.length;
        elements[index] = new Entry<>(inKey, putValue);
        return putValue;
    }

    public static class Entry<K,V> {
        private K key;
        private V value;
        public Entry(K k, V v) {
            key = k;
            value = v;
        }
        public K getKey() {
            return key;
        }
        public V getValue() {
            return value;
        }
    }
}

Here is a unit test that tests that you can put and get an object by its key.

@Test
public void testPutAndGet() throws Exception {
    assertNull(map.get("1"));
    map.put("1", "apples");
    assertEquals("apples", map.get("1"));
}

Now, this is still only a first pass at writing a HashMap, and it has some big issues. For one, what happens when two different keys compute the same hashCode? This implementation will overwrite an entry if that happens, which is not good. This occurrence is called a hash collision, and we can easily write a test exposes that failure:


public class ConstantHash {
    @Override
    public int hashCode() {
        return 42; // this is a horrible hashcode method, do not use!
    }
}

@Test
public void testTwoObjectsSameHash() throws Exception {
   Object k1 = new ConstantHash();
   Object k2 = new ConstantHash();

   map.put(k1, "apples");
   map.put(k2, "oranges");

   // FAILS with this implementation!
   assertEquals("apples",  map.get(k1));
   assertEquals("oranges", map.get(k2)); 
}

Do you see what happens in the code when you run this test? Since these two values have the same hashcode, they are being placed into the same index of the array (because the keys have the same hashcode), and “oranges” is overwriting “apples”. To fix this requires a restructuring of our map and the introduction of the equals method. Don’t worry, we will fix all of this in the next post!

To Be Continued…

Advertisements

Leave a comment

Filed under Software Engineering

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s