Monthly Archives: September 2014

Equals and Hashcode Part II: Handling Collisions

In Part I, we looked a first pass at implementing a HashMap to see how hashing works. However, that implementation had some flaws which we need to address. In addressing these flaws, we will see why the rules related to overriding the equals() and hashCode() methods are what they are.

Initial Implementation

Recall that the first implementation is susceptible to hash collisions, and we wrote a unit test to expose that failure:


public class ConstantHash {
    @Override
    public int hashCode() {
        return 42; // this is a horrible hashcode method, do not use!
    }
}

@Test
public void testTwoObjectsSameHash() throws Exception {
   Object k1 = new ConstantHash();
   Object k2 = new ConstantHash();

   map.put(k1, "apples");
   map.put(k2, "oranges");

   // FAILS when there is a hash collision!
   assertEquals("apples",  map.get(k1));
   assertEquals("oranges", map.get(k2)); 
}

Two keys with different values but the same hashcode are being placed into the same index of the array, and the “oranges” entry is overwriting “apples”. To fix this requires that we introduce a data structure so that we can store entries whose keys have different values but the same hashcode.

Collision Insurance

If we modify our map so that each element of our original array is no longer a simple container for a single entry but instead is itself a list of entries, we can map a hashcode to a list of entries instead of to a single entry.

/**
 * This implements a hash map, not necessarily Java Map.
 * This class is not thread safe, 
 * not optimized for performance, 
 * doesn't do null checking, and is generally not 
 * production-ready. But it shows how hashing works!
 * 
 * @param <K> type of the key
 * @param <V> type of the value
 */
public class BetterHashMap<K, V> implements SimpleMap<K,V> {
    
    private final LinkedList<Entry<K,V>>[] elements;
    private final Entry<K,V> NULL_ENTRY = new Entry(null, null);
    
    public BetterHashMap() {
        elements = new LinkedList[16];
        Arrays.fill(elements, null);
    }

With such a structure in place, when two keys have the same hashcode, they can be stored in the same list. To retrieve an object with a given key, with this list we can do a simple iteration through the entries, comparing keys using the equals() method to determine when we’ve found the right entry.

    @Override
    public V get(K inKey) {
        return (V) getElementList(inKey).stream()
                                .filter(e -> e.getKey().equals(inKey))
                                .findFirst()
                                .orElse(NULL_ENTRY)
                                .getValue();
    }

    @Override
    public V put(K inKey, V putValue) {
        remove(inKey);
        getElementList(inKey).add(new Entry<>(inKey, putValue));
        return putValue;
    }

    @Override
    public V remove(K inKey) {
        LinkedList<Entry<K,V>> list = getElementList(inKey);
        Entry<K,V> entry = list.stream()
                               .filter(e -> e.getKey().equals(inKey))
                               .findFirst()
                               .orElse(NULL_ENTRY);
        list.remove(entry);
        return entry.getValue();
    }

Finally we have a convenience method to ensure that we always have a linked list available at any given index. And the Entry inner class is the same as before.

    
    private LinkedList<Entry<K,V>> getElementList(K searchKey) {
        int index = searchKey.hashCode() % elements.length;
        elements[index] = (elements[index] == null) 
                                 ? new LinkedList<>() 
                                 : elements[index];
        return elements[index];
    }
    
    public static class Entry<K,V> {
        private final K key;
        private final V value;
        public Entry(K k, V v) {
            key = k;
            value = v;
        }
        public K getKey() {
            return key;
        }
        public V getValue() {
            return value;
        }
    }
}

The code is good for detail, but as usual an image is worth a thousand words. This is an array of linked lists where each list has unequal objects with the same hashcode. Use the diagram to visualize what happens when you calculate a hashcode, how you find the appropriate list, and how you use the equals method to traverse the list to find the correct entry. It should make sense why unequal objects can have the same hashcode but equal objects must have the same hashcode.

Contracts

There are contracts to both equals and hashCode. We are focusing on the hashcode contract here but the point is the same. Understanding how the collections framework uses these methods helps you to understand the meaning behind the contract.

Let’s review some elements of the contract of hashcode:

  • “The hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.” Why is that? Well otherwise you would not be able to retrieve your object from a HashMap because you would not be able to reliably calculate its index.
  • “If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.” Why is that? If two objects were equal and had different hashcodes, it would be impossible to determine which of the two objects you want to retrieve from the map.
  • “It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.” Why is that? Think about what happens with our “ConstantKey” class whose hashcode always returns 42. While this is legal, all objects will get placed into the same linked list, and you will lose all advantages of using a hashmap to store your objects. Finding any given object would require a linear search. That’s why the contract states that such behavior can impact the performance of hash tables.

Conclusion

If you can remember that picture of how a hashmap works, you will have an intuition of why equals and hashcode should be overridden together and what their relationship will be. Memorizing rules is no fun, but have an understanding of how things work IS. Hopefully this understanding will serve you well next time you need to work with equals and hashcode.

Advertisements

Leave a comment

Filed under Software Engineering

Equals And Hashcode

Over the years I’ve found (somewhat surprisingly) that many Java developers aren’t familiar with the equals and hashCode methods or the rules related to overriding them. The rest of this post is for these developers, as we are going to investigate equals and hashcode in detail.

The Problem

The problem is with remembering things like the contract of hashCode(), which includes rules like “if two objects are equal then their hashCode must be equal” Some people take these as commandments which are handed down and must be memorized. You could memorize rules like this, but that is not very much fun, and besides memorized rules are easy to forget.

The Solution

The solution is to have understanding instead of memorization. If you understand code that uses these methods, it makes sense how they work and you won’t have to memorize (or forget) rules like this. Let’s take a deep dive into code that relies on equals and hashcode: The Trusty HashMap! We’ll do this in two parts: first covering the basics of what hashing is and how it works, and then looking at how equals is used in conjunction with it to provide a complete hashmap solution. With this understanding in hand, you will never be left wondering why you should override hashCode().

Hash It Up

For starters, let’s look at what the hashCode() method is used for in the first place.

A Map is a data structure that allows you to quickly retrieve an object when you have its key (which is another object). How can we code something like this? Let’s start simple and progress from there… Imagine we have a simple array of Entry objects, where each Entry contains a key object and a value object. To retrieve a value object given its key, we could simply loop over the elements of the array, comparing each entry’s key with the key we have in hand, and returning the entry’s value when we find the right entry. While this works, it’s a horrible idea and you should never write the code I just described! (For the interested reader, this algorithm has O(N) performance). I just wanted to set up the idea of keeping entries in an array, and give you a context for why hashing is so useful.

A much better solution is to use the key to compute the index of the array where the corresponding entry is stored. Then an entry and can be retrieved in constant time no matter where in the array it is, because you already have the index. This is what the hashCode method is! All it does is compute a number – it can be any number at all – and that tells us where in the array the entry is stored. If the number is greater than the number of elements in the array, we can simply use the modulo operator to keep the number within bounds.


public class BuggyHashMap<K, V> implements SimpleMap<K,V> {
    
    private Entry<K,V>[] elements;
    
    public BuggyHashMap() {
        elements = new Entry[16];
    }

    @Override
    public V get(K inKey) {
        int index = inKey.hashCode() % elements.length;
        Entry<K,V> entry = elements[index];
        return (entry == null) ? null : entry.value;
    }

    @Override
    public V put(K inKey, V putValue) {
        int index = inKey.hashCode() % elements.length;
        elements[index] = new Entry<>(inKey, putValue);
        return putValue;
    }

    public static class Entry<K,V> {
        private K key;
        private V value;
        public Entry(K k, V v) {
            key = k;
            value = v;
        }
        public K getKey() {
            return key;
        }
        public V getValue() {
            return value;
        }
    }
}

Here is a unit test that tests that you can put and get an object by its key.

@Test
public void testPutAndGet() throws Exception {
    assertNull(map.get("1"));
    map.put("1", "apples");
    assertEquals("apples", map.get("1"));
}

Now, this is still only a first pass at writing a HashMap, and it has some big issues. For one, what happens when two different keys compute the same hashCode? This implementation will overwrite an entry if that happens, which is not good. This occurrence is called a hash collision, and we can easily write a test exposes that failure:


public class ConstantHash {
    @Override
    public int hashCode() {
        return 42; // this is a horrible hashcode method, do not use!
    }
}

@Test
public void testTwoObjectsSameHash() throws Exception {
   Object k1 = new ConstantHash();
   Object k2 = new ConstantHash();

   map.put(k1, "apples");
   map.put(k2, "oranges");

   // FAILS with this implementation!
   assertEquals("apples",  map.get(k1));
   assertEquals("oranges", map.get(k2)); 
}

Do you see what happens in the code when you run this test? Since these two values have the same hashcode, they are being placed into the same index of the array (because the keys have the same hashcode), and “oranges” is overwriting “apples”. To fix this requires a restructuring of our map and the introduction of the equals method. Don’t worry, we will fix all of this in the next post!

To Be Continued…

Leave a comment

Filed under Software Engineering

The science of Motivation: Purpose

This is part five in a series on motivation. To recap the basic idea: rather than external motivations, what we actually need are internal motivations. This idea is from Dan Pink’s book Drive, and you can also watch an inspiring TED talk about it which I highly recommend. The three main ideas (which we are exploring in three parts) are Autonomy, Mastery, and Purpose.

Let’s talk about Purpose and what it means for software engineers.

What Is My Purpose?

Purpose (noun) “The reason for which something is done or created or for which something exists.”

“What is my purpose” is a question that philosophers have thought about for thousands of years, so maybe this is a bigger question than can be tackled in a blog post! But let’s go ahead, join the philosophers, and give it some thought.

Do I Matter To My Software?

Your work has purpose if you feel like you have a reason to do what you do, if you feel like you are important to the success of the software and your work matters.True, nothing is more demoralizing than feeling like your work doesn’t matter. That’s why constantly shifting priorities resulting in throwing away work is a big destroyer of motivation in any company that develops software. In contrast, feeling like the software needs you, and by extension an ecosystem of other human beings (your customers) needs you, can be highly motivating. This can happen when a startup is counting on you, and even in a bigger company if you have critical skills, knowledge or experience.

Does My Software Matter To The World?

Beyond that, software itself can have a reason to exist and be serving a higher purpose. Maybe the domain is something that accomplishes a good cause or makes the world a better place. Of course everybody has different values and might see different things as “a good cause.” But the point is to discover what’s important to you and acknowledge that working on software in those domains would probably be very satisfying to you.

Conclusion

Purpose is just having a reason to do what you do. It’s almost a tautology to say that having a reason to do what you is motivating. But it’s important to search for and recognize work that you feel matters to you, and in a context where you feel you can make a difference.

Leave a comment

Filed under Software Engineering

The Science of Motivation: Mastery

This is part four in a series on motivation. To recap the basic idea: rather than external motivations, what we actually need are internal motivations. This idea is from Dan Pink’s book Drive, and you can also watch an inspiring TED talk about it which I highly recommend. The three main ideas (which we are exploring in three parts) are Autonomy, Mastery, and Purpose.

What It Means In Software

Let’s talk about Mastery and what it means for software engineers.

Mastery means the engineer is getting better and better at something. Something that matters. For example, learning a new language, library, or framework is a frequent requirement of the profession and lends itself to mastery as one starts from scratch and ends with doing something useful and beautiful. If an engineer believes that this new language or library is valuable, this can be a wonderful thing to master.

Beyond the mastery of a technology, one can also master the product under development, or the domain in which the product is used. If the product or the domain is incredibly interesting to someone, mastery in that area could be highly motivating even if the technology choices are boring. Conversely, if the domain or product is incredibly boring (at least, boring to the kinds of people who become software engineers) then it would be motivating to find a way to incorporate new technology into a product.

Whatever the topic of mastery, there are some important elements. The engineer not only be actually learning something new (and recognize that they are mastering something), the engineer must care about what they are learning about. The topic under mastery should be something that they find valuable. Obviously one could master something that they find tedious or uninteresting and it would not be an enjoyable experience!

Let’s Find Something To Master

Can you promote your internal motivation by introducing elements of mastery into the workplace? Of course!

If you’re at a startup or a small company, you may have the autonomy to introduce technologies that you want to master. If you’re at a big company and don’t have a lot of leeway to change the underlying technologies of your product, internal tools are good places to introduce a new technology or language. New technologies can be introduced on the periphery, and as they prove themselves (and you master them!) they may eventually find a home in your product. And then you’re the go-to person for that technology! How to introduce new technologies in the workplace is a topic worthy of an article all by itself.

Finally, if you are at a larger company and have less power to introduce technologies to learn, you may find it an interesting challenge to try to master the product or learn more about the business. Using the product in its intended domain is an entirely different skill from writing the software. And learning the business is an entire world unto itself! But there are opportunities there for you if you choose to discover them. The support team, business development, and sales teams would probably be happy to have someone with your technical expertise to help them out and show an interest in what they do. It can’t hurt to start a conversation and see where it might lead.

Conclusion

Engineers often suffer from boredom and loss of motivation. Finding something new to master may be the internal motivator that can make you excited to go to work again!

Leave a comment

Filed under Software Engineering

The Science of Motivation: Autonomy

This is part two in a series on motivation. To recap the last post: rather than external motivations, what we actually need are internal motivations. This idea is from Dan Pink’s book Drive, and you can also watch an inspiring TED talk about it which I highly recommend. The three main ideas (which we will explore in three parts) are Autonomy, Mastery, and Purpose.

Let’s talk about Autonomy and what it means for software engineers. Autonomy means the engineer is in control of many aspects of their work. Aspects under his or her control might include when, where, and how to accomplish the work, and even what the specific work is.

When

When an engineer can choose when to work, they are empowered to work at what is their optimal time. Some people are morning birds and think best when their mind is fresh at 7AM. Others are night owls and would prefer to not show up at work until 11AM, and don’t really get warmed up until 3PM. To tell either of these people “company hours are 9-5, you have to be at your desk at that time” would be soul-crushing. Usually companies are fairly flexible with this aspect of autonomy, but we can do better: I’ve seen big corporations that gave the option of 4 day weeks (with 10 hour days), or schedules with every other friday off (with 9 hour days), which were wildly popular with employees.

Where

Most employers offer very limited options for where to physically work. The single option is usually a single desk in an open floor plan. Open floor plans are very popular among managers and office planners, but among developers open floor plans are almost universally despised. As an engineer trying to get into the zone, the best option you can hope for (and it’s worth asking for) is easy access to a guess office or room with a closed door where you can go at will. Better yet, permanent semi/full time work-from-home arrangements are also incredibly popular and easily doable with the nature of work of the software engineer.

What

Being able to choose your actual work is the pinnacle of autonomy! But how can we accomplish that and still do what needs to get done for the business? In many cases, you may be able to choose the features or bugs that interest you (from those available) instead of all being dictated to you. Some people will naturally gravitate to certain parts of the system or certain kinds of features, giving them autonomy over their actual work. And if you are lucky, there are programs like Google’s 20% time or Atlassian’s shipit days where engineers truly have power over the “what” of the work and still push the business forward. Even if you don’t work for Google or Atlassian, it’s worth bringing up with management to see if it could benefit your company (and benefit you at the same time).

Conclusion

As an employer, you can provide these options of autonomy to increase the engagement of your engineers and the competitiveness of your company, while keeping in mind to find the balance between full autonomy and accomplishing business objectives. As a software engineer, you can ask for these things in your current job or look for them in your next job. With a bit more autonomy, developers can be happier and more motivated, and businesses can be faster and stronger!

Leave a comment

Filed under Software Engineering