Monthly Archives: November 2013

Programming Puzzle: Implement LRU Cache

Since we were talking about cache recently, it’s worth it to look at implementing our own cache as a programming exercise.

Why?

First, let’s be clear: There is really no need to use your own cache implementation in production. There are a variety of existing caching options (such as EHCache, Guava, and JCache), and you put yourself at risk by re-inventing the wheel.

That said, looking into how caches work is a fun exercise. It can give us a feel for the issues involved and hopefully an intuition for dealing with off-the-shelf caching implementations.

Out With The Old, In With The New

Let’s look at a Least Recently Used (LRU) cache. In this cache, entries that have not been accessed for the longest amount of time are eligable for removal when the cache reaches its maximum size. Java provides an easy base class out of the box to implement this idea: the LinkedHashMap.

Things To Note

  • The constructor specifies the ordering by access mode (get or put). This is important so that the least recently used entries are make eligible for removal.
  • We have an implementation of removeEldestEntry(), this is called when entries are added to see if any entries can be removed.
  • The map by default is not thread safe, so we can add the option of providing this as a synchronized cache.
  • Optionally we can choose the cache to store values of type Optional instead of type V. This reduces the need of the caller to do null handling if the value is not present.

Finally: The Code

public class LRUCache<K, V> extends LinkedHashMap<K, Optional<V>> {

    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    public LRUCache(final int maxEntries) {
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }
    
    @Override
    public Optional<V> get(Object key) {
        return super.get(key);
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, Optional<V>> eldest) {
        return super.size() > maxEntries;
    }

    public Map<K, Optional<V>> asSynchronizedCache() {
        return Collections.synchronizedMap(this);
    }
}

Leave a comment

Filed under Software Engineering

Negative Caching

Recently while investigating caching, I came across the idea of “Negative Caching”. According to Wikipedia, a negative cache is a cache that also stores “negative” responses or failures. Put simply, you can store the knowledge that something does not exist, as opposed to a positive cache that stores information about something that does exist. (In the rest of this article we will indicate a “normal” cache as a “positive cache”.)

But Why?

A positive cache is useful if success – retrieving an object or resource – is more expensive than storing information about what you want. In contrast, a negative cache is useful if failure – determining that something does not exist – is very expensive. However, this is a niche scenario because for the majority of modern software, failure in a resource lookup is usually not that expensive.

Ok… But Why?

The only applications of negative caching that I could find were for DNS caching.

A DNS lookup for a record that does not exist can be expensive since the query travels through a large network of DNS servers if the record has not yet propagated. Propagation can take hours or days, so if a request for a DNS record returns a negative result, you (the DNS server) can likely save yourself a lot of trouble by remembering that result for a few hours.

If a DNS server is doing negative caching, and it does a lookup for a record but gets no result, the system can remember this failed lookup and remember that the record does not exist. This way the server can use this information to return a negative result immediately instead of trying to (expensively) find it again the next time somebody asks for it. The negative cache records can expire over time so that once the actual DNS record has been propagated, the search will find it.

Once again: Cache Invalidation Is Hard

If a positive cache contains an entry and the resource underlying that entry is updated, the positive cache entry must be invalidated, or risk returning incorrect results. Application level errors resulting from inappropriately greedy caches are notoriously difficult to diagnose because usually caching fades into the fabric of your system. Clearing the cache at any level is usually the first line of attack when you are getting strange results that you don’t expect, whether at the database or at the browser.

By the same token, if a negative cache contains an entry for something that doesn’t exist, and that underlying thing appears, you can miss out on something that actually does exist until the cache entry expires. Imagine your system telling you that something doesn’t exist when you can verify by other means that it does! Expiration by timeout or by providing cache clearing directions is therefore very important.

Conclusion

You can store the knowledge that something does not exist if determining non-existence is expensive. This solution is pretty rare, but does have its uses in the right situations.

Leave a comment

Filed under Software Engineering

How to Merge Two Sorted Streams Into One Sorted Stream, Part II (Java 8 Edition)

Previously, we merged two sorted iterators into one sorted iterator. One might reasonably ask, can we do the same thing with Java 8’s Streams? One might reasonably answer: Yes, we can!

The Data Structure

The Stream object has the same qualities as java.util.Iterator (allowing you to iterate over data without exposing storage), but it also offers a lot more functionality. Stream is a central component of Java 8 Lambdas, designed from the ground up to work with lambda expressions.

Given the prevalence of Stream, it would be nice to convert our Iterator solution into a Stream solution.

It’s All About Leverage

A Stream is backed by a Spliterator. A look at the API shows that a Stream can produce an Iterator and be constructed by a Spliterator, so with a simple Adapter (to make an Iterator look like a Spliterator) we can essentially convert Streams and Iterators back and forth to each other. This will allow us to leverage the existing iterator-merging code that we already wrote in the last post.
Our approach might look like this:

  1. accept two incoming Streams
  2. extract iterators from each one
  3. merge the iterators using our existing iterator-merging code
  4. use the newly merged iterator to create a Spliterator
  5. use the Spliterator to construct a new Stream

Embodied in code, the above description looks like so:

public class StreamMerger<T> {

    public Stream merge(Stream stream1, Stream stream2) {
        Iterator<T> iterator = new MergedIterator<>(stream1.iterator(), stream2.iterator());
        Spliterator<T> spliterator = new SpliteratorAdapter<>(iterator);
        return StreamSupport.stream(spliterator, false);
    }
}

With the SpliteratorAdapter being equally simple:

/**
 * Recommend using with StreamSupport.stream(iteratorStream, false);
 */
public class SpliteratorAdapter<T> extends Spliterators.AbstractSpliterator <T>{

    private final Iterator<T> iterator;

    public SpliteratorAdapter(Iterator<T> iter) {
        super(Long.MAX_VALUE, 0);
        iterator = iter;
    }

    @Override
    public synchronized boolean tryAdvance(Consumer<? super T> action) {
        if(iterator.hasNext()) {
            action.accept(iterator.next());
            return true;
        }
        return false;
    }
}

At this point we can simply use our new StreamMerger to merge as many sorted streams as we want.

The Test

The unit test would be pretty similar to the one for the iterator. In fact we can use the same test data for both tests since the Parameterized test runner requires a public static data provider.

@RunWith(Parameterized.class)
public class StreamMergerTest {

    private final List<Integer> expectedMerge;
    private final StreamMerger<Integer> streamMerger = new StreamMerger<>();
    private final Stream<Integer> mergedStream;

    public StreamMergerTest(List<Integer> list1, List<Integer> list2, List<Integer> expected) {
        expectedMerge = expected;
        mergedStream = streamMerger.merge(list1.stream(), list2.stream());
    }

    @Test
    public void testStream() throws Exception {
        List<Integer> merged = mergedStream.collect(Collectors.<Integer>toList());
        assertEquals(expectedMerge, merged);
    }

    @Parameters
    public static List<Object[]> data() {
        return MergedIteratorTest.data();
    }
}

And there we have it! Now we can merge sorted Java 8 Streams.

3 Comments

Filed under Software Engineering

How to Merge Two Sorted Streams Into One Sorted Stream

The title says it all, doesn’t it? If we were given two sorted streams, how could we merge them into one single sorted stream?

The Data Structure

The term Stream implies a structure whose elements can be retrieved and processed but whose storage cannot be inspected or manipulated. Java has a couple different streams, such as java.io.InputStream and java.util.stream.Stream, but for purposes of this exercise we’ll use the simplest interface we can that will let us stream over elements: java.util.Iterator.

A First Pass

At first glance we may be tempted to write a simple single method that accepts two iterators and returns a single iterator. Such a method could stream both iterators into a list, sort the list, and then return an iterator for the resulting list. This approach suffers from two flaws. The first flaw is the performance hit of having to do a sort. The second flaw is a fatal one: the iterators could be infinite.

Truly unbounded iterators are rare in the wild. But we could imagine, say, an iterator representing the set of incoming requests to an HTTP server, sorted by request time. Such an iterator is a sorted iterator that can never end as long as the server is running. If you wanted to merge two request processing threads, a merge technique like this could be useful.

A Second Pass

Ok, so we need to stream through these iterators without trying to store values. To imagine how this will work, imagine that the two iterators are like the two sides of a zipper, and we are zipping the two sides together into one zipped zipper. In practice, we will take the two leading elements of each stream, compare them, and return the one that should appear earlier in a sort. Since both streams are already sorted, we are essentially sorting the merged values as we return each one.

To accomplish this, our merged iterator will need to retain a reference to the two incoming iterators, and to do that our merged iterator will have to be its own class. Additionally, this new class will need to retain a reference to the “next” value from each iterator so that we can repeatedly compare the next value of each iterator until it needs to be returned.


public class MergedIterator<T extends Comparable> implements Iterator<T> {

    private Iterator<T> s1;
    private Iterator<T> s2;
    private T next1;
    private T next2;

    public MergedIterator(Iterator<T> s1, Iterator<T> s2) {
        if (s1 == null || s2 == null) {
            throw new IllegalArgumentException("arguments can't be null");
        }

        this.s1 = s1;
        this.s2 = s2;

        next1 = s1.hasNext() ? s1.next() : null;
        next2 = s2.hasNext() ? s2.next() : null;
    }

    public boolean hasNext() {} // TODO
    public T next() { } // TODO

How to choose the next value

So we have an iterator backed by two other iterators and the next value for each iterator. We can compare these two values and return the one that should appear next in sorted order. Before returning we need to be sure to pop the next value from that iterator so we can do another comparison on any subsequent call to .next(). The content of MergedIterator.next() might look like this:

        if (next1.compareTo(next2) <= 0) {
            T returnObject = next1;
            next1 = s1.hasNext() ? s1.next() : null;
            return returnObject;
        } 
        else {
            T returnObject = next2;
            next2 = s2.hasNext() ? s2.next() : null;
            return returnObject;
        }

What if an Iterator is Empty?

To indicate that an iterator is empty, the value holding that iterator’s next value can be null. We can do a check so that one iterator’s value will always be returned if the other’s is null. In other words: once an iterator is empty the other iterator is the only one that can provide values. We can add this to MergedIterator.next()

        // if one stream is empty,
        // just continue streaming from the other
        if (next1 == null && next2 != null) {
            T returnObject = next2;
            next2 = s2.hasNext() ? s2.next() : null;
            return returnObject;
        }

        if (next1 != null && next2 == null) {
            T returnObject = next1;
            next1 = s1.hasNext() ? s1.next() : null;
            return returnObject;
        }

This also leads us to .hasNext(). If we use .hasNext() on the backing iterators to implement MergedIterator.hasNext(), we’ll never get the last element because the merged iterator will think both backing iterators are empty when in reality the last element could be in the stored next element. so hasNext() should look like this:

    @Override
    public boolean hasNext() {
        return next1 != null || next2 != null;
    }

Reducing Duplicate Code

Finally, we might notice some repeated code in next() and decide that we really only need to decide one thing: which stream to use. We can merge these conditions and end up with half the code at the expense of some readability. I’ll leave it as an exercise to the reader which way is really more readable.

        // can use half the code at the expense of some readability
        boolean useStream1 = (next1 != null && next2 == null)  ||
                             (next1 != null && next2 != null && next1.compareTo(next2) <= 0);

        if(useStream1) {
            T returnObject = next1;
            next1 = s1.hasNext() ? s1.next() : null;
            return returnObject;            
        }
        else {
            T returnObject = next2;
            next2 = s2.hasNext() ? s2.next() : null;
            return returnObject;
        }

With all of this together, here is the complete class.

public class MergedIterator<T extends Comparable> implements Iterator<T> {

    private Iterator<T> s1;
    private Iterator<T> s2;
    private T next1;
    private T next2;

    public MergedIterator(Iterator<T> s1, Iterator<T> s2) {
        if (s1 == null || s2 == null) {
            throw new IllegalArgumentException("arguments can't be null");
        }

        this.s1 = s1;
        this.s2 = s2;

        next1 = s1.hasNext() ? s1.next() : null;
        next2 = s2.hasNext() ? s2.next() : null;
    }

    // NOT THREAD SAFE
    @Override
    public boolean hasNext() {
        // if you use .hasNext() on the backing iterators, you'll never get the last element
        return next1 != null || next2 != null;
    }

    // NOT THREAD SAFE
    @Override
    public T next() {

        // returning null instead of throwing exceptions makes it easier to use parameterized tests
        if(!hasNext()) {
            return null;
        }
        
        // can use half the code at the expense of some readability
        boolean useStream1 = (next1 != null && next2 == null)  ||
                             (next1 != null && next2 != null && next1.compareTo(next2) <= 0);

        if(useStream1) {
            T returnObject = next1;
            next1 = s1.hasNext() ? s1.next() : null;
            return returnObject;            
        }
        else {
            T returnObject = next2;
            next2 = s2.hasNext() ? s2.next() : null;
            return returnObject;
        }
        
//        // if one stream is empty, can just continue streaming from the other
//        if (next1 == null && next2 != null) {
//            T returnObject = next2;
//            next2 = s2.hasNext() ? s2.next() : null;
//            return returnObject;
//        }
//
//        if (next1 != null && next2 == null) {
//            T returnObject = next1;
//            next1 = s1.hasNext() ? s1.next() : null;
//            return returnObject;
//        }
//
//        if (next1.compareTo(next2) <= 0) {
//            T returnObject = next1;
//            next1 = s1.hasNext() ? s1.next() : null;
//            return returnObject;
//        } 
//        else {
//            T returnObject = next2;
//            next2 = s2.hasNext() ? s2.next() : null;
//            return returnObject;
//        }
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Not supported.");
    }

}

How To Test This

I propose these test cases for the incoming iterators: both empty, one or the other is empty, and both non-empty but one or the other is longer. A parameterized test will accomplish this nicely.

@RunWith(Parameterized.class)
public class MergedIteratorTest {

    private final List<Integer> expectedMerge;
    private final MergedIterator<Integer> mergedIterator;

    public MergedIteratorTest(List<Integer> stream1, List<Integer> stream2, List<Integer> expected) {
        expectedMerge = expected;
        mergedIterator = new MergedIterator<>(stream1.iterator(), stream2.iterator());
    }

    @Test
    public void testIterator() throws Exception {
        List<Integer> merged = streamIntoList(mergedIterator);
        assertEquals(expectedMerge, merged);
    }

    @Parameters
    public static List<Object[]> data() {
        return Arrays.asList(new Object[][]{
            {new ArrayList<>(), new ArrayList<>(), new ArrayList<>()},
            {Arrays.asList(1, 2, 3, 4, 5, 6), new ArrayList<>(), Arrays.asList(1, 2, 3, 4, 5, 6)},
            {new ArrayList<>(), Arrays.asList(1, 2, 3, 4, 5, 6), Arrays.asList(1, 2, 3, 4, 5, 6)},
            {Arrays.asList(1, 3, 5), Arrays.asList(1, 2, 4, 6), Arrays.asList(1, 1, 2, 3, 4, 5, 6)},
            {Arrays.asList(1, 2, 4, 6), Arrays.asList(1, 3, 5), Arrays.asList(1, 1, 2, 3, 4, 5, 6)}
        });
    }
    
    private static List<Integer> streamIntoList(Iterator<Integer> m) {
        List<Integer> merged = new ArrayList<>();
        while (m.hasNext()) {
            merged.add(m.next());
        }
        return merged;
    }
}

And there we have it! This was a fun little exercise about merging unbounded sorted streams, and importantly, how to test it. Maybe next time we can merge Java 8 Streams!

Leave a comment

Filed under Software Engineering