Monthly Archives: October 2013

Traversal of Infinite Complete Binary Trees

In the last post, we looked at code to generate an infinitely large binary tree. In this post we will look at how to traverse it.

Breadth First VS Depth First

First of all, an Infinite Complete Binary Tree can’t be effectively traversed with a depth-first traversal. Wikipedia states it quite eloquently:

“A basic requirement for traversal is to visit every node. For infinite trees, simple algorithms often fail this. For example, given a binary tree of infinite depth, a depth-first traversal will go down one side (by convention the left side) of the tree, never visiting the rest, and indeed if in-order or post-order will never visit any nodes, as it has not reached a leaf (and in fact never will). By contrast, a breadth-first (level-order) traversal will traverse a binary tree of infinite depth without problem, and indeed will traverse any tree with bounded branching factor.”

So for an infinite complete binary tree, the breadth first traversal is really the most useful. Let’s examine some approaches to a breadth first traversal for our PointNode tree.

Using Lists

To mix it up a little bit, we’ll write the unit test first. Here is a small set of test data containing the values of a breadth first traversal of our binary tree going three levels deep.

List breadthFirst;
breadthFirst = Arrays.asList(new BigDecimal(0.5),
                             new BigDecimal(0.25),
                             new BigDecimal(0.75),
                             new BigDecimal(0.125),
                             new BigDecimal(0.375),
                             new BigDecimal(0.625),
                             new BigDecimal(0.875));

And the test method might look like this. We can use lambda expressions to accumulate each node in the traversal.

@Test
public void breadthFirstSimple() throws Exception {

    List traversal = new ArrayList<>();
    PointNode.UNIT_INSTANCE.breadthFirstSimple((n)-> traversal.add(n.getValue()), 3);

    assertEquals(breadthFirst, traversal);
}

As a first pass at this problem, I think the simplest approach involves using lists to express what you do when you visualize traversal of a tree in breadth-first order. Imagine we’re given an arbitrary level of a tree. Given a list of nodes at that level, visit each one in the list. Once done, construct a new list of the ordered children of each of those nodes, and pass that list into the same method recursively. Note that with this approach, the public method can send the root node to the recursive method as a singleton list. Importantly: This recursive method call will never return unless there is an exit criteria. In the sample here, we use the tree depth as the exit criteria.

public void breadthFirstSimple(Consumer visitor, int depth) {
    breadthFirstSimple(visitor, Arrays.asList(this), depth);
}

protected void breadthFirstSimple(Consumer visitor, List supplier, int depth) {

    if(depth <= 0) {
        return;
    }

    for(PointNode node : supplier) {
        visitor.accept(node);
    }

    List subTrees = new ArrayList<>();
    for(PointNode node : supplier) {
        subTrees.addAll(Arrays.asList(node.getLeftChild(), node.getRightChild()));
    }

    breadthFirstSimple(visitor, subTrees, depth-1);

}

Using Lambdas

With Java 8 we can use lambdas to implement the above idea in a functional style. The unit test will look the same.

@Test
public void breadthFirst() throws Exception {

    List traversal = new ArrayList<>();
    PointNode.UNIT_INSTANCE.breadthFirst((n)-> traversal.add(n.getValue()), 3);

    assertEquals(breadthFirst, traversal);
}

Similarly to the approach above, we pass in a list of nodes but this time in the form of a Stream instead of a List. The Stream can visit each node with Stream.peek(), map each node to a stream of that node’s children, and reduce those child streams into a new single stream. Once all of the original nodes have been visited, we are left with a Stream of all the child nodes in breadth-first order. This approach constructs the child data structure and passes it along recursively after visiting all of the original nodes like we did above. But this time the child Stream is constructed as we visit each node instead of after we have visited all nodes. Again, because this is recursive we need an exit criteria.

public void breadthFirst(Consumer visitor, int depth) {
    breadthFirst(visitor, Stream.builder().add(this).build(), depth);
}

protected void breadthFirst(Consumer visitor, Stream supplier, int depth) {

    if(depth == 0) {
        return;
    }

    // want to accumulate child nodes into a stream
    Stream childStream = supplier.peek(visitor)
                                            .map((node) -> node.streamChildren())
                                            .reduce(Stream.empty(), Stream::concat);

    breadthFirst(visitor, childStream, depth - 1);
}

public Stream streamChildren() {
    return Stream.builder().add(this.getLeftChild()).add(this.getRightChild()).build().sequential();
}

Using Streams

The functional approach is nice, but suffers from a higher memory requirement, because nodes at each level are retained for each call in the stack. Also the caller needs to know how many levels deep it wants to go, requiring the caller to have knowledge of the tree structure to know how many nodes it can process. If we could return a Stream that iterates in breadth-first order, we would eliminate that requirement and allow for more Stream-oriented operations on the nodes we visit.

This is my favorite approach: a Stream whose elements are the ordered breadth-first traversal of the tree. This allows us to limit the number of returned elements directly and use the other useful methods on Stream. A test might look like this:

@Test
public void stream() throws Exception {

    List traversal = PointNode.UNIT_INSTANCE
            .stream()
            .map(n -> n.getValue())
            .limit(breadthFirst.size())
            .collect(Collectors.toList());

    assertEquals(breadthFirst, traversal);
}

To construct such a Stream, we need to implement a Spliterator (the iterator that backs up a Stream object). If you read the code below, you’ll notice that it takes the functional approach from above one step further. As it traverses each node in the stream, it does not build a new stream and reduce it into a stream of child nodes. Instead it bypasses the construction of a child stream and directly deposits the child nodes farther down in the current stream. The use of a Queue is very important in this case: we need to be able to deposit nodes farther down so that they will be encountered in order after all of the nodes at the current level have been visited in order.

This approach also uses less memory because it does not retain a reference to previously visited nodes.

public Stream stream() {
    return StreamSupport.stream(new BreadthFirstSpliterator(this), false);
}

protected static class BreadthFirstSpliterator extends AbstractSpliterator {

    private final Queue nodes = new LinkedBlockingQueue<>();

    public BreadthFirstSpliterator(PointNode root) {
        super(Long.MAX_VALUE, ORDERED | DISTINCT | NONNULL | IMMUTABLE);
        nodes.add(root);
    }

    @Override
    public boolean tryAdvance(Consumer action) {
        PointNode current = nodes.remove();
        action.accept(current);
        nodes.addAll(Arrays.asList(current.getLeftChild(), current.getRightChild()));
        return true;
    }
}

In the future we can cover other kinds of traversal, or other kinds of trees. Hopefully this has given you some insight into tree traversal!

Leave a comment

Filed under Software Engineering

Infinite Data Structures

I recently read an article about stream processing. The concept of stream-oriented algorithms is interesting in its own right, but today we’ll be talking about data structures. The data structure presented there was a stream, which we can think of as an array of infinite length – an infinite data structure.

Thinking about arrays of infinite length leads us to ask other questions, like: “What other infinite data structures are there? If so, what might they look like and how can we operate on them?”

One structure that we all know and love is the Tree. It turns out there is an infinite tree data structure: the “Infinite Complete Binary Tree”. These turn out to be more useful as mathematical constructs than as representations of user data, but they are interesting in their own right. Besides, familiarity with them may give us insight into dealing with finite but arbitrarily large trees that do represent real-world data. Let’s build ourselves an infinite tree!

From a mathematical point of view, we can consider the set of all points of the form (m/2^n). This is easily constructed recursively by bisecting the closed interval [0,1] and recursively bisecting the resulting bisections. To construct such a set of points using a tree, each node would accept a range, be able to calculate its value as the midpoint of that range, and pass a bisected range to construct each child node. The result is a binary tree with an infinite number of nodes.

(As a side note, this kind of thing has been done before. The tree described above is just a subset of the Stern-Brocot tree. https://en.wikipedia.org/wiki/Stern%E2%80%93Brocot_tree)

In the spirit of TDD, let’s look at a simple test first. The values and the childrens’ values should give you an idea of what we are constructing here.

public class PointNodeTest {

    private final PointNode root = new PointNode(Range.closed(BigDecimal.ZERO, BigDecimal.TEN));

    @Test
    public void testTraversal() throws Exception {

        assertEquals(new BigDecimal(5), root.getValue());
        assertEquals(new BigDecimal(2.5), root.getLeftChild().getValue());
        assertEquals(new BigDecimal(7.5), root.getRightChild().getValue());
        assertEquals(new BigDecimal(1.25), root.getLeftChild().getLeftChild().getValue());
    }
}

Finally, here is some code that satisfies that test. Once thing you might notice is that the child nodes are lazily instantiated – as they must be – but are not cached. While child objects might be different by reference, they are the same by value if we were to implement equals() and hashcode(). While we could cache the values, that would increase the complexity of the code and at this point I just wanted to illustrate the algorithm.

import com.google.common.collect.Range;
import java.math.BigDecimal;

public class PointNode {

    private static final BigDecimal TWO = new BigDecimal(2);
    public  static final PointNode  UNIT_INSTANCE = new PointNode(Range.closed(BigDecimal.ZERO, BigDecimal.ONE));

    private final Range range;
    private final BigDecimal value;

    public PointNode(Range _range) {
        range = _range;
        value = range.lowerEndpoint().add((range.upperEndpoint().subtract(range.lowerEndpoint())).divide(TWO));
    }

    public PointNode getLeftChild() {
        return new PointNode(Range.closed(range.lowerEndpoint(), value));
    }

    public PointNode getRightChild() {
        return new PointNode(Range.closed(value, range.upperEndpoint()));
    }

    public BigDecimal getValue() {
        return value;
    }
}

Some other questions we can ask: How can we construct the Stern-Brocot tree? Can we construct a tree to represent the set of points in the Cantor set? A larger question is: How can we traverse such a tree? Obviously depth-first search is not an option! Hopefully we will get a chance to answer these questions in a subsequent post!

Leave a comment

Filed under Software Engineering

Code Puzzle: Anagram Detector with Lambdas in Java 8

A while ago we looked at some different approaches to detect anagrams given two strings. In our last detector we constructed a frequency map to determine the character counts of the input strings. This was much more efficient than earlier attempts in that it allowed us to construct a map with a single scan per string.

With Java 8 coming out early next year, now is a good time to revisit this problem with Lambdas! Let’s rework the frequency map approach with Java 8.

The first thing to notice is that using a HashMap carries overhead of checking whether a value has already been mapped or not. This entails calculations with hashcode() and equals(). While not expensive, we can do better. At the cost of more space initially, we can use an array to represent the frequency map. The array index can be the integer value of a character, and the value in the array at that index can be the count for that character. This allows us to directly map a character without any calculation, and to always do a simple increment as long as the array is initialized to all zeros.

Now for the lambda code! Here are the relevant points.

  • String.chars() returns a Stream. The Stream object is a central concept in JDK8 Lambdas
  • Stream.collect() is a reduce operation that mutates the initial value. It’s kind of un-functional to mutate function arguments (causing side effects), but this saves us the overhead of re-creating and combining maps.
  • The initial value is a zeroed-out array, and it is modified by the accumulator
  • The resulting code is actually concise enough that we don’t need a createFrequencyMap() method like we did before.

import java.util.Arrays;
import java.util.HashMap;
import java.util.function.BiConsumer;
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;
public class LambdaDetector implements AnagramDetector {

  private static final int MAX_VALUE = 65280;

  @Override
  public boolean isAnagram(String input1, String input2) {

     // null and length checks omitted for brevity
     Supplier<Long[]> start = () -> { Long[] a = new Long[MAX_VALUE]; Arrays.fill(a, 0L); return a; };
     ObjIntConsumer<Long[]> accumulator = (map, number) -> { map[number]++; };
     BiConsumer<Long[],Long[]> combiner = (map1, map2) -> {  };

     return Arrays.equals( input1.chars().collect(start, accumulate, combine),
                           input2.chars().collect(start, accumulate, combine));

  }

}

There is a lot to JDK8 Lambdas, this sample barely scratches the surface. Next time we can look at what lambdas bring to the table, my favorites would be easy parallelism and lazy operations. Exciting times are ahead!

Leave a comment

Filed under Software Engineering

Using IE For Fun And Profit

Just Kidding. The title should actually be “Using IE Because You Have To.”

Internet Explorer (IE) is almost always required to be a supported browser for today’s web applications, but many technology workers develop on Mac or Linux where IE is not immediately available. There’s nothing fun or profitable about coding for IE, but with a little help from our friends at Microsoft, at least it’s less painful to run IE on non-windows platforms.

Microsoft provides virtual machine images of different versions of Windows and IE, and you can run these on the platform of your choice. Let’s begin!

  1. You can download them virtual machine images here: http://www.modern.ie/en-us/virtualization-tools#downloads. These are large multi-gigabyte downloads so you might want to download them overnight.
  2. You need to install an unrar tool and Oracle’s VirtualBox to run these so run the following commands to install them and unzip the rar files.
    sudo apt-get install virtualbox unrar
    unrar e [filename.part1.sfx]
    
  3. import the resulting .ova file (virtualbox -> file -> import appliance -> open appliance)
  4. accept defaults and finish the import. This takes a while, do not be alarmed! Feel free to delete all the .rar files if you want to. You probably want to.
  5. Make changes machine settings
    • After installation it helps to have the clipboard shared between the guest and host machines. In VirtualBox Manger, click on Machine -> Settings -> General -> Advanced
      Set Shared Clipboard to “Bidirectional”
      Set Drag n Drop to “Bidirectional”
      Click OK
    • After installation, we need to ensure that the Network Adapter is configured correctly.
      Open up the “Oracle VM VirtuaBox Manager” (or leave it opened after the VirtualBox installation guide).
      Right click on the new Virtual Machine
      Select Network on the left hand side
      Enable Adapter 1 by ensuring that the “Enable Network Adapter” check box is checked
      change the “Attached to:” drop down to select “NAT”
    • Correct any “Non-optimal settings” noted at the bottom of the dialog, such as insufficient video RAM.
  6. install the guest additions: from a running machine, go to Devices -> Install Guest Additions… (this requires a restart)
  7. Now start up the VM and open up IE!

The Windows XP VM expires after 30 days (the others expire after 90 days of uptime). If you have already installed the IE VM and you want to reset it, use VirtualBox to delete the VM (files and all). You should still have the .ova file available from the initial installation. Use VirtualBox to just import this file as an appliance, and reset the settings as described above. Voila, your machine is recreated!

As a final note, there are Vagrant VM’s available for use which would probably ease the download and installation of the VM. The only difference is that you would still have to install guest additions yourself. https://github.com/xdissent/ievms

Leave a comment

Filed under Software Engineering