I recently read an article about stream processing. The concept of stream-oriented algorithms is interesting in its own right, but today we’ll be talking about data structures. The data structure presented there was a stream, which we can think of as an array of infinite length – an infinite data structure.

Thinking about arrays of infinite length leads us to ask other questions, like: “What other infinite data structures are there? If so, what might they look like and how can we operate on them?”

One structure that we all know and love is the Tree. It turns out there is an infinite tree data structure: the “Infinite Complete Binary Tree”. These turn out to be more useful as mathematical constructs than as representations of user data, but they are interesting in their own right. Besides, familiarity with them may give us insight into dealing with finite but arbitrarily large trees that do represent real-world data. Let’s build ourselves an infinite tree!

From a mathematical point of view, we can consider the set of all points of the form (m/2^n). This is easily constructed recursively by bisecting the closed interval [0,1] and recursively bisecting the resulting bisections. To construct such a set of points using a tree, each node would accept a range, be able to calculate its value as the midpoint of that range, and pass a bisected range to construct each child node. The result is a binary tree with an infinite number of nodes.

(As a side note, this kind of thing has been done before. The tree described above is just a subset of the Stern-Brocot tree. https://en.wikipedia.org/wiki/Stern%E2%80%93Brocot_tree)

In the spirit of TDD, let’s look at a simple test first. The values and the childrens’ values should give you an idea of what we are constructing here.

public class PointNodeTest { private final PointNode root = new PointNode(Range.closed(BigDecimal.ZERO, BigDecimal.TEN)); @Test public void testTraversal() throws Exception { assertEquals(new BigDecimal(5), root.getValue()); assertEquals(new BigDecimal(2.5), root.getLeftChild().getValue()); assertEquals(new BigDecimal(7.5), root.getRightChild().getValue()); assertEquals(new BigDecimal(1.25), root.getLeftChild().getLeftChild().getValue()); } }

Finally, here is some code that satisfies that test. Once thing you might notice is that the child nodes are lazily instantiated – as they must be – but are not cached. While child objects might be different by reference, they are the same by value if we were to implement equals() and hashcode(). While we could cache the values, that would increase the complexity of the code and at this point I just wanted to illustrate the algorithm.

import com.google.common.collect.Range; import java.math.BigDecimal; public class PointNode { private static final BigDecimal TWO = new BigDecimal(2); public static final PointNode UNIT_INSTANCE = new PointNode(Range.closed(BigDecimal.ZERO, BigDecimal.ONE)); private final Range range; private final BigDecimal value; public PointNode(Range _range) { range = _range; value = range.lowerEndpoint().add((range.upperEndpoint().subtract(range.lowerEndpoint())).divide(TWO)); } public PointNode getLeftChild() { return new PointNode(Range.closed(range.lowerEndpoint(), value)); } public PointNode getRightChild() { return new PointNode(Range.closed(value, range.upperEndpoint())); } public BigDecimal getValue() { return value; } }

Some other questions we can ask: How can we construct the Stern-Brocot tree? Can we construct a tree to represent the set of points in the Cantor set? A larger question is: How can we traverse such a tree? Obviously depth-first search is not an option! Hopefully we will get a chance to answer these questions in a subsequent post!