Monthly Archives: June 2013

Code Puzzle: String Reversal

Here is a question: Given an input String, how do you reverse it? I have seen this used as an interview question during technical interviews. There are myriad ways to implement this, so asking someone to solve this problem gives you some insight into the kinds of solutions they would look to first. Hopefully this can then be used as a proxy to see whether someone fits in with your technical culture or not, but the consensus lately seems to be that technical interviews are not really that useful.

But just for fun, let’s go ahead and play that game. An interface would look something like this:

public interface StringReverser {
   String reverse(String input);
}

What could you tell about someone based on their solution?

Those who like to rely on the standard libraries and have a head for concise solutions might first think of using StringBuffer.

public class SimpleStringReverser implements StringReverser {
    @Override
    public String reverse(String input) {
        return input == null ? input : new StringBuffer(input).reverse().toString();
    }
}

Those who like to bring in functionality from external libraries might first think of Apache Commons.

import org.apache.commons.lang.ArrayUtils;

public class StringReverserArrays implements StringReverser {
    @Override
    public String reverse(String input) {
        if (input == null) {
            return null;
        }
        char[] chars = input.toCharArray();
        ArrayUtils.reverse(chars);
        return new String(chars);
    }
}

Those who eschew primitives out of some idea of object purity might hide the use of primitives altogether.

import org.apache.commons.lang.ArrayUtils;

public class HeavyStringReverser implements StringReverser {
   @Override
    public String reverse(String input) {
        if (input == null) {
            return null;
        }
        Character[] characters = ArrayUtils.toObject(input.toCharArray());
        List<Character> characterList = Arrays.asList(characters);
        Collections.reverse(characterList);
        return new String(ArrayUtils.toPrimitive((Character[]) characterList.toArray()));
    }
}

Finally, those who like to have a little fun with it and are comfortable with recursion might bring up a recursive solution. There are performance issues here of course, the same performance issues that we saw with the recursive solution for detecting palindromes.

public class RecursiveStringReverser implements StringReverser {
    @Override
    public String reverse(String input) {
        if (input == null) {
            return null;
        }
        if (input.length() <= 1) {
            return input;
        }
        StringBuffer buffer = new StringBuffer();
        buffer.append(input.substring(input.length() - 1, input.length()));
        String substring = input.substring(0, input.length() - 1);
        buffer.append(reverse(substring));
        return buffer.toString();
    }
}

Finally, of course all this should be unit tested, and a parameterized test is a great tool for this.

The fun doesn’t stop there. We can take it a little further and look for more creative solutions with questions like “What if the string was in a 3GB file and we couldn’t load the whole thing into memory?” Again, this question is left as an exercise for the reader or for a future blog post 🙂

What other variations on the string reversal problem can we ask? Can you think of other solutions?

Leave a comment

Filed under Software Engineering

Code Puzzle: Palindrome Detector

The palindrome problem is a fun little programming exercise. Given a string, how can you determine that it is a palindrome (the same whether spelled forwards or backwards)?

There are a few solutions that come to mind.

  • loop solution: check each character against the character at the mirror position at the “other end” of the string
  • recursive solution: compare the first and last characters of the string, and pass the rest of the string into the same method, returning true until the comparison fails
  • reversal solution: reverse the string and compare the entire reversed version with the original. For a more performant variation, reverse the second half of the original and compare it with the first half.
  • hash solution: reverse the second half of the original string and compare its hash with the hash of the first half. Mathematically this can produce false positives but not false negatives, so maybe this is more accurate to call a non-palindrome detector. It’s probably not the best solution, but still a creative idea.

The first (available on github) is what I think of as the default palindrome detector. Essentially you just loop over the string, checking each character as you move from the beginning/end of the string to the center of the string.

public boolean isPalindrome(String input) {

   // only need to loop over half of the string because second half is checked at the same time as the first
   // also can break if you detect it's not a palindrome in which case don't need to check the rest
   int halfLength = input.length()/2;
   for(int i=0; i < halfLength; i++) {
      if(input.charAt(i) != input.charAt(input.length()-1-i)) {
         return false;
      }
   }
   return true;
}

The second one (also available on github) is a nice little example of a recursive algorithm. One issue with it is that String.substring() returns a new String, so you could quickly run out of memory if the String is very large. One way around that is for the method to pass the original String in the recursive calls so no new memory is allocated, and instead recurse on the indices you’re checking.

public boolean isPalindrome(String input) {

   // base case for recursion:
   // length 1 for odd-length starting strings, length 0 for even-length starting strings
   if(input.length() == 1 || input.isEmpty()) {
      return true;
   }

   if(input.charAt(0) == input.charAt(input.length()-1)) {
      return isPalindrome(input.substring(1,input.length()-1));
   }

   return false;
}

Finally, of course we would want to unit test all of this. Parameterized tests fit the bill quite well for something like this.

The fun with Palindromes doesn’t end here. We could ask other questions like “What if the string was in a 3GB file and we couldn’t load the whole thing into memory?” This question is left as an exercise for the reader or for a future blog post 🙂

What other variations on the palindrome problem can we ask? Can you think of other solutions?

Leave a comment

Filed under Software Engineering

Connection Pools And Validation Queries

Let’s say you start working on your application first thing in the morning. The first action you perform with the application results in a stack trace like this, but subsequent calls succeed so it’s not easily replicable. Is this a good way to start your day?

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 32,308,627 milliseconds ago.
The last packet sent successfully to the server was 1 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
...
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2552)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3002)
... 58 more

I saw a question about this on stackoverflow the other day and thought it was worth answering because I had just dealt with the same issue myself. Here is what’s happening…

Consider the following flow of events:

  1. A database connection is requested and used by the caller (application or connection pool)
  2. The caller keeps a reference to it so that the connection can be re-used (connections are expensive to create)
  3. The caller goes through a period of inactivity (for example, a dev system overnight or a QA system over the weekend).
  4. Once that database connection is not in use, the database considers the connection to be idle. Because it is idle, after a certain amount of time (MySQL default is 8 hours) the database closes the connection.
  5. The caller (application or connection pool) still has a handle to the connection, and when the caller tries to use the connection again, unpleasantly discovers that connection has been closed.
  6. The connection pool now knows that the connection has been closed, and opens a new connection for subsequent calls (so subsequent calls succeed)

It turns out this is not an uncommon situation. Some variation of the question about the nature of this situation gets asked over and over and over and over again.

There are a few solutions, each with their own advantages and disadvantages:

  1. Use a connection pool manager like C3P0 or DBCP, and configure it with a validation query to validate connections on request. This would be a property you set on the  DataSource (like DBCP’s BasicDataSource). The reason testing the validity of the connection works is that we are instructing the calling system to test the connection for this situation and to try again if this situation happens. An advantage is that this requires no code, just app configuration. However, it requires more calls even for valid connections, although in practice this is trivial.
  2. Set the global wait_timeout property in mysql large enough for your use case (default is 8 hrs). Maximum value is 31536000 seconds (one year!). The advantage of this is that no code change is required, and it is potentially the most performant. However, this comes at a cost. We have to remember to configure the database this way, and to configure it correctly. We need to balance the timeout against usage so we don’t have too many idle connections.
  3. Setup a scheduled task that refreshes connections, “pinging” the database server. This could work fine, but it requires extra code and requires the application to know the timeout configuration of the database so it can ping more often than the timeout. This means that the configuration of the app with the configuration of the database is more tightly linked.

To experiment that a potential fix works, we can trigger the error if we set the global wait_timeout on mysql (using /etc/mysql/my.conf and restarting mysql) to something much less. Set it to, say, 30 seconds, and check the value with mysql> show global variables; Now we can replicate the error at will and show that a fix works.

Setting the validation query on the connection pool seems to be a pretty standard resolution to this problem, and it’s the one I’ve settled on. It seems to provide a good balance of less code and looser coupling between database configuration and application configuration.

One might ask “Why do I need to check the connection before using it? Can my application just close the connection after every call?” If you are opening and closing database connections yourself, yes you should eventually close the connection if you’re not using it. But usually you don’t want to be opening and closing database connections yourself, you’ll want to use a connection pool. And if the connection pool is hanging on to the connection, you do NOT want it to close the connection after every call. The point of the connection pool is to keep a connection ready for the next call because creating connections is expensive. The connection objects maintained by the pool are backed by actual database connections, and the database is the one who closes that actual connection after the idle timeout period. Note that the timeout to close idle connections is configured on the database, not on the connection pool. Because of this, the pool has no way of knowing whether the connection has been closed or not unless it actually tries to connect with it. That’s why you need a validation query.

For more information about configuring DBCP, see the configuration page and the API docs.

Leave a comment

Filed under Software Engineering

Propagating Best Practices in Code Through Communities of Practice

In a previous post about propagating best practices in code, we confronted the question “How do we build up best practices or established ways of doing things in the code?” The answer was a variety of communication techniques: code reviews, learning sessions, mentoring, pair programming, and documentation.

We can add another item to that list of knowledge management techniques: Communities of Practice (CoP). The wikipedia definition of Community of Practice is very general: “a group of people who share a craft and/or a profession.” If we wanted a CoP for purposes of propagating best practices in code, we could refine it to say that we want “a group of people who share a codebase, the group being created specifically with the goal of sharing knowledge related to that codebase.”

Imagine a group of developers for a project coming together on a regular basis to discuss the codebase. If people talked about their current work or gave short tutorials on different parts of the system, they would quickly eliminate the scattered word-of-mouth knowledge about how things work or how to make certain kinds of changes. Other topics could include issues in maintenance, upcoming architectural decisions with risks and mitigations, current work on a particular component and how that relates to the rest of the code, reviews of a component to bring it in line with existing best practices, and so on.

There are known ways to encourage the success of your CoP. They are listed on wikipedia but I’ve repeated them here with suggestions for practical implementation in this context.

  1. Design the community to evolve naturally: The group should be able to shift direction and focus as needed. Making this an explicit goal and allocating time for meta discussion on the group would be a practical way to encourage natural evolution of the group.
  2. Create opportunities for open dialog within and with outside perspectives: There should be free exchange of ideas between members of the team who have been there the shortest and the longest. Newer members have outside perspectives that can be brought up to compare with the inside perspectives, resulting in cross-pollination of the best ideas.
  3. Welcome and allow different levels of participation: There will naturally be high content contributors and low content contributors (who are there to absorb content from high contributors). Both levels are necessary for the group to effectively share information. To implement this, not everybody should be forced to contribute content. Contribution should be completely voluntary.
  4. Develop both public and private community spaces: If there is an online component of the group, private messaging should be possible. Regular private meetings should be encouraged.
  5. Focus on the value of the community: Again, meta analysis is important. If the group is not productive as it stands, it should shift focus (per previous note) or be disbanded altogether. The purpose is to create value, not to have unnecessary meetings.
  6. Combine familiarity and excitement: Besides review of the familiar existing code, there should be brainstorming about new and exciting ideas for the technical direction of the codebase.
  7. Find and nurture a regular rhythm for the community: A regularly scheduled meeting instead of ad-hoc meetings would be good approach. Weekly would be a good place to start, possibly moving to bi-weekly if there is not enough content.

In summary: a regularly scheduled meeting of developers to just talk about the codebase goes a long way to propagating best practices. If you start a CoP the ideas here should give you a good place to start!

Leave a comment

Filed under Software Engineering

Some Basic Caching Notes

This is just a quickie post with some notes about caching. The subject has been covered in great detail elsewhere (see links below) so I will leave the heavy discussions to them, but wanted to point some things out.

Caching strategies can form an important part of the performance and scalability of your system. Caching can also cause problems when it is not stored or cleared correctly: cache invalidation is a hard problem.

There are multiple levels to use cache in a system (all of which can be cached and cleared):

  1. session storage (HTML5, client side application level cache)

  2. browser cache

  3. spring cache (service layer)

  4. Hibernate L2 cache (data layer)

For RESTful web apps, the argument can be made that we should consider the browser to be the cache, and the application should be in the business of computing cache keys, not actually caching. More on that with the discussion here (this is my handy link of the week).

If computing ETags is what you want to do there is a way to use a servlet filter to compute ETags. This saves on bandwidth but not on computation. The discussion above notes: in an era of mobile phones with iffy connections, it’s often worth it for the end-user experience to go compute a complete response just to tell the caller 304 NOT MODIFIED

Finally, here is another more sophisticated etag technique with Spring.

Again, this is just a quick summary of caching levels and some handy links. We may go into more detail with some of this later 🙂

Leave a comment

Filed under Software Engineering