Connection Pools And Validation Queries

Let’s say you start working on your application first thing in the morning. The first action you perform with the application results in a stack trace like this, but subsequent calls succeed so it’s not easily replicable. Is this a good way to start your day?

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 32,308,627 milliseconds ago.
The last packet sent successfully to the server was 1 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
...
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2552)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3002)
... 58 more

I saw a question about this on stackoverflow the other day and thought it was worth answering because I had just dealt with the same issue myself. Here is what’s happening…

Consider the following flow of events:

  1. A database connection is requested and used by the caller (application or connection pool)
  2. The caller keeps a reference to it so that the connection can be re-used (connections are expensive to create)
  3. The caller goes through a period of inactivity (for example, a dev system overnight or a QA system over the weekend).
  4. Once that database connection is not in use, the database considers the connection to be idle. Because it is idle, after a certain amount of time (MySQL default is 8 hours) the database closes the connection.
  5. The caller (application or connection pool) still has a handle to the connection, and when the caller tries to use the connection again, unpleasantly discovers that connection has been closed.
  6. The connection pool now knows that the connection has been closed, and opens a new connection for subsequent calls (so subsequent calls succeed)

It turns out this is not an uncommon situation. Some variation of the question about the nature of this situation gets asked over and over and over and over again.

There are a few solutions, each with their own advantages and disadvantages:

  1. Use a connection pool manager like C3P0 or DBCP, and configure it with a validation query to validate connections on request. This would be a property you set on the  DataSource (like DBCP’s BasicDataSource). The reason testing the validity of the connection works is that we are instructing the calling system to test the connection for this situation and to try again if this situation happens. An advantage is that this requires no code, just app configuration. However, it requires more calls even for valid connections, although in practice this is trivial.
  2. Set the global wait_timeout property in mysql large enough for your use case (default is 8 hrs). Maximum value is 31536000 seconds (one year!). The advantage of this is that no code change is required, and it is potentially the most performant. However, this comes at a cost. We have to remember to configure the database this way, and to configure it correctly. We need to balance the timeout against usage so we don’t have too many idle connections.
  3. Setup a scheduled task that refreshes connections, “pinging” the database server. This could work fine, but it requires extra code and requires the application to know the timeout configuration of the database so it can ping more often than the timeout. This means that the configuration of the app with the configuration of the database is more tightly linked.

To experiment that a potential fix works, we can trigger the error if we set the global wait_timeout on mysql (using /etc/mysql/my.conf and restarting mysql) to something much less. Set it to, say, 30 seconds, and check the value with mysql> show global variables; Now we can replicate the error at will and show that a fix works.

Setting the validation query on the connection pool seems to be a pretty standard resolution to this problem, and it’s the one I’ve settled on. It seems to provide a good balance of less code and looser coupling between database configuration and application configuration.

One might ask “Why do I need to check the connection before using it? Can my application just close the connection after every call?” If you are opening and closing database connections yourself, yes you should eventually close the connection if you’re not using it. But usually you don’t want to be opening and closing database connections yourself, you’ll want to use a connection pool. And if the connection pool is hanging on to the connection, you do NOT want it to close the connection after every call. The point of the connection pool is to keep a connection ready for the next call because creating connections is expensive. The connection objects maintained by the pool are backed by actual database connections, and the database is the one who closes that actual connection after the idle timeout period. Note that the timeout to close idle connections is configured on the database, not on the connection pool. Because of this, the pool has no way of knowing whether the connection has been closed or not unless it actually tries to connect with it. That’s why you need a validation query.

For more information about configuring DBCP, see the configuration page and the API docs.

Advertisements

Leave a comment

Filed under Software Engineering

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s