Monthly Archives: May 2013

Sanitizing User Input, Part II (Validation with Spring REST)

In Part I of sanitizing user input, we looked at the how, why, and when of sanitization. In Part II, we will look at one technique for checking input at the moment it arrives, which reduces sanitization to a validation problem. For the context of this post, we will be using Spring MVC as the RESTful endpoint, with domain objects automatically unmarshalled from a JSON request body.

The problem with sanitizing user input is where to do the sanitizing. It’s generally a good idea to validate input as early as possible, so it makes sense to do something like this in the controller layer. We could grab the domain object given in the controller method argument, extract all the strings we care about, and validate them one by one. This might be straightforward and easy to implement for one or two domain objects, but for a respectable system of any size, this approach would not scale very well. Let’s take a different approach.

Imagine if you will, a world in which we can detect malicious input and reject it as invalid just as we would any other invalid input. Sanitizing would then simply be a validation problem. Importantly, validation is a solved problem with known frameworks. In this case we can leverage JSR 303 and Spring’s support for applying validation annotations to controller arguments.

Here are the steps to accomplish this:

  1. Define an annotation to be applied to an Entity’s field. This annotation will be linked to a validation implementation.
  2. Define a custom validation constraint to do the actual safety check. Here we can leverage the OWASP Java HTML Sanitizer: the sanitizer cleans the string, and the validation check simply confirms that the sanitized version is the same as the original (showing that it contains nothing that violates your security policy).
  3. Apply the validation annotation to the Entity’s field.
  4. Apply the @Valid annotation to the Controller argument, include the BindingResult in the Controller method, and check it inside the method for finer control over sanitized input failures.
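
Before wiring in the real libraries, the core idea behind the steps above can be sketched with nothing but the JDK. Everything here is a stand-in: `PlainTextOnly`, `Profile`, and `PlainTextChecker` are hypothetical names, the reflective loop only imitates what a Bean Validation provider does for you, and `stripTags` is a deliberately crude placeholder for a real sanitizer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation, standing in for the @NoHtml constraint.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface PlainTextOnly {}

// Hypothetical entity with one annotated field.
class Profile {
    @PlainTextOnly
    String displayName;

    Profile(String displayName) {
        this.displayName = displayName;
    }
}

class PlainTextChecker {

    // Stand-in for a real sanitizer: crudely strips anything that looks like a tag.
    // NOT safe for production; use a vetted library instead.
    static String stripTags(String input) {
        return input.replaceAll("<[^>]*>", "");
    }

    // A value is considered valid when sanitizing it changes nothing.
    static boolean isClean(String value) {
        return value == null || stripTags(value).equals(value);
    }

    // Returns the names of annotated String fields whose values fail the check.
    static List<String> violations(Object bean) {
        List<String> failed = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(PlainTextOnly.class) && field.getType() == String.class) {
                try {
                    field.setAccessible(true);
                    if (!isClean((String) field.get(bean))) {
                        failed.add(field.getName());
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return failed;
    }
}
```

Calling `PlainTextChecker.violations(new Profile("<script>alert(1)</script>"))` reports the `displayName` field, while a plain name passes. The real steps below swap the stand-ins for JSR 303 and the OWASP sanitizer.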

Ok, step 1: Define the annotation to be applied to the field. This is part of the standard Validation API. Note the @Constraint annotation applied to @NoHtml, and how it references a ConstraintValidator.

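import java.lang.annotation.*;
import javax.validation.Constraint;
import javax.validation.Payload;

@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME) // must be visible at runtime for the validator to find it
@Constraint(validatedBy = NoHtmlValidator.class)
public @interface NoHtml {
    // TODO use a better message, look up
    String message() default "{org.myproject.constraints.nohtml}";
    Class<?>[] groups() default {};
    Class<? extends Payload>[] payload() default {};
}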

Step 2: The validation to be applied to fields with the specified annotation would look like this. Note that this class implements ConstraintValidator, and fits in with the standard Validation API. We can refine the behavior of this implementation by providing annotation values, and accessing them in the ConstraintValidator’s initialize() method.

import javax.validation.ConstraintValidator;
import javax.validation.ConstraintValidatorContext;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class NoHtmlValidator implements ConstraintValidator<NoHtml, String> {

   // The builder is not thread safe, but the PolicyFactory it produces is
   // immutable, so a single shared instance is fine. An empty policy allows no HTML.
   private static final PolicyFactory DISALLOW_ALL = new HtmlPolicyBuilder().toFactory();

   @Override
   public void initialize(NoHtml constraintAnnotation) {
      // TODO specify the policy as an annotation attribute
      // to use them, values from annotation are stored in private properties here
   }

   @Override
   public boolean isValid(String value, ConstraintValidatorContext context) {
      if (value == null) {
         return true; // leave null checks to @NotNull
      }
      // Valid only if sanitizing changes nothing. Note that the sanitizer also
      // escapes characters such as '&', so text containing them is rejected too.
      String sanitized = DISALLOW_ALL.sanitize(value);
      return sanitized.equals(value);
   }
}

Step 3: Apply the custom validation constraint to the Entity. For instance, if we had a User class, we can add the @NoHtml annotation to its display name because we know that it can be displayed on pages to other users.

public class User {

   @NoHtml
   @Size(min = 3, message = "must be at least three characters")
   private String displayName = "";

   // getters and setters omitted
}

Step 4: At this point, if we attempt to save an object with this validation annotation and the annotated field contains HTML, the persistence will fail with a validation exception. This is a good first step. We can now apply input validation at any point in our domain without duplicate or excess code in our domain classes. The validation is self-contained and easy to test.

However, the exception thrown is a validation exception that is not discovered until Hibernate tries to commit the object. On principle, it would be nice to have the input checked as early as possible, at the controller layer. At that point we could also prevent the persistence-level exception and control the response going back to the client directly.

To do this, we need to add @Valid to the incoming object which was unmarshalled by Spring, and include the BindingResult in the method signature. With those in place we can detect validation errors (including sanitizing checks and all other validation) and return a more appropriate response. In the sample below, an IllegalArgumentException is thrown and a Spring exception handler (not shown) resolves that to a 400 response. There are other ways to accomplish this as well, but the point here is to show where and how the validation error can be intercepted in the controller.

@Controller
public class UserResource {

   private final UserService userService;

   public UserResource(UserService us) {
      userService = us;
   }

   @RequestMapping(value="/create", method=RequestMethod.POST, produces={"application/json"})
   public @ResponseBody Resource<User> createUser(@RequestBody @Valid User newUser, BindingResult binding) throws UnsupportedEncodingException {
      if (binding.hasErrors()) {
         // catch this and return a 400 response from your general handler
         throw new IllegalArgumentException();
      }

      // create new User here and return it...
   }
}

And there we have it! Input sanitization solved as an input validation problem! Hopefully this is helpful to you. And if you remember nothing else from this post, remember: Assume All User Input Is Evil! 

Filed under Software Engineering

Sanitizing User Input, Part I

Many years ago I took a secure coding class. I mainly remember one thing from the course: “Assume all user input is evil.” This is fine because the instructors did say “If you only remember one thing from the course, remember this!”

What can a user do with input alone? Let’s say you are a malicious user of a web forum. You could create an account on the forum and set your display name to “<script>window.alert('surprise!');</script>”. After registration, a script tag containing JavaScript is stored in the database where your username should be. With that in place, any time another user of the forum loads a page where your username is displayed (say, on any of your comments), your custom JavaScript will execute in their browser!

This is a Bad Thing because usually a malicious user would not just pop up “surprise” but instead use that snippet of javascript to, say, grab all the cookies in your browser and send them to said user for nefarious purposes. This is called Cross Site Scripting (XSS) and works best on pages that are rendered on the server because the script is always loaded by the browser that way. The script might not be run if it’s added to the page after the page is loaded by an AJAX call. But an AJAX application can still be susceptible, for example if the username is added to the DOM like this:

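// bad code that is susceptible to XSS
// JSON object returnedUser.displayName holds attacker-supplied markup, e.g.
// "<img src=x onerror=\"window.alert('surprise!')\">"
// (a <script> tag assigned through innerHTML is not executed, but event handlers like this one are)
var newdiv = document.createElement('div');
newdiv.innerHTML = returnedUser.displayName;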

Sometimes developers in an overzealous commitment to security decide to html encode all input, or strip all special characters such as “<” and “>”. There are worse things to be overzealously committed to, but we can do better.

Instead of sanitizing every field every time, we can say that the check depends on what the content is supposed to represent and how it will be displayed. For example, in a user display name, a link tag is probably not valid to have as part of the name, but a link tag or style tag may be very helpful and relevant in a displayed product description. For this reason, most HTML sanitizers are very flexible about how they can be configured, so that we can easily allow different kinds of HTML in different places.

Now that we’ve agreed we need to sanitize user-supplied text, the next question is when to do the sanitizing. You could make an argument to sanitize user-supplied text on input (when the text is first persisted) or on output (just before it is rendered on subsequent pages). A reason for sanitizing input is that philosophically it makes sense to catch potential security issues as early as possible, and you would avoid storing malicious input on your system at all. This takes malicious input and turns it into a validation issue, just like storing numbers for a zip code is a validation issue. A reason for sanitizing output is that the site then has the option to change sanitization policy dynamically. At some point html may be considered unsafe (say, allowing links in a product description) and later it could be considered safe (due to a business decision to allow it, or due to discovery of something you missed in your sanitization policy). If you sanitize only output, you are free to change or fix the policy whenever you want.
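
To make the output-side option concrete, here is a minimal sketch in plain Java. The names `DescriptionRenderer`, `escapeHtml`, and `setPolicy` are made up for illustration, and escaping every special character is only the strictest possible policy. The point is that the stored text is never altered, so the policy applied at render time can be replaced later.

```java
import java.util.function.UnaryOperator;

class DescriptionRenderer {

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // The policy applied at render time; it can be swapped without touching stored data.
    private UnaryOperator<String> policy = DescriptionRenderer::escapeHtml;

    void setPolicy(UnaryOperator<String> newPolicy) {
        this.policy = newPolicy;
    }

    // Stored text is kept exactly as the user typed it and is cleaned on the way out.
    String render(String storedText) {
        return "<p>" + policy.apply(storedText) + "</p>";
    }
}
```

With input-side sanitizing, by contrast, whatever was rejected or stripped at save time is gone for good, which is exactly the trade-off described above.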

Some libraries are designed to operate on input. Lacking a strong driver to be able to change the sanitization policy dynamically, I would favor sanitizing on input as well. Stay tuned for Part II where we’ll look at a slick way to sanitize input as an input validation problem!

Containerless Web Applications Part I: Introduction

The first time I ever saw a Scala program run, it was running as a standalone program with the Play Framework. There was no application server, no servlet container, in fact no container of any kind. It was a program running all by itself, just like the programs we learned to write in college. From a JEE point of view, it was odd, but strangely beautiful.

Fast forward a few years, and we see this idea of container-less deployment becoming more widespread among the Java players. There has been a trend of developers looking for simpler ways to run their applications: think about people moving from EJB 2.0 to Spring, and from heavy Application Servers to simpler servlet containers. It only makes sense that eventually more people would eschew containers altogether and embrace the concept of putting HTTP-handling inside your application instead of outside.

This approach has a few names: containerless, war-less, or “embedded jetty” (if you want to google for more on this topic). There are some disadvantages, of course… nothing in life is free. But it’s the opinion of many developers that the advantages outweigh the disadvantages.
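
As a taste of what "HTTP-handling inside your application" means, here is a minimal sketch that uses the small HTTP server bundled with the JDK (`com.sun.net.httpserver`) rather than embedded Jetty. That swap is purely to keep the example dependency-free; the class name `StandaloneApp`, the `/hello` path, and port 8080 are arbitrary choices for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class StandaloneApp {

    // Starts an HTTP server inside this process; port 0 lets the OS pick a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello from a plain Java process".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // No war file and no container: the application is just a main method.
    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Running the class from an IDE or from the command line is all the "deployment" there is, which is the property the advantages below build on.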

Let’s outline some of the advantages of containerless deployment:

  • Better IDE support: Since it’s a regular application instead of something hosted in a web server, it can be easily started and stopped by any IDE without special plugins and without attaching a server to your IDE. Additionally, profiling is a breeze, and debugging is instantaneous (no need to attach your debugger to a remote server).

  • Simplified development: Developers run the application exactly the same way in development as in production. This reduces the possibility of errors from differences between running in development and running in production. A pleasant side effect is that even between developers it’s impossible to have server-specific errors because someone ran the application on one server and someone else ran it on another.

  • Easier to deploy: Deployments are easier to reproduce. There’s no need for the maven cargo plugin, there’s no copying from here to there, and there’s no extra server configuration that you need to maintain separately from the application.

  • Ease of re-use: The application is already a jar and is easily usable as a standalone library itself.

  • Faster startup: Even for a small application, Tomcat can take a minute (or more). With an embedded server there is less for the container to do and it can start in mere seconds.

  • Fewer classloader issues: Fewer classloaders means fewer classloader issues. Conflicts between your application’s dependent libraries and the libraries distributed with your server have been known to cause bizarre and difficult-to-diagnose classloader problems.

Hopefully this whets your appetite for containerless deployment. In upcoming posts, we will review a simple architecture for a containerless application (complete with working code).

Using A Variable Depth Copy to Prevent Hibernate LazyInitializationException

What is LazyInitializationException?

I’m going to go out on a limb and say: Anybody who has ever used Hibernate has at some point triggered a LazyInitializationException. Seriously, questions about what this exception is and how to fix it pop up all the time.

What is a LazyInitializationException? In a nutshell, if you use Hibernate to load an entity from the database, and then try to access a lazily-loaded portion of that entity’s object graph from outside the database session, Hibernate will throw a LazyInitializationException. The simple answer is to just not access lazily loaded properties or collections outside of a session. However, the devil, as they say, is in the details.
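
If the mechanics are unfamiliar, the following plain-Java simulation may help. It does not use Hibernate at all: `FakeSession` and `LazyComments` are hypothetical stand-ins, and an IllegalStateException plays the role of LazyInitializationException. It only shows why the first access to a lazy collection has to happen while the session is still open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a database session; not Hibernate's Session.
class FakeSession {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }

    // Pretends to run the query that fetches the collection.
    List<String> loadComments() {
        return new ArrayList<>(Arrays.asList("first!", "nice post"));
    }
}

// A collection that defers loading until first access, like a lazy association.
class LazyComments {
    private final FakeSession session;
    private List<String> loaded; // null until initialized

    LazyComments(FakeSession session) {
        this.session = session;
    }

    List<String> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                // Hibernate would throw LazyInitializationException at this point.
                throw new IllegalStateException("could not initialize: session is closed");
            }
            loaded = session.loadComments();
        }
        return loaded;
    }
}
```

A `LazyComments` that was read once before `close()` keeps working afterwards, while one that was never touched fails on first access, which mirrors the behavior described above.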

If you’re unfamiliar with this exception and its solutions, I recommend reading up on the many options to solve this problem. The options have been covered enough and are google-able enough that I don’t need to discuss all of them here (except maybe to ask you to avoid OpenSessionInView). But what I can discuss here is my own solution to the problem, and show how my solution provides a two-for-one bundle of goodness.

What were the issues I was looking to solve?

Besides preventing this exception, first of all I was looking for a solution that would convert Hibernate objects into a DTO that looks the same as my regular entity / domain object. I investigated mapping with Dozer, but it looked a bit heavy for my needs, and I didn’t want to introduce another library unless it was actually necessary. Another option is to use a HibernateUnproxifier, but that still requires code to access each specific collection every time, and I was looking for something more general.

Secondly, I was looking for a solution that would load collections to the right depth (deep enough, or shallow enough). Sometimes I wanted just the top-level properties of an object, sometimes I wanted the object’s objects or collections, and so on. As mentioned above, the simple answer is that you can just lazily load the objects and collections you want as needed. But the devil in the details is that it quickly becomes cumbersome to modify your DAO or Service with methods to load this specific collection here and that specific collection there.

Finally, I was looking for a solution that prevents circular references.  Sometimes my domain classes had to have circular references to satisfy certain Hibernate mappings. When marshalling these objects to JSON, the conversion would break. It’s easy to introduce @JsonBackReference, but again I felt like there had to be a better way.

Copy It!

Enter the VariableDepthCopier. With this class you can copy an object coming from the database into a new instance of the same class, and specify how deep the resulting object graph should be in the new copy. This copy can then be safely passed around or marshalled to the client without worrying about LazyInitializationExceptions.

As for how to specify the copy depth: an object copied with level 0 has just the primitives and java.lang immutable classes (such as String, Number, Date, etc). Non-primitive properties will be set to null, and collections will be empty. At level 1, non-primitive properties are set and collections are filled, and those child objects and collection elements are themselves copied at the equivalent of level 0. The pattern goes on: you can copy to level 2, level 3, and so on. I’m going to make the argument that you should not return a variable depth copy of level higher than 2.
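
To illustrate the level semantics, here is a simplified sketch. It is not the actual VariableDepthCopier: it uses field reflection for brevity, it only understands Lists and Sets (no maps or arrays), it requires a no-argument constructor, and it treats "repeated" as "already on the current path", which is my assumption here. `DepthLimitedCopier` and `Category` are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DepthLimitedCopier {

    // Copies source down to the given level; deeper objects become null, deeper collections empty.
    static <T> T copy(T source, int depth) {
        Set<Object> path = Collections.newSetFromMap(new IdentityHashMap<>());
        return copy(source, depth, path);
    }

    @SuppressWarnings("unchecked")
    private static <T> T copy(T source, int depth, Set<Object> path) {
        if (source == null || !path.add(source)) {
            return null; // nothing to copy, or a cycle back to an object already being copied
        }
        try {
            Constructor<?> ctor = source.getClass().getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = (T) ctor.newInstance();
            for (Class<?> c = source.getClass(); c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    f.setAccessible(true);
                    f.set(target, copyValue(f.get(source), depth, path));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot copy " + source.getClass(), e);
        } finally {
            path.remove(source);
        }
    }

    private static Object copyValue(Object value, int depth, Set<Object> path) {
        if (value == null || isSimple(value)) {
            return value;
        }
        if (value instanceof Collection) {
            // Always a plain JDK collection, never the source's own implementation.
            Collection<Object> result =
                    value instanceof Set ? new HashSet<Object>() : new ArrayList<Object>();
            if (depth > 0) {
                for (Object element : (Collection<?>) value) {
                    Object copied = isSimple(element) ? element : copy(element, depth - 1, path);
                    if (copied != null) result.add(copied);
                }
            }
            return result;
        }
        return depth > 0 ? copy(value, depth - 1, path) : null;
    }

    private static boolean isSimple(Object v) {
        return v instanceof String || v instanceof Number || v instanceof Boolean
                || v instanceof Character || v instanceof Date || v instanceof Enum;
    }
}

// Hypothetical entity with a parent/children cycle, as bidirectional mappings often have.
class Category {
    String name;
    Category parent;
    List<Category> children = new ArrayList<>();

    Category() {}

    Category(String name) {
        this.name = name;
    }
}
```

For a root `Category` with one child, `copy(root, 0)` yields the name with an empty `children` list, `copy(root, 1)` includes the child but leaves the child's `parent` null, and at level 2 the child's `parent` is still null because it would point back into the cycle.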

How does this satisfy the three issues I was looking to solve?

First, this converts my domain object to a new object of the same class but without Hibernate’s persistent collections. This prevents the exception. Hibernate’s persistent collections aren’t copied; their contents are copied into the natural collection of the target class. This way we don’t need the HibernateUnproxifier, and everything can be copied without knowledge of Hibernate.

Secondly, the copy still needs to be performed inside a transaction, but it can be done generically without needing to specify which specific collections are loaded. The copier provides complete control over how shallow or deep the copy/mapping occurs.

Finally, this copy performs cycle detection and sets any repeated copies to null. With this mapping technique, I was able to remove @JsonBackReference from all of my domain objects.


It’s important to note that the copy is done according to bean properties, not necessarily through field reflection. Because of this, the object to be copied should be a domain object following the Java Bean pattern. Additionally, the copier depends on Spring’s BeanUtils, so it works best in a project already using Spring. If this were to be distributed as a more general purpose tool, I would probably try to rework it to use field reflection, and to not have other dependencies. As of this writing, this solution works fine in my project so this is the state it’s in right now.


As mentioned above, there are many ways to deal with LazyInitializationExceptions. I thought this was a neat and useful idea. If you have ever dealt with LazyInitializationExceptions, hopefully you’ll find it useful too.
