Eradicate SQL Injection Once and for All

This is the last (I think) of the tips for clearing up the semantic issues. You should hopefully notice a common theme on these - the way to fix injections is primarily properly formatting the data for presentation to whatever layer you're passing it off to. That doesn't mean that you oughtn't do input validation - it simply means that you oughtn't rely only on input validation.

Here are the steps necessary for removing most possibility of SQL injection.

1) Apply a level of indirection. In the case of using a database, this will help prevent several problems, and I'll go into more detail in a bit. You want to apply this indirection to keys that get passed to the database, as well as field names that get used dynamically - for example ORDER BY statements where the user gets to pick what order things are presented in.

2) Use Parameterized Queries or Prepared Statements. They basically mean the same thing, and most every modern language has a way to deal with them. They allow you to use placeholders rather than constructing dynamic SQL. So a statement that would ordinarily look like this:

strInput = request.getParameter("searchdescr");
strSQL = "SELECT * FROM product WHERE description = '" + strInput + "' ORDER BY price DESC";
rs = db.execute(strSQL);

where a user could put naughty things in searchdescr, you do something like:
strInput = request.getParameter("searchdescr");
stmt = db.prepareStatement("SELECT * FROM product WHERE description = ? ORDER BY price DESC");
stmt.setParameter(1, strInput);
rs = stmt.execute();

What has taken place here is that we use the database itself to do all the sanity checking on the parameter put in. We also don't include single quotes around the dynamic value. And we don't escape them in any way before the setParameter() call. The benefits of this are many-fold:

  • It improves performance in a couple of different ways. First, if you're going to be performing many calls, only changing one parameter, it's much more efficient for the database server to compile the query with placeholders, then execute with different parameters. Second, the fine people who wrote your framework do a better job (sorry, but it's probably true) of efficiently escaping anything necessary.
  • You get type checking on the data for free. Because the database server is expecting particular types, the setParameter() (or whatever it's called in your tool of choice) will fail if you provide data that can't be coerced.
  • You don't have to do a bunch of checking for certain characters then escaping them. All of that is handled by the database server.

Now, the only things that can be parameterized in a call like this are literal column values in the query. You can't parameterize table or field names (so you can't parameterize what columns to pick out, or the sort order) so you'll need to make additional provisions there as necessary with a level of indirection (read: don't send column names to the user for them to pick, then use those literal strings in your query without checking them).

Now, the level of indirection in database calls helps you in several ways:
  • You limit key exposure. While it's not that big of a deal for people to see primary key values if you're properly permission checking everything, you don't need for your customers to know that you use an auto-incrementing integer primary key, or a GUID key.
  • You protect yourself from privilege escalation on calls after a listing render. If you don't have permission to edit a particular item, it won't be in the listing, hence you can't pick some value not in the list because you're picking them by index in the list, not by ID field.
  • You protect yourself from schema exposure - if you're not passing column names back for the user to pick in what shows up or for sequencing, they can't start to enumerate table names.

And if you can work it out right, I recommend using a database utility class like Hibernate or iBATIS. Lots of eyes look at these frameworks, so you're pretty well protected with them, and they do a nice job of abstracting the database layer calls from you.

Notice that I didn't say that using stored procedures fixes the problem. It's sad how many people are convinced that they're protecting themselves simply by using stored procedures, but then don't actually use them properly - i.e., they call the stored procedure, using dynamic SQL to fill in the parameter values, rather than placeholders.

I may have to flesh out the two components of this more later. Different ways of adding a layer of indirection when dealing with the database is a multi-parter, and I'd like to give you a better foothold on properly doing parameterized queries in your tool of choice.