Good Habits Part I: Input Validation


In a previous post, I mentioned that many security flaws are really the result of a lack of good programming practices. Since I'm a solutions-oriented type, here's Part 1 of <<not sure how many>> on some of those good habits.

My colleagues will probably have a heart attack upon reading that Input Validation is my first post on good habits, as I've always been very passionate about output filtering. But I've always felt that proper business-rule input validation is critical.

Remember back to your days as a wee tot programming in Logo, pushing the turtle around on the Apple IIe. Or maybe just remember back to programming in BASIC on the C=64 or TRS-80. (I realize I'm dating myself here.) When you were first learning the basics of programming, one of the most important things you had to do as soon as you took input from a user, from a file, or from thin air, was to verify that the type of data received was consistent with the type of data you intended to operate on. This was absolutely critical to having the application behave as expected.

C and assembler programmers know this very well. If you don't deal with the right type of input in your application, you'll get very odd results: reads from random areas of memory, buffer overruns, fencepost errors, and on the rare occasion, an application that can be made to inject new handlers into the Interrupt Vector Table. But somewhere around VB4 and the other 4GLs, when components were coupled to the code that deals with them, and the design interface allowed but didn't require you to specify the format of incoming data, applications went very quickly from prototype to production, and those simple requirements were left by the wayside. (I think I'm guilty of telling people "Don't enter apostrophes, they mess things up" as well.)

Things didn't improve when we made web applications. And at some point security professionals thought they'd start recommending input validation again. But since the application security field seems to have evolved out of network security, rather than application development, security professionals always made recommendations regarding "malicious characters" (a term that makes me sick to my stomach). So those developers who have effectively been trained out of good input validation habits (or were never trained in them at all) are now being drawn into what is almost as bad - blacklist input validation. I say almost as bad because blacklist input validation provides very, very little protection (just ask MySpace), but gives developers a false sense of security.

So what are the good habits? I think the best approach to input validation is to follow this (pretty well agreed-upon) formula:

  1. Whitelist input validation. If the specific item requested or sent by the user isn't in the list, reject the value. And this can happen in more places than you might think. If you have a jump page, you know the pages you expect to send your users to off your site, so enumerate those, and use a level of indirection so that the user can only send, say, integers, which index into an array with the list of sites that you can jump to.
  2. Positive regex matching. This is where you specify a regular expression that the whole field must match. Phone number matching might look (in the US) something like: /^\(?\d{3}[)-]?\d{3}-?\d{4}$/, and if the data doesn't match that, then reject. Note the ^ and $ anchors: without them, any input that merely contains a phone number somewhere would pass. Surnames might match /^([a-zA-Z]+[ '-])?[A-Z][A-Za-z]+$/. If I get the time, I'll reconstruct my regex for matching an IP address without having to do arithmetic on each octet.
  3. Negative regex matching. This is "don't allow malicious characters". I hate this philosophy. There are a billion ways that "malicious characters" (did the bytes themselves intend harm?) can be encoded on input. If you're stuck with this, you had better be doing presentation filtering (another good habit) further down the pipe - you are bound to miss something.
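
Positive regex matching (item 2) might be sketched like this in Python. It reuses the two patterns from the list; the function names are my own invention for illustration, and I'm assuming the whole field has to match, which is why the sketch calls fullmatch rather than search. Treat it as a starting point, not a finished validator.

```python
import re

# Patterns from the list above. re.ASCII restricts \d to the digits 0-9;
# without it, Python's \d also accepts non-ASCII Unicode digits.
US_PHONE = re.compile(r"\(?\d{3}[)-]?\d{3}-?\d{4}", re.ASCII)
SURNAME = re.compile(r"([a-zA-Z]+[ '-])?[A-Z][A-Za-z]+", re.ASCII)


def is_valid_phone(value: str) -> bool:
    """Accept the value only if the entire string is a US-style phone number."""
    # fullmatch() behaves like anchoring the pattern at both ends.
    return US_PHONE.fullmatch(value) is not None


def is_valid_surname(value: str) -> bool:
    """Accept the value only if the entire string fits the surname pattern."""
    return SURNAME.fullmatch(value) is not None
```

The difference between matching the whole field and matching part of it is the entire point: US_PHONE.search("555-123-4567'; DROP TABLE users") finds a match, while is_valid_phone rejects the same string.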

So there you have it. Deal with what you expect, first. And where possible, please use libraries like the Apache Commons Validator, which will give you a really consistent interface for validating things. And really, really look for chances to apply a level of indirection. Like I say, there are probably more places that you can do that than you might think. (Primary key fields, jump URLs, any list, etc.)
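
Here is roughly what that level of indirection could look like for a jump page, sketched in Python. The URLs, the JUMP_TARGETS list, and the resolve_jump function are all hypothetical names made up for this example; the only idea taken from the post is that the user sends an integer and the server owns the list.

```python
import re
from typing import Optional

# Hypothetical whitelist of off-site destinations. The user never supplies
# a URL, only an index into this list.
JUMP_TARGETS = [
    "https://example.com/",
    "https://example.org/docs",
]

# One to six ASCII digits: no sign, no whitespace, no decimal point.
INDEX_PATTERN = re.compile(r"[0-9]{1,6}")


def resolve_jump(raw_index: str) -> Optional[str]:
    """Return the whitelisted URL for raw_index, or None to reject the request."""
    if INDEX_PATTERN.fullmatch(raw_index) is None:
        return None
    index = int(raw_index)
    if index >= len(JUMP_TARGETS):
        return None
    return JUMP_TARGETS[index]
```

A request for target "1" resolves to the second entry, while "2", "-1", or a full URL pasted into the parameter all come back as None, so the caller can reject them without ever redirecting to attacker-chosen input.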