Output Filtering

Okay, I've delayed this as long as I could, just so that those who know me wouldn't accuse me of going straight for my soapbox. But this seems like an appropriate time, considering that people have beaten the IE7 "vulnerability" (it's been there since IE6) to death, and now the only security "news" is the "0day" on MySpace (What! A script injection vulnerability on a site that allows you to inject script!) and that hackers are now hacking for money, not just annoyance (this is news?).

Here's the soapbox: output filtering. It seems that people are 100% certain that input validation cures all script injection vulnerabilities (so-called XSS, which it's really not), and that if you still have injection vulnerabilities, you're obviously not doing your input validation right.

Do not take what I say from here on out as license to STOP doing input validation! You have been warned.

Here are several reasons why I believe output filtering is a more appropriate and complete approach to ALL injection vulnerabilities, not just script injection (XSS).

  • There's nothing inherently "malicious" about <, >, " or '. It's when they make it all the way to an HTML presentation layer that they do things the developer probably did not intend.
  • If dynamic user input is going to various outputs, it has to be encoded for each of those outputs. If the user input ends up in a PDF, &lt; is probably not what the user wanted to see when they typed a plain <.
  • There are a billion different ways to encode "malicious" HTML injection characters going in, but only one way to properly encode them going out.
  • Input validation is for verifying business rules, not semantic output rules.
  • If you actually make your output XHTML, and want to make it strict, you have to output filter anyway.
  • Your data may not come from users. It might come from a vendor feed, or some other outside data source - or even a DBA could put script directly into the database.
  • What's "malicious" when the data makes it to HTML is different from what's "malicious" when it makes it to LDAP, or SQL, or PDF, or XML, or command line, or...
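
To make the "different outputs need different encodings" point concrete, here's a minimal Python sketch. The helper names (for_html, for_url, for_shell) are made up for illustration; the calls underneath are from the standard library. It only covers three destinations, and it assumes the raw value is stored untouched and encoded at the moment it is written out.

```python
import html
import shlex
from urllib.parse import quote

# The same raw value, stored exactly as the user typed it.
raw = '<b>Tom & "Jerry"</b>'

def for_html(value: str) -> str:
    # Encode HTML meta characters to entities.
    return html.escape(value, quote=True)

def for_url(value: str) -> str:
    # Percent-encode everything that is not an unreserved URL character.
    return quote(value, safe="")

def for_shell(value: str) -> str:
    # Quote the value so a POSIX shell treats it as a single literal word.
    return shlex.quote(value)

print(for_html(raw))
print(for_url(raw))
print(for_shell(raw))
```

Nothing in raw is "malicious" on its own; each destination just has its own set of characters that need care, which is why the encoding decision belongs at the output.
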
Remember, I am not saying we should stop performing input validation. Rather, input validation can be quite effective when your rules are to ensure the data meets a proper format (as opposed to ensuring it doesn't meet a rotten format). However, the way you repair all types of injection attacks is at the presentation layer. E.g., when dynamic data is going to:
  • HTML - encode HTML meta characters to their entities - < to &lt;, > to &gt;, " to &quot;.
  • JavaScript - hrmm... I really recommend putting the dynamic stuff into a hidden HTML form field, HTML encoding it, and then pulling it out of that form field from the JavaScript - much easier than determining whether you need to escape ' or " or neither.
  • SQL - Don't do dynamic SQL - do prepared statements/parameterized queries. (No! Stored procedures are NOT sufficient!)
  • LDAP - hopefully your LDAP library allows you to perform parameterized LDAP queries, rather than a dynamically constructed LDAP query.
  • XML - see HTML
  • PDF - golly - I dunno - I use FO, in which case, see XML
  • HTTP headers - URL encoding
  • command-line - please tell me you have a better alternative
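
For the SQL item, here's a rough sketch of what a parameterized query looks like, using Python's built-in sqlite3 module with an in-memory database. The users table, the find_user helper, and the sample data are all invented for the example; other database drivers use different placeholder styles.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside `name` can never change the shape of the query.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()

# Hypothetical schema and data, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("o'brien",)])

# A classic injection attempt is treated as an (unmatched) literal name.
hostile = "alice' OR '1'='1"
print(find_user(conn, hostile))

# A legitimate name containing a quote works without any special handling.
print(find_user(conn, "o'brien"))
```

Note that no input validation had to reject the apostrophe: o'brien is a perfectly good name, and it only becomes a problem if it is pasted into the SQL text.
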
I've really only scratched the surface of this rant. Exceedingly few sites really intend for you to be able to write HTML, so if you don't intend for the user to be able to put in HTML, make sure all your dynamic output goes through an HTML filter. Your programming language probably has a really efficient function for doing exactly that.
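
As one possible illustration of "make sure all your dynamic output goes through an HTML filter", here's a small Python sketch. The render helper is hypothetical (it is not from any framework); it leans on the standard library's html.escape and string.Template, and it assumes the values land in HTML element content or quoted attribute values, not inside script blocks.

```python
import html
from string import Template

def render(template: str, **values: object) -> str:
    # Escape every dynamic value before it touches the markup, so the
    # safe path is the default path rather than something to remember.
    escaped = {key: html.escape(str(val), quote=True) for key, val in values.items()}
    return Template(template).substitute(escaped)

page = render("<p>Hello, $name!</p>", name="<script>alert(1)</script>")
print(page)
```

The design choice here is that the template author never calls the filter by hand; everything dynamic is encoded on the way out, whatever input validation did or didn't catch on the way in.
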

Feel free to pipe up.