Okay, I've delayed this as long as I could, just so that those who know me wouldn't accuse me of going straight for my soapbox. But this seems like an appropriate time, considering that people have beaten the IE7 "vulnerability" (it's been there since 6) to death, and now the only security "news" is the "0day" on MySpace (What! A script injection vulnerability on a site that allows you to inject script!) and the revelation that hackers are now hacking for money, not just annoyance (this is news?).
Here's the soapbox - output filtering. It seems that people are 100% certain that input validation cures all script injection vulnerabilities (commonly called XSS, which is really a misnomer), and that if you still have injection vulnerabilities, you're obviously not doing your input validation right.
Do not take what I say from here on out as license to STOP doing input validation! You have been warned.
Here are several reasons why I believe output filtering is a more appropriate and complete approach to ALL injection vulnerabilities, not just script injection (XSS).
- There's nothing inherently "malicious" about <, >, " or '. It's when they make it all the way to an HTML presentation layer that they do things the developer probably did not intend.
- If dynamic user input is going to various outputs, it has to be encoded for each of those outputs. If the user input ends up in a PDF, a literal &lt; is probably not what the user wanted when they really wanted a < displayed.
- There are a billion different ways to encode "malicious" HTML injection characters going in, but only one way to properly encode them going out.
- Input validation is for verifying business rules, not semantic output rules.
- If you actually make your output XHTML, and want to make it strict, you have to output filter anyway.
- Your data may not come from users. It might come from a vendor feed, or some other outside data source - or even a DBA could put script directly into the database.
- What's "malicious" when the data makes it to HTML is different from what's "malicious" when it makes it to LDAP, or SQL, or PDF, or XML, or command line, or...
So, for each output, do the encoding that output calls for:
- HTML - encode HTML meta characters to their entities - < to &lt;, > to &gt;, " to &quot;.
- SQL - Don't do dynamic SQL - do prepared statements/parameterized queries. (No! Stored procedures are NOT sufficient!)
- LDAP - hopefully your LDAP library allows you to perform parameterized LDAP queries, rather than dynamically constructing the LDAP query.
- XML - see HTML
- PDF - golly - I dunno - I use FO, in which case, see XML
- HTTP headers - URL encoding
- command-line - please tell me you have a better alternative
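To make the per-output idea concrete, here's a minimal sketch in Python (my choice for illustration; nothing above is Python-specific). It assumes only the standard library, and the function names (to_html, to_xml, through_sql, to_header_value, to_command_line) are made up for this example. The point it tries to show: the data is kept raw, and each output gets its own encoding at the moment the data leaves. The command-line function is just one possible answer, not a claim that it's the best one.

```python
import html
import sqlite3
import subprocess
import sys
from urllib.parse import quote
from xml.sax.saxutils import escape as xml_escape


def to_html(value: str) -> str:
    """Encode for an HTML page: <, >, &, and both quote characters become entities."""
    return html.escape(value, quote=True)


def to_xml(value: str) -> str:
    """Encode for XML element content: &, < and > become entities."""
    return xml_escape(value)


def through_sql(value: str) -> str:
    """Store and reload the value using a parameterized query (no dynamic SQL)."""
    conn = sqlite3.connect(":memory:")  # throwaway database, illustration only
    try:
        conn.execute("CREATE TABLE comments (body TEXT)")
        # The driver binds the value to the ? placeholder; it is never
        # spliced into the SQL text, so there is nothing to escape by hand.
        conn.execute("INSERT INTO comments (body) VALUES (?)", (value,))
        row = conn.execute("SELECT body FROM comments").fetchone()
        return row[0]
    finally:
        conn.close()


def to_header_value(value: str) -> str:
    """Percent-encode (URL encoding) before placing the value in an HTTP header."""
    return quote(value, safe="")


def to_command_line(value: str) -> str:
    """Pass the value as one argv entry, with no shell involved in parsing it."""
    # Hypothetical child process: a Python one-liner that echoes its argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

Feeding the same raw string, say `<b>O'Reilly</b> & "friends"`, through each function gives a different result per output, while the copy that went through the database comes back exactly as it went in. That's the argument in a nutshell: the stored data was never "malicious", it just needed the right encoding on the way out.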
Feel free to pipe up.