Stopping XSS but allowing HTML is Hard


RSnake has a post today that made me laugh out loud as soon as I read it. But rather than say "I told you so" (see here and here), I thought I'd give those of you who INSIST on letting users enter HTML on your site a couple of ways you might make it work:

1) You could use a wiki markup library. Many wikis share the same markup conventions (and they aren't HTML), so I'm shocked that a quick search for a standalone markup library hasn't turned one up.
2) You could require your users to provide strict XML. The benefit here is that good XML parsers are available, and they will puke if the XML is not well-formed. Once you have well-formed XML, you can whitelist the tags, and the attributes allowed on each of those tags. So rather than trying to strip out anything that looks like it might be script, you only allow what you specify, and you specify only the bare minimum folks need to get by.
3) A combination of 1 and 2. I'm not saying it's perfect, but that's what Google does with Blogger, though they allow far more formatting than is probably necessary for most user-controlled display.
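Option 1 boils down to one rule: escape everything the user typed, then generate the few tags you allow yourself. Below is a minimal Python sketch of that rule. It assumes a tiny, hypothetical subset of MediaWiki-style markup (`'''bold'''`, `''italic''`, and `[https://url label]` links) rather than any real wiki library, and `render_wiki` and `TOKEN` are illustrative names. It deliberately does not handle nesting.

```python
import html
import re

# Hypothetical, deliberately tiny markup subset (MediaWiki-flavoured, not the full syntax):
#   '''bold'''   ''italic''   [https://example.com link text]
TOKEN = re.compile(
    r"'''(?P<bold>.+?)'''"                                # bold (tried before italic)
    r"|''(?P<italic>.+?)''"                               # italic
    r"|\[(?P<url>https?://[^\s\]]+) (?P<label>[^\]]+)\]"  # http(s) links only
)


def render_wiki(source: str) -> str:
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        # Plain text between tokens is escaped, so any HTML the user typed is shown, not run.
        out.append(html.escape(source[pos:m.start()]))
        if m.group("bold") is not None:
            out.append(f"<b>{html.escape(m.group('bold'))}</b>")
        elif m.group("italic") is not None:
            out.append(f"<i>{html.escape(m.group('italic'))}</i>")
        else:
            # The pattern only matches http(s) URLs, so javascript: links never become <a> tags.
            # Escaping with quote=True keeps a stray " from breaking out of the attribute.
            url = html.escape(m.group("url"), quote=True)
            out.append(f'<a href="{url}">{html.escape(m.group("label"))}</a>')
        pos = m.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

Because tags only ever come from the renderer's own templates, typing `<script>alert(1)</script>` just displays those characters, and `[javascript:alert(1) click]` stays plain text instead of becoming a link.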
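Option 2 can be sketched in Python with the standard library's `xml.etree.ElementTree` parser. The whitelists (`ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_SCHEMES`) and the `sanitize` function are hypothetical names for illustration, and the tag list is only an example of a "bare minimum". The idea is that the input must parse, every element and attribute must be on the list, and the output is rebuilt from the parsed tree rather than echoed back. This sketch rejects anything off-list instead of trying to strip it, which keeps the logic small.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical whitelist for illustration: trim it to the bare minimum your site needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}          # every other tag gets no attributes at all
SAFE_SCHEMES = ("http://", "https://")   # rejects javascript:, data:, relative URLs, etc.


def sanitize(fragment: str) -> str:
    """Return the fragment re-serialized if it passes the whitelist; raise ValueError otherwise."""
    # Wrap in a private root element so multi-element fragments parse as one document.
    try:
        root = ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError as exc:
        # Not well-formed: refuse it outright instead of guessing what was meant.
        raise ValueError(f"not well-formed XML: {exc}") from exc

    for elem in root.iter():
        if elem is root:
            continue  # our own wrapper, not user input
        if elem.tag not in ALLOWED_TAGS:
            raise ValueError(f"tag not allowed: {elem.tag!r}")
        allowed = ALLOWED_ATTRS.get(elem.tag, set())
        for name, value in elem.attrib.items():
            if name not in allowed:
                raise ValueError(f"attribute not allowed on <{elem.tag}>: {name!r}")
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                raise ValueError(f"unsafe URL: {value!r}")

    # Re-serialize from the parsed tree, dropping the wrapper. ET.tostring includes each
    # child's tail text and escapes text and attribute values on the way out.
    return escape(root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
```

For example, `<p>Hello <b>world</b></p>` comes back unchanged, while `<p onclick="alert(1)">hi</p>`, `<a href="javascript:alert(1)">x</a>`, and an unclosed `<p>` all raise `ValueError`. The Python documentation itself cautions that its XML modules are not hardened against maliciously constructed data, so treat this as a starting point for review, not a finished filter.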

So there you go. And of course, even when you use one of these methods, you still need really sharp people like RSnake and kuza55 to check it out. If there's a way to script it, those cats will find it.

1 comment:

  1. I had been thinking about the same thing. By using XHTML instead of HTML, we can enforce strict formatting, which might stop certain XSS attacks. It may not stop all of them, but it could certainly reduce the possible combinations.