Add OWASP's AntiSamy library to my list of ways to effectively allow HTML without allowing JavaScript.

AntiSamy uses its own home-grown rules to be very specific about what is and is not allowed into the application. What's especially nice is that it already includes policy configurations modeled on several online sites that allow user-submitted HTML. Each profile roughly matches what that site permits. For example, the Slashdot profile is pretty restrictive, while the MySpace profile allows quite a bit more.

To be honest, I don't know how much better AntiSamy is than using a DTD, Schema, or RelaxNG to validate the HTML, *except* that there are additional rules that have to be validated with logical tests.
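To make the "schema plus logical tests" idea concrete, here is a minimal Python sketch. It is not AntiSamy and not a real DTD or RelaxNG schema: the `ALLOWED` table and the `is_acceptable` function are hypothetical names, and the fragment is wrapped in a synthetic root element so that a bare snippet can be parsed as XML. The href scheme check stands in for the kind of rule a schema alone does not express.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical allowlist: element name -> attributes permitted on it.
ALLOWED = {"root": set(), "p": set(), "b": set(), "i": set(), "a": {"href"}}


def is_acceptable(fragment: str) -> bool:
    """Return True if the fragment is well-formed and passes the allowlist."""
    try:
        # Wrap in a synthetic root so a bare fragment parses as one document.
        root = ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False  # not well-formed XML (unclosed tag, unknown entity, ...)
    for el in root.iter():
        if el.tag not in ALLOWED:
            return False  # element is not on the allowlist
        if set(el.attrib) - ALLOWED[el.tag]:
            return False  # attribute is not permitted on this element
        # Logical test beyond structure: links must be absolute http(s) URLs.
        href = el.attrib.get("href")
        if href is not None and urlparse(href).scheme not in ("http", "https"):
            return False
    return True
```

Note that this only accepts or rejects input; it does not clean it up, and it rejects anything that is not already well-formed.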

One thing that occurred to me while working on a mock-up that uses XML validation for this is that <img> tags (or anything with a remote URL) require special consideration because of the possibility of XSRF against other sites. For example, if an attacker finds an XSRF vulnerability in some site, it's in their interest to get that URL injected in as many places as possible. One way to do this is with <img> tags on sites that allow remote image sources, or with url() constructs in style attributes where a URL is permissible. The best way I came up with to deal with these is to have the server fetch the resource when the tag is submitted and then, when the page is rendered, reference the local copy. (This also gives the user the benefit of a static version of the image, even after it expires off the original site.)
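A rough Python sketch of that fetch-once-and-serve-locally idea follows. Everything named here is an assumption made for illustration: the `fetch` callable (so the network layer can be swapped out), the `store` dictionary standing in for real storage, the `/cached-images/` path, and the size cap. It only handles `<img src>`, not url() in styles, and it assumes the HTML has already been through a sanitizer.

```python
import hashlib
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

MAX_BYTES = 1_000_000  # hypothetical size cap
MAGIC = (b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a", b"\xff\xd8\xff")


def localize(url, fetch, store):
    """Fetch url once, keep a copy in store, return a local path or None."""
    if urlparse(url).scheme not in ("http", "https"):
        return None
    data = fetch(url)
    # Cheap sanity checks only: size and a PNG/GIF/JPEG magic number.
    if data is None or len(data) > MAX_BYTES or not data.startswith(MAGIC):
        return None
    name = hashlib.sha256(data).hexdigest()
    store[name] = data
    return "/cached-images/" + name  # hypothetical local URL prefix


class ImgRewriter(HTMLParser):
    """Re-serializes HTML, pointing each <img> at a locally stored copy."""

    def __init__(self, fetch, store):
        super().__init__(convert_charrefs=False)
        self.fetch, self.store, self.out = fetch, store, []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            local = localize(dict(attrs).get("src") or "", self.fetch, self.store)
            if local is None:
                return  # drop images that could not be fetched or verified
            attrs = [(k, local if k == "src" else v) for k, v in attrs]
        rendered = "".join(
            f' {k}="{escape(v or "", quote=True)}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # treat <img/> like <img>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def rewrite_images(html_text, fetch, store):
    parser = ImgRewriter(fetch, store)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because the remote URL is requested once by the server at submission time and never by visitors' browsers, an injected XSRF URL never gets the victim's cookies. The cost is that the server now fetches attacker-chosen URLs, so a real fetcher would need its own restrictions on where it is willing to connect.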

Anyhow, cheers again to OWASP.


  1. I'm not a huge fan of image retrieval, since you're adding complexity when there's no real vulnerability in your site. I've seen very few sites that allow file uploads (and this is essentially what you're describing) do it securely.

    So if you were sure the rest of your application was secure, were 100% sure you could get this right, and had nothing better to do with your developers' time, then I guess you could do it. But I don't think telling people to protect other sites from CSRF by preventing their own site from being used as a platform is really a good idea...

  2. This caught my eye on the OWASP Moderated AppSec feed. First of all, thanks for your interest in AntiSamy. I'm the author, Arshan Dabirsiaghi.

    XML validation (DTD/RelaxNG) requires pre-formatted (well-formed) input, which is not a reasonable requirement in the real world. Also, there isn't any stylesheet validation in such a solution.

    I could go on and on. =)

    I will have to add "Why is this better than XSD/DTD validation?" to the FAQ.

  3. Also, AntiSamy has the ability to retrieve offsite CSS resources and turn them into inline styles. But really, back to work!

  4. @kuza55 Absolutely - if you're accepting HTML from your users, you've probably got bigger problems than just being a platform for XSRF against other sites. I'd just hate to be implicated in the latter. Believe me, it is no small chore to have a server fetch the contents of some unknown URL, check the result for validity and sanity (size, and is it actually an image?), and then decide whether that image is something you want to show on your site. It's hard to imagine having time to mess with that and still having 0 other flaws in the rest of your HTML-accepting site. Good insight as usual.

  5. @arshan - right on - those are exactly the reasons I was excited to see AntiSamy. All of the stuff I was having to go and do in code after checking the schema (I wrapped their input in a root node so that they can do html-optional) can be done declaratively in the same configuration file.

    Great work - I'm a fan already (and I don't usually write apps where the users can submit HTML).