
Input validation is hard and usually pointless

Sometimes people think that stripping HTML tags (or replacing the < and > characters with entities) from user input is enough. It is not, and here's why.

Attribute values

Quite often we want to put usernames in the alt attributes of profile pictures, or do something similar. If all we have done is strip HTML tags, a username like

" style="display: none;

will bite us: its first quote closes the alt value, and the rest becomes a new style attribute that hides the image. The same trick works with ' whenever the attribute is delimited by single quotes, and special points go to IE, which also treats ` as an attribute delimiter.
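
A minimal Python sketch of the alt attribute case. The regex-based strip_tags and the avatar.png markup are hypothetical stand-ins for a naive sanitizer and a profile template; html.escape is the standard library function.

```python
import html
import re


def strip_tags(text: str) -> str:
    # Hypothetical naive sanitizer: drops anything that looks like <tag>.
    return re.sub(r"<[^>]*>", "", text)


def profile_img(username: str) -> str:
    # "Sanitized" username pasted into a double-quoted alt attribute.
    return '<img src="avatar.png" alt="' + strip_tags(username) + '">'


def profile_img_escaped(username: str) -> str:
    # html.escape (quote=True by default) also encodes " and '.
    return '<img src="avatar.png" alt="' + html.escape(username) + '">'


evil = '" style="display: none;'
print(profile_img(evil))
# <img src="avatar.png" alt="" style="display: none;">
print(profile_img_escaped(evil))
# <img src="avatar.png" alt="&quot; style=&quot;display: none;">
```

The username contains no < or >, so strip_tags lets it through untouched. The escaped version keeps the whole username inside alt, because html.escape turns both " and ' into entities.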

And if you ever see attribute values without " or ' around them, run away. If attribute values are not wrapped in quotes, a single space character will do the trick, as will any other whitespace such as a tab or a line break, and a > ends the whole tag. No quote character is needed at all.
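
Escaping quotes does not help when there are no quotes to break out of. Another small Python sketch, where unquoted_alt and quoted_alt are hypothetical templates and the payload uses nothing but letters, a space, = and :.

```python
import html


def unquoted_alt(username: str) -> str:
    # Hypothetical template that leaves the attribute value unquoted.
    return "<img src=avatar.png alt=" + html.escape(username) + ">"


def quoted_alt(username: str) -> str:
    # Same value, but wrapped in double quotes.
    return '<img src="avatar.png" alt="' + html.escape(username) + '">'


payload = "x style=display:none"  # no quotes, no < or >
print(unquoted_alt(payload))
# <img src=avatar.png alt=x style=display:none>
print(quoted_alt(payload))
# <img src="avatar.png" alt="x style=display:none">
```

html.escape leaves the payload unchanged, so the unquoted template gains a style attribute, while the quoted one keeps everything inside alt.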

JSON and other formats

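If you move user data around as JSON and build that JSON by pasting strings together, a name like

","email": "foo@bar.com

might do something you did not want: it closes the name string and sneaks in an email key of its own.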
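
A Python sketch of the difference between a hand-built JSON string and a real serializer. build_by_hand, build_properly and the victim@example.com address are hypothetical; json.dumps and json.loads are the standard library.

```python
import json


def build_by_hand(email: str, name: str) -> str:
    # Hypothetical hand-rolled serializer: raw strings pasted into a template.
    return '{"email": "' + email + '", "name": "' + name + '"}'


def build_properly(email: str, name: str) -> str:
    # json.dumps escapes the quotes inside string values.
    return json.dumps({"email": email, "name": name})


name = '","email": "foo@bar.com'
hand = build_by_hand("victim@example.com", name)
print(hand)
# {"email": "victim@example.com", "name": "","email": "foo@bar.com"}
print(json.loads(hand))
# {'email': 'foo@bar.com', 'name': ''}
print(json.loads(build_properly("victim@example.com", name))["email"])
# victim@example.com
```

Python's json module keeps the last value for a duplicate key, so the injected address wins here. Other parsers may pick differently, which is hardly reassuring.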

HTTP headers

Or say you are redirecting one user based on another user's data, like this:

Location: {{new url based on user input}}

an input like

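Set-Cookie: FOO=bar; domain=spage.fi

will do nasty stuff just because it has line breaks in convenient places: a line break before Set-Cookie ends the Location header, so the rest of the input goes out as a header of its own. This is usually called HTTP header injection.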
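
A Python sketch of the Location case, assuming the input carries a CR LF right before Set-Cookie. location_header, safe_location_header and the /home target are hypothetical; the safe version refuses line breaks at the moment the header is written.

```python
def location_header(url: str) -> str:
    # Hypothetical naive header builder: pastes the URL in as-is.
    return "Location: " + url + "\r\n"


def safe_location_header(url: str) -> str:
    # Refuse CR and LF at output time; they have no business in a header value.
    if "\r" in url or "\n" in url:
        raise ValueError("line break in header value")
    return "Location: " + url + "\r\n"


user_input = "/home\r\nSet-Cookie: FOO=bar; domain=spage.fi"
print(location_header(user_input).split("\r\n"))
# ['Location: /home', 'Set-Cookie: FOO=bar; domain=spage.fi', '']

try:
    safe_location_header(user_input)
except ValueError as err:
    print("rejected:", err)
# rejected: line break in header value
```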

Handle the output

Instead of paying too much attention to input, you should pay a lot of attention to output. O'Donnell is a perfectly valid name, and it can be safely shown as element content, but not as-is inside a single-quoted attribute value, because of the ' character. So you must know the context before escaping, and the context is only known at output time.
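
A minimal Python sketch of picking the escaping rule at output time. escape_for and its context names are made up for illustration; html.escape and json.dumps are the standard library.

```python
import html
import json


def escape_for(context: str, value: str) -> str:
    # Hypothetical helper: the escaping rule is picked where the value is
    # written out, not where it came in.
    if context == "html_text":
        # Element content: only &, < and > need escaping.
        return html.escape(value, quote=False)
    if context == "html_attr":
        # Quoted attribute value: " and ' must be escaped as well.
        return html.escape(value, quote=True)
    if context == "json_string":
        # A complete JSON string literal, surrounding quotes included.
        return json.dumps(value)
    if context == "http_header":
        # Header value: a line break would start a new header.
        if "\r" in value or "\n" in value:
            raise ValueError("line break in header value")
        return value
    raise ValueError(f"unknown context: {context}")


name = "O'Donnell"
print("<p>" + escape_for("html_text", name) + "</p>")
# <p>O'Donnell</p>
print("<img alt='" + escape_for("html_attr", name) + "'>")
# <img alt='O&#x27;Donnell'>
print(escape_for("json_string", name))
# "O'Donnell"
```

The name comes out unchanged as element content and only gets an entity inside the attribute. An input filter would have to pick one of those answers without knowing where the value ends up.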