Input validation is hard and usually pointless
Sometimes people think that stripping HTML tags (or replacing < and > characters with entities) from user input is enough. That is of course wrong. Here's why.
Attribute values
Quite often we want to put usernames in profile picture alt attributes or do something similar. If all we have done is strip HTML tags, a username containing "style="display: none;" will bite us. Replacing " with ' in the payload works too, of course, when the attribute is single-quoted. Special points go to IE, which also treats ` as an attribute delimiter.
And if you ever see attribute values without " or ' characters around them, run away. If attribute values are not wrapped in quotes, a space character will do the trick. So will a tab, a line break, or a > character.
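A minimal Python sketch of the problem and one possible fix. The helper names (strip_tags, img_alt_naive, img_alt_escaped), the avatar.png file name, and the exact payload are made up for illustration. It only assumes the standard library's html.escape, which encodes " and ' when quote=True.

```python
import html
import re

def strip_tags(text: str) -> str:
    # Naive tag stripping, the "validation" criticised above (hypothetical helper).
    return re.sub(r"<[^>]*>", "", text)

def img_alt_naive(username: str) -> str:
    # Strips tags only, then drops the value into a double-quoted attribute.
    return '<img src="avatar.png" alt="' + strip_tags(username) + '">'

def img_alt_escaped(username: str) -> str:
    # quote=True also encodes " and ', so the value cannot close the attribute.
    return '<img src="avatar.png" alt="' + html.escape(username, quote=True) + '">'

payload = 'x" style="display: none;'
print(img_alt_naive(payload))
# <img src="avatar.png" alt="x" style="display: none;">
print(img_alt_escaped(payload))
# <img src="avatar.png" alt="x&quot; style=&quot;display: none;">
```

The payload contains no tags at all, so tag stripping leaves it untouched. Only escaping for the attribute context keeps it inside the alt value.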
JSON and other formats
If you move user data around as JSON, a value like ","email": "foo@bar.com can close the string it was meant to sit in and add a field of its own, which is probably something you did not want.
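As a rough illustration in Python, here is what happens when the JSON is glued together by hand, compared with letting a serializer do the escaping. The function names, the field layout, and the real@example.com address are assumptions made up for this sketch. Note that Python's json.loads keeps the last value when a key appears twice.

```python
import json

def profile_json_naive(name: str) -> str:
    # Hand-built JSON: the value can break out of its string.
    return '{"email": "real@example.com", "name": "' + name + '"}'

def profile_json_safe(name: str) -> str:
    # json.dumps escapes quotes and control characters inside the value.
    return json.dumps({"email": "real@example.com", "name": name})

payload = '","email": "foo@bar.com'

print(json.loads(profile_json_naive(payload))["email"])  # foo@bar.com
print(json.loads(profile_json_safe(payload))["email"])   # real@example.com
```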
HTTP headers
Or if you are trying to redirect one user based on another's data, like this:
Location: {{new url based on user input}}
input like
invalid:url
Set-Cookie: FOO=bar; domain=spage.fi
<script>alert("XSS")</script>
will do nasty stuff just because it has line breaks in convenient places.
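One way to guard this particular output is to refuse line breaks when the header is written. The sketch below is in Python and makes several assumptions: the location_header helper is hypothetical, the payload mirrors the input above with explicit \r\n line breaks, and many HTTP libraries already perform a similar check themselves.

```python
def location_header(url: str) -> str:
    # Hypothetical helper: a header value must stay on one line,
    # so any CR or LF in it is treated as an error.
    if "\r" in url or "\n" in url:
        raise ValueError("line break in header value")
    return "Location: " + url

payload = (
    "invalid:url\r\n"
    "Set-Cookie: FOO=bar; domain=spage.fi\r\n"
    '<script>alert("XSS")</script>'
)

try:
    location_header(payload)
except ValueError as exc:
    print("rejected:", exc)  # rejected: line break in header value
```

The check lives where the header is written, not where the data was first received, because that is the only place where "no line breaks" is known to be the rule.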
Handle the output
Instead of paying too much attention to input, you should pay a lot of attention to output. O'Donnell is a perfectly valid name and it can be safely shown as node content, but not as an attribute value, because of the ' character. So you must know the context before escaping, and the context is known only at output time.
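To make the point concrete, here is the same name encoded for four output contexts using only Python's standard library. The variable names are made up for this sketch, and the choice of contexts is illustrative, not a complete list.

```python
import html
import json
from urllib.parse import quote

name = "O'Donnell"

# Stored as-is, encoded differently depending on where it is written out.
as_node_content = html.escape(name, quote=False)  # O'Donnell
as_attribute = html.escape(name, quote=True)      # O&#x27;Donnell
as_json = json.dumps(name)                        # "O'Donnell"
as_url_part = quote(name, safe="")                # O%27Donnell

for label, value in [
    ("node content", as_node_content),
    ("attribute value", as_attribute),
    ("JSON string", as_json),
    ("URL component", as_url_part),
]:
    print(label + ":", value)
```

No single "cleaned" version of the input would be right for all four, which is why the escaping decision belongs to the output code.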