
ASP.NET request validation with HTML encoded characters

I have a textbox in a form which needs to accept input with HTML tags.

Submitting input that contains HTML tags makes the app throw an HttpRequestValidationException, unless we use HttpUtility.HtmlEncode. Easy so far.

However, the input may also contain symbols, such as the 'degrees' symbol (°). When these are also HTML encoded, they become numeric escape codes, in this example &#176;. These codes also cause HttpRequestValidationException to be thrown, but the question is why?

I can't see why numeric escape codes are thought of as potentially dangerous, especially as ° works as input just fine.

I seem to be stuck, as leaving the input as-is fails due to the tags, and HTML encoding the input fails due to the numeric escapes. My solution so far has been to HTML encode, then regex replace the escape sequences with their HTML decoded forms, but I'm not sure if this is a safe solution, as I assume the escape sequences are seen as dangerous for a reason.

ASP.NET considers HTML character escapes (&#xxx) dangerous for the same reason it considers angle brackets dangerous, i.e. XSS. Using such an escape, you can include any character (for example, an angle bracket). Here's a summary of what request validation does in 1.1 and 2.0.

In legitimate cases such as yours, you can choose any of the options below:

  1. Use your own handling, as you describe
  2. Disable request validation at page level (<%@ Page validateRequest="false" %>)
  3. In .NET 4, substitute your own request validation using the RequestValidator class.
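For option 1, a cautious variant of the "encode, then restore the escapes" idea is to restore only the numeric escapes that stand for non-ASCII characters, so that an escaped angle bracket or ampersand stays escaped. This is only a sketch, written in JavaScript to match the other snippet in this thread; the function name restoreNonAsciiEscapes is made up for illustration, it handles decimal escapes only, and the same regex idea should carry over to a server-side language.

```javascript
// Hypothetical helper (not part of ASP.NET or any library): turn decimal
// numeric character references back into literal characters, but only for
// non-ASCII code points. References to ASCII characters, such as &#60; for
// "<", are deliberately left escaped.
function restoreNonAsciiEscapes(s) {
    return s.replace(/&#(\d+);/g, function (match, digits) {
        var codePoint = parseInt(digits, 10);
        if (codePoint < 128 || codePoint > 0x10FFFF) {
            return match; // keep ASCII and out-of-range references as they are
        }
        return String.fromCodePoint(codePoint);
    });
}

// Example: the degree sign is restored, the escaped bracket is not.
var restored = restoreNonAsciiEscapes("25&#176;C &lt;b&gt; &#60;");
```

With this input, restored holds the literal ° while &lt;, &gt; and &#60; are untouched, so the markup characters never reappear in raw form.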

This is due to ASP.NET's built-in cross-site scripting validation. There is a list of sorts of what ASP.NET allows and what it rejects here on SO: "ASP.NET request validation causes: is there a list?"

On the specific case of #-encoded characters, there is a complete reference of XSS attacks available here: XSS (Cross Site Scripting) Cheat Sheet. It demonstrates how complex these attacks can be, and why encoded characters are forbidden.

You can read the Script Exploits Overview in the MSDN help.

If you are sure that you handle any possible malicious code input in your page then you can disable validation using the <%@ Page validateRequest="false" %> directive.

I'd suggest looking into doing limited HTML encoding on the client side, which is quite a breeze to do with jQuery by binding the processing to the form's submit event.

What do I mean by "limited"? Ampersands, angle brackets and quotes should be encoded, but not the Unicode symbols. You're pointing out that, in fact, numeric escape codes are evil and get declined, unlike their unescaped equivalents!

You could run the string you're submitting through a JavaScript function similar to the following code, giving you a value that would pass request validation:

function safeString(s) {
    return s.replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;").replace(/"/g, "&quot;");
}
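Binding that function to a form submit might look roughly like the sketch below. It uses the plain DOM addEventListener rather than jQuery, and the helper name encodeOnSubmit, the form id and the field name are all made up for illustration.

```javascript
// Same encoder as above, repeated so this sketch is self-contained.
function safeString(s) {
    return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Hypothetical helper: encode the named fields of a form just before it is
// submitted, so the posted values contain no raw markup characters.
function encodeOnSubmit(form, fieldNames) {
    form.addEventListener("submit", function () {
        fieldNames.forEach(function (name) {
            var field = form.elements[name];
            field.value = safeString(field.value);
        });
    });
}

// Usage in a browser (the form id and field name are assumptions):
// encodeOnSubmit(document.getElementById("commentForm"), ["comment"]);
```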

Encoding the value this way could cause you some grief if, after storing it or doing some server-side magic with the submitted value, you want to re-display it inside of an input. Let me elaborate: if you simply put a string encoded that way into an empty paragraph, it will render perfectly; however, if you dump it into a textarea, you will see &lt; instead of <.

Ironically, when writing the last sentence I had to type &amp;lt; and &lt; respectively...
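If the stored value does show up escaped when re-displayed in an input or textarea, one option is to decode it in script before assigning it to the element's value property. The function below is a hypothetical inverse of safeString (the name unsafeString is made up); it is a sketch that undoes only those four replacements, not a general HTML decoder.

```javascript
// Hypothetical inverse of safeString: undo the four replacements.
// &amp; is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
function unsafeString(s) {
    return s.replace(/&quot;/g, '"').replace(/&gt;/g, ">").replace(/&lt;/g, "<").replace(/&amp;/g, "&");
}

// Usage in a browser (the element id is an assumption):
// document.getElementById("comment").value = unsafeString(storedValue);
```

Assigning the decoded string to value does not parse it as HTML, which is why decoding at that point is reasonable; writing the decoded string into markup would need it encoded again.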

Just add this attribute to your page directive (the first line of the page):

ValidateRequest="false"
