Unescaping client data in C# to prevent XSS or other attacks

To protect our web application from XSS and other attacks, we would like to decode all input coming from the client (browser) before validating it.

To bypass standard validation, attackers encode the data. For example:

<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>

That gets translated to

<IMG SRC=javascript:alert('XSS')>

In C#, we can use HttpUtility.HtmlDecode and HttpUtility.UrlDecode to decode client input, but they do not cover every type of encoding. For example, the following encoded value is not decoded by either method, yet all browsers decode and execute it correctly. You can also verify this at https://mothereff.in/html-entities.

<img src=x onerror="&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041">

It gets decoded to <img src=x onerror="javascript:alert('XSS')">
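The limitation can be reproduced with a small console program that runs both forms of "javascript:" through HttpUtility. This is a sketch that assumes .NET Core 2.0+ or .NET 5+, where System.Web.HttpUtility ships with the runtime (on .NET Framework, add a reference to System.Web); the class name DecodeDemo is made up for illustration.

```csharp
using System;
using System.Web;

// Hypothetical demo class, only used to show the behaviour described in the question.
public static class DecodeDemo
{
    // "javascript:" written as semicolon-terminated numeric references.
    public const string Terminated =
        "&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;";

    // The same characters, zero-padded and without the terminating ';'.
    public const string Unterminated =
        "&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099" +
        "&#0000114&#0000105&#0000112&#0000116&#0000058";

    public static void Main()
    {
        // "&...;" is recognised as an entity, so this prints "javascript:".
        Console.WriteLine(HttpUtility.HtmlDecode(Terminated));

        // With no ';', each '&' is copied through literally: the input comes back unchanged.
        Console.WriteLine(HttpUtility.HtmlDecode(Unterminated));

        // UrlDecode only handles %XX escapes and '+', so it leaves the references alone too.
        Console.WriteLine(HttpUtility.UrlDecode(Unterminated));
    }
}
```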

There are other encoded forms that the HtmlDecode method does not decode either. In Java, https://github.com/unbescape/unbescape handles all such variants.

Is there a similar library in .NET, or how should we handle such scenarios?

Generally, you should not allow users to enter code into a text box.

Client side

Judging from the comments on your post, I'd simply add some client-side validation to prevent users from entering malicious input (for example, verifying that email fields contain email addresses), and then apply the same validation on your server.

Server side

As soon as you read a user's input into a model, validate and sanitise it before doing any further processing. Have a generic AntiXSS class that removes potentially malicious characters such as the < and > symbols: for example, check myString.Contains("<") or myString.Contains(">") and, if either is present, remove that character. Validate your types, too: if you're checking the userEmail field, make sure it conforms to email syntax.
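A minimal sketch of such a helper might look like the following. AntiXss, StripAngleBrackets and IsValidEmail are hypothetical names, not an existing library API, and using System.Net.Mail.MailAddress as the email syntax check is an assumption; a regular expression would work too.

```csharp
using System;
using System.Net.Mail;

// Hypothetical helper illustrating the answer's suggestions; not an existing library.
public static class AntiXss
{
    // Removes '<' and '>'. Replace is a no-op when the character is absent,
    // so a separate Contains check is not required.
    public static string StripAngleBrackets(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        return input.Replace("<", string.Empty).Replace(">", string.Empty);
    }

    // Rough syntax check for an email field, using System.Net.Mail.MailAddress.
    public static bool IsValidEmail(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }
        try
        {
            var address = new MailAddress(input);
            // MailAddress also accepts "Name <user@example.com>"; require a bare address.
            return address.Address == input;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}

public static class AntiXssDemo
{
    public static void Main()
    {
        Console.WriteLine(AntiXss.StripAngleBrackets("<script>alert(1)</script>")); // scriptalert(1)/script
        Console.WriteLine(AntiXss.IsValidEmail("user@example.com"));                // True
        Console.WriteLine(AntiXss.IsValidEmail("not an email"));                    // False
    }
}
```

Note that stripping characters only helps if it runs on the value the browser will actually see, which is the point of the question above.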

The general idea is that you can pass data to the client, but never trust any of the data that comes back from the client without first sanitising and cleansing everything.

I found the solution. HttpUtility.HtmlDecode only decodes the characters between an ampersand '&' and a semicolon ';', but browsers do not require the trailing ';'.

In my case, the semicolons were missing. I wrote some simple code to insert a semicolon before calling the HtmlDecode method, and now it decodes as expected.
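The answer's own code was not posted, so the following is only a sketch of the same idea: a regular expression that makes every decimal or hexadecimal numeric character reference end in a semicolon, followed by HttpUtility.HtmlDecode. The names EntityFixer, AddMissingSemicolons and Decode are hypothetical, and the sketch assumes a runtime where System.Web.HttpUtility is available (.NET Core 2.0+, .NET 5+, or .NET Framework with a System.Web reference).

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper; the answer describes the approach but does not post its code.
public static class EntityFixer
{
    // A decimal (&#106) or hexadecimal (&#x6A) numeric character reference,
    // with or without its ';'. The digit runs are greedy, so the whole number
    // is captured before the optional ';' is consumed.
    private static readonly Regex NumericReference =
        new Regex("&#([0-9]+|[xX][0-9a-fA-F]+);?", RegexOptions.Compiled);

    // Rewrites every numeric reference so that it ends in exactly one ';'.
    public static string AddMissingSemicolons(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        return NumericReference.Replace(input, "&#$1;");
    }

    // Inserts the missing semicolons, then lets HtmlDecode do the decoding.
    public static string Decode(string input) =>
        HttpUtility.HtmlDecode(AddMissingSemicolons(input));
}

public static class EntityFixerDemo
{
    public static void Main()
    {
        const string payload =
            "<img src=x onerror=\"&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099" +
            "&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101" +
            "&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041\">";

        // Prints: <img src=x onerror="javascript:alert('XSS')">
        Console.WriteLine(EntityFixer.Decode(payload));
    }
}
```

Because every match is rewritten to end in a single ';', references that already have one pass through unchanged, so the helper can be applied to input that mixes both forms. It only handles numeric references; named entities written without a semicolon (such as &lt) are still left undecoded by HtmlDecode.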
