Is there a way to convert a Unicode hex character code such as &#x61; to a simple "a" so that I can get an alphabetic string?

A description of this encoding can be found in "All such encoded characters" and "character description". The exact string I want to convert is:

-9<ahref="j&#x61vascript:&#x61lert(window.origin)">X

This should be converted to:

-9<ahref="javascript:alert(window.origin)">X

I am a beginner in C# and have very little knowledge of this encoding style. Please help. I want something like Encoding.{what-is-the-format}.GetString(encodedString) => returns decodedString.

Answering my own question, thanks to @Ralf's comment above:

  • Step 1: Used a Regex to find each numeric character reference (such as &#x61) in my string

  • Step 2: Used HttpUtility.HtmlDecode() to decode each match, appending the trailing ";" when it is missing

  • Step 3: Replaced each match in my original string with its decoded character

Complete C# code is given below:

using System;
using System.Text.RegularExpressions;
using System.Web;

class MyNewClass
{
    public static void Main()
    {
        var encodedText = "-9<ahref=\"j&#x61;vascript:&#x61;lert(window.origin)\">X";

        // Match a numeric character reference: "&#x" + two hex digits or
        // "&#" + two decimal digits, with an optional trailing ';'.
        var decodedText = Regex.Replace(
            encodedText,
            @"&#(?:x[0-9a-f]{2}|[0-9]{2});?",
            match =>
            {
                // HtmlDecode expects the terminating ';', so append it when missing.
                string entity = match.Value;
                return HttpUtility.HtmlDecode(entity.EndsWith(";") ? entity : entity + ";");
            },
            RegexOptions.IgnoreCase);

        Console.WriteLine(decodedText);
    }
}
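
As a possible alternative to the code above, here is a minimal sketch that avoids the System.Web dependency. It assumes that references with a trailing ";" can be handed directly to System.Net.WebUtility.HtmlDecode, and that references without it are always exactly two hex digits, as in the question's string. EntityDecoder and DecodeLenient are hypothetical names chosen for illustration, not part of any library.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class EntityDecoder
{
    // Hypothetical helper: decodes two-digit hex references such as "&#x61",
    // with or without the trailing ';'. Assumes exactly two hex digits.
    public static string DecodeLenient(string input)
    {
        return Regex.Replace(
            input,
            @"&#x([0-9a-f]{2});?",
            // Parse the captured hex digits and turn the code point into a string.
            m => char.ConvertFromUtf32(Convert.ToInt32(m.Groups[1].Value, 16)),
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Well-formed references (ending in ';') can be decoded in a single call.
        string wellFormed = "-9<ahref=\"j&#x61;vascript:&#x61;lert(window.origin)\">X";
        Console.WriteLine(WebUtility.HtmlDecode(wellFormed));

        // The string from the question omits the ';', so use the lenient helper.
        string noSemicolon = "-9<ahref=\"j&#x61vascript:&#x61lert(window.origin)\">X";
        Console.WriteLine(DecodeLenient(noSemicolon));
    }
}
```

The two-digit limit is a deliberate trade-off: without a terminating ";" there is no reliable way to tell where the reference ends, so a reference followed by a letter that is also a hex digit (for example "&#x61bc") would be ambiguous with a greedy match.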
