
Decoding a special character in C#

I am wondering how I could decode the special character sequence â€¢ to HTML?

I have tried using System.Web.HttpUtility.HtmlDecode, but no luck yet.

The issue here is not HTML decoding, but rather that the text was encoded in one character set (UTF-8) and then decoded as if it were in another (e.g., windows-1252).

In UTF-8, • is encoded as E2 80 A2. When this byte sequence is read using the windows-1252 encoding, E2 80 A2 decodes as â€¢. (Saved again as UTF-8, the text "â€¢ Test" becomes C3 A2 E2 82 AC C2 A2 20 54 65 73 74.)
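The byte values above can be checked with a short sketch. The class and method names (`MojibakeDemo`, `Windows1252`, `Hex`, `Garble`) are made up for illustration, and the `Encoding.RegisterProvider` call assumes .NET Core or .NET 5+, where code-page encodings are not available by default. On .NET Framework that line should not be needed.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Returns the windows-1252 encoding. Assumption: running on .NET Core / .NET 5+,
    // where code-page encodings have to be registered before use.
    public static Encoding Windows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("windows-1252");
    }

    // Formats bytes as space-separated hex, e.g. "E2 80 A2".
    public static string Hex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");

    // Simulates the error: encode as UTF-8, then decode as windows-1252.
    public static string Garble(string text) =>
        Windows1252().GetString(Encoding.UTF8.GetBytes(text));

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        string bullet = "\u2022"; // the bullet character
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(bullet)));            // E2 80 A2

        string garbled = Garble(bullet);                                   // "â€¢"
        Console.WriteLine(garbled);

        // Saving the garbled text again as UTF-8 gives the longer byte sequence.
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(garbled + " Test"))); // C3 A2 E2 82 AC C2 A2 20 54 65 73 74
    }
}
```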

If the file is windows-1252-encoded, it can simply be read with the correct encoding (e.g., passed as an argument to a StreamReader constructor):

new StreamReader(..., Encoding.GetEncoding("windows-1252"));
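As a rough, self-contained illustration of the constructor call above, the following sketch writes a few windows-1252 bytes to a temporary file and reads them back with an explicit encoding. The helper name `ReadAll` and the sample bytes are assumptions made for the example, and the `RegisterProvider` line again assumes .NET Core or .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadWithEncodingDemo
{
    // Reads a whole file, decoding its bytes with the named encoding.
    public static string ReadAll(string path, string encodingName)
    {
        // Assumption: needed on .NET Core / .NET 5+ so that "windows-1252" resolves.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        using (var reader = new StreamReader(path, Encoding.GetEncoding(encodingName)))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // 0x95 is the bullet in windows-1252; the remaining bytes spell " Test".
            File.WriteAllBytes(path, new byte[] { 0x95, 0x20, 0x54, 0x65, 0x73, 0x74 });

            Console.OutputEncoding = Encoding.UTF8;

            // Decoded with the encoding the file was written in: bullet followed by " Test".
            Console.WriteLine(ReadAll(path, "windows-1252"));

            // Decoded as UTF-8: the lone 0x95 byte is not valid UTF-8,
            // so the bullet is lost and a replacement character appears instead.
            Console.WriteLine(ReadAll(path, "utf-8"));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```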

If the file was saved with an incorrect encoding, the encoding can be reversed in some cases. For instance, for the string sequence in your question, you can write:

string s = "â€¢"; // the string sequence that is not properly encoded
var b = Encoding.GetEncoding("windows-1252").GetBytes(s); // b = `E2 80 A2`
string c = Encoding.UTF8.GetString(b);  // c = `•`

Note that many common punctuation characters are in the range U+2000 to U+2044 (Reference), such as "smart quotes", bullets, and dashes. The UTF-8 encodings of U+2000 through U+203F all begin with the bytes E2 80, which windows-1252 reads as â€. Thus, the sequence â€?, where ? is any character, will typically signify this type of encoding error. This allows this type of error to be corrected more broadly:

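// Requires: using System.Text; using System.Text.RegularExpressions;
static string CorrectText(string input)
{
    var winencoding = Encoding.GetEncoding("windows-1252");
    // Re-encode each "â€" + one character as windows-1252 bytes, then decode them as UTF-8.
    return Regex.Replace(input, "â€.",
        m => Encoding.UTF8.GetString(winencoding.GetBytes(m.Value)));
}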

Calling this function with text malformed in this way will correct some (but not all) errors. For instance, CorrectText("â€¢Testâ€“orâ€œ") will return the intended •Test–or“ .
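To make the "some (but not all)" caveat concrete, here is a small sketch that runs the same replacement on one input the pattern matches and one it does not. `CorrectText` is repeated so the example compiles on its own; the wrapper class name and the "cafÃ©" sample are illustrative assumptions, and the strings use `\u` escapes so the source file's own encoding cannot interfere. The `RegisterProvider` call assumes .NET Core or .NET 5+.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CorrectTextDemo
{
    // Same approach as CorrectText above, repeated so this sketch is self-contained.
    public static string CorrectText(string input)
    {
        // Assumption: needed on .NET Core / .NET 5+ so that "windows-1252" resolves.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var winencoding = Encoding.GetEncoding("windows-1252");

        // "\u00E2\u20AC." is "â€" followed by any single character.
        return Regex.Replace(input, "\u00E2\u20AC.",
            m => Encoding.UTF8.GetString(winencoding.GetBytes(m.Value)));
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // "â€¢Test": matches the pattern, so the bullet is restored.
        Console.WriteLine(CorrectText("\u00E2\u20AC\u00A2Test"));

        // "cafÃ©" is the same kind of error applied to "café" (UTF-8 bytes C3 A9),
        // but it does not start with "â€", so it is returned unchanged.
        Console.WriteLine(CorrectText("caf\u00C3\u00A9"));
    }
}
```

A more general repair would have to round-trip whole strings rather than a single three-character pattern, at the risk of damaging text that was never garbled in the first place.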

HtmlDecode is for converting HTML-encoded strings into a readable string format. Perhaps HtmlEncode might be what you're actually looking for.
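The difference between the two directions can be seen in a brief sketch. It uses System.Net.WebUtility rather than the System.Web.HttpUtility class mentioned in the question, purely so the example runs without a reference to System.Web; the wrapper class `HtmlCodecDemo` and its sample strings are assumptions for illustration.

```csharp
using System;
using System.Net;
using System.Text;

static class HtmlCodecDemo
{
    // Thin wrappers around the framework methods, named here for the example only.
    public static string Encode(string s) => WebUtility.HtmlEncode(s);
    public static string Decode(string s) => WebUtility.HtmlDecode(s);

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Encoding escapes markup characters.
        Console.WriteLine(Encode("<b>Tom & Jerry</b>")); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

        // Decoding turns a character reference back into the character (a bullet here).
        Console.WriteLine(Decode("&#8226; Test"));

        // The garbled sequence "â€¢" contains no character references,
        // so decoding leaves it exactly as it was.
        Console.WriteLine(Decode("\u00E2\u20AC\u00A2 Test"));
    }
}
```

The last call suggests why HtmlDecode appeared to do nothing in the question: the input was never HTML-encoded to begin with.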
