Convert ANSI to UTF8

I download a file from the internet in C# on Windows Phone 8.1. The problem is that the downloaded content has strange-looking special characters. When I examined the file on my PC with Notepad++, it told me that the file is encoded in ANSI, and I want to read it as UTF8.

Here is my code:

byte[] responseBytes = await client.GetByteArrayAsync("http://somesite/myfile.txt");
string content = Encoding.UTF8.GetString(responseBytes, 0, responseBytes.Length);

But as it is encoded in ANSI, all special characters are displayed incorrectly.

After some research, I found that many people suggest this approach:

Encoding ANSI = Encoding.GetEncoding(1252);
byte[] ansiBytes = ANSI.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert(ANSI, Encoding.UTF8, ansiBytes);
String utf8String = Encoding.UTF8.GetString(utf8Bytes);

But in WP 8.1, Encoding.GetEncoding(1252) is not available, and neither is Encoding.Default. What can I do to get my string in UTF8?

In general (but apparently not on Windows Phone), the way to do this is to simply use the correct encoding from the get-go:

string content = Encoding.Default.GetString(responseBytes, 0, responseBytes.Length);

Where Encoding.Default is defined as:

an encoding for the operating system's current ANSI code page.

… What you are currently attempting to do is interpret the bytes in an incorrect encoding, and then try to re-encode them. This won't generally work, because the first decoding step already discards information.
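To illustrate why decoding with the wrong encoding cannot be repaired afterwards, here is a minimal sketch. The class and method names (LossyDecodeDemo, LosesData) are hypothetical and only serve this example. It assumes the default behaviour of Encoding.UTF8, which substitutes the replacement character U+FFFD for byte sequences that are not valid UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
static class LossyDecodeDemo
{
    // Returns true if decoding the bytes as UTF-8 produced the
    // replacement character U+FFFD, i.e. the original byte value is lost.
    public static bool LosesData(byte[] bytes)
    {
        string decoded = Encoding.UTF8.GetString(bytes, 0, bytes.Length);
        return decoded.IndexOf('\uFFFD') >= 0;
    }
}
```

For the bytes 0x63 0x61 0x66 0xE9 ("café" in Windows-1252), the trailing 0xE9 is not a complete UTF-8 sequence, so the decoded string contains U+FFFD and no later conversion can tell which character was originally there.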


But as you've said, Windows Phone does not support this. So what you do instead is manually create a byte-to-character translation table for Windows-1252 and look up the characters. You can then either loop over the input buffer yourself or, for extra points, create a new class which derives from System.Text.Encoding and implements the required encoding.
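The manual loop could be sketched as follows. The class name Cp1252 and its Decode method are hypothetical, not part of any library. The table covers bytes 0x80 to 0x9F, where Windows-1252 differs from the Unicode code points of the same value; the five bytes that Windows-1252 leaves unassigned are mapped to the matching C1 control characters, which is an assumption of this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
static class Cp1252
{
    // Characters for bytes 0x80..0x9F in Windows-1252.
    // Unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same code point.
    private const string High =
        "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
        "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
        "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
        "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

    // Decodes Windows-1252 bytes into a .NET string.
    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            // Only 0x80..0x9F need the table; every other byte equals its code point.
            sb.Append(b >= 0x80 && b <= 0x9F ? High[b - 0x80] : (char)b);
        }
        return sb.ToString();
    }
}
```

With the code from the question, this would be used as string content = Cp1252.Decode(responseBytes); in place of the Encoding.UTF8.GetString call.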

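In fact, you barely need a lookup table: Windows-1252 bytes equal their Unicode code points everywhere except the range 0x80 to 0x9F, so only those 32 bytes need one. The following is a rudimentary but (for your purposes) sufficient Encoding implementation:

using System;
using System.Text;

// Minimal Windows-1252 encoding: bytes 0x80..0x9F use a table, all others map 1:1.
class Windows1252Encoding : Encoding {
    // Characters for bytes 0x80..0x9F; unassigned bytes map to the same code point.
    const string High =
        "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
        "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

    public override int GetByteCount(char[] chars, int index, int count) {
        return count;
    }

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex) {
        for (int i = 0; i < charCount; i++) {
            char c = chars[charIndex + i];
            int high = High.IndexOf(c);
            byte b;
            if (high >= 0) b = (byte)(0x80 + high);                          // table hit
            else if (c < 0x80 || (c >= 0xA0 && c <= 0xFF)) b = (byte)c;      // direct 1:1 mapping
            else b = (byte)'?';                                              // not representable
            bytes[byteIndex + i] = b;
        }
        return charCount;
    }

    public override int GetCharCount(byte[] bytes, int index, int count) {
        return count;
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex) {
        for (int i = 0; i < byteCount; i++) {
            byte b = bytes[byteIndex + i];
            chars[charIndex + i] = b >= 0x80 && b <= 0x9F ? High[b - 0x80] : (char)b;
        }
        return byteCount;
    }

    public override int GetMaxByteCount(int charCount) {
        return charCount;
    }

    public override int GetMaxCharCount(int byteCount) {
        return byteCount;
    }
}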

This seems to work, but I cannot test it on Windows Phone, only on Mono.


 