
C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H

I have Googled this topic and looked at every answer, but I still don't get it.

Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));

My source string is:

Message = "ÄäÖöÕõÜü"

But unfortunately, my result string becomes:

msg = "�ä�ö�õ�ü"

What am I doing wrong here?

Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
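To see what this round trip actually does, here is a minimal self-contained sketch. It uses C# top-level statements and assumes .NET 5 or later, where ISO-8859-1 is built in; the literal stands in for `Message`. `Encoding.Convert` decodes the UTF-8 bytes to characters and re-encodes them as ISO-8859-1. Each of these letters therefore shrinks from two bytes to one, and decoding with the same encoding gives back the original text.

```csharp
using System;
using System.Text;

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;

string message = "ÄäÖöÕõÜü"; // stand-in for Message

// UTF-8 needs two bytes for each of these characters.
byte[] utfBytes = utf8.GetBytes(message);

// Encoding.Convert decodes with the source encoding, then encodes with the target one.
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);

// Decoding with the same encoding that produced the bytes restores the text.
string msg = iso.GetString(isoBytes);

Console.WriteLine($"UTF-8 bytes:      {utfBytes.Length}"); // 16
Console.WriteLine($"ISO-8859-1 bytes: {isoBytes.Length}"); // 8
Console.WriteLine(msg == message);                         // True
```

Because `msg` ends up equal to the input string, the conversion only matters if you keep `isoBytes`, for example to write them to a file or stream.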

I think your problem is that you assume the bytes that represent the UTF-8 string will produce the same string when interpreted as something else (ISO-8859-1). That is simply not the case. I recommend that you read this excellent article by Joel Spolsky.
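A small sketch of that misinterpretation, assuming .NET 5 or later. Each of these letters is two bytes in UTF-8, and ISO-8859-1 turns every single byte into its own character. One letter therefore becomes two characters; for `Ä` the second one is the invisible C1 control character U+0084.

```csharp
using System;
using System.Text;

Encoding iso = Encoding.GetEncoding("ISO-8859-1");

// "Ä" (U+00C4) and "ä" (U+00E4) each take two bytes in UTF-8.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Ää");
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C3-84-C3-A4

// ISO-8859-1 maps each byte to exactly one character, so every
// two-byte UTF-8 sequence is read back as two separate characters.
string misread = iso.GetString(utf8Bytes);
Console.WriteLine(misread.Length); // 4

// Prints U+00C3, U+0084, U+00C3, U+00A4 (one per line).
foreach (char c in misread)
    Console.WriteLine($"U+{(int)c:X4}");
```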

Try this:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8,iso,utfBytes);
string msg = iso.GetString(isoBytes);

You need to fix the source of the string in the first place.

A string in .NET is actually just an array of 16-bit Unicode characters (UTF-16 code units), so a string isn't in any particular encoding.

Encoding only comes into play when you take that string and convert it to a sequence of bytes.
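For illustration, here is a minimal sketch (assuming .NET 5 or later) that encodes the same string three ways. The string itself does not change; only the bytes that `GetBytes` produces do.

```csharp
using System;
using System.Text;

string s = "Äü";

// The string is just UTF-16 chars; GetBytes is where a byte format is chosen.
byte[] utf8   = Encoding.UTF8.GetBytes(s);
byte[] latin1 = Encoding.GetEncoding("ISO-8859-1").GetBytes(s);
byte[] utf16  = Encoding.Unicode.GetBytes(s); // UTF-16 little-endian, no BOM

Console.WriteLine(BitConverter.ToString(utf8));   // C3-84-C3-BC
Console.WriteLine(BitConverter.ToString(latin1)); // C4-FC
Console.WriteLine(BitConverter.ToString(utf16));  // C4-00-FC-00
```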

In any case, what you did (encoding a string to a byte array with one character set and then decoding it with another) will not work, as you can see.
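The following sketch assumes .NET 5 or later; `RoundTrip` is a hypothetical helper written only for this example. It shows that encoding and decoding with the same encoding restores the text. Mixing the two encodings either produces mojibake or, for bytes that are not valid UTF-8, U+FFFD replacement characters.

```csharp
using System;
using System.Text;

Encoding utf8 = Encoding.UTF8;
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
string original = "Äö";

// Hypothetical helper: encode with one encoding, decode with another.
static string RoundTrip(string text, Encoding encodeWith, Encoding decodeWith) =>
    decodeWith.GetString(encodeWith.GetBytes(text));

Console.WriteLine(RoundTrip(original, utf8, utf8));     // same encoding: text intact
Console.WriteLine(RoundTrip(original, latin1, latin1)); // same encoding: text intact
Console.WriteLine(RoundTrip(original, utf8, latin1));   // 4 chars of mojibake
Console.WriteLine(RoundTrip(original, latin1, utf8));   // two U+FFFD (invalid UTF-8)
```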

Can you tell us more about where the original string comes from, and why you think it has been encoded incorrectly?

This seems like strange code. To get a string from a UTF-8 byte stream, all you need to do is:

string str = Encoding.UTF8.GetString(utf8ByteArray);

If you need to save an ISO-8859-1 byte stream somewhere, just add one more line of code after the previous one:

byte[] iso88591data = Encoding.GetEncoding("ISO-8859-1").GetBytes(str);
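Here are the two steps together as a minimal sketch, assuming .NET 5 or later. It also assumes the input bytes really are UTF-8; here they are generated from a literal as a stand-in for data read from a file or socket.

```csharp
using System;
using System.Text;

// Stand-in for raw bytes known to be UTF-8 (e.g. read from a file or socket).
byte[] utf8ByteArray = Encoding.UTF8.GetBytes("ÄäÖöÕõÜü");

// Step 1: decode the UTF-8 bytes into a .NET string.
string str = Encoding.UTF8.GetString(utf8ByteArray);

// Step 2: only when ISO-8859-1 bytes are needed, encode the string again.
byte[] iso88591data = Encoding.GetEncoding("ISO-8859-1").GetBytes(str);

Console.WriteLine(str);
Console.WriteLine(BitConverter.ToString(iso88591data)); // C4-E4-D6-F6-D5-F5-DC-FC
```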

I just used Nathan's solution and it works fine. I needed to convert ISO-8859-1 to Unicode:

string isocontent = Encoding.GetEncoding("ISO-8859-1").GetString(fileContent, 0, fileContent.Length);
byte[] isobytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(isocontent);
byte[] ubytes = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.Unicode, isobytes);
return Encoding.Unicode.GetString(ubytes, 0, ubytes.Length);
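The four lines above decode, re-encode, convert and decode again. Since a .NET string is already UTF-16, a single `GetString` call with the source encoding should give the same result. Here is a sketch comparing both, assuming .NET 5 or later and a two-byte stand-in for `fileContent`:

```csharp
using System;
using System.Text;

Encoding iso = Encoding.GetEncoding("ISO-8859-1");

// Stand-in for file content known to be ISO-8859-1 ("Äö" = C4 F6).
byte[] fileContent = { 0xC4, 0xF6 };

// Version from the answer: decode, re-encode, convert to UTF-16, decode again.
string isocontent = iso.GetString(fileContent, 0, fileContent.Length);
byte[] isobytes = iso.GetBytes(isocontent);
byte[] ubytes = Encoding.Convert(iso, Encoding.Unicode, isobytes);
string viaConvert = Encoding.Unicode.GetString(ubytes, 0, ubytes.Length);

// Decoding once with the source encoding yields the same .NET string.
string direct = iso.GetString(fileContent);

Console.WriteLine(viaConvert == direct); // True
```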
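
// On .NET Core and .NET 5+, code page 1252 is only available after registering the provider:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding targetEncoding = Encoding.GetEncoding(1252);
// Encode a string into an array of bytes.
Byte[] encodedBytes = targetEncoding.GetBytes(utfString);
// Show the encoded byte values.
Console.WriteLine("Encoded bytes: " + BitConverter.ToString(encodedBytes));
// Decode the byte array back to a string with the same encoding.
// (Encoding.Default is UTF-8 on .NET Core, so it would not round-trip these bytes.)
String decodedString = targetEncoding.GetString(encodedBytes);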
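Here is a self-contained sketch of the same round trip, assuming .NET 5 or later, where code page 1252 requires registering `CodePagesEncodingProvider`. The input string is a hypothetical stand-in for `utfString`. The sketch also shows one place where Windows-1252 and ISO-8859-1 differ: 1252 maps the euro sign to byte 0x80, which ISO-8859-1 decodes as the control character U+0080.

```csharp
using System;
using System.Text;

// Code page 1252 is not built into .NET Core / .NET 5+; register the provider first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding cp1252 = Encoding.GetEncoding(1252);
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

string utfString = "Ä€"; // hypothetical input

byte[] encodedBytes = cp1252.GetBytes(utfString);
Console.WriteLine("Encoded bytes: " + BitConverter.ToString(encodedBytes)); // C4-80

// Decoding with the same encoding round-trips.
string decodedString = cp1252.GetString(encodedBytes);
Console.WriteLine(decodedString == utfString); // True

// Windows-1252 and ISO-8859-1 disagree in 0x80-0x9F: Latin1 reads 0x80 as U+0080.
string asLatin1 = latin1.GetString(encodedBytes);
Console.WriteLine($"U+{(int)asLatin1[1]:X4}"); // U+0080
```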

Maybe this can help.
Convert one code page to another:

    public static string fnStringConverterCodepage(string sText, string sCodepageIn = "ISO-8859-8", string sCodepageOut="ISO-8859-8")
    {
        string sResultado = string.Empty;
        try
        {
            byte[] tempBytes;
            tempBytes = System.Text.Encoding.GetEncoding(sCodepageIn).GetBytes(sText);
            sResultado = System.Text.Encoding.GetEncoding(sCodepageOut).GetString(tempBytes);
        }
        catch (Exception)
        {
            sResultado = "";
        }
        return sResultado;
    }

Usage:

string sMsg = "ERRO: Não foi possivel acessar o servico de Autenticação";
var sOut = fnStringConverterCodepage(sMsg, "ISO-8859-1", "UTF-8");

Output:

"Não foi possivel acessar o servico de Autenticação"

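Encoding with one code page and decoding with another is the same operation that garbled the text in the question. This kind of function is therefore mainly useful for reversing that damage. Here is a sketch of that use, assuming .NET 5 or later; `Reinterpret` is a hypothetical stand-in for `fnStringConverterCodepage` without the try/catch.

```csharp
using System;
using System.Text;

// Hypothetical equivalent of fnStringConverterCodepage: encode with one code page,
// decode with another (no exception swallowing, so failures stay visible).
static string Reinterpret(string text, string codepageIn, string codepageOut) =>
    Encoding.GetEncoding(codepageOut).GetString(Encoding.GetEncoding(codepageIn).GetBytes(text));

// Simulate the usual bug: UTF-8 bytes decoded as ISO-8859-1.
string original = "Não";
string garbled = Encoding.GetEncoding("ISO-8859-1").GetString(Encoding.UTF8.GetBytes(original));
Console.WriteLine(garbled); // NÃ£o

// Re-encoding as ISO-8859-1 recovers the UTF-8 bytes; decoding them as UTF-8 repairs the text.
string repaired = Reinterpret(garbled, "ISO-8859-1", "UTF-8");
Console.WriteLine(repaired); // Não

// On text that was never garbled, the same call damages it instead.
Console.WriteLine(Reinterpret(original, "ISO-8859-1", "UTF-8")); // N�o
```

The last line shows the flip side. On text that was never mis-decoded, the call replaces `ã` with U+FFFD, so it only helps when the input was garbled earlier in the pipeline.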
Here is a sample for ISO-8859-9:

protected void btnKaydet_Click(object sender, EventArgs e)
{
    Response.Clear();
    Response.Buffer = true;
    Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    Response.AddHeader("Content-Disposition", "attachment; filename=XXXX.doc");
    Response.ContentEncoding = Encoding.GetEncoding("ISO-8859-9");
    Response.Charset = "ISO-8859-9";
    EnableViewState = false;


    StringWriter writer = new StringWriter();
    HtmlTextWriter html = new HtmlTextWriter(writer);
    form1.RenderControl(html);


    byte[] bytesInStream = Encoding.GetEncoding("iso-8859-9").GetBytes(writer.ToString());
    MemoryStream memoryStream = new MemoryStream(bytesInStream);


    string msgBody = "";
    string Email = "mail@xxxxxx.org";
    SmtpClient client = new SmtpClient("mail.xxxxx.org");
    MailMessage message = new MailMessage(Email, "mail@someone.com", "ONLINE APP FORM WITH WORD DOC", msgBody);
    Attachment att = new Attachment(memoryStream, "XXXX.doc", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    message.Attachments.Add(att);
    message.BodyEncoding = System.Text.Encoding.UTF8;
    message.IsBodyHtml = true;
    client.Send(message);
}
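The sample relies on ISO-8859-9 mapping Turkish letters to single bytes. Here is a minimal sketch of that mapping, assuming .NET 5 or later. On those runtimes ISO-8859-9 is only available after registering `CodePagesEncodingProvider`; on .NET Framework it is available directly.

```csharp
using System;
using System.Text;

// ISO-8859-9 is a code-page encoding; on .NET Core / .NET 5+ register the provider first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding turkish = Encoding.GetEncoding("iso-8859-9");

// Turkish letters that ISO-8859-9 puts in place of some Latin-1 characters.
string text = "ĞğİıŞş";
byte[] bytes = turkish.GetBytes(text);

Console.WriteLine(BitConverter.ToString(bytes));     // D0-F0-DD-FD-DE-FE
Console.WriteLine(turkish.GetString(bytes) == text); // True
```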

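Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.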

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM