
Converting .txt files into Unicode

Is there a way to convert .txt files into Unicode using C#?

Only if you know the original encoding used to produce the .txt file. That is not a restriction of C# or .NET; it is a general problem.
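To illustrate why the original encoding matters, here is a minimal sketch that decodes the same two bytes under two different assumptions. The class and method names (MojibakeSketch, DecodeAs) are made up for this illustration and are not from the answers in this thread.

```csharp
using System;
using System.Text;

static class MojibakeSketch
{
    // Hypothetical helper: decode the same bytes under a named encoding.
    public static string DecodeAs(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    static void Main()
    {
        // The two bytes a UTF-8 editor writes for the single character U+00E9 (e with acute).
        byte[] bytes = { 0xC3, 0xA9 };

        // Correct guess: one character, U+00E9.
        Console.WriteLine(DecodeAs(bytes, "utf-8"));

        // Wrong guess: two characters, U+00C3 followed by U+00A9.
        Console.WriteLine(DecodeAs(bytes, "iso-8859-1"));
    }
}
```

Nothing in the bytes themselves says which reading is right, which is why a conversion can only be trusted when the source encoding is known.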

Read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" to learn why "plain text" is meaningless if you don't know the encoding.

Provided you're only using ASCII characters in your text file, it is already Unicode: an ASCII-only file is byte-for-byte valid UTF-8.
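A quick way to check that claim is sketched below. It assumes the file is small enough to read into memory at once; IsPureAscii is a hypothetical helper name, not a framework method.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class AsciiCheckSketch
{
    // Hypothetical helper: true when every byte is in the 7-bit ASCII range (0x00-0x7F).
    public static bool IsPureAscii(byte[] bytes)
    {
        return bytes.All(b => b < 0x80);
    }

    static void Main(string[] args)
    {
        // ASCII and UTF-8 produce identical bytes for ASCII-only text.
        byte[] ascii = Encoding.ASCII.GetBytes("plain text");
        byte[] utf8 = Encoding.UTF8.GetBytes("plain text");
        Console.WriteLine(ascii.SequenceEqual(utf8)); // True

        // Optionally check a file passed on the command line.
        if (args.Length > 0)
        {
            Console.WriteLine(IsPureAscii(File.ReadAllBytes(args[0])));
        }
    }
}
```

If the check returns false, the file contains bytes outside ASCII and you are back to needing the original encoding.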

If you want a different encoding of the characters (UTF-16/UCS-2, etc.), any language that supports Unicode should be able to read in one encoding and write out another.

The System.Text.Encoding classes will do it, as in the following example: it converts a UTF-16 string to both UTF-8 and ASCII bytes and then back again (code gratuitously stolen from here).

using System;
using System.IO;
using System.Text;

class Test {
    public static void Main() {        
        using (StreamWriter output = new StreamWriter("practice.txt")) {
            string srcString = "Area = \u03A0r^2"; // PI.R.R

            // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
            byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
            byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

            // Write the UTF-8 and ASCII encoded byte arrays. 
            output.WriteLine("UTF-8  Bytes: {0}",
                BitConverter.ToString(utf8String));
            output.WriteLine("ASCII  Bytes: {0}",
                BitConverter.ToString(asciiString));

            // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded  
            // string and write.
            output.WriteLine("UTF-8  Text : {0}",
                Encoding.UTF8.GetString(utf8String));
            output.WriteLine("ASCII  Text : {0}",
                Encoding.ASCII.GetString(asciiString));

            Console.WriteLine(Encoding.UTF8.GetString(utf8String));
            Console.WriteLine(Encoding.ASCII.GetString(asciiString));
        }
    }
}
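One thing to watch in the example above: the ASCII round trip is lossy, because Encoding.ASCII replaces any character it cannot represent (here U+03A0) with '?'. The sketch below shows one way to detect that up front by asking for an ASCII encoding that throws instead of substituting. FitsInAscii is a hypothetical helper name introduced for this illustration.

```csharp
using System;
using System.Text;

static class StrictAsciiSketch
{
    // Hypothetical helper: true if 'text' can be encoded as ASCII without losing characters.
    public static bool FitsInAscii(string text)
    {
        // Same ASCII encoding, but with fallbacks that throw rather than emit '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        string srcString = "Area = \u03A0r^2";

        // The default ASCII encoder silently substitutes '?' for U+03A0.
        string roundTripped = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(srcString));
        Console.WriteLine(roundTripped);           // Area = ?r^2

        Console.WriteLine(FitsInAscii(srcString)); // False
        Console.WriteLine(FitsInAscii("Area"));    // True
    }
}
```

UTF-8 and UTF-16 can represent every Unicode character, so the substitution only shows up when the target is a limited encoding such as ASCII.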

Here is an example that reads a UTF-8 file and writes it back out as UTF-16:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace utf16
{
    class Program
    {
        static void Main(string[] args)
        {
            using (StreamReader sr = new StreamReader(args[0], Encoding.UTF8))
            using (StreamWriter sw = new StreamWriter(args[1], false, Encoding.Unicode))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    sw.WriteLine(line);
                }
            }
        }
    }
}

If you really do need to change the encoding (see Pax's answer about UTF-8 being valid Unicode), then yes, you can do that quite easily. Check out the System.Text.Encoding class.

There is a nice page on MSDN about this, including a whole example:

    // Specify the code page to correctly interpret byte values
    Encoding encoding = Encoding.GetEncoding(737); //(DOS) Greek code page
    byte[] codePageValues = System.IO.File.ReadAllBytes(@"greek.txt");

    // Same content is now encoded as UTF-16
    string unicodeValues = encoding.GetString(codePageValues);
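The fragment above stops once the text is in a string. A minimal sketch of the remaining step, writing that string back out as a UTF-16 file, might look like the following. It assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages such as 737 are only available after registering CodePagesEncodingProvider; on .NET Framework that registration call is not needed. TranscodeSketch and ToUtf16 are hypothetical names, and the file paths come from the command line.

```csharp
using System;
using System.IO;
using System.Text;

static class TranscodeSketch
{
    // Hypothetical helper: decode 'inputPath' with the known source encoding
    // and rewrite the text to 'outputPath' as UTF-16 (little-endian, with BOM).
    public static void ToUtf16(string inputPath, string outputPath, Encoding source)
    {
        string text = File.ReadAllText(inputPath, source);
        File.WriteAllText(outputPath, text, Encoding.Unicode);
    }

    static void Main(string[] args)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 737
        // is unavailable until this provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // args[0] = source file in the DOS Greek code page, args[1] = UTF-16 output file.
        ToUtf16(args[0], args[1], Encoding.GetEncoding(737));
    }
}
```

Passing the source encoding as a parameter keeps the helper usable for any encoding you happen to know, for example Encoding.GetEncoding("iso-8859-1").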
