
Read txt files (in Unicode and UTF-8) by means of C#

I created two txt files (Windows Notepad) with the same content "thank you - спасибо" and saved them as UTF-8 and Unicode. In Notepad they look fine. Then I tried to read them using .NET:

...File.ReadAllText(utf8FileFullName, Encoding.UTF8);

and

...File.ReadAllText(unicodeFileFullName, Encoding.Unicode);

But in both cases I got "thank you - ???????". What's wrong?

Upd: code for UTF-8

static void Main(string[] args)
{
    var encoding = Encoding.UTF8;
    var file = new FileInfo(@"D:\encodes\enc.txt");
    Console.OutputEncoding = encoding;
    var content = File.ReadAllText(file.FullName, encoding);
    Console.WriteLine("encoding: " + encoding);
    Console.WriteLine("content: " + content);
    Console.ReadLine();
}

Result: thanks ÑпаÑибо

Edited, as UTF-8 should support the characters. It seems that you're outputting to a console or another location whose encoding hasn't been set. If so, you need to set the encoding. For the console you can do this:

string allText = File.ReadAllText(unicodeFileFullName, Encoding.UTF8);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(allText);

Use the encoding type Encoding.Default:

File.ReadAllText(unicodeFileFullName, Encoding.Default);

It will fix the ???? characters.

When outputting Unicode or UTF-8 encoded multi-byte characters to the console, you need to set the encoding and also make sure the console uses a font that supports those characters, so that the corresponding glyphs can be displayed. With your existing code, MessageBox.Show(content) or display on a Windows or Web Form would appear correctly.

Have a look at http://msdn.microsoft.com/en-us/library/system.console.aspx for an explanation of setting fonts within the console window.

"Support for Unicode characters requires the encoder to recognize a particular Unicode character, and also requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a non-raster or TrueType font such as Consolas or Lucida Console."

As a side note, you can use the FileStream class to read the first three bytes of the file and look for the byte order mark to automatically set the encoding when reading the file. For example, if byte[0] == 0xEF && byte[1] == 0xBB && byte[2] == 0xBF then you have a UTF-8 encoded file. Refer to http://en.wikipedia.org/wiki/Byte_order_mark for more information.
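
The byte order mark check described above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the class name BomSniffer and the method DetectEncoding are hypothetical, and returning null when no BOM is found is an assumption made for the sketch. The UTF-16 byte sequences (FF FE and FE FF) are taken from the Wikipedia article linked above.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (name and shape are assumptions for this sketch).
public static class BomSniffer
{
    // Returns the encoding suggested by the file's byte order mark,
    // or null when the file does not start with one of the BOMs checked here.
    public static Encoding DetectEncoding(string path)
    {
        var bom = new byte[3];
        int read = 0;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Stream.Read may return fewer bytes than requested, so loop
            // until three bytes are read or the end of the file is reached.
            while (read < bom.Length)
            {
                int n = stream.Read(bom, read, bom.Length - read);
                if (n == 0) break;
                read += n;
            }
        }

        // EF BB BF: UTF-8
        if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
            return Encoding.UTF8;

        // FF FE: UTF-16 little-endian ("Unicode" in Notepad and .NET).
        // Caveat: UTF-32 LE also starts with FF FE; this sketch ignores that case.
        if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
            return Encoding.Unicode;

        // FE FF: UTF-16 big-endian
        if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
            return Encoding.BigEndianUnicode;

        return null;
    }
}
```

A caller might then write something like File.ReadAllText(path, BomSniffer.DetectEncoding(path) ?? Encoding.UTF8), falling back to UTF-8 when no mark is present. Note that a StreamReader constructed with detectEncodingFromByteOrderMarks set to true performs a similar check on its own, so a hand-written check is mostly useful when you want to know the encoding before reading.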
