
C# Reading files and encoding issue

I've searched everywhere for this answer, so hopefully it's not a duplicate. I've finally decided to just ask it here.

I have a file named Program1.exe. When I drag that file into Notepad or Notepad++, I get all kinds of random symbols and then some readable text. However, when I try to read this file in C#, I either get inaccurate results or just a big "MZ". I've tried all the encodings C# supports. How can Notepad programs read a file like this when I simply can't? I've tried converting the bytes to a string, reading the file line by line, and even reading it as binary, and none of it works.

Thanks for the help! :)

Reading a binary file as text is a peculiar thing to do, but it is possible. Any of the 8-bit (single-byte) encodings will do it just fine, since they decode each byte to a single character. For example, the code below opens and reads an executable and writes it to the console.

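using System;
using System.IO;
using System.Text;

// On .NET Core and .NET 5+, code-page encodings such as windows-1252 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
const string fname = @"C:\mystuff\program.exe";
using (var reader = new StreamReader(fname, Encoding.GetEncoding("windows-1252")))
{
    var s = reader.ReadToEnd();
    s = s.Replace('\0', ' '); // replace NUL bytes with spaces
    Console.WriteLine(s);
}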

The result is very similar to what you'll see in Notepad or Notepad++. The "funny symbols" will differ depending on how your console is configured, but you get the idea.
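Why an 8-bit encoding works where others give "inaccurate results": a single-byte encoding such as Latin-1 decodes every byte to exactly one character, so nothing is dropped or merged. UTF-8, by contrast, replaces byte sequences that aren't valid UTF-8 with U+FFFD, which is one plausible source of the garbled output in the question. The sketch below checks this by round-tripping all 256 byte values. It uses ISO-8859-1 instead of windows-1252 only because ISO-8859-1 is available on .NET Core without registering an encoding provider; `RoundTrips` is a hypothetical helper, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

// Every possible byte value 0x00..0xFF, standing in for arbitrary binary data.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

// ISO-8859-1 (Latin-1) maps each byte to exactly one char (U+0000..U+00FF) and back.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// Hypothetical helper: decode to a string, encode again, compare with the original bytes.
bool RoundTrips(Encoding enc, byte[] data) =>
    enc.GetBytes(enc.GetString(data)).SequenceEqual(data);

Console.WriteLine($"Latin-1 round-trips: {RoundTrips(latin1, allBytes)}");      // True
Console.WriteLine($"UTF-8 round-trips:   {RoundTrips(Encoding.UTF8, allBytes)}"); // False
```

UTF-8 fails because the bytes 0x80 to 0xFF are not valid UTF-8 on their own, so each one comes back as the three-byte encoding of U+FFFD.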

By the way, if you examine the string in the debugger, you're going to see something quite different: the debugger displays those funny symbols as C# character escapes. For example, NUL bytes (value 0) display as \0 in the debugger, as NUL in Notepad++, and as spaces on the console or in Notepad. Carriage returns and line feeds show up as \r and \n in the debugger, and so on.

As I said, reading a binary file as text is pretty peculiar. Unless you're just looking to see if there's human-readable data in the file, I can't imagine why you'd want to do this.
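If human-readable data is what you're after, you don't really need to decode the whole file: you can scan the raw bytes for runs of printable ASCII, roughly what the Unix `strings` utility does. Below is a minimal sketch; the helper `FindPrintableRuns`, the 4-character minimum, and the ASCII-only (0x20 to 0x7E) definition of "printable" are my own assumptions, and it will miss UTF-16 text, which executables often contain.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: returns runs of at least minLength printable ASCII characters.
List<string> FindPrintableRuns(byte[] data, int minLength = 4)
{
    var result = new List<string>();
    var current = new StringBuilder();
    foreach (byte b in data)
    {
        if (b >= 0x20 && b <= 0x7E)
        {
            current.Append((char)b); // printable: extend the current run
            continue;
        }
        // Non-printable byte ends the run; keep it only if it is long enough.
        if (current.Length >= minLength) result.Add(current.ToString());
        current.Clear();
    }
    // Flush a run that reaches the end of the data.
    if (current.Length >= minLength) result.Add(current.ToString());
    return result;
}

// Sample bytes: "MZ" is too short to count, "Hello" is long enough.
byte[] sample = { 0x4D, 0x5A, 0x90, 0x00, (byte)'H', (byte)'e', (byte)'l', (byte)'l', (byte)'o', 0x00, 0xFF };
foreach (var run in FindPrintableRuns(sample))
    Console.WriteLine(run); // Hello
```

For a real executable, pass `File.ReadAllBytes(path)` (from `System.IO`) instead of the sample array.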

Update

I suspect the reason all you see in the Windows Forms TextBox is "MZ" is that the Windows textbox control (which is what the TextBox ultimately uses) treats the NUL character as a string terminator, so it won't display anything after the first NUL. And within a byte or two after the "MZ" there is a NUL (shown as \0 in the debugger). You'll have to replace the NULs in the string with spaces. I edited the code example above to show how to do that.
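To see the cut-off without building a form, the sketch below simulates a NUL-terminated control with a hypothetical `TruncateAtNul` helper. It is not how TextBox is implemented, only the same end-of-string behavior. The eight header bytes are assumed for illustration; many executables start this way, but not every file does.

```csharp
using System;
using System.Text;

// Assumed header bytes: "MZ", 0x90, then a NUL and more data.
byte[] header = { 0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00 };
string text = Encoding.GetEncoding("iso-8859-1").GetString(header);

// Hypothetical simulation of a control that treats NUL as end-of-string.
string TruncateAtNul(string s)
{
    int nul = s.IndexOf('\0');
    return nul >= 0 ? s.Substring(0, nul) : s;
}

Console.WriteLine(TruncateAtNul(text).Length);                    // 3: "MZ" plus the invisible U+0090
Console.WriteLine(TruncateAtNul(text.Replace('\0', ' ')).Length); // 8: nothing is cut off
```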

An .exe is a binary file, and if you try to read it as a text file you'll get the effect you're describing. Try using something like a FileStream instead, which doesn't care about the structure of the file and simply treats it as a series of bytes.
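A minimal sketch of the FileStream approach, assuming you only want to inspect the first few bytes: `ReadHeadAsHex` is a hypothetical helper that reads up to `count` bytes and formats them with `BitConverter.ToString`, so nothing is ever decoded as text and a NUL byte can't truncate the output. The temporary file stands in for Program1.exe.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads up to `count` bytes from the start of a file and returns them as hex.
string ReadHeadAsHex(string path, int count)
{
    var buffer = new byte[count];
    int total = 0;
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        // Read may return fewer bytes than requested, so loop until full or end of file.
        int n;
        while (total < count && (n = fs.Read(buffer, total, count - total)) > 0)
            total += n;
    }
    return BitConverter.ToString(buffer, 0, total); // e.g. "4D-5A-90-00"
}

// Usage with a throwaway file standing in for an executable.
string tmp = Path.GetTempFileName();
File.WriteAllBytes(tmp, new byte[] { 0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00 });
Console.WriteLine(ReadHeadAsHex(tmp, 4)); // 4D-5A-90-00
File.Delete(tmp);
```

On a real executable the first two bytes come out as 4D-5A, which is the "MZ" from the question.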
