Invalid characters in File.ReadAllText

Question

I'm calling File.ReadAllText() in a program designed to format some files that I have.

Some of these files contain the ® (174) symbol. However, when the text is read, the returned string contains � (65533) symbols where the ® (174) should be.

What would cause this and how can I fix it?

Answer 1

This is likely due to a mismatch in the Encoding. Use the ReadAllText overload that lets you specify the proper Encoding to use when reading the file.

The default overload will assume UTF-8 unless it detects a byte order mark for another Unicode encoding such as UTF-32. Any other encoding will come through incorrectly.
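
As a rough illustration of that behaviour, the sketch below writes the same text to a temporary file in two encodings and reads it back with the one-parameter overload. The class name DefaultOverloadDemo, the RoundTrip helper, and the sample string are made up for this example; only File.WriteAllText, File.ReadAllText, and the Encoding classes come from .NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class DefaultOverloadDemo
{
    // Hypothetical helper: writes text to a temp file using writeEncoding,
    // then reads it back with the one-parameter File.ReadAllText overload.
    public static string RoundTrip(string text, Encoding writeEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, writeEncoding);
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string sample = "Acme\u00AE"; // "Acme" followed by the (R) sign

        // UTF-32 little-endian written with a byte order mark, which the reader can detect.
        string fromUtf32 = RoundTrip(sample, new UTF32Encoding(false, true));

        // ISO-8859-1 has no byte order mark; (R) is stored as the single byte 0xAE.
        string fromLatin1 = RoundTrip(sample, Encoding.GetEncoding(28591));

        Console.WriteLine(fromUtf32 == sample);               // True
        Console.WriteLine(fromLatin1.IndexOf('\uFFFD') >= 0); // True
    }
}
```

The ISO-8859-1 file reproduces the symptom from the question: the lone 0xAE byte is not valid UTF-8, so it comes back as character 65533.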

Answer 2

Most likely the file uses a different encoding than the default. If you know it, you can specify it using the File.ReadAllText Method (String, Encoding) overload.

Code sample:

string readText = File.ReadAllText(path, Encoding.Default);  // <-- change the encoding to whatever the encoding really is

If you DON'T know the encoding, see this previous SO question: How to use ReadAllText when file encoding unknown
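
One common heuristic when the encoding is unknown (not necessarily what the linked question recommends) is to try strict UTF-8 first and fall back to a legacy encoding only if the bytes are not valid UTF-8. The sketch below assumes that approach; EncodingFallback, Decode, and ReadAllTextWithFallback are hypothetical names, and the fallback encoding is still a guess the caller has to supply.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingFallback
{
    // Decodes bytes as strict UTF-8; if they are not valid UTF-8,
    // decodes them with the caller-supplied fallback encoding instead.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for invalid sequences.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    // Convenience wrapper for files. Note: this sketch does not strip a
    // leading byte order mark, so a UTF-8 BOM would appear as U+FEFF.
    public static string ReadAllTextWithFallback(string path, Encoding fallback)
    {
        return Decode(File.ReadAllBytes(path), fallback);
    }
}
```

For example, EncodingFallback.ReadAllTextWithFallback(path, Encoding.GetEncoding(28591)) would read valid UTF-8 files as UTF-8 and treat everything else as ISO-8859-1. This cannot tell one single-byte encoding from another, so it is only as good as the chosen fallback.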

Answer 3

You need to specify the encoding when you call File.ReadAllText, unless the file is actually in UTF-8, which it sounds like it's not. (Basically the one-parameter overload is equivalent to passing in UTF-8 as the second argument. It will also detect UTF-32 with an appropriate byte-order mark, I believe.)

The first thing is to work out which encoding it is in (e.g. ISO-8859-1, but you need to check this) and then pass that as a second argument.

For example:

Encoding isoLatin1 = Encoding.GetEncoding(28591);
string text = File.ReadAllText(path, isoLatin1);

It's always important that you know what encoding binary data is using before you try to read it as text. That's true for files, network streams, anything.

Answer 4

The character you are reading is the Replacement character (U+FFFD):

"Used to replace an incoming character whose value is unknown or unrepresentable in Unicode. Compare the use of U+001A as a control character to indicate the substitute function."

http://www.fileformat.info/info/unicode/char/fffd/index.htm

You are getting this because the actual encoding of the file does not match the encoding your program expects.

By default ReadAllText expects UTF-8. It is encountering a byte sequence that does not represent a valid UTF-8 character, so it replaces it with the Replacement character.
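
The substitution can be seen without any file at all. The sketch below assumes the file stores ® as the single byte 0xAE (as ISO-8859-1 does; the question does not confirm the actual encoding) and decodes that byte both ways. ReplacementCharDemo and FirstCharCode are illustrative names.

```csharp
using System;
using System.Text;

public static class ReplacementCharDemo
{
    // Hypothetical helper: decodes the bytes with the given encoding and
    // returns the numeric value of the first resulting character.
    public static int FirstCharCode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes)[0];
    }

    public static void Main()
    {
        byte[] latin1Bytes = { 0xAE }; // (R) as stored by ISO-8859-1

        // In UTF-8 the same character takes two bytes.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u00AE"))); // C2-AE

        // A lone 0xAE is a continuation byte with no lead byte, so the
        // UTF-8 decoder substitutes U+FFFD.
        Console.WriteLine(FirstCharCode(latin1Bytes, Encoding.UTF8));               // 65533

        // Decoding with the encoding that actually produced the byte recovers (R).
        Console.WriteLine(FirstCharCode(latin1Bytes, Encoding.GetEncoding(28591))); // 174
    }
}
```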

Answer details (4 answers)

Answer 1: score 14, accepted, 2013-03-18 15:47:16

Answer 2: score 13, 2013-03-18 15:48:25

Answer 3: score 10, 2013-03-18 15:47:41

Answer 4: score 0, 2013-03-18 15:48:45
