How do I read a text file that contains NUL characters?

Question

I have a file that contains NUL characters.

This file is generated by another program I have no control over, but I have to read it in order to get some crucial information.

Unfortunately, readChar() truncates the output with this warning:

 In readChar("output.txt", 1e+05): truncating string with embedded nuls

Is there a way around this problem?

Answer 1

By convention, a text file cannot contain non-printable characters (including NUL). If a file contains such characters, it isn't a text file; it's a binary file.

R strictly¹ adheres to this convention and completely disallows NUL characters in strings. You really need to read and treat the data as binary data. This means using readBin and the raw data type:

filename = 'output.txt'  # the file name from the question
n = file.size(filename)
buffer = readBin(filename, 'raw', n = n)
# Unfortunately the above has a race condition, so check that the size hasn't changed!
stopifnot(n == file.size(filename))
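
One way to sidestep the race condition mentioned in the comment above is to read from a binary connection until it is exhausted, instead of asking for the file size up front. The following is a minimal sketch and not part of the original answer; `read_all_bytes` and `chunk_size` are hypothetical names chosen for illustration.

```r
# Hypothetical helper (not in base R): read every byte of a file
# without relying on file.size().
read_all_bytes <- function(path, chunk_size = 65536L) {
  con <- file(path, open = "rb")   # binary read mode
  on.exit(close(con))              # always release the connection
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break # zero bytes returned: end of file
    chunks[[length(chunks) + 1L]] <- chunk
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Demo on a temporary file that contains embedded zero bytes.
demo_file <- tempfile()
writeBin(as.raw(c(0x61, 0x00, 0x62)), demo_file)
buffer <- read_all_bytes(demo_file)
unlink(demo_file)
```

The result is an ordinary raw vector, so it can be used in place of the buffer produced by the single readBin call.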

Now we can fix the buffer by removing embedded zero bytes. This assumes ASCII or UTF-8 encoding! In other encodings, such as UTF-16 or UTF-32, zero bytes can be part of ordinary characters and need to be interpreted rather than dropped!

buffer = buffer[buffer != 0L]
text = rawToChar(buffer)
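
Putting the steps together, here is a small end-to-end sketch that can be run without the original file. It assumes ASCII or UTF-8 content; `read_text_dropping_nuls` is a hypothetical helper name, and the temporary file merely stands in for the program's output.

```r
# Hypothetical helper (not in base R): read a file as raw bytes,
# drop the zero bytes, and convert what remains to a string.
read_text_dropping_nuls <- function(path) {
  n <- file.size(path)
  bytes <- readBin(path, what = "raw", n = n)
  bytes <- bytes[bytes != as.raw(0)]  # keep only non-NUL bytes
  rawToChar(bytes)
}

# Stand-in for the real file: the bytes "fo", a NUL, "o", and a newline.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x66, 0x6f, 0x00, 0x6f, 0x0a)), tmp)

text <- read_text_dropping_nuls(tmp)
print(text)
unlink(tmp)
```

If the file is large or still being written, the size check from the answer above still applies.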

¹ Maybe too strictly …

Answer 1
5 · accepted · 2022-12-14 08:41:48
