
How to read a text file containing NUL characters?

I have a file that contains NUL characters.

This file is generated by another program I have no control over, but I have to read it in order to get some crucial information.

Unfortunately, readChar() truncates the output with this warning:

 In readChar("output.txt", 1e+05): truncating string with embedded nuls

Is there a way around this problem?

By convention, a text file cannot contain non-printable characters (including NUL). If a file contains such characters, it isn't a text file; it's a binary file.

R strictly¹ adheres to this convention, and completely disallows NUL characters in strings. You really need to read and treat the data as binary data. This means using readBin and the raw data type:

n = file.size(filename)
buffer = readBin(filename, 'raw', n = n)
# Unfortunately the above has a race condition, so check that the size hasn’t changed!
stopifnot(n == file.size(filename))

Now we can fix the buffer by removing embedded zero bytes. This assumes an ASCII or UTF-8 encoding, where a zero byte can only ever be a NUL character. In other encodings (UTF-16 and UTF-32, for example) zero bytes are part of ordinary characters and need to be interpreted rather than dropped!

buffer = buffer[buffer != 0L]
text = rawToChar(buffer)
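
Putting the two steps together, the following is a minimal sketch rather than a definitive implementation. The helper name read_text_without_nuls and the demo file are made up for illustration, and the code assumes the file is ASCII or UTF-8 encoded, as discussed above.

```r
# Hypothetical helper (name is illustrative): read a file as raw bytes,
# drop all zero bytes and return the rest as a single string.
# Assumption: the file is ASCII or UTF-8 encoded.
read_text_without_nuls = function (filename) {
  n = file.size(filename)
  buffer = readBin(filename, 'raw', n = n)
  # Guard against the file changing size between the two calls.
  stopifnot(n == file.size(filename))
  text = rawToChar(buffer[buffer != 0L])
  # Declare the assumed encoding so that non-ASCII characters display correctly.
  Encoding(text) = 'UTF-8'
  text
}

# Demo: a temporary file holding the bytes "fo<NUL>o\n<NUL>bar".
demo_file = tempfile(fileext = '.txt')
writeBin(as.raw(c(0x66, 0x6f, 0x00, 0x6f, 0x0a, 0x00, 0x62, 0x61, 0x72)), demo_file)

text = read_text_without_nuls(demo_file)
# Split into lines, as readLines would for a regular text file.
lines = strsplit(text, '\n', fixed = TRUE)[[1L]]
print(lines)

unlink(demo_file)
```

Returning a single string mirrors what readChar() would have produced; splitting on the newline afterwards is optional and only there to show how to get back to per-line processing.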

¹ Maybe too strictly …
