How to read a text file containing NUL characters?
I have a file that contains NUL characters.
This file is generated by another program I have no control over, but I have to read it in order to get some crucial information.
Unfortunately, readChar() truncates the output with this warning:
In readChar("output.txt", 1e+05): truncating string with embedded nuls
Is there a way around this problem?
By convention, a text file cannot contain non-printable characters (including NUL). If a file contains such characters, it isn't a text file; it's a binary file.
R adheres to this convention strictly [1] and completely disallows NUL characters in character strings. You really need to read and treat the data as binary data. This means using readBin and the raw data type:
n = file.size(filename)
buffer = readBin(filename, 'raw', n = n)
# Unfortunately the above has a race condition, so check that the size hasn’t changed!
stopifnot(n == file.size(filename))
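The race condition mentioned in the comment above could also be avoided by not asking for the file size at all. The following is only a sketch of that idea: it opens a binary connection and keeps calling readBin until no more bytes come back. The function name read_all_raw and the chunk_size default are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: read an entire file as raw bytes without
# querying its size first. Name and chunk size are illustrative.
read_all_raw = function(path, chunk_size = 65536L) {
  con = file(path, open = 'rb')
  on.exit(close(con))

  chunks = list()
  repeat {
    chunk = readBin(con, what = 'raw', n = chunk_size)
    # readBin returns a zero-length vector once the end of file is reached.
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] = chunk
  }

  # do.call(c, list()) would return NULL, so handle empty files explicitly.
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}
```

The result is a raw vector just like the buffer above, so the remaining steps apply unchanged.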
Now we can fix the buffer by removing embedded zero bytes. This assumes UTF-8 or ASCII encoding! Other encodings (UTF-16 and UTF-32, for example) use zero bytes as part of ordinary characters, and those bytes need to be interpreted rather than removed!
buffer = buffer[buffer != 0L]
text = rawToChar(buffer)
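Put together, the two steps might look like the sketch below, which can be tried on a small temporary file. The helper name read_text_dropping_nul and the demo bytes are assumptions made for illustration; the approach is the same as above and still assumes ASCII or UTF-8 content.

```r
# Hypothetical helper: read a file as raw bytes, drop NUL bytes,
# and return the rest as a single string (assumes ASCII or UTF-8).
read_text_dropping_nul = function(path) {
  n = file.size(path)
  buffer = readBin(path, what = 'raw', n = n)
  rawToChar(buffer[buffer != as.raw(0L)])
}

# Demo file containing "ab", a NUL byte, "cd", and a newline.
demo_path = tempfile()
writeBin(as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64, 0x0a)), demo_path)

text = read_text_dropping_nul(demo_path)

# Split into lines, as one would after reading an ordinary text file.
lines = strsplit(text, '\n', fixed = TRUE)[[1]]
```

If the NUL bytes act as field separators in the file, replacing them with another byte (a space or a tab, say) instead of dropping them may preserve more of the structure.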
[1] Maybe too strictly …