
python readlines() does not contain whole file

I have an auto-generated info file coming from a measurement. It consists of both binary and human-readable parts. I want to extract some of the non-binary metadata. For some files, I am not able to get to the metadata, because readlines() does not yield the whole file. I guess that the file contains some EOF char. I can open the file in Notepad++ without problems.

A possible solution to this problem would be to read the file in as binary and convert it to chars afterwards, deleting the EOF char while doing so. Still, I wonder whether there is a more elegant way to do this?
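For reference, the workaround described above could look roughly like the sketch below. It assumes the offending byte is Ctrl-Z (0x1A), which is only a guess, and it decodes with latin-1 simply because that codec maps every byte to a character and therefore never fails. The helper name and the demo file contents are made up for illustration.

```python
import os
import tempfile

CTRL_Z = b"\x1a"  # assumption: this is the byte being treated as end-of-file


def read_text_without_eof_marker(path, encoding="latin-1"):
    """Read the whole file as bytes, drop Ctrl-Z bytes, and decode to str."""
    with open(path, "rb") as f:
        data = f.read()
    return data.replace(CTRL_Z, b"").decode(encoding)


# Demo with a made-up file that mixes text, a Ctrl-Z byte, and binary data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header\r\n\x1a\x00\x01binary\r\nkey=value\r\n")
    demo_path = tmp.name

text = read_text_without_eof_marker(demo_path)
os.remove(demo_path)

lines = text.splitlines()  # the metadata line after the Ctrl-Z is still there
```

Note that blindly deleting 0x1A also removes it from the binary part, so this is only reasonable if the binary section is going to be ignored anyway.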

Edit: The question was rightfully downvoted; I should have provided code. I actually use

f = open(fname, 'r')
raw = f.readlines()

and then proceed with walking through the list. The EOF chars that exist in the file (depending on the OS) seem to cause the havoc I am observing. I will accept the answer that suggests using the binary 'rb' flag. By the way, this was an impressive response time! (-:

with open(afile, "rb") as f:
    print(f.readlines())

What's the problem with doing this?

If you don't open the file in binary mode, some non-ASCII characters are incorrectly interpreted and/or discarded, which may inadvertently also remove some ASCII text if it is mixed in with binary data.
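As a rough illustration of pulling the readable lines out after a binary read, here is a minimal sketch. It assumes the metadata lines are pure ASCII and that any line containing non-ASCII bytes can be skipped as binary; the function name and the sample bytes are invented for the demo.

```python
import os
import tempfile


def readable_lines(path):
    """Return the lines of a mixed binary/text file that decode as ASCII."""
    result = []
    with open(path, "rb") as f:
        for raw in f.readlines():  # in binary mode, lines are split on b"\n" only
            try:
                result.append(raw.decode("ascii").rstrip("\r\n"))
            except UnicodeDecodeError:
                pass  # the line contains non-ASCII bytes; treat it as binary
    return result


# Demo with a made-up file: two text lines around one binary line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"name=probe\r\n\xff\xfe\x1a\x80\r\nunits=mV\r\n")
    demo_path = tmp.name

found = readable_lines(demo_path)
os.remove(demo_path)
```

A binary section that happens to contain only ASCII bytes would slip through this filter, so a real parser would probably also check for the expected key/value format.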

You can use the read() function of the file object. It reads the whole file.

with open('input.bin', 'r') as f:
    content = f.read()

Then you can parse the content. If you know where the part you need starts, you can seek to it (e.g. if the file has a fixed-length binary start):

with open('input.bin', 'r') as f:
    f.seek(CONTENT_START)
    content = f.read()
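A self-contained version of the seek approach might look like this. The 8-byte header length and the file contents are purely hypothetical, and the file is opened in 'rb' so that the offset counts raw bytes.

```python
import os
import tempfile

CONTENT_START = 8  # hypothetical: length of the fixed binary header in bytes

# Build a made-up demo file: 8 binary bytes followed by readable metadata.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03\x1a\xff\xfe\xfd" + b"operator=alice\nrun=7\n")
    path = tmp.name

with open(path, "rb") as f:
    f.seek(CONTENT_START)  # skip the binary header
    content = f.read()     # everything after the header, as bytes
os.remove(path)

meta = content.decode("ascii").splitlines()
```

This only works when the text part really does start at a known offset; otherwise the content has to be searched for a marker instead.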

On Windows, you should change the reading mode to 'rb' to indicate that you want to read the file in binary mode. Note that line endings in the text part are then not translated and may consist of '\r\n', depending on how the file was created in the first place.
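To show what that means for the line endings, here is a small sketch with made-up sample bytes. Splitting on b"\n" alone leaves a trailing carriage return on each line, while bytes.splitlines() recognises '\r\n' as a single line break.

```python
# Made-up sample of what a binary read might return for a Windows-style text part.
data = b"a=1\r\nb=2\r\n"

naive = data.split(b"\n")  # keeps the b"\r" and yields a trailing empty item
clean = data.splitlines()  # treats b"\r\n" as one line break

pairs = [line.decode("ascii").split("=", 1) for line in clean]
```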
