
Python file.read() method

I'm reading a text file, line by line, using Python. Each line is of a variable length. The first line could be 10 characters, the next one could be 100; there's no way of telling. Presently, I issue a file.readline() call for each line, process it, and then save it to a database. This method guarantees me one full line of input. However, I'd like to do this faster. Is there a way to do a bulk read using the Python file.read() method such that I can guarantee an end-of-line character when the buffer stops in the middle of a line? What's the best way to handle this?

The typical way to handle this is just:

for line in fileobj:
    process(line)

There's no way to actually specify that read ends at the end of a line. You could do something kludgy with fileobj.seek: basically, you'd read N bytes, find the last newline in the returned string (using str.rfind), and then seek backward past the partial line that follows it.
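
For example, a minimal sketch of that seek-back idea might look like the following. It assumes the file is opened in binary mode (relative seeks aren't allowed on text-mode files in Python 3), and read_whole_lines and CHUNK_SIZE are just placeholder names:

import os

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size

def read_whole_lines(path, chunk_size=CHUNK_SIZE):
    with open(path, 'rb') as fileobj:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            last_newline = chunk.rfind(b'\n')
            if last_newline == -1 or last_newline == len(chunk) - 1:
                # Either no newline in this chunk or it already ends on one;
                # yield it as-is.
                yield chunk
                continue
            # Seek backward past the partial line so the next read
            # starts on a line boundary.
            fileobj.seek(last_newline + 1 - len(chunk), os.SEEK_CUR)
            yield chunk[:last_newline + 1]

# Each yielded block ends on a newline, so it splits into whole lines:
# for block in read_whole_lines('data.txt'):
#     for line in block.splitlines():
#         process(line)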


Of course, if you have sufficient memory, you can read the entire file in at once:

list_of_lines = fileobj.readlines()

However, I'm really not positive that you'll see any noticeable speedup here. Are you sure you're not optimizing before you need to?

You can use either of the following:

lines = file_handle.read().split('\n')
# Or 
lines = file_handle.readlines()

Check the documentation for the exact behavior with respect to '\n': read().split('\n') strips the newlines and can leave a trailing empty string, while readlines() keeps the trailing '\n' on each line.
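
For instance, a quick way to see the difference (demo.txt is just a throwaway example file):

with open('demo.txt', 'w') as f:
    f.write('a\nb\n')

with open('demo.txt') as f:
    print(f.read().split('\n'))   # ['a', 'b', ''] - newlines stripped, trailing empty string
with open('demo.txt') as f:
    print(f.readlines())          # ['a\n', 'b\n'] - newlines kept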

The way the encoders I've messed with have done this is to read whatever's there (or a particular chunk size), note the position of the last newline (using .rfind('\n')), process the data up to that newline, and store everything from the newline to the end of the chunk in a buffer. When reading the next block, you continue from where you stopped and prepend the leftover string from before onto it. The performance was reasonable, and it's stable. Of course, this was for network sockets, where you can't seek backwards, so I'm not sure which method would actually perform better on files.
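
A rough sketch of that leftover-buffer technique might look like this (binary mode is assumed, and read_in_chunks, CHUNK_SIZE and process are placeholder names):

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size

def process(line):
    pass  # stand-in for the real per-line work

def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    leftover = b''
    with open(path, 'rb') as fileobj:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            data = leftover + chunk
            last_newline = data.rfind(b'\n')
            if last_newline == -1:
                # No complete line yet; keep accumulating.
                leftover = data
                continue
            for line in data[:last_newline].split(b'\n'):
                process(line)
            # Carry the partial line after the last newline into the next pass.
            leftover = data[last_newline + 1:]
    if leftover:
        process(leftover)  # final line with no trailing newline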
