Multiline file read in Python

I am looking for a way in Python to read multiple lines from a file (10 lines at a time). I have already looked into readlines(sizehint) and tried passing 10, but it does not read only 10 lines; it actually reads until the end of the file (I tried this on a small file). Each line is 11 bytes long, and each read should fetch 10 lines. If fewer than 10 lines are left, it should return only those lines. My actual file contains more than 150K lines.

Any idea how I can achieve this?

You're looking for itertools.islice():

from itertools import islice

with open('data.txt') as f:
    lines = []
    while True:
        chunk = list(islice(f, 10))  # islice returns an iterator, so convert it to a list
        if not chunk:                # an empty list means end of file
            break
        # do something with the current batch of <= 10 lines here
        lines.append(chunk)          # e.g. store it
    print(lines)
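
The same loop can be wrapped in a generator so that callers simply iterate over batches. This is a minimal sketch and not part of the original answer: the name `read_in_batches` is hypothetical, and `io.StringIO` stands in for a real file object.

```python
from io import StringIO
from itertools import islice


def read_in_batches(f, n=10):
    """Yield successive lists of up to n lines from the file-like object f."""
    while True:
        batch = list(islice(f, n))  # consumes at most n lines from f
        if not batch:               # an empty list means end of file
            return
        yield batch


# Stand-in for a real file: 25 lines, so the last batch is short.
sample = StringIO("".join("line %02d\n" % i for i in range(25)))
batch_sizes = [len(batch) for batch in read_in_batches(sample)]
print(batch_sizes)  # [10, 10, 5]
```

With a real file, `for batch in read_in_batches(open('data.txt')):` would behave the same way, reading lazily, so a file with 150K lines is never loaded in full.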

This should do it:

def read10Lines(fp):
    answer = []
    for i in range(10):
        line = fp.readline()
        if not line:  # readline() returns '' at end of file
            break
        answer.append(line)
    return answer

Or, as a list comprehension (the filter drops the empty strings that readline() returns at end of file):

ten_lines = [line for line in (fp.readline() for _ in range(10)) if line]

In both cases, fp = open('path/to/file')

Another solution, which gets rid of the silly infinite loop in favor of a more familiar for loop, relies on itertools.izip_longest (zip_longest in Python 3) and a small trick with iterators. The trick is that zip(*[iter(iterator)]*n) breaks iterator up into chunks of size n: the list holds n references to the same iterator, so each output tuple consumes n consecutive items. Since a file is already a generator-like iterator (as opposed to being sequence-like), we can write:

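from itertools import zip_longest  # named izip_longest in Python 2

with open('data.txt') as f:
    for ten_lines in zip_longest(*[f]*10, fillvalue=None):
        if ten_lines[-1] is None:
            # drop the None padding at the end of the last, short chunk
            ten_lines = [line for line in ten_lines if line is not None]
        process(ten_lines)  # process() is a placeholder for your own handling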
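
To see why the repeated-iterator trick chunks its input, it can help to run it on a short in-memory sequence. The sketch below is only an illustration with made-up data (the names `letters`, `chunks` and `trimmed` are hypothetical), using the Python 3 name zip_longest:

```python
from itertools import zip_longest  # izip_longest in Python 2

letters = iter("abcdefg")

# [letters] * 3 is a list holding the SAME iterator three times, so each
# output tuple is built from three consecutive next() calls on it.
chunks = list(zip_longest(*[letters] * 3, fillvalue=None))
print(chunks)
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', None, None)]

# Drop the padding from the final, short chunk.
trimmed = [tuple(x for x in chunk if x is not None) for chunk in chunks]
print(trimmed[-1])  # ('g',)
```

Padding with None is safe for files because a line read from a file is never None; even a blank line is the string '\n'.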
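
Another option is itertools.groupby with a running counter as the key:

from itertools import groupby, count

with open("data.txt") as f:
    # the shared counter numbers the lines; integer division by 10 gives each batch its key
    groups = groupby(f, key=lambda x, c=count(): next(c) // 10)
    for k, v in groups:
        bunch_of_lines = list(v)
        print(bunch_of_lines)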
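
The groupby key may be easier to follow with the counter pulled out of the lambda. The following is a rough sketch on in-memory data, with a batch size of 3 to keep the output small; `batch_index`, `counter` and `sample` are hypothetical names, and `io.StringIO` stands in for a real file.

```python
from io import StringIO
from itertools import count, groupby

sample = StringIO("".join("row %d\n" % i for i in range(7)))

counter = count()  # yields 0, 1, 2, ... one value per line seen


def batch_index(_line):
    # Lines 0-2 map to 0, lines 3-5 map to 1, line 6 maps to 2.
    return next(counter) // 3


# groupby starts a new group each time the key changes.
batches = [list(group) for _, group in groupby(sample, key=batch_index)]
print([len(b) for b in batches])  # [3, 3, 1]
```

Because the key depends only on how many lines have been seen so far, the final group simply ends up shorter when the line count is not a multiple of the batch size.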
