Multiline file read in Python
I am looking for a method in Python which can read multiple lines from a file (10 lines at a time). I have already looked into readlines(sizehint); I tried to pass the value 10, but it doesn't read only 10 lines. It actually reads till the end of the file (I have tried it on a small file). Each line is 11 bytes long, and each read should fetch me 10 lines. If fewer than 10 lines are left, return only those lines. My actual file contains more than 150K lines.

Any idea how I can achieve this?
You're looking for itertools.islice():
from itertools import islice

with open('data.txt') as f:
    lines = []
    while True:
        chunk = list(islice(f, 10))  # islice returns an iterator, so convert it to a list here
        if chunk:
            # do something with the current set of <= 10 lines here
            lines.append(chunk)  # e.g. store it
        else:
            break
print(lines)
This should do it:
def read10Lines(fp):
    answer = []
    for i in range(10):
        answer.append(fp.readline())
    return answer
Or, as a list comprehension:
ten_lines = [fp.readline() for _ in range(10)]
In both cases, fp = open('path/to/file').
Another solution, which gets rid of the somewhat clumsy infinite loop in favor of a more familiar for loop, relies on itertools.izip_longest and a small trick with iterators. The trick is that zip(*[iter(iterator)]*n) breaks iterator up into chunks of size n. Since a file is already a generator-like iterator (as opposed to being sequence-like), we can write:
from itertools import izip_longest  # itertools.zip_longest in Python 3

with open('data.txt') as f:
    for ten_lines in izip_longest(*[f]*10, fillvalue=None):
        if ten_lines[-1] is None:
            # drop the None padding added to the final, short chunk
            ten_lines = [line for line in ten_lines if line is not None]
        process(ten_lines)
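The grouping trick is easy to see on a plain list: the same iterator object appears n times in the argument list, so zip pulls n consecutive items into each tuple.

```python
it = iter(range(6))
chunks = list(zip(*[it] * 2))  # the single iterator is consumed 2 items at a time
print(chunks)  # → [(0, 1), (2, 3), (4, 5)]
```

Note that plain zip silently drops a final short chunk, which is why the answer above uses izip_longest with a fillvalue instead.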
from itertools import groupby, count

with open("data.txt") as f:
    groups = groupby(f, key=lambda x, c=count(): next(c) // 10)
    for k, v in groups:
        bunch_of_lines = list(v)
        print(bunch_of_lines)
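The groupby/count idiom above works on any iterable, not just file objects. A quick sketch on a plain list of numbers (the variable names here are illustrative): each element advances the shared counter, and integer division by 10 assigns consecutive elements to the same group key.

```python
from itertools import groupby, count

data = list(range(23))
groups = [list(v) for _, v in groupby(data, key=lambda x, c=count(): next(c) // 10)]
# yields groups of sizes 10, 10, and 3
```

Because groupby only batches consecutive items with equal keys, the monotonically increasing counter guarantees each chunk is emitted exactly once, in order.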