File stream processing in Python
I've got a data file where each "row" is delimited by \n\n\n. My solution is to isolate those rows by first slurping the file and then splitting it into rows:
for row in slurped_file.split('\n\n\n'):
    ...
Is there an "awk-like" approach I could take to parse the file as a stream within Python 2.7.9, splitting lines according to a given string value? Thanks.
There is no such thing in the standard library, but we can write a custom generator to iterate over such records:
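def chunk_iterator(iterable):
    chunk = []
    empty_lines = 0
    for line in iterable:
        chunk.append(line)
        if line == '\n':
            empty_lines += 1
            # Two consecutive empty lines complete the '\n\n\n' delimiter.
            if empty_lines == 2:
                # Drop the two empty lines; the record keeps its own final '\n'.
                yield ''.join(chunk[:-2])
                empty_lines, chunk = 0, []
        else:
            empty_lines = 0
    # Emit whatever follows the last delimiter (may be an empty string).
    yield ''.join(chunk)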
Use as:
with open('filename') as f:
    for chunk in chunk_iterator(f):
        ...
This relies on the file object's per-line iteration, which is implemented in C in CPython, so it should be faster than a general record-separator solution written in pure Python.
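For comparison, here is a minimal sketch of what a general record-separator solution might look like. It is not part of the answer above: the name `iter_records` and the `separator` and `block_size` parameters are assumptions made for illustration. It reads the stream in fixed-size blocks and splits on an arbitrary separator string, so the whole file is never held in memory at once.

```python
def iter_records(stream, separator='\n\n\n', block_size=8192):
    """Yield records from a text stream, split on an arbitrary separator.

    Hypothetical helper: reads block by block rather than slurping the file.
    """
    buffer = ''
    while True:
        block = stream.read(block_size)
        if not block:
            break  # end of file
        buffer += block
        # Every piece before the last one is a complete record. The last
        # piece may be a partial record (or end in a partial separator),
        # so it stays in the buffer until more data arrives.
        parts = buffer.split(separator)
        buffer = parts.pop()
        for record in parts:
            yield record
    # Whatever is left after EOF is the final record; unlike str.split,
    # an empty trailing record is skipped.
    if buffer:
        yield buffer


# Usage, mirroring the loop from the question:
# with open('filename') as f:
#     for row in iter_records(f):
#         ...
```

Unlike chunk_iterator, this version does not leave a trailing newline on each record, and it accepts any separator, at the cost of doing the splitting in Python-level code.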