
File stream processing in Python

I've got a data file where each "row" is delimited by \n\n\n. My solution is to isolate those rows by first slurping the whole file and then splitting it into rows:

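    with open('filename') as f:
        slurped_file = f.read()  # read the entire file into memory
    for row in slurped_file.split('\n\n\n'):
        ...  # process one row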

Is there an "awk-like" approach I could take to parse the file as a stream in Python 2.7.9, splitting it into rows on a given separator string? Thanks.

There is no such thing in the standard library: Python's file objects only split on line endings, not on an arbitrary separator string. But we can write a custom generator to iterate over such records:

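    def chunk_iterator(iterable):
        # Records are separated by "\n\n\n": the newline that ends the
        # record's last line, followed by two blank lines.
        chunk = []
        empty_lines = 0
        for line in iterable:
            chunk.append(line)
            if line == '\n':
                empty_lines += 1
                if empty_lines == 2:
                    # Drop the two blank separator lines; the record keeps
                    # its own trailing newline.
                    yield ''.join(chunk[:-2])
                    empty_lines, chunk = 0, []
            else:
                empty_lines = 0

        # The last record is not followed by a separator.
        yield ''.join(chunk)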

Use it like this:

with open('filename') as f:
    for chunk in chunk_iterator(f):
        ...

This relies on CPython's per-line file iteration, which is implemented in C, so it should be faster than a general record-separator solution written in pure Python. It also holds only one record in memory at a time instead of the whole file.
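When the separator is not made of whole lines, a more general reader is needed, similar in spirit to setting awk's RS variable. The following is a minimal sketch of one way to do it, not a standard-library API: it reads the file in fixed-size blocks and splits on an arbitrary separator string. The names iter_records, sep and block_size are hypothetical, chosen for this example. It should behave like f.read().split(sep) without loading the whole file, and it runs on both Python 2.7 and 3.

    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3


    def iter_records(f, sep='\n\n\n', block_size=64 * 1024):
        # Hypothetical helper: yield the pieces of f's contents split on sep,
        # reading at most block_size characters at a time.
        buf = ''
        while True:
            block = f.read(block_size)
            if not block:
                break
            buf += block
            # Every piece except the last is followed by a complete sep.
            # The last piece may be cut off mid-record (or mid-separator),
            # so keep it in the buffer until more data arrives.
            parts = buf.split(sep)
            buf = parts.pop()
            for part in parts:
                yield part
        # Whatever follows the last separator is the final record
        # (an empty string if the data ends with sep, just like str.split).
        yield buf


    if __name__ == '__main__':
        data = 'first\nrecord\n\n\nsecond record\n\n\nthird\n'
        # A tiny block_size forces separators to straddle block boundaries.
        for record in iter_records(StringIO(data), block_size=4):
            print(repr(record))

Because the leftover buffer is re-split each time a block arrives, a single very long record that spans many blocks is rescanned repeatedly. For the \n\n\n case, the line-based chunk_iterator avoids that cost.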
