File stream processing in Python

Question

I've got a data file where each "row" is delimited by \n\n\n. My current solution is to isolate those rows by first slurping the whole file into memory and then splitting it:

for row in slurped_file.split('\n\n\n'):
    ...

Is there an awk-like approach I could take to parse the file as a stream in Python 2.7.9, splitting records on a given string value? Thanks.

Answer 1

There is no such thing in the standard library, but we can write a custom generator to iterate over such records:

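def chunk_iterator(iterable):
    """Yield records separated by two consecutive empty lines."""
    chunk = []        # lines collected for the current record
    empty_lines = 0   # number of consecutive empty lines seen so far
    for line in iterable:
        chunk.append(line)
        if line == '\n':
            empty_lines += 1
            if empty_lines == 2:
                # Drop the two empty lines; the record keeps the
                # newline that ends its last line.
                yield ''.join(chunk[:-2])
                empty_lines, chunk = 0, []
        else:
            empty_lines = 0

    # Whatever is left after the last separator is the final record.
    yield ''.join(chunk)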

Use it like this:

with open('filename') as f:
    for chunk in chunk_iterator(f):
        ...

This relies on the file object's per-line iteration, which is implemented in C in CPython, so it should be faster than a general record-separator solution.
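For comparison, a general record-separator reader might look roughly like the sketch below. It is not part of the original answer: the names iter_records and bufsize are hypothetical, and it assumes a text stream whose read(n) returns an empty string at end of file. It reads fixed-size blocks, splits the buffer on the separator, and holds back the last piece because it may be incomplete or end with part of a separator.

```python
import io


def iter_records(stream, separator='\n\n\n', bufsize=4096):
    """Yield records from a text stream, split on an arbitrary separator.

    Hypothetical helper for illustration; it gives the same pieces as
    stream.read().split(separator) without loading the whole file.
    """
    buf = ''
    while True:
        block = stream.read(bufsize)
        if not block:
            break
        buf += block
        parts = buf.split(separator)
        # The last piece may be incomplete, or may end with the start of
        # a separator, so keep it buffered until more data arrives.
        buf = parts.pop()
        for record in parts:
            yield record
    # Whatever remains after the last separator is the final record.
    yield buf


# Small in-memory demo; a tiny bufsize forces separators to straddle blocks.
demo = io.StringIO(u'first\nrecord\n\n\nsecond record\n\n\nthird')
for record in iter_records(demo, bufsize=8):
    print(repr(record))
```

Unlike chunk_iterator, this version accepts any separator string, at the cost of doing the buffering and splitting in Python code on every block.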

Answer 1 (accepted, 2015-02-19 18:11:38)
