
Python: How to read from stdin by byte chunks until EOF?

I want to read from standard input chunk by chunk until EOF. For example, I could have a very large file, and I want to read in and process 1024 bytes at a time from STDIN until EOF is encountered. I've seen sys.stdin.read(), which reads everything into memory at once. This isn't feasible because there might not be enough memory to hold the entire file. There is also "for line in sys.stdin", but that splits the input on newlines only, which is not what I'm looking for. Is there any way to accomplish this in Python?

The read() method of a file object accepts an optional size parameter.

If you specify size, at most size bytes are read and returned (in text mode, size counts characters rather than bytes). Once the end of the file has been reached, f.read() returns an empty string ('') in text mode, or an empty bytes object (b'') in binary mode.
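A quick way to see the difference between binary and text mode is to call read() on in-memory streams. In this sketch io.BytesIO and io.TextIOWrapper stand in for sys.stdin.buffer and sys.stdin, and 'héllo' is an arbitrary sample chosen because 'é' takes two bytes in UTF-8:

```python
import io

data = "héllo".encode("utf-8")  # 'é' encodes to two bytes, so 6 bytes in total

# Binary stream (like sys.stdin.buffer): read(n) returns at most n *bytes*.
raw = io.BytesIO(data)
binary_chunks = [raw.read(2), raw.read(10), raw.read(10)]
print(binary_chunks)  # [b'h\xc3', b'\xa9llo', b'']

# Text stream (like sys.stdin): read(n) returns at most n *characters*.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
text_chunks = [text.read(2), text.read(10), text.read(10)]
print(text_chunks)  # ['hé', 'llo', '']
```

Note that the binary read(2) splits 'é' in the middle, and that the second read returns fewer bytes than requested because only four were left.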

See the io docs and open() docs.

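Code (the file is opened in binary mode, so read() counts bytes):

def process_data(buffer):
    print(len(buffer))  # placeholder: replace with real processing

with open('file', 'rb') as f:  # 'rb': read() returns bytes objects
    while True:
        buffer = f.read(1024)  # Returns *at most* 1024 bytes, maybe less
        if buffer == b'':      # an empty bytes object means EOF
            break
        process_data(buffer)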
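The same loop can also be written with the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value. A minimal sketch, assuming a binary stream (so the sentinel is b''); read_chunks is an illustrative name, not a standard-library function, and io.BytesIO stands in for a real file:

```python
import functools
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) keeps calling stream.read(chunk_size) and
    # stops as soon as it returns b'' (EOF on a binary stream).
    yield from iter(functools.partial(stream.read, chunk_size), b"")


# In-memory stand-in for a 2500-byte input.
sizes = [len(chunk) for chunk in read_chunks(io.BytesIO(b"x" * 2500))]
print(sizes)  # [1024, 1024, 452]
```

For stdin, iterate over read_chunks(sys.stdin.buffer). With a text stream the sentinel would have to be '' instead of b''.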

You can read stdin (or any file) in chunks using f.read(n), where the integer n is the number of bytes to read. It returns an empty string (b'' for binary streams) once nothing is left in the file. For stdin in Python 3, call sys.stdin.buffer.read(n): sys.stdin itself is a text stream, so sys.stdin.read(n) counts characters, not bytes.
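As an illustration of processing a large input in bounded memory, the sketch below hashes a stream chunk by chunk with hashlib, so only about chunk_size bytes are held at a time. The function name sha256_of_stream is hypothetical, and an io.BytesIO object stands in for stdin so the example runs on its own:

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024) -> str:
    """Hash a binary stream chunk by chunk instead of reading it all at once."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)  # at most chunk_size bytes, maybe fewer
        if not chunk:                    # b'' means EOF
            break
        digest.update(chunk)             # feed each chunk into the running hash
    return digest.hexdigest()


# Stand-in for stdin: 10,000 bytes, hashed 1024 bytes at a time.
data = b"0123456789" * 1000
chunked = sha256_of_stream(io.BytesIO(data))
print(chunked == hashlib.sha256(data).hexdigest())  # True
```

In a script you would call sha256_of_stream(sys.stdin.buffer) and run it as, for example, python3 script.py < bigfile.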

Inspired by @Andre's answer, but with Python 3 code that also handles SIGINT (just because...):

#!/usr/bin/env python3

########
# g.py #
########

import signal
import sys


def process_data(buffer):
    sys.stdout.buffer.write(buffer)
    sys.stdout.buffer.flush()


def read_stdin_stream(handler, chunk_size=1024):
    with sys.stdin as f:
        while True:
            buffer = f.buffer.read(chunk_size)
            if buffer == b'':
                break
            handler(buffer)


def signal_handler(sig, frame):
    sys.stdout.buffer.flush()
    sys.exit(0)


def main():
    signal.signal(signal.SIGINT, signal_handler)

    # notice the `chunk_size` of 1 for this particular example
    read_stdin_stream(process_data, chunk_size=1)


if __name__ == "__main__":
    main()

Example:

$ for i in $(seq 1 5); do echo -n "$i" && sleep 1; done | python3 g.py
12345
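Presumably the example uses chunk_size=1 because BufferedReader.read(n) on a blocking pipe waits until n bytes have arrived or EOF is reached, so with a larger chunk size nothing would be echoed until enough input had accumulated. One alternative, sketched here, is BufferedReader.read1(n), which makes at most one call on the underlying raw stream and returns whatever is available, up to n bytes. The sketch also replaces the signal handler with catching KeyboardInterrupt, which Python raises on SIGINT by default; stream_chunks and echo are illustrative names:

```python
import io
import sys
from typing import Callable


def stream_chunks(stream: io.BufferedIOBase, handler: Callable[[bytes], None],
                  chunk_size: int = 1024) -> None:
    """Pass each chunk of up to chunk_size bytes to handler until EOF."""
    while True:
        # read1() makes at most one read on the underlying raw stream, so it
        # returns whatever is available instead of waiting for a full chunk.
        chunk = stream.read1(chunk_size)
        if not chunk:  # b'' means EOF
            break
        handler(chunk)


def echo(chunk: bytes) -> None:
    """Write a chunk to stdout immediately (same role as process_data in g.py)."""
    sys.stdout.buffer.write(chunk)
    sys.stdout.buffer.flush()


def main() -> None:
    try:
        stream_chunks(sys.stdin.buffer, echo)
    except KeyboardInterrupt:  # Python raises this on SIGINT (Ctrl+C) by default
        sys.stdout.buffer.flush()
```

To run it the way g.py is run, append the usual if __name__ == "__main__": main() guard and use the same shell pipeline; the digits should then appear one per second even though chunk_size is 1024.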

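Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM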