
How to set the "chunk size" of read lines from file read with Python subprocess.Popen() or open()?

I have a fairly large text file which I would like to read in chunks. To do this with the subprocess library, one would execute the following shell command:

"cat hugefile.log"

with the code:

import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True,  stdout=subprocess.PIPE)
data = task.stdout.read()

Using print(data) will spit out the entire contents of the file at once. How can I determine the number of chunks, and then access the contents of this file chunk by chunk (e.g. a chunk = three lines at a time)?

It must be something like:

chunksize = 1000   # break up hugefile.log into 1000 chunks

for chunk in data:
    print(chunk)

The equivalent question with Python's open() of course uses the code:

with open('hugefile.log', 'r') as f:
    read_data = f.read()

How would you read read_data in chunks?

Using a file, you can iterate on the file handle (no need for subprocess to open cat):

with open('hugefile.log', 'r') as f:
    for read_line in f:
        print(read_line)

Python reads a line by reading all the characters up to \n. To simulate three-line chunks, just call it 3 times, or read and count 3 \n characters; but then you have to handle the end of file, etc. Not very useful, and you won't gain any speed by doing that.

with open('hugefile.log', 'r') as f:
    while True:
        read_3_lines = ""
        try:
            for i in range(3):
                read_3_lines += next(f)
            # process read_3_lines
        except StopIteration:  # end of file
            # process read_3_lines if nb lines not divisible by 3
            break
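A shorter variant of the same three-lines-at-a-time idea is itertools.islice, which returns a short (or empty) batch at end of file, so no StopIteration handling is needed. A minimal self-contained sketch, using io.StringIO as a stand-in for the real open('hugefile.log') handle (the stand-in and the sample lines are assumptions for illustration):

```python
import io
from itertools import islice

# StringIO stands in for open('hugefile.log', 'r'); seven sample lines
# so the final chunk is deliberately short.
f = io.StringIO("line1\nline2\nline3\nline4\nline5\nline6\nline7\n")
chunks = []
while True:
    chunk = list(islice(f, 3))  # pull up to 3 lines
    if not chunk:               # empty list means end of file
        break
    chunks.append(''.join(chunk))

print(len(chunks))  # 3 chunks: 3 + 3 + 1 lines
```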

With Popen you can do exactly the same; as a bonus, add poll to monitor the process (not needed with cat, but I suppose your real process is different and cat is only for the question's purpose):

import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True,  stdout=subprocess.PIPE)
while True:
    line = task.stdout.readline()
    if line == '' and task.poll() is not None:
        break

rc = task.wait()   # wait for completion and get return code of the command

Python 3 compliant code supporting encoding:

    line = task.stdout.readline().decode("latin-1")
    if len(line) == 0 and task.poll() is not None: break
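Putting those pieces together, a complete Python 3 sketch of the poll loop could look like this. To keep the example self-contained, printf replaces "cat hugefile.log" as the child command (the printf command, its sample lines, and the POSIX shell are assumptions, not part of the original answer):

```python
import subprocess

# Self-contained stand-in for "cat hugefile.log": printf emits three
# sample lines (assumption for illustration; requires a POSIX shell).
task = subprocess.Popen("printf 'a\\nb\\nc\\n'", shell=True,
                        stdout=subprocess.PIPE)
lines = []
while True:
    line = task.stdout.readline().decode("latin-1")
    if len(line) == 0 and task.poll() is not None:
        break
    if line:
        lines.append(line)  # process the line / chunk here

rc = task.wait()  # wait for completion and get the return code
```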

Now, if you want to split the file into a given number of chunks:

  • you cannot use Popen for obvious reasons: you would have to know the size of the output first
  • if you have a file as input, you can do as follows:

code:

import os, sys

filename = "hugefile.log"
filesize = os.path.getsize(filename)
nb_chunks = 1000
chunksize = filesize // nb_chunks

with open(filename, "r") as f:
    while True:
        chunk = f.read(chunksize)
        if chunk == "":
            break
        # do something useful with the chunk
        sys.stdout.write(chunk)
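The read loop above can also be written with the two-argument form of iter() and functools.partial, which stops at the empty-string sentinel automatically. A minimal sketch, using io.StringIO and a tiny chunk size in place of the real file (both are assumptions so the chunking is visible):

```python
import io
from functools import partial

# StringIO stands in for open(filename, "r"); a chunksize of 4
# characters is chosen small so the chunk boundaries are visible.
f = io.StringIO("abcdefghij")
chunks = list(iter(partial(f.read, 4), ""))

print(chunks)  # ['abcd', 'efgh', 'ij']
```

Note that fixed-size reads like this can split a line in the middle; if chunk boundaries must fall on line breaks, the line-based approaches above are the better fit.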
