How do I set the "chunk size" for lines read from a file with Python subprocess.Popen() or open()?

Question

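I have a fairly large text file that I would like to process in chunks. To do this with the subprocess library, one would execute the following shell command: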

"cat hugefile.log"

with the code:

import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True,  stdout=subprocess.PIPE)
data = task.stdout.read()

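Using print(data) spits out the entire contents of the file at once. How can I get the number of chunks, and then access the contents of this file by chunk size (e.g. chunk = three lines at a time)?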

It must be something like:

chunksize = 1000   # break up hugefile.log into 1000 chunks

for chunk in data:
    print(chunk)

The equivalent question with Python open() of course uses the code

with open('hugefile.log', 'r') as f:
     read_data = f.read()

How would you read read_data in chunks?

Answer 1

With a file, you can iterate over the file handle (no need for a subprocess to run cat):

with open('hugefile.log', 'r') as f:
     for read_line in f:
        print(read_line)

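Python reads a line by reading all the characters up to and including \n. To simulate reading three lines at a time, just call next() on the file handle three times. You could instead read and count three \n characters yourself, but then you have to handle the end of the file, etc. That is not very useful, and you won't gain any speed by doing it.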

with open('hugefile.log', 'r') as f:
    while True:
        read_3_lines = ""
        try:
            for i in range(3):
                read_3_lines += next(f)
        except StopIteration:  # end of file
            # process the leftover lines in read_3_lines here
            # (non-empty when the line count is not divisible by 3)
            break
        # process the full 3-line group in read_3_lines here
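
As an alternative sketch of the same idea, itertools.islice can pull up to three lines per iteration and handles a short final group without catching StopIteration. The helper name read_in_groups and the throwaway sample file are hypothetical, used only for illustration.

```python
import itertools
import os
import tempfile


def read_in_groups(path, lines_per_group=3):
    """Yield lists of up to `lines_per_group` lines from the text file at `path`."""
    with open(path, "r") as f:
        while True:
            group = list(itertools.islice(f, lines_per_group))
            if not group:  # islice returned nothing: end of file
                break
            yield group


# Small demonstration on a throwaway 7-line file (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join("line %d\n" % i for i in range(7)))

groups = list(read_in_groups(tmp.name))
os.remove(tmp.name)
for group in groups:
    print("".join(group), end="")
```

The last group simply comes back shorter when the number of lines is not a multiple of three.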

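With Popen you can do exactly the same thing and, as a bonus, add poll() to monitor the process (cat is not needed here, but I suppose your real process is different and this is only for the purpose of the question):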

import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True,  stdout=subprocess.PIPE)
while True:
    line = task.stdout.readline()
    if not line and task.poll() is not None:  # empty read and process finished
        break

rc = task.wait()   # wait for completion and get return code of the command

Python 3-compliant code with encoding support:

    line = task.stdout.readline().decode("latin-1")
    if len(line) == 0 and task.poll() is not None:
        break
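
On Python 3.6 and later, Popen also accepts an encoding argument, so stdout yields str lines and the explicit decode() call is not needed. The sketch below combines that with itertools.islice to take the output three lines at a time. The child process is only a stand-in for the real command and is an assumption of this example.

```python
import itertools
import subprocess
import sys

# Stand-in for the real command (an assumption for this demo): a child
# process that writes five latin-1 encoded lines to its stdout.
child_code = r"import sys; sys.stdout.buffer.write(b'caf\xe9\n' * 5)"
task = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    encoding="latin-1",  # stdout now yields str instead of bytes
)

chunks = []
while True:
    # Take up to 3 lines from the pipe; an empty list means end of output.
    chunk = list(itertools.islice(task.stdout, 3))
    if not chunk:
        break
    chunks.append(chunk)

rc = task.wait()  # wait for completion and get the return code
print(len(chunks), "chunks, return code", rc)
```

Iterating over the pipe stops when the child closes its output, so no poll() call is needed in this variant.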

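Now, if you want to split the file into a given number of chunks:

you cannot use Popen, for an obvious reason: you would have to know the size of the output first
if you have a file as input, you can do the following:

Code: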

import os,sys
filename = "hugefile.log"
filesize = os.path.getsize(filename)
nb_chunks = 1000
chunksize = max(1, filesize // nb_chunks)  # at least 1, so small files are not read with size 0

with open(filename,"r") as f:
   while True:
      chunk = f.read(chunksize)
      if chunk=="":
          break
      # do something useful with the chunk
      sys.stdout.write(chunk)
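
One detail worth noting: os.path.getsize() reports bytes, while read() on a text-mode file counts characters. A variation that opens the file in binary mode keeps the two units consistent. The sketch below (the helper name iter_chunks is hypothetical) also uses ceiling division, so that at most nb_chunks chunks are produced.

```python
import os
import tempfile


def iter_chunks(path, nb_chunks):
    """Yield at most `nb_chunks` byte chunks that together cover the file."""
    filesize = os.path.getsize(path)
    # Ceiling division; max() guards against a zero-size read on an empty file.
    chunksize = max(1, -(-filesize // nb_chunks))
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunksize), b""):
            yield chunk


# Demonstration on a throwaway 10-byte file (hypothetical sample data).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"0123456789")

chunks = list(iter_chunks(tmp.name, 4))
os.remove(tmp.name)
print(chunks)
```

The chunks are bytes objects here; decode them if text is needed, keeping in mind that a multi-byte character may be split across two chunks.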

Answer 1: 1 accepted, 2016-09-15 11:37:35
