I have a fairly large text file which I would like to read in chunks. To do this with the subprocess
library, one would execute the following shell command:
"cat hugefile.log"
with the code:
import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True, stdout=subprocess.PIPE)
data = task.stdout.read()
Calling print(data)
spits out the entire contents of the file at once. How can I report the number of chunks, and then access the contents of the file one chunk at a time (e.g. a chunk = three lines at a time)?
It must be something like:
chunksize = 1000  # break up hugefile.log into 1000 chunks
for chunk in data:
    print(chunk)
The equivalent question with Python's open()
of course uses the code:
with open('hugefile.log', 'r') as f:
    read_data = f.read()
How would you read read_data
in chunks?
Using a file, you can iterate on the file handle (no need for subprocess to open cat
):
with open('hugefile.log', 'r') as f:
    for read_line in f:
        print(read_line)
Python reads a line by consuming all the characters up to \n
. To read three lines at a time, just call the line read three times, or read and count three \n
characters yourself, but then you have to handle the end of file, etc. That is not very useful, and you won't gain any speed by doing it.
with open('hugefile.log', 'r') as f:
    while True:
        read_3_lines = ""
        try:
            for i in range(3):
                read_3_lines += next(f)
            # process read_3_lines
        except StopIteration:  # end of file
            # process read_3_lines if nb lines not divisible by 3
            break
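The same grouping can also be written with itertools.islice, which pulls up to three lines off the file iterator per call and handles the end of file without a try/except. This is a sketch; the helper name chunks_of_lines is my own, not from the question:

```python
import itertools

def chunks_of_lines(path, n=3):
    """Yield lists of up to n consecutive lines from the file at path."""
    with open(path, "r") as f:
        while True:
            # islice stops early at EOF, so the last block may be shorter than n
            block = list(itertools.islice(f, n))
            if not block:
                break
            yield block
```

The last block simply comes out shorter when the number of lines is not divisible by n, so no special end-of-file handling is needed.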
With Popen
you can do exactly the same; as a bonus, add poll()
to monitor the process (not needed with cat
, but I suppose that your real process is different and cat is only here for the question's purpose):
import subprocess
task = subprocess.Popen("cat hugefile.log", shell=True, stdout=subprocess.PIPE)
while True:
    line = task.stdout.readline()
    if line == '' and task.poll() is not None: break
rc = task.wait()  # wait for completion and get the return code of the command
Python 3-compliant code supporting an encoding (stdout yields bytes, so decode each line):
line = task.stdout.readline().decode("latin-1")
if len(line) == 0 and task.poll() is not None: break
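Putting the pieces together, here is a minimal, self-contained Python 3 sketch of that Popen loop. To keep it runnable anywhere, the child command is a Python one-liner that prints three lines; substitute your own command (e.g. the cat call from the question):

```python
import subprocess
import sys

# Spawn a child process whose stdout we read line by line.
# sys.executable runs the current Python; replace cmd with your own command.
cmd = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
task = subprocess.Popen(cmd, stdout=subprocess.PIPE)

lines = []
while True:
    line = task.stdout.readline().decode("latin-1")
    # An empty read plus a non-None poll() means the child exited and
    # the pipe is drained.
    if len(line) == 0 and task.poll() is not None:
        break
    if line:
        lines.append(line.rstrip())

rc = task.wait()  # return code of the command
```

Passing the command as a list avoids shell=True, which is generally safer when the command comes from user input.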
Now, if you want to split the file into a given number of chunks, you can't easily do it with Popen
, for obvious reasons: you would have to know the size of the output first. With a regular file you can get the size up front. Code:
import os, sys
filename = "hugefile.log"
filesize = os.path.getsize(filename)
nb_chunks = 1000
chunksize = filesize // nb_chunks
with open(filename, "r") as f:
    while True:
        chunk = f.read(chunksize)
        if chunk == "":
            break
        # do something useful with the chunk
        sys.stdout.write(chunk)
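Wrapped in a generator, the same size-based approach also lets you report the chunk size (and hence the approximate number of chunks) before iterating. A sketch, with the helper name read_in_chunks being my own:

```python
import os

def read_in_chunks(path, nb_chunks=1000):
    """Yield roughly nb_chunks equal-sized chunks of the file at path."""
    filesize = os.path.getsize(path)
    # max(1, ...) guards against a chunksize of 0 for tiny files
    chunksize = max(1, filesize // nb_chunks)
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunksize)
            if chunk == "":
                break
            yield chunk
```

Note that integer division leaves a remainder, so the actual number of chunks may be nb_chunks plus one; byte sizes also only match character counts exactly for single-byte encodings.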