Joining big files in Python
I have several HEVC files that I'd like to merge. With small files (about 1.5 GB) the following code works fine:
with open(path + "/" + str(sys.argv[2]) + "_EL.265", "wb") as outfile:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            outfile.write(infile.read())
With bigger files (8 GB or more) the same code gets stuck. I've copied the code to read big files in chunks from here (Lazy Method for Reading Big File in Python?) and integrated it into my code:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data
with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            for piece in read_in_chunks(infile):
                outfile_bl.write(infile.read())
This code produces a file of the right size, but it is no longer a valid HEVC file and cannot be read by a video player. Any ideas? Please help.

Dario
You are reading from infile in two different places: inside read_in_chunks, and directly in the loop body via outfile_bl.write(infile.read()). As a result, the data just read into piece is discarded and never written: on the first pass through the loop, infile.read() writes everything after the first chunk, so each file's first 1024 bytes (including the HEVC headers) never reach the output.
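A minimal reproduction on an in-memory stream shows the effect (io.BytesIO standing in for the real file; the sizes here are illustrative):

```python
import io

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy generator, as in the question."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

src = io.BytesIO(b"HEADER" + b"x" * 10)   # 16 bytes of input
dst = io.BytesIO()
for piece in read_in_chunks(src, chunk_size=6):
    dst.write(src.read())                  # bug: writes the rest, not `piece`

print(dst.getvalue())  # b'xxxxxxxxxx' -- the first chunk, "HEADER", is gone
```

The first 6-byte chunk is read into piece by the generator and thrown away; src.read() then writes only what follows it.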
You've already read the data into piece; just write that to your file.
with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            for piece in read_in_chunks(infile):
                outfile_bl.write(piece)
As an aside, you don't really need to define read_in_chunks, or at least its definition can be simplified greatly by using iter:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    # The file is opened in binary mode, so the EOF sentinel is b'', not ''
    yield from iter(lambda: file_object.read(chunk_size), b'')
    # Or
    # from functools import partial
    # yield from iter(partial(file_object.read, chunk_size), b'')
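A further simplification (an aside not in the original answer) is to skip the custom generator entirely: the standard library's shutil.copyfileobj already copies between file objects in fixed-size chunks. A self-contained sketch, with the helper name concat_files chosen here for illustration:

```python
import shutil

def concat_files(paths, out_path, chunk_size=1024 * 1024):
    """Concatenate the files in `paths` into `out_path`,
    copying chunk_size bytes at a time so memory use stays bounded."""
    with open(out_path, "wb") as outfile:
        for p in paths:
            with open(p, "rb") as infile:
                shutil.copyfileobj(infile, outfile, chunk_size)
```

Note that byte-level concatenation only yields a playable stream when the inputs are raw elementary streams (as the .265 files here appear to be); files in a container format such as .mp4 cannot be merged this way.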