
Joining big files in Python

I have several HEVC files that I'd like to merge. With small files (about 1.5 GB) the following code works fine:

with open(path + "/" + str(sys.argv[2]) + "_EL.265", "wb") as outfile:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            outfile.write(infile.read())

With bigger files (8 GB or more) the same code gets stuck. I copied the code for reading big files in chunks from here (Lazy Method for Reading Big File in Python?) and integrated it with my code:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            for piece in read_in_chunks(infile):
                outfile_bl.write(infile.read())

This code produces a file of the right size, but it is no longer a valid HEVC file and cannot be read by a video player.

Any idea? Please help.

Dario

You are reading from infile in two different places: inside read_in_chunks, and again directly in the call to outfile_bl.write. The data just read into the variable piece is never written; instead infile.read() writes everything that remains in one go, so the first chunk of each input file is missing from the output, and that single huge read is exactly what the chunked approach was supposed to avoid.

You've already read the data into piece; just write that to your file:

with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path+"/"+fname, 'rb') as infile:
            for piece in read_in_chunks(infile):
                outfile_bl.write(piece)
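
A note on the chunk size: the default of 1024 bytes means millions of read/write calls for an 8 GB file. A larger chunk keeps memory use low while cutting that overhead; the 1 MiB value below is only a suggested figure, and path, dirs and read_in_chunks are assumed to be defined as in the question.

# Same copy loop as above, only with a larger chunk size
# (1 MiB is an assumed value, not a requirement).
with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            for piece in read_in_chunks(infile, chunk_size=1024 * 1024):
                outfile_bl.write(piece)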

As an aside, you don't really need to define read_in_chunks at all, or at least its definition can be simplified greatly by using iter:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""

    # The sentinel is b'' because the files are opened in binary mode.
    yield from iter(lambda: file_object.read(chunk_size), b'')

    # Or
    # from functools import partial
    # yield from iter(partial(file_object.read, chunk_size), b'')
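
In fact, for a plain byte-for-byte concatenation you can drop the generator entirely: shutil.copyfileobj from the standard library already copies between file objects in fixed-size chunks. A minimal sketch, assuming the same path, dirs and sys.argv usage as in the question:

import shutil
import sys

# shutil.copyfileobj streams infile into outfile_bl in fixed-size chunks,
# so memory use stays constant regardless of file size.
with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
    for fname in dirs:
        with open(path + "/" + fname, 'rb') as infile:
            shutil.copyfileobj(infile, outfile_bl, length=1024 * 1024)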
    
