
Python Huge File Reading in multiprocessing

I have a binary file that contains a number of images (each image is 1024*768). I put each image onto a JoinableQueue and analyze it with multiprocessing. This works perfectly with small files, but I get a MemoryError when I try to read huge files. Does anybody know how I can store big files in a buffer/Queue (as strings)? (Unfortunately I can't use Manager or Pool.)
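
(For reference, a minimal sketch of the setup described above; the file name, worker count, and analyze function are placeholders and not from the original post. Note that multiprocessing.JoinableQueue takes an optional maxsize, which bounds how many images are buffered at once.)

import multiprocessing as mp

IMAGE_SIZE = 1024 * 768  # bytes per single-channel image, as in the question

def analyze(data):
    pass  # placeholder for the real per-image analysis

def worker(queue):
    while True:
        data = queue.get()
        if data is None:        # sentinel: no more images to process
            queue.task_done()
            break
        analyze(data)
        queue.task_done()

if __name__ == '__main__':
    # maxsize bounds how many images sit in the queue at once, so the
    # producer blocks instead of buffering the entire file in memory
    queue = mp.JoinableQueue(maxsize=16)
    workers = [mp.Process(target=worker, args=(queue,)) for _ in range(4)]
    for w in workers:
        w.start()
    with open('images.bin', 'rb') as fp:
        while True:
            chunk = fp.read(IMAGE_SIZE)
            if not chunk:       # end of file
                break
            queue.put(chunk)    # blocks while the queue is full
    for _ in workers:
        queue.put(None)         # one sentinel per worker
    queue.join()                # wait until every queued item is marked done
    for w in workers:
        w.join()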

Have you had a look at the io.BytesIO module? You can find it here: https://docs.python.org/release/3.1.3/library/io.html#binary-io. You can set your buffer size, which solved a memory problem for me once.
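
(As a rough illustration of that suggestion, a minimal sketch that reads the file one image at a time through a buffered reader and wraps each chunk in io.BytesIO; the file name and image size are assumptions.)

import io

IMAGE_SIZE = 1024 * 768  # bytes per image, assuming a single channel

# Open the file with an explicit buffer size so only a bounded amount of
# data is held in memory, then read one image's worth of bytes at a time.
with io.open('images.bin', 'rb', buffering=IMAGE_SIZE) as fp:
    while True:
        image_bytes = fp.read(IMAGE_SIZE)
        if not image_bytes:               # end of file
            break
        buf = io.BytesIO(image_bytes)     # wrap the chunk when a file-like object is needed
        # ... hand buf (or image_bytes) to the analysis code ...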

  1. You can read about buffers here.
  2. If your memory is small, you can try forcing gc like this:
import gc

SIZE = 1024 * 768  # bytes per single-channel image
MEMOSIZE = 1024    # your memory budget in bytes -- adjust to your machine
with open('xxx', 'rb') as fp:  # open the binary file
    i = 0          # count of images read since the last cleanup
    queue = []
    while True:
        if i * SIZE < MEMOSIZE:
            x = fp.read(SIZE)  # read one image's worth of bytes
            if not x:          # end of file
                break
            queue.append(x)
            i += 1
            # do something with the buffered images here
        else:
            queue = []       # drop the references so the memory can be reclaimed
            gc.collect()     # force a garbage collection pass
            i = 0
