

Saving a high number of images as an array

I have a large number of videos, and I want to extract the frames, pre-process them, and then create an array for each video. So far I have created the arrays, but their total size is too big: I have 224 videos, each producing a 6 GB array, for more than 1.2 TB in total. I have tried numpy.save and pickle.dump, but both produce files of the same size on disk. Do you have a recommendation or an alternative approach in general?

Do these steps for each of the videos:

  1. Load the data into one NumPy array.
  2. Write it to disk using np.save() with the extension .npy.
  3. Add the .npy file to a .zip compressed archive using the zipfile module.
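The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name append_video_array and the archive name videos.npz are assumptions, and frames is assumed to already be a pre-processed NumPy array.

```python
import os
import zipfile

import numpy as np


def append_video_array(frames, name, archive="videos.npz"):
    """Save one video's frame array and append it to a compressed archive.

    frames : NumPy array of pre-processed frames for a single video.
    name   : key under which np.load() will later expose this array.
    """
    tmp = f"{name}.npy"
    np.save(tmp, frames)  # step 2: write the array to disk as .npy
    # Step 3: append the .npy file to the archive with DEFLATE compression.
    # Mode "a" creates the archive on the first call and appends afterwards.
    with zipfile.ZipFile(archive, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tmp, arcname=tmp)
    os.remove(tmp)  # drop the uncompressed copy before the next video
```

Only one uncompressed array exists at a time, so peak RAM (and scratch disk) usage stays at the size of a single video rather than all 224.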

The end result will be as if you had loaded all 224 arrays and saved them at once with np.savez_compressed, but you only use enough RAM to process a single video at a time, instead of having to hold all of the uncompressed data in memory at once.

Finally, np.load() (or zipfile) can be used to load the data back from disk, one video at a time. You can even use concurrent.futures.ThreadPoolExecutor to load several files at once, using multiple cores for decompression to save time; if your disk is fast enough, the speedup is almost linear in the number of cores.
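A minimal sketch of that loading step, assuming the archive was built as described above (the helper names load_one and load_all are illustrative). Each thread opens the archive independently, and zlib decompression can release the GIL on large buffers, which is what makes threads useful here:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_one(archive, key):
    """Load a single video's array from the compressed archive."""
    # np.load() on a .npz returns a lazy NpzFile; indexing by key
    # decompresses just that one entry into an ndarray.
    with np.load(archive) as data:
        return data[key]


def load_all(archive, keys, workers=4):
    """Decompress several entries concurrently, one per thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: load_one(archive, k), keys))
```

For truly huge arrays you would still process the results one at a time rather than keep the whole list in memory; the thread pool only overlaps the decompression work.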

