
Is there a way to wrap a numpy `ndarray` interface around an existing binary file?

I have a binary network capture (.pcapng) file that contains video data. I am parsing the .pcapng with scapy and can extract the data, but the video sequences I am working with are very large, and the operations I want to perform quickly grind my machine to a halt if I load much data at once.

One approach would be to extract all the data and save it into an mmap file or, better yet, HDF5. However, before I commit to copying all the data, I wanted to see whether it is possible to memory-map the existing files in place. Is there a way to make a discontiguous mmap into an existing file that tells an ndarray object where to find the memory associated with a given index, when that memory may be at arbitrary locations within the file? I haven't found any good analogs in mmap, which assumes a contiguous file is available.

I imagine an ndarray subclass that opens the file, scans it for the boundaries of all the relevant image data within the .pcapng, and provides a custom `__getitem__` implementation that returns the data at the appropriate file offset(s) for a given index or slice. Is this bonkers, or is there a better (already solved) way to do this?
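A minimal sketch of the wrapper described above, under stated assumptions. It does not subclass ndarray, because an ndarray needs one base pointer plus fixed strides and so cannot describe data at arbitrary offsets; instead it wraps an mmap and hands out zero-copy per-packet views. The scan reads the pcapng block layout directly (Section Header Block type 0x0A0D0D0A with a byte-order magic, Enhanced Packet Block type 6 whose packet data starts 28 bytes into the block) rather than going through scapy, so file offsets are known. The names `scan_epb_offsets` and `PacketIndex` are hypothetical; only Enhanced Packet Blocks are indexed, each entry is one captured packet including its network headers, and grouping packets into video frames depends on the video transport, which is not shown.

```python
import mmap
import struct
from typing import List, Tuple, Union

import numpy as np

SHB_TYPE = 0x0A0D0D0A  # pcapng Section Header Block
EPB_TYPE = 0x00000006  # pcapng Enhanced Packet Block


def scan_epb_offsets(buf) -> List[Tuple[int, int]]:
    """Hypothetical helper: (file_offset, captured_len) of every EPB's packet data."""
    out: List[Tuple[int, int]] = []
    pos, end, endian = 0, len(buf), "<"
    while pos + 12 <= end:
        (btype,) = struct.unpack_from(endian + "I", buf, pos)
        if btype == SHB_TYPE:
            # The byte-order magic 0x1A2B3C4D sets the endianness of this section.
            endian = "<" if buf[pos + 8:pos + 12] == b"\x4d\x3c\x2b\x1a" else ">"
        (blen,) = struct.unpack_from(endian + "I", buf, pos + 4)
        if blen < 12:
            raise ValueError(f"malformed block at offset {pos}")
        if btype == EPB_TYPE:
            # EPB layout: type, length, interface id, ts high, ts low,
            # captured length (+20), original length (+24), packet data (+28).
            (caplen,) = struct.unpack_from(endian + "I", buf, pos + 20)
            out.append((pos + 28, caplen))
        pos += blen  # skip other block types (IDB, SPB, ...) by their length
    return out


class PacketIndex:
    """Hypothetical lazy, read-only index of packets scattered through a file."""

    def __init__(self, path: str, dtype=np.uint8) -> None:
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._dtype = np.dtype(dtype)
        self._spans = scan_epb_offsets(self._mm)  # only offsets are kept in RAM

    def __len__(self) -> int:
        return len(self._spans)

    def _view(self, i: int) -> np.ndarray:
        offset, length = self._spans[i]
        # Zero-copy: the array is backed by the mapped pages of the file.
        return np.frombuffer(self._mm, dtype=self._dtype,
                             count=length // self._dtype.itemsize, offset=offset)

    def __getitem__(self, key: Union[int, slice]):
        if isinstance(key, slice):
            # Packets at arbitrary offsets can't share one strided ndarray,
            # so a slice yields a list of per-packet views instead.
            return [self._view(i) for i in range(*key.indices(len(self)))]
        return self._view(key)
```

Usage would be along the lines of `packets = PacketIndex("capture.pcapng")` followed by `packets[10]` or `packets[100:200]`; the OS pages in only the parts of the file that are actually touched.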

Construct an mmap.mmap object from the file and pass it as the buffer argument to the numpy.ndarray constructor. For example:

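import mmap
import numpy

with open("filename", "rb") as f:
    # Map the whole file read-only; the mapping stays valid after f is closed.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# View the mapped bytes as a flat uint8 array without copying them.
nd = numpy.ndarray(len(mm), dtype=numpy.uint8, buffer=mm)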
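If the frames happen to sit at a constant byte interval in the file (for example fixed-size packets with fixed-size headers), the same approach extends to a 2-D view with no wrapper class, because numpy.ndarray also accepts `offset` and `strides` arguments. This is a sketch under that assumption, which pcapng itself does not guarantee; `strided_frames`, `first_offset` and `stride` are hypothetical names, and an anonymous mapping stands in for the real file.

```python
import mmap

import numpy as np


def strided_frames(mm: mmap.mmap, first_offset: int, n_frames: int,
                   frame_len: int, stride: int) -> np.ndarray:
    """Hypothetical helper: zero-copy (n_frames, frame_len) uint8 view.

    Assumes frame k starts at first_offset + k * stride; verify the offsets
    before relying on this for a real capture.
    """
    # numpy raises ValueError if the requested shape/strides run past the buffer.
    return np.ndarray(shape=(n_frames, frame_len), dtype=np.uint8,
                      buffer=mm, offset=first_offset, strides=(stride, 1))


# Usage on an anonymous mapping standing in for a real file.
mm = mmap.mmap(-1, 1024)
mm.write(bytes(range(256)) * 4)
frames = strided_frames(mm, first_offset=16, n_frames=3, frame_len=4, stride=100)
print(frames.shape, frames.strides)  # (3, 4) (100, 1)
```

Because the view shares memory with the mapping, slicing `frames` (for example `frames[1:]`) stays zero-copy; only operations that produce new arrays read the data into RAM.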

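Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.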

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM