
optimal data structure to store millions of pixels in python?

I have several images, and after some basic processing and contour detection I want to store the detected pixel locations and their adjacent neighbours' values in a Python data structure. I settled on numpy.array.

The pixel locations from each image are retrieved using:

locationsPx = cv2.findNonZero(SomeBWImage)

which will return an array of shape (NumberOfPixels, 1L, 2L), with:

print(locationsPx[0])
# array([[1649,    4]])

for example.
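
(For reference, here is a minimal sketch of how that (NumberOfPixels, 1L, 2L) output can be flattened into a plain (NumberOfPixels, 2) array of (x, y) pairs. It assumes SomeBWImage is a single-channel binary image; the file name is only illustrative.)

import cv2

# Illustrative input: a binary mask loaded as a single-channel image.
SomeBWImage = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

locationsPx = cv2.findNonZero(SomeBWImage)   # shape: (NumberOfPixels, 1, 2)
coords = locationsPx.reshape(-1, 2)          # shape: (NumberOfPixels, 2), columns are (x, y)
print(coords[0])                             # e.g. [1649    4]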

My question is: is it possible to store this double array in a single column of another array? Or should I use a list and drop the array altogether?

Note: the dataset of images might grow, so the dimensions of my chosen data structure will not only be huge, but also variable.

EDIT: or maybe numpy.array is not a good idea and a Pandas DataFrame is better suited? I am open to suggestions from those who have more experience in this.
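
(To make that concrete, one possible, purely illustrative shape for the DataFrame route is a long table with one row per detected pixel and the image name repeated, which also copes with a variable number of pixels per image; the image names and coordinates below are made up.)

import numpy as np
import pandas as pd

# Hypothetical detections: image name -> (N, 2) array of (x, y) pixel locations.
detections = {
    "img_0001.png": np.array([[1649, 4], [1650, 4]]),
    "img_0002.png": np.array([[12, 300]]),
}

# One long table: one row per pixel, with the image name repeated per row.
frames = [
    pd.DataFrame(coords, columns=["x", "y"]).assign(image=name)
    for name, coords in detections.items()
]
df = pd.concat(frames, ignore_index=True)
print(df)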

Numpy arrays are great for computation. They are not great for storing data if the size of the data keeps changing. As ali_m pointed out, all forms of array concatenation in numpy are inherently slow. Better to store the arrays in a plain old Python list:

coordlist = []
coordlist.append(locationsPx[0])

Alternatively, if your images have names, it might be attractive to use a dict with the image names as keys:

coorddict = {}
coorddict[image_name] = locationsPx[0]

Either way, you can readily iterate over the contents of the list:

for coords in coordlist:
    ...  # coords is one (N, 1, 2) array per image

or

for image_name, coords in coorddict.items():
    ...  # image_name is the key, coords the matching (N, 1, 2) array

And pickle is a convenient way to store your results in a file:

import pickle
with open("filename.pkl", "wb") as f:
    pickle.dump(coordlist, f, pickle.HIGHEST_PROTOCOL)

(or the same with coorddict instead of coordlist). Reloading is trivially easy as well:

with open("filename.pkl", "rb") as f:
    coordlist = pickle.load(f)

There are some security concerns with pickle, but if you only load files you have created yourself, those don't apply.

If you find yourself frequently adding to a previously pickled file, you might be better off with an alternative back end, such as sqlite.
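
(If you do go the sqlite route, a minimal sketch using the standard-library sqlite3 module could look like this; the database file, table and column names are illustrative, and coorddict is the image-name-to-coordinates dict from above.)

import sqlite3

# Illustrative schema: one row per detected pixel.
conn = sqlite3.connect("coords.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS pixels (image_name TEXT, x INTEGER, y INTEGER)"
)

# coorddict maps each image name to its (N, 1, 2) array from cv2.findNonZero.
for image_name, coords in coorddict.items():
    rows = [(image_name, int(x), int(y)) for x, y in coords.reshape(-1, 2)]
    conn.executemany("INSERT INTO pixels VALUES (?, ?, ?)", rows)

conn.commit()
conn.close()

Appending results for new images then only means inserting new rows, rather than re-reading and re-writing one big pickle.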
