
optimal data structure to store millions of pixels in python?

I have several images, and after some basic processing and contour detection I want to store the detected pixel locations and their adjacent neighbours' values in a Python data structure. I settled on numpy.array.

The pixel locations from each image are retrieved using:

locationsPx = cv2.findNonZero(SomeBWImage)

which will return an array of shape (NumberOfPixels, 1L, 2L), with:

print(locationsPx[0])
# array([[1649,    4]])

for example.
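
(For reference, here is a minimal sketch of how that (NumberOfPixels, 1L, 2L) output can be flattened into a plain (NumberOfPixels, 2) array of (x, y) pairs. It assumes SomeBWImage is a single-channel binary image; the file name is only illustrative.)

import cv2

# Illustrative input: a binary mask loaded as a single-channel image.
SomeBWImage = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

locationsPx = cv2.findNonZero(SomeBWImage)   # shape: (NumberOfPixels, 1, 2)
coords = locationsPx.reshape(-1, 2)          # shape: (NumberOfPixels, 2), columns are (x, y)
print(coords[0])                             # e.g. [1649    4]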

My question is: is it possible to store this double array in a single column of another array? Or should I use a list and drop the array altogether?

Note: the dataset of images might grow, so the dimensions of my chosen data structure will not only be huge, but also variable.

EDIT: or maybe numpy.array is not a good idea and a Pandas DataFrame is better suited? I am open to suggestions from those who have more experience in this.
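
(To make that concrete, one possible, purely illustrative shape for the DataFrame route is a long table with one row per detected pixel and the image name repeated, which also copes with a variable number of pixels per image; the image names and coordinates below are made up.)

import numpy as np
import pandas as pd

# Hypothetical detections: image name -> (N, 2) array of (x, y) pixel locations.
detections = {
    "img_0001.png": np.array([[1649, 4], [1650, 4]]),
    "img_0002.png": np.array([[12, 300]]),
}

# One long table: one row per pixel, with the image name repeated per row.
frames = [
    pd.DataFrame(coords, columns=["x", "y"]).assign(image=name)
    for name, coords in detections.items()
]
df = pd.concat(frames, ignore_index=True)
print(df)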

Numpy arrays are great for computation. They are not great for storing data if the size of the data keeps changing. As ali_m pointed out, all forms of array concatenation in numpy are inherently slow. Better to store the arrays in a plain old Python list:

coordlist = []
coordlist.append(locationsPx[0])

Alternatively, if your images have names, it might be attractive to use a dict with the image names as keys:

coorddict = {}
coorddict[image_name] = locationsPx[0]

Either way, you can readily iterate over the contents of the list:

for coords in coordlist:
    ...  # coords is one (N, 1, 2) array per image

or

for image_name, coords in coorddict.items():
    ...  # image_name is the key, coords the matching (N, 1, 2) array

And pickle is a convenient way to store your results in a file:

import pickle
with open("filename.pkl", "wb") as f:
    pickle.dump(coordlist, f, pickle.HIGHEST_PROTOCOL)

(or the same with coorddict instead of coordlist). Reloading is trivially easy as well:

with open("filename.pkl", "rb") as f:
    coordlist = pickle.load(f)

There are some security concerns with pickle, but if you only load files you have created yourself, those don't apply.

If you find yourself frequently adding to a previously pickled file, you might be better off with an alternative back end, such as sqlite.
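
(If you do go the sqlite route, a minimal sketch using the standard-library sqlite3 module could look like this; the database file, table and column names are illustrative, and coorddict is the image-name-to-coordinates dict from above.)

import sqlite3

# Illustrative schema: one row per detected pixel.
conn = sqlite3.connect("coords.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS pixels (image_name TEXT, x INTEGER, y INTEGER)"
)

# coorddict maps each image name to its (N, 1, 2) array from cv2.findNonZero.
for image_name, coords in coorddict.items():
    rows = [(image_name, int(x), int(y)) for x, y in coords.reshape(-1, 2)]
    conn.executemany("INSERT INTO pixels VALUES (?, ?, ?)", rows)

conn.commit()
conn.close()

Appending results for new images then only means inserting new rows, rather than re-reading and re-writing one big pickle.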
