
Improve performance of dictionary search in Python

I am using a dictionary structure in Python to store (row, col) pairs from a large NumPy array. The dictionary has almost 100,000 entries.

The key is a tuple of (row, col). Some sample values in this structure are:

OrderedDict([((1783, 586), 0), ((1783, 587), 1), ((1783, 588), 2), ((1783, 589), 3), ((1783, 590), 4), ((1784, 584), 5), ((1784, 585), 6), ((1784, 586), 7), ((1784, 587), 8), ((1784, 588), 9), ((1784, 589), 10), ((1784, 590), 11), ((1784, 591), 12), ((1784, 592), 13), ((1784, 593), 14), ((1784, 594), 15), ((1784, 595), 16), ((1785, 583), 17), ((1785, 584), 18), ((1785, 585), 19), ((1785, 586), 20), ((1785, 587), 21), ((1785, 588), 22), ((1785, 589), 23), ((1785, 590), 24), ((1785, 591), 25), ((1785, 592), 26), ((1785, 593), 27), ((1785, 594), 28), ((1785, 595), 29), ((1785, 596), 30), ((1785, 597), 31),...

Lookups using the key are taking forever.

I perform a lookup using (row, col):

if (1783,586) in keyed_var_pixels:

Based on this post, using the in keyword on a dict object should use hashing. Each lookup seems to take around 0.02 seconds, which adds up to about 30 minutes for the entire dataset. This seems too long for a hashed retrieval. How can I improve this runtime? Or is there an alternative data structure for storing these values that allows fast retrieval and existence checks?

Thanks in advance!

I did a few performance tests a while ago, and a two-level dictionary was faster than using a tuple as the key:

d = { 1783:{ 586:0, 587:1 ... },  ... }

if 1783 in d and 586 in d[1783] : 
    # ...
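
If the data already lives in a tuple-keyed dictionary, as in the question, the two-level structure can be built in a single pass. This is only a minimal sketch: the helper name to_two_level is hypothetical, and the sample data is a small stand-in for the question's keyed_var_pixels.

```python
from collections import OrderedDict


def to_two_level(flat):
    """Convert {(row, col): value} into {row: {col: value}}."""
    nested = {}
    for (row, col), value in flat.items():
        # Create the inner dict for this row on first sight, then store the column.
        nested.setdefault(row, {})[col] = value
    return nested


# Small stand-in for the question's keyed_var_pixels (assumed sample data).
keyed_var_pixels = OrderedDict(
    [((1783, 586), 0), ((1783, 587), 1), ((1784, 584), 5)]
)
d = to_two_level(keyed_var_pixels)

found = 1783 in d and 586 in d[1783]    # key present in both levels
missing = 1783 in d and 999 in d[1783]  # row present, column absent
```

The conversion costs one pass over the data, so it only pays off when many lookups follow.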

or you can define an empty default and do it like this:

notFound = dict()
# ... 
if 586 in d.get(1783,notFound):
   # ...

or this:

value = d.get(1783,notFound).get(586,None)
if value is not None:
   # ...
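
Which variant is faster can depend on the Python version and on the data, so it is worth measuring on your own machine. The sketch below is one way to do that with timeit. The 320 x 320 grid is synthetic data chosen only to give roughly 100,000 keys, and the names flat, nested, lookup_flat and lookup_nested are illustrative, not from the question.

```python
import timeit

# Synthetic data (an assumption): a 320 x 320 grid gives 102,400 keys.
rows, cols = 320, 320
flat = {}    # {(row, col): index}
nested = {}  # {row: {col: index}}
for r in range(rows):
    for c in range(cols):
        idx = r * cols + c
        flat[(r, c)] = idx
        nested.setdefault(r, {})[c] = idx

NOT_FOUND = {}  # shared empty default for rows that are absent


def lookup_flat(r, c):
    """Membership test with a tuple key."""
    return (r, c) in flat


def lookup_nested(r, c):
    """Membership test with the two-level dictionary."""
    return c in nested.get(r, NOT_FOUND)


n = 100_000
t_flat = timeit.timeit(lambda: lookup_flat(100, 200), number=n)
t_nested = timeit.timeit(lambda: lookup_nested(100, 200), number=n)
print(f"tuple key: {t_flat / n * 1e6:.3f} microseconds per lookup")
print(f"two-level: {t_nested / n * 1e6:.3f} microseconds per lookup")
```

If both variants come out far below the 0.02 seconds per lookup reported in the question, the slowdown probably lies somewhere other than the hash lookup itself, for example in the code surrounding it.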
