
If two values in a dictionary belong to the same key, do they belong to the same key in another dictionary?

I am trying to compare two dictionaries to check the accuracy on a big dataset.

I want to see whether two points that belong to the same key in dictionary 1 also belong to the same key in dictionary 2.

I have far too many points to do a double for loop with "if points in both dictionaries". I'm looking for a faster way to compare both dictionaries.

dict_1 has only one key for each point_id, whereas dict_2 can have multiple keys for one point_id.

Both dictionaries look like:

{key1 : [list of point id], key2 : [list of point id], etc}

dict_1 = {'key1' : [1,2,3,4,5,6], 'key2' : [7,8,9,10,11,12]}
dict_2 = {'key3' : [1,2,4,6,8,11,12], 'key4' :[2,5,7,9,10,11,12]}

def accuracy_from_dict_to_dict(dict_1, dict_2):
    total, truth = 0, 0
    for key_dict_1 in dict_1:
        point_of_key = dict_1.get(key_dict_1)
        i = 0
        while i < len(point_of_key):  # for each point of the key_dict_1 list
            j = i + 1
            while j < len(point_of_key):  # for each later point in the same list
                point_i = point_of_key[i]
                point_j = point_of_key[j]
                for key_dict_2 in dict_2:
                    # look inside the dict_2 list, not at the key itself
                    points_2 = dict_2[key_dict_2]
                    if point_i in points_2 and point_j in points_2:
                        truth += 1
                    total += 1
                j += 1  # advance once per pair, not once per dict_2 key
            i += 1
    return truth, total

The problem is not the code itself but the computation time. Unless the data set is small enough, it takes too long to run.

It looks like you are just checking 2-item combinations from both dicts. You can do this better using the itertools module from the standard library:

from itertools import combinations, chain

dict_1 = {'key1' : [1,2,3,4,5,6], 'key2' : [7,8,9,10,11,12]}  
dict_2 = {'key3' : [1,2,4,6,8,11,12], 'key4' :[2,5,7,9,10,11,12]}

c_dict1 = set(chain.from_iterable(combinations(v, 2) for v in dict_1.values()))
c_dict2 = set(chain.from_iterable(combinations(v, 2) for v in dict_2.values()))
total = len(c_dict1) + len(c_dict2)
similarity = len(c_dict1 & c_dict2) / total
print(total, similarity)

This prints:

69 0.2753623188405797
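The snippet above scores the overlap of the two pair sets against their combined size. If the goal is instead the question's original check (for each pair of points sharing a key in dict_1, do they share at least one key in dict_2?), one possible sketch is to invert dict_2 once into a point-to-keys lookup, so each pair costs a small set comparison rather than a scan of the dict_2 lists. The names `pair_agreement` and `keys_of_point` are made up for this illustration, and the choice to count a pair once if it shares any dict_2 key is an assumption about the intended metric:

```python
from itertools import combinations

def pair_agreement(dict_1, dict_2):
    """Fraction of same-key point pairs in dict_1 that also share a key in dict_2.

    Hypothetical helper: the metric definition is an assumption, not the asker's spec.
    """
    # Invert dict_2 once: point id -> set of dict_2 keys whose list contains it.
    keys_of_point = {}
    for key, points in dict_2.items():
        for point in points:
            keys_of_point.setdefault(point, set()).add(key)

    total = truth = 0
    empty = frozenset()  # used for points that never appear in dict_2
    for points in dict_1.values():
        for a, b in combinations(points, 2):
            total += 1
            # The pair agrees if both points have at least one dict_2 key in common.
            if not keys_of_point.get(a, empty).isdisjoint(keys_of_point.get(b, empty)):
                truth += 1
    return truth / total if total else 0.0

dict_1 = {'key1': [1, 2, 3, 4, 5, 6], 'key2': [7, 8, 9, 10, 11, 12]}
dict_2 = {'key3': [1, 2, 4, 6, 8, 11, 12], 'key4': [2, 5, 7, 9, 10, 11, 12]}
print(pair_agreement(dict_1, dict_2))
```

On the sample data, 19 of the 30 same-key pairs in dict_1 also share a key in dict_2. The loop over pairs is still quadratic in the size of each dict_1 list, so this only removes the repeated list scans. It does not depend on the order of points within each list, whereas comparing tuples from `combinations` does.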
