
What is the most efficient way of computing similarity between two dictionaries of lists?

I want to compute accuracy using set logic. I'll explain with an example:

For these two dictionaries:

d1 = {1: {'hello', 'goodbye'}, 2:{'sayonnara'}, 3:{'origami'}}
d2 = {1: {'goodbye'}, 2:{'hola', 'bye'}, 3:{'bird','origami','giraffe'}}

I want to get this result:

{1: 0.5, 2: 0, 3: 0.3333333333333333}

I'm doing it this way:

from collections import defaultdict
acc=defaultdict(list)
for (k,v1) in d1.items():
    for (k,v) in d2.items():
        nb=len(v1.intersection(v))
        if (nb>0):
            print(nb)
            acc[k] = nb/ (abs(len(v) - len(v1))+1)
            print(acc)
        if k not in acc.keys():
            acc[k] = 0

Is there a more efficient solution than this?

If we assume that both dicts have the same keys, this can be done in a single loop with a dict comprehension:

print({k1: (len(v1.intersection(d2[k1])) / (abs(len(v1) - len(d2[k1])) + 1))
       for k1, v1 in d1.items()})

This outputs:

{1: 0.5, 2: 0.0, 3: 0.3333333333333333}

This can be generalized a bit by iterating only over the keys common to both dicts, which avoids a KeyError when a key is missing from either one:

print({common_key: (len(d1[common_key].intersection(d2[common_key])) / (abs(len(d1[common_key]) - len(d2[common_key])) + 1))
       for common_key in d1.keys() & d2.keys()})
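
For readability, the common-keys version can also be wrapped in a small function. This is only a sketch: the name `set_similarity` is hypothetical, the formula is the one from the question, and sorting the keys is an added choice, since the intersection of two key views is a set with no guaranteed order (it assumes the keys are mutually comparable).

```python
def set_similarity(d1, d2):
    """Score every key shared by d1 and d2 using the formula from the question."""
    scores = {}
    # Sorting is an assumption added here for stable output order;
    # d1.keys() & d2.keys() is a plain set and has no guaranteed order.
    for key in sorted(d1.keys() & d2.keys()):
        a, b = d1[key], d2[key]
        # Shared elements divided by (difference in set sizes + 1).
        scores[key] = len(a & b) / (abs(len(a) - len(b)) + 1)
    return scores


d1 = {1: {'hello', 'goodbye'}, 2: {'sayonnara'}, 3: {'origami'}}
d2 = {1: {'goodbye'}, 2: {'hola', 'bye'}, 3: {'bird', 'origami', 'giraffe'}}

result = set_similarity(d1, d2)
print(result)  # {1: 0.5, 2: 0.0, 3: 0.3333333333333333}
```

Keys that appear in only one of the two dicts are skipped rather than scored as 0, which matches the comprehension above.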

