
What is the most efficient way of computing similarity between two dictionaries of sets?

I want to compute an accuracy score using set logic. I'll explain with an example:

For these two dictionaries:

d1 = {1: {'hello', 'goodbye'}, 2: {'sayonnara'}, 3: {'origami'}}
d2 = {1: {'goodbye'}, 2: {'hola', 'bye'}, 3: {'bird', 'origami', 'giraffe'}}

I want to get this result:

{1: 0.5, 2: 0, 3: 0.3333333333333333}

I'm doing it this way:

acc = {}
for k1, v1 in d1.items():
    # Nested loop: every set in d1 is compared with every set in d2
    for k2, v2 in d2.items():
        nb = len(v1.intersection(v2))
        if nb > 0:
            # Shared items divided by (size difference + 1), stored under d2's key
            acc[k2] = nb / (abs(len(v2) - len(v1)) + 1)
        if k2 not in acc:
            acc[k2] = 0

Is there a more efficient solution than this?

Assuming both dicts have the same keys, a single loop in a dict comprehension is enough, because each set in d1 only needs to be compared with the set stored under the same key in d2:

print({k1: (len(v1.intersection(d2[k1])) / (abs(len(v1) - len(d2[k1])) + 1))
       for k1, v1 in d1.items()})

outputs

{1: 0.5, 2: 0.0, 3: 0.3333333333333333}
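The same-keys assumption matters because the comprehension looks up d2[k1] directly. As a small sketch of what happens otherwise (the trimmed-down d2 and the helper name score are made up for illustration, not part of the question):

```python
d1 = {1: {'hello', 'goodbye'}, 2: {'sayonnara'}}
d2 = {1: {'goodbye'}}  # hypothetical input: key 2 is missing on purpose


def score(a, b):
    # Metric from the question: shared items / (size difference + 1)
    return len(a & b) / (abs(len(a) - len(b)) + 1)


missing_key = None
try:
    result = {k: score(v, d2[k]) for k, v in d1.items()}
except KeyError as exc:
    # d2[2] does not exist, so the lookup fails on the second key
    missing_key = exc.args[0]
    print("missing key:", missing_key)
```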

This can be generalized by iterating only over the keys the two dicts have in common, which avoids a KeyError when one dict has a key the other lacks.

print({common_key: (len(d1[common_key].intersection(d2[common_key])) / (abs(len(d1[common_key]) - len(d2[common_key])) + 1))
       for common_key in d1.keys() & d2.keys()})
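Note that d1.keys() & d2.keys() is a set, so the order of keys in the result is not guaranteed, and keys present in only one dict are silently dropped. If those keys should instead appear with a score of 0, one possible variant is sketched below. The function name set_scores and the missing parameter are hypothetical, and sorting assumes the keys are mutually comparable.

```python
def set_scores(d1, d2, missing=0.0):
    """Per-key score: len(a & b) / (abs(len(a) - len(b)) + 1)."""
    scores = {}
    # Union of keys, sorted so the output order is deterministic
    for key in sorted(d1.keys() | d2.keys()):
        if key in d1 and key in d2:
            a, b = d1[key], d2[key]
            scores[key] = len(a & b) / (abs(len(a) - len(b)) + 1)
        else:
            # Key exists in only one dict: assumed here to score `missing`
            scores[key] = missing
    return scores


d1 = {1: {'hello', 'goodbye'}, 2: {'sayonnara'}, 3: {'origami'}}
d2 = {1: {'goodbye'}, 2: {'hola', 'bye'}, 3: {'bird', 'origami', 'giraffe'}}
print(set_scores(d1, d2))  # {1: 0.5, 2: 0.0, 3: 0.3333333333333333}
```

This is still one pass over the keys, with one set intersection per key.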
