Calculating the cosine similarity with dictionaries

I'm trying to calculate the cosine similarity of two vectors stored in the dictionaries dict_1 and dict_2. This is my code:

import math
from numpy import dot
def norma(dict):
    sqr_sum = 0.0
    for x in dict:
        sqr_sum += dict[x] * dict[x]
    
    return math.sqrt(sqr_sum)
        
def cosine_similarity(dict_1, dict_2):
    List1 = list(dict_1.values())
    List2 = list(dict_2.values())
    
    similarity = dot(List1,List2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)


if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"b": 4, "c": 5, "d": 6}))

The function norma() computes the Euclidean norm of a dictionary's values. When I run the code I get 0.97, but the expected output is approximately 0.7. What am I missing?

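Quick and dirty way.

The dot product has to pair values by key, not by their position in the dictionary. You don't have to calculate anything for keys that are not in both dictionaries: a key missing from one dictionary has an implicit value of 0 there, and constant * 0 = 0. The norms are still taken over all the values of each dictionary.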
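To make the difference concrete, here is a small sketch (illustrative only, reusing the two dictionaries from the question) that compares the positional pairing produced by list(dict.values()) with pairing by key. The variable names positional_dot and aligned_dot are made up for this example.

```python
# Illustration only: positional pairing versus key-aligned pairing.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"b": 4, "c": 5, "d": 6}

# Positional pairing, as in the question: "a" meets "b", "b" meets "c", "c" meets "d".
positional_dot = sum(x * y for x, y in zip(dict_1.values(), dict_2.values()))

# Key-aligned pairing: a missing key counts as 0, so only "b" and "c" contribute.
all_keys = dict_1.keys() | dict_2.keys()
aligned_dot = sum(dict_1.get(k, 0) * dict_2.get(k, 0) for k in all_keys)

print(positional_dot)  # 32
print(aligned_dot)     # 23
```

Dividing 32 by the product of the norms gives about 0.97, while 23 gives about 0.70. The corrected function below restricts the dot product to the keys the two dictionaries share.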

import math
import numpy as np

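def norma(dct):
    # Euclidean norm over all values of the dictionary
    return math.sqrt(sum(x * x for x in dct.values()))

def cosine_similarity(dict_1, dict_2):
    # Only keys present in both dictionaries contribute to the dot product
    intersecting_keys = dict_1.keys() & dict_2.keys()

    # Build both lists from the same key sequence so the values line up
    list_1 = [dict_1[k] for k in intersecting_keys]
    list_2 = [dict_2[k] for k in intersecting_keys]

    similarity = np.dot(list_1, list_2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)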


if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"c": 5, "b": 4, "d": 6}))

Outputs:

0.7
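If NumPy is not otherwise needed, the same idea can be written with the standard library alone. The following is a minimal sketch, not part of the original answer: the name cosine_similarity_sparse is hypothetical, and returning 0.0 when either vector has zero norm is an assumption made here to avoid a division by zero.

```python
import math

def cosine_similarity_sparse(vec_1, vec_2):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    # Iterate over the smaller dict; keys absent from the other contribute 0.
    small, large = sorted((vec_1, vec_2), key=len)
    dot = sum(value * large.get(key, 0) for key, value in small.items())

    # The norms use every value of each dict, not just the shared keys.
    norm_1 = math.sqrt(sum(v * v for v in vec_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in vec_2.values()))
    norm_product = norm_1 * norm_2

    # Assumption: define the similarity as 0.0 when either vector is all zeros or empty.
    if norm_product == 0:
        return 0.0
    return dot / norm_product

if __name__ == '__main__':
    result = cosine_similarity_sparse({"a": 1, "b": 2, "c": 3}, {"c": 5, "b": 4, "d": 6})
    print(round(result, 2))  # 0.7
```

Rounding is left to the caller here, so the function can also be used where more than two decimal places are needed.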

