I'm trying to calculate the cosine similarity of two vectors stored in the dictionaries dict_1 and dict_2. This is my code:

import math
from numpy import dot

def norma(dict):
    sqr_sum = 0.0
    for x in dict:
        sqr_sum += dict[x] * dict[x]
    return math.sqrt(sqr_sum)

def cosine_similarity(dict_1, dict_2):
    List1 = list(dict_1.values())
    List2 = list(dict_2.values())
    similarity = dot(List1,List2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)

if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"b": 4, "c": 5, "d": 6}))

The function norma() calculates the norm of a dict. When I execute the code, I get the output 0.97, but the expected output is approximately 0.7. What am I missing?
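Quick and dirty way.
list(dict_1.values()) pairs the values by position and ignores the keys, so the value for "a" gets multiplied with the value for "b", and so on. The dot product should only multiply values that share a key. You don't have to calculate anything for keys that are not in both dictionaries: a missing key counts as 0, and constant * 0 = 0.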
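To make the difference concrete, here is a small illustration using the inputs from the question. The variable names (positional, aligned, all_keys) are made up for this sketch and are not part of the original code:

```python
# Illustration only: how the question's inputs get paired up.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"b": 4, "c": 5, "d": 6}

# Positional pairing, which is what list(d.values()) leads to: keys are ignored.
positional = list(zip(dict_1.values(), dict_2.values()))

# Key-aligned pairing: a key missing from one dict counts as 0.
all_keys = sorted(dict_1.keys() | dict_2.keys())
aligned = [(dict_1.get(k, 0), dict_2.get(k, 0)) for k in all_keys]

positional_dot = sum(x * y for x, y in positional)  # 1*4 + 2*5 + 3*6
aligned_dot = sum(x * y for x, y in aligned)        # 1*0 + 2*4 + 3*5 + 0*6

print(positional_dot, aligned_dot)
```

The positional dot product is 32 and the key-aligned one is 23. Both norms are the same either way, which is why only the numerator needs fixing. The code that follows applies the alignment by restricting both vectors to the shared keys.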

import math
import numpy as np

def norma(dct):
    # Euclidean norm over all values, including keys the other dict lacks
    return math.sqrt(sum(x*x for x in dct.values()))

def cosine_similarity(dict_1, dict_2):
    # Only keys present in both dicts contribute to the dot product
    intersecting_keys = list(dict_1.keys() & dict_2.keys())
    List1 = list(dict_1[k] for k in intersecting_keys)
    List2 = list(dict_2[k] for k in intersecting_keys)
    similarity = np.dot(List1,List2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)

if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"c": 5, "b": 4, "d": 6}))

Outputs:
0.7
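
If you would rather not depend on numpy for this, the same idea can be sketched with the standard library only. The function name cosine_similarity_sparse is hypothetical, and returning 0.0 for an all-zero or empty vector is an assumption made here (the similarity is undefined in that case), not something from the original code:

```python
import math

def cosine_similarity_sparse(vec_1, vec_2):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    # Only keys present in both dicts give a non-zero product.
    shared = vec_1.keys() & vec_2.keys()
    dot = sum(vec_1[k] * vec_2[k] for k in shared)

    norm_1 = math.sqrt(sum(v * v for v in vec_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in vec_2.values()))

    # Assumption: treat a zero-length vector as having similarity 0.0
    # instead of raising ZeroDivisionError.
    if norm_1 == 0 or norm_2 == 0:
        return 0.0
    return dot / (norm_1 * norm_2)

result = cosine_similarity_sparse({"a": 1, "b": 2, "c": 3}, {"b": 4, "c": 5, "d": 6})
print(round(result, 2))
```

Rounding is left to the caller here so the function returns the full-precision value.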