First of all, I calculated the cosine similarities from a data frame, and the result is returned as an array object.
Assume this is my data frame:
    A  B  C  D  E
X1  0  0  1  0  1
X2  0  1  2  3  1
X3  0  1  1  0  1
Here is how I calculated the cosine similarity:
from sklearn.metrics.pairwise import cosine_similarity
df = df.drop(['colX'], axis=1)
cos_sim = cosine_similarity(df)
and it returns an array like this:
array([[0., 0., 1.],
       [0., 1., 2.],
       [0., 1., 1.]])
However, I would like to see the result like this:
    X1  X2  X3
X1   0   0   1
X2   0   1   2
X3   0   1   1
But because 'df' and 'cos_sim' have different shapes, I can't use this code:
df = df.set_index('colX')
v = cosine_similarity(df.values)
df[:] = v
df.reset_index()
The error says the lengths must match. Any suggestions on how to fix this issue?
I'm not exactly sure what you're trying to achieve here, but here is my best guess:
import pandas as pd
# the original df
df1 = pd.DataFrame({'index': ['X1','X2','X3'], 'A':[0,0,0], 'B':[0,1,1], 'C': [1,2,1], 'D': [0,3,0], 'E':[1,1,1]})
# the cosine_similarity df
df2 = pd.DataFrame({'index': ['X1','X2','X3'], 'X1':[0,0,0], 'X2':[0, 1,1], 'X3':[1,2,1]})
# note the 'index' column is a column, not the index.
# merge the 2, by default on the common column (i.e. the 'index' column)
df = df1.merge(df2)
df.set_index('index', inplace=True)
>        A  B  C  D  E  X1  X2  X3
index
X1       0  0  1  0  1   0   0   1
X2       0  1  2  3  1   0   1   2
X3       0  1  1  0  1   0   1   1
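If the goal is simply to get the similarity matrix back with the row labels on both axes, another option is to build a new data frame from the array rather than assigning into the old one. A minimal sketch, assuming the label column is called 'colX' as in the question and that cosine_similarity is the scikit-learn function from sklearn.metrics.pairwise; the sample values are the ones from the question, and the names sim and sim_df are only for illustration:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Sample frame from the question; 'colX' (assumed name) holds the row labels.
df = pd.DataFrame({
    'colX': ['X1', 'X2', 'X3'],
    'A': [0, 0, 0],
    'B': [0, 1, 1],
    'C': [1, 2, 1],
    'D': [0, 3, 0],
    'E': [1, 1, 1],
}).set_index('colX')

# cosine_similarity compares rows, so a (3, 5) frame yields a (3, 3) array.
sim = cosine_similarity(df.values)

# Build a new frame labelled by the row index on both axes,
# instead of assigning into df[:], which has 5 columns.
sim_df = pd.DataFrame(sim, index=df.index, columns=df.index)
print(sim_df.round(3))
```

The assignment df[:] = v in the question fails because df has 5 columns while v is 3 x 3; a row-wise similarity matrix is always square in the number of rows, so it needs its own frame.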