Apply column-wise custom function to 2 dataframes, creating a new dataframe
I have 2 dataframes:
import pandas as pd

df_up = pd.DataFrame({"u1": [2, -3, 5, 0],
                      "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])
df_tt = pd.DataFrame({"q1": [1, 0, 1, 0],
                      "q2": [1, 0, 1, 1],
                      "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])
I want to create a new dataframe that contains the cosine similarity between every column of df_up and every column of df_tt. Both dataframes have the same number of rows. Ideally, the solution would work with a custom function, such as:
from scipy import spatial

def cosine_similarity(array_1, array_2):
    return 1 - spatial.distance.cosine(array_1, array_2)
The result would look like this:
u1 u2
q1 0.8029 0.7745
q2 0.6556 0.4216
q3 -0.4866 0.0
Is there an "elegant" way of solving this, or is iterating through the 2 dataframes the only way?
Solution using cdist:
from scipy.spatial.distance import cdist

# cdist compares rows, so transpose to compare columns; 1 - distance gives similarity
ary = (1 - cdist(df_up.T.values, df_tt.T.values, metric='cosine')).T
df = pd.DataFrame(ary, columns=df_up.columns, index=df_tt.columns)
df
Out[258]:
u1 u2
q1 0.802955 0.774597
q2 0.655610 0.421637
q3 -0.486664 0.000000
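
For readers who want to see what the cosine metric amounts to here, the same table can be sketched with plain NumPy and pandas: scale each column to unit length, then take dot products. This is only an illustration, not part of the original answer; the names up_unit, tt_unit and cos_sim are made up for the example, and it assumes no column is all zeros (that would divide by zero).

```python
import numpy as np
import pandas as pd

df_up = pd.DataFrame({"u1": [2, -3, 5, 0], "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])
df_tt = pd.DataFrame({"q1": [1, 0, 1, 0], "q2": [1, 0, 1, 1], "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])

# Scale every column to unit Euclidean length (assumes no all-zero column).
up_unit = df_up / np.linalg.norm(df_up.to_numpy(), axis=0)
tt_unit = df_tt / np.linalg.norm(df_tt.to_numpy(), axis=0)

# Dot products of unit vectors are cosine similarities.
# Rows follow df_tt's columns, columns follow df_up's columns.
cos_sim = tt_unit.T.dot(up_unit)
print(cos_sim.round(6))
```

This only covers cosine similarity; for an arbitrary custom function it does not apply.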
A generic way is to use corr with a callable method, see below:
from scipy import spatial

# There was a typo in the original method: array_1, array_2
def cosine_similarity(array1, array2):
    return 1 - spatial.distance.cosine(array1, array2)

# corr computes every column pair; keep only the df_tt rows and df_up columns
output = (pd.concat([df_up, df_tt], axis=1)
            .corr(method=cosine_similarity)
            .drop(columns=df_tt.columns, index=df_up.columns))
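
Note that corr also evaluates the pairs inside each dataframe and, per the pandas documentation, returns a symmetric matrix with 1 on the diagonal regardless of the callable. If the custom function is not symmetric, or the extra pairs are a concern, a plain pairwise loop is a reasonable fallback. The sketch below is an assumption-laden illustration rather than part of the original answer: pairwise_apply is a hypothetical helper name, and it simply calls the function once per column pair.

```python
import pandas as pd
from scipy import spatial

df_up = pd.DataFrame({"u1": [2, -3, 5, 0], "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])
df_tt = pd.DataFrame({"q1": [1, 0, 1, 0], "q2": [1, 0, 1, 1], "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])

def cosine_similarity(array1, array2):
    return 1 - spatial.distance.cosine(array1, array2)

def pairwise_apply(left, right, func):
    """Hypothetical helper: call func(left column, right column) for every pair.

    Rows of the result follow right.columns, columns follow left.columns.
    """
    values = [[func(left[l].to_numpy(), right[r].to_numpy()) for l in left.columns]
              for r in right.columns]
    return pd.DataFrame(values, index=right.columns, columns=left.columns)

result = pairwise_apply(df_up, df_tt, cosine_similarity)
print(result.round(6))
```

This still iterates in Python, so for large frames and a metric that cdist supports, the cdist version is likely to be faster.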