Apply a column-wise custom function to 2 dataframes to create a new dataframe

Question

I have 2 dataframes:

import pandas as pd

df_up = pd.DataFrame({"u1": [2, -3, 5, 0],
                      "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])

df_tt = pd.DataFrame({"q1": [1, 0, 1, 0],
                      "q2": [1, 0, 1, 1],
                      "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])

I want to create a new dataframe that holds the cosine similarity between every column of df_up and every column of df_tt. Both dataframes have the same number of rows. Ideally, the solution would work with a custom function, such as:

from scipy import spatial
def cosine_similarity(array_1, array_2):
    return 1 - spatial.distance.cosine(array_1,array_2)

The result would look like this:

    u1       u2
q1  0.8029   0.7745
q2  0.6556   0.4216
q3  -0.4866  0.0

Is there an "elegant" way of solving this, or is iterating through the 2 dataframes the only way?

Answer 1

A solution using cdist:

from scipy.spatial.distance import cdist

# cdist compares rows, so transpose each frame to turn its columns into rows
ary = (1 - cdist(df_up.T.values, df_tt.T.values, metric='cosine')).T
df = pd.DataFrame(ary, columns=df_up.columns, index=df_tt.columns)
df
Out[258]: 
          u1        u2
q1  0.802955  0.774597
q2  0.655610  0.421637
q3 -0.486664  0.000000
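
Since the question asks for something that works with a custom function, it may be worth noting that cdist also accepts a Python callable as its metric. The sketch below is one possible variant, not part of the original answer; it passes the question's cosine_similarity directly, so the `1 - ...` step on the whole array is not needed. The name `result` is made up for illustration, and a callable metric is generally slower than a built-in string metric.

```python
import pandas as pd
from scipy.spatial.distance import cdist, cosine

def cosine_similarity(array_1, array_2):
    # Custom function from the question: similarity = 1 - cosine distance
    return 1 - cosine(array_1, array_2)

df_up = pd.DataFrame({"u1": [2, -3, 5, 0],
                      "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])
df_tt = pd.DataFrame({"q1": [1, 0, 1, 0],
                      "q2": [1, 0, 1, 1],
                      "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])

# cdist works on rows, so each frame is transposed first.
# Putting df_tt first makes its columns the rows of the result.
ary = cdist(df_tt.T.to_numpy(dtype=float),
            df_up.T.to_numpy(dtype=float),
            metric=cosine_similarity)

# `result` is an illustrative name, not taken from the answer
result = pd.DataFrame(ary, index=df_tt.columns, columns=df_up.columns)
print(result)
```

Any other function that takes two 1-D arrays and returns a number could be swapped in for cosine_similarity in the same way.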

Answer 2

A generic way is to use corr with a callable method, as shown below:

# There was a typo in the original method: array_1, array_2

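from scipy import spatial

def cosine_similarity(array1, array2):
    return 1 - spatial.distance.cosine(array1, array2)

# corr applies the callable to every pair of columns in the combined frame;
# dropping the df_tt columns and df_up rows leaves only the cross-frame block
output = (pd.concat([df_up, df_tt], axis=1)
            .corr(method=cosine_similarity)
            .drop(columns=df_tt.columns, index=df_up.columns))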
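
If the within-frame pairs that corr computes and then drops are unwanted, nested DataFrame.apply calls are another option. This is a minimal sketch, assuming both frames share the same row index; it is not taken from either answer, and the name `result` is illustrative.

```python
import pandas as pd
from scipy import spatial

def cosine_similarity(array_1, array_2):
    # Custom function from the question
    return 1 - spatial.distance.cosine(array_1, array_2)

df_up = pd.DataFrame({"u1": [2, -3, 5, 0],
                      "u2": [1, 0, 5, -2]},
                     index=["ta", "tb", "tc", "td"])
df_tt = pd.DataFrame({"q1": [1, 0, 1, 0],
                      "q2": [1, 0, 1, 1],
                      "q3": [0, 1, 0, 0]},
                     index=["ta", "tb", "tc", "td"])

# Outer apply walks the columns of df_up; for each one, the inner apply
# walks the columns of df_tt and returns a Series indexed by q1..q3.
# pandas assembles those Series into a frame with df_up's columns as
# columns and df_tt's columns as the index.
result = df_up.apply(
    lambda u: df_tt.apply(
        lambda q: cosine_similarity(u.to_numpy(), q.to_numpy())
    )
)
print(result)
```

This still iterates over column pairs under the hood, so it is mainly a readability choice rather than a performance one.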

Answer 1
2 2020-05-19 20:51:16

Answer 2
1 Accepted 2020-05-19 20:58:07
