

Python: How to pass DataFrame columns as parameters to a function?

I have a DataFrame df with two columns of text embeddings, embedding_1 and embedding_2. I want to create a third column in df named distances that contains the cosine similarity between embedding_1 and embedding_2 for each row.
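For reference, the cosine similarity of two vectors a and b is their dot product divided by the product of their Euclidean norms, (a · b) / (|a| |b|). A minimal NumPy sketch for a single pair of embeddings follows; the helper name cosine is hypothetical and is not part of scikit-learn.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two vectors: (a . b) / (|a| * |b|)."""
    # Flatten so that 1 x n arrays such as [[...]] are treated as plain vectors
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine([1.0, 0.0], [0.0, 1.0]))      # orthogonal vectors
print(cosine([[1.0, 2.0]], [[2.0, 4.0]]))  # parallel vectors stored as 1 x 2 arrays
```

The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).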

But when I try to implement this with the following code, I get a ValueError.

How can I fix it?

DataFrame df:

           embedding_1              |            embedding_2                                 
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.49163356, -0.4877703,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.06686627, -0.75147504...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.42776933, -0.88310856,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.6520882, -1.049325,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-1.4216679, -0.8930428,...]]
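To experiment with this layout, here is a small hypothetical reconstruction. It assumes each cell holds a 1 x n NumPy array, as the [[...]] display suggests; the values and the dimension of 3 are made up, since the real embeddings are truncated above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3-dimensional embeddings stored as 1 x 3 arrays,
# mirroring the [[...]] cells shown above (real embeddings are much longer)
df = pd.DataFrame({
    "embedding_1": [np.array([[0.1, 0.2, 0.3]]) for _ in range(3)],
    "embedding_2": [
        np.array([[0.3, 0.2, 0.1]]),
        np.array([[0.1, 0.2, 0.3]]),
        np.array([[-0.1, -0.2, -0.3]]),
    ],
})

print(df["embedding_1"].dtype)          # object: each cell is a whole array
print(df.loc[0, "embedding_1"].shape)   # shape of a single cell
```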

Code to calculate cosine similarity:

from sklearn.metrics.pairwise import cosine_similarity

df['distances'] = cosine_similarity(df['embedding_1'], df['embedding_2'])

Error:

ValueError: setting an array element with a sequence.
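The full traceback isn't shown, so the following is an assumption about the cause: cosine_similarity expects 2-D numeric arrays, but df['embedding_1'] is an object column whose every cell is itself an array, so NumPy cannot turn the column into a float matrix. The sketch below uses made-up 1 x 3 embeddings to show the object dtype, and np.vstack as one way to stack the cells into a proper (rows, dims) matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical column: each cell holds a 1 x 3 array, as in the question
col = pd.Series([np.array([[0.1, 0.2, 0.3]]), np.array([[0.4, 0.5, 0.6]])])

print(col.dtype)  # object: NumPy sees a column of arrays, not a matrix

# np.vstack concatenates the 1 x 3 cells row-wise into one 2 x 3 float matrix
matrix = np.vstack(list(col))
print(matrix.shape, matrix.dtype)
```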

Required DataFrame:

           embedding_1              |            embedding_2             |  distances
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.49163356, -0.4877703,...]]   |    0.427
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.06686627, -0.75147504...]]   |    0.673
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.42776933, -0.88310856,...]]  |    0.882
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.6520882, -1.049325,...]]     |    0.665
 [[-0.28876397, -0.6367827, ...]]   |  [[-1.4216679, -0.8930428,...]]    |    0.312

You can use apply() with axis=1 to call cosine_similarity() on each row separately:

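from sklearn.metrics.pairwise import cosine_similarity

def cal_cosine_similarity(row):
    # Each cell is a 1 x n array, so cosine_similarity returns a 1 x 1 matrix;
    # [0, 0] pulls out the scalar similarity for this row
    return cosine_similarity(row['embedding_1'], row['embedding_2'])[0, 0]

df['distances'] = df.apply(cal_cosine_similarity, axis=1)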

Or, as a one-liner:

df['distances'] = df.apply(lambda row: cosine_similarity(row['embedding_1'], row['embedding_2'])[0, 0], axis=1)
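apply() calls cosine_similarity() once per row, which can be slow on large frames. As an alternative not taken from the answer above, the sketch below stacks each column into a matrix and computes every row's cosine similarity at once with NumPy. It assumes all cells hold embeddings of equal length (flat vectors or 1 x n arrays); rowwise_cosine and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def rowwise_cosine(df, col_a, col_b):
    """Cosine similarity between col_a and col_b for every row of df."""
    # np.vstack turns flat vectors or 1 x n cells into (n_rows, n_dims) matrices
    a = np.vstack(list(df[col_a])).astype(float)
    b = np.vstack(list(df[col_b])).astype(float)
    # Row-wise dot products divided by the product of the row norms
    dots = np.einsum("ij,ij->i", a, b)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms

# Hypothetical toy data: orthogonal pair in row 0, parallel pair in row 1
df = pd.DataFrame({
    "embedding_1": [np.array([[1.0, 0.0]]), np.array([[1.0, 2.0]])],
    "embedding_2": [np.array([[0.0, 1.0]]), np.array([[2.0, 4.0]])],
})
df["distances"] = rowwise_cosine(df, "embedding_1", "embedding_2")
print(df["distances"])
```

Both approaches should give the same values; the vectorized version simply avoids a Python-level function call per row.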

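Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.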

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM