Python: How to pass Dataframe Columns as parameters to a function?
I have a dataframe df with two columns of text embeddings, embedding_1 and embedding_2. I want to create a third column in df named distances, which should contain the cosine similarity between each row of embedding_1 and the corresponding row of embedding_2.
But when I try to implement this using the following code, I get a ValueError.
How can I fix it?
Dataframe df
embedding_1 | embedding_2
[[-0.28876397, -0.6367827, ...]] | [[-0.49163356, -0.4877703,...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.06686627, -0.75147504...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.42776933, -0.88310856,...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.6520882, -1.049325,...]]
[[-0.28876397, -0.6367827, ...]] | [[-1.4216679, -0.8930428,...]]
Code to Calculate Cosine Similarity
df['distances'] = cosine_similarity(df['embedding_1'], df['embedding_2'])
Error
ValueError: setting an array element with a sequence.
Required Dataframe
embedding_1 | embedding_2 | distances
[[-0.28876397, -0.6367827, ...]] | [[-0.49163356, -0.4877703,...]] | 0.427
[[-0.28876397, -0.6367827, ...]] | [[-0.06686627, -0.75147504...]] | 0.673
[[-0.28876397, -0.6367827, ...]] | [[-0.42776933, -0.88310856,...]] | 0.882
[[-0.28876397, -0.6367827, ...]] | [[-0.6520882, -1.049325,...]] | 0.665
[[-0.28876397, -0.6367827, ...]] | [[-1.4216679, -0.8930428,...]] | 0.312
You can use apply() to call cosine_similarity() on each row:
def cal_cosine_similarity(row):
    # Compare the two embeddings stored in this row
    return cosine_similarity(row['embedding_1'], row['embedding_2'])

df['distances'] = df.apply(cal_cosine_similarity, axis=1)
or as a one-liner:
df['distances'] = df.apply(lambda row: cosine_similarity(row['embedding_1'], row['embedding_2']), axis=1)
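For reference, here is a minimal end-to-end sketch of the row-wise approach. It assumes cosine_similarity comes from scikit-learn (sklearn.metrics.pairwise), which the post does not state. The embedding values are made up, and row_cosine is a hypothetical helper name. The original whole-column call most likely fails because each column is an object Series of nested arrays, which cannot be converted into a single numeric 2D array. Under the scikit-learn assumption, each row holds a 1 x n embedding, so cosine_similarity returns a 1 x 1 array per row. Indexing with [0, 0] stores a plain float, as in the required dataframe.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity  # assumption: scikit-learn's implementation

# Made-up 3-dimensional embeddings. Each cell is a nested 1 x 3 list,
# mirroring the [[...]] layout shown in the question.
df = pd.DataFrame({
    "embedding_1": [[[1.0, 0.0, 0.0]], [[1.0, 1.0, 0.0]]],
    "embedding_2": [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]],
})

def row_cosine(row):
    # Hypothetical helper: cosine_similarity returns a 2D array of shape (1, 1)
    # for two 1 x n inputs, so [0, 0] extracts the single value as a float.
    return cosine_similarity(row["embedding_1"], row["embedding_2"])[0, 0]

# axis=1 passes one row at a time, so each call sees a single pair of embeddings.
df["distances"] = df.apply(row_cosine, axis=1)

print(df["distances"].tolist())  # roughly [1.0, 0.707]
```

If the [0, 0] indexing is left out, each cell of distances may end up holding a 1 x 1 array instead of a number.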