Check similarity of texts in pandas dataframe
I have a dataframe:
Account  Message
454232   Hi, first example 1
321342   Now, second example
412295   hello, a new example 1 in the third row
432325   And now something completely different
I would like to check the similarity between the texts in the Message column. I would need to choose one of the messages as the source to test against (for example the first one) and create a new column with the output of the similarity test. If I had two lists, I would do the following:
import spacy
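# The 'en' shortcut was removed in spaCy 3; a model with word vectors (e.g. en_core_web_md) must be installed
spacyModel = spacy.load('en_core_web_md')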
list1 = ["Hi, first example 1"]
list2 = ["Now, second example","hello, a new example 1 in the third row","And now something completely different"]
list1SpacyDocs = [spacyModel(x) for x in list1]
list2SpacyDocs = [spacyModel(x) for x in list2]
similarityMatrix = [[x.similarity(y) for x in list1SpacyDocs] for y in list2SpacyDocs]
print(similarityMatrix)
But I do not know how to do the same in pandas, creating a new column with the similarity results.
Any suggestions?
I am not sure about spacy, but to compare one text with the other values in the column I would use .apply(), pass the matching function, and set axis=1 so that the function is called once per row. Here is an example using SequenceMatcher from the standard difflib module (I don't have spacy available right now).
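from difflib import SequenceMatcher
test = 'Hi, first example 1'  # source message to compare against
# axis=1 passes each row to the lambda; ratio() returns a value between 0 and 1
df['r'] = df.apply(lambda x: SequenceMatcher(None, test, x.Message).ratio(), axis=1)
print(df)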
Result:
   Account  Message                                  r
0  454232   Hi, first example 1                      1.000000
1  321342   Now, second example                      0.578947
2  412295   hello, a new example 1 in the third row  0.413793
3  432325   And now something completely different   0.245614
So in your case it will be a similar statement, but using your own similarity function instead of SequenceMatcher.
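To make the "swap in your own function" step concrete, here is a minimal sketch that separates the similarity measure from the pandas part. The helper names add_similarity_column and sequence_ratio, and the output column name similarity, are made up for this illustration and are not part of pandas or difflib. It applies the function to the Message column directly with Series.apply, which avoids the need for axis=1.

```python
from difflib import SequenceMatcher

import pandas as pd


def add_similarity_column(df, source, similarity, text_col="Message", out_col="similarity"):
    """Return a copy of df with a column scoring each text against `source`.

    `similarity` is any callable taking two strings and returning a number.
    (Hypothetical helper, written for this example only.)
    """
    result = df.copy()
    # Series.apply calls the function once per message in the column.
    result[out_col] = result[text_col].apply(lambda text: similarity(source, text))
    return result


def sequence_ratio(a, b):
    """Character-level similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, a, b).ratio()


df = pd.DataFrame({
    "Account": [454232, 321342, 412295, 432325],
    "Message": [
        "Hi, first example 1",
        "Now, second example",
        "hello, a new example 1 in the third row",
        "And now something completely different",
    ],
})

# Use the first message as the source text to compare against.
scored = add_similarity_column(df, df["Message"].iloc[0], sequence_ratio)
print(scored)
```

With spacy, the same helper should work by passing something like lambda a, b: nlp(a).similarity(nlp(b)) as the similarity argument, assuming nlp is a loaded pipeline that includes word vectors. I have not run that variant here.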