How do I calculate the Levenshtein distance between two Pandas DataFrame columns?
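I am trying to calculate the Levenshtein distance between two Pandas columns, but I am stuck. I am using the textdistance library. Here is a minimal, reproducible example: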
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
['passwrd', 'psword'],
['psw0rd', 'passwor']]
df=pd.DataFrame(attempts, columns=['password', 'attempt'])
password attempt
0 passw0rd pasw0rd
1 passwrd psword
2 psw0rd passwor
My poor attempt:
df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)
This is how the function works. It takes two strings as arguments:
levenshtein.distance('helloworld', 'heloworl')
Out[1]: 2
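For reference, the score counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a minimal pure-Python sketch of that definition, not textdistance's own implementation; the name `edit_distance` is hypothetical and only used for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    # Distances between the empty prefix of a and every prefix of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(edit_distance('helloworld', 'heloworl'))  # 2
```

The result agrees with the `levenshtein.distance('helloworld', 'heloworl')` call shown above.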
Maybe I am missing something, but is there a reason you don't want a plain lambda expression? This works for me:
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
['passwrd', 'psword'],
['psw0rd', 'passwor'],
['helloworld', 'heloworl']]
df=pd.DataFrame(attempts, columns=['password', 'attempt'])
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)
Out:
0 1
1 3
2 4
3 2
dtype: int64
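As for why the attempt in the question fails: `x['password'] + x['attempt']` joins the two strings into one, and `zip` over a single string yields one 1-tuple per character, so the `*` unpacking hands `distance` many tiny arguments instead of two strings. The sketch below only illustrates the argument shapes. It uses plain lists as stand-ins for the two columns, and `broken_args` is a hypothetical helper name.

```python
# Plain lists stand in for df['password'] and df['attempt'] (illustration only).
passwords = ['passw0rd', 'passwrd', 'psw0rd']
attempts = ['pasw0rd', 'psword', 'passwor']


def broken_args(password: str, attempt: str) -> list:
    """Arguments that *zip(password + attempt) would unpack into."""
    return list(zip(password + attempt))


first = broken_args(passwords[0], attempts[0])
print(len(first))  # 15 arguments, one per character of the joined string
print(first[:3])   # [('p',), ('a',), ('s',)]

# Zipping the two columns against each other yields the intended pairs.
pairs = list(zip(passwords, attempts))
print(pairs[0])    # ('passw0rd', 'pasw0rd')
```

With the DataFrame from the answer, the same pairing could probably be written without `apply` as `[levenshtein.distance(p, a) for p, a in zip(df['password'], df['attempt'])]`. That is an alternative sketch, not something the original answer proposes.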