Match pandas dataframe name columns to another dataframe's columns?
I'm very new to Python. How can I match the names in one text dataframe against another? (Please edit this question if I have asked it wrongly.)
For example, given this input data:
df1 =
    id       Names
0  123  Simpson J.
1  456  Snoop Dogg
df2 =
          Names
0  John Simpson
1    Snoop Dogg
2       S. Dogg
3       Mr Dogg
Is there a way (maybe using findall or match, or any Python package) to count how many times the name belonging to each id appears in df2, giving something close to this result:
result =
    id  Names_appeared
0  123               1
1  456               3
I'm looking for a brief explanation and some URLs to help me understand.
Here's an example using fuzzywuzzy, as IanS suggested:
import pandas as pd
from fuzzywuzzy import fuzz

def fuzz_count(shortList, longList, minScore):
    """ Count fuzz ratios greater than or equal to a minimum score. """
    results = []
    for s in shortList:
        # Score s against every candidate name (0 to 100)
        scores = [fuzz.ratio(s, l) for l in longList]
        count = sum(x >= minScore for x in scores)
        results.append(count)
    return results

data1 = {'id': pd.Series([123, 456]),
         'Names': pd.Series(['Simpson J.', 'Snoop Dogg'])}
data2 = {'Names': pd.Series(['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg'])}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

result = pd.DataFrame()
result['id'] = df1['id']
counts = fuzz_count(df1['Names'], df2['Names'], minScore=60)  # [1, 2]
result['Names_appeared'] = counts
print(result)
#     id  Names_appeared
# 0  123               1
# 1  456               2
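If you would rather not install anything, roughly the same counting can be sketched with difflib from the standard library. This is only an approximation of the approach above, not the fuzzywuzzy API: count_similar and min_ratio are names made up for this illustration, and SequenceMatcher.ratio() returns a value between 0 and 1 rather than a 0 to 100 score. Plain lists are used here to keep the sketch dependency-free; a pandas column such as df1['Names'] can be iterated the same way.

```python
from difflib import SequenceMatcher


def count_similar(short_names, long_names, min_ratio):
    """For each name in short_names, count the names in long_names whose
    SequenceMatcher ratio is at least min_ratio (a float in [0, 1]).

    count_similar and min_ratio are hypothetical names for this sketch.
    """
    counts = []
    for s in short_names:
        # ratio() is 2 * matched_chars / total_chars of both strings
        ratios = [SequenceMatcher(None, s, l).ratio() for l in long_names]
        counts.append(sum(r >= min_ratio for r in ratios))
    return counts


names1 = ['Simpson J.', 'Snoop Dogg']
names2 = ['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg']

counts = count_similar(names1, names2, min_ratio=0.6)
print(counts)  # [1, 2]
```

The count depends heavily on the cutoff. 'Mr Dogg' scores about 0.59 against 'Snoop Dogg', so it is excluded at 0.6; lowering min_ratio slightly (for example to 0.58) brings it in and gives 3 for id 456 on this sample data. A cutoff tuned on four names will not necessarily carry over to a real dataset, so it is worth checking a sample of matches by eye.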