Indexing and matching rows between two different dataframes using Pandas in Python
I have two dataframes.
df1,
Names
one two three
Sri is a good player
Ravi is a mentor
Kumar is a cricketer
df2,
values
sri
NaN
sri, is
kumar,cricketer
I am trying to get the rows in df1 that contain all of the items in each row of df2.
My expected output is
values Names
sri Sri is a good player
NaN
sri, is Sri is a good player
kumar,cricketer Kumar is a cricketer
I tried df1["Names"].str.contains("|".join(df2["values"].values.tolist())),
but I could not get the expected output because the values contain commas (","). Please help.
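To make the problem concrete, here is a minimal sketch of why the attempt falls short. It assumes the two frames are built as shown (the construction itself is not part of the original post) and that the NaN in df2 is a real missing value rather than a string.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frames described in the question
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# Joining the raw column fails because NaN is a float, not a string
try:
    "|".join(df2["values"].values.tolist())
    join_error = None
except TypeError as exc:
    join_error = exc

# Even with NaN dropped, "|" means "any of these literal pieces", and
# "kumar,cricketer" is searched for as one piece, comma included
pattern = "|".join(df2["values"].dropna())
mask = df1["Names"].str.contains(pattern, case=False)
print(mask.tolist())
```

The mask only says whether a name matches any piece of the pattern. It does not say which row of df2 matched, and the comma-separated rows never match as a whole.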
Using sets
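import numpy as np
import pandas as pd

# Turn each name into a set of lowercase words
s1 = df1.Names.dropna()
s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
a1 = s1.values
# Turn each comma-separated value into a set of lowercase words
s2 = df2['values'].dropna()
s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
a2 = s2.values

# ">=" on sets is a superset test, broadcast over every (value, name) pair;
# the extra all-True column makes argmax point at the appended NaN when no name matches
i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)
df2.assign(Names=pd.Series(
    np.append(df1.Names.values, np.nan)[i], s2.index
))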
values Names
0 sri Sri is a good player
1 NaN NaN
2 sri, is Sri is a good player
3 kumar,cricketer Kumar is a cricketer
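The same superset idea can also be written as an explicit loop, which may be easier to follow than the broadcast version. This is only a sketch: the helper name `first_match` is hypothetical, the frames are rebuilt here under the assumption that NaN is a real missing value, and whitespace is stripped around each comma-separated item instead of being removed everywhere.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frames described in the question
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# Pre-compute the lowercase word set of every name once
name_tokens = [set(name.lower().split()) for name in df1["Names"]]

def first_match(value):
    """Return the first name whose words include every item in value (hypothetical helper)."""
    if pd.isna(value):
        return np.nan
    wanted = {part.strip().lower() for part in value.split(",")}
    for name, tokens in zip(df1["Names"], name_tokens):
        if wanted <= tokens:  # subset test: every wanted word appears in the name
            return name
    return np.nan

result = df2.assign(Names=df2["values"].map(first_match))
print(result)
```

Because it compares whole words, this sketch would not match "is" against a name that only contains "this". Like the answer above, it keeps only the first matching name.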
import pandas as pd
names = [
'one two three',
'Sri is a good player',
'Ravi is a mentor',
'Kumar is a cricketer'
]
values = [
'sri',
'NaN',
'sri, is',
'kumar,cricketer',
]
names = pd.Series(names)
values = pd.DataFrame(values, columns=['values'])
def foo(words):
    names_copy = names.copy()
    for word in words.split(','):
        names_copy = names_copy[names_copy.str.contains(word, case=False)]
    return names_copy.values
values['names'] = values['values'].map(foo)
values
values names
0 sri [Sri is a good player]
1 NaN []
2 sri, is [Sri is a good player]
3 kumar,cricketer [Kumar is a cricketer]
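The answer above stores NaN as the string 'NaN' and passes each item to `str.contains` as a regular expression, with any leading space left in place. A slightly more defensive variant might look like the sketch below. The function name `matching_names` is hypothetical, and treating NaN as a real missing value is an assumption about the original data.

```python
import numpy as np
import pandas as pd

names = pd.Series([
    "one two three",
    "Sri is a good player",
    "Ravi is a mentor",
    "Kumar is a cricketer",
])
# Assumption: the missing entry is a real NaN, not the string 'NaN'
values = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

def matching_names(words):
    """Return every name containing all comma-separated items (hypothetical helper)."""
    if pd.isna(words):
        return []
    remaining = names
    for word in words.split(","):
        word = word.strip()  # drop the space left after a comma
        # regex=False treats the item as literal text, so characters such as "." are safe
        remaining = remaining[remaining.str.contains(word, case=False, regex=False)]
    return remaining.tolist()

values["names"] = values["values"].map(matching_names)
print(values)
```

This is still substring matching, so an item such as "is" would also match a name containing "this". Use the set-based approach if whole-word matching is required.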