Match string values in two different DataFrames and create a new column with match indicator in Pandas
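I have two DataFrames (df1, df2), and I want to create a new column in df1 that indicates, across multiple columns, whether each record is a match, a likely match, or a mismatch against df2.
DF1: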
id a b c d name
a1 94 18 10 20 b1
a2 20 18 1 2 b4,b5
a3 21 18 34 32 b2,b3,b4
a4 216 5 56 76 b5
a5 210 5 10 30 b4,b5
DF2:
id a b c d
b1 94 5 10 20
b2 A150 5 13 45
b3 167 5 4 -1
b4 210 5 40 80
b5 216 5 60 80
Basically, name holds the IDs of df2. I want to match df1's name against df2's id and create a new column based on the following conditions.
Match : df1['a','b','c','d'] = df2['a','b','c','d']
likely match : df1['a','b'] = df2['a','b'] & c or d +- 10 is fine
Missmatch: df1['a','b'] = df2['a','b'] but column c & d > +- 10
Missing: df1 record not in df2
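The four rules leave some room for interpretation, so here is a minimal sketch of how they might be pinned down for a single pair of records. The function name classify_pair and the tol parameter are hypothetical. The sketch assumes that "Likely" requires both c and d to be within 10, and that a pair whose a or b differs falls through to "Missing", since the rules above do not cover that case.

```python
def classify_pair(r1, r2, tol=10):
    """Classify one df1 record against one df2 record (plain dicts).

    Hypothetical helper: the tie-breaking choices below are assumptions,
    not something stated in the original rules.
    """
    # Assumption: if a or b differs, no rule applies, so report "Missing".
    if r1["a"] != r2["a"] or r1["b"] != r2["b"]:
        return "Missing"
    dc = abs(r1["c"] - r2["c"])
    dd = abs(r1["d"] - r2["d"])
    if dc == 0 and dd == 0:
        return "Match"      # a, b, c, d all equal
    if dc <= tol and dd <= tol:
        return "Likely"     # a, b equal; c and d both within the tolerance
    return "Missmatch"      # a, b equal; c or d off by more than the tolerance


a4 = {"a": 216, "b": 5, "c": 56, "d": 76}
b5 = {"a": 216, "b": 5, "c": 60, "d": 80}
print(classify_pair(a4, b5))  # Likely
```

The status label keeps the spelling "Missmatch" used in the post so that it lines up with the other snippets.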
Expected result:
id a b c d name Status
a1 94 18 10 20 b1 Match
a2 20 18 1 2 b2,b3 Missing
a3 21 18 34 32 b2,b3,b4 Missing
a4 210 5 10 30 b4,b5 Missmatch
a5 216 5 56 76 b5 Likely
Your expected result is wrong. You swapped the column values for df1['id'] == 'a4' and df1['id'] == 'a5', and the name column differs as well. Still, you can use np.select:
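import numpy as np

# Attach df1's comma-separated names to df2 as lists.
# Note: df1 and df2 are compared row by row, by position.
df2['name'] = df1['name'].str.split(',')
conditions = [
    # df2 id appears in the name list, and any of a, b, c, d is equal
    (df2.apply(lambda x: x['id'] in x['name'], axis=1) & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    # a or b is equal, and c or d is within 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) <= 10).any(axis=1)),
    # a or b is equal, and c or d differs by more than 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) > 10).any(axis=1)),
]
choices = [
    'Match',
    'Likely',
    'Missmatch',
]
# The first condition that holds wins; rows matching none become 'Missing'.
df1['Status'] = np.select(conditions, choices, default='Missing')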
Result:
id a b c d name Status
0 a1 94 18 10 20 b1 Match
1 a2 20 18 1 2 b4,b5 Missing
2 a3 21 18 34 32 b2,b3,b4 Missing
3 a4 216 5 56 76 b5 Likely
4 a5 210 5 10 30 b4,b5 Match
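The np.select snippet above pairs df1 and df2 by row position. If the intent is to pair each df1 row with the df2 rows whose id is listed in name, one possible sketch is to split and explode name, merge on the id, classify each pair, and keep the best status per df1 row. This is an illustration under stated assumptions, not the answer's method: "Likely" is taken to mean both c and d are within 10, pairs whose a or b differs fall through to "Missing", and the helper names pairs, rank, and best are made up here. The value 'A150' is kept verbatim from the post, so a is compared as strings.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],  # 'A150' kept as it appears in the post
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})

# One row per (df1 record, referenced df2 id); df2 columns get a "_2" suffix.
pairs = (
    df1.assign(name=df1["name"].str.split(","))
       .explode("name")
       .merge(df2, left_on="name", right_on="id", how="left", suffixes=("", "_2"))
)

# Compare a as strings because df2['a'] mixes integers with 'A150'.
same_ab = (pairs["a"].astype(str) == pairs["a_2"].astype(str)) & (pairs["b"] == pairs["b_2"])
dc = (pairs["c"] - pairs["c_2"]).abs()
dd = (pairs["d"] - pairs["d_2"]).abs()

conditions = [
    same_ab & (dc == 0) & (dd == 0),    # a, b, c, d all equal
    same_ab & (dc <= 10) & (dd <= 10),  # assumption: both c and d within 10
    same_ab,                            # a, b equal, c or d off by more than 10
]
choices = ["Match", "Likely", "Missmatch"]
pairs["Status"] = np.select(conditions, choices, default="Missing")

# Keep the best status per df1 id (Match beats Likely beats Missmatch beats Missing).
order = ["Match", "Likely", "Missmatch", "Missing"]
rank = {status: i for i, status in enumerate(order)}
best = pairs["Status"].map(rank).groupby(pairs["id"]).min()
df1["Status"] = df1["id"].map(best).map(dict(enumerate(order)))

print(df1[["id", "name", "Status"]])
```

Under these assumptions a4 comes out as Likely (via b5) and a5 as Missmatch (via b4), while a1 is Missing because its b value (18) differs from b1's (5). That differs from the positional result above for a1 and a5.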
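Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.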