

Match string values in two different Pandas DataFrames

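I have two dataframes (df1, df2), and I want to create a new column in df1 that indicates, across several columns, whether each record has a match, a likely match, or a mismatch between the two dataframes. DF1: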

id  a   b   c   d   name
a1  94  18  10  20  b1
a2  20  18  1   2   b4,b5
a3  21  18  34  32  b2,b3,b4
a4  216 5   56  76  b5
a5  210 5   10  30  b4,b5

DF2:

id  a   b   c   d
b1  94  5   10  20
b2  A150 5   13  45
b3  167 5   4   -1
b4  210 5   40  80
b5  216 5   60  80
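To reproduce the two tables in pandas, and to see which df1 rows point at which df2 ids through the comma-separated name column, here is a minimal sketch. The value "A150" is kept verbatim as a string, and the column suffixes _1/_2 are arbitrary choices for illustration.

```python
import pandas as pd

# DF1 and DF2 as shown in the question ("A150" kept as a string)
df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})

# One row per (df1 record, df2 id listed in its "name" field);
# _merge shows whether that id was actually found in df2
pairs = (
    df1.assign(name=df1["name"].str.split(","))
       .explode("name")
       .merge(df2, left_on="name", right_on="id", how="left",
              suffixes=("_1", "_2"), indicator=True)
)
print(pairs[["id_1", "name", "id_2", "_merge"]])
```

A left merge keeps df1 records whose ids are absent from df2; for those rows _merge is "left_only" and the df2 columns are NaN.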

Basically, name contains df2's ids. I want to match df1's name against df2's id and create the new column based on the following conditions:

Match: df1['a','b','c','d'] == df2['a','b','c','d']
Likely match: df1['a','b'] == df2['a','b'], and c or d is within ±10
Missmatch: df1['a','b'] == df2['a','b'], but both c and d differ by more than ±10
Missing: the df1 record is not in df2
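The four rules can be read as a decision applied to one df1 record and one df2 record at a time. A minimal sketch, where classify is a hypothetical helper; treating "within 10" as inclusive and returning "Missing" when a or b differ (a case the rules do not cover) are assumptions:

```python
def classify(r1, r2, tol=10):
    """Classify one df1 record against one df2 record (hypothetical helper).

    r1 and r2 are mappings with keys "a", "b", "c", "d";
    r2 is None when the referenced df2 id does not exist.
    """
    if r2 is None:
        return "Missing"                      # df1 record not in df2
    if r1["a"] != r2["a"] or r1["b"] != r2["b"]:
        return "Missing"                      # not covered by the rules (assumption)
    if r1["c"] == r2["c"] and r1["d"] == r2["d"]:
        return "Match"                        # a, b, c, d all equal
    if abs(r1["c"] - r2["c"]) <= tol or abs(r1["d"] - r2["d"]) <= tol:
        return "Likely"                       # c or d within tolerance
    return "Missmatch"                        # both c and d outside tolerance


a4 = {"a": 216, "b": 5, "c": 56, "d": 76}
b5 = {"a": 216, "b": 5, "c": 60, "d": 80}
print(classify(a4, b5))
```

Labels are spelled exactly as in the question so they line up with the expected output.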

Expected result:

id  a   b   c   d   name      Status
a1  94  18  10  20  b1        Match
a2  20  18  1   2   b2,b3     Missing
a3  21  18  34  32  b2,b3,b4  Missing
a4  210 5   10  30  b4,b5     Missmatch
a5  216 5   56  76  b5        Likely

Your expected result is wrong: you flipped the column values for df1['id'] == 'a4' and df1['id'] == 'a5', and the name values differ as well. Regardless, you can use np.select:

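import numpy as np
import pandas as pd

# df1 and df2 are the frames from the question.
# Note: this aligns df1 and df2 by row index (row i with row i).
df2['name'] = df1['name'].str.split(',')

conditions = [
    # Match: df2's id in this row appears in df1's name list, and any of a, b, c, d is equal
    ((df2.apply(lambda x: x['id'] in x['name'], axis=1)) & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    # Likely: a or b is equal, and c or d differs by at most 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) <= 10).any(axis=1)),
    # Missmatch: a or b is equal, and c or d differs by at least 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) >= 10).any(axis=1)),
]

choices = [
    'Match',
    'Likely',
    'Missmatch',
]

# Rows that satisfy none of the conditions are labeled 'Missing'
df1['Status'] = np.select(conditions, choices, default='Missing')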

Result:

    id  a   b   c   d   name      Status
0   a1  94  18  10  20  b1         Match
1   a2  20  18  1   2   b4,b5      Missing
2   a3  21  18  34  32  b2,b3,b4   Missing
3   a4  216 5   56  76  b5         Likely
4   a5  210 5   10  30  b4,b5      Match
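The np.select answer above lines df1 and df2 up by row index (row i of df1 against row i of df2) rather than looking up each name in df2, and its .any(axis=1) checks fire as soon as any single column is equal. A minimal sketch of an alternative: resolve the names first (split, explode, left-merge on df2's id), apply np.select to each (df1 record, df2 id) pair using the question's stated rules, then keep the strongest status per df1 record. The ranking Match > Likely > Missmatch > Missing, the inclusive tolerance of 10, and the fallback to "Missing" when a or b differ are assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})

# One row per (df1 record, df2 id it names); missing ids give NaN df2 columns
pairs = (
    df1.assign(name=df1["name"].str.split(","))
       .explode("name")
       .merge(df2, left_on="name", right_on="id", how="left", suffixes=("_1", "_2"))
)

ab_equal = (pairs["a_1"] == pairs["a_2"]) & (pairs["b_1"] == pairs["b_2"])
cd_equal = (pairs["c_1"] == pairs["c_2"]) & (pairs["d_1"] == pairs["d_2"])
near = ((pairs["c_1"] - pairs["c_2"]).abs() <= 10) | ((pairs["d_1"] - pairs["d_2"]).abs() <= 10)

# np.select picks the first condition that holds for each pair
pairs["Status"] = np.select(
    [ab_equal & cd_equal, ab_equal & near, ab_equal],
    ["Match", "Likely", "Missmatch"],
    default="Missing",
)

# Keep the strongest status per df1 record (assumed ranking)
rank = {"Missing": 0, "Missmatch": 1, "Likely": 2, "Match": 3}
best = (
    pairs.assign(rank=pairs["Status"].map(rank))
         .sort_values("rank", ascending=False)
         .drop_duplicates("id_1")
         .set_index("id_1")["Status"]
)
df1["Status"] = df1["id"].map(best)
print(df1)
```

Under these rules a4 comes out Likely and a5 Missmatch, consistent with the observation that the expected output swapped those two rows; a1 is not a Match because its b value (18) differs from b1's (5).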
