
Adding data to columns in a dataframe based on condition on column values of another dataframe

I have an input dataframe, df1, in which column B can hold multiple comma-separated values:

    A   B         C D   E
0   a1  b1       c1 d3  e1
1   a1  b2,b3    c2 d4  e2
2   a2  b3       c3 d5  e3
3   a2  b2       c8 d6  e1
4   a2  b4,b1,b5 c4 d7  e2
5   a3  b4       c5 d3  e4
6   a4  b5       c6 d1  e5
7   a4  b6, b2   c1 d2  e1
8   a5  b6       c2 d7  e2

I have another dataframe, df2, to which I want to add the data from columns C and D of df1. In df2, column B holds only one value per row:

    A   B
0   a1  b1
1   a4  b6
2   a2  b1
3   a4  b2

I want an output dataframe that, for each row of df2, finds the row in df1 containing both its A value and its B value, and adds the C and D values from that df1 row. Desired output:

    A   B   C   D
0   a1  b1  c1  d3
1   a4  b6  c1  d2
2   a2  b1  c4  d7
3   a4  b2  c1  d2

The challenge for me is handling the multiple values in column B of df1 while matching on two columns of df1 to add C and D to df2. How can I do this?

You first need to explode column B so that each cell holds a single value rather than several comma-separated values. Use str.split and explode on column B, then merge.

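# Split B on commas, give each value its own row, and strip stray spaces
# (row 7 is "b6, b2"), so every (A, B) pair can be matched exactly.
exploded = (
    df1.assign(B=lambda x: x['B'].str.split(','))
       .explode('B')
       .assign(B=lambda x: x['B'].str.strip())
       [['A', 'B', 'C', 'D']]
)
res = df2.merge(exploded, on=['A', 'B'], how='left')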
print(res)
    A   B   C   D
0  a1  b1  c1  d3
1  a4  b6  c1  d2
2  a2  b1  c4  d7
3  a4  b2  c1  d2
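
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two sample frames from the question and applies the same split, explode, merge approach. The helper name add_c_and_d is hypothetical (it is not part of pandas), and the strip step is an assumption that the space in "b6, b2" is really present in the data.

```python
import pandas as pd

# Rebuild the question's sample frames (values copied from the post).
df1 = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a5"],
    "B": ["b1", "b2,b3", "b3", "b2", "b4,b1,b5", "b4", "b5", "b6, b2", "b6"],
    "C": ["c1", "c2", "c3", "c8", "c4", "c5", "c6", "c1", "c2"],
    "D": ["d3", "d4", "d5", "d6", "d7", "d3", "d1", "d2", "d7"],
    "E": ["e1", "e2", "e3", "e1", "e2", "e4", "e5", "e1", "e2"],
})
df2 = pd.DataFrame({
    "A": ["a1", "a4", "a2", "a4"],
    "B": ["b1", "b6", "b1", "b2"],
})


def add_c_and_d(left, right):
    """Hypothetical helper: left-merge C and D from `right` onto `left` by (A, B)."""
    exploded = (
        right.assign(B=right["B"].str.split(","))      # "b6, b2" -> ["b6", " b2"]
             .explode("B")                              # one row per list element
             .assign(B=lambda x: x["B"].str.strip())    # " b2" -> "b2"
             [["A", "B", "C", "D"]]                     # keep only the needed columns
    )
    # how="left" keeps every row of `left`, in its original order.
    return left.merge(exploded, on=["A", "B"], how="left")


res = add_c_and_d(df2, df1)
print(res)
```

Because the merge uses how='left', a row of df2 with no matching (A, B) pair in df1 is kept with NaN in C and D. If the same (A, B) pair occurred in more than one row of df1, the merge would return one output row per match.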


 