I have an input dataframe in which column B can contain multiple comma-separated values per row: df1
A B C D E
0 a1 b1 c1 d3 e1
1 a1 b2,b3 c2 d4 e2
2 a2 b3 c3 d5 e3
3 a2 b2 c8 d6 e1
4 a2 b4,b1,b5 c4 d7 e2
5 a3 b4 c5 d3 e4
6 a4 b5 c6 d1 e5
7 a4 b6, b2 c1 d2 e1
8 a5 b6 c2 d7 e2
There is another dataframe to which I want to add the data from columns C and D of df1. In this one, column B has only one value in each row: df2
A B
0 a1 b1
1 a4 b6
2 a2 b1
3 a4 b2
I want an output dataframe that, for each row of df2, finds the row in df1 containing both its A and B values and adds the C and D values from that row. Desired output:
A B C D
0 a1 b1 c1 d3
1 a4 b6 c1 d2
2 a2 b1 c4 d7
3 a4 b2 c1 d2
The challenge for me is handling the multiple values in column B of df1 while matching on two columns (A and B) to add C and D to df2. How can I do this?
You first need to split column B so that each row holds a single value rather than comma-separated values in one cell. Use str.split and explode on column B, then merge df2 with the result on A and B.
res = (
    df2.merge(
        # split on commas (tolerating spaces, as in 'b6, b2'), then one row per value
        df1.assign(B=lambda x: x['B'].str.split(r'\s*,\s*'))
           .explode('B')
           [['A', 'B', 'C', 'D']],
        on=['A', 'B'], how='left')
)
print(res)
A B C D
0 a1 b1 c1 d3
1 a4 b6 c1 d2
2 a2 b1 c4 d7
3 a4 b2 c1 d2
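For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds df1 and df2 from the tables in the question and applies the same split, explode, and merge steps. Note that the sample data contains 'b6, b2' with a space after the comma, so this sketch also strips whitespace after exploding; the `lookup` name is just an illustrative intermediate, not something from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a5"],
    "B": ["b1", "b2,b3", "b3", "b2", "b4,b1,b5", "b4", "b5", "b6, b2", "b6"],
    "C": ["c1", "c2", "c3", "c8", "c4", "c5", "c6", "c1", "c2"],
    "D": ["d3", "d4", "d5", "d6", "d7", "d3", "d1", "d2", "d7"],
    "E": ["e1", "e2", "e3", "e1", "e2", "e4", "e5", "e1", "e2"],
})
df2 = pd.DataFrame({
    "A": ["a1", "a4", "a2", "a4"],
    "B": ["b1", "b6", "b1", "b2"],
})

# One row per (A, B) pair: split the comma-separated cell into a list,
# explode the list into rows, then strip stray spaces such as ' b2'.
lookup = (
    df1.assign(B=df1["B"].str.split(","))
       .explode("B")
       .assign(B=lambda x: x["B"].str.strip())
       [["A", "B", "C", "D"]]
)

# A left merge keeps every row of df2, in df2's order.
res = df2.merge(lookup, on=["A", "B"], how="left")
print(res)
```

Because the merge uses how='left', any (A, B) pair in df2 with no match in df1 would show NaN in C and D instead of being dropped. If the same (A, B) pair appeared in more than one row of df1, the merge would return one output row per match.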