Merge two dataframes on value in column of df1 in comma separated values in column of df2 AND df1.Column2 = df2.Column2
Input: two DataFrames with the following values:
df1:
| Employee Name | EmployeeID | workDate |
|---------------|------------|------------|
| John | 2,22 | 2020-11-01 |
| John | 2,22 | 2020-11-02 |
| Kim | 3 | 2020-11-01 |
df2:
| EmployeeID | workDate | Hours |
|------------|------------|-------|
| 2 | 2020-11-01 | 8 |
| 22 | 2020-11-02 | 2 |
| 3 | 2020-11-01 | 10 |
I need to join these two DataFrames where df2.EmployeeID is one of the comma-separated values in df1.EmployeeID and df2.workDate == df1.workDate.
Output:
| Employee Name | EmployeeID | workDate | Hours |
|---------------|------------|------------|-------|
| John | 2,22 | 2020-11-01 | 8 |
| John | 2,22 | 2020-11-02 | 2 |
| Kim | 3 | 2020-11-01 | 10 |
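Split the values on commas and use DataFrame.explode to get one row per ID, then use DataFrame.merge with a left join, and finally aggregate back to the original rows with GroupBy.agg: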
# Cast df2's IDs to strings so they match the split values from df1
df2['EmployeeID'] = df2['EmployeeID'].astype(str)
df1 = (df1.assign(EmployeeID=df1['EmployeeID'].str.split(r'\s*,\s*'))
          .explode('EmployeeID')
          .merge(df2, on=['EmployeeID', 'workDate'], how='left')
          .groupby(['Employee Name', 'workDate'], as_index=False, sort=False)
          .agg({'EmployeeID': ','.join, 'Hours': 'sum'}))
print(df1)
  Employee Name    workDate EmployeeID  Hours
0          John  2020-11-01       2,22    8.0
1          John  2020-11-02       2,22    2.0
2           Kim  2020-11-01          3   10.0
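For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the tables in the question and splits the chain into named steps. It assumes workDate is stored as a string in both frames and that EmployeeID in df2 is an integer column. The intermediate names exploded, merged and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question
# (assumption: workDate is a plain string in both frames).
df1 = pd.DataFrame({
    "Employee Name": ["John", "John", "Kim"],
    "EmployeeID": ["2,22", "2,22", "3"],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
})
df2 = pd.DataFrame({
    "EmployeeID": [2, 22, 3],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
    "Hours": [8, 2, 10],
})

# The split IDs are strings, so the join key in df2 must be a string too.
df2["EmployeeID"] = df2["EmployeeID"].astype(str)

# Step 1: one row per (employee, single ID, date).
exploded = (
    df1.assign(EmployeeID=df1["EmployeeID"].str.split(r"\s*,\s*"))
       .explode("EmployeeID")
)

# Step 2: the left join keeps IDs with no hours on that date (Hours is NaN there).
merged = exploded.merge(df2, on=["EmployeeID", "workDate"], how="left")

# Step 3: collapse back to one row per employee and date;
# 'sum' skips NaN, and the IDs are re-joined with commas.
result = (
    merged.groupby(["Employee Name", "workDate"], as_index=False, sort=False)
          .agg({"EmployeeID": ",".join, "Hours": "sum"})
)
print(result)
```

Hours comes back as float because the left join introduces NaN for the ID/date pairs that have no match in df2. If the dates are real datetimes in one frame and strings in the other, the merge keys will not line up, so they would need to be converted to a common type first.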