
Merge two dataframes where a value in a df1 column appears among the comma-separated values in a df2 column AND df1.Column2 = df2.Column2

Input: two dataframes with the following values:

df1:

| Employee Name | EmployeeID | workDate   |
|---------------|------------|------------|
| John          | 2,22       | 2020-11-01 |
| John          | 2,22       | 2020-11-02 |
| Kim           | 3          | 2020-11-01 |

df2:

| EmployeeID | workDate   | Hours |
|------------|------------|-------|
| 2          | 2020-11-01 | 8     |
| 22         | 2020-11-02 | 2     |
| 3          | 2020-11-01 | 10    |

I need to join these two dataframes so that a row matches when df2.EmployeeID is one of the comma-separated IDs in df1.EmployeeID and df2.workDate == df1.workDate.

Output:

| Employee Name | EmployeeID | workDate   | Hours |
|---------------|------------|------------|-------|
| John          | 2,22       | 2020-11-01 | 8     |
| John          | 2,22       | 2020-11-02 | 2     |
| Kim           | 3          | 2020-11-01 | 10    |

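Split the values on `,` and use DataFrame.explode to give each ID its own row, then use DataFrame.merge with a left join, and finally aggregate the rows back together with GroupBy.agg: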
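To make the split-and-explode step concrete, here is a minimal sketch that applies it to the question's df1 on its own. The frame is rebuilt by hand from the table in the question, and the name `exploded` is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question's df1 table (an assumption
# about how the data is stored: EmployeeID holds comma-separated strings).
df1 = pd.DataFrame({
    "Employee Name": ["John", "John", "Kim"],
    "EmployeeID": ["2,22", "2,22", "3"],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
})

# Split each string into a list of IDs (tolerating spaces around the comma),
# then give every list element its own row; other columns are repeated.
exploded = (
    df1.assign(EmployeeID=df1["EmployeeID"].str.split(r"\s*,\s*"))
       .explode("EmployeeID")
)
print(exploded)
```

After this step each row carries a single ID as a string, which is why df2.EmployeeID has to be cast to `str` before merging.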

# Convert to strings so the IDs match the split values
df2['EmployeeID'] = df2['EmployeeID'].astype(str)

df1 = (df1.assign(EmployeeID=df1['EmployeeID'].str.split(r'\s*,\s*'))
          .explode('EmployeeID')
          .merge(df2, on=['EmployeeID', 'workDate'], how='left')
          .groupby(['Employee Name', 'workDate'], as_index=False, sort=False)
          .agg({'EmployeeID': ','.join, 'Hours': 'sum'}))
print(df1)
  Employee Name    workDate EmployeeID  Hours
0          John  2020-11-01       2,22    8.0
1          John  2020-11-02       2,22    2.0
2           Kim  2020-11-01          3   10.0
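For anyone who wants to reproduce this end to end, the following sketch rebuilds both frames from the tables in the question and wraps the same steps in a function. The frame construction and the helper name `merge_on_id_list` are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Frames rebuilt by hand from the question's tables (assumed dtypes:
# df1.EmployeeID is a string column, df2.EmployeeID is an integer column).
df1 = pd.DataFrame({
    "Employee Name": ["John", "John", "Kim"],
    "EmployeeID": ["2,22", "2,22", "3"],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
})
df2 = pd.DataFrame({
    "EmployeeID": [2, 22, 3],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
    "Hours": [8, 2, 10],
})


def merge_on_id_list(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: left-join on any ID in left's comma-separated list."""
    # Cast the right-hand IDs to str so they compare equal to the split pieces.
    right = right.assign(EmployeeID=right["EmployeeID"].astype(str))
    return (
        left.assign(EmployeeID=left["EmployeeID"].str.split(r"\s*,\s*"))
            .explode("EmployeeID")                      # one row per ID
            .merge(right, on=["EmployeeID", "workDate"], how="left")
            .groupby(["Employee Name", "workDate"], as_index=False, sort=False)
            .agg({"EmployeeID": ",".join, "Hours": "sum"})  # collapse back
    )


result = merge_on_id_list(df1, df2)
print(result)
```

One thing to be aware of: with `'sum'`, a group in which no ID matched would show 0 hours rather than a missing value, so a different aggregation may be preferable if unmatched rows need to stay distinguishable.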
  
