Merge two dataframes on value in column of df1 in comma separated values in column of df2
Input: two dataframes with the following data. df1 (note that EmployeeID is a string of comma-separated values):
| Employee Name | EmployeeID |
|---------------|------------|
| John | 2, 22 |
| Kim | 3 |
df2:
| EmployeeID | Hours |
|------------|-------|
| 2 | 8 |
| 3 | 10 |
I want to merge df1 and df2 wherever df2.EmployeeID appears in the list of IDs in df1.EmployeeID.
Output:
| Employee Name | EmployeeID | Hours |
|---------------|------------|-------|
| John | 2, 22 | 8 |
| Kim | 3 | 10 |
If several IDs can match, for example EmployeeID = 2,3,22 giving Hours = 8 + 10, map through a dictionary, using split inside a generator expression together with sum:
# convert the IDs to strings so they match the pieces produced by split
df2['EmployeeID'] = df2['EmployeeID'].astype(str)
d = df2.set_index('EmployeeID')['Hours'].to_dict()
f = lambda x: sum(d[y] for y in x.split(', ') if y in d)
df1['Hours'] = df1['EmployeeID'].apply(f)
print(df1)
Employee Name EmployeeID Hours
0 John 2, 22 8
1 Kim 3 10
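To make the multi-match case concrete, here is a minimal self-contained sketch of the same dictionary lookup. The third row ("Ann", with IDs 2,3,22) is hypothetical and not part of the question's data, and the helper name `total_hours` is made up for illustration. Unlike the snippet above, it splits on a bare comma and strips whitespace, on the assumption that the IDs may be written either as `2, 22` or as `2,22`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Employee Name": ["John", "Kim", "Ann"],  # "Ann" is a hypothetical extra row
    "EmployeeID": ["2, 22", "3", "2,3,22"],
})
df2 = pd.DataFrame({"EmployeeID": [2, 3], "Hours": [8, 10]})

# Build an ID -> hours lookup with string keys, so the keys compare
# equal to the string pieces produced by split().
hours_by_id = dict(zip(df2["EmployeeID"].astype(str), df2["Hours"]))


def total_hours(ids: str) -> int:
    """Sum the hours of every ID in a comma-separated string.

    IDs that do not occur in df2 contribute 0.
    """
    return sum(hours_by_id.get(part.strip(), 0) for part in ids.split(","))


df1["Hours"] = df1["EmployeeID"].apply(total_hours)
print(df1)
```

Using `dict.get` with a default of 0 plays the same role as the `if y in d` filter: unknown IDs such as 22 are ignored rather than raising a `KeyError`.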
Another idea, matching on integers instead:
d = df2.set_index('EmployeeID')['Hours'].to_dict()
f = lambda x: sum(d[int(y)] for y in x.split(', ') if int(y) in d)
df1['Hours'] = df1['EmployeeID'].apply(f)
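If an actual merge is preferred over a dictionary lookup, one possible variation (not taken from the answer above) is to split the IDs into lists, expand them to one row per ID with `explode`, merge on the individual ID, and sum back per original row. This is only a sketch: it assumes pandas 0.25 or later for `DataFrame.explode`, and the names `_id` and `exploded` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Employee Name": ["John", "Kim"], "EmployeeID": ["2, 22", "3"]})
df2 = pd.DataFrame({"EmployeeID": [2, 3], "Hours": [8, 10]})

# One row per individual ID; reset_index() keeps the original row
# number in a column named "index" so the rows can be regrouped later.
exploded = (
    df1.assign(_id=df1["EmployeeID"].str.split(","))
       .explode("_id")
       .reset_index()
)
exploded["_id"] = exploded["_id"].str.strip().astype(int)

# Left merge on the single ID; IDs missing from df2 get NaN hours.
merged = exploded.merge(
    df2.rename(columns={"EmployeeID": "_id"}), on="_id", how="left"
)

# Sum per original row (NaN is skipped) and assign back by index alignment.
df1["Hours"] = merged.groupby("index")["Hours"].sum().astype(int)
print(df1)
```

This does more work than the lookup for small frames, but it keeps everything in DataFrame operations, which may be convenient when more columns from df2 have to be carried along.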