I have two DataFrames (DF1 and DF2). I want to fill a column in DF1 with the values from DF2 wherever two key columns match
I want to populate the 'Salary' column in DataFrame1 (DF1) with the corresponding 'Salary' in DataFrame2 (DF2). The rows need to match on both 'Team' AND 'Players'.
Note that the DataFrames are not the same size and not in the same order.
import pandas as pd
#df 1:
nba_data = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17','Houston Rockets', 'NewYorkKnicks17'],
'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
'James Harden', 'Derrick Rose',],
'Salary': ['0', '0', '0','0', '0', '0', '0'],
'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', '']}
nba_df1 = pd.DataFrame(nba_data)
nba_df1
#df2:
nba_data2 = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', 'NewYorkKnicks17', 'NewYorkKnicks17', 'NewYorkKnicks17', 'Houston Rockets'],
'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Steph Curry', 'JR Smith', 'Derrick Rose',
'Kristaps Porzingis', 'James Harden'],
'Salary': ['3m', '126m', '0','115m', '0', '20m', '1.5m'],
'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', '', 'Tom Thibideou', 'Phil Jackson', '']}
nba_df2 = pd.DataFrame(nba_data2)
nba_df2
Desired result: nba_df1 with the appropriate salaries populated (run the code below):
nba_data3 = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17','Houston Rockets', 'NewYorkKnicks17'],
'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
'James Harden', 'Derrick Rose',],
'Salary': ['3m', '126m', '0','20m', '115m', '1.5m', '0'],
'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', '']}
nba_df1_adjusted = pd.DataFrame(nba_data3)
Kindly note: this is not a request for a tutorial; it is a specific question and therefore not a duplicate of a general tutorial.
agg = pd.merge(nba_df1, nba_df2, on = ['Players', 'Team'], how = 'left')
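Your result will be in the 'Salary_y' column of agg. Because both frames have a 'Salary' column, the merge renames the one from nba_df1 to 'Salary_x' and the one from nba_df2 to 'Salary_y'.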
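Since the looked-up salaries end up in a separate column, one way to fold them back into a single 'Salary' column is sketched below. It is only a sketch under some assumptions: the `suffixes=('', '_df2')` naming is my own choice, and it assumes each ('Team', 'Players') pair appears at most once in nba_df2. Rows with no match keep their original value.

```python
import pandas as pd

nba_df1 = pd.DataFrame({
    'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17', 'Houston Rockets', 'NewYorkKnicks17'],
    'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
                'James Harden', 'Derrick Rose'],
    'Salary': ['0', '0', '0', '0', '0', '0', '0'],
    'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', ''],
})
nba_df2 = pd.DataFrame({
    'Team': ['Mavericks', 'Mavericks', 'Mavericks', 'NewYorkKnicks17', 'NewYorkKnicks17', 'NewYorkKnicks17', 'Houston Rockets'],
    'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Steph Curry', 'JR Smith', 'Derrick Rose',
                'Kristaps Porzingis', 'James Harden'],
    'Salary': ['3m', '126m', '0', '115m', '0', '20m', '1.5m'],
    'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', '', 'Tom Thibideou', 'Phil Jackson', ''],
})

# A left merge keeps every row of nba_df1, in its original order.
# Only the key columns and 'Salary' are taken from nba_df2.
agg = pd.merge(nba_df1, nba_df2[['Team', 'Players', 'Salary']],
               on=['Players', 'Team'], how='left',
               suffixes=('', '_df2'))  # assumed naming: nba_df2's salary becomes 'Salary_df2'

# Use the matched salary where there is one, otherwise keep nba_df1's value.
agg['Salary'] = agg['Salary_df2'].fillna(agg['Salary'])
agg = agg.drop(columns='Salary_df2')
print(agg)
```

With the sample data, the 'Kristaps Porzingis' row whose 'Team' is empty finds no match on 'Team' and 'Players', so it stays at '0' rather than the '20m' shown in the desired result.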
Edit: Kind of dirty, but it works. It also matches on 'Players' and 'Coach' to fill in rows where 'Team' did not match:
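# First pass: match on Team and Players (the salary from nba_df2 lands in Salary_y)
agg = pd.merge(nba_df1, nba_df2[['Team', 'Players', 'Salary']], on = ['Players', 'Team'], how = 'left')
# Second pass: match on Players and Coach, for rows whose Team did not match
agg2 = pd.merge(nba_df1, nba_df2, on = ['Players', 'Coach'], how = 'left')
merge = pd.merge(agg, agg2, on = ['Players', 'Coach'])
# Prefer the Team match, then the Coach match, then the original nba_df1 value
merge['Salary'] = merge['Salary_y_x'].fillna(merge['Salary_y_y']).fillna(merge['Salary_x_x'])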
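The two-pass idea above (match on 'Team' and 'Players' first, then fall back to 'Players' and 'Coach') can also be sketched without the chain of suffixed columns. This is a rough sketch, not a definitive solution: `lookup_salary` is a hypothetical helper introduced here for illustration, and it assumes the key combinations are unique in nba_df2, which `validate='many_to_one'` checks.

```python
import pandas as pd

nba_df1 = pd.DataFrame({
    'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17', 'Houston Rockets', 'NewYorkKnicks17'],
    'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
                'James Harden', 'Derrick Rose'],
    'Salary': ['0', '0', '0', '0', '0', '0', '0'],
    'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', ''],
})
nba_df2 = pd.DataFrame({
    'Team': ['Mavericks', 'Mavericks', 'Mavericks', 'NewYorkKnicks17', 'NewYorkKnicks17', 'NewYorkKnicks17', 'Houston Rockets'],
    'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Steph Curry', 'JR Smith', 'Derrick Rose',
                'Kristaps Porzingis', 'James Harden'],
    'Salary': ['3m', '126m', '0', '115m', '0', '20m', '1.5m'],
    'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', '', 'Tom Thibideou', 'Phil Jackson', ''],
})


def lookup_salary(left, right, keys):
    """Hypothetical helper: right['Salary'] aligned to left's rows (NaN where no match)."""
    matched = left[keys].merge(right[keys + ['Salary']], on=keys, how='left',
                               validate='many_to_one')  # raises if keys repeat in right
    # The merge result has a fresh index, so re-attach left's index by position.
    return pd.Series(matched['Salary'].to_numpy(), index=left.index)


by_team = lookup_salary(nba_df1, nba_df2, ['Team', 'Players'])
by_coach = lookup_salary(nba_df1, nba_df2, ['Players', 'Coach'])

result = nba_df1.copy()
# Priority: Team match, then Coach match, then the value already in nba_df1.
result['Salary'] = by_team.fillna(by_coach).fillna(nba_df1['Salary'])
print(result)
```

The output keeps the row order and the four columns of nba_df1; any row matched by neither pass keeps its original 'Salary'.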