Update table information based on columns of another table
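I am new to Python and have two DataFrames. df1 contains information about all students along with their group and score, and df2 contains updated information for the few students whose group and score have changed. How can I update the information in df1 based on the values in df2 (group and score)?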
df1
+----+----------+-----------+----------------+
| |student No| group | score |
|----+----------+-----------+----------------|
| 0 | 0 | 0 | 0.839626 |
| 1 | 1 | 0 | 0.845435 |
| 2 | 2 | 3 | 0.830778 |
| 3 | 3 | 2 | 0.831565 |
| 4 | 4 | 3 | 0.823569 |
| 5 | 5 | 0 | 0.808109 |
| 6 | 6 | 4 | 0.831645 |
| 7 | 7 | 1 | 0.851048 |
| 8 | 8 | 3 | 0.843209 |
| 9 | 9 | 4 | 0.84902 |
| 10 | 10 | 0 | 0.835143 |
| 11 | 11 | 4 | 0.843228 |
| 12 | 12 | 2 | 0.826949 |
| 13 | 13 | 0 | 0.84196 |
| 14 | 14 | 1 | 0.821634 |
| 15 | 15 | 3 | 0.840702 |
| 16 | 16 | 0 | 0.828994 |
| 17 | 17 | 2 | 0.843043 |
| 18 | 18 | 4 | 0.809093 |
| 19 | 19 | 1 | 0.85426 |
+----+----------+-----------+----------------+
df2
+----+-----------+----------+----------------+
| | group |student No| score |
|----+-----------+----------+----------------|
| 0 | 2 | 1 | 0.887435 |
| 1 | 0 | 19 | 0.81214 |
| 2 | 3 | 17 | 0.899041 |
| 3 | 0 | 8 | 0.853333 |
| 4 | 4 | 9 | 0.88512 |
+----+-----------+----------+----------------+
Result
df3
+----+----------+-----------+----------------+
| |student No| group | score |
|----+----------+-----------+----------------|
| 0 | 0 | 0 | 0.839626 |
| 1 | 1 | 2 | 0.887435 |
| 2 | 2 | 3 | 0.830778 |
| 3 | 3 | 2 | 0.831565 |
| 4 | 4 | 3 | 0.823569 |
| 5 | 5 | 0 | 0.808109 |
| 6 | 6 | 4 | 0.831645 |
| 7 | 7 | 1 | 0.851048 |
| 8 | 8 | 0 | 0.853333 |
| 9 | 9 | 4 | 0.88512 |
| 10 | 10 | 0 | 0.835143 |
| 11 | 11 | 4 | 0.843228 |
| 12 | 12 | 2 | 0.826949 |
| 13 | 13 | 0 | 0.84196 |
| 14 | 14 | 1 | 0.821634 |
| 15 | 15 | 3 | 0.840702 |
| 16 | 16 | 0 | 0.828994 |
| 17 | 17 | 3 | 0.899041 |
| 18 | 18 | 4 | 0.809093 |
| 19 | 19 | 0 | 0.81214 |
+----+----------+-----------+----------------+
My code to update df1 from df2:
dfupdated = df1.merge(df2, how='left', on=['student No'], suffixes=('', '_new'))
dfupdated['group'] = np.where(pd.notnull(dfupdated['group_new']), dfupdated['group_new'],
dfupdated['group'])
dfupdated['score'] = np.where(pd.notnull(dfupdated['score_new']), dfupdated['score_new'],
dfupdated['score'])
dfupdated.drop(['group_new', 'score_new'],axis=1, inplace=True)
dfupdated.reset_index(drop=True, inplace=True)
But I get the following error:
KeyError: "['group'] not in index"
I don't know what is going wrong.
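The question does not show how df1 and df2 were built, so the cause of the KeyError cannot be confirmed from the post. One thing worth checking is whether the column labels are exactly what the code expects. The sketch below assumes, purely for illustration, that the headers carry stray spaces; the frame `df` is hypothetical and not the asker's data.

```python
import pandas as pd

# Hypothetical frame whose headers carry stray spaces
# (an assumption for illustration, not the asker's actual data).
df = pd.DataFrame({"student No": [0, 1], " group ": [0, 0], " score ": [0.84, 0.85]})

# Print the exact labels; hidden whitespace shows up in the repr.
print(df.columns.tolist())

# Normalise the labels so that df["group"] can be found.
df.columns = df.columns.str.strip()
print("group" in df.columns)
```

If the printed labels already match `group` and `score` exactly, the problem lies elsewhere and this check does not apply.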
Try:
dfupdated = df1.merge(df2, on='student No', how='left')
dfupdated['group'] = dfupdated['group_y'].fillna(dfupdated['group_x'])
dfupdated['score'] = dfupdated['score_y'].fillna(dfupdated['score_x'])
dfupdated.drop(['group_x', 'group_y','score_x', 'score_y'], axis=1,inplace=True)
This will give you the result you want.
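For reference, here is a self-contained sketch of the same merge-then-fillna approach, run on a small subset of the question's data (students 0, 1, 8 and 19). The cast back to `int` is an addition not in the answer: a left merge introduces NaN for unmatched rows, which turns the `group` column into floats.

```python
import pandas as pd

# Subset of the question's df1 and df2.
df1 = pd.DataFrame({
    "student No": [0, 1, 8, 19],
    "group": [0, 0, 3, 1],
    "score": [0.839626, 0.845435, 0.843209, 0.85426],
})
df2 = pd.DataFrame({
    "group": [2, 0, 0],
    "student No": [1, 19, 8],
    "score": [0.887435, 0.81214, 0.853333],
})

# Left merge keeps every row of df1; overlapping columns get _x (df1) and _y (df2).
dfupdated = df1.merge(df2, on="student No", how="left")

# Prefer the df2 value where one exists, otherwise fall back to the df1 value.
for col in ["group", "score"]:
    dfupdated[col] = dfupdated[f"{col}_y"].fillna(dfupdated[f"{col}_x"])

# NaN from the merge made group a float column; restore the integer dtype.
dfupdated["group"] = dfupdated["group"].astype(int)

dfupdated = dfupdated.drop(columns=["group_x", "group_y", "score_x", "score_y"])
print(dfupdated)
```

Student 0 is not in df2 and keeps its original values, while students 1, 8 and 19 take their group and score from df2.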
To get the maximum score from each group:
dfupdated.groupby(['group'], sort=False)['score'].max()
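As a small illustration of the groupby step, the sketch below uses a four-row frame shaped like the updated result (the values are taken from the question's tables). The `idxmax` lookup at the end is an addition, shown in case the whole row of the top scorer is wanted rather than only the score.

```python
import pandas as pd

# Four rows shaped like the updated result (students 0, 1, 8, 19).
dfupdated = pd.DataFrame({
    "student No": [0, 1, 8, 19],
    "group": [0, 2, 0, 0],
    "score": [0.839626, 0.887435, 0.853333, 0.81214],
})

# Highest score per group; sort=False keeps groups in order of first appearance.
best = dfupdated.groupby(["group"], sort=False)["score"].max()
print(best)

# Optional: the full row of the top scorer in each group.
top_rows = dfupdated.loc[dfupdated.groupby("group")["score"].idxmax()]
print(top_rows)
```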