
Update table information based on columns of another table

I am new to Python and have two dataframes. df1 contains information about all students with their group and score, and df2 contains updated information for a few students who changed their group and score. How can I update the information in df1 based on the values in df2 (group and score)?

df1

   +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         0 |       0.845435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         3 |       0.843209 |
    |  9 |        9 |         4 |       0.84902  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         2 |       0.843043 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         1 |       0.85426  |
    +----+----------+-----------+----------------+

df2
+----+-----------+----------+----------------+
|    |   group   |student No|       score    |
|----+-----------+----------+----------------|
|  0 |         2 |        1 |       0.887435 |
|  1 |         0 |       19 |       0.81214  |
|  2 |         3 |       17 |       0.899041 |
|  3 |         0 |        8 |       0.853333 |
|  4 |         4 |        9 |       0.88512  |
+----+-----------+----------+----------------+

The expected result:

df3

   +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         2 |       0.887435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         0 |       0.853333 |
    |  9 |        9 |         4 |       0.88512  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         3 |       0.899041 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         0 |       0.81214  |
    +----+----------+-----------+----------------+

My code to update df1 from df2:

import numpy as np
import pandas as pd

# Left merge keeps all rows of df1; df2's overlapping columns get the '_new' suffix
dfupdated = df1.merge(df2, how='left', on=['student No'], suffixes=('', '_new'))
# Take the new value where one exists, otherwise keep the original
dfupdated['group'] = np.where(pd.notnull(dfupdated['group_new']), dfupdated['group_new'],
                              dfupdated['group'])
dfupdated['score'] = np.where(pd.notnull(dfupdated['score_new']), dfupdated['score_new'],
                              dfupdated['score'])
dfupdated.drop(['group_new', 'score_new'], axis=1, inplace=True)
dfupdated.reset_index(drop=True, inplace=True)

But I get the following error:

KeyError: "['group'] not in index"

I don't know what's wrong.

I ran the same code and got the expected result. Here is a different way to solve it.

Try:

dfupdated = df1.merge(df2, on='student No', how='left')
dfupdated['group'] = dfupdated['group_y'].fillna(dfupdated['group_x'])
dfupdated['score'] = dfupdated['score_y'].fillna(dfupdated['score_x'])
dfupdated.drop(['group_x', 'group_y','score_x', 'score_y'], axis=1,inplace=True)

This will give you the result you want.
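For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same merge-and-fillna approach. It uses only a subset of the rows from the question's tables, and the final cast back to int is my own addition (an assumption about what is wanted): the left merge introduces NaN for students without an update, which turns the group column into floats.

```python
import pandas as pd

# A subset of the rows shown in the question (not the full tables).
df1 = pd.DataFrame({
    "student No": [0, 1, 8, 9, 17, 19],
    "group": [0, 0, 3, 4, 2, 1],
    "score": [0.839626, 0.845435, 0.843209, 0.84902, 0.843043, 0.85426],
})
df2 = pd.DataFrame({
    "group": [2, 0, 3, 0, 4],
    "student No": [1, 19, 17, 8, 9],
    "score": [0.887435, 0.81214, 0.899041, 0.853333, 0.88512],
})

# Left merge keeps every row of df1; overlapping columns get _x/_y suffixes.
dfupdated = df1.merge(df2, on="student No", how="left")

for col in ["group", "score"]:
    # Prefer the updated value (_y); fall back to the original (_x) where missing.
    dfupdated[col] = dfupdated[col + "_y"].fillna(dfupdated[col + "_x"])

dfupdated = dfupdated.drop(columns=["group_x", "group_y", "score_x", "score_y"])

# NaN from the merge makes "group" float; cast it back to int (assumed desired).
dfupdated["group"] = dfupdated["group"].astype(int)

print(dfupdated)
```

Students 1, 8, 9, 17 and 19 pick up their new group and score, while student 0, who has no row in df2, keeps the original values.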

To get the maximum score from each group:

dfupdated.groupby(['group'], sort=False)['score'].max()
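As a small illustration of that groupby call, here is a sketch on a hand-built frame holding a few of the updated rows from the expected result. The names max_per_group and best_rows are made up for this example, and the idxmax step is an optional extra in case you also want to know which student holds each maximum.

```python
import pandas as pd

# A few rows of the updated table (subset of the expected result above).
dfupdated = pd.DataFrame({
    "student No": [0, 1, 8, 9, 17, 19],
    "group": [0, 2, 0, 4, 3, 0],
    "score": [0.839626, 0.887435, 0.853333, 0.88512, 0.899041, 0.81214],
})

# Highest score per group; sort=False keeps groups in order of first appearance.
max_per_group = dfupdated.groupby("group", sort=False)["score"].max()

# Optional: the full row of the top-scoring student in each group.
best_rows = dfupdated.loc[dfupdated.groupby("group", sort=False)["score"].idxmax()]

print(max_per_group)
print(best_rows)
```

The first result is a Series indexed by group; the second keeps all columns, so the student number is available alongside the score.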
