
Update table information based on columns of another table

I am new to Python and have two dataframes. df1 contains information about all students, with their group and score. df2 contains updated information for the few students who changed their group and score. How can I update the information in df1 (group and score) based on the values in df2?

df1

   +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         0 |       0.845435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         3 |       0.843209 |
    |  9 |        9 |         4 |       0.84902  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         2 |       0.843043 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         1 |       0.85426  |
    +----+----------+-----------+----------------+

df2
+----+-----------+----------+----------------+
|    |   group   |student No|       score    |
|----+-----------+----------+----------------|
|  0 |         2 |        1 |       0.887435 |
|  1 |         0 |       19 |       0.81214  |
|  2 |         3 |       17 |       0.899041 |
|  3 |         0 |        8 |       0.853333 |
|  4 |         4 |        9 |       0.88512  |
+----+-----------+----------+----------------+

The expected result

df3

   +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         2 |       0.887435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         0 |       0.853333 |
    |  9 |        9 |         4 |       0.88512  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         3 |       0.899041 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         0 |       0.81214  |
    +----+----------+-----------+----------------+

My code to update df1 from df2:

import numpy as np
import pandas as pd

# Left merge keeps every row of df1; overlapping df2 columns get the '_new' suffix.
dfupdated = df1.merge(df2, how='left', on=['student No'], suffixes=('', '_new'))
# Use the new value where df2 has one, otherwise keep the original value.
dfupdated['group'] = np.where(pd.notnull(dfupdated['group_new']), dfupdated['group_new'],
                              dfupdated['group'])
dfupdated['score'] = np.where(pd.notnull(dfupdated['score_new']), dfupdated['score_new'],
                              dfupdated['score'])
dfupdated.drop(['group_new', 'score_new'], axis=1, inplace=True)
dfupdated.reset_index(drop=True, inplace=True)

But I get the following error:

KeyError: "['group'] not in index"
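
A KeyError of this form means pandas could not find a column labelled exactly 'group'. The post does not show how df1 and df2 were built, so the actual cause is not known. As a rough sketch, assuming (hypothetically) that the column labels carry stray whitespace, the labels can be inspected and normalised like this:

```python
import pandas as pd

# Hypothetical frame whose column labels carry stray spaces.
# This is an assumption for illustration, not something the post confirms.
df1 = pd.DataFrame({
    "student No": [0, 1],
    " group ": [0, 0],
    " score ": [0.839626, 0.845435],
})

# Print the exact labels; repr-style output makes extra spaces visible.
print(df1.columns.tolist())

# Strip leading/trailing whitespace from every column label.
df1.columns = df1.columns.str.strip()

has_group = "group" in df1.columns
```

If the printed labels already match 'group' and 'score' exactly, the problem lies elsewhere.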

I don't know what's wrong: I ran the same code and got the expected result. Here is a different way to solve it.

Try:

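# Overlapping columns get the default suffixes: '_x' (from df1) and '_y' (from df2).
dfupdated = df1.merge(df2, on='student No', how='left')
# Prefer df2's value; fall back to df1's where df2 has no row for that student.
# 'group_y' is float because of the NaNs from the left merge, so cast back to int.
dfupdated['group'] = dfupdated['group_y'].fillna(dfupdated['group_x']).astype(int)
dfupdated['score'] = dfupdated['score_y'].fillna(dfupdated['score_x'])
dfupdated.drop(['group_x', 'group_y', 'score_x', 'score_y'], axis=1, inplace=True)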

This will give you the result you want.
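
Another option, shown here only as a minimal sketch: stack the two frames and keep the last row per student, so that rows from df2 win. It assumes 'student No' identifies a student uniquely within each frame. The sample rows are taken from the tables in the question, and the name `updated` is just illustrative.

```python
import pandas as pd

# A few rows from df1 in the question.
df1 = pd.DataFrame({
    "student No": [0, 1, 2],
    "group": [0, 0, 3],
    "score": [0.839626, 0.845435, 0.830778],
})

# One update row from df2 (note the different column order, as in the question).
df2 = pd.DataFrame({
    "group": [2],
    "student No": [1],
    "score": [0.887435],
})

# concat aligns columns by name; df2 rows come last, so keep="last"
# retains the updated row whenever a student appears in both frames.
updated = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates(subset="student No", keep="last")
    .sort_values("student No")
    .reset_index(drop=True)
)
print(updated)
```

Because no NaN values are introduced along the way, the 'group' column keeps its integer dtype.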

To get the maximum score in each group:

dfupdated.groupby(['group'], sort=False)['score'].max()
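
The groupby above returns only the maximum score per group. If the whole row of the top-scoring student is wanted as well, one possible sketch uses `idxmax`. The rows below are the first five rows of the expected result in the question; `best_rows` is an illustrative name.

```python
import pandas as pd

# First five rows of the expected result from the question.
dfupdated = pd.DataFrame({
    "student No": [0, 1, 2, 3, 4],
    "group": [0, 2, 3, 2, 3],
    "score": [0.839626, 0.887435, 0.830778, 0.831565, 0.823569],
})

# Maximum score per group (a Series indexed by group).
max_per_group = dfupdated.groupby("group")["score"].max()

# idxmax gives the row label of the highest score in each group;
# .loc then pulls the full rows for those labels.
best_rows = dfupdated.loc[dfupdated.groupby("group")["score"].idxmax()]
print(best_rows)
```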
