Update table information based on columns of another table

Question

I am new to Python and have two dataframes. df1 contains information about all students, with their group and score, and df2 contains updated information for the few students who changed their group and score. How can I update the information in df1 based on the values in df2 (group and score)?

df1

    +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         0 |       0.845435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         3 |       0.843209 |
    |  9 |        9 |         4 |       0.84902  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         2 |       0.843043 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         1 |       0.85426  |
    +----+----------+-----------+----------------+

df2
+----+-----------+----------+----------------+
|    |   group   |student No|       score    |
|----+-----------+----------+----------------|
|  0 |         2 |        1 |       0.887435 |
|  1 |         0 |       19 |       0.81214  |
|  2 |         3 |       17 |       0.899041 |
|  3 |         0 |        8 |       0.853333 |
|  4 |         4 |        9 |       0.88512  |
+----+-----------+----------+----------------+

The result

df3

    +----+----------+-----------+----------------+
    |    |student No|   group   |       score    |
    |----+----------+-----------+----------------|
    |  0 |        0 |         0 |       0.839626 |
    |  1 |        1 |         2 |       0.887435 |
    |  2 |        2 |         3 |       0.830778 |
    |  3 |        3 |         2 |       0.831565 |
    |  4 |        4 |         3 |       0.823569 |
    |  5 |        5 |         0 |       0.808109 |
    |  6 |        6 |         4 |       0.831645 |
    |  7 |        7 |         1 |       0.851048 |
    |  8 |        8 |         0 |       0.853333 |
    |  9 |        9 |         4 |       0.88512  |
    | 10 |       10 |         0 |       0.835143 |
    | 11 |       11 |         4 |       0.843228 |
    | 12 |       12 |         2 |       0.826949 |
    | 13 |       13 |         0 |       0.84196  |
    | 14 |       14 |         1 |       0.821634 |
    | 15 |       15 |         3 |       0.840702 |
    | 16 |       16 |         0 |       0.828994 |
    | 17 |       17 |         3 |       0.899041 |
    | 18 |       18 |         4 |       0.809093 |
    | 19 |       19 |         0 |       0.81214  |
    +----+----------+-----------+----------------+

My code to update df1 from df2:

import numpy as np
import pandas as pd

# Left-merge so every df1 row is kept; overlapping df2 columns get the '_new' suffix
dfupdated = df1.merge(df2, how='left', on=['student No'], suffixes=('', '_new'))
# Take the updated value where one exists, otherwise keep the original
dfupdated['group'] = np.where(pd.notnull(dfupdated['group_new']),
                              dfupdated['group_new'], dfupdated['group'])
dfupdated['score'] = np.where(pd.notnull(dfupdated['score_new']),
                              dfupdated['score_new'], dfupdated['score'])
dfupdated.drop(['group_new', 'score_new'], axis=1, inplace=True)
dfupdated.reset_index(drop=True, inplace=True)

But I get the following error:

KeyError: "['group'] not in index"

Answer 1

I don't know what's wrong: I ran the same code and got the expected result. I'm giving a different way to solve it.
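Since the error could not be reproduced, one thing that may be worth checking is whether the column labels match exactly. This is only a guess, as the question does not confirm the cause: a `KeyError` mentioning `'group'` can appear when the real label carries stray spaces. The sketch below uses a hypothetical frame with padded headers (not the asker's actual data) to show how to inspect and clean the labels.

```python
import pandas as pd

# Hypothetical frame whose headers carry padding spaces.
# This is an assumption for illustration, not the asker's real df1.
df1 = pd.DataFrame({
    "student No": [0, 1, 2],
    "   group   ": [0, 0, 3],
    "       score    ": [0.839626, 0.845435, 0.830778],
})

# Print the exact labels so any padding becomes visible.
print(list(df1.columns))

# With padded labels, 'group' is not a column, so df1['group'] would fail.
has_group_before = "group" in df1.columns

# Trim surrounding whitespace from every label.
df1.columns = df1.columns.str.strip()
has_group_after = "group" in df1.columns

print(has_group_before, has_group_after)
```

If the labels already print as `'student No'`, `'group'` and `'score'`, the cause lies elsewhere and this step changes nothing.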

Try:

dfupdated = df1.merge(df2, on='student No', how='left')
dfupdated['group'] = dfupdated['group_y'].fillna(dfupdated['group_x'])
dfupdated['score'] = dfupdated['score_y'].fillna(dfupdated['score_x'])
dfupdated.drop(['group_x', 'group_y', 'score_x', 'score_y'], axis=1, inplace=True)

This will give you the result you want.
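For reference, here is a self-contained sketch of the same merge-and-fill approach on a small subset of the question's rows (students 0, 1, 8 and 19). The final cast back to `int` is an addition that is not part of the answer above: a left merge leaves `NaN` in unmatched rows, which makes the `group` column float.

```python
import pandas as pd

# A few rows taken from the question's df1 and df2.
df1 = pd.DataFrame({
    "student No": [0, 1, 8, 19],
    "group": [0, 0, 3, 1],
    "score": [0.839626, 0.845435, 0.843209, 0.85426],
})
df2 = pd.DataFrame({
    "group": [2, 0, 0],
    "student No": [1, 19, 8],
    "score": [0.887435, 0.81214, 0.853333],
})

# Left merge keeps every df1 row; overlapping columns get _x (df1) and _y (df2).
dfupdated = df1.merge(df2, on="student No", how="left")

# Use the df2 value where present, otherwise fall back to the df1 value.
dfupdated["group"] = dfupdated["group_y"].fillna(dfupdated["group_x"])
dfupdated["score"] = dfupdated["score_y"].fillna(dfupdated["score_x"])
dfupdated = dfupdated.drop(columns=["group_x", "group_y", "score_x", "score_y"])

# Unmatched rows had NaN in group_y, so the column became float;
# cast it back to int now that every gap is filled.
dfupdated["group"] = dfupdated["group"].astype(int)

print(dfupdated)
```

Students 1, 8 and 19 take their group and score from df2, while student 0 keeps the df1 values.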

To get the max score from each group:

dfupdated.groupby(['group'], sort=False)['score'].max()
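If the student holding each group's top score is also needed, `idxmax` can be combined with `.loc`. This is a small sketch on a hypothetical already-updated frame built from a few of the question's rows; the names `best_score` and `top_rows` are illustrative only.

```python
import pandas as pd

# Hypothetical updated frame, using a few rows from the expected result.
dfupdated = pd.DataFrame({
    "student No": [0, 1, 8, 17, 19],
    "group": [0, 2, 0, 3, 0],
    "score": [0.839626, 0.887435, 0.853333, 0.899041, 0.81214],
})

# Highest score per group, with groups kept in order of first appearance.
best_score = dfupdated.groupby("group", sort=False)["score"].max()

# idxmax gives the row label of each group's highest score;
# .loc then pulls the full rows, including the student number.
top_rows = dfupdated.loc[dfupdated.groupby("group", sort=False)["score"].idxmax()]

print(best_score)
print(top_rows)
```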
