Assume a dataframe in which values in any of the columns can change. Given another dataframe that contains the old value, the new value, and the column each change belongs to, how can the first dataframe be updated using this information about changes? For example:
>>> my_df
x y z
0 1 2 5
1 2 3 9
2 8 7 2
3 3 4 7
4 6 7 7
my_df_2 contains information about changed values and their columns:
>>> my_df_2
changed_col old_value new_value
0 x 2 10
1 z 9 20
2 x 1 12
3 y 4 23
How can the information in my_df_2 be used to update my_df such that my_df now becomes:
>>> my_df
x y z
0 12 2 5
1 10 3 20
2 8 7 2
3 3 23 7
4 6 7 7
You can create a dictionary for the changes as follows:
d = {i: dict(zip(j['old_value'], j['new_value'])) for i, j in my_df_2.groupby('changed_col')}
d
Out: {'x': {1: 12, 2: 10}, 'y': {4: 23}, 'z': {9: 20}}
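Then pass it to DataFrame.replace. Note that replace returns a new dataframe rather than modifying my_df, so assign the result back if you want to keep it: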
my_df.replace(d)
Out:
x y z
0 12 2 5
1 10 3 20
2 8 7 2
3 3 23 7
4 6 7 7
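For reference, the two steps above can be put together into one self-contained script. This is only a sketch built from the sample data in the question; the names changes and updated are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
my_df = pd.DataFrame({"x": [1, 2, 8, 3, 6],
                      "y": [2, 3, 7, 4, 7],
                      "z": [5, 9, 2, 7, 7]})
my_df_2 = pd.DataFrame({"changed_col": ["x", "z", "x", "y"],
                        "old_value": [2, 9, 1, 4],
                        "new_value": [10, 20, 12, 23]})

# Build {column: {old_value: new_value}} by grouping the changes per column.
changes = {col: dict(zip(grp["old_value"], grp["new_value"]))
           for col, grp in my_df_2.groupby("changed_col")}

# A nested dict tells replace which values to swap in which column.
# replace returns a new dataframe; my_df itself is left untouched.
updated = my_df.replace(changes)
print(updated)
```

Because the mapping is keyed by column, a value such as 2 is replaced only in column x and is left alone in y and z.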
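You can use the DataFrame.update method, which modifies a dataframe in place using the non-NA values of another dataframe, aligned on index and column labels. See http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.update.html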
Example:
import numpy as np
import pandas as pd
old_df = pd.DataFrame({"a": np.arange(5), "b": np.arange(4, 9)})
+----+-----+-----+
| | a | b |
|----+-----+-----|
| 0 | 0 | 4 |
| 1 | 1 | 5 |
| 2 | 2 | 6 |
| 3 | 3 | 7 |
| 4 | 4 | 8 |
+----+-----+-----+
new_df = pd.DataFrame({"a":np.arange(7,8), "b": np.arange(10,11)})
+----+-----+-----+
| | a | b |
|----+-----+-----|
| 0 | 7 | 10 |
+----+-----+-----+
old_df.update(new_df)
+----+-----+-----+
| | a | b |
|----+-----+-----|
| 0 | 7 | 10 | #Changed row
| 1 | 1 | 5 |
| 2 | 2 | 6 |
| 3 | 3 | 7 |
| 4 | 4 | 8 |
+----+-----+-----+
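Row 0 changed in the example above because new_df has the default index 0, and update matches rows by index label rather than by position. The following sketch is a variation on the same data, assuming you want to target row 2 instead; the explicit index=[2] and the name result are additions made here for illustration.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({"a": np.arange(5), "b": np.arange(4, 9)})

# Giving new_df the index label 2 makes update target the row labelled 2.
new_df = pd.DataFrame({"a": [7], "b": [10]}, index=[2])

# update works in place and returns None, so do not reassign its result.
result = old_df.update(new_df)
print(old_df)
```

Only the cell positions present in new_df are overwritten; every other row keeps its original values. Depending on the pandas version, integer columns may come back as floats after update, so check the dtypes if that matters for later steps.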