I have two dataframes, df1 and df2, and I would like to subtract df2 from df1, matching rows on a specific column, 'Code'.
import pandas as pd
import numpy as np
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10))*5, 'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})
df1:
            Val1  Val2  Code
2021-01-01     0     0     1
2021-01-02     1     5     1
2021-01-03     2    10     1
2021-01-04     3    15     2
2021-01-05     4    20     2
2021-01-06     5    25     2
2021-01-07     6    30     3
2021-01-08     7    35     3
2021-01-09     8    40     3
2021-01-10     9    45     3
df2:
   Code  Val1  Val2
0     1    10     4
1     2     5     8
2     3    15    10
3     4    20     7
I am using the following code:
df = (df1.set_index(['Code']) - df2.set_index(['Code']))
and the result is
Code
1     -10.0  -4.0
1      -9.0   1.0
1      -8.0   6.0
2      -2.0   7.0
2      -1.0  12.0
2       0.0  17.0
3      -9.0  20.0
3      -8.0  25.0
3      -7.0  30.0
3      -6.0  35.0
4       NaN   NaN
However, I only want results for the rows that are in df1, not for keys that are missing from it (in this example, Code 4).
How do I do that, and then set the index back to the original one from df1?
Something like this, but it doesn't work:
df = (df1.set_index(['Code']) - df2.set_index(['Code'])).set_index(df1['Code'])
I would also like to keep the column headers.
Desired output:
             Val1  Val2  Code
Date
2021-01-01  -10.0  -4.0     1
2021-01-02   -9.0   1.0     1
2021-01-03   -8.0   6.0     1
2021-01-04   -2.0   7.0     2
2021-01-05   -1.0  12.0     2
2021-01-06    0.0  17.0     2
2021-01-07   -9.0  20.0     3
2021-01-08   -8.0  25.0     3
2021-01-09   -7.0  30.0     3
2021-01-10   -6.0  35.0     3
If you only want results for the rows that are in df1, and not for the missing keys (Code 4 in this example), just use the dropna() method:
df = (df1.set_index(['Code']) - df2.set_index(['Code'])).dropna()
Then:
df.insert(0,'Date',df1.index)
And finally:
df.reset_index(inplace=True)
df.set_index('Date',inplace=True)
Now if you print df, you will get your desired output:
            Code   Val1  Val2
Date
2021-01-01     1  -10.0  -4.0
2021-01-02     1   -9.0   1.0
2021-01-03     1   -8.0   6.0
2021-01-04     2   -2.0   7.0
2021-01-05     2   -1.0  12.0
2021-01-06     2    0.0  17.0
2021-01-07     3   -9.0  20.0
2021-01-08     3   -8.0  25.0
2021-01-09     3   -7.0  30.0
2021-01-10     3   -6.0  35.0
Note: if this is not your desired output, let me know.
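For reference, the steps in this answer can be put together into one runnable sketch using the sample data from the question. The names diff and result are only illustrative. One assumption worth flagging: the dates are reattached by position, so this relies on the subtraction returning rows in the same order as df1, which holds here because 'Code' is already sorted in df1.

```python
import numpy as np
import pandas as pd

# Sample data from the question
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10),
                                    'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4],
                         'Val1': [10, 5, 15, 20],
                         'Val2': [4, 8, 10, 7]})

# Subtract on 'Code'. Code 4 exists only in df2, so its row is all NaN
# and dropna() removes it.
diff = (df1.set_index('Code') - df2.set_index('Code')).dropna()

# Reattach the dates by position.
# Assumption: diff has the same row order as df1 (true for this sorted sample).
diff.insert(0, 'Date', df1.index)

# Turn 'Code' back into a column and make 'Date' the index again.
result = diff.reset_index().set_index('Date')
print(result)
```

If 'Code' were not sorted in df1, the positional insert could pair dates with the wrong rows, so it may be safer to check the order first or to use an approach that never leaves df1's index.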
You can use reindex to align df2 to df1["Code"]. Then we can take the underlying NumPy ndarray and subtract it in place from the corresponding columns of df1. This leaves both the index and the "Code" column untouched and performs the subtraction as expected.
subtract_values = df2.set_index("Code").reindex(df1["Code"]).to_numpy()
df1[["Val1", "Val2"]] -= subtract_values
print(df1)
            Val1  Val2  Code
2021-01-01   -10    -4     1
2021-01-02    -9     1     1
2021-01-03    -8     6     1
2021-01-04    -2     7     2
2021-01-05    -1    12     2
2021-01-06     0    17     2
2021-01-07    -9    20     3
2021-01-08    -8    25     3
2021-01-09    -7    30     3
2021-01-10    -6    35     3
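To see why this works, it can help to look at what reindex returns on its own. The sketch below uses df2 from the question and a shortened, hypothetical codes Series standing in for df1["Code"]. It shows reindex repeating the df2 row for each requested code, so the resulting array has one row per row of the left-hand data.

```python
import pandas as pd

df2 = pd.DataFrame({'Code': [1, 2, 3, 4],
                    'Val1': [10, 5, 15, 20],
                    'Val2': [4, 8, 10, 7]})

# Hypothetical stand-in for df1['Code']; codes may repeat, and code 4 is never requested.
codes = pd.Series([1, 1, 2, 3, 3], name='Code')

# One row of df2 per entry in codes, in the same order as codes.
aligned = df2.set_index('Code').reindex(codes)
print(aligned)

# Plain ndarray, so the later subtraction is positional and ignores index labels.
values = aligned.to_numpy()
print(values.shape)
```

Because the subtraction is done on a plain array, no index alignment happens, which is why the extra Code 4 row never appears.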
If you don't want to change df1, you can copy the data to a new DataFrame via new_df = df1.copy() and proceed with new_df instead of df1.
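As a rough sketch of that non-mutating variant, the reindex approach can be wrapped in a small helper that works on a copy. The function name subtract_by_code and its parameters are hypothetical, introduced only for illustration, and it assumes the key values in the right-hand frame are unique.

```python
import numpy as np
import pandas as pd


def subtract_by_code(left, right, key='Code', value_cols=('Val1', 'Val2')):
    """Return a copy of `left` with `right`'s values subtracted, matched on `key`.

    Hypothetical helper; assumes `key` is unique in `right`.
    """
    cols = list(value_cols)
    # Align right to left's key column, one row per row of left.
    aligned = right.set_index(key)[cols].reindex(left[key]).to_numpy()
    out = left.copy()  # leave the caller's DataFrame untouched
    out[cols] = out[cols] - aligned
    return out


# Sample data from the question
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10),
                                    'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4],
                         'Val1': [10, 5, 15, 20],
                         'Val2': [4, 8, 10, 7]})

new_df = subtract_by_code(df1, df2)
print(new_df)
```

If a code in the left frame had no match in the right frame, reindex would produce NaN for that row, and the subtracted values would come out as NaN rather than raising an error.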