I have a dataframe
full_name x
q 1.5
q_1 1.3
q_2 1.2
q_3 1.3
r 1.5
r_1 1.3
r_2 1.2
r_3 1.3
and I'd like to create a new column which is the difference between the suffixed full names and their bases, such as the following:
full_name x x_diff
q 1.5 0
q_1 1.3 -0.2
q_2 1.2 -0.3
q_3 1.3 -0.2
r 1.5 0
r_1 1.3 -0.2
r_2 1.2 -0.3
r_3 1.3 -0.2
so q - q, q_1 - q, q_2 - q, q_3 - q, and the same for r.
I've tried something like
df['x_diff'] = df.res - df[df.main_name == df.full_name].x
but that doesn't work. Any advice on what to do?
Create a Series of the x values from the rows where main_name matches full_name, index it by main_name with DataFrame.set_index, and then subtract the values obtained by mapping main_name onto it with Series.map:
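# x of the base rows (full_name == main_name), indexed by main_name
s = df.loc[df.main_name == df.full_name].set_index('main_name')['x']
# look up each row's base x via main_name and subtract it
df['x_diff'] = df.x - df.main_name.map(s)
print(df)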
full_name main_name x x_diff
0 q q 1.5 0.0
1 q_1 q 1.3 -0.2
2 q_2 q 1.2 -0.3
3 q_3 q 1.3 -0.2
4 r r 1.5 0.0
5 r_1 r 1.3 -0.2
6 r_2 r 1.2 -0.3
7 r_3 r 1.3 -0.2
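A self-contained sketch of why the direct subtraction from the question produces NaN while the mapped version does not. The frame is rebuilt from the printed output above; using df.x where the question wrote df.res is an assumption, and the names base, naive and mapped are only illustrative.

```python
import pandas as pd

# Frame rebuilt from the printed output above
df = pd.DataFrame({
    "full_name": ["q", "q_1", "q_2", "q_3", "r", "r_1", "r_2", "r_3"],
    "main_name": ["q", "q", "q", "q", "r", "r", "r", "r"],
    "x": [1.5, 1.3, 1.2, 1.3, 1.5, 1.3, 1.2, 1.3],
})

# Direct subtraction aligns on the row index, so only the base rows
# (index 0 and 4) find a partner; every other row becomes NaN.
base = df[df.main_name == df.full_name].x
naive = df.x - base

# Mapping main_name onto the base values yields one base value per row,
# carried on the same index as df, so the subtraction lines up.
s = df.loc[df.main_name == df.full_name].set_index("main_name")["x"]
mapped = df.x - df.main_name.map(s)
```

Here naive holds NaN in six of the eight rows, because pandas aligns the two operands on their index labels before subtracting, while mapped has a value for every row.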
If the first row of each main_name group is always the one where main_name equals full_name, subtract the Series of per-group first x values, which GroupBy.transform computes with GroupBy.first and broadcasts to every row of the group:
df['x_diff'] = df.x - df.groupby('main_name')['x'].transform('first')
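A minimal runnable sketch of the transform approach, starting from the two-column frame in the question. Deriving main_name as the part of full_name before the first underscore is an assumption made for illustration, as is the first_is_base check.

```python
import pandas as pd

# The question's frame, which has no main_name column
df = pd.DataFrame({
    "full_name": ["q", "q_1", "q_2", "q_3", "r", "r_1", "r_2", "r_3"],
    "x": [1.5, 1.3, 1.2, 1.3, 1.5, 1.3, 1.2, 1.3],
})

# Assumption: the base name is everything before the first underscore
df["main_name"] = df["full_name"].str.split("_").str[0]

# Check the precondition: the first row of every group is its base row
first_full = df.groupby("main_name")["full_name"].transform("first")
first_is_base = bool((first_full == df["main_name"]).all())

# Subtract each group's first x from every row of that group
df["x_diff"] = df["x"] - df.groupby("main_name")["x"].transform("first")
```

If first_is_base is False for your data, the first row of some group is not its base, and the mapping approach above is the safer choice.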
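You can do it in 3 steps:
1. Use main_name to identify the group each row belongs to.
2. Shift x by one row with df.shift(1) ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html ) to get x_shifted.
3. Compute x_diff as the difference between x_shifted and x.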
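One possible reading of the shift-based steps, sketched under assumptions: grouping by main_name before shifting and the sign of the subtraction (x minus x_shifted) are guesses, since the steps do not spell them out. Note that shifting compares each row with the previous row, so from the second suffixed row onward the result differs from the base-relative x_diff requested in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "full_name": ["q", "q_1", "q_2", "q_3", "r", "r_1", "r_2", "r_3"],
    "main_name": ["q", "q", "q", "q", "r", "r", "r", "r"],
    "x": [1.5, 1.3, 1.2, 1.3, 1.5, 1.3, 1.2, 1.3],
})

# Previous row's x within each main_name group (NaN on each group's first row)
df["x_shifted"] = df.groupby("main_name")["x"].shift(1)

# Row-to-row difference; the sign convention is an assumption
df["x_diff"] = df["x"] - df["x_shifted"]
```

For q_2, for example, this gives roughly -0.1 (1.2 - 1.3) instead of the requested -0.3 (1.2 - 1.5), and the base rows come out as NaN rather than 0.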