In R, I can calculate the change over time for each group in a data set like this:
df %>%
group_by(z) %>%
mutate(diff = y - y[x == 0])
What is the equivalent in pandas?
I know that in pandas you can subtract the first value of a column like this:
df['diff'] = df.y-df.y.iloc[0]
But how do you group by variable z?
Example data:
x y z
0 2 A
5 4 A
10 6 A
0 1 B
5 5 B
10 9 B
Expected output:
x y z diff
0 2 A 0
5 4 A 2
10 6 A 4
0 1 B 0
5 5 B 4
10 9 B 8
You can try this.
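# Per-group difference from the first y value. Use .iloc[0] (positional):
# g.y[0] looks up the index label 0, which only the first group contains.
temp = df.groupby('z').\
    apply(lambda g: g.y - g.y.iloc[0]).\
    reset_index().\
    rename(columns={'y': 'diff'}).\
    drop('z', axis=1)
# Join the differences back onto the original rows by their index.
df.merge(temp, how='inner', left_index=True, right_on='level_1').\
    drop('level_1', axis=1)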
Return:
x y z diff
0 2 A 0
5 4 A 2
10 6 A 4
0 1 B 0
5 5 B 4
10 9 B 8
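A shorter alternative, offered as a sketch rather than the only way to do it: `groupby(...).transform` returns a result aligned to the original rows, so no merge is needed. The sample frame below is rebuilt by hand from the values in the expected output table, and the column name `diff_x0` is made up for illustration. The first option assumes the rows within each group are already ordered so that the baseline row comes first. The second option mirrors the R condition `x == 0` and assumes exactly one such row per group.

```python
import pandas as pd

# Sample data, typed in from the expected output table above.
df = pd.DataFrame({
    "x": [0, 5, 10, 0, 5, 10],
    "y": [2, 4, 6, 1, 5, 9],
    "z": ["A", "A", "A", "B", "B", "B"],
})

# Option 1: subtract each group's first y value.
# transform("first") broadcasts the first (non-null) y of each group
# back onto every row of that group, keeping the original index.
df["diff"] = df["y"] - df.groupby("z")["y"].transform("first")

# Option 2: subtract the y value where x == 0, as the R code does.
# Assumes exactly one row with x == 0 per group (hypothetical column name).
baseline = df.loc[df["x"] == 0].set_index("z")["y"]
df["diff_x0"] = df["y"] - df["z"].map(baseline)

print(df)
```

Option 2 does not depend on row order, so it may be the safer choice if the data is not sorted by x within each group.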