
How do I subtract the first value in each group from every other value in the group to calculate change over time?

In R I can calculate the change over time for each group in a data set like this:

df %>% 
  group_by(z) %>%
  mutate(diff = y - y[x == 0])

What is the equivalent in pandas?
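A fairly literal pandas translation of the R code picks out the `y` value where `x == 0` in each `z` group and maps it back onto every row of that group. This is a minimal sketch that assumes each group has exactly one row with `x == 0`, as in the example data; `baseline` is only an illustrative name. The `y` values are taken from the expected-output table.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0, 5, 10, 0, 5, 10],
    "y": [2, 4, 6, 1, 5, 9],
    "z": ["A", "A", "A", "B", "B", "B"],
})

# y at x == 0 for each group, indexed by z (the y[x == 0] part of the R code).
# Assumes exactly one x == 0 row per group; a group without one gets NaN below.
baseline = df.loc[df["x"] == 0].set_index("z")["y"]

# Look up each row's baseline through its z value and subtract it.
df["diff"] = df["y"] - df["z"].map(baseline)
print(df)
```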

I know that in pandas you can subtract the first value of a column from the whole column like this:

df['diff'] = df.y-df.y.iloc[0]

But how do I do this separately for each group of variable z?
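The grouped counterpart of `df.y - df.y.iloc[0]` broadcasts each group's first `y` back to all of that group's rows with `groupby(...).transform('first')`. A minimal sketch, again using the `y` values from the expected-output table. Note that `'first'` takes the first non-missing value in the current row order, so the sketch sorts by `x` within each group first.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0, 5, 10, 0, 5, 10],
    "y": [2, 4, 6, 1, 5, 9],
    "z": ["A", "A", "A", "B", "B", "B"],
})

# Make "first" mean "smallest x" within each group.
df = df.sort_values(["z", "x"])

# transform('first') returns a Series aligned with df: every row gets
# the first y of its own z group.
first_y = df.groupby("z")["y"].transform("first")

df["diff"] = df["y"] - first_y
print(df)
```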

Example data:

x   y   z
0   2   A
5   4   A
10  6   A
0   1   B
5   5   B
10  9   B

Expected output:

x   y   z   diff
0   2   A   0
5   4   A   2
10  6   A   4
0   1   B   0
5   5   B   4
10  9   B   8

You can try this.

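import pandas as pd

# Subtract the first y of each z group from every y in that group.
# Use .iloc[0] (positional): label lookup with [0] would raise a KeyError
# for group B, whose row labels are 3, 4, 5.
temp = (df.groupby('z', group_keys=True)['y']
          .apply(lambda s: s - s.iloc[0])  # index: (z, original row label)
          .reset_index()                   # columns: z, level_1, y
          .rename(columns={'y': 'diff'})
          .drop('z', axis=1))

# Join the differences back onto df through the original row labels.
(df.merge(temp, how='inner', left_index=True, right_on='level_1')
   .drop('level_1', axis=1))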

Output:

x   y   z   diff
0   2   A   0
5   4   A   2
10  6   A   4
0   1   B   0
5   5   B   4
10  9   B   8
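The `merge` step is only needed because the applied result carries `z` as an extra index level. A shorter variant, sketched here with the same `apply` lambda, passes `group_keys=False` so the result keeps the original row labels and can be assigned straight back, relying on pandas aligning on the index during assignment. The `y` values are taken from the expected-output table.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0, 5, 10, 0, 5, 10],
    "y": [2, 4, 6, 1, 5, 9],
    "z": ["A", "A", "A", "B", "B", "B"],
})

# group_keys=False: do not prepend z to the result's index, so each value
# keeps the row label it came from.
diff = df.groupby("z", group_keys=False)["y"].apply(lambda s: s - s.iloc[0])

# Column assignment aligns on the index, so no merge is required.
df["diff"] = diff
print(df)
```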
