How do I subtract the first value in each group from every other value in the group to calculate change over time?

Question

In R, I can calculate the change over time for each group in a data set like this:

df %>% 
  group_by(z) %>%
  mutate(diff = y - y[x == 0])

What is the equivalent in pandas?

I know that in pandas you can subtract the first value of a column from every value in that column like this:

df['diff'] = df.y-df.y.iloc[0]

But how do you do this separately for each group defined by the variable z?

Example data:

x   y   z
0   2   A
5   4   A
10  6   A
0   1   B
5   5   B
10  9   B

Expected output:

x   y   z   diff
0   2   A   0
5   4   A   2
10  6   A   4
0   1   B   0
5   5   B   4
10  9   B   8

Answer 1

You can try this.

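# Within each group, subtract the first y value (selected by position) from every y.
temp = df.groupby('z').\
    apply(lambda g: g.y - g.y.iloc[0]).\
    reset_index().\
    rename(columns={'y': 'diff'}).\
    drop('z', axis=1)

# Attach the differences back to df using the original row index (level_1).
df.merge(temp, how='inner', left_index=True, right_on='level_1').\
    drop('level_1', axis=1)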

Returns:

x   y   z   diff
0   2   A   0
5   4   A   2
10  6   A   4
0   1   B   0
5   5   B   4
10  9   B   8
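
As a possible shorter alternative to the apply/merge approach, the same result can probably be reached with groupby(...).transform, which broadcasts a per-group value back onto every row. The sketch below is not part of the original answer. It rebuilds the example data from the expected output in the question, and the column name diff_x0 is made up for illustration. Option 1 assumes the rows are already ordered so that the baseline row comes first in each group (and note that "first" skips missing values). Option 2 mirrors the R expression y[x == 0] and assumes each group has exactly one row with x == 0.

```python
import pandas as pd

# Example data reconstructed from the expected output in the question.
df = pd.DataFrame({
    "x": [0, 5, 10, 0, 5, 10],
    "y": [2, 4, 6, 1, 5, 9],
    "z": ["A", "A", "A", "B", "B", "B"],
})

# Option 1: subtract the first y of each group (relies on row order).
# transform("first") returns a Series aligned with df, holding each
# group's first non-null y repeated on every row of that group.
df["diff"] = df["y"] - df.groupby("z")["y"].transform("first")

# Option 2: subtract the y found where x == 0, like y[x == 0] in R.
# Assumption: every group has exactly one row with x == 0.
baseline = df.loc[df["x"] == 0].set_index("z")["y"]
df["diff_x0"] = df["y"] - df["z"].map(baseline)  # diff_x0 is a hypothetical name

print(df)
```

Both options write the result straight into df, so no merge step is needed. Option 2 does not depend on row order, which may be safer if the data is not sorted by x within each group.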
