Pandas groupby diff removes column

Question

I have a dataframe like this:

import pandas as pd

d = {'id': ['101_i','101_e','102_i','102_e'], 1: [3, 4, 5, 7], 2: [5,9,10,11], 3: [8,4,3,7]}
df = pd.DataFrame(data=d)

I want to subtract the rows that share the same id prefix from each other, i.e. subtract the values of row 101_i from those of row 101_e, or vice versa. The code I use for that is:

df['new_identifier'] = [x.upper().replace('E', '').replace('I','').replace('_','') for x in df['id']]
df = df.groupby('new_identifier')[df.columns[1:-1]].diff().dropna()

I get the output like this:

The new column I created, new_identifier, is missing from the result. Is there a way to retain it?

Answer 1

You can pass a specific aggregation function per column to agg (in this case np.diff for columns 1, 2, and 3), since you know those columns are numeric (int or float). Because new_identifier is the grouping key, it is kept as the index of the result.

import numpy as np
df.groupby('new_identifier').agg({i: np.diff for i in range(1, 4)}).dropna()

Result:

                1  2  3
new_identifier         
101             1  4 -4
102             2  1  4
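If you want new_identifier back as a regular column rather than as the index, one option is to call reset_index() on the aggregated frame. The sketch below is a variation on the idea above, not the exact code from this answer: it swaps np.diff for a small helper, last_minus_first (a hypothetical name), that returns a plain scalar per group, and it assumes every group has exactly two rows in the order shown in the question.

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)

# Derive the grouping key from the part of the id before the underscore
df['new_identifier'] = df['id'].str.split('_').str[0]

value_cols = [1, 2, 3]

def last_minus_first(s):
    # Hypothetical helper: assumes two rows per group, so this equals the single diff value
    return s.iloc[-1] - s.iloc[0]

result = (
    df.groupby('new_identifier')
      .agg({col: last_minus_first for col in value_cols})
      .reset_index()  # turn the group key from the index back into a column
)
print(result)
```

After reset_index(), result has the columns new_identifier, 1, 2 and 3, with one row per group.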

Answer 2

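Use Series.str.split to derive the groups from id, set them as the index with DataFrame.set_axis(), group by that index level, and then apply GroupBy.diff. The group key is then retained as the index of the result: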

cols = df.columns.difference(['id'])
groups = df['id'].str.split('_').str[0]

new_df = (
    df.set_axis(groups, axis=0)
      .groupby(level=0)
      [cols]
      .diff()
      .dropna()
)

print(new_df)
       1    2    3
id                
101  1.0  4.0 -4.0
102  2.0  1.0  4.0

Detail: the groups

df['id'].str.split('_').str[0]

0    101
1    101
2    102
3    102
Name: id, dtype: object
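As for why the column disappears in the question's code: GroupBy.diff returns only the selected value columns, aligned to the original row index, so the grouping column is not part of its output. A minimal sketch of another way to keep it, assuming you want new_identifier as an ordinary column, is to join the diff result back onto the key columns by row index. The names value_cols, diffs and result are illustrative only.

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)

value_cols = [1, 2, 3]
df['new_identifier'] = df['id'].str.split('_').str[0]

# diff() keeps the original row index but returns only the value columns
diffs = df.groupby('new_identifier')[value_cols].diff()

# Join the key columns back on the shared row index, then drop the
# first row of each group, where the diff is NaN
result = df[['id', 'new_identifier']].join(diffs).dropna()
print(result)
```

This keeps both id and new_identifier next to the differences, at the cost of float-valued columns, since the intermediate NaN rows force a float dtype.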
