
Pandas groupby diff removes column

I have a dataframe like this:

import pandas as pd

d = {'id': ['101_i','101_e','102_i','102_e'], 1: [3, 4, 5, 7], 2: [5,9,10,11], 3: [8,4,3,7]}
df = pd.DataFrame(data=d)


I want to subtract rows that share the same id prefix, i.e. subtract the values of row 101_i from those of row 101_e (or vice versa). This is the code I use:

df['new_identifier'] = [x.upper().replace('E', '').replace('I','').replace('_','') for x in df['id']]
df = df.groupby('new_identifier')[df.columns[1:-1]].diff().dropna()

I get this output:

     1    2    3
1  1.0  4.0 -4.0
3  2.0  1.0  4.0

The result no longer contains the new column I created, new_identifier. Is there a way to retain it?
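For context on why the column disappears: GroupBy.diff() is a transform, not an aggregation. It returns one row per input row on the original index, and only the selected value columns (not the grouping key) appear in the output. A minimal sketch, assuming the df from the question, that keeps new_identifier by joining it back on that unchanged index:

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)
df['new_identifier'] = [x.upper().replace('E', '').replace('I', '').replace('_', '') for x in df['id']]

# diff() is a transform: one output row per input row, on the original index,
# containing only the selected value columns (not the grouping key)
diffs = df.groupby('new_identifier')[df.columns[1:-1]].diff()

# The index is unchanged, so the key column can be joined back on it
result = diffs.join(df['new_identifier']).dropna()
print(result)
```

Rows 1 and 3 survive dropna(), and each one now also carries its new_identifier.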

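You can pass a specific aggregation function per column (here np.diff for columns 1, 2 and 3), provided you know those columns are numeric (int or float in this case). Because every group has exactly two rows, np.diff yields a single difference per group (later row minus earlier row), and new_identifier is kept as the index of the result.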

import numpy as np

# df is the question's frame after new_identifier was added (before the diff line)
df.groupby('new_identifier').agg({i: np.diff for i in range(1, 4)}).dropna()

Result:

                1  2  3
new_identifier         
101             1  4 -4
102             2  1  4
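With agg, the key survives as the index rather than as a column. A minimal sketch that moves it back into a column with reset_index(). It assumes exactly two rows per new_identifier and swaps np.diff for a hypothetical last_minus_first helper that returns a plain scalar (later row minus earlier row), so it does not rely on how pandas handles a length-1 array returned by an aggregation function:

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)
df['new_identifier'] = [x.upper().replace('E', '').replace('I', '').replace('_', '') for x in df['id']]


def last_minus_first(s: pd.Series):
    """Hypothetical helper: later row minus earlier row within one group."""
    return s.iloc[-1] - s.iloc[0]


result = (
    df.groupby('new_identifier')
    .agg({c: last_minus_first for c in [1, 2, 3]})
    .reset_index()  # move the group key out of the index and back into a column
)
print(result)
```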

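Use Series.str.split to get the group keys, then DataFrame.set_axis() to make them the row index before the GroupBy, and finally GroupBy.diff. Because the keys are the index, they are still present in the result.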

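# df is the original frame (without new_identifier), so cols is 1, 2, 3
cols = df.columns.difference(['id'])
groups = df['id'].str.split('_').str[0]  # '101_i' -> '101'

new_df = (
    df.set_axis(groups, axis=0)  # the prefixes become the row index
    .groupby(level=0)[cols]      # group on that index
    .diff()                      # later row minus earlier row per group
    .dropna()                    # drop the first row of each group (NaN)
)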

print(new_df)
       1    2    3
id                
101  1.0  4.0 -4.0
102  2.0  1.0  4.0
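Here the prefix is kept as the index, named id after the Series it came from. If it should be an ordinary column again, a reset_index() at the end of the same chain is enough. A sketch, assuming the original df without new_identifier:

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)

cols = df.columns.difference(['id'])
groups = df['id'].str.split('_').str[0]

new_df = (
    df.set_axis(groups, axis=0)
    .groupby(level=0)[cols]
    .diff()
    .dropna()
    .reset_index()  # the prefix index (named 'id') becomes a regular column
)
print(new_df)
```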

Detail: the group keys

df['id'].str.split('_').str[0]

0    101
1    101
2    102
3    102
Name: id, dtype: object
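str.split also exposes the suffix ('i' or 'e'). diff always subtracts the earlier row from the later one within each group, so the sign depends on row order. As an alternative (not what either answer does), here is a sketch that puts prefix and suffix into a MultiIndex and subtracts the 'i' rows from the 'e' rows explicitly with DataFrame.xs. The level names prefix and suffix are illustrative choices:

```python
import pandas as pd

d = {'id': ['101_i', '101_e', '102_i', '102_e'],
     1: [3, 4, 5, 7], 2: [5, 9, 10, 11], 3: [8, 4, 3, 7]}
df = pd.DataFrame(data=d)

# '101_i' -> prefix '101', suffix 'i'
parts = df['id'].str.split('_', expand=True)
indexed = df.drop(columns='id').set_index(
    [parts[0].rename('prefix'), parts[1].rename('suffix')]
)

# Pick each suffix explicitly, so row order no longer decides the sign (e minus i)
result = (
    indexed.xs('e', level='suffix') - indexed.xs('i', level='suffix')
).reset_index()
print(result)
```

The subtraction aligns on prefix, so it also works if the 'e' row comes before the 'i' row.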
