
Conditional aggregation on pandas DataFrame columns, combining 'n' rows into 1 row

I have the following pandas dataframe:

START   NAME
5.11    name1
9.1     name1
10.86   name1
12.61   name2
14.86   name2
23.11   name2
25.36   name1
26.61   name1
28.36   name2
31.61   name2
32.86   name1
35.61   name1
44.61   name1
46.36   name2

I would like this merged by name as follows:

START   END     NAME
5.11    12.61   name1
12.61   25.36   name2
26.61   28.36   name1
28.36   32.86   name2
32.86   46.36   name1
46.36   total   name2

I tried something like this:

df2 = df.copy()
df2 = df2.rename({"name": "temp"}).reset_index()
grp = (df2['name'] != df2['name'].shift()).cumsum().rename('group')
df2 = df2.groupby(['name', grp], sort=False)

But this does not produce the desired output. Any help is appreciated.

thanks

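  1. Use shift(1) to compare each row's NAME with the previous row's NAME.
  2. Keep only the rows whose NAME differs from the previous row's NAME, i.e. the first row of each run of consecutive identical names.
  3. Use shift(-1) on the kept rows so that each row's END is the START of the next run; the last run has no successor, so fill it with 'total'.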
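# True on the first row of each run of consecutive identical NAMEs
cond = (df['NAME'] != df['NAME'].shift(1))
dfn = df[cond].copy()
# END is the START of the next run; the last run has none, so fill with 'total'
dfn['END'] = dfn['START'].shift(-1).fillna('total')
dfn[['START', 'END', 'NAME']]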
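
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the same shift-based logic in a helper; the function name `merge_runs` and its `last_end` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "START": [5.11, 9.1, 10.86, 12.61, 14.86, 23.11, 25.36,
              26.61, 28.36, 31.61, 32.86, 35.61, 44.61, 46.36],
    "NAME": ["name1", "name1", "name1", "name2", "name2", "name2", "name1",
             "name1", "name2", "name2", "name1", "name1", "name1", "name2"],
})


def merge_runs(frame, last_end="total"):
    """Collapse runs of consecutive identical NAMEs into one row each.

    Hypothetical helper: `last_end` is the placeholder used as END for the
    final run, which has no following run to take a START from.
    """
    # True on the first row of every run (NAME differs from the previous row).
    starts_run = frame["NAME"] != frame["NAME"].shift(1)
    out = frame.loc[starts_run, ["START", "NAME"]].copy()
    # Each run ends where the next run starts.
    out["END"] = out["START"].shift(-1).fillna(last_end)
    return out[["START", "END", "NAME"]].reset_index(drop=True)


result = merge_runs(df)
print(result)
```

Because the last END is filled with a string, the END column ends up with object dtype. Also note that with this sample data the third run starts at 25.36 (the first row of that run), rather than the 26.61 shown in the question's expected table.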
