简体   繁体   中英

Pandas: counting consecutive rows with condition

I have a table like this:

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate

I want to create a column like this:

df['count'] = [1,2,1,2,3,1,2,3,1]

Explanation: the 'count' column resets to 1 when there's a new name, OR when fill rate increases; otherwise, 'count' column equals to the last value plus 1.

It's easy to do it using loops, but I'd like to avoid this since the data is huge. Is there an alternative way to do it?

IIUC let us combine shift with diff and using cumsum create the sub-group, cumcount

s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]: 
0    1
1    2
2    1
3    2
4    3
5    1
6    2
7    3
8    1
dtype: int64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM