Pandas: counting consecutive rows with condition

Question

I have a table like this:

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate

I want to create a column like this:

df['count'] = [1,2,1,2,3,1,2,3,1]

Explanation: the 'count' column resets to 1 when there's a new name, OR when fill rate increases; otherwise, 'count' column equals to the last value plus 1.

It's easy to do it using loops, but I'd like to avoid this since the data is huge. Is there an alternative way to do it?

Answer 1

IIUC let us combine shift with diff and using cumsum create the sub-group, cumcount

s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]: 
0    1
1    2
2    1
3    2
4    3
5    1
6    2
7    3
8    1
dtype: int64

Pandas: counting consecutive rows with condition

Question

1 answers

solution1
5 2020-02-04 15:59:12

Pandas: counting consecutive rows with condition

Question

1 answers

solution1 5 2020-02-04 15:59:12

solution1
5 2020-02-04 15:59:12