Pandas: count consecutive rows with a condition

Question

I have a table like this:

import pandas as pd

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
# Build both columns in one step
df = pd.DataFrame({'name': name, 'fillrate': fillrate})

I want to create a column like this:

df['count'] = [1,2,1,2,3,1,2,3,1]

Explanation: the 'count' column resets to 1 when there's a new name, or when the fill rate increases; otherwise, 'count' equals the previous row's value plus 1.

It's easy to do with a loop, but I'd like to avoid that since the data is huge. Is there an alternative way to do it?
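For reference, the loop the question alludes to might look like the sketch below. It is not from the original post: the function name `count_with_loop` and the variable `loop_counts` are made up for illustration, and it simply restates the reset rule row by row so a vectorized result can be checked against it.

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("aaaaabbbb"),
    "fillrate": [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4],
})

def count_with_loop(frame):
    """Hypothetical baseline: apply the reset rule one row at a time."""
    counts = []
    prev_name, prev_rate = None, None
    for name, rate in zip(frame["name"], frame["fillrate"]):
        # Reset on a new name, or when the fill rate rises above the previous row.
        # On the first row the name check is True, so the rate check is skipped.
        if name != prev_name or rate > prev_rate:
            counts.append(1)
        else:
            counts.append(counts[-1] + 1)
        prev_name, prev_rate = name, rate
    return counts

loop_counts = count_with_loop(df)
print(loop_counts)
```

This is fine for small frames, but it runs in Python for every row, which is the cost the question wants to avoid on large data.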

Answer 1

IIUC, we can combine shift and diff to flag the rows where a new run starts, take cumsum of those flags to build a sub-group id, and then use groupby with cumcount to number the rows within each sub-group:

s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]: 
0    1
1    2
2    1
3    2
4    3
5    1
6    2
7    3
8    1
dtype: int64
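To make the one-liner easier to follow, here is the same idea split into named steps and written back to the frame. This is only a sketch on the sample data from the question; the intermediate names `new_name`, `rate_up` and `block_id` are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("aaaaabbbb"),
    "fillrate": [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4],
})

# True where the name differs from the previous row (the first row is always True).
new_name = df["name"].ne(df["name"].shift())

# True where the fill rate is strictly higher than in the previous row.
# diff() yields NaN on the first row, and NaN > 0 evaluates to False.
rate_up = df["fillrate"].diff().gt(0)

# Each True starts a new block; the running sum turns the flags into block ids.
block_id = (new_name | rate_up).cumsum()

# Number the rows inside each block, starting from 1.
df["count"] = df.groupby(block_id).cumcount() + 1
print(df)
```

Note that `diff()` also compares the first row of a name with the last row of the previous name. That does not change the result here, because the `new_name` flag already forces a reset on those rows.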

Answer 1
5 2020-02-04 15:59:12
