

Pandas: counting consecutive rows with condition

I have a table like this:

import pandas as pd

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate

I want to create a column like this:

df['count'] = [1,2,1,2,3,1,2,3,1]

Explanation: the 'count' column resets to 1 when there is a new name, OR when the fill rate increases compared with the previous row; otherwise, 'count' equals the previous row's value plus 1.

It's easy to do this with loops, but I'd like to avoid them because the data is huge. Is there an alternative way to do it?
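For reference, a straightforward loop version of the rule might look like the sketch below. It assumes "fill rate increases" means strictly greater than the previous row's value, and `count_runs_loop` is a hypothetical helper name. It is too slow for huge data, but it is handy for checking a vectorized answer on a small sample.

```python
import pandas as pd


def count_runs_loop(df):
    """Loop implementation of the counting rule (fine for small samples only)."""
    counts = []
    names = df["name"].tolist()
    rates = df["fillrate"].tolist()
    for i in range(len(df)):
        # Reset on the first row, on a new name, or when fillrate goes up
        if i == 0 or names[i] != names[i - 1] or rates[i] > rates[i - 1]:
            counts.append(1)
        else:
            # Otherwise continue the current run
            counts.append(counts[-1] + 1)
    return counts


df = pd.DataFrame({
    "name": ["a", "a", "a", "a", "a", "b", "b", "b", "b"],
    "fillrate": [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4],
})
print(count_runs_loop(df))  # [1, 2, 1, 2, 3, 1, 2, 3, 1]
```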

IIUC (if I understand correctly), combine shift with diff to flag the rows where a new run starts, turn those flags into sub-group ids with cumsum, then number the rows within each sub-group with cumcount:

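# True where the name changes or fillrate rises versus the previous row;
# cumsum turns these start flags into one id per run
s = (df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
# cumcount numbers rows within each run from 0, so add 1
s.groupby(s).cumcount() + 1
Out: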
0    1
1    2
2    1
3    2
4    3
5    1
6    2
7    3
8    1
dtype: int64
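The same approach can be packaged as a self-contained function that writes the result back into `df['count']`. This is a sketch of the answer's logic only; the function name `count_runs` is hypothetical.

```python
import pandas as pd


def count_runs(df):
    """Vectorized count that restarts at 1 on a new name or a fillrate increase."""
    # A new run starts where the name differs from the previous row
    # or fillrate is strictly greater than in the previous row
    new_run = df["name"].ne(df["name"].shift()) | df["fillrate"].diff().gt(0)
    # Running total of the start flags gives each run its own id
    run_id = new_run.cumsum()
    # Number the rows inside each run (0-based), then shift to start at 1
    return run_id.groupby(run_id).cumcount() + 1


df = pd.DataFrame({
    "name": ["a", "a", "a", "a", "a", "b", "b", "b", "b"],
    "fillrate": [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4],
})
df["count"] = count_runs(df)
print(df["count"].tolist())  # [1, 2, 1, 2, 3, 1, 2, 3, 1]
```

The diff is taken over the whole column, so at a name boundary it compares rates from two different names; that does not matter because the name check already flags that row as a new run. On the first row, `shift()` yields NaN, so `ne` is True and the first run gets id 1.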


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM