Pandas: counting consecutive rows with condition
I have a table like this:
import pandas as pd

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame({'name': name, 'fillrate': fillrate})
I want to create a column like this:
df['count'] = [1,2,1,2,3,1,2,3,1]
Explanation: the 'count' column resets to 1 when there is a new name OR when the fill rate increases; otherwise, 'count' equals the previous value plus 1.
This is easy to do with a loop, but I'd like to avoid that because the data is huge. Is there an alternative way to do it?
IIUC, combine shift (to detect a change of name) with diff (to detect an increase in fill rate), use cumsum on the combined flags to create the sub-groups, then take cumcount within each sub-group:
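# A new sub-group starts when the name changes or the fill rate increases
s = (df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
# Number the rows within each sub-group, starting from 1
s.groupby(s).cumcount() + 1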
Out[17]:
0 1
1 2
2 1
3 2
4 3
5 1
6 2
7 3
8 1
dtype: int64
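To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps and assigns the result back to the frame. The variable names `new_name`, `rate_up`, and `group_id` are only illustrative and do not appear in the original answer; the data is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'fillrate': [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4],
})

# True where the name differs from the previous row. The first row is True
# because shift() leaves a missing value there, and 'a' != missing.
new_name = df['name'].ne(df['name'].shift())

# True where the fill rate is strictly higher than in the previous row.
# The first diff is NaN, and NaN > 0 evaluates to False.
rate_up = df['fillrate'].diff().gt(0)

# Every True marks the start of a new sub-group; cumsum turns the flags
# into a running group id.
group_id = (new_name | rate_up).cumsum()

# cumcount numbers the rows inside each group from 0, so add 1.
df['count'] = df.groupby(group_id).cumcount() + 1

print(df)
```

Note that the comparison is on raw floats, so any positive difference, however small, starts a new sub-group. If the fill rates come from noisy calculations, comparing against a small tolerance instead of 0 may be worth considering.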