Count consecutive numbers from a column of a dataframe in Python
I have a dataframe in which column a contains runs of consecutive values (the values in column b do not matter):
import pandas as pd
import numpy as np
np.random.seed(150)
df = pd.DataFrame(data={'a':[1,2,3,4,5,15,16,17,18,203,204,205],'b':np.random.randint(50000,size=(12))})
>>> df
a b
0 1 27066
1 2 28155
2 3 49177
3 4 496
4 5 2354
5 15 23292
6 16 9358
7 17 19036
8 18 29946
9 203 39785
10 204 15843
11 205 21917
I would like to add a column c that counts up from 1 within each run of consecutive values in column a, as shown below:
a b c
1 27066 1
2 28155 2
3 49177 3
4 496 4
5 2354 5
15 23292 1
16 9358 2
17 19036 3
18 29946 4
203 39785 1
204 15843 2
205 21917 3
How can I do this?
One solution:
df["c"] = (s := df["a"] - np.arange(len(df))).groupby(s).cumcount() + 1
print(df)
Output:
a b c
0 1 27066 1
1 2 28155 2
2 3 49177 3
3 4 496 4
4 5 2354 5
5 15 23292 1
6 16 9358 2
7 17 19036 3
8 18 29946 4
9 203 39785 1
10 204 15843 2
11 205 21917 3
The original idea comes from an old version of the Python docs.
The walrus operator (:=, also called an assignment expression) requires Python 3.8+. On older versions you can do this instead:
s = df["a"] - np.arange(len(df))
df["c"] = s.groupby(s).cumcount() + 1
print(df)
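To see why subtracting the row position works, it can help to print the intermediate key. The following is a minimal sketch on the same column a; the name key is only illustrative (the answer calls it s).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# Within a run of consecutive values, a and the row position grow in step,
# so their difference stays constant and can serve as a group key.
key = df["a"] - np.arange(len(df))
print(key.tolist())  # [1, 1, 1, 1, 1, 10, 10, 10, 10, 194, 194, 194]

# cumcount() numbers the rows inside each group starting from 0.
df["c"] = key.groupby(key).cumcount() + 1
print(df["c"].tolist())  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]
```

Note that this key is only unique per run when a is strictly increasing, as it is here. If the values can go back down, two separate runs may end up with the same key and be counted as one group.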
A simple solution is to flag the rows that continue a run, use cumsum to get a running count of those rows, and then subtract the count that had accumulated by the start of each later run.
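# True where a row is exactly 1 greater than the previous row
a = df['a'].add(1).shift(1).eq(df['a'])
# Running count of continuation rows, minus the count at the start of the current run
df['c'] = a.cumsum() - a.cumsum().where(~a).ffill().fillna(0).astype(int) + 1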
df
Result:
a b c
0 1 27066 1
1 2 28155 2
2 3 49177 3
3 4 496 4
4 5 2354 5
5 15 23292 1
6 16 9358 2
7 17 19036 3
8 18 29946 4
9 203 39785 1
10 204 15843 2
11 205 21917 3
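Both approaches come down to labelling each run and then counting within it. As a variant that is not taken from the answers above, the run labels could also be built with diff and cumsum and then passed to groupby. This is only a sketch; new_run and run_id are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# A row starts a new run when it is not exactly 1 greater than the previous row.
# The first row has a NaN diff, which also compares unequal to 1.
new_run = df["a"].diff().ne(1)

# The cumulative sum of the "new run" flags gives each run its own label.
run_id = new_run.cumsum()
print(run_id.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Count rows within each labelled run, starting from 1.
df["c"] = df.groupby(run_id).cumcount() + 1
print(df["c"].tolist())  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]
```

Because the label changes at every break, this version does not depend on a being increasing overall, only on consecutive rows differing by exactly 1 within a run.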