Count consecutive numbers from a column of a dataframe in Python

I have a DataFrame in which column a contains runs of consecutive values (the values in column b do not matter):

import pandas as pd
import numpy as np
np.random.seed(150)
df = pd.DataFrame(data={'a':[1,2,3,4,5,15,16,17,18,203,204,205],'b':np.random.randint(50000,size=(12))})
>>> df
      a      b
0     1  27066
1     2  28155
2     3  49177
3     4    496
4     5   2354
5    15  23292
6    16   9358
7    17  19036
8    18  29946
9   203  39785
10  204  15843
11  205  21917

I would like to add a column c that counts the position of each row within its run of consecutive values in column a, restarting at 1 whenever a run breaks, as shown below:

a   b       c
1   27066   1
2   28155   2
3   49177   3
4   496     4
5   2354    5
15  23292   1
16  9358    2
17  19036   3
18  29946   4
203 39785   1
204 15843   2
205 21917   3

How can I do this?

One solution:

df["c"] = (s := df["a"] - np.arange(len(df))).groupby(s).cumcount() + 1
print(df)

Output:

      a      b  c
0     1  27066  1
1     2  28155  2
2     3  49177  3
3     4    496  4
4     5   2354  5
5    15  23292  1
6    16   9358  2
7    17  19036  3
8    18  29946  4
9   203  39785  1
10  204  15843  2
11  205  21917  3

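The original idea comes from the ancient Python docs: subtracting each row's position from its value in a gives a number that stays constant within a run of consecutive values, so it can serve as the group key.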
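To see why the difference works as a group key, it can help to look at the key itself. The following is a minimal sketch using the question's column a; the variable name key is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# Value minus row position: constant while the values increase by exactly 1.
key = df["a"] - np.arange(len(df))
print(key.tolist())
# [1, 1, 1, 1, 1, 10, 10, 10, 10, 194, 194, 194]

# Number the rows inside each group of equal keys, starting from 1.
df["c"] = key.groupby(key).cumcount() + 1
print(df["c"].tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]
```

Each run collapses to a single key value (1, 10, 194), and cumcount then numbers the rows within each key.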

The walrus operator (:=, also known as assignment expressions) requires Python 3.8+. On older versions you can do this instead:

s = df["a"] - np.arange(len(df))
df["c"] = s.groupby(s).cumcount() + 1
print(df)
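One caveat: the difference key assumes that a is strictly increasing, as it is in the question. If the values can go down again, two separate runs may end up with the same key. As a hedged alternative (a different technique, not the one used above), the runs can be labelled with diff and cumsum instead. The function name consecutive_count is hypothetical.

```python
import pandas as pd

def consecutive_count(values: pd.Series) -> pd.Series:
    """Count position within each run where values increase by exactly 1."""
    # A new run starts wherever the step from the previous row is not 1.
    # The first row has a NaN step, so it always starts a run.
    run_id = values.diff().ne(1).cumsum()
    return values.groupby(run_id).cumcount() + 1

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})
df["c"] = consecutive_count(df["a"])
print(df["c"].tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]

# Non-monotonic input: the runs are [1, 2], [9], [4, 5].
print(consecutive_count(pd.Series([1, 2, 9, 4, 5])).tolist())
# [1, 2, 1, 1, 2]
```

With the second input, the difference key would be 1, 1, 7, 1, 1, which merges the first and last runs into one group; the run labels built from diff keep them apart.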

Another simple solution is to flag every row that continues a run, take the cumsum of those flags to get a running count, and then subtract the count that had already accumulated when the current run started.

a = df['a'].add(1).shift(1).eq(df['a'])
df['c'] = a.cumsum() - a.cumsum().where(~a).ffill().fillna(0).astype(int) + 1
df

Result:

      a      b  c
0     1  27066  1
1     2  28155  2
2     3  49177  3
3     4    496  4
4     5   2354  5
5    15  23292  1
6    16   9358  2
7    17  19036  3
8    18  29946  4
9   203  39785  1
10  204  15843  2
11  205  21917  3
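The cumsum one-liner is easier to follow when its intermediate steps are laid out side by side. This is a sketch of the same computation split into named parts; the names continues, running, offset and steps are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# True where the row equals the previous row's value plus 1.
continues = df["a"].add(1).shift(1).eq(df["a"])

# Running total of continuation flags over the whole column.
running = continues.cumsum()

# Running total at the start of each run, carried forward through the run.
offset = running.where(~continues).ffill().fillna(0).astype(int)

steps = pd.DataFrame({
    "a": df["a"],
    "continues": continues,
    "running": running,
    "offset": offset,
    "c": running - offset + 1,
})
print(steps)
```

Here running is 0, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9 and offset is 0 for the first run, 4 for the second and 7 for the third, so running - offset + 1 restarts at 1 at the top of every run.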
