Counting consecutive numbers in a column of a dataframe in Python

Question

I have a dataframe with runs of consecutive values in column a (the values in column b do not matter):

import pandas as pd
import numpy as np
np.random.seed(150)
df = pd.DataFrame(data={'a':[1,2,3,4,5,15,16,17,18,203,204,205],'b':np.random.randint(50000,size=(12))})
>>> df
      a      b
0     1  27066
1     2  28155
2     3  49177
3     4    496
4     5   2354
5    15  23292
6    16   9358
7    17  19036
8    18  29946
9   203  39785
10  204  15843
11  205  21917

I would like to add a column c that counts sequentially within each run of consecutive values in column a, restarting at 1 whenever a run ends, as shown below:

a   b       c
1   27066   1
2   28155   2
3   49177   3
4   496     4
5   2354    5
15  23292   1
16  9358    2
17  19036   3
18  29946   4
203 39785   1
204 15843   2
205 21917   3

How can I do this?

Answer 1

One solution:

df["c"] = (s := df["a"] - np.arange(len(df))).groupby(s).cumcount() + 1
print(df)

Output:

      a      b  c
0     1  27066  1
1     2  28155  2
2     3  49177  3
3     4    496  4
4     5   2354  5
5    15  23292  1
6    16   9358  2
7    17  19036  3
8    18  29946  4
9   203  39785  1
10  204  15843  2
11  205  21917  3

The original idea comes from an old version of the Python docs.

The walrus operator (:=, an assignment expression) requires Python 3.8+. On older versions you can do this instead:

s = df["a"] - np.arange(len(df))
df["c"] = s.groupby(s).cumcount() + 1
print(df)
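
To see why this works, it may help to look at the intermediate key s. Within a run of consecutive values, both a and the row position grow by 1, so their difference stays constant. The following is a minimal sketch using only column a from the question (column b is omitted since it does not matter):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# Value minus row position: constant within a run of consecutive values.
s = df["a"] - np.arange(len(df))
print(s.tolist())
# [1, 1, 1, 1, 1, 10, 10, 10, 10, 194, 194, 194]

# cumcount numbers the rows inside each group starting from 0, hence the + 1.
df["c"] = s.groupby(s).cumcount() + 1
print(df["c"].tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]
```

Note that this key relies on a being strictly increasing, as it is in the question. If it is not, two separate runs can end up with the same difference and be counted as one group (for example, a = [1, 2, 9, 4] gives the key [1, 1, 7, 1]).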

Answer 2

A simple solution is to flag the rows that continue a consecutive run, use cumsum to get a running count, and then subtract the count accumulated before each run started.

a = df['a'].add(1).shift(1).eq(df['a'])
df['c'] = a.cumsum() - a.cumsum().where(~a).ffill().fillna(0).astype(int) + 1
df

Result:

      a      b  c
0     1  27066  1
1     2  28155  2
2     3  49177  3
3     4    496  4
4     5   2354  5
5    15  23292  1
6    16   9358  2
7    17  19036  3
8    18  29946  4
9   203  39785  1
10  204  15843  2
11  205  21917  3
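
The one-liner above can be easier to follow when split into its intermediate steps. The sketch below does the same computation on column a from the question; the names continues, total and offset are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 15, 16, 17, 18, 203, 204, 205]})

# True where the row continues a run (previous value + 1 equals this value).
continues = df["a"].add(1).shift(1).eq(df["a"])

# Running total of "continuation" rows over the whole column.
total = continues.cumsum()
print(total.tolist())
# [0, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9]

# The total as it stood at the first row of each run, carried forward through the run.
offset = total.where(~continues).ffill().fillna(0).astype(int)
print(offset.tolist())
# [0, 0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7]

# Subtracting the offset restarts the count in every run; + 1 makes it 1-based.
df["c"] = total - offset + 1
print(df["c"].tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3]
```

Because this approach only compares each row with the one before it, it does not depend on a being sorted.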

Answer 1
2 accepted 2022-07-26 09:45:49

Answer 2
1 2022-07-26 09:55:22
