How do I count the number of consecutive identical values in a column using python/pandas?

Question

Suppose I have a dataframe such as:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})

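For each row, I want to count how many times the current value has been seen in the consecutive rows up to and including that row. For the example above, the output would be: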

[1, 2, 1, 1, 2, 3, 1, 2]

I know how to group all repeated values and take a cumulative count, but I don't know how to make the count restart at each new value.

I.e.:

df['A'].groupby(df['A']).cumcount() 
# returns [0, 1, 0, 0, 1, 2, 2, 3] which is not what I want.
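
For reference, here is a self-contained reproduction of the attempt above. It is only a sketch, and the variable name `per_value` is illustrative. Grouping by the value itself pools every row that holds that value, so the counter carries on across the gap between the two runs of 1 instead of restarting:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})

# Grouping by the values themselves puts all rows with A == 1 into one group,
# so the counter continues where the earlier run of 1s left off.
per_value = df['A'].groupby(df['A']).cumcount()
print(per_value.tolist())  # [0, 1, 0, 0, 1, 2, 2, 3]
```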

Answer 1

Try this:

df.groupby((df['A'] != df['A'].shift()).cumsum()).cumcount() + 1

Output:

0    1
1    2
2    1
3    1
4    2
5    3
6    1
7    2
dtype: int64
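
If this is needed in more than one place, the one-liner could be wrapped in a small helper. The following is a minimal sketch; the function name `run_position` is hypothetical and not part of pandas.

```python
import pandas as pd


def run_position(s: pd.Series) -> pd.Series:
    """Return each value's 1-based position within its run of equal consecutive values."""
    # A new run starts wherever the value differs from the previous row.
    run_id = (s != s.shift()).cumsum()
    # Number the rows inside each run, starting from 1 instead of 0.
    return s.groupby(run_id).cumcount() + 1


df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})
result = run_position(df['A'])
print(result.tolist())  # [1, 2, 1, 1, 2, 3, 1, 2]
```

One caveat with this check: `NaN != NaN` evaluates to True, so consecutive missing values would each start a new run.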

Details

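Use an inequality check between the current row and the previous row (obtained with shift), then cumsum to create a new group for each change in 'A', then groupby and cumcount, adding 1 so that the count starts at 1 instead of 0.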

Broken down into steps

Here is a step-by-step breakdown, so you can see the progression in the dataframe columns.

df['grp'] = df['A'] != df['A'].shift()
# For numbers you can use df['A'].diff().ne(0) instead;
# however, the inequality check is more versatile because it also works for strings.
df['cumgroup'] = df['grp'].cumsum()
df['count'] = df.groupby('cumgroup').cumcount() + 1
df

Output:

   A    grp  cumgroup  count
0  1   True         1      1
1  1  False         1      2
2  2   True         2      1
3  3   True         3      1
4  3  False         3      2
5  3  False         3      3
6  1   True         4      1
7  1  False         4      2
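
To illustrate the remark about strings, the same steps can be run on a string column and cross-checked against a pure-Python version. This is a sketch under assumptions: the sample data and the names `pandas_counts` and `python_counts` are made up for the example, and `itertools.groupby` is used only as an independent check, not as part of the answer's method.

```python
from itertools import groupby

import pandas as pd

s = pd.Series(['a', 'a', 'b', 'a', 'a', 'a'])

# pandas: the same shift / cumsum / cumcount steps as above, applied to strings.
pandas_counts = (s.groupby((s != s.shift()).cumsum()).cumcount() + 1).tolist()

# Pure-Python cross-check: itertools.groupby yields one group per run of equal values.
python_counts = [i for _, run in groupby(s) for i, _ in enumerate(run, start=1)]

print(pandas_counts)  # [1, 2, 1, 1, 2, 3]
print(python_counts)  # [1, 2, 1, 1, 2, 3]
```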

Answer 1: 3, accepted, 2020-07-12 19:57:35
