Say I have a dataframe such as:
df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})
I'd like to count how many times in a row the current column value has been seen, up to and including the current row. For the above example, the output would be:
[1, 2, 1, 1, 2, 3, 1, 2]
I know how to group by and cumulatively count all repeating values, but I don't know how to get the count to restart at each new run of values.
For example:
df['A'].groupby(df['A']).cumcount()
# returns [0, 1, 0, 0, 1, 2, 2, 3] which is not what I want.
Try this method:
df.groupby((df['A'] != df['A'].shift()).cumsum()).cumcount() + 1
Output:
0 1
1 2
2 1
3 1
4 2
5 3
6 1
7 2
dtype: int64
Use an inequality check between the current row and the previous row (via shift), then cumsum
to create a new group id each time the value in 'A' changes, then groupby
and cumcount,
adding 1 so the count starts at 1 instead of zero.
Here it is broken up into steps so you can see the progression in the dataframe columns:
df['grp'] = df['A'] != df['A'].shift()
# for numbers you can use df['A'].diff().ne(0)
# however, the inequality check is more versatile because it also works for strings
df['cumgroup'] = df['grp'].cumsum()
df['count'] = df.groupby('cumgroup').cumcount() + 1
df
Output:
   A    grp  cumgroup  count
0  1   True         1      1
1  1  False         1      2
2  2   True         2      1
3  3   True         3      1
4  3  False         3      2
5  3  False         3      3
6  1   True         4      1
7  1  False         4      2
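If you need this in more than one place, the steps above can be wrapped in a small function. This is only a sketch: the name consecutive_count and the sample string data are made up for illustration and are not part of pandas. It also shows the point about strings, since diff() relies on subtraction while != does not.

```python
import pandas as pd


def consecutive_count(s: pd.Series) -> pd.Series:
    """Hypothetical helper: 1-based position of each value within
    its run of equal consecutive values."""
    # A new run starts wherever the value differs from the previous row.
    run_id = (s != s.shift()).cumsum()
    # Number the rows inside each run, starting from 1.
    return s.groupby(run_id).cumcount() + 1


nums = pd.Series([1, 1, 2, 3, 3, 3, 1, 1])
words = pd.Series(['a', 'a', 'b', 'a', 'a', 'a'])  # illustrative string data

print(consecutive_count(nums).tolist())   # [1, 2, 1, 1, 2, 3, 1, 2]
print(consecutive_count(words).tolist())  # [1, 2, 1, 1, 2, 3]
```

One thing to keep in mind: NaN never compares equal to NaN, so with this approach consecutive missing values are each treated as the start of a new run.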