I have a table like this:
name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate
I want to create a column like this:
df['count'] = [1,2,1,2,3,1,2,3,1]
Explanation: the 'count' column resets to 1 when there's a new name, OR when fill rate increases; otherwise, 'count' column equals to the last value plus 1.
It's easy to do it using loops, but I'd like to avoid this since the data is huge. Is there an alternative way to do it?
IIUC let us combine shift
with diff
and using cumsum
create the sub-group, cumcount
s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]:
0 1
1 2
2 1
3 2
4 3
5 1
6 2
7 3
8 1
dtype: int64
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.