
Count by group and reset at 1 in pandas

I have a DataFrame with 3 columns, where signal is either 0 or 1. For each symbol, I need to count the number of rows since the signal was generated, counting the signal row itself as 1. The count should start at 0 if there has been no signal yet. Sample data:

Time    symbol  signal
09:15   abc     0
09:16   abc     0
09:17   abc     0
09:18   abc     1
09:19   abc     0
09:20   abc     0
09:21   abc     0
09:22   abc     1
09:23   abc     0
09:24   abc     1
09:25   abc     1
09:26   abc     0
09:15   xyz     0
09:16   xyz     0
09:17   xyz     1
09:18   xyz     0
09:19   xyz     0
09:20   xyz     0
09:21   xyz     1
09:22   xyz     0
09:23   xyz     0
09:24   xyz     0
09:25   xyz     0
09:26   xyz     0
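To make the example reproducible, the sample can be rebuilt as a DataFrame roughly like this. This is a sketch; keeping Time as plain "HH:MM" strings is an assumption (it could also be parsed with pd.to_datetime).

```python
import pandas as pd

# Signals per symbol, copied from the sample table above
signals = {
    "abc": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    "xyz": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
}
# Time is stored as strings "09:15" ... "09:26" (assumption: no date part needed)
times = [f"09:{m}" for m in range(15, 27)]

df = pd.DataFrame(
    [(t, sym, s) for sym, sig in signals.items() for t, s in zip(times, sig)],
    columns=["Time", "symbol", "signal"],
)
print(df.head())
```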

Expected output:

Time    symbol  signal  MinsSinceSignal
09:15   abc     0       0
09:16   abc     0       0
09:17   abc     0       0
09:18   abc     1       1
09:19   abc     0       2
09:20   abc     0       3
09:21   abc     0       4
09:22   abc     1       1
09:23   abc     0       2
09:24   abc     1       1
09:25   abc     1       1
09:26   abc     0       2
09:15   xyz     0       0
09:16   xyz     0       0
09:17   xyz     1       1
09:18   xyz     0       2
09:19   xyz     0       3
09:20   xyz     0       4
09:21   xyz     1       1
09:22   xyz     0       2
09:23   xyz     0       3
09:24   xyz     0       4
09:25   xyz     0       5
09:26   xyz     0       6
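Put differently, within each symbol the expected column follows a running-counter rule: 0 before the first signal, 1 on every signal row, and +1 on each following row until the next signal. A plain-Python sketch of that rule for one symbol's signals (the function name mins_since_signal is hypothetical), handy as a reference to check a vectorized pandas version against:

```python
def mins_since_signal(signals):
    """Hypothetical reference implementation of the counting rule for one symbol."""
    out, count = [], 0
    for s in signals:
        if s == 1:
            count = 1          # a signal restarts the count at 1
        elif count > 0:
            count += 1         # rows after a signal keep counting up
        out.append(count)      # rows before any signal stay at 0
    return out

print(mins_since_signal([0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0]))
# [0, 0, 0, 1, 2, 3, 4, 1, 2, 1, 1, 2]
```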

I tried the solution from "Cumsum within group and reset on condition in pandas", but it's not working as expected:

df['G']=df.groupby('symbol').signal.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df['MinsSinceSignal']= df.groupby([df.symbol,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())

There are a couple of issues with the above code:

  1. It doesn't start with 0.
  2. When signal is 1, the count starts from the next row instead of the signal row itself.
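One detail worth double-checking in the attempt above is operator precedence: in Python, `&` and `|` bind tighter than `==`, so `x.diff().ne(0)&x==1` is parsed as `(x.diff().ne(0) & x) == 1`, which is probably not what was intended. A small sketch showing the difference, plus the explicitly parenthesized mask for "first 1 of a run" (the series x here is made-up illustration data):

```python
import pandas as pd

# `a & b == c` is parsed as `(a & b) == c`
a, b, c = 6, 3, 2
print(a & b == c)      # True, because (6 & 3) == 2 is 2 == 2
print(a & (b == c))    # 0, because 6 & False is 0

# With explicit parentheses the intended mask is unambiguous
x = pd.Series([0, 1, 1, 0, 1])
starts = x.diff().ne(0) & (x == 1)   # first row of each run of 1s
print(starts.tolist())               # [False, True, False, False, True]
```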

Please help!

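Here's a solution. First, group by symbol and take the cumulative sum of signal. Every signal increments the sum, so each stretch that starts at a signal gets its own label: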

>>> df['groups'] = df.groupby('symbol').signal.transform(lambda g:g.cumsum())
>>> print(df.head(15))
   symbol  signal  groups
0     abc       0       0
1     abc       0       0
2     abc       0       0
3     abc       1       1
4     abc       0       1
5     abc       0       1
6     abc       0       1
7     abc       1       2
8     abc       0       2
9     abc       1       3
10    abc       1       4
11    abc       0       4
12    xyz       0       0
13    xyz       0       0
14    xyz       1       1
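The lambda in this step is not strictly needed, since a GroupBy object has its own cumsum method. A self-contained sketch of the same labelling on a shorter, made-up frame (the data here is an assumption chosen for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["abc"] * 6 + ["xyz"] * 4,
    "signal": [0, 1, 0, 0, 1, 1, 0, 1, 0, 0],
})

# Each 1 bumps the running total, so every row from a signal up to the next one
# shares a label; rows before a symbol's first signal get label 0.
df["groups"] = df.groupby("symbol")["signal"].cumsum()
print(df["groups"].tolist())
# [0, 1, 1, 1, 2, 3, 0, 1, 1, 1]
```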

Next, fill a new column based on the first value of each (symbol, groups) block: if it is 0 (rows before the first signal), fill the block with 0s; otherwise fill it with 1s. These are the values we add up in the final step.

>>> df['MinsSinceSignal'] = df.groupby(['symbol', 'groups'])['groups'].transform(lambda g: min((g.values[0], 1)))
>>> print(df.head(15))
   symbol  signal  groups  MinsSinceSignal
0     abc       0       0                0
1     abc       0       0                0
2     abc       0       0                0
3     abc       1       1                1
4     abc       0       1                1
5     abc       0       1                1
6     abc       0       1                1
7     abc       1       2                1
8     abc       0       2                1
9     abc       1       3                1
10    abc       1       4                1
11    abc       0       4                1
12    xyz       0       0                0
13    xyz       0       0                0
14    xyz       1       1                1
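Because groups is constant inside each (symbol, groups) block, taking min(first value, 1) per block gives the same result as capping the column at 1 row by row. A sketch of that shortcut with Series.clip, a swapped-in alternative to the transform above (the groups values below are illustrative):

```python
import pandas as pd

groups = pd.Series([0, 0, 1, 1, 2, 3, 3, 0, 1])
# 0 stays 0 (before the first signal); any positive label becomes 1
step = groups.clip(upper=1)
print(step.tolist())  # [0, 0, 1, 1, 1, 1, 1, 0, 1]
```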

Finally, another cumsum will generate the desired result:

>>> df['MinsSinceSignal'] = df.groupby(['symbol', 'groups'])['MinsSinceSignal'].cumsum()
>>> df = df.drop('groups', axis=1)
>>> print(df)
   symbol  signal  MinsSinceSignal
0     abc       0                0
1     abc       0                0
2     abc       0                0
3     abc       1                1
4     abc       0                2
5     abc       0                3
6     abc       0                4
7     abc       1                1
8     abc       0                2
9     abc       1                1
10    abc       1                1
11    abc       0                2
12    xyz       0                0
13    xyz       0                0
14    xyz       1                1
15    xyz       0                2
16    xyz       0                3
17    xyz       0                4
18    xyz       1                1
19    xyz       0                2
20    xyz       0                3
21    xyz       0                4
22    xyz       0                5
23    xyz       0                6
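The three steps can be wrapped into one function, sketched below under the assumption that the frame has symbol and signal columns (add_mins_since_signal and the "block" label are hypothetical names). A second variant swaps in GroupBy.cumcount: number the rows inside each block starting at 1, then zero out the rows before a symbol's first signal.

```python
import pandas as pd

def add_mins_since_signal(df):
    """Hypothetical helper wrapping the three steps; returns a new frame."""
    out = df.copy()
    # Step 1: block label per symbol, incremented by every signal
    groups = out.groupby("symbol")["signal"].cumsum().rename("block")
    # Step 2: 0 for rows before the first signal, 1 otherwise
    ones = groups.clip(upper=1)
    # Step 3: running sum of those 0/1 values inside each (symbol, block)
    out["MinsSinceSignal"] = ones.groupby([out["symbol"], groups]).cumsum()
    return out

def mins_since_signal_cumcount(df):
    """Alternative sketch: 1-based position inside each block, 0 before any signal."""
    groups = df.groupby("symbol")["signal"].cumsum().rename("block")
    pos = df.groupby(["symbol", groups]).cumcount() + 1
    return pos.where(groups > 0, 0)

signals = {
    "abc": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    "xyz": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(
    [(sym, s) for sym, sig in signals.items() for s in sig],
    columns=["symbol", "signal"],
)
result = add_mins_since_signal(df)
print(result["MinsSinceSignal"].tolist())
print(mins_since_signal_cumcount(df).tolist())
```

Both variants should agree on the sample data; the cumcount one avoids building the intermediate 0/1 column.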

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM