I have a dataframe with 3 columns; signal is either 0 or 1. For each symbol, I need to count the number of rows since the most recent signal was generated, with the signal row itself counting as 1. The count should stay at 0 until the first signal appears. Sample data is as follows -
Time symbol signal
09:15 abc 0
09:16 abc 0
09:17 abc 0
09:18 abc 1
09:19 abc 0
09:20 abc 0
09:21 abc 0
09:22 abc 1
09:23 abc 0
09:24 abc 1
09:25 abc 1
09:26 abc 0
09:15 xyz 0
09:16 xyz 0
09:17 xyz 1
09:18 xyz 0
09:19 xyz 0
09:20 xyz 0
09:21 xyz 1
09:22 xyz 0
09:23 xyz 0
09:24 xyz 0
09:25 xyz 0
09:26 xyz 0
Expected output -
Time symbol signal MinsSinceSignal
09:15 abc 0 0
09:16 abc 0 0
09:17 abc 0 0
09:18 abc 1 1
09:19 abc 0 2
09:20 abc 0 3
09:21 abc 0 4
09:22 abc 1 1
09:23 abc 0 2
09:24 abc 1 1
09:25 abc 1 1
09:26 abc 0 2
09:15 xyz 0 0
09:16 xyz 0 0
09:17 xyz 1 1
09:18 xyz 0 2
09:19 xyz 0 3
09:20 xyz 0 4
09:21 xyz 1 1
09:22 xyz 0 2
09:23 xyz 0 3
09:24 xyz 0 4
09:25 xyz 0 5
09:26 xyz 0 6
I have tried the solution from "Cumsum within group and reset on condition in pandas", but it's not working as expected.
df['G']=df.groupby('symbol').signal.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df['MinsSinceSignal']= df.groupby([df.symbol,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())
There are a couple of issues with the above code.
Please help!
Here's a solution. First, group them by symbol and add a cumulative sum:
>>> df['groups'] = df.groupby('symbol').signal.transform(lambda g:g.cumsum())
>>> print(df.head(15))
symbol signal groups
0 abc 0 0
1 abc 0 0
2 abc 0 0
3 abc 1 1
4 abc 0 1
5 abc 0 1
6 abc 0 1
7 abc 1 2
8 abc 0 2
9 abc 1 3
10 abc 1 4
11 abc 0 4
12 xyz 0 0
13 xyz 0 0
14 xyz 1 1
Next, fill a new column depending on what's in the first row of each group: if the first value is 0, add 0s, else 1s. That's how much we need to add up in the final step.
>>> df['MinsSinceSignal'] = df.groupby(['symbol', 'groups'])['groups'].transform(lambda g: min((g.values[0], 1)))
>>> print(df.head(15))
symbol signal groups MinsSinceSignal
0 abc 0 0 0
1 abc 0 0 0
2 abc 0 0 0
3 abc 1 1 1
4 abc 0 1 1
5 abc 0 1 1
6 abc 0 1 1
7 abc 1 2 1
8 abc 0 2 1
9 abc 1 3 1
10 abc 1 4 1
11 abc 0 4 1
12 xyz 0 0 0
13 xyz 0 0 0
14 xyz 1 1 1
Finally, another cumsum will generate the desired result:
>>> df['MinsSinceSignal'] = df.groupby(['symbol', 'groups'])['MinsSinceSignal'].cumsum()
>>> df = df.drop('groups', axis=1)
>>> print(df)
symbol signal MinsSinceSignal
0 abc 0 0
1 abc 0 0
2 abc 0 0
3 abc 1 1
4 abc 0 2
5 abc 0 3
6 abc 0 4
7 abc 1 1
8 abc 0 2
9 abc 1 1
10 abc 1 1
11 abc 0 2
12 xyz 0 0
13 xyz 0 0
14 xyz 1 1
15 xyz 0 2
16 xyz 0 3
17 xyz 0 4
18 xyz 1 1
19 xyz 0 2
20 xyz 0 3
21 xyz 0 4
22 xyz 0 5
23 xyz 0 6
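For reference, the same three steps can probably be condensed a bit. The sketch below is not the code from the answer above but a variant of it: it assumes the column names from the question (symbol, signal), and the helper name mins_since_signal and the intermediate name block are made up for illustration. It numbers the rows inside each (symbol, running signal count) block with cumcount and zeroes out the rows that come before the first signal.

```python
import pandas as pd

# Sample data from the question (Time column omitted, it is not needed here).
df = pd.DataFrame({
    "symbol": ["abc"] * 12 + ["xyz"] * 12,
    "signal": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
               0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

def mins_since_signal(frame):
    """Hypothetical helper: rows since the latest signal, per symbol."""
    # Running number of signals seen so far within each symbol;
    # every new signal starts a new block.
    block = frame.groupby("symbol")["signal"].cumsum().rename("block")
    # 1-based position of each row inside its (symbol, block) run.
    position = frame.groupby(["symbol", block]).cumcount() + 1
    # Rows before the first signal of a symbol (block == 0) stay at 0.
    return position.where(block > 0, 0)

result = mins_since_signal(df)
df["MinsSinceSignal"] = result
print(df)
```

This relies on the rows already being sorted by time within each symbol, as they are in the sample data.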