Pandas DataFrame create new columns based on a logic dependent on other columns with cumulative counting rule

I have a DataFrame originally as follows:

d1={'on':[0,1,0,1,0,0,0,1,0,0,0],'off':[0,0,0,0,0,0,1,0,1,0,1]}

[Image: original DataFrame]

My end objective is to add a new column 'final' that shows 1 once the 'on' indicator is triggered (ignoring any duplicate 'on' signals) and switches back to 0 when the 'off' indicator is triggered, but ONLY if 'final' has already been on for 3 rows. I tried to come up with code for this but could not work out how to tackle it.

My desired output is as follows:

[Image: desired output]

Column 'final' is first triggered in row 1, when the 'on' indicator switches to 1. The 'on' indicator in row 3 is ignored because it is a redundant signal. The 'off' indicator in row 6 takes effect and 'final' switches back to 0, because it has already been on for more than 3 rows. Row 8 is different: the 'off' indicator is triggered there, but 'final' was only switched on in row 7, so it cannot be switched off yet. It stays at 1 until the next 'off' indicator in row 10, because only by then has 'final' been on for long enough (rows 7 to 9).

Thank you for your help, much appreciated.

One solution uses a "state machine" implemented as a generator with yield:

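import pandas as pd

d1 = {'on': [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0], 'off': [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]}


def state_machine():
    # The first send() delivers the first row's (on, off) pair.
    on, off = yield
    cnt, current = 0, on
    while True:
        # Latch on: a repeated 'on' signal while already on changes nothing.
        current = int(on or current)
        # Count the rows spent in the on state, including this one.
        cnt += current

        # 'off' only takes effect once the state has been on for more than 3 rows.
        if off and cnt > 3:
            cnt = 0
            current = 0

        on, off = yield current


machine = state_machine()
next(machine)  # prime the generator so it is waiting at the first yield

df = pd.DataFrame(d1)
# Feed each row's signals to the generator; it returns that row's 'final' value.
df['final'] = df.apply(lambda x: machine.send((x['on'], x['off'])), axis=1)

print(df)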

Prints:

    on  off  final
0    0    0      0
1    1    0      1
2    0    0      1
3    1    0      1
4    0    0      1
5    0    0      1
6    0    1      0
7    1    0      1
8    0    1      1
9    0    0      1
10   0    1      0
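
The generator above has to be primed with next() before the first send(), because a value can only be sent to a generator that is paused at a yield. As a rough sketch of the same idea without pandas, the machine can also be driven from plain lists, so the result does not rely on apply calling the function exactly once per row, in order. The min_rows parameter and the on_col / off_col names are my own additions for illustration, not part of the answer:

```python
def state_machine(min_rows=3):
    """Yield the 'final' flag for each (on, off) pair sent in.

    min_rows is a hypothetical parameter replacing the hard-coded 3.
    """
    on, off = yield                    # the first send() delivers the first pair
    cnt, current = 0, 0
    while True:
        current = int(on or current)   # latch on; a repeated 'on' changes nothing
        cnt += current                 # rows spent in the on state so far
        if off and cnt > min_rows:     # 'off' is ignored until on long enough
            cnt, current = 0, 0
        on, off = yield current


on_col = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
off_col = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]

machine = state_machine()
next(machine)  # prime: run up to the first bare yield
final = [machine.send(pair) for pair in zip(on_col, off_col)]
print(final)
```

With a DataFrame, the same list comprehension should work over zip(df['on'], df['off']), and the resulting list can be assigned to df['final'].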
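
Another solution, looping over the rows with iterrows:

import pandas as pd

d1 = {'on': [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0], 'off': [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]}
df = pd.DataFrame(d1)
df['final'], status, hook = 0, 0, 0

# Assumes the default integer index, since index - hook is used as a row distance.
for index, row in df.iterrows():
    # Remember the row where the state was switched on; a repeated 'on' while already on is ignored.
    hook = index if row['on'] and not status else hook
    status = int((row['on'] or status) and (not (row['off'] and index - hook > 2)))
    # Write through df.at: iterrows may yield a copy, so assigning to row['final'] is not guaranteed to update df.
    df.at[index, 'final'] = status
print(df)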

Output:

    on  off  final
0    0    0      0
1    1    0      1
2    0    0      1
3    1    0      1
4    0    0      1
5    0    0      1
6    0    1      0
7    1    0      1
8    0    1      1
9    0    0      1
10   0    1      0
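
Both answers walk the rows in order, because each 'final' value depends on the state left by the previous rows. A minimal sketch of that logic as a plain function, assuming the signals are available as two equal-length sequences; the names latch and min_gap are hypothetical, chosen only for this example:

```python
def latch(on, off, min_gap=3):
    """Return the 'final' flags for paired on/off signals.

    min_gap (hypothetical) is the number of rows that must separate the
    switching-on row from an 'off' row before that 'off' takes effect.
    """
    final, active, start = [], False, 0
    for i, (on_i, off_i) in enumerate(zip(on, off)):
        if on_i and not active:
            # Switch on and remember where; 'on' while already on is ignored.
            active, start = True, i
        elif off_i and active and i - start >= min_gap:
            # Switch off only if the state has been on long enough.
            active = False
        final.append(int(active))
    return final


on_col = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
off_col = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
result = latch(on_col, off_col)
print(result)
```

Applied to the DataFrame from the question, this would be df['final'] = latch(df['on'], df['off']). Because it uses the position from enumerate rather than the index labels, it does not depend on the DataFrame having a default integer index.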
