Find time intervals that satisfy certain conditions using a pandas DataFrame

Is there a better way to find the time intervals that have consecutive 1s in cond1 and contain a 1 in cond2?

I used to iterate through the df row by row, but now I want better performance.

Sample input:

import datetime as dt
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022,1,1), dt.datetime(2022,1,15)),
                   'cond1':[0,0,0,1,1,1,0,0,0,1,1,1,0,0,0],
                   'cond2':[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0]})
print(df)

         date  cond1  cond2
0  2022-01-01      0      0
1  2022-01-02      0      0
2  2022-01-03      0      0
3  2022-01-04      1      0
4  2022-01-05      1      1
5  2022-01-06      1      0
6  2022-01-07      0      0
7  2022-01-08      0      0
8  2022-01-09      0      0
9  2022-01-10      1      0
10 2022-01-11      1      0
11 2022-01-12      1      0
12 2022-01-13      0      0
13 2022-01-14      0      0
14 2022-01-15      0      0

Sample output:

  cond2_date cond1_start_date cond1_end_date  duration
0 2022-01-05       2022-01-04     2022-01-06         3

Edit:

I used the cumsum() method, and here is my new problem:

Is there a way I can use the groupby result as a DataFrame?

groups = df.groupby(df.cond1.shift().ne(df.cond1).cumsum())
rows = []
for _, group in groups:  # iterating a GroupBy yields (key, sub-DataFrame) pairs
    if group['cond1'].all() and group['cond2'].any():
        rows.append({'cond2_date': group.loc[group['cond2'] == 1, 'date'].iloc[0],
                     'cond1_start_date': group['date'].iloc[0],
                     'cond1_end_date': group['date'].iloc[-1],
                     'duration': len(group)})  # number of days in the run
out = pd.DataFrame(rows)
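To get the groupby result directly as a DataFrame without the Python loop, one option is named aggregation in `GroupBy.agg`, which returns one row per group. This is a minimal sketch, assuming the sample `df` above and assuming that a run containing several cond2 == 1 rows should report only its first cond2 date; the helper columns `run_id` and `cond2_only` are hypothetical names introduced here.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022, 1, 1), dt.datetime(2022, 1, 15)),
                   'cond1': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
                   'cond2': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})

# run_id (hypothetical helper): increases by 1 every time cond1 changes value,
# so each run of equal cond1 values gets its own label.
# cond2_only (hypothetical helper): the date where cond2 == 1, NaT elsewhere.
tmp = df.assign(run_id=df['cond1'].ne(df['cond1'].shift()).cumsum(),
                cond2_only=df['date'].where(df['cond2'].eq(1)))

# Named aggregation: one output row per run, one output column per (column, function) pair.
runs = tmp.groupby('run_id').agg(cond1=('cond1', 'first'),
                                 cond2_date=('cond2_only', 'first'),  # first non-NaT date in the run
                                 cond1_start_date=('date', 'min'),
                                 cond1_end_date=('date', 'max'),
                                 duration=('date', 'size'))  # number of rows (days) in the run

# Keep runs of 1s that contain at least one cond2 == 1, then drop the helper column.
out = (runs[(runs['cond1'] == 1) & runs['cond2_date'].notna()]
       .drop(columns='cond1')
       .reset_index(drop=True))
print(out)
```

Every step here is a vectorized pandas operation, so the per-group Python loop and the row-by-row DataFrame building are avoided.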
One approach is to mark the first and last row of each run of 1s in cond1 with a boolean mask:

import pandas as pd
import datetime as dt

#creating the dataframe
df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022,1,1), dt.datetime(2022,1,15)),
                   'cond1':[0,0,0,1,1,1,0,0,0,1,1,1,0,0,0],
                   'cond2':[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0]})

#mark rows as True or False by comparing 'cond1' to 1
is_B = df['cond1'].eq(1)

#creating masks: a run of 1s starts where the previous row is not 1
#and ends where the next row is not 1 (this also handles single-day runs)
starts = is_B & ~is_B.shift(1, fill_value=False)
ends = is_B & ~is_B.shift(-1, fill_value=False)

#creating a list of (start_date, end_date) pairs from the two masks
l = list(zip(df.loc[starts, 'date'], df.loc[ends, 'date']))

#creating a dataframe from the list
dx = pd.DataFrame(l, columns = ['cond1_start_date', 'cond1_end_date'])

#add a column listing every date in the run (cond2 is filtered below)
dx['cond2_date'] = dx.apply(lambda row : pd.date_range(row['cond1_start_date'], row['cond1_end_date'], freq='D').tolist(), axis = 1)

dx['duration'] = (dx['cond1_end_date'] - dx['cond1_start_date']).dt.days + 1

#exploding the column of dates to make a new row for each value, then keeping only dates where cond2 == 1
dx = dx.explode('cond2_date', ignore_index = True)
dx['cond2_date'] = pd.to_datetime(dx['cond2_date'])
dx = dx[dx['cond2_date'].isin(df.loc[df['cond2'].eq(1), 'date'])].reset_index(drop = True)
print(dx)
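Exploding every date of every run creates one row per day, so the intermediate table grows with run length. A sketch of an alternative, assuming `date` is sorted ascending: build one row per run from start/end masks, then use `pd.merge_asof` to match each cond2 == 1 date to the most recent run start on or before it, and keep the match only if the date is also on or before that run's end. This keeps only runs that contain a cond2 == 1 date, as the question asks.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022, 1, 1), dt.datetime(2022, 1, 15)),
                   'cond1': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
                   'cond2': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})

is_1 = df['cond1'].eq(1)
# A run of 1s starts where the previous row is not 1 and ends where the next row is not 1.
starts = is_1 & ~is_1.shift(1, fill_value=False)
ends = is_1 & ~is_1.shift(-1, fill_value=False)

# One row per run; starts and ends appear in the same order, so they pair up positionally.
runs = pd.DataFrame({'cond1_start_date': df.loc[starts, 'date'].to_numpy(),
                     'cond1_end_date': df.loc[ends, 'date'].to_numpy()})
events = df.loc[df['cond2'].eq(1), ['date']].rename(columns={'date': 'cond2_date'})

# For each cond2 date, take the latest run starting on or before it (both inputs must be sorted).
matched = pd.merge_asof(events, runs, left_on='cond2_date',
                        right_on='cond1_start_date', direction='backward')

# Keep only events inside the matched run; events before any run get NaT and are dropped.
out = (matched[matched['cond2_date'] <= matched['cond1_end_date']]
       .assign(duration=lambda d: (d['cond1_end_date'] - d['cond1_start_date']).dt.days + 1)
       .reset_index(drop=True))
print(out)
```

The size of `runs` is one row per run and `events` one row per cond2 == 1 date, independent of how many days each run spans.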

If you need 'cond1' to be 1 both before and after the event (the row where 'cond1' and 'cond2' are both equal to 1), you can do the following. First, a list of the indexes where these events occur is created (ind). Then these indexes are iterated over in a loop. Inside it there are two nested loops: one checks whether 'cond1' == 1 in the preceding rows, the other checks the following rows. If there is at least one such value both before and after, the dates are output using the indexes, together with the length of the continuous 'cond1' sequence.

import datetime as dt
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022,1,1), dt.datetime(2022,1,15)),
                   'cond1':[0,0,0,1,1,1,0,0,0,1,1,1,0,0,0],
                   'cond2':[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0]})

ind = df[(df['cond1'] == 1) & (df['cond2'] == 1)].index

for i in range(0, len(ind)):
    cond1_start_date = -1
    cond1_end_date = -1
    duration = 0
    ppp = 0
    for y in range((ind[i] -1), -1, -1):
        if df.loc[y, 'cond1'] == 1:  # extend the run backwards while cond1 is 1; stop at the first 0
            cond1_start_date = y
        else:
            break
    if cond1_start_date > -1:  # a 1 precedes the event, so now scan forwards
        for x in range(ind[i] + 1, len(df)):
            if df.loc[x, 'cond1'] == 1:  # extend the run forwards while cond1 is 1; stop at the first 0
                cond1_end_date = x
            else:
                break

        if cond1_end_date > -1:  # a 1 also follows the event, so show the data
            duration = cond1_end_date - cond1_start_date + 1
            cond1_start_date = df.loc[cond1_start_date, 'date']
            cond1_end_date = df.loc[cond1_end_date, 'date']
            cond2_date = df.loc[ind[i], 'date']
            print('cond2_date', cond2_date, 'cond1_start_date', cond1_start_date, 'cond1_end_date', cond1_end_date,
                  'duration', duration)

Output

cond2_date 2022-01-05 00:00:00 cond1_start_date 2022-01-04 00:00:00 cond1_end_date 2022-01-06 00:00:00 duration 3
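The two inner loops first check whether the rows directly before and after the event also have cond1 == 1, and then walk to the ends of that run. A minimal loop-free sketch of the same logic, assuming the sample `df`: `shift` tests the neighboring rows, and `groupby(...).transform` broadcasts each run's first date, last date and length back onto its rows. The names `run_id`, `run_start`, `run_end`, `run_len` and `event` are hypothetical helpers introduced here.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(dt.datetime(2022, 1, 1), dt.datetime(2022, 1, 15)),
                   'cond1': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
                   'cond2': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})

is_1 = df['cond1'].eq(1)
# Run labels via the cumsum trick: a new label every time cond1 changes value.
run_id = df['cond1'].ne(df['cond1'].shift()).cumsum()

# Broadcast each run's first date, last date and length onto every row of that run.
dates_by_run = df.groupby(run_id)['date']
run_start = dates_by_run.transform('min')
run_end = dates_by_run.transform('max')
run_len = dates_by_run.transform('count')

# Event rows: cond1 and cond2 are both 1, and cond1 is also 1 on the row
# directly before and directly after (the condition the inner loops test).
event = (is_1 & df['cond2'].eq(1)
         & is_1.shift(1, fill_value=False)
         & is_1.shift(-1, fill_value=False))

out = pd.DataFrame({'cond2_date': df.loc[event, 'date'],
                    'cond1_start_date': run_start[event],
                    'cond1_end_date': run_end[event],
                    'duration': run_len[event]}).reset_index(drop=True)
print(out)
```

Each output row corresponds to one event, so a run containing two qualifying cond2 dates appears twice, just as the loop version prints it twice.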

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.