
Python: Function to fill in the row before a non-null value

I have a dataset consisting mostly of timedelta values that represent the shift lengths worked by emergency workers. If certain conditions were met, the shift time was combined with the previous shift's length, and the result is stored in the ['Combined Time'] column.

What I'm having trouble getting is the 'Final Times' column. To avoid double-counting hours worked, if a shift was combined (for example, row 3 and row 6), then the previous row should show NaT or 0:00 hours, and every other row should return the value from the ['Shift Time'] column.


I've been trying to write a function that I can apply to get the ['Final Times'] column, but I'm having trouble specifically with accessing the row before each 'Combined Time' value. What I have so far gets me two-thirds of the way there, but I'm completely lost on the part (a second if or elif branch) that should fill in the NaT/zero values.

import pandas as pd

def my_func(x):
    # Use the combined length when this shift was merged with the previous one
    if pd.notnull(x['Combined Time']):
        return x['Combined Time']
    else:
        return x['Shift Time']

df['Final Times'] = df.apply(my_func, axis=1)

Any assistance would be much appreciated!

Cheers
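For the missing elif branch, one option (a sketch, not the accepted answer's method) is to note that a row-wise function only sees its own row, so the next row's 'Combined Time' has to be copied onto the current row first with Series.shift(-1). The helper column name 'Next Combined' is hypothetical, and the sample data is assumed to match the table in the answer's Load data block.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the example table (NaT where nothing was combined)
df = pd.DataFrame({
    'Shift Time': pd.to_timedelta(['13:00:00', '7:00:00', '1:19:00',
                                   '7:00:00', '14:00:00', '2:00:00']),
    'Combined Time': pd.to_timedelta([np.nan, np.nan, '8:19:48',
                                      np.nan, np.nan, '16:00:00']),
})

# Hypothetical helper column: the *next* row's Combined Time, shifted up one row
df['Next Combined'] = df['Combined Time'].shift(-1)

def my_func(x):
    if pd.notnull(x['Combined Time']):
        # This row absorbed the previous shift: report the combined length
        return x['Combined Time']
    elif pd.notnull(x['Next Combined']):
        # The next row absorbed this shift: blank it to avoid double counting
        return pd.NaT
    else:
        return x['Shift Time']

df['Final Times'] = df.apply(my_func, axis=1)
df = df.drop(columns='Next Combined')  # helper no longer needed
print(df)
```

Because the combined check comes first, a row that is itself combined keeps its value even if the following row is also combined.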

You can use pandas where() together with bfill() to put a "check" value (a zero timedelta) in the row just before each combined time; my_func() can then test for that value when calculating "Final Times".

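# Rows directly before a combined value get 0:00 as a "check" value;
# all other rows keep their original Combined Time
df['Combined Time'] = df['Combined Time'].where(
    df['Combined Time'].bfill(limit=1).isnull(),
    df['Combined Time'].fillna(pd.Timedelta('0:00:00')))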
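To see why the where() call marks exactly the right rows, it can help to look at the intermediate mask on its own. The minimal sketch below assumes a standalone Series holding the example 'Combined Time' values (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Example Combined Time values (NaT where no combination happened)
combined = pd.Series(pd.to_timedelta([np.nan, np.nan, '8:19:48',
                                      np.nan, np.nan, '16:00:00']))

# bfill(limit=1) copies each non-null value one row up, so the row just
# before a combined value is no longer null
filled = combined.bfill(limit=1)

# True only for rows that are neither combined nor directly before a combined row
keep_mask = filled.isnull()
print(keep_mask.tolist())  # [True, False, False, True, False, False]

# where() keeps the original value where the mask is True; elsewhere it takes the
# zero-filled column: 0:00 for the "previous" rows, the value itself for combined rows
marked = combined.where(keep_mask, combined.fillna(pd.Timedelta(0)))
print(marked)
```

The limit=1 is what stops the fill from reaching further back than one row.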

Modified function:

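def my_func(x):
    if pd.notnull(x['Combined Time']):
        # 0:00 is the "check" value marking the row before a combined shift
        if x['Combined Time'] == pd.Timedelta('0:00:00'):
            return pd.NaT
        else:
            return x['Combined Time']
    else:
        # Not combined and not followed by a combined shift
        return x['Shift Time']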
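If apply() is slow on a large frame, a vectorized variant is possible. This is an alternative sketch (not part of the answer above) that uses Series.fillna() and Series.mask() instead of a row-wise function, and it leaves 'Combined Time' unmodified. The sample data is assumed to match the example.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the example table
df = pd.DataFrame({
    'Shift Time': pd.to_timedelta(['13:00:00', '7:00:00', '1:19:00',
                                   '7:00:00', '14:00:00', '2:00:00']),
    'Combined Time': pd.to_timedelta([np.nan, np.nan, '8:19:48',
                                      np.nan, np.nan, '16:00:00']),
})

# Start from the combined value where there is one, else the raw shift time
final = df['Combined Time'].fillna(df['Shift Time'])

# Blank rows whose next row was combined (unless they are combined themselves)
blank = df['Combined Time'].shift(-1).notna() & df['Combined Time'].isna()
df['Final Times'] = final.mask(blank)  # mask() inserts NaT where blank is True
print(df)
```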

Apply:

df['Final Times'] = df.apply(my_func, axis=1)
df

Result:

    Shift Time       Combined Time      Final Times
0   0 days 13:00:00  NaT                0 days 13:00:00
1   0 days 07:00:00  0 days 00:00:00    NaT
2   0 days 01:19:00  0 days 08:19:48    0 days 08:19:48
3   0 days 07:00:00  NaT                0 days 07:00:00
4   0 days 14:00:00  0 days 00:00:00    NaT
5   0 days 02:00:00  0 days 16:00:00    0 days 16:00:00

Load data:
(Please paste your data as formatted code instead of screenshots.)

import numpy as np
import pandas as pd

df = pd.DataFrame({'Shift Time': [pd.Timedelta('13:00:00'),
                                  pd.Timedelta('7:00:00'),
                                  pd.Timedelta('1:19:00'),
                                  pd.Timedelta('7:00:00'),
                                  pd.Timedelta('14:00:00'),
                                  pd.Timedelta('2:00:00')],
                   'Combined Time': [np.nan, np.nan,
                                     pd.Timedelta('8:19:48'),
                                     np.nan,
                                     np.nan,
                                     pd.Timedelta('16:00:00')],
                   # [np.nan] * 6 is a six-element list; np.nan * 6 is just a scalar NaN
                   'Final Times': [np.nan] * 6})
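Since the point of 'Final Times' is to avoid double-counting, a natural follow-up is to total it. The sketch below runs the answer's steps end to end on the data above and sums the result; Series.sum() skips NaT by default. Converting with pd.to_timedelta() before summing is a precaution (an assumption on my part) in case apply() returns an object-dtype column.

```python
import numpy as np
import pandas as pd

# Data from the answer's Load data block (Combined Time is NaT where nothing was merged)
df = pd.DataFrame({
    'Shift Time': pd.to_timedelta(['13:00:00', '7:00:00', '1:19:00',
                                   '7:00:00', '14:00:00', '2:00:00']),
    'Combined Time': pd.to_timedelta([np.nan, np.nan, '8:19:48',
                                      np.nan, np.nan, '16:00:00']),
})

# Step 1: mark the row before each combined value with a zero timedelta
ct = df['Combined Time']
df['Combined Time'] = ct.where(ct.bfill(limit=1).isnull(),
                               ct.fillna(pd.Timedelta(0)))

# Step 2: row-wise choice, as in the answer's my_func
def my_func(x):
    if pd.notnull(x['Combined Time']):
        if x['Combined Time'] == pd.Timedelta(0):
            return pd.NaT
        return x['Combined Time']
    return x['Shift Time']

df['Final Times'] = df.apply(my_func, axis=1)

# Step 3: total hours without double counting (NaT rows are skipped)
total = pd.to_timedelta(df['Final Times']).sum()
print(total)  # 1 days 20:19:48
```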

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 