Pandas functions with rolling window

I'm working on a neonatal project. In short, neonates are assigned a score based on the symptoms they show at a given time point. Based on how their scores change over time, we decide whether to increase the medicine dosage, keep it the same, or wean them off. We denote these three states numerically as +1 (increase), 0 (maintain), or -1 (wean), so that each time point has an associated decision value. The rules for deciding are as follows:

  • Increase dosage if sum of 3 consecutive scores >= 24 OR a single score is >= 12 (+1).
  • Lower dose if there's at least 48 hours without needing to increase dose, the sum of the 3 most recent scores is <18, AND no single score is >8 (-1).
  • Maintain dose otherwise (0)
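
To make the three rules concrete, here is a minimal sketch of them as a plain Python function for a single time point. The function name `decide` and its two parameters are hypothetical. It also makes assumptions the rules leave open: weaning requires three scores to be available, "no single score is >8" is checked over the three most recent scores, and a baby with no previous increase counts as having gone 48 hours without one.

```python
from typing import Optional, Sequence


def decide(last_scores: Sequence[float], hours_since_increase: Optional[float]) -> int:
    """Return +1 (increase), 0 (maintain) or -1 (wean) for one time point.

    last_scores: the most recent scores, oldest first (may hold fewer than 3).
    hours_since_increase: hours since the last +1, or None if there was none
    (assumption: None counts as "48 hours without an increase").
    """
    recent = list(last_scores[-3:])
    have_three = len(recent) == 3
    total = sum(recent)

    # Increase: a single score >= 12, or three consecutive scores summing to >= 24.
    if recent[-1] >= 12 or (have_three and total >= 24):
        return 1

    # Wean: 48h without an increase, sum of last 3 < 18, and no score > 8.
    calm_48h = hours_since_increase is None or hours_since_increase >= 48
    if have_three and calm_48h and total < 18 and max(recent) <= 8:
        return -1

    # Otherwise maintain.
    return 0
```

For example, `decide([5, 3, 7], 43)` gives 0 because the last increase is too recent, while `decide([5, 3, 7], 50)` gives -1.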

Here is some sample code:

import pandas as pd

df = pd.DataFrame({
   'baby': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
   'dateandtime':  ['7/20/2009  5:00:00 PM', '7/18/2009  5:00:00 PM', '7/18/2009  7:00:00 PM', '7/17/2009  6:00:00 AM','7/17/2009  12:01:00 AM', '7/14/2009  12:01:00 AM', '7/19/2009  5:00:00 AM', '7/16/2009  9:00:00 PM','7/19/2009  9:00:00 AM', '7/14/2009  6:00:00 PM', '7/15/2009  3:04:00 PM', '7/20/2009  5:00:00 PM','7/16/2009  12:01:00 AM', '7/18/2009  1:00:00 PM', '7/16/2009  6:00:00 AM', '7/13/2009  9:00:00 PM','7/19/2009  1:00:00 AM','7/15/2009  12:04:00 AM'],
   'score':  [6, 3, 7, 5, 13, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6],
    })

df.dateandtime = pd.to_datetime(df['dateandtime']) # change column type for ease of indexing
df = df.set_index('dateandtime')
df.sort_index(inplace = True)
df = df[~df.index.duplicated()] # Drop rows whose timestamp repeats (note: this also drops a row when two babies share a timestamp)

#Calculate conditions
df['sum_3_scores'] = df.groupby('baby')['score'].rolling(3).sum().reset_index(0,drop=True)
df['max_1_score'] = df.groupby('baby')['score'].rolling(1).max().reset_index(0,drop=True)
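# rolling() has no max_periods argument, so the lambda keeps only the last 3 scores inside each 48h window
df['sum_3_scores_48hours'] = df.groupby('baby')['score'].rolling('48h').apply(lambda x: x[-3:].sum(), raw=True).reset_index(0,drop=True)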

#scoring logic
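def score(data):
    # increase: 3 consecutive scores sum to >= 24, or a single score >= 12
    if data['sum_3_scores'] >= 24 or data['max_1_score'] >= 12:
        return 1
    # wean: recent sum < 18 and no single score > 8 (the 48-hour wait after a +1 is not checked here)
    if data['sum_3_scores_48hours'] < 18 and data['max_1_score'] <= 8:
        return -1
    # maintain otherwise
    return 0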

df['rule (original)'] = df.apply(score, axis = 1)

#just for a nicely ordered output
df.drop(['sum_3_scores','sum_3_scores_48hours'], axis=1, inplace=True)
df.sort_values(by=['baby', 'dateandtime'],inplace=True)
print(df)

This produces output in the format I'm going for:

                    baby  score  max_1_score  rule (original)
dateandtime                                                  
2009-07-14 00:01:00    A     14         14.0                1
2009-07-16 21:00:00    A      4          4.0                0
2009-07-17 00:01:00    A     13         13.0                1
2009-07-17 06:00:00    A      5          5.0                0
2009-07-18 17:00:00    A      3          3.0                0
2009-07-18 19:00:00    A      7          7.0               -1
2009-07-19 05:00:00    A      5          5.0               -1
2009-07-19 09:00:00    A     11         11.0                0
2009-07-13 21:00:00    B     12         12.0                1
2009-07-14 18:00:00    B      4          4.0                0
2009-07-15 00:04:00    B      6          6.0                0
2009-07-15 15:04:00    B      4          4.0               -1
2009-07-16 00:01:00    B      7          7.0               -1
2009-07-16 06:00:00    B      6          6.0               -1
2009-07-18 13:00:00    B      4          4.0               -1
2009-07-19 01:00:00    B      6          6.0               -1
2009-07-20 17:00:00    B      6          6.0               -1

Everything does what I want, except that it ignores the part of the lower-dose rule that says "Lower dose if there's at least 48 hours without needing to increase dose" (in other words, after a +1, I can't produce a -1 until at least 48 hours later). For example, the dosage is increased at "2009-07-17 00:01:00", but the code then says to lower the dose at "2009-07-18 19:00:00", which is less than 48 hours later. I know the issue is in my "def score(data)" function, but I'm not sure how to modify it so that it knows not to produce -1 when a time point is less than 48 hours after a dosage increase.

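The following computes, for each baby, the number of whole days spanned by each window of three consecutive scores, and adds it as a new "days" column: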

import pandas as pd

df = pd.DataFrame( 
    { 
        'baby': [ 
            'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B' 
        ],  
        'dateandtime':  [ 
            '7/20/2009  5:00:00 PM', '7/18/2009  5:00:00 PM', '7/18/2009  7:00:00 PM', '7/17/2009  6:00:00 AM', 
            '7/17/2009  12:01:00 AM', '7/14/2009  12:01:00 AM', '7/19/2009  5:00:00 AM', '7/16/2009  9:00:00 PM', 
            '7/19/2009  9:00:00 AM', '7/14/2009  6:00:00 PM', '7/15/2009  3:04:00 PM', '7/20/2009  5:00:00 PM', 
            '7/16/2009  12:01:00 AM', '7/18/2009  1:00:00 PM', '7/16/2009  6:00:00 AM', '7/13/2009  9:00:00 PM', 
            '7/19/2009  1:00:00 AM','7/15/2009  12:04:00 AM' 
        ], 
       'score':  [ 
           6, 3, 7, 5, 13, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6 
       ] 
    } 
)

df["dateandtime"] = pd.to_datetime(df['dateandtime'])
df = df.set_index('dateandtime').sort_index()
df = df[~df.index.duplicated()]

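# For each baby, take every window of 3 consecutive scores and record how many
# whole days lie between its first and last timestamp.
# raw=False passes each window as a Series, so its datetime index is available.
ndays = (
    df.assign(days=0)
    .groupby("baby")["days"].rolling(3)
    .apply(lambda window: (window.index.max() - window.index.min()).days, raw=False)
)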

df = df.reset_index().merge(ndays, on=["dateandtime", "baby"]).set_index("dateandtime")


You can then apply your scoring logic using this new column.
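
Note that "days" measures the span of each three-score window, not the time since the last dose increase. As an alternative, here is a minimal sketch that tracks the time since the most recent +1 for each baby and blocks -1 until 48 hours have passed. It uses a subset of the question's data. The names `sum3`, `increase`, `last_increase`, `calm_48h`, `wean`, and `rule` are made up for illustration. It assumes that a baby with no earlier +1 passes the 48-hour check, that weaning requires three scores, and that the single-score check applies to the current score, as in the question's code.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: all of baby A, first five rows of baby B.
df = pd.DataFrame({
    "baby": ["A"] * 8 + ["B"] * 5,
    "dateandtime": pd.to_datetime([
        "2009-07-14 00:01", "2009-07-16 21:00", "2009-07-17 00:01",
        "2009-07-17 06:00", "2009-07-18 17:00", "2009-07-18 19:00",
        "2009-07-19 05:00", "2009-07-19 09:00",
        "2009-07-13 21:00", "2009-07-14 18:00", "2009-07-15 00:04",
        "2009-07-15 15:04", "2009-07-16 00:01",
    ]),
    "score": [14, 4, 13, 5, 3, 7, 5, 11, 12, 4, 6, 4, 7],
})
df = df.sort_values(["baby", "dateandtime"]).reset_index(drop=True)

# Sum of the 3 most recent scores per baby (NaN until 3 scores exist).
sum3 = df.groupby("baby")["score"].transform(lambda s: s.rolling(3).sum())

# +1 depends only on the scores, so it can be computed first.
increase = (sum3 >= 24) | (df["score"] >= 12)

# Timestamp of the most recent +1 per baby, carried forward in time.
last_increase = df["dateandtime"].where(increase).groupby(df["baby"]).ffill()

# True when there was no earlier +1 (assumption) or it was >= 48 hours ago.
since = df["dateandtime"] - last_increase
calm_48h = last_increase.isna() | (since >= pd.Timedelta(hours=48))

# -1 only when the 48-hour condition and both score conditions hold.
wean = ~increase & calm_48h & (sum3 < 18) & (df["score"] <= 8)

df["rule"] = np.select([increase, wean], [1, -1], default=0)
print(df)
```

On this subset, baby A's row at 2009-07-18 19:00 becomes 0 because it falls less than 48 hours after the increase at 2009-07-17 00:01, while the row at 2009-07-19 05:00 is still -1.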
