I'm working on a neonatal project. In short, neonates are assigned a score based on the symptoms they show at a given time point. Based on how their scores change over time, we decide whether to increase the medicine dosage, keep it the same, or wean them off. We encode these 3 actions numerically as +1 (increase), 0 (maintain), or -1 (wean), so that each time point has an associated value. The rules to decide what to do are as follows:
Here is some sample code:
import pandas as pd
df = pd.DataFrame({
'baby': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
'dateandtime': ['7/20/2009 5:00:00 PM', '7/18/2009 5:00:00 PM', '7/18/2009 7:00:00 PM', '7/17/2009 6:00:00 AM','7/17/2009 12:01:00 AM', '7/14/2009 12:01:00 AM', '7/19/2009 5:00:00 AM', '7/16/2009 9:00:00 PM','7/19/2009 9:00:00 AM', '7/14/2009 6:00:00 PM', '7/15/2009 3:04:00 PM', '7/20/2009 5:00:00 PM','7/16/2009 12:01:00 AM', '7/18/2009 1:00:00 PM', '7/16/2009 6:00:00 AM', '7/13/2009 9:00:00 PM','7/19/2009 1:00:00 AM','7/15/2009 12:04:00 AM'],
'score': [6, 3, 7, 5, 13, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6],
})
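df['dateandtime'] = pd.to_datetime(df['dateandtime'])  # change column type for ease of indexing
df = df.set_index('dateandtime')
df.sort_index(inplace=True)
# Drop rows whose timestamp repeats. Note: this compares the timestamp alone,
# so a time shared by two babies is kept for only one of them.
df = df[~df.index.duplicated()]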
# Calculate conditions
df['sum_3_scores'] = df.groupby('baby')['score'].rolling(3).sum().reset_index(0, drop=True)
df['max_1_score'] = df.groupby('baby')['score'].rolling(1).max().reset_index(0, drop=True)
# Sum of the last (up to) 3 scores within the trailing 48 hours.
# rolling() has no max_periods argument, so the 3-score cap is applied inside the lambda.
df['sum_3_scores_48hours'] = df.groupby('baby')['score'].rolling('48h').apply(lambda x: x[-3:].sum(), raw=True).reset_index(0, drop=True)
# scoring logic
def score(data):
    if data['sum_3_scores'] >= 24 or data['max_1_score'] >= 12:
        return 1
    if data['sum_3_scores_48hours'] < 18 and data['max_1_score'] < 8 and data['sum_3_scores'] < 24:
        return -1
    return 0

df['rule (original)'] = df.apply(score, axis=1)
# just for a nicely ordered output
df.sort_values(by=['baby', 'dateandtime'], inplace=True)
df.drop(['sum_3_scores', 'sum_3_scores_48hours'], axis=1, inplace=True)
print(df)
This produces a nice output, which is what I'm going for:
                     baby  score  max_1_score  rule (original)
dateandtime
2009-07-14 00:01:00     A     14         14.0                1
2009-07-16 21:00:00     A      4          4.0                0
2009-07-17 00:01:00     A     13         13.0                1
2009-07-17 06:00:00     A      5          5.0                0
2009-07-18 17:00:00     A      3          3.0                0
2009-07-18 19:00:00     A      7          7.0               -1
2009-07-19 05:00:00     A      5          5.0               -1
2009-07-19 09:00:00     A     11         11.0                0
2009-07-13 21:00:00     B     12         12.0                1
2009-07-14 18:00:00     B      4          4.0                0
2009-07-15 00:04:00     B      6          6.0                0
2009-07-15 15:04:00     B      4          4.0               -1
2009-07-16 00:01:00     B      7          7.0               -1
2009-07-16 06:00:00     B      6          6.0               -1
2009-07-18 13:00:00     B      4          4.0               -1
2009-07-19 01:00:00     B      6          6.0               -1
2009-07-20 17:00:00     B      6          6.0               -1
Everything works as intended, except that the code ignores one part of the decrease-dosage rule: "Lower dose if there's at least 48 hours without needing to increase dose." In other words, after a +1 I must not produce a -1 until at least 48 hours later. For example, the dose is increased at "2009-07-17 00:01:00", but the code then says to lower it at "2009-07-18 19:00:00", which is less than 48 hours later. I know the issue is in my "def score(data)" function, but I'm not sure how to modify it so that it knows not to produce -1 when a time point is less than 48 hours after an increased dosage.
The following gives you, for each baby, the number of whole days spanned by each rolling window of 3 measurements:
import pandas as pd
df = pd.DataFrame(
    {
        'baby': [
            'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'
        ],
        'dateandtime': [
            '7/20/2009 5:00:00 PM', '7/18/2009 5:00:00 PM', '7/18/2009 7:00:00 PM', '7/17/2009 6:00:00 AM',
            '7/17/2009 12:01:00 AM', '7/14/2009 12:01:00 AM', '7/19/2009 5:00:00 AM', '7/16/2009 9:00:00 PM',
            '7/19/2009 9:00:00 AM', '7/14/2009 6:00:00 PM', '7/15/2009 3:04:00 PM', '7/20/2009 5:00:00 PM',
            '7/16/2009 12:01:00 AM', '7/18/2009 1:00:00 PM', '7/16/2009 6:00:00 AM', '7/13/2009 9:00:00 PM',
            '7/19/2009 1:00:00 AM', '7/15/2009 12:04:00 AM'
        ],
        'score': [
            6, 3, 7, 5, 13, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6
        ]
    }
)
df["dateandtime"] = pd.to_datetime(df['dateandtime'])
df = df.set_index('dateandtime').sort_index()
df = df[~df.index.duplicated()]
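# For every window of 3 consecutive rows per baby, take the span between the
# first and last timestamp. raw=False passes each window as a Series, so its
# datetime index is available; .days truncates the span to whole days.
ndays = (
    df.assign(days=0)
    .groupby("baby")["days"].rolling(3)
    .apply(lambda row: (row.index.max() - row.index.min()).days, raw=False)
)
# Attach the result to the original rows by matching on timestamp and baby.
df = df.reset_index().merge(ndays, on=["dateandtime", "baby"]).set_index("dateandtime")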
You can then calculate the score based on this new "days" column.
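As a rough sketch of one way to enforce the 48-hour condition from the question, the snippet below post-processes the rule column rather than changing "def score(data)". It compares timestamps directly instead of using the whole-day "days" column, since truncating to whole days cannot tell 43 hours from 47. The helper name apply_wean_lockout, the output column "rule (48h lockout)", and the small demo frame (a subset of the rows from the question's output) are assumptions made for illustration. It also assumes "dateandtime" is a regular column, so call df.reset_index() first if it is the index.

```python
import pandas as pd


def apply_wean_lockout(df, rule_col="rule (original)", lockout="48h"):
    """Return a sorted copy of df where a -1 less than `lockout` after a +1 becomes 0.

    Hypothetical helper: assumes columns 'baby', 'dateandtime' (datetime64)
    and `rule_col` holding +1 / 0 / -1.
    """
    out = df.sort_values(["baby", "dateandtime"]).copy()
    # Timestamp of every +1 row (NaT elsewhere), carried forward within each baby.
    increase_time = out["dateandtime"].where(out[rule_col] == 1)
    last_increase = increase_time.groupby(out["baby"]).ffill()
    # Time elapsed since the most recent +1; NaT when the baby has had no +1 yet.
    since_increase = out["dateandtime"] - last_increase
    # Comparisons with NaT are False, so rows with no earlier +1 are never blocked.
    blocked = (out[rule_col] == -1) & (since_increase < pd.Timedelta(lockout))
    out["rule (48h lockout)"] = out[rule_col].mask(blocked, 0)
    return out


# Demo data: a subset of the rows and rule values shown in the question's output.
demo = pd.DataFrame({
    "baby": ["A"] * 8 + ["B"] * 3,
    "dateandtime": pd.to_datetime([
        "2009-07-14 00:01", "2009-07-16 21:00", "2009-07-17 00:01",
        "2009-07-17 06:00", "2009-07-18 17:00", "2009-07-18 19:00",
        "2009-07-19 05:00", "2009-07-19 09:00",
        "2009-07-13 21:00", "2009-07-15 15:04", "2009-07-16 00:01",
    ]),
    "rule (original)": [1, 0, 1, 0, 0, -1, -1, 0, 1, -1, -1],
})

result = apply_wean_lockout(demo)
print(result)
```

With this demo data, the -1 for baby A at 2009-07-18 19:00 (about 43 hours after the increase at 2009-07-17 00:01) becomes 0, while the -1 at 2009-07-19 05:00 (about 53 hours after) is kept. This treats "at least 48 hours" as 48 hours or more since the latest +1 for the same baby; adjust the comparison if the clinical rule is meant differently.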