I am a surgeon trying to analyse some patient data. I have a dataframe of patients (271x15) who have had multiple operations. It was derived from a larger (4010x71) dataframe of single operations, with much help from @Arne: essentially (see the original post) by using a pivot table and then looking for patients with multiple (>=2) operations. This is great. I am interested in the first two operations and their dates, to get the number of days between them and see how long an implant lasted. The dataframe head is shown here, with the patient ID and the codes (V011 and V014) for the insertion and removal of the implant.
       OPERTN_01                 OPDATE_01
ID
11  [V011, V014]  [2016-06-21, 2017-02-27]
13  [V011, V014]  [2016-07-14, 2016-01-14]
14  [V014, V011]  [2014-02-25, 2014-07-01]
15  [V014, V011]  [2014-06-26, 2015-04-16]
I was hoping to subtract the dates of the two operations using pd.datetime, but I am stuck at removing the brackets. I have tried df.replace("[", ""), which has no effect on the dataframe or on the series OPERTN_01. Ideally I would like to remove the square brackets throughout the dataframe rather than column by column.
The lists produced in this dataframe (thanks @Arne ) have produced great descriptive statistics but are difficult for me to manipulate.
I also have the problem that the dates in OPDATE_01 are not sorted, so the difference between the dates is often negative. It could be that I am trying to do too much at once, of course.
Are you looking for something like this:
from io import StringIO
import ast
import pandas as pd
# ------ create sample data ------
s = """ID;OPERTN_01;OPDATE_01
11;["V011", "V014"];["2016-06-21", "2017-02-27"]
13;["V011", "V014"];["2016-07-14", "2016-01-14"]
14;["V014", "V011"];["2014-02-25", "2014-07-01"]
15;["V014", "V011"];["2014-06-26", "2015-04-16"]"""
df = pd.read_csv(StringIO(s), sep=';')
df['OPERTN_01'] = df['OPERTN_01'].apply(ast.literal_eval)
df['OPDATE_01'] = df['OPDATE_01'].apply(ast.literal_eval)
df = df.set_index('ID')
# ------ end sample data ------
# list comprehension to sort and convert str to datetime
df['OPDATE_01'] = [sorted([pd.to_datetime(x[0]), pd.to_datetime(x[1])]) for x in df['OPDATE_01']]
# if your values in the list are already datetime then ignore what is above and do
# df['OPDATE_01'] = df['OPDATE_01'].apply(sorted)
# apply pd.Series to explode your list into columns and then rename col if you want
date = df['OPDATE_01'].apply(pd.Series).rename(columns={0:'OPDATE_01_0', 1:'OPDATE_01_1'})
# calculate the difference between dates
date.diff(axis=1)
   OPDATE_01_0 OPDATE_01_1
ID
11         NaT    251 days
13         NaT    182 days
14         NaT    126 days
15         NaT    294 days
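If only the number of days is needed as an integer, the two sorted dates can also be subtracted directly instead of using diff. This is a minimal sketch, assuming every list holds exactly two date strings as in the sample; the names dates, first_op, second_op and days_between are only illustrative and not part of the original data.

```python
import pandas as pd

# sample data in the same shape as the question (lists of date strings)
df = pd.DataFrame(
    {
        "OPDATE_01": [
            ["2016-06-21", "2017-02-27"],
            ["2016-07-14", "2016-01-14"],
            ["2014-02-25", "2014-07-01"],
            ["2014-06-26", "2015-04-16"],
        ]
    },
    index=pd.Index([11, 13, 14, 15], name="ID"),
)

# sort each list by date, then expand it into two columns
dates = pd.DataFrame(
    [sorted(pd.to_datetime(d)) for d in df["OPDATE_01"]],
    index=df.index,
    columns=["first_op", "second_op"],
).apply(pd.to_datetime)  # make sure both columns have a datetime dtype

# later date minus earlier date, expressed as whole days
days_between = (dates["second_op"] - dates["first_op"]).dt.days
print(days_between)
```

The result is a plain integer Series indexed by ID, which is usually easier to feed into descriptive statistics than a timedelta column.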
# list comprehension to sort and convert list to datetime
df['OPDATE_01'] = [sorted([pd.to_datetime(x[0]), pd.to_datetime(x[1])]) for x in df['OPDATE_01']]
# if your values in the list are already datetime then ignore what is above and do
# df['OPDATE_01'] = df['OPDATE_01'].apply(sorted)
# apply pd.Series to explode your list into columns and then rename col if you want
date = df['OPDATE_01'].apply(pd.Series).rename(columns={0:'OPDATE_01_0', 1:'OPDATE_01_1'})
# merge two frames on ID to maintain all columns
m = df['OPERTN_01'].to_frame().merge(date, left_index=True, right_index=True)
# calc diff and assign to new column
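# (subtracting the two date columns directly avoids running diff over the list column OPERTN_01)
m['diff'] = m['OPDATE_01_1'] - m['OPDATE_01_0']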
       OPERTN_01 OPDATE_01_0 OPDATE_01_1     diff
ID
11  [V011, V014]  2016-06-21  2017-02-27 251 days
13  [V011, V014]  2016-01-14  2016-07-14 182 days
14  [V014, V011]  2014-02-25  2014-07-01 126 days
15  [V014, V011]  2014-06-26  2015-04-16 294 days
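One thing to watch: sorting OPDATE_01 on its own leaves OPERTN_01 in its original order, so for ID 13 the codes no longer line up with the dates. A possible way around this is to sort each code together with its date. The sketch below assumes both lists in a row have the same length; sorted_pairs and df_sorted are hypothetical names used only for illustration.

```python
import pandas as pd

# sample data in the same shape as the question
df = pd.DataFrame(
    {
        "OPERTN_01": [
            ["V011", "V014"],
            ["V011", "V014"],
            ["V014", "V011"],
            ["V014", "V011"],
        ],
        "OPDATE_01": [
            ["2016-06-21", "2017-02-27"],
            ["2016-07-14", "2016-01-14"],
            ["2014-02-25", "2014-07-01"],
            ["2014-06-26", "2015-04-16"],
        ],
    },
    index=pd.Index([11, 13, 14, 15], name="ID"),
)

# pair every date with its code and sort the pairs by date
sorted_pairs = [
    sorted(zip(pd.to_datetime(dates), codes))
    for dates, codes in zip(df["OPDATE_01"], df["OPERTN_01"])
]

# rebuild both list columns from the sorted pairs
df_sorted = pd.DataFrame(
    {
        "OPERTN_01": [[code for _, code in pairs] for pairs in sorted_pairs],
        "OPDATE_01": [[date for date, _ in pairs] for pairs in sorted_pairs],
    },
    index=df.index,
)
print(df_sorted)
```

With the pairs sorted together, the first code in each list is the operation that actually happened first, which matters when separating insertion-then-removal from removal-then-insertion.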
# just changing variable name to match your comment
df_implants = m
# convert OPERTN_01 to a string
s = df_implants['OPERTN_01'].apply(str)
# boolean indexing to filter df_implants where OPERTN_01 is equal to ['V011', 'V014']
v011v014 = df_implants[(s == "['V011', 'V014']")]
# boolean indexing to filter df_implants where OPERTN_01 is equal to ['V014', 'V011']
v014v011 = df_implants[(s == "['V014', 'V011']")]
# v011v014
       OPERTN_01 OPDATE_01_0 OPDATE_01_1     diff
ID
11  [V011, V014]  2016-06-21  2017-02-27 251 days
13  [V011, V014]  2016-01-14  2016-07-14 182 days
# v014v011
       OPERTN_01 OPDATE_01_0 OPDATE_01_1     diff
ID
14  [V014, V011]  2014-02-25  2014-07-01 126 days
15  [V014, V011]  2014-06-26  2015-04-16 294 days
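Comparing the string form of a list works, but it depends on the exact quoting and spacing of that string. An alternative sketch is to compare the lists themselves. Here has_codes is a hypothetical helper, and the small df_implants frame is rebuilt from the output above purely so the example runs on its own.

```python
import pandas as pd

# small stand-in for df_implants, using the values shown above
df_implants = pd.DataFrame(
    {
        "OPERTN_01": [
            ["V011", "V014"],
            ["V011", "V014"],
            ["V014", "V011"],
            ["V014", "V011"],
        ],
        "diff": pd.to_timedelta([251, 182, 126, 294], unit="D"),
    },
    index=pd.Index([11, 13, 14, 15], name="ID"),
)


def has_codes(frame, codes):
    """Return a boolean mask of rows whose OPERTN_01 list equals `codes`."""
    return frame["OPERTN_01"].apply(lambda ops: list(ops) == list(codes))


# rows where the recorded order is V011 then V014, and the reverse
v011v014 = df_implants[has_codes(df_implants, ["V011", "V014"])]
v014v011 = df_implants[has_codes(df_implants, ["V014", "V011"])]

print(v011v014)
print(v014v011)
```

Because the comparison is on the list contents, it keeps working if the codes are later stored as tuples or the printed representation changes.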