I have a DataFrame of surgeries and their complications, stored as True/False values. I need to know how many times each complication occurs for each surgery. Each row represents a patient. The DataFrame looks like this:
surgery_1  surgery_2  surgery_3  complication_1  complication_2  complication_3
True       False      True       True            True            False
False      False      False      False           False           False
True       False      False      True            False           True
I want to have a dataframe like this:
           complication_1  complication_2  complication_3
surgery_1  1               1               0
surgery_2  0               0               0
surgery_3  1               0               1
I tried df.pivot_table and df.groupby, but neither got me there. Note that I'm not interested in how many surgeries there are. I just need to know how many times each complication occurs for each surgery.
If I understand correctly, each row represents an operation performed on a patient, and a single operation may include several surgeries.
Step 1 is to unpivot the DataFrame so that each row represents a single surgery, since the surgery is going to be the key of the new DataFrame.
In [58]: df2 = pd.wide_to_long(df, 'surgery_', ['complication_1', 'complication_2', 'complication_3'], 'surgery_id').reset_index()
Out[58]:
   complication_1  complication_2  complication_3  surgery_id  surgery_
0            True            True           False           1      True
1            True            True           False           2     False
2            True            True           False           3      True
3           False           False           False           1     False
4           False           False           False           2     False
5           False           False           False           3     False
6            True           False            True           1      True
7            True           False            True           2     False
8            True           False            True           3     False
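One caveat about Step 1: wide_to_long requires the columns passed as i to uniquely identify each row, so using the complication columns as i only works here because the three sample rows happen to differ. A minimal sketch that sidesteps this, assuming a helper column named patient_id built from the row index (a hypothetical name, not part of the original data):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "surgery_1": [True, False, True],
    "surgery_2": [False, False, False],
    "surgery_3": [True, False, False],
    "complication_1": [True, False, True],
    "complication_2": [True, False, False],
    "complication_3": [False, False, True],
})

# patient_id is a hypothetical helper column: one unique id per row,
# taken from the existing row index
with_id = df.rename_axis("patient_id").reset_index()

# Unpivot the surgery_* columns; the complication columns are carried along
long_df = pd.wide_to_long(
    with_id, stubnames="surgery_", i="patient_id", j="surgery_id"
).reset_index()

print(long_df)
```

The result has one row per patient and surgery, as in the output above, plus the extra patient_id column. It keeps working when two patients share the same complication pattern.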
Now you have a row for each surgery and each patient. However, not all surgeries are performed on all patients; whether a surgery was performed is recorded in the remaining value column 'surgery_'.
Step 2 is to filter so that we are left with only the rows where a surgery was actually performed.
In [64]: df3 = df2.query('surgery_ == True').drop('surgery_', axis=1)
Out[64]:
   complication_1  complication_2  complication_3  surgery_id
0            True            True           False           1
2            True            True           False           3
6            True           False            True           1
Step 3 is then straightforward: groupby, sum, and reindex, because there is no entry for 'surgery_2'.
In [67]: df3.groupby('surgery_id').sum().reindex([1,2,3], fill_value=0)
Out[67]:
            complication_1  complication_2  complication_3
surgery_id
1                        2               1               1
2                        0               0               0
3                        1               1               0
This differs significantly from your desired output, but frankly I have no idea what you could want other than this ;)
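As a cross-check, the same counts can probably be obtained without reshaping at all, by treating the two groups of columns as 0/1 matrices and multiplying them. This is a different technique from the groupby approach above, offered only as a sketch; it assumes the columns can be recognised by the prefixes surgery_ and complication_:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "surgery_1": [True, False, True],
    "surgery_2": [False, False, False],
    "surgery_3": [True, False, False],
    "complication_1": [True, False, True],
    "complication_2": [True, False, False],
    "complication_3": [False, False, True],
})

# Assumption: column groups are identified by their name prefix
surgery_cols = [c for c in df.columns if c.startswith("surgery_")]
complication_cols = [c for c in df.columns if c.startswith("complication_")]

# (surgeries x patients) @ (patients x complications):
# each cell counts the patients who had both that surgery and that complication
counts = df[surgery_cols].astype(int).T @ df[complication_cols].astype(int)

print(counts)
```

For the sample data this gives 2, 1, 1 for surgery_1, all zeros for surgery_2, and 1, 1, 0 for surgery_3, which matches the table above. Surgeries that were never performed show up as zero rows automatically, so no reindex is needed.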