How to sum values of multiple columns against multiple columns of the same dataframe?

Question

I have a dataframe of surgeries and their complications, stored as True/False values. Each row represents a patient. I need to know how many times each complication occurs with each surgery. The dataframe looks like this:

surgery_1  surgery_2  surgery_3  complication_1  complication_2  complication_3
True       False      True       True            True            False
False      False      False      False           False           False
True       False      False      True            False           True
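To make the example reproducible, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the values are real booleans (not the strings "True"/"False") and that the third surgery column is named surgery_3, matching the desired output below.

```python
import pandas as pd

# Sample data from the question: one row per patient,
# boolean flags for surgeries performed and complications observed.
df = pd.DataFrame({
    "surgery_1":      [True,  False, True],
    "surgery_2":      [False, False, False],
    "surgery_3":      [True,  False, False],
    "complication_1": [True,  False, True],
    "complication_2": [True,  False, False],
    "complication_3": [False, False, True],
})
print(df)
```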

I want to have a dataframe like this:

           complication_1    complication_2     complication_3
surgery_1       1                  1                   0
surgery_2       0                  0                   0
surgery_3       1                  0                   1

I tried df.pivot_table and df.groupby, but neither gave me this. Note that I'm not interested in how often each surgery was performed; I just need to know how many times each complication occurs with each surgery.

Answer 1

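If I understand correctly, each row represents an operation performed on a patient, and a single operation may include multiple surgeries.

Step 1 is to unpivot the DataFrame so that each row represents one surgery, since the surgery is going to be the key of the new DataFrame. Note that pd.wide_to_long requires the id columns passed as i (here the three complication columns) to uniquely identify each row. That happens to hold for this three-row sample, but with real data several patients will share the same combination of complications and wide_to_long will raise a ValueError, so a unique patient id column is safer.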

In [58]: df2 = pd.wide_to_long(df, 'surgery_', ['complication_1', 'complication_2', 'complication_3'], 'surgery_id').reset_index()
Out[58]: 
   complication_1  complication_2  complication_3  surgery_id  surgery_
0            True            True           False           1      True
1            True            True           False           2     False
2            True            True           False           3      True
3           False           False           False           1     False
4           False           False           False           2     False
5           False           False           False           3     False
6            True           False            True           1      True
7            True           False            True           2     False
8            True           False            True           3     False
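Because wide_to_long rejects id columns that do not uniquely identify each row, a more robust variant of step 1 uses the row position as a patient id. This is a sketch under the assumption that each row is one patient; the column name patient is hypothetical and introduced only for illustration.

```python
import pandas as pd

# Same sample data as in the question (one row per patient).
df = pd.DataFrame({
    "surgery_1":      [True,  False, True],
    "surgery_2":      [False, False, False],
    "surgery_3":      [True,  False, False],
    "complication_1": [True,  False, True],
    "complication_2": [True,  False, False],
    "complication_3": [False, False, True],
})

# Turn the row index into a hypothetical "patient" column so that
# wide_to_long has an id that is unique per row. The complication
# columns are not stubs, so they are carried along unchanged.
long_df = pd.wide_to_long(
    df.rename_axis("patient").reset_index(),
    stubnames="surgery_",
    i="patient",
    j="surgery_id",
).reset_index()

# One row per (patient, surgery); "surgery_" says whether it was performed.
print(long_df)
```

Since the id no longer depends on the complication values, this works no matter how many patients share the same complications.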

Now there is one row for each patient and each surgery. However, not every surgery is performed on every patient; this is given by the remaining value column 'surgery_'. Step 2 is to filter so that only the rows where a surgery was actually performed remain:

In [64]: df3 = df2.query('surgery_ == True').drop('surgery_', axis=1)
Out[64]: 
   complication_1  complication_2  complication_3  surgery_id
0            True            True           False           1
2            True            True           False           3
6            True           False            True           1

Step 3 is then straightforward: groupby, sum, and reindex, because there is no entry for surgery 2:

In [67]: df3.groupby('surgery_id').sum().reindex([1, 2, 3], fill_value=0)
Out[67]: 
            complication_1  complication_2  complication_3
surgery_id                                                
1                        2               1               1
2                        0               0               0
3                        1               1               0
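Putting the three steps together, using the patient-id variant of step 1 so the id is always unique, the whole pipeline might look like the sketch below. The function name complications_per_surgery is hypothetical, and it assumes boolean columns named surgery_<n> and complication_<n>.

```python
import pandas as pd

def complications_per_surgery(df: pd.DataFrame) -> pd.DataFrame:
    """Count, per surgery, how many patients who had it also had each complication.

    Assumes one row per patient and boolean columns surgery_<n> / complication_<n>.
    """
    # Step 1: unpivot the surgery flags; the row position is a unique patient id.
    long_df = pd.wide_to_long(
        df.rename_axis("patient").reset_index(),
        stubnames="surgery_",
        i="patient",
        j="surgery_id",
    ).reset_index()

    # Step 2: keep only (patient, surgery) pairs where the surgery was performed.
    performed = long_df[long_df["surgery_"].astype(bool)]

    # Step 3: sum the complication flags per surgery; never-performed surgeries get 0.
    complication_cols = [c for c in df.columns if c.startswith("complication_")]
    surgery_ids = sorted(int(c.split("_")[1]) for c in df.columns if c.startswith("surgery_"))
    return (
        performed.groupby("surgery_id")[complication_cols]
        .sum()
        .reindex(surgery_ids, fill_value=0)
        .astype(int)
    )

df = pd.DataFrame({
    "surgery_1":      [True,  False, True],
    "surgery_2":      [False, False, False],
    "surgery_3":      [True,  False, False],
    "complication_1": [True,  False, True],
    "complication_2": [True,  False, False],
    "complication_3": [False, False, True],
})
result = complications_per_surgery(df)
print(result)
```

The surgery ids for the reindex are derived from the column names rather than hard-coded, so a fourth surgery column would be picked up without changes.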

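This differs from your desired output. For example, surgery_1 was performed on two patients (the first and third rows), and both had complication_1, so its count is 2 rather than 1. Frankly, I have no idea what you could want other than this ;)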
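A different technique, not the one used in this answer, gives the requested surgery-by-complication layout directly: treat the flags as 0/1 matrices and multiply the transposed surgery matrix by the complication matrix. Entry (surgery_i, complication_j) then counts the patients who had both. A minimal sketch, assuming the same column naming as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "surgery_1":      [True,  False, True],
    "surgery_2":      [False, False, False],
    "surgery_3":      [True,  False, False],
    "complication_1": [True,  False, True],
    "complication_2": [True,  False, False],
    "complication_3": [False, False, True],
})

# filter(like=...) selects columns whose name contains the substring.
surgeries = df.filter(like="surgery_").astype(int)          # patients x surgeries
complications = df.filter(like="complication_").astype(int)  # patients x complications

# (surgeries x patients) @ (patients x complications) -> surgeries x complications
counts = surgeries.T @ complications
print(counts)
```

On the sample data this yields the same counts as the groupby result, but with surgery_1 to surgery_3 as row labels, which is the shape asked for in the question.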


solution1
0 2021-03-15 13:11:44

How to sum values of multiple columns against multiple columns of the same dataframe?

Question

1 answers

solution1 0 2021-03-15 13:11:44

solution1
0 2021-03-15 13:11:44