Sum values of columns that start with the same text string

Question

I want to take the row-wise sum of the values in columns that start with the same text string. Below is my original df with fails on courses.

Original df:

ID  P_English_2  P_English_3  P_German_1   P_Math_1  P_Math_3  P_Physics_2  P_Physics_4
56            1            3           1          2         0            0            3
11            0            0           0          1         4            1            0
6             0            0           0          0         0            1            0
43            1            2           1          0         0            1            1
14            0            1           0          0         1            0            0

Desired df:

ID  P_English   P_German   P_Math   P_Physics
56          4          1        2           3
11          0          0        5           1 
6           0          0        0           1 
43          3          1        0           2
14          1          0        1           0

Tried code:

import pandas as pd  

df = pd.DataFrame({"ID": [56,11,6,43,14], 
             "P_Math_1": [2,1,0,0,0], 
          "P_English_3": [3,0,0,2,1],
           "P_English_2": [1,0,0,1,0], 
             "P_Math_3": [0,4,0,0,1], 
          "P_Physics_2": [0,1,1,1,0],
           "P_Physics_4": [3,0,0,1,0], 
           "P_German_1": [1,0,0,1,0]}) 

print(df)  

categories = ['P_Math', 'P_English', 'P_Physics', 'P_German'] 

def correct_categories(cols):
    return [cat for col in cols for cat in categories if col.startswith(cat)]

result = df.groupby(correct_categories(df.columns), axis=1).sum()
print(result)

Answer 1

Let's try groupby with axis=1:

# extract the subjects
subjects = [x[0] for x in df.columns.str.rsplit('_',n=1)]

df.groupby(subjects, axis=1).sum()

Output:

   ID  P_English  P_German  P_Math  P_Physics
0  56          4         1       2          3
1  11          0         0       5          1
2   6          0         0       0          1
3  43          3         1       0          2
4  14          1         0       1          0
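
A side note for newer pandas versions: `groupby(..., axis=1)` has been deprecated since pandas 2.1, so the call above may warn or fail there. A minimal sketch of one workaround, assuming it is acceptable to move `ID` into the index first, is to transpose, group the rows, and transpose back. The names `scores` and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

# Keep ID out of the summation by moving it into the index (assumption: IDs are unique)
scores = df.set_index("ID")

# Strip the trailing "_<number>" to get the subject prefix of each column
subjects = scores.columns.str.rsplit("_", n=1).str[0]

# Transpose so columns become rows, group those rows by subject, then transpose back
result = scores.T.groupby(subjects).sum().T.reset_index()
print(result)
```

Setting `ID` as the index also avoids passing it through the sum as a one-column group of its own.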

Or you can use wide_to_long, assuming the ID values are unique:

(pd.wide_to_long(df, stubnames=categories,
               i=['ID'], j='count', sep='_')
  .groupby('ID').sum()
)

Output:

    P_Math  P_English  P_Physics  P_German
ID                                        
56     2.0        4.0        3.0       1.0
11     5.0        0.0        1.0       0.0
6      0.0        0.0        1.0       0.0
43     0.0        3.0        2.0       1.0
14     1.0        1.0        0.0       0.0
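
The floats in that output most likely come from the reshape: not every subject has every suffix (there is no `P_German_2`, for example), so `wide_to_long` fills the gaps with NaN and the columns become float. A minimal sketch, assuming whole-number counts are wanted, casts the totals back to integers. The `sort=False` argument and the names `long_df` and `totals` are my additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

categories = ["P_Math", "P_English", "P_Physics", "P_German"]

# Reshape to one row per (ID, numeric suffix); missing combinations become NaN
long_df = pd.wide_to_long(df, stubnames=categories, i=["ID"], j="count", sep="_")

# Sum per ID (NaN is skipped by sum), then cast the float totals back to integers
totals = long_df.groupby("ID", sort=False).sum().astype("int64")
print(totals)
```

The cast is safe here because `sum` skips NaN, so no missing values remain in the totals.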

Answer 1: score 5, accepted, 2021-01-19 16:46:32
