Sum of pandas DataFrame column values based on a column-name condition

Question

I have a DataFrame with column names of the form x.y, and I would like to sum all columns that share the same value of x without having to name them explicitly. That is, the value of column_name.split(".")[0] should determine their group. Here's an example:

import pandas as pd
df = pd.DataFrame({'x.1': [1,2,3,4], 'x.2': [5,4,3,2], 'y.8': [19,2,1,3], 'y.92': [10,9,2,4]})

df
Out[3]: 
   x.1  x.2  y.8  y.92
0    1    5   19    10
1    2    4    2     9
2    3    3    1     2
3    4    2    3     4

The result should be the same as that of the operation below, except that I shouldn't have to list the column names or their grouping explicitly.

pd.DataFrame({'x': df[['x.1', 'x.2']].sum(axis=1), 'y': df[['y.8', 'y.92']].sum(axis=1)})

   x   y
0  6  29
1  6  11
2  6   3
3  6   7

Answer 1

You can first turn the column names into a MultiIndex with str.split, then group by the first level and aggregate with sum:

df.columns = df.columns.str.split('.', expand=True)
print (df)
   x      y    
   1  2   8  92
0  1  5  19  10
1  2  4   2   9
2  3  3   1   2
3  4  2   3   4

df = df.groupby(axis=1, level=0).sum()
print (df)
   x   y
0  6  29
1  6  11
2  6   3
3  6   7
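Note that passing axis=1 to groupby is deprecated in recent pandas releases (2.1 and later). A minimal sketch of the same idea that avoids it, assuming the example frame from the question, is to transpose, group the rows by the first index level, and transpose back:

```python
import pandas as pd

df = pd.DataFrame({'x.1': [1, 2, 3, 4], 'x.2': [5, 4, 3, 2],
                   'y.8': [19, 2, 1, 3], 'y.92': [10, 9, 2, 4]})

# Split a label such as 'x.1' into the two-level column label ('x', '1')
df.columns = df.columns.str.split('.', expand=True)

# Transpose so the column labels become the row index, group on the
# first level, sum, and transpose back to the original orientation
result = df.T.groupby(level=0).sum().T
print(result)
```

The double transpose copies the data, so on very wide or very large frames it may be slower than the axis=1 form where that form is still available.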

Answer 2

Another option is to extract the prefix from the column names and use it as the grouping variable:

df.groupby(by=df.columns.str.split('.').str[0], axis=1).sum()

#   x   y
#0  6   29
#1  6   11
#2  6   3
#3  6   7
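The prefix-based grouping can also be wrapped in a small reusable function. The sketch below is only an illustration: sum_by_prefix and its sep parameter are hypothetical names, not part of pandas, and it assumes a single-character separator. It groups on the transposed frame so that it does not rely on the axis=1 argument, which is deprecated in pandas 2.1 and later:

```python
import pandas as pd


def sum_by_prefix(frame, sep='.'):
    """Sum columns sharing the text before the first `sep` (hypothetical helper)."""
    # Index of prefixes, one per column, e.g. ['x', 'x', 'y', 'y']
    prefixes = frame.columns.str.split(sep).str[0]
    # Group the transposed rows (the original columns) by prefix and sum
    return frame.T.groupby(prefixes).sum().T


df = pd.DataFrame({'x.1': [1, 2, 3, 4], 'x.2': [5, 4, 3, 2],
                   'y.8': [19, 2, 1, 3], 'y.92': [10, 9, 2, 4]})

summed = sum_by_prefix(df)
print(summed)
```

Unlike the MultiIndex approach, this leaves the column labels of the original df unchanged.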

Answer 1: 3, accepted, 2017-02-19 15:08:48

Answer 2: 3, 2017-02-19 15:09:57
