Sum pandas dataframe column values based on condition of column name

I have a DataFrame with column names in the shape of x.y, where I would like to sum up all columns with the same value of x without having to name them explicitly. That is, the value of column_name.split(".")[0] should determine their group. Here's an example:

import pandas as pd
df = pd.DataFrame({'x.1': [1,2,3,4], 'x.2': [5,4,3,2], 'y.8': [19,2,1,3], 'y.92': [10,9,2,4]})

df
Out[3]: 
   x.1  x.2  y.8  y.92
0    1    5   19    10
1    2    4    2     9
2    3    3    1     2
3    4    2    3     4

The result should be the same as this operation, only I shouldn't have to explicitly list the column names and how they should be grouped.

pd.DataFrame({'x': df[['x.1', 'x.2']].sum(axis=1), 'y': df[['y.8', 'y.92']].sum(axis=1)})

   x   y
0  6  29
1  6  11
2  6   3
3  6   7

You can first create a MultiIndex on the columns with str.split, then groupby on the first level and aggregate with sum:

df.columns = df.columns.str.split('.', expand=True)
print (df)
   x      y    
   1  2   8  92
0  1  5  19  10
1  2  4   2   9
2  3  3   1   2
3  4  2   3   4

df = df.groupby(axis=1, level=0).sum()
print (df)
   x   y
0  6  29
1  6  11
2  6   3
3  6   7

Another option: extract the prefix from the column names and use it as the grouping variable:

df.groupby(by = df.columns.str.split('.').str[0], axis = 1).sum()

#   x   y
#0  6   29
#1  6   11
#2  6   3
#3  6   7
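
A note for newer pandas versions: as far as I know, the axis=1 argument to groupby is deprecated since pandas 2.1, so both snippets above may emit a warning or fail there. A minimal sketch of an equivalent that avoids axis=1 is to transpose, group the rows by prefix, and transpose back. The names prefixes and result are just illustrative, and this assumes every column name contains a "." separating the group from the rest:

```python
import pandas as pd

df = pd.DataFrame({'x.1': [1, 2, 3, 4], 'x.2': [5, 4, 3, 2],
                   'y.8': [19, 2, 1, 3], 'y.92': [10, 9, 2, 4]})

# Group key for each column: the part of the name before the first dot.
prefixes = df.columns.str.split('.').str[0]

# Transpose so the original columns become rows, group those rows by
# prefix, sum within each group, then transpose back.
result = df.T.groupby(prefixes).sum().T
print(result)
```

For the example frame this gives the same x and y columns as the explicit version in the question. Transposing a frame with mixed dtypes upcasts to object, so this sketch is best suited to all-numeric frames like the one here.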
