Sum pandas DataFrame column values based on a condition on the column name
I have a DataFrame with column names of the form x.y, where I would like to sum all columns that share the same value of x without having to name them explicitly. That is, the value of column_name.split(".")[0] should determine their group. Here's an example:
import pandas as pd
df = pd.DataFrame({'x.1': [1,2,3,4], 'x.2': [5,4,3,2], 'y.8': [19,2,1,3], 'y.92': [10,9,2,4]})
df
Out[3]:
   x.1  x.2  y.8  y.92
0    1    5   19    10
1    2    4    2     9
2    3    3    1     2
3    4    2    3     4
The result should be the same as the operation below, except that I shouldn't have to list the column names explicitly or say how they should be grouped.
pd.DataFrame({'x': df[['x.1', 'x.2']].sum(axis=1), 'y': df[['y.8', 'y.92']].sum(axis=1)})
   x   y
0  6  29
1  6  11
2  6   3
3  6   7
You can first create a MultiIndex in the columns with split, then groupby on the first level and aggregate with sum:
df.columns = df.columns.str.split('.', expand=True)
print(df)
   x      y
   1  2   8  92
0  1  5  19  10
1  2  4   2   9
2  3  3   1   2
3  4  2   3   4
df = df.groupby(axis=1, level=0).sum()
print(df)
   x   y
0  6  29
1  6  11
2  6   3
3  6   7
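Note that recent pandas releases deprecate the axis=1 argument of groupby (since version 2.1, as far as I know). A minimal sketch of the same MultiIndex approach that avoids it by transposing, grouping the rows, and transposing back. The name `result` is only illustrative, and the example assumes all columns are numeric, since transposing a mixed-dtype frame can upcast values to object:

```python
import pandas as pd

df = pd.DataFrame({'x.1': [1, 2, 3, 4], 'x.2': [5, 4, 3, 2],
                   'y.8': [19, 2, 1, 3], 'y.92': [10, 9, 2, 4]})

# Split 'x.1' into ('x', '1') so the prefix becomes level 0 of the columns.
df.columns = df.columns.str.split('.', expand=True)

# Transpose so the column labels become the row index, group the rows
# by the first index level, sum each group, then transpose back.
result = df.T.groupby(level=0).sum().T
print(result)
```

This should give the same x and y columns as the axis=1 version above, without relying on the deprecated argument.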
As another option, you can extract the prefix from the column names and use it as the grouping key:
df.groupby(by = df.columns.str.split('.').str[0], axis = 1).sum()
#    x   y
# 0  6  29
# 1  6  11
# 2  6   3
# 3  6   7
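If this is needed in more than one place, the prefix-based grouping can be wrapped in a small helper. This is only a sketch: `sum_by_prefix` and its `sep` parameter are hypothetical names, not part of pandas, and it groups on the transposed frame rather than passing axis=1, which newer pandas versions deprecate. It assumes every column name is a string and all columns are numeric:

```python
import pandas as pd


def sum_by_prefix(frame: pd.DataFrame, sep: str = '.') -> pd.DataFrame:
    """Sum columns that share the text before the first `sep` (hypothetical helper)."""
    # One grouping key per column: 'x.1' -> 'x', 'y.92' -> 'y'.
    prefixes = frame.columns.str.split(sep).str[0]
    # Group the transposed rows (the original columns) by prefix and sum,
    # then transpose back so the prefixes become the column labels.
    return frame.T.groupby(prefixes).sum().T


df = pd.DataFrame({'x.1': [1, 2, 3, 4], 'x.2': [5, 4, 3, 2],
                   'y.8': [19, 2, 1, 3], 'y.92': [10, 9, 2, 4]})
summed = sum_by_prefix(df)
print(summed)
```

Unlike the MultiIndex approach, this leaves the column names of the original DataFrame unchanged.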