简体   繁体   中英

Pandas— aggregating multiple columns with the same name?

I have multiple columns that have the same name (ie. 3 columns named "box"), and I need to have one column that sums the values so that I can concatenate my dataframes. There are multiple instances of this needing to happen.

I've tried df1 = df.groupby(level=0, axis=1).sum() but I get one column with no values. (Ie. in the first row, "box": 9, "box": 1, "box": 4 should become one column named "box" with value of 14, but it's showing one column named "box" but no value.)

You can sum based on column names using df['column_name'].sum(axis=1) and setting that for the column you want the results in.

Simple example below:

>>> df
   a  b   c  b   e
0  1  5  dd  1   6
1  2  9  ee  1  10
2  3  1  ff  1   2
>>> df['f'] = df['b'].sum(axis=1)
>>> df
   a  b   c  b   e   f
0  1  5  dd  1   6   6
1  2  9  ee  1  10  10
2  3  1  ff  1   2   2

Notice it isn't summing every value in the row, just the values from columns named 'b'.

This works because when you call a column by name and it's a duplicate, pandas recognizes that fact. For example:

>>> df['a']
0    1
1    2
2    3
Name: a, dtype: int64
>>> df['b']
   b  b
0  5  1
1  9  1
2  1  1
>>> df.columns.get_loc('a')
0
>>> df.columns.get_loc('b')
array([False,  True, False,  True, False, False])

I would recommend renaming your duplicate column names and avoiding the uncertainty that comes with having duplicate names in the future. This question has a lot of suggestions for how to do that as well as how to avoid it during import from other sources.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM