sum values of columns containing certain strings in pandas

Question

Given the dataframe below, how can I sum the values of columns a001 + a002, and b + b1?

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
        'a001': [1, np.nan, 3, 4],
        'a002': [2, 3, 4, 5],
        'b': [1, 2, 3, 4],
        'b1': [2, 3, 4, np.nan],
    })

   id  a001  a002  b   b1
0   1   1.0     2  1  2.0
1   2   NaN     3  2  3.0
2   3   3.0     4  3  4.0
3   4   4.0     5  4  NaN

The expected result is:

   id  a  b
0   1  3  3
1   2  3  5
2   3  7  7
3   4  9  4

I tried to adapt an answer from a previous question, but it raises AttributeError: 'str' object has no attribute 'str'.

categories = ['a', 'b']
def correct_categories(cols):
    return [cat for col in cols for cat in categories if col.str.contains(cat)]    

df2.groupby(correct_categories(df2.columns),axis=1).sum()
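The error most likely comes from the loop itself: iterating over df2.columns yields plain Python strings, and the .str accessor only exists on the Index (or a Series), not on each element. A minimal sketch that sidesteps the problem with str.startswith and sums each category directly, assuming every category is a column-name prefix (that assumption is mine, not stated in the question):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan]})

categories = ['a', 'b']

# Start from the id column and add one summed column per category.
result = df2[['id']].copy()
for cat in categories:
    # col is a plain str here, so str.startswith works where col.str.contains fails.
    matching = [col for col in df2.columns if col.startswith(cat)]
    # sum(axis=1) skips NaN by default, so a missing value counts as 0.
    result[cat] = df2[matching].sum(axis=1)

print(result)
```

This keeps id as an integer column, at the cost of one pass over the columns per category.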

Answer 1

Use .str.extract to get the categories, then groupby with axis=1:

df2.groupby(df2.columns.str.extract(r'(\D+)', expand=False),
            axis=1, sort=False).sum()

Output:

    id    a    b
0  1.0  3.0  3.0
1  2.0  3.0  5.0
2  3.0  7.0  7.0
3  4.0  9.0  4.0
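To see what the grouping keys look like, it can help to print the extracted labels on their own. Newer pandas releases (2.1 and later, as far as I can tell) also deprecate axis=1 in groupby, so the sketch below groups the transposed frame and transposes back. That workaround is my substitution, not part of the answer above:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan]})

# Leading run of non-digit characters in each column name.
keys = df2.columns.str.extract(r'(\D+)', expand=False)
print(keys.tolist())  # ['id', 'a', 'a', 'b', 'b']

# Columns become rows after .T, so an ordinary row-wise groupby does the job.
result = df2.T.groupby(keys, sort=False).sum().T
print(result)
```

As with the axis=1 version, every column comes back as float because the transpose mixes the integer and float columns.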

Answer 2

Create a function that returns the category if its argument starts with one of the categories; otherwise it returns the argument unchanged.

import re

def get_col_grouper(cats):
    
    def col_grouper(x):
        return re.sub(f'^({"|".join(cats)}).*', r'\1', x)
    
    return col_grouper

df2.groupby(get_col_grouper(['a', 'b']), axis=1, sort=False).sum()

Output (for the extended dataframe shown under Setup below):

    id    a    b    c   c1  d03  d06
0  1.0  3.0  3.0  1.0  1.0  4.0  8.0
1  2.0  3.0  5.0  2.0  2.0  5.0  4.0
2  3.0  7.0  7.0  4.0  4.0  6.0  3.0
3  4.0  9.0  4.0  9.0  9.0  7.0  0.0
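To check which label each column receives, the grouper can be called on the column names without pandas at all. The sketch below follows the same idea as the function above. The re.escape call and the precompiled pattern are my additions, in case a category ever contains regex metacharacters:

```python
import re

def get_col_grouper(cats):
    # Match any category at the start of the name and swallow the rest.
    pattern = re.compile(f'^({"|".join(map(re.escape, cats))}).*')

    def col_grouper(x):
        # Names that match collapse to the category; others pass through unchanged.
        return pattern.sub(r'\1', x)

    return col_grouper

grouper = get_col_grouper(['a', 'b'])
columns = ['id', 'a001', 'a002', 'b', 'b1', 'c', 'c1', 'd03', 'd06']
labels = [grouper(col) for col in columns]
print(labels)  # ['id', 'a', 'a', 'b', 'b', 'c', 'c1', 'd03', 'd06']
```

Only a and b are collapsed, which is why c, c1, d03 and d06 stay as separate columns in the output above.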

Setup

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
        'a001': [1, np.nan, 3, 4],
        'a002': [2, 3, 4, 5],
        'b': [1, 2, 3, 4],
        'b1': [2, 3, 4, np.nan],
        'c': [1, 2, 4, 9],
        'c1': [1, 2, 4, 9],
        'd03': [4, 5, 6, 7],
        'd06': [8, 4, 3, None],
    })

2 answers

solution1
0 2021-03-11 22:06:51

solution2
0 ACCEPTED 2021-03-11 22:52:32
