I have a DataFrame. How can I sum the column values a001 + a002, and b + b1?
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan],
                    })
   id  a001  a002  b   b1
0   1   1.0     2  1  2.0
1   2   NaN     3  2  3.0
2   3   3.0     4  3  4.0
3   4   4.0     5  4  NaN
The desired result is:
   id  a  b
0   1  3  3
1   2  3  5
2   3  7  7
3   4  9  4
I tried adapting an answer from a previous question, but it raises AttributeError: 'str' object has no attribute 'str'.
categories = ['a', 'b']

def correct_categories(cols):
    return [cat for col in cols for cat in categories if col.str.contains(cat)]

df2.groupby(correct_categories(df2.columns), axis=1).sum()
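The error arises because iterating over df2.columns yields plain Python strings, and a str has no .str accessor (that accessor exists only on a pandas Index or Series). As a minimal sketch of one way to repair the helper, the version below uses the built-in str.startswith and falls back to the column name itself when no category matches. The prefix-match rule and the fallback are assumptions on my part, not something stated in the question.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan],
                    })

categories = ['a', 'b']

def correct_categories(cols):
    # Each `col` is a plain str here, so use str methods, not the .str accessor.
    labels = []
    for col in cols:
        # Assumption: a column belongs to the first category it starts with;
        # columns matching no category (e.g. 'id') keep their own name.
        label = next((cat for cat in categories if col.startswith(cat)), col)
        labels.append(label)
    return labels

keys = correct_categories(df2.columns)
print(keys)
```

This yields exactly one label per column, so the list lines up with the columns when it is used as a grouping key.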
Use .str.extract to get the categories, and groupby with axis=1:
df2.groupby(df2.columns.str.extract(r'(\D+)', expand=False),
            axis=1, sort=False).sum()
Output:
    id    a    b
0  1.0  3.0  3.0
1  2.0  3.0  5.0
2  3.0  7.0  7.0
3  4.0  9.0  4.0
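As far as I know, passing axis=1 to groupby is deprecated in recent pandas releases (2.1 and later). A possible workaround, sketched here under that assumption, is to transpose the frame, group its rows by the same extracted keys, and transpose back. The variable names keys and result are only illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan],
                    })

# One key per column: the leading run of non-digit characters.
keys = df2.columns.str.extract(r'(\D+)', expand=False)

# Transpose so the columns become rows, group those rows by key,
# sum within each group (NaN is skipped by default), then transpose back.
result = df2.T.groupby(keys, sort=False).sum().T
print(result)
```

The transposed frame is entirely float, so the result has float columns, like the output above.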
Create a function that returns the category if its argument matches one; otherwise it returns the argument unchanged.
import re

def get_col_grouper(cats):
    def col_grouper(x):
        return re.sub(f'^({"|".join(cats)}).*', r'\1', x)
    return col_grouper

df2.groupby(get_col_grouper(['a', 'b']), axis=1, sort=False).sum()
Output, using the extended df2 defined below:
    id    a    b    c   c1  d03  d06
0  1.0  3.0  3.0  1.0  1.0  4.0  8.0
1  2.0  3.0  5.0  2.0  2.0  5.0  4.0
2  3.0  7.0  7.0  4.0  4.0  6.0  3.0
3  4.0  9.0  4.0  9.0  9.0  7.0  0.0
df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'a001': [1, np.nan, 3, 4],
                    'a002': [2, 3, 4, 5],
                    'b': [1, 2, 3, 4],
                    'b1': [2, 3, 4, np.nan],
                    'c': [1, 2, 4, 9],
                    'c1': [1, 2, 4, 9],
                    'd03': [4, 5, 6, 7],
                    'd06': [8, 4, 3, None],
                    })
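To see why only the a and b columns are merged while c, c1, d03 and d06 stay separate, it may help to apply the key function to the column names directly. The sketch below reuses the same regular expression as the answer, precompiled for convenience; the columns list and the mapping variable are illustrative names, not part of the original answer.

```python
import re

def get_col_grouper(cats):
    # Matches a name that starts with one of the categories and
    # replaces the whole name with just that category.
    pattern = re.compile(f'^({"|".join(cats)}).*')

    def col_grouper(name):
        # Names that do not start with any category are returned unchanged.
        return pattern.sub(r'\1', name)

    return col_grouper

columns = ['id', 'a001', 'a002', 'b', 'b1', 'c', 'c1', 'd03', 'd06']
grouper = get_col_grouper(['a', 'b'])
mapping = {col: grouper(col) for col in columns}

for col, key in mapping.items():
    print(f'{col} -> {key}')
```

Note that the categories are spliced into the pattern as-is, so a category containing regex metacharacters would presumably need re.escape, and any column that merely starts with a category letter (a hypothetical 'bar', say) would be folded into that category.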