I have a dataset that I would like to group by two columns, then take the sum and the count of the values in each group.
Data
source  ex   pw  role   date
aa      NaN  10  hello  q222
aa      NaN  10  hello  q222
NaN     bb   15  ok     q422
NaN     bb   5   no     q422
NaN     bb   1   sure   q422
NaN     bb   4   yes    q422
Desired
source  ex   pw  count  date
aa      NaN  20  2      q222
NaN     bb   25  4      q422
Doing
#df.groupby(['source','date'])['pw'].agg(['count','sum'])
df.groupby(['ex','date'])['pw'].agg(['count','sum'])
However, with this approach I then have to concatenate the two outputs to merge them. Any suggestions are appreciated.
Use groupby() with dropna=False, followed by rename():
out=(df.groupby(['source','ex','date'],dropna=False)['pw'].agg(['count','sum'])
.reset_index().rename(columns={'sum':'pw'}))
Or use groupby() with dropna=False and named aggregation:
out=(df.groupby(['source','ex'],dropna=False)
.agg(pw=('pw','sum'),count=('pw','count'),date=('date','first'))
.reset_index())
Output of out (column order as produced by the first snippet):
  source   ex  date  count  pw
0     aa  NaN  q222      2  20
1    NaN   bb  q422      4  25
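For anyone who wants to reproduce this, here is a minimal sketch. The DataFrame construction is an assumption: the NaN placement in source and ex is inferred from the outputs shown in this thread, since the question's table does not show empty cells. It contrasts the default groupby, which drops rows whose key contains NaN, with dropna=False (available from pandas 1.1).

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN placement is inferred, not given explicitly.
df = pd.DataFrame({
    "source": ["aa", "aa", np.nan, np.nan, np.nan, np.nan],
    "ex": [np.nan, np.nan, "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Default (dropna=True): every row has NaN in either source or ex,
# so all rows are dropped and the result is empty.
default = df.groupby(["source", "ex", "date"])["pw"].agg(["count", "sum"])

# dropna=False keeps NaN as a valid group key; named aggregation
# returns the sum and the count in one step.
out = (
    df.groupby(["source", "ex", "date"], dropna=False)
      .agg(pw=("pw", "sum"), count=("pw", "count"))
      .reset_index()
)

print(len(default))
print(out)
```

Because both statistics come out of a single agg() call, no concatenation is needed afterwards.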
Try groupby with a new key created with fillna:
out = df.groupby([df.source.fillna(df.ex),df.date]).agg({'source':'first',
'ex':'first',
'pw':'sum',
'role':'count',
'date':'first'}).reset_index(drop=True)
Out[489]:
  source    ex  pw  role  date
0     aa  None  20     2  q222
1   None    bb  25     4  q422
Try:
>>> df.fillna('').groupby(['source','ex','date']).agg({'pw': ['sum', 'count']})
                  pw
                 sum count
source ex date
       bb q422    25     4
aa        q222    20     2
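The result above has two-level column labels (pw / sum and pw / count). If flat columns are preferred, one possible variation, which is not part of the original answer, is to keep the fillna('') step but use named aggregation with as_index=False. The DataFrame below is again an assumed reconstruction of the question's data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (NaN placement inferred).
df = pd.DataFrame({
    "source": ["aa", "aa", np.nan, np.nan, np.nan, np.nan],
    "ex": [np.nan, np.nan, "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Replace NaN keys with empty strings so no group is dropped,
# then aggregate into flat, explicitly named columns.
flat = (
    df.fillna("")
      .groupby(["source", "ex", "date"], as_index=False)
      .agg(pw=("pw", "sum"), count=("pw", "count"))
)

print(flat)
```

Note that the missing keys appear as empty strings rather than NaN in this variant.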