
Pandas group by two columns and count the second column value by each group

I have a dataset of domains. Could someone tell me how to filter for domains that have more than one extension using Pandas?

I grouped it with the code below, but got this result:

dfActive.groupby(['domain','ext'])['ext'].nunique()

Result:

domain         com     1
sample         com     1
mashhadmap     com     1
               net     1

Expected Result:

mashhadmap     2

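IIUC, grouping by both domain and ext puts exactly one distinct ext in each group, so nunique is always 1. If you need the count per first-level value (domain), aggregate those values with sum: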

dfActive.groupby(['domain','ext'])['ext'].nunique().groupby(level=0).sum()
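For reference, here is a small self-contained run of the line above. The sample rows are made up to resemble the output in the question; only the names dfActive, domain and ext come from the post. The last assignment is an assumed shortcut (grouping by domain alone), not part of the original answer, and it should give the same counts.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfActive = pd.DataFrame({
    "domain": ["sample", "mashhadmap", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "net", "com"],
})

# Each (domain, ext) group holds exactly one distinct ext, so every value is 1.
per_pair = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# Summing those 1s per domain (index level 0) counts the distinct extensions.
per_domain = per_pair.groupby(level=0).sum()

# Assumed shortcut: count distinct ext values per domain directly.
direct = dfActive.groupby("domain")["ext"].nunique()

print(per_domain)
```

The repeated mashhadmap/com row is there to show that duplicates do not inflate the count, since nunique only looks at distinct values.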

If you need to keep only the domains that occur more than once in the first level (that is, domains with more than one extension):

s = dfActive.groupby(['domain','ext'])['ext'].nunique()
s = s[s.index.get_level_values(0).duplicated(keep=False)]

# then, if needed, aggregate with sum
out = s.groupby(level=0).sum()
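A minimal sketch of the filtering step on made-up rows (again, only dfActive, domain and ext come from the post). It also shows a boolean mask on the per-domain counts as an alternative to duplicated(keep=False); that variant is a substitution for illustration, not the answer's own code.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfActive = pd.DataFrame({
    "domain": ["sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "net"],
})

s = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# Keep only rows whose domain appears more than once in index level 0,
# i.e. domains that have at least two distinct extensions.
s = s[s.index.get_level_values(0).duplicated(keep=False)]
out = s.groupby(level=0).sum()

# Alternative sketch: count per domain first, then keep counts above 1.
counts = dfActive.groupby("domain")["ext"].nunique()
multi = counts[counts > 1]

print(out)
```

With these rows, both out and multi contain only mashhadmap with a count of 2, which matches the expected result in the question.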

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 