I have a dataset of domains. Could someone tell me how to filter the domains that have more than one extension with Pandas?
I grouped the data with the code below, but got this result:
dfActive.groupby(['domain','ext'])['ext'].nunique()
Result:
domain com 1
sample com 1
mashhadmap com 1
net 1
Expected Result:
mashhadmap 2
IIUC, if you need the count of extensions per first-level value (domain), aggregate the grouped result with sum over the first index level:
dfActive.groupby(['domain','ext'])['ext'].nunique().groupby(level=0).sum()
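To see what this produces, here is a minimal sketch on made-up data. The rows in dfActive are an assumption chosen to resemble the question (only the column names domain and ext come from the original), so treat it as an illustration rather than the asker's real dataset.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfActive = pd.DataFrame({
    "domain": ["sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "net"],
})

# One row per (domain, ext) pair, each with value 1.
per_pair = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# Summing over the first index level gives the number of extensions per domain.
counts = per_pair.groupby(level=0).sum()
print(counts)
```

With this sample, mashhadmap ends up with 2 and sample with 1, which matches the expected result in the question.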
If you need to keep only the domains that are duplicated in the first level (that is, domains with more than one extension):
s = dfActive.groupby(['domain','ext'])['ext'].nunique()
s = s[s.index.get_level_values(0).duplicated(keep=False)]
# then, if needed, aggregate with sum
out = s.groupby(level=0).sum()
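The filtering step can be checked on a small example. The sketch below uses hypothetical rows (only the column names domain and ext come from the question). It also shows a shorter alternative that is not part of the answer above: counting unique extensions per domain directly and keeping the counts greater than 1.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfActive = pd.DataFrame({
    "domain": ["sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "net"],
})

s = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# Keep only rows whose first-level value (domain) occurs more than once.
s = s[s.index.get_level_values(0).duplicated(keep=False)]
out = s.groupby(level=0).sum()
print(out)

# Alternative (an assumption, not from the answer): count unique
# extensions per domain directly, then keep counts above 1.
n_ext = dfActive.groupby("domain")["ext"].nunique()
alt = n_ext[n_ext > 1]
print(alt)
```

On this sample both approaches leave only mashhadmap with a count of 2. The alternative avoids the MultiIndex entirely, which may be easier to read if the per-pair counts are not needed elsewhere.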