Suppose I have a Pandas DataFrame called df
with columns a
and b
and what I want is the number of distinct values of b
per each a
. I would do:
distcounts = df.groupby('a')['b'].nunique()
which gives the desidered result, but it is as Series object rather than another DataFrame. I'd like a DataFrame instead. In regular SQL, I'd do:
SELECT a, COUNT(DISTINCT(b)) FROM df
and haven't been able to emulate this query in Pandas exactly. How to?
I think you need reset_index
:
distcounts = df.groupby('a')['b'].nunique().reset_index()
Sample:
df = pd.DataFrame({'a':[7,8,8],
'b':[4,5,6]})
print (df)
a b
0 7 4
1 8 5
2 8 6
distcounts = df.groupby('a')['b'].nunique().reset_index()
print (distcounts)
a b
0 7 1
1 8 2
使用Groupby.agg
另一种替代方法:
df.groupby('a', as_index=False).agg({'b': 'nunique'})
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.