Python pandas: group by several columns and count values for one column
I have a df:
orgs feature1 feature2 feature3
0 org1 True True NaN
1 org1 NaN True NaN
2 org2 NaN True True
3 org3 True True NaN
4 org4 True True True
5 org4 True True True
Now I want to count the number of distinct orgs for each feature. Basically, I want a df_Result like this:
features count_distinct_orgs
0 feature1 3
1 feature2 4
2 feature3 2
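Does anyone know how to do this?
You can group by orgs, count the unique values of each feature column within each group, and sum the results:
# iloc[:, 1:] skips the leading 'orgs' column inside each group
df1 = (df.groupby('orgs')
         .apply(lambda x: x.iloc[:, 1:].apply(lambda y: y.nunique()))
         .sum()
         .reset_index())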
df1.columns = ['features','count_distinct_orgs']
print (df1)
features count_distinct_orgs
0 feature1 3
1 feature2 4
2 feature3 2
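To see why this works, it can help to look at the intermediate per-org table. The following is a minimal sketch, not the answer's exact code: it rebuilds the sample frame from the question (the column values are copied from the table above) and uses the groupby `nunique` method directly. Because `nunique` skips NaN by default, each cell is 1 when the org has the feature set at least once and 0 otherwise, so summing each column counts the orgs.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "orgs": ["org1", "org1", "org2", "org3", "org4", "org4"],
    "feature1": [True, np.nan, np.nan, True, True, True],
    "feature2": [True, True, True, True, True, True],
    "feature3": [np.nan, np.nan, True, np.nan, True, True],
})

# nunique() ignores NaN, so a group holding only NaN yields 0
# and a group holding at least one True yields 1.
per_org = df.groupby("orgs")[["feature1", "feature2", "feature3"]].nunique()
print(per_org)

# Summing down each column counts the orgs that have the feature.
counts = per_org.sum()
print(counts)
```

Note that this relies on each feature column holding only True or NaN, as in the sample. A column that mixed True and False within one org would contribute 2 for that org.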
Another solution with aggregate and Series.nunique:
import pandas as pd

df1 = (df.groupby('orgs')
         .agg(lambda x: pd.Series.nunique(x))
         .sum()
         .astype(int)
         .reset_index())
df1.columns = ['features','count_distinct_orgs']
print (df1)
features count_distinct_orgs
0 feature1 3
1 feature2 4
2 feature3 2
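If the feature cells are only ever True or NaN, as in the sample data, a variant that avoids `nunique` altogether is to mark the non-null cells and ask, per org, whether any row has one. This is an alternative sketch under that assumption, not part of the original answer. It would also count a literal False as "present", so it is not a drop-in replacement for other data.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "orgs": ["org1", "org1", "org2", "org3", "org4", "org4"],
    "feature1": [True, np.nan, np.nan, True, True, True],
    "feature2": [True, True, True, True, True, True],
    "feature3": [np.nan, np.nan, True, np.nan, True, True],
})

# True where a cell holds a value; then, per org, whether any row has one.
has_feature = df.set_index("orgs").notna().groupby(level=0).any()

# Summing booleans counts the orgs; the rest only shapes the output.
df_result = (has_feature.sum()
                        .rename_axis("features")
                        .reset_index(name="count_distinct_orgs"))
print(df_result)
```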
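The solution with stack works, but returns a warning:
C:\Anaconda3\lib\site-packages\pandas\core\groupby.py:2937: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  inc = np.r_[1, val[1:] != val[:-1]]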
df1 = df.set_index('orgs').stack(dropna=False)
df1 = df1.groupby(level=[0,1]).nunique().unstack().sum().reset_index()
df1.columns = ['features','count_distinct_orgs']
print (df1)
features count_distinct_orgs
0 feature1 3
1 feature2 4
2 feature3 2