python pandas: group by several columns and count value for one column

Question

I have a df:

   orgs feature1 feature2 feature3
0  org1     True     True      NaN
1  org1      NaN     True      NaN
2  org2      NaN     True     True
3  org3     True     True      NaN
4  org4     True     True     True
5  org4     True     True     True

Now I would like to count the number of distinct orgs for each feature. Basically, I want a df_Result like this:

   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2

Does anyone know how to do this?
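
To make the question reproducible, here is a minimal sketch that rebuilds the frame shown above. It assumes the missing entries are `np.nan` and the present entries are the boolean `True`; the original post does not say how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: missing cells are np.nan, present cells are the boolean True.
df = pd.DataFrame({
    'orgs': ['org1', 'org1', 'org2', 'org3', 'org4', 'org4'],
    'feature1': [True, np.nan, np.nan, True, True, True],
    'feature2': [True, True, True, True, True, True],
    'feature3': [np.nan, np.nan, True, np.nan, True, True],
})

print(df)
```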

Answer 1

You can add sum to the previous solution:

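# Parentheses are needed so the chained calls can span several lines.
# iloc[:, 1:] assumes each group still has 'orgs' as its first column.
df1 = (df.groupby('orgs')
         .apply(lambda x: x.iloc[:, 1:].apply(lambda y: y.nunique()))
         .sum()
         .reset_index())
df1.columns = ['features', 'count_distinct_orgs']

print(df1)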
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2

Another solution, using aggregate with Series.nunique:

import pandas as pd

# nunique ignores NaN, so each org contributes 1 per feature it has, else 0
df1 = (df.groupby('orgs')
         .agg(lambda x: pd.Series.nunique(x))
         .sum()
         .astype(int)
         .reset_index())
df1.columns = ['features', 'count_distinct_orgs']
print(df1)
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
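
To see why summing works, it may help to look at the intermediate table. The sketch below rebuilds the sample frame (assuming `np.nan` for the missing cells) and uses `DataFrameGroupBy.nunique` on the three feature columns, a slight variation on the code above. Since `nunique` skips NaN by default, each cell is 1 if the org has the feature at least once and 0 otherwise, so the column sums are the distinct-org counts. This only holds while a feature column contains a single non-missing value (`True`); a mix of `True` and `False` within one org would count as 2.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumption: missing cells are np.nan).
df = pd.DataFrame({
    'orgs': ['org1', 'org1', 'org2', 'org3', 'org4', 'org4'],
    'feature1': [True, np.nan, np.nan, True, True, True],
    'feature2': [True, True, True, True, True, True],
    'feature3': [np.nan, np.nan, True, np.nan, True, True],
})

feature_cols = ['feature1', 'feature2', 'feature3']

# One row per org; each cell is the number of distinct non-NaN values,
# which is 1 or 0 here because the only non-missing value is True.
per_org = df.groupby('orgs')[feature_cols].nunique()
print(per_org)

# Summing down each column counts the orgs that have that feature.
counts = per_org.sum()
print(counts)  # feature1: 3, feature2: 4, feature3: 2
```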

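The solution with stack works, but returns a warning:

C:\Anaconda3\lib\site-packages\pandas\core\groupby.py:2937: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  inc = np.r_[1, val[1:] != val[:-1]]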

df1 = df.set_index('orgs').stack(dropna=False)
df1 = df1.groupby(level=[0,1]).nunique().unstack().sum().reset_index()
df1.columns = ['features','count_distinct_orgs']
print(df1)
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
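
A further alternative, not part of the original answer, is to reshape to long format with `DataFrame.melt`, drop the missing values, and count distinct orgs per feature directly. This is only a sketch on the sample data (again assuming `np.nan` for missing cells); the names `long_df` and `df_result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumption: missing cells are np.nan).
df = pd.DataFrame({
    'orgs': ['org1', 'org1', 'org2', 'org3', 'org4', 'org4'],
    'feature1': [True, np.nan, np.nan, True, True, True],
    'feature2': [True, True, True, True, True, True],
    'feature3': [np.nan, np.nan, True, np.nan, True, True],
})

# Long format: one row per (org, feature, value) combination.
long_df = df.melt(id_vars='orgs', var_name='features', value_name='value')

# Keep only rows where the feature is present, then count distinct orgs.
df_result = (long_df.dropna(subset=['value'])
                    .groupby('features')['orgs']
                    .nunique()
                    .reset_index(name='count_distinct_orgs'))

print(df_result)
```

Unlike the per-org `nunique` sum, this counts an org once per feature no matter how many different non-missing values it has for that feature.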

Answer 1: 2, accepted 2016-10-13 08:47:59