Cannot use the .size().div() methods inside a lambda function with pandas groupby

Question

I am using the following lines of code to compute conditional probabilities:

    variable = 'variable_name'
    probs = df.groupby(variable).size().div(len(df))
    cond_probs = df.groupby([variable, 'has_income']).size().div(len(df)).div(probs, axis=0, level=variable)

They produce the following output:

    variable_name         has_income
    (0.999, 2.0]          False          0.756323
                          True           0.243677
    (2.0, 3.0]            False          0.798372
                          True           0.201628
    (3.0, 16.0]           False          0.809635
                          True           0.190365

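I would like to add an extra column to the output containing the sample size of each group, but I cannot rewrite the formula inside a lambda function, because the object passed to the lambda does not have the same methods as the object returned by df.groupby(). For example: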

    cond_probs =df.groupby([variable, 'has_income']).apply(lambda x: 
    pd.Series({
        'probs': x.size().div(len(df)).div(probs, axis=0, level=variable),
        'size': x.size()
    }))

Error: TypeError: 'numpy.int32' object is not callable

Is there an alternative that gives these results, ideally without computing two groupbys and joining the DataFrames at the end?

Answer 1

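When you use apply with groupby, you do not get a group object; you get the slice of the DataFrame that corresponds to each group. So x in your case is a DataFrame, not a GroupBy object, and you should treat it the same way you would treat df.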

    cond_probs = df.groupby([variable, 'has_income']).apply(lambda x:
        pd.Series({
            'probs': (len(x) / len(df)) / probs[x.iloc[0][variable]],
            'size': len(x)
        })
    )
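To try the apply approach end to end, here is a minimal sketch on made-up data. The toy values and the helper name `summarize` are assumptions for illustration only. It also reads the group key from `x.name` rather than `x.iloc[0][variable]`, and selects a column before calling apply, on the assumption that some newer pandas versions may not pass the grouping columns into the applied function.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the question, values are made up.
df = pd.DataFrame({
    "variable_name": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "has_income": [True, False, False, True, True, False, False, False],
})
variable = "variable_name"

# Marginal probability of each value of `variable`.
probs = df.groupby(variable).size().div(len(df))


def summarize(x):
    # x is the sub-DataFrame for one (variable, has_income) group;
    # x.name is that group's key, here a (variable, has_income) tuple.
    key = x.name[0]
    return pd.Series({
        "probs": (len(x) / len(df)) / probs[key],
        "size": len(x),
    })


# Selecting a column first keeps the grouping columns out of the applied frame.
cond_probs = df.groupby([variable, "has_income"])[["has_income"]].apply(summarize)
print(cond_probs)
```

Because each group returns a Series with the same labels, apply stacks them into a DataFrame with one row per group and the columns `probs` and `size`.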

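NB: if you use .size on a DataFrame, it returns the total number of cells, so it is not the same as GroupBy.size (docs). It is also a property that holds an integer rather than a method, which is why calling x.size() raises the TypeError shown in the question.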
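The difference between the two is easy to check on a small frame. The sketch below uses made-up data, and the names `n_cells`, `sizes` and `result` are assumptions for illustration. It also shows one possible alternative, not the answer's method: since GroupBy.size() already returns the per-group counts, the same Series can feed both the probability and the size column without apply.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the question, values are made up.
df = pd.DataFrame({
    "variable_name": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "has_income": [True, False, False, True, True, False, False, False],
})
variable = "variable_name"

# DataFrame.size is a property: rows * columns, a plain integer.
n_cells = df.size

# GroupBy.size() is a method: it returns the number of rows in each group.
sizes = df.groupby([variable, "has_income"]).size()
probs = df.groupby(variable).size().div(len(df))

# Reuse the per-group counts for both columns; the division by `probs`
# is broadcast across the `variable` level, as in the question's own code.
result = pd.DataFrame({
    "probs": sizes.div(len(df)).div(probs, level=variable),
    "size": sizes,
})
print(result)
```

This still needs the separate `probs` groupby for the marginals, but it avoids both the lambda and a join at the end.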
