Python pandas groupby transform/apply function operating on multiple columns

Question

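I'm trying to use the pandas split-apply-combine transform, with the twist that the applied function needs to operate on multiple columns. It seems I can't get it to work with transform and have to go indirectly through apply. Is there a more direct way?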

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date':[1,1,1,2,2,2],'col1':[1,2,3,4,5,6],'col2':[1,2,3,4,5,6]})
col1 = 'col1'
col2 = 'col2'
def calc(dfg):
    nparray = np.array(dfg[col1])
    somecalc = np.array(dfg[col2])
    # do something with somecalc that helps calculate the result
    return nparray - nparray.mean()  # just some dummy data; the real function does a complicated calculation

#===> results in: KeyError: 'col1'
df['colnew'] = df.groupby('Date')[col1].transform(calc)

#===> results in: ValueError: could not broadcast input array from shape (9) into shape (9,16) or TypeError: cannot concatenate a non-NDFrame object
df['colnew'] = df.groupby('Date').transform(calc)

#===> this works but feels unnecessary 
def applycalc(df):
    df['colnew'] = calc(df)
    return(df)

df = df.groupby('Date').apply(applycalc)

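This post is the closest one I have found. I would rather not pass all the columns in as separate arguments, and besides, there is a groupby operation involved.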

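Edit: Note that I'm not really trying to compute nparray - nparray.mean(); that is just a dummy calculation. The real function does something complicated and returns an array of shape (group_length, 1). Also, I want to store colnew as a new column in the original DataFrame.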

Answer 1

You can do the groupby first and then the subtraction, rather than doing both in one step:

In [11]: df["col1"] - df.groupby('Date')["col1"].transform("mean")
Out[11]:
0   -1
1    0
2    1
3   -1
4    0
5    1
dtype: int64
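
If the real calculation can be written as per-group aggregates combined with ordinary column arithmetic, the same idea extends to several columns and the result can be stored as a new column. A minimal sketch; the formula for colnew is a made-up placeholder, not the asker's actual calculation:

```python
import pandas as pd

df = pd.DataFrame({'Date': [1, 1, 1, 2, 2, 2],
                   'col1': [1, 2, 3, 4, 5, 6],
                   'col2': [1, 2, 3, 4, 5, 6]})

# Per-group means of both columns, broadcast back onto the original row index.
means = df.groupby('Date')[['col1', 'col2']].transform('mean')

# Placeholder formula (an assumption for illustration only): sum of the
# per-group deviations of col1 and col2.
df['colnew'] = (df['col1'] - means['col1']) + (df['col2'] - means['col2'])
```

Because transform returns an object indexed like df, the arithmetic aligns row by row without any extra reindexing.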

In this case you can't use transform, since the function returns multiple values (an array/Series), so use apply:

In [21]: def calc2(dfg):
             return dfg["col1"] - dfg["col1"].mean()

In [22]: df.groupby('Date', as_index=True).apply(calc2)
Out[22]:
Date
1     0   -1
      1    0
      2    1
2     3   -1
      4    0
      5    1
Name: col1, dtype: float64

Note that it's important to return a Series; otherwise the result won't align:

In [23]: df.groupby('Date').apply(calc)
Out[23]:
Date
1    [-1.0, 0.0, 1.0]
2    [-1.0, 0.0, 1.0]
dtype: object

Answer 1
2 accepted 2015-11-13 18:14:17
