Pandas groupby object.aggregate with a custom list manipulation function

Question

I have a CSV file that looks like this:

Hour,L,Dr,Tag,Code,Vge
0,L5,XI,PS,4R,15
0,L3,St,sst,4R,17
5,L5,XI,PS,4R,12
2,L0,St,v2T,4R,11
8,L2,TI,sst,4R,8
12,L5,XI,PS,4R,18
2,L2,St,PS,4R,9
12,L3,XI,sst,4R,16

I run the following script in an IPython notebook.

In[1]
    import pandas as pd
    import numpy as np
In[2]
    df = pd.read_csv('/python/concepts/pandas/in.csv')
In[3]    
    df.head(n=9)

Out[1]: 

       Hour   L  Dr  Tag Code  Vge
    0     0  L5  XI   PS   4R   15
    1     0  L3  St  sst   4R   17
    2     5  L5  XI   PS   4R   12
    3     2  L0  St  v2T   4R   11
    4     8  L2  TI  sst   4R    8
    5    12  L5  XI   PS   4R   18
    6     2  L2  St   PS   4R    9
    7    12  L3  XI  sst   4R   16

In[4]
    df.groupby(('Hour'))['Vge'].aggregate(np.sum)



Out[2]:  
     Hour
        0     32
        2     20
        5     12
        8      8
        12    34
        Name: Vge, dtype: int64

Now I write a list manipulation function, square_list.

In[4]    

    newlist = []
In[5]    
    def square_list(x):
        for item in x:
            newlist.append(item**item)
        return newlist

In [44]: df.groupby(('Hour'))['Vge'].aggregate(square_list)
Out[44]: 
Hour
0     [437893890380859375, -2863221430593058543, 437...
2     [437893890380859375, -2863221430593058543, 437...
5     [437893890380859375, -2863221430593058543, 437...
8     [437893890380859375, -2863221430593058543, 437...
12    [437893890380859375, -2863221430593058543, 437...
Name: Vge, dtype: object

The output looks strange. What I expected was the squares of the items in the first output.

If I use

df.groupby(('Hour'))['Vge'].aggregate(lambda x: x ** x)

I get the following error.

ValueError                                Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in agg_series(self, obj, func)
   1632         try:
-> 1633             return self._aggregate_series_fast(obj, func)
   1634         except Exception:

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _aggregate_series_fast(self, obj, func)
   1651                                     dummy)
-> 1652         result, counts = grouper.get_result()
   1653         return result, counts

pandas/src/reduce.pyx in pandas.lib.SeriesGrouper.get_result (pandas/lib.c:38634)()

pandas/src/reduce.pyx in pandas.lib.SeriesGrouper.get_result (pandas/lib.c:38503)()

pandas/src/reduce.pyx in pandas.lib._get_result_array (pandas/lib.c:32023)()

ValueError: function does not reduce

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2339             try:
-> 2340                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
   2341             except Exception:

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1167             try:
-> 1168                 result, counts = self.grouper.agg_series(obj, f)
   1169                 output[name] = self._try_cast(result, obj)

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in agg_series(self, obj, func)
   1634         except Exception:
-> 1635             return self._aggregate_series_pure_python(obj, func)
   1636 

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _aggregate_series_pure_python(self, obj, func)
   1668                         isinstance(res, list)):
-> 1669                     raise ValueError('Function does not reduce')
   1670                 result = np.empty(ngroups, dtype='O')

ValueError: Function does not reduce

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-47-874cf4c23d53> in <module>()
----> 1 df.groupby(('Hour'))['Vge'].aggregate(lambda x : x**x)

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2340                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
   2341             except Exception:
-> 2342                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
   2343 
   2344             index = Index(sorted(result), name=self.grouper.names[0])

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _aggregate_named(self, func, *args, **kwargs)
   2429             output = func(group, *args, **kwargs)
   2430             if isinstance(output, (Series, Index, np.ndarray)):
-> 2431                 raise Exception('Must produce aggregated value')
   2432             result[name] = self._try_cast(output, group)
   2433 

Exception: Must produce aggregated value

Answer 1

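Did you read the error carefully? It says the function does not reduce: aggregate expects a function that turns each group into a single value. Take a few minutes to define exactly what you want. That is also the exact problem with your square_list() function: it returns a list rather than the sum of the list's elements, so it does not reduce.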
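To make "reduce" concrete, here is a minimal sketch that assumes the Hour and Vge columns from the question are built inline rather than read from the CSV. The range function (max minus min) is only an illustrative reducer, and the variable names are made up for this example. It contrasts aggregate, which needs one scalar per group, with transform, which returns one value per input row and is one possible way to get element-wise squares.

```python
import pandas as pd

# Same Hour/Vge values as the CSV in the question, built inline.
df = pd.DataFrame({
    "Hour": [0, 0, 5, 2, 8, 12, 2, 12],
    "Vge":  [15, 17, 12, 11, 8, 18, 9, 16],
})
grouped = df.groupby("Hour")["Vge"]

# aggregate: the function must collapse each group to ONE scalar.
vge_range = grouped.aggregate(lambda s: s.max() - s.min())

# transform: the function returns one value per row of the group,
# so the result has the same length and index as df.
vge_squared = grouped.transform(lambda s: s ** 2)

print(vge_range)
print(vge_squared)
```

The aggregate result has one row per Hour, while the transform result lines up with the original rows.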

If you want a simple sum:
```
 df.groupby('Hour')['Vge'].sum() 
```
If you want to square all the elements in the column:
```
 df['Vge_squared'] = df['Vge']**2 
```

If you want the sum of squares for each group:

 df.groupby('Hour')['Vge_squared'].sum()
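Put together, the helper-column approach can be sketched end to end as follows. The data is the question's Hour and Vge columns typed inline (an assumption, instead of reading in.csv), and sum_of_squares is just an illustrative variable name.

```python
import pandas as pd

# Same Hour/Vge values as the CSV in the question, built inline.
df = pd.DataFrame({
    "Hour": [0, 0, 5, 2, 8, 12, 2, 12],
    "Vge":  [15, 17, 12, 11, 8, 18, 9, 16],
})

# Element-wise square stored in a helper column.
df["Vge_squared"] = df["Vge"] ** 2

# Built-in sum reduces each group to a single number.
sum_of_squares = df.groupby("Hour")["Vge_squared"].sum()
print(sum_of_squares)
```

For Hour 0, for example, this gives 15**2 + 17**2 = 514.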

Or,

import numpy as np

def square_list(x):
    x = np.array(x)
    # Square each element, then reduce the group to a single number.
    return np.sum(np.multiply(x, x))

df.groupby('Hour')['Vge'].aggregate(square_list)

Or,

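def square_list(x):
    newlist = []  # fresh list per group; a global list would accumulate across calls
    for item in x:
        newlist.append(item ** 2)  # square, not item ** item
    return newlist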

df.groupby('Hour')['Vge'].aggregate(square_list).apply(sum)

Hope this helps.

Answer 2

First, the first output is "expected", because every call to square_list appends to the same global newlist.
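The accumulation can be reproduced without pandas. This is a small sketch with a hypothetical function name, square_list_shared, and plain lists standing in for two groups. It uses item ** 2 to keep the numbers small, which is a simplification of the question's item ** item.

```python
newlist = []  # module-level list, shared by every call

def square_list_shared(x):
    # Appends to the shared list instead of starting a new one.
    for item in x:
        newlist.append(item ** 2)
    return newlist

first = square_list_shared([15, 17])   # stand-in for one group
second = square_list_shared([11, 9])   # stand-in for another group

print(first is second)  # True: both names point at the same list
print(first)            # [225, 289, 121, 81]
```

Because every group gets back the same list object, every row of the aggregated result displays the same ever-growing contents.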

You can create the list on each call instead:

def square_list(x):
    newlist = []
    for item in x:
        newlist.append(item**item)
    return newlist

In [11]: df.groupby(('Hour'))['Vge'].aggregate(square_list)
Out[11]:
Hour
0     [437893890380859375, -2863221430593058543]
2                      [285311670611, 387420489]
5                                [8916100448256]
8                                     [16777216]
12                      [-497033925936021504, 0]
dtype: object

But I suspect that's not what you want.

The error message is quite accurate: "Must produce aggregated value". At the moment your lambda does not return a single value.

Perhaps you want the sum:

In [21]: df.groupby(('Hour'))['Vge'].aggregate(lambda x: (x ** x).sum())
Out[21]:
Hour
0    -8785478146473916416
2            285699091100
5           8916100448256
8                16777216
12                      0
Name: Vge, dtype: int64
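The negative value and the 0 in the sums above are a side effect of `x ** x`, which raises each value to its own power rather than squaring it (the square would be `x ** 2`). For values such as 16, 17 and 18 the exact result no longer fits in a 64-bit integer, so the fixed-width arithmetic wraps around. A small sketch with illustrative variable names, using NumPy directly:

```python
import numpy as np

values = np.array([15, 16, 17, 18], dtype=np.int64)

# Fixed-width int64 arithmetic: results that do not fit wrap around silently.
wrapped = values ** values

# Python ints have arbitrary precision, so these are the exact values.
exact = [int(v) ** int(v) for v in values]

int64_max = np.iinfo(np.int64).max
for v, w, e in zip(values, wrapped, exact):
    print(int(v), int(w), e, "overflows" if e > int64_max else "fits")
```

Only 15 ** 15 fits; 16 ** 16 is exactly 2 ** 64, which wraps to 0.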

Note: it will probably be faster to create a helper column for the squares and then do a "clean" sum.

Answer 1
2 2015-12-06 06:29:17

Answer 2
1 2015-12-06 06:29:30