Pandas Groupby Agg Function Does Not Reduce

Question

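I'm using an aggregation function that I have used in my work for a long time. The idea is that if the Series passed to the function has length 1 (i.e. the group has only one observation), that observation is returned. If the Series passed has length greater than 1, the observations are returned in a list.

This may seem odd to some, but it is not an X,Y problem; I have good reasons for wanting to do this that are not relevant to this question.

This is the function I have been using: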

def MakeList(x):
    """ This function is used to aggregate data that needs to be kept distinct within multi day
        observations for later use and transformation. It makes a list of the data and if the list is of length 1
        then there is only one line/day observation in that group so the single element of the list is returned.
        If the list is longer than one then there are multiple line/day observations and the list itself is
        returned."""
    L = x.tolist()
    if len(L) > 1:
        return L
    else:
        return L[0]

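Now, for some reason, with the current data set I'm working on, I get a ValueError stating that the function does not reduce. Here is some test data and the remaining steps I'm using: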

import pandas as pd
DF = pd.DataFrame({'date': ['2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02'],
                    'line_code':   ['401101',
                                    '401101',
                                    '401102',
                                    '401103',
                                    '401104',
                                    '401105',
                                    '401105',
                                    '401106',
                                    '401106',
                                    '401107'],
                    's.m.v.': [ 7.760,
                                25.564,
                                25.564,
                                9.550,
                                4.870,
                                7.760,
                                25.564,
                                5.282,
                                25.564,
                                5.282]})
DFGrouped = DF.groupby(['date', 'line_code'], as_index = False)
DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

While trying to debug this, I added print statements to the effect of print L and print x.index, and the output was:

[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')
[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')

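For some reason, it seems agg is passing the Series to the function twice. As far as I know this is not normal at all, and is presumably the reason my function does not reduce.

For example, if I write a function like this: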

def test_func(x):
    # print() with parentheses works on both Python 2 and Python 3
    print(x.index)
    return x.iloc[0]

It works without problems, and the print statements are:

DF_Agg = DFGrouped.agg({'s.m.v.' : test_func})

Int64Index([0, 1], dtype='int64')
Int64Index([2], dtype='int64')
Int64Index([3], dtype='int64')
Int64Index([4], dtype='int64')
Int64Index([5, 6], dtype='int64')
Int64Index([7, 8], dtype='int64')
Int64Index([9], dtype='int64')

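This indicates that each group is passed to the function as a Series only once.

Can anyone help me understand why this is failing? I have used this function with success in many of the data sets I work with....

Thanks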

Answer 1

I can't really explain why, but in my experience list does not work all that well inside a pandas.DataFrame.

I usually use tuple instead. That will work:

def MakeList(x):
    T = tuple(x)
    if len(T) > 1:
        return T
    else:
        return T[0]

DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

     date line_code           s.m.v.
0  2013-04-02    401101   (7.76, 25.564)
1  2013-04-02    401102           25.564
2  2013-04-02    401103             9.55
3  2013-04-02    401104             4.87
4  2013-04-02    401105   (7.76, 25.564)
5  2013-04-02    401106  (5.282, 25.564)
6  2013-04-02    401107            5.282

Answer 2

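This is a bug in DataFrame. If the aggregator returns a list for the first group, it fails with the error you mention; if it returns a non-list (non-Series) for the first group, it works fine. The broken code is in groupby.py: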

def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res
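
As a rough illustration of how a first-group-only check behaves, here is a simplified, pandas-free stand-in for that loop. The names aggregate_first_checked and make_list are hypothetical, and the sketch only mirrors the control flow quoted above; it is not the library's code.

```python
def aggregate_first_checked(groups, func):
    """Simplified stand-in for the quoted loop (an assumption, not pandas code).

    The type check on the result runs only while `result` is still unset,
    which means it runs only for the first group.
    """
    result = None
    for label, group in enumerate(groups):
        res = func(group)
        if result is None:
            if isinstance(res, list):
                raise ValueError('Function does not reduce')
            result = [None] * len(groups)
        result[label] = res
    return result


def make_list(values):
    """Return the single value, or the whole list when there are several."""
    values = list(values)
    return values if len(values) > 1 else values[0]


# First group has one element: the check passes, later lists are accepted.
print(aggregate_first_checked([[25.564], [7.76, 25.564]], make_list))

# First group has two elements: the same function now raises.
try:
    aggregate_first_checked([[7.76, 25.564], [25.564]], make_list)
except ValueError as exc:
    print(exc)
```

Under this reading, whether the error appears depends on which group happens to come first, which would be consistent with the function having worked on other data sets.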

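Note the if result is None and isinstance(res, list) checks in _aggregate_series_pure_python: the test is applied only to the first group. Your options are:

Fake out groupby().agg() so it doesn't see a list for the first group, or
Do the aggregation yourself, using code like that above but without the erroneous test.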
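
The second option could look roughly like the sketch below, which groups the question's test data by hand in plain Python instead of going through groupby().agg(). The helper names collapse and aggregate_by_key are hypothetical, and the sketch assumes Python 3.7+ so that dict keeps insertion order.

```python
def collapse(values):
    """Return the lone value for a single observation, else the list itself."""
    return values if len(values) > 1 else values[0]


def aggregate_by_key(rows):
    """Group (key, value) pairs by key and collapse each group.

    No type check is applied to the collapsed result, so a list for the
    first group is accepted like any other.
    """
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return [(key, collapse(values)) for key, values in groups.items()]


# The (date, line_code) keys and s.m.v. values from the question's test data.
rows = [
    (('2013-04-02', '401101'), 7.760),
    (('2013-04-02', '401101'), 25.564),
    (('2013-04-02', '401102'), 25.564),
    (('2013-04-02', '401103'), 9.550),
    (('2013-04-02', '401104'), 4.870),
    (('2013-04-02', '401105'), 7.760),
    (('2013-04-02', '401105'), 25.564),
    (('2013-04-02', '401106'), 5.282),
    (('2013-04-02', '401106'), 25.564),
    (('2013-04-02', '401107'), 5.282),
]

for key, value in aggregate_by_key(rows):
    print(key, value)
```

Unlike groupby, this keeps groups in first-seen order rather than sorting the keys; for the test data the two orders coincide.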

Answer 1
33 2015-04-24 13:16:45

Answer 2
14 2016-06-21 22:59:40
