Pandas - Groupby multiple columns

Question

I'm trying to group by multiple columns and aggregate the remaining values so that they become a list after grouping.

Currently, the DataFrame looks like this:

I've tried to use this:

grouped = DataFrame.groupby(['jobname', 'block'], axis=0)
DataFrame = grouped.aggregate(lambda x: list(x))

However, when I run this in IPython, it gives me this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-221-97113b757fa1> in <module>()
----> 1 cassandraFrame_2 = grouped.aggregate(lambda x: list(x))
      2 cassandraFrame_2

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2867 
   2868             if self.grouper.nkeys > 1:
-> 2869                 return self._python_agg_general(arg, *args, **kwargs)
   2870             else:
   2871 

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _python_agg_general(self, func, *args, **kwargs)
   1166         for name, obj in self._iterate_slices():
   1167             try:
-> 1168                 result, counts = self.grouper.agg_series(obj, f)
   1169                 output[name] = self._try_cast(result, obj)
   1170             except TypeError:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in agg_series(self, obj, func)
   1633             return self._aggregate_series_fast(obj, func)
   1634         except Exception:
-> 1635             return self._aggregate_series_pure_python(obj, func)
   1636 
   1637     def _aggregate_series_fast(self, obj, func):

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _aggregate_series_pure_python(self, obj, func)
   1667                 if (isinstance(res, (Series, Index, np.ndarray)) or
   1668                         isinstance(res, list)):
-> 1669                     raise ValueError('Function does not reduce')
   1670                 result = np.empty(ngroups, dtype='O')
   1671 

ValueError: Function does not reduce

Ultimately, I want to group the rows that share the same jobname and block, and collect their data values into a list of tuples. Right now each data value is a 3-item tuple.

For example:

jobname       block         data
Complete-Test Simple_buff   (tuple_1)
Complete-Test Simple_buff   (tuple_2)

Aggregated:

jobname       block         data
Complete-Test Simple_buff   [(tuple_1),(tuple_2)]

I could group by jobname alone, but that aggregates all of the blocks together, and I want to keep the blocks separate.
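To illustrate the difference between the two groupings, here is a minimal sketch. The sample values, including the second block name `Other_buff` and the numeric tuples, are made up for illustration and are not from my real data.

```python
import pandas as pd

# Hypothetical sample data; "Other_buff" and the tuple values are placeholders.
df = pd.DataFrame({
    "jobname": ["Complete-Test"] * 3,
    "block": ["Simple_buff", "Simple_buff", "Other_buff"],
    "data": [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
})

# Grouping by jobname alone puts every block into a single group.
by_job = df.groupby("jobname")

# Grouping by both columns keeps one group per (jobname, block) pair.
by_job_and_block = df.groupby(["jobname", "block"])

groups_by_job = by_job.ngroups
groups_by_job_and_block = by_job_and_block.ngroups
group_keys = sorted(by_job_and_block.groups.keys())

print(groups_by_job, groups_by_job_and_block)
print(group_keys)
```

With two grouping keys, each key in `groups` is a `(jobname, block)` tuple, which is the separation I am after.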

Can someone point me in the right direction?

Thanks

Answer 1

It looks like there is an explicit check that the value returned by the aggregating function is not a Series, Index, np.ndarray, or list. Your lambda returns a list, so the check raises "Function does not reduce".

So, the following should work, since it returns a tuple instead:

grouped = df.groupby(['jobname', 'block'])
aggregated = grouped.aggregate(lambda x: tuple(x))
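As a fuller sketch of the same idea, the snippet below builds a small frame, aggregates to tuples, and then converts each tuple back to a list in case a list is really required. The sample rows (including the `Other_buff` block and the numeric tuples) are hypothetical, and the conversion step is an addition of mine rather than part of the original snippet. Behavior may differ between pandas versions, so treat this as an illustration, not a guarantee.

```python
import pandas as pd

# Hypothetical sample data; the 3-item tuples stand in for the real values.
df = pd.DataFrame({
    "jobname": ["Complete-Test", "Complete-Test", "Complete-Test"],
    "block": ["Simple_buff", "Simple_buff", "Other_buff"],
    "data": [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
})

grouped = df.groupby(["jobname", "block"])

# Returning a tuple (not a list) avoids the "does not reduce" check.
aggregated = grouped.aggregate(lambda x: tuple(x))

# Optional: turn each tuple of tuples into a list of tuples afterwards.
aggregated["data"] = aggregated["data"].map(lambda t: list(t))

# Move jobname and block from the index back into ordinary columns.
result = aggregated.reset_index()
print(result)
```

Converting after the aggregation keeps the aggregating function itself returning a tuple, which is what the check cares about.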

Answer 1
6 · accepted · 2015-11-13 23:16:01
