
Pandas - Groupby multiple columns

I'm trying to group by multiple columns and aggregate the remaining values so that each group's values become a list.

Currently, the DataFrame looks like this:

[image: DataFrame screenshot]

I've tried to use this:

grouped = DataFrame.groupby(['jobname', 'block'], axis=0)
DataFrame = grouped.aggregate(lambda x: list(x))

However, when I run this in IPython, it gives me this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-221-97113b757fa1> in <module>()
----> 1 cassandraFrame_2 = grouped.aggregate(lambda x: list(x))
      2 cassandraFrame_2

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2867 
   2868             if self.grouper.nkeys > 1:
-> 2869                 return self._python_agg_general(arg, *args, **kwargs)
   2870             else:
   2871 

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _python_agg_general(self, func, *args, **kwargs)
   1166         for name, obj in self._iterate_slices():
   1167             try:
-> 1168                 result, counts = self.grouper.agg_series(obj, f)
   1169                 output[name] = self._try_cast(result, obj)
   1170             except TypeError:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in agg_series(self, obj, func)
   1633             return self._aggregate_series_fast(obj, func)
   1634         except Exception:
-> 1635             return self._aggregate_series_pure_python(obj, func)
   1636 
   1637     def _aggregate_series_fast(self, obj, func):

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _aggregate_series_pure_python(self, obj, func)
   1667                 if (isinstance(res, (Series, Index, np.ndarray)) or
   1668                         isinstance(res, list)):
-> 1669                     raise ValueError('Function does not reduce')
   1670                 result = np.empty(ngroups, dtype='O')
   1671 

ValueError: Function does not reduce

Ultimately, I want to group rows that share the same jobname and block, so that the data column becomes a list of tuples. Right now each row holds a single 3-item tuple.

For example:

jobname       block         data
Complete-Test Simple_buff   (tuple_1)
Complete-Test Simple_buff   (tuple_2)

Aggregated:

jobname       block         data
Complete-Test Simple_buff   [(tuple_1),(tuple_2)]

I could group by jobname alone, but that also merges the different blocks together, and I want to keep the blocks separate.

Can someone point me in the right direction?

Thanks

Looks like there is an explicit check that the value returned by the aggregating function is not a Series, Index, np.ndarray, or list; if it is, pandas raises "Function does not reduce". A tuple is not covered by that check.

So, the following should work:

grouped = df.groupby(['jobname', 'block'])
aggregated = grouped.aggregate(lambda x: tuple(x))
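To make this concrete, here is a minimal, self-contained sketch of the same idea. The sample rows and tuple values are made up to mirror the layout in the question, and selecting only the data column before aggregating is my own choice rather than something the question requires. Since the goal was a list of tuples, the last step converts each aggregated tuple into a list after the grouping is done.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout;
# the tuple values are invented for illustration.
df = pd.DataFrame({
    "jobname": ["Complete-Test", "Complete-Test", "Complete-Test"],
    "block": ["Simple_buff", "Simple_buff", "Other_buff"],
    "data": [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
})

# Collect each (jobname, block) group's tuples into one outer tuple,
# then turn the group keys back into ordinary columns.
aggregated = (
    df.groupby(["jobname", "block"])["data"]
      .aggregate(lambda x: tuple(x))
      .reset_index()
)

# Optional: convert the outer tuples to lists once grouping is finished.
aggregated["data"] = aggregated["data"].map(lambda t: list(t))

print(aggregated)
```

Newer pandas releases may no longer apply the list check shown in the traceback, so passing list directly to aggregate could work there as well; treat that as an assumption to verify against your installed version.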
