Pandas - Groupby multiple columns
I'm trying to group by multiple columns and aggregate the remaining values so that they become a list after grouping.
Currently, the DataFrame looks like this:
I've tried to use this:
grouped = DataFrame.groupby(['jobname', 'block'], axis=0)
DataFrame= grouped.aggregate(lambda x: list(x))
However, when I run this in IPython, it gives me this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-221-97113b757fa1> in <module>()
----> 1 cassandraFrame_2 = grouped.aggregate(lambda x: list(x))
2 cassandraFrame_2
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
2867
2868 if self.grouper.nkeys > 1:
-> 2869 return self._python_agg_general(arg, *args, **kwargs)
2870 else:
2871
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _python_agg_general(self, func, *args, **kwargs)
1166 for name, obj in self._iterate_slices():
1167 try:
-> 1168 result, counts = self.grouper.agg_series(obj, f)
1169 output[name] = self._try_cast(result, obj)
1170 except TypeError:
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in agg_series(self, obj, func)
1633 return self._aggregate_series_fast(obj, func)
1634 except Exception:
-> 1635 return self._aggregate_series_pure_python(obj, func)
1636
1637 def _aggregate_series_fast(self, obj, func):
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _aggregate_series_pure_python(self, obj, func)
1667 if (isinstance(res, (Series, Index, np.ndarray)) or
1668 isinstance(res, list)):
-> 1669 raise ValueError('Function does not reduce')
1670 result = np.empty(ngroups, dtype='O')
1671
ValueError: Function does not reduce
Ultimately, I want to group rows that share the same jobname and block. The data column holds tuples (right now each one is a 3-item tuple), and after grouping I want a list of those tuples.
For example:
jobname block data
Complete-Test Simple_buff (tuple_1)
Complete-Test Simple_buff (tuple_2)
Aggregated:
jobname block data
Complete-Test Simple_buff [(tuple_1),(tuple_2)]
I could group by jobname alone, but that aggregates the different blocks together, and I want to keep the blocks separate.
Can someone point me in the right direction?
Thanks
Looks like there is an explicit check that the value returned by the aggregating function is not a Series, Index, np.ndarray, or a list. Your lambda returns a list, which is what triggers "Function does not reduce".
Returning a tuple instead gets past that check, so the following should work:
grouped = df.groupby(['jobname', 'block'])
aggregated = grouped.aggregate(lambda x: tuple(x))
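If you really need lists rather than tuples, here is a minimal sketch of one alternative, assuming made-up sample data (the tuple values and the second block name Simple_buff_2 are hypothetical, since the real DataFrame isn't shown in the question). It selects the data column and uses apply(list) instead of aggregate, so it does not go through the check in the traceback above:

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the question.
df = pd.DataFrame({
    'jobname': ['Complete-Test', 'Complete-Test', 'Complete-Test'],
    'block': ['Simple_buff', 'Simple_buff', 'Simple_buff_2'],
    'data': [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
})

# Group on both keys so that different blocks stay separate,
# then collect each group's tuples into a plain Python list.
aggregated = (
    df.groupby(['jobname', 'block'])['data']
      .apply(list)
      .reset_index()
)
print(aggregated)
```

Each (jobname, block) pair ends up as one row whose data cell is a list of the original tuples. I believe recent pandas versions also accept .agg(list) here, but I haven't checked that against the pandas version from the traceback.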