Pandas groupby with division

Question

I'm wondering how to aggregate data within a grouped pandas DataFrame using a function that takes into account the value stored in another column of the frame. This would be useful for operations where the order of the operands matters, such as division.

For example, I have:

In [8]: df
Out[8]: 
  class cat  xer
0     a   1    2
1     b   1    4
2     c   1    9
3     a   2    6
4     b   2    8
5     c   2    3

I want to group by class and, for each class, divide the xer value corresponding to cat == 1 by the one for cat == 2. In other words, the entries in the final output should be:

  class    div
0     a   0.33  (i.e. 2/6)
1     b    0.5  (i.e. 4/8)
2     c      3  (i.e. 9/3)

Is this possible using groupby? I can't quite figure out how to do it without manually iterating through each class, and even then it's not clean or fun.

Answer 1

Without doing anything too clever:

In [11]: one = df[df["cat"] == 1].set_index("class")["xer"]

In [12]: two = df[df["cat"] == 2].set_index("class")["xer"]

In [13]: one / two
Out[13]:
class
a    0.333333
b    0.500000
c    3.000000
Name: xer, dtype: float64
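
The division above aligns the two Series on their class index, so a class that is missing from either side comes out as NaN instead of raising. A minimal sketch of that behaviour, assuming the frame from the question is rebuilt by hand; the extra row for class d is hypothetical and added only to show the alignment:

```python
import pandas as pd

# Data from the question, plus a hypothetical class "d" that only has cat == 1.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c", "d"],
    "cat":   [1, 1, 1, 2, 2, 2, 1],
    "xer":   [2, 4, 9, 6, 8, 3, 5],
})

# One Series per cat value, both indexed by class.
one = df[df["cat"] == 1].set_index("class")["xer"]
two = df[df["cat"] == 2].set_index("class")["xer"]

# Division matches rows by index label, not by position.
ratio = one / two
print(ratio)
```

If incomplete classes should be dropped rather than kept as NaN, `ratio.dropna()` would do that.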

Answer 2

Given your DataFrame, you can use the following:

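from functools import reduce  # reduce is no longer a builtin on Python 3
import numpy as np  # the pd.np alias is not available in recent pandas
df.groupby('class').agg({'xer': lambda L: reduce(np.divide, L)})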

Which gives you:

            xer
class          
a      0.333333
b      0.500000
c      3.000000

This also handles more than two rows per group if need be (the values are divided left to right), but you should make sure your df is sorted by cat first so that the values appear in the right order.
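
The sorting caveat can be sketched as follows. This is only an illustration, assuming the question's data arrives with the cat == 2 rows first; `operator.truediv` stands in for the divide function, and the name `result` is introduced here:

```python
from functools import reduce
import operator

import pandas as pd

# Question's data, deliberately ordered so that the cat == 2 rows come first.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat":   [2, 2, 2, 1, 1, 1],
    "xer":   [6, 8, 3, 2, 4, 9],
})

# Sort by cat so each group sees its cat == 1 value before its cat == 2 value,
# then divide the values of each group from left to right.
result = (
    df.sort_values("cat")
      .groupby("class")["xer"]
      .agg(lambda s: reduce(operator.truediv, s))
)
print(result)
```

Without the `sort_values` call, the same reduction on this frame would compute 6/2, 8/4 and 3/9 instead.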

Answer 3

This is one approach, step by step:

# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]

which yields:

  class       div
0     a  0.333333
1     b  0.500000
2     c  3.000000

Answer 4

You may want to rearrange your data to make it easier to view:

df2 = df.set_index(['class', 'cat']).unstack()

>>> df2
       xer   
cat      1  2
class        
a        2  6
b        4  8
c        9  3

You can then do the following to get your desired result:

>>> df2.iloc[:,0].div(df2.iloc[:, 1])

class
a        0.333333
b        0.500000
c        3.000000
Name: (xer, 1), dtype: float64
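
A variation on the same reshaping idea, offered as a sketch rather than as part of the original answer: `DataFrame.pivot` gives flat column labels 1 and 2, so the columns can be selected by cat value instead of by position, and the result can be shaped like the table requested in the question. The names `wide` and `div` are introduced here for illustration:

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat":   [1, 1, 1, 2, 2, 2],
    "xer":   [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value.
wide = df.pivot(index="class", columns="cat", values="xer")

# Select columns by their cat label, divide, and return class as a column.
div = (wide[1] / wide[2]).rename("div").reset_index()
print(div)
```

Selecting by label keeps the numerator and denominator explicit even if the columns end up in a different order.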
