Pandas group by aggregate using division
I'm wondering how to aggregate data within a grouped pandas DataFrame using a function that takes into account the value stored in another column of the DataFrame. This would be useful for operations where the order of the operands matters, such as division.
For example, I have:
In [8]: df
Out[8]:
class cat xer
0 a 1 2
1 b 1 4
2 c 1 9
3 a 2 6
4 b 2 8
5 c 2 3
I want to group by class and, for each class, divide the xer value corresponding to cat == 1 by the xer value for cat == 2. In other words, the entries in the final output should be:
class div
0 a 0.33 (i.e. 2/6)
1 b 0.5 (i.e. 4/8)
2 c 3 (i.e. 9/3)
Is this possible to do using groupby? I can't quite figure out how to do it without manually iterating through each class, and even then it's not clean or fun.
Without doing anything too clever:
In [11]: one = df[df["cat"] == 1].set_index("class")["xer"]
In [12]: two = df[df["cat"] == 2].set_index("class")["xer"]
In [13]: one / two
Out[13]:
class
a 0.333333
b 0.500000
c 3.000000
Name: xer, dtype: float64
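The division works because both Series are indexed by class, so pandas pairs the values by label rather than by row position. Below is a self-contained sketch that rebuilds the question's DataFrame; the `partial` line is a hypothetical extra case (not from the question) showing that a class with no cat == 2 row comes out as NaN instead of being paired with the wrong value.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat":   [1, 1, 1, 2, 2, 2],
    "xer":   [2, 4, 9, 6, 8, 3],
})

# One Series per cat value, each indexed by class
one = df[df["cat"] == 1].set_index("class")["xer"]
two = df[df["cat"] == 2].set_index("class")["xer"]

# Division aligns on the class index, not on row position
ratio = one / two

# Hypothetical case: drop class "c" from the denominator.
# Alignment yields NaN for "c" rather than raising or mispairing rows.
partial = one / two.drop("c")

print(ratio)
print(partial)
```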
Given your DataFrame, you can use the following:
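from functools import reduce  # reduce is not a builtin in Python 3; pd.np was removed in pandas 2.0
df.groupby('class').agg({'xer': lambda s: reduce(lambda a, b: a / b, s)})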
Which gives you:
xer
class
a 0.333333
b 0.500000
c 3.000000
This also handles more than two rows per group (if need be), computing x1 / x2 / x3 / ... from left to right, but you should sort df by cat first so that the values appear in the right order.
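To see why the sort matters, here is a minimal sketch on hypothetical data (not from the question) where the rows are out of cat order and class a has three rows. It assumes cat values are unique within each class. `reduce` folds left to right, so a group whose values in cat order are x1, x2, x3 becomes (x1 / x2) / x3.

```python
from functools import reduce

import pandas as pd

# Hypothetical data: rows deliberately out of cat order,
# and class "a" has three rows to show the >2-per-group case
df = pd.DataFrame({
    "class": ["a", "b", "a", "b", "a"],
    "cat":   [2, 2, 1, 1, 3],
    "xer":   [6.0, 8.0, 24.0, 4.0, 2.0],
})


def chained_divide(s):
    """Fold a group's values left to right: ((x1 / x2) / x3) / ..."""
    return reduce(lambda a, b: a / b, s)


# groupby keeps the existing row order inside each group,
# so sorting by cat first puts the numerator first
result = df.sort_values("cat").groupby("class")["xer"].agg(chained_divide)

# Without the sort, row order decides which value is the numerator
unsorted = df.groupby("class")["xer"].agg(chained_divide)

print(result)    # a: 24 / 6 / 2, b: 4 / 8
print(unsorted)  # a: 6 / 24 / 2, b: 8 / 4
```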
This is one approach, step by step:
# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]
which yields:
class div
0 a 0.333333
1 b 0.500000
2 c 3.000000
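A self-contained sketch of the same merge, rebuilt from the question's data. Passing `suffixes=("_1", "_2")` to `merge` is an optional tweak (not in the original answer) that names the overlapping columns after the cat value instead of the default `_x`/`_y`.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat":   [1, 1, 1, 2, 2, 2],
    "xer":   [2, 4, 9, 6, 8, 3],
})

# Pair each class's cat == 1 row with its cat == 2 row.
# Overlapping columns become cat_1, xer_1, cat_2, xer_2.
paired = df[df["cat"] == 1].merge(
    df[df["cat"] == 2], on="class", suffixes=("_1", "_2")
)

# Divide the cat == 1 value by the cat == 2 value
paired["div"] = paired["xer_1"] / paired["xer_2"]
out = paired[["class", "div"]]
print(out)
```

Because `merge` defaults to an inner join, a class missing either its cat == 1 or its cat == 2 row is dropped from the result rather than reported as NaN.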
You may want to rearrange your data to make it easier to view:
>>> df2 = df.set_index(['class', 'cat']).unstack()
>>> df2
xer
cat 1 2
class
a 2 6
b 4 8
c 9 3
You can then do the following to get your desired result:
>>> df2.iloc[:,0].div(df2.iloc[:, 1])
class
a 0.333333
b 0.500000
c 3.000000
Name: (xer, 1), dtype: float64
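`DataFrame.pivot` is a swapped-in alternative (not used in the original answer) that produces the same wide layout without the extra xer column level, which lets you select columns by cat label instead of by position. A minimal sketch on the question's data:

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat":   [1, 1, 1, 2, 2, 2],
    "xer":   [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value
wide = df.pivot(index="class", columns="cat", values="xer")

# Select by cat label, so the result does not depend on column order
div = wide[1] / wide[2]
print(div)
```

`pivot` raises a ValueError if any (class, cat) pair occurs more than once; `pivot_table` with an explicit `aggfunc` is the usual fallback in that case.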