Pandas: groupby quantile with agg values
I'm trying to group numerical values by quantiles and create columns for the sum of the values falling into the quantile bands. Here's a simplified, reproducible example:
import pandas as pd
raw_data = {'female': [0, 1, 0, 1, 0, 1, 0, 1],
            'male': [1, 0, 1, 0, 1, 0, 1, 0],
            'number': [25000, 34000, 48600, 22000, 50000, 21000, 29000, 36000]}
df = pd.DataFrame(raw_data, columns=['female', 'male', 'number'])
df
female male number
0 0 1 25000
1 1 0 34000
2 0 1 48600
3 1 0 22000
4 0 1 50000
5 1 0 21000
6 0 1 29000
7 1 0 36000
Essentially I'm trying to achieve this:
pd.DataFrame(df['number'].quantile([.1, .2, .3, .4, .5]))
number
0.1 21700
0.2 23200
0.3 25400
0.4 28200
0.5 31500
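The numbers in that table come from pandas' default linear interpolation (`interpolation='linear'`): for a quantile q over n sorted values, the result sits at fractional position q·(n − 1) in the sorted data. A minimal sketch that recomputes the table by hand and compares it with `Series.quantile`; the helper `linear_quantile` is hypothetical and only for illustration:

```python
import pandas as pd

numbers = [25000, 34000, 48600, 22000, 50000, 21000, 29000, 36000]

def linear_quantile(values, q):
    """Hypothetical helper: linear-interpolation quantile, like pandas' default."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional index into the sorted data
    lo = int(pos)                   # index at or just below pos
    hi = min(lo + 1, len(s) - 1)    # index just above pos (clamped at the end)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

probs = [.1, .2, .3, .4, .5]
manual = [linear_quantile(numbers, q) for q in probs]
from_pandas = pd.Series(numbers).quantile(probs).tolist()
print(manual)
print(from_pandas)
```

For example, q = 0.5 lands at position 3.5, halfway between the 4th and 5th sorted values (29000 and 34000), which gives 31500.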
But with two new columns in this dataframe: one for the sum of males whose number falls into the corresponding quantile band, and one for the sum of females.
Initially I thought this would be a groupby with .quantile([values]) appended, followed by .agg({'male': 'sum', 'female': 'sum'}). That doesn't work, though. Can what I'm trying to achieve even be done?
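A short sketch of why that attempt falls short, assuming the same data: `groupby(...).quantile(...)` computes a quantile *within* each existing group and returns one statistic per group, so no row is ever assigned to a quantile band and there is nothing for `.agg` to sum per band. Grouping by `male` here is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'female': [0, 1, 0, 1, 0, 1, 0, 1],
                   'male':   [1, 0, 1, 0, 1, 0, 1, 0],
                   'number': [25000, 34000, 48600, 22000, 50000, 21000, 29000, 36000]})

# One median of 'number' per value of 'male' (0 and 1): a statistic per
# existing group, not a label saying which quantile band each row is in.
per_group_median = df.groupby('male')['number'].quantile(.5)
print(per_group_median)
```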
You want to use pd.qcut to create the groupings:
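# Cut 'number' at its 0, 10, 20, 30, 40, 50 and 100 % quantiles; label the six bands q0 ... q5
qs = pd.qcut(df.number, [0, .1, .2, .3, .4, .5, 1], labels=['q%d' % i for i in range(6)])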
qs
0 q2
1 q5
2 q5
3 q1
4 q5
5 q0
6 q4
7 q5
Name: number, dtype: category
Categories (6, object): [q0 < q1 < q2 < q3 < q4 < q5]
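To check that these bands line up with the quantile table in the question, `pd.qcut` can also return the bin edges it computed when called with `retbins=True`. A minimal sketch on the same data; the edges should come out at roughly 21000, 21700, 23200, 25400, 28200, 31500 and 50000, and the q3 band (25400, 28200] happens to contain no rows:

```python
import pandas as pd

df = pd.DataFrame({'female': [0, 1, 0, 1, 0, 1, 0, 1],
                   'male':   [1, 0, 1, 0, 1, 0, 1, 0],
                   'number': [25000, 34000, 48600, 22000, 50000, 21000, 29000, 36000]})

# retbins=True makes qcut return (categorical band labels, bin edges).
qs, edges = pd.qcut(df['number'], [0, .1, .2, .3, .4, .5, 1],
                    labels=[f'q{i}' for i in range(6)], retbins=True)

print(edges)                        # the quantile values used as cut points
print(qs.value_counts(sort=False))  # rows per band, empty bands included
```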
Then groupby and agg:
# observed=False keeps empty bands (q3 here) in the result, with sums of 0
df.groupby(qs, observed=False).agg({'male': 'sum', 'female': 'sum'})
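If you want the exact shape from the question (the quantile values with the male and female sums alongside), one option is to label each band by its upper quantile and join the per-band sums onto `df['number'].quantile(...)`. This is a sketch built on the answer's approach, not part of it; `probs`, `bands` and `sums` are illustrative names. The left join keeps only the 0.1 ... 0.5 rows, so the top band (50 % to 100 %) is dropped, as in the question's table:

```python
import pandas as pd

df = pd.DataFrame({'female': [0, 1, 0, 1, 0, 1, 0, 1],
                   'male':   [1, 0, 1, 0, 1, 0, 1, 0],
                   'number': [25000, 34000, 48600, 22000, 50000, 21000, 29000, 36000]})

probs = [.1, .2, .3, .4, .5]

# Label each band by its upper quantile so the index matches the question's
# table; the extra top band (0.5, 1] is labelled 1.0.
bands = pd.qcut(df['number'], [0] + probs + [1], labels=probs + [1.0])

# Sum per band; observed=False keeps bands that contain no rows (sum 0).
sums = df.groupby(bands, observed=False)[['male', 'female']].sum()
sums.index = sums.index.astype(float)  # CategoricalIndex -> plain float index

# Quantile values on the left, per-band sums joined alongside.
result = df['number'].quantile(probs).to_frame().join(sums)
print(result)
```

With this data the 0.4 row shows 0 in both columns, because no value falls between the 30 % and 40 % quantiles.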