How to create a Pandas crosstab with a custom aggregate function that combines two variables?
Suppose I have four variables A, B, C, and D. I want the crosstab to look like this:
+-------+-----------------+-----------------+
| A/B | Cat1 | Cat2 |
+-------+-----------------+-----------------+
| Cat-a | Sum(C)/Count(D) | Sum(C)/Count(D) |
| Cat-b | Sum(C)/Count(D) | Sum(C)/Count(D) |
+-------+-----------------+-----------------+
For example, my data looks like this:
+------+--------+--------+--------+
| Type | Gender | Height | Weight |
+------+--------+--------+--------+
| Dog | F | 80 | 60 |
| Dog | F | 75 | 57 |
| Dog | M | 90 | 68 |
| Cat | F | 50 | 50 |
| Cat | F | 53 | 53 |
| Cat | M | 56 | 55 |
| Cat | M | 60 | 54 |
| Cat | M | 65 | 60 |
+------+--------+--------+--------+
Now suppose I want the aggregate to be sum(Weight)/max(Height). The crosstab would look something like this:
+-------------+------------+------------+
| Type/Gender | M | F |
+-------------+------------+------------+
| Cat | 169/65=2.6 | 103/53=1.9 |
| Dog | 68/90=0.75 | 117/80=1.4 |
+-------------+------------+------------+
First aggregate with GroupBy.agg, taking the max of Height and the sum of Weight for each Type/Gender pair. Then create a new column with DataFrame.assign by dividing the two, select that column, and reshape it with Series.unstack. Finally, clean up the result with DataFrame.reset_index and DataFrame.rename_axis:
df1 = (df.groupby(['Type', 'Gender'])
         .agg({'Height': 'max', 'Weight': 'sum'})
         .assign(New=lambda x: x.Weight / x.Height)['New']
         .unstack()
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)

  Type         F         M
0  Cat  1.943396  2.600000
1  Dog  1.462500  0.755556
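
As a possible alternative, the same table can be built by dividing two crosstabs, one for each half of the ratio. This is only a sketch: the DataFrame is rebuilt from the sample data in the question, and ratio_crosstab is a hypothetical helper name chosen for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Type":   ["Dog", "Dog", "Dog", "Cat", "Cat", "Cat", "Cat", "Cat"],
    "Gender": ["F", "F", "M", "F", "F", "M", "M", "M"],
    "Height": [80, 75, 90, 50, 53, 56, 60, 65],
    "Weight": [60, 57, 68, 50, 53, 55, 54, 60],
})


def ratio_crosstab(data, row, col, num, num_agg, den, den_agg):
    """Hypothetical helper: crosstab of num_agg(num) / den_agg(den)."""
    numerator = pd.crosstab(data[row], data[col],
                            values=data[num], aggfunc=num_agg)
    denominator = pd.crosstab(data[row], data[col],
                              values=data[den], aggfunc=den_agg)
    # Both tables carry the same row and column labels,
    # so the division lines up cell by cell.
    return numerator / denominator


# sum(Weight) / max(Height) for every Type/Gender combination.
result = ratio_crosstab(df, "Type", "Gender", "Weight", "sum", "Height", "max")
print(result.round(2))
```

The result keeps Type as the index and Gender as the columns. Any Type/Gender combination that has no rows would come out as NaN rather than raising an error.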