Percentage of total by group

Question

Say I start with:

In [1]: import polars as pl

In [2]: df = pl.DataFrame({
    'group1': ['a', 'a', 'b', 'c', 'a', 'b'], 
    'group2': [0, 1, 1, 0, 1, 1]
})

In [3]: df
Out[3]:
shape: (6, 2)
┌────────┬────────┐
│ group1 ┆ group2 │
│ ---    ┆ ---    │
│ str    ┆ i64    │
╞════════╪════════╡
│ a      ┆ 0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ c      ┆ 0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      │
└────────┴────────┘

I'd like to get, for each group1, the distribution of group2.

My desired outcome is:

shape: (4, 4)
┌────────┬────────┬───────┬────────────┐
│ group1 ┆ group2 ┆ count ┆ percentage │
│ ---    ┆ ---    ┆ ---   ┆ ---        │
│ str    ┆ i64    ┆ u32   ┆ f64        │
╞════════╪════════╪═══════╪════════════╡
│ a      ┆ 0      ┆ 1     ┆ 0.333333   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      ┆ 2     ┆ 0.666667   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      ┆ 2     ┆ 1.0        │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c      ┆ 0      ┆ 1     ┆ 1.0        │
└────────┴────────┴───────┴────────────┘
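
To make the target numbers concrete, here is a small library-free sketch of the same arithmetic: count each (group1, group2) pair, then divide by the number of rows in that group1. The variable names (`pair_counts`, `group_totals`, `rows`) are illustrative only and not part of any polars API.

```python
from collections import Counter

group1 = ['a', 'a', 'b', 'c', 'a', 'b']
group2 = [0, 1, 1, 0, 1, 1]

# Count each (group1, group2) pair and each group1 value.
pair_counts = Counter(zip(group1, group2))
group_totals = Counter(group1)

# One row per pair: (group1, group2, count, share of that group1's rows).
rows = [
    (g1, g2, n, n / group_totals[g1])
    for (g1, g2), n in sorted(pair_counts.items())
]

for row in rows:
    print(row)
```

Each share is the pair count divided by the group1 total, so the shares within one group1 sum to 1.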

Here's one way I've found to do it - is there a more idiomatic way in polars?

counts = df.groupby(['group1', 'group2']).count()
counts.with_column(
    (
         counts['count']
         / counts.select(pl.col('count').sum().over('group1'))['count']
    ).alias('percentage')
).sort(['group1', 'group2'])

Answer 1

You are on the right path, but it is better to use expressions all the way and avoid constructing or accessing intermediate dataframes.


(df.groupby(["group1", "group2"])
  .agg([
      pl.count()
  ])
).select([
    pl.all().exclude("count"),
    (pl.col("count") / pl.sum("count").over("group1")).alias("percentage")
])

Solution 1
2 Accepted 2022-12-11 15:56:23
