Percentage of total by group

Say I start with:

In [1]: import polars as pl

In [2]: df = pl.DataFrame({
    'group1': ['a', 'a', 'b', 'c', 'a', 'b'], 
    'group2': [0, 1, 1, 0, 1, 1]
})

In [3]: df
Out[3]:
shape: (6, 2)
┌────────┬────────┐
│ group1 ┆ group2 │
│ ---    ┆ ---    │
│ str    ┆ i64    │
╞════════╪════════╡
│ a      ┆ 0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ c      ┆ 0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      │
└────────┴────────┘

I'd like to get, for each group1, the distribution of group2.

My desired outcome is:

shape: (4, 4)
┌────────┬────────┬───────┬────────────┐
│ group1 ┆ group2 ┆ count ┆ percentage │
│ ---    ┆ ---    ┆ ---   ┆ ---        │
│ str    ┆ i64    ┆ u32   ┆ f64        │
╞════════╪════════╪═══════╪════════════╡
│ a      ┆ 0      ┆ 1     ┆ 0.333333   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 1      ┆ 2     ┆ 0.666667   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      ┆ 2     ┆ 1.0        │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c      ┆ 0      ┆ 1     ┆ 1.0        │
└────────┴────────┴───────┴────────────┘
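
Spelled out, each percentage is the (group1, group2) count divided by the number of rows in that group1. A plain-Python sketch of the arithmetic follows; it uses no polars, the rows are typed in by hand from the frame above, and the variable names are only illustrative:

```python
from collections import Counter

# Rows of the example frame as (group1, group2) pairs, copied by hand.
rows = [("a", 0), ("a", 1), ("b", 1), ("c", 0), ("a", 1), ("b", 1)]

pair_counts = Counter(rows)                    # rows per (group1, group2) pair
group_totals = Counter(g1 for g1, _ in rows)   # rows per group1

# One output row per pair: (group1, group2, count, count / group1 total).
table = [
    (g1, g2, n, n / group_totals[g1])
    for (g1, g2), n in sorted(pair_counts.items())
]

for row in table:
    print(row)
```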

Here's one way I've found to do it. Is there a more idiomatic way in polars?

counts = df.groupby(['group1', 'group2']).count()
counts.with_column(
    (
         counts['count']
         / counts.select(pl.col('count').sum().over('group1'))['count']
    ).alias('percentage')
).sort(['group1', 'group2'])

You are on the right path, but it is better to use expressions all the way and avoid constructing or accessing intermediate DataFrames.


(df.groupby(["group1", "group2"])
  .agg([
      pl.count()
  ])
).select([
    pl.all().exclude("count"),
    (pl.col("count") / pl.sum("count").over("group1")).alias("percentage")
])
