How do I compute a distinct average in a Pandas groupby in Python?

Question

I have data like this:

Input

>>> import pandas as pd
>>> df.head(8)
date          id       count
01.02.2020    a        5
01.02.2020    b        10
02.02.2020    a        6
02.02.2020    b        11
03.02.2020    a        9
03.02.2020    a        13
03.02.2020    b        3
03.02.2020    b        5
...
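
To make the example reproducible, the eight rows shown above can be rebuilt as a DataFrame. This is only a sketch of the visible sample: the real dataset is larger (the listing ends in "..."), and keeping the dates as plain strings is an assumption.

```python
import pandas as pd

# Only the eight rows visible in the question; the full dataset is not shown.
df = pd.DataFrame(
    {
        # Dates are kept as strings here (an assumption about the real dtype).
        "date": [
            "01.02.2020", "01.02.2020",
            "02.02.2020", "02.02.2020",
            "03.02.2020", "03.02.2020", "03.02.2020", "03.02.2020",
        ],
        "id": ["a", "b", "a", "b", "a", "a", "b", "b"],
        "count": [5, 10, 6, 11, 9, 13, 3, 5],
    }
)

print(df.head(8))
```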

Desired Output

date          distinctAverage
01.02.2020    7.5
02.02.2020    8.5
03.02.2020    15         # (9+13+3+5)/2, because 2 distinct entries out of 4 entries
...

Function

I want to compute the average of "count" per distinct ID within a groupby expression, that is, the sum of "count" divided by the number of unique IDs in each group. I group the data like this:

df.groupby(
    ["date"]
    ).agg(
        #sumCount=("count", "sum"), # works!
        #countUniqueIDs=("id", lambda x: x.nunique()),  # works!
        distinctAverage=("count", lambda x, y=df["id"]: x.sum() / y.nunique()), # Doesn't work!
        distinctAverage2=("count", "mean") # Doesn't work, takes 4 as the denominator at 03.02.2020
    ).reset_index()

Any idea on how to accomplish a distinct average?

EDIT: The distinctAverage shown above works just fine for the sample data. On a bigger dataset that can't be displayed here it doesn't work (for whatever reason), but there is a workaround: after running the groupby and aggregating "sumCount" and "countUniqueIDs", add another line after the groupby: df["workaroundDistinctAverage"] = df["sumCount"] / df["countUniqueIDs"]. Not very elegant, but easier to understand than the accepted answer.
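
The workaround described in the edit could look roughly like the sketch below. It uses the sample rows from the question; the variable name `result` is an assumption (the edit reuses `df` for the aggregated frame), and `"nunique"` is passed as a string instead of the lambda from the question.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame(
    {
        "date": ["01.02.2020", "01.02.2020", "02.02.2020", "02.02.2020",
                 "03.02.2020", "03.02.2020", "03.02.2020", "03.02.2020"],
        "id": ["a", "b", "a", "b", "a", "a", "b", "b"],
        "count": [5, 10, 6, 11, 9, 13, 3, 5],
    }
)

# Step 1: aggregate the two ingredients per date.
result = (
    df.groupby("date")
    .agg(
        sumCount=("count", "sum"),
        countUniqueIDs=("id", "nunique"),
    )
    .reset_index()
)

# Step 2: divide the per-date sum by the per-date number of unique IDs.
result["workaroundDistinctAverage"] = result["sumCount"] / result["countUniqueIDs"]

print(result)
```

Both columns are computed per group before the division, so the divisor always reflects the IDs that actually occur on each date.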

Answer 1

Save the return value of .groupby() in a variable, then compute what you need with .sum() and .nunique():

grouper = df.groupby(['date'])

(
  (grouper['count'].sum() / grouper['id'].nunique())
  .reset_index(name = 'distinctAverage')
)
#output:
    date        distinctAverage
0   01.02.2020  7.5
1   02.02.2020  8.5
2   03.02.2020  15.0

Answer 2

This works just fine!

df.groupby(["date"]).agg(
        distinctAverage=("count", lambda x, y=df["id"]: float(x.sum()/ y.nunique()))
        )
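
One caveat with this lambda: the default argument `y=df["id"]` is bound to the whole "id" column, not to the IDs of the current group, so the divisor is the number of unique IDs in the entire frame. That matches the sample data only because every date contains both IDs, and it may be the reason the approach breaks on the larger dataset mentioned in the question. The sketch below uses hypothetical data in which one date lacks an ID, and compares the original lambda with a per-group variant that looks up the group's IDs through the shared index (this assumes the index of `df` is unique).

```python
import pandas as pd

# Hypothetical data: id "b" does not occur on 02.02.2020.
df = pd.DataFrame(
    {
        "date": ["01.02.2020", "01.02.2020", "02.02.2020", "02.02.2020"],
        "id": ["a", "b", "a", "a"],
        "count": [5, 10, 6, 8],
    }
)

# Lambda from the answer: y is the full "id" column, so the divisor is
# the number of unique IDs in the whole frame for every group.
global_divisor = df.groupby("date").agg(
    distinctAverage=("count", lambda x, y=df["id"]: x.sum() / y.nunique())
)

# Per-group variant: x keeps the original row labels, so df.loc[x.index, "id"]
# selects only the IDs that belong to the current group.
per_group = df.groupby("date").agg(
    distinctAverage=("count", lambda x: x.sum() / df.loc[x.index, "id"].nunique())
)

print(global_divisor)
print(per_group)
```

On 02.02.2020 the first version divides 14 by 2, while the per-group version divides 14 by 1, which is the distinct average the question asks for.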

Answer 1: score 2, accepted, 2020-08-03 15:17:24

Answer 2: score 1, 2020-08-03 15:02:04
