How to perform distinct average in Pandas Groupby in Python?
I have data like this:
Input
>>> import pandas as pd
>>> df.head(8)
date id count
01.02.2020 a 5
01.02.2020 b 10
02.02.2020 a 6
02.02.2020 b 11
03.02.2020 a 9
03.02.2020 a 13
03.02.2020 b 3
03.02.2020 b 5
...
Desired Output
date distinctAverage
01.02.2020 7.5
02.02.2020 8.5
03.02.2020 15 # (9+13+3+5)/2, because 2 distinct entries out of 4 entries
...
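For anyone who wants to reproduce this, here is a minimal sketch that rebuilds only the eight rows shown above (the rows hidden behind "..." are not included) and checks the arithmetic for 03.02.2020. The variable names day and distinct_avg are made up for illustration.

```python
import pandas as pd

# Only the eight rows displayed by df.head(8); the real data has more rows.
df = pd.DataFrame({
    "date": ["01.02.2020", "01.02.2020", "02.02.2020", "02.02.2020",
             "03.02.2020", "03.02.2020", "03.02.2020", "03.02.2020"],
    "id": ["a", "b", "a", "b", "a", "a", "b", "b"],
    "count": [5, 10, 6, 11, 9, 13, 3, 5],
})

# Distinct average for one date: sum of "count" over the number of unique ids.
day = df[df["date"] == "03.02.2020"]
distinct_avg = day["count"].sum() / day["id"].nunique()  # (9+13+3+5)/2
print(distinct_avg)
```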
Function
I want to compute a distinct average of "count" in a groupby expression: the sum of "count" in each group divided by the number of unique IDs in that group. I group the data like this:
df.groupby(
["date"]
).agg(
#sumCount=("count", "sum"), # works!
#countUniqueIDs=("id", lambda x: x.nunique()), # works!
distinctAverage=("count", lambda x, y=df["id"]: x.sum() / y.nunique()), # Doesn't work!
distinctAverage2=("count", "mean") # Doesn't work, takes 4 as the denominator at 03.02.2020
).reset_index()
Any idea on how to accomplish a distinct average?
EDIT: Answer: The distinctAverage as mentioned above works just fine for the sample data. In a bigger dataset that can't be displayed here it doesn't work (for whatever reason), and there is a workaround: after using the groupby and aggregating "sumCount" and "countUniqueIDs", add another line after the groupby:
df["workaroundDistinctAverage"] = df["sumCount"] / df["countUniqueIDs"]
Not very elegant, but easier to understand than the accepted answer.
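One possible explanation, offered as an assumption since the bigger dataset is not shown: the default argument y=df["id"] is bound once to the whole "id" column, so y.nunique() counts the unique IDs of the entire frame rather than of each date. In the sample every date contains both IDs, so the two counts agree. The sketch below uses made-up data in which one date contains a single ID, and compares the lambda with the two-step workaround; the names global_denominator and per_group are illustrative only.

```python
import pandas as pd

# Hypothetical data: on 02.02.2020 only id "a" appears.
df = pd.DataFrame({
    "date": ["01.02.2020", "01.02.2020", "02.02.2020", "02.02.2020"],
    "id": ["a", "b", "a", "a"],
    "count": [5, 10, 6, 11],
})

# y is the full "id" column, so the denominator is 2 for every date.
global_denominator = df.groupby("date").agg(
    distinctAverage=("count", lambda x, y=df["id"]: x.sum() / y.nunique())
)

# Two-step workaround: aggregate per group first, then divide the columns.
per_group = df.groupby("date").agg(
    sumCount=("count", "sum"),
    countUniqueIDs=("id", "nunique"),
)
per_group["workaroundDistinctAverage"] = (
    per_group["sumCount"] / per_group["countUniqueIDs"]
)

print(global_denominator)
print(per_group)
```

On 02.02.2020 the lambda divides 17 by 2, while the workaround divides 17 by 1.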
Save the .groupby() return value in a variable, then compute what you need with .sum() and .nunique():
grouper = df.groupby(['date'])
(
(grouper['count'].sum() / grouper['id'].nunique())
.reset_index(name = 'distinctAverage')
)
#output:
date distinctAverage
0 01.02.2020 7.5
1 02.02.2020 8.5
2 03.02.2020 15.0
This works just fine for the sample data!
df.groupby(["date"]).agg(
distinctAverage=("count", lambda x, y=df["id"]: float(x.sum()/ y.nunique()))
)
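Note that the lambda above takes its denominator from the whole df["id"] column, not from the group being aggregated. As a hedged alternative (a sketch, not the answerer's method), groupby(...).apply can be given both columns so that each date counts its own unique IDs. The data below is just the eight sample rows from the question.

```python
import pandas as pd

# The eight sample rows from the question.
df = pd.DataFrame({
    "date": ["01.02.2020", "01.02.2020", "02.02.2020", "02.02.2020",
             "03.02.2020", "03.02.2020", "03.02.2020", "03.02.2020"],
    "id": ["a", "b", "a", "b", "a", "a", "b", "b"],
    "count": [5, 10, 6, 11, 9, 13, 3, 5],
})

# Each group g is a sub-frame holding only "id" and "count" for one date,
# so g["id"].nunique() is the per-date number of distinct ids.
result = (
    df.groupby("date")[["id", "count"]]
    .apply(lambda g: g["count"].sum() / g["id"].nunique())
    .reset_index(name="distinctAverage")
)
print(result)
```

This is usually slower than dividing two vectorized aggregations, so the grouper-based answer is probably the better choice for large frames.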