How to calculate a weighted average in Python for each unique value in two columns?
The picture below shows a few lines of printed lists I have in Python. I would like to get: a list of unique values of boroughs, a corresponding list of unique values of years, and a list of weighted averages of "average" with "nobs" as weights, computed separately for each borough and each year (the variable "type" indicates whether there were one, two or three types in a specific year in a borough).
I know how to get a weighted average using the entire lists:
weighted_avg = np.average(average, weights=nobs)
But I don't know how to calculate one for each unique borough-year combination.
I'm new to Python, so please help if you know how to do it.
Assuming that the 'type' column doesn't affect your calculations, you can get the weighted average using groupby. Here's the data:
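import numpy as np
import pandas as pd

# Random example data, so the values will differ from run to run.
df = pd.DataFrame({'borough': ['b1', 'b2']*6, 'year': [2008, 2009, 2010, 2011]*3,
                   'average': np.random.randint(low=100, high=200, size=12),
                   'nobs': np.random.randint(low=1, high=40, size=12)})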
print(df):
   borough  year  average  nobs
0       b1  2008      166     1
1       b2  2009      177    35
2       b1  2010      114    27
3       b2  2011      187    18
4       b1  2008      193     2
5       b2  2009      105    27
6       b1  2010      114    36
7       b2  2011      144     3
8       b1  2008      114    39
9       b2  2009      157     6
10      b1  2010      133    17
11      b2  2011      176    12
We add a new column that is the product of the average and nobs columns, then divide its sum within each borough-year group by the sum of nobs in the same group:
df['average x nobs'] = df['average'] * df['nobs']
# Sum the products and the weights within each (borough, year) group, then divide.
sums = df.groupby(['borough', 'year'])[['average x nobs', 'nobs']].sum()
newdf = pd.DataFrame({'weighted average': sums['average x nobs'] / sums['nobs']})
print(newdf):
              weighted average
borough year
b1      2008        119.000000
        2010        118.037500
b2      2009        146.647059
        2011        179.090909
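Since the question already uses np.average and asks for plain lists, here is a minimal sketch of another way to get the same numbers. It is an alternative, not the method used in the answer above: it applies np.average to each borough-year group and then flattens the result into three parallel lists. The helper name weighted_mean and the names result, boroughs, years and weighted_avgs are made up for illustration, and the data is the printed table typed out by hand so the output is reproducible.

```python
import numpy as np
import pandas as pd

# Same rows as the printed DataFrame, entered explicitly instead of randomly.
df = pd.DataFrame({
    'borough': ['b1', 'b2'] * 6,
    'year': [2008, 2009, 2010, 2011] * 3,
    'average': [166, 177, 114, 187, 193, 105, 114, 144, 114, 157, 133, 176],
    'nobs': [1, 35, 27, 18, 2, 27, 36, 3, 39, 6, 17, 12],
})

def weighted_mean(group):
    # Hypothetical helper: weighted average of one borough-year group.
    return np.average(group['average'], weights=group['nobs'])

result = (
    df.groupby(['borough', 'year'])[['average', 'nobs']]
      .apply(weighted_mean)          # one scalar per (borough, year)
      .rename('weighted average')
      .reset_index()                 # turn the group keys back into columns
)

# Three parallel lists, one entry per unique borough-year combination.
boroughs = result['borough'].tolist()
years = result['year'].tolist()
weighted_avgs = result['weighted average'].tolist()

print(result)
```

With this data the three lists line up row by row, so boroughs[i], years[i] and weighted_avgs[i] describe the same borough-year combination. If the original data lives in plain Python lists, they can be passed to pd.DataFrame in the same way as the columns above.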