How to calculate a weighted average in Python for each unique combination of values in two columns?

Question

The picture below shows a few lines of printed lists I have in Python. I would like to get: a list of unique values of boroughs, a corresponding list of unique values of years, and a list of weighted averages of "averages" with "nobs" as weights, computed separately for each borough and each year (the variable "type" indicates whether there were one, two, or three types in a specific year in a borough).

I know how to get a weighted average using the entire lists:

import numpy as np
weighted_avg = np.average(average, weights=nobs)
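
For reference, np.average with a weights argument computes sum(average * nobs) / sum(nobs). A small sanity check follows, using the three borough b1 / year 2008 rows from the sample table in the answer; the arrays are illustrative stand-ins, not the question's own lists:

```python
import numpy as np

# Illustrative values: the b1 / 2008 rows from the answer's sample table
average = np.array([166, 193, 114])
nobs = np.array([1, 2, 39])

weighted_avg = np.average(average, weights=nobs)
manual = (average * nobs).sum() / nobs.sum()  # sum(a_i * w_i) / sum(w_i)

print(weighted_avg, manual)  # 119.0 119.0
```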

But I don't know how to calculate one for each unique borough-year combination.

I'm new to Python; please help if you know how to do it.

Answer 1

Assuming that the 'type' column doesn't affect your calculations, you can get the weighted averages using groupby. Here's the data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'borough': ['b1', 'b2']*6, 'year': [2008, 2009, 2010, 2011]*3,
                   'average': np.random.randint(low=100, high=200, size=12),
                   'nobs': np.random.randint(low=1, high=40, size=12)})
print(df)
# Output (the values are random, so a rerun will differ):
   borough  year  average  nobs
0       b1  2008      166     1
1       b2  2009      177    35
2       b1  2010      114    27
3       b2  2011      187    18
4       b1  2008      193     2
5       b2  2009      105    27
6       b1  2010      114    36
7       b2  2011      144     3
8       b1  2008      114    39
9       b2  2009      157     6
10      b1  2010      133    17
11      b2  2011      176    12
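
In formula terms, for each borough b and year y the target is the nobs-weighted mean of the "average" values over the rows in that group, where G(b, y) denotes the set of rows with borough b and year y. The second line works it through for b1 / 2008 using the table above:

```latex
\bar{a}_{b,y} = \frac{\sum_{i \in G(b,y)} \mathrm{average}_i \cdot \mathrm{nobs}_i}{\sum_{i \in G(b,y)} \mathrm{nobs}_i}
\qquad
\bar{a}_{\mathrm{b1},2008} = \frac{166 \cdot 1 + 193 \cdot 2 + 114 \cdot 39}{1 + 2 + 39} = \frac{4998}{42} = 119
```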

We add a new column that is the product of the average and nobs columns, then divide each group's sum of that column by the group's sum of nobs:

df['average x nobs'] = df['average'] * df['nobs']
# Sum both columns within each (borough, year) group, then divide
sums = df.groupby(['borough', 'year'])[['average x nobs', 'nobs']].sum()
newdf = pd.DataFrame({'weighted average': sums['average x nobs'] / sums['nobs']})
print(newdf)
# Output:
              weighted average
borough year                  
b1      2008        119.000000
        2010        118.037500
b2      2009        146.647059
        2011        179.090909
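
An alternative sketch (not part of the original answer) passes each group straight to np.average, the function the question already uses, and then extracts the three parallel lists the question asks for. The data values are copied from the printed df above so the result can be checked against the table; the names weighted, boroughs, years and weighted_avgs are illustrative:

```python
import numpy as np
import pandas as pd

# Fixed copy of the sample data printed above, so the result is reproducible
df = pd.DataFrame({
    'borough': ['b1', 'b2'] * 6,
    'year': [2008, 2009, 2010, 2011] * 3,
    'average': [166, 177, 114, 187, 193, 105, 114, 144, 114, 157, 133, 176],
    'nobs': [1, 35, 27, 18, 2, 27, 36, 3, 39, 6, 17, 12],
})

# Weighted average of 'average' per (borough, year), with 'nobs' as weights
weighted = (
    df.groupby(['borough', 'year'])[['average', 'nobs']]
      .apply(lambda g: np.average(g['average'], weights=g['nobs']))
      .rename('weighted average')
      .reset_index()
)

# The three parallel lists: one entry per unique (borough, year) pair
boroughs = weighted['borough'].tolist()
years = weighted['year'].tolist()
weighted_avgs = weighted['weighted average'].tolist()
print(boroughs, years, weighted_avgs)
```

This avoids adding a helper column to df, but apply calls a Python function once per group, so on large frames the vectorized sum-and-divide approach above is usually faster.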

Answer 1 (accepted, 2020-07-15 13:45:11)
