
How to calculate a weighted average in Python for each unique value in two columns?

The picture below shows a few lines of the lists I printed in Python. I would like to get a list of boroughs and a corresponding list of years (one entry per unique borough-year pair), plus a list of the weighted averages of "average", using "nobs" as the weights, for each of those borough-year pairs. The variable "type" indicates whether a borough had one, two or three types in a given year.

I know how to get a weighted average over the entire lists:

import numpy as np
weighted_avg = np.average(average, weights=nobs)
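To make clear what np.average computes when weights are given, here is a small sketch with a few hypothetical values (not the asker's data): it returns sum(average_i * nobs_i) / sum(nobs_i), which is exactly the quantity that has to be computed separately for each borough-year pair.

```python
import numpy as np

# Hypothetical example values, not the asker's data
average = [166, 193, 114]
nobs = [1, 2, 39]

weighted_avg = np.average(average, weights=nobs)

# The same quantity written out: sum(average_i * nobs_i) / sum(nobs_i)
manual = sum(a * n for a, n in zip(average, nobs)) / sum(nobs)

print(weighted_avg, manual)  # both are 4998 / 42 = 119.0
```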

But I don't know how to calculate one for each unique borough-year pair.
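Written as a formula, and assuming a_i denotes the "average" value and n_i the "nobs" value of row i, the quantity wanted for a borough b and year y is the weighted mean over only the rows G_{b,y} belonging to that pair:

```latex
\bar{a}_{b,y} = \frac{\sum_{i \in G_{b,y}} n_i \, a_i}{\sum_{i \in G_{b,y}} n_i}
```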

[Image: a few lines of the printed lists]

I'm new to Python, so any help would be appreciated.
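Since the question starts from plain lists rather than a DataFrame, here is a minimal sketch without pandas. It assumes four equal-length lists of boroughs, years, averages and nobs; the helper name weighted_avg_by_group is hypothetical. It accumulates sum(average * nobs) and sum(nobs) per (borough, year) key in a dictionary and divides at the end.

```python
from collections import defaultdict


def weighted_avg_by_group(boroughs, years, averages, nobs):
    """Hypothetical helper: weighted mean of `averages` (weights `nobs`)
    for each (borough, year) pair.

    Returns three parallel lists: boroughs, years, weighted averages.
    Raises ZeroDivisionError if a pair's weights sum to zero.
    """
    num = defaultdict(float)  # running sum of average * nobs per pair
    den = defaultdict(float)  # running sum of nobs per pair
    for b, y, a, n in zip(boroughs, years, averages, nobs):
        num[(b, y)] += a * n
        den[(b, y)] += n
    keys = sorted(num)  # sort the pairs so the output order is deterministic
    return (
        [b for b, _ in keys],
        [y for _, y in keys],
        [num[k] / den[k] for k in keys],
    )


# Hypothetical example lists (the 'type' list is not needed here)
result = weighted_avg_by_group(
    ['b1', 'b1', 'b2'], [2008, 2008, 2009], [166, 193, 177], [1, 2, 35]
)
print(result)  # (['b1', 'b2'], [2008, 2009], [184.0, 177.0])
```

A dictionary keyed by the (borough, year) tuple is what makes this a per-group computation: every row only touches the running sums of its own pair.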

Assuming the 'type' column doesn't affect your calculation, you can get the weighted average with groupby. Here's some sample data (random values, so your output will differ):

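import numpy as np
import pandas as pd

# Random sample data: 12 rows, each borough-year pair appears three times
df = pd.DataFrame({'borough': ['b1', 'b2']*6, 'year': [2008, 2009, 2010, 2011]*3,
          'average': np.random.randint(low=100, high=200, size=12),
          'nobs': np.random.randint(low=1, high=40, size=12)})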
print(df)
   borough  year  average  nobs
0       b1  2008      166     1
1       b2  2009      177    35
2       b1  2010      114    27
3       b2  2011      187    18
4       b1  2008      193     2
5       b2  2009      105    27
6       b1  2010      114    36
7       b2  2011      144     3
8       b1  2008      114    39
9       b2  2009      157     6
10      b1  2010      133    17
11      b2  2011      176    12

We add a new column holding the product of the average and nobs columns, then divide its sum within each borough-year group by that group's total nobs:

df['average x nobs'] = df['average'] * df['nobs']
# Sum both columns per (borough, year) group once, then divide
sums = df.groupby(['borough', 'year'])[['average x nobs', 'nobs']].sum()
newdf = pd.DataFrame({'weighted average': sums['average x nobs'] / sums['nobs']})
print(newdf)
              weighted average
borough year                  
b1      2008        119.000000
        2010        118.037500
b2      2009        146.647059
        2011        179.090909
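If you want the result as the parallel lists asked for in the question rather than a DataFrame, one alternative is a minimal sketch that calls np.average once per group via groupby(...).apply and reads the group keys back out of the resulting MultiIndex. This swaps in apply for the sum-and-divide approach above; it is typically slower with many groups because the lambda runs in Python once per group. The data is the sample table printed above, hard-coded so the output is reproducible.

```python
import numpy as np
import pandas as pd

# The twelve rows from the printed sample table above (fixed, not random)
df = pd.DataFrame({
    'borough': ['b1', 'b2'] * 6,
    'year': [2008, 2009, 2010, 2011] * 3,
    'average': [166, 177, 114, 187, 193, 105, 114, 144, 114, 157, 133, 176],
    'nobs': [1, 35, 27, 18, 2, 27, 36, 3, 39, 6, 17, 12],
})

# Call np.average once per (borough, year) group; selecting the two value
# columns keeps the grouping columns out of the frame passed to the lambda
weighted = (
    df.groupby(['borough', 'year'])[['average', 'nobs']]
      .apply(lambda g: np.average(g['average'], weights=g['nobs']))
)

# Parallel lists, one entry per borough-year pair (sorted by group key)
boroughs = weighted.index.get_level_values('borough').tolist()
years = weighted.index.get_level_values('year').tolist()
weighted_avgs = weighted.tolist()

print(boroughs)       # ['b1', 'b1', 'b2', 'b2']
print(years)          # [2008, 2010, 2009, 2011]
print(weighted_avgs)  # same values as the 'weighted average' column above
```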
