
Calculate weighted average in R dataframe

"f","index","values","lo.80","lo.95","hi.80","hi.95"
"auto.arima",2017-07-31 16:40:00,2.81613884762163,NA,NA,NA,NA
"auto.arima",2017-07-31 16:40:10,2.83441637197378,NA,NA,NA,NA
"auto.arima",2017-07-31 20:39:10,3.18497899649267,2.73259824384436,2.49312233904087,3.63735974914098,3.87683565394447
"auto.arima",2017-07-31 20:39:20,3.16981166809297,2.69309866988864,2.44074205235297,3.64652466629731,3.89888128383297
"ets",2017-07-31 16:40:00,2.93983529828936,NA,NA,NA,NA
"ets",2017-07-31 16:40:10,3.09739640066054,NA,NA,NA,NA
"ets",2017-07-31 20:39:10,3.1951571771414,2.80966705285567,2.60560090776504,3.58064730142714,3.78471344651776
"ets",2017-07-31 20:39:20,3.33876776870274,2.93593322313957,2.72268549604222,3.7416023142659,3.95485004136325
"bats",2017-07-31 16:40:00,2.82795253090081,NA,NA,NA,NA
"bats",2017-07-31 16:40:10,2.96389759682623,NA,NA,NA,NA
"bats",2017-07-31 20:39:10,3.1383560278272,2.76890864400062,2.573335012715,3.50780341165378,3.7033770429394
"bats",2017-07-31 20:39:20,3.3561357998535,2.98646195085452,2.79076843614824,3.72580964885248,3.92150316355876

I have a dataframe like the one above, with the column names "f", "index", "values", "lo.80", "lo.95", "hi.80", "hi.95".

What I want to do is calculate the weighted average of the forecast results from the different models for each timestamp. By this I mean the following.

For every row in auto.arima there is a corresponding row in ets and bats with the same timestamp value, so the weighted average should be calculated something like this:

values_arima*1/3 + values_ets*1/3 + values_bats*1/3; the values for lo.80 and the other columns should be calculated similarly.
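As a quick check of the formula above for a single timestamp (2017-07-31 20:39:10), here is a minimal base R sketch. The names `vals` and `w` are illustrative only and do not come from the original post; the numbers are the three `values` entries for that timestamp.

```r
# Forecasts from the three models at 2017-07-31 20:39:10 (copied from the data above)
vals <- c(auto.arima = 3.18497899649267,
          ets        = 3.1951571771414,
          bats       = 3.1383560278272)

# Equal weights of 1/3 per model (illustrative name)
w <- rep(1 / 3, length(vals))

# Weighted sum as written in the formula
weighted_sum <- sum(vals * w)

# With equal weights this coincides with the plain mean
simple_mean <- mean(vals)

print(weighted_sum)
print(simple_mean)
```

Both expressions give the same number here, which is why equal weights amount to a simple average.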

This result should be stored in a new dataframe with all the weighted average values.

The new dataframe could look something like:

index (timestamp from the dataframe above),avg,avg_lo_80,avg_lo_95,avg_hi_80,avg_hi_95

I think I need to use the spread() and mutate() functions to achieve this. Being new to R, I'm unable to proceed after forming this dataframe.

Please help.

The example you provide is not a weighted average but a simple average, because every model gets the same weight of 1/3. What you want is a simple aggregate. The first part is your dataset as produced by dput (a better format for sharing here):

d <- structure(list(f = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L), .Label = c("auto.arima", "bats", "ets"), class = "factor"), 
index = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L), .Label = c("2017-07-31 16:40:00", "2017-07-31 16:40:10", 
"2017-07-31 20:39:10", "2017-07-31 20:39:20"), class = "factor"), 
values = c(2.81613884762163, 2.83441637197378, 3.18497899649267, 
3.16981166809297, 2.93983529828936, 3.09739640066054, 3.1951571771414, 
3.33876776870274, 2.82795253090081, 2.96389759682623, 3.1383560278272, 
3.3561357998535), lo.80 = c(NA, NA, 2.73259824384436, 2.69309866988864, 
NA, NA, 2.80966705285567, 2.93593322313957, NA, NA, 2.76890864400062, 
2.98646195085452), lo.95 = c(NA, NA, 2.49312233904087, 2.44074205235297, 
NA, NA, 2.60560090776504, 2.72268549604222, NA, NA, 2.573335012715, 
2.79076843614824), hi.80 = c(NA, NA, 3.63735974914098, 3.64652466629731, 
NA, NA, 3.58064730142714, 3.7416023142659, NA, NA, 3.50780341165378, 
3.72580964885248), hi.95 = c(NA, NA, 3.87683565394447, 3.89888128383297, 
NA, NA, 3.78471344651776, 3.95485004136325, NA, NA, 3.7033770429394, 
3.92150316355876)), .Names = c("f", "index", "values", "lo.80", 
"lo.95", "hi.80", "hi.95"), class = "data.frame", row.names = c(NA, 
-12L))

> aggregate(d[,3:7], by = d["index"], FUN = mean)
                index   values    lo.80    lo.95    hi.80    hi.95
1 2017-07-31 16:40:00 2.861309       NA       NA       NA       NA
2 2017-07-31 16:40:10 2.965237       NA       NA       NA       NA
3 2017-07-31 20:39:10 3.172831 2.770391 2.557353 3.575270 3.788309
4 2017-07-31 20:39:20 3.288238 2.871831 2.651399 3.704646 3.925078

You can save this output in an object and change the column names as you want.
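A minimal sketch of that step, assuming the target column names listed in the question. The object name `avg` and the use of `read.csv(text = ...)` to rebuild the data are choices made for this illustration, not part of the original answer.

```r
# Rebuild the question's data so the example is self-contained
d <- read.csv(text = '
f,index,values,lo.80,lo.95,hi.80,hi.95
auto.arima,2017-07-31 16:40:00,2.81613884762163,NA,NA,NA,NA
auto.arima,2017-07-31 16:40:10,2.83441637197378,NA,NA,NA,NA
auto.arima,2017-07-31 20:39:10,3.18497899649267,2.73259824384436,2.49312233904087,3.63735974914098,3.87683565394447
auto.arima,2017-07-31 20:39:20,3.16981166809297,2.69309866988864,2.44074205235297,3.64652466629731,3.89888128383297
ets,2017-07-31 16:40:00,2.93983529828936,NA,NA,NA,NA
ets,2017-07-31 16:40:10,3.09739640066054,NA,NA,NA,NA
ets,2017-07-31 20:39:10,3.1951571771414,2.80966705285567,2.60560090776504,3.58064730142714,3.78471344651776
ets,2017-07-31 20:39:20,3.33876776870274,2.93593322313957,2.72268549604222,3.7416023142659,3.95485004136325
bats,2017-07-31 16:40:00,2.82795253090081,NA,NA,NA,NA
bats,2017-07-31 16:40:10,2.96389759682623,NA,NA,NA,NA
bats,2017-07-31 20:39:10,3.1383560278272,2.76890864400062,2.573335012715,3.50780341165378,3.7033770429394
bats,2017-07-31 20:39:20,3.3561357998535,2.98646195085452,2.79076843614824,3.72580964885248,3.92150316355876
', stringsAsFactors = FALSE)

# Simple average per timestamp, saved in an object (name `avg` is illustrative)
avg <- aggregate(d[, 3:7], by = d["index"], FUN = mean)

# Rename the columns to the layout requested in the question
names(avg) <- c("index", "avg", "avg_lo_80", "avg_lo_95", "avg_hi_80", "avg_hi_95")

print(avg)
```

The interval columns stay NA for the first two timestamps because the input has no interval values there; passing `na.rm = TRUE` to `mean` would not help, since all three models are NA for those rows.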

If you really want a weighted average, this is a way to obtain it (here bats has a weight of 0.8 and the other two models 0.1 each):

> d$weight <- (d$f)
> levels(d$weight) # check the levels
[1] "auto.arima" "bats"       "ets"       
> levels(d$weight) <- c(0.1, 0.8, 0.1)
> # transform the factor into numbers
> # careful: as.numeric(d$weight) alone would return the level codes, not the weights
> d$weight <- as.numeric(as.character(d$weight))
> 
> # Here the result is saved in a data.frame called "result"
> result <- aggregate(d[,3:7] * d$weight, by = d["index"], FUN = sum)
> result
                index   values    lo.80    lo.95    hi.80    hi.95
1 2017-07-31 16:40:00 2.837959       NA       NA       NA       NA
2 2017-07-31 16:40:10 2.964299       NA       NA       NA       NA
3 2017-07-31 20:39:10 3.148698 2.769353 2.568540 3.528043 3.728857
4 2017-07-31 20:39:20 3.335767 2.952073 2.748958 3.719460 3.922576
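As an alternative sketch that avoids converting a factor to numbers, the weights can be looked up from a named vector. The names `weights` and `cols` are assumptions made for this illustration, and only the two timestamps that have interval values are included to keep the example short.

```r
# Subset of the question's data: the two timestamps that have interval values
d <- read.csv(text = '
f,index,values,lo.80,lo.95,hi.80,hi.95
auto.arima,2017-07-31 20:39:10,3.18497899649267,2.73259824384436,2.49312233904087,3.63735974914098,3.87683565394447
auto.arima,2017-07-31 20:39:20,3.16981166809297,2.69309866988864,2.44074205235297,3.64652466629731,3.89888128383297
ets,2017-07-31 20:39:10,3.1951571771414,2.80966705285567,2.60560090776504,3.58064730142714,3.78471344651776
ets,2017-07-31 20:39:20,3.33876776870274,2.93593322313957,2.72268549604222,3.7416023142659,3.95485004136325
bats,2017-07-31 20:39:10,3.1383560278272,2.76890864400062,2.573335012715,3.50780341165378,3.7033770429394
bats,2017-07-31 20:39:20,3.3561357998535,2.98646195085452,2.79076843614824,3.72580964885248,3.92150316355876
', stringsAsFactors = FALSE)

# One weight per model, keyed by model name (same weights as in the answer)
weights <- c(auto.arima = 0.1, bats = 0.8, ets = 0.1)

# Guard against unknown models and weights that do not sum to 1
stopifnot(all(d$f %in% names(weights)))
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Look up the weight for each row by model name
d$weight <- unname(weights[as.character(d$f)])

# Multiply each value column by the row weight, then sum per timestamp
cols <- c("values", "lo.80", "lo.95", "hi.80", "hi.95")
result <- aggregate(d[cols] * d$weight, by = d["index"], FUN = sum)

print(result)
```

Because the weights sum to 1, summing the weighted values per timestamp gives the weighted average directly; with weights that do not sum to 1, the sums would need to be divided by the total weight.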
