Weighted mean calculation in R with missing values
Does anyone know whether it is possible to calculate a weighted mean in R when some values are missing, so that the weights of the remaining values are scaled up proportionately?
To make this clear, I created a hypothetical scenario. It shows the root of the question: the scaling factor has to be adjusted for each row, depending on which values are missing.
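The rescaling described above is what weighted.mean() already does when na.rm = TRUE: it drops each missing value together with its weight and divides by the sum of the remaining weights. Below is a minimal sketch for a single row. It assumes base weights of 0.2, 0.4 and 0.4, which are the values used in the tidyverse answer further down; the question itself does not state them.

```r
# Assumed base weights (0.2 / 0.4 / 0.4, as in the tidyverse answer below)
base_w <- c(Test1 = 0.2, Test2 = 0.4, Test3 = 0.4)

# One row with a missing value: Mike has no Test1 score
x <- c(Test1 = NA, Test2 = 79, Test3 = 98)

# Manual rescaling: keep the weights of the observed scores and divide them
# by their sum so they add up to 1 again (0.4, 0.4 becomes 0.5, 0.5)
observed <- !is.na(x)
scaled_w <- base_w[observed] / sum(base_w[observed])
manual <- sum(x[observed] * scaled_w)

# weighted.mean() with na.rm = TRUE drops the missing score and its weight,
# then divides by the sum of the remaining weights: the same rescaling
builtin <- weighted.mean(x, base_w, na.rm = TRUE)

print(scaled_w)
print(c(manual = manual, builtin = builtin))
```

Both approaches give 88.5 for Mike. Nick's row works the same way, except that the remaining weights 0.2 and 0.4 become 1/3 and 2/3.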
The best way to post an example dataset is to use dput(head(dat, 20)), where dat is the name of your dataset. Images of data are a really poor choice for that.
DATA.
dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA),
Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")
w <-
structure(list(Test1 = c(18, NA, 27), Test2 = c(36.4, 39.5, NA
), Test3 = c(36.8, 49, 55.3)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")
CODE.
You can use the function weighted.mean from the base package stats together with sapply for this. Note that if your datasets of scores and weights are R objects of class matrix, you will not need unlist.
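# For each row i, pass that student's scores and weights to weighted.mean();
# na.rm = TRUE drops missing scores together with their weights
sapply(seq_len(nrow(dat)), function(i){
  weighted.mean(unlist(dat[i, ]), unlist(w[i, ]), na.rm = TRUE)
})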
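As the note above says, matrices avoid the unlist() calls, and with matrices the per-row loop can be replaced by rowSums(). Below is a minimal sketch. It assumes base weights of 0.2, 0.4 and 0.4, taken from the tidyverse answer further down rather than from the question, and uses a hypothetical helper, row_weighted_mean(). The sketch also checks one reading of the example data. Each entry of w seems to equal the score times its rescaled weight (90 × 0.2 = 18, 79 × 0.5 = 39.5, 81 × 1/3 = 27), so w may hold each test's contribution rather than its weight. If that is right, rowSums(w, na.rm = TRUE) already gives the weighted means: 91.2, 88.5 and 82.3, the last one rounded. Passing w as the weights to weighted.mean(), as the code above does, gives slightly different values (about 91.21 for Mark instead of 91.2).

```r
# Scores as a matrix (rows = students, columns = tests), same values as `dat`
x <- matrix(c(90, 91, 92,
              NA, 79, 98,
              81, NA, 83),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Mark", "Mike", "Nick"),
                            c("Test1", "Test2", "Test3")))

# Assumed base weights (from the tidyverse answer), repeated for every student
base_w <- matrix(c(0.2, 0.4, 0.4), nrow = 3, ncol = 3, byrow = TRUE,
                 dimnames = dimnames(x))

# Hypothetical helper: row-wise weighted mean that ignores missing scores.
# Zeroing a cell in both x and w removes it from the numerator and the
# denominator, which rescales the remaining weights to sum to 1.
row_weighted_mean <- function(x, w) {
  skip <- is.na(x) | is.na(w)
  x[skip] <- 0
  w[skip] <- 0
  rowSums(x * w) / rowSums(w)
}

result <- row_weighted_mean(x, base_w)
print(result)

# Check the reading of `w` in the question: score * rescaled weight
rescaled_w <- base_w
rescaled_w[is.na(x)] <- 0
rescaled_w <- rescaled_w / rowSums(rescaled_w)
contrib <- round(x * rescaled_w, 1)  # NA wherever the score is missing
print(contrib)
```

Under these assumptions, contrib reproduces the entries of w row by row, and result gives 91.2, 88.5 and about 82.33.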
Using weighted.mean from the base stats package with the argument na.rm = TRUE should get you the result you need.

Here is a tidyverse way this could be done:
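library(tidyverse)
# Scores in wide format: one row per student, one column per test
scores <- tribble(
  ~student, ~test1, ~test2, ~test3,
  "Mark", 90, 91, 92,
  "Mike", NA, 79, 98,
  "Nick", 81, NA, 83)
# One weight per test
weights <- tribble(
  ~test, ~weight,
  "test1", 0.2,
  "test2", 0.4,
  "test3", 0.4)
scores %>%
  # Reshape to long format: one row per student and test
  # (pivot_longer() supersedes gather() in tidyr 1.0.0 and later)
  pivot_longer(-student, names_to = "test", values_to = "score") %>%
  # Attach each test's weight
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  # na.rm = TRUE drops missing scores together with their weights
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))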
#> # A tibble: 3 x 2
#> student result
#> <chr> <dbl>
#> 1 Mark 91.20000
#> 2 Mike 88.50000
#> 3 Nick 82.33333
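For comparison, the same numbers can be obtained in base R without reshaping to long format. Below is a minimal sketch. It assumes a single weight vector shared by all students, as in the weights tribble above, with names that match the score columns.

```r
# Scores in wide format, one row per student (same values as `scores` above)
scores <- data.frame(
  student = c("Mark", "Mike", "Nick"),
  test1 = c(90, NA, 81),
  test2 = c(91, 79, NA),
  test3 = c(92, 98, 83)
)

# One weight per test, named to match the score columns
weights <- c(test1 = 0.2, test2 = 0.4, test3 = 0.4)

# apply() passes each row of the numeric matrix to weighted.mean();
# selecting columns by names(weights) keeps scores and weights aligned,
# and na.rm = TRUE drops a missing score together with its weight
result <- apply(as.matrix(scores[names(weights)]), 1,
                weighted.mean, w = weights, na.rm = TRUE)
names(result) <- scores$student
print(result)
```

The long-format tidyverse version is more flexible when the weights differ by student or live in a separate table. The apply() version is enough when every student shares one weight vector.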