
Weighted mean calculation in R with missing values

Does anyone know whether R can calculate a weighted mean when some values are missing, such that the weights of the remaining values are scaled up proportionately?

To convey this clearly, I created a hypothetical scenario. It shows the root of the question: the scaling factor has to be adjusted for each row, depending on which values are missing.
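
One way to state the requirement precisely is sketched below. The symbols are introduced here for illustration and do not come from the original post: x_{ij} is the score of row i on test j, w_j is the nominal weight of test j, and O_i is the set of tests for which row i has a value.

```latex
\bar{x}_i \;=\; \frac{\sum_{j \in O_i} w_j \, x_{ij}}{\sum_{j \in O_i} w_j},
\qquad
\tilde{w}_{ij} \;=\; \frac{w_j}{\sum_{k \in O_i} w_k} \quad (j \in O_i)
```

The rescaled weights \tilde{w}_{ij} sum to 1 for every row, whichever values are missing.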

Image: Weighted Mean Calculation

File: Weighted Mean Calculation in Excel

The best way to post an example dataset is to paste the output of dput(head(dat, 20)), where dat is the name of your dataset. Images are a really bad choice for that.
DATA.

dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA), 
    Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

w <-
structure(list(Test1 = c(18, NA, 27), Test2 = c(36.4, 39.5, NA
), Test3 = c(36.8, 49, 55.3)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

CODE.
You can use the function weighted.mean from the base package stats, together with sapply, for this. With na.rm = TRUE, weighted.mean drops each missing value along with its weight and divides by the sum of the remaining weights, so those weights are rescaled automatically. Note that if your scores and weights are stored as matrices rather than data frames, you will not need unlist.

sapply(seq_len(nrow(dat)), function(i){
    weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE)
})
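
As a minimal sketch of the matrix case mentioned above, the same calculation can be written with apply and a single weight vector. The per-test weights of 0.2, 0.4 and 0.4 are an assumption borrowed from the tidyverse answer in this thread, not the values stored in w, and the names score_mat, test_weights and row_means are hypothetical.

```r
# Scores as a numeric matrix, one row per student (same values as dat).
score_mat <- matrix(
  c(90, NA, 81,   # Test1
    91, 79, NA,   # Test2
    92, 98, 83),  # Test3
  nrow = 3,
  dimnames = list(c("Mark", "Mike", "Nick"), c("Test1", "Test2", "Test3"))
)

# Assumed nominal weight of each test (hypothetical, sums to 1).
test_weights <- c(Test1 = 0.2, Test2 = 0.4, Test3 = 0.4)

# apply() hands each row to weighted.mean() as a plain numeric vector,
# so no unlist() is needed. na.rm = TRUE drops the missing scores together
# with their weights before dividing by the sum of the remaining weights.
row_means <- apply(score_mat, 1, weighted.mean, w = test_weights, na.rm = TRUE)

print(row_means)
```

Because the weight vector is the same for every row, it does not need NA entries of its own; weighted.mean discards the weights that belong to missing scores.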

Using weighted.mean from the base stats package with the argument na.rm = TRUE should give you the result you need. Here is one way to do it with the tidyverse:

library(tidyverse)
scores <- tribble(
 ~student, ~test1, ~test2, ~test3,
   "Mark",     90,     91,     92,
   "Mike",     NA,     79,     98,
   "Nick",     81,     NA,     83)

weights <- tribble(
  ~test,   ~weight, 
  "test1",     0.2, 
  "test2",     0.4,
  "test3",     0.4)

scores %>% 
  gather(test, score, -student) %>%
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#>   student   result
#>     <chr>    <dbl>
#> 1    Mark 91.20000
#> 2    Mike 88.50000
#> 3    Nick 82.33333
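
In recent tidyr releases, gather() is superseded by pivot_longer(). The following is a sketch of the same pipeline using that function, assuming tidyr 1.0.0 or later is installed. It reuses the scores and weights tables from the answer above; the name result_tbl is hypothetical.

```r
library(dplyr)
library(tidyr)

scores <- tibble::tribble(
  ~student, ~test1, ~test2, ~test3,
  "Mark",       90,     91,     92,
  "Mike",       NA,     79,     98,
  "Nick",       81,     NA,     83
)

weights <- tibble::tribble(
  ~test,   ~weight,
  "test1",     0.2,
  "test2",     0.4,
  "test3",     0.4
)

result_tbl <- scores %>%
  # Reshape to one row per student/test pair.
  pivot_longer(-student, names_to = "test", values_to = "score") %>%
  # Attach the nominal weight of each test.
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  # Missing scores and their weights are dropped before averaging.
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))

print(result_tbl)
```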
