Change the weight variable based on the grouping variable

Question

I have the following example data:

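library(data.table) # table_selection is a data.table
library(dplyr)      # select(), group_by(), summarize(), %>%
library(diagis)     # weighted_mean(), weighted_se()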
table_selection <- structure(list(year = c(2006, 2006, 2006, 2006, 2006), Totaal_pop_weights = c(12.125, 
12.125, 12.125, 12.125, 12.125), Y02_pop_weights = c(97, 97, 
97, 97, 97), Y01_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), h10_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), A_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), B_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), C_ha_pop_weights = c(97, 97, 97, 97, 97), D_ha_pop_weights = c(12.125, 
12.125, 12.125, 12.125, 12.125), variable = structure(c(2L, 1L, 
1L, 3L, 1L), levels = c("A_ha", "B_ha", "C_ha", 
"D_ha", "Y01", "Y02", "Totaal", "X10"), class = "factor"), 
    value = c(2, 3, 1, 1, 12.9)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))


   year Totaal_pop_weights Y02_pop_weights Y01_pop_weights h10_pop_weights A_ha_pop_weights B_ha_pop_weights
1: 2006             12.125              97          12.125          12.125           12.125           12.125
2: 2006             12.125              97          12.125          12.125           12.125           12.125
3: 2006             12.125              97          12.125          12.125           12.125           12.125
4: 2006             12.125              97          12.125          12.125           12.125           12.125
5: 2006             12.125              97          12.125          12.125           12.125           12.125
   C_ha_pop_weights D_ha_pop_weights variable value
1:               97           12.125     B_ha   2.0
2:               97           12.125     A_ha   3.0
3:               97           12.125     A_ha   1.0
4:               97           12.125     C_ha   1.0
5:               97           12.125     A_ha  12.9

I would like to weight the observations as follows:

weights_of_interest <- select(table_selection, contains(c("weights")))
table_selection <- table_selection %>%
    group_by(year, variable) %>%
    summarize(weighted_mean = weighted_mean(value, w = Y01_pop_weights , na.rm=TRUE),
              weighted_se = weighted_se(value, w = Y01_pop_weights , na.rm=TRUE))

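But this always uses the same weight, Y01_pop_weights. How can I change the weight so that, for example, values whose variable is A_ha use A_ha_pop_weights as the weight?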

Answer 1

If table_selection is a data.table (as in your example data), you can create a single new column wt that holds the population weight matching the value in variable:

table_selection[
  ,
  wt:=.SD[[paste0(variable,"_pop_weights")]][1],
  by = 1:nrow(table_selection),
  .SDcols = patterns("ha_pop_weights")
]

Here is the same approach using dplyr (rowwise() and c_across()):

# helper function
f <- function(d,v) d[[paste0(v,"_pop_weights")]][1]

# vector of wt variable names
ha_wts = names(table_selection)[grepl("ha_pop_weights$", names(table_selection))]

# mutate the `wt` column
table_selection %>% 
  rowwise() %>% 
  mutate(wt = f(setNames(c_across(all_of(ha_wts)), ha_wts),variable))

With either approach, you can then use w = wt in the summarize() call above.
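As a rough sketch of the same lookup without rowwise(), the matching weight can also be picked with base R matrix indexing and then passed to a weighted mean. The toy data frame df and the names wt_col and w_mat are illustrative assumptions, not part of the original answer, and stats::weighted.mean stands in for diagis::weighted_mean so the example runs without extra packages.

```r
# Toy data mirroring the question (plain data.frame, base R only).
df <- data.frame(
  year = 2006,
  A_ha_pop_weights = 12.125,
  B_ha_pop_weights = 12.125,
  C_ha_pop_weights = 97,
  variable = c("B_ha", "A_ha", "A_ha", "C_ha", "A_ha"),
  value = c(2, 3, 1, 1, 12.9)
)

# Name of the weight column each row should use.
wt_col <- paste0(df$variable, "_pop_weights")
stopifnot(all(wt_col %in% names(df)))  # fail early if a column is missing

# Matrix indexing: pick element (row i, column wt_col[i]) for every row.
w_mat <- as.matrix(df[unique(wt_col)])
df$wt <- w_mat[cbind(seq_len(nrow(df)), match(wt_col, colnames(w_mat)))]

# Weighted mean per group, using each row's own weight.
res <- sapply(split(df, df$variable),
              function(g) weighted.mean(g$value, g$wt))
```

Because the weights are constant within each group in this sample, the weighted means equal the plain group means; with weights that vary within a group the results would differ.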

Answer 2

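If you want a tidyverse solution, I think the way to go is to use tidyr to reshape the data into long format. My machine does not have the functions weighted_mean or weighted_se, so I can't be 100% sure this will work.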

library(magrittr) # provides %>%
library(diagis)   # provides weighted_mean() and weighted_se()
table_selection %>% 
  # one row per (observation, weight column) pair
  tidyr::pivot_longer(cols = tidyselect::contains("weights"),
                      values_to = "pop_values",
                      names_to = "NAMES") %>% 
  dplyr::group_by(year, variable, NAMES) %>%
  dplyr::summarize(weighted_mean = weighted_mean(value, w = pop_values, na.rm = TRUE),
                   weighted_se = weighted_se(value, w = pop_values, na.rm = TRUE))

But using weighted.mean from the stats package ...

table_selection %>% 
  tidyr::pivot_longer(cols = tidyselect::contains("weights"),
                      values_to = "pop_values",
                      names_to = "NAMES") %>% 
  dplyr::group_by(year, variable, NAMES) %>%
  dplyr::summarize(weighted_mean = stats::weighted.mean(value, w = pop_values, na.rm = TRUE)
                   # , weighted_se = weighted_se(value, w = pop_values, na.rm = TRUE)
                   )

Returns:

# A tibble: 24 x 4
# Groups:   year, variable [3]
    year variable NAMES              weighted_mean
   <dbl> <fct>    <chr>                      <dbl>
 1  2006 A_ha     A_ha_pop_weights            5.63
 2  2006 A_ha     B_ha_pop_weights            5.63
 3  2006 A_ha     C_ha_pop_weights            5.63
 4  2006 A_ha     D_ha_pop_weights            5.63
 5  2006 A_ha     h10_pop_weights             5.63
 6  2006 A_ha     Totaal_pop_weights          5.63
 7  2006 A_ha     Y01_pop_weights             5.63
 8  2006 A_ha     Y02_pop_weights             5.63
 9  2006 B_ha     A_ha_pop_weights            2   
10  2006 B_ha     B_ha_pop_weights            2   
# ... with 14 more rows
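
The result above has one row per combination of variable and weight column (24 in total). To keep only the weight that matches each variable, as the question asks, one possible extension is to filter the long data before summarizing. This is a sketch under assumptions: the toy tibble df is illustrative, and the filter step is an addition that is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the question (illustrative only).
df <- tibble(
  year = 2006,
  A_ha_pop_weights = 12.125,
  B_ha_pop_weights = 12.125,
  C_ha_pop_weights = 97,
  variable = c("B_ha", "A_ha", "A_ha", "C_ha", "A_ha"),
  value = c(2, 3, 1, 1, 12.9)
)

res <- df %>%
  # long format: one row per (observation, weight column) pair
  pivot_longer(cols = ends_with("_pop_weights"),
               names_to = "NAMES",
               values_to = "pop_values") %>%
  # keep only the weight column named after the row's variable
  filter(NAMES == paste0(variable, "_pop_weights")) %>%
  group_by(year, variable) %>%
  summarize(weighted_mean = weighted.mean(value, w = pop_values, na.rm = TRUE),
            .groups = "drop")
```

This leaves one row per year and variable, each computed with its own weight column.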

2 answers

Answer 1
1 accepted 2023-01-07 14:35:38

Answer 2
1 2023-01-07 14:51:27
