Varying the weights variable, based on the grouping variable
I have the following example data:
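library(data.table) # the example data is a data.table
library(dplyr)      # select, group_by, summarize, %>%
library(diagis)     # weighted_mean, weighted_se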
table_selection <- structure(list(year = c(2006, 2006, 2006, 2006, 2006), Totaal_pop_weights = c(12.125,
12.125, 12.125, 12.125, 12.125), Y02_pop_weights = c(97, 97,
97, 97, 97), Y01_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), h10_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), A_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), B_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), C_ha_pop_weights = c(97, 97, 97, 97, 97), D_ha_pop_weights = c(12.125,
12.125, 12.125, 12.125, 12.125), variable = structure(c(2L, 1L,
1L, 4L, 1L), levels = c("A_ha", "B_ha", "C_ha",
"C_ha", "Y01", "Y02", "Totaal", "X10"), class = "factor"),
value = c(2, 3, 1, 1, 12.9)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
year Totaal_pop_weights Y02_pop_weights Y01_pop_weights h10_pop_weights A_ha_pop_weights B_ha_pop_weights
1: 2006 12.125 97 12.125 12.125 12.125 12.125
2: 2006 12.125 97 12.125 12.125 12.125 12.125
3: 2006 12.125 97 12.125 12.125 12.125 12.125
4: 2006 12.125 97 12.125 12.125 12.125 12.125
5: 2006 12.125 97 12.125 12.125 12.125 12.125
C_ha_pop_weights D_ha_pop_weights variable value
1: 97 12.125 B_ha 2.0
2: 97 12.125 A_ha 3.0
3: 97 12.125 A_ha 1.0
4: 97 12.125 C_ha 1.0
5: 97 12.125 A_ha 12.9
I would like to weight the observations as follows:
weights_of_interest <- select(table_selection, contains(c("weights")))
table_selection <- table_selection %>%
group_by(year, variable) %>%
summarize(weighted_mean = weighted_mean(value, w = Y01_pop_weights , na.rm=TRUE),
weighted_se = weighted_se(value, w = Y01_pop_weights , na.rm=TRUE))
But this always uses the same weights, Y01_pop_weights. How can I change the weights so that rows where variable is A_ha use A_ha_pop_weights as the weights?
If table_selection is a data.table (as in your example data), you can create a single new column, wt, that holds the population weight selected by the value in variable:
table_selection[
,
wt:=.SD[[paste0(variable,"_pop_weights")]][1],
by = 1:nrow(table_selection),
.SDcols = patterns("ha_pop_weights")
]
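The row-by-row lookup above can also be written with one group per value of variable. A minimal sketch, assuming data.table is installed; dt is a hypothetical, trimmed stand-in for table_selection with variable stored as character rather than factor:

```r
library(data.table)

# Hypothetical stand-in for table_selection (only the columns needed here)
dt <- data.table(
  year = 2006,
  variable = c("B_ha", "A_ha", "A_ha", "C_ha", "A_ha"),
  value = c(2, 3, 1, 1, 12.9),
  A_ha_pop_weights = 12.125,
  B_ha_pop_weights = 12.125,
  C_ha_pop_weights = 97
)

# Within each group, `variable` is a single value, so get() fetches
# the one weight column whose name matches that group
dt[, wt := get(paste0(variable, "_pop_weights")), by = variable]

print(dt[, .(variable, value, wt)])
```

This does one column lookup per group instead of one per row, which may matter on larger tables.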
Here is the same approach using dplyr (rowwise() and c_across()):
# helper function
f <- function(d,v) d[[paste0(v,"_pop_weights")]][1]
# vector of wt variable names
ha_wts = names(table_selection)[grepl("ha_pop_weights$", names(table_selection))]
# mutate the `wt` column
table_selection %>%
rowwise() %>%
mutate(wt = f(setNames(c_across(all_of(ha_wts)), ha_wts),variable))
With either approach, you can then use w = wt in the summarize() call above.
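To see the whole idea end to end without extra packages, here is a base R sketch. It is only an illustration under assumptions: df is a hypothetical, trimmed copy of the example data, and stats::weighted.mean stands in for diagis::weighted_mean.

```r
# Hypothetical stand-in for table_selection (only the columns needed here)
df <- data.frame(
  year = 2006,
  variable = c("B_ha", "A_ha", "A_ha", "C_ha", "A_ha"),
  value = c(2, 3, 1, 1, 12.9),
  A_ha_pop_weights = 12.125,
  B_ha_pop_weights = 12.125,
  C_ha_pop_weights = 97
)

# Name of the weight column that belongs to each row
wt_col <- paste0(df$variable, "_pop_weights")

# Pick one cell per row with (row, column) matrix indexing
w_mat <- as.matrix(df[unique(wt_col)])
df$wt <- w_mat[cbind(seq_len(nrow(df)), match(wt_col, colnames(w_mat)))]

# Weighted mean per variable, each group using its own weight column
res <- sapply(split(df, df$variable),
              function(g) weighted.mean(g$value, g$wt, na.rm = TRUE))
print(res)
```

Because the weights are constant within each group in this sample, the weighted means equal the plain group means.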
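If you want a tidyverse solution, I think the way to go is to use tidyr to convert the data to long format. My machine does not know the functions "weighted_mean" or "weighted_se", so I cannot be 100% sure this will work.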
library(magrittr)
table_selection %>%
tidyr::pivot_longer(cols = tidyselect::contains("weights"),
values_to = "pop_values",
names_to = "NAMES") %>%
dplyr::group_by(year, variable, NAMES) %>%
dplyr::summarize(weighted_mean = weighted_mean(value, w = pop_values, na.rm=TRUE),
weighted_se = weighted_se(value, w = pop_values , na.rm=TRUE))
However, using weighted.mean from the stats package ...
table_selection %>%
tidyr::pivot_longer(cols = tidyselect::contains("weights"),
values_to = "pop_values",
names_to = "NAMES") %>%
dplyr::group_by(year, variable, NAMES) %>%
dplyr::summarize(weighted_mean = stats::weighted.mean(value, w = pop_values, na.rm = TRUE))
# weighted_se = weighted_se(value, w = pop_values, na.rm = TRUE)  # needs diagis
Returns:
# A tibble: 24 x 4
# Groups: year, variable [3]
year variable NAMES weighted_mean
<dbl> <fct> <chr> <dbl>
1 2006 A_ha A_ha_pop_weights 5.63
2 2006 A_ha B_ha_pop_weights 5.63
3 2006 A_ha C_ha_pop_weights 5.63
4 2006 A_ha D_ha_pop_weights 5.63
5 2006 A_ha h10_pop_weights 5.63
6 2006 A_ha Totaal_pop_weights 5.63
7 2006 A_ha Y01_pop_weights 5.63
8 2006 A_ha Y02_pop_weights 5.63
9 2006 B_ha A_ha_pop_weights 2
10 2006 B_ha B_ha_pop_weights 2
# ... with 14 more rows
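The long-format result contains every weight column for every variable. To keep only the weight that matches each variable, one option is to filter the long table before summarizing. A minimal sketch, assuming dplyr and tidyr are installed; df is a hypothetical, trimmed stand-in for table_selection, and stats::weighted.mean is used in place of diagis::weighted_mean:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for table_selection (only the columns needed here)
df <- data.frame(
  year = 2006,
  variable = c("B_ha", "A_ha", "A_ha", "C_ha", "A_ha"),
  value = c(2, 3, 1, 1, 12.9),
  A_ha_pop_weights = 12.125,
  B_ha_pop_weights = 12.125,
  C_ha_pop_weights = 97
)

res <- df %>%
  pivot_longer(cols = contains("weights"),
               names_to = "NAMES",
               values_to = "pop_values") %>%
  # keep only the rows whose weight column matches the row's variable
  filter(NAMES == paste0(variable, "_pop_weights")) %>%
  group_by(year, variable) %>%
  summarize(weighted_mean = weighted.mean(value, w = pop_values, na.rm = TRUE),
            .groups = "drop")

print(res)
```

This yields one row per year and variable rather than one row per weight column.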