How to assign weights to a sample in R
Before performing some statistical analysis, I would like to weight my sample by a variable (the population size of each areal unit), so that units with larger populations receive larger weights and units with smaller populations receive smaller ones. Do you have any suggestions on how to do this in R? Thanks in advance.
You can do this with weighted.mean(), providing the weights as the second argument (w).
Here is a quick example, using population as weights.
dat <- data.frame(
country = c("UK", "US", "France", "Zimbabwe"),
pop = c(6.7e4, 3.31e8, 6.8e4, 1.5e4),
love_of_british_royal_family = c(5, 9, 2, 1)
)
mean(dat$love_of_british_royal_family) # 4.25
weighted.mean(
dat$love_of_british_royal_family,
w = dat$pop
) # 8.997391
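To make explicit what weighted.mean() computes, here is a minimal sketch that reproduces the result by hand and checks that rescaling the weights to sum to 1 leaves it unchanged. The names x, w, w_norm, manual and normalised are introduced here for illustration only; the values are copied from the example above.

```r
x <- c(5, 9, 2, 1)                    # values to average (scores from the example above)
w <- c(6.7e4, 3.31e8, 6.8e4, 1.5e4)   # weights (population from the example above)

# A weighted mean is the weight-scaled sum divided by the total weight
manual <- sum(x * w) / sum(w)

# Rescaling the weights so they sum to 1 does not change the result
w_norm <- w / sum(w)
normalised <- weighted.mean(x, w_norm)

all.equal(manual, weighted.mean(x, w))
all.equal(manual, normalised)
```

Because only the relative sizes of the weights matter, raw population counts can be passed directly, as in the example above.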
SamR's weighted.mean() approach requires one weight for each element of your vector. If you have a population vector with many elements and want to weight by categories of population size, you could use the base R cut() function. Here is a toy example:
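# Draw 100 population sizes at random (no seed is set, so values differ between runs)
population <- sample(200:200000, 100)
df <- data.frame(population)
# Bin edges and the weight assigned to each bin
breaks <- c(200, 10000, 50000, 100000, 200000)
labels <- c(0.1, 0.2, 0.3, 0.4)
# include.lowest = TRUE keeps a population of exactly 200 in the first bin instead of NA
cuts <- cut(df$population, breaks = breaks, labels = labels, include.lowest = TRUE)
df$weights <- as.numeric(as.character(cuts))
head(df)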
population weights
1 25087 0.2
2 92652 0.3
3 99051 0.3
4 136376 0.4
5 184573 0.4
6 147675 0.4
Note that cuts is a factor. The as.character(cuts) conversion is therefore required to keep the intended fractional weights: calling as.numeric() directly on a factor returns its integer level codes (1, 2, 3, 4 here) rather than the labels.
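The difference between the two conversions can be seen on a small factor. The sketch below also shows one possible alternative that skips the factor altogether by asking cut() for bin indices (labels = FALSE) and using them to index a numeric weight vector. The names f, codes, values, pop, bin_wts, idx and weights are illustrative and not part of the answer above.

```r
# A factor stores integer codes that index into its sorted levels
f <- factor(c("0.2", "0.4", "0.1"))
codes  <- as.numeric(f)                # 2 3 1: positions within the levels
values <- as.numeric(as.character(f))  # 0.2 0.4 0.1: the labels themselves

# Alternative without a factor: cut() returns the bin index when labels = FALSE
pop     <- c(200, 25087, 92652, 136376)
breaks  <- c(200, 10000, 50000, 100000, 200000)
bin_wts <- c(0.1, 0.2, 0.3, 0.4)
idx     <- cut(pop, breaks = breaks, labels = FALSE, include.lowest = TRUE)
weights <- bin_wts[idx]                # 0.1 0.2 0.3 0.4
```

Indexing a numeric vector keeps the weights numeric throughout, so no character round trip is needed.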