anesrake error: “no variables are off by more than ____” when they are
I need to weight the observations in a sample based on the marginal distributions of four demographic characteristics from a broader population. I'm currently using the anesrake package to do so.
The population info is stored in targets. This is a list containing 4 elements, one numeric vector for each respondent attribute I want to weight my sample on. The names of each vector represent the different categories. I create targets here:
quota_age <- c(0.30, 0.33, 0.37)
quota_race <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed <- c(0.41, 0.29, 0.19, 0.11)
names(quota_age) <- c("18 to 34", "35 to 54", "55+")
names(quota_race) <- c("White non-Hispanic", "Black non-Hispanic", "Hispanic", "Asian", "Other")
names(quota_gender) <- c("Female", "Male")
names(quota_ed) <- c("HS or less", "Some college", "Bachelors", "Advanced")
targets <- list(quota_age, quota_race, quota_gender, quota_ed)
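As a side check on the vectors above: raking targets are proportions, so each vector would normally be expected to sum to 1. A minimal sketch using only base R (the names target_sums and off_targets are introduced here for illustration and are not part of the original post):

```r
# Target vectors copied from the post above.
quota_age    <- c(0.30, 0.33, 0.37)
quota_race   <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed     <- c(0.41, 0.29, 0.19, 0.11)

# Sum each target vector; proportions should add up to 1.
target_sums <- sapply(
  list(quota_age = quota_age, quota_race = quota_race,
       quota_gender = quota_gender, quota_ed = quota_ed),
  sum
)
print(round(target_sums, 2))

# Names of any target vectors whose sum differs from 1 (small tolerance
# for floating-point error).
off_targets <- names(target_sums)[abs(target_sums - 1) > 1e-8]
print(off_targets)
```

With the values as posted, quota_race sums to 1.71 rather than 1. That may mean 0.5 and 0.3 were intended as 0.05 and 0.03, but this is a guess that the post does not confirm.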
The survey file (m1b) is a data frame containing demographic info and a unique ID for each respondent (link to Google Sheet here). Here are the first few observations:
> head(m1b)
ResponseId quota_ed quota_age quota_gender quota_race
1 R_3McITJbfcFuwc9x Some college 18 to 34 Female White non-Hispanic
2 R_2q3oeAbZgCZ5YcZ Bachelors 55+ Female White non-Hispanic
3 R_YSVccSQ1xJ6zuDv Advanced 35 to 54 Female White non-Hispanic
4 R_DubbKu7uJicbpQd Some college 35 to 54 Male White non-Hispanic
5 R_5zj5CNu598lCwRX Bachelors 55+ Male Other
6 R_21mPGFS7kHX2ELm Advanced 55+ Female White non-Hispanic
Using the anesrake package, I want to construct a new variable called weight that I can use to account for differences between the population and sample marginal distributions in later analyses.
But when I call the anesrake function like so (the pctlim argument is extremely small to exaggerate my point):
library(anesrake)
raking <- anesrake(inputter = targets,
dataframe = m1b,
caseid = m1b$ResponseId,
choosemethod = "total",
type = "pctlim",
pctlim = 0.0000001)
I get the following error:
Error in selecthighestpcts(discrep1, inputter, pctlim) :
No variables are off by more than 0.00001 percent using the method you have chosen, either weighting is
unnecessary or a smaller pre-raking limit should be chosen.
Even though this is objectively not true. Consider the quota_ed target, for example:
> targets[[4]]
HS or less Some college Bachelors Advanced
0.41 0.29 0.19 0.11
> wpct(m1b$quota_ed)
Advanced Bachelors HS or less Some college
0.1614583 0.3645833 0.1666667 0.3072917
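The two printouts above list the categories in different orders, which makes the gap harder to read. A minimal sketch that lines them up by category name, with the printed proportions copied in by hand (target_ed, sample_ed and gap are illustrative names, not objects from the original post):

```r
# Target proportions, as printed by targets[[4]].
target_ed <- c("HS or less" = 0.41, "Some college" = 0.29,
               "Bachelors" = 0.19, "Advanced" = 0.11)

# Sample proportions, as printed by wpct(m1b$quota_ed).
sample_ed <- c("Advanced" = 0.1614583, "Bachelors" = 0.3645833,
               "HS or less" = 0.1666667, "Some college" = 0.3072917)

# Reorder the sample proportions to match the targets, then subtract.
gap <- sample_ed[names(target_ed)] - target_ed
print(round(gap, 3))

# Category with the largest absolute discrepancy.
worst <- names(gap)[which.max(abs(gap))]
print(worst)
```

Indexing by name rather than by position avoids comparing, say, the "Advanced" sample share against the "HS or less" target.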
Any thoughts on what I'm doing wrong would be greatly appreciated. See this link to an RBloggers post for the routine I'm trying to emulate.
For the anesrake function to work, the elements of targets may need to be named after the matching columns in the data frame:
names(targets) <- c("quota_age", "quota_race", "quota_gender", "quota_ed")
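The naming step can be combined with a couple of checks before raking. The sketch below uses a small made-up data frame (m1b_toy, with hypothetical values and only two of the four variables) in place of the real m1b. It assumes, as I understand the anesrake documentation, that each list element is matched to a column by name and that the weighting variables should be factors whose levels follow the order of the target vector:

```r
# Toy stand-in for the survey data (hypothetical rows, not the real m1b).
m1b_toy <- data.frame(
  ResponseId   = paste0("R_", 1:6),
  quota_gender = c("Female", "Female", "Female", "Male", "Male", "Female"),
  quota_age    = c("18 to 34", "55+", "35 to 54", "35 to 54", "55+", "55+"),
  stringsAsFactors = FALSE
)

quota_age    <- c("18 to 34" = 0.30, "35 to 54" = 0.33, "55+" = 0.37)
quota_gender <- c("Female" = 0.52, "Male" = 0.48)

# Name each list element after the data frame column it targets.
targets_toy <- list(quota_age, quota_gender)
names(targets_toy) <- c("quota_age", "quota_gender")

# Any target name that is not a column of the data is a problem.
missing_cols <- setdiff(names(targets_toy), names(m1b_toy))

# Convert each weighting variable to a factor whose levels follow the
# order of the corresponding target vector.
for (v in names(targets_toy)) {
  m1b_toy[[v]] <- factor(m1b_toy[[v]], levels = names(targets_toy[[v]]))
}

# Values that did not match any target category become NA here.
unmatched <- sapply(m1b_toy[names(targets_toy)], function(x) sum(is.na(x)))
print(missing_cols)
print(unmatched)
```

If missing_cols is empty and unmatched is all zeros, the same steps can be applied to the real m1b and targets before re-running the anesrake call from the question.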