
anesrake error: "no variables are off by more than ____" when they are

I need to weight the observations in a sample based on the marginal distributions of four demographic characteristics from a broader population. I'm currently using the anesrake package to do so.

The population info is stored in targets. This is a list containing 4 elements: one numeric vector for each respondent attribute I want to weight my sample on. The names of each vector represent the different categories. I create targets here:

quota_age    <- c(0.30, 0.33, 0.37)
quota_race   <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed     <- c(0.41, 0.29, 0.19, 0.11)

names(quota_age)    <- c("18 to 34", "35 to 54", "55+")
names(quota_race)   <- c("White non-Hispanic", "Black non-Hispanic", "Hispanic", "Asian", "Other")
names(quota_gender) <- c("Female", "Male")
names(quota_ed)     <- c("HS or less", "Some college", "Bachelors", "Advanced")

targets <- list(quota_age, quota_race, quota_gender, quota_ed)
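A quick sanity check on the targets can be sketched as follows. This is an illustrative addition, not part of the original routine: it only confirms that each vector of proportions sums to 1 within a small tolerance. The names target_sums and sums_to_one are made up for this example.

```r
# Sketch: verify that each target vector is a valid set of proportions.
quota_age    <- c(0.30, 0.33, 0.37)
quota_race   <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed     <- c(0.41, 0.29, 0.19, 0.11)

targets <- list(quota_age, quota_race, quota_gender, quota_ed)

# Sum each vector and compare against 1 with a floating-point tolerance.
target_sums <- sapply(targets, sum)
sums_to_one <- abs(target_sums - 1) < 1e-8

print(round(target_sums, 2))
print(sums_to_one)
```

With the values as written, the second vector (quota_race) sums to 1.71 rather than 1, which suggests that 0.5 and 0.3 may have been intended as 0.05 and 0.03. That is only a guess and is separate from the error discussed below.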

The survey file (m1b) is a data frame containing demographic info and a unique ID for each respondent (link to Google Sheet here). Here are the first few observations:

> head(m1b)
         ResponseId     quota_ed quota_age quota_gender         quota_race
1 R_3McITJbfcFuwc9x Some college  18 to 34       Female White non-Hispanic
2 R_2q3oeAbZgCZ5YcZ    Bachelors       55+       Female White non-Hispanic
3 R_YSVccSQ1xJ6zuDv     Advanced  35 to 54       Female White non-Hispanic
4 R_DubbKu7uJicbpQd Some college  35 to 54         Male White non-Hispanic
5 R_5zj5CNu598lCwRX    Bachelors       55+         Male              Other
6 R_21mPGFS7kHX2ELm     Advanced       55+       Female White non-Hispanic

Using the anesrake package, I want to construct a new variable called weight that I can use to account for differences between the population and sample marginal distributions in later analyses.

But when I call the anesrake function like so (the pctlim argument is extremely small to exaggerate my point):

library(anesrake)

raking <- anesrake(inputter     = targets,
                   dataframe    = m1b,
                   caseid       = m1b$ResponseId,
                   choosemethod = "total",
                   type         = "pctlim",
                   pctlim       = 0.0000001)

I get the following error:

    Error in selecthighestpcts(discrep1, inputter, pctlim) : 
      No variables are off by more than 0.00001 percent using the method you have chosen, either weighting is unnecessary or a smaller pre-raking limit should be chosen.

This is objectively not true. Consider the quota_ed target, for example:

> targets[[4]]
  HS or less Some college    Bachelors     Advanced 
        0.41         0.29         0.19         0.11 
> wpct(m1b$quota_ed)
    Advanced    Bachelors   HS or less Some college 
   0.1614583    0.3645833    0.1666667    0.3072917

Any thoughts on what I'm doing wrong would be greatly appreciated. See this link to an RBloggers post for the routine I'm trying to emulate.

For the anesrake function to work, the following steps might be necessary:

  1. Convert your weighting variables to factors. Make sure that they don't contain empty levels.
  2. Exclude empty levels from your targets as well. E.g., suppose nobody aged 55+ were in your data. Then you should drop that level from both a) the quota_age target and b) your m1b data.
  3. The top level of your list also needs to be named with the specific column names that are supposed to be weighted, i.e. after your commands add: names(targets) <- c("quota_age", "quota_race", "quota_gender", "quota_ed").
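The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a definitive fix: m1b_demo is a hypothetical stand-in for m1b built from the age and gender values of the six rows shown in the question, targets_demo and levels_match are names invented for the example, and ordering the factor levels to follow the target names is a choice made here for readability rather than something the steps require.

```r
# Hypothetical stand-in for m1b: two columns from the six rows shown in the question.
m1b_demo <- data.frame(
  quota_age    = c("18 to 34", "55+", "35 to 54", "35 to 54", "55+", "55+"),
  quota_gender = c("Female", "Female", "Female", "Male", "Male", "Female"),
  stringsAsFactors = FALSE
)

quota_age    <- c("18 to 34" = 0.30, "35 to 54" = 0.33, "55+" = 0.37)
quota_gender <- c("Female" = 0.52, "Male" = 0.48)

# Step 3: name the list elements after the data frame columns they describe.
targets_demo <- list(quota_age = quota_age, quota_gender = quota_gender)

# Step 1: convert each weighting variable to a factor (levels ordered like the
# target names, an assumption of this sketch) and drop levels absent from the data.
for (v in names(targets_demo)) {
  m1b_demo[[v]] <- droplevels(factor(m1b_demo[[v]], levels = names(targets_demo[[v]])))
}

# Step 2: keep only the target categories that actually occur in the data.
targets_demo <- sapply(names(targets_demo), function(v) {
  targets_demo[[v]][levels(m1b_demo[[v]])]
}, simplify = FALSE)

# Check that factor levels and target names now line up for every variable.
levels_match <- sapply(names(targets_demo), function(v) {
  identical(levels(m1b_demo[[v]]), names(targets_demo[[v]]))
})
print(levels_match)
```

Once the real m1b columns and the full targets list have been prepared the same way, the anesrake call from the question can be retried with the named list passed as inputter.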
