
How to decide if I need to use weights in regressions in R

I have a dataset that combines some survey results with some demographics. All survey results are normalized by population density. Now I want to build a model to examine the relationship between some of the variables. The model looks like this:

lm(log(violation+1) ~ Wighted.mean + mks + In_ct + Asian + Black + Hispanic + PopDen + MedHouseIncome, data = dt, weights = pop)

How can I decide whether the weights are useful here? When I remove them I get different coefficients and a lower R-squared. But I feel that is not enough to decide. Can anyone suggest how to decide?

Use summary(m.lm) and drop the weights with the smallest estimates (e.g. <10% of the variability) and the highest Pr(>|t|) values (e.g. > 0.05).

At a very high level:

If you have weights from a survey data set, they might be doing quite a few things, the most straightforward of which is allowing you to offset the survey's sampling scheme. For example, if women were over-sampled relative to men, then the weights would reflect this, and analyses that used them would be correct for the actual population's gender balance rather than the one in the data. In your case, they might be offsetting the standardization too.
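To make the over-sampling point concrete, here is a minimal sketch in base R. Everything in it is made up for illustration: the 50/50 population split, the 3:1 sample, the outcome values, and the variable names are assumptions, not anything from the question's data.

```r
# Hypothetical sample: 300 women and 100 men, although the (assumed)
# population is split 50/50, so women are over-sampled.
sex <- rep(c("F", "M"), times = c(300, 100))
y   <- ifelse(sex == "F", 10, 20)   # made-up outcome, constant within each group

# Design weight for each respondent = population share / sample share.
pop_share    <- c(F = 0.5,  M = 0.5)
sample_share <- c(F = 0.75, M = 0.25)
w <- unname(pop_share[sex] / sample_share[sex])

unweighted <- mean(y)              # describes the sample as collected
weighted   <- weighted.mean(y, w)  # describes the assumed 50/50 population

# An intercept-only weighted regression reproduces the weighted mean.
fit_w <- lm(y ~ 1, weights = w)

print(c(unweighted = unweighted, weighted = weighted))
print(coef(fit_w))
```

The unweighted mean is pulled toward the over-sampled group, while the weighted mean and the weighted intercept both target the population average. Which of the two is "right" depends on which quantity you want to estimate.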

In short, weights change your estimand (the quantity your estimation strategy is targeting). So if you care about the quantities your survey thinks you ought to care about, e.g. being 'representative' of a particular population, then you'd want to use its weights.

But things are, inevitably, more complicated than that. Weights can offset other features of the design, and they are perhaps less necessary when your model's covariates include the variable used to unbalance the sample, or when you want particular conditional effects.
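The covariate point can be illustrated with a small sketch, again on made-up data. It assumes a noise-free outcome whose slope differs by sex, a sample that over-samples women 3:1, and a 50/50 population; none of these come from the question. It shows one special case, not a general rule.

```r
# Made-up, noise-free data: 300 women and 100 men in the sample,
# with the same distribution of x in both groups.
d <- data.frame(
  sex = factor(rep(c("F", "M"), times = c(300, 100))),
  x   = c(rep(0:9, 30), rep(0:9, 10))
)
# The slope of y on x differs by sex: 1 for women, 3 for men.
d$y <- 1 + ifelse(d$sex == "F", 1, 3) * d$x
# Design weights: assumed population share / sample share.
d$w <- ifelse(d$sex == "F", 0.5 / 0.75, 0.5 / 0.25)

# Marginal model (sex omitted): the weights change the pooled slope.
slope_unw <- coef(lm(y ~ x, data = d))[["x"]]
slope_w   <- coef(lm(y ~ x, data = d, weights = w))[["x"]]

# Conditional model (sex included): the weights make no difference here.
cond_unw <- coef(lm(y ~ x * sex, data = d))
cond_w   <- coef(lm(y ~ x * sex, data = d, weights = w))

print(c(unweighted = slope_unw, weighted = slope_w))
print(rbind(unweighted = cond_unw, weighted = cond_w))
```

With sex left out, the pooled slope is an average of the two group slopes, and the weights decide how that average is formed. Once the model conditions on sex, both fits recover the same group-specific slopes in this toy setup. With real, noisy data and a misspecified model the two fits would generally still differ somewhat.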

The best advice is to take a look at the survey's variable codebook and see what it thinks the weights will do for you. (There may indeed be different weights for different purposes.) Then make your decisions on that basis. Certainly not on whether the model summaries look different with and without them.
