How Do You Use Post-Stratification Output to Influence Variables in a Predictive Model in R?

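My current data set oversamples women: they make up 74% of the 411 observations, when the split should be 50/50. How can I use post-stratification output to influence my (logistic regression) predictive model?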

Here is how I get new means and coefficients for Support when I change the number of women surveyed:

> library(foreign)
> library(survey)
> 
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> #Enter Actual Population Size
> mydata$fpc <- 1200
> 
> #Enter ID Column Name
> id <- mydata$My.ID
> 
> #Enter Column to Post-Stratify
> type <- mydata$Male
> 
> #Enter Column Variables
> x1 <- 0
> y1 <- 1
> 
> #Enter Corresponding Frequencies
> x2 <- 600
> y2 <- 600
> 
> #Enter the Variable of Interest
> mydata$interest <- mydata$Support
> 
> preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
> 
> ps.weights <- data.frame(type = c(x1,y1), Freq = c(x2, y2))
> 
> mydesign <- postStratify(preliminary.design, ~type, ps.weights)
> 
> #Print Original Mean of Variable of Interest
> mean(mydata$Support)
[1] 0.6666666667
> 
> #Total Actual Population Size
> sum(ps.weights$Freq)
[1] 1200
> 
> #Unweighted Observations Where the Variable of Interest is Not Missing
> unwtd.count(~interest, mydesign)
       counts SE
counts    411  0
> 
> #Print the Post-Stratified Mean and SE of the Variable
> svymean(~interest, mydesign)
               mean      SE
interest 0.71077946 0.01935
> 
> #Print the Weighted Total and SE of the Variable
> svytotal(~interest, mydesign)
             total       SE
interest 852.93535 23.21552
> 
> #Print the Mean and SE of the Interest Variable, by Type
> svyby(~interest, ~type, mydesign, svymean)
  type     interest            se
0    0 0.6196721311 0.02256768435
1    1 0.8018867925 0.03142947839
> 
> mysvyby <- svyby(~interest, ~type, mydesign, svytotal)
> 
> #Print the Coefficients of each Type
> coef(mysvyby)
          0           1 
371.8032787 481.1320755 
> 
> #Print the Standard Error of each Type
> SE(mysvyby)
[1] 13.54061061 18.85768704
> 
> #Print Confidence Intervals for the Coefficient Estimates
> confint(mysvyby)
        2.5 %      97.5 %
0 345.2641696 398.3423878
1 444.1716880 518.0924629
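
As a sanity check on the numbers above, the post-stratified totals and mean can be reproduced by hand in base R. This is only a sketch: the stratum counts are inferred from the reported sample size (411) and mean(Male) (0.2579075), and it assumes every respondent starts with the same design weight, so the post-stratification weight reduces to population count divided by sample count within each stratum.

```r
# Stratum sizes inferred from the reported summary statistics (an assumption)
n_total  <- 411
n_male   <- round(0.2579075 * n_total)   # 106 men
n_female <- n_total - n_male             # 305 women

# Target population counts from ps.weights
pop <- c(female = 600, male = 600)
n_s <- c(female = n_female, male = n_male)

# Post-stratification weight carried by each respondent in a stratum
w <- pop / n_s

# Stratum means of Support as printed by svyby(..., svymean)
support_rate <- c(female = 0.6196721311, male = 0.8018867925)

# Weighted totals per stratum; compare with coef(mysvyby)
totals <- support_rate * pop

# Post-stratified mean; compare with svymean(~interest, mydesign)
ps_mean <- sum(totals) / sum(pop)

print(w)
print(totals)
print(ps_mean)
```

Because both strata are set to 600, the post-stratified mean is simply the unweighted average of the two stratum means, which is why it moves from 0.667 toward the higher male support rate.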

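All of the output above looks correct, but I don't know how to use this data to influence the output of my logistic regression model. Here is the code without any post-stratification: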

> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> attach(mydata) 
> 
> # Define variables 
> 
> Y <- cbind(Support)
> X <- cbind(Black, vote, Male) 
> 
> # Descriptive statistics 
> 
> summary(Y) 
    Support         
 Min.   :0.0000000  
 1st Qu.:0.0000000  
 Median :1.0000000  
 Mean   :0.6666667  
 3rd Qu.:1.0000000  
 Max.   :1.0000000  
> 
> summary(X) 
     Black            vote                   Male          
 Min.   :0.0000000   Min.   : 0.8100   Min.   :0.0000000  
 1st Qu.:0.0000000   1st Qu.:24.0350   1st Qu.:0.0000000  
 Median :0.0000000   Median :47.6300   Median :0.0000000  
 Mean   :0.4355231   Mean   :48.0447   Mean   :0.2579075  
 3rd Qu.:1.0000000   3rd Qu.:72.1300   3rd Qu.:1.0000000  
 Max.   :1.0000000   Max.   :91.3200   Max.   :1.0000000  
> 
> table(Y) 
Y
  0   1 
137 274 
> 
> table(Y)/sum(table(Y)) 
Y
           0            1 
0.3333333333 0.6666666667 
> 
> 
> # Logit model coefficients 
> 
> logit<- glm(Y ~ X, family=binomial (link = "logit")) 
> 
> summary(logit) 

Call:
glm(formula = Y ~ X, family = binomial(link = "logit"))

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.1658288  -1.1277933   0.5904486   0.9190314   1.3256407  

Coefficients:
                  Estimate   Std. Error  z value   Pr(>|z|)    
(Intercept)    0.462496014  0.265017604  1.74515  0.0809584 .  
XBlack         1.329633506  0.244053422  5.44812 5.0904e-08 ***
Xvote         -0.008839950  0.004262016 -2.07412  0.0380678 *  
XMale          0.781144950  0.283218355  2.75810  0.0058138 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.21465  on 410  degrees of freedom
Residual deviance: 469.48706  on 407  degrees of freedom
AIC: 477.48706

Number of Fisher Scoring iterations: 4

> 
> # Logit model odds ratios 
> 
> exp(logit$coefficients) 
  (Intercept)        XBlack         Xvote         XMale 
 1.5880327947  3.7796579101  0.9911990073  2.1839713716 

Is there a way to combine these two scripts in R to update my logit model so that gender is treated as 50/50 when I predict, rather than 74% female / 26% male?

Thanks!

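Since you want to create predictions from the model, one possible solution is to (1) fit the logistic regression model to the data you have (i.e., 74% women and 26% men), and then (2) extract the predicted probabilities from the model with the gender variable set to 0.5. See ?predict.glm for more information.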
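
A minimal sketch of those two steps follows. The data frame is simulated as a stand-in for mydata.csv, so the coefficients used to generate it are made up and the printed numbers will not match the question's. The model is written with a formula and data = rather than glm(Y ~ X) on cbind() matrices, since predict() needs named columns to accept newdata. Step (2b) is an added variation, not part of the suggestion above: it averages the predictions for women and men with equal weight, which is not identical to plugging in Male = 0.5 because the logistic link is nonlinear.

```r
# Simulated stand-in for mydata.csv (hypothetical values, not the asker's data)
set.seed(1)
n <- 411
mydata <- data.frame(
  Black = rbinom(n, 1, 0.44),
  vote  = runif(n, 0, 100),
  Male  = rbinom(n, 1, 0.26)   # men under-represented, as in the question
)

# Made-up coefficients, used only to generate a plausible Support column
eta <- 0.5 + 1.3 * mydata$Black - 0.01 * mydata$vote + 0.8 * mydata$Male
mydata$Support <- rbinom(n, 1, plogis(eta))

# (1) Fit the logit model on the sample as it is
fit <- glm(Support ~ Black + vote + Male,
           data = mydata, family = binomial(link = "logit"))

# (2a) Predicted probabilities with Male fixed at 0.5 for every respondent
p_half <- predict(fit, newdata = transform(mydata, Male = 0.5),
                  type = "response")

# (2b) Variation: predict for Male = 0 and Male = 1, then weight them 50/50
p_female <- predict(fit, newdata = transform(mydata, Male = 0),
                    type = "response")
p_male   <- predict(fit, newdata = transform(mydata, Male = 1),
                    type = "response")
p_avg <- 0.5 * p_female + 0.5 * p_male

# Average predicted support under each treatment of gender
print(mean(p_half))
print(mean(p_avg))
```

If the goal is to have the post-stratification weights enter the fit itself rather than only the prediction step, the survey package also provides svyglm(), which fits a GLM against a design object such as mydesign. That route is not shown here.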
