How Do You Use Post-Stratification Output to Influence Variables in a Predictive Model in R?

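My current data set oversamples women: they make up 74% of the 411 respondents, when the split should be 50/50. How can I use post-stratification output to influence my (logistic regression) predictive model?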

This is how I obtain the new mean and the support coefficients after adjusting the number of women surveyed:

> library(foreign)
> library(survey)
> 
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> #Enter Actual Population Size
> mydata$fpc <- 1200
> 
> #Enter ID Column Name
> id <- mydata$My.ID
> 
> #Enter Column to Post-Stratify
> type <- mydata$Male
> 
> #Enter Column Variables
> x1 <- 0
> y1 <- 1
> 
> #Enter Corresponding Frequencies
> x2 <- 600
> y2 <- 600
> 
> #Enter the Variable of Interest
> mydata$interest <- mydata$Support
> 
> preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
> 
> ps.weights <- data.frame(type = c(x1,y1), Freq = c(x2, y2))
> 
> mydesign <- postStratify(preliminary.design, ~type, ps.weights)
> 
> #Print Original Mean of Variable of Interest
> mean(mydata$Support)
[1] 0.6666666667
> 
> #Total Actual Population Size
> sum(ps.weights$Freq)
[1] 1200
> 
> #Unweighted Observations Where the Variable of Interest is Not Missing
> unwtd.count(~interest, mydesign)
       counts SE
counts    411  0
> 
> #Print the Post-Stratified Mean and SE of the Variable
> svymean(~interest, mydesign)
               mean      SE
interest 0.71077946 0.01935
> 
> #Print the Weighted Total and SE of the Variable
> svytotal(~interest, mydesign)
             total       SE
interest 852.93535 23.21552
> 
> #Print the Mean and SE of the Interest Variable, by Type
> svyby(~interest, ~type, mydesign, svymean)
  type     interest            se
0    0 0.6196721311 0.02256768435
1    1 0.8018867925 0.03142947839
> 
> mysvyby <- svyby(~interest, ~type, mydesign, svytotal)
> 
> #Print the Coefficients of each Type
> coef(mysvyby)
          0           1 
371.8032787 481.1320755 
> 
> #Print the Standard Error of each Type
> SE(mysvyby)
[1] 13.54061061 18.85768704
> 
> #Print Confidence Intervals for the Coefficient Estimates
> confint(mysvyby)
        2.5 %      97.5 %
0 345.2641696 398.3423878
1 444.1716880 518.0924629
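
As a sanity check on the figures above, the post-stratified mean can be reproduced by hand from the per-group means: it is the average of the two group means weighted by the population counts (600/600), whereas the raw mean weights them by the sample counts. The sketch below hard-codes the values printed by `svyby()`. The per-group sample sizes (305 women, 106 men) are not printed in this transcript; they are inferred from n = 411 and the reported mean of `Male` (0.2579075), so treat them as an assumption.

```r
# Group means as printed by svyby(~interest, ~type, mydesign, svymean)
# (type 0 = female, type 1 = male)
grp_mean <- c(female = 0.6196721311, male = 0.8018867925)

# Sample counts: inferred (assumption) from n = 411 and mean(Male) = 0.2579075
n_sample <- c(female = 305, male = 106)

# Population counts supplied to postStratify()
N_pop <- c(female = 600, male = 600)

# Raw sample mean: groups weighted by how often they were sampled
unweighted <- sum(grp_mean * n_sample) / sum(n_sample)

# Post-stratified mean: groups weighted by their population share
poststrat <- sum(grp_mean * N_pop) / sum(N_pop)

# Post-stratified totals per group, comparable to coef(mysvyby)
grp_total <- grp_mean * N_pop

print(c(unweighted = unweighted, poststrat = poststrat))
print(grp_total)
```

The two results should agree with `mean(mydata$Support)` and `svymean(~interest, mydesign)` respectively, up to rounding of the printed inputs.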

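All of the output above looks correct, but I don't know how to use it to influence the output of my logistic regression model. Here is the code without any post-stratification applied: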

> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> attach(mydata) 
> 
> # Define variables 
> 
> Y <- cbind(Support)
> X <- cbind(Black, vote, Male) 
> 
> # Descriptive statistics 
> 
> summary(Y) 
    Support         
 Min.   :0.0000000  
 1st Qu.:0.0000000  
 Median :1.0000000  
 Mean   :0.6666667  
 3rd Qu.:1.0000000  
 Max.   :1.0000000  
> 
> summary(X) 
     Black            vote                   Male          
 Min.   :0.0000000   Min.   : 0.8100   Min.   :0.0000000  
 1st Qu.:0.0000000   1st Qu.:24.0350   1st Qu.:0.0000000  
 Median :0.0000000   Median :47.6300   Median :0.0000000  
 Mean   :0.4355231   Mean   :48.0447   Mean   :0.2579075  
 3rd Qu.:1.0000000   3rd Qu.:72.1300   3rd Qu.:1.0000000  
 Max.   :1.0000000   Max.   :91.3200   Max.   :1.0000000  
> 
> table(Y) 
Y
  0   1 
137 274 
> 
> table(Y)/sum(table(Y)) 
Y
           0            1 
0.3333333333 0.6666666667 
> 
> 
> # Logit model coefficients 
> 
> logit<- glm(Y ~ X, family=binomial (link = "logit")) 
> 
> summary(logit) 

Call:
glm(formula = Y ~ X, family = binomial(link = "logit"))

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.1658288  -1.1277933   0.5904486   0.9190314   1.3256407  

Coefficients:
                  Estimate   Std. Error  z value   Pr(>|z|)    
(Intercept)    0.462496014  0.265017604  1.74515  0.0809584 .  
XBlack         1.329633506  0.244053422  5.44812 5.0904e-08 ***
Xvote         -0.008839950  0.004262016 -2.07412  0.0380678 *  
XMale          0.781144950  0.283218355  2.75810  0.0058138 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.21465  on 410  degrees of freedom
Residual deviance: 469.48706  on 407  degrees of freedom
AIC: 477.48706

Number of Fisher Scoring iterations: 4

> 
> # Logit model odds ratios 
> 
> exp(logit$coefficients) 
  (Intercept)        XBlack         Xvote         XMale 
 1.5880327947  3.7796579101  0.9911990073  2.1839713716 

Is there a way to combine these two scripts in R to update my logit model so that, when I predict, gender is treated as 50/50 rather than 74% female / 26% male?

Thanks!

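Since you want to create predictions from the model, one possible solution is to (1) fit the logistic regression model to the data you have (i.e., 74% women and 26% men), and then (2) extract the predicted probabilities with the gender variable set to 0.5. For more information, see ?predict.glm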
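
A minimal sketch of those two steps follows. It runs on synthetic data, since `mydata.csv` is not available; the simulated columns and the coefficients used to generate `Support` are assumptions loosely based on the fitted model in the question. It also refits the model with a formula and `data =`, because the `Y ~ X` matrix form used in the question makes `newdata` awkward to supply to `predict()`.

```r
# Synthetic stand-in for mydata.csv (assumption: real data not available)
set.seed(1)
n <- 411
mydata <- data.frame(
  Black = rbinom(n, 1, 0.44),
  vote  = runif(n, 1, 91),
  Male  = rbinom(n, 1, 0.26)   # men under-represented, as in the question
)
# Simulate the outcome from illustrative coefficients (assumed values)
eta <- 0.46 + 1.33 * mydata$Black - 0.009 * mydata$vote + 0.78 * mydata$Male
mydata$Support <- rbinom(n, 1, plogis(eta))

# (1) Fit the logit model on the sample as observed
logit <- glm(Support ~ Black + vote + Male, data = mydata,
             family = binomial(link = "logit"))

# (2) Predicted probabilities with Male fixed at 0.5 for every respondent
p_half <- predict(logit, newdata = transform(mydata, Male = 0.5),
                  type = "response")

# Alternative reading of "50/50": average the female and male predictions
p_female <- predict(logit, newdata = transform(mydata, Male = 0),
                    type = "response")
p_male   <- predict(logit, newdata = transform(mydata, Male = 1),
                    type = "response")
p_avg    <- 0.5 * p_female + 0.5 * p_male

print(c(male_at_0.5 = mean(p_half), avg_of_0_and_1 = mean(p_avg)))
```

Setting `Male = 0.5` averages the two genders on the log-odds scale, while `p_avg` averages them on the probability scale. Because the inverse logit is nonlinear, the two are generally close but not identical, so it is worth deciding which one matches the intended meaning of a 50/50 population.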

