
Using Zelig for simulation

I am confused about the Zelig package, and in particular about the function sim. I want to estimate a logistic regression on a subset of my data and then compute fitted values for the remaining data to see how well the model performs out of sample. Some sample code follows:

library(Zelig)       # provides zelig(), setx(), sim() and the turnout data
library(data.table)  # provides data.table() and .N

data(turnout)
turnout <- data.table(turnout)

# Shuffle the data
turnout <- turnout[sample(.N, 2000)]

# Create a sample for regression
turnout_sample <- turnout[1:1800, ]

# Create a sample for out-of-sample testing
turnout_sample2 <- turnout[1801:2000, ]

# Run the regression
z.out1 <- zelig(vote ~ age + race, model = "logit", data = turnout_sample)

summary(z.out1)

Model:

Call:
z5$zelig(formula = vote ~ age + race, data = turnout_sample)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9394  -1.2933   0.7049   0.7777   1.0718

Coefficients:
            Estimate Std. Error z value   Pr(>|z|)
(Intercept) 0.028874   0.186446   0.155 0.876927
age         0.011830   0.003251   3.639 0.000274
racewhite   0.633472   0.142994   4.430 0.00000942

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2037.5  on 1799  degrees of freedom
Residual deviance: 2002.9  on 1797  degrees of freedom
AIC: 2008.9

Number of Fisher Scoring iterations: 4

Next step: Use 'setx' method

# Set the x values to the remaining 200 observations
x.out1 <- setx(z.out1, fn = NULL, data = turnout_sample2)

# Simulate
s.out1 <- sim(z.out1, x = x.out1)

# Get the fitted values
fitted <- s.out1$getqi("ev")

What I don't understand is that fitted now contains 1000 values, all between 0.728 and 0.799. 1. Why are there 1000 values when I am trying to estimate fitted values for 200 observations? 2. Why are the values so closely grouped?

I hope someone can help me with this.

Best regards

On the first question:
From the signature of sim, sim(obj, x = NULL, x1 = NULL, y = NULL, num = 1000, ...), you can see that the default number of simulations is 1000. If you want 200 simulations, set num = 200.

However, in the documentation example you are following, sim does not produce one fitted value per observation. It simulates the probability that a person will vote given one set of covariate values, which are either computed by setx or fixed explicitly, as in setx(z.out, race = "white").
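To make that concrete, here is a rough base-R sketch of the kind of simulation described above. It is not Zelig's actual implementation. It assumes the usual approach of drawing coefficient vectors from a multivariate normal centred on the estimates and evaluating the predicted probability at one fixed covariate profile. The data frame d is made-up stand-in data with the same column names as the question, and x_profile, beta_draws and ev are hypothetical names.

```r
# Hypothetical stand-in data: column names mirror the question, values are made up.
set.seed(1)
n <- 2000
d <- data.frame(
  age  = sample(18:90, n, replace = TRUE),
  race = factor(sample(c("others", "white"), n, replace = TRUE,
                       prob = c(0.15, 0.85)))
)
d$vote <- rbinom(n, 1, plogis(0.03 + 0.012 * d$age + 0.63 * (d$race == "white")))

# Ordinary logistic regression (base R glm, not Zelig).
fit <- glm(vote ~ age + race, family = binomial, data = d)

# One fixed covariate profile: intercept, mean age, race = "white".
x_profile <- c(1, mean(d$age), 1)

# Draw 1000 coefficient vectors from the estimated sampling distribution.
beta_draws <- MASS::mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# One expected probability per draw, all for the same profile.
ev <- as.vector(plogis(beta_draws %*% x_profile))

length(ev)  # number of simulated values equals the number of draws
range(ev)   # the spread reflects coefficient uncertainty only
```

The length of ev depends on the number of draws, not on the number of rows in the data, and every value refers to the same profile, which is why the values sit close together.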

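So in your case you get 1000 simulated probabilities between 0.728 and 0.799, which is the expected result: they are all draws for the same covariate values, so they differ only through the uncertainty in the estimated coefficients.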
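If the goal is one fitted probability per held-out observation, a minimal sketch with base R may be closer to what the question asks for. This swaps in glm and predict instead of Zelig and again uses made-up stand-in data, so the names d, train, test, p_hat and accuracy are hypothetical and the 0.5 cutoff is only an illustrative choice.

```r
# Hypothetical stand-in data: column names mirror the question, values are made up.
set.seed(2)
n <- 2000
d <- data.frame(
  age  = sample(18:90, n, replace = TRUE),
  race = factor(sample(c("others", "white"), n, replace = TRUE,
                       prob = c(0.15, 0.85)))
)
d$vote <- rbinom(n, 1, plogis(0.03 + 0.012 * d$age + 0.63 * (d$race == "white")))

# Shuffle, then split into 1800 estimation rows and 200 held-out rows.
d     <- d[sample(nrow(d)), ]
train <- d[1:1800, ]
test  <- d[1801:2000, ]

# Fit on the estimation sample only.
fit <- glm(vote ~ age + race, family = binomial, data = train)

# One predicted probability per held-out observation.
p_hat <- predict(fit, newdata = test, type = "response")
length(p_hat)

# A simple out-of-sample check: share of correct calls at a 0.5 cutoff.
accuracy <- mean(as.integer(p_hat > 0.5) == test$vote)
accuracy
```

Here p_hat has 200 entries, one per row of test, and they vary with each person's age and race rather than clustering around a single profile.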
