
Predictive Performance from Poisson model in R

I'm taking a statistics class, and my teacher told us that the predictive performance (PP) is the squared difference between the values predicted by the model and the actual values. The closer the PP is to 0, the more accurate the model.

I've split the data frame: the first 75% of the observations are used to train the model, and I'll measure the PP on the last 25%. The last column of the data frame is my Y, a count variable. I built the model by forward selection with stepAIC, and this is the result:

bestmod1 <- glm(formula = y ~ Dalc + school + higher + Pstatus + romantic + 
                 goout + Mjob + schoolsup + reason + studytime + Fedu + failures + 
                 Walc + Fjob + traveltime + health + Medu + 
                 address + guardian + paid + freetime + sex, family = poisson, 
               data = x)

I've read a lot about predict(), but I didn't understand much of it. Could you please show me how to proceed? Thanks.

Could this be right?

p <- predict(bestmod1, newdata=df_test, type = "response")

And then, to evaluate the predictive performance:

sum(p - df_test$y)^2

What do you think?

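What your teacher seems to be referring to is known as the residual sum of squares. It is a well-known way to evaluate a prediction model, and dividing it by the number of observations gives the mean squared error. For a Poisson model, type = "response" in predict returns expected counts on the scale of y rather than values on the log (link) scale, so the predictions can be compared directly with the observed counts. (The Brier score is the analogous squared-error measure for predicted probabilities of a binary outcome, so it is not the right name for count predictions.)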
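As a small sketch of what type = "response" returns for a Poisson GLM, the example below fits a model to simulated data and compares the two prediction scales. The data frame `d` and the variables `x1` and `y` are made up for illustration; they are not the data from the question.

```r
set.seed(1)

# Simulated count data (illustrative only, not the asker's data frame)
d <- data.frame(x1 = rnorm(200))
d$y <- rpois(200, lambda = exp(0.5 + 0.3 * d$x1))

fit <- glm(y ~ x1, family = poisson, data = d)

# Linear predictor on the log (link) scale
eta <- predict(fit, newdata = d, type = "link")

# Expected counts on the scale of y
mu <- predict(fit, newdata = d, type = "response")

# With the default log link, the response-scale prediction is exp() of the link-scale one
same_scale <- isTRUE(all.equal(exp(eta), mu))
print(same_scale)
```

Because the squared error is computed against observed counts, the response scale is the one to use here.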

In general, you sum the squared differences rather than squaring the summed differences. So

sum(p - df_test$y)^2

should be altered to

sum((p - df_test$y)^2)
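To see why the parentheses matter, here is a minimal end-to-end sketch on simulated data, assuming a 75/25 split by row order as in the question. The names `d`, `x1`, `train` and `test` are hypothetical stand-ins for the asker's objects.

```r
set.seed(42)

# Simulated count data (illustrative only)
n <- 400
d <- data.frame(x1 = rnorm(n))
d$y <- rpois(n, lambda = exp(0.5 + 0.3 * d$x1))

# First 75% of rows for training, last 25% for testing
n_train <- floor(0.75 * n)
train <- d[seq_len(n_train), ]
test  <- d[(n_train + 1):n, ]

fit <- glm(y ~ x1, family = poisson, data = train)
p <- predict(fit, newdata = test, type = "response")

err <- p - test$y

# Squares the SUM of errors: positive and negative errors cancel first
squared_sum <- sum(err)^2

# Sums the SQUARED errors: the residual sum of squares on the test set
sse <- sum(err^2)

# Mean squared error, comparable across test sets of different sizes
mse <- mean(err^2)

print(c(squared_sum = squared_sum, sse = sse, mse = mse))
```

The first quantity can be close to zero even for a poor model, because over- and under-predictions offset each other before squaring; the second and third cannot.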

And to answer your question: yes, you seem to be using predict correctly.
