简体   繁体   中英

predictions from nnet::multinom

I'm probably just missing something silly here but I can't seem to manually replicate the predicted values from this model. I'm following this example

library('foreign')
library('nnet')
library('tidyverse')
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml = ml %>% 
  mutate(prog2 = fct_relevel(prog, "academic"))

# Fit a very basic model of the students choice of program 
# as a function of their socioeconmic status and writing score:
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)

# If we wanted to calculate the probability of a high SES student
# with a median writing score picking a vocational program,
# we should be able to do this:
coef = summary(test)$coefficients[2, c(1, 3:4)]
log_odds = sum(coef * c(1, 1, median(ml$write)))
prob = exp(log_odds)/(1 + exp(log_odds))
prob

# from preditions:
ml %>% 
  bind_cols(as_tibble(predict(test, type = 'probs'))) %>% 
  filter(ses == 'high', write == median(write))

I'm getting 13.0% from my manual calculation and the predict function gives 10.8%. What did I miss?

The prediction of a multinomial logistic model on the link & response scale can be obtained as follows (key is that the inverse link function for multinomial is the softmax function, not the inverse logit):

library('foreign')
library('nnet')
library('tidyverse')
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml = ml %>% 
  mutate(prog2 = fct_relevel(prog, "academic"))

# Fit a very basic model of the students choice of program 
# as a function of their socioeconmic status and writing score:
fit <- multinom(prog2 ~ ses + write, data = ml)
summary(fit)

# model predictions on link scale
X <- as.matrix(data.frame('(Intercept)'=1, sesmiddle=0, seshigh=1, write=median(ml$write), check.names=F)) 
# or X <- model.matrix(fit) to use model matrix
betahat <- t(rbind(0, coef(fit))) # model coefficients, with expicit zero row added for reference category & transposed
preds_link <- X %*% betahat # predictions on the link scale (with explicit zero for reference level included here, sometimes this level is dropped)
colnames(preds_link) <- fit$lev

# model prediction on response / probability scale
softMax <- function(eta){ # softmax function = inverse link function for multinomial
   exp_eta <- exp(eta)
   return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
 }
preds_response <- softMax(preds_link)
preds_response
#      academic   general  vocation
# [1,] 0.721014 0.1710377 0.1079482

# this matches
ml %>% 
  bind_cols(as_tibble(predict(fit, type = 'probs'))) %>% 
  filter(ses == 'high', write == median(write))
# id female  ses schtyp     prog read write math science socst       honors awards cid    prog2 academic
# 71   90 female high public academic   42    54   50      50    52 not enrolled      1   8 academic 0.721014
# 92  130 female high public  general   43    54   55      55    46 not enrolled      1  10  general 0.721014
# 140  97   male high public academic   60    54   58      58    61 not enrolled      1  14 academic 0.721014
# 157  96 female high public academic   65    54   61      58    56 not enrolled      1  16 academic 0.721014
# general  vocation
# 71  0.1710377 0.1079482
# 92  0.1710377 0.1079482
# 140 0.1710377 0.1079482
# 157 0.1710377 0.1079482

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM