Multinomial regression: different results on the same dataset in R vs SPSS (nnet package, multinom function)
Recently, I had to work with both R and SPSS to analyze a dataset within a multinomial regression framework. We surveyed some participants (10-12 years old) and asked which "professional field" they like the most, then asked how often they access the internet. So the outcome is a categorical variable, professional field ("Military", "I don't know", and "Other profession"), and the independent variable is also categorical: how often do you access the internet ("I don't have access", "1-3 hours/day", "3-5 hours/day").
I ran a model in R (with the nnet package, via the multinom function), and another statistician ran it in SPSS. All reference categories were defined correctly.
Now, when we compare the results, they don't agree for the second category of my independent variable. The first one is OK.
Please have a look at the entire code:
library(tidyverse)
library(stargazer)
library(nnet)
# Predictor: make it a factor and set the reference level
ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))
# Outcome: make it a factor and set the reference level
ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(field = relevel(field, ref = "I dont know"))
# Fit the multinomial logit with a tight tolerance and a generous iteration cap
mod <- multinom(field ~ internet, data = ds, maxit = 1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')
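Since the comparison depends on both programs using the same baselines, it may be worth checking which level the model actually treats as the reference. The following is a minimal sketch in base R on a made-up factor (the object `field_toy` is hypothetical and not part of the dataset), showing that the first level is the reference and that `relevel()` only takes effect on the object it is assigned to.

```r
# Hypothetical toy factor, only to illustrate how reference levels behave
field_toy <- factor(c("Military", "I dont know", "Other profession", "I dont know"))

# By default the levels are sorted, so the first one is the reference
print(levels(field_toy))

# relevel() returns a new factor; the result must be assigned back
field_toy <- relevel(field_toy, ref = "Military")

# The first level is the baseline that multinom() would use for the outcome
print(levels(field_toy)[1])
```

Running `levels()` on the outcome and on the predictor just before fitting gives a quick confirmation that R and SPSS are set to the same baselines.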
For the sake of clarity: when the independent variable has only two categories (such as sex, male and female), R and SPSS give the same results.
After a huge effort trying to understand the discrepancy between the two sets of results, I read that nnet estimation can have some problems (an optimization problem?), and that the discrepancy is not as strange as I thought at the beginning.
Can someone explain to me what's going on here? What am I missing?! I assume SPSS and R must give the same results if we are running the same model.
Thank you
This is the ds I'm using in this example:
ds <- structure(list(sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 1L), .Label = c("male", "female"), class = "factor"), internet = structure(c(3L,
3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L,
2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L,
2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access",
"1-3 hours/day", "3-5 hours/day"), class = "factor"), field = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military",
"Other profession"), class = "factor")), class = "data.frame", row.names = c(NA,
-73L))
Alternatively, you could use mlogit, which matches the SPSS results more closely. The SPSS values should be valid, since Stata yields similar results (-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)). The remaining deviations might stem from the "Other profession" coefficients, which are estimated with ridiculously large standard errors.
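library(mlogit)
# Reshape to the format mlogit expects; each row of ds is one respondent's choice
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
# No alternative-specific variables (1); internet is an individual-specific predictor
ml <- mlogit(field ~ 1 | internet, data=ml.dat)
which yields: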
texreg::screenreg(ml)
=========================================================
                                                  Model 1
---------------------------------------------------------
Military:(intercept)                                -0.41
                                                    (0.91)
Other profession:(intercept)                       -16.89
                                                 (2690.89)
Military:factor(internet)1-3 hours/day              -1.50
                                                    (1.06)
Other profession:factor(internet)1-3 hours/day      13.60
                                                 (2690.89)
Military:factor(internet)3-5 hours/day              -1.64
                                                    (1.06)
Other profession:factor(internet)3-5 hours/day      13.46
                                                 (2690.89)
---------------------------------------------------------
AIC                                                 85.49
Log Likelihood                                     -36.74
Num. obs.                                              73
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
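One plausible reading of the enormous "Other profession" standard errors, offered as a sketch rather than a confirmed diagnosis: with a single categorical predictor the model is saturated, so each maximum-likelihood coefficient is a log ratio of cell counts. An empty cell in the reference row pushes the corresponding intercept toward minus infinity, and each program simply stops at a different large value. The sums stay comparable, though: -16.89 + 13.60 = -3.29 here, against -14.88 + 11.58 = -3.30 for Stata (assuming the first Stata value is the intercept). The data frame `toy` below is made up for illustration and is not the ds from the question.

```r
# Hypothetical toy data with one empty cell: nobody without internet access
# chose "other". These are NOT the survey data from the question.
toy <- data.frame(
  internet = factor(
    c(rep("no access", 5), rep("1-3 h", 6), rep("3-5 h", 6)),
    levels = c("no access", "1-3 h", "3-5 h")
  ),
  field = factor(
    c("dont know", "dont know", "dont know", "military", "military",
      "dont know", "dont know", "dont know", "dont know", "military", "other",
      "dont know", "dont know", "dont know", "dont know", "military", "other"),
    levels = c("dont know", "military", "other")
  )
)

# Cross-tabulate predictor by outcome and locate empty cells
tab <- table(toy$internet, toy$field)
print(tab)
zero_cells <- which(tab == 0, arr.ind = TRUE)
print(zero_cells)

# Empirical log-odds of each outcome against the baseline "dont know",
# computed per internet group. In a saturated multinomial logit the first
# row corresponds to the intercepts; an empty cell gives -Inf.
log_odds <- log(tab[, c("military", "other")] / tab[, "dont know"])
print(log_odds)
```

Running `table(ds$internet, ds$field)` on the real data is a cheap way to see whether the same situation applies; if it does, only the well-populated contrasts (here the "Military" rows) can be expected to agree across packages.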