Multinomial regression (different results, same dataset, R vs SPSS): nnet package, multinom function

Recently, I had to work with R and SPSS to analyze a dataset in a multinomial regression framework. We surveyed some participants (10-12 years old), asking which "professional field" they like the most and how often they access the internet. So the outcome is a categorical variable, professional field ("Military", "I don't know", and "Other profession"), and the independent variable is also categorical: how often they access the internet ("I don't have access", "1-3 hours/day", "3-5 hours/day").

I ran the model in R (nnet package, via the multinom function), and another statistician ran it in SPSS. All reference categories were defined correctly.

Now, when we compare the results, they don't agree for the second category of my independent variable. The first one is OK.

Please have a look at the entire code:

library(tidyverse)
library(stargazer)
library(nnet)

ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))

ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(field = relevel(field, ref = "I dont know"))

mod <- multinom(field ~ internet, data = ds, maxit=1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')

And the SPSS result:

[image: SPSS result]

For the sake of clarity: when the independent variable has only two categories (such as sex, male and female), R and SPSS agree on the results.

[image: SPSS result 2]

After a huge effort trying to understand the discrepancy between the two results, I read that nnet's estimation can have some problems (an optimization problem?), and that the discrepancy may not be as strange as I thought at the beginning.

Can someone explain to me what's going on here? What am I missing? I assumed SPSS and R must give the same results if we are running the same model.

Thank you

This is the ds I'm using in this example:

ds <- structure(list(
  sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
    2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
    1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
    2L, 1L), .Label = c("male", "female"), class = "factor"),
  internet = structure(c(3L,
    3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L,
    2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L,
    2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L,
    2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L,
    2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access",
    "1-3 hours/day", "3-5 hours/day"), class = "factor"),
  field = structure(c(1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
    1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military",
    "Other profession"), class = "factor")),
  class = "data.frame", row.names = c(NA, -73L))

Alternatively, you could use mlogit, whose results resemble the SPSS ones more closely. The SPSS values should be valid, since Stata yields similar results (-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)). The remaining deviations might stem from how poorly the "Other profession" coefficients are determined: their standard errors are enormous, so they are nowhere near significant.
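The instability of the "Other profession" terms can be seen from a plain cross-tabulation. The sketch below is an illustration rather than part of the original analysis: the cell counts are hard-coded from tabulating the posted `ds` (what `table(ds$internet, ds$field)` should return), and the names `counts`, `log_odds`, `intercepts` and `slopes` are introduced here only for the example. With a single categorical predictor, the multinomial logit coefficients are just differences of observed log-odds, so they can be computed by hand.

```r
# Cell counts tabulated from the posted `ds` (73 respondents);
# rows are internet use, columns are the chosen field.
counts <- matrix(
  c( 3, 2, 0,
    27, 4, 1,
    31, 4, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    internet = c("I dont have internet access", "1-3 hours/day", "3-5 hours/day"),
    field    = c("I dont know", "Military", "Other profession")
  )
)

# Observed log-odds of each field against the reference field "I dont know"
log_odds <- log(counts[, -1] / counts[, "I dont know"])

# Intercepts: log-odds in the reference internet category
intercepts <- log_odds["I dont have internet access", ]

# Slopes: change in log-odds relative to the reference internet category
slopes <- sweep(log_odds[-1, ], 2, intercepts)

print(counts)
print(round(log_odds, 2))
print(round(slopes, 2))
```

The "Military" values reproduce the mlogit estimates in this answer (-0.41, -1.50, -1.64). For "Other profession" the reference cell is empty (none of the five respondents without internet access chose it), so the observed log-odds there is `-Inf` and no finite maximum likelihood estimate exists. That would be consistent with each program stopping at a different large negative intercept with a huge standard error.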

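library(mlogit)
# Reshape to the long format that mlogit expects (one row per alternative)
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
# internet is an individual-specific covariate, so it goes after the "|"
ml <- mlogit(field ~ 1 | internet, data=ml.dat)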

Yielding

texreg::screenreg(ml)
=========================================================
                                                Model 1  
---------------------------------------------------------
Military:(intercept)                               -0.41 
                                                   (0.91)
Other profession:(intercept)                      -16.89 
                                                (2690.89)
Military:factor(internet)1-3 hours/day             -1.50 
                                                   (1.06)
Other profession:factor(internet)1-3 hours/day     13.60 
                                                (2690.89)
Military:factor(internet)3-5 hours/day             -1.64 
                                                   (1.06)
Other profession:factor(internet)3-5 hours/day     13.46 
                                                (2690.89)
---------------------------------------------------------
AIC                                                85.49 
Log Likelihood                                    -36.74 
Num. obs.                                          73    
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
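If the programs are fitting the same model, the log-likelihood and the fitted probabilities should agree even where individual coefficients do not. A minimal sketch of that check with nnet follows. It assumes the cell counts below, tabulated from the posted `ds`, and rebuilds an equivalent 73-row data frame from them; `counts`, `cells`, `ds2`, `p_hat`, `ll_max` and `probs` are names made up for this illustration.

```r
library(nnet)

# Cell counts tabulated from the posted `ds` (rows: internet, columns: field)
counts <- matrix(
  c( 3, 2, 0,
    27, 4, 1,
    31, 4, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    internet = c("I dont have internet access", "1-3 hours/day", "3-5 hours/day"),
    field    = c("I dont know", "Military", "Other profession")
  )
)

# Rebuild one row per respondent; factor levels keep the dimnames order,
# so the first level of each factor is the reference category.
cells <- as.data.frame(as.table(counts))
ds2 <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("internet", "field")]

# Same call as in the question, with the iteration trace switched off
fit <- multinom(field ~ internet, data = ds2,
                maxit = 1000, reltol = 1.0e-9, trace = FALSE)

# Highest attainable log-likelihood for this model: the observed row proportions
p_hat  <- prop.table(counts, margin = 1)
ll_max <- sum(counts[counts > 0] * log(p_hat[counts > 0]))

# Fitted probabilities for the three internet categories
newdata <- data.frame(internet = factor(rownames(counts), levels = rownames(counts)))
probs <- predict(fit, newdata = newdata, type = "probs")

print(c(nnet = as.numeric(logLik(fit)), saturated = ll_max))
print(round(probs, 3))
print(round(p_hat, 3))
```

Here `ll_max` works out to about -36.74, the log-likelihood in the mlogit table above. If the nnet fit lands close to the same value and to the same probabilities, the packages agree on the fit, and the differing "Other profession" coefficients only reflect where each optimizer stopped.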
