Which distribution family should I choose for my GLMM?

I am modelling the effect of individuals' chronotype on their school performance. My data frame consists of the subjective and school-declared performance (dependent variable) of middle school students (random variable) and their chronotype (independent variable). The students' subjective performance was measured on a scale from 1 to 5, while the school-declared performance was on a scale from 0 to 10. Thus, I Z-standardized these performance values.

# Loading data
data <- read.table("clipboard", header = TRUE)
# Scale function: intended as Z-score standardization.
# Note: with center = FALSE, scale() only divides by the root mean square;
# the mean is not subtracted.
data.st <- as.data.frame(scale(data$performance, center = FALSE))
data$performance.st <- data.st$V1
hist(data$performance.st, prob = TRUE, ylim = c(0, 1),
     main = "Histogram", col = "lightblue")

[Image: histogram of data$performance.st]

shapiro.test(data$performance.st)
#W = 0.96497, p-value = 6.798e-07

Then I modelled my data with a Poisson family:

# Poisson distribution
library(lme4)  # provides glmer()
glmer.poisson <- glmer(performance ~ chronotype + (1 | random),
                       family = poisson(link = log),
                       data = data)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: poisson  ( log )
Formula: performance ~ chronotype + (1 | random)
   Data: data
     AIC      BIC   logLik deviance df.resid 
     Inf      Inf     -Inf      Inf      310 
Random effects:
 Groups Name        Std.Dev.
 random (Intercept) 1       
Number of obs: 315, groups:  random, 55
Fixed Effects:
 (Intercept)  chronotypemm  chronotypemv  chronotypeve  
    -0.03279      -0.02495      -0.12318       0.01500  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 29610 optimizer warnings; 1 lme4 warnings

library(DHARMa)  # provides simulateResiduals()
plot(simulateResiduals(glmer.poisson))

[Image: simulated residual diagnostics for the Poisson model]

However, the GLMM clearly did not fit, and now I have some questions regarding my data and GLMMs.

First: I don't really know which family to use to model my data. Although I have non-integer values, I think Poisson is the right family to use (although a lognormal distribution provided nice results). But what about the warnings returned by the lme4 package saying that my values are non-integer? And the optimizer warnings? Should I change my optimizer? If so, how?
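On the "if so, how?" part: in lme4 the optimizer is selected through `glmerControl()`. The following is only a minimal sketch on simulated integer counts; the data frame `sim`, its columns, and the choice of BOBYQA with a larger evaluation budget are assumptions for illustration, not the data or settings from the question.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data: 55 students with 6 integer counts each (hypothetical).
sim <- data.frame(
  student    = factor(rep(1:55, each = 6)),
  chronotype = factor(rep(sample(c("in", "mm", "mv", "ve"), 55, replace = TRUE),
                          each = 6))
)
student_effect <- rnorm(55, sd = 0.3)
sim$count <- rpois(nrow(sim),
                   lambda = exp(1 + student_effect[as.integer(sim$student)]))

# Same model structure as in the question, but fitted with the BOBYQA optimizer
# and a larger budget of function evaluations than the default.
fit_bobyqa <- glmer(
  count ~ chronotype + (1 | student),
  family  = poisson(link = "log"),
  data    = sim,
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
)

fit_bobyqa
```

Switching optimizers is unlikely to help when the response is not a count: with `AIC = Inf` in the output above, the warnings most likely come from the non-integer response rather than from the optimizer itself.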

Second: A colleague suggested that I use gamma or negative binomial instead of Poisson. However, after seeing the histogram I thought a lognormal distribution would be worth a try. So I modelled accordingly:

#Gamma distribution
gamma <- glmer(performance ~ chronotype + (1|random), 
               family = Gamma(link="inverse"),
               data =  data)
qqnorm(resid(gamma), pch=16)
qqline(resid(gamma))
plot(gamma)

# Negative binomial distribution
glmer.nb <- glmer.nb(performance ~ chronotype + (1|random),
               data =  data, family=MASS::negative.binomial(theta=1.75))

plot(simulateResiduals(glmer.nb)) 
# The simulated residuals of the glmer.nb were very similar to the poisson model
plot(glmer.nb)
qqnorm(resid(glmer.nb), pch=16)
qqline(resid(glmer.nb))

# lognormal distribution
lognormal <- glmer(formula = log(performance) ~ chronotype + (1|random),
               data = data, family=gaussian(link = identity))
plot(simulateResiduals(lognormal))
plot(lognormal)

qqnorm(resid(lognormal), pch=16)
qqline(resid(lognormal))

The lognormal seems to be a good distribution for my data, but I am not sure.
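A Gaussian model with an identity link on `log(performance)` is a linear mixed model, so it can be fitted with `lmer()` directly; lme4 warns that calling `glmer()` with `family = gaussian` as a shortcut is deprecated. The sketch below uses simulated, strictly positive data with hypothetical column names. It assumes the response has no zeros, which may not hold for scores on a 0 to 10 scale.

```r
library(lme4)

set.seed(2)
# Simulated stand-in data with a strictly positive response (hypothetical names).
sim <- data.frame(
  student    = factor(rep(1:55, each = 6)),
  chronotype = factor(rep(sample(c("in", "mm", "mv", "ve"), 55, replace = TRUE),
                          each = 6))
)
student_effect <- rnorm(55, sd = 0.2)
sim$performance <- exp(1.5 + student_effect[as.integer(sim$student)] +
                         rnorm(nrow(sim), sd = 0.3))

# Linear mixed model on the log scale, i.e. a lognormal model for performance.
fit_lognormal <- lmer(log(performance) ~ chronotype + (1 | student), data = sim)

# Residual diagnostics on the log scale.
qqnorm(resid(fit_lognormal), pch = 16)
qqline(resid(fit_lognormal))
```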

Finally: Suppose that I am building a GLMM with a Poisson family. Should I standardize my dependent variable even though Poisson uses a log link function? I think the correct answer is yes since my data have different scales.
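One point that bears on this question: the Poisson distribution puts probability only on non-negative integers, so a standardized (non-integer) response has zero probability under it. The small sketch below, with made-up values, shows how a single non-integer observation drives the log-likelihood to `-Inf`, which would be consistent with the `Inf` AIC and `-Inf` logLik in the model output above.

```r
# Poisson log-density at an integer and at a non-integer value.
ll_integer     <- dpois(2, lambda = 1.5, log = TRUE)
ll_non_integer <- suppressWarnings(dpois(2.5, lambda = 1.5, log = TRUE))

ll_integer      # finite
ll_non_integer  # -Inf: a non-integer has probability zero under a Poisson

# One non-integer observation is enough to make the total log-likelihood -Inf.
y <- c(1, 3, 2.5, 0)  # made-up values
total_loglik <- sum(suppressWarnings(dpois(y, lambda = 1.5, log = TRUE)))
total_loglik
```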

Even though I'm a great Erlang/OTP fan, currently developing my application server ( http://code.google.com/p/tideland-eas/ ) in Erlang, I think it's not the right tool for you. Erlang is brilliant in the domain of concurrency, distribution, and reliability. But you need a tight integration into the Microsoft world. So maybe you should take a look at F# to get at least a kind of Erlang feeling here.

Plots are good.

We often learn a lot from visualizing the raw data. It's more effective than fitting an inappropriate model and then wondering whether the residual QQ plot looks normal "enough".

So I made a few plots of your data. The plots suggest a substantial revision of your analysis; choosing a different distribution family won't be sufficient to make sense of your data.

First, we look at histograms of performance scores by subject (mat, pt, science) and type (school-declared, subjective).

[Image: histograms of performance scores by subject and type]

The histograms show that the transformation

data$performance = scale(data$performance, center = FALSE)

doesn't make sense because it assumes subjective scores and school-declared scores have the same variance. This obviously doesn't hold.
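To see what the pooled transformation does, the sketch below compares it with scaling inside each measure. The data are simulated and the column names `type` and `score` are hypothetical. The within-measure version is shown only for comparison; it does not make the two measures interchangeable.

```r
set.seed(3)
# Simulated stand-in data: two performance measures on different scales.
sim <- data.frame(
  type  = rep(c("subjective", "school"), each = 100),
  score = c(sample(1:5, 100, replace = TRUE), round(runif(100, 0, 10), 1))
)

# Pooled scaling as in the question: one divisor for all rows, no centering.
sim$pooled <- as.numeric(scale(sim$score, center = FALSE))

# Scaling within each measure: centre and scale each type separately.
sim$within <- ave(sim$score, sim$type, FUN = function(x) as.numeric(scale(x)))

# Per-type mean and standard deviation under both approaches.
aggregate(cbind(pooled, within) ~ type, data = sim,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
```

After pooled scaling the two measures still differ in mean and spread, so the pooled column mixes two differently distributed variables.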

Once you standardize the scores, you ignore the fact that you are working with different measures of performance. By plotting subjective against school-declared scores, we see that the two measures are correlated only for math. There is little agreement between students' perception and teachers' evaluation of performance in pt and science.
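The agreement between the two measures can also be summarized numerically per subject. This is a sketch on simulated data; the wide layout with `school` and `subjective` columns is an assumption, and Spearman correlation is my choice here because the subjective score is an ordinal 1 to 5 scale.

```r
set.seed(4)
# Simulated stand-in data: one row per student and subject (hypothetical layout).
subjects <- c("mat", "pt", "science")
wide <- data.frame(
  student = rep(1:55, times = 3),
  subject = rep(subjects, each = 55),
  school  = round(runif(165, 0, 10), 1)
)
# Made-up subjective scores on a 1-5 scale, loosely tied to the school score.
wide$subjective <- pmin(5, pmax(1, round(1 + wide$school / 2.5 + rnorm(165))))

# Spearman correlation between the two measures, computed separately per subject.
cor_by_subject <- sapply(split(wide, wide$subject), function(d) {
  cor(d$school, d$subjective, method = "spearman")
})
cor_by_subject
```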

[Image: subjective vs. school-declared scores, by subject]

And finally, let's look at (school-declared) performance as a function of chronotype. There seems to be something going on, though it will be difficult to estimate the effect of chronotype with high precision, as 39 out of 55 students have the "in" chronotype; the other three types are rare. A lot of the difference is in the spread (variability) rather than the mean, except for pt, where "mv" seems to be associated with lower performance.
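The imbalance and the spread can be tabulated along these lines. The data are simulated: only the 39 of 55 students with the "in" chronotype comes from the description above, while the split among the other three types and all scores are made up.

```r
set.seed(5)
# Simulated stand-in data: 55 students, 39 of them with chronotype "in".
students <- data.frame(
  student    = 1:55,
  chronotype = rep(c("in", "mm", "mv", "ve"), times = c(39, 6, 5, 5))
)
scores <- merge(students,
                data.frame(student = rep(1:55, each = 3),
                           subject = rep(c("mat", "pt", "science"), times = 55)))
scores$school <- round(runif(nrow(scores), 0, 10), 1)

# Students per chronotype: count each student once, not once per row.
n_students <- table(students$chronotype)
n_students

# Mean and spread of school-declared performance by subject and chronotype.
aggregate(school ~ subject + chronotype, data = scores,
          FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
```

Counting students rather than rows matters here, because each student contributes several scores and row counts would overstate how much information there is about the rare chronotypes.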

[Image: school-declared performance by chronotype]

  1. If you want COM support, you had better work with a more Microsoft-friendly language
  2. The same
  3. The same, but Erlang has an ODBC interface that allows you to work with 'ordinary' SQL servers. I know it can handle MySQL; not sure about MSSQL

Anyway, you should think about some helper applications like 'print_pdf.exe' and 'change_word.exe' that would be managed by the Erlang system, rather than one Erlang application that does everything. Please read about C Nodes and the Erlang FAQ's question "What sort of applications is Erlang particularly suitable for?"

-- sorry for my English )
