
How do I use the glm() function?

I'm trying to fit a general linear model (GLM) to my data using R. I have a continuous variable Y and two categorical factors, A and B. Each factor is coded as 0 or 1, for absence or presence.

Just by looking at the data I can see a clear interaction between A and B, yet the GLM reports a p-value far above 0.05. Am I doing something wrong?

First of all I create the data frame with my data for the GLM, which consists of a dependent variable Y and two factors, A and B. These are two-level factors (0 and 1). There are 3 replicates per combination.

A<-c(0,0,0,1,1,1,0,0,0,1,1,1)
B<-c(0,0,0,0,0,0,1,1,1,1,1,1)
Y<-c(0.90,0.87,0.93,0.85,0.98,0.96,0.56,0.58,0.59,0.02,0.03,0.04)
my_data<-data.frame(A,B,Y)

Let's see what it looks like:

my_data
##    A B    Y
## 1  0 0 0.90
## 2  0 0 0.87
## 3  0 0 0.93
## 4  1 0 0.85
## 5  1 0 0.98
## 6  1 0 0.96
## 7  0 1 0.56
## 8  0 1 0.58
## 9  0 1 0.59
## 10 1 1 0.02
## 11 1 1 0.03
## 12 1 1 0.04
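
One way to summarise the table above is to compute the mean of Y in each A-by-B cell. The following is only a sketch in base R; the names cell_means, effect_A_B0, effect_A_B1 and interaction_contrast are introduced here for illustration and are not part of the original post.

```r
# Rebuild the data frame from the question so the snippet runs on its own
my_data <- data.frame(
  A = rep(c(0, 1, 0, 1), each = 3),
  B = rep(c(0, 1), each = 6),
  Y = c(0.90, 0.87, 0.93, 0.85, 0.98, 0.96,
        0.56, 0.58, 0.59, 0.02, 0.03, 0.04)
)

# Mean of Y in each A-by-B cell (rows: levels of A, columns: levels of B)
cell_means <- with(my_data, tapply(Y, list(A = A, B = B), mean))
print(round(cell_means, 3))

# Effect of A when B = 0 and when B = 1, on the raw (untransformed) scale
effect_A_B0 <- cell_means["1", "0"] - cell_means["0", "0"]
effect_A_B1 <- cell_means["1", "1"] - cell_means["0", "1"]

# If there were no interaction, the two effects would be about equal
interaction_contrast <- effect_A_B1 - effect_A_B0
print(interaction_contrast)
```

If the effect of A differs markedly between the two levels of B, that difference is the interaction described in the next paragraph.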

As we can see just by looking at the data, there is a clear interaction between factor A and factor B, as the value of Y decreases dramatically when both A and B are present (that is, A=1 and B=1). However, using the glm function I get no significant interaction between A and B, as the p-value is far above 0.05:

attach(my_data)
## The following objects are masked _by_ .GlobalEnv:
## 
##     A, B, Y


my_glm<-glm(Y~A+B+A*B,data=my_data,family=binomial)
## Warning: non-integer #successes in a binomial glm!
summary(my_glm)
## 
## Call:
## glm(formula = Y ~ A + B + A * B, family = binomial, data = my_data)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.275191  -0.040838   0.003374   0.068165   0.229196  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   2.1972     1.9245   1.142    0.254
## A             0.3895     2.9705   0.131    0.896
## B            -1.8881     2.2515  -0.839    0.402
## A:B          -4.1747     4.6523  -0.897    0.370
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7.86365  on 11  degrees of freedom
## Residual deviance: 0.17364  on  8  degrees of freedom
## AIC: 12.553
## 
## Number of Fisher Scoring iterations: 6

While you state that Y is continuous, the data show that Y is really a fraction, which is probably why you tried to apply a GLM in the first place.

Fractions (i.e. continuous values bounded by 0 and 1) can be modeled with logistic regression if certain assumptions are fulfilled. See the following Cross Validated post for details: https://stats.stackexchange.com/questions/26762/how-to-do-logistic-regression-in-r-when-outcome-is-fractional . However, from the data description it is not clear that those assumptions are fulfilled.
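
As a rough illustration of logistic regression on a fractional outcome, one common base-R approach is family = quasibinomial, which keeps the logit mean model but estimates the dispersion from the data rather than fixing it at 1. This is a sketch under the assumption that such a model is reasonable for these data; fit_quasi is a name introduced here.

```r
# Rebuild the data frame from the question so the snippet runs on its own
my_data <- data.frame(
  A = rep(c(0, 1, 0, 1), each = 3),
  B = rep(c(0, 1), each = 6),
  Y = c(0.90, 0.87, 0.93, 0.85, 0.98, 0.96,
        0.56, 0.58, 0.59, 0.02, 0.03, 0.04)
)

# Same logit mean model as family = binomial, but with the dispersion
# estimated from the data, so the standard errors are rescaled accordingly
fit_quasi <- glm(Y ~ A * B, family = quasibinomial, data = my_data)
summary(fit_quasi)

# Estimated dispersion; the binomial fit in the question assumed it was 1
summary(fit_quasi)$dispersion
```

The coefficient estimates should match those of the binomial fit in the question; only the standard errors and test statistics change, because they are scaled by the estimated dispersion.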

Alternatives for modeling fractions are beta regression and fractional response models.

See below how to apply those methods to your data. The results of both methods are consistent in terms of signs and significance.

# Beta regression
install.packages("betareg")
library("betareg")
result.betareg <-betareg(Y~A+B+A*B,data=my_data)
summary(result.betareg)

# Call:
#   betareg(formula = Y ~ A + B + A * B, data = my_data)
# 
# Standardized weighted residuals 2:
#   Min      1Q  Median      3Q     Max 
# -2.7073 -0.4227  0.0682  0.5574  2.1586 
# 
# Coefficients (mean model with logit link):
#   Estimate Std. Error z value Pr(>|z|)    
# (Intercept)   2.1666     0.2192   9.885  < 2e-16 ***
#   A             0.6471     0.3541   1.828   0.0676 .  
#   B            -1.8617     0.2583  -7.206 5.76e-13 ***
#   A:B          -4.2632     0.5156  -8.268  < 2e-16 ***
#   
#   Phi coefficients (precision model with identity link):
#   Estimate Std. Error z value Pr(>|z|)  
# (phi)    71.57      29.50   2.426   0.0153 *
#   ---
#   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
# 
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 24.56 on 5 Df
# Pseudo R-squared: 0.9626
# Number of iterations: 62 (BFGS) + 2 (Fisher scoring) 


# ----------------------------------------------------------


# Fractional response model
install.packages("frm")
library("frm")
frm(Y,cbind(A, B, AB=A*B),linkfrac="logit")


# *** Fractional logit regression model ***

#   Estimate Std. Error t value Pr(>|t|)    
# INTERCEPT  2.197225   0.157135  13.983    0.000 ***
#   A          0.389465   0.530684   0.734    0.463    
#   B         -1.888120   0.159879 -11.810    0.000 ***
#   AB        -4.174668   0.555642  -7.513    0.000 ***
#   
#   Note: robust standard errors
# 
# Number of observations: 12 
# R-squared: 0.992 

family=binomial implies logit (logistic) regression, which models a binary outcome.

From Quick-R:

Logistic Regression

Logistic regression is useful when you are predicting a binary outcome from a set of continuous predictor variables. It is frequently preferred over discriminant function analysis because of its less restrictive assumptions.

The data show an interaction. Try fitting a different model; logistic regression is not appropriate here.

with(my_data, interaction.plot(A, B, Y, fixed = TRUE, col = 2:3, type = "l"))

An analysis of variance shows clear significance for both factors and their interaction.

fit <- aov(Y~(A*B),data=my_data)
summary(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)    
A            1 0.2002  0.2002   130.6 3.11e-06 ***
B            1 1.1224  1.1224   732.0 3.75e-09 ***
A:B          1 0.2494  0.2494   162.7 1.35e-06 ***
Residuals    8 0.0123  0.0015                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
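
Since the question asks about glm() specifically, a possible equivalent is glm() with its default family = gaussian, which fits the same linear model as lm() and aov(). This is a minimal sketch, not something run in the answers above; fit_lm and fit_glm are names introduced here.

```r
# Rebuild the data frame from the question so the snippet runs on its own
my_data <- data.frame(
  A = rep(c(0, 1, 0, 1), each = 3),
  B = rep(c(0, 1), each = 6),
  Y = c(0.90, 0.87, 0.93, 0.85, 0.98, 0.96,
        0.56, 0.58, 0.59, 0.02, 0.03, 0.04)
)

# Ordinary linear model with main effects and the A:B interaction
fit_lm <- lm(Y ~ A * B, data = my_data)

# The same model through glm(); gaussian with identity link is the default
fit_glm <- glm(Y ~ A * B, family = gaussian, data = my_data)

# Both fits should give the same coefficient estimates
all.equal(coef(fit_lm), coef(fit_glm))

# Coefficient table with t tests, including the one for A:B
summary(fit_glm)
```

With 0/1 coding, the intercept is the mean of Y in the A=0, B=0 cell and the A:B coefficient is the difference between the effect of A at B=1 and at B=0.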
