Visualizing GLMM predictions with interactions between categorical and continuous variables

Question

I am working in R with a GLMM that has a mixture of continuous and categorical variables and some interactions. I have used the dredge and model.avg functions in MuMIn to obtain effect estimates for each variable. My problem is how best to plot the results. I want to make a figure showing the effect of one variable (Forest) on my data, where the trendline reflects the Forest parameter estimate, but I can't figure out how to hold the categorical variables and interaction terms at their 'average' so that the trendline reflects only the effect of Forest.

Here's the model and plot set-up:

#load packages and document
cuckoo<-read.table("http://www.acsu.buffalo.edu/~ciaranwi/home_range.txt", 
header=T,sep="\t")
require(lme4)
require(MuMIn)
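# as.factor() returns a new vector, so assign the result back to the column
cuckoo$ID <- as.factor(cuckoo$ID)
cuckoo$Sex <- as.factor(cuckoo$Sex)
cuckoo$MS_bin <- as.factor(cuckoo$MS_bin)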
options(na.action = "na.fail")

# create global model and fit
fm<- lmer(log(KD_95)~ MS_bin + Forest + NDVI + Sex + Precip + MS_bin*Forest 
+ MS_bin*NDVI  + MS_bin*Sex + MS_bin*Precip + Argos + Sample + (1|ID), data 
= cuckoo, REML = FALSE)

# dredge but always include argos and sample
KD95<-dredge(fm,fixed=c("Argos","Sample"))

# model averaging 
avgmod<-model.avg(KD95, fit=TRUE)
summary(avgmod)

#plot data
plot(cuckoo$Forest, (log(cuckoo$KD_95)),
 xlab = "Mean percentage of forest cover",
 ylab = expression(paste(plain("Log of Kernel density estimate, 95%    
utilisation, km"^{2}))),
 pch = c(15,17)[as.numeric(cuckoo$MS_bin)],  
 main = "Forest cover",
 col="black", 
 ylim=c(14,23))
legend(80,22, c("Breeding","Nonbreeding"), pch=c(15, 17),  cex=0.7)

Then I get stuck on how to include a trendline. So far I have:

#parameter estimates from model.avg
argos_est<- -1.6
MS_est<- -1.77
samp_est<-0.01
forest_est<--0.02
sex_est<-0.0653
precip_est<-0.0004
ndvi_est<--0.00003
model_intercept<-22.7

#calculate mean values for parameters
argos_mean<-mean(cuckoo$Argos)
samp_mean<-mean(cuckoo$Sample)
forest_mean<-mean(cuckoo$Forest)
ndvi_mean<-mean(cuckoo$NDVI)
precip_mean<-mean(cuckoo$Precip)

#calculate the intercept and add trend line
intercept<-(model_intercept + (forest_est*cuckoo$Forest) +    
(argos_est*argos_mean) + (samp_est * samp_mean) + (ndvi_est*ndvi_mean) +  
(precip_est*precip_mean) )

abline(intercept, forest_est)

But this doesn't consider the interactions or the categorical variables, and the intercept looks way too high. Any ideas?

Answer 1

In terms of process, you can make your coding much easier by taking advantage of the fact that R stores a lot of information about the model in the model object and has functions to get that information out. For example, coef(avgmod) will give you the model coefficients and predict(avgmod) will give you the model's predictions for each observation in the data frame you used to fit the model.
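
As a minimal illustration of those two extractor functions, the sketch below fits an ordinary lm to simulated data. This is a stand-in for the averaged mixed model in the question, and the names x, g, and y are made up for the example; the same coef() and predict() calls are what the rest of this answer applies to avgmod.

```r
# Simulated stand-in data; x, g, and y are hypothetical names
set.seed(1)
d <- data.frame(x = runif(40, 0, 100),
                g = factor(rep(c("a", "b"), each = 20)))
d$y <- 20 - 0.02 * d$x + ifelse(d$g == "b", -1.5, 0) + rnorm(40, sd = 0.3)

fit <- lm(y ~ x * g, data = d)

# Named vector of coefficients, including the x:gb interaction term
cf <- coef(fit)
print(names(cf))

# With no newdata argument, predict() returns one value per fitted row
p <- predict(fit)
print(length(p))
```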

To visualize predictions for specific combinations of data values, create a new data frame that holds the variables we want to keep constant at their means, along with a range of values for the variables we want to vary (like Forest). expand.grid creates a data frame with all combinations of the values listed below.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex="M", Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))
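
To get a feel for how quickly the grid grows, here is a small sketch with made-up level labels (the real levels of MS_bin and Sex in the cuckoo data may differ). Every argument with more than one value multiplies the row count, so passing all the IDs as above multiplies it again by the number of individuals.

```r
# Hypothetical levels, chosen only to show how the combinations multiply
grid <- expand.grid(Forest = seq(0, 100, 10),   # 11 values
                    MS_bin = c(0, 1),           # 2 values
                    Sex = c("F", "M"))          # 2 values

# Number of rows is the product of the lengths: 11 * 2 * 2
n_rows <- nrow(grid)
print(n_rows)
print(head(grid))
```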

Now we use the predict function to add predictions for log(KD_95) to this data frame. predict takes care of calculating the model predictions for whatever data you feed it (assuming you give it a data frame that includes all the variables in your model).

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)
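
To show what predict is doing with an interaction, the sketch below rebuilds one line by hand and checks it against predict. It uses lm on simulated data with hypothetical names (forest, season, y), not the averaged model from the question. The point is that the factor's main effect shifts the intercept and the interaction term shifts the slope, which is what the manual abline attempt in the question leaves out.

```r
# Simulated stand-in data; forest, season, and y are hypothetical names
set.seed(42)
d <- data.frame(forest = runif(60, 0, 100),
                season = factor(rep(c("breeding", "nonbreeding"), each = 30)))
d$y <- 20 - 0.02 * d$forest - 1.8 * (d$season == "nonbreeding") +
  0.01 * d$forest * (d$season == "nonbreeding") + rnorm(60, sd = 0.3)

fit <- lm(y ~ forest * season, data = d)
b <- coef(fit)

# Line for the reference level ("breeding")
int_ref <- b[["(Intercept)"]]
slope_ref <- b[["forest"]]

# Line for the other level: main effect moves the intercept,
# interaction moves the slope
int_non <- int_ref + b[["seasonnonbreeding"]]
slope_non <- slope_ref + b[["forest:seasonnonbreeding"]]

newd <- data.frame(forest = c(0, 50, 100),
                   season = factor("nonbreeding", levels = levels(d$season)))
by_hand <- int_non + slope_non * newd$forest
by_predict <- unname(predict(fit, newdata = newd))

# The hand-built line and predict() should agree
print(all.equal(by_hand, by_predict))
```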

Now we plot the results. geom_point plots the points, as in your original plot, then geom_line adds the predictions for each level of MS_bin (with Sex="M").

library(ggplot2)

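ggplot() +
  # size is a fixed setting, so it goes outside aes()
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=factor(MS_bin), 
             colour=factor(MS_bin)), size=3) +
  # the prediction column was named lgKD_95_pred above
  geom_line(data=pred.data, aes(Forest, lgKD_95_pred, colour=factor(MS_bin)))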

Here's the result: [figure: log(KD_95) against Forest, with one predicted line per level of MS_bin]

UPDATE: To plot regression lines for both males and females, include both sexes in pred.data (Sex=unique(cuckoo$Sex)) and add Sex as an aesthetic in the plot. In the example below, I use different shapes to mark Sex when plotting the points and different line types to mark Sex for the regression lines.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex=unique(cuckoo$Sex), Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)

ggplot() +
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=Sex, 
                              colour=factor(MS_bin)), size=3) +
  geom_line(data=pred.data, aes(Forest, lgKD_95_pred, linetype=Sex, 
                                colour=factor(MS_bin)))
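
If you would rather have a single line per season that is averaged over Sex, one option is to predict for every level and then take the mean of the predictions. The sketch below does this with lm on simulated data and hypothetical names (forest, season, sex, y); it weights the two sexes equally, which is an assumption and may not match the sex ratio in the real data.

```r
# Simulated stand-in data; all column names are hypothetical
set.seed(7)
d <- data.frame(forest = runif(80, 0, 100),
                season = factor(sample(c("breeding", "nonbreeding"), 80, replace = TRUE)),
                sex = factor(sample(c("F", "M"), 80, replace = TRUE)))
d$y <- 20 - 0.02 * d$forest - 1.5 * (d$season == "nonbreeding") +
  0.3 * (d$sex == "M") + rnorm(80, sd = 0.3)

fit <- lm(y ~ forest * season + sex * season, data = d)

# Predict for every forest/season/sex combination
grid <- expand.grid(forest = seq(0, 100, 10),
                    season = levels(d$season),
                    sex = levels(d$sex))
grid$pred <- predict(fit, newdata = grid)

# Average over sex with equal weight: one value per forest/season pair
avg <- aggregate(pred ~ forest + season, data = grid, FUN = mean)
print(nrow(avg))
```

The avg data frame can then be passed to geom_line in place of pred.data, with colour mapped to season.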

Answer 2

I hope I am not missing the point, but if you want a linear trend you don't have to calculate everything manually; you can take what you plotted and fit a y~x linear regression model, like this:

model = lm(log(cuckoo$KD_95)~cuckoo$Forest)

model

# Call:
#   lm(formula = log(cuckoo$KD_95) ~ cuckoo$Forest)
# 
# Coefficients:
#   (Intercept)  cuckoo$Forest  
#      17.13698       -0.01461 

abline(17.13698 ,  -0.01461, col="red")

The red line uses the intercept and slope from the regression fit. The black line is your manual process.
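
As a small variation, abline also accepts the fitted lm object directly, which avoids copying rounded coefficients by hand. The sketch below uses simulated data with hypothetical names (x, y). Note that a simple regression like this describes the overall trend only; it does not adjust for the other covariates or the random effect in the mixed model.

```r
# Simulated stand-in data; x and y are hypothetical names
set.seed(3)
d <- data.frame(x = runif(30, 0, 100))
d$y <- 17 - 0.015 * d$x + rnorm(30, sd = 0.5)

fit <- lm(y ~ x, data = d)

pdf(NULL)                  # null device so the sketch runs without a display
plot(d$x, d$y)
abline(fit, col = "red")   # abline() takes intercept and slope from the fit
invisible(dev.off())

print(coef(fit))
```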

Answer 1
4, accepted, 2015-09-04 16:15:55

Answer 2
1, 2015-09-04 15:11:04