Visualizing a three-way interaction between two continuous variables and one categorical variable in R

Question

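I have a model in R that includes a significant three-way interaction between two continuous independent variables, IVContinuousA and IVContinuousB, and one categorical variable, IVCategorical (with two levels: Control and Treatment). The dependent variable (DV) is continuous.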

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

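You can find the data here

I am trying to find a way to visualize this in R to make it easier for me to interpret (perhaps in ggplot2?).

Inspired by this blog post, I thought I could dichotomize IVContinuousB into high and low values (so that it would itself be a two-level factor):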

IVContinuousBHigh <- mean(IVContinuousB) + sd(IVContinuousB)
IVContinuousBLow <- mean(IVContinuousB) - sd(IVContinuousB)

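Then I planned to plot the relationship between DV and IVContinuousA, with fitted lines representing the slope of that relationship, for each combination of IVCategorical and my newly dichotomized IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow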

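My first question: does this sound like a viable way to produce an interpretable three-way interaction plot? I want to avoid 3D plots if possible, as I don't find them intuitive... Or is there another way to go about it? Perhaps facet plots of the different combinations above?

If this is a reasonable solution, my second question is: how do I generate the data to predict the fitted lines for the different combinations above?

Third question: does anyone have advice on how to code this up in ggplot2?

I posted a very similar question on Cross Validated, but because it is more code-related, I thought I would try here (I will remove the CV post if this one turns out to be more relevant to this community :))

Many thanks,

Sarah

Note that there are NAs (left as blanks) in the DV column and that the design is unbalanced: the numbers of data points in the Control and Treatment groups of IVCategorical differ slightly.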

FYI, I have code for visualizing the two-way interaction between IVContinuousA and IVCategorical:

A <- ggplot(data=data, aes(x=AOTAverage, y=SciconC, group=MisinfoCondition, shape=MisinfoCondition, col=MisinfoCondition)) + geom_point(size=2) + geom_smooth(method='lm', formula=y~x)

But what I would like is to plot this relationship conditional on IVContinuousB...

Answer 1

Here are a couple of options for visualizing the model output in two dimensions. I am assuming that the goal is to compare Treatment and Control.

library(tidyverse)
library(readxl)  # read_excel() is not attached by library(tidyverse)

theme_set(theme_classic() +
          theme(panel.background=element_rect(colour="grey40", fill=NA)))

dat = read_excel("Some Data.xlsx")  # I downloaded your data file

mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)

# Function to create prediction grid data frame:
# nA values of IVContinuousA crossed with nB values of IVContinuousB,
# repeated once for each level of IVCategorical
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  d = with(data, 
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length.out=nA), nB*nCat),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length.out=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))

  # Model predictions for every row of the grid
  d$DV = predict(mod, newdata=d)

  return(d)
}
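For what it's worth, the same kind of grid can also be built with expand.grid(), which crosses the supplied values without any manual rep() bookkeeping. The sketch below is only an illustration under stated assumptions: the `sim` data frame, its sample size and the coefficients used to generate it are made up as a stand-in for the asker's file, and `fit` and `grid` are names chosen here.

```r
# Simulated stand-in data (hypothetical; NOT the asker's file)
set.seed(1)
n <- 200
sim <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
# Made-up data-generating process with a three-way interaction
sim$DV <- with(sim, 1 + 0.5 * IVContinuousA + 0.3 * IVContinuousB +
  0.4 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
  rnorm(n, sd = 0.5))

fit <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# expand.grid() returns one row per combination of the supplied values
grid <- expand.grid(
  IVContinuousA = seq(min(sim$IVContinuousA), max(sim$IVContinuousA), length.out = 20),
  IVContinuousB = seq(min(sim$IVContinuousB), max(sim$IVContinuousB), length.out = 5),
  IVCategorical = levels(sim$IVCategorical)
)

# Model predictions for every row of the grid
grid$DV <- predict(fit, newdata = grid)
```

The resulting `grid` has the same columns as the output of make_pred_dat(), so it should be usable in the same ggplot() calls.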

`IVContinuousA` vs. `DV`, by level of `IVContinuousB`

The roles of IVContinuousA and IVContinuousB can of course be switched here.

ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="")

You can make a similar plot without faceting, but it becomes hard to interpret as the number of IVContinuousB levels increases:

ggplot(make_pred_dat(nB=3), 
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
  geom_line() +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))

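Heatmap of the model-predicted difference, DV treatment - DV control, on a grid of `IVContinuousA` and `IVContinuousB` values

Below, we look at the difference between Treatment and Control at each pair of IVContinuousA and IVContinuousB values.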

ggplot(make_pred_dat(nA=100, nB=100) %>% 
         group_by(IVContinuousA, IVContinuousB) %>% 
         arrange(IVCategorical) %>% 
         summarise(DV = diff(DV)), 
       aes(x=IVContinuousA, y=IVContinuousB)) + 
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))
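The difference shown in the heatmap can also be computed without relying on row order, by predicting once per group on the same grid and subtracting. This is only a sketch under assumptions: `sim` is simulated stand-in data with made-up coefficients (not the asker's file), and `fit`, `ab` and `pred_for()` are hypothetical names introduced here. The plotting step is guarded so the rest runs even if ggplot2 is not installed.

```r
# Simulated stand-in data (hypothetical; NOT the asker's file)
set.seed(2)
n <- 200
sim <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim$DV <- with(sim, 1 + 0.5 * IVContinuousA + 0.3 * IVContinuousB +
  0.4 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
  rnorm(n, sd = 0.5))

fit <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# Grid over the two continuous predictors only
ab <- expand.grid(
  IVContinuousA = seq(-2, 2, length.out = 50),
  IVContinuousB = seq(-2, 2, length.out = 50)
)

# Predict on the grid with every row set to one level of IVCategorical
pred_for <- function(level) {
  nd <- ab[, c("IVContinuousA", "IVContinuousB")]
  nd$IVCategorical <- factor(level, levels = levels(sim$IVCategorical))
  predict(fit, newdata = nd)
}

# Treatment minus Control at each (IVContinuousA, IVContinuousB) pair
ab$diff <- pred_for("Treatment") - pred_for("Control")

# Optional heatmap, built only if ggplot2 is available; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(ab, ggplot2::aes(x = IVContinuousA, y = IVContinuousB)) +
    ggplot2::geom_tile(ggplot2::aes(fill = diff)) +
    ggplot2::scale_fill_gradient2(low = "red", mid = "white", high = "blue")
}
```

Because the model is linear, this difference depends only on the coefficients that involve IVCategorical, which gives a quick way to sanity-check the grid.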

Answer 2

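If you really want to avoid 3-D plotting, you can indeed turn one of the continuous variables into a categorical one for visualization purposes.

For the purpose of this answer, I used the Duncan data set from the car package, as it has the same form as the one you describe.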

library(car)
library(ggplot2)  # needed for the plot at the end
# the data
data("Duncan")

# the fitted model; education and income are continuous, type is categorical
lm0 <- lm(prestige ~ education * income * type, data = Duncan)

# turning education into high and low values (you can extend this to more 
# levels)
edu_high <- mean(Duncan$education)  + sd(Duncan$education)
edu_low <- mean(Duncan$education)  - sd(Duncan$education)

# the values below should be used for predictions, each combination of the 
# categories must be represented:
prediction_mat <- data.frame(income = Duncan$income, 
                         education = rep(c(edu_high, edu_low),each = 
                         nrow(Duncan)),
                         type = rep(levels(Duncan$type), each = 
                         nrow(Duncan)*2))


predicted <- predict(lm0, newdata = prediction_mat)


# rearranging the fitted values and the values used for predictions
df <- data.frame(predicted,
             income = Duncan$income,
             edu_group =rep(c("edu_high", "edu_low"),each = nrow(Duncan)),
             type = rep(levels(Duncan$type), each = nrow(Duncan)*2))


# plotting the fitted regression lines
ggplot(df, aes(x = income, y = predicted, group = type, col = type)) + 
  geom_line() + 
  facet_grid(. ~ edu_group)
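To put numbers on the lines in each panel, the slope of the x-axis variable at the high and low values of the dichotomized variable can be read off from predictions one unit apart. The sketch below uses the variable names from the question rather than Duncan, and rests on assumptions: `sim` is simulated stand-in data with made-up coefficients, and `fit`, `b_vals` and `slopes` are names chosen for illustration.

```r
# Simulated stand-in data (hypothetical; NOT the asker's file or Duncan)
set.seed(3)
n <- 300
sim <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim$DV <- with(sim, 1 + 0.5 * IVContinuousA + 0.3 * IVContinuousB +
  0.4 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
  rnorm(n, sd = 0.5))

fit <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# Low and high values of IVContinuousB: mean -/+ one standard deviation
b_vals <- c(mean(sim$IVContinuousB) - sd(sim$IVContinuousB),
            mean(sim$IVContinuousB) + sd(sim$IVContinuousB))

# One row per combination of IVContinuousB value and group
slopes <- expand.grid(
  IVContinuousB = b_vals,
  IVCategorical = levels(sim$IVCategorical)
)

# The model is linear in IVContinuousA, so the slope is the change in the
# prediction when IVContinuousA goes from 0 to 1
at0 <- predict(fit, newdata = cbind(slopes, IVContinuousA = 0))
at1 <- predict(fit, newdata = cbind(slopes, IVContinuousA = 1))
slopes$slope_A <- unname(at1 - at0)

# expand.grid() varies its first argument fastest: low, high, low, high
slopes$B_level <- rep(c("low", "high"), times = 2)
```

Each row of `slopes` then corresponds to one of the four lines the question describes (Control or Treatment, at low or high IVContinuousB).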

Answer 1: score 11, accepted, posted 2018-03-03 18:30:39
Answer 2: score 5, posted 2018-03-03 18:14:47
