
Adding trend lines across groups and setting tick labels in a grouped violin plot or box plot

I have grouped x-y data that I'm plotting with R's ggplot2 geom_violin, adding regression trend lines:

Here are the data:

library(dplyr)
library(plotly)
library(ggplot2)

set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
                 age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
                 group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
  # strip the "d" prefix first; as.integer("d3") on its own returns NA
  dplyr::mutate(time = as.integer(sub("d", "", age))) %>%
  dplyr::arrange(group,time) %>%
  dplyr::mutate(group_age=paste0(group,"_",age))

df$group_age <- factor(df$group_age,levels=unique(df$group_age))

And my current plot:

ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) + 
  geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()


My questions are:

  1. How do I get rid of the alpha part of the legend?
  2. I would like the x-axis ticks to be df$group rather than df$group_age, which means one tick per group, placed at the center of that group and labeled with the group name. Consider a situation where not all groups have all ages: for example, if a certain group has only two of the ages (I'm pretty sure ggplot will then show only those two ages), I'd still like the tick to be centered between those two ages.

One more question:

It would also be nice to have the p-value of each fitted slope plotted on top of its group.

I tried:

library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) + 
  geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
  stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)

But I get the same plot as above, with the following warning message:

Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default 

geom_smooth() fits a line, while stat_poly_eq() fails with the warning shown above. A factor is a categorical variable whose levels are categories, not numbers, so a trend against a factor is undefined. geom_smooth() appears to take the levels and convert them to "arbitrary" numeric values, but these values are just indices (positions along the axis) rather than meaningful ages.
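A minimal base-R sketch (not part of the original answer) of the point about indices: converting a factor to numbers returns the positions of its levels, which depend only on the level order, while the actual ages in days have to be parsed from the labels. The labels d3, d8 and d24 are taken from the question's data.

```r
# Ages as in the question, with levels in chronological order
age <- factor(c("d3", "d8", "d24"), levels = c("d3", "d8", "d24"))

# Numeric codes are just level positions: 1, 2, 3 (equally spaced)
codes <- as.integer(age)

# With the default levels (sorted as strings: "d24", "d3", "d8") the same labels
# get different codes, so a trend fitted to codes depends on the level order
abc_codes <- as.integer(factor(c("d3", "d8", "d24")))

# The meaningful x values: days parsed from the labels (3, 8, 24)
days <- as.integer(sub("d", "", as.character(age)))

diff(codes)  # equal steps between consecutive ages
diff(days)   # real gaps of 5 and 16 days
```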

To obtain a plot similar to the one described in the question, but with code that produces correct linear regression lines and the corresponding p-values, I would use the code below. The main change is that the numeric variable time is mapped to x, which makes fitting a regression a valid operation. To allow for a linear fit, an x scale with a log10 transformation is used, with breaks and labels at the ages for which data are available.
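As a cross-check, here is a sketch (not from the original answer) of the per-group models behind those p-values. It assumes that, because scale_x_log10() transforms x before statistics are computed, the model fitted in each facet is value ~ log10(time). It uses base R only and rebuilds the question's data with the same seed.

```r
set.seed(1)
df <- data.frame(
  value = c(rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
            rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
            rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)),
  age = c(rep("d3", 500), rep("d8", 600), rep("d24", 400),
          rep("d3", 500), rep("d8", 400), rep("d24", 600),
          rep("d3", 500), rep("d8", 500), rep("d24", 500)),
  group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
)
# Age in days as a number (3, 8, 24)
df$time <- as.integer(sub("d", "", df$age))

# One lm() per group on the log10 scale; pull the slope's p-value by name
slope_p <- sapply(split(df, df$group), function(d) {
  fit <- lm(value ~ log10(time), data = d)
  coef(summary(fit))["log10(time)", "Pr(>|t|)"]
})
slope_p
```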

library(dplyr)
library(ggplot2)
library(ggpmisc)

set.seed(1)
df <-
  data.frame(
    value = c(
      rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
      rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
      rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
    ),
    age = c(
      rep("d3", 500), rep("d8", 600), rep("d24", 400),
      rep("d3", 500), rep("d8", 400), rep("d24", 600),
      rep("d3", 500), rep("d8", 500), rep("d24", 500)
    ),
    group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
  ) %>%
  mutate(time = as.integer(gsub("d", "", age))) %>%
  arrange(group, time) %>%
  mutate(age = factor(age, levels = c("d3", "d8", "d24")),
         group = factor(group))

my_formula = y ~ x

ggplot(df, aes(x = time, y = value)) +
  geom_violin(aes(fill = age, color = age), alpha = 0.3) + 
  geom_boxplot(width = 0.1,
               aes(color = age), fill = NA) +
  geom_smooth(color = "black", formula = my_formula, method = 'lm') + 
  stat_poly_eq(aes(label = after_stat(p.value.label)), 
               formula = my_formula, parse = TRUE,
               npcx = "center", npcy = "bottom") +
  scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
  facet_wrap(~group) +
  theme_minimal()

This creates one panel per group, each showing the violins, the box plots, the fitted regression line, and the p-value of its slope.

Here is a solution. The alpha legend issue is easy. Anything you map inside aes() gets a scale, and therefore usually a legend entry. Use aes() when you want a variable in the data to drive an aesthetic; to set an aesthetic to a constant such as alpha = 0.5, put it outside aes(), and it will no longer appear in the legend.
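A minimal sketch of both options, using hypothetical toy data rather than the question's: setting alpha as a constant outside aes(), or keeping it mapped and hiding only its legend with guides(alpha = "none").

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data, only to show where alpha goes
toy <- data.frame(
  age   = factor(rep(c("d3", "d8", "d24"), each = 20), levels = c("d3", "d8", "d24")),
  value = c(rnorm(20, 8), rnorm(20, 6), rnorm(20, 4))
)

# alpha set as a constant outside aes(): no alpha scale, so no alpha legend
p <- ggplot(toy, aes(x = age, y = value, fill = age, colour = age)) +
  geom_violin(alpha = 0.5) +
  theme_minimal()

# If alpha stays mapped inside aes(), its legend can be hidden instead; the
# constant then passes through the alpha scale, so it is not drawn at exactly 0.5
p_mapped <- ggplot(toy, aes(x = age, y = value, fill = age, alpha = 0.5)) +
  geom_violin() +
  guides(alpha = "none")
```

The first form is usually preferable, since a constant is a setting rather than a mapping.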

I'm not sure the x-axis labels are what you wanted, but I set them manually, so they should be easy to adjust.
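For the centering requirement in the question (groups that lack some ages), one possible approach, not the one used in this answer, is to map x to the numeric positions of the group_age levels and place one break per group at the mean of its positions. A sketch with hypothetical data in which group B has no d24 measurements:

```r
library(ggplot2)

# Hypothetical layout: group B has no d24 data
ages <- c("d3", "d8", "d24")
cells <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  age   = c("d3", "d8", "d24", "d3", "d8", "d3", "d8", "d24")
)
cells <- cells[order(cells$group, match(cells$age, ages)), ]
cells$group_age <- factor(paste0(cells$group, "_", cells$age),
                          levels = paste0(cells$group, "_", cells$age))

# Position of each violin = integer position of its group_age level
cells$x <- as.integer(cells$group_age)

# One tick per group, centered over whichever ages that group actually has
tick_at <- tapply(cells$x, cells$group, mean)

# Toy values (hypothetical) so the layout can be drawn
set.seed(1)
toy <- cells[rep(seq_len(nrow(cells)), each = 30), ]
toy$value <- rnorm(nrow(toy))

p <- ggplot(toy, aes(x = x, y = value, group = group_age, fill = age)) +
  geom_violin() +
  scale_x_continuous(name = NULL, breaks = as.vector(tick_at),
                     labels = names(tick_at)) +
  theme_minimal()
```

Here group B's tick lands at 4.5, halfway between its two violins, without any hand-written label vector.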

Regarding the p-values, I fitted a separate linear regression for each group and stored each p-value in its own variable, which can then be added to the plot with annotate(). For two of the groups the p-value was < .001, so round() rounds it to 0; for those I simply wrote "pvalue: <.001" by hand.

Good luck with this!

library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
                 age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
                 group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
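  # strip the "d" prefix first; as.integer("d3") on its own returns NA,
  # which would leave lm() below with no usable rows
  dplyr::mutate(time = as.integer(sub("d", "", age))) %>%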
  dplyr::arrange(group,time) %>%
  dplyr::mutate(group_age=paste0(group,"_",age))

df$group_age <- factor(df$group_age,levels=unique(df$group_age))

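# one lm() per group; the slope's p-value sits in row "time", column "Pr(>|t|)"
# (the same cell as coefficients[8] in the 2 x 4 coefficient matrix)
mod1 <- lm(value ~ time, df[df$group == 'A', ])
mod1 <- summary(mod1)$coefficients["time", "Pr(>|t|)"] %>% round(2)

mod2 <- lm(value ~ time, df[df$group == 'B', ])
mod2 <- summary(mod2)$coefficients["time", "Pr(>|t|)"] %>% round(2)

mod3 <- lm(value ~ time, df[df$group == 'C', ])
mod3 <- summary(mod3)$coefficients["time", "Pr(>|t|)"] %>% round(2)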



ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) + 
  geom_violin(alpha=0.5) + 
  geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  geom_smooth(mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + 
  scale_x_discrete(labels = c('','A','','','B','','','C','')) +
  annotate('text',x = 2,y = -1,label = paste('pvalue: <.001')) +
  annotate('text',x = 6,y = 10,label = paste('pvalue: <.001')) +
  annotate('text',x = 8,y = -1.2,label = paste('pvalue:',mod3))+
  theme_minimal()
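One way to avoid hand-writing "<.001" is to format the p-value instead of rounding it. Below is a base-R sketch (hypothetical data, not the question's) that extracts the slope p-value by name and formats it with format.pval(), whose eps argument prints values below eps as a "<" bound instead of rounding them to 0.

```r
set.seed(1)
# Hypothetical data with a clear downward trend over time (days)
d <- data.frame(time = rep(c(3, 8, 24), each = 100))
d$value <- 8 - 0.2 * d$time + rnorm(nrow(d))

fit <- lm(value ~ time, data = d)
cf <- coef(summary(fit))          # 2 x 4 matrix: rows are terms, columns statistics
p <- cf["time", "Pr(>|t|)"]       # same cell as cf[8] (column-major indexing)

rounded <- round(p, 2)            # a tiny p-value collapses to 0 here
label <- paste("p-value:", format.pval(p, digits = 2, eps = 0.001))
label
```

The resulting label string can be passed to annotate("text", ..., label = label) in place of a hard-coded one.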

