Adding trend lines across groups and setting tick labels in a grouped violin plot or box plot
I have xy grouped data that I'm plotting using R's ggplot2 geom_violin, adding regression trend lines:
Here are the data:
library(dplyr)
library(plotly)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(age)) %>%
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
And my current plot:
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()
My questions are:
1. How do I get rid of the alpha part of the legend?
2. I would like the x-axis ticks to be df$group rather than df$group_age, that is, one tick per group, placed at the center of that group and labeled with the group name. Consider a situation where not all groups have all ages: if a certain group has only two of the ages (and I'm pretty sure ggplot will then show only those two), I'd like the tick to still be centered between those two ages.
One more question:
It would also be nice to have the p-value of each fitted slope plotted on top of each group.
I tried:
library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)
But I get the same plot as above, with the following warning message:
Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default
geom_smooth() fits a line, while stat_poly_eq() fails with the error shown above. A factor is a categorical variable with unordered levels, so a trend against a factor is undefined. geom_smooth() may be taking the levels and converting them to "arbitrary" numerical values, but these values are just indexes rather than meaningful values.
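To see what those indexes look like, here is a small base-R sketch. The vector `age` is a toy stand-in for the question's column, not the original data. Converting a factor with as.integer() returns the positions of its levels, which are sorted alphabetically by default, whereas stripping the "d" prefix recovers the actual day numbers:

```r
# Toy stand-in for the question's age labels
age <- c("d3", "d8", "d24")

# Default factor levels are sorted alphabetically: "d24", "d3", "d8"
age_factor <- factor(age)

# as.integer() on a factor returns level indexes, not days
idx <- as.integer(age_factor)   # 2 3 1

# Stripping the "d" prefix gives the real time values
days <- as.integer(sub("^d", "", age))   # 3 8 24
```

Note that in R 4.0 and later data.frame() keeps strings as character by default, so as.integer(age) applied to the question's data returns NA with a coercion warning instead of level indexes.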
To obtain a plot similar to the one described in the question, but with code that provides correct linear regression lines and the corresponding p-values, I would use the code below. The main change is that the numerical variable time is mapped to x, which makes fitting a regression a valid operation. To allow for a linear fit, an x scale with a log10 transformation is used, with breaks and labels at the ages for which data are available.
library(dplyr)
library(ggplot2)
library(ggpmisc)
set.seed(1)
df <-
  data.frame(
    value = c(
      rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
      rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
      rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
    ),
    age = c(
      rep("d3", 500), rep("d8", 600), rep("d24", 400),
      rep("d3", 500), rep("d8", 400), rep("d24", 600),
      rep("d3", 500), rep("d8", 500), rep("d24", 500)
    ),
    group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
  ) %>%
  mutate(time = as.integer(gsub("d", "", age))) %>%
  arrange(group, time) %>%
  mutate(age = factor(age, levels = c("d3", "d8", "d24")),
         group = factor(group))
my_formula = y ~ x
ggplot(df, aes(x = time, y = value)) +
  geom_violin(aes(fill = age, color = age), alpha = 0.3) +
  geom_boxplot(width = 0.1,
               aes(color = age), fill = NA) +
  geom_smooth(color = "black", formula = my_formula, method = 'lm') +
  stat_poly_eq(aes(label = stat(p.value.label)),
               formula = my_formula, parse = TRUE,
               npcx = "center", npcy = "bottom") +
  scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
  facet_wrap(~group) +
  theme_minimal()
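Because a scale transformation such as scale_x_log10() is applied before the statistics are computed, the line and p-value in each panel correspond to a regression of value on log10(time). As a cross-check outside ggplot2, the base-R sketch below fits that model per group. The helper name `slope_p` and the smaller toy data are illustrative assumptions, not part of the code above.

```r
set.seed(1)
# Smaller toy data with the same structure as df (assumption: two groups only)
toy <- data.frame(
  group = rep(c("A", "B"), each = 300),
  time  = rep(rep(c(3, 8, 24), each = 100), times = 2),
  value = c(rnorm(100, 8), rnorm(100, 6), rnorm(100, 4),   # A decreases
            rnorm(100, 2), rnorm(100, 4), rnorm(100, 7))   # B increases
)

# Hypothetical helper: slope estimate and its p-value on the log10 scale
slope_p <- function(d) {
  fit <- lm(value ~ log10(time), data = d)
  coef(summary(fit))["log10(time)", c("Estimate", "Pr(>|t|)")]
}

# One column per group, rows "Estimate" and "Pr(>|t|)"
res <- sapply(split(toy, toy$group), slope_p)
res
```

Splitting by group mirrors what facet_wrap(~group) does in the plot, where each panel gets its own fit.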
Here is a solution. The alpha legend issue is easy. Anything you map inside the aes() function gets a legend entry, so aes() should be used only when you want a feature of the data to drive an aesthetic. Putting alpha outside of aes() removes it from the legend.
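A minimal sketch of the difference, using a small made-up data frame (`toy`) rather than the question's data:

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(age = rep(c("d3", "d8"), each = 50),
                  value = c(rnorm(50, 8), rnorm(50, 6)))

# alpha inside aes(): treated as a mapped aesthetic, so it gets a legend
p_mapped <- ggplot(toy, aes(x = age, y = value, fill = age, alpha = 0.5)) +
  geom_violin()

# alpha as a fixed parameter of the layer: no alpha legend
p_fixed <- ggplot(toy, aes(x = age, y = value, fill = age)) +
  geom_violin(alpha = 0.5)
```

Printing `p_mapped` and `p_fixed` side by side should show the extra alpha legend only in the first plot.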
I'm not sure the x-axis labels are what you wanted, but I set them manually, so they should be easy to configure.
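Instead of typing the label vector by hand, the tick positions could be derived from the factor levels. The sketch below is one possible approach and not part of the original answer. It assumes the level names follow the group_age pattern used in the question and computes, for each group, the mean position of its levels, which stays centered even when a group lacks an age.

```r
# Toy level set in which group B has only two ages (assumption)
lv <- c("A_d3", "A_d8", "A_d24", "B_d3", "B_d8", "C_d3", "C_d8", "C_d24")

pos <- seq_along(lv)          # x position of each violin
grp <- sub("_.*$", "", lv)    # group prefix of each level

# Mean position per group: A = 2, B = 4.5, C = 7
centers <- tapply(pos, grp, mean)
centers
```

A tick at 4.5 cannot sit on a discrete axis, so using these values would likely mean mapping x to as.integer(group_age), adding group = group_age so each violin stays separate, and passing the positions to scale_x_continuous(breaks = as.numeric(centers), labels = names(centers)).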
Regarding the p-values, I fitted a separate linear regression for each group and stored the slope p-values in three variables, which can be added to the plot with annotate(). For two of the groups the p-value was < .001, so round() rounds it to 0; for those I simply wrote "<.001" in the label.
Good luck with this!
library(dplyr)
library(ggplot2)
set.seed(1)
# time holds the numeric day (3, 8, 24), parsed from the "d" labels;
# as.integer(age) on a character column would return NA
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
                 age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
                 group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
  dplyr::mutate(time = as.integer(gsub("d", "", age))) %>%
  dplyr::arrange(group,time) %>%
  dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
# p-value of the slope for each group: element 8 of the 2 x 4 coefficient
# matrix is the "Pr(>|t|)" entry of the time row
mod1 <- lm(value ~ time,df[df$group == 'A',])
mod1 <- summary(mod1)$coefficients[8] %>% round(2)
mod2 <- lm(value ~ time,df[df$group == 'B',])
mod2 <- summary(mod2)$coefficients[8] %>% round(2)
mod3 <- lm(value ~ time,df[df$group == 'C',])
mod3 <- summary(mod3)$coefficients[8] %>% round(2)
ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) +
  geom_violin(alpha=0.5) +
  geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
  geom_smooth(mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) +
  # one label per group_age level; only the middle level of each group is named
  scale_x_discrete(labels = c('','A','','','B','','','C','')) +
  annotate('text',x = 2,y = -1,label = paste('pvalue: <.001')) +
  annotate('text',x = 6,y = 10,label = paste('pvalue: <.001')) +
  annotate('text',x = 8,y = -1.2,label = paste('pvalue:',mod3))+
  theme_minimal()
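The positional index 8 used above works because the coefficient matrix of a simple regression has two rows and four columns stored column by column, so its eighth element is the slope's p-value. A sketch of a more explicit variant follows. The toy data and the helper `fmt_p` are hypothetical and not from the answer; the helper avoids printing a rounded 0.

```r
set.seed(1)
# Toy data with a clear negative slope (assumption, not the question's data)
toy <- data.frame(time = rep(c(3, 8, 24), each = 50))
toy$value <- 8 - 0.2 * toy$time + rnorm(150)

sm <- coef(summary(lm(value ~ time, data = toy)))

# Look the p-value up by name instead of by position
p_slope <- sm["time", "Pr(>|t|)"]
same <- identical(unname(p_slope), sm[8])   # both address the same cell

# Hypothetical helper: report small p-values as "< 0.001"
fmt_p <- function(p, threshold = 0.001) {
  ifelse(p < threshold, paste0("< ", threshold), sprintf("%.2f", p))
}

label <- paste("p-value:", fmt_p(p_slope))
```

The resulting string can be passed as the label argument of annotate('text', ...), so the hard-coded '<.001' labels would no longer be needed.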