R ggplot: How to define group-dependent y-axis breaks using faceted ggplots?
I have 40 groups (defined by short_ID) and would like to produce 40 different plots that use different y-scale breaks for each short_ID. I want the breaks for the y-scale to be (1) mean-2SD, (2) mean and (3) mean+2SD.
I have a dataset called Dataplots containing my X and Y variables and the grouping variable "short_ID". I have created additional vectors M$SD11 (=mean-2SD), M$mean and M$SD22 (=mean+2SD) to define the breaks, and M$short_ID as the grouping variable. The code below partly works, but I do not know how to make the breaks group-dependent (i.e., dependent on short_ID). When I run the code below I get the same y-axis breaks for all plots, for example the maximum of the vector M$SD22 instead of a different M$SD22 value for each plot. So I think I need to add something to
"scale_y_continuous(breaks=c(M$SD11, M$mean, M$SD22))", for example "scale_y_continuous(group=M$short_ID, breaks=c(M$SD11, M$mean, M$SD22))", but this does not work.
Does anybody know what I can do to define different breaks for my different groups (i.e., short_IDs)? How can I change the code below to do this? Many thanks!
Dataplot <- ggplot(data = Dataplots, aes(x = Measure, y = Amylase_u, group = short_ID)) +
  geom_line() +
  facet_wrap(~ short_ID) +
  scale_y_continuous(breaks = c(M$SD11, M$mean, M$SD22))
I have added an example of 'Dataplots' and 'M'. For the purpose of the example I included only two groups (i.e., short_IDs) instead of the 40 I actually have. Thus this example would need to produce 2 plots, one for each short_ID, with different y-axis breaks for each of the groups.
Example of Dataplots:
dput(Dataplots) structure(list(short_ID = c(1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222), Measure = c(1, 2, 3, 4, 1, 2, 3, 4), Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)), .Names = c("short_ID", "Measure", "Amylase_u"), row.names = c(NA, -8L), class = "data.frame", codepage = 65001L)
Example of M:
dput(M) structure(list(SD11 = c(162, 682), mean = c(97, 366), SD22 = c(32, 51), short_ID = c(1111, 2222)), .Names = c("SD11", "mean", "SD22", "short_ID"), row.names = 1:2, class = "data.frame")
@Mark I have been trying to apply your suggestions to my complete dataset but cannot seem to get it right. I have 61 plots in total. I started with:
myPlots <-
  lapply(unique(Dataplots$short_ID), function(thisID){
    Dataplots %>%
      filter(short_ID == thisID) %>%
      ggplot(aes(x = Measure, y = Amylase_u)) +
      geom_line() +
      scale_y_continuous(breaks = M %>%
                           filter(short_ID == thisID) %>%
                           select(mean) %>%
                           as.numeric()
      ) +
      ggtitle(thisID)
  })
(As you can see, I decided to put only the subject mean on the y-axis and to drop the SDs.) I then continued with your final cowplot suggestion:
plot_grid(ggdraw() + draw_label("Amylase_u", angle = 90), plot_grid(
    plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
    , ggdraw() + draw_label("Measurement")
    , ncol = 1
    , rel_heights = c(0.9, .1))
  , nrow = 1, rel_widths = c(0.05, 0.95))
This, however, results in 61 plots with the subject mean on the y-axis but without the measurements depicted in them (so the graph itself is missing). I figured there might be a misplaced ')', so I tried:
plot_grid(
  ggdraw() + draw_label("Amylase_u", angle = 90)
  , plot_grid(
    plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
    , ggdraw() + draw_label("Measurement")
    , ncol = 1
    , rel_heights = c(0.9, .1)
    , nrow = 1
    , rel_widths = c(0.05, 0.95)))
This does give me graphs, but they are tiny and the layout is terrible (Rplot2). I tried adapting rel_heights and rel_widths too, but even after reading the help file I don't quite get how I should adapt them.
Thanks again!
Finally, I removed the ID numbers on top of each plot because they are not really necessary, and this already greatly improves the plot (Rplot3), but the layout still needs to be adjusted.
My understanding is that this still remains impossible in the facet functions. However, you can accomplish it yourself using the cowplot package.
First, loop over your IDs (in lapply) and generate each of the sub-plots you wanted. Note that I am using dplyr for the pipe and filtering.
library(dplyr)    # %>%, filter(), select()
library(ggplot2)

# One plot per short_ID, with that ID's breaks looked up in M
myPlots <-
  lapply(unique(Dataplots$short_ID), function(thisID){
    Dataplots %>%
      filter(short_ID == thisID) %>%
      ggplot(aes(x = Measure, y = Amylase_u)) +
      geom_line() +
      scale_y_continuous(breaks = M %>%
                           filter(short_ID == thisID) %>%
                           select(SD11, mean, SD22) %>%
                           as.numeric()
      ) +
      ggtitle(thisID)
  })
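To make the break lookup inside scale_y_continuous concrete, here is a minimal base-R sketch of the same step in isolation. The helper name breaks_for is hypothetical (it is not part of the answer's code), and M is rebuilt from the values posted in the question.

```r
# M as posted in the question (values copied from dput(M))
M <- data.frame(
  SD11     = c(162, 682),
  mean     = c(97, 366),
  SD22     = c(32, 51),
  short_ID = c(1111, 2222)
)

# Hypothetical helper: return the three break values for one short_ID
breaks_for <- function(thisID) {
  row <- M[M$short_ID == thisID, c("SD11", "mean", "SD22")]
  as.numeric(unlist(row))
}

breaks_1111 <- breaks_for(1111)
breaks_2222 <- breaks_for(2222)
print(breaks_1111)
print(breaks_2222)
```

Each call returns a plain numeric vector of length three, which is the form the breaks argument expects. The break values do not need to be sorted.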
Then, call the function plot_grid from cowplot with the list of plots:
plot_grid(plotlist = myPlots)
gives:
[Figure: grid of the per-ID plots, each with its own y-axis breaks]
A few notes:
cowplot autoloads its own default style, so use theme_set to return to your preferred style. (This applies to older cowplot releases; from version 1.0 onward, loading cowplot should no longer change the default theme.)
Since I am not sure what your goal is, here is another alternative. If you just want to plot deviation from the mean (in standard deviations) to make the changes comparable, you could just calculate the z-score of the column within the groups and plot the results. Using dplyr again:
Dataplots %>%
  group_by(short_ID) %>%
  mutate(scaledAmylase = as.numeric(scale(Amylase_u)) ) %>%
  ggplot(aes(x = Measure
             , y = scaledAmylase)) +
  geom_line() +
  facet_wrap(~short_ID)
gives:
[Figure: faceted line plots of scaledAmylase against Measure, one panel per short_ID]
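As a quick check of what the per-group z-score does, the following base-R sketch applies the same scaling with ave() instead of dplyr (a substitution made here only to keep the example dependency-free) and confirms that each short_ID ends up with mean 0 and standard deviation 1. The data are the example values from the question.

```r
# Example data copied from the question's dput(Dataplots)
Dataplots <- data.frame(
  short_ID  = c(1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222),
  Measure   = c(1, 2, 3, 4, 1, 2, 3, 4),
  Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)
)

# Per-group z-score: ave() applies FUN separately within each short_ID
Dataplots$scaledAmylase <- ave(
  Dataplots$Amylase_u, Dataplots$short_ID,
  FUN = function(v) as.numeric(scale(v))
)

# After scaling, every group should have mean 0 and standard deviation 1
group_means <- tapply(Dataplots$scaledAmylase, Dataplots$short_ID, mean)
group_sds   <- tapply(Dataplots$scaledAmylase, Dataplots$short_ID, sd)
print(round(group_means, 10))
print(round(group_sds, 10))
```

Because every panel is then on the same scale, a single shared set of breaks such as -2, 0 and 2 corresponds to mean-2SD, mean and mean+2SD in each group.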
Or, if the mean/SD are calculated/defined somewhere else (and stored in M) rather than coming directly from the data, you can scale using M instead of the data:
Dataplots %>%
  left_join(M, by = "short_ID") %>%   # join on the shared ID column explicitly
  mutate(scaledAmylase = (Amylase_u - mean) / ((SD22 - mean) / 2) ) %>%
  ggplot(aes(x = Measure
             , y = scaledAmylase)) +
  geom_line() +
  facet_wrap(~short_ID)
gives:
[Figure: faceted line plots of the M-scaled Amylase_u against Measure, one panel per short_ID]
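The divisor (SD22 - mean) / 2 recovers one SD only if SD22 really is mean+2SD. A small base-R sketch of that arithmetic follows; M_check is a hypothetical table built from the example data so that SD11 < mean < SD22 holds, and merge() stands in for left_join(). Note that in the M posted in the question, SD22 is below the mean (32 vs 97, 51 vs 366), so the SD11 and SD22 columns there appear to be swapped; with those values the divisor would be negative and the sign of the scaled values would flip.

```r
# Example data copied from the question's dput(Dataplots)
Dataplots <- data.frame(
  short_ID  = c(1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222),
  Measure   = c(1, 2, 3, 4, 1, 2, 3, 4),
  Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)
)

# Hypothetical M_check: per-group mean and mean -/+ 2 SD computed from the data
m <- tapply(Dataplots$Amylase_u, Dataplots$short_ID, mean)
s <- tapply(Dataplots$Amylase_u, Dataplots$short_ID, sd)
M_check <- data.frame(
  short_ID = as.numeric(names(m)),
  SD11     = as.numeric(m - 2 * s),
  mean     = as.numeric(m),
  SD22     = as.numeric(m + 2 * s)
)

# Scale via the lookup table: (SD22 - mean) / 2 is one standard deviation
joined <- merge(Dataplots, M_check, by = "short_ID")
joined$scaledAmylase <-
  (joined$Amylase_u - joined$mean) / ((joined$SD22 - joined$mean) / 2)

# Compare with the z-score computed directly from the data
z_direct <- ave(joined$Amylase_u, joined$short_ID,
                FUN = function(v) as.numeric(scale(v)))
max_diff <- max(abs(joined$scaledAmylase - z_direct))
print(max_diff)
```

When M is consistent with the data, both routes give the same values; when M comes from elsewhere (for example a reference population), the scaled values are deviations from that external mean instead.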
And, because I can't leave well enough alone, here is a version of the plot_grid approach that removes the duplicated axis titles and includes them just once instead (like facet_wrap would). Increasing the number of subplots or changing the aspect ratio will force you to tweak the relative values here:
plot_grid(
  ggdraw() + draw_label("Amylase_u", angle = 90)
  , plot_grid(
    plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
    , ggdraw() + draw_label("Measurement")
    , ncol = 1
    , rel_heights = c(0.9, .1))
  , nrow = 1
  , rel_widths = c(0.05, 0.95)
)
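gives:
[Figure: the plot grid with a single "Amylase_u" title on the left and a single "Measurement" title at the bottom]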
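With many panels (the follow-up mentions 61), the grid can come out tiny in the default plotting window. One possible way around that, sketched here under assumptions, is to fix the number of columns with the ncol argument of plot_grid and save to a file whose size grows with the grid. The data frame toy, the choice of 8 columns, the per-panel size and the file name are all made up for illustration.

```r
library(ggplot2)
library(cowplot)

# Hypothetical stand-in data: 61 IDs with 4 measurements each
set.seed(1)
toy <- data.frame(
  short_ID  = rep(1:61, each = 4),
  Measure   = rep(1:4, times = 61),
  Amylase_u = runif(61 * 4, min = 50, max = 600)
)

# One plot per ID with a single y-axis break at that ID's (rounded) mean
toyPlots <- lapply(unique(toy$short_ID), function(thisID) {
  d <- toy[toy$short_ID == thisID, ]
  ggplot(d, aes(x = Measure, y = Amylase_u)) +
    geom_line() +
    scale_y_continuous(breaks = round(mean(d$Amylase_u))) +
    theme(axis.title = element_blank())
})

# Fix the column count and derive the number of rows from it
n_col <- 8
n_row <- ceiling(length(toyPlots) / n_col)
grid  <- plot_grid(plotlist = toyPlots, ncol = n_col)

# Size the output so each panel gets roughly 2 x 1.5 inches (an arbitrary choice)
out_file <- file.path(tempdir(), "amylase_grid.png")
ggsave(out_file, grid, width = 2 * n_col, height = 1.5 * n_row)
```

The rel_heights and rel_widths values in the wrapper above are only proportions between the label strips and the grid, so with a larger output size the same 0.05/0.95 and 0.9/0.1 splits may leave more room for labels than needed and can be reduced.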