R: facet_grid plot of differences between groups using ggplot2

I'm trying to create a series of plots showing differences between groups of a measured variable, and I am looking for an efficient way to do this using the facet_grid feature of ggplot2 in R.

Here is an illustrative example:

library(ggplot2)

# sample input data
df <- data.frame(year=rep(c(2011:2015), 2), 
                 value=c(0:4, 1:5),
                 scenario=rep(c("a","b"), each=5))

# make a sample plot
p <- 
  ggplot(df, aes(x=year, y=value)) +
  geom_point() + geom_line() +
  facet_grid(scenario ~ scenario)
p

This produces the following sample plot, in which value is plotted against year separately for each scenario combination:

[Image: sample facet plot]

(I assume the second row is not plotted because it is identical to the first.)

However, what I am looking for is a plot where each facet shows (value in the scenario on top) - (value in the scenario on the right), plotted by year. Specifically:

  • Upper left plot would be (value a) - (value a) = 0 for all years.
  • Upper right plot would be (value b) - (value a) = 1 for all years.
  • Lower left plot would be (value a) - (value b) = -1 for all years.
  • Lower right plot would be (value b) - (value b) = 0 for all years.

I have not been able to find any built-in or automated difference option for facet_grid. My initial thought was to pass a function as the y argument to ggplot, but since the data frame has a single value column, I got stumped. I am guessing there might be a solution using some combination of dplyr and reshape2, but I cannot wrap my head around how to implement it.

Here is an option using some functions from tidyr: first spread the data so the contrasts can be calculated, then gather it back together for plotting:

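library(dplyr)   # %>%, mutate()
library(tidyr)   # spread(), gather(), separate()

forPlotting <-
  df %>%
  spread(scenario, value) %>%                       # one column per scenario: year, a, b
  mutate(`a - b` = a - b
         , `b - a` = b - a
         , `a - a` = 0
         , `b - b` = 0) %>%
  gather(Comparison, Difference, -(year:b) ) %>%    # stack the four contrast columns
  separate(Comparison, c("First Val", "Second Val"), " - ")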

That returns a data.frame like this (only the head is shown):

  year a b First Val Second Val Difference
1 2011 0 1         a          b         -1
2 2012 1 2         a          b         -1
3 2013 2 3         a          b         -1
4 2014 3 4         a          b         -1
5 2015 4 5         a          b         -1
6 2011 0 1         b          a          1

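And you can plot like so. Note that the formula `First Val` ~ `Second Val` puts First Val on the rows (strips on the right) and Second Val on the columns (strips on top), so each panel shows (right) - (top); swap the two sides of the formula to get the (top) - (right) convention from the question: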

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`)

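spread() and gather() still work, but in tidyr 1.0.0 and later they are superseded by pivot_wider() and pivot_longer(). A minimal sketch of the same reshaping with the newer verbs, assuming tidyr >= 1.0.0 and the two-scenario df from the question; the column names mirror the ones above, but the row order of the result may differ from the gather() version.

```r
library(dplyr)
library(tidyr)

# Two-scenario sample data from the question
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

forPlotting <- df %>%
  # Wide form: one column per scenario (year, a, b)
  pivot_wider(names_from = scenario, values_from = value) %>%
  # Every ordered contrast, named "<first> - <second>"
  mutate(`a - b` = a - b, `b - a` = b - a, `a - a` = 0, `b - b` = 0) %>%
  # Back to long form, keeping year, a and b as identifier columns
  pivot_longer(-c(year, a, b), names_to = "Comparison",
               values_to = "Difference") %>%
  separate(Comparison, c("First Val", "Second Val"), sep = " - ")
```

The result can be passed to the same ggplot() call as above.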

The bigger question is why you want to do this. I assume you already know that simply plotting the two scenarios as lines of different colors is an easier visualization:

ggplot(df, aes(x=year, y=value, col = scenario)) +
  geom_point() + geom_line()


So I am assuming that you have more complicated data, specifically with many more columns to compare. Here is an approach that automates (and simplifies) many of the above steps for multiple columns. The approach is basically the same, but it uses mutate_ so that you can pass in a list of the column definitions (as strings) you want to create.

df <-
  data.frame(
    year = 2011:2015
    , a = 0:4
    , b = 1:5
    , c = 2:6
    , d = 3:7
  )

allContrasts <-
  outer(colnames(df)[-1]
        , colnames(df)[-1]
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()
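To see what the outer() call builds, here is a small base-R illustration with two hypothetical column names: outer() calls paste() on every combination of its two arguments and returns a matrix, which as.character() then flattens column by column.

```r
# Two illustrative column names
cols <- c("a", "b")

# outer() applies paste(x, y, sep = " - ") to every (x, y) combination
contrasts <- outer(cols, cols, paste, sep = " - ")

# Flatten column by column and name each string after itself, as above
contrasts <- setNames(as.character(contrasts), as.character(contrasts))
print(contrasts)
```

Because the first argument varies fastest, the strings come out as "a - a", "b - a", "a - b", "b - b".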

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select(-(a:d)) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))

This gives a 4-by-4 grid of difference plots, with the diagonal panels left empty.

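mutate_() and the other underscore verbs are deprecated in current dplyr releases. A sketch of the contrast-building step without them, assuming tidyr >= 1.0.0 for pivot_longer(): a plain loop over expand.grid() pairs adds one difference column per ordered pair of distinct columns. The names cols, grid and diffs are illustrative. Because [[ ]] takes the column name as a string, names containing spaces need no backticks.

```r
library(tidyr)

df <- data.frame(year = 2011:2015, a = 0:4, b = 1:5, c = 2:6, d = 3:7)

# Every ordered pair of distinct value columns
cols <- setdiff(names(df), "year")
grid <- expand.grid(first = cols, second = cols, stringsAsFactors = FALSE)
grid <- grid[grid$first != grid$second, ]

# One difference column per pair, named "<first> - <second>"
diffs <- data.frame(year = df$year)
for (i in seq_len(nrow(grid))) {
  diffs[[paste(grid$first[i], "-", grid$second[i])]] <-
    df[[grid$first[i]]] - df[[grid$second[i]]]
}

# Long form with the two halves of each name split into separate columns
forPlotting <- pivot_longer(diffs, -year, names_to = "Comparison",
                            values_to = "Difference")
forPlotting <- separate(forPlotting, Comparison,
                        c("First Val", "Second Val"), sep = " - ")
```

The result has the same columns as the mutate_() version, so the ggplot() call above applies unchanged.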

Why can I not leave this alone? I just like playing with standard evaluation too much. If you have non-syntactic column names (e.g., names with spaces), the above will fail. So here is an example with such column names, showing how adding backticks ensures the columns parse correctly.

df <-
  data.frame(
    year = 2011:2015
    , value = c(0:4, 1:5, 2:6, 3:7)
    , scenario = rep(c("Unit 1", "Exam 2"
                       , "Homework", "Final Exam")
                     , each = 5)
  ) %>%
  spread(scenario, value)

allContrasts <-
  outer(paste0("`", colnames(df)[-1], "`")
        , paste0("`", colnames(df)[-1], "`")
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select_(.dots = paste0("-`", colnames(df)[-1], "`")) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`) %>%
  mutate_each(funs(gsub("`", "", .)), `First Val`, `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))

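mutate_each() and funs(), used in the last step above, are also deprecated in current dplyr releases. Assuming dplyr >= 1.0.0, the backtick-stripping step can be written with across(). Here is a minimal sketch on a hypothetical two-row stand-in for forPlotting:

```r
library(dplyr)

# Stand-in holding the backtick-wrapped labels that separate() leaves behind
labels <- data.frame(`First Val` = c("`Unit 1`", "`Exam 2`"),
                     `Second Val` = c("`Homework`", "`Unit 1`"),
                     check.names = FALSE)

# Strip the backticks from both label columns
labels <- labels %>%
  mutate(across(c(`First Val`, `Second Val`), ~ gsub("`", "", .x)))
```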

Do you want something like the following?

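library(ggplot2)

# One data frame per scenario
dflist <- split(df, df$scenario)
# Merge each pair of scenarios on year; the shared columns become
# value.x/scenario.x (first argument) and value.y/scenario.y (second argument)
df <- rbind(merge(dflist$a, dflist$a, by='year'),
      merge(dflist$a, dflist$b, by='year'),
      merge(dflist$b, dflist$a, by='year'),
      merge(dflist$b, dflist$b, by='year'))
# Row scenario (x, right-hand strips) minus column scenario (y, top strips)
df$value <- df$value.x - df$value.y
ggplot(df, aes(x=year, y=value)) +
  geom_point() + geom_line() +
  facet_grid(scenario.x ~ scenario.y)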

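The four hand-written merge() calls generalize to any number of scenarios: merging the long data frame with itself on year alone pairs every scenario with every scenario in each year. This sketch assumes the original long-format df from the question. The suffixes .top and .right are illustrative choices, picked so that the difference follows the question's (top) - (right) convention. In facet_grid(), the right-hand side of the formula sets the columns (top strips) and the left-hand side sets the rows (right strips).

```r
library(ggplot2)

# Long-format sample data from the question
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

# Merging on year only: value and scenario get suffixes, and every
# scenario is paired with every scenario (itself included) within a year
pairs <- merge(df, df, by = "year", suffixes = c(".top", ".right"))
pairs$difference <- pairs$value.top - pairs$value.right

# Columns (top strips) from scenario.top, rows (right strips) from scenario.right
p <- ggplot(pairs, aes(x = year, y = difference)) +
  geom_point() + geom_line() +
  facet_grid(scenario.right ~ scenario.top)
print(p)
```

With more scenarios, the same two lines produce the full grid without listing each pair by hand.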
