How to represent the difference between two time series with a bar plot in the same chart?

For example, suppose we have two time series, a and b:

time <- seq(as.Date("1999-06-15"),as.Date("2008-06-15") , by= "years")
a <- c(22.3,24.1,35,35,35.9,39.2,34.8,31.5,29.1,25.8)    
b <- c(22,24.9,31,34,37.5,36.3,32.1,29.7,28.6,23.9)
plot(as.Date(time),a,type="l",xlab="Date",ylab="T(°C)")
lines(as.Date(time),b,col=2)

Is there a way to make my plot look like the example image?

(image: example of the desired chart)

You can use ggplot2 with geom_line and geom_col:

library(tidyverse)
DF_bar <- mutate(DF, diff_a_b = a - b)

DF %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, col = key)) + 
  geom_col(data = DF_bar, aes(y = diff_a_b)) # or geom_bar(data = DF_bar, aes(y = diff_a_b), stat = "identity")

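In the first step, I create a new data set, DF_bar, containing the variable diff_a_b, which is the difference between a and b.

Next, I reshape the data from wide to long so that the column key can be mapped to the colour aesthetic of geom_line. Finally, I plot diff_a_b from DF_bar with geom_col.

Data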

DF <- data.frame(time = seq(as.Date("1999-06-15"),as.Date("2008-06-15"), by= "years"),
                 a = c(22.3,24.1,35,35,35.9,39.2,34.8,31.5,29.1,25.8),
                 b = c(22,24.9,31,34,37.5,36.3,32.1,29.7,28.6,23.9))
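
The reshape-and-plot steps described in this answer can also be sketched with pivot_longer(), which tidyr (version 1.0.0 and later) recommends over gather(). This is a minimal sketch rather than part of the original answer, and the name DF_long is introduced here only for illustration:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

DF <- data.frame(
  time = seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years"),
  a = c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8),
  b = c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)
)

# Wide data with one difference per date: used for the bars only
DF_bar <- mutate(DF, diff_a_b = a - b)

# Long data (hypothetical name DF_long): used for the two lines only
DF_long <- pivot_longer(DF, cols = c(a, b), names_to = "key", values_to = "value")

p <- ggplot(DF_long, aes(time)) +
  geom_line(aes(y = value, colour = key)) +
  geom_col(data = DF_bar, aes(y = diff_a_b))
print(p)
```

Because the bars come from DF_bar, which has exactly one row per date, each difference is drawn once.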


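Unfortunately, Markus's first answer (before the edit) contained a major flaw that made the bars showing the residuals twice as tall as expected. This becomes immediately visible when the fill of the bars is coloured according to key: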

library(dplyr)
library(tidyr)
library(ggplot2)

tibble(time, a, b) %>%  # tibble() replaces the deprecated data_frame()
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_col(aes(y = diff_a_b, fill = key))


The root cause is that diff_a_b is not treated as a variable to be gathered when reshaping from wide to long format:

tibble(time, a, b) %>%  # tibble() replaces the deprecated data_frame()
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b)

So diff_a_b appears twice for each time value:

# A tibble: 20 x 4
   time       diff_a_b key   value
   <date>        <dbl> <chr> <dbl>
 1 1999-06-15    0.3   a      22.3
 2 2000-06-15   -0.800 a      24.1
 3 2001-06-15    4     a      35
 4 2002-06-15    1     a      35
 5 2003-06-15   -1.6   a      35.9
 6 2004-06-15    2.9   a      39.2
 7 2005-06-15    2.70  a      34.8
 8 2006-06-15    1.8   a      31.5
 9 2007-06-15    0.5   a      29.1
10 2008-06-15    1.9   a      25.8
11 1999-06-15    0.3   b      22
12 2000-06-15   -0.800 b      24.9
13 2001-06-15    4     b      31
14 2002-06-15    1     b      34
15 2003-06-15   -1.6   b      37.5
16 2004-06-15    2.9   b      36.3
17 2005-06-15    2.70  b      32.1
18 2006-06-15    1.8   b      29.7
19 2007-06-15    0.5   b      28.6
20 2008-06-15    1.9   b      23.9

Because the default for geom_col() is position = "stack", the two values for each date are stacked on top of each other.
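
The doubling can be checked numerically without drawing anything. The following is a minimal sketch, not part of the original answer: it sums diff_a_b over the rows that share a time value, which is the height a stacked bar would reach. The names long and stacked are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

time <- seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years")
a <- c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8)
b <- c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)

# Same reshaping as in the flawed plot: diff_a_b is carried along, not gathered
long <- tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b)

# Each time value occurs in two rows, so stacking adds diff_a_b to itself
stacked <- long %>%
  group_by(time) %>%
  summarise(n_rows = n(), bar_height = sum(diff_a_b))

# Ratio of stacked bar height to the intended difference, one entry per date
print(stacked$bar_height / (a - b))
```

Every date contributes two rows with the same diff_a_b, so the stacked height is twice the intended difference.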

A quick fix to markus's answer

If the position is changed to "dodge", markus's answer shows the expected result:

tibble(time, a, b) %>%  # tibble() replaces the deprecated data_frame()
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_col(aes(y = diff_a_b), position = "dodge")


Another fix is to use geom_linerange(), in which case each segment is drawn twice, one on top of the other:

tibble(time, a, b) %>%  # tibble() replaces the deprecated data_frame()
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_linerange(aes(ymin = 0, ymax = diff_a_b), size = 3)


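The "tidy" approach

IMHO, the proper ("tidy") approach is to treat diff_a_b as a third variable/time series when reshaping, and to use the data argument when creating the geoms: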

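tibble(time, a, b) %>%  # tibble() replaces the deprecated data_frame()
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, -time) %>%  # gather every column except time, including diff_a_b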
  ggplot(aes(x = time, y = value)) +
  geom_line(aes(col = key), data = function(x) filter(x, key != "diff_a_b")) + 
  geom_col(data = function(x) filter(x, key == "diff_a_b"))
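
To see what each layer receives in the "tidy" approach, the two filter() calls can be run on their own. This is a rough sketch under the assumption that time, a and b are defined as in the question; the names long, line_data and bar_data are made up for illustration and do not appear in the original answer.

```r
library(dplyr)
library(tidyr)

time <- seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years")
a <- c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8)
b <- c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)

# diff_a_b is gathered together with a and b, giving three series of 10 rows each
long <- tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, -time)

# The subsets that the data functions hand to geom_line() and geom_col()
line_data <- filter(long, key != "diff_a_b")
bar_data  <- filter(long, key == "diff_a_b")

print(count(long, key))
```

Here bar_data holds each difference exactly once, so no bar is stacked or overplotted.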


data.table and ggplot2

For those who prefer data.table for data wrangling:

library(data.table)
library(ggplot2)
long <- data.table(time, a, b)[
  , diff_a_b := a - b][
    , melt(.SD, "time")]
ggplot() + aes(time, value) + 
  geom_line(aes(color = variable), data = long[variable != "diff_a_b"]) + 
  geom_col(data = long[variable == "diff_a_b"])
