Present correlation in plot between two time series for a multiline time series
How to represent the difference between two time series with a bar plot in the same chart?
For example, suppose we have two time series a and b:
time <- seq(as.Date("1999-06-15"),as.Date("2008-06-15") , by= "years")
a <- c(22.3,24.1,35,35,35.9,39.2,34.8,31.5,29.1,25.8)
b <- c(22,24.9,31,34,37.5,36.3,32.1,29.7,28.6,23.9)
plot(as.Date(time),a,type="l",xlab="Date",ylab="T(°C)")
lines(as.Date(time),b,col=2)
Is there a way to make my plot look like the example image?
You can use geom_line and geom_col from ggplot2.
library(tidyverse)
# Wide data with the difference a - b, used only for the bars
DF_bar <- mutate(DF, diff_a_b = a - b)
DF %>%
  gather(key, value, a, b) %>%
  ggplot(., aes(time)) +
  geom_line(aes(y = value, col = key)) +
  # or geom_bar(data = DF_bar, aes(y = diff_a_b), stat = "identity")
  geom_col(data = DF_bar, aes(y = diff_a_b))
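In the first step, I create a new dataset, DF_bar, that contains the variable diff_a_b, the difference between a and b.
Next, I reshape DF from wide to long so that the column key can be mapped to the colour aesthetic of geom_line. Finally, I plot diff_a_b with geom_col, using DF_bar as its data.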
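The reshaping step above can also be sketched with pivot_longer(), which tidyr (1.0.0 and later) recommends over gather(). This is a swapped-in alternative rather than the code from the answer; the name DF_long and the plot object p are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

DF <- data.frame(
  time = seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years"),
  a = c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8),
  b = c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)
)

# Wide data with the difference a - b, used only for the bars
DF_bar <- mutate(DF, diff_a_b = a - b)

# Long data for the lines: one row per (time, series) pair.
# DF_long is a hypothetical name, not taken from the answer.
DF_long <- pivot_longer(DF, c(a, b), names_to = "key", values_to = "value")

p <- ggplot(DF_long, aes(time)) +
  geom_line(aes(y = value, col = key)) +
  geom_col(data = DF_bar, aes(y = diff_a_b))
p
```

Because the bars are drawn from the wide DF_bar, each time value contributes exactly one bar, while the long DF_long feeds only the lines.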
Data
DF <- data.frame(time = seq(as.Date("1999-06-15"),as.Date("2008-06-15"), by= "years"),
a = c(22.3,24.1,35,35,35.9,39.2,34.8,31.5,29.1,25.8),
b = c(22,24.9,31,34,37.5,36.3,32.1,29.7,28.6,23.9))
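Unfortunately, Markus's first answer (before the edit) contained a major flaw that made the bars showing the residuals twice as tall as expected. This becomes visible immediately when the fill of the bars is coloured by key: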
library(dplyr)
library(tidyr)
library(ggplot2)
# tibble() replaces the deprecated data_frame()
tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b) %>%
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) +
  geom_col(aes(y = diff_a_b, fill = key))
The root cause is that diff_a_b is not treated as a measure variable when reshaping from wide to long format:
tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b)
So diff_a_b appears twice for each value of time:
# A tibble: 20 x 4
   time       diff_a_b key   value
   <date>        <dbl> <chr> <dbl>
 1 1999-06-15    0.3   a      22.3
 2 2000-06-15   -0.800 a      24.1
 3 2001-06-15    4     a      35
 4 2002-06-15    1     a      35
 5 2003-06-15   -1.6   a      35.9
 6 2004-06-15    2.9   a      39.2
 7 2005-06-15    2.70  a      34.8
 8 2006-06-15    1.8   a      31.5
 9 2007-06-15    0.5   a      29.1
10 2008-06-15    1.9   a      25.8
11 1999-06-15    0.3   b      22
12 2000-06-15   -0.800 b      24.9
13 2001-06-15    4     b      31
14 2002-06-15    1     b      34
15 2003-06-15   -1.6   b      37.5
16 2004-06-15    2.9   b      36.3
17 2005-06-15    2.70  b      32.1
18 2006-06-15    1.8   b      29.7
19 2007-06-15    0.5   b      28.6
20 2008-06-15    1.9   b      23.9
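Because the default for geom_col() is position = "stack", the two values are stacked on top of each other.
If the position is changed to "dodge", Markus's answer shows the expected result: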
tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b) %>%
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) +
  geom_col(aes(y = diff_a_b), position = "dodge")
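The doubling described above can be checked numerically without drawing anything. The sketch below is only an approximation of what stacking does (it sums the diff_a_b values that share a time value); the names long and stacked are made up for this illustration.

```r
library(dplyr)
library(tidyr)

time <- seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years")
a <- c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8)
b <- c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)

# Same reshaping as in the flawed version: diff_a_b is carried along
# as an id column and is therefore repeated for key = "a" and key = "b"
long <- tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b)

# Stacking adds up every diff_a_b value found at the same time
stacked <- long %>%
  group_by(time) %>%
  summarise(n_rows = n(), stacked_height = sum(diff_a_b))

# Each time value occurs twice, so the stacked height is 2 * (a - b)
print(all(stacked$n_rows == 2))
print(isTRUE(all.equal(stacked$stacked_height, 2 * (a - b))))
```

Both checks should print TRUE, which matches the observation that the stacked bars are twice as tall as the intended difference.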
Another fix is to use geom_linerange(), where each segment is drawn twice, one copy exactly on top of the other:
tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b) %>%
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) +
  geom_linerange(aes(ymin = 0, ymax = diff_a_b), size = 3)
IMHO, the correct ("tidy") approach is to treat diff_a_b as a third variable/time series when reshaping, and to use the data argument when creating the geoms:
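tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  # reshape every column except time, so diff_a_b becomes a third series
  gather(key, value, -time) %>%
  ggplot(aes(x = time, y = value)) +
  # each layer filters the plot data down to the rows it needs
  geom_line(aes(col = key), data = function(x) filter(x, key != "diff_a_b")) +
  geom_col(data = function(x) filter(x, key == "diff_a_b"))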
data.table and ggplot2
For those who prefer data.table for data wrangling:
library(data.table)
library(ggplot2)
# add the difference by reference, then melt everything except time
long <- data.table(time, a, b)[
  , diff_a_b := a - b][
  , melt(.SD, "time")]
ggplot() + aes(time, value) +
  geom_line(aes(color = variable), data = long[variable != "diff_a_b"]) +
  geom_col(data = long[variable == "diff_a_b"])