
Bar plot with variable bar widths as date ranges on the x-axis

I wish to make a bar graph where the response variable (weight change) is measured over time periods of different lengths, each defined by a start and an end date. The width of the bars should correspond to the length of the period. A small example of my data:

wtchange.data <- structure(list(start.date = structure(1:3, .Label = c("2015-04-01", 
    "2015-04-15", "2015-04-30"), class = "factor"), end.date = structure(1:3, .Label = c("2015-04-15", 
    "2015-04-30", "2015-05-30"), class = "factor"), wtchange = c(5L, 
    10L, 15L), se = c(1.2, 2.5, 0.8)), .Names = c("start.date", "end.date", 
    "wtchange", "se"), class = "data.frame", row.names = c(NA, -3L
    ))

wtchange.data
#   start.date   end.date wtchange  se
# 1 2015-04-01 2015-04-15        5 1.2
# 2 2015-04-15 2015-04-30       10 2.5
# 3 2015-04-30 2015-05-30       15 0.8

wtchange.data$start.date <- as.Date(wtchange.data$start.date)
wtchange.data$end.date <- as.Date(wtchange.data$end.date)

Attempting to use geom_bar:

library(ggplot2)
ggplot(wtchange.data, aes(x = start.date, y = wtchange)) +
  geom_bar(stat = "identity", color = "black") +
  geom_errorbar(aes(ymin = wtchange-se, ymax = wtchange+se), width = 1)

(not allowed >2 links with <10 reputation, so I unfortunately cannot show the first plot)

The main problem is that when the aesthetics of the plot are defined (x = start.date, y = wtchange), I can use only one variable (start.date in this example) for the x-axis, but I really need to somehow use both start.date and end.date to delimit bar widths corresponding to each period. The graph should look something like this (drawn in Paint): [image]

A secondary problem is that the bars should touch without gaps, but I am not sure if that is even possible, given that the bars have to be of different widths, so you cannot set one bar width for all bars. Would it be possible to set the width for each bar manually?


Edit: Thank you Henrik for the links. I have made some further progress. I calculated date midpoints for centering the bars at:

wtchange.data$date.midpoint <- wtchange.data$start.date +
(wtchange.data$end.date - wtchange.data$start.date)/2

And then calculated period lengths to use as bar widths:

wtchange.data$period.length <- wtchange.data$end.date - wtchange.data$start.date

The updated graph code is now:

ggplot(wtchange.data, aes(x = date.midpoint, y = wtchange)) +
  geom_bar(stat = "identity", color = "black", width = wtchange.data$period.length) +
  geom_errorbar(aes(ymin = wtchange-se, ymax = wtchange+se), width = 1)


The only remaining problem is that there is still a small gap between bars in one place. I guess this is due to the way R rounds the date difference calculation to the nearest number of days?

You are right: the calculation of the difference between end and start dates is the reason for the gap. We need to use numeric periods instead of difftime (see explanation below) when calculating the width and the midpoint.

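# df is assumed to be the question's wtchange.data,
# with start.date and end.date already converted to Date
df <- wtchange.data

# length of periods, width of bars as numeric
df$width <- as.numeric(df$end.date - df$start.date)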

# mid-points
df$mid <- df$start.date + df$width / 2

# dates for breaks 
dates <- unique(c(df$start.date, df$end.date))

ggplot(df, aes(x = mid, y = wtchange)) +
  geom_bar(stat = "identity", color = "black", width = df$width) +
  geom_errorbar(aes(ymin = wtchange - se, ymax = wtchange + se), width = 1) +
  scale_x_date(breaks = dates)



Corresponding geom_rect code:

# mid-points
df$mid <- df$start.date + as.numeric(df$end.date - df$start.date) / 2

# dates for breaks 
dates <- unique(c(df$start.date, df$end.date))

ggplot(df, aes(x = mid, y = wtchange)) +
  geom_rect(aes(xmin = start.date, xmax = end.date, ymin = 0, ymax = wtchange), color = "black") +
  geom_errorbar(aes(ymin = wtchange - se, ymax = wtchange + se), width = 1) +
  scale_x_date(breaks = dates)

And slightly less ink-demanding with geom_step:

# need to add an end date to the last period
df2 <- tail(df, 1)
df2$start.date <- df2$end.date
df2 <- rbind(df, df2)

# mid-points
df$mid <- df$start.date + as.numeric(df$end.date - df$start.date) / 2

ggplot() +
  geom_step(data = df2, aes(x = start.date, y = wtchange)) +
  geom_errorbar(data = df, aes(x = mid, ymin = wtchange - se, ymax = wtchange + se), width = 1) +
  scale_x_date(breaks = dates) +
  ylim(0, 16) +
  theme_bw()



On the "difftime issue":

Values of class Date can be represented internally as fractional days (see ?Date and ?Ops.Date; try: Sys.Date(); Sys.Date() + 0.5; Sys.Date() + 0.5 + 0.5). However, when adding a difftime object to a Date, the difftime object is rounded to the nearest whole day (see the x argument in ?Ops.Date).

Let's check the calculations using your start date 2015-04-15 and end date 2015-04-30:

mid <- (as.Date("2015-04-30") - as.Date("2015-04-15")) / 2
mid
# Time difference of 7.5 days

str(mid)
# Class 'difftime'  atomic [1:1] 7.5
# ..- attr(*, "units")= chr "days"

# calculate the midpoint using the difftime object
as.Date("2015-04-15") + mid
# [1] "2015-04-23"

# calculating midpoint using numeric object yields another date...
as.Date("2015-04-15") + as.numeric(mid)
# [1] "2015-04-22"

# But is "2015-04-22" above in fact fractional, i.e. "2015-04-22 point 5"?
# Let's try adding 0.5
as.Date("2015-04-15") + as.numeric(mid) + 0.5
# [1] "2015-04-23"
# Yes.

Thus, we use the numeric period instead of the difftime period.
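
As a quick check of the reasoning above, here is a minimal sketch in base R (no ggplot2 needed) that compares the bar edges implied by the two midpoint calculations for the three periods in the question. The names start, end, mid_difftime, mid_numeric, off_difftime and off_numeric are made up for this illustration and do not appear in the code above. It assumes a bar drawn with a given midpoint and width spans midpoint - width/2 to midpoint + width/2.

```r
# the three periods from the question
start <- as.Date(c("2015-04-01", "2015-04-15", "2015-04-30"))
end   <- as.Date(c("2015-04-15", "2015-04-30", "2015-05-30"))

# period lengths in days, as plain numbers
width <- as.numeric(end - start)

# midpoint via difftime: the difftime is rounded to whole days when added to a Date
mid_difftime <- start + (end - start) / 2

# midpoint via numeric: fractional days are kept
mid_numeric <- start + width / 2

# offset (in days) of each bar's left edge from its true start date
off_difftime <- (as.numeric(mid_difftime) - width / 2) - as.numeric(start)
off_numeric  <- (as.numeric(mid_numeric)  - width / 2) - as.numeric(start)

off_difftime
# [1] 0.0 0.5 0.0
off_numeric
# [1] 0 0 0
```

With the difftime midpoint, the second bar is shifted half a day to the right, which opens the gap after the first bar. With the numeric midpoint, every bar starts exactly on its start date.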
