
Plotting date intervals in ggplot2

I have a dataset containing a number of date intervals (i.e. start dates and end dates in POSIXct format).

In the example provided, let's say each period corresponds to a time when someone was in school or out of school. I'm interested in plotting the data in ggplot2; each row is essentially the data for one period. Currently the rows don't have a factor variable, but I've put one in the example as it may make things easier to plot. It's worth noting that in some cases the end date of one period and the start date of the next overlap.

In the data, each row is a unique stint in school associated with a specific period. I'd like a sequence of weeks (from the first week to the last week in the dataset) on the x axis, and on the y axis just a dot for each week to signify whether the person was in school (also identifying which stint) or out of school (even a gap would perhaps suffice). Thus perhaps an 8-level factor is needed in this case: one level for each period, plus a level for out of school (or perhaps no level is needed for when out of school)?

So in this case we could envisage having 7 rows of dots on the y axis, something (very loosely) like this (this example has many gaps in the lines, but I expect few or no gaps).

[image]

I envisaged the process to be something like: create a sequence from min(start_date) to max(end_date), then join the rows to it. Then somehow identify each period and create a factor variable for it. Then plot the factor variable (e.g. period1, period2, period3) against the sequence of dates. I haven't been able to do this, though, as it's quite fiddly.

Looking at the lubridate package, I was thinking that interval() and %within% might be the solution, but I wasn't sure.

library(tidyverse)
library(lubridate)
                              
start_dates = ymd_hms(c("2019-05-08 00:00:00",
                        "2020-01-17 00:00:00",
                        "2020-03-03 00:00:00",
                        "2020-05-28 00:00:00",
                        "2020-12-10 00:00:00",
                        "2021-05-07 00:00:00",
                        "2022-01-04 00:00:00"), tz = "UTC")
  
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
                       "2020-03-03 00:00:00", 
                       "2020-05-28 00:00:00",
                       "2020-12-10 00:00:00",
                       "2021-05-07 00:00:00",
                       "2022-01-04 00:00:00",
                       "2022-01-19 00:00:00"), tz = "UTC") 

df = data.frame(studying = paste0("period", 1:7), start_dates, end_dates)
 

You can try:

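df %>% 
  ggplot() + 
  # one thick horizontal bar per period (on ggplot2 >= 3.4.0, use linewidth = 3 instead of size = 3)
  geom_segment(aes(x = start_dates, xend = end_dates, y = studying, yend = studying, color = studying), size = 3) + 
  # thin vertical guides dropping from each bar's start and end to the x axis
  geom_segment(aes(x = start_dates, xend = start_dates, y = 0, yend = studying)) +
  geom_segment(aes(x = end_dates, xend = end_dates, y = 0, yend = studying))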

[image]

Per week, as you asked in the comments:

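df %>% 
  as_tibble() %>%
  # week of the year for each start and end date
  mutate(start = week(start_dates),
         end = week(end_dates)) %>% 
  # a period that crosses New Year (start week > end week) is split in two:
  # start..52 in the first year and 0..end in the following year
  mutate(gr = start > end, 
         start_2 = ifelse(gr, 0, NA),
         end_2 = ifelse(gr, end, NA),
         end = ifelse(gr, 52, end)) %>% 
  # drop the original date columns and the helper flag
  select(-2:-3, -gr) %>% 
  # reshape so that each (period, index) pair becomes one segment with its own start and end
  pivot_longer(-1) %>% 
  filter(!is.na(value)) %>% 
  separate(col = name, into = c("name", "index"), sep = "_", fill = "right") %>%  
  mutate(index = ifelse(is.na(index), 1, index)) %>% 
  pivot_wider(names_from = "name", values_from = "value") %>% 
  ggplot(aes(y = studying, yend = studying, x = start, xend = end, color = studying)) + 
  # on ggplot2 >= 3.4.0, use linewidth = 2 instead of size = 2
  geom_segment(size = 2)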

[image]
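If you want literal dots, one per week, on a real date axis (as described in the question), something along these lines might work. This is only a sketch and makes a few assumptions of its own: it uses plain `Date` values instead of POSIXct, the helper name `weekly` is made up here, and each period's weeks are counted from that period's own start date rather than being aligned to calendar weeks.

```r
library(ggplot2)

# Example data copied from the question (assumption: Date instead of POSIXct, for brevity)
df <- data.frame(
  studying    = paste0("period", 1:7),
  start_dates = as.Date(c("2019-05-08", "2020-01-17", "2020-03-03", "2020-05-28",
                          "2020-12-10", "2021-05-07", "2022-01-04")),
  end_dates   = as.Date(c("2019-10-24", "2020-03-03", "2020-05-28", "2020-12-10",
                          "2021-05-07", "2022-01-04", "2022-01-19"))
)

# Expand each interval into one row per week (hypothetical helper table `weekly`)
weekly <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(
    studying = df$studying[i],
    week     = seq(df$start_dates[i], df$end_dates[i], by = "week")
  )
}))

# One dot per week in school; weeks out of school simply have no dot
p <- ggplot(weekly, aes(x = week, y = studying, color = studying)) +
  geom_point() +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(x = "Week", y = NULL)
p
```

Because the x axis holds actual dates, periods from different years do not fold onto the same week numbers, and any time out of school shows up as a gap with no extra factor level needed.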

To get the overlaps you can use the valr package. Since it was developed to find overlaps in DNA segments, the data needs some transformation. Start and end are calculated as cumulative week numbers (week of the year plus 52 for each year elapsed since the first year). chrom is set to "1".

library(valr)
df %>% 
  as_tibble() %>%
  # cumulative week number; both columns count from the same base year (the first start year)
  mutate(start = week(start_dates) + (year(start_dates)-min(year(start_dates)))*52,
         end = week(end_dates) + (year(end_dates)-min(year(start_dates)))*52,
         chrom="1", 
         index=1:n()) %>%  
  valr::bed_intersect(., .) %>% 
  filter(studying.x != studying.y) %>% 
  # filter duplicated intervals out
  mutate(index = paste(index.x, index.y) %>% str_split(., " ") %>% map(sort) %>% map_chr(toString)) %>% 
  filter(duplicated(index))

# A tibble: 5 x 15
  studying.x start_dates.x end_dates.x start.x end.x chrom index.x studying.y start_dates.y end_dates.y start.y end.y index.y .overlap index
  <chr>              <dbl>       <dbl>   <dbl> <dbl> <chr>   <int> <chr>              <dbl>       <dbl>   <dbl> <dbl>   <int>    <int> <chr>
1 period3       1583193600  1590624000      61    74 1           3 period2       1579219200  1583193600      55    61       2        0 2, 3 
2 period4       1590624000  1607558400      74   102 1           4 period3       1583193600  1590624000      61    74       3        0 3, 4 
3 period5       1607558400  1620345600     102   123 1           5 period4       1590624000  1607558400      74   102       4        0 4, 5 
4 period6       1620345600  1641254400     123   157 1           6 period5       1607558400  1620345600     102   123       5        0 5, 6 
5 period7       1641254400  1642550400     157   159 1           7 period6       1620345600  1641254400     123   157       6        0 6, 7 
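If you would rather not pull in valr, the same check can probably be done in base R by comparing the dates directly. The following is a rough sketch, not the valr method: it assumes plain `Date` values, treats intervals as closed (so periods that merely touch count as overlapping), and the names `pairs`, `hit` and `overlaps` are made up for illustration.

```r
# Example data copied from the question (assumption: Date instead of POSIXct, for brevity)
df <- data.frame(
  studying    = paste0("period", 1:7),
  start_dates = as.Date(c("2019-05-08", "2020-01-17", "2020-03-03", "2020-05-28",
                          "2020-12-10", "2021-05-07", "2022-01-04")),
  end_dates   = as.Date(c("2019-10-24", "2020-03-03", "2020-05-28", "2020-12-10",
                          "2021-05-07", "2022-01-04", "2022-01-19"))
)

# All unordered pairs of rows (i < j), so each pair is checked only once
pairs <- t(combn(nrow(df), 2))
i <- pairs[, 1]
j <- pairs[, 2]

# Two closed intervals overlap when each one starts no later than the other ends.
# Using <= means periods that only touch (end == next start) are flagged as well.
hit <- df$start_dates[i] <= df$end_dates[j] & df$start_dates[j] <= df$end_dates[i]

overlaps <- data.frame(
  first  = df$studying[i[hit]],
  second = df$studying[j[hit]],
  # length of the shared span in days; 0 means the two periods only touch
  days   = as.numeric(pmin(df$end_dates[i[hit]], df$end_dates[j[hit]]) -
                      pmax(df$start_dates[i[hit]], df$start_dates[j[hit]]))
)
overlaps
```

On the example data this should flag the same five adjacent pairs as above (period2/period3 through period6/period7), each with a shared span of 0 days. Change `<=` to `<` if touching periods should not count as overlapping.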
