
Changing date range into series of dates (wide to long)

I have data that looks like the following:

data<- data.frame("Subject" = c("13434","14544", "14544", 
                             "22222","22222","22222"), 
                  "Period" = c("MAD", "MAD", "OSE", "MAD","OSE","OSE"), 
                  "Dose" = c(400, 800, 800, 400, 800, 1200), 
                  "Start" = as.Date(c('2017-04-18','2017-06-13'
                        ,"2018-09-27", "2017-06-06","2018-08-21","2018-12-12")), 
                  "End" = as.Date(c("2017-05-16","2017-07-11", "2019-02-09",
                      "2017-07-04", "2018-12-11","2019-02-05")))

 data
Subject Period Dose  Start   End 
 13434  MAD  400    2017-04-18  2017-05-16
 14544  MAD  800    2017-06-13  2017-07-11
 14544  OSE  800    2018-09-27  2019-02-09
 22222  MAD  400    2017-06-06  2017-07-04
 22222  OSE  800    2018-08-21  2018-12-11
 22222  OSE  1200   2018-12-12  2019-02-05

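I would like to turn it into something like the following, where each date in the range gets its own row and the dose is summed cumulatively, day by day, across the range. In an ideal world, when the period changes, the cumulative dose would continue from where the previous period ended.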

Subject Period Sum_Dose   Day
 13434  MAD    400   2017-04-18
 13434  MAD    800   2017-04-19
 13434  MAD   1200   2017-04-20
 13434  MAD   1600   2017-04-21
 13434  MAD   2000   2017-04-22
 13434  MAD   2400   2017-04-23
 Etc. 

and so on for each subject, at the given period and dose.

Like this?

library(tidyverse)

dat <- data  # `data` is the data frame defined in the question

dat %>%
  group_by(Subject, Period, Dose) %>%
  summarize(Day = list(seq(Start, End, by = 'day'))) %>% 
  unnest(Day) %>%
  mutate(Dose = cumsum(Dose)) %>%
  ungroup()

Output:

# A tibble: 392 x 4
   Subject Period  Dose Day       
   <fct>   <fct>  <dbl> <date>    
 1 13434   MAD      400 2017-04-18
 2 13434   MAD      800 2017-04-19
 3 13434   MAD     1200 2017-04-20
 4 13434   MAD     1600 2017-04-21
 5 13434   MAD     2000 2017-04-22
 6 13434   MAD     2400 2017-04-23
 7 13434   MAD     2800 2017-04-24
 8 13434   MAD     3200 2017-04-25
 9 13434   MAD     3600 2017-04-26
10 13434   MAD     4000 2017-04-27
# ... with 382 more rows

I assume that the tuple (Subject, Period, Dose) is unique. If it is not, you can add Start and End to the grouping.

And the "ideal world" can be achieved with:

dat %>%
  group_by(Subject, Period, Dose) %>%
  summarize(Day = list(seq(Start, End, by = 'day'))) %>% 
  unnest(Day) %>%
  group_by(Subject) %>%
  arrange(Day) %>%
  mutate(Dose = cumsum(Dose)) %>%
  ungroup() 

If we append the following line to the code above:

... %>% filter(Day >= as.Date("2018-12-11"), Day <= as.Date("2018-12-12"), 
               Subject == "22222")

it outputs:

  Subject Period   Dose Day       
  <fct>   <fct>   <dbl> <date>    
1 22222   OSE    102000 2018-12-11
2 22222   OSE    103200 2018-12-12

So the cumsum appears to carry over correctly from one period to the next (1200 is added, which is the daily dose of the next period).
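As a cross-check of the two totals shown above, the arithmetic can be redone by hand in base R. This is only a sketch: `n_days` is a hypothetical helper introduced here for illustration, and the dates and doses are copied from the question's rows for subject 22222.

```r
# Hypothetical helper: number of days in an inclusive date range.
n_days <- function(start, end) {
  as.integer(as.Date(end) - as.Date(start)) + 1L
}

mad  <- n_days("2017-06-06", "2017-07-04") * 400  # MAD period, 400 per day
ose1 <- n_days("2018-08-21", "2018-12-11") * 800  # first OSE period, 800 per day

total_1211 <- mad + ose1         # cumulative dose on 2018-12-11
total_1212 <- total_1211 + 1200  # first day of the 1200-per-day period

c(total_1211, total_1212)
```

The MAD period covers 29 days and the first OSE period 113 days, which is where 102000 comes from.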

Thanks @utubun! I ended up with:

library(dplyr)
library(tidyr)
dose.long <- data %>% 
  # Stack Start and End into a single DAY column
  gather(g, DAY, Start, End) %>% 
  select(-g) %>%
  group_by(Subject, Period, Dose) %>% arrange(Subject, DAY) %>% 
  filter(!is.na(DAY)) %>% 
  # Create a list column that holds every day between Start and End
  summarize(DAY = list(full_seq(DAY, 1))) %>%
  # Unnest the list column into one row per day
  unnest(DAY) %>% ungroup() %>%
  group_by(Subject) %>%
  mutate(Sum_Dose = cumsum(Dose))

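If I understand correctly, the OP wants to

  1. expand each row into a sequence of days between the given Start and End dates,
  2. accumulate Dose for each Subject.

Reshaping from wide to long, e.g., with gather() or melt(), is not needed here (and points in the wrong direction, IMHO).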
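The two steps listed above can be sketched with base R alone, which makes it easier to see that no reshaping is involved. This is a minimal sketch, not the code from any of the answers: it rebuilds the question's data under the name `dat`, and `days`, `n` and `long` are names chosen here for illustration.

```r
# The question's data, rebuilt so the sketch is self-contained.
dat <- data.frame(
  Subject = c("13434", "14544", "14544", "22222", "22222", "22222"),
  Period  = c("MAD", "MAD", "OSE", "MAD", "OSE", "OSE"),
  Dose    = c(400, 800, 800, 400, 800, 1200),
  Start   = as.Date(c("2017-04-18", "2017-06-13", "2018-09-27",
                      "2017-06-06", "2018-08-21", "2018-12-12")),
  End     = as.Date(c("2017-05-16", "2017-07-11", "2019-02-09",
                      "2017-07-04", "2018-12-11", "2019-02-05")),
  stringsAsFactors = FALSE
)

# Step 1: one sequence of days per row.
# Map() calls seq() once for each Start/End pair and returns a list.
days <- Map(function(s, e) seq(s, e, by = "day"), dat$Start, dat$End)
n    <- lengths(days)

# Repeat the other columns so that every day gets its own row.
long <- data.frame(
  Subject = rep(dat$Subject, n),
  Period  = rep(dat$Period, n),
  Dose    = rep(dat$Dose, n),
  Day     = do.call(c, days),
  stringsAsFactors = FALSE
)

# Step 2: sort by Subject and Day, then accumulate Dose within each Subject.
long <- long[order(long$Subject, long$Day), ]
long$Sum_Dose <- ave(long$Dose, long$Subject, FUN = cumsum)

head(long[, c("Subject", "Period", "Sum_Dose", "Day")])
```

Because the running sum is taken per Subject rather than per Period, the total carries over from one period to the next.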

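dplyr and tidyr

Here is an implementation using dplyr and tidyr. Since seq() does not accept vector arguments, we need to group by each row and then unnest() the expanded dates.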

library(dplyr)
library(tidyr)
dat %>% 
  group_by(rn = row_number()) %>%
  mutate(Day = list(seq(Start, End, "1 day"))) %>% 
  unnest(Day) %>% 
  arrange(Subject, Day) %>% 
  group_by(Subject)%>%
  mutate(Sum_Dose = cumsum(Dose)) %>% 
  select(Subject, Period, Sum_Dose, Day)

Note that arranging by Day before calling cumsum() is only a precaution, in case dat is not already sorted or the date ranges overlap.

# A tibble: 392 x 5
# Groups:   Subject [3]
   Subject Period  Dose DAY        Sum_Dose
   <fct>   <fct>  <dbl> <date>        <dbl>
 1 13434   MAD      400 2017-04-18      400
 2 13434   MAD      400 2017-04-19      800
 3 13434   MAD      400 2017-04-20     1200
 4 13434   MAD      400 2017-04-21     1600
 5 13434   MAD      400 2017-04-22     2000
 6 13434   MAD      400 2017-04-23     2400
 7 13434   MAD      400 2017-04-24     2800
 8 13434   MAD      400 2017-04-25     3200
 9 13434   MAD      400 2017-04-26     3600
10 13434   MAD      400 2017-04-27     4000
# ... with 382 more rows
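Why the sort before cumsum() matters can be seen on a tiny made-up example in base R. The data frame `doses` below is hypothetical and deliberately out of date order; it is not taken from the question.

```r
# Hypothetical data: three days for one subject, not in date order.
doses <- data.frame(
  Subject = "A",
  Day     = as.Date(c("2020-01-03", "2020-01-01", "2020-01-02")),
  Dose    = c(300, 100, 200),
  stringsAsFactors = FALSE
)

# Without sorting, the running total follows row order, not calendar order.
unsorted <- cumsum(doses$Dose)  # 300 400 600

# Sorting by Subject and Day first gives the running total per calendar day.
sorted <- doses[order(doses$Subject, doses$Day), ]
sorted$Sum_Dose <- ave(sorted$Dose, sorted$Subject, FUN = cumsum)

sorted$Sum_Dose  # 100 300 600
```

The final total is the same either way; only the intermediate values attached to each day differ.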

data.table

The data.table version implements the same approach but is less verbose, because several of the steps are done implicitly by data.table.

library(data.table)
setDT(dat)[, rn := .I][
  , .(Subject, Period, Dose, Day = seq(Start, End, "1 day")), by = rn][
    order(Day), .(Period, Sum_Dose = cumsum(Dose), Day), keyby = Subject]

     Subject Period Sum_Dose        Day
  1:   13434    MAD      400 2017-04-18
  2:   13434    MAD      800 2017-04-19
  3:   13434    MAD     1200 2017-04-20
  4:   13434    MAD     1600 2017-04-21
  5:   13434    MAD     2000 2017-04-22
 ---
388:   14544    OSE   128800 2019-02-05
389:   14544    OSE   129600 2019-02-06
390:   14544    OSE   130400 2019-02-07
391:   14544    OSE   131200 2019-02-08
392:   14544    OSE   132000 2019-02-09
