
How to calculate an end date in R based on the next start date, and reshape the data into a daily count / time series?

Beginner here again.

I have been looking for an answer on Stack Overflow, without success.

If you know of online tutorials that explain how I should/could tackle these problems, I would love to hear about them.

DATA

test <- structure(list(record_id = c(110032, 110032, 110321, 110321, 
110032, 110032, 110032, 110032, 110321), start_fu = structure(c(16302, 
16302, 17308, 17308, 16302, 16302, 16302, 16302, 17308), class = "Date"), 
    end_fu = structure(c(17033, 17033, 17828, 17828, 17033, 17033, 
    17033, 17033, 17828), class = "Date"), start_course = structure(c(16301, 
    16302, 17307, 17308, 16355, 16325, 16344, 16499, 17824), class = "Date"), 
    course = structure(c(0, 1, 3, 3, 5, 3, 0, 3, 0), class = c("haven_labelled", 
    "vctrs_vctr", "double"))), row.names = c(NA, -9L), groups = structure(list(
    record_id = c(110032, 110321), .rows = structure(list(c(1L, 
    2L, 5L, 6L, 7L, 8L), c(3L, 4L, 9L)), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

EXPLANATION AND VARIABLES

So I collected follow-up data from multiple records; here I am showing two of them. During follow-up, these people can switch courses, and the start date of each course has been recorded.

  • record_id = unique ID of each individual
  • start_fu = start of follow-up
  • end_fu = end of follow-up
  • start_course = start date of the course
  • course = which course was started

QUESTION 1

I want to create a variable called stop_course. It is the start_course of the next course of the same record_id (in date order) minus 1 day. If there is no next course, stop_course should be the end_fu date.
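A minimal base-R sketch of this rule, assuming no packages: the data frame below is a hand-built copy of `test` (with `course` as a plain number instead of `haven_labelled`, and `start_fu` left out because it is not needed). The rows are sorted by record and start date first, then each row is compared with the row after it.

```r
# Hand-built copy of the question's `test` data (assumption: course kept as a
# plain number instead of haven_labelled; start_fu omitted as it is not needed)
test <- data.frame(
  record_id    = c(110032, 110032, 110321, 110321, 110032, 110032, 110032, 110032, 110321),
  end_fu       = as.Date(c(17033, 17033, 17828, 17828, 17033, 17033, 17033, 17033, 17828),
                         origin = "1970-01-01"),
  start_course = as.Date(c(16301, 16302, 17307, 17308, 16355, 16325, 16344, 16499, 17824),
                         origin = "1970-01-01"),
  course       = c(0, 1, 3, 3, 5, 3, 0, 3, 0)
)

# "Next course" only makes sense in date order within each record
test <- test[order(test$record_id, test$start_course), ]
rownames(test) <- NULL

n <- nrow(test)
# TRUE where the following row belongs to the same record
has_next <- c(test$record_id[-1] == test$record_id[-n], FALSE)
# Start date of the following row (NA after the last row)
next_start <- c(test$start_course[-1], as.Date(NA))

# Day before the next course starts; otherwise the end of follow-up.
# ifelse() drops the Date class, so convert the result back.
test$stop_course <- as.Date(ifelse(has_next, next_start - 1, test$end_fu),
                            origin = "1970-01-01")
test
```

The sort step matters: in `test` the rows of 110032 are not in date order, so taking "the next row" without sorting would pair courses incorrectly. With sorting, the course that starts on 2014-10-12 gets a stop_course of 2015-03-04, the day before the 2015-03-05 course.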

EXPECTED OUTPUT 1

| record_id | start_fu   | end_fu     | start_course | course | stop_course |
|-----------|------------|------------|--------------|--------|-------------|
|    110032 | 2014-08-20 | 2016-08-20 | 2014-08-19   | 0      | 2014-08-19  |
|    110032 | 2014-08-20 | 2016-08-20 | 2014-08-20   | 1      | 2014-09-11  |
|    110032 | 2014-08-20 | 2016-08-20 | 2014-09-12   | 3      | 2014-09-30  |
|    110032 | 2014-08-20 | 2016-08-20 | 2014-10-01   | 0      | 2014-10-11  |
|    110032 | 2014-08-20 | 2016-08-20 | 2014-10-12   | 5      | 2015-03-04  |
|    110032 | 2014-08-20 | 2016-08-20 | 2015-03-05   | 3      | 2016-08-20  |
|    110321 | 2017-05-22 | 2018-10-24 | 2017-05-21   | 3      | 2017-05-21  |
|    110321 | 2017-05-22 | 2018-10-24 | 2017-05-22   | 3      | 2018-10-19  |
|    110321 | 2017-05-22 | 2018-10-24 | 2018-10-20   | 0      | 2018-10-24  |

QUESTION 2

Finally, I want to create, per record_id, a day-by-day list of their courses. That is, create a variable day_count that counts the days from the record's first start_course (day 0).
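One way to look at this: each course covers the closed interval from start_course to stop_course, so the daily list is every interval expanded into its individual days. A base-R sketch, assuming the stop_course values from Expected Output 1 are already available (typed in by hand below, with `course` as a plain number and the fifth 110032 course ending on 2015-03-04):

```r
# Course intervals per record, typed in from EXPECTED OUTPUT 1
# (assumption: course kept as a plain number instead of haven_labelled)
courses <- data.frame(
  record_id    = c(rep(110032, 6), rep(110321, 3)),
  start_course = as.Date(c("2014-08-19", "2014-08-20", "2014-09-12", "2014-10-01",
                           "2014-10-12", "2015-03-05",
                           "2017-05-21", "2017-05-22", "2018-10-20")),
  stop_course  = as.Date(c("2014-08-19", "2014-09-11", "2014-09-30", "2014-10-11",
                           "2015-03-04", "2016-08-20",
                           "2017-05-21", "2018-10-19", "2018-10-24")),
  course       = c(0, 1, 3, 0, 5, 3, 3, 3, 0)
)

# Days covered by each course, counting both start and stop day
n_days <- as.integer(courses$stop_course - courses$start_course) + 1L

# Repeat every course row once per day it covers
idx   <- rep(seq_len(nrow(courses)), times = n_days)
daily <- courses[idx, c("record_id", "course")]

# 0, 1, 2, ... within each course, added to that course's start date
daily$date <- courses$start_course[idx] + (sequence(n_days) - 1L)

# Day counter that restarts at 0 for each record
daily$day_count <- ave(seq_along(daily$record_id), daily$record_id,
                       FUN = seq_along) - 1L

rownames(daily) <- NULL
daily <- daily[, c("record_id", "day_count", "date", "course")]
head(daily)
```

Because the last stop_course of each record equals its end_fu, this list runs through the end of follow-up: 733 days for 110032 and 522 days for 110321.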

EXPECTED OUTPUT 2

| record_id | day_count | date       | course |
|-----------|-----------|------------|--------|
|    110032 | 0         | 2014-08-19 | 0      |
|    110032 | 1         | 2014-08-20 | 1      |
|    110032 | 2         | 2014-08-21 | 1      |
|       ... | ...       | ...        | ...    |
|    110032 | 24        | 2014-09-12 | 3      |
|    110032 | 25        | 2014-09-13 | 3      |
|       ... | ...       | ...        | ...    |

I hope you can help me with the code or point me to some good tutorials.

BW KB

Using dplyr and tidyr, here is one way:

We can use lead() to get the next start_course date and subtract 1 day from it, with the last end_fu value of each record_id as the default. We can then create a sequence of dates from the first to the last start_course, fill the course value downwards onto the new days, and create a day_count column.

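library(dplyr)
library(tidyr)

test %>%
  # lead() takes the next row, so sort by start date within each record first
  arrange(record_id, start_course) %>%
  group_by(record_id) %>%
  # Question 1: day before the next course starts; last course ends at end_fu
  mutate(stop_course = lead(start_course - 1, default = last(end_fu))) %>%
  # Question 2: add a row for every day between the first and last start_course
  complete(start_course = seq(min(start_course), max(start_course), 'day')) %>%
  select(-ends_with('fu'), -stop_course) %>%
  # carry each course forward onto the newly added days
  fill(course) %>%
  mutate(day_count = row_number() - 1) %>%
  rename(date = start_course)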


#   record_id date          course day_count
#       <dbl> <date>     <dbl+lbl>     <dbl>
# 1    110032 2014-08-19         0         0
# 2    110032 2014-08-20         1         1
# 3    110032 2014-08-21         1         2
# 4    110032 2014-08-22         1         3
# 5    110032 2014-08-23         1         4
# 6    110032 2014-08-24         1         5
# 7    110032 2014-08-25         1         6
# 8    110032 2014-08-26         1         7
# 9    110032 2014-08-27         1         8
#10    110032 2014-08-28         1         9
# … with 707 more rows
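The pipeline above builds its daily sequence from `min(start_course)` to `max(start_course)`, so each series stops on the day the last course begins (2015-03-05 for 110032 and 2018-10-20 for 110321), which gives the 717 rows shown (10 printed plus 707 more). If the list should instead run to the end of follow-up, one possible variant, sketched under the assumption that `end_fu` is constant within a record, uses `max(end_fu)` as the upper bound. It also sorts the rows explicitly before `lead()`, since `lead()` takes the next row and the 110032 rows in `test` are not in date order, and it keeps the Question 1 result in its own object instead of dropping it. The data frame is a hand-built copy of `test` with `course` as a plain number.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of `test` (assumption: course as a plain number, start_fu omitted)
test <- data.frame(
  record_id    = c(110032, 110032, 110321, 110321, 110032, 110032, 110032, 110032, 110321),
  end_fu       = as.Date(c(17033, 17033, 17828, 17828, 17033, 17033, 17033, 17033, 17828),
                         origin = "1970-01-01"),
  start_course = as.Date(c(16301, 16302, 17307, 17308, 16355, 16325, 16344, 16499, 17824),
                         origin = "1970-01-01"),
  course       = c(0, 1, 3, 3, 5, 3, 0, 3, 0)
)

# Question 1: sort first, then look one row ahead within each record
courses <- test %>%
  arrange(record_id, start_course) %>%
  group_by(record_id) %>%
  mutate(stop_course = lead(start_course - 1, default = last(end_fu))) %>%
  ungroup()

# Question 2: one row per day from the first start_course up to end_fu
daily <- courses %>%
  group_by(record_id) %>%
  complete(start_course = seq(min(start_course), max(end_fu), by = "day")) %>%
  fill(course) %>%                        # carry each course forward
  mutate(day_count = row_number() - 1) %>%
  ungroup() %>%
  select(record_id, day_count, date = start_course, course)

daily
```

With this upper bound, 110032 gets 733 daily rows ending on 2016-08-20 and 110321 gets 522 ending on 2018-10-24.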


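Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.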

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM