
How to calculate duration of time between two dates

I'm working with a large data set in RStudio that includes multiple test scores for the same individuals. I've filtered my data set to display the same individual's scores in two consecutive rows, with the test date for each test administration in one column. My data appears as follows:

id  test_date     score    baseline_number_1    baseline_number_2 
1   08/15/2017    21.18          Baseline             N/A
1   08/28/2019    28.55             N/A             Baseline
2   11/22/2017    33.38          Baseline             N/A
2   11/06/2019    35.3              N/A             Baseline
3   07/25/2018    30.77          Baseline             N/A
3   07/31/2019    33.42             N/A             Baseline

I would like to calculate the total duration of time between the baseline 1 and baseline 2 administrations and store that value in a new column. So my first question is: what is the best way to calculate the duration of time between two dates? Second, what is the best way to condense each individual's data into one row, so that the difference between test scores is easier to calculate and can be stored in a new column?

Thank you for any assistance!

This is a solution within the tidyverse. The packages we are going to use are dplyr and tidyr.

First, we create the dataset (you would read it from a file instead) and convert the date strings to Date format:

library(dplyr)
library(tidyr)

dataset <- read.table(text = "id  test_date     score    baseline_number_1    baseline_number_2 
1   08/15/2017    21.18          Baseline             N/A
1   08/28/2019    28.55             N/A             Baseline
2   11/22/2017    33.38          Baseline             N/A
2   11/06/2019    35.3              N/A             Baseline
3   07/25/2018    30.77          Baseline             N/A
3   07/31/2019    33.42             N/A             Baseline", header = TRUE, na.strings = "N/A")  # treat "N/A" as missing
# Parse the month/day/year strings into Date objects
dataset$test_date <- as.Date(dataset$test_date, format = "%m/%d/%Y")

#   id  test_date score baseline_number_1 baseline_number_2
# 1  1 2017-08-15 21.18          Baseline              <NA>
# 2  1 2019-08-28 28.55              <NA>          Baseline
# 3  2 2017-11-22 33.38          Baseline              <NA>
# 4  2 2019-11-06 35.30              <NA>          Baseline
# 5  3 2018-07-25 30.77          Baseline              <NA>
# 6  3 2019-07-31 33.42              <NA>          Baseline
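
Once the column holds Date objects, the duration between two dates is plain subtraction. A minimal base R sketch, using the two test dates of id 1 from the data above (the variable names are illustrative only, not part of the original answer):

```r
# Two test dates for id 1, parsed from month/day/year strings
first_test  <- as.Date("08/15/2017", format = "%m/%d/%Y")
second_test <- as.Date("08/28/2019", format = "%m/%d/%Y")

# Subtracting two Dates returns a difftime object measured in days
elapsed <- second_test - first_test
print(elapsed)  # Time difference of 743 days

# difftime() lets you choose the unit explicitly
elapsed_weeks <- difftime(second_test, first_test, units = "weeks")

# as.numeric() drops the difftime class when a plain number is needed
elapsed_days <- as.numeric(elapsed, units = "days")
```

A plain numeric value is usually easier to work with if the duration is later used in arithmetic or a model.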

Each individual's data can be condensed into one row, and the time between the two baselines computed, as follows:

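dataset %>% 
  # Number the rows within each id: 1 = first baseline, 2 = second baseline
  group_by(id) %>% 
  mutate(number = row_number()) %>% 
  ungroup() %>% 
  # Spread test_date and score into one row per id
  pivot_wider(
    id_cols = id,
    names_from = number, 
    values_from = c(test_date, score), 
    names_glue = "{.value}_{number}"
    ) %>% 
  # Subtracting two Date columns gives a difftime in days
  mutate(
    time_between = test_date_2 - test_date_1
  )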

Brief explanation: first we create the variable number, which indicates the baseline number in each row; then we use pivot_wider to make the dataset wider, i.e. one row for each id along with its features; finally we create the variable time_between, which contains the difference in days between the two baselines. If you are not familiar with some of these functions, I suggest you break the pipeline after each operation and analyse it step by step.
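
Breaking the pipeline into steps might look like the sketch below. It uses a small stand-in data frame (ids 1 and 2 only, dates already parsed), and the names step1 and step2 are hypothetical, chosen only for illustration:

```r
library(dplyr)
library(tidyr)

# Small stand-in for the data above (ids 1 and 2 only)
dataset <- data.frame(
  id = c(1, 1, 2, 2),
  test_date = as.Date(c("2017-08-15", "2019-08-28", "2017-11-22", "2019-11-06")),
  score = c(21.18, 28.55, 33.38, 35.3)
)

# Step 1: number the rows within each id (1 = first test, 2 = second test)
step1 <- dataset %>%
  group_by(id) %>%
  mutate(number = row_number()) %>%
  ungroup()
print(step1)

# Step 2: spread test_date and score into one row per id
step2 <- step1 %>%
  pivot_wider(
    id_cols = id,
    names_from = number,
    values_from = c(test_date, score),
    names_glue = "{.value}_{number}"
  )
print(step2)
```

Note that row_number() follows the current row order, so this numbering assumes the rows are already sorted by test date within each id. Adding arrange(id, test_date) before group_by() would guard against unsorted input.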

Final output

# A tibble: 3 x 6
#      id test_date_1 test_date_2 score_1 score_2 time_between
#   <int> <date>      <date>        <dbl>   <dbl> <drtn>      
# 1     1 2017-08-15  2019-08-28     21.2    28.6 743 days    
# 2     2 2017-11-22  2019-11-06     33.4    35.3 714 days    
# 3     3 2018-07-25  2019-07-31     30.8    33.4 371 days
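
The question also asks for the difference between test scores. With the data in one row per id, that is one more column in the same mutate() call. The following is a sketch rather than part of the original answer: the tibble wide re-creates the table above by hand (with unrounded scores), and the column names days_between and score_diff are assumptions:

```r
library(dplyr)

# Wide table as produced above, re-created by hand for a self-contained example
wide <- tibble(
  id = 1:3,
  test_date_1 = as.Date(c("2017-08-15", "2017-11-22", "2018-07-25")),
  test_date_2 = as.Date(c("2019-08-28", "2019-11-06", "2019-07-31")),
  score_1 = c(21.18, 33.38, 30.77),
  score_2 = c(28.55, 35.3, 33.42)
)

result <- wide %>%
  mutate(
    time_between = test_date_2 - test_date_1,                 # difftime in days
    days_between = as.numeric(time_between, units = "days"),  # plain number of days
    score_diff   = score_2 - score_1                          # change in score
  )
print(result)
```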
