How to create aggregate data from repeated data based on dates in R

Question

I have longitudinal patient data in R. I would like to create an aggregate table like Table 2 below from Table 1, so that Table 2 has only one row per patient, with the total count of consultations before the registration date (column 3 in Table 1) and the total count of consultations after the registration date.

Table 1:

patid	consultation_date	registration_date	consultation_count
1	07/07/2016	07/07/2018	1
1	07/07/2019	07/07/2018	1
1	07/07/2020	07/07/2018	1
2	14/08/2016	07/09/2016	1
2	07/05/2015	07/09/2016	1
2	02/12/2016	07/09/2016	1

Table 2:

patid	consultation_count_pre_registration	consultation_count_post_registration
1	1	2
2	2	1

Answer 1

We could convert the date columns to Date class, then group by 'patid' and take the sum of the logical vectors obtained by comparing 'consultation_date' with 'registration_date'.

library(dplyr)
library(lubridate)
df1 %>%
    mutate(across(ends_with('date'), dmy)) %>%
    group_by(patid) %>%
    summarise(
      count_pre = sum(consultation_date < registration_date, na.rm = TRUE),
      count_post = sum(consultation_date > registration_date, na.rm = TRUE),
      .groups = 'drop')

-output

# A tibble: 2 × 3
  patid count_pre count_post
  <int>     <int>      <int>
1     1         1          2
2     2         2          1

data

df1 <- structure(list(patid = c(1L, 1L, 1L, 2L, 2L, 2L), 
consultation_date = c("07/07/2016", 
"07/07/2019", "07/07/2020", "14/08/2016", "07/05/2015", "02/12/2016"
), registration_date = c("07/07/2018", "07/07/2018", "07/07/2018", 
"07/09/2016", "07/09/2016", "07/09/2016"), consultation_count = c(1L, 
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-6L))
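
For readers who would rather avoid extra packages, the same idea (parse the dates, then sum a logical comparison per patient) can be sketched in base R. This is only an illustration, not part of the original answer: the names `cons`, `reg` and `res` are made up here, and the data frame is rebuilt so the block runs on its own.

```r
# Same data as df1 above, rebuilt so this sketch is self-contained.
df1 <- data.frame(
  patid = c(1L, 1L, 1L, 2L, 2L, 2L),
  consultation_date = c("07/07/2016", "07/07/2019", "07/07/2020",
                        "14/08/2016", "07/05/2015", "02/12/2016"),
  registration_date = c("07/07/2018", "07/07/2018", "07/07/2018",
                        "07/09/2016", "07/09/2016", "07/09/2016"),
  consultation_count = 1L
)

# Parse the day/month/year strings into Date objects.
cons <- as.Date(df1$consultation_date, format = "%d/%m/%Y")
reg  <- as.Date(df1$registration_date, format = "%d/%m/%Y")

# Sum the logical comparisons within each patient (TRUE counts as 1).
res <- data.frame(
  patid      = sort(unique(df1$patid)),
  count_pre  = as.vector(tapply(cons < reg, df1$patid, sum)),
  count_post = as.vector(tapply(cons > reg, df1$patid, sum))
)
res
```

Both comparisons are strict, so a consultation that falls exactly on the registration date would be counted in neither column; changing one of them to `<=` or `>=` would assign such rows to one side.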

Answer 2

Similar to akrun in using tidyverse, but with a slightly different approach:

library(dplyr)
library(tidyr)

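consultations  |>
    # read.table leaves the dates as text, so parse them before comparing
    mutate(across(ends_with("date"), \(x) as.Date(x, format = "%d/%m/%Y")))  |>
    mutate(period = ifelse(
        registration_date <= consultation_date,
        "after registration",
        "before registration"
    ))  |>
    group_by(patid, period)  |>
    summarise(n = n())  |>
    pivot_wider(
        names_from = period,
        values_from = n
    )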

# A tibble: 2 x 3
# Groups:   patid [2]
#   patid `after registration` `before registration`
#   <int>                <int>                 <int>
# 1     1                    2                     1
# 2     2                    1                     2

Data

consultations  <- read.table(text = "patid  consultation_date   registration_date   consultation_count
1   07/07/2016  07/07/2018  1
1   07/07/2019  07/07/2018  1
1   07/07/2020  07/07/2018  1
2   14/08/2016  07/09/2016  1
2   07/05/2015  07/09/2016  1
2   02/12/2016  07/09/2016  1", header = TRUE)
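
One thing worth checking with data read this way: `read.table` leaves the date columns as text, and day/month/year strings compare character by character rather than chronologically. The following is a small sketch of the difference, using two values from the data above (the variable names are made up for illustration):

```r
# Two values taken from patient 2 in the data above.
consultation <- "14/08/2016"   # 14 August 2016
registration <- "07/09/2016"   # 7 September 2016

# Text comparison: "0" sorts before "1", so the consultation appears to
# come after registration.
text_says_after <- registration <= consultation

# Date comparison: 14 August is earlier than 7 September.
date_says_after <- as.Date(registration, format = "%d/%m/%Y") <=
  as.Date(consultation, format = "%d/%m/%Y")

c(text = text_says_after, date = date_says_after)
```

The two comparisons disagree for this row, so per-patient totals can look plausible even when individual consultations are classified on the wrong side of the registration date.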
