How to create aggregate data from repeated data based on a date in R
I have longitudinal patient data in R. I would like to create an aggregate table like Table 2 below from Table 1, so that Table 2 has only one row per patient, with the total count of consultations before the registration date (column 3 in Table 1) and the total count of consultations after the registration date.
Table 1:
patid | consultation_date | registration_date | consultation_count |
---|---|---|---|
1 | 07/07/2016 | 07/07/2018 | 1 |
1 | 07/07/2019 | 07/07/2018 | 1 |
1 | 07/07/2020 | 07/07/2018 | 1 |
2 | 14/08/2016 | 07/09/2016 | 1 |
2 | 07/05/2015 | 07/09/2016 | 1 |
2 | 02/12/2016 | 07/09/2016 | 1 |
Table 2:
patid | consultation_count_pre_registration | consultation_count_post_registration |
---|---|---|
1 | 1 | 2 |
2 | 2 | 1 |
We could convert the 'date' columns to Date class, then group by 'patid' and take the sum of the logical vectors obtained by comparing 'consultation_date' with 'registration_date'. The dates are in day/month/year order, hence dmy:
library(dplyr)
library(lubridate)
df1 %>%
  mutate(across(ends_with('date'), dmy)) %>%
  group_by(patid) %>%
  summarise(
    count_pre = sum(consultation_date < registration_date, na.rm = TRUE),
    count_post = sum(consultation_date > registration_date, na.rm = TRUE),
    .groups = 'drop')
-output
# A tibble: 2 × 3
patid count_pre count_post
<int> <int> <int>
1 1 1 2
2 2 2 1
df1 <- structure(list(patid = c(1L, 1L, 1L, 2L, 2L, 2L),
consultation_date = c("07/07/2016",
"07/07/2019", "07/07/2020", "14/08/2016", "07/05/2015", "02/12/2016"
), registration_date = c("07/07/2018", "07/07/2018", "07/07/2018",
"07/09/2016", "07/09/2016", "07/09/2016"), consultation_count = c(1L,
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-6L))
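Note that the strict `<` and `>` comparisons leave out any consultation that falls exactly on the registration date. The question does not say how such visits should be treated, so the following base R sketch simply assumes they count as post-registration (`>=`). It also shows the same "sum of a logical vector per group" idea without dplyr; the names `cons`, `reg`, and `res` are only illustrative.

```r
# Same example data as above, rebuilt here so the block runs on its own
df1 <- data.frame(
  patid = c(1L, 1L, 1L, 2L, 2L, 2L),
  consultation_date = c("07/07/2016", "07/07/2019", "07/07/2020",
                        "14/08/2016", "07/05/2015", "02/12/2016"),
  registration_date = c("07/07/2018", "07/07/2018", "07/07/2018",
                        "07/09/2016", "07/09/2016", "07/09/2016")
)

# Parse the day/month/year strings into Date objects
cons <- as.Date(df1$consultation_date, format = "%d/%m/%Y")
reg  <- as.Date(df1$registration_date, format = "%d/%m/%Y")

# Summing a logical vector counts its TRUE values; tapply does this per patient.
# Assumption: a consultation on the registration date counts as "post".
res <- data.frame(
  patid      = sort(unique(df1$patid)),
  count_pre  = as.integer(tapply(cons <  reg, df1$patid, sum, na.rm = TRUE)),
  count_post = as.integer(tapply(cons >= reg, df1$patid, sum, na.rm = TRUE))
)
res
```

With the example data there are no same-day consultations, so the counts agree with the dplyr result above.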
Similar to akrun's answer in using the tidyverse, but with a slightly different approach:
library(dplyr)
library(tidyr)
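consultations |>
  # Parse the dd/mm/yyyy strings first; compared as text they sort incorrectly
  mutate(across(ends_with("date"), ~ as.Date(.x, format = "%d/%m/%Y"))) |>
  mutate(period = ifelse(
    registration_date <= consultation_date,
    "after registration",
    "before registration"
  )) |>
  group_by(patid, period) |>
  summarise(n = n()) |>
  pivot_wider(
    names_from = period,
    values_from = n
  )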
# A tibble: 2 x 3
# Groups: patid [2]
# patid `after registration` `before registration`
# <int> <int> <int>
# 1 1 2 1
# 2 2 1 2
Data
consultations <- read.table(text = "patid consultation_date registration_date consultation_count
1 07/07/2016 07/07/2018 1
1 07/07/2019 07/07/2018 1
1 07/07/2020 07/07/2018 1
2 14/08/2016 07/09/2016 1
2 07/05/2015 07/09/2016 1
2 02/12/2016 07/09/2016 1", header = TRUE)
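One caveat with the pivot approach: a patient whose consultations all fall on one side of the registration date ends up with `NA` in the other column. A minimal sketch of one way around this, using the `values_fill` argument of `pivot_wider()`. The `visits` data frame and patient 3 are invented purely for illustration and are not part of the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: patient 3 is made up and has no consultations
# before registration
visits <- data.frame(
  patid = c(1L, 1L, 3L),
  consultation_date = as.Date(c("2016-07-07", "2019-07-07", "2020-01-15")),
  registration_date = as.Date(c("2018-07-07", "2018-07-07", "2019-01-01"))
)

wide <- visits |>
  mutate(period = ifelse(registration_date <= consultation_date,
                         "after registration", "before registration")) |>
  count(patid, period) |>            # one row per patient and period
  pivot_wider(
    names_from = period,
    values_from = n,
    values_fill = 0L                 # missing combinations become 0, not NA
  )
wide
```

Here `count()` replaces the `group_by()` plus `summarise(n = n())` pair and returns an ungrouped result, which is a stylistic choice rather than a requirement.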