Group_by and between date summary in R
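R users!
I have been struggling with this problem for the past 2 hours and cannot find a solution.
Introduction:
I am working on a COVID dataset and have to calculate the weekly incidence and prevalence for a given location.
Incidence is easy; my code looks like this:
I created a column "week", using this function to assign each date to its week: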
floor_date_by_week <- function(the_date) {
return(lubridate::date(the_date) - lubridate::wday(the_date-1) +1 )
}
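The week-flooring step can be cross-checked without lubridate. The following is only a sketch in base R: `monday_of` is a hypothetical helper name, not part of the original code, and it assumes weeks start on Monday, as the `Week` values in the toy data below suggest.

```r
# Hypothetical helper (not from the original post): Monday of the week
# containing each date. as.POSIXlt()$wday is 0 for Sunday, ..., 6 for Saturday,
# so (wday + 6) %% 7 is the number of days elapsed since Monday.
monday_of <- function(the_date) {
  the_date <- as.Date(the_date)
  the_date - (as.POSIXlt(the_date)$wday + 6) %% 7
}

# A Friday, a Sunday and a Monday
monday_of(c("2020-02-21", "2020-02-23", "2020-02-24"))
```

For these three dates the helper should return 2020-02-17, 2020-02-17 and 2020-02-24.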
Then I calculated the incidence:
output <- data %>%
group_by(week, Location) %>%
summarise(cases = n()) %>%
left_join(., pop, by = c("Location" = "Location")) %>%
mutate(inc = round(100000*cases/pop,2)) %>%
select(-pop)
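Now I have to calculate the number of currently positive patients per week and location, and it is driving me crazy.
Problem:
Each row in my dataset is one person. I have one variable for the date of infection and one for the date of recovery/death. Between these two dates the patient is positive, and I have to include them in the group_by, but I don't know how.
Example toy dataset: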
Patientid | date_of_infection | date_of_recovery_death | Location | Week |
---|---|---|---|---|
1 | 2020-02-21 | 2020-03-02 | A | 2020-02-17 |
2 | 2020-02-23 | 2020-04-15 | A | 2020-02-17 |
3 | 2020-02-26 | 2020-03-12 | B | 2020-02-24 |
... | ... | ... | ... | ... |
This may help:
df <- read.table(text = "Patientid date_of_infection date_of_recovery_death Location Week
1 2020-02-21 2020-03-02 A 2020-02-17
2 2020-02-23 2020-04-15 A 2020-02-17
3 2020-02-26 2020-03-12 B 2020-02-24", header = T)
# library() attaches one package per call; lubridate functions are called with lubridate:: below
suppressMessages(library(tidyverse))
df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
names_to = 'event', values_to = 'date') %>%
mutate(date = as.Date(date),
Week = date - lubridate::wday(date-1) +1,
dummy = ifelse(event == 'infection', 1, -1)) %>%
group_by(Week, Location) %>%
summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
arrange(Location, Week) %>%
group_by(Location) %>%
mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 5 x 4
#> Week Location active_cases_addition_or_recovered net_active_cases
#> <date> <chr> <dbl> <dbl>
#> 1 2020-02-17 A 2 2
#> 2 2020-03-02 A -1 1
#> 3 2020-04-13 A -1 0
#> 4 2020-02-24 B 1 1
#> 5 2020-03-09 B -1 0
Created on 2021-05-05 by the reprex package (v2.0.0)
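To make the idea behind this pipeline explicit: each infection counts as +1 in its week, each recovery/death as -1, and a running total per location gives the net active cases at the end of each week. Below is a minimal base R sketch of the same logic on the toy data. The names `monday_of`, `events` and `change` are illustrative assumptions, not part of the answer above.

```r
# Toy data copied from the example above
df <- data.frame(
  Patientid = 1:3,
  date_of_infection = as.Date(c("2020-02-21", "2020-02-23", "2020-02-26")),
  date_of_recovery_death = as.Date(c("2020-03-02", "2020-04-15", "2020-03-12")),
  Location = c("A", "A", "B")
)

# Hypothetical helper: Monday of the week containing each date
monday_of <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

# One row per event: +1 for an infection, -1 for a recovery/death
events <- data.frame(
  Location = rep(df$Location, 2),
  Week = c(monday_of(df$date_of_infection), monday_of(df$date_of_recovery_death)),
  change = rep(c(1, -1), each = nrow(df))
)

# Net change per week and location, then a running total within each location
net <- aggregate(change ~ Location + Week, data = events, FUN = sum)
net <- net[order(net$Location, net$Week), ]
net$net_active_cases <- ave(net$change, net$Location, FUN = cumsum)
net
```

Note that a patient who recovers during a week is already subtracted in that week, so this total describes the state at the end of the week rather than everyone who was positive at some point during it.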
If some weeks are missing from the data, filling them in makes for a better presentation:
df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
names_to = 'event', values_to = 'date') %>%
mutate(date = as.Date(date),
Week = date - lubridate::wday(date-1) +1,
dummy = ifelse(event == 'infection', 1, -1)) %>%
group_by(Week, Location) %>%
summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
complete(Week = seq.Date(min(Week), max(Week), by = '7 days'),
nesting(Location),
fill = list(active_cases_addition_or_recovered = 0)) %>%
arrange(Location, Week) %>%
group_by(Location) %>%
mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 18 x 4
#> Week Location active_cases_addition_or_recovered net_active_cases
#> <date> <chr> <dbl> <dbl>
#> 1 2020-02-17 A 2 2
#> 2 2020-02-24 A 0 2
#> 3 2020-03-02 A -1 1
#> 4 2020-03-09 A 0 1
#> 5 2020-03-16 A 0 1
#> 6 2020-03-23 A 0 1
#> 7 2020-03-30 A 0 1
#> 8 2020-04-06 A 0 1
#> 9 2020-04-13 A -1 0
#> 10 2020-02-17 B 0 0
#> 11 2020-02-24 B 1 1
#> 12 2020-03-02 B 0 1
#> 13 2020-03-09 B -1 0
#> 14 2020-03-16 B 0 0
#> 15 2020-03-23 B 0 0
#> 16 2020-03-30 B 0 0
#> 17 2020-04-06 B 0 0
#> 18 2020-04-13 B 0 0
Created on 2021-05-05 by the reprex package (v2.0.0)
library(dplyr)
library(tidyr)
library(lubridate)
df <- read.table(text = "Patientid date_of_infection date_of_recovery_death Location
1 2020-02-21 2020-03-02 A
2 2020-02-23 2020-04-15 A
3 2020-02-26 2020-03-12 B", header = T)
Floor the dates to the start of the week:
# data preparation
df <- df %>%
mutate(across(c(date_of_infection, date_of_recovery_death), as_date)) %>%
mutate(across(c(date_of_infection, date_of_recovery_death), floor_date, unit = "week", week_start = 1))
Expand the weeks of infection for each patient and location, then count the infected patients per week and location.
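# number of infected by week
# (with dplyr >= 1.1.0, use reframe() in place of summarise() here,
# since summarise() returning several rows per group is deprecated)
df %>%
  rowwise() %>%
  summarise(week = seq.Date(date_of_infection, date_of_recovery_death, by = "7 days"),
            Patientid, Location) %>%
  count(Location, week)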
#> # A tibble: 12 x 3
#> Location week n
#> <chr> <date> <int>
#> 1 A 2020-02-17 2
#> 2 A 2020-02-24 2
#> 3 A 2020-03-02 2
#> 4 A 2020-03-09 1
#> 5 A 2020-03-16 1
#> 6 A 2020-03-23 1
#> 7 A 2020-03-30 1
#> 8 A 2020-04-06 1
#> 9 A 2020-04-13 1
#> 10 B 2020-02-24 1
#> 11 B 2020-03-02 1
#> 12 B 2020-03-09 1
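The same counts can also be read as an interval-overlap test: a patient is positive in a given week if the period from infection to recovery/death touches that week. The sketch below expresses this in base R on the toy data, using the raw (unfloored) dates. `monday_of`, `grid` and `positive` are hypothetical names introduced only for illustration, and weeks are assumed to run Monday to Sunday.

```r
# Toy data copied from the example above (raw dates, not floored)
df <- data.frame(
  Patientid = 1:3,
  date_of_infection = as.Date(c("2020-02-21", "2020-02-23", "2020-02-26")),
  date_of_recovery_death = as.Date(c("2020-03-02", "2020-04-15", "2020-03-12")),
  Location = c("A", "A", "B")
)

# Hypothetical helper: Monday of the week containing each date
monday_of <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

# All week starts from the first infection to the last recovery/death
weeks <- seq(monday_of(min(df$date_of_infection)),
             max(df$date_of_recovery_death), by = "7 days")

# Every week x location combination, so weeks without patients show 0
grid <- expand.grid(Week = weeks, Location = unique(df$Location),
                    stringsAsFactors = FALSE)

# A patient is positive in a week if [infection, recovery/death]
# overlaps [Week, Week + 6]
grid$positive <- sapply(seq_len(nrow(grid)), function(i) {
  w <- grid$Week[i]
  sum(df$Location == grid$Location[i] &
        df$date_of_infection <= w + 6 &
        df$date_of_recovery_death >= w)
})
grid
```

On the toy data the non-zero rows agree with the counts above. This definition includes a patient in the week in which they recover, which is why location A shows 2 for the week of 2020-03-02, whereas a running total of infections minus recoveries gives 1 for that week. Which of the two is the right prevalence figure depends on the definition you need.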