group_by and summarise by date in R

Question

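R users!

I have been struggling with this problem for the past 2 hours and I can't find any solution.

Introduction:

I am working on a COVID dataset, and I have to calculate the weekly incidence and prevalence for each location.

Incidence is easy; my code looks like this:

I created a column "week", using this function to assign each date to its week: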

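# Floors a date to the Monday of its week
# (relies on lubridate's default, where wday() counts Sunday as day 1)
floor_date_by_week <- function(the_date) {
  return(lubridate::date(the_date) - lubridate::wday(the_date-1) +1 )
}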

Then I calculated the incidence:

output <- data %>% 
  group_by(week, Location) %>% 
  summarise(cases = n()) %>% 
  left_join(pop, by = "Location") %>% 
  mutate(inc = round(100000 * cases / pop, 2)) %>%  # cases per 100,000 inhabitants
  select(-pop)

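Now I have to calculate the number of actual positives per week and location, and it is driving me crazy.

Problem:

Each row in my dataset is one person. I have one variable for the date of infection and one for the date of recovery/death. Between these two dates the patient is positive, and I have to include them in the group_by, but I don't know how.

Toy example dataset: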

Patientid	date_of_infection	date_of_recovery_death	Location	Week
1	2020-02-21	2020-03-02	A	2020-02-17
2	2020-02-23	2020-04-15	A	2020-02-17
3	2020-02-26	2020-03-12	B	2020-02-24
...	...	...	...	...

Answer 1

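This may help. Each infection counts as +1 and each recovery/death as -1 in the week it falls in; a running sum per Location then gives the net number of active cases at the end of each week.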

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location    Week
1   2020-02-21  2020-03-02  A   2020-02-17
2   2020-02-23  2020-04-15  A   2020-02-17
3   2020-02-26  2020-03-12  B   2020-02-24", header = T)

# library() attaches one package per call; lubridate is used via lubridate:: below
suppressMessages(library(tidyverse))
df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
                    names_to = 'event', values_to = 'date') %>%
  mutate(date = as.Date(date),
         Week = date - lubridate::wday(date-1) +1,
         dummy = ifelse(event == 'infection', 1, -1)) %>%
  group_by(Week, Location) %>%
  summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
  arrange(Location, Week) %>%
  group_by(Location) %>%
  mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 5 x 4
#>   Week       Location active_cases_addition_or_recovered net_active_cases
#>   <date>     <chr>                                 <dbl>            <dbl>
#> 1 2020-02-17 A                                         2                2
#> 2 2020-03-02 A                                        -1                1
#> 3 2020-04-13 A                                        -1                0
#> 4 2020-02-24 B                                         1                1
#> 5 2020-03-09 B                                        -1                0

Created on 2021-05-05 by the reprex package (v2.0.0)

If some weeks are missing from the data, the following fills them in for a more complete picture:

df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
                    names_to = 'event', values_to = 'date') %>%
  mutate(date = as.Date(date),
         Week = date - lubridate::wday(date-1) +1,
         dummy = ifelse(event == 'infection', 1, -1)) %>%
  group_by(Week, Location) %>%
  summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
  complete(Week = seq.Date(min(Week), max(Week), by = '7 days'), 
           nesting(Location), 
           fill = list(active_cases_addition_or_recovered = 0)) %>%
  arrange(Location, Week) %>%
  group_by(Location) %>%
  mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 18 x 4
#>    Week       Location active_cases_addition_or_recovered net_active_cases
#>    <date>     <chr>                                 <dbl>            <dbl>
#>  1 2020-02-17 A                                         2                2
#>  2 2020-02-24 A                                         0                2
#>  3 2020-03-02 A                                        -1                1
#>  4 2020-03-09 A                                         0                1
#>  5 2020-03-16 A                                         0                1
#>  6 2020-03-23 A                                         0                1
#>  7 2020-03-30 A                                         0                1
#>  8 2020-04-06 A                                         0                1
#>  9 2020-04-13 A                                        -1                0
#> 10 2020-02-17 B                                         0                0
#> 11 2020-02-24 B                                         1                1
#> 12 2020-03-02 B                                         0                1
#> 13 2020-03-09 B                                        -1                0
#> 14 2020-03-16 B                                         0                0
#> 15 2020-03-23 B                                         0                0
#> 16 2020-03-30 B                                         0                0
#> 17 2020-04-06 B                                         0                0
#> 18 2020-04-13 B                                         0                0

Created on 2021-05-05 by the reprex package (v2.0.0)

Answer 2

Libraries and data

library(dplyr)
library(tidyr)
library(lubridate)

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location
1   2020-02-21  2020-03-02  A
2   2020-02-23  2020-04-15  A
3   2020-02-26  2020-03-12  B", header = T)

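Data preparation

Floor the dates to the start of the week (Monday)

# data preparation: convert to Date, then floor to the Monday of each week
df <- df %>%
  mutate(across(c(date_of_infection, date_of_recovery_death), as_date)) %>% 
  mutate(across(c(date_of_infection, date_of_recovery_death),
                ~ floor_date(.x, unit = "week", week_start = 1)))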
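
As a quick sanity check of the flooring step, the sketch below compares floor_date() with week_start = 1 against the helper from the question on a few dates. The dates vector is made up for illustration, and the comparison assumes lubridate's default lubridate.week.start option (Sunday as day 1), which the helper relies on.

```r
library(lubridate)

# Helper from the question: floors a date to the Monday of its week,
# assuming wday() counts Sunday as day 1 (the lubridate default)
floor_date_by_week <- function(the_date) {
  lubridate::date(the_date) - lubridate::wday(the_date - 1) + 1
}

# Illustrative dates (assumed): a Friday, a Sunday and a Monday
dates <- as_date(c("2020-02-21", "2020-02-23", "2020-02-24"))

# Floor each date to the Monday that starts its week
floored <- floor_date(dates, unit = "week", week_start = 1)

# TRUE when both approaches agree on every date
same_result <- all(floored == floor_date_by_week(dates))
```

If same_result is TRUE, the week column built in the question and the floored dates used here should line up when the two results are joined.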

Number of infected

Expand each patient's infection period into one row per week, then count the infected patients per week and Location.

# number of infected by week
# note: dplyr >= 1.1.0 deprecates multi-row summarise(); reframe() is its replacement there
df %>% 
  rowwise() %>% 
  summarise(week = seq.Date(date_of_infection, date_of_recovery_death, by = "7 days"),
            Patientid, Location) %>% 
  count(Location, week)

#> # A tibble: 12 x 3
#>    Location week           n
#>    <chr>    <date>     <int>
#>  1 A        2020-02-17     2
#>  2 A        2020-02-24     2
#>  3 A        2020-03-02     2
#>  4 A        2020-03-09     1
#>  5 A        2020-03-16     1
#>  6 A        2020-03-23     1
#>  7 A        2020-03-30     1
#>  8 A        2020-04-06     1
#>  9 A        2020-04-13     1
#> 10 B        2020-02-24     1
#> 11 B        2020-03-02     1
#> 12 B        2020-03-09     1
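
To turn these weekly counts into a prevalence per 100,000 inhabitants, as the question does for incidence, they can be joined to a population table. This is only a sketch: the pop table and its population figures are invented for illustration, and weekly_positive simply retypes a few rows of the output above.

```r
library(dplyr)

# Weekly counts of positive patients (a few rows copied from the output above)
weekly_positive <- tibble(
  Location = c("A", "A", "B"),
  week     = as.Date(c("2020-02-17", "2020-03-09", "2020-02-24")),
  n        = c(2L, 1L, 1L)
)

# Hypothetical population per location (made-up numbers)
pop <- tibble(
  Location = c("A", "B"),
  pop      = c(50000, 20000)
)

prevalence <- weekly_positive %>%
  left_join(pop, by = "Location") %>%
  mutate(prev = round(100000 * n / pop, 2)) %>%  # positives per 100,000
  select(-pop)
```

With real data, weekly_positive would be the result of the count() call above and pop the population table from the question.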
