
Group_by and between date summary in R

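Hi R users!

I have been struggling with this problem for the last 2 hours and cannot find a solution.

Intro:

I am working on a covid dataset and have to compute the weekly incidence and prevalence for each location.

Incidence is easy; my code looks like this:

I created a column "week" that assigns each date to a week using this function: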

floor_date_by_week <- function(the_date) {
  return(lubridate::date(the_date) - lubridate::wday(the_date-1) +1 )
}
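As a quick sanity check, the helper can be compared with lubridate::floor_date() using week_start = 1 (Monday). This is a minimal sketch, assuming the inputs are Date objects and that lubridate's default week start is in effect, so that wday() returns 1 for Sunday. The three test dates are chosen only for illustration.

```r
library(lubridate)

# Same helper as in the question: floor a date to the Monday of its week
floor_date_by_week <- function(the_date) {
  lubridate::date(the_date) - lubridate::wday(the_date - 1) + 1
}

# A Friday, a Sunday, and a Monday
dates <- as.Date(c("2020-02-21", "2020-02-23", "2020-03-02"))

by_helper <- floor_date_by_week(dates)
by_floor  <- floor_date(dates, unit = "week", week_start = 1)

# Both should give the Monday that starts each week
data.frame(date = dates, by_helper = by_helper, by_floor = by_floor)
```

If the two columns agree on your real data as well, floor_date() can replace the custom helper.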

Then I computed the incidence:

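output <- data %>% 
  group_by(week, Location) %>% 
  summarise(cases = n()) %>%                     # cases per week and location
  left_join(., pop, by = "Location") %>%         # pop: table with columns Location and pop
  mutate(inc = round(100000*cases/pop,2)) %>%    # incidence per 100,000 (uses the cases column)
  select(-pop)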
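For reference, here is a self-contained sketch of the same incidence calculation. The names cases_df and pop_df and the population figures are hypothetical, made up only so the example runs. Note that the count column is named cases, so the rate has to be computed from cases.

```r
library(dplyr)

# Toy line list: one row per case, already assigned to a week
cases_df <- tibble(
  Location = c("A", "A", "B"),
  week     = as.Date(c("2020-02-17", "2020-02-17", "2020-02-24"))
)

# Hypothetical population sizes (invented for illustration)
pop_df <- tibble(
  Location = c("A", "B"),
  pop      = c(50000, 20000)
)

incidence <- cases_df %>%
  count(week, Location, name = "cases") %>%       # new cases per week and location
  left_join(pop_df, by = "Location") %>%          # attach the population size
  mutate(inc = round(100000 * cases / pop, 2)) %>% # cases per 100,000 inhabitants
  select(-pop)

incidence
```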

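Now I have to compute the number of actual positives per week and location, and it is driving me crazy.

Problem:

Each row of my dataset is one person. I have one variable with the date of infection and one with the date of recovery/death. Between these two dates the patient is positive and has to be included in the group_by, but I don't know how.

Toy example dataset:

Patientid  date_of_infection  date_of_recovery_death  Location  Week
1          2020-02-21         2020-03-02              A         2020-02-17
2          2020-02-23         2020-04-15              A         2020-02-17
3          2020-02-26         2020-03-12              B         2020-02-24
...        ...                ...                     ...       ...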

This may help:

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location    Week
1   2020-02-21  2020-03-02  A   2020-02-17
2   2020-02-23  2020-04-15  A   2020-02-17
3   2020-02-26  2020-03-12  B   2020-02-24", header = T)

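# library() attaches one package per call, so load the two packages separately
suppressMessages(library(tidyverse))
suppressMessages(library(lubridate))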
df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
                    names_to = 'event', values_to = 'date') %>%
  mutate(date = as.Date(date),
         Week = date - lubridate::wday(date-1) +1,
         dummy = ifelse(event == 'infection', 1, -1)) %>%
  group_by(Week, Location) %>%
  summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
  arrange(Location, Week) %>%
  group_by(Location) %>%
  mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 5 x 4
#>   Week       Location active_cases_addition_or_recovered net_active_cases
#>   <date>     <chr>                                 <dbl>            <dbl>
#> 1 2020-02-17 A                                         2                2
#> 2 2020-03-02 A                                        -1                1
#> 3 2020-04-13 A                                        -1                0
#> 4 2020-02-24 B                                         1                1
#> 5 2020-03-09 B                                        -1                0

Created on 2021-05-05 by the reprex package (v2.0.0)

If some weeks are missing from the data, filling them in gives a clearer presentation:

df %>% pivot_longer(c(date_of_infection, date_of_recovery_death), names_prefix = 'date_of_',
                    names_to = 'event', values_to = 'date') %>%
  mutate(date = as.Date(date),
         Week = date - lubridate::wday(date-1) +1,
         dummy = ifelse(event == 'infection', 1, -1)) %>%
  group_by(Week, Location) %>%
  summarise(active_cases_addition_or_recovered = sum(dummy), .groups = 'drop') %>%
  complete(Week = seq.Date(min(Week), max(Week), by = '7 days'), 
           nesting(Location), 
           fill = list(active_cases_addition_or_recovered = 0)) %>%
  arrange(Location, Week) %>%
  group_by(Location) %>%
  mutate(net_active_cases = cumsum(active_cases_addition_or_recovered))
#> # A tibble: 18 x 4
#>    Week       Location active_cases_addition_or_recovered net_active_cases
#>    <date>     <chr>                                 <dbl>            <dbl>
#>  1 2020-02-17 A                                         2                2
#>  2 2020-02-24 A                                         0                2
#>  3 2020-03-02 A                                        -1                1
#>  4 2020-03-09 A                                         0                1
#>  5 2020-03-16 A                                         0                1
#>  6 2020-03-23 A                                         0                1
#>  7 2020-03-30 A                                         0                1
#>  8 2020-04-06 A                                         0                1
#>  9 2020-04-13 A                                        -1                0
#> 10 2020-02-17 B                                         0                0
#> 11 2020-02-24 B                                         1                1
#> 12 2020-03-02 B                                         0                1
#> 13 2020-03-09 B                                        -1                0
#> 14 2020-03-16 B                                         0                0
#> 15 2020-03-23 B                                         0                0
#> 16 2020-03-30 B                                         0                0
#> 17 2020-04-06 B                                         0                0
#> 18 2020-04-13 B                                         0                0

Created on 2021-05-05 by the reprex package (v2.0.0)
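To see what the complete() step contributes in isolation, here is a minimal sketch on a two-row table. The table toy and its column change are hypothetical stand-ins for the summarised data: the week of 2020-02-24 has no events and is added with a change of 0.

```r
library(tidyr)
library(tibble)

# Two weekly totals for one location, with a gap between them
toy <- tibble(
  Week     = as.Date(c("2020-02-17", "2020-03-02")),
  Location = "A",
  change   = c(2, -1)
)

filled <- complete(
  toy,
  Week = seq.Date(min(Week), max(Week), by = "7 days"), # every Monday in range
  nesting(Location),                                     # only locations present in the data
  fill = list(change = 0)                                # weeks without events get 0
)

filled
```

Without the filled-in zero rows, the cumulative sum would still be correct for the weeks that exist, but weeks with no infections or recoveries would be absent from the result.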

Libraries and data

library(dplyr)
library(tidyr)
library(lubridate)

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location
1   2020-02-21  2020-03-02  A
2   2020-02-23  2020-04-15  A
3   2020-02-26  2020-03-12  B", header = T)

Data preparation

Floor the dates to the start of the week.

# data preparation
df <- df %>%
  mutate(across(c(date_of_infection, date_of_recovery_death), as_date)) %>% 
  mutate(across(c(date_of_infection, date_of_recovery_death), floor_date, unit = "week", week_start = 1))
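As far as I know, passing extra arguments through the `...` of across() is deprecated from dplyr 1.1.0 onwards. A sketch of the same preparation with an anonymous function follows; the result name df_weeks is only illustrative, and stringsAsFactors = FALSE is added so the dates are read as character on older R versions too.

```r
library(dplyr)
library(lubridate)

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location
1   2020-02-21  2020-03-02  A
2   2020-02-23  2020-04-15  A
3   2020-02-26  2020-03-12  B", header = TRUE, stringsAsFactors = FALSE)

# Parse both date columns and floor them to the Monday of their week
# in a single across() call, using a lambda instead of `...`
df_weeks <- df %>%
  mutate(across(
    c(date_of_infection, date_of_recovery_death),
    ~ floor_date(as_date(.x), unit = "week", week_start = 1)
  ))

df_weeks
```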

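Number of infected

Expand each patient's infection period into one row per week for each Patient-Location, then count the infected patients per week and Location.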

# number of infected by week
df %>% 
  rowwise() %>% 
  summarise(week = seq.Date(date_of_infection, date_of_recovery_death, by = "7 days"),
            Patientid, Location) %>% 
  count(Location, week)

#> # A tibble: 12 x 3
#>    Location week           n
#>    <chr>    <date>     <int>
#>  1 A        2020-02-17     2
#>  2 A        2020-02-24     2
#>  3 A        2020-03-02     2
#>  4 A        2020-03-09     1
#>  5 A        2020-03-16     1
#>  6 A        2020-03-23     1
#>  7 A        2020-03-30     1
#>  8 A        2020-04-06     1
#>  9 A        2020-04-13     1
#> 10 B        2020-02-24     1
#> 11 B        2020-03-02     1
#> 12 B        2020-03-09     1 
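The same counts can also be obtained without flooring the dates first, by testing directly whether each patient's infection interval overlaps each week. This is a sketch of that alternative, not the answer's own method; the names weeks and prevalence are illustrative, and it assumes a patient counts as positive on both the infection date and the recovery/death date.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- read.table(text = "Patientid  date_of_infection   date_of_recovery_death  Location
1   2020-02-21  2020-03-02  A
2   2020-02-23  2020-04-15  A
3   2020-02-26  2020-03-12  B", header = TRUE, stringsAsFactors = FALSE) %>%
  mutate(across(c(date_of_infection, date_of_recovery_death), as.Date))

# Every Monday between the first infection and the last recovery
weeks <- seq(
  floor_date(min(df$date_of_infection), unit = "week", week_start = 1),
  floor_date(max(df$date_of_recovery_death), unit = "week", week_start = 1),
  by = "7 days"
)

prevalence <- df %>%
  crossing(week = weeks) %>%                   # every patient paired with every week
  filter(date_of_infection <= week + 6,        # infected on or before the week's Sunday
         date_of_recovery_death >= week) %>%   # not yet recovered on the week's Monday
  count(Location, week, name = "active")

prevalence
```

On the toy data this reproduces the 12 rows shown above. It counts a patient in the week they recover, whereas the cumulative sum of +1/-1 events reports the net number still active at the end of each week, which is why that approach shows 1 for Location A in the week of 2020-03-02 and this one shows 2.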
