
Aggregate dates into date intervals / periods in R

I have the following sample data:

require(tibble)
sample_data <- tibble(
                      emp_name = c("john", "john", "john", "john","john","john", "john"), 
                      task = c("carpenter", "carpenter","carpenter", "painter", "painter", "carpenter", "carpenter"),
                      date_stamp = c("2019-01-01","2019-01-02", "2019-01-03", "2019-01-07", "2019-01-08", "2019-01-30", "2019-02-02")
                      )

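I need to aggregate these rows into intervals based on the dates.

The rule is: if successive rows for the same attributes (emp_name and task) fall on consecutive days, with no missing date between one date_stamp and the next, they should be aggregated into a single interval. Otherwise, date_stamp_from and date_stamp_to should both equal date_stamp.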

desired_result <- tibble(
                  emp_name = c("john", "john","john", "john"),
                  task = c("carpenter","painter", "carpenter", "carpenter"),
                  date_stamp_from = c("2019-01-01","2019-01-07", "2019-01-30", "2019-02-02"),
                  date_stamp_to = c("2019-01-03","2019-01-08", "2019-01-30", "2019-02-02"),
                  count_dates = c(3,2,1,1)
)

What is the most efficient way to solve this? The original data set has about 10,000 records.

We can use diff and cumsum to create the groups, then compute the first and last date_stamp and the number of rows in each group.
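To see how the grouping key is built, here is a minimal base R sketch on the dates from the sample data. The variable names (dates, gaps, starts_new_run, run_id) are made up for illustration and do not appear in the question or the answer.

```r
# Dates copied from the sample data in the question
dates <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-03",
                   "2019-01-07", "2019-01-08", "2019-01-30", "2019-02-02"))

# Gap in days between each date and the one before it
gaps <- as.numeric(diff(dates))
print(gaps)
# 1 1 4 1 22 3

# TRUE marks the start of a new run: the first row, or any gap over one day
starts_new_run <- c(TRUE, gaps > 1)

# The running count of TRUE values gives one id per run of consecutive days
run_id <- cumsum(starts_new_run)
print(run_id)
# 1 1 1 2 2 3 4
```

Rows that share a run_id form one interval, which is what the gr column does in the dplyr pipeline.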

library(dplyr)

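sample_data %>%
  # Parse the character dates so that diff() returns gaps in days
  mutate(date_stamp = as.Date(date_stamp)) %>%
  # Start a new group at the first row and wherever the gap exceeds one day
  # (assumes the rows are already sorted by date_stamp)
  group_by(gr = cumsum(c(TRUE, diff(date_stamp) > 1))) %>%
  mutate(date_stamp_from = first(date_stamp),
         date_stamp_to = last(date_stamp),
         count_dates = n()) %>%
  # Keep one row per group, then drop the helper columns
  slice(1L) %>%
  ungroup() %>%
  select(-gr, -date_stamp)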

# A tibble: 4 x 5
#  emp_name task      date_stamp_from date_stamp_to count_dates
#  <chr>    <chr>     <date>          <date>              <int>
#1 john     carpenter 2019-01-01      2019-01-03              3
#2 john     painter   2019-01-07      2019-01-08              2
#3 john     carpenter 2019-01-30      2019-01-30              1
#4 john     carpenter 2019-02-02      2019-02-02              1
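
Note that the pipeline above splits runs on date gaps only. That works for the sample data because every task change coincides with a gap, but two different tasks on consecutive days would be merged into one interval. The following is a hedged variation, not part of the original answer, that also starts a new run when the task changes and handles each emp_name separately. The function name aggregate_runs and the column run_id are hypothetical, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tibble)

sample_data <- tibble(
  emp_name = rep("john", 7),
  task = c("carpenter", "carpenter", "carpenter", "painter",
           "painter", "carpenter", "carpenter"),
  date_stamp = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-07",
                 "2019-01-08", "2019-01-30", "2019-02-02")
)

# Hypothetical helper: the name is chosen for illustration only
aggregate_runs <- function(df) {
  df %>%
    mutate(date_stamp = as.Date(date_stamp)) %>%
    # diff() is only meaningful on rows sorted by date within each employee
    arrange(emp_name, date_stamp) %>%
    group_by(emp_name) %>%
    # A new run starts at the first row, after a gap of more than one day,
    # or when the task differs from the previous row
    mutate(run_id = cumsum(c(
      TRUE,
      diff(date_stamp) > 1 | tail(task, -1) != head(task, -1)
    ))) %>%
    group_by(emp_name, task, run_id) %>%
    summarise(
      date_stamp_from = min(date_stamp),
      date_stamp_to = max(date_stamp),
      count_dates = n(),
      .groups = "drop"
    ) %>%
    # summarise() sorts by the grouping columns, so restore date order
    arrange(emp_name, date_stamp_from) %>%
    select(-run_id)
}

result <- aggregate_runs(sample_data)
print(result)
```

On the sample data this should give the same four intervals as the output above. Using summarise() in place of mutate() plus slice(1L) is a stylistic choice; for about 10,000 rows either form should be fast enough.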

