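Flag consecutive dates by group - R
Below is a sample of my data (room and date). I want to generate the variables Goal1, Goal2 and Goal3. Every gap in the Date variable means the room was closed. My goal is to identify runs of consecutive dates for each room.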
     Room       Date Goal1      Goal2      Goal3
1 Upper A 2021-01-01     1 2021-01-01 2021-01-02
2 Upper A 2021-01-02     1 2021-01-01 2021-01-02
3 Upper A 2021-01-05     2 2021-01-05 2021-01-05
4 Upper A 2021-01-10     3 2021-01-10 2021-01-10
5 Upper B 2021-01-01     1 2021-01-01 2021-01-01
6 Upper B 2021-02-05     2 2021-02-05 2021-02-07
7 Upper B 2021-02-06     2 2021-02-05 2021-02-07
8 Upper B 2021-02-07     2 2021-02-05 2021-02-07
df <- data.frame("Area" = c("Upper A", "Upper A", "Upper A", "Upper A",
                            "Upper B", "Upper B", "Upper B", "Upper B"),
                 "Date" = c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                            "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"))
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
Thank you, Marvin
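You can also do it like this. Note that Goal1 here keeps counting across rooms (1 to 5) instead of restarting at 1 for each room: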
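library(dplyr)
# Assumes df is already sorted by Area and Date: the run id is computed on the
# rows in their current order, before arrange() is applied.
df %>%
  group_by(Area, Goal1 = cumsum(c(0, diff(Date)) != 1)) %>%  # new id whenever the gap is not exactly 1 day
  arrange(Area, Date) %>%
  mutate(Goal2 = min(Date),   # first date of the run
         Goal3 = max(Date))   # last date of the run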
# A tibble: 8 x 5
# Groups: Area, Goal1 [5]
Area Date Goal1 Goal2 Goal3
<chr> <date> <int> <date> <date>
1 Upper A 2021-01-01 1 2021-01-01 2021-01-02
2 Upper A 2021-01-02 1 2021-01-01 2021-01-02
3 Upper A 2021-01-05 2 2021-01-05 2021-01-05
4 Upper A 2021-01-10 3 2021-01-10 2021-01-10
5 Upper B 2021-01-01 4 2021-01-01 2021-01-01
6 Upper B 2021-02-05 5 2021-02-05 2021-02-07
7 Upper B 2021-02-06 5 2021-02-05 2021-02-07
8 Upper B 2021-02-07 5 2021-02-05 2021-02-07
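The output above numbers the runs 1 to 5 across the whole data set, while the question asks for Goal1 to restart at 1 in each room. A minimal sketch of one way to get that, assuming dplyr is available: sort first, compute the run id inside each Area, then regroup by Area and run id. The object name `per_room` is only illustrative and does not come from the original answers.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  Area = rep(c("Upper A", "Upper B"), each = 4),
  Date = as.Date(c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                   "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"),
                 format = "%m/%d/%Y")
)

per_room <- df %>%
  arrange(Area, Date) %>%          # sort before computing any gaps
  group_by(Area) %>%               # gaps are measured within a room only
  # TRUE on the first row of a room and wherever the gap is not exactly 1 day;
  # the running total of those TRUEs is the run id, restarting per room
  mutate(Goal1 = cumsum(c(TRUE, as.numeric(diff(Date)) != 1))) %>%
  group_by(Area, Goal1) %>%        # one group per run of consecutive dates
  mutate(Goal2 = min(Date),        # first date of the run
         Goal3 = max(Date)) %>%    # last date of the run
  ungroup()

per_room
```

Because the grouping by Area happens before `diff()`, the last date of one room is never compared with the first date of the next, so two rooms whose dates happen to line up cannot be merged into one run.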
# Original Data (Note I use a different method to convert the Date to date format below)
df <- data.frame("Area" = c("Upper A", "Upper A", "Upper A", "Upper A",
                            "Upper B", "Upper B", "Upper B", "Upper B"),
                 "Date" = c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                            "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"))
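Here is one possible solution. I created an extra column with a nested if_else() statement that flags the start date of each "group" of consecutive dates. I left the extra columns in the final data set to better illustrate what is happening in the code.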
library(dplyr)
library(lubridate) # I suggest lubridate for working with dates;
                   # it sticks with the dplyr/tidyverse syntax
df.grouped <- df %>%
  mutate(Date = mdy(Date)) %>%   # convert characters to actual dates (month-day-year)
  arrange(Area, Date) %>%        # order by Area, then Date
  group_by(Area) %>%
  # group_start is 1 on the first day of each run of consecutive days, 0 otherwise
  mutate(group_start = if_else(row_number() == 1, 1,
                               if_else(Date - lag(Date) == 1, 0, 1)),
         # the running total of group_start gives each run its own id within the Area
         group_id = cumsum(group_start)) %>%
  group_by(Area, group_id) %>%   # re-group by Area AND group_id
  mutate(start_date = min(Date), # first (start) and last (end) date of each run
         end_date = max(Date))
Final result:
> df.grouped
# A tibble: 8 x 6
# Groups: Area, group_id [5]
Area Date group_start group_id start_date end_date
<chr> <date> <dbl> <dbl> <date> <date>
1 Upper A 2021-01-01 1 1 2021-01-01 2021-01-02
2 Upper A 2021-01-02 0 1 2021-01-01 2021-01-02
3 Upper A 2021-01-05 1 2 2021-01-05 2021-01-05
4 Upper A 2021-01-10 1 3 2021-01-10 2021-01-10
5 Upper B 2021-01-01 1 1 2021-01-01 2021-01-01
6 Upper B 2021-02-05 1 2 2021-02-05 2021-02-07
7 Upper B 2021-02-06 0 2 2021-02-05 2021-02-07
8 Upper B 2021-02-07 0 2 2021-02-05 2021-02-07
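If one row per open period is more convenient than a flag on every date, the same run id can feed `summarise()` instead of `mutate()`. This is only a sketch under that assumption; the names `periods` and `n_days` are made up for the example and are not part of the original answers.

```r
library(dplyr)

# Same sample data as in the question, with Date parsed up front
df <- data.frame(
  Area = rep(c("Upper A", "Upper B"), each = 4),
  Date = as.Date(c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                   "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"),
                 format = "%m/%d/%Y")
)

periods <- df %>%
  arrange(Area, Date) %>%
  group_by(Area) %>%
  # run id: increases by 1 at the first row of a room and after every gap
  mutate(group_id = cumsum(c(TRUE, as.numeric(diff(Date)) != 1))) %>%
  group_by(Area, group_id) %>%
  summarise(start_date = min(Date),   # first open day of the period
            end_date   = max(Date),   # last open day of the period
            n_days     = n(),         # number of dates in the period
            .groups = "drop")

periods
```

With the sample data this collapses the eight rows into five periods, three for Upper A and two for Upper B.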