How to add a group column to an R data frame based on time ranges

Question

I have a data frame in R (thousands of rows) containing data like this:

"id","ts"
1,2010-11-11 06:00:00
2,2010-11-11 06:01:00
3,2010-11-11 06:02:00
4,2010-11-11 06:03:00
...
11,2010-11-11 06:10:00
12,2010-11-11 06:11:00
13,2010-11-11 06:12:00
14,2010-11-11 06:13:00
15,2010-11-11 06:14:00
16,2010-11-11 06:15:00
17,2010-11-11 10:00:00
18,2010-11-11 10:01:00
19,2010-11-11 10:02:00
20,2010-11-11 10:03:00
21,2010-11-11 10:04:00
22,2010-11-11 10:05:00
...

I have data like the above for many days (11 Nov 2010 to 15 Dec 2010). Ideally, each day has timestamp data (as.POSIXct, tz = "UTC") in the three time slots given below. However, some days have data for only one or two of the slots.

Slot1: 06:00:00 - 06:15:00
Slot2: 10:00:00 - 10:15:00
Slot3: 13:00:00 - 13:15:00
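To make the slot boundaries concrete, they can be stored as minutes since midnight and each timestamp checked against them. This is only a base-R sketch: slots and slot_of are hypothetical names, and it assumes the timestamps are in UTC and that seconds can be ignored, as in the sample data.

```r
# Hypothetical lookup table for the three daily slots,
# with boundaries stored as minutes since midnight (UTC)
slots <- data.frame(
  slot  = 1:3,
  start = c(6 * 60, 10 * 60, 13 * 60),                # 06:00, 10:00, 13:00
  end   = c(6 * 60 + 15, 10 * 60 + 15, 13 * 60 + 15)  # 06:15, 10:15, 13:15
)

# Return the slot number (1, 2 or 3) of each timestamp, or NA if it lies outside every slot.
# Seconds are ignored, so 10:15:59 still counts as slot 2.
slot_of <- function(ts) {
  lt  <- as.POSIXlt(ts, tz = "UTC")
  mod <- lt$hour * 60 + lt$min          # minutes since midnight
  out <- rep(NA_integer_, length(mod))
  for (i in seq_len(nrow(slots))) {
    in_slot <- mod >= slots$start[i] & mod <= slots$end[i]
    out[in_slot] <- slots$slot[i]
  }
  out
}

x <- as.POSIXct(c("2010-11-11 06:07:00", "2010-11-11 10:15:00",
                  "2010-11-11 13:00:00", "2010-11-11 12:00:00"), tz = "UTC")
slot_of(x)
# [1]  1  2  3 NA
```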

What I would like to do is add a group column based on the three time ranges above, with the group number increasing continuously through the 15 Dec 2010 data. The expected output is:

"id","ts","Group"
1,2010-11-11 06:00:00,1
2,2010-11-11 06:01:00,1
3,2010-11-11 06:02:00,1
4,2010-11-11 06:03:00,1
...
11,2010-11-11 06:10:00,1
12,2010-11-11 06:11:00,1
13,2010-11-11 06:12:00,1
14,2010-11-11 06:13:00,1
15,2010-11-11 06:14:00,1
16,2010-11-11 06:15:00,1
17,2010-11-11 10:00:00,2
18,2010-11-11 10:01:00,2
19,2010-11-11 10:02:00,2
20,2010-11-11 10:03:00,2
21,2010-11-11 10:04:00,2
22,2010-11-11 10:05:00,2
...

How could this be achieved in R?

Some reproducible sample data:

# pass tz explicitly: a trailing " UTC" inside the string is ignored by as.POSIXct
start1  <- as.POSIXct("2010-11-11 06:00:00", tz = "UTC")
end1    <- as.POSIXct("2010-11-11 06:15:00", tz = "UTC")
start2  <- as.POSIXct("2010-11-11 10:00:00", tz = "UTC")
end2    <- as.POSIXct("2010-11-11 10:15:00", tz = "UTC")
start3  <- as.POSIXct("2010-11-11 13:00:00", tz = "UTC")
end3    <- as.POSIXct("2010-11-11 13:15:00", tz = "UTC")
# one-minute timestamps for each slot (16 rows each, 48 in total)
ts1     <- data.frame(ts = seq.POSIXt(start1, end1, by = "min"))
ts2     <- data.frame(ts = seq.POSIXt(start2, end2, by = "min"))
ts3     <- data.frame(ts = seq.POSIXt(start3, end3, by = "min"))
ts      <- rbind(ts1, ts2, ts3)
id      <- data.frame(id = seq.int(1, 48, 1))
dat     <- cbind(id, ts)
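The sample above covers a single day, so it cannot show whether the numbering continues across days or skips missing slots. A hypothetical multi-day extension (the helper make_slot and the chosen dates are illustrative, not from the question) builds three days where the second day lacks slot 2 and the third day has only slot 2:

```r
# Hypothetical helper: one slot of one-minute timestamps, hh:00 to hh:15 UTC (16 rows)
make_slot <- function(day, hour) {
  start <- as.POSIXct(sprintf("%s %02d:00:00", day, hour), tz = "UTC")
  data.frame(ts = seq(start, by = "min", length.out = 16))
}

dat_multi <- rbind(
  make_slot("2010-11-11", 6), make_slot("2010-11-11", 10), make_slot("2010-11-11", 13),
  make_slot("2010-11-12", 6), make_slot("2010-11-12", 13),  # slot 2 missing on this day
  make_slot("2010-11-13", 10)                               # only slot 2 on this day
)
dat_multi$id <- seq_len(nrow(dat_multi))
dat_multi    <- dat_multi[, c("id", "ts")]
nrow(dat_multi)
# [1] 96
```

If the grouping works as intended, the first day should get groups 1 to 3, the second day 4 and 5, and the third day 6.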

Answer 1

You can extract the hour and minute values from ts and use case_when to assign the group number.

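library(dplyr)
library(lubridate)

dat %>%
  arrange(ts) %>%
  mutate(hour = hour(ts), 
         minute = minute(ts), 
         date = as.Date(ts),
         # slot number within the day (NA if ts lies outside all three slots)
         Group =  case_when(hour == 6 & minute <= 15 ~ 1L, 
                           hour == 10 & minute <= 15 ~ 2L,
                           hour == 13 & minute <= 15 ~ 3L),
         # make the value unique across days: days since the first date * 3 + slot
         Group = (as.integer(date - min(date)) * 3) + Group, 
         # renumber as 1, 2, 3, ... so that missing slots leave no gaps
         Group = match(Group, unique(Group))) -> result

result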
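The last two mutate steps are what keep the numbering continuous. A small illustration with made-up day offsets and slot numbers: day offset * 3 + slot gives a key that is unique per (day, slot) but has gaps where slots are missing, and match(key, unique(key)) turns those keys into 1, 2, 3, ... in order of first appearance, which is chronological once the data is sorted with arrange(ts).

```r
day_offset <- c(0, 0, 0, 1, 1, 2, 2)    # days since the first date (illustrative)
slot       <- c(1, 2, 3, 1, 3, 2, 2)    # slot within the day; day 1 has no slot 2
key        <- day_offset * 3 + slot     # 1 2 3 4 6 8 8: unique per (day, slot), with gaps
group      <- match(key, unique(key))   # position of each key among the distinct keys
group
# [1] 1 2 3 4 5 6 6
```

One caveat: if some rows fall outside all three slots, Group is NA before the match step, and match() gives those NA rows a group number of their own, so such rows may need to be filtered out first.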

You can keep only the columns you want with select, i.e. result %>% select(id, ts, Group).
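If dplyr and lubridate are not available, the same idea can be written in base R. This is a sketch of the answer's day offset * 3 + slot approach, not a different algorithm; add_group is a hypothetical name, and it assumes ts is a POSIXct column in UTC. The self-contained sample adds a second day that only has slot 3, to show that the numbering continues without gaps.

```r
# Base-R sketch of the answer's approach (assumes dat$ts is POSIXct in UTC)
add_group <- function(dat) {
  dat <- dat[order(dat$ts), ]                 # same role as arrange(ts)
  lt  <- as.POSIXlt(dat$ts, tz = "UTC")
  in_first_15 <- lt$min <= 15
  slot <- ifelse(lt$hour == 6  & in_first_15, 1L,
          ifelse(lt$hour == 10 & in_first_15, 2L,
          ifelse(lt$hour == 13 & in_first_15, 3L, NA_integer_)))
  day <- as.Date(dat$ts, tz = "UTC")
  key <- as.integer(day - min(day)) * 3L + slot  # unique per (day, slot)
  dat$Group <- match(key, unique(key))           # consecutive group numbers
  dat
}

# Sample: three slots on 11 Nov plus only slot 3 on 12 Nov, 16 one-minute rows each
start <- as.POSIXct(c("2010-11-11 06:00:00", "2010-11-11 10:00:00",
                      "2010-11-11 13:00:00", "2010-11-12 13:00:00"), tz = "UTC")
ts  <- rep(start, each = 16) + rep(0:15, times = length(start)) * 60
dat <- data.frame(id = seq_along(ts), ts = ts)

res <- add_group(dat)
head(res)
```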


Answer 1: accepted (2021-01-15 13:34:02)
