How to add group column in R dataframe based on time ranges

I have a dataframe in R (thousands of rows) containing data like this:

"id","ts"
1,2010-11-11 06:00:00
2,2010-11-11 06:01:00
3,2010-11-11 06:02:00
4,2010-11-11 06:03:00
...
11,2010-11-11 06:10:00
12,2010-11-11 06:11:00
13,2010-11-11 06:12:00
14,2010-11-11 06:13:00
15,2010-11-11 06:14:00
16,2010-11-11 06:15:00
17,2010-11-11 10:00:00
18,2010-11-11 10:01:00
19,2010-11-11 10:02:00
20,2010-11-11 10:03:00
21,2010-11-11 10:04:00
22,2010-11-11 10:05:00
...

I have data like the above for many days (11 Nov 2010 - 15 Dec 2010). Ideally, each day has timestamps (as.POSIXct, tz = "UTC") in the three time slots listed below. However, some days have data for only one or two of the slots.

Slot1: 06:00:00 - 06:15:00
Slot2: 10:00:00 - 10:15:00
Slot3: 13:00:00 - 13:15:00

What I would like to do is add a group column based on the above three time ranges, with the group number running continuously through to the 15 Dec 2010 data. The expected output is:

"id","ts","Group"
1,2010-11-11 06:00:00,1
2,2010-11-11 06:01:00,1
3,2010-11-11 06:02:00,1
4,2010-11-11 06:03:00,1
...
11,2010-11-11 06:10:00,1
12,2010-11-11 06:11:00,1
13,2010-11-11 06:12:00,1
14,2010-11-11 06:13:00,1
15,2010-11-11 06:14:00,1
16,2010-11-11 06:15:00,1
17,2010-11-11 10:00:00,2
18,2010-11-11 10:01:00,2
19,2010-11-11 10:02:00,2
20,2010-11-11 10:03:00,2
21,2010-11-11 10:04:00,2
22,2010-11-11 10:05:00,2
...

How could this be achieved in R?

Some reproducible sample data is here:

# Pass tz explicitly: a trailing " UTC" inside the string does not set the time zone
start1  <- as.POSIXct("2010-11-11 06:00:00", tz = "UTC")
end1    <- as.POSIXct("2010-11-11 06:15:00", tz = "UTC")
start2  <- as.POSIXct("2010-11-11 10:00:00", tz = "UTC")
end2    <- as.POSIXct("2010-11-11 10:15:00", tz = "UTC")
start3  <- as.POSIXct("2010-11-11 13:00:00", tz = "UTC")
end3    <- as.POSIXct("2010-11-11 13:15:00", tz = "UTC")
ts1     <- data.frame(ts=seq.POSIXt(start1,end1, by = "min"))
ts2     <- data.frame(ts=seq.POSIXt(start2,end2, by = "min"))
ts3     <- data.frame(ts=seq.POSIXt(start3,end3, by = "min"))
ts      <- data.frame(rbind(ts1,ts2,ts3))
id      <- data.frame(id=seq.int(1,48,1))
dat     <- data.frame(cbind(id,ts))

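You can extract the hour and minute values from ts and use case_when to assign the Group number. Offsetting the slot number by three per day and then re-indexing with match keeps the numbering continuous even when a day is missing a slot.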

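library(dplyr)
library(lubridate)

result <- dat %>%
  arrange(ts) %>%
  mutate(hour = hour(ts),
         minute = minute(ts),
         date = as.Date(ts),
         # Slot number within the day (NA outside the three ranges)
         Group = case_when(hour == 6 & minute <= 15 ~ 1L,
                           hour == 10 & minute <= 15 ~ 2L,
                           hour == 13 & minute <= 15 ~ 3L),
         # Offset by 3 per day so every day/slot pair gets a distinct key
         Group = (as.integer(date - min(date)) * 3) + Group,
         # Re-index the keys in order of appearance: 1, 2, 3, ...
         Group = match(Group, unique(Group)))

result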

You can keep only the columns that you want using select, i.e. result %>% select(id, ts, Group).
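
To see how the numbering behaves when a day lacks a slot, here is a minimal base R sketch of the same idea (slot number, per-day offset, then re-indexing with match). The two-day sample, the name dat2 and the helper variables are made up for illustration and are not part of the question's data; the second day deliberately has no Slot2.

```r
# Hypothetical sample: 2010-11-11 has all three slots, 2010-11-12 has no Slot2.
starts <- as.POSIXct(c("2010-11-11 06:00:00", "2010-11-11 10:00:00",
                       "2010-11-11 13:00:00", "2010-11-12 06:00:00",
                       "2010-11-12 13:00:00"), tz = "UTC")

# 16 one-minute timestamps (minute 0 to 15) for each slot start
ts   <- rep(starts, each = 16) + rep(0:15 * 60, times = length(starts))
dat2 <- data.frame(id = seq_along(ts), ts = ts)
dat2 <- dat2[order(dat2$ts), ]   # numbering relies on chronological order

# Minutes since midnight (UTC) for every row
lt   <- as.POSIXlt(dat2$ts, tz = "UTC")
mins <- lt$hour * 60 + lt$min

# Slot number within the day: 1, 2 or 3 (NA outside the three ranges)
slot_start <- c(6, 10, 13) * 60
slot <- rep(NA_integer_, nrow(dat2))
for (i in seq_along(slot_start)) {
  in_slot <- mins >= slot_start[i] & mins <= slot_start[i] + 15
  slot[in_slot] <- i
}

# One distinct key per day/slot pair, then re-index in order of appearance
day <- as.integer(as.Date(dat2$ts, tz = "UTC"))
key <- day * 3L + slot
dat2$Group <- match(key, unique(key))

table(dat2$Group)
```

Here the 06:00 slot on 2010-11-12 becomes Group 4 and the 13:00 slot becomes Group 5, so no number is skipped for the missing Slot2. Rows that fall outside all three ranges would get an NA key and end up sharing one extra group, so filter them out first if your data can contain them.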
