How to add a group column to an R data frame based on time ranges
I have a data frame in R (thousands of rows) containing data like this:
"id","ts"
1,2010-11-11 06:00:00
2,2010-11-11 06:01:00
3,2010-11-11 06:02:00
4,2010-11-11 06:03:00
...
11,2010-11-11 06:10:00
12,2010-11-11 06:11:00
13,2010-11-11 06:12:00
14,2010-11-11 06:13:00
15,2010-11-11 06:14:00
16,2010-11-11 06:15:00
17,2010-11-11 10:00:00
18,2010-11-11 10:01:00
19,2010-11-11 10:02:00
20,2010-11-11 10:03:00
21,2010-11-11 10:04:00
22,2010-11-11 10:05:00
...
I have data like the above for many days (11 Nov 2010 - 15 Dec 2010). Each day, ideally, has timestamp data (as.POSIXct, tz = "UTC") in three time slots within the ranges given below. However, some days have data for only one or two of the time slots.
Slot1: 06:00:00 - 06:15:00
Slot2: 10:00:00 - 10:15:00
Slot3: 13:00:00 - 13:15:00
What I would like to do is add a group column (a continuous group number running through the 15 Dec 2010 data) based on the above three time ranges. The expected output is:
"id","ts","Group"
1,2010-11-11 06:00:00,1
2,2010-11-11 06:01:00,1
3,2010-11-11 06:02:00,1
4,2010-11-11 06:03:00,1
...
11,2010-11-11 06:10:00,1
12,2010-11-11 06:11:00,1
13,2010-11-11 06:12:00,1
14,2010-11-11 06:13:00,1
15,2010-11-11 06:14:00,1
16,2010-11-11 06:15:00,1
17,2010-11-11 10:00:00,2
18,2010-11-11 10:01:00,2
19,2010-11-11 10:02:00,2
20,2010-11-11 10:03:00,2
21,2010-11-11 10:04:00,2
22,2010-11-11 10:05:00,2
...
How could this be achieved in R?
Some reproducible sample data is here:
# Pass the time zone via tz; a " UTC" suffix inside the string is not parsed as a zone
start1 <- as.POSIXct("2010-11-11 06:00:00", tz = "UTC")
end1 <- as.POSIXct("2010-11-11 06:15:00", tz = "UTC")
start2 <- as.POSIXct("2010-11-11 10:00:00", tz = "UTC")
end2 <- as.POSIXct("2010-11-11 10:15:00", tz = "UTC")
start3 <- as.POSIXct("2010-11-11 13:00:00", tz = "UTC")
end3 <- as.POSIXct("2010-11-11 13:15:00", tz = "UTC")
ts1 <- data.frame(ts=seq.POSIXt(start1,end1, by = "min"))
ts2 <- data.frame(ts=seq.POSIXt(start2,end2, by = "min"))
ts3 <- data.frame(ts=seq.POSIXt(start3,end3, by = "min"))
ts <- data.frame(rbind(ts1,ts2,ts3))
id <- data.frame(id=seq.int(1,48,1))
dat <- data.frame(cbind(id,ts))
You can extract the hour and minute values from ts and use case_when to assign the Group number.
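library(dplyr)
library(lubridate)

result <- dat %>%
  arrange(ts) %>%
  mutate(hour = hour(ts),
         minute = minute(ts),
         date = as.Date(ts),
         # Slot number within the day: 1, 2 or 3
         Group = case_when(hour == 6 & minute <= 15 ~ 1L,
                           hour == 10 & minute <= 15 ~ 2L,
                           hour == 13 & minute <= 15 ~ 3L),
         # Offset by the day number so every day/slot pair gets a distinct value
         Group = (as.integer(date - min(date)) * 3) + Group,
         # Renumber in order of appearance so missing slots leave no gaps
         Group = match(Group, unique(Group)))
result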
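The same idea can also be sketched in base R, without dplyr or lubridate. This is only an illustration under stated assumptions, not the method used in the answer above: make_slot, slot_start, slot_len and assign_group are hypothetical names, the sample data is made up (two days, with the 10:00 slot missing on the second day), and each slot is assumed to be a 15-minute window starting at 06:00, 10:00 or 13:00 UTC.

```r
# Hypothetical helper: one minute-by-minute slot starting at `start` (UTC).
make_slot <- function(start) {
  t0 <- as.POSIXct(start, tz = "UTC")
  seq(t0, by = "min", length.out = 16)  # 16 stamps: :00 through :15
}

# Made-up sample: day 1 has all three slots, day 2 lacks the 10:00 slot.
ts <- c(make_slot("2010-11-11 06:00:00"),
        make_slot("2010-11-11 10:00:00"),
        make_slot("2010-11-11 13:00:00"),
        make_slot("2010-11-12 06:00:00"),
        make_slot("2010-11-12 13:00:00"))
dat2 <- data.frame(id = seq_along(ts), ts = ts)

# Assumed slot definitions: start times in minutes after midnight,
# and a common slot length in minutes.
slot_start <- c(6, 10, 13) * 60
slot_len   <- 15

assign_group <- function(ts) {
  # Minutes after midnight, read in UTC
  mins <- as.integer(format(ts, "%H", tz = "UTC")) * 60 +
          as.integer(format(ts, "%M", tz = "UTC"))
  # Index of the last slot start <= mins (0 if before the first slot)
  slot <- findInterval(mins, slot_start)
  # Timestamps that fall outside every slot get NA
  inside <- slot > 0 & mins <= slot_start[pmax(slot, 1)] + slot_len
  slot[!inside] <- NA
  # Build a day/slot key, then renumber the keys consecutively
  day <- as.integer(as.Date(format(ts, "%Y-%m-%d", tz = "UTC")))
  key <- day * length(slot_start) + slot
  match(key, unique(sort(key)))
}

dat2$Group <- assign_group(dat2$ts)
table(dat2$Group)
```

Because the keys are sorted before renumbering, this sketch does not need the rows to be ordered by ts first, and any timestamp outside the three windows ends up with an NA group instead of a number.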
You can keep only the columns you want using select, i.e. result %>% select(id, ts, Group).
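If the exact slot boundaries do not matter and only runs of nearby timestamps need numbering, a gap-based grouping may be enough. Note that this is a different technique from the case_when approach: a new group starts whenever the gap to the previous timestamp exceeds a threshold. The name gap_group, the 60-minute threshold and the sample data are assumptions made for illustration.

```r
# Hypothetical helper: start a new group whenever the gap to the previous
# timestamp exceeds `max_gap_mins` minutes.
# Assumes `ts` is non-empty and sorted in ascending order.
gap_group <- function(ts, max_gap_mins = 60) {
  gaps <- diff(as.numeric(ts)) / 60   # gaps between neighbours, in minutes
  cumsum(c(TRUE, gaps > max_gap_mins))
}

# Made-up sample: two slots on one day, one slot on the next day.
t1 <- as.POSIXct("2010-11-11 06:00:00", tz = "UTC")
t2 <- as.POSIXct("2010-11-11 10:00:00", tz = "UTC")
t3 <- as.POSIXct("2010-11-12 06:00:00", tz = "UTC")
ts <- c(seq(t1, by = "min", length.out = 16),
        seq(t2, by = "min", length.out = 16),
        seq(t3, by = "min", length.out = 16))

grp <- gap_group(ts)
table(grp)
```

The trade-off is that a stray timestamp outside the three slots is not flagged; it either joins a neighbouring group or forms a group of its own, depending on the threshold.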