How can I filter multiple factor levels by multiple dates in R?
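I have several groups (identified by an id column, Line in the sample data), and I want to filter each group by its own specific date.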
mydata <- structure(list(ID = structure(c("A", "A", "A", "B", "B", "B", "C", "C", "C")),
Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170,
1357913412, 1358151763, 1358691675, 1358789411, 1359538400
), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1357110430,
1357365312, 1357564413, 1358230679, 1357978810, 1358674600,
1358853933, 1359531923, 1359568151), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Line", "Start", "End"), row.names = c(NA, -9L), class = "data.frame")
I can do this one group at a time with the following, but how can I combine it into a single call?
mydata %>% filter(Line == "A" & Start >= as.POSIXct("2013-01-01 00:00:00"))
mydata %>% filter(Line == "B" & Start >= as.POSIXct("2013-01-13 00:00:00"))
mydata %>% filter(Line == "C" & Start >= as.POSIXct("2013-01-23 00:00:00"))
If there are many dates, you can loop over them:
library(dplyr)
library(purrr)
v1 <- unique(mydata$Line)
dates <- as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"))
mydata %>%
  filter(map2(v1, dates, ~ Line == .x & Start >= .y) %>%
           reduce(`|`))
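The loop above builds one logical vector per (Line, date) pair and combines them with `|`. A different way to get the same rows, shown here only as a sketch, is to join a lookup table of cutoffs and filter once. The table name `cutoffs` and its column `cutoff` are made up for this illustration, and the sample data is rebuilt with tz = "UTC" (an assumption; the original uses the local time zone) so the block runs on its own.

```r
library(dplyr)

# Stand-in for mydata: same Line and Start values, fixed to UTC
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644,
                       1357834170, 1357913412, 1358151763,
                       1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC")
)

# Hypothetical lookup table: one cutoff date per Line
cutoffs <- data.frame(
  Line   = c("A", "B", "C"),
  cutoff = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"), tz = "UTC")
)

# Attach each row's cutoff, keep rows at or after it, then drop the helper column
result <- mydata %>%
  inner_join(cutoffs, by = "Line") %>%
  filter(Start >= cutoff) %>%
  select(-cutoff)
```

With a lookup table, adding another group means adding a row to `cutoffs` instead of keeping two parallel vectors in the same order.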
If there are many dates, I suggest using SQL (package sqldf) or data.table with a non-equi join.
For this, create a table that holds the filter conditions, e.g.,
fc <- data.frame(Line = LETTERS[1:3],
dates = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23")))
fc
  Line      dates
1    A 2013-01-01
2    B 2013-01-13
3    C 2013-01-23
(Note that dates is of type POSIXct so that it matches Start and End.)
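To illustrate why the types should match: a Date is stored as days since 1970-01-01, while a POSIXct is stored as seconds, so the raw numbers are on different scales. The snippet below is a small sketch of that difference; tz = "UTC" is an assumption made here to keep the numbers reproducible.

```r
d <- as.Date("2013-01-13")
p <- as.POSIXct("2013-01-13", tz = "UTC")

# Underlying numeric values: days vs. seconds since the epoch
days_since_epoch    <- as.numeric(d)   # 15718
seconds_since_epoch <- as.numeric(p)   # 1358035200

# Converting the Date to POSIXct puts both values on the same scale
d_as_posix   <- as.POSIXct(d)
same_instant <- as.numeric(d_as_posix) == seconds_since_epoch
```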
library(sqldf)
sqldf("select mydata.* from mydata join fc on mydata.Line = fc.Line and mydata.Start >= fc.dates")
  Line               Start                 End
1    A 2013-01-01 12:01:00 2013-01-02 08:07:10
2    A 2013-01-03 14:51:14 2013-01-05 06:55:12
3    A 2013-01-05 08:07:24 2013-01-07 14:13:33
4    B 2013-01-14 09:22:43 2013-01-20 10:36:40
5    C 2013-01-30 10:33:20 2013-01-30 18:49:11
By the way,
sqldf("select mydata.* from mydata, fc where mydata.Line = fc.Line and mydata.Start >= fc.dates")
returns the same result.
library(data.table)
setDT(mydata)[mydata[fc, on = .(Line, Start >= dates ), which = TRUE]]
   Line               Start                 End
1:    A 2013-01-01 12:01:00 2013-01-02 08:07:10
2:    A 2013-01-03 14:51:14 2013-01-05 06:55:12
3:    A 2013-01-05 08:07:24 2013-01-07 14:13:33
4:    B 2013-01-14 09:22:43 2013-01-20 10:36:40
5:    C 2013-01-30 10:33:20 2013-01-30 18:49:11
The expression
mydata[fc, on = .(Line, Start >= dates ), which = TRUE]
returns the indices of the rows of mydata that satisfy the conditions:
[1] 1 2 3 6 9
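As a cross-check, the same row indices can be reproduced in base R by looking up each row's cutoff with match(). This is a sketch and not part of the answer above; it rebuilds the sample data with tz = "UTC" (an assumption, since the original uses the local time zone) so that it runs without the earlier code.

```r
# Stand-in for mydata: same Line and Start values, fixed to UTC
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644,
                       1357834170, 1357913412, 1358151763,
                       1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC")
)

# Filter-condition table, as in the answer but with an explicit time zone
fc <- data.frame(
  Line  = LETTERS[1:3],
  dates = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"), tz = "UTC")
)

# For every row of mydata, pick the cutoff that belongs to its Line
row_cutoff <- fc$dates[match(mydata$Line, fc$Line)]

# Indices of the rows that start at or after their own cutoff
idx <- which(mydata$Start >= row_cutoff)   # 1 2 3 6 9
```

Subsetting with mydata[idx, ] then gives the filtered rows.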