
How can I filter multiple factor levels by multiple dates in R?

I have several groups (an ID column, named Line in the data below), and I want to filter each group by its own specific date.

mydata <- structure(list(ID = structure(c("A", "A", "A", "B", "B", "B", "C", "C", "C")), 
    Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170, 
    1357913412, 1358151763, 1358691675, 1358789411, 1359538400
    ), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1357110430, 
    1357365312, 1357564413, 1358230679, 1357978810, 1358674600, 
    1358853933, 1359531923, 1359568151), class = c("POSIXct", 
    "POSIXt"), tzone = "")), .Names = c("Line", "Start", "End"), row.names = c(NA, -9L), class = "data.frame")

I can do this one ID at a time with the following, but how can I combine it into a single line?

library(dplyr)
# Column names follow mydata above (Line, Start); compare POSIXct with POSIXct, not with Date
mydata %>% filter(Line == "A" & Start >= as.POSIXct("2013-01-01"))
mydata %>% filter(Line == "B" & Start >= as.POSIXct("2013-01-13"))
mydata %>% filter(Line == "C" & Start >= as.POSIXct("2013-01-23"))
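For just these three IDs, the single call asked for is the three per-ID conditions OR-ed together inside one filter(). The sketch below assumes a hypothetical, simplified copy of mydata (same Start values, End omitted, times fixed to UTC) so that it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for mydata: same Start values as above, End dropped, tz fixed to UTC
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
                       1358151763, 1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# One filter() call: each parenthesised term is one of the per-ID conditions
res <- mydata %>%
  filter((Line == "A" & Start >= as.POSIXct("2013-01-01", tz = "UTC")) |
         (Line == "B" & Start >= as.POSIXct("2013-01-13", tz = "UTC")) |
         (Line == "C" & Start >= as.POSIXct("2013-01-23", tz = "UTC")))
res
```

Writing the conditions out by hand becomes unwieldy as the number of IDs grows.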

If there are many dates, you can loop over them:

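library(dplyr)
library(purrr)
v1 <- unique(mydata$Line)   # "A", "B", "C", in the same order as dates
dates <- as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"))
mydata %>% 
    # one logical vector per (Line, date) pair, OR-ed together by reduce()
    filter(map2(v1, dates, ~ Line == .x & Start >= .y) %>%
             reduce(`|`))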
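The same loop-and-reduce idea also works without purrr. Below is a base-R sketch, assuming the simplified UTC copy of mydata defined in the block (End omitted); ids and cutoffs are illustrative names, not taken from the original answer.

```r
# Hypothetical stand-in for mydata (End dropped, tz fixed to UTC)
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
                       1358151763, 1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

ids     <- c("A", "B", "C")
cutoffs <- as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"), tz = "UTC")

# One logical vector per (id, cutoff) pair; indexing cutoffs[i] keeps the POSIXct class
conds <- lapply(seq_along(ids), function(i) mydata$Line == ids[i] & mydata$Start >= cutoffs[i])

# OR the vectors together: a row is kept if it satisfies its own ID's condition
keep <- Reduce(`|`, conds)
res  <- mydata[keep, ]
res
```

Iterating over seq_along(ids) rather than over the POSIXct vector itself avoids any question of whether the date class survives the iteration.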

If there are many dates, I suggest using SQL (package sqldf) or a non-equi join with data.table.

To do this, create a table holding the filter conditions, e.g.,

fc <- data.frame(Line = LETTERS[1:3],
                 dates = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23")))
fc   
  Line      dates
1    A 2013-01-01
2    B 2013-01-13
3    C 2013-01-23

(Note that dates is of class POSIXct to match Start and End.)

sqldf

library(sqldf)
sqldf("select mydata.* from mydata join fc on mydata.Line = fc.Line and mydata.Start >= fc.dates")
  Line               Start                 End
1    A 2013-01-01 12:01:00 2013-01-02 08:07:10
2    A 2013-01-03 14:51:14 2013-01-05 06:55:12
3    A 2013-01-05 08:07:24 2013-01-07 14:13:33
4    B 2013-01-14 09:22:43 2013-01-20 10:36:40
5    C 2013-01-30 10:33:20 2013-01-30 18:49:11

By the way,

sqldf("select mydata.* from mydata, fc where mydata.Line = fc.Line and mydata.Start >= fc.dates")

returns the same result.
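For comparison, the SQL join above can be mirrored in dplyr by an equi-join on Line followed by the date condition. This is a sketch, assuming the simplified UTC copies of mydata and fc defined in the block (End omitted):

```r
library(dplyr)

# Hypothetical stand-ins (End dropped, tz fixed to UTC)
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
                       1358151763, 1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)
fc <- data.frame(
  Line  = c("A", "B", "C"),
  dates = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Attach each row's cutoff via the Line key, keep rows on or after it, drop the helper column
res <- mydata %>%
  inner_join(fc, by = "Line") %>%
  filter(Start >= dates) %>%
  select(-dates)
res
```

With dplyr 1.1.0 or later, join_by(Line, Start >= dates) can express the inequality inside the join itself; by default an inequality join keeps the key columns from both tables, so dates would still need to be dropped afterwards.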

data.table

library(data.table)
setDT(mydata)[mydata[fc, on = .(Line, Start >= dates ), which = TRUE]]
   Line               Start                 End
1:    A 2013-01-01 12:01:00 2013-01-02 08:07:10
2:    A 2013-01-03 14:51:14 2013-01-05 06:55:12
3:    A 2013-01-05 08:07:24 2013-01-07 14:13:33
4:    B 2013-01-14 09:22:43 2013-01-20 10:36:40
5:    C 2013-01-30 10:33:20 2013-01-30 18:49:11

The expression

mydata[fc, on = .(Line, Start >= dates ), which = TRUE]

returns the indices of the rows of mydata that fulfil the conditions:

[1] 1 2 3 6 9
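To see what those indices mean, the same positions can be computed in base R by looking up each row's own cutoff with match(). This is a sketch assuming the simplified UTC data defined in the block, and that fc has at most one row per Line (match() only uses the first hit, whereas the join-based answers handle several conditions per Line):

```r
# Hypothetical stand-ins (End dropped, tz fixed to UTC)
mydata <- data.frame(
  Line  = rep(c("A", "B", "C"), each = 3),
  Start = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
                       1358151763, 1358691675, 1358789411, 1359538400),
                     origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)
fc <- data.frame(
  Line  = c("A", "B", "C"),
  dates = as.POSIXct(c("2013-01-01", "2013-01-13", "2013-01-23"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# For every row of mydata, find the position of its Line in fc and take that cutoff
cutoff <- fc$dates[match(mydata$Line, fc$Line)]

# Rows whose Start is on or after their own cutoff; NA cutoffs (unmatched Line) are dropped by which()
idx <- which(mydata$Start >= cutoff)
idx  # [1] 1 2 3 6 9
mydata[idx, ]
```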


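Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.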

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM