
Select dates based on certain criteria in a dataframe

I have a data frame with 4 columns holding temperature data from 3 different locations. The records cover different periods and were combined into a single data frame with rbind. I want to select the temperatures recorded at dates and times (hours) that are common to all 3 stations.

Below I provide a reproducible example:

library(dplyr)      # %>%, mutate(), relocate()
library(lubridate)  # as_date()

a1 <- seq.POSIXt(as.POSIXct("1995-01-01"), as.POSIXct("2007-04-01"), by = "120 min")
a2 <- seq.POSIXt(as.POSIXct("1998-04-19"), as.POSIXct("2004-03-20"), by = "60 min")
a3 <- seq.POSIXt(as.POSIXct("1991-01-01"), as.POSIXct("2001-04-01"), by = "180 min")


t1 <- runif(length(a1), min = -5, max = 45)
t2 <- runif(length(a2), min = -5, max = 45)
t3 <- runif(length(a3), min = -5, max = 45)


station1 <- data.frame(date = a1, temp = t1, ID = "station1")
station2 <- data.frame(date = a2, temp = t2, ID = "station2")
station3 <- data.frame(date = a3, temp = t3, ID = "station3")

all_stat <- rbind(station1,station2,station3)


all_stat <- all_stat %>%
  mutate(time = hms::as_hms(date),
         date = as_date(date)) %>%
  relocate(date, time)

Ideally I would like a four-column data frame (date/time/temp/ID) containing only the temperature records whose date and hour are common to all 3 stations. I tried several approaches with dplyr as well as subset, but nothing worked.

Combine date and time to create a datetime column. Split the datetime variable by ID, find the values common to every station using Reduce with intersect, and use them to subset the data frame so that only the dates and times shared by all IDs are kept.

all_stat$datetime <- paste(all_stat$date, all_stat$time)
result <- subset(all_stat, datetime %in% 
                    Reduce(intersect, split(all_stat$datetime, all_stat$ID)))
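To see what the split/Reduce step does, here is a minimal sketch on toy data. The data frame `toy` and its labels ("t1", "A", and so on) are made up for illustration and are not part of the question's data.

```r
# Toy data (hypothetical): three stations with partly overlapping timestamps
toy <- data.frame(
  datetime = c("t1", "t2", "t3",   # station A
               "t2", "t3", "t4",   # station B
               "t3", "t2", "t5"),  # station C
  ID = rep(c("A", "B", "C"), each = 3),
  stringsAsFactors = FALSE
)

# split() returns one character vector of timestamps per station
by_station <- split(toy$datetime, toy$ID)

# Reduce() folds intersect() over the list: intersect(intersect(A, B), C)
common <- Reduce(intersect, by_station)

# Keep only the rows whose timestamp occurs at every station
toy_common <- subset(toy, datetime %in% common)
```

Only "t2" and "t3" occur at all three stations, so `toy_common` keeps the six rows carrying those timestamps.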

We can also do this in the tidyverse.

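library(dplyr)
library(stringr)
all_stat %>%
    # one group per date/time combination
    group_by(datetime = str_c(date, time, sep = " ")) %>%
    # keep only the groups in which every station is present
    filter(n_distinct(ID) == n_distinct(all_stat$ID)) %>%
    ungroup()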
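The grouped filter may be easier to follow on a small example. The sketch below uses a made-up tibble `toy` and a helper variable `n_stations`; neither appears in the original answer.

```r
library(dplyr)

# Toy data (hypothetical): "t2" and "t3" occur at all three stations
toy <- tibble(
  datetime = c("t1", "t2", "t3", "t2", "t3", "t4", "t3", "t2", "t5"),
  ID       = rep(c("A", "B", "C"), each = 3)
)

# Total number of stations in the data
n_stations <- n_distinct(toy$ID)

toy_common <- toy %>%
  group_by(datetime) %>%
  # a timestamp survives only if every station reported it
  filter(n_distinct(ID) == n_stations) %>%
  ungroup()
```

Inside `filter()`, `n_distinct(ID)` is evaluated once per timestamp group, so the comparison keeps or drops each group as a whole.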

Or, if we want to make this faster, use data.table:

library(data.table)
setDT(all_stat)[, datetime := paste(date, time)]
sub_stat <- all_stat[all_stat[, .I[uniqueN(ID) == uniqueN(all_stat$ID)],
             by = datetime]$V1]
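Whichever approach is used, it can be worth checking that the result really contains only shared timestamps. The following base R sketch does that; the helper name `all_timestamps_shared` and the toy data are assumptions made for illustration. It expects a data frame with `datetime` and `ID` columns, as built above.

```r
# Hypothetical helper: TRUE if every timestamp in `df` appears for all stations
all_timestamps_shared <- function(df, n_stations) {
  # number of distinct stations observed for each timestamp
  stations_per_ts <- tapply(df$ID, df$datetime,
                            function(id) length(unique(id)))
  all(stations_per_ts == n_stations)
}

# Toy data (hypothetical): only "t3" is reported by all three stations
toy <- data.frame(
  datetime = c("t1", "t2", "t2", "t3", "t3", "t3"),
  ID       = c("A",  "A",  "B",  "A",  "B",  "C"),
  stringsAsFactors = FALSE
)

unfiltered_ok <- all_timestamps_shared(toy, 3)                           # FALSE
filtered_ok   <- all_timestamps_shared(subset(toy, datetime == "t3"), 3) # TRUE
```

Applied to the real data, the call would look like `all_timestamps_shared(result, 3)`, assuming `result` is the subset produced earlier.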

