Select dates based on certain criteria in a dataframe

Question

I have a data frame with 4 columns, holding temperature data from 3 different locations and different periods, combined with rbind into a single data frame. I want to select the temperatures recorded at the dates/times (hours) that are common to all 3 stations.

Below I provide a reproducible example:

library(dplyr)      # %>%, mutate(), relocate()
library(lubridate)  # as_date()

a1 <- seq.POSIXt(as.POSIXct("1995-01-01"), as.POSIXct("2007-04-01"), by = "120 min")
a2 <- seq.POSIXt(as.POSIXct("1998-04-19"), as.POSIXct("2004-03-20"), by = "60 min")
a3 <- seq.POSIXt(as.POSIXct("1991-01-01"), as.POSIXct("2001-04-01"), by = "180 min")


t1 <- runif(length(a1), min = -5, max = 45)
t2 <- runif(length(a2), min = -5, max = 45)
t3 <- runif(length(a3), min = -5, max = 45)


station1 <- data.frame(date = a1, temp = t1, ID = "station1")
station2 <- data.frame(date = a2, temp = t2, ID = "station2")
station3 <- data.frame(date = a3, temp = t3, ID = "station3")

all_stat <- rbind(station1,station2,station3)


all_stat <- all_stat %>%
  mutate(time = hms::as_hms(date),
         date = as_date(date)) %>%
  relocate(date, time)

Ideally I would like a four-column data frame (date/time/temp/ID) containing only the temperature data for the dates/hours common to all 3 stations. I tried several things with dplyr as well as subset, but nothing worked.

Answer 1

Combine date and time into a single datetime column. Then split the datetime values by ID, find the values common to every ID with Reduce and intersect, and use them to subset the data frame so that only the dates and times shared by all IDs remain.

all_stat$datetime <- paste(all_stat$date, all_stat$time)
result <- subset(all_stat, datetime %in% 
                    Reduce(intersect, split(all_stat$datetime, all_stat$ID)))
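To see what the Reduce(intersect, split(...)) step does on its own, here is a minimal sketch on a small toy data frame. The station names (s1, s2, s3) and timestamps are made up for illustration and are not part of the question's data.

```r
# Toy data (hypothetical values): three stations with partly overlapping timestamps
toy <- data.frame(
  datetime = c("2000-01-01 00:00:00", "2000-01-01 06:00:00",
               "2000-01-01 00:00:00", "2000-01-01 03:00:00", "2000-01-01 06:00:00",
               "2000-01-01 06:00:00", "2000-01-01 09:00:00"),
  ID = c("s1", "s1", "s2", "s2", "s2", "s3", "s3")
)

# One character vector of timestamps per station
per_station <- split(toy$datetime, toy$ID)

# Fold intersect() over the list: (s1 and s2) first, then that result and s3
common <- Reduce(intersect, per_station)
common
# [1] "2000-01-01 06:00:00"

# Keep only the rows whose timestamp is shared by every station
toy_common <- subset(toy, datetime %in% common)
toy_common
```

Only 06:00 is present at all three stations, so toy_common keeps one row per station.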

Answer 2

We can do this with the tidyverse: group by the combined date and time, then keep only the groups in which every ID appears.

library(dplyr)   
library(stringr)
all_stat %>%       
    group_by(datetime = str_c(date, time)) %>%
    filter(n_distinct(ID) == n_distinct(all_stat$ID))
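As a possible variation, the same grouped filter can be written without building a string key, by grouping on the date and time columns directly. This is a sketch on toy data, not the answer's exact code; the toy values, the name n_stations and the trailing ungroup() are assumptions added for illustration.

```r
library(dplyr)

# Toy data (hypothetical values) with the same four columns as all_stat
toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                   "2000-01-01", "2000-01-01", "2000-01-02")),
  time = c("00:00:00", "06:00:00", "00:00:00",
           "06:00:00", "06:00:00", "06:00:00"),
  temp = c(10.5, 12.1, 9.8, 11.4, 13.0, 8.2),
  ID   = c("s1", "s1", "s2", "s2", "s3", "s3")
)

# Number of stations, computed once outside the pipeline
n_stations <- n_distinct(toy$ID)

common <- toy %>%
  group_by(date, time) %>%                  # one group per date/hour
  filter(n_distinct(ID) == n_stations) %>%  # keep groups seen at every station
  ungroup()                                 # drop grouping for later steps

common
```

Here only 2000-01-01 06:00:00 is recorded by all three stations, so common has three rows.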

Or, if we want to make this faster, use data.table:

library(data.table)
setDT(all_stat)[, datetime := paste(date, time)]
sub_stat <- all_stat[all_stat[, .I[uniqueN(ID) == uniqueN(all_stat$ID)],
             by = datetime]$V1]
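The nested call above can be hard to read, so here is a sketch that splits it into two steps on toy data. The inner expression returns the row numbers (.I) of every datetime group that contains all stations, and the outer call subsets by those row numbers. The toy values and the names idx and n_stations are assumptions for illustration.

```r
library(data.table)

# Toy data (hypothetical values)
toy <- data.table(
  datetime = c("2000-01-01 00:00:00", "2000-01-01 06:00:00",
               "2000-01-01 00:00:00", "2000-01-01 06:00:00",
               "2000-01-01 06:00:00", "2000-01-01 09:00:00"),
  ID = c("s1", "s1", "s2", "s2", "s3", "s3")
)

n_stations <- uniqueN(toy$ID)

# Inner step: per datetime group, keep the row numbers only when the group
# contains every station; the unnamed result column is called V1
idx <- toy[, .I[uniqueN(ID) == n_stations], by = datetime]$V1
idx
# [1] 2 4 5

# Outer step: subset the table by those row numbers
toy[idx]
```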


Answer 1
1 2021-05-31 12:22:21

Answer 2
1 Accepted 2021-05-31 16:53:29
