Select dates based on certain criteria in a dataframe
I have a data frame of 4 columns holding temperature data from 3 different locations. The stations cover different periods and were combined with rbind into a single data frame. I want to select the temperatures recorded at the dates/times (hours) that are common to all 3 stations.
Below is a reproducible example:
library(dplyr)      # mutate(), relocate(), %>%
library(lubridate)  # as_date()

set.seed(1)  # make the random temperatures reproducible

# tz = "UTC" avoids daylight-saving shifts changing the clock times
a1 <- seq.POSIXt(as.POSIXct("1995-01-01", tz = "UTC"), as.POSIXct("2007-04-01", tz = "UTC"), by = "120 min")
a2 <- seq.POSIXt(as.POSIXct("1998-04-19", tz = "UTC"), as.POSIXct("2004-03-20", tz = "UTC"), by = "60 min")
a3 <- seq.POSIXt(as.POSIXct("1991-01-01", tz = "UTC"), as.POSIXct("2001-04-01", tz = "UTC"), by = "180 min")
t1 <- runif(length(a1), min = -5, max = 45)
t2 <- runif(length(a2), min = -5, max = 45)
t3 <- runif(length(a3), min = -5, max = 45)
station1 <- data.frame(date = a1, temp = t1, ID = "station1")
station2 <- data.frame(date = a2, temp = t2, ID = "station2")
station3 <- data.frame(date = a3, temp = t3, ID = "station3")
all_stat <- rbind(station1, station2, station3)
all_stat <- all_stat %>%
  mutate(time = hms::as_hms(date),   # time of day, computed before date is overwritten
         date = as_date(date)) %>%   # calendar date
  relocate(date, time)
Ideally, I would like to get the four-column data frame (date/time/temp/ID) restricted to the dates/hours for which all 3 stations have temperature data. I tried several approaches with dplyr as well as subset, but none of them worked.
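With the question's sequences, the shared timestamps can be worked out in advance: station 1 is sampled every 2 hours and station 3 every 3 hours, both starting at midnight, while station 2 is hourly, so common times should fall every 6 hours within the period covered by all three series (1998-04-19 to 2001-04-01). Below is a minimal sketch to check this reasoning, assuming the timestamps are built in UTC (an assumption; in a local time zone, daylight-saving changes can make clock times ambiguous). The names common and expected are illustrative.

```r
# Rebuild the three timestamp series from the question, in UTC (assumption)
a1 <- seq(as.POSIXct("1995-01-01", tz = "UTC"), as.POSIXct("2007-04-01", tz = "UTC"), by = "120 min")
a2 <- seq(as.POSIXct("1998-04-19", tz = "UTC"), as.POSIXct("2004-03-20", tz = "UTC"), by = "60 min")
a3 <- seq(as.POSIXct("1991-01-01", tz = "UTC"), as.POSIXct("2001-04-01", tz = "UTC"), by = "180 min")

# Timestamps present in all three series; compare plain numeric seconds
# because intersect() does not reliably keep the POSIXct class
common <- Reduce(intersect, list(as.numeric(a1), as.numeric(a2), as.numeric(a3)))

# Expected: every 6 hours (least common multiple of 2 h and 3 h)
# from the latest start to the earliest end of the three series
expected <- seq(max(min(a1), min(a2), min(a3)),
                min(max(a1), max(a2), max(a3)),
                by = "6 hours")

stopifnot(identical(sort(common), as.numeric(expected)))
length(expected)
```

When the data are built in UTC, the number of distinct date-times in any of the filtered results below should equal length(expected), which makes a quick sanity check.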
Combine date and time to create a datetime column. Then split the datetime variable by ID, find the values common to all groups with Reduce(intersect, ...), and use them to subset the data frame so that only the dates and times shared by all IDs are kept.
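# Build a single date-time key per row
all_stat$datetime <- paste(all_stat$date, all_stat$time)
# Keep only the rows whose datetime occurs for every station ID
result <- subset(all_stat, datetime %in%
                   Reduce(intersect, split(all_stat$datetime, all_stat$ID)))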
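To see what split() and Reduce(intersect, ...) do, here is the same idea on a small hypothetical data frame (toy, with made-up datetime labels t1 to t4), where only t2 and t3 occur for all three IDs:

```r
# Toy data (hypothetical): ID "a" has t1-t3, "b" has t2-t4, "c" has t2-t3
toy <- data.frame(
  datetime = c("t1", "t2", "t3", "t2", "t3", "t4", "t2", "t3"),
  temp     = c(10, 11, 12, 20, 21, 22, 30, 31),
  ID       = c("a", "a", "a", "b", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# split() returns a list with one datetime vector per ID
by_id <- split(toy$datetime, toy$ID)

# Reduce() folds intersect() over that list: intersect(intersect(a, b), c)
common <- Reduce(intersect, by_id)
print(common)  # "t2" "t3"

# Keep only the rows whose datetime is shared by every ID
kept <- subset(toy, datetime %in% common)
print(kept)
```

Because Reduce() applies intersect() pairwise, the same line works unchanged for any number of stations.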
We can also do this in the tidyverse:
library(dplyr)
library(stringr)

all_stat %>%
  group_by(datetime = str_c(date, time)) %>%             # one group per date-time
  filter(n_distinct(ID) == n_distinct(all_stat$ID)) %>%   # keep groups seen at every station
  ungroup()
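The grouping logic can be checked on a tiny hypothetical data set (toy and n_stations are illustrative names), in which only 06:00 on 2000-01-01 is shared by all three stations. This sketch uses paste() instead of str_c() to avoid the stringr dependency; for non-missing values both produce unique keys here, so the grouping is the same.

```r
library(dplyr)

# Toy data (hypothetical): s1, s2 and s3 all report only at 06:00
toy <- data.frame(
  date = rep("2000-01-01", 5),
  time = c("00:00:00", "06:00:00", "06:00:00", "06:00:00", "12:00:00"),
  temp = c(1, 2, 3, 4, 5),
  ID   = c("s1", "s1", "s2", "s3", "s3"),
  stringsAsFactors = FALSE
)

n_stations <- n_distinct(toy$ID)  # total number of stations

shared <- toy %>%
  group_by(datetime = paste(date, time)) %>%   # one group per timestamp
  filter(n_distinct(ID) == n_stations) %>%     # keep timestamps seen at every station
  ungroup()

print(shared)
```

Computing the total number of stations once, outside the pipeline, also avoids re-evaluating n_distinct(all_stat$ID) for every group.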
Or, if we want this to be faster, use data.table:
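library(data.table)

# Convert to a data.table in place and add a date-time key
setDT(all_stat)[, datetime := paste(date, time)]
# For each datetime, return the row numbers (.I) only if every station ID is present;
# the collected row numbers (column V1) are then used to subset all_stat
sub_stat <- all_stat[all_stat[, .I[uniqueN(ID) == uniqueN(all_stat$ID)],
                              by = datetime]$V1]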
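The nested data.table call can be hard to read. On a small hypothetical table (toy, idx and n_stations are illustrative names), the inner query returns, per datetime group, the global row numbers .I only when every station is present; groups that fail the test return integer(0) and contribute no rows, so V1 ends up holding exactly the rows to keep:

```r
library(data.table)

# Toy data (hypothetical): only "t2" is reported by all three stations
toy <- data.table(
  datetime = c("t1", "t2", "t2", "t2", "t3"),
  temp     = c(1, 2, 3, 4, 5),
  ID       = c("s1", "s1", "s2", "s3", "s3")
)
n_stations <- uniqueN(toy$ID)

# Inner query: one result per datetime group; .I[FALSE] is integer(0),
# so groups missing a station add nothing to the V1 column
idx <- toy[, .I[uniqueN(ID) == n_stations], by = datetime]$V1
print(idx)

# Outer query: subset the original rows by those row numbers
print(toy[idx])
```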