dplyr filter by multiple conditions including date
My dataset:
> as_tibble(wq4)
# A tibble: 58,538 x 4
Date Site Analyte Value2
<date> <fct> <fct> <dbl>
1 2014-01-10 N2 Ammonia NH3-N 0.01
2 2014-01-10 N2 Chlorophyll - a 1.5
3 2014-01-10 N2 Filtered Total Phosphorus 0.005
4 2014-01-10 N2 Oxidised Nitrogen 0.1
5 2014-01-10 N2 Total Nitrogen 0.3
6 2014-01-10 N2 Total Phosphorus 0.008
7 2014-01-10 N2 Ammonia NH3-N 0.02
8 2014-01-10 N2 Chlorophyll - a 1.4
9 2014-01-10 N2 Conductivity 191
10 2014-01-10 N2 Enterococci 19
# … with 58,528 more rows
I want to filter out a certain set of values based on multiple conditions using dplyr. What I've got so far is:
filter(wq4, Site != "N1" & !Date %in% c("2019-04-17", "2019-04-18", "2019-04-19"))
I essentially want to remove any data from 17 to 19 April 2019, but only at Site N1 (not at any of my other sites).
I don't think this line of code is working for me. Is it the "&", or is dplyr perhaps struggling with the date format?
Any suggestions? Thanks.
Your sample data does not include "N1", but here's a guess:
filter(wq4, Site != "N1" | !between(Date, as.Date("2019-04-17"), as.Date("2019-04-19")))
This returns every row for sites other than "N1" (any date), plus the rows for site "N1" whose dates fall outside that range.
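To see why `|` behaves differently from the `&` in the question, here is a minimal sketch on made-up data. The `toy` tibble, its `Value` column, and the `start`/`end` names are hypothetical and only stand in for `wq4`; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: two sites, five consecutive days each
toy <- tibble(
  Date  = rep(seq(as.Date("2019-04-16"), as.Date("2019-04-20"), by = "day"), times = 2),
  Site  = rep(c("N1", "N2"), each = 5),
  Value = 1:10
)

start <- as.Date("2019-04-17")
end   <- as.Date("2019-04-19")

# The question's logic: `&` keeps a row only if BOTH conditions hold,
# so every N1 row is dropped regardless of its date
with_and <- filter(toy, Site != "N1" & !between(Date, start, end))

# The logic above: keep a row if it is not N1 OR its date is outside the range
with_or <- filter(toy, Site != "N1" | !between(Date, start, end))

# The same thing by De Morgan's law: negate "N1 and inside the range"
with_not <- filter(toy, !(Site == "N1" & between(Date, start, end)))

nrow(with_and)  # 2 rows: only N2 outside the range survives
nrow(with_or)   # 7 rows: all of N2 plus N1 outside the range
```

On this toy data, `with_or` and `with_not` contain the same rows, while `with_and` also loses the N2 rows inside the date range and all of N1.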
You can still use your %in% logic if you prefer; I offered !between as an alternative for two reasons:
Date objects are not necessarily integral: diff(c(Sys.Date(), Sys.Date() + 0.1)) returns 0.1 days, showing that a Date is stored as a floating-point number. If your dates are all clearly integral and nothing could have nudged them off a whole day, then your %in% should work just fine, but along the lines of "Why are these numbers not equal?", floating-point equality is not assured.
As an example:
Sys.Date()
# [1] "2020-09-19"
Sys.Date() %in% as.Date("2020-09-19")
# [1] TRUE
(Sys.Date() + 0.1)
# [1] "2020-09-19"   # still looks integral
(Sys.Date() + 0.1) %in% as.Date("2020-09-19")
# [1] FALSE
If you want to span more than a few days, it is more efficient to compare against the start and end dates than to list every... possible... date.
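Both reasons can be illustrated in base R. This is only a sketch: the names `exact`, `nudged`, `start`, `end`, and `cleaned` are made up for the illustration, and it uses `==` rather than `%in%` for the equality test.

```r
# Hypothetical dates: one exact, one nudged by a tenth of a day
exact  <- as.Date("2019-04-18")
nudged <- exact + 0.1

# Both print as the same calendar day ...
format(c(exact, nudged))

# ... but the underlying numbers differ, so an equality test fails
as.numeric(nudged) - as.numeric(exact)
nudged == exact

# A range test needs only the two endpoints and still catches the nudged value
start <- as.Date("2019-04-17")
end   <- as.Date("2019-04-19")
in_range <- nudged >= start & nudged <= end

# One possible way to restore whole-day values before exact matching
cleaned <- as.Date(floor(as.numeric(nudged)), origin = "1970-01-01")
cleaned == exact
```

If your dates might carry a fractional part, flooring them as in the last step is one option before relying on exact matching; the range test does not need it.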
Try with:
library(dplyr)
wq4 %>%
  filter(!(Site == "N1" &
           Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19"))))
And the same expression with subset:
subset(wq4, !(Site == "N1" &
              Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19"))))
Site == "N1" & Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19")) selects the rows which you want to remove, so we put a ! in front of it to keep everything else.
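It can help to look at the rows you are about to remove before discarding them. A small base R sketch of that idea, using a made-up `toy` data frame in place of `wq4` (the names `drop_dates`, `to_drop`, `removed`, and `kept` are hypothetical):

```r
# Hypothetical toy data: two sites, five consecutive days each
toy <- data.frame(
  Date = rep(seq(as.Date("2019-04-16"), as.Date("2019-04-20"), by = "day"), times = 2),
  Site = rep(c("N1", "N2"), each = 5)
)

drop_dates <- as.Date(c("2019-04-17", "2019-04-18", "2019-04-19"))

# Logical mask marking the rows to remove
to_drop <- toy$Site == "N1" & toy$Date %in% drop_dates

removed <- toy[to_drop, ]   # inspect these before discarding them
kept    <- toy[!to_drop, ]  # everything else, same as negating the condition
```

Here `removed` holds the three N1 rows inside the date range and `kept` holds the remaining seven rows.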