
Filter dates in R with dplyr

I have a large dataset (> 3k rows) that I want to filter by geographic location and date. The location filtering works fine, but I get the following error message when using logical operators on Dates with filter() from dplyr:

Error: level sets of factors are different

My current code is below:

head(master.data)
   State.Name County.Code Latitude Longitude Arithmetic.Mean Date.Local
1     Alabama           3 30.49748 -87.88026             8.0 2014-01-02
2     Alabama           3 30.49748 -87.88026             7.0 2014-01-05
3     Alabama           3 30.49748 -87.88026             7.0 2014-01-08
4     Alabama           3 30.49748 -87.88026             3.6 2014-01-11
5     Alabama           3 30.49748 -87.88026             5.2 2014-01-14
6     Alabama           3 30.49748 -87.88026             4.4 2014-01-17  

master.data$Date.Local <- as.Date(master.data$Date.Local, format = "%Y-%m-%d")

site.info <- data.frame("Alabama", 3, 30, 90, "28/12/2015", "13/07/2016")
names(site.info) <- c("State.Name", "County.Code", "Latitude", "Longitude", 
                       "Date.Start", "Date.End")
site.info$Date.Start <- as.Date(site.info$Date.Start, format = "%d/%m/%Y")
site.info$Date.End <- as.Date(site.info$Date.End, format = "%d/%m/%Y")

reduced.data <- filter(master.data, State.Name == site.info$State.Name, 
                       Date.Local >= site.info$Date.Start 
                       & Date.Local <= site.info$Date.End)

Both site.info and master.data have their dates converted with as.Date. The input formats differ because the data are imported from external sources.

Outside of filter, I can perform logical operations on the two date columns and get the expected results, so I am not sure why it fails inside filter. Using %in% gives the same result:

Date.Local %in% c(site.info$Date.Start, site.info$Date.End)

How can I get this to work?
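
One possible cause, offered as a guess rather than a confirmed diagnosis: the message "level sets of factors are different" is what R raises when two factors with different levels are compared with ==. If State.Name is a factor in both data frames (the default for data.frame() before R 4.0.0), the State.Name comparison would trigger it, not the date comparison. A minimal sketch, using hypothetical toy data and dates that are not from the original dataset, compares the names as character strings instead:

```r
library(dplyr)

# Hypothetical toy data; State.Name is deliberately stored as a factor
master.data <- data.frame(
  State.Name      = factor(c("Alabama", "Alabama", "Alaska")),
  Arithmetic.Mean = c(8.0, 7.0, 3.6),
  Date.Local      = as.Date(c("2014-01-02", "2014-01-05", "2014-01-08"))
)

# Hypothetical site window, chosen so that it overlaps the toy dates
site.info <- data.frame(
  State.Name = factor("Alabama"),
  Date.Start = as.Date("01/01/2014", format = "%d/%m/%Y"),
  Date.End   = as.Date("06/01/2014", format = "%d/%m/%Y")
)

# Comparing the two factors directly is expected to fail here, because
# their level sets differ ("Alabama", "Alaska" versus "Alabama"):
# master.data$State.Name == site.info$State.Name

# Converting both sides to character avoids the factor comparison;
# the Date comparisons are left unchanged.
reduced.data <- filter(
  master.data,
  as.character(State.Name) == as.character(site.info$State.Name),
  Date.Local >= site.info$Date.Start,
  Date.Local <= site.info$Date.End
)
```

This assumes site.info has a single row, so each comparison is against one value. It would also explain why subset works when the State.Name condition is dropped.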

After much trying, it seems that subset works better than filter in this case:

reduced.data <- subset(master.data, Latitude %in% closest.sites$Latitude
                       & Longitude %in% closest.sites$Longitude
                       & Date.Local >= site.info$Date.Start
                       & Date.Local <= site.info$Date.End)

The above code gives me exactly the results I want.

Now I am stuck on the next step: I want to group all samples taken on the same day and find their average. Both subset and filter seem to fail in this case. R is woe.
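
Averaging by day is a grouping operation rather than a filtering one, which may be why subset and filter do not fit. A minimal sketch with dplyr's group_by and summarise follows; the toy values and the Daily.Mean column name are assumptions for illustration only:

```r
library(dplyr)

# Hypothetical toy data: two samples on the first day, one on the second
master.data <- data.frame(
  Date.Local      = as.Date(c("2014-01-02", "2014-01-02", "2014-01-05")),
  Arithmetic.Mean = c(8.0, 6.0, 5.0)
)

# One output row per distinct date, holding the mean of that day's samples
daily.mean <- master.data %>%
  group_by(Date.Local) %>%
  summarise(Daily.Mean = mean(Arithmetic.Mean, na.rm = TRUE))
```

To average per site and day instead, the site columns (for example Latitude and Longitude) could be added to group_by.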
