
Counting Observations Within a Date Range in R

This probably has a really simple solution. I have two data sets: one is a vector of POSIXct tweet timestamps and the other is a vector of POSIXct ADL HEAT Map timestamps.

I'm looking to build a function that takes each date in the tweets vector and counts how many timestamps in the ADL HEAT Map vector fall within a specified range of that tweet.

My aim is to build the function so that I can pass in the tweets vector, the ADL vector, the number of days from each tweet at which to start counting, and the number of days from each tweet at which to stop counting, and get back a vector of counts the same length as the tweets data.

I already tried the solution here, and it didn't work: Count number of occurences in date range in R

Here's an example of what I'm trying to do, using smaller versions of my data sets:

tweets <- c("2016-12-12 14:34:00 GMT", "2016-12-5 17:20:06 GMT")
ADLData <- c("2016-12-11 16:30:00 GMT", "2016-12-7 18:00:00 GMT", "2016-12-2 09:10:00 GMT")

I want to create a function, let's call it countingfunction, that takes the first data set, the second one, and a number of days to look back. In this example, I chose 7 days:

countingfunction(tweets, ADLData, 7)

Ideally this would return a vector the same length as tweets (in this case 2), where each element counts how many events in ADLData occurred within the 7 days before the corresponding tweet. In this case, c(2,1).
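As a quick check of the expected first count, the sketch below (assuming the timestamps are in GMT, as the strings suggest) computes how many days each ADL timestamp lies before the first tweet. Only the first two differences fall in the look-back window (0, 7], which gives the expected count of 2.

```r
# Timestamps from the example, parsed as POSIXct in GMT (assumed time zone)
tweet1 <- as.POSIXct("2016-12-12 14:34:00", tz = "GMT")
adl <- as.POSIXct(c("2016-12-11 16:30:00", "2016-12-07 18:00:00",
                    "2016-12-02 09:10:00"), tz = "GMT")

# Days from each ADL event to the tweet (positive = event happened before the tweet)
d <- as.numeric(difftime(tweet1, adl, units = "days"))
print(d)  # approximately 0.919, 4.857 and 10.225 days

# Keep only events in the 7-day look-back window (0, 7] and count them
in_window <- d > 0 & d <= 7
print(sum(in_window))
#[1] 2
```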

If I have understood you correctly, you have this kind of data:

tweets <- c(as.POSIXct("2020-08-16", tz = ""), as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""), as.POSIXct("2020-08-13", tz = ""))
ADL <- c(as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""))

What you want to do is check whether each tweet is within the ADL dates or not. That can be done like this:

ifelse(tweets %in% ADL, "it's in", "it's not")

You can easily assign the result to another vector that records whether each tweet is in or not.
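Note that %in% only matches identical timestamps. If the goal is instead to test whether each tweet falls between the earliest and the latest ADL timestamp, a plain range comparison is one possible alternative. This is a sketch under that assumption about the intended meaning; it uses tz = "UTC" rather than the local time zone so the result does not depend on the machine.

```r
# Example data from above (UTC assumed here instead of the local time zone)
tweets <- as.POSIXct(c("2020-08-16", "2020-08-15", "2020-08-14", "2020-08-13"), tz = "UTC")
ADL <- as.POSIXct(c("2020-08-15", "2020-08-14"), tz = "UTC")

# TRUE when a tweet lies between the earliest and latest ADL timestamp (inclusive)
in_range <- tweets >= min(ADL) & tweets <= max(ADL)

# Store the verdict for each tweet in its own vector
status <- ifelse(in_range, "it's in", "it's not")
print(status)
```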

You can write countingfunction with the help of outer, using difftime to calculate the time difference between every pair of values from the two vectors.

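countingfunction <- function(x1, x2, n) {
  # Matrix of time differences x1[i] - x2[j] in days (rows = x1, columns = x2)
  mat <- outer(x1, x2, difftime, units = 'days')
  # For each x1 value, count the x2 events that happened more than 0 and at most n days earlier
  rowSums(mat > 0 & mat <= n)
}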

Assuming you have vectors of class POSIXct like these:

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = 'GMT')
ADLData <- as.POSIXct(c("2016-12-11 16:30:00","2016-12-7 18:00:00", 
                        "2016-12-2 09:10:00"), tz = 'GMT')
n <- 7

You can then call it as:

countingfunction(tweets, ADLData, n)
#[1] 2 1
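The question also asks for separate start and stop offsets. A minimal sketch of that generalization is below; countingwindow and its from/to arguments are hypothetical names, and the window is assumed to be half-open, (from, to] days before each tweet, so that from = 0, to = n matches countingfunction. It loops over the tweets with vapply instead of building the full outer matrix.

```r
# Hypothetical generalization of countingfunction: counts x2 events whose lag
# (x1 - x2, in days) lies in the half-open window (from, to].
# from = 0, to = n reproduces countingfunction(x1, x2, n).
countingwindow <- function(x1, x2, from, to) {
  stopifnot(inherits(x1, "POSIXct"), inherits(x2, "POSIXct"), from <= to)
  vapply(seq_along(x1), function(i) {
    # Lag in days between tweet i and every event in x2
    lag_days <- as.numeric(difftime(x1[i], x2, units = "days"))
    sum(lag_days > from & lag_days <= to)
  }, integer(1))
}

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-05 17:20:06"), tz = 'GMT')
ADLData <- as.POSIXct(c("2016-12-11 16:30:00", "2016-12-07 18:00:00",
                        "2016-12-02 09:10:00"), tz = 'GMT')

countingwindow(tweets, ADLData, 0, 7)  # same window as countingfunction(tweets, ADLData, 7)
#[1] 2 1
countingwindow(tweets, ADLData, 2, 7)  # only events 2 to 7 days before each tweet
#[1] 1 1
```

Looping per tweet keeps memory proportional to length(x2) rather than length(x1) × length(x2); for very large inputs, sorting x2 and using findInterval would likely be faster still.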

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM