Counting Observations Within Date Range in R

This probably has a really simple solution. I have two data sets. One is a vector of POSIXct tweet timestamps and the second is a vector of POSIXct ADL HEAT Map timestamps.

I'm looking to build a function that takes the dates from the tweets vector and, for each one, counts the number of timestamps in the ADL HEAT Map vector that fall within a specified range of the tweet.

My aim is to build the function so that I can pass in the tweets vector, the ADL vector, the number of days from each tweet at which to start counting, and the number of days from each tweet at which to stop counting, and get back a vector of counts the same length as the tweets data.

I already tried the solution here, and it didn't work: Count number of occurrences in date range in R

Here's an example of what I'm trying to do, using a smaller version of my data sets:

tweets <- c("2016-12-12 14:34:00 GMT", "2016-12-5 17:20:06 GMT")
ADLData <- c("2016-12-11 16:30:00 GMT", "2016-12-7 18:00:00 GMT", "2016-12-2 09:10:00 GMT")

I want to create a function, let's call it countingfunction, that lets me input the first data set, the second one, and a number of days to look back. In this example, I chose 7 days:

countingfunction(tweets, ADLData, 7)

Ideally this would return a vector the same length as tweets (2 in this case), where each element counts how many events in ADLData occurred within the 7 days before the corresponding date in tweets. In this case, c(2,1).

So, if I have understood you correctly, you have this kind of data:

tweets <- c(as.POSIXct("2020-08-16", tz = ""), as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""), as.POSIXct("2020-08-13", tz = ""))
ADL <- c(as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""))

And what you want to do is to say whether a tweet is within the ADL date range or not. That could be accomplished like this:

ifelse(tweets %in% ADL, "its in", "its not")

You can easily assign the result to another vector, which then states for each tweet whether it is in or not.
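As a small sketch of that assignment: the vectors from this answer are rebuilt here with tz = "UTC" (an assumption, made only so the block runs on its own), and in_adl is a hypothetical name for the result. Note that %in% only matches timestamps that are exactly equal.

```r
# Same dates as above; tz = "UTC" is an assumption for reproducibility
tweets <- as.POSIXct(c("2020-08-16", "2020-08-15", "2020-08-14", "2020-08-13"),
                     tz = "UTC")
ADL <- as.POSIXct(c("2020-08-15", "2020-08-14"), tz = "UTC")

# One label per tweet; %in% tests for an exact timestamp match
in_adl <- ifelse(tweets %in% ADL, "its in", "its not")
in_adl
```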

You can write countingfunction with the help of outer, which lets you calculate the difference in time between every pair of values from the two vectors using difftime.

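countingfunction <- function(x1, x2, n) {
  # Time difference in days for every (x1, x2) pair; rows follow x1
  mat <- outer(x1, x2, difftime, units = 'days')
  # Per x1 value, count the x2 values that occurred up to n days earlier
  rowSums(mat > 0 & mat <= n)
}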

Assuming you have vectors of class POSIXct like these:

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = 'GMT')
ADLData <- as.POSIXct(c("2016-12-11 16:30:00","2016-12-7 18:00:00", 
                        "2016-12-2 09:10:00"), tz = 'GMT')
n <- 7

You can pass them as:

countingfunction(tweets, ADLData, n)
#[1] 2 1
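
The question also asks for separate "start counting" and "stop counting" offsets. A minimal sketch of one way to extend the idea follows; the name count_in_window and the arguments start and stop are hypothetical, and the window is assumed to be measured in days before each tweet, excluding start and including stop. It works on the numeric seconds behind the POSIXct values rather than on difftime.

```r
# Hypothetical variant: count x2 timestamps that fall more than `start`
# and at most `stop` days before each x1 timestamp.
count_in_window <- function(x1, x2, start, stop) {
  # POSIXct values are seconds since the epoch; convert pairwise lags to days
  lag_days <- outer(as.numeric(x1), as.numeric(x2), "-") / 86400
  # Rows follow x1, so rowSums gives one count per x1 value
  rowSums(lag_days > start & lag_days <= stop)
}

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-05 17:20:06"), tz = "GMT")
ADLData <- as.POSIXct(c("2016-12-11 16:30:00", "2016-12-07 18:00:00",
                        "2016-12-02 09:10:00"), tz = "GMT")

count_in_window(tweets, ADLData, 0, 7)   # same window as countingfunction(..., 7)
count_in_window(tweets, ADLData, 4, 12)  # events 4 to 12 days before each tweet
```

With start = 0 this should reproduce the counts above. Like the outer version, it builds a full length(x1) by length(x2) matrix, so memory use grows with the product of the two lengths.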
