In R, how do I calculate the difference between dates when a condition is met?
I have two data frames (df1 and df2) with start and end dates for certain events. I've determined which dates have overlapping events, defined here as a start date in df1 that falls within the start and end dates of df2. If overlap occurs, the rows are labeled TRUE; if there is no overlap, they are labeled FALSE. What I would like to know is: when Overlap is TRUE, how do I calculate the difference between the start dates in df2 and df1?
> df1$aa
date_start date_end Site
1 2002-04-14 2002-04-21 aa
2 2002-06-26 2002-07-05 aa
3 2002-08-15 2002-08-20 aa
4 2004-05-12 2004-05-19 aa
> df2$bb
date_start date_end Site
1 2002-04-13 2002-04-19 bb
2 2002-08-11 2002-08-19 bb
3 2005-06-09 2005-06-14 bb
4 2005-08-10 2005-08-14 bb
This code determines whether there is overlap:
df1$aa$Overlap <- df1$aa$date_start %in% unlist(Map(':', df2$bb$date_start, df2$bb$date_end))
> df1$aa
date_start date_end Site Overlap
1 2002-04-14 2002-04-21 aa TRUE
2 2002-06-26 2002-07-05 aa FALSE
3 2002-08-15 2002-08-20 aa TRUE
4 2004-05-12 2004-05-19 aa FALSE
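The Map(':', ...) call above expands every df2 interval into a vector of day numbers and then tests membership with %in%. A minimal sketch of an equivalent check that compares each start date against the interval bounds directly, assuming plain data frames df1 and df2 with Date columns rather than the list elements df1$aa and df2$bb (the question's data is re-created inline; the helper starts_within() is hypothetical, not from the original post):

```r
# Question's data as plain data frames with Date columns
df1 <- data.frame(
  date_start = as.Date(c("2002-04-14", "2002-06-26", "2002-08-15", "2004-05-12")),
  date_end   = as.Date(c("2002-04-21", "2002-07-05", "2002-08-20", "2004-05-19")),
  Site = "aa"
)
df2 <- data.frame(
  date_start = as.Date(c("2002-04-13", "2002-08-11", "2005-06-09", "2005-08-10")),
  date_end   = as.Date(c("2002-04-19", "2002-08-19", "2005-06-14", "2005-08-14")),
  Site = "bb"
)

# Hypothetical helper: TRUE when date d lies inside any [start, end] interval
starts_within <- function(d, starts, ends) any(d >= starts & d <= ends)

# vapply() visits each df1 start date (as.list() keeps the Date class)
# and passes the df2 bounds through as extra named arguments
df1$Overlap <- vapply(df1$date_start, starts_within, logical(1),
                      starts = df2$date_start, ends = df2$date_end)
df1
```

Comparing bounds avoids materializing every day of every interval, which matters when the intervals are long.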
You can see that there are two events (rows 1 and 3) where Overlap is TRUE. What I would like to do is determine the time difference (Diff) between date_start in df1 and df2 when Overlap equals TRUE.
The result I am looking for should look something like this:
date_start date_end Site Overlap Diff
1 2002-04-13 2002-04-21 aa TRUE 1
2 2002-08-13 2002-08-20 aa TRUE 4
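Once an overlapping pair is identified, the gap itself is a plain date subtraction. A minimal sketch using the start dates of the two overlapping pairs from the question (rows 1 and 3 of df1 against rows 1 and 2 of df2); difftime() with units = "days" pins the unit, and as.numeric() drops it to give a bare number like the Diff column:

```r
# Start dates of the two overlapping pairs: df1 rows 1 and 3, df2 rows 1 and 2
start_df1 <- as.Date(c("2002-04-14", "2002-08-15"))
start_df2 <- as.Date(c("2002-04-13", "2002-08-11"))

# difftime() returns a "difftime" object; units = "days" fixes the unit explicitly
gap <- difftime(start_df1, start_df2, units = "days")
print(gap)

# as.numeric() strips the difftime class, leaving the number of days
Diff <- as.numeric(gap)
Diff
```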
This should solve your problem with some nested for loops.
# Setup df1
df1 <- read.table(textConnection(
' date_start date_end Site
1 2002-04-14 2002-04-21 aa
2 2002-06-26 2002-07-05 aa
3 2002-08-15 2002-08-20 aa
4 2004-05-12 2004-05-19 aa'
))
df1$date_start <- as.Date(df1$date_start)
df1$date_end <- as.Date(df1$date_end)
# Setup df2
df2 <- read.table(textConnection(
' date_start date_end Site
1 2002-04-13 2002-04-19 bb
2 2002-08-11 2002-08-19 bb
3 2005-06-09 2005-06-14 bb
4 2005-08-10 2005-08-14 bb'
))
df2$date_start <- as.Date(df2$date_start)
df2$date_end <- as.Date(df2$date_end)
# Find overlap of dates
df1$Overlap <- df1$date_start %in% unlist(Map(':', df2$date_start, df2$date_end))
# Loop through rows
for (i in 1:nrow(df1)) {
# Go through only those that overlap
if (df1[i, "Overlap"]) {
# Loop through all rows in other data frame
for (j in 1:nrow(df2)) {
    # Check whether df1's start date falls within df2's [start, end] range
    if (df1[i, "date_start"] >= df2[j, "date_start"] &&
        df1[i, "date_start"] <= df2[j, "date_end"]) {
# Find absolute difference in start dates
df1[i, "diff"] <- df1[i, "date_start"] - df2[j, "date_start"]
df1[i, "diff"] <- abs(df1[i, "diff"])
}
}
}
}
# Filter and print result
df1[df1$Overlap, ]
#> date_start date_end Site Overlap diff
#> 1 2002-04-14 2002-04-21 aa TRUE 1 days
#> 3 2002-08-15 2002-08-20 aa TRUE 4 days
Created on 2020-06-15 by the reprex package (v0.3.0)
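The nested loops can also be replaced by a vectorized comparison. A minimal sketch, assuming plain data frames df1 and df2 with Date columns as built in the answer (re-created here so the block runs on its own): outer() builds a logical matrix recording which df2 interval contains each df1 start date, match(TRUE, row) picks the first such interval (NA when there is none), and difftime() computes all gaps at once. The names inside, match_idx and result are illustrative only.

```r
df1 <- data.frame(
  date_start = as.Date(c("2002-04-14", "2002-06-26", "2002-08-15", "2004-05-12")),
  date_end   = as.Date(c("2002-04-21", "2002-07-05", "2002-08-20", "2004-05-19")),
  Site = "aa"
)
df2 <- data.frame(
  date_start = as.Date(c("2002-04-13", "2002-08-11", "2005-06-09", "2005-08-10")),
  date_end   = as.Date(c("2002-04-19", "2002-08-19", "2005-06-14", "2005-08-14")),
  Site = "bb"
)

# inside[i, j] is TRUE when df1 row i starts within df2 row j's [start, end]
inside <- outer(seq_len(nrow(df1)), seq_len(nrow(df2)), function(i, j)
  df1$date_start[i] >= df2$date_start[j] & df1$date_start[i] <= df2$date_end[j])

# Index of the first containing df2 row for each df1 row (NA if none)
match_idx <- apply(inside, 1, function(row) match(TRUE, row))

df1$Overlap <- !is.na(match_idx)
# Overlap implies df1's start is on or after df2's start, so no abs() is needed
df1$Diff <- as.numeric(difftime(df1$date_start, df2$date_start[match_idx],
                                units = "days"))

# Keep only the overlapping rows, as in the desired output
result <- df1[df1$Overlap, ]
result
```

Unlike the loop, this keeps the first matching df2 interval when several contain the same start date, whereas the loop keeps the last one.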