In R, how do I calculate difference between dates when condition is met?
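I have two data frames (df1 and df2) that contain the start and end dates of certain events. I have identified which dates have overlapping events, defined here as a start date in df1 falling within the start and end dates of an event in df2. Rows with an overlap are flagged TRUE, and rows without one are flagged FALSE. What I would like to know is: when Overlap is TRUE, how do I calculate the difference between the start dates in df2 and df1?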
> df1$aa
date_start date_end Site
1 2002-04-14 2002-04-21 aa
2 2002-06-26 2002-07-05 aa
3 2002-08-15 2002-08-20 aa
4 2004-05-12 2004-05-19 aa
> df2$bb
date_start date_end Site
1 2002-04-13 2002-04-19 bb
2 2002-08-11 2002-08-19 bb
3 2005-06-09 2005-06-14 bb
4 2005-08-10 2005-08-14 bb
This code determines whether there is an overlap:
df1$aa$Overlap <- df1$aa$date_start %in% unlist(Map(':', df2$bb$date_start, df2$bb$date_end))
> df1$aa
date_start date_end Site Overlap
1 2002-04-14 2002-04-21 aa TRUE
2 2002-06-26 2002-07-05 aa FALSE
3 2002-08-15 2002-08-20 aa TRUE
4 2004-05-12 2004-05-19 aa FALSE
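To illustrate what the overlap check does for a single interval, here is a minimal sketch. It uses seq() with by = "day" instead of the ':' operator from the line above, which is a substitution made for readability, and the variable names are invented for the example.

```r
# One interval from df2 (row 1 of the example data)
interval_start <- as.Date("2002-04-13")
interval_end   <- as.Date("2002-04-19")

# Every calendar day covered by the interval, kept as Date values
days_in_interval <- seq(interval_start, interval_end, by = "day")

# A df1 start date counts as overlapping if it is one of those days
inside  <- as.Date("2002-04-14") %in% days_in_interval  # row 1 of df1
outside <- as.Date("2002-06-26") %in% days_in_interval  # row 2 of df1
```

The full check repeats this for every interval in df2 and pools the days before testing membership.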
You can see two events where Overlap is TRUE (rows 1 and 3). When Overlap equals TRUE, I want to determine the time difference (Diff) between the date_start of df1 and the date_start of df2.
The result I am looking for should look like this.
date_start date_end Site Overlap Diff
1 2002-04-13 2002-04-21 aa TRUE 1
2 2002-08-13 2002-08-20 aa TRUE 4
This should do the trick with some nested for loops.
# Setup df1
df1 <- read.table(textConnection(
' date_start date_end Site
1 2002-04-14 2002-04-21 aa
2 2002-06-26 2002-07-05 aa
3 2002-08-15 2002-08-20 aa
4 2004-05-12 2004-05-19 aa'
))
df1$date_start <- as.Date(df1$date_start)
df1$date_end <- as.Date(df1$date_end)
# Setup df2
df2 <- read.table(textConnection(
' date_start date_end Site
1 2002-04-13 2002-04-19 bb
2 2002-08-11 2002-08-19 bb
3 2005-06-09 2005-06-14 bb
4 2005-08-10 2005-08-14 bb'
))
df2$date_start <- as.Date(df2$date_start)
df2$date_end <- as.Date(df2$date_end)
# Find overlap of dates
df1$Overlap <- df1$date_start %in% unlist(Map(':', df2$date_start, df2$date_end))
# Loop through rows
for (i in seq_len(nrow(df1))) {
# Go through only those that overlap
if (df1[i, "Overlap"]) {
# Loop through all rows in other data frame
for (j in seq_len(nrow(df2))) {
# Check if within range of df1
sec_date_range <- df2[j, "date_start"]:df2[j, "date_end"]
if (df1[i, "date_start"] %in% sec_date_range) {
# Find absolute difference in start dates
df1[i, "diff"] <- df1[i, "date_start"] - df2[j, "date_start"]
df1[i, "diff"] <- abs(df1[i, "diff"])
}
}
}
}
# Filter and print result
df1[df1$Overlap, ]
#> date_start date_end Site Overlap diff
#> 1 2002-04-14 2002-04-21 aa TRUE 1 days
#> 3 2002-08-15 2002-08-20 aa TRUE 4 days
Created on 2020-06-15 by the reprex package (v0.3.0)
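The nested loops can also be replaced by a per-row lookup. The following is only a sketch, assuming each df1 start date falls inside at most one df2 interval: it keeps the first matching interval, whereas the loop above keeps the last. The helper name match_idx and the numeric Diff column are choices made for this example, and the dates are compared directly rather than by expanding each range.

```r
# Rebuild the example data (same values as in the question)
df1 <- data.frame(
  date_start = as.Date(c("2002-04-14", "2002-06-26", "2002-08-15", "2004-05-12")),
  date_end   = as.Date(c("2002-04-21", "2002-07-05", "2002-08-20", "2004-05-19")),
  Site       = "aa"
)
df2 <- data.frame(
  date_start = as.Date(c("2002-04-13", "2002-08-11", "2005-06-09", "2005-08-10")),
  date_end   = as.Date(c("2002-04-19", "2002-08-19", "2005-06-14", "2005-08-14")),
  Site       = "bb"
)

# For each df1 start date, the index of the first df2 interval
# that contains it, or NA if no interval does
match_idx <- vapply(seq_len(nrow(df1)), function(i) {
  hit <- which(df1$date_start[i] >= df2$date_start &
               df1$date_start[i] <= df2$date_end)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# Flag overlaps and compute the absolute difference in days
df1$Overlap <- !is.na(match_idx)
df1$Diff <- as.numeric(abs(df1$date_start - df2$date_start[match_idx]))

# Keep only the overlapping rows
result <- df1[df1$Overlap, ]
```

Rows without an overlap get NA in Diff, so filtering on Overlap leaves only the rows with a computed difference.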