Compare date by group in two data frames in R
I have a data frame containing event dates by id:
data.frame(id = c("a", "a", "a", "d", "d"),
date = as.Date(c("2018-01-03", "2018-02-02", "2018-02-22", "2018-02-13", "2018-05-01")))
id date
1 a 2018-01-03
2 a 2018-02-02
3 a 2018-02-22
4 d 2018-02-13
5 d 2018-05-01
And another one containing start and end dates of periods by id:
data.frame(id = c("a", "a", "d", "d", "d", "d"),
start = as.Date(c("2018-01-15", "2018-01-30", "2018-03-01", "2018-02-01", "2018-04-02", "2018-03-19")),
end = as.Date(c("2018-01-18", "2018-02-10", "2018-03-03", "2018-04-22", "2018-05-23", "2018-08-29")))
id start end
1 a 2018-01-15 2018-01-18
2 a 2018-01-30 2018-02-10
3 d 2018-03-01 2018-03-03
4 d 2018-02-01 2018-04-22
5 d 2018-04-02 2018-05-23
6 d 2018-03-19 2018-08-29
For each id, I need to count how many periods in the second data frame each date in the first data frame falls into.
My desired data frame is:
id date n
1 a 2018-01-03 0 # does not belong to any period
2 a 2018-02-02 1 # belongs to [2018-01-30,2018-02-10]
3 a 2018-02-22 0 # does not belong to any period
4 d 2018-02-13 1 # belongs to [2018-02-01,2018-04-22]
5 d 2018-05-01 2 # belongs to [2018-04-02,2018-05-23] and [2018-03-19,2018-08-29]
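My question is not about comparing dates and summarizing the results. My problem is performing these analyses within each id group. I suspect there is a way to do it with split and/or the apply family, but I have not found it.
How can I do this in base R? I work in a restricted environment where I only have access to base R.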
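A base R approach:
# df1 is the data frame of event dates and df2 the data frame of periods from the question.
# Join on id, then keep only the (date, period) pairs where the date falls inside the period.
temp <- subset( merge(df1, df2), date >= start & date <= end, select = "date" )
# Count the surviving rows per date. Matching is by date only, so this assumes
# that no date is shared between different ids.
df1$n <- sapply( df1$date, function(x) length( temp$date[ temp$date == x ] ))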
# id date n
# 1 a 2018-01-03 0
# 2 a 2018-02-02 1
# 3 a 2018-02-22 0
# 4 d 2018-02-13 1
# 5 d 2018-05-01 2
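The question asks specifically about split and the apply family. The following is a minimal sketch of that route, not one of the posted answers: it splits the periods by id and compares each date only with the periods of its own id, so a date shared by two ids is not double counted. The names df1, df2 and periods_by_id are chosen here for illustration.

```r
# Sample data from the question.
df1 <- data.frame(id = c("a", "a", "a", "d", "d"),
                  date = as.Date(c("2018-01-03", "2018-02-02", "2018-02-22",
                                   "2018-02-13", "2018-05-01")))
df2 <- data.frame(id = c("a", "a", "d", "d", "d", "d"),
                  start = as.Date(c("2018-01-15", "2018-01-30", "2018-03-01",
                                    "2018-02-01", "2018-04-02", "2018-03-19")),
                  end = as.Date(c("2018-01-18", "2018-02-10", "2018-03-03",
                                  "2018-04-22", "2018-05-23", "2018-08-29")))

# One data frame of periods per id (illustrative name).
periods_by_id <- split(df2, df2$id)

# For each row of df1, count the periods of the same id that contain the date.
df1$n <- vapply(seq_len(nrow(df1)), function(i) {
  p <- periods_by_id[[as.character(df1$id[i])]]
  if (is.null(p)) return(0L)  # this id has no periods at all
  sum(df1$date[i] >= p$start & df1$date[i] <= p$end)
}, integer(1))

df1
```

Because the lookup is keyed on id, an id that appears in df1 but not in df2 simply gets a count of 0.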
Another base R approach:
dates <- data.frame(id = c("a", "a", "a", "d", "d"),
date = as.Date(c("2018-01-03", "2018-02-02", "2018-02-22", "2018-02-13", "2018-05-01")))
periods <- data.frame(id = c("a", "a", "d", "d", "d", "d"),
start = as.Date(c("2018-01-15", "2018-01-30", "2018-03-01", "2018-02-01", "2018-04-02", "2018-03-19")),
end = as.Date(c("2018-01-18", "2018-02-10", "2018-03-03", "2018-04-22", "2018-05-23", "2018-08-29")))
df <- transform(merge(dates, periods), belongs = date >= start & date <= end)
aggregate(belongs ~ date + id, data = df, sum)
# date id belongs
# 1 2018-01-03 a 0
# 2 2018-02-02 a 1
# 3 2018-02-22 a 0
# 4 2018-02-13 d 1
# 5 2018-05-01 d 2
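One thing to keep in mind with the merge-based approaches: merge() performs an inner join by default, so an id that has dates but no periods disappears from the result instead of getting a count of 0. The sketch below shows one possible way to keep such ids, using all.x = TRUE. The data frames events and spans and the id "z" are made up for this illustration and are not part of the question's data.

```r
# Hypothetical data: id "z" has an event date but no periods.
events <- data.frame(id = c("a", "z"),
                     date = as.Date(c("2018-02-02", "2018-02-02")))
spans <- data.frame(id = "a",
                    start = as.Date("2018-01-30"),
                    end = as.Date("2018-02-10"))

# Left join: keep every event row, filling start/end with NA where no period matches the id.
joined <- merge(events, spans, all.x = TRUE)

# Rows without a period (NA start) count as "does not belong".
joined$belongs <- !is.na(joined$start) &
  joined$date >= joined$start & joined$date <= joined$end

res <- aggregate(belongs ~ id + date, data = joined, sum)
res
```

The formula interface of aggregate() only checks the variables named in the formula for missing values, so the NA values in start and end do not cause the row for "z" to be dropped.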
Or using data.table:
library(data.table)
dt <- as.data.table(merge(dates, periods))
dt[, .(n = sum(date >= start & date <= end)), by=c("id","date")]
# id date n
# 1: a 2018-01-03 0
# 2: a 2018-02-02 1
# 3: a 2018-02-22 0
# 4: d 2018-02-13 1
# 5: d 2018-05-01 2