Return Past Closest or Equivalent dates in R
I have two data frames, mondays and tdates, as follows:
T Dates
User.ID tdate
1 11-02-2013
1 04-03-2013
1 16-04-2015
1 03-05-2015
1 05-05-2015
1 11-05-2015
1 29-09-2015
1 26-11-2013
1 28-11-2013
3 01-02-2016
4 22-11-2012
4 25-04-2013
4 29-05-2013
Mondays
ID Monday Closest Date
1 05-09-2016
1 20-04-2015
1 27-07-2015
1 08-06-2015
1 13-10-2014
3 16-09-2013
3 16-02-2015
3 29-08-2016
3 26-05-2014
3 29-02-2016
3 18-07-2016
3 22-02-2016
4 16-11-2015
Now I want to fill the third column of mondays, Closest Date, with the closest tdate on or before each Monday, taken from tdates for the matching User.ID. For example, the expected output is:
Mondays
ID Monday Closest Date
1 05-09-2016 29-09-2015
1 20-04-2015 16-04-2015
1 27-07-2015 11-05-2015
1 08-06-2015 11-05-2015
1 13-10-2014 28-11-2013
3 16-09-2013 NA
3 16-02-2015 NA
3 29-08-2016 01-02-2016
3 26-05-2014 NA
3 29-02-2016 01-02-2016
3 18-07-2016 01-02-2016
3 22-02-2016 01-02-2016
4 16-11-2015 29-05-2013
For ID = 1 and Monday = 05-09-2016, the closest past tdate is 29-09-2015, so that date goes in the Closest Date column.
Note: if no transaction date on or before the Monday's date is found, fill in NA.
This has to be done for a very large data set, so any ideas on how to do it are welcome. I have tried a custom function, as follows:
lasttxndate <- function(userid, mydate){
  return(max(subset(tdates$Date.Asked, tdates$User.ID == userid & tdates$Date.Asked <= as.Date(mydate))))
}
But this doesn't work when I use it with lapply or sapply.
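One possible reason the attempt above fails is that lapply and sapply iterate over a single vector, while the function needs the ID and the Monday in parallel. A minimal base-R sketch with mapply follows. It assumes the date column is called tdate and is already of class Date, uses only a subset of the sample data, and the helper name last_txn_date is hypothetical.

```r
# Subset of the sample data from the question, with dates parsed as Date.
tdates <- data.frame(
  User.ID = c(1, 1, 1, 3, 4),
  tdate = as.Date(c("16-04-2015", "11-05-2015", "29-09-2015",
                    "01-02-2016", "29-05-2013"), format = "%d-%m-%Y")
)
mondays <- data.frame(
  ID = c(1, 1, 3, 3, 4),
  Monday = as.Date(c("05-09-2016", "20-04-2015", "16-09-2013",
                     "29-02-2016", "16-11-2015"), format = "%d-%m-%Y")
)

# Hypothetical helper: latest tdate on or before `mydate` for one user,
# returned as a day count so that mapply can simplify to a numeric vector.
last_txn_date <- function(userid, mydate) {
  past <- tdates$tdate[tdates$User.ID == userid & tdates$tdate <= mydate]
  if (length(past) == 0) return(NA_real_)  # avoid max() on an empty vector
  as.numeric(max(past))
}

# mapply walks ID and Monday in parallel, one pair per row of mondays.
closest <- mapply(last_txn_date, mondays$ID, mondays$Monday)
mondays$Closest.Date <- as.Date(closest, origin = "1970-01-01")
print(mondays)
```

This still scans tdates once per row, so it is meant to show the calling pattern rather than to be fast on a very large data set.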
# date conversion
mondays$Monday <- as.Date(mondays$Monday, "%d-%m-%Y")
tdates$tdate <- as.Date(tdates$tdate, "%d-%m-%Y")
# convert to data.table
library(data.table)
setDT(mondays)
setDT(tdates)
# you need identical column names for join
tdates[, ID := User.ID, ]
tdates[, Monday := tdate, ]
tdates[mondays, on = c("ID", "Monday"), roll = Inf]
User.ID tdate ID Monday
1: 1 2015-09-29 1 2016-09-05
2: 1 2015-04-16 1 2015-04-20
3: 1 2015-05-11 1 2015-07-27
4: 1 2015-05-11 1 2015-06-08
5: 1 2013-11-28 1 2014-10-13
6: NA <NA> 3 2013-09-16
7: NA <NA> 3 2015-02-16
8: 3 2016-02-01 3 2016-08-29
9: NA <NA> 3 2014-05-26
10: 3 2016-02-01 3 2016-02-29
11: 3 2016-02-01 3 2016-07-18
12: 3 2016-02-01 3 2016-02-22
13: 4 2013-05-29 4 2015-11-16
The tdate column gives you the desired dates.
This code works well:
T.Dates <- data.frame(
User.ID=c("1","1","1","1","1","1","1","1","1","3","4","4","4"),
tdate=as.Date(c("11-02-2013","04-03-2013","16-04-2015","03-05-2015","05-05-2015","11-05-2015","29-09-2015","26-11-2013","28-11-2013","01-02-2016","22-11-2012","25-04-2013","29-05-2013"),format="%d-%m-%Y"))
Mondays <- data.frame(
ID=c("1","1","1","1","1","3","3","3","3","3","3","3","4"),
Monday=as.Date(c("05-09-2016","20-04-2015","27-07-2015","08-06-2015","13-10-2014","16-09-2013","16-02-2015","29-08-2016","26-05-2014","29-02-2016","18-07-2016","22-02-2016","16-11-2015"),format="%d-%m-%Y"))
Mondays$Closest.Date <- as.Date(NA)
for (i in seq_len(nrow(Mondays))) {
  # transaction dates for this ID on or before this Monday
  past <- T.Dates$tdate[T.Dates$User.ID == Mondays$ID[i] & T.Dates$tdate <= Mondays$Monday[i]]
  # max() of an empty vector warns and returns -Inf, so only assign when a date exists
  if (length(past) > 0) Mondays$Closest.Date[i] <- max(past)
}
The output:
> Mondays
ID Monday Closest.Date
1 1 2016-09-05 2015-09-29
2 1 2015-04-20 2015-04-16
3 1 2015-07-27 2015-05-11
4 1 2015-06-08 2015-05-11
5 1 2014-10-13 2013-11-28
6 3 2013-09-16 <NA>
7 3 2015-02-16 <NA>
8 3 2016-08-29 2016-02-01
9 3 2014-05-26 <NA>
10 3 2016-02-29 2016-02-01
11 3 2016-07-18 2016-02-01
12 3 2016-02-22 2016-02-01
13 4 2015-11-16 2013-05-29
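Since the question mentions a very large data set, a row-by-row loop may become slow. One possible base-R alternative is to sort each user's transaction dates once and look up every Monday with findInterval, which returns the index of the last value less than or equal to the query. The sketch below uses a subset of the sample data; the helper name closest_for is hypothetical, and it has not been benchmarked against the other answers.

```r
# Subset of the sample data from the question.
T.Dates <- data.frame(
  User.ID = c("1", "1", "1", "3", "4"),
  tdate = as.Date(c("2015-04-16", "2015-05-11", "2015-09-29",
                    "2016-02-01", "2013-05-29"))
)
Mondays <- data.frame(
  ID = c("1", "1", "3", "3", "4"),
  Monday = as.Date(c("2016-09-05", "2015-04-20", "2013-09-16",
                     "2016-02-29", "2015-11-16"))
)

# Sorted transaction dates per user, stored as plain day counts.
by_user <- lapply(split(as.numeric(T.Dates$tdate), T.Dates$User.ID), sort)

# Hypothetical helper: for one user's Mondays, pick the last tdate <= Monday.
closest_for <- function(id, days) {
  txn <- by_user[[id]]
  if (is.null(txn)) return(rep(NA_real_, length(days)))  # user has no tdates
  idx <- findInterval(days, txn)  # 0 means no tdate on or before that Monday
  idx[idx == 0] <- NA
  txn[idx]
}

monday_days <- as.numeric(Mondays$Monday)
closest <- rep(NA_real_, nrow(Mondays))
for (id in unique(as.character(Mondays$ID))) {
  rows <- which(Mondays$ID == id)
  closest[rows] <- closest_for(id, monday_days[rows])
}
Mondays$Closest.Date <- as.Date(closest, origin = "1970-01-01")
print(Mondays)
```

The loop here runs once per distinct ID rather than once per row, and rows with no earlier transaction end up as a true NA.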