简体   繁体   English

根据 R 中的日期标准合并数据帧

[英]Merge dataframes based on date criteria in R

I have two data frmaes (say DF1 and DF2).我有两个数据帧(比如 DF1 和 DF2)。 I want to merge them based on multiple criteria.我想根据多个标准合并它们。 If 'state' and 'city' of DF1 match that of DF2, and 'date' of DF2 is in within four years of 'date' of DF1, then I would like to add column 'margin' from DF2 to DF1.如果 DF1 的“州”和“城市”与 DF2 的匹配,并且 DF2 的“日期”在 DF1 的“日期”的四年内,那么我想将 DF2 的“边距”列添加到 DF1。 If the conditions are not met, column 'margin' of DF1 will have value of NA.如果条件不满足,DF1 的“边距”列的值为 NA。

DF1 <- structure(list(date = c("2001-02-14", "2001-06-14", "2004-03-31", 
"2003-03-11", "2003-06-29"), state = c("DE", "NY", "NY", "NY", 
"AZ"), city = c("Wilmington", "New York", "Buffalo", "New York", 
"Phoenix"), industry = c("Retail", "Computers and Software", 
"Manufacturing (Misc.)", "Healthcare and Medical", "Construction and Supplies"
), SIC = c(5331, 3571, 2541, 8063, 2421)), row.names = c(2937L, 
2817L, 2117L, 2298L, 2228L), class = "data.frame")

DF2 <- structure(list(date = c("2000-11-07", "2000-11-07", "2008-11-04", 
"2000-11-07", "2000-11-07", "2008-11-04", "2004-11-02", "2004-11-02", 
"2008-11-04", "2012-11-06"), state = c("MA", "NY", "OH", "VA", 
"CA", "DE", "NY", "NY", "NY", "AZ"), city = c("Boston", "New York", 
"Cleveland", "Richmond", "Los Angeles", "Wilmington", "New York", 
"Buffalo", "New York", "Phoenix"), margin = c(-3.61895488477766, -41.5805022156573, -40.2049010106604, 
24.8839947364776, 17.2042747593408, -55.4514285714286, -35.5094126201826, 
-61.9743406985032, -39.9718177548145, 7.47655435915248)), row.names = c(9849L, 
10041L, 29268L, 11941L, 7365L, 31116L, 13227L, 17397L, 23352L, 
32571L), class = "data.frame")

Something like this?像这样的东西? Depending on how you want the interval.取决于你想要的间隔。

 library(lubridate)
 library(fuzzyjoin)
    
        DF1$date <- ymd(DF1$date)     
        DF2$date <- ymd(DF2$date)       
        DF2$interval <- interval(DF2$date,  DF2$date + years(4))
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
        fuzzy_left_join(DF1, DF2, 
                        by = c("city" = "city",
                               "state" = "state",
                               "date" = "interval"),
                        match_fun = c(`==`, `==`, `%within%`))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM