Merge dataframes by rounding the dates

Question

I would like to merge two dataframes by group and date, but their dates may differ. When a group-date pair does not match exactly, I would like to round the dates so that each value in the second dataframe is matched with the row of the first one that has the nearest date in the same group.

To be clearer, here's an example:

library(dplyr)

data1 <- tibble(
  group = rep(c("A", "B"), each = 3),
  date = c(2002, 2005, 2010, 2001, 2004, 2009),
  variable_1 = c("Thing_1", "Thing_1", "Thing_2", "Thing_1", "Thing_2", "Thing_1")
)

# A tibble: 6 x 3
  group  date variable_1
  <chr> <dbl> <chr>     
1 A      2002 Thing_1   
2 A      2005 Thing_1   
3 A      2010 Thing_2   
4 B      2001 Thing_1   
5 B      2004 Thing_2   
6 B      2009 Thing_1   

data2 <- tibble(
  group = rep(c("A", "B"), each = 2),
  date = c(2007, 2008, 2001, 2010),
  variable_2 = c("Else_1", "Else_2", "Else_2", "Else_1")
)

# A tibble: 4 x 3
  group  date variable_2
  <chr> <dbl> <chr>     
1 A      2007 Else_1    
2 A      2008 Else_2    
3 B      2001 Else_2    
4 B      2010 Else_1

In group A, for example, the dates are not the same: 2002, 2005 and 2010 in data1; 2007 and 2008 in data2. Since no perfect match is possible, I would like to "round" the dates. The row where data2$date is 2007 should be matched with the row where data1$date is 2005, since 2005 is the closest value to 2007. Similarly, the row where data2$date is 2008 should be matched with the row where data1$date is 2010.

Same thing for group B.

Here's the expected output:

# A tibble: 6 x 4
  group  date variable_1 variable_2
  <chr> <dbl> <chr>      <chr>     
1 A      2002 Thing_1    NA        
2 A      2005 Thing_1    Else_1    
3 A      2010 Thing_2    Else_2    
4 B      2001 Thing_1    Else_2    
5 B      2004 Thing_2    NA        
6 B      2009 Thing_1    Else_1

How can I do this?

Answer 1

Using some arithmetic in a Map approach. Since the dates are numeric, rounding them to the nearest multiple of five is straightforward. We do this in both data frames, group by group, and then use match on the rounded dates.

res <- do.call(rbind, Map(function(x, y) {
  transform(x, variable_2=y$variable_2[
    match(round(x$date / 5)/.2, round(y$date / 5)/.2)
    ])},
  split(data1, data1$group), split(data2, data2$group)))
res
#     group date variable_1 variable_2
# A.1     A 2002    Thing_1       <NA>
# A.2     A 2005    Thing_1     Else_1
# A.3     A 2010    Thing_2     Else_2
# B.4     B 2001    Thing_1     Else_2
# B.5     B 2004    Thing_2       <NA>
# B.6     B 2009    Thing_1     Else_1
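
To make the arithmetic explicit, here is a minimal sketch of the rounding step on the group A dates. The helper name round_to_5 is made up for illustration; it multiplies by 5 where the answer divides by .2, which is the same operation. The last lines show a limitation worth keeping in mind: rounding to a fixed grid is not a true nearest-date match, so two dates one year apart can land on different multiples of five.

```r
# Hypothetical helper: round a numeric year to the nearest multiple of 5
round_to_5 <- function(x) 5 * round(x / 5)

dates1 <- c(2002, 2005, 2010)  # group A dates in data1
dates2 <- c(2007, 2008)        # group A dates in data2

round_to_5(dates1)  # 2000 2005 2010
round_to_5(dates2)  # 2005 2010

# Position in dates2 of each rounded data1 date (NA when there is none)
idx <- match(round_to_5(dates1), round_to_5(dates2))
idx  # NA 1 2

# Limitation: 2002 and 2003 are one year apart but round to different values
round_to_5(c(2002, 2003))  # 2000 2005
```

This works for the example data, but it would probably need a different grid, or a real nearest-neighbour lookup, for dates that are spaced differently.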

Answer 2

You can use the data.table package and look into rolling joins; roll = "nearest" might help.
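
A rough sketch of what such a rolling join could look like on the data from the question, assuming the data.table package is installed. The names dt1, dt2, nearest and res are chosen here for illustration. The idea is to look up, for each row of the second table, the nearest date of the first table within the same group, and then merge that lookup back onto the first table.

```r
library(data.table)

dt1 <- data.table(
  group = rep(c("A", "B"), each = 3),
  date = c(2002, 2005, 2010, 2001, 2004, 2009),
  variable_1 = c("Thing_1", "Thing_1", "Thing_2", "Thing_1", "Thing_2", "Thing_1")
)
dt2 <- data.table(
  group = rep(c("A", "B"), each = 2),
  date = c(2007, 2008, 2001, 2010),
  variable_2 = c("Else_1", "Else_2", "Else_2", "Else_1")
)

# Equi-join on group, rolling join on date (the last join column).
# x.date is the matched date from dt1, i.variable_2 the value from dt2.
nearest <- dt1[dt2, on = .(group, date), roll = "nearest",
               .(group, date = x.date, variable_2 = i.variable_2)]

# Left join back onto dt1 so that unmatched rows get NA
res <- merge(dt1, nearest, by = c("group", "date"), all.x = TRUE)
res$variable_2  # NA "Else_1" "Else_2" "Else_2" NA "Else_1"
```

Note that if two rows of dt2 are nearest to the same row of dt1, the merge will return that row twice, so some deduplication rule would be needed in that case.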

Answer 3

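library(data.table)

data1 <- as.data.table(data1)
data2 <- as.data.table(data2)

# Restrict both tables to group A, then key them on date
data_a <- data1[group == "A"]
data_b <- data2[group == "A"]
setkey(data_a, date)
setkey(data_b, date)

# For each row of data_b, take the data_a row with the nearest date
# (roll must be "nearest", TRUE/FALSE or a number; the string "TRUE" is not valid)
data <- data_a[data_b, roll = "nearest"]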

solution1
1 ACCEPTED 2020-04-08 10:26:51

solution2
-1 2020-04-08 10:13:24

solution3
-1 2020-04-08 10:31:59
