Creating a column based on filtering two data frames of different lengths using R

Question

I have two data sets of different lengths. I want to create a new column in the data set with more rows, based on filtering a specific column from the shorter df. I am getting the warning "longer object length is not a multiple of shorter object length", and the result is also not correct. I created smaller example data sets and ran the same code on them, and there it works with correct results. I am not sure why the results are wrong on my original data or why I get the warning. The example data sets are:

    structure(list(id = 1:10,
                   activity = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0),
                   code = c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)),
              class = "data.frame", row.names = c(NA, -10L))

The second df:

    structure(list(id2 = 1:20,
                   code2 = c(2, 5, 11, 15, 9, 18, 21, 3, 27, 55,
                             2, 5, 11, 15, 3, 18, 21, 3, 27, 55),
                   d_Activity = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
                                  0, 0, 0, 0, 1, 0, 0, 1, 0, 0)),
              class = "data.frame", row.names = c(NA, -20L))

I ran the code below on both my original data sets, where I get the warning, and on these dummy dfs, where there is no warning and the results are correct.

    data2 <- data2 %>%
      mutate(d_Activity = ifelse(code2 %in% data1$code & activity == 1, 1, 0))

Answer 1

The approach itself is the problem. Let me explain.

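In the sample data it works because the larger df has 20 rows, which is a multiple of the 10 rows in the smaller df, so R recycles the shorter vector without a warning.
In your syntax you are comparing one complete vector with another complete vector (a column of the other df), because R normally operates in a vectorised way. When the lengths differ, the shorter vector is recycled position by position, and when the longer length is not a multiple of the shorter one, R also emits the warning you see.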
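To see the recycling in isolation, here is a minimal base R sketch using the sample columns. The standalone vectors `code`, `activity` and `code2` are copies of the columns above, and the sketch assumes the condition ends up combining a 20-element logical vector with a 10-element one; whether that is exactly what happens on the original data is a guess.

```r
# Standalone copies of the sample columns (assumption: same values as the dfs above)
code     <- c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)
activity <- c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0)
code2    <- c(2, 5, 11, 15, 9, 18, 21, 3, 27, 55,
              2, 5, 11, 15, 3, 18, 21, 3, 27, 55)

in_code   <- code2 %in% code   # one value per element of code2 (length 20)
is_active <- activity == 1     # one value per element of activity (length 10)

# 20 is a multiple of 10, so is_active is recycled twice with no warning
silent <- in_code & is_active
length(silent)

# Drop one element so the lengths are 19 and 10: recycling now warns
warned <- tryCatch(
  in_code[-1] & is_active,
  warning = function(w) conditionMessage(w)
)
warned
```

On the sample data the recycled positions happen to line up with the rows that should be flagged (8, 15 and 18), which would explain why the dummy example looked correct.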
One way to match one to many is purrr::map (or a typed variant such as map_int), where each individual value of the first argument (code2 here) is checked against another vector, here df1$code restricted to the rows with activity == 1, which is not itself an argument of map.

df1 <- structure(list(
  id = 1:10,
  activity = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0),
  code = c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)
), class = "data.frame", row.names = c(NA, -10L))
df2 <- structure(list(
  id2 = 1:20,
  code2 = c(2, 5, 11, 15, 9, 18, 21, 3, 27, 55,
            2, 5, 11, 15, 3, 18, 21, 3, 27, 55),
  d_Activity = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
                 0, 0, 0, 0, 1, 0, 0, 1, 0, 0)
), class = "data.frame", row.names = c(NA, -20L))
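library(tidyverse)

# For each code2 value, check whether it is among the df1 codes with activity == 1.
# map_int() returns an integer vector; plain map() would create a list column.
df2 %>%
  mutate(d_Activity = map_int(code2, ~ +(.x %in% df1$code[df1$activity == 1])))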
#>    id2 code2 d_Activity
#> 1    1     2          0
#> 2    2     5          0
#> 3    3    11          0
#> 4    4    15          0
#> 5    5     9          0
#> 6    6    18          0
#> 7    7    21          0
#> 8    8     3          1
#> 9    9    27          0
#> 10  10    55          0
#> 11  11     2          0
#> 12  12     5          0
#> 13  13    11          0
#> 14  14    15          0
#> 15  15     3          1
#> 16  16    18          0
#> 17  17    21          0
#> 18  18     3          1
#> 19  19    27          0
#> 20  20    55          0

Created on 2021-06-17 by the reprex package (v2.0.0)
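As a possible alternative without purrr: `%in%` is itself vectorised over its left-hand side, so the same flag can probably be computed in one step. This is only a sketch on the sample data; `active_codes` is a name introduced here for illustration, and the data frames are rebuilt with `data.frame()` so the block runs on its own.

```r
# Sample data, rebuilt so this block is self-contained
df1 <- data.frame(
  id = 1:10,
  activity = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0),
  code = c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)
)
df2 <- data.frame(
  id2 = 1:20,
  code2 = c(2, 5, 11, 15, 9, 18, 21, 3, 27, 55,
            2, 5, 11, 15, 3, 18, 21, 3, 27, 55)
)

# Codes that have activity == 1 somewhere in the shorter data frame
active_codes <- unique(df1$code[df1$activity == 1])

# %in% returns one TRUE/FALSE per element of df2$code2, so the result always
# has nrow(df2) elements and nothing is recycled against df1
df2$d_Activity <- as.integer(df2$code2 %in% active_codes)

# Rows of df2 that were flagged
which(df2$d_Activity == 1L)
```

Because the filter on `activity` is applied inside df1 before the lookup, the number of rows in df1 never has to line up with the number of rows in df2.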

solution1
2 ACCEPTED 2021-06-17 11:13:42
