简体   繁体   中英

Creating a column based on filtering two data frames of different lengths using R

I got two data sets of different lengths. I want to create a new column in the dataset which got more rows based on filtering a specific column from the shorter df. I am getting a waring " Longer object length is not a multiple of shorter object length". And the result is also not correct. I tried to created a smaller example datasets and tried the same code and its working with correct results. I am not sure why on my original data the results are not correct and I am getting the warning. The example datasets are

    structure(list(id = 1:10, activity = c(0, 0, 0, 0, 1, 0, 0, 1, 
0, 0), code = c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)), class = "data.frame", row.names = c(NA, 
-10L))

the second df

    structure(list(id2 = 1:20, code2 = c(2, 5, 11, 15, 9, 18, 21, 
3, 27, 55, 2, 5, 11, 15, 3, 18, 21, 3, 27, 55), d_Activity = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA, 
-20L))

I tried this on both my original datasets where I get the warning and these dummy dfs where no warning and correct results.

    data2 <- data2 %>% 
  mutate(d_Activity = ifelse(code2 %in% data1$code & activity == 1, 1,0))

Actually, you are doing it wrong way. Let me explain-

  • In sample data it is working because larger df have rows (20) which is multiple of rows in smaller df (10).
  • So in you syntax what you are doing is, to check one complete vector with another complete vector (column of another df), because R normally works in vectorised way of operations.
  • the correct way of matching one to many is through purrr::map where each individual value in first argument (code2 here) operates with another vector ie df1$code which is not in argument of map .
df1 <- structure(list(id = 1:10, activity = c(0, 0, 0, 0, 1, 0, 0, 1, 
                                       0, 0), code = c(2, 5, 11, 15, 3, 18, 21, 3, 27, 55)), class = "data.frame", row.names = c(NA, 
                                                                                                                                 -10L))
df2 <- structure(list(id2 = 1:20, code2 = c(2, 5, 11, 15, 9, 18, 21, 
                                     3, 27, 55, 2, 5, 11, 15, 3, 18, 21, 3, 27, 55), d_Activity = c(0, 
                                                                                                    0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                   -20L))
library(tidyverse)

df2 %>%
  mutate(d_Activity = map(code2, ~ +(.x %in% df1$code[df1$activity == 1])))
#>    id2 code2 d_Activity
#> 1    1     2          0
#> 2    2     5          0
#> 3    3    11          0
#> 4    4    15          0
#> 5    5     9          0
#> 6    6    18          0
#> 7    7    21          0
#> 8    8     3          1
#> 9    9    27          0
#> 10  10    55          0
#> 11  11     2          0
#> 12  12     5          0
#> 13  13    11          0
#> 14  14    15          0
#> 15  15     3          1
#> 16  16    18          0
#> 17  17    21          0
#> 18  18     3          1
#> 19  19    27          0
#> 20  20    55          0

Created on 2021-06-17 by the reprex package (v2.0.0)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM