简体   繁体   中英

Efficient way to update specific row in dataframe1 based on specific condition match with that of dataframe2

I have 2 data frames of unequal size and df1 has a comma separated values.

df1 <- data.frame(col1=c("id1", "id2", "id3", "id4", "id5", "id6", "id7"),
                  col2=c("1,2", "2", "1,2,3", "1,2,3,4,5", "4", "2", "1"))

Output of df1
-----------------
  col1      col2
1  id1       1,2
2  id2         2
3  id3     1,2,3
4  id4 1,2,3,4,5
5  id5         4
6  id6         2
7  id7         1


df2 <- data.frame(col1=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16),
                  col2=c("id1", "id1", "id1", "id1", "id1", "id1", "id2", "id3", "id3", "id4", "id4", "id4", "id5", "id6", "id7", "id7"), 
                  col3=c(1, 1, 1, 2, 3, 4, 2, 4, 2, 1, 4, 5, 1, 1, 2, 3))

df2$newCol <- 0

Output of df2
---------------
   col1 col2 col3 newCol
1     1  id1    1      0
2     2  id1    1      0
3     3  id1    1      0
4     4  id1    2      0
5     5  id1    3      0
6     6  id1    4      0
7     7  id2    2      0
8     8  id3    4      0
9     9  id3    2      0
10   10  id4    1      0
11   11  id4    4      0
12   12  id4    5      0
13   13  id5    1      0
14   14  id6    1      0
15   15  id7    2      0
16   16  id7    3      0

I would like to update 'newCol' rows with 1 if following condition satisfies values in df2$col2 == df1$col1 and df2$col3 %in% df2$col2 and the rest should be 0.

Here is the output I expect

    col1 col2 col3 newCol
1     1  id1    1      1
2     2  id1    1      1
3     3  id1    1      1
4     4  id1    2      1
5     5  id1    3      0
6     6  id1    4      0
7     7  id2    2      1
8     8  id3    4      0
9     9  id3    2      1
10   10  id4    1      1
11   11  id4    4      1
12   12  id4    5      1
13   13  id5    1      0
14   14  id6    1      0
15   15  id7    2      0
16   16  id7    3      0

I tried to use unnest df1 values into its own rows and use merge but it does not seem to be working as expected.

Any suggestion to achieve this efficiently would be appreciated.

You could use grepl to detect whether the vector is in col2 . This will be after merge ing the data

transform(merge(df2,df1,by.x="col2",by.y="col1"),newcol = +Vectorize(grepl)(col3,col2.y))[-4]

   col2 col1 col3 newcol
1   id1    1    1      1
2   id1    2    1      1
3   id1    3    1      1
4   id1    4    2      1
5   id1    5    3      0
6   id1    6    4      0
7   id2    7    2      1
8   id3    8    4      0
9   id3    9    2      1
10  id4   10    1      1
11  id4   11    4      1
12  id4   12    5      1
13  id5   13    1      0
14  id6   14    1      0
15  id7   15    2      0
16  id7   16    3      0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM