简体   繁体   中英

Selecting rows where a value in one of a number of columns matches one of a separate vector/list in R

Similar to: In R, select row of a matrix that match a vector

Dataframe:

structure(list(id = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48", "49", "50"), X1 = c(8L, 
18L, 6L, 10L, 2L, 12L, 20L, 19L, 17L, 6L, 20L, 3L, 14L, 20L, 
11L, 17L, 19L, 3L, 12L, 17L, 20L, 14L, 11L, 1L, 9L, 1L, 2L, 4L, 
7L, 18L, 7L, 12L, 18L, 6L, 6L, 6L, 20L, 11L, 17L, 15L, 6L, 2L, 
17L, 14L, 15L, 10L, 4L, 4L, 7L, 3L), X2 = c(1L, 3L, 10L, 14L, 
12L, 9L, 0L, 1L, 7L, 14L, 3L, 10L, 5L, 15L, 1L, 14L, 17L, 9L, 
16L, 6L, 10L, 6L, 1L, 11L, 8L, 1L, 0L, 3L, 14L, 4L, 16L, 5L, 
15L, 11L, 10L, 0L, 16L, 16L, 15L, 20L, 5L, 1L, 9L, 2L, 16L, 12L, 
4L, 2L, 15L, 11L), X3 = c(16L, 2L, 10L, 19L, 5L, 16L, 13L, 14L, 
10L, 15L, 18L, 17L, 0L, 2L, 7L, 5L, 19L, 3L, 2L, 20L, 19L, 14L, 
18L, 13L, 5L, 15L, 13L, 6L, 9L, 17L, 9L, 17L, 15L, 1L, 20L, 17L, 
19L, 13L, 15L, 4L, 9L, 0L, 13L, 9L, 11L, 2L, 0L, 5L, 5L, 16L), 
    X4 = c(14L, 16L, 6L, 2L, 2L, 10L, 13L, 5L, 9L, 16L, 15L, 
    3L, 11L, 8L, 2L, 17L, 1L, 1L, 5L, 18L, 0L, 14L, 18L, 19L, 
    6L, 17L, 15L, 11L, 19L, 13L, 2L, 12L, 8L, 4L, 17L, 14L, 9L, 
    18L, 10L, 19L, 14L, 14L, 15L, 15L, 7L, 16L, 2L, 19L, 12L, 
    13L), X5 = c(8L, 7L, 18L, 20L, 9L, 12L, 4L, 5L, 18L, 14L, 
    10L, 3L, 8L, 9L, 15L, 13L, 2L, 3L, 18L, 7L, 16L, 17L, 20L, 
    15L, 9L, 17L, 9L, 17L, 14L, 10L, 4L, 5L, 0L, 2L, 13L, 20L, 
    16L, 12L, 14L, 20L, 1L, 9L, 8L, 14L, 19L, 12L, 2L, 0L, 1L, 
    5L), X6 = c(10L, 2L, 11L, 19L, 2L, 11L, 7L, 12L, 16L, 17L, 
    2L, 9L, 20L, 0L, 19L, 1L, 15L, 15L, 6L, 8L, 1L, 15L, 11L, 
    17L, 16L, 8L, 16L, 20L, 15L, 9L, 7L, 15L, 12L, 14L, 20L, 
    4L, 12L, 6L, 2L, 5L, 13L, 17L, 2L, 2L, 2L, 17L, 0L, 19L, 
    19L, 14L), X7 = c(13L, 19L, 12L, 14L, 17L, 14L, 18L, 12L, 
    7L, 1L, 10L, 14L, 20L, 11L, 20L, 12L, 15L, 2L, 11L, 20L, 
    1L, 3L, 10L, 11L, 12L, 13L, 15L, 18L, 8L, 13L, 14L, 8L, 6L, 
    11L, 8L, 10L, 3L, 10L, 4L, 5L, 15L, 11L, 12L, 16L, 11L, 8L, 
    3L, 8L, 9L, 1L), X8 = c(7L, 17L, 7L, 17L, 17L, 6L, 18L, 11L, 
    14L, 17L, 1L, 4L, 18L, 9L, 15L, 20L, 12L, 8L, 5L, 20L, 6L, 
    15L, 8L, 3L, 12L, 1L, 14L, 12L, 6L, 0L, 8L, 13L, 20L, 0L, 
    20L, 20L, 13L, 9L, 0L, 17L, 1L, 2L, 15L, 10L, 2L, 1L, 20L, 
    11L, 15L, 11L), X9 = c(17L, 6L, 16L, 13L, 15L, 3L, 12L, 15L, 
    7L, 15L, 1L, 1L, 17L, 17L, 13L, 4L, 11L, 10L, 19L, 6L, 11L, 
    3L, 3L, 3L, 9L, 10L, 12L, 4L, 5L, 17L, 8L, 12L, 16L, 12L, 
    20L, 3L, 5L, 6L, 16L, 8L, 20L, 0L, 15L, 9L, 2L, 6L, 19L, 
    7L, 11L, 7L), X10 = c(15L, 11L, 4L, 1L, 10L, 18L, 16L, 2L, 
    1L, 0L, 9L, 19L, 1L, 11L, 0L, 0L, 14L, 15L, 8L, 12L, 12L, 
    20L, 13L, 13L, 3L, 13L, 8L, 4L, 19L, 3L, 0L, 15L, 18L, 15L, 
    19L, 13L, 15L, 18L, 8L, 9L, 17L, 2L, 1L, 18L, 5L, 19L, 10L, 
    16L, 5L, 12L)), class = "data.frame", row.names = c(NA, -50L
))

I want to select the rows where any one of x1 - x10 equals any one of a vector x <-c(0:5) . In reality x1 - x10 has many NA s and the vector contains entries that are made up of letters and numbers eg X1425 , 52546 , HPO1567 etc.

I know about the %in% sign but if I wanted this to apply to multiple columns and for it to return the row if only a single vector matches only a single row what would be the best way of doing that?

My expected outcome is the whole row as it appears in the original table where any column matches any of the search vector how every many times eg multiple matches in the same row.

Ideally in base R.

Many thanks

You can try this, and then filter by Index

vec <-c(0:5)
Data$Index <- apply(Data[,-1],1,function(x) sum(x %in% vec))

   id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 Index
1   1  8  1 16 14  8 10 13  7 17  15     1
2   2 18  3  2 16  7  2 19 17  6  11     3
3   3  6 10 10  6 18 11 12  7 16   4     1
4   4 10 14 19  2 20 19 14 17 13   1     2
5   5  2 12  5  2  9  2 17 17 15  10     4
6   6 12  9 16 10 12 11 14  6  3  18     1
7   7 20  0 13 13  4  7 18 18 12  16     2
8   8 19  1 14  5  5 12 12 11 15   2     4
9   9 17  7 10  9 18 16  7 14  7   1     1
10 10  6 14 15 16 14 17  1 17 15   0     2
11 11 20  3 18 15 10  2 10  1  1   9     4
12 12  3 10 17  3  3  9 14  4  1  19     5
13 13 14  5  0 11  8 20 20 18 17   1     3
14 14 20 15  2  8  9  0 11  9 17  11     2
15 15 11  1  7  2 15 19 20 15 13   0     3
16 16 17 14  5 17 13  1 12 20  4   0     4
17 17 19 17 19  1  2 15 15 12 11  14     2
18 18  3  9  3  1  3 15  2  8 10  15     5
19 19 12 16  2  5 18  6 11  5 19   8     3
20 20 17  6 20 18  7  8 20 20  6  12     0
21 21 20 10 19  0 16  1  1  6 11  12     3
22 22 14  6 14 14 17 15  3 15  3  20     2
23 23 11  1 18 18 20 11 10  8  3  13     2
24 24  1 11 13 19 15 17 11  3  3  13     3
25 25  9  8  5  6  9 16 12 12  9   3     2
26 26  1  1 15 17 17  8 13  1 10  13     3
27 27  2  0 13 15  9 16 15 14 12   8     2
28 28  4  3  6 11 17 20 18 12  4   4     4
29 29  7 14  9 19 14 15  8  6  5  19     1
30 30 18  4 17 13 10  9 13  0 17   3     3
31 31  7 16  9  2  4  7 14  8  8   0     3
32 32 12  5 17 12  5 15  8 13 12  15     2
33 33 18 15 15  8  0 12  6 20 16  18     1
34 34  6 11  1  4  2 14 11  0 12  15     4
35 35  6 10 20 17 13 20  8 20 20  19     0
36 36  6  0 17 14 20  4 10 20  3  13     3
37 37 20 16 19  9 16 12  3 13  5  15     2
38 38 11 16 13 18 12  6 10  9  6  18     0
39 39 17 15 15 10 14  2  4  0 16   8     3
40 40 15 20  4 19 20  5  5 17  8   9     3
41 41  6  5  9 14  1 13 15  1 20  17     3
42 42  2  1  0 14  9 17 11  2  0   2     6
43 43 17  9 13 15  8  2 12 15 15   1     2
44 44 14  2  9 15 14  2 16 10  9  18     2
45 45 15 16 11  7 19  2 11  2  2   5     4
46 46 10 12  2 16 12 17  8  1  6  19     2
47 47  4  4  0  2  2  0  3 20 19  10     7
48 48  4  2  5 19  0 19  8 11  7  16     4
49 49  7 15  5 12  1 19  9 15 11   5     3
50 50  3 11 16 13  5 14  1 11  7  12     3

Data is your dput() data. I hope this helps.

I would suggest a slightly different approach from @Duck using any instead of sum , so you don't need to use a filter afterwards:

vec= c(2,20)
rows = apply(data1[,-1],1,function(x) any(x %in% vec))
data1[rows,]

   id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
2   2 18  3  2 16  7  2 19 17  6  11
4   4 10 14 19  2 20 19 14 17 13   1
5   5  2 12  5  2  9  2 17 17 15  10
7   7 20  0 13 13  4  7 18 18 12  16
8   8 19  1 14  5  5 12 12 11 15   2
11 11 20  3 18 15 10  2 10  1  1   9
13 13 14  5  0 11  8 20 20 18 17   1
14 14 20 15  2  8  9  0 11  9 17  11
15 15 11  1  7  2 15 19 20 15 13   0
16 16 17 14  5 17 13  1 12 20  4   0
17 17 19 17 19  1  2 15 15 12 11  14
18 18  3  9  3  1  3 15  2  8 10  15
19 19 12 16  2  5 18  6 11  5 19   8
20 20 17  6 20 18  7  8 20 20  6  12
21 21 20 10 19  0 16  1  1  6 11  12
22 22 14  6 14 14 17 15  3 15  3  20
23 23 11  1 18 18 20 11 10  8  3  13
27 27  2  0 13 15  9 16 15 14 12   8
28 28  4  3  6 11 17 20 18 12  4   4
31 31  7 16  9  2  4  7 14  8  8   0
33 33 18 15 15  8  0 12  6 20 16  18
34 34  6 11  1  4  2 14 11  0 12  15
35 35  6 10 20 17 13 20  8 20 20  19
36 36  6  0 17 14 20  4 10 20  3  13
37 37 20 16 19  9 16 12  3 13  5  15
39 39 17 15 15 10 14  2  4  0 16   8
40 40 15 20  4 19 20  5  5 17  8   9
41 41  6  5  9 14  1 13 15  1 20  17
42 42  2  1  0 14  9 17 11  2  0   2
43 43 17  9 13 15  8  2 12 15 15   1
44 44 14  2  9 15 14  2 16 10  9  18
45 45 15 16 11  7 19  2 11  2  2   5
46 46 10 12  2 16 12 17  8  1  6  19
47 47  4  4  0  2  2  0  3 20 19  10
48 48  4  2  5 19  0 19  8 11  7  16 

Here's a (fun) regex solution:

It works by first paste0 ing all rows into strings and then, using grepl , matching the numbers from 0-5 in these rows to finally subset the dataframe df on those rows for which matches are returned:

df[which(grepl("(?<=^| )[0-5](?= |$)", apply(df[-1], 1, paste0, collapse = " "), perl = T)),]
   id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   1  8  1 16 14  8 10 13  7 17  15
2   2 18  3  2 16  7  2 19 17  6  11
3   3  6 10 10  6 18 11 12  7 16   4
4   4 10 14 19  2 20 19 14 17 13   1
5   5  2 12  5  2  9  2 17 17 15  10
6   6 12  9 16 10 12 11 14  6  3  18
7   7 20  0 13 13  4  7 18 18 12  16
8   8 19  1 14  5  5 12 12 11 15   2
9   9 17  7 10  9 18 16  7 14  7   1
10 10  6 14 15 16 14 17  1 17 15   0
11 11 20  3 18 15 10  2 10  1  1   9
12 12  3 10 17  3  3  9 14  4  1  19
13 13 14  5  0 11  8 20 20 18 17   1
14 14 20 15  2  8  9  0 11  9 17  11
15 15 11  1  7  2 15 19 20 15 13   0
16 16 17 14  5 17 13  1 12 20  4   0
17 17 19 17 19  1  2 15 15 12 11  14
18 18  3  9  3  1  3 15  2  8 10  15
19 19 12 16  2  5 18  6 11  5 19   8
21 21 20 10 19  0 16  1  1  6 11  12
22 22 14  6 14 14 17 15  3 15  3  20
23 23 11  1 18 18 20 11 10  8  3  13
24 24  1 11 13 19 15 17 11  3  3  13
25 25  9  8  5  6  9 16 12 12  9   3
26 26  1  1 15 17 17  8 13  1 10  13
27 27  2  0 13 15  9 16 15 14 12   8
28 28  4  3  6 11 17 20 18 12  4   4
29 29  7 14  9 19 14 15  8  6  5  19
30 30 18  4 17 13 10  9 13  0 17   3
31 31  7 16  9  2  4  7 14  8  8   0
32 32 12  5 17 12  5 15  8 13 12  15
33 33 18 15 15  8  0 12  6 20 16  18
34 34  6 11  1  4  2 14 11  0 12  15
36 36  6  0 17 14 20  4 10 20  3  13
37 37 20 16 19  9 16 12  3 13  5  15
39 39 17 15 15 10 14  2  4  0 16   8
40 40 15 20  4 19 20  5  5 17  8   9
41 41  6  5  9 14  1 13 15  1 20  17
42 42  2  1  0 14  9 17 11  2  0   2
43 43 17  9 13 15  8  2 12 15 15   1
44 44 14  2  9 15 14  2 16 10  9  18
45 45 15 16 11  7 19  2 11  2  2   5
46 46 10 12  2 16 12 17  8  1  6  19
47 47  4  4  0  2  2  0  3 20 19  10
48 48  4  2  5 19  0 19  8 11  7  16
49 49  7 15  5 12  1 19  9 15 11   5
50 50  3 11 16 13  5 14  1 11  7  12

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM