I am trying to use grep to test whether the strings in one vector are present in another vector, and to output the values that are present (the matching patterns).
I have a data frame like this:
FirstName Letter
Alex A1
Alex A6
Alex A7
Bob A1
Chris A9
Chris A6
I have a vector of string patterns to be found in the "Letter" column, for example: c("A1", "A9", "A6").
I would like to check whether any of the strings in the pattern vector are present in the "Letter" column. If they are, I would like the unique matching values as output.
The problem is, I don't know how to use grep with multiple patterns. I tried:
matches <- unique (
grep("A1| A9 | A6", myfile$Letter, value=TRUE, fixed=TRUE)
)
But it gives me 0 matches, which is not right. Any suggestions?
In addition to @Marek's comment about not including fixed=TRUE, you also need to remove the spaces from your regular expression. It should be "A1|A9|A6".
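To see why the original call finds nothing, here is a small sketch. The vector letters_vec is just a stand-in I made up for myfile$Letter; the three calls differ only in fixed=TRUE and the spaces.

```r
# Stand-in for myfile$Letter from the question (name is made up for illustration)
letters_vec <- c("A1", "A6", "A7", "A1", "A9", "A6")

# fixed = TRUE searches for the literal text "A1| A9 | A6"; no element contains it
with_fixed <- grep("A1| A9 | A6", letters_vec, value = TRUE, fixed = TRUE)

# As a regex, the spaces belong to the alternatives " A9 " and " A6",
# so only the first alternative, "A1", can match these values
with_spaces <- grep("A1| A9 | A6", letters_vec, value = TRUE)

# No fixed = TRUE and no spaces: all three alternatives can match
clean <- unique(grep("A1|A9|A6", letters_vec, value = TRUE))

print(with_fixed)
print(with_spaces)
print(clean)
```

Only the last call picks up A1, A6 and A9.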
You also mention that there are lots of patterns. Assuming that they are in a vector
toMatch <- c("A1", "A9", "A6")
Then you can create your regular expression directly using paste and collapse = "|".
matches <- unique (grep(paste(toMatch,collapse="|"),
myfile$Letter, value=TRUE))
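As a rough illustration of what paste builds, and of one thing to watch for: the collapsed pattern matches substrings, so "A1" would also hit a value such as "A10". The vector x below is hypothetical (the question's data has no "A10"), and the anchored variant is just one possible way to require whole-value matches.

```r
toMatch <- c("A1", "A9", "A6")

# paste() joins the patterns with the regex alternation operator
pattern <- paste(toMatch, collapse = "|")
print(pattern)

# Hypothetical values; "A10" is not in the question's data
x <- c("A1", "A10", "A7", "A9")

# Substring matching: "A10" is returned because it contains "A1"
partial <- grep(pattern, x, value = TRUE)

# Assumption: if only whole values should match, anchor the alternation
anchored <- paste0("^(", pattern, ")$")
whole <- grep(anchored, x, value = TRUE)

print(partial)
print(whole)
```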
Good answers, however don't forget about filter() from dplyr:
library(dplyr)
patterns <- c("A1", "A9", "A6")
>your_df
FirstName Letter
1 Alex A1
2 Alex A6
3 Alex A7
4 Bob A1
5 Chris A9
6 Chris A6
result <- filter(your_df, grepl(paste(patterns, collapse="|"), Letter))
>result
FirstName Letter
1 Alex A1
2 Alex A6
3 Bob A1
4 Chris A9
5 Chris A6
This should work:
grep(pattern = 'A1|A9|A6', x = myfile$Letter)
Or even more simply:
library(data.table)
myfile$Letter %like% 'A1|A9|A6'
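Note that the two calls above return different things: grep() gives the positions of the matching elements, while %like% gives a logical vector (as far as I know it is a convenience wrapper around grepl()). A base R sketch of the three flavours, using a made-up stand-in vector x for myfile$Letter:

```r
# Stand-in for myfile$Letter (name is made up for illustration)
x <- c("A1", "A6", "A7", "A1", "A9", "A6")

# Positions of the matching elements
idx <- grep("A1|A9|A6", x)

# The matching elements themselves
vals <- grep("A1|A9|A6", x, value = TRUE)

# One TRUE/FALSE per element, handy for subsetting rows
flags <- grepl("A1|A9|A6", x)

print(idx)
print(vals)
print(flags)
```

The logical form is usually the one you want inside myfile[ , ] or filter().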
Based on Brian Diggs's post, here are two helpful functions for filtering lists:
#Returns all items in a list that are not contained in toMatch
#toMatch can be a single item or a list of items
exclude <- function (theList, toMatch){
return(setdiff(theList,include(theList,toMatch)))
}
#Returns all items in a list that ARE contained in toMatch
#toMatch can be a single item or a list of items
include <- function (theList, toMatch){
matches <- unique (grep(paste(toMatch,collapse="|"),
theList, value=TRUE))
return(matches)
}
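A quick usage sketch for these two helpers. The functions are repeated (slightly condensed) so the snippet runs on its own, and letters_vec is a made-up stand-in for the "Letter" column.

```r
# Same helpers as above, condensed so this snippet is self-contained
include <- function(theList, toMatch) {
  unique(grep(paste(toMatch, collapse = "|"), theList, value = TRUE))
}
exclude <- function(theList, toMatch) {
  setdiff(theList, include(theList, toMatch))
}

# Stand-in for the "Letter" column (name is made up for illustration)
letters_vec <- c("A1", "A6", "A7", "A1", "A9", "A6")

kept <- include(letters_vec, c("A1", "A9", "A6"))
dropped <- exclude(letters_vec, c("A1", "A9", "A6"))

print(kept)
print(dropped)
```

Both results are de-duplicated, because include() calls unique() and setdiff() drops duplicates as well.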
Have you tried the match() or charmatch() functions?
Example use:
match(c("A1", "A9", "A6"), myfile$Letter)
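Keep in mind that match() compares whole strings rather than regular expressions, and for each pattern it returns only the position of the first hit (NA if there is none). A small sketch with a stand-in vector x; the extra pattern "A8" is hypothetical, added to show the NA case:

```r
# Stand-in for myfile$Letter (name is made up for illustration)
x <- c("A1", "A6", "A7", "A1", "A9", "A6")

# "A8" is a hypothetical pattern that is absent from x
patterns <- c("A1", "A9", "A6", "A8")

# Position of the first exact match of each pattern, NA when absent
pos <- match(patterns, x)

# Which patterns occur at all (exact match)
present <- patterns[patterns %in% x]

# Which elements of x equal one of the patterns
rows <- x %in% patterns

print(pos)
print(present)
print(rows)
```

If exact matches are all you need, %in% avoids regular expressions altogether.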
To add to Brian Diggs's answer: another way, using grepl, returns a data frame containing all the matching rows.
toMatch <- c("A1", "A9", "A6")
matches <- myfile[grepl(paste(toMatch, collapse="|"), myfile$Letter), ]
matches
Letter Firstname
1 A1 Alex
2 A6 Alex
4 A1 Bob
5 A9 Chris
6 A6 Chris
Maybe a bit cleaner... maybe?
Not sure whether this answer has already appeared...
For the particular pattern in the question, you can just do it with a single grep() call:
grep("A[169]", myfile$Letter)
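The bracket expression [169] means "one of the characters 1, 6 or 9", so for these single-digit suffixes it selects the same elements as the alternation. A minimal check, with a made-up stand-in vector x for myfile$Letter:

```r
# Stand-in for myfile$Letter (name is made up for illustration)
x <- c("A1", "A6", "A7", "A1", "A9", "A6")

# Character class: "A" followed by 1, 6 or 9
by_class <- grep("A[169]", x)

# Alternation, as in the other answers
by_alternation <- grep("A1|A9|A6", x)

print(by_class)
print(identical(by_class, by_alternation))
```

This shortcut only works when the patterns differ in a single character.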
Take away the spaces, and drop fixed=TRUE so that | is treated as "or" rather than as a literal character. So do:
matches <- unique(grep("A1|A9|A6", myfile$Letter, value=TRUE))
Using sapply:
patterns <- c("A1", "A9", "A6")
df <- data.frame(name=c("A","Ale","Al","lex","x"),Letters=c("A1","A2","A9","A1","A9"))
name Letters
1 A A1
2 Ale A2
3 Al A9
4 lex A1
5 x A9
df[unlist(sapply(patterns, grep, df$Letters, USE.NAMES = F)), ]
name Letters
1 A A1
4 lex A1
3 Al A9
5 x A9
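One difference from the single combined pattern: running grep once per pattern returns the hits grouped by pattern, and an element that matches two patterns shows up twice. A rough sketch; the value "A1,A9" is hypothetical and not part of the data above.

```r
patterns <- c("A1", "A9", "A6")

# Hypothetical values; the last element matches two of the patterns
x <- c("A1", "A2", "A1,A9")

# One grep() call per pattern, results concatenated in pattern order
per_pattern <- unlist(lapply(patterns, grep, x))

# One call with the combined pattern: each element is reported at most once
combined <- which(grepl(paste(patterns, collapse = "|"), x))

print(per_pattern)
print(combined)
print(unique(per_pattern))
```

Wrapping the per-pattern result in unique() removes the repeats if they are not wanted.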
Another option is a pattern with word boundaries, such as '\\b(A1|A9|A6)\\b'. Here \\b is the regular-expression word boundary, which comes in handy when a cell holds several values: for example, if someone had the letters "A7,A1", the row would still be extracted. Here is a reproducible example for both options:
df <- read.table(text="FirstName Letter
Alex A1
Alex A6
Alex A7
Bob A1
Chris A9
Chris A6", header = TRUE)
df
#> FirstName Letter
#> 1 Alex A1
#> 2 Alex A6
#> 3 Alex A7
#> 4 Bob A1
#> 5 Chris A9
#> 6 Chris A6
with(df, df[grep('\\b(A1|A9|A6)\\b', Letter),])
#> FirstName Letter
#> 1 Alex A1
#> 2 Alex A6
#> 4 Bob A1
#> 5 Chris A9
#> 6 Chris A6
df2 <- read.table(text="FirstName Letter
Alex A1
Alex A6
Alex A7,A1
Bob A1
Chris A9
Chris A6", header = TRUE)
df2
#> FirstName Letter
#> 1 Alex A1
#> 2 Alex A6
#> 3 Alex A7,A1
#> 4 Bob A1
#> 5 Chris A9
#> 6 Chris A6
with(df2, df2[grep('A1|A9|A6', Letter),])
#> FirstName Letter
#> 1 Alex A1
#> 2 Alex A6
#> 3 Alex A7,A1
#> 4 Bob A1
#> 5 Chris A9
#> 6 Chris A6
Created on 2022-07-16 by the reprex package (v2.0.1)
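Please note: in an ordinary R string the backslash has to be doubled, so the word boundary is written \\b. If you are using R 4.0.0 or later, you can instead use a raw string such as r"(\b(A1|A9|A6)\b)" and write \b with a single backslash.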
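A small sketch of the backslash handling and of what the word boundary buys you. The raw-string line assumes R 4.0.0 or later, and the vector x is hypothetical ("A10" is not in the data above).

```r
# Ordinary string literal: the regex \b must be written with a doubled backslash
doubled <- "\\b(A1|A9|A6)\\b"

# Raw string (assumes R >= 4.0.0): the same pattern with single backslashes
raw <- r"(\b(A1|A9|A6)\b)"

# Both literals produce the same pattern string
same <- identical(doubled, raw)

# Hypothetical values: "A7,A1" contains A1 as a whole word, "A10" does not
x <- c("A1", "A7,A1", "A10", "A7")

bounded <- grep(doubled, x, value = TRUE)
unbounded <- grep("A1|A9|A6", x, value = TRUE)

print(same)
print(bounded)
print(unbounded)
```

Without the boundaries, "A10" is matched too, because it contains "A1".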
I suggest writing a little script and doing multiple searches with grep. I've never found a way to search for multiple patterns, and believe me, I've looked!
Like so, your shell file, with an embedded string:
#!/bin/bash
# grep expects file names as arguments, so feed the string on stdin with a here-string
grep 'A6' <<< "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6"
grep 'A7' <<< "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6"
grep 'A8' <<< "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6"
Then run it by typing ./myshell.sh (after making the file executable with chmod +x myshell.sh).
If you want to be able to pass in the string on the command line, do it like this, with a shell argument--this is bash notation btw:
#!/bin/bash
# Bash assignments take no leading $ and no spaces around =
stringtomatch="${1}"
grep 'A6' <<< "${stringtomatch}"
grep 'A7' <<< "${stringtomatch}"
grep 'A8' <<< "${stringtomatch}"
And so forth.
If there are a lot of patterns to match, you can put it in a for loop.