filter/subset/delete rows that contain character in middle of string in R

Question

I've got a data frame with a column containing peptide sequences, and I want to keep only the rows that have no internal "R" or "K" in their string (that is, none anywhere other than the first or last position).

df1 <- data.frame(
    Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"))

df1  # check output

As output I would like to keep only the first row (i.e. "ABCOIIJUHFSAUJHR").

I have tried combining filter() from dplyr with str_locate_all() from the stringr package and length(), but couldn't figure it out.

Any help would be much appreciated.

Thanks, Moe

Answer 1

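We can match the first and last characters unconditionally ( ^. and .$ ) and require everything between them to be zero or more characters other than R or K ( [^RK]* ). Passing that pattern to grepl gives a logical vector that we can use to subset the dataset. Note that the pattern needs at least two characters, so a one-character string never matches.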

df1[grepl("^.[^RK]*.$", df1$Peptide), , drop = FALSE]
#           Peptide
#1 ABCOIIJUHFSAUJHR
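
As a quick sanity check of the pattern above, here is a minimal sketch that applies it to plain character vectors. The strings in `extra` are made up for illustration and are not from the question.

```r
# Pattern from the answer: any first char, no R/K in between, any last char
pattern <- "^.[^RK]*.$"

# The three peptides from the question
peptides <- c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK")
matches <- grepl(pattern, peptides)
setNames(matches, peptides)

# Hypothetical edge cases (not from the question):
#   "RAAAK" has R/K only at the ends
#   "K"     is a single character
#   "AKA"   has a K in the middle
extra <- c("RAAAK", "K", "AKA")
extra_matches <- grepl(pattern, extra)
setNames(extra_matches, extra)
```

Only the first peptide matches. Among the made-up strings, only "RAAAK" matches, which shows that R or K in the first or last position is allowed.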

Answer 2

Here's the dplyr solution. str_detect from stringr is the tidyverse counterpart of grepl, so the same pattern can go straight into filter:

library(dplyr)
library(stringr)
df2 <- df1 %>%
  filter(str_detect(Peptide, "^.[^RK]*.$"))
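
If the anchored pattern feels hard to read, one possible alternative (a sketch, not part of the original answer) is to cut off the first and last characters with str_sub and then look for R or K in what remains. The helper column name `interior` is an assumption chosen for illustration.

```r
library(dplyr)
library(stringr)

# Example data from the question
df1 <- data.frame(
  Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"),
  stringsAsFactors = FALSE
)

df2 <- df1 %>%
  mutate(interior = str_sub(Peptide, 2, -2)) %>%  # drop first and last character
  filter(!str_detect(interior, "[RK]")) %>%       # keep rows with no R/K left
  select(-interior)                               # remove the helper column

df2
```

Unlike the anchored pattern, this version also keeps strings shorter than two characters, because their interior is empty. Which behaviour is preferable depends on the data.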

2 answers

solution1
5 2018-04-20 03:59:21

solution2
3 ACCEPTED 2018-04-20 04:12:12
