
Filter/subset/delete rows that contain a character in the middle of a string in R

I've got a data frame with a column containing peptide sequences, and I want to keep only the rows that have no internal "R" or "K" in their string, i.e. no "R" or "K" anywhere other than the first or last position.

df1 <- data.frame(
    Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", 'ASOIRDFHAOHFKK'))


df1 #check output

As output, I would like to keep only the first row (i.e. "ABCOIIJUHFSAUJHR").

I have tried using filter (dplyr), str_locate_all (stringr) and length, but couldn't figure it out.
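For reference, the position-based idea from the question can also work. Below is a minimal base-R sketch that uses gregexpr() in place of str_locate_all() (stringr's str_locate_all() reports the same start positions, as a matrix with "start" and "end" columns). It assumes "internal" means any position other than the first or the last character; the names pos, len, has_internal and kept are just illustrative.

```r
df1 <- data.frame(
  Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"),
  stringsAsFactors = FALSE)

# Start position of every R or K in each peptide (-1 when there is none)
pos <- gregexpr("[RK]", df1$Peptide)
len <- nchar(df1$Peptide)

# An R/K is internal if it is neither the first nor the last character
has_internal <- mapply(function(p, n) any(p > 1 & p < n), pos, len)

# Keep only the rows without an internal R or K
kept <- df1[!has_internal, , drop = FALSE]
kept
```

This is more verbose than a single regular expression, but it makes the "not first, not last" rule explicit in the comparison p > 1 & p < n.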

Any help would be much appreciated.

Thanks, Moe

We can skip the first and last characters (^. and .$), require that everything in between is zero or more characters that are not R or K ([^RK]*), and use that pattern with grepl to subset the dataset:

df1[grepl("^.[^RK]*.$", df1$Peptide), , drop = FALSE]
#           Peptide
#1 ABCOIIJUHFSAUJHR
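One detail worth knowing: because ^.[^RK]*.$ has to consume a first and a last character, it only matches strings with at least two characters. A one-residue entry would therefore be dropped even though it has no internal R or K. Below is a sketch of an alternative, assuming such short entries can occur (the extra strings "K" and "AR" are hypothetical test inputs, not from the question). It removes the first and last characters with substr() and then looks for R or K in what remains:

```r
peptides <- c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK",
              "K", "AR")  # last two are hypothetical edge cases

# Pattern from the answer: needs at least two characters to match
regex_keep <- grepl("^.[^RK]*.$", peptides)

# Alternative: take only the interior (drop first and last character),
# then reject any interior R or K; substr() returns "" when start > stop
interior <- substr(peptides, 2, nchar(peptides) - 1)
substr_keep <- !grepl("[RK]", interior)

data.frame(peptides, regex_keep, substr_keep)
```

The two columns agree on the question's three peptides and on "AR". They differ only on the one-letter "K", which the regex rejects and the substr() version keeps.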

Here's a dplyr solution: str_detect (from stringr) is the tidyverse equivalent of grepl, so the code looks like this:

library(dplyr)
library(stringr)
df2 <- df1 %>%
  filter(str_detect(Peptide, "^.[^RK]*.$"))
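If you prefer to stay in the tidyverse but want the interior check spelled out, here is a sketch that uses stringr's str_sub(). Negative positions in str_sub() count from the end, so str_sub(Peptide, 2, -2) drops the first and last residue. It assumes dplyr and stringr are installed, and df3 is just an illustrative name:

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK"),
  stringsAsFactors = FALSE)

# Drop the first and last residue, then keep rows with no R or K left
df3 <- df1 %>%
  filter(!str_detect(str_sub(Peptide, 2, -2), "[RK]"))

df3
```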
