I've got a dataframe with a column containing peptide sequences and I want to keep only rows that have no internal "R" or "K" in their string.
df1 <- data.frame(
Peptide = c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", 'ASOIRDFHAOHFKK'))
df1 #check output
As output I would like to keep only the first row (i.e. "ABCOIIJUHFSAUJHR").
I have tried using filter (dplyr) and str_locate_all from the stringr package and length but couldn't figure it out.
Any help would be much appreciated.
Thanks Moe
We can skip the first and last character (^. and .$), match zero or more characters in between that are not an R or K ([^RK]*) with grepl, and use the result to subset the dataset:
df1[grepl("^.[^RK]*.$", df1$Peptide), , drop = FALSE]
# Peptide
#1 ABCOIIJUHFSAUJHR
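To see why the pattern keeps only the first row, here is a minimal sketch that compares it with a more explicit check: drop the first and last character with substr, then look for R or K in what remains. The names peptides, internal, by_regex and by_substr are made up for this illustration and are not part of the original answer.

```r
peptides <- c("ABCOIIJUHFSAUJHR", "AOFIAUKOAISDFUK", "ASOIRDFHAOHFKK")

# Regex approach: any first character, then only non-R/K characters,
# then any last character
by_regex <- grepl("^.[^RK]*.$", peptides)

# Explicit approach: strip the first and last character,
# then test the remaining internal part for R or K
internal <- substr(peptides, 2, nchar(peptides) - 1)
by_substr <- !grepl("[RK]", internal)

# Side-by-side view of both checks
data.frame(Peptide = peptides, internal, by_regex, by_substr)
```

Both checks agree on this data. They would differ for strings shorter than two characters, which the regex rejects (it needs a first and a last character) but the substr version accepts, so pick whichever behaviour suits your data.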
Here's the dplyr solution: str_detect from stringr is the tidyverse equivalent of grepl, so the code looks like this:
library(dplyr)
library(stringr)
df2 <- df1 %>%
  filter(str_detect(Peptide, "^.[^RK]*.$"))