
String match repeated letter and ignore other letters between the repetitions

I have a list of words. I want to count the words in which a certain letter appears at least twice. I don't mind how many times the letter appears beyond that, and I don't mind whether the repetitions are adjacent or not. For example, I want to include both "ppa" and "pepa".

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

Say this is my list and my target letter is "p". I want to count the words that contain at least two "p"s, which here are "apple", "pineapple", and "papaya". The number I want to obtain is 3.

I've tried

library(stringr)
str_detect(fruit, "p[abcdefghijklmnoqrstuvwxyz]p")

But this does not count "apple" and "pineapple". Is there a way to have all three words included?
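A quick check shows why the attempt misses those two words: the character class requires exactly one non-"p" letter between the two "p"s, so adjacent "pp" (as in "apple") and "p"s separated by several letters are never matched. The sketch below assumes the intended class was "every lowercase letter except p" and writes it more compactly as `[a-oq-z]`.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# [a-oq-z] = any lowercase letter except "p", so the pattern needs
# exactly ONE such letter between two "p"s; only "pap" in papaya qualifies
str_detect(fruit, "p[a-oq-z]p")
#[1] FALSE FALSE FALSE FALSE  TRUE
```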

A way to approach the problem without writing a complicated regex is to count the number of 'p's in each word. This can be done with the str_count function from stringr.

library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple"     "pineapple" "papaya"   

If you want the output to be 3, sum the logical vector instead of using it for subsetting:

sum(str_count(fruit, 'p') > 1)
#[1] 3

Here, str_count returns the number of times the pattern (in our case 'p') occurs in each string:

str_count(fruit, 'p')
#[1] 2 0 1 3 2
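The same idea generalizes to any target letter and any minimum number of occurrences. The helper below is a hypothetical sketch (the name `count_repeated` and its `min_times` argument are not part of stringr); it wraps the letter in stringr's `fixed()` so that characters such as "." are matched literally rather than as regex metacharacters.

```r
library(stringr)

# Hypothetical helper: number of words containing `letter` at least
# `min_times` times. fixed() makes str_count match `letter` literally.
count_repeated <- function(words, letter, min_times = 2) {
  sum(str_count(words, fixed(letter)) >= min_times)
}

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

count_repeated(fruit, "p")                 # apple, pineapple, papaya
#[1] 3
count_repeated(fruit, "a", min_times = 3)  # banana, papaya
#[1] 2
```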

If you really want to solve this with a regex, one of many possible patterns is:

p[a-zA-Z]*p

The pattern matches a 'p', followed by zero or more letters, followed by another 'p', so it matches any word that contains at least two 'p's, whether or not they are adjacent. The number of words that match is the count you are looking for.
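Applying that pattern in R could look like the sketch below: `str_detect()` gives one TRUE/FALSE per word, which can be used either to subset or to count. The last line shows a base R equivalent with `grepl()` for anyone not using stringr.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# "p", then zero or more letters, then another "p":
# matches "pp" in apple as well as "pap" in papaya
has_two_p <- str_detect(fruit, "p[a-zA-Z]*p")
fruit[has_two_p]
#[1] "apple"     "pineapple" "papaya"
sum(has_two_p)
#[1] 3

# Base R equivalent, no stringr needed
sum(grepl("p[a-zA-Z]*p", fruit))
#[1] 3
```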

