String match repeated letter and ignore other letters between the repetitions
I have a list of words. I want to count the words in which a certain letter appears more than once. I don't mind how many times the letter is repeated, as long as it appears at least twice, and I don't mind whether the repetitions are adjacent. For example, I want to include both "ppa" and "pepa".
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")
Say this is my list. My target letter is "p". I want to count the words that contain at least two "p"s, so I want to count "apple", "pineapple", and "papaya". The number I want to obtain is 3.
I've tried:
str_detect(fruit, "p[abcdefghijklmmoqrstuvwxyz]p")
But this does not count "apple" and "pineapple". Is there a way to have all three words included?
A non-regex way to approach the problem is to count the number of 'p' in each element of fruit. This can be done using the str_count function.
library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple" "pineapple" "papaya"
If you want the output to be 3, you can sum the logical result instead of subsetting.
sum(str_count(fruit, 'p') > 1)
#[1] 3
where str_count returns the number of times the pattern, in our case 'p', occurs in each string.
str_count(fruit, 'p')
#[1] 2 0 1 3 2
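For comparison, the same per-word counts can be obtained without stringr. The following is a minimal base R sketch, not part of the original answer; it assumes the target is a single literal character, and the variable name p_counts is only illustrative.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# gregexpr() locates every literal "p" in each word; regmatches() extracts
# the matched pieces, and lengths() counts them per word (0 when none match).
p_counts <- lengths(regmatches(fruit, gregexpr("p", fruit, fixed = TRUE)))

# Number of words that contain at least two "p"
sum(p_counts > 1)
```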
If you really want to use regex to solve this problem, one of the many ways could be:
p[a-zA-Z]*p
The regex looks for a 'p', followed by zero or more letters, followed by another 'p', so it matches any word with at least two 'p's whether or not they are adjacent. The number of words that match is the expected output you are looking for.
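As a minimal sketch of how this pattern could be applied to the list from the question, base R's grepl can be used; str_detect from stringr should behave the same way with this pattern. The variable name has_two_p is only illustrative.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# TRUE for words where a "p" is followed, after zero or more letters,
# by another "p" (so both adjacent and separated repetitions count).
has_two_p <- grepl("p[a-zA-Z]*p", fruit)

# Words that match, and how many there are
fruit[has_two_p]
sum(has_two_p)
```

Because the quantifier is * rather than requiring exactly one character between the two 'p's, "apple" and "pineapple" are matched as well as "papaya".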