
String match repeated letter and ignore other letters between the repetitions

I have a list of words. I want to count the words in which a certain letter appears more than once. I don't mind how many times the letter appears, as long as it appears at least twice, and I don't mind whether the repetitions are adjacent. For example, I want to include both "ppa" and "pepa".

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

Say this is my list. My target letter is "p". I want to count the words that contain at least two "p"s, so "apple", "pineapple", and "papaya" should be counted. The number I want to obtain is 3.

I've tried

str_detect(fruit, "p[abcdefghijklmnoqrstuvwxyz]p")

But this does not count "apple" and "pineapple". Is there a way to have all three words included?

A non-regex way to approach the problem is to count the number of 'p' characters in each element of fruit. This can be done with the str_count function.

library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple"     "pineapple" "papaya"   

If you want the output to be 3, you can sum the logical vector instead of subsetting.

sum(str_count(fruit, 'p') > 1)
#[1] 3

where str_count returns the number of times the pattern, in our case 'p', occurs in each string.

str_count(fruit, 'p')
#[1] 2 0 1 3 2
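
If stringr is not available, the same per-word counts can probably be obtained in base R. The following is only a minimal sketch, assuming the target is the single literal letter "p"; the variable name p_counts is illustrative and not part of the original answer.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# gregexpr() locates every occurrence of "p" in each word;
# regmatches() extracts them, giving an empty vector when there is no match,
# so lengths() yields the number of occurrences per word.
p_counts <- lengths(regmatches(fruit, gregexpr("p", fruit, fixed = TRUE)))
p_counts
#[1] 2 0 1 3 2

# Number of words containing at least two "p"
sum(p_counts > 1)
#[1] 3
```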

If you really want to use regex to solve this problem, one of the many ways could be:

p[a-zA-Z]*p

The regex looks for two 'p's with any number of letters (including none) between them, so it matches any word that contains at least two 'p's. The number of words that match is the output you are looking for.
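
As a rough sketch of how this pattern could be applied to the example vector, the snippet below uses base R's grepl; str_detect(fruit, "p[a-zA-Z]*p") from stringr should behave the same way here. The variable name has_two_p is illustrative only.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# TRUE for each word that contains a "p", then zero or more letters, then another "p"
has_two_p <- grepl("p[a-zA-Z]*p", fruit)
has_two_p
#[1]  TRUE FALSE FALSE  TRUE  TRUE

# Count the matching words
sum(has_two_p)
#[1] 3
```

Because `[a-zA-Z]*` also matches zero characters, adjacent repetitions such as the "pp" in "apple" are included, which the single-character class in the original attempt could not do.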

