
Regex to find all the words with at least 3 specific characters

I am solving a problem with regular expressions in which I need to find all the words in a sentence that contain a specific character at least 3 times. The task is:

Define a pattern for selecting all words that contain the character a at least three times (including its uppercase variant A).

The example sentence for the test is:

Anastasia would like to have a banana split.

I've listed all the possible situations that I can face:

[image of the listed situations, not reproduced here]

So far I have written a regular expression that uses alternation (pipes) to cover the 4th and 6th situations, and it works for the given text.

"\\b(\\b[Aa]{1}[^Aa\\W\\s]*[Aa]{1}[^Aa]*[Aa]{1,}\\w*\\b)|(\\b[^Aa\\W]*[Aa]{1}[^Aa\\W]*[Aa]{1}[^Aa\\W]*[Aa]{1,}\\w*\\b)"
  • Am I doing it right?
  • Is it efficient?
  • Is there a concept in regular expressions that allows me to count occurrences of a specific character?
  • I learnt in "Theory of Automata" that NFAs/DFAs are limited in that they cannot keep track of a count. So do I have to use something more powerful, such as a Turing machine?
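
On the counting question: a fixed threshold such as "at least 3" needs only a finite number of states (0, 1, 2, and "3 or more" occurrences seen), so it stays within what a DFA, and therefore a regular expression, can express. It is unbounded counting, such as matching a^n b^n, that finite automata cannot do. The sketch below hand-codes that four-state automaton for a single word; the function name `hasAtLeastThreeA` is made up for illustration.

```javascript
// Hypothetical helper: a hand-written four-state DFA.
// state = number of a/A characters seen so far, capped at 3 (the accepting state).
function hasAtLeastThreeA(word) {
  let state = 0;
  for (const ch of word) {
    if ((ch === "a" || ch === "A") && state < 3) {
      state += 1; // advance one state; state 3 loops on every further input
    }
  }
  return state === 3;
}

for (const word of ["Anastasia", "have", "banana", "split"]) {
  console.log(word + ": " + hasAtLeastThreeA(word));
}
```

The quantifier `{3}` or `{3,}` in a regular expression is shorthand for exactly this kind of bounded repetition.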

That looks quite convoluted. I think it would be quicker and easier to start at a word boundary, and repeat a group that contains (zero or more non-A, non-space characters, followed by a single A character) 3 times, followed by more characters until you get to the next space:

\b(?:[^a ]*a){3}\w*

https://regex101.com/r/ZVxATc/2

(of course, make sure to use the case-insensitive flag so you don't have to spell out things like [aA] )
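
To check this against the sentence from the question, here is a small sketch that runs the pattern with the `g` and `i` flags. The second pattern, which swaps `[^a ]` for `[^\Wa]` so that the filler can only be word characters, is a variation that is not part of the answer above; the variable names are illustrative.

```javascript
const sentence = "Anastasia would like to have a banana split.";

// Pattern from the answer: three groups of (non-a, non-space filler + a),
// then the rest of the word.
const pattern = /\b(?:[^a ]*a){3}\w*/gi;
console.log(sentence.match(pattern)); // [ 'Anastasia', 'banana' ]

// Assumed variation: [^\Wa] means "a word character other than a",
// so punctuation cannot be absorbed into the filler.
const strictPattern = /\b(?:[^\Wa]*a){3}\w*/gi;
console.log("a,a,a banana".match(pattern));       // [ 'a,a,a', 'banana' ]
console.log("a,a,a banana".match(strictPattern)); // [ 'banana' ]
```

Which of the two is preferable depends on whether something like `a,a,a` should count as one word in your data.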

How about /^([^a]*a){3}[^a]*$/?

Because it is anchored at both ends, this matches a whole string that contains exactly 3 a characters.

Here it is with a few test strings:

    const regex = /^([^a]*a){3}[^a]*$/;
    const strings = ['abcabcabc', 'abcabc', 'abcabcabcabc', 'aaa', 'abab', 'ababa',
                     'aa a', 'a ba ba', 'a ab ab', 'a ab ab ab', 'b ab ab ab'];
    for (let i = 0; i < strings.length; i++) {
      console.log(strings[i] + ": " + regex.test(strings[i]));
    }
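
Because that pattern is anchored with `^` and `$`, it tests a whole string and accepts exactly three occurrences. One way to adapt the idea to the original question, sketched here under some assumptions (words are separated by whitespace, and the helper name `wordsWithAtLeast` is hypothetical), is to split the sentence and drop the trailing `[^a]*$` so that extra occurrences are allowed:

```javascript
// Hypothetical helper: returns the whitespace-separated words that contain
// the letter a (either case) at least `minCount` times.
function wordsWithAtLeast(sentence, minCount) {
  // Without the trailing [^a]*$, anything may follow the first minCount matches.
  const atLeast = new RegExp("^(?:[^a]*a){" + minCount + "}", "i");
  return sentence.split(/\s+/).filter((word) => atLeast.test(word));
}

console.log(wordsWithAtLeast("Anastasia would like to have a banana split.", 3));
// [ 'Anastasia', 'banana' ]
```

Note that splitting on whitespace keeps punctuation attached to a word (for example `split.`), which does not matter for counting a's but would show up in the returned strings.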

Here is a solution, that uses look ahead:

\b(?=([^ ]*a){3,})\w*\b

It starts at a word boundary, then uses a lookahead to check for:

zero or more non-space characters followed by an 'a', repeated 3 or more times.

Then it matches zero or more word characters and finally a word boundary.

You should use the 'IgnoreCase' flag.

Examples of match:

abcabcabc banana aaa aaabbaa

    // In a regex literal, \b and \w take a single backslash; the doubled
    // form is only needed when the pattern is written inside a string.
    const regex = /\b(?=([^ ]*a){3,})\w*\b/i;
    const strings = ['abcabcabc', 'abcabc', 'abcabcabcabc', 'aaa', 'abab', 'ababa',
                     'aa a', 'a ba ba', 'a ab ab', 'a ab ab ab', 'b ab ab ab'];
    for (let i = 0; i < strings.length; i++) {
      console.log(strings[i] + ": " + regex.test(strings[i]));
    }
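
As a quick check on the sentence from the question, the sketch below runs the lookahead pattern with the `g` and `i` flags and collects each match together with its position via `String.prototype.matchAll`. The group is written as non-capturing here, which does not change what is matched; the variable names are illustrative.

```javascript
const sentence = "Anastasia would like to have a banana split.";

// The lookahead only checks that three a's occur before the next space;
// \w*\b then consumes the word itself.
const lookahead = /\b(?=(?:[^ ]*a){3,})\w*\b/gi;

const found = [...sentence.matchAll(lookahead)].map((m) => ({
  word: m[0],
  index: m.index,
}));
console.log(found);
// [ { word: 'Anastasia', index: 0 }, { word: 'banana', index: 31 } ]
```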

