Regex to find all the words with at least 3 specific characters

Question

I am solving a problem with regular expressions in which I need to find all the words in a sentence that contain a specific character at least 3 times. The task is stated as follows:

Define a pattern for selecting all words containing the character a at least three times (including its uppercase variant A).

The example sentence for the test is:

Anastasia would like to have a banana split.

So I compiled all the possible situations that I can face:

So far I have written a regular expression using pipes for the 4th and 6th situations, and it works for the given text.

"\\b(\\b[Aa]{1}[^Aa\\W\\s]*[Aa]{1}[^Aa]*[Aa]{1,}\\w*\\b)|(\\b[^Aa\\W]*[Aa]{1}[^Aa\\W]*[Aa]{1}[^Aa\\W]*[Aa]{1,}\\w*\\b)"

Am I doing it right?
Is it efficient?
Is there a concept in regular expressions that allows me to count specific characters?
I learnt in "Theory of Automata" that NFAs/DFAs are limited in that a count cannot be tracked. So do I have to use a more powerful model, such as a Turing machine?

Answer 1

That looks quite convoluted. I think it would be quicker and easier to start at a word boundary and repeat a group 3 times, where the group contains zero or more non-A, non-space characters followed by a single A character. After that, match any remaining word characters:

\b(?:[^a ]*a){3}\w*

https://regex101.com/r/ZVxATc/2

(Of course, make sure to use the case-insensitive flag so you don't have to spell out things like [aA].)
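As a rough sketch of how this pattern could be applied to the sentence from the question, assuming JavaScript: the g flag is added to collect every match and the i flag to cover the uppercase A. The helper name wordsWithThreeA is made up for this illustration.

```javascript
// The pattern from this answer, with g (find every match) and i (treat A like a).
const atLeastThreeA = /\b(?:[^a ]*a){3}\w*/gi;

// Hypothetical helper (not part of the answer): returns every matching word.
function wordsWithThreeA(sentence) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return sentence.match(atLeastThreeA) || [];
}

console.log(wordsWithThreeA('Anastasia would like to have a banana split.'));
// prints [ 'Anastasia', 'banana' ]
```

Because [^a ] excludes the space, the group can never run across two words, which is why "have a banana" does not count as one match.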

Answer 2

How about /^([^a]*a){3}[^a]*$/?

This matches a string that contains exactly 3 a characters.

The snippet below runs it against a few test strings:

const regex = /^([^a]*a){3}[^a]*$/;
const strings = ['abcabcabc', 'abcabc', 'abcabcabcabc', 'aaa', 'abab', 'ababa', 'aa a', 'a ba ba', 'a ab ab', 'a ab ab ab', 'b ab ab ab'];
for (let i = 0; i < strings.length; i++) {
  console.log(strings[i] + ": " + regex.test(strings[i]));
}
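Since the question asks for at least three occurrences rather than exactly three, here is a small sketch of the difference. It assumes the only change needed is dropping the trailing [^a]*$ part; the names exactlyThree and atLeastThree are made up for this comparison.

```javascript
// Exactly three: the trailing [^a]*$ forbids a fourth a before the end of the string.
const exactlyThree = /^(?:[^a]*a){3}[^a]*$/i;
// At least three: without the trailing part, anything may follow the third a.
const atLeastThree = /^(?:[^a]*a){3}/i;

for (const s of ['abcabc', 'abcabcabc', 'abcabcabcabc']) {
  console.log(`${s}: exactly=${exactlyThree.test(s)}, atLeast=${atLeastThree.test(s)}`);
}
// abcabc: exactly=false, atLeast=false
// abcabcabc: exactly=true, atLeast=true
// abcabcabcabc: exactly=false, atLeast=true
```

Both patterns treat the whole string as one unit (a space is just another non-a character), so a sentence would have to be split into words first before testing each one.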

Answer 3

Here is a solution that uses a lookahead:

\b(?=([^ ]*a){3,})\w*\b

It starts at a word boundary, then creates a lookahead that checks for:

zero or more non-space characters followed by an 'a'. It repeats this 3 or more times.

Then it matches zero or more word characters and finally a word boundary.

You should use the 'IgnoreCase' flag.

Examples of match:

abcabcabc banana aaa aaabbaa

const regex = /\b(?=([^ ]*a){3,})\w*\b/i; // i flag: ignore case, as recommended above
const strings = ['abcabcabc', 'abcabc', 'abcabcabcabc', 'aaa', 'abab', 'ababa', 'aa a', 'a ba ba', 'a ab ab', 'a ab ab ab', 'b ab ab ab'];
for (let i = 0; i < strings.length; i++) {
  console.log(strings[i] + ": " + regex.test(strings[i]));
}
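To pull the matching words out of a sentence rather than only testing strings, something like the following might work. It is a sketch that assumes the g and i flags are added to the lookahead pattern; the helper name matchesOf is made up for this illustration.

```javascript
// The lookahead pattern from this answer, with g (all matches) and i (ignore case).
const lookaheadPattern = /\b(?=([^ ]*a){3,})\w*\b/gi;

// Hypothetical helper (not part of the answer): returns every match in the text.
function matchesOf(text) {
  return text.match(lookaheadPattern) || [];
}

console.log(matchesOf('Anastasia would like to have a banana split.'));
// prints [ 'Anastasia', 'banana' ]

// The lookahead scans up to the next space, but \w* consumes only word characters.
console.log(matchesOf('a-la-carte'));
// prints [ 'a' ]
```

The second call shows one thing to watch for: the lookahead counts a characters across the hyphens, while the match itself stops at the first non-word character. For plain words separated by spaces, as in the question, this makes no difference.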

3 answers

solution1
1 2018-10-29 01:36:14

solution2
0 2018-10-29 01:39:26

solution3
0 2018-10-29 02:38:36
