
How do I find all words with exactly x (or one or more) occurrences of a letter?

I already have an answer to the second part of my question. To find words with one or more occurrences of the letter 'a':

var re = /(\w+a)/;

With regard to the above, how does it work? For example:

var re = /(\w+a)/g;
var str = "gamma";
console.log(re.exec(str));

Output:

[ 'gamma', 'gamma', index: 0, input: 'gamma' ]
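The duplicated 'gamma' in that output comes from the parentheses: exec returns the whole match at index 0, followed by each capture group. A minimal sketch (the variable names are only for illustration):

```javascript
// Same pattern as above: the parentheses create capture group 1.
const re = /(\w+a)/g;
const m = re.exec("gamma");

console.log(m[0]);    // whole match: "gamma"
console.log(m[1]);    // capture group 1: "gamma"
console.log(m.index); // 0

// Without the parentheses there is no capture group, so only m2[0] exists.
const m2 = /\w+a/.exec("gamma");
console.log(m2.length); // 1
```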

However, this is not the result I expected (although it IS what I want). I thought re would match one or more word characters (\w), then the first occurrence of the letter 'a', and then stop. That is, I expected ga, then mma.

Next, how do I look for words with a predefined number x of occurrences of the letter 'a', such that f(x) = gamma iff x = 2?

Repetition in regex is greedy. That is, it takes as much as possible: \w+ first consumes the whole word, then gives characters back only until the a can match. You happen to get the full word because it ends in an a. To make it ungreedy (stop at the first a), you'd use:

\w+?a
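A quick way to see the difference is to run both patterns with the g flag on the word from the question. This is only a sketch; String.prototype.match with a global regex returns every match as an array.

```javascript
const gammaText = "gamma";

// Greedy: \w+ first grabs all of "gamma", then backtracks one character
// at a time until the following "a" can match, so it ends at the last "a".
console.log(gammaText.match(/\w+a/g));  // [ 'gamma' ]

// Lazy: \w+? takes as few characters as possible, so each match stops
// at the first "a" it reaches.
console.log(gammaText.match(/\w+?a/g)); // [ 'ga', 'mma' ]
```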

But to actually get the full word, I'd rather use

\w*a\w*

Note the *: with + instead, you'd get problems with words that have an a only as the first or last letter.
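To illustrate the point about * versus +, here is a small comparison on an arbitrary sample sentence: "alpha" has an a only as its first and last letter, and "apple" only as its first.

```javascript
const sample = "alpha gamma apple data";

// With *, the letters around the "a" are optional, so a leading or
// trailing "a" still counts.
console.log(sample.match(/\w*a\w*/g)); // [ 'alpha', 'gamma', 'apple', 'data' ]

// With +, at least one word character is required on each side of the
// "a", so "alpha" and "apple" are missed.
console.log(sample.match(/\w+a\w+/g)); // [ 'gamma', 'data' ]
```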

To get words with exactly two a's, you need to exclude a from the repeated letters. This is best done with a negated character class that disallows non-word characters and a's. In addition, you need to make sure that you get full words. This is easily done with the word boundary \b:

\b[^\Wa]*a[^\Wa]*a[^\Wa]*\b
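A quick check of what the negated class [^\Wa] accepts, followed by the explicit two-a pattern applied to a few arbitrary words. Note that without the i flag the class excludes only lowercase a.

```javascript
// [^\Wa] means "not a non-word character and not 'a'",
// i.e. any word character except a lowercase "a".
const notA = /^[^\Wa]$/;

console.log(notA.test("b")); // true
console.log(notA.test("_")); // true  (underscore is a word character)
console.log(notA.test("a")); // false
console.log(notA.test("-")); // false (not a word character)
console.log(notA.test("A")); // true  (no i flag, so only lowercase "a" is excluded)

// The explicit two-"a" pattern; "banana" (three a's) is rejected.
const exactlyTwo = /\b[^\Wa]*a[^\Wa]*a[^\Wa]*\b/g;
console.log("gamma banana data abba".match(exactlyTwo)); // [ 'gamma', 'data', 'abba' ]
```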

For more flexibility in terms of the number of repetitions, this can be rewritten as

\b[^\Wa]*(?:a[^\Wa]*){2}\b
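Because the count lives in {2}, it can be filled in at run time, which gives the f(x) asked about in the question. The sketch below builds the pattern with the RegExp constructor; the function name wordsWithExactlyA is hypothetical, the letter is fixed to 'a' so nothing needs escaping, and x is assumed to be a positive integer.

```javascript
// Hypothetical helper: returns the words in `text` containing exactly `x`
// occurrences of the lowercase letter "a". Assumes x >= 1; with x = 0 the
// pattern can also produce empty matches at word boundaries.
function wordsWithExactlyA(x, text) {
  // String.raw keeps the backslashes, so \b and \W reach the regex engine intact.
  const re = new RegExp(String.raw`\b[^\Wa]*(?:a[^\Wa]*){${x}}\b`, "g");
  return text.match(re) || []; // match() returns null when nothing matches
}

console.log(wordsWithExactlyA(2, "gamma"));        // [ 'gamma' ]
console.log(wordsWithExactlyA(1, "gamma"));        // []
console.log(wordsWithExactlyA(3, "gamma banana")); // [ 'banana' ]
```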

Regular expressions are greedy by default: if a quantifier can grab more characters, it will. You need to consider greed when using quantifiers like + and *.

To make a quantifier non-greedy (lazy), suffix it with a ?.

/(\w+?a)/
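Since the question calls exec with the g flag, here is a sketch of calling it repeatedly with the lazy pattern: each call resumes at the regex's lastIndex, and exec returns null once there are no more matches.

```javascript
const lazyRe = /(\w+?a)/g;
const str = "gamma";
const found = [];
let hit;

// With the g flag, each exec() call starts at lazyRe.lastIndex and
// returns null once there are no further matches.
while ((hit = lazyRe.exec(str)) !== null) {
  found.push([hit[1], hit.index]);
  console.log(hit[1], "at index", hit.index);
}
// Prints:
// ga at index 0
// mma at index 2
```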

You can use regexes for tasks such as these:

/\b\w*a\w*\b/ - find a word with at least one a (can match the word 'a')
/\b\w*(?:a\w*){2}\b/ - find a word with at least two a's
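Applied to an arbitrary sample sentence, the two "at least" patterns behave as follows (a sketch; the sentence is made up for illustration):

```javascript
const sentence = "a gamma sky banana";

// At least one "a" (the lone word "a" counts too).
console.log(sentence.match(/\b\w*a\w*\b/g));        // [ 'a', 'gamma', 'banana' ]

// At least two "a"s: the group (?:a\w*) must match twice.
console.log(sentence.match(/\b\w*(?:a\w*){2}\b/g)); // [ 'gamma', 'banana' ]
```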

It gets trickier when the count must be exact, because you have to change \w to match every word character except a. A negated class does this:

/\b[^\Wa]*(?:a[^\Wa]*){2}\b/ - matches a word with exactly two a's
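The \b anchors are what make the count exact for the whole word. A small check on banana (three a's), comparing the pattern with and without them:

```javascript
const banana = "banana"; // three "a"s

// Without \b, the pattern happily matches just part of the word.
console.log(banana.match(/[^\Wa]*(?:a[^\Wa]*){2}/g));     // [ 'banan' ]

// With \b on both ends, the match must span the whole word,
// so "banana" is rejected and match() returns null.
console.log(banana.match(/\b[^\Wa]*(?:a[^\Wa]*){2}\b/g)); // null
```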

To find the part of a word up to and including the letter "a" (roughly, a syllable), you can use:

/\b(?:[^\Wa]*a)/ - matches ga, both as a whole word and at the start of gamma

/\b(?:[^\Wa]*a){1,4}/ - matches from the start of a word through its last a, or through its fourth a if it has more than four, so the match always ends in a.
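Both prefix patterns can be tried with match and the g flag; the sample strings are arbitrary. Note that the matches are word prefixes ending in a, and that abracadabra (five a's) is cut off after its fourth a.

```javascript
// One chunk: from a word start up to and including the first "a".
console.log("ga gamma".match(/\b(?:[^\Wa]*a)/g)); // [ 'ga', 'ga' ]

// Up to four chunks: the match ends at the word's last "a",
// or at its fourth "a" if there are more than four.
const longText = "gamma alphabet abracadabra sky";
console.log(longText.match(/\b(?:[^\Wa]*a){1,4}/g)); // [ 'gamma', 'alpha', 'abracada' ]
```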

However, the easiest way to achieve something like this is to match all words with /\w+/ and then filter them in JavaScript.
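A minimal sketch of that match-then-filter approach; countChar and wordsWithCount are hypothetical helper names, and ch is assumed to be a single character. Unlike the regex versions, this needs no escaping and also handles x = 0.

```javascript
// Hypothetical helper: count occurrences of a single character in a string.
function countChar(word, ch) {
  return word.split(ch).length - 1;
}

// Match every word with /\w+/g, then keep those with exactly x occurrences of ch.
function wordsWithCount(text, ch, x) {
  const words = text.match(/\w+/g) || []; // null when the text has no words
  return words.filter((w) => countChar(w, ch) === x);
}

const words = "a gamma sky banana data";
console.log(wordsWithCount(words, "a", 2)); // [ 'gamma', 'data' ]
console.log(wordsWithCount(words, "a", 0)); // [ 'sky' ]
```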

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM