
RegEx for matching words only formed with a list of letters

Given a set of words, I need to know which words are formed only from a given set of letters. A word cannot use a letter more times than it appears in the set, even if that letter is part of the set.

Example:

Char set: a, a, ã, c, e, l, m, m, m, o, o, o, o, t (fixed set)

Words set: mom, ace, to, toooo, ten, all, aaa (variable set)

Result:

mom = true
ace = true
to = true
toooo = true
ten = false (n is not in the set)
all = false (there is only 1 L in the set)
aaa = false (there are only 2 A's in the set)

How can I generate this regular expression in JavaScript? (Case sensitivity is not a concern.)

I have tried this code without success:

var str = "ten"
var patt = new RegExp("^[a, a, ã, c, e, l, m, m, m, o, o, o, o, t]*");
console.log(patt.test(str));

I feel this task is better suited to ordinary code than to a regex, but one approach that should work is to use negative lookahead.

Let's take your character set as the example: allowed words may contain only the following letters, and each letter no more often than it appears in the list.

a, a, ã, c, e, l, m, m, m, o, o, o, o, t

We can write the following regex. It uses one negative lookahead per character to reject strings that contain more occurrences of that character than the set allows, and then matches the word against the allowed character class, from 1 to N characters, where N is the total number of characters in the set.

^(?!([^a]*a){3})(?!([^ã]*ã){2})(?!([^c]*c){2})(?!([^e]*e){2})(?!([^l]*l){2})(?!([^m]*m){4})(?!([^o]*o){5})(?!([^t]*t){2})[aãcelmot]{1,14}$

Explanation:

  • ^ - Start of string
  • (?!([^a]*a){3}) - This negative lookahead rejects the input if it contains 3 or more occurrences of a, since the set contains only 2.
  • (?!([^ã]*ã){2}) - Similarly, this negative lookahead rejects the input if it contains 2 or more occurrences of ã, since the set contains only one.
  • And so on for all the other characters.
  • [aãcelmot]{1,14} - This character class matches between 1 and 14 allowed characters. We could also simply write +, because the per-letter maximums are already enforced by the negative lookaheads.
  • $ - End of string
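Since the question asks how to generate the expression, here is a minimal sketch of building the same kind of pattern from the letter list. The helper name buildPattern is hypothetical, and the sketch assumes each letter is a single UTF-16 code unit (for example, a precomposed ã rather than a plus a combining tilde). It uses + instead of {1,14} and non-capturing groups, which should behave the same for this purpose.

```javascript
// Hypothetical helper: builds a lookahead-based pattern from a list of allowed letters.
// Assumption: every entry in `letters` is a single UTF-16 code unit.
function buildPattern(letters) {
  // Count how many times each letter appears in the allowed set.
  const counts = new Map();
  for (const ch of letters) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }

  // Escape regex metacharacters so unusual "letters" cannot break the pattern.
  const esc = (ch) => ch.replace(/[.*+?^${}()|[\]\\-]/g, "\\$&");

  let lookaheads = "";
  let charClass = "";
  for (const [ch, n] of counts) {
    const e = esc(ch);
    // Reject the input if the letter occurs n + 1 times or more.
    lookaheads += `(?!(?:[^${e}]*${e}){${n + 1}})`;
    charClass += e;
  }
  return new RegExp(`^${lookaheads}[${charClass}]+$`);
}

const letters = ["a", "a", "ã", "c", "e", "l", "m", "m", "m", "o", "o", "o", "o", "t"];
const pattern = buildPattern(letters);

for (const word of ["mom", "ace", "to", "toooo", "ten", "all", "aaa"]) {
  console.log(`${word} --> ${pattern.test(word)}`);
}
```

The pattern only needs to be rebuilt when the letter set changes, so for a fixed set it can be built once and reused for every word.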

JS Code Demo,

 const arr = ['mom', 'ace', 'to', 'toooo', 'ten', 'all', 'aaa'];
 const re = /^(?!([^a]*a){3})(?!([^ã]*ã){2})(?!([^c]*c){2})(?!([^e]*e){2})(?!([^l]*l){2})(?!([^m]*m){4})(?!([^o]*o){5})(?!([^t]*t){2})[aãcelmot]{1,14}$/;
 arr.forEach(x => console.log(x + " --> " + re.test(x)));
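For comparison, the non-regex route mentioned at the start of this answer could look roughly like the sketch below. It is not part of the original answer; the name canForm is hypothetical. It counts the available letters and consumes one per character of the word. Like the regex, it treats the empty string as not matching.

```javascript
// Hypothetical helper: true if `word` can be spelled from `allowed`,
// using each letter at most as many times as it appears in `allowed`.
function canForm(word, allowed) {
  if (word.length === 0) return false; // mirror the regex, which requires at least one letter

  // Count the available copies of each letter.
  const remaining = new Map();
  for (const ch of allowed) {
    remaining.set(ch, (remaining.get(ch) || 0) + 1);
  }

  // Consume one copy per character; fail as soon as a letter runs out.
  for (const ch of word) {
    const left = remaining.get(ch) || 0;
    if (left === 0) return false;
    remaining.set(ch, left - 1);
  }
  return true;
}

const allowed = ["a", "a", "ã", "c", "e", "l", "m", "m", "m", "o", "o", "o", "o", "t"];
const candidates = ["mom", "ace", "to", "toooo", "ten", "all", "aaa"];

for (const word of candidates) {
  console.log(`${word} --> ${canForm(word, allowed)}`);
}
```

This version makes a single pass over the word and avoids the repeated scanning that the lookaheads do, which may matter if the word list is large.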
