
Output the first character that causes mismatch with a regular expression

Is it possible to output the first character of a string (or its index) that causes a mismatch with a regular expression? Can it be done using only regular-expression matching operations, or must something more complex be employed?

For instance, in JavaScript, I may have a regular expression /^\d{3}\s\d{2}$/ that matches a string of 3 digits followed by a whitespace character and another 2 digits. I have a string "123a45" to which I apply this regular expression. Doing this (e.g., "123a45".match(/^\d{3}\s\d{2}$/)) returns null, since the regular expression does not match. How can I get the first character that causes this mismatch (in this case "a", the character at index 3)?

One use case for this would be to point the user directly to the character that makes an entered string invalid according to the regular expression used to validate it.

You would need to break the regex pattern down into patterns for the possible partial matches, ordered from the longest match to the shortest (or empty) one. Once you have a match, the length of the (partial) match gives you the position of the character that causes the mismatch. The one-character substring at that position is the offending character, if there is one. If there is no mismatch, the result is an empty string.

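var s = "123a45";
// Alternatives go from the full pattern down to the shortest prefix; the first that matches wins.
var prefix = s.match(/^(\d{3}\s\d{2}|\d{3}\s\d|\d{3}\s|\d{0,3})/)[1];
alert(s.charAt(prefix.length)); // the character just after the matched prefix: "a" here, "" if none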

http://jsfiddle.net/ETWWS/
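
The same idea can be generalized so the alternatives do not have to be written out by hand. What follows is only a minimal sketch, not part of the original answer: firstMismatchIndex and its tokens parameter are hypothetical names, and it assumes the pattern can be described as a fixed sequence of single-character regex fragments. It nests optional groups so that the regex engine itself finds the longest valid prefix.

```javascript
// Hypothetical helper: `tokens` holds one regex fragment per expected character.
function firstMismatchIndex(input, tokens) {
  // Build ^(?:t0(?:t1(?:t2 ...)?)?)? from the inside out.
  // Every level is optional, so the match always succeeds (possibly empty)
  // and covers the longest prefix of `input` that follows the token sequence.
  let pattern = "";
  for (let i = tokens.length - 1; i >= 0; i--) {
    pattern = "(?:" + tokens[i] + pattern + ")?";
  }
  const prefix = input.match(new RegExp("^" + pattern))[0];

  // Whole input consumed and every token used: the input is valid.
  if (prefix.length === input.length && prefix.length === tokens.length) {
    return -1;
  }
  // Otherwise the first problem sits right after the valid prefix.
  // If this equals input.length, the input simply ended too early.
  return prefix.length;
}

// Token list for /^\d{3}\s\d{2}$/ (an assumption: written out by hand here).
const tokens = ["\\d", "\\d", "\\d", "\\s", "\\d", "\\d"];
console.log(firstMismatchIndex("123a45", tokens)); // 3
console.log(firstMismatchIndex("123 45", tokens)); // -1
```

A return value equal to the input length means the string is a valid but incomplete prefix, which a caller may want to report differently from a wrong character.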

To provide a detailed explanation of why the input is invalid, it is better to write a small parser and have it produce the feedback. A parser can point the user to the character that is causing the problem and give a more helpful, targeted error message.

In the parser, you may use regexes to assert particular properties of the string and generate targeted error messages. For example, if the input must contain 6 characters, where the first 3 are digits and the last 3 are letters, you can write a regex that asserts the length of the input and report that specific error to the user.
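
As a rough illustration of that example format, such a check could look like the sketch below. The name validateCode, the shape of the returned object, and the message wording are all hypothetical, and "letters" is assumed to mean ASCII letters only.

```javascript
// Hypothetical validator for the example format: 3 digits followed by 3 letters.
function validateCode(input) {
  // First assert the overall length with a regex, so the user gets a specific message.
  if (!/^.{6}$/.test(input)) {
    return {
      ok: false,
      index: Math.min(input.length, 6), // where the input ends or starts to overflow
      message: "Expected exactly 6 characters, got " + input.length + "."
    };
  }
  // Then check each position against the kind of character expected there.
  for (let i = 0; i < 6; i++) {
    const expectDigit = i < 3;
    const valid = expectDigit ? /^\d$/.test(input[i]) : /^[A-Za-z]$/.test(input[i]);
    if (!valid) {
      return {
        ok: false,
        index: i,
        message: "Character '" + input[i] + "' at position " + (i + 1) +
          " should be a " + (expectDigit ? "digit" : "letter") + "."
      };
    }
  }
  return { ok: true, index: -1, message: "" };
}

console.log(validateCode("12xabc").message);
// Character 'x' at position 3 should be a digit.
```

Because each rule is checked separately, every failure path can carry its own message and index, which a single all-or-nothing regex cannot do.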

Either that, or just keep the regex you have been using and show a generic error message (with helpful instructions on how to enter the data correctly). A normal user should be able to enter the data correctly in at most 2-3 tries. Beyond that, the user may be malicious, the data to be entered may not be applicable to all users, or your instructions may be lacking.
