JavaScript regex - why a terminating space is needed to match the whole string

I am parsing SQL WHERE clauses and I have the following JavaScript regex:

(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))

which I am matching against

id BETWEEN 3 and 10

For this regex to work, I have to add \s or \s+ at the end of the regex and include a space at the end of the string being matched.

Can someone explain why matching this extra space is necessary to match the 10 part of the string (in capturing group 7)?

Note that this regex is extracted from a larger regex which is used to parse an SQL filter:

(\(*)([\w][\w\d.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(['"]?)(.*?)(\5)( and )(['"]?)(.*?)(\9))|(?:(['"]?)(.*?)(\12)))\s*(\)*)\s+(?!'|")\s*(and|or)?\s*

In (?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6))), the 6th group, (['"]?), matches an empty string, and so does the trailing backreference \6. So .*? (the 7th group) effectively sits at the end of the pattern and, being a lazy pattern, matches the fewest characters it can, that is, zero.
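A minimal sketch of this behavior, using the fragment and sample string from the question. The i flag is an assumption added here, since the pattern spells between in lowercase while the input uses BETWEEN; the variable names are illustrative only.

```javascript
// The "between" fragment from the question. The `i` flag is an assumption:
// the pattern is lowercase while the sample input is uppercase.
const between = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))/i;

const m = 'id BETWEEN 3 and 10'.match(between);
console.log(JSON.stringify(m[0])); // "BETWEEN 3 and "
console.log(JSON.stringify(m[3])); // "3"  (group 3 must expand to reach " and ")
console.log(JSON.stringify(m[7])); // ""   (group 7 has nothing forcing it to expand)

// Appending \s gives the lazy group 7 something to expand towards.
const betweenWithSpace = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))\s/i;
const m2 = 'id BETWEEN 3 and 10 '.match(betweenWithSpace);
console.log(JSON.stringify(m2[7])); // "10"
```

Group 3 captures 3 only because the literal " and " that follows forces it to grow; group 7 has no such constraint until \s is appended.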

Consider a regex like /I have a .*?/ tested against the string I have a cat. The regex finds I have a and then the .*? part, which matches any zero or more chars other than line break chars, as few as possible, matches the empty string right before cat. That is how lazy quantifiers work: rather than match eagerly, they let the subsequent patterns match first, and only when those fail does the lazy pattern "expand", i.e. consume one more char and retry. That is why lazy patterns at the end of a pattern match the minimum number of chars they need: .+? will match only 1 char, and .*? will match 0.
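The same point as a short runnable sketch. The third pattern, with a $ anchor, is an addition for illustration and not part of the original answer; it shows that a lazy quantifier expands as soon as something after it has to match.

```javascript
const text = 'I have a cat';

// Lazy .*? at the very end of the pattern: nothing follows, so it matches zero chars.
const lazyStar = text.match(/I have a .*?/);
console.log(JSON.stringify(lazyStar[0])); // "I have a "

// Lazy .+? needs at least one char, and takes exactly one.
const lazyPlus = text.match(/I have a .+?/);
console.log(JSON.stringify(lazyPlus[0])); // "I have a c"

// With $ after it, the lazy pattern is forced to expand to the end of the string.
const anchored = text.match(/I have a .*?$/);
console.log(JSON.stringify(anchored[0])); // "I have a cat"
```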

See Greedy vs. Reluctant vs. Possessive Quantifiers for more information on how lazy quantifiers work.

As you cannot use a backreference to an empty string as a boundary, you need to use alternation: capture " and ' delimited substrings into one capturing group, and a sequence of non-whitespace chars into another.

Besides, the \s+ close to the end of the pattern needs to be changed to \s* to allow the string not to end with whitespace.

(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*
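A quick check of the revised pattern, as a sketch: the i flag and the second, quoted sample string are assumptions added for illustration. With the alternation in place, an unquoted value lands in the \S+ groups (8 and 13) and a quoted one in the .*? groups (6 and 11).

```javascript
// Revised filter pattern from the answer; the `i` flag is an assumption.
const filter = /(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*/i;

// Unquoted bounds, no trailing space needed.
const plain = 'id BETWEEN 3 and 10'.match(filter);
console.log(plain[2], plain[8], plain[13]); // id 3 10

// Quoted bounds (hypothetical input) may contain spaces.
const quoted = "name between 'a b' and 'c d'".match(filter);
console.log(quoted[2], quoted[6], quoted[11]); // name a b c d
```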

