JavaScript regex - why is a terminating space needed to match the whole string

Question

I am parsing SQL WHERE clauses and I have the following JavaScript regex:

(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))

that I am matching against:

id BETWEEN 3 and 10

In order for this regex to work, I have to add \s or \s+ at the end of the regex and include a space at the end of the string being matched.

Can someone explain why matching this extra space is necessary to match the 10 part of the string (in capturing group 7)?

Note that this regex is extracted from a larger regex which is used to parse an SQL filter:

(\(*)([\w][\w\d.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(['"]?)(.*?)(\5)( and )(['"]?)(.*?)(\9))|(?:(['"]?)(.*?)(\12)))\s*(\)*)\s+(?!'|")\s*(and|or)?\s*

Answer 1

In (?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6))), the 6th group, (['"]?), matches an empty string when the value is unquoted, so the backreference \6 in the 8th group also matches an empty string. The .*? in the 7th group is therefore effectively at the end of the pattern and, being a lazy pattern, matches the fewest characters it can, that is, zero.
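The behaviour described above can be reproduced with a short sketch. The i flag is an assumption (the question does not show its flags) so that BETWEEN in the input matches between in the pattern; the variable names are illustrative only.

```javascript
// Pattern from the question. The `i` flag is an assumption so that the
// upper-case "BETWEEN" in the input matches "between " in the pattern.
const original = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))/i;
const m = original.exec("id BETWEEN 3 and 10");

console.log(JSON.stringify(m[0])); // "BETWEEN 3 and " (the 10 is not consumed)
console.log(JSON.stringify(m[3])); // "3"  (forced to expand so that " and " can match)
console.log(JSON.stringify(m[7])); // ""   (nothing after it forces an expansion)

// Appending \s to the pattern and a space to the input gives the lazy
// group something it must reach, so it expands up to that space.
const withSpace = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))\s/i;
const m2 = withSpace.exec("id BETWEEN 3 and 10 ");

console.log(JSON.stringify(m2[7])); // "10"
```

Group 3 uses the same lazy .*? but still captures 3, because the literal " and " that follows it cannot match until the group has expanded.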

Consider a regex like /I have a .*?/ and you try it against a I have a cat string (see demo here). The regex finds I have a and then the .*? part (matching any zero or more chars other than linebreak chars, as few as possible) matches the empty string right before cat, because that is how lazy quantifiers work: rather than match eagerly, they let the subsequent patterns match first, and only when those fail does the lazy pattern "expand", i.e. try to match one more char. That is why lazy patterns at the end of a pattern match the minimal number of chars they need to match: .+? will match only 1 char, and .*? will match 0.
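A minimal sketch of the I have a cat example, comparing a trailing lazy quantifier with a greedy one and with a lazy one followed by an end-of-string anchor. The variable names are made up for illustration.

```javascript
const text = "I have a cat";

// Lazy quantifiers at the very end of a pattern take the minimum they can.
const lazyStar = text.match(/I have a .*?/)[0]; // "I have a "
const lazyPlus = text.match(/I have a .+?/)[0]; // "I have a c"

// A greedy quantifier takes as much as it can.
const greedy = text.match(/I have a .*/)[0]; // "I have a cat"

// A lazy quantifier followed by something that must match ($ here)
// expands until that something succeeds.
const lazyAnchored = text.match(/I have a .*?$/)[0]; // "I have a cat"

console.log([lazyStar, lazyPlus, greedy, lazyAnchored]);
```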

See Greedy vs. Reluctant vs. Possessive Quantifiers for more information on how lazy quantifiers work.

As you cannot use a backreference to the empty string as a boundary, you will need to use alternation and capture "- and '-delimited substrings into one capturing group, and a sequence of non-whitespace characters into another.
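The alternation idea can be sketched in isolation for a single value. This is a simplified fragment, not the full pattern; valueRe and readValue are hypothetical names introduced only for this example.

```javascript
// Either a quoted value (group 2, delimited by the quote in group 1)
// or a run of non-whitespace characters (group 3).
const valueRe = /(?:(['"])(.*?)\1|(\S+))/;

// Hypothetical helper: returns the value without its quotes, or null.
function readValue(input) {
  const m = valueRe.exec(input);
  if (m === null) return null;
  // In JavaScript, groups in the branch that did not participate are undefined.
  return m[1] !== undefined ? m[2] : m[3];
}

console.log(readValue("10"));        // "10"
console.log(readValue("'a b' and")); // "a b"
console.log(readValue('"x"'));       // "x"
```

Here the backreference \1 always refers to a real quote character, so the lazy .*? before it always has a non-empty boundary to expand towards.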

Besides, the \s+ close to the end of the pattern needs to be changed into \s* to allow the string not to end with whitespace.

(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*

See this regex demo.
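As a rough usage sketch, the corrected pattern can be applied to the input from the question without any trailing space. The i flag and the parseBetween helper are assumptions made for this example; the group numbers follow from counting the parentheses in the pattern above.

```javascript
// Corrected pattern from the answer; the `i` flag is an assumption.
const filterRe = /(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*/i;

// Hypothetical helper: extracts the column and both bounds of a BETWEEN clause.
function parseBetween(clause) {
  const m = filterRe.exec(clause);
  if (m === null || m[4] === undefined) return null; // not a BETWEEN clause
  return {
    column: m[2],
    // Groups 6 and 11 hold quoted values, groups 8 and 13 unquoted ones.
    low: m[5] !== undefined ? m[6] : m[8],
    high: m[10] !== undefined ? m[11] : m[13],
  };
}

console.log(parseBetween("id BETWEEN 3 and 10"));
// { column: 'id', low: '3', high: '10' }
console.log(parseBetween("name between 'a' and 'c'"));
// { column: 'name', low: 'a', high: 'c' }
```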

Answer 1 accepted 2016-10-15 21:15:36
