JavaScript regex - why a terminating space is needed to match the whole string
I am parsing SQL WHERE clauses and I have the following JavaScript regex:
(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))
which I am matching against:
id BETWEEN 3 and 10
In order for this regex to work, I have to add \s or \s+ at the end of the regex and include a space at the end of the string being matched.
Can someone explain why matching this extra space is necessary in order to match the 10 part of the string (in capturing group 7)?
Note that this regex is extracted from a larger regex which is used to parse an SQL filter:
(\(*)([\w][\w\d.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(['"]?)(.*?)(\5)( and )(['"]?)(.*?)(\9))|(?:(['"]?)(.*?)(\12)))\s*(\)*)\s+(?!'|")\s*(and|or)?\s*
In
(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))
the 6th group, (['"]?), matches an empty string, so the backreference \6 that follows group 7 also matches an empty string. That leaves .*? (the 7th group) effectively at the end of the pattern, and, being a lazy pattern, it matches the fewest characters it can, that is, zero.
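The behavior can be reproduced with a minimal sketch like the one below, runnable in Node.js or a browser console. The i flag is an assumption (the sample input has an upper-case BETWEEN while the pattern is lower-case), and the variable names are only illustrative.

```javascript
// The extracted pattern; the `i` flag is assumed so that "BETWEEN" matches "between ".
const original = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))/i;
// The same pattern with a trailing \s, as in the question's workaround.
const withSpace = /(?:(?:(between )(['"]?)(.*?)(\2)( and )(['"]?)(.*?)(\6)))\s/i;

const m1 = "id BETWEEN 3 and 10".match(original);
console.log(JSON.stringify(m1[0])); // "BETWEEN 3 and "
console.log(JSON.stringify(m1[7])); // "" (lazy group 7 stops at zero chars)

const m2 = "id BETWEEN 3 and 10 ".match(withSpace);
console.log(JSON.stringify(m2[7])); // "10" (the \s forces group 7 to expand)
```

Without anything after group 7 that must consume input, the overall match simply ends right after " and ".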
Consider a regex like /I have a .*?/ tried against the string I have a cat. The regex finds I have a (including the trailing space), and then the .*? part, which matches zero or more chars other than line break chars, as few as possible, matches the empty string right before cat. That is how lazy quantifiers work: rather than matching eagerly, they let the subsequent patterns match first, and only when those fail does the lazy pattern "expand", i.e. try to match one more char. That is why a lazy pattern at the end of a pattern matches the minimal number of chars it needs to match: .+? will match only 1 char, and .*? will match 0.
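A quick way to check this is the short sketch below. The variable names are illustrative, and the third pattern with the $ anchor is my own addition to show what happens once something does follow the lazy part.

```javascript
const text = "I have a cat";

// Lazy quantifier at the very end of the pattern: nothing forces it to expand.
const lazyStar = text.match(/I have a .*?/)[0]; // "I have a "
const lazyPlus = text.match(/I have a .+?/)[0]; // "I have a c"

// A subsequent pattern (here the end-of-string anchor) makes the lazy part grow.
const lazyAnchored = text.match(/I have a .*?$/)[0]; // "I have a cat"

console.log([lazyStar, lazyPlus, lazyAnchored]);
```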
See Greedy vs. Reluctant vs. Possessive Quantifiers for more information on how lazy quantifiers work.
As you cannot use a backreference to an empty string as a boundary, you will need to use alternation: capture the "- and '-delimited substrings into one capturing group, and a sequence of non-whitespace chars into another.
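In isolation, the alternation idea might look like the sketch below. The readValue helper and the valuePattern name are hypothetical, introduced only for illustration; they are not part of the original regex.

```javascript
// Hypothetical helper: reads one value at the start of `input`.
// Group 1 = opening quote, group 2 = quoted content, group 3 = unquoted token.
const valuePattern = /^(?:(['"])(.*?)\1|(\S+))/;

function readValue(input) {
  const m = input.match(valuePattern);
  if (!m) return null;
  // Only one branch participates: quoted (groups 1 and 2) or unquoted (group 3).
  return m[1] !== undefined ? m[2] : m[3];
}

console.log(readValue("10"));             // 10
console.log(readValue("'New York' and")); // New York
console.log(readValue('"a b"'));          // a b
```

Because the quoted branch requires a real opening quote, its backreference always has a non-empty closing delimiter to stop at, and the unquoted branch uses a greedy \S+ rather than a lazy pattern.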
Besides, the \s+ close to the end of the pattern needs to be changed to \s* so that the string is not required to end with whitespace:
(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*
See this regex demo.
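As a rough check of the revised pattern, the sketch below runs it against the sample input from the question and against a quoted variant. The i flag, the variable names, and the second test string are assumptions added for illustration.

```javascript
// The revised pattern from the answer; the `i` flag is an assumption.
const filterPattern = /(\(*)(\w[\w.]*)\s*([<>!=]{1,2}|like|not like|is null|is not null|in\s*\()?\s*(?!and|or)(?:(?:(between )(?:(['"])(.*?)(\5)|(\S+))( and )(?:(['"])(.*?)(\10)|(\S+)))|(?:(['"])(.*?)(\14)|(\S+)))\s*(\)*)\s*(?!'|")\s*(and|or)?\s*/i;

const m = "id BETWEEN 3 and 10".match(filterPattern);
console.log(m[0]);  // id BETWEEN 3 and 10
console.log(m[2]);  // id
console.log(m[8]);  // 3  (unquoted lower bound)
console.log(m[13]); // 10 (unquoted upper bound, no trailing space needed)

// Quoted bounds land in groups 6 and 11 instead.
const q = "name between 'a b' and 'c d'".match(filterPattern);
console.log(q[6], "|", q[11]); // a b | c d
```

Note that the group numbers shift compared with the original pattern: unquoted BETWEEN bounds are in groups 8 and 13, quoted ones in groups 6 and 11.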