
Regex: String.split and Pattern.matches do not agree with each other

I'm practicing my regex lookarounds and, to do so, I'm trying to extract the table name from an SQL insert statement. I have the regex (?<=INSERT INTO )\\w+(?= (\\(|VALUES).+) (written here as a Java string literal, hence the doubled backslashes) and I'm testing it on the string INSERT INTO tests VALUES (regex, test). While I'm aware that my regex isn't meticulously done, I expect it to match the tests substring of my input.

I'm using Java's regex engine and printing the results of calling String.split with the regex and of calling Pattern.matches with the same regex. I get the following seemingly contradictory results:

regex> (?<=INSERT INTO )\w+(?= (\(|VALUES).+)
string> INSERT INTO tests VALUES (regex, test)
[INSERT INTO ,  VALUES (regex, test)]
regex> (?<=INSERT INTO )\w+(?= (\(|VALUES).+)
string> INSERT INTO tests VALUES (regex, test)
false

For the record, the code that produced the first result is

Arrays.toString(searchString.split(regex))

while the second one came from

Pattern.matches(regex, searchString)

Doesn't split divide a string around the matches of its argument? If so, the regex must have matched tests, which is why the result is [INSERT INTO ,  VALUES (regex, test)]. So why did Pattern.matches return false? Did I miss anything?

I would check whether you get the same problem when you use:

Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(inputString); 

and then see whether m.find() returns true.
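A minimal, self-contained sketch of that suggestion, using the regex and input from the question. The class name FindDemo and the helper firstMatch are made up for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    // The regex from the question, written as a Java string literal.
    static final String REGEX = "(?<=INSERT INTO )\\w+(?= (\\(|VALUES).+)";

    // Hypothetical helper: returns the first substring matched by REGEX,
    // or null if the regex matches nowhere in the input.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        // find() scans for a match anywhere in the input,
        // unlike matches(), which must cover the entire input.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("INSERT INTO tests VALUES (regex, test)"));
    }
}
```

Here find() should succeed and group() should return only the table name, since the text checked by the lookbehind and the lookahead is not part of the match.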

Pattern.matches expects the whole string to match. The lookarounds get in the way of that: they are zero-width assertions, so the characters they check are not consumed and do not become part of the match. The match itself covers only tests, not the entire input.

Just to add a little bit to Joanna's answer: lookaheads and lookbehinds don't participate in the match. Pattern.matches requires the regex to match starting at the beginning of the string and going all the way to the end. Since you have a positive lookbehind for INSERT INTO, the match can only start after that text, which is not at the beginning of the string. Likewise, the lookahead means the match stops before the rest of the statement, so it does not reach the end either.

split works as expected because it doesn't require the match to start at the beginning or to cover the whole string; it only needs the regex to match somewhere.
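To see the difference side by side, here is a rough sketch that runs the question's regex through both calls. It also includes one possible alternative that is not from the original posts: a pattern (named WHOLE here) that consumes the entire statement and captures the table name in a group, so that matches() can succeed. The class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsSplit {
    // The lookaround regex from the question.
    static final String LOOKAROUND = "(?<=INSERT INTO )\\w+(?= (\\(|VALUES).+)";
    // Assumed alternative: no lookarounds, the table name is capture group 1.
    static final String WHOLE = "INSERT INTO (\\w+) (?:\\(|VALUES).+";

    // True only if LOOKAROUND matches the entire input string.
    static boolean wholeStringMatches(String sql) {
        return Pattern.matches(LOOKAROUND, sql);
    }

    // Splits around every place LOOKAROUND matches and formats the pieces.
    static String splitResult(String sql) {
        return Arrays.toString(sql.split(LOOKAROUND));
    }

    // Extracts the table name with a whole-string match, or null on failure.
    static String tableName(String sql) {
        Matcher m = Pattern.compile(WHOLE).matcher(sql);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO tests VALUES (regex, test)";
        System.out.println(wholeStringMatches(sql)); // false
        System.out.println(splitResult(sql));        // [INSERT INTO ,  VALUES (regex, test)]
        System.out.println(tableName(sql));          // tests
    }
}
```

Whether to keep the lookarounds and call find(), or to drop them and use a capturing group with matches(), is a matter of taste; the second form additionally rejects input that is not an INSERT statement from start to end.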
