Does anyone know how to split a string on a character taking into account its escape sequence?
For example, if the character is ':', "a:b" is split into two parts ("a" and "b"), whereas "a\:b" is not split at all.
I think this is hard (impossible?) to do with regular expressions.
Thank you in advance,
Kedar
(?<=^|[^\\]):
gets you close, but doesn't handle escaped backslashes. (That's the literal regex; of course you have to escape the backslashes again to get it into a Java string.)
(?<=(^|[^\\])(\\\\)*):
How about that? I think it should match any ':' that is preceded by an even number of backslashes.
Edit: don't vote this up. MizardX's solution is better :)
Since Java supports variable-length look-behinds (as long as they are finite), you could do it like this:
import java.util.regex.*;

public class RegexTest {
    public static void main(String[] argv) {
        // Split on ':' only when it is preceded by an even number of backslashes.
        Pattern p = Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");
        // Colons preceded by 0, 1, 2, 3 and 4 backslashes, in that order.
        String text = "foo:bar\\:baz\\\\:qux\\\\\\:quux\\\\\\\\:corge";
        String[] parts = p.split(text);
        System.out.printf("Input string: %s\n", text);
        for (int i = 0; i < parts.length; i++) {
            System.out.printf("Part %d: %s\n", i + 1, parts[i]);
        }
    }
}
(?<=(?<!\\\\)(?:\\\\\\\\){0,10})
looks behind for an even number of backslashes (zero included, up to a maximum of 10 pairs) that is not itself preceded by another backslash. Output:
Input string: foo:bar\:baz\\:qux\\\:quux\\\\:corge
Part 1: foo
Part 2: bar\:baz\\
Part 3: qux\\\:quux\\\\
Part 4: corge
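One thing to watch with the split approach: the one-argument Pattern.split(text) drops trailing empty strings, so an input ending in an unescaped ':' loses its last (empty) piece. Here is a minimal sketch that wraps the same look-behind in a helper and passes a limit of -1 to keep those pieces. The class name EscapedSplitDemo and the method splitUnescaped are made up for illustration, and the pieces still contain their escape sequences.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedSplitDemo {
    // Same look-behind as above: a ':' preceded by an even number of
    // backslashes (at most 10 pairs) that is not preceded by a further backslash.
    private static final Pattern UNESCAPED_COLON =
            Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");

    // Hypothetical helper: the limit -1 keeps trailing empty pieces,
    // which the one-argument split(...) would discard.
    static String[] splitUnescaped(String text) {
        return UNESCAPED_COLON.split(text, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnescaped("a:b")));    // [a, b]
        System.out.println(Arrays.toString(splitUnescaped("a\\:b")));  // [a\:b]
        System.out.println(Arrays.toString(splitUnescaped("a:b:")));   // [a, b, ]
    }
}
```

If you do want trailing empty pieces dropped, the plain p.split(text) from the answer already does that.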
Another way would be to match the parts themselves, instead of splitting at the delimiters.
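// Fragment for the main method above; it reuses `text` and also needs
// java.util.List and java.util.LinkedList to be imported.
Pattern p2 = Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");
List<String> parts2 = new LinkedList<String>();
Matcher m = p2.matcher(text);
while (m.find()) {
    // Group 1 is one piece, with its escape sequences left intact.
    parts2.add(m.group(1));
}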
The strange syntax comes from having to handle empty pieces at the start and end of the string. When a match spans exactly zero characters, the next attempt starts one character past the end of it. If it didn't, it would match another empty string, and another, ad infinitum…
(?<=\\A|\\G:)
looks behind for either the start of the string (the first piece), or the end of the previous match followed by the separator. If we used (?:\\A|\\G:) instead, it would fail when the first piece is empty (input starts with a separator): after the empty match at position 0, the next attempt starts at position 1, so \\G: can no longer consume the separator at position 0.
\\\\.
matches any escaped character, that is, a backslash followed by one character.
[^:\\\\]
matches any character that is neither the separator nor a backslash. It never sees part of an escape sequence, because \\\\. has already consumed both of its characters.
((?:\\\\.|[^:\\\\])*)
captures all characters up to the first non-escaped delimiter into capture group 1.
(All of these are written as they appear in the Java string literal, with the backslashes doubled.)
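To see the empty-piece handling in action, here is a self-contained sketch of the matching approach, run on an input that starts and ends with a separator and has an empty piece in the middle. The class name EscapedMatchDemo and the method pieces are invented for this example, and I used ArrayList rather than LinkedList, which makes no difference to the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedMatchDemo {
    // Same pattern as above: each match must begin at the start of the input
    // or right after a ':' that directly follows the previous match.
    private static final Pattern PIECE =
            Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");

    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> pieces(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = PIECE.matcher(text);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // The input is  :a\:b::c:  and yields five pieces:
        // "", "a\:b", "", "c", ""
        for (String piece : pieces(":a\\:b::c:")) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

Unlike the one-argument split, this keeps the empty piece at the end of the input as well as the ones at the start and in the middle.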