I read this string from a file:
abc | abc (abc\|abc)|def
I want to split it into an array of 3 items.
How do I write the regex correctly? line.split("(?!<=\\)\\|") doesn't work.
Code:
public class __QuickTester {
    public static void main(String[] args) {
        String test = "abc|abc (abc\\|abc)|def|banana\\|apple|orange";
        // \\\\ becomes \\ <-- String
        // \\ becomes \ <-- In Regex
        String[] result = test.split("(?<!\\\\)\\|");
        for (String part : result) {
            System.out.println(part);
        }
    }
}
Output:
abc
abc (abc\|abc)
def
banana\|apple
orange
Note: You need \\\\ (4 backslashes) to get \\ (2 backslashes) as a String, and then \\ (2 backslashes) becomes a single \ in Regex.
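To make the two escaping levels visible, here is a minimal sketch that prints the pattern exactly as the regex engine receives it. The class name BackslashLevels and the constant SPLIT_REGEX are made up for this illustration.

```java
class BackslashLevels {
    // Java source literal: four backslashes inside the look-behind, two before the pipe.
    static final String SPLIT_REGEX = "(?<!\\\\)\\|";

    public static void main(String[] args) {
        // The compiler halves the backslashes, so the regex engine sees: (?<!\\)\|
        System.out.println(SPLIT_REGEX);
        // 9 characters in total: ( ? < ! \ \ ) \ |
        System.out.println(SPLIT_REGEX.length());
    }
}
```

The regex engine then halves the backslashes once more, so the look-behind tests for a single literal \ and the final \| is a literal pipe.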
Try this regular expression: ([\w()]|(\\|))+
The main problem with your approach is that \ is special in regex, but also in String literals. So to create a \ literal you need to escape it twice: once for the regex, giving \\, and once more for the String literal, giving "\\\\". So you would need to write it as split("(?<!\\\\)\\|").
But there are also possible problems with this approach, since splitting on every | that is simply not preceded by \ can be error-prone. Because you are using \ as a special character, to create a \ literal you probably need to write it as \\; for instance, to create c:\foo\bar\ you probably need to write it in your text as c:\\foo\\bar\\.
So in that case, let's say that you want to split text like
abc|foo\|c:\\bar\\|cde
I assume that you want to split only in these places
abc|foo\|c:\\bar\\|cde
   ^              ^
because in abc|foo the pipe | has no \ before it, and in bar\\|cde, despite the pipe having a \ before it, we know that this \ wasn't used to escape |, but to generate text representing a \ literal (so generally a | preceded by no \ characters, or by an even number of them, is OK to split on). But if you split on each pipe that has no backslash directly before it, as in split("(?<!\\\\)\\|"), you will not split bar\\|cde, because there is a \ before the | which will prevent such a split.
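The missed split can be reproduced with a small sketch. The class and method names (NaiveSplitDemo, naiveSplit) are hypothetical; the regex and the sample text are the ones discussed above.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Splits on every | that is not directly preceded by a backslash.
    static String[] naiveSplit(String text) {
        return text.split("(?<!\\\\)\\|");
    }

    public static void main(String[] args) {
        // Java literal for the text: abc|foo\|c:\\bar\\|cde
        String text = "abc|foo\\|c:\\\\bar\\\\|cde";
        // prints [abc, foo\|c:\\bar\\|cde]
        System.out.println(Arrays.toString(naiveSplit(text)));
    }
}
```

Only two parts come back: the last pipe is kept because the character right before it is a backslash, even though that backslash is itself escaped.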
To solve this problem you could check whether there is an odd number of \ characters before |, but this is hard to do in Java since look-behind needs to have a limited width.
A possible solution would be split("(?<!(?<!\\\\)((\\\\){2}){0,1000}\\\\)\\|") together with the assumption that the string will never contain more than 1000 continuous \ characters, but it seems like overkill.
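For completeness, a minimal sketch of the bounded look-behind idea, assuming runs of backslashes stay within the {0,1000} limit. The class and method names are invented for this example.

```java
import java.util.Arrays;

class BoundedLookbehindSplit {
    // Refuses to split on a | that is preceded by an odd-length run of backslashes:
    // (?<!\\)       the run must not itself be preceded by a backslash
    // ((\\){2}){0,1000}  zero or more escaped backslash pairs (bounded on purpose)
    // \\            one more backslash, which escapes the pipe
    static String[] split(String text) {
        return text.split("(?<!(?<!\\\\)((\\\\){2}){0,1000}\\\\)\\|");
    }

    public static void main(String[] args) {
        // Java literal for the text: abc|foo\|c:\\bar\\|cde
        String text = "abc|foo\\|c:\\\\bar\\\\|cde";
        for (String part : split(text)) {
            System.out.println(part);
        }
    }
}
```

The upper bound is there only because Java requires the look-behind to have a known maximum length; the pattern is checked at every pipe, so this is likely to be slower than a plain scan.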
IMO a better solution would be searching for the strings you want to find, instead of searching for the strings you want to split on. And the strings you want to find are made of characters other than |, plus any character preceded by \ (including |, since \ will simply escape it). So our regex could look like (\\\\.|[^|])+ (I placed \\\\. at the start to prevent [^|] from consuming a \ which will be used to escape other characters).
Example:
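import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "abc|foo\\|c:\\\\bar\\\\|cde"; // the sample text: abc|foo\|c:\\bar\\|cde
// Either an escaped character (\\.) or any character other than |, repeated.
Pattern p = Pattern.compile("(\\\\.|[^|])+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
}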
Output:
abc
foo\|c:\\bar\\
cde
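The matching approach can also be wrapped in a reusable helper. This is only a sketch: the class PipeFields and the method fields are hypothetical names, and the escape sequences are left in the returned strings untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeFields {
    // An escaped character (backslash plus any char) or any non-pipe character, one or more times.
    private static final Pattern FIELD = Pattern.compile("(\\\\.|[^|])+");

    // Collects every run of text between unescaped pipes.
    static List<String> fields(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Java literal for the text: abc|foo\|c:\\bar\\|cde
        System.out.println(fields("abc|foo\\|c:\\\\bar\\\\|cde"));
        // Empty fields are skipped because + needs at least one character.
        System.out.println(fields("a||b")); // prints [a, b]
    }
}
```

One consequence of matching instead of splitting is that empty fields disappear, so a||b yields two items rather than three. If empty fields matter, the look-behind split may be the safer choice.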