
Regular expression: how to split on | while avoiding a split when \ comes before it

I have the following text:

 aaa|bbbb|cccc|dddd\|eeee|ffff

and I want to split on |, except when the | is preceded by \, to obtain:

aaa

bbbb

cccc

dddd\|eeee

ffff

Thanks.

P.S.: I tried using some regexp generators (for example http://txt2re.com/), but frankly regexp is anything but friendly.

Update: in the end I gave up. Regexp is not fast (I ran a benchmark), nor is it clear (compared with a function that anybody can follow), so I skipped it and am now using real code.
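The function the asker ended up with is not shown. As a rough sketch only, a hand-written loop for this kind of split could look like the following; the class name `ManualSplit` and the method `splitUnescaped` are hypothetical, and the choice to keep the escape backslashes in the output is an assumption based on the desired result above.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Hypothetical helper (not from the original post): splits on '|'
    // unless the '|' is escaped by a preceding backslash.
    static List<String> splitUnescaped(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Copy the backslash and the character it escapes verbatim.
                current.append(c).append(input.charAt(++i));
            } else if (c == '|') {
                // Unescaped delimiter: close the current token.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Raw text: aaa|bbbb|cccc|dddd\|eeee|ffff
        System.out.println(splitUnescaped("aaa|bbbb|cccc|dddd\\|eeee|ffff"));
    }
}
```

Because the loop consumes a backslash together with the character after it, an escaped backslash followed by | still splits, without any lookbehind.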

This should do it:

(?<!\\\\)\\|

If you want to allow backslash-escaped backslashes, you can use:

(?<!(?<!\\\\)\\\\)\\|

So given the string aaa|bbbb|cccc|dddd\|eeee\\|ffff, the split would be:


    aaa
    bbbb
    cccc
    dddd|eeee\*
    ffff

* Or dddd\|eeee\\ if you're not stripping escape-backslashes for some reason.

Edit: I'm not familiar with the Java regular expression flavor; added escapes per ratchet freak's comment.
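To make the difference between the two patterns concrete, here is a small sketch that runs both against the example string and then strips the escape backslashes. The class name `LookbehindSplit`, the constant names, and the `unescape` helper are illustrative assumptions, not part of the answer.

```java
import java.util.Arrays;
import java.util.List;

public class LookbehindSplit {
    // Regex (?<!\\)\| : a pipe that is not preceded by a backslash.
    static final String SIMPLE = "(?<!\\\\)\\|";
    // Regex (?<!(?<!\\)\\)\| : also splits when the preceding backslash is itself escaped.
    static final String WITH_ESCAPED_BACKSLASH = "(?<!(?<!\\\\)\\\\)\\|";

    static List<String> split(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    // Illustrative helper: drops each escaping backslash, keeping the escaped character.
    static String unescape(String token) {
        return token.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        // Raw text: aaa|bbbb|cccc|dddd\|eeee\\|ffff
        String input = "aaa|bbbb|cccc|dddd\\|eeee\\\\|ffff";
        System.out.println(split(input, SIMPLE));
        System.out.println(split(input, WITH_ESCAPED_BACKSLASH));
        for (String token : split(input, WITH_ESCAPED_BACKSLASH)) {
            System.out.println(unescape(token));
        }
    }
}
```

With the simple pattern the last two fields stay glued together, because the | after eeee\\ is still preceded by a backslash. Neither pattern counts backslashes, so a run of three or more before a | is not handled reliably.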

I tried to add this as a comment on eyelidlessness's answer, but I don't know how to format it there.

Anyhow, eyelidlessness's answer looks correct to me:

    import java.util.Arrays;

    // Raw text: aaa|bbbb|cccc|dddd\|eeee|ffff
    String str = "aaa|bbbb|cccc|dddd\\|eeee|ffff";
    String[] tokens = str.split("(?<!\\\\)\\|");
    System.out.println(Arrays.toString(tokens));

which prints:

[aaa, bbbb, cccc, dddd\|eeee, ffff]

Don't use split() for this. (You could if Java supported indefinite repetition inside lookbehind assertions. But it doesn't.)

Better to collect all the matches between the |s:

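import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
List<String> matchList = new ArrayList<String>();
// A token is a run of escaped characters (\ plus any character) or characters other than \ and |.
// "+" rather than "*": with "*", find() also returns an empty match at each | (note that "+" skips empty fields).
Pattern regex = Pattern.compile("(?:\\\\.|[^\\\\|])+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}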

This correctly splits aaa|bbbb\\|cccc|dddd\|eeee|ffff\\\|ggg\\\\|hhhh into

aaa
bbbb\\
cccc
dddd\|eeee
ffff\\\|ggg\\\\
hhhh
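For anyone who wants to run this, a self-contained sketch of the matching approach follows. The class name `MatchTokens` and the method `tokens` are made up for illustration. The sketch uses the `+` quantifier on the assumption that empty fields are not needed, since a pattern that can match the empty string would also make find() report an empty match at each delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokens {
    // A token is one or more of: an escaped character (\ plus any character),
    // or any character other than \ and |.
    private static final Pattern TOKEN = Pattern.compile("(?:\\\\.|[^\\\\|])+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: aaa|bbbb\\|cccc|dddd\|eeee|ffff\\\|ggg\\\\|hhhh
        String input = "aaa|bbbb\\\\|cccc|dddd\\|eeee|ffff\\\\\\|ggg\\\\\\\\|hhhh";
        for (String token : tokens(input)) {
            System.out.println(token);
        }
    }
}
```

Since every backslash is consumed together with the character after it, any number of consecutive backslashes is paired up correctly before the next | is examined.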


 