Handling delimiter with escape characters in Java String.split() method

I have searched the web for my query, but didn't find an answer that fits my requirement exactly. I have a string like the one below:

A|B|C|The Steading\|Keir Allan\|Braco|E

My output should look like this:

A
B
C
The Steading|Keir Allan|Braco
E

My requirement is to skip the delimiter if it is preceded by the escape sequence. I have tried the following, using a negative lookbehind in String.split():

(?<!\\)\|

But my problem is that the delimiter will be defined by the end user dynamically, and it need not always be |. It can be any character on the keyboard (no restrictions). Hence, my concern is that the above regex might fail for delimiters that have a special meaning in regex.

I just wanted to know whether this is the right way to do it.

You can use Pattern.quote():

String regex = "(?<!\\\\)" + Pattern.quote(delim);

Using your example:

String delim = "|";
String regex = "(?<!\\\\)" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading\\|Keir Allan\\|Braco|E".split(regex)) {
    System.out.println(s);
}

Output:

A
B
C
The Steading\|Keir Allan\|Braco
E
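
To see why the quoting matters when the user picks a regex metacharacter as the delimiter, here is a minimal sketch. The class name QuotedDelimiterDemo and the helper split are made up for illustration; only String.split() and Pattern.quote() come from the answer above. Without Pattern.quote(), "." would match every character, and "*" or "(" would throw a PatternSyntaxException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class QuotedDelimiterDemo {

    // Splits on delim unless it is immediately preceded by a backslash.
    // Pattern.quote() turns the delimiter into a literal, whatever it contains.
    static List<String> split(String input, String delim) {
        String regex = "(?<!\\\\)" + Pattern.quote(delim);
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(split("A.B\\.C.D", "."));  // [A, B\.C, D]
        System.out.println(split("A*B\\*C*D", "*"));  // [A, B\*C, D]
        System.out.println(split("A(B\\(C(D", "("));  // [A, B\(C, D]
    }
}
```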

You can extend this to use a custom escape sequence as well:

String delim = "|";
String esc = "+";
String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading+|Keir Allan+|Braco|E".split(regex)) {
    System.out.println(s);
}

Output:

A
B
C
The Steading+|Keir Allan+|Braco
E
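
Note that the escape sequence is still present in the split results, whereas the question asks for The Steading|Keir Allan|Braco. One way to get there, as a sketch only, is to strip the escape in a second pass after splitting. The class SplitAndUnescape and its split helper are hypothetical names, and the sketch assumes the escape is only ever used in front of the delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; assumes esc is only ever used in front of delim.
class SplitAndUnescape {

    static List<String> split(String input, String esc, String delim) {
        String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
        List<String> fields = new ArrayList<>();
        for (String field : input.split(regex)) {
            // String.replace works on literal text, so no quoting is needed here.
            fields.add(field.replace(esc + delim, delim));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints A, B, C, "The Steading|Keir Allan|Braco" and E on separate lines.
        for (String s : split("A|B|C|The Steading+|Keir Allan+|Braco|E", "+", "|")) {
            System.out.println(s);
        }
    }
}
```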

I know this is an old thread, but the lookbehind solution has an issue: it doesn't allow the escape character itself to be escaped. For example, in A|B|C|The Steading\\|Keir Allan\|Braco|E the | after The Steading\\ follows an escaped backslash and should therefore act as a delimiter, but the lookbehind sees a backslash right before it, so no split occurs there.
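
A small sketch that reproduces the problem (the class name LookbehindLimitation is made up for illustration). The input is A\\|B, where \\ is meant to be an escaped backslash followed by a real delimiter, so the intended result is two fields:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class showing the limitation of the lookbehind approach.
class LookbehindLimitation {

    static String[] split(String input) {
        return input.split("(?<!\\\\)" + Pattern.quote("|"));
    }

    public static void main(String[] args) {
        // The Java literal "A\\\\|B" is the five-character text A\\|B.
        // The lookbehind sees a backslash before | and refuses to split,
        // so a single field comes back instead of two.
        System.out.println(Arrays.toString(split("A\\\\|B")));  // [A\\|B]
    }
}
```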

The positive-matching solution in the thread "Regex and escaped and unescaped delimiter" works better (apply Pattern.quote() to the delimiter if it is dynamic).
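
I can't reproduce that thread's exact regex here, so the following is only a sketch of the general idea: instead of describing where to split, describe what a field looks like and collect the matches. The class PositiveMatchSplit, the helper split and the pattern itself are my own assumptions. It also assumes a non-empty delimiter that differs from the escape, and well-formed input with no dangling escape at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a positive-matching split; not the referenced thread's code.
class PositiveMatchSplit {

    static List<String> split(String input, String esc, String delim) {
        String e = Pattern.quote(esc);
        String d = Pattern.quote(delim);
        // Group 1: one field, a run of "escape + any character" pairs or
        //          characters that start neither the escape nor the delimiter.
        // Group 2: the delimiter that ends the field, or empty at end of input.
        Pattern field = Pattern.compile(
                "((?:" + e + ".|(?!" + e + "|" + d + ").)*)(" + d + "|\\z)",
                Pattern.DOTALL);

        List<String> fields = new ArrayList<>();
        Matcher m = field.matcher(input);
        while (m.find()) {
            fields.add(m.group(1));
            if (m.group(2).isEmpty()) {
                break;  // reached end of input
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Text: A|B|C|The Steading\\|Keir Allan\|Braco|E
        // The | after the escaped backslash now splits; the escaped | does not.
        String input = "A|B|C|The Steading\\\\|Keir Allan\\|Braco|E";
        for (String s : split(input, "\\", "|")) {
            System.out.println(s);
        }
    }
}
```

As with the lookbehind version, the escape characters stay in the returned fields, so unescaping remains a separate step.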

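// Regex-free alternative: scan the string once, tracking whether the
// current character is escaped. Requires: import java.util.function.Consumer;
// Escape characters are kept in the emitted fields; only the splitting
// decision is affected by them.
private static void splitString(String str, char escapeCharacter, char delimiter, Consumer<String> resultConsumer) {
    final StringBuilder sb = new StringBuilder();
    boolean isEscaped = false;
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (c == escapeCharacter) {
            // Toggling means a doubled escape character cancels itself out.
            isEscaped = !isEscaped;
            sb.append(c);
        } else if (c == delimiter) {
            if (isEscaped) {
                // Escaped delimiter: part of the current field.
                sb.append(c);
                isEscaped = false;
            } else {
                // Unescaped delimiter: emit the field and start a new one.
                resultConsumer.accept(sb.toString());
                sb.setLength(0);
            }
        } else {
            isEscaped = false;
            sb.append(c);
        }
    }
    // Emit the last field (possibly empty).
    resultConsumer.accept(sb.toString());
}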
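
A usage sketch for the method above, collecting the fields into a list. The wrapper class SplitStringDemo and the split helper are hypothetical; splitString is copied unchanged so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper class around the splitString method shown above.
class SplitStringDemo {

    // Convenience helper: collect the emitted fields into a list.
    static List<String> split(String str, char escapeCharacter, char delimiter) {
        List<String> fields = new ArrayList<>();
        splitString(str, escapeCharacter, delimiter, fields::add);
        return fields;
    }

    private static void splitString(String str, char escapeCharacter, char delimiter, Consumer<String> resultConsumer) {
        final StringBuilder sb = new StringBuilder();
        boolean isEscaped = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == escapeCharacter) {
                isEscaped = !isEscaped;
                sb.append(c);
            } else if (c == delimiter) {
                if (isEscaped) {
                    sb.append(c);
                    isEscaped = false;
                } else {
                    resultConsumer.accept(sb.toString());
                    sb.setLength(0);
                }
            } else {
                isEscaped = false;
                sb.append(c);
            }
        }
        resultConsumer.accept(sb.toString());
    }

    public static void main(String[] args) {
        // Text: A|B|C|The Steading\\|Keir Allan\|Braco|E
        String input = "A|B|C|The Steading\\\\|Keir Allan\\|Braco|E";
        // Six fields: the | after the escaped backslash splits,
        // the escaped | before Braco does not.
        for (String s : split(input, '\\', '|')) {
            System.out.println(s);
        }
    }
}
```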

