
How to recover a string that has been split by a regular expression in Java

We know that String.split(String regex) returns a String[]. So how can we recover the original string from the returned String[] and the regex (or are more inputs needed)?

  1. It seems that if the regex is a constant, say \\t, we can recover the original string by joining the members of the String[] with that constant, in order. However, what if the regex is something like [AB]+?

  2. Is there a general-purpose way (function) that can handle both of the above cases?
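To make the difference between the two cases concrete, here is a small sketch using only the JDK (the class name SplitLossDemo is just for illustration). Joining with the same literal restores a tab-separated string, but two inputs with different [AB]+ separators split into identical arrays, so the array alone cannot tell them apart:

```java
import java.util.Arrays;

public class SplitLossDemo {
    public static void main(String[] args) {
        // Literal separator: joining with the same literal restores this input.
        String tabbed = "a\tb\tc";
        String[] parts = tabbed.split("\\t");
        System.out.println(String.join("\t", parts).equals(tabbed)); // true

        // Character-class separator: different separators give the same parts,
        // because split() throws away the text each match consumed.
        String first = "xAyBBz";
        String second = "xAAAyAz";
        System.out.println(Arrays.toString(first.split("[AB]+")));  // [x, y, z]
        System.out.println(Arrays.toString(second.split("[AB]+"))); // [x, y, z]
    }
}
```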

The JDK had no method for this before Java 8 (Java 8 added String.join), but you can also use the StringUtils class from Apache Commons Lang. It works when the separator is a literal:

import org.apache.commons.lang3.StringUtils;

/**
 *
 * @author Himanshu Mishra
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {

        String originalStr = "AB:#:CD:#:EF";
        String[] originalSplit = originalStr.split(":#:");

        // Join all Strings in the array into a single String, separated by :#:
        System.out.println("Join Strings using separator " + StringUtils.join(originalSplit, ":#:"));
    }
}
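Since Java 8 the JDK's own String.join(CharSequence delimiter, CharSequence... elements) does the same job, so the Commons Lang dependency is optional. A minimal sketch (the class JoinWithJdk and helper rejoin are illustrative names); like the StringUtils version, it only restores the input because :#: contains no regex metacharacters and is therefore matched literally:

```java
import java.util.Arrays;

public class JoinWithJdk {
    // Rejoins the split parts with a literal separator using String.join (Java 8+).
    public static String rejoin(String[] parts, String separator) {
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        String originalStr = "AB:#:CD:#:EF";
        String[] originalSplit = originalStr.split(":#:");
        System.out.println(Arrays.toString(originalSplit)); // [AB, CD, EF]
        System.out.println(rejoin(originalSplit, ":#:"));   // AB:#:CD:#:EF
    }
}
```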

You can't, unless the regex is a literal, and even then it's not 100% reliable.

The reason is that the matched separators are discarded, so a non-literal regex, for example

 \s+ (one or more whitespace characters)

cannot be reconstructed, because we don't know how many whitespace characters, or which kinds, originally existed.
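If you are willing to keep more input than the String[], namely the separators themselves, exact reconstruction is possible for any separator regex. The following is a sketch, not a JDK method: the helper splitKeepingSeparators is hypothetical and uses java.util.regex.Matcher to record each matched separator between the pieces, so concatenating everything gives back the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepingSeparators {
    // Hypothetical helper: returns [piece0, sep0, piece1, sep1, ..., pieceN],
    // i.e. the text pieces alternating with the exact separators that matched.
    // Unlike String.split, nothing is dropped, including trailing empty pieces.
    public static List<String> splitKeepingSeparators(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0;
        while (m.find()) {
            tokens.add(input.substring(last, m.start())); // text before this separator
            tokens.add(m.group());                        // the separator itself
            last = m.end();
        }
        tokens.add(input.substring(last)); // text after the last separator
        return tokens;
    }

    // Every character of the input lands in exactly one token,
    // so plain concatenation rebuilds the original string.
    public static String rebuild(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        String original = "a \t b\n\nc";
        List<String> tokens = splitKeepingSeparators(original, "\\s+");
        System.out.println(tokens.size());                    // 5
        System.out.println(rebuild(tokens).equals(original)); // true
    }
}
```

The pieces that String.split would have returned sit at the even indices of the token list; the odd indices are the extra input that a plain String[] cannot carry.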

If the regex is a literal, you can simply rebuild the string by inserting the literal between the split elements, but note that String.split(regex) discards trailing empty strings. Both of these inputs:

a,b,c
a,b,c,,,,,

when split with "," yield the same result, [a, b, c], so even with a literal regex you can't confidently recreate the original input.
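For the literal case specifically, the lost trailing empty strings come from the one-argument split, which behaves like split(regex, 0); passing a negative limit keeps them. A sketch, assuming the separator should be treated literally (hence Pattern.quote); the class LiteralRoundTrip and helper splitLiteral are illustrative names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralRoundTrip {
    // Splits on a literal separator; the negative limit keeps trailing empty strings.
    public static String[] splitLiteral(String input, String literal) {
        return input.split(Pattern.quote(literal), -1);
    }

    public static void main(String[] args) {
        String a = "a,b,c";
        String b = "a,b,c,,,,,";
        System.out.println(Arrays.toString(a.split(","))); // [a, b, c]
        System.out.println(Arrays.toString(b.split(","))); // [a, b, c]

        String[] kept = splitLiteral(b, ",");
        System.out.println(kept.length);                      // 8
        System.out.println(String.join(",", kept).equals(b)); // true
    }
}
```

With the limit set to -1, joining the pieces with the same non-empty literal restores the input exactly; the ambiguity shown above only arises with the default limit.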
