
java regex, split on comma only if not in quotes or brackets

I want to do a Java split with a regular expression. I want to split my string on every comma that is not inside single quotes or brackets. Example:

Hello, 'my,',friend,(how ,are, you),(,)


should give:
    hello
    my,
    friend
    how, are, you
    ,

I tried this:

(?i),(?=([^\'|\(]*\'|\([^\'|\(]*\'|\()*[^\'|\)]*$)

but I can't get it to work (I tested it at http://java-regex-tester.appspot.com/).

Any ideas?

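Nested parentheses cannot be handled by a regex split, because Java's regex engine has no way to track arbitrary nesting depth. It is easier to split manually.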

import java.util.ArrayList;
import java.util.List;

// Splits on commas at parenthesis depth 0. Single quotes are not handled here.
public static List<String> split(String orig) {
    List<String> splitted = new ArrayList<String>();
    int nestingLevel = 0; // current parenthesis depth
    StringBuilder result = new StringBuilder();
    for (char c : orig.toCharArray()) {
        if (c == ',' && nestingLevel == 0) {
            // Top-level comma: emit the segment collected so far.
            splitted.add(result.toString());
            result.setLength(0); // clear the buffer
        } else {
            if (c == '(')
                nestingLevel++;
            if (c == ')')
                nestingLevel--;
            result.append(c);
        }
    }
    // Thanks PoeHah for pointing it out. This adds the last element to it.
    splitted.add(result.toString());
    return splitted;
}

Hope this helps.
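The manual approach above tracks only parentheses. A minimal sketch of how it might be extended to also respect single quotes follows. The class name `QuoteAwareSplitter` is hypothetical, and the sketch assumes quotes are never escaped and that parentheses inside a quoted section do not count toward nesting.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Splits on commas that are outside single quotes and outside parentheses.
    // Assumptions: no escaped quotes; parentheses inside quotes are ignored.
    public static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;           // current parenthesis nesting depth
        boolean inQuote = false; // true while inside '...'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote && c == '(') {
                depth++;
            } else if (!inQuote && c == ')' && depth > 0) {
                depth--; // a stray ')' at depth 0 is left as plain text
            }
            if (c == ',' && depth == 0 && !inQuote) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // last segment
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello, 'my,',friend,(how ,are, you),(,)"));
    }
}
```

The segments keep their surrounding quotes, brackets, and whitespace, so a trimming step would still be needed to get exactly the output asked for in the question.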

A Java CSV parser library is better suited to this task than a regex: http://sourceforge.net/projects/javacsv/

Assuming there are no nested (), you can split on:

",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)"

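It splits on a comma only when the rest of the string ahead of it contains an even number of ' characters and only complete bracket pairs.

It is a brittle solution, but it may be good enough.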
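As a rough illustration of how that pattern behaves with `String.split`, here is a small sketch run on the string from the question. The class name `LookaheadSplitDemo` and the `clean` helper are hypothetical additions, not part of the answer. `clean` naively strips one surrounding pair of quotes or brackets and would mangle a token such as `(a)(b)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LookaheadSplitDemo {

    // The pattern from the answer, written as a Java string literal.
    static final String SPLIT_REGEX =
            ",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)";

    // Raw split: tokens keep their quotes, brackets, and whitespace.
    static List<String> split(String input) {
        return Arrays.asList(input.split(SPLIT_REGEX));
    }

    // Hypothetical cleanup: trim, then drop one surrounding '...' or (...) pair.
    static String clean(String token) {
        String t = token.trim();
        boolean quoted = t.length() >= 2 && t.startsWith("'") && t.endsWith("'");
        boolean bracketed = t.length() >= 2 && t.startsWith("(") && t.endsWith(")");
        return (quoted || bracketed) ? t.substring(1, t.length() - 1) : t;
    }

    static List<String> splitAndClean(String input) {
        List<String> cleaned = new ArrayList<>();
        for (String token : split(input)) {
            cleaned.add(clean(token));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String input = "Hello, 'my,',friend,(how ,are, you),(,)";
        for (String part : splitAndClean(input)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Without the cleanup step, the raw tokens are `Hello`, ` 'my,'`, `friend`, `(how ,are, you)`, and `(,)`, so the split alone does not remove the delimiters the question wants dropped.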

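As noted in some of the comments and in @Balthus's answer, this is better done with a CSV parser. You need some smart regex replacement to prepare the input string for parsing. Consider code like this: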

String str = "Hello, 'my,',friend,(how ,are, you),(,)"; // input string

// prepare String for CSV parser: replace left/right brackets OR ' by a "
CsvReader reader = CsvReader.parse(str.replaceAll("[(')]", "\""));
reader.readRecord(); // read the CSV input
for (int i=0; i<reader.getColumnCount(); i++)
   System.out.printf("col[%d]: [%s]%n", i, reader.get(i));

OUTPUT

col[0]: [Hello]
col[1]: [my,]
col[2]: [friend]
col[3]: [how ,are, you]
col[4]: [,]
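To make the preprocessing step easier to inspect on its own, the sketch below runs only the `replaceAll` call and shows the line that would be handed to a CSV parser. It does not use `CsvReader` at all. The class name `CsvPreprocessDemo` and the method `toCsvLine` are hypothetical.

```java
public class CsvPreprocessDemo {

    // Replaces every '(', ')' and single quote with a double quote,
    // so that a CSV parser can treat the enclosed text as one quoted field.
    static String toCsvLine(String input) {
        return input.replaceAll("[(')]", "\"");
    }

    public static void main(String[] args) {
        // Input from the question.
        System.out.println(toCsvLine("Hello, 'my,',friend,(how ,are, you),(,)"));
        // Mixed delimiters collapse into the same character.
        System.out.println(toCsvLine("x,(a,'b'),y"));
    }
}
```

The first call yields `Hello, "my,",friend,"how ,are, you",","`. The second yields `x,"a,"b"",y`, which suggests the trick is only safe when quotes and brackets are never nested or mixed and the input contains no double quotes of its own.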

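I also needed to split on commas outside quotes and brackets.

After searching all the related answers on SO, I realized that this case calls for a lexer, so I wrote a generic implementation for myself. It accepts the delimiter, multiple quote types, and multiple bracket types as regular expressions.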

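import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> split(String string, String regex, String[] quotesRegex, String[] leftBracketsRegex,
                                 String[] rightBracketsRegex) {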

    if (leftBracketsRegex.length != rightBracketsRegex.length) {
        throw new IllegalArgumentException("Bracket count mismatch, left: " + leftBracketsRegex.length + ", right: "
                + rightBracketsRegex.length);
    }

    // Prepare all delimiters.
    String[] delimiters = new String[1 + quotesRegex.length + leftBracketsRegex.length + rightBracketsRegex.length];
    delimiters[0] = regex;
    System.arraycopy(quotesRegex, 0, delimiters, 1, quotesRegex.length);
    System.arraycopy(leftBracketsRegex, 0, delimiters, 1 + quotesRegex.length, leftBracketsRegex.length);
    System.arraycopy(rightBracketsRegex, 0, delimiters, 1 + quotesRegex.length + leftBracketsRegex.length,
            rightBracketsRegex.length);

    // Build delimiter regex.
    StringBuilder delimitersRegexBuilder = new StringBuilder("(?:");
    boolean first = true;
    for (String delimiter : delimiters) {
        if (delimiter.endsWith("\\") && !delimiter.endsWith("\\\\")) {
            throw new IllegalArgumentException("Delimiter contains trailing single \\: " + delimiter);
        }
        if (first) {
            first = false;
        } else {
            delimitersRegexBuilder.append("|");
        }
        delimitersRegexBuilder
                .append("(")
                .append(delimiter)
                .append(")");
    }
    delimitersRegexBuilder.append(")");
    String delimitersRegex = delimitersRegexBuilder.toString();

    // Scan.
    int pendingQuoteIndex = -1;
    Deque<Integer> bracketStack = new LinkedList<>();
    StringBuilder pendingSegmentBuilder = new StringBuilder();
    List<String> segmentList = new ArrayList<>();
    Matcher matcher = Pattern.compile(delimitersRegex).matcher(string);
    int matcherIndex = 0;
    while (matcher.find()) {
        pendingSegmentBuilder.append(string.substring(matcherIndex, matcher.start()));
        int delimiterIndex = -1;
        for (int i = 1; i <= matcher.groupCount(); ++i) {
            if (matcher.group(i) != null) {
                delimiterIndex = i - 1;
                break;
            }
        }
        if (delimiterIndex < 1) {
            // Regex.
            if (pendingQuoteIndex == -1 && bracketStack.isEmpty()) {
                segmentList.add(pendingSegmentBuilder.toString());
                pendingSegmentBuilder.setLength(0);
            } else {
                pendingSegmentBuilder.append(matcher.group());
            }
        } else {
            delimiterIndex -= 1;
            pendingSegmentBuilder.append(matcher.group());
            if (delimiterIndex < quotesRegex.length) {
                // Quote.
                if (pendingQuoteIndex == -1) {
                    pendingQuoteIndex = delimiterIndex;
                } else if (pendingQuoteIndex == delimiterIndex) {
                    pendingQuoteIndex = -1;
                }
                // Ignore unpaired quotes.
            } else if (pendingQuoteIndex == -1) {
                delimiterIndex -= quotesRegex.length;
                if (delimiterIndex < leftBracketsRegex.length) {
                    // Left bracket
                    bracketStack.push(delimiterIndex);
                } else {
                    delimiterIndex -= leftBracketsRegex.length;
                    // Right bracket
                    // peek() returns null on an empty stack, so avoid unboxing it directly.
                    Integer topBracket = bracketStack.peek();
                    // Ignore unbalanced brackets.
                    if (topBracket != null && delimiterIndex == topBracket) {
                        bracketStack.pop();
                    }
                }
            }
        }
        matcherIndex = matcher.end();
    }
    pendingSegmentBuilder.append(string.substring(matcherIndex, string.length()));
    segmentList.add(pendingSegmentBuilder.toString());

    while (segmentList.size() > 0 && segmentList.get(segmentList.size() - 1).isEmpty()) {
        segmentList.remove(segmentList.size() - 1);
    }

    return segmentList;
}

