
Java String split with regex ignoring content in parentheses

I would like to split a String such as "word1 AND word2 OR (word3 AND (word4 OR word5)) AND word6" on "AND", but only where it appears outside parentheses, to get: "word1", "word2 OR (word3 AND (word4 OR word5))", "word6"

Note that a block of parentheses can contain many other blocks of parentheses.

I've done some research and found a regex that does the opposite of what I want: (?:[^AND(]|\\([^)]*\\))+ This regex selects everything but "AND" outside of parentheses. I also tried lookahead and lookbehind, but without success.

Is there a way of doing what I'm asking with a regex?

Thanks

For the Pattern.compile method, you can pass Pattern.DOTALL as a parameter. A code sample is given below.

import java.util.regex.*;

public class Test
{
    public static void main(String[] args)
    {
        String s = "word1 AND word2 OR (word3 AND (word4 OR word5)) AND word6";

        // The pattern from the question, unchanged
        String regEx = "(?:[^AND(]|\\([^)]*\\))+";
        // DOTALL makes '.' match line terminators as well
        Pattern pattern = Pattern.compile(regEx, Pattern.DOTALL);
        Matcher matcher = pattern.matcher(s);

        // Print every match together with its start and end index
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group() + "\" starting at " + matcher.start() + " index and ending at index " + matcher.end());
        }
    }
}

Please try this.
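Before relying on that pattern for nested input, it may help to check its two building blocks in isolation. The following is a minimal sketch, not part of the original answer; the class name NestedParenCheck and both helper methods are hypothetical and exist only for illustration. It shows that [^)]* stops at the first closing parenthesis, and that [^AND(] is a character class excluding the single characters A, N, D and (, not the word "AND".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedParenCheck {

    // Hypothetical helper: first match of the parenthesis alternative \([^)]*\), or null.
    static String firstParenMatch(String input) {
        Matcher m = Pattern.compile("\\([^)]*\\)").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: true if the character class [^AND(] matches anywhere in input.
    static boolean hasCharClassMatch(String input) {
        return Pattern.compile("[^AND(]+").matcher(input).find();
    }

    public static void main(String[] args) {
        // Prints "(word3 AND (word4 OR word5)": the match ends at the first ')',
        // so the outer closing parenthesis is left out.
        System.out.println(firstParenMatch("(word3 AND (word4 OR word5))"));

        // Prints "false": every character of "DNA" is excluded by the class.
        System.out.println(hasCharClassMatch("DNA"));
    }
}
```

Java's regex engine has no recursion construct, which is why balancing arbitrarily nested parentheses is usually done with a counter rather than a single pattern.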

Consider creating your own parser for this task (it is not that complicated).

  1. Iterate over the string's characters to find the ranges from which you can't remove AND. Create a variable that tracks the level of nesting. Increase this level when you find ( and decrease it when you find ).
    • if you find ( and the level changed from 0 to 1, it is the start of a range,
    • if you find ) and the level changed from 1 to 0, it is the end of a range.
  2. Find the positions of AND in your string (indexOf(data,fromIndex) can be helpful here) and check whether each one is outside the ranges you shouldn't split on.
  3. When you have all the positions you should split on, create substrings from start to position, and update the next start to be position+"AND".length(). After this, try to substring the next part.

After point 3 you should have all the parts you are interested in.
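The steps above can also be condensed into a single pass that keeps only a depth counter instead of a list of ranges. The following is a minimal sketch under a few assumptions that are not in the original answer: the class name TopLevelSplitter and the method splitTopLevel are hypothetical, the separator is passed with its surrounding spaces (" AND ") so that words merely containing "AND" are not split, each part is trimmed, and the parentheses are assumed to be balanced.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {

    // Splits data on every occurrence of separator found at nesting depth 0.
    // Assumes balanced parentheses; no validation is performed.
    static List<String> splitTopLevel(String data, String separator) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // current parenthesis nesting level
        int start = 0; // start index of the part being collected
        int i = 0;
        while (i < data.length()) {
            char ch = data.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (depth == 0 && data.startsWith(separator, i)) {
                // Separator outside any parentheses: close the current part
                parts.add(data.substring(start, i).trim());
                i += separator.length();
                start = i;
                continue;
            }
            i++;
        }
        // Whatever follows the last separator is the final part
        parts.add(data.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        String s = "word1 AND word2 OR (word3 AND (word4 OR word5)) AND word6";
        // Prints three lines:
        // word1
        // word2 OR (word3 AND (word4 OR word5))
        // word6
        for (String part : splitTopLevel(s, " AND ")) {
            System.out.println(part);
        }
    }
}
```

Tracking only the depth avoids the separate range lookup for each AND position, at the cost of not recording where each parenthesized block starts and ends.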


Below is an example of a parser class that seems to do what you want. But before you use it, try to create your own implementation.

import java.util.ArrayList;
import java.util.List;

class Parser {

    // Inclusive index range covering one top-level parenthesized block
    private static class Range {
        private int start, end;

        public Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean isInside(int i) {
            return start <= i && i <= end;
        }

        public int getStart() {
            return start;
        }

        @Override
        public String toString() {
            return "Range [start=" + start + ", end=" + end + "]";
        }
    }

    private List<Range> ranges = new ArrayList<Range>();

    private boolean checkIfOutsideRanges(int i) {
        if (ranges.size() == 0)
            return true;
        if (ranges.get(0).getStart() > i)
            return true;
        for (Range r : ranges) {
            if (r.isInside(i))
                return false;
        }
        return true;
    }

    private List<Range> setUpRanges(String data) {
        // drop ranges left over from a previous parse() call
        ranges.clear();
        int level = 0;
        int startOfRange = 0;
        int i = 0;
        for (char ch : data.toCharArray()) {
            if (ch == '(') {
                level++;
                if (level == 1)
                    startOfRange = i;
            }
            if (ch == ')') {
                level--;
                if (level == 0)
                    ranges.add(new Range(startOfRange, i));
            }
            i++;
        }
        return ranges;
    }

    public List<String> parse(String data) {
        String toFind = "AND";
        ranges = setUpRanges(data);

        //find indexes of "AND" we should split on
        List<Integer> toSplit = new ArrayList<Integer>();
        int i = -1;
        do {
            i = data.indexOf(toFind, i + 1);
            if (i != -1 && checkIfOutsideRanges(i))
                toSplit.add(i);
        } while (i != -1);

        //split on correct AND indexes
        List<String> results = new ArrayList<String>();
        int start = 0;
        for (Integer index : toSplit) {
            results.add(data.substring(start, index));
            start = index + toFind.length();
        }
        if (start < data.length())
            results.add(data.substring(start));
        return results;
    }
}

Usage example

String data = "word1 AND ((word2 AND word3) AND word4) AND word5";
Parser p = new Parser();
for (String s : p.parse(data))
    System.out.println(s);
