
Java Regex: Check in specific order

I have the following array of regular expressions:

String[] array = new String[] {
    "(car)|(truck)|(bus)|(van)", // 4) transportation
    "(w)|(x)|(y)|(z)", // 1) options
    "1|2|3|4", // 2) numbers
    "(red)|(blue)|(green)|(pink)|(yellow)" // 3) color
};

And I have the following string:

String s = "1 blue w truck";

I am trying to iterate over this string to see whether any of its words match any of the regular expressions in the array. This is what I am doing:

for (int i = 0; i < array.length; i++) {
    Pattern word = Pattern.compile(array[i]);
    Matcher match = word.matcher(s);
    while (match.find()) {
        System.out.println(String.format(" Using regex %d:  %s", i, match.group()));
    }
}

This gives the following output:

Using regex 0:  truck
Using regex 1:  w
Using regex 2:  1
Using regex 3:  blue

But I want the output to be the following:

Using regex 2:  1
Using regex 3:  blue
Using regex 1:  w
Using regex 0:  truck

I want the matches to be reported in the order in which the words appear in the string, without changing the order of the regular expressions in the array.

Here's a solution using a POJO that holds the relevant information about each match (arbitrarily called MatchInfo here), and a TreeSet that orders the matches by the required criterion (the start index of the match within the given String).

// your patterns
String[] array = new String[] { 
    "(car)|(truck)|(bus)|(van)", // 4) // transportation
    "(w)|(x)|(y)|(z)", // 1) options
    "1|2|3|4", // 2) numbers
    "(red)|(blue)|(green)|(pink)|(yellow)" // 3) color
};
// your input
String s = "1 blue w truck";

// the definition of the relevant information you want to keep on matches
class MatchInfo implements Comparable<MatchInfo>{
    int index;
    Integer start;
    String match;
    MatchInfo(int index, int start, String match) {
        this.index = index;
        this.start = start;
        this.match = match;
    }
    // comparing start index of the match within original string
    @Override
    public int compareTo(MatchInfo o) {
        return start.compareTo(o.start);
    }
}
// orders unique elements by natural ordering, as defined by Comparable 
// implementation
Set<MatchInfo> groups = new TreeSet<>();

// your original iteration
for (int i = 0; i < array.length; i++) {
    Pattern word = Pattern.compile(array[i]);
    Matcher match = word.matcher(s);
    while (match.find()) {
        // adding new "MatchInfo" to the set
        groups.add(new MatchInfo(i, match.start(), match.group()));
    }
}

// iterating and printing the info
for (MatchInfo m: groups) {
    System.out.printf("Using regex %d: %s%n", m.index, m.match);
}

Output

Using regex 2: 1
Using regex 3: blue
Using regex 1: w
Using regex 0: truck
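
One caveat with the TreeSet approach: a TreeSet treats two elements whose compareTo returns 0 as duplicates, so if two different patterns matched at the same start index, only the first one added would be kept. That cannot happen with the sample patterns, but a minimal sketch that avoids it is to collect the matches in a List and sort by start offset. The class name OrderedMatches and the method findInInputOrder are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderedMatches {
    // Hypothetical helper: collects every match of every pattern,
    // then orders the matches by where they start in the input.
    static List<String> findInInputOrder(String[] regexes, String input) {
        // each entry is {regex index, start offset, end offset}
        List<int[]> hits = new ArrayList<>();
        for (int i = 0; i < regexes.length; i++) {
            Matcher m = Pattern.compile(regexes[i]).matcher(input);
            while (m.find()) {
                hits.add(new int[] { i, m.start(), m.end() });
            }
        }
        // List.sort is stable, so matches that share a start offset
        // stay in regex-array order instead of being dropped
        hits.sort(Comparator.comparingInt((int[] h) -> h[1]));

        List<String> lines = new ArrayList<>();
        for (int[] h : hits) {
            lines.add(String.format("Using regex %d: %s", h[0], input.substring(h[1], h[2])));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] array = {
            "(car)|(truck)|(bus)|(van)",
            "(w)|(x)|(y)|(z)",
            "1|2|3|4",
            "(red)|(blue)|(green)|(pink)|(yellow)"
        };
        findInInputOrder(array, "1 blue w truck").forEach(System.out::println);
    }
}
```

For the sample input this produces the same four lines as the TreeSet version; the difference only shows up when several patterns match at the same position.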

You will need to loop over the parts of the string instead. This might be a bit less efficient, because for each part you also need to loop through the regexes until you hit a match.

Something like the following should help:

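String[] parts = s.split(" ");
for (int i = 0; i < parts.length; i++) {
    for (int r = 0; r < array.length; r++) {
        // test the current word (not the whole string) against regex r
        Pattern word = Pattern.compile(array[r]);
        Matcher match = word.matcher(parts[i]);
        // matches() requires the whole word to match, so "w" is not found inside "yellow"
        if (match.matches()) {
            // print out stuff
            break;
        }
    }
}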
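
For reference, the word-by-word idea can be sketched as a complete, runnable class. This is only an illustration under a few assumptions: words are separated by whitespace, each word should be matched in full, and the patterns are compiled once up front. TokenMatches and matchTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenMatches {
    // Hypothetical helper: walks the words of the input in order and
    // reports the first regex (by array index) that matches each whole word.
    static List<String> matchTokens(String[] regexes, String input) {
        Pattern[] patterns = new Pattern[regexes.length];
        for (int r = 0; r < regexes.length; r++) {
            patterns[r] = Pattern.compile(regexes[r]);
        }

        List<String> lines = new ArrayList<>();
        for (String part : input.trim().split("\\s+")) {
            for (int r = 0; r < patterns.length; r++) {
                // matches() succeeds only if the entire word matches
                if (patterns[r].matcher(part).matches()) {
                    lines.add(String.format("Using regex %d: %s", r, part));
                    break; // stop at the first regex that matches this word
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] array = {
            "(car)|(truck)|(bus)|(van)",
            "(w)|(x)|(y)|(z)",
            "1|2|3|4",
            "(red)|(blue)|(green)|(pink)|(yellow)"
        };
        matchTokens(array, "1 blue w truck").forEach(System.out::println);
    }
}
```

Because the outer loop follows the words of the string, the output order follows the string, while the regex array keeps its original order. Words that match no regex are skipped silently.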

Compiling the pattern on each iteration is not necessary. You can compile the patterns once, anchor them to the start of the input, and consume the string from the front:

    Pattern[] array = new Pattern[] {
        Pattern.compile("^((car)|(truck)|(bus)|(van))"), // 4) transportation
        Pattern.compile("^((w)|(x)|(y)|(z))"), // 1) options
        Pattern.compile("^(1|2|3|4)"), // 2) numbers
        Pattern.compile("^((red)|(blue)|(green)|(pink)|(yellow))") // 3) color
    };
    String s = "1 blue w truck";

    while (s.length() > 0) {
        boolean matched = false;
        for (int i = 0; i < array.length; i++) {
            Matcher match = array[i].matcher(s);
            if (match.find()) {
                String substr = match.group();
                System.out.println(String.format(" Using regex %d:  %s", i, substr));
                // drop the matched prefix and any whitespace that follows it
                s = s.substring(substr.length()).trim();
                matched = true;
            }
        }
        if (!matched) {
            break; // no pattern matches the start of s; stop instead of looping forever
        }
    }

Another possibility would be to use a single, more complex regular expression with capturing groups. I added a little extra: by using named capturing groups in the regular expression, you get a type string for each match. If you don't like that, you can iterate with groupCount() and group(i) to find the index of the group that matched.

    public static void main(String[] args) {
      Pattern pattern = Pattern.compile("(?<transportation>(?:car)|(?:truck)|(?:bus)|(?:van))|(?<options>[wxyz])|(?<numbers>[1-4])|(?<color>(?:red)|(?:blue)|(?:green)|(?:pink)|(?:yellow))");

      String s = "1 blue w truck";

      Matcher match = pattern.matcher(s);
      while(match.find()) {
        printGroupMatch(match, "transportation");
        printGroupMatch(match, "options");
        printGroupMatch(match, "numbers");
        printGroupMatch(match, "color");
      }
    }

    private static void printGroupMatch(Matcher match, String gName) {
      String groupValue = match.group(gName);
      if (groupValue != null) {
        System.out.println(String.format(" Using regex %s:  %s", gName, groupValue));
      }
    }

This will give you something like this:

 Using regex numbers:  1
 Using regex color:  blue
 Using regex options:  w
 Using regex transportation:  truck
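
The groupCount() and group(i) alternative mentioned in this answer could look roughly like the sketch below. It assumes a simplified combined pattern with exactly one numbered capturing group per category, which is not the named-group pattern shown earlier; GroupIndexMatches and findWithGroupIndex are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexMatches {
    // Hypothetical helper: for each match of the combined pattern,
    // reports which numbered group took part in it.
    static List<String> findWithGroupIndex(Pattern combined, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = combined.matcher(input);
        while (m.find()) {
            // group 0 is the whole match, so category groups start at 1
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    lines.add(String.format("Using group %d: %s", g, m.group(g)));
                    break; // only one top-level alternative takes part in a match
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: one capturing group per category, in the order
        // transportation, options, numbers, color.
        Pattern combined = Pattern.compile(
            "(car|truck|bus|van)|([wxyz])|([1-4])|(red|blue|green|pink|yellow)");
        findWithGroupIndex(combined, "1 blue w truck").forEach(System.out::println);
    }
}
```

Group numbers are 1-based, so subtracting 1 from the reported number gives the index the category would have had in the original array.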
