
Java Pattern Matcher Woes

I'm looking for a pattern to check for boolean terms, but only under certain conditions:

  • If they exist by themselves
    1. String a = "AND";
      String b = "\\tand";
      String c = "and ";
  • If they are not part of a word or phrase
    1. String d = "This or that";
  • Terms or phrases to ignore:
    1. String e = "band";
      String f = "L'or";
      String g = "can do";


The code I have so far only finds them if they have spaces before and after the delimiter, and any sort of adjustment breaks what progress I have. I used this page as reference but still no dice. I have tried using both find() and matches(), but find() seems too broad in its scope and matches() doesn't seem broad enough. Any ideas?

final static Pattern booleanTerms = Pattern.compile("(.*)(( OR )|( or )|( NOT )|( not )( AND )|( and ))(.*)");

public static void main(String[] args) {

Set<String> terms = new HashSet<String>();
terms.add(" OR"); //false
terms.add("or "); //false
terms.add("OR"); // false
terms.add(" or "); //true
for (String s : terms) {
    System.out.println(findDilims(s));
} // end for loop

} // end main method

public static boolean findDilims(String s) {
    Matcher matcher = booleanTerms.matcher(s);
    if (matcher.matches()) {
      return true;
    } else {
      return false;
    }
} // end method

The reason you are getting false for " OR", "or ", and "OR" is that your pattern explicitly looks for boolean terms with a space both before and after: e.g. ( OR ) looks for " OR ".

Instead of requiring spaces before and after to ensure that each boolean term is a word, you probably want to use word boundaries:

Pattern.compile("\\b(OR|or|NOT|not|AND|and)\\b");

You can use \\s* to add optional whitespace to the beginning and end of your regex. That way " OR \t " will also match.

Pattern.compile("\\s*\\b(OR|or|NOT|not|AND|and)\\b\\s*");

matcher.matches() should now work fine.
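
To make the difference between matches() and find() concrete, here is a minimal sketch that runs the word-boundary pattern against the strings from the question. The class name WordBoundaryDemo and the two helper methods are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Optional whitespace, then one boolean term delimited by word boundaries.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern BOOLEAN_TERM =
            Pattern.compile("\\s*\\b(OR|or|NOT|not|AND|and)\\b\\s*");

    // matches(): the whole input must be one term plus optional whitespace.
    static boolean isBooleanTerm(String s) {
        return BOOLEAN_TERM.matcher(s).matches();
    }

    // find(): the term may appear anywhere in the input as a separate word.
    static boolean containsBooleanTerm(String s) {
        return BOOLEAN_TERM.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {" OR", "or ", "OR", " or ", "\tand", "band", "This or that"};
        for (String s : samples) {
            System.out.printf("\"%s\" matches=%b find=%b%n",
                    s, isBooleanTerm(s), containsBooleanTerm(s));
        }
    }
}
```

With this pattern, matches() accepts a lone term with surrounding whitespace, while find() is the call to use when the term sits inside a longer string such as "This or that".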

You said you want to find them only if they are by themselves and not part of a phrase. In that case, you don't want to start and end your pattern with (.*).

It seems you also want to find them even if there is whitespace around them, so you need to start and end your pattern with \\s*. You also want to find them even if there is no space before or after, so you don't want spaces inside the alternatives, as in ( or ).

And it seems you want this to be case-insensitive, so you might want to set that with (?i):

final static Pattern booleanTerms = Pattern.compile("(?i)(\\s*)((or)|(not)|(and))(\\s*)");
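
As a rough illustration of how this behaves with matches(), the sketch below uses the same pattern shape but passes Pattern.CASE_INSENSITIVE as a flag instead of the inline (?i) prefix; for ASCII terms like these, the two forms should behave the same. The class name CaseInsensitiveDemo and the method isBooleanTerm are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {
    // Optional whitespace around a single term, matched without regard to case.
    static final Pattern BOOLEAN_TERM =
            Pattern.compile("\\s*(or|not|and)\\s*", Pattern.CASE_INSENSITIVE);

    // matches() anchors the pattern to the whole input, so no word
    // boundaries are needed to reject strings such as "band".
    static boolean isBooleanTerm(String s) {
        return BOOLEAN_TERM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AND", "And ", " nOt", "band", "can do", "This or that"};
        for (String s : samples) {
            System.out.printf("\"%s\" -> %b%n", s, isBooleanTerm(s));
        }
    }
}
```

Because matches() must consume the entire input, this version also rejects "This or that"; it only answers the question "is this string nothing but a boolean term?".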

You need a character class at either end of the terms, as an alternation:

(?i)(^\s*|[^a-z]\s)(or|not|and)(\s[^a-z]|\s*$)

And you only need one line:

public static boolean findDilims(String s) {
    return s.matches(".*(?i)(^\\s*|[^a-z]\\s)(or|not|and)(\\s[^a-z]|\\s*$).*");
}
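
For checking all of the sample strings from the question in one place, here is a small sketch that swaps in a different technique: negative lookarounds with find(). It assumes that an apostrophe should count as part of a word, so that "L'or" is ignored; that is a reading of the question's examples, not something stated in it. The class name StandaloneTermDemo and the method hasStandaloneTerm are made up for this example.

```java
import java.util.regex.Pattern;

public class StandaloneTermDemo {
    // A term counts as standalone when it is neither preceded nor followed
    // by a word character or an apostrophe (assumption: "L'or" is one word).
    static final Pattern STANDALONE_TERM = Pattern.compile(
            "(?<![\\w'])(and|or|not)(?![\\w'])", Pattern.CASE_INSENSITIVE);

    // find() scans the whole input, so the term may be surrounded by
    // whitespace or sit inside a longer sentence.
    static boolean hasStandaloneTerm(String s) {
        return STANDALONE_TERM.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"AND", "\tand", "and ", "This or that", "band", "L'or", "can do"};
        for (String s : samples) {
            System.out.printf("\"%s\" -> %b%n", s, hasStandaloneTerm(s));
        }
    }
}
```

A plain \b would treat the apostrophe in "L'or" as a boundary and report a hit, which is why the lookarounds list the apostrophe explicitly.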
