
Java Regex matching whole word containing special characters

I am trying to match certain keywords in a string of text. The keywords can contain any combination of special characters and must each be a whole word (containing no spaces).

public static void main(String[] args)
{
    String words[] = {"Hello", "World", "£999.00", "*&332", "$30,00", "$1230.30",
                    "Apple^*$Banana&$Pears!$", "90.09%"};

    String text = "Hello world World £99900 £999.00 Apple^*$Banana&$Pears!$"
                  + " $30,00 *&332 $1230.30 90.09%";

    StringBuilder regex = new StringBuilder();
    regex.append("(");

    for(String item : words)
        regex.append("(?:^|\\s)").append(item).append("(?:$|\\s)").append("|");

    regex.deleteCharAt(regex.length() - 1);
    regex.append(")");

    Pattern pattern = Pattern.compile(regex.toString());

    Matcher match = pattern.matcher(text);

    while (match.find())
        System.out.println(match.group());
}

The results I get are:
Hello
World
£999.00
&332
90.09%

Not all of the words match. I've tried different solutions posted here and found by searching, and none could match all the words in my example.

How can I match keywords containing any combination of special characters?

This lookaround-based regex should work:

for(String item : words)
   regex.append("(?<=^|\\s)").append(Pattern.quote(item)).append("(?=\\s|$)").append("|");

The main differences are:

  • Use of lookarounds to avoid matching spaces. When two consecutive matches are to be found, consuming the spaces creates a problem in your regex, since the space between them has already been consumed by the first match.
  • Use of Pattern.quote to take care of special characters.
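To make the first point concrete, here is a minimal sketch comparing the two boundary styles on two adjacent keywords. The class name ConsumedSpaceDemo and the helper findAll are made up for this illustration; only the two regex fragments come from the question and the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedSpaceDemo {
    // Collects every match of the regex in the text, trimming any whitespace
    // that the consuming version includes in the match.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find())
            found.add(m.group().trim());
        return found;
    }

    public static void main(String[] args) {
        String text = "Hello World";

        // Consuming boundaries: the space after "Hello" becomes part of the
        // first match, so it cannot serve as the leading boundary of "World".
        String consuming = "(?:^|\\s)Hello(?:$|\\s)|(?:^|\\s)World(?:$|\\s)";
        System.out.println(findAll(consuming, text));  // [Hello]

        // Lookaround boundaries: the space is checked but not consumed.
        String lookaround = "(?<=^|\\s)Hello(?=\\s|$)|(?<=^|\\s)World(?=\\s|$)";
        System.out.println(findAll(lookaround, text)); // [Hello, World]
    }
}
```

In the question's own text the effect is hidden for "Hello" and "World" only because the unmatched word "world" sits between them.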

This produces the output:

Hello
World
£999.00
Apple^*$Banana&$Pears!$
$30,00
*&332
$1230.30
90.09%
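For anyone who wants to run this, the loop can be assembled into a complete class roughly as follows. This is a sketch, not the only way to structure it: the class name KeywordMatcher and the methods build and findAll are hypothetical names chosen here, and the separator is added before each alternative rather than deleted afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordMatcher {
    static final String[] WORDS = {"Hello", "World", "£999.00", "*&332", "$30,00",
            "$1230.30", "Apple^*$Banana&$Pears!$", "90.09%"};

    static final String TEXT = "Hello world World £99900 £999.00 Apple^*$Banana&$Pears!$"
            + " $30,00 *&332 $1230.30 90.09%";

    // Builds one alternation; each keyword is quoted and guarded by lookarounds.
    // Assumes at least one keyword (an empty array would yield an empty pattern).
    static Pattern build(String[] words) {
        StringBuilder regex = new StringBuilder();
        for (String item : words) {
            if (regex.length() > 0)
                regex.append('|');
            regex.append("(?<=^|\\s)").append(Pattern.quote(item)).append("(?=\\s|$)");
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the matched keywords in the order they occur in the text.
    static List<String> findAll(String[] words, String text) {
        List<String> found = new ArrayList<>();
        Matcher match = build(words).matcher(text);
        while (match.find())
            found.add(match.group());
        return found;
    }

    public static void main(String[] args) {
        for (String hit : findAll(WORDS, TEXT))
            System.out.println(hit);
    }
}
```

Run against the question's data, main prints the eight keywords listed above, one per line; "world" and "£99900" are skipped because they are not in the keyword list.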

Use Pattern.quote(). What is more, you need to use a lookbehind and a lookahead:

for(String item : words)
    regex.append("(?<=^|\\s)")
        .append(Pattern.quote(item)) // HERE
        .append("(?=$|\\s)").append("|");

Basically, what this method does is prepend \Q and append \E to the string (written "\\Q" and "\\E" in a Java string literal), so that everything in between is treated as literal text. See the javadoc for Pattern.
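A small sketch of what quoting changes, using one keyword from the question. The class name QuoteDemo and the helper matchesItself are invented for this example; Pattern.quote and Pattern.matches are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Checks whether the keyword, used as a regex, matches its own text,
    // optionally passing it through Pattern.quote first.
    static boolean matchesItself(String keyword, boolean quote) {
        String regex = quote ? Pattern.quote(keyword) : keyword;
        return Pattern.matches(regex, keyword);
    }

    public static void main(String[] args) {
        String keyword = "$30,00";

        // The keyword wrapped in a literal-quoting block.
        System.out.println(Pattern.quote(keyword));        // \Q$30,00\E

        // Unquoted, "$" is an end-of-input anchor, so the regex cannot
        // match the keyword's own text.
        System.out.println(matchesItself(keyword, false)); // false

        // Quoted, every character is taken literally.
        System.out.println(matchesItself(keyword, true));  // true
    }
}
```

The same mechanism explains the "&332" line in the question's output: without quoting, the leading "*" of "*&332" acts as a quantifier on the preceding group instead of a literal character.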
