
Java Regular Expression to Match Exact Word with Special Characters

I have a list of keywords entered by the user, and they may contain special characters such as $, #, @, ^, &, etc.

As per my requirement, whenever I receive a list of text messages, I need to search for all the keywords in every message.

We need to match the exact keyword.

CASE 1: Simple Keyword - Simple Message

I used \\b to match the exact keyword and it works fine.

public static void main(String[] args) {
    String patternStr = "(?i)\\bHello\\b";

    Pattern pattern = Pattern.compile(patternStr);

    List<String> strList = new ArrayList<String>();
    strList.add("HHello Message");
    strList.add("This is Hello Message ");
    strList.add("Now Hellos again.");

    for (String str : strList) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(">> " + matcher.find());
    }
}

Output, as expected:

>> false
>> true
>> false

CASE 2: Simple Keyword - Message with Special Character

Now, if I run the same code as above for the following messages, it does not work as expected.

List<String> strList = new ArrayList<String>();
strList.add("#Hello Message");
strList.add("This is Hello Message ");
strList.add("Now Hellos again.");

Output:

>> true
>> true
>> false

Expected output:

>> false
>> true
>> false

CASE 3: Keyword and Message with Special Character

Suppose I receive the following messages and the keyword is #Hello. I wrote the following code, but it didn't work.

public static void main(String[] args) {
    String patternStr = "(?i)\\b#Hello\\b";

    Pattern pattern = Pattern.compile(patternStr);

    List<String> strList = new ArrayList<String>();
    strList.add("HHello Message");
    strList.add("This is #Hello Message ");
    strList.add("Now Hellos again.");

    for (String str : strList) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(">> " + matcher.find());
    }
}

Output:

>> false
>> false
>> false

Expected output:

>> false
>> true
>> false

How can I escape the special characters and resolve CASE 2 and CASE 3?

Please help.

Case 2 seems to be the opposite of case 3, so I don't think you can combine the Patterns.

For case 2, your Pattern could look like:

Pattern pattern = Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

In this case we require the keyword to be surrounded by whitespace or by the beginning/end of the input.

For case 3, your Pattern could look like:
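To see how the case 2 pattern behaves on the messages from the question, here is a minimal sketch. The class name WhitespaceBoundaryDemo and the helper containsHello are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // The keyword must be preceded by whitespace or the start of input,
    // and followed by whitespace or the end of input.
    static final Pattern HELLO =
            Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the message contains the keyword as a whole token.
    static boolean containsHello(String message) {
        return HELLO.matcher(message).find();
    }

    public static void main(String[] args) {
        String[] messages = {
            "#Hello Message",          // '#' precedes the keyword, so no match
            "This is Hello Message ",  // surrounded by spaces, so it matches
            "Now Hellos again."        // followed by 's', so no match
        };
        for (String message : messages) {
            System.out.println(">> " + containsHello(message));
        }
    }
}
```

This prints false, true, false, which is the expected output for case 2.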

Pattern pattern = Pattern.compile("[\\$#@\\^&]Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

In this case, we precede the keyword with any of the special characters of your choice (note the escaped reserved characters $ and ^), then we accept whitespace or the end of input as the character following the keyword.

Use (?:^|\\s) ("start of text or whitespace") instead of the first \\b, and (?:$|\\s) ("end of text or whitespace") instead of the second \\b in your regex.

The problem comes from the way that "exact word" is defined. It is not just whitespace that can surround a word to make it a word. For example, in most circumstances one would want an exact word match for 'Hello' to work with all of these:

"hello there", "That young man just said hello to that other young man" and "I wish people would still answer the telephone by saying ahoy rather than Hello."

If you want the match to be delimited only by whitespace, then I believe you will have to specify the whitespace condition. Assuming you also want it to match at the beginning and end of the input, I would propose something like this:

Pattern pattern = Pattern.compile("(^| )" + escapeSearchString(patternString) + "( |$)");

and then have a couple of methods like this:

public String escapeSearchString(String patternString) {
    StringBuilder stringBuilder = new StringBuilder(patternString.length() * 3);
    for (char c : patternString.toCharArray()) {
        if (isEscapableCharacter(c)) {
            // Prefix the special character with a single backslash.
            stringBuilder.append("\\");
        }
        stringBuilder.append(c);
    }
    return stringBuilder.toString();
}

public boolean isEscapableCharacter(char c) {
    switch (c) {
        case '#':
        case '$':
        case '@':
        case '^':
        case '&':
            return true;
        default:
            return false;
    }
}

It would probably be better to keep the escapable characters in a char[], iterate over it, and load them from a config file.
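A rough sketch of that idea, assuming the escapable characters arrive as a char[] (reading them from a config file is left out here). The class name ConfigurableEscaper and its constructor are hypothetical, not part of the answer above.

```java
import java.util.regex.Pattern;

class ConfigurableEscaper {
    // Characters to escape; assumed to be supplied by the caller,
    // e.g. after being read from a config file.
    private final char[] escapable;

    ConfigurableEscaper(char[] escapable) {
        this.escapable = escapable.clone();
    }

    // Prefixes every configured character with a backslash.
    String escapeSearchString(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.length() * 2);
        for (char c : keyword.toCharArray()) {
            if (isEscapable(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    private boolean isEscapable(char c) {
        for (char e : escapable) {
            if (e == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ConfigurableEscaper escaper = new ConfigurableEscaper("#$@^&".toCharArray());
        String regex = "(^| )" + escaper.escapeSearchString("#Hello") + "( |$)";
        System.out.println(regex); // (^| )\#Hello( |$)
        System.out.println(
                Pattern.compile(regex).matcher("This is #Hello Message ").find()); // true
    }
}
```

Only list non-alphabetic characters in the array: Java's regex engine accepts a backslash before any non-alphabetic character, but a backslash before a letter may be an error or change the meaning.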

Maybe try it this way:

String patternStr = "(?i)(?<=\\s|^)"+Pattern.quote(searchedStubstring)+"(?=\\s|$)";

(?<=...) and (?=...) are positive lookbehind and lookahead, so the pattern checks that your searchedStubstring has

  • whitespace \\s or the start of the input ^ before it, and
  • whitespace \\s or the end of the input $ after it.

Also, if you would like to search for special characters like $, + and others, you need to escape them. To do this you can use Pattern.quote(searchedStubstring).
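Put together, the lookaround pattern and Pattern.quote can be wrapped in a small helper and run against the messages from cases 2 and 3. This is only a sketch; the class name ExactKeywordMatcher and the methods exactKeyword and contains are invented for the example.

```java
import java.util.regex.Pattern;

class ExactKeywordMatcher {
    // Builds a case-insensitive pattern that matches the keyword only when it is
    // delimited by whitespace or by the start/end of the input.
    static Pattern exactKeyword(String keyword) {
        return Pattern.compile(
                "(?i)(?<=\\s|^)" + Pattern.quote(keyword) + "(?=\\s|$)");
    }

    static boolean contains(String keyword, String message) {
        return exactKeyword(keyword).matcher(message).find();
    }

    public static void main(String[] args) {
        // Case 2: plain keyword, message with a special character.
        System.out.println(contains("Hello", "#Hello Message"));           // false
        System.out.println(contains("Hello", "This is Hello Message "));   // true
        System.out.println(contains("Hello", "Now Hellos again."));        // false

        // Case 3: the keyword itself contains a special character.
        System.out.println(contains("#Hello", "HHello Message"));          // false
        System.out.println(contains("#Hello", "This is #Hello Message ")); // true

        // Pattern.quote also neutralises regex metacharacters such as '$'.
        System.out.println(contains("$5", "costs $5 today"));              // true
    }
}
```

Because the lookarounds do not consume the surrounding whitespace, matcher.group() returns just the keyword, which is convenient if you also need the match position.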

For example, if your word may have a special character (here '#') at the beginning and end, you have to write the following:

Pattern p = Pattern.compile("(\\s|^|#)"+word+"(\\s|\\#|$)", Pattern.CASE_INSENSITIVE);

If you want an exact match:

Pattern p = Pattern.compile("(\\s|^)"+word+"(\\s|$)", Pattern.CASE_INSENSITIVE);

'|' works like OR, so you can add whatever special characters you want to accept around the word, for example:

Pattern p = Pattern.compile("(\\s|^|#|:|-)"+word+"(\\s|\\#|\\,|\\.|$)", Pattern.CASE_INSENSITIVE);

'^' anchors the match at the beginning of the input and '$' at the end (or at the beginning and end of each line if Pattern.MULTILINE is set). See more here: Summary of regular-expression constructs
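As a quick illustration of the delimiter alternation, the sketch below wraps the last pattern in a helper. The class name DelimiterDemo and the method contains are made up for the example, and Pattern.quote(word) is an addition of mine so that a word containing regex metacharacters is still treated literally.

```java
import java.util.regex.Pattern;

class DelimiterDemo {
    // Accepts whitespace, start of input, '#', ':' or '-' before the word, and
    // whitespace, '#', ',', '.' or end of input after it.
    static boolean contains(String word, String message) {
        Pattern p = Pattern.compile(
                "(\\s|^|#|:|-)" + Pattern.quote(word) + "(\\s|\\#|\\,|\\.|$)",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(message).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("Hello", "Say Hello, world"));  // true
        System.out.println(contains("Hello", "key:hello."));        // true
        System.out.println(contains("Hello", "#Hello#"));           // true
        System.out.println(contains("Hello", "Now Hellos again.")); // false
        System.out.println(contains("Hello", "HHello Message"));    // false
    }
}
```

Note that listing '#' as an allowed delimiter means "#Hello Message" counts as a match for Hello, which is the opposite of what case 2 in the question expects, so choose the delimiter set to fit your definition of a word.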
