
How to modify this regular expression to be case insensitive while searching for curse words?

At the moment, this profanity filter finds darn and golly but not Darn, Golly, DARN, or GOLLY.

List<String> bannedWords = Arrays.asList("darn", "golly", "gosh");

StringBuilder re = new StringBuilder();
for (String bannedWord : bannedWords)
{
    if (re.length() > 0)
        re.append("|");
    String quotedWord = Pattern.quote(bannedWord);
    re.append(quotedWord);
}

inputString = inputString.replaceAll(re.toString(), "[No cursing please!]");

How can it be modified to be case insensitive?

Start the expression with (?i).

That is, change re.toString() to "(?i)" + re.toString().

From the documentation of Pattern:

(?idmsux-idmsux) Nothing, but turns match flags idmsux on - off

where i is the CASE_INSENSITIVE flag.
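As a minimal sketch of this change (the class name InlineFlagDemo and the helper censor are hypothetical, introduced only for illustration), the question's loop can stay as it is, with the flag prepended at the point of use:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Hypothetical helper: builds the alternation exactly as in the question,
    // then prepends the inline (?i) flag when calling replaceAll.
    static String censor(String inputString) {
        List<String> bannedWords = Arrays.asList("darn", "golly", "gosh");

        StringBuilder re = new StringBuilder();
        for (String bannedWord : bannedWords) {
            if (re.length() > 0)
                re.append("|");
            re.append(Pattern.quote(bannedWord));
        }

        // (?i) at the start applies to every alternative that follows it.
        return inputString.replaceAll("(?i)" + re.toString(), "[No cursing please!]");
    }

    public static void main(String[] args) {
        System.out.println(censor("Darn, GOLLY and gosh"));
    }
}
```

Because the flag is added outside the loop, the re.length() > 0 check does not need to change.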

You need to set the CASE_INSENSITIVE flag, or simply add (?i) to the beginning of your regex.

StringBuilder re = new StringBuilder("(?i)");

You'll also need to change your conditional, because the builder now starts out holding the four characters of (?i):

if (re.length() > 4)
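Put together, the seeded-builder variant might look like the sketch below. The class name SeededBuilderDemo, the method buildRegex, and the constant FLAG_PREFIX are assumptions made for this example; comparing against FLAG_PREFIX.length() is equivalent to the literal 4 and keeps the two in sync.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SeededBuilderDemo {
    // Hypothetical constant: the inline case-insensitive flag, 4 characters long.
    static final String FLAG_PREFIX = "(?i)";

    // Hypothetical helper: returns "(?i)" followed by the quoted words joined with "|".
    static String buildRegex(List<String> bannedWords) {
        StringBuilder re = new StringBuilder(FLAG_PREFIX);
        for (String bannedWord : bannedWords) {
            // Same as re.length() > 4: only add "|" once a word has been appended.
            if (re.length() > FLAG_PREFIX.length())
                re.append("|");
            re.append(Pattern.quote(bannedWord));
        }
        return re.toString();
    }

    public static void main(String[] args) {
        String regex = buildRegex(Arrays.asList("darn", "golly", "gosh"));
        System.out.println("Oh GOLLY".replaceAll(regex, "[No cursing please!]"));
    }
}
```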

Setting the flag via @ratchetFreak's answer is probably best, however. It allows for your condition to stay the same (which is more intuitive) and gives you a clear idea of what's going on in the code.

For more info, see this question, and in particular this answer, which gives a decent explanation of using regexes in Java.

Use a precompiled java.util.regex.Pattern:

Pattern p = Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE); // do this only once

inputString = p.matcher(inputString).replaceAll("[No cursing please!]");
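One way to arrange for "only once" is to hold the compiled pattern in a static final field. The following is a sketch under that assumption, not the only possible layout; the class name ProfanityFilter, the method censor, and the field names are hypothetical, and Collectors.joining is swapped in for the question's StringBuilder loop purely for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ProfanityFilter {
    private static final List<String> BANNED_WORDS = Arrays.asList("darn", "golly", "gosh");

    // Compiled once when the class is loaded, then reused for every call to censor.
    private static final Pattern BANNED = Pattern.compile(
            BANNED_WORDS.stream()
                    .map(Pattern::quote)               // treat each word as a literal
                    .collect(Collectors.joining("|")), // word1|word2|word3
            Pattern.CASE_INSENSITIVE);

    static String censor(String inputString) {
        return BANNED.matcher(inputString).replaceAll("[No cursing please!]");
    }

    public static void main(String[] args) {
        System.out.println(censor("Oh DARN"));
    }
}
```

Note that, per the Pattern documentation, CASE_INSENSITIVE on its own only folds case for US-ASCII characters; if the banned list ever contains non-ASCII words, combining it with Pattern.UNICODE_CASE would likely be needed.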

