Case-insensitive string matching in Java without anchors

NOTE: This is NOT a question about case-insensitive matching. It is a question about regex anchors.

I'm having a lot of trouble doing basic case-insensitive matching in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class match {
    public static void main(String[] args) {
        String prompt="das101.lo1>";
        String str="automate@DAS101.LO1>";

        Pattern ignore = Pattern.compile(prompt.toUpperCase(), Pattern.CASE_INSENSITIVE);
        Matcher mIgn  = ignore.matcher(str);
        if(mIgn.matches())
            System.out.println(str+" Matches " + prompt.toUpperCase());
        else
            System.out.println(str+" Doesn't Match " + prompt.toUpperCase());

        char[] cStr = str.toCharArray();
        char[] cPrompt = prompt.toUpperCase().toCharArray();

        /* Verify that strings match */
        for(int i=cPrompt.length-1, j=cStr.length-1; i>=0 && j>=0 ; --i,--j) {
            if (cPrompt[i]==cStr[j])
                System.out.println("Same: "+ cPrompt[i]+":" + cStr[j]);
            else
                System.out.println("Different: "+ cPrompt[i]+":" + cStr[j]);
        }
    }
}

The output:

samveen@javadev-tahr:/tmp$ javac match.java
samveen@javadev-tahr:/tmp$ java match
automate@DAS101.LO1> Doesn't Match DAS101.LO1>
Same: >:>
Same: 1:1
Same: O:O
Same: L:L
Same: .:.
Same: 1:1
Same: 0:0
Same: 1:1
Same: S:S
Same: A:A
Same: D:D

If I change if(mIgn.matches()) to if(mIgn.find()), this simple string pattern match works:

samveen@javadev-tahr:/tmp$ javac match.java
samveen@javadev-tahr:/tmp$ java match
automate@DAS101.LO1> Matches DAS101.LO1>
Same: >:>
Same: 1:1
Same: O:O
Same: L:L
Same: .:.
Same: 1:1
Same: 0:0
Same: 1:1
Same: S:S
Same: A:A
Same: D:D

Where am I going wrong?

I referred to "Case-Insensitive Matching in Java RegEx" and "Methods of the Pattern Class".

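Matcher.matches() (like String.matches()) requires the entire string to match the pattern, as if the pattern had an implied "^...$" around it.

To get a find-like result while still calling matches(), let the pattern absorb the text around the prompt:

Pattern ignore = Pattern.compile(".*" + Pattern.quote(prompt) + ".*",
    Pattern.CASE_INSENSITIVE);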

This could have been done with the original pattern as:

if (mIgn.find()) {
    System.out.println("Found at position " + mIgn.start());
}
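
To make the difference concrete, here is a minimal sketch using the prompt and input string from the question. The class name MatchVsFindDemo and its helper methods are made up for illustration, and Pattern.quote() is used on the assumption that the prompt should be treated as literal text rather than as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchVsFindDemo {
    // Compiles the prompt as a literal, case-insensitive pattern.
    static Pattern literalIgnoreCase(String prompt) {
        return Pattern.compile(Pattern.quote(prompt), Pattern.CASE_INSENSITIVE);
    }

    // True only if the whole input equals the prompt, ignoring case.
    static boolean wholeMatch(String prompt, String input) {
        return literalIgnoreCase(prompt).matcher(input).matches();
    }

    // Index of the first occurrence of the prompt in input, or -1 if absent.
    static int firstIndex(String prompt, String input) {
        Matcher m = literalIgnoreCase(prompt).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String prompt = "das101.lo1>";
        String str = "automate@DAS101.LO1>";
        System.out.println("matches(): " + wholeMatch(prompt, str)); // false
        System.out.println("find() at: " + firstIndex(prompt, str)); // 9
    }
}
```

matches() fails because "automate@" is left over, while find() locates the prompt at index 9, right after that prefix.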

matches() returns true only if the whole string matches the given pattern. It behaves as if the pattern were prefixed with '^' and suffixed with '$', so it does not look for a substring.

find() also returns true when only a substring matches.

Have a look - Difference between matches() and find() in Java Regex
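
The "implied anchors" description can be checked with a short sketch. It compares matches() against find() on a pattern wrapped in explicit anchors; \A and \z (start and end of input) are used here instead of ^ and $ so that line terminators do not affect the result. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AnchoredFindDemo {
    // Plain matches(): the pattern must cover the whole input.
    static boolean viaMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    // find() with explicit anchors: \A = start of input, \z = end of input.
    static boolean viaAnchoredFind(String regex, String input) {
        String anchored = "\\A(?:" + regex + ")\\z";
        return Pattern.compile(anchored, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        String regex = "DAS101.LO1>";
        for (String input : new String[] {"automate@DAS101.LO1>", "das101.lo1>"}) {
            System.out.println(input + " -> matches(): " + viaMatches(regex, input)
                    + ", anchored find(): " + viaAnchoredFind(regex, input));
        }
    }
}
```

Both methods agree on each input: false for the string with the "automate@" prefix, true for the bare prompt.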

matches() only returns true if the whole input matches the pattern, not if part of the input matches the pattern.

The pattern das101.lo1> matches only part of the input automate@DAS101.LO1>, not all of it, so matches() returns false.

That explains the different result you get when you use find() instead of matches().

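Use the (?i) inline flag with String.matches() for case-insensitive matching. matches() still has to cover the whole string, hence the surrounding .* (note that Matcher.matches() itself takes no arguments):

    // (?i: ... ) = case-insensitive group; Pattern.quote() makes the prompt literal
    if (str.matches("(?i:.*" + Pattern.quote(prompt) + ".*)")) {
        ......
    } else {
        ......
    }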

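Use this utility method for case-insensitive matches:

 // Utility method: true if string1 contains a match for sentence, ignoring case.
 // sentence is interpreted as a regex; wrap it in Pattern.quote() to match it literally.
 public static boolean matchesIgnoreCase(String string1, String sentence) {
     return string1.matches("(?i:.*" + sentence + ".*)");
 }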
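
One caveat when a regex is built from an arbitrary string: metacharacters such as the "." in the prompt keep their regex meaning unless they are escaped. The following sketch, with hypothetical class and helper names, compares an unquoted pattern with one wrapped in Pattern.quote().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuoteDemo {
    // Unquoted: regex metacharacters in needle stay active.
    static boolean containsUnquoted(String haystack, String needle) {
        return haystack.matches("(?i:.*" + needle + ".*)");
    }

    // Quoted: needle is matched as literal text.
    static boolean containsQuoted(String haystack, String needle) {
        return haystack.matches("(?i:.*" + Pattern.quote(needle) + ".*)");
    }

    public static void main(String[] args) {
        String prompt = "das101.lo1>";
        // 'X' sits where the prompt has a literal dot.
        String lookalike = "automate@DAS101XLO1>";
        System.out.println(containsUnquoted(lookalike, prompt));            // true: '.' matches 'X'
        System.out.println(containsQuoted(lookalike, prompt));              // false
        System.out.println(containsQuoted("automate@DAS101.LO1>", prompt)); // true
    }
}
```

Which form is appropriate depends on whether callers are expected to pass a regex or plain text.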
