Case-insensitive string matching in Java without anchors

NOTE: This is NOT a question about case-insensitive matching. It is a question about regex anchors.

I'm having a lot of trouble doing basic case-insensitive matching in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class match {
    public static void main(String[] args) {
        String prompt="das101.lo1>";
        String str="automate@DAS101.LO1>";

        Pattern ignore = Pattern.compile(prompt.toUpperCase(), Pattern.CASE_INSENSITIVE);
        Matcher mIgn  = ignore.matcher(str);
        if(mIgn.matches())
            System.out.println(str+" Matches " + prompt.toUpperCase());
        else
            System.out.println(str+" Doesn't Match " + prompt.toUpperCase());

        char[] cStr = str.toCharArray();
        char[] cPrompt = prompt.toUpperCase().toCharArray();

        /* Verify that strings match */
        for(int i=cPrompt.length-1, j=cStr.length-1; i>=0 && j>=0 ; --i,--j) {
            if (cPrompt[i]==cStr[j])
                System.out.println("Same: "+ cPrompt[i]+":" + cStr[j]);
            else
                System.out.println("Different: "+ cPrompt[i]+":" + cStr[j]);
        }
    }
}

The output:

samveen@javadev-tahr:/tmp$ javac match.java
samveen@javadev-tahr:/tmp$ java match
automate@DAS101.LO1> Doesn't Match DAS101.LO1>
Same: >:>
Same: 1:1
Same: O:O
Same: L:L
Same: .:.
Same: 1:1
Same: 0:0
Same: 1:1
Same: S:S
Same: A:A
Same: D:D

If I change if(mIgn.matches()) to if(mIgn.find()), this simple string pattern match works:

samveen@javadev-tahr:/tmp$ javac match.java
samveen@javadev-tahr:/tmp$ java match
automate@DAS101.LO1> Matches DAS101.LO1>
Same: >:>
Same: 1:1
Same: O:O
Same: L:L
Same: .:.
Same: 1:1
Same: 0:0
Same: 1:1
Same: S:S
Same: A:A
Same: D:D

Where am I going wrong?

I referred to "Case-Insensitive Matching in Java RegEx" and "Methods of the Pattern Class".

String.matches requires the entire string to match the pattern, as if the pattern had an implied "^...$". Matcher.matches behaves the same way.

Pattern ignore = Pattern.compile(".*" + Pattern.quote(prompt) + ".*",
    Pattern.CASE_INSENSITIVE);

is the pattern to use with matches() for a find-like match.

This could have been done with the original pattern as:

if (mIgn.find()) {
    System.out.println("Found at position " + mIgn.start());
}
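The find() approach above can be wrapped into a small helper. This is only a sketch: the class name FindDemo and the method indexOfIgnoreCase are hypothetical names chosen for illustration, and Pattern.quote is added on the assumption that the prompt should be treated as literal text rather than as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    // Hypothetical helper: index of the first case-insensitive occurrence of
    // the literal text `needle` inside `haystack`, or -1 if there is none.
    static int indexOfIgnoreCase(String haystack, String needle) {
        // Pattern.quote turns the needle into a literal, so '.' only matches '.'
        Pattern p = Pattern.compile(Pattern.quote(needle), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(haystack);
        // find() scans for a match anywhere in the input; no anchors are implied
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // prints 9: the prompt starts right after "automate@"
        System.out.println(indexOfIgnoreCase("automate@DAS101.LO1>", "das101.lo1>"));
    }
}
```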

matches() returns true only if the whole string matches the given pattern. It behaves as if the pattern were prefixed with '^' and suffixed with '$', so it does not look for a substring.

find() also returns true when only a substring matches.

Have a look: Difference between matches() and find() in Java Regex

matches() only returns true if the whole input matches the pattern, not if part of the input matches the pattern.

The input automate@DAS101.LO1> does not match the complete pattern das101.lo1>.

That explains the different result you get when you use find() instead of matches().
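To make the difference concrete, here is a minimal sketch that runs both methods against the same pattern. The class name MatchesVsFind and the two helper methods are hypothetical, and the pattern is quoted with Pattern.quote on the assumption that the prompt is meant as literal text.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Hypothetical helper: true only if the WHOLE input equals the literal, ignoring case
    static boolean wholeMatch(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    // Hypothetical helper: true if the literal occurs ANYWHERE in the input, ignoring case
    static boolean partialMatch(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        String prompt = "das101.lo1>";
        // The extra "automate@" prefix makes matches() fail while find() succeeds
        System.out.println(wholeMatch("automate@DAS101.LO1>", prompt));   // false
        System.out.println(partialMatch("automate@DAS101.LO1>", prompt)); // true
        // With no extra characters, both succeed
        System.out.println(wholeMatch("DAS101.LO1>", prompt));            // true
        System.out.println(partialMatch("DAS101.LO1>", prompt));          // true
    }
}
```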

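Use the (?i) inline flag with String.matches for case-insensitive matching:

    // (?i:...) = case-insensitive group. String.matches takes the regex as its
    // argument (Matcher.matches() takes none) and still requires the whole
    // string to match.
    if (str.matches("(?i:" + prompt + ")")) {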
        ......
    } else {
        ......
    }
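As far as case handling goes, the inline (?i) flag and the Pattern.CASE_INSENSITIVE compile flag should be interchangeable; neither changes the whole-string requirement of matches(). A small sketch under that assumption, with hypothetical names (InlineFlagDemo, withInlineFlag, withCompileFlag) and Pattern.quote added so the text is treated literally:

```java
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // Case-insensitivity requested inside the regex via the (?i) inline flag
    static boolean withInlineFlag(String input, String literal) {
        return input.matches("(?i)" + Pattern.quote(literal));
    }

    // Case-insensitivity requested via the compile-time flag
    static boolean withCompileFlag(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        // Both ignore case, and both still need the whole input to match
        System.out.println(withInlineFlag("DAS101.LO1>", "das101.lo1>"));          // true
        System.out.println(withCompileFlag("DAS101.LO1>", "das101.lo1>"));         // true
        System.out.println(withInlineFlag("automate@DAS101.LO1>", "das101.lo1>")); // false
        System.out.println(withCompileFlag("automate@DAS101.LO1>", "das101.lo1>"));// false
    }
}
```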

Use this utility method for case-insensitive matches:

 // Utility method: true if string1 contains a case-insensitive match for
 // sentence (note that sentence is interpreted as a regex, not literal text)
 public static boolean matchesIgnoreCase(String string1, String sentence){
     return string1.matches("(?i:.*"+sentence+".*)");
 }
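One caveat when a regex is built by concatenating arbitrary text: metacharacters in that text keep their regex meaning, so the '.' in a prompt like das101.lo1> matches any character. The sketch below contrasts an unquoted and a quoted variant; the names QuoteDemo, containsUnquoted and containsQuoted are hypothetical, and using Pattern.quote is a suggestion, not part of the original answer.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Fragment is spliced in as-is, so regex metacharacters stay active
    static boolean containsUnquoted(String text, String fragment) {
        return text.matches("(?i:.*" + fragment + ".*)");
    }

    // Fragment is wrapped by Pattern.quote, so it is matched as literal text
    static boolean containsQuoted(String text, String fragment) {
        return text.matches("(?i:.*" + Pattern.quote(fragment) + ".*)");
    }

    public static void main(String[] args) {
        String prompt = "das101.lo1>";
        // 'x' sits where the '.' should be: the unquoted '.' accepts it
        System.out.println(containsUnquoted("automate@DAS101xLO1>", prompt)); // true
        System.out.println(containsQuoted("automate@DAS101xLO1>", prompt));   // false
        // The real prompt is found either way, regardless of case
        System.out.println(containsQuoted("automate@DAS101.LO1>", prompt));   // true
    }
}
```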
