How to check if a String contains another String in a case insensitive manner in Java?

Say I have two strings:

String s1 = "AbBaCca";
String s2 = "bac";

I want to perform a check returning whether s2 is contained within s1. I can do this with:

return s1.contains(s2);

I am pretty sure that contains() is case sensitive, but I can't confirm this from reading the documentation. If it is, then I suppose my best option would be something like:

return s1.toLowerCase().contains(s2.toLowerCase());

All this aside, is there another (possibly better) way to accomplish this without caring about case sensitivity?

Yes, contains is case sensitive. You can use java.util.regex.Pattern with the CASE_INSENSITIVE flag for case insensitive matching:

Pattern.compile(Pattern.quote(wantedStr), Pattern.CASE_INSENSITIVE).matcher(source).find();

EDIT: If s2 contains regex special characters (of which there are many) it's important to quote it first. I've corrected my answer since it is the first one people will see, but vote up Matt Quail's since he pointed this out.

One problem with the answer by Dave L. arises when s2 contains regex markup such as \d, etc.

You want to call Pattern.quote() on s2:

Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
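
To illustrate why the quoting matters, here is a minimal sketch. The class and method names (QuoteDemo, containsUnquoted, containsQuoted) are made up for this example; only Pattern.compile(), Pattern.quote() and Matcher.find() come from the JDK. It compares an unquoted and a quoted search when s2 happens to contain regex markup:

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Unquoted: s2 is interpreted as a regular expression.
    public static boolean containsUnquoted(String s1, String s2) {
        return Pattern.compile(s2, Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    // Quoted: s2 is matched literally, character by character.
    public static boolean containsQuoted(String s1, String s2) {
        return Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    public static void main(String[] args) {
        String s2 = "\\d"; // the two characters: backslash, d

        // true: as a regex, \d matches the digit '1'
        System.out.println(containsUnquoted("Version 10", s2));
        // false: "Version 10" has no literal backslash-d
        System.out.println(containsQuoted("Version 10", s2));
        // true: the literal backslash-D matches when case is ignored
        System.out.println(containsQuoted("a\\Db", s2));
    }
}
```

An unquoted s2 such as "(" would not even compile as a pattern and would throw a PatternSyntaxException, which is another reason to quote user-supplied search strings.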

You can use

org.apache.commons.lang3.StringUtils.containsIgnoreCase("AbBaCca", "bac");

The Apache Commons library is very useful for this sort of thing. And this particular method may be better than regular expressions, as regular expressions tend to be expensive in terms of performance.

A Faster Implementation: Utilizing String.regionMatches()

Using a regexp can be relatively slow. That does not matter if you only want to perform a single check. But if you have an array or a collection of thousands or hundreds of thousands of strings, things can get pretty slow.

The solution presented below uses neither regular expressions nor toLowerCase() (which is also slow because it creates new strings and throws them away after the check).

The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String regions match, and, importantly, it also has an overload with a handy ignoreCase parameter.

public static boolean containsIgnoreCase(String src, String what) {
    final int length = what.length();
    if (length == 0)
        return true; // Empty string is contained

    final char firstLo = Character.toLowerCase(what.charAt(0));
    final char firstUp = Character.toUpperCase(what.charAt(0));

    for (int i = src.length() - length; i >= 0; i--) {
        // Quick check before calling the more expensive regionMatches() method:
        final char ch = src.charAt(i);
        if (ch != firstLo && ch != firstUp)
            continue;

        if (src.regionMatches(true, i, what, 0, length))
            return true;
    }

    return false;
}
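
To make the role of regionMatches() more concrete, here is a small sketch of the overload on its own, using the strings from the question. The class name RegionMatchesDemo and the helper matchesAt are invented for this illustration; the only JDK call is String.regionMatches(boolean, int, String, int, int).

```java
public class RegionMatchesDemo {

    // Returns true if 'what' occurs in 'src' starting exactly at 'offset'.
    public static boolean matchesAt(String src, int offset, String what, boolean ignoreCase) {
        return src.regionMatches(ignoreCase, offset, what, 0, what.length());
    }

    public static void main(String[] args) {
        String s1 = "AbBaCca";
        String s2 = "bac";

        // s1 at offset 2 is "BaC", which equals "bac" when case is ignored.
        System.out.println(matchesAt(s1, 2, s2, true));  // true
        System.out.println(matchesAt(s1, 2, s2, false)); // false

        // s1 at offset 1 is "bBa", which differs from "bac" at the second character.
        System.out.println(matchesAt(s1, 1, s2, true));  // false

        // A region that would run past the end of s1 yields false, not an exception.
        System.out.println(matchesAt(s1, 5, s2, true));  // false
    }
}
```

Because the comparison happens in place, no intermediate strings are created, which is the property the containsIgnoreCase() method above relies on.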

Speed Analysis

This speed analysis is not meant to be rocket science, just a rough picture of how fast the different methods are.

I compare 5 methods.

  1. Our containsIgnoreCase() method.
  2. Converting both strings to lower case and calling String.contains().
  3. Converting the source string to lower case and calling String.contains() with the pre-cached, lower-cased substring. This solution is already less flexible because it tests a predefined substring.
  4. Using a regular expression (the accepted answer, Pattern.compile().matcher().find() ...).
  5. Using a regular expression but with a pre-created and cached Pattern. This solution is already less flexible because it tests a predefined substring.

Results (by calling the method 10 million times):

  1. Our method: 670 ms
  2. 2x toLowerCase() and contains(): 2829 ms
  3. 1x toLowerCase() and contains() with cached substring: 2446 ms
  4. Regexp: 7180 ms
  5. Regexp with cached Pattern: 1845 ms

Results in a table:

                                            RELATIVE SPEED   1/RELATIVE SPEED
 METHOD                          EXEC TIME    TO SLOWEST      TO FASTEST (#1)
------------------------------------------------------------------------------
 1. Using regionMatches()          670 ms       10.7x            1.0x
 2. 2x lowercase+contains         2829 ms        2.5x            4.2x
 3. 1x lowercase+contains cache   2446 ms        2.9x            3.7x
 4. Regexp                        7180 ms        1.0x           10.7x
 5. Regexp+cached pattern         1845 ms        3.9x            2.8x

Our method is about 4x faster than lowercasing and using contains(), about 10x faster than using regular expressions, and about 3x faster even when the Pattern is pre-cached (which loses the flexibility of checking for an arbitrary substring).


Analysis Test Code

If you're interested in how the analysis was performed, here is the complete runnable application:

import java.util.regex.Pattern;

public class ContainsAnalysis {

    // Case 1 utilizing String.regionMatches()
    public static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained

        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));

        for (int i = src.length() - length; i >= 0; i--) {
            // Quick check before calling the more expensive regionMatches()
            // method:
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue;

            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }

        return false;
    }

    // Case 2 with 2x toLowerCase() and contains()
    public static boolean containsConverting(String src, String what) {
        return src.toLowerCase().contains(what.toLowerCase());
    }

    // The cached substring for case 3
    private static final String S = "i am".toLowerCase();

    // Case 3 with pre-cached substring and 1x toLowerCase() and contains()
    public static boolean containsConverting(String src) {
        return src.toLowerCase().contains(S);
    }

    // Case 4 with regexp
    public static boolean containsIgnoreCaseRegexp(String src, String what) {
        return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
                    .matcher(src).find();
    }

    // The cached pattern for case 5
    private static final Pattern P = Pattern.compile(
            Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);

    // Case 5 with pre-cached Pattern
    public static boolean containsIgnoreCaseRegexp(String src) {
        return P.matcher(src).find();
    }

    // Main method: performs speed analysis on different contains methods
    // (case ignored)
    public static void main(String[] args) throws Exception {
        final String src = "Hi, I am Adam";
        final String what = "i am";

        long start, end;
        final int N = 10_000_000;

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCase(src, what);
        end = System.nanoTime();
        System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src, what);
        end = System.nanoTime();
        System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src);
        end = System.nanoTime();
        System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src, what);
        end = System.nanoTime();
        System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src);
        end = System.nanoTime();
        System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
    }

}
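
Single timed loops like the ones above are sensitive to JIT warm-up and to the JIT discarding results that are never used. As a rough sketch only, assuming a dedicated harness such as JMH is not at hand, one could warm up first and consume the results. The names TimingSketch, run and timeMillis are hypothetical, and the numbers this prints will vary by machine; it does not reproduce the figures quoted above.

```java
import java.util.function.BooleanSupplier;

public class TimingSketch {

    // Runs the task n times and returns how many calls returned true,
    // so that the result of every call is actually used.
    public static int run(BooleanSupplier task, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (task.getAsBoolean()) {
                hits++;
            }
        }
        return hits;
    }

    // Warms up with 'warmup' calls, then times 'n' calls; returns elapsed milliseconds.
    public static long timeMillis(BooleanSupplier task, int warmup, int n) {
        run(task, warmup);
        long start = System.nanoTime();
        int hits = run(task, n);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (hits < 0) {
            // Never true; the check only keeps 'hits' observable.
            throw new AssertionError("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        final String src = "Hi, I am Adam";
        final String what = "i am";
        long ms = timeMillis(() -> src.toLowerCase().contains(what.toLowerCase()),
                100_000, 1_000_000);
        System.out.println("2x toLowerCase() + contains(): " + ms + " ms");
    }
}
```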

A simpler way of doing this (without worrying about pattern matching) would be converting both Strings to lowercase:

String foobar = "fooBar";
String bar = "FOO";
if (foobar.toLowerCase().contains(bar.toLowerCase())) {
    System.out.println("It's a match!");
}
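
One detail worth knowing about the lower-casing approach: toLowerCase() without arguments uses the JVM's default locale, and some locales have special casing rules. The sketch below (class and method names are illustrative, not from the answer) passes an explicit Locale to show how the result can change, and uses Locale.ROOT as a locale-neutral choice:

```java
import java.util.Locale;

public class LocaleLowerCaseDemo {

    // Lower-cases both sides with locale-neutral rules before comparing.
    public static boolean containsIgnoreCase(String s1, String s2) {
        return s1.toLowerCase(Locale.ROOT).contains(s2.toLowerCase(Locale.ROOT));
    }

    // Same idea with an explicit locale, to show how the result can differ.
    public static boolean containsIgnoreCase(String s1, String s2, Locale locale) {
        return s1.toLowerCase(locale).contains(s2.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println(containsIgnoreCase("TITLE", "title")); // true

        // In Turkish, 'I' lower-cases to a dotless i, so "TITLE" no longer
        // contains "title" after conversion.
        System.out.println(containsIgnoreCase("TITLE", "title", turkish)); // false
    }
}
```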

Yes, this is achievable:

String s1 = "abBaCca";
String s2 = "bac";

String s1Lower = s1;

// s1Lower starts as the same string as s1; it is converted to lowercase below.
// s1 itself is left intact in case it is needed for printing.

s1Lower = s1Lower.toLowerCase();

String trueStatement = "FALSE!";
if (s1Lower.contains(s2)) {

    //THIS statement will be TRUE
    trueStatement = "TRUE!";
}

return trueStatement;

This code will return the String "TRUE!" as it found that your characters were contained.

You can use a regular expression, and it works:

boolean found = s1.matches("(?i).*" + s2+ ".*");
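
Two caveats apply to the matches() one-liner: the dot does not match line terminators by default, and s2 is treated as a regex. The following sketch (MatchesDemo and its method names are made up for illustration) contrasts the line above with a variant that adds the (?s) flag and Pattern.quote():

```java
import java.util.regex.Pattern;

public class MatchesDemo {

    // The one-liner from the answer, unchanged.
    public static boolean viaMatches(String s1, String s2) {
        return s1.matches("(?i).*" + s2 + ".*");
    }

    // Variant: (?s) lets '.' match line terminators, and Pattern.quote()
    // makes s2 a literal.
    public static boolean viaMatchesQuoted(String s1, String s2) {
        return s1.matches("(?is).*" + Pattern.quote(s2) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(viaMatches("AbBaCca", "bac"));                   // true
        // false: matches() must cover the whole input, and '.' stops at '\n'
        System.out.println(viaMatches("first line\nAbBaCca", "bac"));
        System.out.println(viaMatchesQuoted("first line\nAbBaCca", "bac")); // true

        System.out.println(viaMatches("abc", "."));       // true: '.' matches any character
        System.out.println(viaMatchesQuoted("abc", ".")); // false: no literal dot in "abc"
    }
}
```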

Here are some Unicode-friendly ones you can make if you pull in ICU4J. I guess "ignore case" is questionable for the method names, because although primary-strength comparisons do ignore case, the specifics are described as locale-dependent. But hopefully they are locale-dependent in a way the user would expect.

import com.ibm.icu.text.Collator;
import com.ibm.icu.text.StringSearch;

public static boolean containsIgnoreCase(String haystack, String needle) {
    return indexOfIgnoreCase(haystack, needle) >= 0;
}

public static int indexOfIgnoreCase(String haystack, String needle) {
    StringSearch stringSearch = new StringSearch(needle, haystack);
    stringSearch.getCollator().setStrength(Collator.PRIMARY);
    // The ICU4J docs advise calling reset() after changing the collator.
    stringSearch.reset();
    return stringSearch.first();
}

I did a test finding a case-insensitive match of a string. I have a Vector of 150,000 objects, all with a String as one field, and wanted to find the subset which matched a string. I tried three methods:

  1. Convert all to lower case

    for (SongInformation song: songs) { if (song.artist.toLowerCase().indexOf(pattern.toLowerCase()) > -1) { ... } }
  2. Use the String matches() method

    for (SongInformation song: songs) { if (song.artist.matches("(?i).*" + pattern + ".*")) { ... } }
  3. Use regular expressions

    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE); Matcher m = p.matcher(""); for (SongInformation song: songs) { m.reset(song.artist); if (m.find()) { ... } }

Timing results are:

  • No attempted match: 20 msecs

  • To lower match: 182 msecs

  • String matches: 278 msecs

  • Regular expression: 65 msecs

The regular expression looks to be the fastest for this use case.

There is a simple, concise way, using the regex flag (?i) (case insensitive):

 String s1 = "hello abc efg";
 String s2 = "ABC";
 s1.matches(".*(?i)"+s2+".*");

/*
 * .*   matches any sequence of characters except line breaks
 * (?i) enables case-insensitive matching for everything after it (here, s2)
 */

"AbCd".toLowerCase().contains("abcD".toLowerCase())

One way to do it is to convert both strings to lower or upper case using the toLowerCase() or toUpperCase() methods and then test.

public class Sample {
   public static void main(String args[]){
      String str = "Hello Welcome to insensitive Container";
      String test = "Java Testing";
      Boolean bool = str.toLowerCase().contains(test.toLowerCase());
      System.out.println(bool);
   }
}

Here is another way to do case insensitive matching, using java.util.regex.Pattern with the CASE_INSENSITIVE flag.

Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();

I'm not sure what your main question is here, but yes, .contains is case sensitive.

String container = " Case SeNsitive ";
String sub = "sen";
if (rcontains(container, sub)) {
    System.out.println("no case");
}

public static Boolean rcontains(String container, String sub) {

    Boolean b = false;
    for (int a = 0; a < container.length() - sub.length() + 1; a++) {
        //System.out.println(sub + " to " + container.substring(a, a+sub.length()));
        if (sub.equalsIgnoreCase(container.substring(a, a + sub.length()))) {
            b = true;
        }
    }
    return b;
}

Basically, it is a method that takes two strings. It is supposed to be a case-insensitive version of contains(). When using the contains method, you want to see if one string is contained in the other.

This method takes the string "sub" and checks whether it is equal to any of the substrings of the container string that have the same length as "sub". If you look at the for loop, you will see that it iterates over the container string in substrings that are the length of "sub".

Each iteration checks whether that substring of the container string equals sub under equalsIgnoreCase.
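
The loop described above keeps scanning after a match has been found and creates a new substring at every position. As a possible variant, sketched here and not part of the original answer, the same scan can return on the first match and compare in place with regionMatches(). The names RContainsSketch and rcontainsFast are hypothetical.

```java
public class RContainsSketch {

    // Same scan as rcontains(), but it stops at the first match and
    // compares in place instead of allocating a substring per position.
    public static boolean rcontainsFast(String container, String sub) {
        final int last = container.length() - sub.length();
        for (int i = 0; i <= last; i++) {
            if (container.regionMatches(true, i, sub, 0, sub.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(rcontainsFast(" Case SeNsitive ", "sen")); // true
        System.out.println(rcontainsFast(" Case SeNsitive ", "xyz")); // false
        System.out.println(rcontainsFast("ab", "abc"));               // false: sub is longer
    }
}
```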

If you have to search for an ASCII string in another ASCII string, such as a URL, you will find my solution to be better. I've tested icza's method and mine for speed, and here are the results:

  • Case 1 took 2788 ms - regionMatches
  • Case 2 took 1520 ms - mine

The code:

public static String lowerCaseAscii(String s) {
    if (s == null)
        return null;

    int len = s.length();
    char[] buf = new char[len];
    s.getChars(0, len, buf, 0);
    for (int i=0; i<len; i++) {
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += 0x20;
    }

    return new String(buf);
}

public static boolean containsIgnoreCaseAscii(String str, String searchStr) {
    return StringUtils.contains(lowerCaseAscii(str), lowerCaseAscii(searchStr));
}

import java.text.Normalizer;

import org.apache.commons.lang3.StringUtils;

public class ContainsIgnoreCase {

    public static void main(String[] args) {

        String in = "   Annulée ";
        String key = "annulee";

        // 100% java
        if (Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", "").toLowerCase().contains(key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

        // use commons.lang lib
        if (StringUtils.containsIgnoreCase(Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", ""), key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

    }

}
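
The accent-stripping step used in both branches above can be factored into a helper and applied to both sides of the comparison. The following is a JDK-only sketch under that assumption; fold and containsFolded are names invented here, and the accented characters are written as Unicode escapes to avoid source-encoding problems.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentInsensitiveContains {

    // Decomposes accented characters (NFD), removes the combining marks,
    // then lower-cases with locale-neutral rules.
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                         .toLowerCase(Locale.ROOT);
    }

    // Applies the same folding to both strings before calling contains().
    public static boolean containsFolded(String haystack, String needle) {
        return fold(haystack).contains(fold(needle));
    }

    public static void main(String[] args) {
        System.out.println(containsFolded("   Annul\u00e9e ", "annulee")); // true
        System.out.println(containsFolded("   Annulee ", "ANNUL\u00c9E")); // true
        System.out.println(containsFolded("   Annul\u00e9e ", "annuler")); // false
    }
}
```

Folding the key as well means the search also works when the accent appears in the search term and not in the text.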

We can use a Java 8 stream with anyMatch and contains:

import java.util.Arrays;

public class Test2 {
    public static void main(String[] args) {

        String a = "Gina Gini Protijayi Soudipta";
        String b = "Gini";

        System.out.println(WordPresentOrNot(a, b));
    }// main

    private static boolean WordPresentOrNot(String a, String b) {
        // contains() is case sensitive, so both strings are lower-cased first.
        // The stream splits a into words; anyMatch tests whether b contains any of those words.
        boolean match = Arrays.stream(a.toLowerCase().split(" ")).anyMatch(b.toLowerCase()::contains);
        return match;
    }

}
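
Note that the method reference b.toLowerCase()::contains makes b the receiver, so the snippet above asks whether b contains a word of a. If the intent is the other direction, that is, whether some word of a contains b, a lambda with the word as receiver expresses it. This is a sketch with invented names (WordContainsSketch, anyWordContains), not the answer's own code:

```java
import java.util.Arrays;
import java.util.Locale;

public class WordContainsSketch {

    // True if any whitespace-separated word of text contains needle, ignoring case.
    public static boolean anyWordContains(String text, String needle) {
        final String n = needle.toLowerCase(Locale.ROOT);
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                     .anyMatch(word -> word.contains(n));
    }

    public static void main(String[] args) {
        String a = "Gina Gini Protijayi Soudipta";
        System.out.println(anyWordContains(a, "Gini")); // true
        System.out.println(anyWordContains(a, "JAYI")); // true: part of "protijayi"
        System.out.println(anyWordContains(a, "a G"));  // false: spans two words
    }
}
```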

Or you can use a simple approach: just convert the case of the string to match the case of the substring, then use the contains method.

String x="abCd";
System.out.println(Pattern.compile("c",Pattern.CASE_INSENSITIVE).matcher(x).find());

You could simply do something like this:

String s1 = "AbBaCca";
String s2 = "bac";
String toLower = s1.toLowerCase();
return toLower.contains(s2);
