How to check if a String contains another String in a case insensitive manner in Java?
Say I have two strings,
String s1 = "AbBaCca";
String s2 = "bac";
I want to perform a check returning that s2 is contained within s1. I can do this with:
return s1.contains(s2);
I am pretty sure that contains() is case sensitive, however I can't determine this for sure from reading the documentation. If it is then I suppose my best method would be something like:
return s1.toLowerCase().contains(s2.toLowerCase());
All this aside, is there another (possibly better) way to accomplish this without caring about case-sensitivity?
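As a quick check of the behaviour discussed in the question, the sketch below shows that contains() compares characters exactly, and that lowercasing both sides makes the test case-insensitive. The class and method names are made up for illustration, and passing Locale.ROOT is an assumption rather than something the question uses: without an explicit locale, toLowerCase() follows the JVM's default locale, which can give surprising results (for example under Turkish rules, where "I" lowercases to a dotless "ı").

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class LowerCaseContainsDemo {

    // Lowercases both strings with a fixed locale, then delegates to contains().
    static boolean containsIgnoreCase(String s1, String s2) {
        return s1.toLowerCase(Locale.ROOT).contains(s2.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String s1 = "AbBaCca";
        String s2 = "bac";

        System.out.println(s1.contains(s2));            // false: contains() is case sensitive
        System.out.println(containsIgnoreCase(s1, s2)); // true

        // Under Turkish rules "TITLE" lowercases to "tıtle" (dotless i),
        // so a locale-sensitive lowercase no longer contains "title".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("TITLE".toLowerCase(turkish).contains("title")); // false
    }
}
```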
Yes, contains is case sensitive. You can use java.util.regex.Pattern with the CASE_INSENSITIVE flag for case insensitive matching:
Pattern.compile(Pattern.quote(wantedStr), Pattern.CASE_INSENSITIVE).matcher(source).find();
EDIT: If s2 contains regex special characters (of which there are many) it's important to quote it first. I've corrected my answer since it is the first one people will see, but vote up Matt Quail's since he pointed this out.
One problem with the answer by Dave L. is when s2 contains regex markup such as \d, etc.
You want to call Pattern.quote() on s2:
Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
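To make the effect of quoting concrete, here is a small sketch comparing the quoted and unquoted forms. The class and method names are invented for this example. Without Pattern.quote(), a search string such as "a.c" is treated as a regular expression, so the dot matches any character; a string such as "(" would not even compile and would throw a PatternSyntaxException.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class QuoteDemo {

    // Treats s2 as a regular expression (metacharacters are interpreted).
    static boolean findUnquoted(String s1, String s2) {
        return Pattern.compile(s2, Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    // Treats s2 as literal text by quoting it first.
    static boolean findQuoted(String s1, String s2) {
        return Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    public static void main(String[] args) {
        System.out.println(findUnquoted("ABC", "a.c")); // true: '.' matches 'B'
        System.out.println(findQuoted("ABC", "a.c"));   // false: no literal "a.c" in "ABC"
        System.out.println(findQuoted("xA.Cx", "a.c")); // true: literal match, case ignored
    }
}
```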
You can use
org.apache.commons.lang3.StringUtils.containsIgnoreCase("AbBaCca", "bac");
The Apache Commons library is very useful for this sort of thing. And this particular one may be better than regular expressions, as regex tends to be expensive in terms of performance.
String.regionMatches()
Using regexp can be relatively slow. That doesn't matter if you only need to check a single case. But if you have an array or a collection of thousands or hundreds of thousands of strings, things can get pretty slow.
The solution presented below uses neither regular expressions nor toLowerCase() (which is also slow because it creates new strings and just throws them away after the check).
The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String regions match, but what's important is that it also has an overload with a handy ignoreCase parameter.
public static boolean containsIgnoreCase(String src, String what) {
    final int length = what.length();
    if (length == 0)
        return true; // Empty string is contained
    final char firstLo = Character.toLowerCase(what.charAt(0));
    final char firstUp = Character.toUpperCase(what.charAt(0));
    for (int i = src.length() - length; i >= 0; i--) {
        // Quick check before calling the more expensive regionMatches() method:
        final char ch = src.charAt(i);
        if (ch != firstLo && ch != firstUp)
            continue;
        if (src.regionMatches(true, i, what, 0, length))
            return true;
    }
    return false;
}
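To see the regionMatches() overload in isolation, the following sketch calls it directly on the strings used in the analysis further down. The class and the matchesAt helper are hypothetical names introduced only for this example. Note that an offset that would run past the end of the string simply yields false rather than throwing.

```java
// Hypothetical demo class, for illustration only.
class RegionMatchesDemo {

    // True if 'what' occurs in 'src' at exactly 'offset', ignoring case.
    static boolean matchesAt(String src, int offset, String what) {
        return src.regionMatches(true, offset, what, 0, what.length());
    }

    public static void main(String[] args) {
        String src = "Hi, I am Adam";
        String what = "i am";

        System.out.println(matchesAt(src, 4, what));   // true: region is "I am"
        System.out.println(matchesAt(src, 3, what));   // false: region is " I a"
        System.out.println(matchesAt(src, 11, what));  // false: region would exceed src

        // With ignoreCase set to false the comparison is exact.
        System.out.println(src.regionMatches(false, 4, what, 0, what.length())); // false
    }
}
```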
This speed analysis is not meant to be rocket science, just a rough picture of how fast the different methods are.
I compare 5 methods.
1. Our containsIgnoreCase() method, built on String.regionMatches().
2. Converting both strings to lower case and calling String.contains().
3. Converting the source string to lower case and calling String.contains() with the pre-cached, lower-cased substring. This solution is already not as flexible because it tests a predefined substring.
4. Using a regular expression (Pattern.compile().matcher().find()...).
5. Using a regular expression, but with a pre-created and cached Pattern. This solution is already not as flexible because it tests a predefined substring.
Results (by calling the method 10 million times):
1. Our method: 670 ms
2. 2x toLowerCase() and contains(): 2829 ms
3. 1x toLowerCase() and contains() with cached substring: 2446 ms
4. Regexp: 7180 ms
5. Regexp with cached Pattern: 1845 ms
Results in a table:
                                              RELATIVE SPEED   1/RELATIVE SPEED
METHOD                            EXEC TIME   TO SLOWEST       TO FASTEST (#1)
------------------------------------------------------------------------------
1. Using regionMatches()            670 ms    10.7x             1.0x
2. 2x lowercase+contains           2829 ms     2.5x             4.2x
3. 1x lowercase+contains cache     2446 ms     2.9x             3.7x
4. Regexp                          7180 ms     1.0x            10.7x
5. Regexp+cached pattern           1845 ms     3.9x             2.8x
Our method is 4x faster compared to lowercasing and using contains(), 10x faster compared to using regular expressions and also 3x faster even if the Pattern is pre-cached (and losing flexibility of checking for an arbitrary substring).
If you're interested in how the analysis was performed, here is the complete runnable application:
import java.util.regex.Pattern;

public class ContainsAnalysis {

    // Case 1 utilizing String.regionMatches()
    public static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained
        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));
        for (int i = src.length() - length; i >= 0; i--) {
            // Quick check before calling the more expensive regionMatches()
            // method:
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue;
            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }
        return false;
    }

    // Case 2 with 2x toLowerCase() and contains()
    public static boolean containsConverting(String src, String what) {
        return src.toLowerCase().contains(what.toLowerCase());
    }

    // The cached substring for case 3
    private static final String S = "i am".toLowerCase();

    // Case 3 with pre-cached substring and 1x toLowerCase() and contains()
    public static boolean containsConverting(String src) {
        return src.toLowerCase().contains(S);
    }

    // Case 4 with regexp
    public static boolean containsIgnoreCaseRegexp(String src, String what) {
        return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
                .matcher(src).find();
    }

    // The cached pattern for case 5
    private static final Pattern P = Pattern.compile(
            Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);

    // Case 5 with pre-cached Pattern
    public static boolean containsIgnoreCaseRegexp(String src) {
        return P.matcher(src).find();
    }

    // Main method: performs speed analysis on different contains methods
    // (case ignored)
    public static void main(String[] args) throws Exception {
        final String src = "Hi, I am Adam";
        final String what = "i am";
        long start, end;
        final int N = 10_000_000;

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCase(src, what);
        end = System.nanoTime();
        System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src, what);
        end = System.nanoTime();
        System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src);
        end = System.nanoTime();
        System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src, what);
        end = System.nanoTime();
        System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src);
        end = System.nanoTime();
        System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
    }
}
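Timing loops like the one above are sensitive to JIT warm-up and to the JVM discarding work whose result is never used. The following is only a rough sketch of one way to reduce those effects in plain Java. The class, the timeMillis helper and the warm-up count are assumptions for illustration and are not part of the original analysis; a dedicated harness such as JMH would likely be a more reliable choice for serious measurements.

```java
import java.util.function.BooleanSupplier;

// Hypothetical timing helper, for illustration only.
class TimingSketch {

    // Results are accumulated here so the measured work is actually used.
    static int hits;

    // Runs the task 'warmup' times untimed, then times 'iterations' runs.
    static long timeMillis(BooleanSupplier task, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++)
            if (task.getAsBoolean())
                hits++;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            if (task.getAsBoolean())
                hits++;
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String src = "Hi, I am Adam";
        String what = "i am";
        long ms = timeMillis(
                () -> src.toLowerCase().contains(what.toLowerCase()),
                100_000, 1_000_000);
        System.out.println("2x toLowerCase() + contains() took " + ms + " ms (hits: " + hits + ")");
    }
}
```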
A simpler way of doing this (without worrying about pattern matching) would be converting both Strings to lowercase:
String foobar = "fooBar";
String bar = "FOO";
if (foobar.toLowerCase().contains(bar.toLowerCase())) {
    System.out.println("It's a match!");
}
Yes, this is achievable:
String s1 = "abBaCca";
String s2 = "bac";
// Work on a lower-cased copy; s1 is left intact for printing if needed
String s1Lower = s1.toLowerCase();
String trueStatement = "FALSE!";
// Lower-case s2 as well so the check does not depend on how s2 is written
if (s1Lower.contains(s2.toLowerCase())) {
    // This condition is true for the strings above
    trueStatement = "TRUE!";
}
return trueStatement;
This code will return the String "TRUE!" as it found that your characters were contained.
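You can use a regular expression, and it works:
// Pattern.quote() keeps regex metacharacters in s2 from being interpreted
boolean found = s1.matches("(?i).*" + Pattern.quote(s2) + ".*");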
Here are some Unicode-friendly ones you can make if you pull in ICU4j. I guess "ignore case" is questionable for the method names because, although primary strength comparisons do ignore case, the specifics are described as being locale-dependent. But it's hopefully locale-dependent in a way the user would expect.
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.StringSearch;

public static boolean containsIgnoreCase(String haystack, String needle) {
    return indexOfIgnoreCase(haystack, needle) >= 0;
}

public static int indexOfIgnoreCase(String haystack, String needle) {
    StringSearch stringSearch = new StringSearch(needle, haystack);
    stringSearch.getCollator().setStrength(Collator.PRIMARY);
    // ICU's documentation asks for reset() after changing the collator's attributes
    stringSearch.reset();
    return stringSearch.first();
}
I did a test finding a case-insensitive match of a string. I have a Vector of 150,000 objects, all with a String as one field, and wanted to find the subset which matched a string. I tried three methods:
Convert all to lower case
for (SongInformation song : songs) {
    if (song.artist.toLowerCase().indexOf(pattern.toLowerCase()) > -1) {
        ...
    }
}
Use the String matches() method
for (SongInformation song : songs) {
    if (song.artist.matches("(?i).*" + pattern + ".*")) {
        ...
    }
}
Use regular expressions
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("");
for (SongInformation song : songs) {
    m.reset(song.artist);
    if (m.find()) {
        ...
    }
}
Timing results are:
No attempted match: 20 msecs
To lower match: 182 msecs
String matches: 278 msecs
Regular expression: 65 msecs
The regular expression looks to be the fastest for this use case.
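The third approach compiles the pattern once and reuses a single Matcher for every element. A minimal sketch of that idea follows. The class name and the use of a plain List<String> in place of the SongInformation objects are assumptions for illustration. It also wraps the search text in Pattern.quote(), which the timed code above does not do, so that regex metacharacters in the text are matched literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter class, for illustration only.
class ReusedMatcherFilter {

    // Returns the values that contain 'text', ignoring case.
    static List<String> filter(List<String> values, String text) {
        // Compile once, outside the loop.
        Pattern p = Pattern.compile(Pattern.quote(text), Pattern.CASE_INSENSITIVE);
        // One Matcher instance, re-pointed at each value with reset().
        Matcher m = p.matcher("");
        List<String> result = new ArrayList<>();
        for (String value : values) {
            m.reset(value);
            if (m.find())
                result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> artists = Arrays.asList("The Beatles", "Beastie Boys", "ABBA");
        System.out.println(filter(artists, "bea")); // [The Beatles, Beastie Boys]
    }
}
```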
There is a simple, concise way, using the regex flag (?i) (case insensitive):
String s1 = "hello abc efg";
String s2 = "ABC";
s1.matches(".*(?i)" + s2 + ".*");
/*
 * .*   matches any sequence of characters except line breaks
 * (?i) enables case-insensitive matching for what follows it (here, s2)
 */
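Two caveats of this approach can be seen in a small experiment. The sketch below is illustrative only (the class and method names are invented). Because . does not match line terminators by default, the pattern fails on multi-line input unless the (?s) (DOTALL) flag is added, and because s2 is spliced into the regex unescaped, metacharacters in it are interpreted rather than matched literally unless it is wrapped in Pattern.quote().

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class MatchesCaveats {

    // The form shown above: s2 is inserted into the regex as-is.
    static boolean plain(String s1, String s2) {
        return s1.matches(".*(?i)" + s2 + ".*");
    }

    // Variant with DOTALL so '.' also matches line breaks, and s2 quoted.
    static boolean dotAllQuoted(String s1, String s2) {
        return s1.matches("(?is).*" + Pattern.quote(s2) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(plain("hello abc efg", "ABC"));                    // true
        System.out.println(plain("first line\nhello abc efg", "ABC"));        // false: '.' stops at '\n'
        System.out.println(dotAllQuoted("first line\nhello abc efg", "ABC")); // true
        System.out.println(plain("hello abc efg", "a.c"));                    // true: '.' matches 'b'
        System.out.println(dotAllQuoted("hello abc efg", "a.c"));             // false: no literal "a.c"
    }
}
```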
"AbCd".toLowerCase().contains("abcD".toLowerCase())
One way to do it is to convert both strings to lower or upper case using the toLowerCase() or toUpperCase() methods and then test.
public class Sample {
    public static void main(String args[]){
        String str = "Hello Welcome to insensitive Container";
        String test = "Java Testing";
        Boolean bool = str.toLowerCase().contains(test.toLowerCase());
        System.out.println(bool);
    }
}
Here is another way for case insensitive matching using java.util.regex.Pattern with the CASE_INSENSITIVE flag.
Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
I'm not sure what your main question is here, but yes, .contains is case sensitive.
String container = " Case SeNsitive ";
String sub = "sen";
if (rcontains(container, sub)) {
    System.out.println("no case");
}

public static Boolean rcontains(String container, String sub) {
    for (int a = 0; a < container.length() - sub.length() + 1; a++) {
        //System.out.println(sub + " to " + container.substring(a, a+sub.length()));
        if (sub.equalsIgnoreCase(container.substring(a, a + sub.length()))) {
            return true; // Found a match, no need to keep scanning
        }
    }
    return false;
}
Basically, it is a method that takes two strings. It is supposed to be a case-insensitive version of contains(). When using the contains method, you want to see if one string is contained in the other.
This method takes the string that is "sub" and checks if it is equal to the substrings of the container string that are equal in length to the "sub". If you look at the for loop, you will see that it iterates in substrings (that are the length of the "sub") over the container string.
Each iteration checks to see if the substring of the container string is equalsIgnoreCase to the sub.
If you have to search for an ASCII string in another ASCII string, such as a URL, you will find my solution to be better. I've tested icza's method and mine for speed and here are the results:
The code:
import org.apache.commons.lang3.StringUtils;

public static String lowerCaseAscii(String s) {
    if (s == null)
        return null;
    int len = s.length();
    char[] buf = new char[len];
    s.getChars(0, len, buf, 0);
    for (int i = 0; i < len; i++) {
        // In ASCII, 'a'..'z' are exactly 0x20 above 'A'..'Z'
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += 0x20;
    }
    return new String(buf);
}

public static boolean containsIgnoreCaseAscii(String str, String searchStr) {
    return StringUtils.contains(lowerCaseAscii(str), lowerCaseAscii(searchStr));
}
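The lowerCaseAscii() helper relies on the fact that in ASCII each upper-case letter sits exactly 0x20 below its lower-case counterpart. As an illustrative sketch only (the class and method names here are invented, and this variant is not the code whose speed was measured), the same trick can be combined with a direct scan so that neither lower-cased copies nor a third-party library are needed:

```java
// Hypothetical ASCII-only variant, for illustration only.
class AsciiContains {

    // Maps 'A'..'Z' to 'a'..'z'; every other char is returned unchanged.
    static char lowerAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + 0x20) : c;
    }

    // Naive scan comparing chars after ASCII lower-casing; null never matches.
    static boolean containsIgnoreCaseAscii(String str, String searchStr) {
        if (str == null || searchStr == null)
            return false;
        int n = searchStr.length();
        for (int i = 0; i <= str.length() - n; i++) {
            int j = 0;
            while (j < n && lowerAscii(str.charAt(i + j)) == lowerAscii(searchStr.charAt(j)))
                j++;
            if (j == n)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "HTTP://Example.COM/Index.html";
        System.out.println(containsIgnoreCaseAscii(url, "example.com")); // true
        System.out.println(containsIgnoreCaseAscii(url, "example.org")); // false
        System.out.println(lowerAscii('Q'));                             // q
    }
}
```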
import java.text.Normalizer;
import org.apache.commons.lang3.StringUtils;

public class ContainsIgnoreCase {
    public static void main(String[] args) {
        String in = " Annulée ";
        String key = "annulee";

        // 100% java
        if (Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", "").toLowerCase().contains(key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

        // use commons.lang lib
        if (StringUtils.containsIgnoreCase(Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", ""), key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }
    }
}
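The example above first decomposes accented characters and then removes the combining marks, which is why " Annulée " ends up matching "annulee". The sketch below isolates that step using only the JDK. The class and method names are hypothetical, and Unicode escapes are used for "é" so the source does not depend on file encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class StripAccentsDemo {

    // Decomposes to NFD (base letter + combining mark), then drops the marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // A precomposed e-acute becomes two chars in NFD: 'e' followed by U+0301.
        System.out.println(Normalizer.normalize("\u00e9", Normalizer.Form.NFD).length()); // 2

        System.out.println(stripAccents("Annul\u00e9e")); // Annulee

        boolean found = stripAccents(" Annul\u00e9e ")
                .toLowerCase(Locale.ROOT)
                .contains("annulee");
        System.out.println(found); // true
    }
}
```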
We can use a stream with anyMatch and contains in Java 8:
import java.util.Arrays;

public class Test2 {
    public static void main(String[] args) {
        String a = "Gina Gini Protijayi Soudipta";
        String b = "Gini";
        System.out.println(WordPresentOrNot(a, b));
    }// main

    private static boolean WordPresentOrNot(String a, String b) {
        // contains is case sensitive, so convert both strings to lower case first.
        // Then use a stream with anyMatch to test whether any word of a contains b.
        String needle = b.toLowerCase();
        return Arrays.stream(a.toLowerCase().split(" ")).anyMatch(word -> word.contains(needle));
    }
}
Or you can use a simple approach: convert the string to the same case as the substring and then use the contains method.
String x="abCd";
System.out.println(Pattern.compile("c",Pattern.CASE_INSENSITIVE).matcher(x).find());
You could simply do something like this:
String s1 = "AbBaCca";
String s2 = "bac";
String toLower = s1.toLowerCase();
// Lower-case s2 as well so the check also works when s2 has upper-case letters
return toLower.contains(s2.toLowerCase());