
Java regex: case-insensitive matching for non-English characters

I am trying to perform case-insensitive matching on Russian text with the Pattern and Matcher classes in Java. Here is the text:

"some text газированных напитков some other text"

Below is the Pattern I am using to match the text:

Pattern pattern = Pattern.compile("(?iu)\\b(" + Pattern.quote("напитки") + ")\\b", Pattern.UNICODE_CHARACTER_CLASS);
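For reference, here is a minimal sketch of what the pieces of this pattern do. It assumes Java 7 or later, which is when UNICODE_CHARACTER_CLASS was added. The inline `(?iu)` turns on CASE_INSENSITIVE and UNICODE_CASE. UNICODE_CHARACTER_CLASS switches predefined classes such as `\w` to Unicode rules, and with that flag the `\b` word boundary uses the same Unicode definition of a word character. The class and method names (`FlagsDemo`, `isWordOnly`, and so on) are illustrative only.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class FlagsDemo {

    // True if the whole input consists of \w characters under the given flags.
    static boolean isWordOnly(String input, int flags) {
        return Pattern.compile("\\w+", flags).matcher(input).matches();
    }

    // Case-insensitive literal match written with the inline (?iu) flags.
    static boolean inlineFlagsMatch(String literal, String input) {
        return Pattern.compile("(?iu)" + Pattern.quote(literal)).matcher(input).matches();
    }

    // The same match written with the flag constants instead.
    static boolean constantFlagsMatch(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "напитков";
        String upper = word.toUpperCase(Locale.ROOT); // Cyrillic uppercase form

        // Default \w is ASCII-only ([a-zA-Z_0-9]), so Cyrillic letters are not word characters.
        System.out.println(isWordOnly(word, 0));                               // false
        // UNICODE_CHARACTER_CLASS makes \w follow Unicode rules.
        System.out.println(isWordOnly(word, Pattern.UNICODE_CHARACTER_CLASS)); // true
        // (?iu) and CASE_INSENSITIVE | UNICODE_CASE behave the same here.
        System.out.println(inlineFlagsMatch(word, upper));                     // true
        System.out.println(constantFlagsMatch(word, upper));                   // true
    }
}
```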

I expect the following to return true, since it is a case-insensitive comparison (напитки vs напитков):

System.out.println(pattern.matcher("some text газированных напитков some other text").find());

But it always returns false. I have tried other Pattern constants (such as CASE_INSENSITIVE, UNICODE_CASE and CANON_EQ), but it still returns false.
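A minimal reproduction sketch helps locate where the false comes from. The question's pattern does find напитков when only the letter case differs. However, напитки and напитков are different grammatical forms: they share the prefix напитк and then end in -и and -ов respectively. No case-folding flag treats them as equal. The names `Reproduction` and `wholeWord` are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class Reproduction {

    // Builds the question's pattern around an arbitrary literal word.
    static Pattern wholeWord(String word) {
        return Pattern.compile("(?iu)\\b(" + Pattern.quote(word) + ")\\b",
                Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        String text = "some text газированных напитков some other text";

        // Original search: "напитки" is not a case variant of "напитков".
        System.out.println(wholeWord("напитки").matcher(text).find());     // false

        // Same word in a different case: case-insensitive matching works.
        String shouted = text.toUpperCase(Locale.ROOT);
        System.out.println(wholeWord("напитков").matcher(shouted).find()); // true

        // Even ignoring case, the two strings differ after "напитк".
        System.out.println("напитки".equalsIgnoreCase("напитков"));        // false
    }
}
```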

Is there any way in Java to perform such comparison? Is it even possible at all?

Just add these flags when you compile your Pattern:

Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE

This has worked in all my cases for Cyrillic, and I use it extensively.
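Here is a minimal sketch of what these flags do for a plain literal search. According to the Pattern documentation, CASE_INSENSITIVE on its own folds only US-ASCII letters, so Cyrillic text also needs UNICODE_CASE. Case folding still does not bridge different word endings. For that reason, the last checks swap in a crude stem-prefix pattern: the stem напитк followed by any word characters. This is not real morphological analysis, and it will also match unrelated words that happen to share the prefix. `CaseFlags`, `find` and `stemPattern` are illustrative names, not library API.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class CaseFlags {

    static boolean find(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    // Hypothetical helper: matches any whole word that starts with the given stem.
    // A crude stand-in for stemming, not a morphological analyzer.
    static Pattern stemPattern(String stem) {
        return Pattern.compile("\\b" + Pattern.quote(stem) + "\\w*\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                        | Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        String text = "газированных напитков".toUpperCase(Locale.ROOT);
        String literal = Pattern.quote("напитков");
        int unicodeCi = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

        // CASE_INSENSITIVE alone folds only US-ASCII letters.
        System.out.println(find(literal, Pattern.CASE_INSENSITIVE, text));       // false
        // Adding UNICODE_CASE enables Unicode-aware case folding.
        System.out.println(find(literal, unicodeCi, text));                      // true
        // A different word form still does not match, whatever the case flags.
        System.out.println(find(Pattern.quote("напитки"), unicodeCi, text));     // false
        // The stem-prefix pattern accepts both forms.
        System.out.println(stemPattern("напитк").matcher(text).find());          // true
        System.out.println(stemPattern("напитк").matcher("напитки").find());     // true
    }
}
```

The prefix approach trades precision for recall. For proper handling of Russian inflection, a stemmer or morphological analyzer is the usual tool.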

