Regular expressions: how to make Java treat Polish letters as normal \w?
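Java treats Polish letters such as ó as non-word characters rather than as \w letters. As a result, I cannot work out how to write a regular expression that passes all of the unit tests below.
How should BEFORE_LANGUAGE and AFTER_LANGUAGE be changed so that the tests pass?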
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;
import junit.framework.TestCase;

public class tmpTest extends TestCase {
    final String BEFORE_LANGUAGE = "(?<![\\w\\p{S}])";
    final String AFTER_LANGUAGE = "\\d*((?![\\w\\p{S}])|(<))";

    @Test
    public void test1() {
        // Given:
        String language = ".net";
        String text = "xxxxxxx xxx .net";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertTrue(m.find());
    }

    @Test
    public void test2() {
        // Given:
        String language = ".net";
        String text = "xxxxxxx xxx .net<br>";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertTrue(m.find());
    }

    @Test
    public void test3() {
        // Given:
        String language = "c++";
        String text = "xxxxxxx xxx c++";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertTrue(m.find());
    }

    @Test
    public void test4() {
        // Given:
        String language = "c";
        String text = "xxxxxxx xxx c++";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertFalse(m.find());
    }

    @Test
    public void test5() {
        // Given:
        String language = "r";
        String text = "xxxxxxx xxx różne";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertFalse(m.find());
    }

    @Test
    public void test6() {
        // Given:
        String language = "r";
        String text = "xxxxxxx xxx r";
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        // When:
        Matcher m = Pattern.compile(regex).matcher(text);
        // Then:
        assertTrue(m.find());
    }
}
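By default, \w in Java matches only the ASCII characters [a-zA-Z_0-9], so ó counts as a non-word character. To make \w and the other shorthand character classes Unicode-aware, pass the Pattern.UNICODE_CHARACTER_CLASS flag when compiling the pattern:
Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(text);
There is no need to rewrite the current pattern.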
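As a rough illustration of the effect, the sketch below runs the question's lookarounds against the test5 input with and without the flag. The class name UnicodeWordDemo and the helper mentions are hypothetical names introduced only for this example; the Polish letters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Lookarounds copied unchanged from the question.
    static final String BEFORE_LANGUAGE = "(?<![\\w\\p{S}])";
    static final String AFTER_LANGUAGE = "\\d*((?![\\w\\p{S}])|(<))";

    // Hypothetical helper: true if `language` occurs in `text` as a standalone token,
    // using the given Pattern flags (0 means default, ASCII-only \w).
    static boolean mentions(String language, String text, int flags) {
        String regex = BEFORE_LANGUAGE + Pattern.quote(language) + AFTER_LANGUAGE;
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // "xxxxxxx xxx różne" with ó and ż written as escapes.
        String text = "xxxxxxx xxx r\u00F3\u017Cne";

        // Default flags: ó is not \w, so the lookahead after "r" succeeds
        // and "r" is wrongly reported as a match.
        System.out.println(mentions("r", text, 0));                               // true

        // Unicode-aware \w: ó is a word character, so "r" is not matched.
        System.out.println(mentions("r", text, Pattern.UNICODE_CHARACTER_CLASS)); // false

        // Symbols are still handled by \p{S}, so "c" is not found inside "c++".
        System.out.println(mentions("c", "xxxxxxx xxx c++", Pattern.UNICODE_CHARACTER_CLASS)); // false
    }
}
```

If changing the Pattern.compile call is inconvenient, the same behavior can be switched on inside the expression itself by prefixing it with the embedded flag (?U). Both forms require Java 7 or later.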