Check that a string contains non-Latin letters

I have the following method to check that a string contains only Latin symbols.

private boolean containsNonLatin(String val) {
        return val.matches("\\w+");
}

But it returns false if I pass the string "my string", because it contains a space. I need a method that returns false if the string contains letters that are not in the Latin alphabet, and true in all other cases.

Please help me improve my method.

Examples of valid strings:

w123.
w, 12
w#123
dsf%&@

You can use the \\p{IsLatin} class:

return !(val.matches("[\\p{Punct}\\p{Space}\\p{IsLatin}]+$"));

Java Regex Reference
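A minimal sketch of how the \\p{IsLatin} suggestion could be wrapped into the method from the question. The class name LatinScriptCheck is hypothetical, and adding \\p{Digit} is an assumption on my part: the valid examples in the question contain digits, which \\p{Punct}, \\p{Space} and \\p{IsLatin} do not cover on their own.

```java
import java.util.regex.Pattern;

class LatinScriptCheck {
    // Allowed characters: ASCII punctuation, whitespace, ASCII digits (assumption,
    // not part of the original answer) and any character of the Latin script.
    // '*' instead of '+' so that an empty string is not reported as non-Latin.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{Punct}\\p{Space}\\p{Digit}\\p{IsLatin}]*");

    // Returns true when at least one character falls outside the allowed set.
    static boolean containsNonLatin(String val) {
        return !ALLOWED.matcher(val).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&@", "my string", "Двв"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsNonLatin(s));
        }
    }
}
```

With this naming, the method answers "does the string contain something non-Latin?", so negate the result if you want true for valid input.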

I need something like "not \\p{IsLatin}".

If you need to match all letters except Latin ASCII letters, you can use

"[\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

The \\p{Alpha} POSIX class matches [A-Za-z]. \\p{L} matches any Unicode base letter, and \\p{M} matches diacritics. Adding &&[^\\p{Alpha}] subtracts [A-Za-z] from all the Unicode letters.

The whole expression means: match one or more Unicode letters other than ASCII letters.

To allow whitespace as well, just add \\s:

"[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

See the IDEONE demo:

List<String> strs = Arrays.asList("w123.", "w, 12", "w#123", "dsf%&@", "Двв");
for (String str : strs)
    System.out.println(!str.matches("[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+")); // => 4 true, 1 false
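Because matches() tests the whole string, the demo above only flags strings made up entirely of non-ASCII letters and whitespace; a mixed string such as "wДвв" would still pass. A possible variation, sketched here with the hypothetical names NonAsciiLetterFinder and hasNonAsciiLetter, uses the same character class with find() to detect a single offending letter anywhere in the input.

```java
import java.util.regex.Pattern;

class NonAsciiLetterFinder {
    // One Unicode letter or combining mark that is not an ASCII letter [A-Za-z].
    private static final Pattern NON_ASCII_LETTER =
            Pattern.compile("[\\p{L}\\p{M}&&[^\\p{Alpha}]]");

    // find() succeeds as soon as one such character occurs anywhere in s.
    static boolean hasNonAsciiLetter(String s) {
        return NON_ASCII_LETTER.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&@", "Двв", "wДвв"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasNonAsciiLetter(s));
        }
    }
}
```

Note that under this pattern an accented Latin letter such as é also counts as a non-ASCII letter, which may or may not be what you want.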

Just add a space to your matcher:

private boolean isLatin(String val) {
    return val.matches("[ \\w]+");
}
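A quick sketch to check this suggestion against the examples from the question. By default \\w in Java is [a-zA-Z_0-9], so "[ \\w]+" accepts "my string" but still rejects inputs with punctuation such as "w123.". The second method, isAsciiText, is a hypothetical extension that also admits whitespace and ASCII punctuation; it is not part of the original answer.

```java
class WordCharCheck {
    // The suggestion as given: only spaces and word characters are accepted.
    static boolean isLatin(String val) {
        return val.matches("[ \\w]+");
    }

    // Hypothetical extension: also accept any whitespace and ASCII punctuation.
    static boolean isAsciiText(String val) {
        return val.matches("[\\w\\s\\p{Punct}]+");
    }

    public static void main(String[] args) {
        String[] samples = {"my string", "w123.", "w, 12", "w#123", "dsf%&@", "Двв"};
        for (String s : samples) {
            System.out.println(s + " -> " + isLatin(s) + " / " + isAsciiText(s));
        }
    }
}
```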

Use this:

public static boolean isNoAlphaNumeric(String s) {
       return s.matches("[\\p{L}\\s]+");
}
  • \\p{L} means any Unicode letter.
  • \\s means a whitespace character.
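Since \\p{L} accepts letters of every script, this pattern cannot by itself tell Latin from Cyrillic. One possible non-regex alternative, sketched below with the hypothetical names ScriptScan and containsNonLatinLetter, walks the code points and compares each letter's script using the standard Character.UnicodeScript API. Digits, punctuation and whitespace are ignored because only letters are inspected.

```java
class ScriptScan {
    // True when any letter in s belongs to a script other than Latin.
    static boolean containsNonLatinLetter(String s) {
        return s.codePoints()
                .filter(Character::isLetter) // skip digits, punctuation, whitespace
                .anyMatch(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.LATIN);
    }

    public static void main(String[] args) {
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&@", "Двв", "wДвв"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsNonLatinLetter(s));
        }
    }
}
```

Unlike the ASCII-based patterns, this treats accented Latin letters such as é as Latin.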
