
Check whether a String contains only Latin characters?

Greetings,

I am developing a GWT application where users can enter their details in Japanese. But the 'userid' and 'password' should only contain English characters (Latin alphabet). How do I validate Strings for this?

You can use String#matches() with a bit of regex for this. The unaccented Latin letters are covered by \\w.

So this should do:

boolean valid = input.matches("\\w+");

By the way, this also covers digits and the underscore _. Not sure whether that is a problem. If it is, you can just use [A-Za-z]+ instead.
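To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of the answer.

```java
// Illustrative helper; the names here are assumptions, not from the answer above.
class WordVsLetters {
    // By default \w is [a-zA-Z_0-9], so digits and the underscore pass too.
    static boolean isWordChars(String input) {
        return input.matches("\\w+");
    }

    // Only the 52 unaccented Latin letters pass.
    static boolean isAsciiLetters(String input) {
        return input.matches("[A-Za-z]+");
    }
}
```

With these, "user_01" passes isWordChars but fails isAsciiLetters, and Japanese input fails both. Note that the + quantifier also rejects the empty string.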

If you want to cover diacritical characters as well (ä, é, ò, and so on, which are by definition also Latin characters), then you need to normalize them first and strip the diacritical marks before matching, simply because there's no (documented) regex which covers diacriticals.

// Normalizer and Form are java.text.Normalizer and java.text.Normalizer.Form.
String clean = Normalizer.normalize(input, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = clean.matches("\\w+");
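Wrapped up as a small self-contained class, the normalize-then-match idea could look roughly like this. The class and method names are hypothetical; the calls themselves are the ones used in the two lines above.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Illustrative wrapper; class and method names are assumptions.
class DiacriticStripper {
    // NFD splits "é" into "e" plus a combining accent, which the regex then removes.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // True when only letters, digits and underscores remain after stripping.
    static boolean isLatinIgnoringDiacritics(String input) {
        return stripDiacritics(input).matches("\\w+");
    }
}
```

So "café" is reduced to "cafe" and accepted, while Japanese text is left unchanged by the stripping step and rejected by the match.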

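Update: Java's regex also has a class which covers diacriticals, the Unicode letter category \\p{L}. Be aware that it matches letters of any script, Japanese included, so on its own it does not restrict the input to Latin.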

boolean valid = input.matches("\\p{L}+");

The above works on Java 1.6.
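Since \\p{L} is the Unicode "letter" category, it is worth checking what it actually accepts before relying on it here. The sketch below compares it with the script-based class \\p{IsLatin}, which is a different technique from the one in this answer and needs Java 7 or later. The class and method names are made up for illustration.

```java
// Illustrative comparison; class and method names are assumptions.
class LetterClasses {
    // \p{L} matches a letter of any script, not only Latin.
    static boolean isAnyLetters(String input) {
        return input.matches("\\p{L}+");
    }

    // \p{IsLatin} matches characters of the Latin script (Java 7+),
    // including precomposed accented letters such as U+00E9.
    static boolean isLatinScript(String input) {
        return input.matches("\\p{IsLatin}+");
    }
}
```

Both accept "été", but only isAnyLetters accepts Japanese text, so for the userid/password case the script-based class (or one of the ASCII-only patterns) is probably the closer fit.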

public static boolean isValidISOLatin1 (String s) {
    return Charset.forName("US-ASCII").newEncoder().canEncode(s);
} // or "ISO-8859-1" for ISO Latin 1

For reference, see the documentation on Charset.
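To show how the choice of charset changes the result, here is a small sketch with the charset name passed in as a parameter. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Illustrative helper; class and method names are assumptions.
class EncoderCheck {
    static boolean canEncode(String s, String charsetName) {
        // A fresh encoder per call, since CharsetEncoder instances are not thread-safe.
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }
}
```

With "US-ASCII", "café" is rejected; with "ISO-8859-1" it is accepted; Japanese text is rejected by both. Keep in mind that this tests encodability, so digits, spaces and punctuation pass as well.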

There might be a better approach, but you could load a collection with whatever you deem to be acceptable characters, and then check each character in the username/password field against that collection.

Pseudo:


foreach (character in username)
{
    if !allowedCharacters.contains(character)
    {
        throw exception
    }
}
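In Java the pseudocode above might be sketched like this. The allowed set (ASCII letters and digits), the exception type and all names are assumptions chosen for the example; substitute whatever you deem acceptable.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative whitelist check; names and the allowed set are assumptions.
class AllowedCharacters {
    private static final Set<Character> ALLOWED = new HashSet<>();
    static {
        for (char c = 'a'; c <= 'z'; c++) ALLOWED.add(c);
        for (char c = 'A'; c <= 'Z'; c++) ALLOWED.add(c);
        for (char c = '0'; c <= '9'; c++) ALLOWED.add(c);
    }

    // Throws on the first character that is not in the allowed set.
    static void validate(String value) {
        for (char c : value.toCharArray()) {
            if (!ALLOWED.contains(c)) {
                throw new IllegalArgumentException("Character not allowed: " + c);
            }
        }
    }
}
```

Calling AllowedCharacters.validate(username) returns normally for "user01" and throws IllegalArgumentException for input containing Japanese characters.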

For something this simple, I'd use a regular expression.

private static final Pattern p = Pattern.compile("\\p{Alpha}+");

static boolean isValid(String input) {
  Matcher m = p.matcher(input);
  return m.matches();
}

There are other pre-defined classes like \\w that might work better.

I successfully used a combination of the answers of user232624, Joachim Sauer and Tvaroh:

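static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder(); // or "ISO-8859-1" for ISO Latin 1

boolean isValid(String input) {
    // Every character must be a letter that the encoder can represent.
    for (char ch : input.toCharArray()) {
        if (!Character.isLetter(ch) || !asciiEncoder.canEncode(ch)) return false;
    }
    return true;
}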

Here is my solution, and it works well:

public static boolean isStringContainsLatinCharactersOnly(final String iStringToCheck)
{
    return iStringToCheck.matches("^[a-zA-Z0-9.]+$");
}

