Java - what is the best way to check if a String contains only certain characters?

I have this problem: I have a String, but I need to make sure that it only contains letters A-Z and numbers 0-9. Here is my current code:

boolean valid = true;
for (char c : string.toCharArray()) {
    int type = Character.getType(c);
    if (type == 2 || type == 1 || type == 9) {
        // the character is either a letter or a digit
    } else {
        valid = false;
        break;
    }
}

But what is the best and most efficient way to implement it?
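
For readers puzzling over the literals in the snippet above: 1, 2 and 9 are the values of Character.UPPERCASE_LETTER, Character.LOWERCASE_LETTER and Character.DECIMAL_DIGIT_NUMBER. A sketch of the same check with the named constants might look like this; the class and method names are made up for illustration. Note that these Unicode categories also cover non-ASCII letters and digits (for example 'é'), so the check is broader than A-Z and 0-9.

```java
final class CharTypeCheck {
    // Hypothetical helper name; same logic as the question's loop, with named constants.
    static boolean isLetterOrDigitByType(String s) {
        for (int i = 0; i < s.length(); i++) {
            int type = Character.getType(s.charAt(i));
            boolean ok = type == Character.UPPERCASE_LETTER       // value 1
                      || type == Character.LOWERCASE_LETTER       // value 2
                      || type == Character.DECIMAL_DIGIT_NUMBER;  // value 9
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLetterOrDigitByType("ABC123")); // true
        System.out.println(isLetterOrDigitByType("AB-12"));  // false ('-' is punctuation)
        System.out.println(isLetterOrDigitByType("\u00e9")); // true ('é' is a lowercase letter)
    }
}
```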

Since no one else has worried about "fastest" yet, here is my contribution:

boolean valid = true;

char[] a = s.toCharArray();

for (char c: a)
{
    valid = ((c >= 'a') && (c <= 'z')) || 
            ((c >= 'A') && (c <= 'Z')) || 
            ((c >= '0') && (c <= '9'));

    if (!valid)
    {
        break;
    }
}

return valid;

Full test code below:

public static void main(String[] args)
{
    String[] testStrings = {"abcdefghijklmnopqrstuvwxyz0123456789", "", "00000", "abcdefghijklmnopqrstuvwxyz0123456789&", "1", "q", "test123", "(#*$))&v", "ABC123", "hello", "supercalifragilisticexpialidocious"};

    long startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericOriginal(testString);
    }

    System.out.println("Time for isAlphaNumericOriginal: " + (System.nanoTime() - startNanos) + " ns"); 

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericFast(testString);
    }

    System.out.println("Time for isAlphaNumericFast: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericRegEx(testString);
    }

    System.out.println("Time for isAlphaNumericRegEx: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericIsLetterOrDigit(testString);
    }

    System.out.println("Time for isAlphaNumericIsLetterOrDigit: " + (System.nanoTime() - startNanos) + " ns");      
}

private static boolean isAlphaNumericOriginal(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) 
    {
        int type = Character.getType(c);
        if (type == 2 || type == 1 || type == 9) 
        {
            // the character is either a letter or a digit
        }
        else 
        {
            valid = false;
            break;
        }
    }

    return valid;
}

private static boolean isAlphaNumericFast(String s)
{
    boolean valid = true;

    char[] a = s.toCharArray();

    for (char c: a)
    {
        valid = ((c >= 'a') && (c <= 'z')) || 
                ((c >= 'A') && (c <= 'Z')) || 
                ((c >= '0') && (c <= '9'));

        if (!valid)
        {
            break;
        }
    }

    return valid;
}

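// Requires: import java.util.regex.Pattern;
// Note: '+' needs at least one character, so "" is rejected here
// while the loop-based methods accept it.
private static boolean isAlphaNumericRegEx(String s)
{
    return Pattern.matches("[\\dA-Za-z]+", s);
}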

private static boolean isAlphaNumericIsLetterOrDigit(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) { 
        if(!Character.isLetterOrDigit(c))
        {
            valid = false;
            break;
        }
    }
    return valid;
}

Produces this output for me:

Time for isAlphaNumericOriginal: 164960 ns
Time for isAlphaNumericFast: 18472 ns
Time for isAlphaNumericRegEx: 1978230 ns
Time for isAlphaNumericIsLetterOrDigit: 110315 ns
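
These figures come from a single cold pass over eleven short strings, so they probably include JIT and class-loading costs, and Pattern.matches recompiles the expression on every call. A rough sketch of a somewhat fairer comparison, assuming an untimed warm-up pass and many repetitions, could look like the following. The class name, helper names and round count are arbitrary choices for illustration, and a dedicated harness such as JMH would be more reliable.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class WarmBenchmark {
    // Compiled once, unlike Pattern.matches(...), which compiles per call.
    private static final Pattern ALNUM = Pattern.compile("[\\dA-Za-z]+");

    static boolean fast(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z')
                      || (c >= 'A' && c <= 'Z')
                      || (c >= '0' && c <= '9');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    static boolean regex(String s) {
        return ALNUM.matcher(s).matches();
    }

    // Applies the check to every input `rounds` times and returns the number
    // of accepted strings, so the result of each call is actually used.
    static long run(Predicate<String> check, String[] inputs, int rounds) {
        long accepted = 0;
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                if (check.test(s)) {
                    accepted++;
                }
            }
        }
        return accepted;
    }

    static long timeNanos(Predicate<String> check, String[] inputs, int rounds) {
        run(check, inputs, rounds); // warm-up pass, not timed
        long start = System.nanoTime();
        run(check, inputs, rounds);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String[] inputs = {"abcdefghijklmnopqrstuvwxyz0123456789", "00000",
                           "test123", "(#*$))&v", "ABC123"};
        int rounds = 100_000; // arbitrary
        System.out.println("fast:  " + timeNanos(WarmBenchmark::fast, inputs, rounds) + " ns");
        System.out.println("regex: " + timeNanos(WarmBenchmark::regex, inputs, rounds) + " ns");
    }
}
```

The absolute numbers will differ from machine to machine; only the relative ordering is worth reading from a quick test like this.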

If you want to avoid regex, then the Character class can help:

boolean valid = true;
for (char c : string.toCharArray()) { 
    if(!Character.isLetterOrDigit(c))
    {
        valid = false;
        break;
    }
}

If you only want to accept uppercase letters, use the following if condition instead:

if(!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c)))
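
Put together as a method, the uppercase-only variant could look like the sketch below (the class and method names are invented here). Keep in mind that the Character methods follow Unicode, so a character such as 'É' or an Arabic-Indic digit also passes. If strictly A-Z and 0-9 is required, explicit range comparisons are the safer choice.

```java
final class UpperCaseCheck {
    // Hypothetical helper: true if every char is an uppercase letter or a digit
    // according to Unicode (not just ASCII). An empty string yields true.
    static boolean isUpperAlphanumeric(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUpperAlphanumeric("ABC123")); // true
        System.out.println(isUpperAlphanumeric("abc123")); // false
        System.out.println(isUpperAlphanumeric("\u00c9")); // true ('É' is uppercase)
    }
}
```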

You can use Apache Commons Lang:

StringUtils.isAlphanumeric(String)

In addition to all the other answers, here's a Guava approach:

boolean valid = CharMatcher.JAVA_LETTER_OR_DIGIT.matchesAllOf(string);

More on CharMatcher: https://code.google.com/p/guava-libraries/wiki/StringsExplained#CharMatcher

Use a regular expression:

Pattern.matches("[\\dA-Z]+", string)

[\\dA-Z]+ : At least one occurrence (+) of digits or uppercase letters.

If you want to include lowercase letters, replace [\\dA-Z]+ with [\\dA-Za-z]+ .
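
Pattern.matches compiles the expression on every call. If the check runs often, one option is to compile it once and reuse the Pattern, roughly as sketched here (the class, field and method names are illustrative only):

```java
import java.util.regex.Pattern;

final class UpperAlnumRegex {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern UPPER_ALNUM = Pattern.compile("[\\dA-Z]+");

    static boolean isValid(String s) {
        // matches() requires the whole input to match, not just a substring.
        return UPPER_ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABC123")); // true
        System.out.println(isValid("abc123")); // false
        System.out.println(isValid(""));       // false, because '+' needs one character
    }
}
```

Changing the quantifier from + to * would make the empty string valid, if that is the desired behaviour.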

The following approach is not as quick to implement as a regular expression, but it is (I think) one of the most efficient solutions, because it uses bitwise operations, which are really fast.

My solution is more complex and harder to read and maintain, but I think it is another way to do what you want.

A good way to test that a string only contains digits or capital letters is with a simple 128-bit bitmask (two longs) representing the ASCII table.

So, for the standard ASCII table, there's a 1 for every character we want to keep (bits 48 to 57 and bits 65 to 90).

Thus, you can test that a char is a:

  1. Digit with this mask: 0x3FF000000000000L (if the character code < 65)
  2. Uppercase letter with this mask: 0x3FFFFFFL (if the character code >= 65, shifted down by 65)

So the following method should work:

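public boolean validate(String aString) {
    for (int i = 0; i < aString.length(); i++) {
        char c = aString.charAt(i);

        // Reject anything above 'Z' (90) first: Java shifts a long by (count & 63),
        // so e.g. code 193 would otherwise wrap around to the bit for 'A'.
        if (c > 90) {
            return false;
        }

        if ((c <= 64) & ((0x3FF000000000000L & (1L << c)) == 0) 
                | (c > 64) & ((0x3FFFFFFL & (1L << (c - 65))) == 0)) {
            return false;
        }
    }

    return true;
}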
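
The two constants can be derived rather than taken on faith. The sketch below (helper names are hypothetical) builds each mask by setting one bit per accepted character and prints the result, which matches the literals used above. One thing to watch with this technique: Java masks the shift count of a long to its low six bits, so character codes above 'Z' need an explicit range check before shifting.

```java
final class MaskDerivation {
    // Sets bit (c - offset) for every character c in the inclusive range [from, to].
    static long maskFor(char from, char to, int offset) {
        long mask = 0L;
        for (char c = from; c <= to; c++) {
            mask |= 1L << (c - offset);
        }
        return mask;
    }

    public static void main(String[] args) {
        long digits = maskFor('0', '9', 0);  // bits 48..57
        long upper = maskFor('A', 'Z', 65);  // bits 0..25 after subtracting 65
        System.out.println(Long.toHexString(digits)); // 3ff000000000000
        System.out.println(Long.toHexString(upper));  // 3ffffff
    }
}
```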

The best way in terms of maintainability and simplicity is the already posted regular expression. Once you are familiar with this technique, you know what to expect, and it is very easy to widen the criteria if needed. The downside is the performance.

The fastest way to go is the array approach. Checking whether a character's numerical value falls within the wanted ASCII ranges A-Z and 0-9 is nearly the speed of light. But the maintainability is bad. Simplicity gone.

You could also use a Java 7 switch-case with the char approach, but that's just as bad as the second.

In the end, since we are talking about Java, I would strongly suggest using regular expressions.

StringUtils in Apache Commons Lang 3 has a containsOnly method: https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html

The implementation should be fast enough.
