Best implementation for an isNumber(string) method

In my limited experience, I've been on several projects that have had some sort of string utility class with methods to determine if a given string is a number. The idea has always been the same, but the implementation has differed. Some surround a parse attempt with try/catch:

public boolean isInteger(String str) {
    try {
        Integer.parseInt(str);
        return true;
    } catch (NumberFormatException nfe) {}
    return false;
}

and others match with a regex:

public boolean isInteger(String str) {
    return str.matches("^-?[0-9]+(\\.[0-9]+)?$");
}

Is one of these methods better than the other? I personally prefer the regex approach, as it's concise, but will it perform on par if called while iterating over, say, a list of several hundred thousand strings?

Note: As I'm kinda new to the site I don't fully understand this Community Wiki business, so if this belongs there let me know, and I'll gladly move it.

EDIT: With all the TryParse suggestions I ported Asaph's benchmark code (thanks for a great post!) to C# and added a TryParse method. As it seems, TryParse wins hands down. However, the try/catch approach took a crazy amount of time, to the point of me thinking I did something wrong! I also updated the regex to handle negatives and decimal points.

Results for the updated C# benchmark code:

00:00:51.7390000 for isIntegerParseInt
00:00:03.9110000 for isIntegerRegex
00:00:00.3500000 for isIntegerTryParse

Using:

static bool isIntegerParseInt(string str) {
    try {
        int.Parse(str);
        return true;
    } catch (FormatException e){}
    return false;
}

static bool isIntegerRegex(string str) {
    return Regex.Match(str, "^-?[0-9]+(\\.[0-9]+)?$").Success;
}

static bool isIntegerTryParse(string str) {
    int bob;
    return Int32.TryParse(str, out bob);
}

I just ran some benchmarks on the performance of these 2 methods (on a MacBook Pro, OS X Leopard, Java 6). ParseInt is faster. Here is the output (ParseInt first, then regex):

This operation took 1562 ms.
This operation took 2251 ms.

And here is my benchmark code:


public class IsIntegerPerformanceTest {

    public static boolean isIntegerParseInt(String str) {
        try {
            Integer.parseInt(str);
            return true;
        } catch (NumberFormatException nfe) {}
        return false;
    }

    public static boolean isIntegerRegex(String str) {
        return str.matches("^[0-9]+$");
    }

    public static void main(String[] args) {
        long starttime, endtime;
        int iterations = 1000000;
        starttime = System.currentTimeMillis();
        for (int i=0; i<iterations; i++) {
            isIntegerParseInt("123");
            isIntegerParseInt("not an int");
            isIntegerParseInt("-321");
        }
        endtime = System.currentTimeMillis();
        System.out.println("This operation took " + (endtime - starttime) + " ms.");
        starttime = System.currentTimeMillis();
        for (int i=0; i<iterations; i++) {
            isIntegerRegex("123");
            isIntegerRegex("not an int");
            isIntegerRegex("-321");
        }
        endtime = System.currentTimeMillis();
        System.out.println("This operation took " + (endtime - starttime) + " ms.");
    }
}

Also, note that your regex will reject negative numbers and the parseInt method will accept them.
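To make the mismatch concrete, here is a minimal sketch that runs both checks from the benchmark on a few inputs. The class name IntegerCheckComparison, the method names byParseInt and byRegex, and the sample strings are illustrative assumptions, not part of the benchmark.

```java
public class IntegerCheckComparison {

    // Same logic as isIntegerParseInt in the benchmark.
    static boolean byParseInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Same logic as isIntegerRegex in the benchmark (digits only, no sign).
    static boolean byRegex(String s) {
        return s.matches("^[0-9]+$");
    }

    public static void main(String[] args) {
        // Illustrative inputs: a plain int, a negative int,
        // a value one past Integer.MAX_VALUE, and a non-number.
        String[] samples = {"123", "-321", "2147483648", "not an int"};
        for (String s : samples) {
            System.out.println(s + ": parseInt=" + byParseInt(s) + ", regex=" + byRegex(s));
        }
    }
}
```

The two checks disagree in both directions: "-321" passes only the parseInt check, while "2147483648" passes only the regex because it does not fit in an int. Which behaviour is right depends on what the caller means by "is a number".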

Here is our way of doing this:

public boolean isNumeric(String string) throws IllegalArgumentException
{
   boolean isnumeric = false;

   if (string != null && !string.equals(""))
   {
      isnumeric = true;
      char chars[] = string.toCharArray();

      for(int d = 0; d < chars.length; d++)
      {
         isnumeric &= Character.isDigit(chars[d]);

         if(!isnumeric)
         break;
      }
   }
   return isnumeric;
}

If absolute performance is key, and if you are just checking for integers (not floating point numbers), I suspect that iterating over each character in the string, returning false as soon as you encounter something not in the range 0-9, will be fastest.

RegEx is a more general-purpose solution, so it will probably not perform as fast for that special case. A solution that throws an exception will have some extra overhead when the input is not a number. TryParse will be slightly slower if you don't actually care about the value of the number, only whether or not it is a number, since the conversion to a number must also take place.

For anything but an inner loop that's called many times, the differences between all of these options should be insignificant.
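The character-scanning idea can be sketched as below. The class name AsciiDigitCheck and the method isAsciiDigits are hypothetical names chosen for illustration. The sketch compares against '0' and '9' directly rather than calling Character.isDigit, because Character.isDigit also returns true for non-ASCII Unicode digits, which may or may not be what you want.

```java
public class AsciiDigitCheck {

    // Hypothetical helper: true only if s is non-empty and every
    // character is an ASCII digit in the range '0'..'9'.
    static boolean isAsciiDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false; // stop at the first non-digit
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiDigits("12345"));
        System.out.println(isAsciiDigits("12a45"));
        // U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode digit, but not in '0'..'9'.
        System.out.println(isAsciiDigits("\u0663"));
        System.out.println(Character.isDigit('\u0663'));
    }
}
```

Note that this check has no notion of sign or range, so a very long run of digits passes even though it would not fit in an int or a long.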

I needed to refactor code like yours to get rid of NumberFormatException. The refactored code:

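import java.util.Scanner;

public static Integer parseInteger(final String str) {
    if (str == null || str.isEmpty()) {
        return null;
    }
    final Scanner sc = new Scanner(str);
    // Check with hasNextInt() first: nextInt() alone throws InputMismatchException on non-numeric input
    return sc.hasNextInt() ? Integer.valueOf(sc.nextInt()) : null;
}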

As a Java 1.4 guy, I didn't know about java.util.Scanner. I found this interesting article:

http://rosettacode.org/wiki/Determine_if_a_string_is_numeric#Java

I personally liked the solution with the scanner: very compact and still readable.

Some languages, like C#, have a TryParse (or equivalent) that works fairly well for something like this.

public bool IsInteger(string value)
{
  int i;
  // TryParse requires the out keyword on its result argument
  return Int32.TryParse(value, out i);
}

Personally I would do this if you really want to simplify it.

public bool isInteger(string myValue)
{
    int myIntValue;
    return int.TryParse(myValue, out myIntValue);
}

You could create an extension method for a string, and make the whole process look cleaner...

public static bool IsInt(this string str)
{
    int i;
    return int.TryParse(str, out i);
}

You could then do the following in your actual code...

if(myString.IsInt())....

Using .NET, you could do something like:

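private bool isNumber(string str)
{
    // True only when every character is a digit (needs System.Linq);
    // note that All returns true for an empty string.
    return str.All(c => char.IsDigit(c));
}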

That's my implementation to check whether a string is made of digits:

public static boolean isNumeric(String string)
{
    if (string == null)
    {
        throw new NullPointerException("The string must not be null!");
    }
    final int len = string.length();
    if (len == 0)
    {
        return false;
    }
    for (int i = 0; i < len; ++i)
    {
        if (!Character.isDigit(string.charAt(i)))
        {
            return false;
        }
    }
    return true;
}

I like this code:

public static boolean isIntegerRegex(String str) {
    return str.matches("^[0-9]+$");
}

But it is better to create the Pattern once before using it:

public static Pattern patternInteger = Pattern.compile("^[0-9]+$");
public static boolean isIntegerRegex(String str) {
  return patternInteger.matcher(str).matches();
}

Running the test gives this result:

This operation isIntegerParseInt took 1313 ms.
This operation isIntegerRegex took 1178 ms.
This operation isIntegerRegexNew took 304 ms.

With:

import java.util.regex.Pattern;

public class IsIntegerPerformanceTest {
    // Compiled once and reused by isIntegerRegexNew
    private static Pattern pattern = Pattern.compile("^[0-9]+$");

    public static boolean isIntegerParseInt(String str) {
        try {
            Integer.parseInt(str);
            return true;
        } catch (NumberFormatException nfe) {
        }
        return false;
    }

    public static boolean isIntegerRegexNew(String str) {
        return pattern.matcher(str).matches();
    }

    public static boolean isIntegerRegex(String str) {
        return str.matches("^[0-9]+$");
    }

    public static void main(String[] args) {
        long starttime, endtime;
        int iterations = 1000000;
        starttime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            isIntegerParseInt("123");
            isIntegerParseInt("not an int");
            isIntegerParseInt("-321");
        }
        endtime = System.currentTimeMillis();
        System.out.println("This operation isIntegerParseInt took " + (endtime - starttime) + " ms.");
        starttime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            isIntegerRegex("123");
            isIntegerRegex("not an int");
            isIntegerRegex("-321");
        }
        endtime = System.currentTimeMillis();
        System.out.println("This operation isIntegerRegex took " + (endtime - starttime) + " ms.");
        starttime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            isIntegerRegexNew("123");
            isIntegerRegexNew("not an int");
            isIntegerRegexNew("-321");
        }
        endtime = System.currentTimeMillis();
        System.out.println("This operation isIntegerRegexNew took " + (endtime - starttime) + " ms.");
    }
}

I think it could be faster than the previous solutions if you do the following (Java):

public final static boolean isInteger(String in)
{
    char c;
    int length = in.length();
    boolean ret = length > 0;
    int i = ret && in.charAt(0) == '-' ? 1 : 0;
    for (; ret && i < length; i++)
    {
        c = in.charAt(i);
        ret = (c >= '0' && c <= '9');
    }
    return ret;
}

I ran the same code that Asaph ran and the result was:

This operation took 28 ms.

A huge difference (against 1691 ms and 2049 ms on my computer). Take into account that this method does not check whether the string is null, so you should do that beforehand (along with any trimming of the string).
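A minimal sketch of the null check and trimming mentioned above, wrapped around the same kind of character scan. The class name SafeIntegerCheck and the wrapper isIntegerSafe are hypothetical names. One deliberate difference from the method above is an assumption about intent: the scan here rejects a lone "-", which the method above accepts because its loop never runs for that input.

```java
public class SafeIntegerCheck {

    // Character scan in the spirit of the answer above; assumes a non-null argument.
    // Unlike the original, it requires at least one digit after an optional '-'.
    static boolean isInteger(String in) {
        int length = in.length();
        int i = (length > 0 && in.charAt(0) == '-') ? 1 : 0;
        if (i == length) {
            return false; // empty string, or just "-"
        }
        for (; i < length; i++) {
            char c = in.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Hypothetical wrapper: handles null and surrounding whitespace before scanning.
    static boolean isIntegerSafe(String in) {
        return in != null && isInteger(in.trim());
    }

    public static void main(String[] args) {
        System.out.println(isIntegerSafe("  -321 "));
        System.out.println(isIntegerSafe(null));
        System.out.println(isIntegerSafe("-"));
    }
}
```

Keeping the null check and trim in a separate wrapper leaves the hot loop untouched for callers that already know their input is clean.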

I think people here are missing a point. Reusing the same pattern repeatedly allows a very easy optimization: just use a single precompiled instance of the pattern. Doing that, in all my tests the try-catch approach never benchmarked better than the pattern approach. On a successful test try-catch takes twice the time; on a failing test it's 6 times slower.

public static final Pattern INT_PATTERN= Pattern.compile("^-?[0-9]+(\\.[0-9]+)?$");

public static boolean isInt(String s){
  return INT_PATTERN.matcher(s).matches();
}

public static boolean CheckString(String myString) {

char[] digits;

    digits = myString.toCharArray();
    for (char div : digits) {// for each element div of type char in the digits collection (digits is a collection containing div elements).
        try {
            Double.parseDouble(myString);
            System.out.println("All are numbers");
            return true;
        } catch (NumberFormatException e) {

            if (Character.isDigit(div)) {
                System.out.println("Not all are chars");

                return false;
            }
        }
    }

    System.out.println("All are chars");
    return true;
}

I use this but I liked Asaph's rigor in his post.

public static bool IsNumeric(object expression)
{
    if (expression == null)
        return false;

    double number;
    return Double.TryParse(Convert.ToString(expression, CultureInfo.InvariantCulture), NumberStyles.Any,
        NumberFormatInfo.InvariantInfo, out number);
}

For long numbers use this (Java):

public static boolean isNumber(String string) {
    try {
        Long.parseLong(string);
    } catch (Exception e) {
        return false;
    }
    return true;
}

public static boolean isNumber(String str) {
    // The decimal point is optional, so plain integers such as "123" match too
    return str.matches("[0-9]*\\.?[0-9]+");
}

This checks whether the string is a number (float or integer) or not.

A modified version of my previous answer:

public static boolean isInteger(String in)
{
    if (in != null)
    {
        char c;
        int i = 0;
        int l = in.length();
        if (l > 0 && in.charAt(0) == '-')
        {
            i = 1;
        }
        if (l > i)
        {
            for (; i < l; i++)
            {
                c = in.charAt(i);
                if (c < '0' || c > '9')
                    return false;
            }
            return true;
        }
    }
    return false;
}

I just added this class to my utils:

public class TryParseLong {
    private boolean isParseable;

    private long value;

    public TryParseLong(String toParse) {
        try {
            value = Long.parseLong(toParse);
            isParseable = true;
        } catch (NumberFormatException e) {
            // Exception set to null to indicate it is deliberately
            // being ignored, since the compensating action
            // of clearing the parsable flag is being taken.
            e = null;

            isParseable = false;
        }
    }

    public boolean isParsable() {
        return isParseable;
    }

    public long getLong() {
        return value;
    }
}

To use it:

TryParseLong valueAsLong = new TryParseLong(value);

if (valueAsLong.isParsable()) {
    ...
    // Do something with valueAsLong.getLong();
} else {
    ...
}

This only parses the value once.

It still relies on an exception for control flow, but at least it encapsulates that kind of code in a utility class, and code that uses it can work in a more normal way.

The problem with Java versus C# is that C# has out parameters and pass by reference, so it can effectively return 2 pieces of information: the flag indicating whether something is parsable, and the actual parsed value. When we return more than one value in Java, we need to create an object to hold them, so I took that approach and put the flag and the parsed value in an object.

Escape analysis is likely to handle this efficiently, creating the value and flag on the stack and never creating this object on the heap, so I think doing this will have minimal impact on performance.

To my thinking this gives about the optimal compromise between keeping control-flow-by-exception out of your code, good performance, and not parsing the integer more than once.
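As a variation on the same idea, and assuming Java 8 or later is available (something the answer does not state), the standard java.util.OptionalLong type can play the role of the holder object. This is only a sketch: the class name LongParsing and the method tryParseLong are hypothetical, and the exception is still caught internally just as in TryParseLong.

```java
import java.util.OptionalLong;

public class LongParsing {

    // Hypothetical helper: returns the parsed value, or an empty OptionalLong
    // when the input is null or not a valid long. The string is parsed only once.
    static OptionalLong tryParseLong(String toParse) {
        if (toParse == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(toParse));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong parsed = tryParseLong("42");
        if (parsed.isPresent()) {
            // Do something with the parsed value.
            System.out.println(parsed.getAsLong() + 1);
        }
        System.out.println(tryParseLong("1abc").isPresent());
    }
}
```

The caller gets both pieces of information, the "is it parsable" flag via isPresent() and the value via getAsLong(), without writing a dedicated holder class.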

public static boolean CheckIfNumber(String number){

    for(int i = 0; i < number.length(); i++){
        try{
            Double.parseDouble(number.substring(i));

        }catch(NumberFormatException ex){
            return false;
        }
    }
    return true;     
}

I had this problem before: when I input a number followed by a character, it would still return true. I think this is the better way to do it: just check whether every char is a number. It is a little longer, but it handles the situation of a user inputting "1abc". For some reason, when I tried try and catch without iterating, it still thought it was a number.
