
Regular expression checking for integer validity and range

I want to make a regular expression that can help me get rid of the following piece of code -

public class Test {
    public static void main(String[] args) {
        String test = "1026";
        int testToInt = 0;
        if(checkIfInteger(test))
            testToInt = Integer.parseInt(test);
        if(testToInt >= 1024 && testToInt <= 65535)
            System.out.println("Validity is perfect");
        else
            System.out.println("Validity is WRONG");
    }

    public static boolean checkIfInteger(String givenString) {
        boolean check = false;
        for(int i = 0; i < givenString.length(); i++) {
            if(givenString.charAt(i) >= '0' && givenString.charAt(i) <= '9')
                check = true;
            else {
                check = false;
                break;
            }
        }
        return check;
    }
}

Basically, it checks that a String contains only numeric digits and that its value is between 1024 and 65535.

For this purpose, I created the following regex -

"\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"

But there are a lot of values for which it fails. Can someone give me a smarter / correct way to do it?

Here's a test file if you want to test your regex -

public class Test {
    public static void main(String[] args) {

        for (int i = 0; i < 1024; i++) {
            if (String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }


        for (int i = 1024; i < (int) Math.pow(2, 16); i++) {
            if (!String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }

        for (int i = 0; i < 100; i++) {
            if (String
                    .valueOf((int)Math.pow(2, 16) + i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }

    }
}

Change your code

from:

testToInt = Integer.parseInt(test);
if(testToInt >= 1024 && testToInt <= 65535)
    System.out.println("Validity is perfect");
else
    System.out.println("Validity is WRONG");

to:

try {
    testToInt = Integer.parseInt(test);
    if(testToInt >= 1024 && testToInt <= 65535)
        System.out.println("Validity is perfect");
    else
        System.out.println("Validity is WRONG");
} catch(NumberFormatException nfe) {
    System.out.println("Validity is WRONG");
}

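In a Java string literal, backslashes must be doubled: "\b" is the backspace character, so the regex engine never sees a word boundary. Write "\\b" instead. After fixing this bit, your regex string looks like: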

String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\\b";
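To see why the single backslash matters, here is a minimal sketch (the class and method names are made up for illustration) comparing the two spellings on one input:

```java
public class EscapeDemo {
    // "\b" in a Java string literal is the backspace character (U+0008).
    public static final String SINGLE = "\b1026\b";
    // "\\b" is a backslash followed by 'b', which the regex engine reads as a word boundary.
    public static final String DOUBLE = "\\b1026\\b";

    public static boolean matchesSingle(String s) {
        return s.matches(SINGLE);
    }

    public static boolean matchesDouble(String s) {
        return s.matches(DOUBLE);
    }

    public static void main(String[] args) {
        System.out.println(SINGLE.length());       // backspace + "1026" + backspace
        System.out.println(DOUBLE.length());       // both backslashes survive
        System.out.println(matchesSingle("1026")); // pattern demands literal backspaces
        System.out.println(matchesDouble("1026")); // word boundaries at both ends
    }
}
```

With the single backslash the pattern only matches input that really contains backspace characters, which is why every number in the test loops was rejected.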

This already fixes a lot; I only get these "Hum"s:

Hum 65526
Hum 65527
Hum 65528
Hum 65529
Hum 65530
Hum 65531
Hum 65532
Hum 65533
Hum 65534
Hum 65535

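Now, widening 655[01][0-9] to 655[012][0-9] and adding |6553[0-5], I get a fully working regex (the 6552[0-5] alternative is now redundant but harmless):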

String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b";

The example program based on your testing code is available here.
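A hand-built range regex like this is easy to get subtly wrong, so it may be worth checking it against the plain numeric comparison over every candidate value. The following is a rough sketch, not the linked program: the class name and helper are hypothetical, the \\b anchors are dropped because matches() already requires the whole string to match, and the redundant 6552[0-5] alternative is folded into 655[0-2][0-9].

```java
import java.util.regex.Pattern;

public class PortRegexCheck {
    // Same alternatives as the corrected regex, compiled once for reuse.
    public static final Pattern PORT = Pattern.compile(
        "(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])");

    // Counts the values in [from, to] where the regex and the numeric check disagree.
    public static int countMismatches(int from, int to) {
        int mismatches = 0;
        for (int i = from; i <= to; i++) {
            boolean byRegex = PORT.matcher(Integer.toString(i)).matches();
            boolean byRange = i >= 1024 && i <= 65535;
            if (byRegex != byRange) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // A result of 0 means the regex agrees with the range check everywhere.
        System.out.println(countMismatches(0, 100_000));
    }
}
```

Comparing against the numeric check in a single loop also avoids the off-by-one risk of writing three separate loops with hand-picked bounds.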

Throwing an Exception here would IMO be a better strategy than returning a boolean.

Something like:

public int parseAndCheck(String val, int low, int high) throws IllegalArgumentException {
  try {
    int num = Integer.parseInt(val);
    if (num < low || num > high) throw new IllegalArgumentException(val);
    return num;
  }
  catch (NumberFormatException ex) {
    throw new IllegalArgumentException(ex);
  }
}

^(?:102[4-9]|10[3-9]\d|1[1-9]\d{2}|[2-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$

You can try this regex. See the demo:

https://regex101.com/r/sJ9gM7/70
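The pattern above is written in plain regex syntax, as on regex101. In Java source each backslash has to be doubled. A minimal sketch of how it might be used (the class and method names are assumptions for illustration):

```java
import java.util.regex.Pattern;

public class AnchoredPortRegex {
    // The regex from above with every \d written as \\d for the Java string literal.
    public static final Pattern ANCHORED = Pattern.compile(
        "^(?:102[4-9]|10[3-9]\\d|1[1-9]\\d{2}|[2-9]\\d{3}|[1-5]\\d{4}"
        + "|6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-5])$");

    // matches() requires the entire input to match, so no surrounding text is tolerated.
    public static boolean isValidPort(String s) {
        return ANCHORED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1024", "65535", "65536", "1023", "01024", " 1024", ""}) {
            System.out.println("\"" + s + "\" -> " + isValidPort(s));
        }
    }
}
```

Note one behavioral difference from the Integer.parseInt approaches: the regex rejects inputs such as "01024" or "+1024", whereas Integer.parseInt accepts leading zeros and a leading plus sign and would parse both as 1024.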

Just because you can do this with regular expressions doesn't mean you should. Not only is it error-prone and the code pretty much unreadable, it's also quite slow.

Given code like:

// needs: import java.util.regex.Pattern; import java.util.stream.IntStream;
var intStrings = IntStream.range(0, 70000).mapToObj(Integer::toString).toArray(String[]::new);
var badStrings = IntStream.range(0, 70000).mapToObj(x -> "not an int " + x).toArray(String[]::new);

and using the regexp from Wiktor's answer:

var re = Pattern.compile("\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");

var matchCount = 0;
for (int i = 0, len = intStrings.length; i < len; i++) { 
  matchCount = re.matcher(intStrings[i]).matches() ? 1 + matchCount : matchCount;
  matchCount = re.matcher(badStrings[i]).matches() ? 1 + matchCount : matchCount;
} 

is going to take about twelve times longer than the same number of iterations of the character-checking version:

boolean valid(String s) {
  var len = s.length();
  if (len == 0 || len > 5) { // empty, or too long to be a value <= 65535
    return false;
  }
  for (int i = 0; i < len; i++) {
    var c = s.charAt(i);
    if (c < '0' || c > '9') {
      return false;
    }
  }
  try {
    var intVal = Integer.parseInt(s);
    return intVal >= 1024 && intVal <= 65535;
  } catch (NumberFormatException e) {
    throw new IllegalStateException(e); // unreachable: s is 1 to 5 ASCII digits
  }
}

The try / catch version, while much simpler --

boolean valid(String s) {
  try {
    var intVal = Integer.parseInt(s);
    return intVal >= 1024 && intVal <= 65535;
  }
  catch (NumberFormatException e) {
    return false;
  }
}

-- is about 450 times slower than the character-checking version, and 35 times slower than the regexp version.

That said, if you expect nearly all inputs to be valid, or if the code is not going to be called very often, try/catch is the best choice, because it's easy to read and the intent is very clear.
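The ratios quoted above will vary with the JVM, warm-up, and the share of invalid inputs. A rough harness along the following lines can be used to reproduce the comparison locally. It is only a sketch: the class and helper names are hypothetical, the regex is the corrected one without the \\b anchors, and naive System.nanoTime timing is used here, so a proper benchmark tool such as JMH would give more trustworthy numbers.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

public class ValidatorTiming {
    static final Pattern RE = Pattern.compile(
        "(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])");

    public static boolean byRegex(String s) {
        return RE.matcher(s).matches();
    }

    public static boolean byChars(String s) {
        int len = s.length();
        if (len == 0 || len > 5) return false;
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        int v = Integer.parseInt(s); // cannot throw: s is 1 to 5 ASCII digits
        return v >= 1024 && v <= 65535;
    }

    public static boolean byTryCatch(String s) {
        try {
            int v = Integer.parseInt(s);
            return v >= 1024 && v <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static int countValid(Predicate<String> validator, String[] inputs) {
        int count = 0;
        for (String s : inputs) {
            if (validator.test(s)) count++;
        }
        return count;
    }

    // Alternates integers 0..69999 with junk strings, mirroring the arrays above.
    public static String[] sampleInputs() {
        return IntStream.range(0, 140_000)
            .mapToObj(i -> i % 2 == 0 ? Integer.toString(i / 2) : "not an int " + (i / 2))
            .toArray(String[]::new);
    }

    static void time(String label, Predicate<String> validator, String[] inputs) {
        long start = System.nanoTime();
        int valid = countValid(validator, inputs);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + valid + " valid, " + micros + " us");
    }

    public static void main(String[] args) {
        String[] inputs = sampleInputs();
        // Repeat a few rounds so that later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            time("regex    ", ValidatorTiming::byRegex, inputs);
            time("chars    ", ValidatorTiming::byChars, inputs);
            time("try/catch", ValidatorTiming::byTryCatch, inputs);
        }
    }
}
```

All three validators should report the same number of valid inputs. Only the elapsed times are expected to differ, and with half of the inputs invalid the try/catch version pays for constructing an exception on every junk string.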
