I want to make a regular expression that can help me get rid of the following piece of code -
public class Test {
    public static void main(String[] args) {
        String test = "1026";
        int testToInt = 0;
        if(checkIfInteger(test))
            testToInt = Integer.parseInt(test);
        if(testToInt >= 1024 && testToInt <= 65535)
            System.out.println("Validity is perfect");
        else
            System.out.println("Validity is WRONG");
    }
    public static boolean checkIfInteger(String givenString) {
        boolean check = false;
        for(int i = 0; i < givenString.length(); i++) {
            // a digit lies between '0' and '9' inclusive
            if(givenString.charAt(i) >= '0' && givenString.charAt(i) <= '9')
                check = true;
            else {
                check = false;
                break;
            }
        }
        return check;
    }
}
Basically, it checks that a String contains only numeric digits and that its value is between 1024 and 65535.
For this purpose, I created the following regex -
"\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"
But there are a lot of values for which it fails. Can someone give me a smarter / correct way to do it?
Here's a test file if you want to test your regex -
public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 1024; i++) {
            if (String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }
        for (int i = 1025; i < (int) Math.pow(2, 16); i++) {
            if (!String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }
        for (int i = 0; i < 100; i++) {
            if (String
                    .valueOf((int)Math.pow(2, 16) + i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }
    }
}
Change your code from:
testToInt = Integer.parseInt(test);
if(testToInt >= 1024 && testToInt <= 65535)
    System.out.println("Validity is perfect");
else
    System.out.println("Validity is WRONG");
to:
try {
    testToInt = Integer.parseInt(test);
    if(testToInt >= 1024 && testToInt <= 65535)
        System.out.println("Validity is perfect");
    else
        System.out.println("Validity is WRONG");
}
catch(NumberFormatException nfe)
{
    System.out.println("Validity is WRONG");
}
In a Java string literal, the backslash itself must be escaped, so the regex \b has to be written as \\b (a lone "\b" is the backspace character). After fixing this bit, your regex string looks like:
String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\\b";
This already fixes a lot; I only get these "Hum"s:
Hum 65526
Hum 65527
Hum 65528
Hum 65529
Hum 65530
Hum 65531
Hum 65532
Hum 65533
Hum 65534
Hum 65535
Now, changing 655[01][0-9] to 655[012][0-9] and adding |6553[0-5], I get a fully working regex:
String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b";
The example program based on your testing code is available here.
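As a rough sketch of the two points above, the following program first shows what a single-escaped "\b" actually contains, then compares the corrected pattern against the plain numeric check for every integer below 100000. The class and method names (`PortRegexCheck`, `matchesPort`, `countMismatches`) are made up for this illustration, and the pattern drops the `6552[0-5]` alternative, which `655[0-2][0-9]` already covers.

```java
import java.util.regex.Pattern;

class PortRegexCheck {
    // "\b" in a Java string literal is the backspace character (code point 8),
    // so the regex engine never sees a word-boundary assertion.
    static final String SINGLE_ESCAPED = "\b1024\b";
    // "\\b" is a backslash followed by 'b', which the regex engine reads as \b.
    static final String DOUBLE_ESCAPED = "\\b1024\\b";

    // Corrected range pattern, compiled once and reused.
    static final Pattern PORT = Pattern.compile(
        "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])\\b");

    static boolean matchesPort(String s) {
        return PORT.matcher(s).matches();
    }

    // Counts the integers in [0, limit) where the regex and the numeric check disagree.
    static int countMismatches(int limit) {
        int mismatches = 0;
        for (int i = 0; i < limit; i++) {
            boolean expected = i >= 1024 && i <= 65535;
            if (matchesPort(String.valueOf(i)) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println((int) SINGLE_ESCAPED.charAt(0));   // prints 8
        System.out.println("1024".matches(SINGLE_ESCAPED));   // prints false
        System.out.println("1024".matches(DOUBLE_ESCAPED));   // prints true
        System.out.println("mismatches: " + countMismatches(100_000)); // prints mismatches: 0
    }
}
```

A check like this only covers the canonical decimal form produced by String.valueOf; inputs such as "01024" are a separate question.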
Throwing an exception here would IMO be a better strategy than returning a boolean.
Something like:
public int parseAndCheck(String val, int low, int high) throws IllegalArgumentException {
    try {
        int num = Integer.parseInt(val);
        if (num < low || num > high) throw new IllegalArgumentException(val);
        return num;
    }
    catch (NumberFormatException ex) {
        throw new IllegalArgumentException(ex);
    }
}
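A possible way to call such a method, as a minimal sketch: the method is copied as a static member so the example runs on its own, and `RangeParser` and `describe` are hypothetical names introduced only for this illustration.

```java
class RangeParser {
    // Static copy of the method above so it can be called without an instance.
    static int parseAndCheck(String val, int low, int high) {
        try {
            int num = Integer.parseInt(val);
            if (num < low || num > high) throw new IllegalArgumentException(val);
            return num;
        } catch (NumberFormatException ex) {
            // Non-numeric input is reported the same way as an out-of-range value.
            throw new IllegalArgumentException(ex);
        }
    }

    // Hypothetical helper: turns the exception into a message for display.
    static String describe(String val) {
        try {
            return "port " + parseAndCheck(val, 1024, 65535);
        } catch (IllegalArgumentException ex) {
            return "rejected: " + val;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("1026")); // prints port 1026
        System.out.println(describe("80"));   // prints rejected: 80
        System.out.println(describe("abc"));  // prints rejected: abc
    }
}
```

The caller gets either a usable int or a single exception type, so the range check and the format check cannot be accidentally skipped.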
^(?:102[4-9]|10[3-9]\d|1[1-9]\d{2}|[2-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$
You can try this regex. See demo.
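To illustrate how the ^...$ anchors differ from \b...\b, here is a small sketch that runs both forms through Matcher.find(). The names `AnchorDemo`, `anchoredFind`, and `boundedFind` are assumptions made for this example. With String.matches() or Matcher.matches() the whole input must match anyway, so either form behaves the same there.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The range alternatives from the regex above, without anchors.
    static final String RANGE =
        "(?:102[4-9]|10[3-9]\\d|1[1-9]\\d{2}|[2-9]\\d{3}|[1-5]\\d{4}"
        + "|6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-5])";

    static final Pattern ANCHORED = Pattern.compile("^" + RANGE + "$");
    static final Pattern BOUNDED = Pattern.compile("\\b" + RANGE + "\\b");

    // True only when the input as a whole is a number in range.
    static boolean anchoredFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    // True when a number in range appears anywhere as a separate word.
    static boolean boundedFind(String s) {
        return BOUNDED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(anchoredFind("1024"));           // prints true
        System.out.println(anchoredFind("port 1024 open")); // prints false
        System.out.println(boundedFind("port 1024 open"));  // prints true
        System.out.println(boundedFind("port 80 open"));    // prints false
    }
}
```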
Just because you can do this with regular expressions doesn't mean you should. Not only is it error-prone and the code pretty much unreadable, but it's also quite slow.
Given code like:
var intStrings = IntStream.range(0, 70000).mapToObj(Integer::toString).toArray(String[]::new);
var badStrings = IntStream.range(0, 70000).mapToObj(x -> "not an int " + x).toArray(String[]::new);
and using the regexp from Wiktor's answer:
var re = Pattern.compile("\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");
var matchCount = 0;
for (int i = 0, len = intStrings.length; i < len; i++) {
    matchCount = re.matcher(intStrings[i]).matches() ? 1 + matchCount : matchCount;
    matchCount = re.matcher(badStrings[i]).matches() ? 1 + matchCount : matchCount;
}
is going to take about twelve times longer than the same number of iterations of the character-checking version:
boolean valid(String s) {
    var len = s.length();
    if (len == 0 || len > 5) { // empty, or too many digits to be <= 65535
        return false;
    }
    for (int i = 0; i < len; i++) {
        var c = s.charAt(i);
        if (c < '0' || c > '9') {
            return false;
        }
    }
    try {
        var intVal = Integer.parseInt(s);
        return intVal >= 1024 && intVal <= 65535;
    } catch (NumberFormatException e) {
        throw new IllegalStateException(e); // should never happen: s is 1 to 5 digits here
    }
}
The try/catch version, while much simpler --
boolean valid(String s) {
    try {
        var intVal = Integer.parseInt(s);
        return intVal >= 1024 && intVal <= 65535;
    }
    catch (NumberFormatException e) {
        return false;
    }
}
-- is about 450 times slower than the character-checking version, and 35 times slower than the regexp version.
That said, if you expect nearly all inputs to be valid, or if the code is not going to be called very often, try/catch is the best choice, because it's easy to read and the intent is very clear.
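For anyone who wants to reproduce a comparison like this, here is a rough sketch of a harness, not the code that produced the figures above. It uses System.nanoTime() without warm-up, so the printed numbers are only indicative; a benchmark framework such as JMH would give more trustworthy results. `ValidatorTiming`, `byRegex`, `byChars`, `byTryCatch`, `countValid`, and `timeNanos` are names invented for this example.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

class ValidatorTiming {
    static final Pattern RE = Pattern.compile(
        "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");

    static boolean byRegex(String s) {
        return RE.matcher(s).matches();
    }

    // Character-checking version: reject early, parse only strings of 1 to 5 digits.
    static boolean byChars(String s) {
        int len = s.length();
        if (len == 0 || len > 5) return false;
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        int v = Integer.parseInt(s);
        return v >= 1024 && v <= 65535;
    }

    // try/catch version: lets Integer.parseInt reject bad input by throwing.
    static boolean byTryCatch(String s) {
        try {
            int v = Integer.parseInt(s);
            return v >= 1024 && v <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static String[] intStrings() {
        return IntStream.range(0, 70000).mapToObj(Integer::toString).toArray(String[]::new);
    }

    static String[] badStrings() {
        return IntStream.range(0, 70000).mapToObj(x -> "not an int " + x).toArray(String[]::new);
    }

    static int countValid(Predicate<String> validator, String[] inputs) {
        int n = 0;
        for (String s : inputs) {
            if (validator.test(s)) n++;
        }
        return n;
    }

    // Elapsed wall-clock nanoseconds for one pass over both input arrays.
    static long timeNanos(Predicate<String> validator, String[] good, String[] bad) {
        long start = System.nanoTime();
        countValid(validator, good);
        countValid(validator, bad);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String[] good = intStrings();
        String[] bad = badStrings();
        System.out.println("regex:     " + timeNanos(ValidatorTiming::byRegex, good, bad) + " ns");
        System.out.println("chars:     " + timeNanos(ValidatorTiming::byChars, good, bad) + " ns");
        System.out.println("try/catch: " + timeNanos(ValidatorTiming::byTryCatch, good, bad) + " ns");
    }
}
```

All three validators agree on these two input sets. They can differ on other inputs: Integer.parseInt accepts "+1024" and "01024", while the regex rejects both.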