
Looking for a regex to match a specific arithmetic String in Java

For a Java homework assignment I've been tasked with evaluating an arithmetic expression and deciding whether or not it is a valid expression. The expressions contain 3 types of brackets ( {}, [], () ), digits, whitespace, and the +, -, * and / operators. However, the strings could also contain random junk, in which case I shouldn't parse the expression at all.

I have implemented the following regex, and hopefully it will work:

Pattern p = Pattern.compile("[^0-9\\[-\\]\\{-\\}\\(-\\)+*\\/\\s-]");

However, I am not very familiar with regex and was wondering if anyone could take a second look?

The suggested syntax isn't regular. The 'regular' in 'Regular Expression' isn't just an arbitrary word, nor is it the last name of its inventor. It's a description of a subset of all imaginable syntaxes.

If a syntax isn't regular, a regex can't parse it properly. Many, many things aren't regular. Basic math (pretty much everything with hierarchy/recursive elements) isn't.

You can't use regexes to parse this stuff.

It sounds like your plan is: "Let's first reject invalid inputs, and then I'll start thinking about where to go from there". This is the wrong approach; you've got it backwards. Start parsing the input; if it's invalid, you'll find that out trivially as you process it.

The answer to 'parse e.g. 5 + (2 * {3 * 10}) * [6 / 2]' does not involve regular expressions in any way.
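To make "parse it and find out" concrete, here is a minimal recursive-descent sketch. It is not the only way to do this, and everything in it is an assumption on my part rather than something from the assignment: the class name ExprParser, int-only arithmetic with integer division, no unary minus, and treating division by zero or overflow as "not valid".

```java
/** Minimal recursive-descent evaluator; all names here are illustrative. */
final class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) { this.src = src; }

    /** Evaluates the expression; throws IllegalArgumentException if it is malformed. */
    static int evaluate(String input) {
        ExprParser p = new ExprParser(input);
        int value = p.parseExpr();
        p.skipWhitespace();
        if (p.pos < p.src.length()) {
            throw p.error("unexpected '" + p.src.charAt(p.pos) + "'");
        }
        return value;
    }

    /** Assumption: arithmetic failures (overflow, division by zero) also count as invalid. */
    static boolean isValid(String input) {
        try {
            evaluate(input);
            return true;
        } catch (IllegalArgumentException | ArithmeticException e) {
            return false;
        }
    }

    // expr := term (('+' | '-') term)*
    private int parseExpr() {
        int value = parseTerm();
        while (true) {
            skipWhitespace();
            if (accept('+')) value = Math.addExact(value, parseTerm());
            else if (accept('-')) value = Math.subtractExact(value, parseTerm());
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private int parseTerm() {
        int value = parseFactor();
        while (true) {
            skipWhitespace();
            if (accept('*')) value = Math.multiplyExact(value, parseFactor());
            else if (accept('/')) value = value / parseFactor();
            else return value;
        }
    }

    // factor := number | '(' expr ')' | '[' expr ']' | '{' expr '}'
    private int parseFactor() {
        skipWhitespace();
        if (pos >= src.length()) throw error("unexpected end of input");
        int open = "([{".indexOf(src.charAt(pos));
        if (open >= 0) {
            pos++;
            int value = parseExpr();
            skipWhitespace();
            char close = ")]}".charAt(open); // the closer must match the opener
            if (!accept(close)) throw error("expected '" + close + "'");
            return value;
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) throw error("expected a number or an opening bracket");
        try {
            return Integer.parseInt(src.substring(start, pos));
        } catch (NumberFormatException e) {
            throw error("number does not fit in int");
        }
    }

    private boolean accept(char expected) {
        if (pos < src.length() && src.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos);
    }
}
```

With this sketch, ExprParser.evaluate("5 + (2 * {3 * 10}) * [6 / 2]") returns 185, while inputs such as (5+2} or 5 2 throw. Junk characters need no separate pre-check: the letter 'a' is simply neither a digit nor an opening bracket, so parsing fails right there.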

Note that \\{-\\} means: "All characters whose Unicode values lie between the Unicode value of the { char and the Unicode value of the } char are valid" (so | slips through as well), which is not what you meant there; you wanted just \\{\\} (as in: the { character and the } character). The same goes for \\[-\\], which also lets the backslash through. Once you fix this, your regex will correctly 'detect' any characters that are flat out invalid, such as the letter 'a'. However, it will allow a great many things that aren't valid, such as:

  • (
  • {} * ()
  • 5+++2
  • ---5555---
  • ///
  • (5+2}
  • 99999999999999999999999999 (that's more than fits in an int)
  • 5 2
  • {(5 + 2) * (3 + [5 + 2)}

You can't write a regex that properly rejects all of these.
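A quick way to see both points is to run the pattern from the question next to a version with the brackets listed individually. This is only a sketch: CharClassDemo, ORIGINAL, FIXED and flagsInvalidChar are names I made up for the demonstration.

```java
import java.util.regex.Pattern;

/** Compares the question's pattern with a range-free variant; names are illustrative. */
final class CharClassDemo {
    // The pattern from the question, unchanged: the bracket pairs are written as ranges.
    static final Pattern ORIGINAL =
            Pattern.compile("[^0-9\\[-\\]\\{-\\}\\(-\\)+*\\/\\s-]");

    // Same idea with every bracket listed on its own, so no accidental ranges.
    static final Pattern FIXED =
            Pattern.compile("[^0-9\\[\\]{}()+*/\\s-]");

    /** True if the pattern finds at least one character it considers invalid. */
    static boolean flagsInvalidChar(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        // '|' lies between '{' and '}', so the original pattern lets it through.
        System.out.println(flagsInvalidChar(ORIGINAL, "1 | 2"));   // false
        System.out.println(flagsInvalidChar(FIXED, "1 | 2"));      // true

        // '\' lies between '[' and ']'.
        System.out.println(flagsInvalidChar(ORIGINAL, "1 \\ 2"));  // false
        System.out.println(flagsInvalidChar(FIXED, "1 \\ 2"));     // true

        // The fixed pattern catches junk characters...
        System.out.println(flagsInvalidChar(FIXED, "a"));          // true
        // ...but a character check cannot see structural problems.
        System.out.println(flagsInvalidChar(FIXED, "(5+2}"));      // false
        System.out.println(flagsInvalidChar(FIXED, "5+++2"));      // false
    }
}
```

The last two lines are the important ones: every character in (5+2} and 5+++2 is allowed, so a character-level check has nothing to complain about.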

The regular expression to validate just the mentioned set of characters may be simplified and used with String::matches :

// return true if expression contains only valid characters, false otherwise
private static boolean hasValidChars(String expr) {
    return expr.matches("[-+*/0-9\\s(){}\\[\\]]*");
}

For the given set of characters, only the square brackets [ and ] need to be escaped inside a character class; - does not need to be escaped when it is the first character in the class.


If the method should instead return true when an invalid character is detected, the existing expression can be negated with a negative lookahead:

// (?s) lets '.' match line terminators too, so multi-line input is not missed
private static boolean hasInvalidChars(String expr) {
    return expr.matches("(?s)(?![-+*/0-9\\s(){}\\[\\]]*$).*");
}

Alternatively, the character class itself can be negated with [^...]. Because matches checks the entire string, .* has to be added on both sides to find an invalid character anywhere in the expression:

// (?s) lets '.' match line terminators too, so multi-line input is not missed
private static boolean hasInvalidChars(String expr) {
    return expr.matches("(?s).*([^-+*/\\s0-9(){}\\[\\]]).*");
}
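Another option, sketched here as a variation rather than as part of the approach above, is to precompile the negated class and use Matcher.find(), which searches for a match anywhere in the input. That way no surrounding .* is needed and line terminators in the input need no special handling. The class name InvalidCharFinder and the constant INVALID_CHAR are made up for this example.

```java
import java.util.regex.Pattern;

/** find()-based variant of the invalid-character check; names are illustrative. */
final class InvalidCharFinder {
    // Matches any single character outside the allowed set.
    private static final Pattern INVALID_CHAR =
            Pattern.compile("[^-+*/0-9\\s(){}\\[\\]]");

    /** Returns true if the expression contains at least one disallowed character. */
    static boolean hasInvalidChars(String expr) {
        return INVALID_CHAR.matcher(expr).find();
    }
}
```

Compiling the pattern once also avoids recompiling it on every call, which String::matches does internally.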

Tests:

for (String expr : Arrays.asList("(123 + 456) - 789", "abc + 322", "(33 / [11 - x])")) {
    System.out.println(expr);
    System.out.println("invalid? " + hasInvalidChars(expr));
    System.out.println("valid? " + hasValidChars(expr));
    System.out.println("---");
}

Output:

(123 + 456) - 789
invalid? false
valid? true
---
abc + 322
invalid? true
valid? false
---
(33 / [11 - x])
invalid? true
valid? false
---
