
Regex for conditional expressions

I need a regex that can split an expression like this into tokens:

(6<=5) || (8+1)^2 >= 3 && 4 == 2   

The result should be a list like this:

(, 6, <=, 5, ), ||, (, 8, +, 1, ), ^, 2, >=, 3, &&, 4, ==, 2

I made this one, but it doesn't work. It gives me this result:

[(, 6, 5, ), (, 8, +, 1, ), ^, 2, 3, 4, 2]

This is the regex:

[-]?[0-9]*+([eE][-]?[0-9]+)?|([+-/*///^])|([/(/)])|(>=)|(<=)|(&&)|(==)|(||)

It does recognize the numbers and the arithmetic symbols, but it doesn't work with the symbols for the conditions (&&, ==, ||, <=, >=).

Do you know how to correct it?

Edit: this is the code:

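// Needs java.util.ArrayList, java.util.regex.Matcher, and java.util.regex.Pattern.
public void convertToList() {
    // Alternatives: number | arithmetic operator | parenthesis | comparison and logical operators
    String regex = "[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])";
    Matcher m3 = Pattern.compile(regex).matcher(this.stringExp);
    this.arrayExp = new ArrayList<String>(this.stringExp.length());
    // Each successful find() yields the next token; unmatched characters such as spaces are skipped.
    while (m3.find()) {
        this.arrayExp.add(m3.group());
    }
}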

But even with the regex corrected by m.butter, it doesn't work (same result as above).

Edit: The regex provided works; I made a stupid mistake with the input.

You have a couple of issues in your expression:

  • You didn't escape the range operator - in the character class [+-/*///^] . It can be written as [+\-/*^] or [-+/*^] (no need to escape it if it comes first or last).
  • You didn't escape the | in (||) . It should be (\|\|)
  • Your expression for numbers matches the empty string, which you don't want.

Also, a tip when tokenizing: put the longest token first in the expression in case of overlaps. That is, put <= before [<=] to get one token instead of two.
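To illustrate the ordering tip, here is a minimal sketch. The class name AlternationOrderDemo and the helper findAll are made up for this example; it only relies on java.util.regex trying alternatives from left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class AlternationOrderDemo {

    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Short alternative first: "<" is taken on its own, then "=" separately.
        System.out.println(findAll("[<=]|<=", "<=")); // prints [<, =]
        // Long alternative first: "<=" is matched as one token.
        System.out.println(findAll("<=|[<=]", "<=")); // prints [<=]
    }
}
```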

All in all you could use something like:

\d+|[<>=]=|&&|\|\||[-+*/^()]

Replace \d+ with something more complex for numbers if you want (but don't match empty strings).
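As a rough sketch, this is how that pattern could be used from Java on the expression in the question. The class name SimpleTokenizer and the method tokenize are assumptions for illustration. Note that the backslashes are doubled inside the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; only the regex itself comes from the answer above.
class SimpleTokenizer {

    // Regex \d+|[<>=]=|&&|\|\||[-+*/^()] written as a Java string literal.
    static final Pattern TOKEN = Pattern.compile("\\d+|[<>=]=|&&|\\|\\||[-+*/^()]");

    // Returns the tokens in order; characters that match nothing (such as spaces) are skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(6<=5) || (8+1)^2 >= 3 && 4 == 2"));
        // prints [(, 6, <=, 5, ), ||, (, 8, +, 1, ), ^, 2, >=, 3, &&, 4, ==, 2]
    }
}
```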

There are a few issues with your pattern.

  1. You are using | as an alternation. Consequently, you cannot possibly use | to match literal pipes as well (how would the regex engine distinguish?). Therefore, you need to escape the | that is supposed to match literally, or put it in a character class.

  2. Your escapes are the wrong way round. You need to use backslashes \ instead of forward slashes / .

  3. - in a character class denotes a range unless you put it as the first or last character. That is problematic in your [+-...] character class. Either escape the hyphen or move it to the first or last position in the class.

  4. Your first alternative (the number) allows empty matches, because everything in it is optional. That will give you a whole bunch of additional empty results you don't want. Replace the [0-9]*+ with [0-9]+ so that at least one digit is required.
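The empty-match problem from point 4 can be seen in isolation with a small sketch. It uses only the number alternative, before and after the fix; the class name EmptyMatchDemo and the helper findAll are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EmptyMatchDemo {

    // Number alternative as posted in the question: every part is optional.
    static final String OPTIONAL_DIGITS = "[-]?[0-9]*+([eE][-]?[0-9]+)?";
    // Number alternative after the fix: at least one digit is required.
    static final String REQUIRED_DIGITS = "[-]?[0-9]+([eE][-]?[0-9]+)?";

    // Collects every match of the given regex in the input, including empty ones.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The first list contains empty strings wherever no digit is found.
        System.out.println(findAll(OPTIONAL_DIGITS, "6<=5"));
        // The second list contains only the two numbers.
        System.out.println(findAll(REQUIRED_DIGITS, "6<=5"));
    }
}
```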

Applying all of that gives:

[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])

Note that you don't need to escape ( , ) , and ^ inside a character class (unless the ^ is the first character).

Also note that to write this as a Java string you need to double up all backslashes:

String str = "[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])";

Finally, you can optimize this quite a lot if you get rid of all unnecessary parentheses, and make the necessary ones non-capturing (I've also merged the character classes):

String str = "[-]?[0-9]+(?:[eE][-]?[0-9]+)?|[-+/*\\\\^()]|>=|<=|&&|==|[|][|]";

Of course, this only works if you don't need the captures to determine which kind of token each match was.
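If you do want to know the kind of each token, one option is to keep one capturing group per kind. The following is only a sketch under assumptions: the class name TokenKindDemo, the group names (number, comparison, logical, arithmetic, paren), and the "kind:text" output format are all made up for illustration, and named groups require Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; group names and output format are illustrative assumptions.
class TokenKindDemo {

    static final String[] KINDS = {"number", "comparison", "logical", "arithmetic", "paren"};

    // Same alternatives as above, but each kind sits in its own named group.
    static final Pattern TOKEN = Pattern.compile(
            "(?<number>[-]?[0-9]+(?:[eE][-]?[0-9]+)?)"
            + "|(?<comparison>>=|<=|==)"
            + "|(?<logical>&&|[|][|])"
            + "|(?<arithmetic>[-+/*\\\\^])"
            + "|(?<paren>[()])");

    // Returns entries of the form "kind:text", one per token.
    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String kind : KINDS) {
                // group(name) is null for groups that did not take part in this match.
                if (m.group(kind) != null) {
                    out.add(kind + ":" + m.group());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("(6<=5) || 4"));
        // prints [paren:(, number:6, comparison:<=, number:5, paren:), logical:||, number:4]
    }
}
```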

