
Regex for conditional expressions

I need a regex that can split an expression like this into tokens:

(6<=5) || (8+1)^2 >= 3 && 4 == 2   

The result should be a list like this:

(, 6, <=, 5, ), ||, (, 8, +, 1, ), ^, 2, >=, 3, &&, 4, ==, 2

I wrote this one, but it doesn't work. It gives me this result:

[(, 6, 5, ), (, 8, +, 1, ), ^, 2, 3, 4, 2]

This is the regex:

[-]?[0-9]*+([eE][-]?[0-9]+)?|([+-/*///^])|([/(/)])|(>=)|(<=)|(&&)|(==)|(||)

It does recognize the numbers and the arithmetic symbols, but it doesn't work with the symbols for the conditions (&&, ==, ||, <=, >=).

Do you know how to correct it?

Edit: this is the code:

public void convertToList() {
    String regex = "[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])";
    Matcher m3 = Pattern.compile(regex).matcher(this.stringExp);
    this.arrayExp = new ArrayList<String>(this.stringExp.length());
    while (m3.find()) {
        this.arrayExp.add(m3.group());
    }
}

But even with the regex corrected by m.butter it doesn't work (same result as above).

Edit: The regex provided works; I made a stupid mistake with the input.

You have a couple of issues in your expression:

  • You didn't escape the range operator - in the character class [+-/*///^]. It can be written as [+\-/*^] or [-+/*^] (no need to escape it if it comes first or last).
  • You didn't escape the | in (||); it should be (\|\|).
  • Your expression for numbers matches the empty string, which you don't want.
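
To illustrate the last point, here is a minimal sketch (the class name `EmptyMatchDemo`, the helper `findAll`, and the shortened patterns are made up for this example). When the number alternative can match the empty string, it succeeds at every position before the later alternatives are even tried:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question's code.
class EmptyMatchDemo {
    // Collects every match that Matcher.find() reports, including empty ones.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // [0-9]* may match nothing, so it "wins" at '(' and ')' with an empty match
        // and the [()] alternative never gets a chance.
        System.out.println(findAll("[0-9]*|[()]", "(6)")); // [, 6, , ]
        // [0-9]+ needs at least one digit, so the parentheses are matched as tokens.
        System.out.println(findAll("[0-9]+|[()]", "(6)")); // [(, 6, )]
    }
}
```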

Also a tip when tokenizing: put the longest token first in the expression, in case of overlaps. That is, put <= before [<=] to get one token instead of two.
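
A small sketch of why the order matters in Java, where alternatives are tried left to right and the first one that succeeds is taken (the class `AlternationOrder` and its `findAll` helper are invented for this illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two patterns differ between the calls.
class AlternationOrder {
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Short alternative first: "<=" is split into two one-character tokens.
        System.out.println(findAll("[<=]|<=", "6<=5")); // [<, =]
        // Long alternative first: "<=" comes out as a single token.
        System.out.println(findAll("<=|[<=]", "6<=5")); // [<=]
    }
}
```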

All in all, you could use something like:

\d+|[<>=]=|&&|\|\||[-+*/^()]

Replace \d+ with something more complex for numbers if you want (but don't match empty strings).
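
As a rough sketch, this is how that pattern could be used from Java on the expression from the question. The class name `ConditionTokenizer` and the method `tokenize` are made up here; note that every backslash is doubled inside the Java string literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern \d+|[<>=]=|&&|\|\||[-+*/^()]
class ConditionTokenizer {
    private static final Pattern TOKEN =
            Pattern.compile("\\d+|[<>=]=|&&|\\|\\||[-+*/^()]");

    // Returns the tokens in order; whitespace and unknown characters are skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [(, 6, <=, 5, ), ||, (, 8, +, 1, ), ^, 2, >=, 3, &&, 4, ==, 2]
        System.out.println(tokenize("(6<=5) || (8+1)^2 >= 3 && 4 == 2"));
    }
}
```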

There are a few issues with your pattern.

  1. You are using | as an alternation. Consequently, you cannot possibly use | to match literal pipes as well (how would the regex engine distinguish?). Therefore, you need to escape the | that is supposed to match literally, or put it in a character class.

  2. Your escapes are the wrong way round. You need to use backslashes \ instead of forward slashes / .

  3. - in a character class denotes a range unless you put it as the first or last character. That is problematic in your [+-...] character class. Either escape the hyphen or move it to the first or last position in the class.

  4. Your first alternative (the number) allows empty matches, because everything is optional. That will give you a whole bunch of additional empty results you don't want. Remove the * after the number.
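
Points 1 and 3 can be checked with a few whole-string matches. This is only an illustrative sketch; the class `RegexPitfalls` and the helper `fullMatch` are not from the original post:

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the pipe and hyphen pitfalls.
class RegexPitfalls {
    // True only if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "+-/" inside a class is the range '+' .. '/', which also contains ',' and '.'.
        System.out.println(fullMatch("[+-/]", ","));    // true
        // With the hyphen first, the class holds just the three listed characters.
        System.out.println(fullMatch("[-+/]", ","));    // false

        // An unescaped (||) is three empty alternatives: it matches "" but not "||".
        System.out.println(fullMatch("(||)", ""));      // true
        System.out.println(fullMatch("(||)", "||"));    // false
        // Escaping the pipes or using character classes matches the literal "||".
        System.out.println(fullMatch("\\|\\|", "||"));  // true
        System.out.println(fullMatch("[|][|]", "||"));  // true
    }
}
```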

Applying all of that gives:

[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])

Note that you don't need to escape (, ), and ^ inside a character class (unless the ^ is the first character).

Also note that to write this as a Java string you need to double up all backslashes:

str = "[-]?[0-9]+([eE][-]?[0-9]+)?|([-+/*\\\\^])|([()])|(>=)|(<=)|(&&)|(==)|([|][|])"

Finally, you can optimize this quite a lot if you get rid of all unnecessary parentheses and make the necessary ones non-capturing (I've also merged the character classes):

str = "[-]?[0-9]+(?:[eE][-]?[0-9]+)?|[-+/*\\\\^()]|>=|<=|&&|==|[|][|]"

Of course, this only works if you don't need the captures to determine which kind of token each match was.
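
If the kind of each token does matter, one possible approach is to keep one group per kind and give it a name. The sketch below is an assumption about how that could look, not something from the answer itself: the class `TokenKinds`, the method `classify`, and the group names are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier: one named capturing group per token kind.
class TokenKinds {
    private static final String[] KINDS =
            {"number", "comparison", "logical", "arithmetic", "paren"};

    private static final Pattern TOKEN = Pattern.compile(
            "(?<number>-?[0-9]+(?:[eE]-?[0-9]+)?)"
            + "|(?<comparison>>=|<=|==)"
            + "|(?<logical>&&|[|][|])"
            + "|(?<arithmetic>[-+/*^])"
            + "|(?<paren>[()])");

    // Returns entries of the form "kind:token", in input order.
    static List<String> classify(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String kind : KINDS) {
                // Only the group of the alternative that matched is non-null.
                if (m.group(kind) != null) {
                    result.add(kind + ":" + m.group());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("(6<=5) || 2^3"));
    }
}
```

For the input above, each token is paired with its kind, for example `comparison:<=` and `logical:||`.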

Working demo
