
Regex Evaluating Matches Incorrectly

I am having trouble getting a Java-flavored regular expression to evaluate a match correctly. I define the following regular expressions:

//Any digit
static String NUM = "[0-9]";

//Exponent with only 3 digits specified
static String EXPONENT = "([Ee][+-]?" + NUM + "(" + NUM + "(" + NUM + ")?)?)";

static String NUMBER = "([+-]?((" + NUM + NUM + "*.?" + NUM + "*)|(." + NUM
        + NUM + "*))" + EXPONENT + "?)";

static String S_COMMA_S = "(( )*,( )*)";

static String NUM_DATA = "(" + NUMBER + "(" + S_COMMA_S + NUMBER + ")*)";

Given how NUM_DATA is defined, a possible match would be "123, 456". As far as I understand, only a list of numbers that ends with a number, not a comma, should be valid. However, according to the following test method, it also matches a number list ending in a comma:

public static void main(String[] args) {
        System.out.println(NUM_DATA);
        String s = "123";
        System.out.println(s.matches(NUM_DATA));
        s = "123, 456";
        System.out.println(s.matches(NUM_DATA));
        s = "123, 456,";//HANGING COMMA, SHOULD NOT MATCH
        System.out.println(s.matches(NUM_DATA));
}

Which results in the following output:

(([+-]?(([0-9][0-9]*.?[0-9]*)|(.[0-9][0-9]*))([Ee][+-]?[0-9]([0-9]([0-9])?)?)?)((( )*,( )*)([+-]?(([0-9][0-9]*.?[0-9]*)|(.[0-9][0-9]*))([Ee][+-]?[0-9]([0-9]([0-9])?)?)?))*)
true
true
true

Where are my assumptions going wrong? Or is this behavior incorrect?

EDIT: I suppose I should post the behavior I am expecting:

Matches: (Any list of comma separated numbers, including one number)
    1.222
    1.222, 324.4
    2.51e123, 3e2
    -.123e-12, 32.1231, 1e1, .111, -1e-1
Non-Matches:
    123.321,
    ,
    , 123.321

In your NUMBER regex you have an unescaped . which matches any character, including the comma at the end of the input. You need to escape it as \. in the regex. Because a backslash must itself be escaped in a Java string literal, it is written "\\." in a String.
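To make the fix concrete, here is a minimal sketch that keeps the question's definitions and changes only the two dots in NUMBER to "\\.". The class name `EscapedDotDemo` and the helper `isNumberList` are made up for illustration; everything else mirrors the original code.

```java
public class EscapedDotDemo {
    // Any digit
    static final String NUM = "[0-9]";

    // Exponent with at most 3 digits, unchanged from the question
    static final String EXPONENT = "([Ee][+-]?" + NUM + "(" + NUM + "(" + NUM + ")?)?)";

    // Same shape as the question's NUMBER, but each '.' is now "\\."
    // (the regex \. matches only a literal dot, not any character)
    static final String NUMBER = "([+-]?((" + NUM + NUM + "*\\.?" + NUM + "*)|(\\." + NUM
            + NUM + "*))" + EXPONENT + "?)";

    static final String S_COMMA_S = "(( )*,( )*)";

    static final String NUM_DATA = "(" + NUMBER + "(" + S_COMMA_S + NUMBER + ")*)";

    // Hypothetical helper: true only if the whole string is a comma-separated number list
    static boolean isNumberList(String s) {
        return s.matches(NUM_DATA);
    }

    public static void main(String[] args) {
        System.out.println(isNumberList("123"));       // true
        System.out.println(isNumberList("123, 456"));  // true
        System.out.println(isNumberList("123, 456,")); // false: the hanging comma is rejected
    }
}
```

In the original pattern the trailing comma of "123, 456," was consumed by the `.?` that follows the digits of 456, which is why the third test printed true.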

Your regex can be refactored to a shorter one:

^([+-]?(?:\.\d+|\d+(?:\.\d+)?)(?:[Ee][+-]?\d+)?)(?: *, *([+-]?(?:\.\d+|\d+(?:\.\d+)?)(?:[Ee][+-]?\d+)?))*$

This will still meet your requirements, as you can see in this RegEx Demo.

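Your numbers are captured in matched groups. Note that in Java a repeated capturing group retains only its last iteration, so group 1 holds the first number and group 2 holds only the last of the remaining numbers.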

I recommend using this regex with the Pattern and Matcher API to avoid compiling this long regex again and again in String#matches.
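As a rough sketch of that recommendation, the refactored regex can be compiled once into a static Pattern and reused. The class name `NumberListValidator` and the methods `isValid` and `numbers` are assumptions made for this example. Because a repeated group keeps only its last match, the sketch extracts the individual numbers with a second pattern and `Matcher#find` instead of reading groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberListValidator {
    // One number: optional sign, ".digits" or "digits[.digits]", optional exponent
    private static final String NUMBER =
            "[+-]?(?:\\.\\d+|\\d+(?:\\.\\d+)?)(?:[Ee][+-]?\\d+)?";

    // Whole list, compiled once and reused for every call
    private static final Pattern NUM_DATA =
            Pattern.compile("^(" + NUMBER + ")(?: *, *(" + NUMBER + "))*$");

    // A single number, used to pull out each element of a validated list
    private static final Pattern SINGLE = Pattern.compile(NUMBER);

    // True only if the entire input is a comma-separated list of numbers
    static boolean isValid(String s) {
        return NUM_DATA.matcher(s).matches();
    }

    // Returns every number in a valid list, or an empty list if the input is invalid
    static List<String> numbers(String s) {
        List<String> result = new ArrayList<>();
        if (!isValid(s)) {
            return result;
        }
        Matcher m = SINGLE.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isValid("123, 456,"));    // false
        System.out.println(numbers("1.222, 324.4")); // [1.222, 324.4]
    }
}
```

The refactored pattern is slightly stricter than the original one in the question: it does not accept a number with a trailing dot such as "1.", and it places no limit on the number of exponent digits.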
