
regex optional group capturing JAVA

I have a pattern where a user specifies:

1998-2010:Make:model:trim:engine

trim and engine are optional. If present, I should capture them; if not, the matcher should still validate the year, make, and model (YMM).

([0-9]+-*[0-9]+):(.*):(.*):(.*):(.*)

This matches only if all five fields are there, but how do I make the last two, and only those two, fields optional?

Using a regular expression and ? , the “zero or one quantifier”

You can use ? to match zero or one of something, which is what you want to do with the last bit. However, your pattern needs a bit of modification: use [^:]* rather than .* for each field, because .* can also match colons and so can run across field boundaries. Some sample code and its output follow. The regular expression I ended up with was:

([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?
|-----| |-----| |-----|    |-----|      |-----|
   a       a       a          a            a

                       |-----------||-----------|
                             b            b

Each a matches a sequence of non-colon characters (although you'd want to modify the first one to match years), and each b is a non-capturing group (so it starts with ?: ) that matches zero or one time (because it has the final ? quantifier). This means that the fourth and fifth fields are optional. The sample code shows that this pattern matches when three, four, or five fields are present, and does not match when there are more than five fields or fewer than three.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        final Pattern p = Pattern.compile( "([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?" );
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            final Matcher m = p.matcher( string );
            if ( m.matches() ) {
                System.out.println( "\n=== Matches for: "+string+" ===" );
                final int count = m.groupCount();
                for ( int j = 0; j <= count; j++ ) {
                    System.out.println( j + ": "+ m.group( j ));
                }
            }
            else {
                System.out.println( "\n=== No matches for: "+string+" ===" );
            }
        }
    }
}

Output:

=== No matches for: a ===

=== No matches for: a:b ===

=== Matches for: a:b:c ===
0: a:b:c
1: a
2: b
3: c
4: null
5: null

=== Matches for: a:b:c:d ===
0: a:b:c:d
1: a
2: b
3: c
4: d
5: null

=== Matches for: a:b:c:d:e ===
0: a:b:c:d:e
1: a
2: b
3: c
4: d
5: e

=== No matches for: a:b:c:d:e:f ===

=== No matches for: a:b:c:d:e:f:g ===

=== No matches for: a:b:c:d:e:f:g:h ===
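
As noted above, the first group would need to change to match years. A minimal sketch of that variant, applied to the input from the question, might look like the following. The class name VehicleSpecRegex and the parse helper are hypothetical, and the sketch assumes that years are always four digits and that make and model must be non-empty; neither assumption comes from the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VehicleSpecRegex {
    // Group 1: a year or a year range (assumption: four digits each).
    // Groups 2-3: make and model (assumption: non-empty).
    // Groups 4-5: optional trim and engine, each inside an optional non-capturing group.
    private static final Pattern SPEC = Pattern.compile(
            "(\\d{4}(?:-\\d{4})?):([^:]+):([^:]+)(?::([^:]*))?(?::([^:]*))?" );

    // Returns {years, make, model, trim, engine} with null for an absent
    // optional field, or null if the input does not match at all.
    static String[] parse( String input ) {
        final Matcher m = SPEC.matcher( input );
        if ( !m.matches() ) {
            return null;
        }
        final String[] fields = new String[ m.groupCount() ];
        for ( int i = 0; i < fields.length; i++ ) {
            fields[i] = m.group( i + 1 );
        }
        return fields;
    }

    public static void main( String[] args ) {
        // [1998-2010, Make, model, trim, engine]
        System.out.println( Arrays.toString( parse( "1998-2010:Make:model:trim:engine" )));
        // [1998-2010, Make, model, null, null]
        System.out.println( Arrays.toString( parse( "1998-2010:Make:model" )));
        // null
        System.out.println( Arrays.toString( parse( "1998-2010:Make" )));
    }
}
```

Because matches() requires the whole input to match, a sixth field still causes the parse to fail, just as in the sample output above.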

While it's certainly possible to match this kind of string by using a regular expression, it does seem like it might be easier to just split the string on : and check how many values you get back. That doesn't necessarily do other kinds of checking (eg, characters in each field), so maybe splitting isn't quite so useful in whatever non-minimal situation is motivating this.

Using String.split and a limit parameter

I noticed your comment on another post that recommended using String.split(String) (emphasis added):

Yes I know this function, but it work for me cause I have a string which is a:b:c:d:e:f:g:h.. but I just want to group the data as a:b:c:d:e if any as one and the rest of the string as another group

It's worth noting that there's a version of split that takes one more parameter, String.split(String,int) . The second parameter is a limit, described as:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n , and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

This means that you could use split and the limit 6 to get up to five fields from your input, and you'd have the remaining input as the last string. You'd still have to check whether you had at least 3 elements, to make sure that there was enough input, but all in all, this seems like it might be a bit simpler.

import java.util.Arrays;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            System.out.println( "\n== Splits for "+string+" ===" );
            System.out.println( Arrays.toString( string.split( ":", 6 )));
        }
    }
}

Output:

== Splits for a ===
[a]

== Splits for a:b ===
[a, b]

== Splits for a:b:c ===
[a, b, c]

== Splits for a:b:c:d ===
[a, b, c, d]

== Splits for a:b:c:d:e ===
[a, b, c, d, e]

== Splits for a:b:c:d:e:f ===
[a, b, c, d, e, f]

== Splits for a:b:c:d:e:f:g ===
[a, b, c, d, e, f:g]

== Splits for a:b:c:d:e:f:g:h ===
[a, b, c, d, e, f:g:h]
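
The length check mentioned above could be sketched roughly as follows. The class name VehicleSpecSplit and the parse helper are hypothetical; the sketch simply drops anything past the fifth field, which may or may not be what the real use case needs.

```java
import java.util.Arrays;

class VehicleSpecSplit {
    // Returns the first three to five fields, or null when fewer than three
    // are present. With a limit of 6, everything after the fifth colon ends
    // up in the sixth array slot, which this sketch discards.
    static String[] parse( String input ) {
        final String[] parts = input.split( ":", 6 );
        if ( parts.length < 3 ) {
            return null;
        }
        return Arrays.copyOf( parts, Math.min( parts.length, 5 ));
    }

    public static void main( String[] args ) {
        // [a, b, c, d, e]
        System.out.println( Arrays.toString( parse( "a:b:c:d:e:f:g:h" )));
        // [1998-2010, Make, model]
        System.out.println( Arrays.toString( parse( "1998-2010:Make:model" )));
        // null
        System.out.println( Arrays.toString( parse( "a:b" )));
    }
}
```

If the remainder is needed as its own group, it is available as parts[5] whenever parts.length is 6.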

Why not skip the regex and use split(":")? It seems straightforward. From the length of the resulting array you will then know whether or not trim and engine were provided.

String str = "1998-2010:Make:model:trim:engine";
String[] parts  = str.split(":");
//parts[0] == Y
//parts[1] == M
//parts[2] == M
//etc
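
A small sketch of reading the optional fields from the array length follows. The class name PlainSplitFields and the field helper are hypothetical. It also shows one caveat worth knowing: split(":") discards trailing empty strings, so empty trim and engine fields at the end of the input do not show up in the length unless a negative limit is passed.

```java
class PlainSplitFields {
    // Hypothetical helper: the field at index i, or null if the input has no such field.
    static String field( String[] parts, int i ) {
        return i < parts.length ? parts[i] : null;
    }

    public static void main( String[] args ) {
        final String[] parts = "1998-2010:Make:model:trim".split( ":" );
        System.out.println( "trim = " + field( parts, 3 ));    // trim = trim
        System.out.println( "engine = " + field( parts, 4 ));  // engine = null

        // Trailing empty fields are dropped by split(":") but kept with a negative limit.
        System.out.println( "1998-2010:Make:model::".split( ":" ).length );      // 3
        System.out.println( "1998-2010:Make:model::".split( ":", -1 ).length );  // 5
    }
}
```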

Edit: As others have mentioned, String.split uses a regex pattern too. In my opinion that doesn't really matter though. To have a truly regex-less solution, use StringUtils.split from Apache Commons (which does not use a regex at all) :)
