
Split a string using regex into 3 parts

I have a string like this:

DATA/2019-00-01-23.x

I want to get three tokens (text, date and hour):

[DATA, 2019-00-01, 23]

I tried this:

String x = "DATA/2019-00-01-23.x";
System.out.println(Arrays.toString(x.split("/|-[0-9]+.")));

But it returns:

[DATA, 2019, 01, x]
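To see where the four tokens come from, it helps to list what the delimiter regex actually matches. The sketch below (class and method names are illustrative, not from the question) runs the same pattern with Matcher.find(). Because the unescaped . matches any character, -00- and -23. are consumed as whole delimiters, leaving 2019, 01 and x as the pieces in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Collects every substring that the question's delimiter regex matches,
    // i.e. exactly the pieces that String.split() throws away.
    static List<String> delimiters(String input) {
        Matcher m = Pattern.compile("/|-[0-9]+.").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(delimiters("DATA/2019-00-01-23.x")); // [/, -00-, -23.]
    }
}
```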

You can actually do this with a single split:

x.split("/|-(?=[^-]*$)|\\D+$")

In a Java demo this outputs [DATA, 2019-00-01, 23].

The regex splits at:

  • / - a slash
  • | - or
  • -(?=[^-]*$) - last hyphen in the string
  • | - or
  • \D+$ - one or more non-digit characters at the end of the string. Because String.split(regex) runs with a limit of 0, trailing empty strings are discarded, so this final match does not leave an empty item at the end of the resulting array.
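The effect of the limit argument described in the last bullet can be checked directly. This sketch (the class name is illustrative) splits the same string once with the default limit and once with a negative limit, which tells split() to keep trailing empty strings.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // The delimiter regex from the answer above
    static final String DELIMS = "/|-(?=[^-]*$)|\\D+$";

    public static void main(String[] args) {
        String x = "DATA/2019-00-01-23.x";
        // Default limit 0: the empty string after the final ".x" match is dropped
        String[] parts = x.split(DELIMS);
        // Limit -1: trailing empty strings are kept
        String[] raw = x.split(DELIMS, -1);
        System.out.println(Arrays.toString(parts)); // [DATA, 2019-00-01, 23]
        System.out.println(Arrays.toString(raw));   // [DATA, 2019-00-01, 23, ]
    }
}
```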

Solution 1

You can remove the part after the dot first, then split with /|(\\-)(?!.*\\-), which matches either a slash or the last hyphen:

String[] split = "DATA/2019-00-01-23.x".replaceFirst("\\..*$", "")
    .split("/|(\\-)(?!.*\\-)"); // [DATA, 2019-00-01, 23]
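The negative lookahead (?!.*\\-) means "no further hyphen anywhere to the right", so the hyphen it guards is the last one. A quick way to check this, sketched here with illustrative names, is to compare the match position with String.lastIndexOf. Outside a character class a plain - needs no escaping, so -(?!.*-) is equivalent. Keep in mind that . does not match line terminators by default, so on multi-line input the lookahead only looks to the end of the current line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastHyphenDemo {
    // A hyphen not followed by another hyphen anywhere later is the last one
    static final Pattern LAST_HYPHEN = Pattern.compile("-(?!.*-)");

    // Returns the index of the last hyphen, or -1 if there is none
    static int lastHyphenIndex(String s) {
        Matcher m = LAST_HYPHEN.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String s = "DATA/2019-00-01-23"; // the string after replaceFirst removed ".x"
        System.out.println(lastHyphenIndex(s)); // 15
        System.out.println(s.lastIndexOf('-')); // 15
    }
}
```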

Solution 2

I would go with Pattern and Matcher and capturing groups, using the regex (.*?)/(.*?)-([^-]+)\..* (written as "(.*?)/(.*?)-([^-]+)\\..*" in a Java string literal):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("(.*?)/(.*?)-([^-]+)\\..*");
Matcher matcher = pattern.matcher("DATA/2019-00-01-23.x");
if (matcher.find()) {
    System.out.println(matcher.group(1)); // DATA
    System.out.println(matcher.group(2)); // 2019-00-01
    System.out.println(matcher.group(3)); // 23
}
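If numbered groups feel fragile, the same pattern can use named groups (supported since Java 7). This sketch is a variation, not part of the original answer: parse is a hypothetical helper, and it uses matches() rather than find() so that the whole string must fit the pattern, returning null when it does not.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupsDemo {
    // Same shape as the pattern above, but with named groups
    static final Pattern P = Pattern.compile("(?<text>.*?)/(?<date>.*?)-(?<hour>[^-]+)\\..*");

    // Hypothetical helper: returns {text, date, hour}, or null if the input does not match
    static String[] parse(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) { // matches() requires the entire input to fit the pattern
            return null;
        }
        return new String[] { m.group("text"), m.group("date"), m.group("hour") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("DATA/2019-00-01-23.x"))); // [DATA, 2019-00-01, 23]
        System.out.println(parse("no-slash-here"));                         // null
    }
}
```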

Or, with Java 9+, you can use Matcher.results():

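import java.util.regex.Pattern;
import java.util.stream.Stream;

// Matcher.results() (Java 9+) streams one MatchResult per match
String[] result = Pattern.compile("(.*?)/(.*?)-([^-]+)\\..*")
        .matcher("DATA/2019-00-01-23.x")
        .results()
        .flatMap(grps -> Stream.of(grps.group(1), grps.group(2), grps.group(3)))
        .toArray(String[]::new);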

Printing result with Arrays.toString(result) outputs:

[DATA, 2019-00-01, 23]
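Matcher.results() returns a Stream<MatchResult> with one element per match; the stream version above then flattens each match into its three groups. It pays off most on input containing several entries. The sketch below assumes a hypothetical space-separated input (not from the question) and swaps in a tighter pattern, because the trailing .* in the pattern above would swallow everything after the first entry. It collects with Collectors.toList() since Stream.toList() only exists from Java 16.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResultsDemo {
    // Assumed entry format: UPPERCASE text, '/', date, '-', hour, '.', extension
    static final Pattern ENTRY = Pattern.compile("([A-Z]+)/([0-9-]+)-([0-9]+)\\.\\w+");

    // Returns one {text, date, hour} row per entry found in the input
    static List<String[]> parseAll(String input) {
        return ENTRY.matcher(input)
                .results() // Java 9+: one MatchResult per successful find()
                .map(r -> new String[] { r.group(1), r.group(2), r.group(3) })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String[] row : parseAll("DATA/2019-00-01-23.x LOG/2020-01-02-07.y")) {
            System.out.println(Arrays.toString(row));
        }
        // [DATA, 2019-00-01, 23]
        // [LOG, 2020-01-02, 07]
    }
}
```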

Use capturing groups to extract the three parts.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    private static final Pattern PATTERN = Pattern.compile("(.+)/([-0-9]+)-([0-9]{1,2})\\..*");

    public static void main(String... args) {
        Matcher matcher = PATTERN.matcher("DATA/2019-00-01-23.x");

        // groupCount() is fixed by the pattern (always 3), so matches() is the only check needed
        if (matcher.matches()) {
            String text = matcher.group(1);
            String date = matcher.group(2);
            String hour = matcher.group(3);
            System.out.println(text + '\t' + date + '\t' + hour);
        }
    }
}

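Dissected: (.+) / ([-0-9]+) - ([0-9]{1,2}) \..*

  • (.+) Everything before the /
  • ([-0-9]+) The date: digits, which may contain -
  • - A literal hyphen; because it is required, the greedy date group has to give back the last hyphen, so it cannot swallow the hour
  • ([0-9]{1,2}) The hour: one or two digits
  • \..* A literal period, then the rest of the string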
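Because the hour group is ([0-9]{1,2}), the pattern accepts one- or two-digit hours and rejects anything longer. A small check, with hour as an illustrative helper name, makes the boundary visible:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HourWidthDemo {
    // Pattern from the answer above
    static final Pattern PATTERN = Pattern.compile("(.+)/([-0-9]+)-([0-9]{1,2})\\..*");

    // Returns the hour group, or null when the whole input does not match
    static String hour(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(hour("DATA/2019-00-01-23.x"));  // 23
        System.out.println(hour("DATA/2019-00-01-3.x"));   // 3
        System.out.println(hour("DATA/2019-00-01-123.x")); // null
    }
}
```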

