
Java parse string using regex into variables

I need to extract variables from a string.

String format = "x:y";
String string = "Marty:McFly";

Then

String x = "Marty";
String y = "McFly";

However, the format can be anything. For example, it could be y?x, in which case the string would be McFly?Marty.

How can I solve this using regex?

Edit: current solution

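        // X and Y are presumably String constants holding the names "x" and "y".
        String delimiter = format.replace(Y, "").replace(X, "");
        // split() takes a regex, so quote the delimiter to match it literally.
        delimiter = Pattern.quote(delimiter);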

        String strings[] = string.split(delimiter);

        String x; 
        String y;
        if(format.startsWith(X)){
             x = strings[0];
             y = strings[1];
        }else{
             y = strings[0];
             x = strings[1];
        }

        System.out.println(x);
        System.out.println(y);

This works well, but I would prefer a cleaner solution.

There is no need for regex at all.

public static void main(String[] args) {
    test("x:y", "Marty:McFly");
    test("y?x", "McFly?Marty");
}
public static void test(String format, String input) {
    if (format.length() != 3 || Character.isLetterOrDigit(format.charAt(1))
                             || (format.charAt(0) != 'x' || format.charAt(2) != 'y') &&
                                (format.charAt(0) != 'y' || format.charAt(2) != 'x'))
        throw new IllegalArgumentException("Invalid format: \"" + format + "\"");
    int idx = input.indexOf(format.charAt(1));
    if (idx == -1 || input.indexOf(format.charAt(1), idx + 1) != -1)
        throw new IllegalArgumentException("Invalid input: \"" + input + "\"");
    String x, y;
    if (format.charAt(0) == 'x') {
        x = input.substring(0, idx);
        y = input.substring(idx + 1);
    } else {
        y = input.substring(0, idx);
        x = input.substring(idx + 1);
    }
    System.out.println("x = " + x);
    System.out.println("y = " + y);
}

Output

x = Marty
y = McFly
x = Marty
y = McFly

If the format string can be changed to be a regex, then using named-capturing groups will make it very simple:

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
    test("(?<x>.*?):(?<y>.*)", "Marty:McFly");
    test("(?<y>.*?)\\?(?<x>.*)", "McFly?Marty");
}
public static void test(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (! m.matches())
        throw new IllegalArgumentException("Invalid input: \"" + input + "\"");
    String x = m.group("x");
    String y = m.group("y");
    System.out.println("x = " + x);
    System.out.println("y = " + y);
}

Same output as above, including value order.
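If the format has to stay in the simple "x:y" form, one option is to translate it into a named-group regex before matching. The following is only a sketch under some assumptions: every ASCII letter in the format is a variable name, every other character is a literal delimiter, and names are separated by at least one delimiter. The class FormatToRegex and its methods isName, toPattern and parse are hypothetical names introduced for illustration. A format that repeats a name (for example "x:x") makes Pattern.compile throw, because group names must be unique.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatToRegex {

    // Assumption: only ASCII letters count as variable names.
    static boolean isName(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Turns each name into a lazy named group and quotes every other character.
    static Pattern toPattern(String format) {
        StringBuilder regex = new StringBuilder();
        for (char c : format.toCharArray()) {
            if (isName(c)) {
                regex.append("(?<").append(c).append(">.*?)");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured values keyed by name, in the order the names appear in the format.
    static Map<String, String> parse(String format, String input) {
        Matcher m = toPattern(format).matcher(input);
        if (!m.matches())
            throw new IllegalArgumentException("Invalid input: \"" + input + "\"");
        Map<String, String> values = new LinkedHashMap<>();
        for (char c : format.toCharArray()) {
            if (isName(c)) {
                String name = String.valueOf(c);
                values.put(name, m.group(name));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("x:y", "Marty:McFly")); // {x=Marty, y=McFly}
        System.out.println(parse("y?x", "McFly?Marty")); // {y=McFly, x=Marty}
    }
}
```

Because the groups are lazy and the whole input must match, an input with extra delimiters such as "a:b:c" puts everything after the first delimiter into the last variable.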

You can use the following regex: (\w)(\W)(\w), written as "(\\w)(\\W)(\\w)" in a Java string literal.

This matches a single word character (a letter, digit, or underscore), followed by a single non-word character, followed by another single word character. The parentheses capture each part, so group 1 is parameter 1, group 2 is the delimiter, and group 3 is parameter 2.

Comparing parameter 1 with parameter 2 determines their lexical order, and therefore which part of the input belongs to which variable.
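To make the grouping concrete, here is a small sketch of what the three groups contain for the format "y?x", and how the comparison can be read. The class FormatGroups and its method groups are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatGroups {

    // Returns { first name, delimiter, second name } for a three-character format.
    static String[] groups(String format) {
        Matcher m = Pattern.compile("(\\w)(\\W)(\\w)").matcher(format);
        if (!m.matches())
            throw new IllegalArgumentException("Invalid format: \"" + format + "\"");
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("y?x");
        System.out.println("param1    = " + g[0]); // y
        System.out.println("delimiter = " + g[1]); // ?
        System.out.println("param2    = " + g[2]); // x
        // A positive result means param1 sorts after param2, so the values are swapped.
        System.out.println("swapped   = " + (g[0].compareTo(g[2]) > 0)); // true
    }
}
```

Only the sign of compareTo is meaningful here: for strings it returns the difference between the first differing characters, which is not limited to -1, 0 and 1.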

Sample

  public static void main(String[] args)  {
    testString("x:y", "Marty:McFly");
    testString("x?y", "Marty?McFly");
    testString("y:x", "Marty:McFly");
    testString("y?x", "Marty?McFly");
  }

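  /**
   * Reads the two names and the delimiter from the format, splits the
   * string on that delimiter, and prints the values in name order.
   */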
  private static void testString(String format, String string) {
    String regex = "(\\w)(\\W)(\\w)";

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(format);

    if (!matcher.find()) throw new IllegalArgumentException("no match found");

    String delimiter = matcher.group(2);

    String param1 = matcher.group(1); 
    String param2 = matcher.group(3); 


    // split() takes a regex, so quote the delimiter to match it literally.
    String[] split = string.split(Pattern.quote(delimiter));
    String x;
    String y;
    // compareTo() only guarantees the sign of its result, so test the sign
    // rather than matching the exact values 1 and -1.
    if (param1.compareTo(param2) > 0) {
      x = split[1];
      y = split[0];
    } else {
      x = split[0];
      y = split[1];
    }

    System.out.println("String x: " + x);
    System.out.println("String y: " + y);

    System.out.println(String.format("%s%s%s", x, delimiter, y));
    System.out.println();
  }

This approach works for any pair of names, not just x and y, as long as the format matches the regular expression: a single word character, a single non-word delimiter, and another single word character.


 