
How to parse string using regex

I'm pretty new to Java and am trying to find a better way to do this, potentially using a regex.

String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]

String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];

// checker becomes machine

My goal is to parse that text string and return just machine, which is what the code above does.

But it looks ugly. What kind of regex could be used here to make this a little better? Or is there another approach?

Use a regex lookbehind:

(?<=\bid=)[^],]*


(?<=     )            // Start matching only after what matches inside
    \bid=             // Match "\bid=" (= word boundary then "id="),
          [^],]*      // Match and keep the longest sequence without any ']' or ','

In Java, use it like this:

import java.util.regex.*;

class Main {
  public static void main(String[] args) {
    Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
    Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
    if (matcher.find()) {
      System.out.println(matcher.group(0));
    }
  }
}

This results in

machine
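If the extraction is needed in more than one place, the lookbehind can be wrapped in a small helper. This is only a sketch: the class name IdLookbehind and the method extractId are made up for illustration, and returning an Optional for the no-match case is one possible design choice, not something the answer prescribes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class IdLookbehind {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern ID_VALUE = Pattern.compile("(?<=\\bid=)[^],]*");

    // Returns the text that follows "id=", or empty if the input has no such field.
    static Optional<String> extractId(String text) {
        Matcher matcher = ID_VALUE.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("EnumOption[enumId=test,id=machine]").orElse("<none>"));
        System.out.println(extractId("EnumOption[enumId=test]").orElse("<none>"));
    }
}
```

The match is case-sensitive, so the "Id=" inside "enumId=" is not picked up by the lookbehind.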

Assuming you're using the Polarion ALM API, you should use the EnumOption's getId method instead of converting the value to a string and parsing it back:

String id = test.get(i).getId();

Using the replace and split functions doesn't take the structure of the data into account.
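To illustrate what ignoring the structure can mean in practice, here is a small sketch that runs the replace/split chain from the question on two inputs. The second input, with the fields in the opposite order, is a hypothetical example and not something the API is known to produce; the class and method names are also made up.

```java
// Hypothetical demo class; the name is illustrative only.
class SplitFragility {
    // The replace/split chain from the question, unchanged.
    static String secondFieldValue(String text) {
        return text.replace("[", "").replace("]", "").split(",")[1].split("=")[1];
    }

    public static void main(String[] args) {
        // Field order as shown in the question: prints "machine".
        System.out.println(secondFieldValue("EnumOption[enumId=test,id=machine]"));
        // Hypothetical reordered input: the chain now returns the enumId value, "test".
        System.out.println(secondFieldValue("EnumOption[id=machine,enumId=test]"));
    }
}
```

The chain picks a field by position rather than by name, so it silently returns the wrong value when the layout changes.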

If you want to use a regex, you can use a capturing group without any lookarounds, where the enumId value can be any characters except =, a comma, and ], and the id value can be any characters except ].

The value of id will be in capture group 1.

\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]

Explanation

  • \bEnumOption Match EnumOption preceded by a word boundary
  • \[enumId= Match [enumId=
  • [^=,\]]+, Match 1+ times any char except = , and ]
  • id= Match literally
  • ( Capture group 1
    • [^\]]+ Match 1+ times any char except ]
  • )\] Close group 1 and match ]


Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");

if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output

machine

If there can be more comma-separated values, you can also match only id, using the negated character class [^\]\[]* before and after it to stay inside the square brackets. Both brackets are escaped inside the class because Java would otherwise read an unescaped [ as the start of a nested character class.

\bEnumOption\[[^\]\[]*\bid=([^,\]]+)[^\]\[]*\]

In Java

String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";
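A minimal sketch of that pattern in use, with both brackets escaped inside the character classes as Java requires. The extra name=demo field in the sample input is a hypothetical addition to show that surrounding fields are skipped, and the class and method names are made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class IdAmongFields {
    private static final Pattern ENUM_OPTION_ID = Pattern.compile(
            "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]");

    // Returns the id value from capture group 1, or empty if the input does not match.
    static Optional<String> extractId(String text) {
        Matcher matcher = ENUM_OPTION_ID.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The trailing name=demo field is an assumption made for this example.
        System.out.println(extractId("EnumOption[enumId=test,id=machine,name=demo]").orElse("<none>"));
        System.out.println(extractId("Other[id=machine]").orElse("<none>"));
    }
}
```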


A regex can of course be used, but it is sometimes less performant, less readable, and more bug-prone.

I would advise you not to use any regex that you did not come up with yourself, or that you do not at least understand completely.

PS: I think your solution is actually quite readable.

Here's another non-regex version:

String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1); // "machine]"
text = text.substring(0, text.length() - 1);      // "machine"
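If more than the last field is needed, the same plain-string approach can be extended to read every key=value pair by name. This is a sketch under assumptions: the names FieldParser and parseFields are made up, and it assumes that neither keys nor values contain a comma or square bracket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class FieldParser {
    // Parses "Name[key=value,key=value]" into an ordered map of fields.
    // Assumes keys and values contain no ',' '[' or ']' characters.
    static Map<String, String> parseFields(String text) {
        int open = text.indexOf('[');
        int close = text.lastIndexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("unexpected input: " + text);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        String body = text.substring(open + 1, close);
        if (body.isEmpty()) {
            return fields;
        }
        // String.split takes a regex, but a lone comma is matched literally.
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("field without '=': " + pair);
            }
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parseFields("EnumOption[enumId=test,id=machine]");
        System.out.println(fields.get("id"));
    }
}
```

Looking fields up by key means the result does not depend on the order in which they appear.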

Not doing you a favor, but the downvote hurt, so here you go:

String input = "EnumOption[enumId=test,id=machine]";
// matches() must consume the whole input, so anything unexpected fails loudly
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if (!matcher.matches()) {
  throw new RuntimeException("unexpected input: " + input);
}

System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));
