
Java String split with regex

I'm trying to split an arithmetic equation represented as a String and I want to retain the multi-character delimiters: {==, !=, >=, <=, >, <}

This is what I have:

String expression = "2*(5 +1)- 3 * 2 >= 6^3.1 + 5";
expression = expression.replaceAll("\\s", "");
String[] parsedExpression = expression.split("((?<===)|(?===))|"
            + "((?<=>=)|(?=>=))|"
            + "((?<=!=)|(?=!=))|"
            + "((?<=<=)|(?=<=))|"
            + "((?<=>)|(?=>))|"
            + "((?<=<)|(?=<))");

However it splits it like this:

[2*(5+1)-3*2, >, =, 6^3.1+5]

When the desired split would be this:

[2*(5+1)-3*2, >=, 6^3.1+5]

I'm guessing that my rules for using > and < as delimiters are what causes the problem, but I don't know how to fix it.

Append negative look-aheads (?!=) to the < and > look-arounds to make sure they do not match when = is part of the operator:

String[] parsedExpression = expression.split("((?<===)|(?===)|"
        + "(?<=>=)|(?=>=)|"
        + "(?<=!=)|(?=!=)|"
        + "(?<=<=)|(?=<=)|"
        + "(?<=>(?!=))|(?=>(?!=))|"   // See here
        + "(?<=<(?!=))|(?=<(?!=)))"); // and here

Printing the result with System.out.println(Arrays.toString(parsedExpression)); gives [2*(5+1)-3*2, >=, 6^3.1+5].

Can there be only one token (">", ">=", "==", etc.) in the expression? Are you allowing something like "5 < 6 < 7"?

Though it doesn't use regex, you could try something like this.

String[] parsedExpression = new String[3]; // assumes exactly one comparison, e.g. "3 < 4"
// Two-character operators come first so that ">=" is not mistaken for ">".
String[] tokens = {"==", "!=", "<=", ">=", "<", ">"};

int idxOfToken = -1;
String comparOp = "";
for (String token : tokens) { // try every token until one is present
    idxOfToken = expression.indexOf(token);
    if (idxOfToken >= 0) {
        comparOp = token; // the operator that was found
        break;
    }
}
if (idxOfToken < 0) {
    throw new IllegalArgumentException("No comparison operator in: " + expression);
}

parsedExpression[0] = expression.substring(0, idxOfToken);
parsedExpression[1] = comparOp;
parsedExpression[2] = expression.substring(idxOfToken + comparOp.length());

Because you are splitting on lookarounds, which have zero width, a match consumes no characters. Even when a two-character pattern matches at some position, the engine does not move past the whole operator, so a later position inside the same operator can still match another alternative.

So, even after >= has matched, the position between > and = is matched again by the single-character rule (?<=>), which produces the extra split.
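To make the zero-width behaviour visible, here is a minimal sketch that lists the positions at which a split pattern matches. The class name SplitPositions, the helper matchPositions, and the shortened input "a>=b" are made up for this illustration; only the ">=" and ">" rules from the question are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPositions {
    // Hypothetical helper: returns every index at which the (zero-width) pattern matches.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "a>=b";
        // Only the ">=" rules: matches before and after the whole operator.
        System.out.println(matchPositions("(?<=>=)|(?=>=)", input));             // [1, 3]
        // Adding the single-character ">" rules also matches between ">" and "=".
        System.out.println(matchPositions("(?<=>=)|(?=>=)|(?<=>)|(?=>)", input)); // [1, 2, 3]
    }
}
```

The extra position 2 in the second result is the split between > and = that the question observes.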

While split with lookarounds can be made to work for your problem, it makes a real mess of a regex that is hard to understand. It would be better, and simpler, to take a different approach.

For example, you could match either a run of delimiter characters or a run of non-delimiter characters:

/[^><=]+|[><=]+/

The list of all matches of such a pattern splits the string the way you want. It makes certain assumptions about your input data (for instance, ! is not treated as a delimiter character, so != would not be kept together as written), but it can easily be tweaked if necessary. For example, it could be narrowed to match only valid delimiters.
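In Java, collecting all matches could look roughly like the sketch below. The class name MatchTokens and the method tokenize are invented for this example, and the pattern is a tweaked variant (an assumption, not the pattern above verbatim) that adds ! to both character classes so that != stays in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokens {
    // Assumption: '!' is added to both classes so that "!=" is kept together.
    private static final Pattern TOKEN = Pattern.compile("[^><=!]+|[><=!]+");

    // Collects every match: alternating runs of non-delimiter and delimiter characters.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("2*(5+1)-3*2>=6^3.1+5")); // [2*(5+1)-3*2, >=, 6^3.1+5]
    }
}
```

Note that this variant accepts any run of delimiter characters (for example "=<"), so invalid operators would still need to be rejected separately.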

Update 1: The 4th pattern wasn't working right.

You want to split before and after only these: ==, !=, >=, <=, >, <, so (spaces added for clarity, e.g. when using the COMMENTS flag / (?x)):

  • (?= [=!]= ) : Before ==, !=
  • (?= [><] ) : Before >, < (incl. >=, <=)
  • (?<= [=!><]= ) : After ==, !=, >=, <=
  • (?<= [><](?!=) ) : After >, <, not followed by =

The first two can be combined using | as (?= [=!]= | [><] ) .
The last two can be combined using | as (?<= [=!><]= | [><](?!=) ) .

So, all combined it means (?= [=!]= | [><] ) | (?<= [=!><]= | [><](?!=) ) using the COMMENTS flag, or just:

(?=[=!]=|[><])|(?<=[=!><]=|[><](?!=))
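If the spaced-out form is preferred for readability, it can be compiled with Pattern.COMMENTS, which tells the engine to ignore whitespace in the pattern. The following is a small sketch; the class name CommentsFlagSplit and the method splitOnComparison are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CommentsFlagSplit {
    // Pattern.COMMENTS makes the regex engine ignore the whitespace inside the pattern.
    static final Pattern SPLITTER = Pattern.compile(
            "(?= [=!]= | [><] ) | (?<= [=!><]= | [><](?!=) )", Pattern.COMMENTS);

    // Splits before and after each comparison operator, keeping the operator as its own element.
    static String[] splitOnComparison(String expression) {
        return SPLITTER.split(expression);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComparison("a>=b")));
        System.out.println(Arrays.toString(splitOnComparison("a<b")));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.split would otherwise do.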

Test

String regex = "(?=[=!]=|[><])|(?<=[=!><]=|[><](?!=))";
String[] split = "2*(5+1)-3*2 >= 6^3.1+5".split(regex);
System.out.println(Arrays.toString(split));

split = "a == b != c >= d <= e > f < g = h ! i".split(regex);
System.out.println(Arrays.toString(split));

Output

[2*(5+1)-3*2 , >=,  6^3.1+5]
[a , ==,  b , !=,  c , >=,  d , <=,  e , >,  f , <,  g = h ! i]

Update 2

For a complete answer, I'd like to showcase a solution based on ideas by both Pushkin and dan1111 , which is to simply find the operator.

The pattern is much simpler and easier to understand. It likely performs better too.

String text = "2*(5+1)-3*2 >= 6^3.1+5";
Matcher m = Pattern.compile("[=!]=|[><]=?").matcher(text);
if (m.find()) {
    String left  = text.substring(0, m.start());
    String oper  = m.group(); // or text.substring(m.start(), m.end());
    String right = text.substring(m.end());
    System.out.printf("[%s, %s, %s]%n", left, oper, right);
}

Output

[2*(5+1)-3*2 , >=,  6^3.1+5]
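If an expression may contain more than one comparison (something like "5 < 6 <= 7"), the same pattern can be applied in a loop instead of a single find(). This is only a sketch under that assumption; the class name FindAllOperators and the method splitKeepingOperators are hypothetical, and unlike the code above it trims whitespace around the operands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllOperators {
    private static final Pattern OPERATOR = Pattern.compile("[=!]=|[><]=?");

    // Splits the text into operands and operators, keeping the operators as elements.
    static List<String> splitKeepingOperators(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = OPERATOR.matcher(text);
        int last = 0; // end index of the previous operator
        while (m.find()) {
            parts.add(text.substring(last, m.start()).trim()); // operand before the operator
            parts.add(m.group());                              // the operator itself
            last = m.end();
        }
        parts.add(text.substring(last).trim()); // operand after the last operator
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingOperators("5 < 6 <= 7")); // [5, <, 6, <=, 7]
    }
}
```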
