
Java regex with branch selector

Is there any way to somehow set a value in the same field across different Java regex "branches", so that a switch-like statement later on can identify which branch was followed?

For example, in a PCRE pattern with 3 "branches" like

(\S+|\d+|\s+)

the ideal answer would be to have a common variable (say, selector) that would be set to different values (say, "non-space", "digit" and "space"), so that a switch statement like

case "non-space":
case "digit":
case "space":

can be executed afterwards.

The use case relates to a regex engine that understands Java regular expressions but does not allow execution of Java code, so if there is an answer, it has to be fully regex-based.

This probably can't be done, so any advice on workarounds is also welcome. :-)

Java regular expressions have no such selector construct.

However, you can use a Matcher with capturing groups and check which group took part in the match:

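import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \d+ must come before \S+: a run of digits is also a run of non-space
// characters, so with \S+ first the digit branch could never be taken.
Pattern pattern = Pattern.compile("(\\d+)|(\\S+)|(\\s+)");
Matcher m = pattern.matcher(input);
if (m.find()) {
    // Exactly one group participates in a match, so else-if is enough.
    if (m.group(1) != null) {          // digit

    } else if (m.group(2) != null) {   // non-space

    } else {                           // space: group(3)

    }
}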
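A minimal sketch of the selector idea from the question, assuming Java 7 or later (named groups and switch on strings). The class BranchSelector and the group names digit, nonspace and space are made up for illustration. The "selector" is simply the name of whichever group is non-null after a match, and a switch dispatches on it. As in the code above, the digit branch has to come before \S+, because \S+ also matches digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BranchSelector {
    // Hypothetical group names. \d+ is listed before \S+ because \S+ also
    // matches digit runs and would otherwise always win.
    private static final Pattern TOKEN =
            Pattern.compile("(?<digit>\\d+)|(?<nonspace>\\S+)|(?<space>\\s+)");

    // The "selector": the name of the one branch that participated in the match.
    static String selector(Matcher m) {
        if (m.group("digit") != null) return "digit";
        if (m.group("nonspace") != null) return "non-space";
        return "space";
    }

    // Splits the input into tokens and dispatches on the selector with a switch.
    static List<String> classify(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            switch (selector(m)) {
                case "digit":
                    tokens.add("digit:" + m.group());
                    break;
                case "non-space":
                    tokens.add("non-space:" + m.group());
                    break;
                case "space":
                    tokens.add("space");
                    break;
                default:
                    throw new IllegalStateException("unknown branch");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(classify("abc 42"));
    }
}
```

For the input "abc 42" this produces the tokens non-space:abc, space and digit:42. The switch itself still has to run in Java, so this only helps if the consuming tool exposes which named group matched.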

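In Java, the closest you can get is the alternation you already show, with each branch running its own part of the pattern (its "code"). Regex logic works a little differently from if/then/else logic: the engine tries the alternatives in order and backtracks into the next one when the rest of the pattern fails.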

 (?:
      (?:                # ----------
           (?<a> )            # (1)
                              # do a code
        |  (?<b> )            # (2)
                              # do b code
        |  (?<c> )            # (3)
                              # do c code
      )                  # ---------

      # Common code
 )+
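The outline above can be written as a real Java pattern, because Pattern.COMMENTS (or an inline (?x)) ignores whitespace and treats # up to the end of the line as a comment. The sketch below is only an illustration: the class FreeSpacingBranches and the three item formats ("12px;", "ok!;", "OK?;") are made up. Each branch has its own sub-pattern, followed by a common ';'. The outline's outer )+ is left out on purpose: in java.util.regex a group captured in an earlier repetition keeps its value in later repetitions where its branch is not taken, so "which group is non-null" is only reliable per match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FreeSpacingBranches {
    // Pattern.COMMENTS ignores whitespace and treats '#' up to end of line as a
    // comment, so the pattern can be laid out like the outline. The three item
    // formats are made up for illustration.
    private static final Pattern ITEM = Pattern.compile(String.join("\n",
            "(?:",
            "     (?<a> \\d+ )   px     # (1) branch a: a number, then 'px'",
            "  |  (?<b> [a-z]+ ) !      # (2) branch b: a lowercase word, then '!'",
            "  |  (?<c> [A-Z]+ ) \\?    # (3) branch c: an uppercase word, then '?'",
            ")",
            ";                          # common code: every item ends with ';'"),
            Pattern.COMMENTS);

    // Returns which branch matched ("a", "b" or "c"), or null if the item is invalid.
    static String branchOf(String item) {
        Matcher m = ITEM.matcher(item);
        if (!m.matches()) return null;
        if (m.group("a") != null) return "a";
        if (m.group("b") != null) return "b";
        return "c";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12px;", "ok!;", "OK?;", "12px"}) {
            System.out.println(s + " -> " + branchOf(s));
        }
    }
}
```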

PCRE additionally has conditionals, (?(condition)yes|no), which are about as close as regex gets to a switch statement.

 (?:                # ----------
      (?:
           (?<a> )            # (1)
        |  (?<b> )            # (2)
        |  (?<c> )            # (3)
      )                  # ---------

      (?(<a>)            # did a match
                              # do a code
        |                   # else
           (?(<b>)            # did b match
                                   # do b code
             |                   # else
                                   # do c code
           )
      )

      # Common code
 )+

But, as you can see, in this context there is really no difference between the two: which alternative matched already decides which "code" runs.

In my opinion, the primary (really the only) use for regex conditionals is as a flag that fails or accepts the match at a certain point in the pattern. A forced failure gives the engine a chance to backtrack and retry a different combination. For example, (?(<a>)|(?!)) fails unless group a has matched.
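Java has no conditionals, but a commonly used workaround (not part of this answer, shown here only as a sketch) can emulate that flag: put an empty named group (?<a>) inside branch a. In java.util.regex a backreference to a group that did not participate fails, so \k<a> matches the empty string if branch a was taken and fails otherwise, which plays the role of (?(<a>)|(?!)). Likewise (?:\k<a>X|(?!\k<a>)) plays the role of (?(<a>)X). The class ConditionalEmulation and the digits/letters branches are hypothetical.

```java
import java.util.regex.Pattern;

public class ConditionalEmulation {
    // Branch a (digits) sets the empty named group "a"; the letters branch does not.
    // \k<a> then succeeds (matching "") only if branch a was taken,
    // which mimics PCRE's (?(<a>)|(?!)).
    private static final Pattern REQUIRE_A =
            Pattern.compile("(?:\\d+(?<a>)|[a-z]+)\\k<a>");

    // Mimics PCRE's (?(<a>);): if branch a was taken a ';' is required,
    // otherwise this part matches nothing. (?!\k<a>) succeeds only when "a" is unset.
    private static final Pattern IF_A_THEN_SEMICOLON =
            Pattern.compile("(?:\\d+(?<a>)|[a-z]+)(?:\\k<a>;|(?!\\k<a>))");

    static boolean requireA(String s) {
        return REQUIRE_A.matcher(s).matches();
    }

    static boolean ifAThenSemicolon(String s) {
        return IF_A_THEN_SEMICOLON.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(requireA("123") + " " + requireA("abc"));
        System.out.println(ifAThenSemicolon("12;") + " " + ifAThenSemicolon("ab"));
    }
}
```

The trick only works because the group is empty: a backreference to a non-empty group would have to match the captured text again rather than the empty string.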

Keep in mind that lookaround assertions go a long way toward injecting logic into a pattern, and the Java engine supports them.
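As a sketch of that, a PCRE conditional whose condition is an assertion, such as (?(?=\d)\d+|\w+), can be rewritten with lookarounds that Java supports: guard the "then" branch with (?=\d) and the "else" branch with the negated (?!\d), so the else branch cannot match when the condition holds. The class name LookaroundIfElse is hypothetical.

```java
import java.util.regex.Pattern;

public class LookaroundIfElse {
    // Java version of the PCRE conditional (?(?=\d)\d+|\w+):
    // (?=\d) guards the "then" branch, (?!\d) guards the "else" branch.
    private static final Pattern IF_DIGIT_ELSE_WORD =
            Pattern.compile("(?:(?=\\d)\\d+|(?!\\d)\\w+)");

    static boolean accepts(String s) {
        return IF_DIGIT_ELSE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "abc1", "1bc"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Without the (?!\d) guard, the plain alternation \d+|\w+ would accept "1bc" through its second branch, which a real conditional would not.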

PCRE also has subroutine calls such as (?R) and (?&name), which can be called recursively when needed, for example to match balanced text. These are not available in Java.
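A quick way to check this is to try compiling the PCRE-only constructs: Pattern.compile throws PatternSyntaxException for the recursive balanced-parentheses pattern, for a (?&name) call and for the (?(<a>)...) conditional, while a lookahead compiles fine. The class PcreOnlySyntax and its helper compiles are hypothetical names, and the exact exception message may differ between JDK versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PcreOnlySyntax {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            "\\((?:[^()]|(?R))*\\)",   // PCRE recursion: balanced parentheses
            "(?<n>a)(?&n)",             // PCRE subroutine call by name
            "(?<a>x)?(?(<a>)y|z)",      // PCRE conditional
            "(?=\\d)\\d+"               // lookahead: supported by Java
        };
        for (String p : patterns) {
            System.out.println(compiles(p) + "  " + p);
        }
    }
}
```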
