Java regex with branch selector

Is there any way to set a value in the same field across different Java regex "branches", so that a switch-like statement later on can identify which branch was followed?

For example, in a PCRE with 3 "branches" like

(\S+|\d+|\s+)

the ideal answer would be to have a common variable (say, selector) that would be set to different values (say, "non-space", "digit" and "space"), so that a switch statement like

case "non-space":
case "digit":
case "space":

can be executed afterwards.

The use case relates to a regex engine that understands Java regular expressions but does not allow execution of Java code, so if there is an answer, it has to be fully regex-based.

Probably the above can't be done, so any advice on workarounds is also welcome. :-)

There is no regex selector available in Java.

However, you can use Matcher and groups.

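import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One capturing group per branch; only the branch that matched is non-null.
Pattern pattern = Pattern.compile("(\\S+)|(\\d+)|(\\s+)");
Matcher m = pattern.matcher(input); // input: the String to scan
if (m.find()) {
    if (m.group(1) != null) {
        // non-space branch (tried first, so it also takes digit runs)
    } else if (m.group(2) != null) {
        // digit branch
    } else if (m.group(3) != null) {
        // space branch
    }
}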
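A fuller, runnable version of the group-based approach might look like the sketch below. It is only an illustration: the class name BranchSelector, the method select, and the group names are made up for this example, and named groups are used instead of numbered ones purely for readability. It also puts the digit branch first, on the assumption that digits should be told apart from other non-space text (\S+ matches digits too, so a digit branch placed after it would never be chosen).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BranchSelector {
    // Digit branch first: \S+ would otherwise consume digit runs as well.
    private static final Pattern TOKEN =
            Pattern.compile("(?<digit>\\d+)|(?<nonspace>\\S+)|(?<space>\\s+)");

    /** Returns a label for the branch taken by the first match, or "none". */
    static String select(String input) {
        Matcher m = TOKEN.matcher(input);
        if (!m.find()) {
            return "none";
        }
        if (m.group("digit") != null) {
            return "digit";
        }
        if (m.group("nonspace") != null) {
            return "non-space";
        }
        return "space"; // the only remaining branch
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "abc", "   "}) {
            // The label plays the role of the "selector" variable.
            switch (select(s)) {
                case "non-space": System.out.println("non-space branch"); break;
                case "digit":     System.out.println("digit branch");     break;
                case "space":     System.out.println("space branch");     break;
                default:          System.out.println("no match");
            }
        }
    }
}
```

The branch label is derived in Java code after the match, so this only helps where the host application can run code around the regex.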

In Java, the closest you could get is the alternation (as you show) to
execute different code paths. Regex logic is a little different from if/then/else logic.

 (?:
      (?:                # ----------
           (?<a> )            # (1)
                              # do a code
        |  (?<b> )            # (2)
                              # do b code
        |  (?<c> )            # (3)
                              # do c code
      )                  # ---------

      # Common code
 )+
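On the Java side, the repeated alternation above could be driven roughly as follows. This is a sketch under assumptions: the three branch bodies (digits, letters, whitespace) are placeholders chosen for the example, and AlternationLoop and trace are hypothetical names. Each call to find() corresponds to one pass through the loop, and the null checks stand in for the "do a/b/c code" slots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationLoop {
    // Placeholder branch bodies: a = digits, b = letters, c = whitespace.
    private static final Pattern BRANCHES =
            Pattern.compile("(?<a>\\d+)|(?<b>[A-Za-z]+)|(?<c>\\s+)");

    /** Records which branch fired on each iteration over the input. */
    static List<String> trace(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = BRANCHES.matcher(input);
        while (m.find()) {
            if (m.group("a") != null) {
                out.add("a:" + m.group("a"));   // "do a code"
            } else if (m.group("b") != null) {
                out.add("b:" + m.group("b"));   // "do b code"
            } else {
                out.add("c");                   // "do c code"
            }
            // Common code for every iteration would go here.
        }
        return out;
    }
}
```

For the input "ab 12" this records b, then c, then a, in that order.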

PCRE has additional logic called conditionals. It's most like a switch statement.

 (?:                # ----------
      (?:
           (?<a> )            # (1)
        |  (?<b> )            # (2)
        |  (?<c> )            # (3)
      )                  # ---------

      (?(<a>)            # did a match
                              # do a code
        |                   # else
           (?(<b>)            # did b match
                                   # do b code
             |                   # else
                                   # do c code
           )
      )

      # Common code
 )+

But, as you can see, there is really no difference between the two in this
context.

The primary and really only use for regex conditionals (imo)
is as a flag to fail or accept a match at a certain point in the code.
This gives the engine a chance to retry a different combination, for example: (?(<a>)|(?!))

Keep in mind that the use of assertions will go a long way to inject logic
into the code. This is available in the Java engine.
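One way to get a conditional-like flag in Java using only a backreference and a lookahead is sketched below. This is a swapped-in technique, not something from the answer itself: an empty capturing group (here named q, a made-up name) is placed inside a branch, and it relies on java.util.regex failing a backreference to a group that did not take part in the match. So \k<q> acts as "branch was taken" and (?!\k<q>) as "branch was not taken". The example accepts a word that is either fully quoted or not quoted at all.

```java
import java.util.regex.Pattern;

final class FlagGroupDemo {
    // (?:"(?<q>))?   optional opening quote; q is an empty "flag" group
    // \w+            the word itself
    // \k<q>"         if the flag is set, require the closing quote
    // (?!\k<q>)      otherwise, succeed only when the flag is not set
    private static final Pattern QUOTED_OR_BARE =
            Pattern.compile("(?:\"(?<q>))?\\w+(?:\\k<q>\"|(?!\\k<q>))");

    static boolean accepts(String s) {
        return QUOTED_OR_BARE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("\"abc\"")); // quoted on both sides
        System.out.println(accepts("abc"));     // no quotes at all
        System.out.println(accepts("\"abc"));   // opening quote only
        System.out.println(accepts("abc\""));   // closing quote only
    }
}
```

The first two inputs are accepted and the last two are rejected, which is roughly what the PCRE conditional (?(<q>)"|) would express directly.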

PCRE also has a function call construct that can be called
recursively if needed to do balanced text matches. However, this is not
available in Java.
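Where Java code can be run around the regex, balanced text is usually checked outside the pattern instead. The snippet below is a minimal sketch of that workaround, not a regex-only answer; BalancedCheck and isBalanced are hypothetical names, and it only tracks round parentheses.

```java
final class BalancedCheck {
    /** Returns true if every '(' has a matching ')' in the right order. */
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing parenthesis with no opener
                }
            }
        }
        return depth == 0; // no opener left unclosed
    }
}
```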
