
Java regex (java.util.regex). Search for dollar sign

I have a search string. When it contains a dollar symbol, I want to capture all the characters after it, up to but not including a dot or a subsequent dollar symbol. The latter would start a subsequent match. So for either of these search strings:

"/bla/$V_N.$XYZ.bla";
"/bla/$V_N.$XYZ";

I would want to return:

  • V_N
  • XYZ

If the search string contains percent symbols, I also want to return what is between each pair of % symbols.

The following regex seems to do the trick for that:

 "%([^%]*?)%";

Breaking it down:

  • start and end with a %,
  • have a capture group, the (),
  • have a character class containing anything except a % symbol (the leading caret negates the class),
  • repeated, but not greedily: *?
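The breakdown above can be tried out with a minimal sketch like the following. The class name PercentTokens, the helper between, and the sample input are hypothetical and are not taken from the question; only the regex itself is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentTokens {
    // An opening '%', a lazily matched run of non-'%' characters (group 1), a closing '%'.
    private static final Pattern PERCENT_PAIR = Pattern.compile("%([^%]*?)%");

    // Returns the text found between each pair of '%' symbols.
    static List<String> between(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PERCENT_PAIR.matcher(input);
        while (m.find()) {
            found.add(m.group(1)); // group 1 excludes the surrounding '%' symbols
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, chosen only for illustration.
        System.out.println(between("/bla/%ABC%/x/%DEF%.bla")); // prints [ABC, DEF]
    }
}
```

Because the character class already excludes %, the lazy *? and a greedy * should find the same matches here; the laziness is harmless rather than required.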

Where some languages allow %1, %2 to refer to capture groups, Java uses backslash-plus-number syntax (\1, \2) instead. So this string compiles and generates output.
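As a small illustration of the backslash-number syntax, here is a sketch in which \1 (written "\\1" in a Java string literal) must repeat whatever group 1 matched. The class name BackrefDemo, the # delimiter, and the sample input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Group 1 is the opening delimiter; \\1 demands the very same character as the closer.
    private static final Pattern SAME_DELIMITER = Pattern.compile("([%#])([^%#]*)\\1");

    // Returns the text enclosed by a matching pair of identical delimiters.
    static List<String> inner(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = SAME_DELIMITER.matcher(input);
        while (m.find()) {
            found.add(m.group(2)); // group 2 is the text between the delimiters
        }
        return found;
    }

    public static void main(String[] args) {
        // "%c#" is skipped because the closing delimiter differs from the opening one.
        System.out.println(inner("%a% #b# %c#")); // prints [a, b]
    }
}
```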

I suspect the dollar symbol and the dot need escaping, as they are special symbols:

  • $ usually means end of string
  • . is a metacharacter that matches any character
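To see the difference between the escaped and unescaped forms outside a character class, a sketch such as this one counts matches for each. The class name EscapeDemo and the helper count are hypothetical; Pattern.quote is the standard java.util.regex way to treat a string literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String search = "/bla/$V_N.$XYZ.bla";
        System.out.println(count("\\$", search));              // 2: literal dollar signs
        System.out.println(count(Pattern.quote("$"), search)); // 2: same, via Pattern.quote
        System.out.println(count("\\.", search));              // 2: literal dots
        System.out.println(count("$", search));                // 1: the end-of-input position
        System.out.println(count(".", search));                // 18: every character
    }
}
```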

I have tried using double backslashes (\\):

  • both inside character classes, e.g. [^\\.\\$%]
  • and using alternation, e.g. %|\\$

in attempts to combine this logic, but I can't seem to get anything to play ball.

I wonder if another pair of eyes can see how to solve this conundrum!

My attempts so far:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
    public static void main(String[] args) {
        String search = "/bla/$V_N.$XYZ.bla";
        String pattern = "([%\\$])([^%\\.\\$]*?)\\1?";
        /* Either % or $ in the first capture group: ([%\\$])
         * Second capture group: anything except %, dot or dollar sign,
         * matched non-greedily (*?),
         * then an optional backreference to the first capture group: \\1?
         * Two backslashes are needed, since \ has to be escaped in a Java string.
         */
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(search);
        List<String> results = new ArrayList<String>();
        while (m.find()) {
            // Collects group 0 (the whole match) as well as groups 1 and 2.
            for (int i = 0; i <= m.groupCount(); i++) {
                results.add(m.group(i));
            }
        }
        for (String result : results) {
            System.out.println(result);
        }
    }
}


You may use

String search = "/bla/$V_N.$XYZ.bla";
String pattern = "[%$]([^%.$]*)";
Matcher matcher = Pattern.compile(pattern).matcher(search);
while (matcher.find()){
    System.out.println(matcher.group(1)); 
} // => V_N, XYZ

See the Java demo and the regex demo.

NOTE

  • You do not need the optional \1? at the end of the pattern. As it is optional, it does not restrict the match context and is redundant (the negated character class already cannot match either $ or %).
  • [%$]([^%.$]*) matches % or $, then captures into Group 1 any zero or more chars other than %, . and $. You only need the Group 1 value, hence matcher.group(1) is used.
  • In a character class, neither . nor $ is special, so they do not need escaping in [%.$] or [%$].
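The last note can be checked with a short sketch that runs both the unescaped and the escaped character classes over the two search strings from the question. The class name DollarTokens and the helper extract are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTokens {
    // Collects the Group 1 value of every match of the given regex.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String plain = "[%$]([^%.$]*)";          // no escapes inside the classes
        String escaped = "[%\\$]([^%\\.\\$]*)";  // escaped variant, same meaning
        String[] inputs = {"/bla/$V_N.$XYZ.bla", "/bla/$V_N.$XYZ"};
        for (String input : inputs) {
            List<String> tokens = extract(plain, input);
            System.out.println(tokens);                                 // [V_N, XYZ]
            System.out.println(tokens.equals(extract(escaped, input))); // true
        }
    }
}
```

Both spellings compile to equivalent patterns, so the unescaped form is simply the easier one to read.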
