
How to remove part of a string that includes special characters using RegEx or replaceAll?

Here are the strings:

1. "AAA BBB  CCCCC CCCCCCC"
2. "  AAA              BBB  DDDD DDDD DDDDD"
3. "    EEE         FFF  GGGGG GGGGG"

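The amount of whitespace at the start of the string and between the first and second words can vary. So I need a regex that removes everything before the third word, so that it always returns "CCCCC CCCCCCC", "DDDD DDDD DDDDD", or "GGGGG GGGGG". I assume this can be done with a regex rather than by parsing every word in the string.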
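Since the title mentions replaceAll, here is a minimal sketch of that route. It assumes a "word" is any run of non-whitespace characters (\S+), so a word containing special characters still counts as one word, and that every input has at least three words. It uses String.replaceFirst, the single-replacement counterpart of replaceAll, to strip the leading whitespace and the first two words; the class and method names are hypothetical.

```java
public class ReplaceFirstDemo {
    // Hypothetical helper: removes leading whitespace, the first two
    // whitespace-separated words, and the whitespace after them.
    // Assumes the input has at least three words.
    static String dropFirstTwoWords(String s) {
        return s.replaceFirst("^\\s*\\S+\\s+\\S+\\s+", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "AAA BBB  CCCCC CCCCCCC",
            "  AAA              BBB  DDDD DDDD DDDDD",
            "    EEE         FFF  GGGGG GGGGG"
        };
        for (String s : inputs) {
            System.out.println(dropFirstTwoWords(s));
        }
    }
}
```

Using \S+ instead of \w+ is a deliberate choice here: \w only covers letters, digits, and underscores, so a word such as "A-B" would not be treated as a single word by \w+.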

You need to use a capturing group to extract the data you want:

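import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupMatch {
    public static void main(String[] args) {
        String result = null;
        try {
            // Group 1 captures everything after the first two words.
            // Note: inside [...] the "|" is a literal pipe character, not alternation.
            Pattern regex = Pattern.compile("\\s*\\w+\\s*\\w+\\s*([\\w| ]+)");
            Matcher regexMatcher = regex.matcher("  AAA              BBB  DDDD DDDD DDDDD");
            if (regexMatcher.find()) {
                result = regexMatcher.group(1); // result = "DDDD DDDD DDDDD"
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
        }
        System.out.println(result);
    }
}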

Regex explanation:

"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"(" +            // Match the regular expression below and capture its match into backreference number 1
   "[\\w| ]" +       // Match a single character present in the list below
                       // A word character (letters, digits, and underscores)
                       // One of the characters “| ”
      "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
")" 
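As the explanation notes, the class [\w| ] also accepts a literal "|". The sketch below compares it with the same class without the pipe on a hypothetical input that contains one; the input string and the class/method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeInClassDemo {
    // Returns group 1 of the first match, or null if there is no match.
    static String thirdWordOnward(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String withPipe = "\\s*\\w+\\s*\\w+\\s*([\\w| ]+)"; // word chars, '|' and space
        String noPipe   = "\\s*\\w+\\s*\\w+\\s*([\\w ]+)";  // word chars and space only
        String input = "AAA BBB  CC|DD EE";                 // hypothetical input with a pipe
        System.out.println(thirdWordOnward(withPipe, input)); // CC|DD EE
        System.out.println(thirdWordOnward(noPipe, input));   // CC
    }
}
```

For the three strings in the question, which contain no pipe, both variants return the same result.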

This regex will work:

\s*\w+\s+\w+\s+(.+$)


Java code:

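import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThirdWordOnward {
    public static void main(String[] args) {
        // (?m) makes $ match at the end of each line, not only at the end of the input
        String pattern = "(?m)\\s*\\w+\\s+\\w+\\s+(.+$)";
        String line = "AAA BBB  CCCCC CCCCCCC\n  AAA              BBB  DDDD DDDD DDDDD\n    EEE         FFF  GGGGG GGGGG";

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);
        while (m.find()) {
            System.out.println("Found value: " + m.group(1));
        }
    }
}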
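If the strings arrive as separate lines anyway, a possible variant is to match each line on its own with Matcher.matches(), which must consume the whole line, so neither (?m) nor $ is needed. This is a sketch under that assumption; the class and method names are hypothetical, and splitting on \R (any line break, Java 8+) is a choice made here, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineDemo {
    // Same idea as the answer's pattern, applied to one line at a time.
    private static final Pattern THIRD_WORD_ON =
            Pattern.compile("\\s*\\w+\\s+\\w+\\s+(.+)");

    // Hypothetical helper: returns the text from the third word onward for each line.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            Matcher m = THIRD_WORD_ON.matcher(line);
            if (m.matches()) {                  // whole line must match
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "AAA BBB  CCCCC CCCCCCC\n  AAA              BBB  DDDD DDDD DDDDD\n    EEE         FFF  GGGGG GGGGG";
        extract(text).forEach(System.out::println);
    }
}
```

Lines with fewer than three words simply produce no entry, because matches() fails for them.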


Similar to @rock321987's answer, you can modify the regex to use a quantifier that skips any number of leading words you don't want:

\s*(?:\w+\s+){2}(.+$)


Or in Java:

"\\s*(?:\\w+\\s+){2}(.+$)"

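?: makes the pattern inside () a non-capturing group. The number inside {} is how many words (each followed by whitespace) you want to skip.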
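Because the skip count lives in {}, it can be built into the pattern at run time. A minimal sketch, assuming a single line of input and a non-negative count; skipWords is a hypothetical helper, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipWordsDemo {
    // Hypothetical helper: skips the first n words (each followed by whitespace)
    // via the {n} quantifier on a non-capturing group and returns the rest.
    // Returns null if the line does not have enough words to skip n and keep something.
    static String skipWords(String line, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        Pattern p = Pattern.compile("\\s*(?:\\w+\\s+){" + n + "}(.+$)");
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "  AAA              BBB  DDDD DDDD DDDDD";
        System.out.println(skipWords(s, 2)); // DDDD DDDD DDDDD
        System.out.println(skipWords(s, 3)); // DDDD DDDDD
    }
}
```

Compiling the pattern on every call is fine for a demo; if the same n is used repeatedly, compiling once and reusing the Pattern avoids the repeated cost.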


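Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.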

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM