How to get rid of duplicates in regex
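Say I have a string, "cats cats cats and dogs dogs dogs".
What regular expression would I use to replace that string with "cats and dogs", i.e. to remove the duplicates? The expression must, however, only remove duplicates that directly follow each other. For example:
"cats cats cats and dogs dogs dogs and cats cats and dogs dogs"
would return:
"cats and dogs and cats and dogs"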
resultString = Regex.Replace(subjectString, @"\b(\w+)(?:\s+\1\b)+", "$1");
will do all the replacements in a single call.
Explanation:
\b        # assert that we are at a word boundary
          # (we only want to match whole words)
(\w+)     # match one word, capture into backreference #1
(?:       # start of non-capturing, repeating group
   \s+    # match at least one whitespace character
   \1     # match the same word as previously captured
   \b     # as long as we match it completely
)+        # do this at least once
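To try the pattern above end to end, here is a minimal sketch of a console program. The class name DedupDemo and the helper CollapseRepeats are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DedupDemo
{
    // Hypothetical helper: collapses each run of the same whitespace-separated
    // word into a single occurrence, using the pattern explained above.
    public static string CollapseRepeats(string input)
    {
        return Regex.Replace(input, @"\b(\w+)(?:\s+\1\b)+", "$1");
    }

    public static void Main()
    {
        string subject = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
        // prints "cats and dogs and cats and dogs"
        Console.WriteLine(CollapseRepeats(subject));
    }
}
```

Note that the backreference comparison is case-sensitive by default, so "Cats cats" would be left alone unless RegexOptions.IgnoreCase is passed.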
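Replace (\\w+)\\s+\\1 with $1.
Do this in a loop until no more matches are found. Setting the global flag is not enough, because it won't replace the third cats in cats cats cats.
The \\1 in the regex refers to the contents of the first capturing group.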
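The loop described above could be sketched in C# as follows. The helper name RemoveAdjacentDuplicates is hypothetical, and the \b anchors are an addition that is not in the quoted pattern; they keep a match from starting or ending in the middle of a word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoopDedup
{
    // Hypothetical helper: repeats the pairwise replacement until the
    // string stops changing, i.e. until no more matches are found.
    public static string RemoveAdjacentDuplicates(string input)
    {
        // Assumption: \b anchors added so only whole words are compared.
        var pair = new Regex(@"\b(\w+)\s+\1\b");
        string previous;
        do
        {
            previous = input;
            // One pass turns "cats cats cats" into "cats cats";
            // the next pass reduces it to "cats".
            input = pair.Replace(input, "$1");
        } while (input != previous);
        return input;
    }

    public static void Main()
    {
        // prints "cats and dogs"
        Console.WriteLine(RemoveAdjacentDuplicates("cats cats cats and dogs dogs dogs"));
    }
}
```

A single Regex.Replace call already replaces every non-overlapping match, which corresponds to the global flag in other regex flavors, so the loop is what handles runs of three or more words.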
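Try:
string str = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
// "$1 " puts back the separator consumed by the match; TrimEnd drops the
// extra space this leaves when the last match reaches the end of the string.
str = Regex.Replace(str, @"(\b\w+\b)\s+(\1(\s+|$))+", "$1 ").TrimEnd();
Console.WriteLine(str);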
No doubt a shorter regex is possible, but this one seems to do the trick:
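string somestring = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
// The leading \b is an addition: without it a match could start mid-word
// (for example, "cat at" would collapse to "cat").
Regex regex = new Regex(@"\b(\w+)\s(?:\1\s)*(?:\1(\s|$))");
string result = regex.Replace(somestring, "$1$2");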
It also takes into account the last "dogs", which has no trailing space.
Try the following code.
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    /// <summary>
    /// A description of the regular expression:
    ///
    /// Match expression but don't capture it. [^|\s+]
    ///     Select from 2 alternatives
    ///         Beginning of line or string
    ///         Whitespace, one or more repetitions
    /// [1]: A numbered capture group. [(\w+)(?:\s+|$)]
    ///     (\w+)(?:\s+|$)
    ///         [2]: A numbered capture group. [\w+]
    ///             Alphanumeric, one or more repetitions
    ///         Match expression but don't capture it. [\s+|$]
    ///             Select from 2 alternatives
    ///                 Whitespace, one or more repetitions
    ///                 End of line or string
    /// [3]: A numbered capture group. [\1|\2], one or more repetitions
    ///     Select from 2 alternatives
    ///         Backreference to capture number: 1
    ///         Backreference to capture number: 2
    /// </summary>
    class Class1
    {
        /// <summary>
        /// Main entry point of the application.
        /// </summary>
        static void Main(string[] args)
        {
            // Verbatim string (@) so that \s, \w, \1 and \2 reach the regex engine unchanged.
            Regex regex = new Regex(
                @"(?:^|\s+)((\w+)(?:\s+|$))(\1|\2)+",
                RegexOptions.IgnoreCase | RegexOptions.Compiled);
            string str = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
            string regexReplace = " $1";

            Console.WriteLine("Before :" + str);
            // Trim removes the spaces the replacement leaves at the start and end.
            str = regex.Replace(str, regexReplace).Trim();
            Console.WriteLine("After :" + str);
        }
    }
}