
Regex matching chunks of multiline text?

I have a text file that contains more than 200 records in the following format:

 @INPROCEEDINGS{Rajan-Sullivan03,
  author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}

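Basically, a record starts with @ and ends with }, so I tried to match from an @ up to }\n}. That did not work: it matches only the first record and not the other one, because there is no newline after the last record.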

            string pattern = @"(^@)([\s\S]*)(}$\n}(\n))";

When I tried to fix that by making the trailing newline optional (\n*), it matched all the content as a single match:

 string pattern = @"(^@)([\s\S]*)(}$\n}(\n*))";

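I kept trying until I arrived at the pattern below, but it does not work correctly. Please fix it, or suggest a more efficient solution, along with some explanation.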

Here is my code:

string pattern = @"(^@)([\s\S]*)(}$\n}(\n))";
Regex regex = new Regex(pattern, RegexOptions.Multiline);
var matches = regex.Matches(bibFileContent).Cast<Match>().Select(m => m.Value).ToList();

If you use the Matches method, you need a pattern like this one, which deals with the curly brackets:

string pattern = @"@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))}";
Regex regex = new Regex(pattern);

Or, to ensure that all results are well formed (from a bracket point of view):

string pattern = @"\G[^{}]*(@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))})";

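Both patterns use a named capture as a counter. The counter is incremented each time an opening bracket is encountered and decremented each time a closing bracket is encountered. (?(open)(?!)) is a conditional test that makes the pattern fail if the counter is not empty, that is, if an opening bracket is still unmatched.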
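To make the counter idea concrete, here is a minimal sketch that runs the first pattern over two shortened records. The sample input and the variable names are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: two short records, both with nested braces.
string input = string.Join("\n", new[]
{
    "@INPROCEEDINGS{key-one,",
    "  title = {{{Eos}: Instance-Level Aspects}},",
    "  year = {2003}",
    "}",
    "",
    "@INPROCEEDINGS{key-two,",
    "  title = {Domain Models Are {NOT} Aspect Free}",
    "}"
});

// Pattern from the answer: the "open" group works as a stack of unmatched '{'.
// A '}' inside the loop can only match while the stack is non-empty, so the
// loop stops right before the brace that closes the record.
string pattern = @"@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))}";

var records = Regex.Matches(input, pattern)
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .ToList();

foreach (string record in records)
{
    Console.WriteLine("--- record ---");
    Console.WriteLine(record);
}
```

Each match should span one whole record, from the @ to its closing brace, regardless of how deeply the braces in the fields are nested.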


If the blocks do not contain the @ character, it is more convenient to use the Regex.Split(input, pattern) method:

string[] result = Regex.Split(input, @"[^}]*(?=@)");

If the blocks may contain the @ character, you can make it more robust with a more descriptive lookahead:

string[] result = Regex.Split(input, @"[^}]*(?=@[A-Z]+{)");

Or:

string[] result = Regex.Split(input, @"\s*(?=@[A-Z]+{)");
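As a rough sketch of the Split approach, the snippet below uses the last pattern on two made-up records, one of which has an @ inside a field. Regex.Split can return empty entries (for example when the input starts with a record), so this sketch trims and filters the pieces; that clean-up step is an addition, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: the first record contains an '@' inside a field.
string input = string.Join("\n", new[]
{
    "@INPROCEEDINGS{key-one,",
    "  owner = {someone@example.org},",
    "  year = {2003}",
    "}",
    "",
    "@ARTICLE{key-two,",
    "  year = {2006}",
    "}",
    ""
});

// Split in front of every "@TYPE{" header; the lookahead consumes nothing,
// so the header stays attached to the record that follows it.
string[] parts = Regex.Split(input, @"\s*(?=@[A-Z]+{)");

// Drop empty pieces and surrounding whitespace (clean-up added for this sketch).
var records = parts.Select(p => p.Trim())
                   .Where(p => p.Length > 0)
                   .ToList();

foreach (string record in records)
{
    Console.WriteLine("--- record ---");
    Console.WriteLine(record);
}
```

Because the lookahead requires uppercase letters followed by {, the lowercase address in the owner field does not trigger a split. This relies on the entry types being written in uppercase, as they are in the question.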

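I think the problem is that your input does not end with \n, so the second record does not match. You should replace the trailing \n with $.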
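One possible way to apply this, sketched under some assumptions: the pattern below anchors the end of a record with $ instead of \n and also makes the quantifier lazy. The lazy quantifier is an addition to the suggestion, since a greedy [\s\S]* would run through to the last record. The sketch assumes that the closing brace of a record is the only } that starts a line; the sample input is made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: note that there is no newline after the last record.
string input = string.Join("\n", new[]
{
    "@INPROCEEDINGS{key-one,",
    "  year = {2003}",
    "}",
    "",
    "@INPROCEEDINGS{key-two,",
    "  year = {2006}",
    "}"
});

// Pattern from the question: it needs a newline after the closing brace.
var original = Regex.Matches(input, @"(^@)([\s\S]*)(}$\n}(\n))", RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();

// Sketch of a fix: lazy body, then a '}' at the start of a line, then end of line.
// With RegexOptions.Multiline, '$' also matches at the very end of the input.
// The optional \r is there for files with Windows line endings.
var records = Regex.Matches(input, @"^@[\s\S]*?^\}\r?$", RegexOptions.Multiline)
                   .Cast<Match>()
                   .Select(m => m.Value.TrimEnd())
                   .ToList();

Console.WriteLine("original pattern: {0} match(es)", original.Count);
Console.WriteLine("anchored pattern: {0} match(es)", records.Count);
```

On this input the question's pattern finds only the first record, while the anchored version finds both.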

This will get the records in group 1:

@(.*?)^}(?:[\r\n]+|$)


Note that you must use the m and s modifiers (RegexOptions.Multiline and RegexOptions.Singleline in .NET).

Use this code:

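string pattern = @"@(.*?)^}(?:[\r\n]+|$)";
Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.Singleline);
MatchCollection mc = regex.Matches(bibFileContent);
List<string> results = new List<string>();
// Group 1 of each match holds one record, without the leading @ and the closing brace.
foreach (Match m in mc)
{
    results.Add(m.Groups[1].Value);
}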

You can use a simple regex like this:

(@[^@]+)


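The idea is to match content that starts with @ and contains no other @. By the way, if you only want to match the pattern rather than capture it, remove the capturing group: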

@[^@]+
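A short sketch of how this behaves, using made-up records: it works when no field contains an @, but any @ inside a record starts a new match. Each match also carries the whitespace up to the next @, which the sketch trims.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample without any '@' inside the fields.
string cleanInput = "@MISC{k1,\n  year = {2019}\n}\n\n@MISC{k2,\n  year = {2020}\n}";

// Hypothetical sample where the first record contains an e-mail address.
string emailInput = "@MISC{k1,\n  note = {mail a@b.org}\n}\n\n@MISC{k2,\n  year = {2020}\n}";

var clean = Regex.Matches(cleanInput, @"@[^@]+")
                 .Cast<Match>()
                 .Select(m => m.Value.Trim())   // drop the blank line between records
                 .ToList();

var broken = Regex.Matches(emailInput, @"@[^@]+")
                  .Cast<Match>()
                  .Select(m => m.Value.Trim())
                  .ToList();

Console.WriteLine("without @ in fields: {0} matches", clean.Count);
Console.WriteLine("with @ in a field:   {0} matches", broken.Count);
```

The second input still holds two records but yields three matches, because the first record is cut at the @ of the address.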

This looks like a candidate for balancing groups.

 # @"(?m)^[^\S\r\n]*@[^{}]+(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\})"

 (?m)
 ^ [^\S\r\n]* 
 @ [^{}]+ 
 (?:
      \{                            # Match opening {
      (?>                           # Then either match (possessively):
           [^{}]+                        #   Anything (but only if we're not at the start of { or } )
        |                              # or
           \{                            #  { (and increase the braces counter)
           (?<Depth> )
        |                              # or
           \}                            #  } (and decrease the braces counter).
           (?<-Depth> )
      )*                            # Repeat as needed.
      (?(Depth)                     # Assert that the braces counter is at zero.
           (?!)                          # Fail if it isn't
      )
      \}                            # Then match a closing }. 
 )

Code sample

Regex FghRx = new Regex( @"(?m)^[^\S\r\n]*@[^{}]+(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\})" );
string FghData =
@"
@INPROCEEDINGS{Rajan-Sullivan03,
author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}
";

Match FghMatch = FghRx.Match(FghData);
while (FghMatch.Success)
{
    Console.WriteLine("New Record\n------------------------");
    Console.WriteLine("{0}", FghMatch.Groups[0].Value);
    FghMatch = FghMatch.NextMatch();
    Console.WriteLine("");
}

Output

New Record
------------------------
@INPROCEEDINGS{Rajan-Sullivan03,
author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

New Record
------------------------
@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}
