Regular Expression to capture text

I have a log file with content like this:

  2012-07-16 03:20:41,23796160897,Text,id:SAR-23796160897-c0-2-1 sub:000 dlvrd:001 submit date:120715220216 done date:120716032038 stat:DELIVRD err:000 text:,FOTSO TOKAM,SMSCReceiptMsgId = SAR-23796160897-c0-2-1\n
  2012-07-16 03:20:48,23796160897,Text,id:SAR-23796160897-c0-2-2 sub:000 dlvrd:001 submit date:120715220216 done date:120716032045 stat:DELIVRD err:000 text:,FOTSO TOKAM,SMSCReceiptMsgId = SAR-23796160897-c0-2-2\n
  2012-05-04 00:07:46,23777603300,Text,id:4FA23EB0 sub:000 dlvrd:001 submit date:120503225018 done date:120504000744 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23EB0\n
  2012-05-04 01:50:18,23796726987,Text,id:4FA23E95 sub:000 dlvrd:001 submit date:120503225014 done date:120504015016 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23E95\n
  2012-05-04 01:50:22,23799757015,Text,id:4FA23EB2 sub:000 dlvrd:001 submit date:120503225018 done date:120504015021 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23EB2\n
  2012-05-04 01:50:48,23799907239,Text,id:4FA23F38 sub:000 dlvrd:001 submit date:120503225042 done date:120504015046 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23F38\n
  2012-05-04 01:50:48,23799896455,Text,id:4FA23D1C sub:000 dlvrd:001 submit date:120503175232 done date:120504015046 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23D1C\n
  2012-05-04 01:50:48,23799896455,Text,id:4FA23F04 sub:000 dlvrd:001 submit date:120503225031 done date:120504015046 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23F04\n
  2012-05-04 01:50:50,23794105044,Text,id:4FA23F55 sub:000 dlvrd:001 submit date:120503225046 done date:120504015048 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23F55\n
  2012-05-04 01:51:19,23796029764,Text,id:4FA23FEE sub:000 dlvrd:001 submit date:120503225114 done date:120504015117 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23FEE\n
  2012-05-04 02:17:51,23775461594,Text,id:4FA24025 sub:000 dlvrd:001 submit date:120503225125 done date:120504021749 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA24025\n
  2012-05-04 04:08:02,23777437781,Text,id:4FA23F23 sub:000 dlvrd:001 submit date:120503225037 done date:120504040800 stat:DELIVRD err:000 text:,FLP,SMSCReceiptMsgId = 4FA23F23\n
  2012-05-04 04:50:12,23777970013,Text,id,4FA23E70 sub:000 dlvrd:000 submit date:120503225005 done date:120504045011 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23E70\n
  2012-05-04 04:50:15,23775182832,Text,id:4FA23E7E sub:000 dlvrd:000 submit date:120503225008 done date:120504045014 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23E7E\n
  2012-05-04 04:50:17,23777789644,Text,id:4FA23E80 sub:000 dlvrd:000 submit date:120503225010 done date:120504045016 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23E80\n
  2012-05-04 04:50:21,23777529371,Text,id,4FA23E8F sub:000 dlvrd:000 submit date:120503225013 done date:120504045019 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23E8F\n
  2012-05-04 04:50:21,23777613852,Text,id,4FA23E97 sub:000 dlvrd:000 submit date:120503225014 done date:120504045020 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23E97\n
  2012-05-04 04:50:24,23777407598,Text,id,4FA23EAE sub:000 dlvrd:000 submit date:120503225017 done date:120504045023 stat:EXPIRED err:032 text:,FLP,SMSCReceiptMsgId = 4FA23EAE\n
  2012-05-04 04:50:26,23777736950,Text,id:4FA23EAF sub:000 dlvrd:000 submit date:120503225018 done date:120504045024 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23EAF\n
  2012-05-04 04:50:31,23775834128,Text,id,4FA23ED6 sub:000 dlvrd:000 submit date:120503225024 done date:120504045030 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23ED6\n
  2012-05-04 04:50:36,23777486441,Text,id,4FA23EF3 sub:000 dlvrd:000 submit date:120503225029 done date:120504045035 stat:EXPIRED err:027 text:,FLP,SMSCReceiptMsgId = 4FA23EF3\n

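Now I want to capture values from this content using a regular expression in C# .NET with LINQ, for a few specific fields such as "id", "done date" and "stat".

If anyone has any ideas, please help me.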

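I don't think Regex will help you much here. Instead, split the data into rows and then into columns: as far as I can see the data forms a matrix, from which you can easily extract the information you are looking for. You can do this in JavaScript, C#, Java or any other language.

In practice I do it like this:

  • Split the data into rows
  • Split each row into columns
  • Then loop over the rows and pick out the columns you are looking for.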

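     // Split on literal "\n" markers as well as real line breaks.
     var content = data.Split(new[] { "\\n", "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
     foreach (var line in content)
     {
         var cols = line.Trim().Split(',');
         var c1 = cols[0];
         var c2 = cols[1];
         var c3 = cols[2];
     }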

You can refine the snippet above to suit your needs. In my view this is the best approach.
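The split approach can be taken one step further to pull out the three fields the question asks for. The following is only a sketch: `SplitParser`, `Field` and `Parse` are hypothetical names, and it assumes the key:value pairs all sit in the fourth comma-separated column and that the record uses `id:` (a record written as `id,` would shift the columns and yield a null id).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitParser
{
    // Hypothetical helper: returns the text between "key:" and the next space,
    // or null when the key does not occur in the column.
    private static string Field(string column, string key)
    {
        int start = column.IndexOf(key + ":", StringComparison.Ordinal);
        if (start < 0) return null;
        start += key.Length + 1;
        int end = column.IndexOf(' ', start);
        return end < 0 ? column.Substring(start) : column.Substring(start, end - start);
    }

    // Splits the raw log into rows, each row into columns, and reads
    // id, done date and stat from the fourth column.
    public static IEnumerable<(string Id, string DoneDate, string Stat)> Parse(string data)
    {
        return data
            .Split(new[] { "\\n", "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim().Split(','))
            .Where(cols => cols.Length > 3)
            .Select(cols => (Id: Field(cols[3], "id"),
                             DoneDate: Field(cols[3], "done date"),
                             Stat: Field(cols[3], "stat")));
    }
}
```

Calling `SplitParser.Parse(data).Where(r => r.Stat == "EXPIRED")` would then list only the expired receipts.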

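It is not clear what all the fields mean, or whether the delimiters are constant. Using the test data you provided, most of the information can be captured into named groups.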

/// <summary>
///  Regular expression built for C# on: Tue, Jul 17, 2012, 12:08:12 PM
///  Using Expresso Version: 3.0.4334, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  Beginning of line or string
///  [Date]: A named capture group. [[^,]+]
///      Any character that is NOT in this class: [,], one or more repetitions
///  ,
///  [Number]: A named capture group. [[^,]+]
///      Any character that is NOT in this class: [,], one or more repetitions
///  ,
///  [Text1]: A named capture group. [[^,]+]
///      Any character that is NOT in this class: [,], one or more repetitions
///  ,
///  id:
///      id:
///  [ID]: A named capture group. [[^\s]+]
///      Any character that is NOT in this class: [\s], one or more repetitions
///  Whitespace
///  sub:
///      sub:
///  [Sub]: A named capture group. [\w+]
///      Alphanumeric, one or more repetitions
///  Whitespace
///  dlvrd:
///      dlvrd:
///  [Dlvrd]: A named capture group. [\w+]
///      Alphanumeric, one or more repetitions
///  Whitespace
///  submit\sdate:
///      submit
///      Whitespace
///      date:
///  [SubmitDate]: A named capture group. [\w+]
///      Alphanumeric, one or more repetitions
///  Whitespace
///  done\sdate:
///      done
///      Whitespace
///      date:
///  [DoneDate]: A named capture group. [\w+]
///      Alphanumeric, one or more repetitions
///  Whitespace
///  stat:
///      stat:
///  [Status]: A named capture group. [\w+]
///      Alphanumeric, one or more repetitions
///  Whitespace
///  err:
///      err:
///  [Error]: A named capture group. [\d+]
///      Any digit, one or more repetitions
///  Whitespace
///  
///
/// </summary>
public static Regex regex = new Regex(
      "^(?<Date>[^,]+),\r\n(?<Number>[^,]+),\r\n(?<Text1>[^,]+),\r\nid:(?"+
      "<ID>[^\\s]+)\\s\r\nsub:(?<Sub>\\w+)\\s\r\ndlvrd:(?<Dlvrd>\\w+)\\s"+
      "\r\nsubmit\\sdate:(?<SubmitDate>\\w+)\\s\r\ndone\\sdate:(?<DoneD"+
      "ate>\\w+)\\s\r\nstat:(?<Status>\\w+)\\s\r\nerr:(?<Error>\\d+)\\s",
    RegexOptions.Multiline
    | RegexOptions.ExplicitCapture
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );

With this in place, you can call:

var matches = regex.Matches(inputData);

Personally, I would recommend limiting the test to a single line of data and calling:

var match = regex.Match(inputLineOfData);

Which means you can then do:

if ( match.Success )
{
   var id = match.Groups["ID"].Value;
   var submitDate = match.Groups["SubmitDate"].Value;  // Parse to DateTime
   var doneDate = match.Groups["DoneDate"].Value;  // Parse to DateTime

   // etc for 'sub', 'dlvrd', 'Status', 'Error'..
}
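The "Parse to DateTime" comments above could be implemented roughly as follows. This sketch assumes the 12-digit values are laid out as two-digit year, month, day, hour, minute, second (which is consistent with the sample, where `120716032038` sits next to a log timestamp of 2012-07-16 03:20:41); `ReceiptDates` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class ReceiptDates
{
    // Assumed layout of the submit/done date fields: yyMMddHHmmss.
    private const string Format = "yyMMddHHmmss";

    // Returns the parsed timestamp, or null when the value does not fit the format.
    public static DateTime? Parse(string value)
    {
        DateTime parsed;
        return DateTime.TryParseExact(value, Format, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out parsed)
            ? parsed
            : (DateTime?)null;
    }
}
```

For example, `ReceiptDates.Parse(match.Groups["DoneDate"].Value)` gives a nullable `DateTime` that is null for malformed input instead of throwing.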

A CSV parser would probably be a better fit, but you can use this regex and swap id: for whichever other field you want, for example done date:(?<donedate>.*?)\s

string strRegex = @"id:(?<id>.*?)\s.*?done date:(?<donedate>.*?)\s.*?stat:(?<stat>.*?)\s";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase | RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"2012-07-16 03:20:41,23796160897,Text,id:SAR-23796160897-c0-2-1 sub:000 dlvrd:001 submit date:120715220216 done date:120716032038 stat:DELIVRD err:000 text:,FOTSO TOKAM,SMSCReceiptMsgId=SAR-23796160897-c0-2-1";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
  if (myMatch.Success)
  {
    // Add your code here 
    //myMatch.Groups["id"].Value;
    //myMatch.Groups["donedate"].Value;
    //myMatch.Groups["stat"].Value;
  }
}

You can use a regex such as id:(?<id>.*?)\s.*?done date:(?<donedate>.*?)\s.*?stat:(?<stat>.*?)\s and then read the groups with, for example, myMatch.Groups["id"].Value
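Since the question also mentions LINQ, the same pattern can be combined with a query over the log lines. This is a minimal sketch, assuming each element of `lines` holds one record; `ReceiptQuery` and `IdsWithStatus` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReceiptQuery
{
    // Same pattern as in the answer above.
    private static readonly Regex Pattern = new Regex(
        @"id:(?<id>.*?)\s.*?done date:(?<donedate>.*?)\s.*?stat:(?<stat>.*?)\s",
        RegexOptions.IgnoreCase);

    // Returns the ids of all records whose stat field equals the given status.
    public static List<string> IdsWithStatus(IEnumerable<string> lines, string status)
    {
        return lines
            .Select(line => Pattern.Match(line))
            .Where(m => m.Success && m.Groups["stat"].Value == status)
            .Select(m => m.Groups["id"].Value)
            .ToList();
    }
}
```

Lines that do not match the pattern (for instance records written with `id,` instead of `id:`) are skipped rather than reported, so a real implementation may want to log them.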

