Regular expression to parse multi-line string

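I am extracting the notes log from ClearQuest records (yes, I know, CQ is ancient). It contains several items that I need to wrap in double quotes and separate with semicolons, essentially so that they can be imported into Jira as comments. Below is an example of what a notes log might contain. It is stored in a variable in Perl.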

===== State: In_Work by:user1 at 12/13/2010 10:47:23 =====

Generic notes log entry that can span multiple lines depending on length
of sentence.

===== State: In_Work by:user2 at 06/04/2010 17:34:42 =====

Another generic notes log entry.

From the example above, the end result I need looks like this:

my @notes_log_entries = ("\"12/13/2010 10:47:23;user1;Generic notes log entry that can span multiple lines depending on length\r\nof sentence.\"", "\"06/04/2010 17:34:42;user2;Another generic notes log entry.\"");

The following code works, but only for a variable that contains exactly two notes log entries:

$Notes_Log = $resultset->GetColumnValue(7);
print "Notes Log Before: $Notes_Log\n";
$Notes_Log =~ s/\R//g;
$Notes_Log =~ s/^===== State: .* by:(.*) at (.*) =====(.*)===== State: .* by:(.*) at (.*) =====(.*)/rtx-$1;$2;$3\nrtx-$4;$5;$6/g;
print "Notes Log After:\n$Notes_Log\n";

Here is some sample output from the code above:

Notes Log Before:
===== State: In_Work by:user1 at 12/13/2010 10:47:23 =====

Generic notes log entry that can span multiple lines depending on length
of sentence.

===== State: In_Work by:user2 at 06/04/2010 17:34:42 =====

Another generic notes log entry.

Notes Log After:
rtx-user1;12/13/2010 10:47:23;Generic notes log entry that can span multiple lines depending on length
of sentence.
rtx-user2;06/04/2010 17:34:42;Another generic notes log entry.

There are certainly several ways to solve this; here is mine:

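my @notes_log_entries = ();

# Each pass removes one entry (header, blank line, body) from $Notes_Log.
while ($Notes_Log =~ s/===== State: .* by:(.*) at (.*) =====\R\R([\w\s.]*)\R?\R?//) {
    my ($user, $stamp, $text) = ($1, $2, $3);
    $text =~ s/\s+\z//;    # trim the blank lines captured after the body
    push(@notes_log_entries, '"' . $stamp . ';' . $user . ';' . $text . '"');
}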

A while loop often helps me build regular expressions, because it reduces the complexity of the regex itself.
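One limitation of a character class such as [\w\s.] is that the body capture stops at the first comma, colon, or other punctuation mark. The loop idea can also be written as a non-destructive global match that accepts any body text. What follows is only a sketch under a few assumptions: every entry starts with a "===== State: ..." header at the beginning of a line, the body runs until the next such header or the end of the string, and inner line breaks should come out as \r\n as in the desired result from the question. The sample data is hard-coded here in place of the GetColumnValue call.

```perl
use strict;
use warnings;

# Sample data copied from the question; in the real script this would be
# the value returned by $resultset->GetColumnValue(7).
my $Notes_Log = join "\n",
    '===== State: In_Work by:user1 at 12/13/2010 10:47:23 =====',
    '',
    'Generic notes log entry that can span multiple lines depending on length',
    'of sentence.',
    '',
    '===== State: In_Work by:user2 at 06/04/2010 17:34:42 =====',
    '',
    'Another generic notes log entry.',
    '';

my @notes_log_entries;

# With /g in scalar context each pass resumes where the previous match
# ended. The header part stays on one line; only the body group uses
# (?s:...) so that it may span lines, and the lookahead stops it at the
# next header line or at the end of the string.
while ($Notes_Log =~ /^===== State: .* by:(.*) at (.*) =====\R+((?s:.*?))(?=^===== State: |\z)/mg) {
    my ($user, $stamp, $text) = ($1, $2, $3);
    $text =~ s/\s+\z//;      # drop the blank lines that trail the body
    $text =~ s/\R/\r\n/g;    # normalise inner line breaks to CRLF
    push @notes_log_entries, qq{"$stamp;$user;$text"};
}

print "$_\n" for @notes_log_entries;
```

For the sample above this collects two entries, one per header, and leaves $Notes_Log itself unmodified, which can be convenient if the original text is still needed afterwards.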

