regex match second occurrence of a string in Perl

I'm trying to match the first and second occurrence of a string in Perl. The first few lines of input (contained in @intersect) are:

          'gi|112807938|emb|CU075707.1|_Xenopus_tropicalis_finished_cDNA,_clone_TNeu129d01  C1:TCONS_00039972(XLOC_025068),_12.9045:32.0354,_Change:1.3118,_p:0.00025,_q:0.50752  C2:TCONS_00045925(XLOC_029835),_10.3694:43.8379,_Change:2.07985,_p:0.0004,_q:0.333824',
          'gi|115528274|gb|BC124894.1|_Xenopus_laevis_islet-1,_mRNA_(cDNA_clone_MGC:154537_IMAGE:8320777),_complete_cds C1:TCONS_00080221(XLOC_049570),_17.9027:40.8136,_Change:1.18887,_p:0.00535,_q:0.998852  C2:TCONS_00092192(XLOC_059015),_17.8995:35.5534,_Change:0.990066,_p:0.0355,_q:0.998513',
          'gi|118404233|ref|NM_001078963.1|_Xenopus_(Silurana)_tropicalis_pancreatic_lipase-related_protein_2_(pnliprp2),_mRNA  C1:TCONS_00031955(XLOC_019851),_0.944706:5.88717,_Change:2.63964,_p:0.01915,_q:0.998852 C2:TCONS_00036655(XLOC_023660),_2.31819:11.556,_Change:2.31757,_p:0.0358,_q:0.998513',

The information I'm trying to extract is the 'Change:[value]' for both C1 and C2 (which are separated by tabs), using the following:

#!/usr/bin/perl -w
use strict; 
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my @log_change;
foreach (@intersect) {
    chomp;
    my @condition1_match = ($_ =~ /(C1:).*Change:(-?\d+\.\d+)/g);
    my @condition2_match = ($_ =~ /(C2:).*Change:(-?\d+\.\d+)/g);
    push @log_change, "@condition1_match\t@condition2_match";
  }

print Dumper (\@log_change);

Prints:

      'C1: 2.07985    C2: 2.07985',
      'C1: 0.990066    C2: 0.990066',
      'C1: 2.31757    C2: 2.31757',

i.e. the same value for C1 and C2. It's clear that my loop stores the value for C2 in both @condition1_match and @condition2_match.

My question is: how can I specify that the first occurrence of 'Change:[value]' should be pushed onto @condition1_match and the second onto @condition2_match?

What is happening is that your regexes match as much as possible where you have the .* (the quantifier is greedy), so each pattern runs on to the last Change: on the line. What you need to do is make the quantifier lazy (non-greedy), which is done by adding a question mark (?) after it:

my @condition1_match = ($_ =~ /(C1:).*?Change:(-?\d+\.\d+)/g);
                                  #   ^
my @condition2_match = ($_ =~ /(C2:).*?Change:(-?\d+\.\d+)/g);
                                  #   ^

That way, the regex will match as few characters as possible until it 'sees' Change:(-?\d+\.\d+).
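To see the difference side by side, here is a minimal sketch that runs both forms of the C1 pattern against one line. The line in $line is a shortened, made-up stand-in for the real input (it only keeps the same C1/C2/Change layout), and the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, made-up line in the same shape as the question's input
# (an assumption for illustration, not real data).
my $line = "clone_X\tC1:TCONS_1(XLOC_1),_Change:1.3118,_p:0.00025\tC2:TCONS_2(XLOC_2),_Change:2.07985,_p:0.0004";

# Greedy: .* first runs to the end of the line, then backtracks
# until it finds "Change:", so it lands on the LAST one.
my ($greedy) = $line =~ /C1:.*Change:(-?\d+\.\d+)/;

# Lazy: .*? expands one character at a time and stops at the
# FIRST "Change:" that follows "C1:".
my ($lazy) = $line =~ /C1:.*?Change:(-?\d+\.\d+)/;

print "greedy: $greedy\n";   # greedy: 2.07985
print "lazy:   $lazy\n";     # lazy:   1.3118
```

The greedy result is the C2 value, which is exactly the duplicate seen in the question's output.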

You can check exactly what you are matching on an online regex tester, for example this site.
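As an alternative to the two lazy patterns (a different technique, not the fix described above), a single global match in list context returns every Change value in order of appearance, so the first and second occurrence can be picked off by position. This is only a sketch: it assumes each line has exactly one Change: per condition with C1 before C2, and $sample is a made-up line rather than real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up line with the same C1/C2/Change layout as the question's input.
my $sample = "clone_X\tC1:TCONS_1(XLOC_1),_Change:1.3118,_p:0.00025\tC2:TCONS_2(XLOC_2),_Change:2.07985,_p:0.0004";

# With /g in list context, the match returns all captured values,
# left to right: first occurrence, then second.
my @changes = $sample =~ /Change:(-?\d+\.\d+)/g;
my ($c1_change, $c2_change) = @changes;

print "C1: $c1_change\tC2: $c2_change\n";
```

This avoids the .* question entirely, at the cost of relying on the order of the fields rather than on the C1:/C2: labels.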
