Match all occurrences of a string

Question

My search text is as follows.

...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...

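It contains many lines (it is actually a JavaScript file), but I need to parse out the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.

Below is the Perl code; a PHP version is at the bottom.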

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";

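I know the script above matches every occurrence, but it also picks up strings on other lines (such as "xyz"). So I need to anchor the match on the string var strings =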

/var strings = \[(?:\"(.+?)\",?)/g

Using the regex above parses out aaa.

/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g

Using the above, I get aaa and bbb. So, to avoid repeating the regex, I used the "+" quantifier, as shown below.

/var strings = \[(?:\"(.+?)\",?)+/g

But I only get eee, so my question is: why do I get only eee when I use the '+' quantifier?

Update 1: using PHP preg_match_all (doing this to get more attention :-))

$str = <<<STR
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR;

preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);

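Update 2: Why is eee matched? Because of the greediness of (?:\"(.+?)\",?)+. With the greediness removed, /var strings = \[(?:\"(.+?)\",?)+?/ matches aaa. But why is there only one result? Is there any way to achieve this with a single regex?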
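The greedy and lazy behaviour described in Update 2 can be reproduced with a small Perl sketch. The one-line input is a shortened, hypothetical stand-in for the JavaScript file. In both cases the regex matches only once, because "var strings = [" occurs only once, so there is a single value for the capture group.

```perl
use strict;
use warnings;

# Hypothetical one-line stand-in for the JavaScript source.
my $str = 'var strings = ["aaa","bbb","ccc","ddd","eee"];';

# Greedy +: the group repeats five times; $1 keeps the last repetition.
my ($greedy) = $str =~ /var strings = \[(?:"(.+?)",?)+/;

# Lazy +?: the group repeats once, which is enough for the match to succeed.
my ($lazy) = $str =~ /var strings = \[(?:"(.+?)",?)+?/;

print "greedy: $greedy\n";   # greedy: eee
print "lazy:   $lazy\n";     # lazy:   aaa
```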

Answer 1

Here is a single-regex solution:

/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g

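\G is a zero-width assertion that matches at the position where the previous match ended (or at the beginning of the string if this is the first match attempt). So this acts like: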

var\s+strings\s*=\s*\[\s*"([^"]*)"

...on the first attempt, then:

,\s*"([^"]*)"

...after that, but each match has to start exactly where the previous one ended.

Here is a PHP demo, but it will work in Perl, too.
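As a rough Perl illustration of the same pattern, the sketch below runs it in list context with /g. The sample input is hypothetical; the "other" and "more" lines are added only to show that quoted strings outside var strings are not picked up.

```perl
use strict;
use warnings;

# Hypothetical sample input; only the "strings" array should be extracted.
my $str = <<'STR';
    var other = ["xyz"];
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    var more = ["zzz"];
STR

# In list context, /g returns the capture of group 1 once per match.
# The first match goes through the "var strings = [" branch; every later
# match must begin with a comma exactly where the previous match ended (\G).
my @matches = $str =~ /(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g;

print "@matches\n";   # aaa bbb ccc ddd eee
```

One caveat of this sketch: on the very first attempt \G also matches at offset 0, so input that begins with a comma followed by a quoted string would be matched as well.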

Answer 2

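You may prefer this solution, which first looks for the string var strings = [ using the /g modifier. This sets \G to match immediately after the [ for the next regex, which looks for all immediately following occurrences of double-quoted strings, possibly preceded by commas or whitespace.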

my @matches;

if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
  @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

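Despite the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once, because there is no second occurrence of var strings = [ . Each match returns a list of the values that the capture variables $1, $2, $3 etc. hold when the match completes, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures multiple values into $1 in turn, leaving only the final one there. You need to write something like the above, which captures only a single value into $1 for each match.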
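To make the hand-off between the two regexes more concrete, here is a small sketch that prints pos() after the first match. The input string is hypothetical; the quoted strings before and after the target array are there only to show that they are skipped.

```perl
use strict;
use warnings;

# Hypothetical input with quoted strings before and after the target array.
my $str = 'x = ["xyz"]; var strings = ["aaa","bbb"]; y = ["zzz"];';

my @matches;

# Scalar context with /g: one match, and Perl remembers where it ended.
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
    my $start = pos($str);   # offset just after the "["
    print "second regex starts at offset $start\n";

    # List context with /g: starts at pos($str); \G forces every match
    # to begin exactly where the previous one ended.
    @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

print "@matches\n";   # aaa bbb
```

Once the \G-anchored regex reaches the closing ], it cannot skip ahead, which is why "zzz" is never captured.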

Answer 3

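Because the + tells it to repeat the group (?:"(.+?)",?) one or more times within a single match. Each repetition overwrites the capture, so when the match ends after "eee" and no further repetition is found, only eee is left in the capture.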

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();

The regular expression:

(?-imsx:var strings = \[(?:"(.+?)",?)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

A simpler example would be:

my @m = ('abcd' =~ m/(\w)+/g);
print "@m";

This prints only d. That is because:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();

The regular expression:

(?-imsx:(\w)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

If you use a quantifier on a capture group, only the last repetition is kept.

Here is one approach that works:
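The effect is easiest to see by comparing where the quantifier sits. The following sketch reuses the 'abcd' example; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Quantifier on the group: one match, and $1 is overwritten on each repetition.
my @last = ('abcd' =~ /(\w)+/g);

# No quantifier on the group: /g produces one match (and one capture) per character.
my @each = ('abcd' =~ /(\w)/g);

# Quantifier inside the group: one match, and $1 holds the whole run.
my @run = ('abcd' =~ /(\w+)/g);

print "@last\n";   # d
print "@each\n";   # a b c d
print "@run\n";    # abcd
```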

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches;
if ($str =~ m/var strings = \[(.+?)\]/) { # get the array first
    my $jsarray = $1;
    @matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that
}

print "@matches";

Update: a single-line solution (though not a single regex) would be:

@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;

But this is very hard to read, IMHO.
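If readability is the concern, the same two-step idea could be wrapped in a small helper. This is only a sketch: js_array_strings is a hypothetical name, and it assumes that none of the quoted strings contains a ] or an escaped double quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the quoted items of `var NAME = [...]`,
# or an empty list when no such array is found.
sub js_array_strings {
    my ($source, $name) = @_;

    # Step 1: grab everything between the brackets of the named array.
    my ($body) = $source =~ /\bvar\s+\Q$name\E\s*=\s*\[(.*?)\]/s
        or return;

    # Step 2: pull each double-quoted item out of that text.
    return $body =~ /"([^"]*)"/g;
}

# Hypothetical sample input.
my $str = <<'STR';
    var other = ["xyz"];
    var strings = ["aaa","bbb","ccc","ddd","eee"];
STR

my @matches = js_array_strings($str, 'strings');
print "@matches\n";   # aaa bbb ccc ddd eee
```

The helper is meant to be called in list context, as above.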

3 solutions

Solution 1
2 accepted 2012-07-19 12:08:34

Solution 2
2 2012-07-19 14:39:09

Solution 3
1 2012-07-19 11:20:46
