PHP regex to find substrings in a large string: matching a start and an end

Question

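I want to find page titles in a huge haystack, but there are no classes or unique IDs, so I can't use a DOM parser here and I know I have to use a regular expression. Here is an example of what I'm looking for: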

<a href="http://example.com/xyz">
    Series Hell In Heaven information
</a>
<a href="http://example.com/123">
    Series What is going information
</a>

The output should be an array:

[0] => Series Hell In Heaven information
[1] => Series What is going information

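Every series title starts with Series and ends with information. From a huge string full of other content, I only want to extract the titles. I am currently trying a regular expression, but it does not work. This is what I am doing right now: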

$reg = "/^Series\..*information$/";
$str = $html;
preg_match_all($reg, $str, $matches);
echo "<pre>";
    print_r($matches);
echo "</pre>";

I don't know much about writing regular expressions. Any help would be appreciated. Thanks.

Answer 1

Try this:

$str = '<a href="http://example.com/xyz">
    Series Hell In Heaven information
</a>
<a href="http://example.com/123">
    Series What is going information
</a>';
preg_match_all('/Series(.*?)information/', $str, $matches);
echo "<pre>";
    print_r($matches);
echo "</pre>";

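The captures will be in $matches[1]. Basically, your regex did not match because of the \. in it, which requires a literal dot immediately after Series.

[Edit]

If you also need the words Series and information, there is no need to capture: use /Series.*?information/ and find the matches in $matches[0].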
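
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the sample markup from the question. The variable names ($html, $failed, $found, $titles, $middles) are illustrative only and not part of the original answer.

```php
<?php
// Sample input copied from the question.
$html = '<a href="http://example.com/xyz">
    Series Hell In Heaven information
</a>
<a href="http://example.com/123">
    Series What is going information
</a>';

// The question's pattern: without the /m flag, ^ and $ anchor to the whole
// string, and \. demands a literal dot right after "Series".
$failed = preg_match_all('/^Series\..*information$/', $html, $none);

// The pattern from this answer: a lazy match between the two words.
$found = preg_match_all('/Series(.*?)information/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured middle part.
$titles  = $matches[0];
$middles = array_map('trim', $matches[1]);

print_r($titles);
print_r($middles);
```

The captured middle parts carry the surrounding spaces, which is why the sketch passes them through trim().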

Answer 2

Try

 preg_match_all('/(Series.+?information)/', $str, $matches );

as in

https://regex101.com/r/oJ0jZ4/1

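As I said in the comments, remove the literal \. dot as well as the start and end anchors. I would also use a non-greedy match that requires at least one character: .+?

Otherwise (with .*? instead) you could match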

Seriesinformation

If the letter case of Series or information might vary, for example

series....information

add the /i flag, like this:

     preg_match_all('/(Series.+?information)/i', $str, $matches );

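The outer capture group is not really needed, but I think it looks better with it there. If you only want the variable content without Series or information, move the capture ( ) to that part: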

 preg_match_all('/Series(.+?)information/i', $str, $matches );

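Note that you will need to trim() the matches, because they may have whitespace at the start and end, or else add the whitespace to the regex like this: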

 preg_match_all('/Series\s(.+?)\sinformation/i', $str, $matches );

But this will no longer match Series information, where only a single space separates the two words.
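
The two options above (trimming afterwards versus putting \s into the pattern) can be compared side by side. This is only a sketch using the sample markup from the question; the variable names are made up for illustration.

```php
<?php
// Sample input copied from the question.
$str = '<a href="http://example.com/xyz">
    Series Hell In Heaven information
</a>
<a href="http://example.com/123">
    Series What is going information
</a>';

// Option 1: capture everything between the words, then trim() each capture.
preg_match_all('/Series(.+?)information/i', $str, $loose);
$trimmed = array_map('trim', $loose[1]);

// Option 2: consume one whitespace character on each side inside the pattern.
preg_match_all('/Series\s(.+?)\sinformation/i', $str, $strict);
$spaced = $strict[1];

// The trade-off: option 2 needs whitespace on both sides of the capture,
// so a title with nothing but one space in the middle is skipped.
$edgeCase = preg_match('/Series\s(.+?)\sinformation/i', 'Series information');

print_r($trimmed);
print_r($spaced);
var_dump($edgeCase);
```

Both options give the same captures for the sample, so the choice mostly depends on whether an empty title such as Series information should count as a match.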

If you want to make sure that something like the following

[Series Hell In Heaven information Series Hell In Heaven information]

is not matched in its entirety as one match, you can use a positive lookbehind:

preg_match_all('/(Series.+?(?<=information))/i', $str, $matches );

Conversely, if there is a chance that a title contains the word information twice:

   <a href="http://example.com/123">
        Series information is power information
   </a>

you can do this:

    preg_match_all('/(Series[^<]+)</i', $str, $matches );

This matches everything up to the < of the closing </a.
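
A short sketch of why the lazy pattern falls short for such a title and how the [^<]+ variant behaves. The variable names are illustrative; note that the capture includes the line break and indentation before the closing tag, so it is trimmed here.

```php
<?php
// A title that contains the word "information" twice, as in the example above.
$str = '<a href="http://example.com/123">
        Series information is power information
   </a>';

// The lazy pattern stops at the first "information" it can reach.
preg_match_all('/(Series.+?information)/i', $str, $lazy);
$short = $lazy[1][0];

// [^<]+ keeps going until the next "<", which here belongs to the closing tag.
preg_match_all('/(Series[^<]+)</i', $str, $upToTag);
$full = trim($upToTag[1][0]);

var_dump($short);
var_dump($full);
```

This variant relies on the title being followed directly by a tag, so it would also swallow any other text that sits between the title and the next <.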

As a side note, you could use the PHPQuery library (which is a DOM parser) and then look for the a tags that contain those words.

https://github.com/punkave/phpQuery

and

https://code.google.com/archive/p/phpquery/wikis/Manual.wiki

using something like

  $tags = $doc->find("a:contains('Series')")->text();

It is an excellent library for parsing HTML.
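
If installing a third-party library is not an option, the same idea can be sketched with PHP's built-in DOM extension instead of PHPQuery. This is a swapped-in technique rather than the library recommended above, and it assumes the titles are the text of a elements, as in the sample from the question; the variable names are illustrative.

```php
<?php
// Sample input copied from the question.
$html = '<a href="http://example.com/xyz">
    Series Hell In Heaven information
</a>
<a href="http://example.com/123">
    Series What is going information
</a>';

// Collect parser warnings internally instead of printing them, since
// real-world HTML is rarely well formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Keep the text of every link that starts with "Series" and ends with "information".
$titles = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $text = trim($link->textContent);
    if (preg_match('/^Series\b.*\binformation$/i', $text)) {
        $titles[] = $text;
    }
}

print_r($titles);
```

Here the anchors ^ and $ are safe to use because the pattern is applied to the text of one link at a time rather than to the whole page.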

Answer 1
1 2016-07-30 07:03:39

Answer 2
1 Accepted 2016-07-30 07:05:54