Difficulty with the preg_match_all function

Question

I want to get the number between the span HTML tags. The number can change!

<span class="topic-count">
  ::before
  "
             24
          "
  ::after
</span>

I tried the following code:

preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]);

But it doesn't work.

Full code:

$result=array();
$page = 201;
while ($page>=1) {
    $source = file_get_contents ("http://www.jeuxvideo.com/forums/0-27047-0-1-0-".$page."-0-counter-strike-global-offensive.htm");
    preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]);
    $result = array_merge($result, $nombre[$i][1]);
    print("Page : ".$page ."\n");
    $page-=25;
}
print_r ($nombre);

Answer 1

You can use

preg_match_all(
    '#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', 
    $html, 
    $matches
);

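This captures the first run of digits inside the span, skipping the whitespace around it. Your original pattern fails because, without the s modifier, the dot does not match newline characters, and the number in the span sits on its own line.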
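A small comparison, assuming markup shaped like the snippet in the question (the exact whitespace of the live page is an assumption). The ::before and ::after lines in the question are CSS pseudo-elements shown by the browser's developer tools; they are not part of the HTML that file_get_contents returns, so the sample leaves them out.

```php
<?php
// Sample markup modelled on the question's snippet (whitespace assumed).
$html = "<span class=\"topic-count\">\n             24\n          </span>";

// Original pattern: without the s modifier, "." does not match "\n",
// so (.*?) can never reach </span> on the next line.
$original = preg_match_all('#<span class="topic-count">(.*?)</span>#', $html, $m1);

// Same pattern with the s modifier: it matches, but the capture
// still contains the surrounding whitespace.
$withS = preg_match_all('#<span class="topic-count">(.*?)</span>#s', $html, $m2);

// Pattern from this answer: captures only the digits.
$answer = preg_match_all('#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', $html, $m3);

var_dump($original);        // int(0)
var_dump($withS);           // int(1)
var_dump(trim($m2[1][0]));  // string(2) "24"
var_dump($m3[1][0]);        // string(2) "24"
```

Adding the s modifier alone is enough to make the original pattern match, but then the result still has to be trimmed; the answer's pattern avoids that by matching the non-digits outside the capture group.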

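Note, however, that this regex only works for this exact piece of HTML. If the markup changes even slightly, for example another class or another attribute is added, the pattern no longer matches. Writing reliable regular expressions for HTML is hard.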
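To illustrate, here is the answer's pattern run against two hypothetical variations of the markup (neither is taken from the real page): one with an extra class and one with an extra attribute.

```php
<?php
// Pattern from the answer above.
$pattern = '#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s';

// Hypothetical markup variations, for illustration only.
$samples = [
    'original'    => '<span class="topic-count"> 24 </span>',
    'extra class' => '<span class="topic-count new"> 24 </span>',
    'extra attr'  => '<span data-id="7" class="topic-count"> 24 </span>',
];

// Record the captured number, or null when the pattern does not match.
$results = [];
foreach ($samples as $label => $html) {
    $results[$label] = preg_match($pattern, $html, $m) ? $m[1] : null;
}

print_r($results);
```

Only the "original" sample yields 24; both variations give null, because the pattern hard-codes the exact opening tag.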

It is therefore better to use a DOM parser instead, for example:

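// Silence the warnings libxml emits for real-world, non-conforming HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.jeuxvideo.com/forums/0-27047-0-1-0-1-0-counter-strike-global-offensive.htm');
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
// Every span whose class attribute contains "topic-count".
foreach ($xpath->evaluate('//span[contains(@class, "topic-count")]') as $node) {
    // Echo the first run of digits in the span's text, if there is one.
    if (preg_match('#\d+#', $node->nodeValue, $topics)) {
        echo $topics[0], PHP_EOL;
    }
}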

The DOM parser turns the whole page into a tree of nodes, which you can then query conveniently with XPath. Look at the expression:

//span[contains(@class, "topic-count")]

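This returns all span elements whose class attribute contains the string topic-count. Then, for each of those nodes that contains a number, the number is echoed. (contains() is a substring test, so it would also match a class such as topic-count-old.)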
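A minimal sketch that packages the DOM approach as a function and plugs it into the page loop from the question. The function names extractTopicCounts and collectTopicCounts are hypothetical, and the page source is passed in as a callable so the example runs offline with a stub instead of hitting the live forum. Unlike the question's loop, it accumulates everything into a single $result array rather than the undefined $nombre[$i].

```php
<?php
// Hypothetical helper: returns every number found in span elements whose
// class attribute contains "topic-count" (same XPath query as the answer).
function extractTopicCounts(string $html): array
{
    $dom = new DOMDocument();
    // Suppress libxml warnings, then restore the caller's previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $counts = [];
    foreach ($xpath->query('//span[contains(@class, "topic-count")]') as $node) {
        if (preg_match('#\d+#', $node->textContent, $m)) {
            $counts[] = (int) $m[0];
        }
    }
    return $counts;
}

// Hypothetical loop mirroring the question's full code: $fetch returns the
// HTML for a page number, so network access stays outside this function.
function collectTopicCounts(callable $fetch, int $firstPage = 201, int $step = 25): array
{
    $result = [];
    for ($page = $firstPage; $page >= 1; $page -= $step) {
        $result = array_merge($result, extractTopicCounts($fetch($page)));
    }
    return $result;
}

// Offline demo: a stub fetcher that puts the page number in the span.
$stub = function (int $page): string {
    return '<html><body><span class="topic-count">' . "\n  " . $page . "\n" . '</span></body></html>';
};
$counts = collectTopicCounts($stub, 51, 25); // pages 51, 26, 1
print_r($counts);
```

Against the real forum, the callable would simply wrap file_get_contents() around the URL built in the question's loop. Keeping the fetch separate makes the parsing testable without network access.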


Solution 1 (accepted, 2017-02-09 09:47:44)

函數 preg_match_all 的難點

問題描述

1 個解決方案

解決方案1 1 已采納 2017-02-09 09:47:44

解決方案1
1 已采納 2017-02-09 09:47:44