Scraping a website with PHP's DOMDocument

Question

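I have PHP code that extracts the categories and displays them. However, I still can't extract the number that goes with each category (without the parentheses). The category and the number need to be separate (not extracted together). Maybe another for loop with a regex, or something similar...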

Here is the code:

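<?php
    // Real-world HTML triggers many libxml warnings; collect them
    // internally instead of hiding them with the @ operator.
    libxml_use_internal_errors(true);
    $grep = new DOMDocument();
    $grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
    libxml_clear_errors();

    // Select every element whose class attribute contains "CatLevel1".
    $finder = new DOMXPath($grep);
    $class = "CatLevel1";
    $nodes = $finder->query("//*[contains(@class, '$class')]");

    // Print the first child of each match (the category name).
    foreach ($nodes as $node) {
        $span = $node->childNodes;
        echo $span->item(0)->nodeValue."<br>";
    }
?>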

Is there any way to do this? Thanks!

This is the output I want:

Arts, Antiques & Collectibles : 9768<br>
B2B & Industrial Products : 2342<br>
Baby : 3453<br>
etc...
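The question mentions a regex as one possible route. If the name and the count ever arrive as one string such as "Baby (3453)", a minimal sketch with preg_match can split them. The helper name splitCategory and the "Name (digits)" input format are assumptions for illustration, not taken from the lelong page.

```php
<?php
/**
 * Hypothetical helper: split a string such as "Baby (3453)" into
 * ["Baby", "3453"]. Returns null if the string does not end in "(digits)".
 */
function splitCategory(string $text): ?array
{
    // Lazily capture the name, then the digits inside the final parentheses.
    if (preg_match('/^\s*(.*?)\s*\((\d+)\)\s*$/u', $text, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

// Sample strings in the assumed format.
foreach (['Arts, Antiques & Collectibles (9768)', 'Baby (3453)'] as $line) {
    [$name, $count] = splitCategory($line);
    echo $name . ' : ' . $count . "<br>\n";
}
```

With the two sample strings this prints the lines in the format requested above. When the name and the count already sit in separate DOM nodes, reading the nodes directly avoids the regex entirely.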

Answer 1

Just read the next sibling node as well. Example:

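foreach ($nodes as $node) {
    $span = $node->childNodes;
    // item(0) holds the category name; its sibling item(1) holds "(count)".
    // str_replace strips the parentheses so only the number is printed.
    echo $span->item(0)->nodeValue . ' : ' . str_replace(array('(', ')'), '', $span->item(1)->nodeValue);
    echo '<br/>';
}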
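To try the sibling approach without hitting the live site, here is an offline sketch. The HTML string is an assumed stand-in for the lelong markup (a link followed by a span holding "(count)"), not copied from the real page. As a defensive tweak that is not part of the answer above, it skips whitespace-only text nodes, because a space between the two elements would otherwise shift item(1).

```php
<?php
// ASSUMED markup: a stand-in for the real category list, not the live page.
$html = '<div class="CatLevel1"><a href="#">Arts, Antiques &amp; Collectibles</a> <span>(9768)</span></div>'
      . '<div class="CatLevel1"><a href="#">Baby</a> <span>(3453)</span></div>';

libxml_use_internal_errors(true);   // collect HTML parse warnings quietly
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
$nodes = $finder->query("//*[contains(@class, 'CatLevel1')]");

$rows = [];
foreach ($nodes as $node) {
    // Keep only child nodes with visible text, so whitespace text nodes
    // between the name and the count do not change the positions.
    $parts = [];
    foreach ($node->childNodes as $child) {
        $text = trim($child->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    // First part is the name; second part is "(count)" minus the parentheses.
    $name  = $parts[0];
    $count = str_replace(['(', ')'], '', $parts[1]);
    $rows[] = [$name, $count];
    echo $name . ' : ' . $count . "<br>\n";
}
```

If the real page has a different child layout, inspect $node->childNodes first and adjust which parts are read.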

Edit: str_replace is used here simply to remove the parentheses.
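For comparison, a small sketch of three standard-library ways to turn a raw count such as "(9768)" into "9768". The padded sample value is an assumption; which variant fits depends on what the live page actually contains (for instance, only the preg_replace version would also drop a thousands separator like "9,768").

```php
<?php
// Assumed raw value, padded with spaces to show how each variant copes.
$raw = ' (9768) ';

// 1. str_replace, as in the answer: delete every "(" and ")", then trim spaces.
$a = trim(str_replace(['(', ')'], '', $raw));

// 2. trim with a character list: strip spaces and parentheses from both ends only.
$b = trim($raw, ' ()');

// 3. preg_replace: drop every non-digit character anywhere in the string.
$c = preg_replace('/\D+/', '', $raw);

echo $a, ' ', $b, ' ', $c, "\n";
```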

Side note: always declare UTF-8 encoding in your PHP files:

header('Content-Type: text/html; charset=utf-8');

Solution 1
0, Accepted, 2014-08-26 05:04:04
