
PHP Simple HTML DOM Scrape External URL

I'm trying to build a personal project, but I'm a bit stuck using the Simple HTML DOM class.

What I'd like to do is scrape a website and retrieve all the elements, along with their inner HTML, that match a certain class.

My code so far is:

    <?php
    error_reporting(E_ALL);
    include_once("simple_html_dom.php");
    //use curl to get html content
    $url = 'http://www.peopleperhour.com/freelance-seo-jobs';

    $html = file_get_html($url);

    //Get all data inside the <div class="item-list">
    foreach($html->find('div[class=item-list]') as $div) {
        //get all div's inside "item-list"
        foreach($div->find('div') as $d) {
            //get the inner HTML
            $data = $d->outertext;
        }
    }
    print_r($data);
    echo "END";
    ?>

All I get with this is a blank page with "END"; nothing else is output at all.

It seems your $data variable is being overwritten on each iteration, so only the last matched div's markup survives the loop. Initialise it once and append to it instead:

    $data = "";
    foreach($html->find('div[class=item-list]') as $div) {
        //get all divs inside "item-list"
        foreach($div->find('div') as $d) {
            //append the outer HTML of each div
            $data .= $d->outertext;
        }
    }
    print_r($data);
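To see the difference between assigning and appending without hitting the live site, here is a minimal sketch. It swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) for Simple HTML DOM so that it runs on its own, and the markup in $markup is made up for illustration; it is not the real page.

```php
<?php
// Hypothetical markup standing in for the fetched page (an assumption).
$markup = '<div class="item-list">'
        . '<div class="item">First job</div>'
        . '<div class="item">Second job</div>'
        . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$last = '';
$all  = '';
// Direct child divs of the "item-list" container.
foreach ($xpath->query('//div[@class="item-list"]/div') as $node) {
    $outer = $doc->saveHTML($node); // outer HTML of this node
    $last  = $outer;                // "=" replaces the previous value
    $all  .= $outer;                // ".=" appends to the previous value
}

echo $last, "\n"; // only the final div is left
echo $all, "\n";  // both divs, concatenated
```

With plain assignment only the second div is left after the loop, while the appended string holds both.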

I hope that helps.

I think you may want something like this:

    $url = 'http://www.peopleperhour.com/freelance-seo-jobs';
    $html = file_get_html($url);
    foreach ($html->find('div.item-list div.item') as $div) {
        echo $div . '<br />';
    }

This prints the HTML of each matching item, one after another (if you add the proper style sheet, it will be displayed nicely).
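The selector 'div.item-list div.item' is a descendant selector: it only picks up div.item elements nested somewhere inside div.item-list. As a rough, self-contained illustration of that behaviour, the sketch below uses PHP's bundled DOMXPath rather than Simple HTML DOM, on made-up markup, and assumes each div carries exactly one class name (the XPath test is an exact match on the class attribute).

```php
<?php
// Hypothetical markup (an assumption): one .item inside .item-list,
// one unrelated div inside it, and one .item outside it.
$markup = '<div class="item-list">'
        . '<div class="item">Inside the list</div>'
        . '<div class="ad">Not an item</div>'
        . '</div>'
        . '<div class="item">Outside the list</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Rough XPath counterpart of the CSS selector "div.item-list div.item".
$matches = [];
foreach ($xpath->query('//div[@class="item-list"]//div[@class="item"]') as $node) {
    $matches[] = trim($node->textContent);
}

print_r($matches);
```

Only the item nested inside the list container is collected; the "ad" div and the item outside the container are skipped.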
