Extracting data from HTML using Simple HTML DOM Parser

Question

For a college project, I am building a website with some back-end algorithms, and to test them in a demo environment I need a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelancer.com. To extract the data I am using the Simple HTML DOM Parser, but so far I have been unable to get the data I need.

Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the data I need.

[Screenshot of the HTML code on freelancer.com]

Here is the code I have written so far after following some tutorials.

<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {

    foreach($tr->find('td[class=title-col]') as $t) {
        // Get the cell's outer HTML (the <td> tag plus its contents)
        $data = $t->outertext;
        echo $data;
    }
}

?>

Hopefully someone can point me in the right direction on how to get this working.

Thanks.

Answer 1

The raw source code of the page differs from the markup in your screenshot, which is why you're not getting the expected results.

You can check the raw source code with Ctrl+U. The data is in table[id=project_table_static], and its td cells have no attributes, so here is working code that gets all the URLs from the table:

include "simple_html_dom.php";

$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);

// Get every <tr> inside <table id="project_table_static">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {

    // Skip the first empty element
    if ($i==0) {
        continue;
    }

    echo "<br/>\$i=".$i;

    // Get the first anchor in the row; find() returns null when there is none
    $anchor = $tr->find('a', 0);
    if ($anchor) {
        echo " => ".$anchor->href;
    }
}

// Clear dom object
$html->clear(); 
unset($html);

Demo
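To see for yourself which table ids the raw source contains before choosing a selector, something along these lines may help. It is only a sketch: it uses PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM so that it runs without simple_html_dom.php, and it works on a small inline sample rather than the live page. The sample markup, the example URLs, and the helper names loadDocument, listTableIds and extractProjectLinks are all made up for illustration; the real page may look different.

```php
<?php
// Hypothetical stand-in for the raw page source (what Ctrl+U shows).
// The real markup on the site may differ.
$rawHtml = <<<'HTML'
<html><body>
<table id="project_table_static">
  <tbody>
    <tr><td></td></tr>
    <tr><td><a href="/projects/example-one.html">Example one</a></td><td>5</td></tr>
    <tr><td><a href="/projects/example-two.html">Example two</a></td><td>12</td></tr>
  </tbody>
</table>
</body></html>
HTML;

// Parse an HTML string, silencing the warnings that sloppy real-world HTML triggers.
function loadDocument(string $html): DOMDocument
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

// List the id attribute of every <table>, to check which ids really exist in the raw source.
function listTableIds(string $html): array
{
    $ids = [];
    foreach (loadDocument($html)->getElementsByTagName('table') as $table) {
        $ids[] = $table->getAttribute('id');
    }
    return $ids;
}

// Collect the first link of each row in the given table; rows without a link are skipped.
// $tableId is interpolated into the XPath, so only pass trusted values.
function extractProjectLinks(string $html, string $tableId): array
{
    $xpath = new DOMXPath(loadDocument($html));
    $links = [];
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $anchor = $xpath->query('.//a[@href]', $tr)->item(0);
        if ($anchor === null) {
            continue; // e.g. the empty first row
        }
        $links[] = [
            'title' => trim($anchor->textContent),
            'href'  => $anchor->getAttribute('href'),
        ];
    }
    return $links;
}

$tableIds = listTableIds($rawHtml);
$links    = extractProjectLinks($rawHtml, 'project_table_static');

print_r($tableIds);
print_r($links);
```

Skipping rows that have no anchor replaces the `$i==0` check in the code above, so it keeps working if the empty row moves or disappears. To try it against the real page, you could pass the result of file_get_contents($url) in place of $rawHtml.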

Answer 1
1 Accepted 2013-11-07 22:19:53
