How can I scrape data from a table on a website using PHP?

I'm still new to PHP programming and have been trying to scrape data from a table on a website (https://en.wikipedia.org/wiki/HP_EliteBook). In particular, I want to get the EliteBook laptops that use an Intel graphics card, but I'm having trouble writing the code to access the data in the element I want. If anyone could give me an idea, I would be grateful.

I have been using simple_html_dom.php and a foreach loop to try to access the td elements of the table and print the result, but all I get is a variety of errors. Attached is the code I am currently trying:

<?php

include('simple_html_dom.php');
$html = file_get_html('https://en.wikipedia.org/wiki/HP_EliteBook');

$table= $html->find('table[class="wikitable"]',1);

//$tdata= array();

    foreach($table->find('tr') as $tr){
        $tdata[0] = $tr->find('td',0); //find the first td starts from 0
        $tdata[1] = $tr->find('td',1);
        $tdata[2] = $tr->find('td',2);
        $tdata[3] = $tr->find('td',3);
        $tdata[4] = $tr->find('td',4);
        $tdata[5] = $tr->find('td',5);

        $data[]= $tdata;
    }

        print_r($data);

?>

I expected to at least see the data from all the other cells.

The rows in this table have different numbers of columns, so for this table check the cells at index 3 and 4 to get the graphics card name.

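<?php
// simple_html_dom.php is assumed to sit next to this script.
include_once('simple_html_dom.php');

$html = file_get_html('https://en.wikipedia.org/wiki/HP_EliteBook');

// Third table with class "wikitable" (indexes start at 0).
$table = $html->find('table.wikitable', 2);
$useIntel = array();

foreach ($table->find('tr') as $tr) {
    // Skip header rows, which contain no td cells.
    if (!$tr->find('td', 0)) {
        continue;
    }

    // The graphics card name sits in column 3 or 4, depending on the row.
    for ($i = 3; $i <= 4; $i++) {
        $td = $tr->find('td', $i);
        if ($td && strpos($td->innertext, 'Intel') !== false) {
            $useIntel[] = $tr->find('td', 0)->innertext;
            continue 2; // move on to the next row
        }
    }
}

echo "<pre>";
var_dump($useIntel);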
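
For comparison, the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of simple_html_dom. This is only an illustration under stated assumptions: the HTML is made-up sample data rather than the real Wikipedia table, and the function name findRowsMentioning is hypothetical. It shows why rows with fewer cells need a null check before reading column 3 or 4.

```php
<?php
// Made-up sample data (an assumption): rows deliberately have different cell counts.
$sampleHtml = <<<HTML
<table class="wikitable">
  <tr><th>Model</th><th>Display</th><th>CPU</th><th>Graphics</th><th>Alt. graphics</th></tr>
  <tr><td>Model A</td><td>14 in</td><td>Core i5</td><td>Intel HD Graphics</td><td>None</td></tr>
  <tr><td>Model B</td><td>15.6 in</td><td>Core i7</td><td>NVIDIA Quadro</td></tr>
  <tr><td>Model C</td><td>12.5 in</td><td>Core i5</td><td>AMD Radeon</td><td>Intel HD Graphics</td></tr>
  <tr><td>Model D</td><td>17 in</td></tr>
</table>
HTML;

/**
 * Hypothetical helper: return the first-cell text of every row in which
 * one of the given column indexes contains $needle (case-insensitive).
 */
function findRowsMentioning(string $html, array $columns, string $needle): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely clean.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $matches = [];

    foreach ($xpath->query('//table[contains(@class, "wikitable")]//tr') as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length === 0) {
            continue; // header row: it only has th cells
        }
        foreach ($columns as $index) {
            $cell = $cells->item($index); // null when the row is shorter
            if ($cell !== null && stripos($cell->textContent, $needle) !== false) {
                $matches[] = trim($cells->item(0)->textContent);
                break; // one match per row is enough
            }
        }
    }

    return $matches;
}

$useIntel = findRowsMentioning($sampleHtml, [3, 4], 'Intel');
print_r($useIntel);
```

With the sample rows above, only Model A and Model C are collected: Model B has no cell at index 4 and Model D has none at index 3 or 4, so the null check skips them instead of raising an error. To try this against the real page, the HTML string would have to be fetched first, and the XPath would need narrowing to the specific table.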
