
How to get only the first few tags with PHP Simple HTML DOM Parser

I'm trying to get the text of the first 3 td tags in each row using PHP Simple HTML DOM Parser and collect them in an array.

The table looks like this:

<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>

        .....
        .....
    </tbody>
</table>

What I want is to collect these into arrays, excluding the 4th td of each tr:

array(
   array(
      'art' => 'Floyd',
      'thing' => 'Machine',
      'passion' => 'Banking',
   ),
   array(
      'art' => 'Nirvana',
      'thing' => 'Paper',
      'passion' => 'Business',
   ),
   array(
      'art' => 'The edge',
      'thing' => 'Tree',
      'passion' => 'Hospital',
   ),
);

Here is what I have tried:

require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';

$html    = file_get_html( 'https://www.example.com/list.html' );
$collect = array();
$list    = $html->find( 'table tbody tr td' );

foreach( $list as $l ) {
    $collect[] = $l->plaintext;
}

$html->clear();
unset($html);

print_r($collect);

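This puts every td into a single flat array, which makes it hard to tell which values belong to the array keys I need. Is there a way to solve this?

Instead of iterating over all td elements at once, you can iterate over each tr and, for each tr, look at its inner td elements and skip the 4th one: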

$htmlString =<<<html
<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>
    </tbody>
</table>
html;
$html = str_get_html($htmlString);

// find all tr tags
$trs = $html->find('table tr');
$collect = [];

// foreach tr tag, find its td children
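foreach ($trs as $tr) {
    $tds = $tr->find('td');
    // skip rows with fewer than 3 td cells (e.g. a header row that uses th)
    if (count($tds) < 3) {
        continue;
    }
    // collect first 3 children and skip the 4th
    $collect[] = [
        'art' => $tds[0]->plaintext,
        'thing' => $tds[1]->plaintext,
        'passion' => $tds[2]->plaintext,
    ];
}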
print_r($collect); 

The output is:

Array
(
    [0] => Array
        (
            [art] => Floyd
            [thing] => Machine
            [passion] => Banking
        )

    [1] => Array
        (
            [art] => Nirvana
            [thing] => Paper
            [passion] => Business
        )

    [2] => Array
        (
            [art] => The edge
            [thing] => Tree
            [passion] => Hospital
        )

)
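
If only the flat list from the question's attempt is available, the same structure can probably be rebuilt without querying the DOM again. The following is a minimal sketch, not part of the original answer. It assumes every row has exactly four cells. The `$flat` array is hard-coded sample data standing in for the collected `plaintext` values, and `$keys` and `$cellsPerRow` are names chosen here for illustration.

```php
<?php
// Sample data standing in for the flat list of td texts from the question.
$flat = [
    'Floyd', 'Machine', 'Banking', 'HelpScout',
    'Nirvana', 'Paper', 'Business', 'GuitarTuna',
    'The edge', 'Tree', 'Hospital', 'Sician',
];

$keys        = ['art', 'thing', 'passion']; // keys for the cells to keep
$cellsPerRow = 4;                            // assumption: every tr has 4 tds

$collect = [];
foreach (array_chunk($flat, $cellsPerRow) as $row) {
    // Skip an incomplete trailing chunk that cannot fill all keys.
    if (count($row) < count($keys)) {
        continue;
    }
    // Keep the first three cells of the row and pair them with the keys.
    $collect[] = array_combine($keys, array_slice($row, 0, count($keys)));
}

print_r($collect);
```

This only works when the cell count per row is fixed. If rows can differ in length, iterating per tr as in the answer above is the safer choice.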

