How to get only the first few tags of each row with PHP Simple HTML DOM Parser

I am trying to get the text of the first 3 td tags in each table row using PHP Simple HTML DOM Parser and collect them in an array.

The table looks like this:

<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>

        .....
        .....
    </tbody>
</table>

What I am trying to achieve is to collect these into arrays, excluding the 4th td of each tr tag:

array(
   array(
      'art' => 'Floyd',
      'thing' => 'Machine',
      'passion' => 'Banking',
   ),
   array(
      'art' => 'Nirvana',
      'thing' => 'Paper',
      'passion' => 'Business',
   ),
   array(
      'art' => 'The edge',
      'thing' => 'Tree',
      'passion' => 'Hospital',
   ),
);

This is what I have tried:

require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';

$html    = file_get_html( 'https://www.example.com/list.html' );
$collect = array();
$list    = $html->find( 'table tbody tr td' );

foreach( $list as $l ) {
    $collect[] = $l->plaintext;
}

$html->clear();
unset($html);

print_r($collect);

This puts all the td elements into one flat array, which makes it difficult to identify the array keys I need. Is there a solution?

Instead of iterating over all td elements at once, you can iterate over each tr and, for each tr, take its inner td elements, keeping the first three and skipping the 4th:

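// same include path as in the question; adjust it to your setup
require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';

$htmlString = <<<html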
<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>
    </tbody>
</table>
html;
$html = str_get_html($htmlString);

// find all tr tags
$trs = $html->find('table tr');
$collect = [];

// for each tr tag, find its td children
foreach ($trs as $tr) {
    $tds = $tr->find('td');
    // skip rows with fewer than 3 td cells (e.g. header rows)
    if (count($tds) < 3) {
        continue;
    }
    // collect the first 3 children and skip the 4th
    $collect[] = [
        'art'     => $tds[0]->plaintext,
        'thing'   => $tds[1]->plaintext,
        'passion' => $tds[2]->plaintext,
    ];
}
print_r($collect);

The output is:

Array
(
    [0] => Array
        (
            [art] => Floyd
            [thing] => Machine
            [passion] => Banking
        )

    [1] => Array
        (
            [art] => Nirvana
            [thing] => Paper
            [passion] => Business
        )

    [2] => Array
        (
            [art] => The edge
            [thing] => Tree
            [passion] => Hospital
        )

)
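
If the number of columns to keep may change, the key mapping can be pulled out of the loop. The following is only a minimal sketch, not part of Simple HTML DOM: mapRowCells is a hypothetical helper name, and the $rows array is a hard-coded stand-in for the cell texts so that the snippet runs without the library. With the parser, each inner array would be built from $td->plaintext for every $td returned by $tr->find('td').

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): map the first
// count($keys) cell texts of one row onto the given keys.
function mapRowCells(array $cellTexts, array $keys): ?array
{
    $wanted = count($keys);
    // Rows with too few cells (e.g. header or spacer rows) yield null.
    if (count($cellTexts) < $wanted) {
        return null;
    }
    // array_slice keeps the leading cells; array_combine pairs them with the keys.
    return array_combine($keys, array_slice($cellTexts, 0, $wanted));
}

$keys = ['art', 'thing', 'passion'];

// Stand-in data (assumption): the cell texts of each row from the question's table.
$rows = [
    ['Floyd', 'Machine', 'Banking', 'HelpScout'],
    ['Nirvana', 'Paper', 'Business', 'GuitarTuna'],
    ['The edge', 'Tree', 'Hospital', 'Sician'],
    ['only one cell'], // too short, will be skipped
];

$collect = [];
foreach ($rows as $cellTexts) {
    $mapped = mapRowCells($cellTexts, $keys);
    if ($mapped !== null) {
        $collect[] = $mapped;
    }
}

print_r($collect);
```

Keeping the keys in one array means that adding or dropping a column only requires editing $keys, since the slice length follows from count($keys).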
