
PHP DOM Parser parse html table

How can I use the PHP DOM parser to parse the contents of the table so that I get:

  • the username
  • the mobile phone number
  • the status

The output I am trying to extract would be:

  • randomusername - 0123456789 - active
  • randomusername2 - 0987654321 - active

This is part of the HTML I am trying to parse:

...
<div class="table tbl-process-mobile">
  <div class="table-cn">
    <div class="table-bd">
      <table cellspacing="0" id="idd7">

<thead>
    <tr id="idd9">
        <th scope="col">
          <span>username</span>
        </th>
        <th scope="col">
          <span>status</span>
        </th>

        <th scope="col">        
          <span>prefered number</span>
        </th>

        <th scope="col">
          <span>action</span>
        </th>
    </tr>
</thead>

<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="iddc" style="display:none"></span>
  <a href="xyz" id="idb2"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddd" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb3" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb4" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr><tr class="odd">
        <td class="even">
            <div>randomusername2</div>
        </td><td class="odd">
            <div>0987654321</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="idde" style="display:none"></span>
  <a href="xyz" id="idb5"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddf" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb6" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb7" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr>
</tbody>
</table>
    </div>
  </div>
</div>
...

I already started with some PHP code:

<?php
error_reporting(0);
$matches = array();
$dom = new DOMDocument;

$dom->loadHTMLFile('settings.html');

How do I extract the values? What is the best way to parse the HTML from this point?

$field_names = ['username', 'phone', 'status'];
$result = [];

// Search for div tags having tbl-process-mobile class
// ($dom is the DOMDocument loaded in the question)
$containers = $dom->getElementsByTagName('div');
foreach ($containers as $container) {
  if (!isset($container->attributes['class']))
    continue;

  if (false === strpos($container->attributes['class']->value,
    'tbl-process-mobile'))
    continue;

  // Assume that tbody tags are required
  if (!$tbodies = $container->getElementsByTagName('tbody'))
    continue;

  // Get the first tbody (there should not be more)
  if (!$tbodies->length || !$tbody = $tbodies->item(0))
    continue;

  foreach ($tbody->getElementsByTagName('tr') as $tr) {
    $i = 0;
    $row = [];
    $cells = $tr->getElementsByTagName('td');

    // Collect the first count($field_names) cell values as maximum
    foreach ($field_names as $name) {
      if (!$td = $cells->item($i++))
        break;
      $row[$name] = trim($td->textContent);
    }

    if ($row)
      $result []= $row;
  }
}

var_dump($result);

Sample Output

array(2) {
  [0]=>
  array(3) {
    ["username"]=>
    string(14) "randomusername"
    ["phone"]=>
    string(10) "0123456789"
    ["status"]=>
    string(6) "active"
  }
  [1]=>
  array(3) {
    ["username"]=>
    string(15) "randomusername2"
    ["phone"]=>
    string(10) "0987654321"
    ["status"]=>
    string(6) "active"
  }
}

No comments, as the code is self-explanatory.

PS: from a parsing point of view, the HTML structure leaves a lot to be desired.

You can use the selector methods of the DOMDocument class, such as getElementById() and getElementsByTagName(), to find the target elements. After finding the elements, read their text and store it in an array.

$trs = $dom->getElementById("iddb")->getElementsByTagName("tr");
$arr = [];
foreach($trs as $key=>$tr){
    $tds = $tr->getElementsByTagName("td");
    $arr[$key] = [
        $tds->item(0)->textContent,
        $tds->item(1)->textContent,
        $tds->item(2)->textContent
    ];
}
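As a minimal sketch of how the getElementById() approach could produce the output format asked for in the question: the $html string below is a shortened, hypothetical copy of the table, and the $lines variable and the trim() calls are illustrative additions rather than part of the original answer.

```php
<?php
// Hypothetical, shortened copy of the table from the question.
$html = <<<'HTML'
<table id="idd7">
  <tbody id="iddb">
    <tr><td><div>randomusername</div></td><td><div>0123456789</div></td><td><div>active</div></td><td><div><a href="xyz">set number</a></div></td></tr>
    <tr><td><div>randomusername2</div></td><td><div>0987654321</div></td><td><div>active</div></td><td><div><a href="xyz">set number</a></div></td></tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$lines = [];
$tbody = $dom->getElementById('iddb');
if ($tbody !== null) {
    foreach ($tbody->getElementsByTagName('tr') as $tr) {
        $tds = $tr->getElementsByTagName('td');
        if ($tds->length < 3) {
            continue; // skip rows that do not have the three expected cells
        }
        // textContent includes surrounding whitespace, so trim each value
        $lines[] = implode(' - ', [
            trim($tds->item(0)->textContent),
            trim($tds->item(1)->textContent),
            trim($tds->item(2)->textContent),
        ]);
    }
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// randomusername - 0123456789 - active
// randomusername2 - 0987654321 - active
```

The null check and the cell-count check are there so that a missing tbody or a short row does not cause a fatal error.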


You can also use the DOMXPath class to find the target elements.

$xpath = new DOMXPath($dom);
$trs = $xpath->query("//tbody/tr");
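The XPath query can be taken one step further to read the cell values relative to each row. This is only a sketch under assumptions: the $html string is a shortened, hypothetical copy of the table, the field names are borrowed from the first answer, and normalize-space() is used as one possible way to strip the surrounding whitespace.

```php
<?php
// Hypothetical, shortened copy of the table from the question.
$html = <<<'HTML'
<table id="idd7">
  <tbody id="iddb">
    <tr>
      <td><div>randomusername</div></td>
      <td><div>0123456789</div></td>
      <td><div>active</div></td>
    </tr>
    <tr>
      <td><div>randomusername2</div></td>
      <td><div>0987654321</div></td>
      <td><div>active</div></td>
    </tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//tbody[@id="iddb"]/tr') as $tr) {
    // Each expression is evaluated relative to the current <tr>;
    // normalize-space() returns the cell text with whitespace collapsed.
    $rows[] = [
        'username' => $xpath->evaluate('normalize-space(td[1])', $tr),
        'phone'    => $xpath->evaluate('normalize-space(td[2])', $tr),
        'status'   => $xpath->evaluate('normalize-space(td[3])', $tr),
    ];
}

print_r($rows);
```

Restricting the query to the tbody with id iddb keeps header rows and any other tables on the page out of the result.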

Try using strip_tags:

$html='<div class="table tbl-process-mobile">
  <div class="table-cn">
    <div class="table-bd">
      <table cellspacing="0" id="idd7">

<thead>
    <tr id="idd9">
        <th scope="col">
          <span>username</span>
        </th>
        <th scope="col">
          <span>status</span>
        </th>

        <th scope="col">        
          <span>prefered number</span>
        </th>

        <th scope="col">
          <span>action</span>
        </th>
    </tr>
</thead>

<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="iddc" style="display:none"></span>
  <a href="xyz" id="idb2"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddd" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb3" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb4" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr><tr class="odd">
        <td class="even">
            <div>randomusername2</div>
        </td><td class="odd">
            <div>0987654321</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="idde" style="display:none"></span>
  <a href="xyz" id="idb5"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddf" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb6" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb7" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr>
</tbody>
</table>
    </div>
  </div>
</div>';

echo strip_tags($html);
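Note that strip_tags() returns one flat string, so the row and column structure is lost and the link texts come through along with the data. The sketch below illustrates this on a shortened, hypothetical one-row excerpt of the markup; the $tokens variable and the line-splitting step are assumptions added for illustration only.

```php
<?php
// Hypothetical one-row excerpt of the table, keeping the original line layout.
$html = '<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div><a href="xyz"><span>set number</span></a></div>
        </td>
    </tr>
</tbody>';

$text = strip_tags($html);

// Split the flat text into lines, trim them, and drop the empty ones.
$tokens = array_values(array_filter(
    array_map('trim', explode("\n", $text)),
    'strlen'
));

print_r($tokens);
// $tokens now holds: randomusername, 0123456789, active, set number
```

Because "set number" arrives in the same flat list as the data, any extraction based on this output has to rely on token positions, which is fragile compared with walking the DOM.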

Updated:

You can parse DOM elements using getElementsByTagName.

Read all td elements:

$td=$dom->getElementsByTagName('td');

Loop through the td elements and read the div contents:

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        echo $d->textContent;
    }
}

The code above prints the contents of every div, but we only want particular div elements, so I suggest adding a class or data attribute to the divs you want to retrieve and then checking for it with an if condition inside the loop. Here I use the class data.

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        if ($d->getAttribute('class') == 'data') {
            echo $d->textContent;
        }
    }
}
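A self-contained sketch of this class-based filtering might look as follows. It assumes you control the markup and can add class="data" to the relevant divs; the $html string is hypothetical, and the class check is written to also match divs that carry additional classes, which differs slightly from the exact string comparison above.

```php
<?php
// Hypothetical markup in which the divs of interest carry class="data".
$html = '<table><tbody>
<tr>
  <td><div class="data">randomusername</div></td>
  <td><div class="data">0123456789</div></td>
  <td><div class="data highlight">active</div></td>
  <td><div><a href="xyz">set number</a></div></td>
</tr>
</tbody></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $values = [];
    foreach ($tr->getElementsByTagName('div') as $div) {
        // Split the class attribute so "data highlight" still matches "data".
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array('data', $classes, true)) {
            $values[] = trim($div->textContent);
        }
    }
    if ($values) {
        $rows[] = implode(' - ', $values);
    }
}

print_r($rows);
```

Grouping by tr rather than iterating over every td keeps the values of one row together, so each entry of $rows corresponds to one table row.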
