Parsing an HTML table with the PHP DOM parser

Question

How can I use the PHP DOM parser to parse the contents of the table so that I get:

the username
the mobile phone number
the status

So the output I'm trying to extract would be:

randomusername - 0123456789 - active
randomusername2 - 0987654321 - active

This is the HTML I'm trying to parse (part of it):

...
<div class="table tbl-process-mobile">
  <div class="table-cn">
    <div class="table-bd">
      <table cellspacing="0" id="idd7">

<thead>
    <tr id="idd9">
        <th scope="col">
          <span>username</span>
        </th>
        <th scope="col">
          <span>status</span>
        </th>

        <th scope="col">        
          <span>prefered number</span>
        </th>

        <th scope="col">
          <span>action</span>
        </th>
    </tr>
</thead>

<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="iddc" style="display:none"></span>
  <a href="xyz" id="idb2"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddd" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb3" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb4" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr><tr class="odd">
        <td class="even">
            <div>randomusername2</div>
        </td><td class="odd">
            <div>0987654321</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="idde" style="display:none"></span>
  <a href="xyz" id="idb5"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddf" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb6" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb7" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr>
</tbody>
</table>
    </div>
  </div>
</div>
...

I have already started with some PHP code:

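<?php
// Keep libxml's warnings about non-standard markup internal
// (narrower than error_reporting(0), which hides every PHP error)
libxml_use_internal_errors(true);
$dom = new DOMDocument;

$dom->loadHTMLFile('settings.html');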

How do I extract the values? What's the best way to parse the HTML from this point?

Answer 1

// $dom is the DOMDocument loaded in the question's code
$field_names = ['username', 'phone', 'status'];
$result = [];

// Search for div tags having the tbl-process-mobile class
$containers = $dom->getElementsByTagName('div');
foreach ($containers as $container) {
  if (!$container->hasAttribute('class'))
    continue;

  if (false === strpos($container->getAttribute('class'), 'tbl-process-mobile'))
    continue;

  // Take the first tbody (there should not be more); skip containers without one
  $tbody = $container->getElementsByTagName('tbody')->item(0);
  if (!$tbody)
    continue;

  foreach ($tbody->getElementsByTagName('tr') as $tr) {
    $i = 0;
    $row = [];
    $cells = $tr->getElementsByTagName('td');

    // Collect at most count($field_names) leading cell values
    foreach ($field_names as $name) {
      if (!$td = $cells->item($i++))
        break;
      $row[$name] = trim($td->textContent);
    }

    if ($row)
      $result []= $row;
  }
}

var_dump($result);

Sample output:

array(2) {
  [0]=>
  array(3) {
    ["username"]=>
    string(14) "randomusername"
    ["phone"]=>
    string(10) "0123456789"
    ["status"]=>
    string(6) "active"
  }
  [1]=>
  array(3) {
    ["username"]=>
    string(15) "randomusername2"
    ["phone"]=>
    string(10) "0987654321"
    ["status"]=>
    string(6) "active"
  }
}

No further explanation, as the code is self-explanatory.

PS: from a parsing standpoint, the HTML structure leaves a lot to be desired.
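The question asks for lines such as "randomusername - 0123456789 - active". A row array like the $result entries above can be turned into that format with implode(). The following is a minimal self-contained sketch, assuming an inline, trimmed-down copy of the question's table in place of settings.html and assuming the first tbody in the document is the one you want:

```php
<?php
// Assumption: a trimmed-down copy of the question's table stands in for settings.html
$html = <<<'HTML'
<div class="table tbl-process-mobile"><table>
<thead><tr><th><span>username</span></th><th><span>status</span></th></tr></thead>
<tbody>
<tr><td><div>randomusername</div></td><td><div>0123456789</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
<tr><td><div>randomusername2</div></td><td><div>0987654321</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
</tbody></table></div>
HTML;

libxml_use_internal_errors(true); // tolerate non-standard markup quietly
$dom = new DOMDocument();
$dom->loadHTML($html);

$field_names = ['username', 'phone', 'status'];
$result = [];

// Assumption: the first <tbody> in the document holds the rows we want
$tbody = $dom->getElementsByTagName('tbody')->item(0);
foreach ($tbody->getElementsByTagName('tr') as $tr) {
    $cells = $tr->getElementsByTagName('td');
    $row = [];
    foreach ($field_names as $i => $name) {
        $td = $cells->item($i);
        if ($td === null) {
            break; // row has fewer cells than expected
        }
        $row[$name] = trim($td->textContent);
    }
    $result[] = $row;
}

// Join each row's values with " - " to get the format from the question
$lines = array_map(function (array $row) {
    return implode(' - ', $row);
}, $result);

echo implode("\n", $lines), "\n";
```

Because implode() uses the array's insertion order, the order of $field_names decides the order of the values in each printed line.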

Answer 2

You can use the selector methods of the DOMDocument class, such as getElementById() and getElementsByTagName(), to find the target elements. After finding the elements, get their text and store it in an array.

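// "iddb" is the id of the <tbody> in the question's HTML
$tbody = $dom->getElementById("iddb");
$arr = [];
if ($tbody !== null) {
    foreach ($tbody->getElementsByTagName("tr") as $key => $tr) {
        $tds = $tr->getElementsByTagName("td");
        // First three cells: username, phone number, status (surrounding whitespace trimmed)
        $arr[$key] = [
            trim($tds->item(0)->textContent),
            trim($tds->item(1)->textContent),
            trim($tds->item(2)->textContent),
        ];
    }
}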


You can also use the DOMXPath class to find the target elements.

$xpath = new DOMXPath($dom);
$trs = $xpath->query("//tbody/tr");
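Taking the XPath route further, the cells can be selected and cleaned up with XPath as well. This is a minimal sketch, assuming an inline, trimmed-down copy of the question's table; it narrows the row query to the div carrying the tbl-process-mobile class (so other tables on the page are ignored) and uses normalize-space() to strip the whitespace around each value:

```php
<?php
// Assumption: a trimmed-down copy of the question's table stands in for settings.html
$html = <<<'HTML'
<div class="table tbl-process-mobile"><table>
<thead><tr><th><span>username</span></th><th><span>status</span></th></tr></thead>
<tbody>
<tr><td><div>randomusername</div></td><td><div>0123456789</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
<tr><td><div>randomusername2</div></td><td><div>0987654321</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
</tbody></table></div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$rows = [];

// Body rows inside the div whose class list contains tbl-process-mobile
$rowQuery = "//div[contains(concat(' ', normalize-space(@class), ' '), ' tbl-process-mobile ')]//tbody/tr";
foreach ($xpath->query($rowQuery) as $tr) {
    // The first three <td> children of this row: username, phone, status
    $values = [];
    foreach ($xpath->query('td[position() <= 3]', $tr) as $td) {
        // normalize-space() trims and collapses the whitespace around the text
        $values[] = $xpath->evaluate('normalize-space(.)', $td);
    }
    $rows[] = $values;
}

print_r($rows);
```

The concat()/normalize-space() wrapper matches tbl-process-mobile as a whole class name, so a class such as tbl-process-mobile-old would not match by accident.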

Answer 3

Try using strip_tags:

// $html holds the table markup shown in the question
$html = file_get_contents('settings.html');

echo strip_tags($html);

Update:

You can parse the DOM elements using getElementsByTagName.

Read all td elements:

$td=$dom->getElementsByTagName('td');

Loop through the td elements and read the div contents:

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    // Print the text of every div inside this cell
    foreach ($div as $d) {
        echo $d->textContent;
    }
}

The code above prints the contents of every div, but we only want particular div elements. So I suggest adding a class or data attribute to the divs you want to retrieve and then checking for it with an if condition inside the loop. Here I use the class data:

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        // Only print divs marked with class="data"
        if ($d->getAttribute('class') == 'data') {
            echo $d->textContent;
        }
    }
}
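If you cannot add a class to the markup (for example because the page comes from another site), a column counter inside the same kind of loop can keep only the first three cells of each row instead. This is a minimal sketch, assuming, as in the question's HTML, that username, phone number and status are always the first three td elements of a row, and using an inline, trimmed-down copy of the table:

```php
<?php
// Assumption: a trimmed-down copy of the question's table stands in for settings.html
$html = <<<'HTML'
<div class="table tbl-process-mobile"><table>
<thead><tr><th><span>username</span></th><th><span>status</span></th></tr></thead>
<tbody>
<tr><td><div>randomusername</div></td><td><div>0123456789</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
<tr><td><div>randomusername2</div></td><td><div>0987654321</div></td><td><div>active</div></td><td><div><a href="xyz"><span>set number</span></a></div></td></tr>
</tbody></table></div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$wanted = 3; // assumed: username, phone number, status are the first three columns
$values = [];

// The header row contains <th> cells only, so it contributes nothing here
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $column = 0;
    foreach ($tr->getElementsByTagName('td') as $t) {
        if ($column++ >= $wanted) {
            break; // skip the "set number" and change/delete cells
        }
        $div = $t->getElementsByTagName('div')->item(0);
        if ($div !== null) {
            $values[] = trim($div->textContent);
        }
    }
}

echo implode(', ', $values), "\n";
```

This relies on the column order staying fixed; if the site reorders its columns, the class-based check above is the more robust choice.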

3 answers

Answer 1
Score 2, accepted, 2016-11-05 17:56:54

Answer 2
Score 0, 2016-11-05 17:51:29

Answer 3
Score -1, 2016-11-05 17:20:29
