PHP DOM Parser: parse an HTML table
How can I use the PHP DOM parser to parse the contents of the table, so I get:
So the output of what I am trying to extract would be:
This is the HTML I am trying to parse (part of it):
...
<div class="table tbl-process-mobile">
<div class="table-cn">
<div class="table-bd">
<table cellspacing="0" id="idd7">
<thead>
<tr id="idd9">
<th scope="col">
<span>username</span>
</th>
<th scope="col">
<span>status</span>
</th>
<th scope="col">
<span>prefered number</span>
</th>
<th scope="col">
<span>action</span>
</th>
</tr>
</thead>
<tbody id="iddb">
<tr class="even">
<td class="even">
<div>randomusername</div>
</td><td class="odd">
<div>0123456789</div>
</td><td class="even">
<div>active</div>
</td><td class="odd">
<div>
<span id="iddc" style="display:none"></span>
<a href="xyz" id="idb2"><span>set number</span></a>
</div>
</td><td class="even">
<div>
<a id="iddd" style="display:none"></a>
<a href="xyz" class="action-icon-edit" id="idb3" title="change">
<i>change</i>
</a>
<a href="xyz" class="action-icon-delete" id="idb4" title="delete">
<i>delete</i>
</a>
</div>
</td>
</tr><tr class="odd">
<td class="even">
<div>randomusername2</div>
</td><td class="odd">
<div>0987654321</div>
</td><td class="even">
<div>active</div>
</td><td class="odd">
<div>
<span id="idde" style="display:none"></span>
<a href="xyz" id="idb5"><span>set number</span></a>
</div>
</td><td class="even">
<div>
<a id="iddf" style="display:none"></a>
<a href="xyz" class="action-icon-edit" id="idb6" title="change">
<i>change</i>
</a>
<a href="xyz" class="action-icon-delete" id="idb7" title="delete">
<i>delete</i>
</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
...
I have already started with some PHP code:
<?php
error_reporting(0);
$matches = array();
$dom = new DOMDocument;
$dom->loadHTMLFile('settings.html');
How do I extract the values? What is the best way to parse the HTML from this point?
// Assumes $dom is the DOMDocument loaded in the question.
$field_names = ['username', 'phone', 'status'];
$result = [];

// Search for div tags having the tbl-process-mobile class
foreach ($dom->getElementsByTagName('div') as $container) {
    // getAttribute() returns an empty string when the attribute is missing
    if (false === strpos($container->getAttribute('class'), 'tbl-process-mobile')) {
        continue;
    }

    // Assume that a tbody tag is required; take the first one (there should not be more)
    $tbody = $container->getElementsByTagName('tbody')->item(0);
    if (!$tbody) {
        continue;
    }

    foreach ($tbody->getElementsByTagName('tr') as $tr) {
        $i = 0;
        $row = [];
        $cells = $tr->getElementsByTagName('td');

        // Collect at most count($field_names) cell values
        foreach ($field_names as $name) {
            if (!$td = $cells->item($i++)) {
                break;
            }
            $row[$name] = trim($td->textContent);
        }

        if ($row) {
            $result[] = $row;
        }
    }
}

var_dump($result);
Sample output
array(2) {
[0]=>
array(3) {
["username"]=>
string(14) "randomusername"
["phone"]=>
string(10) "0123456789"
["status"]=>
string(6) "active"
}
[1]=>
array(3) {
["username"]=>
string(15) "randomusername2"
["phone"]=>
string(10) "0987654321"
["status"]=>
string(6) "active"
}
}
No further comments, as the code is self-explanatory.
PS: from a parsing point of view, the HTML structure leaves a lot to be desired.
You can use the selector methods of the DOMDocument class, such as getElementById() and getElementsByTagName(), to find the target elements. After finding the elements, get their text and store it in an array.
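// "iddb" is the id of the tbody in the question's HTML
$trs = $dom->getElementById("iddb")->getElementsByTagName("tr");
$arr = [];
foreach ($trs as $key => $tr) {
    $tds = $tr->getElementsByTagName("td");
    // Keep the first three cells; trim() drops the whitespace around the nested div text
    $arr[$key] = [
        trim($tds->item(0)->textContent),
        trim($tds->item(1)->textContent),
        trim($tds->item(2)->textContent)
    ];
}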
You can also use the DOMXPath class to find the target elements.
$xpath = new DOMXPath($dom);
$trs = $xpath->query("//tbody/tr");
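To make the XPath approach concrete, here is a minimal sketch that turns the query into an array of rows. The trimmed-down HTML string and the key names username, phone and status are assumptions for illustration; the sketch also uses libxml_use_internal_errors() rather than error_reporting(0), so that only parser warnings are suppressed.

```php
<?php
// Hypothetical, trimmed-down copy of the table from the question.
$html = <<<'HTML'
<div class="table tbl-process-mobile">
<table id="idd7">
<tbody id="iddb">
<tr class="even">
<td class="even"><div>randomusername</div></td>
<td class="odd"><div>0123456789</div></td>
<td class="even"><div>active</div></td>
</tr>
<tr class="odd">
<td class="even"><div>randomusername2</div></td>
<td class="odd"><div>0987654321</div></td>
<td class="even"><div>active</div></td>
</tr>
</tbody>
</table>
</div>
HTML;

// Collect parser warnings internally instead of silencing every PHP error.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$fieldNames = ['username', 'phone', 'status']; // illustrative key names
$rows = [];

// Select only the rows inside the tbody whose id is "iddb".
foreach ($xpath->query('//tbody[@id="iddb"]/tr') as $tr) {
    $row = [];
    foreach ($fieldNames as $i => $name) {
        // XPath positions are 1-based; normalize-space() trims the cell text.
        $row[$name] = $xpath->evaluate('normalize-space(td[' . ($i + 1) . '])', $tr);
    }
    $rows[] = $row;
}

print_r($rows);
```

Passing $tr as the second argument to evaluate() makes the td[n] expression relative to the current row, so no manual cell indexing is needed.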
Try using strip_tags:
$html='<div class="table tbl-process-mobile">
<div class="table-cn">
<div class="table-bd">
<table cellspacing="0" id="idd7">
<thead>
<tr id="idd9">
<th scope="col">
<span>username</span>
</th>
<th scope="col">
<span>status</span>
</th>
<th scope="col">
<span>prefered number</span>
</th>
<th scope="col">
<span>action</span>
</th>
</tr>
</thead>
<tbody id="iddb">
<tr class="even">
<td class="even">
<div>randomusername</div>
</td><td class="odd">
<div>0123456789</div>
</td><td class="even">
<div>active</div>
</td><td class="odd">
<div>
<span id="iddc" style="display:none"></span>
<a href="xyz" id="idb2"><span>set number</span></a>
</div>
</td><td class="even">
<div>
<a id="iddd" style="display:none"></a>
<a href="xyz" class="action-icon-edit" id="idb3" title="change">
<i>change</i>
</a>
<a href="xyz" class="action-icon-delete" id="idb4" title="delete">
<i>delete</i>
</a>
</div>
</td>
</tr><tr class="odd">
<td class="even">
<div>randomusername2</div>
</td><td class="odd">
<div>0987654321</div>
</td><td class="even">
<div>active</div>
</td><td class="odd">
<div>
<span id="idde" style="display:none"></span>
<a href="xyz" id="idb5"><span>set number</span></a>
</div>
</td><td class="even">
<div>
<a id="iddf" style="display:none"></a>
<a href="xyz" class="action-icon-edit" id="idb6" title="change">
<i>change</i>
</a>
<a href="xyz" class="action-icon-delete" id="idb7" title="delete">
<i>delete</i>
</a>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>';
echo strip_tags($html);
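Note that strip_tags() returns one flat string, so the cell boundaries are lost. As a rough sketch, assuming every value sits on its own line as in the markup above (the single-row $html here is a hypothetical sample), the output could be split into non-empty lines:

```php
<?php
// Hypothetical single-row sample; each cell is on its own line.
$html = "<tr>\n"
      . "<td><div>randomusername</div></td>\n"
      . "<td><div>0123456789</div></td>\n"
      . "<td><div>active</div></td>\n"
      . "</tr>";

// Remove the tags, then split the remaining text on line breaks.
$lines = preg_split('/\R/', strip_tags($html));

// Trim each line and drop the empty ones left behind by the removed tags.
$values = array_values(array_filter(array_map('trim', $lines), 'strlen'));

print_r($values);
```

This only works while the line layout holds; if two cells share a line, their text runs together, which is why the DOM-based approaches are more robust.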
Update:
You can parse the DOM elements using getElementsByTagName.
Read all td elements:
$td=$dom->getElementsByTagName('td');
Loop through the td elements and read the div contents:
foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        echo $d->textContent;
    }
}
The code above gets the contents of every div, but we only want particular div elements, so I suggest adding a class or data attribute to the divs you want to retrieve and then putting an if condition inside the loop. Here I use the class "data".
foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        if ($d->getAttribute('class') == 'data') {
            echo $d->textContent;
        }
    }
}
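One caveat: comparing the whole class attribute with == only matches when "data" is the element's sole class. A minimal sketch of a more tolerant check, assuming hypothetical markup in which the wanted divs carry a "data" class, possibly alongside others:

```php
<?php
// Hypothetical markup: two divs carry the "data" class, one does not.
$html = '<table><tr>'
      . '<td><div class="data">randomusername</div></td>'
      . '<td><div class="data highlight">0123456789</div></td>'
      . '<td><div><a href="xyz">set number</a></div></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$values = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    foreach ($td->getElementsByTagName('div') as $div) {
        // Split the class attribute on whitespace so "data highlight" still matches.
        $classes = preg_split('/\s+/', $div->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('data', $classes, true)) {
            $values[] = trim($div->textContent);
        }
    }
}

print_r($values);
```

Collecting the values into an array instead of echoing them keeps the cells separate for later processing.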