Using PHP to parse an HTML document
I am building a PHP application that parses HTML content. I need to store a particular table column in PHP variables.
Here is my code:
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false; // only affects documents loaded after it is set
@$dom->loadHTML($html);           // @ suppresses warnings about malformed HTML
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$flag = 0;
foreach ($rows as $row)
{
    if ($flag == 0) $flag = 1; // skip the first row
    else
    {
        $cols = $row->getElementsByTagName('td');
        foreach ($cols as $col)
        {
            echo $col->nodeValue; //NEED HELP HERE
        }
        echo '<hr />';
    }
}
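In each row, the first column is the KEY and the second is the VALUE. How can I build key-value pairs from the table and store them as an array in PHP?
I have tried many things, but every time I just get DOMElement Object() as the value.
Any help is much appreciated...
The requested HTML: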
<table align='center' border='0' cellpadding='0' cellspacing='0' style='border-collapse: collapse' width='780' height=100%>
<tr><td height=96% align=center><BR><BR>
<html>
<head>
</head>
<body style="background:url(uptu_logo1.gif); background-repeat:no-repeat; background-position:center">
<p align="center" style="font-size:18px"><span style='font-size:20px'>this text is unimportant gibberish that is not required by my app</span><br/><span style='font-size:16px'>this text is unimportant gibberish that is not required by my app</span><br/><u>B.Tech. Third Year Result 2009-10. this text is unimportant gibberish that is not required by my app</u></p>
<br/>
<table align="center" border="1" cellpadding="0" cellspacing="0" bordercolor="#E3DDD5" width="700" style="border-collapse: collapse; font-size: 11px">
<tr>
<td width="50%"><b>Name:</b></td>
<td width="50%">John Fernandes </td>
</tr>
<tr>
<td><b>Fathers Name:</b></td>
<td>Caith Fernandes </td>
</tr>
<tr>
<td><b>Roll No:</b></td>
<td>0702410099</td>
</tr>
<tr>
<td><b>Status:</b></td>
<td>REGULAR </td>
</tr>
<tr>
<td><b>Course/Branch:</b></td>
<td>B. Tech. </td>
</tr>
<tr>
<td><b>Institute Name</b></td>
<td>Imperial College of Science and Technology</td>
</tr>
</table>
The output of my PHP code:
Name:John Fernandes <hr />
Fathers Name:Caith Fernandes <hr />
Roll No:0702410099<hr />
Status:REGULAR <hr />
Course/Branch:B. Tech. Computer Science and Engineering (10)<hr />
Imperial College of Science and Technology<hr />
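Also, how do I get rid of this stupid character? I see it in the original HTML, so I tried cleaning it up with the PHP function html_entity_decode(), but it is still there...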
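A minimal sketch for the stray character, assuming it is the non-breaking space (U+00A0) that `&nbsp;` entities decode to. DOMDocument decodes entities while parsing, so `nodeValue` already holds the character itself rather than the text `&nbsp;`, and `html_entity_decode()` has nothing left to decode; the character has to be replaced instead. The helper name `clean_cell()` is hypothetical.

```php
<?php
// Hypothetical helper, assuming the stray character is U+00A0
// (non-breaking space), which is what &nbsp; decodes to.
function clean_cell(string $text): string
{
    // "\xC2\xA0" is the UTF-8 byte sequence for U+00A0; nodeValue is UTF-8.
    $text = str_replace("\xC2\xA0", ' ', $text);
    // Collapse whitespace runs and strip leading/trailing whitespace.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings quietly
$dom->loadHTML('<table><tr><td>John Fernandes&nbsp;</td></tr></table>');
libxml_clear_errors();

$raw = $dom->getElementsByTagName('td')->item(0)->nodeValue;
// The entity is already decoded, so html_entity_decode() leaves $raw unchanged.
var_dump(html_entity_decode($raw) === $raw); // bool(true)
echo clean_cell($raw), "\n";                 // prints: John Fernandes
```

If the character turns out to be something other than U+00A0, the `str_replace()` target would need to change accordingly.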
What HTML are you loading? I'm assuming it is something simple like this:
<table>
<tr>
<td>heading</td>
<td>heading</td>
</tr>
<tr>
<td>key</td>
<td>value</td>
</tr>
</table>
It looks like the first tr (the heading row) is skipped, and after that you only have two columns that you want to pair as KEY => VALUE:
$cols = $row->getElementsByTagName('td');
$key = $cols->item(0)->nodeValue; // string(3) "key"
$val = $cols->item(1)->nodeValue; // string(5) "value"
The code above will return the items you want.
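Putting this together for the posted HTML, here is a minimal sketch that builds the whole array. It makes assumptions beyond the answer: in the question's markup the row skipped by `$flag` appears to be the outer layout row (it wraps the entire nested page, and the output still starts with the Name row), so instead of skipping the first row it keeps only `<tr>` elements with exactly two `<td>` children; keys lose their trailing colon; and cell text is trimmed, with non-breaking spaces turned into ordinary spaces. The function names `table_to_pairs()` and `normalize_text()` are hypothetical.

```php
<?php
// Hypothetical helper: converts non-breaking spaces (U+00A0, from &nbsp;)
// to normal spaces, collapses whitespace runs and trims the ends.
function normalize_text(string $text): string
{
    $text = str_replace("\xC2\xA0", ' ', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical function: returns KEY => VALUE pairs from every <tr> that
// has exactly two <td> children (first cell = key, second = value).
function table_to_pairs(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them; pages like the
    // one in the question are not valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $pairs = [];
    // The outer layout row has a single <td> child, so this predicate skips it.
    foreach ($xpath->query('//tr[count(td) = 2]') as $row) {
        $cells = $xpath->query('td', $row);   // direct <td> children of this row
        $key = rtrim(normalize_text($cells->item(0)->textContent), ':');
        $pairs[$key] = normalize_text($cells->item(1)->textContent);
    }
    return $pairs;
}

// Usage with a cut-down copy of the question's nested markup.
$html = '<table><tr><td><table>'
      . '<tr><td><b>Name:</b></td><td>John Fernandes&nbsp;</td></tr>'
      . '<tr><td><b>Roll No:</b></td><td>0702410099</td></tr>'
      . '</table></td></tr></table>';

$pairs = table_to_pairs($html);
print_r($pairs);
```

With this filter, a heading row that also has two cells (as in the answer's example table) would end up in the array too, so it would need to be skipped separately.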