Scraping using php - preg_match_all

I'm trying to get the value of Internet Data Volume Balance; the script should echo 146.30mb.

I'm new to all of this and am working through the tutorials.

How can this be done?

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr> 

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr> 

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr> 

PHP can interact with the DOM much as JavaScript can. This is vastly superior to parsing the markup with regular expressions such as preg_match_all, which most people will tell you is the wrong approach anyway:

Loading from an HTML File

// Start by creating a new document
$doc = new DOMDocument();
// Keep libxml from printing warnings about malformed or non-standard markup
libxml_use_internal_errors(true);
// I've saved the table into an external file and am loading it into $doc
$doc->loadHTMLFile('htmlpage.html');
// Since you have six table cells, I'm fetching all of them
$cells = $doc->getElementsByTagName("td");
// item() is zero-based, so item(5) is the sixth cell; echo its textContent property
echo $cells->item(5)->textContent;

This code will output "146.30 MB" to the screen.

Loading from a String

If you have the HTML stored in a string, you can load that into your document as well. Just swap the method that loads from a file for the one that loads from a string:

$str = "<table><tr><td>Foo</td></tr>...</table>";
$doc->loadHTML( $str );

We would then proceed with the same code as above to select the cells and show their textContent in the output.
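Putting the pieces together, here is a minimal self-contained sketch of the string-based version, run against the markup from the question. It assumes the rows sit inside a `<table>` element (the question shows only the rows), and the helper name `sixthCellText` is hypothetical. Like the file-based code, it still depends on the value being the sixth `<td>` on the page.

```php
<?php
// The markup from the question, wrapped in a <table> (an assumption; only the rows were shown)
$str = <<<'HTML'
<table>
<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr>
<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr>
<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr>
</table>
HTML;

// Hypothetical helper: parse an HTML string and return the sixth cell's text, or null if it is missing
function sixthCellText(string $html): ?string
{
    // Collect libxml parse warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $cells = $doc->getElementsByTagName('td');
    // item() is zero-based, so index 5 is the sixth cell; it returns null if there is no such cell
    $cell = $cells->item(5);
    return $cell === null ? null : trim($cell->textContent);
}

echo sixthCellText($str), "\n"; // prints 146.30 MB
```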

Check out the DOMDocument class.
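Counting cells breaks as soon as a row is added or reordered. A sturdier option is to use `DOMXPath` (also part of PHP's DOM extension) to find the `<td>` whose text matches the label and then read the next `<td>` in the same row. This is a sketch under a few assumptions: the helper `valueForLabel` is hypothetical, the markup is a trimmed copy of the question's rows wrapped in a `<table>`, and the label must not contain double quotes, because it is spliced directly into the XPath expression.

```php
<?php
// Trimmed copy of the question's markup (assumed to sit inside a <table>)
$html = <<<'HTML'
<table>
<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr>
<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr>
</table>
HTML;

// Hypothetical helper: return the text of the cell next to the cell labelled $label, or null if not found
function valueForLabel(string $html, string $label): ?string
{
    // Collect libxml parse warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match a <td> whose whitespace-normalised text equals the label,
    // then step to the first following <td> sibling in the same row.
    // Assumes $label contains no double quotes.
    $query = '//td[normalize-space(.) = "' . $label . '"]/following-sibling::td[1]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$balance = valueForLabel($html, 'Internet Data Volume Balance');
// Reformat "146.30 MB" as "146.30mb", the form the question asks for
echo strtolower(str_replace(' ', '', $balance ?? '')), "\n"; // prints 146.30mb
```

Because the lookup is keyed on the label text, the same helper also reads the other rows, e.g. `valueForLabel($html, 'Account Status')`.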

If you are willing to install phpQuery, or already have it installed, you can use that instead:

phpQuery::newDocumentFileHTML('htmlpage.html');
// :eq() is zero-based, as in jQuery, so td:eq(5) is the sixth cell
echo pq('td:eq(5)')->text();
