Here is an example of the HTML I need to parse in a PHP program:
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div><!-- END.SingleLinkNoTx -->
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a><span><br />(0.3 miles)</span>
<div class="loc-info">
<div class="loc-info-text ">
John Doe, MBA<br /><a href="http://maps.google.com/?daddr=41.71727402,-73.00898601" target="_blank">Get Directions »</a>
</div>
</div>
</div>
This is the information I want to extract from the above HTML example into PHP:
lng: -73.00898601, lat: 41.71727402
category: 'office'
Acme Software
John Doe, MBA
123 Main St.
New York, NY 10036
(212) 555-1234
http://www.example.com
I have tried using PHP Simple HTML DOM Parser, but I'm new to it and can't find a working PHP example that matches what I need to do. I tried the PHP code below to understand how it works, but var_dump($e) produces a huge amount of output, including messages about recursion, so I'm lost as to how to actually use it. I'd greatly appreciate some help!
$e = $html->find('.dump-location', 0)->find('.SingleLinkNoTx', 0);
echo $e;
var_dump($e);
Use XPath to find and extract elements in an HTML/XML document - specifically the SimpleXMLElement::xpath method.
The following example will find the telephone number for a location:
$doc = new DOMDocument();
$doc->loadHTML('your html snippet goes here - or use loadHTMLFile()');
$xml = simplexml_import_dom($doc);
$elements = $xml->xpath('//*[contains(@class, "dump-location")]/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
print_r($elements);
The most complex part is the XPath expression. A quick breakdown:
// - search the whole document, at any depth.
*[contains(@class, "dump-location")] - find all elements whose class attribute contains dump-location.
/ - select child elements of the dump-location parent.
div[@class="SingleLinkNoTx"] - a DIV element that has a SingleLinkNoTx class (and no other class name).
strong[@class="telephone"] - STRONG tags with a telephone class.
Using this XPath expression on the HTML snippet provided in the question will result in output like the following, which is fairly easy to iterate over and extract information from:
Array
(
    [0] => SimpleXMLElement Object
        (
            [@attributes] => Array
                (
                    [class] => telephone
                )

            [0] => (212) 555-1234
        )

)
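To turn that result into plain values, each SimpleXMLElement can be cast to a string. The following is a minimal sketch rather than a definitive implementation: the $html variable holds a trimmed copy of the question's markup so the example runs on its own, and the $phones array is a name introduced here for illustration.

```php
<?php
// Trimmed copy of the question's markup, used only so the sketch is self-contained.
$html = <<<'HTML'
<div class="dump-location odd">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// Same expression as in the answer; fall back to an empty array if the query fails.
$elements = $xml->xpath('//*[contains(@class, "dump-location")]/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]') ?: [];

$phones = []; // hypothetical name for the collected values
foreach ($elements as $element) {
    // Casting a SimpleXMLElement to string yields the element's text.
    $phones[] = trim((string) $element);
}

print_r($phones);
```

Attributes can be read the same way with array syntax, for example (string) $element['class'].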
If you know the document structure it's possible to construct an XPath expression for each piece of information you want to extract. Or, it might be simpler to use a more general XPath expression (say, an expression that retrieves all dump-location elements) and manually iterate through the elements.
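The second approach could look roughly like the sketch below. It makes a few assumptions. It uses DOMXPath instead of SimpleXMLElement::xpath, because the address lines are bare text nodes between the BR tags and those are easier to reach through the DOM API. It treats data-jmapping as a JavaScript object literal rather than JSON and reads it with regular expressions. The $locations array and its key names are made up for this example, and the markup is a trimmed copy of the snippet from the question.

```php
<?php
// Trimmed copy of the question's markup so the sketch runs on its own.
$html = <<<'HTML'
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div>
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a>
</div>
</div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$locations = []; // hypothetical result structure
foreach ($xpath->query('//*[contains(@class, "dump-location")]') as $node) {
    $box = './div[@class="SingleLinkNoTx"]';

    // Non-blank text nodes directly inside the box are the address lines.
    $address = [];
    foreach ($xpath->query($box . '/text()[normalize-space()]', $node) as $text) {
        $address[] = trim($text->nodeValue);
    }

    // data-jmapping has unquoted keys and single-quoted strings, so it is not
    // valid JSON; pull the values out with regular expressions instead.
    $mapping = $node->getAttribute('data-jmapping');
    preg_match('/lng:\s*(-?[\d.]+)/', $mapping, $lng);
    preg_match('/lat:\s*(-?[\d.]+)/', $mapping, $lat);
    preg_match("/category:\s*'([^']*)'/", $mapping, $category);

    $locations[] = [
        'lng'      => isset($lng[1]) ? (float) $lng[1] : null,
        'lat'      => isset($lat[1]) ? (float) $lat[1] : null,
        'category' => $category[1] ?? null,
        // string(...) returns the text of the first matching node, or '' if none.
        'name'     => $xpath->evaluate('string(' . $box . '/a[@class="loc-link"])', $node),
        'contact'  => $xpath->evaluate('string(' . $box . '/strong[not(@class)])', $node),
        'address'  => $address,
        'phone'    => $xpath->evaluate('string(' . $box . '/strong[@class="telephone"])', $node),
        'website'  => $xpath->evaluate('string(./a[@class="web_link"]/@href)', $node),
    ];
}

print_r($locations);
```

Passing $node as the second argument to query() and evaluate() makes the relative expressions (those starting with ./) apply to the current dump-location element only, so the same loop body should work when the page contains many locations.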