
How to web-scrape text inside a class and element

I'm trying to scrape text from this site. I want to extract aaa-a.nl, abcinkt.nl, accudeals.nl, etc.
Those URLs are inside <li></li> elements within the <ul class="members members-list clearfix"> element.
How do I scrape them in PHP?

Let's say you have already fetched the page (for example with cURL) into a variable $html. You can then extract the required elements as follows:

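libxml_use_internal_errors(true); // keep malformed real-world HTML from flooding the output with warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
if (!$sxml) {
    exit("ERROR. Do something to handle this.\n");
}
// Select <ul> elements whose class attribute contains the token "members-list"
$nodes = $sxml->xpath("//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]");
if (!$nodes) {
    exit("No matching <ul> found.\n");
}
foreach ($nodes[0]->li as $member) {
    echo (string)$member->a, "\n"; // This will echo the strings you need
}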

*Not tested.

(To understand the XPath query in the above code, see this: Getting DOM elements by classname )

Here I'm using DOMDocument and SimpleXML. You can do this in several other ways: by using the DOMDocument class alone to navigate the DOM, by using DOMDocument with DOMXPath, or maybe even by just using PHP string functions and regex.
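
As an illustration of the DOMDocument with DOMXPath alternative, here is a minimal sketch. The sample markup and the function name extractMembers are assumptions made for the example (the real page's markup is not shown in the question), and it assumes each <li> holds only the link text.

```php
<?php
// Hypothetical sample markup standing in for the real page; the hrefs are made up.
$html = <<<'HTML'
<html><body>
<ul class="members members-list clearfix">
  <li><a href="http://aaa-a.nl">aaa-a.nl</a></li>
  <li><a href="http://abcinkt.nl">abcinkt.nl</a></li>
  <li><a href="http://accudeals.nl">accudeals.nl</a></li>
</ul>
</body></html>
HTML;

// Hypothetical helper: returns the text of every <li> under a <ul>
// whose class attribute contains the token "members-list".
function extractMembers(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // The padded concat() makes the match token-based, so a class such as
    // "other-members-list" would not match.
    $query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]/li";

    $members = [];
    foreach ($xpath->query($query) as $li) {
        $members[] = trim($li->textContent);
    }
    return $members;
}

print_r(extractMembers($html));
```

Compared with the SimpleXML version, this selects the <li> nodes directly in the XPath expression, so there is no need to index into the first matching <ul>.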
