
How to get a particular element from a complete website (not a single page)

I want to extract a single element from every page of a complete website. I searched Google for a few hours without results. Maybe I'm searching for the wrong term, but I can't seem to find a way to do that.

I took the sitemap.xml and extracted all the links in it with the code below.

I want to use these links from the XML to get the element from all of the pages together.

<?php  

$urls = array();  

$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('https://www.ivory.co.il/sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');

foreach($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}

//display it
echo "<pre>";
print_r($urls);
echo "</pre>";

?>

Need help...

Using simplexml_load_file (since the sitemap is publicly available):

<?php
$url = "https://www.ivory.co.il/sitemap.xml";

$xml = simplexml_load_file($url) or die ("Error: Cannot create object");
$locs = array();

for($i=0; $i<count($xml->url); $i++){
    $locs[$i] = (string) $xml->url[$i]->loc;
}

echo "<pre>";
print_r($locs);
echo "</pre>";

OUTPUT:

Array
(
    [0] => https://www.ivory.co.il/
    [1] => https://www.ivory.co.il/%D7%97%D7%[...]
    [2] => https://www.ivory.co.il/%D7%98%D7%[...]
    [3] => https://www.ivory.co.il/%D7%9B%D7%[...]
    [4] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [5] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [6] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [7] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [8] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [9] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [10] => https://www.ivory.co.il/%D7%9E%D7%[...]
    [...]
)

You can then request each URL with the cURL functions: iterate over the array of links and fetch the data from each one (the PHP manual documents the cURL functions and their options).

Example:

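$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL            => $locs[1], // one URL taken from the sitemap
    CURLOPT_RETURNTRANSFER => true,     // return the response instead of printing it
));
$result = curl_exec($curl);
if ($result === false) {
    echo 'cURL error: ' . curl_error($curl);
}
curl_close($curl);
echo $result;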
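
To go from one fetched page to "the same element on every page", something along these lines might work. This is only a sketch: the helper names fetchPage, extractElements and collectFromUrls are made up for illustration, and the XPath query (for example '//h1') is an assumption, since the question does not say which element is wanted.

```php
<?php
// Sketch only: helper names and the XPath query are illustrative assumptions.

/** Download one page with cURL; returns null if the request fails. */
function fetchPage(string $url): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // seconds; arbitrary choice
    ));
    $html = curl_exec($curl);
    return is_string($html) ? $html : null;
}

/** Return the trimmed text of every node in $html that matches the XPath $query. */
function extractElements(string $html, string $query): array
{
    if ($html === '') {
        return array(); // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // pages without a charset declaration may need an encoding hint
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    $texts = array();
    if ($nodes === false) {
        return $texts; // malformed XPath expression
    }
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

/** Map each URL to the texts extracted from it; pages that fail to load are skipped. */
function collectFromUrls(array $urls, string $query, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'fetchPage';
    $results = array();
    foreach ($urls as $url) {
        $html = $fetch($url);
        if ($html === null) {
            continue;
        }
        $results[$url] = extractElements($html, $query);
    }
    return $results;
}
```

With the $locs array from above, collectFromUrls($locs, '//h1') would return an array keyed by URL. For a large sitemap this makes one request per link, so it is probably worth trying it on a small slice first, e.g. array_slice($locs, 0, 5), and adding a pause between requests.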


 