
Output complete link in each href

I'm using cURL with Simple HTML DOM to scrape a website, and to fix relative links I insert a base tag like this:

foreach($html->find('head') as $f) {
    $f->innertext = "<base href='$url'>" . $f->innertext;
}

Here $url is the website I'm scraping. The problem is that the base tag only changes how the browser resolves links; in the markup itself the links are still written out like this:

<a href="/path_to_file"> link </a> 

But I need the full URL in each link, like so:

<a href="http://www.somewebsite.com/path_to_file"> link </a> 

How can I achieve this?
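One way to get full URLs in the markup is to resolve each href against the URL of the scraped page yourself. The function below is a minimal sketch, not a complete RFC 3986 resolver. The name `resolve_href` is hypothetical. It assumes `$base` is an absolute http(s) URL with a host. It does not normalize `.`/`..` segments. It also does not handle fragment-only (`#x`), query-only (`?q`) or empty hrefs.

```php
<?php
// Hypothetical helper: resolve an href found on a page against that page's URL.
// Assumptions: $base is an absolute http(s) URL with a host; "." and ".." segments
// are not normalized; "#frag", "?query" and empty hrefs are not handled specially.
function resolve_href(string $base, string $href): string
{
    // Already absolute (has a scheme such as "http:" or "mailto:"): leave unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative: "//cdn.example.com/x" keeps the base's scheme.
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }

    // Root-relative: "/path_to_file" goes directly after the origin.
    if ($href !== '' && $href[0] === '/') {
        return $origin . $href;
    }

    // Path-relative: "other.html" is resolved against the base path's directory.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

For example, `resolve_href('http://www.somewebsite.com/', '/path_to_file')` returns `http://www.somewebsite.com/path_to_file`. A relative href on `http://www.somewebsite.com/dir/page.html` such as `other.html` becomes `http://www.somewebsite.com/dir/other.html`.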

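Prepend the base URL to each href yourself instead of relying on the base tag:

$base_url = "http://www.somewebsite.com";
// Rewrite every root-relative link ("/path_to_file") to an absolute URL.
foreach($html->find('a') as $a) {
    if (strpos($a->href, '/') === 0 && strpos($a->href, '//') !== 0) {
        $a->href = $base_url . $a->href;
    }
}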
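The same idea can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM, so no third-party library is needed. This is an illustrative substitute, not the answer's own code. The function name `absolutize_root_links` is hypothetical. Like the answer, it only rewrites root-relative links (`/path`). It leaves protocol-relative (`//host/...`) and absolute URLs untouched.

```php
<?php
// Hypothetical helper using PHP's DOM extension (not Simple HTML DOM):
// prepend $baseUrl to every root-relative href in an HTML fragment.
function absolutize_root_links(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings instead of
    // printing them, and restore the previous setting afterwards.
    $prev = libxml_use_internal_errors(true);
    // Don't add implied <html>/<body> wrappers or a doctype to the fragment.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Only "/path" links; skip "//host/..." and absolute URLs.
        if ($href !== '' && $href[0] === '/' && substr($href, 0, 2) !== '//') {
            $a->setAttribute('href', rtrim($baseUrl, '/') . $href);
        }
    }

    return (string) $doc->saveHTML();
}
```

Passing `'<div><a href="/path_to_file"> link </a></div>'` with base `http://www.somewebsite.com/` yields markup containing `href="http://www.somewebsite.com/path_to_file"`. Note that `loadHTML` assumes ISO-8859-1 when the input declares no charset, so UTF-8 pages may need a charset declaration first.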

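Try building the base URL from the current request like this:

<?php
    // Scheme and host of the request this script is serving, e.g. http://www.somewebsite.com
    $baseURL = "http://" . $_SERVER['HTTP_HOST'];
?>

then prepend $baseURL to each root-relative href.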
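A slightly more robust sketch of this approach is below. It detects HTTPS instead of hard-coding `http://`. It reads only `HTTP_HOST`, because putting `REQUEST_URI` in front of a root-relative link would insert the current page's path before `/path_to_file`. The helper names are hypothetical. The server array is passed in as a parameter so the logic can be exercised without a live request.

```php
<?php
// Hypothetical helper: build "scheme://host" from a $_SERVER-style array.
function base_url_from_server(array $server): string
{
    // HTTPS is set to a non-empty value other than "off" for TLS requests.
    $https  = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';
    return $scheme . '://' . $server['HTTP_HOST'];
}

// Hypothetical helper: join base and href with exactly one slash between them.
function prepend_base(string $baseUrl, string $href): string
{
    return rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
}

// Usage inside a real request:
// echo prepend_base(base_url_from_server($_SERVER), '/path_to_file');
```

Keep in mind that `$_SERVER` describes the request made to your own script. If the links should point at the scraped site, the base must come from that site's URL instead.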
