I'm using cURL with Simple HTML DOM to scrape a website, and to fix relative links I insert a base tag like this:
foreach ($html->find('head') as $f) {
    $f->innertext = "<base href='$url'>" . $f->innertext;
}
Here $url is the website I'm scraping. The problem is that the links are still written to the output like this:
<a href="/path_to_file"> link </a>
whereas I need the full URL in the link, like so:
<a href="http://www.somewebsite.com/path_to_file"> link </a>
How can I achieve this?
Prepend the base URL to $url each time you set it:
$base_url = "http://www.somewebsite.com/";
foreach ($html->find('head') as $f) {
    $f->innertext = "<base href='$base_url$url'>" . $f->innertext;
}
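A base tag only tells the browser how to resolve links; the text of each href attribute stays as it was. If the full URL has to appear in the markup itself, each href can be rewritten instead. The following is a minimal sketch, not a complete URL resolver: make_absolute is a hypothetical helper (it is not part of Simple HTML DOM), and it does not normalise "../" segments or query-only links.

```php
<?php
// Hypothetical helper: turn one href into an absolute URL,
// resolved against the URL of the page it was scraped from.
function make_absolute(string $href, string $pageUrl): string
{
    // Leave empty values, fragments, and anything with a scheme
    // (http:, https:, mailto:, javascript:) untouched.
    if ($href === '' || $href[0] === '#'
        || preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) {
        return $href;
    }

    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative link: //cdn.example.com/file
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative link: /path_to_file
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Document-relative link: resolve against the page's directory.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Usage with the $html and $url variables from the question:
// foreach ($html->find('a') as $a) {
//     $a->href = make_absolute($a->href, $url);
// }
```

With $url set to "http://www.somewebsite.com/", the link "/path_to_file" from the question becomes "http://www.somewebsite.com/path_to_file".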
Try building the base URL like this:
<?php
$baseURL = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
?>
Then prepend $baseURL to each href.
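Note that $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'] describe the request made to your own script, not the site being scraped. If the base should be the scraped site, one option is to derive it from the page URL with parse_url. This is only a sketch: site_root is a hypothetical helper name, and the page URL below is a made-up example.

```php
<?php
// Hypothetical helper: reduce a page URL to scheme://host[:port].
function site_root(string $pageUrl): string
{
    $p = parse_url($pageUrl);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }
    $root = $p['scheme'] . '://' . $p['host'];
    if (isset($p['port'])) {
        $root .= ':' . $p['port'];
    }
    return $root;
}

// Example: build the base once, then prepend it to a root-relative href.
$baseURL = site_root('http://www.somewebsite.com/some/page.html');
$full    = $baseURL . '/path_to_file';
```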