Output complete link in each href
I'm using cURL with Simple HTML DOM to scrape a website, and in order to fix relative links I insert a base tag like this:
foreach ($html->find('head') as $f) {
    $f->innertext = "<base href='$url'>" . $f->innertext;
}
Where $url is the website I'm scraping. The problem is that the links are still written to the output like this:
<a href="/path_to_file"> link </a>
While I need the full URL in the link, like so:
<a href="http://www.somewebsite.com/path_to_file"> link </a>
How can I achieve this?
Append the URL to the base URL each time you set it.
$base_url = "http://www.somewebsite.com/";
foreach ($html->find('head') as $f) {
    $f->innertext = "<base href='$base_url$url'>" . $f->innertext;
}
Try to get the base URL like this:
<?php
$baseURL = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
?>
Then prepend $baseURL to your href.
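A <base> tag only changes how a browser resolves links; it does not rewrite the href attributes in the markup. To get full URLs into the output itself, each href has to be rewritten. Below is a minimal sketch of that idea, assuming the base is the URL of the scraped page. The function name absolutize_href is hypothetical (it is not part of Simple HTML DOM), and the sketch does not normalize "../" segments or handle fragment-only links such as "#top".

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): turn an href into an
// absolute URL, using the scraped page's URL as the base.
function absolutize_href(string $href, string $base): string
{
    // Already absolute: starts with a scheme such as http:, https:, mailto:
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative link: //cdn.example.com/file
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }

    // Root-relative link: /path_to_file
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Document-relative link: resolve against the directory of the base path
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// prints http://www.somewebsite.com/path_to_file
echo absolutize_href('/path_to_file', 'http://www.somewebsite.com/'), "\n";
```

With the $html object from the question, this could presumably be applied with a loop such as foreach ($html->find('a') as $a) { $a->href = absolutize_href($a->href, $url); } before the document is output.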