
Output complete link in each href

I'm using cURL with Simple HTML DOM to scrape a website, and to fix relative links I insert a base tag like this:

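// Insert <base href="$url"> at the start of every <head> so that relative
// links resolve against the scraped site when the page is rendered.
foreach($html->find('head') as $f) {
    $f->innertext = "<base href='$url'>" . $f->innertext;
}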

Here $url is the URL of the website I'm scraping. The problem is that the links are still output literally like this:

<a href="/path_to_file"> link </a> 

But I need the full URL in each link, like this:

<a href="http://www.somewebsite.com/path_to_file"> link </a> 

How can I achieve this?
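A base tag only changes how the browser resolves relative links; the markup itself still says href="/path_to_file". To write full URLs into the output, each href has to be resolved against the URL of the page it came from. The following is a minimal sketch, assuming PHP 8 (for str_starts_with); make_absolute is a hypothetical helper name, not part of Simple HTML DOM, and it deliberately does not collapse "." or ".." path segments or honour a base tag that the scraped page may already contain.

```php
<?php
// Sketch (PHP 8+): resolve an href from scraped HTML against the URL of the
// page it came from. make_absolute() is a hypothetical helper, not part of
// simple_html_dom. It does not collapse "." / ".." path segments.
function make_absolute(string $href, string $pageUrl): string
{
    // Already absolute (http:, https:, mailto:, ...): leave unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    // Empty href or fragment only (#top): same page, new fragment.
    if ($href === '' || $href[0] === '#') {
        return explode('#', $pageUrl, 2)[0] . $href;
    }

    $p      = parse_url($pageUrl);
    $scheme = $p['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    $path   = $p['path'] ?? '/';

    // Protocol-relative (//cdn.example.com/app.js): reuse the page's scheme.
    if (str_starts_with($href, '//')) {
        return $scheme . ':' . $href;
    }
    // Root-relative (/path_to_file): attach to scheme://host[:port].
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Query only (?page=2): keep the current path.
    if ($href[0] === '?') {
        return $origin . $path . $href;
    }
    // Path-relative (img/logo.png): resolve against the page's directory.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo make_absolute('/path_to_file', 'http://www.somewebsite.com/'), "\n";
// http://www.somewebsite.com/path_to_file
```

For anything beyond these common cases (dot segments, a base tag in the source page), a full RFC 3986 reference-resolution routine would be the safer choice.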

Append the URL to the base URL each time you set it:

$base_url = "http://www.somewebsite.com/";
foreach($html->find('head') as $f) {
    $f->innertext = "<base href='$base_url$url'>" . $f->innertext;
}
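Setting the base tag still leaves href="/path_to_file" in the markup. If the links themselves must contain the full URL, the anchors can be rewritten directly. A minimal sketch, assuming PHP 8 with the bundled DOM extension (used here instead of Simple HTML DOM so it runs without extra files); absolutize_root_links is a hypothetical helper name, and only root-relative hrefs such as /path_to_file are rewritten.

```php
<?php
// Sketch (PHP 8+, ext-dom): rewrite root-relative links such as
// href="/path_to_file" so the markup itself contains the full URL.
// absolutize_root_links() is a hypothetical helper name.
function absolutize_root_links(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // Without a <meta charset>, loadHTML() assumes ISO-8859-1.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = rtrim($baseUrl, '/'); // avoid "...com//path_to_file"
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Only "/..." links; "//host/..." is protocol-relative, so skip it.
        if (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
            $a->setAttribute('href', $base . $href);
        }
    }
    return $doc->saveHTML();
}

$page = '<html><body><a href="/path_to_file"> link </a></body></html>';
echo absolutize_root_links($page, 'http://www.somewebsite.com/');
```

With Simple HTML DOM itself, the equivalent is a loop over $html->find('a') that reads and assigns $a->href.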

Try getting the base URL like this:

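<?php
    // HTTP_HOST and REQUEST_URI describe the request to the server running
    // this script (not the scraped site); the scheme is hard-coded as http.
    $baseURL = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
?>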

Then prepend $baseURL to your href.
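If $baseURL should point at the site being scraped rather than at the server running the script, it can be derived from $url with parse_url. A minimal sketch; origin_of is a hypothetical helper name, and the prefix is only correct for root-relative hrefs like /path_to_file.

```php
<?php
// Sketch: build "scheme://host[:port]" from the URL being scraped.
// origin_of() is a hypothetical helper name.
function origin_of(string $url): string
{
    $p = parse_url($url);
    $origin = ($p['scheme'] ?? 'http') . '://' . ($p['host'] ?? '');
    if (isset($p['port'])) {
        $origin .= ':' . $p['port'];
    }
    return $origin;
}

$url     = 'http://www.somewebsite.com/some/page.html';
$baseURL = origin_of($url);
$href    = '/path_to_file';
echo $baseURL . $href, "\n"; // http://www.somewebsite.com/path_to_file
```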
