I'm trying to find a regular expression that converts all URLs in a curl'ed document from relative to absolute.
One approach I found is in the post here, but it only works for the first URL, not for all of them.
This is the code I'm using:
$url="http://www.example.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_DNS_USE_GLOBAL_CACHE, 0);
curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$result=curl_exec($ch);
curl_close($ch);
$result = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://www.example.com$3"', $result);
echo $result;
What am I doing wrong?
EDIT: To clarify, I don't have an array of URLs. I have an entire document fetched with curl, so I need a preg_replace-based method.
I'm not exactly sure why it replaces only once (maybe it has something to do with the backreference), but if you wrap it in a while loop, it should work:
$pattern = '~(href|src)=(["\'])(?!#|//|http)([^\2]*)\2~i';
while (preg_match($pattern, $result)) {
    $result = preg_replace($pattern, '$1="http://www.example.com$3"', $result);
}
(I also changed the pattern slightly: the lookahead now skips values that start with #, // or http.)
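A likely explanation for the single replacement: in PCRE, a backreference is not recognized inside a character class, so [^\2] is read as "any character except the one with octal code 2" rather than "any character except the opening quote". Combined with the greedy *, the third group can then run from the first attribute all the way to the last matching quote in the document, which leaves only one match. The following is a minimal sketch of a single-pass alternative that uses a lazy (.*?) instead, so no loop is needed. The sample $html string, the stricter https?: lookahead, and the choice to resolve every relative path against the site root are assumptions made for illustration. They are not part of the original post, and other schemes such as mailto: are not handled.

```php
<?php
// Hypothetical sample standing in for the curl'ed document.
$html = '<a href="/about">About</a> <img src=\'img/logo.png\'> '
      . '<a href="http://other.example/x">X</a> <a href="#top">Top</a>';

// Base URL taken from the question; assumed to be the site root.
$base = 'http://www.example.com';

// (.*?) is lazy, so it stops at the first closing quote equal to group 2.
// The lookahead skips fragments, protocol-relative URLs and http(s) URLs.
$pattern = '~(href|src)=(["\'])(?!#|//|https?:)(.*?)\2~i';

$fixed = preg_replace_callback(
    $pattern,
    function (array $m) use ($base) {
        // Join base and path with exactly one slash and keep the original quote style.
        return $m[1] . '=' . $m[2] . $base . '/' . ltrim($m[3], '/') . $m[2];
    },
    $html
);

echo $fixed, "\n";
```

With the sample input, both relative attributes are rewritten in one call, while the absolute URL and the #top fragment are left untouched. A callback is used instead of a plain replacement string so that a path without a leading slash (img/logo.png) does not get glued directly onto the host name.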