In short, I'm using preg_replace to find style sheets and essentially proxy them for viewers on my website: I take the external domain and prepend it to the current href. The style sheet link starts like so:
<link rel="stylesheet" type="text/css" href="/assets/css/base.css">
I take the href and prepend the domain, giving:
<link rel="stylesheet" type="text/css" href="http://www.website.com/assets/css/base.css">
My issue arises when I encounter a site whose href does not include the scheme (http:// or https://):
<link rel="stylesheet" type="text/css" href="//cdn.website.com/assets/css/base.css">
Then my current preg_replace does not work and turns the stylesheet link into the following:
<link rel="stylesheet" type="text/css" href="http://www.website.com//cdn.website.com/assets/css/base.css">
Is it possible to create some sort of if/then with preg_replace so that it does not manipulate the "//" hrefs and only replaces the ones with no absolute base domain?
Current preg_replace being used:
$html = file_get_contents($website_url);
$domain = 'website.com';
$html = preg_replace("/(href|src)\=\"([^(http)])(\/)?/", "$1=\"$domain$2", $html);
echo $html;
Regex does have if/then/else conditionals, although they are not really necessary for this to work:
(?!(href|src)=)(\")\/(\\w+.+)(\">)
Code:
$html = file_get_contents($website_url);
$domain = 'http://website.com';
$result = preg_replace("/(?!(href|src)=)(\")\/(\\w+.+)(\">)/u", "$2$domain/$3$4", $html);
echo $result;
Output:
<link rel="stylesheet" type="text/css" href="http://website.com/assets/css/base.css">
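To make the "skip the // hrefs" behaviour explicit, here is a minimal sketch of a different pattern than the one above. It uses a negative lookahead so that only values starting with a single slash are rewritten. The function name prefixRootRelative and the sample markup are made up for illustration, and the pattern assumes double-quoted attributes only.

```php
<?php
// Hypothetical helper (not from the original post): prefix root-relative
// href/src values with a base URL and leave everything else untouched.
function prefixRootRelative(string $html, string $base): string
{
    // href=" or src=" followed by one slash that is NOT followed by a second
    // slash, so "//cdn..." and "http://..." values never match.
    $pattern = '~\b(href|src)="/(?!/)~i';

    // A callback avoids $base being interpreted as a backreference string.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            return $m[1] . '="' . rtrim($base, '/') . '/';
        },
        $html
    );
}

$sample = '<link rel="stylesheet" href="/assets/css/base.css">' . "\n"
        . '<link rel="stylesheet" href="//cdn.website.com/assets/css/base.css">' . "\n";

echo prefixRootRelative($sample, 'http://www.website.com');
```

With this sample, only the first link should gain the http://www.website.com prefix; the protocol-relative one is passed through unchanged.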
[^(http)] is not a negation. It's still a character class. You are looking for a (?!...) negative lookahead:
~ (href|src) =\" (?!http:) \/? ~x
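The difference between the two constructs can be seen in a small experiment. This is only an illustrative sketch: the sample URLs and the variable names are invented, and the lookahead here is a slightly broader variant that also covers https and "//".

```php
<?php
// One character that is none of: ( h t p )
$charClass = '~^[^(http)]~';
// A position that is not followed by a scheme or by "//"
$lookahead = '~^(?!https?://|//)~i';

$samples = [
    '/assets/css/base.css',
    'page.css',
    'http://a.example/x.css',
    '//cdn.example/x.css',
];

foreach ($samples as $url) {
    printf(
        "%-26s class=%d lookahead=%d\n",
        $url,
        preg_match($charClass, $url),
        preg_match($lookahead, $url)
    );
}
```

The character class rejects page.css merely because it begins with "p", and it accepts //cdn.example/x.css because "/" is not in the excluded set. The lookahead classifies both of those the way the question intends.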
While I dispute the SO meme and overgeneralization of firing up a DOM traversal for every trivial task, it should be noted that regex is usually appropriate only for normalized, well-known HTML input, not for a task like proxying arbitrary websites.
// Rewrites the href of every <a> element in $html.
// Written as a class method: mungeHref() is assumed to be defined on the same class.
function alterLinks($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $links = $dom->getElementsByTagName('a');
    foreach ($links as $alink) {
        $href = $alink->getAttribute('href');
        $aMungedLink = $this->mungeHref($href);
        $alink->setAttribute('href', $aMungedLink);
    }
    return $dom->saveHTML();
}
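The snippet above leaves mungeHref() undefined and only touches <a> elements. A self-contained sketch of how the same DOM approach might be applied to the stylesheet case could look like this. Both function bodies, the tag/attribute map, and the sample markup are assumptions made for illustration, not the answerer's actual code.

```php
<?php
// Hypothetical stand-in for mungeHref(): only root-relative paths change.
function mungeHref(string $href, string $base): string
{
    // Leave empty values, protocol-relative ("//host/...") values, and
    // anything not starting with "/" (absolute URLs, relative paths) alone.
    if ($href === '' || $href[0] !== '/' || strncmp($href, '//', 2) === 0) {
        return $href;
    }
    return rtrim($base, '/') . $href;
}

function alterAssetLinks(string $html, string $base): string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that holds the URL (an assumed, partial list).
    $targets = ['link' => 'href', 'a' => 'href', 'script' => 'src', 'img' => 'src'];
    foreach ($targets as $tag => $attr) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attr)) {
                $node->setAttribute($attr, mungeHref($node->getAttribute($attr), $base));
            }
        }
    }
    return $dom->saveHTML();
}

$page = '<html><head><link rel="stylesheet" href="/assets/css/base.css">'
      . '<link rel="stylesheet" href="//cdn.website.com/assets/css/base.css">'
      . '</head><body></body></html>';

echo alterAssetLinks($page, 'http://www.website.com');
```

Because the decision is made on the parsed attribute value rather than on raw text, quoting style and attribute order no longer matter, which is the main argument for this route when the input is arbitrary HTML.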