PHP preg_replace HREF

In short, I'm using preg_replace to find style sheets and proxy the page for viewers on my website: I take the external domain and prepend it to the existing href. A style sheet link starts out like this:

<link rel="stylesheet" type="text/css" href="/assets/css/base.css">

I take the href and prepend the domain, giving:

<link rel="stylesheet" type="text/css" href="http://www.website.com/assets/css/base.css">

My issue arises when a site uses a protocol-relative URL, one that does not include http: or https:, such as:

<link rel="stylesheet" type="text/css" href="//cdn.website.com/assets/css/base.css">

My current preg_replace then mishandles it and rewrites the stylesheet link to the following:

<link rel="stylesheet" type="text/css" href="http://www.website.com//cdn.website.com/assets/css/base.css">

Is it possible to build some sort of if/then into preg_replace so that it leaves the "//" hrefs alone and only rewrites the ones that have no absolute base domain?

Current preg_replace being used:

$html = file_get_contents($website_url);
$domain = 'website.com';
$html = preg_replace("/(href|src)\=\"([^(http)])(\/)?/", "$1=\"$domain$2", $html);
echo $html;

There are if/then/else conditionals in regex, although not really necessary for this to work:

(?!(href|src)=)(\")\/(\\w+.+)(\">)

Code:

$html = file_get_contents($website_url);
$domain = 'http://website.com';
$result = preg_replace("/(?!(href|src)=)(\")\/(\\w+.+)(\">)/u", "$2$domain/$3$4", $html);
echo $result;

Output:

<link rel="stylesheet" type="text/css" href="http://website.com/assets/css/base.css">

Example:

http://regex101.com/r/kU7pF1

[^(http)] is not a negation of the string "http". It is still a character class: it matches any single character other than (, h, t, p, or ).
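A small check of that behaviour, as a sketch: the pattern in `$pattern` is the one from the question, while the sample hrefs (`print.css`, the `cdn.website.com` link) are made up for illustration.

```php
<?php
// [^(http)] matches exactly ONE character that is not '(', 'h', 't', 'p' or ')'.
$class = '/^[^(http)]$/';

$slashMatches = preg_match($class, '/'); // 1: '/' is not in the set
$hMatches     = preg_match($class, 'h'); // 0: 'h' is in the set
$pMatches     = preg_match($class, 'p'); // 0: 'p' is in the set

// The question's pattern, unchanged.
$pattern = "/(href|src)\=\"([^(http)])(\/)?/";

// A relative href starting with 'p' is skipped even though it is not absolute...
$skipsPrintCss = preg_match($pattern, 'href="print.css"');                  // 0
// ...while a protocol-relative href is matched, because it starts with '/'.
$matchesCdn    = preg_match($pattern, 'href="//cdn.website.com/base.css"'); // 1
```

So the character class both rejects some relative paths and accepts the "//" case, which is the behaviour described in the question.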

You are looking for a (?!...) negative lookahead:

 ~  (href|src) =\" (?!http:)  \/?  ~x
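One way the lookahead idea might be extended to also skip https: and protocol-relative "//" values is sketched below. The function name `prependDomain` and its parameters are hypothetical, not from the original answer, and the sketch assumes every remaining value can be treated as relative to the site root, which is only an approximation for document-relative paths.

```php
<?php
// Hypothetical helper (not part of the original answer).
function prependDomain(string $html, string $base): string
{
    $pattern = '~
        (?<![\w-])(href|src)=(["\'])  # attribute name and opening quote
        (?!                           # skip the value when it starts with
            [a-z][a-z0-9+.-]*:        #   a scheme such as http: or https:
          | //                        #   a protocol-relative prefix
          | \#                        #   a fragment-only link
        )
        /?                            # swallow one optional leading slash
    ~ix';

    // ${2} keeps the backreference unambiguous before the base URL text.
    return preg_replace($pattern, '$1=${2}' . rtrim($base, '/') . '/', $html);
}

$before = '<link rel="stylesheet" type="text/css" href="/assets/css/base.css">';
$after  = prependDomain($before, 'http://www.website.com');
```

With the question's first example as input, `$after` has the href rewritten to `http://www.website.com/assets/css/base.css`, while a value such as `//cdn.website.com/assets/css/base.css` is left untouched because the lookahead rejects it.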

While I dispute the SO meme and overgeneralization of firing up a DOM traversal for every trivial task, it should be noted that regex is generally only appropriate for normalized, well-known HTML input, not for a task such as proxying arbitrary websites.

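// Rewrites the href of every <a> element using DOM traversal.
// Written as a class method: mungeHref() must be defined on the same class (it is not shown here).
function alterLinks($html) {

  $dom = new DOMDocument();
  $dom->loadHTML($html);
  $links = $dom->getElementsByTagName('a');

  foreach ($links as $alink) {
    $href = $alink->getAttribute('href');
    // mungeHref() decides how a single URL is rewritten.
    $aMungedLink = $this->mungeHref($href);
    $alink->setAttribute('href', $aMungedLink);
  }

  return $dom->saveHTML();
}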
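For the stylesheet case in the question, the same DOM idea could be applied to link, script and img elements as well. The following is a minimal sketch, assuming the DOM extension is available; `absolutize` and `rewriteAssets` are hypothetical names, and document-relative paths are simply treated as relative to the site root.

```php
<?php
// Hypothetical helper: make site-relative URLs absolute, leave the rest alone.
function absolutize(string $url, string $base): string
{
    // Skip empty values, absolute URLs (scheme:), protocol-relative URLs (//)
    // and fragment-only links (#...).
    if ($url === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
        return $url;
    }
    return rtrim($base, '/') . '/' . ltrim($url, '/');
}

// Hypothetical helper: rewrite href/src attributes on a few common elements.
function rewriteAssets(string $html, string $base): string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $targets = ['link' => 'href', 'a' => 'href', 'script' => 'src', 'img' => 'src'];
    foreach ($targets as $tag => $attr) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attr)) {
                $node->setAttribute($attr, absolutize($node->getAttribute($attr), $base));
            }
        }
    }
    return $dom->saveHTML();
}
```

Keeping the URL decision in its own small function means the "leave // and http(s) alone" rule can be checked without parsing any HTML.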
