
Regex matching in PHP

I'm trying to replace a given URL with another one in PHP using preg_replace, with the following code:

$patt = '#(?:https?:\/\/)?(?:www\.)?((?:[^\.]+)\.)?example\.com#i';
preg_replace($patt,"http://$1anotherwebsite.com",$somehtml);

I'm having two issues, however. I would like $1 to be empty if the subdomain is www., but it seems that ((?:[^\.]+)\.)? matches www. rather than (?:www\.)? as expected. This seems to be a PHP-specific issue.

In the case of this example, the second match contains part of the first string. Is there any way I could limit this to only match between < and >? I tried using (<.*) and (.*>), but nothing matched.

EDIT: Here are some sample inputs / outputs:

http://static.example.com/assets/js/jquery-1.6.1.min.js?1384234134 -> http://static.anotherwebsite.com/assets/js/jquery-1.6.1.min.js?1384234134

http://www.example.com -> http://anotherwebsite.com

example.com -> http://anotherwebsite.com

https://example.com/index.php -> http://anotherwebsite.com/index.php

The links are coded in HTML, so I believe restricting the match to text between < and > would work.

You can use this:

$html = preg_replace('~(?:https?://)?(?:www\.)?([^\.\s]+\.)?example\.com~i',
                       'http://$1anotherdomain.com', $html);
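
One thing to watch with a pattern of this shape: the optional subdomain group accepts almost any character, so inside HTML it can also absorb text in front of the URL, such as `href="http://www.`. The sketch below narrows the subdomain to letters, digits and hyphens and checks the result against the sample inputs from the question. The character class and the helper name `rewriteHost` are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: rewrite example.com URLs to anotherwebsite.com.
// Assumption: subdomain labels contain only letters, digits and hyphens.
function rewriteHost(string $html): string
{
    // (?:https?://)?          optional scheme, dropped from the result
    // (?:www\.)?              optional leading "www.", dropped as well
    // ((?:[a-z0-9-]+\.)*)     any remaining subdomain labels, kept as $1
    $pattern = '~(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)*)example\.com~i';

    return preg_replace($pattern, 'http://$1anotherwebsite.com', $html);
}

echo rewriteHost('<a href="http://www.example.com">home</a>'), "\n";
echo rewriteHost('http://static.example.com/assets/js/jquery-1.6.1.min.js?1384234134'), "\n";
```

Because the scheme and `www.` are matched before the capture group, `$1` stays empty for `www.example.com`, while `static.` is kept for `static.example.com`. A host such as `notexample.com` would still match on its tail, so a stricter boundary may be needed for real input.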

For parsing a URL and applying rules to it, PHP's built-in parse_url is much better suited to handling the complexity of all sorts of URL structure.

$url = 'http://www.example.com/foo/bar';
$arr = parse_url($url);
$output = 'http://anotherwebsite.com';
// apply path only if www isn't there
if (stripos($arr['host'], 'www.') !== 0) {
   $path = isset($arr['path']) ? $arr['path'] : '';
   $output = 'http://anotherwebsite.com' . $path;
}

echo $output;
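
As a rough sketch of how parse_url could cover the sample inputs from the question (keep a non-www subdomain, the path and the query string), something along these lines might work. The function name `rewriteUrl` is hypothetical, and the handling of scheme-less input and of ports (ignored here) are assumptions rather than anything stated in the thread.

```php
<?php
// Hypothetical helper: map an example.com URL onto anotherwebsite.com,
// keeping any non-www subdomain, the path, the query string and the fragment.
function rewriteUrl(string $url): string
{
    // parse_url() only recognises a host when a scheme is present,
    // so prepend one for inputs such as "example.com".
    $candidate = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;

    $parts = parse_url($candidate);
    $host  = is_array($parts) ? strtolower($parts['host'] ?? '') : '';

    // Leave URLs on other hosts (or unparsable input) untouched.
    if ($host !== 'example.com' && substr($host, -12) !== '.example.com') {
        return $url;
    }

    // Everything in front of "example.com", minus a leading "www.".
    $sub = $host === 'example.com' ? '' : substr($host, 0, -11);
    $sub = preg_replace('~^www\.~', '', $sub);

    $out = 'http://' . $sub . 'anotherwebsite.com' . ($parts['path'] ?? '');
    if (isset($parts['query'])) {
        $out .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }

    return $out;
}

echo rewriteUrl('https://example.com/index.php'), "\n";
```

Comparing the parsed host against `example.com` avoids rewriting look-alike domains, which is harder to guarantee with a pattern applied to raw text.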

Using DOM:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//a[@href]"); // only anchors that carry an href
for ($i = 0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    $url = $node->getAttribute('href');
    $arr = parse_url($url);
    // skip relative or unparsable links, which have no host component
    if (!isset($arr['host'])) {
        continue;
    }
    $output = 'http://anotherwebsite.com';
    // apply path only if www isn't there
    if (stripos($arr['host'], 'www.') !== 0) {
       $path = isset($arr['path']) ? $arr['path'] : '';
       $output = 'http://anotherwebsite.com' . $path;
    }
    // store the rewritten URL on the anchor
    $node->setAttribute('href', $output);
}
// save HTML back
echo $doc->saveHTML();
