Regex matching php
I'm trying to replace a given URL with another one in PHP using preg_replace, with the following code:
$patt = '#(?:https?:\/\/)?(?:www\.)?((?:[^\.]+)\.)?example\.com#i';
preg_replace($patt,"http://$1anotherwebsite.com",$somehtml);
I'm having two issues however: I would like $1 to be blank if the subdomain is www., but it seems that ((?:[^\.]+)\.)? matches www. and not (?:www\.)? as expected. This seems to be a PHP-specific issue.
In the case of this example, the second match contains part of the first string. Is there any way I could limit this to only match between < and >? I tried using (<.*) and (.*>), but nothing matched.
EDIT: Here are some sample inputs / outputs:
http://static.example.com/assets/js/jquery-1.6.1.min.js?1384234134 -> http://static.anotherwebsite.com/assets/js/jquery-1.6.1.min.js?1384234134
http://www.example.com -> http://anotherwebsite.com
example.com -> http://anotherwebsite.com
https://example.com/index.php -> http://anotherwebsite.com/index.php
The links are coded in HTML, so restricting the match to be between < and > would work, I believe.
You can use this:
$html = preg_replace('~(?:https?://)?(?:www\.)?([^\.\s]+\.)?example\.com~i',
'http://$1anotherdomain.com', $html);
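One caveat when this runs over HTML: [^\.\s]+ still accepts characters such as =, " and /, so in an attribute like href="http://www.example.com" the capture group can start at href and swallow everything up to the first dot. A minimal sketch of a tighter variant follows, assuming a subdomain is a single label made of letters, digits and hyphens. The function name rewrite_domain is hypothetical, and the target domain is taken from the sample outputs in the question.

```php
<?php
// Hypothetical helper (not from the original answer): rewrites example.com
// URLs found anywhere in $text.
// Assumption: a subdomain is one label of letters, digits and hyphens.
function rewrite_domain(string $text): string
{
    $pattern = '~(?:https?://)?(?:www\.)?([a-z0-9-]+\.)?example\.com~i';
    // $1 expands to an empty string when the optional subdomain group
    // does not take part in the match.
    return preg_replace($pattern, 'http://$1anotherwebsite.com', $text);
}

// Sample inputs from the question, plus one link inside an HTML attribute.
$samples = [
    'http://static.example.com/assets/js/jquery-1.6.1.min.js?1384234134',
    'http://www.example.com',
    'example.com',
    'https://example.com/index.php',
    '<a href="http://www.example.com">home</a>',
];
foreach ($samples as $sample) {
    echo rewrite_domain($sample), "\n";
}
```

Because the character class no longer accepts quotes, slashes or equals signs, the match cannot begin before the URL itself, which may make an explicit restriction to < and > unnecessary for this case.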
For parsing a URL and applying rules, PHP's built-in parse_url is much better suited to handling the complexity of all sorts of URL structures.
$url = 'http://www.example.com/foo/bar';
$arr = parse_url($url);
$output = 'http://anotherwebsite.com';
// apply path only if www isn't there
if (stripos($arr['host'], 'www.') !== 0) {
    $path = isset($arr['path']) ? $arr['path'] : '';
    $output = 'http://anotherwebsite.com' . $path;
}
echo $output;
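The snippet above drops any subdomain and query string, so it does not reproduce the static.example.com sample from the question. A minimal sketch of a parse_url-based helper that does is shown here, under a few assumptions: the name swap_host is hypothetical, the output scheme is always http (as in the sample outputs), PHP 7 or later is available, and ports, credentials and fragments are ignored.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns the URL moved to anotherwebsite.com, or the input unchanged
// when it does not point at example.com.
function swap_host(string $url): string
{
    // parse_url() only reports a host when a scheme is present, so
    // prepend one for bare inputs such as "example.com".
    $candidate = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;
    $parts = parse_url($candidate);
    if (empty($parts['host'])) {
        return $url;
    }

    // Split the host into an optional subdomain prefix and example.com.
    if (!preg_match('~^((?:[a-z0-9-]+\.)*)example\.com$~i', $parts['host'], $m)) {
        return $url;
    }
    // Drop a leading "www." so www.example.com maps to the bare domain.
    $sub = preg_replace('~^www\.~i', '', $m[1]);

    $path  = $parts['path'] ?? '';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return 'http://' . $sub . 'anotherwebsite.com' . $path . $query;
}

echo swap_host('http://static.example.com/assets/js/jquery-1.6.1.min.js?1384234134'), "\n";
echo swap_host('http://www.example.com'), "\n";
echo swap_host('example.com'), "\n";
echo swap_host('https://example.com/index.php'), "\n";
```

A helper of this shape could also be called on each href value when walking the document with DOM, instead of building the output URL inline.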
Using DOM:
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//a");
for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    $url = $node->getAttribute('href');
    $arr = parse_url($url);
    // skip relative or unparsable links, which have no host
    if (empty($arr['host'])) {
        continue;
    }
    $output = 'http://anotherwebsite.com';
    // apply path only if www isn't there
    if (stripos($arr['host'], 'www.') !== 0) {
        $path = isset($arr['path']) ? $arr['path'] : '';
        $output = 'http://anotherwebsite.com' . $path;
    }
    // write the rewritten URL back to the link
    $node->setAttribute('href', $output);
}
// save HTML back
echo $doc->saveHTML();