I need to convert a few relative URLs in a given HTML text to absolute URLs.
The HTML text contains a mix of relative and absolute URLs, and the resulting HTML text should contain only absolute URLs, following these rules:
Convert /test/1.html
into https://www.example.com/test/1.html
Leave these unchanged:
http://www.example.com/test/xxx.html
https://www.example.com/test/xxx.html
https://www.example.de/test/xxx.html
http://www.example.de/test/xxx.html
Since I am using PHP, I believe the best way to do this is with preg_replace, and I tried the following code.
$server_url = "https://www.example.com";
$html = preg_replace('@(?<!https://www\.example\.com)(?<!http://www\.example\.com)(?<!https://www\.example\.de)(?<!http://www\.example\.de)/test@iU', $server_url.'/test', $html);
However, this doesn't give the desired result. Instead, it converted all the /test links, including the existing absolute URLs, so some URLs ended up like http://www.example.dehttp://www.example.com/test/xxx.html.
I'm not good at regex, so please help me find a proper regex to get the desired result.
This should match root-relative URLs:
^(\/[^\/]{1}.*\.html)$
The URL you want will then be available in $1.
https://regex101.com/r/E1evez/2
<?php
// Sample URLs: a mix of root-relative and absolute ones.
$urls = [
    '/test/1.html',
    'http://www.example.com/test/xxx.html',
    'https://www.example.de/test/xxx.html',
    '/relative/path/file.html',
];

foreach ($urls as $url) {
    // Matches only strings that start with a single "/" and end in ".html".
    if (preg_match('/^(\/[^\/]{1}.*\.html)$/', $url)) {
        echo 'match: ' . $url . PHP_EOL;
    } else {
        echo 'no match: ' . $url . PHP_EOL;
    }
}
Outputs:
match: /test/1.html
no match: http://www.example.com/test/xxx.html
no match: https://www.example.de/test/xxx.html
match: /relative/path/file.html
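Building on the check above, a matched root-relative URL could be made absolute by prefixing it with the server URL. This is only a minimal sketch: the helper name to_absolute is hypothetical, and $server_url is taken from the question.

```php
<?php
// Hypothetical helper (not from the answer): prefix root-relative .html URLs.
function to_absolute(string $url, string $server_url): string
{
    // Same pattern as above; $m[1] holds the root-relative path.
    if (preg_match('/^(\/[^\/]{1}.*\.html)$/', $url, $m)) {
        return $server_url . $m[1];
    }
    // Anything else (e.g. an already absolute URL) is returned unchanged.
    return $url;
}

$server_url = 'https://www.example.com'; // value assumed from the question
$urls = [
    '/test/1.html',
    'http://www.example.de/test/xxx.html',
];

foreach ($urls as $url) {
    echo to_absolute($url, $server_url) . PHP_EOL;
}
```

Note that this works on individual URL strings, because the pattern is anchored with ^ and $; it does not search inside a larger HTML text.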
If all the urls start with a forward slash, you might use:
(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*
Explanation
(?<!\S) : Assert that what is directly on the left is not a non-whitespace char
(?:/[^/\s]+)+ : Repeat 1+ times matching /, then any char other than / or a whitespace char, using a negated character class
/\S+ : Match / and 1+ times a non-whitespace char
\.html\S* : Match .html as in the example data, then 0+ times a non-whitespace char
If you also want to match /1.html, you could change the quantifier to )* instead of )+
To match more extensions than .html, you might specify what you would allow to match, like \.(?:html|jpg|png), or perhaps use a character class \.[\w-()] and add what you would allow to match.
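As a rough sketch of how this pattern might be applied to a whole text, the match can be prefixed with the server URL via preg_replace_callback. The helper name absolutize_text and the sample text are hypothetical; $server_url comes from the question. Because of (?<!\S), only URLs preceded by whitespace or the start of the string are touched, so a URL directly after a quote, as in href="/test/1.html", would not match with this pattern.

```php
<?php
// Hypothetical helper (not from the answer): prefix whitespace-delimited,
// root-relative .html URLs in a text with the server URL.
function absolutize_text(string $text, string $server_url): string
{
    // The pattern from the answer, with ~ as delimiter so / needs no escaping.
    $pattern = '~(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*~';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($server_url): string {
            // $m[0] is the whole matched relative URL.
            return $server_url . $m[0];
        },
        $text
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

$server_url = 'https://www.example.com'; // value assumed from the question
$text = 'See /test/1.html and http://www.example.de/test/xxx.html here';

echo absolutize_text($text, $server_url) . PHP_EOL;
```

Absolute URLs are left alone here because the / before test in them is preceded by a non-whitespace character, which the lookbehind rejects.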