Convert given relative urls to absolute urls

Question

I need to convert a few relative URLs in a given HTML text to absolute URLs.

The HTML text contains a mix of relative and absolute URLs, and I need the resulting HTML to contain only absolute URLs, following these rules:

The original HTML text contains a mix of relative and absolute URLs.
/test/1.html needs to be converted into https://www.example.com/test/1.html
Instances that are already absolute URLs (both .com and .de) should be ignored, such as http://www.example.com/test/xxx.html , https://www.example.com/test/xxx.html , https://www.example.de/test/xxx.html , http://www.example.de/test/xxx.html

I believe preg_replace is the best way to do this, since I am using PHP, and I tried the following code.

$server_url = "https://www.example.com";
$html = preg_replace('@(?<!https://www\.example\.com)(?<!http://www\.example\.com)(?<!https://www\.example\.de)(?<!http://www\.example\.de)/test@iU', $server_url.'/test', $html);

However, this doesn't give the desired result; instead, it converts every /test link, including the existing absolute URLs. So some URLs end up like http://www.example.dehttp://www.example.com/test/xxx.html .

I'm not good at regex, so please help me find a proper regex to get the desired results.

Answer 1

This should match root-relative URLs:

^(\/[^\/]{1}.*\.html)$

And the URL you want will be available in $1

https://regex101.com/r/E1evez/2

<?php
$urls = [
    '/test/1.html',
    'http://www.example.com/test/xxx.html',
    'https://www.example.de/test/xxx.html',
    '/relative/path/file.html'
];

foreach( $urls as $url )
{
    if( preg_match( '/^(\/[^\/]{1}.*\.html)$/', $url ) )
    {
        echo 'match: '.$url.PHP_EOL;
    }
    else
    {
        echo 'no match: '.$url.PHP_EOL;
    }
}

Outputs:

match: /test/1.html
no match: http://www.example.com/test/xxx.html
no match: https://www.example.de/test/xxx.html
match: /relative/path/file.html
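
Building on the check above, here is a minimal sketch of the conversion step itself. It assumes each URL is already available as a separate string (the ^ and $ anchors mean the pattern tests a whole string, not URLs embedded in a larger text) and that the base URL is the https://www.example.com value from the question. The variable names $serverUrl and $absolute are only illustrative.

```php
<?php
// Assumed base URL, taken from the question.
$serverUrl = 'https://www.example.com';

$urls = [
    '/test/1.html',
    'http://www.example.com/test/xxx.html',
    'https://www.example.de/test/xxx.html',
    '/relative/path/file.html',
];

// Prefix only the strings that the root-relative pattern accepts;
// everything else is returned unchanged.
$absolute = array_map(
    function ($url) use ($serverUrl) {
        return preg_match('/^(\/[^\/]{1}.*\.html)$/', $url)
            ? $serverUrl . $url
            : $url;
    },
    $urls
);

print_r($absolute);
```

With the sample list, the two root-relative entries get the prefix and the two absolute ones are left alone.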

Answer 2

If all the urls start with a forward slash, you might use:

(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*

Explanation

(?<!\S) Assert that what is directly to the left is not a non-whitespace character
(?:/[^/\s]+)+ Repeat 1+ times: match / , then 1+ characters other than / or whitespace, using a negated character class
/\S+ Match / and 1+ non-whitespace characters
\.html\S* Match .html as in the example data, then 0+ non-whitespace characters
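
The breakdown above can be sketched as a preg_replace call, with the same pattern written in free-spacing mode (the x modifier) so that each part carries its comment. This is only an illustration under some assumptions: the base URL is the one from the question, the sample text is made up, and the URLs are separated by whitespace. Because of the (?<!\S) lookbehind, a URL that directly follows a quote, as in href="/test/1.html", would not be matched by this pattern.

```php
<?php
// Assumed base URL, taken from the question.
$serverUrl = 'https://www.example.com';

// Same pattern as above, in free-spacing mode so it can be commented.
$pattern = '~
    (?<!\S)          # no non-whitespace character directly to the left
    (?:/[^/\s]+)+    # 1+ path segments: a slash, then chars other than slash or whitespace
    /\S+             # a slash followed by 1+ non-whitespace characters
    \.html\S*        # the literal .html, then 0+ non-whitespace characters
~x';

// Hypothetical sample text with whitespace-separated URLs.
$text = "/test/1.html http://www.example.de/test/xxx.html\n/relative/path/file.html";

// $0 is the whole match, so the base URL is simply put in front of it.
$result = preg_replace($pattern, $serverUrl . '$0', $text);

echo $result, PHP_EOL;
```

Only the two URLs that start with a slash at a whitespace boundary are prefixed; the absolute .de URL is left as it is.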


If you also want to match /1.html , you could change the quantifier from )+ to )*

To match more extensions than .html , you might specify what you would allow to match, like \.(?:html|jpg|png) , or perhaps use a character class \.[\w-()] and add what you would allow to match.
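
As a rough sketch of both variations combined, the pattern below uses )* so that a single-segment path such as /1.html is matched, and an alternation for the extensions. The extension list, the sample text and the variable names are assumptions for illustration, and the URLs are again expected to be separated by whitespace.

```php
<?php
// Assumed base URL, taken from the question.
$serverUrl = 'https://www.example.com';

// )* instead of )+ allows a path with a single segment such as /1.html,
// and the alternation accepts .html, .jpg and .png.
$pattern = '~(?<!\S)(?:/[^/\s]+)*/\S+\.(?:html|jpg|png)\S*~';

// Hypothetical sample text.
$text = '/1.html /img/logo.png https://www.example.com/test/a.jpg';

$converted = preg_replace($pattern, $serverUrl . '$0', $text);

echo $converted, PHP_EOL;
```

The first two URLs are prefixed with the base URL, and the third one stays unchanged because it is already absolute.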
