php regex to get string inside href tag

Question

I need a regex that will give me the string inside an href attribute, without the surrounding quotes.

For example, I need to extract theurltoget.com from the following:

<a href="theurltoget.com">URL</a>

Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html I only want http://www.mydomain.com/

Answer 1

Don't use a regex for this. You can use XPath and built-in PHP functions to get what you want:

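    $xml = simplexml_load_string($myHtml);  // $myHtml must be well-formed XML/XHTML
    $list = $xml->xpath("//@href");         // every href attribute in the document

    $preparedUrls = array();
    foreach($list as $item) {
        $item = parse_url((string) $item);  // cast the SimpleXMLElement to a plain string
        // relative links have no scheme/host, so skip them instead of raising notices
        if (isset($item['scheme'], $item['host'])) {
            $preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
        }
    }
    print_r($preparedUrls);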
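If $myHtml is ordinary HTML rather than well-formed XML, simplexml_load_string() returns false and the xpath() call fails. A minimal sketch of the same XPath idea using PHP's bundled DOM extension, which accepts real-world HTML; the sample markup is made up for illustration, and relative links are simply skipped:

```php
<?php
// Hypothetical sample input: not valid XML (unclosed <p>, bare <br>)
$myHtml = '<p>Links:<br><a href="http://www.mydomain.com/page.html">one</a>'
        . '<a href="https://example.org/x/y">two</a><a href="relative.html">three</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // keep HTML parser warnings quiet
$doc->loadHTML($myHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$preparedUrls = array();
foreach ($xpath->query('//a/@href') as $attr) {
    // $attr is a DOMAttr; ->value is the raw href string
    $parts = parse_url($attr->value);
    // relative links have no scheme/host, so there is no base URL to build
    if (isset($parts['scheme'], $parts['host'])) {
        $preparedUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
    }
}
print_r($preparedUrls);
```

The XPath query and the parse_url() step are the same as above; only the parser changes.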

Answer 2

    $html = '<a href="http://www.mydomain.com/page.html">URL</a>';

    $url = preg_match('/<a href="(.+)">/', $html, $match);

    $info = parse_url($match[1]);

    echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com

Answer 3

This expression handles three cases:

no quotes
double quotes
single quotes

    '/href=["\']?([^"\'>]+)["\']?/'
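To check the three cases listed above, here is a small sketch that runs this pattern, written as a valid PHP single-quoted string, against one hypothetical tag per case:

```php
<?php
// Optional quote, then everything up to the next quote or '>' (group 1), then optional quote
$pattern = '/href=["\']?([^"\'>]+)["\']?/';

$samples = array(
    '<a href=theurltoget.com>URL</a>',    // no quotes
    '<a href="theurltoget.com">URL</a>',  // double quotes
    "<a href='theurltoget.com'>URL</a>",  // single quotes
);

$found = array();
foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        $found[] = $m[1];  // capture group 1 is the URL without quotes
    }
}
print_r($found);
```

All three samples should yield theurltoget.com.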

Answer 4

Use the answer by @Alec if you're only looking for the base URL part (the second part of @David's question).

    $html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
    $url = preg_match('/<a href="(.+)">/', $html, $match);
    $info = parse_url($match[1]);

This will give you:

    $info
    Array
    (
        [scheme] => http
        [host] => www.mydomain.com
        [path] => /page.html" class="myclass" rel="myrel
    )

So you can use

    $href = $info["scheme"] . "://" . $info["host"];

which gives you:

    // http://www.mydomain.com

If you want the entire URL inside the href attribute, you should use a different regex, for instance the one provided by @user2520237:

    $html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
    $url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
    $info = parse_url($match[1]);

This will give you:

    $info
    Array
    (
        [scheme] => http
        [host] => www.mydomain.com
        [path] => /page.html
    )

Now you can use

    $href = $info["scheme"] . "://" . $info["host"] . $info["path"];

which gives you:

    // http://www.mydomain.com/page.html

Answer 5

http://www.the-art-of-web.com/php/parse-links/

Let's start with the simplest case: a well-formatted link with no extra attributes:

    /<a href=\"([^\"]*)\">(.*)<\/a>/iU
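A sketch of applying this pattern with preg_match_all() to collect every link's href and text; the sample HTML is hypothetical, and PREG_SET_ORDER groups the captures per match:

```php
<?php
$html = '<p><a href="theurltoget.com">URL</a> and '
      . '<A HREF="http://www.mydomain.com/page.html">Page</A></p>';

// i = case-insensitive (so <A HREF=...> matches), U = ungreedy (so (.*) stops at the first </a>)
$pattern = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    // $m[1] is the href value, $m[2] is the link text
    $links[$m[1]] = $m[2];
}
print_r($links);
```

Because the pattern requires `">` right after the href value, links with further attributes after href are not matched; that is the "no extra attributes" assumption above.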

Answer 6

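To replace every href value:

    function replaceHref($html, $replaceStr)
    {
        $match = array();
        // Capture each double-quoted href value inside an <a> tag; [^"]+ stops at
        // the closing quote, whereas (.+) would run on to the last quote on the line
        if (preg_match_all('/<a [^>]*href="([^"]+)"/', $html, $match))
        {
            // $match[1] holds the captured URLs ($match[0] the full matches), so loop
            // over it rather than up to count($match), which is always 2 here
            foreach (array_unique($match[1]) as $href)
            {
                $html = str_replace($href, $replaceStr.urlencode($href), $html);
            }
        }
        return $html;
    }
    $html        = '<a href="http://www.mydomain.com/page.html">URL</a>'; // sample input
    $replaceStr  = "http://affilate.domain.com?cam=1&url=";
    $replaceHtml = replaceHref($html, $replaceStr);

    echo $replaceHtml;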
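str_replace() swaps a URL everywhere it occurs in the page, including in link text or inside longer URLs. A hedged alternative, assuming double-quoted href values, rewrites only the attribute itself via preg_replace_callback(); the function name replaceHrefCallback and the sample HTML are hypothetical:

```php
<?php
function replaceHrefCallback($html, $replaceStr)
{
    // Group 1: from "<a" up to and including href="   Group 2: the URL itself
    return preg_replace_callback(
        '/(<a\s[^>]*href=")([^"]+)"/i',
        function ($m) use ($replaceStr) {
            // Rebuild: tag prefix + prefixed, encoded URL + closing quote
            return $m[1] . $replaceStr . urlencode($m[2]) . '"';
        },
        $html
    );
}

$html = '<a href="http://www.mydomain.com/">Home</a> '
      . '<a class="x" href="http://www.mydomain.com/page.html">http://www.mydomain.com/</a>';
$replaceStr = "http://affilate.domain.com?cam=1&url=";

$result = replaceHrefCallback($html, $replaceStr);
echo $result;
```

Here the link text http://www.mydomain.com/ in the second tag is left untouched, which a plain str_replace() of that URL would not do.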

Answer 7

This also handles the case where there are no quotes around the URL:

    /<a [^>]*href="?([^">]+)"?>/

But seriously, do not parse HTML with a regex. Use DOM or a proper parsing library.

Answer 8

Because positive lookbehind and lookahead are cool:

    /(?<=href=\").+(?=\")/

It matches only what you want, without the quotation marks:

    Array ( [0] => theurltoget.com )
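With the greedy .+, a tag that has another quoted attribute after href would match too much: on the sample below it would return theurltoget.com" class="myclass. A sketch of the same lookaround idea with [^"]+ instead, so the match stops at the first closing quote (the sample tag is hypothetical):

```php
<?php
// Lookbehind for href=" and lookahead for the closing quote, as in the answer,
// but [^"]+ cannot run past the first closing quote
$pattern = '/(?<=href=")[^"]+(?=")/';

$html = '<a href="theurltoget.com" class="myclass">URL</a>';
preg_match($pattern, $html, $m);
print_r($m);  // the whole match ($m[0]) is the URL, no capture group needed
```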

Answer 9

    #href="(https?://[^/"]*)#

I think you should be able to handle the rest.
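A runnable sketch of this approach, assuming double-quoted href values. It uses # as the pattern delimiter so the slashes in :// need no escaping, and excludes " from the host part so a URL without a path still stops at the closing quote. Appending the trailing slash is an assumption based on the question's desired output:

```php
<?php
// Group 1: scheme + "://" + host, stopping at the first "/" or closing quote
$pattern = '#href="(https?://[^/"]*)#';

$samples = array(
    '<a href="http://www.mydomain.com/page.html">URL</a>',
    '<a href="https://www.mydomain.com">URL</a>',  // no path, no trailing slash
);

$bases = array();
foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        $bases[] = $m[1] . '/';  // add the trailing slash the question asks for
    }
}
print_r($bases);
```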


9 answers

solution1 17 2010-10-22 23:04:30
solution2 12 2010-10-22 22:17:15
solution3 6 2013-08-02 14:55:24
solution4 5 2013-08-14 07:54:49
solution5 4 2010-10-22 22:15:16
solution6 3 2012-08-10 05:33:54
solution7 0 2010-10-22 22:14:35
solution8 -1 2014-05-12 03:59:53
solution9 -1 2010-10-22 22:12:43
