
Using Regex to insert a domain name into a URL

I am pulling in text from a database that is formatted like the sample below. I want to insert the domain name in front of every URL within this block of text.

<p>We recommend you check out the article 
<a id="navitem" href="/article/why-apples-new-iphones-may-delight-and-worry-it-pros/" target="_top">
Why Apple's new iPhones may delight and worry IT pros</a> to learn more</p>

So with the example above in mind I want to insert http://www.mydomainname.com/ into the URL so it reads:

href="http://www.mydomainname.com/article/why-apples-new-iphones-may-delight-and-worry-it-pros/"

I figured I could use regex to replace href=" with href="http://www.mydomainname.com, but this is not working as I intended. Any suggestions, or is there a better method I should try?

var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(), 
              "^href=\"$", "href=\"https://www.mydomainname.com/");

You could use regex...

...but it's very much the wrong tool for the job.

Uri has some handy constructors/factory methods for just this purpose:

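static Uri ConvertHref(Uri sourcePageUri, string href)
{
    // Could simply be: return new Uri(sourcePageUri, href);
    // but TryCreate reports failure without throwing, which gives more options.
    Uri newAbsUri;
    if (Uri.TryCreate(sourcePageUri, href, out newAbsUri))
    {
        return newAbsUri;
    }

    // Say which href could not be resolved instead of throwing a bare Exception.
    throw new UriFormatException("Could not resolve href: " + href);
}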

So, say sourcePageUri is

var sourcePageUri = new Uri("https://somehost/some/page");

then the output of our method for a few different values of href is:

https://www.foo.com/woo/har => https://www.foo.com/woo/har
/woo/har                    => https://somehost/woo/har
woo/har                     => https://somehost/some/woo/har

...so it's the same interpretation as the browser makes. Perfect, no?
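
To apply this to a whole block of HTML rather than a single href, one option is to let a regex only locate the attribute values and leave the resolution to Uri. The following is a minimal sketch, not a definitive implementation: the names HrefRewriter and MakeHrefsAbsolute are hypothetical, and it assumes every href value is double-quoted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRewriter
{
    // Matches href="..." with a double-quoted value. Single-quoted or
    // unquoted attribute values are not handled by this sketch.
    private static readonly Regex HrefPattern =
        new Regex("(?<prefix>href\\s*=\\s*\")(?<url>[^\"]*)\"", RegexOptions.IgnoreCase);

    public static string MakeHrefsAbsolute(string html, Uri baseUri)
    {
        return HrefPattern.Replace(html, match =>
        {
            Uri resolved;
            // Leave the attribute untouched if the value cannot be resolved.
            if (!Uri.TryCreate(baseUri, match.Groups["url"].Value, out resolved))
            {
                return match.Value;
            }
            return match.Groups["prefix"].Value + resolved.AbsoluteUri + "\"";
        });
    }
}
```

Calling HrefRewriter.MakeHrefsAbsolute(content, new Uri("http://www.mydomainname.com/")) should rewrite root-relative links such as /article/... while leaving links that are already absolute unchanged, because Uri resolves each value against the base the same way ConvertHref does.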

Try this code, which matches href=" only when the URL starts with a slash:

var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(), 
              "(href=[ \t]*\")/", "$1https://www.mydomainname.com/");

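The pattern in the question, ^href="$, is anchored at both ends, so it can only match when the input is exactly href=" and nothing else; that is why nothing was replaced. The sketch below contrasts the two patterns on the sample markup. The class and method names are hypothetical, and the (?!/) lookahead is an added assumption so that protocol-relative URLs such as //cdn.example.com/x are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRegexDemo
{
    public const string Sample =
        "<a id=\"navitem\" href=\"/article/why-apples-new-iphones-may-delight-and-worry-it-pros/\" target=\"_top\">";

    // The question's pattern: ^ and $ require the whole input to be href="
    // so it never matches inside real markup.
    public static string Anchored(string html)
    {
        return Regex.Replace(html, "^href=\"$", "href=\"https://www.mydomainname.com/");
    }

    // Unanchored pattern: captures href=" and consumes the leading slash,
    // skipping values that start with two slashes.
    public static string Unanchored(string html)
    {
        return Regex.Replace(html, "(href=[ \\t]*\")/(?!/)", "$1https://www.mydomainname.com/");
    }
}
```

HrefRegexDemo.Anchored(HrefRegexDemo.Sample) returns the sample unchanged, while HrefRegexDemo.Unanchored(HrefRegexDemo.Sample) inserts the domain in front of /article/.
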
Use an HTML parser, such as CsQuery.

var html = "your html text here";
var path = "http://www.mydomainname.com";

CQ dom = html;
CQ links = dom["a"];

foreach (var link in links)
    link.SetAttribute("href", path + link["href"]);

html = dom.Html();
