How to convert unclickable plain text URLs to links in HTML source

I want to detect URLs in HTML code and turn them into links. I've searched Stack Overflow, but most answers are about detecting and converting links in plain text strings. When I apply those to HTML, the markup becomes invalid: for example, img sources get changed.

PS to close voters: please read the question carefully. It's not a duplicate.

For example;例如; the line 1 needs to be converted, and lines 2 & 3 do not.第 1 行需要转换,第 2 行和第 3 行不需要。

<!-- Sample html source -->
<div>
   Line 1 : https://www.google.com/
   Line 2 : <a href="https://www.google.com/">https://www.google.com/</a>
   Line 3: <img src="http://a-domain.com/lovely-image.jpg">
</div>

I need to:

  1. Find every URL in the HTML body.

  2. Check whether it should be left alone: skip it if it is already wrapped by 'a', or if it sits inside 'img', '<!--', etc.

  3. If it is not already clickable, make it clickable by wrapping it with 'a'.

How can I do that? Either a C# or a JS solution is fine.

LATEST UPDATE: Changing the project's build target from .NET Framework 4.7.2 to 4.5 and back to 4.7.2 fixed the "bug".

UPDATE: This is my solution, with help from @jira. The problem is that the nodes don't change at all. According to the debugger, the recursive function does its job and replaces the links, but the HTML document isn't updated. No modification made inside the function has any effect outside of it, and I don't know why: InnerText changes, but InnerHtml doesn't.

var htmlVersion = "<html><head></head><body>\r\n"
   + "Some text\r\n"
   + "<div>http://google.com</div>\r\n"
   + " Then later more text: http://500px.com\r\n"
   + "<div>Sub <span>abc</span> Back text</div>\r\n"
   + "And the final text"
   + "</body></html>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlVersion);

// Linkify body
var modified = false;
var bodyNode = doc.DocumentNode.SelectSingleNode("//body"); 
var before = bodyNode.InnerHtml;
bodyNode = Linkify(bodyNode);
modified = modified || bodyNode.InnerHtml != before;
// modified is false !!!

The recursive Linkify function:

HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
{
    if (node.Name == "a") // It's already a link
    {
        return node;
    }

    if (node.Name == "#text") // Do replacement here
    {

        // Create links
        // https://stackoverflow.com/a/4750468/627193
        node.InnerHtml = Regex.Replace(node.InnerHtml,
            @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?)",
            "<a target='_blank' href='$1'>$1</a>");

    }

    for (int i = 0; i < node.ChildNodes.Count; i++) // Go for child nodes
    {
        node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
    }
    return node;
}

Use an HTML parser such as HtmlAgilityPack. Select only the text nodes that are not already inside an 'a' element, then search for links in them. That way you won't touch URLs in attributes such as href or src, or text that is already a link. Depending on how precise you need to be, you can use a regex for the search.

For example:

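using System.Text.RegularExpressions;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(html); // html: the page source as a string
Regex r = new Regex(@"(https?://[^\s]+)");

// Only text nodes that are not already inside a link; without the predicate
// the text of an existing <a> (line 2 of the sample) would be wrapped again.
var textNodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");

if (textNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var textNode in textNodes) {
        var text = textNode.GetDirectInnerText();
        var withLinks = r.Replace(text, "<a href=\"$1\">$1</a>");
        textNode.InnerHtml = withLinks;
    }
}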


A regex that matches links correctly can get quite complicated; see the other answers here on SO for more thorough patterns.
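As an illustration of the kind of refinement meant here: the simple pattern https?://[^\s]+ also swallows a sentence-ending period, as in "http://500px.com.". The following is a minimal sketch of a slightly stricter helper for the raw HTML text of a single text node, assuming C# 9 or later (top-level statements). The name LinkifyText, the set of stop characters, and the trailing-punctuation heuristic are illustrative assumptions, not part of the answer, and the pattern is still far from a complete URL grammar.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack. The URL runs from the scheme
// up to the next whitespace, '<', '>', or quote character. Entities such as &amp;
// are already encoded in a text node's raw HTML, so the match can go into href as-is.
var urlPattern = new Regex(@"\b(?:https?|ftp)://[^\s<>""']+", RegexOptions.IgnoreCase);

string LinkifyText(string rawText)
{
    return urlPattern.Replace(rawText, match =>
    {
        string url = match.Value;
        string trailing = "";

        // Heuristic: punctuation at the very end usually ends the sentence,
        // so keep it outside the link ("see http://a.com." -> link + ".").
        while (url.Length > 0 && ".,:!?".IndexOf(url[url.Length - 1]) >= 0)
        {
            trailing = url[url.Length - 1] + trailing;
            url = url.Substring(0, url.Length - 1);
        }

        // Nothing left after the scheme: leave the original text untouched.
        if (url.EndsWith("://"))
        {
            return match.Value;
        }

        return $"<a href=\"{url}\">{url}</a>{trailing}";
    });
}
```

In the loop above, a helper like this could stand in for r.Replace. Keeping ';' out of the punctuation list is deliberate, so that a URL ending in an encoded entity such as &amp; is not cut in half.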

Changing the project's build target from .NET Framework 4.7.2 to 4.5 and then back to 4.7.2 fixed the "bug".

Here is the working code:

using System.Text.RegularExpressions; // for Regex, used inside Linkify

var htmlVersion = "<html><head></head><body>\r\n"
   + "Some text\r\n"
   + "<div>http://google.com</div>\r\n"
   + " Then later more text: http://500px.com\r\n"
   + "<div>Sub <span>abc</span> Back text</div>\r\n"
   + "And the final text"
   + "</body></html>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlVersion);

// Linkify body
var modified = false;
var bodyNode = doc.DocumentNode.SelectSingleNode("//body"); 
var before = bodyNode.InnerHtml;
bodyNode = Linkify(bodyNode);
modified = modified || bodyNode.InnerHtml != before;

The recursive Linkify function:

HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
{
    if (node == null || node.Name == "a") // It's already a link
    {
        return node;
    }

    if (node.Name == "#text") // Do replacement here
    {

        // Create links
        // https://stackoverflow.com/a/4750468/627193
        node.InnerHtml = Regex.Replace(node.InnerHtml,
            @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?)",
            "<a target='_blank' href='$1'>$1</a>");

    }

    for (int i = 0; i < node.ChildNodes.Count; i++) // Go for child nodes
    {
        node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
    }
    return node;
}

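Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.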

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM