
URL parsing in JavaScript and the DOM

I am writing a support chat application in which I want URLs in the message text to be detected and turned into links. I have found answers to similar questions, but nothing that covers the following.

What I have:

function ReplaceUrlToAnchors(text) {
    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    return text.replace(exp,"<a href='$1' target='_blank'>$1</a>"); 
}

That pattern is a modified version of one I found on the internet. It includes www. in the first group because not all URLs start with protocol://. However, with that change www.google.com is replaced with

<a href='www.google.com' target='_blank'>www.google.com</a>

which pulls up MySite.com/webchat/www.google.com, and I get a 404.

That is my first problem. My second is this:

In my script for appending messages to the log, I am forced to do it in a hacky way:

var last = 0;
function UpdateChatWindow(msgArray) {

    var chat = $get("MessageLog");
    for (var i = 0; i < msgArray.length; i++) {
        var element = document.createElement("div");
        var linkified = ReplaceUrlToAnchors(msgArray[i]);
        element.setAttribute("id", last.toString());
        element.innerHTML = linkified;
        chat.appendChild(element);
        last = last + 1;
    }
}

To get the "linkified" string to render as HTML, I have to use the non-standard .innerHTML property of the element. I would prefer a way where I could parse the string into tokens - text tokens and anchor tokens - and call either createTextNode or createElement("a") and stitch them together with the DOM.

So question 1 is: how should I go about parsing www.site.com, or even site.com? And question 2 is: how could I do this using only the DOM?

Another thing you could do is this:

function ReplaceUrlToAnchors(text) {
    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    return text.replace(exp, function(_, url) {
      // Prefix bare "www." matches with a protocol so the href is not
      // resolved as a path relative to the current page.
      return '<a href="' +
        (/^www\./i.test(url) ? "http://" + url : url) +
        '" target="_blank">' +
        url +
        '</a>';
    });
}

That is kind of like your solution, but it does the check for "www" URLs in the callback passed to ".replace()".

Note that you won't be picking up "stackoverflow.com" or "newegg.com" or anything like that, which I understand may be unavoidable (and even desirable, given the false positives you'd pick up).
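For the second question (DOM only), one possible approach is to split the text into text tokens and link tokens with the same pattern, then build the nodes with createTextNode and createElement("a"). The following is only a sketch: the helper names tokenizeUrls and appendLinkified are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: split text into plain-text tokens and link tokens.
function tokenizeUrls(text) {
    var exp = /\b(?:https?:\/\/|ftp:\/\/|file:\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ig;
    var tokens = [];
    var lastIndex = 0;
    var match;
    while ((match = exp.exec(text)) !== null) {
        // Any text between the previous match and this one is a text token.
        if (match.index > lastIndex) {
            tokens.push({ type: "text", value: text.slice(lastIndex, match.index) });
        }
        var url = match[0];
        tokens.push({
            type: "link",
            value: url,
            // Bare "www." matches get a protocol so the href is absolute.
            href: /^www\./i.test(url) ? "http://" + url : url
        });
        lastIndex = match.index + url.length;
    }
    // Trailing text after the last match, if any.
    if (lastIndex < text.length) {
        tokens.push({ type: "text", value: text.slice(lastIndex) });
    }
    return tokens;
}

// Hypothetical helper: append the tokens to a parent element using DOM calls only.
function appendLinkified(parent, text) {
    var doc = parent.ownerDocument;
    tokenizeUrls(text).forEach(function (token) {
        if (token.type === "link") {
            var a = doc.createElement("a");
            a.setAttribute("href", token.href);
            a.setAttribute("target", "_blank");
            a.appendChild(doc.createTextNode(token.value));
            parent.appendChild(a);
        } else {
            parent.appendChild(doc.createTextNode(token.value));
        }
    });
}
```

In UpdateChatWindow, the innerHTML assignment could then presumably be replaced by a call such as appendLinkified(element, msgArray[i]). Because the message text only ever reaches the page through createTextNode, any markup typed by a chat user is shown as text rather than interpreted as HTML.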

Here is what I came up with; perhaps someone has something better?

function replaceUrlToAnchors(text) {
    var naked = /(\b(www.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|](.com|.net|.org|.co.uk|.ca|.))/ig;
    text = text.replace(naked, "http://$1");

    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/)([-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]))/ig;
    return text.replace(exp,"<a href='$1' target='_blank'>$3</a>"); 
}

The first regex replaces www.google.com with http://www.google.com and is good enough for what I am doing. However, I will hold off on marking this as the answer because I would also like to make (www.) optional, but when I change it to (www.)? it replaces every word with http://word/.
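A likely reason every word matches once (www.) is optional: the dots in the pattern are unescaped, so they match any character, and the final "|." alternative in the suffix group accepts any character as the ending. With nothing mandatory left to anchor the pattern, almost any word qualifies. One possible way around this, sketched below under assumptions, is a single pass that escapes the dots and only accepts a bare domain when it ends in a known suffix. The function name linkifyWithNakedDomains and the short suffix list are illustrative only, not taken from the original post.

```javascript
// Hypothetical single-pass variant: a bare domain only counts as a URL
// when it ends in one of a few known suffixes (an illustrative list).
function linkifyWithNakedDomains(text) {
    // Alternative 1: URLs with an explicit protocol.
    // Alternative 2: optional "www.", a host name, a known suffix,
    //                and an optional path.
    var exp = /\b((?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]|(?:www\.)?[-A-Z0-9]+(?:\.[-A-Z0-9]+)*\.(?:com|net|org|co\.uk|ca)\b(?:\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])?)/ig;
    return text.replace(exp, function (_, url) {
        // Add a protocol only when the match does not already have one.
        var href = /^[a-z]+:\/\//i.test(url) ? url : "http://" + url;
        return '<a href="' + href + '" target="_blank">' + url + '</a>';
    });
}
```

Doing both cases in one pass avoids matching the www.google.com inside an existing http://www.google.com a second time. False positives are still possible (for example, the domain part of an email address), so the suffix list and pattern would need tuning for real chat traffic.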
