How can I write a JavaScript regular expression to replace hyperlinks in the format [*](*) with HTML hyperlinks?

I need to parse text containing links in the following formats:

[html title](http://www.htmlpage.com)
http://www.htmlpage.com
http://i.imgur.com/OgQ9Uaf.jpg

The output for those three strings would be:

<a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>
<a href='http://i.imgur.com/OgQ9Uaf.jpg'>http://i.imgur.com/OgQ9Uaf.jpg</a>

The string could include an arbitrary number of these links, e.g.:

[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com

Output:

<a href='http://www.htmlpage.com'>html title</a><a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a>    <a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a> wejwelfj <a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>

I have an extremely long function that does an alright job by passing over the string 3 times, but I can't successfully parse this string:

[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something.

For brevity, I'll post the regular expressions I've tried rather than the entire find/replace function:

var matchArray2 = inString.match(/\[.*\]\(.*\)/g);

This is meant to match [*](*), but it doesn't work: the greedy .* runs across adjacent links, so []()[]() is matched as a single link.
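The difference between greedy and non-greedy quantifiers can be seen in a small sketch. The sample input and variable names below are made up for illustration; only the greedy pattern comes from the question.

```javascript
// Two adjacent Markdown-style links, as in the failing case above.
const input = "[a](http://x.com)[b](http://y.com)";

// Greedy: .* grabs as much as it can, so one match spans both links.
const greedy = input.match(/\[.*\]\(.*\)/g);

// Non-greedy: .*? stops at the first closing bracket/parenthesis that works.
const lazy = input.match(/\[.*?\]\(.*?\)/g);

console.log(greedy.length, greedy);
console.log(lazy.length, lazy);
```

The greedy version returns a single element covering the whole input, while the non-greedy version returns one element per link.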

Really that's it, I guess. Once I make that match, I search it for ( ) and [ ] to parse out the link and the link text and build the <a href> tag. I delete matches from a temp string so I don't match them again when I do my second pass to find plain hyperlinks:

var plainLinkArray = tempString2.match(/http\S*:\/\/\S*/g);

I'm not parsing any HTML with regex. I'm parsing a string and attempting to output HTML.

Edit: I added the requirement that it parse the third link, http://i.imgur.com/OgQ9Uaf.jpg, after the fact.

My final solution (based on @Cerbrus's answer):

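function parseAndHandleHyperlinks(inString)
{
    // First pass: convert [title](url) links.
    var result = inString.replace(/\[(.+?)\]\((https?:\/\/.+?)\)/g, '<a href="$2">$1</a>');
    // Second pass: convert bare URLs that follow a space or start the string.
    return result.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');
}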
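For comparison, both link styles can also be handled in a single pass by using alternation and a replacer function. This is only a sketch under some assumptions: linkify is a hypothetical name, a bare URL is taken to run up to the next whitespace, and nothing is HTML-escaped.

```javascript
// Hypothetical helper: converts [title](url) links and bare URLs in one pass.
function linkify(text) {
  // Alternative 1: [title](url). Alternative 2: a bare URL up to the next whitespace.
  const pattern = /\[(.+?)\]\((https?:\/\/.+?)\)|(https?:\/\/\S+)/g;
  return text.replace(pattern, (match, title, mdUrl, bareUrl) =>
    bareUrl !== undefined
      ? `<a href="${bareUrl}">${bareUrl}</a>`   // bare URL: the URL is also the link text
      : `<a href="${mdUrl}">${title}</a>`       // Markdown-style link
  );
}

console.log(linkify("[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com"));
```

Because the engine scans left to right, a [title](url) link is consumed by the first alternative before the bare-URL alternative gets a chance to match the URL inside it. A bare URL followed directly by punctuation would include that punctuation, so the boundary may need tightening for real input.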

Try this regex:

/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g

var s = "[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com";

s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>');

Regex Explanation:

# /                   - Regex Start
# \[                  - a `[` character (escaped)
# (.+?)               - Followed by one or more characters (any except line breaks), grouped, non-greedy, so it won't match past:
# \]                  - a `]` character (escaped)
# \(                  - Followed by a `(` character (escaped)
# (https?:\/\/
#   [a-zA-Z0-9/.(]+?) - Followed by a grouped string that starts with `http://` or `https://` and continues with letters, digits, `/`, `.` or `(`, non-greedy
# \)                  - Followed by a `)` character (escaped)
# /g                  - End of the regex, search globally.
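To see what each capturing group holds, the matches can be inspected directly. A minimal sketch, assuming an environment with String.prototype.matchAll (ES2020); the sample text and variable names are illustrative.

```javascript
const pattern = /\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g;
const text = "[html title](http://www.htmlpage.com) and [other](https://example.org/a.b)";

// Each match is an array: [0] whole match, [1] link text, [2] URL.
const matches = [...text.matchAll(pattern)];

for (const m of matches) {
  console.log({ whole: m[0], title: m[1], url: m[2] });
}
```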

The two strings inside the [] and () are now captured and substituted into the following replacement string, where $1 is the link text and $2 is the URL:

'<a href="$2">$1</a>';

This works for your "problematic" string:

var s = "[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something."
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

// Result:

'<a href="http://i.imgur.com/iIlhrEu.jpg">This</a> one got me crying first, then once the floodgates were opened <a href="http://i.imgur.com/IwSNFVD.jpg">this</a> one did it again and <a href="http://i.imgur.com/hxIwPKJ.jpg">this</a>. Ugh, feels. Gotta go hug someone/something.'

Some more examples with "incorrect" input:

var s = "[Th][][is](http://x.com)\n\
    [this](http://x(.com)\n\
    [this](http://x).com)"
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

//   "<a href="http://x.com">Th][][is</a>
//    <a href="http://x(.com">this</a>
//    <a href="http://x">this</a>.com)"

You can't really blame the last line for breaking, since there's no way to know whether the user meant to end the URL there or not.

To catch loose URLs, add this:

.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');

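The (?: |^) part matches either a space character or the start of the string, so a URL at the very beginning of the string is also matched. Without the m flag, ^ does not match at the start of later lines. Note that the replacement always emits a leading space, even when the match was at the start of the string.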
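How ^ behaves depends on the m flag, which matters when the input has several lines. The sketch below compares the two; the sample text and variable names are made up, and the leading space is captured and re-emitted via $1 instead of being hard-coded, which is a small variation on the replacement above.

```javascript
const text = "http://a.com first line\nhttp://b.com second line, then http://c.com";

// Without the m flag, ^ matches only at the very start of the input,
// so the URL that begins the second line is left untouched.
const withoutM = text.replace(/( |^)(https?:\/\/[a-zA-Z0-9/.(]+)/g, '$1<a href="$2">$2</a>');

// With the m flag, ^ also matches right after each line break.
const withM = text.replace(/( |^)(https?:\/\/[a-zA-Z0-9/.(]+)/gm, '$1<a href="$2">$2</a>');

console.log(withoutM);
console.log(withM);
```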

str.replace(/\[(.*?)\]\((.*?)\)/gi, '<a href="$2">$1</a>');

This assumes that there are no errant brackets in the string and no parentheses in the URL.

Then:

str.replace(/(\s|^)(https?:\/\/.*?)(?=\s|$)/gi, '$1<a href="$2">$2</a>')

This matches an "http"-like URL that is preceded by whitespace or the start of the string, so it is never one immediately preceded by a " (which would have just been added by the previous replacement). Feel free to use a better expression if you have one, of course.

EDIT: I edited the answer because I did not realize that JS did not have lookbehind syntax. Instead, the expression now matches any whitespace or the beginning of the string before a plain http link. The captured whitespace has to be put back (hence the $1). The lookahead at the end ensures that everything up to the next whitespace (or the end of the string) is captured. If whitespace is not a good boundary for you, you will have to come up with a better one.
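Lookbehind was later added to JavaScript in ES2018, so on engines that support it the "not preceded by a quote" idea can be written directly. The sketch below puts the capture-and-restore approach next to a lookbehind variant; the sample string and variable names are assumptions for illustration.

```javascript
// Output of the first pass plus one bare URL.
const afterFirstPass = '<a href="http://a.com">title</a> see http://b.com now';

// Approach from the answer: consume leading whitespace (or the string start)
// and put it back with $1; the lookahead stops the URL at the next whitespace.
const viaCapture = afterFirstPass.replace(
  /(\s|^)(https?:\/\/.*?)(?=\s|$)/gi,
  '$1<a href="$2">$2</a>'
);

// Lookbehind variant (ES2018+): the URL must not be preceded by a double quote.
const viaLookbehind = afterFirstPass.replace(
  /(?<!")(https?:\/\/\S+)/gi,
  '<a href="$1">$1</a>'
);

console.log(viaCapture);
console.log(viaLookbehind);
```

Both produce the same result on this input. The lookbehind variant would still re-link a URL that appears as link text (preceded by > rather than "), so it is not a drop-in replacement for every case.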

It seems that you are trying to convert Markdown syntax to HTML. Markdown does not yet have a specification for its syntax (I am referring to a grammar, not a behavior specification), so you will be walking around blindfolded, patching in fixes for behavior you don't want along the way, all while reinventing the wheel. I would recommend that you use an existing implementation rather than coding one yourself. For example, Pagedown is a JS implementation of Markdown that is currently used on Stack Overflow.

If you still want a regex solution, below is my attempt. Note that I don't know whether it will play well with other Markdown features as you progress (if you add any at all).

/\[((?:[^\[\]\\]|\\.)+)\]\((https?:\/\/(?:[-A-Z0-9+&@#\/%=~_|\[\]](?= *\))|[-A-Z0-9+&@#\/%?=~_|\[\]!:,.;](?! *\))|\([-A-Z0-9+&@#\/%?=~_|\[\]!:,.;(]*\))+) *\)/i

The regex above should capture some part of Pagedown's behavior for the [description](url) style of linking (the title is not supported). I'm not confident it captures everything, since the Pagedown source code is too complex to read in one go. The regex is a mix of 2 different regexes used in the Pagedown source code.

Some features:

  • Capturing group 1 contains the text inside [] and capturing group 2 contains the URL.
  • Allows escaping of [ and ] inside the text part [] by using \, e.g. [a\[1\]](http://link.com). You need to do a bit of extra processing, though.
  • Allows 1 level of () inside the link, which is very useful in cases like this: [String.valueOf](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#valueOf(double))
  • Allows spaces after the link and before the ).

This regex does not take bare links into account.
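The listed features can be exercised with a small sketch. parseLink is a hypothetical helper, and the unescaping step is only one guess at the "extra processing" mentioned for escaped brackets.

```javascript
const linkPattern = /\[((?:[^\[\]\\]|\\.)+)\]\((https?:\/\/(?:[-A-Z0-9+&@#\/%=~_|\[\]](?= *\))|[-A-Z0-9+&@#\/%?=~_|\[\]!:,.;](?! *\))|\([-A-Z0-9+&@#\/%?=~_|\[\]!:,.;(]*\))+) *\)/i;

// Hypothetical helper: returns the link text (backslash escapes removed)
// and the URL of the first link found, or null when nothing matches.
function parseLink(source) {
  const m = linkPattern.exec(source);
  if (m === null) return null;
  return { text: m[1].replace(/\\(.)/g, "$1"), url: m[2] };
}

// Escaped brackets in the text part (the JS literal doubles each backslash).
console.log(parseLink("[a\\[1\\]](http://link.com)"));
// One level of parentheses inside the URL.
console.log(parseLink("[String.valueOf](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#valueOf(double))"));
// Spaces between the URL and the closing parenthesis.
console.log(parseLink("[spaced](http://link.com  )"));
```

The pattern has no g flag, so exec always starts from the beginning of the string; add g if it is used with replace over a whole document.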
