简体   繁体   中英

How can I write a javascript regular expression to replace hyperlinks in this format [*](*) with html hyperlinks?

I need the parse text with links in the following formats:

[html title](http://www.htmlpage.com)
http://www.htmlpage.com
http://i.imgur.com/OgQ9Uaf.jpg

The output for those two strings would be:

<a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>
<a href='http://i.imgur.com/OgQ9Uaf.jpg'>http://i.imgur.com/OgQ9Uaf.jpg</a>

The string could include an arbitrary amount of these links, ie:

[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com

output:

<a href='http://www.htmlpage.com'>html title</a><a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a>    <a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a> wejwelfj <a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>

I have an extremely long function that does an alright job by passing over the string 3 times, but I can't successfully parse this string:

[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something.

For brevity, I'll post the regular expressions I've tried rather than the entire find/replace function:

var matchArray2 = inString.match(/\[.*\]\(.*\)/g);

for matching [*](*) , doesn't work because []()[]() is matched

Really that's it, I guess. Once I make that match I search that match for ( ) and [ ] to parse out the link an link text and build the href tag. I delete matches from a temp string so I don't match them when I do my second pass to find plain hyperlinks:

var plainLinkArray = tempString2.match(/http\S*:\/\/\S*/g);

I'm not parsing any html with regex. I'm parsing a string and attempting to output html.

edit: I added the requirement that it parse the third link http://i.imgur.com/OgQ9Uaf.jpg after the fact.

my final solution (based on @Cerbrus's answer):

function parseAndHandleHyperlinks(inString)
{
    var result = inString.replace(/\[(.+?)\]\((https?:\/\/.+?)\)/g, '<a href="$2">$1</a>');
    return result.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');     
}

Try this regex:

/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g

var s = "[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com";

string.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>');

Regex Explanation:

# /                   - Regex Start
# \[                  - a `[` character (escaped)
# (.+?)               - Followed by any amount of words, grouped, non-greedy, so it won't match past:
# \]                  - a `]` character (escaped)
# \(                  - Followed by a `(` character (escaped)
# (https?:\/\/
#   [a-zA-Z0-9/.(]+?) - Followed by a string that starts with `http://` or `https://`
# \)                  - Followed by a `)` character (escaped)
# /g                  - End of the regex, search globally.

Now the 2 strings in the () / [] are captured, and placed in the following string:

'<a href="$2">$1</a>';

This works for your "problematic" string:

var s = "[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something."
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

// Result:

'<a href="http://i.imgur.com/iIlhrEu.jpg">This</a> one got me crying first, then once the floodgates were opened <a href="http://i.imgur.com/IwSNFVD.jpg">this</a> one did it again and <a href="http://i.imgur.com/hxIwPKJ.jpg">this</a>. Ugh, feels. Gotta go hug someone/something.'

Some more examples with "Incorrect" input:

var s = "[Th][][is](http://x.com)\n\
    [this](http://x(.com)\n\
    [this](http://x).com)"
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

//   "<a href="http://x.com">Th][][is</a>
//    <a href="http://x(.com">this</a>
//    <a href="http://x">this</a>.com)"

You can't really blame the last line for breaking, since there's no way to know if the user meant to stop the url there, or not.

To catch loose urls, add this:

.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');

The (?: |^) bit catches a String start or space character, so it'll also match lines starting with a url.

str.replace(/\[(.*?)\]\((.*?)\)/gi, '<a href="$2">$1</a>');

This assumes that there are no errant brackets in the string or parentheses in the URL.

Then:

str.replace(/(\s|^)(https?:\/\/.*?)(?=\s|$)/gi, '$1<a href="$2">$2</a>')

This matches an "http"-like URL that is not immediately preceded by a " (which would have just been added by the previous replacement). Feel free to use a better expression if you have it, of course.

EDIT: I edited the answer because I did not realize that JS did not have lookbehind syntax. Instead, you can see that the expression matches any space or the beginning of the line to match plain http links. The captured space has to be put back (hence the $1 ). A lookahead at the end is done to ensure that everything up to the next space (or end of the expression) is captured. If space is not a good boundary for you, you will have to come up with a better one.

It seems that you are trying to convert Markdown syntax to HTML. Markdown syntax has yet to have a specification (I am referring to grammar, not behavior specification) for it, so you are going to walk around blindfolded and try to incorporate bug fixes for behavior that you don't want along the way, all of that while reinventing the wheel. I would recommend that you use an existing implementation rather than coding one yourself. For example, Pagedown is a JS implementation of Markdown that is currently used in StackOverflow.

If you still want a regex solution, below is my attempt. Note that I don't know whether it will play well with other features of Markdown as you progress (if you do at all).

/\[((?:[^\[\]\\]|\\.)+)\]\((https?:\/\/(?:[-A-Z0-9+&@#\/%=~_|\[\]](?= *\))|[-A-Z0-9+&@#\/%?=~_|\[\]!:,.;](?! *\))|\([-A-Z0-9+&@#\/%?=~_|\[\]!:,.;(]*\))+) *\)/i

The regex above should capture some part (I'm not confident it captures everything, the source code of Pagedown is too complex to read in one go) of the behavior of Pagedown for [description](url) style of linking (title is not supported). The regex above is mixed from 2 different regex used in the Pagedown source code.

Some features:

  • Capturing group 1 contains text inside [] and capturing group 2 contains the URL.
  • Allow escaping of [ and ] inside the text part [] , by using \\ eg [a\\[1\\]](http://link.com) . You need to do a bit of extra processing, though.
  • Allow 1 level of () inside link, very useful in cases like this: [String.valueOf](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#valueOf(double))
  • Allow space after the link and before the ) .

I don't take into account the bare link in this regex.

Reference:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM