
regex to match all keywords in a string

Being a regex noob, I need some support from the community.

Let's say I have this string str:

  1. www.anysite.com hello demo try this link

  2. anysite.com indeed demo link

  3. http://www.anysite.com another one

  4. www.anysite.com

  5. http://anysite.com

Consider lines 1-5 together as the whole string str.

I want to convert every 'anysite.com' into a clickable HTML link, for which I am using:

str =  str.replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '<a href="$1" target="_blank">$1</a>');

This converts every space-separated word starting with http/https/ftp into a link of the form

<a href="url" target="_blank">url</a>
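A minimal sketch to check which lines this first replace touches. It assumes the five sample lines are joined with newlines (the 1-5 numbering above is only list formatting); `str` and `step1` are illustrative names, and the regex is copied unchanged from above.

```javascript
// Assumption: the five sample lines joined with "\n".
const str = [
  "www.anysite.com hello demo try this link",
  "anysite.com indeed demo link",
  "http://www.anysite.com another one",
  "www.anysite.com",
  "http://anysite.com",
].join("\n");

// The question's first regex: an http/https/ftp URL, rejected when the
// negative lookahead sees it is followed by "...>" (i.e. already inside a tag).
const protocolUrl =
  /((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g;

const step1 = str.replace(protocolUrl, '<a href="$1" target="_blank">$1</a>');
console.log(step1);
```

Only lines 3 and 5 change; the bare www.anysite.com (lines 1 and 4) and anysite.com (line 2) are left as plain text.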

So lines 3 and 5 are converted correctly. Next, to convert every www.anysite.com into a link, I used:

str = str.replace(/(\b^(http|https|ftp)?(www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig, '<a href="https://$1" target="_blank">$1</a>');

However, it only converts www.anysite.com into a link when it appears at the very beginning of str. So it converts line 1 but not line 4.

Note that I used ^(http|https|ftp)?(www\.) to find all www not starting with http/https/ftp, since the http ones have already been converted.

Also, what would the regex be for the link on line 2, which starts with neither http nor www but ends with .com?

For reference, try posting this whole string to your Facebook timeline: it converts all five lines into links. See the snapshot:

[Image: snapshot]

Thanks for the help. The final regex that worked for me is:

// strip every http:// and https:// prefix
str = str.replace(/(http|https):\/\//ig, "");

// turn every string ending in .com or .in (and only those) into a link
str = str.replace(/((www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.(com|in))/ig, '<a href="//$1" target="_blank">$1</a>');
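A small sketch wrapping these two replace calls in a function and running them on the five sample lines (assumed to be joined with newlines). `linkifyComIn` is a hypothetical name; the two regexes are the ones above, unchanged.

```javascript
// Hypothetical helper: the two replace calls from the final solution, in order.
function linkifyComIn(input) {
  return input
    // 1) strip http:// and https:// (ftp:// is not touched)
    .replace(/(http|https):\/\//gi, "")
    // 2) link anything ending in .com or .in, with an optional leading www.
    .replace(
      /((www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.(com|in))/gi,
      '<a href="//$1" target="_blank">$1</a>'
    );
}

// Assumption: the five sample lines joined with "\n".
const str = [
  "www.anysite.com hello demo try this link",
  "anysite.com indeed demo link",
  "http://www.anysite.com another one",
  "www.anysite.com",
  "http://anysite.com",
].join("\n");

const linked = linkifyComIn(str);
console.log(linked);
```

All five lines end up linked, including the bare anysite.com on line 2. The href starts with `//`, a protocol-relative URL, so the browser reuses the current page's scheme; the original http/https of lines 3 and 5 is discarded by the first step.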

I used .com and .in for my specific requirement; otherwise, the solution at http://regexr.com/39i0i will work.

There is still an issue, though: it doesn't convert shortened URLs into links properly. For example, http://s.ly/qhdfTyuiOP is only linked up to s.ly.

Any further suggestions?

^(http|https|ftp)?(www\.) does not mean "all www not starting with http/https/ftp" but rather "the start of the string, followed by an optional http/https/ftp, followed by www.".

In this context, ^ isn't a negation but an anchor matching the start of the string. I suppose you used it this way because of its meaning at the start of a character class ([^...]); it is tricky because its meaning changes depending on where it appears.

You could simply remove the ^ and you should be fine: I see no point in making sure the string does not start with http/https/ftp, since you transformed those occurrences just before and there should be none left.
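To illustrate the two meanings of ^, here is a small sketch; the test strings are made up for the example.

```javascript
// Outside a character class, ^ is an anchor: "www." must be at the very start.
const anchored = /^www\./;
// As the first character inside [...], ^ negates the class: exactly one
// character, anything except "/", must come right before "www.".
const negatedClass = /[^\/]www\./;

console.log(anchored.test("www.anysite.com"));            // true
console.log(anchored.test("see www.anysite.com"));        // false
console.log(negatedClass.test("see www.anysite.com"));    // true (space before www)
console.log(negatedClass.test("http://www.anysite.com")); // false ("/" before www)
console.log(negatedClass.test("www.anysite.com"));        // false (no character before www)
```

The last line shows a side effect of the class form: [^/] has to consume a real character, so it cannot match a www. at the very start of the string.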
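A sketch comparing the question's second regex with the same pattern minus the ^, reporting the (1-based) line of each match. It assumes the five sample lines joined with newlines; `withCaret`, `withoutCaret`, `lineOf` and `hits` are illustrative names.

```javascript
// Assumption: the five sample lines joined with "\n".
const str = [
  "www.anysite.com hello demo try this link",
  "anysite.com indeed demo link",
  "http://www.anysite.com another one",
  "www.anysite.com",
  "http://anysite.com",
].join("\n");

// The question's second regex, and the same pattern with only ^ removed.
const withCaret =
  /(\b^(http|https|ftp)?(www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;
const withoutCaret =
  /(\b(http|https|ftp)?(www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

// 1-based line number of a character index in s.
const lineOf = (s, index) => s.slice(0, index).split("\n").length;
// Line numbers of every match of re in str.
const hits = (re) => [...str.matchAll(re)].map((m) => lineOf(str, m.index));

console.log(hits(withCaret));    // [ 1 ]
console.log(hits(withoutCaret)); // [ 1, 3, 4 ]
```

Line 4 is now found, but so is the www. inside http://www.anysite.com on line 3. Once the first replace has run, that line holds the generated link, whose href and link text both contain www.anysite.com, so those two occurrences would match as well and still need to be excluded.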


Edit: I mentioned lookbehind but forgot it wasn't available in JS at the time. (Lookbehind assertions were added to JavaScript in ES2018, so current engines support them.)

If you want some kind of negation, the easiest way is a negative lookbehind:

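(?<!(?:https?|ftp):\/\/)www\.

This matches "www." only when it is not directly preceded by http://, https:// or ftp://. (Checking for just http, https or ftp would not help here, because in a URL the www. is preceded by "://".)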
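A sketch of the two-step conversion with such a lookbehind, assuming an ES2018+ engine (for example a current Node.js or browser) and the five sample lines joined with newlines. Step 1 is the question's first regex; step 2 is the question's www. pattern with the ^ replaced by the negative lookbehind. `bareWww` and `linked` are illustrative names.

```javascript
// Assumption: the five sample lines joined with "\n".
const str = [
  "www.anysite.com hello demo try this link",
  "anysite.com indeed demo link",
  "http://www.anysite.com another one",
  "www.anysite.com",
  "http://anysite.com",
].join("\n");

// Step 1: the question's regex for http/https/ftp URLs.
const protocolUrl =
  /((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g;

// Step 2: a bare www. URL, skipped when it directly follows a scheme and "://"
// (which also skips the www. inside the links produced by step 1).
// Requires lookbehind support (ES2018+).
const bareWww =
  /(?<!(?:https?|ftp):\/\/)(www\.[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

const linked = str
  .replace(protocolUrl, '<a href="$1" target="_blank">$1</a>')
  .replace(bareWww, '<a href="https://$1" target="_blank">$1</a>');

console.log(linked);
```

Lines 1 and 4 get an https:// link, lines 3 and 5 keep the links from step 1 without nesting, and line 2 (the bare anysite.com) is untouched, since neither pattern covers a domain without a scheme or www.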
