
Regex (JS): Match URLs without a protocol, ignore URLs that have one

Apologies for yet another regex URL matching question, but I haven't been able to find a solution in any of the other threads.

I want to run a replace() method on a string, with a pattern that matches all URLs without a protocol (http, https, etc.) but ignores URLs that do have one.

So given this input:

www.google.com www.facebook.com
http://www.google.com
http://www.facebook.com

It would match www.google.com and www.facebook.com on the first line (without any surrounding whitespace), but ignore the other URLs on the second and third line.

I thought about just looking for www and ignoring matches which have // as preceding characters, which led me to this:

https://www.regex101.com/r/Y3rqxy/1

However, as you can see, the second match includes the preceding whitespace. Since I want to replace www with http://www, this whitespace buggers things up a little.
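To make the whitespace problem concrete, here is a minimal sketch. The pattern behind the regex101 link is not reproduced in this post, so `pattern` below is only an assumed stand-in that behaves the way described: it excludes a preceding slash by consuming whatever character comes before www. The variable names are hypothetical too.

```javascript
// Assumed stand-in for the linked pattern (the real one is not shown in the post).
const input = 'www.google.com www.facebook.com http://www.google.com';

// (^|[^\/]) consumes the character before "www", so the second match
// carries the leading space along with it.
const pattern = /(^|[^\/])www/gi;

const matches = input.match(pattern);
console.log(matches);

// One possible workaround: put the captured character back with $1.
const fixed = input.replace(pattern, '$1http://www');
console.log(fixed);
```

With a pattern of this shape, the captured character has to be re-inserted by hand in the replacement, which is the awkwardness the question is asking how to avoid.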

Any regex mandarins able to help me out on this one?

Mere seconds after posting this, one of my colleagues came up with a solution. It's a little wacky (thanks, JavaScript) but it works! This example assumes you want to add http:// to any URLs that are missing their protocol.

First you have to reverse the string you're running the .replace() method on:

string.split('').reverse().join('')

Then you can call the replace method with the following regex (note the backwards http://www!):

string.replace(/www(?!\/\/)/gi, 'www//:ptth')

Then you just reverse your string again:

string.split('').reverse().join('')

And any URLs in that string that were missing a protocol will now have one.

It's not going to win any awards for cleanliness, but it works!
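Putting the three steps together, a minimal sketch might look like the following. The helper names `reverse` and `addMissingProtocol` and the sample string are made up for illustration. Note that strings are immutable in JavaScript, so each step's return value has to be kept.

```javascript
// Hypothetical helper: reverse a string character by character.
// split('') works on UTF-16 code units, so this assumes plain ASCII input.
const reverse = (s) => s.split('').reverse().join('');

// Hypothetical wrapper around the reverse / replace / reverse steps.
function addMissingProtocol(text) {
  const reversed = reverse(text);
  // In the reversed text, "http://www" reads "www//:ptth", so a negative
  // lookahead for "//" skips any www that already has a protocol.
  const patched = reversed.replace(/www(?!\/\/)/gi, 'www//:ptth');
  return reverse(patched);
}

const sample = 'www.google.com www.facebook.com http://www.google.com';
const result = addMissingProtocol(sample);
console.log(result);
```

The reversal turns the "not preceded by //" check into a "not followed by //" check, which a lookahead can express. In engines that support lookbehind assertions (added in ES2018), a pattern such as /(?<!\/\/)www/gi should allow the same replacement without reversing the string.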

