
How to match links without a top-level domain using regex?

I use the following regex (an updated version of the linkify regex) to match links without matching email addresses.

(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)

However, this regex does not find URLs whose host has no top-level domain, such as: https://someurl:3333/view/something

Can you please help me with this? Thanks!

This should be the "least modified" version of your expression that also matches domains without a top-level domain:

(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)
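A quick way to check the change is to run both expressions against the URL from the question. The sketch below assumes Python's re module, since the question does not say which regex engine is in use; the names ORIGINAL, MODIFIED and url are illustrative only.

```python
import re

# Both patterns are copied verbatim from the question and the answer.
# Python is an assumption here; the question does not name an engine.
ORIGINAL = r'(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)'
MODIFIED = r'(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)'

url = "https://someurl:3333/view/something"

original_match = re.search(ORIGINAL, url)
modified_match = re.search(MODIFIED, url)

# The original requires at least one "label." followed by a TLD, so it fails here.
print(original_match)            # None

# In the modified pattern the host, port and path land in groups 4, 5 and 6.
print(modified_match.group(4))   # someurl
print(modified_match.group(5))   # :3333
print(modified_match.group(6))   # /view/something
```

Other engines may number or escape things slightly differently, so treat this as a sanity check and not as a full test suite.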

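The part that changed was the capture group that grabs the domain (the fourth capture group in the full expression, after the leading boundary, the optional user@ prefix, and the scheme). It went from: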

(
 (?:
  (?:
   [a-zA-Z0-9]
   [a-zA-Z0-9_%\-_+]*
   \.
  )+                  (?# this is how they repeated for optional subdomains)
 )
)
(?:
 [a-zA-Z]{2,}         (?# here is the mandatory TLD)
)

To this:

(
 (?:
  [a-zA-Z0-9]
  [a-zA-Z0-9_%\-_+.]* (?# the . is in the character class here for subdomains)
 )
 (?:
  \.
  [a-zA-Z]{2,}
 )?                   (?# this TLD is optional)
)
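To see the effect of the change in isolation, the two domain fragments above can be compiled on their own and tried against a few host names. This is only a sketch in Python (an assumption, as the question names no engine); OLD_DOMAIN, NEW_DOMAIN and the sample hosts are made up for illustration.

```python
import re

# Domain sub-patterns lifted from the breakdown above. Testing them alone
# with fullmatch is an illustration, not something the answer itself does.
OLD_DOMAIN = re.compile(r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+)(?:[a-zA-Z]{2,})')
NEW_DOMAIN = re.compile(r'(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?')

# Hypothetical sample hosts: one without a TLD, two ordinary ones,
# and one malformed name with consecutive dots.
hosts = ["someurl", "example.com", "sub.example.com", "a..b"]

results = {
    host: (bool(OLD_DOMAIN.fullmatch(host)), bool(NEW_DOMAIN.fullmatch(host)))
    for host in hosts
}

for host, (old_ok, new_ok) in results.items():
    print(f"{host!r}: old={old_ok} new={new_ok}")
# 'someurl': old=False new=True
# 'example.com': old=True new=True
# 'sub.example.com': old=True new=True
# 'a..b': old=False new=True
```

The last row shows the trade-off of the minimal change: with the dot inside the character class, the new fragment also accepts consecutive dots, which the original label-by-label repetition rejected.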

