Regex for finding custom URLs

I want to create a regex that matches URLs that start with http://, https:// or //, or that have an extension other than html, htm, php or php3. Query strings are optional.

Let's say that I want to find these:

http://example.com
/example.mp3
/example.mp3?q=example
http://example.com/example.mp3
#example

And to reject these:

example
/example
/example/
/example.htm
/example.htm?q=example
/example.mp3/example // the .mp3 must be the final extension for the URL to be accepted
/example#example

I already tried this /(^(http:\/\/|https:\/\/|\/\/|#)|(.*)((.*)\.^(?!html|htm|php|php3)$)(\?.*)?$)/igm but it didn't work.

If the opposite (swapping the accepted and rejected lists) is easier to do, that would be appreciated too; I can change the function that handles the regex.

It seems you may use

^(?:#.+|(?:https?:/)?/[^?#\n]*\.(?!(?:html?|php3?)\b)\w+(?:\?.*)?)$


Pattern details:

  • ^ - start of string
  • (?:#.+ - either a # followed by 1 or more characters
  • | - or
  • (?:https?:/)?/[^?#\n]*\.(?!(?:html?|php3?)\b)\w+(?:\?.*)?) -
    • (?:https?:/)?/ - an optional http:/ or https:/ and then /
    • [^?#\n]* - 0 or more characters other than ?, # and a newline
    • \. - a dot
    • (?!(?:html?|php3?)\b)\w+ - 1 or more letters, digits or underscores that do not spell exactly htm, html, php or php3
    • (?:\?.*)?) - an optional ? followed by 0 or more characters, then the closing parenthesis of the outer group
  • $ - end of string
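
As a quick sanity check, the pattern can be run against the sample URLs from the question. The sketch below assumes Python's re module, since the question does not name a language (the /igm flags hint at JavaScript, where the pattern should behave the same way for these inputs). The helper name is_custom_url is illustrative and not part of the original answer.

```python
import re

# Pattern from the answer, unchanged. IGNORECASE mirrors the "i" flag
# used in the question's own attempt (an assumption about intent).
URL_PATTERN = re.compile(
    r"^(?:#.+|(?:https?:/)?/[^?#\n]*\.(?!(?:html?|php3?)\b)\w+(?:\?.*)?)$",
    re.IGNORECASE,
)

# Sample inputs copied from the question.
SHOULD_MATCH = [
    "http://example.com",
    "/example.mp3",
    "/example.mp3?q=example",
    "http://example.com/example.mp3",
    "#example",
]
SHOULD_REJECT = [
    "example",
    "/example",
    "/example/",
    "/example.htm",
    "/example.htm?q=example",
    "/example.mp3/example",
    "/example#example",
]


def is_custom_url(candidate: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return URL_PATTERN.match(candidate) is not None


# Every accepted sample must match and every rejected sample must not.
for url in SHOULD_MATCH:
    assert is_custom_url(url), url
for url in SHOULD_REJECT:
    assert not is_custom_url(url), url
```

The \b inside the lookahead is what keeps an extension such as phps acceptable: it only rules out htm, html, php and php3 when they make up the whole extension.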
