
Regex matching a string pattern and number (URL format)

I have a string that follows this URL pattern:

https://www.examples.org/activity-group/8712/activity/202803
// note: the ending of the URL can vary
https://www.examples.org/activity-group/8712/activity/202803?ref=bla
https://www.examples.org/activity-group/8712/activity/202803/something

I'm trying to write a regex that matches

https://www.examples.org/activity-group/{number}/activity/{number}*

Where {number} is an integer of length 1 to 10.

How can I define a regex that checks the string pattern and also checks that each number is at the right position in the string?

Background: in a Google Form, I want to validate an answer by requiring people to enter a URL in this format, hence this regular expression.

For URLs that do not match that format, the regex should not match (return false). For example: https://www.notthesite.org/group/8712/activity/astring

I went through several examples, but they match only if the number is present in the string.

Example sources:

^https:\/\/www\.examples\.org\/activity-group\/[0-9]{1,10}\/activity\/[0-9]{1,10}(\/[a-z]+)*((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)*$

  • ^ - start of string
  • \ - escape character
  • [0-9] - a digit
  • {1,10} - between one and ten of the previous item
  • (\/[a-z]+)* - allow additional URL segments
  • ((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)* - allow query parameters, with the first parameter introduced by ? and all others by &
  • $ - end of string

This is assuming the URL segment and query parameter keys are lowercase letters only. The query parameter values can be lowercase letters, uppercase letters, or digits.
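As a quick sanity check, here is a minimal sketch that runs the pattern above against the URLs from the question. It uses Python's `re` module, which is an assumption on my part (the question targets Google Forms, whose regex engine may differ in details); `URL_RE` and `is_valid` are names made up for this example.

```python
import re

# The pattern from the answer above. Python tolerates the escaped
# slashes (\/) and ampersand (\&), so it is copied as-is.
URL_RE = re.compile(
    r"^https:\/\/www\.examples\.org\/activity-group\/[0-9]{1,10}"
    r"\/activity\/[0-9]{1,10}"
    r"(\/[a-z]+)*"
    r"((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)*$"
)


def is_valid(url: str) -> bool:
    """Return True if the whole URL fits the expected format."""
    return URL_RE.match(url) is not None


samples = [
    "https://www.examples.org/activity-group/8712/activity/202803",
    "https://www.examples.org/activity-group/8712/activity/202803?ref=bla",
    "https://www.examples.org/activity-group/8712/activity/202803/something",
    "https://www.notthesite.org/group/8712/activity/astring",
]

for url in samples:
    print(is_valid(url), url)
```

The first three URLs are accepted and the last one is rejected. A number with more than ten digits is rejected too, because `{1,10}` must be followed directly by a `/`, a `?`, or the end of the string.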

You could use

https?:\/\/(?:[^/]+\/){2}(\d+)\/[^/]+\/(\d+)

See a demo on regex101.com.


Broken down, this says:

 https?:\/\/        # http:// or https://
 (?:[^/]+\/){2}     # not "/", followed by "/", twice
 (\d+)              # 1+ digits
 \/[^/]+\/          # same pattern as above
 (\d+)              # the other number

You'll need to use groups 1 and 2, respectively.
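To illustrate what "use groups 1 and 2" means in practice, here is a small sketch in Python (my choice of language, not something the question specifies). `PERMISSIVE_RE` and `extract_ids` are hypothetical names, and the slashes are left unescaped because Python does not need them escaped.

```python
import re

# Same idea as the pattern above: scheme, two "/"-terminated segments,
# a number, one more segment, then the second number.
PERMISSIVE_RE = re.compile(r"https?://(?:[^/]+/){2}(\d+)/[^/]+/(\d+)")


def extract_ids(url: str):
    """Return (first_number, second_number), or None when nothing matches."""
    m = PERMISSIVE_RE.search(url)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(extract_ids("https://www.examples.org/activity-group/8712/activity/202803?ref=bla"))
print(extract_ids("https://www.notthesite.org/group/8712/activity/astring"))
print(extract_ids("https://www.notthesite.org/group/8712/activity/5"))
```

The first call yields `(8712, 202803)` and the second yields `None`. The third yields `(8712, 5)` even though the domain and path names are different, which shows how loose this pattern is.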


If this is too permissive, use

 https:\/\/[^/]+\/activity-group\/(\d+)\/activity\/(\d+)

Which reads

 https:\/\/[^/]+       # https:// + some domain name
 \/activity-group\/    # /activity-group/
 (\d+)                 # first number
 \/activity\/          # /activity/
 (\d+)                 # second number

See another demo on regex101.com .
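If you want to keep the commented breakdown next to the pattern itself, some engines offer a verbose mode. The sketch below uses Python's `re.VERBOSE` flag, an assumption on my part since Google Forms may not support anything similar; `STRICT_RE` and `find_ids` are illustrative names.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts
# a comment, so the annotated breakdown can be compiled directly.
STRICT_RE = re.compile(
    r"""
    https://[^/]+        # https:// + some domain name
    /activity-group/     # /activity-group/
    (\d+)                # first number
    /activity/           # /activity/
    (\d+)                # second number
    """,
    re.VERBOSE,
)


def find_ids(url: str):
    """Return the two captured numbers as strings, or None."""
    m = STRICT_RE.search(url)
    return m.groups() if m else None


print(find_ids("https://www.examples.org/activity-group/8712/activity/202803/something"))
print(find_ids("https://www.notthesite.org/group/8712/activity/astring"))
```

The first call returns `('8712', '202803')` and the second returns `None`. This version still accepts any domain name and does not limit the length of the numbers.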

Probably you need something like:

(http[s]?:\/\/)?www\.examples\.org\/activity-group\/(\d{1,10})\/activity\/(\d{1,10})(\S*)$

Where:

  • (http[s]?:\/\/)? matches an optional http:// or https:// part.
  • www\.examples\.org is your domain name, with the dots escaped so they match only a literal dot.
  • (\d{1,10}) matches the first integer, with a maximum length of 10 (after activity-group).
  • The second (\d{1,10}) matches the second integer, after activity.
  • And finally (\S*)$ matches any optional data after the second number up to the end of the line, assuming that you use the multiline flag (m).

Check it at http://regexr.com/3h448
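One thing worth checking with a pattern of this shape is how the trailing group interacts with the second number, since a tail that can match digits may trade digits with `(\d{1,10})`. The sketch below uses Python's `re` module (an assumption; the question targets Google Forms) and compares three hypothetical tails: `(\S+?)`, `(\S*)`, and a tail that must start with `/`, `?` or `#`. All names in it are made up for illustration.

```python
import re

PREFIX = (
    r"^(?:https?://)?www\.examples\.org"
    r"/activity-group/(\d{1,10})/activity/(\d{1,10})"
)

LAZY_PLUS = re.compile(PREFIX + r"(\S+?)$")      # tail needs at least one character
STAR = re.compile(PREFIX + r"(\S*)$")            # tail may be empty
DELIMITED = re.compile(PREFIX + r"([/?#]\S*)?$") # tail must start with / ? or #


def groups(pattern, url):
    """Return the captured groups, or None when the URL does not match."""
    m = pattern.match(url)
    return m.groups() if m else None


bare = "https://www.examples.org/activity-group/8712/activity/202803"
too_long = "https://www.examples.org/activity-group/8712/activity/12345678901"

print(groups(LAZY_PLUS, bare))      # ('8712', '20280', '3')
print(groups(STAR, bare))           # ('8712', '202803', '')
print(groups(STAR, too_long))       # ('8712', '1234567890', '1')
print(groups(DELIMITED, bare))      # ('8712', '202803', None)
print(groups(DELIMITED, too_long))  # None
```

With `(\S+?)` the tail takes the last digit of the number when nothing follows it, and with `(\S*)` an eleven-digit number still gets through because the extra digit lands in the tail. Requiring the tail to begin with a delimiter is one way to enforce the 1 to 10 digit limit from the question.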

Hope it helps!
