Regex to pick out url part of a long string

I have a very long string, and somewhere in this string there is a URL. In this example the URL is at the beginning.

"http://localhost:1234/api/$metadata#this_entry_is_variable_and_can_exist_of_numbers_and_characters/$entity","Version":"AAAEEEIIU=""

I'm trying to write a regex in C# for this particular string that extracts the URL according to the following rules:

  1. The URL always starts with http:// or https://
  2. After the host, a port is sometimes specified, but not always
  3. After the port, there is a path, in this example /api , but it can be any characters
  4. After the path, in this example /api , there is always /$metadata
  5. After the /$metadata there is a hash sign # followed by a string of any characters
  6. The last part of the URL always ends with /$entity

This is the RegEx I have come up with so far:

(^http://\w+(\.\w+)*(:[0-9]+)?\/?(\/[.\^$metadata$(\#(\[a-zA-Z0-9)(\$(\entity$))]*).*?)

When testing this in LinqPad, the following issues occur:

  1. If the string contains more than the url, there is no match
  2. It does not strictly validate on /$metadata, it accepts /$metadata1111
  3. It does not strictly validate on /$entity, it accepts /$entity111
  4. Obviously it does not accept https:// yet.

Can anyone give me a hint on where to continue? I'm stuck.

Your regex doesn't follow the rules for constructing regular expressions, hence it doesn't produce the expected match. This is what you are trying to express:

https?://[^/]+/[^/]+/\$metadata#[^/]+/\$entity
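
As a minimal sketch of how this pattern could be used from C#, the snippet below runs it with Regex.Match against a longer string. The class name UrlFinder, the method FindUrl, and the sample input are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlFinder
{
    // Pattern from the answer above; each [^/]+ matches one slash-free segment.
    const string Pattern = @"https?://[^/]+/[^/]+/\$metadata#[^/]+/\$entity";

    // Returns the first matching URL, or null when the input contains none.
    public static string? FindUrl(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical input: the URL is embedded in surrounding text.
        string input = "\"Id\":7,\"http://localhost:1234/api/$metadata#abc123/$entity\",\"Version\":\"AAAEEEIIU=\"";
        Console.WriteLine(UrlFinder.FindUrl(input));
        // prints http://localhost:1234/api/$metadata#abc123/$entity
    }
}
```

Note that this pattern expects exactly one path segment between the host and /$metadata, and because nothing anchors the end, it would still match the leading part of a URL ending in /$entity111.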

Try this regex:

https?://[\w-]+(?:\.[\w-]+)*(?::\d+)?/.*?\$metadata#.*?\$entity\b
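
To check this pattern against the four issues from the question, a small C# sketch along these lines could be used. The class name EntityUrl, the method Extract, and the test inputs are assumptions chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class EntityUrl
{
    // Pattern from the answer above; \b rejects word characters right after $entity.
    const string Pattern =
        @"https?://[\w-]+(?:\.[\w-]+)*(?::\d+)?/.*?\$metadata#.*?\$entity\b";

    // Returns the first matching URL, or null when there is no match.
    public static string? Extract(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }
}

static class Program
{
    static void Main()
    {
        // URL followed by more text, with a port (hypothetical sample).
        Console.WriteLine(EntityUrl.Extract(
            "\"http://localhost:1234/api/$metadata#abc123/$entity\",\"Version\":\"x\""));
        // prints http://localhost:1234/api/$metadata#abc123/$entity

        // https, dotted host, no port, deeper path.
        Console.WriteLine(EntityUrl.Extract(
            "https://my-host.example.com/odata/v1/$metadata#Orders42/$entity"));
        // prints https://my-host.example.com/odata/v1/$metadata#Orders42/$entity

        // Trailing digits after $entity or $metadata: no match, so null.
        Console.WriteLine(EntityUrl.Extract("http://localhost/api/$metadata#abc/$entity111") == null);
        // prints True
        Console.WriteLine(EntityUrl.Extract("http://localhost/api/$metadata1111#abc/$entity") == null);
        // prints True
    }
}
```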

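To your questions:

  1. There is no match when the string contains more than the URL because of the ^ anchor. It matches only at the start of the input string if RegexOptions.Multiline is not set, and also at the start of every line (after newline characters) if RegexOptions.Multiline is set.

  2. The regex gets mixed up in the part where $metadata...entity$ is surrounded by []. Square brackets define a character class, which matches any single one of the listed characters rather than the literal text, so any run of those characters (digits included) is accepted.

  3. See 2.

  4. Simply make the s optional with ? , i.e. https?://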
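
The behaviour of ^ and of a character class described in points 1 and 2 can be observed with a short C# sketch like the following. The class RegexDemo, its method names, and the inputs are hypothetical and serve only to illustrate the points.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // True if "http://" is found at a position that ^ accepts.
    public static bool StartsWithHttp(string input, bool multiline) =>
        Regex.IsMatch(input, @"^http://",
            multiline ? RegexOptions.Multiline : RegexOptions.None);

    // Character class: any mix of the characters $, m, e, t, a, d.
    public static bool MatchesCharClass(string input) =>
        Regex.IsMatch(input, @"^[$metadata]+$");

    // Literal text: exactly "$metadata" (the $ sign is escaped).
    public static bool MatchesLiteral(string input) =>
        Regex.IsMatch(input, @"^\$metadata$");
}

static class Program
{
    static void Main()
    {
        // Without Multiline, ^ only matches at the very start of the input.
        Console.WriteLine(RegexDemo.StartsWithHttp("text http://host", false));  // False
        Console.WriteLine(RegexDemo.StartsWithHttp("text\nhttp://host", false)); // False
        // With Multiline, ^ also matches right after a newline.
        Console.WriteLine(RegexDemo.StartsWithHttp("text\nhttp://host", true));  // True

        // The character class accepts any combination of its members.
        Console.WriteLine(RegexDemo.MatchesCharClass("data"));      // True
        // The escaped literal accepts only the exact text.
        Console.WriteLine(RegexDemo.MatchesLiteral("data"));        // False
        Console.WriteLine(RegexDemo.MatchesLiteral("$metadata"));   // True
    }
}
```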
