
Regex to pick out the URL part of a long string

I have a very long string, and somewhere in this string there is a URL. In this example the URL is at the beginning.

"http://localhost:1234/api/$metadata#this_entry_is_variable_and_can_exist_of_numbers_and_characters/$entity","Version":"AAAEEEIIU=""

I'm trying to write a regex in C# for this particular string, to extract the URL according to the following rules:

  1. The URL always starts with http:// or https://
  2. After the host, a port is sometimes specified, but not always
  3. After the port there is a path, /api in this example, but it can be any characters
  4. After the path (/api in this example) there is always /$metadata
  5. After /$metadata there is a hash sign # followed by a string of any characters
  6. The last part of the URL always ends with /$entity

This is the regex I have come up with so far:

(^http://\w+(\.\w+)*(:[0-9]+)?\/?(\/[.\^$metadata$(\#(\[a-zA-Z0-9)(\$(\entity$))]*).*?)

When testing this in LinqPad, the following issues occur:

  1. If the string contains more than the URL, there is no match
  2. It does not strictly validate /$metadata; it also accepts /$metadata1111
  3. It does not strictly validate /$entity; it also accepts /$entity111
  4. Obviously it does not accept https:// yet.

Can anyone give me a hint on where to continue? I'm stuck.

Your regex doesn't follow the rules for constructing a regular expression, hence no match where you expect one. This is what you are trying to express:

https?://[^/]+/[^/]+/\$metadata#[^/]+/\$entity

Live demo

Try this regex:

https?://[\w-]+(?:\.[\w-]+)*(?::\d+)?/.*?\$metadata#.*?\$entity\b

Demo
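Both suggested patterns can be tried from C# with Regex.Match. The following is a minimal sketch, not a definitive implementation: the class name UrlExtractor, the method Extract, and the two constant names are made up for illustration, and the input is the sample string from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Pattern from the first answer: exactly one path segment before /$metadata.
    public const string SingleSegmentPattern =
        @"https?://[^/]+/[^/]+/\$metadata#[^/]+/\$entity";

    // Pattern from the second answer: optional port, any path,
    // and a word boundary directly after $entity.
    public const string BoundaryPattern =
        @"https?://[\w-]+(?:\.[\w-]+)*(?::\d+)?/.*?\$metadata#.*?\$entity\b";

    // Returns the first URL found anywhere in the input, or null if nothing matches.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Sample string from the question, including the surrounding quotes.
        string input =
            "\"http://localhost:1234/api/$metadata#this_entry_is_variable_and_can_exist_of_numbers_and_characters/$entity\",\"Version\":\"AAAEEEIIU=\"\"";

        Console.WriteLine(Extract(input, SingleSegmentPattern));
        Console.WriteLine(Extract(input, BoundaryPattern));
    }
}
```

On the sample input both calls return the URL without the surrounding quotes. The patterns differ on other inputs: the first requires exactly one path segment and still matches the leading part of a URL that ends in /$entity111, while the second accepts a longer path such as /api/v1 and returns no match for /$entity111 because of the trailing \b.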

To your questions:

  1. The cause is the ^ anchor. It matches only at the start of the input string if RegexOptions.Multiline is not set, and also at the start of every new line (after newline characters) if RegexOptions.Multiline is set. A URL that is preceded by any other text on the line can therefore never match.

  2. The regex gets mixed up in the part where $metadata...entity$ is surrounded by []. Square brackets define a character class, which matches any single one of the listed characters rather than the literal sequence.

  3. See 2.

  4. Simply make the s optional with ?, as in https?://
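Points 1 and 2 can be checked in isolation. The following is a small sketch with shortened, made-up inputs (the class name AnchorAndClassDemo and the test strings are illustrative only, not taken from the question):

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorAndClassDemo
{
    public static void Main()
    {
        // Hypothetical input that starts with a quote, like the string in the question.
        string input = "\"http://localhost/api\"";

        // ^ only matches at the start of the input, where the quote sits.
        Console.WriteLine(Regex.IsMatch(input, @"^http://"));   // False
        // Without the anchor the URL is found anywhere in the string.
        Console.WriteLine(Regex.IsMatch(input, @"http://"));    // True

        // [$metadata] is a set of single characters ($, m, e, t, a, d),
        // so any mix of those characters is accepted.
        Console.WriteLine(Regex.IsMatch("atem", @"^[$metadata]+$"));          // True

        // Outside a character class, \$metadata is the literal text.
        Console.WriteLine(Regex.IsMatch("$metadata", @"^\$metadata$"));       // True
        Console.WriteLine(Regex.IsMatch("$metadata1111", @"^\$metadata$"));   // False
    }
}
```

The anchors in the last three calls are there only to force a whole-string comparison for the demonstration; an extraction pattern for the question's input would leave them out.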
