
How do I include a hyphen in a hyperlink Regex?

I am trying to find links in user-entered text and convert them to hyperlinks automatically.

I am currently using the following Regex, which works well for finding hyperlinks in text:

Regex regexResolveUrl = new Regex("((http://|www\\.)([A-Z0-9.-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);

It has worked for almost all the links I have come across so far, but it fails when I want to detect links that contain a hyphen.

For example, www.abc-xyz.com is not matched by the regex above. Can anyone help me with this?

If you want - to mean a literal hyphen in a character class definition, you need to put it as the last (or first) character. So [abc-] is a character class containing 4 characters: a, b, c, and -. On the other hand, [ab-c] only contains 3 characters, not including the -, because there the - defines a range.

So, something like this (from your pattern):

[A-Z0-9.-:]

This defines 3 ranges: from A to Z, from 0 to 9, and from . (ASCII 46) to : (ASCII 58). The hyphen (ASCII 45) falls outside all three, so it is never matched. You want instead:

[A-Z0-9.:-]
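The difference between the two classes can be checked directly. The following is a minimal sketch, not the asker's actual code: the class name HyphenClassDemo, its members, and the sample inputs are made up for illustration, and the URL pattern is the one from the question with only the host character class changed.

```csharp
using System.Text.RegularExpressions;

public static class HyphenClassDemo
{
    // Host class from the question: ".-:" is read as a range from '.' (46) to ':' (58).
    public const string RangeClass = "^[A-Z0-9.-:]+$";

    // Hyphen moved to the end of the class, where it is taken literally.
    public const string LiteralClass = "^[A-Z0-9.:-]+$";

    // The question's pattern with only the host character class changed.
    public static readonly Regex FixedUrl = new Regex(
        "((http://|www\\.)([A-Z0-9.:-]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: case-insensitive test of input against pattern.
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
}
```

With these definitions, HyphenClassDemo.Matches(HyphenClassDemo.RangeClass, "abc-xyz") is false while the same call with LiteralClass is true, and HyphenClassDemo.FixedUrl.Match("visit www.abc-xyz.com today").Value is "www.abc-xyz.com". Note that the range version also lets / through, since / (ASCII 47) lies between . and :.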



Note on repetition

I noticed that you used {1,} in your pattern to denote "one-or-more of".

.NET regex (like most other flavors) supports these shorthands:

  • ? : "zero-or-one" {0,1}
  • * : "zero-or-more" {0,}
  • + : "one-or-more" {1,}

They may take some getting used to, but they're also pretty standard.
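To see that each shorthand behaves like its long form, here is a small sketch. QuantifierDemo and both of its methods are hypothetical helpers written for this note, not part of .NET.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Collects the text of every match of pattern in input, in order.
    public static string[] AllMatches(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    // True when both patterns produce exactly the same sequence of matches on input.
    public static bool SameMatches(string longForm, string shortForm, string input) =>
        AllMatches(longForm, input).SequenceEqual(AllMatches(shortForm, input));
}
```

For example, QuantifierDemo.SameMatches("a{1,}", "a+", "caaat baa") is true, and so are the comparisons of "colou{0,1}r" with "colou?r" on "color colour" and of "ab{0,}" with "ab*" on "a ab abbb".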



Note on C# @-quoted string literals

While doubling the backslashes in string literals for regex patterns is the norm in e.g. Java (out of necessity), in C# you actually have the option to use @-quoted string literals.

That is, the two strings in each of these pairs are identical:

"(http://|www\\.)"
@"(http://|www\.)"

"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"

Using @ can lead to more readable regex patterns because a literal backslash doesn't have to be doubled (although, on the other hand, a double quote must now in turn be doubled).
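The double-quote caveat is the easiest part to get wrong, so here is a short sketch of it alongside the pairs above. The class name VerbatimDemo, the constant names, and the quoted sample text are invented for illustration.

```csharp
public static class VerbatimDemo
{
    // Regular literal: the backslash must be doubled to reach the regex engine.
    public const string RegularPattern = "(http://|www\\.)";

    // Verbatim literal: the backslash is written once.
    public const string VerbatimPattern = @"(http://|www\.)";

    // Regular literal: a double quote is escaped with a backslash.
    public const string RegularQuoted = "say \"hi\"";

    // Verbatim literal: a double quote is written twice instead.
    public const string VerbatimQuoted = @"say ""hi""";
}
```

Both members of each pair hold the same characters, so VerbatimDemo.RegularPattern == VerbatimDemo.VerbatimPattern and VerbatimDemo.RegularQuoted == VerbatimDemo.VerbatimQuoted are both true.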


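Add the hyphen as the first or last character in the character class.

Escape the hyphen (in a regular C# string literal the backslash itself must be doubled, since "\-" is not a valid escape sequence):

Regex regexResolveUrl = new Regex("((http://|www\\.)([A-Z0-9.\\-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);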
