
How do I specify a wildcard (for ANY character) in a C# regex statement?

I'm trying to use a wildcard in C# to grab information from a webpage source, but I can't figure out what to use as the wildcard character. Nothing I've tried works!

The wildcard only needs to allow for numbers, but since the page is generated the same way every time, I may as well allow any characters.

Regex statement in use:

Regex guestbookWidgetIDregex = new Regex("GuestbookWidget(' INSERT WILDCARD HERE ', '(.*?)', 500);", RegexOptions.IgnoreCase);

If anyone can figure out what I'm doing wrong, it would be greatly appreciated!

The wildcard character is the dot (.).
To match any number of arbitrary characters, use .* (which means zero or more .) or .+ (which means one or more .).

Note that you need to escape your parentheses as \\( and \\) (or as \( and \) in an @"" string).
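Here is a minimal sketch of both points, assuming a hypothetical page snippet (the real page source isn't shown in the question). It fills the wildcard slot with .* and escapes the parentheses, once in a regular string and once in an @"" string; both literals produce the same regex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical page source; the actual markup is not given in the question.
string html = "<script>GuestbookWidget('12345', 'abc-def', 500);</script>";

// Regular string literal: "\\(" reaches the regex engine as \( (a literal parenthesis).
var regular = new Regex("GuestbookWidget\\('.*', '(.*?)', 500\\);", RegexOptions.IgnoreCase);

// Verbatim @"" literal: backslashes pass through unchanged, so a single \( is enough.
var verbatim = new Regex(@"GuestbookWidget\('.*', '(.*?)', 500\);", RegexOptions.IgnoreCase);

Match m1 = regular.Match(html);
Match m2 = verbatim.Match(html);

// Group 1 holds whatever (.*?) captured between the second pair of quotes.
Console.WriteLine(m1.Groups[1].Value); // abc-def
Console.WriteLine(m2.Groups[1].Value); // abc-def
```

Since .* is greedy, it can run across several calls if they share one line; when the slot only ever holds digits, \d* is the tighter choice.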

On the dot

In regular expressions, the dot (.) matches almost any character. In .NET, the only character it doesn't match by default is the newline character (\n). For the dot to match every character, you must enable what is called single-line mode (also known as "dot all").

In C#, this is specified using RegexOptions.Singleline. You can also embed this as (?s) in the pattern.
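A minimal sketch of the difference, using a made-up two-line string:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "a\nb";

// By default '.' matches any character except '\n', so "a.b" fails across the newline.
bool plain = Regex.IsMatch(text, "a.b");

// RegexOptions.Singleline lets '.' match '\n' as well.
bool singleline = Regex.IsMatch(text, "a.b", RegexOptions.Singleline);

// The inline (?s) flag has the same effect as RegexOptions.Singleline.
bool inlineFlag = Regex.IsMatch(text, "(?s)a.b");

Console.WriteLine($"{plain} {singleline} {inlineFlag}"); // False True True
```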


On metacharacters and escaping

The . isn't the only regex metacharacter. The full set is:

(   )   {   }   [   ]   ?   *   +   -   ^   $   .   |   \

Depending on where they appear, if you want these characters to be taken literally (e.g. . as a period), you may need to do what is called "escaping". This is done by preceding the character with a backslash (\).
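A small illustration with hypothetical file-name strings. In a regular C# string the backslash itself has to be doubled, so the escaped pattern is written "^a\\.txt$". Regex.Escape (in System.Text.RegularExpressions) can add the escapes for you when the literal text comes from elsewhere.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped, '.' is a wildcard, so the pattern also accepts "abtxt".
bool loose = Regex.IsMatch("abtxt", "^a.txt$");

// Escaped as \. (written "\\." in a regular string), it only accepts a literal period.
bool strict = Regex.IsMatch("abtxt", "^a\\.txt$");

// Regex.Escape inserts the backslashes for metacharacters in a literal string.
string escaped = Regex.Escape("a.txt");

Console.WriteLine($"{loose} {strict} {escaped}"); // True False a\.txt
```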

Of course, \ is also the escape character for C# string literals. To get a literal \, you need to double it in your string literal (i.e. "\\" is a string of length one). Alternatively, C# also has what are called @-quoted (verbatim) string literals, in which escape sequences are not processed. Thus, the following two strings are equal:

"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"

Since \ is used a lot in regular expressions, @-quoting is often used to avoid excessive doubling.
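A quick sketch confirming the equivalences described above:

```csharp
using System;

// Escape sequences are processed in the regular literal, not in the verbatim one.
string regular = "c:\\Docs\\Source\\a.txt";
string verbatim = @"c:\Docs\Source\a.txt";

Console.WriteLine(regular == verbatim); // True
Console.WriteLine("\\".Length);         // 1

// The same applies to regex patterns: both literals are the 3-character pattern \d+
Console.WriteLine("\\d+" == @"\d+");    // True
```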


On character classes

Regular expression engines let you define character classes; e.g. [aeiou] is a character class containing the five vowel letters. You can also use the - metacharacter to define a range; e.g. [0-9] is a character class containing all ten digit characters.
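A sketch of both kinds of class; the sample strings are invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// [aeiou] matches any one of the five listed vowels.
MatchCollection vowels = Regex.Matches("regular expression", "[aeiou]");

// [0-9] uses '-' to describe the range 0 through 9; '+' asks for one or more in a row.
MatchCollection digits = Regex.Matches("widget 42, size 500", "[0-9]+");

Console.WriteLine(vowels.Count);                                          // 7
Console.WriteLine($"{digits.Count}: {digits[0].Value}, {digits[1].Value}"); // 2: 42, 500
```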

Since digit characters are used so frequently, regex also provides a shorthand for them: \d. In C# (.NET), this also matches decimal digits from other Unicode scripts, unless you're using RegexOptions.ECMAScript, in which case it's strictly [0-9].
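A sketch of that difference, using U+0663 (ARABIC-INDIC DIGIT THREE) as an example of a decimal digit outside 0-9:

```csharp
using System;
using System.Text.RegularExpressions;

// U+0663 is a Unicode decimal digit (category Nd) that is not in the ASCII range 0-9.
string arabicThree = "\u0663";

// By default, .NET's \d matches any Unicode decimal digit.
bool unicodeDigit = Regex.IsMatch(arabicThree, @"^\d$");

// With RegexOptions.ECMAScript, \d is restricted to [0-9].
bool ecmaDigit = Regex.IsMatch(arabicThree, @"^\d$", RegexOptions.ECMAScript);

Console.WriteLine($"{unicodeDigit} {ecmaDigit}"); // True False
```

If only ASCII digits should ever match, writing [0-9] explicitly avoids depending on the option.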



Putting it all together

It looks like the following will work for you:

      @-quoting          digits_      _____anything but ', captured
          |                   / \    /     \
new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
                           \/                     \/
                         escape (              escape )

Note that I've modified the pattern slightly so that it uses a negated character class ([^']*) instead of reluctant (lazy) wildcard matching (.*?). This causes a slight difference in behavior if ' can be escaped in your input string, but neither pattern handles that case perfectly. If ' is never escaped, however, this pattern is the better choice, because [^']* cannot run past the closing quote.
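To make the trade-off concrete, here is a sketch that runs the final pattern and the lazy .*? variant against two hypothetical inputs, one of which contains an escaped quote (\'):

```csharp
using System;
using System.Text.RegularExpressions;

// Final pattern: \d* for the numeric ID, negated class [^']* for the captured value.
var negated = new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);

// Same pattern with the question's lazy capture, for comparison.
var lazy = new Regex(@"GuestbookWidget\('\d*', '(.*?)', 500\);", RegexOptions.IgnoreCase);

// Hypothetical inputs: a plain call, and one whose second argument contains \'
string normal = "GuestbookWidget('12345', 'abc', 500);";
string escapedQuote = @"GuestbookWidget('12345', 'it\'s', 500);";

Console.WriteLine(negated.Match(normal).Groups[1].Value); // abc
Console.WriteLine(lazy.Match(normal).Groups[1].Value);    // abc

// [^']* stops at the escaped quote, so the negated pattern finds no match at all;
// the lazy capture keeps extending until "', 500);" follows, capturing it\'s.
Console.WriteLine(negated.Match(escapedQuote).Success);      // False
Console.WriteLine(lazy.Match(escapedQuote).Groups[1].Value); // it\'s
```

In the escaped-quote case the lazy version returns the value with the backslash still in it, while the negated class returns nothing; which is preferable depends on whether such input can occur on the page.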

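Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.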

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM