Simple Regex match question?

I have a stringstream that contains many strings like this:

  <A style="FONT-WEIGHT: bold" id=thread_title_559960       href="http://microsoft.com/forum/f80/topicName-1234/">Beautiful Topic Name</A> </DIV> 

I am trying to extract only the links whose tag starts with:

style="FONT-WEIGHT: bold

So in the end I will have the link:

http://microsoft.com/forum/f80/topicName-1234/

Topic Id:
    1234

Topic Display Name:
    Beautiful Topic Name

I am using this pattern right now, but it doesn't do everything I need:
    "href=\"(?<url>.*?)\">(?<title>.*?)</A>"

The problem is that there are other links that also contain href, and the pattern matches those too.

Also, to use Regex, I joined all the lines into a single string. Does regex care about newlines? I.e., can it keep matching when a string spans multiple lines?

Please help me with the pattern.

In regular expressions, the dot wildcard does not match newlines by default. If you want to match any character including newlines, use [^\x00] instead of the dot. This matches everything except the null character, which in practice means it matches everything.
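The difference can be checked with a small C# sketch. The input string and the class and method names here are hypothetical, chosen only to illustrate the point; they are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotVersusNegatedClass
{
    // Hypothetical input: the element text contains a newline.
    const string Input = "<A>first\nsecond</A>";

    // '.' does not match \n by default, so this cannot reach </A>.
    public static bool DotMatches() =>
        Regex.IsMatch(Input, @"<A>.*?</A>");

    // [^\x00] matches any character except NUL, including \n.
    public static bool NegatedClassMatches() =>
        Regex.IsMatch(Input, @"<A>[^\x00]*?</A>");

    static void Main()
    {
        Console.WriteLine(DotMatches());          // False
        Console.WriteLine(NegatedClassMatches()); // True
    }
}
```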

Try this:

<A\s+style="FONT-WEIGHT: bold"\s+id=(\S+)\s+href="([^"]*)">([^\x00]*?)</A>

If you're trying to assign this to a string using double quotes, you'll need to escape the quotes and backslashes. It'll look something like this:

myVar = "<A\\s+style=\"FONT-WEIGHT: bold\"\\s+id=(\\S+)\\s+href=\"([^\"]*)\">([^\\x00]*?)</A>";
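As a rough end-to-end sketch in C#, the pattern above can be applied to the sample line from the question. The class and method names are hypothetical. A verbatim string (@"...") is used as an alternative to the escaped form, so only the quotes need doubling. The pattern itself captures the id attribute, not the topic ID, so the second regex that pulls 1234 out of the URL is an assumption about the URL layout (digits after the last hyphen).

```csharp
using System;
using System.Text.RegularExpressions;

static class TopicLinkExtractor
{
    // Verbatim string: backslashes stay as written, "" is one literal quote.
    const string Pattern =
        @"<A\s+style=""FONT-WEIGHT: bold""\s+id=(\S+)\s+href=""([^""]*)"">([^\x00]*?)</A>";

    // Returns (url, topicId, title), or null when no bold link is found.
    public static (string Url, string TopicId, string Title)? Extract(string html)
    {
        Match m = Regex.Match(html, Pattern);
        if (!m.Success) return null;

        string url = m.Groups[2].Value;
        // Assumption: the topic ID is the run of digits after the last '-'
        // and before an optional trailing '/'.
        Match id = Regex.Match(url, @"-(\d+)/?$");
        string topicId = id.Success ? id.Groups[1].Value : "";
        return (url, topicId, m.Groups[3].Value.Trim());
    }

    static void Main()
    {
        string html =
            "<A style=\"FONT-WEIGHT: bold\" id=thread_title_559960       " +
            "href=\"http://microsoft.com/forum/f80/topicName-1234/\">Beautiful Topic Name</A> </DIV>";

        var result = Extract(html);
        if (result != null)
        {
            Console.WriteLine(result.Value.Url);     // http://microsoft.com/forum/f80/topicName-1234/
            Console.WriteLine(result.Value.TopicId); // 1234
            Console.WriteLine(result.Value.Title);   // Beautiful Topic Name
        }
    }
}
```

To collect every bold link in the input rather than just the first, Regex.Matches can be used with the same pattern.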

You can make the . in a pattern match newlines by using the RegexOptions.Singleline option:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).

So if your title spanned multiple lines, then with this option enabled the (?<title>.*?) part of the pattern would continue across lines while attempting to find a match.
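A minimal C# sketch of that behaviour, using the named-group pattern from the question. The input with a title wrapped onto a second line, the example.com URL, and the class and method names are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglelineDemo
{
    // The question's pattern, written as a verbatim string.
    const string Pattern = @"href=""(?<url>.*?)"">(?<title>.*?)</A>";

    // Hypothetical input: the title contains a newline.
    const string Html = "<A href=\"http://example.com/t-1/\">Beautiful\nTopic Name</A>";

    // Default options: '.' stops at \n, so the title cannot reach </A>.
    public static Match WithoutOption() => Regex.Match(Html, Pattern);

    // Singleline: '.' also matches \n.
    public static Match WithSingleline() =>
        Regex.Match(Html, Pattern, RegexOptions.Singleline);

    static void Main()
    {
        Console.WriteLine(WithoutOption().Success);      // False

        Match m = WithSingleline();
        Console.WriteLine(m.Success);                    // True
        Console.WriteLine(m.Groups["url"].Value);        // http://example.com/t-1/
        Console.WriteLine(m.Groups["title"].Value.Replace("\n", " ")); // Beautiful Topic Name
    }
}
```

Note that Singleline applies to every dot in the pattern, so the (?<url>.*?) group can cross lines too; a negated class such as [^"]* keeps the URL group confined to the attribute value.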
