I'm having trouble with a multiline regex in C#, how do I fix this?
I have the following code, which attempts to extract the content of li tags.
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
Match liMatches = liRegex.Match(blah);
if (liMatches.Success)
{
    foreach (var group in liMatches.Groups)
    {
        Console.WriteLine(group);
    }
}
Console.ReadLine();
The regex started out much simpler and without the Multiline option, but I've been tweaking it to try to make it work.
I want the results foo, bar and oof, but instead I get <li>foo</li> and foo.
On top of this, it seems to work fine in Regex101: https://regex101.com/r/jY6rnz/1
Any thoughts?
I will start by saying that, as mentioned in the comments, you should be parsing HTML with a proper HTML parser such as the HtmlAgilityPack. Moving on to actually answer your question, though...
The problem is that you are getting a single match because liRegex.Match(blah) only returns a single match. What you want is liRegex.Matches(blah), which will return all matches.
So your use would be:
var liMatches = liRegex.Matches(blah);
foreach (Match match in liMatches)
{
    Console.WriteLine(match.Groups[1].Value);
}
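For the parser route recommended at the start of this answer, a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is installed; the names ParserDemo and ExtractListItems are made up for illustration and are not part of the library or of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class ParserDemo
{
    // Hypothetical helper: returns the inner text of every <li> element in the markup.
    public static List<string> ExtractListItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//li");
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(node => node.InnerText).ToList();
    }

    static void Main()
    {
        string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";

        foreach (string item in ExtractListItems(blah))
        {
            Console.WriteLine(item);
        }
    }
}
```

Unlike the regex, this should keep working if the li tags gain attributes or span several lines, which is the usual argument for using a parser.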
Your regex produces multiple matches when matched against blah. The Match method only returns the first match, which is the foo one. You are printing all groups in that first match, which gives you (1) the whole match and (2) group 1 of the match.
If you want to get foo, bar and oof, then you should print group 1 of each match. To do this, first get all the matches using Matches. Then iterate over the MatchCollection and print Groups[1]:
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
MatchCollection liMatches = liRegex.Matches(blah);
// Cast<Match>() needs "using System.Linq;"
foreach (var match in liMatches.Cast<Match>())
{
    Console.WriteLine(match.Groups[1]);
}
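As a side note, RegexOptions.Multiline only changes ^ and $ so that they match at line boundaries. It has no effect on what . matches, so it is probably not needed here. The sketch below is a self-contained version of the Matches approach with a simplified pattern; the pattern and the names RegexDemo and ExtractListItems are assumptions made for illustration, not the asker's original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Simplified pattern (an assumption): the lazy group captures the text
    // between <li> and the nearest following </li> on the same line.
    private static readonly Regex LiRegex = new Regex(@"<li>(.*?)</li>");

    // Hypothetical helper: collects group 1 of every match.
    public static List<string> ExtractListItems(string html)
    {
        return LiRegex.Matches(html)
                      .Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .ToList();
    }

    static void Main()
    {
        string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";

        // Prints foo, bar and oof, one per line.
        foreach (string item in ExtractListItems(blah))
        {
            Console.WriteLine(item);
        }
    }
}
```

If an li element could contain line breaks, RegexOptions.Singleline (which lets . match newlines) would be the relevant option rather than Multiline.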