
I'm having trouble with a multiline regex in C#, how do I fix this?

I have the following code, which attempts to extract the content of li tags.

        string blah = @"<ul>
        <li>foo</li>
        <li>bar</li>
        <li>oof</li>
        </ul>";

        string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
        Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
        Match liMatches = liRegex.Match(blah);
        if (liMatches.Success)
        {
            foreach (var group in liMatches.Groups)
            {
                Console.WriteLine(group);
            }
        }
        Console.ReadLine();

The regex started out much simpler and without the Multiline option, but I've been tweaking it to try to make it work.

I want the results foo, bar and oof, but instead I get <li>foo</li> and foo.

On top of this, it seems to work fine in Regex101: https://regex101.com/r/jY6rnz/1

Any thoughts?

I will start by saying that, as mentioned in the comments, you should be parsing HTML with a proper HTML parser such as the HtmlAgilityPack. Moving on to actually answer your question, though...

The problem is that you are getting a single match because liRegex.Match(blah) only ever returns a single match. What you want is liRegex.Matches(blah), which returns all matches.

So your use would be:

var liMatches = liRegex.Matches(blah);
foreach(Match match in liMatches)
{
    Console.WriteLine(match.Groups[1].Value);
}
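
For the parser-based route recommended at the top of this answer, here is a rough sketch. HtmlAgilityPack is a third-party package, so to keep the example dependency-free this uses XDocument from System.Xml.Linq as a stand-in (my substitution, not what the answer names). That only works because this particular snippet happens to be well-formed XML; real-world HTML usually is not, which is why a dedicated HTML parser is the better tool in practice. The name liTexts is mine.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";

// Stand-in for an HTML parser: valid only because the input is well-formed XML.
XDocument doc = XDocument.Parse(blah);

// Collect the text content of every <li> element, in document order.
string[] liTexts = doc.Descendants("li").Select(li => li.Value).ToArray();

foreach (string text in liTexts)
{
    Console.WriteLine(text);
}
```

No regex is involved here, so there is no question of first match versus all matches: the parser hands back every li element directly.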

Your regex produces multiple matches when matched against blah. The Match method only returns the first match, which is the foo one. You are printing all the groups in that first match, which gets you (1) the whole match and (2) group 1 of the match.
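
To see where those two lines of output come from, here is a minimal sketch that inspects the groups of the single Match result. It reuses the pattern and input from the question; the Trim() call and the names first, whole and inner are my additions, with Trim() only there to drop any surrounding whitespace (such as a trailing carriage return) that the whole-match group may pick up.

```csharp
using System;
using System.Text.RegularExpressions;

string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";

Regex liRegex = new Regex(@"(?:.)*?<li>(.*?)<\/li>(?:.?)*", RegexOptions.Multiline);

// Match returns only the first match found in the input.
Match first = liRegex.Match(blah);

// Groups[0] is always the entire matched text; Groups[1] is the first capture group.
string whole = first.Groups[0].Value.Trim();
string inner = first.Groups[1].Value;

Console.WriteLine(first.Groups.Count); // 2
Console.WriteLine(whole);              // <li>foo</li>
Console.WriteLine(inner);              // foo
```

The non-capturing groups (?:...) do not appear in Groups, which is why the count is two rather than four.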

If you want to get foo, bar and oof, then you should print group 1 of each match. To do this, first get all the matches using Matches. Then iterate over the MatchCollection and print Groups[1]:

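using System;
using System.Linq;
using System.Text.RegularExpressions;

string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
// Matches returns every match, not just the first one.
MatchCollection liMatches = liRegex.Matches(blah);
// Cast<Match>() gives a typed sequence, so match.Groups is accessible.
foreach (var match in liMatches.Cast<Match>())
{
    // Groups[1] is the text captured between <li> and </li>.
    Console.WriteLine(match.Groups[1]);
}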
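
As a side note, RegexOptions.Multiline only changes how ^ and $ behave, and this pattern uses neither, so the option has no effect here. The leading and trailing wildcard groups are not needed either, because Matches scans the input for each occurrence on its own. A minimal sketch of a trimmed-down version follows, assuming each list item sits on one line and contains no nested markup; the names itemRegex and items are mine, not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";

// No RegexOptions: '.' still stops at newlines, which is fine for one-line items.
Regex itemRegex = new Regex(@"<li>(.*?)</li>");

// Matches finds every occurrence; project each match to its first capture group.
string[] items = itemRegex.Matches(blah)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", items)); // foo, bar, oof
```

If items could span several lines, RegexOptions.Singleline (which lets '.' match newlines) would be the option to look at, though at that point an HTML parser is likely the safer choice.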
