
Regex “\d+” selector selecting digits one by one

I've created a small sample of the string which needs to be filtered:

https://regex101.com/r/PvXRiC/1

I would like to get the "61" from the HTML below:

<p class="b-list__count__number">
<span>61</span>/
<span>18786</span>
</p>

As you can see from my example, the "([\\d+])" selector matches 6 and 1 as two separate matches.

Is there any way I can get the "61" in a single match?

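Your regex does not work because .* is a greedy dot pattern: it matches the rest of the line at once and then backtracks, giving back one character at a time until the subsequent subpatterns can match. Backtracking stops as soon as the last digit on the line fits the second capturing group, so only that digit is captured. Note also that [\\d+] is a character class that matches a single character (a digit or a literal +), not one or more digits; the quantified form is \\d+.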

Although you can fix the issue by making .* lazy with .*?, or with a safer [^<]*?, you should not use regex to parse HTML.
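To make the greedy versus lazy difference concrete, here is a minimal sketch that runs both patterns against the sample markup from the question. The variable names (greedy, lazy) are illustrative only, and the snippet assumes C# 9 or later so that top-level statements compile.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample markup from the question, with \n line endings.
var html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

// Greedy: .* runs to the end of the line, then backtracks one character at a
// time until [\d+] (a single digit or a literal '+') can match.
var greedy = Regex.Match(html, @"(<p class=""b-list__count__number"">\n<span>.*)([\d+])");

// Lazy: [^<]*? expands only as far as needed, so \d+ takes the whole digit run.
var lazy = Regex.Match(html, @"<p class=""b-list__count__number"">\n<span>[^<]*?(\d+)");

Console.WriteLine(greedy.Groups[2].Value); // 1
Console.WriteLine(lazy.Groups[1].Value);   // 61
```

The lazy variant happens to work here because the digits directly follow the opening span tag; it is still a pattern match against text, not a parse of the markup.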

Use HtmlAgilityPack instead. For example:

var html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";
HtmlAgilityPack.HtmlDocument hap;
Uri uriResult;
if (Uri.TryCreate(html, UriKind.Absolute, out uriResult))
{ // html is a URL 
    var doc = new HtmlAgilityPack.HtmlWeb();
    hap = doc.Load(uriResult.AbsoluteUri);
}
else
{ // html is a string
    hap = new HtmlAgilityPack.HtmlDocument();
    hap.LoadHtml(html);
}
var node = hap.DocumentNode.SelectSingleNode("//p[@class='b-list__count__number']");
if (node != null)
{
    Console.Write(node.SelectSingleNode(".//span").InnerText); // => 61
}

The //p[@class='b-list__count__number'] is an XPath expression that gets a p node whose class attribute has the value b-list__count__number. The node.SelectSingleNode(".//span").InnerText call gets the inner text of the first span node inside the p node found. The leading dot matters: a path that starts with // is evaluated from the document root, not from node.

The problem in your regex (<p class="b-list__count__number">\\n<span>.*)([\\d+]) is that .* is greedy and also consumes all the digits except the last one. You can use [^\\d]* instead, which stops at the first digit, followed by \\d+ to capture the whole number:

(<p class="b-list__count__number">\n<span>[^\d]*)(\d+)
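As a rough sketch of how this pattern could be used from C#, the snippet below applies it to the sample markup and reads the second capturing group. The names pattern and match are illustrative, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample markup from the question, with \n line endings.
var html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

// Group 1: everything up to the first digit. Group 2: the full run of digits.
var pattern = @"(<p class=""b-list__count__number"">\n<span>[^\d]*)(\d+)";
var match = Regex.Match(html, pattern);

if (match.Success)
{
    Console.WriteLine(match.Groups[2].Value); // 61
}
```

If the real page uses \r\n line endings, the literal \n in the pattern would need to become \r?\n for the match to succeed.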
