Regex "\d+" selector matches the digits one by one

Question

I've created a small sample of the string that needs to be filtered:

https://regex101.com/r/PvXRiC/1

I would like to get the "61" from the HTML below:

<p class="b-list__count__number">
<span>61</span>/
<span>18786</span>
</p>

As you can see from my example, the ([\d+]) selector matches 6 and 1 as separate matches.

Is there any way I can get the "61" in a single match?

Answer 1

Your regex does not work because .* is a greedy dot pattern: it first matches the rest of the line, and then starts backtracking, giving characters back until the subsequent subpatterns can match. Thus, only the last digit lands in the second capturing group, since \d+ is already satisfied by a single digit.

Although you could fix the issue just by making .* lazy with .*?, or with the safer [^<]*?, you should not use regex to parse HTML.
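The backtracking described above can be reproduced with a minimal C# sketch using System.Text.RegularExpressions. It assumes the sample HTML from the question and uses (\d+) as the second group, as this answer discusses; the class and method names are hypothetical. Only the filler between <span> and the digits changes between the three runs.

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyVsLazy
{
    // Sample input from the question; "\n" stands for the line breaks in the HTML.
    const string Html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

    // Builds the pattern with the given filler and returns what group 2 captured (null if no match).
    public static string Capture(string filler)
    {
        // Verbatim strings: "" is a literal quote, \n and \d are passed to the regex engine as-is.
        string pattern = @"(<p class=""b-list__count__number"">\n<span>" + filler + @")(\d+)";
        Match m = Regex.Match(Html, pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Capture(".*"));     // 1  (greedy: gives back just enough for one digit)
        Console.WriteLine(Capture(".*?"));    // 61 (lazy: stops before the first digit)
        Console.WriteLine(Capture("[^<]*?")); // 61 (lazy, and cannot run past a '<')
    }
}
```

The greedy run prints 1 because .* surrenders characters one at a time from the end of the line, and \d+ succeeds as soon as a single digit is available; both lazy variants stop before the first digit, so \d+ takes the whole 61.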

Use HtmlAgilityPack, for example:

using System;

// Requires the HtmlAgilityPack NuGet package
var html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";
HtmlAgilityPack.HtmlDocument hap;
Uri uriResult;
if (Uri.TryCreate(html, UriKind.Absolute, out uriResult))
{ // html is a URL: download and parse the page
    var doc = new HtmlAgilityPack.HtmlWeb();
    hap = doc.Load(uriResult.AbsoluteUri);
}
else
{ // html is a string: parse it directly
    hap = new HtmlAgilityPack.HtmlDocument();
    hap.LoadHtml(html);
}
// Find the <p> whose class attribute is exactly "b-list__count__number"
var node = hap.DocumentNode.SelectSingleNode("//p[@class='b-list__count__number']");
if (node != null)
{
    // "./span" searches relative to node; "//span" would search the whole document
    Console.Write(node.SelectSingleNode("./span").InnerText); // => 61
}

The //p[@class='b-list__count__number'] is an XPath expression that selects a p node whose class attribute has the value b-list__count__number. node.SelectSingleNode("./span").InnerText then gets the inner text of the first span child of that p node. The ./ prefix matters: a path starting with // is evaluated from the document root even when called on a node, so it could return a span outside the p.
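To see why the span lookup should be relative to the p node, here is a sketch that swaps HtmlAgilityPack for the standard System.Xml.XmlDocument, which evaluates the same XPath expressions (assumption: the markup is well-formed XML). The wrapping div and the extra "header" span are hypothetical, added only so that an absolute and a relative path give different results.

```csharp
using System;
using System.Xml;

class XPathScope
{
    // Hypothetical page: an unrelated <span> appears before the target <p>.
    const string Page =
        "<div><span>header</span>" +
        "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p></div>";

    // Finds the <p>, then evaluates spanPath with that <p> as the context node.
    public static string SpanText(string spanPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Page);
        XmlNode p = doc.SelectSingleNode("//p[@class='b-list__count__number']");
        return p?.SelectSingleNode(spanPath)?.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(SpanText("//span")); // header (absolute: first span in the document)
        Console.WriteLine(SpanText("./span")); // 61     (relative: first span child of the p)
    }
}
```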

Answer 2

The problem in your regex (<p class="b-list__count__number">\n<span>.*)([\d+]) is that .* is greedy and also consumes all the digits except the last one. In addition, [\d+] is a character class that matches a single digit or a literal +, so it can only ever capture one character; use \d+ instead. You can use [^\d]* to stop at the first digit:

(<p class="b-list__count__number">\n<span>[^\d]*)(\d+)
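A minimal C# sketch of this pattern against the sample string from the question; the class and member names are hypothetical, and the pattern is written as a verbatim string so that \d and \n reach the regex engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

class NonDigitFiller
{
    // Sample input from the question.
    const string Html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

    // The answer's pattern; "" inside a verbatim string is a literal quote.
    const string Pattern = @"(<p class=""b-list__count__number"">\n<span>[^\d]*)(\d+)";

    // Returns the digits captured by group 2, or null if the pattern does not match.
    public static string FirstCount()
    {
        Match m = Regex.Match(Html, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(FirstCount()); // 61
    }
}
```

Since [^\d]* can never consume a digit, the engine has nothing to give back, and the whole 61 lands in group 2. Unlike ., a negated class such as [^\d] also matches line breaks, so if the first span were empty the match would continue on to 18786.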

2 answers

Answer 1: score 1, accepted, 2018-07-05 12:42:50

Answer 2: score 0, 2018-07-05 12:27:34
