Regex "\d+" selector selecting digits one by one

I've created a small sample of the string that needs to be filtered:

https://regex101.com/r/PvXRiC/1

I would like to get the "61" from the HTML below:

<p class="b-list__count__number">
<span>61</span>/
<span>18786</span>
</p>

As you can see from my example, the "([\d+])" selector matches 6 and 1 as separate matches.


Is there any way I can get the "61" in a single match?

Your regex does not work because .* is a greedy dot pattern: it matches the rest of the line at once and then starts backtracking, giving characters back until the subsequent subpatterns can match. Thus, only the last digit lands in the second capturing group, since a single digit is enough for it to match.

Although you could fix the issue by just making .* lazy with .*?, or with the safer [^<]*?, you should not use regex to parse HTML.
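To make the backtracking described above concrete, here is a minimal sketch using .NET's System.Text.RegularExpressions (an assumption: the question ran its pattern on regex101, not in C#). It applies the question's greedy pattern and the two lazy variants to the sample HTML and reads the second capturing group. It only illustrates the regex behavior; it is not a recommendation to parse HTML this way.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample HTML from the question; "\n" here is a real newline character.
const string Html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

// Returns the text captured by group 2, or null when the pattern does not match.
string? Capture(string pattern)
{
    Match m = Regex.Match(Html, pattern);
    return m.Success ? m.Groups[2].Value : null;
}

// Shared prefix: opens group 1 and matches up to the first <span>.
const string Prefix = "(<p class=\"b-list__count__number\">\n<span>";

// Greedy .* runs to the end of the line, then gives back one character at a time
// until the one-character class [\d+] can match: only the last digit "1" is left.
string? greedy = Capture(Prefix + @".*)([\d+])");
// Lazy .*? starts empty and grows only as needed, so (\d+) begins at the first digit.
string? lazy = Capture(Prefix + @".*?)(\d+)");
// [^<]*? is also lazy, and additionally can never run past the next tag.
string? safer = Capture(Prefix + @"[^<]*?)(\d+)");

Console.WriteLine($"{greedy} {lazy} {safer}"); // 1 61 61
```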

Use HtmlAgilityPack, for example:

using System;
using HtmlAgilityPack;

var html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";
HtmlDocument hap;
if (Uri.TryCreate(html, UriKind.Absolute, out Uri uriResult))
{ // html is a URL: download and parse the page
    var web = new HtmlWeb();
    hap = web.Load(uriResult.AbsoluteUri);
}
else
{ // html is a markup string: parse it directly
    hap = new HtmlDocument();
    hap.LoadHtml(html);
}
// Find the <p> whose class attribute is exactly "b-list__count__number"
var node = hap.DocumentNode.SelectSingleNode("//p[@class='b-list__count__number']");
// "./span" is relative to node; "//span" would search the whole document
var span = node?.SelectSingleNode("./span");
if (span != null)
{
    Console.WriteLine(span.InnerText); // => 61
}

//p[@class='b-list__count__number'] is an XPath expression that selects a p node whose class attribute equals b-list__count__number. The second SelectSingleNode call, made on that node, gets the inner text of the first span child of the p node found. Its XPath must be relative (./span): a path that starts with // is evaluated from the document root even when it is called on a node.
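Standard XPath 1.0 semantics apply here, so the relative-versus-absolute distinction can be sketched with the built-in System.Xml.XmlDocument instead of HtmlAgilityPack. This substitution is only an illustration and works solely because the sample snippet is well-formed XML, which real pages usually are not. The wrapping <div> and the extra <span>0</span> are hypothetical, added so that the two XPath forms return different results.

```csharp
using System;
using System.Xml;

// Hypothetical input: the question's snippet inside a <div>, preceded by an extra <span>.
// It must be well-formed XML for XmlDocument.LoadXml to accept it.
const string Markup = "<div><span>0</span><p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p></div>";

var doc = new XmlDocument();
doc.LoadXml(Markup);

// Select the <p> element whose class attribute equals "b-list__count__number".
XmlNode? p = doc.SelectSingleNode("//p[@class='b-list__count__number']");

// "./span" is evaluated relative to p: the first <span> child of p.
string? relative = p?.SelectSingleNode("./span")?.InnerText;
// "//span" starts from the document root, even though it is called on p.
string? absolute = p?.SelectSingleNode("//span")?.InnerText;

Console.WriteLine($"{relative} {absolute}"); // 61 0
```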

The problem in your regex (<p class="b-list__count__number">\n<span>.*)([\d+]) is that .* is greedy and also consumes all the digits except the last one. You can use [^\d]* to stop at the first digit. Note also that [\d+] is a character class that matches exactly one character (a digit or a literal +), so the digits have to be captured with (\d+) instead:

(<p class="b-list__count__number">\n<span>[^\d]*)(\d+)
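A minimal C# check of the pattern above, assuming .NET's Regex class (the answer itself only gives the pattern). It also contrasts [\d+] with \d+ on the string "61" to show why the question's second group never held more than one digit.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample HTML from the question; "\n" here is a real newline character.
const string Html = "<p class=\"b-list__count__number\">\n<span>61</span>/\n<span>18786</span>\n</p>";

// The pattern above as a verbatim string: "" is an escaped quote, and \n reaches the
// regex engine as the newline escape. [^\d]* can only consume non-digits, so it stops
// at the first digit, and (\d+) captures the whole run of digits.
const string Pattern = @"(<p class=""b-list__count__number"">\n<span>[^\d]*)(\d+)";

Match m = Regex.Match(Html, Pattern);
string count = m.Success ? m.Groups[2].Value : "";

// For contrast: [\d+] is a one-character class (a digit or '+'), so scanning "61"
// with it yields one match per digit, while \d+ yields a single match.
int perChar = Regex.Matches("61", @"[\d+]").Count;
int perRun = Regex.Matches("61", @"\d+").Count;

Console.WriteLine($"{count} {perChar} {perRun}"); // 61 2 1
```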


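Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.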

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM