
Capture a single match only - Regex

I want to capture only the first match of the expression

<p>.*?</p>

I have tried <p>.*?</p>{1}, but it does not work: it returns all the p tags in the HTML document. Please help.

It looks like you are using a method that returns every match of a regex in the given string. If so, you need to anchor the regex to the beginning of the string so that it returns only the first match instead of every match:

^.*?<p>.*?</p>

Use parentheses to capture what you want to capture.
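
As a rough sketch of the anchoring idea, assuming the match-all method is .NET's Regex.Matches (the question does not say which API is in use) and using a made-up sample string. RegexOptions.Singleline is an added assumption here: without it, . does not match newlines, so the lazy prefix could not skip past earlier lines of a real HTML document.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the question does not show its HTML.
string html = "<div>\n<p>first</p>\n<p>second</p>\n</div>";

// Unanchored: Matches finds every <p>...</p> pair.
MatchCollection all = Regex.Matches(html, "<p>(.*?)</p>", RegexOptions.Singleline);

// Anchored at the start of the string: a match can only begin at position 0,
// so at most one match is returned. Singleline lets '.' cross line breaks.
MatchCollection anchored = Regex.Matches(html, "^.*?<p>(.*?)</p>", RegexOptions.Singleline);

Console.WriteLine(all.Count);                     // 2
Console.WriteLine(anchored.Count);                // 1
Console.WriteLine(anchored[0].Groups[1].Value);   // first
```

The parentheses around the inner .*? are what make the paragraph text available as group 1, separate from the skipped prefix.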

PS: Here goes the standard advice: avoid using regex to parse HTML; use a proper HTML parser. This simple regex will fail for nested <p> sections (I don't recall whether those are valid in HTML, but you may well encounter them even if they aren't).

The Regex.Match method does this by default, and the regular expression is correct.

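using System;
using System.Text.RegularExpressions;

Regex regex = new Regex("<p>(.*?)</p>");
Match match = regex.Match("<p>1</p><p>2</p>"); // Match returns only the first match
Console.WriteLine("{0}", match.Groups[1].Value); // group 1 is the text between the tags

Running this program will print 1, the contents of the first capture group. (Printing match.Value instead would give the whole match, <p>1</p>.)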
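
The contrast with a match-all call, and the reason the {1} from the question changes nothing, can be sketched as follows. This again assumes .NET and reuses the sample input from the answer above. The quantifier {1} binds only to the final >, and "exactly one >" is what the pattern already required.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<p>1</p><p>2</p>";

// Regex.Match stops at the first match; Regex.Matches keeps scanning.
Match first = Regex.Match(input, "<p>(.*?)</p>");
MatchCollection every = Regex.Matches(input, "<p>(.*?)</p>");

// {1} quantifies only the preceding '>', so this pattern behaves the same.
MatchCollection withQuantifier = Regex.Matches(input, "<p>(.*?)</p>{1}");

Console.WriteLine(first.Groups[1].Value);   // 1
Console.WriteLine(every.Count);             // 2
Console.WriteLine(withQuantifier.Count);    // 2
```

So limiting the result to one match is a matter of which method is called (or of anchoring the pattern), not of adding a quantifier to the end of the expression.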


 