
Python regex - match a number of times

I want to match a character a specific number of times. For example, I want to match an author's name in an HTML string that looks like this:

base>"author's name"</span>

The following regex matches any characters between "base>" and "</span>" and returns only the author's name:

base>\s*(.*?)(?=\s*<\/span>)
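To make the behaviour concrete, here is a minimal sketch of that pattern run through Python's `re` module. The sample markup and the variable names `html`, `pattern`, and `authors` are assumptions for illustration; the real page will differ.

```python
import re

# Hypothetical sample markup; only the "base>...</span>" shape comes from the question.
html = "<span class=base> Jane Doe </span>\n<span class=base>John Roe</span>"

# Same pattern as above. In Python the "/" needs no escaping, so "<\/span>" and
# "</span>" are equivalent.
pattern = re.compile(r"base>\s*(.*?)(?=\s*</span>)")

# findall returns the contents of the single capture group for every match.
authors = pattern.findall(html)
print(authors)
```

The lazy group `(.*?)` stops at the nearest `</span>`, and the `\s*` on either side trims surrounding whitespace from the captured name.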

However, the HTML file contains 50 instances of this, and the above regex returns all 50 matches. How would I modify it so that only the first 10 matches are returned?

It is possible to build a regex that captures the first ten instances by concatenating the pattern with itself ten times, with the copies separated by .*? (a lazy match that skips the text between two authors). You can then use the ten capture groups to extract the authors:

base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>).*?base>\s*(.*?)(?=\s*<\/span>)
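Rather than typing the repetition by hand, the concatenated pattern can be generated. The following is a rough sketch, not a recommended approach: `single`, `n`, `combined`, and the sample `html` are hypothetical names and data, and `n` is set to 3 only to keep the example short. It assumes `re.DOTALL` is needed so that the `.*?` separator can cross line breaks between authors.

```python
import re

single = r"base>\s*(.*?)(?=\s*</span>)"
n = 3  # number of leading matches to capture (10 in the question)

# Join n copies of the pattern with the lazy ".*?" separator.
# re.DOTALL lets "." match newlines, so authors may sit on different lines.
combined = re.compile(".*?".join([single] * n), re.DOTALL)

# Hypothetical input with five authors, one per line.
html = "\n".join(f"<span class=base>Author {i}</span>" for i in range(1, 6))

m = combined.search(html)
# One capture group per copy of the pattern; empty list if the pattern fails.
first_n = list(m.groups()) if m else []
print(first_n)
```

Note that this pattern matches only if at least `n` authors are present; with fewer, `search` returns `None` and nothing is extracted.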

However, this is not usually what you want, because it makes it relatively hard to change the number of authors you search for. Finding all matches and using only the first few might be more CPU-intensive, but it makes it easier to respond to changing requirements.
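One way to take only the first few matches is sketched below. It is not part of the original answer: it swaps `findall` for `re.finditer`, which yields matches lazily, and uses `itertools.islice` to stop after a fixed count, so the rest of the document is never scanned. The function name `first_authors`, the `limit` parameter, and the sample `html` are assumptions for illustration.

```python
import re
from itertools import islice

pattern = re.compile(r"base>\s*(.*?)(?=\s*</span>)")

def first_authors(html, limit=10):
    """Return at most `limit` captured names, stopping the scan early."""
    # finditer is lazy; islice stops pulling matches after `limit` of them.
    return [m.group(1) for m in islice(pattern.finditer(html), limit)]

# Hypothetical input with 50 authors, mirroring the count in the question.
html = "".join(f"<span class=base>Author {i}</span>" for i in range(1, 51))

top_ten = first_authors(html)
print(top_ten)
```

Changing the number of authors is then a matter of passing a different `limit`. When the input is small, slicing the full result with `pattern.findall(html)[:10]` is a simpler alternative.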
