Regular expression to return all characters between two strings

How can I design a regular expression that captures all the characters between two strings? Specifically, from this big string:

Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]

I want to extract all the characters between [^title= and ], that is, "Fish consumption and incidence of stroke: a meta-analysis of cohort studies" and "The second title".

I think I will have to use re.findall(), and that I can start with re.findall(r'\[([^]]*)\]', big_string), which gives me everything between each pair of square brackets [ ], but I'm not sure how to extend it.

>>> import re
>>> text = "Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]"
>>> re.findall(r"\[\^title=(.*?)\]", text)
['Fish consumption and incidence of stroke: a meta-analysis of cohort studies', 'The second title']

Here is a breakdown of the regex:

\[ is an escaped [ character.

\^ is an escaped ^ character.

title= matches the literal text title=.

(.*?) matches any characters (except a newline, by default), non-greedily, and puts them in a group (for findall to extract). Being non-greedy, it stops as soon as it finds a...

\] , which is an escaped ] character.
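
To see why the non-greedy quantifier matters, here is a small sketch that runs three variants on the sample string from the question. The variable names (lazy, greedy, negated) are made up for illustration. The third pattern is one possible way to extend the re.findall() call from the question, using a negated character class instead of .*?:

```python
import re

text = (
    "Studies have shown that...[^title=Fish consumption and incidence of "
    "stroke: a meta-analysis of cohort studies]... Another experiment "
    "demonstrated that... [^title=The second title]"
)

# Non-greedy: each match stops at the first closing bracket after "[^title=".
lazy = re.findall(r"\[\^title=(.*?)\]", text)

# Greedy: .* runs to the LAST closing bracket, so both titles (and the prose
# between them) end up in a single match.
greedy = re.findall(r"\[\^title=(.*)\]", text)

# Negated character class: [^]]* matches any run of characters that are not "]".
negated = re.findall(r"\[\^title=([^]]*)\]", text)

print(lazy)             # the two titles, as separate list items
print(len(greedy))      # 1
print(negated == lazy)  # True for this input
```

On this input the non-greedy and negated-class patterns give the same result. They can differ on other input: [^]]* also matches newline characters, while . does not unless re.DOTALL is passed.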
