Regular expression to return all characters between two strings
How can I design a regular expression that captures all the characters between two strings? Specifically, from this big string:
Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]
I want to extract all the characters between [^title= and ], that is, "Fish consumption and incidence of stroke: a meta-analysis of cohort studies" and "The second title".
I think I will have to use re.findall(), and that I can start with this:
re.findall(r'\[([^]]*)\]', big_string)
which gives me everything between each pair of square brackets [ ], but I'm not sure how to extend it.
>>> import re
>>> text = "Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]"
>>> re.findall(r"\[\^title=(.*?)\]", text)
['Fish consumption and incidence of stroke: a meta-analysis of cohort studies', 'The second title']
Here is a breakdown of the regex:
\[ is an escaped [ character.
\^ is an escaped ^ character.
title= matches the literal text title=.
(.*?) matches any characters (except a newline, unless re.DOTALL is set), non-greedily, and puts them in a group (for findall to extract). Non-greedy means it stops at the first place where it finds a...
\] which is an escaped ] character.