Find text between two strings

Question

I have a large text like the following excerpt:

test = '''
Sra. Montero.- ¡No, no! No empecemos.   
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.   
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.   
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.   
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   
Sra. Montero.- El senador Fernández
'''

I'd like to get all the text between the string "Sr. Jefe de Gabinete de Ministros.-" and the string "Sr{{ random_text_here }}.-". So in this example, what I'd like to get is the following:

data = ['Respetuosamente se lo digo...', 'Respetuosamente se lo digo...', 'Respetuosamente se lo digo...']

I know the regex has to be non-greedy, and I already tested something like this:

bw_sr = re.compile('\.\-(.+?)Sr[.+]\.\-')  # non-greedy regex
data = bw_sr.findall(test)

But I end up getting an empty list. I tried several patterns but can't seem to get to a solution.

Answer 1

Your regex was wrong. The [.+] part is wrapped in square brackets, which define a character class, so it matches a single literal "." or "+" rather than "one or more of any character". There are other issues too: the pattern has no way to distinguish "Sr." from "Sra." (which, judging by your expected output, seems to be what you wanted). I fixed that by matching Sr\. with an escaped dot.
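To illustrate the character-class point, here is a minimal sketch. The sample string and the variable names are made up for the demonstration and are not part of the question's data.

```python
import re

# Hypothetical sample string, chosen only to show the difference.
labels = "Sr. Sra. Sr+ Srx"

# Inside square brackets, "." and "+" lose their special meaning:
# [.+] matches exactly one character that is either "." or "+".
with_class = re.findall(r"Sr[.+]", labels)

# An escaped dot outside brackets matches only a literal ".",
# so "Sra." is not matched.
with_escaped_dot = re.findall(r"Sr\.", labels)

print(with_class)        # ['Sr.', 'Sr+']
print(with_escaped_dot)  # ['Sr.']
```

Note also that "." does not match a newline unless re.DOTALL is set, so a group such as (.+?) cannot run from one line of the text into the next.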

I came up with the pattern below. It captures the text after any ".-" whose next line starts with a "Sr." speaker label; it does not check which speaker the captured text belongs to, so there is no criterion to filter by speaker. I also added \s* before the capturing group to strip leading blanks:

import re

# Capture the text after ".-" when the next line starts with "Sr." (not "Sra.")
bw_sr = re.compile(r'\.\-\s*(.+?)\nSr\..+?\.\-')
data = bw_sr.findall(test)

print(data)

Result:

['¡No, no! No empecemos.', '¡No, no! No empecemos.', '¡No, no! No empecemos.', '¡No, no! No empecemos.']
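If the goal is the text spoken by "Sr. Jefe de Gabinete de Ministros" specifically, one possible approach is to anchor on that label at the start of a line. This is only a sketch: it assumes each intervention fits on a single line, and the names sample, SPEAKER and speaker_line are introduced here for illustration. The sample is a shortened copy of the question's text, with the trailing spaces written out explicitly.

```python
import re

# Shortened sample in the same shape as the question's text.
sample = (
    "Sra. Montero.- ¡No, no! No empecemos.   \n"
    "Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   \n"
    "Sra. Montero.- El senador Fernández\n"
    "Sra. Montero.- ¡No, no! No empecemos.   \n"
    "Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...   \n"
    "Sra. Montero.- El senador Fernández\n"
)

SPEAKER = "Sr. Jefe de Gabinete de Ministros.-"

# re.escape makes the dots in the label literal.
# re.MULTILINE lets ^ and $ match at the start and end of every line.
# [ \t]* on both sides of the group drops leading and trailing blanks.
speaker_line = re.compile(
    r"^" + re.escape(SPEAKER) + r"[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)

data = speaker_line.findall(sample)
print(data)
# ['Respetuosamente se lo digo...', 'Respetuosamente se lo digo...']
```

Anchoring on the full label avoids having to tell "Sr." and "Sra." apart, at the cost of hard-coding the speaker. If an intervention can span several lines, this pattern would only return its first line.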

Answer 2

This pattern captures the text after every ".- " marker, regardless of speaker:

bw_sr = re.compile(r'\.\- (.*)')
data = bw_sr.findall(test)
