Find text between two strings
I have a large text like the following excerpt:
test = '''
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
'''
I'd like to get all the text between the string "Sr. Jefe de Gabinete de Ministros.-" and the string "Sr{{ random_text_here }}.-". So in this example, what I'd like to get is the following:
data = ['Respetuosamente se lo digo...', 'Respetuosamente se lo digo...', 'Respetuosamente se lo digo...']
I know the regex has to be non-greedy, and I already tested something like this:
bw_sr = re.compile(r'\.\-(.+?)Sr[.+]\.\-')  # non-greedy regex
data = bw_sr.findall(test)
But I end up getting an empty list. I tried several patterns but I can't seem to get to a solution.
Your regex was wrong: [.+] was between brackets, which define a character class, so it matched a single literal "." or "+" instead of "one or more of any character". There were other issues too, such as no way to distinguish between "Sr." and "Sra." (which seems to be what you wanted, judging by the expected output); I fixed that by using Sr\.
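To make the bracket issue concrete, here is a small sketch comparing the two spellings. The names char_class, any_run, broken, sample and text are illustrative only and do not come from the original post.

```python
import re

# Inside brackets, '.' and '+' lose their special meaning:
# [.+] matches exactly ONE character that is either '.' or '+'.
char_class = re.compile(r'Sr[.+]')
# Outside brackets, .+ means "one or more of any character except newline".
any_run = re.compile(r'Sr.+')

sample = 'Sr. Jefe de Gabinete de Ministros.-'

print(char_class.match(sample).group())     # Sr.
print(any_run.match(sample).group())        # the whole sample line
print(char_class.match('Sra. Montero.-'))   # None: 'a' is neither '.' nor '+'

# The pattern from the question needs "Sr" + one of ".+" + ".-",
# i.e. "Sr..-" or "Sr+.-", which never occurs in the text.
broken = re.compile(r'\.\-(.+?)Sr[.+]\.\-')
text = 'Sr. Jefe.- Hola\nSra. Montero.- No\n'
print(broken.findall(text))                 # []
```

This is why findall returned an empty list in the question: the bracketed part could never be followed directly by ".-".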
I came up with the one below, which matches the formulas and also "El senador Fernández", etc.; there's no criterion to filter those. I also added \s* before the capturing group to "strip" blanks:
import re

# \s* skips the blanks before the captured text; Sr\. matches "Sr." but not "Sra."
bw_sr = re.compile(r'\.\-\s*(.+?)\nSr\..+?\.\-')
data = bw_sr.findall(test)
print(data)
Result:
['¡No, no! No empecemos.', '¡No, no! No empecemos.', '¡No, no! No empecemos.', '¡No, no! No empecemos.']
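The list above holds the lines spoken just before each "Sr." label, not the "Respetuosamente se lo digo..." replies the question asks for. As a minimal sketch of one way to get those, assuming every speaker label starts a line with "Sr" and ends with ".-", the pattern below anchors on the exact start label and uses a lookahead so that the next label is not consumed. START, NEXT_SPEAKER and the shortened test string are illustrative and not part of the original post.

```python
import re

# Shortened copy of the excerpt from the question.
test = '''
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
Sra. Montero.- ¡No, no! No empecemos.
Sr. Jefe de Gabinete de Ministros.- Respetuosamente se lo digo...
Sra. Montero.- El senador Fernández
'''

# Exact label whose speech we want to capture.
START = r'Sr\. Jefe de Gabinete de Ministros\.-'
# Assumed shape of any speaker label: a line starting with "Sr" and ending in ".-".
NEXT_SPEAKER = r'^Sr[^\n]*?\.-'

pattern = re.compile(
    # Capture lazily up to (but not including) the next label or the end of the text.
    START + r'\s*(.+?)\s*(?=' + NEXT_SPEAKER + r'|\Z)',
    re.DOTALL | re.MULTILINE,
)

data = pattern.findall(test)
print(data)
```

Because the lookahead does not consume the next label, a reply can span several lines and consecutive turns by the same speaker are still found.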
This works:
# Captures everything after ".- " up to the end of each line
bw_sr = re.compile(r'\.\- (.*)')
data = bw_sr.findall(test)