Python regex - finding all substrings between two delimiters
I've been struggling with this problem for over a day and I just can't figure it out.
The problem is the following. Given the text:
Obratite pažnju na sljedece:
Pad prometa
Rentabilnost imovine
Neto maržu
**************************************************************
I need to extract all the text between the word "sljedece:" (without the quotation marks) and the row of asterisks.
I tried the following code:
import re
text = """
Obratite pažnju na sljedece:
Pad prometa
Rentabilnost imovine
Neto maržu
**************************************************************
"""
pattern = r"sljecece:(.*?)\*+"
napomene = re.findall(pattern, text)
print(napomene)
But it prints an empty list.
Thanks to everyone in advance!
You have to pass re.DOTALL to make . match newlines:
re.findall(pattern, text, re.DOTALL)
You also have a typo in your pattern: r"sljecece:(.*?)\*+" should be r"sljedece:(.*?)\*+".
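Putting both points together (the corrected spelling and the re.DOTALL flag), a minimal sketch of the question's snippet might look like this. The final list comprehension is an addition for illustration, not part of the original code; it only tidies the captured block into separate lines.

```python
import re

text = """
Obratite pažnju na sljedece:
Pad prometa
Rentabilnost imovine
Neto maržu
**************************************************************
"""

# Corrected spelling ("sljedece") plus re.DOTALL so that . also matches "\n".
pattern = r"sljedece:(.*?)\*+"
napomene = re.findall(pattern, text, re.DOTALL)

# The captured group keeps the surrounding newlines, so split it into
# individual lines and drop the empty ones (illustrative clean-up step).
lines = [line.strip() for line in napomene[0].splitlines() if line.strip()]
print(lines)
```

Without re.DOTALL the lazy group can never cross a line break, which is why the original call returned an empty list even apart from the typo.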
To be more efficient, you can limit the work done by the lazy quantifier by consuming whole lines at a time until the asterisk line is reached:
re.findall(r'\bsljedece:((?:.*\n)+?)\*+$', text, re.M)
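As a rough sketch of how the line-based pattern behaves when the input contains more than one delimited block: the sample text below is hypothetical (the second block and the "Ostalo" line are invented for illustration), and the splitting step at the end is an assumption about how the result would be used.

```python
import re

# Hypothetical input: two blocks, each ending in a row of asterisks.
text = """Obratite pažnju na sljedece:
Pad prometa
Neto maržu
**********
Ostalo
Obratite pažnju na sljedece:
Rentabilnost imovine
**********
"""

# (?:.*\n)+? consumes one full line per step; with re.M, $ matches at the
# end of the asterisk line, so no re.DOTALL is needed here.
pattern = re.compile(r'\bsljedece:((?:.*\n)+?)\*+$', re.M)

# One list of non-empty lines per matched block.
blocks = [[line for line in match.splitlines() if line]
          for match in pattern.findall(text)]
print(blocks)
```

Because each repetition takes a full line, the regex engine only tests for the asterisk row at line starts rather than after every single character.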
Perhaps the re.search method is more appropriate in your case.
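A short sketch of what that could look like, assuming only the first block is needed: re.search returns a single match object (or None when nothing matches) instead of a list, so the captured text is read with group(1). The strip() call and the empty-string fallback are illustrative choices, not part of the original answer.

```python
import re

text = """
Obratite pažnju na sljedece:
Pad prometa
Rentabilnost imovine
Neto maržu
**************************************************************
"""

# re.search stops at the first match and returns a match object or None.
match = re.search(r"sljedece:(.*?)\*+", text, re.DOTALL)

if match is not None:
    # group(1) is the text captured between the two delimiters.
    napomene = match.group(1).strip()
else:
    napomene = ""  # fallback chosen for this sketch

print(napomene)
```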