How to extract multiline text using regex in Python
Hi, I have the following text.
x = """Hello, this is a\nmultiline text\nend.Hello, this is\nthe second chunk\nend."""
This pattern of Hello, ... end. keeps on repeating. I want to extract the text between each pair of these two words. I tried using this:
b = re.search(r'(?<=Hello,).+(?=end)', x, re.DOTALL)
but I get all the text from the first Hello, to the last end. How do I get the separate chunks of text?
Thanks.
Use a lazy quantifier: .+? instead of .+
The problem is that .+ is greedy: it matches as much as it can, so with re.DOTALL it eats everything up to the last end in the text. Adding the question mark tells it to match as little as it can, so each match stops at the nearest end.
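Note that re.search only ever returns the first match, so even with the lazy quantifier you would get just the first chunk. A minimal sketch of getting every chunk, assuming re.findall is acceptable here and that the text contains real newlines (the names greedy, chunks and cleaned are only for illustration, and the strip() step is an assumption about the output you want):

```python
import re

# Sample text modeled on the question, with real newlines between the lines.
x = "Hello, this is a\nmultiline text\nend.Hello, this is\nthe second chunk\nend."

# Greedy: .+ runs to the end of the string, then backtracks to the LAST "end".
greedy = re.search(r'(?<=Hello,).+(?=end)', x, re.DOTALL).group()

# Lazy: .+? stops at the FIRST "end" it reaches; findall collects every chunk.
chunks = re.findall(r'(?<=Hello,).+?(?=end)', x, re.DOTALL)

# Optional cleanup: drop the leading space and trailing newline of each chunk.
cleaned = [c.strip() for c in chunks]

print(cleaned)
# ['this is a\nmultiline text', 'this is\nthe second chunk']
```

If you need match positions as well as the text, re.finditer takes the same pattern and flags and yields match objects instead of strings.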