
How to get a substring between two repetitive keywords in Python

Consider the following string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences, that is:

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Apparently, the following regular expression does not work:

re.search('(?<=This)(.*?)(?=\n)', string)
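To see why this attempt falls short, here is a minimal check, assuming the same string as above. The lookbehind starts the match right after "This", and the lazy (.*?) stops at the first position followed by a newline, so only the rest of the first sentence comes back.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# (?<=This) anchors the match just after "This"; the lazy (.*?) then stops
# at the first position where the lookahead (?=\n) succeeds.
m = re.search(r'(?<=This)(.*?)(?=\n)', string)
print(repr(m.group(1)))  # ' is the first sentence.'
```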

What is the correct expression for extracting the text between "This" and the third "\n"?

Thanks.

You can use this regex to capture the three sentences starting with the text "This":

This(?:[^\n]*\n){3}


Edit:

Python code:

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'This(?:[^\n]*\n){3}', s)
if m:
    print(m.group())

Prints:

This is the first sentence.
It is the second one.
Now, this is the third one.
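The {3} quantifier is what fixes the count: it repeats "rest of a line plus its newline" exactly three times. A minimal sketch generalising this to any keyword and line count follows; first_n_lines_after is a hypothetical helper name, not part of the re module, and it drops the trailing newline that the pattern above keeps.

```python
import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'


def first_n_lines_after(text, keyword, n):
    """Hypothetical helper: return the first occurrence of keyword, the rest
    of its line and the next n - 1 lines, without the final newline.
    Returns None if fewer than n newlines follow the keyword."""
    if n < 1:
        raise ValueError('n must be at least 1')
    # re.escape() treats the keyword literally, even if it contains regex
    # metacharacters; {n} repeats "rest of a line + newline" exactly n times.
    pattern = re.escape(keyword) + r'(?:[^\n]*\n)' + '{%d}' % n
    m = re.search(pattern, text)
    # Drop the single newline consumed by the last repetition.
    return m.group()[:-1] if m else None


print(first_n_lines_after(s, 'This', 3))
```

Escaping the keyword keeps the helper safe for inputs like "Note (1):" that would otherwise be read as regex syntax.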

Jerry's right: regex isn't the right tool for this job, and there are simpler and more efficient ways to tackle the problem:

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

Output:

This is the first sentence.
It is the second one.
Now, this is the third one.
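The split approach above works on a string that already starts at "This". For the question's original string, which has an unwanted prefix, here is a minimal sketch that assumes the wanted text begins at the first "This" and removes the prefix with str.partition before splitting.

```python
s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# partition() splits at the first occurrence of the separator and returns
# (before, separator, after); re-attach the separator to keep "This".
_, sep, rest = s.partition('This')
tail = sep + rest

# Split on at most 3 newlines, then keep the first three pieces.
first_three = '\n'.join(tail.split('\n', 3)[:3])
print(first_three)
```

Slicing with [:3] rather than [:-1] keeps whatever lines exist when the text has fewer than three newlines, instead of silently dropping the last one.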

If you just want to practice using regex, following a tutorial would be much easier.

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

Giving you:

 is the first sentence.
It is the second one.
Now, this is the third one.

This assumes there was a missing \n before "Now". If you wish to keep "This" in the result, move it inside the parentheses of the capturing group.
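Moving "This" inside the capturing group gives the following sketch, assuming the same string as in the question. Because "." does not match newlines here, each .*? stays within one line.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# "This" is now inside group 1, so it is part of the extracted text.
extracted_text = re.search(r'(This.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)
```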

(?s)(This.*?)(?=\nThis)

Make "." match newlines too with (?s), then look for a sequence that starts with "This" and is followed by "\nThis".
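The inline (?s) modifier has a flag equivalent, re.DOTALL (alias re.S). A minimal sketch, assuming the question's string, shows both the flag form and what happens without it.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# re.DOTALL lets "." cross newlines, so the lazy .*? can run until "\nThis".
with_flag = re.search(r'(This.*?)(?=\nThis)', string, re.DOTALL)
print(with_flag.group(1))

# Without it, "." stops at each newline, and no "This" is followed, before
# the end of its own line, by "\nThis", so the search finds nothing.
without_flag = re.search(r'(This.*?)(?=\nThis)', string)
print(without_flag)  # None
```

Note that this pattern relies on a following line that starts with "This"; if the fourth sentence began with anything else, the lookahead would not match there.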

Don't forget that the __repr__ of the match object doesn't show the whole matched string, so you'll need to print the matched text itself:

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
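Match objects support indexing (Python 3.6 and later), so m[0] is shorthand for m.group(0). A small sketch, assuming the question's string: because the lookahead is zero-width, group 1 spans the whole match in this pattern, and all three expressions give the same text.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'(?s)(This.*?)(?=\nThis)', string)

# (?=\nThis) consumes nothing, so the full match and group 1 coincide.
print(m[0] == m.group(0) == m.group(1))  # True
print(m[0])
```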
