How to get a substring between two repeated keywords in Python
Consider the following string:
string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
I want to extract the first three sentences, that is,
This is the first sentence.\nIt is the second one.\nNow, this is the third one.
Apparently, the following regular expression does not work:
re.search('(?<=This)(.*?)(?=\n)', string)
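To see why it falls short, here is a quick check (a sketch reusing the string above). The lookbehind keeps "This" out of the match, and because . does not match a newline by default, the lazy .*? stops at the first \n, so only the rest of the first sentence comes back:

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# (?<=This) consumes nothing, so "This" itself is not part of the match;
# .*? cannot cross '\n' without DOTALL, so matching ends at the first newline.
m = re.search(r'(?<=This)(.*?)(?=\n)', string)
print(repr(m.group()))  # ' is the first sentence.'
```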
What is the correct expression for extracting the text between "This" and the third "\n"?
Thanks.
You can use this regex to capture three sentences starting with the text "This":
This(?:[^\n]*\n){3}
Edit:
Python code:
import re
s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
m = re.search(r'This(?:[^\n]*\n){3}',s)
if m:
    print(m.group())
Prints:
This is the first sentence.
It is the second one.
Now, this is the third one.
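Note that m.group() also contains the trailing \n after the third sentence, and the count 3 is hard-coded in the pattern. A small generalization might look like the following sketch; the helper name first_lines_after is hypothetical, and re.escape is used in case the keyword contains regex metacharacters:

```python
import re

def first_lines_after(text, keyword, n):
    """Hypothetical helper: return keyword plus the rest of its line and the
    following n - 1 lines, without the final newline, or None if not found."""
    # [^\n]*\n consumes one line (including its newline); repeat it n times.
    pattern = re.escape(keyword) + r'(?:[^\n]*\n){%d}' % n
    m = re.search(pattern, text)
    return m.group().rstrip('\n') if m else None

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
print(first_lines_after(s, 'This', 3))
```

If fewer than n newline-terminated lines follow the keyword, the pattern cannot match and the helper returns None.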
Jerry's right: regex isn't the right tool for this job, and there are easier and more efficient ways of tackling the problem:
this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
print('\n'.join(this.split('\n', 3)[:-1]))
OUTPUT:
This is the first sentence.
It is the second one.
Now, this is the third one.
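The snippet above starts from a string that already begins with "This". For the question's full string, one option (a sketch, not part of the original answer) is to cut off the unwanted prefix with str.find first and then apply the same split:

```python
string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

start = string.find('This')  # index of the first "This", or -1 if absent
if start != -1:
    # maxsplit=3 yields at most 4 pieces; keep the first three lines.
    result = '\n'.join(string[start:].split('\n', 3)[:3])
    print(result)
```

Slicing with [:3] instead of [:-1] avoids dropping a wanted line when the text has fewer than three newlines after "This".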
If you just want to practice using regex, following a tutorial would be much easier.
Try the following:
import re
string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)
This gives you:
is the first sentence.
It is the second one.
Now, this is the third one.
This assumes there was a missing \n before "Now". If you wish to keep "This" in the result, you can move it inside the parentheses of the capturing group.
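For example, with "This" moved inside the capturing group (a sketch using the same string as above):

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# "This" is now inside group 1, so it is kept in the extracted text.
# Without DOTALL each .*? stays on one line, so the group spans three lines.
extracted_text = re.search(r'(This.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)
```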
(?s)(This.*?)(?=\nThis)
Make the . include newlines with (?s), then look for a sequence starting with "This" and followed by "\nThis".
Don't forget that the __repr__ of the search result doesn't print the whole matched string, so you'll need something like:
print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
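The inline (?s) is equivalent to passing the re.DOTALL flag. Here is a sketch of the same search written with the flag, checking for a failed match before indexing (this variant is not part of the original answer):

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# re.DOTALL lets . match '\n'; the lookahead stops just before the next "\nThis".
# The lowercase "this" in "Now, this" does not satisfy the case-sensitive lookahead.
m = re.search(r'(This.*?)(?=\nThis)', string, flags=re.DOTALL)
if m is not None:
    print(m[1])  # group 1; identical to m[0] here because the group spans the whole match
```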