How to get a substring between two repeated keywords in Python

Question

Consider the following string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences, that is,

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Apparently, the following regular expression does not work:

re.search('(?<=This)(.*?)(?=\n)', string)

What is the correct expression for extracting the text between This and the third \n?

Thanks.
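To see why the attempt in the question falls short, it helps to run it. The following is only an illustrative sketch: the lookbehind (?<=This) keeps This out of the match, and the lazy (.*?) stops at the first newline rather than the third. The variable names m and attempt are chosen here for illustration.

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# The pattern from the question: lazy match between "This" and the next newline.
m = re.search('(?<=This)(.*?)(?=\n)', string)
attempt = m.group()

# Only the remainder of the first sentence is returned, without "This".
print(repr(attempt))
```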

Answer 1

You can use this regex to capture three sentences starting with the text This:

This(?:[^\n]*\n){3}

Demo

Edit:

Python code:

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'This(?:[^\n]*\n){3}',s)
if m:
    print(m.group())

Prints:

This is the first sentence.
It is the second one.
Now, this is the third one.
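Note that the match produced by This(?:[^\n]*\n){3} ends with the third \n, so print emits one extra blank line. If the trailing newline is unwanted, one possible variation is a small helper that takes the keyword and the number of lines as arguments. This is a sketch only: the function name lines_from_keyword and its parameters are hypothetical and not part of the original answer.

```python
import re

def lines_from_keyword(text, keyword, n):
    """Return n lines starting at the first occurrence of keyword,
    without the final newline, or None if there is no match.
    (Hypothetical helper, shown for illustration only.)"""
    # First line: keyword plus the rest of its line; then n - 1 more lines.
    pattern = re.escape(keyword) + r'[^\n]*(?:\n[^\n]*){' + str(n - 1) + '}'
    m = re.search(pattern, text)
    return m.group() if m else None

s = ('Other unwanted text here and start here: This is the first sentence.\n'
     'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

result = lines_from_keyword(s, 'This', 3)
print(result)
```

re.escape is used so that a keyword containing regex metacharacters is still matched literally.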

Answer 2

Jerry's right: regex isn't the right tool for the job, and there are much easier and more efficient ways of tackling the problem:

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

Output:

This is the first sentence.
It is the second one.
Now, this is the third one.

If you just want to practice using regex, following a tutorial would be much easier.
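The split approach in this answer starts from a string that already begins at This, whereas the string in the question has unwanted text in front. A minimal sketch of how the same idea might be applied to the full string, assuming the first occurrence of This marks the start, uses str.find to locate it first. The names start and result are illustrative.

```python
s = ('Other unwanted text here and start here: This is the first sentence.\n'
     'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# Locate the first occurrence of the keyword; find returns -1 if it is absent.
start = s.find('This')

result = None
if start != -1:
    # Split at most 3 times, then keep the first three pieces.
    result = '\n'.join(s[start:].split('\n', 3)[:3])
    print(result)
```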

Answer 3

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

Giving you:

 is the first sentence.
It is the second one.
Now, this is the third one.

This assumes that an n was missing before Now in the original string (so that it reads \nNow). If you wish to keep This, you can move it inside the parentheses of the capture group.
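As a sketch of that last suggestion, moving This inside the capture group keeps it in the extracted text. The code mirrors the answer's own pattern, with a None check added as a precaution; the variable name m is illustrative.

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# "This" is now part of group 1, so it is kept in the result.
m = re.search(r'(This.*?\n.*?\n.*?)\n', string)
extracted_text = m.group(1) if m else None
print(extracted_text)
```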

Answer 4

(?s)(This.*?)(?=\nThis)

Make . match newlines with (?s), then look for a sequence that starts with This and is followed by \nThis.

Don't forget that the __repr__ of the search result doesn't print the whole matched string, so you'll need to print the match itself:

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
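A self-contained sketch of this approach, for comparison, prints both the match object's repr (which abbreviates long matches) and the full matched text. It relies on the unwanted fourth line also beginning with This, as it does in the question's string; the names m and full_text are illustrative.

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# (?s) lets . match newlines; the lookahead stops just before "\nThis".
m = re.search(r'(?s)(This.*?)(?=\nThis)', string)

# repr of the match object shows only an abbreviated view of the text.
print(repr(m))

# Indexing with [0] (Python 3.6+) or calling m.group() gives the full match.
full_text = m[0]
print(full_text)
```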

4 answers

Answer 1
1 accepted 2019-02-27 06:57:23

Answer 2
0 2019-02-27 06:54:42

Answer 3
0 2019-02-27 07:00:45

Answer 4
0 2019-02-27 07:16:34
