
Selecting text between two strings by matching using regex

I know there are similar posts about getting the text between two strings, but I couldn't figure out what's wrong with my code even after multiple tries, so I decided to post a question. The text data I am trying to apply the regex to looks as follows:

* * *

  

level a20. heading1 random

  

paragraph 1
paragraph 2


paragraph 3
  

* * *

paragraph 4

paragraph 5

* * *

  

level b22. random-heading2

  

someparagraphs...

I aim to get all the text between 'level a20. heading1 random' and '* * * level b22. random-heading2'. I was able to find the start of the text using

regex = r"^\* \* \*[ \t\n\r\f]+level \S+ heading random"

but when I try to add the rest of the regex, the code fails to grab the text:

regex_full = r"^\* \* \*[ \t\n\r\f]+level \S+ heading random(.*?)\* \* \*[ \t\n\r\f]+level \S+ [a-z]+"
re.finditer(regex_full, above_text_data, re.MULTILINE | re.DOTALL)

The pattern is written this way because I am sure of 'heading random', but the other heading ('random-heading2') changes between documents; it could even be one word or two words. Can someone please point out the error in the regex_full expression, so that I can obtain all the text between 'level a20. heading1 random' and '* * * level b22. random-heading2'? regex101.com reports the error as "Your regular expression does not match the subject string." The part I want is marked below:

* * *

  

level a20. heading1 random


TEXT OF INTEREST
* * *
  

level b22. random-heading2

Could be this:

r"\* \* \*\s*level a20\. heading1 random\s*(.*?)\s*\* \* \*\s*level b22\. random-heading2"

Capture group 1 contains the trimmed content. Compile the pattern with re.DOTALL so that '.' also matches newlines.
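Here is a minimal sketch of the capture-group approach against text shaped like the sample in the question. The `exact` pattern hard-codes both headings. The `flexible` pattern is my own variation for the case where the second heading changes between documents: it assumes the first heading may carry a digit ('heading1'), which the question's 'heading random' would not match, and it stops at the first '* * *' separator that is followed by a 'level ...' line. Both names and the sample string are illustrative, not from the original post.

```python
import re

# Sample text modeled on the question's data (an assumption, not the real file).
text = """* * *

level a20. heading1 random

paragraph 1
paragraph 2

paragraph 3

* * *

paragraph 4

paragraph 5

* * *

level b22. random-heading2

someparagraphs...
"""

# Both headings spelled out. re.DOTALL lets '.' cross newlines.
exact = re.compile(
    r"\* \* \*\s*level a20\. heading1 random\s*(.*?)\s*"
    r"\* \* \*\s*level b22\. random-heading2",
    re.DOTALL,
)

# Hypothetical variant: the closing heading is unknown, so only require a
# separator line followed by 'level <something> <non-space>'.
# 'heading\d*' tolerates the digit in 'heading1' (assumption about the data).
flexible = re.compile(
    r"^\* \* \*\s+level \S+ heading\d* random\s*(.*?)\s*"
    r"^\* \* \*\s+level \S+ \S",
    re.MULTILINE | re.DOTALL,
)

exact_body = exact.search(text).group(1)
flexible_body = flexible.search(text).group(1)
print(exact_body)
```

The lazy `(.*?)` skips the inner '* * *' between paragraph 3 and paragraph 4 because the text after it does not start with 'level', so the match keeps extending to the next separator.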

If you want to select text (including newlines) between two strings:

(?<=level a20\. heading1 random)[\s\S]*?(?=level b22\. random-heading2)

This may work. Note that the match keeps the surrounding whitespace and the '* * *' line that precedes 'level b22.'.
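A short sketch of the lookaround version, with the literal dots written as `\.`. Python's `re` only accepts fixed-width lookbehinds, so this works only when the opening heading is known exactly. The sample string and the cleanup step that drops the trailing separator are my additions for illustration.

```python
import re

# Compact stand-in for the question's second sample (an assumption).
text = (
    "* * *\n\nlevel a20. heading1 random\n\n"
    "TEXT OF INTEREST\n* * *\n\nlevel b22. random-heading2\n"
)

# [\s\S] matches any character including newlines, so no re.DOTALL is needed.
pattern = re.compile(
    r"(?<=level a20\. heading1 random)[\s\S]*?(?=level b22\. random-heading2)"
)

raw = pattern.search(text).group(0)

# The raw match still ends with the '* * *' separator; strip it and the
# surrounding whitespace to leave only the text of interest.
cleaned = re.sub(r"\s*\* \* \*\s*\Z", "", raw).strip()
print(repr(cleaned))
```

If the separator should not be part of the result at all, the capture-group form is probably simpler, since it can consume the separator outside the group.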
