Python: develop a non-greedy regex to match a specific pattern several times
I am developing a regex for a pattern that occurs in a file I want to process.
The file contains several articles, which all follow a similar pattern:
I am trying to come up with a non-greedy regex that accurately matches the start, body, and end of each article.
For 1-4 I have:
^n\\W+Dokument.+?[\\r\\n][\\r\\n]\\W+Copyright[^\\n]+\\n
What is necessary for 5-6?
Do I actually need a DOTALL flag if I want to use this regex as proposed to match the pattern several times in a file?
I have been on this all day. Can someone with a fresh mind show me the missing bits?
Cheers, Andrew
You can use the following:
- one optional line containing non-word characters followed by more characters and a newline:
(\W+?(?:(?!All|Alle).)+?\n)?
- one line containing non-word characters followed by either "All Rights Reserved" or "Alle Rechte vorbehalten" and a newline:
\W+(All Rights Reserved|Alle Rechte vorbehalten)\n
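A minimal sketch of how these two fragments behave once joined, assuming Python's re module. The two sample strings are invented for illustration (the real article endings are not shown here), and the names tail, english_tail and german_tail are hypothetical.

```python
import re

# The two fragments from above, joined: an optional extra line,
# followed by the mandatory rights line.
tail = re.compile(
    r"(\W+?(?:(?!All|Alle).)+?\n)?"
    r"\W+(All Rights Reserved|Alle Rechte vorbehalten)\n"
)

# Hypothetical article endings, made up for this sketch.
english_tail = "   All Rights Reserved\n"
german_tail = "   Extra notice line\n   Alle Rechte vorbehalten\n"

m_en = tail.match(english_tail)
m_de = tail.match(german_tail)

# No extra line present: the optional group does not participate (None).
print(m_en.group(1), m_en.group(2))
# Extra line present: the optional group captures it.
print(repr(m_de.group(1)), m_de.group(2))
```

The negative lookahead keeps the optional line from swallowing the rights line itself. Since every string that starts with "Alle" also starts with "All", (?!All) alone would probably be enough.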
Combining 1-6:
^\W+Dokument.+?[\r\n][\r\n]\W+Copyright[^\n]+\n(\W+?(?:(?!All|Alle).)+?\n)?\W+?(?:All Rights Reserved|Alle Rechte vorbehalten)\n
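As for the flags, here is a rough sketch of applying the combined pattern several times in one text with Python's re.finditer. It assumes re.MULTILINE, so that ^ can match at the start of every article, and re.DOTALL, so that .+? can run across the line breaks inside an article body. The sample text is invented, since the real file layout is not shown, and the names article_re, sample and articles are hypothetical.

```python
import re

article_re = re.compile(
    r"^\W+Dokument.+?[\r\n][\r\n]\W+Copyright[^\n]+\n"
    r"(\W+?(?:(?!All|Alle).)+?\n)?"
    r"\W+?(?:All Rights Reserved|Alle Rechte vorbehalten)\n",
    re.MULTILINE | re.DOTALL,
)

# Hypothetical input with two articles; the layout is an assumption.
sample = (
    "   Dokument 1 von 2\n"
    "\n"
    "   Title A\n"
    "   Body text.\n"
    "\n"
    "   Copyright 2014 Example Press\n"
    "   All Rights Reserved\n"
    "\n"
    "   Dokument 2 von 2\n"
    "\n"
    "   Titel B\n"
    "   Text.\n"
    "\n"
    "   Copyright 2014 Beispiel Verlag\n"
    "   Zusatzzeile\n"
    "   Alle Rechte vorbehalten\n"
)

# finditer yields one match per article, thanks to the non-greedy .+?
articles = [m.group(0) for m in article_re.finditer(sample)]
print(len(articles))

# Same pattern without DOTALL: .+? cannot cross a newline,
# so a multi-line body is never matched on this sample.
no_dotall = re.search(article_re.pattern, sample, re.MULTILINE)
print(no_dotall)
```

On this sample the DOTALL flag does appear to be needed, because the body between "Dokument" and "Copyright" spans several lines. Whether that holds for the real file depends on its actual layout.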