Regular expression to capture n lines of text between two regex patterns

I need help with a regular expression to grab exactly n lines of text between two regex matches. For example, I need 17 lines of text and I used the example below, which does not work.

Please see the sample code below:

import re
# t is assumed to hold the sample text (see the link below)
match_string = re.search(r'^.*MDC_IDC_RAW_MARKER((.*?\r?\n){17})Stored_EGM_Trigger.*\n', t, re.DOTALL).group()
value1 = re.search(r'value="(\d+)"', match_string).group(1)
value2 = re.search(r'value="(\d+\.\d+)"', match_string).group(1)
print(match_string)
print(value1)
print(value2)

I added a sample string here, because SO does not allow long code strings: https://hastebin.com/aqowusijuc.xml

You are getting false positives because you are using the re.DOTALL flag, which allows the . character to match newline characters. That is, when you are matching ((.*?\r?\n){17}), the . could eat up many extra newline characters just to satisfy your required count of 17. You also now realize that the \r is superfluous. Also, starting your regex with ^.*? is superfluous because you are forcing the search to start from the beginning but then saying that the search engine should skip as many characters as necessary to find MDC_IDC_RAW_MARKER. So, a simplified and correct regex would be:

match_string = re.search(r'MDC_IDC_RAW_MARKER.*\n((.*\n){17})Stored_EGM_Trigger.*\n', t)

Regex Demo
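
To see what the re.DOTALL flag changes, here is a minimal sketch on a made-up four-line sample (not the asker's real data), using a count of 2 instead of 17 to keep it short. The helper name build_pattern and the sample text are assumptions for illustration only; the pattern itself is the one above with the line count turned into a parameter.

```python
import re

def build_pattern(n, flags=0):
    """Hypothetical helper: the regex above with the line count made a parameter."""
    return re.compile(
        r'MDC_IDC_RAW_MARKER.*\n((.*\n){%d})Stored_EGM_Trigger.*\n' % n,
        flags,
    )

# Made-up sample: exactly two lines sit between the two marker lines.
sample = (
    'MDC_IDC_RAW_MARKER start\n'
    'value="12"\n'
    'value="3.5"\n'
    'Stored_EGM_Trigger end\n'
)

# Without re.DOTALL, "." stops at each newline, so the line count is exact.
exact = build_pattern(2).search(sample)
wrong_count = build_pattern(1).search(sample)  # None: there are 2 lines, not 1

# With re.DOTALL, ".*" can swallow newlines, so asking for 1 line still matches.
false_positive = build_pattern(1, re.DOTALL).search(sample)

block = exact.group(1)  # the captured lines between the markers
value1 = re.search(r'value="(\d+)"', block).group(1)
value2 = re.search(r'value="(\d+\.\d+)"', block).group(1)

print(repr(block))
print(wrong_count is None, false_positive is not None)
print(value1, value2)
```

Note that the end marker has to sit at the very start of its line for this pattern to match, since each repetition of (.*\n) ends on a newline.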
