Python regex: match everything between a defined word at the start of a line and a defined word on another line

Question

I have a file like the one below. It is part of a config that contains references to ruledefs (e.g. rd-6). The config file structure always looks the same except for the rulebase and ruledef names. This part is the rulebase definition (for the purpose of this question, it is also my RB-definitions.txt):

##Rulebase-definition  
rulebase bb
      action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
      action priority 7 dynamic-only ruledef rd-7 charging-action p2p_Drop
      action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit

Here is the ruledef definition example (this is also the output I am looking for in raising this question):

##Ruledef-definition
ruledef rd-8
          ip server-ip-address range host-pool BB10_RIM_1
          ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
          ip any-match = TRUE
#exit

I was able to match a specific rulebase name (with its rulebase definition) given by raw_input(), and save it to the file RB-definitions.txt, as you can see above. I was also able to match the ruledef names (but only the names) from RB-definitions.txt and store them in ruledef_list with the code below:

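import re

RDFile = open('RB-definitions.txt')
txt2 = RDFile.read()
ruledef_list = []
# The pattern has to be passed as a (raw) string; it captures whatever
# sits between "ruledef" and "charging-action".
for match2 in re.findall(r'(?<=ruledef)((?:.|\n)*?)(?=charging-action)', txt2):
    print(match2 + "\n")
    ruledef_list.append(match2)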

But I keep failing when I have to match a specific ruledef from the ruledef definition shown above. The word ruledef is always first in the line:

start_tag = '^ruledef '  # additional space char
content = '((?:.|\n)*?)'
end_tag = '#exit'

for RD_name in ruledef_list:
    print(RD_name)
    for match in re.findall(start_tag + RD_name + content + end_tag, txt):
        print(match + end_tag + "\n")

I tried '^ruledef ', '^ruledef\\s+' and even '([ruledef ])\\b', but none of them works. I have to match the first word of the line, because otherwise I will also match the part of the rulebase definition that starts with "ruledef".

How can I match everything from a defined first word in a line to the next "#exit", so that I get the output below?

ruledef rd-8
      ip server-ip-address range host-pool BB10_RIM_1
      ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
      ip any-match = TRUE
#exit

For better understanding, the whole script with an example config is here: http://pastebin.com/q3VUeAdh

Answer 1

You are missing multiline mode. Without it, ^ matches only at the beginning of the entire string. You can also avoid the (?:.|\n) by using singleline/dotall mode (which makes . match any character, including newlines):

start_tag = r'^ruledef ' #additional space char
content = r'(.*?)'                                
end_tag = r'#exit'

...

for match in re.findall(start_tag + RD_name + content + end_tag, txt, re.M|re.S):
    ...

Note that this will give you only the contents of the ruledef (i.e. just the text matched by the content part: no ruledef, no name, no #exit). If this is not what you want, simply remove the parentheses in content:

...
content = r'.*?'
...
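The effect of the flags and of the parentheses can be checked with a minimal, self-contained sketch. The sample text is condensed from the question's config; the helper name find_ruledef and the use of re.escape on the ruledef name are assumptions added for illustration, not part of the original script.

```python
import re

# Sample text condensed from the question: one rulebase, then two ruledefs.
txt = """rulebase bb
    action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit
ruledef rd-8
    ip server-ip-address range host-pool BB10_RIM_1
    ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
    ip any-match = TRUE
#exit
"""

def find_ruledef(name, text, capture_body=False):
    """Hypothetical helper: find the ruledef blocks whose name starts with `name`."""
    start_tag = r'^ruledef '
    # With parentheses findall returns only the body; without them, the whole match.
    content = r'(.*?)' if capture_body else r'.*?'
    end_tag = r'#exit'
    # re.escape (an addition here) keeps characters in the name from acting as regex syntax.
    pattern = start_tag + re.escape(name) + content + end_tag
    # re.M lets ^ match at every line start; re.S lets . match newlines.
    return re.findall(pattern, text, re.M | re.S)

for block in find_ruledef('rd-8', txt):
    print(block + "\n")
```

Without re.M the same pattern finds nothing in this sample, because the text begins with "rulebase" and ^ is then only tested at the very start of the string. The "ruledef rd-8" reference inside the rulebase is skipped because it does not sit at the start of its line.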

By the way, it might be more efficient to use a negative lookahead instead of an ungreedy quantifier (though not necessarily; please profile this if speed is an important concern for you):

...
content = r'(?:(?!#exit).)*'
...
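As a rough way to compare the two forms, the sketch below runs both patterns on a small sample and times them with the standard timeit module. The sample text and variable names are assumptions made for illustration, and the timings depend on the input and the machine, so none are quoted here.

```python
import re
import timeit

# Small sample in the shape of the question's ruledef section.
txt = """ruledef rd-8
    ip server-ip-address range host-pool BB10_RIM_1
#exit
ruledef rd-3
    ip any-match = TRUE
#exit
"""

# Ungreedy quantifier: expand one character at a time until "#exit" follows.
lazy_re = re.compile(r'^ruledef rd-8.*?#exit', re.M | re.S)
# Negative lookahead: consume characters as long as "#exit" does not start here.
# re.S is still needed so that . can cross line breaks.
lookahead_re = re.compile(r'^ruledef rd-8(?:(?!#exit).)*#exit', re.M | re.S)

lazy_blocks = lazy_re.findall(txt)
lookahead_blocks = lookahead_re.findall(txt)
print(lazy_blocks == lookahead_blocks)

# Timings vary with the input; measure on the real config before choosing.
print(timeit.timeit(lambda: lazy_re.findall(txt), number=1000))
print(timeit.timeit(lambda: lookahead_re.findall(txt), number=1000))
```

Both patterns stop at the first #exit after the start tag, so they return the same block; only the matching strategy differs.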

Finally, note how I use raw strings for all regex patterns. This is simply good practice in Python; otherwise you might run into problems with complex escape patterns (i.e. you will have to double-escape some things).
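A short illustration of the escaping problem, using \b as the example (the patterns and the sample line are chosen for illustration only): in a normal string literal Python turns \b into a backspace character before the regex engine ever sees it.

```python
import re

line = 'ruledef rd-8'

# Normal string: '\b' is the backspace character (ASCII 8), not a word boundary.
plain = '\bruledef\b'
# Raw string: the backslash survives, so re sees \b and treats it as a word boundary.
raw = r'\bruledef\b'
# Without a raw string, every backslash has to be doubled to get the same pattern.
doubled = '\\bruledef\\b'

plain_hits = re.findall(plain, line)
raw_hits = re.findall(raw, line)

print(plain_hits)      # the backspace version finds nothing
print(raw_hits)        # the raw version finds the word
print(raw == doubled)  # raw and double-escaped spellings are the same string
```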

Solution 1
2 2013-06-30 16:04:02
