
Python regex: match everything between a defined word at the beginning of a line and a defined word on another line

I have a file like the one below. It is part of a config that contains references to ruledefs (e.g. rd-6). The config file structure always looks the same apart from the rulebase and ruledef names. This part is the rulebase definition (for the purposes of this question, it is also my RB-definitions.txt):

##Rulebase-definition  
rulebase bb
      action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
      action priority 7 dynamic-only ruledef rd-7 charging-action p2p_Drop
      action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit

Here is a ruledef-definition example (this is also the output I'm looking for in this question):

##Ruledef-definition
ruledef rd-8
          ip server-ip-address range host-pool BB10_RIM_1
          ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
          ip any-match = TRUE
#exit

I was able to match a specific rulebase name (with its rulebase definition) given by raw_input() and save it to the file RB-definitions.txt, as shown above. I was also able to match the ruledef names (but only the names) from RB-definitions.txt and store them in ruledef_list with the code below:

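import re

with open('RB-definitions.txt') as RDFile:
    txt2 = RDFile.read()

ruledef_list = []
# Capture whatever sits between 'ruledef' and 'charging-action' (the name, with surrounding spaces)
for match2 in re.findall(r'(?<=ruledef)((?:.|\n)*?)(?=charging-action)', txt2):
    print(match2 + "\n")
    ruledef_list.append(match2)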

But I keep failing when I have to match a specific ruledef from the ruledef definition shown above. The word ruledef is always the first word on the line.

start_tag = '^ruledef '  # additional space char
content = '((?:.|\n)*?)'
end_tag = '#exit'

for RD_name in ruledef_list:
    print(RD_name)
    for match in re.findall(start_tag + RD_name + content + end_tag, txt):
        print(match + end_tag + "\n")

I tried '^ruledef ', '^ruledef\s+' and even '([ruledef ])\b', but none of these work. I have to match the first word on the line, because otherwise I would also match the part of the rulebase definition that starts with "ruledef".

How can I match everything from a defined first word on a line to the next "#exit"? As output, I would like to get the following:

ruledef rd-8
      ip server-ip-address range host-pool BB10_RIM_1
      ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
      ip any-match = TRUE
#exit

For better understanding, the whole script with an example config is available at http://pastebin.com/q3VUeAdh

You are missing multiline mode. Without it, ^ matches only at the beginning of the entire string. You can also avoid the (?:.|\n) construct by using singleline/dotall mode (which makes . match any character, including newlines):

start_tag = r'^ruledef ' #additional space char
content = r'(.*?)'                                
end_tag = r'#exit'

...

for match in re.findall(start_tag + RD_name + content + end_tag, txt, re.M|re.S):
    ...
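To make the effect of the two flags concrete, here is a minimal, self-contained sketch. The sample text is assembled from the question's config fragments, and find_ruledef is a hypothetical helper name, not part of the original script. It also strips and escapes the name, on the assumption that the names collected in ruledef_list carry surrounding spaces, which would otherwise prevent a match:

```python
import re

# Sample text assembled from the question's config fragments (shortened).
txt = """rulebase bb
      action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
      action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit
ruledef rd-8
          ip server-ip-address range host-pool BB10_RIM_1
          ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
          ip any-match = TRUE
#exit
"""

start_tag = r'^ruledef '  # '^' only matches at line starts when re.M is set
content = r'(.*?)'        # '.' only crosses newlines when re.S is set
end_tag = r'#exit'


def find_ruledef(name, text, flags=re.M | re.S):
    """Hypothetical helper: return the body of each 'ruledef <name>' block."""
    # strip() and re.escape() are additions made for this sketch.
    pattern = start_tag + re.escape(name.strip()) + content + end_tag
    return re.findall(pattern, text, flags)


for body in find_ruledef(" rd-8 ", txt):
    print("ruledef rd-8" + body + end_tag)

# Without the flags nothing matches, because '^' is then tied to the start
# of the whole string and '.' stops at the first newline.
print(find_ruledef("rd-8", txt, flags=0))
```

Note that rd-6 yields no result here: it only appears in the middle of a rulebase line, which is exactly what the ^ anchor is meant to exclude.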

Note that this will give you the contents of the ruledef (i.e. just the things that were matched by the content part: no ruledef, no name, no #exit). If this is not what you want, simply remove the parentheses in content:

...
content = r'.*?'
...
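The difference comes from how re.findall treats capturing groups: with one group it returns only the captured text, and with no groups it returns the whole match. A small sketch on a made-up one-block sample (the sample string is an assumption for illustration):

```python
import re

sample = "ruledef rd-3\n    ip any-match = TRUE\n#exit\n"

# One capturing group: findall returns only what the group captured.
with_group = re.findall(r'^ruledef rd-3(.*?)#exit', sample, re.M | re.S)

# No capturing group: findall returns the entire match.
without_group = re.findall(r'^ruledef rd-3.*?#exit', sample, re.M | re.S)

print(repr(with_group[0]))
print(repr(without_group[0]))
```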

By the way, it might be more efficient to use a negative lookahead instead of an ungreedy quantifier (but it doesn't have to be; please profile this if speed is an important concern for you):

...
content = r'(?:(?!#exit).)*'
...
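One possible way to profile the two variants is with the standard timeit module. This is only a sketch on synthetic input (the repeated sample block and the repetition counts are arbitrary assumptions); the timings will depend on your real config and Python version, so no particular result is implied:

```python
import re
import timeit

# Synthetic input: the same small ruledef block repeated 200 times.
sample = ("ruledef rd-8\n"
          "    ip server-ip-address range host-pool BB10_RIM_1\n"
          "#exit\n") * 200

lazy = re.compile(r'^ruledef rd-8(.*?)#exit', re.M | re.S)
lookahead = re.compile(r'^ruledef rd-8((?:(?!#exit).)*)#exit', re.M | re.S)

# Both patterns should extract exactly the same bodies.
lazy_result = lazy.findall(sample)
lookahead_result = lookahead.findall(sample)
print(lazy_result == lookahead_result)

lazy_time = timeit.timeit(lambda: lazy.findall(sample), number=50)
lookahead_time = timeit.timeit(lambda: lookahead.findall(sample), number=50)
print("lazy: %.4fs, lookahead: %.4fs" % (lazy_time, lookahead_time))
```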

Finally, note how I use raw strings for all regex patterns. This is just good practice in Python; otherwise you might run into problems with complex escape patterns (i.e., you'll have to double-escape some things).
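A small illustration of why this matters, using \b as the example: in a normal string literal, '\b' is a backspace character, so the regex engine never sees a word-boundary token unless the backslash is doubled or the string is raw. The sample line is made up for this sketch:

```python
import re

line = "ruledef rd-8"

plain = 'ruledef\b'     # '\b' becomes a single backspace character
doubled = 'ruledef\\b'  # double-escaped: backslash + 'b' reaches the regex engine
raw = r'ruledef\b'      # raw string: same pattern, no double escaping needed

print(doubled == raw)                       # the two spellings are the same string
print(re.search(plain, line) is None)       # looks for a literal backspace, finds none
print(re.search(raw, line) is not None)     # word boundary after 'ruledef' matches
```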
