
Python regex: match everything from a given word at the beginning of a line to a given word on a later line

I have a file like the one below. It is part of a config that contains references to ruledefs (e.g. rd-6). The config file structure is always the same; only the rulebase and ruledef names change. This part is the rulebase definition (for the purposes of this question, it is also the content of my RB-definitions.txt):

##Rulebase-definition  
rulebase bb
      action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
      action priority 7 dynamic-only ruledef rd-7 charging-action p2p_Drop
      action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit

Here is an example of the ruledef definition (this is also the output I am looking for):

##Ruledef-definition
ruledef rd-8
          ip server-ip-address range host-pool BB10_RIM_1
          ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
          ip any-match = TRUE
#exit

I was able to match a specific rulebase name (with its rulebase definition) given via raw_input() and save it to the file RB-definitions.txt, as shown above. I was also able to match the ruledef names (but only the names) from RB-definitions.txt and store them in ruledef_list with the code below:

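import re

RDFile = open('RB-definitions.txt')
txt2 = RDFile.read()
RDFile.close()
ruledef_list = []
# Capture whatever sits between "ruledef" and "charging-action" on each action line
for match2 in re.findall(r'(?<=ruledef)((?:.|\n)*?)(?=charging-action)', txt2):
    print(match2 + "\n")
    ruledef_list.append(match2)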

But I keep failing when I try to match a specific ruledef from the ruledef definition shown above. The word ruledef is always the first word on its line.

start_tag =    '^ruledef ' #additional space char
content = '((?:.|\n)*?)'                                
end_tag = '#exit'

for RD_name in ruledef_list:
 print RD_name
 for match in re.findall(start_tag + RD_name + content + end_tag, txt):
    print match + end_tag + "\n" 

I tried '^ruledef ', '^ruledef\\s+' and even '([ruledef ])\\b', but none of them works. I have to match the first word on the line, because otherwise I would also match the part of the rulebase definition that starts with "ruledef".

How can I match everything from a given first word on a line to the next "#exit", so that the output looks like this?

ruledef rd-8
      ip server-ip-address range host-pool BB10_RIM_1
      ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
      ip any-match = TRUE
#exit

For better understanding, the whole script with an example config is here: http://pastebin.com/q3VUeAdh

You are missing multiline mode (re.M). Without it, ^ matches only at the beginning of the entire string. You can also avoid the (?:.|\n) by using singleline/dotall mode (re.S), which makes . match any character, including newlines:

start_tag = r'^ruledef ' #additional space char
content = r'(.*?)'                                
end_tag = r'#exit'

...

for match in re.findall(start_tag + RD_name + content + end_tag, txt, re.M|re.S):
    ...

Note that this will give you only the contents of the ruledef (i.e. just what was matched by the content part: no ruledef, no name, no #exit). If this is not what you want, simply remove the parentheses in content:

...
content = r'.*?'
...
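
To make the difference between the two content variants concrete, here is a minimal, self-contained sketch. The sample text is modelled on the config in the question, and the helper find_ruledef and its capture_body parameter are hypothetical names introduced only for this illustration. Two additions go beyond the original snippet and are assumptions on my part: the name is stripped and passed through re.escape (names captured between "ruledef" and "charging-action" carry surrounding spaces), and a (?!\S) lookahead keeps rd-8 from also matching a longer name such as rd-80.

```python
import re

# Sample config modelled on the question; the contents are illustrative only.
txt = """##Rulebase-definition
rulebase bb
      action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
      action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit
##Ruledef-definition
ruledef rd-8
          ip server-ip-address range host-pool BB10_RIM_1
          ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
          ip any-match = TRUE
#exit
"""


def find_ruledef(name, text, capture_body=False):
    """Hypothetical helper: return all blocks for one ruledef name."""
    # With a capturing group, findall returns only the group;
    # without one, it returns the whole match.
    content = r'(.*?)' if capture_body else r'.*?'
    pattern = (
        r'^ruledef '                  # re.M lets ^ match at every line start
        + re.escape(name.strip())     # assumption: names may carry stray spaces
        + r'(?!\S)'                   # assumption: the name must end here
        + content                     # re.S lets . cross line breaks
        + r'#exit'
    )
    return re.findall(pattern, text, re.M | re.S)


whole = find_ruledef('rd-8', txt)                      # ruledef line through #exit
body = find_ruledef(' rd-8 ', txt, capture_body=True)  # only the lines in between
missing = find_ruledef('rd-6', txt)                    # referenced, never defined

for block in whole:
    print(block + "\n")
```

The rd-6 lookup comes back empty even though "ruledef rd-6" appears in the rulebase section, because there the word is not at the start of a line.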

By the way, it might be more efficient to use a negative lookahead instead of an ungreedy quantifier (though not necessarily; please profile this if speed is an important concern for you):

...
content = r'(?:(?!#exit).)*'
...
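
As a rough way to check both claims (same result, possibly different speed), the two forms can be compared side by side. This is only a sketch under assumptions: the sample text is made up, the ruledef name is hard-coded, and the timings depend entirely on your real data and Python version, so no particular winner is implied.

```python
import re
import timeit

# Made-up sample with two ruledef blocks.
text = (
    "ruledef rd-3\n"
    "          ip any-match = TRUE\n"
    "#exit\n"
    "ruledef rd-8\n"
    "          ip any-match = FALSE\n"
    "#exit\n"
)

# Ungreedy quantifier: expand one character at a time until #exit fits.
lazy = re.compile(r'^ruledef rd-3.*?#exit', re.M | re.S)

# Negative lookahead: consume a character only if #exit does not start there.
tempered = re.compile(r'^ruledef rd-3(?:(?!#exit).)*#exit', re.M | re.S)

lazy_hits = lazy.findall(text)
tempered_hits = tempered.findall(text)

# Time both on your own input before preferring either one.
lazy_time = timeit.timeit(lambda: lazy.findall(text), number=1000)
tempered_time = timeit.timeit(lambda: tempered.findall(text), number=1000)
print("lazy: %.4fs  lookahead: %.4fs" % (lazy_time, tempered_time))
```

Note that the lookahead version still needs re.S, since the dot inside the group has to be able to match newlines.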

Finally, note that I use raw strings for all regex patterns. This is simply good practice in Python; otherwise you can run into problems with complex escape sequences (i.e. you have to double-escape some things).
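
A small illustration of why this matters, using \b as the example (the variable names are just for this sketch): in a normal string literal Python turns '\b' into a backspace character before the regex engine ever sees it, whereas in a raw string the backslash survives and the engine reads it as a word boundary.

```python
import re

plain = '\bruledef\b'       # Python converts each '\b' to a backspace (0x08)
raw = r'\bruledef\b'        # the regex engine receives \b, a word boundary
doubled = '\\bruledef\\b'   # double-escaping by hand yields the same pattern

line = 'ruledef rd-8'

plain_match = re.search(plain, line)   # looks for literal backspace characters
raw_match = re.search(raw, line)       # looks for the whole word "ruledef"

print(len(plain), len(raw), doubled == raw)
print(plain_match is None, raw_match is not None)
```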
