Python regex: match everything from a defined word at the beginning of a line to a defined word on another line
I have a file like the one below. It's part of a config that contains references to ruledefs (i.e. rd-6). The config file structure always looks the same except for the rulebase and ruledef names. This part is the rulebase-definition (for the purpose of this question, this is also my RB-definitions.txt):
##Rulebase-definition
rulebase bb
action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
action priority 7 dynamic-only ruledef rd-7 charging-action p2p_Drop
action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit
Here is the ruledef-definition example (this is also the output I'm looking for in this question):
##Ruledef-definition
ruledef rd-8
ip server-ip-address range host-pool BB10_RIM_1
ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
ip any-match = TRUE
#exit
I was able to match a specific rulebase name (with its rulebase definition) given by raw_input() and save it to the file RB-definitions.txt, as you can see above. I was also able to match the ruledef names (but only the names) from RB-definitions.txt and store them in ruledef_list with the code below:
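import re

# Read the rulebase section saved earlier
with open('RB-definitions.txt') as RDFile:
    txt2 = RDFile.read()

ruledef_list = []
# Capture whatever sits between "ruledef" and "charging-action" (the ruledef name)
for match2 in re.findall(r'(?<=ruledef)((?:.|\n)*?)(?=charging-action)', txt2):
    print(match2 + "\n")
    ruledef_list.append(match2)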
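One thing worth checking here: the lookbehind pattern captures the spaces around each name (e.g. ' rd-6 '). Because start_tag below already ends with a space, a padded name would produce a double space, and the later lookup would fail no matter which flags are used. The sketch below runs on the sample from RB-definitions.txt as a string; the `\S+` variant that captures only the name is my suggestion, not the asker's code.

```python
import re

# Sample copied from RB-definitions.txt in the question
rb_text = """rulebase bb
action priority 6 dynamic-only ruledef rd-6 charging-action throttle monitoring-key 1
action priority 7 dynamic-only ruledef rd-7 charging-action p2p_Drop
action priority 139 dynamic-only ruledef rd-8 charging-action p2p_Drop monitoring-key 1
#exit
"""

# The lookbehind version keeps the spaces on both sides of the name
padded = re.findall(r'(?<=ruledef)((?:.|\n)*?)(?=charging-action)', rb_text)
print(padded)  # [' rd-6 ', ' rd-7 ', ' rd-8 ']

# Capturing only the non-whitespace token after "ruledef" yields clean names
names = re.findall(r'ruledef\s+(\S+)', rb_text)
print(names)  # ['rd-6', 'rd-7', 'rd-8']
```

Calling .strip() on each match2 before appending it would have the same effect.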
But I keep failing when I have to match a specific ruledef from the ruledef-definition shown above. The word ruledef is always first on the line:
start_tag = '^ruledef '  # additional space char
content = '((?:.|\n)*?)'
end_tag = '#exit'
for RD_name in ruledef_list:
    print(RD_name)
    # txt: the config text, read elsewhere in the full script
    for match in re.findall(start_tag + RD_name + content + end_tag, txt):
        print(match + end_tag + "\n")
I tried '^ruledef ', '^ruledef\\s+' and even '([ruledef ])\\b', but none of them work. I have to match the first word; otherwise I will also match the part of the rulebase-definition that starts with "ruledef".
How can I match everything from a defined first word on a line to the next "#exit", so that I get the output below?
ruledef rd-8
ip server-ip-address range host-pool BB10_RIM_1
ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
ip any-match = TRUE
#exit
For better understanding, you can find the whole script with an example config here: http://pastebin.com/q3VUeAdh
You are missing multiline mode. Otherwise `^` matches only at the beginning of the entire string. Also, you can avoid the `(?:.|\n)` by using singleline/dotall mode (which makes `.` match any character, including newlines):
start_tag = r'^ruledef ' #additional space char
content = r'(.*?)'
end_tag = r'#exit'
...
for match in re.findall(start_tag + RD_name + content + end_tag, txt, re.M|re.S):
...
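To see what each flag does on its own, here is a small sketch that runs on the ruledef-definition sample from the question as a string (the name txt and the `\S+` patterns are illustrative):

```python
import re

# Sample copied from the ruledef-definition section in the question
txt = """##Ruledef-definition
ruledef rd-8
ip server-ip-address range host-pool BB10_RIM_1
ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
ip any-match = TRUE
#exit
"""

# Without re.M, ^ only matches at position 0, where "##Ruledef..." starts
print(re.findall(r'^ruledef \S+', txt))        # []
# With re.M, ^ also matches right after every newline
print(re.findall(r'^ruledef \S+', txt, re.M))  # ['ruledef rd-8', 'ruledef rd-3']

# Without re.S, . stops at newlines, so a multi-line body cannot match
print(re.findall(r'^ruledef rd-3(.*?)#exit', txt, re.M))         # []
print(re.findall(r'^ruledef rd-3(.*?)#exit', txt, re.M | re.S))  # ['\nip any-match = TRUE\n']
```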
Note that this will give you only the contents of the `ruledef` (i.e. just what was matched by the `content` part: no `ruledef`, no name, no `#exit`). If this is not what you want, simply remove the parentheses in `content`:
...
content = r'.*?'
...
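Putting these pieces together, a self-contained sketch that prints the output requested in the question might look like this. It assumes the ruledef names have already been extracted without surrounding spaces (hard-coded here as ruledef_list). The `re.escape` call and the `\s` after the name are my additions, not part of the answer: `re.escape` treats each name literally, and `\s` keeps rd-8 from also matching a hypothetical rd-80.

```python
import re

# Sample copied from the ruledef-definition section in the question
txt = """##Ruledef-definition
ruledef rd-8
ip server-ip-address range host-pool BB10_RIM_1
ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
ip any-match = TRUE
#exit
"""

ruledef_list = ['rd-8', 'rd-3']  # assumed: already extracted and stripped

start_tag = r'^ruledef '
content = r'.*?'  # no group, so findall returns the whole match
end_tag = r'#exit'

blocks = []
for rd_name in ruledef_list:
    # \s after the name requires whitespace, so "rd-8" cannot match "rd-80"
    pattern = start_tag + re.escape(rd_name) + r'\s' + content + end_tag
    blocks.extend(re.findall(pattern, txt, re.M | re.S))

print('\n'.join(blocks))
```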
By the way, it might be more efficient to use a negative lookahead instead of an ungreedy quantifier (but it doesn't have to be; please profile this if speed is an important concern for you):
...
content = r'(?:(?!#exit).)*'
...
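One way to compare the two content variants is to confirm that they capture the same text and then time them. This sketch wraps each variant in a group and hard-codes rd-8 for illustration. The timings depend on your machine and data, so treat them only as a starting point for profiling on your real config:

```python
import re
import timeit

# Sample copied from the ruledef-definition section in the question
txt = """ruledef rd-8
ip server-ip-address range host-pool BB10_RIM_1
ip server-ip-address range host-pool BB10_RIM_2
#exit
ruledef rd-3
ip any-match = TRUE
#exit
"""

lazy = r'^ruledef rd-8\s(.*?)#exit'                    # ungreedy quantifier
tempered = r'^ruledef rd-8\s((?:(?!#exit).)*)#exit'    # negative lookahead

lazy_result = re.findall(lazy, txt, re.M | re.S)
tempered_result = re.findall(tempered, txt, re.M | re.S)
print(lazy_result == tempered_result)  # True

# Rough timing only; numbers vary between machines and inputs
for name, pat in (('lazy', lazy), ('tempered', tempered)):
    compiled = re.compile(pat, re.M | re.S)
    seconds = timeit.timeit(lambda: compiled.findall(txt), number=10000)
    print(name, round(seconds, 4))
```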
Finally, note how I use raw strings for all regex patterns. This is just good practice in Python; otherwise you might run into problems with complex escape patterns (i.e. you'll have to double-escape some things).
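The escaping problem is easiest to see with `\b`. In a normal string literal it becomes a backspace character, so the regex engine never sees a word boundary. A short sketch (the sample text is illustrative):

```python
import re

text = 'ruledef rd-8'

# In a normal string, "\b" is a backspace character (\x08)
print('\b' == '\x08')  # True

# So this pattern looks for "ruledef" followed by a literal backspace
plain_hits = re.findall('ruledef\b', text)
print(plain_hits)  # []

# The raw string keeps the backslash, and re reads \b as a word boundary
raw_hits = re.findall(r'ruledef\b', text)
print(raw_hits)  # ['ruledef']

# Double-escaping in a normal string produces the same pattern as a raw string
print('^ruledef\\s+' == r'^ruledef\s+')  # True
```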