Python regex match multiline text
I have some text in a file:
INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
I would like an expression that captures the following strings (lines beginning with $ should be skipped):
['.\..\..\FE_10-28\ASSY.bdf', '.\..\..\FE_10-28\standalone\COORD.bdf']
I managed to filter the single-line strings:
import re

# bdf_name is the path to the input file
with open(bdf_name, 'r') as f:
    file_buff = f.readlines()
text = ''.join(file_buff)
regex_incl = re.compile(r"[^$]\s+include\s+'(.*)'", re.IGNORECASE | re.MULTILINE)
print(regex_incl.findall(text))
But how should this be done for the multiline case?
First, you need the flag re.DOTALL; otherwise the dot . does not match newlines. Also read all the data at once:
import re

with open(bdf_name, 'r') as f:
    data = f.read()  # read the whole file as one string
re.findall(r"^include\s+'(.*?)'", data,
           flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
# ['.\\..\\..\\\nFE_10-28\\\nASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
If you do not want the line breaks, remove them from each match with .replace("\n", "").
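Putting these pieces together, here is a minimal self-contained sketch. The sample text is copied from the question, and the names `text`, `INCLUDE_RE`, and `paths` are illustrative rather than taken from the answer. The sample is written as a raw string so that the backslashes, including those directly before a line break, stay in the text as they would when read from a file.

```python
import re

# Sample text mirroring the question's file. The raw literal keeps every
# backslash, even the ones directly followed by a newline.
text = r"""INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
"""

# MULTILINE anchors ^ at every line start, so "$ INCLUDE" lines are skipped.
# DOTALL lets the lazy (.*?) run across line breaks up to the closing quote.
INCLUDE_RE = re.compile(r"^include\s+'(.*?)'",
                        re.IGNORECASE | re.MULTILINE | re.DOTALL)

# Remove the real newline characters embedded in multi-line paths.
paths = [match.replace("\n", "") for match in INCLUDE_RE.findall(text)]
print(paths)
```

With the sample above, `paths` should contain the two uncommented include paths, each joined onto a single line.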
You can use this regex:
>>> raw = '''
... INCLUDE '.\..\..\
... FE_10-28\
... ASSY.bdf'
... INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
... $ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
... $ INCLUDE '.\..\..\
... $ FE_10-28\standalone\
... $ ITFC.bdf'
... '''
>>>
>>> re.findall(r"^INCLUDE\s+'(.+?)'\n", raw, re.M|re.DOTALL)
['.\\..\\..FE_10-28ASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
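One caveat about the session above: `raw` is an ordinary (non-raw) string literal, so a backslash directly before a line break acts as a line continuation and disappears together with the newline. That is why the first result has no separator before FE_10-28. Text read from a file keeps those characters, so the line breaks have to be removed explicitly. The following is a rough sketch under that assumption; `find_includes` is a hypothetical helper name, the pattern drops the trailing `\n` requirement so a final line without a newline still matches, and the temporary file only stands in for a real .bdf file.

```python
import os
import re
import tempfile

def find_includes(text):
    """Return the paths of all active INCLUDE statements (hypothetical helper)."""
    pattern = re.compile(r"^INCLUDE\s+'(.+?)'",
                         re.IGNORECASE | re.MULTILINE | re.DOTALL)
    # Strip the line breaks that split a path over several lines.
    return [re.sub(r"\r?\n", "", path) for path in pattern.findall(text)]

# Raw literal: backslash + newline stays in the text, as it would in a file.
sample = r"""INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
"""

# Round-trip through a temporary file to mimic reading a real input file.
with tempfile.NamedTemporaryFile("w", suffix=".bdf", delete=False) as tmp:
    tmp.write(sample)
    bdf_name = tmp.name

try:
    with open(bdf_name, "r") as f:
        includes = find_includes(f.read())
finally:
    os.remove(bdf_name)  # clean up the temporary file

print(includes)
```

Lines starting with `$` are skipped because `^` only matches at the start of a line under `re.MULTILINE`, and those lines begin with `$` rather than `INCLUDE`.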