Regular expression to match anything between two symbols
While trying to learn more about regular expressions in Python, I am finding it hard to match any characters (including newlines, tabs, spaces, etc.) between two symbols, with the symbols themselves included in the match.
For example:
    foobar89\n\nfoo\tbar; '''blah blah blah'8&^"'''
needs to match '''blah blah blah'8&^"'''
    fjfdaslfdj; '''blah\n blah\n\t\t blah\n'8&^"'''
needs to match '''blah\n blah\n\t\t blah\n'8&^"'''
(Note that \n and \t here stand for the actual newline and tab characters in the text file.)
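Following this question, I tried this:
    ^.*\'''(.*)\'''.*$
and this:
    *?\'''(.*)\'''.*
without success.
Can someone point out what I am doing wrong? I would also appreciate a short explanation.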
Also, to understand how escaping special characters works, I would like to know whether the regular expression would still work (on the corresponding strings) if the two symbols were replaced, for example ''' with """ or ***.
For example:
    fjfdaslfdj; """blah\n blah\n\t\t blah\n'8&^"""
needs to match """blah\n blah\n\t\t blah\n'8&^"""
Update
This is the code I am using to test the regexes (taken from here and modified):
    import collections
    import re

    Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

    def tokenize(code):
        token_specification = [
            # regexes suggested by Thomas Ayoub
            ('BOTH', r'([\'"]{3}).*?\2'),   # for both triple-single quotes and triple-double quotes
            ('SINGLE', r"('''.*?''')"),     # triple-single quotes
            ('DOUBLE', r'(""".*?""")'),     # triple-double quotes
            # regexes which match OK
            ('COM', r'#.*'),
            ('NUMBER', r'\d+(\.\d*)?'),     # Integer or decimal number
            ('ASSIGN', r':='),              # Assignment operator
            ('END', r';'),                  # Statement terminator
            ('ID', r'[A-Za-z]+'),           # Identifiers
            ('OP', r'[+\-*/]'),             # Arithmetic operators
            ('NEWLINE', r'\n'),             # Line endings
            ('SKIP', r'[ \t]+'),            # Skip over spaces and tabs
            ('MISMATCH', r'.'),             # Any other character
        ]
        test_regexes = ['COM', 'BOTH', 'SINGLE', 'DOUBLE']
        tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
        line_num = 1
        line_start = 0
        for mo in re.finditer(tok_regex, code):
            kind = mo.lastgroup
            value = mo.group(kind)
            if kind == 'NEWLINE':
                line_start = mo.end()
                line_num += 1
            elif kind == 'SKIP':
                pass
            elif kind == 'MISMATCH':
                pass
            else:
                if kind in test_regexes:
                    print(kind, value)
                column = mo.start() - line_start
                yield Token(kind, value, line_num, column)

    f = r'C:\path_to_python_file_with_above_examples'
    with open(f) as sfile:
        content = sfile.read()

    for t in tokenize(content):
        pass  # print(t)
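A note for anyone running the tokenizer above: because all the patterns are joined into a single alternation, passing re.DOTALL to re.finditer would also let the #.* comment pattern run across newlines. One possible way around that, sketched here under assumptions, is the scoped inline flag (?s:...) (available from Python 3.6), which turns on dot-matches-newline for one sub-pattern only. The reduced token table, the STRING token name, the quote group name and the sample input are illustrative and not part of the original code; a named backreference is used so the pattern does not depend on its group number inside the combined regex.

```python
import re

# Reduced, hypothetical version of the token table above: only the patterns
# relevant to triple-quoted strings and comments are kept.
token_specification = [
    # (?s:...) enables DOTALL for the enclosed sub-pattern only (Python 3.6+).
    # (?P=quote) must repeat whatever the opening delimiter matched.
    ('STRING', r'''(?P<quote>['"]{3})(?s:.*?)(?P=quote)'''),
    ('COM', r'#.*'),       # "." still stops at the end of the line here
    ('NEWLINE', r'\n'),
    ('MISMATCH', r'.'),
]
tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)

# Made-up input containing real newline characters inside the delimiters.
code = "x; '''blah\n blah''' # note\ny; \"\"\"a\nb\"\"\"\n"

tokens = [(mo.lastgroup, mo.group())
          for mo in re.finditer(tok_regex, code)
          if mo.lastgroup in ('STRING', 'COM')]
print(tokens)
```

With this layout the comment token ends at the line break while both triple-quoted tokens span it, which is the behaviour the test loop in the question is trying to observe.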
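You can go with:
    ((['"]{3}).*?\2)
As for your attempts:
    ^.*\'''(.*)\'''.*$
=> you added anchors for the start and end of the line, which cannot work when you need a multi-line match.
    *?\'''(.*)\'''.*
=> syntax error: the leading *? has nothing to repeat.
    re.compile(r'(([\'"]{3}).*?\2)', re.MULTILINE | re.DOTALL)
=> re.DOTALL makes . match newlines.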
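The points above can be checked with a short sketch. It applies the suggested pattern to the example strings from the question, with and without re.DOTALL, and then addresses the follow-up about other delimiters such as ***: since * is a regex metacharacter, the delimiter has to be escaped, for which re.escape is one option. The between helper and the sample string containing *** are hypothetical, introduced only for illustration.

```python
import re

PATTERN = r'''((['"]{3}).*?\2)'''

# re.DOTALL lets "." match newlines as well.
TRIPLE_QUOTED = re.compile(PATTERN, re.DOTALL)

# The question's examples, written with real newline and tab characters.
samples = [
    "foobar89\n\nfoo\tbar; '''blah blah blah'8&^\"'''",
    "fjfdaslfdj; '''blah\n blah\n\t\t blah\n'8&^\"'''",
    'fjfdaslfdj; """blah\n blah\n\t\t blah\n\'8&^"""',
]

matches = [TRIPLE_QUOTED.search(s).group(1) for s in samples]
for m in matches:
    print(repr(m))

# Without the flag, "." cannot cross the newlines inside the delimiters.
no_flag = re.search(PATTERN, samples[1])
print(no_flag)


def between(delimiter):
    """Hypothetical helper: pattern for text enclosed by a literal delimiter."""
    d = re.escape(delimiter)  # "***" becomes r"\*\*\*", so "*" is literal
    return re.compile(d + r".*?" + d, re.DOTALL)


star_match = between("***").search("x; ***blah\n blah*** y").group(0)
print(repr(star_match))
```

The lazy quantifier .*? matters here: it stops at the first closing delimiter, so a stray ' or " inside the text does not end the match early and a later delimiter pair on the same input is not swallowed.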