
Regular expression to match anything between two symbols

While trying to learn more about regular expressions in Python, I'm finding it hard to match any characters (including newlines, tabs, spaces, etc.) between two symbols, including the symbols themselves.

For example:

  • foobar89\n\nfoo\tbar; '''blah blah blah'8&^"''' needs to match '''blah blah blah'8&^"'''

  • fjfdaslfdj; '''blah\n blah\n\t\t blah\n'8&^"''' needs to match '''blah\n blah\n\t\t blah\n'8&^"'''

(Note that \n and \t here stand for the newline and tab characters in the text file.)

Following a related question, I tried ^.*\'''(.*)\'''.*$ and *?\'''(.*)\'''.* without success.

Could someone point out what I'm doing wrong? I'd also appreciate a short explanation.

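Also, to understand the concept of escaping special characters, I'd like to know whether the regex would still work (on the corresponding strings) if the two delimiter symbols were swapped for others, e.g. from ''' to """ or ***.

For example:

  • fjfdaslfdj; """blah\n blah\n\t\t blah\n'8&^""" needs to match """blah\n blah\n\t\t blah\n'8&^"""

Update

Here is the code I'm using to test the regexes (adapted from an existing example):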

import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(code):
    token_specification = [
        # regexes suggested by Thomas Ayoub
        ('BOTH',      r'([\'"]{3}).*?\2'), # for both triple-single quotes and triple-double quotes
        ('SINGLE',    r"('''.*?''')"),     # triple-single quotes 
        ('DOUBLE',    r'(""".*?""")'),     # triple-double quotes 
        # regexes which match OK
        ('COM',       r'#.*'),
        ('NUMBER',  r'\d+(\.\d*)?'),  # Integer or decimal number
        ('ASSIGN',  r':='),           # Assignment operator
        ('END',     r';'),            # Statement terminator
        ('ID',      r'[A-Za-z]+'),    # Identifiers
        ('OP',      r'[+\-*/]'),      # Arithmetic operators
        ('NEWLINE', r'\n'),           # Line endings
        ('SKIP',    r'[ \t]+'),       # Skip over spaces and tabs
        ('MISMATCH',r'.'),            # Any other character
    ]

    test_regexes = ['COM', 'BOTH', 'SINGLE', 'DOUBLE']

    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    line_num = 1
    line_start = 0
    for mo in re.finditer(tok_regex, code):
        kind = mo.lastgroup
        value = mo.group(kind)
        if kind == 'NEWLINE':
            line_start = mo.end()
            line_num += 1
        elif kind == 'SKIP':
            pass
        elif kind == 'MISMATCH':
            pass
        else:
            if kind in test_regexes:
                print(kind, value)
            column = mo.start() - line_start
            yield Token(kind, value, line_num, column)

f = r'C:\path_to_python_file_with_above_examples'

with open(f) as sfile:
    content = sfile.read()

for t in tokenize(content):
    pass #print(t)
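
A likely reason the BOTH, SINGLE and DOUBLE rules in the tokenizer above miss literals that span several lines is that . does not match a newline unless re.DOTALL is in effect. Passing re.DOTALL to re.finditer would also let the COM rule #.* run to the end of the input, so the sketch below scopes the flag to a single rule with (?s:...), which assumes Python 3.6 or later. The rule list is trimmed for illustration, and the names STRING, quote and triple_quoted_tokens are hypothetical, not part of the original code.

```python
import re

# Hypothetical, trimmed-down rule list for illustration only.
token_specification = [
    # (?s:...) enables DOTALL for this rule alone, so "." may cross
    # newlines inside the quotes but nowhere else. The named
    # backreference (?P=quote) does not depend on group numbering.
    ('STRING',   r'(?s:(?P<quote>[\'"]{3}).*?(?P=quote))'),
    ('COM',      r'#.*'),     # still stops at the end of the line
    ('NEWLINE',  r'\n'),
    ('SKIP',     r'[ \t]+'),
    ('MISMATCH', r'.'),
]
tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)


def triple_quoted_tokens(code):
    """Yield (kind, value) pairs for STRING and COM tokens only."""
    for mo in re.finditer(tok_regex, code):
        kind = mo.lastgroup
        if kind in ('STRING', 'COM'):
            yield kind, mo.group(kind)


if __name__ == '__main__':
    demo = "x := 1 # note\ns = '''a\nb''' # tail\n"
    for kind, value in triple_quoted_tokens(demo):
        print(kind, repr(value))
```

Using a named backreference rather than \2 is a design choice: \2 only works in the original list because BOTH happens to be the first rule, so its inner group is group 2 of the joined pattern.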

You can go with:

((['"]{3}).*?\2)

See it running live in Python, and the regex running live.
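
As a minimal sketch, the pattern above can be tried on the strings from the question, assuming re.DOTALL is passed so that . also matches newlines. The helper names find_triple_quoted and between are hypothetical. The second helper illustrates the escaping question: a delimiter such as *** consists of regex metacharacters, so it is passed through re.escape before being placed in the pattern.

```python
import re

# Group 2 captures the opening run of three quote characters and \2
# requires the same text again to close the match.
TRIPLE_QUOTED = re.compile(r"""((['"]{3}).*?\2)""", re.DOTALL)

samples = [
    "foobar89\n\nfoo\tbar; '''blah blah blah'8&^\"'''",
    "fjfdaslfdj; '''blah\n blah\n\t\t blah\n'8&^\"'''",
    'fjfdaslfdj; """blah\n blah\n\t\t blah\n\'8&^"""',
]


def find_triple_quoted(text):
    """Return the first triple-quoted span (delimiters included), or None."""
    match = TRIPLE_QUOTED.search(text)
    return match.group(1) if match else None


def between(delimiter):
    """Compile a pattern for text enclosed by a literal delimiter such as '***'."""
    escaped = re.escape(delimiter)  # '***' becomes r'\*\*\*'
    return re.compile(escaped + r".*?" + escaped, re.DOTALL)


if __name__ == '__main__':
    for sample in samples:
        print(repr(find_triple_quoted(sample)))
    print(repr(between("***").search("a ***x\ny*** b").group(0)))
```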


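  • ^.*\'''(.*)\'''.*$ => you anchored the pattern to the start and end of the line, which cannot work when the match has to span several lines (by default, . does not match a newline).
  • *?\'''(.*)\'''.* => syntax error: the pattern opens with the quantifier *?, which has nothing to repeat.
  • re.compile(ur'(([\'"]{3}).*?\2)', re.MULTILINE | re.DOTALL) => re.DOTALL makes . match newlines as well. (The ur prefix is Python 2 only; in Python 3, write r'...'.)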
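
The three points above can be checked directly. This is a small sketch using the second example string from the question; the variable names are illustrative only.

```python
import re

text = "fjfdaslfdj; '''blah\n blah\n\t\t blah\n'8&^\"'''"

# Attempt 1: without re.DOTALL, "." stops at each newline, so the
# anchored pattern cannot stretch across the multi-line literal.
anchored = re.search(r"^.*\'''(.*)\'''.*$", text)
print(anchored)  # None

# Attempt 2: a pattern that starts with a quantifier does not compile.
error_message = None
try:
    re.compile(r"*?\'''(.*)\'''.*")
except re.error as exc:
    error_message = str(exc)
print("invalid pattern:", error_message)

# Suggested pattern: re.DOTALL lets "." cross newlines, and the lazy
# ".*?" stops at the first closing delimiter equal to group 2.
dotall = re.search(r"((['\"]{3}).*?\2)", text, re.DOTALL)
print(repr(dotall.group(1)))
```

re.MULTILINE is left out here because it only changes how ^ and $ behave, and the suggested pattern contains neither.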
