
How to write a regex for the following use case

I have the following text:

<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->

I need to get the tags along with their numbers. I have the following Python code for this:

import re

# feed holds the debug-output text shown above
result = {}
tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', str(feed))
for tag in tag_list:
    index = tag.index('(')
    result[tag[:index]] = int(tag.split("(")[1].rstrip(")"))
print(result)
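A slightly more compact sketch of the same extraction, assuming feed is the debug-output string shown above: putting the tag name and the number in two capture groups makes re.findall return (name, number) tuples, so no manual slicing around the "(" is needed. The names pair_re and result are just illustrative.

```python
import re

# The debug comment from the question, split across lines for readability.
feed = (
    "<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"
)

# Group 1 = tag name such as RENAME_CSS, group 2 = the digits inside the parentheses.
# Unnamed entries like "(1)" never match because a name is required.
pair_re = re.compile(r'([A-Z]+(?:_[A-Z\d]+)+)\((\d+)\)')
result = {name: int(count) for name, count in pair_re.findall(feed)}
print(result)
```

Like the original loop, this scans the whole string, so an Attempted entry and an Applied entry with the same name overwrite each other; in this particular text they happen to carry the same numbers.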

This prints the following output:

{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}

Now I want to do this only for the Applied entries in the text above, i.e. I only want the information under 'TextTransApplied' or 'TagTransApplied'.

I tried re.findall with the following pattern:

r'TextTransApplied:[A-Z]+(?:_[A-Z\d]+)+\(\d+\)'

but this only gives the first value. How can I get all of the applied values?
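A minimal reproduction of the behaviour described in the question, using the question's own text: because the literal prefix TextTransApplied: is part of the pattern, every match must start with it, and that prefix occurs only once in the string, so re.findall can return at most one match.

```python
import re

feed = (
    "<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"
)

pattern = r'TextTransApplied:[A-Z]+(?:_[A-Z\d]+)+\(\d+\)'
matches = re.findall(pattern, feed)
# The prefix is consumed by the first match and never appears again,
# so the list holds a single element.
print(matches)  # ['TextTransApplied:RENAME_JAVASCRIPT(9)']
```

This is why the answers below first capture the text of the section and then run a second pass over it.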

It is easier to first grab everything related to TagTransApplied / TextTransApplied, and then extract the parts you need:

import re

feed = """<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"""

result = dict()
tagged = re.findall(r'T(?:ag|ext)TransApplied[^;]+', str(feed))
for part in tagged:
    tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', part)
    for tag in tag_list:
        index = tag.index('(')
        result[tag[:index]] = int(tag.split("(")[1].rstrip(")"))
print(result)

Result:

{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}

ideone demo
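To see what the outer re.findall in this answer hands to the inner loop, you can print the intermediate list. This sketch reuses the feed string from the answer; the variable name item_re is only for illustration. Note that the TagTransApplied part contains just (73), which has no name in front of it, so the inner pattern finds nothing there and every dictionary entry comes from TextTransApplied.

```python
import re

feed = (
    "<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"
)

# Outer pass: each Applied section, up to (not including) the next ';'.
tagged = re.findall(r'T(?:ag|ext)TransApplied[^;]+', feed)

# Inner pass: the NAME(number) items inside each section.
item_re = r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)'
for part in tagged:
    # The TagTransApplied part prints with an empty list: "(73)" has no name.
    print(part, '->', re.findall(item_re, part))
```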

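Try capturing everything in a capture group, and then process that string.
(I modified your existing logic slightly, and changed RENAME_JAVASCRIPT(9) to RENAME_JAVASCRIPT(19) in the TextTransAttempted section just to illustrate the difference.)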

import re
s = '<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->'
tag_list = re.findall(r'(?:TextTransAttempted|TextTransApplied):\s*((?:(?:[A-Z]+(?:_[A-Z\d]+)+)?\(\d+\)\s*(?:,\s*|;))*)', s)
for tag in tag_list:
    result = {}
    for e in tag.split(","):
        index = e.index('(')
        if e[:index].strip():
            result[e[:index].strip()] = (e.split("(")[1].rstrip(");"))
    print(result)


'''
OUTPUT
>>> 
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '19', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '9', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
'''
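The two dictionaries above are printed without saying which section each one came from, and their values are strings. A possible refinement, sketched here with re.finditer and named groups (the group names section and body are illustrative choices, not from the answer), keeps the section name next to its items and converts the counts to int. It uses the modified string from this answer.

```python
import re

# This answer's test string: TextTransAttempted has RENAME_JAVASCRIPT(19).
s = (
    "<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), "
    "MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), "
    "IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"
)

# A section is its name, a colon, and everything up to the next ';'.
section_re = re.compile(r'(?P<section>TextTransAttempted|TextTransApplied):(?P<body>[^;]*)')
pair_re = re.compile(r'([A-Z]+(?:_[A-Z\d]+)+)\((\d+)\)')

by_section = {}
for m in section_re.finditer(s):
    # Unnamed entries such as "(1)" are skipped because pair_re requires a name.
    by_section[m.group('section')] = {
        name: int(count) for name, count in pair_re.findall(m.group('body'))
    }

for section, counts in by_section.items():
    print(section, counts)
```

Replacing the alternation with TextTransApplied|TagTransApplied would restrict this to the Applied sections the question asks about; TagTransApplied would then map to an empty dictionary, since its only entry, (73), has no name.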


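Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.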

 