How to write a regex for the following use case
I have the following text.
<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->
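I need to get the tags along with their numbers. I have the following Python code for this.
import re
result = {}
# feed holds the debug comment shown above
tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', str(feed))
for tag in tag_list:
    index = tag.index('(')
    result[tag[:index]] = int(tag.split("(")[1].rstrip(")"))
print(result)
This prints the following output: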
{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}
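Now I want to do this only for the applied transformations in the text above. For example, I only want the information listed under 'TextTransApplied' or 'TagTransApplied'.
I tried the following: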
re.findall(r'TextTransApplied:[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', str(feed))
But this only gives the first value. How can I get all of the applied values?
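A likely reason for the single result: every match has to begin with the literal prefix `TextTransApplied:`, and that prefix occurs only once in the text, so `re.findall` can find at most one match. The following is a minimal sketch on a shortened sample string (the sample is an assumption made for illustration, not the full debug comment):

```python
import re

# Shortened sample in the same format as the debug comment (illustrative only).
feed = ("TextTransAttempted:RENAME_CSS(3);"
        "TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), (1);")

# The attempted pattern: the literal prefix is part of every match.
pattern = r'TextTransApplied:[A-Z]+(?:_[A-Z\d]+)+\(\d+\)'
matches = re.findall(pattern, feed)
print(matches)  # ['TextTransApplied:RENAME_JAVASCRIPT(9)']
```

RENAME_IMAGE(59) is skipped because it is not directly preceded by the prefix.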
It is best to first grab everything related to TagTransApplied / TextTransApplied, and then extract the parts you need:
import re
feed = """<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"""
result = dict()
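# Step 1: grab each "...TransApplied" section up to the next ';'
tagged = re.findall(r'T(?:ag|ext)TransApplied[^;]+', str(feed))
for part in tagged:
    # Step 2: extract the NAME(number) entries from that section
    tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', part)
    for tag in tag_list:
        index = tag.index('(')
        result[tag[:index]] = int(tag.split("(")[1].rstrip(")"))
print(result)
Result: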
{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}
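The two-step idea above merges both Applied sections into one dict. If the sections need to stay apart, one possible variation is sketched below. The function name `parse_applied` and the shortened `sample` string are assumptions made for illustration; only `re.findall` with capture groups is used.

```python
import re

def parse_applied(text):
    """Return {section_name: {tag: count}} for the *Applied sections only."""
    sections = {}
    # Step 1: isolate each "...TransApplied:<body>" section up to the next ';'.
    for name, body in re.findall(r'(T(?:ag|ext)TransApplied):([^;]*)', text):
        # Step 2: pull (NAME, number) pairs out of that section's body.
        pairs = re.findall(r'([A-Z]+(?:_[A-Z\d]+)+)\((\d+)\)', body)
        sections[name] = {tag: int(count) for tag, count in pairs}
    return sections

# Shortened sample in the same format as the debug comment (illustrative only).
sample = ("<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), (1);"
          "TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_CSS(3), (1);"
          "TagTransAttempted:(73);TagTransApplied:(73); ] -->")
applied = parse_applied(sample)
print(applied)
```

Unnamed entries such as (1) and (73) are skipped, as with the original pattern, so TagTransApplied maps to an empty dict here.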
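Try capturing everything in a capture group, and then process that string.
(I slightly modified your existing logic, and changed RENAME_JAVASCRIPT(9) to RENAME_JAVASCRIPT(19) in the Attempted section just to illustrate the difference.)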
import re
s = '<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->'
tag_list = re.findall(r'(?:TextTransAttempted|TextTransApplied):\s*((?:(?:[A-Z]+(?:_[A-Z\d]+)+)?\(\d+\)\s*(?:,\s*|;))*)', s)
for tag in tag_list:
    result = {}
    for e in tag.split(","):
        index = e.index('(')
        if e[:index].strip():
            # keep only named entries; the values stay as strings
            result[e[:index].strip()] = (e.split("(")[1].rstrip(");"))
    print(result)
'''
OUTPUT
>>>
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '19', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '9', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
'''
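One caveat with the split step above: `e.index('(')` raises ValueError when an entry contains no parenthesis, which would happen if a captured section were empty. A hedged alternative is sketched below using `str.partition`, which also converts the counts to integers. The helper name `parse_entries` is hypothetical and not part of the answer.

```python
def parse_entries(captured):
    """Turn 'NAME(9), (1), OTHER(3);' into {'NAME': 9, 'OTHER': 3}."""
    result = {}
    for entry in captured.split(","):
        # partition never raises: sep is '' when there is no '(' in the entry.
        name, sep, rest = entry.strip().partition("(")
        if not sep or not name:
            continue  # skip empty strings and unnamed entries such as '(1)'
        result[name] = int(rest.rstrip(");"))
    return result

parsed = parse_entries("RENAME_JAVASCRIPT(9), (1), RENAME_CSS(3);")
print(parsed)             # {'RENAME_JAVASCRIPT': 9, 'RENAME_CSS': 3}
print(parse_entries(""))  # {}
```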