Python regex: remove $ and curly brackets from string elements if the string element is prefixed by $

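In a Python notebook, I have a string that I want to parse in a specific way, but I can't work out the regex I need. It probably doesn't matter, but the string was originally a complex nested dictionary, produced by converting an Oozie workflow XML file to a Python dictionary and then serializing it with the json module.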

'{"workflow-app": {"@xmlns": "uri:oozie:workflow:0.4", "@name": "simple-Workflow", "start": {"@to": "Create_External_Table"}, "action": [{"@name": "Create_External_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/external.hive}"}, "ok": {"@to": "Create_orc_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Create_orc_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/orc.hive}"}, "ok": {"@to": "Insert_into_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Insert_into_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/Copydata.hive}", "param": "${database_name}"}, "ok": {"@to": "end"}, "error": {"@to": "kill_job"}}], "kill": {"@name": "kill_job", "message": "Job failed"}, "end": {"@name": "end"}}}'

Either way, you will notice that some elements in the string are prefixed with a dollar sign, for example "${xyz.com:8088}", "${hdfs_path_of_script/external.hive}", and so on.

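Other elements are also wrapped in curly brackets, but for those elements, and only those, that are prefixed with a dollar sign, I want to remove the dollar sign and the curly brackets that immediately wrap the value.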

For the two examples above, I want to end up with "xyz.com:8088" and "hdfs_path_of_script/external.hive". This is what the final string should look like:

'{"workflow-app": {"@xmlns": "uri:oozie:workflow:0.4", "@name": "simple-Workflow", "start": {"@to": "Create_External_Table"}, "action": [{"@name": "Create_External_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/external.hive"}, "ok": {"@to": "Create_orc_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Create_orc_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/orc.hive"}, "ok": {"@to": "Insert_into_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Insert_into_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/Copydata.hive", "param": "database_name"}, "ok": {"@to": "end"}, "error": {"@to": "kill_job"}}], "kill": {"@name": "kill_job", "message": "Job failed"}, "end": {"@name": "end"}}}'

Can someone help me parse this? In case it matters, I'm using Python 3.7.

You can use recursion to walk the dictionary and change the appropriate values:

import re
import json


# Non-greedy capture, so a value holding several ${...} groups is unwrapped one group at a time
pat = re.compile(r"\$\{(.*?)\}")


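def transform(d):
    """Strip the ${...} wrapper from every string value, modifying d in place."""
    if isinstance(d, dict):
        for k, v in d.items():
            if isinstance(v, str):
                d[k] = pat.sub(r"\1", v)
            else:
                transform(v)
    elif isinstance(d, list):
        for i, v in enumerate(d):
            # strings that sit directly inside a list need replacing too
            if isinstance(v, str):
                d[i] = pat.sub(r"\1", v)
            else:
                transform(v)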


s = '{"workflow-app": {"@xmlns": "uri:oozie:workflow:0.4", "@name": "simple-Workflow", "start": {"@to": "Create_External_Table"}, "action": [{"@name": "Create_External_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/external.hive}"}, "ok": {"@to": "Create_orc_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Create_orc_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/orc.hive}"}, "ok": {"@to": "Insert_into_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Insert_into_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/Copydata.hive}", "param": "${database_name}"}, "ok": {"@to": "end"}, "error": {"@to": "kill_job"}}], "kill": {"@name": "kill_job", "message": "Job failed"}, "end": {"@name": "end"}}}'
data = json.loads(s)
transform(data)
print(json.dumps(data, indent=4))

Prints:

{
    "workflow-app": {
        "@xmlns": "uri:oozie:workflow:0.4",
        "@name": "simple-Workflow",
        "start": {
            "@to": "Create_External_Table"
        },
        "action": [
            {
                "@name": "Create_External_Table",
                "hive": {
                    "@xmlns": "uri:oozie:hive-action:0.4",
                    "job-tracker": "xyz.com:8088",
                    "name-node": "hdfs://rootname",
                    "script": "hdfs_path_of_script/external.hive"
                },
                "ok": {
                    "@to": "Create_orc_Table"
                },
                "error": {
                    "@to": "kill_job"
                }
            },
            {
                "@name": "Create_orc_Table",
                "hive": {
                    "@xmlns": "uri:oozie:hive-action:0.4",
                    "job-tracker": "xyz.com:8088",
                    "name-node": "hdfs://rootname",
                    "script": "hdfs_path_of_script/orc.hive"
                },
                "ok": {
                    "@to": "Insert_into_Table"
                },
                "error": {
                    "@to": "kill_job"
                }
            },
            {
                "@name": "Insert_into_Table",
                "hive": {
                    "@xmlns": "uri:oozie:hive-action:0.4",
                    "job-tracker": "xyz.com:8088",
                    "name-node": "hdfs://rootname",
                    "script": "hdfs_path_of_script/Copydata.hive",
                    "param": "database_name"
                },
                "ok": {
                    "@to": "end"
                },
                "error": {
                    "@to": "kill_job"
                }
            }
        ],
        "kill": {
            "@name": "kill_job",
            "message": "Job failed"
        },
        "end": {
            "@name": "end"
        }
    }
}
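
If mutating the parsed data in place is undesirable, the same recursive idea can be written as a function that builds and returns a new structure. This is only a sketch: the names `WRAPPED` and `strip_wrappers` and the small `sample` document are made up for illustration, and the pattern assumes that placeholder bodies never contain a closing brace.

```python
import json
import re

# Matches each ${...} placeholder; [^}]* stops at the first closing brace.
# (Hypothetical name, not part of the answer above.)
WRAPPED = re.compile(r"\$\{([^}]*)\}")


def strip_wrappers(node):
    """Return a copy of node with the ${...} wrapper removed from every string value."""
    if isinstance(node, dict):
        return {key: strip_wrappers(value) for key, value in node.items()}
    if isinstance(node, list):
        return [strip_wrappers(item) for item in node]
    if isinstance(node, str):
        return WRAPPED.sub(r"\1", node)
    # numbers, booleans and None pass through unchanged
    return node


# A cut-down, invented document in the same shape as the workflow JSON
sample = json.loads(
    '{"hive": {"job-tracker": "${xyz.com:8088}",'
    ' "params": ["${database_name}", "plain"]},'
    ' "ok": {"@to": "end"}}'
)
cleaned = strip_wrappers(sample)
print(json.dumps(cleaned))
```

Because a new structure is returned, `sample` still holds the original `${...}` values afterwards, which can be handy when both versions are needed.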

I would probably load the JSON and process the data, but this regex does what you asked for:

import re

# your original JSON
ins = '{"workflow-app": {"@xmlns": "uri:oozie:workflow:0.4", "@name": "simple-Workflow", "start": {"@to": "Create_External_Table"}, "action": [{"@name": "Create_External_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/external.hive}"}, "ok": {"@to": "Create_orc_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Create_orc_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/orc.hive}"}, "ok": {"@to": "Insert_into_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Insert_into_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "${xyz.com:8088}", "name-node": "${hdfs://rootname}", "script": "${hdfs_path_of_script/Copydata.hive}", "param": "${database_name}"}, "ok": {"@to": "end"}, "error": {"@to": "kill_job"}}], "kill": {"@name": "kill_job", "message": "Job failed"}, "end": {"@name": "end"}}}'

# this is your expected output string
outs = '{"workflow-app": {"@xmlns": "uri:oozie:workflow:0.4", "@name": "simple-Workflow", "start": {"@to": "Create_External_Table"}, "action": [{"@name": "Create_External_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/external.hive"}, "ok": {"@to": "Create_orc_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Create_orc_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/orc.hive"}, "ok": {"@to": "Insert_into_Table"}, "error": {"@to": "kill_job"}}, {"@name": "Insert_into_Table", "hive": {"@xmlns": "uri:oozie:hive-action:0.4", "job-tracker": "xyz.com:8088", "name-node": "hdfs://rootname", "script": "hdfs_path_of_script/Copydata.hive", "param": "database_name"}, "ok": {"@to": "end"}, "error": {"@to": "kill_job"}}], "kill": {"@name": "kill_job", "message": "Job failed"}, "end": {"@name": "end"}}}'

# Replace every quoted string that...
# * starts with a "
# * then has '${'
# * then has any number of characters, captured non-greedily with (.*?)
# * then has '}'
# * then ends with a "
# Replace it with the capture in \1, surrounded by quotes
subbed = re.sub(r'"\$\{(.*?)\}"', r'"\1"', ins)


print(subbed == outs)
# this outputs True
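
Two details of this pattern are easy to miss: the surrounding quotes restrict it to values that consist entirely of one `${...}` group, and the non-greedy `(.*?)` is what keeps each match inside a single value. The sketch below illustrates both on a small invented string (`raw` is made up for this example and is not the workflow JSON).

```python
import json
import re

# Invented input: two fully wrapped values and one value that merely contains ${...}
raw = '{"a": "${x}", "b": {"c": "${y/z}"}, "d": "keep ${this}"}'

# Non-greedy: each fully wrapped value is unwrapped on its own
lazy = re.sub(r'"\$\{(.*?)\}"', r'"\1"', raw)

# Greedy: a single match runs from the first "${ to the last }" in the string
greedy = re.sub(r'"\$\{(.*)\}"', r'"\1"', raw)

print(lazy)
print(lazy == greedy)

# The non-greedy result is still valid JSON, and "d" is left untouched
parsed = json.loads(lazy)
```

Here `"keep ${this}"` is left as it is because the opening quote is not immediately followed by `${`, which matches the requirement that only elements prefixed with the dollar sign are changed.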

