Extracting specific XML tag values in Python

Question

I have an XML file that contains tags like these.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DataFlows>
    <DataFlow id="ABC">
        <Flow name="flow4" type="Ingest">
            <Ingest dataSourceName="type1" tableName="table1">
                <DataSet>
                    <DataSetRef>value1-${d1}-${t1}</DataSetRef>
                    <DataStore>ingest</DataStore>
                </DataSet>
                <Mode>Overwrite</Mode>
            </Ingest>
        </Flow>
    </DataFlow>
    <DataFlow id="MHH" dependsOn="ABC">
        <Flow name="flow5" type="Reconcile">
            <Reconciliation>
                <Source>QW</Source>
                <Target>EF</Target>
                <ComparisonKey>
                    <Column>dealNumber</Column>
                </ComparisonKey>
                <ReconcileColumns mode="required">
                    <Column>bookId</Column>
                </ReconcileColumns>
            </Reconciliation>
        </Flow>
        <Flow name="output" type="Export" format="Native">
            <Table publishToSQLServer="true">
                <DataSet>
                    <DataSetRef>value4_${cob}_${ts}</DataSetRef>
                    <DataStore>recon</DataStore>
                    <Date>${run_date}</Date>
                </DataSet>
                <Mode>Overwrite</Mode>
            </Table>
        </Flow>
    </DataFlow>
</DataFlows>

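I want to process this XML in Python using minidom, Python's minimal DOM implementation. I only need to extract the information inside the DataSet tag when a Flow has the type "Reconcile".

For example:

If the Flow type is "Reconcile", I need to move on to the next Flow tag, the one named "output", and extract the values of its DataSetRef, DataStore and Date tags.

So far I have tried the code below, but I get blank values for all of my fields.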

#!/usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("Store.xml")
collection = DOMTree.documentElement
#if collection.hasAttribute("DataFlows"):
#   print "Root element : %s" % collection.getAttribute("DataFlows")
pretty = DOMTree.toprettyxml()
print "Collectio: %s" % collection
dataflows = DOMTree.getElementsByTagName("DataFlow")

# Print detail of each movie.
for dataflow in dataflows:
    print "*****dataflow*****"
    if dataflow.hasAttribute("dependsOn"):
        print "Depends On is present"
        flows = DOMTree.getElementsByTagName("Flow")
        print "flows"
        for flow in flows:
            print "******flow******"
            if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
                flowByReconcileType = flow.getAttribute("type")
                TagValue = flow.getElementsByTagName("DataSet")
                print "Tag Value is %s" % TagValue
                print "flow type is: %s" % flowByReconcileType

After that, I need to pass the three values extracted above to a Unix shell script that processes some directories. Any help would be appreciated.

Answer 1

First, check that your XML is well-formed. You are missing the root tag, and you have used the wrong kind of double quote here, for example <Flow name=“flow4" type="Ingest">
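One way to run that check from Python is to let minidom attempt the parse and catch the parser error. This is only a minimal sketch: is_well_formed is a hypothetical helper name, and it verifies well-formedness only, not that the document has the tags you expect.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if minidom can parse xml_text, False otherwise."""
    try:
        minidom.parseString(xml_text)
    except ExpatError as err:
        # The error message includes the line and column of the first problem.
        print("Not well-formed: %s" % err)
        return False
    return True
```

A curly opening quote in an attribute, or two top-level elements with no single root, should both make this return False.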

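In your code, you are getting the DataFlow elements correctly.

You don't need to query DOMTree again for the flows. You can get the flows of each dataflow by querying the dataflow itself, like this: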

flows = dataflow.getElementsByTagName("Flow")
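To illustrate the difference, here is a small sketch on a trimmed-down version of the document (the SAMPLE string is an abbreviation made up for this example). Calling getElementsByTagName on the document returns every Flow in the file, while calling it on one DataFlow element returns only the Flow elements nested inside that element.

```python
from xml.dom import minidom

# Abbreviated stand-in for the XML in the question.
SAMPLE = """<DataFlows>
  <DataFlow id="ABC">
    <Flow name="flow4" type="Ingest"/>
  </DataFlow>
  <DataFlow id="MHH" dependsOn="ABC">
    <Flow name="flow5" type="Reconcile"/>
    <Flow name="output" type="Export"/>
  </DataFlow>
</DataFlows>"""

doc = minidom.parseString(SAMPLE)

# Document-wide query: every Flow, regardless of its parent DataFlow.
all_flow_names = [f.getAttribute("name") for f in doc.getElementsByTagName("Flow")]

# Element-scoped query: only the Flow descendants of each DataFlow.
flows_per_dataflow = {
    df.getAttribute("id"): [
        f.getAttribute("name") for f in df.getElementsByTagName("Flow")
    ]
    for df in doc.getElementsByTagName("DataFlow")
}

print(all_flow_names)
print(flows_per_dataflow)
```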

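The check if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile": looks fine to me. To get the next flow item, you can do the following, always checking that the index stays within the bounds of the list.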

for index, flow in enumerate(flows):
    if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
        if index + 1 < len(flows):
            your_flow = flows[index + 1]
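Putting the pieces together, a possible end-to-end sketch in Python 3 looks like this. The blank values in the question most likely come from looking for DataSet inside the Reconcile flow itself, which has none; the DataSet lives in the flow that follows it. The helper names child_text, extract_after_reconcile and build_command, the abbreviated SAMPLE document, and the script path ./process_dirs.sh are all assumptions made for illustration.

```python
from xml.dom import minidom

# Abbreviated stand-in for the second DataFlow in the question.
SAMPLE = """<DataFlows>
  <DataFlow id="MHH" dependsOn="ABC">
    <Flow name="flow5" type="Reconcile">
      <Reconciliation><Source>QW</Source></Reconciliation>
    </Flow>
    <Flow name="output" type="Export" format="Native">
      <Table publishToSQLServer="true">
        <DataSet>
          <DataSetRef>value4_${cob}_${ts}</DataSetRef>
          <DataStore>recon</DataStore>
          <Date>${run_date}</Date>
        </DataSet>
      </Table>
    </Flow>
  </DataFlow>
</DataFlows>"""

TAGS = ("DataSetRef", "DataStore", "Date")


def child_text(parent, tag):
    """Return the text of the first <tag> under parent, or None if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data.strip()


def extract_after_reconcile(xml_text):
    """Collect DataSet values from the flow that follows each Reconcile flow."""
    doc = minidom.parseString(xml_text)
    results = []
    for dataflow in doc.getElementsByTagName("DataFlow"):
        flows = dataflow.getElementsByTagName("Flow")
        for index, flow in enumerate(flows):
            if flow.getAttribute("type") != "Reconcile":
                continue
            if index + 1 >= len(flows):
                continue  # Reconcile is the last flow, nothing follows it.
            next_flow = flows[index + 1]
            for dataset in next_flow.getElementsByTagName("DataSet"):
                results.append({tag: child_text(dataset, tag) for tag in TAGS})
    return results


def build_command(values, script="./process_dirs.sh"):
    """Build an argument list for a hypothetical shell script."""
    # Missing tags become empty strings so argument positions stay fixed.
    return [script] + [values[tag] or "" for tag in TAGS]


if __name__ == "__main__":
    for values in extract_after_reconcile(SAMPLE):
        print(build_command(values))
```

To actually run the script, the list can be handed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids the shell, so placeholders such as ${cob} reach the script unexpanded.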


Solution 1
0 · accepted · 2016-10-10 07:31:05
