
Extract specific XML tag values in Python

I have an XML file that contains tags like these:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DataFlows>
    <DataFlow id="ABC">
        <Flow name="flow4" type="Ingest">
            <Ingest dataSourceName="type1" tableName="table1">
                <DataSet>
                    <DataSetRef>value1-${d1}-${t1}</DataSetRef>
                    <DataStore>ingest</DataStore>
                </DataSet>
                <Mode>Overwrite</Mode>
            </Ingest>
        </Flow>
    </DataFlow>
    <DataFlow id="MHH" dependsOn="ABC">
        <Flow name="flow5" type="Reconcile">
            <Reconciliation>
                <Source>QW</Source>
                <Target>EF</Target>
                <ComparisonKey>
                    <Column>dealNumber</Column>
                </ComparisonKey>
                <ReconcileColumns mode="required">
                    <Column>bookId</Column>
                </ReconcileColumns>
            </Reconciliation>
        </Flow>
        <Flow name="output" type="Export" format="Native">
            <Table publishToSQLServer="true">
                <DataSet>
                    <DataSetRef>value4_${cob}_${ts}</DataSetRef>
                    <DataStore>recon</DataStore>
                    <Date>${run_date}</Date>
                </DataSet>
                <Mode>Overwrite</Mode>
            </Table>
        </Flow>
    </DataFlow>
</DataFlows>

I want to process this XML in Python using the minimal DOM implementation (xml.dom.minidom). I need to extract the information in the DataSet tag only when the Flow type is "Reconcile".

For example:

If the Flow type is "Reconcile", then I need to go to the next Flow tag, named "output", and extract the values of its DataSetRef, DataStore and Date tags.

So far I have tried the code below, but I get blank values in all my fields.

#!/usr/bin/python

import xml.dom.minidom

# Open the XML document using the minidom parser
DOMTree = xml.dom.minidom.parse("Store.xml")
collection = DOMTree.documentElement

#if collection.hasAttribute("DataFlows"):
#   print("Root element : %s" % collection.getAttribute("DataFlows"))

pretty = DOMTree.toprettyxml()
print("Collection: %s" % collection)

dataflows = DOMTree.getElementsByTagName("DataFlow")

# Print the details of each dataflow.
for dataflow in dataflows:
    print("*****dataflow*****")
    if dataflow.hasAttribute("dependsOn"):
        print("Depends On is present")
        flows = DOMTree.getElementsByTagName("Flow")
        print("flows")
        for flow in flows:
            print("******flow******")
            if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
                flowByReconcileType = flow.getAttribute("type")
                TagValue = flow.getElementsByTagName("DataSet")
                print("Tag Value is %s" % TagValue)
                print("flow type is: %s" % flowByReconcileType)

From there, I need to pass these three extracted values to Unix shell scripts to process some directories. Any help would be appreciated.
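For the shell-script step, one common approach is subprocess.run with an argument list, so each value reaches the script as a separate positional parameter ($1, $2, $3) and no shell re-parses it. A minimal sketch, assuming a hypothetical script path and argument order (DataSetRef, DataStore, Date); neither comes from the question:

```python
import subprocess


def build_command(script_path, dataset_ref, data_store, run_date):
    """Build the argv list (hypothetical order: DataSetRef, DataStore, Date)."""
    return [script_path, dataset_ref, data_store, run_date]


def run_script(script_path, dataset_ref, data_store, run_date):
    """Run the script directly, without a shell, and return its stdout.

    Because no shell is involved, placeholders such as ${cob} are passed
    to the script literally; the script itself decides how to expand them.
    """
    command = build_command(script_path, dataset_ref, data_store, run_date)
    # check=True raises CalledProcessError if the script exits non-zero.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Usage would be something like run_script("./process_dirs.sh", ref, store, date), where process_dirs.sh is a placeholder name and must be executable (or prefix the list with "sh").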

First of all, check that your XML is well formed. You are missing a root tag, and you have the wrong kind of double quote, for example here: <Flow name=“flow4" type="Ingest">
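A quick way to check well-formedness from Python is to let minidom parse the text and catch the Expat error it raises. A minimal sketch; the helper is_well_formed and the three sample strings are illustrative, not taken from the question:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if minidom can parse xml_text, False otherwise."""
    try:
        minidom.parseString(xml_text)
    except ExpatError as err:
        # The error message includes the line and column where parsing failed.
        print(f"Not well formed: {err}")
        return False
    return True


good = '<DataFlows><DataFlow id="ABC"/></DataFlows>'
# A curly opening quote is not a valid attribute delimiter.
curly_quote = '<DataFlows><Flow name=\u201cflow4" type="Ingest"/></DataFlows>'
# Two top-level elements: there is no single root element.
no_root = '<DataFlow id="ABC"/><DataFlow id="MHH"/>'

for sample in (good, curly_quote, no_root):
    print(is_well_formed(sample))
```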

In your code you are correctly grabbing the dataflows.

You don't need to query the DOMTree again for the flows; you can get each dataflow's own flows by querying the dataflow element itself:

flows = dataflow.getElementsByTagName("Flow")
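The difference matters because getElementsByTagName on the document returns every matching element in the whole file, while calling it on an element returns only that element's descendants. A small illustration on a trimmed, attribute-only copy of the question's XML (the trimming is ours):

```python
from xml.dom import minidom

XML = """<DataFlows>
  <DataFlow id="ABC">
    <Flow name="flow4" type="Ingest"/>
  </DataFlow>
  <DataFlow id="MHH" dependsOn="ABC">
    <Flow name="flow5" type="Reconcile"/>
    <Flow name="output" type="Export"/>
  </DataFlow>
</DataFlows>"""

dom = minidom.parseString(XML)

# Document-wide query: every Flow in the file, whatever DataFlow it is in.
all_flows = dom.getElementsByTagName("Flow")
print(len(all_flows))  # 3

for dataflow in dom.getElementsByTagName("DataFlow"):
    # Element-scoped query: only the Flow descendants of this DataFlow.
    flows = dataflow.getElementsByTagName("Flow")
    names = [flow.getAttribute("name") for flow in flows]
    print(dataflow.getAttribute("id"), names)
```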

Your condition if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile": looks OK to me, so to get the next flow item you can do something like this, always checking that the next index is still inside the list:

for index, flow in enumerate(flows):
    if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
        if index + 1 < len(flows):
            your_flow = flows[index + 1]
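Putting these pieces together, a minimal sketch could also read the text of DataSetRef, DataStore and Date from the following Flow. The blank values in the question come partly from printing a NodeList or Element, which shows only its repr, not its text. The helper names element_text and reconcile_outputs are hypothetical, and the sketch assumes the DataSet sits somewhere inside the Flow that follows the Reconcile Flow, as in the question's XML:

```python
from xml.dom import minidom

# Trimmed copy of the question's XML; with the real file use minidom.parse("Store.xml").
SAMPLE = """<DataFlows>
    <DataFlow id="MHH" dependsOn="ABC">
        <Flow name="flow5" type="Reconcile">
            <Reconciliation><Source>QW</Source><Target>EF</Target></Reconciliation>
        </Flow>
        <Flow name="output" type="Export" format="Native">
            <Table publishToSQLServer="true">
                <DataSet>
                    <DataSetRef>value4_${cob}_${ts}</DataSetRef>
                    <DataStore>recon</DataStore>
                    <Date>${run_date}</Date>
                </DataSet>
            </Table>
        </Flow>
    </DataFlow>
</DataFlows>"""


def element_text(parent, tag_name):
    """Return the text of the first <tag_name> below parent, or '' if absent."""
    nodes = parent.getElementsByTagName(tag_name)
    if not nodes:
        return ""
    # Concatenate the element's text children and trim surrounding whitespace.
    return "".join(
        child.data for child in nodes[0].childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def reconcile_outputs(dom):
    """Yield (DataSetRef, DataStore, Date) from the Flow after each Reconcile Flow."""
    for dataflow in dom.getElementsByTagName("DataFlow"):
        flows = dataflow.getElementsByTagName("Flow")
        for index, flow in enumerate(flows):
            # getAttribute returns "" when the attribute is missing.
            if flow.getAttribute("type") != "Reconcile":
                continue
            if index + 1 >= len(flows):
                continue  # Reconcile is the last Flow; nothing follows it.
            datasets = flows[index + 1].getElementsByTagName("DataSet")
            if not datasets:
                continue
            dataset = datasets[0]
            yield (
                element_text(dataset, "DataSetRef"),
                element_text(dataset, "DataStore"),
                element_text(dataset, "Date"),
            )


dom = minidom.parseString(SAMPLE)
for dataset_ref, data_store, run_date in reconcile_outputs(dom):
    print(dataset_ref, data_store, run_date)
```

Making reconcile_outputs a generator keeps the extraction separate from whatever you do with the values next, such as passing them on to a shell script.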
