How do I search for a tag in an XML file using ElementTree where I have a certain "parent" tag with a specific value? (Python)

I just started learning Python and have to write a program that parses XML files. I have to find a certain tag called OrganisationReference in 2 different files and return it. In fact there are multiple tags with this name, but only one of them, the one I am trying to return, has the tag OrganisationType with the value DEALER as a parent tag (not quite sure whether the term is right). I tried to use ElementTree for this. Here is the code:

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    tree2 = ET.parse('Master2.xml')
    root2 = tree2.getroot()

    for OrganisationReference in root1.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

    for OrganisationReference in root2.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

But this returns nothing (and raises no error either). Can somebody help me?

My file looks like this:

  <MessageOrganisationCount>a</MessageOrganisationCount>
  <MessageVehicleCount>x</MessageVehicleCount>
  <MessageCreditLineCount>y</MessageCreditLineCount>
  <MessagePlanCount>z</MessagePlanCount>
  <OrganisationData>
      <Organisation>
          <OrganisationId>
              <OrganisationType>DEALER</OrganisationType>
              <OrganisationReference>WHATINEED</OrganisationReference>
          </OrganisationId>
          <OrganisationName>XYZ.</OrganisationName>
 ....

Because OrganisationReference appears several more times in this file with different text between the start and end tags, I want exactly the one you see in line 9: it has OrganisationId as its parent tag, and OrganisationType with the text DEALER is also a child tag of OrganisationId.

You were super close with your original attempt. You just need to make a couple of changes to your xpath and a tiny change to your Python.

The first part of your xpath starts with ./Organisation. Since you're running the xpath from the root element, it expects Organisation to be a child of the root. It's not; it's a descendant.

Try changing ./Organisation to .//Organisation. (// is short for /descendant-or-self::node()/.)
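To see the difference between the child step and the descendant step, here is a minimal sketch. It assumes a trimmed-down copy of the sample document wrapped in a doc root element (the same wrapper used in the full example in this answer) and parses it from a string instead of a file:

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample document, wrapped in a doc root element
# (an assumption made so the snippet is well-formed).
xml_text = """
<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationName>XYZ.</OrganisationName>
        </Organisation>
    </OrganisationData>
</doc>
"""

root = ET.fromstring(xml_text)

# ./Organisation only looks at direct children of the root element.
direct_children = root.findall("./Organisation")

# .//Organisation searches all descendants of the root element.
descendants = root.findall(".//Organisation")

print(len(direct_children))  # 0
print(len(descendants))      # 1
```

Organisation sits inside OrganisationData, so only the descendant search finds it.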

The second issue is with OrganisationId/[@OrganisationType='DEALER']. That's invalid xpath. The / should be removed from between OrganisationId and the predicate.

Also, @ is abbreviated syntax for the attribute:: axis, and OrganisationType is an element, not an attribute.

Try changing OrganisationId/[@OrganisationType='DEALER'] to OrganisationId[OrganisationType='DEALER'].
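As a rough illustration of the attribute predicate versus the child-element predicate, the sketch below runs both against a small inline document. The second OrganisationId entry (with the values OTHER and NOTTHIS) is made up for the demo and is not from the original file:

```python
import xml.etree.ElementTree as ET

# Two OrganisationId elements; the second one is a hypothetical entry
# added only to show that the predicate filters it out.
xml_text = """
<doc>
    <OrganisationId>
        <OrganisationType>DEALER</OrganisationType>
        <OrganisationReference>WHATINEED</OrganisationReference>
    </OrganisationId>
    <OrganisationId>
        <OrganisationType>OTHER</OrganisationType>
        <OrganisationReference>NOTTHIS</OrganisationReference>
    </OrganisationId>
</doc>
"""

root = ET.fromstring(xml_text)

# [@OrganisationType='DEALER'] tests an attribute, which no element here has.
by_attribute = root.findall("./OrganisationId[@OrganisationType='DEALER']")

# [OrganisationType='DEALER'] tests the text of a child element.
by_child = root.findall("./OrganisationId[OrganisationType='DEALER']")

print(len(by_attribute))  # 0
print([e.findtext("OrganisationReference") for e in by_child])  # ['WHATINEED']
```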

The Python issue is with print(OrganisationReference.attrib). The OrganisationReference element doesn't have any attributes; just text.

Try changing print(OrganisationReference.attrib) to print(OrganisationReference.text).

Here's an example using just one XML file for demo purposes...

XML Input (Master1.xml; with doc element added to make it well-formed)

<doc>
    <MessageOrganisationCount>a</MessageOrganisationCount>
    <MessageVehicleCount>x</MessageVehicleCount>
    <MessageCreditLineCount>y</MessageCreditLineCount>
    <MessagePlanCount>z</MessagePlanCount>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
            <OrganisationName>XYZ.</OrganisationName>
        </Organisation>
    </OrganisationData>
</doc>

Python

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')
root1 = tree1.getroot()

for OrganisationReference in root1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)

Printed Output

WHATINEED

Also note that it doesn't appear that you need to use getroot() at all. You can use findall() directly on the tree...

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')

for OrganisationReference in tree1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)
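Since the question involves two files, the same search could be wrapped in a small helper and run once per file. This is only a sketch: find_dealer_references is a hypothetical name, the in-memory io.StringIO objects stand in for Master1.xml and Master2.xml, and the SECONDREF value is invented for the demo.

```python
import io
import xml.etree.ElementTree as ET

DEALER_XPATH = (
    ".//Organisation/OrganisationId[OrganisationType='DEALER']"
    "/OrganisationReference"
)

def find_dealer_references(source):
    """Return the text of every DEALER OrganisationReference in one XML source.

    source can be anything ET.parse() accepts: a file name or a file object.
    """
    tree = ET.parse(source)
    return [element.text for element in tree.findall(DEALER_XPATH)]

def make_sample(reference):
    # Builds an in-memory stand-in for one of the Master files (demo only).
    return io.StringIO(
        "<doc><OrganisationData><Organisation><OrganisationId>"
        "<OrganisationType>DEALER</OrganisationType>"
        f"<OrganisationReference>{reference}</OrganisationReference>"
        "</OrganisationId></Organisation></OrganisationData></doc>"
    )

sources = [make_sample("WHATINEED"), make_sample("SECONDREF")]
results = [find_dealer_references(source) for source in sources]
print(results)  # [['WHATINEED'], ['SECONDREF']]
```

With real files, the list of sources would simply be the two file names, for example ['Master1.xml', 'Master2.xml'].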

You can use a for loop with a check inside it. First you check whether the text of OrganisationType is DEALER, and then you get the text of the OrganisationReference that you need.

If you want to learn more about parsing XML with Python, I strongly recommend the documentation of the xml.etree.ElementTree module.

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')
root1 = tree1.getroot()

tree2 = ET.parse('Master2.xml')
root2 = tree2.getroot()

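# Find each OrganisationId whose first child (OrganisationType) has the text DEALER.
# .// is used because Organisation is a descendant of the root, not a direct child.
for element in root1.findall('.//Organisation/OrganisationId'):
    if element[0].text == "DEALER":
        # The second child is the OrganisationReference
        print(element[1].text)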

This works if the first tag in your OrganisationId is OrganisationType :)
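If the order of the child tags is not guaranteed, one possible variation is to look the children up by tag name instead of by position. The sketch below assumes the same element names as the sample file and deliberately swaps the two children in an inline document to show that the order no longer matters:

```python
import xml.etree.ElementTree as ET

# Inline sample with the two children swapped on purpose (demo data only).
xml_text = """
<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationReference>WHATINEED</OrganisationReference>
                <OrganisationType>DEALER</OrganisationType>
            </OrganisationId>
        </Organisation>
    </OrganisationData>
</doc>
"""

root = ET.fromstring(xml_text)

references = []
# iter() walks every OrganisationId element anywhere below the root.
for organisation_id in root.iter("OrganisationId"):
    # findtext() returns the text of the first matching child, or None.
    if organisation_id.findtext("OrganisationType") == "DEALER":
        references.append(organisation_id.findtext("OrganisationReference"))

print(references)  # ['WHATINEED']
```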
