
XML parsing with Python, more than 1 level deep

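I have a very large XML file and want to fetch some records based on the text of a child node. Say I have the XML below, and I want to get the price value whenever the item's taste is "good". I tried minidom and ET.ElementTree but could not find a suitable approach.

I would like to do something like this: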

from xml.dom.minidom import parse, parseString
dom = parse( "file.xml" )
for node in dom.getElementsByTagName('food'):
    node_child=node.getAttribute('description')
    taste=node_child.getAttribute('taste')
    if taste=='good':
        price=node.getAttribute('price')

Any ideas?

<breakfast_menu>
 <food>
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description>
   <taste>good</taste>
   <sight>bad</sight>
 </description>
 <calories>650</calories>
</food>
<food>
 <name>Strawberry Belgian Waffles</name>
 <price>$7.95</price>
 <description>
   <taste>bad</taste>
   <sight>bad</sight>
 </description>
 <calories>900</calories>
</food>
<food>
 <name>Berry-Berry Belgian Waffles</name>
 <price>$8.95</price>
 <description>
  <taste>good</taste>
  <sight>good</sight>
 </description>
 <calories>900</calories>
</food>
<food>
 <name>French Toast</name>
 <price>$4.50</price>
 <description>
   <taste>good</taste>
   <sight>bad</sight>
 </description>
 <calories>600</calories>
</food>
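
For reference, the minidom attempt above fails because `description`, `taste` and `price` are child elements, not attributes, so `getAttribute` returns an empty string for them. The following is a minimal sketch of how the same idea could be written with minidom. The helper names `first_text` and `good_prices` and the shortened `SAMPLE` string are assumptions made for illustration, and the sample has the closing `</breakfast_menu>` tag added so that it parses.

```python
from xml.dom.minidom import parseString

# Shortened, well-formed sample (closing </breakfast_menu> added for this sketch).
SAMPLE = """<breakfast_menu>
 <food>
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description><taste>good</taste><sight>bad</sight></description>
 </food>
 <food>
  <name>Strawberry Belgian Waffles</name>
  <price>$7.95</price>
  <description><taste>bad</taste><sight>bad</sight></description>
 </food>
</breakfast_menu>"""


def first_text(node, tag):
    """Return the stripped text of the first descendant element named `tag`."""
    element = node.getElementsByTagName(tag)[0]
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def good_prices(xml_text):
    """Collect (name, price) for every <food> whose <taste> is 'good'."""
    dom = parseString(xml_text)
    return [
        (first_text(food, "name"), first_text(food, "price"))
        for food in dom.getElementsByTagName("food")
        if first_text(food, "taste") == "good"
    ]


print(good_prices(SAMPLE))  # [('Belgian Waffles', '$5.95')]
```

Note that `parseString` builds the whole DOM in memory, so this is probably not the best fit for a very large file.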

You can use lxml to parse it.

Code:

from lxml import html

data = """
    <breakfast_menu>
        <food>
            <name>Belgian Waffles</name>
            <price>$5.95</price>
            <description>
                <taste>good</taste>
                <sight>bad</sight>
            </description>
            <calories>650</calories>
        </food>
        <food>
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
                <taste>bad</taste>
                <sight>bad</sight>
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>Berry-Berry Belgian Waffles</name>
            <price>$8.95</price>
            <description>
                <taste>good</taste>
                <sight>good</sight>
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>French Toast</name>
            <price>$4.50</price>
            <description>
                <taste>good</taste>
                <sight>bad</sight>
            </description>
            <calories>600</calories>
        </food>
    """

tree = html.fromstring(data)
tastes = tree.xpath("//taste")
for taste in tastes:
    # <taste> sits inside <description>, which sits inside <food>.
    foodparent = taste.getparent().getparent()
    name = foodparent.xpath("name")[0].text
    if taste.text == "good":
        price = foodparent.xpath("price")[0].text
        print("%s: %s" % (name, price))
    else:
        print("%s: %s" % (name, "Taste is bad, yuck."))

Result:

Belgian Waffles: $5.95
Strawberry Belgian Waffles: Taste is bad, yuck.
Berry-Berry Belgian Waffles: $8.95
French Toast: $4.50

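Let us know if this helps.

Assuming your XML is stored in a string variable named xml_string, you can use ElementTree's XPath support to select every food element that contains a description element whose taste element has the value "good". You can then extract whatever information you want from those food elements.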

from xml.etree import ElementTree

tree = ElementTree.fromstring(xml_string)

food_elements = tree.findall('.//food/description[taste="good"]/..')
prices = [(food.find('name').text, food.find('price').text) for food in food_elements]
print(prices)

This prints:

[('Belgian Waffles', '$5.95'), ('Berry-Berry Belgian Waffles', '$8.95'), ('French Toast', '$4.50')]
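
One caveat: `ElementTree.fromstring` expects well-formed XML, and the sample in the question never closes `<breakfast_menu>`. The sketch below shows one way this could be handled, under the assumption that the missing closing root tag is the only defect. The `parse_menu` helper and the shortened `unclosed` string are hypothetical and exist only for illustration.

```python
from xml.etree import ElementTree

# Hypothetical, shortened sample that mimics the question: the root is never closed.
unclosed = """<breakfast_menu>
 <food>
  <name>French Toast</name>
  <price>$4.50</price>
  <description><taste>good</taste><sight>bad</sight></description>
 </food>
"""


def parse_menu(xml_string):
    """Parse the menu, retrying once with the closing root tag appended."""
    try:
        return ElementTree.fromstring(xml_string)
    except ElementTree.ParseError:
        # Assumption: the only defect is the missing </breakfast_menu>.
        return ElementTree.fromstring(xml_string + "</breakfast_menu>")


root = parse_menu(unclosed)
good = root.findall('.//food/description[taste="good"]/..')
# findtext returns None instead of raising if a child element is absent.
prices = [(food.findtext("name"), food.findtext("price")) for food in good]
print(prices)  # [('French Toast', '$4.50')]
```

Repairing the source file itself is the cleaner fix; the retry here only illustrates why the unmodified sample raises `ParseError`.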

Here is a solution using ElementTree:

import xml.etree.ElementTree as et

tree = et.parse('breakfast.xml')
root = tree.getroot()
for food in root.findall('food'):
    if food.find('description').find('taste').text == 'good':
        price = food.find('price').text
        print("found good food:{0} at price {1}".format(food.find('name').text, price))

Result:

found good food:Belgian Waffles at price $5.95
found good food:Berry-Berry Belgian Waffles at price $8.95
found good food:French Toast at price $4.50

Edit: I also had to fix your XML, because the closing tag was missing.
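
Since the question mentions a very large file, a streaming variant may be worth considering. The sketch below uses `xml.etree.ElementTree.iterparse` to handle one `food` element at a time instead of loading the whole tree first. The `iter_good_food` function name and the in-memory `sample` are assumptions for illustration; in practice a file name such as 'breakfast.xml' could be passed instead.

```python
import io
import xml.etree.ElementTree as et


def iter_good_food(source):
    """Yield (name, price) for each <food> whose taste is 'good'."""
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in et.iterparse(source, events=("end",)):
        if elem.tag != "food":
            continue
        if elem.findtext("description/taste") == "good":
            yield elem.findtext("name"), elem.findtext("price")
        # Discard the processed element's children to keep memory use low.
        elem.clear()


# Hypothetical in-memory stand-in for a large file on disk.
sample = io.BytesIO(b"""<breakfast_menu>
 <food><name>Belgian Waffles</name><price>$5.95</price>
  <description><taste>good</taste><sight>bad</sight></description></food>
 <food><name>Strawberry Belgian Waffles</name><price>$7.95</price>
  <description><taste>bad</taste><sight>bad</sight></description></food>
</breakfast_menu>""")

for name, price in iter_good_food(sample):
    print("found good food: {0} at price {1}".format(name, price))
```

Because `iterparse` reads incrementally, a file with a missing closing root tag would still yield the complete `food` elements before the parser raises an error at the end of input.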
