How to get items from an XML file using xmltodict

Question

I am trying to easily access values from an XML file.

<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
        <voorraad>231</voorraad>
        <prijs>0.56</prijs>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
        <voorraad>587</voorraad>
        <prijs>9.99</prijs>
    </artikel>
..... etc

If I want to access the value ABC123, how do I get it?

import xmltodict

with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
    print(doc[fd]['code'])

Answer 1

Using your example:

import xmltodict

with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())

If you examine doc, you'll see it's an OrderedDict keyed by tag name, with attributes stored under keys prefixed by @ (newer xmltodict releases return a plain dict with the same structure):

>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])

The root node is called artikelen, and it has a subnode artikel which is a list of OrderedDict objects, so if you want the code for every article, you would do:

codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])

# >>> codes
# ['ABC123', 'PQR678']
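One caveat, as far as I know about xmltodict's behaviour: a tag that occurs only once under its parent comes back as a single mapping rather than a one-item list, so the loop above would then iterate over keys instead of articles. A minimal sketch of a guard, where as_list and codes_of are hypothetical helper names and the plain dicts stand in for what xmltodict.parse would return:

```python
def as_list(value):
    """Wrap a lone item in a list; hypothetical helper, not part of xmltodict."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Stand-ins for parsed documents with one and with two <artikel> elements.
single = {'artikelen': {'artikel': {'@nummer': '121', 'code': 'ABC123'}}}
several = {'artikelen': {'artikel': [{'@nummer': '121', 'code': 'ABC123'},
                                     {'@nummer': '123', 'code': 'PQR678'}]}}

def codes_of(doc):
    """Collect every code, whether artikel parsed as a dict or a list."""
    return [artikel['code'] for artikel in as_list(doc['artikelen']['artikel'])]

print(codes_of(single))
print(codes_of(several))
```

I believe xmltodict.parse also accepts a force_list argument that serves the same purpose, but check the documentation of your installed version.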

If you specifically want the code only when nummer is 121, you could do this (note that attribute values are strings, hence '121'):

code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break
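If several lookups are needed, the same data can be indexed once by its nummer attribute. A small sketch, assuming the list below mirrors doc['artikelen']['artikel'] from the example (code_by_nummer is an illustrative name, not something xmltodict provides):

```python
# Stand-in for doc['artikelen']['artikel'] as parsed by xmltodict.
artikelen = [
    {'@nummer': '121', 'code': 'ABC123', 'naam': 'Highlight pen'},
    {'@nummer': '123', 'code': 'PQR678', 'naam': 'Nietmachine'},
]

# Attribute values are strings, so the keys are '121' and '123', not integers.
code_by_nummer = {artikel['@nummer']: artikel['code'] for artikel in artikelen}

print(code_by_nummer['121'])
print(code_by_nummer.get('999'))  # None when there is no such nummer
```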

That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
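As a rough illustration of that suggestion, here is a sketch using the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The XML is an inline, shortened copy of the sample data so the snippet runs without a file; with a real file you would use ET.parse(...).getroot() instead.

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the sample data so the sketch runs without a file.
XML = """
<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
    </artikel>
</artikelen>
"""

root = ET.fromstring(XML)  # root is the <artikelen> element

# The predicate [@nummer='121'] keeps only the artikel with that attribute value.
match = root.find("artikel[@nummer='121']/code")
code = match.text if match is not None else None
print(code)
```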

Answer 2

This uses xml.etree. You can try this:

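import xml.etree.ElementTree as ET
root = ET.parse('artikelen.xml').getroot()  # root is the <artikelen> element
for artikelobj in root.findall('artikel'):
    print(artikelobj.find('code').text)  # .text gives the value, not the Element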

If you want to extract a specific code based on the nummer attribute of artikel, you can try this:

for artikelobj in root.findall('artikel'):
    if artikelobj.get('nummer') == '121':  # attribute values are strings
        print(artikelobj.find('code').text)

This prints only the code you want.

Answer 3

To read .xml files:

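from lxml import objectify  # attribute-style access needs lxml.objectify, not lxml.etree

root = objectify.parse(filename).getroot()
# node1, node2 and variable_name are placeholder tag names;
# for the sample in the question this would be root.artikel.code.text
value = root.node1.node2.variable_name.text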

Answer 4

You can use the lxml package with an XPath expression.

from lxml import etree

tree = etree.parse("8_1.html")  # parse straight from the path; no file handle left open
expression = "/artikelen/artikel[1]/code"
matches = tree.xpath(expression)  # list of matching <code> elements
code = next(el.text for el in matches)
print(code)

# ABC123

The thing to notice here is the expression. /artikelen is the root element. /artikel[1] selects the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element under artikel[1]. You can read more in the lxml documentation and in references on XPath syntax.
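To make the 1-based indexing concrete, here is a small sketch. It swaps in the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages, and it uses a shortened inline copy of the sample data; the variable names are illustrative only. ElementTree paths are relative to the element they are called on, so the leading /artikelen is dropped.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample data.
XML = (
    "<artikelen>"
    "<artikel nummer='121'><code>ABC123</code></artikel>"
    "<artikel nummer='123'><code>PQR678</code></artikel>"
    "</artikelen>"
)
root = ET.fromstring(XML)

# XPath positions are 1-based: artikel[1] is the first artikel.
first_by_xpath = root.find("artikel[1]/code").text
second_by_xpath = root.find("artikel[2]/code").text

# Python lists are 0-based: index 0 of findall() is the same first artikel.
first_by_list = root.findall("artikel")[0].find("code").text

print(first_by_xpath, first_by_list, second_by_xpath)
```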

Answer 1
15 2016-10-20 14:51:34

Answer 2
0 2016-10-20 14:51:34

Answer 3
-1 2016-10-20 12:43:07

Answer 4
-2 2016-10-20 15:35:17
