How to use xmltodict to get items out of an XML file
I am trying to easily access values from an XML file.
<artikelen>
  <artikel nummer="121">
    <code>ABC123</code>
    <naam>Highlight pen</naam>
    <voorraad>231</voorraad>
    <prijs>0.56</prijs>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code>
    <naam>Nietmachine</naam>
    <voorraad>587</voorraad>
    <prijs>9.99</prijs>
  </artikel>
..... etc
If I want to access the value ABC123, how do I get it?
import xmltodict
with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
print(doc[fd]['code'])
Using your example:
import xmltodict
with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict keyed by tag name (newer xmltodict releases may return a plain dict instead, which is indexed the same way):
>>> doc
OrderedDict([('artikelen',
OrderedDict([('artikel',
[OrderedDict([('@nummer', '121'),
('code', 'ABC123'),
('naam', 'Highlight pen'),
('voorraad', '231'),
('prijs', '0.56')]),
OrderedDict([('@nummer', '123'),
('code', 'PQR678'),
('naam', 'Nietmachine'),
('voorraad', '587'),
('prijs', '9.99')])])]))])
The root node is called artikelen, and it has a subnode artikel, which is a list of OrderedDict objects. So if you want the code for every article, you would do:
codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])
# >>> codes
# ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':  # attribute values are strings
        code = artikel['code']
        break
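One thing to watch for with the loops above: when the file contains only a single artikel element, xmltodict returns a dict for it rather than a list, and the loop then iterates over keys. A minimal sketch of one way around that, assuming xmltodict's force_list option and an inline copy of the sample XML; the name by_nummer is made up for illustration:

```python
import xmltodict

# Inline copy of the sample document so the sketch is self-contained.
XML = """<artikelen>
  <artikel nummer="121">
    <code>ABC123</code>
    <naam>Highlight pen</naam>
    <voorraad>231</voorraad>
    <prijs>0.56</prijs>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code>
    <naam>Nietmachine</naam>
    <voorraad>587</voorraad>
    <prijs>9.99</prijs>
  </artikel>
</artikelen>"""

# force_list keeps 'artikel' a list even if only one <artikel> is present.
doc = xmltodict.parse(XML, force_list=("artikel",))

# Hypothetical lookup table: index every article by its nummer attribute.
by_nummer = {a["@nummer"]: a for a in doc["artikelen"]["artikel"]}

print(by_nummer["121"]["code"])
```

Building the lookup table once is handy if you need several articles by nummer; for a single lookup the loop with break is just as good.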
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
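As a rough sketch of that suggestion, here is the same lookup using the limited XPath subset in the standard library's xml.etree.ElementTree. The inline XML string is a shortened copy of the sample, used here instead of a file so the snippet runs on its own:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample document.
XML = """<artikelen>
  <artikel nummer="121">
    <code>ABC123</code>
    <naam>Highlight pen</naam>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code>
    <naam>Nietmachine</naam>
  </artikel>
</artikelen>"""

root = ET.fromstring(XML)  # root is the <artikelen> element

# Select the <code> child of the <artikel> whose nummer attribute is '121'.
# Attribute values are strings, so the predicate compares against '121'.
code_el = root.find("./artikel[@nummer='121']/code")
code = code_el.text if code_el is not None else None

print(code)
```

find returns None when nothing matches, which is why the result is checked before reading .text.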
This uses xml.etree. You can try this:
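import xml.etree.ElementTree as ET
root = ET.parse('artikelen.xml').getroot()  # root is the <artikelen> element
for artikelobj in root.findall('artikel'):
    print(artikelobj.find('code').text)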
If you want to extract a specific code based on the 'nummer' attribute of artikel, you can try this:
for artikelobj in root.findall('artikel'):
    if artikelobj.get('nummer') == '121':  # attribute values are strings
        print(artikelobj.find('code').text)
This will print only the code you want.
To read .xml files:
from lxml import objectify  # attribute-style access needs lxml.objectify, not lxml.etree
root = objectify.parse(filename).getroot()
value = root.node1.node2.variable_name.text
You can use the lxml package with an XPath expression.
from lxml import etree
tree = etree.parse("8_1.html")
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)  # list of matching <code> elements
code = next(i.text for i in l)
print(code)
# ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element under artikel[1]. You can read more in the lxml documentation and in references on XPath syntax.
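To see the 1-based positions in action without installing lxml, here is a small sketch using the standard library's xml.etree.ElementTree instead. This is a swap from the lxml code above: ElementTree only supports a subset of XPath, and paths are written relative to the root element rather than starting with /artikelen. The inline XML is a shortened copy of the sample:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample document.
XML = """<artikelen>
  <artikel nummer="121"><code>ABC123</code></artikel>
  <artikel nummer="123"><code>PQR678</code></artikel>
</artikelen>"""

root = ET.fromstring(XML)  # already the <artikelen> element

# Positions are 1-based: artikel[1] is the first <artikel>, not the second.
first = root.find("artikel[1]/code").text
second = root.find("artikel[2]/code").text

# last() selects the final <artikel>, however many there are.
last = root.find("artikel[last()]/code").text

print(first, second, last)
```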