Parsing part of an XML file with Python

Question

I'm new to both Python and XML. I have looked at the previous posts on the topic, but I can't figure out how to do exactly what I need, although it seems simple enough in principle.

<Project>
 <Items>
  <Item>
   <Code>A456B</Code>
   <Database>
    <Data>
     <Id>mountain</Id>
     <Value>12000</Value>
    </Data>
    <Data>
     <Id>UTEM</Id>
     <Value>53.2</Value>
    </Data>
   </Database>
  </Item>
  <Item>
   <Code>A786C</Code>
   <Database>
    <Data>
     <Id>mountain</Id>
     <Value>5000</Value>
    </Data>
    <Data>
     <Id>UTEM</Id>
     <Value></Value>
    </Data>
   </Database>
  </Item>
 </Items>
</Project>

All I want to do is extract all of the Codes, Values and Ids, which is no problem.

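import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

name = 'example tree.xml'
tree = ET.parse(name)
root = tree.getroot()

# Collect the text of every matching tag, in document order
codes = []
ids = []
val = []
for code in root.iter('Code'):
    codes.append(code.text)
for data_id in root.iter('Id'):
    ids.append(data_id.text)
for value in root.iter('Value'):
    val.append(value.text)  # .text is None for an empty <Value></Value>
print(codes)
print(ids)
print(val)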

['A456B', 'A786C']
['mountain', 'UTEM', 'mountain', 'UTEM']
['12000', '53.2', '5000', None]

I want to know which Ids and Values go with which Code: something like a dictionary of dictionaries, or perhaps a list of DataFrames with the row index being the Id and the column header being the Code.

For example:

A456B = {'mountain': 12000, 'UTEM': 53.2}
A786C = {'mountain': 5000, 'UTEM': None}

Eventually I want to use the Values to feed an equation.

Note that the real XML file might not contain the same number of Ids and Values under each Code. Also, the Ids and Values might differ from one Code section to another.

Sorry if this question is elementary or unclear... I've only been doing Python for a month :/

Answer 1

BeautifulSoup is a very useful module for parsing HTML and XML.

from bs4 import BeautifulSoup

# Read the file into a BeautifulSoup object. Python's built-in HTML parser
# lowercases tag names, which is why the lookups below use 'item', 'code', etc.
with open("input.txt") as f:
    soup = BeautifulSoup(f, "html.parser")

results = {}

# Parse the data and put it into a dict whose values are dicts
for item in soup.find_all('item'):
    # Assemble the inner dicts on the fly using a dict comprehension:
    # http://stackoverflow.com/a/14507637/4400277
    results[item.code.text] = {data.id.text: data.value.text for data in item.find_all('data')}

>>> results
{u'A786C': {u'mountain': u'5000', u'UTEM': u''},
 u'A456B': {u'mountain': u'12000', u'UTEM': u'53.2'}}
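
Note that every Value in this result is a string, and the empty <Value> comes back as an empty string rather than None. Since the question mentions feeding the values into an equation, here is a minimal sketch of a follow-up conversion step. The helper name to_number is hypothetical (it is not part of BeautifulSoup), and treating blank or non-numeric text as None is an assumption about what the asker wants:

```python
def to_number(text):
    """Return float(text), or None when text is missing, blank, or not numeric."""
    if text is None or not text.strip():
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Same shape as the `results` dict built above (sample data from the question).
results = {
    'A456B': {'mountain': '12000', 'UTEM': '53.2'},
    'A786C': {'mountain': '5000', 'UTEM': ''},
}

# Rebuild the nested dict with numeric values (hypothetical post-processing step).
numeric = {
    code: {data_id: to_number(value) for data_id, value in data.items()}
    for code, data in results.items()
}
print(numeric)
```

Keeping the conversion separate from the parsing makes it easy to change how missing values are handled later, for example by substituting a default instead of None.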

Answer 2

This might be what you want:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

name = 'test.xml'
tree = ET.parse(name)
root = tree.getroot()
codes = {}

for item in root.iter('Item'):
    code = item.find('Code').text
    codes[code] = {}

    for datum in item.iter('Data'):
        # A missing <Value> tag and an empty one both end up as None
        if datum.find('Value') is not None:
            value = datum.find('Value').text
        else:
            value = None
        # Skip entries that have no <Id> tag at all
        if datum.find('Id') is not None:
            data_id = datum.find('Id').text  # avoid shadowing the built-in id()
            codes[code][data_id] = value

print(codes)

This produces:

{'A456B': {'mountain': '12000', 'UTEM': '53.2'}, 'A786C': {'mountain': '5000', 'UTEM': None}}

This iterates over all Item tags and, for each one, creates a dict key pointing to a dict of id/value pairs. An id/value pair is only created if the Id tag is present; a missing or empty Value tag is stored as None.
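
The question also asks for a table with the Id as the row index and the Code as the column header, and notes that not every Code has the same Ids. Here is a rough sketch of that reshaping using plain dicts only. The function name pivot_by_id is hypothetical, and filling absent Ids with None is an assumption:

```python
# Nested dict in the shape printed above: Code -> {Id: Value}.
codes = {
    'A456B': {'mountain': '12000', 'UTEM': '53.2'},
    'A786C': {'mountain': '5000', 'UTEM': None},
}


def pivot_by_id(codes):
    """Return Id -> {Code: Value}, filling Ids that a Code lacks with None."""
    # Union of every Id seen under any Code, sorted for a stable row order.
    all_ids = sorted({data_id for data in codes.values() for data_id in data})
    return {
        data_id: {code: data.get(data_id) for code, data in codes.items()}
        for data_id in all_ids
    }


table = pivot_by_id(codes)
for data_id, row in table.items():
    print(data_id, row)
```

If pandas is available, pandas.DataFrame(codes) builds a comparable table directly from the nested dict, with the Codes as columns and the Ids as the row index.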

2 answers

Answer 1: score 1, accepted, 2016-09-22 20:19:21

Answer 2: score 0, 2016-09-22 20:24:41
