
Reading an XML file using ElementTree

I have an XML file. It looks like this:

<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
    <Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>2</Line>
    <Content>ertreter</Content>
    <Synonyms>retreter</Synonyms>
    <Translation>erterte</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>1</Line>
    <Content>sadsafs</Content>
    <Synonyms>sdfsdfsd</Synonyms>
    <Translation>sdfsdfsd</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>2</Line>
    <Content>retete</Content>
    <Synonyms>retertret</Synonyms>
    <Translation>retertert</Translation>
  </Group>
</root>

I tried it this way:

from xml.etree import ElementTree

root = ElementTree.parse('data.xml').getroot()
ChapterNo = root.find('ChapterNo').text
ChapterName = root.find('ChapterName').text
GitaLine = root.find('Line').text
Content = root.find('Content').text
Synonyms = root.find('Synonyms').text
Translation = root.find('Translation').text

But it raises this error:

    ChapterNo = root.find('ChapterNo').text
AttributeError: 'NoneType' object has no attribute 'text'

Now I want to get all the ChapterNo, ChapterName, etc. values separately using ElementTree, and insert this data into a database. Can anyone help me?

Regards,

Nimmy

To parse your simple two-level data structure and assemble a dict for each group, all you need to do is this:

>>> # what you did to get `root`
>>> from pprint import pprint as pp
>>> for group in root:
...     d = {}
...     for elem in group:
...         d[elem.tag] = elem.text
...     pp(d) # or whack it into a database
...
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'zfsdfsdf',
 'Line': '1',
 'Synonyms': 'fdgd',
 'Translation': 'assdfsdfsdf'}
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'ertreter',
 'Line': '2',
 'Synonyms': 'retreter',
 'Translation': 'erterte'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'sadsafs',
 'Line': '1',
 'Synonyms': 'sdfsdfsd',
 'Translation': 'sdfsdfsd'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'retete',
 'Line': '2',
 'Synonyms': 'retertret',
 'Translation': 'retertert'}
>>>

Look, Ma, no XPath!
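A self-contained version of the loop above, for trying it without a data.xml file. It assumes an abbreviated copy of the question's XML (first two groups only) parsed with ElementTree.fromstring, and instead of printing, it collects one dict per group into a list that could later be handed to a database layer. The helper name groups_to_dicts is hypothetical.

```python
from xml.etree import ElementTree

# Abbreviated copy of the question's XML (first two groups only).
XML = """<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
    <Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>2</Line>
    <Content>ertreter</Content>
    <Synonyms>retreter</Synonyms>
    <Translation>erterte</Translation>
  </Group>
</root>"""


def groups_to_dicts(root):
    """Hypothetical helper: one {tag: text} dict per direct child (<Group>) of root."""
    return [{elem.tag: elem.text for elem in group} for group in root]


root = ElementTree.fromstring(XML)
rows = groups_to_dicts(root)
for row in rows:
    print(row['ChapterNo'], row['Line'], row['Content'])
```

Because the dict keys are the element tags, adding a new child element to each Group needs no code change; the trade-off is that every value stays a string until you convert it.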

ChapterNo is not a direct child of root, so root.find('ChapterNo') finds nothing and returns None; calling .text on None is what raises the AttributeError. You'll need to use XPath syntax (ElementTree supports a limited subset of it) to find the data.
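A quick check of the point above, on an assumed one-group excerpt of the data: a bare tag passed to find() only matches direct children, so it returns None on root. A path through Group, or a ".//" path (any depth), reaches the nested element. findtext() is a standard-library shortcut that returns the text directly, or a default when nothing matches.

```python
from xml.etree import ElementTree

# One-group excerpt of the question's XML.
root = ElementTree.fromstring(
    "<root><Group><ChapterNo>1</ChapterNo><ChapterName>A</ChapterName></Group></root>"
)

# A bare tag only matches direct children of root, i.e. <Group>.
print(root.find('ChapterNo'))                          # None

# A path through <Group>, or './/' (any depth), finds the first match.
print(root.find('Group/ChapterNo').text)               # 1
print(root.find('.//ChapterName').text)                # A

# findtext() returns the text itself, or the default when nothing matches.
print(root.findtext('ChapterNo', default='missing'))   # missing
```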

Also, there are multiple occurrences of ChapterNo, ChapterName, etc., so you should use findall and iterate through the results to get the text of each one.

chapter_nos = [e.text for e in root.findall('.//ChapterNo')]

and so on.
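Spelling out the "and so on": one findall per tag gives parallel lists, which zip can recombine into one tuple per group. This sketch assumes every Group contains every tag exactly once; if one is missing, the lists fall out of step silently, which is one reason to iterate over the Group elements instead. The two-group XML below is an assumed excerpt keeping only three fields.

```python
from xml.etree import ElementTree

# Two-group excerpt of the question's XML (ChapterNo, Line, Content only).
root = ElementTree.fromstring("""<root>
  <Group><ChapterNo>1</ChapterNo><Line>1</Line><Content>zfsdfsdf</Content></Group>
  <Group><ChapterNo>1</ChapterNo><Line>2</Line><Content>ertreter</Content></Group>
</root>""")

# One findall() per tag; './/' searches at any depth below root.
chapter_nos = [e.text for e in root.findall('.//ChapterNo')]
lines = [e.text for e in root.findall('.//Line')]
contents = [e.text for e in root.findall('.//Content')]

# Recombine the parallel lists, converting the numeric fields to int.
records = [(int(c), int(l), text) for c, l, text in zip(chapter_nos, lines, contents)]
print(records)  # [(1, 1, 'zfsdfsdf'), (1, 2, 'ertreter')]
```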

Here's a small example using SQLAlchemy to define an object that will extract the data and store it in an SQLite database.

from sqlalchemy import create_engine, Unicode, Integer, Column, UnicodeText
from sqlalchemy.orm import declarative_base

engine = create_engine('sqlite:///chapters.sqlite', echo=True)
Base = declarative_base()

class ChapterLine(Base):
    __tablename__ = 'chapterlines'
    # chapter_no and line together form a composite primary key
    chapter_no = Column(Integer, primary_key=True)
    chapter_name = Column(Unicode(200))
    line = Column(Integer, primary_key=True)
    content = Column(UnicodeText)
    synonyms = Column(UnicodeText)
    translation = Column(UnicodeText)

    @classmethod
    def from_xmlgroup(cls, element):
        """Build a ChapterLine from one <Group> element."""
        row = cls()
        row.chapter_no = int(element.find('ChapterNo').text)
        row.chapter_name = element.find('ChapterName').text
        row.line = int(element.find('Line').text)
        row.content = element.find('Content').text
        row.synonyms = element.find('Synonyms').text
        row.translation = element.find('Translation').text
        return row

Base.metadata.create_all(engine)  # creates the table

Here's how to use it:

from xml.etree import ElementTree as etree
from sqlalchemy.orm import Session

doc = etree.parse('myfile.xml').getroot()
with Session(engine) as session:
    # one ChapterLine row per <Group> element
    for group in doc.findall('Group'):
        session.add(ChapterLine.from_xmlgroup(group))
    session.commit()

I have tested this code on your XML data and it works fine, inserting everything into the database.
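If SQLAlchemy is not available, the standard-library sqlite3 module can do the same insert. This is a swapped-in alternative, not the answer's code: it assumes an in-memory database, a chapterlines table mirroring the model above (composite primary key on chapter number and line), and an abbreviated one-group copy of the XML. The helper group_to_row is hypothetical. Pass a filename such as 'chapters.sqlite' to sqlite3.connect to persist the data.

```python
import sqlite3
from xml.etree import ElementTree

# One-group excerpt of the question's XML.
XML = """<root><Group>
  <ChapterNo>2</ChapterNo><ChapterName>B</ChapterName><Line>1</Line>
  <Content>sadsafs</Content><Synonyms>sdfsdfsd</Synonyms><Translation>sdfsdfsd</Translation>
</Group></root>"""

conn = sqlite3.connect(':memory:')  # use a filename to persist instead
conn.execute("""CREATE TABLE chapterlines (
    chapter_no INTEGER, chapter_name TEXT, line INTEGER,
    content TEXT, synonyms TEXT, translation TEXT,
    PRIMARY KEY (chapter_no, line))""")


def group_to_row(group):
    """Hypothetical helper: map one <Group> element to a tuple in column order."""
    return (int(group.findtext('ChapterNo')), group.findtext('ChapterName'),
            int(group.findtext('Line')), group.findtext('Content'),
            group.findtext('Synonyms'), group.findtext('Translation'))


root = ElementTree.fromstring(XML)
with conn:  # the connection context manager commits on success, rolls back on error
    conn.executemany("INSERT INTO chapterlines VALUES (?, ?, ?, ?, ?, ?)",
                     (group_to_row(g) for g in root.findall('Group')))

print(conn.execute("SELECT chapter_no, line, content FROM chapterlines").fetchall())
# [(2, 1, 'sadsafs')]
```

Using ? placeholders lets sqlite3 handle quoting, so XML text containing apostrophes cannot break the SQL.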
