
Python: Extract text from XML

I have a directory full of XML files. I'd like to go through all of them, extract the text from each into a new file, and name the new file after the meta-tag in the XML plus a number.

Example: all XML files look like this:

<?xml version='1.0' encoding='UTF-8'?>
<event><event id="9936070" origin="2008_01.xml"><meta/><text><original>Here is the text.</original>
</text><actors><actor><name>Enterprise</name><polarity>negativ implizit</polarity><centrality>zentral</centrality></actor></actors></event></event>

Now I'd like to extract the text ("Here is the text.") and save it in a file called "Enterprise_2008_01.xml". How can I do this? In particular, how can I process all files and increment the last number as needed? If there is another file with the tag "Enterprise" from 2008, the file name should be "Enterprise_2008_02.xml". Thanks a lot for any help! :)

import xml.etree.ElementTree as ET

path = 'C:/myfile.xml'
tree = ET.parse(path)  # parse the file into an ElementTree
# Iterate over the direct children of the root element
for el in tree.getroot():
    print(el.text)    # text content of the element
    print(el.tag)     # element name
    print(el.attrib)  # dict of attributes
    child_elements = list(el)  # direct children of this element
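
The snippet above only prints each element. A minimal sketch of the full task from the question might look like the following. It assumes every file has the structure shown in the example, that the year is the part of the `origin` attribute before the first underscore, and that the first `<actor>`'s `<name>` is the tag to use in the file name. The function names `extract_events` and `export_texts` are made up for this illustration, and the output goes to a separate directory so the generated files are not picked up as input on a later run.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def extract_events(xml_path):
    """Yield (actor_name, year, text) for each <event> that has an origin attribute."""
    root = ET.parse(xml_path).getroot()
    for event in root.iter("event"):
        if "origin" not in event.attrib:
            continue  # skip the outer wrapper <event>, which carries no attributes
        text = event.findtext("text/original", default="").strip()
        name = event.findtext("actors/actor/name", default="unknown").strip()
        # Assumption: origin looks like "2008_01.xml" and starts with the year
        year = event.get("origin").split("_")[0]
        yield name, year, text


def export_texts(src_dir, dst_dir):
    """Write each extracted text to <name>_<year>_<NN>.xml and return the file names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    counters = defaultdict(int)  # running number per (name, year) pair
    written = []
    for xml_path in sorted(src.glob("*.xml")):
        for name, year, text in extract_events(xml_path):
            counters[(name, year)] += 1
            out = dst / f"{name}_{year}_{counters[(name, year)]:02d}.xml"
            out.write_text(text, encoding="utf-8")
            written.append(out.name)
    return written
```

Calling `export_texts("C:/xml_in", "C:/xml_out")` (both paths are placeholders) would process every `*.xml` file in the first directory in sorted order. The counter is kept per name and year, so a second "Enterprise" event from 2008 gets the suffix `_02`. The output files contain plain text but keep the `.xml` extension because that is the name requested in the question.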
