Python script to extract phrases from an XML file

I am trying to parse an XML file that contains these tags:

<?xml version="1.0" encoding="utf-8"?>
<phrases>
  <phrase title="bacd_dd" version_id="10" version_string="lphaf"><![CDATA[bacd dsfbsd dfsd]]></phrase>
  <phrase title="bcvd_ff" version_id="10" version_string="lphaf"><![CDATA[ans fkdfjid dfdf]]></phrase>
  <phrase title="bdsd_fffd" version_id="17" version_string="lphaf 7"><![CDATA[jdhfd dsfodf wernksdlg ffguywer 
<BR>
dsf
sddsfdsfdsf ksdfj fdsf]]></phrase>
</phrases>

Now I want to get only the tag values (the text inside each phrase element). How can I parse the whole XML file?

Try this with xml.etree:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="utf-8"?>
<phrases>
  <phrase title="bacd_dd" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[bacd dsfbsd dfsd]]></phrase>
  <phrase title="bcvd_ff" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[ans fkdfjid dfdf]]></phrase>
  <phrase title="bdsd_fffd" version_id="1000017" version_string="1.0.0 Alpha 7"><![CDATA[jdhfd dsfodf wernksdlg ffguywer 
<BR>
dsf
sddsfdsfdsf ksdfj fdsf]]></phrase>
</phrases>""")

print(root.tag)
# prints: phrases

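# Iterating over the root yields its child <phrase> elements;
# .text holds the CDATA content of each one.
for i in root:
    print(i.text)

Output:

bacd dsfbsd dfsd
ans fkdfjid dfdf
jdhfd dsfodf wernksdlg ffguywer 
<BR>
dsf
sddsfdsfdsf ksdfj fdsf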


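# .attrib is a dict of the element's attributes.
for i in root:
    print(i.attrib)

Output (Python 3.8+ keeps attributes in document order; older versions may order them differently):

{'title': 'bacd_dd', 'version_id': '1010010', 'version_string': '1.1.0 Alpha'}
{'title': 'bcvd_ff', 'version_id': '1010010', 'version_string': '1.1.0 Alpha'}
{'title': 'bdsd_fffd', 'version_id': '1000017', 'version_string': '1.0.0 Alpha 7'}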
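
If you want the text and the attributes together, one option is to build a lookup from each phrase's title to its text. The following is only a sketch: the helper name phrases_by_title and the shortened SAMPLE string are made up for illustration and are not part of the question or the answer above.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the XML above (illustrative assumption).
SAMPLE = """<phrases>
  <phrase title="bacd_dd" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[bacd dsfbsd dfsd]]></phrase>
  <phrase title="bcvd_ff" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[ans fkdfjid dfdf]]></phrase>
</phrases>"""


def phrases_by_title(root):
    """Map each <phrase> element's title attribute to its text content."""
    # (p.text or "") guards against empty elements, whose .text is None.
    return {p.get("title"): (p.text or "") for p in root.iter("phrase")}


lookup = phrases_by_title(ET.fromstring(SAMPLE))
print(lookup["bacd_dd"])
```

Element.get returns None when an attribute is missing, so phrases without a title would all land under the None key; filter them out first if that matters for your data.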

If you need to parse from an XML file instead:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
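
Putting the file-based variant together, a small helper could look like the sketch below. The function name load_phrase_texts and the temporary demo file are assumptions for illustration; in practice you would pass the path of your own XML file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_phrase_texts(path):
    """Parse the XML file at `path` and return the text of every <phrase> child."""
    root = ET.parse(path).getroot()
    # findall("phrase") matches direct children of the root element only.
    return [phrase.text for phrase in root.findall("phrase")]


# Demo: write a small sample to a temporary file, then read it back.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<phrases><phrase title="t1"><![CDATA[hello <BR> world]]></phrase></phrases>'
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as handle:
    handle.write(sample)
    demo_path = handle.name

try:
    texts = load_phrase_texts(demo_path)
    print(texts)
finally:
    os.remove(demo_path)
```

Because the markup sits inside a CDATA section, the <BR> comes back as literal text rather than as a child element.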

For more, refer to https://docs.python.org/2/library/xml.etree.elementtree.html
