
Parsing XML with Python and exporting to Excel

I have an XML file that looks like the one below:

     <Result name="1">
       <point>
       <objects>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
             <path/>
          <object/>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
               </path>
            <object/>
         <objects/>
      <Result/>
   <Results/>

I would like a Python script that can export it to Excel in the format below:

[Image: the desired Excel layout]

I would really appreciate the help. Thank you.

I recommend reading the Python documentation. From https://docs.python.org/2/library/xml.etree.elementtree.html:

 import xml.etree.ElementTree as ET
 tree = ET.parse('country_data.xml')
 root = tree.getroot()
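For this particular data, note that ET.parse would reject the question's sample as written: it closes elements with <path/>, <object/> and so on instead of </path>, </object>, never closes <point>, and ends with <Results/> without an opening <Results>. The sketch below assumes a repaired, well-formed version of the sample (where <point> closes is a guess) and collects the text of each object's nodes. SAMPLE and read_results are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed repair of the question's sample. The original uses
# <path/>, <object/> etc. where closing tags belong and never closes <point>.
SAMPLE = """<Results>
  <Result name="1">
    <point>
      <objects>
        <object><path>
          <node>A</node><node>a</node><node>B</node><node>b</node>
          <node>C</node><node>c</node><node>D</node><node>d</node>
        </path></object>
        <object><path>
          <node>A</node><node>a</node><node>B</node><node>b</node>
          <node>C</node><node>c</node><node>D</node><node>d</node>
        </path></object>
      </objects>
    </point>
  </Result>
</Results>"""


def read_results(root):
    """Return [(result_name, [[node text, ...] for each object]), ...]."""
    results = []
    for result in root.iter("Result"):
        objects = []
        for obj in result.iter("object"):
            # Collect the text of every <node> inside this <object>, in order.
            objects.append([node.text for node in obj.iter("node")])
        results.append((result.get("name"), objects))
    return results


# For a file on disk, use ET.parse("file.xml").getroot() instead.
results = read_results(ET.fromstring(SAMPLE))
```

Each result becomes a name plus one list of node texts per object, which is a convenient shape to hand to whatever writes the spreadsheet.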

I also recommend checking out https://xlsxwriter.readthedocs.io/
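As a rough illustration of how XlsxWriter could produce a merged-header layout, here is a sketch that assumes the layout produced by the HTML answer below (a "Result" column, one "Object" header spanning each object's columns, upper-case nodes on one row and lower-case nodes on the next). It also assumes the node texts are already grouped into (upper, lower) pairs. write_results and _write_span are hypothetical helpers. They rely only on the worksheet methods write(row, col, value) and merge_range(first_row, first_col, last_row, last_col, value), and they take the worksheet as an argument so the layout logic itself does not import XlsxWriter.

```python
def _write_span(worksheet, first_row, first_col, last_row, last_col, value):
    # XlsxWriter does not merge a single cell, so write that case directly.
    if (first_row, first_col) == (last_row, last_col):
        worksheet.write(first_row, first_col, value)
    else:
        worksheet.merge_range(first_row, first_col, last_row, last_col, value)


def write_results(worksheet, results):
    """Write results as two rows per result under merged "Object" headers.

    worksheet: an XlsxWriter worksheet (or anything with the same
        write() and merge_range() methods).
    results: [(result_name, objects), ...] where each object is a list of
        (upper, lower) node pairs, e.g. [("A", "a"), ("B", "b"), ...].
    """
    # Header row. Assumption: every result has the same number of objects
    # and pairs as the first one, so the first result defines the header.
    worksheet.write(0, 0, "Result")
    col = 1
    for pairs in results[0][1]:
        _write_span(worksheet, 0, col, 0, col + len(pairs) - 1, "Object")
        col += len(pairs)

    row = 1
    for name, objects in results:
        # The result name spans its two data rows.
        worksheet.merge_range(row, 0, row + 1, 0, name)
        col = 1
        for pairs in objects:
            for upper, lower in pairs:
                worksheet.write(row, col, upper)      # e.g. "A"
                worksheet.write(row + 1, col, lower)  # e.g. "a"
                col += 1
        row += 2
```

With XlsxWriter installed, usage would look like `workbook = xlsxwriter.Workbook("results.xlsx")`, then `write_results(workbook.add_worksheet(), results)`, then `workbook.close()`.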

As for the logic of manipulating the data: unless you know the exact order and structure of the XML file, you may run into problems, because (as far as I can tell) nothing distinguishes <node>A</node> from <node>a</node>, and nothing limits the number of nodes to exactly 8. You would need to check for these and similar things to make sure everything ends up lined up correctly in the Excel file.
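The checks mentioned above might look something like this minimal sketch. It assumes each <path> holds exactly 8 nodes and that nodes alternate upper-case then lower-case ("A", "a", "B", "b", ...). That case test is only a heuristic, since the XML itself does not mark which node is which. pair_nodes and EXPECTED_NODES are hypothetical names.

```python
EXPECTED_NODES = 8  # assumption: each <path> holds exactly 8 <node> elements


def pair_nodes(texts, expected=EXPECTED_NODES):
    """Group node texts into (upper, lower) pairs, failing loudly on surprises."""
    if len(texts) != expected:
        raise ValueError(f"expected {expected} nodes, got {len(texts)}")
    # Even positions are assumed to be the upper-case half of each pair,
    # odd positions the lower-case half.
    pairs = list(zip(texts[0::2], texts[1::2]))
    for upper, lower in pairs:
        # Heuristic check; also rejects empty nodes (None or "").
        if not (upper and lower and upper.isupper() and lower.islower()):
            raise ValueError(f"unexpected node pair {upper!r}, {lower!r}")
    return pairs
```

Raising an error instead of silently writing whatever was found means a file with a missing or reordered node is reported, rather than producing a spreadsheet whose columns are shifted.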

One solution is to transform this XML (it is not quite well-formed) into an HTML table and then load the HTML table into Excel.

For example (using the BeautifulSoup library):

data = '''
     <Result name="1">
       <point>
       <objects>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
             <path/>
          <object/>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
               </path>
            <object/>
         <objects/>
      <Result/>
   <Results/>'''

import re
from bs4 import BeautifulSoup

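# The sample "closes" tags with <tag/> instead of </tag>; rewrite those
# into real closing tags before parsing.
soup = BeautifulSoup(re.sub(r'<(.*?)/>', r'</\1>', data), 'html.parser')

# Count all <object> elements, and the node pairs in the first object
# (each pair is an upper-case node followed by a lower-case one).
num_objects = len(soup.select('object'))
num_nodes = len(soup.select_one('object').select('node')) // 2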

print('<html><table border=1>')
print('<tr>')
print('<th>Result</th>')
for i in range(num_objects):
    print('<th colspan={}>Object</th>'.format(num_nodes))
print('</tr>')

for result in soup.select('Result[name]'):
    print('<tr>')
    print('<td rowspan=2>{}</td>'.format(result['name']))

    nodes = result.select('node')
    for node in nodes[::2]:
        print('<td>' + node.text + '</td>')
    print('</tr>')
    print('<tr>')
    for node in nodes[1::2]:
        print('<td>' + node.text + '</td>')
    print('</tr>')
print('</table></html>')

This prints:

<html><table border=1>
<tr>
<th>Result</th>
<th colspan=4>Object</th>
<th colspan=4>Object</th>
</tr>
<tr>
<td rowspan=2>1</td>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
</tr>
<tr>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
</tr>
</table></html>

In Firefox, it looks like this:

[Image: the HTML table rendered in Firefox]

Loading the data into LibreOffice Calc is easy (it should be easy in Excel too):

[Image: the table loaded into LibreOffice Calc]
