Extract specific cell data from an XML .xls file with Python

I have a giant XML file that is exported from a device as a .xls file.

<?xml version='1.0'?>
<?mso-application progid='Excel.Sheet'?>
<s:Workbook xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
  <s:Styles>
    ...
  <s:Worksheet s:Name="Description">
    ...
  <s:Worksheet s:Name="Data">
    <s:Table s:DefaultColumnWidth="100">
      <s:Row>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Time</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Temp1</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Temp2</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Liquid</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Response</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Base</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Events</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">Low</s:Data>
        </s:Cell>
        <s:Cell s:StyleID="Bold">
          <s:Data s:Type="String">High</s:Data>
        </s:Cell>
        <s:Cell />
      </s:Row>
      ...
      <s:Row>
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">45</s:Data> <!-- Time -->
        </s:Cell>
        <!-- There is no Temp1 data -->
        <s:Cell />
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">29.74</s:Data> <!-- Temp2 -->
        </s:Cell>
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">12.11</s:Data> <!-- Liquid -->
        </s:Cell>
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">100</s:Data> <!-- Response -->
        </s:Cell>
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">30</s:Data> <!-- Base -->
        </s:Cell>
        <!-- There are no events in this data -->
        <s:Cell />
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">0</s:Data> <!-- Low -->
        </s:Cell>
        <s:Cell s:StyleID="Default">
          <s:Data s:Type="Number">55</s:Data> <!-- High -->
        </s:Cell>
        <s:Cell />
      </s:Row>

What I am trying to do is extract information from the worksheet named "Data". It has 9 column headers, but I am only interested in the "Time" and "Temp2" columns; in the row above, those values are 45 and 29.74, respectively.
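Because an empty column is still written out as `<s:Cell />`, each cell's position in a row lines up with the header row, so pairing headers and values by position is enough to pick out "Time" and "Temp2". A minimal sketch, assuming every cell is written explicitly as in the excerpt above (SpreadsheetML can also omit empty cells and give the next cell an `s:Index` attribute, which this sketch does not handle). `SAMPLE` is a trimmed copy of the excerpt and `cell_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {'s': 'urn:schemas-microsoft-com:office:spreadsheet'}

# Trimmed-down copy of the "Data" sheet from the question: a header row and one data row.
SAMPLE = """<s:Workbook xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
  <s:Worksheet s:Name="Data">
    <s:Table>
      <s:Row>
        <s:Cell><s:Data s:Type="String">Time</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp1</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp2</s:Data></s:Cell>
      </s:Row>
      <s:Row>
        <s:Cell><s:Data s:Type="Number">45</s:Data></s:Cell>
        <s:Cell />
        <s:Cell><s:Data s:Type="Number">29.74</s:Data></s:Cell>
      </s:Row>
    </s:Table>
  </s:Worksheet>
</s:Workbook>"""


def cell_text(cell):
    """Return the text of a cell's <s:Data> child, or None for an empty <s:Cell />."""
    data = cell.find('s:Data', NS)
    return data.text if data is not None else None


root = ET.fromstring(SAMPLE)
header_row, data_row = root.findall('./s:Worksheet/s:Table/s:Row', NS)

headers = [cell_text(c) for c in header_row.findall('s:Cell', NS)]
values = [cell_text(c) for c in data_row.findall('s:Cell', NS)]

# Pair each value with the header in the same column position.
record = dict(zip(headers, values))
print(record['Time'], record['Temp2'])  # 45 29.74
```

The values are still strings at this point; converting them with `float()` is left to the caller.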

I have managed to figure out how to navigate the file using:

import xml.etree.ElementTree as ET

tree = ET.parse('xmlfile')
root = tree.getroot()

ns = {'x': 'urn:schemas-microsoft-com:office:excel',
      'o': 'urn:schemas-microsoft-com:office:office',
      's': 'urn:schemas-microsoft-com:office:spreadsheet'}

# Every <s:Data> element in every worksheet of the workbook
data_elems = root.findall('./s:Worksheet/s:Table/s:Row/s:Cell/s:Data', namespaces=ns)
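The path in that `findall` call matches elements in every worksheet, including "Description". ElementTree's limited XPath supports `[@attribute='value']` predicates, and a prefixed attribute name such as `s:Name` is expanded through the same `namespaces` mapping, so the search can be narrowed to the "Data" sheet. A small sketch; the two-sheet `SAMPLE` workbook below is a hypothetical stand-in for the real export.

```python
import xml.etree.ElementTree as ET

ns = {'s': 'urn:schemas-microsoft-com:office:spreadsheet'}

# Hypothetical two-sheet workbook standing in for the real export.
SAMPLE = """<s:Workbook xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
  <s:Worksheet s:Name="Description">
    <s:Table>
      <s:Row><s:Cell><s:Data s:Type="String">Device info</s:Data></s:Cell></s:Row>
    </s:Table>
  </s:Worksheet>
  <s:Worksheet s:Name="Data">
    <s:Table>
      <s:Row><s:Cell><s:Data s:Type="String">Time</s:Data></s:Cell></s:Row>
      <s:Row><s:Cell><s:Data s:Type="Number">45</s:Data></s:Cell></s:Row>
    </s:Table>
  </s:Worksheet>
</s:Workbook>"""

root = ET.fromstring(SAMPLE)  # with the real file: ET.parse('xmlfile').getroot()

# Rows from every worksheet (what the path in the question walks through).
all_rows = root.findall('./s:Worksheet/s:Table/s:Row', ns)

# Only the rows of the worksheet whose s:Name attribute is "Data".
data_rows = root.findall("./s:Worksheet[@s:Name='Data']/s:Table/s:Row", ns)

print(len(all_rows), len(data_rows))  # 3 2
```

Working row by row like this, rather than collecting all `<s:Data>` elements in one flat list, also keeps the column structure that is needed to tell "Time" apart from "Temp2".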

The closest I have come to getting the data out of the cells is an example I found in another post; I have tried variations of the following:

# Walks every element in the whole document and prints any text it holds
for elem in tree.iter():
    if elem.text is not None:
        print(elem.text)

This outputs everything (all 18901 rows of data), and I do not really know how to proceed from here. Ultimately what I would like to do is to store this data in a data frame or something equivalent so that I may plot it.
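One way to get from the parsed tree to something plottable, sketched with only the standard library: read the "Data" sheet row by row, use the header row to find the wanted columns, and collect their values into a dict of lists. Assumptions: the first row of the sheet holds the headers, `Number` cells should become floats, and `row_values`, `read_columns` and `SAMPLE` are hypothetical names for illustration. The `s:Index` handling covers SpreadsheetML's option of skipping empty cells; the second data row in `SAMPLE` uses it only to demonstrate that path.

```python
import xml.etree.ElementTree as ET

S = 'urn:schemas-microsoft-com:office:spreadsheet'
NS = {'s': S}


def row_values(row):
    """Return one <s:Row> as a list, with None for empty cells.

    Number cells become floats; other types stay as strings. A cell with an
    s:Index attribute (1-based column number) is moved to that column, and
    the skipped columns are padded with None so positions match the headers.
    """
    values = []
    for cell in row.findall('s:Cell', NS):
        index = cell.get(f'{{{S}}}Index')
        if index is not None:
            values.extend([None] * (int(index) - 1 - len(values)))
        data = cell.find('s:Data', NS)
        if data is None or data.text is None:
            values.append(None)
        elif data.get(f'{{{S}}}Type') == 'Number':
            values.append(float(data.text))
        else:
            values.append(data.text)
    return values


def read_columns(root, sheet='Data', wanted=('Time', 'Temp2')):
    """Collect the wanted columns of one worksheet into {header: [values]}."""
    rows = root.findall(f"./s:Worksheet[@s:Name='{sheet}']/s:Table/s:Row", NS)
    headers = row_values(rows[0])  # assumes the first row holds the headers
    positions = {name: headers.index(name) for name in wanted}
    columns = {name: [] for name in wanted}
    for row in rows[1:]:
        values = row_values(row)
        for name, pos in positions.items():
            columns[name].append(values[pos] if pos < len(values) else None)
    return columns


SAMPLE = """<s:Workbook xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
  <s:Worksheet s:Name="Data">
    <s:Table>
      <s:Row>
        <s:Cell><s:Data s:Type="String">Time</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp1</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp2</s:Data></s:Cell>
        <s:Cell />
      </s:Row>
      <s:Row>
        <s:Cell><s:Data s:Type="Number">45</s:Data></s:Cell>
        <s:Cell />
        <s:Cell><s:Data s:Type="Number">29.74</s:Data></s:Cell>
      </s:Row>
      <s:Row>
        <s:Cell><s:Data s:Type="Number">50</s:Data></s:Cell>
        <s:Cell s:Index="3"><s:Data s:Type="Number">29.9</s:Data></s:Cell>
      </s:Row>
    </s:Table>
  </s:Worksheet>
</s:Workbook>"""

columns = read_columns(ET.fromstring(SAMPLE))
print(columns)  # {'Time': [45.0, 50.0], 'Temp2': [29.74, 29.9]}
```

With the real export, pass `ET.parse('xmlfile').getroot()` instead of the sample. The result is a plain dict of equal-length lists, which `pandas.DataFrame(columns)` accepts directly; `df.plot(x='Time', y='Temp2')` would then plot Temp2 against Time (pandas plotting needs matplotlib installed).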

This may be a naive suggestion, but have you tried simply using pandas (after installing the package, of course)?

import pandas

excel_file = 'xmlfile'  # path to the exported .xls file
df = pandas.read_excel(excel_file)

# ... analyze and plot from the DataFrame
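One thing worth checking first: the export in the question is XML Spreadsheet 2003 (SpreadsheetML) despite its .xls extension, while pandas reads legacy .xls files through the xlrd engine, which handles the binary .xls format rather than SpreadsheetML, so `read_excel` will likely raise an error on this file. A quick sketch for telling the formats apart by their first bytes: most binary .xls files (Excel 97 and later) start with the OLE2 compound-file signature, .xlsx files are ZIP archives, and SpreadsheetML starts with `<`. `detect_format` and `detect_file_format` are hypothetical helpers, not part of pandas.

```python
import codecs

OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # OLE2 container used by binary .xls
ZIP_SIGNATURE = b'PK\x03\x04'                       # .xlsx files are ZIP archives


def detect_format(head):
    """Classify spreadsheet data from its first few bytes."""
    if head.startswith(codecs.BOM_UTF8):  # XML files may start with a UTF-8 BOM
        head = head[len(codecs.BOM_UTF8):]
    if head.startswith(OLE2_SIGNATURE):
        return 'binary xls'
    if head.startswith(ZIP_SIGNATURE):
        return 'xlsx'
    if head.lstrip().startswith(b'<'):
        return 'xml'
    return 'unknown'


def detect_file_format(path):
    """Read the first bytes of a file on disk and classify them."""
    with open(path, 'rb') as f:
        return detect_format(f.read(16))


print(detect_format(b"<?xml version='1.0'?>"))  # xml
```

If the device's file reports `xml`, an XML parser such as `xml.etree.ElementTree`, as used in the question, is the tool to read it with.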

(This could have been a comment, but I'm not allowed to comment yet...)

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM