I have a giant XML file that a device exports with a .xls extension. Here is an excerpt:
<?xml version='1.0'?>
<?mso-application progid='Excel.Sheet'?>
<s:Workbook xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
<s:Styles>
...
<s:Worksheet s:Name="Description">
...
<s:Worksheet s:Name="Data">
<s:Table s:DefaultColumnWidth="100">
<s:Row>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Time</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Temp1</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Temp2</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Liquid</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Response</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Base</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Events</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">Low</s:Data>
</s:Cell>
<s:Cell s:StyleID="Bold">
<s:Data s:Type="String">High</s:Data>
</s:Cell>
<s:Cell />
</s:Row>
...
<s:Row>
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">45</s:Data> # Time
</s:Cell>
# There is no Temp1 data
<s:Cell />
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">29.74</s:Data> # Temp2
</s:Cell>
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">12.11</s:Data> # Liquid
</s:Cell>
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">100</s:Data> # Response
</s:Cell>
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">30</s:Data> # Base
</s:Cell>
# There are no events in this data
<s:Cell />
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">0</s:Data> # Low
</s:Cell>
<s:Cell s:StyleID="Default">
<s:Data s:Type="Number">55</s:Data> # High
</s:Cell>
<s:Cell />
</s:Row>
What I am trying to do is extract information from the worksheet named "Data." There are 9 headers for the data, but I am only interested in the data that corresponds to "Time" and "Temp2", which would be "45" and "29.74", respectively.
I have managed to figure out how to navigate the file using:
import xml.etree.ElementTree as ET
tree = ET.parse('xmlfile')
root = tree.getroot()
ns = {'x':'urn:schemas-microsoft-com:office:excel',
'o':'urn:schemas-microsoft-com:office:office',
's':'urn:schemas-microsoft-com:office:spreadsheet'}
root.findall('./s:Worksheet/s:Table/s:Row/s:Cell/s:Data', namespaces=ns)
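A small sketch of how the worksheet names can be read with the same namespace map. The `sample` string is a hypothetical, trimmed-down stand-in for the real export. The point it illustrates is that ElementTree stores the `s:Name` attribute under a key containing the full namespace URI in braces, so `ws.get('s:Name')` would return `None`.

```python
import xml.etree.ElementTree as ET

SS = 'urn:schemas-microsoft-com:office:spreadsheet'
ns = {'s': SS}

# Hypothetical stand-in for the exported file: two empty worksheets.
sample = (
    '<s:Workbook xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">'
    '<s:Worksheet s:Name="Description"><s:Table/></s:Worksheet>'
    '<s:Worksheet s:Name="Data"><s:Table/></s:Worksheet>'
    '</s:Workbook>'
)
root = ET.fromstring(sample)

# Attribute keys use the "{namespace-uri}local-name" form, not the "s:" prefix.
names = [ws.get(f'{{{SS}}}Name') for ws in root.findall('s:Worksheet', ns)]
print(names)
```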
The closest I have gotten to getting the data out of the cells is using an example I found in another post, and trying variations of the following:
# Walk every element in the parsed tree and print any text it contains
for elem in tree.iter():
    if elem.text is not None:
        print(elem.text)
This outputs everything (all 18901 rows of data), and I do not really know how to proceed from here. Ultimately what I would like to do is to store this data in a data frame or something equivalent so that I may plot it.
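One possible way to get from the parsed tree to just the "Time" and "Temp2" columns is sketched below, using only the standard library. It rests on a few assumptions that are not confirmed by the file itself: the first row of the "Data" worksheet is the header row, empty values always appear as an empty `<s:Cell />` so that cell position matches column position, and the wanted columns hold numbers. `read_data_sheet` and `SAMPLE` are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

SS = 'urn:schemas-microsoft-com:office:spreadsheet'
NS = {'s': SS}

# Trimmed-down stand-in for the export (hypothetical; three columns only).
SAMPLE = """
<s:Workbook xmlns:s="urn:schemas-microsoft-com:office:spreadsheet">
  <s:Worksheet s:Name="Description"><s:Table/></s:Worksheet>
  <s:Worksheet s:Name="Data">
    <s:Table>
      <s:Row>
        <s:Cell><s:Data s:Type="String">Time</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp1</s:Data></s:Cell>
        <s:Cell><s:Data s:Type="String">Temp2</s:Data></s:Cell>
      </s:Row>
      <s:Row>
        <s:Cell><s:Data s:Type="Number">45</s:Data></s:Cell>
        <s:Cell />
        <s:Cell><s:Data s:Type="Number">29.74</s:Data></s:Cell>
      </s:Row>
    </s:Table>
  </s:Worksheet>
</s:Workbook>
""".strip()


def cell_texts(row):
    """Return one entry per <s:Cell>; None where the cell has no <s:Data>."""
    texts = []
    for cell in row.findall('s:Cell', NS):
        data = cell.find('s:Data', NS)
        texts.append(data.text if data is not None else None)
    return texts


def read_data_sheet(root, sheet_name='Data', wanted=('Time', 'Temp2')):
    """Collect the wanted columns of one worksheet as a list of dicts."""
    name_key = f'{{{SS}}}Name'  # namespaced attribute key
    for ws in root.findall('s:Worksheet', NS):
        if ws.get(name_key) == sheet_name:
            break
    else:
        raise ValueError(f'no worksheet named {sheet_name!r}')

    rows = ws.findall('s:Table/s:Row', NS)
    header = cell_texts(rows[0])  # assumption: first row holds the headers
    positions = {name: header.index(name) for name in wanted}

    records = []
    for row in rows[1:]:
        values = cell_texts(row)
        record = {}
        for name, pos in positions.items():
            text = values[pos] if pos < len(values) else None
            # assumption: wanted columns are numeric; empty cells become None
            record[name] = float(text) if text is not None else None
        records.append(record)
    return records


records = read_data_sheet(ET.fromstring(SAMPLE))
print(records)
```

For the real file, `ET.parse('xmlfile').getroot()` would take the place of `ET.fromstring(SAMPLE)`. If pandas is installed, `pandas.DataFrame(records)` turns the list into a DataFrame that can then be plotted.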
This may be a naive suggestion, but have you tried simply using Pandas (after installing the package, of course)?
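import pandas as pd
# 'xmlfile' is the path used in the question. read_excel expects a real Excel
# workbook, so it may reject an XML Spreadsheet 2003 export like this one.
df = pd.read_excel('xmlfile')
# ... analyze and plot from the DataFrame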
(This could have been a comment, but I'm not allowed to comment yet...)