Reading SAS-generated XML-type .xls files with Python

Question

I am trying to extract the tabs (worksheets) from hundreds of SAS-generated .xls files. I tried the following approach without luck. My version of xlrd is 0.9.2.

import xlrd 
book = xlrd.open_workbook('out_1.xls')

The error message is:

Traceback (most recent call last):
  File "I:\Dropbox\Sas data\sacwin\test.py", line 3, in <module>
    book = xlrd.open_workbook('out_1.xls') # Open an .xls file
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python27\lib\site-packages\xlrd\book.py", line 1258, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python27\lib\site-packages\xlrd\book.py", line 1252, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'

When I open the .xls file in a text editor, the header looks like this:

<?xml version="1.0" encoding="windows-1252"?>

<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office">

Could you give me some suggestions on how to parse these files? Thanks!

Answer 1

I'm looking for a solution to this problem as well. I can tell you that the file format is XML, but it pre-dates the Excel 2007 'Office Open XML (ECMA-376)' format (I think it's SpreadsheetML), so it's not supported by xlrd.
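Since xlrd only fails once it reads the unexpected '<?xml' bytes, with hundreds of files it may help to check the first few bytes up front and route each file to a suitable reader. The following is a minimal sketch under stated assumptions: `sniff_xls_kind` and `sniff_file` are hypothetical helper names (not part of xlrd), and the 8-byte signature is that of the OLE2 compound-file container used by binary Excel 97-2003 .xls files. Very old binary files without that container would come back as "unknown".

```python
# Signature of the OLE2 compound-file container used by binary .xls files
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_xls_kind(head):
    """Classify the leading bytes of a file as 'ole2', 'xml', or 'unknown'."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    # Skip an optional UTF-8 byte-order mark and leading whitespace
    stripped = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if stripped.startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path):
    """Hypothetical convenience wrapper: sniff the first 64 bytes of a file."""
    with open(path, "rb") as f:
        return sniff_xls_kind(f.read(64))
```

With the file from the question, `sniff_file('out_1.xls')` would be expected to return "xml", because the header shown there starts with an XML declaration.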

If there's no Python library available and you have good prior knowledge of the structure of the files you need to process, I'd just use an XML reader.
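One way to act on the "just use an XML reader" idea is the standard library's xml.etree.ElementTree. This is only a sketch. It assumes the files follow the usual SpreadsheetML 2003 layout (Workbook > Worksheet > Table > Row > Cell > Data) in the namespace shown in the question's header, and `read_spreadsheetml` is a hypothetical name. It has not been checked against real SAS output.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Workbook> header shown in the question
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def read_spreadsheetml(source):
    """Return {sheet_name: [[cell_text, ...], ...]} for a SpreadsheetML file.

    `source` is a file name or a binary file object (assumed layout:
    Workbook > Worksheet > Table > Row > Cell > Data).
    """
    tree = ET.parse(source)
    sheets = {}
    for ws in tree.getroot().iter("{%s}Worksheet" % SS):
        name = ws.get("{%s}Name" % SS)  # the ss:Name attribute holds the tab name
        rows = []
        for row in ws.iter("{%s}Row" % SS):
            values = []
            for cell in row.iter("{%s}Cell" % SS):
                # ss:Index (1-based) means empty cells were skipped; pad with None
                index = cell.get("{%s}Index" % SS)
                if index is not None:
                    values.extend([None] * (int(index) - 1 - len(values)))
                data = cell.find("{%s}Data" % SS)
                values.append("".join(data.itertext()) if data is not None else None)
            rows.append(values)
        sheets[name] = rows
    return sheets
```

Usage would be along the lines of `sheets = read_spreadsheetml('out_1.xls')`, followed by a loop over `sheets.items()`. All values come back as strings, so numbers need converting by hand. The sketch ignores row-level ss:Index attributes and merged cells.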

Regards, Dave


Solution 1, 2014-01-02 06:20:51
